Amanda::MainLoop - Perl interface to the Glib MainLoop
use Amanda::MainLoop;
my $to = Amanda::MainLoop::timeout_source(2000);
$to->set_callback(sub {
print "Time's Up!\n";
$to->remove(); # don't re-queue this timeout
Amanda::MainLoop::quit(); # return from Amanda::MainLoop::run
});
Amanda::MainLoop::run();
Note that all functions in this module are individually available for export, e.g.,
use Amanda::MainLoop qw(run quit);
The main event loop of an application is a tight loop which waits for events, and calls functions to respond to those events. This design allows an IO-bound application to multitask within a single thread, by responding to IO events as they occur instead of blocking on particular IO operations.
The Amanda security API, transfer API, and other components rely on the event loop to allow them to respond to their own events in a timely fashion.
The overall structure of an application, then, is to initialize its state, register callbacks for some events, and begin looping. In each iteration, the loop waits for interesting events to occur (data available for reading or writing, timeouts, etc.), and then calls functions to handle those interesting things. Thus, the application spends most of its time waiting. When some application-defined state is reached, the loop is terminated and the application cleans up and exits.
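The shape described above (initialize, register callbacks, loop, quit) can be sketched without Glib at all. The following is only a toy model: post_event, run_loop and quit_loop are hypothetical names, not part of Amanda::MainLoop, and a real main loop blocks while waiting for events instead of returning when its queue is empty.

```perl
use strict;
use warnings;

# Toy model only: a queue of pending events stands in for Glib's sources.
my @pending;        # events waiting to be handled, stored as [callback, @args]
my $running = 0;    # set by run_loop, cleared by quit_loop
my @log;

sub post_event { my ($cb, @args) = @_; push @pending, [$cb, @args]; return }
sub quit_loop  { $running = 0; return }

sub run_loop {
    $running = 1;
    while ($running and @pending) {
        my ($cb, @args) = @{ shift @pending };
        $cb->(@args);               # respond to one event
    }
    $running = 0;
    return;
}

# 1. initialize state, 2. register a callback, 3. loop until done
my $remaining = 3;
my $on_tick;
$on_tick = sub {
    push @log, "tick $remaining";
    if (--$remaining) { post_event($on_tick) } else { quit_loop() }
};
post_event($on_tick);
run_loop();
print "$_\n" for @log;
```

The callback decides when the application-defined end state has been reached and asks the loop to stop; the code after run_loop() is where cleanup would go.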
The Glib main loop takes place within a call to Amanda::MainLoop::run(). This function executes until a call to Amanda::MainLoop::quit() occurs, at which point run() returns. You can check whether the loop is running with Amanda::MainLoop::is_running().
The functions in this section are intended to make asynchronous programming as simple as possible. They are implemented on top of the interfaces described in the LOW-LEVEL INTERFACE section.
In most cases, a callback does not need to be invoked immediately. In fact, because Perl does not do tail-call optimization, a long chain of callbacks may cause the perl stack to grow unnecessarily.
The solution is to queue the callback for execution on the next iteration of the main loop, and call_later($cb, @args) does exactly this.
sub might_delay {
my ($cb) = @_;
if (can_do_it_now()) {
my $result = do_it();
Amanda::MainLoop::call_later($cb, $result);
} else {
# ..
}
}
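To make the stack-growth argument concrete, here is a self-contained sketch that compares a chain of callbacks invoked directly with the same chain routed through a queue. It does not use Amanda::MainLoop: call_later_sketch and drain_queue are hypothetical stand-ins for call_later and for one pass of the main loop.

```perl
use strict;
use warnings;

my @queue;    # stand-in for the main loop's list of pending calls

sub call_later_sketch { my ($cb, @args) = @_; push @queue, [$cb, @args]; return }

sub drain_queue {
    while (my $entry = shift @queue) {
        my ($cb, @args) = @$entry;
        $cb->(@args);
    }
    return;
}

# Count the frames currently on the Perl call stack.
sub stack_depth { my $d = 0; $d++ while caller($d); return $d }

# Run a chain of $links callbacks and report the deepest stack observed.
sub max_depth_of_chain {
    my ($links, $deferred) = @_;
    my $max = 0;
    my $step;
    $step = sub {
        my ($n) = @_;
        my $depth = stack_depth();
        $max = $depth if $depth > $max;
        return unless $n;
        return $deferred ? call_later_sketch($step, $n - 1) : $step->($n - 1);
    };
    $step->($links);
    drain_queue();
    return $max;
}

my $direct   = max_depth_of_chain(50, 0);
my $deferred = max_depth_of_chain(50, 1);
print "direct: $direct frames, deferred: $deferred frames\n";
```

In the direct variant every link adds a frame, so the depth grows with the length of the chain. In the deferred variant each link returns before the next one starts, so the depth stays constant.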
When starting the main loop, an application usually has a sub that should run after the loop has started. call_later works in this situation, too.
my $main = sub {
# ..
Amanda::MainLoop::quit();
};
Amanda::MainLoop::call_later($main);
# ..
Amanda::MainLoop::run();
As an optimization, make_cb wraps a sub with a call to call_later while also naming the sub (using Sub::Name, if available):
my $fetched_cb = make_cb(fetched_cb => sub {
# .. callback body
});
In general, make_cb should be used whenever a callback is passed to some other library. For example, the Changer API (see Amanda::Changer) might be invoked like this:
my $reset_finished_cb = make_cb(reset_finished_cb => sub {
my ($err) = @_;
die "while resetting: $err" if $err;
# ..
});
Be careful not to use make_cb in cases where some action must take place before the next iteration of the main loop. In practice, this means make_cb should be avoided with file-descriptor callbacks, which will trigger repeatedly until the descriptors' needs are addressed.
make_cb is exported automatically.
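As a rough illustration of the wrapping described above, the sketch below builds a make_cb-like helper on top of a plain array. It is not Amanda's implementation: make_cb_sketch and the @queue array are hypothetical, and the only external call used is Sub::Name's subname, which is loaded only if the module happens to be installed.

```perl
use strict;
use warnings;

my @queue;   # toy stand-in for "run this on the next loop iteration"

# Hypothetical sketch of a make_cb-like wrapper; not Amanda's actual code.
sub make_cb_sketch {
    my ($name, $sub) = @_;
    # Name the sub for nicer stack traces when Sub::Name is available.
    if (eval { require Sub::Name; 1 }) {
        $sub = Sub::Name::subname($name, $sub);
    }
    # The returned wrapper never runs $sub directly; it only queues it.
    return sub {
        my @args = @_;
        push @queue, sub { $sub->(@args) };
        return;
    };
}

my @seen;
my $fetched_cb = make_cb_sketch(fetched_cb => sub { push @seen, "fetched @_" });

$fetched_cb->("volume-1");              # queued, not yet executed
my $ran_immediately = scalar @seen;     # still 0 at this point

$_->() for splice @queue;               # what one loop iteration would do
print "$seen[0]\n";
```

Because the wrapper only queues the call, nothing has happened by the time $fetched_cb returns. This is the same property that makes such a wrapper a poor fit for file-descriptor callbacks.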
Sometimes you need the MainLoop equivalent of sleep(). That comes in the form of call_after($delay, $cb, @args), which takes a delay (in milliseconds), a sub, and an arbitrary number of arguments. The sub is called with the arguments after the delay has elapsed.
sub countdown {
    my $counter;
    $counter = sub {
        my ($i) = @_; # current count, passed along by call_after
        print "$i..\n";
        if ($i) {
            Amanda::MainLoop::call_after(1000, $counter, $i-1);
        }
    };
    $counter->(10);
}
The function returns the underlying event source (see below), enabling the caller to cancel the pending call:
my $tosrc = Amanda::MainLoop::call_after(15000, $timeout_cb);
# ...data arrives before timeout...
$tosrc->remove();
To monitor a child process for termination, give its pid to call_on_child_termination($pid, $cb, @args). When the child exits for any reason, this will collect its exit status (via waitpid) and call $cb as
$cb->($exitstatus, @args);
Like call_after, this function returns the event source to allow early cancellation if desired.
async_read(
fd => $fd,
size => $size, # optional, default 0
async_read_cb => $async_read_cb,
args => [ .. ]); # optional
This function will read $size bytes when they are available from file descriptor $fd, and invoke the callback with the results:
$async_read_cb->($err, $buf, @args);
If $size is zero, then the callback will get whatever data is available as soon as it is available, up to an arbitrary buffer size. If $size is nonzero, then a short read may still occur if $size bytes do not become available simultaneously. On EOF, $buf will be the empty string. It is the caller's responsibility to set $fd to non-blocking mode. Note that not all operating systems generate errors that might be reported here. For example, on Solaris an invalid file descriptor will be silently ignored.
The return value is an event source, and calling its remove method will cancel the read. Having more than one async_read operation on a single file descriptor at any time is an error and will lead to unpredictable results.
This function adds a new FdSource every time it is invoked, so it is not well-suited to processing large amounts of data. For that purpose, consider using the low-level interface or, better, the transfer architecture (see Amanda::Xfer).
async_write(
fd => $fd,
data => $data,
async_write_cb => $async_write_cb,
args => [ .. ]); # optional
This function will write $data to file descriptor $fd and invoke the callback with the number of bytes written:
$async_write_cb->($err, $bytes_written, @args);
If $bytes_written is less than the length of $data, then an error occurred, and is given in $err. As for async_read, the caller should set $fd to non-blocking mode. Multiple parallel invocations of this function for the same file descriptor are allowed and will be serialized in the order the calls were made:
async_write(
    fd => $fd,
    data => "HELLO!\n",
    async_write_cb => make_cb(wrote_hello => sub {
        print "wrote 'HELLO!'\n";
    }));
async_write(
    fd => $fd,
    data => "GOODBYE!\n",
    async_write_cb => make_cb(wrote_goodbye => sub {
        print "wrote 'GOODBYE!'\n";
    }));
In this case, the two strings are guaranteed to be written in the same order, and the callbacks will be called in the correct order.
Like async_read, this function may add a new FdSource every time it is invoked, so it is not well-suited to processing large amounts of data.
Java has the notion of a "synchronized" method, which can only execute in one thread at any time. This is a particular application of a lock, in which the lock is acquired when the method begins, and released when it finishes.
With Amanda::MainLoop, this functionality is generally not needed because there is no unexpected preemption. However, if you break up a long-running operation (that doesn't allow concurrency) into several callbacks, you'll need to ensure that at most one of those operations is going on at a time. The synchronized function manages that for you.
The function takes a $lock argument, which should be initialized to an empty arrayref ([]). It is used like this:
use Amanda::MainLoop 'synchronized';
# ..
sub dump_data {
my $self = shift;
my ($arg1, $arg2, $dump_cb) = @_;
synchronized($self->{'lock'}, $dump_cb, sub {
my ($dump_cb) = @_; # IMPORTANT! See below
$self->do_dump_data($arg1, $arg2, $dump_cb);
});
}
Here, do_dump_data may take a long time to complete (perhaps it starts a long-running data transfer), but only one such operation is allowed at any time, while other Amanda::MainLoop callbacks (e.g., a timeout) may still occur. When the critical operation is complete, it calls $dump_cb, which will release the lock before transferring control to the caller.
Note that the $dump_cb in the inner sub shadows that in dump_data. This is intentional: a call to the inner $dump_cb is how synchronized knows that the operation has completed.
Several methods may be synchronized with one another by simply sharing the same lock.
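The locking behaviour can be modelled with nothing more than the arrayref itself. The sketch below is an assumption about how such a helper could work, not a copy of Amanda's synchronized: synchronized_sketch is a hypothetical name, and the "operations" merely park their completion callback in @pending so that the example can finish them by hand.

```perl
use strict;
use warnings;

# Hypothetical model of a callback-based lock; $lock is an arrayref that
# holds the running operation first, followed by any waiting operations.
sub synchronized_sketch {
    my ($lock, $orig_cb, $body) = @_;
    my $release_cb = sub {
        my @args = @_;
        shift @$lock;                     # drop the finished operation
        $lock->[0]->() if @$lock;         # start the next waiter, if any
        return $orig_cb->(@args);         # then hand control to the caller
    };
    push @$lock, sub { $body->($release_cb) };
    $lock->[0]->() if @$lock == 1;        # lock was free: run right away
    return;
}

my $lock = [];
my (@log, @pending);

for my $name (qw(first second)) {
    synchronized_sketch($lock, sub { push @log, "$name done" }, sub {
        my ($done_cb) = @_;               # shadows the caller's callback
        push @log, "$name started";
        push @pending, $done_cb;          # pretend the work finishes later
    });
}
my @after_submit = @log;                  # only the first operation has started

(shift @pending)->();                     # first completes, second may start
(shift @pending)->();                     # second completes
print join(", ", @log), "\n";
```

The second operation is submitted while the first is still in flight, but its body does not run until the first operation invokes the callback it was given.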
When writing asynchronous code, it's easy to write code that is *very* difficult to read or debug. The suggestions in this section will help write code that is more readable, and also ensure that all asynchronous code in Amanda uses similar, common idioms.
Most often, callbacks are short, and can be specified as anonymous subs. They should be specified with make_cb, like this:
some_async_function(make_cb(foo_cb => sub {
my ($x, $y) = @_;
# ...
}));
If a callback is more than about two lines, specify it in a named variable, rather than directly in the function call:
my $foo_cb = make_cb(foo_cb => sub {
my ($src) = @_;
# .
# . long function
# .
});
some_async_function($foo_cb);
When using callbacks from an object-oriented package, it is often useful to treat a method as a callback. This requires an anonymous sub "wrapper", which can be written on one line:
some_async_function(sub { $self->foo_cb(@_) });
The single most important factor in readability is linearity. If a function performs operations A, B, and C in that order, then the code for A, B, and C should appear in that order in the source file. This seems obvious, but it's all too easy to write
sub three_ops {
my $do_c = sub { .. };
my $do_b = sub { .. $do_c->() .. };
my $do_a = sub { .. $do_b->() .. };
$do_a->();
}
Which isn't very readable. Be readable.
Amanda's use of callbacks emulates continuation-passing style. As such, when a function finishes -- whether successfully or with an error -- it should call a single callback. This ensures that the function has a simple control interface: perform the operation and call the callback.
Some operations require a long sequence of asynchronous operations. For example, often the results of one operation are required to initiate another. The step syntax is useful to make this much more readable, and also eliminate some nasty reference-counting bugs. The idea is that each "step" in the process gets its own sub, and then each step calls the next step. The first step defined will be called automatically.
sub send_file {
my ($hostname, $port, $data, $sendfile_cb) = @_;
my ($addr, $socket); # shared lexical variables
my $steps = define_steps
cb_ref => \$sendfile_cb;
step lookup_addr => sub {
return async_gethostbyname(hostname => $hostname,
ghbn_cb => $steps->{'ghbn_cb'});
};
step ghbn_cb => sub {
my ($err, $hostinfo) = @_;
die $err if $err;
$addr = $hostinfo->{'ipaddr'};
return $steps->{'connect'}->();
};
step connect => sub {
return async_connect(
ipaddr => $addr,
port => $port,
connect_cb => $steps->{'connect_cb'},
);
};
step connect_cb => sub {
my ($err, $conn_sock) = @_;
die $err if $err;
$socket = $conn_sock;
return $steps->{'write_block'}->();
};
# ...
}
The define_steps function sets the stage. It is given a reference to the callback for this function (recall there is only one exit point!), and "patches" that reference to free $steps, which otherwise forms a reference loop, on exit.
WARNING: if the function or method needs to do any kind of setup before its first step, that setup should be done either in a setup step or before the define_steps invocation. Do not write any statements other than step declarations after the define_steps call.
Note that there are more steps in this example than are strictly necessary: the body of connect could be appended to ghbn_cb. The extra steps make the overall operation more readable by adding "punctuation" to separate the task of handling a callback (ghbn_cb) from starting the next operation (connect).
Also note that the enclosing scope contains some lexical (my) variables which are shared by several of the callbacks.
All of the steps are wrapped by make_cb, so each step will be executed on a separate iteration of the MainLoop. This generally has the effect of making asynchronous functions share CPU time more fairly. Sometimes, especially when using the low-level interface, a callback must be called immediately. To achieve this for all callbacks, add immediate => 1 to the define_steps invocation:
my $steps = define_steps
cb_ref => \$finished_cb,
immediate => 1;
To do the same for a single step, add the same keyword to the step invocation:
step immediate => 1,
connect => sub { .. };
In some cases, you want to execute some code when the steps finish. This can be done by defining a finalize sub in define_steps:
my $steps = define_steps
cb_ref => \$finished_cb,
finalize => sub { .. };
With slow operations, it is often useful to perform multiple operations simultaneously. As an example, the following code might run two system commands simultaneously and capture their output:
sub run_two_commands {
my ($finished_cb) = @_;
my $running_commands = 0;
my ($result1, $result2);
my $steps = define_steps
cb_ref => \$finished_cb;
step start => sub {
$running_commands++;
run_command($command1,
run_cb => $steps->{'command1_done'});
$running_commands++;
run_command($command2,
run_cb => $steps->{'command2_done'});
};
step command1_done => sub {
$result1 = $_[0];
$steps->{'maybe_done'}->();
};
step command2_done => sub {
$result2 = $_[0];
$steps->{'maybe_done'}->();
};
step maybe_done => sub {
return if --$running_commands; # not done yet
$finished_cb->($result1, $result2);
};
}
It is tempting to optimize out the $running_commands with something like:
step maybe_done => sub { ## BAD!
    return unless defined $result1 and defined $result2;
    $finished_cb->($result1, $result2);
};
However, this can lead to trouble. Remember that define_steps automatically applies make_cb to each step, so maybe_done is not invoked immediately by command1_done and command2_done. Instead, maybe_done is scheduled for invocation in the next iteration of the main loop (via call_later). If both commands finish before maybe_done is invoked, call_later will have queued it twice, and it will run twice with both $result1 and $result2 defined both times. The result is that $finished_cb is called twice, and mayhem ensues.
This is a complex case, but worth understanding if you want to be able to debug difficult MainLoop bugs.
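The difference between the two checks can be reproduced in a few lines without the step machinery. The sketch below is a simplified model under stated assumptions: defer and run_queue are hypothetical stand-ins for call_later and the main loop, and the two "commands" are simulated by assigning their results directly.

```perl
use strict;
use warnings;

# Toy stand-in for call_later: queue the call, run it on a later "iteration".
my @queue;
sub defer { my ($cb, @args) = @_; push @queue, [$cb, @args]; return }
sub run_queue {
    while (my $entry = shift @queue) {
        my ($cb, @args) = @$entry;
        $cb->(@args);
    }
    return;
}

# Returns how many times the "finished" callback fired.
sub finished_calls {
    my ($use_counter) = @_;
    my ($result1, $result2);
    my $running_commands = 2;
    my $finished = 0;

    my $maybe_done = $use_counter
        ? sub { return if --$running_commands; $finished++ }
        : sub { return unless defined $result1 and defined $result2; $finished++ };

    # Both commands complete before the loop gets around to maybe_done.
    $result1 = "output 1"; defer($maybe_done);
    $result2 = "output 2"; defer($maybe_done);
    run_queue();
    return $finished;
}

print "counter check: ", finished_calls(1), " call(s)\n";
print "defined check: ", finished_calls(0), " call(s)\n";
```

With the counter, the first deferred call sees one command still outstanding and returns, so the finished callback fires once. With the defined checks, both deferred calls see both results and the finished callback fires twice.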
When designing a library or interface that will accept and invoke callbacks, follow these guidelines so that users of the interface will not need to remember special rules.
Each callback signature within a package should always have the same name, ending with _cb. For example, a hypothetical Amanda::Estimate module might provide its estimates through a callback with four parameters. This callback should be referred to as estimate_cb throughout the package, and its parameters should be clearly defined in the package's documentation. It should take positional parameters only. If error conditions must also be communicated via the callback, then the first parameter should be an $error parameter, which is undefined when no error has occurred. The Changer API's res_cb is typical of such a callback signature.
A caller can only know that an operation is complete by the invocation of the callback, so it is important that a callback be invoked exactly once in all circumstances. Even in an error condition, the caller needs to know that the operation has failed. Also beware of bugs that might cause a callback to be invoked twice.
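While debugging, a double invocation can be made to fail loudly by wrapping the callback in a small guard. The helper below, exactly_once, is a hypothetical name and not part of Amanda::MainLoop; it is a plain-Perl sketch that detects a second call but cannot detect a callback that is never called at all.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a callback so a second invocation dies.
sub exactly_once {
    my ($name, $cb) = @_;
    my $calls = 0;
    return sub {
        die "$name invoked more than once\n" if $calls++;
        return $cb->(@_);
    };
}

my @results;
my $estimate_cb = exactly_once(estimate_cb => sub {
    my ($err, $size) = @_;
    push @results, defined $err ? "error: $err" : "size: $size";
});

$estimate_cb->(undef, 1024);                       # first call goes through
my $second_ok = eval { $estimate_cb->(undef, 2048); 1 };
my $error = $@;
print "second call rejected: $error" unless $second_ok;
```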
Functions or methods taking callbacks as arguments should either take only a callback (like call_later), or take hash-key parameters, where the callback's key is the signature name. For example, the Amanda::Estimate package might define a function like perform_estimate, invoked something like this:
my $estimate_cb = make_cb(estimate_cb => sub {
my ($err, $size, $level) = @_;
die $err if $err;
# ...
});
Amanda::Estimate::perform_estimate(
host => $host,
disk => $disk,
estimate_cb => $estimate_cb,
);
When invoking a user-supplied callback within the library, there is no need to wrap it in a call_later invocation, as the user already supplied that wrapper via make_cb, or is not interested in using such a wrapper.
Callbacks are a form of continuation (http://en.wikipedia.org/wiki/Continuations), and as such should only be called at the end of a function. Do not do anything after invoking a callback, as you cannot know what processing has gone on in the callback.
sub estimate_done {
# ...
$self->{'estimate_cb'}->(undef, $size, $level);
$self->{'estimate_in_progress'} = 0; # BUG!!
}
In this case, the estimate_cb invocation may have called perform_estimate again, setting estimate_in_progress back to 1. A technique to avoid this pitfall is to always return a callback's result, even though that result is not important. This makes the bug much more apparent:
sub estimate_done {
# ...
return $self->{'estimate_cb'}->(undef, $size, $level);
$self->{'estimate_in_progress'} = 0; # BUG (this just looks silly)
}
MainLoop events are generated by event sources. A source may produce multiple events over its lifetime. The higher-level methods in the previous section provide a more Perlish abstraction of event sources, but for efficiency it is sometimes necessary to use event sources directly.
The method $src->set_callback(\&cb) sets the function that will be called for a given source, and "attaches" the source to the main loop so that it will begin generating events. The arguments to the callback depend on the event source, but the first argument is always the source itself. Unless specified, no other arguments are provided.
Event sources persist until they are removed with $src->remove(), even if the source itself is no longer accessible from Perl. Although Glib supports it, there is no provision for "automatically" removing an event source. Also, calling $src->remove() more than once is a potentially-fatal error. As an example:
sub start_timer {
my ($loops) = @_;
Amanda::MainLoop::timeout_source(200)->set_callback(sub {
my ($src) = @_;
print "timer\n";
if (--$loops <= 0) {
$src->remove();
Amanda::MainLoop::quit();
}
});
}
start_timer(10);
Amanda::MainLoop::run();
There is no means in place to specify extra arguments to be provided to a source callback when it is set. If the callback needs access to other data, it should use a Perl closure in the form of lexically scoped variables and an anonymous sub. In fact, this is exactly what the higher-level functions (described above) do.
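The closure technique looks like this in isolation. ToySource below is a hypothetical stand-in written for this example (it is not an Amanda class); like the sources described here, it passes only itself to its callback, so everything else the callback needs comes from the enclosing lexical scope.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an event source: the callback receives the
# source itself and nothing else.
{
    package ToySource;
    sub new          { my ($class) = @_; return bless { cb => undef }, $class }
    sub set_callback { my ($self, $cb) = @_; $self->{cb} = $cb; return }
    sub fire         { my ($self) = @_; return $self->{cb}->($self) }
}

my @log;

sub watch {
    my ($src, $label, $limit) = @_;      # extra data lives in these lexicals
    my $count = 0;
    $src->set_callback(sub {
        my ($fired_src) = @_;            # the only argument the source supplies
        $count++;
        push @log, "$label: event $count of $limit";
        return;
    });
    return;
}

my $src = ToySource->new;
watch($src, "backup-timer", 2);
$src->fire for 1 .. 2;
print "$_\n" for @log;
```

Each call to watch creates its own $label, $limit and $count, so several sources can share the same callback body without sharing state.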
my $src = Amanda::MainLoop::timeout_source(10000);
A timeout source will create events at the specified interval, specified in milliseconds (thousandths of a second). The events will continue until the source is destroyed.
my $src = Amanda::MainLoop::idle_source(2);
An idle source will create events continuously except when a higher-priority source is emitting events. Priorities are generally small positive integers, with larger integers denoting lower priorities. The events will continue until the source is destroyed.
my $src = Amanda::MainLoop::child_watch_source($pid);
A child watch source will issue an event when the process with the given PID dies. To avoid race conditions, it will issue an event even if the process dies before the source is created. The callback is called with three arguments: the event source, the PID, and the child's exit status.
Note that this source is totally incompatible with anything that would cause perl to change the SIGCHLD handler. If the SIGCHLD handler is changed, the module will in some circumstances recognize this, add a warning to the debug log, and continue operating. However, it is impossible to catch all possible situations.
my $src = Amanda::MainLoop::fd_source($fd, $G_IO_IN);
This source will issue an event whenever one of the given conditions is true for the given file (a file handle or integer file descriptor). The conditions are from Glib's GIOCondition, and are $G_IO_IN, $G_IO_OUT, $G_IO_PRI, $G_IO_ERR, $G_IO_HUP, and $G_IO_NVAL. These constants are available with the import tag :GIOCondition.
Generally, when reading from a file descriptor, use $G_IO_IN|$G_IO_HUP|$G_IO_ERR to ensure that an EOF triggers an event as well. Writing to a file descriptor can simply use $G_IO_OUT|$G_IO_ERR.
The callback attached to an FdSource should read from or write to the underlying file descriptor before returning, or it will be called again in the next iteration of the main loop, which can lead to unexpected results. Do not use make_cb here!
Event sources are often set up in groups, e.g., a long-term operation and a timeout. When this is the case, be careful that all sources are removed when the operation is complete. The easiest way to accomplish this is to include all sources in a lexical scope and remove them at the appropriate times:
{
my $op_src = long_operation_src();
my $timeout_src = Amanda::MainLoop::timeout_source($timeout);
sub finish {
$op_src->remove();
$timeout_src->remove();
}
$op_src->set_callback(sub {
print "Operation complete\n";
finish();
});
$timeout_src->set_callback(sub {
print "Operation timed out\n";
finish();
});
}
Glib's main event loop is described in the Glib manual: http://library.gnome.org/devel/glib/stable/glib-The-Main-Event-Loop.html. Note that Amanda depends only on the functionality available in Glib-2.2.0, so many functions described in that document are not available in Amanda. This module provides a much-simplified interface to the glib library, and is not intended as a generic wrapper for it: Amanda's perl-accessible main loop only runs a single GMainContext, and always runs in the main thread; and (aside from idle sources), event priorities are not accessible from Perl.
This page was automatically generated Tue Mar 19 07:08:16 2019 from the Amanda source tree, and documents the most recent development version of Amanda. For documentation specific to the version of Amanda on your system, use the 'perldoc' command.