Amanda::MainLoop - Perl interface to the Glib MainLoop
    use Amanda::MainLoop;
    my $to = Amanda::MainLoop::timeout_source(2000);
    $to->set_callback(sub {
        print "Time's Up!\n";
        $to->remove();            # don't re-queue this timeout
        Amanda::MainLoop::quit(); # return from Amanda::MainLoop::run
    });
    Amanda::MainLoop::run();
Note that all functions in this module are individually available for export, e.g.,
use Amanda::MainLoop qw(run quit);
The main event loop of an application is a tight loop which waits for events, and calls functions to respond to those events. This design allows an IO-bound application to multitask within a single thread, by responding to IO events as they occur instead of blocking on particular IO operations.
The Amanda security API, transfer API, and other components rely on the event loop to allow them to respond to their own events in a timely fashion.
The overall structure of an application, then, is to initialize its state, register callbacks for some events, and begin looping. In each iteration, the loop waits for interesting events to occur (data available for reading or writing, timeouts, etc.), and then calls functions to handle those interesting things. Thus, the application spends most of its time waiting. When some application-defined state is reached, the loop is terminated and the application cleans up and exits.
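The structure described above can be sketched in plain Perl. This is only an illustrative model built on the core IO::Select module and a pipe, not how Glib or Amanda::MainLoop is implemented; the names %handlers, $running, and @received are assumptions made for the example.

```perl
use strict;
use warnings;
use IO::Select;

# Initialize state: a pipe with some data already waiting on it.
pipe(my $reader, my $writer) or die "pipe: $!";
syswrite($writer, "hello\n");
close $writer;

my %handlers;                    # fileno => callback
my $select   = IO::Select->new;
my $running  = 1;
my @received;

# Register a callback for "data available for reading".
$handlers{ fileno($reader) } = sub {
    my ($fh) = @_;
    my $n = sysread($fh, my $buf, 4096);
    if ($n) {
        push @received, $buf;
    } else {
        $select->remove($fh);    # EOF (or error): nothing more to wait for
        $running = 0;            # the application-defined "done" state
    }
};
$select->add($reader);

# Loop: wait for events, then call the function registered for each one.
while ($running) {
    for my $fh ($select->can_read(1)) {
        $handlers{ fileno($fh) }->($fh);
    }
}

print "received: ", join('', @received);
```

The program spends its time blocked in can_read, and only runs application code when an event arrives, which is the property the main loop relies on.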
The Glib main loop takes place within a call to Amanda::MainLoop::run(). This function executes until a call to Amanda::MainLoop::quit() occurs, at which point run() returns.
You can check whether the loop is running with Amanda::MainLoop::is_running().
The functions in this section are intended to make asynchronous programming as simple as possible. They are implemented on top of the interfaces described in the LOW-LEVEL INTERFACE section.
In most cases, a callback does not need to be invoked immediately. In fact, because Perl does not do tail-call optimization, a long chain of callbacks may cause the Perl stack to grow unnecessarily.
The solution is to queue the callback for execution on the next iteration of the main loop, and call_later($cb, @args) does exactly this.
    sub might_delay {
        my ($cb) = @_;
        if (can_do_it_now()) {
            my $result = do_it();
            Amanda::MainLoop::call_later($cb, $result)
        } else {
            # ..
        }
    }
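To see why deferring matters, the following sketch compares a chain of callbacks that call each other directly with the same chain routed through a queue. It is a plain-Perl model: call_later_model, run_model, stack_depth, and deepest_stack are hypothetical helpers written for this illustration and are not part of Amanda::MainLoop.

```perl
use strict;
use warnings;

my @pending;   # stand-in for the main loop's queue of deferred calls

sub call_later_model {
    my ($cb, @args) = @_;
    push @pending, [ $cb, @args ];
}

sub run_model {
    while (my $item = shift @pending) {
        my ($cb, @args) = @$item;
        $cb->(@args);
    }
}

# Number of Perl call frames currently on the stack.
sub stack_depth {
    my $depth = 0;
    $depth++ while caller($depth);
    return $depth;
}

# Run a chain of $links callbacks and report the deepest stack seen.
sub deepest_stack {
    my ($links, $deferred) = @_;
    my $max = 0;
    my $step;
    $step = sub {
        my ($n) = @_;
        my $d = stack_depth();
        $max = $d if $d > $max;
        return if $n <= 1;
        return $deferred ? call_later_model($step, $n - 1)
                         : $step->($n - 1);
    };
    $step->($links);
    run_model();
    undef $step;    # break the closure's reference to itself
    return $max;
}

my $direct   = deepest_stack(50, 0);
my $deferred = deepest_stack(50, 1);
print "direct: $direct frames, deferred: $deferred frames\n";
```

In the direct case every link adds a frame, so the depth grows with the length of the chain; in the deferred case each callback returns before the next one starts, so the depth stays constant.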
When starting the main loop, an application usually has a sub that should run after the loop has started. call_later works in this situation, too.
    my $main = sub {
        # ..
        Amanda::MainLoop::quit();
    };
    Amanda::MainLoop::call_later($main);
    # ..
    Amanda::MainLoop::run();
As an optimization, make_cb wraps a sub with a call to call_later while also naming the sub (using Sub::Name, if available):
    my $fetched_cb = make_cb(fetched_cb => sub {
        # .. callback body
    });
In general, make_cb should be used whenever a callback is passed to some other library. For example, the Changer API (see Amanda::Changer) might be invoked like this:
    my $reset_finished_cb = make_cb(reset_finished_cb => sub {
        my ($err) = @_;
        die "while resetting: $err" if $err;
        # ..
    });
Be careful not to use make_cb in cases where some action must take place before the next iteration of the main loop. In practice, this means make_cb should be avoided with file-descriptor callbacks, which will trigger repeatedly until the descriptors' needs are addressed.
make_cb is exported automatically.
Sometimes you need the MainLoop equivalent of sleep(). That comes in the form of call_after($delay, $cb, @args), which takes a delay (in milliseconds), a sub, and an arbitrary number of arguments. The sub is called with the arguments after the delay has elapsed.
    sub countdown {
        my $counter;
        $counter = sub {
            my ($i) = @_;
            print "$i..\n";
            if ($i) {
                Amanda::MainLoop::call_after(1000, $counter, $i-1);
            }
        };
        $counter->(10);
    }
The function returns the underlying event source (see below), enabling the caller to cancel the pending call:
    my $tosrc = Amanda::MainLoop::call_after(15000, $timeout_cb);
    # ...data arrives before timeout...
    $tosrc->remove();
To monitor a child process for termination, give its PID to call_on_child_termination($pid, $cb, @args). When the child exits for any reason, this will collect its exit status (via waitpid) and call $cb as
    $cb->($exitstatus, @args);
Like call_after, this function returns the event source to allow early cancellation if desired.
    async_read(
        fd => $fd,
        size => $size,        # optional, default 0
        async_read_cb => $async_read_cb,
        args => [ .. ]);      # optional
This function will read $size bytes when they are available from file descriptor $fd, and invoke the callback with the results:
    $async_read_cb->($err, $buf, @args);
If $size is zero, then the callback will get whatever data is available as soon as it is available, up to an arbitrary buffer size. If $size is nonzero, then a short read may still occur if $size bytes do not become available simultaneously. On EOF, $buf will be the empty string. It is the caller's responsibility to set $fd to non-blocking mode. Note that not all operating systems generate errors that might be reported here. For example, on Solaris an invalid file descriptor will be silently ignored.
The return value is an event source, and calling its remove method will cancel the read. Having more than one async_read operation on a single file descriptor at any time is an error and will lead to unpredictable results.
This function adds a new FdSource every time it is invoked, so it is not well-suited to processing large amounts of data. For that purpose, consider using the low-level interface or, better, the transfer architecture (see Amanda::Xfer).
    async_write(
        fd => $fd,
        data => $data,
        async_write_cb => $async_write_cb,
        args => [ .. ]);      # optional
This function will write $data to file descriptor $fd and invoke the callback with the number of bytes written:
    $async_write_cb->($err, $bytes_written, @args);
If $bytes_written is less than the length of $data, then an error occurred, and is given in $err. As for async_read, the caller should set $fd to non-blocking mode. Multiple parallel invocations of this function for the same file descriptor are allowed and will be serialized in the order the calls were made:
    async_write(
        fd => $fd,
        data => "HELLO!\n",
        async_write_cb => make_cb(wrote_hello => sub {
            print "wrote 'HELLO!'\n";
        }));
    async_write(
        fd => $fd,
        data => "GOODBYE!\n",
        async_write_cb => make_cb(wrote_goodbye => sub {
            print "wrote 'GOODBYE!'\n";
        }));
In this case, the two strings are guaranteed to be written in the same order, and the callbacks will be called in the correct order.
Like async_read, this function may add a new FdSource every time it is invoked, so it is not well-suited to processing large amounts of data.
Java has the notion of a "synchronized" method, which can only execute in one thread at any time. This is a particular application of a lock, in which the lock is acquired when the method begins, and released when it finishes.
With Amanda::MainLoop, this functionality is generally not needed because there is no unexpected preemption. However, if you break up a long-running operation (that doesn't allow concurrency) into several callbacks, you'll need to ensure that at most one of those operations is going on at a time. The synchronized function manages that for you.
The function takes a $lock argument, which should be initialized to an empty arrayref ([]). It is used like this:
    use Amanda::MainLoop 'synchronized';
    # ..
    sub dump_data {
        my $self = shift;
        my ($arg1, $arg2, $dump_cb) = @_;

        synchronized($self->{'lock'}, $dump_cb, sub {
            my ($dump_cb) = @_; # IMPORTANT! See below
            $self->do_dump_data($arg1, $arg2, $dump_cb);
        });
    }
Here, do_dump_data may take a long time to complete (perhaps it starts a long-running data transfer), but only one such operation is allowed at any time, while other Amanda::MainLoop callbacks (e.g., a timeout) may still occur. When the critical operation is complete, it calls $dump_cb, which will release the lock before transferring control to the caller.
Note that the $dump_cb in the inner sub shadows that in dump_data -- this is intentional: a call to the inner $dump_cb is how synchronized knows that the operation has completed.
Several methods may be synchronized with one another by simply sharing the same lock.
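The behaviour described above can be modelled in a few lines of plain Perl. This is a simplified sketch and not Amanda's implementation: synchronized_model is a hypothetical name, the lock is treated as a queue of waiting operations, and the next operation is started directly rather than through the main loop.

```perl
use strict;
use warnings;

# $lock is an arrayref of queued operations; element 0 is the one running.
sub synchronized_model {
    my ($lock, $orig_cb, $sub) = @_;

    # This is what the operation sees as "its" callback.
    my $continuation_cb = sub {
        my @args = @_;
        shift @$lock;                    # release: drop the finished operation
        $lock->[0]->() if @$lock;        # start the next waiter, if any
        return $orig_cb->(@args);        # then hand control to the caller
    };
    my $start = sub { $sub->($continuation_cb) };

    push @$lock, $start;
    $start->() if @$lock == 1;           # lock was free: start immediately
}

my $lock = [];
my @log;
my @finishers;    # completion callbacks of operations still "in progress"

for my $name (qw(first second)) {
    synchronized_model($lock, sub { push @log, "$name done" }, sub {
        my ($done_cb) = @_;
        push @log, "$name started";
        push @finishers, $done_cb;       # pretend the work finishes later
    });
}

my @before = @log;           # only one operation may have started so far
(shift @finishers)->();      # the first operation completes
(shift @finishers)->();      # the second operation completes
print join(", ", @log), "\n";
```

The second operation does not start until the first one invokes the callback it was handed, which is why the inner sub must call that callback rather than the caller's original one.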
When writing asynchronous code, it's easy to write code that is *very* difficult to read or debug. The suggestions in this section will help write code that is more readable, and also ensure that all asynchronous code in Amanda uses similar, common idioms.
Most often, callbacks are short, and can be specified as anonymous subs. They should be specified with make_cb, like this:
    some_async_function(make_cb(foo_cb => sub {
        my ($x, $y) = @_;
        # ...
    }));
If a callback is more than about two lines, specify it in a named variable, rather than directly in the function call:
    my $foo_cb = make_cb(foo_cb => sub {
        my ($src) = @_;
        # .
        # . long function
        # .
    });
    some_async_function($foo_cb);
When using callbacks from an object-oriented package, it is often useful to treat a method as a callback. This requires an anonymous sub "wrapper", which can be written on one line:
some_async_function(sub { $self->foo_cb(@_) });
The single most important factor in readability is linearity. If a function performs operations A, B, and C in that order, then the code for A, B, and C should appear in that order in the source file. This seems obvious, but it's all too easy to write
    sub three_ops {
        my $do_c = sub { .. };
        my $do_b = sub { .. $do_c->() .. };
        my $do_a = sub { .. $do_b->() .. };
        $do_a->();
    }
Which isn't very readable. Be readable.
Amanda's use of callbacks emulates continuation-passing style. As such, when a function finishes -- whether successfully or with an error -- it should call a single callback. This ensures that the function has a simple control interface: perform the operation and call the callback.
Some operations require a long sequence of asynchronous operations. For example, often the results of one operation are required to initiate another. The step syntax is useful to make this much more readable, and also eliminate some nasty reference-counting bugs. The idea is that each "step" in the process gets its own sub, and then each step calls the next step. The first step defined will be called automatically.
    sub send_file {
        my ($hostname, $port, $data, $sendfile_cb) = @_;
        my ($addr, $socket); # shared lexical variables
        my $steps = define_steps
                cb_ref => \$sendfile_cb;
        step lookup_addr => sub {
            return async_gethostbyname(hostname => $hostname,
                                    ghbn_cb => $steps->{'ghbn_cb'});
        };
        step ghbn_cb => sub {
            my ($err, $hostinfo) = @_;
            die $err if $err;
            $addr = $hostinfo->{'ipaddr'};
            return $steps->{'connect'}->();
        };
        step connect => sub {
            return async_connect(
                    ipaddr => $addr,
                    port => $port,
                    connect_cb => $steps->{'connect_cb'},
            );
        };
        step connect_cb => sub {
            my ($err, $conn_sock) = @_;
            die $err if $err;
            $socket = $conn_sock;
            return $steps->{'write_block'}->();
        };
        # ...
    }
The define_steps function sets the stage. It is given a reference to the callback for this function (recall there is only one exit point!), and "patches" that reference to free $steps, which otherwise forms a reference loop, on exit.
WARNING: if the function or method needs to do any kind of setup before its first step, that setup should be done either in a setup step or before the define_steps invocation. Do not write any statements other than step declarations after the define_steps call.
Note that there are more steps in this example than are strictly necessary: the body of connect could be appended to ghbn_cb. The extra steps make the overall operation more readable by adding "punctuation" to separate the task of handling a callback (ghbn_cb) from starting the next operation (connect).
Also note that the enclosing scope contains some lexical (my) variables which are shared by several of the callbacks.
All of the steps are wrapped by make_cb, so each step will be executed on a separate iteration of the MainLoop. This generally has the effect of making asynchronous functions share CPU time more fairly. Sometimes, especially when using the low-level interface, a callback must be called immediately. To achieve this for all callbacks, add immediate => 1 to the define_steps invocation:
    my $steps = define_steps cb_ref => \$finished_cb, immediate => 1;
To do the same for a single step, add the same keyword to the step invocation:
    step immediate => 1, connect => sub { .. };
In some cases, you want to execute some code when the steps finish. This can be done by passing a finalize sub to define_steps:
my $steps = define_steps cb_ref => \$finished_cb, finalize => sub { .. };
With slow operations, it is often useful to perform multiple operations simultaneously. As an example, the following code might run two system commands simultaneously and capture their output:
    sub run_two_commands {
        my ($finished_cb) = @_;
        my $running_commands = 0;
        my ($result1, $result2);
        my $steps = define_steps
            cb_ref => \$finished_cb;
        step start => sub {
            $running_commands++;
            run_command($command1,
                run_cb => $steps->{'command1_done'});
            $running_commands++;
            run_command($command2,
                run_cb => $steps->{'command2_done'});
        };
        step command1_done => sub {
            $result1 = $_[0];
            $steps->{'maybe_done'}->();
        };
        step command2_done => sub {
            $result2 = $_[0];
            $steps->{'maybe_done'}->();
        };
        step maybe_done => sub {
            return if --$running_commands; # not done yet
            $finished_cb->($result1, $result2);
        };
    }
It is tempting to optimize out the $running_commands with something like:
    step maybe_done => sub { ## BAD!
        return unless defined $result1 and defined $result2;
        $finished_cb->($result1, $result2);
    };
However, this can lead to trouble. Remember that define_steps automatically applies make_cb to each step, so maybe_done is not invoked immediately by command1_done and command2_done; instead, maybe_done is scheduled for invocation in the next iteration of the main loop (via call_later). If both commands finish before maybe_done is invoked, call_later will have been called twice, and maybe_done will run twice with both $result1 and $result2 defined both times. The result is that $finished_cb is called twice, and mayhem ensues.
This is a complex case, but worth understanding if you want to be able to debug difficult MainLoop bugs.
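The difference between the two versions of maybe_done can be reproduced without the main loop. The sketch below is a plain-Perl model: later and drain are hypothetical stand-ins for call_later and one pass of the loop, and run_pair simulates both commands finishing before the deferred maybe_done gets to run.

```perl
use strict;
use warnings;

my @queue;    # stand-in for the main loop's pending calls
sub later { my ($cb) = @_; push @queue, $cb }
sub drain { while (my $cb = shift @queue) { $cb->() } }

# Returns how many times the final callback fired.
sub run_pair {
    my ($use_counter) = @_;
    my ($result1, $result2);
    my $running  = 2;
    my $finished = 0;

    my $maybe_done = sub {
        if ($use_counter) {
            return if --$running;    # not done yet
        } else {
            # the tempting version
            return unless defined $result1 and defined $result2;
        }
        $finished++;
    };

    # Both commands complete before the loop runs maybe_done.
    $result1 = "out1"; later($maybe_done);
    $result2 = "out2"; later($maybe_done);
    drain();
    return $finished;
}

my $with_counter = run_pair(1);
my $with_defined = run_pair(0);
print "counter: $with_counter call(s); defined-check: $with_defined call(s)\n";
```

The counter is decremented once per completion, so it reaches zero exactly once; the defined-check only looks at state that is already complete by the time either deferred call runs.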
When designing a library or interface that will accept and invoke callbacks, follow these guidelines so that users of the interface will not need to remember special rules.
Each callback signature within a package should always have the same name, ending with _cb. For example, a hypothetical Amanda::Estimate module might provide its estimates through a callback with four parameters. This callback should be referred to as estimate_cb throughout the package, and its parameters should be clearly defined in the package's documentation. It should take positional parameters only. If error conditions must also be communicated via the callback, then the first parameter should be an $error parameter, which is undefined when no error has occurred.
The Changer API's res_cb is typical of such a callback signature.
A caller can only know that an operation is complete by the invocation of the callback, so it is important that a callback be invoked exactly once in all circumstances. Even in an error condition, the caller needs to know that the operation has failed. Also beware of bugs that might cause a callback to be invoked twice.
Functions or methods taking callbacks as arguments should either take only a callback (like call_later), or take hash-key parameters, where the callback's key is the signature name. For example, the Amanda::Estimate package might define a function like perform_estimate, invoked something like this:
    my $estimate_cb = make_cb(estimate_cb => sub {
        my ($err, $size, $level) = @_;
        die $err if $err;
        # ...
    });
    Amanda::Estimate::perform_estimate(
        host => $host,
        disk => $disk,
        estimate_cb => $estimate_cb,
    );
When invoking a user-supplied callback within the library, there is no need to wrap it in a call_later invocation, as the user already supplied that wrapper via make_cb, or is not interested in using such a wrapper.
Callbacks are a form of continuation (http://en.wikipedia.org/wiki/Continuations), and as such should only be called at the end of a function. Do not do anything after invoking a callback, as you cannot know what processing has gone on in the callback.
    sub estimate_done {
        # ...
        $self->{'estimate_cb'}->(undef, $size, $level);
        $self->{'estimate_in_progress'} = 0; # BUG!!
    }
In this case, the estimate_cb invocation may have called perform_estimate again, setting estimate_in_progress back to 1. A technique to avoid this pitfall is to always return a callback's result, even though that result is not important. This makes the bug much more apparent:
    sub estimate_done {
        # ...
        return $self->{'estimate_cb'}->(undef, $size, $level);
        $self->{'estimate_in_progress'} = 0; # BUG (this just looks silly)
    }
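The pitfall is easy to reproduce in isolation. The following plain-Perl sketch uses a hypothetical estimator hash with in_progress and cb fields (names chosen for this illustration only) and a callback that immediately starts a new estimate.

```perl
use strict;
use warnings;

sub perform_estimate {
    my ($self, $cb) = @_;
    $self->{in_progress} = 1;
    $self->{cb} = $cb;
}

sub estimate_done_buggy {
    my ($self) = @_;
    $self->{cb}->(undef, 100);
    $self->{in_progress} = 0;      # runs after the callback: clobbers a restart
}

sub estimate_done_fixed {
    my ($self) = @_;
    my $cb = $self->{cb};
    $self->{in_progress} = 0;      # update state first
    return $cb->(undef, 100);      # invoking the callback is the last action
}

# Start an estimate whose callback starts another one, finish the first
# with $done, and report the in_progress flag afterwards.
sub flag_after_restart {
    my ($done) = @_;
    my $self = { in_progress => 0, cb => undef };
    perform_estimate($self, sub { perform_estimate($self, sub { }) });
    $done->($self);
    return $self->{in_progress};
}

my $buggy = flag_after_restart(\&estimate_done_buggy);
my $fixed = flag_after_restart(\&estimate_done_fixed);
print "buggy: in_progress=$buggy, fixed: in_progress=$fixed\n";
```

In both runs a second estimate is in progress when the function returns, but only the fixed version's flag says so.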
MainLoop events are generated by event sources. A source may produce multiple events over its lifetime. The higher-level methods in the previous section provide a more Perlish abstraction of event sources, but for efficiency it is sometimes necessary to use event sources directly.
The method $src->set_callback(\&cb) sets the function that will be called for a given source, and "attaches" the source to the main loop so that it will begin generating events. The arguments to the callback depend on the event source, but the first argument is always the source itself. Unless specified, no other arguments are provided.
Event sources persist until they are removed with $src->remove(), even if the source itself is no longer accessible from Perl. Although Glib supports it, there is no provision for "automatically" removing an event source. Also, calling $src->remove() more than once is a potentially-fatal error. As an example:
    sub start_timer {
        my ($loops) = @_;
        Amanda::MainLoop::timeout_source(200)->set_callback(sub {
            my ($src) = @_;
            print "timer\n";
            if (--$loops <= 0) {
                $src->remove();
                Amanda::MainLoop::quit();
            }
        });
    }
    start_timer(10);
    Amanda::MainLoop::run();
There is no means in place to specify extra arguments to be provided to a source callback when it is set. If the callback needs access to other data, it should use a Perl closure in the form of lexically scoped variables and an anonymous sub. In fact, this is exactly what the higher-level functions (described above) do.
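As a small illustration of the closure technique, the sketch below binds extra values to a callback. bind_args is a hypothetical helper written for this example, not a function provided by Amanda::MainLoop, and a plain string stands in for the source object that a real source callback would receive as its first argument.

```perl
use strict;
use warnings;

# Wrap $cb so that @extra is appended to whatever the caller passes in.
sub bind_args {
    my ($cb, @extra) = @_;
    return sub { return $cb->(@_, @extra) };
}

my @seen;
my $on_event = bind_args(sub { @seen = @_ }, "job-42", 3);

# Simulate the event source invoking its callback.
$on_event->("fake-source");
print join(", ", @seen), "\n";
```

The wrapper captures @extra lexically, so each call to bind_args produces an independent callback with its own data.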
my $src = Amanda::MainLoop::timeout_source(10000);
A timeout source will create events at the specified interval, specified in milliseconds (thousandths of a second). The events will continue until the source is destroyed.
my $src = Amanda::MainLoop::idle_source(2);
An idle source will create events continuously except when a higher-priority source is emitting events. Priorities are generally small positive integers, with larger integers denoting lower priorities. The events will continue until the source is destroyed.
my $src = Amanda::MainLoop::child_watch_source($pid);
A child watch source will issue an event when the process with the given PID dies. To avoid race conditions, it will issue an event even if the process dies before the source is created. The callback is called with three arguments: the event source, the PID, and the child's exit status.
Note that this source is totally incompatible with anything that would cause Perl to change the SIGCHLD handler. If the SIGCHLD handler is changed, under some circumstances the module will recognize this, add a warning to the debug log, and continue operating. However, it is impossible to catch all possible situations.
my $src = Amanda::MainLoop::fd_source($fd, $G_IO_IN);
This source will issue an event whenever one of the given conditions is true for the given file (a file handle or integer file descriptor). The conditions are from Glib's GIOCondition, and are $G_IO_IN, $G_IO_OUT, $G_IO_PRI, $G_IO_ERR, $G_IO_HUP, and $G_IO_NVAL. These constants are available with the import tag :GIOCondition.
Generally, when reading from a file descriptor, use $G_IO_IN|$G_IO_HUP|$G_IO_ERR to ensure that an EOF triggers an event as well. Writing to a file descriptor can simply use $G_IO_OUT|$G_IO_ERR.
The callback attached to an FdSource should read from or write to the underlying file descriptor before returning, or it will be called again in the next iteration of the main loop, which can lead to unexpected results. Do not use make_cb here!
Event sources are often set up in groups, e.g., a long-term operation and a timeout. When this is the case, be careful that all sources are removed when the operation is complete. The easiest way to accomplish this is to include all sources in a lexical scope and remove them at the appropriate times:
    {
        my $op_src = long_operation_src();
        my $timeout_src = Amanda::MainLoop::timeout_source($timeout);

        sub finish {
            $op_src->remove();
            $timeout_src->remove();
        }

        $op_src->set_callback(sub {
            print "Operation complete\n";
            finish();
        });

        $timeout_src->set_callback(sub {
            print "Operation timed out\n";
            finish();
        });
    }
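Because removing a source more than once is a potentially-fatal error, it may be worth guarding the shared cleanup routine in case both callbacks end up firing. The sketch below shows one way to do that; MockSource is a hypothetical stand-in that only counts remove() calls, and the guard is a suggestion for illustration, not something Amanda::MainLoop provides.

```perl
use strict;
use warnings;

{
    # Stand-in for an event source; it only counts remove() calls.
    package MockSource;
    sub new    { my ($class) = @_; return bless { removed => 0 }, $class }
    sub remove { my ($self) = @_; $self->{removed}++; return }
}

my @sources  = (MockSource->new, MockSource->new);
my $finished = 0;

my $finish = sub {
    return if $finished++;        # second and later calls do nothing
    $_->remove() for @sources;
};

$finish->();    # e.g. the operation completed
$finish->();    # e.g. the timeout fired as well

print "remove() calls: ", join(", ", map { $_->{removed} } @sources), "\n";
```

With the guard in place, each source is removed exactly once no matter how many of the grouped callbacks run.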
Glib's main event loop is described in the Glib manual: http://library.gnome.org/devel/glib/stable/glib-The-Main-Event-Loop.html. Note that Amanda depends only on the functionality available in Glib-2.2.0, so many functions described in that document are not available in Amanda. This module provides a much-simplified interface to the Glib library, and is not intended as a generic wrapper for it: Amanda's Perl-accessible main loop only runs a single GMainContext, and always runs in the main thread; and (aside from idle sources), event priorities are not accessible from Perl.
This page was automatically generated Tue Feb 21 19:14:01 2012 from the Amanda source tree, and documents the most recent development version of Amanda. For documentation specific to the version of Amanda on your system, use the 'perldoc' command.