Amanda::MainLoop

NAME

Amanda::MainLoop - Perl interface to the Glib MainLoop

SYNOPSIS

    use Amanda::MainLoop;

    my $to = Amanda::MainLoop::timeout_source(2000);
    $to->set_callback(sub {
        print "Time's Up!\n";
        $to->remove();              # dont' re-queue this timeout
        Amanda::MainLoop::quit();   # return from Amanda::MainLoop::run
    });

    Amanda::MainLoop::run();

Note that all functions in this module are individually available for export, e.g.,

    use Amanda::MainLoop qw(run quit);

OVERVIEW

The main event loop of an application is a tight loop which waits for events, and calls functions to respond to those events. This design allows an IO-bound application to multitask within a single thread, by responding to IO events as they occur instead of blocking on particular IO operations.

The Amanda security API, transfer API, and other components rely on the event loop to allow them to respond to their own events in a timely fashion.

The overall structure of an application, then, is to initialize its state, register callbacks for some events, and begin looping. In each iteration, the loop waits for interesting events to occur (data available for reading or writing, timeouts, etc.), and then calls functions to handle those interesting things. Thus, the application spends most of its time waiting. When some application-defined state is reached, the loop is terminated and the application cleans up and exits.

The Glib main loop takes place within a call to Amanda::MainLoop::run(). This function executes until a call to Amanda::MainLoop::quit() occurs, at which point run() returns. You can check whether the loop is running with Amanda::MainLoop::is_running().

HIGH-LEVEL INTERFACE

The functions in this section are intended to make asynchronous programming as simple as possible. They are implemented on top of the interfaces described in the LOW-LEVEL INTERFACE section.

call_later

In most cases, a callback does not need to be invoked immediately. In fact, because Perl does not do tail-call optimization, a long chain of callbacks may cause the perl stack to grow unnecessarily.

The solution is to queue the callback for execution on the next iteration of the main loop, and call_later($cb, @args) does exactly this.

    sub might_delay {
        my ($cb) = @_;
        if (can_do_it_now()) {
            my $result = do_it();
            Amanda::MainLoop::call_later($cb, $result)
        } else {
            # ..
        }
    }

When starting the main loop, an application usually has a sub that should run after the loop has started. call_later works in this situation, too.

    my $main = sub {
        # ..
        Amanda::MainLoop::quit();
    };
    Amanda::MainLoop::call_later($main);
    # ..
    Amanda::MainLoop::run();

make_cb

As an optimization, make_cb wraps a sub with a call to call_later while also naming the sub (using Sub::Name, if available):

    my $fetched_cb = make_cb(fetched_cb => sub {
        # .. callback body
    }

In general, make_cb should be used whenever a callback is passed to some other library. For example, the Changer API (see Amanda::Changer) might be invoked like this:

    my $reset_finished_cb = make_cb(reset_finished_cb => sub {
        my ($err) = @_;
        die "while resetting: $err" if $err;
        # ..
    });

Be careful not to use make_cb in cases where some action must take place before the next iteration of the main loop. In practice, this means make_cb should be avoided with file-descriptor callbacks, which will trigger repeatedly until the descriptors' needs are addressed.

make_cb is exported automatically.

call_after

Sometimes you need the MainLoop equivalent of sleep(). That comes in the form of call_later($delay, $cb, @args), which takes a delay (in milliseconds), a sub, and an arbitrary number of arguments. The sub is called with the arguments after the delay has elapsed.

    sub countdown {
        my $counter;
        $counter = sub {
            print "$i..\n";
            if ($i) {
                Amanda::MainLoop::call_after(1000, $counter, $i-1);
            }
        }
        $counter->(10);
    }

The function returns the underlying event source (see below), enabling the caller to cancel the pending call:

    my $tosrc = Amanda::MainLoop::call_after(15000, $timeout_cb):
    # ...data arrives before timeout...
    $tosrc->remove();

call_on_child_termination

To monitor a child process for termination, give its pid to call_on_child_termination($pid, $cb, @args). When the child exits for any reason, this will collect its exit status (via waitpid) and call $cb as

    $cb->($exitstatus, @args);

Like call_after, this function returns the event source to allow early cancellation if desired.

async_read

    async_read(
        fd => $fd,
        size => $size,        # optional, default 0
        async_read_cb => $async_read_cb,
        args => [ .. ]);      # optional

This function will read $size bytes when they are available from file descriptor $fd, and invoke the callback with the results:

    $async_read_cb->($err, $buf, @args);

If $size is zero, then the callback will get whatever data is available as soon as it is available, up to an arbitrary buffer size. If $size is nonzero, then a short read may still occur if $size bytes do not become available simultaneously. On EOF, $buf will be the empty string. It is the caller's responsibility to set $fd to non-blocking mode. Note that not all operating sytems generate errors that might be reported here. For example, on Solaris an invalid file descriptor will be silently ignored.

The return value is an event source, and calling its remove method will cancel the read. It is an error to have more than one async_read operation on a single file descriptor at any time, and will lead to unpredictable results.

This function adds a new FdSource every time it is invoked, so it is not well-suited to processing large amounts of data. For that purpose, consider using the low-level interface or, better, the transfer architecture (see Amanda::Xfer).

async_write

    async_write(
        fd => $fd,
        data => $data,
        async_write_cb => $async_write_cb,
        args => [ .. ]);      # optional

This function will write $data to file descriptor $fd and invoke the callback with the number of bytes written:

    $cb->($err, $bytes_written, @args);

If $bytes_written is less than then length of <$data>, then an error occurred, and is given in $err. As for async_read, the caller should set $fd to non-blocking mode. Multiple parallel invocations of this function for the same file descriptor are allowed and will be serialized in the order the calls were made:

    async_write($fd, "HELLO!\n",
        async_write_cb => make_cb(wrote_hello => sub {
            print "wrote 'HELLO!'\n";
        }));
    async_write($fd, "GOODBYE!\n",
        async_write_cb => make_cb(wrote_goodbye => sub {
            print "wrote 'GOODBYE!'\n";
        }));

In this case, the two strings are guaranteed to be written in the same order, and the callbacks will be called in the correct order.

Like async_read, this function may add a new FdSource every time it is invoked, so it is not well-suited to processing large amounts of data.

synchronized

Java has the notion of a "synchronized" method, which can only execute in one thread at any time. This is a particular application of a lock, in which the lock is acquired when the method begins, and released when it finishes.

With Amanda::MainLoop, this functionality is generally not needed because there is no unexpected preemeption. However, if you break up a long-running operation (that doesn't allow concurrency) into several callbacks, you'll need to ensure that at most one of those operations is going on at a time. The synchronized function manages that for you.

The function takes a $lock argument, which should be initialized to an empty arrayref ([]). It is used like this:

    use Amanda::MainLoop 'synchronized';
    # ..
    sub dump_data {
        my $self = shift;
        my ($arg1, $arg2, $dump_cb) = @_;

        synchronized($self->{'lock'}, $dump_cb, sub {
            my ($dump_cb) = @_; # IMPORTANT! See below
            $self->do_dump_data($arg1, $arg2, $dump_cb);
        };
    }

Here, do_dump_data may take a long time to complete (perhaps it starts a long-running data transfer) but only one such operation is allowed at any time and other Amanda::MainLoop callbacks may occur (e.g. a timeout). When the critical operation is complete, it calls $dump_cb which will release the lock before transferring control to the caller.

Note that the $dump_cb in the inner sub shadows that in dump_data -- this is intentional, the a call to the the inner $dump_cb is how synchronized knows that the operation has completed.

Several methods may be synchronized with one another by simply sharing the same lock.

ASYNCHRONOUS STYLE

When writing asynchronous code, it's easy to write code that is *very* difficult to read or debug. The suggestions in this section will help write code that is more readable, and also ensure that all asynchronous code in Amanda uses similar, common idioms.

USING CALLBACKS

Most often, callbacks are short, and can be specified as anonymous subs. They should be specified with make_cb, like this:

    some_async_function(make_cb(foo_cb => sub {
        my ($x, $y) = @_;
        # ...
    }));

If a callback is more than about two lines, specify it in a named variable, rather than directly in the function call:

    my $foo_cb = make_cb(foo_cb => sub {
        my ($src) = @_;
        # .
        # .  long function
        # .
    });
    some_async_function($foo_cb);

When using callbacks from an object-oriented package, it is often useful to treat a method as a callback. This requires an anonymous sub "wrapper", which can be written on one line:

    some_async_function(sub { $self->foo_cb(@_) });

LINEARITY

The single most important factor in readability is linearity. If a function that performs operations A, B, and C in that order, then the code for A, B, and C should appear in that order in the source file. This seems obvious, but it's all too easy to write

    sub three_ops {
        my $do_c = sub { .. };
        my $do_b = sub { .. $do_c->() .. };
        my $do_a = sub { .. $do_b->() .. };
        $do_a->();
    }

Which isn't very readable. Be readable.

SINGLE ENTRY AND EXIT

Amanda's use of callbacks emulates continuation-passing style. As such, when a function finishes -- whether successfully or with an error -- it should call a single callback. This ensures that the function has a simple control interface: perform the operation and call the callback.

MULTIPLE STEPS

Some operations require a long squence of asynchronous operations. For example, often the results of one operation are required to initiate another. The step syntax is useful to make this much more readable, and also eliminate some nasty reference-counting bugs. The idea is that each "step" in the process gets its own sub, and then each step calls the next step. The first step defined will be called automatically.

    sub send_file {
        my ($hostname, $port, $data, $sendfile_cb) = @_;
        my ($addr, $socket); # shared lexical variables
        my $steps = define_steps
                cb_ref => \$sendfile_cb;
        step lookup_addr => sub {
            return async_gethostbyname(hostname => $hostname,
                                ghbn_cb => $steps->{'got_addr'});
        };
        step ghbn_cb => sub {
            my ($err, $hostinfo) = @_;
            die $err if $err;
            $addr = $hostinfo->{'ipaddr'};
            return $steps->{'connect'}->();
        };
        step connect => sub {
            return async_connect(
                ipaddr => $addr,
                port => $port,
                connect_cb => $steps->{'connect_cb'},
            );
        };
        step connect_cb => sub {
            my ($err, $conn_sock) = @_;
            die $err if $err;
            $socket = $conn_sock;
            return $steps->{'write_block'}->();
        };
        # ...
    }

The define_steps function sets the stage. It is given a reference to the callback for this function (recall there is only one exit point!), and "patches" that reference to free $steps, which otherwise forms a reference loop, on exit.

WARNING: if the function or method needs to do any kind of setup before its first step, that setup should be done either in a setup step or before the define_steps invocation. Do not write any statements other than step declarations after the define_steps call.

Note that there are more steps in this example than are strictly necessary: the body of connect could be appended to ghbn_cb. The extra steps make the overall operation more readable by adding "punctuation" to separate the task of handling a callback (ghbn_cb) from starting the next operation (connect).

Also note that the enclosing scope contains some lexical (my) variables which are shared by several of the callbacks.

All of the steps are wrapped by make_cb, so each step will be executed on a separate iteration of the MainLoop. This generally has the effect of making asynchronous functions share CPU time more fairly. Sometimes, especially when using the low-level interface, a callback must be called immediately. To achieve this for all callbacks, add immediate => 1 to the define_steps invocation:

    my $steps = define_steps
            cb_ref => \$finished_cb,
            immediate => 1;

To do the same for a single step, add the same keyword to the step invocation:

    step immediate => 1,
         connect => sub { .. };

In some case, you want to execute some code when the step finish, it can be done by defining a finalize code in define_steps:

    my $steps = define_steps
            cb_ref => \$finished_cb,
            finalize => sub { .. };

JOINING ASYNCHRONOUS "THREADS"

With slow operations, it is often useful to perform multiple operations simultaneously. As an example, the following code might run two system commands simultaneously and capture their output:

    sub run_two_commands {
        my ($finished_cb) = @_;
        my $running_commands = 0;
        my ($result1, $result2);
        my $steps = define_steps
            cb_ref => \$finished_cb;
        step start => sub {
            $running_commands++;
            run_command($command1,
                run_cb => $steps->{'command1_done'});
            $running_commands++;
            run_command($command2,
                run_cb => $steps->{'command2_done'});
        };
        step command1_done => sub {
            $result1 = $_[0];
            $steps->{'maybe_done'}->();
        };
        step command2_done => sub {
            $result2 = $_[0];
            $steps->{'maybe_done'}->();
        };
        step maybe_done => sub {
            return if --$running_commands; # not done yet
            $finished_cb->($result1, $result2);
        };
    }

It is tempting to optimize out the $running_commands with something like:

    step maybe_done { ## BAD!
        return unless defined $result1 and defined $result2;
        $finished_cb->($result1, $result2);
    }

However this can lead to trouble. Remember that define_steps automatically applies make_cb to each step, so a maybe_done is not invoked immediately by command1_done and command2_done - instead, maybe_done is scheduled for invocation in the next loop of the mainloop (via call_later). If both commands finish before maybe_done is invoked, call_later will be called twice, with both $result1 and $result2 defined both times. The result is that $finished_cb is called twice, and mayhem ensues.

This is a complex case, but worth understanding if you want to be able to debug difficult MainLoop bugs.

WRITING ASYNCHRONOUS INTERFACES

When designing a library or interface that will accept and invoke callbacks, follow these guidelines so that users of the interface will not need to remember special rules.

Each callback signature within a package should always have the same name, ending with _cb. For example, a hypothetical Amanda::Estimate module might provide its estimates through a callback with four parameters. This callback should be referred to as estimate_cb throughout the package, and its parameters should be clearly defined in the package's documentation. It should take positional parameters only. If error conditions must also be communicated via the callback, then the first parameter should be an $error parameter, which is undefined when no error has occurred. The Changer API's res_cb is typical of such a callback signature.

A caller can only know that an operation is complete by the invocation of the callback, so it is important that a callback be invoked exactly once in all circumstances. Even in an error condition, the caller needs to know that the operation has failed. Also beware of bugs that might cause a callback to be invoked twice.

Functions or methods taking callbacks as arguments should either take only a callback (like call_later), or take hash-key parameters, where the callback's key is the signature name. For example, the Amanda::Estimate package might define a function like perform_estimate, invoked something like this:

    my $estimate_cb = make_cb(estimate_cb => sub {
        my ($err, $size, $level) = @_;
        die $err if $err;
        # ...
    });
    Amanda::Estimate::perform_estimate(
        host => $host,
        disk => $disk,
        estimate_cb => $estimate_cb,
    );

When invoking a user-supplied callback within the library, there is no need to wrap it in a call_later invocation, as the user already supplied that wrapper via make_cb, or is not interested in using such a wrapper.

Callbacks are a form of continuation (http://en.wikipedia.org/wiki/Continuations), and as such should only be called at the end of a function. Do not do anything after invoking a callback, as you cannot know what processing has gone on in the callback.

    sub estimate_done {
        # ...
        $self->{'estimate_cb'}->(undef, $size, $level);
        $self->{'estimate_in_progress'} = 0; # BUG!!
    }

In this case, the estimate_cb invocation may have called perform_estimate again, setting estimate_in_progress back to 1. A technique to avoid this pitfall is to always return a callback's result, even though that result is not important. This makes the bug much more apparent:

    sub estimate_done {
        # ...
        return $self->{'estimate_cb'}->(undef, $size, $level);
        $self->{'estimate_in_progress'} = 0; # BUG (this just looks silly)
    }

LOW-LEVEL INTERFACE

MainLoop events are generated by event sources. A source may produce multiple events over its lifetime. The higher-level methods in the previous section provide a more Perlish abstraction of event sources, but for efficiency it is sometimes necessary to use event sources directly.

The method $src->set_callback(\&cb) sets the function that will be called for a given source, and "attaches" the source to the main loop so that it will begin generating events. The arguments to the callback depend on the event source, but the first argument is always the source itself. Unless specified, no other arguments are provided.

Event sources persist until they are removed with $src->remove(), even if the source itself is no longer accessible from Perl. Although Glib supports it, there is no provision for "automatically" removing an event source. Also, calling $src->remove() more than once is a potentially-fatal error. As an example:

  sub start_timer {
    my ($loops) = @_;
    Amanda::MainLoop::timeout_source(200)->set_callback(sub {
      my ($src) = @_;
      print "timer\n";
      if (--$loops <= 0) {
        $src->remove();
        Amanda::MainLoop::quit();
      }
    });
  }
  start_timer(10);
  Amanda::MainLoop::run();

There is no means in place to specify extra arguments to be provided to a source callback when it is set. If the callback needs access to other data, it should use a Perl closure in the form of lexically scoped variables and an anonymous sub. In fact, this is exactly what the higher-level functions (described above) do.

Timeout

  my $src = Amanda::MainLoop::timeout_source(10000);

A timeout source will create events at the specified interval, specified in milliseconds (thousandths of a second). The events will continue until the source is destroyed.

Idle

  my $src = Amanda::MainLoop::idle_source(2);

An idle source will create events continuously except when a higher-priority source is emitting events. Priorities are generally small positive integers, with larger integers denoting lower priorities. The events will continue until the source is destroyed.

Child Watch

  my $src = Amanda::MainLoop::child_watch_source($pid);

A child watch source will issue an event when the process with the given PID dies. To avoid race conditions, it will issue an event even if the process dies before the source is created. The callback is called with three arguments: the event source, the PID, and the child's exit status.

Note that this source is totally incompatible with any thing that would cause perl to change the SIGCHLD handler. If SIGCHLD is changed, under some circumstances the module will recognize this circumstance, add a warning to the debug log, and continue operating. However, it is impossible to catch all possible situations.

File Descriptor

  my $src = Amanda::MainLoop::fd_source($fd, $G_IO_IN);

This source will issue an event whenever one of the given conditions is true for the given file (a file handle or integer file descriptor). The conditions are from Glib's GIOCondition, and are $G_IO_IN, G_IO_OUT, $G_IO_PRI, $G_IO_ERR, $G_IO_HUP, and $G_IO_NVAL. These constants are available with the import tag :GIOCondition.

Generally, when reading from a file descriptor, use $G_IO_IN|$G_IO_HUP|$G_IO_ERR to ensure that an EOF triggers an event as well. Writing to a file descriptor can simply use $G_IO_OUT|$G_IO_ERR.

The callback attached to an FdSource should read from or write to the underlying file descriptor before returning, or it will be called again in the next iteration of the main loop, which can lead to unexpected results. Do not use make_cb here!

Combining Event Sources

Event sources are often set up in groups, e.g., a long-term operation and a timeout. When this is the case, be careful that all sources are removed when the operation is complete. The easiest way to accomplish this is to include all sources in a lexical scope and remove them at the appropriate times:

    {
        my $op_src = long_operation_src();
        my $timeout_src = Amanda::MainLoop::timeout_source($timeout);

        sub finish {
            $op_src->remove();
            $timeout_src->remove();
        }

        $op_src->set_callback(sub {
            print "Operation complete\n";
            finish();
        });

        $timeout_src->set_callback(sub {
            print "Operation timed out\n";
            finish();
        });
    }

Relationship to Glib

Glib's main event loop is described in the Glib manual: http://library.gnome.org/devel/glib/stable/glib-The-Main-Event-Loop.html. Note that Amanda depends only on the functionality available in Glib-2.2.0, so many functions described in that document are not available in Amanda. This module provides a much-simplified interface to the glib library, and is not intended as a generic wrapper for it: Amanda's perl-accessible main loop only runs a single GMainContext, and always runs in the main thread; and (aside from idle sources), event priorities are not accessible from Perl.

ABOUT THIS PAGE

This page was automatically generated Tue Feb 21 19:14:01 2012 from the Amanda source tree, and documents the most recent development version of Amanda. For documentation specific to the version of Amanda on your system, use the 'perldoc' command.