Amanda::DB::Catalog

NAME

Amanda::DB::Catalog - access to the Amanda catalog: where is that dump?

SYNOPSIS

  use Amanda::DB::Catalog;

  # get all write timestamps on record
  my @timestamps = Amanda::DB::Catalog::get_write_timestamps();

  # loop over those timestamps, printing dump info for each one
  for my $timestamp (@timestamps) {
      my @dumpfiles = Amanda::DB::Catalog::get_parts(
          write_timestamp => $timestamp,
          status => 'OK'
      );
      print "$timestamp:\n";
      for my $dumpfile (@dumpfiles) {
          # hostname, diskname, and level are dump keys, reached via the part's dump
          print " ", $dumpfile->{dump}{hostname}, ":", $dumpfile->{dump}{diskname},
                " level ", $dumpfile->{dump}{level}, "\n";
      }
  }

MODEL

The Amanda catalog is modeled as a set of dumps, each composed of parts. A dump is a complete bytestream received from an application, and is uniquely identified by the combination of hostname, diskname, dump_timestamp, level, and write_timestamp. A dump may be partial, or even a complete failure.

A part corresponds to a single file on a volume, containing a portion of the data for a dump. A part, then, is completely specified by a volume label and a file number (filenum). Each part has, among other things, a part number (partnum) which gives its relative position within the dump. The bytestream for a dump is recovered by concatenating all of the successful (status = OK) parts matching the dump.
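
The reassembly rule above can be sketched in plain Perl. This is only an illustration under stated assumptions: the part hashrefs are built by hand rather than fetched from the catalog, and both the data key and the reassemble helper are hypothetical (real parts carry no payload; the bytes live in the file on the volume).

```perl
use strict;
use warnings;

# Hypothetical in-memory parts; "data" is an illustrative stand-in for the
# bytes stored in the corresponding file on the volume.
my @parts = (
    { partnum => 2, status => 'OK',      data => 'world' },
    { partnum => 1, status => 'OK',      data => 'hello ' },
    { partnum => 3, status => 'PARTIAL', data => '!!' },
);

# Keep only successful parts, order them by partnum, and concatenate.
sub reassemble {
    my @ok = sort { $a->{partnum} <=> $b->{partnum} }
             grep { $_->{status} eq 'OK' } @_;
    return join '', map { $_->{data} } @ok;
}

my $bytestream = reassemble(@parts);
print "$bytestream\n";
```

The PARTIAL part is skipped, so only parts 1 and 2 contribute to the result.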

Files in the holding disk are considered part of the catalog, and are represented as single-part dumps (holding-disk chunking is ignored, as it is distinct from split parts).

DUMPS

The dump table contains one row per dump. It has the following columns:

dump_timestamp: (string) -- timestamp of the run in which the dump was created
write_timestamp: (string) -- timestamp of the run in which the part was written to this volume, or "00000000000000" for dumps in the holding disk.
hostname: (string) -- dump hostname
diskname: (string) -- dump diskname
level: (integer) -- dump level
status: (string) -- The status of the dump - "OK", "PARTIAL", or "FAIL". If a disk failed to dump at all, then it is not part of the catalog and thus will not have an associated dump row.
message: (string) -- reason for PARTIAL or FAIL status
nparts: (integer) -- number of successful parts in this dump
bytes: (integer) -- size (in bytes) of the dump on disk, 0 if the size is not known.
kb: (integer) -- size (in kb) of the dump on disk
orig_kb: (integer) -- size (in kb) of the complete dump (before compression or encryption); undef if not available
sec: (integer) -- time (in seconds) spent writing this dump
parts: (arrayref) -- array of parts, indexed by partnum (so $parts->[0] is always undef). When multiple partial parts are available, the choice of the partial that is included in this array is undefined.

A dump is represented as a hashref with these keys.

The write_timestamp gives the time of the amanda run in which the part was written to this volume. The write_timestamp may differ from the dump_timestamp if, for example, amflush wrote the part to tape after the initial dump.

PARTS

The parts table contains one row per part, and has the following columns:

label: (string) -- volume label (not present for holding files)
filenum: (integer) -- file on that volume (not present for holding files)
holding_file: (string) -- fully-qualified pathname of the holding file (not present for on-media dumps)
dump: (object ref) -- a reference to the dump containing this part
status: (string) -- The status of the part - "OK", "PARTIAL", or "FAILED".
partnum: (integer) -- part number of a split part (1-based)
kb: (integer) -- size (in kb) of this part
sec: (integer) -- time (in seconds) spent writing this part

A part is represented as a hashref with these keys. The label and filenum serve as a primary key.

Note that parts' dump and dumps' parts create a reference loop. This is broken by making the parts array's contents weak references in get_dumps, and the dump reference weak in get_parts.
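
The general technique of breaking such a loop can be sketched with Scalar::Util. The example does not reproduce the module's internals: the dump and part are hand-built hashrefs, and the choice of which side to weaken is an assumption made for a caller that keeps hold of the dump.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

# Hand-built dump and part, shaped like the hashrefs described above.
my $dump = { hostname => 'client1', diskname => '/home', level => 0,
             status => 'OK', nparts => 1, parts => [ undef ] };
my $part = { label => 'TAPE-001', filenum => 1, partnum => 1,
             status => 'OK', dump => $dump };
$dump->{parts}[1] = $part;    # loop: dump -> part -> dump

# Weaken one side of the loop. Here the caller keeps $dump, so the part's
# back-reference is the one weakened; the dump still owns its parts.
weaken($part->{dump});

my $back_is_weak    = isweak($part->{dump})     ? 1 : 0;
my $forward_is_weak = isweak($dump->{parts}[1]) ? 1 : 0;
print "part->dump weak: $back_is_weak, dump->part weak: $forward_is_weak\n";
print "still reachable: $part->{dump}{hostname}\n";
```

A weak reference becomes undef once the last strong reference to its target goes away, so callers should keep the object they asked for in scope while following links from it.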

NOTES

All timestamps used in this module are full-length, in the format YYYYMMDDHHMMSS. If the underlying data contains only datestamps, they are zero-extended into timestamps: YYYYMMDD000000. A dump_timestamp always corresponds to the initiation of the original dump run, while write_timestamp gives the time the file was written to the volume. When parts are migrated from volume to volume (e.g., by amvault), the dump_timestamp does not change.
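
As a minimal sketch of the zero-extension rule, the helper below pads an 8-digit datestamp to a 14-digit timestamp. The name zeropad_timestamp is hypothetical and is not part of this module's interface.

```perl
use strict;
use warnings;

# Hypothetical helper: zero-extend an 8-digit datestamp to a 14-digit
# timestamp; pass full-length timestamps through unchanged.
sub zeropad_timestamp {
    my ($ts) = @_;
    return $ts            if $ts =~ /\A\d{14}\z/;
    return $ts . '000000' if $ts =~ /\A\d{8}\z/;
    die "unrecognized timestamp format: $ts\n";
}

my @normalized = map { zeropad_timestamp($_) } qw(20161004 20161004194536);
print "$_\n" for @normalized;
```

Because every timestamp has the same length and runs from the most to the least significant field, a plain string comparison orders them chronologically.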

In Amanda, the tuple (hostname, diskname, level, dump_timestamp) serves as a unique identifier for a dump bytestream, but because the bytestream may appear several times in the catalog (due to vaulting) the additional write_timestamp is required to identify a particular on-storage instance of a dump. Note that the part sizes may differ between instances, so it is not valid to concatenate parts from different dump instances.
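
The distinction between a bytestream and its on-storage instances can be illustrated by grouping dumps on the four-part identifier. The dump hashrefs below are hand-built sample data, not catalog output.

```perl
use strict;
use warnings;

# Hand-built dump hashrefs: the first bytestream appears twice (for example,
# once from the original run and once from a later vaulting run).
my @dumps = (
    { hostname => 'client1', diskname => '/home', level => 0,
      dump_timestamp => '20161001000000', write_timestamp => '20161001000000' },
    { hostname => 'client1', diskname => '/home', level => 0,
      dump_timestamp => '20161001000000', write_timestamp => '20161003000000' },
    { hostname => 'client2', diskname => '/var', level => 1,
      dump_timestamp => '20161002000000', write_timestamp => '20161002000000' },
);

# Group on-storage instances by the bytestream identifier.
my %instances;
for my $d (@dumps) {
    my $key = join "\0", @{$d}{qw(hostname diskname level dump_timestamp)};
    push @{ $instances{$key} }, $d->{write_timestamp};
}

for my $key (sort keys %instances) {
    (my $shown = $key) =~ tr/\0/ /;
    print "$shown: written at @{ $instances{$key} }\n";
}
```

Each write_timestamp listed under a key is a separate instance; parts should be read from exactly one of them.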

INTERFACES

SUMMARY DATA

The following functions provide summary data based on the contents of the catalog.

get_write_timestamps(): Get a list of all write timestamps, sorted in chronological order.
get_latest_write_timestamp(): Return the most recent write timestamp.
get_latest_write_timestamp(type => 'amvault'), get_latest_write_timestamp(types => [ 'amvault', .. ]): Return the timestamp of the most recent dump of the given type or types. The available types are given below for get_run_type.
get_labels_written_at_timestamp($ts): Return a list of labels for volumes written at the given timestamp.
get_run_type($ts): Return the type of run made at the given timestamp. The result is one of amvault, amdump, amflush, or the default, unknown.

PARTS

get_parts(%parameters)

This function returns a sequence of parts. Values in %parameters restrict the set of parts that are returned. The hash can have any of the following keys:

write_timestamp: restrict to parts written at this timestamp
write_timestamps: (arrayref) restrict to parts written at any of these timestamps (note that holding-disk files have no write_timestamp, so this option and the previous will omit them)
dump_timestamp: restrict to parts with exactly this timestamp
dump_timestamps: (arrayref) restrict to parts with any of these timestamps
dump_timestamp_match: restrict to parts with timestamps matching this expression
holding: if true, only return dumps on holding disk. If false, omit dumps on holding disk.
hostname: restrict to parts with exactly this hostname
hostnames: (arrayref) restrict to parts with any of these hostnames
hostname_match: restrict to parts with hostnames matching this expression
diskname: restrict to parts with exactly this diskname
disknames: (arrayref) restrict to parts with any of these disknames
diskname_match: restrict to parts with disknames matching this expression
label: restrict to parts with exactly this label
labels: (arrayref) restrict to parts with any of these labels
level: restrict to parts with exactly this level
levels: (arrayref) restrict to parts with any of these levels
status: restrict to parts with this status
labelstr: restrict to parts on volume matching the labelstr.
dumpspecs: (arrayref of dumpspecs) restrict to parts matching one or more of these dumpspecs

Match expressions are described in the amanda(8) manual page.

sort_parts([ $key1, $key2, .. ], @parts)

Given a list of parts, this function sorts that list by the requested keys. The following keys are available:

hostname
diskname
write_timestamp
dump_timestamp
level
filenum
label: Note that this sorts labels lexically, not necessarily in the order they were used!
partnum
nparts

Keys are processed from left to right: if two parts have the same value for $key1, then $key2 is examined, and so on. Key names may be prefixed by a dash (-) to reverse the order.

Note that some of these keys are dump keys; the function will automatically access those values via the dump attribute.
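
The key handling described above can be sketched with a stand-in sorter over hand-built parts. sketch_sort_parts is hypothetical and only mimics the calling convention of sort_parts; the split between numeric and string comparison is an assumption, while the set of dump keys follows the column lists given earlier.

```perl
use strict;
use warnings;

# Keys assumed to compare numerically; everything else compares as a string.
my %numeric  = map { $_ => 1 } qw(level filenum partnum nparts);
# Keys that live on the dump rather than on the part itself.
my %dump_key = map { $_ => 1 }
    qw(hostname diskname write_timestamp dump_timestamp level nparts);

# Hypothetical stand-in for sort_parts: same calling convention, mock logic.
sub sketch_sort_parts {
    my ($keys, @parts) = @_;
    my $value = sub {
        my ($part, $key) = @_;
        return $dump_key{$key} ? $part->{dump}{$key} : $part->{$key};
    };
    return sort {
        my $r = 0;
        for my $spec (@$keys) {
            (my $key = $spec) =~ s/^-//;
            my ($x, $y) = ($value->($a, $key), $value->($b, $key));
            $r = $numeric{$key} ? $x <=> $y : $x cmp $y;
            $r = -$r if $spec =~ /^-/;    # leading dash reverses this key
            last if $r;                   # later keys only break ties
        }
        $r;
    } @parts;
}

my $d1 = { hostname => 'client1', diskname => '/home', level => 0 };
my $d2 = { hostname => 'client2', diskname => '/var',  level => 1 };
my @parts = (
    { label => 'TAPE-001', filenum => 3,  partnum => 1, dump => $d2 },
    { label => 'TAPE-001', filenum => 10, partnum => 2, dump => $d1 },
    { label => 'TAPE-001', filenum => 2,  partnum => 1, dump => $d1 },
);

my @sorted = sketch_sort_parts([ 'hostname', '-filenum' ], @parts);
my $order  = join ' ', map { "$_->{dump}{hostname}:$_->{filenum}" } @sorted;
print "$order\n";
```

With the keys hostname and -filenum, the two client1 parts come first, ordered by descending file number, followed by the client2 part.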

DUMPS

get_dumps(%parameters): This function returns a sequence of dumps. Values in %parameters restrict the set of dumps that are returned. The same keys as are used for get_parts are available here, with the exception of label and labels. In this case, the status parameter applies to the dump status, not the status of its constituent parts.
sort_dumps([ $key1, $key2 ], @dumps): Like sort_parts, this sorts a sequence of dumps generated by get_dumps. The same keys are available, with the exception of label, filenum, and partnum.

ADDING DATA

add_part($part)

Add the given part to the database. In terms of logfiles, this will either create a new logfile (if the part's write_timestamp has not been seen before) or append to an existing logfile. Note that a new logfile will require a corresponding new entry in the tapelist.

Note that no locking is performed: multiple simultaneous calls to this function can result in a corrupted or incorrect logfile.
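
One way a caller might serialize its own writers is an advisory lock around each call. The sketch below is an assumption-laden illustration, not something this module provides: with_catalog_lock and the lock file are hypothetical, and the coderefs stand in for a call such as Amanda::DB::Catalog::add_part($part).

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Hypothetical wrapper: run $code while holding an exclusive advisory lock
# on $lockfile. Cooperating writers must all use the same lock file.
sub with_catalog_lock {
    my ($lockfile, $code) = @_;
    open(my $fh, '>>', $lockfile) or die "cannot open $lockfile: $!";
    flock($fh, LOCK_EX)           or die "cannot lock $lockfile: $!";
    my @result = eval { $code->() };
    my $err = $@;
    close($fh);    # closing the handle releases the lock
    die $err if $err;
    return @result;
}

# A temporary file serves as the lock file for this demonstration.
my (undef, $lockfile) = tempfile(UNLINK => 1);
my @added;
with_catalog_lock($lockfile, sub { push @added, 'part 1' });
with_catalog_lock($lockfile, sub { push @added, 'part 2' });
print scalar(@added), " parts added under lock\n";
```

The lock is advisory: it protects only against other processes that take the same lock, so it does not guard against writers that bypass the wrapper.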

TODO: add_dump

ABOUT THIS PAGE

This page was automatically generated Tue Oct 4 19:45:36 2016 from the Amanda source tree, and documents the most recent development version of Amanda. For documentation specific to the version of Amanda on your system, use the 'perldoc' command.