XFA/Data Handling Model
Introduction
Amanda's data-handling model sits at the core of the application. Its purpose is fairly simple: move data between a backup client and archival storage. It is, however, subject to several stringent constraints:
- Tapes are linear-access devices with optimal block and file sizes; performance deteriorates rapidly as parameters diverge from these optima.
- Recovery of single files (or "user objects", as described below) should be relatively quick, and not require reading an entire dumpfile.
- Recovery should be possible with only native tools and the contents of tapes.
- Amanda must be able to recover indexes and all other metadata from the on-tape data alone (a "reindex" operation).
Terminology
Much of this terminology is part of the Application API; this section briefly summarizes those terms.
- Application
- An Amanda component that interfaces directly with the client data.
- User Object
- The smallest object that can be restored (e.g., a file for GNU Tar).
- Collection
- The smallest amount of data that the application can operate on (e.g., a 512-byte block for GNU Tar).
- Transfer
- A data-movement operation.
- Transfer Element
- A component of a transfer; elements are combined in a kind of pipeline, with each element sending data to the next.
- Filter
- A transfer element which performs some transformation, such as compression or encryption, on the data that passes through it. Filters are described as operating normally when performing a backup, and in reverse on restore. When operating in reverse, a filter takes data that it produced during normal operation and transforms it back into the original input. For example, an encryption filter encrypts in normal operation, and decrypts in reverse.
- Seekable Filter
- A filter which, when operating in reverse, can begin at arbitrary points in the datastream; contrast non-seekable filters.
- Non-seekable Filter
- A filter which, when operating in reverse, must always begin at byte zero of the datastream.
- Catenary Filter
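- A filter for which filtering distributes over concatenation: filtering two inputs separately and concatenating the results reverses to the same data as filtering their concatenation. A filter is catenary if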
cat file1 file2 | filter | filter -reverse
produces the same output as
(filter <file1 ; filter <file2) | filter -reverse
Gzip, for example, is catenary.
- Bytestream
- A linear sequence of bytes; the data exchanged between transfer elements.
- Debit
- The creation of a range of bytes in a bytestream by a transfer element (we try to avoid the terms "block" or "chunk" here, although the effect is similar).
- Credit
- The consumption of a range of bytes from a bytestream by a transfer element.
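The catenary property defined above can be checked empirically. The following is a minimal sketch using Python's standard gzip module in place of the abstract "filter" command; the names filter_normal and filter_reverse and the sample inputs are assumptions made for illustration and are not part of Amanda.

```python
import gzip


def filter_normal(data: bytes) -> bytes:
    """Hypothetical filter in normal operation: gzip-compress the input."""
    return gzip.compress(data)


def filter_reverse(data: bytes) -> bytes:
    """Hypothetical filter in reverse: decompress one or more gzip members."""
    return gzip.decompress(data)


# Stand-ins for file1 and file2 in the definition.
file1 = b"first dumpfile contents\n"
file2 = b"second dumpfile contents\n"

# cat file1 file2 | filter | filter -reverse
whole = filter_reverse(filter_normal(file1 + file2))

# (filter <file1 ; filter <file2) | filter -reverse
pieces = filter_reverse(filter_normal(file1) + filter_normal(file2))

# The filter is catenary for these inputs if both paths agree.
is_catenary = whole == pieces
```

A check like this only demonstrates the property for the inputs tried; it is not a proof that a given filter is catenary in general.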
Notes on Diagrams
Diagrams should be read from top (data source) to bottom (data sink). Each horizontal line is a bytestream, with a transfer element on either side. Tickmarks above a line represent debits, while tickmarks below a line represent credits.
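The relationship between debits and credits on a single bytestream can also be sketched in code. The example below assumes that debit and credit boundaries need not coincide, as the separate tickmarks above and below a line suggest; the source and sink functions, the range sizes, and the event log are all hypothetical and do not reflect Amanda's actual transfer API.

```python
from typing import Iterator, List, Tuple

Event = Tuple[str, int]  # ("debit" or "credit", number of bytes)


def source(data: bytes, debit_size: int, log: List[Event]) -> Iterator[bytes]:
    """Hypothetical upstream element: debits the bytestream in ranges."""
    for start in range(0, len(data), debit_size):
        piece = data[start:start + debit_size]
        log.append(("debit", len(piece)))
        yield piece


def sink(stream: Iterator[bytes], credit_size: int, log: List[Event]) -> bytes:
    """Hypothetical downstream element: credits bytes in its own range size."""
    pending = bytearray()   # bytes debited but not yet credited
    consumed = bytearray()  # bytes already credited
    for piece in stream:
        pending += piece
        while len(pending) >= credit_size:
            consumed += pending[:credit_size]
            del pending[:credit_size]
            log.append(("credit", credit_size))
    if pending:  # final short range at end of stream
        consumed += pending
        log.append(("credit", len(pending)))
    return bytes(consumed)


log: List[Event] = []
payload = bytes(range(10))
received = sink(source(payload, debit_size=4, log=log), credit_size=3, log=log)

debited = sum(n for kind, n in log if kind == "debit")
credited = sum(n for kind, n in log if kind == "credit")
```

In this sketch the upstream element debits 4 bytes at a time while the downstream element credits 3 at a time, so the two sets of tickmarks would fall at different offsets, yet the total number of bytes debited and credited is the same.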