XFA/Data Handling Model

From wiki.zmanda.com
Revision as of 06:08, 1 October 2007 by Dustin (talk | contribs) (→‎Notes on Diagrams: add diagrams)

Introduction

Amanda's data-handling model sits at the core of the application. Its purpose is fairly simple: move data between a backup client and archival storage. However, it is subject to some stringent constraints.

  • Tapes are linear-access devices, with optimal block- and file-sizes; performance deteriorates rapidly as parameters diverge from these optima.
  • Recovery of single files (or "user objects", as described below) should be relatively quick, and not require reading an entire dumpfile.
  • Recovery should be possible with only native tools and the contents of tapes.
  • Amanda must be able to recover indexes and all other metadata from the on-tape data alone (a "reindex" operation).

Terminology

Much of this terminology is part of the Application API; this section briefly summarizes those terms.

Application
An Amanda component that interfaces directly with the client data.
User Object
The smallest object that can be restored (e.g., a file for GNU Tar).
Collection
The smallest amount of data that the application can operate on (e.g., a 512-byte block for GNU Tar).
Transfer
A data-movement operation.
Transfer Element
A component of a transfer; elements are combined in a kind of pipeline, with each element sending data to the next.
Filter
A transfer element which performs some transformation, such as compression or encryption, on the data that passes through it. Filters are described as operating normally when performing a backup, and in reverse on restore. When operating in reverse, a filter takes data that it produced during normal operation and transforms it back into the original input. For example, an encryption filter encrypts in normal operation, and decrypts in reverse.
Seekable Filter
A filter which, when operating in reverse, can begin at arbitrary points in the datastream; contrast non-seekable filters.
Non-seekable Filter
A filter which, when operating in reverse, must always begin at byte zero of the datastream.
Catenary Filter
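A filter for which filtering distributes over concatenation: filtering several inputs separately and concatenating the results reverses to the same data as filtering the concatenated inputs. A filter is catenary if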
cat file1 file2 | filter | filter -reverse

produces the same output as

(filter <file1 ; filter <file2) | filter -reverse

Gzip, for example, is catenary: concatenated gzip streams decompress to the concatenation of their original contents.

Bytestream
A linear sequence of bytes; the data exchanged between transfer elements.
Debit
The creation of a range of bytes in a bytestream by a transfer element (we try to avoid the terms "block" or "chunk" here, although the effect is similar).
Credit
The consumption of a range of bytes from a bytestream by a transfer element.
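
The catenary property defined above can be checked mechanically for a given pair of inputs. The following is a minimal sketch, not part of Amanda: the helper name is_catenary and the sample data are assumptions made for illustration, and the "reverse everything" filter is a toy chosen only because it is obviously not catenary. It relies on Python's gzip module, whose decompress function accepts concatenated gzip members.

```python
import gzip


def is_catenary(forward, reverse, part1: bytes, part2: bytes) -> bool:
    """Check the catenary property for one pair of inputs (hypothetical helper)."""
    # Equivalent of: cat file1 file2 | filter | filter -reverse
    whole = reverse(forward(part1 + part2))
    # Equivalent of: (filter <file1 ; filter <file2) | filter -reverse
    pieces = reverse(forward(part1) + forward(part2))
    return whole == pieces


# Sample data standing in for file1 and file2 (assumed, not from Amanda).
part1 = b"first part of a dumpfile\n" * 100
part2 = b"second part of a dumpfile\n" * 100

# Gzip: two members concatenated decompress to part1 + part2.
gzip_is_catenary = is_catenary(gzip.compress, gzip.decompress, part1, part2)

# Toy counter-example: a filter that reverses its entire input is its own
# inverse, but filtering the parts separately swaps their order on restore.
def reverse_bytes(data: bytes) -> bytes:
    return data[::-1]

toy_is_catenary = is_catenary(reverse_bytes, reverse_bytes, b"ab", b"cd")
```

A check like this only tests specific inputs; it can show that a filter is not catenary, but passing it does not prove the property in general.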

Diagrams

A single bytestream.

A bytestream is represented by a solid horizontal line; debits made by the transfer element producing the bytestream are indicated with tickmarks above the line, while credits for the consuming element are delimited with tickmarks below the line.
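
As an illustration of the diagram, a bytestream can be modelled as a buffer that records the byte range of every debit and every credit. This is only a sketch under assumptions: the Bytestream class, its method names, and the choice to let credit boundaries differ from debit boundaries are all hypothetical and are not taken from Amanda's code.

```python
from dataclasses import dataclass, field


@dataclass
class Bytestream:
    """Toy bytestream: a producer debits byte ranges, a consumer credits them."""
    buffer: bytearray = field(default_factory=bytearray)
    debits: list = field(default_factory=list)    # (start, end) ranges created
    credits: list = field(default_factory=list)   # (start, end) ranges consumed
    produced: int = 0   # stream offset of the next byte to be produced
    consumed: int = 0   # stream offset of the next byte to be consumed

    def debit(self, data: bytes) -> None:
        """Producer side: create a new range of bytes (a tick above the line)."""
        self.buffer += data
        self.debits.append((self.produced, self.produced + len(data)))
        self.produced += len(data)

    def credit(self, size: int) -> bytes:
        """Consumer side: consume up to `size` bytes (a tick below the line)."""
        size = min(size, len(self.buffer))
        data = bytes(self.buffer[:size])
        del self.buffer[:size]
        self.credits.append((self.consumed, self.consumed + size))
        self.consumed += size
        return data


stream = Bytestream()
stream.debit(b"abcdef")
stream.debit(b"ghij")
chunks = [stream.credit(4), stream.credit(4), stream.credit(4)]
# stream.debits  is [(0, 6), (6, 10)]
# stream.credits is [(0, 4), (4, 8), (8, 10)]
```

In this sketch the tickmarks above the line correspond to the entries in debits and those below the line to the entries in credits.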


A filter, transforming the upper bytestream to the lower.

A filter element, then, consumes one bytestream (the upper bytestream) and produces another (the lower). Note that there is a one-to-one correspondence between an element's credits (against its source bytestream) and its debits (to the bytestream it produces).
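
The one-to-one correspondence between credits and debits can be sketched as a loop in which each credit against the upper bytestream results in exactly one debit to the lower. Everything here is an assumption made for illustration: the Bytestream class, run_filter, and the bit-inverting transform are hypothetical and do not describe Amanda's implementation. For simplicity, each credit in this sketch consumes exactly one pending debit.

```python
from collections import deque


class Bytestream:
    """Toy bytestream that records each debit and credit as a byte range."""

    def __init__(self):
        self.pending = deque()   # debited data not yet credited
        self.debits = []         # (start, end) ranges created by the producer
        self.credits = []        # (start, end) ranges consumed by the consumer
        self._written = 0
        self._read = 0

    def debit(self, data: bytes) -> None:
        self.debits.append((self._written, self._written + len(data)))
        self._written += len(data)
        self.pending.append(data)

    def credit(self) -> bytes:
        data = self.pending.popleft()
        self.credits.append((self._read, self._read + len(data)))
        self._read += len(data)
        return data


def run_filter(source: Bytestream, sink: Bytestream, transform) -> None:
    """Filter element: one credit against `source` yields one debit to `sink`."""
    while source.pending:
        sink.debit(transform(source.credit()))


def invert_bits(data: bytes) -> bytes:
    """Toy reversible transform; applying it twice restores the input."""
    return bytes(b ^ 0xFF for b in data)


upper = Bytestream()
for chunk in (b"abc", b"defgh", b"ij"):
    upper.debit(chunk)

lower = Bytestream()
run_filter(upper, lower, invert_bits)

# Running the same toy filter again plays the role of "reverse" operation.
restored = Bytestream()
run_filter(lower, restored, invert_bits)
restored_data = b"".join(restored.pending)
```

Because the toy transform preserves length, the debit ranges on the lower bytestream also line up with the credit ranges on the upper one; a compressing filter would keep the one-to-one pairing but not the matching offsets.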