GSWA/A Peek Under the Hood

From The Open Source Backup Wiki (Amanda, MySQL Backup, BackupPC)

So now you have a working Amanda install (right?). What's it doing?

Where's my Data?

Amanda stores backup data in dumpfiles, which for most configurations are just tarballs (optionally compressed or even encrypted). These dumpfiles are stored on volumes, which are structured like tapes: a label and a numbered sequence of big hunks of data. The tpchanger directive in your amanda.conf(5) tells Amanda to use chg-disk, which stores data on disk. In particular, it stores that data at /amanda/vtapes, in a set of four slots:

amanda@knuth ~ $ find /amanda/vtapes/slot1/ -ls
/amanda/vtapes/slot1/
/amanda/vtapes/slot1/00000.MyData01
/amanda/vtapes/slot1/00001.localhost._etc.0

The last file contains the data for host localhost, disk /etc. 00000.MyData01 contains the tape label. Amanda automatically labeled this tape based on the autolabel configuration parameter. The labelstr parameter tells Amanda what labels it can re-use later, and will generally match the autolabel.

That's all you really need to know for now. Ordinarily, Amanda takes care of getting its data out of this format and back into something you can use. Even in the unlikely event that Amanda is completely unusable, you can still get at the data fairly easily - see How To:Do a Bare Metal Restore for the gory details.
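As a rough illustration of the layout just described, the file names in a slot can be picked apart with a few lines of Python. This is only a sketch: the naming pattern is inferred from the single listing above, and parse_vtape_name and VtapeFile are hypothetical names, not part of Amanda.

```python
from typing import NamedTuple, Optional


class VtapeFile(NamedTuple):
    filenum: int
    label: Optional[str]  # set only for the label file (file 0)
    host: Optional[str]
    disk: Optional[str]   # as it appears in the file name, e.g. "_etc" for /etc
    level: Optional[int]


def parse_vtape_name(name: str) -> VtapeFile:
    """Split a vtape file name such as '00001.localhost._etc.0'.

    Assumption: the pattern <number>.<host>.<disk>.<level> is inferred
    from the example listing only; other configurations may differ.
    """
    number, _, rest = name.partition(".")
    filenum = int(number)
    if filenum == 0:
        # File 0 carries the volume label, e.g. '00000.MyData01'.
        return VtapeFile(0, rest, None, None, None)
    middle, _, level = rest.rpartition(".")
    # Heuristic: the disk part of an absolute path starts with '_',
    # so split host from disk at the first '._'.
    host, sep, disk = middle.partition("._")
    if not sep:
        raise ValueError(f"cannot split host and disk in {name!r}")
    return VtapeFile(filenum, None, host, "_" + disk, int(level))


if __name__ == "__main__":
    for n in ("00000.MyData01", "00001.localhost._etc.0"):
        print(parse_vtape_name(n))
```

The disk name is kept in its on-disk form ("_etc") rather than converted back to "/etc", since that conversion would be a guess for disk names that contain underscores.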

Utilities

You can see the contents of the Amanda catalog with amadmin(8). For example, after two dumps:

amanda@knuth ~ $ amadmin MyConfig find
date                host      disk lv tape or file file part status
2011-01-01 13:16:43 localhost /etc  0 MyData01        1  1/1 OK 
2011-01-02 13:14:03 localhost /etc  1 MyData02        1  1/1 OK 

This shows that /etc on localhost was backed up twice, with a full (level 0) backup on January 1 to volume MyData01 and an incremental (level 1) backup on January 2 to volume MyData02.
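If you want to post-process this output in a script, a minimal sketch along the following lines may help. It assumes the column layout shown above and that host and disk names contain no spaces; parse_find and latest_full are hypothetical helper names, and the sample text is copied from the transcript rather than produced by running amadmin.

```python
from typing import Dict, List, Optional

SAMPLE = """\
date                host      disk lv tape or file file part status
2011-01-01 13:16:43 localhost /etc  0 MyData01        1  1/1 OK
2011-01-02 13:14:03 localhost /etc  1 MyData02        1  1/1 OK
"""


def parse_find(text: str) -> List[Dict[str, object]]:
    """Turn 'amadmin <config> find' output into a list of records."""
    records = []
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 9:
            continue  # not a data row
        date, time, host, disk, level, label, filenum, part = fields[:8]
        records.append({
            "when": f"{date} {time}",  # ISO-like, so it sorts as text
            "host": host,
            "disk": disk,
            "level": int(level),
            "label": label,
            "filenum": int(filenum),
            "part": part,
            "status": " ".join(fields[8:]),
        })
    return records


def latest_full(records, host: str, disk: str) -> Optional[Dict[str, object]]:
    """Return the most recent level 0 record for one DLE, if any."""
    fulls = [r for r in records
             if r["host"] == host and r["disk"] == disk and r["level"] == 0]
    return max(fulls, key=lambda r: r["when"]) if fulls else None


if __name__ == "__main__":
    print(latest_full(parse_find(SAMPLE), "localhost", "/etc"))
```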

How Does Amanda Know What to Back Up?

The disklist(5) tells Amanda what to back up. In the example, it contains only one disk list entry (DLE), for /etc on the local system:

localhost /etc simple-gnutar-local

The simple-gnutar-local refers to the dumptype of the same name in amanda.conf(5), which tells Amanda how to back up this sort of disk. In this case, we're using no compression and backing up with GNUTAR. The 'auth' parameter tells the Amanda server how to contact the Amanda client; in this case we're using "local" because it is easy to set up. See Backing Up Other Systems to see how to back up non-local clients.
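The link between a disklist entry and its dumptype can be checked mechanically. The sketch below handles only the three-field form used in this example (host, disk, dumptype) and assumes that '#' starts a comment; DLE, parse_disklist and undefined_dumptypes are hypothetical names, and the set of defined dumptypes is supplied by hand rather than read from amanda.conf.

```python
from typing import List, NamedTuple, Set


class DLE(NamedTuple):
    host: str
    disk: str
    dumptype: str


def parse_disklist(text: str) -> List[DLE]:
    """Parse simple 'host disk dumptype' lines from a disklist."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3:
            # Longer forms exist but are out of scope for this sketch.
            raise ValueError(f"unsupported disklist line: {raw!r}")
        entries.append(DLE(*fields))
    return entries


def undefined_dumptypes(entries: List[DLE], defined: Set[str]) -> Set[str]:
    """Return dumptype names used in the disklist but not defined."""
    return {e.dumptype for e in entries} - defined


if __name__ == "__main__":
    dles = parse_disklist("localhost /etc simple-gnutar-local\n")
    print(dles, undefined_dumptypes(dles, {"simple-gnutar-local"}))
```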

How Does the Backup Work?

Without getting into too much detail, the dump process looks something like this:

The Amanda server (amdump(8), specifically) contacts the Amanda client and asks it to send the dumpfile for each DLE. This is an important point: in general, the Amanda server is the one reaching out to contact the Amanda clients - the reverse of the more normal situation of clients contacting the server.

The data is then sent from the Amanda client to the server, and the server writes it to the holding disk. Once the entire dumpfile is in holding, the server begins writing it to the storage (/amanda/vtapes in this case). The use of holding disk has a few advantages:

  1. If the storage backend is broken somehow, Amanda will still do the backups, keeping them in holding. With enough holding space, you could go for a few days without a working tape drive!
  2. Most storage backends can only write one stream of data at a time. If you're backing up lots of Amanda clients, you want to do so in parallel, particularly since clients tend to be slower than the Amanda server. Amanda achieves this parallelism by writing to holding disk in parallel, and then streaming data out of the holding disk to the storage backend.

The holding disk is configured in the holdingdisk section of amanda.conf(5); in the example configuration it is limited to 50MB, but most real configurations will use a much larger size.
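The holding-disk flow described above can be modelled as several producers feeding one consumer. The following is a toy simulation, not Amanda's implementation: a queue stands in for the holding disk, threads stand in for the parallel dumps, and a single writer thread stands in for the one stream going to storage. All names here (run_backup, dump_one, write_to_storage) are made up for illustration.

```python
import queue
import threading
from typing import Callable, List, Tuple


def run_backup(dles: List[str],
               dump: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Dump every DLE in parallel into 'holding'; one writer drains it."""
    holding: "queue.Queue[Tuple[str, str]]" = queue.Queue()
    written: List[Tuple[str, str]] = []

    def dump_one(dle: str) -> None:
        # A complete dumpfile lands in holding before it is written out.
        holding.put((dle, dump(dle)))

    def write_to_storage() -> None:
        # Only one stream goes to storage at a time.
        for _ in dles:
            written.append(holding.get())

    writer = threading.Thread(target=write_to_storage)
    writer.start()
    workers = [threading.Thread(target=dump_one, args=(d,)) for d in dles]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    writer.join()
    return written


if __name__ == "__main__":
    print(run_backup(["/etc", "/home", "/var"], lambda d: f"tarball of {d}"))
```

The order in which dumpfiles reach storage depends on which dump finishes first, which is why the result is a list of (DLE, data) pairs rather than a fixed sequence.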

How Does Amanda Decide Whether to do a Full or an Incremental?

Actually, before requesting the data, the Amanda planner runs on the server to decide exactly how to go about backing things up. It, too, contacts each Amanda client and requests an estimate of the size of full and incremental dumps for each DLE. It then does some complex planning based on the history of each DLE, the estimated sizes, the available storage space, and a number of tweakable parameters to decide what to back up.

This often confuses newcomers, who have control issues and want to tell Amanda when to do full backups and when to do incrementals. The planner is one of Amanda's strengths! Don't fight it!

Configuration Parameters

Let's look a little more deeply. First, Amanda promises to do at least one full backup of each DLE in each dumpcycle. The dumpcycle is given in days in amanda.conf(5); it is 3 in the example. With one run per day, at least every third dump of a DLE will be a full dump, although fulls may come more often if space allows. Amanda promotes dumps to full ahead of schedule like this to try to even out the time and space used on each night of the dump cycle.
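To make the dumpcycle guarantee concrete, here is a toy schedule for a single DLE. It is emphatically not Amanda's planner: it assumes one run per night and always waits as long as the dumpcycle allows before doing a full, whereas the real planner may promote fulls earlier. The function name is hypothetical.

```python
from typing import List, Optional


def latest_possible_fulls(nights: int, dumpcycle: int) -> List[str]:
    """Schedule fulls as late as the dumpcycle permits (toy rule)."""
    schedule: List[str] = []
    days_since_full: Optional[int] = None  # None: no full exists yet
    for _ in range(nights):
        if days_since_full is None or days_since_full >= dumpcycle:
            schedule.append("full")
            days_since_full = 0
        else:
            schedule.append("incr")
        days_since_full += 1
    return schedule


if __name__ == "__main__":
    # With dumpcycle 3, every third dump is a full.
    print(latest_possible_fulls(7, 3))
```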

The other major cycle in Amanda's configuration is the tapecycle. This tells Amanda how many volumes are available, and thus defines the retention period. In the example configuration, we have a tapecycle of 4, so data can be recovered from the last four nights (more or less - incrementals make this more complicated, but we won't get into that detail now). It's common to choose the tapecycle based on how long data should stick around, and then set the dumpcycle to a value that balances usage of storage (too many fulls) against long recoveries (too many incrementals).
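The effect of tapecycle on retention can be sketched as simple round-robin arithmetic. The code assumes exactly one volume is used per run and that volumes are reused strictly in order; the "MyData" prefix and two-digit numbering are inferred from the labels shown earlier, not read from the autolabel setting, and both function names are hypothetical.

```python
from typing import List


def volume_for_run(run: int, tapecycle: int, prefix: str = "MyData") -> str:
    """Label of the volume written by a given run (runs count from 0)."""
    return f"{prefix}{run % tapecycle + 1:02d}"


def runs_still_on_disk(latest_run: int, tapecycle: int) -> List[int]:
    """Runs whose volume has not yet been overwritten."""
    first = max(0, latest_run - tapecycle + 1)
    return list(range(first, latest_run + 1))


if __name__ == "__main__":
    # With tapecycle 4, the fifth run overwrites the first volume.
    print([volume_for_run(r, 4) for r in range(5)])
    print(runs_still_on_disk(5, 4))
```

As the text notes, what can actually be recovered is more subtle than this, because an incremental is only useful while the full it depends on is still available.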

