VFS Device

From wiki.zmanda.com
Revision as of 13:34, 10 May 2006 by Paul.bijnens (talk | contribs) (Redundancy)

Based on text by: Stefan G. Weichinger, November - December, 2003 ; minor updates in April, 2005.

Introduction

Since release 2.4.3, Amanda supports an output driver called "file". See the OUTPUT DRIVERS section of the amanda manual page for more information.

As the name suggests, this driver uses files on disk as virtual tapes. Amanda can write to and read from virtual tapes, just like real tapes. A bunch of virtual tapes can even be manipulated with a changer.

Possible Uses

  • Test installations: You can easily explore the rich features of Amanda on systems without tape drives. Virtual tapes are usually also much faster than many real tape drives. For a quick start, have a look at: Test environment with virtual tapes.
  • Inexpensive installations: Without buying a tape drive you can enjoy the benefits of Amanda and back up to a set of hard disks. You can create CD/DVD-sized backups which you can burn onto optical disks later. Or you can back up to external disks connected with Firewire or USB.
  • Disk-based installations: You can use the file driver to back up onto a set of virtual tapes hosted on several hard disks or a RAID system. Combined with another Amanda configuration that dumps the virtual tapes to real tapes, you can provide reliable backup with faster tapeless recovery. This is often called "disk-to-disk-to-tape" backup.

Please be sure to understand the differences between holding disks and virtual tapes. The two serve different purposes: holding disks allow multiple disklist entries (DLEs) to be backed up in parallel, while virtual tapes are a replacement for physical tapes.

The virtual tapes are also called "vtapes" in this document.

Disk requirements

Before beginning you will need to decide which parts of your hard disks to dedicate to virtual tape storage. While this space could be spread among several file systems and hard disks, I recommend dedicating at least a specific partition, or better a specific physical hard disk, to the task of keeping your vtapes. Using a dedicated disk avoids competing with other disk traffic and noticeably speeds things up.

The disk space you dedicate for your vtapes should NOT be backed up by Amanda. Also, for performance reasons there should be NO holding disks on the same partition as the vtapes, preferably not even on the same physical drive.

If you only have one hard disk, it will still work, but performance will suffer from the heavy head movement caused by copying data between filesystems on the same disk.

Prepare the filesystem used for the vtapes

Decide where to put your files, create the appropriate partition(s) and filesystem(s) and mount them. In our example we have the dedicated partition hdc1, mounted on /amandatapes for vtape storage. The filesystem must also be capable of creating large files (> 4Gbyte) and must be able to handle symlinks (no vfat).

$ mount
[...]
/dev/hdc1 on /amandatapes type ext3 (rw)
[...] 

Make sure there is space left. Determine the amount of space you will use.

$ df -h /amandatapes
Filesystem      Size  Used  Avail  Use%   Mounted on
/dev/hdc1        20G    0G    20G    0%   /amandatapes 

In our example we have 20GB diskspace left on /amandatapes.

Determine length and number of tapes

The safe calculation

After deciding on the number of vtapes you want to create, evenly allocate the available space among them.

Look at the following rule of thumb: As many filesystems exhibit dramatically reduced performance when they are nearly full I have chosen to allocate only 90% of the available space. So we have:

     (Available Space * 0.9) >= tapelength * tapecycle

This is a very conservative approach to make sure you don't suffer any performance drop due to a nearly full filesystem. As it is uncommon for Amanda to fill an entire tape you may also wish to use more space than that. So you could determine possible combinations of tapelength/tapecycle with the more general formula:

     Available Space >= tapelength * tapecycle

In our example we take the conservative approach, and so we could create the following combinations:

20 GB * 0.9 = 18 GB to use
  • 18 GB = 18 GB * 1
  • 18 GB = 9 GB * 2
  • 18 GB = 6 GB * 3
  • 18 GB = 3 GB * 6
  • 18 GB = ...

Using only one tape is generally considered a bad idea when it comes to backup, so we should use at least 3 tapes (for testing purposes), better 6 or more tapes.

  • 18 GB = 3 GB * 6

so we get the value 3 GB for the tapelength if we want to use 6 tapes.
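The arithmetic above can be sketched as a small shell helper. This is only an illustration of the conservative 90% rule; the function name vtape_length_mb is made up for this example, and it assumes sizes are given in whole megabytes with 1 GB = 1024 MB.

```shell
#!/bin/sh
# vtape_length_mb AVAILABLE_MB TAPECYCLE   (hypothetical helper, not part of Amanda)
# Prints the per-tape length in MB according to the conservative rule
#   (Available Space * 0.9) >= tapelength * tapecycle
vtape_length_mb() {
    avail_mb=$1
    tapecycle=$2
    # Integer arithmetic: take 90% first, then split evenly (rounds down).
    echo $(( avail_mb * 9 / 10 / tapecycle ))
}
```

For the 20 GB example, "vtape_length_mb 20480 6" prints 3072, which is the 3 GB tapelength chosen above, expressed in megabytes.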

The optimistic calculation

You could just as well specify a vtape size that is very large, large enough to hold all dumps on an extreme day. On average however, all the dumps over the cycle should still fit in the available space.

Even if the number of vtapes times the capacity of a single vtape is larger than the total space, you hope that over the complete tapecycle the dumps will not need more than the total capacity. If you do run out of diskspace, Amanda will encounter a tape IO error, just like with a normal tape. However, you will not be able to do a simple "amflush" unless you free up some space on the partition.

Using this strategy, you can run very unbalanced configurations: dumping one very large DLE on one day, compensated by small incrementals on the other days, while still using a simple configuration with one vtape for each run.
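With the optimistic calculation it can be useful to check the free space on the vtape partition before a run. The following is a minimal sketch, assuming a POSIX df and awk; the function names vtape_free_kb and vtape_space_ok are invented for this example and are not Amanda commands.

```shell
#!/bin/sh
# vtape_free_kb DIR: print the free space (in KB) on the filesystem holding DIR.
vtape_free_kb() {
    # -P forces one output line per filesystem, -k reports 1024-byte blocks;
    # column 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# vtape_space_ok DIR MIN_FREE_KB: succeed if at least MIN_FREE_KB is free.
vtape_space_ok() {
    free_kb=$(vtape_free_kb "$1")
    [ "$free_kb" -ge "$2" ]
}
```

A wrapper around amdump could, for example, call "vtape_space_ok /amandatapes 3145728" and warn the operator when less than 3 GB is left. The threshold is an arbitrary value chosen for this illustration.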

Create a tapetype definition

Add a new tapetype definition similar to the following to your amanda.conf. I named my definition "HARD-DISK". Choose whatever name you consider appropriate.

define tapetype HARD-DISK {
    comment "Dump onto hard disk"
    length 3072 mbytes 	# specified in mbytes to get the exact size of 3GB
}

You don't have to specify the parameter speed (as it is commonly listed in tapetype definitions and reported by the program amtapetype). Amanda does not use this parameter right now.

There is also an optional parameter filemark, which indicates the amount of space "wasted" after each tape-listitem. Leave it blank and Amanda uses the default of 1KB.

The tapetype defined above must then be selected with the tapetype parameter in amanda.conf:

 tapetype HARD-DISK

Simple virtual tapes

A virtual tape is implemented as a directory with a subdirectory named "data" in it. Let's create one for our "test" configuration:

# chown amanda:disk /amandatapes
# chmod 750 /amandatapes                       # backups contain secret files!
# su - amanda
$ mkdir -p /amandatapes/test/tape1/data

This tape can be manipulated by the ammt command, a replacement for the system command "mt". The ammt command understands the different output drivers from Amanda:

$ ammt -f file:/amandatapes/test/tape1 status
$ ammt -f file:/amandatapes/test/tape1 rewind

Vtapes are always non-rewinding, just as Amanda needs them. That is why you always need to rewind a vtape when you want to start reading it from the beginning.

Basic writing to a vtape can be done with amdd, a replacement for the system command "dd". Virtual tapes have no real builtin capacity; the upper limit is "diskspace, the final frontier". However, Amanda does obey the size you specify in the tapetype definition of a vtape in amanda.conf. The amdd command can also set an upper limit on the virtual tape size with the -l option:

$ amdd -l 200k if=/dev/urandom of=file:/amandatapes/test/tape1 bs=32k
amdd: write error: No space left on device
8+0 in
6+1 out

The above command writes 200 Kbytes of garbage (6 full blocks of 32k + 1 partial block) on the vtape before it bumps into the end of the virtual tape.

When there is no "data" subdirectory in a vtape, the vtape is "offline". You could burn the contents of the data directory to a CD-R, and store that away. When you want to read it, just mount it as a "data" directory, or even simpler, create a symlink "data" pointing to the mounted cdrom.

$ rm -r /amandatapes/test/tape1/data
$ ammt -f file:/amandatapes/test/tape1 status
file:/amandatapes/test/tape1: status: OFFLINE
$ ln -s /media/cdrom /amandatapes/test/tape1/data
$ ammt -f file:/amandatapes/test/tape1 status
file:/amandatapes/test/tape1: status: ONLINE
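The online/offline rule described above can be mimicked without ammt. The sketch below only tests for a usable "data" entry; it is an illustration of the rule as stated in this document, not how ammt itself is implemented, and the function name vtape_state is made up.

```shell
#!/bin/sh
# vtape_state DIR: print ONLINE if DIR has a "data" entry that is a
# directory or a symlink to one, OFFLINE otherwise.
vtape_state() {
    if [ -d "$1/data" ]; then   # -d follows symlinks
        echo ONLINE
    else
        echo OFFLINE
    fi
}
```

Note that a "data" symlink pointing at a cdrom mount point that is not currently mounted may still count as ONLINE here if the empty mount point directory exists.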

Amanda cannot back up directly to a CD-R, but it can use one as a read-only vtape. The normal way is to make a backup to a vtape and later burn the contents of the data directory to a CD or DVD.

We can use such a simple vtape as a tape device in amanda.conf with a line like:

tapedev "file:/amandatapes/test/tape1"

Each run we point the data symlink to a different directory manually. But read on, this can also be automated.

Virtual tapes with chg-disk

  • To use chg-disk you need to have at least amanda-2.4.4p1-20031202.

The changer script "chg-disk" is specifically written to handle a set of virtual tapes on disk. This script does not need a separate configuration file, like most other changer scripts do. Instead it uses these parameters in amanda.conf:

tpchanger "chg-disk"
changerfile "/home/amanda/test/chg-disk-status"    # status files prefix
tapedev "file:/amandatapes/test/slots"
tapecycle 5
# changerdev is ignored

"Chg-disk" operates the virtual changer by pointing the symlink data to another directory, named slotX, where X is the slot number. The directory tree should look like:

slot_root_dir -|
               |- info
               |- data -> slot1/
               |- slot1/
               |- slot2/
               |- ...
               |- slotN/

"slot_root_dir" is the value of the tapedev parameter, and N is the value of tapecycle in amanda.conf. The changer script uses the value of changerfile as the prefix of some files which store the status of the virtual changer.

We create the virtual slots tree for the chg-disk changer, and set it "online", by creating the "data" symlink:

$ mkdir -p /amandatapes/test/slots
$ cd /amandatapes/test/slots
$ for i in 1 2 3 4 5; do mkdir -p slot$i; done
$ ln -s slot1 data
$ ammt -f file:/amandatapes/test/slots status

The file "info" in the slot_root_dir is created automatically on first use. Do not add a leading zero to the slot number; chg-disk would not understand that. Create as many slots as you have specified as tapecycle in amanda.conf.
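For larger tapecycles the loop above can be wrapped in a small function. This is a sketch only; make_slots is a hypothetical helper name, and it assumes the slot naming described above (slot1 .. slotN, no leading zeros).

```shell
#!/bin/sh
# make_slots ROOT N: create ROOT/slot1 .. ROOT/slotN and point the
# "data" symlink at slot1 unless a "data" entry already exists.
make_slots() {
    root=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        mkdir -p "$root/slot$i" || return 1   # slot names carry no leading zeros
        i=$(( i + 1 ))
    done
    # Leave an existing "data" entry (even a dangling symlink) untouched.
    if [ ! -e "$root/data" ] && [ ! -L "$root/data" ]; then
        ln -s slot1 "$root/data"
    fi
}
```

Run as the amanda user, "make_slots /amandatapes/test/slots 5" would produce the same tree as the commands shown earlier. Because an existing "data" symlink is left alone, running it again does not change which slot is currently loaded.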

And we label the virtual tapes:

$ for i in 1 2 3 4 5; do amlabel test TEST-$i slot $i; done
$ amcheck test

When using Amanda 2.5.0, you can let Amanda label the tapes automatically on first use. Add the parameter label_new_tapes to the amanda.conf file, and give it a template with '%' signs for the number, like:

label_new_tapes "TEST-%%"

The usual warnings about this being dangerous because it will erase non-Amanda tapes do not apply here, because we are pretty sure that all the vtapes are indeed to be used for this purpose.

As always we end with "amcheck test" and solve the issues it complains about. We can verify all the virtual tapes, and load the first tape again, ready for the first amdump run:

$ amtape test show
$ amtape test reset

When the command "amtape config show" cycles through the changer, it leaves the last displayed tape as current. Therefore we reset the changer and load the tape in the first slot again. Remember to do this when you experiment with "amtape config show"; otherwise the sequence of tapes Amanda uses will be out of order with respect to the numbers in the labels.

Virtual tapes with chg-multi

A much older changer script is "chg-multi", which emulates a changer consisting of multiple tape drives. If you have two tape drives of the same model, and hook them to the server, this script allows you to emulate a changer with two slots.

Vtapes are complete tape drives, as far as Amanda is concerned, and you can operate a bunch of these with the chg-multi changer script too:

Create 5 vtapes for our "test" configuration:

for i in 1 2 3 4 5; do mkdir -p /amandatapes/test/tape$i/data; done

Now we have a server with 5 tape drives. They are virtual tapes, but Amanda isn't picky about that.

Change amanda.conf for the "test" configuration to have these values:

tpchanger "chg-multi"
changerfile "chg-multi.conf"       # name of the special configuration file
# tapedev is ignored if present, to avoid confusion, just comment it out
# changerdev is ignored too

The chg-multi script needs more configuration and uses the parameter changerfile in amanda.conf as the name of its own configuration file. Create that file, chg-multi.conf, in the same directory as amanda.conf. Because "chg-multi" is useful in a wide context, we need to specify a lot of parameters that are irrelevant for vtapes:

multieject 0
needeject 0
gravity 0
ejectdelay 0
statefile /home/amanda/test/changerstatus
firstslot 1
lastslot 5
slot 1 file:/amandatapes/test/tape1
slot 2 file:/amandatapes/test/tape2
slot 3 file:/amandatapes/test/tape3
slot 4 file:/amandatapes/test/tape4
slot 5 file:/amandatapes/test/tape5

The statefile parameter in the file chg-multi.conf is now the prefix of some files that hold the status of the emulated changer.
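Writing one slot line per vtape by hand gets tedious with many tapes. The sketch below prints a chg-multi.conf with the same parameters as the listing above; the function name print_chg_multi_conf is invented for this example, and the layout BASE/tapeN is the one used in this document.

```shell
#!/bin/sh
# print_chg_multi_conf BASE N STATEFILE
# Prints a chg-multi.conf for N vtapes stored in BASE/tape1 .. BASE/tapeN.
print_chg_multi_conf() {
    base=$1
    count=$2
    statefile=$3
    # Fixed parameters, copied from the example configuration above.
    cat <<EOF
multieject 0
needeject 0
gravity 0
ejectdelay 0
statefile $statefile
firstslot 1
lastslot $count
EOF
    # One "slot" line per vtape.
    i=1
    while [ "$i" -le "$count" ]; do
        echo "slot $i file:$base/tape$i"
        i=$(( i + 1 ))
    done
}
```

Redirect the output to the file named by changerfile, for example "print_chg_multi_conf /amandatapes/test 5 /home/amanda/test/changerstatus > chg-multi.conf".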

And we label the tapes in all the slots:

$ for i in 1 2 3 4 5; do amlabel test TEST-$i slot $i; done

Or use the label_new_tapes parameter in amanda.conf (if you have at least Amanda 2.5.0) to do this automatically, as explained in the chg-disk section above.

Then, as usual, the final checks:

$ amcheck test
$ amtape test show
$ amtape test reset

And we are ready to use our tape changer with 5 tape drives.

Comparison of chg-disk and chg-multi for virtual tapes

The two changers chg-disk and chg-multi have a different approach to the handling of tapes:

  • The script chg-multi handles many drives with a tape in each drive.
  • The script chg-disk handles a library with one drive and multiple tapes.

This implies that chg-disk can drive only one tape at a time, while in the setup with chg-multi, you can always specify one specific tape, and use that one for e.g. restoring, while amdump is using another tape.

While the chg-disk changer is very straightforward to set up, the chg-multi script has a wider range of uses, but is also slightly more complicated to set up. For example, the chg-multi script can be combined with RAIT, making a mirror of the backup to a real tape or to a vtape on an external disk at the same time.

For chg-multi, the underlying filesystem does not need to be able to handle symlinks. You can use plain old VFAT on an external USB disk that is also accessible by Windows. (But you'll need to limit the maximum tape size to 4 Gbyte, or use multi-tape split dumps, available in 2.5.0, to avoid running into the maximum file size limit of VFAT.)

If you don't need the complexity of chg-multi, stay with the easy chg-disk.

Migrating to chg-multi is easy by just moving the slotX directories of the chg-disk vtape tree to the data directories of each vtape in the chg-multi tree.

The vtape directory tree for chg-disk looks like:

    root_dir -|
              |- data -> slot1/
              |- slot1/
              |- slot2/
              |- ...
              |- slotN/

The vtape directory tree for chg-multi looks like:

    root_dir -|
              |- vtape1 --- data
              |- vtape2 --- data
              |- ...
              |- vtapeN --- data

There is only one parent of the "data" subdirectory in a chg-disk tree: it has one virtual tape drive. Chg-disk "inserts" tapes into the drive by manipulating the symlink "data". In chg-multi, each slot is an independent virtual tape drive for a single vtape. Actually, in chg-multi the vtapes do not even need a common parent and can easily be spread over many disks.
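The migration from a chg-disk tree to a chg-multi tree described above could be scripted roughly as follows. This is a sketch under assumptions: migrate_slots is a hypothetical helper, the target directories are named tapeN to match the chg-multi.conf example earlier, and Amanda must not be running while the directories are moved.

```shell
#!/bin/sh
# migrate_slots SRC DST: move SRC/slotN to DST/tapeN/data for every slot
# directory found in the chg-disk tree SRC.
migrate_slots() {
    src=$1
    dst=$2
    for slot in "$src"/slot*; do
        [ -d "$slot" ] || continue            # pattern matched nothing useful
        n=${slot##*/slot}                     # slot number, e.g. "3"
        mkdir -p "$dst/tape$n" || return 1
        if [ -e "$dst/tape$n/data" ]; then
            # Never move a slot into an existing data directory.
            echo "skipping $slot: $dst/tape$n/data already exists" >&2
            continue
        fi
        mv "$slot" "$dst/tape$n/data" || return 1
    done
}
```

After the move, the old "data" symlink in the chg-disk tree points to a directory that no longer exists and can be removed by hand.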

To use chg-disk you need to have at least amanda-2.4.4p1-20031202. Chg-multi is much older.

Copying a vtape to a physical tape

There are times when you want to copy the contents of a vtape to a physical tape, e.g. for offline storage. Some people have even used vtapes on a large disk temporarily while their physical tape drive was in repair. And when the tape drive was connected again, they want to copy the virtual tapes to the real tapes.

Copy the files matching the pattern ?????.* to the tape:

$ cd /amandatapes/test/tape1/data
$ mt -t /dev/nst0 rewind
$ mt -t /dev/nst0 blksize 0
$ mt -t /dev/nst0 compression off
$ for f in ?????.*; do dd if=$f of=/dev/nst0 bs=32k; done

The files having pattern  ?????-* are used internally by the file driver, and should not be put on tape.
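Before writing to a real tape it may be worth checking which files the pattern selects. The following dry-run sketch prints the dd commands instead of running them; print_copy_commands is a made-up name, and the block size is simply the one used in the loop above.

```shell
#!/bin/sh
# print_copy_commands DATADIR TAPEDEV
# Prints one dd command per file matching ?????.* in DATADIR, in the
# sorted order the shell glob produces. Nothing is written to TAPEDEV.
print_copy_commands() {
    datadir=$1
    tapedev=$2
    for f in "$datadir"/?????.*; do
        [ -f "$f" ] || continue     # pattern matched nothing
        echo "dd if=$f of=$tapedev bs=32k"
    done
}
```

For example, "print_copy_commands /amandatapes/test/tape1/data /dev/nst0" lists the copy commands for the first vtape. Files matching ?????-* do not match the glob and are therefore left out. The printed lines are not shell-quoted, so review them before pasting if file names could contain unusual characters.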

Note you can also use a RAIT mirror to write a vtape and a physical tape at the same time.


Holdingdisk and vtapes

Should we use a holdingdisk when the final destination of the backup is a virtual tape on disk?

Usually the answer is "yes"!

A holdingdisk serves a completely different purpose: it allows several clients to be backed up simultaneously while the server collects the incomplete images in the holdingdisk. When a backup image is complete, the server adds it to the tape queue. The taper can then choose the best file, depending on the criteria you set in "taperalgo". "taperalgo largestfit" is especially helpful for fitting as much as possible on a vtape (e.g. when you want to burn the backups to a DVD later).

When you do not specify a holdingdisk, then only one DLE can be dumped at once. No parallelism is possible. And taper cannot use any of its algorithms either.

In the case where you have only one client (e.g. your home PC), then avoiding a holdingdisk could indeed be a reasonable decision. But if your vtapes live on a slow USB-1 connected device, then a holdingdisk might still be faster. In case there are problems with your vtapes (e.g. the external disk with vtapes is full, or got disconnected), then Amanda can still fall back to degraded mode, and dump only incrementals to the holdingdisk.

Recovery

For recovering files from a backup to vtapes with chg-disk or chg-multi, make sure these settings are added in amanda.conf:

amrecover_do_fsf  true
amrecover_check_label  true
amrecover_changer  "changer"

The first line ensures that we do an implicit rewind of the vtape before reading it. Remember: vtapes are always non-rewinding!

With the last line we give the changer a name, which we can use instead of the tape device in amrecover, when starting the command:

# /usr/local/amanda/sbin/amrecover daily -d changer

or from inside amrecover with the settape command, or even:

Extracting files using tape drive file:/BACKUP2/slots/ on host
backupserver.local. Load tape B3_14 now
Continue [?/Y/n/s/t]? t
New tape device [?]: changer
Using tape "changer" from server backupserver.local.
Continue [?/Y/n/s/t]? y

Loading tapes manually

Instead of letting amrecover use the changer, you can also do everything manually.

I will simply paste an amrecover session here (provided by JC Simonetti, author of chg-disk):

# /usr/local/amanda/sbin/amrecover woo
AMRECOVER Version 2.4.4p3. Contacting server on backupserver.local ... 
220 backupserver AMANDA index server (2.4.4p3) ready.
200 Access OK
Setting restore date to today (2004-10-08)
200 Working date set to 2004-10-08.
Scanning /BACKUP2/holding...
Scanning /BACKUP/holding...
200 Config set to woo.
200 Dump host set to backupserver.local.
Trying disk /tmp ...
$CWD '/tmp/RECOVER' is on disk '/tmp' mounted at '/tmp'.
200 Disk set to /tmp.
Invalid directory - /tmp/RECOVER
amrecover> sethost backupserver.local
200 Dump host set to backupserver.local.
amrecover> setdisk /
200 Disk set to /.
amrecover> cd /etc
/etc
amrecover> add passwd
Added /etc/passwd
amrecover> list
TAPE B3_14 LEVEL 0 DATE 2004-09-26
       /etc/passwd
amrecover> extract

Extracting files using tape drive file:/BACKUP2/slots/ on host
backupserver.local. The following tapes are needed: B3_14

Restoring files into directory /tmp/RECOVER
Continue [?/Y/n]? Y

Extracting files using tape drive file:/BACKUP2/slots/ on host
backupserver.local. 
Load tape B3_14 now
Continue [?/Y/n/s/t]? Y
. /etc/passwd
amrecover> quit
200 Good bye.  

Nothing spectacular? The trick is this: When Amanda asks you

Load tape B3_14 now 
Continue [?/Y/n/s/t]?  

you have to run the following in a second terminal:

$ amtape woo slot 14
amtape: changed to slot 14 on file:/BACKUP2/slots/ 

This step is necessary to load the proper tape into your virtual changer. Let me express this in a more general way:

When amrecover prompts for the tape it needs to restore the files you requested, you have to "load" the tape it requests. The recommended way to do this is to use amtape. The options that make sense in this context are:

# amtape
Usage: amtape <conf> <command>
       Valid commands are:
               [...]
               slot <slot #>        load tape from slot <slot #>
               [...]
               label <label>        find and load labeled tape
               [...] 

If you know which slot contains the requested tape (for example, if you have tape daily01 in slot 1, tape daily02 in slot 2, and so on) you may use the first option. If you just know the label of the tape you need, use the second option.

To continue the example above:

amtape woo slot 14 	        # option 1 OR
amtape woo label B3_14 	# option 2 

amtape will return something like:

amtape: label B3_14 is now loaded.  

After this you can return to your amrecover-session and continue restoring your files.

Redundancy

Putting all your vtapes on a single device makes them much more vulnerable to massive data loss. Even on an expensive RAID device things can go wrong, resulting in a completely unusable filesystem, and then you lose all the vtapes. Without RAID, such accidents are even more likely.

Consider using RAIT to mirror the dumps to disk + tape or to disk + external disk.

Or do the "daily" backup to vtapes mixing fulls and incrementals as usual for the "normal" restores, and run a weekly "archive" backup to tape with only full dumps and store that tape offsite.

Of course it all depends on the cost of losing the vtapes, versus the investment to make redundant copies.