How To:Set Up Virtual Tapes

From wiki.zmanda.com
Revision as of 05:10, 9 November 2018 by Nathanst (talk | contribs) (expand "new chg-disk:" section (including adding a mention of the auto-create-slot functionality))

Virtual tapes use the file driver to emulate a tape, while storing data on disk in a directory tree. Although there are alternatives available, the recommended changer for virtual tapes is chg-disk.

Requirements

Before beginning, decide which parts of your hard disks to dedicate to virtual tape storage. While this space could be spread among several filesystems and hard disks, I recommend dedicating at least a specific partition, or better, a specific physical hard disk, to keeping your vtapes. A dedicated disk noticeably speeds things up.

The disk space you dedicate for your vtapes should NOT be backed up by Amanda. Also, for performance reasons there should be NO holding disks on the same partition as the vtapes, preferably not even on the same physical drive. However, you should still use a holding disk - see the FAQ entry.

If you have only one hard disk, this will still work, but performance will suffer from heavy head movement as data is copied between filesystems on the same disk.

Length and number of tapes

The next step is to figure out how to divide your disk into tapes. You should have good estimates of the following:

  • total amount of data to be backed up
  • number of incremental (level 1 and higher) dumps per full (level 0) dump
  • size of incremental dumps relative to full dumps (which depends on the rate at which files change on the target system)
  • total tapecycle (which dictates the data retention period)
  • the projected change in these numbers over the long term

Armed with this information, the first choice to make is whether you want to future-proof the solution, at the cost of more disk space up-front, or design things to optimize current usage of resources. A safe, conservative calculation will give you a system that you can expect to "run itself" for months or years, adjusting appropriately for changes in the environment. The other option, "oversubscribing", provides a system that will utilize available disk resources optimally, but which must be watched carefully to prevent backup failures.

The safe calculation

This calculation divides the available space evenly into tapes, such that the total size of the tapes does not exceed the space you allocate.

As a rule of thumb, many filesystems exhibit dramatically reduced performance when they are nearly full, so I have chosen to allocate only 90% of the available space. So we have:

 available space * 0.9 >= tapelength * tapecycle

This is a very conservative approach that ensures you don't suffer a performance drop from a nearly full filesystem. Depending on your filesystem and other circumstances, you may need to adjust this factor. As it is uncommon for Amanda to fill an entire tape, you may also wish to use more space than that.

Since we are given the available space and tapecycle, the tape length can be calculated as

 available space * 0.9 / tapecycle

In our example

20 GB * 0.9 = 18 GB to use
18 GB / 9 = 2 GB per tape (for a tapecycle of 9)
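
The same arithmetic can be scripted. The following is only a sketch: tape_length_mb is a hypothetical helper, not an Amanda tool, and the choice of megabytes and of an integer percentage for the fill factor are assumptions made to keep the shell arithmetic exact.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amanda): "safe" tape length in MB.
# Arguments: available space in MB, tapecycle, optional fill percentage
# (defaults to 90, i.e. the 0.9 factor used above).
tape_length_mb() {
    avail_mb=$1
    tapecycle=$2
    fill_pct=${3:-90}
    # tapelength = available space * fill factor / tapecycle (rounded down)
    echo $(( avail_mb * fill_pct / 100 / tapecycle ))
}

# The example above: 20 GB (20480 MB) and a tapecycle of 9
tape_length_mb 20480 9
```

The result can be used as the length value, in mbytes, of a tapetype definition.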

Oversubscription

Oversubscription refers to promising more of a resource than actually exists. In the case of Amanda, this means configuring Amanda to use more (virtual) tape than is actually available.

The formula is then

 available space * oversubscription factor = tapelength * tapecycle

where the oversubscription factor is some number between 1.0 and the tapecycle.
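
As a rough sketch, the formula can be turned around to give the tape length for a chosen factor. The helper name oversub_tape_length_mb is hypothetical, and expressing the factor as a percentage (150 for 1.5) is an assumption made to stay within integer shell arithmetic.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amanda): tape length in MB under
# oversubscription. The factor is given in percent: 150 means 1.5.
oversub_tape_length_mb() {
    avail_mb=$1
    tapecycle=$2
    factor_pct=$3
    # The factor must lie between 1.0 and the tapecycle
    if [ "$factor_pct" -lt 100 ] || [ "$factor_pct" -gt $(( tapecycle * 100 )) ]; then
        echo "factor must be between 1.0 and the tapecycle" >&2
        return 1
    fi
    # tapelength = available space * factor / tapecycle (rounded down)
    echo $(( avail_mb * factor_pct / 100 / tapecycle ))
}

# Illustrative figures: 15 GB (15360 MB), tapecycle 15, factor 2.0
oversub_tape_length_mb 15360 15 200
```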

The success of this method assumes that Amanda will not, in general, fill a tape to capacity. This may be the case if:

  • only a few, large DLEs (disklist entries) are defined, so that tape usage is high for days with full dumps, and low on other days (a "lumpy" dump distribution)
  • the total data to be backed up is smaller than the tapesize

Note that Amanda does not know that its space has been oversubscribed, so it cannot make any allowances in the planning of dumps.

Before adopting this strategy, consider how it can go awry. Assume we have 15GB of storage, a tapecycle of 15, and tape length 2GB. This corresponds to an oversubscription factor of 2. While you were away on vacation, disk use on one DLE increased dramatically, and the work of a new team has caused incrementals on other DLEs to grow quite large. Backups for the last 9 days have, as a consequence, been larger than projected, although Amanda has not signaled any error conditions. The tape sizes are now

 T01    T02    T03    T04    T05    T06    T07    T08    T09    T10    T11    T12    T13    T14    T15
 1.9GB  1.6GB  1.4GB  1.7GB  1.8GB  1.9GB  1.2GB  1.7GB  1.8GB  0.3GB  0.5GB  0.9GB  0.4GB  0.6GB  0.4GB

T11 will be overwritten by tonight's backup, but once it is erased, only 0.4GB are available. Tonight's backup will fail with a short write, leave dumps on the holding disk, and be unable to flush them. This situation will continue at least until T01, or until you add more disk space.

An oversubscription factor of 2.0 is very high. The appropriate value depends on the situation, but values in the range 1.1-1.5 are not uncommon.

Redundancy

Putting all your vtapes on a single destination makes them much more vulnerable to massive data loss. Even on an expensive RAID device things can go wrong, resulting in a completely unusable filesystem, and then you lose all the vtapes. Without RAID, such accidents are even more likely.

Consider using RAIT to mirror the dumps to disk + tape or to disk + external disk.

Alternatively, run the "daily" backup to vtapes, mixing fulls and incrementals as usual, to serve "normal" restores, and run a weekly "archive" backup of full dumps only to tape, storing that tape offsite.

Of course it all depends on the cost of losing the vtapes, versus the investment to make redundant copies.

Configuration

Prepare the filesystem used for the vtapes

Decide where to put your files, create the appropriate partition(s) and filesystem(s), and mount them. In our example we have the dedicated partition hdc1, mounted on /amandatapes for vtape storage. The filesystem must support large files (> 4 GB) and symlinks, which rules out FAT.

$ mount
[...]
/dev/hdc1 on /amandatapes type ext3 (rw)
[...] 

Make sure there is adequate space:

$ df -h /amandatapes
Filesystem      Size  Used  Avail  Use%   Mounted on
/dev/hdc1        20G    0G    20G    0%   /amandatapes 
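
The check can also be automated. This is a minimal sketch under stated assumptions: enough_space is a hypothetical helper, the tape length and tapecycle passed to it are the illustrative values from the calculation above, and it relies only on the POSIX output format of df -P.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amanda): succeed if DIR has at least
# tapelength * tapecycle megabytes available.
enough_space() {
    dir=$1
    tapelength_mb=$2
    tapecycle=$3
    # df -P prints one line per filesystem; -k reports 1024-byte blocks.
    # The fourth column of the second line is the available space.
    avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] || return 1      # directory missing or df failed
    needed_kb=$(( tapelength_mb * tapecycle * 1024 ))
    [ "$avail_kb" -ge "$needed_kb" ]
}

# Illustrative values: 9 tapes of 2048 MB each on /amandatapes
if enough_space /amandatapes 2048 9; then
    echo "enough space for all vtapes"
else
    echo "not enough space, or /amandatapes is missing"
fi
```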

Create a tapetype definition

Add a new tapetype definition similar to the following to your amanda.conf(5). I named my definition "HARD-DISK". Choose whatever name you consider appropriate.

define tapetype HARD-DISK {
    comment "Dump onto hard disk"
    length 3072 mbytes 	# specified in mbytes to get the exact size of 3GB
}

You don't have to specify the speed parameter, although it is commonly listed in tapetype definitions and reported by the program amtapetype. Amanda does not currently use this parameter.

There is also an optional parameter filemark, which indicates the amount of space "wasted" after each tape-listitem. Leave it blank and Amanda uses the default of 1KB.

The tapetype defined above should also be selected with the tapetype parameter in amanda.conf(5):

 tapetype HARD-DISK

Create A Single Tape

If you're not using a changer, this is how you can create a single vtape. This is a good way to get started with your configuration, but is not very useful for most backup situations.

# chown amanda:disk /amandatapes
# chmod 750 /amandatapes                       # backups contain secret files!
# su - amanda
$ mkdir -p /amandatapes/test/mytape/data

We can use such a simple vtape as a tape device in amanda.conf with a line like:

tapedev "file:/amandatapes/test/mytape"

NOTE: this is a single vtape, which is not very useful - you will generally want to use more than one vtape. Read on!

Set up A Changer

There are currently two disk changers in Amanda, since we are in a period of transition between two changer APIs. The new chg-disk: is recommended when it is available. If you are using a filesystem that does not support symlinks, or do not want all of your vtapes in the same parent directory, chg-multi is your best bet (see How To:Backup to Virtual Tapes on a non-UNIX Filesystem in either case). If you are using a version older than Amanda-2.6.1, the old chg-disk is also a good choice.

See amanda-changers(7) for details on these options.

Using the new chg-disk:

The disk changer, introduced in Amanda-2.6.1, handles access to virtual tapes, and can even marshal simultaneous access to a library of vtapes by multiple Amanda processes. It's easy to set up:

tpchanger "chg-disk:/path/to/vtapes"

where /path/to/vtapes specifies a directory containing some number of slotN directories representing the "slots" in this changer.

There is no need to create anything more than the slot subdirectories, so creating the proper directory structure can be as simple as:

cd /path/to/vtapes
for slot in `seq 1 25`; do mkdir slot$slot; done

Note that the slot directories must be named "slot1", "slot2", ... "slot9", "slot10", etc., and the digits in the directory name cannot be zero padded. (Some versions of the ls command [e.g. the GNU coreutils' version] have a "-v" option that is useful for getting the slot directories to sort in numerical order.)
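
A quick way to catch misnamed slot directories is sketched below. bad_slot_names is a hypothetical helper, not an Amanda command, and the demonstration runs in a temporary directory rather than your real vtape path. It also flags "slot0", on the assumption that numbering starts at slot1 as described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amanda): print entries that start with
# "slot" but are not named slot1, slot2, ... (no zero padding).
bad_slot_names() {
    for path in "$1"/slot*; do
        [ -e "$path" ] || continue      # glob matched nothing
        name=${path##*/}
        case ${name#slot} in
            # empty suffix, leading zero, or any non-digit character
            ''|0*|*[!0-9]*) echo "$name" ;;
        esac
    done
}

# Demonstration in a throwaway directory
demo=$(mktemp -d)
mkdir "$demo/slot1" "$demo/slot2" "$demo/slot03"
bad_slot_names "$demo"
rm -rf "$demo"
```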

When vtape read/write operations are underway, the changer automatically creates "drives" for each slot reservation, named driveN; you do not have to do this manually.

Of course, you can use this device as a component of a larger device, or wrap it in a changer definition so that tools such as amvault(8) can refer to it by name:

define changer "archivedisks" {
  tpchanger "chg-disk:/path/to/vtapes"
}

Note that for Amanda v3.3 and later, you can tell the chg-disk: changer to auto-create the slotN directories (thus avoiding the need to create them manually as shown above):

define changer archivedisks {
  tpchanger "chg-disk:/path/to/vtapes"
  property "num-slot" "21"
  property "auto-create-slot" "yes"
}

See amanda-changers(7) for more info.


You can use a convenient shell script that creates the folder structure and labels your virtual tapes: File:Createvtapes.sh.txt

USAGE:
        ./createvtapes.sh -h
        ./createvtapes.sh <CONFIG_NAME> <TAPENAME>              #  if you know your tapename please provide it here as such
        ./createvtapes.sh <CONFIG_NAME>                         # if you don't know tapename it will show you regexp from config

Example:
        ./createvtapes.sh linux-ws LINUXBACKUP-

Using the old chg-disk

NOTE: This section is only necessary if you're using a version of Amanda older than 2.6.1

The changer script chg-disk has been around since Amanda-2.4.4, and is specifically written to handle a set of virtual tapes on disk. Unlike most other changer scripts, it does not need a separate configuration file. Instead it uses these parameters in amanda.conf(5):

tpchanger "chg-disk"
changerfile "/home/amanda/test/chg-disk-status"    # status files prefix
tapedev "file:/amandatapes/test/slots"
tapecycle 5
# changerdev is ignored

chg-disk operates the virtual changer by pointing the symlink data to another directory, named slotX, where X is the slot number. The directory tree should look like:

  tapedev_dir -|
               |- data -> slot1/
               |- slot1/
               |- slot2/
               |- ...
               |- slotN/

"tapedev_dir" is the value of the tapedev parameter, and N is the value of tapecycle in amanda.conf. The changer script uses the value of changerfile as the prefix of some files that store the status of the virtual changer. The file itself need not exist, but the dumpuser must be able to write to the directory it is in.

We create the virtual slots tree for the chg-disk changer, and set it "online", by creating the "data" symlink:

$ mkdir -p /amandatapes/test/slots
$ cd /amandatapes/test/slots
$ TAPENUMS="1 2 3 4 5" # up to your tapecycle
$ for i in $TAPENUMS; do mkdir -p slot$i; done
$ ln -s slot1 data

Do not add a leading zero to the slot number, as chg-disk would not understand that. Create as many slots as you have specified as tapecycle in amanda.conf.
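
To confirm that the tree matches this layout, a check along the following lines may help. It is only a sketch: check_slots_tree is a hypothetical helper, not an Amanda command, and the demonstration builds its tree in a temporary directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amanda): check an old-style chg-disk tree.
# Succeeds if slot1..slotN exist and "data" is a symlink to a directory.
check_slots_tree() {
    dir=$1
    tapecycle=$2
    i=1
    while [ "$i" -le "$tapecycle" ]; do
        [ -d "$dir/slot$i" ] || { echo "missing slot$i" >&2; return 1; }
        i=$(( i + 1 ))
    done
    [ -L "$dir/data" ] || { echo "data is not a symlink" >&2; return 1; }
    [ -d "$dir/data" ] || { echo "data does not point to a directory" >&2; return 1; }
}

# Demonstration: build a five-slot tree as in the steps above, then check it
demo=$(mktemp -d)
for i in 1 2 3 4 5; do mkdir "$demo/slot$i"; done
ln -s slot1 "$demo/data"
check_slots_tree "$demo" 5 && echo "slot tree looks complete"
rm -rf "$demo"
```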

If you are unsuccessful in creating a link in the last step given the file system you are using, please see How To:Backup to Virtual Tapes on a Windows Server.

Check the result (amdevcheck is only available in Amanda-2.6.0 and later):

$ amdevcheck DailySet1 file:/amandatapes/test/slots
MESSAGE Error loading device header -- unlabeled volume?
VOLUME_UNLABELED
DEVICE_ERROR
VOLUME_ERROR

Labeling the Virtual Tapes

And we label the virtual tapes:

$ for i in $TAPENUMS; do amlabel DailySet1 Tape-$i slot $i; done
$ amcheck DailySet1

(substitute the name of your configuration for DailySet1, and of your tapes for Tape)
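
Before labeling for real, it can be useful to preview the commands the loop would run. The sketch below only prints them; print_label_commands is a hypothetical helper, and the amlabel argument order is copied from the loop above rather than taken from any other source.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amanda): print, without running them,
# the amlabel commands for slots 1..N.
print_label_commands() {
    config=$1
    prefix=$2
    slots=$3
    i=1
    while [ "$i" -le "$slots" ]; do
        echo "amlabel $config $prefix$i slot $i"
        i=$(( i + 1 ))
    done
}

# Preview the labels for a five-slot changer
print_label_commands DailySet1 Tape- 5
```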

When using Amanda 2.5.0 and later, you can let Amanda label the tapes automatically on first use. Add the parameter label_new_tapes to the amanda.conf file, and give it a template with '%' signs for the number, like:

label_new_tapes "Tape-%%"

The usual warnings about this being dangerous because it will erase non-Amanda tapes do not apply here, because we are pretty sure that all the vtapes are indeed meant to be used for this purpose.

As always, we end with "amcheck DailySet1" and resolve any issues it complains about. We can verify all the virtual tapes, and load the first tape again, ready for the first amdump run:

$ amtape DailySet1 show
$ amtape DailySet1 reset

When the command "amtape DailySet1 show" cycles through the changer, it leaves the last displayed tape as current. Therefore we reset the changer to load the tape in the first slot again. Remember to do this whenever you experiment with "amtape config show"; otherwise Amanda will use the tapes in a sequence that does not match the numbers in their labels.

Credit

Based on text by Stefan G. Weichinger, November - December, 2003, with updates in April, 2005.