How To:Set Up Virtual Tapes

From The Open Source Backup Wiki (Amanda, MySQL Backup, BackupPC)

Revision as of 21:10, 12 November 2008

Virtual tapes use the file driver to emulate a tape, while storing data on disk in a directory tree. Although there are alternatives available, the recommended changer for virtual tapes is chg-disk.

Requirements

Before beginning, decide which parts of your hard disks to dedicate to virtual tape storage. While this space could be spread among several file systems and hard disks, I recommend dedicating at least a specific partition, or better a specific physical hard disk, to keeping your vtapes. A dedicated disk will speed things up noticeably.

The disk space you dedicate for your vtapes should NOT be backed up by Amanda. Also, for performance reasons there should be NO holding disks on the same partition as the vtapes, preferably not even on the same physical drive.

If you only have one hard disk, this will still work, but performance will suffer from heavy head movement as data is copied between the filesystems.

Length and number of tapes

The next step is to figure out how to divide your disk into tapes. You should have good estimates of the following:

* total amount of data to be backed up
* number of incremental (level 1 and higher) dumps per full (level 0) dump
* size of incremental dumps relative to full dumps (which depends on the rate at which files change on the target system)
* total tapecycle (which dictates the data retention period)
* the projected change in these numbers over the long term

Armed with this information, the first choice to make is whether you want to future-proof the solution, at the cost of more disk space up-front, or design things to optimize current usage of resources. A safe, conservative calculation will give you a system that you can expect to "run itself" for months or years, adjusting appropriately for changes in the environment. The other option, "oversubscribing", provides a system that will utilize available disk resources optimally, but which must be watched carefully to prevent backup failures.

The safe calculation

This calculation divides the available space evenly into tapes, such that the total size of the tapes is equal to the available space.

As a rule of thumb, many filesystems exhibit dramatically reduced performance when they are nearly full, so I have chosen to allocate only 90% of the available space. So we have:

 available space * 0.9 >= tapelength * tapecycle

This is a very conservative approach to make sure you don't suffer any performance drop due to a nearly full filesystem. Depending on your filesystem and other circumstances, you may need to adjust this factor. As it is uncommon for Amanda to fill an entire tape, you may also wish to use more space than that.

Since we are given the available space and tapecycle, the tape length can be calculated as

 available space * 0.9 / tapecycle

In our example

20 GB * 0.9 = 18 GB to use
18 GB / 9 = 2 GB per tape (for a tapecycle of 9)
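
The same arithmetic can be scripted. The following is a minimal sketch, not part of Amanda: the function name tapelength_mb is made up for illustration, and it works in whole megabytes because shell arithmetic is integer-only.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amanda): safe tape length in MB.
# $1 = available space in MB, $2 = tapecycle
tapelength_mb() {
    # Use only 90% of the space, then divide it evenly among the tapes.
    echo $(( $1 * 9 / 10 / $2 ))
}

# 20 GB (20480 MB) and a tapecycle of 9, as in the example above
tapelength_mb 20480 9    # prints 2048
```

The result is a candidate value for the length parameter of the tapetype definition, in mbytes.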

Oversubscription

Oversubscription refers to promising more of a resource than actually exists. In the case of Amanda, this means configuring Amanda to use more (virtual) tape than is actually available.

The formula is then

 available space * oversubscription factor = tapelength * tapecycle

where the oversubscription factor is some number between 1.0 and the tapecycle.

The success of this method assumes that Amanda will not, in general, fill a tape to capacity. This may be the case if:

  • only a few, large DLEs (disklist entries) are defined, so that tape usage is high for days with full dumps, and low on other days (a "lumpy" dump distribution)
  • the total data to be backed up is smaller than the tapesize

Note that Amanda does not know that its space has been oversubscribed, so it cannot make any allowances in the planning of dumps.

Before adopting this strategy, consider how it can go awry. Assume we have 15GB of storage, a tapecycle of 15, and tape length 2GB. This corresponds to an oversubscription factor of 2. While you were away on vacation, disk use on one DLE increased dramatically, and the work of a new team has caused incrementals on other DLEs to grow quite large. Backups for the last 9 days have, as a consequence, been larger than projected, although Amanda has not signaled any error conditions. The tape sizes are now

T01    T02    T03    T04    T05    T06    T07    T08    T09    T10    T11    T12    T13    T14    T15
1.9GB  1.6GB  1.4GB  1.7GB  1.8GB  1.9GB  1.2GB  1.7GB  1.8GB  0.3GB  0.5GB  0.9GB  0.4GB  0.6GB  0.4GB

T11 will be overwritten by tonight's backup, but once it is erased, only 0.4GB are available. Tonight's backup will fail with a short write, leave dumps on the holding disk, and be unable to flush them. This situation will continue at least until T01 comes up for reuse, or until you add more disk space.

An oversubscription factor of 2.0 is very high. The appropriate value depends on the situation, but values in the range 1.1-1.5 are not uncommon.
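
To see what a given factor means for the tape length, the oversubscription formula above can be solved for tapelength. This is only a sketch under stated assumptions: the function name oversub_tapelength_mb is hypothetical, and the factor is passed as a percentage (150 for 1.5) to stay within integer shell arithmetic.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amanda): tape length in MB for a given
# oversubscription factor.
# $1 = available space in MB, $2 = factor in percent (150 means 1.5),
# $3 = tapecycle
oversub_tapelength_mb() {
    echo $(( $1 * $2 / 100 / $3 ))
}

# 15 GB (15360 MB) of storage and a tapecycle of 15
oversub_tapelength_mb 15360 200 15   # factor 2.0, prints 2048
oversub_tapelength_mb 15360 150 15   # factor 1.5, prints 1536
```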

Redundancy

Putting all your vtapes on a single destination makes them much more vulnerable to massive data loss. Even on an expensive RAID device things can go wrong, resulting in a completely unusable filesystem; and then you lose all the vtapes. Without RAID, such accidents are even more likely.

Consider using RAIT to mirror the dumps to disk + tape or to disk + external disk.

Or do the "daily" backup to vtapes mixing fulls and incrementals as usual for the "normal" restores, and run a weekly "archive" backup to tape with only full dumps and store that tape offsite.

Of course it all depends on the cost of losing the vtapes, versus the investment to make redundant copies.

Configuration

Prepare the filesystem used for the vtapes

Decide where to put your files, create the appropriate partition(s) and filesystem(s), and mount them. In our example we have the dedicated partition hdc1, mounted on /amandatapes for vtape storage. The filesystem must also be capable of holding large files (> 4 Gbyte) and must be able to handle symlinks (so not FAT).

$ mount
[...]
/dev/hdc1 on /amandatapes type ext3 (rw)
[...] 

Make sure there is adequate space:

$ df -h /amandatapes
Filesystem      Size  Used  Avail  Use%   Mounted on
/dev/hdc1        20G    0G    20G    0%   /amandatapes 
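
The check against the planned tape length and tapecycle can also be automated. The sketch below is an illustration only: the helper names space_fits and avail_kb are made up, and the 90% margin is the rule of thumb from the safe calculation above. It relies on the POSIX output format of df -Pk, where the fourth column of the second line is the available space in 1 KB blocks.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Amanda).

# space_fits AVAIL_KB TAPELENGTH_MB TAPECYCLE
# Succeeds if 90% of AVAIL_KB covers TAPELENGTH_MB * TAPECYCLE.
space_fits() {
    needed_kb=$(( $2 * $3 * 1024 ))
    [ $(( $1 * 9 / 10 )) -ge "$needed_kb" ]
}

# avail_kb DIR
# Prints the free space of the filesystem holding DIR, in 1 KB blocks.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Example (path from this article; adjust to your setup):
# space_fits "$(avail_kb /amandatapes)" 2048 9 && echo "enough space"
```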

Create a tapetype definition

Add a new tapetype definition similar to the following to your amanda.conf(5). I named my definition "HARD-DISK". Choose whatever name you consider appropriate.

define tapetype HARD-DISK {
    comment "Dump onto hard disk"
    length 3072 mbytes 	# specified in mbytes to get the exact size of 3GB
}

You don't have to specify the parameter speed (as it is commonly listed in tapetype definitions and reported by the program amtapetype). Amanda does not use this parameter right now.

There is also an optional parameter filemark, which indicates the amount of space "wasted" after each tape-listitem. Leave it blank and Amanda uses the default of 1KB.

The tapetype defined above should also be selected by the parameter tapetype in amanda.conf(5):

 tapetype HARD-DISK

Create A Tape

If you're not using a changer, this is how you can create a single vtape:

# chown amanda:disk /amandatapes
# chmod 750 /amandatapes                       # backups contain secret files!
# su - amanda
$ mkdir -p /amandatapes/test/mytape/data

We can use such a simple vtape as a tape device in amanda.conf with a line like:

tapedev "file:/amandatapes/test/mytape"

Set up A Changer

You may use one of the two possible changers:

  • chg-disk (handles a library with one drive and multiple tapes)
  • chg-multi (handles many drives with a tape in each drive)

You'll find a comparison of the two below.

Using chg-disk

The changer script chg-disk is specifically written to handle a bunch of virtual tapes on disk. This script does not need a separate configuration file, like most other changer scripts do. Instead it uses these parameters in amanda.conf(5):

tpchanger "chg-disk"
changerfile "/home/amanda/test/chg-disk-status"    # status files prefix
tapedev "file:/amandatapes/test/slots"
tapecycle 5
# changerdev is ignored

chg-disk operates the virtual changer by pointing the symlink data to another directory, named slotX, where X is the slot number. The directory tree should look like:

  tapedev_dir -|
               |- data -> slot1/
               |- slot1/
               |- slot2/
               |- ...
               |- slotN/

"tapedev_dir" is the value of the tapedev parameter, and N is the value of tapecycle in amanda.conf. The changer script uses the value of changerfile as the prefix of some files which store the status of the virtual changer. The file itself need not exist, but the dumpuser must be able to write to the directory it is in.
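
To make the mechanism concrete, "loading" a slot amounts to repointing the data symlink. The sketch below is not the chg-disk source and is not meant to replace amtape; the function names load_slot and current_slot are invented for this illustration, and readlink is assumed to be available (it is on common Linux and BSD systems).

```shell
#!/bin/sh
# Illustration only (not the chg-disk source): "loading" a slot amounts to
# pointing the "data" symlink at a different slotX directory.

# load_slot TAPEDEV_DIR SLOT_NUMBER
load_slot() {
    [ -d "$1/slot$2" ] || { echo "no such slot: $2" >&2; return 1; }
    # Refuse to touch "data" if it exists but is not a symlink.
    if [ -e "$1/data" ] && [ ! -L "$1/data" ]; then
        echo "$1/data is not a symlink" >&2
        return 1
    fi
    rm -f "$1/data"              # drop the old symlink, if any
    ln -s "slot$2" "$1/data"     # relative link, as in the tree above
}

# current_slot TAPEDEV_DIR
# Prints the directory name that "data" currently points to.
current_slot() {
    readlink "$1/data"
}
```

In normal operation the changer does this for you, and its status files would not reflect a change made by hand.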

We create the virtual slots tree for the chg-disk changer, and set it "online", by creating the "data" symlink:

$ mkdir -p /amandatapes/test/slots
$ cd /amandatapes/test/slots
$ TAPENUMS="1 2 3 4 5" # up to your tapecycle
$ for i in $TAPENUMS; do mkdir -p slot$i; done
$ ln -s slot1 data

Do not add a leading zero to the slot number, as chg-disk would not understand that. Create as many slots as you have specified as tapecycle in amanda.conf.
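
If the tapecycle is large, or you want to repeat the setup for several configurations, the steps above can be wrapped in a small function. This is a sketch only: make_slots is a hypothetical name, not an Amanda tool, and it takes the tapedev directory and the tapecycle as arguments.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amanda): create slot1..slotN plus the
# "data" symlink under the given directory.
# make_slots TAPEDEV_DIR TAPECYCLE
make_slots() {
    mkdir -p "$1" || return 1
    i=1
    while [ "$i" -le "$2" ]; do
        mkdir -p "$1/slot$i" || return 1   # no leading zeros in the slot number
        i=$(( i + 1 ))
    done
    # Put the changer "online" by pointing data at the first slot,
    # unless the symlink already exists.
    [ -L "$1/data" ] || ln -s slot1 "$1/data"
}

# Example (path from this article; run as the Amanda user):
# make_slots /amandatapes/test/slots 5
```

Because it uses mkdir -p and leaves an existing data symlink alone, running it again with a larger tapecycle only adds the missing slots.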

If you are unsuccessful in creating a link in the last step given the file system you are using, please see How To:Backup to Virtual Tapes on a Windows Server.

Check the result (amdevcheck is only available in Amanda-2.6.0):

$ amdevcheck DailySet1 file:/amandatapes/test/slots
VOLUME_UNLABELED
SUCCESS

And we label the virtual tapes:

$ for i in $TAPENUMS; do amlabel DailySet1 Tape-$i slot $i; done
$ amcheck DailySet1

(substitute the name of your configuration for DailySet1, and of your tapes for Tape)

When using Amanda 2.5.0 and later, you can let Amanda label the tapes automatically on first use. Add the parameter label_new_tapes to the amanda.conf file, and give it a template with '%' signs for the number, like:

label_new_tapes "Tape-%%"

The usual warnings about this being dangerous because it will erase non-Amanda tapes do not apply here, because we are pretty sure that all the vtapes are indeed to be used for this purpose.

As always, we end with "amcheck DailySet1" and solve the issues it complains about. We can verify all the virtual tapes, and load the first tape again, ready for the first amdump run:

$ amtape DailySet1 show
$ amtape DailySet1 reset

When the command "amtape DailySet1 show" cycles through the changer, it leaves the last displayed tape as current. Therefore we reset the changer and load the tape in the first slot again. Remember to do this whenever you experiment with "amtape config show"; otherwise the sequence of tapes Amanda uses will be out of order with respect to the numbers in the tape labels.

Using chg-multi

A much older changer script is "chg-multi", which emulates a changer consisting of multiple tape drives. If you have two tape drives of the same model, and hook them to the server, this script allows you to emulate a changer with two slots.

Vtapes are complete tape drives, as far as Amanda is concerned, and you can operate a bunch of these with the chg-multi changer script too:

Create 5 vtapes for our "test" configuration:

for i in 1 2 3 4 5; do mkdir -p /amandatapes/test/tape$i/data; done

Now we have a server with 5 tape drives. They are virtual tapes, but Amanda isn't picky about that.

Change amanda.conf for the "test" configuration to have these values:

tpchanger "chg-multi"
changerfile "chg-multi.conf"       # name of the special configuration file
# tapedev is ignored if present, to avoid confusion, just comment it out
# changerdev is ignored too

The chg-multi script needs more configuration and uses the parameter changerfile in amanda.conf as the name of its own config file. Add a file named chg-multi.conf, the name specified above as changerfile, in the same directory as amanda.conf. Because "chg-multi" is useful in a wide context, we need to specify a lot of parameters that are irrelevant for vtapes:

multieject 0
needeject 0
gravity 0
ejectdelay 0
statefile /home/amanda/test/changerstatus
firstslot 1
lastslot 5
slot 1 file:/amandatapes/test/tape1
slot 2 file:/amandatapes/test/tape2
slot 3 file:/amandatapes/test/tape3
slot 4 file:/amandatapes/test/tape4
slot 5 file:/amandatapes/test/tape5

The statefile parameter in the file chg-multi.conf is now the prefix of some files that hold the status of the emulated changer.
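
With more than a handful of vtapes, writing the slot lines by hand gets tedious. The sketch below prints a file with the same parameters as the example above for any number of slots. The function name gen_chg_multi_conf is hypothetical, and the tapeN directory naming is simply the convention used in this article.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amanda): print a chg-multi.conf for N vtapes.
# gen_chg_multi_conf VTAPE_BASE_DIR N STATEFILE
gen_chg_multi_conf() {
    # Parameters that are irrelevant for vtapes but must be present
    printf 'multieject 0\nneedeject 0\ngravity 0\nejectdelay 0\n'
    printf 'statefile %s\n' "$3"
    printf 'firstslot 1\nlastslot %s\n' "$2"
    # One "slot" line per vtape directory
    i=1
    while [ "$i" -le "$2" ]; do
        printf 'slot %s file:%s/tape%s\n' "$i" "$1" "$i"
        i=$(( i + 1 ))
    done
}

# Example: print the file shown above on standard output
# gen_chg_multi_conf /amandatapes/test 5 /home/amanda/test/changerstatus
```

Redirect the output to chg-multi.conf in the configuration directory once it looks right.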

And we label the tapes in all the slots:

$ for i in 1 2 3 4 5; do amlabel test TEST-$i slot $i; done

Or use the label_new_tapes parameter in amanda.conf (if you have at least Amanda 2.5.0) to do this automatically, as explained in the chg-disk section above.

Then, as usual, the final checks:

$ amcheck test
$ amtape test show
$ amtape test reset

And we are ready to use our tape changer with 5 tape drives.

Comparison of chg-disk and chg-multi for virtual tapes

The two changers chg-disk and chg-multi have a different approach to the handling of tapes:

  • The script chg-multi handles many drives with a tape in each drive.
  • The script chg-disk handles a library with one drive and multiple tapes.

This implies that chg-disk can drive only one tape at a time, while in the setup with chg-multi, you can always specify one specific tape, and use that one for e.g. restoring, while amdump is using another tape.

While the chg-disk changer is very straightforward to set up, the chg-multi script has a wider range of uses, but is also slightly more complicated to set up. The chg-multi script can, for example, be combined with RAIT, mirroring the backup to a real tape or to a vtape on an external disk at the same time.

For chg-multi, the underlying filesystem does not need to be able to handle symlinks. You can use plain old VFAT on an external USB disk that is also accessible by Windows. (But you'll need to limit the maximum tapesize to 4 Gbyte, or use multi-tape-split dumps (see also How To:Split Dumps Across Tapes), available in 2.5.0, to avoid running into the maximum file size limit of VFAT.)

If you don't need the complexity of chg-multi, stay with the easy chg-disk.

Migrating to chg-multi is easy by just moving the slotX directories of the chg-disk vtape tree to the data directories of each vtape in the chg-multi tree.

The vtape directory tree for chg-disk looks like:

    root_dir -|
              |- data -> slot1/
              |- slot1/
              |- slot2/
              |- ...
              |- slotN/

The vtape directory tree for chg-multi looks like:

    root_dir -|
              |- vtape1 --- data
              |- vtape2 --- data
              |- ...
              |- vtapeN --- data

There is only one parent of the "data" subdirectory in a chg-disk tree: it has one virtual tape drive. Chg-disk "inserts" tapes into the drive by manipulating the symlink "data". In chg-multi, each slot is an independent virtual tape drive for a single vtape. Actually, in chg-multi the vtapes do not even need a common parent and can easily be spread over many disks.
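
The migration described above can be sketched as a loop over the slotX directories. This is an illustration under stated assumptions, not an Amanda tool: migrate_slots is a made-up name, the target directories follow the vtapeN naming from the tree above, and no Amanda process may be using either tree while it runs.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amanda): move chg-disk slots into a
# chg-multi layout. Assumes no Amanda process is using either tree.
# migrate_slots CHG_DISK_DIR CHG_MULTI_DIR
migrate_slots() {
    for d in "$1"/slot*; do
        [ -d "$d" ] || continue            # glob matched nothing
        n=${d##*/slot}                     # slot number, e.g. "3"
        if [ -e "$2/vtape$n/data" ]; then  # never overwrite an existing vtape
            echo "already exists: $2/vtape$n/data" >&2
            return 1
        fi
        mkdir -p "$2/vtape$n" || return 1
        mv "$d" "$2/vtape$n/data" || return 1
    done
}

# Example (paths are placeholders; adjust to your setup):
# migrate_slots /amandatapes/test/slots /amandatapes/test
```

The old data symlink in the chg-disk directory is left behind and dangles after the move. The slot lines in chg-multi.conf then need to point at the new vtapeN directories.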

To use chg-disk you need to have at least amanda-2.4.4p1-20031202. Chg-multi is much older.

Credit

Based on text by Stefan G. Weichinger, November - December, 2003, with updates in April, 2005.