How To:Fill tapes to 100%

From The Open Source Backup Wiki (Amanda, MySQL Backup, BackupPC)

This article is a part of the How Tos collection.

Problem

> Hi all! I'm stuck while reading my today Amanda report: 27 Gb backed
> up today that did not fit in 40 Gb (2 tapes of 20 Gb)! Well... It's
> normal, since: nearly all data were onto tape 1 (about 7 Gb). Another
> DLE tried to go on this tape, but too large (14 Gb). So it took the
> next tape and went on it. And the last DLE, that was around 7 Gb, did
> not fit onto this tape, and failed (I configured Amanda to use 2
> tapes).
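
The arithmetic behind this report can be replayed with a small Python sketch. It is only an illustration, not Amanda's code: the function name `tape_in_order` is hypothetical, the sizes are the rounded figures from the message (7, 14 and 7 GB, which add up to 28 rather than the reported 27), and it assumes images are taped strictly in arrival order without splitting.

```python
TAPE_GB = 20   # capacity of one tape, from the report above
RUNTAPES = 2   # number of tapes Amanda may use in one run


def tape_in_order(sizes_gb, tape_gb=TAPE_GB, runtapes=RUNTAPES):
    """Tape images in arrival order; an image that does not fit on the
    current tape moves to the next one, and fails if no tape is left."""
    tapes = [[]]
    failed = []
    for size in sizes_gb:
        if sum(tapes[-1]) + size > tape_gb:
            if len(tapes) == runtapes:
                failed.append(size)   # no more tapes allowed for this run
                continue
            tapes.append([])          # move on to the next tape
        tapes[-1].append(size)
    return tapes, failed


# Order from the report: 7 GB, then 14 GB, then 7 GB.
print(tape_in_order([7, 14, 7]))   # ([[7], [14]], [7])
# Same images, but both 7 GB images taped before the 14 GB one.
print(tape_in_order([7, 7, 14]))   # ([[7, 7], [14]], [])
```

The same three images fit on two tapes once the two small ones share a tape, which is what the techniques below try to achieve.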

Delay writing to tape

Amanda 2.6.0 and later have parameters that delay writing to tape until enough data has accumulated on the holding disk. See the parameters flush-threshold-dumped and flush-threshold-scheduled in amanda.conf(5).

Assuming the holdingdisk is large enough, you can add the parameter:

 flush-threshold-dumped 100

This delays writing to tape until enough data to fill one whole tape has accumulated on the holding disk. Such a collection of backup images is much more likely to contain a good mix of large and small images, which the largestfit algorithm (see below) needs to work optimally.
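
As a rough mental model, the threshold can be read as a percentage of one tape's capacity. The Python sketch below only illustrates that reading; the function name `may_start_taping` and the sizes are hypothetical, and Amanda's real decision involves more state than this.

```python
def may_start_taping(held_gb, tape_gb, threshold_pct):
    """True once the data on the holding disk reaches threshold_pct
    percent of one tape's capacity (illustrative model only)."""
    return held_gb * 100 >= tape_gb * threshold_pct


# With a threshold of 100 and a 20 GB tape, 14 GB of finished dumps
# is not yet enough to start taping, but 21 GB is.
print(may_start_taping(14, 20, 100))   # False
print(may_start_taping(21, 20, 100))   # True
# With a threshold of 0, taping may start as soon as anything is ready.
print(may_start_taping(1, 20, 0))      # True
```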

See How To:Delay writing to tape for better tape utilization.

Splitting images across tapes

Amanda version 2.5.0 and later can also split an image across tapes, at the expense of making restores without Amanda tools a little more difficult.

See How To:Split Dumps Across Tapes.

Interaction between taperalgo and dumporder

Since Amanda version 2.4.3, the amanda.conf(5) parameters taperalgo and dumporder can be used to tune how tapes are filled.

In my config I have set the options:

 inparallel  10                # the number of dumpers
 taperalgo   largestfit
 dumporder   "TTTTTTTTTT"      # as many T's as there are dumpers
 runtapes    3

and the first 2 of my 3 tapes are filled to nearly 100%!

I chose dumporder "T" (longest dump time first) because that way the slowest computer finishes sooner. With a different mix of slow and fast computers and small and large filesystems, "S" (largest size first) might improve the tape usage in some boundary cases.

The taperalgo setting works well only when there are many images to choose from. Taperalgo chooses only from those images that have finished dumping.
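
Why the number of finished images matters can be seen in a minimal Python sketch of the largestfit idea: pick the largest finished image that still fits on the current tape. This is not Amanda's implementation; the names `largest_fit` and `fill_tape` and the sizes are hypothetical, the pool of finished images is assumed to be fixed, and what Amanda does when nothing fits is left out.

```python
def largest_fit(ready_gb, tape_left_gb):
    """Return the largest finished image that fits in the remaining
    tape space, or None when nothing fits."""
    fitting = [size for size in ready_gb if size <= tape_left_gb]
    return max(fitting) if fitting else None


def fill_tape(ready_gb, tape_gb):
    """Apply largest_fit repeatedly to a fixed pool of finished images.
    Returns (images written to this tape, images left in the pool)."""
    pool = list(ready_gb)
    written = []
    left = tape_gb
    while True:
        pick = largest_fit(pool, left)
        if pick is None:
            break
        pool.remove(pick)
        written.append(pick)
        left -= pick
    return written, pool


# Five finished images to choose from: the 20 GB tape ends up with 19 GB.
print(fill_tape([7, 14, 7, 3, 2], 20))   # ([14, 3, 2], [7, 7])
# Only one finished image: there is nothing to optimize.
print(fill_tape([2], 20))                # ([2], [])
```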

For an easy explanation, let's assume there are 10 dumpers (inparallel 10), that the largest dumps start first (dumporder "SSSSSSSSSS"), and that all the file systems on all the clients have the same dump rate.

The first dump to finish is the smallest of those 10. Taperalgo largestfit has only one image to choose from and starts taping it. While it is taping this image, the next dump to finish is the next smallest one, and so on. By the time taper has finished its first image, it can choose between, let's say, 5 images. That is when taperalgo largestfit can optimize the tape usage.

The default dumporder is "tttTTTTTTT". When there are a few large dumps and many smaller ones, all the small dumps may finish before the first large one arrives. The small dumps then end up at the beginning of the tape and the large dumps at the end. If a large dump hits end of tape (EOT), it has to start all over again on the next tape, and that can waste a lot of tape.
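
The cost of hitting EOT can be put into numbers with a short Python sketch. It assumes images are not split across tapes, so everything written before EOT is lost; the function name `tape_lost_at_eot` and the sizes are hypothetical.

```python
def tape_lost_at_eot(tape_gb, written_gb, image_gb):
    """Tape space wasted when an unsplit image is started on a tape
    that does not have enough room left for it."""
    free_gb = tape_gb - written_gb
    if image_gb <= free_gb:
        return 0        # the image fits, nothing is wasted
    return free_gb      # the partial write fills the rest of the tape


# 8 GB of small dumps already on a 20 GB tape, then a 14 GB dump:
# the remaining 12 GB are written, EOT is hit, and the dump restarts.
print(tape_lost_at_eot(20, 8, 14))   # 12
# With only 4 GB of small dumps on the tape, the 14 GB dump fits.
print(tape_lost_at_eot(20, 4, 14))   # 0
```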

Because it's easier for Amanda to shuffle many smaller pieces than a few large ones, I also break up my very large filesystems into a few smaller ones.

When all the smaller dumps finish too early, decreasing the number of dumpers (inparallel) helps.