How To:Backup to Amazon S3

From The Open Source Backup Wiki (Amanda, MySQL Backup, BackupPC)
Revision as of 23:01, 21 October 2010 by Systems Joe (talk | contribs) (autolabel)

This article is a part of the How Tos collection.

The S3 Device performs backups to Amazon's S3, a private, affordable, and reliable off-site data storage service. If you have a relatively small amount of very important data, then this service may be ideal. WAN-based backups by definition traverse a relatively low-bandwidth link, so large backup sets may not be practical. Remember that, as with all remote storage systems, data sent to S3 can be accessed by others if it is sent unencrypted. See "SSL" below, or How To:Set up data encryption for Amanda's encryption options and procedures.

Note that the S3 device interfaces with the Device API, and as such is only available in Amanda-2.6.0 and later.

Before You Start

Familiarize yourself with S3 at http://amazon.com/s3, and sign up for the service. You will receive an access (public) key and a secret key. In this document, we will use the access key '1ATXQ3HHA59CYF1CVS02' and the secret key '09dfma0928m0sd9f8m-adf/asdf098asdf'.

Figure out about how much data you'll be backing up per run (the tapetype length), and how many tapes you want (tapecycle). Calculate the costs to transfer and store that much data, to avoid any surprises. In the example below, we'll assume a tapecycle of 10.
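The sizing arithmetic above can be sketched in a few lines of shell. This is only a rough illustration: the function names are hypothetical, and the prices passed in the example call are placeholders, not Amazon's actual rates, so substitute the current figures from Amazon's price list.

```shell
#!/bin/sh
# Rough S3 sizing sketch. Function names are illustrative and the
# prices used in the example call are placeholders, not real rates.

# Total data held on S3 once every tape in the cycle is full, in GB.
# $1 = GB written per run (the tapetype length), $2 = tapecycle
stored_gb() {
    echo $(( $1 * $2 ))
}

# Estimated monthly bill: storage for the full cycle plus uploads.
# $1 = GB per run, $2 = tapecycle, $3 = runs per month,
# $4 = storage price per GB-month, $5 = upload price per GB
monthly_cost() {
    awk -v run="$1" -v cycle="$2" -v runs="$3" -v store="$4" -v xfer="$5" \
        'BEGIN { printf "%.2f\n", run * cycle * store + run * runs * xfer }'
}

# Example: 20 GB per run, tapecycle of 10, 30 runs a month,
# placeholder prices of 0.15 per GB-month stored and 0.10 per GB uploaded.
stored_gb 20 10
monthly_cost 20 10 30 0.15 0.10
```

The estimate ignores per-request charges and restore (download) traffic, both of which would add to the bill.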

New chg-multi (Amanda 3.1 and later)

With Amanda 3.1.0, a new chg-multi device was introduced, listed as "chg-multi:DEVICE-LIST" in [1]. This is the new and recommended way of using the S3 backend.

Configuration

The amanda-S3.conf template is currently (as of 3.1.0) tuned to the old chg-multi. You can use it as a starting point, but leave out its "tapedev" and "changerfile" settings.

For a backup named DailySet1:

amanda.conf:

# amazonaws S3
device_property "S3_ACCESS_KEY" "1ATXQ3HHA59CYF1CVS02"                # Your S3 Access Key
device_property "S3_SECRET_KEY" "09dfma0928m0sd9f8m-adf/asdf098asdf"  # Your S3 Secret Key
device_property "S3_SSL" "YES"                                        # Curl needs to have the S3 Certification Authority (Verisign today)
                                                                      # in its CA list. If the connection fails, try setting this to NO
tpchanger "chg-multi:s3:1ATXQ3HHA59CYF1CVS02-backups/DailySet1/slot-{01,02,03,04,05,06,07,08,09,10}" # Number of tapes in your "tapecycle"
changerfile  "s3-statefile"                                           # Amanda will create this file
tapetype S3

define tapetype S3 {
    comment "S3 Bucket"
    length 10240 gigabytes # Bucket size 10TB
}


We then need to label the tapes, one for each of the 10 slots in our tapecycle:

for i in 1 2 3 4 5 6 7 8 9 10; do amlabel DailySet1 DailySet1-$i slot $i; done;

This should succeed if the parameters in the configuration file are right. If it fails with an SSL error, check curl's CA list, as noted in the S3_SSL comment above.

Next, do a last check:

amdevcheck DailySet1 s3:1ATXQ3HHA59CYF1CVS02-backups/DailySet1/slot-10

This should return:

SUCCESS


Old chg-multi (Amanda 2.6 and later)

Deprecated
The old chg-multi device is deprecated and should not be used for new deployments. Use the new chg-multi instead.

[2]

Configuration

I recommend starting with the template.d/amanda-S3.conf you can find shipped with Amanda 2.6.0. Then, the following will help you configure a changer wrapping S3 with multiple virtual tapes:

amanda.conf:

tapedev "null:" # (device should come from the changer)
device_property "S3_ACCESS_KEY" "1ATXQ3HHA59CYF1CVS02"
device_property "S3_SECRET_KEY" "09dfma0928m0sd9f8m-adf/asdf098asdf"
tpchanger "chg-multi"
changerfile "changer.conf"

changer.conf:

multieject 0
gravity 0
needeject 0
ejectdelay 0
statefile /var/amanda/changer-status
firstslot 1
lastslot 10

slot  1  s3:1ATXQ3HHA59CYF1CVS02-backups/slot-01
slot  2  s3:1ATXQ3HHA59CYF1CVS02-backups/slot-02
slot  3  s3:1ATXQ3HHA59CYF1CVS02-backups/slot-03
slot  4  s3:1ATXQ3HHA59CYF1CVS02-backups/slot-04
slot  5  s3:1ATXQ3HHA59CYF1CVS02-backups/slot-05
slot  6  s3:1ATXQ3HHA59CYF1CVS02-backups/slot-06
slot  7  s3:1ATXQ3HHA59CYF1CVS02-backups/slot-07
slot  8  s3:1ATXQ3HHA59CYF1CVS02-backups/slot-08
slot  9  s3:1ATXQ3HHA59CYF1CVS02-backups/slot-09
slot  10 s3:1ATXQ3HHA59CYF1CVS02-backups/slot-10

Note that we're using the bucket 1ATXQ3HHA59CYF1CVS02-backups, which has our access (public) key as a prefix. This helps to avoid namespace collisions with other users of S3. Also, the above configuration will create files on S3 like: s3:1ATXQ3HHA59CYF1CVS02-backups/slot-01special-tapestart, s3:1ATXQ3HHA59CYF1CVS02-backups/slot-01f00000001-filestart, ..., s3:1ATXQ3HHA59CYF1CVS02-backups/slot-02special-tapestart, ...

I prefer keeping the files for each virtual tape sorted into one directory per "tape":

slot  1  s3:1ATXQ3HHA59CYF1CVS02-backups/DailySet1/0001/
slot  2  s3:1ATXQ3HHA59CYF1CVS02-backups/DailySet1/0002/
slot  3  s3:1ATXQ3HHA59CYF1CVS02-backups/DailySet1/0003/
...
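Typing out one slot line per tape gets tedious for larger tapecycles. As a minimal sketch, a small shell function can print the lines in the directory-per-tape layout shown above. The name print_slots is hypothetical and not part of Amanda; paste its output into changer.conf.

```shell
#!/bin/sh
# Print one changer.conf "slot" line per virtual tape.
# print_slots is an illustrative helper, not an Amanda command.
# $1 = bucket name, $2 = config name (used as a prefix), $3 = number of slots
print_slots() {
    i=1
    while [ "$i" -le "$3" ]; do
        printf 'slot %2d  s3:%s/%s/%04d/\n' "$i" "$1" "$2" "$i"
        i=$((i + 1))
    done
}

# Ten slots for the example bucket and the DailySet1 configuration.
print_slots 1ATXQ3HHA59CYF1CVS02-backups DailySet1 10
```

Remember to keep lastslot in changer.conf in step with the number of slot lines generated.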

If you enable label_new_tapes (autolabel in version 3.1), then there's nothing more to do -- the S3 device will create the bucket and amdump will label the first tape during the first run. Otherwise, proceed to label the tapes as usual:

amlabel MYBACKUPS MYBACKUPS01 slot 1

You can check a tape's status with:

amdevcheck MYBACKUPS s3:1ATXQ3HHA59CYF1CVS02-backups/slot-10
SUCCESS


Notes

Time Sync

Proper S3 authentication depends on your system's clock being fairly accurate. If your clock tends to drift, you may need to install and configure an NTP client, or invoke rdate or the equivalent against a known-good machine.
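One way to catch clock drift before a backup run is to compare the local time against a trusted reference and warn when the difference is too large. The sketch below shows only the comparison. The function names are hypothetical, and the 900-second limit is an assumption based on the roughly 15-minute tolerance S3 is generally documented to allow, so check Amazon's documentation for the current value.

```shell
#!/bin/sh
# Clock-skew check sketch. Function names are illustrative and the
# 900-second limit used below is an assumption, not an Amanda setting.

# Absolute difference, in seconds, between two Unix timestamps.
skew_seconds() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( 0 - d ))
    fi
    echo "$d"
}

# Exit status 0 when the skew is within the allowed limit.
# $1 = local epoch, $2 = reference epoch, $3 = limit in seconds
clock_ok() {
    [ "$(skew_seconds "$1" "$2")" -le "$3" ]
}

# Example with fixed timestamps that are 60 seconds apart.
clock_ok 1287702060 1287702000 900 && echo "clock within tolerance"

# In practice the reference could come from the Date header of an HTTPS
# response (needs curl and GNU date; left commented out here):
# ref=$(date -d "$(curl -sI https://s3.amazonaws.com | sed -n 's/^[Dd]ate: //p' | tr -d '\r')" +%s)
# clock_ok "$(date +%s)" "$ref" 900 || echo "clock skew exceeds limit" >&2
```

A check like this only detects drift. Correcting it still requires NTP or rdate as described above.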

SSL

Setting the "S3_SSL" property as shown above encrypts traffic to and from Amazon, but not the data stored on Amazon itself. The performance hit for this is minimal in comparison to the data-security advantages.