Results missing

From The Open Source Backup Wiki (Amanda, MySQL Backup, BackupPC)

This article is a part of the Troubleshooting collection.

Problem

amdump reports

results missing

for a DLE

Solution

This error message can have many causes; what follows is a useful checklist to diagnose it.

Amcheck passes all tests?

First run amcheck(8) and resolve every issue it reports. (Messages marked NOTE do not necessarily need action.)

Timeout during estimate?

If the estimates take too long, the Amanda server may have already given up waiting for that client. Find out how long the estimate took on the client. Locate the debug file for sendsize on that client in the AMANDA_DBGDIR directory. The last line of the file contains a timestamp. Also count how many disklist entries (DLE) there are for this host.

$ tail -1 sendsize.20051206000412.debug
 sendsize: time 256.797: pid 16014 finish time Tue Dec  6 00:08:29 2005

$ grep 'estimate time'  sendsize.20051206000412.debug
sendsize[16017]: estimate time for /space level 0: 0.653
sendsize[16017]: estimate time for /space level 1: 0.007
sendsize[16016]: estimate time for / level 0: 217.741
sendsize[16016]: estimate time for / level 1: 25.902
sendsize[16038]: estimate time for /boot level 0: 0.044
sendsize[16038]: estimate time for /boot level 1: 0.007
sendsize[16041]: estimate time for /var level 0: 11.492
sendsize[16041]: estimate time for /var level 2: 1.037

The Amanda server waits etimeout seconds (default 300 s) multiplied by the number of disklist entries for that host. In the above example that would be 4 * 300 = 1200 seconds. (The above estimate took 257 seconds, so it is well within the limit.)

You may need to raise the etimeout value in amanda.conf(5). Take into account that Amanda can request estimates for up to 3 dump levels (level 0, level N, and level N+1) for each disklist entry.
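The comparison described above can be sketched in shell. The function names sum_estimate_times and estimate_budget are made up for this illustration and are not part of Amanda. The sketch assumes the sendsize debug file uses the "estimate time for ... : SECONDS" line format shown in the listing.

```shell
# Sum all "estimate time" values (seconds) in a sendsize debug file.
# Estimates for different disks may run in parallel, so treat the sum
# as a rough upper bound on the elapsed time, not an exact figure.
sum_estimate_times() {
    awk '/estimate time for/ { total += $NF }
         END { printf "%.3f\n", total }' "$1"
}

# Server-side wait budget as described above: etimeout multiplied by
# the number of disklist entries for the host (both whole numbers).
estimate_budget() {
    echo $(( $1 * $2 ))
}

# Example (file name taken from the listing above):
#   sum_estimate_times sendsize.20051206000412.debug
#   estimate_budget 300 4
```

If the summed estimate time approaches the budget, that host is a candidate for a larger etimeout or a faster estimate method.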

Investigate why some of the estimates take so long. Using gnutar on a filesystem with many small files can take a very long time. A non-responsive NFS server can also block gnutar for a long time when it traverses the tree and does a stat() on that mountpoint (only after the stat() can gnutar know that it would cross a filesystem boundary).

Since Amanda version 2.4.5 you can also use faster but less accurate methods for the estimates with a parameter for the dumptype (see amanda.conf(5)):

estimate client|calcsize|server
client
Use the same program as the dumping program. This is the most accurate way to do estimates, but it can take a long time. This is the default.
calcsize
Use a faster program to do estimates, but the result is less accurate.
server
Use only statistics from the previous run to give an estimate. It takes only a few seconds, but the result is not accurate if your disk usage changes from day to day.

Choose a faster method for a particularly large and slow partition, e.g.:

disklist:

 myhost.example.com   /cvsrep  {
         comp-user-tar
         estimate calcsize
       }

If using auth=bsd or auth=bsdudp

Timeout due to firewall?

If there is a firewall between (or on!) the client and server, the reply UDP packet may be dropped by the firewall. Since UDP is connectionless, firewalls have a difficult time identifying an incoming packet as a reply to a packet sent earlier. Some firewalls require replies to arrive within as little as 40 seconds, which is reasonable for protocols like DNS, but not for Amanda.

When using the iptables ip_conntrack_amanda kernel module in the firewall, you may need to adjust the master_timeout parameter for this module. The master_timeout is the time that the module keeps connections tracked on the 10080/udp port.

In /etc/modprobe.conf:

options  ip_conntrack_amanda  master_timeout=3600

A manual test for UDP timeouts can be simulated with the netcat utility. On the Amanda client first make sure nothing is already listening on port 10080 (modify (x)inetd.conf). Then start on the Amanda client:

 $ nc -vv -u -l -p 10080                       # older versions of nc (e.g. 1.10)
 $ nc -vv -u -l 10080                          # recent versions of nc (e.g. 1.84)

Next on the Amanda server:

 $ nc -vv -u theclient.example.com 10080

What you type on one side should appear on the other. The firewall is probably configured to pass replies only for traffic initiated from the server, so first type something on the server, then on the client. Let the connection sit idle for a while (longer than the suspected firewall timeout), then type something again on the client and verify that the server still receives it.

UDP packet too large?

One possible reason is that you have requested too many disklist entries for a single host. Because Amanda sends the estimate request in a single UDP packet, it is possible for the estimate request or reply to exceed the maximum packet size. Fixing this problem is surprisingly difficult. This issue can come up even if things were working initially, because the number of estimates taken increases as full and level-1 dumps are done.

If using Amanda 2.5.1 or later, the best and simplest way to address this problem is to switch to bsdtcp, rsh or ssh (see amanda-auth(7)), none of which have this issue.

If you have to stick with the UDP packet transport, then here are some solutions to work around the problem:

By default, Amanda sends UDP packets of no more than 32 KB. If your system supports it, you can increase this to 64 KB by editing the source.

FreeBSD 5.1 and later, and Mac OS X 10.4 and later, limit the UDP packet size to 9216 bytes by default. That limit can be raised to 64 KB with the command:

# sysctl net.inet.udp.maxdgram=65535

or automatically when rebooting by adding to the file /etc/sysctl.conf this line:

net.inet.udp.maxdgram=65535

Other OSes may use different commands to increase the maximum UDP packet size. You'll have to look in the man pages for that OS.

Sometimes buggy UDP implementations garble packets exceeding the MTU size (1500 bytes on Ethernet).

You can see the UDP reply packet in the amandad.datetime.debug file in the AMANDA_DBGDIR directory on the client:

...
amandad: time 0.000: got packet:
--------
Amanda 2.4 REQ HANDLE 003-B88F0708 SEQ 1133823863
SECURITY USER amanda
SERVICE sendsize
OPTIONS features=fffffeff9ffe7f;maxdumps=2;hostname=myclient.example.com;
GNUTAR /space 0 1970:1:1:0:0:0 1 OPTIONS |;bsd-auth;compress-fast;index;
GNUTAR /space 1 2005:11:26:0:46:56 1 OPTIONS |;bsd-auth;compress-fast;index;
GNUTAR /var 0 1970:1:1:0:0:0 1 OPTIONS |;bsd-auth;compress-fast;index;
GNUTAR /var 1 2005:12:3:0:41:14 1 OPTIONS |;bsd-auth;compress-fast;index;
GNUTAR /boot 0 1970:1:1:0:0:0 1 OPTIONS |;bsd-auth;compress-fast;index;
GNUTAR /boot 1 2005:11:29:2:11:9 1 OPTIONS |;bsd-auth;compress-fast;index;
GNUTAR / 0 1970:1:1:0:0:0 1 OPTIONS |;bsd-auth;compress-fast;index;
GNUTAR / 1 2005:12:1:0:27:38 1 OPTIONS |;bsd-auth;compress-fast;index;
GNUTAR / 2 2005:12:3:0:26:8 1 OPTIONS |;bsd-auth;compress-fast;index;
--------
...

The UDP packet is the text between the lines with hyphens. Copy and paste the text into a file, and count the characters with "wc". Add 20 bytes for the IP header plus 8 bytes for the UDP header. This is the size of the UDP packet.
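The manual copy-and-count step can also be scripted. The following is a minimal sketch: extract_packet and udp_packet_size are hypothetical helper names, and the result is approximate because it assumes the delimiter lines consist only of hyphens, an IPv4 header without options, and that the debug file shows the payload byte for byte.

```shell
# Print the payload of the first packet found in an amandad debug file,
# i.e. the lines between the first pair of all-hyphen delimiter lines.
extract_packet() {
    awk '/^-+$/ { n++; next } n == 1 { print }' "$1"
}

# Payload bytes plus 20 (IPv4 header, no options) plus 8 (UDP header).
udp_packet_size() {
    payload=$(extract_packet "$1" | wc -c)
    echo $(( payload + 28 ))
}

# Example, with datetime standing for the timestamp in the file name:
#   udp_packet_size amandad.datetime.debug
```

Compare the result with the limits discussed above (the operating system's maximum datagram size and Amanda's own packet size limit).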

A network trace of the traffic on port 10080/udp (with snoop on Solaris, tcpdump, ethereal...) may help to investigate this further.

Note that Amanda asks for 1, 2 or even 3 different dump levels for each disklist entry. A run on a different day can produce more or fewer lines, so the UDP packet size limit may be hit on some days only. (After an "amadmin config force myclient.example.com", Amanda requests far fewer estimates, possibly allowing the packet delivery to succeed.)
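To see which disklist entries contribute the most lines on a given day, the request lines can be counted per disk. This is a sketch only: count_estimate_requests is a made-up name, it matches only GNUTAR lines as in the sample packet above, and it assumes disk names contain no spaces.

```shell
# Count how many estimate requests (one per dump level) each disk has in
# a request packet saved to a file. Only GNUTAR lines are matched here;
# adjust the pattern if your dumptypes use another program.
count_estimate_requests() {
    awk '$1 == "GNUTAR" { n[$2]++ }
         END { for (d in n) print n[d], d }' "$1" | sort -k2
}

# Example, assuming the packet text was saved as packet.txt:
#   count_estimate_requests packet.txt
```

Each output line is a count followed by the disk name, so entries with 3 requests stand out immediately.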

If you use many include/exclude directives in the dumptypes, find out which takes fewer bytes to communicate: a pathname on the client with "include list", or a few "include file append" directives in the dumptype itself.

One possible work-around is to shorten the pathnames of the directories, so that more requests fit in the UDP packet. You may create short-named links in some directory or exclude files closer to the root (/) so as to reduce the length of names. For example, instead of backing up /usr/home/foo and /usr/home/bar, create the following links:

  /.foo -> /usr/home/foo
  /.bar -> /usr/home/bar

then list /.foo and /.bar in the disklist.
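Creating such a link and checking how much it saves can be sketched as follows. The helper name make_short_link is hypothetical. The number it prints is only the difference in pathname length, which is roughly the saving per request line that mentions the disk.

```shell
# Create a short-named symlink for a long backup path and print how
# many characters shorter the link name is than the target path.
# Usage: make_short_link TARGET LINK
make_short_link() {
    ln -s "$1" "$2" || return 1
    echo $(( ${#1} - ${#2} ))
}

# Example with the paths used above (run as a user allowed to write to /):
#   make_short_link /usr/home/foo /.foo
```

For /usr/home/foo and /.foo the difference is 8 characters per request line, and it applies once for every dump level requested for that disk.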

Another approach is to group sub-directories into backup sets, instead of backing them all up separately. For example, create /usr/home/.bkp1 and move `foo' and `bar' into it, then create links so that the original pathnames remain functional. Then, list /usr/home/.bkp1 in the disklist. You may create as many `.bkpN' directories as you need.
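The move-and-link step could be scripted roughly like this. group_dirs is a made-up helper, not an Amanda tool. The sketch assumes the directories are not in use while they are moved and that everything lives on one filesystem.

```shell
# Move the named subdirectories of PARENT into PARENT/GROUP and leave
# relative symlinks behind so the original pathnames keep working.
# Usage: group_dirs PARENT GROUP NAME...
group_dirs() {
    parent=$1; group=$2; shift 2
    mkdir -p "$parent/$group" || return 1
    for name in "$@"; do
        mv "$parent/$name" "$parent/$group/$name" || return 1
        ln -s "$group/$name" "$parent/$name" || return 1
    done
}

# Example with the names used above:
#   group_dirs /usr/home .bkp1 foo bar
```

The links are relative (for example foo -> .bkp1/foo), so they stay valid if the parent directory is later mounted under a different path.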

A simpler approach that may work for you is to back up only a subset of the subdirectories of a filesystem separately. The others can be backed up together with the root of the filesystem, using an exclude list that prevents duplicate backups.