Selfcheck request failed

From The Open Source Backup Wiki (Amanda, MySQL Backup, BackupPC)
Revision as of 17:44, 13 October 2010 by Dustin

This article is a part of the Troubleshooting collection.

Problem

With an Amanda 2.5.0 or later server, amcheck reports:

 Amanda Backup Client Hosts Check
 --------------------------------
 WARNING: 192.168.15.245: selfcheck request failed: timeout waiting for ACK
 Client check: 1 host checked in 30.097 seconds, 1 problem found

Another variation of the same message:

 WARNING: client.company.com: selfcheck request failed: client.company.com: did not resolve to client.company.com

With an Amanda 2.4.4 or earlier server, the error message is worded differently:

 AMANDA backup client hosts check
 --------------------------------
 Warning: selfcheck request timed out.  Host down?
 Client check: 1 host checked in 30.051 seconds, 1 problem found

Diagnostics

If the Amanda server is running version 2.6.1 or higher, you can use amservice(8) to diagnose network problems, independent of any server configuration. This provides an effective way to divide your troubleshooting search in half: if amservice succeeds but amcheck fails to connect, then the problem is with the server configuration. If amservice fails, then the problem is on the client.

Run the 'noop' service on the client like this:

 $ amservice clienthostname yourauth noop </dev/null
 OPTIONS features=ffffffff9ffeffffffff0f;

where yourauth is the authentication method you use to connect to your client (bsdtcp, ssh, etc.). If you don't see the OPTIONS string, there is a problem with the authentication between the server and the client. If the OPTIONS string does appear, the server is able to run a service on the client, and you will need to investigate why the selfcheck service, which amcheck runs on the client, is failing.

NOTE: If the client runs on a Windows platform, the response looks like this instead:

  OPTIONS features=ff7ffeff9cfeffffd3cf1300;
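The noop test above can be wrapped in a couple of shell functions so the result is interpreted the same way every time. This is a sketch, assuming Amanda 2.6.1 or later on the server with amservice in the PATH; the function names are hypothetical, and the check only looks for the OPTIONS line described above (the Windows variant matches too).

```shell
# Succeeds (exit 0) if the text on stdin contains an "OPTIONS features=" line,
# i.e. the server was able to run a service on the client.
noop_output_ok() {
    grep -q '^OPTIONS features='
}

# Runs "amservice HOST AUTH noop </dev/null" and reports which half of the
# problem space to search next.
amservice_noop_check() {
    host=$1
    auth=$2
    if amservice "$host" "$auth" noop </dev/null 2>&1 | noop_output_ok; then
        echo "$host: amservice noop OK; if amcheck still fails, check the server configuration"
    else
        echo "$host: no OPTIONS string; check authentication between server and client"
        return 1
    fi
}
```

For example: amservice_noop_check client.company.com bsdtcp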

Solution

Although many things can be misconfigured, this amcheck failure is most commonly caused by a client configuration error. The sections below list possible causes and their solutions.

Check that network services (x/inetd) and .amandahosts are configured correctly

Correct xinetd and .amandahosts configurations are described in How To:Configure bsdtcp authentication or, for older authentication methods, How To:Configure Backward-compatible Authentication Methods.

Once you have verified the configuration, work through this checklist:

  • Make sure you have added the Amanda services to /etc/services (or the NIS services map).
  • Make sure x/inetd has been signalled to reread its configuration (some systems may need a reboot), for example:
      /etc/init.d/xinetd reload
  • Check the inetd man page for differences between the standard inetd.conf format and the one on your system. For example, with openbsd-inetd you must repeat 'amandad' as the first argument (argv[0]).
  • Pay special attention to typos in inetd.conf; if the amandad program name is mistyped, error messages will probably appear in /var/adm/messages or /var/log/messages.
  • If you build the Amanda binaries yourself, make sure the dump user specified at configure time (--with-user=USERNAME) is the user listed in the (x)inetd configuration file.
  • Check whether the dump user has permission to run amandad, and to load the shared libraries amandad depends on, by running the amandad command from the inetd configuration by hand, as the Amanda user. It should just time out after 30 seconds waiting for a UDP packet. If you type anything, it aborts immediately, because it cannot read a UDP packet from the keyboard.
  • If the only_from parameter is specified in the xinetd configuration, it must include the Amanda server.
  • Verify, if applicable, that xinetd or inetd is running, for example by executing
      ps -ef | grep inetd
    If it is not running, start it manually, for example
      /etc/init.d/xinetd start
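For the /etc/services item in the checklist above, a small shell sketch that looks for the amanda entries in a services file. It assumes the conventional port 10080 and only reads a local file, so it does not consult an NIS services map; bsdtcp needs the tcp entry, while the older UDP-based methods need the udp one.

```shell
# Usage: check_amanda_services [FILE]   (FILE defaults to /etc/services)
# Prints one ok/missing line per protocol; returns 1 if any entry is missing.
check_amanda_services() {
    file=${1:-/etc/services}
    status=0
    for proto in udp tcp; do
        # Match a line whose first field is "amanda" and second is "10080/<proto>".
        # Commented-out lines start with "#", so their first field never matches.
        if awk -v p="10080/$proto" '$1 == "amanda" && $2 == p { found = 1 } END { exit !found }' "$file"; then
            echo "ok: amanda 10080/$proto"
        else
            echo "missing: amanda 10080/$proto"
            status=1
        fi
    done
    return $status
}
```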

Once the network services are running,

 netstat -a | grep amanda

can be used to verify that some program is listening on the amanda/udp or amanda/tcp port (usually 10080). lsof is another tool for checking that (x)inetd is listening on the port on amandad's behalf, for example

 lsof -c xinetd
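The netstat check can be turned into a pass/fail test. This heuristic sketch assumes the conventional port 10080 and accepts both the Linux (:10080) and Solaris/BSD (.10080) address styles, numeric or resolved to the service name amanda; it does not distinguish listening sockets from established connections.

```shell
# Reads netstat output on stdin; succeeds if any line mentions the Amanda port.
# The trailing space/end-of-line requirement keeps "amandaidx" or "100801" from matching.
amanda_port_listed() {
    grep -Eq '[.:](10080|amanda)([[:space:]]|$)'
}
```

For example: netstat -an | amanda_port_listed && echo 'something is bound to the Amanda port'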
Solaris 10 specific

After editing inetd.conf to add the Amanda services, be sure to run inetconv to update SMF.

# cat /etc/inetd.conf  | grep amanda
amanda dgram   udp     wait    amanda /usr/local/amanda-2.5.1p1/libexec/amandad amandad

# inetconv -i /etc/inet/inetd.conf
amanda -> /var/svc/manifest/network/amanda-udp.xml
Importing amanda-udp.xml ...Done

If inetconv has not been run since inetd.conf was modified, dmesg reports a warning like:

Jun  4 13:47:56 moe.sol10 inetd[12234]: [ID 702911 daemon.warning] Configuration file /etc/inet/inetd.conf has been modified since inetconv was last run. "inetconv -i /etc/inet/inetd.conf" must be run to apply any changes to the SMF

Clients are using "bsdtcp" authentication but server is not

When following the steps in The 15-Minute Backup Solution, I also got the error

 WARNING: host.corp.com: selfcheck request failed: timeout waiting for ACK
 Client check: 1 host checked in 30.024 seconds, 1 problem found

because I had not followed this instruction for my amanda.conf:

Go to the "define dumptype global" section in the amanda.conf file and add the line auth "bsdtcp" right before the closing "}" bracket. This enables bsdtcp authentication.

Backing Up Older Amanda Clients (pre-2.5.1)

You can back up older Amanda clients with an Amanda 2.5.1 or later server, but you must use the auth "bsd" setting for them, because older Amanda clients can only use UDP datagrams. If this setting is not correct, amcheck on the server reports errors such as

  selfcheck request failed: recv error: Connection reset by peer

and the /tmp/amanda/amandad*.debug files on the client contain errors such as

  Transport endpoint is not connected

To back up disks on older clients, override a global auth "bsdtcp" setting in a separate dumptype entry in amanda.conf and use that dumptype for the older clients.
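A minimal amanda.conf sketch of such an override, assuming the global dumptype carries auth "bsdtcp" as described earlier; the dumptype name old-client-tar and the GNUTAR program line are illustrative choices, not requirements:

```
# Global dumptype: every dumptype that includes "global" uses bsdtcp.
define dumptype global {
    comment "Global definitions"
    auth "bsdtcp"
}

# Hypothetical dumptype for pre-2.5.1 clients, which only use UDP.
# Settings listed after "global" override the ones inherited from it.
define dumptype old-client-tar {
    global
    comment "Older clients: UDP-based bsd authentication"
    program "GNUTAR"
    auth "bsd"
}
```

A disklist entry for an older client then names that dumptype, for example (host and disk are placeholders):

 oldclient.example.com /home old-client-tar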

Firewall/TCP-wrapper settings

A firewall between the backup server and the client can cause selfcheck to time out if it is not configured correctly.

As with most services started from x/inetd, the firewall or TCP wrapper on the client has to be configured to let the server connect.

If you use the tcpd wrapper for the Amanda inetd entries (as in the following example), hosts.allow(5) has to be modified to allow Amanda connections.

Example: inetd configuration entry using tcpd:

amanda dgram udp wait amandabackup /usr/sbin/tcpd /usr/lib/amanda/amandad 

hosts.allow file:

amandad: ALL : ALLOW
amindexd: ALL : ALLOW
amidxtape: ALL : ALLOW

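Access to these services should not be left open to ALL; restrict it to the hosts that need it: the Amanda server for amandad on a client, and the Amanda clients for amindexd and amidxtape on the server.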
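For the firewall side, a shell sketch that prints (rather than applies) client rules admitting only the Amanda server on port 10080, plus a standard-syntax hosts.allow line for amandad. Linux iptables, the port, and the server address are assumptions; adapt them to the firewall your client actually uses. The older bsd/bsdudp methods also open other ports for data, which these rules do not cover.

```shell
# Print iptables commands for the client; review them, then run them as root.
# Rules are appended (-A), so their effect depends on rules already in the chain.
amanda_client_fw_rules() {
    server_ip=$1
    for proto in tcp udp; do
        # Accept Amanda traffic from the server ...
        echo "iptables -A INPUT -p $proto -s $server_ip --dport 10080 -j ACCEPT"
    done
    for proto in tcp udp; do
        # ... and drop it from everyone else.
        echo "iptables -A INPUT -p $proto --dport 10080 -j DROP"
    done
}

# Print a hosts.allow entry that admits only the given server to amandad.
amanda_hosts_allow_line() {
    echo "amandad: $1"
}
```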

Wrong permissions or ownership for Amanda user home or log directories

Incorrect permissions or ownership of the log directory (/var/log/amanda) and/or the Amanda user's home directory (/var/lib/amanda) on a client can produce the following amcheck error:

WARNING: clientname: selfcheck request failed: tcpm_recv_token: invalid size: amandad: 
Client check: 3 hosts checked in 1.121 seconds.  1 problem found.

Incorrect permissions, such as Amanda's home directory being owned by a different user, often occur when Amanda has been installed more than once on the same system.

Example of permissions for Amanda's home directory on Linux:

drwxr-xr-x 11 amandabackup disk 4096 2008-12-02 17:11 /var/lib/amanda
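The ownership check can be scripted with ls -ld so it also works on systems without GNU stat. A sketch, assuming the Amanda user is amandabackup and the Linux default directories named above; pass your own user and paths.

```shell
# Usage: check_amanda_owner USER DIR...
# Flags each directory that is missing or not owned by USER; returns 1 on any problem.
check_amanda_owner() {
    user=$1
    shift
    rc=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"
            rc=1
            continue
        fi
        # The third field of "ls -ld" output is the owning user.
        owner=$(ls -ld "$dir" | awk '{ print $3 }')
        if [ "$owner" = "$user" ]; then
            echo "ok: $dir owned by $owner"
        else
            echo "WRONG OWNER: $dir owned by $owner, expected $user"
            rc=1
        fi
    done
    return $rc
}
```

For example: check_amanda_owner amandabackup /var/lib/amanda /var/log/amanda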

Check for unwritable debug directory

Locate the AMANDA_DBGDIR directory (usually /tmp/amanda) and look for a file named amandad.<DATETIME>.debug in it. amandad creates this debug file when it starts.

If the debug file does not exist, the Amanda client process, amandad, has not been started properly. Go through the checklist for inetd/xinetd/daemontools in the section above.
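To find the newest amandad debug file quickly, a shell sketch; it assumes AMANDA_DBGDIR is /tmp/amanda unless another directory is given, searches subdirectories as well (the layout differs between releases), and assumes file names without spaces. No output suggests amandad never started.

```shell
# Usage: latest_amandad_debug [DIR]   (DIR defaults to /tmp/amanda)
# Prints the path of the most recently modified amandad*.debug file, or returns 1.
latest_amandad_debug() {
    dbgdir=${1:-/tmp/amanda}
    newest=
    # Keep whichever matching file has the latest modification time.
    for f in $(find "$dbgdir" -type f -name 'amandad*.debug' 2>/dev/null); do
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    [ -n "$newest" ] || return 1
    echo "$newest"
}
```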

It is also possible that the debug directory (/tmp/amanda) is not writable by the Amanda backup user (for example, amandabackup).

Verify the owner and permissions of the /tmp/amanda directory. It should be owned by the user specified in the inetd/xinetd configuration, and its permissions should be 700 (drwx------).

Also check the permissions of the parent directory (usually /tmp: permissions 1777, drwxrwxrwt). If the amandabackup user does not have write access in the parent directory, you must create the debug directory yourself, and set ownership/permissions manually.

Alternatively, you may remove the directory and run amcheck again; it is re-created automatically.
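The owner and mode checks above, as one shell function. A sketch, assuming /tmp/amanda and the user amandabackup as defaults; it compares only the first ten characters of the ls -ld output so that ACL or SELinux markers ("+" or ".") do not cause false alarms.

```shell
# Usage: check_debug_dir [DIR [USER]]
check_debug_dir() {
    dir=${1:-/tmp/amanda}
    user=${2:-amandabackup}
    if [ ! -d "$dir" ]; then
        parent=$(dirname "$dir")
        # /tmp is normally drwxrwxrwt (1777); otherwise the directory may need creating by hand.
        echo "missing: $dir (parent $parent is $(ls -ld "$parent" | cut -c1-10))"
        return 1
    fi
    mode=$(ls -ld "$dir" | cut -c1-10)
    owner=$(ls -ld "$dir" | awk '{ print $3 }')
    rc=0
    if [ "$owner" != "$user" ]; then
        echo "wrong owner: $dir is owned by $owner, expected $user"
        rc=1
    fi
    if [ "$mode" != "drwx------" ]; then
        echo "wrong mode: $dir is $mode, expected drwx------"
        rc=1
    fi
    [ $rc -eq 0 ] && echo "ok: $dir"
    return $rc
}
```

If the directory has to be created by hand, running the following as root gives the ownership and mode described above (user name as in the example):

 mkdir /tmp/amanda
 chown amandabackup /tmp/amanda
 chmod 700 /tmp/amanda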

If you are using the Cygwin Amanda client, the Amanda debug directory /tmp/amanda is created by the amcheck(8) command and owned by the user who installed Cygwin. The directory should be owned by the Amanda backup user instead.

Slow NFS-server

If Amanda programs are NFS auto-mounted on the client, some clients may fail to mount the Amanda binaries in time for the check.

Failing DNS service

Name services on the Amanda client are not configured correctly or are not working.

This message indicates that the Amanda client cannot resolve the Fully Qualified Domain Name (FQDN) of the server. To fix the problem, do any of the following:

  • Check the forward and reverse name resolution on the Amanda client. Make sure the Amanda client is able to connect to the Amanda server using the Amanda server FQDN.

NOTE: 1. Correct and existing reverse DNS resolution is very important. Sometimes using the raw IP address works instead.

2. Take care with using canonical names for the hosts in your DNS database. For security reasons, Amanda always resolves a host to its canonical name.

  • Add an entry for the Amanda server (FQDN) to /etc/hosts on the client machines.
  • Check that the hosts line in /etc/nsswitch.conf lists files before dns.
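The name-resolution checks above can be run on the client with getent(1), which is available on Linux and Solaris. This sketch assumes you pass the server's FQDN; it compares names case-insensitively and ignores a trailing dot, and the nsswitch.conf test reads only the hosts: line.

```shell
# Compare two host names case-insensitively, ignoring a trailing dot.
same_hostname() {
    a=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | sed 's/\.$//')
    b=$(printf '%s' "$2" | tr 'A-Z' 'a-z' | sed 's/\.$//')
    [ "$a" = "$b" ]
}

# Reads nsswitch.conf on stdin; succeeds if "files" is listed before any "dns"
# on the hosts: line.
files_before_dns() {
    awk '$1 == "hosts:" {
            for (i = 2; i <= NF; i++) {
                if ($i == "files") { ok = 1; exit }
                if ($i == "dns") { exit }
            }
         }
         END { exit !ok }'
}

# Forward-resolve the server, reverse-resolve its address, and compare.
check_server_dns() {
    fqdn=$1
    addr=$(getent hosts "$fqdn" | awk '{ print $1; exit }')
    [ -n "$addr" ] || { echo "forward lookup failed: $fqdn"; return 1; }
    back=$(getent hosts "$addr" | awk '{ print $2; exit }')
    if same_hostname "$back" "$fqdn"; then
        echo "ok: $fqdn -> $addr -> $back"
    else
        echo "mismatch: $fqdn -> $addr -> ${back:-<no reverse entry>}"
        return 1
    fi
}
```

For example (the server name is a placeholder): check_server_dns amandaserver.company.com && files_before_dns < /etc/nsswitch.conf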

Aliases on the network interface

If the network interface used for backups has IP aliases, set the route's source address (src) to the correct IP address, or use a network interface without IP aliases. For example:

bond0 -> 192.168.18.7
bond0:0 -> 192.168.18.8

If Amanda is using the bond0:0 interface (192.168.18.8) but the route's source address (src) is 192.168.18.7, amcheck fails with this error. To fix the problem, use 192.168.18.7 for Amanda backups so that it matches the route's source address.
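On Linux with iproute2, ip route get shows which source address the kernel picks for traffic to the server. A sketch that extracts it, assuming that output format; compare the result with the address Amanda is meant to use.

```shell
# Reads "ip route get ADDRESS" output on stdin and prints the word after "src".
route_src() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}
```

For example (the server address is a placeholder): ip route get 192.168.18.1 | route_src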

Linux-VServer

If you are backing up a physical server that runs linux-vserver instances (http://linux-vserver.org), the IP addresses of the vservers can also cause this error.

For example:

servername:~# ip address list dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
   link/ether 00:1a:a0:26:aa:a6 brd ff:ff:ff:ff:ff:ff
   inet 192.168.98.142/23 brd 192.168.99.255 scope global eth0
   inet 192.168.98.179/24 brd 192.168.98.255 scope global secondary eth0
   inet 192.168.98.188/23 brd 192.168.99.255 scope global secondary eth0
...

This configuration causes "selfcheck request failed" because one of the vservers (192.168.98.179) has an inconsistent netmask: /24 instead of the /23 used by the other addresses.

Fixing the network mask for the vserver (192.168.98.179 in this example) solves the problem:

servername:~# ip address list dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
   link/ether 00:1a:a0:26:aa:a6 brd ff:ff:ff:ff:ff:ff
   inet 192.168.98.142/23 brd 192.168.99.255 scope global eth0
   inet 192.168.98.179/23 brd 192.168.99.255 scope global secondary eth0
   inet 192.168.98.188/23 brd 192.168.99.255 scope global secondary eth0
...
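The netmask mistake above can be spotted by comparing each address's prefix length with that of the interface's first address. This is a heuristic sketch for the vserver case: a host may legitimately carry addresses from subnets of different sizes, so treat a hit as a prompt to look, not as proof.

```shell
# Reads "ip address list dev IFACE" output on stdin; prints every IPv4 address whose
# prefix length differs from the first one and returns 1 if any were found.
inconsistent_prefixes() {
    awk '$1 == "inet" {
            split($2, a, "/")
            if (primary == "") primary = a[2]
            else if (a[2] != primary) { print $2; bad = 1 }
         }
         END { exit bad }'
}
```

For example: ip address list dev eth0 | inconsistent_prefixes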

Virtual machine interaction with NIS

If your client is a virtual machine, the /tmp directory may be locked or kept busy by some process, e.g. the VM's NIS client while it contacts the NIS server.

Remedy: Stop the NIS service, make sure "ls /tmp" and "ls /var/tmp" respond as promptly as they do for other directories, then run amcheck again.
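To check the "responds promptly" condition with a time limit rather than by eye, a sketch assuming timeout(1) from GNU coreutils (or a compatible one) is installed; the 5-second default is an arbitrary choice.

```shell
# Usage: dir_responds DIR [SECONDS]
# Succeeds if "ls DIR" completes within SECONDS; timeout exits with 124 if it does not.
dir_responds() {
    timeout "${2:-5}" ls "$1" >/dev/null 2>&1
}
```

For example: dir_responds /tmp 5 && dir_responds /var/tmp 5 && amcheck yourconfig (the configuration name is a placeholder)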