DirectTCP

From wiki.zmanda.com
Revision as of 20:00, 18 January 2010 by Dustin

DirectTCP is a fancy name for a very simple operation: sending bare backup data, with no framing, over a raw TCP connection. In many cases this can be arranged so that neither end of the TCP connection is on the Amanda server, which removes the server as a performance bottleneck.
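"No framing" means the bytes on the wire are exactly the backup data: there are no length prefixes or record headers, so the receiver simply reads until EOF. The minimal sketch below illustrates that idea with a local `socket.socketpair()`; the function names `send_raw` and `receive_raw` are hypothetical and are not part of Amanda.

```python
import socket


def send_raw(sock: socket.socket, chunks) -> None:
    """Write backup data as-is: no headers, lengths, or record markers."""
    for chunk in chunks:
        sock.sendall(chunk)
    # Shutting down the write side is the only end-of-data signal the peer gets.
    sock.shutdown(socket.SHUT_WR)


def receive_raw(sock: socket.socket, bufsize: int = 65536) -> bytes:
    """Read until EOF. Write boundaries from the sender are not preserved."""
    parts = []
    while True:
        data = sock.recv(bufsize)
        if not data:  # EOF: the sender shut down or closed its end
            break
        parts.append(data)
    return b"".join(parts)


if __name__ == "__main__":
    a, b = socket.socketpair()
    send_raw(a, [b"tar header...", b"file contents..."])
    print(receive_raw(b))  # the two writes arrive as one undifferentiated stream
    a.close()
    b.close()
```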

Connection Setup and Teardown

Sending a data stream through a TCP connection is trivial; setting up that connection is not. The process looks roughly like this:

  • one end listens for a connection, providing one or more addresses (each specified as an IP address and port) for incoming connections
  • the other end connects to one of the given addresses
  • data begins flowing
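The three steps above can be sketched with plain sockets on the loopback interface. This is an illustration under simplifying assumptions (one address, one connection, the connecting end sends), and the helper names `listen_any`, `connect_any` and `transfer` are hypothetical.

```python
import socket
import threading


def listen_any(host: str = "127.0.0.1"):
    """Step 1: listen, and report the address(es) the peer may connect to."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))  # port 0 lets the OS choose a free port
    srv.listen(1)
    return srv, [srv.getsockname()]  # a list: a listener may offer several addresses


def connect_any(addrs):
    """Step 2: connect to one of the given (ip, port) addresses."""
    last_err = None
    for addr in addrs:
        try:
            return socket.create_connection(addr, timeout=5)
        except OSError as err:
            last_err = err
    raise ConnectionError(f"no address reachable: {addrs}") from last_err


def transfer(payload: bytes) -> bytes:
    """Step 3: data flows; here the connecting end sends and the listener reads."""
    srv, addrs = listen_any()
    received = []

    def accept_and_read():
        conn, _ = srv.accept()
        with conn:
            chunks = []
            while data := conn.recv(65536):
                chunks.append(data)
            received.append(b"".join(chunks))

    t = threading.Thread(target=accept_and_read)
    t.start()
    with connect_any(addrs) as sock:
        sock.sendall(payload)
    t.join()
    srv.close()
    return received[0]
```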

Note that there is not necessarily any relationship between the direction of data flow and the initiator of the connection. Amanda uses this to its advantage by preferring to do the connecting, rather than the listening. This avoids the complexity of discovering interface addresses and working around firewalled TCP ports.
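To make the independence of direction concrete, the sketch below reverses the usual picture: the listening end is the data source, and the end that initiates the connection only reads. This mirrors the arrangement where Amanda does the connecting even though the listener may be the one producing data. All names here are hypothetical.

```python
import socket
import threading


def serve_data(srv: socket.socket, payload: bytes) -> None:
    """Listening end that *sends*: accept one connection, write, then close."""
    conn, _ = srv.accept()
    with conn:
        conn.sendall(payload)


def fetch(addr) -> bytes:
    """Connecting end that *receives*: initiating says nothing about data direction."""
    with socket.create_connection(addr, timeout=5) as sock:
        chunks = []
        while data := sock.recv(65536):
            chunks.append(data)
        return b"".join(chunks)


def listener_sends(payload: bytes) -> bytes:
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    t = threading.Thread(target=serve_data, args=(srv, payload))
    t.start()
    try:
        return fetch(srv.getsockname())
    finally:
        t.join()
        srv.close()
```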

Error Handling

TCP does not distinguish a normally closed connection from a connection closed by an abnormal process termination. That means that receiving an EOF on a DirectTCP connection does not necessarily indicate that all data was transferred successfully. It is up to higher layers to coordinate the results from both ends of the connection and make a determination as to the transfer's success.
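The sketch below shows why a clean EOF is not proof of success: a sender that stops early (simulated by closing its socket, which is what the kernel does when a process dies) still produces an ordinary EOF at the receiver. The decision is made from both ends' reports, assumed here to arrive out of band; `EndResult` and `transfer_succeeded` are hypothetical names, not Amanda APIs.

```python
import socket
from dataclasses import dataclass


@dataclass
class EndResult:
    """Hypothetical per-end report, delivered outside the data connection."""
    ok: bool
    nbytes: int


def drain(sock: socket.socket) -> int:
    """Read to EOF and count bytes. Reaching EOF alone does not mean success."""
    total = 0
    while data := sock.recv(65536):
        total += len(data)
    return total


def transfer_succeeded(sender: EndResult, receiver: EndResult) -> bool:
    """Only the combination of both ends' reports decides the outcome."""
    return sender.ok and receiver.ok and sender.nbytes == receiver.nbytes


if __name__ == "__main__":
    a, b = socket.socketpair()
    a.sendall(b"x" * 1000)
    a.close()  # simulate a sender that dies after 1000 of its intended 5000 bytes
    got = drain(b)
    b.close()
    receiver = EndResult(ok=True, nbytes=got)  # the receiver saw a normal EOF
    sender = EndResult(ok=False, nbytes=1000)  # the sender's side reports failure
    print(transfer_succeeded(sender, receiver))  # False
```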

Implementation

DirectTCP Devices

A device which can support DirectTCP implements a few additional methods, described in Amanda::Device. The device can listen for and accept incoming connections, but (for the moment at least) does not support connecting. Amanda will attempt to use DirectTCP whenever possible, on the assumption that it is more efficient.

When writing split parts, the data from a DirectTCP connection may need to end up on several volumes. To support this, the Device API defines an opaque DirectTCPConnection object and a protocol for suggesting to a device that it use an existing connection. However, a given DirectTCPConnection object may not be compatible with a particular device, so it is not always possible to span a dump from one DirectTCP device to another. For example, a dump initially sent to one NDMP filer cannot be split onto another filer. This limitation is simple for users to understand, so it is not important in practice.
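The spanning rule can be modelled in a few lines. In this sketch, which is not the real Amanda::Device API, the opaque connection object only remembers which filer holds the socket, each `Device` instance stands for one volume on some filer, and a device accepts the suggested connection only if it lives on the same filer. `DirectTCPConnection`'s `owner` field, `use_connection`'s boolean result, and `write_split_dump` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectTCPConnection:
    """Opaque handle; in this sketch it records only which filer owns the socket."""
    owner: str


@dataclass
class Device:
    """Hypothetical DirectTCP device; one instance per volume in this sketch."""
    filer: str
    capacity: int

    def use_connection(self, conn: DirectTCPConnection) -> bool:
        # A connection terminated on one filer cannot be handed to another filer.
        return conn.owner == self.filer

    def write_part(self, remaining: int) -> int:
        return min(remaining, self.capacity)


def write_split_dump(conn: DirectTCPConnection, devices, dump_size: int):
    """Write a dump as split parts, asking each next device to reuse the connection."""
    remaining = dump_size
    parts = []
    for dev in devices:
        if not dev.use_connection(conn):
            raise RuntimeError(f"cannot span dump from {conn.owner} onto {dev.filer}")
        n = dev.write_part(remaining)
        parts.append((dev.filer, n))
        remaining -= n
        if remaining == 0:
            return parts
    raise RuntimeError("ran out of volumes")
```

Spanning succeeds while every volume is on the filer that owns the connection, and fails as soon as the next volume is on a different filer.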

DirectTCP Applications

An application that supports DirectTCP advertises this in its SUPPORT response. The backup and recovery operations then supply it with a set of IP:port pairs to connect to. That is, applications (as of this writing) never listen for an incoming DirectTCP connection.
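On the application side, the work reduces to turning the supplied IP:port pairs into addresses and connecting to the first one that answers. The space-separated `ip:port` format parsed below is an assumption for illustration only, as are the names `parse_addrs` and `connect_first`; the real format of the pairs is not described here.

```python
import socket


def parse_addrs(spec: str):
    """Parse a hypothetical 'ip:port ip:port ...' list into (host, port) tuples."""
    addrs = []
    for item in spec.split():
        host, sep, port = item.rpartition(":")
        if not sep or not host:
            raise ValueError(f"bad address: {item!r}")
        addrs.append((host, int(port)))
    return addrs


def connect_first(addrs, timeout: float = 5.0) -> socket.socket:
    """The application always connects; it never listens."""
    errors = []
    for addr in addrs:
        try:
            return socket.create_connection(addr, timeout=timeout)
        except OSError as err:
            errors.append(f"{addr}: {err}")
    raise ConnectionError("; ".join(errors) or "no addresses given")
```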

DirectTCP Transfers

The Transfer Architecture (XFA) has an XFER_MECH_DIRECTTCP_LISTEN mechanism, which supports listening for incoming connections. More generally, the XFA provides an abstraction for DirectTCP transfers and can automatically handle concerns such as coordinating error handling between the two ends.
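One way to picture a transfer "mechanism" is as the agreed way two adjacent elements exchange data. The toy negotiation below prefers XFER_MECH_DIRECTTCP_LISTEN when both elements support it and otherwise falls back to another shared mechanism. Everything except the XFER_MECH_DIRECTTCP_LISTEN name is hypothetical: `BYTE_STREAM` is a placeholder, the selection logic is not taken from Amanda, and the assumption that the downstream element is the one that listens is this sketch's own.

```python
from dataclasses import dataclass

DIRECTTCP_LISTEN = "XFER_MECH_DIRECTTCP_LISTEN"
BYTE_STREAM = "BYTE_STREAM"  # hypothetical stand-in for a non-DirectTCP mechanism


@dataclass
class Element:
    name: str
    mechs: tuple  # mechanisms usable on the shared link, in preference order


def negotiate(src: Element, dst: Element) -> str:
    """Pick a mechanism both elements support, preferring DirectTCP."""
    common = [m for m in src.mechs if m in dst.mechs]
    if not common:
        raise ValueError(f"{src.name} and {dst.name} share no mechanism")
    return DIRECTTCP_LISTEN if DIRECTTCP_LISTEN in common else common[0]


def wire(src: Element, dst: Element) -> str:
    mech = negotiate(src, dst)
    if mech == DIRECTTCP_LISTEN:
        # Assumption of this sketch: the downstream element listens and passes
        # its addresses upstream; the upstream element connects and sends.
        return f"{dst.name} listens; {src.name} connects and sends"
    return f"{src.name} -> {dst.name} via {mech}"
```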