Difference between revisions of "Application API"

From The Open Source Backup Wiki (Amanda, MySQL Backup, BackupPC)
(21 intermediate revisions by 3 users not shown)
Line 1: Line 1:
==Introduction==
+
<div style="float:right">__TOC__</div>
 +
This page documents the Application API from a developer's perspective -- in particular, someone interested in modifying an existing application or creating a new one.  For the basics of ''using'' the Application API in an Amanda configuration, see [[How To:Use Amanda Applications on a Client]].  Note that the implementation of the Application API is [[/Implementation|still in progress]].
 +
 
 +
Most of the useful content is held in subpages
 +
* [[/Terminology]] describes some of the terms used around the API
 +
* [[/Operations]] describes the API operations in detail
 +
* [[/DAR]] describes DAR (Direct access recovery)
 +
* [[/Implementation]] gives the roadmap for the API's implementation in Amanda
 +
* [[/Misconceptions]] will set your thinking straight about how the API works
 +
 
 +
==Background==
 
There are two compelling reasons to introduce the Application API:
 
 
* To allow recovery of a single file without transmitting the entire backup archive to the client.
 
 
* To make it easier to support new client backup mechanisms, both at the filesystem and application level.
 
The Application API addresses these needs by changing '''backup''', '''restore''', '''selfcheck''', and other Amanda client commands.
+
The Application API addresses these needs by changing the way Amanda client operations work.
  
 
Historically, Amanda has focused on managing large chunks of data generated by one of only a few hard-coded applications (generally either GNU '''tar''', some version of '''dump''', or '''smbclient'''). The Application API addresses both limitations in the following manner:  
 
Line 10: Line 20:
 
*It extends Amanda to allow more granular backup and restore options.
 
  
=== Application API vs Dumper API ===
+
=== Backward Compatibility ===
This Application API replaces and supersedes the previous [[Dumper API]] proposal.
+
The Application API maintains backward compatibility by extending existing behavior rather than replacing it.  Essentially, it adds "APPLICATION" as an alternative program to "GNUTAR" and "DUMP".  The latter two options remain unchanged.
  
Why make this change? The Dumper API has a number of limitations that the Application API avoids:
 
* The Dumper API had no restore support at all (only backup)
 
* The Dumper API included a lot of the details of dumping applications in the API itself.
 
* The Dumper API requires transmitting the entire archive to a client to extract even a single file.
 
* No Dumper API implementation work exists, though the proposal is over 4 years old.
 
 
=== Backward Compatibility ===
 
The Application API maintains backward compatibility by extending existing behavior rather than replacing it:
 
 
* Legacy clients can be dumped as before. The server writes data to tape in the legacy format.
 
 
* Legacy tapes can be read as before.
 
Line 26: Line 28:
 
** When restoring to a legacy client, the restore works as before.
 
 
* Legacy clients cannot restore data backed up by Application API clients; legacy clients can be restored only from legacy dumps.
 
 
== Nomenclature ==
 
 
This nomenclature is derived from the SCSI command-set standard INCITS T10/1731-D.
 
 
A '''User Object''' is the basic unit of backup and restore, from the user perspective. Currently, a user object is a file or directory. In the future other types of data may be supported. Each user object has a hierarchical identifier and a set of associated attributes. Also, each user object is entirely contained within some set of collections, but a single collection may contain data from multiple user objects.
 
 
A '''Collection''' is the basic unit of backup and restore as it resides on the backup media. A collection is the smallest unit that can be stored or retrieved from media.
 
 
Each collection and user object may originate from only a single backup job, collection merge, or collection copy/migration.
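
The relationship between user objects and collections can be sketched as a small data model. This is only an illustration of the nomenclature above, not part of the API: the class names, field names, and the <tt>collections_for</tt> helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Smallest unit stored on or retrieved from media (hypothetical model)."""
    collection_id: int
    byte_offset: int  # where the collection starts in the job's data stream
    size: int         # length in bytes


@dataclass
class UserObject:
    """A file or directory as the user sees it (hypothetical model)."""
    identifier: str                                      # hierarchical identifier
    attributes: dict = field(default_factory=dict)       # e.g. permissions, timestamps
    collection_ids: list = field(default_factory=list)   # collections holding its data


def collections_for(objects, wanted):
    """Return the sorted ids of every collection needed to restore `wanted`."""
    needed = set()
    for obj in objects:
        if obj.identifier in wanted:
            needed.update(obj.collection_ids)
    return sorted(needed)
```

A restore of two objects that share a collection then needs that collection only once, which is the point of tracking collections rather than exact object locations.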
 
 
== Application API Operations ==
 
 
Implementing the Application API requires changes to the backup server, but most of the code that constitutes the API itself resides on the client. The operations listed below are from the perspective of the backup clients.
 
 
=== Backup ===
 
Input: Specifies what is to be backed up: A filesystem, device, particular set of files, database table, etc.
 
 
Action: Reads the specified object.
 
 
Output: A set of collections (containing the backup data), and information on a set of user objects (identifier, attributes, associated collections)
 
 
=== Restore ===
 
Input: List of user objects to be restored, relevant collections, and target locations for the restore.
 
 
Action: Reads the collections and writes the relevant user objects in their original form to the specified location.
 
 
Output: None (other than administrative messages)
 
 
=== Reindex ===
 
Input: Octet stream of all the collections from a single job.
 
 
Output: Byte offsets for each collection in the stream, and information on the set of user objects in the stream.
 
 
=== Estimate ===
 
Input: Information on what is to be backed up: A filesystem, device, particular set of files, database table, etc.
 
 
Output: An estimate of how much space this data set will consume.
 
 
=== selfcheck ===
 
Input: Information on what is to be backed up: A filesystem, device, particular set of files, database table, etc.
 
 
Action: Determines if there are any configuration problems.
 
 
Output: Success or failure.
 
 
=== Capabilities ===
 
Input/Action: none.
 
 
Output: Capabilities of this application driver. For example, the application may not support exclusion. This command can also tell if this driver can read a dump from some other version of the same driver.
 
 
=== Print-Command ===
 
Input: Information on what is to be backed up: A filesystem, device, particular set of files, database table, etc.
 
 
Output: Prints a one-line command, if one exists, to restore this data from tape. This can be used for non-Amanda bare-metal disaster recovery.
 
 
==Implementation Phases==
 
 
This section has been moved to a [[Application_API_implementation | separate page]].
 
 
== Examples ==
 
Here are some examples of how the generic nomenclature might be applied in a particular application driver.
 
 
=== Dump ===
 
User object => Filesystem object (file, directory, socket, pipe, etc.)<BR>
 
Collection => Entire filesystem
 
 
=== GNU tar ===
 
User object => Archive object (file, directory, etc.)<BR>
 
Collection => one 512-byte tar block.
 
 
Note that having such collections can be problematic; see below.
 
 
=== SQL database ===
 
User Object => Database table<BR>
 
Collection => Entire database
 
 
==== Alternative SQL database ====
 
User Object => Table row<BR>
 
Collection => Entire database
 
 
This conception is only useful if you have very large table rows; otherwise, the indices will be as big as the original database!
 
 
== Media Formats ==
 
At present, there are two tape formats: Traditional and Spanned. In the future, more formats might be added. The Application API will not change this. Indeed, the on-tape format will not change at all; you could still restore a '''gnutar''' dump under the Application API using some earlier version of Amanda. The only thing that changes is the terminology:
 
 
=== Traditional ===
 
When dumping, we continue to write a 32k Amanda header followed by the complete set of collections provided by the client. As before, these are written as 32K tape blocks. On restore, we can use the BSR command to seek the tape drive to the appropriate tape block, and then read the desired collections. Thus the index need only store the byte offset of each collection.
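
As a rough illustration of why a byte offset is enough, the arithmetic might look like the sketch below. It assumes that the header occupies exactly one 32k block, that blocks are numbered from zero starting at the header, and that offsets are counted from the first data byte after the header; none of these details are fixed by this page, and the function name is hypothetical.

```python
BLOCK_SIZE = 32 * 1024  # 32k tape blocks, as described above
HEADER_BLOCKS = 1       # assumption: the Amanda header fills exactly one block


def locate(byte_offset):
    """Map a collection's byte offset to (tape block, offset inside that block)."""
    if byte_offset < 0:
        raise ValueError("byte offset must be non-negative")
    block, within = divmod(byte_offset, BLOCK_SIZE)
    return HEADER_BLOCKS + block, within
```

For example, under these assumptions a collection starting at byte 40000 begins 7232 bytes into block 2.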
 
 
To recover without Amanda, you can still use <tt>dd</tt> to read the tape contents into the tool directly.
 
 
=== Spanned ===
 
When using the Spanned tape format with the Application API, we again take the complete set of collections provided by the client, treating it as a single BLOB, and dividing it into chunks. We continue to write a 32k Amanda header at the beginning of each chunk, followed by a number of 32k tape blocks. On restore, we again can use the BSR command to find the appropriate tape block and read the desired collections. Under this scenario, we must again note where each collection is stored. This could be done by storing the byte-offset of each collection, along with chunk information for the job as a whole, or else by storing the chunk and byte offset for each piece of each collection.
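
The second indexing option (a chunk and byte offset for each piece of each collection) can be sketched as follows. This assumes every chunk carries the same number of data bytes and that offsets count data bytes only, excluding the per-chunk headers; the function name and the tuple layout are hypothetical.

```python
def split_into_pieces(offset, size, chunk_size):
    """Split a collection into (chunk index, offset in chunk, length) pieces.

    `offset` and `size` are in bytes of the job's data stream; `chunk_size`
    is the number of data bytes per chunk (assumed constant here).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    pieces = []
    while size > 0:
        chunk, within = divmod(offset, chunk_size)
        length = min(size, chunk_size - within)  # stop at the chunk boundary
        pieces.append((chunk, within, length))
        offset += length
        size -= length
    return pieces
```

A collection that straddles a chunk boundary yields two pieces, so a restore knows which chunks to read and where each piece starts.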
 
 
=== Future ===
 
There are several limitations in the existing tape formats that may be addressed in the future. A new tape format might also take better advantage of Application API features. But such a change is not directly connected to or required by this API.
 
 
== Clarifying Common Misconceptions ==
 
Because the Application API represents a major departure from historical Amanda thinking, misconceptions are common. This section attempts to address some of the most common.
 
 
=== The exact location of user objects is known. ===
 
Amanda can restore a user object only by retrieving the associated collections. Aside from tracking the collection that contains it, Amanda doesn't store the exact location of any user object. Amanda still has enough information to efficiently restore a user object without reading the whole dump -- assuming that collections are smaller than a dump.
 
 
To put it another way, the object may not be found at any particular byte offset in the backup. Even if it could, Amanda wouldn't know that offset. But nonetheless Amanda has sufficient information to perform restores efficiently.
 
 
=== Collections must not be very small (or very big) ===
 
Although Amanda will not enforce any particular size restriction on a collection, the optimal size roughly corresponds to the size of a user object. In general, there is not much advantage to having collections smaller than about 64k. Very small collections will bloat the index; very large collections may cause slower restores, especially partial restores of small objects from the collection.
 
 
=== The server can understand a collection ===
 
As today, the server doesn't know anything about the collections on media -- it can only store and retrieve them. An entire collection (not an entire job) must be sent to an Amanda client running the same Application API for interpretation.
 
 
Note, however, that this Amanda client may be on the same physical machine as the Amanda server.
 
 
=== Inputs and outputs above are associated with particular sockets ===
 
As there is as yet no line protocol associated with this API, it would be premature to talk about particular sockets. But it is very possible that all output data (octet stream, collection byte offsets, and user object information) will be multiplexed in a single network socket.
 
 
=== On restore, the client may seek to a particular place in the backup data ===
 
Although the client could do this, the server doesn't know anything about it. Rather, the server provides the set of collections that includes all the user objects of interest. Then (and only then) the client goes about restoring user objects from this set of collections.
 
 
=== The data stream sent to the server is opaque to the server ===
 
Although the collection data itself is opaque, the other data (collection sizes, user object identifiers and attributes) is very much interpreted by the server. There should, for example, be a standard way of representing file permissions and timestamps as user object attributes.
 
 
 
 
= Application calling convention =
 
This calling convention is subject to change.
 
== '''support''' command ==
 
'''support''' ['''--config''' ''config''] ['''--host''' ''host''] ['''--disk''' ''disk''] ['''--device''' ''device''] ['''--PROPERTY_NAME''' ''PROPERTY_VALUE'']*
 
;Output on fd1:
 
    CONFIG YES|NO
 
    HOST YES|NO
 
    DISK YES|NO
 
    MAX-LEVEL ''level''
 
    INDEX-LINE YES|NO
 
    INDEX-XML YES|NO
 
    MESSAGE-LINE YES|NO
 
    MESSAGE-XML YES|NO
 
    RECORD YES|NO
 
    INCLUDE YES|NO
 
    INCLUDE-LIST YES|NO
 
    INCLUDE-OPTIONAL YES|NO
 
    EXCLUDE YES|NO
 
    EXCLUDE-LIST YES|NO
 
    EXCLUDE-OPTIONAL YES|NO
 
    COLLECTION YES|NO
 
    CALCSIZE YES|NO
 
    MULTI-ESTIMATE YES|NO
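
A caller could read this output into a dictionary along the lines of the sketch below. It assumes one <tt>NAME VALUE</tt> pair per line, as listed above; the function name and the choice to map YES/NO to booleans are illustrative only.

```python
def parse_support(output):
    """Parse the text a 'support' command writes on fd1 into a dict."""
    caps = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, value = line.partition(" ")
        value = value.strip()
        if value in ("YES", "NO"):
            caps[name] = (value == "YES")   # capability flag
        elif name == "MAX-LEVEL":
            caps[name] = int(value)         # highest supported dump level
        else:
            caps[name] = value              # keep anything unrecognized as text
    return caps
```

The caller can then check, for instance, whether <tt>EXCLUDE</tt> is supported before passing an exclude option to the application.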
 
 
== '''selfcheck''' command ==
 
'''selfcheck''' ['''--message''' ('''line'''|'''xml''')] ['''--config''' ''config''] ['''--host''' ''host''] ['''--disk''' ''disk''] '''--device''' ''device'' '''--level''' ''level'' ['''--record'''] ['''--PROPERTY_NAME''' ''PROPERTY_VALUE'']*
 
;Output on fd1: (if no '''--message''' or '''--message line''') (Could be many lines)
 
    OK [''message'']
 
    ERROR ['''message''']
 
;Output on fd1: (if '''--message xml''')
 
    format not yet defined
 
 
== '''estimate''' command ==
 
'''estimate''' ['''--message''' ('''line'''|'''xml''')] ['''--config''' ''config''] ['''--host''' ''host''] ['''--disk''' ''disk''] '''--device''' ''amdevice'' '''--level''' ''level'' ['''--PROPERTY_NAME''' ''PROPERTY_VALUE'']*
 
;Output on fd1: (if no '''--message''' or '''--message line''')
 
    error message that should be logged.
 
    '''SIZE''' ''value''''suffix'' where suffix could be K, M, G
 
;Output on fd1: (if '''--message xml''')
 
    format not yet defined
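
For the line format, a caller might turn a <tt>SIZE</tt> line into a byte count as in the sketch below. Several points are assumptions rather than part of the convention: the suffixes are treated as binary multiples (1024-based), a missing suffix is treated as plain bytes, whitespace between value and suffix is optional, and any line that does not match is left to be handled as an error message. The function name is hypothetical.

```python
import re

_SIZE_RE = re.compile(r"^SIZE\s+(\d+)\s*([KMG]?)$")

# Assumption: binary multiples; the page does not say whether K means 1000 or 1024.
_MULTIPLIER = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(line):
    """Return the size in bytes for a SIZE line, or None for any other line."""
    match = _SIZE_RE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)) * _MULTIPLIER[match.group(2)]
```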
 
 
== '''backup''' command ==
 
'''backup''' ['''--message''' ('''line'''|'''xml''')] ['''--index''' ('''line'''|'''xml''')] ['''--config''' ''config''] ['''--host''' ''host''] ['''--disk''' ''disk''] '''--device''' ''amdevice'' '''--level''' ''level'' ['''--record'''] ['''--PROPERTY_NAME''' ''PROPERTY_VALUE'']*
 
;output on fd1:
 
    data stream
 
;output on fd3: (if no '''--message''' or '''--message line''')
 
    error message
 
    '''HEADER''' ''variable'''''='''''value'', information that should go in the amanda header.
 
    '''SIZE''' ''value suffix'' where ''suffix'' could be '''K''', (kilobytes) '''M''', (megabytes) or '''G''' (gigabytes)
 
;output on fd3: (if '''--message xml''')
 
    format not yet defined
 
;output on fd4: (if '''--index line''')
 
    index stream (one filename per line)
 
;output on fd4: (if '''--index xml''')
 
    xml index stream (format not yet defined)
 
 
Error messages should begin with one of the following characters:

* '''|''' for normal output

* '''?''' for strange or error output

* '''&''' for unknown output
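
A reader of the message stream could classify lines by their first character, as in this sketch. The category names and the handling of lines without a known prefix are assumptions made for illustration.

```python
# Prefix characters as listed above; the category labels are illustrative.
PREFIXES = {"|": "normal", "?": "strange", "&": "unknown"}


def classify(line):
    """Return (category, text) for one message line."""
    if line and line[0] in PREFIXES:
        return PREFIXES[line[0]], line[1:].strip()
    # Assumption: lines without a known prefix are passed through unchanged.
    return "unprefixed", line.strip()
```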
 
 
== '''restore''' command ==
 
'''restore''' ['''--message''' ('''line'''|'''xml''')] ['''--index''' ('''line'''|'''xml''')] ['''--config''' ''config''] ['''--host''' ''host''] ['''--disk''' ''disk''] '''--device''' ''amdevice'' '''--level''' ''level'' ['''--PROPERTY_NAME''' ''PROPERTY_VALUE'']* [./file-to-restore]+
 
;Input on fd0:
 
    data stream
 
;Output on fd1: (if no '''--message''' or '''--message line''')
 
    error message
 
;Output on fd1: (if '''--message xml''')
 
    format not yet defined
 
 
== '''index''' command ==
 
'''index''' ['''--message''' ('''line'''|'''xml''')] ['''--index''' ('''line'''|'''xml''')] ['''--config''' ''config''] ['''--host''' ''host''] ['''--disk''' ''disk''] '''--device''' ''amdevice'' '''--level''' ''level'' ['''--PROPERTY_NAME''' ''PROPERTY_VALUE'']*
 
;Input on fd1:
 
    data stream
 
;Output on fd3: (if no '''--message''' or '''--message line''')
 
    error message
 
;Output on fd3: (if '''--message xml''')
 
    format not yet defined
 
;Output on fd4: (if '''--index line''')
 
    index stream (one filename per line)
 
;Output on fd4: (if '''--index xml''')
 
    xml index stream
 
 
== '''tool''' property format ==
 
Each property is passed as a command-line option. If a property has several values, each value must be passed as a separate option.
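
Building the option list from a set of properties might look like the sketch below. The function name is hypothetical, and the property name is passed through unchanged because this page does not specify any case conversion.

```python
def property_args(properties):
    """Turn {name: value or list of values} into a flat list of options."""
    args = []
    for name, values in properties.items():
        if isinstance(values, str):
            values = [values]          # a single value is a one-element list
        for value in values:
            # One '--name value' pair per value, as described above.
            args.extend(["--" + name, value])
    return args
```

A property with two values therefore appears twice on the command line, once for each value.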
 
 
= How to use =
 
== The application must be defined in '''amanda.conf''' ==
 
Define the '''my_application''' application using the '''myapplication''' binary.
 
  define application-tool my_application {
 
    comment "a comment"
 
    "my_app"                          # inherit config of the my_app application
 
    plugin  "myapplication"          # name of the application, it must be installed in dumper dir
 
    property "mailto" "amandabackup"  # can set property
 
  }
 
 
== The '''dumptype''' must specify the application ==
 
Define the "my_dumptype" dumptype using the "my_application" application
 
  define dumptype my_dumptype {
 
    program "APPLICATION"
 
    application "my_application"
 
  }
 
 
Define the '''my_dumptype_2''' dumptype using a modified '''my_application''' application
 
  define dumptype my_dumptype_2 {
 
    program "APPLICATION"
 
    application {                # define a custom application
 
        "my_application"          # inherit setting from another application
 
        property "mailto" "root"  # override property
 
    }
 
  }
 
 
Disk List Entries using '''my_dumptype''' or '''my_dumptype_2''' will use the '''myapplication''' application to back up the client.
 
 
== Available applications ==
 
 
=== amgtar ===
 
  define application-tool app_amgtar {
 
      comment "amgtar"
 
      plugin  "amgtar"
 
      #property "GNUTAR-PATH" "/path/to/gtar"
 
      #property "GNUTAR-LISTDIR" "/path/to/gnutar_list_dir"
 
                    #default from gnutar_list_dir setting in amanda-client.conf
 
      #property "ONE-FILE-SYSTEM" "yes"  #use '--one-file-system' option
 
      #property "SPARSE" "yes"          #use '--sparse' option
 
      #property "ATIME-PRESERVE" "yes"  #use '--atime-preserve=system' option
 
      #property "CHECK-DEVICE" "yes"    #use '--no-check-device' if set to "no"
 
  }
 
 
  define dumptype dt_amgtar {
 
      program "APPLICATION"
 
      application "app_amgtar"
 
  }
 
 
Your DLE must inherit from the dt_amgtar dumptype.
 
 
=== amstar ===
 
  define application-tool app_amstar {
 
      comment "amstar"
 
      plugin  "amstar"
 
      #property "STAR-PATH" "/path/to/star"
 
      #property "STAR-TARDUMP" "/path/to/tardumps"  # default /etc/tardumps
 
      #property "STAR-DLE-TARDUMP" "no"
 
          # if 'yes' then create a different tardump file for each DLE,
 
          # it is required if you run several dumps in parallel (maxdump>1)
 
      #property "ONE-FILE-SYSTEM" "yes"  #use '-xdev' option
 
      #property "SPARSE" "yes"          #use '-sparse' option
 
  }
 
 
  define dumptype dt_amstar {
 
      program "APPLICATION"
 
      application "app_amstar"
 
  }
 
 
Your DLE must inherit from the dt_amstar dumptype.
 
amstar can only be used to back up a full disk, i.e. the mount point.
 

Revision as of 15:03, 23 February 2016

This page documents the Application API from a developer's perspective -- in particular, someone interested in modifying an existing application or creating a new one. For the basics of using the Application API in an Amanda configuration, see How To:Use Amanda Applications on a Client. Note that the implementation of the Application API is still in progress.

Most of the useful content is held in subpages

  • /Terminology describes some of the terms used around the API
  • /Operations describes the API operations in detail
  • /DAR describes DAR (Direct access recovery)
  • /Implementation gives the roadmap for the API's implementation in Amanda
  • /Misconceptions will set your thinking straight about how the API works

Background

There are two compelling reasons to introduce the Application API:

  • To allow recovery of a single file without transmitting the entire backup archive to the client.
  • To make it easier to support new client backup mechanisms, both at the filesystem and application level.

The Application API addresses these needs by changing the way Amanda client operations work.

Historically, Amanda has focused on managing large chunks of data generated by one of only a few hard-coded applications (generally either GNU tar, some version of dump, or smbclient). The Application API addresses both limitations in the following manner:

  • It provides modular support for adding client backup tools, both for filesystems and applications such as databases, mail servers, etc.
  • It extends Amanda to allow more granular backup and restore options.

Backward Compatibility

The Application API maintains backward compatibility by extending existing behavior rather than replacing it. Essentially, it adds "APPLICATION" as an alternative program to "GNUTAR" and "DUMP". The latter two options remain unchanged.

  • Legacy clients can be dumped as before. The server writes data to tape in the legacy format.
  • Legacy tapes can be read as before.
    • When restoring to a new client (one using the Application API), the server provides the legacy dump as one large collection.
    • When restoring to a legacy client, the restore works as before.
  • Legacy clients cannot restore data backed up by Application API clients; legacy clients can be restored only from legacy dumps.