Amanda log files/Trace Logs

From wiki.zmanda.com
Revision as of 15:57, 5 January 2011 by Dustin (talk | contribs) (formatting)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Each dump or flush creates a file named log.YYYYMMDDHHMMSS.N (or, if usetimestamps is off, log.YYYYMMDD.N) in the logdir directory, where YYYYMMDDHHMMSS is the timestamp of the start of the amdump/amflush run, and N is a sequence number starting at 0. These log files represent the Amanda catalog -- the database describing the contents of storage volumes (tapes). They are used for generating backup reports with amreport, tracking dumps with amadmin - find, and finding the required tapes for restore operations.

Do not delete these log files! Removing them means you cannot use Amanda tools to locate older backups (the information to restore the most recent backup is kept in the curinfo database, and is read by "amadmin xx info").

File Creation

During a run, the various components of Amanda write lines to $logdir/log. The presence of this file indicates that an Amanda application is running (really, that Amanda is using a device), and will cause most applications to suggest that the user run amcleanup. When a run is complete, amlogroll renames log to the full filename, pulling the datestamp from the file itself and using the zero-based sequence number to disambiguate logfiles with the same datestamp.

At the end of a run, amdump calls amtrmlog to move logfiles for runs whose tapes have been reused into a subdirectory named oldlog inside the logdir directory. It is up to the administrator to remove older log files from the oldlog subdirectory.

File Format

Each line in the file begins with a keyword and a program name, followed by keyword/program-specific text. The following description is organized by program, and then by keyword. It is not complete, and is based on a possibly-incorrect reading of the current Amanda source.

(general)

START

 START app date timestamp

The planner, driver, amflush, amvault, and amdump programs all will print a START line. This is simply a timestamp, but also indicates the program that is running.

INFO pid/pid-done

INFO app procname pid pid
INFO app pid-done pid

Lines of this form indicate the start and normal shutdown, respectively, of an Amanda process. They are used by Amanda::Process and amcleanup(8) to be sure that all processes associated with an Amanda run are killed.

INFO/WARNING

In general, INFO and WARNING lines are not programmatically interpreted, but are displayed to the user directly by amreport (with the exception of the pid logs, described above).

 INFO app error_message
 WARNING app error_message

The error_message is not quoted - all words following the app to the end of the line should be used as the message.

ERROR/FATAL

 ERROR app error_message
 FATAL app error_message

These lines report any unexpected problems or failures that interrupt the process. Certain errors may have special meaning; they are detailed in the processes which they appear. FATAL lines indicate that the program was not able to recover and are recorded separately from errors.

FAIL

 FAIL taper hostname diskname dump_timestamp level "message"
 FAIL taper hostname diskname dump_timestamp level [message]

These messages indicate that whatever procedure they were found in have failed in an un-recoverable manner. These failures are reported in the "FAILURE DUMP SUMMARY" portion of amreport.

Note that some processes encapsulate their failure messages in square brackets, and some double-quote them.

DISK

 DISK amflush hostname disk
 DISK amvault hostname disk
 DISK planner hostname disk

The DISK lines indicate which disklist entries are being scheduled to be flushed to tape this run.

taper

START

START taper datestamp timestamp label label tape t

This message indicates that the taper is starting a new tape. The first occurrence of this message will have t equal to 1, and t will count up from there, the value of t is unused. The timestamp will always match the timestamp of the logfile.

PART

PART taper label filenum hostname diskname dump_timestamp partnum/nparts level [sec secs bytes bytes kps kps orig-kb orig-kb]

before 3.2.2:

PART taper label filenum hostname diskname dump_timestamp partnum/nparts level [sec secs kn kb kps kps orig-kb orig-kb]

before 3.1.0:

PART taper label filenum hostname diskname dump_timestamp partnum/nparts level [sec secs kn kb kps kps]

This indicates that the taper has written a file, filenum, to volume label. The file contains a part (perhaps the only part) of the dump identified by hostname, diskname, dump_timestamp, and level, where dump_timestamp is the timestamp of the run in which the client dump took place, and may not be the same as the timestamp of the logfile. If nparts is not known, it is given as -1.

History:

  • The kb is replaced by bytes in version 3.2.2
  • The orig-kb is added in version 3.1.0.

PARTPARTIAL

PARTPARTIAL taper label filenum hostname diskname dump_timestamp partnum/nparts level [sec secs bytes bytes kps kps orig-kb orig-kb] "message"

before 3.2.2:

PARTPARTIAL taper label filenum hostname diskname dump_timestamp partnum/nparts level [sec secs kb kb kps kps orig-kb orig-kb] "message"

before 3.1:

PARTPARTIAL taper label filenum hostname diskname dump_timestamp partnum/nparts level [sec secs kb kb kps kps] "message"

Similar to PART, but for split parts that are not completely written to the volume. Note that the taper does not report such split parts to the driver, but does add them to the logfile. If Amanda does not manage to write the entire split part to tape, then this partial split part may contain critical user data, so it should not be forgotten!

History:

  • The kb is replaced by bytes in version 3.2.2
  • The orig-kb is added in version 3.1.0.

DONE

DONE taper hostname diskname dump_timestamp nparts level [sec secs bytes bytes kps kps orig-kb orig-kb]

before 3.2.2:

DONE taper hostname diskname dump_timestamp nparts level [sec secs kb kb kps kps orig-kb orig-kb]

before 3.1:

DONE taper hostname diskname dump_timestamp nparts level [sec secs kb kb kps kps]

This indicates that the taper has completely written the given dump to tape, and will only occur after all parts are written.

History:

  • The kb is replaced by bytes in version 3.2.2
  • The orig-kb is added in version 3.1.0.

PARTIAL

PARTIAL taper hostname diskname dump_timestamp nparts level [sec secs byte bytes kps kps orig-kb orig-kb] "message"

before 3.2.2:

PARTIAL taper hostname diskname dump_timestamp nparts level [sec secs kb kb kps kps orig-kb orig-kb] "message"

before 3.1:

PARTIAL taper hostname diskname dump_timestamp nparts level [sec secs kb kb kps kps] "message"

This indicates that the taper has written at least a file header to the volume, but encountered an error before finishing the dump. The format is identical to DONE, with the addition of a quoted error message at the end. nparts is the part number during which the failure occurred (so, for example, if one whole part was written, and then the taper ran out of space during the second part, nparts would be 2). The message is omitted when the taper writes an incomplete dumpfile without error (when the driver replies to DUMPER_STATUS with FAILED). Like the PARTIAL command in the Driver-Taper protocol, the statistics section of this command does not include any failed parts (although note that, in Amanda-2.6.1 and earlier, some failed parts were included). Thus, nparts will always be 1 or more, but kb and secs may be zero if an error occurred before any data was written. The orig-kb is only specified in logfiles after version 3.1.

History:

  • The kb is replaced by bytes in version 3.2.2
  • The orig-kb is added in version 3.1.0.

FAIL

FAIL taper hostname diskname dump_timestamp level "message"

No data from the described dump was written to the volume. This is similar to PARTIAL, except that PARTIAL is used when some (but not all) data was written.

Note that, if a dump fails before the taper even gets started (e.g., if the dumper or chunker fails to send a header), then this line may be logged even before a START taper line.

INFO / WARNING

INFO taper tape label kb kb fm num_files [OK]

this line is not interpreted by Amanda, but is familiar to users from amreport. It summarizes the usage of the volume with label label. The kb includes only successful parts (not PARTPARTIAL), but the num_files includes all files written to tape, partial or complete. The [OK] is always presend and thus meaningless.

INFO taper Will request retry of failed split part.

This is added to the log when writing a split part fails but can be re-started on the next volume.

INFO taper Will write new label `label' to new tape

This summarizes the result of a taper scan. Note that this line is added to the logfile before the driver has a chance to refuse permission to overwrite the tape.

ERROR

ERROR taper no-tape config|error [reason]

This line displays reasons that the taper could not find a volume to write to (and thus failed). Note that the reason is not quoted in the usual Amanda fashion. The config keyword means it is because of the configuration, the error means any others reason.

SUCCESS

SUCCESS taper hostname diskname dump_timestamp level [sec sec kb kb kps kps {random statistics}]

This is an old line indicating a complete, one-part dump.

dumper

SUCCESS

 SUCCESS dumper hostname disk level [sec secs kb kb kps kps orig-kb orig_kb]

A SUCCESS line indicates that the dump was successful.

STRANGE

 STRANGE dumper hostname disk level [sec secs kb kb kps kps orig-kb orig_kb]

A STRANGE line indicates that something unexpected happened during the dump, but did not necessarily cause a failure. Note that no datestamp is included, unlike the SUCCESS line.

chunker

SUCCESS/PARTIAL

 SUCCESS chunker hostname disk timestamp level [sec secs kb kb kps kps]

A chunker SUCCESS indicates that the chunker was able to write all the chunks to the holding disk.

A chunker PARTIAL line signifies that the chunker encountered an error and was only able to write a number of the chunks to the holding disk.

FAIL

 FAIL chunker hostname disk timestamp level [errmsg]

A FAILED line signifies a complete failure to write any data to holding.

driver

FINISH

 FINISH driver date timestamp time secs

A FINISH line indicates that the driver program has finished the core of its operation. This line is expected to be found in every successful logfile. If amreport does not find this line, it will suspect something went wrong.

STATS

 STATS driver estimate hostname disk timestamp level [sec secs nkb nkb ckb ckb kps kps]

These estimates are computed and used during a dump for various calculations.

 STATS driver hostname hostname

Hostname lines appear in the beginning of a run, mostly to list which hosts are going to be backed up.

 STATS driver startup time secs

This STATS line states how long it took the driver to execute before dumps started.

planner

FINISH

 FINISH planner date timestamp time sec

The FINISH line indicates that the planner has finished at the indicated timestamp, and how long it took to run, in seconds.

SUCCESS

 SUCCESS planner hostname disk datestamp level [skipped: reason]

The planner logs a SUCCESS line for a dump that it has decided to skip (Wouldn't SKIPPED make more sense, you ask? Amanda specializes in not making sense.) For extra security, Amanda::Report checks for the skipped, in the error messages, too.

Examples

Below are some examples of important messages with explanation in this log file.

  • START taper datestamp datestamp label tape-label tape current-tape
START taper datestamp 20070305162948 label test-00 tape 0

This indicates that the taper started to write a tape.

  • INFO taper tape tape label kb total tape used fm filemark [error or OK]
INFO taper tape test-00 kb 1472 fm 1 [OK]

This indicates that the taper has completed a tape and is writing and end marker.

  • FINISH planner date datestamp time time elapsed since planner started.
FINISH planner date 20070305162948 time 0.294

This indicates that the planner has finished creating its plan, and dumping can begin.

  • SUCCESS taper client disk datestamp level [ sec time-took kb output-kb kp/s KB-per-second {wr: writers number of writes rdwait read wait time wrwait write wait time filemark filemark wait time}]
SUCCESS taper localhost /tmp/quicktest/data 20070305162948 0 [sec 0.205 kb 1440 kps 7021.9 {wr: writers 46 rdwait 0.192 wrwait 0.011 filemark 0.0}]

This indicates that a dump has been successfully written to tape.

  • SUCCESS dumper client disk timestamp level [sec total runtime kb size of dump kps kp per second orig-kb original size]
SUCCESS dumper localhost /tmp/quicktest/data 20070305162948 0 [sec 0.169 kb 1430 kps 8437.1 orig-kb 1430] 

This indicates that a dump has been successfully dumped (either to holding or directly to tape).