GNU Tar Include and Exclude Behavior: Difference between revisions

From wiki.zmanda.com
(info about list output)
Line 1: Line 1:
This table represents the results of <tt>installcheck/gnutar.pl</tt> across multiple GNU Tar versions.  Note that this page only deals with include and exclude behavior; see [[FAQ:What versions of GNU Tar are Amanda-compatible?|the GNU Tar FAQ entry]] for other undesirable behaviors.
This table represents the results of <tt>installcheck/gnutar.pl</tt> across multiple GNU Tar versions.  Note that this page only deals with include and exclude behavior; see [[FAQ:What versions of GNU Tar are Amanda-compatible?|the GNU Tar FAQ entry]] for other undesirable behaviors.


= Include and Exclude Expressions =
<table border=1>
<table border=1>
<tr>
<tr>
Line 480: Line 481:
This was tested against tar versions 1.15, 1.15.1, 1.16, 1.17, 1.18, 1.19, 1.20, 1.21, 1.22, 1.23, and the current git HEAD (e21d54e8c).
This was tested against tar versions 1.15, 1.15.1, 1.16, 1.17, 1.18, 1.19, 1.20, 1.21, 1.22, 1.23, and the current git HEAD (e21d54e8c).


= Summary =
== Summary ==
This is the most concise summary I can invent.  Yes, there are *five* different matching schemes implemented in GNU tar.
This is the most concise summary I can invent.  Yes, there are *five* different matching schemes implemented in GNU tar.


Line 505: Line 506:
;type &epsilon;: Only literal matches are accepted, ''except'' that both <tt>\</tt> and <tt>\\</tt> will match <tt>\</tt>.
;type &epsilon;: Only literal matches are accepted, ''except'' that both <tt>\</tt> and <tt>\\</tt> will match <tt>\</tt>.


== Note regarding -t ==
= List (-t) output =
Note that testing these options using tar's -t option will lead to confusing results, since the output of the -t command has backslashes escaped with backslashes, although it does not escape any other characters - making it a decent, though not ideal, input for &epsilon;.
The -t option performs quoting of some non-printing characters as described in [http://www.gnu.org/software/tar/manual/html_node/Selecting-Archive-Members.html the GNU Tar manual].  Its behavior is consistent over all of the versions tested above:
 
<table>
<tr><th>character (hex)</th><th>becomes</th></tr>
<tr><td>BEL (07)</td><td>\a</td></tr>
<tr><td>BS (08)</td><td>\b</td></tr>
<tr><td>HT (09)</td><td>\t</td></tr>
<tr><td>LF (0a)</td><td>\n</td></tr>
<tr><td>VT (0b)</td><td>\v</td></tr>
<tr><td>FF (0c)</td><td>\f</td></tr>
<tr><td>CR (0d)</td><td>\r</td></tr>
<tr><td>\ (5C)</td><td>\\</td></tr>
<tr><td>DEL (7f)</td><td>\177</td></tr>
<tr><td>(80)</td><td>\200</td></tr>
<tr><td>\x (5C 78)</td><td>\\x</td></tr>
<tr><td>\\ (5C 5C)</td><td>\\\\</td></tr>
</table>


= To Do =
= To Do =
* Explore the inside of character classes: how do you specify ] or \ in a character class?  Negation?
* Explore the inside of character classes: how do you specify ] or \ in a character class?  Negation?
* Look at the source to figure out what's going on with backslashes
* Look at the source to figure out what's going on with backslashes
* --quote and --unquote

Revision as of 15:12, 28 May 2010

This table represents the results of installcheck/gnutar.pl across multiple GNU Tar versions. Note that this page only deals with include and exclude behavior; see the GNU Tar FAQ entry for other undesirable behaviors.

Include and Exclude Expressions

Matching type applied, by option and tar version:

            include                   exclude
            <1.16  1.16-22  >1.22     <1.23  1.23  >1.23
no args     α      ε        ε         γ      δ     γ
-wc         α      β        β         γ      δ     γ
-no-wc      α      ε        ε         ∅      ∅     ∅

Patterns and file names tested:

pat         file
./A*A       A*A
./A*A       AxA
./A\*A      A*A
./A\*A      AxA
./B?B       B?B
./B?B       BxB
./B\?B      B?B
./B\?B      BxB
./C[C       C[C
./C\[C      C[C
./D\]D      D]D
./D]D       D]D
./E\E       E\E
./E\\E      E\E
./F'F       F'F
./F\'F      F'F
./G"G       G"G
./G\"G      G"G
./H H       H H
./H\ H      H H
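
To try the patterns above by hand, an archive holding the awkward file names is needed. The following is a minimal sketch using Python's standard tarfile module; it is not the installcheck/gnutar.pl script, and the names FILE_NAMES, build_fixture_archive, and list_members are made up for this illustration. It only builds the archive; the resulting bytes can be written to disk and passed to whichever tar version is under test.

```python
import io
import tarfile

# File names taken from the "file" column of the table (duplicates removed).
FILE_NAMES = ["A*A", "AxA", "B?B", "BxB", "C[C", "D]D", "E\\E", "F'F", 'G"G', "H H"]


def build_fixture_archive(names=FILE_NAMES) -> bytes:
    """Return an uncompressed tar archive with one small member per name."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name in names:
            payload = name.encode("utf-8")
            # Assumption: members are stored with a leading "./",
            # mirroring the "./" prefix used by the patterns.
            info = tarfile.TarInfo(name="./" + name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def list_members(data: bytes) -> list:
    """Return the member names stored in an in-memory tar archive."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as archive:
        return archive.getnames()
```

Building the archive in memory keeps the sketch free of shell quoting, which would otherwise add a second layer of escaping on top of the one being tested.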

This was tested against tar versions 1.15, 1.15.1, 1.16, 1.17, 1.18, 1.19, 1.20, 1.21, 1.22, 1.23, and the current git HEAD (e21d54e8c).

Summary

This is the most concise summary I can invent. Yes, there are *five* different matching schemes implemented in GNU tar.

  • Single quotes ('), double quotes ("), and spaces always match themselves exactly, regardless of wildcards.
  • Includes
    • The default behavior is identical to --no-wildcards
    • Behavior changed with version 1.16:
      • In versions before 1.16, the wildcard option is ignored for includes, and type α wildcard matching is always applied.
      • In versions 1.16 and higher, when wildcard matching is enabled, type β wildcard matching is applied. When wildcard matching is disabled, type ε matching is applied (!).
  • Excludes
    • The default behavior is identical to --wildcards.
    • Behavior is identical whether excluding on extract (-x) or create (-c).
    • When wildcards are disabled, they are truly disabled: only literal matches are accepted (type ∅).
    • When wildcards are enabled, version 1.23 has a bug that causes incorrect behavior:
      • In versions other than 1.23, when wildcards are enabled, type γ matching is applied.
      • In version 1.23, when wildcards are enabled, type δ matching is applied.
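
The rules in the list above can be written down as a small lookup. This is only a sketch of the summary, not code from GNU tar or Amanda: the function names are hypothetical, versions are assumed to be dotted numbers such as "1.15.1" (a git HEAD hash cannot be parsed this way), and any 1.23.x release is assumed to behave like 1.23.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '1.15.1' into (1, 15, 1)."""
    return tuple(int(part) for part in text.split("."))


def matching_type(kind: str, version: str, wildcards=None) -> str:
    """Return the matching type named in the summary.

    kind      -- "include" or "exclude"
    version   -- dotted GNU tar version string
    wildcards -- True for --wildcards, False for --no-wildcards,
                 None when neither option is given
    """
    v = parse_version(version)
    if kind == "include":
        if v < (1, 16):
            return "α"  # the wildcard option is ignored before 1.16
        # The default for includes is the same as --no-wildcards.
        return "β" if wildcards else "ε"
    if kind == "exclude":
        if wildcards is False:
            return "∅"  # truly literal matching
        # The default for excludes is the same as --wildcards.
        return "δ" if v[:2] == (1, 23) else "γ"
    raise ValueError("kind must be 'include' or 'exclude'")
```

For example, matching_type("exclude", "1.23") gives "δ", while the same call for "1.22" gives "γ".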

Matching types mentioned above:

type ∅
Literal matching - no wildcards or escaping.
type α
Only *?[\ are special, and only special characters can be escaped by \ - otherwise, the escaping backslash is treated literally (e.g., E\E matches against itself, but not against EE). There is a bug with \?, which is treated as \0177 internally.
type β
Only *?[\ are special, and \ can escape any character, so \X and X will both match X. However, both \? and \\ are buggy and will not match ? and \, respectively.
type γ
Only *?[\ are special, and \ can escape any character. There is no bug with \? or \\.
type δ
Only *?[ are special, and no escaping is possible (note that this is a bug in version 1.23).
type ε
Only literal matches are accepted, except that both \ and \\ will match \.
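
The two literal schemes, ∅ and ε, are simple enough to sketch directly. The functions below are hypothetical illustrations of the descriptions above, not GNU tar's implementation. In particular, the sketch assumes that type ε collapses every \\ in the pattern to a single \ before comparing; how tar treats a file name that itself contains two consecutive backslashes was not tested here.

```python
def match_literal(pattern: str, name: str) -> bool:
    """Type ∅: the pattern must equal the name exactly."""
    return pattern == name


def match_epsilon(pattern: str, name: str) -> bool:
    """Type ε: literal matching, except that \\\\ in the pattern also stands for \\.

    Assumption: each doubled backslash in the pattern is collapsed to a
    single backslash, and the result is compared literally.
    """
    return pattern.replace("\\\\", "\\") == name
```

With these definitions, the patterns ./E\E and ./E\\E both match the name ./E\E under type ε, but only the first does under type ∅.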

List (-t) output

The -t option performs quoting of some non-printing characters as described in the GNU Tar manual. Its behavior is consistent over all of the versions tested above:

character (hex)   becomes
BEL (07)          \a
BS (08)           \b
HT (09)           \t
LF (0a)           \n
VT (0b)           \v
FF (0c)           \f
CR (0d)           \r
\ (5C)            \\
DEL (7f)          \177
(80)              \200
\x (5C 78)        \\x
\\ (5C 5C)        \\\\
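
The quoting in the table above can be approximated in a few lines. This is a sketch, not tar's own code: quote_for_list and NAMED_ESCAPES are hypothetical names, and the handling of bytes that do not appear in the table (other control characters and bytes above 0x80) is an assumption that generalizes the DEL and 0x80 rows to a three-digit octal escape.

```python
# Escapes taken from the table; keys are byte values.
NAMED_ESCAPES = {
    0x07: "\\a",
    0x08: "\\b",
    0x09: "\\t",
    0x0A: "\\n",
    0x0B: "\\v",
    0x0C: "\\f",
    0x0D: "\\r",
    0x5C: "\\\\",  # a backslash becomes two backslashes
}


def quote_for_list(name: bytes) -> str:
    """Approximate how a member name appears in `tar -t` output."""
    out = []
    for byte in name:
        if byte in NAMED_ESCAPES:
            out.append(NAMED_ESCAPES[byte])
        elif 0x20 <= byte < 0x7F:
            out.append(chr(byte))  # printable ASCII passes through
        else:
            # Assumption: everything else is written as a 3-digit octal escape.
            out.append("\\%03o" % byte)
    return "".join(out)
```

Because only the backslash is doubled and no wildcard character is escaped, the listed names are not safe to feed back as patterns to the wildcard-aware matching types.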

To Do

  • Explore the inside of character classes: how do you specify ] or \ in a character class? Negation?
  • Look at the source to figure out what's going on with backslashes
  • --quote and --unquote