GNU Tar Include and Exclude Behavior

From The Open Source Backup Wiki (Amanda, MySQL Backup, BackupPC)

Revision as of 23:11, 18 May 2011

This table represents the results of installcheck/gnutar.pl across multiple GNU Tar versions. Note that this page only deals with include and exclude behavior; see the GNU Tar FAQ entry for other undesirable behaviors.

This was tested against tar versions 1.15, 1.15.1, 1.16, 1.17, 1.18, 1.19, 1.20, 1.21, 1.22, 1.22.90, 1.23, and the current git HEAD (e21d54e8c).

NOTE: the behavior described here for 1.25 and higher has not yet been validated or worked around. Be cautious.
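Because everything on this page depends on the tar version, it helps to turn the version string into something comparable. The following is a minimal sketch, assuming the first line of `tar --version` output has the form "tar (GNU tar) 1.22.90"; the function name parse_tar_version is hypothetical and not part of Amanda or installcheck/gnutar.pl.

```python
import re

def parse_tar_version(version_output):
    """Return the GNU Tar version as a tuple of ints, or None if not GNU Tar.

    Assumes the first line looks like "tar (GNU tar) 1.22.90".
    """
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    m = re.search(r"\(GNU tar\)\s+(\d+(?:\.\d+)*)", first_line)
    if m is None:
        return None
    return tuple(int(part) for part in m.group(1).split("."))
```

Tuples compare element by element, so (1, 15, 1) < (1, 16) and (1, 22, 90) > (1, 22) hold as expected.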

Matching Types

GNU Tar exhibits the following types of matching behavior:

type ∅
Literal matching - no wildcards or escaping.
type α
Only *?[\ are special, and \ can escape only those special characters. Any other backslash sequence is unquoted according to the manual (e.g., E\E matches itself but not EE, while N\nN matches only a name containing an embedded newline). In particular, note that \?, which is unquoted to \0177, does not match a literal ? or \?.
type β
Only *?[\ are special, and \ can escape any character, so \X and X will both match X. However, both \? and \\ are buggy and will not match ? and \, respectively.
type γ
Only *?[\ are special, and \ can escape any character. There is no problem with \? or \\.
type δ
Only *?[ are special, and no escaping is possible (note that this is a bug in versions 1.22.90 to 1.23).
type ε
Only literal matches are accepted, except that both \ and \\ will match \.
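The definitions above can be approximated in a few lines of Python. This is only a sketch of types ∅, γ, δ, and ε as described on this page, not GNU Tar's actual implementation. All function names are hypothetical. Two assumptions are made that the page does not state: an unterminated [ is treated as a literal character, and * is allowed to match any character including /.

```python
import re

def glob_to_regex(pattern, escapes=True):
    """Translate a *?[ glob into a regex; backslash escapes are optional."""
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        i += 1
        if c == "\\" and escapes and i < len(pattern):
            out.append(re.escape(pattern[i]))  # \X stands for a literal X
            i += 1
        elif c == "*":
            out.append(".*")
        elif c == "?":
            out.append(".")
        elif c == "[":
            j = pattern.find("]", i)
            body = pattern[i:j] if j != -1 else ""
            negate = body[:1] in ("!", "^")
            members = body[1:] if negate else body
            if not members:
                out.append(re.escape("["))  # assumption: unterminated [ is literal
            else:
                cls = "".join(ch if ch == "-" else re.escape(ch) for ch in members)
                out.append("[" + ("^" if negate else "") + cls + "]")
                i = j + 1
        else:
            out.append(re.escape(c))
    return "".join(out)

def match_literal(pattern, name):
    """Type ∅: no wildcards, no escaping."""
    return pattern == name

def match_gamma(pattern, name):
    """Type γ: *?[\\ are special and \\ escapes any character."""
    return re.fullmatch(glob_to_regex(pattern, True), name, re.DOTALL) is not None

def match_delta(pattern, name):
    """Type δ: *?[ are special and a backslash is just a backslash."""
    return re.fullmatch(glob_to_regex(pattern, False), name, re.DOTALL) is not None

def match_epsilon(pattern, name):
    """Type ε: literal, except that \\\\ in the pattern also matches one \\."""
    return pattern == name or pattern.replace("\\\\", "\\") == name
```

For example, match_gamma("A\\*A", "A*A") is true while match_gamma("A\\*A", "AxA") is false, which mirrors the A rows of the tables below. Types α and β are left out because their unquoting quirks (such as \? becoming \0177) would need a model of tar's unquoting rules.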

Include Expressions

Result columns (* marks the default):
  no wildcards*, unq*:     <1.16 | ≥1.16
  no wildcards*, no unq:   <1.16 | ≥1.16
  wildcards, unq*:         <1.16 | ≥1.16 and <1.25 | ≥1.25
  wildcards, no unq:       <1.16 | ≥1.16 and <1.25 | ≥1.25
Matching types: α ε   α β ζ   γ η

pat        file
./A*A      A*A
./A*A      AxA
./A\*A     A*A
./A\*A     AxA
./B?B      B?B
./B?B      BxB
./B\?B     B?B
./B\?B     BxB
./C[C      C[C
./C\[C     C[C
./D\]D     D]D
./D]D      D]D
./E\E      E\E
./E\\E     E\E
./F'F      F'F
./F\'F     F'F
./G"G      G"G
./G\"G     G"G
./H H      H H
./H\ H     H H

Summary

  • Single quotes ('), double quotes ("), and spaces always match themselves exactly, regardless of wildcards.
  • --no-wildcards and --unquote are the default: supplying these options ordinarily has no effect. However, note that many binary distributions of tar (notably, RedHat distros) reverse this and make --wildcards the default!
  • Include behavior changed with version 1.16 (actually, apparently in 1.15.91, but we'll call it 1.16 in this document):
    • In versions before 1.16, the wildcard option is ignored for includes, and type α matching is always applied. The --unquote and --no-unquote options are not recognized and will cause an error.
    • In versions 1.16 and higher, when wildcard matching is disabled, type ε matching is applied under --unquote, and type ∅ is applied under --no-unquote. When wildcard matching is enabled, type β matching is applied under --unquote (the default) and type γ matching is applied under --no-unquote.
  • Note that --{no-}anchored and --{no-}wildcards-match-slash are not examined here.
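The include rules in this summary can be written down as a small lookup. The sketch below encodes only what the summary states. The function names are hypothetical, versions are assumed to be tuples such as (1, 22, 90), and None is returned for 1.25 and higher because this page marks that behavior as not validated.

```python
def include_matching_type(version, wildcards=False, unquote=True):
    """Return the matching type applied to include expressions, or None."""
    if version < (1, 16):
        return "α"  # wildcard option is ignored before 1.16
    if version >= (1, 25):
        return None  # not validated on this page
    if wildcards:
        return "β" if unquote else "γ"
    return "ε" if unquote else "∅"

def include_options(version, wildcards=False, unquote=True):
    """Build explicit options so that distribution defaults do not matter."""
    opts = ["--wildcards" if wildcards else "--no-wildcards"]
    if version >= (1, 16):
        # --unquote and --no-unquote are an error before 1.16
        opts.append("--unquote" if unquote else "--no-unquote")
    return opts
```

Passing the options explicitly, for example include_options((1, 22), wildcards=False, unquote=False), avoids depending on a distribution that has changed the default to --wildcards.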

Exclude Expressions

Result columns (* marks the default):
  no wildcards:   <1.22.90 or >1.23 | ≥1.22.90 and ≤1.23
  wildcards*:     <1.22.90 or >1.23 | ≥1.22.90 and ≤1.23 | ≥1.25
Matching types: γ δ

pat        file
./A*A      A*A
./A*A      AxA
./A\*A     A*A
./A\*A     AxA
./B?B      B?B
./B?B      BxB
./B\?B     B?B
./B\?B     BxB
./C[C      C[C
./C\[C     C[C
./D\]D     D]D
./D]D      D]D
./E\E      E\E
./E\\E     E\E
./F'F      F'F
./F\'F     F'F
./G"G      G"G
./G\"G     G"G
./H H      H H
./H\ H     H H

Summary

  • Single quotes ('), double quotes ("), and spaces always match themselves exactly, regardless of wildcards.
  • The default behavior is identical to --wildcards. --unquote and --no-unquote are ignored, but it is not an error to supply them.
  • Behavior is identical whether excluding during extract (-x) or create (-c).
  • When wildcards are disabled, they are truly disabled: only literal matches are accepted (type ∅).
  • When wildcards are enabled:
    • In versions <1.22.90 or >1.23, type γ matching is applied.
    • In versions 1.22.90 to 1.23, type δ matching is applied. This is a reported bug that has been fixed in version control but not yet released.
  • Note that --{no-}anchored and --{no-}wildcards-match-slash are not examined here.
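The exclude rules reduce to a similar lookup. This is a sketch of the summary above, with a hypothetical function name. Versions are assumed to be tuples of at most three integers, and None is returned for 1.25 and higher because that behavior has not been validated here.

```python
def exclude_matching_type(version, wildcards=True):
    """Return the matching type applied to exclude expressions, or None."""
    # Pad to three components so that (1, 23) and (1, 23, 0) compare equal.
    v = tuple(version) + (0,) * (3 - len(version))
    if v >= (1, 25, 0):
        return None  # not validated on this page
    if not wildcards:
        return "∅"  # truly literal
    if (1, 22, 90) <= v <= (1, 23, 0):
        return "δ"  # escaping bug in 1.22.90 to 1.23
    return "γ"
```

Wildcards default to enabled here because the default exclude behavior is identical to --wildcards; --unquote and --no-unquote are not parameters because they are ignored for excludes.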

List (-t) output

The -t option quotes some non-printing characters as described in the GNU Tar manual. Its behavior is consistent across all of the versions tested above. However, note that locale-sensitive functions are used, in particular to determine whether a character is printable. With an LC_CTYPE other than C, high-ASCII characters are considered printable and will not be quoted. This behavior is implemented using the gnulib quotearg module.

character (hex)    becomes
BEL (07)           \a
BS (08)            \b
HT (09)            \t
LF (0a)            \n
VT (0b)            \v
FF (0c)            \f
CR (0d)            \r
\ (5c)             \\
DEL (7f)           \177
(80)               \200
\x (5c 78)         \\x
\\ (5c 5c)         \\\\
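The table can be reproduced with a short function, which is handy when comparing expected names against -t output. This is a sketch for the C locale only and is not the gnulib quotearg code. The function name is hypothetical, and it assumes that every other control or high byte is written as a three-digit octal escape, following the pattern of the DEL and 80 rows.

```python
# Bytes with a named escape, taken from the table above.
_NAMED = {
    0x07: "\\a", 0x08: "\\b", 0x09: "\\t", 0x0A: "\\n",
    0x0B: "\\v", 0x0C: "\\f", 0x0D: "\\r", 0x5C: "\\\\",
}

def quote_for_listing(name: bytes) -> str:
    """Quote a file name roughly as tar -t would print it in the C locale."""
    out = []
    for b in name:
        if b in _NAMED:
            out.append(_NAMED[b])
        elif b < 0x20 or b >= 0x7F:
            out.append("\\%03o" % b)  # assumption: octal escape for the rest
        else:
            out.append(chr(b))
    return "".join(out)
```

Because a backslash in the name is doubled, output such as \\x can be told apart from a real escape sequence.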