GNU Tar Include and Exclude Behavior: Difference between revisions

(update with info from git head)
(tag columns with matching type)
Line 3: Line 3:
<table border=1>
<table border=1>
<tr>
<tr>
<th rowspan='3' valign='bottom'>pat</th>
<th rowspan='4' valign='bottom'>pat</th>
<th rowspan='3' valign='bottom'>file</th>
<th rowspan='4' valign='bottom'>file</th>
<th colspan='9' align='center'>include</th>
<th colspan='9' align='center'>include</th>
<th colspan='9' align='center'>exclude</th>
<th colspan='9' align='center'>exclude</th>
Line 28: Line 28:
<th align='center'>&lt;1.23</th>
<th align='center'>&lt;1.23</th>
<th align='center'>1.23</th>
<th align='center'>1.23</th>
<th align='center'>&gt1.23</th>
<th align='center'>&gt;1.23</th>
<th align='center'>&lt;1.23</th>
<th align='center'>&lt;1.23</th>
<th align='center'>1.23</th>
<th align='center'>1.23</th>
Line 35: Line 35:
<th align='center'>1.23</th>
<th align='center'>1.23</th>
<th align='center'>&gt;1.23</th>
<th align='center'>&gt;1.23</th>
</tr>
<tr>
<th align='center'>&alpha; </th>
<th align='center'>&eta;</th>
<th align='center'>&eta;</th>
<th align='center'>&alpha;</th>
<th align='center'>&beta;</th>
<th align='center'>&beta;</th>
<th align='center'>&alpha; </th>
<th align='center'>&eta;</th>
<th align='center'>&eta;</th>
<th align='center'>&gamma;</th>
<th align='center'>&delta;</th>
<th align='center'>&gamma;</th>
<th align='center'>&gamma;</th>
<th align='center'>&delta;</th>
<th align='center'>&gamma;</th>
<th align='center'>&empty;</th>
<th align='center'>&empty;</th>
<th align='center'>&empty;</th>
</tr>
</tr>
<tr>
<tr>
Line 471: Line 491:
* Excludes
* Excludes
** The default behavior is identical to --wildcards
** The default behavior is identical to --wildcards
** When wildcards are disabled, they are truly disabled: only literal matches are accepted.
** When wildcards are disabled, they are truly disabled: only literal matches are accepted (type &empty;).
** When wildcards are enabled, version 1.23 has a bug that causes incorrect behavior:
** When wildcards are enabled, version 1.23 has a bug that causes incorrect behavior:
*** In versions other than 1.23, when wildcards are enabled, type &gamma; matching is applied.   
*** In versions other than 1.23, when wildcards are enabled, type &gamma; matching is applied.   
Line 477: Line 497:


Matching types mentioned above:
Matching types mentioned above:
;type &empty;: Literal matching - no wildcards or escaping.
;type &alpha;: Only <tt>*?[\</tt> are special, and only special characters can be escaped by <tt>\</tt> - otherwise, the escaping backslash is treated literally (e.g., <tt>E\E</tt> matches against itself, but not against <tt>EE</tt>).  There is a bug with <tt>\?</tt>, which is treated as <tt>\0177</tt> internally.
;type &alpha;: Only <tt>*?[\</tt> are special, and only special characters can be escaped by <tt>\</tt> - otherwise, the escaping backslash is treated literally (e.g., <tt>E\E</tt> matches against itself, but not against <tt>EE</tt>).  There is a bug with <tt>\?</tt>, which is treated as <tt>\0177</tt> internally.
;type &beta;: Only <tt>*?[\</tt> are special, and <tt>\</tt> can escape any character, so <tt>\X</tt> and <tt>X</tt> will both match <tt>X</tt>.  However, both <tt>\?</tt> and <tt>\\</tt> are buggy and will not match <tt>?</tt> and <tt>\</tt>, respectively.
;type &beta;: Only <tt>*?[\</tt> are special, and <tt>\</tt> can escape any character, so <tt>\X</tt> and <tt>X</tt> will both match <tt>X</tt>.  However, both <tt>\?</tt> and <tt>\\</tt> are buggy and will not match <tt>?</tt> and <tt>\</tt>, respectively.

Revision as of 21:37, 26 May 2010

This table represents the results of installcheck/gnutar.pl across multiple GNU Tar versions. Note that this page only deals with include and exclude behavior; see the GNU Tar FAQ entry for other undesirable behaviors.

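Column layout (each option spans three version columns, and the last three fields give the matching type for each of them):

side     option   versions                  matching type
include  no args  <1.16   1.16-22   >1.22   α   η   η
include  -wc      <1.16   1.16-22   >1.22   α   β   β
include  -no-wc   <1.16   1.16-22   >1.22   α   η   η
exclude  no args  <1.23   1.23      >1.23   γ   δ   γ
exclude  -wc      <1.23   1.23      >1.23   γ   δ   γ
exclude  -no-wc   <1.23   1.23      >1.23   ∅   ∅   ∅

Patterns and files tested (the per-cell results did not survive in this text):

pat        file
./A*A      A*A
./A*A      AxA
./A\*A     A*A
./A\*A     AxA
./B?B      B?B
./B?B      BxB
./B\?B     B?B
./B\?B     BxB
./C[C      C[C
./C\[C     C[C
./D\]D     D]D
./D]D      D]D
./E\E      E\E
./E\\E     E\E
./F'F      F'F
./F\'F     F'F
./G"G      G"G
./G\"G     G"G
./H H      H H
./H\ H     H H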

This was tested against tar versions 1.15, 1.15.1, 1.16, 1.17, 1.18, 1.19, 1.20, 1.21, 1.22, 1.23, and the current git HEAD (e21d54e8c).

Summary

This is the most concise summary I can invent. Yes, there are *five* different matching schemes implemented in GNU tar, not counting plain literal matching (type ∅).

  • Single quotes ('), double quotes ("), and spaces always match themselves exactly, regardless of wildcards.
  • Includes
    • The default behavior is identical to --no-wildcards
    • Behavior changed with version 1.16:
      • In versions before 1.16, the wildcard option is ignored for includes, and type α wildcard matching is always applied.
      • In versions 1.16 and higher, when wildcard matching is enabled, type β wildcard matching is applied. When wildcard matching is disabled, type ε matching is applied (!).
  • Excludes
    • The default behavior is identical to --wildcards
    • When wildcards are disabled, they are truly disabled: only literal matches are accepted (type ∅).
    • When wildcards are enabled, version 1.23 has a bug that causes incorrect behavior:
      • In versions other than 1.23, when wildcards are enabled, type γ matching is applied.
      • In version 1.23, when wildcards are enabled, type δ matching is applied.
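
The rules in this summary can be written down as a small lookup. The following is only a sketch in Python: the function name `matching_type`, its arguments, and the tuple representation of version numbers are assumptions made for illustration and are not part of GNU tar or of installcheck/gnutar.pl. Versions after 1.23 are represented only by the tested git HEAD, so the result for later releases is an extrapolation.

```python
def matching_type(kind, version, wildcards=None):
    """Return the matching-type label that the summary assigns to a case.

    kind:      "include" or "exclude"
    version:   tuple of ints, e.g. (1, 15, 1) or (1, 23)  (hypothetical encoding)
    wildcards: True for --wildcards, False for --no-wildcards,
               None when neither option is given
    """
    if kind not in ("include", "exclude"):
        raise ValueError("kind must be 'include' or 'exclude'")

    # Only the first two components matter for the rules in the summary.
    major_minor = tuple(version[:2])

    if kind == "include":
        if major_minor < (1, 16):
            # Before 1.16 the wildcard option is ignored for includes.
            return "α"
        # Includes default to the --no-wildcards behavior.
        enabled = False if wildcards is None else wildcards
        return "β" if enabled else "ε"

    # Excludes default to the --wildcards behavior.
    enabled = True if wildcards is None else wildcards
    if not enabled:
        return "∅"
    # Version 1.23 is the buggy outlier for excludes.
    return "δ" if major_minor == (1, 23) else "γ"


if __name__ == "__main__":
    for case in [("include", (1, 15, 1), True),
                 ("include", (1, 22), None),
                 ("exclude", (1, 23), None),
                 ("exclude", (1, 22), False)]:
        print(case, "->", matching_type(*case))
```

The labels refer to the matching types defined under "Matching types mentioned above".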

Matching types mentioned above:

type ∅
Literal matching - no wildcards or escaping.
type α
Only *?[\ are special, and only special characters can be escaped by \ - otherwise, the escaping backslash is treated literally (e.g., E\E matches against itself, but not against EE). There is a bug with \?, which is treated as \0177 internally.
type β
Only *?[\ are special, and \ can escape any character, so \X and X will both match X. However, both \? and \\ are buggy and will not match ? and \, respectively.
type γ
Only *?[\ are special, and \ can escape any character. There is no bug with \? or \\.
type δ
Only *?[ are special, and no escaping is possible (note that this is a bug in version 1.23).
type ε (labelled η in the table)
Only literal matches are accepted, except that both \ and \\ will match \.
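
To make the differences concrete, here is a toy Python model of the four types that are described without bugs (∅, γ, δ and ε). It is a sketch built from the descriptions above, not GNU tar's actual code. All function names are hypothetical. Bracket expressions are deliberately not modelled (a `[` is treated as a literal character), since their behavior is still listed under To Do, and `*` is assumed to match any run of characters. The buggy types α and β are left out.

```python
import re


def _glob_to_regex(pattern, backslash_escapes):
    """Translate a pattern into a compiled regex under the simplified model."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and backslash_escapes and i + 1 < len(pattern):
            # Backslash escapes the next character, whatever it is.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            # Everything else is literal, including '[' (assumption) and
            # a backslash when escaping is turned off.
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def match_literal(pattern, name):
    """Type ∅: literal matching, no wildcards or escaping."""
    return pattern == name


def match_gamma(pattern, name):
    """Type γ: * and ? are wildcards, backslash can escape any character."""
    return _glob_to_regex(pattern, True).fullmatch(name) is not None


def match_delta(pattern, name):
    """Type δ: * and ? are wildcards, no escaping is possible."""
    return _glob_to_regex(pattern, False).fullmatch(name) is not None


def match_epsilon(pattern, name):
    """Type ε: literal, except that a doubled backslash also matches one."""
    return pattern.replace("\\\\", "\\") == name


if __name__ == "__main__":
    # Pattern A\*A against file A*A, one of the pairs from the table.
    for matcher in (match_literal, match_gamma, match_delta, match_epsilon):
        print(matcher.__name__, matcher("A\\*A", "A*A"))
```

Under this model the pattern `A\*A` matches the file `A*A` only with type γ, which is the kind of difference the table is meant to expose.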

To Do

  • Explore the inside of character classes: how do you specify ] or \ in a character class? Negation?
  • Look at the source to figure out what's going on with backslashes