GNU Tar Include and Exclude Behavior

Revision as of 20:50, 25 May 2010

This table represents the results of installcheck/gnutar.pl across multiple GNU Tar versions. Note that this page only deals with include and exclude behavior; see the GNU Tar FAQ entry for other undesirable behaviors.

Columns: pat, file, include (no args, -wc, -no-wc), exclude (no args, -wc, -no-wc). Each of the six option columns is split by version into <1.16, 1.16-22, and >1.22.

pat        file
./A*A      A*A
./A*A      AxA
./A\*A     A*A
./A\*A     AxA
./B?B      B?B
./B?B      BxB
./B\?B     B?B
./B\?B     BxB
./C[C      C[C
./C\[C     C[C
./D\]D     D]D
./D]D      D]D
./E\E      E\E
./E\\E     E\E
./F'F      F'F
./F\'F     F'F
./G"G      G"G
./G\"G     G"G
./H H      H H
./H\ H     H H

This was tested against tar versions 1.15, 1.15.1, 1.16, 1.17, 1.18, 1.19, 1.20, 1.21, 1.22, and 1.23.

There are some interesting patterns to note here:

  • Single quotes ('), double quotes ("), and spaces always match themselves exactly, regardless of wildcards.
  • Includes
    • Behavior with --no-wildcards is identical to the default behavior.
    • Behavior changed with version 1.16:
      • In versions before 1.16, the wildcard option is ignored for includes, and type α wildcard matching is always applied.
      • In versions 1.16 and higher, when wildcard matching is enabled, type γ wildcard matching is applied. When wildcard matching is disabled, only literal matches are accepted, except that both \ and \\ will match \.
  • Excludes
    • Behavior with --wildcards is identical to the default behavior.
    • When wildcards are disabled, they are truly disabled: only literal matches are accepted.
    • Behavior changed with version 1.23:
      • In versions before 1.23, when wildcards are enabled, type β matching is applied.
      • In versions 1.23 and higher, TODO
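
To try one combination from the table by hand, a command line along the following lines may be a useful starting point. This is only a sketch and is not taken from installcheck/gnutar.pl: the helper name `tar_list_command` is hypothetical, and it assumes that an "include" is a member-name pattern given when listing an archive and that an "exclude" is passed with `--exclude`. The wildcard option is placed before the pattern because GNU Tar applies matching options to the patterns that follow them.

```python
def tar_list_command(archive, pattern, kind, wildcards=None):
    """Build an argv list for `tar -t` that exercises one pattern.

    kind      -- "include" (pattern given as a member name) or "exclude"
    wildcards -- None for "no args", True for --wildcards,
                 False for --no-wildcards
    """
    if kind not in ("include", "exclude"):
        raise ValueError("kind must be 'include' or 'exclude'")

    argv = ["tar", "-t", "-f", archive]

    # The matching option must precede the pattern it should affect.
    if wildcards is True:
        argv.append("--wildcards")
    elif wildcards is False:
        argv.append("--no-wildcards")

    if kind == "exclude":
        argv.append("--exclude=" + pattern)
    else:
        argv.append(pattern)
    return argv
```

Passing the result to `subprocess.run` as a list (rather than as a shell string) keeps the shell from interpreting the backslashes, quotes, and spaces in patterns such as `./H\ H`.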

Matching types mentioned above:

type α
Only *?[\ are special, and only special characters can be escaped by \ - otherwise, the escaping backslash is treated literally (e.g., E\E matches against itself, but not against EE). There is a bug with \?, which is treated as \0177 internally.
type β
Only *?[\ are special, and \ can escape any character, so \X and X will both match X. There is no bug with \? or \\.
type γ
Only *?[\ are special, and \ can escape any character. However, both \? and \\ are buggy and will not match ? and \, respectively.
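
The difference between type α and type β can be illustrated with a small model. The sketch below is not GNU Tar's code; the function names `to_regex` and `matches` are made up for illustration. It assumes that an unclosed `[` is treated literally and that a character class is a plain set of characters with an optional leading `!` for negation (ranges are not modeled, since class behavior is still on the To Do list). The `\?` bug of type α and the bugs of type γ are not modeled either.

```python
import re

SPECIAL = set("*?[\\")


def to_regex(pattern, escape_any):
    """Translate a pattern into a regular expression string.

    escape_any=True  models type beta: backslash escapes any character.
    escape_any=False models type alpha: backslash escapes only *?[\\ and
                     is kept literally in front of anything else.
    """
    out = []
    i = 0
    n = len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            nxt = pattern[i + 1]
            if escape_any or nxt in SPECIAL:
                out.append(re.escape(nxt))      # backslash is consumed
            else:
                out.append(re.escape(c + nxt))  # backslash stays literal
            i += 2
        elif c == "*":
            out.append(".*")
            i += 1
        elif c == "?":
            out.append(".")
            i += 1
        elif c == "[":
            negate = pattern[i + 1:i + 2] == "!"
            start = i + 2 if negate else i + 1
            end = pattern.find("]", start + 1)  # class body must be non-empty
            if end == -1:
                out.append(re.escape(c))        # assumption: unclosed [ is literal
                i += 1
            else:
                body = "".join(re.escape(ch) for ch in pattern[start:end])
                out.append("[" + ("^" if negate else "") + body + "]")
                i = end + 1
        else:
            out.append(re.escape(c))
            i += 1
    return "".join(out)


def matches(pattern, name, escape_any):
    """Return True if the whole of name matches pattern."""
    return re.fullmatch(to_regex(pattern, escape_any), name, re.DOTALL) is not None
```

Under this model, `matches(r"E\E", r"E\E", escape_any=False)` is true and `matches(r"E\E", "EE", escape_any=False)` is false, which is the type α example given above, while with `escape_any=True` the results are reversed, as described for type β.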

To Do

  • Explore the inside of character classes: how do you specify ] or \ in a character class? Negation?
  • Look at the source to figure out what's going on with backslashes