GNU Tar Include and Exclude Behavior

From wiki.zmanda.com

Revision as of 21:04, 25 May 2010

This table represents the results of installcheck/gnutar.pl across multiple GNU Tar versions. Note that this page only deals with include and exclude behavior; see the GNU Tar FAQ entry for other undesirable behaviors.

Each row pairs a pattern (pat) with a file name (file). The result columns cover include and exclude, each with no args, -wc (--wildcards), and -no-wc (--no-wildcards), and each of those split by version range: <1.16, 1.16-22, >1.22.

pat       file
./A*A     A*A
./A*A     AxA
./A\*A    A*A
./A\*A    AxA
./B?B     B?B
./B?B     BxB
./B\?B    B?B
./B\?B    BxB
./C[C     C[C
./C\[C    C[C
./D\]D    D]D
./D]D     D]D
./E\E     E\E
./E\\E    E\E
./F'F     F'F
./F\'F    F'F
./G"G     G"G
./G\"G    G"G
./H H     H H
./H\ H    H H

[The per-cell match results are not preserved in this text.]

This was tested against tar versions 1.15, 1.15.1, 1.16, 1.17, 1.18, 1.19, 1.20, 1.21, 1.22, and 1.23.

Summary

This is the most concise summary I can invent. Yes, there are *five* different matching schemes implemented in GNU tar.

  • Single quotes ('), double quotes ("), and spaces always match themselves exactly, regardless of wildcards.
  • Includes
    • The default behavior is identical to --no-wildcards
    • Behavior changed with version 1.16:
      • In versions before 1.16, the wildcard option is ignored for includes, and type α wildcard matching is always applied.
      • In versions 1.16 and higher, when wildcard matching is enabled, type β wildcard matching is applied. When wildcard matching is disabled, type ε matching is applied (!).
  • Excludes
    • The default behavior is identical to --wildcards
    • When wildcards are disabled, they are truly disabled: only literal matches are accepted.
    • Behavior changed with version 1.23:
      • In versions before 1.23, when wildcards are enabled, type γ matching is applied.
      • In versions 1.23 and higher, when wildcards are enabled, type δ matching is applied.
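
The rules above can be read as a lookup from (mode, tar version, wildcard option) to a matching type. The following is a minimal sketch of that lookup, not anything taken from tar or from installcheck/gnutar.pl: the function name `matching_scheme`, its parameters, and the string labels it returns are hypothetical, and it assumes the exclude change landed in 1.23 itself, which is what the ">1.22" column heading suggests.

```python
def parse_version(version):
    """Turn a version string such as '1.15.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def matching_scheme(mode, version, wildcards=None):
    """Return the matching type the summary predicts.

    mode      -- 'include' or 'exclude'
    version   -- GNU tar version string, e.g. '1.22'
    wildcards -- True for --wildcards, False for --no-wildcards,
                 None when neither option is given
    """
    ver = parse_version(version)
    if mode == "include":
        if ver < (1, 16):
            # Before 1.16 the wildcard option is ignored for includes.
            return "alpha"
        # The include default is identical to --no-wildcards.
        enabled = bool(wildcards)
        return "beta" if enabled else "epsilon"
    if mode == "exclude":
        # The exclude default is identical to --wildcards.
        enabled = True if wildcards is None else wildcards
        if not enabled:
            return "literal"
        # Assumption: 1.23 is the first version with type delta.
        return "gamma" if ver < (1, 23) else "delta"
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `matching_scheme("include", "1.20")` gives `"epsilon"`, which is the surprising case flagged with (!) above: an include with no wildcard option is not quite a literal match.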

Matching types mentioned above:

type α
Only *?[\ are special, and \ can escape only those special characters; before any other character, the backslash is treated literally (e.g., E\E matches against itself, but not against EE). There is a bug with \?, which is treated as \0177 internally.
type β
Only *?[\ are special, and \ can escape any character, so \X and X will both match X. However, both \? and \\ are buggy and will not match ? and \, respectively.
type γ
Only *?[\ are special, and \ can escape any character. There is no bug with \? or \\.
type δ
Only *?[ are special, and no escaping is possible.
type ε
Only literal matches are accepted, except that both \ and \\ will match \.
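
To make the differences concrete, here is a rough Python model of types α, γ, δ, and ε, built by translating each pattern into a regular expression. It is a sketch of the descriptions above, not GNU tar's code. The names `to_regex` and `matches` are hypothetical. Type β is left out because the text does not say what its buggy \? and \\ actually match. Character classes are modeled only as a plain set of literal characters (no ranges or negation), since their real behavior is still on the To Do list. Slashes and anchoring are ignored entirely.

```python
import re

SPECIAL = "*?[\\"


def to_regex(pattern, scheme):
    """Translate a pattern to a compiled regex under one modeled scheme."""
    if scheme not in ("alpha", "gamma", "delta", "epsilon"):
        raise ValueError(f"unmodeled scheme: {scheme!r}")
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        nxt = pattern[i + 1] if i + 1 < len(pattern) else ""
        if scheme == "epsilon":
            # Literal matching, except that \\ collapses to a single \.
            if c == "\\" and nxt == "\\":
                i += 1
            out.append(re.escape(c))
        elif c == "\\" and nxt and scheme != "delta":
            if scheme == "alpha" and nxt not in SPECIAL:
                # alpha: backslash before an ordinary character is literal.
                out.append(re.escape(c))
            elif scheme == "alpha" and nxt == "?":
                # alpha bug: \? is treated as \0177 (DEL).
                out.append(re.escape("\x7f"))
                i += 1
            else:
                # Escaped character matches itself.
                out.append(re.escape(nxt))
                i += 1
        elif c == "*":
            out.append(".*")
        elif c == "?":
            out.append(".")
        elif c == "[" and pattern.find("]", i + 2) != -1:
            # Simplification: a bracket is a set of literal characters.
            end = pattern.find("]", i + 2)
            out.append("[" + re.escape(pattern[i + 1:end]) + "]")
            i = end
        else:
            # Ordinary character (or a backslash under delta).
            out.append(re.escape(c))
        i += 1
    return re.compile("".join(out), re.DOTALL)


def matches(pattern, name, scheme):
    """True if the whole name matches the pattern under the scheme."""
    return to_regex(pattern, scheme).fullmatch(name) is not None
```

With this model, `matches(r"E\E", "EE", "alpha")` is false while `matches(r"E\E", "EE", "gamma")` is true, which is the α versus γ distinction described above.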

To Do

  • Explore the inside of character classes: how do you specify ] or \ in a character class? Negation?
  • Look at the source to figure out what's going on with backslashes