Difference between revisions of "Talk:1638: Backslashes"

(Example of a match)
 
(34 intermediate revisions by 25 users not shown)
Line 1: Line 1:
...Maybe it's meant to search for all Game Grumps transcripts which make mention of the "[http://gamegrumps.wikia.com/wiki/Grep Grep]" gag? [[Special:Contributions/108.162.216.55|108.162.216.55]] 15:53, 3 February 2016 (UTC)
 
 
 
It should be noted that this also occurs in almost every programming language where "\" is the escape character. i.e.
 
 
  print("Hello")
 
Line 10: Line 8:
 
Oh, and by the way, isn't this the third comic to mention "Ba'al, the Soul Eater"? Maybe we should start a category. (Others are [http://www.explainxkcd.com/wiki/index.php/1246:_Pale_Blue_Dot 1246] (title text) and [http://www.explainxkcd.com/wiki/index.php/1419:_On_the_Phone 1419].)
 
 
[[Special:Contributions/173.245.54.29|173.245.54.29]] 06:14, 3 February 2016 (UTC)
 
:[[:Category:Ba'al|Did that]] before seeing your comment, so yes I agree. --[[User:Kynde|Kynde]] ([[User talk:Kynde|talk]]) 09:47, 3 February 2016 (UTC)
+
:Did that before seeing your comment, so yes I agree. --[[User:Kynde|Kynde]] ([[User talk:Kynde|talk]]) 09:47, 3 February 2016 (UTC)
 +
::But Davidy did not so the category has been deleted again. I have just cleaned up after my mess ;-) so there are no left over links to the dead category... --[[User:Kynde|Kynde]] ([[User talk:Kynde|talk]]) 22:27, 8 February 2016 (UTC)
 +
:::I noticed there's no character for Ba'al the Soul Eater on the character page. Is it just me or does someone else think that Ba'al should have a spot given that he (she??they??it??) has been in more comics than a lot of the minor characters? Just a thought. --[[User:Apollo11|Apollo11]] ([[User talk:Apollo11|talk]]) 10:42, 11 March 2024 (CST)
 +
::::That would be pretty funny. What if Ba'al the Soul Eater likes scones? He and Beret Guy would be best friends. {{unsigned ip|172.68.245.229|19:11, 21 August 2024}}
 +
Beret Guy would probably just think he's cranky from sleeping so long. Like "Oh my gosh, are those HORNS?! THAT'S SO COOL!! Wanna come over and eat some bread?"
 +
Ba'al starts a curse "Hash slen non desth NEVAIR-"
 +
Beret Guy interrupting "Wow, you're congested, I have some soup that'll fix you right up" [[User:Apollo11|Apollo11]] ([[User talk:Apollo11|talk]]) 14:20, 22 August 2024 (UTC)
  
 
The last entry may also be an oblique reference to the infinitely-expandable recursive acronym "GOD = GOD Over Djinn" mentioned in Douglas Hofstadter's Gödel, Escher, Bach. [[User:Taibhse|Taibhse]] ([[User talk:Taibhse|talk]]) 16:42, 3 February 2016 (UTC)
  
 
;I don't think the regex is invalid
 
 +
''Note: The regex changed after initial publication. See '''Changed Regex''' below''
  
 
According to <tt>man grep</tt> you need to specify the <tt>-E</tt> option to use extended regex; without it unescaped parentheses are not interpreted, so they don't need to match.
 
Line 30: Line 35:
 
After passing the regex through bash, you get <nowiki>\\[[(].*\\[\])][^)\]]*$</nowiki> That is, the literal character \, followed by [ or (, followed by any number of any characters, followed by \, followed by ] or ), followed by any number of characters that aren't ) or ], until the end of the line. [[Special:Contributions/108.162.216.44|108.162.216.44]] 08:33, 3 February 2016 (UTC)
 
 
:It sounds like you know what you are talking about. Can anyone explain it well enough for the explanation, and correct the explanation of the title text if it is wrong to say that it would not work? I have added this as the reason for incomplete. But maybe examples are also needed for people without programming skills/knowledge. We also enjoy xkcd ;-) --[[User:Kynde|Kynde]] ([[User talk:Kynde|talk]]) 09:51, 3 February 2016 (UTC)
 +
:I'm thinking that it's grepping for regular expressions that contain regular expressions. A regex containing <nowiki>"\[...\]"</nowiki> or <nowiki>"\(...\)"</nowiki> will match other regular expressions, as almost all non-trivial regexes use either character lists or groups. Now why out.txt is likely to contain not just regexes but rather regexes that search for regexes I have no idea - perhaps he had actually put too many backslashes in and he was trying to grep just for <nowiki>"[...]"</nowiki> or <nowiki>"(...)"</nowiki> (i.e. to locate probable regular expressions in out.txt, or anything else in parenthesis for that matter such as countless kinds of code/markup)? [[Special:Contributions/162.158.152.185|162.158.152.185]] 17:35, 4 February 2016 (UTC)
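
The reading given by 108.162.216.44 above can be checked with a minimal sketch that uses Python's <tt>re</tt> module as a stand-in for grep. This is not a grep run: Python treats a backslash inside a bracket expression as an escape, which POSIX grep may not, so the sketch only shows how the post-bash pattern behaves under that one reading. The helper name and the sample strings are made up for illustration.

```python
import re
import warnings

# The pattern as quoted above, i.e. after bash has processed the double quotes.
PATTERN = r'\\[[(].*\\[\])][^)\]]*$'

with warnings.catch_warnings():
    # Python warns that "[[" might be a nested set (FutureWarning); it still
    # parses it as a character class containing "[" and "(".
    warnings.simplefilter("ignore", FutureWarning)
    regex = re.compile(PATTERN)


def only_matching(line):
    """Return the matched part of the line (similar to grep -o), or None."""
    found = regex.search(line)
    return found.group(0) if found else None


samples = [
    r'see \[something\] more',   # backslash-bracket pair with a clean tail
    r'\(something\)more',        # parentheses work the same way
    r'\[a\] b) c',               # a ")" after the closing pair blocks the match
    '[Listname] subject',        # no backslashes at all
]
results = {line: only_matching(line) for line in samples}
```

Under this reading the first two samples match from the first backslash to the end of the line, and the last two do not match at all.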
  
 
For fun:  
 
Line 35: Line 41:
  
 
Outputs your bash_history, ordered by relative gibberishness. This was copied by hand from desktop to mobile, might well have a few typos.--[[Special:Contributions/162.158.90.208|162.158.90.208]] 10:04, 3 February 2016 (UTC)
 
 +
::Besides the fact that -d is a GNU extension to xargs (so it won't exist on OS X, FreeBSD, or anything else but Linux), this is a weird way to calculate gibberishness; I'm guessing functions, variable substitutions, .. and ./, etc. are going to swamp the more unreadable grep and the like. Plus, I think you need a uniq in there somewhere; otherwise, aren't the first few pages all going to be filled with the 78 copies of "422 cd .." that tied for most gibberishy in my last 500 commands? --[[Special:Contributions/162.158.255.82|162.158.255.82]] 22:51, 7 March 2016 (UTC)
 +
:::I know it's been almost a decade, but I wanted to provide a more portable solution. Here's a posixly correct way to do that: <code>sed 'h;:b;s/./-/g;:a;s/-/<<123456789-01>/;s/\(.\)<.*\1\(-*.\).*>/\2/;/^$/s/^/0/;ta;x;H;/\n/!s/[a-zA-Z0-9 ]*//g;tb;x;s/\n/ /g;s/0 0/0 1/' ~/.bash_history | awk '{$1=$1/$2;$2="";print $0}' | sort -rn | less</code>. However, if you do have GNU sed, I think this is nicer: <code>sed -E 'h;:b;s/./-/g;:a;s/-/<<123456789-01>/;s/(.)<.*\1(-*.).*>/\2/;/^$/s/^/0/;ta;x;H;/\n/!s/[a-zA-Z0-9 ]*//g;tb;x;s/^/dc -e"Fk/;s%\n%/n";#%2e;G;s/\n.*\n/ /;s/\.(..)([^ ]*)/\1.\2%/' ~/.bash_history | sort -rn | less</code> if your 'most gibberishy commands' are '<code>cd ..</code>' then you're obviously not having enough fun on the command line. For example, a short and sweet one I found which I'd encourage everyone to try is <code>PS1='C:${PWD////\\\\}>'</code> [[User:Regex user|Regex user]] ([[User talk:Regex user|talk]]) 10:30, 23 June 2024 (UTC)
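
For readers who find the sed versions hard to follow, here is a rough Python sketch of the same idea. It assumes "gibberishness" means the share of characters in a command that are not ASCII letters, digits or spaces (the class <tt>[a-zA-Z0-9 ]</tt> that the sed scripts strip), and it drops duplicate commands as suggested above. The function names and the sample history are hypothetical, and the scoring is not guaranteed to equal what the one-liners compute.

```python
import re


def gibberishness(command):
    """Share of characters that are not ASCII letters, digits or spaces."""
    if not command:
        return 0.0
    noise = re.sub(r'[a-zA-Z0-9 ]', '', command)
    return len(noise) / len(command)


def rank_by_gibberishness(commands):
    """Return unique commands, most gibberish first (ties sorted alphabetically)."""
    unique = {c.strip() for c in commands if c.strip()}
    return sorted(unique, key=lambda c: (-gibberishness(c), c))


# Made-up history lines; the repeated "cd .." is counted once.
history = ['cd ..', 'ls -la', 'cd ..', 'echo hi']
ranked = rank_by_gibberishness(history)
```

To try it on a real history, the lines of <tt>~/.bash_history</tt> could be passed in place of the sample list.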
  
 
The problem in the comic is not with regexes per se but with situations when the entered text or expression passes through several interpreters, like bash -> grep/sed/awk, or program text -> external shell command. In such cases, you have to escape backslashes for each program in the sequence, and it gets worse if you have 'real' backslashes in the final text that you're processing with the utilities (Windows' file paths, for example). See https://en.wikipedia.org/wiki/Leaning_toothpick_syndrome.
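
The layering described above can be sketched in Python. This is an approximation under stated assumptions: <tt>shlex.split</tt> stands in for the shell's double-quote handling (it follows POSIX-like rules but is not bash), <tt>re</tt> stands in for the regex tool, and the helper name is hypothetical.

```python
import re
import shlex


def regex_seen_by_tool(typed_command):
    """Approximate the shell layer: return the first argument after the command name."""
    return shlex.split(typed_command)[1]


# Four backslashes typed inside double quotes ...
four = regex_seen_by_tool(r'grep "\\\\" out.txt')
# ... reach the regex engine as two, which it reads as one literal backslash.
finds_backslash = re.search(four, 'C:\\Users') is not None

# Two typed backslashes reach the regex engine as a single dangling backslash,
# which Python's re rejects as an incomplete escape.
two = regex_seen_by_tool(r'grep "\\" out.txt')
try:
    re.compile(two)
    two_is_valid = True
except re.error:
    two_is_valid = False
```

Each interpreter in the chain halves the run of backslashes, so matching one real backslash through two layers takes four typed ones.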
Line 46: Line 54:
 
enough with the harry potter fancruft. "elder" is a [[Wiktionary:elder|perfectly good word]]. just because you came across it for the first time in harry potter means you are *typing carefully* the kind of person that likes harry potter. unless this is a ''harry potter reference'' wiki, of course. in which case i'll prepare a complete list of every word that appears both here and there and put a list on every page. oh, right, no i won't. --[[Special:Contributions/141.101.106.161|141.101.106.161]] 12:41, 3 February 2016 (UTC)
 
  
Remember that "Elder" is used in a lot of RPGs to denote high level enemies or items. I feel like that's what Randall's referring to here, more than Harry Potter or the general sense of the term "Elder."
+
Remember that "Elder" is used in a lot of RPGs to denote high level enemies or items. I feel like that's what Randall's referring to here, more than Harry Potter or the general sense of the term "Elder." {{unsigned ip|108.162.245.156}}
 
: +1. Between the fact that harry potter (, ages, or tribes) aren't mentioned anywhere else in the text and the comic being a progressive list, I see this being the most likely explanation. Plus the mention of demons, which are easily the most* common usage of the modifier.
 
:: (*) or second most, after "elder gods", who are, let's face it, also demons. [[Special:Contributions/162.158.180.125|162.158.180.125]] 14:41, 3 February 2016 (UTC)
 
 
::: I'm pretty sure that "Elder backslash" is in reference to the "Elder gods" of Lovecraft. [[Special:Contributions/173.245.54.35|173.245.54.35]] 16:51, 3 February 2016 (UTC)
 
 
:::: Note also that it's called 'The Elder Wand' not as an intensifier, as in this comic and the other examples given, but because it is literally ''made from the wood of an [https://en.wikipedia.org/wiki/Sambucus_nigra Elder Tree]'' I'm pretty sure it's not an intentional reference. -Graptor [[Special:Contributions/173.245.54.23|173.245.54.23]] 19:29, 3 February 2016 (UTC)
 
 +
::::: If it's an intentional reference to anything, it's to Lovecraft (or to something similar). I suspect the Elder Wand was an intentional pun by Rowling, however. --[[Special:Contributions/162.158.180.137|162.158.180.137]] 04:16, 4 February 2016 (UTC)
 +
::::: Since no-one else seemed to want to, I just restructured that paragraph to make it more clear that if anything Harry Potter was inspired by the older examples, not the other way around. Expanded the LOTR reference and added DnD. If anything Randall is likely to be referencing either the Lovecraft references, or the concept of Elder in general. [[Special:Contributions/141.101.64.173|141.101.64.173]] 11:50, 4 February 2016 (UTC)
  
Attempting to add to the discussion: This regex is not necessarily invalid or incomprehensible.  It looks like he was looking for a line with a regular expression or definitely some code.  You just have to work your way through the backslashes.  Although it might be invalid depending on the precise rules.  He has some unescaped closing brackets and closing parenthesis.  If these have to always be escaped then the regex is invalid.  If however you  don't have to escape a closing bracket with no opening bracket, then things are fine.  I'm not familiar enough with grep's regex parser to know how it handles that edge case.  Presuming those unescaped paren and brackets are fine, his regex searches for:
+
Attempting to add to the discussion: This regex is not necessarily invalid or incomprehensible.  (''Note: The regex changed after initial publication. See '''Changed Regex''' below.'') It looks like he was looking for a line with a regular expression or definitely some code.  You just have to work your way through the backslashes.  Although it might be invalid depending on the precise rules.  He has some unescaped closing brackets and closing parenthesis.  If these have to always be escaped then the regex is invalid.  If however you  don't have to escape a closing bracket with no opening bracket, then things are fine.  I'm not familiar enough with grep's regex parser to know how it handles that edge case.  Presuming those unescaped paren and brackets are fine, his regex searches for:
  
 
1. A backslash
 
Line 85: Line 95:
  
 
Ninjaed by Cmancone, above. I agree with that result in every respect except for the start-of-string being potentially anything, but putting my own analysis in here because it took long enough to type!
 
Depth-of-backslash might depend upon depth of utility. In Perl, <nowiki>''</nowiki>-quotes (among others) treat everything within as literal whilst ""-quotes (and variations) interpolates any special characters, variables, etc that you put in it.  (Search for "Quote and Quote-like operators" in your favourite PerlDocs source.)  '\sss' is a literal backslash followed by three 's' characters , while "\sss" is the special \s escape (a whitespace) followed by two further regular characters.  You might need to define the first when you need to use it to provide a not-previously-escaped \s so that it might be escaped within another context.  ''Or'' you define it as "\\sss" (escaped-\) the first time, as equivalent to '\sss'.  But '\\sss' would be a literal that, later, could be interpreted as an escaped-\ to the input of a further context where the \s finally becomes 'match a whitespace'.
+
 
 +
Depth-of-backslash might depend upon depth of utility. In Perl, <nowiki>''</nowiki>-quotes (among others) treat everything within as literal whilst ""-quotes (and variations) interpolates any special characters, variables, etc that you put in it.  (Search for "Quote and Quote-like operators" in your favourite PerlDocs source.)  '\sss' is a literal backslash followed by three 's' characters , while "\sss" is the special \s escape (a whitespace) followed by two further regular characters.  You might need to define the first when you need to use it to provide a not-previously-escaped \s so that it might be escaped within another context.  ''Or'' you define it as "\\sss" (escaped-\) the first time, as equivalent to '\sss'.  But '\\sss' would be a literal that, later, could be interpreted as an escaped-\ to the input of a further context where the \s finally becomes 'match a whitespace'.
 
   
 
   
'\\\sss' would be literal, whilst "\\\sss" could be equivalent to '\ ss' (literal backslash, literal space, rest of characters).  Then, instead of literal '\\sss', for some purpose, you could interpolate two escaped-backslashes "\\\\sss"... and so on.
+
'\\\sss' would be literal, whilst "\\\sss" could be equivalent to '\ ss' (literal backslash, literal space, rest of characters).  Then, instead of literal '\\sss', for some purpose, you could interpolate two escaped-backslashes "\\\\sss"... and so on.
 
   
 
   
Meanwhile I ''think'', just from visual inspection, "'''\\\[[(].*\\\[\])][^)\]]*$'''" in Bash should obey the interpolation rules quite nicely.  The first two characters must be a literal backslash (from the escaped-backslash) and a literal open-square bracket (again, escaped).  The next open-square and the close-square shortly after depict a character class that contains only an open-parenthesis, and could have been written as '''\('''.
+
Meanwhile I ''think'', just from visual inspection, "'''\\\[[(].*\\\[\])][^)\]]*$'''" in Bash should obey the interpolation rules quite nicely.  The first two characters must be a literal backslash (from the escaped-backslash) and a literal open-square bracket (again, escaped).  The next open-square and the close-square shortly after depict a character class that contains only an open-parenthesis, and could have been written as '''\('''.
 
   
 
   
The '''.*''' indicates zero-or-more (the asterisk) instances of ''any'' character (the dot).  There is then a literal backslash (from the next '''\\''' duo) and a literal open-square (the '''\[''' pair) and close-square (the '''\]''' pair).  The ''')''' is literal and does not need escaping (as a parenthesis group had not yet been opened), as is the next ''']''' character.  To be sure, I would have written these two as the pair escapes '''\)\]''', but horses for courses...
 
   
 
   
Then there's another character class (the next '''[''' and the final ''']''') required zero-or-more times (the asterisk) to use up all the rest of the characters to the end (the ending '''$''' character).  As there was no '''^''' character (a.k.a. caret/circumflex/etc) at the start, the match isn't bothered about what unmatched characters appear before the original '''\('''.  This character class, however, starts with a '''^''' which in this context (the very first character of a character-class definition, not somewhere where an entire match-string starts) indicates negation of the following selection, so it is all characters ''but'' those specified, which is the regular close-parenthesis and (because it needs to be contained within a '''[]''' pair) the escaped close-square.
 
   
 
   
So, all matching strings must start with '''\[(''', i.e. the backslash, open-square and open-paren.  They can continue with ''any'' further text, before then having a '''\[])]''', i.e. backslash, open-and-close-squares and close-paren, close-square.  After this, the match continues just as long as there are no closing square or round brackets before the ending.
 
   
 
   
The minimum matching literal string would be '''\[(\[])]''' with longer variants being of the form '''X\[(Y\[])]Z''' where X and Y can be replaced by anything (or be absent), and Z can be replaced by anything (or absent!) ''so long as it doesn't contain possibly relevant close-brackets!''. The latter stipulation is likely because the Y (and X) ''is'' allowed to contain these characters, and for some reason you don't want to confuse the test by finding some other '''\[])]''' segment within the X/Y-zones.  (In this context, it doesn't actually seem to matter too much.  But it might do in ways I haven't spotted or just be a hang-over from a prior permutation of the test.)
 
   
 
   
The "grep -o" function is working on the output to the file being '''cat'''ed (there are alternate ways of doing this that some people might prefer), to only accept the lines in the file that match the '''X\[(Y\[])]Z''' string.  These lines would appear to be lines of out.txt (a fairly generic name that reveals little to its original purpose) that are well-formed for some other purpose.  A safety-escaped (i.e. not to be taken literally by any simple parser) '''[]'''-grouping containing a '''()'''-group (''not'' escaped, perhaps reasonably in context) containing potentially random text followed by an empty '''[]''' pair (again, safety-escaped).  Depending on the source, the empty '''[]'''-pair could mean many things, as with the other layers.  And the lines may end with any further text.
+
The "grep -o" function is working on the output to the file being '''cat'''ed (there are alternate ways of doing this that some people might prefer), to only accept the lines in the file that match the '''X\[(Y\[])]Z''' string.  These lines would appear to be lines of out.txt (a fairly generic name that reveals little to its original purpose) that are well-formed for some other purpose.  A safety-escaped (i.e. not to be taken literally by any simple parser) '''[]'''-grouping containing a '''()'''-group (''not'' escaped, perhaps reasonably in context) containing potentially random text followed by an empty '''[]''' pair (again, safety-escaped).  Depending on the source, the empty '''[]'''-pair could mean many things, as with the other layers.  And the lines may end with any further text.
 
   
 
   
The "out.txt" file might be the result of a prior Grep (string-search function), quite possibly scanning code for lines of particular importance by another pattern and dumping the results to out.txt for further perusal.  And then Randall finds the need to dig further into the first result by extracting just those already selected that all have the '''X\[(Y\[])]Z'''-ish pattern to them.
 
   
 
   
But I could be wrong, and that's way too long for an official explanation.
+
But I could be wrong, and that's way too long for an official explanation.
 
(Perhaps just something like the penultimate paragraph, if we're not entirely mistaken?) [[Special:Contributions/162.158.152.89|162.158.152.89]] 14:14, 3 February 2016 (UTC)
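
The depth-of-quoting point made in the comment above, with Perl's literal and interpolating quotes, has a close analogue in Python, sketched here under the assumption that a raw string plays the role of the literal quote and an ordinary string the role of the interpolating one. The variable names are illustrative only, and the next "context" is taken to be a regex engine in which <tt>\s</tt> means whitespace.

```python
import re

single_quoted = r'\sss'    # raw string: the backslash is kept as typed
double_quoted = '\\sss'    # ordinary string: "\\" collapses to one backslash
same_text = single_quoted == double_quoted   # both are the four characters \sss

# Next layer: hand those four characters to a regex engine, where \s is whitespace.
matches_space = re.fullmatch(single_quoted, ' ss') is not None
matches_itself = re.fullmatch(single_quoted, single_quoted) is not None

# One more level of escaping is needed to match the backslash itself.
deeper = r'\\sss'
matches_backslash = re.fullmatch(deeper, single_quoted) is not None
```

Here the four-character text matches a space followed by "ss" but not its own spelling, while the five-character version matches the literal backslash followed by "sss".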
  
The regex is supposed to be looking for:
+
The regex is supposed to be looking for (''Note: The regex changed after initial publication. See '''Changed Regex''' below.''):
 
  \\\      backslash
 
 
  [[(]    [ or (
 
Line 116: Line 127:
 
  $        end of string
 
  
The first problem is that you're not supposed to escape ] in a [...], and it also has to be first in the grouping (unless negated with a ^) It should be [][)] or something similar.
+
The first problem is that you're not supposed to escape ] in a [...], and it also has to be first in the grouping (unless negated with a ^) It should be [][)] or something similar.
The second problem is the same. The last bit should be [^])]*$ and not [^)\]]*$. [[User:Khris|Khris]] ([[User talk:Khris|talk]]) 14:24, 3 February 2016 (UTC)
+
 
 +
The second problem is the same. The last bit should be [^])]*$ and not [^)\]]*$. [[User:Khris|Khris]] ([[User talk:Khris|talk]]) 14:24, 3 February 2016 (UTC)
  
  
Line 124: Line 136:
  
  
The regex relies on several special cases (*surprise*).
+
The regex relies on several special cases (*surprise*). (''Note: The regex changed after initial publication. See '''Changed Regex''' below.'')
 
First: bash double-quote expansion (see [https://www.gnu.org/software/bash/manual/html_node/Double-Quotes.html#Double-Quotes]). Perhaps non-intuitively, \\\ followed by a character that \ doesn't escape is an escaped backslash followed by a literal backslash, effectively the same as \\\\ followed by that same non-escaped character.  After bash double-quote expansion, this results in:
 
  
Line 149: Line 161:
  
  
One key thing to understand is that \ is not a special character when it's in a bracket expression - you can't escape characters in bracket expressions. So [^)\] simply means any character other than ) or \. Also, ( and ) are just regular characters unless they are escaped in basic regular expressions - extended regular expressions reverse this rule.
 
 
----
 
 
[[Special:Contributions/108.162.216.34|108.162.216.34]] 16:14, 3 February 2016 (UTC)rb
 
 +
 +
One key thing to understand is that \ is not a special character when it's in a bracket expression - you can't escape characters in bracket expressions. So [^)\] simply means any character other than ) or \. Also, ( and ) are just regular characters unless they are escaped in basic regular expressions - extended regular expressions reverse this rule. {{unsigned|Kalfalfa}}
  
 
I don't know about the regular expression in the title text, but I think the explanation is incorrect in that it starts off talking about regular expressions. Escaping backslashes is an issue with strings in programming in general. [[Special:Contributions/173.245.54.46|173.245.54.46]] 17:12, 3 February 2016 (UTC)
 
Line 157: Line 170:
 
I suspect that Randall may have used the regexp in the title text to *find* malformed regular expressions in a file (out.txt) that he (or someone) had previously filled with output from some error message (or collection of error messages, or at least the output of something where a regular expression had been expected to work but had not worked as expected). [[Special:Contributions/162.158.252.227|162.158.252.227]] 19:06, 3 February 2016 (UTC)
 
  
== Example of a match ==
+
You can use metacharacters in character classes; the only metacharacters in a character class that must be escaped are the closing square bracket (]), the backslash (\), the hyphen (-), and the caret (^) if it is the first listed item in the set. The closing square bracket requires escaping because an unescaped one would signal the end of the set, which then means the backslash must also be escaped. The hyphen must be escaped because, unescaped, it signals a range (unless it is listed first, in which case it is literal without escaping). The caret must be escaped when listed first because otherwise it signals a negated set.<br>
 +
Therefore, the end of the title text regex matches a backslash followed by either ] or ), which is then followed by any number (including none) of characters so long as they are not ] nor ) which means the whole regex can match "<span style="color:#040;">\[something\] more</span>" or "<span style="color:#040;">\(something\)more</span>" or "<span style="color:#040;">\[something\) more</span>" as well as "<span style="color:#040;">\[something\]</span>". — [[Special:Contributions/162.158.255.117|162.158.255.117]] 01:16, 4 February 2016 (UTC)
 +
 
 +
I'll add that I use an ''almost identical'' regex in my mail server for matching mailing-list subject lines which often have a format of "<span style="color:#040;">[Listname] normal subject line</span>" which made it pretty recognizable to me. — [[Special:Contributions/162.158.255.117|162.158.255.117]] 01:24, 4 February 2016 (UTC)
 +
 
 +
;Example of a match
 +
''Note: The regex changed after initial publication. See '''Changed Regex''' below''
  
 
First, the shell will do some escaping substitution. So, in order to easily read it, let's see what grep really receives:
 
Line 176: Line 195:
 
* <code>$</code> matches the end of the string
 
  
So the string '''\[aaa\]\\)]a]]]]]]''' matches!
+
So the string '''\[aaa\]\\)]a]]]]]]''' matches! {{unsigned ip|108.162.228.167}}
 +
 
 +
...Maybe it's meant to search for all Game Grumps transcripts which make mention of the "[http://gamegrumps.wikia.com/wiki/Grep Grep]" gag? [[Special:Contributions/108.162.216.55|108.162.216.55]] 15:53, 3 February 2016 (UTC)
 +
 
 +
...Wow, guys, and here I was thinking he wanted to put the cat out, when the cat didn't want to go out.... [[Special:Contributions/108.162.249.158|108.162.249.158]] 04:03, 4 February 2016 (UTC)
 +
 
 +
What I think is that Randall probably ''intended'' the regex to match "backslash, opening round or square bracket, anything, backslash, closing round or square bracket, anything that doesn't involve closing round or square brackets", since (unlike most other possibilities given) that actually looks like something one might want to search for. Whether it ''does'', in fact, match that or something else (or indeed anything at all) is another question entirely. (For all we know, it didn't work, Randall figured out it didn't, and wrote the correct thing the next line over.)<br>Unrelatedly: this comic (and the backslash proliferation in general) reminded me of the Telnet Song. --[[Special:Contributions/162.158.180.137|162.158.180.137]] 04:16, 4 February 2016 (UTC)
 +
 
 +
That explanation is wrong: <code>[\]</code> does not match a literal backslash; it would still need to be escaped inside the brackets. That backslash escapes the next character, a ], so the group doesn't end there. The actual expression there is <code>[\])]</code>, a character group containing an escaped ] and a ). Just like the first part. It is most likely intended to catch content surrounded by [ ] or ( ). [[Special:Contributions/141.101.104.15|141.101.104.15]] 13:43, 4 February 2016 (UTC)
 +
:To clarify: this makes the expression catch anything that starts with a block surrounded by escaped round or square brackets. So stuff like '''\(Hello world\)more text here''' but with either round or square brackets (or combinations, since there's nothing enforcing they have to match. I'd have made it an OR case with two groups with matching brackets, personally) -[[Special:Contributions/141.101.104.15|141.101.104.15]] 13:51, 4 February 2016 (UTC)
 +
:You're making the same mistake Randall did: while many (most?) regex dialects use \ as escape inside a character class, this is not true for grep's default syntax. I've expanded that interpretation in my comment below, however the analysis by 108.162.228.167 is a correct explanation of how this expression is ''actually'' interpreted by grep. --[[Special:Contributions/141.101.75.185|141.101.75.185]] 15:42, 4 February 2016 (UTC)
 +
 
 +
Your analysis is thorough and correct, however it is unlikely this is what the regex was intended to accomplish. (''Note: The regex changed after initial publication. See '''Changed Regex''' below.'') More likely, Randall is more accustomed to other regex dialects such as Perl(-compatible) regex where a backslash ''does'' work to escape special characters inside a character class.  Under that assumption the regex (with some whitespace inserted for readability) would break up as:
 +
* <code>\\ [[(]</code> an escaped opening bracket or paren
 +
* <code>.*</code> anything
 +
* <code>\\ [\])]</code> an escaped closing bracket or paren
 +
* <code>[^)\]]* $</code> no closing bracket or paren occurring on the remainder of the line
 +
Although the final condition is still a bit obscure, this still makes a ''lot'' more sense. Unfortunately it also crushes Randall's hope the regex worked as intended, since this simply isn't how the expression is parsed with grep's default syntax (which is why I always use <code>grep -P</code>). --[[Special:Contributions/141.101.75.185|141.101.75.185]] 15:34, 4 February 2016 (UTC)
 +
----
 +
Did anyone notice the [https://en.wikipedia.org/wiki/Cat_%28Unix%29#Useless_use_of_cat Useless Use of Cat]? [[Special:Contributions/141.101.106.101|141.101.106.101]] 19:36, 4 February 2016 (UTC)
 +
 
 +
:Yup - I hereby award Randall with the Useless Use of Cat Award of the day. Cherish it.
 +
:[[User:Zedn00|Zedn00]] ([[User talk:Zedn00|talk]]) 03:51, 5 February 2016 (UTC) Zedn00
 +
----
 +
;Changed Regex
 +
At some point before 2016-02-09 18:00 +0100, Randall modified the bash command in the title text!
 +
 
 +
Original command:
 +
cat out.txt | grep -o "\\\[[(].*\\\[\])][^)\]]*$"
 +
New command:
 +
cat out.txt | grep -o "[[(].*[])][^)]]*$"
 +
 
 +
For the old command, 108.162.228.167's and 108.162.216.34's explanations above were correct.
 +
 
 +
The new command matches:
 +
[[(]  either a '[' or a '('
 +
.*    an unbounded and possibly empty sequence of arbitrary characters
 +
[])]  either a ']' or a ')'
 +
[^)]  any character except for a ')'
 +
]*    an unbounded and possibly empty sequence of ']'
 +
$    anchored at end of line
 +
 
 +
It now e.g. matches '''123[abc.<>)x]]]]]''':
 +
$ echo "123[abc.<>)x]]]]]" | tee /dev/stderr | grep -o "[[(].*[])][^)]]*$"
 +
 
 +
This hardly makes more sense than the original command.
 +
--[[User:Markus|Markus]] ([[User talk:Markus|talk]]) 17:38, 9 February 2016 (UTC)
 +
:Randall may have been sincere about finding it in his history and wondering if it worked. I think he probably meant
 +
cat out.txt | grep -o "[[(].*[])][^])]*$"
 +
 
 +
:which breaks down as:
 +
[[(]  either a '[' or a '('
 +
.*    an unbounded and possibly empty sequence of arbitrary characters
 +
[])]  either a ']' or a ')'
 +
[^])]*  any number of any characters except for a ')' or ']'
 +
$    anchored at end of line
 +
 
 +
:This matches any line that has a '[' or '(' followed by a ')' or ']', matching from the first '[' or '(' to the end of the line. The final part of the regex, '[^])]*$', is not really necessary here, but it is a common pattern to follow a character pattern with an opposite character pattern to be sure the first character pattern matches the last instance of a repeating character, so he might have added it out of habit, which would explain also why he got it wrong (since he just followed '[blah]' with '[^blah]' which in this special case doesn't work because 'blah' has a special character in it: ']').
 +
:[[User:Jgro|Concerned Netizen]] ([[User talk:Jgro|talk]]) 02:23, 25 April 2017 (UTC)
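
:A rough way to compare the published expression with the suggested variant, sketched in Python rather than grep. It assumes Python's <code>re</code> treats a leading <code>]</code> inside a set as a literal, as grep does; the helper name <code>grep_o</code> and the sample lines are made up for illustration.

```python
import re

# `[[(]` is spelled `[\[(]` below only to avoid Python's "possible nested
# set" FutureWarning; both spellings denote the class containing [ and (.
PUBLISHED = re.compile(r"[\[(].*[])][^)]]*$")  # regex from the new title text
SUGGESTED = re.compile(r"[\[(].*[])][^])]*$")  # variant proposed in this thread


def grep_o(regex, line):
    """Hypothetical helper: return the leftmost match (roughly what
    `grep -o` would print for a $-anchored pattern), or None."""
    match = regex.search(line)
    return match.group(0) if match else None


for sample in ["123[abc.<>)x]]]]]", "[foo]", "(a)"]:
    print(repr(sample), grep_o(PUBLISHED, sample), grep_o(SUGGESTED, sample))
```

:The published form needs at least one extra character after the closing bracket, which is why plain <code>[foo]</code> and <code>(a)</code> only match the suggested variant.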
 +
----
 +
Funny enough, I'm literally looking at some other dev's code right now that actually implements an eight backslash regex sequence, with just the comment "backslash". I'm still scratching my head over what they were trying to accomplish or even communicate with this. [[User:Domino|Domino]] ([[User talk:Domino|talk]]) 21:45, 16 August 2016 (UTC)domino
 +
----
 +
I believe the regex is a reference to xkcd 1313 (Regex Golf)
 +
[[Special:Contributions/108.162.221.90|108.162.221.90]] 16:08, 26 August 2016 (UTC)
 +
----
 +
I had to use a backslash that escaped the screen today. I have a Discord bot written with Node.js, and my "friends" demanded I add a shruggie output [despite Discord having that already]. So, I now have a string that looks like <code>'¯\\\\\\_(ツ)\_/¯'</code>.{{unsigned|Papayaman1000}}
 +
----
 +
Regarding the change to the title text (with all backslashes being removed) it appears that this may not be a deliberate edit, as other comics (e.g. 1277) have also had all backslashes disappear from the title text. It appears that one of the tools Randall is using may be 'solving' accidentally escaped characters by doing a sed 's/\\//g' [[User:Sqek|Sqek]] ([[User talk:Sqek|talk]]) 13:18, 14 December 2017 (UTC)
 +
 
 +
<code>sed -r "s/^(.*)$/^\1$/" wordlist.txt</code> What was I cooking... There are also cursed grep commands for wordle such as <code>grep -iwE '[qwiafghjkzxvbn]{5}' wordlist.txt | grep -i '.[^ia][^n][^ai][^n]' | grep -i [ain] </code> [[User:Wilh3lm|Wilh3lm]] ([[User talk:Wilh3lm|talk]]) 14:19, 16 April 2024 (UTC)

Latest revision as of 14:20, 22 August 2024

It should be noted that this also occurs in almost every programming language where "\" is the escape character. i.e.

print("Hello")
> Hello
print("\"Hello\"")
> "Hello"
print("\\Hello\\")
> \Hello\

Oh, and by the way, isn't this the third comic to mention "Ba'al, the Soul Eater"? Maybe we should start a category. (Others are 1246 (title text) and 1419.) 173.245.54.29 06:14, 3 February 2016 (UTC)

Did that before seeing your comment, so yes I agree. --Kynde (talk) 09:47, 3 February 2016 (UTC)
But Davidy did not so the category has been deleted again. I have just cleaned up after my mess ;-) so there are no left over links to the dead category... --Kynde (talk) 22:27, 8 February 2016 (UTC)
I noticed there's no character for Ba'al the Soul Eater on the character page. Is it just me or does someone else think that Ba'al should have a spot given that he(she??they??it??) has been in more comics than a lot of the minor characters? Just a thought. --Apollo11 (talk) 10:42, 11 March 2024 (CST)
That would be pretty funny. What if Ba'al the Soul Eater likes scones? He and Beret Guy would be best friends. 172.68.245.229 (talk) 19:11, 21 August 2024 (please sign your comments with ~~~~)

Beret Guy would probably just think he's cranky from sleeping so long. Like "Oh my gosh are those HORNS?! THAT'S SO COOL!! Wanna come over and eat some bread?" Ba'al starts a curse "Hash slen non desth NEVAIR-" Beret Guy interrupting "Wow you're congested, I have some soup that'll fix you right up" Apollo11 (talk) 14:20, 22 August 2024 (UTC)

The last entry may also be an oblique reference to the infinitely-expandable recursive acronym "GOD = GOD Over Djinn" mentioned in Douglas Hofstadter's Gödel, Escher, Bach. Taibhse (talk) 16:42, 3 February 2016 (UTC)

I don't think the regex is invalid

Note: The regex changed after initial publication. See Changed Regex below

According to man grep you need to specify the -E option to use extended regex; without it unescaped parentheses are not interpreted, so they don't need to match.

My - very wild - guess is that it was the command he used to find the line with the most special characters, but I am not confident enough to edit the article (if someone can confirm?). 141.101.66.83 (talk) (please sign your comments with ~~~~)

If it was supposed to do that, it doesn't work. Running it on my bash history matches no lines, and I have lots of special characters in there 197.234.242.243 07:12, 3 February 2016 (UTC)

Explain it to me like I'm dumb. What is this comic going on about? I think the explanation needs more examples like that hello, above, because that's almost understandable. --198.41.238.231 07:47, 3 February 2016 (UTC)

I agree. But I cannot help either.--Kynde (talk) 09:51, 3 February 2016 (UTC)

This is the third time Randall has mentioned Ba'al the Soul Eater xD International Space Station (talk) 08:26, 3 February 2016 (UTC)

Yes, that was already mentioned a few hours before your comment; see the first comment. --Kynde (talk) 09:51, 3 February 2016 (UTC)

After passing the regex through bash, you get \\[[(].*\\[\])][^)\]]*$ That is, the literal character \, followed by [ or (, followed by any number of any characters, followed by \, followed by ] or ), followed by any number of characters that aren't ) or ], until the end of the line. 108.162.216.44 08:33, 3 February 2016 (UTC)

It sounds like you know what you are talking about. Can anyone explain it well enough for the explanation, and correct the explanation of the title text if it is wrong to say that it would not work? I have added this as the reason for incomplete. But maybe examples are also needed for people without programming skills/knowledge. We also enjoy xkcd ;-) --Kynde (talk) 09:51, 3 February 2016 (UTC)
I'm thinking that it's grepping for regular expressions that contain regular expressions. A regex containing "\[...\]" or "\(...\)" will match other regular expressions, as almost all non-trivial regexes use either character lists or groups. Now why out.txt is likely to contain not just regexes but rather regexes that search for regexes I have no idea - perhaps he had actually put too many backslashes in and he was trying to grep just for "[...]" or "(...)" (i.e. to locate probable regular expressions in out.txt, or anything else in parenthesis for that matter such as countless kinds of code/markup)? 162.158.152.185 17:35, 4 February 2016 (UTC)

For fun:

cat ~/.bash_history | xargs -d "\n" -n 1 -I {} bash -c 'chars="$(echo "$1" | grep -o "[a-zA-Z0-9 ]" | wc -l)"; echo "$(( 100 - $(( $chars * 100 / ${#1} )) )) $1"' _ {} | sort -nrk 1 | less

Outputs your bash_history, ordered by relative gibberishness. This was copied by hand from desktop to mobile, might well have a few typos.--162.158.90.208 10:04, 3 February 2016 (UTC)
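
For anyone who would rather not retype that one-liner, here is a minimal Python sketch of the same score: the percentage of characters in a command that are not letters, digits or spaces, using integer division as the shell arithmetic does. The function names and the sample commands are made up for illustration, and the de-duplication step is an addition that the one-liner does not have.

```python
import re


def gibberish_score(command: str) -> int:
    """Percentage (0-100) of characters that are not in [a-zA-Z0-9 ]."""
    plain = len(re.findall(r"[a-zA-Z0-9 ]", command))
    return 100 - plain * 100 // len(command)


def rank(commands):
    """Return unique, non-empty commands, most gibberish first."""
    unique = [c for c in dict.fromkeys(commands) if c]
    return sorted(unique, key=gibberish_score, reverse=True)


# Hypothetical history lines, not taken from anyone's real shell history.
history = ["ls", "cd ..", "cd ..", 'grep -o "[[(].*[])][^)]]*$" out.txt']
for cmd in rank(history):
    print(gibberish_score(cmd), cmd)
```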

Besides the fact that -d is a GNU extension to xargs (so it won't exist on OS X, FreeBSD, or anything else but Linux), this is a weird way to calculate gibberishness; I'm guessing functions, variable substitutions, .. and ./, etc. are going to swamp the more unreadable grep and the like. Plus, I think you need a uniq in there somewhere; otherwise, aren't the first few pages all going to be filled with the 78 copies of "422 cd .." that tied for most gibberishy in my last 500 commands? --162.158.255.82 22:51, 7 March 2016 (UTC)
I know it's been almost a decade, but I wanted to provide a more portable solution. Here's a posixly correct way to do that: sed 'h;:b;s/./-/g;:a;s/-/<<123456789-01>/;s/\(.\)<.*\1\(-*.\).*>/\2/;/^$/s/^/0/;ta;x;H;/\n/!s/[a-zA-Z0-9 ]*//g;tb;x;s/\n/ /g;s/0 0/0 1/' ~/.bash_history | awk '{$1=$1/$2;$2="";print $0}' | sort -rn | less. However, if you do have GNU sed, I think this is nicer: sed -E 'h;:b;s/./-/g;:a;s/-/<<123456789-01>/;s/(.)<.*\1(-*.).*>/\2/;/^$/s/^/0/;ta;x;H;/\n/!s/[a-zA-Z0-9 ]*//g;tb;x;s/^/dc -e"Fk/;s%\n%/n";#%2e;G;s/\n.*\n/ /;s/\.(..)([^ ]*)/\1.\2%/' ~/.bash_history | sort -rn | less if your 'most gibberishy commands' are 'cd ..' then you're obviously not having enough fun on the command line. For example, a short and sweet one I found which I'd encourage everyone to try is PS1='C:${PWD////\\\\}>' Regex user (talk) 10:30, 23 June 2024 (UTC)

The problem in the comic is not with regexes per se but with situations when the entered text or expression passes through several interpreters, like bash -> grep/sed/awk, or program text -> external shell command. In such cases, you have to escape backslashes for each program in the sequence, and it gets worse if you have 'real' backslashes in the final text that you're processing with the utilities (Windows' file paths, for example). See https://en.wikipedia.org/wiki/Leaning_toothpick_syndrome. Feel free to lift this to the explanation page, since I'm not good at longer and more careful explanations than this one. Also, gotta notice that Feedly stripped paired backslashes in the title text (probably passed it through some 'interpreter' embedded in its scripts). Aasasd (talk) 10:13, 3 February 2016 (UTC)
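
A small Python sketch of that layering, under the assumption that the "interpreters" are a string literal, the regex engine, and a JSON encoder (the variable names and the path are invented for the example). Each layer that treats the backslash as an escape character doubles the count needed to get one real backslash through.

```python
import json
import re

path = "C:\\temp"              # the data itself contains ONE backslash
pattern = re.escape("\\")      # regex source that matches one backslash
as_json = json.dumps(pattern)  # the same regex embedded in a JSON document

# chr(92) is the backslash; it is used here to keep the f-string readable.
for label, text in [("data", path), ("regex", pattern), ("json", as_json)]:
    print(f"{label:5} {text}  ({text.count(chr(92))} backslashes)")

# The regex layer still finds the single backslash in the data.
assert re.search(pattern, path)
```

Writing the JSON text back as a Python literal would double it yet again, which is how the long runs in the comic come about.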

A funny comment about the MediaWiki software, which is even worse than this comic: <Nikerabbit> I looked the code for rlike and didn't find where it does this. Can you point me to it? <vvv> $pattern = preg_replace( '!(\\\\\\\\\\\\\\\\)*(\\\\\\\\)?/!', '$1\\/', $pattern ); <Nikerabbit> I thought that was ascii art :) (source) --162.158.91.215 10:18, 3 February 2016 (UTC)

Interestingly, I first looked at this on my phone (using Chrome Feedly for Android), but the title text did not display correctly in that the backslashes didn't appear (which was a little confusing!). In Chrome on my Windows desktop, the title text appeared correctly. Jdluk (talk) 11:36, 3 February 2016 (UTC)

enough with the harry potter fancruft. "elder" is a perfectly good word. just because you came across it for the first time in harry potter means you are *typing carefully* the kind of person that likes harry potter. unless this is a harry potter reference wiki, of course. in which case i'll prepare a complete list of every word that appears both here and there and put a list on every page. oh, right, no i won't. --141.101.106.161 12:41, 3 February 2016 (UTC)

Remember that "Elder" is used in a lot of RPGs to denote high level enemies or items. I feel like that's what Randall's referring to here, more than Harry Potter or the general sense of the term "Elder." 108.162.245.156 (talk) (please sign your comments with ~~~~)

+1. Between the fact that Harry Potter (, ages, or tribes) aren't mentioned anywhere else in the text and the comic being a progressive list, I see this being the most likely explanation. Plus the mention of demons, which are easily the most* common usage of the modifier.
(*) or second most, after "elder gods", who are, let's face it, also demons. 162.158.180.125 14:41, 3 February 2016 (UTC)
I'm pretty sure that "Elder backslash" is in reference to the "Elder gods" of Lovecraft. 173.245.54.35 16:51, 3 February 2016 (UTC)
Note also that it's called 'The Elder Wand' not as an intensifier, as in this comic and the other examples given, but because it is literally made from the wood of an Elder Tree I'm pretty sure it's not an intentional reference. -Graptor 173.245.54.23 19:29, 3 February 2016 (UTC)
If it's an intentional reference to anything, it's to Lovecraft (or to something similar). I suspect the Elder Wand was an intentional pun by Rowling, however. --162.158.180.137 04:16, 4 February 2016 (UTC)
Since no-one else seemed to want to, I just restructured that paragraph to make it more clear that if anything Harry Potter was inspired by the older examples, not the other way around. Expanded the LOTR reference and added DnD. If anything Randall is likely to be referencing either the Lovecraft references, or the concept of Elder in general. 141.101.64.173 11:50, 4 February 2016 (UTC)

Attempting to add to the discussion: This regex is not necessarily invalid or incomprehensible. (Note: The regex changed after initial publication. See Changed Regex below.) It looks like he was looking for a line with a regular expression or definitely some code. You just have to work your way through the backslashes. Although it might be invalid depending on the precise rules. He has some unescaped closing brackets and closing parenthesis. If these have to always be escaped then the regex is invalid. If however you don't have to escape a closing bracket with no opening bracket, then things are fine. I'm not familiar enough with grep's regex parser to know how it handles that edge case. Presuming those unescaped paren and brackets are fine, his regex searches for:

1. A backslash

2. An opening bracket

3. An opening parenthesis (this is a character set but the only character in it is an opening paren)

4. Any number of any characters

5. A backslash

6. An opening bracket

7. A closing bracket

8. A closing paren (presuming it doesn't have to be escaped when there is no opening paren)

9. A closing bracket (presuming it doesn't have to be escaped when there is no opening bracket)

10. Any number of characters that are not a closing paren or closing bracket

11. The end of the line


Basically he is looking for a string that looks like:

\[(AAAAA\[])]AAAAA

Looks like a regex to me, and it looks like this regex also doesn't escape closing paren/brackets that don't have an opening paren/bracket, so I'm guessing that he knows what he is doing and his regex is fine. Maybe he was playing regex golf? Cmancone (talk)cmancone

Ninjaed by Cmancone, above. I agree with that result in every respect except for the start-of-string being potentially anything, but putting my own analysis in here because it took long enough to type!

Depth-of-backslash might depend upon depth of utility. In Perl, ''-quotes (among others) treat everything within as literal whilst ""-quotes (and variations) interpolates any special characters, variables, etc that you put in it. (Search for "Quote and Quote-like operators" in your favourite PerlDocs source.) '\sss' is a literal backslash followed by three 's' characters , while "\sss" is the special \s escape (a whitespace) followed by two further regular characters. You might need to define the first when you need to use it to provide a not-previously-escaped \s so that it might be escaped within another context. Or you define it as "\\sss" (escaped-\) the first time, as equivalent to '\sss'. But '\\sss' would be a literal that, later, could be interpreted as an escaped-\ to the input of a further context where the \s finally becomes 'match a whitespace'.

'\\\sss' would be literal, whilst "\\\sss" could be equivalent to '\ ss' (literal backslash, literal space, rest of characters). Then, instead of literal '\\sss', for some purpose, you could interpolate two escaped-backslashes "\\\\sss"... and so on.

Meanwhile I think, just from visual inspection, "\\\[[(].*\\\[\])][^)\]]*$" in Bash should obey the interpolation rules quite nicely. The first two characters must be a literal backslash (from the escaped-backslash) and a literal open-square bracket (again, escaped). The next open-square and the close-square shortly after depict a character class that contains only an open-parenthesis, and could have been written as \(.

The .* indicates zero-or-more (the asterisk) instances of any character (the dot). There is then a literal backslash (from the next \\ duo) and a literal open-square (the \[ pair) and close-square (the \] pair). The ) is literal and does not need escaping (as a parenthesis group had not yet been opened), as is the next ] character. To be sure, I would have written these two as the pair escapes \)\], but horses for courses...

Then there's another character class (the next [ and the final ]) required zero-or-more times (the asterisk) to use up all the rest of the characters to the end (the ending $ character). As there was no ^ character (a.k.a. caret/circumflex/etc) at the start, the match isn't bothered about what unmatched characters appear before the original \(. This character class, however, starts with a ^ which in this context (the very first character of a character-class definition, not somewhere where an entire match-string starts) indicates negation of the following selection, so it is all characters but those specified, which is the regular close-parenthesis and (because it needs to be contained within a [] pair) the escaped close-square.

So, all matching strings must start with \[(, i.e. the backslash, open-square and open-paren. They can continue with any further text, before then having a \[])], i.e. backslash, open-and-close-squares and close-paren, close-square. After this, the match continues just as long as there are no closing square/classic brackets before the ending.

The minimum matching literal string would be \[(\[])] with longer variants being of the form X\[(Y\[])]Z where X and Y can be replaced by anything (or be absent), and Z can be replaced by anything (or absent!) so long as it doesn't contain possibly relevant close-brackets! The latter stipulation is likely because the Y (and X) is allowed to contain these characters, and for some reason you don't want to confuse the test by finding some other \[])] segment within the X/Y-zones. (In this context, it doesn't actually seem to matter too much. But it might do in ways I haven't spotted or just be a hang-over from a prior permutation of the test.)

The "grep -o" function is working on the output to the file being cated (there are alternate ways of doing this that some people might prefer), to only accept the lines in the file that match the X\[(Y\[])]Z string. These lines would appear to be lines of out.txt (a fairly generic name that reveals little to its original purpose) that are well-formed for some other purpose. A safety-escaped (i.e. not to be taken literally by any simple parser) []-grouping containing a ()-group (not escaped, perhaps reasonably in context) containing potentially random text followed by an empty [] pair (again, safety-escaped). Depending on the source, the empty []-pair could mean many things, as with the other layers. And the lines may end with any further text.

The "out.txt" file might be the result of a prior Grep (string-search function) quite possibly scanning code for lines of particular importance by another pattern and dumping the results to out.txt for further perusal. And then Randall finds the need to dig further into the first result by extracting just those already selected that all have the X\[(Y\[])Z]-ish pattern to them.

But I could be wrong, and that's way too long for an official explanation. (Perhaps just something like the penultimate paragraph, if we're not entirely mistaken?) 162.158.152.89 14:14, 3 February 2016 (UTC)

The regex is supposed to be looking for (Note: The regex changed after initial publication. See Changed Regex below.):

\\\      backslash
[[(]     [ or (
.*       any character (repeated 0 or more times)
space    space
\\\      backslash
[[\])]   probably meant to match either [, ] or ). However, it's not correct, it instead matches the literal characters [)]
[^)\]]*  probably meant to match any character that isn't ) or ], repeated. Instead it means one character that's not a ), and then a ] zero or more times
$        end of string

The first problem is that you're not supposed to escape ] in a [...], and it also has to be first in the grouping (unless negated with a ^). It should be [][)] or something similar.

The second problem is the same. The last bit should be [^])]*$ and not [^)\]]*$. Khris (talk) 14:24, 3 February 2016 (UTC)


I was reading through the regex, if using grep you run into an error with an unmatched ")". Removing this gets a string such as \[(AAAAA\[]]AAAAA$ http://regexr.com/3cng8 162.158.214.230 14:42, 3 February 2016 (UTC)


The regex relies on several special cases (*surprise*). (Note: The regex changed after initial publication. See Changed Regex below.) First: bash double-quote expansion (see [1]). Perhaps non-intuitively, \\\ followed by a character that \ doesn't escape is an escaped backslash followed by a literal backslash, effectively the same as \\\\ followed by that same non-escaped character. After bash double-quote expansion, this results in:

\\[[(].*\\[\])][^)\]]*$
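
That expansion step can be reproduced with a small Python sketch of bash's double-quote backslash rule. This is a simplification, not bash itself: the helper name is made up, and it assumes a backslash is only removed before <code>$</code>, <code>`</code>, <code>"</code>, <code>\</code> or a newline, while ignoring variable and command expansion entirely.

```python
# Characters a backslash can escape inside bash double quotes:
# $, `, ", \ and newline. Anywhere else the backslash is kept as-is.
ESCAPABLE = set('$`"\\\n')


def dquote_backslashes(text: str) -> str:
    """Hypothetical helper: apply only the backslash rule of bash
    double quotes to `text` (no $-expansion, no command substitution)."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in ESCAPABLE:
            nxt = text[i + 1]
            if nxt != "\n":       # backslash-newline disappears entirely
                out.append(nxt)
            i += 2
        else:
            out.append(ch)        # every other backslash stays literal
            i += 1
    return "".join(out)


typed = r'\\\[[(].*\\\[\])][^)\]]*$'   # what sits between the double quotes
print(dquote_backslashes(typed))
```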


grep interprets this as:

  1. any leading non-\ characters
  2. literal backslash
  3. character class containing [ and (
  4. zero or more *any* characters
  5. another literal backslash
  6. yet another literal backslash, via a character class containing only a backslash. Note this does not contain an escaped ], as it might appear at first glance. See [2]
  7. literal )
  8. literal ]
  9. character class of anything except ), \
  10. zero or more ]
  11. end of line

Matching examples:

  • echo 'asdf\[asdfasdf\\)]a]]]]]]' | grep -o "\\\[[(].*\\\[\])][^)\]]*$"
  • echo '\(\\)]P' | grep -o "\\\[[(].*\\\[\])][^)\]]*$"



108.162.216.34 16:14, 3 February 2016 (UTC)rb

One key thing to understand is that \ is not a special character when it's in a bracket expression - you can't escape characters in bracket expressions. So [^)\] simply means any character other than ) or \. Also, ( and ) are just regular characters unless they are escaped in basic regular expressions - extended regular expressions reverse this rule. -- Kalfalfa (talk) (please sign your comments with ~~~~)

I don't know about the regular expression in the title text, but I think the explanation is incorrect in that it starts off talking about regular expressions. Escaping backslashes is an issue with strings in programming in general. 173.245.54.46 17:12, 3 February 2016 (UTC)


I suspect that Randall may have used the regexp in the title text to *find* malformed regular expressions in a file (out.txt) that he (or someone) had previously filled with output from some error message (or collection of error messages, or at least the output of something where a regular expression had been expected to work but had not worked as expected). 162.158.252.227 19:06, 3 February 2016 (UTC)

You can use metacharacters in character classes; the only metacharacters in a character class that must be escaped are the closing square bracket (]), the backslash (\), the hyphen, and the caret (^) if it is the first listed item in the set. The closing square bracket requires escaping because an unescaped one would signal the end of the set, which in turn means the backslash must also be escaped. The hyphen must be escaped because, unescaped, it signals a range (unless it is listed first, in which case it is literal without escaping). The caret must be escaped when listed first because otherwise it signals a negated set.
Therefore, the end of the title text regex matches a backslash followed by either ] or ), which is then followed by any number (including none) of characters so long as they are not ] nor ) which means the whole regex can match "\[something\] more" or "\(something\)more" or "\[something\) more" as well as "\[something\]". — 162.158.255.117 01:16, 4 February 2016 (UTC)

I'll add that I use an almost identical regex in my mail server for matching mailing-list subject lines which often have a format of "[Listname] normal subject line" which made it pretty recognizable to me. — 162.158.255.117 01:24, 4 February 2016 (UTC)

Example of a match

Note: The regex changed after initial publication. See Changed Regex below

First, the shell will do some escaping substitution. So, in order to easily read it, let's see what grep really receives:

$ echo "\\\[[(].*\\\[\])][^)\]]*$"
\\[[(].*\\[\])][^)\]]*$

Let's break it out:

  • \\ matches a \
  • [[(] matches either a [ or a (
  • .* matches any series of characters until the next match
  • \\ matches a \
  • [\] matches a \
  • )] matches )]
  • [^)\] matches anything but ) or \
  • ]* matches any number of ] (including none)
  • $ matches the end of the string

So the string \[aaa\]\\)]a]]]]]] matches! 108.162.228.167 (talk) (please sign your comments with ~~~~)
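
The breakdown above can be checked without grep by translating it, piece by piece, into Python's regex dialect. The translation is done by hand and is an assumption rather than grep output: wherever grep treats a backslash inside brackets as an ordinary character, the Python pattern spells that out explicitly. The name <code>GREP_READING</code> is invented for the example.

```python
import re

# Hand translation of \\[[(].*\\[\])][^)\]]*$ as grep's default syntax reads it.
GREP_READING = re.compile(
    r"\\"       # \\     a literal backslash
    r"[\[(]"    # [[(]   either [ or (
    r".*"       # .*     any series of characters
    r"\\"       # \\     a literal backslash
    r"[\\]"     # [\]    a class holding only a backslash
    r"\)\]"     # )]     the two literal characters )]
    r"[^)\\]"   # [^)\]  one character that is neither ) nor \
    r"\]*"      # ]*     any number of ]
    r"$"        # $      end of the string
)

samples = [
    r"\[aaa\]\\)]a]]]]]]",             # the example from this section
    r"\(\\)]P",                        # example given further up the page
    r"\(Hello world\)more text here",  # what the regex was probably meant for
]
for text in samples:
    print(text, "->", bool(GREP_READING.search(text)))
```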

...Maybe it's meant to search for all Game Grumps transcripts which make mention of the "Grep" gag? 108.162.216.55 15:53, 3 February 2016 (UTC)

...Wow, guys, and here I was thinking he wanted to put the cat out, when the cat didn't want to go out.... 108.162.249.158 04:03, 4 February 2016 (UTC)

What I think is that Randall probably intended the regex to match "backslash, opening round or square bracket, anything, backslash, closing round or square bracket, anything that doesn't involve closing round or square brackets", since (unlike most other possibilities given) that actually looks like something one might want to search for. Whether it does, in fact, match that or something else (or indeed anything at all) is another question entirely. (For all we know, it didn't work, Randall figured out it didn't, and wrote the correct thing the next line over.)
Unrelatedly: this comic (and the backslash proliferation in general) reminded me of the Telnet Song. --162.158.180.137 04:16, 4 February 2016 (UTC)

That explanation is wrong: [\] does not match a literal backslash; it would still need to be escaped inside the brackets. That backslash escapes the next character, a ], so the group doesn't end there. The actual expression there is [\])], a character group containing an escaped ] and a ). Just like the first part. It is most likely intended to catch content surrounded by [ ] or ( ). 141.101.104.15 13:43, 4 February 2016 (UTC)

To clarify: this makes the expression catch anything that starts with a block surrounded by escaped round or square brackets, so stuff like \(Hello world\)more text here, with either round or square brackets (or a combination, since nothing enforces that they match; personally, I'd have made it an OR of two groups with matching brackets). -141.101.104.15 13:51, 4 February 2016 (UTC)
You're making the same mistake Randall did: while many (most?) regex dialects use \ as an escape inside a character class, this is not true for grep's default syntax. I've expanded on that interpretation in my comment below; however, the analysis by 108.162.228.167 is a correct explanation of how this expression is actually interpreted by grep. --141.101.75.185 15:42, 4 February 2016 (UTC)

Your analysis is thorough and correct; however, it is unlikely this is what the regex was intended to accomplish. (Note: The regex changed after initial publication. See Changed Regex below.) More likely, Randall is more accustomed to other regex dialects such as Perl(-compatible) regex, where a backslash does work to escape special characters inside a character class. Under that assumption the regex (with some whitespace inserted for readability) would break down as:

  • \\ [[(] an escaped opening bracket or paren
  • .* anything
  • \\ [\])] an escaped closing bracket or paren
  • [^)\]]* $ no closing bracket or paren occurring on the remainder of the line

Although the final condition is still a bit obscure, this still makes a lot more sense. Unfortunately it also crushes Randall's hope the regex worked as intended, since this simply isn't how the expression is parsed with grep's default syntax (which is why I always use grep -P). --141.101.75.185 15:34, 4 February 2016 (UTC)
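The gap between the two readings can be seen with default grep alone. The sketch below is an illustration under assumptions: it presumes GNU grep in its default (basic) syntax, and the second pattern is a hypothetical rewrite of the intended meaning in POSIX bracket syntax (a literal ] placed first in the list rather than backslash-escaped), not something taken from the comic.

```shell
# Assumption: GNU grep, default (POSIX basic) regex syntax.
sample='\(Hello world\) more text'

# The comic's regex as grep receives it: inside [...] the backslash is
# just another character, so this does not match the sample.
as_written=$(printf '%s\n' "$sample" | grep -o '\\[[(].*\\[\])][^)\]]*$' || true)

# Hypothetical rewrite of the intended meaning in POSIX bracket syntax:
# a literal ] goes first in the list instead of being backslash-escaped.
as_intended=$(printf '%s\n' "$sample" | grep -o '\\[[(].*\\[])][^])]*$' || true)

printf 'as written:  [%s]\n' "$as_written"
printf 'as intended: [%s]\n' "$as_intended"
```

The first result should come back empty and the second should be the whole sample line, which is the behaviour the Perl-style reading describes.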


Did anyone notice the Useless Use of Cat? 141.101.106.101 19:36, 4 February 2016 (UTC)

Yup - I hereby award Randall with the Useless Use of Cat Award of the day. Cherish it.
Zedn00 (talk) 03:51, 5 February 2016 (UTC) Zedn00

Changed Regex

At some point before 2016-02-09 18:00 +0100, Randall modified the bash command in the title text!

Original command:

cat out.txt | grep -o "\\\[[(].*\\\[\])][^)\]]*$"

New command:

cat out.txt | grep -o "[[(].*[])][^)]]*$"

For the old command, 108.162.228.167's and 108.162.216.34's explanations above were correct.

The new command matches:

[[(]  either a '[' or a '('
.*    an unbounded and possibly empty sequence of arbitrary characters
[])]  either a ']' or a ')'
[^)]  any character except for a ')'
]*    an unbounded and possibly empty sequence of ']'
$     anchored at end of line

For example, it now matches 123[abc.<>)x]]]]]:

$ echo "123[abc.<>)x]]]]]" | tee /dev/stderr | grep -o "[[(].*[])][^)]]*$"

This hardly makes more sense than the original command. --Markus (talk) 17:38, 9 February 2016 (UTC)
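For anyone who wants to see what the new command actually prints, here is a small sketch, assuming GNU grep with its default (basic) syntax. The variable names and the second test string, (abc), are my own additions for illustration.

```shell
# Assumption: GNU grep, default (POSIX basic) regex syntax.
new_regex='[[(].*[])][^)]]*$'

# The example from above: the match runs from the first [ to the end of the line.
first=$(printf '%s\n' '123[abc.<>)x]]]]]' | grep -o "$new_regex" || true)

# A plain parenthesised word: no match, because [^)] demands one more
# character after the closing bracket.
second=$(printf '%s\n' '(abc)' | grep -o "$new_regex" || true)

printf 'first:  %s\n' "$first"
printf 'second: %s\n' "$second"
```

The second case is a fair illustration of how odd the expression is: a simple bracketed word at the end of a line is rejected.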

Randall may have been sincere about finding it in his history and wondering if it worked. I think he probably meant
cat out.txt | grep -o "[[(].*[])][^])]*$"
which breaks down as:
[[(]  either a '[' or a '('
.*    an unbounded and possibly empty sequence of arbitrary characters
[])]  either a ']' or a ')'
[^])]*  any number of any characters except for a ')' or ']'
$     anchored at end of line
This matches any line that has a '[' or '(' followed by a ')' or ']', matching from the first '[' or '(' to the end of the line. The final part of the regex, '[^])]*$', is not really necessary here. However, it is a common pattern to follow a character class with its negation, to make sure the first class matches the last instance of a repeated character, so he might have added it out of habit. That would also explain why he got it wrong: he just followed '[blah]' with '[^blah]', which doesn't work in this special case because 'blah' contains a special character, ']'.
Concerned Netizen (talk) 02:23, 25 April 2017 (UTC)
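The proposed variant can be tried out the same way. This is a rough sketch assuming GNU grep with its default (basic) syntax; the sample strings and variable names are invented for illustration.

```shell
# Assumption: GNU grep, default (POSIX basic) regex syntax.
proposed='[[(].*[])][^])]*$'

# Match starts at the first opening bracket and runs to the end of the line.
with_tail=$(printf '%s\n' 'foo(bar) baz' | grep -o "$proposed" || true)

# Unlike the current title-text regex, nothing is required after the closer.
bare=$(printf '%s\n' '(abc)' | grep -o "$proposed" || true)

printf '%s\n' "$with_tail" "$bare"
```

Both samples should match here, the first from the opening parenthesis onward and the second in full.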

Funnily enough, I'm literally looking at some other dev's code right now that actually implements an eight-backslash regex sequence, with just the comment "backslash". I'm still scratching my head over what they were trying to accomplish or even communicate with this. Domino (talk) 21:45, 16 August 2016 (UTC)domino


I believe the regex is a reference to xkcd 1313 (Regex Golf) 108.162.221.90 16:08, 26 August 2016 (UTC)


I had to use a backslash that escaped the screen today. I have a Discord bot written with Node.js, and my "friends" demanded I add a shruggie output [despite Discord having that already]. So, I now have a string that looks like '¯\\\\\\_(ツ)\_/¯'. -- Papayaman1000 (talk) (please sign your comments with ~~~~)


Regarding the change to the title text (with all backslashes being removed): it appears that this may not be a deliberate edit, as other comics (e.g. 1277) have also had all backslashes disappear from the title text. It appears that one of the tools Randall is using may be 'solving' accidentally escaped characters by doing something like sed 's/\\//g'. Sqek (talk) 13:18, 14 December 2017 (UTC)
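The backslash-stripping theory is easy to test against the two versions quoted under Changed Regex. A minimal sketch follows, with variable names chosen only for illustration; whether any tool on Randall's side really runs such a sed is, of course, a guess.

```shell
# The regex from the original title text, exactly as typed inside the quotes.
original='\\\[[(].*\\\[\])][^)\]]*$'

# Remove every backslash, as the suspected clean-up step would.
stripped=$(printf '%s\n' "$original" | sed 's/\\//g')
printf '%s\n' "$stripped"
```

The result should be identical to the regex in the new command, which is consistent with the theory but does not prove it.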

sed -r "s/^(.*)$/^\1$/" wordlist.txt What was I cooking... There are also cursed grep commands for wordle such as grep -iwE '[qwiafghjkzxvbn]{5}' wordlist.txt | grep -i '.[^ia][^n][^ai][^n]' | grep -i '[ain]' Wilh3lm (talk) 14:19, 16 April 2024 (UTC)