Difference between revisions of "Talk:1638: Backslashes"

(A little editing... Might be more readable. Might not be.)

Revision as of 14:23, 3 February 2016

It should be noted that this also occurs in almost every programming language where "\" is the escape character, e.g.:

print("Hello")
> Hello
print("\"Hello\"")
> "Hello"
print("\\Hello\\")
> \Hello\
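The same point can be checked by counting characters instead of reading the output. A minimal Python sketch (the variable names are made up for illustration): each escaped pair typed in the source ends up as a single character in the string, and a raw string switches that processing off.

```python
# Each escape sequence in the source becomes ONE character in the string.
quoted = "\"Hello\""      # 9 characters typed between the outer quotes
slashed = "\\Hello\\"     # 9 characters typed between the outer quotes
raw = r"\\Hello\\"        # raw string: backslashes are kept exactly as typed

print(quoted, len(quoted))    # "Hello" 7
print(slashed, len(slashed))  # \Hello\ 7
print(raw, len(raw))          # \\Hello\\ 9
```

So two of the nine typed characters disappear in each non-raw case, which is exactly the doubling the comic complains about.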

Oh, and by the way, isn't this the third comic to mention "Ba'al, the Soul Eater"? Maybe we should start a category. (Others are 1246 (title text) and 1419.) 173.245.54.29 06:14, 3 February 2016 (UTC)

Did that before seeing your comment, so yes I agree. --Kynde (talk) 09:47, 3 February 2016 (UTC)
I don't think the regex is invalid

According to man grep you need to specify the -E option to use extended regex; without it unescaped parentheses are not interpreted, so they don't need to match.

My - very wild - guess is that it was the command he used to find the line with the most special characters, but I am not confident enough to edit the article (if someone can confirm?). 141.101.66.83 (talk) (please sign your comments with ~~~~)

If it was supposed to do that, it doesn't work. Running it on my bash history matches no lines, and I have lots of special characters in there 197.234.242.243 07:12, 3 February 2016 (UTC)

Explain it to me like I'm dumb. What is this comic going on about? I think the explanation needs more examples like that hello, above, because that's almost understandable. --198.41.238.231 07:47, 3 February 2016 (UTC)

I agree. But I cannot help either.--Kynde (talk) 09:51, 3 February 2016 (UTC)

This is the third time Randall has mentioned Ba'al the Soul Eater xD International Space Station (talk) 08:26, 3 February 2016 (UTC)

Yes, that was already mentioned a few hours before your comment, see the first comment. --Kynde (talk) 09:51, 3 February 2016 (UTC)

After passing the regex through bash, you get \\[[(].*\\[\])][^)\]]*$ That is, the literal character \, followed by [ or (, followed by any number of any characters, followed by \, followed by ] or ), followed by any number of characters that aren't ) or ], until the end of the line. 108.162.216.44 08:33, 3 February 2016 (UTC)
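For anyone who wants to poke at that reading, here is a rough Python sketch that feeds the post-Bash pattern to Python's re module. This is an assumption-laden stand-in, not grep itself: Python treats \] inside a character class as an escaped bracket, which agrees with the reading above, whereas POSIX bracket expressions treat a backslash as an ordinary character, so grep may well parse [\])] differently. The helper name matched_part and the sample lines are made up.

```python
import re
import warnings

# The pattern as grep would receive it after Bash has processed the
# double-quoted string (copied from the comment above).
PATTERN_TEXT = r"\\[[(].*\\[\])][^)\]]*$"

with warnings.catch_warnings():
    # Python warns that "[[" looks like a nested set; it is still parsed
    # as a class containing "[" and "(".
    warnings.simplefilter("ignore", FutureWarning)
    pattern = re.compile(PATTERN_TEXT)

def matched_part(line):
    """Return the matched text (roughly what grep -o prints), or None."""
    found = pattern.search(line)
    return found.group(0) if found else None

print(matched_part(r"foo \[bar\] baz"))   # \[bar\] baz
print(matched_part(r"foo \(bar\) baz"))   # \(bar\) baz
print(matched_part(r"foo \[bar\] baz)"))  # None
```

The last line fails because the trailing ) is excluded by the final negated class.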

It sounds like you know what you are talking about. Could anyone who can explain it well enough add it to the explanation, and correct the explanation of the title text if it is wrong to say that it would not work? I have added this as the reason for incomplete. But maybe examples are also needed for people with no programming skills/knowledge. We also enjoy xkcd ;-) --Kynde (talk) 09:51, 3 February 2016 (UTC)

For fun:

cat ~/.bash_history | xargs -d "\n" -n 1 -I {} bash -c 'chars="$(echo "$1" | grep -o "[a-zA-Z0-9 ]" | wc -l)"; echo "$(( 100 - $(( $chars * 100 / ${#1} )) )) $1"' _ {} | sort -nrk 1 | less

Outputs your bash_history, ordered by relative gibberishness. This was copied by hand from desktop to mobile, might well have a few typos.--162.158.90.208 10:04, 3 February 2016 (UTC)
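For those without xargs handy, the scoring idea in that one-liner can be sketched in Python. This is only an approximation under stated assumptions: the function names and the sample history are made up, and empty lines are skipped here (the shell version would divide by zero on them).

```python
import re

# Characters that count as "not gibberish", same class as the grep above.
PLAIN = re.compile(r"[a-zA-Z0-9 ]")

def gibberish_score(line):
    """Percentage of characters that are not letters, digits or spaces."""
    plain_count = len(PLAIN.findall(line))
    # Integer arithmetic, as in the shell's $(( ... )).
    return 100 - (plain_count * 100 // len(line))

def rank_by_gibberish(lines):
    """Return (score, line) pairs, highest score first; empty lines skipped."""
    scored = [(gibberish_score(line), line) for line in lines if line]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Made-up sample standing in for ~/.bash_history.
history = ["ls -la", "$(( 1 + 1 ))", "cd src", ""]
for score, line in rank_by_gibberish(history):
    print(score, line)
```

To run it on a real history, the list could be replaced with the lines read from the file.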

The problem in the comic is not with regexes per se but with situations when the entered text or expression passes through several interpreters, like bash -> grep/sed/awk, or program text -> external shell command. In such cases, you have to escape backslashes for each program in the sequence, and it gets worse if you have 'real' backslashes in the final text that you're processing with the utilities (Windows' file paths, for example). See https://en.wikipedia.org/wiki/Leaning_toothpick_syndrome. Feel free to lift this to the explanation page, since I'm not good at longer and more careful explanations than this one. Also, gotta notice that Feedly stripped paired backslashes in the title text (probably passed it through some 'interpreter' embedded in its scripts). Aasasd (talk) 10:13, 3 February 2016 (UTC)
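To put a number on the "each program in the sequence" point, here is a toy Python sketch. It assumes a made-up interpreter whose only rule is that a doubled backslash stands for one backslash; real shells and regex engines have more rules than that, and the chain of layer names is hypothetical.

```python
def escape_for_one_layer(text):
    """Double every backslash so one round of unescaping restores `text`."""
    return text.replace("\\", "\\\\")

def interpret_one_layer(text):
    """Toy interpreter: collapse each pair of backslashes into one."""
    return text.replace("\\\\", "\\")

path = "C:\\temp"   # the final text: C:\temp, with one real backslash
typed = path
for layer in ("grep", "shell", "host program"):   # hypothetical chain
    typed = escape_for_one_layer(typed)
    print(f"to survive {layer}: {typed}")

# Undo the three layers again to confirm the round trip.
result = typed
for _ in range(3):
    result = interpret_one_layer(result)
print(result == path)
```

One real backslash needs 2, 4, then 8 typed backslashes after one, two and three layers, which is where the leaning toothpicks come from.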

A funny comment about the MediaWiki software, which is even worse than this comic: <Nikerabbit> I looked the code for rlike and didn't find where it does this. Can you point me to it? <vvv> $pattern = preg_replace( '!(\\\\\\\\\\\\\\\\)*(\\\\\\\\)?/!', '$1\\/', $pattern ); <Nikerabbit> I thought that was ascii art :) (source) --162.158.91.215 10:18, 3 February 2016 (UTC)

Interestingly, I first looked at this on my phone (using Chrome Feedly for Android), but the title text did not display correctly in that the backslashes didn't appear (which was a little confusing!). In Chrome on my Windows desktop, the title text appeared correctly. Jdluk (talk) 11:36, 3 February 2016 (UTC)

enough with the harry potter fancruft. "elder" is a perfectly good word. just because you came across it for the first time in harry potter means you are *typing carefully* the kind of person that likes harry potter. unless this is a harry potter reference wiki, of course. in which case i'll prepare a complete list of every word that appears both here and there and put a list on every page. oh, right, no i won't. --141.101.106.161 12:41, 3 February 2016 (UTC)

Remember that "Elder" is used in a lot of RPGs to denote high level enemies or items. I feel like that's what Randall's referring to here, more than Harry Potter or the general sense of the term "Elder."

Attempting to add to the discussion: This regex is not necessarily invalid or incomprehensible. It looks like he was looking for a line with a regular expression or definitely some code. You just have to work your way through the backslashes. Although it might be invalid depending on the precise rules. He has some unescaped closing brackets and a closing parenthesis. If these have to always be escaped then the regex is invalid. If however you don't have to escape a closing bracket with no opening bracket, then things are fine. I'm not familiar enough with grep's regex parser to know how it handles that edge case. Presuming those unescaped parens and brackets are fine, his regex searches for:

1. A backslash

2. An opening bracket

3. An opening parenthesis (this is a character set but the only character in it is an opening paren)

4. Any number of any characters

5. A backslash

6. An opening bracket

7. A closing bracket

8. A closing paren (presuming it doesn't have to be escaped when there is no opening paren)

9. A closing bracket (presuming it doesn't have to be escaped when there is no opening bracket)

10. Any number of characters that are not a closing paren or closing bracket

11. The end of the line


Basically he is looking for a string that looks like:

\[(AAAAA\[])]AAAAA
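The eleven steps above can be written out as an annotated pattern. A sketch in Python's re module rather than grep, so two things are assumptions: the title-text pattern is read as a single-pass regex (no Bash processing first), and the closing paren in step 8 is escaped because Python rejects an unbalanced ")" where the original leaves it bare.

```python
import re

STEPS = re.compile(r"""
    \\        # 1. a backslash
    \[        # 2. an opening bracket
    [(]       # 3. an opening parenthesis (one-character class)
    .*        # 4. any number of any characters
    \\        # 5. a backslash
    \[        # 6. an opening bracket
    \]        # 7. a closing bracket
    \)        # 8. a closing paren (escaped here for Python's re)
    \]        # 9. a closing bracket
    [^)\]]*   # 10. any number of characters that are not ) or ]
    $         # 11. the end of the line
""", re.VERBOSE)

sample = r"\[(AAAAA\[])]AAAAA"
print(STEPS.search(sample) is not None)             # True
print(STEPS.search(r"\[(AAAAA\[])]AA)A") is None)   # True
```

The second check fails to match only because of the stray ) after the \[])] part, which step 10 forbids.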

Looks like a regex to me, and it looks like this regex also doesn't escape closing paren/brackets that don't have an opening paren/bracket, so I'm guessing that he knows what he is doing and his regex is fine. Maybe he was playing regex golf? Cmancone (talk)cmancone

Ninjaed by Cmancone, above. I agree with that result in every respect except for the start-of-string being potentially anything, but putting my own analysis in here because it took long enough to type!

Depth-of-backslash might depend upon depth of utility. In Perl, ''-quotes (among others) treat everything within as literal whilst ""-quotes (and variations) interpolate any special characters, variables, etc that you put in them.  (Search for "Quote and Quote-like operators" in your favourite PerlDocs source.)  '\sss' is a literal backslash followed by three 's' characters, while "\sss" is the special \s escape (a whitespace) followed by two further regular characters.  You might need to define the first when you need to use it to provide a not-previously-escaped \s so that it might be escaped within another context.  Or you define it as "\\sss" (escaped-\) the first time, as equivalent to '\sss'.  But '\\sss' would be a literal that, later, could be interpreted as an escaped-\ to the input of a further context where the \s finally becomes 'match a whitespace'.

'\\\sss' would be literal, whilst "\\\sss" could be equivalent to '\ ss' (literal backslash, literal space, rest of characters).  Then, instead of literal '\\sss', for some purpose, you could interpolate two escaped-backslashes "\\\\sss"... and so on.

Meanwhile I think, just from visual inspection, "\\\[[(].*\\\[\])][^)\]]*$" in Bash should obey the interpolation rules quite nicely.  The first two characters must be a literal backslash (from the escaped-backslash) and a literal open-square bracket (again, escaped).  The next open-square and the close-square shortly after depict a character class that contains only an open-parenthesis, and could have been written as \(.

The .* indicates zero-or-more (the asterisk) instances of any character (the dot).  There is then a literal backslash (from the next \\ duo) and a literal open-square (the \[ pair) and close-square (the \] pair).  The ) is literal and does not need escaping (as a parenthesis group had not yet been opened), as is the next ] character.  To be sure, I would have written these two as the pair escapes \)\], but horses for courses...

Then there's another character class (the next [ and the final ]) required zero-or-more times (the asterisk) to use up all the rest of the characters to the end (the ending $ character).  As there was no ^ character (a.k.a. caret/circumflex/etc) at the start, the match isn't bothered about what unmatched characters appear before the original \(.  This character class, however, starts with a ^ which in this context (the very first character of a character-class definition, not somewhere where an entire match-string starts) indicates negation of the following selection, so it is all characters but those specified, which is the regular close-parenthesis and (because it needs to be contained within a [] pair) the escaped close-square.

So, all matching strings must start with \[(, i.e. the backslash, open-square and open-paren.  They can continue with any further text, before then having a \[])], i.e. backslash, open-and-close-squares and close-paren, close-square.  After this, the match continues just as long as there are no closing square or round brackets before the ending.

The minimum matching literal string would be \[(\[])] with longer variants being of the form X\[(Y\[])]Z where X and Y can be replaced by anything (or be absent), and Z can be replaced by anything (or absent!) so long as it doesn't contain possibly relevant close-brackets! The latter stipulation is likely because the Y (and X) is allowed to contain these characters, and for some reason you don't want to confuse the test by finding some other \[])] segment within the X/Y-zones.  (In this context, it doesn't actually seem to matter too much.  But it might do in ways I haven't spotted or just be a hang-over from a prior permutation of the test.)
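The X/Y/Z claims can be probed with a few test strings. A hedged sketch using Python's re module instead of grep: it follows the reading given in this comment, escapes the bare ")" because Python requires it, and uses a made-up helper grep_o that returns only the matched part of the line.

```python
import re

# This comment's reading of the pattern, with ")" escaped for Python's re.
PATTERN = re.compile(r"\\\[[(].*\\\[\]\)\][^)\]]*$")

MINIMAL = r"\[(\[])]"

def grep_o(line):
    """Return only the matched part of the line, or None if nothing matches."""
    found = PATTERN.search(line)
    return found.group(0) if found else None

print(grep_o(MINIMAL))                  # \[(\[])]
print(grep_o("X" + MINIMAL))            # \[(\[])]  (X is not part of the match)
print(grep_o(r"\[(Y)]Y\[])]" + " Z"))   # \[(Y)]Y\[])] Z
print(grep_o(MINIMAL + "Z]"))           # None
```

So Y may contain close-brackets, while a close-bracket anywhere in Z kills the match, at least under Python's rules.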

The "grep -o" function is working on the contents of the file being cat'ed (there are alternate ways of doing this that some people might prefer), to pick out the lines in the file that match the X\[(Y\[])]Z string (with -o, only the matching part of each line is printed).  These lines would appear to be lines of out.txt (a fairly generic name that reveals little to its original purpose) that are well-formed for some other purpose.  A safety-escaped (i.e. not to be taken literally by any simple parser) []-grouping containing a ()-group (not escaped, perhaps reasonably in context) containing potentially random text followed by an empty [] pair (again, safety-escaped).  Depending on the source, the empty []-pair could mean many things, as with the other layers.  And the lines may end with any further text.

The "out.txt" file might be the result of a prior Grep (string-search function) quite possibly scanning code for lines of particular importance by another pattern and dumping the results to out.txt for further perusal.  And then Randall finds the need to dig further into the first result by extracting just those already selected that all have the X\[(Y\[])]Z-ish pattern to them.

But I could be wrong, and that's way too long for an official explanation.

(Perhaps just something like the penultimate paragraph, if we're not entirely mistaken?) 162.158.152.89 14:14, 3 February 2016 (UTC)