2109: Invisible Formatting

Revision as of 17:17, 8 February 2019 by (talk) (Explanation)
Title text: To avoid errors like this, we render all text and pipe it through OCR before processing, fixing a handful of irregular bugs by burying them beneath a smooth, uniform layer of bugs.


A fast way to select a word in many systems is to double-click it, which also selects the following space. After applying formatting, one may later select only the word, by clicking and dragging with the mouse, to remove that formatting, which leaves the space formatted. Since in most fonts the word space looks identical in bold and in regular, this has no effect on how the end user reads the document, but it could theoretically cause a problem later. Randall worries about this, perhaps because he is good at programming, which makes him more attentive to hidden problems that happen under the hood and that most people pay no attention to.

In this case, he does not appear to have used the double-click method of selecting the word (based on the fact the cursor is depicted past the end of the word instead of on the word), but instead clicked-and-dragged to select it, a process that makes it easy to accidentally select the space as well—it's a thin character, hard to avoid highlighting, and most people don't worry about trying to avoid including it anyway. So either method of highlighting a word makes it easy to include the trailing space in the selection.

If later the same word is highlighted to have the bold removed, but this time the highlighting did not include the space, you would end up with an invisible character that is still bold, but since there is no visible component to it there is no easy way to tell it is still bold—even if it is of a different size, it may be hard to notice. This is the situation the comic is highlighting... no pun intended.

Occasions where a hidden bold space may be a problem include:

  • Wikis. In the first paragraph of this article, every space is a hidden bold space. From the editing view, all the spaces look like''' '''this. This will annoy all future editors of this article, due to the hidden apostrophes which are formatting the spaces. They may also accidentally introduce bold words.
    • By default, MediaWiki attempts to prevent this by not including the trailing spaces in the bold formatting when you click the “bold” button, so someone has to manually type the formatting apostrophes to do this.
  • Editing that adds some text at the location of the space will make this text bold.
  • A situation where formatted text is not allowed, and is rejected, but the user failed to strip formatting from the spaces, and this is noticed.
  • If a font has the word space look different between the bold and the regular, perhaps to make it so bold words are spaced closer to each other, the spacing will look inconsistent if there is a hidden bold space.
  • Unnecessary extra formatting usually increases file size, which may push the file above a maximum file size limit.
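
The select, bold, unbold sequence described above can be modeled in a few lines. The following is a minimal sketch in Python, assuming a simplified document model in which every character carries its own bold flag; the names `set_bold` and `hidden_bold_whitespace` are hypothetical and do not come from any real word processor.

```python
def set_bold(bold_flags, start, end, value):
    """Set the bold flag for the characters in the half-open range [start, end)."""
    for i in range(start, end):
        bold_flags[i] = value


def hidden_bold_whitespace(text, bold_flags):
    """Return the indexes of whitespace characters that are still flagged bold."""
    return [i for i, ch in enumerate(text) if ch.isspace() and bold_flags[i]]


text = "but would not have to"
bold = [False] * len(text)  # assumed model: one bold flag per character

start = text.index("not")
# First selection covers the word and its trailing space, as in the comic.
set_bold(bold, start, start + len("not "), True)
# Second selection covers only the word itself, so the space keeps its flag.
set_bold(bold, start, start + len("not"), False)

leftover = hidden_bold_whitespace(text, bold)
```

After these steps, `leftover` contains only the index of the space following "not", the one character that is still flagged bold even though nothing visible shows it.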

In the title text, Randall says that he “fixes” this by running the text through OCR, which turns physical copies or images into text. It would usually ruin even more formatting and add inaccuracies to the text. This way, no one can tell which bugs were introduced by him and which ones by the OCR, which he thinks is better somehow.

Popular modern word processing programs have features which may make it easier to notice improperly formatted invisible characters. In the tutorials linked here, one may learn how to view invisible characters in Microsoft Word, Pages and LibreOffice Writer. In the older word processor WordPerfect, one could do this with the “Reveal Codes” feature, which showed you character codes in place of the characters.


[A text editor, with [...]. The word "not ", including the following space, is highlighted in blue. There is a cursor below it.]
Text: ...ere, but would not have to mo...
Action: Select
[The cursor is on the bold option and the selected word is bolded.]
Text: ...ere, but would not have to mo...
Action: Click
[The cursor is next to the "to".]
Text: ...ere, but would not have to mo...
Thought bubble: ...Nah, the bold is too much.
[The word "not" is highlighted.]
Text: ...ere, but would not have to mo...
Action: Select
[The cursor is on the bold option and the selected word is not bolded.]
Text: ...ere, but would not have to mo...
Action: Click
[The cursor is gone. There is an arrow pointing to the bolded space with a dashed box around it.]
Text: ...ere, but would not have to mo...
Arrow: Hidden bold space
[Caption below the panel:]
When editing text, in the back of my mind I always worry that I'm adding invisible formatting that will somehow cause a problem in the distant future.



This reminds me of the person who used l (lower-case "L") instead of 1 for data entry at some business. Amazingly, the computer accepted it (BAD programming!) and it wasn't found out until the end of the tax year, when all heck broke loose! 14:50, 8 February 2019 (UTC)

Some programming puzzles are often solved with stuff like this: AΑ Fabian42 (talk) 15:19, 8 February 2019 (UTC)
"l" (lower-case "L") is a valid suffix to integer literals in C and derived languages. It indicates the number is of the "long int" type as opposed to a plain "int". Because C automatically upconverts the "int" type into "long int" when needed, the "l" suffix is rarely used. The result: "long int a = 1;" and "long int a = 1l;" mean exactly the same thing, and both statements are perfectly standard and won't raise any warning from compilers. "ll" (double el) is also a valid suffix, this time for the "long long int" type. GuB (talk) 15:39, 8 February 2019 (UTC)
Typing lowercase L instead of 1 is a common thing for people of a certain age. Old manual typewriters usually don't have a "1" key, so people learned to use lowercase L instead -- and sometimes slip back into that habit on newer technology. --Aaron of Mpls (talk) 02:03, 9 February 2019 (UTC)
That's exactly what happened in my example. I blame the programmer, though, for allowing a letter where a numeral was required, or for not converting the l to a 1 if the programmer knew such a thing ever happened. In either case, it shouldn't have allowed the l to just sit there like a bomb waiting to blow apart the post-tax-year processing. 15:22, 9 February 2019 (UTC)
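
The kind of input check the last comment asks for could look roughly like the following. This is only a sketch in Python, not the code of the system in the anecdote; the function name `parse_quantity` and the rule "ASCII digits only" are assumptions made for illustration.

```python
def parse_quantity(field):
    """Parse a data-entry field as a non-negative integer.

    Anything other than ASCII digits is rejected, so a lower-case "l"
    typed in place of "1" fails immediately instead of being stored.
    """
    cleaned = field.strip()
    if not (cleaned.isascii() and cleaned.isdigit()):
        raise ValueError(f"not a number: {field!r}")
    return int(cleaned)


def is_valid_quantity(field):
    """Return True if parse_quantity would accept the field."""
    try:
        parse_quantity(field)
    except ValueError:
        return False
    return True
```

Rejecting the entry at input time, rather than silently converting "l" to "1", is one design choice; a system could also convert it and log the correction, as the comment suggests.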

I went to this page, expecting it to be self-referential. Was not disappointed. Fabian42 (talk) 15:19, 8 February 2019 (UTC)

Some markup conversion tools don't handle hidden bold spaces correctly. This HTML to Markdown converter is an example: https://anthonychu.github.io/to-markdown/ It converts <b>a </b> to **a ** instead of **a** . 15:40, 8 February 2019 (UTC)
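
A converter can avoid this by moving whitespace at the edges of a bold span outside the markers. The following is a rough sketch in Python of that idea only; it is not the code of the linked tool, the function name `convert_bold` is made up here, and a regular expression is a stand-in for a real HTML parser (it ignores attributes, nesting and other tags).

```python
import re


def convert_bold(html):
    """Convert <b>...</b> spans to Markdown bold, keeping edge whitespace outside."""

    def repl(match):
        inner = match.group(1)
        core = inner.strip()
        if not core:
            # A bold span with only whitespace: keep the spaces, drop the bold.
            return inner
        lead = inner[: len(inner) - len(inner.lstrip())]
        trail = inner[len(inner.rstrip()):]
        return f"{lead}**{core}**{trail}"

    return re.sub(r"<b>(.*?)</b>", repl, html, flags=re.DOTALL)
```

With this approach, a hidden bold space on its own simply loses its formatting, since Markdown has no useful way to express it.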

Hah, this comment is not mine! Somehow I have your IP now. 17:47, 8 February 2019 (UTC)

Were the periods in the beginning there for a specific reason? Netherin5 (talk) 17:42, 8 February 2019 (UTC)

The user thought it was a good idea for some reason. Glad you fixed it. I finished the job 17:46, 8 February 2019 (UTC)

I've had this happen when writing papers. Bold. Unbold. Later backspace into the hidden bold space and everything typed after gets put in bold. If a professor gives you a page count instead of a word count, you can make the punctuation in your paper bold (or increase the font) to add some extra padding that might go unnoticed. Don't actually do this if you can't convey your thesis in fewer words. 18:11, 8 February 2019 (UTC)

I hated when Microsoft Word took over and lacked a real "Reveal Codes" like WordPerfect used to have. I'm kind of like Randall, I think about those behind-the-scenes things that lots of companies like to try to hide from the user, and I like the power to do something about them. -boB (talk) 18:58, 8 February 2019 (UTC)

When I saw the strip, I immediately thought of WordPerfect because of its brain-dead way of inserting formatting as special codes inline with the text. Hit "reveal codes" and it would reveal a string of bold on / bold off codes because it wasn't clever enough to optimise them away. I assume Word does it differently, perhaps with attributed strings, and so doesn't need the reveal codes function to let you manually fix the mess the program has made.

In Microsoft Word, where the majority of people would have experience with selecting and bolding text, the cursor appears as an "I-beam" when positioned over text and not as the "mouse pointer arrow" shown by Randall. Also, in Word double-clicking a word does select the following space(s), but when bold is applied it is applied only to the selected word, NOT to the trailing space (even though the space was selected when the bold was applied). So selecting just the word and un-bolding would not leave a bolded space behind, since the space was never bolded. Clearly Randall's example is in some editor other than Word. Since Word is where most people have familiarity with selecting and bolding text, something should be added to the explanation noting this and speculating on which text editor Randall is actually showing. - 20:35, 8 February 2019 (UTC)

Agreed. Most text editors do not select the trailing space when double-clicking. Microsoft Word is one of the few that does it. But in that case, the space is not formatted as bold. But in most word processors including Word, if you do select the word with the trailing space and apply the bold formatting, the space retains the formatting even if the word is un-bolded. So the first sentence of the explanation is incorrect.
Do they not? Notepad does it. Notepad++ does it. Your browser does it. Where is the wealth of programs that don't? I reckon this is the default system-wide behavior for double-clicking in Windows, regardless of program. 11:46, 9 February 2019 (UTC)
It does indeed seem to be a Windows issue, as everything I tried there highlighted the extra space (except Notepad++), but nothing I tried on Linux did. 13:59, 9 February 2019 (UTC)

Hidden formatting annoys translators greatly. Sometimes, the formatting of the word processor used and the formatting recognized by the CAT program (such as SDL Trados Studio or MemoQ) do not line up very well, which causes the formatting to appear as tags within the text (purple colored in the most widely used CAT software, Trados). If there is sloppy or hidden formatting all through the document, this turns into what most people call a "wall of purple", with tags everywhere within the document. Since tags need to be accounted for (otherwise the document does not save properly), and the formatting capability of most CAT tools is a lot more limited compared to any word processors, this is a colossal waste of time for any translator to wade through. Thus, if you leave any hidden formatting in a document and you know it will be translated somewhere down the line, you know there is a translator out there that curses the day you were born. (A note though - PDF conversion is responsible for a lot more wall of purple incidents than sloppy formatting. Seriously - if you expect a document to be translated at some point, never bring it anywhere close to the PDF format. That format is evil, I tell you. Pure evil.) 05:47, 9 February 2019 (UTC)

In WordPerfect for DOS, the codes were [BOLD] to turn bold on and [bold] to turn it off again. -- 11:30, 9 February 2019 (UTC)
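
The "optimise them away" idea from the earlier comment can be illustrated with a toy example. The sketch below is in Python and assumes a plain string in which the on and off codes appear literally as [BOLD] and [bold], as in the Reveal Codes display described above; this is not WordPerfect's actual file format, and `drop_empty_bold_pairs` is a hypothetical name.

```python
import re

# Assumed notation: "[BOLD]" turns bold on, "[bold]" turns it off.
EMPTY_PAIR = re.compile(r"\[BOLD\]\[bold\]")


def drop_empty_bold_pairs(stream):
    """Repeatedly remove on/off pairs that enclose no text at all."""
    previous = None
    while previous != stream:
        previous = stream
        stream = EMPTY_PAIR.sub("", stream)
    return stream
```

The loop repeats because removing an inner empty pair can expose an outer one. Pairs that enclose only a space are deliberately left alone here, since that space is exactly the hidden bold space the comic is about.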

The whole idea of invisible formatting is being used by some websites, including Facebook, to make it much harder for ad blockers to block ads. For example, https://twitter.com/themikepan/status/1093035372186034176 Of course, the same can also be used to defeat swear filters on forums, as well (which, for some words like "bastard sword," *the moderators* themselves suggest doing). Draco18s (talk) 19:43, 9 February 2019 (UTC)

We have a category for comics with colour... can we have a category for comics with lowercase letters? :) Undergroundmonorail (talk) 02:33, 10 February 2019 (UTC)

I frequently see a similar, related problem. In preparing a weekly newsletter (consisting mostly of links to articles from various news sources), people submitting articles to me usually send me Microsoft Word files into which they have used copy/paste to insert the headline, URL and a few lines of text for context. On far too many articles, I find that the resulting text has embedded UNICODE Left-to-right mark characters (U+200E) in it. These don't affect display and printing at all (since all of the text is already left-to-right), but it creates broken links if one appears in a URL and I copy/paste it into a web browser's location bar. There doesn't seem to be any way to make these characters visible in Word. If manually cursoring over the text (with left/right keys), you will see the cursor change shape without moving when stepping over the left-to-right mark, but that's the only indication. It's quite annoying to have to work around. (If anyone knows of a good workaround, please let me know.) Shamino (talk) 19:32, 10 February 2019 (UTC)
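
One possible workaround, outside of Word, is to pass the copied text through a small filter before using it. The sketch below is in Python and uses the standard `unicodedata` module; the function names are made up for this example. It treats every character in the Unicode "Cf" (format) category as removable, which covers U+200E but also characters such as the zero-width joiner, so it is better suited to URLs than to arbitrary prose.

```python
import unicodedata


def find_format_chars(text):
    """List (index, name) for every invisible format character (category Cf)."""
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


def strip_format_chars(text):
    """Return the text with all category-Cf characters removed."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# Hypothetical URL with a left-to-right mark pasted into the middle of it.
url = "https://example.com/\u200enews"
clean_url = strip_format_chars(url)
```

Calling `find_format_chars` first shows where the marks are, which can help track down which source they came from.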

I frequently cut-and-paste text into Notepad (or gedit, or some other text-only editor etc.), then cut-and-paste it back to Word or whatever other "rich text" capable destination I am using -- this removes all hidden junk, formatting, font changes, bold, etc. and the pasted text takes on the characteristics of wherever it's pasted into rather than where it came from. This is basically taking the text down to the bare minimum, and then I can reintroduce whatever formatting I want it to have. -boB (talk) 16:47, 11 February 2019 (UTC)

GIMP is really bad about this when trying to add text to an image. You either end up with the formatting not wanting to stick, or you end up with invisible formatting all over the place. Dark talk 00:15, 11 February 2019 (UTC)

It seems to me that everybody here misses the point of the comic, which is not the problems that hidden leftover formatting could cause for later text. The joke is that Randall is about to write something where he really means that NOT, but then regrets it, as he is afraid that the reader of his text or message would take offense at having this "not" shouted out in bold! So he reverts the bold, but because he misses the space, he has left proof that he actually did mean NOT. The receiver can now find this out anyway and might then take offense anyway, or take offense that Randall felt he had to delete the bold, as if the receiver could not handle it (of course, if he took offense at this, Randall would have proved his point, but nevertheless he tries to avoid it). All this is currently mentioned at the very end of a long list of unimportant problems such a bold space could create. I will move this up to the top now, as the main explanation. --Kynde (talk) 10:06, 13 February 2019 (UTC)

I found (and find) the typography in this comic troubling, because while it is clearly a proportionally spaced font ("l" is 5px wide, "w" is 23px), the boldfaced and roman "not"s are the same size (49px wide). In a normal proportionally spaced situation, the boldfaced letters would be wider. JohnHawkinson (talk) 03:23, 23 February 2019 (UTC)

In an edit last week I removed the claims that "Randall bolds text via clicking" and that it "could indicate that Randall is not familiar with using word processors." just reverted my removal, and I wanted to explain here why '.145 is wrong, in a little more space than the edit summary allows. I said originally, "An iconbutton is used for bold in comics for illustrative purposes, because you can't see the keyboard. It does not reflect the author-artist's knowlege." That is, we cannot draw conclusions about Randall's knowledge based on the fact that he didn't illustrate in this comic using a keyboard.
'.145 asks, perhaps rhetorically, "Then why not just write "Ctrl+B"? You can't see the mouse either, but you know what "click" and "select" are referring to."
First of all, it doesn't matter. The comic could also have illustrated use of a menu, but that wouldn't tell us anything about Randall's knowledge of the iconbutton or the keyboard shortcut. Without any information about this, it's not possible to make reasonable inferences about this, and so the explanation shouldn't even go there. Secondly, there are good reasons why an iconbutton makes more sense (not that I'm required to supply them); because keyboard shortcuts are not as discoverable as iconbuttons or menus (and menus take a lot of space that make them hard in a comic of small compact multiples like this one) that means more people are familiar with the menu or button than the keyboard shortcut, and indeed those who know the keyboard shortcut are generally a subset of those who know another method; and further still, "Ctrl+B" is not platform-independent (e.g. Mac users need Cmd+B) or software-independent (InDesign users need Cmd+Shift+B). Thirdly, you can indeed see the mouse pointer, so I'm not sure what '.145 is trying to suggest. And finally, it's utterly ridiculous and kind of offensive to suggest (without any real basis) that Randall doesn't know how to use a word processor. That a person chooses to use one method, even if it's not the most efficient method, doesn't mean they are "not familiar with using word processors." We don't even know what Randall's UI preferences are here, but even if we did that wouldn't be enough to suggest a lack of familiarity rather than a personal preference. The text from this edit is not encyclopedic and should stay out. JohnHawkinson (talk) 14:48, 4 March 2019 (UTC)

In LibreOffice Writer on Linux if I select a word with double-click it doesn't include the space, but if I select it with the keyboard using Ctrl+Shift+RightArrow it does include the space. In the comic it looks like the selection was made with the mouse, but it's not explicit. 00:15, 11 July 2019 (UTC)