Difference between revisions of "1676: Full-Width Justification"

Explain xkcd: It's 'cause you're dumb.
(→Explanation: Snake-justification showing up in actual e-readers)
(🐍)
Line 67:
 ::their famous paper
 ::on the relationship
−::between [a snake filling the gap]
+::between 🐍 [a snake filling the gap]
 ::deindustrialization
 ::and the growth of

Revision as of 17:55, 4 May 2016

Full-Width Justification
Title text: Gonna start bugging the Unicode consortium to add snake segment characters that can be combined into an arbitrary-length non-breaking snake.

Explanation

This explanation may be incomplete or incorrect: hasty & impatient placeholder. Still an early draft; needs citations, fact-checking, and it also needs the Wikipedia links to be fixed.
If you can address this issue, please edit the page! Thanks.

The comic refers to an irritating problem in laying out text to fit from edge to edge: justification, where multi-line text is made to line up on the left side (common), the right side (less common), or both sides, which is commonly called full justification. This strip deals with how to make text line up on both sides while still looking good. Sometimes, as before a long word like "deindustrialization," there is no universally good way to make the typography work. It is difficult to make text look good and stay easily legible, especially in a narrow column, and the biggest issue is how to handle words that are too long to fit nicely.
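For comparison with the comic's strategies, the ordinary way to fully justify a line is to widen the gaps between words. The following is a minimal sketch in Python, assuming a monospaced font and a hypothetical helper named justify_line; neither comes from the comic. It also shows why a line holding the single word "between" cannot be fixed this way: there is no gap to stretch.

```python
def justify_line(words, width):
    """Pad the gaps between words with spaces so the line is exactly `width` wide."""
    letters = sum(len(w) for w in words)
    gaps = len(words) - 1
    if gaps == 0:
        # A single word has no gaps to stretch: this is the "giving up" case.
        return words[0].ljust(width)
    if width - letters < gaps:
        raise ValueError("words do not fit on one line of this width")
    spaces, extra = divmod(width - letters, gaps)
    parts = []
    for i, word in enumerate(words[:-1]):
        # The leftmost `extra` gaps each receive one additional space.
        parts.append(word + " " * (spaces + (1 if i < extra else 0)))
    parts.append(words[-1])
    return "".join(parts)


if __name__ == "__main__":
    # 19 columns is the length of "deindustrialization".
    print(repr(justify_line(["their", "famous", "paper"], 19)))
    print(repr(justify_line(["between"], 19)))
```

Real typesetters distribute fractional amounts of space instead of whole characters, so the uneven gaps seen here are an artifact of the monospaced assumption.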

The comic shows several solutions to this problem, some realistic and others less so, but each unsatisfying. "Giving up" is ugly, leaving a short line that does not match the rest. Letter spacing is confusing, as readers may mistake the spaced-out word for an acronym. Hyphenating is confusing in English because its spelling requires full-word recognition ("deindus-" looks like an independent, unfamiliar word, pronounced "dayn-duss"). Stretching is unnatural, unfamiliar, quite ugly, and probably hard to code or render. Adding "filler" words, a radical solution, makes the writing worse (in the example, it makes the tone too informal). Finally, adding a meaningless snake image, just long enough to fill the extra space, is a novel (and quite bizarre) solution which probably wouldn't actually be used by a serious typographer.[citation needed]

Another approach is to treat justification as part of a global typesetting strategy which allows words to move between lines even where this is not locally optimal. This approach is used by TeX.
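The idea of choosing breaks globally can be illustrated with a small dynamic program. This is only a simplified stand-in, not TeX's actual algorithm, which also models stretchable spaces, penalties and hyphenation. The function name break_lines and the cost function (the squared number of unused columns on every line except the last) are assumptions made for this sketch.

```python
def break_lines(words, width):
    """Choose line breaks that minimise the total squared unused space.

    The last line is not penalised. Every word is assumed to fit within `width`.
    """
    n = len(words)
    inf = float("inf")
    # best[i] is the minimal cost of setting words[i:];
    # nxt[i] is the index of the word that starts the following line.
    best = [inf] * n + [0.0]
    nxt = [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        length = -1  # cancels the space counted before the first word
        for j in range(i, n):
            length += 1 + len(words[j])
            if length > width:
                break
            cost = 0 if j == n - 1 else (width - length) ** 2
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                nxt[i] = j + 1
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines


if __name__ == "__main__":
    for line in break_lines("aaa bb cc ddddd".split(), 6):
        print(line)
```

With a width of 6, a greedy breaker would put "aaa bb" on the first line and leave "cc" alone on the second. The global search instead moves "bb" down to sit with "cc", accepting a slightly loose first line in exchange for a much better second one.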

The title text suggests that in order to facilitate this last method of "solving" the problem, the Unicode Consortium, the organization in charge of the common text standard Unicode, should add "snake-building characters" (similar in concept to the existing Box Drawing block) to allow variable-length snake images to be used as filler. Currently, there are two Unicode snakes, U+1F40D (🐍, SNAKE) and U+1DC2 (᷂, COMBINING SNAKE BELOW). The latter is a diacritical mark used in Americanist phonetic notation to indicate lenis (weak) articulation.
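The snake emoji and the snake diacritic sit at code points U+1F40D and U+1DC2, and their official names can be looked up with Python's standard unicodedata module. The helper name describe is just an illustration.

```python
import unicodedata


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    print(describe("\U0001F40D"))  # the snake emoji
    print(describe("\u1DC2"))      # the combining snake diacritic
```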

The phrase "non-breaking" in the title text is a play on the non-breaking space. It implies that an automatic line break could not be inserted after a snake segment; the whole snake would shift down if it were too wide to fit on a given line. This suggestion would likely be rejected: the Unicode Consortium is very specific about which characters are added[citation needed], and always requires a good reason[citation needed] before adding a character or set of characters to the standard. Strange decisions by the consortium have previously been referenced in 1253: Exoplanet Names, 1513: Code Quality, and 1525: Emojic 8 Ball.

Jim Chapman, developer of Windows 10 e-reader app Freda, has announced the next version of Freda will incorporate snake-justification.

Note that in Arabic, it is common to stretch the lines connecting letters as a relatively elegant and satisfying resolution to this problem. This trick is called "kashida" (كشيدة). There does in fact exist a Unicode character, U+0640 (ـ), to help with this: using it to extend "كشيدة" would result in something like "كشـــــــــــيدة" (which, incidentally, looks a lot like a snake).
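The elongation described above amounts to inserting copies of U+0640 between two connected letters. A minimal sketch, assuming a hypothetical helper named stretch and a hand-picked insertion point (real software has to choose positions where the script allows a kashida):

```python
TATWEEL = "\u0640"  # ARABIC TATWEEL, the kashida elongation character


def stretch(word, position, count):
    """Insert `count` tatweel characters after the first `position` letters."""
    return word[:position] + TATWEEL * count + word[position:]


# The word "kashida" in Arabic script: kaf, sheen, yeh, dal, teh marbuta.
KASHIDA = "\u0643\u0634\u064A\u062F\u0629"

if __name__ == "__main__":
    print(stretch(KASHIDA, 2, 11))
```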

Transcript

[Caption above the panels:]
Strategies for full-width justification
[Below the caption is a column with six boxes, each showing a different "strategy" for justification, which is annotated beside it. Here the annotation is written at the top and the text below. The text is cut off mid-sentence at the top and bottom of each box, but since it can still be read it is transcribed anyway. Only for hyphenation does an extra word appear at the end. In the last box, with snakes, a snake is drawn to cover the entire space from the end of "between" to the right border.]
Giving up
their famous paper
on the relationship
between
deindustrialization
and the growth of
Letter spacing
their famous paper
on the relationship
b  e   t   w   e  e   n
deindustrialization
and the growth of
Hyphenation
their famous paper
on the relationship
between deindus-
trialization and the
growth of ecological
Stretching
their famous paper
on the relationship
between
deindustrialization
and the growth of
Filler
their famous paper
on the relationship
between crap like
deindustrialization
and the growth of
Snakes
their famous paper
on the relationship
between 🐍 [a snake filling the gap]
deindustrialization
and the growth of

Trivia

  • The full text (with alternate changes) reads:
...their famous paper on the relationship between [crap like]/[ 🐍 ] deindustrialization and the growth of [ecological]...



Discussion

I added the emoji snake. Is emoji snake the same as a Unicode snake would be? Azule (talk) 05:46, 4 May 2016 (UTC)

I assumed Unicode snakes would use three different characters: a head, a body segment, and a tail. Your solution is good, but objectively not perfect compared to what's shown in the comic.
So what would be the optimal snake transcription method here? A parenthetical aside saying "A drawing of a snake stretches to the right end of the line."? Or should we just blackmail the Unicode consortium again? ~AgentMuffin
The correct solution is obviously to include a 16 Mpixel image of a snake.Henke37 (talk) 07:41, 4 May 2016 (UTC)
Emoji full snake is already in Unicode as Azule knows. &#x1f40d = 🐍
Segmented snake needs at least three characters: head, e.g. °, body, e.g. ~, and tail, e.g. โ—.
Three segment snake: °~โ—
Four segment snake: °~~โ—
Demro (talk) 12:45, 4 May 2016 (UTC)

Could the title text also be a reference to the snake in umwelt? Azule (talk) 05:46, 4 May 2016 (UTC)

Amazon is notorious for being bad at this. Here's a somewhat related Computerphile video. Eno (talk) 06:32, 4 May 2016 (UTC)

Also, funnily enough, the filler text and the snakes were used in medieval (hand-written) manuscripts. Although it's not a snake but usually a nondescript wriggle that could only pass as a snake when you're squinting really hard. For filler text it's usually low-content words like "truly", "verily", "indeed", "without fail", "in truth" or stuff like that. So it's really an old problem with no satisfactory solution developed in hundreds of years... 162.158.85.93 08:19, 4 May 2016 (UTC)

This practice of filling the line with a dingbat carried on into the days of handset letterpress (i.e. up until the early 1900s), although it gradually became more whimsical and so less frequent in serious works. 108.162.241.123 12:28, 4 May 2016 (UTC)

In practice you reformulate. Not necessarily insert filler words, but just reorder the sentence enough that justification works. That is assuming the automated justification doesn't work, which will try a combination of multiple methods like word-spacing, letter-spacing and hyphenation. Imagine hyphenating at "de-" instead, but adding a little bit extra letter space in "between", and almost double normal word space between "between" and "de-".162.158.114.222 08:20, 4 May 2016 (UTC)

Reformulating can only be done with the (tacit or explicit) permission of the author. There are situations where rewording would not be allowed.108.162.241.123 12:28, 4 May 2016 (UTC)

While the Arabic part is interesting, I don't feel it to be very relevant here. 108.162.249.156 09:11, 4 May 2016 (UTC)

It is relevant because it is yet another solution (useful only in Arabic). Demro (talk) 12:47, 4 May 2016 (UTC)

Sorry, how do I add a [citation needed] in superscript? Transuranium (talk)Transuranium


The "snake" option is actually less out there than the current explanation indicates. Snakes proper were not necessarily the go-to, but the same general strategy (decorative filling) was used heavily in illuminated manuscripts in the medieval period. 162.158.214.217 14:36, 4 May 2016 (UTC)

Came here just to say that. The current explanation needs reworking because that's actually one of the oldest ways of dealing with text justification. Check for example the Book of Kells 162.158.203.141 20:15, 4 May 2016 (UTC)
Modified the explanation accordingly.162.158.214.217 21:44, 4 May 2016 (UTC)

"the Unicode consortium is very specific about which characters are added[citation needed], and always require a good reason[citation needed] before adding a character or set of characters to the standard." Seriously? Then what are all the emoji pages added for? U+1F459 (Bikini) 👙, for example... 108.162.221.98 04:05, 5 May 2016 (UTC)

Emoji were added because Japanese cellphones had introduced them with wild success. A stable standard was badly needed, and the Unicode Consortium, whose job it is to make such standards, complied, after some hesitation.108.162.219.10 17:55, 9 May 2016 (UTC)
In case of bikini, I would suspect the gender of Unicode consortium members is the reason ... -- Hkmaly (talk) 17:52, 5 May 2016 (UTC)

I suspect that U+13192 (EGYPTIAN HIEROGLYPH I009A) is actually a "snake building" character in the sense that it is a horned viper coming out of a building. I do not however have easy access to a copy of the original source reference (Gardiner's "Supplement to the Catalogue of the Egyptian Hieroglyphic Printing Type Showing Acquisitions to December 1953") that was the basis for adding this character in Unicode 5.2. Poslfit (talk) 20:19, 10 May 2016 (UTC)

Found a list online and have updated the main text accordingly. Poslfit (talk) 20:53, 10 May 2016 (UTC)

I changed "Hyphenation is also confusing as it often leaves two partial non-words" with "Hyphenation is confusing in English because its spelling requires full-word recognition". In many (if not most) languages two partial non-words can be easily read. The hyphenation problem is probably unique to English. 108.162.221.13 13:06, 5 May 2016 (UTC)

In most languages, the cases where the hyphenation will be confusing will be rare. In English, the cases where the hyphenation will NOT be confusing will be rare. -- Hkmaly (talk) 17:52, 5 May 2016 (UTC)
On the contrary, it will generally result in non-words (and hence difficulty reading) regardless of which language you're writing in. Unless maybe you're dealing with logographs, e.g. in written Chinese languages. Flipping Mackerel (talk) 03:32, 6 May 2016 (UTC)

For hyphenation, would it make sense to also talk about the case where it creates new words which can be offensive? E.g. therapist -> the-rapist 108.162.228.137 22:37, 9 May 2016 (UTC)

Letter Spacing in German

Hi there...

I guess the statement concerning letter spacing being not available in German isn't (wasn't ever) entirely accurate.

Letter spacing has, since the demise of black letter typing, become obsolete and is nowadays merely used to emphasise surnames or city names in administrative paperwork. But even in ancient times of German black letter usage, letter spacing was also used to achieve justification. If something was to be emphasised in such a line, the spaces would've been even larger, maintaining a certain ratio between regular letter spaces and emphasised letter spaces.

However, since letter spacing is as uncommon in German typing as black letters are, it may be used for justification without any concern. In order to emphasise certain words, italic, bold or underlined text is the means of choice.

Personally, I prefer letter spacing and hyphenation combined, although snakes seem to be the real deal!162.158.85.141 14:29, 29 July 2016 (UTC)