2700: Account Problems

Title text: My password is just every Unicode codepoint concatenated into a single UTF-8 string.

Explanation

Cueball asks Ponytail to help him because he can't log in to his account. Having attempted to fix Cueball's tech issues in the past, Ponytail replies with dread. Cueball promises that "It's a normal problem this time", and Ponytail agrees to look at it. But then Cueball reveals that he included a null string terminator character in his password when creating the account, and now he can't log in.

In computer systems, every "character" (letter, digit, punctuation, etc.) is represented as an integer. For example, the lowercase letter 'a' is represented as the number 97, and the digit '1' is represented as the number 49 (when using ASCII or Unicode). A "string" is a sequence of characters, and can be used to store arbitrary text (for example names, messages, passwords). Strings can be arbitrarily long, so some mechanism must be used to record their length. One approach is to store the length explicitly (Pascal string). Another approach is to mark the end of the string using a specific character, usually the null character (which is represented as the number 0); such strings are called null-terminated strings, and are used by the C programming language. Both approaches have advantages and disadvantages. A limitation of null-terminated strings is that they cannot represent text containing embedded null characters. This is usually not a problem, because normal text never contains null characters. However, if a null character does end up in the string, any code that uses that string will assume the null character marks the end of the string, so the string is effectively cut off at that point.
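
The difference between the two string representations can be sketched in Python, whose bytes objects store their length explicitly, while the standard ctypes module can hand the same bytes to C-style code that stops at the first null. The sample password is made up for illustration; this is only a sketch of the truncation effect, not of how any particular login system works.

```python
import ctypes

# Hypothetical password with an embedded null character (code point 0).
password = b"hunter" + b"\x00" + b"2022"

# Length-stored view: the bytes object records its own length,
# so the embedded null is just another byte.
explicit_length = len(password)

# Null-terminated view: c_char_p exposes the same bytes as a C string,
# and reading its value stops at the first null byte.
c_view = ctypes.c_char_p(password).value

print(explicit_length)  # 11
print(c_view)           # b'hunter'
```

Everything after the null character is invisible to the C-style view, which is the kind of mismatch that could leave a stored password and a typed password disagreeing.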

Account registration systems often place requirements on passwords in an attempt to encourage users to pick stronger passwords. For example, they might ask that the password include at least one "special character" (such as !@#$%^&*). Cueball misunderstood this requirement as referring to characters such as the null character (which is more accurately referred to as a control character). Cueball managed to type the null character as part of his password somehow (on some systems it is possible to type the null character using certain keyboard shortcuts such as Ctrl+Space, Ctrl+@, Ctrl+2, or Alt+0 using the number pad), but the software running the registration system was poorly written and could not cope with this – it allowed him to create an account with that password, but then when he tried to log in with the same password the system didn't accept it.

It's unclear how that particular situation might arise in real software, but here is a similar situation that can easily happen in practice: Suppose a website's registration form allows the user's new password to have up to 20 characters, but due to a programmer error the login page only accepts passwords with up to 18 characters. If the user picks a medium-length password (say with 12 characters), all is well. But if the user picks a password with 20 characters, they will find themselves in the same position as Cueball, being able to register but not able to log in. Some additional situations are described below.
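
The length-limit mismatch described above can be sketched as follows. All names here (register, login, the two limits, the in-memory accounts dictionary) are hypothetical, and the unsalted SHA-256 hash only keeps the sketch short; real systems should use a salted, deliberately slow password hash.

```python
import hashlib

REGISTER_MAX_LEN = 20  # limit enforced by the hypothetical registration form
LOGIN_MAX_LEN = 18     # stricter limit mistakenly enforced by the login form

accounts = {}  # username -> password digest (toy in-memory store)


def _digest(password: str) -> str:
    # Toy hash for illustration only.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register(username: str, password: str) -> bool:
    if len(password) > REGISTER_MAX_LEN:
        return False
    accounts[username] = _digest(password)
    return True


def login(username: str, password: str) -> bool:
    if len(password) > LOGIN_MAX_LEN:
        return False  # rejected before the stored digest is even consulted
    return accounts.get(username) == _digest(password)


register("short_user", "a" * 12)
register("long_user", "b" * 20)
print(login("short_user", "a" * 12))  # True
print(login("long_user", "b" * 20))   # False: registered fine, can never log in
```

The 20-character password is stored correctly, but the login check refuses it on length alone, so the account is unreachable even though the user types exactly what they registered.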

The title text describes a password which is "just" every Unicode character concatenated into a single string. Unicode is a standard for representing characters from many writing systems, and it has 149,186 characters as of the time of this comic (with new characters being added over time). A password consisting of all of those characters would be extremely long; it would be impractical to type by hand, and would be too long for virtually all account registration systems. (A "codepoint" is the number assigned to a character, and UTF-8 is a common encoding system for representing each Unicode codepoint as a sequence of bytes.) Also, since Unicode includes the null character, the password would have the same issue as Cueball's password. Further, if the account registration system treats the null character as a string terminator (as in C), then the password would be equivalent to an empty password (assuming it contains the Unicode codepoints in order, starting with the null character).
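
As a rough sketch of the title-text password, the snippet below concatenates every encodable code point in order and encodes the result as UTF-8. It is an assumption of this sketch that "every codepoint" means the whole code space, including unassigned code points, rather than only the 149,186 assigned characters; surrogates are skipped because UTF-8 cannot encode them.

```python
import ctypes

# Every Unicode scalar value in code point order. Surrogates
# (U+D800..U+DFFF) are skipped because they cannot be encoded in UTF-8.
password = "".join(
    chr(cp) for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF
)
encoded = password.encode("utf-8")

print(len(password))  # 1112064 code points (1114112 minus 2048 surrogates)
print(len(encoded))   # 4382592 bytes, a little over 4 MB

# Seen as a null-terminated C string, the password ends at its first byte,
# because code point 0 comes first.
as_c_string = ctypes.c_char_p(encoded).value
print(as_c_string)    # b''
```

The result is several megabytes long when its length is stored explicitly, yet empty to any code that treats it as a null-terminated string.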

Transcript

[Cueball carries an open laptop over to Ponytail, holding it in both hands. The screen shows a box filling the screen with some text on lines. Ponytail is sitting in an office chair with her laptop at her desk. She has turned her head away from the computer looking at Cueball's screen.]
Cueball: Can you help me with my account?
Ponytail: Oh no.
[Cueball holds his laptop up in front of Ponytail who has turned the chair so she faces him, with her hands in her lap. Her table is not drawn.]
Cueball: No no, I promise it's a normal problem this time.
Ponytail: Okay. Fine. What is it?
[Cueball holds both hands out palm up towards Ponytail who is sitting with his laptop in her lap typing on it.]
Cueball: I included a null string terminator as part of my password, and now I can't-
Ponytail: How?!
Cueball: They said to use special characters!

Trivia

  • Here are some additional situations where passwords with special characters might stop working:
    • The registration form allows passwords to contain null characters, but the login form strips null characters (for example because it was written by a different developer/team, or because it has been updated over time). When Cueball tries to log in, the login form strips the null characters, so the resulting password can never match such a stored password (which contains a null character).
    • The password system accepts Unicode characters at first, but is later changed to only accept ASCII passwords. Users who included non-ASCII characters like é or ö in their password become locked out of their account because they are no longer allowed to submit those characters.
    • Passwords containing non-ASCII characters are in general problematic, because it might not be possible to type them on the keyboard used for logging in. For example, on Mac OS a logged-in user can change their password to one that contains emojis, but the keyboard on the login screen does not have good support for typing emojis.[1][2]
    • A business network may have multiple systems that connect to a central database of usernames and passwords. If the systems have different password handling rules, a user might find that some of the systems don't support their password (for example because the password contains a character which is forbidden on a particular system).
  • There are several techniques that can be used to safely handle passwords and other user inputs that might contain unsafe characters such as the null character:
    • Validate: Check whether the user input contains unsafe characters, and if it does display an error message to the user.
    • Sanitize: Remove unsafe characters from the user input to prevent them from causing problems.
    • Encode/quote/escape: Replace each unsafe character with an appropriate sequence of characters (depending on the context). For example, a null character can be included in a URL by encoding it as %00. This technique is not very relevant to password handling, but is relevant for example when including user input in generated web pages or passing user inputs to database queries.
    • For the specific case of null characters: Use a string representation that supports null characters (e.g. Pascal strings), and be very careful not to pass such strings to functions that can't handle embedded null characters.
  • Failure to handle strings containing null characters correctly can result in security vulnerabilities. For example, including a null character in crafted input may allow a user to read or write files that they are not supposed to be able to access.[3][4]
  • In C, a string is usually stored in a block of memory that is allocated to have a known size. The maximum size of string that can be stored in such a buffer is one character less than the buffer's size, since the last character is used for the null terminator. Language functions that operate on strings, such as those that return the length of a specified string or which compare two strings, look for the terminator as a marker. However, there is a risk in using this feature: if that terminator is somehow overwritten by some other value, a function which assumes that there is still a stopping point may go far beyond the intended region of memory before it happens to find an unrelated terminator or otherwise is forced to stop looking. This can have serious security implications, as well as the potential for bugs and crashes. Instead, safe programming uses versions of the string functions that include a specification of the maximum allowed length. For example, the strlen() function takes a pointer to a string, counts the number of characters until it encounters a null terminator, and returns that number: the length of the string not including the terminator. The strnlen() function takes a pointer to a string and a maximum length, and counts characters until it either finds a terminator or reaches the maximum.
  • The number of this xkcd comic is 2700. Interpreted as two concatenated octal numbers, \27 and \00, it represents the ETB (End of Transmission Block) control character followed by the null character, both of which can cause problems when processed by legacy systems (e.g. mainframe computers). Interpreted as two hexadecimal numbers, 0x27 and 0x00, it represents the ' character followed by the null character, a sequence that could lead to SQL injection when placed unescaped inside a SQL command.
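
The validate, sanitize, and encode techniques listed earlier in this section can be sketched in a few lines. The function names are hypothetical, and treating every character in Unicode category "Cc" (control characters) as unsafe is an assumption of this sketch; a real system would choose its own definition of unsafe input.

```python
import unicodedata
from urllib.parse import quote


def has_control_chars(text: str) -> bool:
    """Validate: report whether the input contains any control character."""
    return any(unicodedata.category(ch) == "Cc" for ch in text)


def strip_control_chars(text: str) -> str:
    """Sanitize: drop control characters and keep everything else."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")


def url_escape(text: str) -> str:
    """Encode: percent-encode the input so it can be placed in a URL."""
    return quote(text, safe="")


raw = "pass\x00word"  # made-up input with an embedded null character
print(has_control_chars(raw))    # True
print(strip_control_chars(raw))  # password
print(url_escape(raw))           # pass%00word
```

Whichever technique is chosen, it has to be applied identically at registration and at login; sanitizing in only one of the two places recreates the lockout described in the first trivia item.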



Discussion

What was going on with this page? Sarah the Pie(yes, the food) (talk) 00:58, 19 November 2022 (UTC)

Vandalism. I mentioned it on the Admin requests page. It's getting reverted back to normal pretty quickly when it happens, but it will probably keep happening until an admin bans the person doing it, or the person doing it gets bored and stops on their own. Equites (talk) 01:05, 19 November 2022 (UTC)

are two nazis actually in an edit war or is it just one person astroturfing --162.158.63.100 01:18, 19 November 2022 (UTC)

I'm trying to combat it, but I'll only be able to keep this up for around another 20 minutes or so. InfoManiac (talk) 01:21, 19 November 2022 (UTC)

Is TheusafBot offline or something? Generally it handles this sort of stuff pretty well--Mapron01 (talk) 01:44, 19 November 2022 (UTC)
I'm pretty sure he is. Starstar (talk) 02:23, 19 November 2022 (UTC)

This reminds me of the time I used a character in my password that was the "stty kill" character for one workstation's default console terminal settings. I normally logged in via ssh, and occasionally logged in via xdm, but the time I tried logging in via the console, it really didn't like what was left of my password. 162.158.62.180 01:25, 19 November 2022 (UTC)

Ah, the good old days when ordinary printing characters were used for erase and kill. Barmar (talk) 01:43, 19 November 2022 (UTC)

Vandals are just looking for a fun time, generally. Solution: make it not a fun time for them. Revert their edits dryly, patiently, with no particular comment or anything. Eventually they will get bored and find something else to do. Or, perhaps they'll sit there vandalizing while we revert them, we dozens against probably just one vandal. But if you make your irritation clear, that's "fun" to them, and they'll keep at it with renewed vigour. 108.162.216.239 01:37, 19 November 2022 (UTC)

I accidentally used a backspace character in a username one time. It caused all sorts of problems with my account.

Also, I've never found the whole "The trolls will leave you alone if you don't move." thing to be effective. But I've never found anything else to be effective at universally adjusting behavior either. -Master Areth

I wrote most of the current page after the first paragraph. It's a fairly sloppy first draft that could probably use some editing. Anyone who can should feel free to clean it up. Especially since the page is now protected (I'm not complaining; it was necessary) and so I can't edit it any more. Equites (talk) 05:57, 19 November 2022 (UTC)

Hi Equites, I rewrote the explanation, hope that's okay. I removed the references to the security aspect because I didn't think it was relevant. (Also pinging FrankHightower.) --Hddqsb (talk) 07:59, 20 November 2022 (UTC)
The first paragraph seems a bit superfluous - it's basically just a description of the comic, so isn't really adding anything to the explanation. Also, I think the bit about Pascal could come out of the second para - it doesn't appear to be relevant to what's going on in the comic, so it could just skip to the bit about null terminators.172.70.91.54 16:46, 21 November 2022 (UTC)
I removed the most superfluous part from the first paragraph, and pared down the explanation of Pascal strings (diff). I didn't remove the first paragraph entirely because I think it provides important context and details which are implicit in the comic. And I think it's important to at least mention Pascal strings because that sets the scene for the explanation of C strings (which don't explicitly store the length). --Hddqsb (talk) 10:08, 22 November 2022 (UTC)

Seems to be another Tech issue comic, it's a tech issue with Cueball talking to Megan and the tech issue is extremely cursed. Should we add this one? 162.158.22.98 06:00, 19 November 2022 (UTC)

"since there is no sequence of keys he could type that would result in a null terminator" ... I can type a NULL (ASCII 00) just fine in my editor on Linux (ctrl-v ctrl-@, the latter I type as ctrl-shift-2). However, I am not quite sure how to phrase this in the explanation without sounding like "Áctually! ...." Henri

I am amused that both in the main text and in this comment something has converted the "at sign" into [email protected].

The title text is likely a reference to this reddit post. Pb (talk) 07:06, 19 November 2022 (UTC)

I don't think that's likely... --Hddqsb (talk) 08:50, 20 November 2022 (UTC)

The only thing is I'm pretty sure it's not terribly difficult to enter a null string character, you just have to know what it is. On a PC with a keyboard that has a number pad, you can press Alt-[Number] to enter special characters using their ASCII code (Alt-65 will get "A", Alt-8 is backspace or delete, I forget which but I think BS, etc. MIGHT need leading zeroes to be 3 digits). The 0 to 31 codes - 32 is space, starting the normal characters - tend to have all the special characters, I think null string is 0? NiceGuy1 (talk) 04:14, 20 November 2022 (UTC)

It is. And (with caveats, depending upon other issues and circumstances) Alt-numpad0 would give me the null-char wherever it's practical and not blocked (intentionally or just because it isn't specifically catered for).172.71.178.206 15:25, 20 November 2022 (UTC)
I know a sysadmin friend of mine had to help a user whose account name was "🦙" (The Llama unicode symbol) and he was on a computer where not all layers between the username field and the password authentication understood unicode. Examples like this will happen in real life. IIVQ (talk) 11:16, 21 November 2022 (UTC)
Were they Spanish, by any chance?172.70.90.173 16:49, 21 November 2022 (UTC)

As Cueball is showing and handing over his laptop, I don't think the issue is about a website account (where he could probably do a password reset), but his local account on the laptop, of which he is now locked out, and hopes Ponytail can break into it? ghen (talk) 18:28, 19 November 2022 (UTC)

Good point, updated to avoid referring to "website" specifically. (Another possibility is that it is the password for some installed application.) --Hddqsb (talk) 07:17, 20 November 2022 (UTC)

"Suppose a website's registration form allows the user's new password to have up to 20 characters, but due to a programmer error the login page only accepts passwords with up to 18 characters."
There are also cases where a page or application is updated with the expectation that old user accounts will still work, but the updated version no longer accepts the same characters (or number of characters) as the old one, locking some people out. -- Hkmaly (talk) 01:35, 20 November 2022 (UTC)

I know from experience that (at least one version of) Windows Server allows very long passwords and that the Windows Server installer will accept very long passwords when setting up the initial admin account, but that the installer silently truncates the password to a "normal" length when actually setting up said account. If you aren't aware of this (and you have a client that uses ridiculously long passwords), you can easily trick yourself into thinking you mistyped and locked yourself out, and have to reinstall. Once installed with a shorter password, it can be changed to whatever length you want.172.70.134.122 16:16, 21 November 2022 (UTC)

Concerning the password described in the title text: if the characters are used in the order they appear in the Unicode table, the password starts with the null string terminator, and therefore you will essentially end up with an empty password if C, or another programming language that handles strings the same way, is used. Kimmerin (talk) 12:51, 21 November 2022 (UTC)

Good point, added (snapshot). --Hddqsb (talk) 15:38, 21 November 2022 (UTC)

I've actually had this problem long ago; I used the @ sign as part of my password, and it didn't let me log in anymore. Some systems in the good old days (I think it was an FTP server) used the @ character to separate username and password when authenticating. Also, I am still running into this problem sometimes with usernames (emails) allowing "+" in the address on registration, but not when logging in. Pbb (talk)

The @-sign is used to separate authentication and hostname information in a URL, e.g. http://user:password@host:port/... Within an FTP-session it was commonly used in FTP-proxy scenarios, i.e. you've connected to an internal FTP-proxy-server providing username and hostname as username in the form user@host (similar to the syntax used for scp/sftp) and the password as is. An @-sign in the password in the latter shouldn't have any effect, and within the URL an @-character would get URL-encoded, not having an effect either. URL-encoding might be the reason for the last problem you've described, leading to a space in the stored value on the server side. Kimmerin (talk) 15:50, 21 November 2022 (UTC)

A very similar situation happened when I was network manager at Moravian College back in the mid-‘90s. A user was unknowingly typing an ASCII 0 character as a “special” character for their password, and doing it as like the 4th character typed, so the rest of what they typed (which was about 8 more characters) was simply ignored, the system thought their password was just the first 3 characters, the user was none the wiser, until the day I implemented checks to require “strong” passwords that included a minimum length. The user came to me all huffy that their password *was* long enough, but the system was making them change it, but not accepting the change. I never ask users for their password, so diagnosing the problem took a few tries, I had to think to ask them to prepend 8 x’s to the front of their password, and when that worked then I understood the problem.

NULL was also a headache for me in the early 2000’s, when I was working with Oracle web forms. Some weird interaction of software bugs between a particular version of the Safari web browser, the Apache web server, and Oracle somehow allowed the string “NULL” to get into the Oracle database, breaking the SQL Boolean function IS NULL. The kludge was to change the “IF [string] IS NULL” test to be “IF [string] IS NULL OR [string] = 'NULL'” (Unfortunately not the ugliest code I have ever written) John (talk) 12:40, 25 November 2022 (UTC)

Not with null-character, that I'm aware, but when our small company (with Novell-based networking, for file-servers, printers and most asynchronous communications to the outside world via a somewhat proprietary email gateway over a dial-up) merged into a larger company (with NT servers, and the rest, and now tied directly into their worldwide-WAN by ISDN) there were various hiccoughs in making sure existing and extended infrastructure didn't have conflicting ideas of what was acceptable in the now unified logins. (Not to mention that our username system had been initial-based, but we were now needing formats based upon full names.) We had to keep both continuity (for our own long term usage validation) and a migration (to integrate into theirs), and otherwise competent users who were big experts in their own field of data analysis often could not handle the technicalities of multiple/nested logins or the logistical fallout from having their initial login profiles 'remembering credentials'. The fuss it took, until we phased through a full migration (helped by some staff turnover) and relegated the much more competent Novell system to backup/archive servers only.
And then there was the printer that aperiodically 'broke' because the replacement Windows printserver was somehow unable to pass some particular control characters (not sure if null was ever amongst them) that were occasionally used as the daily-changing hashed output to 'sign' the printouts and thus prove their legacy/provenance.
I got a great deal of experience with system migrations, from all that, but also a strong dislike of being pushed into them or things that aren't themselves 'broke' being 'fixed' by mandatory upgrades. 172.70.91.58 14:53, 25 November 2022 (UTC)