Editing 2700: Account Problems

Latest revision Your text
Line 8: Line 8:
 
| titletext = My password is just every Unicode codepoint concatenated into a single UTF-8 string.
 
 
}}
 
+
 
 
==Explanation==
 
 +
{{incomplete|Created by a VISIBLE ZERO WIDTH SPACE. Please change this comment when editing this page. Do NOT delete this tag too soon.}}
 +
[[Cueball]] walks toward [[Ponytail]] carrying his laptop. Ponytail is sitting at her desk, and turns to face him. Having attempted to fix Cueball's computer problems before (e.g., [[2083: Laptop Issues]]), she replies with dread. Cueball promises that "It's a normal problem this time", and Ponytail reluctantly agrees to look at it. Cueball then reveals that he has included a {{w|Null-terminated_string|null string terminator character}} in one of his passwords, probably for a website. Ponytail responds in disbelief, and Cueball defends his actions by saying that the website told him to use special characters.
 +
 +
In computers, every "character" is a sequence of bytes. Every byte is a sequence of eight bits. A bit is always either a zero (0) or a one (1).
 +
 +
Every character is a sequence of bytes, but not every sequence of bytes is a valid character. For example, a JPEG image is also a sequence of bytes (much longer than a character). An MP3 audio file is also a sequence of bytes.
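As a rough illustration of the two paragraphs above, the following Python sketch turns a character into bytes and bits, and then shows a byte sequence that is not valid text. It assumes the UTF-8 encoding (the one mentioned in the title text); the helper name <code>to_bits</code> is made up for this example.

```python
def to_bits(text: str) -> list:
    """Return the UTF-8 bytes of text, each written out as eight bits."""
    return [format(byte, "08b") for byte in text.encode("utf-8")]


print(to_bits("a"))  # a single byte
print(to_bits("é"))  # two bytes in UTF-8

# Not every sequence of bytes is valid text: these two bytes are not UTF-8.
try:
    b"\xff\xfe".decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8 text:", err.reason)
```

The null character is simply the byte whose eight bits are all zero.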
  
[[Cueball]] asks [[Ponytail]] to help him because he can't log in to his account. Having attempted to fix [[:Category:Cueball Computer Problems|Cueball's tech issues]] in the past, Ponytail replies with dread. Cueball promises that "It's a normal problem this time", and Ponytail agrees to look at it. But then Cueball reveals that he has included a {{w|Null character|null string terminator character}} in his password when creating an account and now he can't log in.
+
A null string terminator is a kind of control character. Unlike letters, digits, and punctuation, control characters are not meant to be displayed on screen or typed as part of ordinary text; they are used internally by software. It is therefore hard to see how Cueball managed to insert one into his password, since most keyboards and input fields offer no obvious way to type a null character.
  
In computer systems, every {{w|Character (computing)|"character"}} (letter, digit, punctuation, etc.) is represented as an integer. For example the lowercase letter 'a' is represented as the number 97, and the digit '1' is represented as the number 49 (when using the {{w|ASCII}} character encoding or {{w|Unicode}} character encoding). A {{w|String (computer science)|"string"}} refers to a sequence of characters, and can be used to store arbitrary text (for example names, messages, passwords). Strings can be arbitrarily long, so some mechanism must be used to record their length. One approach is to store the length explicitly ({{w|String_(computer_science)#Length-prefixed|Pascal string}}). Another approach is to mark the end of the string using a specific character, usually the {{w|null character}} (which is represented as the number 0); such strings are called {{w|null-terminated string}}s, and are used by the {{w|C (programming language)|C programming language}}. Both approaches have advantages and disadvantages. A limitation of null-terminated strings is that they cannot be used to represent text containing embedded null characters. This is usually not a problem, because normal text never contains null characters. However, if somehow a null character were to end up in the string, it would cause problems: any code that uses that string would assume this null character marks the end of the string, so the string would effectively be cut off.
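The difference between the two approaches can be sketched in Python. This is only an illustration: the function names and the sample password are invented here, the length-prefixed layout assumes a single length byte, and <code>ctypes.create_string_buffer(...).value</code> is used because it reads a buffer the way C string functions do, stopping at the first null byte.

```python
import ctypes


def length_prefixed(data: bytes) -> bytes:
    """Pascal-style layout: one length byte, followed by the characters."""
    if len(data) > 255:
        raise ValueError("a one-byte length prefix allows at most 255 bytes")
    return bytes([len(data)]) + data


def read_length_prefixed(stored: bytes) -> bytes:
    """Read back exactly as many bytes as the length prefix says."""
    length = stored[0]
    return stored[1:1 + length]


def read_null_terminated(data: bytes) -> bytes:
    """What C-style code sees: everything before the first null byte."""
    return ctypes.create_string_buffer(data).value


password = b"hunter2\x00!extra"  # hypothetical password with an embedded null

print(read_length_prefixed(length_prefixed(password)))  # all 14 bytes survive
print(read_null_terminated(password))                   # cut off after b'hunter2'
```

With the length stored explicitly, the embedded null is just another byte. With null termination, everything after it is silently lost.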
+
Null terminators are used in C and in languages derived from it to mark where a string ends. Every programming language has variables{{citation needed}}, which are used to store data. In C, a primitive variable can store a small amount of data, such as an integer or boolean (true or false) value. Strings (sequences of characters) often need much more space than a primitive variable provides. To solve this, C uses "pointers": the variable holds the address of a memory location rather than the data itself. When the string is read or written, the program goes to that location and interprets the bytes there as a series of characters. Because a string can be any length, the program needs to know where to stop reading. The null terminator is C's solution: when code reading the string encounters it, the code knows it has reached the end and stops. It is therefore important that the null terminator is a character that never appears in ordinary text.
  
Account registration systems often place requirements on passwords in an attempt to encourage users to pick stronger passwords. For example, they might ask that the password include at least one "special character" (such as <code>!@#$%^&*</code>). Cueball misunderstood this requirement as referring to characters such as the null character (which is more accurately referred to as a {{w|ASCII#Control_characters|control character}}). Cueball managed to type the null character as part of his password somehow (on some systems it is possible to type the null character using {{w|Null_character#Representation|certain keyboard shortcuts}} such as <code>Ctrl</code>+<code>Space</code>, <code>Ctrl</code>+<code>@</code>, <code>Ctrl</code>+<code>2</code>, or <code>Alt+0</code> {{w|Alt_code|using the number pad}}), but the software running the registration system was poorly written and could not cope with this &ndash; it allowed him to create an account with that password, but then when he tried to log in with the same password the system didn't accept it.  
+
This has implications for security. If users are able to add or remove null terminators at will, then they can exploit C's string reading mechanisms in order to read data in a way not intended by the software programmers. If a malicious user is successful in doing this, they may be able to intentionally cause security problems on the computer, such as infecting it with malware.
  
It's unclear how that particular situation might arise in real software, but here is a similar situation that can easily happen in practice: Suppose a website's registration form allows the user's new password to have up to 20 characters, but due to a programmer error the login page only accepts passwords with up to 18 characters. If the user picks a medium-length password (say with 12 characters), all is well. But if the user picks a password with 20 characters, they will find themselves in the same position as Cueball, being able to register but not able to log in. Some additional situations are described [[#Trivia|below]].
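The length-mismatch scenario above can be sketched as follows. Everything here is hypothetical: the two limits, the function names, and the use of PBKDF2 for hashing are assumptions made for the example, not a description of any real website.

```python
import hashlib
import hmac
import os

REGISTER_MAX = 20  # limit enforced by the (hypothetical) registration form
LOGIN_MAX = 18     # stricter limit mistakenly enforced by the login page


def _hash(password: str, salt: bytes) -> bytes:
    """Derive a salted hash so the password itself is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register(password: str) -> tuple:
    """Accept the password if it fits the registration limit."""
    if len(password) > REGISTER_MAX:
        raise ValueError("password too long")
    salt = os.urandom(16)
    return salt, _hash(password, salt)


def login(password: str, record: tuple) -> bool:
    """Check a login attempt against the stored (salt, hash) record."""
    if len(password) > LOGIN_MAX:  # the bug: this limit differs from registration
        return False
    salt, stored = record
    return hmac.compare_digest(stored, _hash(password, salt))


print(login("a" * 12, register("a" * 12)))  # True: a 12-character password works
print(login("a" * 20, register("a" * 20)))  # False: registered fine, can never log in
```

The user with the 20-character password did nothing wrong; the two code paths simply disagree about which passwords are acceptable.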
+
Based on Ponytail's reaction, this is not the first time Cueball has come to her with strange problems. Based on Cueball's reaction, it does not look like he was purposely trying to exploit a security vulnerability, but instead ended up in this situation through some mysterious, unexplained happenstance.
  
The title text describes a password which is "just" every Unicode character concatenated into a single string. {{w|Unicode}} is a standard for representing characters from many writing systems, and it had {{w|Unicode#Versions|149,186 characters}} as of the time of this comic (with new characters being added over time). A password consisting of all of those characters would be extremely long; it would be impractical to type by hand, and would be too long for virtually all account registration systems. (A "codepoint" is the number assigned to a character, and {{w|UTF-8}} is a common encoding system for representing each Unicode codepoint as a sequence of {{w|byte}}s.) Also, since Unicode includes the null character, the password would have the same issue as Cueball's password. Further, if the account registration system treats the null character as a string terminator (as in C), then the password would be equivalent to an empty password (assuming it contains the Unicode codepoints in order, starting with the null character).
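A Python sketch can build one possible reading of the title-text password. It takes "every Unicode codepoint" literally, meaning every value from U+0000 to U+10FFFF whether or not a character has been assigned to it, so it covers far more than the assigned characters counted above. Surrogate code points are skipped because UTF-8 cannot encode them. The function name is invented for this example.

```python
def every_codepoint_password() -> bytes:
    """UTF-8 encoding of every code point from U+0000 to U+10FFFF, in order.

    Surrogates (U+D800 to U+DFFF) are skipped: UTF-8 cannot encode them.
    """
    chars = (chr(cp) for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF)
    return "".join(chars).encode("utf-8")


password = every_codepoint_password()
print(len(password))                  # size of the password in bytes
print(password.split(b"\x00", 1)[0])  # the part before the first null byte
```

Under these assumptions the password is several million bytes long, yet the part before the first null byte is empty, which is why C-style code would treat it as an empty password.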
+
Cueball notes that his password contains a "special character", which is a typical requirement imposed on users. However, in most contexts, "special character" means an ordinary printable character, other than letters or numbers, that can be typed on a normal keyboard and seen on the screen. Cueball's use of "special" is technically true, as null terminators do have a specialized purpose; but his word usage is not in keeping with the way that phrase is normally understood.
  
 
==Transcript==
 
:[Cueball carries an open laptop over to Ponytail, holding it in both hands. The screen shows a box filling the screen with some text on lines. Ponytail is sitting in an office chair with her laptop at her desk. She has turned her head away from the computer looking at Cueball's screen.]
+
{{incomplete transcript|Do NOT delete this tag too soon.}}
 +
 
 +
:[Cueball walks up to Ponytail.]
 
:Cueball: Can you help me with my account?
 
 
:Ponytail: Oh no.
 
 
:[Cueball holds his laptop up in front of Ponytail who has turned the chair so she faces him, with her hands in her lap. Her table is not drawn.]
 
 
:Cueball: No no, I promise it's a normal problem this time.
 
 
:Ponytail: Okay. Fine. What is it?
 
 
:[Cueball holds both hands out palm up towards Ponytail who is sitting with his laptop in her lap typing on it.]
 
 
:Cueball: I included a null string terminator as part of my password, and now I can't-
 
:Ponytail: '''''How?!'''''
+
:Ponytail: How?!
 
:Cueball: They said to use special characters!
 
 
==Trivia==
 
 
* User input containing unsafe characters has previously appeared in the famous comic [[327: Exploits of a Mom]].
 
 
* Here are some additional situations where passwords with special characters might stop working:
 
** The registration form allows passwords to contain null characters, but the login form strips null characters (for example because it was written by a different developer/team, or because it has been updated over time). When Cueball tries to log in, the login form strips the null characters, so the resulting password can ''never'' match such a stored password (which contains a null character).
 
** The password system accepts Unicode characters at first, but is later changed to only accept ASCII passwords. Users who included non-ASCII characters like é or ö in their password become locked out of their account because they are no longer allowed to submit those characters.
 
** Passwords containing non-ASCII characters are in general problematic, because it might not be possible to type them on the keyboard used for logging in. For example, on macOS a logged-in user can change their password to one that contains emojis, but the keyboard on the login screen does not have good support for typing emojis.[https://medium.com/@hvost/why-you-should-not-use-emojis-in-your-passwords-b8db0607e169][https://apple.stackexchange.com/questions/202143/i-included-emoji-in-my-password-and-now-i-cant-log-in-to-my-account-on-yosemite]
 
** A business network may have multiple systems that connect to a central database of usernames and passwords. If the systems have different password handling rules, a user might find that some of the systems don't support their password (for example because the password contains a character which is forbidden on a particular system).
 
 
* There are several techniques that can be used to safely handle passwords and other user inputs that might contain unsafe characters such as the null character:
 
** Validate: Check whether the user input contains unsafe characters, and if it does display an error message to the user.
 
** Sanitize: Remove unsafe characters from the user input to prevent them from causing problems.
 
** Encode/quote/escape: Replace each unsafe character with an appropriate sequence of characters (depending on the context). For example, a null character can be included in a {{w|URL}} by encoding it as <code>%00</code>. This technique is not very relevant to password handling, but is relevant for example when {{w|Cross-site_scripting#Non-persistent_(reflected)|including user input in generated web pages}} or [https://www.cloudflare.com/learning/security/threats/sql-injection/ passing user inputs to database queries].
 
** For the specific case of null characters: Use a string representation that supports null characters (e.g. Pascal strings), and be very careful not to pass such strings to functions that can't handle embedded null characters.
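The first three techniques in the list above can be sketched in Python. For simplicity this example treats only the null character as unsafe, and the function names are made up; <code>urllib.parse.quote</code> is the standard-library function for percent-encoding text for use in a URL.

```python
from urllib.parse import quote

UNSAFE = "\x00"  # for this sketch, only the null character counts as unsafe


def validate(text: str) -> str:
    """Validate: refuse the input outright if it contains an unsafe character."""
    if UNSAFE in text:
        raise ValueError("input contains a null character")
    return text


def sanitize(text: str) -> str:
    """Sanitize: silently drop unsafe characters."""
    return text.replace(UNSAFE, "")


def encode_for_url(text: str) -> str:
    """Encode: replace unsafe characters with a safe escape sequence."""
    return quote(text, safe="")


sample = "pass\x00word"
print(sanitize(sample))
print(encode_for_url(sample))
```

Whichever approach is chosen, it has to be applied identically when the password is set and when it is checked; otherwise the stored and submitted passwords can never match.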
 
 
* Failure to handle strings containing null characters correctly can result in security vulnerabilities. For example, including a null character in crafted input may allow a user to read or write files that they are not supposed to be able to access.[https://insecure.org/news/P55-07.txt][https://elixirforum.com/t/static-and-session-security-fixes-for-plug/3913]
 
 
* In C, a string is usually stored in a block of memory that is allocated to have a known size. The maximum size of string that can be stored in such a buffer is one character less than the buffer's size, since the last character is used for the null terminator. Language functions that operate on strings, such as those that return the length of a specified string or which compare two strings, look for the terminator as a marker. However, there is a risk in using this feature: if that terminator is somehow overwritten by some other value, a function which assumes that there is still a stopping point may go far beyond the intended region of memory before it happens to find an unrelated terminator or otherwise is forced to stop looking. This can have serious security implications, as well as the potential for bugs and crashes. Instead, safe programming uses versions of the string functions that include a specification of the maximum allowed length. For example, the <code>strlen()</code> function takes a pointer to a string, counts the number of characters until it encounters a null terminator, and returns that number: the length of the string not including the terminator.  The <code>str'''n'''len()</code> function takes a pointer to a string and a maximum length, and counts characters until it either finds a terminator or reaches the maximum.
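The behaviour of the two functions can be modelled in Python. This is a model rather than real C: <code>c_strlen</code> and <code>c_strnlen</code> are stand-ins written for this example, and a <code>bytes</code> object plays the role of raw memory, with an 8-byte buffer followed directly by unrelated data.

```python
def c_strlen(memory: bytes, start: int) -> int:
    """Count bytes from start until a null byte, like C's strlen().

    Nothing limits the scan, so it may run past the intended buffer. Real C
    would keep reading whatever memory follows; this model raises IndexError
    if it runs off the end of `memory`.
    """
    n = 0
    while memory[start + n] != 0:
        n += 1
    return n


def c_strnlen(memory: bytes, start: int, maxlen: int) -> int:
    """Like C's strnlen(): stop at a null byte or after maxlen bytes."""
    n = 0
    while n < maxlen and memory[start + n] != 0:
        n += 1
    return n


# An 8-byte buffer whose terminator was overwritten, followed by unrelated data.
memory = b"ABCDEFGH" + b"secret\x00"

print(c_strlen(memory, 0))      # keeps counting into the neighbouring data
print(c_strnlen(memory, 0, 8))  # stops at the buffer boundary
```

The unbounded version reports a length larger than the buffer itself, which is exactly the kind of over-read that the length-limited variants are meant to prevent.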
 
 
* The number of this xkcd comic is 2700. Read as two concatenated octal numbers, \27 and \00, it represents the {{w|End-of-Transmission-Block_character|ETB}} character followed by the null character, both of which may cause problems when processed by legacy systems (e.g. mainframe computers). Read as two hexadecimal numbers, 0x27 and 0x00, it represents the ' character followed by the null character, a sequence that could lead to [[327: Exploits of a Mom|SQL injection]] if it is placed unescaped inside an SQL command.
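The arithmetic behind this reading of the comic number can be checked with a few lines of Python. The helper name <code>split_pair</code> is made up for this example.

```python
def split_pair(number: str, base: int) -> list:
    """Read a four-digit string as two two-digit numbers in the given base."""
    return [int(number[:2], base), int(number[2:], base)]


print(split_pair("2700", 8))   # octal 27 is 23, the code of ETB (0x17)
print(split_pair("2700", 16))  # hexadecimal 27 is 39
print(repr(chr(39)))           # character 39 is the apostrophe
```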
 
  
 
{{comic discussion}}
 
  
[[Category:Comics featuring Cueball]]
 
[[Category:Comics featuring Ponytail]]
 
 
[[Category:Cueball Computer Problems]]
 
