Editing 2700: Account Problems


The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.
Latest revision Your text
Line 10: Line 10:
 
   
 
   
 
==Explanation==
 
==Explanation==
 +
{{incomplete|Created by a VISIBLE ZERO WIDTH SPACE. Please change this comment when editing this page. Do NOT delete this tag too soon.}}
  
[[Cueball]] asks [[Ponytail]] to help him because he can't log in to his account. Having attempted to fix [[:Category:Cueball Computer Problems|Cueball's tech issues]] in the past, Ponytail replies with dread. Cueball promises that "It's a normal problem this time", and Ponytail agrees to look at it. But then Cueball reveals that he has included a {{w|Null character|null string terminator character}} in his password when creating an account and now he can't log in.
+
[[Cueball]] asks [[Ponytail]] to help him because he can't log in to his account. Having attempted to fix [[:Category:Cueball Computer Problems|Cueball's tech issues]] in the past, Ponytail replies with dread. Cueball promises that "It's a normal problem this time", and Ponytail agrees to look at it. But then Cueball reveals that he has included a {{w|Null character|null string terminator character}} in his password when creating an account and now he can't log in. Ponytail responds in disbelief, and Cueball defends his actions by saying that the instructions said to use special characters.
  
In computer systems, every {{w|Character (computing)|"character"}} (letter, digit, punctuation, etc.) is represented as an integer. For example the lowercase letter 'a' is represented as the number 97, and the digit '1' is represented as the number 49 (when using the {{w|ASCII}} character encoding or {{w|Unicode}} character encoding). A {{w|String (computer science)|"string"}} refers to a sequence of characters, and can be used to store arbitrary text (for example names, messages, passwords). Strings can be arbitrarily long, so some mechanism must be used to record their length. One approach is to store the length explicitly ({{w|String_(computer_science)#Length-prefixed|Pascal string}}). Another approach is to mark the end of the string using a specific character, usually the {{w|null character}} (which is represented as the number 0); such strings are called {{w|null-terminated string}}s, and are used by the {{w|C (programming language)|C programming language}}. Both approaches have advantages and disadvantages. A limitation of null-terminated strings is that they cannot be used to represent text containing embedded null characters. This is usually not a problem, because normal text never contains null characters. However, if somehow a null character were to end up in the string, it would cause problems: any code that uses that string would assume this null character marks the end of the string, so the string would effectively be cut off.
+
In computer systems, every {{w|Character (computing)|"character"}} (letter, digit, punctuation, etc.) is represented as an integer. For example, the lowercase letter 'a' is represented as the number 97, and the digit '1' is represented as the number 49 (when using the {{w|ASCII}} character encoding or {{w|Unicode}} character encoding). A {{w|String (computer science)|"string"}} refers to a sequence of characters, and can be used to store arbitrary text (for example names, messages, passwords). Strings can be arbitrarily long, so some mechanism must be used to record their length. One approach is to store the length explicitly; this representation is often called a {{w|String_(computer_science)#Length-prefixed|Pascal string}} (after the programming language {{w|Pascal (programming language)|Pascal}}, which uses this representation). Another approach is to mark the end of the string using a specific character, usually the {{w|null character}} (which is represented as the number 0); such strings are called {{w|null-terminated string}}s, and are used by the {{w|C (programming language)|C programming language}}. Both approaches have advantages and disadvantages.
  
Account registration systems often place requirements on passwords in an attempt to encourage users to pick stronger passwords. For example, they might ask that the password include at least one "special character" (such as <code>!@#$%^&*</code>). Cueball misunderstood this requirement as referring to characters such as the null character (which is more accurately referred to as a {{w|ASCII#Control_characters|control character}}). Cueball managed to type the null character as part of his password somehow (on some systems it is possible to type the null character using {{w|Null_character#Representation|certain keyboard shortcuts}} such as <code>Ctrl</code>+<code>Space</code>, <code>Ctrl</code>+<code>@</code>, <code>Ctrl</code>+<code>2</code>, or <code>Alt+0</code> {{w|Alt_code|using the number pad}}), but the software running the registration system was poorly written and could not cope with this &ndash; it allowed him to create an account with that password, but then when he tried to log in with the same password the system didn't accept it.  
+
A limitation of null-terminated strings is that they cannot be used to represent text containing embedded null characters. This is usually not a problem, because normal text never contains null characters. However, if somehow a null character were to end up in the middle of the string, it would cause problems: any code that uses that string would assume this null character marks the end of the string, so the string would effectively be truncated.
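The truncation described above can be reproduced in a short sketch. It uses Python's standard <code>ctypes</code> module to read the same bytes once as a length-aware string and once as a C-style null-terminated string; the password value is a made-up example, not anything from the comic.

```python
import ctypes

# Hypothetical password containing an embedded null byte.
password = b"hunter2\x00extra"

# Length-aware view: a Python bytes object stores its length explicitly,
# so the embedded null is just another byte.
full_length = len(password)

# C-style view: the same bytes copied into a char buffer and read back
# as a null-terminated string, which stops at the first null byte.
buf = ctypes.create_string_buffer(password)
c_view = buf.value   # bytes up to (not including) the first null
raw_view = buf.raw   # the whole buffer, including the terminator ctypes appends

print(full_length, c_view, raw_view)
```

Any code that only ever sees <code>c_view</code> would treat the password as ending after the first seven characters.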
  
It's unclear how that particular situation might arise in real software, but here is a similar situation that can easily happen in practice: Suppose a website's registration form allows the user's new password to have up to 20 characters, but due to a programmer error the login page only accepts passwords with up to 18 characters. If the user picks a medium-length password (say with 12 characters), all is well. But if the user picks a password with 20 characters, they will find themselves in the same position as Cueball, being able to register but not able to log in. Some additional situations are described [[#Trivia|below]].
+
Account registration systems often place requirements on passwords in an attempt to encourage users to pick stronger passwords. For example, they might ask that the password include at least one "special character" (such as <code>!@#$%^&*</code>). Cueball misunderstood this requirement as referring to characters such as the null character (which is more accurately referred to as a {{w|ASCII#Control_characters|control character}}). Cueball managed to type the null character as part of his password somehow (on some systems it is possible to type the null character using {{w|Null_character#Representation|certain keyboard shortcuts}} such as <code>Ctrl</code>+<code>Space</code>, <code>Ctrl</code>+<code>@</code>, or <code>Ctrl</code>+<code>2</code>), but the software running the registration system was poorly written and could not cope with this &ndash; it allowed him to create an account with that password, but then when he tried to log in with the same password the system didn't accept it. Software that accepts user inputs is supposed to "sanitize" the entered data, to prevent exactly this sort of problem. (This has come up previously, in [[327: Exploits of a Mom]].)
  
The title text describes a password which is "just" every Unicode character concatenated into a single string. {{w|Unicode}} is a standard for representing characters from many writing systems, and it has {{w|Unicode#Versions|149,186 characters}} as of the time of this comic (with new characters being added over time). A password consisting of all of those characters would be extremely long; it would be impractical to type by hand, and would be too long for pretty much all account registration systems. (A "codepoint" is the number assigned to a character, and {{w|UTF-8}} is a common encoding system for representing each Unicode codepoint as a sequence of {{w|byte}}s.) Also, since Unicode includes the null character, the password would have the same issue as Cueball's password. Further, if the account registration system treats the null character as a string terminator (as in C), then the password would be equivalent to an empty password (assuming it contains the Unicode codepoints in order, starting with the null character).
+
The number of this xkcd comic is 2700. Interpreted as two concatenated octal numbers, \27 and \00, it represents the {{w|End-of-Transmission-Block_character|ETB}} character followed by the null character, both of which can cause problems when processed by legacy systems (e.g. mainframe computers). Interpreted as two hexadecimal numbers, 0x27 and 0x00, it represents the ' character followed by the null character, a sequence that could lead to SQL injection if it is placed unescaped inside an SQL command.
 +
 
 +
It's unclear how that particular situation might arise in real software, but here is a similar situation that can easily happen in practice: Suppose a website's registration form allows the user's new password to have up to 20 characters, but due to a programmer error the login page only accepts passwords with up to 18 characters. If the user picks a medium-length password (say with 12 characters), all is well. But if the user picks a password with 20 characters, they will be able to register but they won't be able to log in (which is what happened to Cueball). Another problem can arise when the password system allows the input of Unicode characters at first, but is later changed to only store ASCII passwords: language-specific characters like é or ö are then no longer allowed, locking the user out of their account until a new password is set.
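The length-limit mismatch can be sketched as follows. The function names, the limits as constants, and the in-memory account store are all hypothetical; a real system would store a salted hash rather than the password itself.

```python
REGISTER_MAX = 20  # limit enforced by the hypothetical registration form
LOGIN_MAX = 18     # smaller limit mistakenly enforced by the login form

accounts = {}  # illustration only: real systems store salted hashes


def register(user, password):
    """Accept any password of up to REGISTER_MAX characters."""
    if len(password) > REGISTER_MAX:
        return False
    accounts[user] = password
    return True


def login(user, password):
    """Reject over-long passwords before they are even compared."""
    if len(password) > LOGIN_MAX:
        return False
    return accounts.get(user) == password


medium = "x" * 12
long_pw = "y" * 20
print(register("alice", medium), login("alice", medium))
print(register("bob", long_pw), login("bob", long_pw))
```

The second user registers successfully but can never log in, even when typing the correct password.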
 +
 
 +
The title text describes a different situation, where a person's password is "simply" every Unicode character concatenated into a single string. {{w|Unicode}} is a standard for representing characters from many writing systems, and it has 149,186 characters[https://en.wikipedia.org/wiki/Unicode#Versions] as of the time of this comic (with new characters being added over time). A password consisting of all of those characters would be extremely long; it would be impractical to type by hand, and would be too long for pretty much all account registration systems. (A "codepoint" is the number assigned to a character, and {{w|UTF-8}} is a common encoding system for representing a Unicode codepoint as a sequence of {{w|byte}}s.)
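As a small illustration of the codepoint and UTF-8 terminology, the sketch below looks up the codepoint of a few sample characters and the bytes UTF-8 uses for each. The helper name <code>describe</code> and the chosen characters are only examples.

```python
def describe(ch):
    """Return (codepoint, UTF-8 bytes) for a single character."""
    return ord(ch), ch.encode("utf-8")


# Sample characters: an ASCII letter, two non-ASCII characters,
# and the null character itself.
for ch in ["a", "é", "€", "\x00"]:
    code_point, encoded = describe(ch)
    print(f"U+{code_point:04X} -> {encoded!r} ({len(encoded)} byte(s))")
```

ASCII characters (including the null character) take one byte each in UTF-8, while other characters take two to four bytes.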
  
 
==Transcript==
 
==Transcript==
:[Cueball carries an open laptop over to Ponytail, holding it in both hands. The screen shows a box filling the screen with some text on lines. Ponytail is sitting in an office chair with her laptop at her desk. She has turned her head away from the computer looking at Cueball's screen.]
+
:[Cueball carries an open laptop over to Ponytail, holding it in both hands. The screen shows a box filling the screen with some text on lines. Ponytail is sitting in an office chair with her laptop at her desk. She has turned her head away from the computer looking at Cueball's screen.]
 
:Cueball: Can you help me with my account?
 
:Cueball: Can you help me with my account?
 
:Ponytail: Oh no.
 
:Ponytail: Oh no.
Line 34: Line 39:
 
:Ponytail: '''''How?!'''''
 
:Ponytail: '''''How?!'''''
 
:Cueball: They said to use special characters!
 
:Cueball: They said to use special characters!
 
==Trivia==
 
 
* User input containing unsafe characters has previously appeared in the famous comic [[327: Exploits of a Mom]].
 
 
* Here are some additional situations where passwords with special characters might stop working:
 
** The registration form allows passwords to contain null characters, but the login form strips null characters (for example because it was written by a different developer/team, or because it has been updated over time). When Cueball tries to log in, the login form strips the null characters, so the resulting password can ''never'' match such a stored password (which contains a null character).
 
** The password system accepts Unicode characters at first, but is later changed to only accept ASCII passwords. Users who included non-ASCII characters like é or ö in their password become locked out of their account because they are no longer allowed to submit those characters.
 
** Passwords containing non-ASCII characters are in general problematic, because it might not be possible to type them on the keyboard used for logging in. For example, on macOS a logged-in user can change their password to one that contains emojis, but the keyboard on the login screen does not have good support for typing emojis.[https://medium.com/@hvost/why-you-should-not-use-emojis-in-your-passwords-b8db0607e169][https://apple.stackexchange.com/questions/202143/i-included-emoji-in-my-password-and-now-i-cant-log-in-to-my-account-on-yosemite]
 
** A business network may have multiple systems that connect to a central database of usernames and passwords. If the systems have different password handling rules, a user might find that some of the systems don't support their password (for example because the password contains a character which is forbidden on a particular system).
 
 
* There are several techniques that can be used to safely handle passwords and other user inputs that might contain unsafe characters such as the null character:
 
** Validate: Check whether the user input contains unsafe characters, and if it does display an error message to the user.
 
** Sanitize: Remove unsafe characters from the user input to prevent them from causing problems.
 
** Encode/quote/escape: Replace each unsafe character with an appropriate sequence of characters (depending on the context). For example, a null character can be included in a {{w|URL}} by encoding it as <code>%00</code>. This technique is not very relevant to password handling, but is relevant for example when {{w|Cross-site_scripting#Non-persistent_(reflected)|including user input in generated web pages}} or [https://www.cloudflare.com/learning/security/threats/sql-injection/ passing user inputs to database queries].
 
** For the specific case of null characters: Use a string representation that supports null characters (e.g. Pascal strings), and be very careful not to pass such strings to functions that can't handle embedded null characters.
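The first three techniques in the list above might look roughly like this. The function names are hypothetical, and the sketch treats only the null character as "unsafe"; a real application would decide its own set of forbidden characters. The encoding step uses <code>urllib.parse.quote</code> from Python's standard library.

```python
from urllib.parse import quote

NULL = "\x00"  # the only character treated as unsafe in this sketch


def validate(text):
    """Validate: refuse input containing a null character."""
    if NULL in text:
        raise ValueError("input must not contain null characters")
    return text


def sanitize(text):
    """Sanitize: silently drop null characters."""
    return text.replace(NULL, "")


def url_encode(text):
    """Encode: percent-encode the text for use inside a URL."""
    return quote(text, safe="")


user_input = "pass\x00word"
print(sanitize(user_input))
print(url_encode(user_input))
```

Note that sanitizing a password changes it, so the same sanitizing step would have to be applied at both registration and login for the two to agree.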
 
 
* Failure to handle strings containing null characters correctly can result in security vulnerabilities. For example, including a null character in crafted input may allow a user to read or write files that they are not supposed to be able to access.[https://insecure.org/news/P55-07.txt][https://elixirforum.com/t/static-and-session-security-fixes-for-plug/3913]
 
 
* In C, a string is usually stored in a block of memory that is allocated to have a known size. The maximum size of string that can be stored in such a buffer is one character less than the buffer's size, since the last character is used for the null terminator. Language functions that operate on strings, such as those that return the length of a specified string or which compare two strings, look for the terminator as a marker. However, there is a risk in using this feature: if that terminator is somehow overwritten by some other value, a function which assumes that there is still a stopping point may go far beyond the intended region of memory before it happens to find an unrelated terminator or otherwise is forced to stop looking. This can have serious security implications, as well as the potential for bugs and crashes. Instead, safe programming uses versions of the string functions that include a specification of the maximum allowed length. For example, the <code>strlen()</code> function takes a pointer to a string, counts the number of characters until it encounters a null terminator, and returns that number: the length of the string not including the terminator.  The <code>str'''n'''len()</code> function takes a pointer to a string and a maximum length, and counts characters until it either finds a terminator or reaches the maximum.
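The difference between the two functions can be emulated in Python. This is only a sketch: these are not the C library functions, just stand-ins with the same names operating on a <code>bytes</code> buffer. Where real C code would read past the end of the buffer, the unbounded stand-in raises an error instead.

```python
def strlen(buf):
    """Count bytes before the first null terminator.

    Raises ValueError if there is no terminator; real C code would
    keep reading past the end of the buffer instead.
    """
    return buf.index(0)


def strnlen(buf, maxlen):
    """Count bytes before the first terminator, looking at no more than maxlen bytes."""
    end = buf.find(0, 0, maxlen)
    return maxlen if end == -1 else end


terminated = b"abc\x00"
unterminated = b"abcdef"  # terminator missing, e.g. overwritten

print(strlen(terminated))                          # length without the terminator
print(strnlen(unterminated, len(unterminated)))    # stops at the stated maximum
```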
 
 
* The number of this xkcd comic is 2700. Interpreted as two concatenated octal numbers, \27 and \00, it represents the {{w|End-of-Transmission-Block_character|ETB}} character followed by the null character, both of which can cause problems when processed by legacy systems (e.g. mainframe computers). Interpreted as two hexadecimal numbers, 0x27 and 0x00, it represents the ' character followed by the null character, a sequence that could lead to [[327: Exploits of a Mom|SQL injection]] if it is placed unescaped inside an SQL command.
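The two readings of 2700 can be checked with a few lines of arithmetic. The variable names are only illustrative.

```python
comic = "2700"
first, second = comic[:2], comic[2:]  # "27" and "00"

# Read each half as an octal number.
octal_pair = [int(first, 8), int(second, 8)]

# Read each half as a hexadecimal number.
hex_pair = [int(first, 16), int(second, 16)]

ETB = 0x17  # ASCII End-of-Transmission-Block control character

print(octal_pair, octal_pair[0] == ETB)
print(hex_pair, repr(chr(hex_pair[0])))
```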
 
  
 
{{comic discussion}}
 
{{comic discussion}}
