2169: Predictive Models
Title text: WE WILL ARREST THE REVOLUTION MEMBERS [AT THE JULY 28TH MEETING][tab] "Cancel the meeting! Our cover is blown."
Explanation
Predictive text is a feature on many systems where, as you type, the system automatically suggests likely words or phrases to follow what you have written so far. For instance, if you type "I'm heading", the system may suggest "home" or "back" as likely words to follow. Predictive systems usually use prior input to generate their predictions, so if you frequently type "Totally amazing!", the system will suggest "amazing!" every time you type "totally", even if you sometimes go on to type "totally true" instead.
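The "totally amazing" behaviour can be illustrated with a toy next-word predictor. This is only a minimal sketch under stated assumptions: real predictive keyboards use far more elaborate models, and the function names `train_bigrams` and `suggest` and the sample typing history are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(messages):
    """Count, for each word, how often each following word appears."""
    following = defaultdict(Counter)
    for message in messages:
        words = message.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following

def suggest(following, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical typing history of a single user
history = [
    "Totally amazing!",
    "Totally amazing!",
    "Totally amazing!",
    "Totally true",
]
model = train_bigrams(history)
print(suggest(model, "totally"))  # amazing!
```

Because "amazing!" follows "totally" three times and "true" only once, the sketch always suggests "amazing!", even on the occasions when the user intends to type "true".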
In the comic, Cueball is using predictive text to uncover a plot against his organization or government, but instead of using only his personal input, the system is using input from all users. By typing in an obscure phrase related to revolution and a meeting, he gets the predictive text algorithm to display where and when the next supposedly secret meeting will be held, based on other users' input. This works because it is unlikely that anyone other than the revolutionaries would be typing this phrase, so the only data the algorithm has to predict from is the revolutionaries' actual message about their next meeting. The caption of the comic points out that systems which use prior input for predictive purposes in this way can end up leaking information that might otherwise be considered private. (However, this method may produce outdated information. On June 29, 2019, typing "Long live the revolution. Our next meeting will be at" into Google gave the predicted completion "long live the revolution. our next meeting will be at comic con 2018", which would not be useful to anyone looking for revolutionaries, because Comic-Con 2018 was already over.)
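The leak described above can be sketched with a toy completion function trained on messages pooled from every user. This is an illustration only, not how any real service works; the function `complete` and the `pooled_messages` corpus are hypothetical.

```python
from collections import Counter

def complete(prefix, corpus):
    """Return the most common continuation of `prefix` in `corpus`, or None."""
    n = len(prefix)
    continuations = Counter(
        message[n:].strip()
        for message in corpus
        # Case-insensitive prefix match; skip messages with nothing after it
        if message[:n].lower() == prefix.lower() and message[n:].strip()
    )
    if not continuations:
        return None
    return continuations.most_common(1)[0][0]

# Hypothetical input pooled from all users of the system
pooled_messages = [
    "I'm heading home",
    "I'm heading home",
    "I'm heading back",
    "Long live the revolution. Our next meeting will be at "
    "the docks at midnight on June 28",
]

print(complete("I'm heading", pooled_messages))
print(complete("Long live the revolution. Our next meeting will be at",
               pooled_messages))
```

For the common prefix, many users contribute and the suggestion is an anonymous majority vote. For the rare prefix, a single message is the only match, so the "prediction" is simply that one user's private text.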
The title text shows the revolutionaries using the same technique. By typing in "We will arrest the revolution members" they are hoping that the algorithm will suggest the time and date of their planned arrest, since no one other than the authorities would be typing in that phrase. Pressing the key [tab] to autocomplete that text produces "WE WILL ARREST THE REVOLUTION MEMBERS [AT THE JULY 28TH MEETING]", and the revolutionaries then say "Cancel the meeting! Our cover is blown." The revolutionaries have apparently made the serious mistake of holding secret meetings on regular, predictable dates (such as the 28th day of each month, the last date guaranteed to exist in any month of the Gregorian Calendar), and the authorities have successfully figured this out, either through the predictive-text attack or by other means.
Both examples assume that the revolutionaries and the authorities would be discussing very secret information in the clear on a network accessible to their adversaries. In the real world, people engaged in sensitive activities would communicate via code, encryption, or both, or would do so through what they believe to be secure channels. There is still the danger of secret information leaking via non-secret channels, however. Side-channel attacks use information gained from the implementation of a system to deduce supposedly protected information. A famous example occurred in World War II. The Germans kept tank production figures secret, but they gave items like engine blocks sequential serial numbers. The Allies wanted to know exact tank production figures, so they solved the German tank problem by using statistical methods to analyze the distribution of these numbers on captured vehicles. Their estimates proved extremely accurate: in one case they estimated that 270 tanks had been built in a month when 276 were actually built. Thus, the secret information on tank production leaked.
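The idea behind the German tank problem can be sketched with the standard textbook estimator: if k serial numbers are observed and the largest is m, the total is estimated as m + m/k - 1. The sketch below assumes that serial numbers run from 1 to N without gaps and that the sample is drawn uniformly at random; the function name `estimate_total` and the sample serials are hypothetical, and this is not claimed to be the exact procedure the Allied analysts followed.

```python
def estimate_total(serials):
    """Estimate the total number produced from a sample of serial numbers.

    Assumes serials run 1..N with no gaps and the sample is drawn
    uniformly without replacement (illustrative assumptions).
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)
    # Largest serial seen, plus the average gap between observed serials
    return m + m / k - 1

observed = [19, 40, 42, 60]  # hypothetical serials from captured vehicles
print(estimate_total(observed))  # 60 + 60/4 - 1 = 74.0
```

The estimate exceeds the largest observed serial because a small sample is unlikely to include the very last item produced.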
Some systems require frequent password changes in an effort to limit the danger from a password being discovered. However, people respond by choosing passwords in patterns, so it is easy to predict subsequent passwords from old ones, thus defeating the purpose of requiring frequent changes. (Source: "Passwords Evolved: Authentication Guidance for the Modern Era".)
Although the comic title is "Predictive Models", the term predictive modelling usually refers to computer programs that try to predict outcomes from data aggregation, such as reviewing health records to identify people most at risk from certain diseases based on weight, prior injuries, etc., before testing directly for the diseases themselves. This is similar to, but not precisely like, the example in the comic: predictive text uses direct input to predict further input, while predictive modelling uses related input (such as the make and model of a car along with driver acceleration patterns) to predict a different output (such as the likelihood of a crash). Both predictive text and predictive modelling could leak information as the comic suggests, however. Predictive text, and its potential to leak unintended information, was parodied on xkcd before in 1068: Swiftkey.
Transcript
- [Cueball is sitting in an office chair at a desk typing on a laptop. Above him is the text he writes along with what the predictive text tool suggests, the latter in grey text. The TAB at the end is in a small frame.]
- Cueball typing: Long live the revolution. Our next meeting will be at| the docks at midnight on June 28 [tab]
- Cueball: Aha, found them!
- [Caption below the panel:]
- When you train predictive models on input from your users, it can leak information in unexpected ways.
Trivia
- On its original release, the title text was bugged. The full text would not display in certain browsers, and clicking the comic took you to this page: https://xkcd.com/[AT%20THE%20JULY%2028TH%20MEETING][tab], which only showed "404 Not Found".
- The anchor originally contained invalid HTML: <a href=" [AT THE JULY 28TH MEETING][tab] "Cancel the meeting! Our cover is blown."">. This suggests that Randall didn't intend this behaviour.
- The image and alt text were later corrected, long before July 28th, 2019, further implying it was a simple mistake on Randall's part.
- Some browsers showed only the first part of the title text, "WE WILL ARREST THE REVOLUTION MEMBERS". For example, Firefox 66 on Windows did this, as evidently did some versions of Firefox and Chrome on GNU/Linux and Microsoft Edge on Windows 10.
Discussion
If you click on the comic, it opens a page with error 404. Looking at the URL, it says "At the July 28th meeting", which I assume is the prediction result to the title text suggesting that they will be 1 month late. 162.158.106.174 17:13, 28 June 2019 (UTC)
In the HTML tag for the link (the <a> tag surrounding the comic image) after the link it says "cancel the meeting! our cover is blown" Everlastingwonder (talk) 17:21, 28 June 2019 (UTC)
In the mobile version, you can read «See also: [AT THE JULY 28TH MEETING][tab] "Cancel the meeting! Our cover is blown."» It leads to a 404, like the other examples in the comments here. 172.69.44.136 17:31, 28 June 2019 (UTC)
This looks a whole lot like Gmail's Smart Compose 172.68.206.76
Today GMail actually predicted the beginning of my mail correctly. I typed literally zero characters and it already knew how to continue. In the future, we won't even have to upload our brains to a computer, a backup will already be available there automatically. Fabian42 (talk) 21:32, 28 June 2019 (UTC)
Not a backup, a simulation. 108.162.219.184 04:46, 29 June 2019 (UTC)
- If you can't tell the difference, does it matter? 173.245.48.147 17:04, 1 July 2019 (UTC)
On my Mac the title text only shows "WE WILL ARREST THE REVOLUTION MEMBERS" while on my iPad (where you long press to see title texts) long pressing only shows the link. Weird. Also someone remind me to check the link again on July 28. Herobrine (talk) 13:10, 29 June 2019 (UTC)
- On my Ubuntu system, both Firefox and Chrome display "WE WILL ARREST THE REVOLUTION MEMBERS" as the title text and "https://xkcd.com/[AT THE JULY 28TH MEETING][tab]" as the link target, which is also what's in the HTML source. Additionally, the HTML source is malformed, with quotes inside quotes in the href attribute. - Linneris (talk) 14:37, 29 June 2019 (UTC)
- Malformed. Precisely! I think there was a glitch while the comic was uploaded, which used the title text as a link in addition to as the title text. It didn't include the last part due to the quotes. It will be either fixed or legitimate, or at least make the href a little nicer. That's right, Jacky720 just signed this (talk | contribs) 21:24, 29 June 2019 (UTC)
- Actually... Looking at the comic again (for the first time on my PC), I would like to rethink that. I think this is Randall's method of demonstrating the [tab]; clicking and looking at the URL. [EDIT] Man, the more I think, the weirder it gets. Maybe it's about how sometimes you can find the information on the client side in the code where it should be hidden? I don't know anymore. That's right, Jacky720 just signed this (talk | contribs) 21:27, 29 June 2019 (UTC)
- When you look at the source of that 404 page, you can see six HTML comments with the content "a padding to disable MSIE and Chrome friendly error page". This is to prevent MSIE and Chrome from displaying "helpful" proprietary error pages. If you change the link in the slightest, you will also get a 404 page, but without these comments. I assume that either this was a glitch (intended or unintended) and this particular 404 page was modified so that everyone can see that the authors are aware of it, *or* it's a hint pointing to somewhere else. A rabbit hole maybe? I would like the latter to be true, but I haven't found anything.--162.158.90.168 22:42, 29 June 2019 (UTC)
- My computer did that, but then it didn't happen anymore and the title text was complete.172.69.44.146 13:09, 22 July 2019 (UTC)
This reminds me of that time where via data analytics on things like shopping habits, Target figured out that a teen girl was pregnant before her father did. Ahiijny (talk) 06:42, 30 June 2019 (UTC)
I tried this on google, and got "we will arrest chamisa" and "the meeting will be in room 27" and "our next meeting will be at 3 p.m. on wednesday". Any more? 162.158.59.214 19:16, 30 June 2019 (UTC)
I decided to see what a more sophisticated predictive model would do, so I plugged it into Talk to Transformer. The output: "Long live the revolution. Our next meeting will be at 10 a.m. on December 14 at the Cressey Building, 1636 S. Second St. Please invite your friends, family, and coworkers! For those interested in donating to the cause, please contact:" I'm legitimately impressed. Arcorann (talk) 01:03, 1 July 2019 (UTC)
Thinking about predictive text, in combination with the advice on the futility of making people change their passwords frequently, perhaps systems which require people to change their passwords could be more helpful by observing the pattern the user is using, and suggesting what the next password should be. Passwords Evolved: Authentication Guidance for the Modern Era 162.158.106.216 20:05, 1 July 2019 (UTC)
This paragraph was in the explanation, however the cited source gives no information about how the private correspondence was obtained, and no suggestion that the privacy of the communication channel was compromised. (The most obvious way that such information would be obtained is that somebody who was party to the communication made it available.) I moved it here in case somebody has sources to show that it was a breach of security. "As humanity adapts to a digital world, people are finding that their digital communications provide the illusion of confidentiality, with damaging results when the information leaks out. Real-life examples include a 2016 British trainee doctor strike, where a technically-secure WhatsApp group leaked information to the press." 108.162.245.220 05:18, 3 July 2019 (UTC)
Incomplete tag worth saving for posterity, due to H2G2 reference: Created by a PREDICTIVE MODEL THAT WILL BE FIRST AGAINST THE WALL WHEN THE REVOLUTION COMES. --172.68.47.240 01:56, 12 July 2019 (UTC)
So... is that midnight at 00:00hrs or 24:00hrs upon that date? Might the intending captors choose wrongly and lose the opportunity to pounce upon those they wish to take captive? 172.71.178.137 01:20, 25 March 2023 (UTC)
Social media has been used in revolutions
Once revolutions achieve critical mass, they often communicate on more insecure channels. Many of the Arab Spring revolutions spread through Twitter. Broadly speaking, the trade-off between security and contagiousness often causes disagreement among revolutionaries.
For example, the Trotskyist/Stalinist disagreements over "Permanent revolution" (expansionist) vs "Socialism in one country" (security and development of the USSR without spending all available surplus on spreading communism directly).