{{comic
| number    = 2169
| date      = June 28, 2019
| title     = Predictive Models
| image     = predictive_models.png
| titletext = WE WILL ARREST THE REVOLUTION MEMBERS [AT THE JULY 28TH MEETING][tab] "Cancel the meeting! Our cover is blown."
}}

==Explanation==


{{w|Predictive text}} is a feature on many systems in which, as you type, the system suggests likely words or phrases to follow what you have written so far. For instance, if you type "I'm heading", the system may suggest "home" or "back" as likely next words. Predictive systems usually base their predictions on prior input, so if you frequently type "Totally amazing!", the system will suggest "amazing!" every time you type "totally", even if you sometimes want to type "totally true".
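A minimal sketch of the idea, assuming the simplest possible model: count which word followed each word in the user's own past messages, then suggest the most frequent follower. Real predictive-text systems use far more sophisticated models, and the function names here are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(messages):
    """Count, for each word, which words have followed it in past input."""
    followers = defaultdict(Counter)
    for message in messages:
        words = message.lower().split()
        for current, nxt in zip(words, words[1:]):
            followers[current][nxt] += 1
    return followers


def suggest(followers, word):
    """Return the most frequent follower of `word`, or None if never seen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical history: the user usually types "totally amazing!".
history = ["Totally amazing!", "That was totally amazing!", "totally true"]
model = train_bigrams(history)
print(suggest(model, "Totally"))  # prints: amazing!
```

Because the counts come only from past input, "true" will not be suggested after "totally" until it has been typed more often than "amazing!", which is the behaviour described above.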

In the comic, [[Cueball]] is using predictive text in Gmail to uncover a plot against his organization or government, but instead of drawing only on his own input, the system draws on input from ''all'' users. By typing an obscure phrase about a revolution and a meeting, he gets the predictive text algorithm to reveal where and when the next supposedly secret meeting will be held, based on other users' input. This works because almost nobody other than the revolutionaries would type this exact phrase, so the only data the algorithm has to predict from is the revolutionaries' own message about their next meeting. The caption points out that systems which use prior input for prediction in this way can leak information that would otherwise be considered private. (However, this method may produce outdated information. On June 29, 2019, typing "Long live the revolution. Our next meeting will be at" into Google gave the predicted completion "long live the revolution. our next meeting will be at comic con 2018", which would not have helped anyone looking for revolutionaries, because Comic-Con 2018 was already over.)
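The leak can be sketched with a toy completion model. This assumes, hypothetically, that the system simply pools every user's sentences and, for a typed prefix, suggests the most common continuation among pooled sentences that begin with it. A rare prefix matches almost nothing, so the "prediction" is effectively a copy of someone else's message.

```python
from collections import Counter


def complete(corpus, prefix):
    """Suggest the most common continuation of `prefix` across ALL users' text."""
    continuations = Counter()
    for sentence in corpus:
        # Case-insensitive prefix match (ASCII text assumed, so lengths agree).
        if sentence.lower().startswith(prefix.lower()):
            rest = sentence[len(prefix):].strip()
            if rest:
                continuations[rest] += 1
    if not continuations:
        return None
    return continuations.most_common(1)[0][0]


# Hypothetical pooled input from many users.
corpus = [
    "Our next meeting will be at 3pm in room 12",
    "Our next meeting will be at 3pm in room 12",
    "Long live the revolution. Our next meeting will be at the docks at midnight on June 28",
]

print(complete(corpus, "Our next meeting will be at"))
# prints: 3pm in room 12
print(complete(corpus, "Long live the revolution. Our next meeting will be at"))
# prints: the docks at midnight on June 28
```

A common prefix returns a popular, harmless completion, while the rare revolutionary prefix has exactly one matching sentence, so the model hands it back verbatim.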

The title text shows the revolutionaries using the same technique. By typing "We will arrest the revolution members", they hope the algorithm will suggest the time and date of their planned arrest, since no one other than the authorities would type that phrase. Pressing [tab] to accept the suggestion produces "WE WILL ARREST THE REVOLUTION MEMBERS [AT THE JULY 28TH MEETING]", and the revolutionaries then say "Cancel the meeting! Our cover is blown." The revolutionaries have apparently made the serious mistake of holding secret meetings on regular, predictable dates (here, apparently the 28th of each month, which is also the latest date that exists in every month of the Gregorian calendar), and the authorities have figured this out, either through the predictive-text attack or by other means.
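The claim that the 28th is the latest date present in every month can be checked with Python's standard `calendar` module. Because the Gregorian leap-year rules repeat exactly every 400 years, checking one 400-year cycle covers every possible case; the starting year 2000 is an arbitrary choice.

```python
import calendar


def shortest_month_length(start_year=2000, cycle=400):
    """Return the fewest days any month has over one full Gregorian cycle."""
    return min(
        calendar.monthrange(year, month)[1]  # (weekday of the 1st, days in month)
        for year in range(start_year, start_year + cycle)
        for month in range(1, 13)
    )


print(shortest_month_length())  # prints: 28
```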

Both examples assume that the revolutionaries and the authorities would discuss highly secret information in plain text on a network their adversaries can access. In the real world, people engaged in sensitive activities would communicate in code, with encryption, or both, or through channels they believe to be secure. Even so, secret information can still leak through channels that are not themselves secret.

{{w|Side-channel attack|Side-channel attacks}} use information gained from the way a system is implemented to deduce supposedly protected information. A famous example comes from World War II. The Germans kept tank production figures secret, but they gave parts such as engine blocks sequential serial numbers. Wanting accurate production figures, the Allies solved what is now known as the {{w|German tank problem}} by statistically analyzing the serial numbers found on captured vehicles. Their estimates were remarkably accurate: in one case they estimated 270 tanks for a month in which 276 were actually built. The secret production figures had leaked through the serial numbers.
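The standard textbook answer to the German tank problem, assuming serial numbers run from 1 to N and the captured vehicles are a random sample, estimates the total as N ≈ m + m/k − 1, where m is the largest serial number observed and k is the number of vehicles sampled. The sketch below implements that estimator; it is not necessarily the exact method the Allies used in any particular wartime estimate.

```python
def estimate_total(serials):
    """Minimum-variance unbiased estimate of the population size N.

    Assumes serial numbers run 1..N and the observed ones are a uniform
    random sample drawn without replacement.
    """
    k = len(serials)  # sample size
    m = max(serials)  # largest serial number seen
    return m + m / k - 1


# Example: four captured vehicles with these serial numbers.
print(estimate_total([19, 40, 42, 60]))  # prints: 74.0
```

Intuitively, m/k − 1 is the average number of unseen serial numbers per gap between observed ones, so the estimate extends past the largest observed number by one typical gap.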

Some systems require frequent password changes to limit the damage if a password is discovered. However, people respond by choosing passwords that follow a pattern, so given someone's old passwords it is often easy to predict their next one, which defeats the purpose of requiring frequent changes.[https://www.troyhunt.com/passwords-evolved-authentication-guidance-for-the-modern-era/ Passwords Evolved: Authentication Guidance for the Modern Era]
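As a hypothetical illustration of why such patterns are easy to exploit: if a user's passwords differ only by a trailing counter, a guesser can simply increment it. The rule below is a made-up example of one such pattern, not a real password-cracking tool.

```python
import re


def guess_next_password(old_passwords):
    """Guess the next password by incrementing a trailing number, if any.

    Hypothetical rule: assumes the user bumps a counter near the end of the
    password each time a change is forced (e.g. "Hunter1" -> "Hunter2").
    """
    last = old_passwords[-1]
    # stem, then the digits, then any trailing punctuation such as "!"
    match = re.fullmatch(r"(.*?)(\d+)(\W*)", last)
    if match is None:
        return None  # no trailing counter to exploit
    stem, number, suffix = match.groups()
    # Preserve zero padding, e.g. "Spring09" -> "Spring10".
    bumped = str(int(number) + 1).zfill(len(number))
    return stem + bumped + suffix


print(guess_next_password(["Hunter1", "Hunter2", "Hunter3"]))  # prints: Hunter4
print(guess_next_password(["Summer2019!"]))  # prints: Summer2020!
```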

Although the comic is titled "Predictive Models", the term {{w|Predictive modelling}} usually refers to computer programs that try to predict outcomes from aggregated data, such as reviewing health records to identify the people most at risk from certain diseases based on weight, prior injuries, etc., before testing directly for the diseases themselves. This is similar to, but not quite the same as, the example in the comic: predictive text uses direct input to predict further input, while predictive modelling uses related input (such as the make and model of a car along with the driver's acceleration patterns) to predict a different output (such as the likelihood of a crash). As the comic suggests, however, both predictive text and predictive modelling could leak information.

Predictive text and its potential to leak unintended information have been parodied on xkcd before, in [[1068: Swiftkey]].

==Transcript==
:[Cueball is sitting in an office chair at a desk typing on a laptop. Above him is the text he writes along with what the predictive text tool suggests, the latter in grey text. The TAB at the end is in a small frame.]
:Cueball typing: Long live the revolution. Our next meeting will be at<span style="color:gray">| the docks at midnight on June 28 [tab]</span>
:Cueball: ''Aha, found them!''

:[Caption below the panel:]
:When you train predictive models on input from your users, it can leak information in unexpected ways.

==Trivia==
*On its original release, the title text was broken. The full text would not display in some browsers, and clicking the comic took you to [https://xkcd.com/%5BAT%2520THE%2520JULY%252028TH%2520MEETING%5D%5Btab%5D <nowiki>https://xkcd.com/[AT%20THE%20JULY%2028TH%20MEETING][tab]</nowiki>], which only showed "404 Not Found".
**The anchor contained invalid HTML: <nowiki><a href=" [AT THE JULY 28TH MEETING][tab] "Cancel the meeting! Our cover is blown.""></nowiki>. This suggests that [[Randall]] did not intend this behaviour.
**The image and title text were later corrected, long before July 28, 2019, further suggesting that it was a simple mistake on Randall's part.
*Some browsers showed only the first part of the title text, "WE WILL ARREST THE REVOLUTION MEMBERS". Examples include Firefox 66 on Windows and Microsoft Edge on Windows 10, and apparently some versions of Firefox and Chrome on GNU/Linux did likewise.
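This page does not establish exactly what broke the markup. One common cause of this kind of bug, assumed here purely for illustration, is an attribute value containing unescaped double quotes, which close a double-quoted attribute early. Python's standard `html.escape` shows the usual fix: with `quote=True`, each `"` becomes `&quot;`. The `<img>` tag below is a simplified, hypothetical example, not xkcd's actual markup.

```python
import html

title_text = ('WE WILL ARREST THE REVOLUTION MEMBERS [AT THE JULY 28TH MEETING][tab] '
              '"Cancel the meeting! Our cover is blown."')

# Unsafe: the first " inside the value would close the title attribute early.
unsafe_tag = f'<img src="predictive_models.png" title="{title_text}">'

# Safe: escape the value before placing it inside a double-quoted attribute.
safe_tag = f'<img src="predictive_models.png" title="{html.escape(title_text, quote=True)}">'

print(safe_tag)
```

The escaped value round-trips through `html.unescape`, so the browser displays the original text unchanged.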

{{comic discussion}}

[[Category:Comics featuring Cueball]]