AI Methodology
Title text: We've learned that weird spacing and diacritics in the methodology description are apparently the key to good research; luckily, we've developed an AI tool to help us figure out where to add them.


The joke in this comic is that the researchers are using artificial intelligence (AI) without understanding how it works, and in doing so are letting AI, rather than people, judge what counts as good research. Their classifier was trained only on the text of methodology sections, which reveals nothing about what actually produced the research results, and that training data may even have been generated from the same codebase. If the classifier was then never tested on data it had not seen, the result could easily be a heavily overfitted model: one that appears perfect on its training examples but makes essentially random predictions on new data. The title text describes this kind of failure and how it gets exploited. For an introduction to machine learning, you can visit .
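
To make overfitting concrete, here is a minimal Python sketch, not anything taken from the comic: a 1-nearest-neighbour "classifier" that simply memorizes its training set, trained on labels that are pure coin flips. The dataset sizes, the seed, and the function names are illustrative assumptions. Scored on its own training data it looks flawless; scored on held-out data it does no better than chance.

```python
import random

def make_dataset(n, n_features, rng):
    """Random feature vectors whose labels are independent coin flips,
    so the features carry no real information about the label."""
    data = []
    for _ in range(n):
        x = tuple(rng.random() for _ in range(n_features))
        y = rng.randint(0, 1)
        data.append((x, y))
    return data

def train_memorizer(train):
    """1-nearest-neighbour model: remember every training example and
    predict the label of whichever stored example is closest."""
    def predict(x):
        def sq_dist(example):
            return sum((a - b) ** 2 for a, b in zip(x, example[0]))
        return min(train, key=sq_dist)[1]
    return predict

def accuracy(model, data):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(model(x) == y for x, y in data) / len(data)

rng = random.Random(0)
train = make_dataset(200, 5, rng)
test = make_dataset(200, 5, rng)
model = train_memorizer(train)

train_acc = accuracy(model, train)  # each point is its own nearest neighbour
test_acc = accuracy(model, test)    # no real signal, so about coin-flip level
print(f"training accuracy: {train_acc:.2f}")  # 1.00
print(f"held-out accuracy: {test_acc:.2f}")   # roughly 0.50
```

This is why models are normally evaluated on a held-out test set: training accuracy alone cannot distinguish genuine learning from memorization.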

This comic shows Cueball giving a presentation. He is reassuring his audience about the validity of his research methodology, which he says is "AI-based". Many problems can arise from an AI-based methodology, such as lingering biases from its training data or a poor algorithm degrading the quality of the investigation.

Cueball seeks to reassure his audience by quantifying the quality of his methodology. He does this by using yet another AI, a classifier, to rate methodologies. This would not actually increase any audience member's confidence, as any flaws in the methodology AI would likely be shared by the rating AI, since both were created by the same team.

This is problematic because the concerns are about his methodology, not about his methodology section. A methodology section is a specific part of a research paper that describes how the research was carried out, so rating it mostly measures the quality of the writing. A good methodology section accurately and clearly explains what the researchers did, but that does not mean the methodology itself was valid. Claiming to have a good methodology section therefore does nothing to address concerns about the research methodology.

Furthermore, the ranking AI rates Cueball's methodology extremely highly, which suggests it may be biased. The chart shows a roughly normal distribution with a single outlier far to the right, marked with an arrow, which presumably represents Cueball's methodology. Because it is such an extreme outlier, its score is probably not an accurate assessment of Cueball's methodology. Alternatively, this could be read as AI 'nepotism', where an AI classifier favors AI-based approaches over others. This type of algorithmic bias is also the subject of 2237: AI Hiring Algorithm. Another explanation is that the x-axis measures something other than how good the methodology is (e.g., the rate of highly significant results). In that case, the fact that Cueball's methodology falls far outside the normal distribution should have been a red flag indicating a problem, but the ranking AI failed to notice the skew or to interpret what it meant. (However, the title text suggests that the x-axis really does measure "quality of methodology", albeit defined by very strange criteria.)
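
The outlier argument can be made concrete with a z-score, which measures how many standard deviations a value lies from the mean. The sketch below uses made-up numbers: a bell curve of scores centred on 50 with standard deviation 10, and a score of 120 for Cueball's paper. Neither the numbers nor the cutoff of 3 come from the comic; the cutoff is only a common rule of thumb.

```python
import random
import statistics

# Hypothetical scores a methodology-ranking classifier might give
# to 500 ordinary papers: roughly bell-shaped around 50.
rng = random.Random(42)
typical_scores = [rng.gauss(50, 10) for _ in range(500)]

# Hypothetical score for the presenter's own paper, far to the right.
our_score = 120.0

mean = statistics.mean(typical_scores)
stdev = statistics.stdev(typical_scores)
z = (our_score - mean) / stdev  # standard deviations above the mean
print(f"z-score of our paper: {z:.1f}")

# Rule of thumb: |z| > 3 is an outlier worth investigating, not celebrating.
if abs(z) > 3:
    print("Outlier: check the scorer before trusting this result.")
```

Here the presenter's paper lands roughly seven standard deviations above the mean; a result that extreme is more likely to indicate a broken or biased scorer than a genuinely exceptional methodology.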

The title text is a joke about overfitting in AI and its consequences. The classifier was likely trained on too small a set of data, so it latched onto irrelevant quirks of its training examples, such as unusual spacing or uncommon diacritics, and learned to associate them with good methodology. The "AI tool" mentioned is akin to an adversarial network, which tweaks bad data in very small ways (here, by adding odd spacing and diacritics) in order to trick its opponent AI into accepting bad data as good data.
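
The title text's scenario can be sketched as a toy: a deliberately overfitted classifier that, purely by accident of its training data, scores text by how many accented letters and double spaces it contains, plus a greedy "tool" that makes tiny edits until the verdict flips. Everything here is hypothetical: the functions `quirk_score`, `classify`, `perturb_until_good` and `strip_quirks`, the 0.05 threshold, and the vowel-accent table. It is a black-box greedy perturbation, a much simpler relative of the gradient-based attacks used against real neural networks, not any actual tool.

```python
import unicodedata

# HYPOTHETICAL overfitted "methodology classifier". Assume its good training
# examples happened to contain more accented letters and double spaces, so
# that spurious pattern is all it learned to reward.
def quirk_score(text):
    """Fraction of the text made up of diacritics and double spaces."""
    decomposed = unicodedata.normalize("NFD", text)  # splits "é" into "e" + accent
    diacritics = sum(1 for ch in decomposed if unicodedata.combining(ch))
    double_spaces = text.count("  ")
    return (diacritics + double_spaces) / max(len(text), 1)

def classify(text, threshold=0.05):
    """Label text 'good' once its quirk score reaches the (arbitrary) threshold."""
    return "good" if quirk_score(text) >= threshold else "bad"

# Accent table used by the hypothetical "AI tool" below.
ACCENTS = {"a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú"}

def perturb_until_good(text, threshold=0.05):
    """Greedy black-box attack: walk through the text making at most one tiny
    edit per character (accent a vowel or double a space), stopping as soon
    as the classifier's verdict flips to 'good'."""
    chars = list(text)
    for i, ch in enumerate(chars):
        if classify("".join(chars), threshold) == "good":
            break
        if ch in ACCENTS:
            chars[i] = ACCENTS[ch]
        elif ch == " ":
            chars[i] = "  "
    return "".join(chars)

def strip_quirks(text):
    """Undo the edits: drop combining accents and collapse runs of spaces."""
    decomposed = unicodedata.normalize("NFD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.split())

original = "We randomly assigned participants to two groups and compared outcomes."
tweaked = perturb_until_good(original)
print(classify(original), "->", classify(tweaked))  # bad -> good
print(strip_quirks(tweaked) == original)             # True: same words, same meaning
```

The edits change nothing about what the sentence says, which is the point: a classifier that has learned a spurious feature can be steered by manipulating that feature alone.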


[Cueball is standing on a podium in front of a projection on a screen and points with a stick to a bar chart histogram with a bell curve to the left and a single bar to the far right marked with an arrow.]
Cueball: Despite our great research results, some have questioned our AI-based methodology.
Cueball: But we trained a classifier on a collection of good and bad methodology sections, and it says ours is fine.

