2239: Data Error

Explain xkcd: It's 'cause you're dumb.
Revision as of 15:40, 10 December 2019 by (talk) (Explanation)
Data Error
Title text: Cyanobacteria wiped out nearly all life on Earth once before, and they can do it again!


Megan is frustrated that a data error invalidates her research, which was ready for publication. Black Hat suggests two options: redo her analysis and share the correct results, trying to extract some value from the research; or, as befits our classic classhole, destroy the evidence, build a superweapon and dominate the world. She seems excited about this idea (while being well aware that it isn't a serious suggestion), and proclaims that people should fear her algae, which would presumably be her superweapon. She then pretends to remember that her original result was incorrect because of the data error, although she knew all along that they are merely normal algae.

In fact, destroying the evidence, hiding the error and publishing the wrong results as if they were right is what a dishonest scientist would do in such a situation. This is what many readers would expect a dishonest character like Black Hat to suggest in panels two and three. However, the unexpected turn in the middle of the last panel leaves scientific misconduct behind and goes straight to pure supervillainy. It appears Megan is being a wise-ass, knowing full well her algae won't threaten Earth, especially since they turned out to be normal algae.

The title text refers to the Great Oxidation Event, when prokaryotic photosynthetic organisms built up oxygen in Earth's atmosphere for the first time and most organisms, which weren't adapted to oxygen, went extinct. That suggests that algae may be somehow dangerous - although cyanobacteria, which are colloquially referred to as "blue-green algae", are not considered to be true algae by many scientists, who restrict the term to eukaryotes.


[Black Hat and Megan stand facing each other.]
Megan: I can't believe this data error invalidates a year and a half of my research.
Megan: I was about to publish.

[Black Hat and Megan stand facing each other. Black Hat has his hands raised slightly.]

Black Hat: Don't panic. You have two options.
Megan: Yeah?
[Closeup shot of Black Hat holding one hand up.]
Black Hat: 1) Redo your analysis and share whatever results you can, whether positive or negative. It's disappointing, but these things happen.
[Zoom out on Black Hat and Megan. Black Hat has closed his fist. Megan holds her arms up in the air.]
Black Hat: 2) Destroy the evidence. Use your materials and research methods to build a superweapon. Conquer Earth and rule with an iron fist.
Megan: Tremble before my anomalously productive algae!
Megan: Except the anomaly was an artifact.
Megan: Tremble before my normal algae!



Randall's comics are usually relevant to recent events on or near the day comics are posted. I was wondering if this Data Error comic might be referencing some recent event, some data error at NASA or something. Does anyone know what it might be in reference to? 21:13, 9 December 2019 (UTC) ... Sorry, forgot to sign in. Saibot84 21:14, 9 December 2019 (UTC)

I'm not aware of anything in the news. However, this is not the first time Randall has commented on research publication in a comic, so I suspect it's just another in that series. It seems obvious that he feels the first option is the appropriate choice, and the second option is the joke. Ianrbibtitlht (talk) 21:22, 9 December 2019 (UTC)
I believe there was a relatively recent issue where a Python script used for processing data sets made assumptions about the order in which data files would be returned by the host operating system that turned out to not always be true, throwing the results of several analyses off. Could he be referring to that? The scripts in question were used for obtaining results in cyanobacteria studies... https://arstechnica.com/information-technology/2019/10/chemists-discover-cross-platform-python-scripts-not-so-cross-platform/ 15:03, 13 December 2019 (UTC)
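For anyone curious what that kind of file-ordering bug looks like, here is a minimal sketch. It is not the script from the linked article; the file names and the `list_data_files` helper are made up for illustration. The only fact it relies on is that Python's `os.listdir` returns names in arbitrary order, so any analysis that pairs files by position should sort them first.

```python
import os
import tempfile


def list_data_files(directory, suffix=".out"):
    """Return matching file names in a deterministic (sorted) order.

    os.listdir() makes no promise about ordering, so we sort explicitly
    instead of trusting whatever order the operating system hands back.
    (Hypothetical helper, not taken from the scripts in the article.)
    """
    return sorted(name for name in os.listdir(directory) if name.endswith(suffix))


def demo():
    """Create a few placeholder files and list them in sorted order."""
    with tempfile.TemporaryDirectory() as tmp:
        # Create the files in a deliberately scrambled order.
        for name in ["run_03.out", "run_01.out", "run_02.out", "notes.txt"]:
            with open(os.path.join(tmp, name), "w") as handle:
                handle.write("placeholder\n")
        return list_data_files(tmp)


if __name__ == "__main__":
    print(demo())
```

Without the `sorted(...)` call the same code could return the files in a different order on a different machine, which is roughly the kind of silent discrepancy described above.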

I think the stickwoman is not "excited" but sarcastic, although you can't be sure in text. It is a joke based on the discrepancy in capabilities between real scientists and fictional mad scientists. 22:23, 9 December 2019 (UTC)

I agree, Megan is being a smart-ass 15:46, 10 December 2019 (UTC)
For a start, "mad scientists" are usually more like mad engineers ... you can't get world domination by researching something and writing a paper about it, you need to USE that research, usually by building something. -- Hkmaly (talk) 23:10, 9 December 2019 (UTC)
Are you suggesting scientists can't build things? I don't actually know, since I'm an engineer! Ianrbibtitlht (talk) 23:43, 9 December 2019 (UTC)

What is a data error in general? Explain the term to me :) 02:39, 10 December 2019 (UTC)

The discovery that the data you used was sampled below the Nyquist rate pretty much kills your thesis until you can get data that was properly acquired. All your results will be contaminated with artifacts produced by the sampling rate, rather than by variations in the quantity that you imagined you were observing. 12:37, 10 December 2019 (UTC)
I thought I knew what a data error is, but after that reply I'm not sure - although I'm almost sure that it did not help the one asking the question ;-) --Kynde (talk) 15:55, 10 December 2019 (UTC)
Well, that is a type of data error (bad sampling technique), but not the only type. The data itself could have had corruption problems, such as maybe some rogue second species of algae contaminated the samples, etc. 21:39, 10 December 2019 (UTC)
Also, malfunctioning or miscalibrated measuring equipment (transducers, cabling, etc.) would be another type of data error. Ianrbibtitlht (talk) 22:17, 10 December 2019 (UTC)
More about data errors. Yes, I listed just one kind, and a fellow I knew had to re-do his thesis because of that particular error. The careful researcher investigates many possible sources of error. The poor researcher simply throws away the data points that do not match his preconceptions. HERE WE GO, enumerating some errors: (1) Noise from physically sloppy equipment. (2) Lack of calibration of measuring device. (3) Device loses calibration over time. (4) Manually recorded data errors, such as transposed digits. (5) Incorrect assumptions of linearity in the design of measurement. (6) Failure to record crucial environmental parameters. [That's just six minutes of thinking. Surely there are others.]
Yes, I omitted an important source of error: Sabotage! You're not paranoid, someone really is messing with your data. 01:34, 11 December 2019 (UTC)
So, a data error is an error in your data, instead of in your analysis? 11:35, 11 December 2019 (UTC)

If it were merely an error in analysis (see the recent mess with python, [1] ), then you simply fix your analysis code and re-run. So, yes, a "data error" means the original data values were flawed or invalid or whatever. Most likely sabotage inflicted by sophons. Cellocgw (talk) 12:29, 11 December 2019 (UTC)
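To make the sampling-rate error mentioned earlier in this thread a bit more concrete, here is a small sketch of aliasing. The numbers are made up for illustration (a 9 Hz sine sampled at 10 Hz), and `sample_sine` and `alias_frequency` are hypothetical helper names. Sampling slower than twice the signal frequency makes the fast signal indistinguishable from a slow one, so the "variation" in the data is an artifact of the sampling.

```python
import math


def sample_sine(freq_hz, rate_hz, n_samples):
    """Sample sin(2*pi*freq*t) at t = k / rate for k = 0 .. n_samples - 1."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz) for k in range(n_samples)]


def alias_frequency(freq_hz, rate_hz):
    """Apparent frequency of a sine at freq_hz when sampled at rate_hz."""
    folded = freq_hz % rate_hz
    return min(folded, rate_hz - folded)


if __name__ == "__main__":
    fast = sample_sine(9, 10, 20)  # 9 Hz signal, sampled too slowly at 10 Hz
    slow = sample_sine(1, 10, 20)  # 1 Hz signal, sampled adequately at 10 Hz
    # The undersampled 9 Hz samples are a sign-flipped copy of the 1 Hz samples.
    print(all(abs(f + s) < 1e-9 for f, s in zip(fast, slow)))
    print(alias_frequency(9, 10))
```

Once the samples are taken there is no way to tell the two signals apart, which is why the only real fix is to acquire the data again at a higher rate.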

I'm happy that he said "two options" instead of "two choices", which of course would involve around four options. Watching the horrific Star Trek: Discovery for completist purposes, I was annoyed when someone said "you have only one alternative" when they meant "you have only one option". — Kazvorpal (talk) 18:39, 22 January 2020 (UTC)