2341: Scientist Tech Help

Revision as of 01:16, 4 August 2020 by (talk) (Explanation: wlink)
Title text: I vaguely and irrationally resent how useful WebPlotDigitizer is.



In this comic, Randall pokes fun at stereotypes of scientists that "tech people" hold.

In the first panel, Randall presents an idealized view of the work tech people do. Much of machine learning and data science hinges on finding a pattern in a given data set (through regression or classification), but in practice most of the effort goes into data cleaning and preparation; the rest can largely be done with preexisting implementations. Pattern-finding tasks like these are what tech people expect to be asked to perform.

The second panel, however, presents a different reality. Because wasps had infested the lab, the scientists had to take photos of their equipment through the window. The resulting data-format problem is far more basic than the ones tech people usually expect: turning an image into a spreadsheet, rather than, say, choosing between pixel-wise and vertex-based segmentation.

Polaroid is a brand of instant camera, though "Polaroid" is often used to refer to instant cameras and their prints in general. Excel refers to Microsoft Excel, a spreadsheet program.

The title text refers to WebPlotDigitizer, a tool that extracts the underlying data from visual displays such as graphs and charts. It could solve the scientists' problem by recovering the data from the photos taken of the equipment. Randall acknowledges the usefulness of the tool but also expresses some dislike for it, perhaps out of a feeling that it is too powerful and leaves him little work to do.
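The core idea behind recovering data from a picture of a graph can be illustrated with a small Python sketch. This is not WebPlotDigitizer's actual code; it only assumes the simplest case, where the axes are linear and line up with the image edges, and where someone has already noted the pixel positions of two known values on each axis. The names `AxisCalibration`, `digitize`, and `to_csv`, and all the numbers, are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class AxisCalibration:
    """Two reference marks on one axis: a pixel position and the value printed there."""
    pixel_a: float
    value_a: float
    pixel_b: float
    value_b: float

    def to_value(self, pixel: float) -> float:
        # Linear interpolation (or extrapolation) between the two reference marks.
        span_px = self.pixel_b - self.pixel_a
        return self.value_a + (pixel - self.pixel_a) * (self.value_b - self.value_a) / span_px


def digitize(points_px, x_axis, y_axis):
    """Convert (pixel_x, pixel_y) pairs picked off the image into (x, y) data values."""
    return [(x_axis.to_value(px), y_axis.to_value(py)) for px, py in points_px]


def to_csv(points):
    """Render the data as CSV text, a format a spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["x", "y"])
    writer.writerows(points)
    return buffer.getvalue()


# Hypothetical calibration: x = 0 at pixel 50 and x = 10 at pixel 450.
x_axis = AxisCalibration(pixel_a=50, value_a=0.0, pixel_b=450, value_b=10.0)
# Image rows count downward, so y = 0 sits at a larger pixel row than y = 3.
y_axis = AxisCalibration(pixel_a=400, value_a=0.0, pixel_b=100, value_b=3.0)

# Hypothetical pixel positions of three points on the photographed curve.
clicked = [(50, 400), (250, 250), (450, 100)]
data = digitize(clicked, x_axis, y_axis)
csv_text = to_csv(data)
print(csv_text)
```

A real photo taken through a window would also be skewed and distorted, which this sketch makes no attempt to correct.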



There are two panels. The one on the left is titled "What tech people think scientists need help with:" (Cueball A, Ponytail, and Megan are facing Cueball B and Hairbun.)

Ponytail: Please- our data, it's too complex! Can your magical machine minds unearth the patterns that lie within?

(Cueball B raises his finger.)

Cueball B: We shall marshal our finest algorithms!

The one on the right is titled "What scientists actually need":

(The two Cueballs, Ponytail, Megan, and Hairbun are in the same position as before.)

Ponytail: For a few weeks in June, the lab was infested by wasps, so we had to take pictures of the equipment through the window. How do you get graphs from a Polaroid photo into Excel?



First. Goodbye, world! (talk) 23:19, 3 August 2020 (UTC)

     But more importantly, I added a transcript and added definitions for a Polaroid and Excel. Also, how should I deal with multiple Cueballs in the transcript? Goodbye, world! (talk) 23:35, 3 August 2020 (UTC)
I don't think it is 2 Cueballs. I think the one on the right is Cueball and I don't recognise the other one. He is drawn slightly differently, he's got a bit of a butt-head (crack-head?). Xseo (talk) 07:23, 4 August 2020 (UTC)

I know of a team whose data was in the form of images - tens of thousands of them. Somehow during a pre-processing step they lost the EXIF data for the image files, which held the only digital link between the image files (which had camera-assigned names like Img237856.png) and their science, which needed things like the date and time of each image. Fortunately the image itself had the date and time in a banner across the bottom 100 pixels. Managed to read the banner using OCR with Tesseract. Not so very far off the thrust of this comic! 00:08, 4 August 2020 (UTC)
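For anyone curious, the banner-reading step described above might look roughly like the Python sketch below. It is not that team's code: it assumes the Pillow and pytesseract packages plus a local Tesseract install, and the function names `banner_box` and `read_banner` are made up for illustration. Only the 100-pixel banner height comes from the comment.

```python
def banner_box(width, height, banner_height=100):
    """Return the (left, upper, right, lower) crop box for a strip along the bottom edge."""
    top = max(0, height - banner_height)
    return (0, top, width, height)


def read_banner(path, banner_height=100):
    """Crop the bottom banner out of an image file and return the text OCR finds in it."""
    # Imported lazily so banner_box can be used without the OCR dependencies installed.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        banner = img.crop(banner_box(img.width, img.height, banner_height))
        return pytesseract.image_to_string(banner).strip()
```

Calling `read_banner("Img237856.png")` would return whatever text Tesseract recognizes in the banner; turning that into a date and time depends on how the camera formats it, which the comment does not say.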

I feel old when I know that Polaroid was not a disposable camera; it was an instant camera, meaning that the picture was taken, the film was slowly ejected from the camera body and you held the picture as it developed before your eyes. There were one-time use cameras, or "disposable" cameras, that were made cheaply and the camera was sent in for processing. Yes, probably incomprehensible to one so young to not know what a rotary dial desk phone (or wall phone) was. Doubting Thomas (talk) 00:41, 4 August 2020 (UTC)

I think the resentment stems from the ugly truth that such a tool is needed in the first place? Is that a possibility? 01:48, 4 August 2020 (UTC)

Don't the scientists own the data since they collected it on their own equipment? Nk1406 (talk) 13:51, 4 August 2020 (UTC)

"As you can see from the graphs, we detected significant Gravity Wave events on average once every 30-40 days for the whole two years of the observations, except for this short period where we seemed to get a consistently low level of background noise hum, that we have yet to fully connect with any of our existing astrophysical theories..." 10:17, 4 August 2020 (UTC)

A serious suggestion: instead of WebPlotDigitizer, if you want to grab data off a chart image, get the Java-based DataThief, https://datathief.org/ . It's fast, very customizable, and can handle a certain amount of image distortion, e.g. X and Y axes not perpendicular in the crappy image your uncle sent you. Cellocgw (talk) 10:42, 4 August 2020 (UTC)

I thought that the title text meant that WebPlotDigitizer is being recommended in this situation, and that past recommendations for similar problems were ignored. They irrationally hold out hope that the software will be used and remembered by the scientists. Operating the software is also not the interesting challenge the tech people were hoping to be presented with. 18:10, 4 August 2020 (UTC)

Very shortly after this comic published I started seeing several articles about how geneticists recently renamed several genes so they would stop auto-formatting as dates in Excel. I wonder if Randall knew this before he drew the comic, and it was commentary on that, or if by amazing coincidence the world spewed out the perfect example of the scenario he was pointing out after the fact. For example, this Engadget article. 19:07, 9 August 2020 (UTC)

I have worked in many labs where an exposure is taken, or a photo even, or instruments are analog readouts that must be digitized. I have only used ImageJ for this. I realize it says "graphs" (and also that the photos I took were with a digital camera, not a Polaroid) but there are examples of physical graphs - an old-school temp tracer, for instance. Or are those all charts? I'm not so pedantic these days, just a dumb labrat. Anyway, sorry, I don't know how to add comments here, sorry for probably screwing something up (unregistered user - hi, I'm Gian!) 16:00 hundred hours, 17 August 2020 (I'm using local US west coast time, lol)