2304: Preprint

Revision as of 08:14, 17 October 2022 by (talk) (Explanation: Further fixed (Oxford Commas are more trouble than they are worth, don't bother with them.)
Title text: DOWNSIDES: Adobe people may periodically email your newsroom to ask you to call it an 'Adobe® PDF document,' but they'll reverse course once they learn how sarcastically you can pronounce the registered trademark symbol.


This comic is about how the media reports on research papers that have not been peer reviewed. The newscaster depicted is attempting to report breaking news based on information in a study; however, the study in question has not been formally published. This leads to uncertainty on the part of either the newscaster, Blondie, or her scriptwriters as they try to determine how to refer to this study, represented here by the alternative introduction lines that have been scribbled out.

Randall suggests that, instead of explaining that the paper is a preprint, is unpublished or was submitted to a preprint server without peer review, the newscaster could simply say it is a PDF. PDF (Portable Document Format) is a file format for documents, developed by Adobe to be independent of application software, hardware and operating systems. Randall proceeds to list several benefits of using "PDF":

  • The use of terms such as "preprint" makes a statement about the paper's publication status, which might be based on inaccurate information or even be in the process of changing as the news goes out; in contrast, proclaiming it to be a PDF document is an unambiguously factual statement. Additionally, "preprint", "peer review" and related terminology are not familiar to most people who are not academics.
  • Referring to the PDF document directly also prevents people from assuming that the authors know what they are doing and have verified their work - or, conversely, that the information is automatically false on the grounds that it hasn't yet been officially published.
  • The comic finishes with a jab at PDF itself, proclaiming that no ordinary person would voluntarily choose a PDF file as their medium of communication. Ordinary people use the default file format of whatever word processor or text editor they use, but PDF files are not very convenient to edit, so they're generally only used for final versions of documents that are ready to print or distribute, following a dedicated export or conversion process.
    • This is similar to Randall's declaration in the comic 1301: File Extensions that ".pdf" is the second-most-trustworthy file extension. As it happens, he says that the most trustworthy file extension is ".tex", so perhaps the news anchor could specify that the PDF was "compiled from LaTeX" (if this is true) to imply additional legitimacy.

The title text makes fun of what is incorrectly believed to be the official name of PDF; it is now an open international standard (ISO 32000-1), and the only PDF files that are "Adobe Acrobat files" or "Adobe PDF" files are those created using Adobe Systems' software. Further, Adobe does not use the ® designation in conjunction with PDF. (See Adobe Trademark Guidelines, 1 Nov. 2014) Adobe trademark guidelines were also made fun of here.

Since so many applications can create and even edit PDF files, implying a connection with Adobe every time someone talks about one is preposterous, and one could sarcastically pronounce the registered trademark symbol to show contempt for the fact that it is a registered trademark.

This comic was possibly produced in response to the preprint study "COVID-19 Antibody Seroprevalence in Santa Clara County, California", Bendavid et al., which was posted online in mid-April 2020 before peer review. The authors of the paper went on a media blitz immediately after posting it, appearing on major cable news networks and writing editorials in major publications, claiming that their results show that COVID-19 is not nearly as bad as thought and that most people are already immune to it. Other scientists have pointed out that, if the very high false-positive rate of the test used and the sample bias of their methodology (testing only people who self-report as sick) are properly considered in the analysis, the data collected is of such poor quality as to be meaningless, with properly applied error bars on the number of actual cases in the general population extending below 0. Nonetheless, many less-scientifically-literate politicians, media figures, and protest groups continue to use the much-criticized study as proof that COVID-19 should not be considered an emergency, and that quarantine measures should be cancelled. As of May 11, 2020, the study has still not passed peer review, nor undergone any revisions since the first posting.


[Blondie as a newscaster is sitting at a desk. To the right is a screen with text; the bottom word is on a thin line, with the letters in white. Just above her head is what she says as the opening line of her news story. Above this text is more text, which has been grayed out and scribbled over. These are three alternative opening lines which she did not use, indicating revisions to her script.]
Blondie [gray and scribbled out]: According to a new preprint…
Blondie [gray and scribbled out]: …An unpublished study…
Blondie [gray and scribbled out]: According to a new paper uploaded to a preprint server but which has not undergone peer review…
Blondie: According to a new PDF…
Inset graphic: Breaking news
[Beneath the panel is a long caption consisting of an underlined headline with three bulleted points beneath it:]
Benefits of just saying "a PDF":
  • Avoids implications about publication status
  • Immediately raises questions about author(s)
  • Still implies "this document was probably prepared by a professional, because no normal human trying to communicate in 2020 would choose this ridiculous format."


  • This comic made Blondie the character who has most often appeared as a news anchor, as this became her ninth appearance in this role.



I was going to mention the TeX format(/family), but someone got in there before me. So how about if it's a .wp4 document? ;) 01:40, 9 May 2020 (UTC)

But now the LaTeX reference is removed, anyway. 16:14, 9 May 2020 (UTC)

Why is this comic labeled as a Saturday comic? I don't know what timezone you use, but it was posted Friday, well before midnight UTC. 02:15, 9 May 2020 (UTC)

I'm pretty sure that's just an error. The date for the comic in the archive is "2020-5-8", which is today (Friday). Comic #2303 correctly has the "Wednesday comic" category, and the archive lists its date as 2020-5-6 (which is Wednesday). ...And I've fixed it now. The category is automatically generated based on the date listed in the Template:Comic infobox at the top of the article; someone incorrectly entered it as "May 9, 2020" instead of "May 8, 2020". --V2Blast (talk) 02:53, 9 May 2020 (UTC)
'Someone' == DgbrtBOT; and thus probably based off the time() it thinks it is, upon autocreating the base article, rather than any human erring. Depending on the home system's timezone, it probably was Saturday for DB, if not for Randall. Maybe an offset/correction/relocali(s|z)ation should be put into the code, but it seems to normally work out Ok and this comic might have been just over a threshold... (edit: Wiki time in history seems to be UTC, for me at least - I'm in UTC+1/BST but as an IP-editor I haven't made any setting changes to my personal login that I don't have. DgbrtBOT piped up at 22:48, which at UTC+2 or more (Central Europe Daylight Savings, which matches what I recall of knowing about that entity, or anywhere more Easterly) would have been 'tomorrow', and I didn't spot the new comic until at least those dozen minutes after that which occurred before my own clocks ticked past midnight. Given that Randall is (usually?) in UTC-5, or UTC-4 when daylight savings is established, maybe Dgbrt needs a special offset of -6 hours (or go directly via localtime() with the best current known Munroevian locale specified) in calculating things. Or we can let the community smooth these things out like we just did when a possible late-evening update causes this to be an issue?) 03:17, 9 May 2020 (UTC)
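The day-of-week arithmetic in the comment above can be checked with a short Python sketch. It is only an illustration, not DgbrtBOT's actual code: the function name local_day is made up here, the 22:48 timestamp is assumed to be UTC as the comment suggests, and fixed hour offsets are used instead of real time zone rules, so daylight saving changes are not handled.

```python
from datetime import datetime, timedelta, timezone

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def local_day(ts_utc: datetime, offset_hours: int) -> str:
    """Return the weekday and date of a UTC timestamp at a fixed UTC offset.

    Hypothetical helper for illustration; a fixed offset ignores DST rules.
    """
    local = ts_utc.astimezone(timezone(timedelta(hours=offset_hours)))
    return f"{DAY_NAMES[local.weekday()]} {local.date().isoformat()}"


# Time at which the bot is said to have created the page (assumed to be UTC).
bot_post_utc = datetime(2020, 5, 8, 22, 48, tzinfo=timezone.utc)

print(local_day(bot_post_utc, 0))    # wiki history time (UTC)
print(local_day(bot_post_utc, 2))    # Central European Summer Time
print(local_day(bot_post_utc, -4))   # US Eastern Daylight Time
```

At UTC and at UTC-4 the timestamp falls on Friday 8 May, while at UTC+2 it has already rolled over to Saturday 9 May, which is consistent with the wrong category being generated on a machine running on Central European time.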

Is "sarcastically pronouncing the registered trademark symbol" meant as pronouncing it "arr" in the way pirates talk? Bischoff (talk) 15:00, 9 May 2020 (UTC)

I would expect professional news anchors can come up with something even more sarcastic. -- Hkmaly (talk) 01:08, 10 May 2020 (UTC)
Perhaps they'd go with something like "R in a circle" or "Circled R" (pronounced "Circledar"). PotatoGod (talk) 17:27, 10 May 2020 (UTC)
Perhaps we can use a little of both and create a new standard for sarcastically pronouncing it as "circled, arrr!" Iggynelix (talk) 12:05, 11 May 2020 (UTC)
ReGiStErEd TrAdEmArK! 20:34, 11 May 2020 (UTC)
I thought it was meant to be read as "Ado-bear" - but then again, English is not my first language:)

In 2020 I use pdf to put documents with tables onto a website, because html exports from editors are voluminous and brittle. 10:32, 10 May 2020 (UTC)

As someone who regularly takes tables from PDF in order to put them into spreadsheets for further use, some people don't do me any favours by that method. Among the problems, if the table setter didn't pay attention to the column widths then the copied-out text of two adjacent cells that don't appear to overlap each other will interlace at a character level and need editing back to separate entities. And then there's the inconsistencies of Header rows atop the table and/or atop the next newpage the table splits over. I could run a quick script on (X)HTML tables, and get it perfectly for my needs. CSV, or even TabSV, would actually be my preferred transport format (i.e. no format, just pure layout without even spanned/merged cells, and I can redo what needs redoing on the final redo), but I can't ever seem to get them to do that for me despite having the data almost in that form prior to the PDFing... Grrrr. 11:30, 10 May 2020 (UTC)
I feel your pain. I receive pdf documents from a financial professional, where an A4 landscape page seems to have about five two-column-wide tables side-by-side, and I'm still deciding what kind of manipulation to do, to get it into CSV and do some analysis. 10:21, 12 May 2020 (UTC)
If the PDFing hasn't ruined the groupings/precedence, like it often does, try mouse-selecting each table, to copy and paste into notepad or equivalent. Sometimes that works well enough to create tab delimited elements (other times, it line-feeds between columns as well as rows, but still can be reconstructed) and then that'll paste into a spreadsheet (or be parsable with a script) better than any Paste Special (using "no textformat" options?) straight into a grid. Sometimes you need to fiddle a bit with the notepad text, but depending on the data that might be doable with a few choice find+replace runs, perhaps upon consecutive table-pastings to save you time repeating yourself. Or not. 00:08, 13 May 2020 (UTC)
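For the case where the paste into notepad does come out tab-delimited, the "parsable with a script" step might look something like the following Python sketch. It assumes one table row per line with tabs between cells; the function name tsv_to_csv and the sample text are invented for illustration, and the case where the paste line-feeds between columns is not handled.

```python
import csv
import io


def tsv_to_csv(pasted: str) -> str:
    """Convert tab-delimited text (one table row per line) to CSV text.

    Hypothetical helper: blank lines are skipped, and cells containing
    commas are quoted by the csv module.
    """
    rows = [line.split("\t") for line in pasted.splitlines() if line.strip()]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Made-up sample of what a table copied out of a PDF viewer might look like.
sample = "Item\tQty\n\nWidget, large\t3\n"
print(tsv_to_csv(sample))
```

The result can then be opened directly in a spreadsheet; any find+replace clean-up mentioned above would still need to happen on the pasted text first.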

I think Randall's last point (no unprofessional humans use PDFs in 2020) is very wrong. Especially due to the coronavirus, all college classes have switched to online assignment submissions, and the teachers only accept PDF submissions (although, annoyingly, they give the original template files in .doc format!) I would NOT trust random college student's assignment submissions as a reputable information source! PotatoGod (talk) 17:22, 10 May 2020 (UTC)