# Difference between revisions of "Talk:2294: Coronavirus Charts"


## Latest revision as of 15:13, 17 April 2020

It must be because there aren't any numbers along the axes. 172.69.34.104 23:53, 15 April 2020 (UTC)

I want to know if this is a random sketch with silly labels, or if Randall looked up actual data to plot it. It seems to be a combination of 4 metrics which might be reported somewhere (search popularity, death rate, total reported cases, and number of tests performed). I suspect there aren't many countries/regions for which all 4 are available, but it's conceivable that someone's published enough stats to draw this crazy plot. ¬Angel (talk) 01:39, 16 April 2020 (UTC)

- What would negative results in a Google search be? How do you make them a graph axis? I think it's just random labels on graphs. --Lupo (talk) 05:12, 16 April 2020 (UTC)
- It doesn't say negative test results for a Google search. It's the number of people who've tested negative for the disease, divided by the number of people who've searched Google for it. I'm moderately surprised that nobody's yet started a list of links to various data sources that could be used to plot this graph. Does Google provide per-country search frequencies? ¬Angel (talk) 09:34, 16 April 2020 (UTC)
- [Google Trends](https://trends.google.com/trends/explore/GEO_MAP/1587034200?hl=en-US&tz=420&date=today+3-m&q=covid&sni=3) is always normalized so that the data returned is in [0, 100], and denormalizing out of relative values back to raw numbers is almost impossible. The best you can do is get a unitless proportion by comparing to a second search term chosen as one which doesn't vary much over time. 172.68.142.203 10:54, 16 April 2020 (UTC)
- From the docs, looks like that data is simply scaled. "A value of 50 means that the term is half as popular [as its most popular day]". Using that 0-100 number as if it were an actual number of people should give the same graph, just with the units on the X-axis offset by some value. Positioning the graphs relative to each other would be harder, as the "Interest by region" chart doesn't follow the same rules; we're lacking good data for the ratio between one country and another. Angel (talk) 13:48, 16 April 2020 (UTC)

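
The scaling argument above can be checked numerically. This is a minimal sketch, assuming a logarithmic x-axis and assuming (as the quoted docs describe) that Trends reports each day as 100 × that day's searches ÷ the peak day's searches, ignoring Trends' rounding to whole numbers. The daily counts are invented for illustration, not real data. Swapping raw searches for the 0-100 index should move every point by the same amount, log10(peak / 100).

```python
import math

# Hypothetical daily figures for one country (illustration only, not real data).
negative_tests = [1_200, 3_500, 8_000, 15_000]
raw_searches = [40_000, 90_000, 250_000, 200_000]

# Assumed Trends-style normalization: the peak day maps to 100 and every other
# day is scaled proportionally (real Trends data is also rounded; ignored here).
peak = max(raw_searches)
trends_index = [100 * s / peak for s in raw_searches]

# x-coordinate on a log axis: log10(negative tests / searches).
x_raw = [math.log10(n / s) for n, s in zip(negative_tests, raw_searches)]
x_trends = [math.log10(n / t) for n, t in zip(negative_tests, trends_index)]

# Every point shifts by the same constant, so the curve's shape is unchanged;
# only the labels on the x-axis are offset.
offsets = [b - a for a, b in zip(x_raw, x_trends)]
print(offsets)
```

The catch Angel mentions still applies: the peak (and so the offset) differs per country, so this sketch says nothing about where one country's curve sits relative to another's.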

Is the y-axis *(death_today + cases_aweekago)/capita* or *death_today + (cases_aweekago/capita)*? This would hugely affect the weighting of the two terms. (Parentheses in the second interpretation are for clarity only; I know they change nothing mathematically.) 172.69.54.9 09:03, 16 April 2020 (UTC)

- Perhaps it is intentionally ambiguous to support the main point about bad charts. 172.68.142.203 10:54, 16 April 2020 (UTC)
- I assumed the latter, but the page here seems to assume the former. Either way, one of the two terms will dwarf the other. Angel (talk) 13:48, 16 April 2020 (UTC)
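
To see how much the reading matters, here is a sketch evaluating both interpretations of the y-axis label side by side. The inputs are hypothetical numbers for one country on one day, and "per capita" is taken to mean "divided by population"; neither choice comes from the comic.

```python
# Hypothetical inputs for one country on one day (illustration only).
deaths_today = 500
cases_a_week_ago = 30_000
population = 60_000_000

# Reading 1: (death_today + cases_aweekago) / capita
reading_1 = (deaths_today + cases_a_week_ago) / population

# Reading 2: death_today + (cases_aweekago / capita)
reading_2 = deaths_today + cases_a_week_ago / population

print(f"reading 1: {reading_1:.6f}")  # a small per-person rate
print(f"reading 2: {reading_2:.6f}")  # essentially the raw death count
```

In reading 2 the per-capita case term contributes well under a thousandth, so the curve would track the daily death count almost exactly, which is the "dwarfing" described above.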

The 19th COVID-19 comic, almost all in a row... :-) --Kynde (talk) 12:40, 16 April 2020 (UTC)

I tried my hand at graphing the data for the United States in [this spreadsheet](https://docs.google.com/spreadsheets/d/1W1ttxu9Dths5uOLOzk7VHd78hXG0EgeMkW5TCtdgtqw/edit?usp=sharing). If anybody is motivated enough to add data from other countries, go ahead. As it is, this data doesn't look anything like what Randall graphed, which makes me think that he just made up the lines. 172.68.174.82 16:42, 16 April 2020 (UTC)

- [OH NO!](https://imgur.com/a/hHc1j7S) 172.68.143.96 18:43, 16 April 2020 (UTC)
- Well, since the x axis doesn't graph time, there's no reason for the trend lines to be functions of x; he just chose to draw them that way. Both x and y are independent functions of t. 172.68.174.70 19:11, 16 April 2020 (UTC)
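
The point that both axes depend on time can be sketched as a parametric curve: compute x(t) and y(t) separately for each day and plot the pairs. With the made-up series below (not real data), x rises and then falls while y keeps growing, so the resulting curve doubles back and is not a function of x.

```python
# Hypothetical daily series (illustration only); list index = day t.
x_by_day = [1.0, 2.0, 3.0, 2.5, 2.0]  # e.g. a ratio that rises, then falls
y_by_day = [0.1, 0.4, 0.9, 1.5, 2.2]  # e.g. a quantity that keeps growing

# Each plotted point is (x(t), y(t)); time itself is on neither axis.
points = list(zip(x_by_day, y_by_day))


def is_function_of_x(pts):
    """Return True if no x value is paired with two different y values."""
    seen = {}
    for x, y in pts:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True


print(points)
print(is_function_of_x(points))  # False: x = 2.0 occurs on days 1 and 4 (0-based)
```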

I suddenly wondered if the graph means negative test results to date, or the new ones returned today. Same for the Google results, I guess. The Y-axis explicitly says it's talking about the total number of cases and today's death count, but the X-axis doesn't specify either way for its values. And then that gave me the idea that "total" on the Y-axis might actually mean "worldwide". So now I'm reading the Y-axis label as (today's deaths in $country) + (worldwide infection count / population of $country). Maybe that makes the graph more useful. Angel (talk) 22:36, 16 April 2020 (UTC)

So did this comic not come out on 4/15 or is that just me? It seemed like all of yesterday was still the Conway Memorial comic. 172.69.63.167 22:48, 16 April 2020 (UTC) Acolyte

- I thought so too! Is this the first time in ages that Randall missed a day? Maybe someone wants to add this to a trivia section. -- //gir.st/ (talk) 23:01, 16 April 2020 (UTC)
- I saw this comic on 4/15 (late in the afternoon/early evening PDT). According to Randall [here](https://xkcd.com/archive/), it was posted on 4/15. 172.69.34.104 00:19, 17 April 2020 (UTC)
- Looking at the first capture in the Internet Archive (https://web.archive.org/web/20200415230401/https://xkcd.com/2294/), it was indeed posted on the 15th, albeit at 23:04:01 UTC. -- //gir.st/ (talk) 13:53, 17 April 2020 (UTC)

- I saw this comic on 4/15 (late in the afternoon/early evening PDT). According to Randall here, it was posted on 4/15. 172.69.34.104 00:19, 17 April 2020 (UTC)
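
The two timing reports can be reconciled with a short conversion. Wayback Machine capture timestamps (the 14-digit number in the archive URL) are YYYYMMDDhhmmss in UTC. This sketch assumes Pacific Daylight Time as a fixed UTC-7 offset, which holds for mid-April 2020.

```python
from datetime import datetime, timedelta, timezone

# 14-digit capture timestamp from the archive URL quoted above.
stamp = "20200415230401"

# Wayback Machine timestamps are YYYYMMDDhhmmss in UTC.
captured_utc = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

# Assumption: Pacific Daylight Time, a fixed UTC-7 offset in April 2020.
pdt = timezone(timedelta(hours=-7), "PDT")
captured_pdt = captured_utc.astimezone(pdt)

print(captured_utc.isoformat())  # 2020-04-15T23:04:01+00:00
print(captured_pdt.isoformat())  # 2020-04-15T16:04:01-07:00
```

Since this is only the first capture, it gives an upper bound: the comic was online by about 4 pm PDT on 15 April, consistent with the "late afternoon" sighting.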

I've removed the remark that logarithmic scale axes "would not have evenly spaced ticks as shown", as it is incorrect. When the ticks are at 10, 100, 1000, ..., they are evenly spaced. -- //gir.st/ (talk) 23:00, 16 April 2020 (UTC)
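
This is easy to check: on a logarithmic axis a value v is drawn at a position proportional to log10(v), so ticks at successive powers of ten land exactly one unit (one decade) apart. A minimal sketch:

```python
import math

ticks = [10, 100, 1_000, 10_000]

# On a log-scale axis, a value v is placed at a distance proportional to log10(v).
positions = [math.log10(v) for v in ticks]

# Gaps between neighbouring ticks: all equal to one decade.
gaps = [b - a for a, b in zip(positions, positions[1:])]
print(gaps)

# By contrast, ticks at 1, 2, 3, 4 would bunch up on a log axis.
print([round(math.log10(v), 3) for v in (1, 2, 3, 4)])
```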

For those of you interested in the difficulties experienced by epidemiology under the embarrassment of riches allowed by contemporary big data, please see [this working draft on the sufficiency of testing](https://cmmid.github.io/topics/covid19/severity/global_cfr_estimates.html). 172.69.22.146 23:59, 16 April 2020 (UTC)

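There's a [graph](https://www.clevelandfed.org/newsroom-and-events/publications/cfed-district-data-briefs/cfddb-20200408-getting-to-accuracy.aspx) from an economist at the Cleveland Federal Reserve Bank that may have been an inspiration for this comic: it has log scales and a difficult-to-decipher X-axis that is only vaguely time-like. There is also a [discussion here](https://statmodeling.stat.columbia.edu/2020/04/10/a-better-way-to-visualize-the-spread-of-coronavirus-in-different-countries/). --DanR (talk) 15:13, 17 April 2020 (UTC)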