Editing 2599: Spacecraft Debris Odds Ratio

  
 
==Explanation==
 
{{incomplete|Created by an EVENS RATIO - Do NOT delete this tag too soon.}}
This comic presents a misunderstanding of statistics very similar to that of [[1252: Increased Risk]]. It suggests that going outside for 5 or more hours per day significantly increases your risk of head injury from falling spacecraft, and advises limiting outdoor activity to avoid this risk.
  
The data are apparently based on a {{w|Monte Carlo Method|Monte Carlo simulation}}, a computational method that uses input values randomly drawn from a given distribution and which repeats that calculation many times; the distribution of the outputs is then analyzed. This method is used to determine the possible outcomes (and their respective probabilities) for a given scenario. Basically, instead of doing hard math to calculate the outcomes you let a computer repeat the scenario for a huge number of different input values and watch what happens.
However, since the odds of being hit in the head by (any part of) a falling spacecraft are astronomically low to begin with [https://www.livescience.com/33511-falling-nasa-satellite-uars-risk.html], quadrupling it or more still results in a negligible probability. The horizontal error bars for times greater than 4 hours are marked with asterisks to indicate they are significantly different from the reference value at 0 hours, as indeed those error bars don't overlap the vertical line for the 0-hours reference value.
In this case, the study might have consisted of defining the baseline probability of spacecraft debris falling from the sky in a given time frame (say, 1% every minute) as well as the probability that it is heavy enough to break through the roof (say, also 1%), translating this to the output of a random number generator (e.g. "1" means "space debris falls in direction of head and can break through the roof", 2-100 means "space debris falls in direction of head  but can't go through a roof" and values 101-10000 mean "no danger from space debris"), adding another random number generator to simulate the distributions for "person is outside X hours of the day", then drawing numbers repeatedly from both distributions and calculating the outcome for each instance.
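As a rough illustration, the hypothetical scheme described above can be sketched in Python. This is only a toy model using the made-up numbers from the paragraph (a 1% chance of debris per time step, of which 1% can break through a roof); the function name <code>simulate_hit_probability</code> and its parameters are invented for this example, and nothing here is taken from the comic or from any actual study.

```python
import random

def simulate_hit_probability(hours_outside, trials=200_000, seed=0):
    """Toy estimate of the chance per time step of being hit by debris."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    p_outside = hours_outside / 24     # fraction of the day spent outdoors
    hits = 0
    for _ in range(trials):
        roll = rng.randint(1, 10_000)  # the random number described above
        if roll > 100:
            continue                   # 101-10000: no danger from space debris
        pierces_roof = (roll == 1)     # 1: debris can break through a roof
        is_outside = rng.random() < p_outside
        if is_outside or pierces_roof:
            hits += 1
    return hits / trials

# Compare someone who never goes outside with someone outside 12 hours a day.
inside = simulate_hit_probability(0)
outside = simulate_hit_probability(12)
print(inside, outside)
```

With these toy inputs nearly all of the risk comes from time spent outdoors, so the simulated ratio is much larger than anything shown in the comic; the point is only to show the mechanics of repeated random draws.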
 
  
Doing a Monte Carlo simulation for a hypothetical and rare scenario like this can make sense: it is so rare for humans to be struck by spacecraft debris that an absurdly large sample size, involving tens of millions of participants over several decades, would be necessary to obtain significant experimental data.
{{w|Error bar}}s are graphical representations of the variability of data and used on graphs to indicate the error or uncertainty in a reported measurement.
However, the statistical analysis and presentation of the data is horribly misleading and sensationalizing. The comic essentially pokes fun at the way that data can be misrepresented and exaggerated using an example that people  would realize is absurdly unlikely.  
 
  
The results are presented not as an overall probability but rather as an {{w|Odds_ratio|odds ratio}}. The odds of an event with probability p are p/(1-p), and the odds ratio is defined as (odds of A in the presence of B)/(odds of A in the absence of B). For very rare events the odds are almost equal to the probability, so here the odds ratio is approximately p(space debris head injury after Xh spent outside and 24-Xh inside)/p(space debris head injury after 24h spent inside). The resulting value tells you roughly how much more likely an outcome A becomes if you do (or have) B. E.g. the bottom line of the graph in the comic means that spending 11+ hours outside makes a head injury from space debris about 3 times as likely as not being outside at all.
Presenting the data by hour brackets hides the data distribution inside each bracket. If the data were presented hour by hour, and not by groups of hours, they may show a different threshold of increased risk or no threshold (odds ratio could be linear).
However, while odds ratios can be useful they tend to hide the scale of a probability - e.g. 0.00000000002%/0.00000000001% = 2, the outcome became twice as likely but the probability only rose by 0.00000000001%. And since the odds of being hit in the head by (any part of) a falling spacecraft are [https://www.livescience.com/33511-falling-nasa-satellite-uars-risk.html astronomically ([[559: No Pun Intended |no pun intended]]) low to begin with], even quadrupling it still results in a negligible probability.
 
  
The choice of hour brackets instead of a linear time scale is suspicious. Monte Carlo simulations involve a huge number of computations; the scientists should have more than enough data to plot the odds ratio for every additional hour spent outside. Moreover, each hour bracket has a different size - why didn't they use a regular binning like e.g. 1-3, 4-6, 7-9, 10-12? One might suspect that they wanted to conceal inconsistencies and that the underlying data points by themselves don't look nearly as convincing.
The graph and error bars are based on a {{w|Monte Carlo Method|Monte Carlo simulation}}, a type of computational algorithm that uses repeated random sampling to obtain the likelihood of a range of results of occurring; see, for instance, this article about [https://www.ibm.com/cloud/learn/monte-carlo-simulation Monte Carlo simulations]. Additionally, this may indicate that the entire study was conducted via a monte carlo simulation and that no real data was collected adding to the absurdity of the claim that more time spent outside could lead to an increased risk of head injuries due to falling space craft. Indeed, it is so rare for humans to be struck by spacecraft debris that a simulation is probably the only way to study the risk; an absurdly large sample size, involving tens of millions of participants over several decades, would be necessary to obtain significant experimental data.
Moreover, range-based groups of any kind should never be analyzed as if they were independent categories. Spending 5 hours outside is not intrinsically different from spending 1 hour outside - the 5-hour mark (presumably) doesn't suddenly turn humans into space-debris magnets. The likelihood of space debris falling down at any given moment stays the same and the cumulative (i.e. summed-up) probability should increase at a constant rate. Instead of comparing every hour bracket to the same baseline reference, each should be compared to the next-lowest value.
 
  
The error bars (the lines extending from the points in the graph) are HUGE compared to the effect they measured. Error bars define the range in which the true value might be - here, for 2-4 hours the true value could be an increase by 2, or a small DEcrease of the probability.  

The specific reference to falling spacecraft is likely inspired by events happening around the time of this comic's release (March 2022). Around a month before this was posted, the head of the Russian space agency, {{w|Roscosmos}}, warned that sanctions against Russia (mostly those over the {{w|2022 Russian invasion of Ukraine}}) could result in the {{w|International Space Station}} crashing. Since the Russian section of the space station is the one that provides propulsion (although it is built to rely on the power generated by the other sections), this was taken seriously and as of when this was posted, {{w|NASA}} was trying to come up with alternative stabilization strategies in case the situation worsened. There was also a recent [https://www-uol-com-br.translate.goog/tilt/noticias/redacao/2022/03/17/parte-do-foguete-spacex-e-encontrada-por-morador-do-pr.htm?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=pt-BR&_x_tr_pto=wapp report] of some 600 kg space rocket debris found in Brazil.
  
[[1429: Data|The data]] are shown on a [[1162: Log Scale|log scale]]. Logarithmic scales are used when you have both very small and very large values and want to depict their relative differences in a single plot without making the small values look like zero or cutting off the large values. The data shown here do not have huge differences - there is no good reason for using a log scale. However, the log scale is conveniently chosen to make the error bars look like they have the same length. They do not. The error bar for the last data point is actually twice as large as that for the first data point.
The title text makes a similar joke. While the increase in chances of death by a bear attack are greater when going outside than the decrease in chances of death by cardiovascular disease, by getting out to exercise, it is incorrect to combine them in this way, since cardiovascular disease has a much higher starting chance of death, and reducing it by 30% has a much more significant effect on overall life expectancy than quadrupling the very very small chance of death by bear attack.
  
The "280% increase" of the title text is also an error, though perhaps not for reasons that are obvious at first (for instance, the correct calculation is not "300% − 30% = 270%"). To "increase by 300%" means multiplying the probability by (1 + 3.0) = 4.0, while to "decrease by 30%" means multiplying by (1 − 0.3) = 0.7. Combining these means multiplying by both, for an overall change of 4.0 × 0.7 = 2.8, or 280%. However, this result means the risk has increased ''to'' 280% of its old value, not ''by'' 280%. And in any case, it is ''still'' not valid to simply combine two changes in wildly different risks like this.
  
The title text continues the misuse of statistics by insinuating that a 30% decrease of cardiovascular disease resulting from going outside (and exercising) is outweighed by a simultaneous 300% increase of risk of being killed by a bear. As shown in [[1102: Fastest-Growing]], the percentage increase/decrease alone of something has little meaning; the context of the original size is needed to evaluate how impressive the change really is. And in this case, the probability of dying from a cardiovascular disease is much, MUCH higher than the probability of being attacked and killed by a bear, so the moderate decrease of the former has much more impact on one's overall life expectancy than even a huge increase of the latter (unless you live in an area that has many bears, in which case your best bet is to take appropriate precautions rather than to never go outside at all).
===Odds & Odds Ratios===
  
The "280% increase" of the title text is also an error, though perhaps not for the reason you might assume at first glance (the correct calculation is not "300% − 30% = 270%"). To "increase by 300%" means to add 300% on top of the original 100% (=400%, so multiplied by 4), while to "decrease by 30%" means to remove 30% from the original 100% (=70%, so multiplied by 0.7). Combining these would mean multiplying by both, for an overall change of 4.0 × 0.7 = 2.8, or 280%. However, this should be read as an increase ''to'' 280% of the old value, not ''by'' 280% (you started at 100% and added 180%). More importantly, multiplying the two factors is not a valid way of doing the math at all, because these are probabilities of very different things on very different scales (if you threw out 30% of your dishware but in that same period also acquired 3 toothpicks on top of your original 1 toothpick, would you say that your kitchen stuff increased by 180%?).

The correct way of combining the two risks is to translate them onto the same scale - the overall chance of death - by multiplying each baseline probability by its own change factor and then adding the results. For example, if the chance of dying from cardiovascular disease was 50% and the chance of being killed by a bear was 0.1%, the overall chance of dying from either would be the sum, 50.1%. Going outside changes both probabilities; the new chances are 50%*0.7=35% and 0.1%*4=0.4%, and the combined chance of dying from either is now 35.4% - a significant decrease from the original 50.1%.
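The arithmetic above can be checked with a short Python sketch. The baseline values (50% and 0.1%) are the purely hypothetical numbers from the example, not real mortality statistics, and the helper name <code>combined_risk</code> is invented here. Like the example, the sketch simply adds the two risks and ignores any overlap between them.

```python
# Hypothetical baseline chances of death, taken from the example above.
baseline = {"cardiovascular": 0.50, "bear": 0.001}

# Change factors from the title text: -30% and +300%.
factor_outside = {"cardiovascular": 0.7, "bear": 4.0}

def combined_risk(risks, factors):
    """Scale each risk by its own factor, then add them on a common scale."""
    return sum(risks[cause] * factors[cause] for cause in risks)

before = sum(baseline.values())
after = combined_risk(baseline, factor_outside)

# The misleading shortcut multiplies the two factors with each other instead.
naive_factor = factor_outside["cardiovascular"] * factor_outside["bear"]

print(f"before: {before:.3f}, after: {after:.3f}, naive factor: {naive_factor:.1f}")
```

Under these assumptions the combined risk falls even though the naive factor suggests it nearly tripled.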

The odds of an event are the probability that it happens divided by the probability that it doesn't happen. People often express odds as a ratio (e.g. the odds of rolling a 6 on a 6-sided die might be expressed as 0.16666... : 0.83333..., or equivalently as 1:5), but it is important to note that such ratios are not ''odds ratios'' (it would be fitting to call this a "probability ratio", but this terminology is not standard).
  
An odds ratio is the odds of event O happening, given that some other event E has occurred, divided by the odds of O given that E has not occurred. O is sometimes called an "outcome" and E is sometimes called an "exposure", because people are often interested in comparing things like the odds of getting lung cancer (O) given that you smoke (E) to the odds of getting lung cancer given that you don't smoke, as a way of measuring the extent to which exposure to E influences outcome O. In the case of the comic, the outcome variable O is the event of getting a head injury from falling spacecraft debris, and the exposure variable E is the event of spending H hours per day outside, for various values of H. The comic appears to be saying that for each value of H, there are two options for E: either you spend H hours per day outside or you never go outside.
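The two definitions can be written down as a minimal Python sketch. The function names and the example probabilities are made up for illustration; they are not estimates of any real risk.

```python
def odds(p):
    """Odds of an event: probability it happens over probability it doesn't."""
    return p / (1 - p)

def odds_ratio(p_exposed, p_unexposed):
    """Odds of outcome O given exposure E, divided by the odds of O without E."""
    return odds(p_exposed) / odds(p_unexposed)

# Rolling a 6 on a fair die: odds of 1:5, i.e. 0.2.
die_odds = odds(1 / 6)

# For a common outcome the odds ratio differs clearly from the ratio of probabilities.
common = odds_ratio(0.6, 0.2)    # probabilities differ by a factor of 3

# For a very rare outcome the two are almost identical.
rare = odds_ratio(3e-9, 1e-9)    # probabilities differ by a factor of 3

print(die_odds, common, rare)
```

The last case is the one relevant to the comic: when the outcome is extremely rare, an odds ratio of about 3 means the probability itself is about 3 times larger.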
  
So for small values of H (e.g. 1 hour per day), the comic is saying that the event of being hit by spacecraft debris is more or less independent of the event of spending H hours per day outside, which is to say that the odds of being hit are more or less the same regardless of the choice you make between spending H hours per day outside and never going outside. Hence the dot on the 1-hour bar is close to 1, because the two odds are more or less equal (the dot appears to represent an average estimate of the odds ratio).
 
Note that when calculating the odds ratios for this comic, the odds in the denominators are always the same, as they are the odds of being hit given that you never go outside, which does not depend on H. So when the comic says that the odds ratio is above 3 for H={11+ hours per day}, and the odds ratio for 1 hour per day is close to 1, it is effectively saying that the odds of being hit when you spend 11+ hours per day outside are a bit more than 3 times the odds of being hit when you spend 1 hour per day outside.
 
Suppose the probability of being hit is: P when you spend 1 hour per day outside, and Q when you spend 11+ hours per day outside. The odds of being hit under these two exposures are P/(1-P) and Q/(1-Q) respectively, and because the odds ratios have equal denominators, the comic is saying that Q/(1-Q) = kP/(1-P), where k is a bit more than 3. If we rearrange this to get an expression for Q, we get:
 
<pre>
      Q/(1-Q) = kP/(1-P)
<=>    Q(1-P) = kP(1-Q)
<=>      Q-QP = kP-kPQ
<=>  Q+kPQ-QP = kP
<=> Q(1+kP-P) = kP
<=>         Q = kP/(1+kP-P)
<=>         Q = P/(P+(1-P)/k)  {by dividing the numerator and denominator by k}
</pre>
 
As P is negligibly small, 1-P is very close to 1, and P+(1-P)/k is very close to 1/k. Thus Q is very close to kP (i.e. a bit more than 3 times P), meaning that the probability of being hit when you spend 11+ hours per day outside is still negligibly small. Thus, the comic's suggestion that we spend 4 hours or less outside based on the estimated odds ratios is extremely misguided.
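The closed form for Q can be checked numerically with a small Python sketch. The value of P below is an arbitrary tiny number chosen for illustration, not an estimate of the real risk, and the function name is invented for this example.

```python
def q_from_odds_ratio(p, k):
    """Probability Q whose odds are k times the odds of probability p."""
    return k * p / (1 + k * p - p)

def odds(p):
    return p / (1 - p)

p = 1e-9   # assumed, arbitrarily small baseline probability
k = 3.0    # odds ratio of roughly 3, as read off the comic's last bar

q = q_from_odds_ratio(p, k)

# For tiny p, q is practically k*p, so it is still negligibly small.
print(q, k * p)

# Sanity check on a large probability, where odds and probability differ a lot:
# p = 0.5 has odds 1, so odds of 3 correspond to a probability of 0.75.
print(q_from_odds_ratio(0.5, 3.0))
```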
  
 
==Transcript==
 
 
:X-axis: 1 2 3 4 5
 
  
:[The Y-axis is not scaled; there are no ticks or lines. Instead it just gives five labels from top to bottom. Above those labels there is an arrow pointing to the top one with a label above explaining the axis.]
 
:Hours spent outdoors per day
 
 
:Y-axis:  
 
  
 
==Trivia==
 
*In the [https://www.explainxkcd.com/wiki/images/archive/d/d5/20220329223238%21spacecraft_debris_odds_ratio.png original version] of the comic, the Y-axis label said "Hours spent outdoors", but the comic was later changed to specify "Hours spent outdoors ''per day''", which makes more sense. When the updated image was uploaded, it had a much larger size than normal, because Randall posted the same file for both the normal "double size" image and the "regular" size. This had happened before with [[2576: Control Group]], see that comic's [[2576: Control Group#Trivia|trivia section]]. This resulted in the problem that the comic broke the boundaries on [https://xkcd.com xkcd.com].
*In the [https://www.explainxkcd.com/wiki/images/archive/d/d5/20220329223238%21spacecraft_debris_odds_ratio.png original version] of the comic the Y-axis label referred to "hours spent outdoors". So more than four hours spent outdoors in one's lifetime would be a problem.
*This comic's title text ("That's a 280% increased") has a typo.
**But later the comic was edited to specify "hours spent outdoors per day", which makes more sense.
*When the new version was uploaded, Randall again made the error of making the two versions of the comic image the same size, as he did earlier in [[2576: Control Group]]; see that comic's [[2576: Control Group#Trivia|trivia]].
**This resulted in the problem that the comic broke the boundaries on the xkcd website.
**This was later fixed, but even then the two images were the same size.
**Here is an example of how it looked when the error was present:
[[File:2599- Spacecraft Debris Odds Ratio Image scaling off.png|500px]]
  
 
{{comic discussion}}
 
 
[[Category:Space]]
 
 
[[Category:Animals]] <!-- bears title text-->
 
[[Category:Comics edited after their publication]]
 
