Difference between revisions of "1725: Linear Regression"
Latest revision as of 23:54, 18 March 2024
Linear Regression
Title text: The 95% confidence interval suggests Rexthor's dog could also be a cat, or possibly a teapot.
Explanation
Linear regression is a method for modeling the relationship between two or more variables. In the simplest case, with two variables, the model determines a "best-fit" (least-squares) line through a scatter plot of the data, together with a coefficient of determination, usually denoted r² or R². When only two variables are included in the regression, R² is simply the square of the correlation between them. R² is a number between 0 and 1 that indicates how well one variable can be used to predict the value of the other. A value of 1 means perfect correlation, while a value close to 0 indicates a weak linear relationship between the variables.
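The relationship between the best-fit line, R² and the correlation can be sketched in a few lines of Python. This is only an illustration: the data points and the function name linear_fit are made up for the example and have nothing to do with the comic's data.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared, r), where r is the Pearson
    correlation between xs and ys.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_res / syy

    # Pearson correlation coefficient
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r_squared, r

# Hypothetical example data, not taken from the comic
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

slope, intercept, r_squared, r = linear_fit(xs, ys)
print("slope:", slope, "intercept:", intercept)
print("R^2:", r_squared, "r squared:", r ** 2)
```

For a regression with a single predictor, the two values on the last line agree up to floating-point rounding, which is the "square of the correlation" statement above.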
A constellation is a pattern created by linking the apparent positions of stars as seen in the sky from Earth. (Astronomers, in technical contexts, usually refer to these as asterisms, reserving "constellations" for the 88 regions into which the sky is divided, each named for the most prominent asterism it contains, although "constellation" is used informally in place of "asterism" by even seasoned astronomers.) Different civilizations have recognized different constellations, and anyone can create their own constellations by connecting assorted points, the way Randall connected points in his plot to make "Rexthor."
In this comic, linear regression has been applied to a set of data, and the resulting R² indicates that there is low correlation between the two variables. The data points are so widely scattered that (as noted in the comic) it is easier to connect the data points in a constellation-like pattern than it is to determine whether the correlation is negative or positive (without looking at the trendline, of course). Because of this, Randall suggests we should be suspicious of any conclusions drawn from this data.
The comic is somewhat misleading, since the data in the graph actually has an R² of 0.02, only a third of what Randall claims. An example of published research with an R² of 0.06 where the association in the graph is noticeable (if not strong) can be found at http://www.i-jmr.org/2012/1/e1/ (figure 2 has r = 0.25, which corresponds to R² = 0.06). In addition, it is hard to see the association in the comic's graph because relatively few points are plotted. In a data set with 1000 observations and R² = 0.06, any association between the two variables would be quite clear.
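The effect of sample size can be made concrete with the standard t statistic for testing whether a correlation differs from zero, t = r·√((n − 2) / (1 − r²)). The sketch below is an illustration only: the sample sizes 30 and 100 are assumptions chosen for comparison (the number of points in the comic is not stated here), and the function name t_statistic is made up for the example.

```python
import math

def t_statistic(r_squared, n):
    """t statistic for a test of zero correlation, given R^2 and sample size n."""
    r = math.sqrt(r_squared)
    return r * math.sqrt((n - 2) / (1 - r_squared))

# r = 0.25 from the cited figure gives R^2 = 0.0625, i.e. about 0.06
print("R^2 for r = 0.25:", 0.25 ** 2)

# Same R^2 = 0.06 at different (assumed) sample sizes
for n in (30, 100, 1000):
    print("n =", n, "t =", round(t_statistic(0.06, n), 2))
```

With the usual two-sided 5% threshold of roughly 2, the same R² of 0.06 gives a t of about 1.3 at n = 30 (not distinguishable from no association), about 2.5 at n = 100, and about 8 at n = 1000.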
The lines connecting the stars in this "constellation" create a crude illustration of a person with an outstretched arm holding up a dog, which could be a reference to the film Life is Beautiful, in which a waiter carries a dog on his tray without realizing it. The name "Rexthor the Dog Bearer" spoofs the fact that numerous Greek-derived constellation names have both a proper name and an epithet (for example, "Orion, the Hunter"). The fact that "Rex" is an archetypal dog name (and also Latin for "king", as in Tyrannosaurus rex, the king of the dinosaurs) adds to the humor.
A 95% confidence interval is a range around an estimate, constructed so that it is expected to contain the real value (the population parameter being estimated) in 95% of the samples for which it is calculated. The confidence interval is a standard way of expressing the estimation error in statistics. In the right panel the resulting estimate seems to be a drawing, so the 95% confidence interval would be a set of drawings, expected to contain the correct drawing in 95% of samples where it is calculated. According to the title text, the interval in this particular sample also includes a cat and a teapot, so we can only make extremely vague statements in order to maintain 95% confidence.
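The "95% of samples" reading can be checked with a small simulation. The following is a minimal sketch under simplifying assumptions that are not part of the comic: the estimated parameter is the mean of a normal population with known standard deviation, so the interval is the sample mean ± 1.96·σ/√n, and all numbers (true mean, σ, sample size, seed) are arbitrary.

```python
import math
import random

def coverage(trials=2000, n=25, true_mean=10.0, sigma=2.0, seed=0):
    """Fraction of simulated samples whose 95% interval contains true_mean.

    Assumes normally distributed data with known sigma, so each interval
    is sample_mean +/- 1.96 * sigma / sqrt(n).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        # The interval covers the true mean when the sample mean lies
        # within half_width of it.
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / trials

print("fraction of intervals containing the true mean:", coverage())
```

The printed fraction should come out close to 0.95. Any single interval either contains the true value or does not; the 95% describes the procedure over many samples, not one particular interval.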
The teapot may be a reference to Russell's teapot, or possibly to the "teapot" asterism in the constellation Sagittarius. Alternatively, it may simply be that the "dog" actually looks more like a teapot than a dog, and Randall noticed this and mentioned it in the title text. In that case, the first two suggestions are just another example of how humans see patterns even where there are none to find, as with the pareidolia mentioned in 1551: Pluto.
Transcript
- [Two square panels show identical sets of scattered black dots, with only the red additions being different.]
- [The left panel shows a slightly rising red line drawn through the middle of the panel, passing near a few dots but not obviously related to most of them. Red text is below the dots:]
- R²=0.06
- [The right panel shows many of the dots connected by red lines to form a stick figure of a man resembling the constellation Orion, with the hand on the reader's right raised and holding an object. Red text is below the dots:]
- Rexthor, the Dog-Bearer
- [A caption is below and spanning both panels:]
- I don't trust linear regressions when it's harder to guess the direction of the correlation from the scatter plot than to find new constellations on it.
Discussion
It also seems likely that the teapot refers to the Utah Teapot (https://en.wikipedia.org/wiki/Utah_teapot). It was one of the first complex 3D objects defined for CGI rendering, and has seen countless uses since, notably in the Pipes screensaver and in early SIGGRAPH papers, where it was rendered alongside the 5 Platonic solids as if it belonged with them. Dkfenger (talk) 17:10, 26 August 2016 (UTC)
- I'm not sure I follow. How do you reach that conclusion? Given that the concept of constellations (and thus stars) is clearly shown in the comic, it seems much more likely to me that he was referring to Russell's Teapot and not to a computer rendering (if there was any reference at all). The fact that that shape could abstractly resemble a teapot may be all that there is to it. :) KieferSkunk (talk) 18:06, 26 August 2016 (UTC)
I think that the teapot is a reference to the constellation Sagittarius. This seems most likely to me as the reference is to a constellation that looks like a teapot despite ostensibly being something else. Sagittarius is a constellation that is supposed to be an archer, but many people see it as a teapot instead. (http://www.space.com/30274-constellation-sagittarius-archer-dipper-teapot.html) Harperska (talk) 19:27, 26 August 2016 (UTC)
I think it looks like an alcoholic drink with the little umbrella sticking out. Mikemk (talk) 06:25, 27 August 2016 (UTC)
Based on what is an R^2 value of 0.06 significant??? I'm removing that. Djbrasier (talk) 20:59, 26 August 2016 (UTC)
- Oops, misread it! I read "insignificant" as "significant". Djbrasier (talk) 21:00, 26 August 2016 (UTC)
The teapot mention may just be a joke, not a reference. 141.101.98.114 (talk) (please sign your comments with ~~~~)
Did someone check whether it really was an R-squared of 0,06? 141.101.104.67 20:56, 27 August 2016 (UTC)
- Assuming the top left of the image as 0/0 and measuring in pixels, I get f(x)=-0,135x + 124,8 with R²=0,0197, calculated with LibreOffice. The line in the image has f(x)=-0,094x+125. If I change a single point by one or two, the R² value varies from 0,0195 to 0,0199. If I subtract 10% of the x value from the y value, R² increases to 0,0574. So I think R²=0,06 is a little bit inaccurate, but not completely wrong. --162.158.83.228 19:01, 2 September 2016 (UTC)
- I think R^2 = 6% is very inaccurate if the true R^2 = 2%. 108.162.219.56 00:07, 3 September 2016 (UTC)
Does anybody know of any real-world examples of a similarly low R^2 given in genuine research? It would be worth mentioning their existence if we can find one. Cosmogoblin (talk) 18:03, 28 August 2016 (UTC)
- In published research? I don't recall any. In submissions for review? At least twice. And of course one case where this comic could and should be used as an educational drawing - student reports, master's theses, etc. I've seen "conclusions" drawn from weaker data in those, far too many times for my mental health...--162.158.86.119 09:32, 30 August 2016 (UTC)
Rex is also Latin for king, which may be related in the context of constellations. 172.68.11.81 (talk) (please sign your comments with ~~~~)
This is irrelevant to the humor of the comic, but I fixed the paragraph on confidence intervals because it contained at least three misinterpretations (I have an MSc in statistics). The phrasing can be improved if needed. Don't worry though, even experienced statisticians get it wrong sometimes... 162.158.234.40 09:43, 10 April 2018 (UTC)