Editing 1347: t Distribution

Latest revision Your text
Line 14: Line 14:
 
The comic is a play on the name "Student", the pseudonym under which William Sealy Gosset published the distribution, versus the "Teacher". The idea is that a "teacher's" distribution would be more complex, and that it would be used for fitting data when the Student's distribution wasn't sophisticated enough. In reality, a distribution as complex as the one shown in the comic would have many parameters, and in practice would probably lead to overfitting and/or bias. Thus, the comic (and the title text) can be seen as making fun of the idea that more complex is always better, or perhaps of the idea that a statistician's job is to use more and more sophisticated tools to force the data to yield a "publishable" result, rather than to use the simplest appropriate tool and let the chips fall where they may.
 
  
−
[[Cueball]] tries to "fit" a distribution to the data on the paper. This is the usual jargon for when a statistician is trying to model their data as coming from some underlying probability distribution, and the comic makes a pun with the physical meaning of "fit". In the second panel, Cueball decides that the Student's t-distribution does not fit his data well (the data failed the Student's t-test), and decides to pull out the more complex Teacher's t-distribution instead (the Teacher's t-test, which the data is not allowed to continue to fail). Note that "test" is what statisticians do to data to see if it fits some distribution, but it is also another word for "examination."
+
[[Cueball]] tries to "fit" a distribution to the data on the paper. This is the usual jargon for when a statistician is trying to model his/her data as coming from some underlying probability distribution, and the comic makes a pun with the physical meaning of "fit". In the second panel, Cueball decides that the Student's t-distribution does not fit his data well (the data failed the Student's t-test), and decides to pull out the more complex Teacher's t-distribution instead (the Teacher's t-test, which the data is not allowed to continue to fail). Note that "test" is what statisticians do to data to see if it fits some distribution, but it is also another word for "examination."
  
 
Student's t distribution relates the average of a small sample to the "true" population average, under the assumptions, unobjectionable in many contexts, that there is such a "true" value, and that the samples are independent and normally distributed with equal variance. As such, unless the data on Cueball's paper contain many small groups which radically violate these assumptions somehow, there is no way Cueball's data could falsify the t distribution. In particular, a single number (for the average of one group) or a small set of numbers (for the averages of several groups) will never make a nice smooth curve, but an average statistician would see that as normal statistical noise that would even out over time, not as a reason to prefer a complex, spiky curve such as the supposed "teacher's" distribution. But of course, Cueball's access to a secret, cooler-looking distribution makes them more badass than a mere average statistician... or does it?
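
As a rough illustration of how a small sample's average is related to a hypothesized "true" average, here is a minimal Python sketch of the one-sample t statistic. The function name one_sample_t and the sample values are made up for this example and do not come from the comic; the sketch also takes the assumptions above (independent, normally distributed observations) for granted rather than checking them.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, degrees_of_freedom) for a one-sample t statistic.

    sample: the observed values (hypothetical data for illustration)
    mu0:    the hypothesized "true" population average
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all observations are identical; t is undefined")
    # Distance between the sample mean and mu0, in units of the standard error.
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


# Made-up sample of five measurements, tested against a hypothesized mean of 4.0.
sample = [4.0, 5.0, 6.0, 5.0, 5.0]
t, df = one_sample_t(sample, mu0=4.0)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The resulting t value would then be compared against Student's t distribution with the given degrees of freedom. A jagged-looking histogram of a handful of such values is ordinary sampling noise, not evidence that a spikier distribution is needed.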
 
