Talk:2118: Normal Distribution

Explain xkcd: It's 'cause you're dumb.
Revision as of 23:32, 1 March 2019 by (talk)

Is there a statistician in the house? Hawthorn (talk) 15:32, 1 March 2019 (UTC)

   I think they all got annoyed at the graph and left. Margath (talk) 15:46, 1 March 2019 (UTC)

Of course there is! 15:44, 1 March 2019 (UTC)

As an example: when measuring the height of people in the same age bracket, you'd expect the number of people at each height to look like this graph. There will be a lot of people around the average height, fewer a foot shorter or taller, some (but very few) exceptionally tall people, and some (but very few) exceptionally short people. The x-value represents the height; the y-value essentially represents the share of the population that has that height. When we mark the middle 50% of the population with vertical bars, everyone of a given height is either inside OR outside the middle. Randall uses horizontal bars here, which means some people of a given height will be counted in the middle 50%, but other people of the same height won't be. In fact, some people of exactly the average height of the whole population would fall outside the middle. 16:01, 1 March 2019 (UTC)
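A minimal Python sketch of the two readings, assuming a standard normal (mean 0, SD 1) stands in for the height distribution; the band levels `LO` and `HI` are hypothetical values chosen for illustration, not measured from the comic. Vertical bars at the 25th and 75th percentiles either include everyone of a given height or no one; with a horizontal band, only the part of each height's column that falls inside the band is counted.

```python
from statistics import NormalDist

STD = NormalDist(0, 1)  # assumption: standard normal stands in for real height data

# Vertical bars: the middle 50% lies between the 25th and 75th percentiles.
q1, q3 = STD.inv_cdf(0.25), STD.inv_cdf(0.75)

# Hypothetical band levels on the density (y) axis, for illustration only.
LO, HI = 0.05, 0.30

def share_counted_vertical(x):
    """Everyone at height x is inside the vertical bars, or nobody is."""
    return 1.0 if q1 <= x <= q3 else 0.0

def share_counted_horizontal(x, lo, hi):
    """Fraction of the column under the curve at x that lies inside lo < y < hi."""
    f = STD.pdf(x)
    inside = max(0.0, min(f, hi) - min(f, lo))
    return inside / f

for x in (0.0, 1.0, 2.0):
    print(f"x={x}: vertical {share_counted_vertical(x):.2f}, "
          f"horizontal {share_counted_horizontal(x, LO, HI):.2f}")
```

With these levels, only about 63% of the people at exactly the mean height (x = 0) are counted by the horizontal band, while most of the people at x = 1, whom the vertical bars exclude, are counted.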

Feel free to rip me apart for referring to it as the "number of people at each height", since the y-axis (a probability density) is more complicated than a simple count. 16:03, 1 March 2019 (UTC)
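Because the y-axis is a probability density, a head count comes from multiplying it by a bin width and the population size. A minimal Python sketch, assuming purely illustrative heights drawn from Normal(170 cm, 7 cm) rather than real survey data; `expected_count` and `observed_count` are hypothetical helper names:

```python
from statistics import NormalDist

# Illustrative assumption: heights ~ Normal(170 cm, 7 cm); not real data.
HEIGHTS = NormalDist(mu=170, sigma=7)
N = 100_000        # population size
BIN_WIDTH = 2.0    # cm

samples = HEIGHTS.samples(N, seed=0)

def expected_count(center):
    # density (per cm) x bin width (cm) x population size = expected head count
    return N * BIN_WIDTH * HEIGHTS.pdf(center)

def observed_count(center):
    lo, hi = center - BIN_WIDTH / 2, center + BIN_WIDTH / 2
    return sum(lo <= h < hi for h in samples)

for center in (156, 163, 170, 177, 184):
    print(center, observed_count(center), round(expected_count(center)))
```

The raw density value at 170 cm is only about 0.057 per cm; it becomes a count of people only after multiplying by a bin width and N.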

Just to say, Randall's horizontal slice isn't entirely meaningless. It's a calculation I've had to do, where I had a series of binned samples of a population (say I knew how many fell in -10..10, how many fell in -5..5, how many fell in -2..2) and wanted to combine them with an appropriate weighting to approximate a Gaussian. I was using it for filtering, but it's logically similar. Fluppeteer (talk) 16:19, 1 March 2019 (UTC)
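One way to picture this is the layer-cake view: a symmetric bell curve is exactly the sum of its horizontal slices, and each slice is a centred box. The Python sketch below is an assumption about the general idea, not Fluppeteer's actual filter; it goes in the forward direction, stacking `levels` equally weighted boxes to approximate a standard normal density. `half_width` and `box_stack` are hypothetical helper names.

```python
import math
from statistics import NormalDist

STD = NormalDist()
PEAK = STD.pdf(0.0)  # 1 / sqrt(2*pi), the top of the curve

def half_width(t):
    """Half-width of the horizontal slice of the standard normal pdf at height t (0 < t < PEAK)."""
    # Solve PEAK * exp(-x**2 / 2) = t for x >= 0.
    return math.sqrt(-2.0 * math.log(t / PEAK))

def box_stack(x, levels):
    """Approximate pdf(x) as a sum of centred boxes, one per slice, each of weight PEAK / levels."""
    dt = PEAK / levels
    total = 0.0
    for k in range(levels):
        t = (k + 0.5) * dt          # midpoint height of this slice
        if abs(x) < half_width(t):  # x lies inside this slice's box
            total += dt
    return total

for x in (0.0, 0.5, 1.0, 2.0):
    print(x, round(box_stack(x, 50), 4), round(STD.pdf(x), 4))
```

The error never exceeds one slice height, `PEAK / levels`, so adding slices tightens the fit.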

Also, the slice sampler for MCMC is a trick for sampling from a distribution by "turning it on its side". But I don't think the 50% figure would be meaningful in that context. 21:16, 1 March 2019 (UTC)
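A minimal slice sampler for a standard normal, assuming the unnormalised target exp(-x²/2), where the slice can be solved in closed form; general-purpose slice samplers (e.g. Neal's stepping-out procedure) are needed when it cannot. Each iteration picks a random height under the curve at the current point, then draws the next point uniformly from the horizontal slice at that height. The function name is hypothetical.

```python
import math
import random

def slice_sample_std_normal(n, x0=0.0, seed=None):
    """Slice sampler for the unnormalised density f(x) = exp(-x**2 / 2)."""
    rng = random.Random(seed)
    x = x0
    out = []
    for _ in range(n):
        # 1. Vertical step: a height u uniform on (0, f(x)]; 1 - random() avoids u == 0.
        u = (1.0 - rng.random()) * math.exp(-x * x / 2.0)
        # 2. Horizontal step: the slice {x : f(x) > u} is the interval |x| < r.
        r = math.sqrt(-2.0 * math.log(u))
        x = rng.uniform(-r, r)
        out.append(x)
    return out

draws = slice_sample_std_normal(50_000, seed=1)
print(sum(draws) / len(draws))
```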

Pedant: etymologically, there *is* actually a connection between a normal (to a surface or line) and the normal distribution; the former comes from the Latin for a set square (giving you perpendicular), and it later came to mean "standard". The "tangential distribution" certainly fits the etymology of "odd/unusual" though. Fluppeteer (talk) 16:26, 1 March 2019 (UTC)

This reminds me of the difference between Riemann(-Stieltjes) and Lebesgue integration. 20:16, 1 March 2019 (UTC)
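The analogy can be made concrete numerically: a Riemann sum partitions the x-axis into vertical strips, while a Lebesgue-style sum partitions the y-axis and multiplies each level by the length of the set where the curve exceeds it, which is exactly a horizontal slice. A Python sketch for the standard normal density, with hypothetical helper names:

```python
import math
from statistics import NormalDist

STD = NormalDist()
PEAK = STD.pdf(0.0)

def riemann_area(a, b, n=10_000):
    """Vertical strips: partition the x-axis (midpoint rule)."""
    dx = (b - a) / n
    return sum(STD.pdf(a + (i + 0.5) * dx) for i in range(n)) * dx

def lebesgue_area(n=10_000):
    """Horizontal strips: partition the y-axis; each strip's width is the length of {x : pdf(x) > t}."""
    dt = PEAK / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        total += 2.0 * math.sqrt(-2.0 * math.log(t / PEAK)) * dt
    return total

print(riemann_area(-8.0, 8.0), lebesgue_area())
```

Both come out at (very nearly) 1, the total area under the curve.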

As the axes are not labeled (see comic 833), we could consider this a multivariate distribution where one parameter is uniform and the other is normal. That was my first thought when I saw this. 18:43, 1 March 2019 (UTC)
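Under that reading, horizontal lines are meaningful: if x is uniform and y is normal and the two are independent, lines at the quartiles of y enclose half of the points regardless of x. A Monte Carlo sketch in Python, assuming x ~ Uniform(0, 1) and y ~ Normal(0, 1) (both ranges are assumptions; the comic gives none):

```python
import random
from statistics import NormalDist

def share_between_horizontal_lines(n=100_000, seed=0):
    """Fraction of points between horizontal lines at the quartiles of y.

    Assumes x ~ Uniform(0, 1) and y ~ Normal(0, 1), independent.
    """
    rng = random.Random(seed)
    q1, q3 = NormalDist().inv_cdf(0.25), NormalDist().inv_cdf(0.75)
    points = [(rng.random(), rng.gauss(0.0, 1.0)) for _ in range(n)]
    # The band ignores x entirely, so the uniform coordinate cannot change the share.
    return sum(q1 < y < q3 for _, y in points) / n

print(share_between_horizontal_lines())
```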

Is there any meaning to "midpoint: 52.7%"? Maybe that is the arbitrary center he formed the horizontal bounds around? Maybe it relates to data? Is this a reference to something? It's certainly reminiscent of how normal distributions produce statistically meaningful numbers that have weird decimals in them (like the % represented by being within so many standard deviations). 19:45, 1 March 2019 (UTC)

Maybe it's because the statement "50% of the chart lies between these lines" becomes more or less useless for discerning error if the lines are not centered around the origin. 19:52, 1 March 2019 (UTC)
I might get it!!! The area between the lines is 52.7% of the total area, which means that 50% is technically included in what lies between them. 23:07, 1 March 2019 (UTC)
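To check a figure like 52.7% against the picture, you need the area of the part of the curve that lies between two horizontal lines. A Python sketch for a standard normal curve; the levels passed in are hypothetical, since the exact heights of Randall's lines are not given in this discussion, so it does not attempt to reproduce 52.7%:

```python
from statistics import NormalDist

STD = NormalDist()

def area_between_levels(lo, hi, x_max=8.0, n=20_000):
    """Area under the standard normal pdf lying between the horizontal lines y = lo and y = hi."""
    dx = 2 * x_max / n
    total = 0.0
    for i in range(n):
        f = STD.pdf(-x_max + (i + 0.5) * dx)
        # Portion of this column of height f that falls inside the band (lo, hi).
        total += max(0.0, min(f, hi) - min(f, lo)) * dx
    return total

print(area_between_levels(0.1, 0.3))  # hypothetical levels
```

Because the area depends only on the two levels, many different (lo, hi) pairs enclose the same fraction of the area.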

The correct way to do this is to have the topmost line at or above the peak of the normal curve. Then the bottom-most line would represent the same values as the vertical lines would. 23:32, 1 March 2019 (UTC)
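A quick check of where that lower line has to sit, assuming a standard normal curve: it must cross the curve at the quartiles ±0.674, i.e. at height pdf(0.674) ≈ 0.318. Python sketch:

```python
import math
from statistics import NormalDist

STD = NormalDist()
q3 = STD.inv_cdf(0.75)   # ~0.674: vertical bars for the middle 50% sit at +/- q3
level = STD.pdf(q3)      # height at which a horizontal line meets the curve at +/- q3

# Where a horizontal line at height `level` crosses the curve: solve pdf(x) = level for x >= 0.
crossing = math.sqrt(-2.0 * math.log(level / STD.pdf(0.0)))

# Area between that line and the curve (the rectangle under the line is excluded).
area_above_level = (STD.cdf(q3) - STD.cdf(-q3)) - 2 * q3 * level

print(level, crossing, area_above_level)
```

The crossing points match the vertical bars, though the area between that line and the curve (`area_above_level`) is only about 7% of the total, because the rectangle beneath the line is excluded.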