Difference between revisions of "2435: Geothmetic Meandian"

Explain xkcd: It's 'cause you're dumb.
{{comic
 
| number    = 2435
 
| date      = March 10, 2021
 
| title    = Geothmetic Meandian
 
| image    = geothmetic_meandian.png
 
| titletext = Pythagorean means are nice and all, but throwing the median in the pot is really what turns this into random forest statistics: applying every function you can think of, and then gradually dropping the ones that make the result worse.
 
}}
 
  
==Explanation==
 
 
There are a number of different ways to identify the '{{w|average}}' value of a series of values. The most common unweighted methods are the {{w|median}} (the central value of the ordered list if there is an odd number of values, or the value halfway between the two central values if there is an even number) and the {{w|arithmetic mean}} (add all the numbers up and divide by how many there are). The {{w|geometric mean}} is less well known to the layman; it multiplies the N values together and takes the Nth root of the product, which is useful for some statistical analyses.  The geometric mean, arithmetic mean and {{w|harmonic mean}} (not shown in the comic) are collectively known as the {{w|Pythagorean means}}.
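
As a rough illustration of these definitions, the sketch below computes each average for the comic's input (1, 1, 2, 3, 5) using Python's standard <code>statistics</code> module. It assumes Python 3.8 or later (for <code>geometric_mean</code>); the variable names are chosen here for illustration only.

```python
from statistics import geometric_mean, harmonic_mean, mean, median

data = [1, 1, 2, 3, 5]

arithmetic = mean(data)           # sum divided by count: 12 / 5
middle = median(data)             # central value of the sorted list
geometric = geometric_mean(data)  # fifth root of the product 30
harmonic = harmonic_mean(data)    # count divided by the sum of reciprocals

print("arithmetic mean:", arithmetic)
print("median:         ", middle)
print("geometric mean: ", geometric)
print("harmonic mean:  ", harmonic)
```

For positive inputs that are not all equal, the harmonic mean is the smallest of the three Pythagorean means and the arithmetic mean the largest, which this input also shows.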
 
 
{{w|Outlier}}s and internal biases within the original sample can skew any single 'average' that a set of values is boiled down to, so the choice of method may produce a misleading value.
 
 
<!-- Either here or after the next paragraph, demonstrate how (1,1,2,3,5) resolves in each individual method, perhaps? -->
 
 
In this depiction, the three named methods of averaging are embedded within a single function that produces a sequence of three values - one output for each of the methods. Being a series of values, Randall suggests that this is ideally suited to being ''itself'' subjected to the comparative 'averaging' method. Not just once, but as many times as it takes to narrow down to a sequence of three values that are very close to one another.
 
 
The comment in the title text suggests that this will save you the trouble of committing to the 'wrong' analysis, as it gradually shaves down any 'outlier average' that is unduly affected by anomalies in the original inputs. For positive inputs it is a method without any danger of divergence, since all three averaging methods stay within the interval covering the input values (and two of them, the arithmetic and geometric means, stay strictly within that interval unless all the values are equal).
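
The claim that the values never leave the interval of the inputs can be checked numerically. The following is a minimal sketch, not a proof: it assumes positive inputs and Python 3.8 or later, and the names <code>F</code>, <code>history</code>, <code>lo</code> and <code>hi</code> are illustrative choices rather than anything taken from the comic.

```python
from statistics import geometric_mean, mean, median


def F(values):
    """One application of the comic's F: (arithmetic mean, geometric mean, median)."""
    return (mean(values), geometric_mean(values), median(values))


values = (1, 1, 2, 3, 5)
lo, hi = min(values), max(values)  # interval covering the original inputs

history = []
for _ in range(50):  # hard cap so the loop always terminates
    values = F(values)
    history.append(values)
    if max(values) - min(values) < 1e-9:
        break

# Show how quickly the triple tightens over the first few applications.
for step, triple in enumerate(history[:5], start=1):
    print(step, triple)

print("converged value:", values[0], "after", len(history), "applications")
```

Every triple printed lies between 1 and 5, and the gap between its largest and smallest entry shrinks with each application.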
 
 
The title text may also be a sly reference to an actual mathematical theorem, namely that if one performs this procedure using only the arithmetic mean and the harmonic mean, the result will converge to the geometric mean. Randall suggests that the (non-Pythagorean) median, which does not have such good mathematical properties with regard to convergence, is, in fact, the secret sauce in his definition.
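
The arithmetic-harmonic result can be tried out for the simplest case of two positive numbers. The sketch below assumes exactly that case; the function name <code>arithmetic_harmonic_mean</code> and its tolerance parameters are made up for this example.

```python
from math import sqrt


def arithmetic_harmonic_mean(a, b, tol=1e-12, max_iter=100):
    """Repeatedly replace (a, b) with their arithmetic and harmonic means.

    Assumes a and b are positive. The product a * b is unchanged by each
    step, so the common limit is sqrt(a * b), the geometric mean.
    """
    for _ in range(max_iter):
        if abs(a - b) <= tol:
            break
        a, b = (a + b) / 2, 2 * a * b / (a + b)
    return (a + b) / 2


result = arithmetic_harmonic_mean(1.0, 5.0)
print(result, sqrt(1.0 * 5.0))  # the two printed values should agree closely
```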
 
 
There does exist an {{w|arithmetic-geometric mean}}, which is defined in the same way as this except that it uses only the arithmetic and geometric means, and which sees some use in calculus.  In some ways the comic's method is also philosophically similar to the {{w|truncated mean}} (the extremities of the value range, e.g. the highest and lowest 10%, are discarded and not counted) or the {{w|Winsorized mean}} (instead of being discarded, those values are readjusted to the chosen floor or ceiling value that they lie beyond, so that they still count as 'edge' conditions), only with a strange dilution-and-compromise method rather than one where quantities can be culled or neutered just for being unexpectedly different from most of the other data.
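
For comparison, here is a minimal sketch of the arithmetic-geometric mean of two positive numbers. The function name <code>agm</code> and the stopping rule are assumptions made for this example, not a reference implementation.

```python
from math import sqrt


def agm(a, b, tol=1e-15, max_iter=100):
    """Arithmetic-geometric mean of two positive numbers.

    Each step replaces (a, b) with their arithmetic and geometric means
    until the two values agree to within a relative tolerance.
    """
    for _ in range(max_iter):
        if abs(a - b) <= tol * max(a, b):
            break
        a, b = (a + b) / 2, sqrt(a * b)
    return (a + b) / 2


value = agm(1.0, 2.0)
print(value)  # lies between the geometric mean sqrt(2) and the arithmetic mean 1.5
```

The two-value iteration settles after only a handful of steps, whereas the comic's three-value version with the median needs noticeably more.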
 
 
The following Python code (inefficiently) implements the above algorithm:
 
 
<pre>
from functools import reduce
from itertools import count


def f(*args):
    """Return the (arithmetic mean, geometric mean, median) of the arguments."""
    args = sorted(args)
    mean = sum(args) / len(args)
    # Geometric mean: nth root of the product (assumes positive inputs).
    gmean = reduce(lambda x, y: x * y, args) ** (1 / len(args))
    if len(args) % 2:
        median = args[len(args) // 2]
    else:
        median = (args[len(args) // 2] + args[len(args) // 2 - 1]) / 2
    return mean, gmean, median


l0 = [1, 1, 2, 3, 5]
l = l0
# Apply f repeatedly until all values agree to within 1e-8.
for iterations in count():
    fst, *rest = l
    if all(abs(r - fst) < 1e-8 for r in rest):
        break
    l = f(*l)
print(l[0], iterations)
</pre>
 
 
And here is an implementation of the Gmdn function in R:
 
 
    Gmdn <- function (..., threshold = 1E-6) {
      # Function F(x) as defined in comic
      f <- function (x) {
        n <- length(x)
        return(c(mean(x), prod(x)^(1/n), median(x)))
      }
      # Extract input vector from ... argument
      x <- c(...)
      # Iterate until the standard deviation of f(x) reaches a threshold
      while (sd(x) > threshold) x <- f(x)
      # Return the mean of the final triplet
      return(mean(x))
    }
 
 
The input sequence of numbers (1,1,2,3,5) chosen by Randall is also the opening of the {{w|Fibonacci sequence}}.  This may have been selected because the Fibonacci sequence also has a convergent property: the ratio of two adjacent numbers in the sequence approaches the [https://en.wikipedia.org/wiki/Golden_ratio#Relationship_to_Fibonacci_sequence golden ratio] as the length of the sequence approaches infinity.
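
The convergence of the Fibonacci ratios is easy to see numerically. The sketch below starts the sequence at 1, 1 as in the comic's input; the helper name <code>fibonacci_ratios</code> is hypothetical.

```python
from math import sqrt


def fibonacci_ratios(count):
    """Return the first `count` ratios of adjacent Fibonacci numbers (1, 1, 2, 3, 5, ...)."""
    a, b = 1, 1
    ratios = []
    for _ in range(count):
        ratios.append(b / a)
        a, b = b, a + b
    return ratios


golden_ratio = (1 + sqrt(5)) / 2
ratios = fibonacci_ratios(40)

# Print the first few ratios with their distance from the golden ratio.
for ratio in ratios[:8]:
    print(ratio, abs(ratio - golden_ratio))
```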
 
 
==Transcript==
 
{{incomplete transcript|Do NOT delete this tag too soon.}}
 
 
F(x1, x2, ... xn) = ( (x1 + x2 + ... + xn)/n [bracket: arithmetic mean], nth root of (x1 x2 ... xn) [bracket: geometric mean], x((n+1)/2) [bracket: median] )
 
 
Gmdn(x1,x2,...xn)={F(F(F(...F(x1,x2,...xn)...)))[bracket: geothmetic meandian]}
 
 
Gmdn(1,1,2,3,5) [approximately equals sign] 2.089
 
 
Caption: Stats tip: If you aren't sure whether to use the mean, median, or geometric mean, just calculate all three, then repeat until it converges
 
 
{{comic discussion}}
 
<!--
 
For a start, there is a syntax error. After the first application of F, you get a 3-tuple. Subsequent iterations preserve the 3-tuple, and we need to analyze the resulting sequence.
 
Perhaps there is an implicit claim that all three entries converge to the same result. In any case, let's see what we get:
 
 
Wlog, we have three inputs (x_1,y_1,z_1), and want to understand the iterates of the map
 
F(x,y,z) = ( (x+y+z)/3, cube root of (xyz), median(x,y,z) ). Let's write F(x_n,y_n,z_n) = (x_{n+1},y_{n+1},z_{n+1}).
 
 
The inequality of arithmetic and geometric means gives x_n \geq y_n, if n \geq 2,  and
 
-->
 
 
[[Category:Math]]
 
[[Category:Statistics]]
 

Revision as of 14:00, 11 March 2021