Revision as of 01:57, 11 March 2021

Geothmetic Meandian

Pythagorean means are nice and all, but throwing the median in the pot is really what turns this into random forest statistics: applying every function you can think of, and then gradually dropping the ones that make the result worse.

Title text: Pythagorean means are nice and all, but throwing the median in the pot is really what turns this into random forest statistics: applying every function you can think of, and then gradually dropping the ones that make the result worse.

Explanation

This explanation may be incomplete or incorrect: Created by an AVERAGE JOE. Please mention here why this explanation isn't complete. Do NOT delete this tag too soon.
If you can address this issue, please edit the page! Thanks.

There are a number of different ways to identify the 'average' value of a series of values, the most common unweighted methods being the median (take the central value from the ordered list of values - or the value half-way between the two that straddle the divide between two halves) and the arithmetic mean (add all the numbers up, divide by the number of numbers). The geometric mean is less well known to the layman but works with multiplication and Nth rooting, useful for some statistical analyses.

Outliers and internal biases within the data can make boiling down a set of sample values into a single 'average' sometimes quite misleading, with your choice of average to use perhaps resulting in a misleading value.

In this depiction, the three named methods of averaging are embedded within a single function that produces a set of three values - one output for each of the methods. Being a series of values, Randall suggests that this is ideally suited to being itself subjected to the comparative 'averaging' method. Not just once, but as many times as it takes to narrow down to a consistent value (or trio of similarly consistent values?).

The comment in the Title Text suggests that this will save you the trouble of committing to the 'wrong' analysis as it gradually shaves down any 'outlier average'. At first sight it does appear to be a method without any danger of (permanent) divergence of values, but this may need to be proven (or disproven) to be sure.

In some ways this philosophy may be considered similar to the Truncated Means (extremities of the value range, e.g. the highest and lowest 10%s, are ignored as not acceptible and not counted) or Winsorised Means (instead of ignored, the values are readjusted to be the chosen floor/ceiling values that they lay beyond, to still effectively be counted as 'edge' conditions), only with a strange dilution-and-compromise method rather than one where quantities can be culled or neutered just for being unexpectedly different from most of the other data.

Transcript

This transcript is incomplete. Please help editing it! Thanks.

add a comment! ⋅

add a topic (use sparingly)! ⋅

refresh comments!

Discussion

Oh, this one's good. Just checked in (no, I wasn't hovering over the refresh button, my first visit today!) and one glance had me in paroxysms of laughter. But how to explain it? Gonna have to think about that. 141.101.98.96 01:12, 11 March 2021 (UTC)

I made a really bad spreadsheet to understand better how it works: https://docs.google.com/spreadsheets/d/1fqmHwDmirJrsKPdf94PutFDw31DMAYxNeR7jef1jneE/edit?usp=sharing

Someone fix my awful transcript edits please. --Char Latte49 (talk) 02:31, 11 March 2021 (UTC)

Seeing the Python added to the Explanation, try this Perl (typed straight here, so not tested)...

## Your prefered variations of "#!/usr/bin/perl", "use strict;" and "use warnings;" here! ##
sub F { my (@vals)=@_; my $invVals=1/int(@vals);
 my ($geo,$arith,$med)=(1); # Only defining $geo, so first *= works correctly!
 while (@vals) { my($lo,$hi)=(shift @vals,pop @vals); # $hi may be undef - this is intended!
  $arith+=$lo; $geo*=$lo; unless (defined $hi) {  $med =  $lo;     last }
  $arith+=$hi; $geo*=$hi; unless (@vals)       { ($med)=F($lo,$hi)      }
 }
 return ($arith*$invVals, $geo**$invVals, $med);
}
sub GMDN { my (@vals)=sort @_; my $lim=10**(-5); # Adjust $lim to taste...
  return "Error: No vals!" unless  @vals; # Catch!
  return $vals[0]          unless ($vals[$#vals]-$vals[0]) > $lim;
  return GMDM(F(@vals));
}
my @test=(1,1,2,3,5);
print "Values:              @test\nGeothmetic Meandian: ".GMDN(@test)."\n";

...debugged in my head, so probably fatally flawed but easily fixed/adapted anyway. 141.101.99.109 03:04, 11 March 2021 (UTC)

Why so complicated?

perl -e 'use strict; use warnings; sub F { my ($s,$p) = (0,1); my @srt = sort {$a<=>$b} @_; for (@_) { $s += $_; $p *= $_; } return ($s/@_,$p**(1/@_),$srt[$#_/2]); } sub Gmdn { print join(", ",@_=F(@_)),"\n" for 0..20; return @_; } print join(", ",Gmdn(1,1,2,3,5)),"\n";'

(With interim results) SCNR -- Xorg (talk) 03:18, 11 March 2021 (UTC)

I can read your version (and I see you do explicit {$a<=>$b}, which indeed may be necessary in mine for real use, along with additional sanity checks, I will check later) but I wanted to make mine neat, and slightly tricksy in implementation, but still not quite so entirely obfuscated to the more uninitiated. TIMTOWTDI, etc, so I like your (almost) bare-bones version too. ;)

(Is 20 cycles enough to converge in sufficiently extreme cases? Won't give "Too deep" error, though, even if it takes at least that long. There's a definite risk that mine might, as written.) 141.101.99.229 03:45, 11 March 2021 (UTC)

Given the lack of precision in Randall's example usage, I think 20 cycles ought to be enough for everyone ;-P. I'm trying to prove that the interval's size has to shrink by somewhat close to a factor of 1/2 every cycle, but it's tricky and it's late. If I can assume a factor of 1/2 in the long run, 64 iterations should pin down a 64-bit float.

I actually didn't try to obfuscate, I was just too lazy to type more ;-). Otherwise I might have left out the "return"s and passing parameters at all. -- Xorg (talk) 04:21, 11 March 2021 (UTC)

I find the one-liner more readable: it's straightforward and pretty minimal. For what its worth, here's my version:

perl -MList::Util=sum,product -E 'sub F { (sum @_)/@_, (product @_)**(1/@_), (sort { $a <=> $b } @_)[$#_/2] } $, = " "; say @v = @ARGV; say @v = F(@v) for 1..30' 1 1 2 3 5

30 iterations is enough for the numbers to display identically on this system (to 14 decimal places). I think it's even cleaner in Raku (formerly Perl 6):

raku -e 'sub F(@d) { @d.sum/@d, [*](@d)**(1/@d), @d.sort[@d/2] }; say my @v = +«@*ARGS; say @v = F(@v) for 1..33' 1 1 2 3 5

On this system, Rakudo yields an additional decimal place, which takes another 3 iterations to converge. Smylers (talk) 06:53, 11 March 2021 (UTC)

Side-thought: is GMDN (nowhere near as logical an ETLA contraction of the title term as, say, 'GMMD' or 'GTMD') actually an oblique reference to the GNDNs as popularised/coined by Trek canon? Worth a citation/Trivia? 162.158.158.97 04:12, 11 March 2021 (UTC)

Besides of nerdgasm is there some reason why the program code is relevant for the explanation? Elektrizikekswerk (talk) 08:55, 11 March 2021 (UTC)

Apparently not. I moved it to the trivia section. Elektrizikekswerk (talk) 07:51, 12 March 2021 (UTC)

I do not agree with the statement that "The title text may also be a sly reference to an actual mathematical theorem, namely that if one performs this procedure only using the arithmetic mean and the harmonic mean, the result will converge to the geometric mean." Could one produce a reference to this result? A simple computer experiment does not show this "theorem" to be true, i.e. for the procedure to return the geometric mean of the original entry. Pointfivegully (talk) 15:04, 12 March 2021 (UTC)

= Here is a reference https://math.stackexchange.com/questions/1734978/computing-square-roots-with-arithmetic-harmonic-mean. You must have made a mistake in your computer experiment. As an example, see the details I posted about Cueball and Megan exchanging dollars and euros. This is actually a handy way to compute square roots by hand.

Proof of convergence

Can any of you come up with a mathematical proof that repeated application of F on a set of (say) positive real numbers is guaranteed to converge toward a single real number, i.e. that the GMDN of a set of positive real numbers is well-defined?

One observation I've made is that if you consider that maximum and minimum numbers in the original set to be x1 and xn (without loss of generality), something we know for sure is that AM(x1, ..., xn), GM(x1, ..., xn) and Median(x1, ..., xn) are all at least x1 and at most xn that is to say...

x1 <= AM(x1, ..., xn), GM(x1, ..., xn), Median(x1, ..., xn) <= xn

So range(AM(x1, ..., xn), GM(x1, ..., xn), Median(x1, ..., xn)) is necessarily <= range(x1, ..., xn).

And given that we know that unless x1, ..., xn are all equal, that x1 < AM(x1, ..., xn) < xn, we have an even stricter result (unless x1, ..., xn are all equal) that is range(AM(x1, ..., xn), GM(x1, ..., xn), Median(x1, ..., xn)) < range(x1, ..., xn).

So, it's clear that range(x1, ..., xn) > range(F(x1, ..., xn)) > range(F(F(x1, ..., xn))) > range(F(F(F(x1, ..., xn)))) > ... and it's also clear that all of these ranges are >= 0. There is a result in number theory that says that any infinite sequence of real numbers which monotonically decreases and is bounded from below converges.

So we know for sure that range(F(F(...F(x1, ..., xn)...))) converges but we still have to show that it converges to 0 to show that the GMDN converges to a single real number.

I'm not sure how to proceed. Does anyone have any ideas?

EDIT: I just noticed that unless x1, ..., xn are all equal, AM(x1, ..., xn) is at least ((n-1)/n) * range(x1, ..., xn) away from both x1 and xn. So not only do we have that range(x1, ..., xn) > range(F(x1, ..., xn)) from before, but we also have that ((n-1)/n) * range(x1, ..., xn) >= range(F(x1, ..., xn)). This guarantees that that the range falls exponentially on repeated applications of F. So it's certain that the the range ultimately converges to 0, and hence that the GMDN is well-defined.

It might be a good idea for someone to concretely present this idea as a proof on Page.

See my additional notes below. -Ramakarl

172.69.135.44 05:07, 11 March 2021 (UTC) Anirudh Ajith

That doesn't quite work as it stands, since proving AM is that distance away does not say anything about the other two averages. I think it's true, but a little more rigour is required. 141.101.98.120 09:17, 11 March 2021 (UTC)

When trying this myself I first arrived at 2.082, not 2.089. What threw me off was the incomplete formula for the median, which only works with sorted lists. The three values returned by F(...) aren't necessarily sorted. 141.101.76.194 09:49, 11 March 2021 (UTC)

First: almost all invocations are with exactly 3 arguments (The output of the previous invocation), so we don't have to deal with N inputs at all.
Notation: In iteration n we have the values min[n] <= mid[n] <= max[n] (in any order) and can compute AM[n], GM[n] (and median[n] = mid[n]).
Let Q[n] := max[n]/min[n] >= 1, R[n] := max[n]-min[n] = (Q[n]-1)*min[n].
We already established that R is decreasing and min is increasing, so Q is decreasing.

Theorem: There is an n0 with R[n+1] <= R[n]*2/3 for all n > n0.

Proof (by case discrimination for each n):
case 1: mid[n+1] != AM[n]:
    R[n+1] <= Max(max[n]-AM[n],AM[n]-min[n]) 
            = Max(max[n]*3-(max[n]+mid[n]+min[n]),(max[n]+mid[n]+min[n])-min[n]*3)/3
            = Max(max[n]*2-(mid[n]+min[n]),(max[n]+mid[n])-min[n]*2)/3
           <= (max[n]-min[n])*2/3
            = R[n]*2/3
    Hence: R[n+1] <= R[n]*2/3

case 2: mid[n+1] == AM[n]:
  because GM <= AM: min[n+1] = GM[n], max[n+1] = mid[n]
  Q[n+1] = mid[n]/GM[n]
         = (mid[n]^3/(max[n]*mid[n]*min[n]))^(1/3)
         = (mid[n]^2/(max[n]*min[n]))^(1/3)
        <= (mid[n]/min[n])^(1/3)
        <= Q[n]^(1/3)
  R[n+1] = (Q[n+1]-1)*min[n+1]
        <= (Q[n]^(1/3)-1)*GM[n]
        <= (Q[n]^(1/3)-1)*(max[n]^2*min[n])^(1/3)
         = (Q[n]^(1/3)-1)*Q[n]^(2/3)*min[n]
         = (Q[n]-Q[n]^(2/3))*min[n]
         = R[n]-(Q[n]^(2/3)-1)*min[n]
        <= R[n]-(Q[n]-1)*min[n]/(Q[n]^(1/3)+1))
         = R[n]-R[n]/(Q[n]^(1/3)+1)
         = R[n]*(1-1/(Q[n]^(1/3)+1))
  Now we can pick a q1 = Q(n1) with q1 > Q[n] >= 1 for n > n1 because Q is decreasing:
    R[n+1] <= R[n]*(1-1/(q1^(1/3)+1))
  
  Together with case 1, this gives R -> 0 and thus Q -> 1. So we can pick another q0 = Q(n0) with q0 <= 8:
    R[n+1] <= R[n]*(1-1/(q0^(1/3)+1)) <= R[n]*2/3

-- Xorg (talk) 17:34, 11 March 2021 (UTC)

Better Python implementations

I'd like to add a somewhat more compact Python implementation based on the numpy module.

import numpy as np

def F(x):
   return np.mean(x), np.exp(np.log(x).mean()), np.median(x)

def GMDN(x, tolerance=1e-6):
   while np.std(x) > tolerance:
       x = F(x)
   return x[0]

gmdn = GMDN([1, 1, 2, 3, 5])
print(gmdn)

--Lvdgraaff (talk) 10:42, 11 March 2021 (UTC)

No need for numpy, there's the statistics module in the stdlib

import math
import statistics

def F(*nums):
    return (
        statistics.mean(nums),
        statistics.geometric_mean(nums),
        statistics.median(nums),
    )

def GMDN(*nums):
    while not math.isclose(max(nums), min(nums)):
        nums = F(*nums)
    return nums[0]

gmdn = GMDN(1, 1, 2, 3, 5)
print(gmdn)

For something as simple as this, I always find it cheating to use a package to abstract away the few actually necessary calculations. You might as well use a DWIM module and do 'result = DWIM(input)' as the sole command. But that's me for you. I'd write my own direct-to-memory screen RAM accesses, if silly things like OS HALs and GPU acceleration (once you find a way to message them as directly as possible) hadn't long since made that pretty much moot, if not actually verboten... 141.101.99.109 17:53, 11 March 2021 (UTC)

I'd like to add my own implementation:

from math import *
def getMeans(n=list):
    n.sort()
    mean=sum(n)/len(n)
    if len(n)%2==1:
        median=n[len(n)//2]
    else:
        median=(n[1+floor(len(n)/2)]-n[floor(len(n)/2)])/2
    prod=1
    for i in n:
        prod*=i
    gmean=prod**(1/len(n))
    return [mean,median,gmean]
def gmdn(tol,n=list):
    mList=n
    mList=getMeans(mList)
    while not (isclose(mList[0],mList[1],rel_tol=tol) and isclose(mList[1],mList[2],rel_tol=tol) and 
isclose(mList[0],mList[2],rel_tol=tol) ) :
        mList = getMeans(mList)
    return (mList[0]+mList[1]+mList[2])/3
print(gmdn(1e-15,[1,1,2,3,5]))

which gives me 2.089057949736859

Sloppy notation?

As a mathematician, I immediately noticed a couple of annoying niggles. Firstly, it is only implied, but never clearly stated, that the input list is ordered - which means the median is wrong unless ordered. Now F outputs an ordered triple of real numbers, and in calculating G, this is fed in to F again directly. This will frequently give inputs that are not in order, and in subsequent iterations the "median" will always be the middle number - i.e. the geometric mean - regardless of the actual median. Secondly, Randall's final line gives the output of G as a single number, but as it is just the result of a repeated application of F, the output of G should be an ordered triple. I'm sure Randall is aware of both, and chose to cut out the implied ordering of the inputs and choosing one of the three values as the output of G as they aren't necessary for the joke, but maybe we should note something about this in the explanation.141.101.99.109 13:07, 11 March 2021 (UTC)

First, I've never seen a definition of median which doesn't account for ordering itself, although I am a little annoyed at his definition for a different reason— that it doesn't account for even-length lists. Second, what I got from the comic initially is that G_MDN is supposed to be a single number, specifically that value for which all three of its elements would become equal if implemented infinitely many times (and it will converge, because if the three elements are all the same it already has converged, and if at least two are different, both means will necessarily become greater than the least value and smaller than the greatest value due to the definition of 'mean'). Another annoyance I noted is that G_MDN is real iff there are an even number of negative numbers and/or the length of the initial list is odd, but I suppose that can't be helped. Ooh, complex meandianing! BlackHat (talk) 15:15, 11 March 2021 (UTC)

I guess it depends on whether we take x_{(n+1)/2} literally or not. If taken literally, the elements need to be sorted, and even-length lists don't work. If we assume the "median" is the literal part, and the notation is just a simplified sketch, then both are okay (median of even list is average of the two "middle" elements). 172.68.143.160 22:00, 12 March 2021 (UTC)

Is the arithmetic-geometric mean connected to geometry?

In the case that only the arithmetic and geometric mean are used the combined arithmetic-geometric mean can be interpreted as the radius R of a circle which has the same circumference as an ellipse with half axes a and b.

R = M(a,b)

Can the Geothmetic Meandian be interpreted are the radius R of a 3D sphere which has the same surface as an elliptic cylinder with half-axes a and b and length c ?

R = GMDN(a,b,c)

Here is a geometric interpretation of the inequality between them: Inequality_of_arithmetic_and_geometric_means#Geometric_interpretation. DanLoeb (talk) 19:31, 14 March 2021 (UTC) D--DanLoeb (talk) 19:31, 14 March 2021 (UTC)

The RandallMunroe Set

Here is some MATLAB code (sorry) to generate an image showing the number of cycles required to converge, a' la' the Mandelbrot Set. Not sure how to post an image here, but it is really cool.

   % RandallMunroeSet.m
   % From a suggestion by Randall Munroe in XKCD #2435 Mar 10 2021
   % new statistic GMDN(x) = [mean(x), geomean(x), median(x)]
   % calculation is recursive, ending when converged
   % here we count the cycles required to converge and plot a' la' Madelbrot Set
   % the initial X input can be any length vector, but we restrict to 3 space
   % here for visualization, and fix Z so we get a 2D image
   % so far, for positive values, it converges in less than 40 or not at all
   % for negative x, set max cycles to something larger like 60
   % I haven't plotted it, but there is logically another set that plots the
   % resulting converged value.
   % Explore!
   % (c)2021 CC BY-NC 2.5 [email protected] peace, love, trees

   % here we answer the question, how many cycles does it take for GMDN to
   % converge?
   maxcycles = 40; stepsize = .0025;
   z = 1; % pick a Z, any Z
   x = stepsize:stepsize:(4-stepsize); % explore a range of x and y
   y = x; 
   wbh = waitbar(0);
   RMS = zeros(numel(x),numel(y),numel(z)); % no, not root mean square, this is the Randall Munro Set!

   for idx = 1: numel(x)
       waitbar(idx ./ numel(x)); % feedback on progress
       for jdx = idx:numel(y) % result is symmetric across the diagonal, so we save time by computing above the diagonal

           for kdx = 1:numel(z)
               RMS(idx, jdx, kdx)  = gmdn([x(idx),y(jdx),z(kdx)], maxcycles);
               RMS(jdx, idx, kdx) = RMS(idx, jdx, kdx); % copy across the diagonal
           end
       end
   end
   close(wbh)
   RMS = min(maxcycles,RMS);

   if numel(z) == 1;
       figure(420);
       image(255*RMS./maxcycles);truesize; colormap(jet(256));
   end
   if numel(z) == 3; % allow for true color, but in practice it is so sensitive to initial z value it just gives three different sets unless the Zs are VERY CLOSE
       figure(420);image(RMS./maxcycles);truesize;
   end
   title(['RandallMunro Set Z = ' num2str(z)]);

   figure(3);hist(RMS(:),0:maxcycles);
   %% How many cycles to converge?
   function ncycles = gmdn(x, maxcycles)
   ncycles = 0;
   while ncycles < maxcycles
       ncycles = ncycles + 1;
       x = [mean(x), geomean(x), median3(x)];
       if all(x(1) == x(2:3))
           break
       end
   end

end

   %% Geometric Mean
   function result = geomean(x)
   result = prod(x) .^ (1/numel(x));
   end

   %% Slightly faster median than builtin MATLAB function
   function result = median3(x)
   y = sort(x);
   result = y(2);
   end

My python program:

from math import *
def getMeans(n=list):
    n.sort()
    mean=sum(n)/len(n)
    if len(n)%2==1:
        median=n[len(n)//2]
    else:
        median=(n[1+floor(len(n)/2)]-n[floor(len(n)/2)])/2
    prod=1
    for i in n:
        prod*=i
    gmean=prod**(1/len(n))
    return [mean,median,gmean]
def gmdn(tol,n=list):
    mList=n
    mList=getMeans(mList)
    while not (isclose(mList[0],mList[1],rel_tol=tol) and isclose(mList[1],mList[2],rel_tol=tol) and 
isclose(mList[0],mList[2],rel_tol=tol) ) :
        mList = getMeans(mList)
    return mList[0]
print(gmdn(1e-15,[1,1,2,3,5]))

It gave me 2.0890579497368584

Proof - Possibly by Induction

Earlier question: > Can any of you come up with a mathematical proof that repeated application of F on a set of (say) positive real numbers is guaranteed to converge toward a single real number

Define:
F(n) = {An,Bn,Cn}
F(n+1) = {An+1, Bn+1, Cn+1} = {ave(An,Bn,Cn), geomean(An,Bn,Cn), median(An,Bn,Cn)}
R(n) = range of F = max(An,Bn,Cn)-min(An,Bn,Cn), for iteration n
We want to show that the range R(n) converges to 0.
With the following notation: max(n) == max(An,Bn,Cn), ave(n)==ave(An,Bn,Cn), ..
We observe the following emperically for many different inputs:
R(n) = max(n)-min(n)
CASE 1: max(n)=ave(n), THEN max(n+1)=median(n+1)=geomean(n) AND min(n+1)=geomean(n+1)
In this case max(n+1) is fixed to a previous value, the geomean(n), and min(n+1) takes on the new geomean(n+1) which is guaranteed to reduce the range R(n) as min(n) < geomean(n+1) < max(n). It also implies case 2 must be invoked because min(n+1)=geomean(n+1) at n+1.
CASE 2: min(n)=geomean(n), THEN max(n+1)=ave(n+1) AND min(n+1)=median(n+1)=ave(n)
In this case min(n+1) is fixed to a previous value, the ave(n), and max(n+1) takes on the new ave(n+1) which is guaranteed to reduce the range R(n) as min(n) < ave(n+1) < max(n). It also implies case 1 must be invoked because max(n+1)=ave(n+1) at n+1.

Each case forces the range to be reduced while also forcing the alternate case on the next iteration.

In other words, the maximum at each iteration alternates between the average and the median, and the minimum alternates between the geomean and the median. Thus either the minimum or the maximum at n+1 are always converging away from the minimum and maximum at previous n.
While this is not a formal proof, since the initial observations are emperical, I believe that a proof-by-induction can be built based on the oscillating convergence (without the need for F to be differentiable).

Ramakarl (talk) 00:00, 12 March 2021 (UTC)

How can this be formulated as a PDE when F isn't even differentiable?

Besides, R(Fn+1) < R(Fn) does not imply limit R(Fn) = 0 (Think R(n) := 1+1/n). -- Xorg (talk) 02:50, 12 March 2021 (UTC)

Agreed. F is not differentiable due to median. For arbitrary R(n) such as R(n)=1+1/n then limit R(Fn) != 0, however I do not define R(n) arbitrarily but define it as R(n)=max(An,Bn,Cn)-min(An,Bn,Cn) Ramakarl

User:snark This has nothing to do with a PDE or the heat equation. It is an iterative map from R^3 to R^3 (after the first application of F). In order to prove it converges you need to show that there is a fixed point and that the mapping takes you closer to it. The fixed points are easy since F((x,x,x))=(x,x,x) so there is a line of fixed points. You can then calculate the perpendicular distance between the starting point (x1,x2,x3) and the line given by (x(t),x(t),x(t)). Next you calculate the distance between f((x1,x2,x3)) and the line and show that is is less than the first distance.

Agreed. I've removed reference to PDE or Heat equation except for the indirect similarity. The new suggestion for proof, which is incomplete, is inductive based on the observed alternation of the min/max between the geomean and arithmean. I believe it can be shown that for some N, there exists F(N)=k, where k=min(N)=max(N)=arithmean(N)=geomean(N)=median(N) within some epsilon, and k is the fixed point. This is because the min(n+1), median(n+1) and max(n+1) alternate between arithmean(n) and geomean(n), which are strictly inside the open interval ( min(n), max(n) ). Ramakarl

I believe we can produce a simpler, rigorous proof. Assuming a set of three is given, we can show that after every 2 iterations, the range is reduced by at least 1/3 of its original value, and therefore it converges exponentially to 0. We use the fact that each iteration, none of the three values will lie outside the range of the previous iteration. In addition, it can be shown that the arithmean lies at least 1/3 of the previous range away from the highest and lowest values of the previous iteration.

If the arithmean is the highest or lowest value on the first iteration, then the range will therefore already be small enough (and won't get bigger in the second iteration.) Otherwise, the only remaining option is that it is the middle (median) value. So on the second iteration, both the median and the arithmean are within the reduced 1/3 range, and at least one of them must be the highest or lowest value. The range will always be the required size.

Edit: Note that this proof holds only assuming the values are nonnegative. Some sets of values including negative ones, such as gmdn(-4,-4,1), do not converge. 141.101.98.16

Why is this funny?

Wow, paragraphs and paragraphs of explanation, and calculations, and computer code describing everything about the XKCD comic. I am impressed with how much people know. After all that explanation, can anyone tell me if there is anything comical about this comic? Aside from the fact that Randal is combining formulas that don't usually get combined, is there anything here that strikes anyone as funny? The previous one about people asking absurd questions about what they could do after they are vaccinated had me laughing out loud. Can anyone tell me that they laughed at this comic and what was funny? Thanks. Rtanenbaum (talk) 01:56, 12 March 2021 (UTC)

YMMV, but I found it funny because I just spent the last fortnight teaching how to find mean (and median, and quartiles for that matter) to 15/16yrolds. And they found that hard enough. I did not inform them of Geometric mean. I guess it's funny to me because it's such a long reach. Thisfox (talk) 02:48, 12 March 2021 (UTC)

No, the joke is quite clearly explained in the text below the formula: "Pro Tip: If in doubt just mash them together". Elektrizikekswerk (talk) 07:53, 12 March 2021 (UTC)

As I'm currently supposed to be working someone else should please add this with a proper formulation. I just re-added the incomplete tag. Elektrizikekswerk (talk) 08:00, 12 March 2021 (UTC)

@Rtanenbaum funny you should give previous comic as example, as it is funny for exactly the same reason: using absurdity. The only difference is *most* people will get it why it is absurd to ride bicycle down the stairs in someones house (while it is OK to use bike outside, and it is OK to visit if you're vaccinated and thus use the stairs in someones house, BUT it is combining those unrelated activities that is absurd). Same thing here, only it requires some math background: using median has its uses, as does using geometric and arithmetic means, but it is combining them in this fashion that is absurd. And especially the recommendation to "mash mathematical functions you obviously don't understand as substitute to choosing correct one" is absurd. It is like you don't know you have to ADD prices of items on your receipt to calculate the total, so someone recommended you to use some random combination of mathematical operations to calculate the total. (with a added twist that suggested combination would return some result which is not far off the calculation). In addition, the fact that some people do not understand why it is funny (so might take such absurd recommendation seriously) makes it even more funny.--172.68.221.46 09:49, 12 March 2021 (UTC)

Apparantly someone deleted the tag again without giving a further explanation... I will undo this change. Elektrizikekswerk (talk) 09:58, 12 March 2021 (UTC)

'Twas not I, but note that this is 'explainxkcd' not 'explainwhyxkcdisfunny'. I think we both recognise that a cornucopia of details have been explained. It is even funnier to see someone insisting we continue to dissect the frog, but I'm not sure I need to fully explain that. ;) 162.158.159.108 15:16, 12 March 2021 (UTC)

Yes, here's a bit more on that.. I agree with Elektrizikekswerk the joke is explained. The stat tip: "If you aren't sure whether to use the mean, median or geometric mean, just calculate all three, then repeat until it converges." is funny because there are many situations in the physical sciences where the arthmean, geometric mean and median for some data are different values. It is perhaps common that scientists not well versed in statistics are unsure which to use. The funny bit is imagining this less-statistically-versed-scientist throwing up their hands and just accepting the fixed constant given by iterating GMDN as the 'answer' irrelevant of any physical meaning. Also the name "geothmetic meandian" is funny because the word meandian is similar to both median, which it uses, and to meander which is indicated by the alternate assignment of the median on each iteration -- informally, this function meanders. Ramakarl

Thanks to all who shared the absurdity so I could also enjoy the joke, and the joke is on me for needing to have a joke "explained". Now where is the button for me to give credit to the best answer? I want to be sure you get points toward your next-level badge. ;-) Rtanenbaum (talk) 16:23, 13 March 2021 (UTC)

This is actually one of my favorite xkcd comics. Even before seeing this comic, one thing I would always do was take the AM(AM,GM) instead of using either average on it's own. nullcline (talk) 17:33, 20 July 2021 (UTC)

PyPi-package

I made a PyPi-package, if anyone (for some God-forsaken reason) want to use this without implementing it (ya lazy gits): https://pypi.org/project/GMDN/ BollaBerg (talk) 15:03, 15 March 2021 (UTC)

Proof - extension to negative values

As others have proven, the geothmetic meandian converges for a,b,c > 0.

Case a==0, b,c>=0 (and permutations):

then the geometric mean will always be 0. All derivations for a,b,c > 0 still apply, thus the set always converges to 0.

Case a,b,c <= 0:

arithmean(a,b,c) == -arithmean(-a,-b,-c); the same holds for the median and the geometric mean. Thus convergence is given, with the absolute value of the result the same as for positive inputs.

Case a<0, b,c>0, Case a,b<0, c>0

the geometric mean is negative iff 1 or 3 inputs are negative.

the median is negative iff 2 or 3 inputs are negative.

thus, any mixed sign input will yield a mixed sign output.

thus, if the geothmetic meandian converges for mixed-sign inputs, the limit has to be +-0.

unfortunately, the proof by 141.101.98.16 does not hold here unmodified: it relies on the next set always being within the range of the previous set. This is not true for the geometric mean with mixed-sign inputs. E.g. geommean(-1,2,4)=-4.

However, what does hold is that geometric mean is always within the extended range of +- max(abs(inputs)).

this is not yet sufficient proof.

Xlf (talk) 18:05, 15 March 2021 (UTC)

Given M_n := max(abs(inputs_n)) and min_n < 0 < max_n, we still have abs(AM_n) ≤ M_n*2/3 (by dropping a summand with different sign from AM_n), abs(AM_n+1) ≤ M_n+1*2/3 ≤ M_n*2/3 and abs(mid_n+1) ≤ abs(AM_n) ≤ M_n*2/3, because AM_n shares a sign with one of the others. abs(GM_n+1) = abs(GM_n*mid_n*AM_n)^(1/3) ≤ M_n*(1*1*2/3)^(1/3) ≤ M_n*(2/3)^(1/3). Consequently M_n+2/M_n ≤ (2/3)^(1/3), which shows convergence to 0. □ -- Xorg (talk) 16:43, 31 March 2021 (UTC)

Edit: Deleted comment. Sorry for the accidental spam. ^{)|(}Quill _{)|(} 14:54, 25 March 2021 (UTC)

The joke is still not explained, as far as I can tell. Maybe it's too obvious but since people asked (see three sections above in the comments) I think the explanation of the actual joke shoudl be added. As I too find it quite obvious what the joke is I find it hard to really explain it which is the reason why I didn't. Besides of that, the incomplete tag actually says why it's incomplete (in this case). AND there is already a comment section (again, three sections above) about this problem. Elektrizikekswerk (talk) 07:41, 26 March 2021 (UTC)

Sorry. That wasn't a thing when I added this section.

can someone help me with my python code of this? i dont get why this isnt working! https://replit.com/@Bumpf/geothmetic for my code. if you want i can paste my code here. i also commented in the code the error and why i dont understand it. thank you in advance! (also the geometric formula thing in the table on the page is showing a big red error. Maybe fix that?) 172.70.38.56 13:35, 26 March 2021 (UTC)Bumpf

Since it uses F(F(F...)) someone should write a Hascal version that actually uses the fixedpoint function. 172.69.35.175 22:56, 30 March 2021 (UTC)

Short proof that the thing converges. Basically there are three cases, one is that the median is larger than both one is that it is smaller than both and one is that it is in between. If it is in between, both he max must decrease and the min must increase. If it is greater, then the minimum must increase. If it is less, the maximum must decrease. In all three cases max(F(n+1))-min(F(n+1))<max(F(n))-min(F(n)) so by monotonic convergence theorem it must converge. 172.70.206.163 02:40, 1 April 2022 (UTC)

For a start, there is a syntax error. After the first application of F, you get a 3-tuple. Subsequent iterations preserve the 3-tuple, and we need to analyze the resulting sequence. Perhaps there is an implicit claim all three entries converge to the same result. In any case, lets see what we get:

Wlog, we have three inputs (x_1,y_1,z_1), and want to understand the iterates of the map F(x,y,z) = ( (x+y+z)/3, cube root of (xyz), median(x,y,z) ). Lets write F(x_n,y_n,z_n) = (x_{n+1},y_{n+1},z_{n+1}).

The inequality of arithmetic and geometric means gives x_n \geq y_n, if n \geq 2, and

@@ Line 9: / Line 9: @@
 ==Explanation==
 {{incomplete|Created by an AVERAGE JOE. Please mention here why this explanation isn't complete. Do NOT delete this tag too soon.}}
+There are a number of different ways to identify the 'average' value of a series of values, the most common unweighted methods being the median (take the central value from the ordered list of values - or the value half-way between the two that straddle the divide between two halves) and the arithmetic mean (add all the numbers up, divide by the number of numbers). The geometric mean is less well known to the layman but works with multiplication and Nth rooting, useful for some statistical analyses.
+Outliers and internal biases within the data can make boiling down a set of sample values into a single 'average' sometimes quite misleading, with your choice of average to use perhaps resulting in a misleading value.
+In this depiction, the three named methods of averaging are embedded within a single function that produces a set of three values - one output for each of the methods. Being a series of values, Randall suggests that this is ideally suited to being ''itself'' subjected to the comparative 'averaging' method. Not just once, but as many times as it takes to narrow down to a consistent value (or trio of similarly consistent values?).
+The comment in the Title Text suggests that this will save you the trouble of committing to the 'wrong' analysis as it gradually shaves down any 'outlier average'. At first sight it does appear to be a method without any danger of (permanent) divergence of values, but this may need to be proven (or disproven) to be sure.
+In some ways this philosophy may be considered similar to the Truncated Means (extremities of the value range, e.g. the highest and lowest 10%s, are ignored as not acceptible and not counted) or Winsorised Means (instead of ignored, the values are readjusted to be the chosen floor/ceiling values that they lay beyond, to still effectively be counted as 'edge' conditions), only with a strange dilution-and-compromise method rather than one where quantities can be culled or neutered just for being unexpectedly different from most of the other data.
 ==Transcript==

Difference between revisions of "2435: Geothmetic Meandian"

Revision as of 01:57, 11 March 2021

Explanation

Transcript

Discussion

Proof of convergence

Better Python implementations

Sloppy notation?

Is the arithmetic-geometric mean connected to geometry?

The RandallMunroe Set

Proof - Possibly by Induction

Why is this funny?

PyPi-package

Proof - extension to negative values

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools

Ads