Difference between revisions of "Talk:2435: Geothmetic Meandian"

Explain xkcd: It's 'cause you're dumb.
Jump to: navigation, search
(Why is this funny?)
(Why is this funny?)
Line 284: Line 284:
 
:YMMV, but I found it funny because I just spent the last fortnight teaching how to find mean (and median, and quartiles for that matter) to 15/16yrolds. And they found that hard enough. I did not inform them of Geometric mean. I guess it's funny because it's such a long reach. [[User:Thisfox|Thisfox]] ([[User talk:Thisfox|talk]]) 02:48, 12 March 2021 (UTC)
 
:YMMV, but I found it funny because I just spent the last fortnight teaching how to find mean (and median, and quartiles for that matter) to 15/16yrolds. And they found that hard enough. I did not inform them of Geometric mean. I guess it's funny because it's such a long reach. [[User:Thisfox|Thisfox]] ([[User talk:Thisfox|talk]]) 02:48, 12 March 2021 (UTC)
 
::No, the joke is quite clearly explained in the text below the formula: "Pro Tip: If in doubt just mash them together". [[User:Elektrizikekswerk|Elektrizikekswerk]] ([[User talk:Elektrizikekswerk|talk]]) 07:53, 12 March 2021 (UTC)
 
::No, the joke is quite clearly explained in the text below the formula: "Pro Tip: If in doubt just mash them together". [[User:Elektrizikekswerk|Elektrizikekswerk]] ([[User talk:Elektrizikekswerk|talk]]) 07:53, 12 March 2021 (UTC)
::As I'm currently supposed to be working so someone else should add this with a proper formulation. I just re-added the incomplete tag. [[User:Elektrizikekswerk|Elektrizikekswerk]] ([[User talk:Elektrizikekswerk|talk]]) 08:00, 12 March 2021 (UTC)
+
::As I'm currently supposed to be working someone else should please add this with a proper formulation. I just re-added the incomplete tag. [[User:Elektrizikekswerk|Elektrizikekswerk]] ([[User talk:Elektrizikekswerk|talk]]) 08:00, 12 March 2021 (UTC)

Revision as of 08:05, 12 March 2021

Oh, this one's good. Just checked in (no, I wasn't hovering over the refresh button, my first visit today!) and one glance had me in paroxysms of laughter. But how to explain it? Gonna have to think about that. 141.101.98.96 01:12, 11 March 2021 (UTC)

I made a really bad spreadsheet to understand better how it works: https://docs.google.com/spreadsheets/d/1fqmHwDmirJrsKPdf94PutFDw31DMAYxNeR7jef1jneE/edit?usp=sharing

Someone fix my awful transcript edits please. --Char Latte49 (talk) 02:31, 11 March 2021 (UTC)

Seeing the Python added to the Explanation, try this Perl (typed straight here, so not tested)...

## Your prefered variations of "#!/usr/bin/perl", "use strict;" and "use warnings;" here! ##
sub F { my (@vals)=@_; my $invVals=1/int(@vals);
 my ($geo,$arith,$med)=(1); # Only defining $geo, so first *= works correctly!
 while (@vals) { my($lo,$hi)=(shift @vals,pop @vals); # $hi may be undef - this is intended!
  $arith+=$lo; $geo*=$lo; unless (defined $hi) {  $med =  $lo;     last }
  $arith+=$hi; $geo*=$hi; unless (@vals)       { ($med)=F($lo,$hi)      }
 }
 return ($arith*$invVals, $geo**$invVals, $med);
}
sub GMDN { my (@vals)=sort @_; my $lim=10**(-5); # Adjust $lim to taste...
  return "Error: No vals!" unless  @vals; # Catch!
  return $vals[0]          unless ($vals[$#vals]-$vals[0]) > $lim;
  return GMDM(F(@vals));
}
my @test=(1,1,2,3,5);
print "Values:              @test\nGeothmetic Meandian: ".GMDN(@test)."\n";

...debugged in my head, so probably fatally flawed but easily fixed/adapted anyway. 141.101.99.109 03:04, 11 March 2021 (UTC)

Why so complicated?

perl -e 'use strict; use warnings; sub F { my ($s,$p) = (0,1); my @srt = sort {$a<=>$b} @_; for (@_) { $s += $_; $p *= $_; } return ($s/@_,$p**(1/@_),$srt[$#_/2]); } sub Gmdn { print join(", ",@_=F(@_)),"\n" for 0..20; return @_; } print join(", ",Gmdn(1,1,2,3,5)),"\n";'

(With interim results) SCNR -- Xorg (talk) 03:18, 11 March 2021 (UTC)

I can read your version (and I see you do explicit {$a<=>$b}, which indeed may be necessary in mine for real use, along with additional sanity checks, I will check later) but I wanted to make mine neat, and slightly tricksy in implementation, but still not quite so entirely obfuscated to the more uninitiated. TIMTOWTDI, etc, so I like your (almost) bare-bones version too. ;)
(Is 20 cycles enough to converge in sufficiently extreme cases? Won't give "Too deep" error, though, even if it takes at least that long. There's a definite risk that mine might, as written.) 141.101.99.229 03:45, 11 March 2021 (UTC)
Given the lack of precision in Randall's example usage, I think 20 cycles ought to be enough for everyone ;-P. I'm trying to prove that the interval's size has to shrink by somewhat close to a factor of 1/2 every cycle, but it's tricky and it's late. If I can assume a factor of 1/2 in the long run, 64 iterations should pin down a 64-bit float.
I actually didn't try to obfuscate, I was just too lazy to type more ;-). Otherwise I might have left out the "return"s and passing parameters at all. -- Xorg (talk) 04:21, 11 March 2021 (UTC)
I find the one-liner more readable: it's straightforward and pretty minimal. For what its worth, here's my version:
perl -MList::Util=sum,product -E 'sub F { (sum @_)/@_, (product @_)**(1/@_), (sort { $a <=> $b } @_)[$#_/2] } $, = " "; say @v = @ARGV; say @v = F(@v) for 1..30' 1 1 2 3 5
30 iterations is enough for the numbers to display identically on this system (to 14 decimal places). I think it's even cleaner in Raku (formerly Perl 6):
raku -e 'sub F(@d) { @d.sum/@d, [*](@d)**(1/@d), @d.sort[@d/2] }; say my @v = +«@*ARGS; say @v = F(@v) for 1..33' 1 1 2 3 5
On this system, Rakudo yields an additional decimal place, which takes another 3 iterations to converge. Smylers (talk) 06:53, 11 March 2021 (UTC)

Side-thought: is GMDN (nowhere near as logical an ETLA contraction of the title term as, say, 'GMMD' or 'GTMD') actually an oblique reference to the GNDNs as popularised/coined by Trek canon? Worth a citation/Trivia? 162.158.158.97 04:12, 11 March 2021 (UTC)

Besides of nerdgasm is there some reason why the program code is relevant for the explanation? Elektrizikekswerk (talk) 08:55, 11 March 2021 (UTC)

Apparently not. I moved it to the trivia section. Elektrizikekswerk (talk) 07:51, 12 March 2021 (UTC)

Proof of convergence

Can any of you come up with a mathematical proof that repeated application of F on a set of (say) positive real numbers is guaranteed to converge toward a single real number, i.e. that the GMDN of a set of positive real numbers is well-defined?

One observation I've made is that if you consider that maximum and minimum numbers in the original set to be x1 and xn (without loss of generality), something we know for sure is that AM(x1, ..., xn), GM(x1, ..., xn) and Median(x1, ..., xn) are all at least x1 and at most xn that is to say...

x1 <= AM(x1, ..., xn), GM(x1, ..., xn), Median(x1, ..., xn) <= xn

So range(AM(x1, ..., xn), GM(x1, ..., xn), Median(x1, ..., xn)) is necessarily <= range(x1, ..., xn).

And given that we know that unless x1, ..., xn are all equal, that x1 < AM(x1, ..., xn) < xn, we have an even stricter result (unless x1, ..., xn are all equal) that is range(AM(x1, ..., xn), GM(x1, ..., xn), Median(x1, ..., xn)) < range(x1, ..., xn).

So, it's clear that range(x1, ..., xn) > range(F(x1, ..., xn)) > range(F(F(x1, ..., xn))) > range(F(F(F(x1, ..., xn)))) > ... and it's also clear that all of these ranges are >= 0. There is a result in number theory that says that any infinite sequence of real numbers which monotonically decreases and is bounded from below converges.

So we know for sure that range(F(F(...F(x1, ..., xn)...))) converges but we still have to show that it converges to 0 to show that the GMDN converges to a single real number.

I'm not sure how to proceed. Does anyone have any ideas?

EDIT: I just noticed that unless x1, ..., xn are all equal, AM(x1, ..., xn) is at least ((n-1)/n) * range(x1, ..., xn) away from both x1 and xn. So not only do we have that range(x1, ..., xn) > range(F(x1, ..., xn)) from before, but we also have that ((n-1)/n) * range(x1, ..., xn) >= range(F(x1, ..., xn)). This guarantees that that the range falls exponentially on repeated applications of F. So it's certain that the the range ultimately converges to 0, and hence that the GMDN is well-defined.

It might be a good idea for someone to concretely present this idea as a proof on Page.

See my additional notes below. -Ramakarl

172.69.135.44 05:07, 11 March 2021 (UTC) Anirudh Ajith

That doesn't quite work as it stands, since proving AM is that distance away does not say anything about the other two averages. I think it's true, but a little more rigour is required. 141.101.98.120 09:17, 11 March 2021 (UTC)

When trying this myself I first arrived at 2.082, not 2.089. What threw me off was the incomplete formula for the median, which only works with sorted lists. The three values returned by F(...) aren't necessarily sorted. 141.101.76.194 09:49, 11 March 2021 (UTC)


First: almost all invocations are with exactly 3 arguments (The output of the previous invocation), so we don't have to deal with N inputs at all.
Notation: In iteration n we have the values min[n] <= mid[n] <= max[n] (in any order) and can compute AM[n], GM[n] (and median[n] = mid[n]).
Let Q[n] := max[n]/min[n] >= 1, R[n] := max[n]-min[n] = (Q[n]-1)*min[n].
We already established that R is decreasing and min is increasing, so Q is decreasing.

Theorem: There is an n0 with R[n+1] <= R[n]*2/3 for all n > n0.

Proof (by case discrimination for each n):
case 1: mid[n+1] != AM[n]:
    R[n+1] <= Max(max[n]-AM[n],AM[n]-min[n]) 
            = Max(max[n]*3-(max[n]+mid[n]+min[n]),(max[n]+mid[n]+min[n])-min[n]*3)/3
            = Max(max[n]*2-(mid[n]+min[n]),(max[n]+mid[n])-min[n]*2)/3
           <= (max[n]-min[n])*2/3
            = R[n]*2/3
    Hence: R[n+1] <= R[n]*2/3

case 2: mid[n+1] == AM[n]:
  because GM <= AM: min[n+1] = GM[n], max[n+1] = mid[n]
  Q[n+1] = mid[n]/GM[n]
         = (mid[n]^3/(max[n]*mid[n]*min[n]))^(1/3)
         = (mid[n]^2/(max[n]*min[n]))^(1/3)
        <= (mid[n]/min[n])^(1/3)
        <= Q[n]^(1/3)
  R[n+1] = (Q[n+1]-1)*min[n+1]
        <= (Q[n]^(1/3)-1)*GM[n]
        <= (Q[n]^(1/3)-1)*(max[n]^2*min[n])^(1/3)
         = (Q[n]^(1/3)-1)*Q[n]^(2/3)*min[n]
         = (Q[n]-Q[n]^(2/3))*min[n]
         = R[n]-(Q[n]^(2/3)-1)*min[n]
        <= R[n]-(Q[n]-1)*min[n]/(Q[n]^(1/3)+1))
         = R[n]-R[n]/(Q[n]^(1/3)+1)
         = R[n]*(1-1/(Q[n]^(1/3)+1))
  Now we can pick a q1 = Q(n1) with q1 > Q[n] >= 1 for n > n1 because Q is decreasing:
    R[n+1] <= R[n]*(1-1/(q1^(1/3)+1))
  
  Together with case 1, this gives R -> 0 and thus Q -> 1. So we can pick another q0 = Q(n0) with q0 <= 8:
    R[n+1] <= R[n]*(1-1/(q0^(1/3)+1)) <= R[n]*2/3

-- Xorg (talk) 17:34, 11 March 2021 (UTC)

Better Python implementations

I'd like to add a somewhat more compact Python implementation based on the numpy module.

import numpy as np

def F(x):
   return np.mean(x), np.exp(np.log(x).mean()), np.median(x)

def GMDN(x, tolerance=1e-6):
   while np.std(x) > tolerance:
       x = F(x)
   return x[0]

gmdn = GMDN([1, 1, 2, 3, 5])
print(gmdn)

--Lvdgraaff (talk) 10:42, 11 March 2021 (UTC)

No need for numpy, there's the statistics module in the stdlib

import math
import statistics

def F(*nums):
    return (
        statistics.mean(nums),
        statistics.geometric_mean(nums),
        statistics.median(nums),
    )

def GMDN(*nums):
    while not math.isclose(max(nums), min(nums)):
        nums = F(*nums)
    return nums[0]

gmdn = GMDN(1, 1, 2, 3, 5)
print(gmdn)
For something as simple as this, I always find it cheating to use a package to abstract away the few actually necessary calculations. You might as well use a DWIM module and do 'result = DWIM(input)' as the sole command. But that's me for you. I'd write my own direct-to-memory screen RAM accesses, if silly things like OS HALs and GPU acceleration (once you find a way to message them as directly as possible) hadn't long since made that pretty much moot, if not actually verboten... 141.101.99.109 17:53, 11 March 2021 (UTC)

Sloppy notation?

As a mathematician, I immediately noticed a couple of annoying niggles. Firstly, it is only implied, but never clearly stated, that the input list is ordered - which means the median is wrong unless ordered. Now F outputs an ordered triple of real numbers, and in calculating G, this is fed in to F again directly. This will frequently give inputs that are not in order, and in subsequent iterations the "median" will always be the middle number - i.e. the geometric mean - regardless of the actual median. Secondly, Randall's final line gives the output of G as a single number, but as it is just the result of a repeated application of F, the output of G should be an ordered triple. I'm sure Randall is aware of both, and chose to cut out the implied ordering of the inputs and choosing one of the three values as the output of G as they aren't necessary for the joke, but maybe we should note something about this in the explanation.141.101.99.109 13:07, 11 March 2021 (UTC)

First, I've never seen a definition of median which doesn't account for ordering itself, although I am a little annoyed at his definition for a different reason— that it doesn't account for even-length lists. Second, what I got from the comic initially is that GMDN is supposed to be a single number, specifically that value for which all three of its elements would become equal if implemented infinitely many times (and it will converge, because if the three elements are all the same it already has converged, and if at least two are different, both means will necessarily become greater than the least value and smaller than the greatest value due to the definition of 'mean'). Another annoyance I noted is that GMDN is real iff there are an even number of negative numbers and/or the length of the initial list is odd, but I suppose that can't be helped. Ooh, complex meandianing! BlackHat (talk) 15:15, 11 March 2021 (UTC)

Is the arithmetic-geometric mean connected to geometry?

In the case that only the arithmetic and geometric mean are used the combined arithmetic-geometric mean can be interpreted as the radius R of a circle which has the same circumference as an ellipse with half axes a and b.

R = M(a,b)

Can the Geothmetic Meandian be interpreted are the radius R of a 3D sphere which has the same surface as an elliptic cylinder with half-axes a and b and length c ?

R = GMDN(a,b,c)

The RandallMunroe Set

Here is some MATLAB code (sorry) to generate an image showing the number of cycles required to converge, a' la' the Mandelbrot Set. Not sure how to post an image here, but it is really cool.

   % RandallMunroeSet.m
   % From a suggestion by Randall Munroe in XKCD #2435 Mar 10 2021
   % new statistic GMDN(x) = [mean(x), geomean(x), median(x)]
   % calculation is recursive, ending when converged
   % here we count the cycles required to converge and plot a' la' Madelbrot Set
   % the initial X input can be any length vector, but we restrict to 3 space
   % here for visualization, and fix Z so we get a 2D image
   % so far, for positive values, it converges in less than 40 or not at all
   % for negative x, set max cycles to something larger like 60
   % I haven't plotted it, but there is logically another set that plots the
   % resulting converged value.
   % Explore!
   % (c)2021 CC BY-NC 2.5 [email protected] peace, love, trees
   % here we answer the question, how many cycles does it take for GMDN to
   % converge?
   maxcycles = 40; stepsize = .0025;
   z = 1; % pick a Z, any Z
   x = stepsize:stepsize:(4-stepsize); % explore a range of x and y
   y = x; 
   wbh = waitbar(0);
   RMS = zeros(numel(x),numel(y),numel(z)); % no, not root mean square, this is the Randall Munro Set!
   for idx = 1: numel(x)
       waitbar(idx ./ numel(x)); % feedback on progress
       for jdx = idx:numel(y) % result is symmetric across the diagonal, so we save time by computing above the diagonal
           for kdx = 1:numel(z)
               RMS(idx, jdx, kdx)  = gmdn([x(idx),y(jdx),z(kdx)], maxcycles);
               RMS(jdx, idx, kdx) = RMS(idx, jdx, kdx); % copy across the diagonal
           end
       end
   end
   close(wbh)
   RMS = min(maxcycles,RMS);


   if numel(z) == 1;
       figure(420);
       image(255*RMS./maxcycles);truesize; colormap(jet(256));
   end
   if numel(z) == 3; % allow for true color, but in practice it is so sensitive to initial z value it just gives three different sets unless the Zs are VERY CLOSE
       figure(420);image(RMS./maxcycles);truesize;
   end
   title(['RandallMunro Set Z = ' num2str(z)]);
   figure(3);hist(RMS(:),0:maxcycles);
   %% How many cycles to converge?
   function ncycles = gmdn(x, maxcycles)
   ncycles = 0;
   while ncycles < maxcycles
       ncycles = ncycles + 1;
       x = [mean(x), geomean(x), median3(x)];
       if all(x(1) == x(2:3))
           break
       end
   end
   end
   %% Geometric Mean
   function result = geomean(x)
   result = prod(x) .^ (1/numel(x));
   end
   %% Slightly faster median than builtin MATLAB function
   function result = median3(x)
   y = sort(x);
   result = y(2);
   end

Proof - This is connected to the Heat Equation

Earlier question: > Can any of you come up with a mathematical proof that repeated application of F on a set of (say) positive real numbers is guaranteed to converge toward a single real number

Yes. This triplet can be regarded as a partial differential equation with in a monotonically decreasing range that converges to a single value. We observe that all three of these always result in a value which is strictly less than the min and max. Thus the PDE equations are progressively bounded by a smaller range, ie. they converge in the same way that the heat equation converges. "Under broad assumptions, an initial/boundary-value problem for a linear parabolic PDE has a solution for all time. The solution .. is generally smoother than the initial data" (https://en.wikipedia.org/wiki/Parabolic_partial_differential_equation#Solution), which holds here. The range converges to zero because the values are real numbers.

Given the real 3-vector for Fn, the range of Fn+1 must be strictly less than the range of Fn:
R(Fn) = max(Fn)-min(Fn)
R(Fn+1) = max(ave(Fn),geomean(Fn),median(Fn))-min(ave(Fn),geomean(Fn),median(Fn))
max(ave(Fn),geomean(Fn),median(Fn)) < max(Fn) because ave(Fn),geomean(Fn),median(Fn) < max(Fn)
min(ave(Fn),geomean(Fn),median(Fn)) > min(Fn) because ave(Fn),geomean(Fn),median(Fn) > min(Fn)
therefore..
R(Fn+1) < R(Fn)|
limit R(Fn+1) = 0 as n approaches inf.
The range of Fn+1 approaches 0, ie. the series converges for positive inputs. (note the series is undefined for real negative inputs since the geomean uses the n-th root)

Ramakarl (talk) 00:00, 12 March 2021 (UTC)

How can this be formulated as a PDE when F isn't even differentiable?
Besides, R(Fn+1) < R(Fn) does not imply limit R(Fn) = 0 (Think R(n) := 1+1/n). -- Xorg (talk) 02:50, 12 March 2021 (UTC)

User:snark This has nothing to do with a PDE or the heat equation. It is an iterative map from R^3 to R^3 (after the first application of F). In order to prove it converges you need to show that there is a fixed point and that the mapping takes you closer to it. The fixed points are easy since F((x,x,x))=(x,x,x) so there is a line of fixed points. You can then calculate the perpendicular distance between the starting point (x1,x2,x3) and the line given by (x(t),x(t),x(t)). Next you calculate the distance between f((x1,x2,x3)) and the line and show that is is less than the first distance.

Why is this funny?

Wow, paragraphs and paragraphs of explanation, and calculations, and computer code describing everything about the XKCD comic. I am impressed with how much people know. After all that explanation, can anyone tell me if there is anything comical about this comic? Aside from the fact that Randal is combining formulas that don't usually get combined, is there anything here that strikes anyone as funny? The previous one about people asking absurd questions about what they could do after they are vaccinated had me laughing out loud. Can anyone tell me that they laughed at this comic and what was funny? Thanks. Rtanenbaum (talk) 01:56, 12 March 2021 (UTC)

YMMV, but I found it funny because I just spent the last fortnight teaching how to find mean (and median, and quartiles for that matter) to 15/16yrolds. And they found that hard enough. I did not inform them of Geometric mean. I guess it's funny because it's such a long reach. Thisfox (talk) 02:48, 12 March 2021 (UTC)
No, the joke is quite clearly explained in the text below the formula: "Pro Tip: If in doubt just mash them together". Elektrizikekswerk (talk) 07:53, 12 March 2021 (UTC)
As I'm currently supposed to be working someone else should please add this with a proper formulation. I just re-added the incomplete tag. Elektrizikekswerk (talk) 08:00, 12 March 2021 (UTC)