2059: Modified Bayes' Theorem

==Explanation==
{{w|Bayes' Theorem}} is an equation in {{w|statistics}} that gives the probability of a given hypothesis, accounting not only for a single experiment or observation but also for your existing knowledge about the hypothesis, i.e. its prior probability. Randall's modified form of the equation also purports to account for the probability that you are indeed applying Bayes' theorem itself correctly, by including that as a term in the equation.

Bayes' theorem is:

:<math>P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}</math>

where <math>P(H)</math> is the prior probability of the hypothesis <math>H</math>, <math>P(X \mid H)</math> is the probability of the observation <math>X</math> if the hypothesis is true, and <math>P(X)</math> is the overall probability of the observation. The modified form adds a term <math>P(C)</math>, the probability that you are using Bayesian statistics correctly:

:<math>P(H \mid X) = P(H)\left(1 + P(C)\left(\frac{P(X \mid H)}{P(X)} - 1\right)\right)</math>
 
The purpose of Bayesian inference is to discover something we want to know (how likely is it that our explanation is correct given the evidence we've seen) by mathematically expressing it in terms of things we can find out: how likely are our observations, how likely is our hypothesis ''a priori'', and how likely are we to see the observations we've seen assuming our hypothesis is true. A Bayesian learning system will iterate over available observations, each time using the likelihood of new observations to update its priors (beliefs), with the hope that, after seeing enough data points, the prior and posterior will converge to a single model.
  
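As a minimal sketch of this updating loop (the coin, the two hypotheses, and all the numbers below are invented purely for illustration), consider deciding whether a coin is fair or biased towards heads:

```python
# Hypothetical example: is the coin fair (P(heads) = 0.5) or
# biased towards heads (P(heads) = 0.8)?

def update(prior_fair, heads):
    """One Bayesian update of the belief that the coin is fair."""
    p_x_fair = 0.5                      # likelihood of this flip if fair
    p_x_biased = 0.8 if heads else 0.2  # likelihood of this flip if biased
    p_x = p_x_fair * prior_fair + p_x_biased * (1 - prior_fair)  # P(X)
    return p_x_fair * prior_fair / p_x  # Bayes: P(fair | X)

belief = 0.5  # prior: undecided between the two hypotheses
for heads in [True, True, False, True, True, True, False, True]:
    belief = update(belief, heads)  # each posterior becomes the next prior
# Six heads in eight flips: the belief that the coin is fair drops below 0.5
```

Each pass through the loop is exactly the prior-to-posterior step described above; with enough flips the belief converges towards whichever hypothesis better matches the data.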
The probability <math>P(C)</math> always has a value between zero and one, where one represents a 100% probability. The two extremes are:
*If <math>P(C)=1</math>, the modified theorem reverts to the original Bayes' theorem (which makes sense, as a probability of one would mean certainty that you are using Bayes' theorem correctly).
*If <math>P(C)=0</math>, the modified theorem becomes <math>P(H \mid X) = P(H)</math>, which says that the belief in your hypothesis is not affected by the result of the observation (which also makes sense: if you are certain you are misapplying the theorem, the outcome of the calculation should not affect your belief).

This happens because the modified theorem can be rewritten as <math>P(H \mid X) = (1-P(C))\,P(H) + P(C)\,\frac{P(X \mid H)\,P(H)}{P(X)}</math>. This is a {{w|Linear interpolation|linearly interpolated}} weighted average of the belief you had before the calculation and the belief you would have if you applied the theorem correctly. It moves smoothly from not believing your calculation at all (keeping the same belief as before) when <math>P(C)=0</math> to changing your belief exactly as Bayes' theorem suggests when <math>P(C)=1</math>. (Note that <math>1-P(C)</math> is the probability that you are using the theorem incorrectly.)

Bayesian statistics is often contrasted with "frequentist" statistics. For a frequentist, ''probability'' is defined as the limit of the relative frequency after a large number of trials, so the notion of "the probability that you are using Bayesian statistics correctly" is meaningless: one cannot do repeated trials, even in principle. A Bayesian considers probability to be a quantification of personal belief, so such a concept is meaningful. However, since the value of such a subjective prior probability cannot be independently determined, the value of <math>P(H \mid X)</math> cannot be objectively found.
  
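The two extremes can be checked numerically with a short sketch (the probabilities below are invented example values, not anything from the comic):

```python
def bayes(p_h, p_x_given_h, p_x):
    """Original Bayes' theorem: P(H|X) = P(X|H) P(H) / P(X)."""
    return p_x_given_h * p_h / p_x

def modified_bayes(p_h, p_x_given_h, p_x, p_c):
    """Modified theorem in its rewritten weighted-average form:
    (1 - P(C)) * P(H)  +  P(C) * (full Bayesian update)."""
    return (1 - p_c) * p_h + p_c * bayes(p_h, p_x_given_h, p_x)

# Invented example values
p_h, p_x_given_h, p_x = 0.3, 0.9, 0.5

full = modified_bayes(p_h, p_x_given_h, p_x, p_c=1.0)  # same as bayes(...)
none = modified_bayes(p_h, p_x_given_h, p_x, p_c=0.0)  # same as p_h
half = modified_bayes(p_h, p_x_given_h, p_x, p_c=0.5)  # halfway between them
```

Intermediate values of <math>P(C)</math> land proportionally between the unchanged prior and the full Bayesian posterior, as the interpolation form predicts.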
The title text suggests that an additional term should be added for the probability that the Modified Bayes' Theorem is correct. But that's ''this'' equation, so it would make the formula self-referential, unless we call the result the Modified Modified Bayes' Theorem (or Modified<sup>2</sup>). It could also result in an infinite regress: we'd need another term for the probability that the version with that term added is correct, then another term for ''that'' version, and so on. If the modifications have a limit, we can call that the Modified<sup>&omega;</sup> Bayes' Theorem, but then we need yet another term for whether we did ''that'' correctly, leading to the Modified<sup>&omega;+1</sup> Bayes' Theorem, and so on through every {{w|ordinal number}}. It is also unclear what the point is of using an equation we are not sure of, although an equation known to be imperfect can still be useful: {{w|Newton's Laws}} are not as correct as Einstein's {{w|Theory of Relativity}}, but they are a reasonable approximation in most circumstances. (Alternatively, ask any student taking a difficult exam with a formula sheet.)
  
Modified theories are often suggested in science when measurements don't fit the original theory. One example among many is {{w|Modified Newtonian dynamics}}, with which some physicists try to explain the observations usually attributed to dark matter, without much success.

If we denote the probability that the Modified<sup>n</sup> Bayes' Theorem is correct by <math>P(C_n)</math>, then one way to define this sequence of modified theorems is by the rule

:<math>P_n(H \mid X) := P(C_n)\,P_{n-1}(H \mid X) + (1-P(C_n))\,P(H),</math>

where <math>P_0(H \mid X)</math> is given by the original Bayes' theorem. One can then show by induction that

:<math>P_n(H \mid X) = \prod_{i=1}^n P(C_i)\left(\frac{P(X \mid H)\,P(H)}{P(X)} - P(H)\right) + P(H).</math>

If we assume that some doubt remains at every step (that is, <math>P(C_i)<1</math> for every <math>i</math>), and for simplicity that <math>P(C_i) = p</math> for some fixed <math>0<p<1</math>, then the product becomes <math>p^n</math> and we can calculate the limit:

:<math>\lim_{n \to \infty} P_n(H \mid X) = \lim_{n \to \infty} \left[p^n\left(\frac{P(X \mid H)\,P(H)}{P(X)} - P(H)\right) + P(H)\right] = 0 + P(H) = P(H),</math>

since <math>p^n \to 0</math> and the bracketed factor is a constant. This may be interpreted as a universal truth: we have to trust ''something'' eventually, otherwise everything boils down to an unconditional belief.
  
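The limit can also be checked numerically; this sketch (again with invented values, and a constant p standing in for every P(C_i)) iterates the recurrence and watches the posterior slide back to the prior:

```python
def bayes(p_h, p_x_given_h, p_x):
    """Original Bayes' theorem: P(H|X) = P(X|H) P(H) / P(X)."""
    return p_x_given_h * p_h / p_x

# Invented example values; p is the constant confidence P(C_i) in each step
p_h, p_x_given_h, p_x, p = 0.3, 0.9, 0.5, 0.9

posterior = bayes(p_h, p_x_given_h, p_x)  # Modified^0: plain Bayes
for _ in range(200):
    # Recurrence: P_n(H|X) = p * P_{n-1}(H|X) + (1 - p) * P(H)
    posterior = p * posterior + (1 - p) * p_h
# posterior is now within ~1e-10 of the prior P(H) = 0.3
```

Each doubting step pulls the result a fraction <math>1-p</math> of the way back towards <math>P(H)</math>, so after enough steps the Bayesian update is washed out entirely.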
 
==Transcript==
