2059: Modified Bayes' Theorem

{{comic
| number    = 2059
| date      = October 15, 2018
| title     = Modified Bayes' Theorem
| image     = modified_bayes_theorem.png
| titletext = Don't forget to add another term for "probability that the Modified Bayes' Theorem is correct."
}}

==Explanation==
{{incomplete|When using the Math-syntax please also care for a proper layout. Please edit the explanation below and only mention here why it isn't complete. Do NOT delete this tag too soon.}}
{{w|Bayes' Theorem}} is an equation in statistics that gives the probability of a hypothesis, accounting not only for a single experiment or observation but also for your existing knowledge about the hypothesis, i.e. its prior probability. Randall's modified form of the equation also purports to account for the probability that you are applying Bayes' Theorem itself correctly, by including that probability as a term in the equation.

Bayes' theorem is:

<math>P(H \mid X) = \frac{P(X \mid H) \, P(H)}{P(X)}</math>,
where
*<math>P(H \mid X)</math> is the probability that <math>H</math>, the hypothesis, is true given observation <math>X</math>. This is called the ''posterior probability''.
*<math>P(X \mid H)</math> is the probability that observation <math>X</math> will appear given the truth of hypothesis <math>H</math>. This term is often called the ''likelihood''.
*<math>P(H)</math> is the probability that hypothesis <math>H</math> is true before any observations. This is called the ''prior'', or ''belief''.
*<math>P(X)</math> is the probability of the observation <math>X</math> regardless of which hypothesis might have produced it. This term is called the ''marginal likelihood''.
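
As a minimal sketch of the formula above, the Python function below computes the posterior for a binary hypothesis. The function name <code>posterior</code>, the parameter <code>false_positive_rate</code> (standing for the probability of the observation when the hypothesis is false), and all the numbers are assumptions made for illustration; none of them come from the comic. The marginal likelihood is expanded with the law of total probability.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | X) for a binary hypothesis H using Bayes' theorem.

    prior               -- P(H)
    likelihood          -- P(X | H)
    false_positive_rate -- P(X | not H), an assumed extra input
    """
    # Marginal likelihood P(X), summed over "H true" and "H false".
    marginal = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / marginal


# Hypothetical numbers: a rare hypothesis and a fairly reliable observation.
p = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(p, 4))  # about 0.1538
```

Even with a 90% likelihood, the posterior in this made-up case stays low because the prior is small, which is the kind of effect the prior term is there to capture.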

The purpose of Bayesian inference is to discover something we want to know (how likely is it that our explanation is correct given the evidence we've seen) by mathematically expressing it in terms of things we can find out: how likely are our observations, how likely is our hypothesis ''a priori'', and how likely are we to see the observations we've seen assuming our hypothesis is true. A Bayesian learning system will iterate over available observations, each time using the likelihood of new observations to update its priors (beliefs) with the hope that, after seeing enough data points, the prior and posterior will converge to a single model.
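
The iterative updating described above can be sketched in Python as follows. This is an illustrative toy, not anything from the comic: the hypothesis (a coin biased towards heads), the two constants, and the function names <code>update</code> and <code>belief_after</code> are all assumptions. Each observation turns the current belief into a posterior, which then serves as the prior for the next observation.

```python
# Assumed toy setup: H = "the coin lands heads 80% of the time",
# and the only alternative is a fair coin.
P_HEADS_IF_BIASED = 0.8
P_HEADS_IF_FAIR = 0.5


def update(prior, p_obs_given_h, p_obs_given_not_h):
    """One application of Bayes' theorem for a binary hypothesis."""
    marginal = p_obs_given_h * prior + p_obs_given_not_h * (1.0 - prior)
    return p_obs_given_h * prior / marginal


def belief_after(flips, prior=0.5):
    """Fold a sequence of 'H'/'T' flips into a final belief in H."""
    belief = prior
    for flip in flips:
        if flip == "H":
            belief = update(belief, P_HEADS_IF_BIASED, P_HEADS_IF_FAIR)
        else:
            belief = update(belief, 1.0 - P_HEADS_IF_BIASED, 1.0 - P_HEADS_IF_FAIR)
    return belief


for flips in ["", "H", "HHHH", "HHHHHHHH", "HTHT"]:
    print(repr(flips), round(belief_after(flips), 3))
```

In this sketch, a run of heads pushes the belief towards 1, while mixed results keep it lower.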

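Randall's modified theorem, as shown in the comic, is <math>P(H \mid X) = P(H) \left(1 + P(C) \left(\frac{P(X \mid H)}{P(X)} - 1\right)\right)</math>, where <math>P(C)</math> is the probability that you are using Bayesian statistics correctly.

If <math>P(C)=1</math> the modified theorem reverts to the original Bayes' theorem (which makes sense, as a probability of one would mean certainty that you are using Bayes' theorem correctly).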

If <math>P(C)=0</math> the modified theorem becomes <math>P(H \mid X) = P(H)</math>, which says that the belief in your hypothesis is not affected by the result of the observation (which makes sense because you are certain you are misapplying the theorem, so the outcome of the calculation shouldn't affect your belief).

This happens because, if you substitute the original theorem, the modified theorem can be rewritten as: <math>P(H \mid X) = P(H)(1-P(C)) + P(H \mid X)P(C)</math>. This is a {{w|Linear interpolation|linear interpolation}} (a weighted average) between the belief you had before the calculation and the belief you would have if you applied the theorem correctly. It goes smoothly from not believing your calculation at all (keeping the same belief as before) when <math>P(C)=0</math> to changing your belief exactly as Bayes' theorem suggests when <math>P(C)=1</math>.
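
A quick numerical check of these claims can be sketched in Python. The comic's formula, as given in the transcript, is P(H|X) = P(H) × (1 + P(C) × (P(X|H)/P(X) - 1)). The function names and the example numbers below are assumptions chosen for illustration.

```python
def bayes(prior, likelihood, marginal):
    """Ordinary Bayes' theorem: P(X|H) * P(H) / P(X)."""
    return likelihood * prior / marginal


def modified_bayes(prior, likelihood, marginal, p_correct):
    """The comic's formula: P(H) * (1 + P(C) * (P(X|H)/P(X) - 1))."""
    return prior * (1.0 + p_correct * (likelihood / marginal - 1.0))


def interpolated(prior, likelihood, marginal, p_correct):
    """Weighted average of the prior and the ordinary Bayes posterior."""
    return prior * (1.0 - p_correct) + bayes(prior, likelihood, marginal) * p_correct


# Hypothetical inputs: P(H) = 0.2, P(X|H) = 0.6, P(X) = 0.3,
# so the ordinary posterior is 0.4.
prior, likelihood, marginal = 0.2, 0.6, 0.3

for p_correct in (0.0, 0.25, 0.5, 1.0):
    a = modified_bayes(prior, likelihood, marginal, p_correct)
    b = interpolated(prior, likelihood, marginal, p_correct)
    print(p_correct, round(a, 4), round(b, 4))
```

With these inputs the two forms agree at every value of <code>p_correct</code>, returning the prior 0.2 at <code>p_correct = 0</code> and the ordinary posterior 0.4 at <code>p_correct = 1</code>.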

<math>1-P(C)</math> is the probability that you are using the theorem incorrectly.

Taken literally as an equation, the rewritten form makes no sense. <math>P(H \mid X) = P(H)(1-P(C)) + P(H \mid X)P(C)</math> is strangely self-referential, because <math>P(H \mid X)</math> appears on both sides, and it reduces to the piecewise equation <math>\begin{cases}P(H \mid X) = P(H) & P(C) \neq 1 \\ 0 = 0 & P(C) = 1 \end{cases}</math>. However, the Modified Bayes' Theorem includes an extra variable, <math>C</math>, that is not listed in the conditioning. A person with an AI background might therefore understand that Randall was trying to write an expression for updating <math>P(H \mid X)</math> with knowledge of <math>C</math>, i.e. <math>P(H \mid X,C)</math>: the belief in the hypothesis given the observation <math>X</math> and the confidence <math>C</math> that you were applying Bayes' theorem correctly. For this, the expression <math>P(H \mid X,C) = P(H)(1-P(C)) + P(H \mid X)P(C)</math> makes some intuitive sense.

The title text suggests that an additional term should be added for the probability that the Modified Bayes' Theorem is correct. But that is ''this'' equation, so the formula would become self-referential. It could also result in an infinite regress: we would need another term for the probability that the version with the added term is correct, and another term for that version, and so on. It is also unclear what the point is of using an equation we are not sure of (although sometimes there is one: {{w|Newton's Laws}} are not as correct as Einstein's {{w|Theory of Relativity}}, but they are a reasonable approximation in most circumstances).

==Transcript==
{{incomplete transcript|Do NOT delete this tag too soon.}}

:Modified Bayes' theorem:

:P(H|X) = P(H) × (1 + P(C) × ( P(X|H)/P(X) - 1 ))

:H: Hypothesis
:X: Observation
:P(H): Prior probability that H is true
:P(X): Prior probability of observing X
:P(C): Probability that you're using Bayesian statistics correctly

{{comic discussion}}

[[Category:Statistics]]