Ah - now this is interesting. We have strong evidence from our data of an unfair coin (since we generated the data, we know it is unfair with p=0.8), but our prior beliefs are telling us that coins are fair. How do we deal with this?
Bayes' theorem is what allows us to go from our sampling and prior distributions to our posterior distribution. The **posterior distribution is $P(\theta | X)$**, or in English, the probability of our parameters given our data. And if you think about it, that is what we really want. We are typically given our data - from maybe a survey or web traffic - and we want to figure out what parameters are most likely given our data. So how do we get to this posterior distribution? Here comes some math (don't worry, it is not too bad):
By definition, we know that (If you don’t believe me, check out this page for a refresher):
- $P(A | B) = \dfrac{P(A,B)}{P(B)}$. Or in English, the probability of seeing A given B is the probability of seeing them both divided by the probability of B.
- $P(B | A) = \dfrac{P(A,B)}{P(A)}$. Or in English, the probability of seeing B given A is the probability of seeing them both divided by the probability of A.
You will notice that both of these values share the same numerator, so:
- $P(A,B) = P(A | B)*P(B)$
- $P(A,B) = P(B | A)*P(A)$
Thus:
$P(A | B)*P(B) = P(B | A)*P(A)$
Which implies:
$P(A | B) = \dfrac{P(B | A)*P(A)}{P(B)}$
And plug in $\theta$ for $A$ and $X$ for $B$:
$P(\theta | X) = \dfrac{P(X | \theta)*P(\theta)}{P(X)}$
Nice! Now we can plug in some terminology we know:
$Posterior = \dfrac{likelihood * prior}{P(X)}$
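Before we worry about the denominator, here is a minimal sketch of what the numerator (likelihood * prior) looks like in code for a single candidate value of p. The data generation (100 Bernoulli flips with p=0.8), the normal prior centered on a fair coin, and the variable names are assumptions for illustration - they may not match the exact data and prior defined earlier in the post:

```python
import numpy as np
from scipy import stats

# Assumed recreation of the coin flip data: 100 flips from an unfair coin with p = 0.8.
np.random.seed(42)
flips = stats.bernoulli.rvs(p=0.8, size=100)
heads = flips.sum()

# Evaluate the numerator of Bayes' theorem at a single candidate value of p.
p_candidate = 0.5
likelihood = stats.binom.pmf(heads, n=len(flips), p=p_candidate)  # P(X | theta)
prior = stats.norm.pdf(p_candidate, loc=0.5, scale=0.1)           # assumed prior: coins are fair
numerator = likelihood * prior
```

That number on its own is not a probability yet - it still needs to be divided by $P(X)$.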
But what is $P(X)$? Or in English, the probability of our data? That sounds weird… Let's go back to some math and use $B$ and $A$ again:
We know that $P(B) = \sum_{A} P(A,B)$ (check out this page for a refresher)
And from our definitions above, we know that:
$P(A,B) = P(B | A)*P(A)$
Thus:
$P(B) = \sum_{A} P(B | A)*P(A)$
Plug in our $\theta$ and $X$:
$P(X) = \sum_{\theta} P(X | \theta)*P(\theta)$
Plug in our terminology:
$P(X) = \sum_{\theta} likelihood * prior$
Wow! Isn’t that awesome! But what do we mean by $\sum_{\theta}$? This means to sum over all the values of our parameters. In our coin flip example, we defined 100 values for our parameter p, so we would have to calculate the likelihood * prior for each of these values and sum all those answers. That is our denominator for Bayes' theorem. Thus our final answer for Bayes is:
$Posterior = \dfrac{likelihood * prior}{\sum_{\theta} likelihood * prior}$
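As a minimal sketch of that full calculation, here is the grid version in code, using the same illustrative data and normal prior assumed in the sketch above (again, the names and exact prior are assumptions, not necessarily what the original coin flip code used):

```python
import numpy as np
from scipy import stats

# Assumed recreation of the coin flip data: 100 flips from an unfair coin with p = 0.8.
np.random.seed(42)
flips = stats.bernoulli.rvs(p=0.8, size=100)
heads = flips.sum()

# 100 candidate values for the parameter p, as in the coin flip example.
p_grid = np.linspace(0.01, 0.99, 100)

likelihood = stats.binom.pmf(heads, n=len(flips), p=p_grid)  # P(X | theta) for every theta
prior = stats.norm.pdf(p_grid, loc=0.5, scale=0.1)           # assumed prior: coins are fair
prior = prior / prior.sum()                                  # normalize the prior over the grid

numerator = likelihood * prior
posterior = numerator / numerator.sum()  # divide by the sum over theta of likelihood * prior

# The posterior mode ends up between the prior's 0.5 and the data's 0.8, pulled toward the data.
print(p_grid[np.argmax(posterior)])
```

The denominator is just `numerator.sum()`: the likelihood * prior evaluated at every one of the 100 candidate values of p, summed up.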
That was a lot of text. Let’s do some more coding and put everything together.