According to Wikipedia, the law of total probability “expresses the total probability of an outcome which can be realized via several distinct events”. We can also think of this as the marginal probability: irrespective of what road we took to get to this outcome, what is the total likelihood of the outcome occurring?
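In code terms, the law is just a weighted sum of conditional probabilities. Here is a minimal Python sketch of that idea (the function name total_probability and the dictionary layout are my own illustration, not taken from any particular library):

```python
def total_probability(cond_probs, priors):
    """Law of total probability: P(A) = sum of P(A | B_i) * P(B_i).

    cond_probs[b] is P(A | B_b) and priors[b] is P(B_b); the events B_b
    are assumed to partition the sample space, so their priors sum to 1.
    """
    assert abs(sum(priors.values()) - 1.0) < 1e-9, "priors must sum to 1"
    return sum(cond_probs[b] * priors[b] for b in priors)
```

Each of the examples below boils down to this one weighted sum with different numbers plugged in.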
Example 1 – Piggy banks
Let’s do a very basic example: I have 3 different piggy banks, which may contain copper coins or silver coins or both – this is to give us a little light relief from urns, yes?
So the basic probabilities are easy enough: the probability of choosing any given piggy bank is simply 1/3. Within each piggy bank there are different numbers of copper and silver coins; for example, the probability of picking a silver coin from the green piggy bank is 5/7.
But what if we want to mix it up a bit and ask: what is the probability of drawing a silver coin, regardless of which piggy bank we choose? The typical tree diagram can help us with the initial visualization and thinking around this problem:
So we see that the overall probability of drawing a silver coin is 0.238 + 0.056 ≈ 0.294. But rather than drawing a tree diagram each time and working along the branches, it would be better to conceptualize this in more mathematical terms.
P(Silver) = P(Silver∩Green) + P(Silver∩Blue) + P(Silver∩Pink)
= P(Silver∣Green) ⋅ P(Green) + P(Silver∣Blue) ⋅ P(Blue) + P(Silver∣Pink) ⋅ P(Pink)
= 5/7 ⋅ 1/3 + 1/6 ⋅ 1/3 + 0 ⋅ 1/3
≈ 0.294
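To double-check the arithmetic, here is the same calculation as a quick Python sketch using exact fractions (the colour names are simply labels for the three piggy banks):

```python
from fractions import Fraction as F

# P(Silver | bank) for each piggy bank; each bank is chosen with probability 1/3
p_silver_given_bank = {"green": F(5, 7), "blue": F(1, 6), "pink": F(0)}
p_bank = F(1, 3)

# Law of total probability: add up the silver branches of the tree
p_silver = sum(p * p_bank for p in p_silver_given_bank.values())
print(p_silver, float(p_silver))  # 37/126, ≈ 0.294
```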
Example 2 – Lightbulbs
Let’s look at the Wikipedia lightbulb factory example in more detail. We are told:
“Suppose that two factories supply light bulbs to the market. Factory X’s bulbs work for over 5000 hours in 99% of cases, whereas factory Y’s bulbs work for over 5000 hours in 95% of cases. It is known that factory X supplies 60% of the total bulbs available and Y supplies 40% of the total bulbs available. What is the chance that a purchased bulb will work for longer than 5000 hours?” The tree diagram version looks like this:
And mathematically we can think of it like this:
P(>5000h) = P(>5000h∩X) + P(>5000h∩Y)
= P(>5000h∣X) ⋅ P(X) + P(>5000h∣Y) ⋅ P(Y)
= 0.99 ⋅ 0.6 + 0.95 ⋅ 0.4
= 0.974
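The same weighted-sum pattern in Python, using the market shares and reliability rates from the quoted example (the dictionary keys X and Y are just labels for the two factories):

```python
p_works_given_factory = {"X": 0.99, "Y": 0.95}  # P(>5000h | factory)
p_factory             = {"X": 0.60, "Y": 0.40}  # P(factory)

# Weight each factory's reliability by its share of the market
p_works = sum(p_works_given_factory[f] * p_factory[f] for f in p_factory)
print(p_works)  # ≈ 0.974
```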
Example 3 – People who do & don’t visit the doctor
Another interesting example I found online was at Statistics How To. Here we are asked to think of the law of total probability in terms of P(A) = P(A∩B) + P(A∩Bᶜ). This is the scenario:
“80% of people attend their primary care physician regularly; 35% of those people have no health problems crop up during the following year. Out of the 20% of people who don’t see their doctor regularly, only 5% have no health issues during the following year. What is the probability a random person will have no health problems in the following year?”
So in this scenario B represents the people who do see their doctor and Bᶜ represents the people who do not. The people who have no health problems are represented by A, and P(A) is the probability we want to find.
Let’s think back for a moment to our definition: irrespective of what road we took to get to this outcome, what is the total likelihood of the outcome occurring? Drawing the tree diagram actually helped to clarify that this is really no different from any of the other examples:
P(✓) = P(✓∩DO) + P(✓∩DON’T)
= P(✓∣DO) ⋅ P(DO) + P(✓∣DON’T) ⋅ P(DON’T)
= 0.35 ⋅ 0.8 + 0.05 ⋅ 0.2
= 0.29
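Once more in Python, plugging in the numbers from the scenario (the labels "do" and "dont" are my own shorthand for B and Bᶜ):

```python
p_healthy_given_group = {"do": 0.35, "dont": 0.05}  # P(no health problems | group)
p_group               = {"do": 0.80, "dont": 0.20}  # P(group)

# P(A) = P(A | B) * P(B) + P(A | Bc) * P(Bc)
p_healthy = sum(p_healthy_given_group[g] * p_group[g] for g in p_group)
print(p_healthy)  # ≈ 0.29
```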
Example 3½ – Step up to Bayes
The law of total probability essentially forms the denominator of Bayes’ theorem, so it’s a short step from here to ask a slightly different question: you randomly select a person who had no health issues during the past year – what is the probability that they did not visit a doctor in the preceding year? We can think of this in the following mathematical terms:
P(DON’T∣✓) = P(✓∣DON’T) ⋅ P(DON’T) / P(✓)
= 0.05 ⋅ 0.2 / 0.29
≈ 0.03
And if we randomly select a person who had no health issues during the past year – what is the probability that they did visit a doctor in the preceding year?
P(DO∣✓) = P(✓∣DO) ⋅ P(DO) / P(✓)
= 0.35 ⋅ 0.8 / 0.29
≈ 0.97
It balances out to 1 – hooray!
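To close, a short Python sketch tying the two pieces together: the law of total probability supplies the denominator, and Bayes’ theorem flips the conditioning (again, the group labels are just my shorthand for the two branches):

```python
p_healthy_given_group = {"do": 0.35, "dont": 0.05}  # P(no health problems | group)
p_group               = {"do": 0.80, "dont": 0.20}  # P(group)

# Denominator via the law of total probability: P(no health problems) ≈ 0.29
p_healthy = sum(p_healthy_given_group[g] * p_group[g] for g in p_group)

# Bayes' theorem: P(group | no health problems)
posterior = {g: p_healthy_given_group[g] * p_group[g] / p_healthy for g in p_group}
print(posterior)                # roughly {'do': 0.97, 'dont': 0.03}
print(sum(posterior.values()))  # ≈ 1.0 – the posteriors balance out
```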