Simpson’s Paradox

Statistics can be weird. Just when you’ve gotten your head around the game show paradox and the birthday paradox, there’s this. I think people in general need to realize that we humans are just not that good at judging probability intuitively.

From John Rice’s Statistics Textbook:

A black urn contains 5 red and 6 green balls, and a white urn contains 3 red and 4 green balls. You are allowed to choose an urn and then choose a ball at random from the urn. If you choose a red ball, you get a prize. Which urn should you choose to draw from? If you draw from the black urn, the probability of choosing a red ball is 5/11 (the number of ways you can draw a red ball divided by the total number of outcomes). If you choose to draw from the white urn, the probability of choosing a red ball is 3/7, so you should choose to draw from the black urn.

Now consider another game in which a second black urn has 6 red and 3 green balls, and a second white urn has 9 red and 5 green balls. If you draw from the black urn, the probability of a red ball is 6/9, whereas if you choose to draw from the white urn, the probability is 9/14. Again, you should choose to draw from the black urn.

In the final game, the contents of the second black urn are added to the first black urn, and the contents of the second white urn are added to the first white urn. Again, you can choose which urn to draw from. Which should you choose? Intuition says choose the black urn, but let’s calculate the probabilities. The black urn now contains 11 red and 9 green, so the probability of drawing a red ball from it is 11/20 = .55. The white urn now contains 12 red and 9 green balls, so the probability of drawing a red ball from it is 12/21 = .571.  So, you should choose the white urn.
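To check the arithmetic in the three games, here is a minimal Python sketch. The names `p_red`, `games`, and `better_urn` are my own for illustration and are not from Rice's book; the ball counts are the ones given in the passage.

```python
from fractions import Fraction


def p_red(red, green):
    """Probability of drawing a red ball from an urn with these counts."""
    return Fraction(red, red + green)


# (red, green) counts for each urn, taken from the games described above.
games = {
    "game 1": {"black": (5, 6), "white": (3, 4)},
    "game 2": {"black": (6, 3), "white": (9, 5)},
}

# The final game pools the contents of the two urns of each color.
games["combined"] = {
    color: tuple(
        a + b for a, b in zip(games["game 1"][color], games["game 2"][color])
    )
    for color in ("black", "white")
}


def better_urn(urns):
    """Return the color of the urn with the higher chance of red."""
    return max(urns, key=lambda color: p_red(*urns[color]))


for name, urns in games.items():
    summary = ", ".join(
        f"{color} {float(p_red(*counts)):.3f}" for color, counts in urns.items()
    )
    print(f"{name}: {summary} -> choose {better_urn(urns)}")
```

Run as written, this picks the black urn in the first two games and the white urn once the contents are pooled, matching the numbers in the text.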

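When you think about it, it actually makes sense. The combined probability is a weighted average of the two games, weighted by how many balls each urn contributes. The white urn gets 14 of its 21 balls from the second game, where red is much more likely, while the black urn gets only 9 of its 20 balls from that game. So the white urn’s total leans more heavily toward the better game, and that is enough to overtake the black urn. Still a little weird, though.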

Another common example is batting averages. Here is one that Wikipedia gives. David Justice has a better batting average than Derek Jeter in each of two consecutive years, but Jeter’s average over the two years combined is higher.

                 1995             1996             Combined
Derek Jeter      12/48    .250    183/582  .314    195/630  .310
David Justice    104/411  .253    45/140   .321    149/551  .270
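The same check works for the batting averages. This is a small sketch using the hits and at-bats from the table above; the names `seasons`, `average`, and `combined_average` are just illustrative choices.

```python
from fractions import Fraction

# (hits, at-bats) per season, copied from the table above.
seasons = {
    "Derek Jeter": {1995: (12, 48), 1996: (183, 582)},
    "David Justice": {1995: (104, 411), 1996: (45, 140)},
}


def average(hits, at_bats):
    """Batting average as an exact fraction."""
    return Fraction(hits, at_bats)


def combined_average(by_year):
    """Pool hits and at-bats across seasons, then divide once."""
    total_hits = sum(hits for hits, _ in by_year.values())
    total_at_bats = sum(at_bats for _, at_bats in by_year.values())
    return average(total_hits, total_at_bats)


for player, by_year in seasons.items():
    yearly = ", ".join(
        f"{year}: {float(average(*record)):.3f}"
        for year, record in by_year.items()
    )
    print(f"{player}: {yearly}, combined: {float(combined_average(by_year)):.3f}")
```

The combined figure is not the mean of the two yearly averages. It is weighted by at-bats, and most of Jeter's at-bats came in his better year (582 of 630) while most of Justice's came in his worse one (411 of 551).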

Actually, the batting averages make it more intuitive for me.
