
# Fisher’s Exact Test

How can one tell if two variables are related? If the variables are continuous, one can calculate a correlation. But what if one is analyzing a 2×2 contingency table? What is the likelihood of observing a particular distribution of counts, assuming the probability of being in each of the two groups is known?

One way to determine this value is to apply Fisher’s exact test.

To see how this test works, assume that the entire U.S. population lives in one of two states: Pennsylvania (where I went to undergrad) and California (where I earned my PhD). California is about three times as large as Pennsylvania, so assume that 75% of individuals live in California.

Say we are interested in examining whether California has significantly more smokers than Pennsylvania. The CDC reports that 12.1% of Californians smoke whereas 18.4% of Pennsylvanians smoke. If we assume a sample of 80 people, we could get a contingency table that looks like this:

| | CA | PA | Row total |
|---|---|---|---|
| Smokers | 6 | 4 | 10 |
| Non-Smokers | 54 | 16 | 70 |
| Column total | 60 | 20 | 80 |

In this table, 75% of all individuals are from California (i.e., 60/80) and 12.5% of all individuals are smokers (10/80). We see that 10% of Californians smoke (6/60) and 20% of Pennsylvanians smoke (4/20). What is the probability of seeing this distribution?

Fisher’s exact test gives us this answer. Let:

• a = number of California smokers
• b = number of Pennsylvania smokers
• c = number of California non-smokers
• d = number of Pennsylvania non-smokers

Fisher showed that, with the row and column totals held fixed, the probability of observing a given contingency table follows a hypergeometric distribution. The probability of observing this table is:

$$P = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}$$

where n = a + b + c + d is the total sample size.
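As a quick sanity check, the binomial-coefficient form (with the denominator choosing the a + c Californians out of all n people) and the factorial form can be compared numerically. This is a minimal Python sketch using only the standard library; the function names and the small test table are hypothetical, chosen purely for illustration.

```python
from math import comb, factorial

def prob_binomial(a, b, c, d):
    """Table probability written with binomial coefficients."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

def prob_factorial(a, b, c, d):
    """The same probability written with factorials of the margins and cells."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(a) * factorial(b) * factorial(c)
                   * factorial(d) * factorial(n))
    return numerator / denominator

# Hypothetical table (not from the smoking example): a=3, b=1, c=1, d=3.
print(prob_binomial(3, 1, 1, 3))
print(prob_factorial(3, 1, 1, 3))
```

Both calls should print the same value, since the two expressions are algebraically identical.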

Using our example, the answer would be:

$$P = \frac{\binom{10}{6}\binom{70}{54}}{\binom{80}{60}} = \frac{10!\,70!\,60!\,20!}{6!\,4!\,54!\,16!\,80!} \approx 14.7\%$$
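Note that 14.7% is the probability of this one table, not yet a p-value. The rough Python sketch below (standard library only, function names are illustrative) reproduces the figure and then sums the probabilities of every table with the same row and column totals that is no more likely than the observed one. That summation rule is one common convention for a two-sided p-value and is an assumption here, not something taken from the example above.

```python
from math import comb

def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2x2 table with fixed margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

def two_sided_p_value(a, b, c, d):
    """Sum the probabilities of all tables (same margins) that are
    no more likely than the observed table. Assumed convention."""
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = table_probability(a, b, c, d)
    lowest = max(0, row1 + col1 - n)   # smallest feasible top-left cell
    highest = min(row1, col1)          # largest feasible top-left cell
    total = 0.0
    for x in range(lowest, highest + 1):
        p = table_probability(x, row1 - x, col1 - x, n - row1 - col1 + x)
        if p <= p_obs * (1 + 1e-9):    # small tolerance for float ties
            total += p
    return total

# Smoking example: a=6 CA smokers, b=4 PA smokers,
# c=54 CA non-smokers, d=16 PA non-smokers.
print(f"{table_probability(6, 4, 54, 16):.3f}")  # 0.147
print(two_sided_p_value(6, 4, 54, 16))
```

Because the observed table is itself included in the sum, the p-value computed this way can never be smaller than the 14.7% table probability.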
