Suppose we observe a woman with no symptoms of hemophilia. What is the probability that she is a carrier of the disease?
A few points about the disease are relevant. Hemophilia is a recessive, X-chromosome-linked trait. Males with hemophilia therefore inherited the disease from their mothers via their single X-chromosome, while females with hemophilia inherited the gene from both their father and their mother. So if we observe a woman without hemophilia, we know she is either a carrier (one of her two X-chromosomes has the hemophilia gene) or a non-carrier (neither of her X-chromosomes has the gene). If the woman’s mother is a carrier and her father is not affected, she has a 50% chance of being a carrier (ignoring the small probability of a random mutation). In Bayesian terms, this is the prior distribution: before seeing any further data, the probability that she is a carrier and the probability that she is not are each 0.5.
Now suppose the woman has two sons and neither has hemophilia. Given this new information, how can we use Bayesian statistics to update our estimate of the probability that the woman is a carrier of the hemophilia gene?
- Pr(y1=0, y2=0 | θ = 1) = 0.5 * 0.5 = 0.25
- Pr(y1=0, y2=0 | θ = 0) = 1 * 1 = 1
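Above, y1 and y2 indicate whether each son has hemophilia (1) or not (0), and θ indicates whether the mother is truly a carrier (θ = 1) or not (θ = 0). These two expressions are the likelihood of the observed data under each value of θ. If the mother is a carrier (θ = 1), each son has a 50% chance of having hemophilia and so a 50% chance of being unaffected; if the mother is not a carrier (θ = 0), the sons will not have hemophilia. The probabilities are multiplied because, given the mother's carrier status, the two sons' outcomes are independent.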
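The two likelihood values can be reproduced with a short Python sketch. The function name `likelihood_all_sons_unaffected` and its parameters are illustrative choices, not part of the original example, and the sketch assumes the sons' outcomes are independent given the mother's carrier status.

```python
def likelihood_all_sons_unaffected(theta: int, n_sons: int) -> float:
    """Pr(all n_sons are unaffected | theta).

    theta = 1: mother is a carrier, each son is unaffected with probability 0.5.
    theta = 0: mother is not a carrier, each son is unaffected with probability 1.
    """
    if theta not in (0, 1):
        raise ValueError("theta must be 0 (non-carrier) or 1 (carrier)")
    p_unaffected = 0.5 if theta == 1 else 1.0
    # Independence given theta lets us multiply the per-son probabilities.
    return p_unaffected ** n_sons


print(likelihood_all_sons_unaffected(theta=1, n_sons=2))  # 0.25
print(likelihood_all_sons_unaffected(theta=0, n_sons=2))  # 1.0
```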
Now, let’s use Bayes' rule to compute the posterior probability that the mother is a carrier.
- Pr(θ = 1|y) = [p(y| θ = 1)Pr(θ = 1)] / [p(y| θ = 1)Pr(θ = 1) + p(y| θ = 0)Pr(θ = 0) ]
- Pr(θ = 1|y) = [(0.25)(0.5)] / [(0.25)(0.5) + (1)(0.5) ] = 0.2 = 20%
Whereas our prior estimate was that the woman had a 50% chance of being a carrier, learning that she has two sons without hemophilia lowers the posterior probability to 20%.
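As a check on the arithmetic, the same calculation can be sketched in Python using exact fractions. The function name `posterior_carrier` and its arguments are assumptions made for illustration; the model itself (prior of 0.5, each son unaffected with probability 0.5 if the mother is a carrier and 1 otherwise) comes from the example above.

```python
from fractions import Fraction


def posterior_carrier(prior_carrier: Fraction, n_unaffected_sons: int) -> Fraction:
    """Pr(theta = 1 | all observed sons are unaffected), via Bayes' rule."""
    like_carrier = Fraction(1, 2) ** n_unaffected_sons  # Pr(y | theta = 1)
    like_noncarrier = Fraction(1)                       # Pr(y | theta = 0)
    numerator = like_carrier * prior_carrier
    evidence = numerator + like_noncarrier * (1 - prior_carrier)
    return numerator / evidence


print(posterior_carrier(Fraction(1, 2), 2))  # 1/5
```

Using `Fraction` rather than floats keeps the result exact, which makes it easy to compare against the hand calculation.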
We can easily update this probability based on new information as well. For instance, assume that the woman now has a third son without hemophilia. The updated posterior probability is:
- Pr(θ = 1|y) = [(0.5)^3(0.5)] / [(0.5)^3(0.5) + (1)(0.5)] = 1/9 ≈ 11.1%
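The same 1/9 can also be reached one son at a time, using each posterior as the prior for the next observation. The sketch below illustrates that sequential view in Python; the helper name `update_on_unaffected_son` is hypothetical, and the code assumes every observed son is unaffected.

```python
from fractions import Fraction


def update_on_unaffected_son(prior_carrier: Fraction) -> Fraction:
    """One Bayesian update after observing a single unaffected son."""
    like_carrier = Fraction(1, 2)  # Pr(son unaffected | mother is a carrier)
    like_noncarrier = Fraction(1)  # Pr(son unaffected | mother is not a carrier)
    numerator = like_carrier * prior_carrier
    return numerator / (numerator + like_noncarrier * (1 - prior_carrier))


belief = Fraction(1, 2)  # prior before any sons are observed
history = []
for _ in range(3):
    belief = update_on_unaffected_son(belief)  # posterior becomes the next prior
    history.append(belief)

print([str(b) for b in history])  # ['1/3', '1/5', '1/9']
```

The second entry matches the 20% obtained from two sons, and the third matches the 11.1% obtained by processing all three sons at once.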
Example courtesy of Bayesian Data Analysis.