What is Berkson’s paradox?
Assume that there are two independent events, A and B. In the population at large, these events are uncorrelated. However, what happens if one conditions on the fact that at least one of A or B occurs? Once we condition on at least one of the two events having occurred, the events become negatively correlated.
If both events have a positive probability less than unity (i.e., 0 < P(A) < 1 and 0 < P(B) < 1), then one can describe Berkson’s paradox mathematically as:
- if P(A|B) = P(A) then P(A|B,C) < P(A|C) where C = A∪B
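The inequality follows in a few steps from the definition of conditional probability. The derivation below is a sketch that uses only the stated assumptions (A and B independent, both probabilities strictly between 0 and 1, and C = A∪B):

```latex
\begin{align*}
P(A \mid B, C) &= P(A \mid B) = P(A)
  && \text{since } B \subseteq C \text{ and } A, B \text{ are independent} \\
P(A \mid C) &= \frac{P(A \cap C)}{P(C)} = \frac{P(A)}{P(A \cup B)}
  && \text{since } A \subseteq C \\
P(A \cup B) &= 1 - \bigl(1 - P(A)\bigr)\bigl(1 - P(B)\bigr) < 1
  && \text{since } P(A) < 1 \text{ and } P(B) < 1 \\
\Rightarrow\quad P(A \mid C) &> P(A) = P(A \mid B, C)
  && \text{since } P(A) > 0
\end{align*}
```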
The correlation arises from selection bias. Conditional on C, if we learn that event A did not occur, we know that event B must have occurred. Thus, conditioning on the union of the two events induces a negative correlation between them.
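The selection argument can be checked exactly on a small example. The sketch below assumes P(A) = P(B) = 1/2 purely for illustration (the source does not fix these values), and the helper `prob` is a hypothetical name introduced here:

```python
from fractions import Fraction
from itertools import product

# Assumed probabilities for illustration; any values strictly between 0 and 1 work.
P_A = Fraction(1, 2)
P_B = Fraction(1, 2)

# Joint distribution over (a, b) outcomes for independent A and B.
joint = {
    (a, b): (P_A if a else 1 - P_A) * (P_B if b else 1 - P_B)
    for a, b in product([True, False], repeat=2)
}


def prob(event, given):
    """Exact P(event | given); both arguments are predicates on (a, b)."""
    denominator = sum(p for (a, b), p in joint.items() if given(a, b))
    numerator = sum(
        p for (a, b), p in joint.items() if given(a, b) and event(a, b)
    )
    return numerator / denominator


# C is the union of A and B, i.e. "a or b".
p_a_given_c = prob(lambda a, b: a, lambda a, b: a or b)
p_a_given_b_and_c = prob(lambda a, b: a, lambda a, b: b and (a or b))
p_b_given_not_a_and_c = prob(lambda a, b: b, lambda a, b: (not a) and (a or b))

print(p_a_given_c)            # 2/3
print(p_a_given_b_and_c)      # 1/2
print(p_b_given_not_a_and_c)  # 1
```

With these assumed values, P(A|B,C) = 1/2 is smaller than P(A|C) = 2/3, and learning that A did not occur makes B certain once we have conditioned on C.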
In Causality, Judea Pearl provides a more concrete example:
…if the admission criteria to a certain graduate school call for either high grades as an undergraduate or special musical talents, then these two attributes will be found to be correlated (negatively) in the student population of that school, even if these attributes are uncorrelated in the population at large. Indeed, students with low grades are likely to be exceptionally gifted in music, which explains their admission to graduate school.
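Pearl's admissions example can be illustrated with a small simulation. This is a rough sketch under assumptions that are not in the quoted text: grades and musical talent are modeled as independent standard normal scores, an applicant is admitted if either score exceeds a cutoff of 1.0, and the names `pearson` and `simulate` are hypothetical helpers introduced here.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate(n=100_000, cutoff=1.0, seed=0):
    """Return (population correlation, correlation among admitted students)."""
    rng = random.Random(seed)
    # Assumption: grades and musical talent are independent N(0, 1) scores.
    grades = [rng.gauss(0, 1) for _ in range(n)]
    music = [rng.gauss(0, 1) for _ in range(n)]

    # Admission rule: high grades OR special musical talent.
    admitted = [(g, m) for g, m in zip(grades, music) if g > cutoff or m > cutoff]
    admitted_grades, admitted_music = zip(*admitted)

    return pearson(grades, music), pearson(admitted_grades, admitted_music)


r_population, r_admitted = simulate()
print(f"population correlation: {r_population:.3f}")
print(f"admitted correlation:   {r_admitted:.3f}")
```

Under these assumptions, the population correlation should come out close to zero, while the correlation among admitted students should be clearly negative, matching the selection effect Pearl describes.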