Berkson’s paradox arises when two independent events become negatively dependent once you consider only the outcomes where at least one of them occurs. More technically, the paradox occurs when a study design suffers from ascertainment bias.
Let me provide an example.
Consider the case where patients can have diabetes or HIV. Assume that patients have a positive probability of being hospitalized for diabetes or HIV. Further, assume that having diabetes and having HIV are uncorrelated. In other words, the probability of having diabetes is the same among the general public as among patients with HIV.
However, now consider the case where both diabetes and HIV increase the risk of being hospitalized. If we surveyed the general public, we would still find that diabetes and HIV are independent.
If, however, we surveyed only patients in the hospital, we would find that among these patients the probability of having diabetes is lower for those with HIV than for those without it. Likewise, the probability of having HIV is lower for hospital patients with diabetes than for those without it. What changed?
We find that a hospital patient without diabetes is more likely to have HIV than a member of the general population, as this patient must have had some non-diabetes reason (in my example, HIV) to have entered the hospital in the first place.
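To make the hospital argument concrete, here is a minimal sketch that computes the relevant conditional probabilities exactly. All of the numbers are hypothetical and chosen only for illustration: a 10% prevalence for each disease, a 30% chance that each disease leads to admission, and a 5% chance of admission for any other reason. I also assume that the causes of admission act independently of one another.

```python
from fractions import Fraction

# Hypothetical numbers, chosen only for illustration (not real prevalence data).
P_DIABETES = Fraction(1, 10)       # prevalence of diabetes in the population
P_HIV = Fraction(1, 10)            # prevalence of HIV in the population
P_ADMIT_DIABETES = Fraction(3, 10) # chance that diabetes leads to admission
P_ADMIT_HIV = Fraction(3, 10)      # chance that HIV leads to admission
P_ADMIT_OTHER = Fraction(1, 20)    # chance of admission for any other reason


def p_hospital(diabetes, hiv):
    """P(hospitalized | disease status), assuming independent admission causes."""
    p_stay_out = 1 - P_ADMIT_OTHER
    if diabetes:
        p_stay_out *= 1 - P_ADMIT_DIABETES
    if hiv:
        p_stay_out *= 1 - P_ADMIT_HIV
    return 1 - p_stay_out


def p_joint(diabetes, hiv):
    """P(disease status) in the general population, where the diseases are independent."""
    p_d = P_DIABETES if diabetes else 1 - P_DIABETES
    p_h = P_HIV if hiv else 1 - P_HIV
    return p_d * p_h


def p_diabetes_given(hiv, hospital_only):
    """P(diabetes | HIV status), optionally restricted to hospital patients."""
    def weight(diabetes):
        w = p_joint(diabetes, hiv)
        if hospital_only:
            w *= p_hospital(diabetes, hiv)
        return w
    return weight(True) / (weight(True) + weight(False))


general_with_hiv = p_diabetes_given(hiv=True, hospital_only=False)
general_without_hiv = p_diabetes_given(hiv=False, hospital_only=False)
hospital_with_hiv = p_diabetes_given(hiv=True, hospital_only=True)
hospital_without_hiv = p_diabetes_given(hiv=False, hospital_only=True)

print("General public:  ", float(general_with_hiv), float(general_without_hiv))
print("Hospital patients:", float(hospital_with_hiv), float(hospital_without_hiv))
```

Under these assumed numbers, the probability of diabetes in the general public is 10% regardless of HIV status. Among hospital patients it is roughly 15% for those with HIV and roughly 43% for those without, so the two diseases look negatively associated inside the hospital even though they are independent outside it.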
I provide a numerical example of the paradox HERE.
Wikipedia has another nice example from the world of dating.
Suppose Alex will only date a man if his niceness plus his handsomeness exceeds some threshold. Then nicer men do not have to be as handsome to qualify for Alex’s dating pool. So, among the men that Alex dates, Alex may observe that the nicer ones are less handsome on average (and vice versa), even if these traits are uncorrelated in the general population.
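The dating example can also be checked with a quick simulation. This is only a sketch under assumptions of my own: niceness and handsomeness are drawn as independent standard normal scores, the population has 20,000 men, and Alex's threshold is 1.0. None of these numbers come from the Wikipedia example.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable
N_MEN = 20_000          # hypothetical population size
THRESHOLD = 1.0         # hypothetical cutoff for niceness + handsomeness

# Each man is a (niceness, handsomeness) pair of independent scores.
men = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N_MEN)]

# Alex only dates men whose combined score exceeds the threshold.
dated = [(nice, handsome) for nice, handsome in men if nice + handsome > THRESHOLD]

corr_all = pearson(*zip(*men))
corr_dated = pearson(*zip(*dated))

print(f"Correlation in the general population: {corr_all:.3f}")
print(f"Correlation among the men Alex dates:  {corr_dated:.3f}")
```

The correlation in the full population should come out close to zero, while the correlation within the dating pool should be clearly negative, because the selection rule removes exactly those men who are low on both traits.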