Causality: Bradford Hill criteria

If you observe two things occurring together, how can you know whether event A causes event B? For instance, suppose patients who use a given treatment have better health outcomes. While this relationship could be causal, it may not be. If only people with higher socioeconomic status can afford the treatment, and these same individuals are likely to have better health outcomes due to other factors (e.g., more flexible work schedules, more family support), then there may not be a causal relationship at all.
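The socioeconomic-status scenario can be sketched with a small simulation. This is only an illustration under made-up assumptions: the treatment has no effect at all, high-status patients are far more likely to be treated, and every number (the 0.8 and 0.1 treatment probabilities, the 10-point outcome bonus, the noise level) is hypothetical.

```python
import random
from statistics import mean

def simulate(n=20000, seed=0):
    """Generate (high_ses, treated, outcome) rows where treatment does nothing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        high_ses = rng.random() < 0.5
        # Hypothetical access gap: high-SES patients are treated far more often.
        treated = rng.random() < (0.8 if high_ses else 0.1)
        # Outcome depends on SES and noise only; 'treated' never enters here.
        outcome = 50 + (10 if high_ses else 0) + rng.gauss(0, 5)
        rows.append((high_ses, treated, outcome))
    return rows

def diff_in_means(rows):
    """Mean outcome of treated patients minus mean outcome of untreated patients."""
    treated = [y for _, t, y in rows if t]
    untreated = [y for _, t, y in rows if not t]
    return mean(treated) - mean(untreated)

rows = simulate()
naive = diff_in_means(rows)                                  # ignores SES
within_high = diff_in_means([r for r in rows if r[0]])       # high SES only
within_low = diff_in_means([r for r in rows if not r[0]])    # low SES only

print(f"naive difference:         {naive:.2f}")
print(f"difference, high SES only: {within_high:.2f}")
print(f"difference, low SES only:  {within_low:.2f}")
```

Under these assumptions the naive comparison shows a sizable apparent benefit, while the comparison within each socioeconomic group is close to zero, which is what we would expect when the association is driven entirely by the confounder.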

So how do we determine whether some event A causes event B? In the medical literature, the Bradford Hill criteria are often used. These are:

Strength (effect size): A small association does not mean that there is not a causal effect, though the larger the association, the more likely that it is causal.
Consistency (reproducibility): Consistent findings observed by different persons in different places with different samples strengthen the likelihood of an effect.
Specificity: Causation is likely if there is a very specific population at a specific site and disease with no other likely explanation. The more specific an association between a factor and an effect is, the bigger the probability of a causal relationship.
Temporality: The effect has to occur after the cause (and if there is an expected delay between the cause and expected effect, then the effect must occur after that delay).
Biological gradient: Greater exposure should generally lead to greater incidence of the effect. However, in some cases, the mere presence of the factor can trigger the effect. In other cases, an inverse proportion is observed: greater exposure leads to lower incidence.
Plausibility: A plausible mechanism between cause and effect is helpful (but Hill noted that knowledge of the mechanism is limited by current knowledge).
Coherence: Coherence between epidemiological and laboratory findings increases the likelihood of an effect. However, Hill noted that “… lack of such [laboratory] evidence cannot nullify the epidemiological effect on associations”.
Experiment: “Occasionally it is possible to appeal to experimental evidence”.
Analogy: The effect of similar factors may be considered.

These items are sensible. However, they do not derive from first principles. Instead, they offer a logical way to judge the likelihood that an observed association is causal, without setting specific numeric thresholds. Let’s delve into each of these criteria in turn.

Strength. There is no reason why a causal effect need be strong. For instance, if we want to know what factors moved a toy boat forward, a small breath may move it slightly. The breath is clearly causal even though the effect is small. The reason for the strength criterion, however, is likely that real-world observations are made with a lot of noise (e.g., measurement error, selection bias). A strong relationship is more likely to be causal since it is less likely to be overwhelmed by these other factors. Perhaps the strength criterion should be used if we want to look at the principal causes of some effect.
Consistency. If A causes B, and we observed B to occur after A in one case, it would be more convincing if B occurred every time A occurred. For instance, if we want to show that snow causes ice, we could show that when it snows, water turns to ice. However, if we can show that ice sometimes appears without snow, then snow is clearly not required for ice, and the causal claim is weakened. Thus, the consistency criterion is only valid so long as the replicated tests or observations are sufficiently diverse to be informative.
Specificity. This criterion is closely tied to the causal pathway. If we see that treatment A improves health outcomes B, but don’t know why, it is less clear that the relationship is causal. If we know that treatment A increases the production of protein P, we would expect patients with lower levels of protein P to see a bigger improvement in health outcomes. More specific proposed causal pathways can also be flawed, but it is helpful to understand the mechanism through which any causal effect could occur.
Temporality. Causes occur before effects. Makes sense.
Biological gradient. This criterion makes sense only for monotonic (e.g., linear) causal relationships. If I want to know whether ice makes people’s skin cold, I could test whether more ice makes people’s skin colder. However, if I am looking at a pharmaceutical and want to measure its effect on quality of life, increasing the dose from something minimal would likely improve health outcomes. Setting the dose too high, however, may lead to serious adverse events. Thus, the biological gradient criterion may fail in non-monotonic causal relationships.
Plausibility. This is a very subjective criterion. In practice, scientists often consider plausibility when judging the likelihood of causality, but just because a relationship seems unlikely does not mean it can’t be causal.
Coherence. Similar evidence from lab and epidemiological studies helps to make the causal case. There may be cases, however, where the evidence differs, such as when laboratory settings and real-world settings differ. In the case of clinical trials, patients receiving treatments outside of clinical trials often differ from those who enroll.
Experiment. If you could replicate the phenomenon to be studied in a lab setting and then test how including or removing a cause affects results, this would be strong evidence.
Analogy. As with plausibility, in practice scientists are likely to look for evidence of causality from analogous cases. Failure to find an appropriate analogy, however, does not preclude a causal relationship, particularly for novel phenomena. Perhaps more importantly, finding an analogy does not mean that the relationship under investigation is causal.
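The concern about the biological gradient criterion can be made concrete with a toy calculation. The sketch below assumes a purely hypothetical inverted-U dose-response curve, benefit = dose × (10 − dose), in which the dose fully determines the outcome. It then checks whether a simple "more exposure, more effect" test (a Pearson correlation) picks up the relationship.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def benefit(dose):
    # Hypothetical dose-response curve: benefit rises up to dose 5, then
    # falls as adverse events outweigh the gains at higher doses.
    return dose * (10 - dose)

doses_full = list(range(0, 11))   # doses 0 through 10
doses_low = list(range(0, 6))     # doses 0 through 5 only

r_full = pearson(doses_full, [benefit(d) for d in doses_full])
r_low = pearson(doses_low, [benefit(d) for d in doses_low])

print(f"correlation over the full dose range: {r_full:.2f}")
print(f"correlation over low doses only:      {r_low:.2f}")
```

Over the low-dose range the gradient is clear, but over the full range the rising and falling halves cancel and the correlation is essentially zero, even though dose is the only thing driving the outcome in this toy setup. A gradient check that looks only for a steadily increasing trend would miss the causal relationship here.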

In short, the Bradford Hill criteria provide some useful, practical guidelines for assessing causality. Some of the criteria, however, are vague, and they leave a lot of room for investigator judgment. One piece of advice from Potischman and Weed (1999) that I agree with is that the Bradford Hill criteria should be used as a guide, not as a checklist that definitively determines causality.
