Causality: Granger Causality – Healthcare Economist

Earlier this year, I reviewed one definition of causality: the Bradford Hill criteria. Now I move from a medical definition of causality to one developed by an economist (in fact, an economist from my alma mater). Granger causality identifies a variable as causal if (i) it occurs before the outcome of interest and (ii) including past values of the variable improves the prediction of the outcome of interest in a statistically significant way. Scholarpedia provides a useful example:

The basic “Granger Causality” definition is quite simple. Suppose that we have three terms, X_t, Y_t, and W_t, and that we first attempt to forecast X_{t+1} using past terms of X_t and W_t. We then try to forecast X_{t+1} using past terms of X_t, Y_t, and W_t. If the second forecast is found to be more successful, according to standard cost functions, then the past of Y appears to contain information helping in forecasting X_{t+1} that is not in past X_t or W_t. In particular, W_t could be a vector of possible explanatory variables. Thus, Y_t would “Granger cause” X_{t+1} if (a) Y_t occurs before X_{t+1}; and (b) it contains information useful in forecasting X_{t+1} that is not found in a group of other appropriate variables.
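The forecast comparison in this definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated (the coefficients 0.4, 0.8, and 0.3 are made up), a single lag stands in for "past terms", mean squared error on a held-out stretch of the series plays the role of the cost function, and the helper names `solve`, `ols`, and `mse` are hypothetical.

```python
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(rows, y):
    """Least-squares coefficients from the normal equations (no intercept)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def mse(beta, rows, y):
    """Mean squared forecast error of the fitted coefficients on (rows, y)."""
    errors = [v - sum(b * c for b, c in zip(beta, r)) for r, v in zip(rows, y)]
    return sum(e * e for e in errors) / len(errors)

# Simulated data (an assumption for illustration): Y's past feeds into X.
rng = random.Random(1)
x, y, w = [0.0], [0.0], [0.0]
for _ in range(1499):
    xt, yt, wt = x[-1], y[-1], w[-1]
    x.append(0.4 * xt + 0.8 * yt + 0.3 * wt + rng.gauss(0, 1))
    y.append(0.5 * yt + rng.gauss(0, 1))
    w.append(0.5 * wt + rng.gauss(0, 1))

target = x[1:]                                            # X_{t+1}
without_y = [[x[t], w[t]] for t in range(len(x) - 1)]     # past X and W
with_y = [[x[t], y[t], w[t]] for t in range(len(x) - 1)]  # past X, Y and W

split = 1000  # fit on the first 1000 points, score forecasts on the rest
mse_without = mse(ols(without_y[:split], target[:split]), without_y[split:], target[split:])
mse_with = mse(ols(with_y[:split], target[:split]), with_y[split:], target[split:])
print("MSE without past Y:", mse_without)
print("MSE with past Y:   ", mse_with)
```

If the second error is reliably smaller, past Y carries information about X_{t+1} that past X and W do not. Whether the improvement is statistically significant is a separate question that this sketch does not answer.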

Consider a linear example with two time series processes.

X₁(t) = Σ_{j=1}^{p} A_{11,j} X₁(t-j) + Σ_{j=1}^{p} A_{12,j} X₂(t-j)

X₂(t) = Σ_{j=1}^{p} A_{21,j} X₁(t-j) + Σ_{j=1}^{p} A_{22,j} X₂(t-j)

One can test whether X₂ Granger-causes X₁ by testing the null hypothesis that the coefficient vector A₁₂ = 0 (that is, A_{12,j} = 0 for every lag j = 1, …, p) using an F-test. The F-test examines whether the coefficients in the vector are jointly equal to zero; rejecting this null indicates that at least one lag of X₂ helps predict X₁. The magnitude of the Granger causality (a.k.a. G-causality) can be estimated as the logarithm of the corresponding F-statistic. To determine the appropriate number of lag terms in the time series, one could use the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC) and choose the value of p that minimizes the criterion, which balances model fit against the number of estimated parameters.
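As a rough sketch of this test, the code below fits the X₁ equation with and without the lags of X₂ and forms the F-statistic from the two residual sums of squares, F = ((RSS_restricted - RSS_full) / p) / (RSS_full / (n - 2p)). Everything here is an assumption for illustration: the data are simulated with added Gaussian noise, the regressions have no intercept (matching the equations above), the lag order is picked by BIC on the X₁ equation, and the function names are hypothetical. A p-value would still require the F(p, n - 2p) distribution, which the Python standard library does not provide.

```python
import math
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def rss(rows, y):
    """Residual sum of squares of a least-squares fit of y on rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((v - sum(b * c for b, c in zip(beta, r))) ** 2 for r, v in zip(rows, y))

def lag_rows(series, p, start, T):
    """Row for time t holds lags 1..p of every series, for t = start..T-1."""
    return [[s[t - j] for s in series for j in range(1, p + 1)] for t in range(start, T)]

def granger_f(x1, x2, p):
    """F-statistic for the null that all p lags of x2 have zero coefficients."""
    T = len(x1)
    y = x1[p:]
    rss_r = rss(lag_rows([x1], p, p, T), y)       # own lags only
    rss_u = rss(lag_rows([x1, x2], p, p, T), y)   # own lags plus lags of x2
    df = len(y) - 2 * p
    return ((rss_r - rss_u) / p) / (rss_u / df)

def select_lag_bic(x1, x2, max_p):
    """Lag order minimizing BIC for the x1 equation, on a common sample."""
    T = len(x1)
    y = x1[max_p:]
    n = len(y)
    def bic(p):
        return n * math.log(rss(lag_rows([x1, x2], p, max_p, T), y) / n) + 2 * p * math.log(n)
    return min(range(1, max_p + 1), key=bic)

def simulate(T, a12, seed=0):
    """Simulated one-lag system in which x2 feeds into x1 with weight a12."""
    rng = random.Random(seed)
    x1, x2 = [0.0], [0.0]
    for _ in range(T - 1):
        prev1, prev2 = x1[-1], x2[-1]
        x1.append(0.5 * prev1 + a12 * prev2 + rng.gauss(0, 1))
        x2.append(0.5 * prev2 + rng.gauss(0, 1))
    return x1, x2

x1, x2 = simulate(800, a12=0.8)
p = select_lag_bic(x1, x2, max_p=4)
f_12 = granger_f(x1, x2, p)  # does X2 help predict X1?
f_21 = granger_f(x2, x1, p)  # does X1 help predict X2?
print("lags:", p, "F(X2 -> X1):", f_12, "log F:", math.log(f_12), "F(X1 -> X2):", f_21)
```

Swapping the BIC penalty `2 * p * math.log(n)` for `2 * (2 * p)` would give an AIC-style criterion instead. In this simulation only X₂ drives X₁, so the first F-statistic should be far larger than the second.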

G-causality can be readily extended to the n-variable case, where n > 2, by estimating an n-variable autoregressive model. In this case, X₂ G-causes X₁ if lagged observations of X₂ help predict X₁ when lagged observations of all other variables X₃, …, Xₙ are also taken into account.
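To see why conditioning on the other variables matters, here is a small sketch with three simulated series in which X₃ drives both X₁ and X₂ at different delays. The setup is an assumption chosen for illustration (coefficients, noise, and the two-lag order are made up, and `conditional_f` is a hypothetical helper): tested pairwise, X₂ appears to help predict X₁, but the effect should largely disappear once lags of X₃ are included.

```python
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def rss(rows, y):
    """Residual sum of squares of a least-squares fit of y on rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((v - sum(b * c for b, c in zip(beta, r))) ** 2 for r, v in zip(rows, y))

def lag_rows(series, p, T):
    """Row for time t holds lags 1..p of every series, for t = p..T-1."""
    return [[s[t - j] for s in series for j in range(1, p + 1)] for t in range(p, T)]

def conditional_f(target, source, others, p):
    """F-statistic for the lags of source, given lags of target and of others."""
    T = len(target)
    y = target[p:]
    restricted = lag_rows([target] + others, p, T)
    full = lag_rows([target] + others + [source], p, T)
    rss_r, rss_u = rss(restricted, y), rss(full, y)
    df = len(y) - len(full[0])
    return ((rss_r - rss_u) / p) / (rss_u / df)

# Simulated common driver: x3 reaches x2 after one step and x1 after two.
rng = random.Random(7)
T = 800
x3 = [0.0, 0.0]
for _ in range(T - 2):
    x3.append(0.5 * x3[-1] + rng.gauss(0, 1))
x2 = [0.0] + [0.8 * x3[t - 1] + rng.gauss(0, 1) for t in range(1, T)]
x1 = [0.0, 0.0] + [0.8 * x3[t - 2] + rng.gauss(0, 1) for t in range(2, T)]

f_pairwise = conditional_f(x1, x2, [], p=2)        # ignores x3
f_conditional = conditional_f(x1, x2, [x3], p=2)   # accounts for lags of x3
print("pairwise F:", f_pairwise, "conditional F:", f_conditional)
```

Here the pairwise result is spurious: X₂ only looks predictive because it is an early, noisy reading of X₃.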
