Instrumental variable (IV) methods are usually implemented using a two‐stage approach. The first stage estimates the expectation of the endogenous variable conditional on measured confounders and one or more IVs. The second stage then models outcomes as a function of the estimated treatment values from the first stage, measured confounders, and potentially other control variables.
There are a few approaches for implementing IV.
- 2-stage least squares (2SLS): In this approach, both the 1st and 2nd stage models are linear models fitted by ordinary least squares (OLS) regression, that is, by minimizing the sum of squared residuals.
- 2-stage residual inclusion (2SRI): In this approach, the 1st stage is the same as in 2SLS, but the 2nd stage includes the endogenous variable itself, the covariates, and the residual from the 1st stage.
Note that 2SLS and 2SRI produce identical results when models are linear. But what happens when models are not linear, as would be the case for a binary variable? This is the question that Basu, Coe and Chapman (2018) attempt to address.
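The equivalence in the linear case can be checked numerically. The sketch below is only an illustration: the data-generating process, coefficient values, and variable names are assumptions made for this example and are not taken from the paper. It fits the first stage by OLS, then compares the 2SLS second stage (fitted treatment replaces the treatment) with the 2SRI second stage (treatment is kept and the first-stage residual is added).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical linear data-generating process (illustrative values only).
z = rng.normal(size=n)  # instrument
x = rng.normal(size=n)  # measured confounder
u = rng.normal(size=n)  # unmeasured confounder
d = 0.8 * z + 0.5 * x + u + rng.normal(size=n)        # endogenous treatment
y = 1.5 * d + 0.7 * x + 2.0 * u + rng.normal(size=n)  # outcome, true effect 1.5


def ols(X, target):
    """Return OLS coefficients for target regressed on the columns of X."""
    return np.linalg.lstsq(X, target, rcond=None)[0]


ones = np.ones(n)

# First stage (shared by 2SLS and 2SRI): treatment on instrument and confounder.
W1 = np.column_stack([ones, z, x])
d_hat = W1 @ ols(W1, d)
resid = d - d_hat

# 2SLS second stage: the fitted treatment replaces the observed treatment.
beta_2sls = ols(np.column_stack([ones, d_hat, x]), y)[1]

# 2SRI second stage: observed treatment plus the first-stage residual.
beta_2sri = ols(np.column_stack([ones, d, x, resid]), y)[1]

# Naive OLS for comparison; it ignores the unmeasured confounder.
beta_ols = ols(np.column_stack([ones, d, x]), y)[1]

print(f"2SLS: {beta_2sls:.4f}  2SRI: {beta_2sri:.4f}  naive OLS: {beta_ols:.4f}")
```

The two IV point estimates agree up to floating-point error because the first-stage residual is orthogonal to every other second-stage regressor. Note that standard errors taken directly from a manually run second stage are not valid without correction, so this sketch compares point estimates only.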
They note that ideally we would like to run a probit or logit model in these cases to better fit the data. However, would running these nonlinear models bias the estimates? An alternative is a linear approach, which would be unbiased, but the fit would likely be poor and the resulting estimates imprecise. But what is the magnitude of this imprecision? How much bias is there?
Previous research from Chapman and Brooks (2016) showed:
2SLS produced consistent estimates of LATE [local average treatment effect] across alternative scenarios whereas 2SRI estimates were not generally consistent for either ATE [average treatment effect] or LATE. However, the evidence produced by Chapman and Brooks is limited in that their scenarios all included two continuous IV and had treatment and outcome rates near 50%, a setting that may have inadvertently favored the 2SLS method.
In this paper, the authors use a Monte Carlo simulation with binary dependent and independent variables. They test: (1) linear 2SLS, (2) probit-probit 2SRI, and (3) a bivariate probit model. For (2), they include five variations of the included residual: (2a) the raw residuals, (2b) “standardized” residuals, obtained by dividing each residual by the standard deviation of all residuals, (2c) deviance residuals, (2d) Anscombe residuals, and (2e) generalized residuals from Gourieroux. The bivariate probit is estimated using a maximum likelihood estimator based on the true data generating process.
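To make the residual variants concrete, here is a minimal sketch of four of them for a first-stage probit. The function name `probit_residuals` and its arguments are hypothetical, and the deviance and generalized residuals use the standard textbook forms, which are assumed (not confirmed) to match the paper's definitions. The Anscombe residual is omitted because it requires an incomplete beta function.

```python
import math

import numpy as np


def norm_pdf(v):
    """Standard normal density."""
    return np.exp(-0.5 * v**2) / math.sqrt(2.0 * math.pi)


def norm_cdf(v):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(v / math.sqrt(2.0)))


def probit_residuals(d, index):
    """Residual variants for a fitted probit (hypothetical helper).

    d     : binary treatment indicator (0/1)
    index : fitted linear index from the first-stage probit
    """
    d = np.asarray(d, dtype=float)
    index = np.asarray(index, dtype=float)
    p = norm_cdf(index)  # fitted treatment probability

    raw = d - p
    # "Standardized" as described above: divide by the SD of all raw residuals.
    standardized = raw / raw.std(ddof=1)
    # Deviance residual: signed square root of each observation's deviance.
    deviance = np.sign(raw) * np.sqrt(
        -2.0 * (d * np.log(p) + (1.0 - d) * np.log(1.0 - p))
    )
    # Generalized residual: score of the probit log-likelihood w.r.t. the index.
    generalized = raw * norm_pdf(index) / (p * (1.0 - p))

    return {
        "raw": raw,
        "standardized": standardized,
        "deviance": deviance,
        "generalized": generalized,
    }
```

In a 2SRI fit, one of these residual vectors would be added as an extra regressor in the second-stage probit alongside the observed treatment and covariates.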
In their Monte Carlo analysis, the authors find that:
…bivariate probit always produced the least biased estimates of the ATE. Also as expected, 2SLS produced biased estimates of ATE, especially as the ATE and LATE became increasingly distinct in value with rarer treatment and outcome. Results showed that all of the 2SRI estimators produced substantially larger biases (and poor coverage probabilities) than bivariate probit in estimating ATE.
This is not surprising, but the bivariate probit’s lack of bias depends on knowing the true data generating function. The authors also use an empirical example from the Health and Retirement Study (HRS). In this application, they find:
The simulation results indicated that 2SLS should produce consistent estimates of LATEs, regardless of treatment or outcome rarity. Conversely, results suggested 2SRI models were likely to produce bias in estimating average treatment effects on outcomes (ATE or LATE), with generalized residuals estimator (2SRI‐Gres) producing the least bias. For very rare outcome, such as nursing home care and home health care in our empirical application, 2SRI with Anscombe residual (2SRI‐ares) may produce estimates close to the unbiased estimates of ATE.
- Basu A, Coe NB, Chapman CG. 2SLS versus 2SRI: Appropriate methods for rare outcomes and/or rare exposures. Health Economics. 2018;1–19. https://doi.org/10.1002/hec.3647