When selection bias is an issue, many researchers use propensity score matching to ensure that observable differences in patient characteristics are balanced between individuals who receive a given treatment and those who do not. If unobservable characteristics are correlated with observable characteristics, propensity score matching generally works well.
Propensity score matching works less well when balancing patient characteristics requires more than a linear combination of covariates. For instance, matching on the square of a variable (e.g., age) or on the interaction of multiple covariates (e.g., comorbidity interactions) may be needed to avoid misspecification.
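To make the misspecification point concrete, here is a minimal Python sketch. The numbers are hypothetical toy data and the helper name `summarize` is made up for illustration. It shows two groups that look balanced on each covariate taken on its own, yet differ on age squared and on the hypertension-diabetes interaction.

```python
from statistics import mean

# Hypothetical toy data: each patient is (age, hypertension, diabetes).
treated = [(40, 1, 1), (60, 0, 0)]
control = [(50, 1, 0), (50, 0, 1)]

def summarize(group):
    """Return group means for linear terms, age squared, and the interaction."""
    return {
        "age": mean(age for age, _, _ in group),
        "hypertension": mean(h for _, h, _ in group),
        "diabetes": mean(d for _, _, d in group),
        "age_squared": mean(age * age for age, _, _ in group),
        "hypertension_x_diabetes": mean(h * d for _, h, d in group),
    }

treated_summary = summarize(treated)
control_summary = summarize(control)

for name in treated_summary:
    status = "balanced" if treated_summary[name] == control_summary[name] else "IMBALANCED"
    print(f"{name}: treated={treated_summary[name]} control={control_summary[name]} {status}")
```

In this toy example the linear terms match exactly while the squared and interaction terms do not, which is the kind of imbalance a model that only balances linear terms can miss.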
A paper by Alemi, ElRafey and Avramovic (2016) proposes one method of achieving this: stratified covariate balancing. Their approach is as follows.
- Identify Naturally Occurring Strata. Here the authors look at all 2-variable interactions. For instance, many patients in a dataset may have both hypertension and diabetes. Using SQL, the authors look at all covariate combinations where there is at least one observation of a person receiving the treatment and one observation of a person receiving the control. Each of these combinations is considered a stratum.
- Matched Estimation of the Effect. “Once natural strata have been organized, the estimation of impact follows statistical procedures known since the 1950s for analysis of stratified case–control design (Cochran 1950; Mantel and Haenszel 1959). The chi-square test for homogeneity is used to see whether, across strata, a common odds ratio exists.” One can calculate the matched effect for each stratum as the difference between the average outcome for people receiving the treatment and the average outcome for those not receiving it. The strata are then weighted by the number of people receiving the treatment in each stratum.
- Weighted Estimation of the Effect. Step 2 above assumes that each stratum contains the same number of treated individuals and controls. However, this is unlikely to be the case. To address this, observations are weighted so that the weighted numbers of people receiving the treatment and the control are the same within each stratum. In the typical case where fewer people receive the treatment than the control, the authors propose giving each person in the treatment arm of a stratum a weight of 1 and giving each person in the control arm a weight equal to the number of people receiving the treatment divided by the number of people receiving the control in that stratum.
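The three steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' SQL or R implementation. The data are hypothetical, the function name `stratified_effect` is invented here, and a stratum is simplified to mean the full combination of a patient's binary covariates.

```python
from collections import defaultdict

# Hypothetical toy data: (hypertension, diabetes, treated, outcome).
patients = [
    (1, 1, 1, 1.0), (1, 1, 1, 1.0),
    (1, 1, 0, 0.0), (1, 1, 0, 1.0), (1, 1, 0, 1.0), (1, 1, 0, 0.0),
    (1, 0, 1, 1.0),
    (1, 0, 0, 0.0), (1, 0, 0, 0.0),
    (0, 1, 1, 1.0),  # stratum with no control: dropped
    (0, 0, 0, 0.0),  # stratum with no treated patient: dropped
]

def stratified_effect(rows):
    """Return (overall effect, control weight per usable stratum)."""
    # Step 1: group outcomes by covariate combination and treatment arm,
    # keeping only strata with at least one treated and one control patient.
    strata = defaultdict(lambda: {1: [], 0: []})
    for *covariates, treated, outcome in rows:
        strata[tuple(covariates)][treated].append(outcome)
    usable = {key: arms for key, arms in strata.items() if arms[1] and arms[0]}
    if not usable:
        raise ValueError("no stratum contains both treated and control patients")

    total_treated = sum(len(arms[1]) for arms in usable.values())
    effect = 0.0
    control_weights = {}
    for key, arms in usable.items():
        n_treated, n_control = len(arms[1]), len(arms[0])
        # Step 3: treated patients get weight 1; each control gets
        # n_treated / n_control, so both arms carry equal weight in the stratum.
        control_weights[key] = n_treated / n_control
        # Step 2: within-stratum difference in average outcomes,
        # weighted by the stratum's share of treated patients.
        difference = sum(arms[1]) / n_treated - sum(arms[0]) / n_control
        effect += (n_treated / total_treated) * difference
    return effect, control_weights

effect, control_weights = stratified_effect(patients)
print("estimated effect:", round(effect, 4))
print("control weights by stratum:", control_weights)
```

With these weights, taking the weighted mean outcome of all retained treated patients minus the weighted mean outcome of all retained controls gives the same number as weighting each stratum's difference by its count of treated patients, which is why the sketch computes only the latter.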
The paper includes an empirical application of stratified covariate balancing and demonstrates that the method is able to balance both covariates and covariate interactions. You can try this out yourself, as the authors have made a StratifiedBalancing package available in R.
- Alemi, F., ElRafey, A. and Avramovic, I. (2016), Covariate Balancing through Naturally Occurring Strata. Health Serv Res. doi:10.1111/1475-6773.12628
- Cochran, William G. “The comparison of percentages in matched samples.” Biometrika 37, no. 3/4 (1950): 256-266.
- Mantel, Nathan, and William Haenszel. “Statistical aspects of the analysis of data from retrospective studies.” Journal of the National Cancer Institute 22, no. 4 (1959): 719-748.