In randomized controlled trials (RCTs), participants are randomized to different groups, and each group receives a unique intervention (or control). Randomization ensures that, in expectation, any differences in the outcomes of interest are due to the interventions under investigation rather than to pre-existing differences between the groups. While RCTs are useful, they are expensive to run, are highly controlled, and suffer from their own biases (e.g., attrition bias).
Using real-world data is an alternative to randomized controlled trials for measuring the effect of an intervention. A statistical analysis of the effect of an intervention on patient outcomes must demonstrate that the assignment of the intervention is unrelated to outcomes. In an RCT, this ignorability of treatment assignment follows from randomization. With real-world data, techniques such as difference in differences, instrumental variables analysis, regression discontinuity, and propensity score matching are often used to eliminate or minimize endogeneity bias in treatment assignment. However, not all methods are the same.
Consider the case of propensity score matching. An article by Fullerton et al. (2016) examines which approaches to propensity score matching work best. They consider three dimensions of propensity score matching:
- Number of variables included in the match. One approach is to include all covariates in the match, whereas other approaches match only on selected variables likely to influence the outcome of interest. One could select these variables based on expert opinion or previous literature. The Fullerton paper uses a backward selection approach in which they run a logistic regression of receipt of treatment on patient characteristics. Characteristics with a p-value below 0.10 are retained in the match.
- Functional form. Most often, the propensity score is the probability of receiving treatment conditional on patient characteristics, estimated with a logistic regression. Logistic regression is widely used because of its simplicity, but all interactions and higher-order terms have to be specified explicitly, which is rarely done when the number of covariates is large. Fullerton and co-authors also use a generalized boosted regression (GBR), a multivariate nonparametric regression technique that can flexibly capture nonlinear relationships between the propensity score and a large number of covariates.
- Number of matches. In the simplest case, each individual receiving the treatment of interest is matched to one and only one individual based on proximity of propensity score. However, each “treated” individual could also be matched to multiple individuals. Matching to more individuals increases the bias, since the additional matches by definition have a larger difference in propensity score, but it also provides more power.
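The three choices above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the data are simulated, the propensity model is a main-effects logistic regression fitted by plain gradient ascent, and matching is nearest-neighbor with replacement and no caliper. The function names (`fit_logistic`, `match_nearest`, and so on) are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit a main-effects logistic regression by batch gradient ascent."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

def propensity(w, xi):
    """Estimated probability of treatment given covariates xi."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

def match_nearest(treated_ps, control_ps, k=1):
    """For each treated unit, indices of the k closest controls (with replacement)."""
    matches = []
    for pt in treated_ps:
        ranked = sorted(range(len(control_ps)), key=lambda j: abs(control_ps[j] - pt))
        matches.append(ranked[:k])
    return matches

def mean_match_distance(treated_ps, control_ps, matches):
    """Average absolute propensity score gap over all matched pairs."""
    dists = [abs(control_ps[j] - pt) for pt, ms in zip(treated_ps, matches) for j in ms]
    return sum(dists) / len(dists)

# Simulated (hypothetical) data: one continuous and one binary covariate.
random.seed(0)
X = [[random.gauss(0, 1), float(random.random() < 0.5)] for _ in range(300)]
y = [int(random.random() < sigmoid(-0.5 + 0.8 * x1 + 0.5 * x2)) for x1, x2 in X]

w = fit_logistic(X, y)
ps = [propensity(w, xi) for xi in X]
treated_ps = [p for p, t in zip(ps, y) if t == 1]
control_ps = [p for p, t in zip(ps, y) if t == 0]

one_to_one = match_nearest(treated_ps, control_ps, k=1)
one_to_three = match_nearest(treated_ps, control_ps, k=3)
d1 = mean_match_distance(treated_ps, control_ps, one_to_one)
d3 = mean_match_distance(treated_ps, control_ps, one_to_three)
print(f"mean propensity gap, 1:1 = {d1:.4f}; 1:3 = {d3:.4f}")
```

The average gap for 1:3 matching can never be smaller than for 1:1 matching, because the second and third matches are at least as far away as the first. That is the bias side of the trade-off; the gain is a larger matched control sample.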
Further, there are other types of matching that could be considered.
- ESLID Matching. In this approach, patient characteristics are categorized into mandatory and optional. Patients must match on the mandatory characteristics and then among those matched, an individual (or individuals) is selected based on closest match on the optional characteristics.
- Exact matching. Treated individuals are matched to all controls with exactly the same values for the covariates of interest. This creates a good match, but power decreases because individuals without an exact counterpart are dropped from the sample.
- Coarsened exact matching (CEM). In this specification, “the number of matching dimensions is reduced by creating intervals for continuous variables and possibly redefining categorical variables.” For instance, one could match on age deciles, so that matched individuals need not have exactly the same age but would have a similar age relative to their peers.
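The difference between exact matching and coarsened exact matching can be sketched on a toy example. The patient records and the helper `cem_strata` below are hypothetical, and for simplicity the sketch uses fixed-width age bands rather than the data-driven deciles mentioned above. Setting the band width to one year reproduces exact matching on age.

```python
from collections import defaultdict

def coarsen(patient, age_width):
    """Map a patient to a stratum key: age band plus exact sex."""
    return (patient["age"] // age_width, patient["sex"])

def cem_strata(patients, age_width=10):
    """Group patients into strata; keep strata with both treated and controls."""
    strata = defaultdict(lambda: {"treated": [], "control": []})
    for p in patients:
        group = "treated" if p["treated"] else "control"
        strata[coarsen(p, age_width)][group].append(p["id"])
    return {k: v for k, v in strata.items() if v["treated"] and v["control"]}

# Hypothetical patients, for illustration only.
patients = [
    {"id": 1, "age": 52, "sex": "F", "treated": True},
    {"id": 2, "age": 57, "sex": "F", "treated": False},
    {"id": 3, "age": 52, "sex": "F", "treated": False},
    {"id": 4, "age": 64, "sex": "M", "treated": True},
    {"id": 5, "age": 71, "sex": "M", "treated": False},
    {"id": 6, "age": 66, "sex": "M", "treated": False},
]

exact = cem_strata(patients, age_width=1)    # exact match on age in years
coarse = cem_strata(patients, age_width=10)  # 10-year age bands

print("exact: ", exact)
print("coarse:", coarse)
```

With exact matching, treated patient 4 has no control of the same age and sex and is dropped. With 10-year bands, patient 4 is matched to patient 6, and patient 1 gains a second control.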
Applying these approaches to data from Germany on disease management programs for type 2 diabetes (DMPDM2), the authors found the following:
Exact matching methods performed well across all measures of balance, but resulted in the exclusion of many observations, leading to a change of the baseline characteristics of the study sample and also the effect estimate of the DMPDM2. All PS-based methods showed similar effect estimates. Applying a higher matching ratio and using a larger variable set generally resulted in better balance. Using a generalized boosted instead of a logistic regression model showed slightly better performance for balance diagnostics taking into account imbalances at higher moments.
Nevertheless, the authors wisely recommend applying multiple matching techniques to ensure the results are robust to matching specification choice.
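To compare matching specifications, one needs some measure of covariate balance. A widely used one is the standardized mean difference; whether it is among the exact measures used in the paper is not stated here, so the sketch below is only an illustration. The ages and the function name `standardized_mean_difference` are hypothetical.

```python
import math
from statistics import mean, variance

def standardized_mean_difference(treated, control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd

# Hypothetical ages, for illustration only.
treated_age = [60, 62, 65, 70]
all_control_age = [40, 45, 50, 55, 61, 64, 66, 69]
matched_control_age = [61, 64, 66, 69]  # controls retained after matching

smd_before = standardized_mean_difference(treated_age, all_control_age)
smd_after = standardized_mean_difference(treated_age, matched_control_age)
print(f"SMD before matching: {smd_before:.2f}, after matching: {smd_after:.2f}")
```

Running the same diagnostic for each covariate under each matching specification gives a simple way to check whether the conclusions are robust to the choice of method.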
- Fullerton, Birgit, Boris Pöhlmann, Robert Krohn, John L. Adams, Ferdinand M. Gerlach, and Antje Erler. “The Comparison of Matching Methods Using Different Measures of Balance: Benefits and Risks Exemplified within a Study to Evaluate the Effects of German Disease Management Programs on Long‐Term Outcomes of Patients with Type 2 Diabetes.” Health Services Research (2016).