How to implement propensity score matching

What options are available for propensity score matching algorithms?  Baser (2006) describes a number of popular options.

  • Stratified Matching.  In this method, the range of variation of the propensity score is divided into intervals such that within each interval, treated and control units have, on average, the same propensity score. Differences in outcome measures between the treatment and control group in each interval are then calculated. The average treatment effect is thus obtained as an average of outcome measure differences per block, weighted by the distribution of treated units across the blocks. With five classes, 95% of bias is removed.
  • Nearest Neighbor and 2 to 1 Matching.  This method randomly orders the treatment and control patients, then selects the first treated patient and finds the one control (or two controls, for 2 to 1 matching) with the closest propensity score. The nearest neighbor technique faces the risk of imprecise matches if the closest neighbor is numerically distant.
  • Radius Matching.  With radius matching, each treated unit is matched only with the control units whose propensity scores fall within a predefined neighborhood of the treated unit's propensity score. The benefit of this approach is that it uses only as many comparison units as are available within the predefined radius, thereby allowing for use of extra units when good matches are available and fewer units when they are not. One possible drawback is the difficulty of knowing a priori what radius is reasonable.
  • Kernel Matching.  All treated units are matched with a weighted average of all controls, with weights inversely proportional to the distance between the propensity scores of the treated and control units. Because all control units contribute to the weights, lower variance is achieved. Nevertheless, two decisions need to be made: the type of kernel function and the bandwidth parameter.
  • Mahalanobis Metric Matching. This method randomly orders subjects and then calculates the distance between the first treated subject and all controls, where the distance is $d(i,j) = (u - v)^{T} C^{-1} (u - v)$, u and v are the values of the matching variables (including the propensity score) for treated subject i and control subject j, and C is the sample covariance matrix of the matching variables from the full set of control subjects.  I describe the logic of calculating Mahalanobis distance in a previous post.
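
As a rough illustration of how the three score-based rules above differ, the sketch below implements nearest neighbor, radius, and kernel matching on a handful of made-up propensity scores. The function names, the toy scores, the 0.05 radius, and the 0.1 bandwidth are all assumptions chosen for illustration, not values from Baser (2006). The nearest neighbor routine assumes greedy matching without replacement, which is only one reading of the description above.

```python
import math
import random

def nearest_neighbor_match(treated, controls, k=1, seed=0):
    """Greedy k-to-1 matching without replacement, visiting treated units in random order."""
    order = list(range(len(treated)))
    random.Random(seed).shuffle(order)
    available = set(range(len(controls)))
    matches = {}
    for i in order:
        # Rank the still-unmatched controls by score distance; ties go to the lower index.
        ranked = sorted(available, key=lambda j: (abs(controls[j] - treated[i]), j))
        matches[i] = ranked[:k]
        available -= set(matches[i])
    return matches

def radius_match(treated, controls, radius):
    """For each treated unit, keep every control whose score lies within `radius`."""
    return {i: [j for j, q in enumerate(controls) if abs(q - p) <= radius]
            for i, p in enumerate(treated)}

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def epanechnikov_kernel(u):
    # Compact support: zero weight beyond one bandwidth.
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0

def kernel_weights(p, controls, bandwidth, kernel):
    """Normalized weights on the controls for a treated unit with score p."""
    raw = [kernel((q - p) / bandwidth) for q in controls]
    total = sum(raw)
    if total == 0:
        return raw  # no control falls inside the kernel's support
    return [w / total for w in raw]

# Hypothetical propensity scores, for illustration only.
treated = [0.30, 0.62, 0.80]
controls = [0.10, 0.28, 0.33, 0.60, 0.65, 0.90]

print(nearest_neighbor_match(treated, controls))
print(radius_match(treated, controls, radius=0.05))
print(kernel_weights(treated[0], controls, 0.1, gaussian_kernel))
print(kernel_weights(treated[0], controls, 0.1, epanechnikov_kernel))
```

On these toy scores the third treated unit has no control within the radius, so radius matching leaves it unmatched while nearest neighbor matching still pairs it with a control 0.10 away. The Gaussian kernel gives every control a positive weight, whereas the Epanechnikov kernel gives zero weight to controls more than one bandwidth from the treated unit.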

Which one of these methods should a researcher choose?

Baser recommends five criteria for determining a preferred matching algorithm.  The logic of these criteria is that, after matching, the distributions of covariates in the treatment and control groups should be similar.

  • Criterion 1. Calculate two-sample t-statistics for continuous variables and chi-square tests for categorical variables, comparing the treatment group and the control group on each explanatory variable.
  • Criterion 2. Calculate the mean difference as a percentage of the average standard deviation: $100\,(X_T - X_C) / [\tfrac{1}{2}(S_{XT} + S_{XC})]$, where $X_T$ and $X_C$ are the means of a covariate in the treatment and control groups, and $S_{XT}$ and $S_{XC}$ are the standard deviations of that covariate in the treatment and control groups, respectively.
  • Criterion 3. Calculate the percent reduction in bias in the means of the explanatory variables after matching (A) relative to before matching (I): $\{(X_{AT} - X_{AC}) - (X_{IT} - X_{IC})\} / (X_{IT} - X_{IC})$, where $X_{IT}$ and $X_{IC}$ are the means of a covariate in the treatment and control groups, respectively, before matching, and $X_{AT}$ and $X_{AC}$ are the corresponding means after matching.
  • Criterion 4. Use the Kolmogorov–Smirnov test to compare the treatment and control density estimates for explanatory variables.
  • Criterion 5.  Use the Kolmogorov–Smirnov test to compare the density estimates of the propensity scores of control units with those of the treated units.
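
To make the balance checks concrete, here is a minimal sketch of the t-statistic, the standardized difference, the relative change in the mean difference, and the Kolmogorov–Smirnov statistic for a single covariate. It uses only the Python standard library. The function names and the toy age data are hypothetical, and the Welch (unequal variance) form of the t-statistic is my assumption, since the criteria do not say which form to use. The last function returns only the KS statistic, not a p-value.

```python
import bisect
import math
import statistics

def welch_t(x_t, x_c):
    """Two-sample t-statistic, Welch form (an assumption; a pooled form is also common)."""
    se = math.sqrt(statistics.variance(x_t) / len(x_t)
                   + statistics.variance(x_c) / len(x_c))
    return (statistics.mean(x_t) - statistics.mean(x_c)) / se

def standardized_difference(x_t, x_c):
    """Mean difference as a percentage of the average standard deviation."""
    avg_sd = 0.5 * (statistics.stdev(x_t) + statistics.stdev(x_c))
    return 100.0 * (statistics.mean(x_t) - statistics.mean(x_c)) / avg_sd

def relative_bias_change(before_t, before_c, after_t, after_c):
    """(after difference - before difference) / before difference, for the group means."""
    before = statistics.mean(before_t) - statistics.mean(before_c)
    after = statistics.mean(after_t) - statistics.mean(after_c)
    return (after - before) / before

def ks_statistic(a, b):
    """Largest vertical gap between the two empirical distribution functions."""
    a_sorted, b_sorted = sorted(a), sorted(b)

    def ecdf(values, x):
        return bisect.bisect_right(values, x) / len(values)

    return max(abs(ecdf(a_sorted, x) - ecdf(b_sorted, x))
               for x in a_sorted + b_sorted)

# Hypothetical ages, for illustration only.
age_treated = [60, 65, 70, 75]
age_control_before = [40, 45, 50, 55, 60, 65]   # all controls
age_control_after = [55, 60, 65, 65]            # controls kept by some matching rule

print(welch_t(age_treated, age_control_before))
print(standardized_difference(age_treated, age_control_before))
print(standardized_difference(age_treated, age_control_after))
print(relative_bias_change(age_treated, age_control_before,
                           age_treated, age_control_after))
print(ks_statistic(age_treated, age_control_after))
```

A negative value from relative_bias_change means the gap in means shrank after matching. In practice these checks would be run for every covariate and for the propensity score itself, and a statistics package would supply the p-values.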



1 Comment

  1. The statement about kernel estimation “all control units contribute to the weights” only holds true for some kernel functions. For example, although it is true for the normal kernel, it is not true for the triangular, uniform, or Epanechnikov kernels. The Epanechnikov kernel is the default in many statistical packages. The basic idea that higher weight is given to closer data points remains true.
