
Shrinkage Estimators and Composite Quality Scores

Shrinkage estimators (such as the James-Stein estimator) are well known in the economics literature and have a number of applications. A recent paper by Shwartz et al. (2012) demonstrates how one could apply shrinkage estimators to measure nursing home quality.

“A challenge when examining individual QIs across a range of facilities is that sample sizes are often small, and they vary across facilities… Rather than estimating the “true” proportion experiencing a QI event at a particular facility as the observed proportion at that facility, a simple shrinkage estimator estimates the “true” proportion at a facility as the weighted average of the observed proportion at the facility and the observed proportion at some larger set of facilities that include the particular facility. As a result, the estimate of the “true” proportion is “pulled” or “shrunken” toward the overall proportion in the larger set of facilities. The amount of shrinkage depends both on the sample size at the particular facility and the extent to which performance differs across facilities. [my highlights]”

The authors apply a Bayesian multivariate normal-binomial model to the data. The prior distribution is assumed to be multivariate normal, and the authors place a Wishart prior (i.e., a multivariate generalization of the chi-squared distribution) on its inverse covariance (precision) matrix. [For coding details, see content at the end of the post.]

Shrinkage generally produces an individual quality indicator (QI) estimate that falls between the nursing home’s observed rate and the overall population’s rate. If the sample size is larger, then the shrinkage estimate will be closer to the observed rate, since one is more confident that this is the true rate; if there is more variability across facilities (compared to within facilities), then the shrinkage estimator also will produce estimates closer to the observed rate. When the sample size is smaller or there is less variability across providers, then the shrinkage estimate is closer to the population average.
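To make the weighted-average idea concrete, here is a minimal Python sketch of a univariate shrinkage estimator. It is not the authors' multivariate model. It assumes a simple setup in which the amount of between-facility variability is summarized by a hand-picked pseudo-sample size, `prior_n`, a hypothetical parameter introduced only for this illustration. A small `prior_n` corresponds to facilities that differ a lot (less shrinkage). A large `prior_n` corresponds to facilities that look alike (more shrinkage).

```python
def shrink_rate(events, n, pop_rate, prior_n):
    """Shrink a facility's observed proportion toward the population rate.

    events   : number of residents with the QI event at the facility
    n        : number of residents eligible for the QI at the facility
    pop_rate : proportion with the event across the larger set of facilities
    prior_n  : assumed pseudo-sample size controlling how strongly we shrink
    """
    if n == 0:
        # No data at all: the best guess is the population rate.
        return pop_rate
    weight = n / (n + prior_n)   # weight on the facility's own data
    observed = events / n
    return weight * observed + (1 - weight) * pop_rate


# Same observed rate (20%), very different sample sizes.
small = shrink_rate(events=2, n=10, pop_rate=0.10, prior_n=40)
large = shrink_rate(events=40, n=200, pop_rate=0.10, prior_n=40)
print(round(small, 4), round(large, 4))
```

With these made-up numbers, the small facility's estimate is pulled most of the way toward the population rate of 10%, while the large facility's estimate stays close to its observed 20%.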

Typically, the shrinkage estimate for an individual quality indicator falls between the observed rate and the population average, but for the multivariate shrinkage estimator this need not always be the case. The authors describe one case (quality indicator #4) where this does not occur:

QI 4 illustrates ‘nontypical’ shrinkage—the shrunken estimate is not between the observed rate and the higher population rate but significantly lower than the observed rate. The reason for this is because of the nature of the variance/covariance matrix for the 28 QIs. Shrinkage depends not just on the population rate of a particular QI but on performance on other QIs with which the particular QI is correlated. QI 4 is highly correlated with QI 5 (0.64) and QI 6 (0.52) (numbers in parentheses are the correlation coefficients). The observed facility rate on both of these QIs is zero, well below the respective population rates. The low value of the shrunken estimate for QI 4 reflects these very low rates of correlated QIs.
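The mechanism behind this kind of nontypical shrinkage can be sketched with just two correlated indicators. The Python example below is a simplification, not the authors' model. It assumes a normal approximation on the raw rate scale (the paper uses a binomial likelihood with 28 QIs), and every number in it is invented for illustration. The posterior mean is computed as pop + Sigma (Sigma + V)^(-1) (obs - pop), where Sigma is the between-facility covariance and V is the diagonal sampling variance.

```python
def shrink_pair(obs, pop, sd, noise_var, rho):
    """Posterior means for two correlated QIs under a normal-normal model.

    obs       : observed facility rates (qi_a, qi_b)
    pop       : population rates for the two QIs
    sd        : between-facility standard deviations of the true rates
    noise_var : sampling variances of the observed rates at this facility
    rho       : correlation between the two true rates across facilities
    """
    s1sq, s2sq = sd[0] ** 2, sd[1] ** 2
    cov = rho * sd[0] * sd[1]
    # Entries of the 2x2 matrix (Sigma + V) and its determinant.
    a = s1sq + noise_var[0]
    d = s2sq + noise_var[1]
    det = a * d - cov * cov
    # z = (Sigma + V)^(-1) (obs - pop), using the closed-form 2x2 inverse.
    e1, e2 = obs[0] - pop[0], obs[1] - pop[1]
    z1 = (d * e1 - cov * e2) / det
    z2 = (a * e2 - cov * e1) / det
    return (pop[0] + s1sq * z1 + cov * z2,
            pop[1] + cov * z1 + s2sq * z2)


# Hypothetical facility: QI A observed at 10% (population 15%),
# correlated QI B observed at 0% (population 10%).
args = dict(obs=(0.10, 0.0), pop=(0.15, 0.10),
            sd=(0.05, 0.05), noise_var=(0.0025, 0.0004))
print(shrink_pair(rho=0.0, **args)[0])   # typical: between 0.10 and 0.15
print(shrink_pair(rho=0.64, **args)[0])  # nontypical: below the observed 0.10
```

With no correlation, the estimate for QI A lands between the observed and population rates. With a correlation of 0.64, the very low observed rate on QI B drags the estimate for QI A below its own observed rate, which is the pattern the authors describe.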

What happens to shrinkage estimators when a nursing home has missing data for a quality indicator? In this case, the shrinkage estimate typically assigns the nursing home the population rate. However, if the nursing home performs well (or poorly) on certain quality measures that are highly correlated with the measure with missing data, then the shrinkage estimate for that QI may not in fact be the population average.
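The missing-data case can be illustrated with the same kind of simplified two-indicator normal model (an assumption made for this sketch, not the authors' 28-dimensional binomial model). When a QI is unobserved, its estimate is the population rate plus an adjustment driven by the correlated QI that was observed. The function name and all inputs below are hypothetical.

```python
def estimate_missing_qi(pop_missing, sd_missing,
                        obs_other, pop_other, sd_other, noise_var_other,
                        rho):
    """Estimate an unobserved QI from one correlated, observed QI.

    Under a bivariate normal prior, the posterior mean of the missing QI is
    pop_missing + cov / (var_other + noise_var_other) * (obs_other - pop_other).
    """
    cov = rho * sd_missing * sd_other
    slope = cov / (sd_other ** 2 + noise_var_other)
    return pop_missing + slope * (obs_other - pop_other)


common = dict(pop_missing=0.15, sd_missing=0.05,
              obs_other=0.0, pop_other=0.10,
              sd_other=0.05, noise_var_other=0.0004)

# Uncorrelated measures: the facility simply gets the population rate.
print(estimate_missing_qi(rho=0.0, **common))
# Correlated measures: strong performance on the other QI lowers the estimate.
print(estimate_missing_qi(rho=0.64, **common))
```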


The authors find that the shrinkage estimator predicts quality scores in future years better than the observed composite quality scores do. The authors also propose that “In a pay-for-performance program, one might well want to increase payments to top-quintile facilities that have higher likelihoods of actually being in the top quintile and reduce penalties of bottom-quintile facilities that have smaller likelihoods of actually being in the bottom quintile.” Thus, nursing homes with smaller sample sizes would be less likely to qualify for the top or bottom P4P payment ranges, since their estimated quality is less reliable.
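One way to picture the pay-for-performance proposal is to scale a bonus by the probability that a facility truly belongs in the top quintile. The sketch below is only an illustration of that idea, not a rule taken from the paper. It assumes the posterior for a facility's composite rate is approximately normal, that lower rates are better, and that the top-quintile cutoff is known. The function names, the cutoff, and the bonus amount are all hypothetical.

```python
from statistics import NormalDist


def prob_top_quintile(post_mean, post_sd, cutoff):
    """Probability that the true rate is below the top-quintile cutoff,
    under an assumed normal posterior (lower rates = better quality)."""
    return NormalDist(post_mean, post_sd).cdf(cutoff)


def scaled_bonus(base_bonus, post_mean, post_sd, cutoff):
    """Hypothetical payment rule: bonus proportional to that probability."""
    return base_bonus * prob_top_quintile(post_mean, post_sd, cutoff)


# Two facilities with the same estimated rate but different precision.
cutoff = 0.10
large_facility = prob_top_quintile(post_mean=0.08, post_sd=0.01, cutoff=cutoff)
small_facility = prob_top_quintile(post_mean=0.08, post_sd=0.04, cutoff=cutoff)
print(round(large_facility, 3), round(small_facility, 3))
```

The facility with the wider posterior (typically the one with fewer residents) has a noticeably lower probability of truly being in the top quintile, so under this hypothetical rule it would receive a smaller share of the bonus.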


The authors apply four different weighting schemes when creating a composite quality score.

  • Facility-specific opportunity-based weights. This scheme weights each measure based on the number of ‘opportunities’ (i.e., the denominator of each quality measure) for a specific nursing home.
  • Population-driven opportunity-based weights. This scheme weights each measure based on the number of ‘opportunities’ (i.e., the denominator of each quality measure) across all nursing homes.
  • Equal weights. In this framework, each quality measure has equal weight regardless of the number of opportunities.
  • Population-derived numerator-based weights. In this scheme, each quality measure is weighted by the number of successful quality measure observations (i.e., the numerator of each quality measure) across all nursing homes.

The benefit of the first weighting scheme is that it best reflects the patients each provider treats. However, the relative importance of different quality measures will then vary across nursing homes. Population-driven opportunity-based weights, on the other hand, assign identical weights to each measure across all nursing homes, but at the cost that an individual nursing home may have little (or no) opportunity to perform on a heavily weighted measure. Equal weights ignore the frequency with which each quality measure is observed; however, since measures that appear more frequently are not necessarily more important than those with fewer observations, equal weighting may not be problematic in many cases. Weighting by the numerator makes little sense, as it would put the most weight on the measures on which nursing homes are already most successful; that is, on potentially topped-out measures.
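The four weighting schemes differ only in which counts are used as weights, which a short Python sketch makes explicit. The function names and the two-measure example data are hypothetical. In practice the per-measure rates fed into the composite could be the shrunken estimates rather than the observed rates.

```python
def composite(rates, weights):
    """Weighted average of per-measure rates; weights are normalized to sum to 1."""
    total = sum(weights)
    return sum(r * w / total for r, w in zip(rates, weights))


def composite_scores(fac_rates, fac_denoms, pop_denoms, pop_numers):
    """Composite score for one facility under each of the four schemes.

    fac_rates  : the facility's rate on each quality measure
    fac_denoms : the facility's opportunities (denominators) per measure
    pop_denoms : opportunities per measure summed over all nursing homes
    pop_numers : numerators per measure summed over all nursing homes
    """
    k = len(fac_rates)
    return {
        "facility_opportunity": composite(fac_rates, fac_denoms),
        "population_opportunity": composite(fac_rates, pop_denoms),
        "equal": composite(fac_rates, [1] * k),
        "population_numerator": composite(fac_rates, pop_numers),
    }


# Made-up facility with two measures; most of its residents are only
# eligible for the first measure.
scores = composite_scores(fac_rates=[0.10, 0.20],
                          fac_denoms=[90, 10],
                          pop_denoms=[500, 500],
                          pop_numers=[50, 150])
for scheme, value in scores.items():
    print(scheme, round(value, 3))
```

In this toy example the facility-specific weights emphasize the measure most of the facility's residents are eligible for, while the numerator-based weights tilt the composite toward whichever measure has the most recorded events in the population.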


Author WinBUGS code

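model {
  for (j in 1:Nf) {
    # Facility j's 28 QI parameters are drawn from a multivariate normal prior
    # with mean gamma and precision (inverse covariance) matrix T
    p1[j, 1:28] ~ dmnorm(gamma[1:28], T[1:28, 1:28])

    for (i in 1:28) {
      # Assumption: the listing as posted omits the line linking p1 to p;
      # a logit link is the most likely intent
      logit(p[j, i]) <- p1[j, i]
      # Observed event count Y out of n opportunities
      Y[j, i] ~ dbin(p[j, i], n[j, i])
    }
  }

  # Hyper-priors:
  gamma[1:28] ~ dmnorm(mn[1:28], prec[1:28, 1:28])
  T[1:28, 1:28] ~ dwish(R[1:28, 1:28], 28)
}

# mn is a 28-dimensional vector of 0s
# prec is a 28 by 28 matrix with .0001 along the diagonal and 0s elsewhere
# R is a 28 by 28 matrix with .01 along the diagonal and .005 elsewhere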
