
Limitations of CMS’s Hospital Star Ratings System

Is your hospital high quality? That depends on what quality means. Does it have low readmission rates? Low mortality rates? Does it follow clinical guidelines? Are its patients satisfied? Is it good at cardiology care? What about cancer treatment?

Combining all these dimensions of quality into a single measure is a complex task. The Centers for Medicare and Medicaid Services (CMS) set out to summarize hospital quality with a star rating system in which hospitals receive between 1 and 5 stars (5 being best). The results are presented on Hospital Compare. The star ratings are based on 64 metrics across 7 measure categories: mortality, safety of care, readmission, patient experience, effectiveness of care, timeliness of care, and efficient use of medical imaging.
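As a rough illustration of how many measures can be rolled into one number, here is a minimal Python sketch. It assumes each measure has already been standardized so that higher is better, takes a simple average of the measures within each group, and then takes a weighted average across the groups a hospital reports, re-normalizing the weights when a group is missing. The function names, scores, and equal group weights are hypothetical and are not CMS's published methodology or weights.

```python
from statistics import mean

def group_scores(measures: dict[str, list[float]]) -> dict[str, float]:
    """Simple average of standardized measure scores within each reported group."""
    return {group: mean(values) for group, values in measures.items() if values}

def summary_score(groups: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of group scores, re-normalizing over the groups reported."""
    reported = [g for g in groups if g in weights]
    if not reported:
        raise ValueError("hospital reports none of the weighted groups")
    total_weight = sum(weights[g] for g in reported)
    return sum(weights[g] * groups[g] for g in reported) / total_weight

# Hypothetical standardized scores for one hospital (higher is better).
hospital = {
    "mortality": [0.4, -0.2, 0.1],
    "safety_of_care": [1.0, 0.0],
    "readmission": [-0.5, -0.1],
    "patient_experience": [0.3],
}
# Hypothetical equal weights over four of the seven groups (not CMS's weights).
weights = {
    "mortality": 0.25,
    "safety_of_care": 0.25,
    "readmission": 0.25,
    "patient_experience": 0.25,
}

groups = group_scores(hospital)
print(round(summary_score(groups, weights), 3))  # 0.15
```

Re-normalizing the weights keeps a hospital that does not report a group from being penalized for the gap; whether that is desirable is part of the comparability question discussed below.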

What are the key challenges with Hospital Compare’s star ratings? A Viewpoint in JAMA by Bilimoria and Barnard (2021) lists five key limitations:

  • Comparison against non-comparable hospitals. The authors argue that comparing a small 20-bed critical access hospital to, say, a specialty orthopedic hospital is not a useful comparison. In theory, if both hospitals are in patients’ or physicians’ choice set of potential places to be hospitalized, they should be compared. In practice, however, the authors note that these hospitals often report different numbers and types of measures. This is sensible from a feasibility standpoint (you can only measure quality for the patients a hospital treats), but it means the resulting ratings may not be like-for-like. The authors also note that empirically, “the more measures a hospital reported, the less likely it was to receive 5 stars.” Because smaller hospitals have fewer reporting requirements, it is not clear whether smaller hospitals are providing better care or whether this pattern is a statistical artifact. CMS has since updated Hospital Compare so that only hospitals reporting a minimum number of measures are scored.
  • Complex methodology. The authors write that “the latent variable model weighting approach was complex, opaque, and resulted in skewed measure weights. One measure, the AHRQ PSI-90 complication composite, counted for up to 90% of performance in the safety of care measure group, with the 6 other measures in the group carrying much smaller weights.” This does not mean the approach was wrong. It could be that PSI-90 was the key differentiator across hospitals and that hospital quality was similar on the other measures. The k-means clustering used to sort summary scores into star categories also seems reasonable and may be useful for retrospective analysis. However, the authors make a fair point that quality metrics are only actionable if the people being evaluated can clearly understand them, and few hospital administrators are likely to be familiar with k-means clustering or latent variable models. To address this, Medicare revised the star rating methodology in 2020 so that the measures within each measure group are equally weighted. While this may not be statistically ideal, it is more transparent.
  • Relative vs. absolute scoring. Hospital Compare in essence ranks hospitals against one another, which means that only about 1 in 7 hospitals receives a 5-star rating. An alternative approach would be to measure quality on an absolute scale. The authors argue for an absolute scale, but implementing one in practice may be difficult. Absolute scores could result in a majority of hospitals receiving 5 stars, or a majority receiving 1 star. While this could be useful for quality measurement, it would not be as helpful for patient or physician decision-making, where rank ordering is more informative.
  • Are academic medical centers really worse? The authors note that small community hospitals still rank above many esteemed academic medical centers. A key question is whether this accurately represents quality of care, or whether there are unmeasured dimensions of quality on which academic medical centers would score better if those dimensions were measured. Since academic medical centers likely perform better at innovative procedures for which there are few quality metrics, the latter is probably true to some degree, but the extent is unclear.
  • Data quality. The star ratings are only as good as the underlying data, and the authors have critiques here as well: “Much of the data used by CMS for the star ratings do not undergo any type of meaningful audit. There are wide hospital-to-hospital variations in how the data are reported based on resources allocated to abstraction.” More auditing would help, but that effort is not costless. The authors also argue that, in addition to relying on feedback from Medicare’s Technical Expert Panel (TEP), CMS should subject the star rating methodology to more rigorous peer review by external experts.
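On the first limitation, a toy simulation shows how the number of reported measures alone could shift who lands in the top tier. In the sketch below, every hypothetical hospital has the same true quality, each measure adds independent noise, and a hospital's score is the average of its measures. Averages over fewer measures are noisier, so hospitals reporting few measures end up over-represented in the tails (top and bottom) of a pooled ranking. The measure counts, noise level, and 1-in-7 top-tier cutoff are assumptions for illustration; this is not the authors' analysis or CMS's scoring.

```python
import random
from statistics import mean

def share_in_top_tier(n_small=2000, n_large=2000, k_small=5, k_large=40,
                      top_fraction=1 / 7, seed=0):
    """Share of few-measure and many-measure hospitals landing in the pooled top tier."""
    rng = random.Random(seed)
    # Every hospital has true quality 0; each measure adds N(0, 1) noise.
    small = [mean(rng.gauss(0, 1) for _ in range(k_small)) for _ in range(n_small)]
    large = [mean(rng.gauss(0, 1) for _ in range(k_large)) for _ in range(n_large)]
    # Cutoff for the top `top_fraction` of the pooled distribution.
    pooled = sorted(small + large, reverse=True)
    cutoff = pooled[int(len(pooled) * top_fraction) - 1]
    top_small = sum(s >= cutoff for s in small) / n_small
    top_large = sum(s >= cutoff for s in large) / n_large
    return top_small, top_large

top_small, top_large = share_in_top_tier()
print(f"few measures: {top_small:.1%} in top tier; many measures: {top_large:.1%}")
```

With identical underlying quality, the few-measure group still captures most of the top tier, which is one mechanism consistent with the "statistical artifact" reading.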
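On the second limitation, the sketch below shows how much a skewed weight can matter. It plugs in a weight of 0.9 for PSI-90 (echoing the "up to 90%" figure), splits the remaining 0.1 evenly across six other safety measures, and compares that with a simple average. The two hospitals and their standardized scores are hypothetical, and the weights are supplied directly rather than estimated with a latent variable model.

```python
def group_score(scores: list[float], weights: list[float]) -> float:
    """Weighted average of standardized measure scores within one measure group."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# Hypothetical standardized scores; index 0 is PSI-90, indices 1-6 are other safety measures.
hospital_a = [1.0, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5]  # strong on PSI-90 only
hospital_b = [-0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]       # strong on the other six

skewed = [0.9] + [0.1 / 6] * 6  # PSI-90 carries 90% of the weight
equal = [1 / 7] * 7             # simple average

for label, w in [("skewed", skewed), ("equal", equal)]:
    a, b = group_score(hospital_a, w), group_score(hospital_b, w)
    print(f"{label}: A={a:.2f}, B={b:.2f}")
# skewed: A=0.85, B=-0.40
# equal: A=-0.29, B=0.36
```

The ranking flips. That is consistent with the point that skewed weights are not necessarily wrong (PSI-90 may genuinely differentiate hospitals) but do make results hard to explain to the hospitals being rated.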
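The k-means step mentioned under the second limitation can also be sketched. Below is plain one-dimensional k-means (Lloyd's algorithm) with k = 5, which maps hypothetical summary scores to 1 to 5 stars by centroid order. The quantile-based initialization, iteration cap, and data are assumptions for illustration; CMS's actual implementation details are not reproduced here.

```python
def kmeans_1d(values: list[float], k: int = 5, iters: int = 100) -> list[float]:
    """Lloyd's algorithm in one dimension; returns centroids sorted ascending."""
    ordered = sorted(values)
    # Initialize centroids at evenly spaced quantiles of the data (an assumption).
    centroids = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Recompute each centroid as its cluster mean; keep it if the cluster is empty.
        new = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return sorted(centroids)

def assign_stars(values: list[float], k: int = 5) -> list[int]:
    """Star rating = 1 + rank of the nearest centroid (lowest centroid gets 1 star)."""
    centroids = kmeans_1d(values, k)
    return [1 + min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]

# Hypothetical summary scores with five visible clusters.
scores = [-2.0, -2.1, -1.0, -1.1, 0.0, 0.1, 1.0, 1.1, 2.0, 2.1]
print(assign_stars(scores))  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```

Because cluster boundaries depend on where scores bunch together, the share of hospitals in each star category is not fixed in advance.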
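On the third limitation, the difference between relative and absolute scoring can be shown with two simple rules. The sketch below uses a quintile rule for relative scoring and fixed cutoffs for absolute scoring, then raises every hypothetical hospital's score by the same amount. Both rules and the thresholds are assumptions; the quintile rule is a simplification, since the CMS clustering approach is also relative but does not fix the share in each tier.

```python
from bisect import bisect_right

def relative_stars(scores: list[float]) -> list[int]:
    """Quintile rule: stars depend only on a hospital's rank among its peers."""
    ranked = sorted(scores)
    n = len(scores)
    # Zero-based rank (ties share the highest rank), mapped evenly onto 1..5.
    return [1 + (bisect_right(ranked, s) - 1) * 5 // n for s in scores]

def absolute_stars(scores: list[float],
                   thresholds: tuple[float, ...] = (0.2, 0.4, 0.6, 0.8)) -> list[int]:
    """Fixed cutoffs: stars depend only on the score itself."""
    return [1 + bisect_right(thresholds, s) for s in scores]

scores = [0.10, 0.30, 0.50, 0.70, 0.90]
improved = [s + 0.25 for s in scores]  # every hospital improves equally

print(relative_stars(scores), relative_stars(improved))  # [1, 2, 3, 4, 5] [1, 2, 3, 4, 5]
print(absolute_stars(scores), absolute_stars(improved))  # [1, 2, 3, 4, 5] [2, 3, 4, 5, 5]
```

Under the relative rule, uniform improvement changes nothing; under the absolute rule, stars rise across the board and ties at the top become more common, which is the trade-off between tracking quality over time and helping patients choose.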

In summary, measuring quality of care is difficult. There are clearly areas where CMS’s Hospital Compare star ratings could be improved, but any top-down, administrative approach to measuring quality of care is likely to have many limitations in practice.

