Medicare aims to tie 90% of reimbursement to quality measures. The potential for quality-linked reimbursement to incentivize improved quality of care, however, depends critically on whether physician quality can be measured reliably. Profiling individual physicians is difficult. Sample sizes are small, and attributing patients to a single physician can be difficult (as Mehrotra et al. 2010 note), particularly for sick patients who visit multiple physicians every month or even every week. Further, using data from commercial plans, several authors have found low reliability for measures of some aspects of physician quality (Hofer et al. 1999; Scholle et al. 2008; Sequist et al. 2011; Smith et al. 2013) and cost profiles (Hofer et al. 1999; Adams et al. 2010). The reliability of quality measures for hospitals has also been called into question (Thompson et al. 2016).
A recent paper by Adams and Paddock (2016) examines quality-of-care measures for primary care physicians serving Medicare fee-for-service patients in New York or Florida who had at least one attributed quality measure. Because quality measures are often calculated as the share of patients who reach some benchmark, and thus generally fall between 0 and 1, the authors wisely apply a beta-binomial model to estimate the reliability of HEDIS quality metrics. [“Applying the standard HLM to reliability calculations in this context would require an assumption of asymptotic normality of physician-level score estimates, which might not be tenable for physician profiling when the number of opportunities for physicians to pass quality indicators is small.”] Under this approach, the number of a physician’s patients who meet the quality metric is modeled as binomial, conditional on the physician’s true quality (i.e., the probability that an individual patient meets the metric); that true quality is in turn modeled with a beta distribution.
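As a sketch of this beta-binomial setup, one could simulate physician pass rates as below. The prior parameters (alpha = 8, beta = 2, so mean quality is 0.8) and the panel size of 30 patients per physician are purely illustrative choices, not values fitted by Adams and Paddock (2016):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative beta-binomial simulation; alpha, beta, and n_patients are
# hypothetical choices, not estimates from Adams and Paddock (2016).
alpha, beta = 8.0, 2.0            # beta prior on true quality, mean = 0.8
n_physicians, n_patients = 1_000, 30

# Each physician's true quality: probability a given patient meets the metric.
true_quality = rng.beta(alpha, beta, size=n_physicians)

# Observed passes are binomial, conditional on each physician's true quality.
passes = rng.binomial(n_patients, true_quality)
observed_rate = passes / n_patients
```

The observed pass rate is a noisy version of true quality; with only 30 patients per physician, the binomial noise is substantial relative to the spread of true quality across physicians.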
Reliability is calculated as:

Reliability = σp² / (σp² + p(1 − p)/n)

where σp² is the variance in mean quality across providers and p(1 − p)/n is the within-provider variance of the estimated quality p.
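In code, this signal-to-noise ratio is straightforward. Here σp² is derived from the beta prior's variance formula; the prior parameters alpha = 8 and beta = 2 are illustrative assumptions, not the paper's fitted values:

```python
def reliability(sigma2_p: float, p: float, n: int) -> float:
    """Between-provider variance over total variance (signal-to-noise)."""
    within = p * (1.0 - p) / n            # binomial sampling variance of the estimate
    return sigma2_p / (sigma2_p + within)

# Illustrative beta prior (hypothetical, not the paper's estimates).
alpha, beta = 8.0, 2.0
p_bar = alpha / (alpha + beta)            # mean quality across providers
sigma2_p = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
```

As the formula implies, reliability rises toward 1 as the number of patients per physician grows, since the within-provider variance shrinks while the between-provider variance stays fixed.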
Using this reliability measure, the authors examine three scoring systems: (i) classifying an individual physician as above or below the mean, (ii) classifying provider networks based on whether they fall above or below the 75th percentile, and (iii) classifying providers who meet inclusion criteria based on whether they fall below the 25th percentile of quality. The probability of misclassification is calculated in each case for each individual provider, and total misclassification probabilities are simply the sum of these probabilities across providers.
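One way to sketch the per-provider misclassification probability for a scoring system like (i) is via posterior draws from the beta-binomial model. The prior parameters, panel size, and the choice of the prior mean as the cutoff are all hypothetical here, used only to illustrate the mechanics:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical prior and panel size, not the paper's fitted values.
ALPHA, BETA, N = 8.0, 2.0, 30
CUTOFF = ALPHA / (ALPHA + BETA)   # classify relative to the prior mean quality

def misclassification_prob(x: int, n: int = N, draws: int = 50_000) -> float:
    """Posterior probability that a provider's true quality lies on the
    opposite side of the cutoff from the provider's point classification."""
    # Beta prior + binomial likelihood gives a Beta(ALPHA + x, BETA + n - x) posterior.
    post = rng.beta(ALPHA + x, BETA + n - x, size=draws)
    p_below = (post < CUTOFF).mean()
    # Classify 'above' when the posterior mean exceeds the cutoff.
    classified_above = (ALPHA + x) / (ALPHA + BETA + n) > CUTOFF
    return p_below if classified_above else 1.0 - p_below
```

Summing (or averaging) these per-provider probabilities then yields the total misclassification figures of the kind the authors report. A provider whose observed rate sits exactly at the cutoff has a misclassification probability near one half, while a provider who passes every patient is misclassified with very low probability.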
The authors find that:
In the three scoring systems, misclassification ranges were 8.6–25.7 percent, 6.4–22.8 percent, and 4.5–21.7 percent [across all measures considered]. True positive rate ranges were 72.9–97.0 percent, 83.4–100.0 percent, and 34.7–88.2 percent. True negative rate ranges were 68.5–91.6 percent, 10.5–92.4 percent, and 81.1–99.9 percent. Positive predictive value ranges were 70.5–91.6 percent, 77.0–97.3 percent, and 55.2–99.1 percent.
Of particular interest is that measures of reliability and misclassification are not the same.
median physician reliabilities of greater than 0.90 (e.g., glaucoma screening in NY, minimum denominator size of 100) can produce misclassification rates of more than 10 percent, while those greater than 0.70 can still result in misclassification rates of 20 percent or more.
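A quick simulation illustrates why the two can diverge. Under an illustrative Beta(8, 2) quality distribution with 100 patients per physician (assumptions for this sketch, not the paper's estimates), reliability works out to roughly 0.90, yet a meaningful share of physicians near the cutoff is still misclassified by an above/below-the-mean rule:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative parameters, not the paper's fitted values.
alpha, beta, n = 8.0, 2.0, 100
p_bar = alpha / (alpha + beta)                            # mean quality, 0.8
sigma2_p = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
rel = sigma2_p / (sigma2_p + p_bar * (1 - p_bar) / n)     # roughly 0.90

# Simulate a large population and classify above/below the mean.
true_q = rng.beta(alpha, beta, size=100_000)
obs = rng.binomial(n, true_q) / n
misclassified = (true_q > p_bar) != (obs > p_bar)
misclass_rate = misclassified.mean()
```

Reliability is an average signal-to-noise summary over all physicians, while misclassification depends heavily on how much of the quality distribution sits close to the cutoff, so a high average reliability does not rule out double-digit misclassification rates.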
This information should be helpful for determining the utility of quality metrics for both ranking physicians and tying physician reimbursement to quality metrics.
Adams, J. L., A. Mehrotra, J. W. Thomas, and E. A. McGlynn. 2010. “Physician Cost Profiling—Reliability and Risk of Misclassification.” The New England Journal of Medicine 362 (11): 1014–21.
Hofer, T. P., R. A. Hayward, S. Greenfield, E. H. Wagner, S. H. Kaplan, and W. G. Manning. 1999. “The Unreliability of Individual Physician “Report Cards” for Assessing the Costs and Quality of Care of a Chronic Disease.” Journal of the American Medical Association 281 (22): 2098–105.
Mehrotra, A., J. L. Adams, J.W. Thomas, and E. A. McGlynn. 2010. “The Effect of Different Attribution Rules on Individual Physician Cost Profiles.” Annals of Internal Medicine 152 (10): 649–54.
Scholle, S. H., J. Roski, J. L. Adams, D. L. Dunn, E. A. Kerr, D. P. Dugan, and R. E. Jensen. 2008. “Benchmarking Physician Performance: Reliability of Individual and Composite Measures.” The American Journal of Managed Care 14 (12): 833–8.
Sequist, T. D., E. C. Schneider, A. Li,W. H. Rogers, and D. G. Safran. 2011. “Reliability of Medical Group and Physician Performance Measurement in the Primary Care Setting.” Medical Care 49 (2): 126–31.
Smith, K. A., J. B. Sussman, S. J. Bernstein, and R. A. Hayward. 2013. “Improving the Reliability of Physician “Report Cards”.” Medical Care 51 (3): 266–74.
Thompson, M. P., C. M. Kaplan, Y. Cao, G. J. Bazzoli, and T. M. Waters. 2016. “Reliability of 30-Day Readmission Measures Used in the Hospital Readmission Reduction Program.” Health Services Research.