Unmeasured Outcomes in Quality Reports

Pay-for-performance (P4P) has long been touted as a means to improve quality.  However, since the Holmstrom and Milgrom (1991) paper on multitasking, it has been known that compensating individuals on one measured dimension can compel them to substitute effort away from unmeasured dimensions.  For instance, if a mortgage broker is compensated only for the number of new mortgages he secures and not the creditworthiness of the borrower, it is likely that he will bring in borrowers with bad credit.  In the healthcare setting, compensating doctors to do certain tests (e.g., testing A1C levels) may increase the probability that the doctor conducts the A1C test for diabetics, but may decrease the amount of time the physician dedicates to counseling the patient to lose weight or stop smoking.

A paper by Glazer, McGuire and Normand (2008) tries to remedy this problem.  Take a look at the following Table.  We see that discharges one and two are observable and can be measured, while discharge three cannot be measured.  For instance, discharges one and two could represent patient mortality with respect to different types of cardiac operations.  By contrast, “Discharge Three can be thought of as representing medical discharges associated with Skin, Subcutaneous Tissue, and Breast Disorders, for which in-hospital mortality is very low, that mortality would not be a feasible (or even valid) measure of quality.”

How should the overall quality score weight outcomes one and two?  The authors of this paper claim that more weight should be placed on outcome one.  Why?

Discharges one and three have the same inputs.  Thus, putting more weight on discharge one will compel the hospital to increase the inputs that produce better outcomes for discharge one.  Since discharges one and three share inputs, this will also improve quality for discharge three.  For instance, an increase in nursing staff or computerized records may increase productivity for multiple observed and unobserved outcomes.  On the other hand, if discharge two depends on the purchase of a machine that tests for only one condition, less weight should be placed on discharge two since there are fewer spillovers.
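To make the spillover logic concrete, here is a minimal toy sketch.  It is not the model from the Glazer, McGuire and Normand paper; the square-root production function, the reward of 10, the unit cost of 1, and all function names are assumptions chosen only for illustration.  The hospital picks a shared input (which drives discharges one and three) and a specific input (which drives only discharge two) to maximize its weighted reward net of input costs.

```python
import math


def optimal_input(weight, reward=10.0, unit_cost=1.0):
    """Input level x that maximizes weight * reward * sqrt(x) - unit_cost * x.

    Setting the derivative to zero gives x = (weight * reward / (2 * unit_cost)) ** 2.
    The square-root production function is an illustrative assumption.
    """
    return (weight * reward / (2.0 * unit_cost)) ** 2


def hospital_response(w1, reward=10.0, unit_cost=1.0):
    """Quality levels chosen by a hypothetical hospital given weight w1 on discharge one."""
    w2 = 1.0 - w1
    shared = optimal_input(w1, reward, unit_cost)    # e.g. nursing staff, records
    specific = optimal_input(w2, reward, unit_cost)  # e.g. a single-purpose machine
    return {
        "q1": math.sqrt(shared),             # measured, uses the shared input
        "q2": math.sqrt(specific),           # measured, uses the specific input
        "q3_unmeasured": math.sqrt(shared),  # unmeasured, rides on the shared input
    }


equal = hospital_response(0.5)   # equal weights on discharges one and two
tilted = hospital_response(0.8)  # more weight on discharge one

print("equal weights :", equal)
print("tilted weights:", tilted)
```

In this toy setup, shifting weight toward discharge one raises the unmeasured quality of discharge three, at the price of lower measured quality for discharge two.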

A necessary condition for this type of measurement to work is that every input must be used in at least one of the observable discharge types.  Further complications arise from the fact that not all providers use the same inputs to treat patients with the same disease.  Also, “…the existing evidence supporting commonality is too general to be usable yet as a basis for modifying profile construction.”
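The necessary condition can be read as a simple coverage check.  The sketch below is only an illustration; the discharge labels and input names are hypothetical and do not come from the paper.

```python
def uncovered_inputs(inputs_by_discharge, measured):
    """Return the inputs that no measured discharge type uses.

    An empty result means the coverage condition holds for this example.
    """
    all_inputs = set().union(*inputs_by_discharge.values())
    measured_inputs = set().union(*(inputs_by_discharge[d] for d in measured))
    return all_inputs - measured_inputs


# Hypothetical mapping from discharge type to the inputs it relies on.
inputs_by_discharge = {
    "discharge_1": {"nursing_staff", "computerized_records"},
    "discharge_2": {"single_purpose_machine"},
    "discharge_3": {"nursing_staff", "computerized_records"},  # unmeasured
}
measured = ["discharge_1", "discharge_2"]

covered_case = uncovered_inputs(inputs_by_discharge, measured)

# Add an unmeasured discharge that relies on an input no measured discharge uses.
inputs_by_discharge["discharge_4"] = {"specialty_clinic"}
gap_case = uncovered_inputs(inputs_by_discharge, measured)

print(covered_case)
print(gap_case)
```

When the result is non-empty, no choice of weights on the measured discharges can reward investment in the uncovered input.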

Nevertheless, thinking about how quality improvements can spill over to other treatments is an important framework to have when policymakers are creating P4P metrics.


  1. You and the cited articles touch on an important issue in trying to measure quality. What is missing is any consensus on the right way to do this, and the deficiencies of current measures are seen in the continuing problems with medical care experienced by doctors and patients alike.

    We ought to measure quality based on a single question: did each patient get the right diagnosis and the right treatment?

    Getting the right answer to this question requires that doctors spend plenty of time with each patient, think about their problems, and order the tests they consider appropriate. The trouble is that the health care system does not value these things, and so there is understandably not much interest in using quality metrics that reveal these contradictions.

    It is inescapable that we come to grips with this measurement problem. There is interesting work being done by Patrick Crosskerry and others that highlights these significant quality problems that lie at the core of our healthcare quality dilemma.

    Evan Falchuk
