Is an average always the best estimate? Let us say that we are evaluating physician quality. Does a physician’s average score across patients (or episodes of care) best represent their true quality level? Stein’s paradox says that when we are estimating the true quality of a number of doctors at once, we can do better, in terms of total squared error, than using each doctor’s own average. To do this, we can use a shrinkage methodology.
Shrinkage methodologies in essence estimate the physician’s true quality score as a weighted average of the individual physician’s average score and the average score for all physicians. The intuition behind this is that if we observe a high-scoring doctor, there is some probability that they are actually a high quality doctor and some probability that they are an average doctor who just happened to perform above average when treating the patients in the sample. Similarly, for low-scoring doctors, there is some probability that the individual is actually a low quality doctor and some probability that they are an average doctor who just scored poorly on the patients in the sample.
How to calculate the James-Stein Shrinkage Estimator
Because of an abundance of statistics, let us move from evaluating physician quality to evaluating the quality of quarterbacks (QB) in the NFL. Let $y_i$ be the average QB rating for an individual quarterback i, and Y be the average QB rating for all quarterbacks. Then, given that we observe $y_i$ and Y, we can calculate the James-Stein estimator as:
- James-Stein estimate for quarterback i $= Y + c\,(y_i - Y)$
The value of c depends on the relationship between the variance of each individual quarterback’s score across games and the variation in scores across all quarterbacks. A higher variance in the QB rating across games within each individual quarterback implies that more weight will be put on the NFL-wide average (c will be closer to 0). In other words, if each QB’s rating is imprecise, one should rely more on the mean. On the other hand, if the variation in QB scores across quarterbacks is large, then we should rely more on each quarterback’s individual rating (c will be close to 1). If the distribution of quarterback quality is dispersed, then the individual quarterback ratings reveal significant information regarding each quarterback’s true quality.
We can calculate c as follows:
- $c = 1 - \dfrac{(k-3)\,\sigma^2}{\sum_i (y_i - Y)^2}$
The variable k is the number of quarterbacks in the sample. The variable $\sigma^2$ is the variance of an individual quarterback’s scores across games. Strictly speaking, because $y_i$ is an average over several games, the variance that belongs in the formula is the variance of that average, which is the per-game variance divided by the number of games. The denominator is the sum of squared residuals (SSR): the sum, over all quarterbacks, of the squared deviation between each quarterback’s average score across games and the average score for all quarterbacks. Higher variation in a quarterback’s scores across games ($\sigma^2$) results in a smaller value of c. More variation in the average scores across quarterbacks (a larger SSR) results in a higher value of c.
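The calculation of c and of the shrunken estimates can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: the function name `james_stein`, the five ratings, and the 16-game season are hypothetical values chosen for illustration, and it assumes that every quarterback's average has the same known variance, taken here as the per-game variance divided by the number of games.

```python
from statistics import fmean


def james_stein(averages, sigma2):
    """Shrink each average toward the grand mean.

    averages: per-quarterback average ratings (the y_i)
    sigma2:   variance of each average, assumed known and equal for everyone
    Returns (c, list of shrunken estimates).
    """
    k = len(averages)
    if k < 4:
        raise ValueError("need at least 4 averages so that (k - 3) is positive")

    grand_mean = fmean(averages)                         # Y
    ssr = sum((y - grand_mean) ** 2 for y in averages)   # sum of (y_i - Y)^2
    if ssr == 0:
        raise ValueError("all averages are identical; c is undefined")

    c = 1 - (k - 3) * sigma2 / ssr
    shrunk = [grand_mean + c * (y - grand_mean) for y in averages]
    return c, shrunk


# Hypothetical season averages for five quarterbacks (made-up numbers).
ratings = [100.0, 90.0, 80.0, 70.0, 60.0]

# Assumption: per-game variance of 25 over 16 games, so each average
# has variance 25 / 16.
c, shrunk = james_stein(ratings, sigma2=25 / 16)
print(f"c = {c:.6f}")
print([round(s, 4) for s in shrunk])
```

With these made-up numbers, Y is 80 and the SSR is 1000, so c = 1 - (2 × 1.5625) / 1000 = 0.996875, and the top quarterback's estimate moves from 100 to 99.9375. The shrinkage is slight here because the spread across quarterbacks is large relative to the noise in each average.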
QB Rating Example
Take a look at the following example. The first four columns give QB ratings from the 2008 regular season. Let us pretend that this represents the true quality of each quarterback. Columns E through U represent a simulated season of 16 games for each quarterback. These simulated games are drawn from a normal distribution with each quarterback’s 2008 average rating as the distribution’s mean and with variance of 25.
The columns highlighted in yellow give the estimate of the quarterback’s true rating using the average and the shrinkage methodology. We see that the shrinkage estimate “shrinks” the average down towards the mean if it is above average and up towards the mean if it is below average. The shrinkage is based on the value of c which is calculated as above.
The columns highlighted in light blue give the squared deviations from the true QB ratings in column D. We can see that the sum of squared errors between the predicted values and the true values is higher for the average than for the shrinkage estimator. Thus, the shrinkage estimate is more accurate. I redid this simulation twenty times, and each time the shrinkage estimate outperformed the simple average.
- For more information on shrinkage estimators in plain English, see Efron and Morris (1977) “Stein’s paradox in statistics” Scientific American, v. 236, pp. 119-127.
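A simulation of this kind can be sketched in Python as follows. This is an illustration under stated assumptions, not the spreadsheet used for the example: the "true" ratings in `TRUE_RATINGS` are invented (they are not the 2008 figures), the function name `simulate_season` is hypothetical, and the variance plugged into c is the per-game variance divided by the number of games. Which estimator has the smaller squared error in any one simulated season depends on the random draw, so the script reports a count over twenty seeds rather than a fixed result.

```python
import random
from statistics import fmean

# Hypothetical "true" ratings, invented for illustration (not the 2008 data).
TRUE_RATINGS = [105.0, 97.0, 93.0, 90.0, 87.0, 84.0, 81.0, 78.0, 72.0, 66.0]


def simulate_season(true_ratings, n_games=16, game_variance=25.0, seed=0):
    """Simulate one season and compare the plain average with shrinkage."""
    rng = random.Random(seed)
    game_sd = game_variance ** 0.5

    # Average simulated rating for each quarterback (the y_i).
    averages = [
        fmean([rng.gauss(mu, game_sd) for _ in range(n_games)])
        for mu in true_ratings
    ]

    k = len(averages)
    grand_mean = fmean(averages)                         # Y
    ssr = sum((y - grand_mean) ** 2 for y in averages)   # sum of (y_i - Y)^2
    sigma2 = game_variance / n_games    # assumption: variance of each average
    c = 1 - (k - 3) * sigma2 / ssr
    shrunk = [grand_mean + c * (y - grand_mean) for y in averages]

    # Total squared error of each estimator against the true ratings.
    sse_average = sum((y - t) ** 2 for y, t in zip(averages, true_ratings))
    sse_shrunk = sum((s - t) ** 2 for s, t in zip(shrunk, true_ratings))
    return c, averages, shrunk, sse_average, sse_shrunk


if __name__ == "__main__":
    wins = 0
    for seed in range(20):
        c, _, _, sse_avg, sse_js = simulate_season(TRUE_RATINGS, seed=seed)
        wins += sse_js < sse_avg
    print(f"shrinkage had the smaller squared error in {wins} of 20 seasons")
```

Fixing the seed makes each simulated season reproducible, which is convenient when comparing the two estimators side by side.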