Is an average always the best estimate? Let us say that we are evaluating physician quality. Does a physician’s average score across patients (or episodes of care) best represent their true quality level? Stein’s paradox says that when we are evaluating the true quality value for a number of doctors, we can do better than the average. To do this, we can use a shrinkage methodology.

Shrinkage methodologies in essence estimate the physician’s true quality score as a weighted average of the individual physician’s average score and the average score for all physicians. The intuition is that if we observe a high-scoring doctor, there is some probability that they are actually a high-quality doctor and some probability that they are an average doctor who just happened to perform above average when treating the patients in the sample. Similarly, for a low-scoring doctor, there is some probability that the individual is actually a low-quality doctor and some probability that they are an average doctor who just scored poorly on the patients in the sample.

**How to calculate the James-Stein Shrinkage Estimator**

Because of an abundance of statistics, let us move from evaluating physician quality to evaluating the quality of quarterbacks (QB) in the NFL. Let $y_i$ be the average QB rating for an individual quarterback $i$, and $Y$ be the average QB rating for all quarterbacks. Then in this case, given we observe $y_i$ and $Y$, we can calculate the James-Stein estimator as:

- $y_i^{JS} = Y + c\,(y_i - Y)$
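As a minimal sketch of the formula above, the Python snippet below applies the shrinkage to two made-up ratings. The function name `james_stein_estimate`, the league average of 85, and the value c = 0.8 are all hypothetical, chosen only to show the direction of the adjustment.

```python
def james_stein_estimate(y_i, grand_mean, c):
    """Shrink an individual average y_i toward the grand mean Y.

    Algebraically this equals the weighted average c * y_i + (1 - c) * Y.
    """
    return grand_mean + c * (y_i - grand_mean)


# Hypothetical numbers, for illustration only (not real QB ratings).
grand_mean = 85.0   # assumed league-wide average rating
c = 0.8             # assumed shrinkage factor

above = james_stein_estimate(100.0, grand_mean, c)  # pulled down toward 85
below = james_stein_estimate(70.0, grand_mean, c)   # pulled up toward 85
print(above, below)
```

With c = 1 the estimate is just the quarterback’s own average, and with c = 0 it is the league average for everyone.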

The value of *c* depends on the relationship between the variance of each individual quarterback’s score across games and the variation in scores across all quarterbacks. A higher variance in the QB rating across games within each individual quarterback implies that more weight will be put on the NFL-wide average (*c* will be close to 0). In other words, if each QB’s rating is imprecise, one should rely more on the mean. On the other hand, if the variation in QB scores across quarterbacks is large, then we should rely more on each quarterback’s individual rating (*c* will be close to 1). If the distribution of quarterback quality is dispersed, then the individual quarterback ratings reveal significant information regarding the quarterback’s true quality.

We can calculate *c* as follows:

- $c = 1 - \dfrac{(k-3)\,\sigma^2}{\sum_i (y_i - Y)^2}$

The variable $k$ is the number of quarterbacks in the sample. The variable $\sigma^2$ is the variance in the individual quarterback’s scores across games. (Strictly, when $y_i$ is an average over $n$ games, the variance that applies to $y_i$ itself is the per-game variance divided by $n$.) The denominator is the sum of squared residuals (SSR): the sum of the squared deviations of each quarterback’s average score across games from the average score for all quarterbacks. We can see that higher variation in quarterback scores across games ($\sigma^2$) results in a smaller value of *c*. More variation in the average scores across quarterbacks (*SSR*) results in a higher value of *c*.
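The calculation of *c* can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the spreadsheet’s actual formula: the function name `shrinkage_factor` and the sample averages are hypothetical, the variance is passed in as a known constant, and the two guards (at least four quarterbacks, non-zero SSR) are my own additions for the cases where the formula gives no shrinkage or is undefined.

```python
def shrinkage_factor(averages, sigma2):
    """Return c = 1 - (k - 3) * sigma2 / SSR for a list of per-QB averages.

    sigma2 is treated as known. Which variance to pass (per-game, or
    per-game divided by the number of games) is left to the caller.
    """
    k = len(averages)
    if k < 4:
        # Assumption: with k <= 3 the (k - 3) term gives no shrinkage at all.
        raise ValueError("need at least 4 quarterbacks")
    grand_mean = sum(averages) / k
    ssr = sum((y - grand_mean) ** 2 for y in averages)
    if ssr == 0:
        # All averages identical: the ratio is undefined.
        raise ValueError("SSR is zero; c is undefined")
    return 1 - (k - 3) * sigma2 / ssr


# Hypothetical season averages for seven quarterbacks (not real data).
averages = [100.0, 95.0, 90.0, 85.0, 80.0, 75.0, 70.0]
c_noisy = shrinkage_factor(averages, sigma2=25.0)    # larger variance
c_precise = shrinkage_factor(averages, sigma2=1.0)   # smaller variance
print(c_noisy, c_precise)
```

Here SSR is 700, so a variance of 25 gives c = 1 - 100/700, while a variance of 1 gives a value much closer to 1. Note that nothing in the formula stops *c* from going negative when SSR is small relative to the variance; a common variant (the positive-part estimator) clamps *c* at zero, which this sketch does not do.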

**QB Rating Example**

Take a look at the following example. The first four columns give QB ratings from the 2008 regular season. Let us pretend that this represents the true quality of each quarterback. Columns E through U represent a simulated season of 16 games for each quarterback. These simulated games are drawn from a normal distribution with each quarterback’s 2008 average rating as the distribution’s mean and with variance of 25.
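For readers who want to generate a comparable data set themselves, here is a minimal sketch of the simulation step using only the Python standard library. The true ratings below are hypothetical placeholders, not the 2008 figures, and the names `simulate_season` and `seed` are my own. Each quarterback gets 16 draws from a normal distribution centered on the true rating with variance 25 (standard deviation 5).

```python
import random
import statistics


def simulate_season(true_ratings, n_games=16, variance=25.0, seed=0):
    """Draw n_games normal ratings per quarterback around each true rating."""
    rng = random.Random(seed)
    sd = variance ** 0.5  # random.gauss takes a standard deviation
    return [[rng.gauss(mu, sd) for _ in range(n_games)] for mu in true_ratings]


# Hypothetical "true" ratings, for illustration only.
true_ratings = [100.0, 95.0, 90.0, 85.0, 80.0, 75.0]
season = simulate_season(true_ratings)

for mu, games in zip(true_ratings, season):
    # Sample mean and sample variance of the 16 simulated games.
    print(mu, round(statistics.mean(games), 1), round(statistics.variance(games), 1))
```

Printing the sample variance of each row is a quick sanity check: with a true variance of 25, the per-row values should scatter around 25, and each 16-game average should sit within a few points of its true rating.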

The columns highlighted in yellow give the estimate of the quarterback’s true rating using the average and the shrinkage methodology. We see that the shrinkage estimate “shrinks” the average down towards the mean if it is above average and up towards the mean if it is below average. The shrinkage is based on the value of *c* which is calculated as above.

The columns highlighted in light blue give the squared deviations from the true QB ratings in column D. We can see that the sum of squared errors between the predicted values and the true values is higher for the average than for the shrinkage estimator. Thus, in this simulation the shrinkage estimate is more accurate. I redid this simulation twenty times, and each time the shrinkage estimate outperformed the simple average.
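The comparison described above can be reproduced in outline with the following sketch. It is not the spreadsheet behind this post: the true ratings are hypothetical, the function names are my own, and I assume that the variance plugged into *c* is the variance of each 16-game average (25/16) rather than the per-game variance. Passing 25 instead would shrink more aggressively.

```python
import random


def js_estimates(averages, sigma2):
    """Shrink each average toward the grand mean using c = 1 - (k-3)*sigma2/SSR."""
    k = len(averages)
    grand_mean = sum(averages) / k
    ssr = sum((y - grand_mean) ** 2 for y in averages)
    c = 1 - (k - 3) * sigma2 / ssr
    return [grand_mean + c * (y - grand_mean) for y in averages]


def sum_sq_error(estimates, truth):
    """Total squared deviation of the estimates from the true values."""
    return sum((e - t) ** 2 for e, t in zip(estimates, truth))


def compare_once(true_ratings, rng, n_games=16, variance=25.0):
    """Simulate one season; return (SSE of plain averages, SSE of shrunk values)."""
    sd = variance ** 0.5
    averages = [
        sum(rng.gauss(mu, sd) for _ in range(n_games)) / n_games
        for mu in true_ratings
    ]
    # Assumption: sigma2 is the variance of a season average, variance / n_games.
    shrunk = js_estimates(averages, sigma2=variance / n_games)
    return sum_sq_error(averages, true_ratings), sum_sq_error(shrunk, true_ratings)


rng = random.Random(2008)  # fixed seed so the run is repeatable
true_ratings = [98.0, 94.0, 91.0, 89.0, 87.0, 85.0, 83.0, 80.0, 77.0, 72.0]  # hypothetical
results = [compare_once(true_ratings, rng) for _ in range(20)]
wins = sum(1 for sse_avg, sse_js in results if sse_js < sse_avg)
print(f"shrinkage had the lower squared error in {wins} of 20 simulated seasons")
```

The James-Stein result is a statement about expected total squared error, so an individual simulated season can still favor the plain average. How often shrinkage wins depends on how spread out the true ratings are relative to the noise in each average.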

- For more information on shrinkage estimators in plain English, see Efron and Morris (1977), “Stein’s paradox in statistics,” *Scientific American*, v. 236, pp. 119-127.

are you sure the within player variances aka sigma squared – are 1 – they seem to be a lot higher and the values for each player a lot more scattered, I think a variance and SD of 1 would mean 2/3 of the data within 1 of the average.

if the variance was higher that would increase the shrinkage (reduce c) in the formula for c.

what happens if there is a zero residual – isn’t the formula for c invalid then?

also if there are small residuals is it not possible for c to become negative ?

sorry I mean to say are you sure the variance is 25 not 1.

“these simulated games are drawn from a normal distribution with each quarterback’s 2008 average rating as the distribution’s mean and with variance of 25.”

a variance of 25 is an SD of 5.

My point is that for normal data 2/3 of values lie within 1 SD of the mean and I’m not sure that is the case here.

For example for Philip Rivers, the first row, with an average of 101.6 we’d need 2/3 of his values between 95.6 and 106.6 but the data seems more variable than that (I think only 4 of 16 values lie in that range for that person – 25% not 67%)

many thanks

I was looking around for a simple demonstration of James-Stein when I came across this site. While it looked quite helpful, after staring at it for some time, I must conclude that the example ‘spreadsheet’ contains a large number of errors, so many that I’m inclined to conclude it must be incorrect.

First, the value in column D is stated to represent the true mean, from which normal deviates are drawn with var=25. There is no way that the values in columns E -> U represent N(col D, 25). The variances for the values reported are >>25. Simply take any row from column E -> U, and calculate the variance. Not even close.

The second clue that there is a problem involves looking at the averages (column labeled Avg). They should decline monotonically, since the true mean (column D) declines monotonically. They don’t. Again, not even close. The averages aren’t monotonic, and the disparity between the sample mean and the true mean is way too large given 16 samples from a random normal with var=25. Take the values for Philip Rivers, rating 105.5. You show values that are so unlikely as to be impossible. For example, the probability of getting a value of 120 is <1%, and yet, of your 16 values, 7 are bigger or smaller than you would expect. Again, there is no way the random deviates you show are from a normal with variance of 25. Even U(105.5,25) only varies from 96 to 114.

Third, c is made a constant in the example. Perhaps reasonable in this particular case, but in fact you should derive c for each row separately, as the product of (k-3)*row variance/SSR.

Fourth, the reported SSR is *much* too high. Higher than it could possibly be with var=25.

Finally, squared-error for shrunk estimates is not smaller than squared-error for averages, as you suggest. I’ve generated a spreadsheet which anyone is welcome to try (just email me offline). The spreadsheet automates the basic structure of the ‘result’ shown on this website. But, it fixes the noted errors.