Econometrics

Kappa Statistic

Many research studies aim to determine whether physicians did a good job.  Many of these studies use administrative claims data to evaluate performance; others rely on medical record review.

One problem with medical record review is that experts will often reach differing conclusions after reviewing the same medical record.  Thus, researchers often have at least two individuals review the record so that the results are not biased by a single person's opinion.

A question of interest is how reliable different evaluators of a medical record are.  Cohen's kappa provides a quantitative estimate of inter-rater reliability.  The formula is the following:

• kappa = [P(a) − P(e)] / [1 − P(e)]
where P(a) is the observed level of agreement and P(e) is the level of agreement expected from pure chance.  In essence, the kappa measurement compares the observed level of inter-rater agreement against the level of agreement that would be expected by chance alone.

To give an example, consider a situation where two raters each rate 10 blogs and can assign a rating of A, B, or C. These data are available here.  You can see that Tester 1 is more likely to give positive ratings and Tester 2 is more likely to give negative ratings.  In this example, the value of kappa is 0.44.
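To make the formula concrete, here is a minimal sketch of the calculation in Python.  The `cohens_kappa` function and the sample ratings below are my own illustration (they are not the linked dataset), but they follow the same setup: two raters, 10 items, grades A, B, or C, with Rater 1 skewing positive and Rater 2 skewing negative.

```python
from collections import Counter

def cohens_kappa(ratings1, ratings2):
    """Compute Cohen's kappa for two raters who rated the same items."""
    assert len(ratings1) == len(ratings2), "raters must rate the same items"
    n = len(ratings1)
    # P(a): observed agreement -- the fraction of items where the raters match.
    p_a = sum(r1 == r2 for r1, r2 in zip(ratings1, ratings2)) / n
    # P(e): chance agreement, from each rater's marginal rating frequencies.
    freq1, freq2 = Counter(ratings1), Counter(ratings2)
    p_e = sum(freq1[c] * freq2[c] for c in freq1) / n ** 2
    return (p_a - p_e) / (1 - p_e)

# Hypothetical ratings: Rater 1 gives more A's, Rater 2 gives more C's.
rater1 = ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C"]
rater2 = ["A", "A", "B", "B", "B", "B", "C", "C", "C", "C"]

print(round(cohens_kappa(rater1, rater2), 2))  # -> 0.41
```

Here the raters agree on 6 of 10 items, so P(a) = 0.6, while the marginal frequencies give P(e) = 0.32, yielding kappa of (0.6 − 0.32) / (1 − 0.32) ≈ 0.41 — less than the raw 60% agreement, because some of that agreement would have occurred by chance.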

A general rule of thumb is to interpret values < 0 as indicating no agreement, 0–0.20 as slight, 0.21–0.40 as fair, 0.41–0.60 as moderate, 0.61–0.80 as substantial, and 0.81–1 as almost perfect agreement.  By this scale, the kappa of 0.44 in the example above indicates moderate agreement.
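The rule of thumb above can be sketched as a small lookup function (the function name and the handling of band edges are my own choices for illustration):

```python
def interpret_kappa(kappa):
    """Map a kappa value to the rule-of-thumb agreement label."""
    if kappa < 0:
        return "no agreement"
    # Upper bound of each band, paired with its label.
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label

print(interpret_kappa(0.44))  # -> moderate
```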