Many research studies aim to determine whether a physician did a good job. Some use administrative claims data to evaluate performance; others rely on medical record review.
One problem with medical record review is that experts often reach differing opinions after reviewing the same medical record. Thus, researchers often have at least two individuals review each record so that the results are not biased by a single person’s opinion.
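A question of interest is how reliably different evaluators rate the same medical record. Cohen’s kappa provides a quantitative estimate of inter-rater reliability. The formula is the following:
kappa = (Po - Pe) / (1 - Pe)
where Po is the observed proportion of items on which the two raters agree and Pe is the proportion of agreement expected by chance, given how often each rater uses each category.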
To give an example, consider the situation where two raters each rate the same 10 blogs and can give each one a rating of A, B, or C. These data are available here. You can see that Tester 1 is more likely to give positive ratings and Tester 2 is more likely to give negative ratings. In this example, the value of kappa is 0.44.
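For readers who want to try the calculation themselves, here is a minimal sketch in Python. The function name `cohens_kappa` and the two rating lists are made up for illustration; they are not the data behind the example above, so the result will not match the 0.44 reported there.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(ratings_a)

    # Observed agreement: share of items on which both raters gave the same label.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: for each label, multiply the two raters' marginal
    # proportions, then sum over labels.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings for illustration only; NOT the data from this post.
tester_1 = ["A", "A", "A", "B", "B", "A", "C", "A", "B", "A"]
tester_2 = ["A", "B", "B", "B", "C", "A", "C", "B", "C", "A"]

kappa = cohens_kappa(tester_1, tester_2)
print(round(kappa, 2))
```

The sketch uses only the standard library so each step of the formula stays visible. In practice, a statistics package would usually handle this calculation.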
A general rule of thumb is to interpret values < 0 as indicating no agreement, 0–.20 as slight, .21–.40 as fair, .41–.60 as moderate, .61–.80 as substantial, and .81–1 as almost perfect agreement.
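The rule of thumb above can be written as a small lookup. This is only a sketch: the function name `interpret_kappa` is hypothetical, and treating each upper bound as inclusive is an assumption about how to handle values that fall between the listed ranges (such as .205).

```python
def interpret_kappa(kappa):
    """Map a kappa value to the rule-of-thumb label described above."""
    if not -1 <= kappa <= 1:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0:
        return "no agreement"
    # Upper bounds are treated as inclusive (an assumption, see lead-in).
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label


print(interpret_kappa(0.44))
```

With the value from the blog-rating example, 0.44, this falls in the moderate band.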