Agreement Between Two Independent Groups Of Raters

For ordinal data with more than two categories, it is useful to know whether the ratings of different raters differ slightly or strongly. For example, microbiologists may rate bacterial growth on culture plates as none, occasional, moderate, or confluent. Here, ratings of a given plate by two observers as "occasional" and "moderate" would imply a lower level of disagreement than ratings of "none" and "confluent". The weighted kappa statistic takes this difference into account: it yields a higher value when the raters' responses are closer, with the maximum for perfect agreement; conversely, a larger difference between two ratings leads to a lower value of weighted kappa. The schemes for assigning weights to the distance between categories (linear, quadratic) may vary.

The output shows that the "Simple Kappa" gives an estimated kappa of 0.389 with an asymptotic standard error (ASE) of 0.0598. The difference between the observed agreement and the agreement expected under independence is about 40% of the maximum possible difference. Based on the reported 95% confidence interval, kappa falls somewhere between 0.27 and 0.51, indicating only moderate agreement between Siskel and Ebert.

Kalantri et al. studied the accuracy and reliability of pallor as a tool for detecting anemia.
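As a minimal sketch of the weighting idea (the tables below are hypothetical counts, not the source's data), weighted kappa can be computed from a k × k contingency table using linear or quadratic disagreement weights:

```python
import numpy as np

def weighted_kappa(table, weighting="linear"):
    """Weighted kappa for two raters from a k x k contingency table.

    Uses disagreement weights |i - j| (linear) or (i - j)**2 (quadratic),
    so cells far from the diagonal are penalized more heavily.
    """
    p = np.asarray(table, dtype=float)
    p /= p.sum()                                        # joint cell proportions
    k = p.shape[0]
    i, j = np.indices((k, k))
    w = np.abs(i - j) if weighting == "linear" else (i - j) ** 2
    expected = np.outer(p.sum(axis=1), p.sum(axis=0))   # independence model
    return 1.0 - (w * p).sum() / (w * expected).sum()

# Disagreements confined to adjacent categories (hypothetical counts) ...
near = [[10, 2, 0], [2, 10, 2], [0, 2, 10]]
# ... yield a higher weighted kappa than extreme disagreements do.
far = [[10, 0, 4], [0, 10, 0], [4, 0, 10]]
print(weighted_kappa(near), weighted_kappa(far))
```

With either weighting scheme, the "near" table scores higher than the "far" table, which is exactly the behavior described above: closer ratings imply less discordance.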

[5] They concluded that clinical assessment of pallor can rule out severe anemia, but only modestly. However, the inter-observer agreement for the detection of pallor was very poor (kappa = 0.07 for conjunctival pallor and 0.20 for tongue pallor), meaning that pallor is an unreliable sign for the diagnosis of anemia.

Consider two ophthalmologists who measure intraocular pressure with a tonometer. Each patient therefore receives two measurements, one from each observer. The intraclass correlation coefficient (ICC) provides an estimate of the overall concordance between these measurements. It somewhat resembles analysis of variance in that it considers the between-pair variance expressed as a proportion of the total variance of the observations (i.e., the total variability in the 2n observations, which is expected to be the sum of the within-pair and between-pair variances). The ICC can take a value from 0 to 1, with 0 indicating no agreement and 1 perfect agreement.

If the observed agreement is due only to chance, that is, if the ratings are completely independent, then each diagonal element is the product of the two corresponding marginals. Since the overall probability of agreement is Σi πii, the probability of agreement under the null hypothesis is Σi πi+ π+i.
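The chance correction just described can be sketched directly (the 2 × 2 counts below are made up for illustration): the observed agreement Σi πii is rescaled against the agreement Σi πi+ π+i expected under independence.

```python
import numpy as np

def cohen_kappa(table):
    """Unweighted Cohen's kappa from a k x k contingency table."""
    p = np.asarray(table, dtype=float)
    p /= p.sum()                          # joint proportions pi_ij
    po = np.trace(p)                      # observed agreement: sum_i pi_ii
    pe = p.sum(axis=1) @ p.sum(axis=0)    # chance agreement: sum_i pi_i+ * pi_+i
    return (po - pe) / (1.0 - pe)

# Hypothetical table of two raters' yes/no calls
print(cohen_kappa([[20, 5], [10, 15]]))
```

Here po = 0.7 and pe = 0.5, so kappa = (0.7 − 0.5) / (1 − 0.5) = 0.4: the raters achieve 40% of the maximum possible improvement over chance agreement.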
