    Comparison studies on agreement coefficients with emphasis on missing data

    In various fields of science, the assignment of people to categories is required. An example is the assignment of people with mental health problems to classes of mental disorders by a psychologist. A diagnosis may provide a person more insight into his or her problems, which is often a prerequisite for finding the right treatment. A nominal rating instrument has high reliability if persons obtain the same classification under similar conditions. In other words, a classification is considered reliable if the raters agree on their rating. A coefficient that is commonly used for measuring the degree of agreement between two raters is Cohen’s kappa, a standard tool for assessing agreement between nominal classifications in the social and medical sciences. Missing data (or missing values) are a common problem in many fields of science. In agreement studies, missing data may occur due to missed appointments or dropout of persons. However, missing data may also be the result of rater performance: if a particular category is missing, or if a category is not fully understood, a rater may choose not to rate the unit. How missing data may affect the quantification of inter-rater agreement has not been studied comprehensively. In this dissertation, we mainly focused on the impact of missing data on kappa coefficients. The results show that a coefficient that uses the missing data for a more precise estimation of the expected agreement, multiple imputation methods, and listwise deletion are all able to handle missing agreement data sufficiently well.
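
    For readers unfamiliar with the coefficient at the center of this dissertation, the following is a minimal sketch (not code from the dissertation) of how Cohen's kappa is computed for two raters: the observed agreement on a square cross-classification table, corrected for the agreement expected under independent marginals. The 0..k-1 category coding and the small example data are illustrative assumptions.

```python
import numpy as np

def cohens_kappa(ratings_a, ratings_b, n_categories):
    """Cohen's kappa for two raters' nominal ratings, coded 0..n_categories-1."""
    table = np.zeros((n_categories, n_categories))
    for a, b in zip(ratings_a, ratings_b):
        table[a, b] += 1                     # cross-classify the two ratings
    p = table / table.sum()                  # joint proportions
    p_o = np.trace(p)                        # observed agreement
    p_e = p.sum(axis=1) @ p.sum(axis=0)      # chance agreement from the marginals
    return (p_o - p_e) / (1 - p_e)

# Example: two raters classifying eight units into three nominal categories
print(cohens_kappa([0, 1, 2, 1, 0, 2, 1, 0], [0, 1, 2, 2, 0, 2, 1, 1], 3))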

    Ordering Properties of the First Eigenvector of Certain Similarity Matrices

    It is shown, for coefficient matrices of the Russell-Rao coefficient and of two asymmetric Dice coefficients, that ordinal information on a latent variable model can be obtained from the eigenvector corresponding to the largest eigenvalue.
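
    The following is a minimal sketch, not taken from the paper, of the kind of computation involved: build the matrix of Russell-Rao coefficients between binary items and inspect the eigenvector of its largest eigenvalue for an ordering of the items. The simulated data and all names are illustrative assumptions.

```python
import numpy as np

def russell_rao_matrix(X):
    """Pairwise Russell-Rao similarities between the columns of a 0/1 matrix X:
    the proportion of subjects scoring 1 on both items."""
    n_subjects = X.shape[0]
    return (X.T @ X) / n_subjects

rng = np.random.default_rng(0)
# Simulated binary data: 200 subjects, 5 items of increasing popularity
X = (rng.random((200, 5)) < np.linspace(0.2, 0.8, 5)).astype(int)

S = russell_rao_matrix(X)
eigenvalues, eigenvectors = np.linalg.eigh(S)     # S is symmetric, so eigh applies
first = eigenvectors[:, np.argmax(eigenvalues)]   # eigenvector of the largest eigenvalue
print(np.argsort(np.abs(first)))                  # item ordering implied by the eigenvector
```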

    Properties of Bangdiwala's B

    Cohen's kappa is the most widely used coefficient for assessing interobserver agreement on a nominal scale. An alternative coefficient for quantifying agreement between two observers is Bangdiwala's B. To interpret an agreement coefficient properly, one must first understand its meaning. Properties of the kappa coefficient have been studied extensively and are well documented; properties of coefficient B have been studied, but not extensively. In this paper, various new properties of B are presented. Category B-coefficients are defined that are the basic building blocks of B. It is studied how coefficient B, Cohen's kappa, the observed agreement, and the associated category coefficients are related. It turns out that the relationships between the coefficients are quite different for 2×2 tables than for agreement tables with three or more categories.
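
    As a point of reference, the following is a minimal sketch, not the paper's code, of Bangdiwala's B for a square agreement table: the sum of the squared diagonal counts divided by the sum of the products of the corresponding row and column totals. The example table is made up for illustration.

```python
import numpy as np

def bangdiwala_b(table):
    """Bangdiwala's B = sum_i n_ii^2 / sum_i (row_total_i * column_total_i)."""
    table = np.asarray(table, dtype=float)
    row_totals = table.sum(axis=1)
    col_totals = table.sum(axis=0)
    return (np.diag(table) ** 2).sum() / (row_totals * col_totals).sum()

# Example 3x3 agreement table (rows: rater 1, columns: rater 2)
table = [[20, 3, 1],
         [2, 15, 4],
         [0, 5, 10]]
print(bangdiwala_b(table))
```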

    A Comparison of Reliability Coefficients for Ordinal Rating Scales

    Kappa coefficients are commonly used for quantifying reliability on a categorical scale, whereas correlation coefficients are commonly applied to assess reliability on an interval scale. Both types of coefficients can be used to assess the reliability of ordinal rating scales. In this study, we compare seven reliability coefficients for ordinal rating scales: the kappa coefficients included are Cohen’s kappa, linearly weighted kappa, and quadratically weighted kappa; the correlation coefficients included are the intraclass correlation ICC(3,1), Pearson’s correlation, Spearman’s rho, and Kendall’s tau-b. The primary goal is to provide a thorough understanding of these coefficients so that the applied researcher can make a sensible choice for ordinal rating scales. A second aim is to find out whether the choice of coefficient matters. We studied to what extent different coefficients lead to the same conclusions about inter-rater reliability, and to what extent the coefficients measure agreement in a similar way, using analytic methods as well as simulated and empirical data. Analytically, it is shown that differences between quadratically weighted kappa and the Pearson and intraclass correlations increase as agreement becomes larger. Differences between the three coefficients are generally small if differences between rater means and variances are small. Furthermore, using simulated and empirical data, it is shown that differences between all reliability coefficients tend to increase as agreement between the raters increases. Moreover, for the data in this study, the same conclusion about inter-rater reliability was reached in virtually all cases with the four correlation coefficients, and quadratically weighted kappa led to a similar conclusion as the correlation coefficients in a great number of cases. Hence, for the data in this study, it does not really matter which of these five coefficients is used. The four correlation coefficients and quadratically weighted kappa also tend to measure agreement in a similar way: their values are very highly correlated for the data in this study.
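
    To make the comparison concrete, the following is a minimal sketch, not the paper's code, contrasting two of the seven coefficients: quadratically weighted kappa (with disagreement weights (i - j)^2 and expected disagreement computed from the marginals) and Pearson's correlation. The ordinal ratings coded 0..k-1 and the example data are illustrative assumptions.

```python
import numpy as np

def quadratic_weighted_kappa(a, b, n_categories):
    """Quadratically weighted kappa: 1 minus the ratio of observed to expected
    weighted disagreement, with weights (i - j)^2."""
    a, b = np.asarray(a), np.asarray(b)
    table = np.zeros((n_categories, n_categories))
    for i, j in zip(a, b):
        table[i, j] += 1
    p = table / table.sum()
    idx = np.arange(n_categories)
    d = (idx[:, None] - idx[None, :]) ** 2             # quadratic disagreement weights
    expected = np.outer(p.sum(axis=1), p.sum(axis=0))  # expected joint proportions
    return 1 - (d * p).sum() / (d * expected).sum()

a = [0, 1, 2, 3, 3, 2, 1, 0, 2, 3]
b = [0, 1, 2, 3, 2, 2, 1, 1, 2, 3]
print(quadratic_weighted_kappa(a, b, 4))
print(np.corrcoef(a, b)[0, 1])                         # Pearson's r for comparison
```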

    Kappa Coefficients for Missing Data

    Cohen’s kappa coefficient is commonly used for assessing agreement between classifications of two raters on a nominal scale. Three variants of Cohen’s kappa that can handle missing data are presented. Data are considered missing if one or both ratings of a unit are missing. We study how well the variants estimate the kappa value for complete data under two missing data mechanisms: missingness completely at random and a form of missingness not at random. The kappa coefficient considered in Gwet (Handbook of Inter-rater Reliability, 4th ed.) and the kappa coefficient based on listwise deletion of units with missing ratings were found to have virtually no bias and mean squared error if missingness is completely at random, and small bias and mean squared error if missingness is not at random. Furthermore, the kappa coefficient that treats missing ratings as a regular category appears to be rather heavily biased and has a substantial mean squared error in many of the simulations. Because it performs well and is easy to compute, we recommend using the kappa coefficient based on listwise deletion of missing ratings if it can be assumed that missingness is completely at random or not at random.
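
    The following is a minimal sketch, not the paper's implementation, contrasting two simple ways of handling missing ratings when computing Cohen's kappa: listwise deletion of units with a missing rating versus treating "missing" as an extra category. The use of None to mark a missing rating and the 0..k-1 category coding are illustrative assumptions.

```python
import numpy as np

def kappa_from_pairs(pairs, n_categories):
    """Cohen's kappa from a list of (rating_a, rating_b) pairs."""
    table = np.zeros((n_categories, n_categories))
    for a, b in pairs:
        table[a, b] += 1
    p = table / table.sum()
    p_o = np.trace(p)
    p_e = p.sum(axis=1) @ p.sum(axis=0)
    return (p_o - p_e) / (1 - p_e)

def kappa_listwise(a, b, k):
    """Drop every unit for which at least one of the two ratings is missing."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return kappa_from_pairs(pairs, k)

def kappa_missing_as_category(a, b, k):
    """Recode missing ratings as an extra category k and keep all units."""
    pairs = [(k if x is None else x, k if y is None else y) for x, y in zip(a, b)]
    return kappa_from_pairs(pairs, k + 1)

a = [0, 1, 2, None, 1, 0, 2, 1]
b = [0, 1, 2, 2, None, 0, 2, 0]
print(kappa_listwise(a, b, 3))
print(kappa_missing_as_category(a, b, 3))
```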