Inequalities between multi-rater kappas
Multivariate analysis of psychological data
Reformulation and Generalisation of the Cohen and Fleiss Kappas
The assessment of consistency in the categorical or ordinal decisions made by observers or raters is an important problem, especially in the medical field. The Fleiss Kappa, Cohen Kappa and Intra-class Correlation (ICC), as commonly used for this purpose, are compared and a generalised approach to these measurements is presented. Differences between the Fleiss Kappa and multi-rater versions of the Cohen Kappa are explained, and it is shown how both may be applied to ordinal scoring with linear, quadratic or other weighting. The relationship between the quadratically weighted Fleiss and Cohen Kappa and the pair-wise ICC is clarified and generalised to multi-rater assessments. The AC1 coefficient is considered as an alternative measure of consistency, and the relevance of the Kappas and AC1 to measuring content validity is explored.
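As a rough illustration of the quantities this abstract compares, the Python sketch below computes a pairwise quadratically weighted Cohen kappa and a Fleiss kappa for a small set of multi-rater ordinal scores. The data and the use of scikit-learn and statsmodels are assumptions of this sketch, not material from the paper.

# Sketch: quadratically weighted Cohen kappa (pairwise) and Fleiss kappa
# for ordinal ratings by multiple raters. Illustrative data only.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import fleiss_kappa, aggregate_raters

# ratings[i, j] = ordinal score (0..4) given by rater j to subject i
ratings = np.array([
    [0, 1, 0],
    [2, 2, 3],
    [4, 4, 4],
    [1, 2, 1],
    [3, 3, 2],
])

# Pairwise Cohen kappa with quadratic weights (raters 0 and 1)
qwk = cohen_kappa_score(ratings[:, 0], ratings[:, 1], weights="quadratic")

# Fleiss kappa needs a subjects-by-categories count table
counts, _ = aggregate_raters(ratings)
fk = fleiss_kappa(counts, method="fleiss")

print(f"quadratically weighted Cohen kappa (raters 0 vs 1): {qwk:.3f}")
print(f"Fleiss kappa (all raters, unweighted): {fk:.3f}")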
A Comparison of Reliability Coefficients for Ordinal Rating Scales
Kappa coefficients are commonly used for quantifying reliability on a categorical scale, whereas correlation coefficients are commonly applied to assess reliability on an interval scale. Both types of coefficients can be used to assess the reliability of ordinal rating scales. In this study, we compare seven reliability coefficients for ordinal rating scales: the kappa coefficients included are Cohen's kappa, linearly weighted kappa, and quadratically weighted kappa; the correlation coefficients included are the intraclass correlation ICC(3,1), Pearson's correlation, Spearman's rho, and Kendall's tau-b. The primary goal is to provide a thorough understanding of these coefficients such that the applied researcher can make a sensible choice for ordinal rating scales. A second aim is to find out whether the choice of coefficient matters. We studied to what extent we reach the same conclusions about inter-rater reliability with different coefficients, and to what extent the coefficients measure agreement in a similar way, using analytic methods as well as simulated and empirical data. Using analytic methods, it is shown that differences between quadratic kappa and the Pearson and intraclass correlations increase as agreement becomes larger. Differences between the three coefficients are generally small if differences between rater means and variances are small. Furthermore, using simulated and empirical data, it is shown that differences between all reliability coefficients tend to increase as agreement between the raters increases. Moreover, for the data in this study, the same conclusion about inter-rater reliability was reached in virtually all cases with the four correlation coefficients. In addition, with quadratically weighted kappa we reached a conclusion similar to that of the correlation coefficients in a great number of cases. Hence, for the data in this study, it does not really matter which of these five coefficients is used. Moreover, the four correlation coefficients and quadratically weighted kappa tend to measure agreement in a similar way: their values are very highly correlated for the data in this study.
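A minimal Python sketch of the seven coefficients named in this abstract, computed for two hypothetical raters on a 5-point ordinal scale. The ratings are invented, and ICC(3,1) is computed directly from the Shrout and Fleiss two-way mixed-model formula rather than taken from the paper.

# Sketch: the seven reliability coefficients compared in the abstract,
# evaluated on made-up ratings from two raters.
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau
from sklearn.metrics import cohen_kappa_score

r1 = np.array([1, 2, 2, 3, 4, 5, 3, 2, 4, 5])
r2 = np.array([1, 2, 3, 3, 4, 4, 3, 1, 5, 5])

def icc_3_1(x, y):
    # ICC(3,1): two-way mixed model, consistency, single rater (Shrout & Fleiss)
    data = np.column_stack([x, y]).astype(float)
    n, k = data.shape
    grand = data.mean()
    ss_rows = k * ((data.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((data.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((data - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse)

print("Cohen kappa          ", cohen_kappa_score(r1, r2))
print("linear weighted kappa", cohen_kappa_score(r1, r2, weights="linear"))
print("quadratic kappa      ", cohen_kappa_score(r1, r2, weights="quadratic"))
print("ICC(3,1)             ", icc_3_1(r1, r2))
print("Pearson r            ", pearsonr(r1, r2)[0])
print("Spearman rho         ", spearmanr(r1, r2)[0])
print("Kendall tau-b        ", kendalltau(r1, r2)[0])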
Reliability of an Observational Method Used to Assess Tennis Serve Mechanics in a Group of Novice Raters
Background: Previous research has developed an observational tennis serve analysis (OTSA) tool to assess serve mechanics. The OTSA has displayed substantial agreement between the two health care professionals who developed the tool; however, it is currently unknown whether the OTSA is reliable when administered by novice users.
Purpose: The purpose of this investigation was to determine if reliability for the OTSA could be established in novice users via an interactive classroom training session.
Methods: Eight observers underwent a classroom instructional training protocol highlighting the OTSA. Following training, observers participated in two different rating sessions approximately a week apart. Each observer independently viewed 16 non-professional tennis players performing a first serve. All observers were asked to rate the tennis serve using the OTSA. Both intra- and inter-observer reliability were determined using Kappa coefficients.
Results: Kappa coefficients for intra- and inter-observer agreement ranged from 0.09 to 0.83, depending on the body position. The majority of body positions yielded moderate agreement or better.
Conclusion: This study suggests that the majority of components associated with the OTSA are reliable and can be taught to novice users via a classroom training session.
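The kind of computation described in the Methods can be sketched as follows in Python. The binary present/absent ratings per body position are simulated here purely for illustration and are not the OTSA study data.

# Sketch: intra-observer (session 1 vs session 2 for one observer) and
# inter-observer (all 8 observers, one session) agreement via kappa.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import fleiss_kappa, aggregate_raters

rng = np.random.default_rng(0)
n_serves, n_observers = 16, 8

# session1[i, j] = rating (0/1) of serve i by observer j; session2 likewise,
# simulated so that most first-session ratings are repeated
session1 = rng.integers(0, 2, size=(n_serves, n_observers))
session2 = np.where(rng.random((n_serves, n_observers)) < 0.8,
                    session1, 1 - session1)

# Intra-observer reliability for observer 0: Cohen kappa across sessions
intra = cohen_kappa_score(session1[:, 0], session2[:, 0])

# Inter-observer reliability in session 1: Fleiss kappa across 8 observers
counts, _ = aggregate_raters(session1)
inter = fleiss_kappa(counts, method="fleiss")

print(f"intra-observer kappa (observer 0): {intra:.2f}")
print(f"inter-observer Fleiss kappa:       {inter:.2f}")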
A family of multi-rater kappas that can always be increased and decreased by combining categories
FSW – Publications without appointment, Universiteit Leiden
The problem with Kappa
It is becoming clear that traditional evaluation measures used in Computational Linguistics (including Error Rates, Accuracy, Recall, Precision and F-measure) are of limited value for unbiased evaluation of systems, and are not meaningful for comparison of algorithms unless both the dataset and algorithm parameters are strictly controlled for skew (Prevalence and Bias). The use of techniques originally designed for other purposes, in particular Receiver Operating Characteristics Area Under Curve, plus variants of Kappa, has been proposed to fill the void. This paper aims to clear up some of the confusion relating to evaluation by demonstrating that the usefulness of each evaluation method is highly dependent on the assumptions made about the distributions of the dataset and the underlying populations. The behaviour of a number of evaluation measures is compared under common assumptions. Deploying a system in a context which has the opposite skew from its validation set can be expected to approximately negate Fleiss Kappa and halve Cohen Kappa, but leave Powers Kappa unchanged. For most performance evaluation purposes, the latter is thus most appropriate, whilst for comparison of behaviour, Matthews Correlation is recommended.
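The skew-reversal claim can be illustrated numerically. In the Python sketch below, Powers' kappa is stood in for by Informedness (sensitivity + specificity - 1), which is an assumption of this sketch; the classifier's error rates and sample size are likewise made up. Cohen kappa shifts when prevalence is reversed, while Informedness does not.

# Sketch: effect of reversing dataset skew on chance-corrected agreement.
# A fixed classifier (sensitivity 0.9, specificity 0.7 -- invented numbers)
# is scored at a positive prevalence of 0.9 and at the opposite skew, 0.1.

def cohen_kappa(tp, fn, fp, tn):
    n = tp + fn + fp + tn
    po = (tp + tn) / n
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return (po - pe) / (1 - pe)

def informedness(tp, fn, fp, tn):
    # sensitivity + specificity - 1; used here as a stand-in for Powers Kappa
    return tp / (tp + fn) + tn / (tn + fp) - 1

sens, spec = 0.9, 0.7
for prevalence in (0.9, 0.1):          # original skew, then the opposite skew
    pos, neg = 1000 * prevalence, 1000 * (1 - prevalence)
    tp, fn = sens * pos, (1 - sens) * pos
    tn, fp = spec * neg, (1 - spec) * neg
    print(f"prevalence {prevalence:.1f}: "
          f"Cohen kappa = {cohen_kappa(tp, fn, fp, tn):.3f}, "
          f"informedness = {informedness(tp, fn, fp, tn):.3f}")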
Corrected Zegers-ten Berge coefficients are special cases of Cohen's weighted kappa
Multivariate analysis of psychological data
Kappa coefficients for dichotomous-nominal classifications
Two types of nominal classifications are distinguished, namely regular nominal classifications and dichotomous-nominal classifications. The first type does not include an 'absence' category (for example, no disorder), whereas the second type does include an 'absence' category. Cohen's unweighted kappa can be used to quantify agreement between two regular nominal classifications with the same categories, but there are no coefficients for assessing agreement between two dichotomous-nominal classifications. Kappa coefficients for dichotomous-nominal classifications with identical categories are defined. All coefficients proposed belong to a one-parameter family. It is studied how the coefficients for dichotomous-nominal classifications are related and whether the values of the coefficients depend on the number of categories. It turns out that the values of the new kappa coefficients can be strictly ordered in precisely two ways. The orderings suggest that the new coefficients measure the same thing, but to a different extent. If one accepts the use of magnitude guidelines, it is recommended that stricter criteria be used for the new coefficients that tend to produce higher values.
A comparison of Cohen's kappa and agreement coefficients by Corrado Gini
Multivariate analysis of psychological data
A comparison of multi-way similarity coefficients for binary sequences
Multivariate analysis of psychological data
- …