Epidemiological studies often involve clinical scoring of animals by several observers because of the high number of farms to be visited. Detailed written procedures and intensive observer training minimize variation between observers. This, however, is still not common in international cooperation. We present data on clinical assessment of sows from an EU project on organic pig health (COREPIG) to illustrate the consequences.

The clinical scoring system was based on procedures from the Welfare Quality® project and included measures of body condition (5-level scale), injuries (number of lesions > 3 cm on shoulder, side and hindquarters), lameness (3-level scale), dirtiness (3-level scale) and skin problems (3-level scale). Nine observers from 6 EU countries trained in clinical scoring for two days in two herds. Of the 9 observers, 4 had little or no experience, 2 had intermediate experience and 3 had extensive experience in working with pigs. Four observers had little and four had intermediate experience in clinical scoring of sows; only 1 had extensive experience. Training comprised parameter discussions and joint scoring of animals. After training, each observer scored up to 30 pregnant sows per farm in 3 to 20 herds in six European countries as part of a larger epidemiological protocol. After completion of the farm visits, observers scored up to 50 sows independently, but on the same day and farm, in order to assess inter-observer agreement. Parameters were collapsed into binary variables. As measures of agreement, we calculated Kendall's Coefficient of Concordance (W) across all observers and Prevalence Adjusted Bias Adjusted Kappas (PABAK) for observer pairs.

Agreement across observers was not acceptable for skin problems and lameness (W < 0.41), and acceptable for dirtiness, obesity, and shoulder and hindquarter injuries (W between 0.41 and 0.60). W exceeded 0.60 only for "animal too thin" and side injuries (N = 26 sows for skin problems, and 31 to 34 sows for other parameters). Pairwise agreement was not acceptable for skin problems and dirtiness (mean PABAK < 0.41) and acceptable for shoulder and side injuries (mean PABAK between 0.41 and 0.60). Agreement was good for hindquarter injuries and "animal too thin" (PABAK = 0.66 and 0.65, respectively), while obesity and lameness had mean PABAK of 0.84 and 0.95. Observer pairs scored 40 to 50 sows per parameter, except for skin problems (36 to 49 sows). Results for lameness and obesity should be interpreted with care, as average prevalence across observers was only 3% and 8%, respectively. Determination of whether a sow was too thin was the parameter with the best agreement. The poor agreement for skin problems and dirtiness can be explained by misunderstandings regarding the parameter definition (e.g. inclusion of mud soiling). Extensive practical experience with pigs was of highest benefit for inter-observer agreement. Average PABAK was 0.70 (STD = 0.19, N = 24 scorings; 3 observer pairs, 8 parameters) for experienced observers but ranged between 0.49 and 0.56 (STD range 0.32 to 0.40) for all other combinations of experience level. The level of experience with clinical scoring of pigs did not have obvious positive effects. Average PABAK for all experience combinations ranged from 0.51 to 0.61 (STD range 0.32 to 0.40). By way of explanation, general experience with pigs helps to score an animal because observers will know a wider range of possible scenarios. By contrast, scores of observers who have already learned a scoring system will tend to be biased by their experience.

In conclusion, our data emphasize the importance of intensive observer training before data collection and the need for inter-observer agreement tests before and after data collection.
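
The caution about low-prevalence parameters can be illustrated with a minimal sketch. For two observers and a binary score, PABAK reduces to 2 × (observed agreement) − 1, whereas Cohen's kappa also corrects for the chance agreement implied by each observer's prevalence. The function names and the 50-sow data set below are hypothetical and chosen only for illustration; they are not the COREPIG data or the authors' code.

```python
def observed_agreement(scores_a, scores_b):
    """Proportion of animals given the same binary score by both observers."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    return sum(a == b for a, b in zip(scores_a, scores_b)) / len(scores_a)


def pabak(scores_a, scores_b):
    """PABAK for two observers and a binary score: 2 * Po - 1."""
    return 2 * observed_agreement(scores_a, scores_b) - 1


def cohen_kappa(scores_a, scores_b):
    """Cohen's kappa for two observers and binary (0/1) scores."""
    n = len(scores_a)
    po = observed_agreement(scores_a, scores_b)
    prev_a = sum(scores_a) / n  # prevalence according to observer A
    prev_b = sum(scores_b) / n  # prevalence according to observer B
    # Agreement expected by chance, given each observer's own prevalence
    pe = prev_a * prev_b + (1 - prev_a) * (1 - prev_b)
    if pe == 1.0:
        return 1.0  # both observers gave one identical constant score
    return (po - pe) / (1 - pe)


# Hypothetical data: 50 sows, each observer flags 2 as lame (4% prevalence),
# but they agree on only one of the lame animals.
obs_a = [1, 1, 0] + [0] * 47
obs_b = [0, 1, 1] + [0] * 47

print(f"PABAK = {pabak(obs_a, obs_b):.2f}")
print(f"kappa = {cohen_kappa(obs_a, obs_b):.2f}")
```

In this made-up example the observers agree on 48 of 50 sows, so PABAK is about 0.92, while Cohen's kappa is only about 0.48 because nearly all of the agreement concerns non-lame animals. A high PABAK at very low prevalence therefore says little about how consistently the rare condition itself is recognized.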