Abstract

This study assessed the consistency with which aggressive behavior occurred across 3 different provocation tests that are currently used in practice to evaluate the behavior and safety of dogs. The aim of this study was not to validate the tests, but to evaluate tests that are not validated but are nevertheless being used in a legal context in Switzerland, by investigating the hypothesis that 3 different approaches, all claiming to correctly evaluate the behavior of dogs, should be expected to show significant agreement. The same 60 dogs were tested in 3 behavioral tests being used in Switzerland at the time of this study in the year 2003 (Test A: Test of the American Staffordshire Terrier Club; Test B: Halterprüfung; Test C: Test of the Canton of Basel-Stadt). “Intraspecific behavior” and “interspecific behavior toward humans” that might relate to potential aggressive behavior were of particular interest. The observed agreement among the 3 tests was compared relative to chance using a κ test. Significant but low levels of agreement were found among the 3 tests for the criterion “intraspecific behavior” (κ = 0.133, P = .014), with the highest correlation between Tests A and B (κ = 0.345, P < .001) and for the criterion “interspecific behavior” (κ = 0.135, P = 0.014), with Tests A and B (κ = 0.220, P = .005) showing the highest correlation. However, significant absolute values of κ were low in all cases. In a further analysis, dogs evaluated to show no signs of potential aggression in the test situations by all 3 tests were eliminated, and the results of the remaining dogs (“interspecific behavior,” n = 23; “intraspecific behavior,” n = 29) were assessed for disagreement in pairwise combinations using a McNemar chi-square test.
No significant levels of disagreement were found for “intraspecific behavior”; however, for “interspecific behavior,” Tests A and C (P = .035) and Tests B and C (P < .001) differed significantly, with no significant difference between Tests A and B (P = 0.11). The inconsistency of the results from different tests suggests test bias at the very least and questions the validity of these tests. Further work examining the validity of each individual test is warranted if they are to be used in a legal context.