Comparison of the capacity of three nonparametric person-fit indices to detect different aberrant response patterns on real data

Abstract

With the support of the Consell Superior d'Avaluació de Catalunya.

Introduction: A total test score obtained as the sum of correct answers to items may give a false picture of the actual level of competence of the person evaluated if it results from an aberrant response pattern (ARP). A long list of person-fit indices is available to detect possible ARPs. We study the behavior of three group-based, or nonparametric, person-fit indices: Harnisch and Linn's Modified Caution Index (MCI), Van der Flier's U3, and Sijtsma's HT. The aim of the study is to evaluate how sensitive the extreme values of these indices are to different types of ARP.

Method: The indices are calculated on responses to a basic skills test administered by the Catalan Evaluation Council of the Education System to the entire population of students in the sixth year of primary school (ages 11-12). The work focuses on the sample of students whose score on at least one of the three indices lies above the 95th percentile, the value taken as the indicator of the presence of an ARP. To classify the different types of ARP, we propose a procedure that groups items by level of difficulty and identifies the kinds of items in which successes and errors are concentrated.

Results: The results indicate that the most aberrant patterns are best detected by U3. MCI and HT behave similarly and are better at detecting response patterns that deviate from those expected on items of moderate to high difficulty. The three indices coincide most often in identifying patterns with fewer correct answers than expected among the easy items and more correct answers than expected among the most difficult items.

Conclusion: In our study, conducted with real data and a new way of characterizing patterns, we observed that the indices differ in their sensitivity to different types of ARP. Our results complement those obtained in other studies based on simulated data. All these results should be taken into account when choosing the index that best suits the characteristics of the test being evaluated and the type of ARP to be detected.
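As an illustration of the flagging step described in the Method, the following is a minimal Python sketch of how MCI and U3 could be computed and thresholded. It is not the authors' code. The abstract gives no formulas, so the sketch assumes the commonly cited definitions, in which both indices compare a person's weighted score with the best (Guttman) and worst (reversed Guttman) patterns for the same number of correct answers; MCI weights items by their proportion correct and U3 by the logit of that proportion. The names `person_fit` and `flag_extreme` and the toy matrix `X` are hypothetical. HT is omitted because it is built on person-by-person covariances rather than on this shared form.

```python
import numpy as np

def person_fit(X):
    """Return MCI and U3 per person for a 0/1 matrix X (persons x items).

    Assumes every item is answered correctly by some, but not all, persons.
    """
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    p = X.mean(axis=0)                      # proportion correct per item
    order = np.argsort(-p, kind="stable")   # easiest item first
    Xs, ps = X[:, order], p[order]
    r = Xs.sum(axis=1).astype(int)          # number-correct score per person

    out = {}
    weights = {"MCI": ps, "U3": np.log(ps / (1 - ps))}
    for name, w in weights.items():
        cum = np.concatenate(([0.0], np.cumsum(w)))
        best = cum[r]                       # weights of the r easiest items
        worst = cum[k] - cum[k - r]         # weights of the r hardest items
        observed = Xs @ w
        denom = best - worst
        # Scores of 0 or k give a zero denominator: the index is undefined.
        with np.errstate(invalid="ignore", divide="ignore"):
            out[name] = np.where(denom > 0, (best - observed) / denom, np.nan)
    return out

def flag_extreme(values, pct=95):
    """Flag persons whose index value lies above the pct-th percentile."""
    return values > np.nanpercentile(values, pct)

# Toy data (assumption): five Guttman-like persons and one reversed pattern.
X = np.array([
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],   # fails the easy items, passes the hardest ones
    [1, 1, 1, 0, 0],
])
fit = person_fit(X)
flags = flag_extreme(fit["U3"])
```

In this toy matrix the fifth person is the only one whose correct answers fall on the hardest items, so both indices reach their maximum of 1 for that row and the row is flagged. Taking the percentile over the observed sample mirrors the 95th-percentile criterion in the Method; a different cutoff can be passed through `pct`.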
