You only live up to the standards you set: An evaluation of different approaches to standard setting

Abstract

Interpreting performance in reference to a standard can provide nuanced, finely tuned information about examinee abilities beyond that of a total score alone. However, there are many ways to set performance standards, yet little guidance exists on which method operates best and under what circumstances. Traditional methods, which rely heavily on subject matter experts (SMEs), are the most common approach adopted in practice. Two other approaches have been suggested in the literature as alternatives, although they have yet to be implemented in practice. Data-driven approaches do not involve SMEs but instead rely solely on statistical techniques to classify examinees into groups. Integrated approaches are a newer standard setting method that combines SME judgments with statistical techniques to inform the creation of performance standards. The primary purpose of this dissertation was to describe and illustrate the traditional, data-driven, and integrated approaches used to establish performance standards on tests. A traditional standard setting was conducted using the modified Angoff procedure. Latent class analysis (LCA), a data-driven classification technique, was performed in which model parameters were first freely estimated to assess the fit of various general LCA models and later constrained to create ordered groups for various ordinal LCA models. The traditional and data-driven standard setting methods were combined to form an “integrated” approach: SMEs’ ratings of expected examinee performance (derived from the modified Angoff standard setting) were used as item difficulty constraints in an integrated LCA model, the Angoff LCA. The results were used to compare examinee classifications from all three approaches and model-data fit among the statistically oriented methods. Although classifications were planned for comparison across all three approaches, issues were encountered with the Angoff LCA. Therefore, the comparisons of primary interest were between the modified Angoff and the championed LCA model. The results did not offer a clear-cut decision about which approach to champion. Ultimately, the modified Angoff was selected as the most appropriate standard setting approach for the test administered. Important considerations are offered for researchers who wish to use data-driven models to set standards, and ideas are proposed for future research.
