Simple nonparametric checks for model data fit in CAT
In this paper, the usefulness of several nonparametric checks is discussed in a computerized adaptive testing (CAT) context. Although there is no tradition of nonparametric scalability analysis in CAT, scalability checks can be useful for investigating, for example, the quality of item pools. IRT models are strongly embedded in the development and construction of CATs, but that tradition is tied to parametric rather than nonparametric IRT modeling. This is not surprising, because a key feature of a CAT is item selection on the basis of a latent trait estimated from a calibrated item pool; parametric IRT models enable the separate estimation of item and person parameters and thus greatly facilitate this process. Recent developments in nonparametric IRT, however, suggest that techniques and statistics from that field may also contribute to the development and improvement of the psychometric quality of a CAT. Investigating nonparametric IRT modeling may also yield insight into the assumptions underlying CAT and may help to unify IRT modeling.
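Scalability checks of the kind referred to here are, in the nonparametric (Mokken) tradition, usually based on the scalability coefficient H, which compares observed inter-item covariances with the maximum covariances attainable given the item marginals. A minimal sketch for a dichotomously scored pool; the function name and the toy simulation at the end are illustrative, not the paper's implementation:

```python
import numpy as np

def mokken_H(X):
    """Scalability coefficient H for an (n_persons, n_items) 0/1 score matrix.

    H = sum of observed inter-item covariances divided by the sum of the
    maximum covariances attainable given the item proportions-correct.
    """
    X = np.asarray(X, dtype=float)
    p = X.mean(axis=0)                           # item proportions correct
    cov = np.cov(X, rowvar=False, bias=True)     # population covariances
    n_items = X.shape[1]
    num, den = 0.0, 0.0
    for i in range(n_items):
        for j in range(i + 1, n_items):
            lo, hi = min(p[i], p[j]), max(p[i], p[j])
            num += cov[i, j]
            den += lo * (1.0 - hi)               # max covariance given marginals
    return num / den

# Toy check: near-Guttman data should yield H close to 1.
rng = np.random.default_rng(0)
theta = rng.normal(size=500)
difficulty = np.linspace(-1.5, 1.5, 8)
X = (theta[:, None] + rng.normal(scale=0.3, size=(500, 8)) > difficulty).astype(int)
print(round(mokken_H(X), 2))
```

By the usual Mokken conventions, H of at least .3 indicates a weak scale, .4 a medium scale, and .5 a strong scale, which gives a simple routine criterion for screening an item pool.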
Robustness of person-fit decisions in computerized adaptive testing
Person-fit statistics test whether the likelihood of a respondent's complete vector of item scores on a test is low given the hypothesized item response theory (IRT) model. This binary information may be insufficient for diagnosing the cause of a misfitting item-score vector. This paper applies different types of person-fit analysis in a computerized adaptive testing context and investigates the robustness of several methods to multidimensional test data. Both global person-fit statistics, which make the binary decision about fit or misfit of a person's item-score vector, and local checks are applied. Results showed that the methods differ in their robustness in a multidimensional context and that some methods are more useful than others.
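The abstract does not name the global statistics used; a widely used example of the kind described is the standardized log-likelihood l_z of Drasgow, Levine, and Williams (1985). A hedged sketch, assuming model probabilities evaluated at the respondent's estimated trait value:

```python
import numpy as np

def lz(x, P):
    """Standardized log-likelihood person-fit statistic l_z
    (Drasgow, Levine, & Williams, 1985).

    x : one respondent's 0/1 item-score vector
    P : model-implied probabilities of a correct response, evaluated at
        the respondent's estimated latent trait value (0 < P < 1)
    """
    x, P = np.asarray(x, float), np.asarray(P, float)
    l0 = np.sum(x * np.log(P) + (1 - x) * np.log(1 - P))    # observed log-likelihood
    e = np.sum(P * np.log(P) + (1 - P) * np.log(1 - P))     # its expectation
    v = np.sum(P * (1 - P) * np.log(P / (1 - P)) ** 2)      # its variance
    return (l0 - e) / np.sqrt(v)

# Large negative values indicate misfit; under the model l_z is roughly
# standard normal for conventional tests, so l_z < -1.645 flags misfit
# at the 5% level (one-sided).
```

As the later abstracts in this list note, the standard normal approximation for such statistics breaks down under adaptive item selection, which motivates CAT-specific alternatives.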
Outlier detection in high-stakes college entrance testing
In this study we discuss recent developments in person-fit analysis in the context of computerized adaptive testing (CAT). Methods from statistical process control that have been proposed to classify an item-score pattern as fitting or misfitting the underlying item response theory (IRT) model in a CAT are discussed. Most person-fit research in CAT is restricted to simulated data; in this study, empirical data from a high-stakes test are used. Alternative methods of generating norm distributions, from which classification bounds can be determined, are discussed. These bounds may be used to classify item-score patterns as fitting or misfitting. Using bounds determined from the sample, the empirical analysis indicated that different types of misfit can be distinguished. Possibilities for using this method as a diagnostic instrument are discussed.
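In the CAT person-fit literature, the statistical process control methods in question are typically CUSUM charts on item-score residuals (e.g., van Krimpen-Stoop and Meijer). A sketch under that assumption; the slack parameter d and the function interface are illustrative:

```python
import numpy as np

def cusum_person_fit(x, P, d=0.0):
    """One-sided CUSUM charts on item-score residuals in a CAT.

    After each administered item k, the residual T_k = x_k - P_k is
    accumulated; a chart crossing its bound signals misfit.
    x, P : item scores and model probabilities in administration order
    d    : reference (slack) value subtracted from / added to each residual
    """
    c_plus, c_minus = [0.0], [0.0]
    for xk, pk in zip(x, P):
        t = xk - pk
        c_plus.append(max(0.0, c_plus[-1] + t - d))    # drift toward unexpected successes
        c_minus.append(min(0.0, c_minus[-1] + t + d))  # drift toward unexpected failures
    return np.array(c_plus[1:]), np.array(c_minus[1:])
```

An item-score pattern is classified as misfitting when either chart exceeds its bound; the abstract's point is that these bounds can be determined empirically from the norm sample rather than from asymptotic theory.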
Investigating the quality of items in CAT using nonparametric IRT
I discuss the applicability of nonparametric item response theory (IRT) models to the development of high-quality item pools in the context of CAT, and I contrast these models with parametric IRT models. I also show how nonparametric IRT models can easily be applied and how misleading results from parametric IRT models can be avoided. I recommend routinely using nonparametric IRT modeling to investigate the quality of item pools.
Detection of advance item knowledge using response times in computer adaptive testing
We propose a new method for detecting item preknowledge in a CAT, based on an estimate of the "effective response time" for each item. Effective response time is defined as the time an individual examinee requires to answer an item correctly. An unusually short response time relative to the expected effective response time may indicate item preknowledge. The new method was applied to empirical data. Results showed that the Type I error rate of the statistic can be controlled. Power analysis revealed that power is high when response times are reduced, even when the examinee has preknowledge of only a small set of items.
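The paper's exact statistic is not reproduced here; a plausible operationalization, assuming a lognormal response-time model with per-item parameters mu and sigma estimated from a calibration sample without preknowledge, is to flag large negative standardized residuals on correctly answered items:

```python
import numpy as np

def flag_preknowledge(log_t, mu, sigma, z_crit=-2.33):
    """Flag unusually short response times on correctly answered items.

    log_t : observed log response times for the examinee's correct answers
    mu    : expected log effective response time per item (e.g., from a
            lognormal response-time model fitted to the calibration sample)
    sigma : corresponding standard deviations
    z_crit: one-sided critical value (-2.33 corresponds to roughly the
            1% level under normality)
    """
    z = (np.asarray(log_t, float) - np.asarray(mu, float)) / np.asarray(sigma, float)
    return z < z_crit   # True where item preknowledge is suspected
```

A residual far below zero means the item was answered correctly much faster than its effective response time predicts, which is the pattern the abstract describes.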
Exploring new methods to detect person misfit in CAT
Item scores that do not fit an assumed item response theory model may cause the latent trait value to be estimated inaccurately. Several person-fit statistics for detecting nonfitting response behavior have been proposed for paper-and-pencil tests, but in the context of computerized adaptive testing the use of person-fit analysis has hardly been explored. Because it has been shown that the null distributions of existing person-fit statistics do not hold in a computerized adaptive test (CAT), new person-fit statistics are proposed, and critical values for these statistics are derived from existing statistical theory. The theoretical and empirical distributions are compared, and a power study is performed.
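The kind of comparison described can be illustrated by simulation: administer maximum-information CATs to model-fitting simulees and inspect the empirical null distribution of a conventional statistic such as l_z. Everything below (Rasch pool, grid-based EAP update, test length) is an illustrative setup, not the paper's design:

```python
import numpy as np

rng = np.random.default_rng(1)
pool_b = rng.normal(size=300)                     # Rasch item difficulties

grid = np.linspace(-4, 4, 81)
prior = np.exp(-0.5 * grid ** 2)                  # N(0,1) prior for EAP

def p_correct(theta, b):
    return 1.0 / (1.0 + np.exp(-(np.asarray(theta) - np.asarray(b))))

def eap(x, b_used):
    """Expected a posteriori trait estimate on a fixed grid."""
    like = prior.copy()
    for xi, bi in zip(x, b_used):
        p = p_correct(grid, bi)
        like = like * p ** xi * (1 - p) ** (1 - xi)
    return float(np.sum(grid * like) / np.sum(like))

def simulate_cat(theta, n_items=30):
    """Maximum-information Rasch CAT for one simulee."""
    used, x, th = [], [], 0.0
    for _ in range(n_items):
        p_pool = p_correct(th, pool_b)
        info = p_pool * (1 - p_pool)              # Rasch item information
        info[used] = -np.inf                      # no item reuse
        k = int(np.argmax(info))
        used.append(k)
        x.append(int(rng.random() < p_correct(theta, pool_b[k])))
        th = eap(x, pool_b[used])                 # update provisional estimate
    return np.array(x), pool_b[used], th

def lz(x, P):
    l0 = np.sum(x * np.log(P) + (1 - x) * np.log(1 - P))
    e = np.sum(P * np.log(P) + (1 - P) * np.log(1 - P))
    v = np.sum(P * (1 - P) * np.log(P / (1 - P)) ** 2)
    return (l0 - e) / np.sqrt(v)

# Empirical null distribution of l_z when every simulee fits the model.
vals = []
for _ in range(500):
    x, b_used, th = simulate_cat(rng.normal())
    vals.append(lz(x, p_correct(th, b_used)))
vals = np.array(vals)
print(round(vals.mean(), 2), round(vals.std(), 2))
```

Because a CAT keeps selecting items near the provisional trait estimate, the empirical standard deviation of l_z typically comes out noticeably below 1, so standard normal critical values are conservative; this is exactly why CAT-specific statistics and critical values are needed.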
Global, local and graphical person-fit analysis using person response functions
Person-fit statistics test whether the likelihood of a respondent's complete vector of item scores on a test is low given the hypothesized item response theory model. This binary information may be insufficient for diagnosing the cause of a misfitting item-score vector. The authors propose a comprehensive methodology for person-fit analysis in the context of nonparametric item response theory. The methodology (a) includes H. Van der Flier's (1982) global person-fit statistic U3 to make the binary decision about fit or misfit of a person's item-score vector, (b) uses kernel smoothing (J. O. Ramsay, 1991) to estimate the person response function for the misfitting item-score vectors, and (c) evaluates unexpected trends in the person response function using a new local person-fit statistic (W. H. M. Emons, 2003). An empirical data example shows how to use the methodology for practical person-fit analysis.
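Of the three components, the global statistic U3 is straightforward to sketch. Assuming dichotomous items with known population proportions-correct strictly between 0 and 1 (the guard for all-0 and all-1 vectors is an implementation choice of this sketch):

```python
import numpy as np

def u3(x, p):
    """Van der Flier's (1982) global person-fit statistic U3.

    x : one respondent's 0/1 item-score vector
    p : population proportions-correct of the items (0 < p < 1)
    U3 = 0 for a perfect Guttman pattern (only the easiest items correct)
    and 1 for a perfectly reversed one; larger values mean more misfit.
    """
    x, p = np.asarray(x, float), np.asarray(p, float)
    w = np.log(p / (1 - p))                       # item "easiness" weights
    order = np.argsort(-p)                        # easiest items first
    r = int(x.sum())
    w_obs = np.sum(x * w)
    w_max = np.sum(w[order][:r])                  # Guttman pattern with score r
    w_min = np.sum(w[order][-r:]) if r else 0.0   # reversed Guttman pattern
    if w_max == w_min:                            # all-0 or all-1: U3 undefined
        return 0.0
    return (w_max - w_obs) / (w_max - w_min)
```

The kernel-smoothing step is not shown; conceptually it regresses a single respondent's item scores on the item proportions-correct to estimate the person response function, whose unexpected local increases the new statistic then tests.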