
    Outlier detection in high-stakes college entrance testing

    In this study, we discuss recent developments in person-fit analysis in the context of computerized adaptive testing (CAT). Methods from statistical process control are discussed that have been proposed to classify an item-score pattern as fitting or misfitting the underlying item response theory (IRT) model in a CAT. Most person-fit research in CAT is restricted to simulated data; in this study, empirical data from a high-stakes test are used. Alternative methods to generate norm distributions, from which classification bounds can be determined, are discussed. These bounds may be used to classify item-score patterns as fitting or misfitting. Using bounds determined from the sample, the empirical analysis indicated that different types of misfit can be distinguished. Possibilities for using this method as a diagnostic instrument are discussed.
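    As an illustration of the bound-based classification described above, the sketch below simulates a norm distribution for a simple person-fit statistic (a sum of squared standardized residuals, which is an assumption here, not necessarily the statistic used in the study) under a 2PL model and flags an observed pattern that exceeds the resulting bound. All item and person parameters are hypothetical.

```python
# A minimal sketch of deriving an empirical bound for a person-fit statistic by
# simulating model-conforming item-score patterns and flagging observed patterns
# that fall beyond that bound. Parameters and the statistic itself are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def person_fit_stat(x, theta, a, b):
    """Sum of squared standardized residuals over the administered items."""
    p = p_correct(theta, a, b)
    return np.sum((x - p) ** 2 / (p * (1 - p)))

def simulate_bound(theta, a, b, n_rep=5000, alpha=0.05):
    """Monte Carlo norm distribution of the statistic under model-fitting behavior."""
    p = p_correct(theta, a, b)
    sims = rng.random((n_rep, len(a))) < p            # model-conforming patterns
    stats = np.array([person_fit_stat(x, theta, a, b) for x in sims])
    return np.quantile(stats, 1 - alpha)              # upper bound: larger = more misfit

# Hypothetical item parameters for the items a CAT administered to one examinee.
a = rng.uniform(0.8, 2.0, size=30)
b = rng.normal(0.0, 1.0, size=30)
theta_hat = 0.4                                       # examinee's trait estimate
observed = (rng.random(30) < p_correct(theta_hat, a, b)).astype(int)

bound = simulate_bound(theta_hat, a, b)
print("misfitting pattern:", person_fit_stat(observed, theta_hat, a, b) > bound)
```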

    Simple nonparametric checks for model data fit in CAT

    In this paper, the usefulness of several nonparametric checks is discussed in a computerized adaptive testing (CAT) context. Although there is no tradition of nonparametric scalability analysis in CAT, it can be argued that scalability checks are useful for investigating, for example, the quality of item pools. Although IRT models are strongly embedded in the development and construction of CATs, that development has been tied to parametric rather than nonparametric IRT modeling. This is not surprising, because one of the key features of a CAT is item selection on the basis of a latent-trait estimate from a calibrated item pool; parametric IRT models enable the separate estimation of item and person parameters and thus facilitate this process enormously. Recent developments in nonparametric IRT, however, suggest that techniques and statistics from this field may contribute to the development and improvement of the psychometric quality of a CAT. Investigating nonparametric IRT modeling may also provide insight into the assumptions underlying CAT and may help to unify IRT modeling.
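    One common nonparametric check of this kind is the Loevinger/Mokken scalability coefficient. The sketch below computes pairwise and overall H coefficients for a dichotomous item pool; the simulated data and the H >= .30 rule of thumb are illustrative assumptions, not results from the paper.

```python
# A minimal sketch of Mokken-style scalability coefficients H_ij and overall H
# as a nonparametric quality check on dichotomous item-pool data.
import numpy as np

def scalability_H(X):
    """X: n_persons x n_items matrix of 0/1 scores. Returns pairwise H and overall H."""
    k = X.shape[1]
    p = X.mean(axis=0)
    cov = np.cov(X, rowvar=False, bias=True)
    H_pair = np.full((k, k), np.nan)
    num, den = 0.0, 0.0
    for i in range(k):
        for j in range(i + 1, k):
            cov_max = min(p[i], p[j]) - p[i] * p[j]   # max covariance given the marginals
            H_pair[i, j] = H_pair[j, i] = cov[i, j] / cov_max
            num += cov[i, j]
            den += cov_max
    return H_pair, num / den

# Hypothetical data: responses driven by a single latent trait (should scale well).
rng = np.random.default_rng(1)
theta = rng.normal(size=1000)
b = np.linspace(-1.5, 1.5, 10)
X = (rng.random((1000, 10)) < 1 / (1 + np.exp(-(theta[:, None] - b)))).astype(int)

H_pair, H = scalability_H(X)
print(f"overall H = {H:.2f}")   # rule of thumb: H >= .30 suggests a usable scale
```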

    Investigating the quality of items in CAT using nonparametric IRT

    I discuss the applicability of nonparametric item response theory (IRT) models to investigating the quality of item pools in the context of CAT, and I contrast these models with parametric IRT models. I also show how nonparametric IRT models can easily be applied and how misleading results from parametric IRT models can be avoided. I recommend routinely using nonparametric IRT modeling to investigate the quality of item pools.

    Robustness of person-fit decisions in computerized adaptive testing

    Person-fit statistics test whether the likelihood of a respondent's complete vector of item scores on a test is low given the hypothesized item response theory (IRT) model. This binary information may be insufficient for diagnosing the cause of a misfitting item-score vector. This paper applies different types of person-fit analysis in a computerized adaptive testing context and investigates the robustness of several methods to multidimensional test data. Both global person-fit statistics, which make the binary decision about fit or misfit of a person's item-score vector, and local checks are applied. Results showed that the methods differ in their robustness in a multidimensional context and that some methods are more useful than others.
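    A widely used global person-fit statistic of the kind referred to above is the standardized log-likelihood l_z. The sketch below computes l_z for one examinee's item-score vector under a 2PL model; the item and person parameters are hypothetical, and the paper's own statistics may differ.

```python
# A minimal sketch of the standardized log-likelihood person-fit statistic l_z
# for dichotomous item scores under a 2PL model.
import numpy as np

def lz_statistic(x, theta, a, b):
    """Standardized log-likelihood of an item-score vector x given trait estimate theta."""
    p = 1 / (1 + np.exp(-a * (theta - b)))
    l0 = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))          # observed log-likelihood
    e_l0 = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))        # its expectation
    v_l0 = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)         # its variance
    return (l0 - e_l0) / np.sqrt(v_l0)

rng = np.random.default_rng(2)
a = rng.uniform(0.8, 2.0, 25)      # hypothetical discriminations
b = rng.normal(0, 1, 25)           # hypothetical difficulties
theta_hat = -0.3
x = (rng.random(25) < 1 / (1 + np.exp(-a * (theta_hat - b)))).astype(int)

lz = lz_statistic(x, theta_hat, a, b)
print(f"lz = {lz:.2f}; flag misfit if lz < -1.64 (one-sided, nominal 5% level)")
```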

    Division D: Measurement and research methodology


    Nonparametric item response theory and related topics


    Detection of advance item knowledge using response times in computer adaptive testing

    We propose a new method for detecting item preknowledge in a CAT based on an estimate of the “effective response time” for each item. Effective response time is defined as the time an individual examinee requires to answer an item correctly. An unusually short response time relative to the expected effective response time may indicate item preknowledge. The new method was applied to empirical data. Results showed that the Type I error rate of the statistic can be controlled. Power analysis revealed that power is high when response times are reduced, even when the examinee has preknowledge of only a small set of items.
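    The sketch below illustrates the general idea of flagging suspiciously fast correct responses. It assumes a lognormal response-time model with calibrated item time-intensity parameters as a stand-in for the paper's effective-response-time estimate; all parameter values are hypothetical.

```python
# A minimal sketch of flagging fast correct responses under an assumed lognormal
# response-time model: log time ~ Normal(item time intensity - person speed, sigma^2).
import numpy as np

def flag_fast_responses(log_t, correct, beta, tau, sigma, z_crit=-2.33):
    """
    log_t  : observed log response times for the administered items
    correct: 0/1 correctness indicators
    beta   : item time-intensity parameters (from calibration)
    tau    : examinee speed estimate
    sigma  : item log-time standard deviations
    Flags items answered correctly much faster than expected.
    """
    z = (log_t - (beta - tau)) / sigma        # standardized log-time residual
    return (correct == 1) & (z < z_crit)      # fast AND correct -> possible preknowledge

rng = np.random.default_rng(3)
beta = rng.normal(4.0, 0.3, 20)               # hypothetical calibrated parameters
sigma = np.full(20, 0.4)
tau = 0.1
log_t = rng.normal(beta - tau, sigma)
log_t[[4, 11]] -= 1.5                         # two items answered implausibly fast
correct = np.ones(20, dtype=int)

print(np.where(flag_fast_responses(log_t, correct, beta, tau, sigma))[0])
```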

    A Bayesian approach to person-fit analysis in item response theory models


    Some new methods to detect person fit in CAT

    Person fit is concerned with detecting nonfitting item-score patterns. Most person-fit statistics have been proposed in the context of conventionally administered, paper-and-pencil (P&P) tests. In this study, we first review existing person-fit studies in a computerized adaptive testing (CAT) context and then investigate the usefulness of some new fit statistics that are based on the specific characteristics of a CAT. Both the use of statistical process control and the use of nonparametric tests are explored. The results of a simulation study to detect nonfitting response patterns in a CAT showed that the detection rate of these statistics is comparable to the detection rate of person-fit statistics in P&P tests.
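    One statistical-process-control approach of this kind is a CUSUM chart over the standardized residuals of the items in the order the CAT administered them. The sketch below shows such a chart; the reference value k and the simulated misfit late in the test are illustrative assumptions, not the statistics proposed in the paper.

```python
# A minimal sketch of a one-sided upper/lower CUSUM person-fit chart over the
# standardized residuals (x - p)/sqrt(p(1-p)) of the administered items.
import numpy as np

def cusum_person_fit(x, p, k=0.5):
    """
    A large C+ suggests unexpectedly many correct answers (e.g., preknowledge);
    a large |C-| suggests unexpectedly many errors (e.g., loss of motivation).
    """
    z = (x - p) / np.sqrt(p * (1 - p))
    c_plus, c_minus = 0.0, 0.0
    trace = []
    for zi in z:
        c_plus = max(0.0, c_plus + zi - k)
        c_minus = min(0.0, c_minus + zi + k)
        trace.append((c_plus, c_minus))
    return np.array(trace)

rng = np.random.default_rng(4)
p = rng.uniform(0.4, 0.8, 30)                 # model-implied success probabilities
x = (rng.random(30) < p).astype(int)
x[20:] = 0                                    # simulate misfit late in the test

trace = cusum_person_fit(x, p)
print("max C+ = %.2f, min C- = %.2f" % (trace[:, 0].max(), trace[:, 1].min()))
```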

    On the consistency of individual classification using short scales

    Short tests containing at most 15 items are used in clinical and health psychology, medicine, and psychiatry for making decisions about patients. Because short tests have large measurement error, the authors ask whether they are reliable enough for classifying patients into a treatment and a nontreatment group. For a given certainty level, proportions of correct classifications were computed for varying test length, cut-scores, item scoring, and choices of item parameters. Short tests were found to classify at most 50% of a group consistently. Results were much better for tests containing 20 or 40 items. Small differences were found between dichotomous and polytomous (5 ordered scores) items. It is recommended that short tests for high-stakes decision making be used in combination with other information so as to increase reliability and classification consistency.
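    The sketch below illustrates how such classification consistency can be examined by simulation: two parallel administrations are generated per person under a Rasch model, and the proportion of agreeing treatment/no-treatment decisions is computed for several test lengths. The item parameters and cut-scores are hypothetical, and raw agreement is a simpler criterion than the certainty-level analysis used in the paper.

```python
# A minimal sketch of estimating classification consistency for a short
# dichotomous test by simulating two parallel administrations per person.
import numpy as np

def classification_consistency(n_items, cut_score, n_persons=20000, seed=5):
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=n_persons)
    b = np.linspace(-1.0, 1.0, n_items)               # assumed Rasch item difficulties
    p = 1 / (1 + np.exp(-(theta[:, None] - b)))
    score1 = (rng.random((n_persons, n_items)) < p).sum(axis=1)
    score2 = (rng.random((n_persons, n_items)) < p).sum(axis=1)
    same = (score1 >= cut_score) == (score2 >= cut_score)
    return same.mean()                                # proportion of agreeing decisions

for n_items in (10, 20, 40):
    print(n_items, round(classification_consistency(n_items, cut_score=n_items // 2), 3))
```

    Consistency rises with test length, in line with the recommendation above to avoid basing high-stakes decisions on short tests alone.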