419 research outputs found

    Using Paired Comparison Matrices to Estimate Parameters of the Partial Credit Rasch Measurement Model for Rater-Mediated Assessments

    The purpose of this paper is to describe a technique for estimating the parameters of a Rasch model that accommodates ordered categories and rater severity. The technique builds on the conditional pairwise algorithm described by Choppin (1968, 1985) and represents an extension of a conditional algorithm described by Garner and Engelhard (2000, 2002) in which parameters appear as the eigenvector of a matrix derived from paired comparisons. The algorithm is used successfully to recover parameters from a simulated data set. No one has previously described such an extension of the pairwise algorithm to a Rasch model that includes both ordered categories and rater effects. The paired comparisons technique has importance for several reasons: it relies on the separability of parameters that is true only for the Rasch measurement model; it works in the presence of missing data; it makes transparent the connectivity needed for parameter estimation; and it is very simple. The technique also shares the mathematical framework of a very popular technique in the social sciences called the Analytic Hierarchy Process (Saaty, 1996).
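    The core of the pairwise approach can be sketched in a few lines. The simulation below is a minimal illustration for the dichotomous Rasch case only (it omits the paper's extension to ordered categories and rater severity), with item difficulties and sample size chosen arbitrarily: the ratio of pairwise counts n_ij/n_ji estimates exp(b_j - b_i), so the matrix of these ratios is approximately an AHP-style reciprocal matrix whose principal eigenvector recovers the difficulties.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate dichotomous Rasch data: P(correct) = 1 / (1 + exp(b_i - theta_v)).
# The difficulties and sample size are arbitrary choices for this sketch.
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])    # true item difficulties (centred)
theta = rng.normal(0.0, 1.0, size=5000)      # person abilities
p = 1.0 / (1.0 + np.exp(b[None, :] - theta[:, None]))
x = (rng.random(p.shape) < p).astype(int)    # persons x items response matrix

# Pairwise counts: n[i, j] = persons answering item i correctly and item j not.
n = x.T @ (1 - x)
np.fill_diagonal(n, 1)                       # avoid 0/0 on the diagonal

# Under the Rasch model n[i, j] / n[j, i] ~= exp(b_j - b_i), so the ratio
# matrix is (approximately) consistent and its principal eigenvector is
# proportional to exp(-b_i), as in the Analytic Hierarchy Process.
m = n / n.T
eigvals, eigvecs = np.linalg.eig(m)
w = np.abs(eigvecs[:, np.argmax(eigvals.real)].real)
b_hat = -np.log(w)
b_hat -= b_hat.mean()                        # centre to match the true scale

print(np.round(b_hat, 2))
```

    With enough persons the recovered difficulties track the generating values closely; pairs with no responses simply contribute nothing to the counts, which is why the method tolerates missing data.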

    Critical Values for Yen’s Q3: Identification of Local Dependence in the Rasch model using Residual Correlations

    The assumption of local independence is central to all IRT models. Violations can lead to inflated estimates of reliability and problems with construct validity. For the most widely used fit statistic, Q3, there are currently no well-documented critical values for indicating local dependence, so a variety of arbitrary rules of thumb are used instead. In this study, we used an empirical data example and a Monte Carlo simulation to investigate the different factors that can influence the null distribution of residual correlations, with the objective of proposing guidelines that researchers and practitioners can follow when making decisions about local dependence during scale development and validation. We propose that a parametric bootstrapping procedure be implemented in each separate situation to obtain the critical value of local dependence applicable to the data set, and we provide example critical values for a number of data structures. The results show that for the Q3 fit statistic no single critical value is appropriate for all situations, as the percentiles of the empirical null distribution are influenced by the number of items, the sample size, and the number of response categories. Furthermore, our results show that local dependence should be considered relative to the average observed residual correlation rather than to a uniform value, as this yields more stable percentiles for the null distribution of an adjusted fit statistic.
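    The proposed bootstrap procedure can be sketched as follows, assuming for simplicity that the generating item difficulties and person abilities are known (in practice they would be estimated from the observed data and each replicate would be refitted); all parameter values below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(theta, b):
    """Dichotomous Rasch responses for abilities theta and difficulties b."""
    p = 1.0 / (1.0 + np.exp(b[None, :] - theta[:, None]))
    return (rng.random(p.shape) < p).astype(int), p

def q3_matrix(x, p):
    """Yen's Q3: correlations of item residuals x - P across persons."""
    resid = x - p
    return np.corrcoef(resid.T)

# Assumed generating parameters for the sketch (in real use these would be
# the parameters estimated from the observed data set being checked).
b = np.linspace(-1.5, 1.5, 10)
theta = rng.normal(0.0, 1.0, size=1000)

# Parametric bootstrap: simulate replicate data sets under local independence,
# holding the assumed parameters fixed, and record the largest off-diagonal
# |Q3| in each replicate to build the null distribution.
B = 200
upper = np.triu_indices(len(b), k=1)
max_q3 = np.empty(B)
for r in range(B):
    x, p = simulate(theta, b)
    max_q3[r] = np.abs(q3_matrix(x, p)[upper]).max()

critical_value = np.quantile(max_q3, 0.95)
print(round(critical_value, 3))
```

    An observed residual correlation exceeding this data-specific 95th percentile would then be flagged as local dependence, rather than being compared against a fixed rule of thumb.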

    THE EFFECTS OF MISSING DATA TREATMENT ON PERSON ABILITY ESTIMATES USING IRT MODELS

    Unplanned missing responses are common to surveys and tests, including large-scale assessments. There has been an ongoing debate on how missing responses should be handled, and some approaches are preferred over others, especially in the context of item response theory (IRT) models, where examinees' abilities are normally estimated with the missing responses either ignored or treated as incorrect. Most studies that have explored the performance of missing-data handling approaches have used simulated data. This study uses the SERCE (UNESCO, 2006) dataset and its missingness pattern to evaluate the performance of three approaches: treating omitted responses as incorrect, midpoint imputation, and multiple imputation with and without auxiliary variables. Using the Rasch and 2PL models, the results showed that treating omitted responses as incorrect had a reduced average error in the estimation of ability but tended to underestimate examinees' abilities. Multiple imputation with and without auxiliary variables performed similarly; consequently, the use of auxiliary variables may not harm the estimation, but it can become an unnecessary burden during the imputation process. The midpoint imputation did not differ much from multiple imputation in its performance and thus should be preferred over the latter for practical reasons. The main implication is that SERCE might have underestimated students' abilities. Limitations and further directions are discussed. Adviser: R. J. De Ayal
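    The underestimation effect is easy to reproduce. Below is a minimal sketch assuming known item difficulties, MCAR missingness, and a simple Newton-Raphson maximum-likelihood ability estimator; the study itself uses the SERCE missingness pattern (which is not MCAR) and also examines midpoint and multiple imputation, none of which is modelled here.

```python
import numpy as np

rng = np.random.default_rng(2)

def mle_theta(x, mask, b, n_iter=50):
    """Newton-Raphson MLE of ability given known item difficulties b.

    Only responses with mask == True enter the likelihood; estimates are
    clipped to [-4, 4] so perfect and zero scores stay finite."""
    theta = np.zeros(x.shape[0])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(b[None, :] - theta[:, None]))
        grad = ((x - p) * mask).sum(axis=1)          # score function
        hess = -(p * (1.0 - p) * mask).sum(axis=1)   # observed information x -1
        theta = np.clip(theta - grad / hess, -4.0, 4.0)
    return theta

# Simulated Rasch data with 20% MCAR missingness (an arbitrary rate).
b = np.linspace(-2.0, 2.0, 20)
theta = rng.normal(size=2000)
p = 1.0 / (1.0 + np.exp(b[None, :] - theta[:, None]))
x = (rng.random(p.shape) < p).astype(int)
miss = rng.random(x.shape) < 0.2

# Treatment 1: ignore missing responses (likelihood over observed items only).
theta_ignore = mle_theta(x, ~miss, b)
# Treatment 2: score missing responses as incorrect.
theta_incorrect = mle_theta(np.where(miss, 0, x), np.ones_like(miss), b)

bias_ignore = (theta_ignore - theta).mean()
bias_incorrect = (theta_incorrect - theta).mean()
print(round(bias_ignore, 3), round(bias_incorrect, 3))
```

    Treating omissions as incorrect produces a clearly negative mean bias, while ignoring them leaves the estimates roughly unbiased, consistent with the abstract's conclusion about underestimation.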

    Three New Studies on Model-data Fit for Latent Variable Models in Educational Measurement

    This dissertation encompasses three studies on model-data fit methods for latent variable models used in modern educational measurement. The first study proposes a new statistic for testing the mean difference of the ability distributions estimated from the responses of a group of examinees, which can be used to detect aberrant responses from a group of test-takers. The second study reviews the current model-data fit indexes used for cognitive diagnostic models. The third study introduces a modified version of an existing item fit statistic so that the modified statistic has a known chi-square distribution. Lastly, a discussion of the three studies is given, including the studies' limitations and thoughts on directions for future research.

    Estimation of Population Size with Heterogeneous Catchability and Behavioural Dependence: Applications to Air and Water Borne Disease Surveillance

    Population size estimation based on the capture-recapture experiment is an interesting problem in various fields, including epidemiology, criminology, and demography. In many real-life scenarios there exists inherent heterogeneity among the individuals and dependency between capture and recapture attempts. A novel trivariate Bernoulli model is considered to incorporate these features, and Bayesian estimation of the model parameters is carried out using data augmentation. Simulation results show robustness under model misspecification and the superior performance of the proposed method over existing competitors. The method is applied to analyse real case studies on epidemiological surveillance. The results provide interesting insight into the heterogeneity and dependence involved in the capture-recapture mechanism. The proposed methodology can assist in effective decision-making and policy formulation.
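    As a point of reference for the general problem, the sketch below implements only the classical two-list (Chapman-corrected Lincoln-Petersen) estimator, not the paper's trivariate Bernoulli model: it assumes homogeneous catchability and independent lists, precisely the assumptions the model above is designed to relax. The population size and capture probabilities are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical population and capture probabilities for the sketch.
N_true = 5000
p_capture, p_recapture = 0.3, 0.25

captured = rng.random(N_true) < p_capture    # first list (e.g. hospital records)
recaptured = rng.random(N_true) < p_recapture  # second list (e.g. survey)

n1 = captured.sum()                          # size of first list
n2 = recaptured.sum()                        # size of second list
m = (captured & recaptured).sum()            # individuals on both lists

# Chapman's bias-corrected form of the Lincoln-Petersen estimate n1*n2/m.
N_hat = (n1 + 1) * (n2 + 1) / (m + 1) - 1
print(round(N_hat))
```

    When catchability is heterogeneous or the lists are behaviourally dependent, this estimator is biased, which motivates richer models such as the one proposed in the paper.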

    ITEM-ANALYSIS METHODS AND THEIR IMPLICATIONS FOR THE ILTA GUIDELINES FOR PRACTICE: A COMPARISON OF THE EFFECTS OF CLASSICAL TEST THEORY AND ITEM RESPONSE THEORY MODELS ON THE OUTCOME OF A HIGH-STAKES ENTRANCE EXAM

    The current version of the International Language Testing Association (ILTA) Guidelines for Practice requires language testers to pretest items before including them on an exam or, when pretesting is not possible, to conduct post-hoc item analysis to ensure that any malfunctioning items are excluded from scoring. However, the guidelines offer no guidance as to which item-analysis method is appropriate for any given examination. The purpose of this study is to determine what influence the choice of item-analysis method has on the outcome of a high-stakes university entrance exam. Two types of classical-test-theory (CTT) item analysis and three item-response-theory (IRT) models were applied to responses from a single administration of a 70-item, dichotomously scored, multiple-choice test of English proficiency administered to 2,320 examinees applying to a prestigious private university in western Japan. The results illustrate that the choice of item-analysis method greatly influences the ordinal ranking of examinees. The implications of these findings are discussed, and recommendations are made for revising the ILTA Guidelines for Practice to delineate more explicitly how language testers should apply item analysis in their testing practice.
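    For readers unfamiliar with the CTT side of the comparison, the sketch below computes the two standard CTT item statistics, facility (proportion correct) and corrected point-biserial discrimination, on simulated data; the flagging thresholds are common rules of thumb, not values prescribed by ILTA or by the study.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated responses standing in for real exam data (arbitrary parameters).
n_persons, n_items = 500, 10
theta = rng.normal(size=n_persons)
b = np.linspace(-2.0, 2.0, n_items)
x = (rng.random((n_persons, n_items)) <
     1.0 / (1.0 + np.exp(b - theta[:, None]))).astype(int)

total = x.sum(axis=1)
p_values = x.mean(axis=0)                    # CTT facility (proportion correct)

# Corrected point-biserial: each item against the total score with that
# item removed, so the item does not correlate with itself.
rest = total[:, None] - x
pt_biserial = np.array(
    [np.corrcoef(x[:, i], rest[:, i])[0, 1] for i in range(n_items)]
)

# Rule-of-thumb flags: too easy, too hard, or poorly discriminating.
flagged = (p_values < 0.2) | (p_values > 0.9) | (pt_biserial < 0.2)
print(p_values.round(2), pt_biserial.round(2), flagged, sep="\n")
```

    An IRT analysis of the same responses would instead estimate item difficulty (and, for 2PL/3PL models, discrimination and guessing) on a latent scale, which is why the two families of methods can rank examinees differently.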