
    On Wald tests for differential item functioning detection

    Wald-type tests are a common procedure for DIF detection among IRT-based methods. However, the empirical type I error rate of these tests departs from the significance level. In this paper, two reasons for this discrepancy are discussed and a new procedure is proposed. The first reason concerns the equating coefficients used to convert the item parameters to a common scale: they are treated as known constants even though they are estimated. The second reason concerns the parameterization used to estimate the item parameters, which differs from the usual IRT parameterization. Since the item parameters in the usual IRT parameterization are obtained in a second step, the corresponding covariance matrix is approximated using the delta method. This article proposes to account for the estimation of the equating coefficients by treating them as random variables, and to use the untransformed (i.e. not reparameterized) item parameters in the computation of the test statistic. A simulation study compares the performance of this new proposal with the currently used procedure. Results show that the new proposal gives type I error rates closer to the significance level.
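    For orientation, a minimal sketch of the conventional Wald-type DIF statistic for a single item, assuming the focal-group estimates are already on the reference scale and that the two groups' estimates are independent, so the covariance of the difference is the sum of the two covariance matrices. This is the baseline that ignores equating-coefficient variability, not the corrected procedure proposed in the paper; the function name `wald_dif` is hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def wald_dif(par_ref, par_foc, cov_ref, cov_foc):
    """Wald statistic and p-value for H0: the item parameters are equal in two groups.

    Assumes the focal parameters were already converted to the reference scale and
    that the two estimates are independent. The variability of the equating
    coefficients is ignored here, which is the issue the paper addresses.
    """
    diff = np.asarray(par_ref, float) - np.asarray(par_foc, float)
    # Covariance of the difference under independence of the two groups.
    cov = np.asarray(cov_ref, float) + np.asarray(cov_foc, float)
    stat = float(diff @ np.linalg.solve(cov, diff))
    df = diff.size  # one degree of freedom per compared parameter
    return stat, float(chi2.sf(stat, df))
```

    For example, `wald_dif([1.2, 0.0], [1.0, 0.0], 0.01 * np.eye(2), 0.01 * np.eye(2))` yields a statistic of 2 on 2 degrees of freedom.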

    Regularized Estimation of the Nominal Response Model

    The nominal response model is an item response theory model that does not require the response options to be ordered. However, while it provides a very flexible approach to modeling polytomous responses, it involves the estimation of many parameters, with the risk of numerical instability and overfitting. The lasso is a technique widely used to achieve model selection and regularization. In this paper, we propose the use of a fused lasso penalty to group response categories and to regularize the unidimensional and multidimensional nominal response models. The good performance of the method is illustrated through real-data applications and simulation studies.
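    To show how fusing can group categories, here is a sketch of the nominal response model category probabilities together with an all-pairs fused lasso penalty on the category slopes. Penalizing every pair of slopes is an assumption made here because the categories are unordered; the paper's penalty may also act on intercepts, use weights, or fuse only some pairs. The names `nrm_probs` and `fused_penalty` are hypothetical.

```python
import itertools

import numpy as np


def nrm_probs(theta, slopes, intercepts):
    """Category probabilities of the nominal response model at ability theta."""
    z = np.asarray(slopes, float) * theta + np.asarray(intercepts, float)
    z -= z.max()  # subtract the maximum for numerical stability
    p = np.exp(z)
    return p / p.sum()


def fused_penalty(slopes, lam):
    """lam times the sum of |a_k - a_h| over all pairs of categories.

    Categories whose slopes are fused to the same value contribute nothing,
    which is how the penalty can merge response categories.
    """
    a = np.asarray(slopes, float)
    pairs = itertools.combinations(range(a.size), 2)
    return lam * sum(abs(a[k] - a[h]) for k, h in pairs)
```

    A penalized fit would then minimize the negative log-likelihood plus `fused_penalty(slopes, lam)`, with `lam` chosen, for instance, by cross-validation.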

    Multiple Equating of Separate IRT Calibrations

    When test forms are calibrated separately, item response theory parameters are not comparable because they are expressed on different measurement scales. The equating process includes the conversion of item parameter estimates to a common scale and the determination of comparable test scores. Various statistical methods have been proposed to perform equating between two test forms. This paper provides a generalization to multiple test forms of the mean-geometric mean, the mean-mean, the Haebara, and the Stocking–Lord methods. The proposed methods simultaneously estimate the equating coefficients that permit the scale transformation of the parameters of all forms to the scale of the base form. Asymptotic standard errors of the equating coefficients are derived. A simulation study is presented to illustrate the performance of the methods.
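    As background for the generalization, a sketch of the classical two-form mean-mean and mean-geometric mean methods, computed on the common items only. It assumes the linear conversion theta_Y = A * theta_X + B, under which b_Y = A * b_X + B and a_Y = a_X / A; the multiple-form estimators of the paper are not reproduced here, and the function names are hypothetical.

```python
import numpy as np


def _as_arrays(*vectors):
    return [np.asarray(v, float) for v in vectors]


def mean_mean(a_x, b_x, a_y, b_y):
    """Equating coefficients (A, B) from means of common-item parameters."""
    a_x, b_x, a_y, b_y = _as_arrays(a_x, b_x, a_y, b_y)
    A = a_x.mean() / a_y.mean()          # discriminations scale as a_Y = a_X / A
    B = b_y.mean() - A * b_x.mean()      # difficulties map as b_Y = A * b_X + B
    return float(A), float(B)


def mean_geometric_mean(a_x, b_x, a_y, b_y):
    """Same as mean_mean, but A uses geometric means of the discriminations."""
    a_x, b_x, a_y, b_y = _as_arrays(a_x, b_x, a_y, b_y)
    A = np.exp(np.log(a_x).mean() - np.log(a_y).mean())
    B = b_y.mean() - A * b_x.mean()
    return float(A), float(B)
```

    With error-free common-item parameters both methods recover the true coefficients; they differ only in how noisy discrimination estimates are averaged.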

    Regularized Estimation of the Four-Parameter Logistic Model

    The four-parameter logistic model is an Item Response Theory model for dichotomous items that limits the probability of giving a positive response to an item to a restricted range, so that even people at the extremes of a latent trait do not have a probability close to zero or one. Although the literature acknowledges the usefulness of this model in certain contexts, the difficulty of estimating the item parameters has limited its use in practice. In this paper we propose a regularized approach to estimating the item parameters, based on the inclusion of a penalty term in the log-likelihood function. Simulation studies show the good performance of the proposal, which is further illustrated through an application to a real data set.
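    The bounded range is easiest to see in the 4PL response function, P(theta) = c + (d - c) / (1 + exp(-a (theta - b))), which stays between the lower asymptote c and the upper asymptote d. The sketch below adds a penalized log-likelihood with a pluggable penalty; `example_penalty`, which shrinks c toward 0 and d toward 1, is a hypothetical illustration and not the penalty proposed in the paper.

```python
import numpy as np


def p4pl(theta, a, b, c, d):
    """4PL probability of a positive response; bounded by c (below) and d (above)."""
    theta = np.asarray(theta, float)
    return c + (d - c) / (1.0 + np.exp(-a * (theta - b)))


def penalized_loglik(y, theta, a, b, c, d, penalty):
    """Bernoulli log-likelihood of one item's responses minus a penalty term."""
    y = np.asarray(y, float)
    p = p4pl(theta, a, b, c, d)
    ll = np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return float(ll - penalty(a, b, c, d))


def example_penalty(a, b, c, d, lam=1.0):
    """Hypothetical L2 penalty pulling the asymptotes toward the 2PL values 0 and 1."""
    return lam * (c ** 2 + (1.0 - d) ** 2)
```

    Maximizing `penalized_loglik` instead of the plain log-likelihood stabilizes the asymptote estimates when few respondents lie at the extremes of the trait.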

    Factors affecting the variability of IRT equating coefficients

    Knowing the effect of the factors that can influence the variability of the equating coefficients is an important tool for the development of linkage plans. This paper explores the effect of various factors on the variability of item response theory equating coefficients. The factors studied are the sample size, the number of common items, the length of the chain, and the possibility of averaging the equating transformations related to different paths that connect the same two forms. Both asymptotic and simulation results are provided.
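    One geometric way to average two equating transformations that link the same two forms is to take the angle bisector of the two lines theta_Y = A1 * theta_X + B1 and theta_Y = A2 * theta_X + B2. The sketch below computes the unweighted bisector, assuming positive slopes as in equating; published averaging methods may weight the paths (for example by their precision), so treat this as an illustration only. The function name `bisector` is hypothetical.

```python
import math


def bisector(A1, B1, A2, B2):
    """Unweighted angle bisector of the lines y = A1*x + B1 and y = A2*x + B2."""
    if math.isclose(A1, A2):
        # Parallel lines: the bisector is the midline between them.
        return A1, (B1 + B2) / 2.0
    # Sum of the unit direction vectors (1, A) / |(1, A)| gives the bisector direction.
    n1, n2 = math.hypot(1.0, A1), math.hypot(1.0, A2)
    A = (A1 / n1 + A2 / n2) / (1.0 / n1 + 1.0 / n2)
    # The bisector passes through the intersection point of the two lines.
    x0 = (B2 - B1) / (A1 - A2)
    y0 = A1 * x0 + B1
    return A, y0 - A * x0
```

    The resulting slope equals the tangent of the mean of the two lines' angles, so it is not simply the arithmetic mean of A1 and A2.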

    equateIRT: An R Package for IRT Test Equating

    The R package equateIRT implements item response theory (IRT) methods for equating different forms composed of dichotomous items. In particular, the IRT models included are the three-parameter logistic model, the two-parameter logistic model, the one-parameter logistic model and the Rasch model. Forms can be equated when they present common items (direct equating) or when they can be linked through a chain of forms that present common items in pairs (indirect or chain equating). When two forms can be equated through different paths, a single conversion can be obtained by averaging the equating coefficients. The package calculates direct and chain equating coefficients. The averaging of direct and chain coefficients that link the same two forms is performed through the bisector method. Furthermore, the package provides analytic standard errors of direct, chain and average equating coefficients.
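    Chain equating composes direct links: if theta_2 = A12 * theta_1 + B12 and theta_3 = A23 * theta_2 + B23, then theta_3 = (A12 * A23) * theta_1 + (B12 * A23 + B23). The Python sketch below applies this composition along a path and converts item parameters with the resulting coefficients. It illustrates the arithmetic only and is not the equateIRT API; `chain` and `to_target_scale` are hypothetical names.

```python
def chain(links):
    """Compose direct equating coefficients along a path of forms.

    links: sequence of (A, B) pairs, each mapping one form's scale to the next,
    theta_next = A * theta_prev + B. Returns (A, B) from the first form to the last.
    """
    A, B = 1.0, 0.0  # identity transformation
    for a_k, b_k in links:
        # Apply the next link to the current composite transformation.
        A, B = A * a_k, B * a_k + b_k
    return A, B


def to_target_scale(a, b, A, B):
    """Convert a discrimination a and difficulty b with coefficients (A, B)."""
    return a / A, A * b + B
```

    For example, `chain([(2.0, 0.5), (0.5, 1.0)])` gives `(1.0, 1.25)`; a longer chain multiplies more estimated slopes, which is one reason its coefficients tend to be more variable.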

    THE RASCH APPROACH TO "OBJECTIVE MEASUREMENT" IN THE PRESENCE OF SUBJECTIVE EVALUATION FROM "JUDGES"

    A variety of human activities (sport, education, finance, research, professional development, food) require the participation of judges in order to evaluate aspects that are difficult to measure directly. As these evaluations may have important consequences for the examined subjects, it is necessary to pursue maximum objectivity in the evaluation process. The Rasch model is the only statistical model that assures the construction of objective measurements, even in the presence of judges' evaluations. This paper reviews Rasch model theory and proposes an application to data concerning the evaluation of projects presented for a funding competition. The serious alterations that arise from the use of raw scores instead of Rasch measures are explored.

    RASCH ANALYSIS OF THE HOPE-SSQ QUESTIONNAIRE

    The data from the "HOPE WG1 - SSQ - Questionnaire", gathered from almost 1500 students at 27 sites around Europe, have been analyzed using Rasch models in order to extract and measure the factors that inspire students to study physics. In particular, using a Rating Scale Model (Wright & Masters, 1982) and Principal Components Analysis (PCA) of standardized residuals, we identified and measured two main latent traits. These factors, interacting with other personal characteristics such as sex and level of knowledge of physics, may influence performance, decisions, goals and preferences. We applied multilevel logistic regression models with SIMEX correction (Lederer & Küchenhoff, 2006), using the estimated factors as explanatory variables: the results show that these factors are significant and relevant in explaining the decision to study physics, in association with the level of knowledge of physics and the wish to become a physics teacher. This analysis suggests some possible guidelines for stimulating the decision to study physics.
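    For readers unfamiliar with the Rating Scale Model, a sketch of its category probabilities: for person location theta, item location delta and thresholds tau_1..tau_m shared by all items, P(X = k) is proportional to exp of the sum over j = 1..k of (theta - delta - tau_j), with an empty sum for k = 0. The function name `rsm_probs` is hypothetical and the sketch covers a single item.

```python
import numpy as np


def rsm_probs(theta, delta, taus):
    """Probabilities of categories 0..m for one item under the rating scale model."""
    steps = theta - delta - np.asarray(taus, float)
    # Cumulative sums give the category numerators; category 0 has an empty sum.
    numer = np.concatenate(([0.0], np.cumsum(steps)))
    numer -= numer.max()  # subtract the maximum for numerical stability
    p = np.exp(numer)
    return p / p.sum()
```

    With theta equal to delta and thresholds (-1, 1), the middle category is the most likely response, with probability e / (2 + e).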