Detection of differential item functioning using Lagrange multiplier tests
In this paper it is shown that differential item functioning can be evaluated using the Lagrange multiplier test or C. R. Rao's efficient score test. The test is presented in the framework of a number of item response theory (IRT) models, such as the Rasch model, the one-parameter logistic model, the two-parameter logistic model, the generalized partial credit model, and the nominal response model. However, the paradigm for detection of differential item functioning presented here also applies to other IRT models. The proposed method is based on a test statistic with a known asymptotic distribution. Two examples are given, one using simulated data and one using real data from 1,000 boys and 1,000 girls taking a Dutch secondary examination.
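For reference, the general form of the statistic is standard (the notation here is assumed, not quoted from the paper): the Lagrange multiplier or efficient score statistic evaluates the score of the unrestricted model at the estimate obtained under the restriction of no differential item functioning,

\[
LM = s(\hat{\eta}_0)^{\top} \, I(\hat{\eta}_0)^{-1} \, s(\hat{\eta}_0),
\]

where \(s\) is the gradient of the log-likelihood, \(I\) the (observed or expected) Fisher information, and \(\hat{\eta}_0\) the estimate under the null model. The statistic is asymptotically chi-square distributed with degrees of freedom equal to the number of restrictions tested, which is the known asymptotic distribution referred to above.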
Violations of ignorability in computerized adaptive testing
Using auxiliary information and allowing item review in computerized adaptive testing produce a violation of the ignorability principle for missing data (Rubin, 1976) that may bias parameter estimates in IRT models. However, the violation of ignorability does not automatically lead to bias. In this report, two situations are distinguished. The first is estimation of the proficiency parameters in computerized adaptive testing using auxiliary information about proficiency and allowing item review, with the item parameters considered known; both analytically and through simulation studies, it is shown that the violation of ignorability does not lead to a gross inflation of bias. The second is calibration of item and population parameters using maximum marginal likelihood estimation; simulation studies show that the violation of ignorability does result in bias here, and an analytical explanation of this result is given.
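For reference, Rubin's (1976) ignorability condition can be stated compactly; the following is a standard textbook formulation under assumed notation, not a quotation from the report:

\[
P(\mathbf{d} \mid \mathbf{x}_{\mathrm{obs}}, \mathbf{x}_{\mathrm{mis}}, \boldsymbol{\phi}) = P(\mathbf{d} \mid \mathbf{x}_{\mathrm{obs}}, \boldsymbol{\phi}) \quad \text{for all } \mathbf{x}_{\mathrm{mis}},
\]

where \(\mathbf{d}\) indicates which items were administered, \(\mathbf{x}_{\mathrm{obs}}\) and \(\mathbf{x}_{\mathrm{mis}}\) are the observed and missing responses, and \(\boldsymbol{\phi}\) parameterizes the selection mechanism. Together with distinctness of \(\boldsymbol{\phi}\) from the model parameters, this condition licenses likelihood inference that ignores the selection mechanism. Auxiliary proficiency information and item review make the design depend on more than \(\mathbf{x}_{\mathrm{obs}}\) alone, which is the violation studied here.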
CUSUM statistics for large item banks: computation of standard errors
In a previous study, C. Glas (1998) examined how to evaluate whether adaptive testing data used for online calibration sufficiently fit the item response model used. Three approaches were suggested, based on a Lagrange multiplier (LM) statistic, a Wald statistic, and a cumulative sum (CUSUM) statistic, respectively. For all these methods, the asymptotic variance of the parameter estimates has to be approximated. In the previous study, standard errors were computed using an observed Fisher information matrix. However, when the number of items in the bank becomes very large, manipulating complete information matrices becomes quite difficult. This study investigates the extent to which standard errors can be computed using the block diagonal of the information matrix only, and how the CUSUM procedure must be tuned to this alternative approach. Simulation studies showed that the asymptotic standard errors are underestimated by the block-diagonal approach, but that the magnitude of the bias in the standard errors was relatively small. It was also shown that the power of the statistical test based on a CUSUM statistic using these approximated standard errors is well under control.
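The CUSUM chart itself is a standard quality-control tool. Below is a minimal sketch in Python of a tabular two-sided CUSUM on standardized drift statistics (hypothetical function name and tuning constants; the report's exact tuning to block-diagonal standard errors will differ):

```python
import numpy as np

def cusum(z, k=0.5, h=5.0):
    """Two one-sided CUSUM charts on standardized statistics z.

    z : sequence of (estimate - reference) / SE values, e.g. one per
        calibration update of an item's difficulty parameter.
    k : reference value (allowable slack per observation).
    h : decision threshold; a signal flags possible parameter drift.
    """
    s_hi = s_lo = 0.0
    signals = []
    for t, zt in enumerate(z):
        s_hi = max(0.0, s_hi + zt - k)   # accumulates upward drift
        s_lo = min(0.0, s_lo + zt + k)   # accumulates downward drift
        if s_hi > h or s_lo < -h:
            signals.append(t)
            s_hi = s_lo = 0.0            # restart after a signal
    return signals

# Hypothetical use: z-values whose SEs come from the (block) diagonal
# of the information matrix, as investigated in the study.
rng = np.random.default_rng(1)
z = np.concatenate([rng.normal(0, 1, 30), rng.normal(1.5, 1, 20)])
print(cusum(z))  # signals appear some updates after the shift at t = 30
```

Underestimated standard errors inflate the z-values, so tuning amounts to adjusting k and h until the false-alarm rate is back under control.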
Alternative approaches to updating item parameter estimates in tests with item cloning
Item cloning techniques can greatly reduce the cost of item writing and enhance the flexibility of item presentation. To deal with the possible variability of the item parameters caused by item cloning, Glas and van der Linden (2006) proposed a multilevel item response model in which it is assumed that the item parameters of a 3-parameter logistic (3PL) model or a 3-parameter normal ogive (3PNO) model are sampled from a multivariate normal distribution associated with a parent item. The model is referred to as the item cloning model (ICM). For the situation where each cloned item is presented to a substantial number of respondents, Glas and van der Linden (2006) proposed a Bayesian procedure for parameter estimation using a Markov chain Monte Carlo (MCMC) method (the Gibbs sampler). Here, two procedures for updating the parameter estimates in the ICM are compared. In the first procedure, the MCMC procedure is run on the combined original and new data sets. In the second procedure, the estimates obtained from the original data set are used as priors in an MCMC run using the new data only. Results of simulation studies indicated that the second procedure tended to lead to some loss of precision in the parameter estimates. However, in the simulation studies presented here, this loss was limited. On the other hand, the gain in computation time for the second method was not substantial either.
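The contrast between the two updating procedures can be illustrated with a toy conjugate model (a normal mean with known variance, not the ICM itself). In closed form the posterior-as-prior update reproduces the combined-data posterior exactly; the loss of precision reported above arises because, with MCMC, the first-stage posterior can only be approximated before being recycled as a prior. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
mu_true, sigma = 0.8, 1.0               # known sigma for simplicity
old = rng.normal(mu_true, sigma, 200)   # original calibration data
new = rng.normal(mu_true, sigma, 50)    # new (e.g., adaptive) data

def posterior(data, m0, v0, sigma):
    """Normal-normal conjugate update: prior N(m0, v0) -> posterior."""
    n = len(data)
    v1 = 1.0 / (1.0 / v0 + n / sigma**2)
    m1 = v1 * (m0 / v0 + data.sum() / sigma**2)
    return m1, v1

# Procedure 1: one run on the combined data set.
m_comb, v_comb = posterior(np.concatenate([old, new]), 0.0, 10.0, sigma)

# Procedure 2: the posterior from the old data serves as the prior
# for an update on the new data only.
m_old, v_old = posterior(old, 0.0, 10.0, sigma)
m_seq, v_seq = posterior(new, m_old, v_old, sigma)

print(m_comb, m_seq)   # agree up to floating point in this conjugate case
print(v_comb, v_seq)   # with MCMC, approximating the first-stage posterior
                       # is where the (limited) loss of precision enters
```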
The impact of parameter estimation on computerized adaptive testing with item cloning
Item cloning techniques can greatly reduce the cost of item writing and enhance the flexibility of item presentation. An important consequence of cloning is that it may cause variability in the item parameters. Recently, Glas and van der Linden (2005) proposed a multilevel item response model in which it is assumed that the item parameters of a 3-parameter logistic (3PL) model or a 3-parameter normal ogive (3PNO) model are sampled from a multivariate normal distribution associated with a parent item. In what follows, this model is referred to as the item cloning model (ICM). Several procedures for item bank calibration and computerized adaptive testing (CAT) were proposed. The latter procedures were developed under the usual assumption that the item parameters are known. In practice, however, item parameters have to be estimated, which introduces an error component that can have substantial effects. For the standard 3PL model, van der Linden and Glas (2000, 2001) show that capitalization on estimation error can lead to a substantial loss of precision. In the present report, this finding is corroborated for the ICM. It is shown that the problem can be solved by a Bayesian item selection procedure in which the uncertainty about the item parameters is taken into account by using their posterior distributions. These posterior distributions are generated using the Gibbs sampler. A simulation study is presented to illustrate the performance of the method.
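A minimal sketch of the idea behind such a Bayesian selection criterion (hypothetical function names; the exact criterion in the report may differ): rather than evaluating Fisher information at point estimates of the item parameters, average it over the retained Gibbs draws.

```python
import numpy as np

def p3pl(theta, a, b, c):
    """3PL response probability."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def info_3pl(theta, a, b, c):
    """Standard Fisher information of a 3PL item at theta."""
    p = p3pl(theta, a, b, c)
    return a**2 * ((p - c) / (1.0 - c))**2 * (1.0 - p) / p

def posterior_weighted_info(theta_hat, draws):
    """Average item information over posterior draws of (a, b, c).

    draws : array of shape (n_draws, 3), e.g. retained Gibbs samples,
            one row per sampled (a, b, c) triple for this item.
    """
    a, b, c = draws[:, 0], draws[:, 1], draws[:, 2]
    return info_3pl(theta_hat, a, b, c).mean()

# Hypothetical comparison for one item: plug-in vs. posterior-averaged.
rng = np.random.default_rng(3)
draws = np.column_stack([rng.normal(1.2, 0.15, 500),
                         rng.normal(0.0, 0.20, 500),
                         rng.beta(5, 20, 500)])
print(info_3pl(0.0, 1.2, 0.0, 0.2))        # point-estimate criterion
print(posterior_weighted_info(0.0, draws))  # accounts for uncertainty
```

Selecting on the posterior-averaged criterion penalizes items whose apparent information rests on poorly determined parameters, which is how capitalization on estimation error is avoided.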
Quality control of online calibration in computerized assessment
In computerized adaptive testing, updating item parameter estimates using adaptive testing data is often called online calibration. This study investigated how to evaluate whether the adaptive testing data used for online calibration sufficiently fit the item response model used. Three approaches were investigated, based on a Lagrange multiplier (LM) statistic, a Wald statistic, and a cumulative sum (CUSUM) statistic. The power of the tests was evaluated in a number of simulation studies. It was found that the tests had moderate to good power to detect shifts in the values of the guessing and difficulty parameters, and that all tests were equally sensitive to shifts in the values of all parameters. The practical conclusion is that each of these statistics can be used to detect that something has happened to the item parameters, but that it may be difficult to attribute the problem to specific parameters.
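For reference, the Wald variant compares item parameter estimates from two calibration stages; in generic form (assumed notation, not the report's exact parameterization):

\[
W = (\hat{\beta}_1 - \hat{\beta}_0)^{\top} \, \big(\widehat{\mathrm{Var}}(\hat{\beta}_1 - \hat{\beta}_0)\big)^{-1} (\hat{\beta}_1 - \hat{\beta}_0),
\]

asymptotically chi-square with degrees of freedom equal to the number of parameters compared. A significant value flags an item whose parameters appear to have drifted; consistent with the conclusion above, it does not by itself pinpoint which parameter moved.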
Likelihood-based statistics for validating continuous response models
The theory for the estimation and testing of item response theory (IRT) models for items with discrete responses is by now very thoroughly developed. In contrast, the estimation and testing theory for IRT models for items with continuous responses has hardly received any attention, mainly because the continuous response format is seldom used. An exception is the so-called analogue-scale item format, where a respondent marks a position on a line to express his or her opinion about a topic. Recently, continuous responses have attracted interest as covariates accompanying discrete responses; one may think of the response time needed to answer an item in a computerized adaptive testing situation. In the present report, the theory of estimating and testing a model for continuous responses, the model proposed by Mellenbergh (1994), is developed in a marginal maximum likelihood framework. It is shown that the fit of the model can be evaluated using Lagrange multiplier tests. Simulation studies show that these tests have excellent properties in terms of control of the Type I error rate and power.
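A common statement of Mellenbergh's (1994) model for continuous responses is a congeneric (linear factor-analytic) measurement model per item; the notation below is assumed, not quoted from the report:

\[
x_{ij} = \mu_i + \lambda_i \theta_j + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma_i^2),
\]

with item intercept \(\mu_i\), loading \(\lambda_i\), residual variance \(\sigma_i^2\), and latent proficiency \(\theta_j\). Marginal maximum likelihood integrates \(\theta_j\) out of the likelihood, and the Lagrange multiplier tests evaluate restrictions on these item parameters, analogous to the score tests used for discrete-response IRT models above.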