44 research outputs found

    An Evaluation of DIF Tests in Multistage Tests for Continuous Covariates

    Get PDF
    Multistage tests are a widely used and efficient type of test presentation that aims to provide accurate ability estimates while keeping the test relatively short. Multistage tests typically rely on the psychometric framework of item response theory. Violations of item response models and other assumptions underlying a multistage test, such as differential item functioning (DIF), can lead to inaccurate ability estimates and unfair measurements. There is a practical need for methods to detect problematic model violations to avoid these issues. This study compares and evaluates three methods for the detection of differential item functioning with regard to continuous person covariates in data from multistage tests: a linear logistic regression test and two adaptations of a recently proposed score-based DIF test. While all tests show a satisfactory Type I error rate, the score-based tests show greater power against three types of DIF effects.
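    The logistic regression approach mentioned above can be sketched as follows: regress item correctness on a matching score and the continuous covariate, and compare nested models with a likelihood-ratio test. Below is a minimal numpy sketch of that general idea, not the authors' implementation; the simulated data, effect sizes, and the Newton-Raphson fitting routine are illustrative assumptions.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Fit logistic regression by Newton-Raphson; return coefficients and log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hess + 1e-8 * np.eye(X.shape[1]), grad)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    ll = np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    return beta, ll

rng = np.random.default_rng(1)
n = 2000
score = rng.normal(size=n)          # stand-in for the matching ability score
covariate = rng.normal(size=n)      # continuous person covariate, e.g. age
# one item with uniform DIF along the covariate (illustrative effect of 0.5 on the logit)
logit = -0.2 + 1.2 * score + 0.5 * covariate
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

ones = np.ones(n)
_, ll0 = fit_logistic(np.column_stack([ones, score]), y)              # null: no DIF
_, ll1 = fit_logistic(np.column_stack([ones, score, covariate]), y)   # full: covariate effect
lr_stat = 2 * (ll1 - ll0)           # ~ chi^2(1) under the no-DIF null
print(round(lr_stat, 1))
```

    Non-uniform DIF would add a score-by-covariate interaction term to the full model and cost one more degree of freedom in the reference distribution.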

    Controlling the Speededness of Assembled Test Forms: A Generalization to the Three-Parameter Lognormal Response Time Model

    Get PDF
    When designing or modifying a test, an important challenge is controlling its speededness. To achieve this, van der Linden (2011a, 2011b) proposed using a lognormal response time model, more specifically the two-parameter lognormal model, and automated test assembly (ATA) via mixed integer linear programming. However, this approach has a severe limitation: the two-parameter lognormal model lacks a slope parameter, which means it assumes that all items are equally speed sensitive. From a conceptual perspective, this assumption seems very restrictive. Furthermore, various other empirical studies and new data analyses performed by us show that this assumption almost never holds in practice. To overcome this shortcoming, we bring together the frequently used three-parameter lognormal model for response times, which contains a slope parameter, and van der Linden's ATA approach for controlling speededness. The proposed extension is demonstrated with multiple empirically based illustrations, including complete and documented R code. Both the original van der Linden approach and our newly proposed approach are available to practitioners in the freely available R package eatATA.
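    Under a three-parameter lognormal model with a slope, log response time is typically taken to be normal with a mean of the item's time intensity minus its slope times the examinee's speed, so the expected item time is the lognormal mean exp(mu + sigma^2/2). The sketch below illustrates why the slope matters for speededness control, assuming that common parameterization; all parameter values are invented for illustration and this is not the eatATA API.

```python
import numpy as np

rng = np.random.default_rng(7)
n_items = 30
beta  = rng.normal(4.0, 0.3, n_items)   # time intensity (log-seconds)
phi   = rng.uniform(0.5, 1.5, n_items)  # speed sensitivity (slope)
sigma = rng.uniform(0.2, 0.4, n_items)  # residual SD on the log scale

def expected_time(tau, slopes):
    """Expected total time (seconds) of the form for an examinee with speed tau."""
    mu = beta - slopes * tau
    return np.sum(np.exp(mu + sigma**2 / 2))  # lognormal mean per item, summed

slow_3p = expected_time(-1.0, phi)               # item-specific slopes (3P model)
slow_2p = expected_time(-1.0, np.ones(n_items))  # common slope of 1 (2P assumption)
print(round(slow_3p), round(slow_2p))
```

    For examinees away from average speed, the two expected totals diverge, which is exactly where a constant-slope assumption distorts the speededness constraint in assembly.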

    Automated Test Assembly in R: The eatATA Package

    Get PDF
    Combining items from an item pool into test forms (test assembly) is a frequent task in psychological and educational testing. Although efficient methods for automated test assembly exist, these are often unknown or unavailable to practitioners. In this paper we present the R package eatATA, which allows using several mixed-integer programming solvers for automated test assembly in R. We describe the general functionality and the common workflow of eatATA using a minimal example. We also provide four more elaborate use cases of automated test assembly: (a) the assembly of multiple test forms for a pilot study; (b) the assembly of blocks of items for a multiple matrix booklet design in the context of a large-scale assessment; (c) the assembly of two linear test forms for individual diagnostic purposes; (d) the assembly of multistage testing modules for individual diagnostic purposes. All use cases are accompanied by example item pools and commented R code.
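    At its core, test assembly means selecting items that satisfy constraints while optimizing an objective; eatATA hands this to mixed-integer programming solvers. As a toy stand-in for that optimization, the sketch below brute-forces a single 5-item form from a small pool so its mean difficulty matches a target (pool values, form length, and target are invented for illustration):

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(3)
pool_difficulty = rng.normal(0.0, 1.0, 12)   # small illustrative item pool
form_length, target = 5, 0.0                 # want 5 items with mean difficulty near 0

# enumerate all candidate forms and keep the one closest to the target
best, best_dev = None, np.inf
for form in combinations(range(len(pool_difficulty)), form_length):
    dev = abs(pool_difficulty[list(form)].mean() - target)
    if dev < best_dev:
        best, best_dev = form, dev

print(best, round(best_dev, 3))
```

    Enumeration explodes combinatorially with realistic pool sizes and multiple constraints, which is why MILP solvers, as wrapped by eatATA, are the practical tool.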

    Detecting differential item functioning in 2PL multistage assessments

    Full text link
    The detection of differential item functioning (DIF) is crucial for the psychometric evaluation of multistage tests. This paper discusses five approaches presented in the literature: logistic regression, SIBTEST, analytical score-based tests, bootstrap score-based tests, and permutation score-based tests. First, using a simulation study inspired by a real-life large-scale educational assessment, we compare the five approaches with respect to their Type I error rate and their statistical power. Then, we present an application to an empirical data set. We find that all approaches show Type I error rates close to the nominal alpha level. Furthermore, all approaches are sensitive to uniform and non-uniform DIF effects, with the score-based tests showing the highest power.
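    A permutation test approximates the null distribution of a DIF statistic by reshuffling group labels, which sidesteps distributional assumptions about the statistic. The sketch below illustrates only that general permutation logic with a deliberately simple statistic (difference in proportion correct between groups), not the score-based statistic evaluated in the paper; data and effect size are invented.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
group = rng.integers(0, 2, n)                 # reference (0) vs focal (1) group
# item responses with a uniform DIF effect disadvantaging the focal group
p_correct = np.where(group == 1, 0.55, 0.70)
y = (rng.random(n) < p_correct).astype(float)

def stat(labels):
    """Absolute difference in proportion correct between the two groups."""
    return abs(y[labels == 1].mean() - y[labels == 0].mean())

observed = stat(group)
# null distribution: recompute the statistic under random relabelings
null = np.array([stat(rng.permutation(group)) for _ in range(2000)])
p_value = (1 + np.sum(null >= observed)) / (1 + len(null))
print(round(observed, 3), round(p_value, 4))
```

    The "+1" in numerator and denominator is the usual correction that keeps a permutation p-value strictly positive.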

    Preliminary evidence for an increased likelihood of a stable trajectory in mild cognitive impairment in individuals with higher motivational abilities

    Full text link
    BACKGROUND: Motivational abilities (MA), which describe skills related to goal-oriented behavior, have recently been found to be associated with neuropathological aging. Here we examine the impact of MA on the long-term course of mild cognitive impairment (MCI). METHODS: We followed up N = 64 individuals diagnosed with MCI (M = 73 years, 44% female) for 3 years. MA were assessed by long-term informants of the participants using two scales, motivation and decision regulation [Volitional Components Questionnaire, VCQ (Kuhl and Fuhrmann, Decomposing self-regulation and self-control: the volitional components inventory, 1998)]. Cognitive abilities were assessed with the Mini-Mental State Examination (J Psychiatr Res 12:189-98, 1975). Survival analyses and multilevel modeling (MLM) were applied to determine the predictive effect of informant-rated MA at baseline on the likelihood of MCI stability and on the trajectory of cognitive abilities. RESULTS: Fifty percent (n = 32) of the MCI participants remained stable, while 32.8% (n = 21) converted to Alzheimer's disease (AD) and 17.2% (n = 11) dropped out. Survival analyses revealed that MCI cases with higher-rated MA at baseline were more likely to show a stable course of MCI over 3 years (p = 0.036) when controlling for demographic characteristics and executive function. MLM analyses indicated that higher informant-rated MA at baseline were significantly related to higher cognitive abilities, even when controlling for MCI subtype (p = 0.030). CONCLUSIONS: This study provides preliminary longitudinal evidence that higher-rated MA at an early stage of MCI are associated with a lower risk of conversion to AD and with higher cognitive abilities.

    Student, school, and country differences in sustained test-taking effort in the 2009 PISA reading assessment

    Full text link
    In this article, the change in examinee effort during an assessment, which we refer to as persistence, is modeled as an effect of item position. A multilevel extension is proposed to analyze hierarchically structured data and decompose the individual differences in persistence. Data from the 2009 Programme for International Student Assessment (PISA) reading assessment from N = 467,819 students in 65 countries are analyzed with the proposed model, and the results are compared across countries. A decrease in examinee effort during the PISA reading assessment was found consistently across countries, with individual differences within and between schools. Both the decrease and the individual differences are more pronounced in lower performing countries. Within schools, persistence is slightly negatively correlated with reading ability; at the school level, however, this correlation is positive in most countries. The results of our analyses indicate that it is important to model and control for examinee effort in low-stakes assessments.
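    The persistence idea described above treats examinee effort as a person-specific linear effect of item position on the logit of a correct response. A minimal simulation sketch of that mechanism follows; all parameter values are illustrative assumptions, and item difficulties are held at zero so the position trend is easy to see.

```python
import numpy as np

rng = np.random.default_rng(11)
n_persons, n_items = 1000, 40
theta = rng.normal(0, 1, n_persons)           # ability
delta = rng.normal(-0.02, 0.01, n_persons)    # persistence: logit change per position
pos = np.arange(n_items)

# Rasch-type model with a person-specific linear position effect;
# a negative average delta encodes declining effort over the test
logits = theta[:, None] + delta[:, None] * pos[None, :]
y = rng.random((n_persons, n_items)) < 1 / (1 + np.exp(-logits))

early, late = y[:, :10].mean(), y[:, -10:].mean()
print(round(early, 3), round(late, 3))
```

    A multilevel extension would further split the variance of delta into within-school and between-school components, as the paper does for the PISA data.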

    Item-position effects and missing responses in large-scale assessments: Models and applications

    No full text
    Psychological and educational assessments commonly consist of multiple items that are inevitably administered in a specific item order. Hence, effects related to the sequential ordering of items (i.e., item-position effects) may arise. Typical examples of reported item-position effects are a change in the difficulty of items in aptitude tests (i.e., so-called fatigue or practice effects), and an increase in the consistency of the responses in attitude and personality questionnaires (i.e., the so-called Socratic effect). Further, the serial ordering of items also plays a role in skipping items and dropping out before the end of the assessment, both of which are observed in large-scale educational assessments. Common psychometric models assume that only properties of the test taker and the item contribute to the item response, and that these properties are invariant with respect to the position in which the item is administered. This dissertation focuses on effects of the sequential ordering of items and how these effects can be investigated and modeled using Item Response Theory (IRT). Models are proposed, evaluated, and applied to empirical data. In Chapter 1, an IRT framework is proposed to model and investigate item-position effects in achievement tests with dichotomous items. Within the proposed framework, a variety of functions of item position can be added to both the item discrimination and the item difficulty parameter. Further, by introducing individual differences, the position effect on item difficulty can be interpreted as a test taker's persistence. A simulation study indicates that ignoring this item-position effect can result in biased estimates. Further, two empirical illustrations demonstrate the applicability of the modeling framework. In Chapter 2, a multilevel extension of the model with individual differences in ability and persistence is formulated and applied to the PISA 2009 reading assessment data. Persistence, which can be related to a change in examinee effort, is investigated across all participating countries, and the individual differences in ability and persistence are decomposed into a within-school and between-school part. A negative average persistence is found consistently across all countries. Both the negative average persistence and the variance in persistence prove to be stronger in the lower performing countries. Chapter 3 focuses on the Socratic effect in personality and attitude questionnaires. An IRT approach based on the Generalized Partial Credit Model is proposed, in which the Socratic effect is modeled as an item-position effect on the discrimination parameter. Evidence is found for a small linear Socratic effect in the CES-D, which is a commonly used screening instrument for depression. In Chapter 4, skipped and not-reached responses in educational assessments are modeled using a tree-based multidimensional IRT model with sequentially interconnected subprocesses (i.e., an IRTree). Responses and omissions are modeled jointly, taking into account that both test takers and items can contribute to the two types of omissions. A simulation study shows that when the missingness is not at random, ignoring missing responses can result in biased estimates, and that an IRTree can reduce this bias. The applicability of the IRTrees for response omissions is illustrated using the PISA 2009 reading assessment data in Argentina.
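    An IRTree of the kind described for omissions chains sequential subprocesses: a first node decides whether the test taker responds at all, and a second node determines correctness given a response, with person and item parameters at each node. A minimal two-node simulation sketch (the parameterization and all values are illustrative assumptions, not the dissertation's model):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(2)
n_persons, n_items = 500, 20
theta = rng.normal(0, 1, n_persons)    # ability (node 2: correct vs. incorrect)
xi = rng.normal(1.5, 1, n_persons)     # response propensity (node 1: respond vs. skip)
b = rng.normal(0, 1, n_items)          # item difficulty
d = rng.normal(0, 1, n_items)          # item skipping threshold

# node 1: does the person respond to the item at all?
respond = rng.random((n_persons, n_items)) < sigmoid(xi[:, None] - d[None, :])
# node 2: correctness, only observed when a response was given
correct = rng.random((n_persons, n_items)) < sigmoid(theta[:, None] - b[None, :])

data = np.where(respond, correct.astype(float), np.nan)  # NaN marks a skipped item
print(f"skip rate: {np.isnan(data).mean():.2f}")
```

    Fitting both nodes jointly is what lets person and item characteristics contribute to the omissions, and what can reduce bias when missingness is not at random.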

    Conditional permutation importance revisited

    Get PDF
    BACKGROUND: Random-forest-based variable importance measures have become popular tools for assessing the contributions of the predictor variables in a fitted random forest. In this article we reconsider a frequently used variable importance measure, the Conditional Permutation Importance (CPI). We argue and illustrate that the CPI corresponds to a more partial quantification of variable importance, and we suggest several improvements in its methodology and implementation that enhance its practical value. In addition, we introduce the threshold value in the CPI algorithm as a parameter that can make the CPI more partial or more marginal. RESULTS: By means of extensive simulations, with the original version of the CPI as the reference, we examine the impact of the proposed methodological improvements. The simulation results show how the improved CPI methodology increases the interpretability and stability of the computations. In addition, the newly proposed implementation decreases the computation times drastically and is more widely applicable. The improved CPI algorithm is made freely available as an add-on package to the open-source software R. CONCLUSION: The proposed methodology and implementation of the CPI is computationally faster and leads to more stable results. It has a beneficial impact on practical research by making random forest analyses more interpretable.
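    The contrast between marginal and conditional permutation importance can be illustrated even without a random forest: permuting a predictor within strata of its correlates, rather than across the whole sample, yields the more partial quantification. The numpy sketch below uses a linear model and quartile binning as stand-ins for the forest and the CPI's conditioning scheme; every modeling choice here is an illustrative assumption, not the CPI algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
x1 = rng.normal(size=n)                              # true driver of y
x2 = 0.9 * x1 + np.sqrt(0.19) * rng.normal(size=n)   # correlated with x1, no direct effect
y = x1 + 0.5 * rng.normal(size=n)

# deliberately misspecified model: y regressed on x2 alone,
# so x2 picks up x1's effect through their correlation
X = np.column_stack([np.ones(n), x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
mse0 = np.mean((y - X @ beta) ** 2)

def perm_importance(perm):
    """Increase in MSE when x2 is replaced by a permuted copy."""
    Xp = X.copy()
    Xp[:, 1] = X[perm, 1]
    return np.mean((y - Xp @ beta) ** 2) - mse0

marginal = perm_importance(rng.permutation(n))       # classic permutation importance

# conditional: permute x2 only within quartile bins of its correlate x1
bins = np.searchsorted(np.quantile(x1, [0.25, 0.5, 0.75]), x1)
perm = np.arange(n)
for g in range(4):
    idx = np.where(bins == g)[0]
    perm[idx] = rng.permutation(idx)
conditional = perm_importance(perm)

print(round(marginal, 2), round(conditional, 2))
```

    Because x2 has no effect beyond its correlation with x1, conditioning on x1 shrinks its apparent importance; coarser or finer binning would move the result between the marginal and partial extremes, which is the role the threshold parameter plays in the CPI.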
