4,482 research outputs found

    On Wald tests for differential item functioning detection

    Wald-type tests are a common procedure for DIF detection among the IRT-based methods. However, the empirical type I error rate of these tests departs from the significance level. In this paper, two reasons that explain this discrepancy are discussed and a new procedure is proposed. The first reason is related to the equating coefficients used to convert the item parameters to a common scale: they are treated as known constants although they are in fact estimated. The second reason is related to the parameterization used to estimate the item parameters, which differs from the usual IRT parameterization. Since the item parameters in the usual IRT parameterization are obtained in a second step, the corresponding covariance matrix is approximated using the delta method. The proposal of this article is to account for the estimation of the equating coefficients by treating them as random variables and to use the untransformed (i.e. not reparameterized) item parameters in the computation of the test statistic. A simulation study compares the performance of this new proposal with the currently used procedure. Results show that the new proposal gives type I error rates closer to the significance level.
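
    The quadratic form behind such tests is easy to state. Below is a minimal sketch of a Wald DIF test for one item, assuming 2PL parameter estimates for the two groups already on a common scale; all names are illustrative, and the covariance inputs are where the paper's correction (propagating equating-coefficient uncertainty) would enter:

```python
import numpy as np
from scipy.stats import chi2

def wald_dif_test(est_ref, est_foc, cov_ref, cov_foc):
    """Wald test of equal item parameters across groups.

    est_ref, est_foc : length-2 arrays of (a, b) estimates for the
        reference and focal group, already on a common scale.
    cov_ref, cov_foc : 2x2 covariance matrices of those estimates.
        Per the paper, these should also reflect the sampling error
        of the estimated equating coefficients; treating the
        coefficients as known constants understates this variance.
    """
    diff = est_ref - est_foc
    cov = cov_ref + cov_foc  # groups are estimated independently
    stat = float(diff @ np.linalg.solve(cov, diff))
    return stat, chi2.sf(stat, df=diff.size)
```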

    Anchor selection strategies for DIF analysis: Review, assessment, and new approaches

    Differential item functioning (DIF) indicates a violation of the invariance assumption, for instance in models based on item response theory (IRT). For item-wise DIF analysis using IRT, a common metric for the item parameters of the groups that are to be compared (e.g. the reference and the focal group) is necessary. In the Rasch model, the same linear restriction is therefore imposed in both groups. Items in the restriction are termed anchor items. Ideally, these items are DIF-free to avoid artificially augmented false alarm rates. However, how to select DIF-free anchor items appropriately is still a major challenge. Furthermore, various authors point out the lack of new anchor selection strategies and the lack of a comprehensive comparison study, especially for dichotomous IRT models. This article reviews existing anchor selection strategies that require no knowledge prior to the DIF analysis, offers a straightforward notation, and proposes three new anchor selection strategies. An extensive simulation study compares the performance of the anchor selection strategies. The results show that an appropriate anchor selection is crucial for suitable item-wise DIF analysis. The newly suggested anchor selection strategies outperform the existing ones and can reliably locate a suitable anchor when the sample sizes are large enough.
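
    Many strategies in this literature reduce to ranking items by some preliminary per-item DIF statistic and anchoring on the items with the least evidence of DIF. A toy sketch of that shared ranking step, with all names hypothetical:

```python
import numpy as np

def rank_based_anchor(dif_stats, anchor_len=4):
    """Pick anchor items by ranking preliminary DIF statistics.

    dif_stats : per-item DIF statistics (e.g. absolute test statistics
        computed with all other items as a provisional anchor); smaller
        values mean less evidence of DIF. The article compares several
        such ranking criteria; this sketch shows only the common idea.
    """
    order = np.argsort(np.asarray(dif_stats))
    return order[:anchor_len].tolist()  # indices of the anchor items
```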

    Anchor methods for DIF detection: A comparison of the iterative forward, backward, constant and all-other anchor class

    In the analysis of differential item functioning (DIF) using item response theory (IRT), a common metric is necessary to compare item parameters between groups of test-takers. In the Rasch model, the same restriction is placed on the item parameters in each group in order to define a common metric. However, how the items in the restriction, termed anchor items, are selected appropriately is still a major challenge. This article proposes a conceptual framework for categorizing anchor methods: the anchor class, which describes characteristics of the anchor methods, and the anchor selection strategy, which guides how the anchor items are determined. Furthermore, a new anchor class termed the iterative forward anchor class is proposed. Several anchor classes are implemented with two different anchor selection strategies (the all-other and the single-anchor selection strategy) and are compared in an extensive simulation study. The results show that the newly proposed anchor class combined with the single-anchor selection strategy is superior in situations where no prior knowledge about the direction of DIF is available. Moreover, it is shown that the proportion of DIF items in the anchor, rather than whether the anchor includes any DIF items at all (termed contamination in previous studies), is crucial for suitable DIF analysis.
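
    The iterative forward idea can be sketched in a few lines: start from a short anchor, then repeatedly add the candidate item showing the least DIF against the current anchor. A schematic sketch, with `dif_stat` standing in for any per-item DIF statistic and the stopping rule simplified to a fixed anchor length:

```python
def iterative_forward_anchor(n_items, dif_stat, anchor_len=4):
    """Schematic iterative forward anchor selection.

    dif_stat(item, anchor) -> float : placeholder for a per-item DIF
        statistic (e.g. Wald or LR) computed against the given anchor
        set; smaller means less evidence of DIF. The actual class
        grows the anchor until it matches the number of currently
        non-significant items; a fixed length keeps the sketch short.
    """
    candidates = set(range(n_items))
    # seed with the single item showing the least DIF overall
    anchor = [min(candidates, key=lambda j: dif_stat(j, []))]
    candidates -= {anchor[0]}
    while len(anchor) < anchor_len and candidates:
        best = min(candidates, key=lambda j: dif_stat(j, anchor))
        anchor.append(best)
        candidates.discard(best)
    return anchor
```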

    Longitudinal Differential Item Functioning Detection Using Bifactor Models and the Wald Test

    The use of longitudinal data for studying cross-time changes is built on the key assumption that properties (e.g., slopes and intercepts) of the repeatedly used items remain unchanged over time. True changes in the latent variables are indistinguishable from item-level changes when items exhibit differential item functioning (DIF) across time points. To date, no research has extended the modified Wald test to longitudinal DIF detection. The current Monte Carlo simulation study proposes and evaluates a new approach, which pairs the versatile bifactor model with the modified Wald test, for detecting longitudinal DIF. Power and Type I error rates of the DIF tests under the new approach are reported for conditions with varied proportions of known anchors and different types of standard error estimation procedures. The new approach is also compared to DIF methods based on a misspecified unidimensional model that assumes independence among the factors and items. An applied example is provided, along with the flexMIRT script and the R code used for model calibration and DIF analysis, respectively. Limitations of the current study and future research directions are discussed.
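
    In this setting the null hypothesis is that an item's parameters are equal across time points within a single calibration, which a Wald test can express through a contrast matrix. A minimal sketch under that reading; the bifactor calibration itself (done in flexMIRT in the paper) is assumed to have produced the stacked estimates and their covariance matrix, and all names are illustrative:

```python
import numpy as np
from scipy.stats import chi2

def wald_equality_across_time(beta, Sigma, C):
    """Wald test of H0: C @ beta = 0.

    beta  : stacked estimates of one item's parameters at two time
            points, e.g. (a_1, b_1, a_2, b_2).
    Sigma : covariance matrix of beta; the study compares different
            standard-error procedures for obtaining it.
    C     : contrast matrix encoding equality across time.
    """
    d = C @ beta
    stat = float(d @ np.linalg.solve(C @ Sigma @ C.T, d))
    return stat, chi2.sf(stat, df=C.shape[0])

# e.g. test a_1 = a_2 and b_1 = b_2 jointly:
C = np.array([[1.0, 0.0, -1.0, 0.0],
              [0.0, 1.0, 0.0, -1.0]])
```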

    An Examination of the MIMIC Method for Detecting DIF and Comparison to the IRT Likelihood Ratio and Wald Tests.

    Ph.D. Thesis. University of Hawaiʻi at Mānoa, 2018.

    Differential item functioning in the Patient Reported Outcomes Measurement Information System Pediatric Short Forms in a sample of children and adolescents with cerebral palsy.

    AIM: The present study examined the Patient Reported Outcomes Measurement Information System (PROMIS) Mobility, Fatigue, and Pain Interference Short Forms (SFs) in children and adolescents with cerebral palsy (CP) for the presence of differential item functioning (DIF) relative to the original calibration sample. METHOD: Using the Graded Response Model, we compared item parameter estimates generated from a sample of 303 children and adolescents with CP (175 males, 128 females; mean age 15y 5mo) to parameter estimates from the PROMIS calibration sample, which served as the reference group. DIF was assessed in a two-step process using the item response theory likelihood-ratio (IRT-LR) DIF detection procedure. RESULTS: Significant DIF was identified for four of eight items in the PROMIS Mobility SF, for two of eight items in the Pain Interference Scale, and for one item out of 10 on the Fatigue Scale. The impact of DIF on total score estimation was notable for Mobility and Pain Interference, but not for Fatigue. INTERPRETATION: Results suggest differences in how adolescents with CP respond to some items on the PROMIS Mobility and Pain Interference SFs. Cognitive interviews about the PROMIS items with adolescents with varying degrees of mobility limitations would provide a better understanding of how they interpret and select responses, and would thus help guide selection of the most appropriate way to address this issue.
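
    The IRT-LR procedure used here compares a model with the studied item's parameters constrained equal across groups to one where they are free. A sketch of the bookkeeping only (the graded-response fits themselves come from IRT software, and all names are illustrative):

```python
from scipy.stats import chi2

def irt_lr_dif(loglik_constrained, loglik_free, n_freed_params):
    """Likelihood-ratio DIF test for one studied item.

    Twice the log-likelihood difference between the free and the
    constrained fit is referred to a chi-square whose df equals the
    number of freed parameters (for a graded-response item, the
    slope plus its category thresholds).
    """
    g2 = 2.0 * (loglik_free - loglik_constrained)
    return g2, chi2.sf(g2, df=n_freed_params)
```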

    A Comparison of Differential Item Functioning Detection Methods in Cognitive Diagnostic Models

    As a class of discrete latent variable models, cognitive diagnostic models have been widely researched in education, psychology, and many other disciplines. Detecting and eliminating differential item functioning (DIF) items from cognitive diagnostic tests is of great importance for test fairness and validity. A Monte Carlo study with varying manipulated factors was carried out to investigate the performance of the Mantel-Haenszel (MH), logistic regression (LR), and Wald tests based on item-wise information, cross-product information, observed information, and sandwich-type covariance matrices (denoted by Wd, WXPD, WObs, and WSw, respectively) for DIF detection. The results showed that (1) the WXPD and LR methods had the best performance in controlling Type I error rates among the six methods investigated in this study, and (2) under the uniform DIF condition, when the item quality was high or medium, the power of WXPD, WObs, and WSw was comparable with or superior to that of MH and LR, but when the item quality was low, WXPD, WObs, and WSw were less powerful than MH and LR. Under the non-uniform DIF condition, the power of WXPD, WObs, and WSw was comparable with or higher than that of LR.
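
    For reference, the Mantel-Haenszel procedure that the Wald variants are benchmarked against pools 2x2 tables over matched score strata. A minimal sketch of the common odds ratio it is built on (the chi-square version adds a continuity-corrected test on the same tables):

```python
import numpy as np

def mantel_haenszel_odds_ratio(tables):
    """Mantel-Haenszel common odds ratio across matching strata.

    tables : iterable of 2x2 integer arrays [[A, B], [C, D]], one per
        matched stratum, with rows = reference/focal group and
        columns = correct/incorrect on the studied item. A value far
        from 1 signals uniform DIF on that item.
    """
    num = sum(t[0, 0] * t[1, 1] / t.sum() for t in tables)
    den = sum(t[0, 1] * t[1, 0] / t.sum() for t in tables)
    return num / den
```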

    A Comparison of Six DIF Detection Methods

    In the context of educational measurement, a test item is identified as differentially functioning across groups when the probability of an examinee's response to it depends on group membership after conditioning on ability. Methods for detecting uniform and nonuniform DIF have been studied and refined over decades to improve the validity of tests. The current study examined and compared the effectiveness of six DIF detection methods: the Mantel-Haenszel (MH) procedure, the logistic regression procedure, the multiple indicators multiple causes (MIMIC) model, the item response theory likelihood-ratio test (IRT-LR), Lord's IRT-based Wald test, and a randomization test based on an R-square change statistic. A simulation study was conducted in which the manipulated factors were the percentage of DIF items (%DIF), sample size (number of examinees in each group), test length (number of items in the test), type and magnitude of DIF, and the mean ability difference between groups of examinees. The results showed that the MIMIC model had the greatest power in detecting uniform DIF items, as well as nonuniform DIF items with longer tests. The logistic regression method and the randomization test are quite efficient in detecting uniform DIF items, but the randomization test only applies when the two groups have the same mean ability. The IRT methods are more useful for detecting nonuniform DIF items. The percentage of DIF items has little effect on the power of each method, while most methods perform better when detecting large-magnitude DIF than small, and when the sample size for each group is large.
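
    Of these six, the logistic regression procedure is the most direct to sketch: the item response is regressed on the matching score, group, and their interaction, and nested fits are compared. A minimal sketch with illustrative names (an R-square change statistic like the one the randomization test resamples could be computed from the same three fits):

```python
import numpy as np
import statsmodels.api as sm

def logistic_regression_dif(y, score, group):
    """Swaminathan-Rogers style logistic regression DIF screen.

    Fits three nested logit models for one item:
        M0: score                       (baseline)
        M1: score + group               (uniform DIF step)
        M2: score + group + score*group (nonuniform DIF step)
    Returns the two likelihood-ratio chi-squares (each on 1 df).
    """
    X0 = sm.add_constant(np.column_stack([score]))
    X1 = sm.add_constant(np.column_stack([score, group]))
    X2 = sm.add_constant(np.column_stack([score, group, score * group]))
    ll = [sm.Logit(y, X).fit(disp=0).llf for X in (X0, X1, X2)]
    return 2 * (ll[1] - ll[0]), 2 * (ll[2] - ll[1])
```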

    A Differential Response Functioning Framework for Understanding Item, Bundle, and Test Bias

    This dissertation extends the parametric sampling method and area-based statistics for differential test functioning (DTF) proposed by Chalmers, Counsell, and Flora (2016). Measures for differential item and bundle functioning are first introduced as a special case of the DTF statistics. Next, these extensions are presented in concert with the original DTF measures as a unified framework for quantifying differential response functioning (DRF) of items, bundles, and tests. To evaluate the utility of the new family of measures, the DRF framework is compared to the previously established simultaneous item bias test (SIBTEST) and differential functioning of items and tests (DFIT) frameworks. A series of Monte Carlo simulation conditions was designed to estimate the power to detect differential effects when compensatory and non-compensatory differential effects are present, as well as to evaluate Type I error control. Benefits inherent to the DRF framework are discussed, extensions are suggested, and alternative methods for generating composite-level sampling variability are presented. Finally, it is argued that the area-based measures in the DRF framework provide an intuitive and meaningful quantification of marginal and conditional response bias over and above what has been offered by the previously established statistical frameworks.
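
    The area-based idea underlying these measures is the (weighted) area between two groups' response curves for the same item. A minimal sketch for a single 2PL item with a normal focal-group weight; the DRF framework's parametric sampling for standard errors is not reproduced here, and all names are illustrative:

```python
import numpy as np

def unsigned_area_dif(a_ref, b_ref, a_foc, b_foc, mu=0.0, sd=1.0):
    """Unsigned weighted area between two 2PL item response curves.

    The curves are evaluated on a theta grid and their absolute
    difference is integrated against a normal density (mean mu,
    standard deviation sd) standing in for the focal distribution.
    """
    theta = np.linspace(-6.0, 6.0, 601)
    dt = theta[1] - theta[0]

    def icc(a, b):  # 2PL item characteristic curve
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    w = np.exp(-0.5 * ((theta - mu) / sd) ** 2)
    w /= w.sum() * dt  # normalize the weights to a density
    return float(np.sum(np.abs(icc(a_ref, b_ref) - icc(a_foc, b_foc)) * w) * dt)
```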