37 research outputs found

    Application of the health assessment questionnaire disability index to various rheumatic diseases

    Purpose
    To investigate whether the Stanford Health Assessment Questionnaire Disability Index (HAQ-DI) can serve as a generic instrument for measuring disability across different rheumatic diseases, and to propose a scoring method based on item response theory (IRT) modeling to support this goal.

    Methods
    The HAQ-DI was administered to a cross-sectional sample of patients with confirmed rheumatoid arthritis (n = 619), osteoarthritis (n = 125), or gout (n = 102). The results were analyzed using the generalized partial credit model as an IRT model.

    Results
    Four of the eight item categories of the HAQ-DI displayed substantial differential item functioning (DIF) across the three diseases. Further, this DIF could be modeled using an IRT model with disease-specific item parameters, which produces measures that are comparable across the three diseases.

    Conclusion
    Although the HAQ-DI partially functioned differently in the three disease groups, measurement of patients' disability levels can be made comparable using IRT methods.
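    The generalized partial credit model used in this analysis can be sketched in a few lines of Python. The snippet below computes GPCM category probabilities, with disease-specific step parameters standing in for the DIF modeling; all parameter values are invented for illustration and are not taken from the study.

```python
import numpy as np

def gpcm_prob(theta, a, b):
    """Category probabilities under the generalized partial credit model.

    theta : latent disability level of a respondent
    a     : item discrimination parameter
    b     : step (threshold) parameters b_1, ..., b_m

    Returns probabilities for categories 0, ..., m.
    """
    # Cumulative sums of a * (theta - b_v); the category-0 term is 0 by convention.
    steps = np.concatenate(([0.0], np.cumsum(a * (theta - np.asarray(b)))))
    expo = np.exp(steps - steps.max())  # subtract max for numerical stability
    return expo / expo.sum()

# Disease-specific step parameters model DIF by letting the same item
# behave differently per disease group (hypothetical values).
params = {
    "rheumatoid_arthritis": {"a": 1.2, "b": [-0.5, 0.4, 1.3]},
    "osteoarthritis":       {"a": 1.2, "b": [-0.2, 0.7, 1.6]},
}
for disease, p in params.items():
    print(disease, gpcm_prob(theta=0.5, **p).round(3))
```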

    Methodological issues regarding power of classical test theory (CTT) and item response theory (IRT)-based approaches for the comparison of patient-reported outcomes in two groups of patients - a simulation study

    Background
    Patient-reported outcomes (PROs) are increasingly used in clinical and epidemiological research. Two main analytical strategies exist for these data: classical test theory (CTT), based on observed scores, and models from item response theory (IRT). However, whether IRT or CTT is the more appropriate method for analyzing PRO data remains unknown. The statistical properties of CTT and IRT, regarding power and corresponding effect sizes, were compared.

    Methods
    Two-group cross-sectional studies were simulated for the comparison of PRO data using IRT- or CTT-based analysis. For IRT, different scenarios were investigated according to whether item or person parameters were assumed to be known (for item parameters, with precision ranging from good to poor) or unknown and therefore estimated. The power obtained with IRT and CTT was compared, and the parameters with the strongest impact on power were identified.

    Results
    When person parameters were assumed to be unknown and item parameters either known or not, the power achieved using IRT or CTT was similar and always lower than the power expected from the well-known sample size formula for normally distributed endpoints. The number of items had a substantial impact on power for both methods.

    Conclusion
    Without any missing data, IRT and CTT seem to provide comparable power. The classical sample size formula for CTT seems adequate under some conditions but is not appropriate for IRT. For IRT, it seems important to take the number of items into account to obtain an accurate formula.
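    The CTT arm of such a simulation is straightforward to sketch. The hypothetical Python snippet below generates dichotomous responses under a Rasch model for two groups that differ in mean latent trait and estimates the empirical power of a t-test on observed sum scores; the IRT-based analysis, which would additionally estimate person parameters, is omitted for brevity.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)

def simulate_power(n_per_group=100, n_items=20, effect=0.5, n_sims=1000, alpha=0.05):
    """Empirical power of a two-group t-test on observed sum scores (CTT),
    with dichotomous responses generated under a Rasch model."""
    difficulties = rng.normal(0.0, 1.0, n_items)  # fixed item parameters
    rejections = 0
    for _ in range(n_sims):
        theta_a = rng.normal(0.0, 1.0, n_per_group)     # reference group
        theta_b = rng.normal(effect, 1.0, n_per_group)  # shifted group
        p_a = 1.0 / (1.0 + np.exp(-(theta_a[:, None] - difficulties)))
        p_b = 1.0 / (1.0 + np.exp(-(theta_b[:, None] - difficulties)))
        scores_a = (rng.random(p_a.shape) < p_a).sum(axis=1)  # sum scores
        scores_b = (rng.random(p_b.shape) < p_b).sum(axis=1)
        if ttest_ind(scores_a, scores_b).pvalue < alpha:
            rejections += 1
    return rejections / n_sims

print(f"empirical power: {simulate_power():.3f}")
```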

    Some recommendations for developing multidimensional computerized adaptive tests for patient-reported outcomes

    PURPOSE: Multidimensional item response theory and computerized adaptive testing (CAT) are increasingly used in mental health, quality of life (QoL), and patient-reported outcome measurement. Although multidimensional assessment techniques hold promise, they are more challenging to apply than unidimensional ones. The authors comment on minimal standards when developing multidimensional CATs.

    METHODS: Prompted by pioneering papers published in QLR, the authors reflect on existing guidance and discussions from different psychometric communities, including guidelines developed for unidimensional CATs in the PROMIS project.

    RESULTS: The commentary focuses on two key topics: (1) the design, evaluation, and calibration of multidimensional item banks and (2) how to study the efficiency and precision of a multidimensional item bank. The authors suggest that the development of a carefully designed and calibrated item bank encompasses a construction phase and a psychometric phase. With respect to efficiency and precision, item banks should be large enough to provide adequate precision over the full range of the latent constructs. Therefore, CAT performance should be studied as a function of the latent constructs and with reference to relevant benchmarks. Solutions are also suggested for simulation studies using real data, which often result in overly optimistic evaluations of an item bank's efficiency and precision.

    DISCUSSION: Multidimensional CAT applications are promising but complex statistical assessment tools that require detailed theoretical frameworks and methodological scrutiny when testing their appropriateness for practical applications. The authors advise researchers to evaluate item banks with a broad set of methods, describe their choices in detail, and substantiate their approach to validation.
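    One way to study precision as a function of the latent construct, as recommended above, is to compute the standard error of measurement across a grid of trait values. The sketch below does this for a unidimensional dichotomous 2PL bank (a deliberate simplification of the multidimensional case); the bank's item parameters are randomly generated placeholders.

```python
import numpy as np

def bank_precision(theta_grid, a, b):
    """Standard error of measurement across the latent trait for a
    dichotomous 2PL item bank: SE(theta) = 1 / sqrt(I(theta))."""
    a = np.asarray(a)[None, :]
    b = np.asarray(b)[None, :]
    p = 1.0 / (1.0 + np.exp(-a * (theta_grid[:, None] - b)))
    info = (a**2 * p * (1 - p)).sum(axis=1)  # test information function
    return 1.0 / np.sqrt(info)

rng = np.random.default_rng(7)
theta_grid = np.linspace(-3, 3, 13)
a = rng.uniform(0.8, 2.0, size=60)   # hypothetical 60-item bank
b = rng.normal(0.0, 1.0, size=60)
for t, se in zip(theta_grid, bank_precision(theta_grid, a, b)):
    # A common benchmark is SE < 0.32, roughly reliability 0.9.
    print(f"theta = {t:+.1f}  SE = {se:.3f}")
```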

    Application of multidimensional IRT models to longitudinal data

    The application of multidimensional item response theory (IRT) models to longitudinal educational surveys where students are repeatedly measured is discussed and exemplified. A marginal maximum likelihood (MML) method to estimate the parameters of a multidimensional generalized partial credit model for repeated measures is presented. It is shown that model fit can be evaluated using Lagrange multiplier tests. Two tests are presented: the first aims at evaluation of the fit of the item response functions and the second at the constancy of the item location parameters over time points. The outcome of the latter test is compared with an analysis using scatter plots and linear regression. An analysis of data from a school effectiveness study in Flanders (Belgium) is presented as an example of the application of these methods. In the example, it is evaluated whether the concepts "academic self-concept," "well-being at school," and "attentiveness in the classroom" were constant during the secondary school period.
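    The scatter-plot-and-regression check on the constancy of item locations can be illustrated as follows. The location estimates here are simulated placeholders rather than MML estimates, and the residual screen at the end is only a rough informal analogue of the Lagrange multiplier test described above.

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical item location (difficulty) estimates at two measurement
# occasions; in the paper these would come from MML estimation of the
# multidimensional generalized partial credit model.
rng = np.random.default_rng(3)
loc_t1 = rng.normal(0.0, 1.0, 25)
loc_t2 = loc_t1 + rng.normal(0.0, 0.15, 25)  # mostly stable items
loc_t2[4] += 1.0                             # one item drifts upward

fit = linregress(loc_t1, loc_t2)
print(f"slope={fit.slope:.2f} intercept={fit.intercept:.2f} r={fit.rvalue:.2f}")

# Flag items whose residual from the regression line is unusually large,
# an informal stand-in for the formal test of parameter constancy.
resid = loc_t2 - (fit.intercept + fit.slope * loc_t1)
flagged = np.where(np.abs(resid) > 2 * resid.std())[0]
print("items with unstable locations:", flagged)
```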

    Equating the HBSC Family Affluence Scale across survey years: a method to account for item parameter drift using the Rasch model

    Purpose
    To investigate the measurement invariance (MI) of the Family Affluence Scale (FAS) as measured in the Health Behavior in School-aged Children (HBSC) survey, and to describe a method for equating the scale when MI is violated across survey years.

    Methods
    This study used a sample of 14,076 Norwegian and 17,365 Scottish adolescents from the 2002, 2006, and 2010 HBSC surveys to investigate the MI of the FAS across survey years. Violations of MI, in the form of differential item functioning (DIF) due to item parameter drift (IPD), were modeled within the Rasch framework to ensure that FAS scores from different survey years remain comparable.

    Results
    The results indicate that the FAS is upwardly biased due to IPD in the computer item across survey years in both the Norwegian and Scottish samples. Ignoring this IPD led to the conclusion that family affluence increased quite consistently in Norway and Scotland. However, a large part of the increase in FAS scores can be attributed to bias from IPD over time: once the DIF in the computer item was accounted for, the increase in the FAS was more modest in Scotland and slightly negative in Norway.

    Conclusions
    When family affluence must be compared across HBSC survey years, or when its longitudinal implications are of interest, it is necessary to account for IPD when interpreting changes in family affluence over time.
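    The effect of IPD on equating can be made concrete with a small mean-mean linking sketch under the Rasch model. The item set and difficulty values below are invented for illustration (the real FAS items and estimates differ); the point is that including a drifting item in the anchor set inflates the linking constant, and with it the apparent rise in affluence.

```python
import numpy as np

# Hypothetical Rasch difficulties for FAS-like items estimated separately
# in two survey years (values invented for illustration).
items  = ["car", "own_bedroom", "holidays", "computer"]
b_2002 = np.array([0.30, -0.10, 0.55, 1.20])
b_2010 = np.array([0.28, -0.12, 0.57, 0.40])  # the computer item drifts easier

def link_constant(b_old, b_new, anchors):
    """Mean-mean linking constant that places new-year estimates on the
    old-year scale using the chosen anchor items."""
    return (b_old[anchors] - b_new[anchors]).mean()

all_items = np.arange(len(items))
stable    = np.array([0, 1, 2])  # exclude the drifting computer item

print("naive link (all items):    ", round(link_constant(b_2002, b_2010, all_items), 3))
print("purified link (stable set):", round(link_constant(b_2002, b_2010, stable), 3))
# The naive link shifts 2010 person measures upward, overstating the rise
# in family affluence; purifying the anchor set removes that bias.
```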