
    Application of multidimensional IRT models to longitudinal data

    The application of multidimensional item response theory (IRT) models to longitudinal educational surveys in which students are repeatedly measured is discussed and exemplified. A marginal maximum likelihood (MML) method for estimating the parameters of a multidimensional generalized partial credit model for repeated measures is presented. It is shown that model fit can be evaluated using Lagrange multiplier tests. Two tests are presented: the first aims at evaluating the fit of the item response functions, and the second at the constancy of the item location parameters over time points. The outcome of the latter test is compared with an analysis using scatter plots and linear regression. An analysis of data from a school effectiveness study in Flanders (Belgium) is presented as an example of the application of these methods. In the example, it is evaluated whether the concepts "academic self-concept," "well-being at school," and "attentiveness in the classroom" were constant during the secondary school period.
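    The abstract does not reproduce the model itself. As a point of reference, the standard unidimensional generalized partial credit model, on which the multidimensional repeated-measures version builds, can be written as follows (a sketch of the commonly used parameterization, not necessarily the exact one in the paper):

```latex
% Standard (unidimensional) generalized partial credit model for item i with
% response categories k = 0, ..., m_i; the empty sum for k = 0 equals zero.
P(X_i = k \mid \theta)
  = \frac{\exp\!\Big(\sum_{h=1}^{k} \alpha_i (\theta - \beta_{ih})\Big)}
         {\sum_{c=0}^{m_i} \exp\!\Big(\sum_{h=1}^{c} \alpha_i (\theta - \beta_{ih})\Big)},
\qquad \beta_{ih} = \beta_i - d_h
% alpha_i: item discrimination; beta_i: item location; d_h: category (step) parameters.
```

    In the multidimensional, repeated-measures setting, the latent trait is additionally indexed by dimension and time point; the second Lagrange multiplier test then concerns the constancy of location parameters such as ÎČ_i across time points.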

    Practical methods for dealing with 'not applicable' item responses in the AMC Linear Disability Score project

    Background: Whenever questionnaires are used to collect data on constructs, such as functional status or health-related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank. Methods: The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. The four practical strategies for dealing with this type of response are: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. Results: The item and respondent population parameter estimates were very similar for the strategies involving hot deck imputation, treating the missing responses as if the items had never been offered, and using a model which takes account of the 'tendency to respond to items'. The estimates obtained using the cold deck imputation method were substantially different. Conclusions: The cold deck imputation method was not considered suitable for use in the ALDS item bank. The other three methods can be usefully implemented in the ALDS item bank, depending on the purpose of the data analysis to be carried out. These three methods may also be useful for other data sets examining similar constructs when item response theory based methods are used.
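    As a rough illustration of the third strategy (treating 'not applicable' responses as if those items had never been offered), the sketch below simply drops such items from the per-patient likelihood under a one-parameter logistic (Rasch-type) model. The function and variable names are illustrative and not taken from the ALDS analysis.

```python
import numpy as np

def rasch_prob(theta, beta):
    """P('can do') for a person with ability theta on an item with difficulty beta."""
    return 1.0 / (1.0 + np.exp(-(theta - beta)))

def log_likelihood(theta, responses, betas):
    """Per-patient log-likelihood, skipping 'not applicable' items.

    responses: 1 ('can do'), 0 ('cannot do'), or np.nan ('not applicable').
    betas: item difficulty parameters, same length as responses.
    """
    responses = np.asarray(responses, dtype=float)
    betas = np.asarray(betas, dtype=float)
    offered = ~np.isnan(responses)           # treat 'not applicable' as never offered
    p = rasch_prob(theta, betas[offered])
    x = responses[offered]
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

# Illustrative use: three items, the second marked 'not applicable'.
print(log_likelihood(0.5, [1, np.nan, 0], [-1.0, 0.0, 1.0]))
```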

    Working mechanism of a multidimensional computerized adaptive test for fatigue in rheumatoid arthritis

    Background: This paper demonstrates the mechanism of a multidimensional computerized adaptive test (CAT) to measure fatigue in patients with rheumatoid arthritis (RA). A CAT can be used to measure patient-reported outcomes precisely at an individual level, as items are selected sequentially based on the patient's previous answers. The item bank of the CAT Fatigue RA was developed from the patients' perspective and consists of 196 items pertaining to three fatigue dimensions: severity, impact and variability of fatigue. Methods: The CAT Fatigue RA was completed by fifteen patients. To test the CAT's working mechanism, we applied the flowchart-check method: the adaptive item selection procedure for each patient was checked by the researchers. The estimated fatigue levels and the measurement precision per dimension were illustrated with the selected items, answers and flowcharts. Results: The CAT Fatigue RA selected all items in a logical sequence, and the items selected were those that provided the most information about the individual patient's fatigue. The flowcharts further illustrated that the CAT reached a satisfactory measurement precision with fewer than 20 items on the dimensions severity and impact, and, to a somewhat lesser extent, on the dimension variability. Patients' fatigue scores varied across the three dimensions; sometimes severity scored highest, other times impact or variability. The CAT's ability to display different fatigue experiences can improve communication in daily clinical practice, guide interventions, and facilitate research into possible predictors of fatigue. Conclusions: The results indicate that the CAT Fatigue RA measures fatigue precisely and comprehensively. Once it has been examined in more detail in a subsequent, more elaborate validation study, the CAT will be available for implementation in daily clinical practice and for research purposes.
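    The abstract describes the selection logic only verbally. The sketch below shows the generic maximum-information loop that a CAT of this kind typically follows; it is a simplified, unidimensional, dichotomous-item version with an EAP ability update, not the actual (multidimensional, polytomous) CAT Fatigue RA implementation, and all names and parameter choices are assumptions.

```python
import numpy as np

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1 - p)

def eap_update(administered, responses, a, b, grid=np.linspace(-4, 4, 81)):
    """EAP ability estimate and posterior SD based on the items answered so far."""
    post = np.exp(-0.5 * grid ** 2)                      # standard normal prior
    for j, x in zip(administered, responses):
        p = 1.0 / (1.0 + np.exp(-a[j] * (grid - b[j])))
        post *= p if x == 1 else (1 - p)
    post /= post.sum()
    theta = np.sum(grid * post)
    se = np.sqrt(np.sum((grid - theta) ** 2 * post))
    return theta, se

def run_cat(answer, a, b, se_target=0.32, max_items=20):
    """Administer items one by one until the SE target or the item limit is reached."""
    administered, responses = [], []
    theta, se = 0.0, np.inf
    while len(administered) < max_items and se > se_target:
        remaining = [j for j in range(len(a)) if j not in administered]
        j = max(remaining, key=lambda j: item_information(theta, a[j], b[j]))
        administered.append(j)
        responses.append(answer(j))                      # ask the patient item j
        theta, se = eap_update(administered, responses, a, b)
    return theta, se, administered
```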

    Further optimization of the reliability of the 28-joint disease activity score in patients with early rheumatoid arthritis

    BACKGROUND: The 28-joint Disease Activity Score (DAS28) combines scores on a 28-joint tender and swollen joint count (TJC28 and SJC28), a patient-reported measure of general health (GH), and an inflammatory marker (either the erythrocyte sedimentation rate [ESR] or the C-reactive protein [CRP]) into a composite measure of disease activity in rheumatoid arthritis (RA). This study examined the reliability of the DAS28 in patients with early RA using principles from generalizability theory and evaluated whether it could be increased by adjusting the weights of the individual DAS28 components. METHODS: Patients were drawn from the DREAM registry and classified into a "fast response" group (N = 466) and a "slow response" group (N = 80), depending on their pace of reaching remission. Composite reliabilities of the DAS28-ESR and DAS28-CRP were determined from the individual components' reliabilities, weights, variances, error variances, correlations and covariances. Weight optimization was performed by minimizing the error variance of the index. RESULTS: Composite reliabilities of 0.85 and 0.86 were found for the DAS28-ESR and DAS28-CRP, respectively, and were approximately equal across patient groups. Component reliabilities, however, varied widely both within and between subgroups, ranging from 0.614 for GH ("slow response" group) to 0.912 for ESR ("fast response" group). Weight optimization increased composite reliability even further. In the total and "fast response" groups, this was achieved mostly by decreasing the weights of the TJC28 and GH. In the "slow response" group, though, the weights of the TJC28 and SJC28 were increased, while those of the inflammatory markers and GH were substantially decreased. CONCLUSIONS: The DAS28-ESR and the DAS28-CRP are reliable instruments for assessing disease activity in early RA, and reliability can be increased even further by adjusting the component weights. Given the low reliability and low weights of the general health component across subgroups, it is recommended to explore alternative patient-reported outcome measures for inclusion in the DAS28.
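    A minimal sketch of the composite-reliability calculation implied by the methods section, assuming uncorrelated component errors; the covariance matrix and error variances below are made-up illustrative numbers, and the weights are the commonly cited DAS28 weights for √TJC28, √SJC28, GH and ln(ESR).

```python
import numpy as np

def composite_reliability(weights, obs_cov, error_var):
    """Reliability of a weighted composite, assuming uncorrelated component errors.

    weights:   component weights (order here: sqrt(TJC28), sqrt(SJC28), GH, ln(ESR)).
    obs_cov:   observed covariance matrix of the (transformed) components.
    error_var: error variance per component, i.e. (1 - reliability) * observed variance.
    """
    w = np.asarray(weights, dtype=float)
    obs = w @ np.asarray(obs_cov) @ w                # observed composite variance
    err = np.sum(w ** 2 * np.asarray(error_var))     # composite error variance
    return 1.0 - err / obs

# Illustrative (made-up) covariances and error variances for the four components:
weights = [0.56, 0.28, 0.014, 0.70]
obs_cov = np.array([[1.00, 0.50, 0.30, 0.40],
                    [0.50, 1.00, 0.25, 0.45],
                    [0.30, 0.25, 1.00, 0.20],
                    [0.40, 0.45, 0.20, 1.00]])
error_var = [0.20, 0.15, 0.35, 0.10]
print(composite_reliability(weights, obs_cov, error_var))
```

    Weight optimization as described in the study would then amount to choosing the weights so that the composite error variance (err above) is minimized relative to the observed composite variance, subject to a normalization constraint.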

    Construct Validation of a Multidimensional Computerized Adaptive Test for Fatigue in Rheumatoid Arthritis

    Objective: Multidimensional computerized adaptive testing enables precise measurement of patient-reported outcomes at an individual level across different dimensions. This study examined the construct validity of a multidimensional computerized adaptive test (CAT) for fatigue in rheumatoid arthritis (RA). Methods: The ‘CAT Fatigue RA’ was constructed from a previously calibrated item bank. It contains 196 items and three dimensions: ‘severity’, ‘impact’ and ‘variability’ of fatigue. The CAT was administered to 166 patients with RA, who also completed a traditional multidimensional fatigue questionnaire (BRAF-MDQ) and the SF-36 in order to examine the CAT’s construct validity. The a priori criterion for construct validity was that at least 75% of the correlations between the CAT dimensions and the subscales of the other questionnaires would be as expected. Furthermore, comprehensiveness of item bank use, measurement precision and score distributions were investigated. Results: The a priori criterion for construct validity was met for two of the three CAT dimensions (severity and impact, but not variability). For severity and impact, 87% of the correlations with the subscales of the well-established questionnaires were as expected, whereas for variability only 53% of the hypothesised relations were found. Eighty-nine percent of the items were selected between one and 137 times across the CAT administrations. Measurement precision was excellent for the severity and impact dimensions, with more than 90% of the CAT administrations reaching a standard error below 0.32. The variability dimension showed good measurement precision, with 90% of the CAT administrations reaching a standard error below 0.44. No floor or ceiling effects were found for the three dimensions. Conclusion: The CAT Fatigue RA showed good construct validity and excellent measurement precision on the dimensions severity and impact. The dimension variability had less ideal measurement characteristics, pointing to the need to recalibrate the CAT item bank with a two-dimensional model consisting solely of severity and impact.
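    The standard-error thresholds reported above can be read as reliability levels via the usual rule of thumb for a latent trait scaled to unit variance:

```latex
% Rule-of-thumb mapping between the standard error of a trait estimate and
% (marginal) reliability, assuming the latent trait is scaled to unit variance.
\rho \approx 1 - SE^2, \qquad
SE = 0.32 \;\Rightarrow\; \rho \approx 0.90, \qquad
SE = 0.44 \;\Rightarrow\; \rho \approx 0.81
```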

    Item response theory in educational assessment and evaluation

    Item response theory provides a useful and theoretically well-founded framework for educational measurement. It supports activities such as the construction of measurement instruments, the linking and equating of measurements, and the evaluation of test bias and differential item functioning. It further provides the underpinnings for item banking and flexible test administration designs, such as multiple matrix sampling, flexi-level testing, and computerized adaptive testing. First, a concise introduction to the principles of IRT models is given. The models discussed pertain to dichotomous items (items that are scored as either correct or incorrect) and polytomous items (items with partial credit scoring, such as most types of open-ended questions and performance assessments). Second, it is shown how an IRT measurement model can be enhanced with a structural model, such as an analysis of variance model, to relate data from achievement and ability tests to students’ background variables (such as socio-economic status, intelligence or cultural capital), to school variables, and to features of the schooling system. Two applications are presented. The first pertains to the equating and linking of assessments, and the second to a combination of an IRT measurement model and a multilevel linear model useful in school effectiveness research.
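    As an indication of how an IRT measurement model can be combined with a structural multilevel model for school effectiveness research, one generic two-part specification is sketched below under standard assumptions; it is illustrative and not necessarily the exact model used in the second application.

```latex
% Measurement model: two-parameter logistic IRT for the response of student p
% in school k to dichotomous item i.
P(X_{pki} = 1 \mid \theta_{pk})
  = \frac{\exp\big(a_i(\theta_{pk} - b_i)\big)}{1 + \exp\big(a_i(\theta_{pk} - b_i)\big)}
% Structural model: multilevel regression of ability on a student background
% variable x_{pk} (e.g. socio-economic status) with a random school effect u_k.
\theta_{pk} = \beta_0 + \beta_1 x_{pk} + u_k + e_{pk},
\qquad u_k \sim N(0, \tau^2), \quad e_{pk} \sim N(0, \sigma^2)
```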
    • 

    corecore