54 research outputs found
Application of multidimensional IRT models to longitudinal data
The application of multidimensional item response theory (IRT) models to longitudinal educational surveys, in which students are measured repeatedly, is discussed and exemplified. A marginal maximum likelihood (MML) method to estimate the parameters of a multidimensional generalized partial credit model for repeated measures is presented. It is shown that model fit can be evaluated using Lagrange multiplier tests. Two tests are presented: the first evaluates the fit of the item response functions, and the second evaluates the constancy of the item location parameters across time points. The outcome of the latter test is compared with an analysis using scatter plots and linear regression. An analysis of data from a school effectiveness study in Flanders (Belgium) is presented as an example of the application of these methods. The example evaluates whether the concepts "academic self-concept," "well-being at school," and "attentiveness in the classroom" remained constant during the secondary school period.
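The multidimensional generalized partial credit model named above can be sketched as follows. This is a minimal illustration, not the paper's MML estimation procedure or its Lagrange multiplier tests: it only computes the category probabilities of one item for given parameters, using the parametrization P(X = j | θ) ∝ exp(j·a'θ − δ_1 − … − δ_j). As a hypothetical setup for repeated measures, each time point is treated as its own latent dimension, and an item administered at a given time point loads only on that dimension. The function name and parameter values are assumptions.

```python
import math
from typing import List, Sequence


def gpcm_probs(theta: Sequence[float], a: Sequence[float],
               deltas: Sequence[float]) -> List[float]:
    """Category probabilities P(X = 0), ..., P(X = m) for one item.

    theta  : latent vector of one student (hypothetically, one dimension per time point)
    a      : the item's discrimination vector over those dimensions
    deltas : the item's step parameters delta_1, ..., delta_m
    """
    a_theta = sum(a_q * t_q for a_q, t_q in zip(a, theta))
    # Log-numerator for category h: h * a'theta - (delta_1 + ... + delta_h); 0 for h = 0
    logits = [0.0]
    cumulative = 0.0
    for h, delta in enumerate(deltas, start=1):
        cumulative += delta
        logits.append(h * a_theta - cumulative)
    top = max(logits)  # subtract the maximum to avoid overflow in exp
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# An item administered at the first of two time points loads only on dimension 1.
print(gpcm_probs(theta=[0.5, 0.9], a=[1.2, 0.0], deltas=[-0.5, 0.4]))
```

Because the item's loading on the second dimension is zero in this example, its response probabilities depend only on the student's location at the first time point.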
Practical methods for dealing with 'not applicable' item responses in the AMC Linear Disability Score project
Background:
Whenever questionnaires are used to collect data on constructs such as functional status or health-related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank.
Methods:
The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. Four practical strategies for dealing with this type of response are compared: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'.
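The strategy of treating 'not applicable' responses as if the items had never been offered can be sketched for the one-parameter logistic (Rasch) model: those items simply drop out of the respondent's likelihood. This minimal sketch assumes known, already-calibrated item difficulties and estimates only a respondent's location by maximum likelihood with Newton-Raphson steps. The function names and example values are hypothetical, and the hot deck, cold deck and response-tendency strategies are not implemented; the usage only contrasts skipping an item with filling it in with a fixed score.

```python
import math
from typing import Optional, Sequence


def rasch_prob(theta: float, b: float) -> float:
    """Probability of a positive (1) response under the Rasch / 1PL model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(responses: Sequence[Optional[int]], difficulties: Sequence[float],
                   tol: float = 1e-10, max_iter: int = 100) -> float:
    """ML estimate of theta; None ('not applicable') items are left out,
    i.e. treated as if they had never been offered."""
    observed = [(x, b) for x, b in zip(responses, difficulties) if x is not None]
    score = sum(x for x, _ in observed)
    if not observed or score in (0, len(observed)):
        raise ValueError("ML estimate is not finite for empty, zero or perfect scores")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for _, b in observed]
        gradient = score - sum(probs)                   # first derivative of log-likelihood
        information = sum(p * (1 - p) for p in probs)   # minus the second derivative
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


b = [-1.0, 0.0, 1.0]                      # hypothetical item difficulties
print(estimate_theta([1, None, 0], b))    # middle item 'not applicable': skipped
print(estimate_theta([1, 0, 0], b))       # same item filled in with a fixed score of 0
```

With these toy values, skipping the middle item leaves the estimate at 0, whereas filling it in with a 0 pulls the estimate below 0. This shows how an imputation choice can shift the estimates.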
Results:
The item and respondent population parameter estimates were very similar for three strategies: hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. The estimates obtained using the cold deck imputation method were substantially different.
Conclusions:
The cold deck imputation method was not considered suitable for use in the ALDS item bank. The other three methods can be usefully implemented in the ALDS item bank, depending on the purpose of the data analysis to be carried out. These three methods may also be useful for other data sets examining similar constructs when item response theory-based methods are used.
Working mechanism of a multidimensional computerized adaptive test for fatigue in rheumatoid arthritis
Background
This paper demonstrates the mechanism of a multidimensional computerized adaptive test (CAT) to measure fatigue in patients with rheumatoid arthritis (RA). A CAT can be used to measure patient-reported outcomes precisely at an individual level because items are selected consecutively based on the patient's previous answers. The item bank of the CAT Fatigue RA has been developed from the patients' perspective and consists of 196 items pertaining to three fatigue dimensions: severity, impact and variability of fatigue.
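The adaptive loop described above (estimate, pick the most informative remaining item, update, stop once precise enough) can be sketched in a simplified, unidimensional form. The abstract does not specify the item model or the selection criterion of the CAT Fatigue RA, which is multidimensional. This sketch therefore rests on stated assumptions: dichotomous 2PL items with hypothetical parameters, an EAP estimate under a standard-normal prior, maximum Fisher information selection, and a hypothetical stopping rule based on the standard error.

```python
import math
from typing import Callable, Dict, List, Sequence, Set, Tuple

Item = Tuple[float, float]  # hypothetical 2PL parameters (a, b)


def p_2pl(theta: float, a: float, b: float) -> float:
    """Probability of endorsing a dichotomous 2PL item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def eap_estimate(answers: Dict[str, int], bank: Dict[str, Item],
                 grid: Sequence[float]) -> Tuple[float, float]:
    """Posterior mean (EAP) and posterior SD of theta under a N(0, 1) prior."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized standard-normal prior
        for item_id, x in answers.items():
            p = p_2pl(t, *bank[item_id])
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def next_item(theta: float, bank: Dict[str, Item], used: Set[str]) -> str:
    """Maximum-information selection among items not yet administered."""
    remaining = [i for i in bank if i not in used]
    return max(remaining, key=lambda i: item_information(theta, *bank[i]))


def run_cat(bank: Dict[str, Item], respond: Callable[[str], int],
            se_target: float = 0.32, max_items: int = 20) -> Tuple[float, float, List[str]]:
    """Administer items adaptively; se_target and max_items are hypothetical defaults."""
    grid = [-4.0 + 0.05 * k for k in range(161)]  # quadrature points from -4 to 4
    answers: Dict[str, int] = {}
    theta, se = 0.0, 1.0  # start at the prior mean
    while len(answers) < min(max_items, len(bank)):
        item = next_item(theta, bank, set(answers))
        answers[item] = respond(item)
        theta, se = eap_estimate(answers, bank, grid)
        if se < se_target:  # stop once the target precision is reached
            break
    return theta, se, list(answers)


bank = {"q1": (1.5, -1.0), "q2": (1.5, 0.0), "q3": (1.5, 1.0)}  # toy item bank
theta, se, order = run_cat(bank, respond=lambda item: 1)
print(order, round(theta, 2), round(se, 2))
```

Starting from the prior mean of 0, the first item chosen is the one whose difficulty lies closest to 0. A multidimensional CAT would replace the scalar information with an information matrix and use a matrix-based selection criterion.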
Methods
The CAT Fatigue RA was completed by fifteen patients. To test the CAT's working mechanism, we applied the flowchart-check-method. The adaptive item selection procedure for each patient was checked by the researchers. The estimated fatigue levels and the measurement precision per dimension were illustrated with the selected items, answers and flowcharts.
Results
The CAT Fatigue RA selected items in a logical sequence, each time choosing the item that provided the most information about the patient's individual fatigue. Flowcharts further illustrated that the CAT reached satisfactory measurement precision with fewer than 20 items on the severity and impact dimensions and, to a somewhat lesser extent, on the variability dimension. Patients' fatigue scores varied across the three dimensions: sometimes severity scored highest, at other times impact or variability. The CAT's ability to display different fatigue experiences can improve communication in daily clinical practice, guide interventions, and facilitate research into possible predictors of fatigue.
Conclusions
The results indicate that the CAT Fatigue RA measures fatigue precisely and comprehensively. Once it has been examined in more detail in a subsequent, more extensive validation study, the CAT will be available for implementation in daily clinical practice and for research purposes.
Further optimization of the reliability of the 28-joint disease activity score in patients with early rheumatoid arthritis
BACKGROUND:
The 28-joint Disease Activity Score (DAS28) combines 28-joint tender and swollen joint counts (TJC28 and SJC28), a patient-reported measure of general health (GH), and an inflammatory marker (either the erythrocyte sedimentation rate [ESR] or the C-reactive protein [CRP]) into a composite measure of disease activity in rheumatoid arthritis (RA). This study examined the reliability of the DAS28 in patients with early RA using principles from generalizability theory and evaluated whether it could be increased by adjusting the weights of the individual DAS28 components.
METHODS:
Patients were drawn from the DREAM registry and classified into a "fast response" group (N = 466) and a "slow response" group (N = 80), depending on how quickly they reached remission. Composite reliabilities of the DAS28-ESR and DAS28-CRP were determined from the individual components' reliabilities, weights, variances, error variances, correlations and covariances. Weight optimization was performed by minimizing the error variance of the index.
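The composite reliability described above can be illustrated with the classical formula for the reliability of a weighted sum C = Σ w_i X_i. It equals one minus the ratio of the composite's error variance, Σ w_i² σ_i² (1 − ρ_i), to its observed variance, Σ_i Σ_j w_i w_j σ_ij. This minimal sketch assumes uncorrelated measurement errors across components and is not necessarily the exact generalizability-theory computation used in the study. The function name and all numbers are hypothetical; they are not DREAM data.

```python
from typing import Sequence


def composite_reliability(weights: Sequence[float], reliabilities: Sequence[float],
                          cov: Sequence[Sequence[float]]) -> float:
    """Reliability of C = sum_i w_i X_i from component weights, reliabilities
    and the observed covariance matrix (variances on the diagonal)."""
    n = len(weights)
    # Observed variance of the composite: sum_i sum_j w_i w_j cov_ij
    total_var = sum(weights[i] * weights[j] * cov[i][j] for i in range(n) for j in range(n))
    # Error variance, assuming uncorrelated errors: sum_i w_i^2 var_i (1 - rel_i)
    error_var = sum(weights[i] ** 2 * cov[i][i] * (1.0 - reliabilities[i]) for i in range(n))
    return 1.0 - error_var / total_var


cov = [[1.0, 0.5], [0.5, 1.0]]                               # made-up covariance matrix
print(composite_reliability([1.0, 1.0], [0.9, 0.6], cov))    # equal weights
print(composite_reliability([1.0, 0.5], [0.9, 0.6], cov))    # less weight on the less reliable component
```

With these made-up values, halving the weight of the less reliable second component raises the composite reliability from about 0.83 to about 0.89. This mirrors the idea behind the study's weight optimization.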
RESULTS:
Composite reliabilities of 0.85 and 0.86 were found for the DAS28-ESR and DAS28-CRP, respectively, and were approximately equal across patient groups. Component reliabilities, however, varied widely both within and between subgroups, ranging from 0.614 for GH ("slow response" group) to 0.912 for ESR ("fast response" group). Weight optimization increased composite reliability further. In the total and "fast response" groups, this was achieved mostly by decreasing the weights of the TJC28 and GH. In the "slow response" group, however, the weights of the TJC28 and SJC28 were increased, while those of the inflammatory markers and GH were substantially decreased.
CONCLUSIONS:
The DAS28-ESR and the DAS28-CRP are reliable instruments for assessing disease activity in early RA, and their reliability can be increased further by adjusting component weights. Given the low reliability and low weights of the general health component across subgroups, it is recommended to explore alternative patient-reported outcome measures for inclusion in the DAS28.
Construct Validation of a Multidimensional Computerized Adaptive Test for Fatigue in Rheumatoid Arthritis
Objective
Multidimensional computerized adaptive testing enables precise measurements of patient-reported outcomes at an individual level across different dimensions. This study examined the construct validity of a multidimensional computerized adaptive test (CAT) for fatigue in rheumatoid arthritis (RA).
Methods
The "CAT Fatigue RA" was constructed based on a previously calibrated item bank. It contains 196 items and three dimensions: "severity", "impact" and "variability" of fatigue. The CAT was administered to 166 patients with RA, who also completed a traditional multidimensional fatigue questionnaire (BRAF-MDQ) and the SF-36 in order to examine the CAT's construct validity. The a priori criterion for construct validity was that at least 75% of the correlations between the CAT dimensions and the subscales of the other questionnaires were as expected. Furthermore, comprehensive use of the item bank, measurement precision and score distribution were investigated.
Results
The a priori criterion for construct validity was met for two of the three CAT dimensions (severity and impact, but not variability). For severity and impact, 87% of the correlations with the subscales of the well-established questionnaires were as expected, whereas for variability 53% of the hypothesised relations were found. Eighty-nine percent of the items were selected between one and 137 times across the CAT administrations. Measurement precision was excellent for the severity and impact dimensions, with more than 90% of the CAT administrations reaching a standard error below 0.32. The variability dimension showed good measurement precision, with 90% of the CAT administrations reaching a standard error below 0.44. No floor or ceiling effects were found for any of the three dimensions.
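The standard-error thresholds above can be translated into approximate reliabilities under one assumption that the abstract does not state: the latent trait is scaled to unit variance, as is common in IRT calibration. Under that assumption, reliability is roughly one minus the squared standard error:

```latex
\rho \approx 1 - \frac{SE^2}{\sigma_\theta^2}, \qquad \sigma_\theta^2 = 1:
\quad SE = 0.32 \Rightarrow \rho \approx 1 - 0.1024 \approx 0.90, \qquad
SE = 0.44 \Rightarrow \rho \approx 1 - 0.1936 \approx 0.81.
```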
Conclusion
The CAT Fatigue RA showed good construct validity and excellent measurement precision on the severity and impact dimensions. The variability dimension had less ideal measurement characteristics, pointing to the need to recalibrate the CAT item bank with a two-dimensional model consisting solely of severity and impact.
Item response theory in educational assessment and evaluation
Item response theory (IRT) provides a useful and theoretically well-founded framework for educational measurement. It supports such activities as the construction of measurement instruments, linking and equating measurements, and evaluation of test bias and differential item functioning. It further provides underpinnings for item banking and flexible test administration designs, such as multiple matrix sampling, flexi-level testing, and computerized adaptive testing. First, a concise introduction to the principles of IRT models is given. The models discussed pertain to dichotomous items (items that are scored as either correct or incorrect) and polytomous items (items with partial credit scoring, such as most types of open-ended questions and performance assessments). Second, it is shown how an IRT measurement model can be enhanced with a structural model, such as an analysis of variance model, to relate data from achievement and ability tests to students' background variables (such as socio-economic status, intelligence or cultural capital), to school variables, and to features of the schooling system. Two applications are presented. The first pertains to the equating and linking of assessments, and the second to a combination of an IRT measurement model and a multilevel linear model that is useful in school effectiveness research.
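Linking, the first application above, places item and person parameters from separately calibrated assessments on a common scale. The abstract does not say which linking method is used. The sketch below shows mean-sigma linking, one common approach, which uses the difficulty estimates of anchor items shared by both calibrations. Function names and values are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def mean_sigma_link(b_from: Sequence[float], b_to: Sequence[float]) -> Tuple[float, float]:
    """Return (A, B) such that A * x + B maps the 'from' scale onto the 'to' scale.

    b_from, b_to: difficulty estimates of the same anchor items, calibrated separately.
    """
    A = statistics.pstdev(b_to) / statistics.pstdev(b_from)   # ratio of spreads
    B = statistics.mean(b_to) - A * statistics.mean(b_from)   # align the means
    return A, B


def to_target_scale(theta: float, A: float, B: float) -> float:
    """Transform a person parameter with the same linear linking constants."""
    return A * theta + B


A, B = mean_sigma_link([-1.0, 0.0, 1.0], [-1.5, 0.5, 2.5])  # toy anchor difficulties
print(A, B, to_target_scale(1.0, A, B))
```

Mean-sigma linking uses only the means and standard deviations of the anchor difficulties. Characteristic-curve methods, which match whole test characteristic curves instead, are a common alternative.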
- …