77 research outputs found

    Psychometric Framework for Modeling Parental Involvement and Reading Literacy

    Assessment, Testing and Evaluation

    Variance Decomposition Using an IRT Measurement Model

    Large-scale research projects in behaviour genetics and genetic epidemiology are often based on questionnaire or interview data. Typically, a number of items are presented to a number of subjects, the subjects’ sum scores on the items are computed, and the variance of the sum scores is decomposed into a number of variance components. This paper discusses several disadvantages of analysing sum scores, such as the attenuation of correlations amongst sum scores due to their unreliability. It is shown that the framework of Item Response Theory (IRT) offers a solution to most of these problems. We argue that an IRT approach in combination with Markov chain Monte Carlo (MCMC) estimation provides a flexible and efficient framework for modelling behavioural phenotypes. Next, we use data simulation to illustrate the potentially huge bias in estimating variance components on the basis of sum scores. We then apply the IRT approach to an analysis of attention problems in young adult twins, in which the variance decomposition model is extended with an IRT measurement model. We show that when an IRT measurement model and a variance decomposition model are estimated simultaneously, the estimate for the heritability of attention problems increases from 40% (based on sum scores) to 73%.

    Validity and reliability of student perceptions of teaching quality in primary education

    A Bayesian IRT-model approach was used to investigate the validity and reliability of student perceptions of teaching quality. Furthermore, the student perceptions were compared with ratings of teaching quality by external observers. Grade 4 students (n = 675) filled out a questionnaire measuring their opinions of their teachers' lessons. Three lessons of 39 teachers were recorded and rated by 4 raters. The analyses showed that the student perception and lesson observation scales fit best in an 11-dimensional model, indicating construct validity and discriminant validity. Student perception scales were reliable, although not all items contributed to the scales to the same extent. Student ratings and lesson observation scores generally correlated moderately (ranging from r = .18 to r = .50). Higher correlations were found for scales with similar content; however, no clear pattern was apparent. Suggestions for future research are presented.

    Measuring Patient-Reported Outcomes Adaptively: Multidimensionality Matters!

    As there is currently a marked increase in the use of both unidimensional (UCAT) and multidimensional computerized adaptive testing (MCAT) in psychological and health measurement, the main aim of the present study is to assess the incremental value of using MCAT rather than separate UCATs for each dimension. Simulations are based on empirical data that could be considered typical for health measurement: a large number of dimensions (4), strong correlations among dimensions (.77-.87), and polytomously scored response data. Both variable-length (SE < .316, SE < .387) and fixed-length conditions (total test length of 12, 20, or 32 items) are studied. The item parameters and variance–covariance matrix Φ are estimated with the multidimensional graded response model (GRM). Outcome variables include computerized adaptive test (CAT) length, root mean square error (RMSE), and bias. Both simulated and empirical latent trait distributions are used to sample vectors of true scores. MCATs were generally more efficient (in terms of test length) and more accurate (in terms of RMSE) than their UCAT counterparts. Absolute average bias was highest for variable-length UCATs with termination rule SE < .387. The test length of variable-length MCATs was on average 20% to 25% shorter than the combined test length of the separate UCATs. This study showed that there are clear advantages of using MCAT rather than UCAT in a setting typical for health measurement.
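
    The multidimensional graded response model named above can be sketched for a single polytomous item. This assumes the common slope-intercept parameterization, in which the probability of responding in category k or higher is logistic(a·θ + d_k) with strictly decreasing intercepts d_k; the function name and example values are hypothetical, and the sketch leaves out parameter estimation and CAT item selection.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def mgrm_category_probs(theta, a, d):
    """Category probabilities for one item under a multidimensional graded
    response model in slope-intercept form.

    theta: latent trait vector; a: slope vector of the same length;
    d: strictly decreasing intercepts d_1 > ... > d_{K-1} for K categories.
    """
    if len(theta) != len(a):
        raise ValueError("theta and a must have the same length")
    if any(d[i] <= d[i + 1] for i in range(len(d) - 1)):
        raise ValueError("intercepts must be strictly decreasing")
    z = sum(ai * ti for ai, ti in zip(a, theta))
    # Cumulative curves P(X >= k), bracketed by P(X >= 0) = 1 and P(X >= K) = 0.
    cum = [1.0] + [logistic(z + dk) for dk in d] + [0.0]
    # Each category probability is the difference between adjacent curves.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


if __name__ == "__main__":
    # Hypothetical 4-category item loading only on the first of four dimensions.
    probs = mgrm_category_probs([0.0, 0.0, 0.0, 0.0], [1.5, 0.0, 0.0, 0.0],
                                [1.0, 0.0, -1.0])
    print([round(p, 4) for p in probs])  # → [0.2689, 0.2311, 0.2311, 0.2689]
```

    Assuming a latent trait with unit variance, the two stopping rules correspond roughly to reliabilities of 1 − .316² ≈ .90 and 1 − .387² ≈ .85.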

    Application of the health assessment questionnaire disability index to various rheumatic diseases

    Purpose. To investigate whether the Stanford Health Assessment Questionnaire Disability Index (HAQ-DI) can serve as a generic instrument for measuring disability across different rheumatic diseases and to propose a scoring method based on item response theory (IRT) modeling to support this goal. Methods. The HAQ-DI was administered to a cross-sectional sample of patients with confirmed rheumatoid arthritis (n = 619), osteoarthritis (n = 125), or gout (n = 102). The results were analyzed using the generalized partial credit model as an IRT model. Results. Four of the eight item categories of the HAQ-DI displayed substantial differential item functioning (DIF) across the three diseases. Further, it was shown that this DIF could be modeled using an IRT model with disease-specific item parameters, which produces measures that are comparable across the three diseases. Conclusion. Although the HAQ-DI functioned partially differently in the three disease groups, the measurement of patients' disability levels can be made comparable using IRT methods.
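
    The generalized partial credit model, and the idea of absorbing DIF through disease-specific item parameters, can be sketched as follows. The step parameters are hypothetical values chosen for illustration, not estimates from the study, and the item is assumed to have four response categories scored 0 to 3.

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities P(X = 0..m | theta) for one item under the
    generalized partial credit model with slope a and step parameters steps."""
    # Category k uses the cumulative sum of a * (theta - b_v) for v = 1..k;
    # category 0 has an empty sum, i.e. 0.
    exponents = [0.0]
    for b in steps:
        exponents.append(exponents[-1] + a * (theta - b))
    # Subtracting the maximum before exponentiating avoids overflow.
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, a, steps):
    """Model-implied mean item score at theta."""
    return sum(k * p for k, p in enumerate(gpcm_probs(theta, a, steps)))


# Hypothetical disease-specific parameters (slope, steps) for one DIF item;
# illustrative values only, not estimates from the study.
HYPOTHETICAL_PARAMS = {
    "RA": (1.2, [-0.5, 0.5, 1.5]),
    "gout": (1.2, [-1.0, 0.0, 1.0]),
}

if __name__ == "__main__":
    for group, (a, steps) in HYPOTHETICAL_PARAMS.items():
        print(group, round(expected_score(0.0, a, steps), 3))
```

    At the same θ, the two groups receive different expected item scores. Estimating separate parameters for the DIF items while keeping a common θ scale is what lets disability levels be compared across diseases.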

    Fatigue assessment in RA

    ABSTRACT. Objective. To compare the psychometric functioning of multidimensional disease-specific, multi-item generic, and single-item measures of fatigue in patients with rheumatoid arthritis (RA). Methods. Confirmatory factor analysis (CFA) and longitudinal item response theory (IRT) modeling were used to evaluate the measurement structure and local reliability of the Bristol RA Fatigue Multi-Dimensional Questionnaire (BRAF-MDQ), the Medical Outcomes Study Short Form-36 (SF-36) vitality scale, and the BRAF Numerical Rating Scales (BRAF-NRS) in a sample of 588 patients with RA. Results. A 1-factor CFA model yielded a fit similar to that of a 5-factor model with subscale-specific dimensions, and the items from the different instruments adequately fit the IRT model, suggesting essential unidimensionality in measurement. The SF-36 vitality scale outperformed the BRAF-MDQ at lower levels of fatigue but was less precise at moderate to higher levels of fatigue. At these levels of fatigue, the living, cognition, and emotion subscales of the BRAF-MDQ provide additional precision. The BRAF-NRS showed a limited measurement range, with its highest precision centered on average levels of fatigue. Conclusion. The different instruments appear to access a common underlying domain of fatigue severity but differ considerably in their measurement precision along the continuum. The SF-36 vitality scale can be used to measure fatigue severity in samples with relatively mild fatigue. For samples expected to have higher levels of fatigue, the multidimensional BRAF-MDQ appears to be a better choice. The BRAF-NRS are not recommended if precise assessment is required, for instance in longitudinal settings. (J Rheumatol First Release Jan 15 2015)
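
    The way measurement precision varies along the fatigue continuum can be illustrated with item information. This sketch assumes a unidimensional graded response model, which is not necessarily the model used in the study, along with hypothetical item parameters. It computes the Fisher information I(θ) = Σ_k (P_k′)² / P_k of each item and the standard error 1/√I(θ) of the whole scale.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def grm_item_information(theta, a, b):
    """Fisher information of one graded-response item at theta.

    a: discrimination; b: strictly increasing thresholds b_1 < ... < b_{K-1}.
    """
    if any(b[i] >= b[i + 1] for i in range(len(b) - 1)):
        raise ValueError("thresholds must be strictly increasing")
    # Boundary curves P*(X >= k), bracketed by 1 and 0, and their derivatives.
    star = [1.0] + [logistic(a * (theta - bk)) for bk in b] + [0.0]
    dstar = [0.0] + [a * s * (1.0 - s) for s in star[1:-1]] + [0.0]
    info = 0.0
    for k in range(len(star) - 1):
        p = star[k] - star[k + 1]      # category probability P_k
        dp = dstar[k] - dstar[k + 1]   # its derivative with respect to theta
        if p > 0.0:
            info += dp * dp / p
    return info


def standard_error(theta, items):
    """SE of theta = 1 / sqrt(test information) for a list of (a, b) items."""
    total = sum(grm_item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(total)


if __name__ == "__main__":
    # Hypothetical four-item scale whose thresholds sit at the mild end.
    mild_scale = [(2.0, [-2.0, -1.0, 0.0])] * 4
    for theta in (-1.0, 2.0):
        print(theta, round(standard_error(theta, mild_scale), 3))
```

    A scale whose thresholds are concentrated at low trait levels is precise there and imprecise at high levels. This is the kind of trade-off the abstract describes when comparing instruments across levels of fatigue.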