
    Differential Item Functioning in PISA Due to Mode Effects

    One of the most important goals of the Programme for International Student Assessment (PISA) is assessing national changes in educational performance over time. These so-called trend results inform policy makers about the development of the ability of 15-year-old students within a specific country. The validity of those trend results depends on invariant test conditions. In the 2015 PISA survey, several alterations to the test administration were implemented, including a switch from paper-based to computer-based assessments for most countries (OECD 2016a). This alteration of the assessment mode is examined by evaluating whether the items used to assess trends are subject to differential item functioning across PISA surveys (2012 vs. 2015). Furthermore, the impact of the change in assessment mode on the trend results for the Netherlands is assessed. The results show that the decrease reported for mathematics in the Netherlands is smaller when results are based upon a separate national calibration.
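    The item-level DIF screen described above can be illustrated with a Mantel-Haenszel check, a standard DIF method for dichotomous items. This is a minimal numpy sketch under assumed data, not the study's actual (IRT-based) calibration procedure; the function name and the simulated responses are ours.

```python
import numpy as np

def mantel_haenszel_dif(resp_ref, resp_foc, item):
    """Mantel-Haenszel common odds ratio for one dichotomous item.

    resp_ref / resp_foc: (persons, items) 0/1 matrices for the two
    administrations (e.g. paper vs. computer). Persons are matched on
    total score; a ratio near 1 suggests no DIF for the item.
    """
    tot_r = resp_ref.sum(axis=1)
    tot_f = resp_foc.sum(axis=1)
    num = den = 0.0
    for s in np.union1d(tot_r, tot_f):
        r = resp_ref[tot_r == s, item]
        f = resp_foc[tot_f == s, item]
        if len(r) == 0 or len(f) == 0:
            continue  # stratum present in only one group carries no information
        n = len(r) + len(f)
        a, b = r.sum(), len(r) - r.sum()    # reference: correct / incorrect
        c, d = f.sum(), len(f) - f.sum()    # focal: correct / incorrect
        num += a * d / n
        den += b * c / n
    return num / den
```

    When the two administrations behave identically, the pooled odds ratio stays close to 1; items drifting well away from 1 would be the trend items to inspect.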

    Methodological issues regarding power of classical test theory (CTT) and item response theory (IRT)-based approaches for the comparison of patient-reported outcomes in two groups of patients - a simulation study

    BACKGROUND: Patient-Reported Outcomes (PRO) are increasingly used in clinical and epidemiological research. Two main types of analytical strategies can be found for these data: classical test theory (CTT), based on the observed scores, and models from Item Response Theory (IRT). However, whether IRT or CTT is the more appropriate method for analysing PRO data remains unknown. The statistical properties of CTT and IRT, regarding power and corresponding effect sizes, were compared. METHODS: Two-group cross-sectional studies were simulated for the comparison of PRO data using IRT- or CTT-based analysis. For IRT, different scenarios were investigated according to whether item or person parameters were assumed to be known, known to a certain extent (from good to poor precision for item parameters), or unknown and therefore estimated. The powers obtained with IRT or CTT were compared, and the parameters having the strongest impact on them were identified. RESULTS: When person parameters were assumed to be unknown and item parameters to be either known or not, the powers achieved using IRT or CTT were similar and always lower than the power expected from the well-known sample size formula for normally distributed endpoints. The number of items had a substantial impact on power for both methods. CONCLUSION: Without any missing data, IRT and CTT seem to provide comparable power. The classical sample size formula for CTT seems adequate under some conditions but is not appropriate for IRT. In IRT, it seems important to take the number of items into account to obtain an accurate formula.
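    The CTT side of such a simulation can be sketched in a few lines: simulate Rasch responses for two groups separated by a latent effect d, test the sum scores, and count rejections. This is an illustrative sketch only (function name, effect size, sample size, and item difficulties are our assumptions; a normal approximation replaces the t-test), not the study's simulation design.

```python
import math
import numpy as np

def ctt_power(n_items, d=0.4, n=100, reps=300, seed=1):
    """Monte-Carlo power of a two-sample test on Rasch sum scores.

    Groups differ by d on the latent trait; item difficulties are
    spread over [-1.5, 1.5]. Uses a normal approximation to the
    two-sample t-test, which is adequate for n = 100 per group.
    """
    rng = np.random.default_rng(seed)
    b = np.linspace(-1.5, 1.5, n_items)
    hits = 0
    for _ in range(reps):
        t0 = rng.normal(0.0, 1.0, n)          # latent traits, group 0
        t1 = rng.normal(d, 1.0, n)            # latent traits, group 1
        y0 = (rng.random((n, n_items)) < 1/(1+np.exp(-(t0[:, None]-b)))).sum(1)
        y1 = (rng.random((n, n_items)) < 1/(1+np.exp(-(t1[:, None]-b)))).sum(1)
        se = math.sqrt(y0.var(ddof=1)/n + y1.var(ddof=1)/n)
        z = abs(y1.mean() - y0.mean()) / se
        p_value = 2*(1 - 0.5*(1 + math.erf(z/math.sqrt(2))))
        if p_value < 0.05:
            hits += 1
    return hits / reps
```

    Running this with growing `n_items` reproduces the abstract's qualitative finding: power rises with test length and stays below the normal-theory power for the latent trait itself, because sum scores measure the trait with error.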

    Rasch model analysis of the Depression, Anxiety and Stress Scales (DASS)

    BACKGROUND: There is a growing awareness of the need for easily administered, psychometrically sound screening tools to identify individuals with elevated levels of psychological distress. Although support has been found for the psychometric properties of the Depression, Anxiety and Stress Scales (DASS) using classical test theory approaches, it has not been subjected to Rasch analysis. The aim of this study was to use Rasch analysis to assess the psychometric properties of the DASS-21 scales, using two different administration modes. METHODS: The DASS-21 was administered to 420 participants, with half the sample responding to a web-based version and the other half completing a traditional pencil-and-paper version. Conformity of the DASS-21 scales to a Rasch partial credit model was assessed using the RUMM2020 software. RESULTS: To achieve adequate model fit it was necessary to remove one item from each of the DASS-21 subscales. The reduced scales showed adequate internal consistency reliability, unidimensionality and freedom from differential item functioning for sex, age and mode of administration. Analysis of all DASS-21 items combined did not support its use as a measure of general psychological distress. A scale combining the anxiety and stress items showed satisfactory fit to the Rasch model after removal of three items. CONCLUSION: The results provide support for the measurement properties, internal consistency reliability, and unidimensionality of three slightly modified DASS-21 scales, across two different administration methods. Further use of Rasch analysis on the DASS-21 in larger and broader samples is recommended to confirm the findings of the current study.
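    The internal consistency reliability referred to here is conventionally reported as Cronbach's alpha, which is simple to compute from a person-by-item score matrix. A minimal numpy sketch (generic formula, not the study's code; the function name is ours):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_persons, n_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)
```

    Alpha approaches 1 as items share more common variance and falls toward 0 for unrelated items, which is why dropping a misfitting item can leave reliability essentially unchanged.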

    A Rasch and factor analysis of the Functional Assessment of Cancer Therapy-General (FACT-G)

    BACKGROUND: Although the Functional Assessment of Cancer Therapy – General questionnaire (FACT-G) has been validated, few studies have explored the factor structure of the instrument, in particular using non-sample-dependent measurement techniques such as Rasch models. Furthermore, few studies have explored the relationship between item fit to the Rasch model and clinical utility. The aim of this study was to investigate the dimensionality and measurement properties of the FACT-G with Rasch models and factor analysis. METHODS: A factor analysis and a Rasch analysis (partial credit model) were carried out on the FACT-G completed by a heterogeneous sample of cancer patients (n = 465). For the Rasch analysis, item fit (infit mean squares ≥ 1.30), dimensionality and item invariance were assessed. The impact of removing misfitting items on the clinical utility of the subscales and the FACT-G total scale was also assessed. RESULTS: The factor analysis demonstrated a four-factor structure of the FACT-G which broadly corresponded to the four subscales of the instrument. Internal consistency for these four scales was very good (Cronbach's alpha 0.72 – 0.85). The Rasch analysis demonstrated that each of the subscales and the FACT-G total scale had misfitting items (infit mean squares ≥ 1.30). All of these scales, with the exception of the Social & Family Well-being scale (SFWB), were unidimensional. When misfitting items were removed, the effect sizes and the clinical utility of the instrument were maintained for the subscales and the total FACT-G scores. CONCLUSION: The results of the traditional factor analysis and the Rasch analysis of the FACT-G broadly agreed. Caution should be exercised when utilising the Social & Family Well-being scale, and further work is required to determine whether this scale is best represented by two factors. Additionally, removing misfitting items from scales should be performed alongside an assessment of the impact on clinical utility.
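    The infit mean-square statistic used as the misfit flag here is the information-weighted average of squared model residuals. A minimal sketch for the dichotomous Rasch case (the study used the polytomous partial credit model, so this is an illustrative analogue; the function name and parameters are ours):

```python
import numpy as np

def infit_mnsq(resp, theta, b):
    """Infit mean-square per item for a dichotomous Rasch model.

    resp: (persons, items) 0/1 matrix; theta: person measures;
    b: item difficulties. Values of roughly 1.30 or more are the
    misfit threshold the FACT-G study applied.
    """
    p = 1/(1 + np.exp(-(theta[:, None] - b[None, :])))  # model probabilities
    w = p * (1 - p)                                     # Bernoulli information
    return ((resp - p)**2).sum(axis=0) / w.sum(axis=0)
```

    For data that genuinely follow the model, infit hovers near 1; values well above 1 indicate more noise in an item's responses than the model predicts.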

    Rasch analysis of the Psychiatric Out-Patient Experiences Questionnaire (POPEQ)

    BACKGROUND: The Psychiatric Out-Patient Experiences Questionnaire (POPEQ) is an 11-item core measure of psychiatric out-patients' experiences covering the perceived outcome of treatment, the quality of interaction with the clinician, and the quality of information provision. The POPEQ was found to have evidence for reliability and validity following the application of classical test theory, but has not previously been assessed by Rasch analysis. METHODS: Two national postal surveys of psychiatric out-patients took place in Norway in 2004 and 2007. The performance of the POPEQ, including item functioning and differential item functioning, was assessed by Rasch analysis. Principal component analysis of item residuals was used to assess the presence of subdimensions. RESULTS: 6,677 (43.3%) and 11,085 (35.2%) psychiatric out-patients responded to the questionnaire in 2004 and 2007, respectively. All items in the scale were retained after the Rasch analysis. The resulting scale had reasonably good fit to the Rasch model. The items performed equivalently across the two survey years and there was no differential item functioning relating to patient characteristics. Principal component analysis of the residuals confirmed that the measure is largely unidimensional. However, the data also reflect three potential subscales, each relating to one of the three included aspects of health care. CONCLUSIONS: The POPEQ had excellent psychometric properties, and Rasch analysis further supported the construct validity of the scale by identifying the three subdimensions originally included as components in the instrument development. The 11-item instrument is recommended for future research on psychiatric out-patient experiences. Future development may lead to the construction of more precise measures of the three subdomains on which the POPEQ is based.
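    The residual PCA used to probe for subdimensions can be sketched as follows: remove what the Rasch model explains, then eigen-decompose the correlation matrix of the standardized residuals. This is a generic dichotomous illustration under assumed parameters, not the study's software output; the function name is ours.

```python
import numpy as np

def residual_pca_eigenvalues(resp, theta, b):
    """Eigenvalues of the correlation matrix of standardized Rasch
    residuals. If the Rasch dimension explains the data, the residuals
    are close to noise and all eigenvalues sit near 1; a clearly
    dominant first eigenvalue hints at a secondary dimension."""
    p = 1/(1 + np.exp(-(theta[:, None] - b[None, :])))
    z = (resp - p) / np.sqrt(p * (1 - p))   # standardized residuals
    corr = np.corrcoef(z, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]
```

    Clustering of items on the leading residual components, rather than the eigenvalues alone, is what reveals coherent subscales such as the three POPEQ aspects.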

    KIDMAP, a web based system for gathering patients' feedback on their doctors

    BACKGROUND: The gathering of feedback on doctors from patients after consultations is an important part of patient involvement and participation. This study first assesses the 23-item Patient Feedback Questionnaire (PFQ), designed by the Picker Institute, Europe, to determine whether its items form a single latent trait. An Internet module with visual representation is then developed to gather patient views about their doctors and distribute the individualized results by email. METHODS: A total of 450 patients were randomly recruited from a 1,300-bed medical center in Taiwan. The Rasch rating scale model was used to examine data-model fit. Differential item functioning (DIF) analysis was conducted to verify construct equivalence across groups. An Internet module with visual representation was developed to provide doctors with the patient's online feedback. RESULTS: Twenty-one of the 23 items met the model's expectation, namely that they constitute a single construct. The test reliability was 0.94. DIF was found across age groups and types of disease, but not between genders or education levels. The visual approach of the web-based KIDMAP module appeared to be an effective approach to the assessment of patient feedback in a clinical setting. CONCLUSION: The revised 21-item PFQ measures a single construct. Our work supports the hypothesis that the revised online PFQ is both valid and reliable, and that the KIDMAP module performs its designated task well. Further research is needed to confirm data congruence for patients with chronic diseases.
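    Individualized KIDMAP-style feedback presupposes a measure for each respondent. A minimal sketch of maximum-likelihood person estimation for a dichotomous Rasch model (Newton-Raphson; illustrative only, since the study used the polytomous rating scale model, and the function name is ours):

```python
import numpy as np

def person_measure(resp, b, iters=20):
    """ML estimate of one person's Rasch measure, given item
    difficulties b and a 0/1 response vector resp.

    Solves raw score = sum of expected scores by Newton-Raphson.
    Only defined for non-extreme raw scores (all-0 or all-1 response
    vectors have no finite ML estimate).
    """
    theta = 0.0
    for _ in range(iters):
        p = 1/(1 + np.exp(-(theta - b)))
        grad = resp.sum() - p.sum()       # observed minus expected score
        hess = -(p * (1 - p)).sum()       # negative test information
        theta -= grad / hess
    return theta
```

    Plotting each person's measure against the ordered item difficulties, and flagging responses that are unexpected given that measure, is essentially what a KIDMAP display shows.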