6,769 research outputs found

    Parallel universes and parallel measures: estimating the reliability of test results

    The internal reliability of some City & Guilds tests

    Deconstructing therapy outcome measurement with Rasch analysis of a measure of general clinical distress: the Symptom Checklist-90-Revised

    Rasch analysis was used to illustrate the usefulness of item-level analyses for evaluating a common therapy outcome measure of general clinical distress, the Symptom Checklist-90-Revised (SCL-90-R; Derogatis, 1994). Using complementary therapy research samples, the instrument's 5-point rating scale was found to exceed clients' ability to make reliable discriminations and could be improved by collapsing it into a 3-point version (combining scale points 1 with 2 and 3 with 4). This revision, in addition to removing 3 misfitting items, increased person separation from 4.90 to 5.07 and item separation from 7.76 to 8.52 (resulting in alphas of .96 and .99, respectively). Some SCL-90-R subscales had low internal consistency reliabilities; SCL-90-R items can be used to define one factor of general clinical distress that is generally stable across both samples, with two small residual factors
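The reported alphas follow from the separation statistics: in Rasch analysis, reliability relates to the separation index G by R = G² / (1 + G²). A minimal sketch in plain Python using the values from the abstract (the recode mapping mirrors the scale collapsing described above; none of this is the authors' code):

```python
def separation_to_reliability(g):
    """Convert a Rasch separation index G to a reliability coefficient,
    using the standard relation R = G^2 / (1 + G^2)."""
    return g ** 2 / (1 + g ** 2)

# Separation values reported for the revised instrument:
person_rel = separation_to_reliability(5.07)  # person separation -> alpha ~ .96
item_rel = separation_to_reliability(8.52)    # item separation   -> alpha ~ .99

# Collapsing the 5-point rating scale (0-4) into 3 points by merging
# category 1 with 2 and category 3 with 4, as the abstract describes:
collapse = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2}
recoded = [collapse[x] for x in [0, 1, 2, 3, 4]]  # -> [0, 1, 1, 2, 2]
```

Plugging in the revised separations reproduces the quoted alphas of .96 and .99 to two decimals.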

    Marker effects and examination reliability: a comparative exploration from the perspectives of generalizability theory, Rasch modelling and multilevel modelling

    This study looked at how three different analysis methods could help us to understand rater effects on exam reliability. The techniques we looked at were:

    - generalizability theory (G-theory)
    - item response theory (IRT), in particular the Many-Facets Partial Credit Rasch Model (MFRM)
    - multilevel modelling (MLM)

    We used data from AS component papers in geography and psychology for 2009, 2010 and 2011 from Edexcel.

    Psychometrics in Practice at RCEC

    A broad range of topics is dealt with in this volume: from combining the psychometric generalizability and item response theories to ideas for an integrated formative use of data-driven decision making, assessment for learning and diagnostic testing. A number of chapters pay attention to computerized (adaptive) and classification testing. Other chapters treat the quality of testing in a general sense, but for topics like maintaining standards or the testing of writing ability the quality of testing is dealt with more specifically. All authors are connected to RCEC as researchers. They each present one of their current research topics and so provide some insight into the focus of RCEC. The topics were selected and edited so that the book should be of special interest to educational researchers, psychometricians and practitioners in educational assessment

    Gender Fairness within the Force Concept Inventory

    Research on the test structure of the Force Concept Inventory (FCI) has largely ignored gender, and research on FCI gender effects (often reported as "gender gaps") has seldom interrogated the structure of the test. These rarely-crossed streams of research leave open the possibility that the FCI may not be structurally valid across genders, particularly since many reported results come from calculus-based courses where 75% or more of the students are men. We examine the FCI considering both psychometrics and gender disaggregation (while acknowledging this as a binary simplification), and find several problematic questions whose removal decreases the apparent gender gap. We analyze three samples (total N_pre = 5,391, N_post = 5,769) looking for gender asymmetries using Classical Test Theory, Item Response Theory, and Differential Item Functioning. The combination of these methods highlights six items that appear substantially unfair to women and two items biased in favor of women. No single physical concept or prior experience unifies these questions, but they are broadly consistent with problematic items identified in previous research. Removing all significantly gender-unfair items halves the gender gap in the main sample in this study. We recommend that instructors using the FCI report the reduced-instrument score as well as the 30-item score, and that credit or other benefits to students not be assigned using the biased items.
    Comment: 18 pages, 3 figures, 5 tables; submitted to Phys. Rev. PE
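    Differential Item Functioning screens of the kind described above are commonly run with the Mantel-Haenszel procedure: examinees are stratified by total score, and the odds of a correct response for the two groups are pooled across strata. A minimal sketch in plain Python with hypothetical counts (the strata below are illustrative, not the paper's data, and the paper's exact DIF procedure is not specified here):

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio for one item.

    Each stratum is a tuple (a, b, c, d):
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    OR_MH = sum(a*d/n) / sum(b*c/n), with n = a+b+c+d per stratum.
    A value well above 1 suggests the item favours the reference group.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical low/mid/high total-score strata for one item:
strata = [(30, 20, 20, 30), (45, 15, 35, 25), (55, 5, 50, 10)]
odds_ratio = mantel_haenszel_or(strata)  # > 1: easier for the reference group
```

    Stratifying by total score before pooling is what separates a DIF flag from a raw score gap: it asks whether equally able members of the two groups have equal odds on the item.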