93 research outputs found

    Rasch scaling procedures for informing development of a valid Fetal Surveillance Education Program multiple-choice assessment

    Abstract
    Background: It is widely recognised that deficiencies in fetal surveillance practice continue to contribute significantly to the burden of adverse outcomes. This has prompted the development of evidence-based clinical practice guidelines by the Royal Australian and New Zealand College of Obstetricians and Gynaecologists and an associated Fetal Surveillance Education Program to deliver the associated learning. This article describes initial steps in the validation of a corresponding multiple-choice assessment of the relevant educational outcomes through a combination of item response modelling and expert judgement.
    Methods: The Rasch item response model was employed for item and test analysis and to empirically derive the substantive interpretation of the assessment variable. This interpretation was then compared to the hierarchy of competencies specified a priori by a team of eight subject-matter experts. Classical Test Theory analyses were also conducted.
    Results: A high level of agreement between the hypothesised and derived variable provided evidence of construct validity. Item and test indices from Rasch analysis and Classical Test Theory analysis suggested that the current test form was of moderate quality. However, the analyses made clear the required steps for establishing a valid assessment of sufficient psychometric quality. These steps included: increasing the number of items from 40 to 50 in the first instance, reviewing ineffective items, targeting new items to specific content and difficulty gaps, and formalising the assessment blueprint in light of empirical information relating item structure to item difficulty.
    Conclusion: The application of the Rasch model for criterion-referenced assessment validation with an expert stakeholder group is herein described. Recommendations for subsequent item and test construction are also outlined in this article.
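
    The pairing of Rasch analysis with Classical Test Theory described above can be illustrated with a short sketch. The code below is not the study's analysis: the response data, sample size, and the logit-of-facility shortcut for item difficulty are all hypothetical, standing in for a proper Rasch calibration. It simulates dichotomous responses to a 40-item test, recovers crude Rasch-style item difficulties, and computes CTT item-total correlations and Cronbach's alpha.

```python
# Minimal sketch, assuming simulated 0/1 response data; not the authors' code.
import numpy as np

rng = np.random.default_rng(0)

def rasch_prob(theta, b):
    """P(correct) under the Rasch model for ability theta and item difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# Simulate 200 candidates answering a 40-item test (hypothetical numbers).
n_persons, n_items = 200, 40
theta = rng.normal(0.0, 1.0, n_persons)          # person abilities
b = np.linspace(-2.0, 2.0, n_items)              # item difficulties
p = rasch_prob(theta[:, None], b[None, :])
responses = (rng.random((n_persons, n_items)) < p).astype(int)

# Crude Rasch-style difficulty estimates: the logit of each item's facility.
facility = responses.mean(axis=0).clip(1e-3, 1 - 1e-3)
difficulty_hat = -np.log(facility / (1.0 - facility))

# Classical Test Theory indices: corrected item-total correlations and
# Cronbach's alpha for test reliability.
total = responses.sum(axis=1)
item_total_r = np.array(
    [np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
     for j in range(n_items)]
)
item_var = responses.var(axis=0, ddof=1)
alpha = (n_items / (n_items - 1)) * (1 - item_var.sum() / total.var(ddof=1))

print("estimated difficulties:", np.round(difficulty_hat[:5], 2))
print("item-total correlations:", np.round(item_total_r[:5], 2))
print("Cronbach's alpha: %.2f" % alpha)
```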

    Creating an Instrument to Measure Student Response to Instructional Practices

    Background: Calls for the reform of education in science, technology, engineering, and mathematics (STEM) have inspired many instructional innovations, some research-based. Yet adoption of such instruction has been slow. Research has suggested that students' response may significantly affect an instructor's willingness to adopt different types of instruction.
    Purpose: We created the Student Response to Instructional Practices (StRIP) instrument to measure the effects of several variables on student response to instructional practices. We discuss the step-by-step process for creating this instrument.
    Design/Method: The development process had six steps: item generation and construct development, validity testing, implementation, exploratory factor analysis, confirmatory factor analysis, and instrument modification and replication. We discuss pilot testing of the initial instrument, construct development, and validation using exploratory and confirmatory factor analyses.
    Results: This process produced 47 items measuring three parts of our framework. Types of instruction separated into four factors (interactive, constructive, active, and passive); strategies for using in-class activities into two factors (explanation and facilitation); and student responses to instruction into five factors (value, positivity, participation, distraction, and evaluation).
    Conclusions: We describe the design process and final results for our instrument, a useful tool for understanding the relationship between type of instruction and students' response.
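
    As a rough illustration of the exploratory-factor-analysis step mentioned above, the sketch below simulates Likert-style responses driven by two latent traits and recovers the loading pattern with scikit-learn's FactorAnalysis. The item pool, factor count, and loadings are hypothetical (not the StRIP data), and the varimax rotation argument assumes scikit-learn 0.24 or later.

```python
# Minimal EFA sketch on simulated survey data; not the StRIP development code.
import numpy as np
from sklearn.decomposition import FactorAnalysis  # rotation= needs scikit-learn >= 0.24

rng = np.random.default_rng(1)

# Simulate 300 students answering 12 items driven by two latent traits
# (hypothetically, something like "value" and "participation").
n_students, n_items, n_factors = 300, 12, 2
latent = rng.normal(size=(n_students, n_factors))
true_loadings = np.zeros((n_items, n_factors))
true_loadings[:6, 0] = 0.8      # items 1-6 load on factor 1
true_loadings[6:, 1] = 0.8      # items 7-12 load on factor 2
items = latent @ true_loadings.T + rng.normal(scale=0.5, size=(n_students, n_items))

# Exploratory factor analysis with varimax rotation.
efa = FactorAnalysis(n_components=n_factors, rotation="varimax")
efa.fit(items)
loadings = efa.components_.T    # shape (n_items, n_factors)

for j, row in enumerate(loadings, start=1):
    print(f"item {j:2d} loadings: {np.round(row, 2)}")
```

    In a real instrument study the rotated loadings would be inspected for cross-loading or weak items before moving on to confirmatory factor analysis on a separate sample, as the six-step process above describes.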

    Iterative Linking With the Differential Functioning of Items and Tests (DFIT) Method: Comparison of Testwide and Item Parameter Replication (IPR) Critical Values

    A Monte Carlo study was conducted to examine the accuracy of differential item functioning (DIF) detection using the differential functioning of items and tests (DFIT) method. Specifically, the performance of DFIT was compared using “testwide” critical values suggested by Flowers, Oshima, and Raju, based on simulations involving large numbers of DIF-free items, with item-specific critical values obtained via the newer item parameter replication (IPR) method. Also examined were the benefits of single-stage, two-stage, and iterative linking for dichotomous and ordered polytomous data involving samples of various sizes, tests of different lengths, types and percentages of DIF items, and levels of impact. Overall, the results indicated that testwide and IPR-based critical values corresponding to a nominal alpha of .01 provided similar power for detecting DIF due to shifts in extremity parameters, but IPR power was generally lower when DIF was due to differences in discrimination. In addition, IPR-based critical values provided as good as or better control of Type I error under most conditions, and results improved for all methods when using two-stage and iterative linking rather than single-stage linking and DIF analysis. The implications of these findings for future research involving the DFIT method with various item response models are discussed.
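
    To make the DFIT idea concrete, the simplified sketch below computes a DFIT-style noncompensatory DIF (NCDIF) index for hypothetical 2PL items by averaging the squared difference between focal- and reference-group item response functions over a simulated focal ability distribution. It assumes the item parameters are already on a common metric and omits the linking stages and the testwide/IPR critical values that the study compares.

```python
# Simplified NCDIF sketch with hypothetical 2PL parameters; not the full DFIT/IPR procedure.
import numpy as np

rng = np.random.default_rng(2)

def irf_2pl(theta, a, b):
    """2PL item response function: P(correct | theta) with discrimination a, difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Item parameters assumed to be on a common metric (i.e., already linked).
a_ref, b_ref = np.array([1.2, 0.9, 1.5]), np.array([-0.5, 0.0, 0.8])
a_foc, b_foc = np.array([1.2, 0.9, 0.7]), np.array([-0.5, 0.4, 0.8])
# Item 2 differs in difficulty (uniform DIF); item 3 differs in discrimination.

theta_focal = rng.normal(0.0, 1.0, 10_000)   # simulated focal-group abilities

# NCDIF: mean squared difference between the two groups' expected item scores,
# taken over the focal group's ability distribution.
ncdif = np.mean(
    (irf_2pl(theta_focal[:, None], a_foc, b_foc)
     - irf_2pl(theta_focal[:, None], a_ref, b_ref)) ** 2,
    axis=0,
)
print("NCDIF per item:", np.round(ncdif, 4))
```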
