    Calculating the random guess scores of multiple-response and matching test items

    For achievement tests, the guess score is often used as a baseline for the lowest possible grade in score-to-grade transformations and for setting cut scores. For test item types such as multiple-response, matching and drag-and-drop, determining the guess score requires more elaborate calculations than the straightforward calculation of the guess score for True-False and multiple-choice item formats. For various variants of multiple-response and matching types with respect to dichotomous and polytomous scoring, methods for determining the guess score are presented and illustrated with practical applications. The implications for theory and practice are discussed.
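
    To make the arithmetic concrete, here is a minimal Python sketch, not taken from the paper, of the random guess score for a multiple-response item under two common scoring rules. It assumes the examinee knows how many options to mark and selects that many uniformly at random; the function name and the specific scoring rules are illustrative assumptions.

```python
from math import comb

def guess_score_multiple_response(n_options: int, n_correct: int) -> dict:
    """Random guess scores for a multiple-response item, assuming the
    examinee marks exactly n_correct of the n_options options, chosen
    uniformly at random (a simplifying assumption)."""
    # Dichotomous scoring: full credit only for exactly the right subset.
    # Just one of the C(n, k) equally likely subsets is fully correct.
    dichotomous = 1 / comb(n_options, n_correct)

    # Polytomous (partial-credit) scoring: score = hits / n_correct.
    # Hits follow a hypergeometric distribution with mean k*r/n, so with
    # k = r the expected proportional score reduces to r/n.
    polytomous = n_correct / n_options

    return {"dichotomous": dichotomous, "polytomous": polytomous}

# Example: a "choose 2 of 5" item.
print(guess_score_multiple_response(5, 2))
# {'dichotomous': 0.1, 'polytomous': 0.4}
```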

    Developing and Verifying the Psychometric Integrity of the Certification Examination for Imaging Informatics Professionals

    The American Board of Imaging Informatics (ABII) was founded in 2005 by the Society of Imaging Informatics in Medicine (SIIM) and the American Registry of Radiologic Technologists (ARRT). ABII’s mission is to enhance patient care, professionalism, and competence in imaging informatics. This is accomplished primarily through the development and administration of a certification examination. The creation of the exam has been an exercise in open community involvement, with SIIM providing access to the PACS community and ARRT providing skilled psychometric support to ensure a balanced and comprehensive examination. The process to generate the exam required several years and the efforts of dozens of subject matter experts who volunteered to submit and validate questions for the examination. This article describes the organizational and statistical processes used to generate test items, assemble test forms, set performance standards, and validate test scores.

    A collaborative comparison of Objective Structured Clinical Examination (OSCE) standard setting methods at Australian medical schools

    Background: A key issue underpinning the usefulness of the OSCE assessment in medical education is standard-setting, but most standard-setting methods remain challenging for performance assessment because they produce varying passing marks. Several studies have compared standard-setting methods; however, most are limited in experimental scope, or use examinee performance data from a single OSCE station or a single medical school. This collaborative study between ten Australian medical schools investigated the effect of standard-setting methods on OSCE cut scores and failure rates. Methods: This research used 5,256 examinee scores from seven shared OSCE stations to calculate cut scores and failure rates using two different compromise standard-setting methods, namely the Borderline Regression and Cohen's methods. Results: The results of this study indicate that Cohen's method yields similar outcomes to the Borderline Regression method, particularly for large examinee cohort sizes. However, with lower examinee numbers on a station, the Borderline Regression method resulted in higher cut scores and larger difference margins in the failure rates. Conclusion: Cohen's method yields outcomes similar to the Borderline Regression method, and its application for benchmarking purposes and in resource-limited settings is justifiable, particularly with large examinee numbers.
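
    For readers unfamiliar with the two methods, the sketch below shows how each cut score is typically computed. The data are invented, and the 60% fraction and 95th percentile used for Cohen's method, as well as the 1-5 rating scale with 2 as the borderline grade, are common choices assumed here rather than values reported by the study.

```python
import numpy as np

def borderline_regression_cut(scores, global_ratings, borderline_rating=2):
    """Borderline Regression: regress station checklist scores on the
    examiners' global ratings and take the predicted score at the
    'borderline' rating as the cut score."""
    slope, intercept = np.polyfit(global_ratings, scores, deg=1)
    return slope * borderline_rating + intercept

def cohen_cut(scores, percentile=95, fraction=0.60):
    """Cohen's method: the cut score is a fixed fraction (often 60%) of
    the score achieved by a high-percentile examinee (often the 95th),
    so it needs no examiner panel."""
    return fraction * np.percentile(scores, percentile)

# Illustrative station data (hypothetical).
rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=200)           # global ratings 1-5
scores = 10 * ratings + rng.normal(0, 8, 200)    # checklist scores

print(borderline_regression_cut(scores, ratings))
print(cohen_cut(scores))
```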

    An assessment of functioning and non-functioning distractors in multiple-choice questions: a descriptive analysis

    Background: Four- or five-option multiple choice questions (MCQs) are the standard in health-science disciplines, both on certification-level examinations and on in-house developed tests. Previous research has shown, however, that few MCQs have three or four functioning distractors. The purpose of this study was to investigate non-functioning distractors in teacher-developed tests in one nursing program in an English-language university in Hong Kong. Methods: Using item-analysis data, we assessed the proportion of non-functioning distractors on a sample of seven test papers administered to undergraduate nursing students. A total of 514 items were reviewed, comprising 2056 options (1542 distractors and 514 correct responses). Non-functioning options were defined as those chosen by fewer than 5% of examinees or those with a positive option discrimination statistic. Results: The proportion of items containing 0, 1, 2, and 3 functioning distractors was 12.3%, 34.8%, 39.1%, and 13.8% respectively. Overall, items contained an average of 1.54 (SD = 0.88) functioning distractors. Only 52.2% (n = 805) of all distractors were functioning effectively and 10.2% (n = 158) had a choice frequency of 0. Items with more functioning distractors were more difficult and more discriminating. Conclusion: The low frequency of items with three functioning distractors in the four-option items in this study suggests that teachers have difficulty developing plausible distractors for most MCQs. Test items should consist of as many options as is feasible given the item content and the number of plausible distractors; in most cases this would be three. Item analysis results can be used to identify and remove non-functioning distractors from MCQs that have been used in previous tests.
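
    The study's two flagging rules translate directly into code. The sketch below, with hypothetical response data and function names, marks an option as non-functioning when fewer than 5% of examinees chose it or when its option discrimination, estimated here as a point-biserial correlation with total score, is positive.

```python
import numpy as np

def flag_distractors(responses, key, total_scores,
                     options=("A", "B", "C", "D"), min_freq=0.05):
    """Flag non-functioning distractors for one MCQ item.

    responses    : chosen option label per examinee
    key          : the correct option label
    total_scores : each examinee's total test score
    A distractor is non-functioning if chosen by fewer than 5% of
    examinees, or if the correlation between choosing it and the total
    score (its discrimination) is positive.
    """
    responses = np.asarray(responses)
    total_scores = np.asarray(total_scores, dtype=float)
    results = {}
    for option in options:
        if option == key:
            continue  # only distractors are evaluated
        chose = (responses == option).astype(float)
        freq = chose.mean()
        # Point-biserial discrimination; undefined (set to 0) when the
        # option was never chosen.
        disc = np.corrcoef(chose, total_scores)[0, 1] if chose.std() > 0 else 0.0
        results[option] = {
            "frequency": round(float(freq), 3),
            "discrimination": round(float(disc), 3),
            "non_functioning": freq < min_freq or disc > 0,
        }
    return results

# Hypothetical 20-examinee item with key 'B'.
resp = list("BBABBCBBDBBABBBBCBBB")
totals = [78, 74, 55, 80, 71, 50, 77, 69, 49, 83,
          76, 58, 72, 75, 79, 70, 52, 81, 68, 73]
print(flag_distractors(resp, "B", totals))
```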

    Motives of cheating among secondary students: The role of self-efficacy and peer influence

    A survey study was conducted with a sample of 100 students from a local secondary school to examine the motives for cheating. The primary focus of this study was the interplay among self-efficacy, peer influence and cheating. The results showed that students with low self-efficacy were more likely to cheat than those who perceived themselves as efficacious. It was further found that peers played a significant role in discouraging cheating by expressing disapproval and informing teachers of dishonest behaviour.

    Standard setting: Comparison of two methods

    BACKGROUND: The outcome of assessments is determined by the standard-setting method used. There is a wide range of standard-setting methods, and the two used most extensively in undergraduate medical education in the UK are the norm-reference and the criterion-reference methods. The aims of the study were to compare these two standard-setting methods for a multiple-choice question examination and to estimate the test-retest and inter-rater reliability of the modified Angoff method. METHODS: The norm-reference method of standard-setting (mean minus 1 SD) was applied to the 'raw' scores of 78 4th-year medical students on a multiple-choice question (MCQ) examination. Two panels of raters also set the standard using the modified Angoff method for the same multiple-choice question paper on two occasions (6 months apart). We compared the pass/fail rates derived from the norm-reference and the Angoff methods and also assessed the test-retest and inter-rater reliability of the modified Angoff method. RESULTS: The pass rate with the norm-reference method was 85% (66/78) and that with the Angoff method was 100% (78/78). The percentage agreement between the Angoff and norm-reference methods was 78% (95% CI 69%–87%). The modified Angoff method had an inter-rater reliability of 0.81–0.82 and a test-retest reliability of 0.59–0.74. CONCLUSION: There were significant differences in the outcomes of these two standard-setting methods, as shown by the difference in the proportion of candidates that passed and failed the assessment. The modified Angoff method was found to have good inter-rater reliability and moderate test-retest reliability.
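
    To make the two procedures concrete, here is a minimal sketch, with invented data, of how each cut score is computed: the norm-reference standard used in the study (mean minus 1 SD of the raw scores) and a modified Angoff standard (the sum, over items, of the raters' averaged probability estimates for a borderline candidate). The panel size and score distributions are assumptions for illustration.

```python
import numpy as np

def norm_reference_cut(raw_scores):
    """Norm-reference standard as used in the study: mean minus 1 SD."""
    raw_scores = np.asarray(raw_scores, dtype=float)
    return raw_scores.mean() - raw_scores.std(ddof=1)

def angoff_cut(panel_estimates):
    """Modified Angoff: each rater estimates, per item, the probability
    that a minimally competent ('borderline') candidate answers it
    correctly; the cut score is the sum of the per-item mean estimates.

    panel_estimates: 2-D array, shape (n_raters, n_items), values in [0, 1].
    """
    panel_estimates = np.asarray(panel_estimates, dtype=float)
    return panel_estimates.mean(axis=0).sum()

# Hypothetical example: 78 examinees on a 100-item MCQ, 5 Angoff raters.
rng = np.random.default_rng(1)
scores = rng.normal(65, 10, size=78).clip(0, 100)
estimates = rng.uniform(0.4, 0.8, size=(5, 100))

print(f"Norm-reference cut (mean - 1 SD): {norm_reference_cut(scores):.1f}")
print(f"Modified Angoff cut:              {angoff_cut(estimates):.1f}")
```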

    Summative assessment of 5th year medical students' clinical reasoning by script concordance test: requirements and challenges

    Background: The Script Concordance Test (SCT) has not been reported in summative assessment of students across the multiple domains of a medical curriculum. We report the steps used to build a test for summative assessment in a medical curriculum. Methods: A 51-case, 158-question, multidisciplinary paper was constructed to assess clinical reasoning in the 5th year. 10–16 experts in each of 7 discipline-based reference panels answered questions online. A multidisciplinary group considered reference panel data and data from a volunteer group of 6th Years, who sat the same test, to determine the passing score for the 5th Years. Results: The mean (SD) scores were 63.6 (7.6) and 68.6 (4.8) for the 6th Year (n = 23, alpha = 0.78) and 5th Year (n = 132, alpha = 0.62) groups (p < 0.05), respectively. The passing score was set at 4 SD from the expert mean. Four students failed. Conclusions: The SCT may be a useful method to assess clinical reasoning in medical students in multidisciplinary summative assessments. Substantial investment in training of faculty and students and in the development of questions is required.
    Paul Duggan and Bernard Charlin
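
    The SCT's characteristic aggregate scoring rule is straightforward to express in code: each response earns credit in proportion to how many panel experts chose it, normalised by the modal choice. The sketch below is a generic illustration with an invented panel, not the scoring script used in the study.

```python
from collections import Counter

def sct_scoring_key(panel_answers):
    """Build an aggregate scoring key for one SCT question.

    panel_answers: the reference panel's chosen responses (typically
    points on a -2..+2 Likert scale). Each response is worth
    (number of experts choosing it) / (modal frequency), so the modal
    response scores 1 and unchosen responses score 0.
    """
    counts = Counter(panel_answers)
    modal = max(counts.values())
    return {resp: n / modal for resp, n in counts.items()}

# Hypothetical question: 10 experts answered on a -2..+2 scale.
panel = [0, 1, 1, 1, 1, 1, 2, 2, 0, -1]
key = sct_scoring_key(panel)
print(key)                           # {0: 0.4, 1: 1.0, 2: 0.4, -1: 0.2}

student_answer = 2
print(key.get(student_answer, 0.0))  # credit for this question: 0.4
```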

    The Place of Psychometricians’ Beliefs in Educational Reform: A Rejoinder to Shepard


    How to measure the quality of the OSCE: A review of metrics – AMEE guide no. 49

    With an increasing use of criterion-based assessment techniques in both undergraduate and postgraduate healthcare programmes, there is a consequent need to ensure the quality and rigour of these assessments. The obvious question for those responsible for delivering assessment is how this 'quality' is measured, and what mechanisms might allow improvements in assessment quality to be demonstrated over time. Whilst a small base of literature exists, few papers give more than one or two metrics as measures of quality in Objective Structured Clinical Examinations (OSCEs). In this guide, aimed at assessment practitioners, the authors review the metrics that are available for measuring quality, indicate how a rounded picture of OSCE assessment quality may be constructed by using a variety of such measures, and consider which characteristics of the OSCE are appropriately judged by which measure(s). The authors discuss quality issues both at the individual station level and across the complete clinical assessment as a whole, using a series of 'worked examples' drawn from OSCE data sets from the authors' institution.
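
    As one concrete example of the kind of metric such a guide surveys, the sketch below computes Cronbach's alpha for a whole OSCE, treating stations as items. The data are invented, and alpha is only one of several complementary measures of exam quality; it is shown here because it is among the most widely reported.

```python
import numpy as np

def cronbach_alpha(station_scores):
    """Cronbach's alpha for an OSCE, treating stations as 'items'.

    station_scores: 2-D array, shape (n_examinees, n_stations).
    alpha = k/(k-1) * (1 - sum of station variances / variance of totals)
    """
    X = np.asarray(station_scores, dtype=float)
    k = X.shape[1]
    station_vars = X.var(axis=0, ddof=1)
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - station_vars.sum() / total_var)

# Hypothetical 12-station OSCE for 150 examinees.
rng = np.random.default_rng(2)
ability = rng.normal(0, 1, size=(150, 1))
scores = 60 + 8 * ability + rng.normal(0, 6, size=(150, 12))

print(f"Cronbach's alpha: {cronbach_alpha(scores):.2f}")
```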