
    An assessment of functioning and non-functioning distractors in multiple-choice questions: a descriptive analysis

    Background: Four- or five-option multiple choice questions (MCQs) are the standard in health-science disciplines, both on certification-level examinations and on in-house developed tests. Previous research has shown, however, that few MCQs have three or four functioning distractors. The purpose of this study was to investigate non-functioning distractors in teacher-developed tests in one nursing program in an English-language university in Hong Kong. Methods: Using item-analysis data, we assessed the proportion of non-functioning distractors on a sample of seven test papers administered to undergraduate nursing students. A total of 514 items were reviewed, comprising 2056 options (1542 distractors and 514 correct responses). Non-functioning options were defined as those chosen by fewer than 5% of examinees and those with a positive option discrimination statistic. Results: The proportions of items containing 0, 1, 2, and 3 functioning distractors were 12.3%, 34.8%, 39.1%, and 13.8% respectively. Overall, items contained an average of 1.54 (SD = 0.88) functioning distractors. Only 52.2% (n = 805) of all distractors were functioning effectively and 10.2% (n = 158) had a choice frequency of 0. Items with more functioning distractors were more difficult and more discriminating. Conclusion: The low frequency of items with three functioning distractors among the four-option items in this study suggests that teachers have difficulty developing plausible distractors for most MCQs. Test items should consist of as many options as is feasible given the item content and the number of plausible distractors; in most cases this would be three. Item-analysis results can be used to identify and remove non-functioning distractors from MCQs that have been used in previous tests.
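    The two screening criteria used in this study (choice frequency below 5%, or a positive option-discrimination statistic) are straightforward to compute from raw response data. The sketch below is a minimal illustration, not code from the paper: the `distractor_analysis` helper, its input layout, and the use of a point-biserial correlation as the option-discrimination statistic are assumptions made for the example.

```python
import numpy as np

def distractor_analysis(responses, correct_key, options=("A", "B", "C", "D")):
    """Flag non-functioning distractors for one test.

    responses   : list of lists, responses[i][j] is examinee i's answer to item j
    correct_key : list of correct options, one per item
    Returns a dict mapping (item index, option) -> report for each distractor.
    """
    responses = np.asarray(responses)
    n_examinees, n_items = responses.shape
    # Total score per examinee, used for the discrimination statistic.
    total = (responses == np.asarray(correct_key)).sum(axis=1).astype(float)

    report = {}
    for j in range(n_items):
        for opt in options:
            if opt == correct_key[j]:
                continue  # only distractors are assessed
            chose = (responses[:, j] == opt).astype(float)
            frequency = chose.mean()
            # Option discrimination: point-biserial correlation between choosing
            # this distractor and the total score. A positive value means the
            # stronger examinees are the ones favouring the distractor.
            if chose.std() == 0 or total.std() == 0:
                discrimination = 0.0
            else:
                discrimination = np.corrcoef(chose, total)[0, 1]
            report[(j, opt)] = {
                "frequency": frequency,
                "discrimination": discrimination,
                "functioning": frequency >= 0.05 and discrimination <= 0,
            }
    return report
```

    Applied to a full test, the resulting report can be filtered for entries with `functioning` set to False to list distractors worth revising or removing.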

    Standard setting: Comparison of two methods

    BACKGROUND: The outcome of assessments is determined by the standard-setting method used. There is a wide range of standard-setting methods, and the two used most extensively in undergraduate medical education in the UK are the norm-reference and the criterion-reference methods. The aims of the study were to compare these two standard-setting methods for a multiple-choice question examination and to estimate the test-retest and inter-rater reliability of the modified Angoff method. METHODS: The norm-reference method of standard setting (mean minus 1 SD) was applied to the 'raw' scores of 78 fourth-year medical students on a multiple-choice question (MCQ) examination. Two panels of raters also set the standard using the modified Angoff method for the same multiple-choice question paper on two occasions (6 months apart). We compared the pass/fail rates derived from the norm-reference and the Angoff methods and also assessed the test-retest and inter-rater reliability of the modified Angoff method. RESULTS: The pass rate with the norm-reference method was 85% (66/78) and that with the Angoff method was 100% (78/78). The percentage agreement between the Angoff and norm-reference methods was 78% (95% CI 69%–87%). The modified Angoff method had an inter-rater reliability of 0.81–0.82 and a test-retest reliability of 0.59–0.74. CONCLUSION: There were significant differences in the outcomes of these two standard-setting methods, as shown by the difference in the proportion of candidates that passed and failed the assessment. The modified Angoff method was found to have good inter-rater reliability and moderate test-retest reliability.
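    To make the mechanical difference between the two standards concrete, the following sketch computes a norm-referenced cut score (mean minus 1 SD) and a modified Angoff cut score from hypothetical data. The function names, data shapes, and toy numbers are assumptions for illustration only; they are not the study's scores or rater judgements.

```python
import numpy as np

def norm_reference_cut(scores):
    """Norm-reference cut score as used in the study: mean minus 1 SD (sample SD assumed)."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean() - scores.std(ddof=1)

def angoff_cut(judgements):
    """Modified Angoff cut score.

    judgements[r][i] is rater r's estimated probability that a borderline
    candidate answers item i correctly. Each rater's judgements are summed to
    give an expected borderline score; the cut score is the mean across raters.
    """
    judgements = np.asarray(judgements, dtype=float)
    return judgements.sum(axis=1).mean()

def pass_rate(scores, cut):
    scores = np.asarray(scores, dtype=float)
    return (scores >= cut).mean()

# Toy comparison on a 100-item, one-mark-per-item paper (illustrative numbers only).
rng = np.random.default_rng(0)
scores = rng.normal(65, 10, size=78)                # hypothetical candidate scores
judgements = rng.uniform(0.4, 0.8, size=(8, 100))   # 8 raters x 100 items
print("norm-reference pass rate:", pass_rate(scores, norm_reference_cut(scores)))
print("Angoff pass rate:        ", pass_rate(scores, angoff_cut(judgements)))
```

    Because the norm-reference rule ties the cut score to the cohort's own distribution, it fails a roughly fixed share of candidates regardless of how able the cohort is, whereas the Angoff standard is fixed by rater judgement and can pass (or fail) everyone.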

    Setting defensible standards in small cohort OSCEs: Understanding better when borderline regression can ‘work’

    Introduction: Borderline regression (BRM) is considered problematic in small-cohort OSCEs (e.g. n < 50), with institutions often relying on item-centred standard-setting approaches, which can be resource-intensive and lack defensibility in performance tests. Methods: Through an analysis of post-hoc station- and test-level metrics, we investigate the application of BRM in three different small-cohort OSCE contexts: the exam for international medical graduates wanting to practise in the UK, senior sequential undergraduate exams, and physician associate exams in a large UK medical school. Results: We find that BRM provides robust metrics, and concomitantly defensible cut scores, in the majority of stations (5%, 14%, and 12% of stations were problematic across the three contexts, respectively). Where problems occur, this is generally because the relationship between global grades and checklist scores is too weak to give confidence in the standard set by BRM in those stations. Conclusion: This work challenges previous assumptions about the application of BRM in small test cohorts. Where there is sufficient spread of ability, BRM will generally provide defensible standards, assuming careful design of station-level scoring instruments. However, existing station cut scores are preferred as a substitute where BRM standard-setting problems do occur.
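    As a rough sketch of how a borderline regression cut score is derived at station level (the `brm_cut_score` helper, the numeric grade coding, and the use of ordinary least squares via `np.polyfit` are assumptions for illustration, not the paper's implementation): the checklist score is regressed on the examiner's global grade, and the cut score is taken as the predicted checklist score at the borderline grade. A weak grade/score relationship, of the kind the abstract describes, shows up as a low correlation coefficient.

```python
import numpy as np

def brm_cut_score(checklist_scores, global_grades, borderline_grade=2):
    """Borderline regression cut score for one OSCE station.

    checklist_scores : station checklist score for each candidate
    global_grades    : examiner's global grade for each candidate, coded
                       numerically (assumed coding: 1 = fail, 2 = borderline,
                       3 = pass, 4 = good, 5 = excellent)
    Returns the cut score and the grade/score correlation for the station.
    """
    x = np.asarray(global_grades, dtype=float)
    y = np.asarray(checklist_scores, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # simple linear regression of score on grade
    r = np.corrcoef(x, y)[0, 1]              # strength of the grade/score relationship
    cut = intercept + slope * borderline_grade
    return cut, r  # a weak or negative r flags a potentially problematic station
```

    In practice the per-station correlation (or R squared) returned here is the kind of post-hoc metric that would prompt falling back to an existing station cut score rather than the BRM-derived one.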