    Calculating the random guess scores of multiple-response and matching test items

    For achievement tests, the guess score is often used as a baseline for the lowest possible grade in score-to-grade transformations and for setting cut scores. For test item types such as multiple-response, matching and drag-and-drop, determining the guess score requires more elaborate calculations than the straightforward calculation of the guess score for True-False and multiple-choice item formats. For various variants of multiple-response and matching types with respect to dichotomous and polytomous scoring, methods for determining the guess score are presented and illustrated with practical applications. The implications for theory and practice are discussed.
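
    To make the arithmetic concrete, here is a minimal Python sketch, not taken from the paper, of the random guess score for a multiple-response item under two common scoring rules. It assumes the examinee knows how many options to mark and selects that many uniformly at random; the function name and the specific scoring rules are illustrative assumptions.

```python
from math import comb

def guess_score_multiple_response(n_options: int, n_correct: int) -> dict:
    """Random guess scores for a multiple-response item, assuming the
    examinee marks exactly n_correct of the n_options options, chosen
    uniformly at random (a simplifying assumption)."""
    # Dichotomous scoring: full credit only for exactly the right subset.
    # Just one of the C(n, k) equally likely subsets is fully correct.
    dichotomous = 1 / comb(n_options, n_correct)

    # Polytomous (partial-credit) scoring: score = hits / n_correct.
    # Hits follow a hypergeometric distribution with mean k*r/n, so with
    # k = r the expected proportional score reduces to r/n.
    polytomous = n_correct / n_options

    return {"dichotomous": dichotomous, "polytomous": polytomous}

# Example: a "choose 2 of 5" item.
print(guess_score_multiple_response(5, 2))
# {'dichotomous': 0.1, 'polytomous': 0.4}
```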

    Developing and Verifying the Psychometric Integrity of the Certification Examination for Imaging Informatics Professionals

    The American Board of Imaging Informatics (ABII) was founded in 2005 by the Society of Imaging Informatics in Medicine (SIIM) and the American Registry of Radiologic Technologists (ARRT). ABII’s mission is to enhance patient care, professionalism, and competence in imaging informatics. This is accomplished primarily through the development and administration of a certification examination. The creation of the exam has been an exercise in open community involvement, with SIIM providing access to the PACS community and ARRT providing skilled psychometric support to ensure a balanced and comprehensive examination. The process to generate the exam required several years and the efforts of dozens of subject matter experts who volunteered to submit and validate questions for the examination. This article describes the organizational and statistical processes used to generate test items, assemble test forms, set performance standards, and validate test scores.

    A collaborative comparison of Objective Structured Clinical Examination (OSCE) standard setting methods at Australian medical schools

    Background: A key issue underpinning the usefulness of the OSCE assessment in medical education is standard-setting, but most standard-setting methods remain challenging for performance assessment because they produce varying passing marks. Several studies have compared standard-setting methods; however, most are limited in experimental scope, or use examinee performance data from a single OSCE station or a single medical school. This collaborative study between ten Australian medical schools investigated the effect of standard-setting methods on OSCE cut scores and failure rates. Methods: This research used 5,256 examinee scores from seven shared OSCE stations to calculate cut scores and failure rates using two different compromise standard-setting methods, namely the Borderline Regression and Cohen's methods. Results: The results of this study indicate that Cohen's method yields similar outcomes to the Borderline Regression method, particularly for large examinee cohort sizes. However, with lower examinee numbers on a station, the Borderline Regression method resulted in higher cut scores and larger difference margins in the failure rates. Conclusion: Cohen's method yields outcomes similar to the Borderline Regression method, and its application for benchmarking purposes and in resource-limited settings is justifiable, particularly with large examinee numbers.
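
    For readers unfamiliar with the two methods, the sketch below shows how each cut score is typically computed. The data are invented, and the 60% fraction and 95th percentile used for Cohen's method, as well as the 1-5 rating scale with 2 as the borderline grade, are common choices assumed here rather than values reported by the study.

```python
import numpy as np

def borderline_regression_cut(scores, global_ratings, borderline_rating=2):
    """Borderline Regression: regress station checklist scores on the
    examiners' global ratings and take the predicted score at the
    'borderline' rating as the cut score."""
    slope, intercept = np.polyfit(global_ratings, scores, deg=1)
    return slope * borderline_rating + intercept

def cohen_cut(scores, percentile=95, fraction=0.60):
    """Cohen's method: the cut score is a fixed fraction (often 60%) of
    the score achieved by a high-percentile examinee (often the 95th),
    so it needs no examiner panel."""
    return fraction * np.percentile(scores, percentile)

# Illustrative station data (hypothetical).
rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=200)           # global ratings 1-5
scores = 10 * ratings + rng.normal(0, 8, 200)    # checklist scores

print(borderline_regression_cut(scores, ratings))
print(cohen_cut(scores))
```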

    An assessment of functioning and non-functioning distractors in multiple-choice questions: a descriptive analysis

    Background: Four- or five-option multiple choice questions (MCQs) are the standard in health-science disciplines, both on certification-level examinations and on in-house developed tests. Previous research has shown, however, that few MCQs have three or four functioning distractors. The purpose of this study was to investigate non-functioning distractors in teacher-developed tests in one nursing program in an English-language university in Hong Kong. Methods: Using item-analysis data, we assessed the proportion of non-functioning distractors on a sample of seven test papers administered to undergraduate nursing students. A total of 514 items were reviewed, comprising 2056 options (1542 distractors and 514 correct responses). Non-functioning options were defined as those chosen by fewer than 5% of examinees or those with a positive option discrimination statistic. Results: The proportion of items containing 0, 1, 2, and 3 functioning distractors was 12.3%, 34.8%, 39.1%, and 13.8% respectively. Overall, items contained an average of 1.54 (SD = 0.88) functioning distractors. Only 52.2% (n = 805) of all distractors were functioning effectively and 10.2% (n = 158) had a choice frequency of 0. Items with more functioning distractors were more difficult and more discriminating. Conclusion: The low frequency of items with three functioning distractors in the four-option items in this study suggests that teachers have difficulty developing plausible distractors for most MCQs. Test items should consist of as many options as is feasible given the item content and the number of plausible distractors; in most cases this would be three. Item analysis results can be used to identify and remove non-functioning distractors from MCQs that have been used in previous tests.
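
    The study's two flagging rules translate directly into code. The sketch below, with hypothetical response data and function names, marks an option as non-functioning when fewer than 5% of examinees chose it or when its option discrimination, estimated here as a point-biserial correlation with total score, is positive.

```python
import numpy as np

def flag_distractors(responses, key, total_scores,
                     options=("A", "B", "C", "D"), min_freq=0.05):
    """Flag non-functioning distractors for one MCQ item.

    responses    : chosen option label per examinee
    key          : the correct option label
    total_scores : each examinee's total test score
    A distractor is non-functioning if chosen by fewer than 5% of
    examinees, or if the correlation between choosing it and the total
    score (its discrimination) is positive.
    """
    responses = np.asarray(responses)
    total_scores = np.asarray(total_scores, dtype=float)
    results = {}
    for option in options:
        if option == key:
            continue  # only distractors are evaluated
        chose = (responses == option).astype(float)
        freq = chose.mean()
        # Point-biserial discrimination; undefined (set to 0) when the
        # option was never chosen.
        disc = np.corrcoef(chose, total_scores)[0, 1] if chose.std() > 0 else 0.0
        results[option] = {
            "frequency": round(float(freq), 3),
            "discrimination": round(float(disc), 3),
            "non_functioning": freq < min_freq or disc > 0,
        }
    return results

# Hypothetical 20-examinee item with key 'B'.
resp = list("BBABBCBBDBBABBBBCBBB")
totals = [78, 74, 55, 80, 71, 50, 77, 69, 49, 83,
          76, 58, 72, 75, 79, 70, 52, 81, 68, 73]
print(flag_distractors(resp, "B", totals))
```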

    Motives of cheating among secondary students: The role of self-efficacy and peer influence

    A survey study was conducted with a sample of 100 students from a local secondary school to examine the motives for cheating. The primary focus of this study was the interplay among self-efficacy, peer influence and cheating. The results showed that students with low self-efficacy were more likely to cheat than those who perceived themselves as efficacious. It was further found that peers played a significant role in discouraging cheating by expressing disapproval and informing teachers of dishonest behaviour.

    Standard setting: Comparison of two methods

    BACKGROUND: The outcome of assessments is determined by the standard-setting method used. There is a wide range of standard-setting methods, and the two used most extensively in undergraduate medical education in the UK are the norm-reference and the criterion-reference methods. The aims of the study were to compare these two standard-setting methods for a multiple-choice question examination and to estimate the test-retest and inter-rater reliability of the modified Angoff method. METHODS: The norm-reference method of standard-setting (mean minus 1 SD) was applied to the 'raw' scores of 78 4th-year medical students on a multiple-choice question (MCQ) examination. Two panels of raters also set the standard using the modified Angoff method for the same multiple-choice question paper on two occasions (6 months apart). We compared the pass/fail rates derived from the norm-reference and the Angoff methods and also assessed the test-retest and inter-rater reliability of the modified Angoff method. RESULTS: The pass rate with the norm-reference method was 85% (66/78) and that with the Angoff method was 100% (78/78). The percentage agreement between the Angoff and norm-reference methods was 78% (95% CI 69%–87%). The modified Angoff method had an inter-rater reliability of 0.81–0.82 and a test-retest reliability of 0.59–0.74. CONCLUSION: There were significant differences in the outcomes of these two standard-setting methods, as shown by the difference in the proportion of candidates that passed and failed the assessment. The modified Angoff method was found to have good inter-rater reliability and moderate test-retest reliability.
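
    To make the two procedures concrete, here is a minimal sketch, with invented data, of how each cut score is computed: the norm-reference standard used in the study (mean minus 1 SD of the raw scores) and a modified Angoff standard (the sum, over items, of the raters' averaged probability estimates for a borderline candidate). The panel size and score distributions are assumptions for illustration.

```python
import numpy as np

def norm_reference_cut(raw_scores):
    """Norm-reference standard as used in the study: mean minus 1 SD."""
    raw_scores = np.asarray(raw_scores, dtype=float)
    return raw_scores.mean() - raw_scores.std(ddof=1)

def angoff_cut(panel_estimates):
    """Modified Angoff: each rater estimates, per item, the probability
    that a minimally competent ('borderline') candidate answers it
    correctly; the cut score is the sum of the per-item mean estimates.

    panel_estimates: 2-D array, shape (n_raters, n_items), values in [0, 1].
    """
    panel_estimates = np.asarray(panel_estimates, dtype=float)
    return panel_estimates.mean(axis=0).sum()

# Hypothetical example: 78 examinees on a 100-item MCQ, 5 Angoff raters.
rng = np.random.default_rng(1)
scores = rng.normal(65, 10, size=78).clip(0, 100)
estimates = rng.uniform(0.4, 0.8, size=(5, 100))

print(f"Norm-reference cut (mean - 1 SD): {norm_reference_cut(scores):.1f}")
print(f"Modified Angoff cut:              {angoff_cut(estimates):.1f}")
```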

    Summative assessment of 5th year medical students' clinical reasoning by script concordance test: requirements and challenges

    Background: The Script Concordance Test (SCT) has not been reported in summative assessment of students across the multiple domains of a medical curriculum. We report the steps used to build a test for summative assessment in a medical curriculum. Methods: A 51-case, 158-question, multidisciplinary paper was constructed to assess clinical reasoning in the 5th year. 10–16 experts in each of 7 discipline-based reference panels answered questions online. A multidisciplinary group considered reference panel data and data from a volunteer group of 6th Years, who sat the same test, to determine the passing score for the 5th Years. Results: The mean (SD) scores were 63.6 (7.6) and 68.6 (4.8) for the 6th Year (n = 23, alpha = 0.78) and 5th Year (n = 132, alpha = 0.62) groups (p < 0.05), respectively. The passing score was set at 4 SD from the expert mean. Four students failed. Conclusions: The SCT may be a useful method to assess clinical reasoning in medical students in multidisciplinary summative assessments. Substantial investment in training of faculty and students and in the development of questions is required.
    Paul Duggan and Bernard Charlin
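
    The SCT's characteristic aggregate scoring rule is straightforward to express in code: each response earns credit in proportion to how many panel experts chose it, normalised by the modal choice. The sketch below is a generic illustration with an invented panel, not the scoring script used in the study.

```python
from collections import Counter

def sct_scoring_key(panel_answers):
    """Build an aggregate scoring key for one SCT question.

    panel_answers: the reference panel's chosen responses (typically
    points on a -2..+2 Likert scale). Each response is worth
    (number of experts choosing it) / (modal frequency), so the modal
    response scores 1 and unchosen responses score 0.
    """
    counts = Counter(panel_answers)
    modal = max(counts.values())
    return {resp: n / modal for resp, n in counts.items()}

# Hypothetical question: 10 experts answered on a -2..+2 scale.
panel = [0, 1, 1, 1, 1, 1, 2, 2, 0, -1]
key = sct_scoring_key(panel)
print(key)                           # {0: 0.4, 1: 1.0, 2: 0.4, -1: 0.2}

student_answer = 2
print(key.get(student_answer, 0.0))  # credit for this question: 0.4
```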

    The Place of Psychometricians’ Beliefs in Educational Reform: A Rejoinder to Shepard


    How to measure the quality of the OSCE: A review of metrics – AMEE guide no. 49

    With an increasing use of criterion-based assessment techniques in both undergraduate and postgraduate healthcare programmes, there is a consequent need to ensure the quality and rigour of these assessments. The obvious question for those responsible for delivering assessment is how this 'quality' is measured, and what mechanisms might allow improvements in assessment quality to be demonstrated over time. Whilst a small base of literature exists, few papers give more than one or two metrics as measures of quality in Objective Structured Clinical Examinations (OSCEs). In this guide, aimed at assessment practitioners, the authors review the metrics that are available for measuring quality, indicate how a rounded picture of OSCE assessment quality may be constructed by using a variety of such measures, and consider which characteristics of the OSCE are appropriately judged by which measure(s). The authors discuss quality issues both at the individual station level and across the complete clinical assessment as a whole, using a series of 'worked examples' drawn from OSCE data sets from the authors' institution.
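
    As one concrete example of the kind of metric such a guide surveys, the sketch below computes Cronbach's alpha for a whole OSCE, treating stations as items. The data are invented, and alpha is only one of several complementary measures of exam quality; it is shown here because it is among the most widely reported.

```python
import numpy as np

def cronbach_alpha(station_scores):
    """Cronbach's alpha for an OSCE, treating stations as 'items'.

    station_scores: 2-D array, shape (n_examinees, n_stations).
    alpha = k/(k-1) * (1 - sum of station variances / variance of totals)
    """
    X = np.asarray(station_scores, dtype=float)
    k = X.shape[1]
    station_vars = X.var(axis=0, ddof=1)
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - station_vars.sum() / total_var)

# Hypothetical 12-station OSCE for 150 examinees.
rng = np.random.default_rng(2)
ability = rng.normal(0, 1, size=(150, 1))
scores = 60 + 8 * ability + rng.normal(0, 6, size=(150, 12))

print(f"Cronbach's alpha: {cronbach_alpha(scores):.2f}")
```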