Objective: To assess the sources of measurement error in an electrocardiogram (ECG) interpretation examination given in a third-year internal medicine clerkship.

Design: Three successive generalizability studies were conducted. 1) Multiple faculty rated student responses to a previously administered examination. 2) The rating criteria were revised and study 1 was repeated. 3) The examination was converted into an extended-matching format that included multiple cases with the same underlying cardiac problem.

Results: Discrepancies among raters (main effects and interactions) were dwarfed by the error associated with case specificity. The largest source of disagreement among raters lay in rating student errors of commission rather than student errors of omission. Revising the rating criteria may have improved inter-rater reliability slightly; however, because of case specificity, the revisions had little impact on the overall reliability of the examination. The third study indicated that most of the variability in student performance across cases arose from performance across cases within the same type of cardiac problem rather than between different types of cardiac problems.

Conclusions: Case specificity was the overwhelming source of measurement error. The variation among cases came mainly from discrepancies in performance between examples of the same cardiac problem rather than from differences in performance across different types of cardiac problems. This suggests that a large number of cases must be included even if the goal is to assess performance on only a few types of cardiac problems.
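The kind of analysis the studies describe can be illustrated with a small variance-components sketch. The design size, score scale, and variance values below are hypothetical, not taken from the study; the code estimates person, case, and person-by-case (residual) variance components for a fully crossed persons × cases design from two-way ANOVA expected mean squares, the standard approach in a simple generalizability study.

```python
import random

random.seed(0)

n_persons, n_cases = 30, 8

# Simulate a fully crossed persons x cases score matrix with a large
# person-by-case component, echoing a case-specificity-dominated exam.
# All effect sizes here are illustrative assumptions.
person_eff = [random.gauss(0, 1.0) for _ in range(n_persons)]
case_eff = [random.gauss(0, 0.5) for _ in range(n_cases)]
scores = [[person_eff[p] + case_eff[c] + random.gauss(0, 2.0)
           for c in range(n_cases)] for p in range(n_persons)]

grand = sum(sum(row) for row in scores) / (n_persons * n_cases)
p_means = [sum(row) / n_cases for row in scores]
c_means = [sum(scores[p][c] for p in range(n_persons)) / n_persons
           for c in range(n_cases)]

# Two-way ANOVA sums of squares (one observation per cell)
ss_p = n_cases * sum((m - grand) ** 2 for m in p_means)
ss_c = n_persons * sum((m - grand) ** 2 for m in c_means)
ss_res = sum((scores[p][c] - p_means[p] - c_means[c] + grand) ** 2
             for p in range(n_persons) for c in range(n_cases))

ms_p = ss_p / (n_persons - 1)
ms_c = ss_c / (n_cases - 1)
ms_res = ss_res / ((n_persons - 1) * (n_cases - 1))

# Solve the expected mean squares for the variance components
var_res = ms_res                             # person-by-case (case specificity)
var_p = max((ms_p - ms_res) / n_cases, 0.0)  # persons (true-score variance)
var_c = max((ms_c - ms_res) / n_persons, 0.0)

# Generalizability coefficient for a relative decision on an n_cases-case exam
g_coef = var_p / (var_p + var_res / n_cases)
print(f"persons={var_p:.2f} cases={var_c:.2f} "
      f"person-x-case={var_res:.2f} G={g_coef:.2f}")
```

When the person-by-case component dominates, as the abstract reports, the only way to raise the generalizability coefficient is to increase the number of cases, which is the design implication stated in the conclusions.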