Can Physicians Identify Inappropriate Nuclear Stress Tests? An Examination of Inter-Rater Reliability for the 2009 Appropriate Use Criteria for Radionuclide Imaging

Bokhari, Sabahat; Einstein, Andrew; Kelly, Christopher; Kelly, Maureen; Lewis, Matthew J.; Paz, Yehuda E.; Peck, Clara L.; Rabbani, Leroy E.; Rao, Shaline; Weiner, Shepard David; Ye, Siqin

Publication date: 1 January 2015
Publisher: Columbia University Libraries/Information Services

Abstract

Background: We sought to determine the inter-rater reliability of the 2009 Appropriate Use Criteria for radionuclide imaging and whether physicians at various levels of training can effectively identify nuclear stress tests with inappropriate indications.

Methods and Results: Four hundred patients were randomly selected from a consecutive cohort of patients undergoing nuclear stress testing at an academic medical center. Raters with different levels of training (cardiology attending physicians, cardiology fellows, internal medicine hospitalists, and internal medicine interns) classified individual nuclear stress tests using the 2009 Appropriate Use Criteria. Consensus classification by 2 cardiologists was considered the operational gold standard, and the sensitivity and specificity of individual raters for identifying inappropriate tests were calculated against it. Inter-rater reliability of the Appropriate Use Criteria was assessed using Cohen κ statistics for pairs of different raters. The mean age of patients was 61.5 years; 214 (54%) were female. The cardiologists rated 256 (64%) of the 400 nuclear stress tests as appropriate, 68 (18%) as uncertain, and 55 (14%) as inappropriate; 21 (5%) tests could not be classified. Inter-rater reliability for noncardiologist raters was modest (unweighted Cohen κ, 0.51; 95% confidence interval, 0.45–0.55). Sensitivity of individual raters for identifying inappropriate tests ranged from 47% to 82%, while specificity ranged from 85% to 97%.

Conclusions: Inter-rater reliability for the 2009 Appropriate Use Criteria for radionuclide imaging is modest, and there is considerable variation in the ability of raters at different levels of training to identify inappropriate tests.
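
For readers unfamiliar with the statistics named in the abstract, the following is a minimal Python sketch of unweighted Cohen κ for one pair of raters, and of sensitivity and specificity for flagging inappropriate tests against a reference classification. It is illustrative only and is not the authors' analysis code. The function names, the label strings, and the eight example ratings are hypothetical, and treating every label other than "inappropriate" as a negative is an assumption of this sketch, not something stated in the abstract.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def sensitivity_specificity(reference, rater, positive="inappropriate"):
    """Sensitivity and specificity of `rater` for the `positive` label.

    Assumption of this sketch: any other label counts as a negative.
    """
    tp = fn = tn = fp = 0
    for ref, got in zip(reference, rater):
        if ref == positive:
            if got == positive:
                tp += 1
            else:
                fn += 1
        else:
            if got == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference needs both positive and negative items")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical ratings for eight tests (not data from the study).
reference = ["appropriate", "appropriate", "inappropriate", "uncertain",
             "inappropriate", "appropriate", "uncertain", "appropriate"]
rater = ["appropriate", "uncertain", "inappropriate", "uncertain",
         "appropriate", "appropriate", "appropriate", "appropriate"]

print("kappa:", round(cohen_kappa(reference, rater), 3))
print("sensitivity, specificity:", sensitivity_specificity(reference, rater))
```

In this toy example the rater catches one of the two tests the reference calls inappropriate and never flags a test the reference does not, so sensitivity is lower than specificity, the same direction as the ranges reported in the abstract.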
