Background
Although interpretive performance varies substantially among radiologists, such variation has not been examined among mammography facilities. Understanding sources of facility variation could become a foundation for improving interpretive performance.

Methods
In this cross-sectional study conducted between 1996 and 2002, we surveyed 53 facilities to evaluate associations between facility structure, interpretive process characteristics, and interpretive performance of screening mammography (ie, sensitivity, specificity, positive predictive value [PPV1], and the likelihood of cancer among women who were referred for biopsy [PPV2]). Measures of interpretive performance were ascertained prospectively from mammography interpretations and cancer data collected by the Breast Cancer Surveillance Consortium. Logistic regression and receiver operating characteristic (ROC) curve analyses estimated the association between facility characteristics and mammography interpretive performance or accuracy (area under the ROC curve [AUC]). All P values were two-sided.

Results
Of the 53 eligible facilities, data on 44 could be analyzed. These 44 facilities accounted for 484 463 screening mammograms performed on 237 669 women, of whom 2686 were diagnosed with breast cancer during follow-up. Among the 44 facilities, mean sensitivity was 79.6% (95% confidence interval [CI] = 74.3% to 84.9%), mean specificity was 90.2% (95% CI = 88.3% to 92.0%), mean PPV1 was 4.1% (95% CI = 3.5% to 4.7%), and mean PPV2 was 38.8% (95% CI = 32.6% to 45.0%). The facilities varied statistically significantly in specificity (P < .001), PPV1 (P < .001), and PPV2 (P = .002) but not in sensitivity (P = .99).
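To make the four performance measures concrete, the following is a minimal sketch that computes them from one facility's outcome counts. It assumes the usual screening-audit conventions (a screen is classified as positive or negative at interpretation, and a cancer is counted if diagnosed within a fixed follow-up window); the abstract does not spell out these details. The `FacilityCounts` class, its field names, and the example counts are hypothetical illustrations, not study data.

```python
from dataclasses import dataclass


@dataclass
class FacilityCounts:
    """Hypothetical screening outcome counts for a single facility.

    Assumes each screen is labeled positive/negative at interpretation and
    linked to cancer diagnoses within a fixed follow-up window (assumption).
    """
    true_pos: int                   # positive screen, cancer diagnosed in follow-up
    false_pos: int                  # positive screen, no cancer diagnosed
    true_neg: int                   # negative screen, no cancer diagnosed
    false_neg: int                  # negative screen, cancer diagnosed in follow-up
    biopsy_recommended: int         # women referred for biopsy
    biopsy_recommended_cancer: int  # of those referred, women diagnosed with cancer


def sensitivity(c: FacilityCounts) -> float:
    # Share of cancers that had a positive screen.
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: FacilityCounts) -> float:
    # Share of cancer-free women who had a negative screen.
    return c.true_neg / (c.true_neg + c.false_pos)


def ppv1(c: FacilityCounts) -> float:
    # Share of positive screens that were followed by a cancer diagnosis.
    return c.true_pos / (c.true_pos + c.false_pos)


def ppv2(c: FacilityCounts) -> float:
    # Share of women referred for biopsy who were diagnosed with cancer.
    return c.biopsy_recommended_cancer / c.biopsy_recommended


# Hypothetical counts chosen only to show the arithmetic.
example = FacilityCounts(
    true_pos=40, false_pos=960, true_neg=8990, false_neg=10,
    biopsy_recommended=100, biopsy_recommended_cancer=38,
)
```

With these made-up counts, sensitivity is 40/50 = 0.80, PPV1 is 40/1000 = 0.04, and PPV2 is 38/100 = 0.38. Note that PPV1 is small even when specificity is high because cancer is rare in a screening population.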
AUC was higher among facilities that offered screening mammograms alone vs those that offered both screening and diagnostic mammograms (0.943 vs 0.911, P = .006), that had a breast imaging specialist interpreting mammograms vs those that did not (0.932 vs 0.905, P = .004), that did not perform double reading vs those that performed independent or consensus double reading (0.925 vs 0.915 vs 0.887, P = .034), or that conducted audit reviews two or more times per year vs annually or at an unknown frequency (0.929 vs 0.904 vs 0.900, P = .018).

Conclusion
Mammography interpretive performance varies statistically significantly by facility.
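The AUC values above summarize accuracy as the area under an ROC curve. As an illustration only, the sketch below computes the empirical (Mann-Whitney) AUC: the probability that a randomly chosen woman with cancer received a more suspicious score than a randomly chosen woman without cancer, with ties counted as one half. It assumes each screen carries an ordinal suspicion score (for example, an assessment category mapped to an ordered scale); the function name and example scores are hypothetical, and the study's own ROC method may differ, for instance by using a parametric or regression-based ROC model.

```python
from typing import Sequence


def empirical_auc(cancer_scores: Sequence[float],
                  noncancer_scores: Sequence[float]) -> float:
    """Empirical AUC via pairwise comparison (Mann-Whitney form).

    Higher scores are assumed to mean "more suspicious". Ties between a
    cancer and a non-cancer score count as half a correct ordering.
    """
    if not cancer_scores or not noncancer_scores:
        raise ValueError("need at least one case in each group")
    wins = 0.0
    for s_pos in cancer_scores:
        for s_neg in noncancer_scores:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(cancer_scores) * len(noncancer_scores))


# Hypothetical ordinal suspicion scores (not study data).
cancer = [5, 4, 4, 2]
noncancer = [1, 1, 2, 3, 4]
auc = empirical_auc(cancer, noncancer)
```

For these made-up scores, 16.5 of the 20 cancer/non-cancer pairs are correctly ordered, giving an AUC of 0.825. The double loop is O(n·m) and is meant for clarity; with hundreds of thousands of screens, a rank-based computation of the same statistic would be the practical choice.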