10 research outputs found
End-to-end Prostate Cancer Detection in bpMRI via 3D CNNs: Effects of Attention Mechanisms, Clinical Priori and Decoupled False Positive Reduction
We present a multi-stage 3D computer-aided detection and diagnosis (CAD)
model for automated localization of clinically significant prostate cancer
(csPCa) in bi-parametric MR imaging (bpMRI). Deep attention mechanisms drive
its detection network, targeting salient structures and highly discriminative
feature dimensions across multiple resolutions. Its goal is to accurately
identify csPCa lesions from indolent cancer and the wide range of benign
pathology that can afflict the prostate gland. Simultaneously, a decoupled
residual classifier is used to achieve consistent false positive reduction,
without sacrificing high sensitivity or computational efficiency. In order to
guide model generalization with domain-specific clinical knowledge, a
probabilistic anatomical prior is used to encode the spatial prevalence and
zonal distinction of csPCa. Using a large dataset of 1950 prostate bpMRI paired
with radiologically-estimated annotations, we hypothesize that such CNN-based
models can be trained to detect biopsy-confirmed malignancies in an independent
cohort.
For 486 institutional testing scans, the 3D CAD system achieves
83.695.22% and 93.192.96% detection sensitivity at 0.50 and 1.46
false positive(s) per patient, respectively, with 0.8820.030 AUROC in
patient-based diagnosis significantly outperforming four state-of-the-art
baseline architectures (U-SEResNet, UNet++, nnU-Net, Attention U-Net) from
recent literature. For 296 external biopsy-confirmed testing scans, the
ensembled CAD system shares moderate agreement with a consensus of expert
radiologists (76.69%; 0.510.04) and independent pathologists
(81.08%; 0.560.06); demonstrating strong generalization to
histologically-confirmed csPCa diagnosis.Comment: Accepted to MedIA: Medical Image Analysis. This manuscript
incorporates and expands upon our 2020 Medical Imaging Meets NeurIPS Workshop
paper (arXiv:2011.00263
Annotation-efficient cancer detection with report-guided lesion annotation for deep learning-based prostate cancer detection in bpMRI
Deep learning-based diagnostic performance increases with more annotated
data, but large-scale manual annotations are expensive and labour-intensive.
Experts evaluate diagnostic images during clinical routine, and write their
findings in reports. Leveraging unlabelled exams paired with clinical reports
could overcome the manual labelling bottleneck. We hypothesise that detection
models can be trained semi-supervised with automatic annotations generated
using model predictions, guided by sparse information from clinical reports. To
demonstrate efficacy, we train clinically significant prostate cancer (csPCa)
segmentation models, where automatic annotations are guided by the number of
clinically significant findings in the radiology reports. We included 7,756
prostate MRI examinations, of which 3,050 were manually annotated. We evaluated
prostate cancer detection performance on 300 exams from an external centre with
histopathology-confirmed ground truth. Semi-supervised training improved
patient-based diagnostic area under the receiver operating characteristic curve
from to () and improved
lesion-based sensitivity at one false positive per case from
to (). Semi-supervised training was 14 more
annotation-efficient for case-based performance and 6 more
annotation-efficient for lesion-based performance. This improved performance
demonstrates the feasibility of our training procedure. Source code is publicly
available at github.com/DIAGNijmegen/Report-Guided-Annotation. Best csPCa
detection algorithm is available at
grand-challenge.org/algorithms/bpmri-cspca-detection-report-guided-annotations/
Common Limitations of Image Processing Metrics:A Picture Story
While the importance of automatic image analysis is continuously increasing,
recent meta-research revealed major flaws with respect to algorithm validation.
Performance metrics are particularly key for meaningful, objective, and
transparent performance assessment and validation of the used automatic
algorithms, but relatively little attention has been given to the practical
pitfalls when using specific metrics for a given image analysis task. These are
typically related to (1) the disregard of inherent metric properties, such as
the behaviour in the presence of class imbalance or small target structures,
(2) the disregard of inherent data set properties, such as the non-independence
of the test cases, and (3) the disregard of the actual biomedical domain
interest that the metrics should reflect. This living dynamically document has
the purpose to illustrate important limitations of performance metrics commonly
applied in the field of image analysis. In this context, it focuses on
biomedical image analysis problems that can be phrased as image-level
classification, semantic segmentation, instance segmentation, or object
detection task. The current version is based on a Delphi process on metrics
conducted by an international consortium of image analysis experts from more
than 60 institutions worldwide.Comment: This is a dynamic paper on limitations of commonly used metrics. The
current version discusses metrics for image-level classification, semantic
segmentation, object detection and instance segmentation. For missing use
cases, comments or questions, please contact [email protected] or
[email protected]. Substantial contributions to this document will be
acknowledged with a co-authorshi
Uncertainty-Aware Semi-Supervised Learning for Prostate MRI Zonal Segmentation
Quality of deep convolutional neural network predictions strongly depends on
the size of the training dataset and the quality of the annotations. Creating
annotations, especially for 3D medical image segmentation, is time-consuming
and requires expert knowledge. We propose a novel semi-supervised learning
(SSL) approach that requires only a relatively small number of annotations
while being able to use the remaining unlabeled data to improve model
performance. Our method uses a pseudo-labeling technique that employs recent
deep learning uncertainty estimation models. By using the estimated
uncertainty, we were able to rank pseudo-labels and automatically select the
best pseudo-annotations generated by the supervised model. We applied this to
prostate zonal segmentation in T2-weighted MRI scans. Our proposed model
outperformed the semi-supervised model in experiments with the ProstateX
dataset and an external test set, by leveraging only a subset of unlabeled data
rather than the full collection of 4953 cases, our proposed model demonstrated
improved performance. The segmentation dice similarity coefficient in the
transition zone and peripheral zone increased from 0.835 and 0.727 to 0.852 and
0.751, respectively, for fully supervised model and the uncertainty-aware
semi-supervised learning model (USSL). Our USSL model demonstrates the
potential to allow deep learning models to be trained on large datasets without
requiring full annotation. Our code is available at
https://github.com/DIAGNijmegen/prostateMR-USSL.Comment: 9 page
Artificial intelligence for prostate MRI: open datasets, available applications, and grand challenges
Artificial intelligence (AI) for prostate magnetic resonance imaging (MRI) is starting to play a clinical role for prostate cancer (PCa) patients. AI-assisted reading is feasible, allowing workflow reduction. A total of 3,369 multi-vendor prostate MRI cases are available in open datasets, acquired from 2003 to 2021 in Europe or USA at 3 T (n = 3,018; 89.6%) or 1.5 T (n = 296; 8.8%), 346 cases scanned with endorectal coil (10.3%), 3,023 (89.7%) with phased-array surface coils; 412 collected for anatomical segmentation tasks, 3,096 for PCa detection/classification; for 2,240 cases lesions delineation is available and 56 cases have matching histopathologic images; for 2,620 cases the PSA level is provided; the total size of all open datasets amounts to approximately 253 GB. Of note, quality of annotations provided per dataset highly differ and attention must be paid when using these datasets (e.g., data overlap). Seven grand challenges and commercial applications from eleven vendors are here considered. Few small studies provided prospective validation. More work is needed, in particular validation on large-scale multi-institutional, well-curated public datasets to test general applicability. Moreover, AI needs to be explored for clinical stages other than detection/characterization (e.g., follow-up, prognosis, interventions, and focal treatment)
Artificial intelligence for prostate MRI: open datasets, available applications, and grand challenges
Artificial intelligence (AI) for prostate magnetic resonance imaging (MRI) is starting to play a clinical role for prostate cancer (PCa) patients. AI-assisted reading is feasible, allowing workflow reduction. A total of 3,369 multi-vendor prostate MRI cases are available in open datasets, acquired from 2003 to 2021 in Europe or USA at 3 T (n = 3,018; 89.6%) or 1.5 T (n = 296; 8.8%), 346 cases scanned with endorectal coil (10.3%), 3,023 (89.7%) with phased-array surface coils; 412 collected for anatomical segmentation tasks, 3,096 for PCa detection/classification; for 2,240 cases lesions delineation is available and 56 cases have matching histopathologic images; for 2,620 cases the PSA level is provided; the total size of all open datasets amounts to approximately 253 GB. Of note, quality of annotations provided per dataset highly differ and attention must be paid when using these datasets (e.g., data overlap). Seven grand challenges and commercial applications from eleven vendors are here considered. Few small studies provided prospective validation. More work is needed, in particular validation on large-scale multi-institutional, well-curated public datasets to test general applicability. Moreover, AI needs to be explored for clinical stages other than detection/characterization (e.g., follow-up, prognosis, interventions, and focal treatment)
Combining public datasets for automated tooth assessment in panoramic radiographs
Objective: Panoramic radiographs (PRs) provide a comprehensive view of the oral and maxillofacial region and are used routinely to assess dental and osseous pathologies. Artificial intelligence (AI) can be used to improve the diagnostic accuracy of PRs compared to bitewings and periapical radiographs. This study aimed to evaluate the advantages and challenges of using publicly available datasets in dental AI research, focusing on solving the novel task of predicting tooth segmentations, FDI numbers, and tooth diagnoses, simultaneously. Materials and methods: Datasets from the OdontoAI platform (tooth instance segmentations) and the DENTEX challenge (tooth bounding boxes with associated diagnoses) were combined to develop a two-stage AI model. The first stage implemented tooth instance segmentation with FDI numbering and extracted regions of interest around each tooth segmentation, whereafter the second stage implemented multi-label classification to detect dental caries, impacted teeth, and periapical lesions in PRs. The performance of the automated tooth segmentation algorithm was evaluated using a free-response receiver-operating-characteristics (FROC) curve and mean average precision (mAP) metrics. The diagnostic accuracy of detection and classification of dental pathology was evaluated with ROC curves and F1 and AUC metrics. Results: The two-stage AI model achieved high accuracy in tooth segmentations with a FROC score of 0.988 and a mAP of 0.848. High accuracy was also achieved in the diagnostic classification of impacted teeth (F1 = 0.901, AUC = 0.996), whereas moderate accuracy was achieved in the diagnostic classification of deep caries (F1 = 0.683, AUC = 0.960), early caries (F1 = 0.662, AUC = 0.881), and periapical lesions (F1 = 0.603, AUC = 0.974). The model’s performance correlated positively with the quality of annotations in the used public datasets. Selected samples from the DENTEX dataset revealed cases of missing (false-negative) and incorrect (false-positive) diagnoses, which negatively influenced the performance of the AI model. Conclusions: The use and pooling of public datasets in dental AI research can significantly accelerate the development of new AI models and enable fast exploration of novel tasks. However, standardized quality assurance is essential before using the datasets to ensure reliable outcomes and limit potential biases.</p
The PI-CAI Challenge: Public Training and Development Dataset
This dataset represents the PI-CAI: Public Training and Development Dataset. It contains 1500 anonymized prostate biparametric MRI scans from 1476 patients, acquired between 2012-2021, at three centers (Radboud University Medical Center, University Medical Center Groningen, Ziekenhuis Groep Twente) based in The Netherlands. The PI-CAI challenge is an all-new grand challenge that aims to validate the diagnostic performance of artificial intelligence and radiologists at clinically significant prostate cancer (csPCa) detection/diagnosis in MRI, with histopathology and follow-up (≥ 3 years) as the reference standard, in a retrospective setting. The study hypothesizes that state-of-the-art AI algorithms, trained using thousands of patient exams, are non-inferior to radiologists reading bpMRI. Key aspects of the PI-CAI study design have been established in conjunction with an international scientific advisory board of 16 experts in prostate AI, radiology and urology —to unify and standardize present-day guidelines, and to ensure meaningful validation of prostate AI towards clinical translation (Reinke et al., 2021)
Artificial intelligence and radiologists in prostate cancer detection on MRI (PI-CAI):an international, paired, non-inferiority, confirmatory study
Background: Artificial intelligence (AI) systems can potentially aid the diagnostic pathway of prostate cancer by alleviating the increasing workload, preventing overdiagnosis, and reducing the dependence on experienced radiologists. We aimed to investigate the performance of AI systems at detecting clinically significant prostate cancer on MRI in comparison with radiologists using the Prostate Imaging—Reporting and Data System version 2.1 (PI-RADS 2.1) and the standard of care in multidisciplinary routine practice at scale. Methods: In this international, paired, non-inferiority, confirmatory study, we trained and externally validated an AI system (developed within an international consortium) for detecting Gleason grade group 2 or greater cancers using a retrospective cohort of 10 207 MRI examinations from 9129 patients. Of these examinations, 9207 cases from three centres (11 sites) based in the Netherlands were used for training and tuning, and 1000 cases from four centres (12 sites) based in the Netherlands and Norway were used for testing. In parallel, we facilitated a multireader, multicase observer study with 62 radiologists (45 centres in 20 countries; median 7 [IQR 5–10] years of experience in reading prostate MRI) using PI-RADS (2.1) on 400 paired MRI examinations from the testing cohort. Primary endpoints were the sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC) of the AI system in comparison with that of all readers using PI-RADS (2.1) and in comparison with that of the historical radiology readings made during multidisciplinary routine practice (ie, the standard of care with the aid of patient history and peer consultation). Histopathology and at least 3 years (median 5 [IQR 4–6] years) of follow-up were used to establish the reference standard. The statistical analysis plan was prespecified with a primary hypothesis of non-inferiority (considering a margin of 0·05) and a secondary hypothesis of superiority towards the AI system, if non-inferiority was confirmed. This study was registered at ClinicalTrials.gov, NCT05489341. Findings:Of the 10 207 examinations included from Jan 1, 2012, through Dec 31, 2021, 2440 cases had histologically confirmed Gleason grade group 2 or greater prostate cancer. In the subset of 400 testing cases in which the AI system was compared with the radiologists participating in the reader study, the AI system showed a statistically superior and non-inferior AUROC of 0·91 (95% CI 0·87–0·94; p<0·0001), in comparison to the pool of 62 radiologists with an AUROC of 0·86 (0·83–0·89), with a lower boundary of the two-sided 95% Wald CI for the difference in AUROC of 0·02. At the mean PI-RADS 3 or greater operating point of all readers, the AI system detected 6·8% more cases with Gleason grade group 2 or greater cancers at the same specificity (57·7%, 95% CI 51·6–63·3), or 50·4% fewer false-positive results and 20·0% fewer cases with Gleason grade group 1 cancers at the same sensitivity (89·4%, 95% CI 85·3–92·9). In all 1000 testing cases where the AI system was compared with the radiology readings made during multidisciplinary practice, non-inferiority was not confirmed, as the AI system showed lower specificity (68·9% [95% CI 65·3–72·4] vs 69·0% [65·5–72·5]) at the same sensitivity (96·1%, 94·0–98·2) as the PI-RADS 3 or greater operating point. The lower boundary of the two-sided 95% Wald CI for the difference in specificity (−0·04) was greater than the non-inferiority margin (−0·05) and a p value below the significance threshold was reached (p<0·001).Interpretation: An AI system was superior to radiologists using PI-RADS (2.1), on average, at detecting clinically significant prostate cancer and comparable to the standard of care. Such a system shows the potential to be a supportive tool within a primary diagnostic setting, with several associated benefits for patients and radiologists. Prospective validation is needed to test clinical applicability of this system. </p
Artificial intelligence and radiologists in prostate cancer detection on MRI (PI-CAI):an international, paired, non-inferiority, confirmatory study
Background: Artificial intelligence (AI) systems can potentially aid the diagnostic pathway of prostate cancer by alleviating the increasing workload, preventing overdiagnosis, and reducing the dependence on experienced radiologists. We aimed to investigate the performance of AI systems at detecting clinically significant prostate cancer on MRI in comparison with radiologists using the Prostate Imaging—Reporting and Data System version 2.1 (PI-RADS 2.1) and the standard of care in multidisciplinary routine practice at scale. Methods: In this international, paired, non-inferiority, confirmatory study, we trained and externally validated an AI system (developed within an international consortium) for detecting Gleason grade group 2 or greater cancers using a retrospective cohort of 10 207 MRI examinations from 9129 patients. Of these examinations, 9207 cases from three centres (11 sites) based in the Netherlands were used for training and tuning, and 1000 cases from four centres (12 sites) based in the Netherlands and Norway were used for testing. In parallel, we facilitated a multireader, multicase observer study with 62 radiologists (45 centres in 20 countries; median 7 [IQR 5–10] years of experience in reading prostate MRI) using PI-RADS (2.1) on 400 paired MRI examinations from the testing cohort. Primary endpoints were the sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC) of the AI system in comparison with that of all readers using PI-RADS (2.1) and in comparison with that of the historical radiology readings made during multidisciplinary routine practice (ie, the standard of care with the aid of patient history and peer consultation). Histopathology and at least 3 years (median 5 [IQR 4–6] years) of follow-up were used to establish the reference standard. The statistical analysis plan was prespecified with a primary hypothesis of non-inferiority (considering a margin of 0·05) and a secondary hypothesis of superiority towards the AI system, if non-inferiority was confirmed. This study was registered at ClinicalTrials.gov, NCT05489341. Findings:Of the 10 207 examinations included from Jan 1, 2012, through Dec 31, 2021, 2440 cases had histologically confirmed Gleason grade group 2 or greater prostate cancer. In the subset of 400 testing cases in which the AI system was compared with the radiologists participating in the reader study, the AI system showed a statistically superior and non-inferior AUROC of 0·91 (95% CI 0·87–0·94; p<0·0001), in comparison to the pool of 62 radiologists with an AUROC of 0·86 (0·83–0·89), with a lower boundary of the two-sided 95% Wald CI for the difference in AUROC of 0·02. At the mean PI-RADS 3 or greater operating point of all readers, the AI system detected 6·8% more cases with Gleason grade group 2 or greater cancers at the same specificity (57·7%, 95% CI 51·6–63·3), or 50·4% fewer false-positive results and 20·0% fewer cases with Gleason grade group 1 cancers at the same sensitivity (89·4%, 95% CI 85·3–92·9). In all 1000 testing cases where the AI system was compared with the radiology readings made during multidisciplinary practice, non-inferiority was not confirmed, as the AI system showed lower specificity (68·9% [95% CI 65·3–72·4] vs 69·0% [65·5–72·5]) at the same sensitivity (96·1%, 94·0–98·2) as the PI-RADS 3 or greater operating point. The lower boundary of the two-sided 95% Wald CI for the difference in specificity (−0·04) was greater than the non-inferiority margin (−0·05) and a p value below the significance threshold was reached (p<0·001).Interpretation: An AI system was superior to radiologists using PI-RADS (2.1), on average, at detecting clinically significant prostate cancer and comparable to the standard of care. Such a system shows the potential to be a supportive tool within a primary diagnostic setting, with several associated benefits for patients and radiologists. Prospective validation is needed to test clinical applicability of this system. </p