Annotation-efficient cancer detection with report-guided lesion annotation for deep learning-based prostate cancer detection in bpMRI
Deep learning-based diagnostic performance increases with more annotated
data, but large-scale manual annotations are expensive and labour-intensive.
Experts evaluate diagnostic images during clinical routine, and write their
findings in reports. Leveraging unlabelled exams paired with clinical reports
could overcome the manual labelling bottleneck. We hypothesise that detection
models can be trained semi-supervised with automatic annotations generated
using model predictions, guided by sparse information from clinical reports. To
demonstrate efficacy, we train clinically significant prostate cancer (csPCa)
segmentation models, where automatic annotations are guided by the number of
clinically significant findings in the radiology reports. We included 7,756
prostate MRI examinations, of which 3,050 were manually annotated. We evaluated
prostate cancer detection performance on 300 exams from an external centre with
histopathology-confirmed ground truth. Semi-supervised training improved
patient-based diagnostic area under the receiver operating characteristic curve
from to () and improved
lesion-based sensitivity at one false positive per case from
to (). Semi-supervised training was 14× more
annotation-efficient for case-based performance and 6× more
annotation-efficient for lesion-based performance. This improved performance
demonstrates the feasibility of our training procedure. Source code is publicly
available at github.com/DIAGNijmegen/Report-Guided-Annotation. The best csPCa
detection algorithm is available at
grand-challenge.org/algorithms/bpmri-cspca-detection-report-guided-annotations/
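The idea of guiding automatic annotations by the number of findings in a report can be illustrated with a minimal sketch. It assumes that a model has already produced lesion candidates with confidence scores and that the report yields a count of clinically significant findings. The `Candidate` representation and the function name `select_report_guided` are hypothetical and are not taken from the published source code.

```python
from typing import List, Tuple

# Hypothetical candidate representation: (lesion_id, model confidence in [0, 1]).
Candidate = Tuple[str, float]


def select_report_guided(candidates: List[Candidate], n_findings: int) -> List[Candidate]:
    """Keep the n_findings most confident candidates as automatic annotations.

    n_findings is assumed to be the number of clinically significant
    findings extracted from the radiology report for this exam.
    """
    if n_findings <= 0:
        # A report without significant findings yields no lesion annotation.
        return []
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return ranked[:n_findings]


# Illustrative values only, not data from the study.
candidates = [("a", 0.35), ("b", 0.91), ("c", 0.62)]
kept = select_report_guided(candidates, n_findings=2)
print(kept)
```

If the report lists more findings than the model proposes candidates, this sketch simply returns every candidate; how the authors treat such cases is not stated in the abstract.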
Complexities of deep learning-based undersampled MR image reconstruction
Artificial intelligence has opened a new path of innovation in magnetic resonance (MR) image reconstruction of undersampled k-space acquisitions. This review offers readers an analysis of the current deep learning-based MR image reconstruction methods. The literature in this field shows exponential growth, both in volume and complexity, as the capabilities of machine learning in solving inverse problems such as image reconstruction are explored. We review the latest developments, aiming to assist researchers and radiologists who are developing new methods or seeking to provide valuable feedback. We shed light on key concepts by exploring the technical intricacies of MR image reconstruction, highlighting the importance of raw datasets and the difficulty of evaluating diagnostic value using standard metrics.
Relevance statement: Increasingly complex algorithms output reconstructed images that are difficult to assess for robustness and diagnostic quality, necessitating high-quality datasets and collaboration with radiologists.
Key points:
• Deep learning-based image reconstruction algorithms are increasing both in complexity and performance.
• The evaluation of reconstructed images may mistake perceived image quality for diagnostic value.
• Collaboration with radiologists is crucial for advancing deep learning technology.
Using deep learning to optimize the prostate MRI protocol by assessing the diagnostic efficacy of MRI sequences
Purpose: To explore diagnostic deep learning for optimizing the prostate MRI protocol by assessing the diagnostic efficacy of MRI sequences. Method: This retrospective study included 840 patients with a biparametric prostate MRI scan. The MRI protocol included a T2-weighted image, three DWI sequences (b50, b400, and b800 s/mm²), a calculated ADC map, and a calculated b1400 sequence. Two accelerated MRI protocols were simulated, using only two acquired b-values to calculate the ADC and b1400. Deep learning models were trained to detect prostate cancer lesions on accelerated and full protocols. The diagnostic performances of the protocols were compared on the patient level with the area under the receiver operating characteristic curve (AUROC), using DeLong's test, and on the lesion level with the partial area under the free-response operating characteristic curve (pAUFROC), using a permutation test. Validation of the results was performed among expert radiologists. Results: No significant differences in diagnostic performance were found between the accelerated protocols and the full bpMRI baseline. Omitting b800 reduced DWI scan time by 53%, with a performance difference of +0.01 AUROC (p = 0.20) and −0.03 pAUFROC (p = 0.45). Omitting b400 reduced DWI scan time by 32%, with a performance difference of −0.01 AUROC (p = 0.65) and +0.01 pAUFROC (p = 0.73). Multiple expert radiologists underscored the findings. Conclusions: This study shows that deep learning can assess the diagnostic efficacy of MRI sequences by comparing prostate MRI protocols on diagnostic accuracy. Omitting either the b400 or the b800 DWI sequence can optimize the prostate MRI protocol by reducing scan time without compromising diagnostic quality.
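How an ADC value and a calculated high-b-value signal can be derived from only two acquired b-values is sketched below. The sketch assumes the common mono-exponential diffusion model S(b) = S0 · exp(−b · ADC); the abstract does not state which model or software the study used, so the function names and the synthetic voxel values are illustrative assumptions.

```python
import math


def adc_from_two_b(s_low: float, s_high: float, b_low: float, b_high: float) -> float:
    """Estimate ADC (mm^2/s) from signals at two b-values (s/mm^2).

    Assumes the mono-exponential model S(b) = S0 * exp(-b * ADC).
    """
    if s_low <= 0 or s_high <= 0:
        raise ValueError("signals must be positive")
    if b_high <= b_low:
        raise ValueError("b_high must exceed b_low")
    return math.log(s_low / s_high) / (b_high - b_low)


def extrapolate_signal(s_low: float, b_low: float, adc: float, b_target: float) -> float:
    """Compute the signal at b_target from one acquired signal and the ADC."""
    return s_low * math.exp(-(b_target - b_low) * adc)


# Synthetic voxel (assumed values): S0 = 1000, true ADC = 1.0e-3 mm^2/s.
true_adc = 1.0e-3
s_b50 = 1000.0 * math.exp(-50 * true_adc)
s_b800 = 1000.0 * math.exp(-800 * true_adc)

adc = adc_from_two_b(s_b50, s_b800, b_low=50, b_high=800)
s_b1400 = extrapolate_signal(s_b50, b_low=50, adc=adc, b_target=1400)
print(adc, s_b1400)
```

With two b-values the fit has no redundancy, so noise in either acquisition propagates directly into the ADC; that is one reason a diagnostic comparison against the full protocol is informative.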
Uncertainty-Aware Semi-Supervised Learning for Prostate MRI Zonal Segmentation
Quality of deep convolutional neural network predictions strongly depends on
the size of the training dataset and the quality of the annotations. Creating
annotations, especially for 3D medical image segmentation, is time-consuming
and requires expert knowledge. We propose a novel semi-supervised learning
(SSL) approach that requires only a relatively small number of annotations
while being able to use the remaining unlabeled data to improve model
performance. Our method uses a pseudo-labeling technique that employs recent
deep learning uncertainty estimation models. By using the estimated
uncertainty, we were able to rank pseudo-labels and automatically select the
best pseudo-annotations generated by the supervised model. We applied this to
prostate zonal segmentation in T2-weighted MRI scans. In experiments with the
ProstateX dataset and an external test set, our proposed model outperformed
the semi-supervised model while leveraging only a subset of the unlabeled data
rather than the full collection of 4953 cases. The segmentation Dice similarity
coefficient in the transition zone and peripheral zone increased from 0.835 and
0.727 for the fully supervised model to 0.852 and 0.751, respectively, for the
uncertainty-aware semi-supervised learning (USSL) model. Our USSL model
demonstrates the
potential to allow deep learning models to be trained on large datasets without
requiring full annotation. Our code is available at
https://github.com/DIAGNijmegen/prostateMR-USSL.
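Ranking pseudo-labels by estimated uncertainty and keeping only the best ones can be sketched as follows. This is a minimal illustration that assumes one scalar uncertainty per unlabeled case, where lower means more trustworthy. The function name, the `keep_fraction` parameter, and the case identifiers are hypothetical and do not come from the linked repository.

```python
from typing import Dict, List


def select_pseudo_labels(uncertainty_by_case: Dict[str, float], keep_fraction: float) -> List[str]:
    """Return the case IDs whose pseudo-labels have the lowest uncertainty.

    keep_fraction is an assumed hyperparameter in (0, 1]; at least one
    case is kept whenever any cases are available.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Sort case IDs from least to most uncertain.
    ranked = sorted(uncertainty_by_case, key=uncertainty_by_case.get)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


# Illustrative uncertainties only, not values from the study.
uncertainties = {"case_001": 0.12, "case_002": 0.40, "case_003": 0.05, "case_004": 0.27}
selected = select_pseudo_labels(uncertainties, keep_fraction=0.5)
print(selected)
```

The selected cases would then be added to the training set with their pseudo-labels, while the more uncertain cases are left out.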
The PI-CAI Challenge: Public Training and Development Dataset
This dataset represents the PI-CAI: Public Training and Development Dataset. It contains 1500 anonymized prostate biparametric MRI scans from 1476 patients, acquired between 2012 and 2021, at three centers (Radboud University Medical Center, University Medical Center Groningen, Ziekenhuis Groep Twente) based in The Netherlands. The PI-CAI challenge is an all-new grand challenge that aims to validate the diagnostic performance of artificial intelligence and radiologists at clinically significant prostate cancer (csPCa) detection/diagnosis in MRI, with histopathology and follow-up (≥ 3 years) as the reference standard, in a retrospective setting. The study hypothesizes that state-of-the-art AI algorithms, trained using thousands of patient exams, are non-inferior to radiologists reading bpMRI. Key aspects of the PI-CAI study design have been established in conjunction with an international scientific advisory board of 16 experts in prostate AI, radiology and urology, to unify and standardize present-day guidelines, and to ensure meaningful validation of prostate AI towards clinical translation (Reinke et al., 2021).
Artificial intelligence and radiologists in prostate cancer detection on MRI (PI-CAI): an international, paired, non-inferiority, confirmatory study
Background: Artificial intelligence (AI) systems can potentially aid the diagnostic pathway of prostate cancer by alleviating the increasing workload, preventing overdiagnosis, and reducing the dependence on experienced radiologists. We aimed to investigate the performance of AI systems at detecting clinically significant prostate cancer on MRI in comparison with radiologists using the Prostate Imaging—Reporting and Data System version 2.1 (PI-RADS 2.1) and the standard of care in multidisciplinary routine practice at scale. Methods: In this international, paired, non-inferiority, confirmatory study, we trained and externally validated an AI system (developed within an international consortium) for detecting Gleason grade group 2 or greater cancers using a retrospective cohort of 10 207 MRI examinations from 9129 patients. Of these examinations, 9207 cases from three centres (11 sites) based in the Netherlands were used for training and tuning, and 1000 cases from four centres (12 sites) based in the Netherlands and Norway were used for testing. In parallel, we facilitated a multireader, multicase observer study with 62 radiologists (45 centres in 20 countries; median 7 [IQR 5–10] years of experience in reading prostate MRI) using PI-RADS (2.1) on 400 paired MRI examinations from the testing cohort. Primary endpoints were the sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC) of the AI system in comparison with that of all readers using PI-RADS (2.1) and in comparison with that of the historical radiology readings made during multidisciplinary routine practice (ie, the standard of care with the aid of patient history and peer consultation). Histopathology and at least 3 years (median 5 [IQR 4–6] years) of follow-up were used to establish the reference standard. 
The statistical analysis plan was prespecified with a primary hypothesis of non-inferiority (considering a margin of 0·05) and a secondary hypothesis of superiority towards the AI system, if non-inferiority was confirmed. This study was registered at ClinicalTrials.gov, NCT05489341. Findings: Of the 10 207 examinations included from Jan 1, 2012, through Dec 31, 2021, 2440 cases had histologically confirmed Gleason grade group 2 or greater prostate cancer. In the subset of 400 testing cases in which the AI system was compared with the radiologists participating in the reader study, the AI system showed a statistically superior and non-inferior AUROC of 0·91 (95% CI 0·87–0·94; p<0·0001), in comparison to the pool of 62 radiologists with an AUROC of 0·86 (0·83–0·89), with a lower boundary of the two-sided 95% Wald CI for the difference in AUROC of 0·02. At the mean PI-RADS 3 or greater operating point of all readers, the AI system detected 6·8% more cases with Gleason grade group 2 or greater cancers at the same specificity (57·7%, 95% CI 51·6–63·3), or 50·4% fewer false-positive results and 20·0% fewer cases with Gleason grade group 1 cancers at the same sensitivity (89·4%, 95% CI 85·3–92·9). In all 1000 testing cases where the AI system was compared with the radiology readings made during multidisciplinary practice, non-inferiority was not confirmed, as the AI system showed lower specificity (68·9% [95% CI 65·3–72·4] vs 69·0% [65·5–72·5]) at the same sensitivity (96·1%, 94·0–98·2) as the PI-RADS 3 or greater operating point. The lower boundary of the two-sided 95% Wald CI for the difference in specificity (−0·04) was greater than the non-inferiority margin (−0·05) and a p value below the significance threshold was reached (p<0·001). Interpretation: An AI system was superior to radiologists using PI-RADS (2.1), on average, at detecting clinically significant prostate cancer and comparable to the standard of care. 
Such a system shows the potential to be a supportive tool within a primary diagnostic setting, with several associated benefits for patients and radiologists. Prospective validation is needed to test clinical applicability of this system.
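The comparison of a confidence-interval lower bound with a non-inferiority margin, as reported for the AUROC in the reader study, can be written as a small decision rule. The sketch below is a simplified textbook rule and not the study's prespecified analysis plan, which involves more than this single comparison. The function name is hypothetical, and the superiority criterion (lower bound above zero) is an assumed convention.

```python
from typing import Tuple


def assess_non_inferiority(lower_ci_bound: float, margin: float) -> Tuple[bool, bool]:
    """Return (non_inferior, superior) for a difference 'AI minus comparator'.

    lower_ci_bound: lower boundary of the two-sided CI for the difference.
    margin: positive non-inferiority margin.
    Assumed convention: non-inferior if the bound lies above -margin,
    superior if non-inferiority holds and the bound also lies above zero.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    non_inferior = lower_ci_bound > -margin
    superior = non_inferior and lower_ci_bound > 0
    return non_inferior, superior


# AUROC difference in the reader study: lower bound 0.02, margin 0.05 (from the abstract).
auroc_result = assess_non_inferiority(lower_ci_bound=0.02, margin=0.05)
print(auroc_result)

# Hypothetical failing case, not a value from the study.
hypothetical_result = assess_non_inferiority(lower_ci_bound=-0.08, margin=0.05)
print(hypothetical_result)
```

Applied to the reported AUROC bound, the rule agrees with the abstract's statement that the AI system was both non-inferior and superior to the pool of readers.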
MIC-DKFZ/nnUNet: nnU-Net v2.2
Reworked inference code to be more flexible
Minor bug fixes