15 research outputs found

    Supervised Uncertainty Quantification for Segmentation with Multiple Annotations

    Accurate estimation of predictive uncertainty is important in medical scenarios such as lung nodule segmentation. Unfortunately, most existing works on predictive uncertainty do not return calibrated uncertainty estimates that could be used in practice. In this work we exploit multi-grader annotation variability as a source of 'ground-truth' aleatoric uncertainty, which can be treated as a target in a supervised learning problem. We combine this ground-truth uncertainty with a Probabilistic U-Net and test on the LIDC-IDRI lung nodule CT dataset and the MICCAI2012 prostate MRI dataset. We find that we are able to improve predictive uncertainty estimates. We also find that we can improve sample accuracy and sample diversity. In real-world applications, our method could inform doctors about the confidence of the segmentation results. Comment: MICCAI 2019. Fixed a few typo
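
    One way to picture the idea of turning multi-grader variability into a supervised target is the sketch below. It is only an illustration under stated assumptions: the abstract does not say how the target is computed, so the per-pixel Bernoulli variance and the function name `grader_variance` are hypothetical choices, not the authors' method.

```python
def grader_variance(masks):
    """Per-pixel disagreement across graders' binary masks.

    `masks` is a list of 2D lists (one per grader) holding 0/1 labels.
    Assumption: disagreement is measured as the Bernoulli variance p * (1 - p),
    where p is the fraction of graders marking the pixel as foreground.
    """
    n_graders = len(masks)
    n_rows, n_cols = len(masks[0]), len(masks[0][0])
    target = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            p = sum(m[r][c] for m in masks) / n_graders
            row.append(p * (1 - p))  # 0 when all graders agree, 0.25 at an even split
        target.append(row)
    return target


# Four hypothetical graders annotating a 1x3 image.
example_masks = [[[1, 1, 0]], [[1, 0, 0]], [[1, 1, 0]], [[1, 0, 0]]]
example_target = grader_variance(example_masks)
```

    In this toy case the first and last pixels have unanimous labels and the middle pixel is split evenly, so only the middle pixel receives a non-zero uncertainty target.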

    Quality control for more reliable integration of deep learning-based image segmentation into medical workflows

    Machine learning algorithms underpin modern diagnostic-aiding software, which has proved valuable in clinical practice, particularly in radiology. However, inaccuracies, mainly due to the limited availability of clinical samples for training these algorithms, hamper their wider applicability, acceptance, and recognition amongst clinicians. We present an analysis of state-of-the-art automatic quality control (QC) approaches that can be implemented within these algorithms to estimate the certainty of their outputs. We validated the most promising approaches on a brain image segmentation task identifying white matter hyperintensities (WMH) in magnetic resonance imaging data. WMH are a correlate of small vessel disease common in mid-to-late adulthood and are particularly challenging to segment due to their varied size and distributional patterns. Our results show that the aggregation of uncertainty and Dice prediction were most effective in failure detection for this task. Both methods independently improved mean Dice from 0.82 to 0.84. Our work reveals how QC methods can help to detect failed segmentation cases and therefore make automatic segmentation more reliable and suitable for clinical practice.
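
    For readers unfamiliar with the Dice score used here, a minimal sketch follows. The `dice` function is the standard overlap formula; `flag_failures` and its threshold of 0.7 are hypothetical and only illustrate how a predicted Dice value might be used to flag cases for review, not the QC pipeline evaluated in the paper.

```python
def dice(pred, ref):
    """Dice overlap 2|A∩B| / (|A| + |B|) between two flat binary masks."""
    intersection = sum(p * r for p, r in zip(pred, ref))
    total = sum(pred) + sum(ref)
    # Convention (an assumption): two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def flag_failures(predicted_dice, threshold=0.7):
    """Return indices of cases whose predicted Dice falls below the threshold.

    The threshold value is illustrative only.
    """
    return [i for i, d in enumerate(predicted_dice) if d < threshold]


score = dice([1, 1, 0, 0], [1, 0, 0, 0])      # one shared pixel, three labelled in total
flagged = flag_failures([0.9, 0.5, 0.8])      # only the second case is below 0.7
```

    Cases flagged this way could be excluded or sent for manual correction, which is one route by which a QC step can raise the mean Dice of the accepted segmentations.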

    Learning to Predict Error for MRI Reconstruction

    In healthcare applications, predictive uncertainty has been used to assess predictive accuracy. In this paper, we demonstrate that predictive uncertainty estimated by the current methods does not highly correlate with prediction error. We do so by decomposing the prediction error into random and systematic errors and showing that the estimated uncertainty is equivalent to the variance of the random error. In addition, we observe that current methods unnecessarily compromise performance by modifying the model and training loss to estimate the target and uncertainty jointly. We show that estimating them separately without modifications improves performance. Following this, we propose a novel method that estimates the target labels and magnitude of the prediction error in two steps. We demonstrate this method on a large-scale MRI reconstruction task, and achieve significantly better results than the state-of-the-art uncertainty estimation methods. Comment: Accepted to MICCAI 202
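
    The decomposition argument can be written out as follows. This is a sketch in notation introduced here for illustration (the symbols e, s and r are assumptions, not taken from the paper): the prediction error is split into a systematic part, its conditional mean, and a zero-mean random part.

```latex
\begin{align}
  e(x) &= \hat{y}(x) - y = s(x) + r,
  & s(x) &= \mathbb{E}[\,e \mid x\,],
  & \mathbb{E}[\,r \mid x\,] &= 0, \\
  \mathbb{E}[\,e^2 \mid x\,] &= s(x)^2 + \operatorname{Var}(r \mid x).
\end{align}
```

    Under this reading, an uncertainty estimate that captures only Var(r | x) leaves out the s(x)^2 term, which is one way to see why it need not correlate strongly with the total prediction error.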

    Baseline Photos and Confident Annotation Improve Automated Detection of Cutaneous Graft-Versus-Host Disease

    Cutaneous erythema is used in diagnosis and response assessment of cutaneous chronic graft-versus-host disease (cGVHD). The development of objective erythema evaluation methods remains a challenge. We used a pre-trained neural network to segment cGVHD erythema by detecting changes relative to a patient’s registered baseline photo. We fixed this change detection algorithm on human annotations from a single photo pair, by using either a traditional approach or by marking definitely affected (“Do Not Miss”, DNM) and definitely unaffected skin (“Do Not Include”, DNI). The fixed algorithm was applied to each of the remaining 47 test photo pairs from six follow-up sessions of one patient. We used both the Dice index and the opinion of two board-certified dermatologists to evaluate the algorithm performance. The change detection algorithm correctly assigned 80% of the pixels, regardless of whether it was fixed on traditional (median accuracy: 0.77, interquartile range 0.62–0.87) or DNM/DNI segmentations (0.81, 0.65–0.89). When the algorithm was fixed on markings by different annotators, the DNM/DNI approach achieved more consistent outputs (median Dice indices: 0.94–0.96) than the traditional method (0.73–0.81). Compared to viewing only rash photos, the addition of baseline photos improved the reliability of dermatologists’ scoring. The inter-rater intraclass correlation coefficient increased from 0.19 (95% confidence interval lower bound: 0.06) to 0.51 (lower bound: 0.35). In conclusion, a change detection algorithm accurately assigned erythema in longitudinal photos of cGVHD. The reliability was significantly improved by exclusively using confident human segmentations to fix the algorithm. Baseline photos improved the agreement between the two dermatologists in assessing algorithm performance.