15 research outputs found

    An Alarm System For Segmentation Algorithm Based On Shape Model

    It is usually hard for a learning system to predict correctly on rare events that never occur in the training data, and segmentation algorithms are no exception. Meanwhile, manual inspection of each case to locate failures becomes infeasible given ever-larger data scales and limited human resources. Therefore, we build an alarm system that sets off alerts when a segmentation result is possibly unsatisfactory, assuming no corresponding ground truth mask is provided. One plausible solution is to project the segmentation results into a low-dimensional feature space and then learn classifiers/regressors to predict their quality. Motivated by this, in this paper we learn a feature space using shape information, which is a strong prior shared among different datasets and robust to the appearance variation of the input data. The shape feature is captured using a Variational Auto-Encoder (VAE) network that is trained with only the ground truth masks. During testing, segmentation results with bad shapes will not fit the shape prior well, resulting in large loss values. Thus, the VAE is able to evaluate the quality of a segmentation result on unseen data without using ground truth. Finally, we learn a regressor in the one-dimensional feature space to predict the quality of segmentation results. Our alarm system is evaluated on several recent state-of-the-art segmentation algorithms for 3D medical segmentation tasks. Compared with other standard quality assessment methods, our system consistently provides more reliable predictions of segmentation quality. Comment: Accepted to ICCV 2019 (10 pages, 4 figures)
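
    The core mechanism described above — a VAE trained only on ground-truth masks, whose loss value on a predicted mask acts as a one-dimensional shape feature fed to a quality regressor — can be sketched roughly as below. This is a minimal, hypothetical illustration; the module names, sizes, and the way the regressor is fitted are assumptions, not the authors' implementation.

        # Hypothetical sketch: a tiny VAE over flattened 2D masks; its loss on a
        # predicted mask serves as the shape feature from which quality is regressed.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class ShapeVAE(nn.Module):
            def __init__(self, in_dim=64 * 64, latent_dim=32):
                super().__init__()
                self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
                self.mu = nn.Linear(256, latent_dim)
                self.logvar = nn.Linear(256, latent_dim)
                self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                         nn.Linear(256, in_dim))

            def forward(self, x):
                h = self.enc(x)
                mu, logvar = self.mu(h), self.logvar(h)
                z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
                return self.dec(z), mu, logvar

        def vae_loss(model, mask):
            """Reconstruction + KL divergence; large values indicate implausible shapes."""
            x = mask.flatten(1)
            recon, mu, logvar = model(x)
            rec = F.binary_cross_entropy_with_logits(recon, x, reduction="sum")
            kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
            return (rec + kl) / x.shape[0]

        # At test time the scalar loss is the 1-D shape feature; a simple regressor
        # fitted on (loss, Dice) pairs from a validation set predicts segmentation quality.
        model = ShapeVAE()
        predicted_mask = (torch.rand(1, 64, 64) > 0.5).float()
        shape_feature = vae_loss(model, predicted_mask).item()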

    FUSQA: Fetal Ultrasound Segmentation Quality Assessment

    Deep learning models have been effective for various fetal ultrasound segmentation tasks. However, generalization to new unseen data has raised questions about their effectiveness for clinical adoption. Normally, a transition to new unseen data requires time-consuming and costly quality assurance processes to validate the segmentation performance post-transition. Segmentation quality assessment efforts have focused on natural images, where the problem has typically been formulated as a Dice score regression task. In this paper, we propose a simplified Fetal Ultrasound Segmentation Quality Assessment (FUSQA) model to tackle segmentation quality assessment when no masks exist to compare with. We formulate the segmentation quality assessment process as an automated classification task that distinguishes between good- and poor-quality segmentation masks for more accurate gestational age estimation. We validate the performance of our proposed approach on two datasets collected from two hospitals using different ultrasound machines. We compare different architectures, with our best-performing architecture achieving over 90% classification accuracy in distinguishing between good- and poor-quality segmentation masks from an unseen dataset. Additionally, there was only a 1.45-day difference between the gestational age reported by doctors and that estimated from crown-rump length (CRL) measurements using well-segmented masks. In contrast, this difference increased to as much as 7.73 days when CRL was calculated from poorly segmented masks. As a result, AI-based approaches can potentially aid fetal ultrasound segmentation quality assessment and might detect poor segmentation in real-time screening in the future. Comment: 13 pages, 3 figures, 3 tables
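
    Formulating quality assessment as a binary good/poor classification over predicted masks could look roughly like the sketch below. The architecture, input size, and labelling scheme are illustrative assumptions, not the FUSQA model itself.

        # Hypothetical sketch: a small CNN that labels a predicted segmentation mask
        # as good or poor quality, trained on masks labelled by comparison to references.
        import torch
        import torch.nn as nn

        class MaskQualityClassifier(nn.Module):
            def __init__(self):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1),
                )
                self.head = nn.Linear(32, 2)  # logits for {poor, good}

            def forward(self, mask):
                return self.head(self.features(mask).flatten(1))

        # Training labels could come from thresholding Dice against reference masks;
        # at deployment the classifier needs only the predicted mask, no reference.
        clf = MaskQualityClassifier()
        masks = torch.rand(4, 1, 128, 128)          # batch of predicted masks (illustrative size)
        labels = torch.randint(0, 2, (4,))          # 0 = poor, 1 = good
        loss = nn.CrossEntropyLoss()(clf(masks), labels)
        loss.backward()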

    Analyzing the Quality and Challenges of Uncertainty Estimations for Brain Tumor Segmentation

    Automatic segmentation of brain tumors has the potential to enable volumetric measures and high-throughput analysis in the clinical setting. This potential seems almost within reach, considering the steady increase in segmentation accuracy. However, despite this accuracy, current methods still do not meet the robustness levels required for patient-centered clinical use. In this regard, uncertainty estimates are a promising direction for improving the robustness of automated segmentation systems. Different uncertainty estimation methods have been proposed, but little is known about their usefulness and limitations for brain tumor segmentation. In this study, we present an analysis of the most commonly used uncertainty estimation methods with regard to their benefits and challenges for brain tumor segmentation. We evaluated their quality in terms of calibration, segmentation error localization, and segmentation failure detection. Our results show that the uncertainty methods are typically well calibrated when evaluated at the dataset level. Evaluated at the subject level, however, we found notable miscalibrations and limited segmentation error localization (e.g., for correcting segmentations), which hinder the direct use of voxel-wise uncertainties. Nevertheless, voxel-wise uncertainty showed value for detecting failed segmentations when uncertainty estimates are aggregated at the subject level. We therefore suggest careful use of voxel-wise uncertainty measures and highlight the importance of developing solutions that address the subject-level requirements on calibration and segmentation error localization.
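
    The finding that voxel-wise uncertainty helps detect failed segmentations once it is aggregated per subject can be illustrated with a minimal sketch like the one below. The entropy measure, the aggregation over the predicted foreground, and the review threshold are assumptions for illustration, not the study's exact protocol.

        # Hypothetical sketch: aggregate voxel-wise uncertainty (binary entropy of the
        # foreground probability map) into a subject-level score used to flag failures.
        import numpy as np

        def voxel_entropy(prob):
            p = np.clip(prob, 1e-7, 1 - 1e-7)
            return -(p * np.log(p) + (1 - p) * np.log(1 - p))

        def subject_uncertainty(prob, fg_threshold=0.5):
            """Mean entropy over the predicted foreground (one common aggregation choice)."""
            fg = prob > fg_threshold
            ent = voxel_entropy(prob)
            return ent[fg].mean() if fg.any() else ent.mean()

        # Flag a case when its aggregated uncertainty exceeds a threshold tuned on a
        # validation set; the value used here is purely illustrative.
        prob_map = np.random.rand(128, 128, 96)
        if subject_uncertainty(prob_map) > 0.45:
            print("Segmentation flagged for manual review")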

    A Deep Learning Pipeline for Assessing Ventricular Volumes from a Cardiac Magnetic Resonance Image Registry of Single Ventricle Patients

    Purpose: To develop an end-to-end deep learning (DL) pipeline for automated ventricular segmentation of cardiac MRI data from a multicenter registry of patients with Fontan circulation (FORCE). Materials and Methods: This retrospective study used 250 cardiac MRI examinations (November 2007 to December 2022) from 13 institutions for training, validation, and testing. The pipeline contained three DL models: a classifier to identify short-axis cine stacks and two UNet 3+ models for image cropping and segmentation. The automated segmentations were evaluated on the test set (n = 50) using the Dice score. Volumetric and functional metrics derived from DL and ground truth manual segmentations were compared using Bland-Altman and intraclass correlation analysis. The pipeline was further qualitatively evaluated on 475 unseen examinations. Results: There were acceptable limits of agreement (LOA) and minimal biases between the ground truth and DL end-diastolic volume (EDV) (bias: -0.6 mL/m2, LOA: -20.6 to 19.5 mL/m2) and end-systolic volume (ESV) (bias: -1.1 mL/m2, LOA: -18.1 to 15.9 mL/m2), with high intraclass correlation coefficients (ICC > 0.97) and Dice scores (EDV, 0.91; ESV, 0.86). There was moderate agreement for ventricular mass (bias: -1.9 g/m2, LOA: -17.3 to 13.5 g/m2), with an ICC of 0.94. There was also acceptable agreement for stroke volume (bias: 0.6 mL/m2, LOA: -17.2 to 18.3 mL/m2) and ejection fraction (bias: 0.6%, LOA: -12.2% to 13.4%), with high ICCs (> 0.81). The pipeline achieved satisfactory segmentation in 68% of the 475 unseen examinations, while 26% needed minor adjustments, 5% needed major adjustments, and in 0.4% the cropping model failed. Conclusion: The DL pipeline can provide fast, standardized segmentation for patients with single ventricle physiology across multiple centers. This pipeline can be applied to all cardiac MRI examinations in the FORCE registry.
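
    The evaluation metrics named above (Dice overlap, Bland-Altman bias and 95% limits of agreement) can be computed along the lines of the sketch below; the example volumes are made up for illustration and are not taken from the registry.

        # Hypothetical sketch of the reported metrics: Dice score between binary masks
        # and Bland-Altman bias / 95% limits of agreement between paired measurements.
        import numpy as np

        def dice(a, b):
            a, b = a.astype(bool), b.astype(bool)
            denom = a.sum() + b.sum()
            return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

        def bland_altman(auto, manual):
            diff = np.asarray(auto, dtype=float) - np.asarray(manual, dtype=float)
            bias, sd = diff.mean(), diff.std(ddof=1)
            return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

        # Made-up indexed end-diastolic volumes (mL/m2), for illustration only.
        auto_edv = [100.2, 87.5, 120.1, 95.0]
        manual_edv = [101.0, 88.0, 118.7, 96.3]
        bias, loa = bland_altman(auto_edv, manual_edv)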

    Influence of contrast and texture based image modifications on the performance and attention shift of U-Net models for brain tissue segmentation.

    Contrast and texture modifications applied at training or test time have recently shown promising results for enhancing the generalization performance of deep learning segmentation methods in medical image analysis. However, the underlying mechanisms of this phenomenon have not been investigated in depth. In this study, we investigated this phenomenon in a controlled experimental setting, using datasets from the Human Connectome Project and a large set of simulated MR protocols, in order to mitigate data confounders and to examine why model performance changes when different levels of contrast- and texture-based modifications are applied. Our experiments confirm previous findings regarding the improved performance of models subjected to contrast and texture modifications employed during training and/or testing time, but further reveal the interplay when these operations are combined, as well as the regimes of model improvement and worsening across scanning parameters. Furthermore, our findings demonstrate a spatial attention shift phenomenon in trained models, occurring at different levels of model performance and varying in relation to the type of applied image modification.
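
    A contrast-based modification of the kind discussed here could, for example, be a random gamma adjustment applied at training or test time, as in the sketch below. The specific transform and parameter range are assumptions, not necessarily those used in the study.

        # Hypothetical sketch: random gamma adjustment as a contrast modification
        # applied to an MR slice during training or at test time.
        import numpy as np

        def random_gamma(image, gamma_range=(0.7, 1.5), rng=None):
            """Rescale to [0, 1], apply a random gamma, and restore the original range."""
            rng = rng or np.random.default_rng()
            lo, hi = float(image.min()), float(image.max())
            scaled = (image - lo) / (hi - lo + 1e-8)
            gamma = rng.uniform(*gamma_range)
            return scaled ** gamma * (hi - lo) + lo

        mri_slice = np.random.rand(256, 256).astype(np.float32)   # stand-in for an MR slice
        augmented = random_gamma(mri_slice)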