
    Attention Gated Networks: Learning to Leverage Salient Regions in Medical Images

    We propose a novel attention gate (AG) model for medical image analysis that automatically learns to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This enables us to eliminate the necessity of using explicit external tissue/organ localisation modules when using convolutional neural networks (CNNs). AGs can be easily integrated into standard CNN models such as VGG or U-Net architectures with minimal computational overhead while increasing the model sensitivity and prediction accuracy. The proposed AG models are evaluated on a variety of tasks, including medical image classification and segmentation. For classification, we demonstrate the use case of AGs in scan plane detection for fetal ultrasound screening. We show that the proposed attention mechanism can provide efficient object localisation while improving the overall prediction performance by reducing false positives. For segmentation, the proposed architecture is evaluated on two large 3D CT abdominal datasets with manual annotations for multiple organs. Experimental results show that AG models consistently improve the prediction performance of the base architectures across different datasets and training sizes while preserving computational efficiency. Moreover, AGs guide the model activations to be focused around salient regions, which provides better insights into how model predictions are made. The source code for the proposed AG models is publicly available.

    Comment: Accepted for Medical Image Analysis (Special Issue on Medical Imaging with Deep Learning). arXiv admin note: substantial text overlap with arXiv:1804.03999, arXiv:1804.0533
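    To make the gating mechanism concrete, below is a minimal sketch of an additive attention gate in PyTorch; the module and variable names, channel sizes, and the bilinear resampling strategy are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGate(nn.Module):
    """Additive attention gate: suppresses irrelevant skip-connection
    features using a coarser-scale gating signal (illustrative sketch)."""
    def __init__(self, in_channels, gating_channels, inter_channels):
        super().__init__()
        # 1x1 convolutions project the skip features and the gating
        # signal into a common intermediate space.
        self.theta_x = nn.Conv2d(in_channels, inter_channels, kernel_size=1)
        self.phi_g = nn.Conv2d(gating_channels, inter_channels, kernel_size=1)
        self.psi = nn.Conv2d(inter_channels, 1, kernel_size=1)

    def forward(self, x, g):
        # x: skip-connection features; g: coarser-scale gating signal.
        g_up = F.interpolate(self.phi_g(g), size=x.shape[2:],
                             mode='bilinear', align_corners=False)
        # Additive attention, then a sigmoid yields per-pixel
        # coefficients in [0, 1].
        alpha = torch.sigmoid(self.psi(F.relu(self.theta_x(x) + g_up)))
        # Scale the skip features so irrelevant regions are suppressed.
        return x * alpha

# Usage: gate U-Net skip features with the next-coarser decoder signal.
x = torch.randn(1, 64, 128, 128)              # skip features
g = torch.randn(1, 128, 64, 64)               # gating signal
gated = AttentionGate(64, 128, 32)(x, g)      # -> (1, 64, 128, 128)
```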

    Automated Retinal Lesion Detection via Image Saliency Analysis

    Background and objective: The detection of abnormalities such as lesions or leakage from retinal images is an important health informatics task for automated early diagnosis of diabetic and malarial retinopathy and other eye diseases, in order to prevent blindness and common systemic conditions. In this work, we propose a novel retinal lesion detection method by adapting the concepts of saliency. Methods: Retinal images are first segmented into superpixels, and two new saliency feature representations, uniqueness and compactness, are derived to represent the superpixels. Pixel-level saliency is then estimated from these superpixel saliency values via a bilateral filter. The extracted saliency features form a matrix for low-rank analysis to achieve saliency detection. The precise contour of a lesion is finally extracted from the generated saliency map after removing confounding structures such as blood vessels, the optic disc, and the fovea. The main novelty of this method is that it is an effective tool for detecting different abnormalities at the pixel level from different modalities of retinal images, without the need to tune parameters. Results: To evaluate its effectiveness, we applied our method to seven public datasets of diabetic and malarial retinopathy with four different types of lesions: exudates, hemorrhages, microaneurysms, and leakage. The evaluation was undertaken at the pixel, lesion, or image level according to ground truth availability in these datasets. Conclusions: The experimental results show that the proposed method outperforms existing state-of-the-art ones in applicability, effectiveness, and accuracy.
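    As a concrete illustration of the superpixel saliency features described above, here is a minimal sketch of a uniqueness computation, assuming scikit-image and NumPy; the exact weighting, the compactness term, the bilateral refinement, and the low-rank analysis from the paper are omitted, and the function and parameter names are hypothetical.

```python
import numpy as np
from skimage import img_as_float
from skimage.segmentation import slic

def superpixel_uniqueness(image, n_segments=200, sigma_s=0.25):
    """Pixel-level uniqueness map from superpixel colour contrast
    (illustrative sketch; expects an RGB retinal image array)."""
    img = img_as_float(image)
    labels = slic(img, n_segments=n_segments, compactness=10)
    ids = np.unique(labels)
    # Mean colour and normalised centroid of each superpixel.
    colors = np.array([img[labels == i].mean(axis=0) for i in ids])
    h, w = labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.array([[ys[labels == i].mean() / h,
                     xs[labels == i].mean() / w] for i in ids])
    # Uniqueness: colour distance to every other superpixel,
    # down-weighted by spatial distance (nearby contrast matters more).
    cdist = np.linalg.norm(colors[:, None] - colors[None], axis=-1)
    pdist = np.linalg.norm(pos[:, None] - pos[None], axis=-1)
    wgt = np.exp(-pdist**2 / (2 * sigma_s**2))
    wgt /= wgt.sum(axis=1, keepdims=True)
    uniq = (cdist * wgt).sum(axis=1)
    # Map superpixel scores back to a dense pixel-level map.
    return uniq[np.searchsorted(ids, labels)]
```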

    RGBT Salient Object Detection: A Large-scale Dataset and Benchmark

    Salient object detection in complex scenes and environments is a challenging research topic. Most works focus on RGB-based salient object detection, which limits their performance in real-life applications under adverse conditions such as dark environments and complex backgrounds. Exploiting both RGB and thermal infrared images has recently become a new research direction for detecting salient objects in complex scenes, as thermal infrared imaging provides complementary information and has been applied to many computer vision tasks. However, current research on RGBT salient object detection is limited by the lack of a large-scale dataset and a comprehensive benchmark. This work contributes such an RGBT image dataset, named VT5000, comprising 5000 spatially aligned RGBT image pairs with ground truth annotations. VT5000 covers 11 challenges collected in different scenes and environments for exploring the robustness of algorithms. With this dataset, we propose a powerful baseline approach, which extracts multi-level features within each modality and aggregates the features of all modalities with an attention mechanism, for accurate RGBT salient object detection. Extensive experiments show that the proposed baseline approach outperforms state-of-the-art methods on the VT5000 dataset and two other public datasets. In addition, we carry out a comprehensive analysis of different RGBT salient object detection algorithms on the VT5000 dataset, draw several valuable conclusions, and point out potential research directions for RGBT salient object detection.

    Comment: 12 pages, 10 figures. https://github.com/lz118/RGBT-Salient-Object-Detectio
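    To make the attention-based aggregation idea concrete, below is a minimal sketch of fusing RGB and thermal features with learned per-channel weights, assuming PyTorch; this squeeze-and-excitation-style module and its names are an illustrative stand-in, not the paper's actual baseline.

```python
import torch
import torch.nn as nn

class ModalityAttentionFusion(nn.Module):
    """Fuse same-resolution RGB and thermal feature maps with
    learned per-channel attention weights (illustrative sketch)."""
    def __init__(self, channels):
        super().__init__()
        # Predict per-channel weights for both modalities from the
        # globally pooled, concatenated features.
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, f_rgb, f_t):
        w = self.fc(torch.cat([f_rgb, f_t], dim=1))
        w_rgb, w_t = w.chunk(2, dim=1)
        # Re-weight each modality, then sum into a fused representation.
        return f_rgb * w_rgb + f_t * w_t

# Usage with one level of the two feature pyramids.
fused = ModalityAttentionFusion(64)(torch.randn(1, 64, 80, 80),
                                    torch.randn(1, 64, 80, 80))
```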

    CLIPUNetr: Assisting Human-robot Interface for Uncalibrated Visual Servoing Control with CLIP-driven Referring Expression Segmentation

    The classical human-robot interface in uncalibrated image-based visual servoing (UIBVS) relies on either human annotations or semantic segmentation with categorical labels. Both methods fail to match natural human communication and convey rich semantics in manipulation tasks as effectively as natural language expressions. In this paper, we tackle this problem by using referring expression segmentation, a prompt-based approach, to provide richer information for robot perception. To generate high-quality segmentation predictions from referring expressions, we propose CLIPUNetr, a new CLIP-driven referring expression segmentation network. CLIPUNetr leverages CLIP's strong vision-language representations to segment regions from referring expressions, while utilizing its "U-shaped" encoder-decoder architecture to generate predictions with sharper boundaries and finer structures. Furthermore, we propose a new pipeline that integrates CLIPUNetr into UIBVS and apply it to control robots in real-world environments. In experiments, our method improves boundary and structure measurements by an average of 120% and can successfully assist real-world UIBVS control in an unstructured manipulation environment.
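    As an illustration of how a referring expression can condition a U-shaped decoder, here is a minimal sketch in PyTorch using FiLM-style feature modulation; the projection scheme and all names are assumptions rather than CLIPUNetr's actual architecture, and the text embedding would in practice come from CLIP's text encoder.

```python
import torch
import torch.nn as nn

class TextConditionedBlock(nn.Module):
    """Modulate decoder features with a sentence embedding of the
    referring expression (illustrative FiLM-style sketch)."""
    def __init__(self, channels, text_dim=512):
        super().__init__()
        # Project the text embedding into per-channel scale and shift.
        self.to_scale_shift = nn.Linear(text_dim, 2 * channels)
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, feat, text_emb):
        scale, shift = self.to_scale_shift(text_emb).chunk(2, dim=-1)
        # Broadcast the (batch, channels) modulation over spatial dims.
        feat = feat * (1 + scale[..., None, None]) + shift[..., None, None]
        return torch.relu(self.conv(feat))

# text_emb stands in for the output of CLIP's text encoder.
feat = torch.randn(1, 64, 56, 56)
text_emb = torch.randn(1, 512)
out = TextConditionedBlock(64)(feat, text_emb)  # -> (1, 64, 56, 56)
```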