Attention Gated Networks: Learning to Leverage Salient Regions in Medical Images
We propose a novel attention gate (AG) model for medical image analysis that
automatically learns to focus on target structures of varying shapes and sizes.
Models trained with AGs implicitly learn to suppress irrelevant regions in an
input image while highlighting salient features useful for a specific task.
This enables us to eliminate the necessity of using explicit external
tissue/organ localisation modules when using convolutional neural networks
(CNNs). AGs can be easily integrated into standard CNN models such as VGG or
U-Net architectures with minimal computational overhead while increasing the
model sensitivity and prediction accuracy. The proposed AG models are evaluated
on a variety of tasks, including medical image classification and segmentation.
For classification, we demonstrate the use case of AGs in scan plane detection
for fetal ultrasound screening. We show that the proposed attention mechanism
can provide efficient object localisation while improving the overall
prediction performance by reducing false positives. For segmentation, the
proposed architecture is evaluated on two large 3D CT abdominal datasets with
manual annotations for multiple organs. Experimental results show that AG
models consistently improve the prediction performance of the base
architectures across different datasets and training sizes while preserving
computational efficiency. Moreover, AGs guide the model activations to be
focused around salient regions, which provides better insights into how model
predictions are made. The source code for the proposed AG models is publicly
available.
Comment: Accepted for Medical Image Analysis (Special Issue on Medical Imaging
with Deep Learning). arXiv admin note: substantial text overlap with
arXiv:1804.03999, arXiv:1804.0533
Automated Retinal Lesion Detection via Image Saliency Analysis
Background and objective: The detection of abnormalities such as lesions or leakage in retinal images is an important health informatics task for the automated early diagnosis of diabetic and malarial retinopathy and other eye diseases, with the aim of preventing blindness and common systemic conditions. In this work, we propose a novel retinal lesion detection method that adapts the concept of saliency.
Methods: Retinal images are first segmented into superpixels. Two new saliency feature representations, uniqueness and compactness, are then derived to represent the superpixels. The pixel-level saliency is estimated from these superpixel saliency values via a bilateral filter. The extracted saliency features form a matrix for low-rank analysis to achieve saliency detection. The precise contour of a lesion is finally extracted from the generated saliency map after removing confounding structures such as blood vessels, the optic disc, and the fovea. The main novelty of this method is that it is an effective tool for detecting different abnormalities at pixel level in different modalities of retinal images, without the need to tune parameters.
Results: To evaluate its effectiveness, we applied our method to seven public datasets of diabetic and malarial retinopathy with four different types of lesions: exudates, hemorrhages, microaneurysms, and leakage. The evaluation was undertaken at pixel level, lesion level, or image level, according to ground truth availability in these datasets.
Conclusions: The experimental results show that the proposed method outperforms existing state-of-the-art methods in applicability, effectiveness, and accuracy.
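
The abstract above names a superpixel "uniqueness" feature but does not define it. As a rough illustration only, the sketch below uses a contrast-style definition that is common in the saliency literature: a superpixel is unique when its colour differs from that of other superpixels, with nearby superpixels weighted more heavily. The function name, the Gaussian weighting, and the `sigma` value are assumptions, not the paper's formulation.

```python
import numpy as np

def superpixel_uniqueness(colors, positions, sigma=0.25):
    """Contrast-based uniqueness per superpixel (illustrative assumption).

    colors:    (N, C) mean colour of each superpixel
    positions: (N, 2) superpixel centroids, normalised to [0, 1]
    sigma:     spatial scale of the Gaussian weighting (hypothetical value)
    """
    # Pairwise squared colour and position distances, both (N, N).
    color_d2 = ((colors[:, None, :] - colors[None, :, :]) ** 2).sum(-1)
    pos_d2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    # Spatial weights: nearby superpixels count more; each row sums to 1.
    w = np.exp(-pos_d2 / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)
    # Weighted colour contrast, rescaled so the largest value is about 1.
    u = (w * color_d2).sum(axis=1)
    return u / (u.max() + 1e-12)
```

In a toy input where one superpixel has a different colour from all of its neighbours, that superpixel receives the highest score.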
RGBT Salient Object Detection: A Large-scale Dataset and Benchmark
Salient object detection in complex scenes and environments is a challenging
research topic. Most works focus on RGB-based salient object detection, which
limits their performance in real-life applications when confronted with adverse
conditions such as dark environments and complex backgrounds. Taking advantage
of both RGB and thermal infrared images has recently become a new research
direction for detecting salient objects in complex scenes, as thermal infrared
spectrum imaging provides complementary information and has been applied to
many computer vision tasks. However, current research for RGBT salient object
detection is limited by the lack of a large-scale dataset and comprehensive
benchmark. This work contributes such an RGBT image dataset, named VT5000,
including 5000 spatially aligned RGBT image pairs with ground truth
annotations. VT5000 has 11 challenges collected in different scenes and
environments for exploring the robustness of algorithms. With this dataset, we
propose a powerful baseline approach, which extracts multi-level features
within each modality and aggregates these features of all modalities with the
attention mechanism, for accurate RGBT salient object detection. Extensive
experiments show that the proposed baseline approach outperforms the
state-of-the-art methods on the VT5000 dataset and two other public datasets. In
addition, we carry out a comprehensive analysis of different algorithms for RGBT
salient object detection on the VT5000 dataset, draw several conclusions, and
provide some potential research directions for RGBT salient object detection.
Comment: 12 pages, 10 figures
https://github.com/lz118/RGBT-Salient-Object-Detectio
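
The abstract above says that features from the RGB and thermal modalities are aggregated with an attention mechanism, but it gives no details. The sketch below shows one simple way such a fusion could work, using per-pixel softmax weights over the two modalities. It is an assumption for illustration, not the VT5000 baseline architecture, and `attention_fuse` and `score_w` are hypothetical names.

```python
import numpy as np

def attention_fuse(rgb_feat, thermal_feat, score_w):
    """Fuse two modality feature maps with per-pixel softmax attention.

    rgb_feat, thermal_feat: (H, W, C) features at one level
    score_w: (C,) hypothetical scoring vector (learned in a real model)
    """
    feats = np.stack([rgb_feat, thermal_feat], axis=0)   # (2, H, W, C)
    # One scalar score per modality and pixel.
    scores = feats @ score_w                              # (2, H, W)
    # Softmax over the modality axis (max subtracted for stability).
    scores -= scores.max(axis=0, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)
    # Weighted sum of the two feature maps at every pixel.
    fused = (weights[..., None] * feats).sum(axis=0)      # (H, W, C)
    return fused, weights
```

At each pixel the two weights sum to 1, so the fused feature is a convex combination that can lean on the thermal channel where the RGB signal is unreliable, for example in dark scenes.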
CLIPUNetr: Assisting Human-robot Interface for Uncalibrated Visual Servoing Control with CLIP-driven Referring Expression Segmentation
The classical human-robot interface in uncalibrated image-based visual
servoing (UIBVS) relies on either human annotations or semantic segmentation
with categorical labels. Both methods fail to match natural human communication
and convey rich semantics in manipulation tasks as effectively as natural
language expressions. In this paper, we tackle this problem by using referring
expression segmentation, which is a prompt-based approach, to provide more
in-depth information for robot perception. To generate high-quality
segmentation predictions from referring expressions, we propose CLIPUNetr - a
new CLIP-driven referring expression segmentation network. CLIPUNetr leverages
CLIP's strong vision-language representations to segment regions from referring
expressions, while utilizing its "U-shaped" encoder-decoder architecture to
generate predictions with sharper boundaries and finer structures. Furthermore,
we propose a new pipeline to integrate CLIPUNetr into UIBVS and apply it to
control robots in real-world environments. In experiments, our method improves
boundary and structure measurements by an average of 120% and can successfully
assist real-world UIBVS control in an unstructured manipulation environment
- …