28,373 research outputs found
PHTrans: Parallelly Aggregating Global and Local Representations for Medical Image Segmentation
The success of Transformer in computer vision has attracted increasing
attention in the medical imaging community. Especially for medical image
segmentation, many excellent hybrid architectures based on convolutional neural
networks (CNNs) and Transformer have been presented and achieve impressive
performance. However, most of these methods, which embed modular Transformer
into CNNs, struggle to reach their full potential. In this paper, we propose a
novel hybrid architecture for medical image segmentation called PHTrans, which
parallelly hybridizes Transformer and CNN in main building blocks to produce
hierarchical representations from global and local features and adaptively
aggregate them, aiming to fully exploit their strengths to obtain better
segmentation performance. Specifically, PHTrans follows the U-shaped
encoder-decoder design and introduces the parallel hybird module in deep
stages, where convolution blocks and the modified 3D Swin Transformer learn
local features and global dependencies separately, then a sequence-to-volume
operation unifies the dimensions of the outputs to achieve feature aggregation.
Extensive experimental results on both Multi-Atlas Labeling Beyond the Cranial
Vault and Automated Cardiac Diagnosis Challeng datasets corroborate its
effectiveness, consistently outperforming state-of-the-art methods. The code is
available at: https://github.com/lseventeen/PHTrans.Comment: 10 pages, 3 figure
Attention Gated Networks: Learning to Leverage Salient Regions in Medical Images
We propose a novel attention gate (AG) model for medical image analysis that
automatically learns to focus on target structures of varying shapes and sizes.
Models trained with AGs implicitly learn to suppress irrelevant regions in an
input image while highlighting salient features useful for a specific task.
This enables us to eliminate the necessity of using explicit external
tissue/organ localisation modules when using convolutional neural networks
(CNNs). AGs can be easily integrated into standard CNN models such as VGG or
U-Net architectures with minimal computational overhead while increasing the
model sensitivity and prediction accuracy. The proposed AG models are evaluated
on a variety of tasks, including medical image classification and segmentation.
For classification, we demonstrate the use case of AGs in scan plane detection
for fetal ultrasound screening. We show that the proposed attention mechanism
can provide efficient object localisation while improving the overall
prediction performance by reducing false positives. For segmentation, the
proposed architecture is evaluated on two large 3D CT abdominal datasets with
manual annotations for multiple organs. Experimental results show that AG
models consistently improve the prediction performance of the base
architectures across different datasets and training sizes while preserving
computational efficiency. Moreover, AGs guide the model activations to be
focused around salient regions, which provides better insights into how model
predictions are made. The source code for the proposed AG models is publicly
available.Comment: Accepted for Medical Image Analysis (Special Issue on Medical Imaging
with Deep Learning). arXiv admin note: substantial text overlap with
arXiv:1804.03999, arXiv:1804.0533
A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images
Semantic segmentation is the pixel-wise labelling of an image. Since the
problem is defined at the pixel level, determining image class labels only is
not acceptable, but localising them at the original image pixel resolution is
necessary. Boosted by the extraordinary ability of convolutional neural
networks (CNN) in creating semantic, high level and hierarchical image
features; excessive numbers of deep learning-based 2D semantic segmentation
approaches have been proposed within the last decade. In this survey, we mainly
focus on the recent scientific developments in semantic segmentation,
specifically on deep learning-based methods using 2D images. We started with an
analysis of the public image sets and leaderboards for 2D semantic
segmantation, with an overview of the techniques employed in performance
evaluation. In examining the evolution of the field, we chronologically
categorised the approaches into three main periods, namely pre-and early deep
learning era, the fully convolutional era, and the post-FCN era. We technically
analysed the solutions put forward in terms of solving the fundamental problems
of the field, such as fine-grained localisation and scale invariance. Before
drawing our conclusions, we present a table of methods from all mentioned eras,
with a brief summary of each approach that explains their contribution to the
field. We conclude the survey by discussing the current challenges of the field
and to what extent they have been solved.Comment: Updated with new studie
- …