Cross-CBAM: A Lightweight network for Scene Segmentation
Scene parsing is a great challenge for real-time semantic segmentation.
Although traditional semantic segmentation networks have made remarkable
leaps forward in semantic accuracy, their inference speed is unsatisfactory.
Moreover, this progress relies on fairly large networks and powerful
computational resources. However, it is difficult to run extremely large models
on edge computing devices with limited computing power, which poses a huge
challenge to real-time semantic segmentation tasks. In this
paper, we present the Cross-CBAM network, a novel lightweight network for
real-time semantic segmentation. Specifically, a Squeeze-and-Excitation Atrous
Spatial Pyramid Pooling Module (SE-ASPP) is proposed to capture a variable
field of view and multiscale information. We also propose a Cross Convolutional
Block Attention Module (CCBAM), which employs a cross-multiply operation so that
high-level semantic information guides low-level detail information. Unlike
previous works, which use attention to focus on the desired information in the
backbone, CCBAM uses cross-attention for feature fusion in the FPN structure.
Extensive experiments on the
Cityscapes and CamVid datasets demonstrate the effectiveness of the proposed
Cross-CBAM model, which achieves a promising trade-off between segmentation
accuracy and inference speed. On the Cityscapes test set, we achieve 73.4% mIoU
at 240.9 FPS and 77.2% mIoU at 88.6 FPS on an NVIDIA GTX 1080Ti.
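The abstract describes CCBAM only at a high level. The Python sketch below illustrates one plausible reading of a cross-multiply attention fusion, built from CBAM's channel and spatial attention. Nearly everything here is an assumption rather than the authors' implementation: the `Feature` layout (channels as flat lists), the function names `channel_attention`, `spatial_attention` and `cross_attention_fuse`, the identity standing in for CBAM's shared MLP, the omitted 7x7 convolution, and the reverse (low-to-high) branch. Only the direction "high-level semantics reweight low-level detail" comes from the abstract.

```python
import math
from typing import List

# A feature map is a list of channels, each a flat list of H*W values.
# Both inputs are assumed to have the same channel count and spatial size
# (i.e. the high-level map has already been projected and upsampled).
Feature = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(f: Feature) -> List[float]:
    """CBAM-style channel weights: sigmoid(avg-pool + max-pool) per channel.

    CBAM's shared MLP is replaced by the identity (simplifying assumption).
    """
    return [sigmoid(sum(ch) / len(ch) + max(ch)) for ch in f]


def spatial_attention(f: Feature) -> List[float]:
    """CBAM-style spatial weights: sigmoid(channel mean + channel max) per position.

    CBAM's 7x7 convolution over the pooled maps is omitted (simplifying assumption).
    """
    weights = []
    for p in range(len(f[0])):
        column = [ch[p] for ch in f]
        weights.append(sigmoid(sum(column) / len(column) + max(column)))
    return weights


def cross_attention_fuse(low: Feature, high: Feature) -> Feature:
    """Hypothetical cross-multiply fusion of a low- and a high-level map.

    High-level channel attention reweights the low-level map (semantics guide
    detail); low-level spatial attention reweights the high-level map; the two
    products are summed.
    """
    if len(low) != len(high):
        raise ValueError("low and high must have the same number of channels")
    ca_high = channel_attention(high)
    sa_low = spatial_attention(low)
    return [
        [low[c][p] * ca_high[c] + high[c][p] * sa_low[p] for p in range(len(low[c]))]
        for c in range(len(low))
    ]
```

With an all-zero high-level map, every channel weight is sigmoid(0) = 0.5 and the high-level term vanishes, so the fused output is half of the low-level map, which makes a convenient sanity check.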
A2-FPN for semantic segmentation of fine-resolution remotely sensed images
The thriving development of earth observation technology makes high-resolution remote-sensing images increasingly easy to obtain. However, the huge spatial and spectral complexity that comes with fine resolution makes automated semantic segmentation a challenging task. Addressing this issue is an exciting research field, which paves the way for scene-level landscape pattern analysis and decision-making. To tackle this problem, we propose an approach for automatic land segmentation based on the Feature Pyramid Network (FPN). As a classic architecture, FPN builds a feature pyramid with high-level semantics at every level. However, intrinsic defects in feature extraction and fusion prevent FPN from aggregating more discriminative features. Hence, we propose an Attention Aggregation Module (AAM) to enhance multiscale feature learning through attention-guided feature aggregation. Based on FPN and AAM, a novel framework named Attention Aggregation Feature Pyramid Network (A2-FPN) is developed for semantic segmentation of fine-resolution remotely sensed images. Extensive experiments conducted on four datasets demonstrate the effectiveness of A2-FPN in segmentation accuracy. Code is available at https://github.com/lironui/A2-FPN
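The abstract does not specify how the AAM computes its attention. As a hedged illustration of attention-guided aggregation in general, the sketch below takes several pyramid levels, assumed to be already resized to a common resolution and reduced to one channel, and blends them with a per-position softmax over levels. In a real network the attention logits would come from a learned layer; here they are passed in directly. The names `attention_aggregate`, `features` and `logits` are hypothetical, and this is not the A2-FPN code from the linked repository.

```python
import math
from typing import List

# One pyramid level, already resized to a shared resolution and reduced to a
# single channel, flattened to H*W values (simplifying assumption).
Level = List[float]


def attention_aggregate(features: List[Level], logits: List[Level]) -> Level:
    """Blend pyramid levels with a per-position softmax over levels.

    `logits[l][p]` is the (hypothetical) attention score of level l at
    position p; in a real network it would come from a learned layer.
    """
    if len(features) != len(logits) or not features:
        raise ValueError("need one logit map per feature level")
    out = []
    for p in range(len(features[0])):
        scores = [lg[p] for lg in logits]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        # Weighted sum of the levels at this position.
        out.append(sum(e / total * feat[p] for e, feat in zip(exps, features)))
    return out
```

With equal logits the weights are uniform and the result reduces to a plain average of the levels; unequal logits let each position favor the scale whose features score highest.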
GFF: Gated Fully Fusion for Semantic Segmentation
Semantic segmentation generates a comprehensive understanding of scenes by
densely predicting the category of each pixel. High-level features from Deep
Convolutional Neural Networks have already demonstrated their effectiveness in
semantic segmentation tasks; however, the coarse resolution of high-level
features often leads to inferior results for small/thin objects, where detailed
information is important. It is natural to consider importing low-level
features to compensate for the detailed information lost in high-level
features. Unfortunately, simply combining multi-level features suffers from the
semantic gap among them. In this paper, we propose a new architecture, named
Gated Fully Fusion (GFF), to selectively fuse features from multiple levels
using gates in a fully connected way. Specifically, features at each level are
enhanced by higher-level features with stronger semantics and lower-level
features with more details, and gates are used to control the propagation of
useful information, which significantly reduces noise during fusion. We
achieve state-of-the-art results on four challenging scene parsing datasets,
including Cityscapes, Pascal Context, COCO-Stuff and ADE20K.
Comment: accepted by AAAI-2020 (oral)
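To make the gating idea concrete, the sketch below fuses every level with every other level through sigmoid gates. The specific rule, fused_l = (1 + g_l) * x_l + (1 - g_l) * sum over i != l of g_i * x_i, is an assumption chosen to match the abstract's description (each level's gate controls how much of its information propagates, and a level with a confident gate relies less on the others); treat it as illustrative rather than as the paper's equation. Features are flattened single-channel lists assumed to share one resolution, gate logits are supplied directly instead of being produced by a convolution, and `gated_fully_fuse` is a hypothetical name.

```python
import math
from typing import List

# One feature level, single channel, flattened to H*W values; all levels are
# assumed to be resized to the same resolution beforehand.
Level = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fully_fuse(features: List[Level], gate_logits: List[Level]) -> List[Level]:
    """Fuse every level with all other levels through per-position gates.

    Illustrative rule (an assumption, not taken from the paper):
        fused_l = (1 + g_l) * x_l + (1 - g_l) * sum_{i != l} g_i * x_i
    """
    if len(features) != len(gate_logits):
        raise ValueError("need one gate map per feature level")
    gates = [[sigmoid(v) for v in g] for g in gate_logits]
    fused = []
    for l, x_l in enumerate(features):
        out = []
        for p in range(len(x_l)):
            # Information offered by every other level, scaled by that level's gate.
            incoming = sum(
                gates[i][p] * features[i][p]
                for i in range(len(features))
                if i != l
            )
            # A confident gate (g_l near 1) keeps x_l and blocks incoming features.
            out.append((1 + gates[l][p]) * x_l[p] + (1 - gates[l][p]) * incoming)
        fused.append(out)
    return fused
```

With all gate logits at 0 every gate is 0.5, so level l becomes 1.5 * x_l plus a quarter of the sum of the other levels; for features [1.0], [2.0] and [4.0] that gives 3.0, 4.25 and 6.75.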