MuraNet: Multi-task Floor Plan Recognition with Relation Attention
The recognition of information in floor plan data requires the use of
detection and segmentation models. However, relying on several single-task
models can result in ineffective utilization of relevant information when there
are multiple tasks present simultaneously. To address this challenge, we
introduce MuraNet, an attention-based multi-task model for segmentation and
detection tasks in floor plan data. In MuraNet, we adopt a unified encoder
called MURA as the backbone with two separated branches: an enhanced
segmentation decoder branch and a decoupled detection head branch based on
YOLOX, for segmentation and detection tasks respectively. The architecture of
MuraNet is designed to leverage the fact that walls, doors, and windows usually
constitute the primary structure of a floor plan's architecture. By jointly
training the model on both detection and segmentation tasks, we believe MuraNet
can effectively extract and utilize relevant features for both tasks. Our
experiments on the CubiCasa5k public dataset show that MuraNet improves
convergence speed during training compared to single-task models like U-Net and
YOLOv3. Moreover, we observe improvements in average AP for detection and in
IoU for segmentation. Our ablation experiments
demonstrate that the attention-based unified backbone of MuraNet achieves
better feature extraction in floor plan recognition tasks, and the use of
decoupled multi-head branches for different tasks further improves model
performance. We believe that our proposed MuraNet model can address the
disadvantages of single-task models and improve the accuracy and efficiency of
floor plan data recognition.
Comment: Document Analysis and Recognition - ICDAR 2023 Workshops. ICDAR 2023.
Lecture Notes in Computer Science, vol 14193. Springer, Cha
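The shared-backbone layout described in the MuraNet abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names, the pixel-scaling "encoder", and the threshold-based heads are hypothetical stand-ins, not the MURA encoder, the enhanced segmentation decoder, or the YOLOX-based head. The only point shown is that features are computed once and consumed by two separate branches.

```python
def shared_encoder(image):
    """Hypothetical stand-in for a unified backbone: scale pixels to [0, 1]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def segmentation_head(features, threshold=0.5):
    """Toy segmentation branch: binary mask of features above the threshold."""
    return [[1 if f > threshold else 0 for f in row] for row in features]


def detection_head(features, threshold=0.5):
    """Toy detection branch: one box (row_min, col_min, row_max, col_max), or None."""
    hits = [
        (r, c)
        for r, row in enumerate(features)
        for c, f in enumerate(row)
        if f > threshold
    ]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows), max(cols))


def multi_task_forward(image):
    """Run the encoder once and feed the same features to both branches."""
    features = shared_encoder(image)
    return segmentation_head(features), detection_head(features)


if __name__ == "__main__":
    toy_image = [[0, 0, 0], [0, 255, 255], [0, 0, 0]]
    mask, box = multi_task_forward(toy_image)
    print(mask, box)
```

In a trained multi-task model the two branches would also share gradients through the encoder during joint training; this sketch covers only the forward data flow.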
CAT: Learning to Collaborate Channel and Spatial Attention from Multi-Information Fusion
Channel and spatial attention mechanisms have been shown to provide a clear
performance boost for deep convolutional neural networks (CNNs). Most existing
methods use only one of the two or run them in parallel or in series, neglecting
the collaboration between the two attentions. In order to better establish the
feature interaction between the two types of attention, we propose a
plug-and-play attention module, which we term "CAT", that activates the
Collaboration between spatial and channel Attentions based on learned Traits.
Specifically, we represent traits as trainable coefficients (i.e.,
colla-factors) to adaptively combine contributions of different attention
modules to fit different image hierarchies and tasks better. Moreover, in
addition to the global average pooling (GAP) and global maximum pooling (GMP)
operators, we propose global entropy pooling (GEP), which is effective in
suppressing noise signals by measuring the information disorder of feature
maps. We introduce a three-way pooling operation into attention modules and
apply the adaptive mechanism to fuse their outcomes. Extensive experiments on
MS COCO, Pascal VOC, CIFAR-100, and ImageNet show that our CAT outperforms
existing state-of-the-art attention mechanisms in object detection, instance
segmentation, and image classification. The model and code will be released
soon.
Comment: 8 pages, 5 figure
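The three-way pooling and adaptive fusion described in the CAT abstract can be sketched as follows. The abstract does not give formulas, so this is a minimal sketch under stated assumptions: entropy pooling is taken to be the Shannon entropy of a softmax-normalized single-channel feature map, and the trainable coefficients (the "colla-factors") are assumed to be softmax-normalized before weighting the three pooled values. The function names are hypothetical and the authors' actual definitions may differ.

```python
import math


def _flatten(feature_map):
    """Flatten a 2-D single-channel feature map into a list of values."""
    return [value for row in feature_map for value in row]


def _softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def global_average_pool(feature_map):
    values = _flatten(feature_map)
    return sum(values) / len(values)


def global_max_pool(feature_map):
    return max(_flatten(feature_map))


def global_entropy_pool(feature_map):
    """Assumed GEP: Shannon entropy of the softmax-normalized activations."""
    probs = _softmax(_flatten(feature_map))
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def fuse_pooled(feature_map, colla_factors):
    """Weight GAP, GMP and GEP by softmax-normalized coefficients (assumed)."""
    weights = _softmax(list(colla_factors))
    pooled = (
        global_average_pool(feature_map),
        global_max_pool(feature_map),
        global_entropy_pool(feature_map),
    )
    return sum(w * p for w, p in zip(weights, pooled))


if __name__ == "__main__":
    channel = [[0.2, 1.5], [0.1, 3.0]]
    print(fuse_pooled(channel, colla_factors=[0.0, 0.0, 0.0]))
```

Under this assumed definition, a flat feature map gives the maximum entropy (the logarithm of the number of positions), while a map dominated by a single strong activation gives an entropy close to zero. In a real module the coefficients would be learned parameters rather than fixed inputs.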