Towards Multi-class Object Detection in Unconstrained Remote Sensing Imagery
Automatic multi-class object detection in remote sensing images in
unconstrained scenarios is of high interest for several applications including
traffic monitoring and disaster management. Large variations in object scale,
orientation, and category, together with complex backgrounds and differing
camera sensors, pose great challenges for current algorithms. In this work, we propose
a new method consisting of a novel joint image cascade and feature pyramid
network with multi-size convolution kernels to extract multi-scale strong and
weak semantic features. These features are fed into rotation-based region
proposal and region of interest networks to produce object detections. Finally,
rotational non-maximum suppression is applied to remove redundant detections.
During training, we minimize joint horizontal and oriented bounding box loss
functions, as well as a novel loss that enforces oriented boxes to be
rectangular. Our method achieves 68.16% mAP on horizontal and 72.45% mAP on
oriented bounding box detection tasks on the challenging DOTA dataset,
outperforming all published methods by a large margin (+6% and +12% absolute
improvement, respectively). Furthermore, it generalizes to two other datasets,
NWPU VHR-10 and UCAS-AOD, and achieves competitive results with the baselines
even when trained on DOTA. Our method can be deployed in multi-class object
detection applications, regardless of the image and object scales and
orientations, making it a great choice for unconstrained aerial and satellite
imagery.
Comment: ACCV 201
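The rotational non-maximum suppression step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the box layout `(cx, cy, w, h, angle_rad)`, the function names, and the 0.5 threshold are all assumptions made for illustration, and the overlap is computed with plain Sutherland-Hodgman polygon clipping.

```python
import math

def corners(box):
    """Counter-clockwise corners of a rotated box (cx, cy, w, h, angle_rad)."""
    cx, cy, w, h, angle = box
    c, s = math.cos(angle), math.sin(angle)
    offsets = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]

def polygon_area(poly):
    """Shoelace formula; returns 0 for polygons with fewer than 3 points."""
    total = 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def clip_convex(subject, clipper):
    """Sutherland-Hodgman: clip convex `subject` by a counter-clockwise convex `clipper`."""
    output = subject
    for i, (ax, ay) in enumerate(clipper):
        bx, by = clipper[(i + 1) % len(clipper)]
        current, output = output, []
        for j, p in enumerate(current):
            q = current[(j + 1) % len(current)]
            # Signed side test: >= 0 means on or to the left of edge a->b (inside).
            sp = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
            sq = (bx - ax) * (q[1] - ay) - (by - ay) * (q[0] - ax)
            if sp >= 0:
                output.append(p)
            if sp * sq < 0:  # segment p->q crosses the clipping line
                t = sp / (sp - sq)
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return output

def rotated_iou(a, b):
    """Intersection over union of two rotated boxes."""
    inter = polygon_area(clip_convex(corners(a), corners(b)))
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def rotated_nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(rotated_iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

# Hypothetical detections: box 1 nearly duplicates box 0, box 2 is box 0 rotated 90 degrees.
boxes = [(0, 0, 4, 2, 0.0), (0.1, 0, 4, 2, 0.02), (0, 0, 4, 2, math.pi / 2), (10, 10, 4, 2, 0.3)]
scores = [0.9, 0.8, 0.7, 0.6]
kept = rotated_nms(boxes, scores)
```

In this example the near-duplicate is suppressed while the perpendicular box survives, which an axis-aligned IoU on enclosing boxes would not distinguish as cleanly.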
Multi-Scale Attention Networks for Pavement Defect Detection
Pavement defects such as cracks, net cracks, and pit slots can cause traffic safety problems, and their timely detection and identification play a key role in reducing the harm they cause. Recent deep learning-based CNNs have shown competitive performance in image detection and classification. To detect pavement defects automatically and improve detection performance, we propose a multi-scale mobile attention-based network, termed MANet. MANet uses an encoder-decoder architecture in which the encoder adopts MobileNet as the backbone network to extract pavement defect features. Instead of the original 3×3 convolution, multi-scale convolution kernels are used in the depth-wise separable convolution layers of the network. Further, a hybrid attention mechanism is incorporated separately into the encoder and decoder modules to infer the significance of spatial points and inter-channel relationship features for the input intermediate feature maps. The proposed approach achieves state-of-the-art performance on two publicly available benchmark datasets: Crack500 (500 crack images of 2,000×1,500 pixels) and CFD (118 crack images of 480×320 pixels). The mean intersection over union (MIoU) of the proposed approach on these two datasets reaches 0.7219 and 0.7788, respectively. Ablation experiments show that the multi-scale convolution and hybrid attention modules help the model extract high-level feature representations and generate more accurate pavement crack segmentation results. We further test the model on locally collected pavement crack images (131 images of 1024×768 pixels), where it achieves an MIoU of 0.6514 and outperforms the compared baseline methods.
Experimental findings demonstrate the validity and feasibility of the proposed approach, which provides a viable solution for pavement crack detection in practical application scenarios. Our code is available at https://github.com/xtu502/pavement-defects
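As a reading aid for the MIoU figures quoted above, the metric can be sketched as follows. This is a generic illustration under stated assumptions, not code from the linked repository: masks are plain nested lists of integer class labels to avoid dependencies, the name `mean_iou` is hypothetical, and classes absent from both masks are skipped, which is one common convention among several.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes for two label masks.

    `pred` and `target` are equally sized nested lists of integer class labels.
    Classes that appear in neither mask are skipped (an assumed convention).
    """
    ious = []
    for cls in range(num_classes):
        intersection = 0
        union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                in_pred, in_target = p == cls, t == cls
                intersection += in_pred and in_target
                union += in_pred or in_target
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0

# Tiny 2x2 example with background (0) and crack (1) labels.
pred = [[0, 1], [1, 1]]
target = [[0, 1], [0, 1]]
score = mean_iou(pred, target, num_classes=2)  # (1/2 + 2/3) / 2
```

Here the background class scores 1/2 and the crack class 2/3, so the mean is 7/12. Whether the paper averages per image or over the whole dataset is not stated in the abstract.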
M^2UNet: MetaFormer Multi-scale Upsampling Network for Polyp Segmentation
Polyp segmentation has recently garnered significant attention, and multiple
methods have been formulated to achieve commendable outcomes. However, these
techniques often have difficulty with the complex polyp foreground and its
surrounding regions because of the nature of the convolution operation.
Besides, most existing methods fail to exploit the potential information from
multiple decoder stages. To address these challenges, we suggest combining
MetaFormer, introduced as a baseline for integrating CNN and Transformer, with
the UNet framework and incorporating our Multi-scale Upsampling block (MU).
This simple module makes it possible to combine multi-level information by
exploring multiple receptive field paths of the shallow decoder stage and then
adding the result to the higher stage to aggregate a better feature
representation, which is essential in medical image segmentation. Taken
together, we propose the MetaFormer Multi-scale Upsampling Network (M^2UNet)
for the polyp segmentation task. Extensive experiments on five benchmark
datasets demonstrate that our method achieves competitive performance compared
with several previous methods.