Linear Gaussian Bounding Box Representation and Ring-Shaped Rotated Convolution for Oriented Object Detection
In oriented object detection, current representations of oriented bounding
boxes (OBBs) often suffer from the boundary discontinuity problem. Methods that
design continuous regression losses do not fundamentally solve this problem.
Although Gaussian bounding box (GBB) representation avoids this problem,
directly regressing GBB is susceptible to numerical instability. We propose
linear GBB (LGBB), a novel OBB representation. By linearly transforming the
elements of GBB, LGBB avoids the boundary discontinuity problem and has high
numerical stability. In addition, existing convolution-based rotation-sensitive
feature extraction methods only have local receptive fields, resulting in slow
feature aggregation. We propose ring-shaped rotated convolution (RRC), which
adaptively rotates feature maps to arbitrary orientations to extract
rotation-sensitive features under a ring-shaped receptive field, rapidly
aggregating features and contextual information. Experimental results
demonstrate that LGBB and RRC achieve state-of-the-art performance.
Furthermore, integrating LGBB and RRC into various models effectively improves
detection accuracy.
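To make the boundary discontinuity problem concrete, the following minimal sketch converts an OBB to a Gaussian bounding box. It assumes one common convention, a covariance of R diag(w²/4, h²/4) Rᵀ, which the abstract does not specify; the function name `obb_to_gbb` is hypothetical. The linear transformation that defines LGBB is not given in the abstract and is not reproduced here.

```python
import math

def obb_to_gbb(cx, cy, w, h, theta):
    """Convert an oriented box (center, width, height, angle in radians)
    to a 2-D Gaussian: mean (cx, cy) and covariance entries (a, b, c),
    where Sigma = [[a, c], [c, b]].

    Assumed convention (not from the abstract):
    Sigma = R diag(w^2/4, h^2/4) R^T, with R the rotation by theta.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    sw, sh = w * w / 4.0, h * h / 4.0
    a = sw * cos_t ** 2 + sh * sin_t ** 2
    b = sw * sin_t ** 2 + sh * cos_t ** 2
    c = (sw - sh) * sin_t * cos_t
    return (cx, cy), (a, b, c)

# The same physical box written two ways: the (w, h, theta) parameters
# differ sharply, which is the source of the discontinuity in OBB regression.
mean_a, cov_a = obb_to_gbb(0.0, 0.0, 4.0, 2.0, 0.0)
mean_b, cov_b = obb_to_gbb(0.0, 0.0, 2.0, 4.0, math.pi / 2)

# Both parameterizations map to (numerically) the same Gaussian.
same = all(math.isclose(p, q, abs_tol=1e-9) for p, q in zip(cov_a, cov_b))
print(same)
```

Under this convention the two parameterizations of the same box collapse to a single Gaussian, which illustrates why a GBB-style representation sidesteps the discontinuity.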
Adaptive Rotated Convolution for Rotated Object Detection
Rotated object detection aims to identify and locate objects in images with
arbitrary orientation. In this scenario, the oriented directions of objects
vary considerably across different images, while multiple orientations of
objects exist within an image. This intrinsic characteristic makes it
challenging for standard backbone networks to extract high-quality features of
these arbitrarily oriented objects. In this paper, we present the Adaptive
Rotated Convolution (ARC) module to handle the aforementioned challenges. In
our ARC module, the convolution kernels rotate adaptively to extract object
features with varying orientations in different images, and an efficient
conditional computation mechanism is introduced to accommodate the large
orientation variations of objects within an image. The two designs work
together seamlessly for rotated object detection. Moreover, ARC can conveniently
serve as a plug-and-play module in various vision backbones to boost their
representation ability to detect oriented objects accurately. Experiments on
commonly used benchmarks (DOTA and HRSC2016) demonstrate that equipping the
backbone network with the proposed ARC module significantly improves the
performance of multiple popular oriented object detectors (e.g. +3.03% mAP on
Rotated RetinaNet and +4.16% on CFA). Combined with the highly competitive
method Oriented R-CNN, the proposed approach achieves state-of-the-art
performance on the DOTA dataset with 81.77% mAP.
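As a rough illustration of what rotating a convolution kernel can mean, the sketch below resamples a square kernel at a given angle using bilinear interpolation. It is not the ARC implementation: the function name `rotate_kernel`, the zero padding, and the interpolation scheme are all assumptions, and the angle is passed in as a fixed value, whereas the abstract describes it being chosen adaptively per input.

```python
import math

def rotate_kernel(kernel, theta):
    """Rotate a square kernel (list of lists) by theta radians about its
    center. Uses inverse mapping with bilinear interpolation; samples that
    fall outside the kernel contribute zero (an assumption of this sketch).
    """
    k = len(kernel)
    c = (k - 1) / 2.0
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    def sample(y, x):
        # Bilinear interpolation over the four neighbouring taps.
        y0, x0 = math.floor(y), math.floor(x)
        total = 0.0
        for yy in (y0, y0 + 1):
            for xx in (x0, x0 + 1):
                if 0 <= yy < k and 0 <= xx < k:
                    weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                    total += weight * kernel[yy][xx]
        return total

    out = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            # Map each output tap back to its source location.
            dy, dx = i - c, j - c
            src_y = c + cos_t * dy - sin_t * dx
            src_x = c + sin_t * dy + cos_t * dx
            out[i][j] = sample(src_y, src_x)
    return out

base = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0]]
quarter_turn = rotate_kernel(base, math.pi / 2)
```

A quarter turn only permutes the taps of a 3×3 kernel, while intermediate angles blend neighbouring weights, so the rotated kernel stays the same size and can be used in place of the original.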
Spatial Transform Decoupling for Oriented Object Detection
Vision Transformers (ViTs) have achieved remarkable success in computer
vision tasks. However, their potential in rotation-sensitive scenarios has not
been fully explored, and this limitation may be inherently attributed to the
lack of spatial invariance in the data-forwarding process. In this study, we
present a novel approach, termed Spatial Transform Decoupling (STD), providing
a simple-yet-effective solution for oriented object detection with ViTs. Built
upon stacked ViT blocks, STD utilizes separate network branches to predict the
position, size, and angle of bounding boxes, effectively harnessing the spatial
transform potential of ViTs in a divide-and-conquer fashion. Moreover, by
aggregating cascaded activation masks (CAMs) computed upon the regressed
parameters, STD gradually enhances features within regions of interest (RoIs),
which complements the self-attention mechanism. Without bells and whistles, STD
achieves state-of-the-art performance on the benchmark datasets including
DOTA-v1.0 (82.24% mAP) and HRSC2016 (98.55% mAP), which demonstrates the
effectiveness of the proposed method. Source code is available at
https://github.com/yuhongtian17/Spatial-Transform-Decoupling