29 research outputs found
BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation
Generating human action proposals in untrimmed videos is an important yet
challenging task with wide applications. Current methods often suffer from
noisy boundary locations and the inferior quality of the confidence scores used
for proposal retrieval. In this paper, we present BSN++, a new framework that
exploits a complementary boundary regressor and relation modeling for temporal
proposal generation. First, we propose a novel boundary regressor based on the
complementary characteristics of both starting and ending boundary classifiers.
Specifically, we utilize a U-shaped architecture with nested skip connections
to capture rich contexts and introduce a bi-directional boundary matching
mechanism to improve boundary precision. Second, to account for the
proposal-proposal relations ignored in previous methods, we devise a proposal
relation block which includes two self-attention modules covering the aspects
of position and channel. Furthermore, we find that data imbalance problems
inevitably exist in the positive/negative proposals and the temporal durations,
which harm model performance on tail distributions. To relieve this issue, we
introduce a scale-balanced re-sampling strategy. Extensive experiments are
conducted on two popular benchmarks: ActivityNet-1.3 and THUMOS14, which
demonstrate that BSN++ achieves state-of-the-art performance.
Comment: Submitted to AAAI21. arXiv admin note: substantial text overlap with
arXiv:2007.0988
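The dual self-attention design described above can be illustrated with a minimal PyTorch sketch. This is an assumed implementation for illustration, not the authors' code: the `PositionAttention`, `ChannelAttention`, and `ProposalRelationBlock` names, the reduction ratio, and the additive fusion of the two paths are all assumptions. Position attention relates locations of the 2D proposal feature map to one another, while channel attention relates feature channels.

```python
# Hedged sketch of a proposal relation block with position- and channel-wise
# self-attention (module names and fusion scheme are illustrative assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PositionAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, 1)
        self.key = nn.Conv2d(channels, channels // reduction, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):                              # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, C')
        k = self.key(x).flatten(2)                     # (B, C', HW)
        attn = F.softmax(q @ k, dim=-1)                # (B, HW, HW)
        v = self.value(x).flatten(2)                   # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                    # residual connection

class ChannelAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):                              # x: (B, C, H, W)
        b, c, h, w = x.shape
        f = x.flatten(2)                               # (B, C, HW)
        attn = F.softmax(f @ f.transpose(1, 2), dim=-1)  # (B, C, C)
        out = (attn @ f).view(b, c, h, w)
        return self.gamma * out + x

class ProposalRelationBlock(nn.Module):
    """Fuse position- and channel-attended features by summation."""
    def __init__(self, channels):
        super().__init__()
        self.pos = PositionAttention(channels)
        self.chn = ChannelAttention()

    def forward(self, x):
        return self.pos(x) + self.chn(x)
```

With `gamma` initialized to zero, each attention path starts as an identity mapping, so inserting the block into an existing network leaves its initial behavior unchanged.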
Adaptive Rotated Convolution for Rotated Object Detection
Rotated object detection aims to identify and locate objects in images with
arbitrary orientation. In this scenario, the oriented directions of objects
vary considerably across different images, while multiple orientations of
objects exist within an image. This intrinsic characteristic makes it
challenging for standard backbone networks to extract high-quality features of
these arbitrarily oriented objects. In this paper, we present the Adaptive
Rotated Convolution (ARC) module to handle the aforementioned challenges. In
our ARC module, the convolution kernels rotate adaptively to extract object
features with varying orientations in different images, and an efficient
conditional computation mechanism is introduced to accommodate the large
orientation variations of objects within an image. The two designs work
seamlessly in the rotated object detection problem. Moreover, ARC can conveniently
serve as a plug-and-play module in various vision backbones to boost their
representation ability to detect oriented objects accurately. Experiments on
commonly used benchmarks (DOTA and HRSC2016) demonstrate that equipped with our
proposed ARC module in the backbone network, the performance of multiple
popular oriented object detectors is significantly improved (e.g. +3.03% mAP on
Rotated RetinaNet and +4.16% on CFA). Combined with the highly competitive
method Oriented R-CNN, the proposed approach achieves state-of-the-art
performance on the DOTA dataset with 81.77% mAP.
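The core idea of adaptively rotating convolution kernels can be sketched in PyTorch. This is a simplified illustration under stated assumptions, not the authors' implementation: the `rotate_kernel` and `AdaptiveRotatedConv` names are hypothetical, the routing network here predicts a single angle per input (batch size 1 for simplicity), and the paper's conditional computation mechanism is omitted. The kernel is rotated by bilinearly resampling its weights under an affine rotation.

```python
# Hedged sketch of an adaptively rotated convolution: a tiny routing network
# predicts a rotation angle from the input, the kernel is resampled under that
# rotation, and the rotated kernel is applied. Names and the routing design
# are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def rotate_kernel(weight, theta):
    """Rotate a (C_out, C_in, k, k) kernel by angle theta (radians)
    via bilinear resampling about the kernel centre."""
    c_out, c_in, k, _ = weight.shape
    cos, sin = torch.cos(theta), torch.sin(theta)
    mat = torch.stack([
        torch.stack([cos, -sin, torch.zeros_like(cos)]),
        torch.stack([sin,  cos, torch.zeros_like(cos)]),
    ]).unsqueeze(0)                                          # (1, 2, 3)
    grid = F.affine_grid(mat.expand(c_out, 2, 3),
                         (c_out, c_in, k, k), align_corners=False)
    return F.grid_sample(weight, grid, align_corners=False)

class AdaptiveRotatedConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)
        self.router = nn.Sequential(          # predicts one angle per input
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, 1))
        self.k = k

    def forward(self, x):                      # x: (1, C_in, H, W)
        theta = self.router(x).squeeze()       # scalar rotation angle
        w = rotate_kernel(self.weight, theta)  # kernel adapted to the input
        return F.conv2d(x, w, padding=self.k // 2)
```

Because the rotation is applied to the small kernel tensor rather than to the full feature map, the extra cost is minor, which is what makes the module practical as a plug-and-play replacement inside standard backbones.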