Detecting events and key actors in multi-person videos
Multi-person event recognition is a challenging task, often with many people
active in the scene but only a small subset contributing to an actual event. In
this paper, we propose a model which learns to detect events in such videos
while automatically "attending" to the people responsible for the event. Our
model does not use explicit annotations regarding who or where those people are
during training and testing. In particular, we track people in videos and use a
recurrent neural network (RNN) to represent the track features. We learn
time-varying attention weights to combine these features at each time-instant.
The attended features are then processed using another RNN for event
detection/classification. Since most video datasets with multiple people are
restricted to a small number of videos, we also collected a new basketball
dataset comprising 257 basketball games with 14K event annotations
corresponding to 11 event classes. Our model outperforms state-of-the-art
methods for both event classification and detection on this new dataset.
Additionally, we show that the attention mechanism consistently
localizes the relevant players.
Comment: Accepted for publication in CVPR'1
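The time-varying attention over player tracks can be sketched as follows. This is a minimal NumPy illustration, not the paper's architecture: the dimensions, the single learned projection vector `w_att`, and the random stand-in features are all assumptions; in the actual model the track features come from an RNN and the attended features feed a second RNN.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical dimensions: T timesteps, P player tracks, D-dim track features.
T, P, D = 8, 5, 16
rng = np.random.default_rng(0)
track_feats = rng.normal(size=(T, P, D))  # stand-in for per-track RNN outputs
w_att = rng.normal(size=D)                # stand-in for a learned attention projection

attended = np.zeros((T, D))
for t in range(T):
    scores = track_feats[t] @ w_att       # one scalar score per player track
    alpha = softmax(scores)               # time-varying attention weights (sum to 1)
    attended[t] = alpha @ track_feats[t]  # weighted combination, fed to the event RNN

print(attended.shape)  # (8, 16)
```

Because the weights are a softmax over tracks, inspecting `alpha` at each timestep indicates which players the model is "attending" to, which is how the localization claim above can be probed without person-level annotations.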
Parallel Residual Bi-Fusion Feature Pyramid Network for Accurate Single-Shot Object Detection
We propose the Parallel Residual Bi-Fusion Feature Pyramid Network (PRB-FPN)
for fast and accurate single-shot object detection. Feature Pyramid (FP) is
widely used in recent visual detection; however, the top-down pathway of FP
cannot preserve accurate localization due to pooling shifting. The advantage of
FP is weakened as deeper backbones with more layers are used. To address this
issue, we propose a new parallel FP structure with bi-directional (top-down and
bottom-up) fusion and associated improvements to retain high-quality features
for accurate localization. Our method is particularly suitable for detecting
small objects. We provide the following design improvements: (1) A parallel
bifusion FP structure with a Bottom-up Fusion Module (BFM) to detect both small
and large objects at once with high accuracy. (2) A COncatenation and
RE-organization (CORE) module provides a bottom-up pathway for feature fusion,
which leads to the bi-directional fusion FP that can recover lost information
from lower-layer feature maps. (3) The CORE feature is further purified to
retain richer contextual information. Such purification is performed with CORE
in a few iterations in both top-down and bottom-up pathways. (4) The adding of
a residual design to CORE leads to a new Re-CORE module that enables easy
training and integration with a wide range of (deeper or lighter) backbones.
The proposed network achieves state-of-the-art performance on the UAVDT17 and MS
COCO datasets.
Comment: Accepted by IEEE Transactions on Image Processing
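The bi-directional (top-down plus bottom-up) fusion idea can be sketched schematically. This toy NumPy example is an assumption-laden simplification: real FPNs fuse 2-D convolutional feature maps with learned lateral convolutions, whereas here 1-D arrays, nearest-neighbour resampling, and plain addition stand in for those operations, and the bottom-up pass only gestures at what the CORE module does.

```python
import numpy as np

# Toy pyramid: three levels with spatial sizes 8, 4, 2 and C channels each.
rng = np.random.default_rng(0)
C = 4
pyramid = [rng.normal(size=(8, C)), rng.normal(size=(4, C)), rng.normal(size=(2, C))]

def upsample(x):    # nearest-neighbour 2x upsampling (stand-in for interpolation)
    return np.repeat(x, 2, axis=0)

def downsample(x):  # stride-2 subsampling (stand-in for strided convolution)
    return x[::2]

# Top-down pass (the standard FP pathway): push coarse semantics downward.
td = [None] * 3
td[2] = pyramid[2]
for i in (1, 0):
    td[i] = pyramid[i] + upsample(td[i + 1])

# Bottom-up pass (CORE-style, schematically): push fine localization detail
# from lower-layer maps back up, recovering information the top-down pass lost.
bu = [None] * 3
bu[0] = td[0]
for i in (1, 2):
    bu[i] = td[i] + downsample(bu[i - 1])

print([f.shape for f in bu])  # [(8, 4), (4, 4), (2, 4)]
```

The point of the second pass is visible in the structure: every output level now mixes in features that originated at the finest level, which is why such designs help with small objects.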
Single-Shot Two-Pronged Detector with Rectified IoU Loss
In CNN-based object detectors, feature pyramids are widely exploited to
alleviate the problem of scale variation across object instances. These object
detectors, which strengthen features via a top-down pathway and lateral
connections, mainly enrich the semantic information of low-level
features but neglect the enhancement of high-level features. This can lead to
an imbalance between different levels of features, in particular a serious lack
of detailed information in the high-level features, which makes it difficult to
get accurate bounding boxes. In this paper, we introduce a novel two-pronged
transductive idea to explore the relationship among different layers in both
backward and forward directions, which can enrich the semantic information of
low-level features and detailed information of high-level features at the same
time. Under the guidance of the two-pronged idea, we propose a Two-Pronged
Network (TPNet) to achieve bidirectional transfer between high-level features
and low-level features, which is useful for accurately detecting objects at
different scales. Furthermore, due to the distribution imbalance between the
hard and easy samples in single-stage detectors, the gradient of localization
loss is always dominated by the hard examples that have poor localization
accuracy. This biases the model toward the hard samples. Therefore,
in our TPNet, an adaptive IoU based localization loss, named Rectified IoU
(RIoU) loss, is proposed to rectify the gradients of each kind of samples. The
Rectified IoU loss increases the gradients of examples with high IoU while
suppressing the gradients of examples with low IoU, which can improve the
overall localization accuracy of the model. Extensive experiments demonstrate the
superiority of our TPNet and RIoU loss.
Comment: Accepted by ACM MM 202
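The gradient-rectification idea behind an IoU-based loss can be illustrated as follows. This is NOT the paper's exact RIoU formula; it is a hypothetical reweighting, `iou**alpha * (1 - iou)`, chosen only because its gradient magnitude grows with IoU over the high-IoU range, matching the described behaviour of boosting well-localized examples and suppressing poorly localized ones.

```python
import numpy as np

def iou_1d(a, b):
    # IoU of two 1-D intervals (x1, x2); 1-D keeps the example self-contained.
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union

def rectified_iou_loss(iou, alpha=2.0):
    # Illustrative rectification (an assumption, not the published formula):
    # weighting the standard IoU loss (1 - iou) by iou**alpha makes the
    # gradient w.r.t. iou larger for high-IoU samples than for low-IoU ones,
    # so accurate boxes dominate the localization gradient.
    return (iou ** alpha) * (1.0 - iou)

pred, gt = (0.0, 2.0), (0.5, 2.0)
iou = iou_1d(pred, gt)          # 0.75 for these intervals
print(iou, rectified_iou_loss(iou))
```

A quick numerical check of the derivative confirms the rectification: the gradient magnitude of this loss at IoU = 0.9 exceeds that at IoU = 0.2, the opposite of the plain (1 - iou) loss, whose gradient is constant regardless of localization quality.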