UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking
In recent years, numerous effective multi-object tracking (MOT) methods have been
developed because of the wide range of applications. Existing performance
evaluations of MOT methods usually separate the object tracking step from the
object detection step by using the same fixed object detection results for
comparisons. In this work, we perform a comprehensive quantitative study on the
effects of object detection accuracy on overall MOT performance, using the
new large-scale University at Albany DETection and tRACking (UA-DETRAC)
benchmark dataset. The UA-DETRAC benchmark dataset consists of 100 challenging
video sequences captured from real-world traffic scenes (over 140,000 frames
with rich annotations, including occlusion, weather, vehicle category,
truncation, and vehicle bounding boxes) for object detection, object tracking
and MOT systems. We evaluate complete MOT systems constructed from combinations
of state-of-the-art object detection and object tracking methods. Our analysis
shows the complex effects of object detection accuracy on MOT system
performance. Based on these observations, we propose new evaluation tools and
metrics for MOT systems that consider both object detection and object tracking
for comprehensive analysis. Comment: 18 pages, 11 figures, accepted by CVI
Relation Networks for Object Detection
Although it has been widely believed for years that modeling relations between
objects would help object recognition, there has been no evidence that the
idea works in the deep learning era. All state-of-the-art object detection
systems still rely on recognizing object instances individually, without
exploiting their relations during learning.
This work proposes an object relation module. It processes a set of objects
simultaneously through interaction between their appearance feature and
geometry, thus allowing modeling of their relations. It is lightweight and
in-place. It does not require additional supervision and is easy to embed in
existing networks. It is shown to be effective in improving the object recognition and
duplicate removal steps in the modern object detection pipeline. It verifies
the efficacy of modeling object relations in CNN based detection. It gives rise
to the first fully end-to-end object detector
Object Detection in Videos with Tubelet Proposal Networks
Object detection in videos has drawn increasing attention recently with the
introduction of the large-scale ImageNet VID dataset. Different from object
detection in static images, temporal information in videos is vital for object
detection. To fully utilize temporal information, state-of-the-art methods are
based on spatiotemporal tubelets, which are essentially sequences of associated
bounding boxes across time. However, the existing methods have major
limitations in generating tubelets in terms of quality and efficiency.
Motion-based methods are able to obtain dense tubelets efficiently, but the
lengths are generally only several frames, which is not optimal for
incorporating long-term temporal information. Appearance-based methods, usually
involving generic object tracking, could generate long tubelets, but are
usually computationally expensive. In this work, we propose a framework for
object detection in videos, which consists of a novel tubelet proposal network
to efficiently generate spatiotemporal proposals, and a Long Short-term Memory
(LSTM) network that incorporates temporal information from tubelet proposals
for achieving high object detection accuracy in videos. Experiments on the
large-scale ImageNet VID dataset demonstrate the effectiveness of the proposed
framework for object detection in videos. Comment: CVPR 201
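
The abstract above describes a tubelet as a sequence of associated bounding boxes across time. The following is a minimal sketch of that data structure only, assuming boxes in (x1, y1, x2, y2) format and a simple greedy IoU rule for associating boxes between consecutive frames. The names `Tubelet`, `iou`, `link_tubelet`, and the `min_iou` threshold are hypothetical and chosen for illustration; this is not the tubelet proposal network or the LSTM proposed in the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Tubelet:
    """A sequence of associated boxes, one per consecutive frame."""
    start_frame: int
    boxes: List[Box] = field(default_factory=list)


def link_tubelet(start_box: Box,
                 later_frames: List[List[Box]],
                 min_iou: float = 0.5) -> Tubelet:
    """Greedily extend a tubelet through the following frames.

    In each frame, pick the candidate box that overlaps most with the
    tubelet's last box; stop when no candidate reaches `min_iou`.
    """
    tubelet = Tubelet(start_frame=0, boxes=[start_box])
    for candidates in later_frames:
        if not candidates:
            break
        best = max(candidates, key=lambda c: iou(tubelet.boxes[-1], c))
        if iou(tubelet.boxes[-1], best) < min_iou:
            break
        tubelet.boxes.append(best)
    return tubelet
```

With per-frame candidate boxes in hand, `link_tubelet(first_box, frames)` returns a `Tubelet` whose `boxes` list grows by one entry for each frame in which a sufficiently overlapping box is found. A greedy rule like this tends to produce short tubelets when the object moves quickly or is occluded, which is the kind of limitation the abstract attributes to existing methods.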
Deep Regionlets for Object Detection
In this paper, we propose a novel object detection framework named "Deep
Regionlets" by establishing a bridge between deep neural networks and
conventional detection schema for accurate generic object detection. Motivated
by the abilities of regionlets for modeling object deformation and multiple
aspect ratios, we incorporate regionlets into an end-to-end trainable deep
learning framework. The deep regionlets framework consists of a region
selection network and a deep regionlet learning module. Specifically, given a
detection bounding box proposal, the region selection network provides guidance
on where to select regions to learn the features from. The regionlet learning
module focuses on local feature selection and transformation to alleviate local
variations. To this end, we first realize non-rectangular region selection
within the detection framework to accommodate variations in object appearance.
Moreover, we design a "gating network" within the regionlet learning module to
enable soft regionlet selection and pooling. The Deep Regionlets framework is
trained end-to-end without additional efforts. We perform ablation studies and
conduct extensive experiments on the PASCAL VOC and Microsoft COCO datasets.
The proposed framework outperforms state-of-the-art algorithms, such as
RetinaNet and Mask R-CNN, even without additional segmentation labels. Comment: Accepted to ECCV 201