9,500 research outputs found
Detect to Track and Track to Detect
Recent approaches for high accuracy detection and tracking of object
categories in video consist of complex multistage solutions that become more
cumbersome each year. In this paper we propose a ConvNet architecture that
jointly performs detection and tracking, solving the task in a simple and
effective way. Our contributions are threefold: (i) we set up a ConvNet
architecture for simultaneous detection and tracking, using a multi-task
objective for frame-based object detection and across-frame track regression;
(ii) we introduce correlation features that represent object co-occurrences
across time to aid the ConvNet during tracking; and (iii) we link the frame
level detections based on our across-frame tracklets to produce high accuracy
detections at the video level. Our ConvNet architecture for spatiotemporal
object detection is evaluated on the large-scale ImageNet VID dataset where it
achieves state-of-the-art results. Our approach provides better single model
performance than the winning method of the last ImageNet challenge while being
conceptually much simpler. Finally, we show that by increasing the temporal
stride we can dramatically increase the tracker speed.Comment: ICCV 2017. Code and models:
https://github.com/feichtenhofer/Detect-Track Results:
https://www.robots.ox.ac.uk/~vgg/research/detect-track
Real-time Multiple People Tracking with Deeply Learned Candidate Selection and Person Re-Identification
Online multi-object tracking is a fundamental problem in time-critical video
analysis applications. A major challenge in the popular tracking-by-detection
framework is how to associate unreliable detection results with existing
tracks. In this paper, we propose to handle unreliable detection by collecting
candidates from outputs of both detection and tracking. The intuition behind
generating redundant candidates is that detection and tracks can complement
each other in different scenarios. Detection results of high confidence prevent
tracking drifts in the long term, and predictions of tracks can handle noisy
detection caused by occlusion. In order to apply optimal selection from a
considerable amount of candidates in real-time, we present a novel scoring
function based on a fully convolutional neural network, that shares most
computations on the entire image. Moreover, we adopt a deeply learned
appearance representation, which is trained on large-scale person
re-identification datasets, to improve the identification ability of our
tracker. Extensive experiments show that our tracker achieves real-time and
state-of-the-art performance on a widely used people tracking benchmark.Comment: ICME 201
Box-level Segmentation Supervised Deep Neural Networks for Accurate and Real-time Multispectral Pedestrian Detection
Effective fusion of complementary information captured by multi-modal sensors
(visible and infrared cameras) enables robust pedestrian detection under
various surveillance situations (e.g. daytime and nighttime). In this paper, we
present a novel box-level segmentation supervised learning framework for
accurate and real-time multispectral pedestrian detection by incorporating
features extracted in visible and infrared channels. Specifically, our method
takes pairs of aligned visible and infrared images with easily obtained
bounding box annotations as input and estimates accurate prediction maps to
highlight the existence of pedestrians. It offers two major advantages over the
existing anchor box based multispectral detection methods. Firstly, it
overcomes the hyperparameter setting problem occurred during the training phase
of anchor box based detectors and can obtain more accurate detection results,
especially for small and occluded pedestrian instances. Secondly, it is capable
of generating accurate detection results using small-size input images, leading
to improvement of computational efficiency for real-time autonomous driving
applications. Experimental results on KAIST multispectral dataset show that our
proposed method outperforms state-of-the-art approaches in terms of both
accuracy and speed
Towards High Performance Video Object Detection
There has been significant progresses for image object detection in recent
years. Nevertheless, video object detection has received little attention,
although it is more challenging and more important in practical scenarios.
Built upon the recent works, this work proposes a unified approach based on
the principle of multi-frame end-to-end learning of features and cross-frame
motion. Our approach extends prior works with three new techniques and steadily
pushes forward the performance envelope (speed-accuracy tradeoff), towards high
performance video object detection
- …