12,034 research outputs found
Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking
The most common paradigm for vision-based multi-object tracking is
tracking-by-detection, due to the availability of reliable detectors for
several important object categories such as cars and pedestrians. However,
future mobile systems will need a capability to cope with rich human-made
environments, in which obtaining detectors for every possible object category
would be infeasible. In this paper, we propose a model-free multi-object
tracking approach that uses a category-agnostic image segmentation method to
track objects. We present an efficient segmentation mask-based tracker which
associates pixel-precise masks reported by the segmentation. Our approach can
utilize semantic information whenever it is available for classifying objects
at the track level, while retaining the capability to track generic unknown
objects in the absence of such information. We demonstrate experimentally that
our approach achieves performance comparable to state-of-the-art
tracking-by-detection methods for popular object categories such as cars and
pedestrians. Additionally, we show that the proposed method can discover and
robustly track a large variety of other objects.Comment: ICRA'18 submissio
Detect-and-Track: Efficient Pose Estimation in Videos
This paper addresses the problem of estimating and tracking human body
keypoints in complex, multi-person video. We propose an extremely lightweight
yet highly effective approach that builds upon the latest advancements in human
detection and video understanding. Our method operates in two-stages: keypoint
estimation in frames or short clips, followed by lightweight tracking to
generate keypoint predictions linked over the entire video. For frame-level
pose estimation we experiment with Mask R-CNN, as well as our own proposed 3D
extension of this model, which leverages temporal information over small clips
to generate more robust frame predictions. We conduct extensive ablative
experiments on the newly released multi-person video pose estimation benchmark,
PoseTrack, to validate various design choices of our model. Our approach
achieves an accuracy of 55.2% on the validation and 51.8% on the test set using
the Multi-Object Tracking Accuracy (MOTA) metric, and achieves state of the art
performance on the ICCV 2017 PoseTrack keypoint tracking challenge.Comment: In CVPR 2018. Ranked first in ICCV 2017 PoseTrack challenge (keypoint
tracking in videos). Code: https://github.com/facebookresearch/DetectAndTrack
and webpage: https://rohitgirdhar.github.io/DetectAndTrack
LCrowdV: Generating Labeled Videos for Simulation-based Crowd Behavior Learning
We present a novel procedural framework to generate an arbitrary number of
labeled crowd videos (LCrowdV). The resulting crowd video datasets are used to
design accurate algorithms or training models for crowded scene understanding.
Our overall approach is composed of two components: a procedural simulation
framework for generating crowd movements and behaviors, and a procedural
rendering framework to generate different videos or images. Each video or image
is automatically labeled based on the environment, number of pedestrians,
density, behavior, flow, lighting conditions, viewpoint, noise, etc.
Furthermore, we can increase the realism by combining synthetically-generated
behaviors with real-world background videos. We demonstrate the benefits of
LCrowdV over prior lableled crowd datasets by improving the accuracy of
pedestrian detection and crowd behavior classification algorithms. LCrowdV
would be released on the WWW
leave a trace - A People Tracking System Meets Anomaly Detection
Video surveillance always had a negative connotation, among others because of
the loss of privacy and because it may not automatically increase public
safety. If it was able to detect atypical (i.e. dangerous) situations in real
time, autonomously and anonymously, this could change. A prerequisite for this
is a reliable automatic detection of possibly dangerous situations from video
data. This is done classically by object extraction and tracking. From the
derived trajectories, we then want to determine dangerous situations by
detecting atypical trajectories. However, due to ethical considerations it is
better to develop such a system on data without people being threatened or even
harmed, plus with having them know that there is such a tracking system
installed. Another important point is that these situations do not occur very
often in real, public CCTV areas and may be captured properly even less. In the
artistic project leave a trace the tracked objects, people in an atrium of a
institutional building, become actor and thus part of the installation.
Visualisation in real-time allows interaction by these actors, which in turn
creates many atypical interaction situations on which we can develop our
situation detection. The data set has evolved over three years and hence, is
huge. In this article we describe the tracking system and several approaches
for the detection of atypical trajectories
- …