2,407 research outputs found
Robust Multi-Person Tracking from Moving Platforms
In this paper, we address the problem of multi-person tracking in busy pedestrian
zones, using a stereo rig mounted on a mobile platform. The
complexity of the problem calls for an integrated solution, which
extracts as much visual information as possible and combines it
through cognitive feedback. We propose such an approach, which
jointly estimates camera position, stereo depth, object detection,
and tracking. We model the interplay between these components
using a graphical model. Since the model has to
incorporate object-object interactions, and temporal links to past
frames, direct inference is intractable. We therefore propose a two-stage
procedure: for each frame we first solve a simplified version of the
model (disregarding interactions and temporal continuity) to
estimate the scene geometry and an overcomplete set of object
detections. Conditioned on these results, we then address object
interactions, tracking, and prediction in a second step. The
approach is experimentally evaluated on several long and difficult
video sequences from busy inner-city locations. Our results show
that the proposed integration makes it possible to deliver stable
tracking performance in scenes of realistic complexity.
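The two-stage procedure described above (per-frame detection first, then association conditioned on it) can be illustrated with a much-simplified sketch. The IoU-based greedy matcher below is an assumption for illustration only, not the paper's graphical-model inference; all function names are hypothetical.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, thresh=0.3):
    """Second stage: greedily link the current frame's (overcomplete)
    detections to existing tracks, one detection per track at most."""
    assignments, used = {}, set()
    for tid, box in tracks.items():
        best, best_iou = None, thresh
        for j, det in enumerate(detections):
            if j in used:
                continue
            score = iou(box, det)
            if score > best_iou:
                best, best_iou = j, score
        if best is not None:
            assignments[tid] = best
            used.add(best)
    return assignments
```

In the paper proper, this association step additionally reasons about object-object interactions and temporal links to past frames, which is precisely what makes direct joint inference intractable and motivates the two-stage split.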
Large Scale Real-World Multi-Person Tracking
This paper presents a new large scale multi-person tracking dataset --
\texttt{PersonPath22}, which is over an order of magnitude larger than
currently available high-quality multi-object tracking datasets such as MOT17,
HiEve, and MOT20. The lack of large scale training and test data for
this task has limited the community's ability to understand the performance of
their tracking systems on a wide range of scenarios and conditions such as
variations in person density, actions being performed, weather, and time of
day. The \texttt{PersonPath22} dataset was specifically sourced to provide a wide
variety of these conditions and our annotations include rich meta-data such
that the performance of a tracker can be evaluated along these different
dimensions. The lack of training data has also limited the ability to perform
end-to-end training of tracking systems. As such, the highest performing
tracking systems all rely on strong detectors trained on external image
datasets. We hope that the release of this dataset will enable new lines of
research that take advantage of large scale video based training data.
Comment: ECCV 202
Person tracking with non-overlapping multiple cameras
Monitoring and tracking of targets is an important task in any surveillance system. When the targets are people, the problem becomes one of person identification and tracking. At present, a large-scale smart video surveillance system is an essential component of any commercial or public campus. Since the field of view (FOV) of a single camera is limited, monitoring a large area requires multiple cameras at different locations. This paper proposes a novel model for tracking a person across multiple non-overlapping cameras. It builds a reference signature of the person at the start of tracking and matches it against subsequent signatures captured by other cameras within the specified area of observation, using a support vector machine (SVM) trained between each pair of cameras. For the experiments, the wide area re-identification dataset (WARD) and a real-time scenario have been used, with color, shape, and texture features for person re-identification.
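The signature-matching idea above can be sketched minimally. In place of the paper's trained SVM, the stand-in below scores candidates by cosine similarity between color-histogram signatures; the feature choice and all function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def color_signature(image, bins=8):
    """Per-channel color histogram as a crude person signature.
    `image` is an H x W x 3 uint8 array (one camera's person crop)."""
    hists = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    sig = np.concatenate(hists).astype(float)
    return sig / (sig.sum() + 1e-9)  # normalise to be crop-size invariant

def match_signature(reference, candidates):
    """Return the index of the candidate signature closest to the
    reference signature under cosine similarity (stand-in for the
    per-camera-pair SVM decision in the paper)."""
    sims = [np.dot(reference, c) /
            (np.linalg.norm(reference) * np.linalg.norm(c) + 1e-9)
            for c in candidates]
    return int(np.argmax(sims))
```

A real system would replace the color histogram with the paper's combined color, shape, and texture features, and learn the matching function per camera pair to absorb illumination and viewpoint changes between cameras.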
Unsupervised Multiple Person Tracking using AutoEncoder-Based Lifted Multicuts
Multiple Object Tracking (MOT) is a long-standing task in computer vision.
Current approaches based on the tracking-by-detection paradigm either require
some sort of domain knowledge or supervision to associate data correctly into
tracks. In this work, we present an unsupervised multiple object tracking
approach based on visual features and minimum cost lifted multicuts. Our method
is based on straightforward spatio-temporal cues that can be extracted from
neighboring frames in an image sequence without supervision. Clustering based
on these cues enables us to learn the required appearance invariances for the
tracking task at hand and train an autoencoder to generate suitable latent
representations. Thus, the resulting latent representations can serve as robust
appearance cues for tracking even over large temporal distances where no
reliable spatio-temporal features could be extracted. We show that, despite
being trained without using the provided annotations, our model provides
competitive results on the challenging MOT Benchmark for pedestrian tracking.
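The core idea, grouping detections by similarity of their learned appearance embeddings, can be sketched with a greedy clustering pass. This is a deliberately simplified stand-in for minimum cost lifted multicuts (which solve a global graph partitioning problem), and the function name and threshold are assumptions for illustration.

```python
import numpy as np

def cluster_by_appearance(latents, sim_thresh=0.9):
    """Greedily group detection embeddings into identities: each detection
    joins the first earlier cluster whose member it resembles, else it
    starts a new cluster. `latents` are assumed L2-normalised autoencoder
    embeddings, one per detection (possibly far apart in time)."""
    labels = [-1] * len(latents)
    next_label = 0
    for i, z in enumerate(latents):
        for j in range(i):
            if float(np.dot(z, latents[j])) >= sim_thresh:
                labels[i] = labels[j]  # same identity as detection j
                break
        if labels[i] == -1:
            labels[i] = next_label  # unseen appearance: new identity
            next_label += 1
    return labels
```

Because the decision relies only on appearance similarity, such a grouping can, as the abstract notes, link detections across large temporal gaps where frame-to-frame spatio-temporal cues are unavailable; the lifted multicut formulation additionally enforces global consistency of these pairwise decisions.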