9,451 research outputs found
Tracking people in crowds by a part matching approach
The major difficulty in human tracking is the problem raised by challenging occlusions where the target person is repeatedly and extensively occluded by either the background or another moving object. These types of occlusions may cause significant changes in the person's shape, appearance or motion, thus making the data association problem extremely difficult to solve. Unlike most of the existing methods for human tracking that handle occlusions by data association of the complete human body, in this paper we propose a method that tracks people under challenging spatial occlusions based on body part tracking. The human model we propose consists of five body parts with six degrees of freedom and each part is represented by a rich set of features. The tracking is solved using a layered data association approach, direct comparison between features (feature layer) and subsequently matching between parts of the same bodies (part layer) lead to a final decision for the global match (global layer). Experimental results have confirmed the effectiveness of the proposed method. © 2008 IEEE
Class-Agnostic Counting
Nearly all existing counting methods are designed for a specific object
class. Our work, however, aims to create a counting model able to count any
class of object. To achieve this goal, we formulate counting as a matching
problem, enabling us to exploit the image self-similarity property that
naturally exists in object counting problems. We make the following three
contributions: first, a Generic Matching Network (GMN) architecture that can
potentially count any object in a class-agnostic manner; second, by
reformulating the counting problem as one of matching objects, we can take
advantage of the abundance of video data labeled for tracking, which contains
natural repetitions suitable for training a counting model. Such data enables
us to train the GMN. Third, to customize the GMN to different user
requirements, an adapter module is used to specialize the model with minimal
effort, i.e. using a few labeled examples, and adapting only a small fraction
of the trained parameters. This is a form of few-shot learning, which is
practical for domains where labels are limited due to requiring expert
knowledge (e.g. microbiology). We demonstrate the flexibility of our method on
a diverse set of existing counting benchmarks: specifically cells, cars, and
human crowds. The model achieves competitive performance on cell and crowd
counting datasets, and surpasses the state-of-the-art on the car dataset using
only three training images. When training on the entire dataset, the proposed
method outperforms all previous methods by a large margin.Comment: Asian Conference on Computer Vision (ACCV), 201
PoseTrack: A Benchmark for Human Pose Estimation and Tracking
Human poses and motions are important cues for analysis of videos with people
and there is strong evidence that representations based on body pose are highly
effective for a variety of tasks such as activity recognition, content
retrieval and social signal processing. In this work, we aim to further advance
the state of the art by establishing "PoseTrack", a new large-scale benchmark
for video-based human pose estimation and articulated tracking, and bringing
together the community of researchers working on visual human analysis. The
benchmark encompasses three competition tracks focusing on i) single-frame
multi-person pose estimation, ii) multi-person pose estimation in videos, and
iii) multi-person articulated tracking. To facilitate the benchmark and
challenge we collect, annotate and release a new %large-scale benchmark dataset
that features videos with multiple people labeled with person tracks and
articulated pose. A centralized evaluation server is provided to allow
participants to evaluate on a held-out test set. We envision that the proposed
benchmark will stimulate productive research both by providing a large and
representative training dataset as well as providing a platform to objectively
evaluate and compare the proposed methods. The benchmark is freely accessible
at https://posetrack.net.Comment: www.posetrack.ne
Human Motion Trajectory Prediction: A Survey
With growing numbers of intelligent autonomous systems in human environments,
the ability of such systems to perceive, understand and anticipate human
behavior becomes increasingly important. Specifically, predicting future
positions of dynamic agents and planning considering such predictions are key
tasks for self-driving vehicles, service robots and advanced surveillance
systems. This paper provides a survey of human motion trajectory prediction. We
review, analyze and structure a large selection of work from different
communities and propose a taxonomy that categorizes existing methods based on
the motion modeling approach and level of contextual information used. We
provide an overview of the existing datasets and performance metrics. We
discuss limitations of the state of the art and outline directions for further
research.Comment: Submitted to the International Journal of Robotics Research (IJRR),
37 page
F-formation Detection: Individuating Free-standing Conversational Groups in Images
Detection of groups of interacting people is a very interesting and useful
task in many modern technologies, with application fields spanning from
video-surveillance to social robotics. In this paper we first furnish a
rigorous definition of group considering the background of the social sciences:
this allows us to specify many kinds of group, so far neglected in the Computer
Vision literature. On top of this taxonomy, we present a detailed state of the
art on the group detection algorithms. Then, as a main contribution, we present
a brand new method for the automatic detection of groups in still images, which
is based on a graph-cuts framework for clustering individuals; in particular we
are able to codify in a computational sense the sociological definition of
F-formation, that is very useful to encode a group having only proxemic
information: position and orientation of people. We call the proposed method
Graph-Cuts for F-formation (GCFF). We show how GCFF definitely outperforms all
the state of the art methods in terms of different accuracy measures (some of
them are brand new), demonstrating also a strong robustness to noise and
versatility in recognizing groups of various cardinality.Comment: 32 pages, submitted to PLOS On
Hybrid Focal Stereo Networks for Pattern Analysis in Homogeneous Scenes
In this paper we address the problem of multiple camera calibration in the
presence of a homogeneous scene, and without the possibility of employing
calibration object based methods. The proposed solution exploits salient
features present in a larger field of view, but instead of employing active
vision we replace the cameras with stereo rigs featuring a long focal analysis
camera, as well as a short focal registration camera. Thus, we are able to
propose an accurate solution which does not require intrinsic variation models
as in the case of zooming cameras. Moreover, the availability of the two views
simultaneously in each rig allows for pose re-estimation between rigs as often
as necessary. The algorithm has been successfully validated in an indoor
setting, as well as on a difficult scene featuring a highly dense pilgrim crowd
in Makkah.Comment: 13 pages, 6 figures, submitted to Machine Vision and Application
Pedestrian Flow Simulation Validation and Verification Techniques
For the verification and validation of microscopic simulation models of
pedestrian flow, we have performed experiments for different kind of facilities
and sites where most conflicts and congestion happens e.g. corridors, narrow
passages, and crosswalks. The validity of the model should compare the
experimental conditions and simulation results with video recording carried out
in the same condition like in real life e.g. pedestrian flux and density
distributions. The strategy in this technique is to achieve a certain amount of
accuracy required in the simulation model. This method is good at detecting the
critical points in the pedestrians walking areas. For the calibration of
suitable models we use the results obtained from analyzing the video recordings
in Hajj 2009 and these results can be used to check the design sections of
pedestrian facilities and exits. As practical examples, we present the
simulation of pilgrim streams on the Jamarat bridge.
The objectives of this study are twofold: first, to show through verification
and validation that simulation tools can be used to reproduce realistic
scenarios, and second, gather data for accurate predictions for designers and
decision makers.Comment: 19 pages, 10 figure
Markerless Motion Capture in the Crowd
This work uses crowdsourcing to obtain motion capture data from video
recordings. The data is obtained by information workers who click repeatedly to
indicate body configurations in the frames of a video, resulting in a model of
2D structure over time. We discuss techniques to optimize the tracking task and
strategies for maximizing accuracy and efficiency. We show visualizations of a
variety of motions captured with our pipeline then apply reconstruction
techniques to derive 3D structure.Comment: Presented at Collective Intelligence conference, 2012
(arXiv:1204.2991
LCrowdV: Generating Labeled Videos for Simulation-based Crowd Behavior Learning
We present a novel procedural framework to generate an arbitrary number of
labeled crowd videos (LCrowdV). The resulting crowd video datasets are used to
design accurate algorithms or training models for crowded scene understanding.
Our overall approach is composed of two components: a procedural simulation
framework for generating crowd movements and behaviors, and a procedural
rendering framework to generate different videos or images. Each video or image
is automatically labeled based on the environment, number of pedestrians,
density, behavior, flow, lighting conditions, viewpoint, noise, etc.
Furthermore, we can increase the realism by combining synthetically-generated
behaviors with real-world background videos. We demonstrate the benefits of
LCrowdV over prior lableled crowd datasets by improving the accuracy of
pedestrian detection and crowd behavior classification algorithms. LCrowdV
would be released on the WWW
- …