Thesis (PhD) - Indiana University, Computer Sciences, 2007

In this thesis we present a system for automatic human tracking and activity recognition from
video sequences. The problem of automated analysis of visual information in order to derive descriptors
of high-level human activities has intrigued the computer vision community for decades and is
considered to be largely unsolved. Part of this interest derives from the vast range of applications
in which such a solution may be useful. We attempt to find efficient formulations of these tasks
as applied to extracting customer behavior information in a retail marketing context. Based on
these formulations, we present a system that visually tracks customers in a retail store and performs
a number of activity analysis tasks based on the output from the tracker.
In tracking, we introduce new techniques for pedestrian detection and initialization of the body
model, as well as a formulation of temporal tracking as a global trans-dimensional optimization problem.
Initial human detection is addressed by a novel method for head detection, which incorporates
knowledge of the camera projection model. The initialization of the human body model is addressed
by newly developed shape and appearance descriptors. Temporal tracking of customer
trajectories is performed by employing a human body tracking system designed as a Bayesian
jump-diffusion filter. This approach demonstrates the ability to overcome model dimensionality
ambiguities as people leave and enter the scene.
Following the tracking, we develop a two-stage group activity formulation based upon
ideas from swarming research. For modeling purposes, all moving actors in the scene are viewed here as simplistic agents in a swarm. This allows us to define a set of inter-agent interactions,
which are combined into a distance metric used in subsequent swarm clustering. In the
first stage, shoppers that belong to the same group are identified by deterministically clustering
bodies to detect short-term events; in the second stage, these events are post-processed to form clusters
of group activities with fuzzy memberships.
Quantitative analysis of the tracking subsystem shows an improvement over
state-of-the-art methods when used under similar conditions. Finally, based on the output from the tracker, the
activity recognition procedure achieves over 80% correct shopper group detection, as validated against
human-generated ground truth.