Spatiotemporal visual analysis of human actions
In this dissertation we propose four methods for the recognition of human activities. In all four of
them, the representation of the activities is based on spatiotemporal features that are automatically
detected at areas where there is a significant amount of independent motion, that is, motion that is
due to ongoing activities in the scene. We propose the use of spatiotemporal salient points as features
throughout this dissertation. The algorithms presented, however, can be used with any kind of features,
as long as the latter are well localized and have a well-defined area of support in space and time. We
introduce the utilized spatiotemporal salient points in the first method presented in this dissertation.
By extending previous work on spatial saliency, we measure the variations in the information content of
pixel neighborhoods both in space and time, and detect the points at the locations and scales for which
this information content is locally maximized. In this way, an activity is represented as a collection of
spatiotemporal salient points. We propose an iterative linear space-time warping technique in order
to align the representations in space and time and propose to use Relevance Vector Machines (RVM)
in order to classify each example into an action category. The second method enhances the
representations acquired by the first. More specifically,
we propose to track each detected point in time, and create representations based on sets of trajectories,
where each trajectory expresses how the information engulfed by each salient point evolves over time.
In order to deal with imperfect localization of the detected points, we augment the observation model
of the tracker with background information, acquired using a fully automatic background estimation
algorithm. In this way, the tracker favors solutions that contain a large number of foreground pixels.
In addition, we perform experiments where the tracked templates are localized on specific parts of the
body, like the hands and the head, and we further augment the tracker’s observation model using a
human skin color model. Finally, we use a variant of the Longest Common Subsequence algorithm
(LCSS) in order to acquire a similarity measure between the resulting trajectory representations, and
RVMs for classification. In the third method that we propose, we assume that neighboring salient
points follow a similar motion. This is in contrast to the previous method, where each salient point was
tracked independently of its neighbors. More specifically, we propose to extract a novel set of visual
descriptors that are based on geometrical properties of three-dimensional piece-wise polynomials. The
latter are fitted on the spatiotemporal locations of salient points that fall within local spatiotemporal
neighborhoods, and are assumed to follow a similar motion. The extracted descriptors are invariant to
translation and scaling in space-time. This invariance is ensured by coupling the neighborhood dimensions
to the scale at which the corresponding spatiotemporal salient points are detected. The descriptors that are
extracted across the whole dataset are subsequently clustered in order to create a codebook, which is
used in order to represent the overall motion of the subjects within small temporal windows. Finally, we
use boosting in order to select the most discriminative of these windows for each class, and RVMs for
classification. The fourth and last method addresses the joint problem of localization and recognition
of human activities depicted in unsegmented image sequences. Its main contribution is the use of an
implicit representation of the spatiotemporal shape of the activity, which relies on the spatiotemporal
localization of characteristic ensembles of spatiotemporal features. The latter are localized around
automatically detected salient points. Evidence for the spatiotemporal localization of the activity
is accumulated in a probabilistic spatiotemporal voting scheme. During training, we use boosting in
order to create codebooks of characteristic feature ensembles for each class. Subsequently, we construct
class-specific spatiotemporal models, which encode where in space and time each codeword ensemble
appears in the training set. During testing, each activated codeword ensemble casts probabilistic
votes concerning the spatiotemporal localization of the activity, according to the information stored
during training. We use a Mean Shift Mode estimation algorithm in order to extract the most probable
hypotheses from each resulting voting space. Each hypothesis corresponds to a spatiotemporal volume
which potentially engulfs the activity, and is verified by performing action category classification with
an RVM classifier.
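
The trajectory comparison step in the second method can be illustrated with a minimal sketch of a plain Longest Common Subsequence (LCSS) similarity between two 2-D point trajectories. This is not the variant used in the dissertation, whose details the abstract does not give. The function name `lcss_similarity`, the spatial tolerance `eps`, the temporal tolerance `delta`, and the normalization by the shorter trajectory are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def lcss_similarity(a: Sequence[Point], b: Sequence[Point],
                    eps: float = 1.0, delta: int = 2) -> float:
    """Normalized LCSS similarity in [0, 1] between two 2-D trajectories.

    Two samples match when they differ by at most `eps` on each axis
    and their indices differ by at most `delta` (both hypothetical
    parameters, not taken from the source).
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return 0.0
    # dp[i][j] holds the LCSS length of the prefixes a[:i] and b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close_in_space = (abs(a[i - 1][0] - b[j - 1][0]) <= eps
                              and abs(a[i - 1][1] - b[j - 1][1]) <= eps)
            close_in_time = abs(i - j) <= delta
            if close_in_space and close_in_time:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Normalize by the shorter trajectory so identical inputs score 1.0.
    return dp[n][m] / min(n, m)
```

Because LCSS skips samples that do not match instead of penalizing them, a single badly localized point lowers the score only slightly, which is one reason this family of measures suits noisy tracked trajectories.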
A system for learning statistical motion patterns
Analysis of motion patterns is an effective approach for anomaly detection and behavior prediction. Current approaches for the analysis of motion patterns depend on known scenes, where objects move in predefined ways. It is highly desirable to automatically construct object motion patterns which reflect the knowledge of the scene. In this paper, we present a system for automatically learning motion patterns for anomaly detection and behavior prediction based on a proposed algorithm for robustly tracking multiple objects. In the tracking algorithm, foreground pixels are clustered using a fast, accurate fuzzy k-means algorithm. Growing and prediction of the cluster centroids of foreground pixels ensure that each cluster centroid is associated with a moving object in the scene. In the algorithm for learning motion patterns, trajectories are clustered hierarchically using spatial and temporal information, and then each motion pattern is represented with a chain of Gaussian distributions. Based on the learned statistical motion patterns, statistical methods are used to detect anomalies and predict behaviors. Our system is tested using image sequences acquired, respectively, from a crowded real traffic scene and a model traffic scene. Experimental results show the robustness of the tracking algorithm, the efficiency of the algorithm for learning motion patterns, and the encouraging performance of the algorithms for anomaly detection and behavior prediction.
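
The idea of representing a motion pattern as a chain of Gaussian distributions and using it to flag anomalies can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's algorithm: the trajectories of one cluster are assumed to be already resampled to a common length, each chain node uses a diagonal covariance, and the names `Gaussian2D`, `fit_chain`, `is_anomalous`, `min_std`, and `threshold` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Gaussian2D:
    mean: Point
    std: Point  # per-axis standard deviations (diagonal covariance, an assumption)

    def mahalanobis(self, p: Point) -> float:
        """Distance from p to the mean, measured in standard deviations."""
        dx = (p[0] - self.mean[0]) / self.std[0]
        dy = (p[1] - self.mean[1]) / self.std[1]
        return math.hypot(dx, dy)


def fit_chain(trajectories: Sequence[Sequence[Point]],
              min_std: float = 1e-3) -> List[Gaussian2D]:
    """Fit one Gaussian per sample index across equal-length trajectories."""
    length = len(trajectories[0])
    if any(len(t) != length for t in trajectories):
        raise ValueError("trajectories must be resampled to a common length")
    chain = []
    for k in range(length):
        xs = [t[k][0] for t in trajectories]
        ys = [t[k][1] for t in trajectories]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        # Floor the spread so a degenerate axis does not divide by zero.
        sx = max(math.sqrt(sum((x - mx) ** 2 for x in xs) / len(xs)), min_std)
        sy = max(math.sqrt(sum((y - my) ** 2 for y in ys) / len(ys)), min_std)
        chain.append(Gaussian2D((mx, my), (sx, sy)))
    return chain


def is_anomalous(trajectory: Sequence[Point], chain: Sequence[Gaussian2D],
                 threshold: float = 3.0) -> bool:
    """Flag a trajectory if any point lies beyond `threshold` of every node."""
    for p in trajectory:
        if min(g.mahalanobis(p) for g in chain) > threshold:
            return True
    return False
```

Checking each point against its nearest node keeps the test independent of object speed; a stricter variant could require the matched nodes to be visited in order.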