Human Motion Analysis for Efficient Action Recognition
Automatic understanding of human actions is at the core of several application domains, such as content-based indexing, human-computer interaction, surveillance, and sports video analysis. Recent advances in digital platforms and the exponential growth of video and image data have created an urgent need for intelligent frameworks that automatically analyze human motion and predict the corresponding actions from visual data and sensor signals. This thesis presents a collection of methods that target human action recognition using different action modalities. The first method uses the appearance modality and classifies human actions based on heterogeneous global- and local-based features of scene and human-body appearances. The second method harnesses 2D and 3D articulated human poses and analyzes the body motion using a discriminative combination of the parts’ velocities, locations, and correlations histograms for action recognition. The third method presents an optimal scheme for combining the probabilistic predictions from different action modalities by solving a constrained quadratic optimization problem. In addition to the action classification task, we present a study that compares the utility of different pose variants in motion analysis for human action recognition. In particular, we compare the recognition performance when 2D and 3D poses are used. Finally, we demonstrate the efficiency of our pose-based method for action recognition in spotting and segmenting motion gestures in real time from a continuous stream of an input video for the recognition of the Italian sign gesture language.
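The abstract above does not state the objective or the constraints of the quadratic problem used to combine modalities. As an illustration only, the following sketch assumes one common formulation: non-negative fusion weights that sum to one, chosen to minimize the squared error between the weighted class probabilities and one-hot labels, solved by projected gradient descent. The names `project_simplex` and `fusion_weights` and the toy data are hypothetical and are not taken from the thesis.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def fusion_weights(preds, labels, n_iter=500):
    """Assumed objective: min_w 0.5 * ||sum_m w_m P_m - Y||^2 over the simplex.

    preds  : list of (n_samples, n_classes) probability arrays, one per modality
    labels : (n_samples, n_classes) one-hot array
    """
    A = np.stack([p.ravel() for p in preds], axis=1)  # one column per modality
    b = labels.ravel()
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)          # 1 / Lipschitz constant
    w = np.full(A.shape[1], 1.0 / A.shape[1])         # start from uniform weights
    for _ in range(n_iter):
        grad = A.T @ (A @ w - b)
        w = project_simplex(w - step * grad)
    return w

# Toy data (hypothetical): one informative modality, one uninformative modality.
Y = np.eye(3)[[0, 1, 2, 0]]          # one-hot labels for 4 samples, 3 classes
p_good = Y.copy()                    # predicts the labels exactly
p_bad = np.full_like(Y, 1.0 / 3.0)   # uniform predictions
w = fusion_weights([p_good, p_bad], Y)
fused = w[0] * p_good + w[1] * p_bad  # fused class probabilities
```

Under this assumed formulation the weights stay on the probability simplex, so the fused scores remain valid probability distributions, and the informative modality receives the larger weight.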
Improving Facial Analysis and Performance Driven Animation through Disentangling Identity and Expression
We present techniques for improving performance driven facial animation,
emotion recognition, and facial key-point or landmark prediction using learned
identity invariant representations. Established approaches to these problems
can work well if sufficient examples and labels for a particular identity are
available and factors of variation are highly controlled. However, labeled
examples of facial expressions, emotions and key-points for new individuals are
difficult and costly to obtain. In this paper we improve the ability of
techniques to generalize to new and unseen individuals by explicitly modeling
previously seen variations related to identity and expression. We use a
weakly-supervised approach in which identity labels are used to learn the
different factors of variation linked to identity separately from factors
related to expression. We show how probabilistic modeling of these sources of
variation allows one to learn identity-invariant representations for
expressions which can then be used to identity-normalize various procedures for
facial expression analysis and animation control. We also show how to extend
the widely used techniques of active appearance models and constrained local
models through replacing the underlying point distribution models which are
typically constructed using principal component analysis with
identity-expression factorized representations. We present a wide variety of
experiments in which we consistently improve performance on emotion
recognition, markerless performance-driven facial animation and facial
key-point tracking. Comment: to appear in Image and Vision Computing Journal (IMAVIS)
Efficient Human Activity Recognition in Large Image and Video Databases
Vision-based human action recognition has attracted considerable interest in recent research for its applications to video surveillance, content-based search, healthcare, and interactive games. Most existing research deals with building informative feature descriptors, designing efficient and robust algorithms, proposing versatile and challenging datasets, and fusing multiple modalities. Often, these approaches build on certain conventions such as the use of motion cues to determine video descriptors, application of off-the-shelf classifiers, and single-factor classification of videos. In this thesis, we deal with important but overlooked issues such as efficiency, simplicity, and scalability of human activity recognition in different application scenarios: controlled video environments (e.g., indoor surveillance), unconstrained videos (e.g., YouTube), depth or skeletal data (e.g., captured by Kinect), and person images (e.g., Flickr). In particular, we are interested in answering questions like (a) is it possible to efficiently recognize human actions in controlled videos without temporal cues? (b) given that large-scale unconstrained video data are often of a high-dimension, low-sample-size (HDLSS) nature, how can human actions be recognized efficiently in such data? (c) considering the rich 3D motion information available from depth or motion capture sensors, is it possible to recognize both the actions and the actors using only the motion dynamics of the underlying activities? and (d) can motion information from monocular videos be used to automatically determine salient regions for recognizing actions in still images?
Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection, that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts. Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics
and Vision
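The sparse-coding idea described in this abstract, representing a signal as a linear combination of a few dictionary elements, can be sketched in a few lines. This is a minimal illustration, assuming an l1-penalized least-squares formulation solved with iterative soft-thresholding (ISTA) and a fixed random dictionary. The monograph itself concerns dictionaries learned from data, which this sketch does not cover, and the names `soft_threshold` and `ista`, the penalty value, and the toy data are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Shrink each entry of x toward zero by t (proximal operator of t * ||.||_1)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def objective(D, y, a, lam):
    """0.5 * ||y - D a||^2 + lam * ||a||_1"""
    return 0.5 * np.sum((y - D @ a) ** 2) + lam * np.sum(np.abs(a))

def ista(D, y, lam=0.05, n_iter=500):
    """Approximately minimize the objective above by iterative soft-thresholding."""
    lipschitz = np.linalg.norm(D, 2) ** 2   # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)            # gradient of the quadratic term
        a = soft_threshold(a - grad / lipschitz, lam / lipschitz)
    return a

# Toy data (hypothetical): a random dictionary with unit-norm columns and a
# signal built from three of its 50 atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)
a_true = np.zeros(50)
a_true[[3, 17, 41]] = [1.5, -2.0, 1.0]
y = D @ a_true

a_hat = ista(D, y)
print("nonzero coefficients:", np.count_nonzero(a_hat))
print("objective:", objective(D, y, a_hat, 0.05))
```

The soft-thresholding step is what sets small coefficients exactly to zero. A larger `lam` gives sparser codes at the cost of a larger reconstruction error.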