Substructure and Boundary Modeling for Continuous Action Recognition
This paper introduces a probabilistic graphical model for continuous action
recognition with two novel components: substructure transition model and
discriminative boundary model. The first component encodes a sparse, global
temporal transition prior between action primitives in a state-space model to
handle the large spatio-temporal variations within an action class. The
second component enforces the action duration constraint in a discriminative
way to locate the transition boundaries between actions more accurately. The
two components are integrated into a unified graphical structure to enable
effective training and inference. Our comprehensive experimental results on
both public and in-house datasets show that, with the capability to incorporate
additional information that had not been explicitly or efficiently modeled by
previous methods, our proposed algorithm achieved significantly improved
performance for continuous action recognition.
Comment: Detailed version of the CVPR 2012 paper. 15 pages, 6 figures.
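To illustrate how a transition prior between action primitives shapes continuous-sequence labeling, here is a minimal first-order Viterbi decoding sketch in plain Python. This is not the paper's substructure transition or discriminative boundary model (which live in a richer graphical structure with explicit duration constraints); the frame scores, transition matrix, and two-primitive setup below are toy assumptions.

```python
import math

def viterbi(log_emissions, log_trans, log_prior):
    """Most-likely primitive label per frame under a first-order transition prior.

    log_emissions: T x K per-frame log scores; log_trans: K x K log transition
    matrix (a sparse prior concentrates mass on few allowed transitions);
    log_prior: K initial log probabilities.
    """
    T, K = len(log_emissions), len(log_prior)
    dp = [[-math.inf] * K for _ in range(T)]   # best log score ending in state k at t
    back = [[0] * K for _ in range(T)]         # best predecessor state
    for k in range(K):
        dp[0][k] = log_prior[k] + log_emissions[0][k]
    for t in range(1, T):
        for k in range(K):
            best_j = max(range(K), key=lambda j: dp[t - 1][j] + log_trans[j][k])
            dp[t][k] = dp[t - 1][best_j] + log_trans[best_j][k] + log_emissions[t][k]
            back[t][k] = best_j
    # Backtrack from the best final state to recover the label sequence.
    path = [max(range(K), key=lambda k: dp[T - 1][k])]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

With a self-transition-heavy (sparse) prior, the decoded labels switch only where the frame evidence outweighs the transition penalty; this boundary-placement behavior is what duration-aware models refine.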
Comparative Evaluation of Action Recognition Methods via Riemannian Manifolds, Fisher Vectors and GMMs: Ideal and Challenging Conditions
We present a comparative evaluation of various techniques for action
recognition while keeping as many variables as possible controlled. We employ
two categories of Riemannian manifolds: symmetric positive definite matrices
and linear subspaces. For both categories we use their corresponding nearest
neighbour classifiers, kernels, and recent kernelised sparse representations.
We compare against traditional action recognition techniques based on Gaussian
mixture models and Fisher vectors (FVs). We evaluate these action recognition
techniques under ideal conditions, as well as their sensitivity in more
challenging conditions (variations in scale and translation). Despite recent
advances in handling manifolds, the manifold-based techniques obtain the
lowest performance, and their kernel representations are the least stable
under challenging conditions. The FV approach obtains the highest accuracy
under ideal conditions and also copes best with moderate scale and
translation changes.
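For readers unfamiliar with the SPD side of this comparison, the sketch below shows a standard way to measure distances between symmetric positive definite matrices, the log-Euclidean distance, which underlies common nearest-neighbour classifiers on that manifold. It is a generic textbook construction, not necessarily the exact variant evaluated in the paper; the function names and NumPy implementation are our own.

```python
import numpy as np

def spd_logm(S):
    # Matrix logarithm of a symmetric positive definite matrix,
    # computed from its eigendecomposition S = V diag(w) V^T.
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def log_euclidean_dist(A, B):
    # Log-Euclidean distance: map both SPD matrices into the flat space of
    # symmetric matrices via the matrix log, then take the Frobenius norm.
    return np.linalg.norm(spd_logm(A) - spd_logm(B), ord="fro")
```

A nearest-neighbour classifier on SPD descriptors (e.g. covariance features of an action clip) then simply returns the label of the training matrix minimizing this distance.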
Learning Human Poses from Monocular Images
In this research, we focus on estimating the 2D human pose from a monocular image and reconstructing the 3D human pose from that 2D pose. Here a 3D pose is the set of locations of the human joints in 3D space, and a 2D pose is the projection of a 3D pose onto an image. Unlike many previous works that explicitly use hand-crafted physiological models, both our 2D pose estimation and 3D pose reconstruction approaches implicitly learn the structure of the human body from human pose data.
This 3D pose reconstruction is an ill-posed problem without any prior knowledge. In this research, we propose a new approach, namely Pose Locality Constrained Representation (PLCR), that constrains the search space for the underlying 3D human pose and uses this constraint to improve 3D human pose reconstruction. In this approach, an over-complete pose dictionary is constructed by hierarchically clustering the 3D pose space into many subspaces. PLCR then exploits the structure of the over-complete dictionary to restrict the 3D pose solution to a set of highly related subspaces. Finally, PLCR is combined with a matching-pursuit-based algorithm for 3D human-pose reconstruction.
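The matching-pursuit stage that PLCR plugs into can be illustrated with a generic Orthogonal Matching Pursuit sketch over a dictionary. The subspace-locality constraint that distinguishes PLCR (restricting atom selection to the chosen subspaces) is omitted here; the function, toy dictionary, and sparsity level are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def omp(D, y, n_nonzero):
    # Orthogonal Matching Pursuit: greedily add the dictionary atom most
    # correlated with the current residual, then refit all selected
    # coefficients by least squares and update the residual.
    residual, support = y.astype(float).copy(), []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        corr = np.abs(D.T @ residual)
        corr[support] = -np.inf              # never re-select a chosen atom
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x
```

In a PLCR-style variant, the `argmax` over atoms would be taken only within the subspaces selected for the query pose, shrinking the search space for the 3D solution.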
The 2D human pose used in 3D pose reconstruction can be manually annotated or automatically estimated from a single image. In this research, we develop a new learning-based 2D human pose estimation approach based on Dual-Source Deep Convolutional Neural Networks (DS-CNN). The proposed DS-CNN model learns the appearance of each local body part and the relations between parts simultaneously, whereas most existing approaches treat them as two separate steps. In our experiments, the proposed DS-CNN model produces performance superior or comparable to state-of-the-art 2D human-pose estimation approaches based on pose priors learned from hand-crafted models or holistic perspectives.
Finally, we use our 2D human pose estimation approach to recognize human attributes by exploiting the strong correspondence between human attributes and human body parts. We then probe whether and when a CNN can find such correspondence by itself, on human attribute recognition and bird species recognition. We find a direct correlation between the recognition accuracy and the correctness of the correspondence that the CNN finds.
Expanding the Family of Grassmannian Kernels: An Embedding Perspective
Modeling videos and image-sets as linear subspaces has proven beneficial for
many visual recognition tasks. However, it also incurs challenges arising from
the fact that linear subspaces do not obey Euclidean geometry, but lie on a
special type of Riemannian manifold known as the Grassmannian. To leverage
techniques developed for Euclidean spaces (e.g., support vector machines) with
subspaces, several recent studies have proposed embedding the Grassmannian into
a Hilbert space by making use of a positive definite kernel. Unfortunately,
only two Grassmannian kernels are known, neither of which, as we will show, is
universal, which limits their ability to approximate a target function
arbitrarily well. Here, we introduce several positive definite Grassmannian
kernels, including universal ones, and demonstrate their superiority over
previously-known kernels in various tasks, such as classification, clustering,
sparse coding and hashing.
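One of the previously-known Grassmannian kernels referred to here is, to our understanding, the projection kernel, k_p(X, Y) = ||X^T Y||_F^2 for orthonormal subspace bases X and Y. A minimal sketch (the function name and NumPy usage are our own illustrative choices):

```python
import numpy as np

def projection_kernel(X, Y):
    # Projection kernel on the Grassmannian: for d x p matrices whose columns
    # are orthonormal bases of two p-dimensional subspaces, return
    # k_p(X, Y) = ||X^T Y||_F^2. Positive definite, but (as the paper argues
    # for the known kernels) not universal.
    return np.linalg.norm(X.T @ Y, ord="fro") ** 2
```

It attains its maximum value p when the two subspaces coincide and 0 when they are orthogonal, which makes it a natural similarity for kernel machines on subspace data.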