Down-Sampling coupled to Elastic Kernel Machines for Efficient Recognition of Isolated Gestures
In the field of gestural action recognition, many studies have focused on
dimensionality reduction along the spatial axis, to reduce both the variability
of gestural sequences expressed in the reduced space, and the computational
complexity of their processing. It is noticeable that very few of these methods
have explicitly addressed the dimensionality reduction along the time axis.
This is however a major issue with regard to the use of elastic distances
characterized by a quadratic complexity. To partially fill this apparent gap,
we present in this paper an approach based on temporal down-sampling combined
with elastic kernel machine learning. We experimentally show, on two data sets
that are widely referenced in the domain of human gesture recognition, and very
different in terms of quality of motion capture, that it is possible to
significantly reduce the number of skeleton frames while maintaining a good
recognition rate. The method proves to give satisfactory results at a level
currently reached by state-of-the-art methods on these data sets. The
computational complexity reduction makes this approach eligible for real-time
applications. Comment: ICPR 2014, International Conference on Pattern Recognition, Stockholm, Sweden (2014)
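As a rough illustration of why shortening sequences helps with quadratic-cost elastic distances, the sketch below pairs uniform temporal down-sampling with plain dynamic time warping (DTW). It is not the paper's method: the abstract does not specify the down-sampling scheme or the elastic kernel, so uniform frame selection, DTW as a stand-in elastic distance, and the names `downsample` and `dtw_distance` are all assumptions made for this example.

```python
import math


def downsample(seq, n_frames):
    """Keep n_frames evenly spaced frames (first and last included).

    Assumption: uniform selection; the abstract does not say how frames are chosen.
    """
    if n_frames < 2 or n_frames > len(seq):
        raise ValueError("n_frames must be between 2 and len(seq)")
    step = (len(seq) - 1) / (n_frames - 1)
    return [seq[round(i * step)] for i in range(n_frames)]


def dtw_distance(a, b):
    """Plain DTW with Euclidean frame cost, used here as a stand-in elastic distance.

    The table has len(a) * len(b) cells, hence the quadratic cost.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Synthetic 2-D "skeleton" trajectories of 200 frames each (illustrative data only).
seq_a = [(math.sin(t / 10), math.cos(t / 10)) for t in range(200)]
seq_b = [(math.sin(t / 12), math.cos(t / 12)) for t in range(200)]

short_a = downsample(seq_a, 20)
short_b = downsample(seq_b, 20)

# Number of DTW table cells before and after down-sampling.
full_cells = len(seq_a) * len(seq_b)
short_cells = len(short_a) * len(short_b)
print(full_cells, short_cells, dtw_distance(short_a, short_b))
```

Reducing each sequence by a factor of 10 shrinks the DTW table by a factor of 100, which is the kind of saving the abstract appeals to when it mentions real-time applications.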
Unsupervised Learning of Complex Articulated Kinematic Structures combining Motion and Skeleton Information
In this paper we present a novel framework for unsupervised kinematic structure learning of complex articulated objects from a single-view image sequence. In contrast to prior methods based on motion information, which estimate relatively simple articulations, our method can generate arbitrarily complex kinematic structures with skeletal topology through a successive iterative merge process. The iterative merge process is guided by a skeleton distance function, which is derived from a novel method for generating object boundaries from sparse points. Our main contributions can be summarised as follows: (i) Unsupervised complex articulated kinematic structure learning by combining motion and skeleton information. (ii) Iterative fine-to-coarse merging strategy for adaptive motion segmentation and structure smoothing. (iii) Skeleton estimation from sparse feature points. (iv) A new highly articulated object dataset containing multi-stage complexity with ground truth. Our experiments show that the proposed method outperforms state-of-the-art methods both quantitatively and qualitatively.
Pushing the envelope for estimating poses and actions via full 3D reconstruction
Estimating poses and actions of human bodies and hands is an important task in the computer vision community due to its vast applications, including human-computer interaction, virtual reality and augmented reality, and medical image analysis. Challenges: There are many in-the-wild challenges in this task (see chapter 1). Among them, in this thesis, we focused on two challenges which could be relieved by incorporating the 3D geometry: (1) the inherent 2D-to-3D ambiguity driven by the non-linear 2D projection process when capturing 3D objects; (2) the lack of sufficient, high-quality annotated datasets due to the high dimensionality of the subjects' attribute space and the inherent difficulty of annotating 3D coordinate values. Contributions: We first tried to jointly tackle the 2D-to-3D ambiguity and insufficient data issues by (1) explicitly reconstructing 2.5D and 3D samples and using them as new training data to train a pose estimator. Next, we tried to (2) encode 3D geometry in the training process of the action recognizer to reduce the 2D-to-3D ambiguity. In the appendix, we proposed (3) a new synthetic hand pose dataset that can be used for more complete attribute changes and multi-modal experiments in the future. Experiments: Throughout the experiments, we found several interesting facts: (1) 2.5D depth map reconstruction and data augmentation can improve the accuracy of the depth-based hand pose estimation algorithm, (2) 3D mesh reconstruction can be used to generate new RGB data, which improves the accuracy of the RGB-based dense hand pose estimation algorithm, (3) 3D geometry from 3D poses and scene layouts can be successfully utilized to reduce the 2D-to-3D ambiguity in the action recognition problem.
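The 2D-to-3D ambiguity mentioned above can be seen in a few lines. The sketch below assumes an ideal pinhole camera (the abstract only says the projection is non-linear, so the camera model, the `project` helper, and the sample points are assumptions for illustration): two different 3D points on the same viewing ray land on the same image point, so depth cannot be recovered from that single 2D observation.

```python
def project(point, focal=1.0):
    """Perspective-project a 3D point (x, y, z) onto the image plane.

    Assumption: ideal pinhole camera at the origin looking along +z.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal * x / z, focal * y / z)


# Two distinct 3D points on the same ray through the camera centre.
near_joint = (0.5, 0.25, 1.0)
far_joint = (1.0, 0.5, 2.0)

print(project(near_joint))
print(project(far_joint))
```

Both calls return the same 2D coordinates, which is why extra 3D information (reconstructed samples, scene layout) can help disambiguate poses, as the abstract argues.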
Forecasting Human Dynamics from Static Images
This paper presents the first study on forecasting human dynamics from static
images. The problem is to input a single RGB image and generate a sequence of
upcoming human body poses in 3D. To address the problem, we propose the 3D Pose
Forecasting Network (3D-PFNet). Our 3D-PFNet integrates recent advances on
single-image human pose estimation and sequence prediction, and converts the 2D
predictions into 3D space. We train our 3D-PFNet using a three-step training
strategy to leverage a diverse source of training data, including image and
video based human pose datasets and 3D motion capture (MoCap) data. We
demonstrate competitive performance of our 3D-PFNet on 2D pose forecasting and
3D pose recovery through quantitative and qualitative results. Comment: Accepted in CVPR 201