    Human Motion Analysis: From Gait Modeling to Shape Representation and Pose Estimation

    This dissertation presents a series of fundamental approaches to human motion analysis from three perspectives: manifold learning-based gait motion modeling, articulated shape representation, and efficient pose estimation. First, a new joint gait-pose manifold (JGPM) learning algorithm is proposed to jointly optimize the gait and pose variables. To enhance the representability and flexibility for complex motion modeling, we also propose a multi-layer JGPM that is capable of dealing with a variety of walking styles and strides. We resort to a topologically-constrained Gaussian process latent variable model (GPLVM) to learn the multi-layer JGPM, and introduce two new techniques to facilitate model learning. The first is training data diversification, which creates a set of simulated motion data with different strides from limited data. The second is topology-aware local learning, which speeds up model learning by taking advantage of the local topological structure. We demonstrate the effectiveness of our approach by synthesizing high-quality motions from the multi-layer model. The experimental results show that the multi-layer JGPM outperforms several existing GPLVM-based models in terms of the overall performance of motion modeling.
    On the other hand, to achieve efficient human pose estimation from a single depth sensor, we develop a generalized Gaussian kernel correlation (GKC)-based framework which supports not only body shape modeling, but also articulated pose tracking. We first generalize GKC from the univariate Gaussian to the multivariate one and derive a unified GKC function that provides a continuous and differentiable similarity measure between a template and an observation, both of which are represented by a collection of univariate and/or multivariate Gaussian kernels. Then, to facilitate the data matching and accommodate articulated body deformation, we embed a quaternion-based articulated skeleton into a template model built from a collection of multivariate Gaussians and develop an articulated GKC (AGKC) which supports subject-specific shape modeling and articulated pose tracking for both the full body and the hand. Our tracking algorithm is simple yet effective and computationally efficient. We evaluate our algorithm on two benchmark depth datasets. The experimental results are promising and competitive when compared with state-of-the-art algorithms. Electrical Engineering
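
The abstract above describes a similarity measure between two collections of Gaussian kernels but does not give its formula. As a rough illustration only, the sketch below uses the standard closed-form integral of the product of two un-normalized isotropic Gaussians; it is not the dissertation's unified GKC function, and it leaves out the multivariate (full-covariance) case. The function names and the `(mean, sigma)` representation are assumptions made for this example.

```python
import math


def gaussian_kernel_correlation(mu_a, sigma_a, mu_b, sigma_b):
    """Integral over R^d of the product of two un-normalized isotropic
    Gaussians exp(-||x - mu||^2 / (2 * sigma^2)), in closed form.

    Assumption: isotropic kernels only; not the source's unified GKC.
    """
    d = len(mu_a)
    var_sum = sigma_a ** 2 + sigma_b ** 2
    sq_dist = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    scale = (2.0 * math.pi * sigma_a ** 2 * sigma_b ** 2 / var_sum) ** (d / 2.0)
    return scale * math.exp(-sq_dist / (2.0 * var_sum))


def sum_of_gaussians_similarity(template, observation):
    """Sum of pairwise kernel correlations between two kernel collections.

    Each collection is a list of (mean, sigma) pairs (hypothetical layout).
    """
    return sum(
        gaussian_kernel_correlation(mu_t, s_t, mu_o, s_o)
        for mu_t, s_t in template
        for mu_o, s_o in observation
    )


# Two kernels standing in for body segments, plus a shifted copy.
template = [((0.0, 0.0, 0.0), 0.1), ((0.0, 0.3, 0.0), 0.1)]
aligned = [((0.0, 0.0, 0.0), 0.1), ((0.0, 0.3, 0.0), 0.1)]
shifted = [((0.2, 0.0, 0.0), 0.1), ((0.2, 0.3, 0.0), 0.1)]

aligned_score = sum_of_gaussians_similarity(template, aligned)
shifted_score = sum_of_gaussians_similarity(template, shifted)
```

Because the measure is a smooth function of the kernel means, the score drops continuously as the observation moves away from the template, which is the property that makes this kind of similarity usable inside a gradient-based tracker.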

    Cascaded 3D Full-body Pose Regression from Single Depth Image at 100 FPS

    Real-time live applications in virtual reality are increasing, and capturing and retargeting 3D human pose plays an important role in them. However, it is still challenging to estimate accurate 3D pose from consumer imaging devices such as depth cameras. This paper presents a novel cascaded 3D full-body pose regression method to estimate accurate pose from a single depth image at 100 fps. The key idea is to train cascaded regressors based on the Gradient Boosting algorithm from a pre-recorded human motion capture database. By incorporating a hierarchical kinematics model of human pose into the learning procedure, we can directly estimate accurate 3D joint angles instead of joint positions. The biggest advantage of this model is that bone lengths are preserved during the whole 3D pose estimation procedure, which leads to more effective features and higher pose estimation accuracy. Our method can be used as an initialization procedure when combined with tracking methods. We demonstrate the power of our method on a wide range of synthesized human motion data from the CMU mocap database, the Human3.6M dataset, and real human movement data captured in real time. In our comparison against previous 3D pose estimation methods and commercial systems such as Kinect 2017, we achieve state-of-the-art accuracy.
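
The claim that regressing joint angles preserves bone length follows from forward kinematics: joint positions are computed from fixed bone lengths, so no choice of angles can stretch a bone. The sketch below illustrates this on a planar chain. It is a simplification, assuming a 2D serial chain rather than the paper's 3D hierarchical skeleton, and the function name and arguments are hypothetical.

```python
import math


def forward_kinematics(root, bone_lengths, joint_angles):
    """Joint positions of a planar kinematic chain.

    Each angle (radians) is relative to the parent bone's direction.
    Assumption: a 2D serial chain, not a full 3D body hierarchy.
    """
    x, y = root
    heading = 0.0
    positions = [(x, y)]
    for length, angle in zip(bone_lengths, joint_angles):
        heading += angle  # accumulate rotation down the chain
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        positions.append((x, y))
    return positions


# Hypothetical three-bone limb, lengths in metres.
bone_lengths = [0.30, 0.25, 0.08]
pose = forward_kinematics((0.0, 0.0), bone_lengths, [0.4, -1.1, 0.7])

# Distances between consecutive joints, recovered from the positions.
recovered_lengths = [math.dist(a, b) for a, b in zip(pose, pose[1:])]
```

Whatever angles a regressor outputs, `recovered_lengths` matches `bone_lengths` up to floating-point error. Regressing joint positions directly offers no such guarantee.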

    Multi-Context Attention for Human Pose Estimation

    In this paper, we propose to incorporate convolutional neural networks with a multi-context attention mechanism into an end-to-end framework for human pose estimation. We adopt stacked hourglass networks to generate attention maps from features at multiple resolutions with various semantics. The Conditional Random Field (CRF) is utilized to model the correlations among neighboring regions in the attention map. We further combine the holistic attention model, which focuses on the global consistency of the full human body, and the body part attention model, which focuses on the detailed description for different body parts. Hence our model has the ability to focus on different granularities, from local salient regions to global semantic-consistent spaces. Additionally, we design novel Hourglass Residual Units (HRUs) to increase the receptive field of the network. These units are extensions of residual units with a side branch incorporating filters with larger receptive fields, hence features with various scales are learned and combined within the HRUs. The effectiveness of the proposed multi-context attention mechanism and the hourglass residual units is evaluated on two widely used human pose estimation benchmarks. Our approach outperforms all existing methods on both benchmarks over all the body parts. Comment: The first two authors contribute equally to this work

    Learning to Transform Time Series with a Few Examples

    We describe a semi-supervised regression algorithm that learns to transform one time series into another time series given examples of the transformation. This algorithm is applied to tracking, where a time series of observations from sensors is transformed to a time series describing the pose of a target. Instead of defining and implementing such transformations for each tracking task separately, our algorithm learns a memoryless transformation of time series from a few example input-output mappings. The algorithm searches for a smooth function that fits the training examples and, when applied to the input time series, produces a time series that evolves according to assumed dynamics. The learning procedure is fast and lends itself to a closed-form solution. It is closely related to nonlinear system identification and manifold learning techniques. We demonstrate our algorithm on the tasks of tracking RFID tags from signal strength measurements and recovering the pose of rigid objects, deformable bodies, and articulated bodies from video sequences. For these tasks, this algorithm requires significantly fewer examples compared to fully-supervised regression algorithms or semi-supervised learning algorithms that do not take the dynamics of the output time series into account.

    MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation

    In this work, we propose a novel and efficient method for articulated human pose estimation in videos using a convolutional network architecture, which incorporates both color and motion features. We propose a new human body pose dataset, FLIC-motion, that extends the FLIC dataset with additional motion features. We apply our architecture to this dataset and report significantly better performance than current state-of-the-art pose detection systems.

    Efficient Object Localization Using Convolutional Networks

    Recent state-of-the-art performance on human-body pose estimation has been achieved with Deep Convolutional Networks (ConvNets). Traditional ConvNet architectures include pooling and sub-sampling layers which reduce computational requirements, introduce invariance and prevent over-training. These benefits of pooling come at the cost of reduced localization accuracy. We introduce a novel architecture which includes an efficient 'position refinement' model that is trained to estimate the joint offset location within a small region of the image. This refinement model is jointly trained in cascade with a state-of-the-art ConvNet model to achieve improved accuracy in human joint location estimation. We show that the variance of our detector approaches the variance of human annotations on the FLIC dataset and outperforms all existing approaches on the MPII-human-pose dataset. Comment: 8 pages with 1 page of citations
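
The trade-off described above can be made concrete with a little arithmetic: a heatmap produced after pooling with stride `s` can only localize a joint to within an `s`-pixel cell, and a refinement stage supplies the offset inside that cell. The sketch below shows only this coordinate bookkeeping. In the paper the offset comes from a trained ConvNet; here it is passed in as a plain value, and the function names, the stride, and the top-left cell convention are all assumptions made for illustration.

```python
def coarse_argmax(heatmap):
    """Return (row, col) of the largest value in a 2D list heatmap."""
    best_value, best_rc = None, None
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if best_value is None or value > best_value:
                best_value, best_rc = value, (r, c)
    return best_rc


def refine_location(coarse_rc, stride, offset_rc):
    """Map a coarse cell to full-resolution pixels and add an offset.

    Assumption: cell (r, c) covers pixels starting at (r * stride,
    c * stride), and offset_rc stands in for a refinement model's output.
    """
    r, c = coarse_rc
    dr, dc = offset_rc
    return (r * stride + dr, c * stride + dc)


# Hypothetical 3x3 heatmap from a network with total pooling stride 8.
heatmap = [
    [0.01, 0.02, 0.05],
    [0.03, 0.10, 0.90],
    [0.02, 0.04, 0.20],
]
cell = coarse_argmax(heatmap)                       # (1, 2)
joint = refine_location(cell, stride=8, offset_rc=(3, 5))  # (11, 21)
```

Without the offset, the estimate would be stuck at the cell origin `(8, 16)`, up to `stride - 1` pixels away from the true joint in each axis.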