1,399 research outputs found

    V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map

    Full text link
    Most of the existing deep learning-based methods for 3D hand and human pose estimation from a single depth map are based on a common framework that takes a 2D depth map and directly regresses the 3D coordinates of keypoints, such as hand or human body joints, via 2D convolutional neural networks (CNNs). The first weakness of this approach is the presence of perspective distortion in the 2D depth map. While the depth map is intrinsically 3D data, many previous methods treat depth maps as 2D images that can distort the shape of the actual object through projection from 3D to 2D space. This compels the network to perform perspective distortion-invariant estimation. The second weakness of the conventional approach is that directly regressing 3D coordinates from a 2D image is a highly non-linear mapping, which causes difficulty in the learning procedure. To overcome these weaknesses, we firstly cast the 3D hand and human pose estimation problem from a single depth map into a voxel-to-voxel prediction that uses a 3D voxelized grid and estimates the per-voxel likelihood for each keypoint. We design our model as a 3D CNN that provides accurate estimates while running in real-time. Our system outperforms previous methods in almost all publicly available 3D hand and human pose estimation datasets and placed first in the HANDS 2017 frame-based 3D hand pose estimation challenge. The code is available in https://github.com/mks0601/V2V-PoseNet_RELEASE.Comment: HANDS 2017 Challenge Frame-based 3D Hand Pose Estimation Winner (ICCV 2017), Published at CVPR 201

    Yoga Pose Classification Using Deep Learning

    Get PDF
    Human pose estimation is a deep-rooted problem in computer vision that has exposed many challenges in the past. Analyzing human activities is beneficial in many fields like video- surveillance, biometrics, assisted living, at-home health monitoring etc. With our fast-paced lives these days, people usually prefer exercising at home but feel the need of an instructor to evaluate their exercise form. As these resources are not always available, human pose recognition can be used to build a self-instruction exercise system that allows people to learn and practice exercises correctly by themselves. This project lays the foundation for building such a system by discussing various machine learning and deep learning approaches to accurately classify yoga poses on prerecorded videos and also in real-time. The project also discusses various pose estimation and keypoint detection methods in detail and explains different deep learning models used for pose classification

    Robust arbitrary view gait recognition based on parametric 3D human body reconstruction and virtual posture synthesis

    Get PDF
    This paper proposes an arbitrary view gait recognition method where the gait recognition is performed in 3-dimensional (3D) to be robust to variation in speed, inclined plane and clothing, and in the presence of a carried item. 3D parametric gait models in a gait period are reconstructed by an optimized 3D human pose, shape and simulated clothes estimation method using multiview gait silhouettes. The gait estimation involves morphing a new subject with constant semantic constraints using silhouette cost function as observations. Using a clothes-independent 3D parametric gait model reconstruction method, gait models of different subjects with various postures in a cycle are obtained and used as galleries to construct 3D gait dictionary. Using a carrying-items posture synthesized model, virtual gait models with different carrying-items postures are synthesized to further construct an over-complete 3D gait dictionary. A self-occlusion optimized simultaneous sparse representation model is also introduced to achieve high robustness in limited gait frames. Experimental analyses on CASIA B dataset and CMU MoBo dataset show a significant performance gain in terms of accuracy and robustness

    Rethinking Person Re-identification from a Projection-on-Prototypes Perspective

    Full text link
    Person Re-IDentification (Re-ID) as a retrieval task, has achieved tremendous development over the past decade. Existing state-of-the-art methods follow an analogous framework to first extract features from the input images and then categorize them with a classifier. However, since there is no identity overlap between training and testing sets, the classifier is often discarded during inference. Only the extracted features are used for person retrieval via distance metrics. In this paper, we rethink the role of the classifier in person Re-ID, and advocate a new perspective to conceive the classifier as a projection from image features to class prototypes. These prototypes are exactly the learned parameters of the classifier. In this light, we describe the identity of input images as similarities to all prototypes, which are then utilized as more discriminative features to perform person Re-ID. We thereby propose a new baseline ProNet, which innovatively reserves the function of the classifier at the inference stage. To facilitate the learning of class prototypes, both triplet loss and identity classification loss are applied to features that undergo the projection by the classifier. An improved version of ProNet++ is presented by further incorporating multi-granularity designs. Experiments on four benchmarks demonstrate that our proposed ProNet is simple yet effective, and significantly beats previous baselines. ProNet++ also achieves competitive or even better results than transformer-based competitors

    The Devil of Face Recognition is in the Noise

    Full text link
    The growing scale of face recognition datasets empowers us to train strong convolutional networks for face recognition. While a variety of architectures and loss functions have been devised, we still have a limited understanding of the source and consequence of label noise inherent in existing datasets. We make the following contributions: 1) We contribute cleaned subsets of popular face databases, i.e., MegaFace and MS-Celeb-1M datasets, and build a new large-scale noise-controlled IMDb-Face dataset. 2) With the original datasets and cleaned subsets, we profile and analyze label noise properties of MegaFace and MS-Celeb-1M. We show that a few orders more samples are needed to achieve the same accuracy yielded by a clean subset. 3) We study the association between different types of noise, i.e., label flips and outliers, with the accuracy of face recognition models. 4) We investigate ways to improve data cleanliness, including a comprehensive user study on the influence of data labeling strategies to annotation accuracy. The IMDb-Face dataset has been released on https://github.com/fwang91/IMDb-Face.Comment: accepted to ECCV'1

    Deep representation learning for marker-less human posture analysis

    Full text link
    This thesis presents a holistic human posture analysis system. The proposed system leverages the state-of-the-art deep learning techniques to feature a comprehensive pipeline. Moreover, a new nonlinear computational layer is proposed to the deep convolutional neural network architectures to incorporate human perception capabilities into the deep learning architectures
    • …
    corecore