95,662 research outputs found

    Cascaded 3D Full-body Pose Regression from Single Depth Image at 100 FPS

    Real-time live applications in virtual reality are increasing, and capturing and retargeting 3D human pose plays an important role in them. However, estimating accurate 3D pose from consumer imaging devices such as depth cameras remains challenging. This paper presents a novel cascaded 3D full-body pose regression method that estimates accurate pose from a single depth image at 100 fps. The key idea is to train cascaded regressors, based on the Gradient Boosting algorithm, from a pre-recorded human motion capture database. By incorporating a hierarchical kinematic model of human pose into the learning procedure, we can directly estimate accurate 3D joint angles instead of joint positions. The biggest advantage of this model is that bone length is preserved throughout the 3D pose estimation procedure, which leads to more effective features and higher pose estimation accuracy. Our method can be used as an initialization procedure when combined with tracking methods. We demonstrate the power of our method on a wide range of synthesized human motion data from the CMU mocap database, the Human3.6M dataset, and real human movement data captured in real time. In our comparison against previous 3D pose estimation methods and commercial systems such as Kinect 2017, we achieve state-of-the-art accuracy.
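    The claim that regressing joint angles preserves bone length can be illustrated with a minimal sketch. This is not the paper's cascaded regressor: it is a simplified planar kinematic chain, and the function names `forward_kinematics` and `bone_lengths_of` are hypothetical. Because joint positions are derived from fixed bone lengths and angles, any set of angles yields a skeleton with the same bone lengths.

```python
import math

def forward_kinematics(bone_lengths, joint_angles, root=(0.0, 0.0)):
    """Return joint positions of a planar chain (simplifying assumption: 2D).

    Each angle is relative to the parent bone, so the chain is hierarchical.
    """
    positions = [root]
    heading = 0.0
    x, y = root
    for length, angle in zip(bone_lengths, joint_angles):
        heading += angle                  # accumulate rotation down the chain
        x += length * math.cos(heading)   # the child joint sits one bone away
        y += length * math.sin(heading)
        positions.append((x, y))
    return positions

def bone_lengths_of(positions):
    """Measure the distance between consecutive joints."""
    return [math.dist(a, b) for a, b in zip(positions, positions[1:])]
```

    Whatever angles are passed in, `bone_lengths_of(forward_kinematics(lengths, angles))` recovers `lengths` up to floating-point error, which is the property that regressing joint positions directly does not guarantee.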

    Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks

    Recently, skeleton-based action recognition has gained popularity due to cost-effective depth sensors coupled with real-time skeleton estimation algorithms. Traditional approaches based on handcrafted features are limited in their ability to represent the complexity of motion patterns. Recent methods that use Recurrent Neural Networks (RNNs) to handle raw skeletons focus only on contextual dependency in the temporal domain and neglect the spatial configurations of articulated skeletons. In this paper, we propose a novel two-stream RNN architecture to model both temporal dynamics and spatial configurations for skeleton-based action recognition. We explore two different structures for the temporal stream: stacked RNN and hierarchical RNN. The hierarchical RNN is designed according to human body kinematics. We also propose two effective methods to model the spatial structure by converting the spatial graph into a sequence of joints. To improve the generalization of our model, we further exploit 3D transformation-based data augmentation techniques, including rotation and scaling, to transform the 3D coordinates of skeletons during training. Experiments on 3D action recognition benchmark datasets show that our method brings a considerable improvement for a variety of actions, i.e., generic actions, interaction activities and gestures. Comment: Accepted to IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 201
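    The rotation and scaling augmentation mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rotation axis (vertical y-axis), the angle and scale ranges, and the names `rotate_y`, `scale` and `augment` are all hypothetical choices.

```python
import math
import random

def rotate_y(joints, angle_rad):
    """Rotate each (x, y, z) joint about the y-axis (assumed to be vertical)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in joints]

def scale(joints, factor):
    """Scale every joint coordinate uniformly."""
    return [(factor * x, factor * y, factor * z) for x, y, z in joints]

def augment(sequence, max_angle_deg=30.0, scale_range=(0.9, 1.1), rng=random):
    """Apply one random rotation and one random scale to a whole sequence.

    The default ranges are illustrative assumptions, not values from the paper.
    """
    angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    factor = rng.uniform(*scale_range)
    # The same transform is used for every frame so the motion stays coherent.
    return [scale(rotate_y(frame, angle), factor) for frame in sequence]
```

    Sampling one transform per sequence, rather than per frame, keeps the temporal dynamics intact while varying viewpoint and body size.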

    Combined Feature-Level Video Indexing Using Block-Based Motion Estimation.

    We describe a method for attaching content-based labels to video data using a weighted combination of low-level features (such as colour, texture and motion) estimated during motion analysis. Every frame of a video sequence is modeled using a fixed set of low-level feature attributes together with a set of corresponding weights, using a block-based motion estimation technique. Indexing a new video involves an alternative scheme in which the weights of the features are first estimated and classification is then performed to determine the label corresponding to the video. A hierarchical architecture of increasing complexity is used to achieve robust indexing of new videos. We explore the effect of different model parameters on performance and demonstrate that the proposed method is effective using publicly available datasets.

    Co-attention Propagation Network for Zero-Shot Video Object Segmentation

    Zero-shot video object segmentation (ZS-VOS) aims to segment foreground objects in a video sequence without prior knowledge of these objects. However, existing ZS-VOS methods often struggle to distinguish between foreground and background or to keep track of the foreground in complex scenarios. The common practice of introducing motion information, such as optical flow, can lead to overreliance on optical flow estimation. To address these challenges, we propose an encoder-decoder-based hierarchical co-attention propagation network (HCPN) capable of tracking and segmenting objects. Specifically, our model is built upon multiple collaborative evolutions of the parallel co-attention module (PCM) and the cross co-attention module (CCM). PCM captures common foreground regions among adjacent appearance and motion features, while CCM further exploits and fuses cross-modal motion features returned by PCM. Our method is progressively trained to achieve hierarchical spatio-temporal feature propagation across the entire video. Experimental results demonstrate that our HCPN outperforms all previous methods on public benchmarks, showcasing its effectiveness for ZS-VOS. Comment: accepted by IEEE Transactions on Image Processing

    Backing off: hierarchical decomposition of activity for 3D novel pose recovery

    For model-based 3D human pose estimation, even simple models of the human body lead to high-dimensional state spaces. Where the class of activity is known a priori, low-dimensional activity models learned from training data make possible a thorough and efficient search for the best pose. Conversely, searching for solutions in the full state space places no restriction on the class of motion to be recovered, but is both difficult and expensive. This paper explores a potential middle ground between these approaches, using the hierarchical Gaussian process latent variable model to learn activity at different hierarchical scales within the human skeleton. We show that by training on full-body activity data then descending through the hierarchy in stages and exploring subtrees independently of one another, novel poses may be recovered. Experimental results on motion capture data and monocular video sequences demonstrate the utility of the approach, and comparisons are drawn with existing low-dimensional activity models. © 2009. The copyright of this document resides with its authors

    Deep learning‐based method for reducing residual motion effects in diffusion parameter estimation

    PURPOSE: Conventional motion-correction techniques for diffusion MRI can introduce motion-level-dependent bias in derived metrics. To address this challenge, a deep learning-based technique was developed to minimize such residual motion effects. METHODS: The data-rejection approach was adopted, in which motion-corrupted data are discarded before model-fitting. A deep learning-based parameter estimation algorithm, using a hierarchical convolutional neural network (H-CNN), was combined with motion assessment and corrupted volume rejection. The method was designed to overcome the limitations of existing methods of this kind, which produce parameter estimations whose quality depends strongly on the proportion of the data discarded. Evaluation experiments were conducted for the estimation of diffusion kurtosis and diffusion-tensor-derived measures at both the individual and group levels. The performance was compared with the robust approach of iteratively reweighted linear least squares (IRLLS) after motion correction with and without outlier replacement. RESULTS: Compared with IRLLS, the H-CNN-based technique is minimally sensitive to motion effects. It was tested at severe motion levels, when 70% to 90% of the data are rejected, and when random motion is present. The technique had a stable performance independent of the numbers and schemes of data rejection. A further test on a data set from children with attention-deficit hyperactivity disorder shows that the technique can potentially ameliorate spurious group-level differences caused by head motion. CONCLUSION: This method shows great potential for reducing residual motion effects in motion-corrupted diffusion-weighted-imaging data, bringing benefits that include reduced bias in derived metrics in individual scans and reduced motion-level-dependent bias in population studies employing diffusion MRI.