Cascaded 3D Full-body Pose Regression from Single Depth Image at 100 FPS

Author: Su Le
Xia Shihong
Zhang Zihao
Publication date: 25/01/2018

Real-time live applications in virtual reality are increasingly common, and capturing and retargeting 3D human pose plays an important role in them. However, estimating accurate 3D pose from consumer imaging devices such as depth cameras remains challenging. This paper presents a novel cascaded 3D full-body pose regression method that estimates accurate pose from a single depth image at 100 fps. The key idea is to train cascaded regressors, based on the Gradient Boosting algorithm, from a pre-recorded human motion capture database. By incorporating a hierarchical kinematic model of human pose into the learning procedure, we can directly estimate accurate 3D joint angles instead of joint positions. The biggest advantage of this model is that bone lengths are preserved throughout the 3D pose estimation procedure, which leads to more effective features and higher pose estimation accuracy. Our method can be used as an initialization procedure when combined with tracking methods. We demonstrate the power of our method on a wide range of synthesized human motion data from the CMU mocap database, the Human3.6M dataset, and real human movement data captured in real time. In our comparison against previous 3D pose estimation methods and commercial systems such as Kinect 2017, we achieve state-of-the-art accuracy.
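
The bone-length argument in this abstract can be illustrated with a small forward-kinematics sketch. It is not the authors' regressor: it assumes a simplified planar (2D) chain, and the names `forward_kinematics`, `bone_lengths_of`, and `BONES` are hypothetical. It only shows why predicting joint angles on a fixed skeleton keeps bone lengths constant whatever angles are predicted.

```python
import math

def forward_kinematics(bone_lengths, joint_angles):
    """Return 2D joint positions of a planar chain rooted at the origin.

    Each angle is relative to the parent bone, so the chain is a
    hierarchical (parent-to-child) parameterization of the pose.
    """
    x, y, heading = 0.0, 0.0, 0.0
    positions = [(x, y)]
    for length, angle in zip(bone_lengths, joint_angles):
        heading += angle                  # accumulate rotation down the chain
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        positions.append((x, y))
    return positions

def bone_lengths_of(positions):
    """Measure the distance between each pair of consecutive joints."""
    return [math.dist(a, b) for a, b in zip(positions, positions[1:])]

# Hypothetical skeleton: three bones with fixed lengths in metres.
BONES = [0.30, 0.25, 0.10]

# Two arbitrary "predicted" angle vectors (radians).
pose_a = forward_kinematics(BONES, [0.2, -0.5, 0.1])
pose_b = forward_kinematics(BONES, [1.1, 0.7, -0.3])
```

Measuring `bone_lengths_of(pose_a)` and `bone_lengths_of(pose_b)` returns the entries of `BONES` in both cases, up to floating-point error. Regressing joint positions directly gives no such guarantee.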

arXiv.org e-Print Archive

Crossref

Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks

Author: Wang Hongsong
Wang Liang
Publication date: 12/04/2017

Skeleton-based action recognition has recently gained popularity thanks to cost-effective depth sensors coupled with real-time skeleton estimation algorithms. Traditional approaches based on handcrafted features are limited in their ability to represent the complexity of motion patterns. Recent methods that use Recurrent Neural Networks (RNNs) to handle raw skeletons focus only on the contextual dependency in the temporal domain and neglect the spatial configurations of articulated skeletons. In this paper, we propose a novel two-stream RNN architecture to model both temporal dynamics and spatial configurations for skeleton-based action recognition. We explore two different structures for the temporal stream: stacked RNN and hierarchical RNN. The hierarchical RNN is designed according to human body kinematics. We also propose two effective methods to model the spatial structure by converting the spatial graph into a sequence of joints. To improve the generalization of our model, we further exploit 3D-transformation-based data augmentation techniques, including rotation and scaling, to transform the 3D coordinates of skeletons during training. Experiments on 3D action recognition benchmark datasets show that our method brings a considerable improvement for a variety of actions, i.e., generic actions, interaction activities, and gestures. Comment: Accepted to IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 201
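
The rotation and scaling augmentation mentioned in this abstract can be sketched as follows. The abstract does not give the rotation axis, angle range, or scale range, so rotation about the vertical (y) axis and the default ranges below are assumptions, and the function names are hypothetical.

```python
import math
import random

def rotate_y(joints, theta):
    """Rotate 3D joints about the y axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in joints]

def scale(joints, factor):
    """Scale all joint coordinates uniformly about the origin."""
    return [(factor * x, factor * y, factor * z) for x, y, z in joints]

def augment(joints, rng, max_angle=math.pi / 6, scale_range=(0.9, 1.1)):
    """Apply one random rotation and one random uniform scaling.

    max_angle and scale_range are illustrative defaults, not values
    taken from the paper.
    """
    theta = rng.uniform(-max_angle, max_angle)
    factor = rng.uniform(*scale_range)
    return scale(rotate_y(joints, theta), factor), theta, factor

# Hypothetical three-joint skeleton (x, y, z) in metres.
skeleton = [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.2, 0.5, 0.1)]
augmented, theta, factor = augment(skeleton, random.Random(0))
```

Because a rotation preserves distances, every inter-joint distance in `augmented` equals the original distance multiplied by `factor`. The skeleton's proportions therefore stay intact while its orientation and size vary across training samples.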

arXiv.org e-Print Archive

Crossref

Combined Feature-Level Video Indexing Using Block-Based Motion Estimation.

Author: Bhaskar H.
Mihaylova L.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/07/2010

We describe a method for attaching content-based labels to video data using a weighted combination of low-level features (such as colour, texture, and motion) estimated during motion analysis. Every frame of a video sequence is modeled using a fixed set of low-level feature attributes together with a set of corresponding weights, obtained with a block-based motion estimation technique. Indexing a new video involves an alternative scheme in which the weights of the features are first estimated and then classification is performed to determine the label corresponding to the video. A hierarchical architecture of increasing complexity is used to achieve robust indexing of new videos. We explore the effect of different model parameters on performance and demonstrate the effectiveness of the proposed method on publicly available datasets.
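
Block-based motion estimation, which this abstract builds on, can be sketched as an exhaustive block search. The abstract does not specify the matching criterion or search strategy, so the sum of absolute differences (SAD) cost and the full search below are assumptions, and all names are hypothetical. Frames are plain nested lists of grey levels.

```python
def sad(frame_a, frame_b, ay, ax, by, bx, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
    return total

def block_motion(prev, curr, top, left, size, search):
    """Return the (dy, dx) shift that best maps a block of prev into curr.

    Every shift within +/- search pixels is tried; candidate blocks that
    would fall outside the frame are skipped.
    """
    height, width = len(curr), len(curr[0])
    best_cost, best_vec = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(prev, curr, top, left, y, x, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_vec = cost, (dy, dx)
    return best_vec

def make_frame(height, width, top, left, size, value):
    """Build a black frame containing one bright square (test data only)."""
    frame = [[0] * width for _ in range(height)]
    for y in range(top, top + size):
        for x in range(left, left + size):
            frame[y][x] = value
    return frame

# The bright 2x2 square moves one row down and two columns right.
prev_frame = make_frame(8, 8, 2, 2, 2, 9)
curr_frame = make_frame(8, 8, 3, 4, 2, 9)
vector = block_motion(prev_frame, curr_frame, top=2, left=2, size=2, search=2)
```

Here `vector` is the motion vector of the block whose top-left corner is at row 2, column 2. In an indexing pipeline such vectors would be one low-level feature among several.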

Crossref

Lancaster E-Prints

Co-attention Propagation Network for Zero-Shot Video Object Segmentation

Author: Huang Dan
Huang Xingguo
Pei Gensheng
Shen Fumin
Shen Heng-Tao
Yao Yazhou
Publication date: 08/04/2023

Zero-shot video object segmentation (ZS-VOS) aims to segment foreground objects in a video sequence without prior knowledge of these objects. However, existing ZS-VOS methods often struggle to distinguish between foreground and background or to keep track of the foreground in complex scenarios. The common practice of introducing motion information, such as optical flow, can lead to overreliance on optical flow estimation. To address these challenges, we propose an encoder-decoder-based hierarchical co-attention propagation network (HCPN) capable of tracking and segmenting objects. Specifically, our model is built upon multiple collaborative evolutions of the parallel co-attention module (PCM) and the cross co-attention module (CCM). PCM captures common foreground regions among adjacent appearance and motion features, while CCM further exploits and fuses cross-modal motion features returned by PCM. Our method is progressively trained to achieve hierarchical spatio-temporal feature propagation across the entire video. Experimental results demonstrate that our HCPN outperforms all previous methods on public benchmarks, showcasing its effectiveness for ZS-VOS. Comment: accepted by IEEE Transactions on Image Processin
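
The general idea of co-attention between appearance and motion features can be sketched with a generic dot-product formulation. This is not the PCM or CCM module from the paper, whose internals the abstract does not describe. The unweighted dot-product affinity, the softmax normalization, and all names below are assumptions made for illustration.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def weighted_sum(weights, vectors):
    """Convex combination of feature vectors."""
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]

def co_attention(appearance, motion):
    """Let each modality attend over the other through a shared affinity matrix."""
    affinity = [[dot(a, m) for m in motion] for a in appearance]
    # Each appearance location gathers the motion features it is most similar to.
    motion_for_appearance = [weighted_sum(softmax(row), motion) for row in affinity]
    # Each motion location gathers appearance features (columns of the affinity).
    appearance_for_motion = [
        weighted_sum(softmax(list(col)), appearance) for col in zip(*affinity)
    ]
    return motion_for_appearance, appearance_for_motion

# Hypothetical flattened feature maps: 3 appearance and 2 motion locations, 2 channels.
appearance_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
motion_feats = [[2.0, 0.0], [0.0, 2.0]]
attended_motion, attended_appearance = co_attention(appearance_feats, motion_feats)
```

Locations whose appearance and motion features agree receive high mutual weights. This is one way to emphasize regions that both modalities mark as foreground.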

arXiv.org e-Print Archive

Backing off: hierarchical decomposition of activity for 3D novel pose recovery

Author: Baihua Li (1253553)
David Fleet (804534)
John Darby (7167812)
Neil Lawrence (3568952)
Nicholas Costen (7167815)
Publication date: 01/01/2009

For model-based 3D human pose estimation, even simple models of the human body lead to high-dimensional state spaces. Where the class of activity is known a priori, low-dimensional activity models learned from training data make possible a thorough and efficient search for the best pose. Conversely, searching for solutions in the full state space places no restriction on the class of motion to be recovered, but is both difficult and expensive. This paper explores a potential middle ground between these approaches, using the hierarchical Gaussian process latent variable model to learn activity at different hierarchical scales within the human skeleton. We show that by training on full-body activity data then descending through the hierarchy in stages and exploring subtrees independently of one another, novel poses may be recovered. Experimental results on motion capture data and monocular video sequences demonstrate the utility of the approach, and comparisons are drawn with existing low-dimensional activity models. © 2009. The copyright of this document resides with its authors

CiteSeerX

Loughborough University Institutional Repository

Crossref

Deep learning‐based method for reducing residual motion effects in diffusion parameter estimation

Author: Gong T
He H
Li Z
Tong Q
Zhang H
Zhong J
Publication date: 01/04/2021

PURPOSE: Conventional motion-correction techniques for diffusion MRI can introduce motion-level-dependent bias in derived metrics. To address this challenge, a deep learning-based technique was developed to minimize such residual motion effects. METHODS: The data-rejection approach was adopted, in which motion-corrupted data are discarded before model fitting. A deep learning-based parameter estimation algorithm, using a hierarchical convolutional neural network (H-CNN), was combined with motion assessment and corrupted-volume rejection. The method was designed to overcome a limitation of existing methods of this kind, whose parameter estimates vary strongly in quality with the proportion of data discarded. Evaluation experiments were conducted for the estimation of diffusion kurtosis and diffusion-tensor-derived measures at both the individual and group levels. The performance was compared with the robust approach of iteratively reweighted linear least squares (IRLLS) after motion correction with and without outlier replacement. RESULTS: Compared with IRLLS, the H-CNN-based technique is minimally sensitive to motion effects. It was tested at severe motion levels, when 70% to 90% of the data are rejected, and when random motion is present. The technique had a stable performance independent of the numbers and schemes of data rejection. A further test on a data set from children with attention-deficit hyperactivity disorder shows the technique can potentially ameliorate spurious group-level differences caused by head motion. CONCLUSION: This method shows great potential for reducing residual motion effects in motion-corrupted diffusion-weighted-imaging data, bringing benefits that include reduced bias in derived metrics in individual scans and reduced motion-level-dependent bias in population studies employing diffusion MRI.
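
The data-rejection step described under METHODS can be sketched with a deliberately simplified model. The paper estimates kurtosis and tensor measures with an H-CNN. The sketch below instead assumes a mono-exponential signal S = S0 * exp(-b * D) fitted by ordinary least squares on the log-signal, purely to show why discarding corrupted volumes before fitting matters. The motion scores, the threshold, and all names are hypothetical.

```python
import math

def reject_corrupted(volumes, max_motion):
    """Keep only volumes whose motion score is at or below the threshold."""
    return [v for v in volumes if v["motion"] <= max_motion]

def fit_log_linear(volumes):
    """Ordinary least-squares fit of ln S = ln S0 - b * D; returns (S0, D)."""
    xs = [v["b"] for v in volumes]
    ys = [math.log(v["signal"]) for v in volumes]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

TRUE_S0, TRUE_D = 1000.0, 0.001  # synthetic ground truth (D in mm^2/s)

# Clean synthetic volumes at five b-values, plus two volumes with signal dropout.
clean = [
    {"b": b, "signal": TRUE_S0 * math.exp(-b * TRUE_D), "motion": 0.1}
    for b in (0, 500, 1000, 1500, 2000)
]
corrupted = [
    {"b": 1000, "signal": 100.0, "motion": 3.0},
    {"b": 2000, "signal": 20.0, "motion": 2.5},
]
volumes = clean + corrupted

_, d_all = fit_log_linear(volumes)                                   # biased by dropout
_, d_kept = fit_log_linear(reject_corrupted(volumes, max_motion=1.0))
```

On this synthetic data `d_kept` recovers `TRUE_D`, while `d_all` is pulled away from it by the dropout volumes. A least-squares fit also becomes less stable as more volumes are rejected, which is the dependence on the discarded proportion that the abstract says the H-CNN estimator is designed to overcome.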

UCL Discovery