Cascaded 3D Full-body Pose Regression from Single Depth Image at 100 FPS
A growing number of real-time applications in virtual reality depend on
capturing and retargeting 3D human pose. However, it is still challenging to
estimate accurate 3D pose from consumer imaging devices such as depth cameras.
This paper presents a novel cascaded 3D full-body pose regression method that
estimates accurate pose from a single depth image at 100 fps. The key idea is
to train cascaded regressors, based on the Gradient Boosting algorithm, from a
pre-recorded human motion capture database. By incorporating a hierarchical
kinematic model of human pose into the learning procedure, we can directly
estimate accurate 3D joint angles instead of joint positions. The biggest
advantage of this model is that bone length is preserved throughout the 3D
pose estimation procedure, which leads to more effective features and higher
pose estimation accuracy. Our method can be used as an initialization
procedure when combined with tracking methods. We demonstrate the power of our
method on a wide range of synthesized human motion data from the CMU mocap
database, the Human3.6M dataset, and real human movement data captured in real
time. In comparisons against previous 3D pose estimation methods and
commercial systems such as Kinect 2017, we achieve state-of-the-art accuracy.
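The claim that estimating joint angles preserves bone length can be illustrated with a small forward-kinematics sketch. This is not the paper's regressor: it assumes a simplified planar chain, and the function names and bone lengths are hypothetical. It only shows that when positions are derived from angles and fixed lengths, every pose has the same bone lengths by construction.

```python
import math


def forward_kinematics(bone_lengths, joint_angles):
    """Return joint positions of a planar chain rooted at the origin.

    Each angle (radians) is relative to the parent bone, so positions are
    computed from angles and fixed bone lengths rather than predicted directly.
    """
    x, y, heading = 0.0, 0.0, 0.0
    positions = [(x, y)]
    for length, angle in zip(bone_lengths, joint_angles):
        heading += angle  # rotations accumulate down the hierarchy
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        positions.append((x, y))
    return positions


def bone_lengths_of(positions):
    """Measure the distance between each pair of consecutive joints."""
    return [math.dist(a, b) for a, b in zip(positions, positions[1:])]


# Hypothetical bone lengths (metres) and two arbitrary sets of joint angles.
lengths = [0.30, 0.25, 0.10]
pose_a = forward_kinematics(lengths, [0.2, -0.5, 1.0])
pose_b = forward_kinematics(lengths, [1.3, 0.4, -0.7])

print(bone_lengths_of(pose_a))
print(bone_lengths_of(pose_b))
```

Both poses differ in joint positions, yet the measured bone lengths match the input lengths up to floating-point error, which is the property a position-based regressor does not guarantee.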
Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks
Recently, skeleton based action recognition has gained popularity due to
cost-effective depth sensors coupled with real-time skeleton estimation
algorithms. Traditional approaches based on handcrafted features are limited in
their ability to represent the complexity of motion patterns. Recent methods that use Recurrent
Neural Networks (RNN) to handle raw skeletons only focus on the contextual
dependency in the temporal domain and neglect the spatial configurations of
articulated skeletons. In this paper, we propose a novel two-stream RNN
architecture to model both temporal dynamics and spatial configurations for
skeleton based action recognition. We explore two different structures for the
temporal stream: stacked RNN and hierarchical RNN. Hierarchical RNN is designed
according to human body kinematics. We also propose two effective methods to
model the spatial structure by converting the spatial graph into a sequence of
joints. To improve generalization of our model, we further exploit 3D
transformation based data augmentation techniques including rotation and
scaling transformation to transform the 3D coordinates of skeletons during
training. Experiments on 3D action recognition benchmark datasets show that our
method brings a considerable improvement for a variety of actions, i.e.,
generic actions, interaction activities and gestures.
Comment: Accepted to IEEE International Conference on Computer Vision and
Pattern Recognition (CVPR) 201
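The rotation and scaling augmentation mentioned in the abstract above can be sketched as follows. This is an illustrative assumption, not the authors' code: the rotation axis (vertical y axis), the angle and scale ranges, and the choice to apply one transform to a whole sequence are all hypothetical.

```python
import math
import random


def rotate_y(joints, angle):
    """Rotate every (x, y, z) joint about the vertical y axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in joints]


def scale(joints, factor):
    """Scale every joint coordinate uniformly by `factor`."""
    return [(factor * x, factor * y, factor * z) for x, y, z in joints]


def augment(sequence, rng, max_angle=math.pi / 6, scale_range=(0.9, 1.1)):
    """Apply one random rotation and one random scale to all frames.

    Using the same transform for every frame (an assumption here) keeps the
    motion temporally consistent while varying viewpoint and body size.
    """
    angle = rng.uniform(-max_angle, max_angle)
    factor = rng.uniform(*scale_range)
    return [scale(rotate_y(frame, angle), factor) for frame in sequence]


# A hypothetical two-frame sequence with two joints per frame.
sequence = [
    [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0)],
    [(0.1, 0.0, 0.0), (0.1, 0.5, 0.1)],
]
augmented = augment(sequence, random.Random(0))
print(augmented)
```

Rotation leaves distances between joints unchanged and uniform scaling multiplies them by a single factor, so the skeleton's proportions survive the augmentation.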
Combined Feature-Level Video Indexing Using Block-Based Motion Estimation.
We describe a method for attaching content-based labels to video data using a weighted combination of low-level features (such as colour, texture, and motion) estimated during motion analysis. Every frame of a video sequence is modeled using a fixed set of low-level feature attributes together with a set of corresponding weights using a block-based motion estimation technique. Indexing a new video involves an alternative scheme in which the weights of the features are first estimated and then classification is performed to determine the label corresponding to the video. A hierarchical architecture of increasing complexity is used to achieve robust indexing of new videos. We explore the effect of different model parameters on performance and show, using publicly available datasets, that the proposed method is effective.
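For readers unfamiliar with block-based motion estimation, a minimal sketch of exhaustive block matching is given here. The abstract does not specify the matching criterion or search strategy, so the sum of absolute differences (SAD), the full search window, and all names are assumptions for illustration only.

```python
def sad(frame_a, frame_b, ay, ax, by, bx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(frame_a[ay + i][ax + j] - frame_b[by + i][bx + j])
        for i in range(size)
        for j in range(size)
    )


def match_block(prev, curr, top, left, size, radius):
    """Find the displacement (dy, dx) into `prev` that best matches the block
    of `curr` whose top-left corner is (top, left), searching +/- radius pixels."""
    height, width = len(prev), len(prev[0])
    best = (0, 0)
    best_cost = sad(curr, prev, top, left, top, left, size)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the previous frame.
            if 0 <= y <= height - size and 0 <= x <= width - size:
                cost = sad(curr, prev, top, left, y, x, size)
                if cost < best_cost:
                    best_cost, best = cost, (dy, dx)
    return best, best_cost


def make_frame(top, left):
    """Build an 8x8 zero frame containing one distinctive 2x2 patch."""
    frame = [[0] * 8 for _ in range(8)]
    for i, row in enumerate([[10, 20], [30, 40]]):
        for j, value in enumerate(row):
            frame[top + i][left + j] = value
    return frame


prev_frame = make_frame(2, 2)  # patch at row 2, column 2
curr_frame = make_frame(3, 4)  # same patch moved down 1 and right 2
motion, cost = match_block(prev_frame, curr_frame, top=3, left=4, size=2, radius=2)
print(motion, cost)
```

The returned displacement points from the block in the current frame back to its best match in the previous frame; per-block vectors of this kind are one plausible source of the motion feature the abstract refers to.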
Co-attention Propagation Network for Zero-Shot Video Object Segmentation
Zero-shot video object segmentation (ZS-VOS) aims to segment foreground
objects in a video sequence without prior knowledge of these objects. However,
existing ZS-VOS methods often struggle to distinguish between foreground and
background or to keep track of the foreground in complex scenarios. The common
practice of introducing motion information, such as optical flow, can lead to
overreliance on optical flow estimation. To address these challenges, we
propose an encoder-decoder-based hierarchical co-attention propagation network
(HCPN) capable of tracking and segmenting objects. Specifically, our model is
built upon multiple collaborative evolutions of the parallel co-attention
module (PCM) and the cross co-attention module (CCM). PCM captures common
foreground regions among adjacent appearance and motion features, while CCM
further exploits and fuses cross-modal motion features returned by PCM. Our
method is progressively trained to achieve hierarchical spatio-temporal feature
propagation across the entire video. Experimental results demonstrate that our
HCPN outperforms all previous methods on public benchmarks, showcasing its
effectiveness for ZS-VOS.
Comment: accepted by IEEE Transactions on Image Processin
Backing off: hierarchical decomposition of activity for 3D novel pose recovery
For model-based 3D human pose estimation, even simple models of the human body lead to high-dimensional state spaces. Where the class of activity is known a priori, low-dimensional activity models learned from training data make possible a thorough and efficient search for the best pose. Conversely, searching for solutions in the full state space places no restriction on the class of motion to be recovered, but is both difficult and expensive. This paper explores a potential middle ground between these approaches, using the hierarchical Gaussian process latent variable model to learn activity at different hierarchical scales within the human skeleton. We show that by training on full-body activity data then descending through the hierarchy in stages and exploring subtrees independently of one another, novel poses may be recovered. Experimental results on motion capture data and monocular video sequences demonstrate the utility of the approach, and comparisons are drawn with existing low-dimensional activity models. © 2009. The copyright of this document resides with its authors
Deep learning‐based method for reducing residual motion effects in diffusion parameter estimation
PURPOSE: Conventional motion-correction techniques for diffusion MRI can introduce motion-level-dependent bias in derived metrics. To address this challenge, a deep learning-based technique was developed to minimize such residual motion effects. METHODS: A data-rejection approach was adopted in which motion-corrupted data are discarded before model fitting. A deep learning-based parameter estimation algorithm, using a hierarchical convolutional neural network (H-CNN), was combined with motion assessment and corrupted-volume rejection. The method was designed to overcome the limitation of existing methods of this kind, which produce parameter estimates whose quality depends strongly on the proportion of data discarded. Evaluation experiments were conducted for the estimation of diffusion kurtosis and diffusion-tensor-derived measures at both the individual and group levels. The performance was compared with the robust approach of iteratively reweighted linear least squares (IRLLS) after motion correction with and without outlier replacement. RESULTS: Compared with IRLLS, the H-CNN-based technique was minimally sensitive to motion effects. It was tested at severe motion levels, when 70% to 90% of the data were rejected, and when random motion was present. The technique performed stably, independent of the number and scheme of data rejections. A further test on a data set from children with attention-deficit hyperactivity disorder showed that the technique can potentially ameliorate spurious group-level differences caused by head motion. CONCLUSION: This method shows great potential for reducing residual motion effects in motion-corrupted diffusion-weighted-imaging data, bringing benefits that include reduced bias in derived metrics in individual scans and reduced motion-level-dependent bias in population studies employing diffusion MRI.