Auto-Conditioned Recurrent Networks for Extended Complex Human Motion Synthesis
We present a real-time method for synthesizing highly complex human motions
using a novel training regime we call the auto-conditioned Recurrent Neural
Network (acRNN). Recently, researchers have attempted to synthesize new motion
by using autoregressive techniques, but existing methods tend to freeze or
diverge after a couple of seconds due to an accumulation of errors that are fed
back into the network. Furthermore, such methods have only been shown to be
reliable for relatively simple human motions, such as walking or running. In
contrast, our approach can synthesize arbitrary motions with highly complex
styles, including dances or martial arts in addition to locomotion. The acRNN
is able to accomplish this by explicitly accommodating for autoregressive noise
accumulation during training. Our work is the first to our knowledge that
demonstrates the ability to generate over 18,000 continuous frames (300
seconds) of new complex human motion across different styles.
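As a concrete illustration of the auto-conditioning idea, a minimal PyTorch sketch is given below: during training the recurrent model alternates between consuming ground-truth frames and its own predictions, so it learns to recover from accumulated autoregressive noise. The layer sizes, the fixed alternation schedule and the pose dimensionality are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class AutoConditionedRNN(nn.Module):
    """LSTM motion model trained with alternating ground-truth and
    self-generated inputs, so the network learns to recover from its
    own accumulated prediction noise (a sketch; sizes are assumptions)."""

    def __init__(self, pose_dim=63, hidden_dim=512):
        super().__init__()
        self.lstm = nn.LSTM(pose_dim, hidden_dim, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden_dim, pose_dim)

    def forward(self, poses, condition_len=5, self_len=5):
        # poses: (batch, T, pose_dim) ground-truth motion
        batch, T, _ = poses.shape
        state = None
        prev = poses[:, 0:1]            # first frame is always ground truth
        preds = []
        for t in range(1, T):
            # alternate: feed ground truth for `condition_len` steps,
            # then feed the network's own output for `self_len` steps
            cycle = (t - 1) % (condition_len + self_len)
            inp = poses[:, t - 1:t] if cycle < condition_len else prev
            h, state = self.lstm(inp, state)
            prev = self.out(h)
            preds.append(prev)
        return torch.cat(preds, dim=1)   # (batch, T-1, pose_dim)


# usage sketch: standard MSE training against the ground-truth next frames
model = AutoConditionedRNN()
poses = torch.randn(8, 40, 63)           # dummy batch of motion clips
loss = nn.functional.mse_loss(model(poses), poses[:, 1:])
loss.backward()
```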
Auto-conditioned Recurrent Mixture Density Networks for Learning Generalizable Robot Skills
Personal robots assisting humans must perform complex manipulation tasks that
are typically difficult to specify in traditional motion planning pipelines,
where multiple objectives must be met and the high-level context taken into
consideration. Learning from demonstration (LfD) provides a promising way to
learn these kinds of complex manipulation skills even from non-technical users.
However, it is challenging for existing LfD methods to efficiently learn skills
that can generalize to task specifications that are not covered by
demonstrations. In this paper, we introduce a state transition model (STM) that
generates joint-space trajectories by imitating motions from expert behavior.
Given a few demonstrations, we show in real robot experiments that the learned
STM can quickly generalize to unseen tasks and synthesize motions having longer
time horizons than the expert trajectories. Compared to conventional motion
planners, our approach enables the robot to accomplish complex behaviors from
high-level instructions without laborious hand-engineering of planning
objectives, while being able to adapt to changing goals during the skill
execution. In conjunction with a trajectory optimizer, our STM can construct a
high-quality skeleton of a trajectory that can be further improved in
smoothness and precision. In combination with a learned inverse dynamics model,
we additionally present results where the STM is used as a high-level planner.
A video of our experiments is available at https://youtu.be/85DX9Ojq-90
Comment: Submitted to IROS 201
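The abstract does not detail the state transition model beyond naming it a recurrent mixture density network; the sketch below shows the kind of mixture-density output head and negative log-likelihood such a model could use. The number of components, the dimensions and the diagonal-Gaussian parameterization are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    """Mixture-density output head: maps an RNN hidden state to the
    parameters of a Gaussian mixture over the next joint-space state
    (sketch; K and dimensions are illustrative assumptions)."""

    def __init__(self, hidden_dim=256, out_dim=7, n_components=5):
        super().__init__()
        self.K, self.D = n_components, out_dim
        self.pi = nn.Linear(hidden_dim, n_components)                   # mixture weights
        self.mu = nn.Linear(hidden_dim, n_components * out_dim)         # component means
        self.log_sigma = nn.Linear(hidden_dim, n_components * out_dim)  # component scales

    def forward(self, h):
        B = h.shape[0]
        log_pi = torch.log_softmax(self.pi(h), dim=-1)
        mu = self.mu(h).view(B, self.K, self.D)
        sigma = torch.exp(self.log_sigma(h)).view(B, self.K, self.D)
        return log_pi, mu, sigma

def mdn_nll(log_pi, mu, sigma, target):
    """Negative log-likelihood of the target under the predicted mixture."""
    dist = torch.distributions.Normal(mu, sigma)
    # sum log-probs over output dims, then log-sum-exp over components
    log_prob = dist.log_prob(target.unsqueeze(1)).sum(-1) + log_pi
    return -torch.logsumexp(log_prob, dim=-1).mean()

# usage sketch: h could be the hidden state of a recurrent state transition model
head = MDNHead()
log_pi, mu, sigma = head(torch.randn(4, 256))
loss = mdn_nll(log_pi, mu, sigma, torch.randn(4, 7))
```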
MT-VAE: Learning Motion Transformations to Generate Multimodal Human Dynamics
Long-term human motion can be represented as a series of motion
modes---motion sequences that capture short-term temporal dynamics---with
transitions between them. We leverage this structure and present a novel Motion
Transformation Variational Auto-Encoder (MT-VAE) for learning motion sequence
generation. Our model jointly learns a feature embedding for motion modes (from
which the motion sequence can be reconstructed) and a feature transformation
that represents the transition from one motion mode to the next. Our
model is able to generate multiple diverse and plausible motion sequences in
the future from the same input. We apply our approach to both facial and full
body motion, and demonstrate applications like analogy-based motion transfer
and video synthesis.
Comment: Published at ECCV 201
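A minimal sketch of the mode-embedding-plus-transformation idea is given below, assuming GRU encoders and decoders, a Gaussian latent transformation and an additive combination in embedding space; the published MT-VAE architecture likely differs in its encoders, decoder conditioning and losses.

```python
import torch
import torch.nn as nn

class MotionTransformationVAE(nn.Module):
    """Sketch: embed the current motion mode, sample a latent transformation
    conditioned on (current, next) modes, and decode the next mode from the
    transformed embedding. Sizes and modules are assumptions."""

    def __init__(self, pose_dim=60, emb_dim=128, z_dim=32):
        super().__init__()
        self.encoder = nn.GRU(pose_dim, emb_dim, batch_first=True)
        self.to_z = nn.Linear(2 * emb_dim, 2 * z_dim)        # mean and log-variance
        self.transform = nn.Linear(emb_dim + z_dim, emb_dim)
        self.decoder = nn.GRU(emb_dim, pose_dim, batch_first=True)

    def embed(self, seq):
        _, h = self.encoder(seq)
        return h[-1]                                          # (batch, emb_dim)

    def forward(self, cur_mode, next_mode):
        e_cur, e_next = self.embed(cur_mode), self.embed(next_mode)
        mu, logvar = self.to_z(torch.cat([e_cur, e_next], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterize
        e_pred = self.transform(torch.cat([e_cur, z], -1))         # transformed mode embedding
        T = next_mode.shape[1]
        recon, _ = self.decoder(e_pred.unsqueeze(1).repeat(1, T, 1))
        kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon, kld

# usage sketch with dummy motion modes
model = MotionTransformationVAE()
cur, nxt = torch.randn(4, 16, 60), torch.randn(4, 16, 60)
recon, kld = model(cur, nxt)
loss = nn.functional.mse_loss(recon, nxt) + 0.1 * kld
```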
Unsupervised Feature Learning of Human Actions as Trajectories in Pose Embedding Manifold
An unsupervised human action modeling framework can provide a useful
pose-sequence representation, which can be utilized in a variety of pose
analysis applications. In this work we propose a novel temporal pose-sequence
modeling framework, which can embed the dynamics of 3D human-skeleton joints into
a continuous latent space in an efficient manner. In contrast to the end-to-end
frameworks explored by previous works, we disentangle the task of individual
pose representation learning from the task of learning actions as a trajectory
in pose embedding space. In order to realize a continuous pose embedding
manifold with improved reconstructions, we propose an unsupervised, manifold
learning procedure named Encoder GAN (or EnGAN). Further, we use the pose
embeddings generated by EnGAN to model human actions using a bidirectional RNN
auto-encoder architecture, PoseRNN. We introduce a first-order gradient loss to
explicitly enforce temporal regularity in the predicted motion sequence. A
hierarchical feature fusion technique is also investigated for simultaneous
modeling of local skeleton joints along with global pose variations. We
demonstrate state-of-the-art transferability of the learned representation
against other supervised and unsupervised motion embeddings for the
task of fine-grained action recognition on the SBU interaction dataset. Further, we
show the qualitative strengths of the proposed framework by visualizing
skeleton pose reconstructions and interpolations in pose-embedding space, and
low dimensional principal component projections of the reconstructed pose
trajectories.
Comment: Accepted at WACV 201
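The first-order gradient loss lends itself to a one-function sketch: it penalizes the mismatch between consecutive-frame differences of the predicted and ground-truth sequences. The exact norm and weighting used in the paper are assumptions here.

```python
import torch

def first_order_gradient_loss(pred, target):
    """Penalize mismatch in frame-to-frame velocities so the predicted motion
    is temporally regular (a plausible reading of the abstract's first-order
    gradient loss; the exact formulation may differ).

    pred, target: (batch, T, dim) sequences of pose embeddings."""
    pred_vel = pred[:, 1:] - pred[:, :-1]        # first-order temporal difference
    target_vel = target[:, 1:] - target[:, :-1]
    return torch.mean(torch.abs(pred_vel - target_vel))

# typically combined with a reconstruction term, e.g.
# loss = mse_loss(pred, target) + lambda_vel * first_order_gradient_loss(pred, target)
```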
Learning Bidirectional LSTM Networks for Synthesizing 3D Mesh Animation Sequences
In this paper, we present a novel method for learning to synthesize 3D mesh
animation sequences with long short-term memory (LSTM) blocks and mesh-based
convolutional neural networks (CNNs). Synthesizing realistic 3D mesh animation
sequences is a challenging and important task in computer animation. To achieve
this, researchers have long been focusing on shape analysis to develop new
interpolation and extrapolation techniques. However, such techniques have
limited learning capabilities and therefore can produce unrealistic animation.
Deep architectures that operate directly on mesh sequences remain unexplored,
due to the following major barriers: meshes with irregular triangles, sequences
containing rich temporal information and flexible deformations. To address
these, we utilize convolutional neural networks defined on triangular meshes
along with a shape deformation representation to extract useful features,
followed by LSTM cells that iteratively process the features. To allow
completion of a missing mesh sequence from given endpoints, we propose a new
weight-shared bidirectional structure. The bidirectional generation loss also
helps mitigate error accumulation over iterations. Benefiting from all these
technical advances, our approach outperforms existing methods in sequence
prediction and completion both qualitatively and quantitatively. Moreover, this
network can also generate follow-up frames conditioned on initial shapes and
improve the accuracy as more bootstrap models are provided, which other works
in the geometry processing domain cannot achieve.
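One plausible reading of the weight-shared bidirectional structure and its generation loss is sketched below: a single LSTM generator is rolled out forward from the start shape and backward from the end shape, and a consistency term ties the two roll-outs together. The per-frame deformation features, sizes and the averaging of the two roll-outs are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class BidirectionalCompletion(nn.Module):
    """Sketch of weight-shared bidirectional sequence completion: one LSTM
    generator is rolled out from both endpoints and a consistency loss
    (the bidirectional generation loss) ties the roll-outs together."""

    def __init__(self, feat_dim=128, hidden_dim=256):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.cell = nn.LSTMCell(feat_dim, hidden_dim)    # shared by both directions
        self.out = nn.Linear(hidden_dim, feat_dim)

    def rollout(self, start_feat, steps):
        h = start_feat.new_zeros(start_feat.shape[0], self.hidden_dim)
        c = torch.zeros_like(h)
        frame, frames = start_feat, []
        for _ in range(steps):
            h, c = self.cell(frame, (h, c))
            frame = self.out(h)
            frames.append(frame)
        return torch.stack(frames, dim=1)                # (batch, steps, feat_dim)

    def forward(self, start_feat, end_feat, steps):
        fwd = self.rollout(start_feat, steps)            # left-to-right generation
        bwd = self.rollout(end_feat, steps).flip(1)      # right-to-left, reversed in time
        consistency = torch.mean((fwd - bwd) ** 2)       # bidirectional generation loss
        return 0.5 * (fwd + bwd), consistency
```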
Recurrent Transition Networks for Character Locomotion
Manually authoring transition animations for a complete locomotion system can
be a tedious and time-consuming task, especially for large games that allow
complex and constrained locomotion movements, where the number of transitions
grows exponentially with the number of states. In this paper, we present a
novel approach, based on deep recurrent neural networks, to automatically
generate such transitions given a past context of a few frames and a target
character state to reach. We present the Recurrent Transition Network (RTN),
based on a modified version of the Long Short-Term Memory (LSTM) network,
designed specifically for transition generation and trained without any gait,
phase, contact or action labels. We further propose a simple yet principled way
to initialize the hidden states of the LSTM layer for a given sequence, which
improves performance and generalization to new motions. We both
quantitatively and qualitatively evaluate our system and show that making the
network terrain-aware by adding a local terrain representation to the input
yields better performance for rough-terrain navigation on long transitions. Our
system produces realistic and fluid transitions that rival the quality of
Motion Capture-based ground-truth motions, even before applying any
inverse-kinematics postprocess. Direct benefits of our approach could be to
accelerate the creation of transition variations for large coverage, or even to
entirely replace transition nodes in an animation graph. We further explore
applications of this model in an animation super-resolution setting where we
temporally decompress animations saved at 1 frame per second and show that the
network is able to reconstruct motions that are hard to distinguish from
uncompressed locomotion sequences.
Comment: revision fixes: clarity issues in Section 4.4 (text and equations)
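The learned hidden-state initialization can be illustrated with a short sketch: a small network maps a summary of the past context frames to the LSTM's initial hidden and cell states instead of starting from zeros. The summary function and sizes below are assumptions, not the RTN's exact design.

```python
import torch
import torch.nn as nn

class LearnedStateInit(nn.Module):
    """Sketch of a learned LSTM state initializer: an MLP maps a summary of
    the past context frames to the initial hidden and cell states (inputs
    and sizes are illustrative assumptions)."""

    def __init__(self, pose_dim=73, hidden_dim=512):
        super().__init__()
        self.init_net = nn.Sequential(
            nn.Linear(pose_dim, 256), nn.ReLU(),
            nn.Linear(256, 2 * hidden_dim),
        )
        self.lstm = nn.LSTM(pose_dim, hidden_dim, batch_first=True)

    def forward(self, context):
        # context: (batch, T_past, pose_dim) past frames
        h0, c0 = self.init_net(context.mean(dim=1)).chunk(2, dim=-1)
        state = (h0.unsqueeze(0).contiguous(), c0.unsqueeze(0).contiguous())
        out, state = self.lstm(context, state)
        return out, state
```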
An Evaluation of Trajectory Prediction Approaches and Notes on the TrajNet Benchmark
In recent years, there has been a shift from modeling the tracking problem with
Bayesian formulations towards using deep neural networks. To this end, in
this paper the effectiveness of various deep neural networks for predicting
future pedestrian paths is evaluated. The analyzed deep networks solely rely,
like in the traditional approaches, on observed tracklets without human-human
interaction information. The evaluation is done on the publicly available
TrajNet benchmark dataset, which builds up a repository of several
popular datasets for trajectory-based activity forecasting. We show that a
Recurrent-Encoder with a Dense layer stacked on top, referred to as
RED-predictor, is able to achieve results competitive with more elaborate
models in such scenarios. Further, we investigate failure cases, give
explanations for the observed phenomena, and offer recommendations for
overcoming the demonstrated shortcomings.
Comment: Accepted at the ECCV Workshop on Anticipating Human Behavior under an
adapted title: RED: A simple but effective Baseline Predictor for the TrajNet
Benchmark
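A recurrent encoder with a dense layer on top is simple enough to sketch directly; the version below encodes observed position offsets with an LSTM and regresses all future offsets with one linear layer, following the common 8-step-in / 12-step-out TrajNet setup as an assumption.

```python
import torch
import torch.nn as nn

class REDPredictor(nn.Module):
    """Sketch of an RNN-Encoder-Dense (RED) predictor: an LSTM encodes the
    observed tracklet as per-step offsets and a linear layer regresses all
    future offsets at once. Sizes and horizons are assumptions."""

    def __init__(self, hidden_dim=64, pred_len=12):
        super().__init__()
        self.encoder = nn.LSTM(2, hidden_dim, batch_first=True)
        self.dense = nn.Linear(hidden_dim, pred_len * 2)
        self.pred_len = pred_len

    def forward(self, obs_xy):
        # obs_xy: (batch, obs_len, 2) observed positions
        offsets = obs_xy[:, 1:] - obs_xy[:, :-1]           # work on relative motion
        _, (h, _) = self.encoder(offsets)
        future_offsets = self.dense(h[-1]).view(-1, self.pred_len, 2)
        # integrate the predicted offsets from the last observed position
        return obs_xy[:, -1:] + torch.cumsum(future_offsets, dim=1)

# usage sketch: 8 observed steps in, 12 predicted positions out
pred = REDPredictor()(torch.randn(4, 8, 2))    # shape (4, 12, 2)
```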
Towards 3D Dance Motion Synthesis and Control
3D human dance motion is a cooperative and elegant social movement. Unlike
regular simple locomotion, artistic dance motion is challenging to synthesize
due to its irregularity, kinematic complexity and diversity. The synthesized
dance must be realistic, diverse and controllable. In this
paper, we propose a novel generative motion model based on temporal convolution
and LSTM, called TC-LSTM, to synthesize realistic and diverse dance motion. We
introduce a unique control signal, the dance melody line, to heighten
controllability. Hence, our model, with its switchable control signals, supports
a variety of applications: random dance synthesis, music-to-dance, user
control, and more. Our experiments demonstrate that our model can synthesize
artistic dance motion in various dance types. Compared with existing methods,
our method achieves state-of-the-art results.
Comment: 9 pages
Audio to Body Dynamics
We present a method that takes as input audio of violin or piano playing and
outputs a video of skeleton predictions, which are further used to animate an
avatar. The key idea is to create an animation of an avatar that moves its
hands similarly to how a pianist or violinist would, just from audio. Fully
detailed and correct arm and finger motion is the ultimate goal; however, it is
not yet clear whether body movement can be predicted from music at all. In this
paper, we present the first result showing that natural body dynamics can
indeed be predicted. We build an LSTM network trained on violin and piano
recital videos uploaded to the Internet. The predicted points are applied to a
rigged avatar to create the animation.
Comment: Link with videos: https://arviolin.github.io/AudioBodyDynamics
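A minimal sketch of such an audio-to-keypoint LSTM is shown below, assuming MFCC-like per-frame audio features and a fixed set of 2D keypoints; the paper's exact features, keypoint set and network depth may differ.

```python
import torch
import torch.nn as nn

class AudioToKeypoints(nn.Module):
    """Sketch of an audio-to-body-dynamics model: per-frame audio features
    (e.g. MFCCs) are fed through an LSTM, and a linear layer regresses 2D
    upper-body and hand keypoints for each video frame. The feature size,
    keypoint count and single LSTM layer are illustrative assumptions."""

    def __init__(self, audio_dim=28, hidden_dim=200, n_keypoints=50):
        super().__init__()
        self.n_keypoints = n_keypoints
        self.lstm = nn.LSTM(audio_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_keypoints * 2)

    def forward(self, audio_feats):
        # audio_feats: (batch, T, audio_dim), time-aligned with the video frames
        h, _ = self.lstm(audio_feats)
        B, T, _ = audio_feats.shape
        return self.out(h).view(B, T, self.n_keypoints, 2)
```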
To Create What You Tell: Generating Videos from Captions
We are creating multimedia content every day and everywhere. While automatic
content generation has been a fundamental challenge for the multimedia community
for decades, recent advances in deep learning have made this problem feasible.
For example, Generative Adversarial Networks (GANs) are a rewarding approach
to synthesizing images. Nevertheless, it is not trivial when capitalizing on GANs
to generate videos. The difficulty originates from the intrinsic structure
where a video is a sequence of visually coherent and semantically dependent
frames. This motivates us to explore semantic and temporal coherence in
designing GANs to generate videos. In this paper, we present novel Temporal
GANs conditioning on Captions, namely TGANs-C, in which the input to the
generator network is a concatenation of a latent noise vector and a caption
embedding, which is then transformed into a frame sequence with 3D
spatio-temporal convolutions. Unlike the naive discriminator which only judges
pairs as fake or real, our discriminator additionally notes whether the video
matches the correct caption. In particular, the discriminator network consists
of three discriminators: a video discriminator that classifies realistic videos
against generated ones and optimizes video-caption matching; a frame
discriminator that discriminates between real and fake frames and aligns frames
with the conditioning caption; and a motion discriminator that emphasizes the
philosophy that adjacent frames in the generated videos should be smoothly
connected, as in real ones. We qualitatively demonstrate the capability of our
TGANs-C to generate plausible videos conditioned on the given captions on two synthetic
datasets (SBMG and TBMG) and one real-world dataset (MSVD). Moreover,
quantitative experiments on MSVD are performed to validate our proposal via
the Generative Adversarial Metric and a human study.
Comment: ACM MM 2017 Brave New Idea
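A sketch of a caption-conditioned video generator in this spirit is given below: the latent noise vector and caption embedding are concatenated and expanded into a short frame volume with 3D transposed convolutions. The channel counts, the 16-frame 32x32 output size and the caption encoder are illustrative assumptions rather than the TGANs-C configuration.

```python
import torch
import torch.nn as nn

class CaptionConditionedVideoGenerator(nn.Module):
    """Sketch of a TGANs-C-style generator: a latent noise vector is
    concatenated with a caption embedding and expanded into a frame
    sequence with 3D transposed convolutions (sizes are assumptions)."""

    def __init__(self, z_dim=100, cap_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            # input: (batch, z_dim + cap_dim, 1, 1, 1)
            nn.ConvTranspose3d(z_dim + cap_dim, 512, kernel_size=(2, 4, 4)),
            nn.BatchNorm3d(512), nn.ReLU(),
            nn.ConvTranspose3d(512, 256, 4, stride=2, padding=1),
            nn.BatchNorm3d(256), nn.ReLU(),
            nn.ConvTranspose3d(256, 128, 4, stride=2, padding=1),
            nn.BatchNorm3d(128), nn.ReLU(),
            nn.ConvTranspose3d(128, 3, 4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, z, caption_emb):
        # concatenate noise and caption embedding, then reshape to a 1x1x1 volume
        x = torch.cat([z, caption_emb], dim=1)[:, :, None, None, None]
        return self.net(x)   # (batch, 3, frames, height, width)

# usage sketch with dummy noise and caption embeddings
video = CaptionConditionedVideoGenerator()(torch.randn(2, 100), torch.randn(2, 256))
# video.shape == (2, 3, 16, 32, 32): 16 RGB frames at 32x32 resolution
```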