23,672 research outputs found

    Fall Prediction for New Sequences of Motions

    Full text link
    Abstract. Motions reinforce meanings in human-robot communication, when they are relevant and initiated at the right times. Given a task of using motions for an autonomous humanoid robot to communicate, different sequences of relevant motions are generated from the motion library. Each motion in the motion library is stable, but a sequence may cause the robot to be unstable and fall. We are interested in predicting if a sequence of motions will result in a fall, without executing the sequence on the robot. We contribute a novel algorithm, ProFeaSM, that uses only body angles collected during the execution of single motions and interpolations between pairs of motions, to predict whether a sequence will cause the robot to fall. We demonstrate the efficacy of ProFeaSM on the NAO humanoid robot in a real-time simulator, Webots, and on a real NAO and explore the trade-off between precision and recall

    HP-GAN: Probabilistic 3D human motion prediction via GAN

    Full text link
    Predicting and understanding human motion dynamics has many applications, such as motion synthesis, augmented reality, security, and autonomous vehicles. Due to the recent success of generative adversarial networks (GAN), there has been much interest in probabilistic estimation and synthetic data generation using deep neural network architectures and learning algorithms. We propose a novel sequence-to-sequence model for probabilistic human motion prediction, trained with a modified version of improved Wasserstein generative adversarial networks (WGAN-GP), in which we use a custom loss function designed for human motion prediction. Our model, which we call HP-GAN, learns a probability density function of future human poses conditioned on previous poses. It predicts multiple sequences of possible future human poses, each from the same input sequence but a different vector z drawn from a random distribution. Furthermore, to quantify the quality of the non-deterministic predictions, we simultaneously train a motion-quality-assessment model that learns the probability that a given skeleton sequence is a real human motion. We test our algorithm on two of the largest skeleton datasets: NTURGB-D and Human3.6M. We train our model on both single and multiple action types. Its predictive power for long-term motion estimation is demonstrated by generating multiple plausible futures of more than 30 frames from just 10 frames of input. We show that most sequences generated from the same input have more than 50\% probabilities of being judged as a real human sequence. We will release all the code used in this paper to Github

    Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos

    Full text link
    Learning to predict scene depth from RGB inputs is a challenging task both for indoor and outdoor robot navigation. In this work we address unsupervised learning of scene depth and robot ego-motion where supervision is provided by monocular videos, as cameras are the cheapest, least restrictive and most ubiquitous sensor for robotics. Previous work in unsupervised image-to-depth learning has established strong baselines in the domain. We propose a novel approach which produces higher quality results, is able to model moving objects and is shown to transfer across data domains, e.g. from outdoors to indoor scenes. The main idea is to introduce geometric structure in the learning process, by modeling the scene and the individual objects; camera ego-motion and object motions are learned from monocular videos as input. Furthermore an online refinement method is introduced to adapt learning on the fly to unknown domains. The proposed approach outperforms all state-of-the-art approaches, including those that handle motion e.g. through learned flow. Our results are comparable in quality to the ones which used stereo as supervision and significantly improve depth prediction on scenes and datasets which contain a lot of object motion. The approach is of practical relevance, as it allows transfer across environments, by transferring models trained on data collected for robot navigation in urban scenes to indoor navigation settings. The code associated with this paper can be found at https://sites.google.com/view/struct2depth.Comment: Thirty-Third AAAI Conference on Artificial Intelligence (AAAI'19

    Deep Video Generation, Prediction and Completion of Human Action Sequences

    Full text link
    Current deep learning results on video generation are limited while there are only a few first results on video prediction and no relevant significant results on video completion. This is due to the severe ill-posedness inherent in these three problems. In this paper, we focus on human action videos, and propose a general, two-stage deep framework to generate human action videos with no constraints or arbitrary number of constraints, which uniformly address the three problems: video generation given no input frames, video prediction given the first few frames, and video completion given the first and last frames. To make the problem tractable, in the first stage we train a deep generative model that generates a human pose sequence from random noise. In the second stage, a skeleton-to-image network is trained, which is used to generate a human action video given the complete human pose sequence generated in the first stage. By introducing the two-stage strategy, we sidestep the original ill-posed problems while producing for the first time high-quality video generation/prediction/completion results of much longer duration. We present quantitative and qualitative evaluation to show that our two-stage approach outperforms state-of-the-art methods in video generation, prediction and video completion. Our video result demonstration can be viewed at https://iamacewhite.github.io/supp/index.htmlComment: Under review for CVPR 2018. Haoye and Chunyan have equal contributio

    Deep representation learning for human motion prediction and classification

    Full text link
    Generative models of 3D human motion are often restricted to a small number of activities and can therefore not generalize well to novel movements or applications. In this work we propose a deep learning framework for human motion capture data that learns a generic representation from a large corpus of motion capture data and generalizes well to new, unseen, motions. Using an encoding-decoding network that learns to predict future 3D poses from the most recent past, we extract a feature representation of human motion. Most work on deep learning for sequence prediction focuses on video and speech. Since skeletal data has a different structure, we present and evaluate different network architectures that make different assumptions about time dependencies and limb correlations. To quantify the learned features, we use the output of different layers for action classification and visualize the receptive fields of the network units. Our method outperforms the recent state of the art in skeletal motion prediction even though these use action specific training data. Our results show that deep feedforward networks, trained from a generic mocap database, can successfully be used for feature extraction from human motion data and that this representation can be used as a foundation for classification and prediction.Comment: This paper is published at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 201

    Automatic emotional state detection using facial expression dynamic in videos

    Get PDF
    In this paper, an automatic emotion detection system is built for a computer or machine to detect the emotional state from facial expressions in human computer communication. Firstly, dynamic motion features are extracted from facial expression videos and then advanced machine learning methods for classification and regression are used to predict the emotional states. The system is evaluated on two publicly available datasets, i.e. GEMEP_FERA and AVEC2013, and satisfied performances are achieved in comparison with the baseline results provided. With this emotional state detection capability, a machine can read the facial expression of its user automatically. This technique can be integrated into applications such as smart robots, interactive games and smart surveillance systems
    • …
    corecore