
    A Quadruple Diffusion Convolutional Recurrent Network for Human Motion Prediction

    Recurrent neural networks (RNNs) have become popular for human motion prediction thanks to their ability to capture temporal dependencies. However, they have limited capacity for modeling the complex spatial relationships in the human skeletal structure. In this work, we present a novel diffusion convolutional recurrent predictor for spatial and temporal movement forecasting, in which multi-step random walks traverse bidirectionally along an adaptive graph to model the interdependency among body joints. In the temporal domain, existing methods rely on a single forward predictor whose produced motion drifts away from the true trajectory, leading to error accumulation over time. We propose to supplement the forward predictor with a forward discriminator to alleviate such motion drift in the long term under adversarial training. The solution is further enhanced by a backward predictor and a backward discriminator to effectively reduce the error, such that the system can also look into the past to improve the prediction at early frames. The two-way spatial diffusion convolutions and the two-way temporal predictors together form a quadruple network. Furthermore, we train our framework by modeling the velocity from observed motion dynamics instead of static poses to predict future movements, which effectively reduces the discontinuity problem in early prediction. Our method outperforms the state of the art on both 3D and 2D datasets, including the Human3.6M, CMU Motion Capture and Penn Action datasets. The results also show that our method correctly predicts both high-dynamic and low-dynamic moving trends with less motion drift.
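The bidirectional multi-step random walk over the joint graph can be illustrated with a minimal NumPy sketch. This is only an illustration of the general diffusion-convolution idea under assumed shapes and names (the function, weight layout, and chain-skeleton example are not from the paper):

```python
import numpy as np

def diffusion_conv(X, A, K, W_fwd, W_bwd):
    """Bidirectional K-step diffusion convolution over a joint graph.

    X: (J, F) joint features (J joints, F features per joint)
    A: (J, J) weighted adjacency of the skeleton graph
    W_fwd, W_bwd: lists of K (F, F_out) weight matrices, one per walk step
    """
    # Random-walk transition matrices in the two traversal directions
    P_fwd = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-8)
    P_bwd = A.T / np.maximum(A.T.sum(axis=1, keepdims=True), 1e-8)

    out = np.zeros((X.shape[0], W_fwd[0].shape[1]))
    Xf, Xb = X, X
    for k in range(K):
        Xf = P_fwd @ Xf            # one more random-walk step forward
        Xb = P_bwd @ Xb            # one more step in the reverse direction
        out += Xf @ W_fwd[k] + Xb @ W_bwd[k]
    return out
```

Each step aggregates features from joints reachable in k hops, so increasing K widens the receptive field along the skeleton without densifying the graph.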

    Flow-based Autoregressive Structured Prediction of Human Motion

    A new method is proposed for human motion prediction by learning temporal and spatial dependencies in an end-to-end deep neural network. The joint connectivity is explicitly modeled using a novel autoregressive structured-prediction representation based on flow-based generative models. We learn a latent space of complex body poses in consecutive frames, conditioned on the high-dimensional structured input sequence. To construct each latent variable, the general and local smoothness of the joint positions are considered in a generative process using conditional normalizing flows. As a result, all frame-level and joint-level continuities in the sequence are preserved in the model. This enables us to parameterize the inter-frame and intra-frame relationships and the joint connectivity for robust long-term as well as short-term prediction. Our experiments on the two challenging benchmark datasets Human3.6M and AMASS demonstrate that our proposed method effectively models the sequence information for motion prediction and outperforms other techniques in 42 of the 48 experiment scenarios, setting a new state of the art.
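A conditional normalizing flow is built from invertible layers whose Jacobian log-determinant is cheap to compute. The sketch below shows one conditional affine coupling layer, a common building block of such flows; it is a generic illustration, not the paper's architecture, and all names and shapes are assumptions:

```python
import numpy as np

def affine_coupling_forward(x, context, w_s, w_t):
    """One conditional affine coupling layer (RealNVP-style).

    x: (D,) pose vector, D even; context: (C,) conditioning features
    w_s, w_t: (D//2 + C, D//2) weights producing log-scale and shift
    Returns the transformed z and the log|det Jacobian|.
    """
    d = x.shape[0] // 2
    x1, x2 = x[:d], x[d:]
    h = np.concatenate([x1, context])   # condition on x1 and the context
    s = np.tanh(h @ w_s)                # bounded log-scale for stability
    t = h @ w_t                         # shift
    z = np.concatenate([x1, x2 * np.exp(s) + t])
    return z, s.sum()

def affine_coupling_inverse(z, context, w_s, w_t):
    """Exact inverse of the coupling layer above."""
    d = z.shape[0] // 2
    z1, z2 = z[:d], z[d:]
    h = np.concatenate([z1, context])
    s = np.tanh(h @ w_s)
    t = h @ w_t
    return np.concatenate([z1, (z2 - t) * np.exp(-s)])
```

Because the first half passes through unchanged, the inverse recovers the scale and shift exactly, which is what makes exact likelihood training of such flows tractable.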

    Neural correlates of the processing of visually simulated self-motion

    Successful interaction with our environment requires the perception of our surroundings. To cope with everyday challenges, our own movements within this environment are important. In my thesis, I have investigated the neural correlates of visually simulated self-motion. More specifically, I have analyzed the processing of two key features of visual self-motion, the self-motion direction (heading) and the traveled distance (path integration), by means of electroencephalogram (EEG) measurements and transcranial magnetic stimulation (TMS). I have focused on investigating the role of predictions about the upcoming sensory event in the processing of these self-motion features. To this end, I applied the framework of predictive coding theory. In this framework, prediction errors induced by the mismatch between predictions and the actual sensory input are used to update the internal model responsible for creating the predictions. Additionally, I aimed to combine my findings with the results of previous studies on monkeys in order to further probe the role of the macaque monkey as an animal model for human sensorimotor processing. In my first study, I investigated the processing of different self-motion directions using a classical oddball EEG paradigm. Frequently presented self-motion stimuli in one direction were interspersed with a rarely presented different self-motion direction. The headings occurred with different probabilities, which modified the prediction about the upcoming event and allowed for the formation of an internal model. Unexpected self-motion directions created a prediction error. I confirmed this in my data by detecting a specific EEG component, the mismatch negativity (MMN).
This MMN component not only reveals the influence of predictions on the processing of visually simulated self-motion directions, in line with predictive coding theory, but is also known to indicate preattentive processing of the analyzed feature, here the heading. To test for similarities between monkey and human processing of visually simulated self-motion, colleagues from my lab recorded EEG data from monkeys with identical equipment during the presentation of the stimulus described above. Remarkably, the recorded data showed an MMN component similar to the human data. This led us to suggest that the underlying processes are comparable across human and non-human primates. In my second study, the objective was to causally link the human functional equivalent of the macaque medial superior temporal area (hMST) to the perception of self-motion directions. Previous studies have shown this area to be important for the processing of self-motion. Applying TMS to right-hemisphere area hMST resulted in an increase in variance when participants were asked to estimate headings to the left, i.e., to the direction contraversive to the stimulation site. The results of this study were used to test a model developed by colleagues in my lab, who created it using findings from single-cell recordings in macaque monkeys. Simulating the influence of lateralized TMS pulses on hMST of one hemisphere, this model predicted an increase in variance for the estimation of headings contraversive to the stimulated hemisphere. This is exactly what I observed in the data of my TMS experiment. In this second study, I verified the finding of previous studies that hMST is important for the processing of self-motion directions. In addition, I showed that a model based on recordings from macaque monkeys can predict the outcome of an experiment with human participants.
This indicates the similarity of the processing of visually simulated self-motion in humans and macaque monkeys. The third study focused on the representation of traveled distance, using EEG recordings in human participants. The goal of this study was twofold: first, I analyzed the influence of prediction on the processing of traveled distance; second, I aimed to find a neural correlate of subjective traveled distance. Participants were asked to passively observe a forward self-motion whose onset and offset they could not predict. In the next step, participants reproduced double the distance of the previously observed self-motion. Since they actively modulated the movement to reach the desired distance, the resulting self-motion onset and offset could be predicted. Comparing the visually evoked potentials (VEPs) after self-motion onset and offset of the predicted and unpredicted self-motion, I found differences supporting predictive coding theory. Amplitudes of the self-motion onset VEPs were larger in the passive condition. For the self-motion offset, I found longer latencies of the VEP components in the passive condition. In addition, I searched for a neural correlate of the subjective estimation of the distance presented in the passive condition. During the active reproduction of double the distance, the single distance was necessarily passed along the way. I assumed that half of the reproduced double distance would correspond to the subjective estimate of the single distance. When passing this subjective single distance, an increase in alpha-band activity was detected in half of the participants. At this point in time, the prediction about the upcoming movement changed, since participants started reproducing the single distance again. In the context of predictive coding theory, such prediction changes are considered feedback processes, and previous studies have shown that these kinds of feedback processes are associated with alpha oscillations.
With this study, I demonstrated the influence of prediction on self-motion onset and offset VEPs as well as on brain oscillations during a distance-reproduction experiment. In conclusion, in this thesis I analyzed the neural correlates of the processing of self-motion directions and traveled distance. The underlying neural mechanisms appear to be very similar in humans and macaque monkeys, which supports the macaque monkey as an appropriate animal model for human sensorimotor processing. Lastly, I investigated the influence of prediction on EEG components recorded during the processing of self-motion directions and traveled distances.

    Deep temporal motion descriptor (DTMD) for human action recognition

    Spatiotemporal features are of significant importance in human action recognition, as they provide the actor's shape and motion characteristics specific to each action class. This paper presents a new deep spatiotemporal human action representation, the "Deep Temporal Motion Descriptor (DTMD)", which shares the attributes of holistic and deep-learned features. To generate the DTMD descriptor, the actor's silhouettes are gathered into single motion templates by applying motion history images. These motion templates capture the spatiotemporal movements of the actor and compactly represent the human actions in a single 2D template. Then, deep convolutional neural networks are used to compute discriminative deep features from the motion history templates to produce DTMD. Finally, DTMD is used to learn a model that recognises human actions using a softmax classifier. The advantages of DTMD are that (i) DTMD is automatically learned from videos and contains a higher-dimensional discriminative spatiotemporal representation compared to handcrafted features; (ii) DTMD reduces the computational complexity of human activity recognition, as all the video frames are compactly represented as a single motion template; and (iii) DTMD works effectively for single- and multi-view action recognition. We conducted experiments on three challenging datasets: MuHAVi-Uncut, IXMAS, and IAVID-1. The experimental findings reveal that DTMD outperforms previous methods and achieves the highest action prediction rate on the MuHAVi-Uncut dataset.
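The motion-history-image step that collapses a silhouette sequence into a single 2D template can be sketched as follows; this is a minimal NumPy version of the classic MHI recurrence (per-frame linear decay), with the function name and normalisation chosen here for illustration, not taken from the paper:

```python
import numpy as np

def motion_history_image(silhouettes, tau=None):
    """Collapse a sequence of binary silhouettes into one motion template.

    silhouettes: (T, H, W) array of 0/1 masks, oldest frame first.
    tau: decay duration (defaults to the sequence length T).
    Recent motion appears bright; older motion fades linearly.
    """
    T = silhouettes.shape[0]
    tau = T if tau is None else tau
    mhi = np.zeros(silhouettes.shape[1:], dtype=float)
    for frame in silhouettes:
        # Active pixels are reset to tau; inactive pixels decay by one step
        mhi = np.where(frame > 0, tau, np.maximum(mhi - 1.0, 0.0))
    return mhi / tau   # normalise to [0, 1] before feeding the CNN
```

The resulting single-channel image encodes where and how recently motion occurred, which is what makes a 2D CNN applicable to the whole clip at once.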

    Human behavior understanding and intention prediction

    Human motion, behaviors, and intentions are governed by human perception, reasoning, common-sense rules, social conventions, and interactions with others and with the surrounding environment. Humans can effectively predict the short-term body motion, behaviors, and intentions of others and respond accordingly. The ability of a machine to learn, analyze, and predict human motion, behaviors, and intentions in complex environments is highly valuable, with a wide range of applications in social robots, intelligent systems, smart manufacturing, autonomous driving, and smart homes. In this thesis, we address this research question by focusing on three important problems: human pose estimation, temporal action localization and informatics, and human motion trajectory and intention prediction. Specifically, in the first part of our work, we aim to develop an automatic system to track human pose and to monitor and evaluate worker efficiency for smart workforce management, based on human body pose estimation and temporal activity localization. We have developed a deep-learning-based method to accurately detect human body joints and track human motion. We use generative adversarial networks (GANs) for adversarial training to better learn human pose and body configurations, especially in highly cluttered environments. In the second step, we formulate automated worker efficiency analysis as a temporal action localization problem in which the action video performed by the worker is matched against a reference video performed by a teacher using dynamic time warping. In the second part of our work, we have developed a new idea, called reciprocal learning, based on the following important observation: a human trajectory is not only forward predictable but also backward predictable. Both forward and backward trajectories follow the same social norms and obey the same physical constraints, the only difference being their time directions.
Based on this unique property, we design and couple two networks, a forward and a backward prediction network, satisfying a reciprocal constraint that allows them to be jointly learned. Based on this constraint, we borrow the concept of adversarial attacks on deep neural networks, which iteratively modify the input of a network to match a given or forced output, and develop a new method for network prediction, called reciprocal attack for matched prediction, which further improves the prediction accuracy. In the third part of our work, we observe that a person's future trajectory is affected not only by other pedestrians but also by the surrounding objects in the scene. We propose a novel hierarchical framework based on a recurrent sequence-to-sequence architecture to model both human-human and human-scene interactions. Our experimental results on benchmark datasets demonstrate that our new method outperforms the state-of-the-art methods for human trajectory prediction.
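The dynamic time warping used above to match a worker's action video against a teacher's reference can be sketched in a few lines. This is the textbook DTW recurrence on 1-D feature sequences, not the thesis's implementation; the function name and the use of absolute difference as the local cost are assumptions:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences.

    Returns the minimal cumulative alignment cost, allowing one sequence
    (e.g. the worker's) to run faster or slower than the other (the
    teacher's reference) while preserving the order of events.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

A low DTW cost between the worker's sequence and the reference indicates the same actions performed in the same order, regardless of pace, which is exactly the tolerance an efficiency-analysis system needs.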