27 research outputs found

    Disentangling Human Dynamics for Pedestrian Locomotion Forecasting with Noisy Supervision

    We tackle the problem of Human Locomotion Forecasting, a task for jointly predicting the spatial positions of several keypoints on the human body in the near future under an egocentric setting. In contrast to previous work that aims to solve either the task of pose prediction or trajectory forecasting in isolation, we propose a framework to unify the two problems and address the practically useful task of pedestrian locomotion prediction in the wild. Among the major challenges in solving this task is the scarcity of annotated egocentric video datasets with dense annotations for pose, depth, or egomotion. To surmount this difficulty, we use state-of-the-art models to generate (noisy) annotations and propose robust forecasting models that can learn from this noisy supervision. We present a method to disentangle the overall pedestrian motion into easier-to-learn subparts by utilizing a pose completion and a decomposition module. The completion module fills in the missing keypoint annotations, and the decomposition module breaks the cleaned locomotion down into global (trajectory) and local (pose keypoint) movements. Further, with a Quasi-RNN as our backbone, we propose a novel hierarchical trajectory forecasting network that utilizes low-level, domain-specific vision signals such as egomotion and depth to predict the global trajectory. Our method leads to state-of-the-art results for the prediction of human locomotion in the egocentric view. Project page: https://karttikeya.github.io/publication/plf/. Comment: Accepted to WACV 2020 (Oral).
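
    A minimal sketch of the trajectory/pose split this abstract describes, assuming the global component is approximated by the per-frame centroid of the keypoints (the paper's decomposition module is learned, so the actual split may differ):

```python
import numpy as np

def decompose_locomotion(keypoints):
    """Split a keypoint sequence into a global trajectory and local pose motion.

    keypoints: array of shape (T, K, 2) -- T frames, K 2D keypoints.
    Returns (trajectory, local_pose) where trajectory has shape (T, 2)
    and local_pose has shape (T, K, 2).
    """
    # Global trajectory: per-frame centroid of all keypoints (a simple proxy
    # for the learned decomposition module described in the abstract).
    trajectory = keypoints.mean(axis=1)
    # Local pose: keypoint positions expressed relative to the trajectory.
    local_pose = keypoints - trajectory[:, None, :]
    return trajectory, local_pose

def recompose(trajectory, local_pose):
    """Inverse of decompose_locomotion: add the trajectory back onto local poses."""
    return local_pose + trajectory[:, None, :]

if __name__ == "__main__":
    seq = np.random.rand(30, 17, 2)          # 30 frames, 17 COCO-style keypoints
    traj, local = decompose_locomotion(seq)
    assert np.allclose(recompose(traj, local), seq)
```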

    Indoor future person localization from an egocentric wearable camera

    Accurate prediction of future person location and movement trajectory from an egocentric wearable camera can benefit a wide range of applications, such as assisting visually impaired people in navigation and developing mobility assistance for people with disabilities. In this work, a new egocentric dataset was constructed using a wearable camera, with 8,250 short clips of a targeted person either walking 1) toward, 2) away from, or 3) across the camera wearer in indoor environments, or 4) staying still in the scene, and 13,817 person bounding boxes were manually labelled. Apart from the bounding boxes, the dataset also contains the estimated pose of the targeted person as well as the IMU signal of the wearable camera at each time point. An LSTM-based encoder-decoder framework was designed to predict the future location and movement trajectory of the targeted person in this egocentric setting. Extensive experiments on the new dataset show that the proposed method reliably predicts future person location and trajectory in egocentric videos captured by the wearable camera, outperforming three baselines.
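
    A minimal PyTorch sketch of an LSTM encoder-decoder of the kind described, rolling out future 2D locations from past observations; the feature set (box center plus size) and layer sizes here are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class Seq2SeqLocator(nn.Module):
    """Encoder-decoder LSTM that predicts future 2D locations from past observations."""

    def __init__(self, in_dim=4, hidden=128, out_dim=2):
        super().__init__()
        self.encoder = nn.LSTM(in_dim, hidden, batch_first=True)
        self.decoder = nn.LSTMCell(out_dim, hidden)
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, past, horizon=15):
        # past: (B, T_obs, in_dim) -- e.g. box center and size; pose/IMU features
        # could be concatenated here as extra input dimensions.
        _, (h, c) = self.encoder(past)
        h, c = h[-1], c[-1]                     # final encoder state, shape (B, hidden)
        pos = past[:, -1, :2]                   # start from the last observed center
        preds = []
        for _ in range(horizon):
            h, c = self.decoder(pos, (h, c))
            pos = pos + self.head(h)            # predict a displacement, then integrate
            preds.append(pos)
        return torch.stack(preds, dim=1)        # (B, horizon, 2)

model = Seq2SeqLocator(in_dim=4)
past = torch.randn(8, 10, 4)                    # 8 clips, 10 observed frames
future = model(past, horizon=15)                # (8, 15, 2)
```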

    Toward Reliable Human Pose Forecasting with Uncertainty

    Recently, there has been an arms race of pose forecasting methods aimed at solving the spatio-temporal task of predicting a sequence of future 3D poses of a person given a sequence of past observed ones. However, the lack of unified benchmarks and limited uncertainty analysis have hindered progress in the field. To address this, we first develop an open-source library for human pose forecasting, featuring multiple models, datasets, and standardized evaluation metrics, with the aim of promoting research and moving toward a unified and fair evaluation. Second, we devise two types of uncertainty in the problem to increase performance and convey better trust: 1) we propose a method for modeling aleatoric uncertainty by using uncertainty priors to inject knowledge about the behavior of uncertainty; this focuses the capacity of the model in the direction of more meaningful supervision while reducing the number of learned parameters and improving stability; 2) we introduce a novel approach for quantifying the epistemic uncertainty of any model through clustering and measuring the entropy of its assignments. Our experiments demonstrate up to 25% improvements in accuracy and better performance in uncertainty estimation.
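
    One plausible reading of the clustering-based epistemic uncertainty idea, sketched with scikit-learn: cluster a set of the model's predictions and take the entropy of the cluster-assignment distribution as the uncertainty score (the paper's exact procedure may differ):

```python
import numpy as np
from sklearn.cluster import KMeans

def epistemic_uncertainty(pred_poses, n_clusters=8, seed=0):
    """Epistemic-uncertainty proxy: entropy of cluster assignments of predictions.

    pred_poses: (N, D) array of flattened predicted poses (e.g. N stochastic
    forward passes, or predictions collected over a validation set).
    Returns entropy in nats; under this proxy, predictions spread over many
    modes give higher entropy and hence higher epistemic uncertainty.
    """
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(pred_poses)
    probs = np.bincount(labels, minlength=n_clusters) / len(labels)
    probs = probs[probs > 0]
    return float(-(probs * np.log(probs)).sum())

# Example: 200 predicted poses with 17 joints x 3 coordinates each.
preds = np.random.randn(200, 17 * 3)
print(epistemic_uncertainty(preds))
```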

    TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild

    Joint forecasting of human trajectory and pose dynamics is a fundamental building block of various applications ranging from robotics and autonomous driving to surveillance systems. Predicting body dynamics requires capturing subtle information embedded in humans' interactions with each other and with the objects present in the scene. In this paper, we propose a novel TRajectory and POse Dynamics (nicknamed TRiPOD) method based on graph attentional networks to model human-human and human-object interactions both in the input space and in the output space (decoded future output). The model is supplemented by a message passing interface over the graphs to fuse these different levels of interactions efficiently. Furthermore, to incorporate a real-world challenge, we propose to learn an indicator representing whether an estimated body joint is visible or invisible at each frame, e.g. due to occlusion or being outside the sensor field of view. Finally, we introduce a new benchmark for this joint task based on two challenging datasets (PoseTrack and 3DPW) and propose evaluation metrics to measure the effectiveness of predictions in the global space, even when some joints are invisible. Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art methods specifically designed for each of the trajectory and pose forecasting tasks.
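
    A minimal sketch of a visibility-aware forecasting loss in the spirit of the learned visible/invisible indicator described above; the weighting and exact formulation are assumptions, not TRiPOD's actual objective:

```python
import torch
import torch.nn.functional as F

def visibility_aware_loss(pred_joints, pred_vis_logits, gt_joints, gt_visible, vis_weight=0.1):
    """Joint forecasting loss with a learned per-joint visibility indicator.

    pred_joints:     (B, T, J, 3) predicted joint positions
    pred_vis_logits: (B, T, J)    logits for the visible/invisible indicator
    gt_joints:       (B, T, J, 3) ground-truth positions (undefined where invisible)
    gt_visible:      (B, T, J)    1.0 if the joint is annotated/visible, else 0.0
    """
    # Position error is counted only where the ground truth is actually visible.
    sq_err = ((pred_joints - gt_joints) ** 2).sum(dim=-1)               # (B, T, J)
    pos_loss = (sq_err * gt_visible).sum() / gt_visible.sum().clamp(min=1.0)
    # The visibility indicator itself is supervised with binary cross-entropy.
    vis_loss = F.binary_cross_entropy_with_logits(pred_vis_logits, gt_visible)
    return pos_loss + vis_weight * vis_loss
```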

    A Quadruple Diffusion Convolutional Recurrent Network for Human Motion Prediction

    Recurrent neural networks (RNNs) have become popular for human motion prediction thanks to their ability to capture temporal dependencies. However, they have limited capacity for modeling the complex spatial relationships in the human skeletal structure. In this work, we present a novel diffusion convolutional recurrent predictor for spatial and temporal movement forecasting, with multi-step random walks traversing bidirectionally along an adaptive graph to model interdependency among body joints. In the temporal domain, existing methods rely on a single forward predictor whose produced motion drifts over time, leading to error accumulation. We propose to supplement the forward predictor with a forward discriminator to alleviate such motion drift in the long term under adversarial training. The solution is further enhanced by a backward predictor and a backward discriminator to effectively reduce the error, such that the system can also look into the past to improve the prediction at early frames. The two-way spatial diffusion convolutions and two-way temporal predictors together form a quadruple network. Furthermore, we train our framework by modeling velocities derived from observed motion dynamics, instead of static poses, to predict future movements, which effectively reduces the discontinuity problem at early prediction. Our method outperforms the state of the art on both 3D and 2D datasets, including the Human3.6M, CMU Motion Capture and Penn Action datasets. The results also show that our method correctly predicts both high-dynamic and low-dynamic moving trends with less motion drift.
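
    A minimal sketch of the velocity-based target representation mentioned in the abstract: train on frame-to-frame velocities and integrate predictions forward from the last observed pose, which keeps the observation-to-prediction transition continuous (tensor shapes and scales below are illustrative):

```python
import torch

def poses_to_velocities(poses):
    """Convert a pose sequence (B, T, D) into per-frame velocities (B, T-1, D)."""
    return poses[:, 1:] - poses[:, :-1]

def integrate_velocities(last_observed_pose, pred_velocities):
    """Roll predicted velocities forward from the last observed pose.

    last_observed_pose: (B, D)
    pred_velocities:    (B, T_future, D)
    Returns predicted poses (B, T_future, D); because the first prediction is
    the last pose plus a (typically small) velocity, there is no jump between
    the observed and predicted segments.
    """
    return last_observed_pose.unsqueeze(1) + pred_velocities.cumsum(dim=1)

obs = torch.randn(4, 50, 17 * 3)                 # 50 observed frames of 17 3D joints
vel_targets = poses_to_velocities(obs)           # training targets in velocity space
pred_vel = torch.randn(4, 25, 17 * 3) * 0.01     # stand-in for a model's velocity output
future = integrate_velocities(obs[:, -1], pred_vel)
```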

    Towards Explainable, Privacy-Preserved Human-Motion Affect Recognition

    Human motion characteristics are used to monitor the progression of neurological diseases and mood disorders. Since perceptions of emotions are also interleaved with body posture and movements, emotion recognition from human gait can be used to quantitatively monitor mood changes. Many existing solutions use shallow machine learning models with raw positional data or manually extracted features to achieve this. However, gait is composed of many highly expressive characteristics that can be used to identify human subjects, and most solutions fail to address this, disregarding the subject's privacy. This work introduces a novel deep neural network architecture to disentangle human emotions and biometrics. In particular, we propose a cross-subject transfer learning technique for training a multi-encoder autoencoder deep neural network to learn disentangled latent representations of human motion features. By disentangling subject biometrics from the gait data, we show that the subject's privacy is preserved while the affect recognition performance outperforms traditional methods. Furthermore, we exploit Guided Grad-CAM to provide global explanations of the model's decisions across gait cycles. We evaluate the effectiveness of our method against existing methods at recognizing emotions using both 3D temporal joint signals and manually extracted features. We also show that this data can easily be exploited to expose a subject's identity. Our method shows up to 7% improvement and highlights the joints with the most significant influence across the average gait cycle.
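
    A minimal PyTorch sketch of a multi-encoder autoencoder that splits gait features into an affect code and an identity (biometric) code, with emotion recognition driven only by the affect code; the dimensions, losses, and the paper's cross-subject transfer scheme are not reproduced here:

```python
import torch
import torch.nn as nn

class DisentanglingAutoencoder(nn.Module):
    """Two encoders split a gait feature vector into an affect code and a
    biometric (identity) code; a shared decoder reconstructs the input."""

    def __init__(self, in_dim=96, affect_dim=16, id_dim=16, n_emotions=4):
        super().__init__()
        self.affect_enc = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, affect_dim))
        self.id_enc = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, id_dim))
        self.decoder = nn.Sequential(nn.Linear(affect_dim + id_dim, 64), nn.ReLU(), nn.Linear(64, in_dim))
        self.affect_head = nn.Linear(affect_dim, n_emotions)

    def forward(self, x):
        z_affect, z_id = self.affect_enc(x), self.id_enc(x)
        recon = self.decoder(torch.cat([z_affect, z_id], dim=-1))
        emotion_logits = self.affect_head(z_affect)   # only the affect code feeds recognition
        return recon, emotion_logits, z_affect, z_id

model = DisentanglingAutoencoder()
x = torch.randn(32, 96)                               # batch of flattened gait features
recon, logits, z_a, z_i = model(x)
loss = nn.functional.mse_loss(recon, x) + nn.functional.cross_entropy(logits, torch.randint(0, 4, (32,)))
```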