8 research outputs found

    TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild

    Joint forecasting of human trajectory and pose dynamics is a fundamental building block of various applications ranging from robotics and autonomous driving to surveillance systems. Predicting body dynamics requires capturing subtle information embedded in humans' interactions with each other and with the objects present in the scene. In this paper, we propose a novel TRajectory and POse Dynamics method (nicknamed TRiPOD) based on graph attentional networks to model human-human and human-object interactions in both the input space and the output space (the decoded future output). The model is supplemented by a message passing interface over the graphs to fuse these different levels of interactions efficiently. Furthermore, to incorporate a real-world challenge, we propose to learn an indicator representing whether an estimated body joint is visible or invisible at each frame, e.g. due to occlusion or being outside the sensor's field of view. Finally, we introduce a new benchmark for this joint task based on two challenging datasets (PoseTrack and 3DPW) and propose evaluation metrics to measure the effectiveness of predictions in the global space, even when joints are invisible. Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art methods designed specifically for either trajectory or pose forecasting.
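
    The abstract describes graph attention over people and scene objects, message passing to fuse interaction levels, and a learned per-joint visibility indicator. The paper's actual implementation is not reproduced here; the PyTorch sketch below only illustrates that general idea, and all class names, feature dimensions, and the single-head attention formulation are assumptions of this illustration rather than TRiPOD's architecture.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class GraphAttentionLayer(nn.Module):
            # Single-head attention over a fully connected graph of people and objects.
            def __init__(self, in_dim, out_dim):
                super().__init__()
                self.proj = nn.Linear(in_dim, out_dim, bias=False)
                self.attn = nn.Linear(2 * out_dim, 1, bias=False)

            def forward(self, h):
                # h: (N, in_dim) node features for N people/objects in the scene
                z = self.proj(h)
                n = z.size(0)
                pairs = torch.cat([z.unsqueeze(1).expand(n, n, -1),
                                   z.unsqueeze(0).expand(n, n, -1)], dim=-1)
                alpha = torch.softmax(F.leaky_relu(self.attn(pairs)).squeeze(-1), dim=-1)
                return F.elu(alpha @ z)          # per-node aggregation of neighbour messages

        class TrajectoryPoseForecaster(nn.Module):
            # Fuses interaction context, then decodes future joints plus a visibility logit.
            def __init__(self, in_dim=64, hid=64, num_joints=14, horizon=15):
                super().__init__()
                self.gat = GraphAttentionLayer(in_dim, hid)
                self.pose_head = nn.Linear(hid, horizon * num_joints * 3)
                self.vis_head = nn.Linear(hid, horizon * num_joints)
                self.horizon, self.num_joints = horizon, num_joints

            def forward(self, node_feats):
                ctx = self.gat(node_feats)
                pose = self.pose_head(ctx).view(-1, self.horizon, self.num_joints, 3)
                vis_logits = self.vis_head(ctx).view(-1, self.horizon, self.num_joints)
                return pose, vis_logits

        feats = torch.randn(5, 64)                 # e.g. 3 people + 2 scene objects
        pose, vis = TrajectoryPoseForecaster()(feats)
        print(pose.shape, vis.shape)               # (5, 15, 14, 3) and (5, 15, 14)

    In a fuller model along the lines of the abstract, separate graphs for input-space and output-space interactions would be fused through message passing, and the visibility logits would be trained jointly with the pose loss to handle occluded joints.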

    Multiagent off-screen behavior prediction in football

    In multiagent worlds, several decision-making individuals interact while adhering to the dynamics constraints imposed by the environment. These interactions, combined with the potential stochasticity of the agents' dynamic behaviors, make such systems complex and interesting to study from a decision-making perspective. Significant research has been conducted on learning models for forward-direction estimation of agent behaviors, for example, pedestrian predictions used for collision avoidance in self-driving cars. In many settings, only sporadic observations of agents may be available in a given trajectory sequence. In football, subsets of players may come in and out of view of broadcast video footage, while unobserved players continue to interact off-screen. In this paper, we study the problem of multiagent time-series imputation in the context of human football play, where available past and future observations of subsets of agents are used to estimate missing observations for other agents. Our approach, called the Graph Imputer, uses past and future information in combination with graph networks and variational autoencoders to learn a distribution over imputed trajectories. We demonstrate our approach in multiagent settings with partially observable players, using the Graph Imputer to predict the behaviors of off-screen players. To quantitatively evaluate the approach, we conduct experiments on football matches with ground-truth trajectory data, using a camera module to simulate the off-screen player state estimation setting. We subsequently use our approach for downstream football analytics under partial observability using the well-established framework of pitch control, which traditionally relies on fully observed data. We show that our method outperforms several state-of-the-art approaches, including those hand-crafted for football, across all considered metrics.
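
    The Graph Imputer itself combines graph networks with variational autoencoders; the sketch below is a deliberately simplified, per-player bidirectional VAE with no graph component, meant only to illustrate how past and future observations of a partially observed trajectory can jointly inform the imputed frames. All names, dimensions, and the loss weighting are assumptions of this illustration, not the paper's model.

        import torch
        import torch.nn as nn

        class TrajectoryImputerVAE(nn.Module):
            # Encodes the masked trajectory bidirectionally (so both past and future
            # observations inform each frame), samples a latent, and reconstructs.
            def __init__(self, coord_dim=2, hid=32, latent=16):
                super().__init__()
                self.encoder = nn.GRU(coord_dim + 1, hid, batch_first=True, bidirectional=True)
                self.to_mu = nn.Linear(2 * hid, latent)
                self.to_logvar = nn.Linear(2 * hid, latent)
                self.decoder = nn.GRU(latent, hid, batch_first=True)
                self.out = nn.Linear(hid, coord_dim)

            def forward(self, traj, observed):
                # traj: (B, T, 2) pitch positions; observed: (B, T), 0 where off-screen
                x = torch.cat([traj * observed.unsqueeze(-1), observed.unsqueeze(-1)], dim=-1)
                h, _ = self.encoder(x)
                mu, logvar = self.to_mu(h), self.to_logvar(h)
                z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterisation
                dec, _ = self.decoder(z)
                return self.out(dec), mu, logvar                       # imputed (B, T, 2)

        def imputation_loss(recon, traj, observed, mu, logvar, beta=1e-3):
            # Reconstruct observed frames; the KL term regularises the latent distribution.
            rec = (((recon - traj) ** 2).sum(-1) * observed).sum() / observed.sum().clamp(min=1)
            kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
            return rec + beta * kl

        model = TrajectoryImputerVAE()
        traj = torch.randn(4, 50, 2)                  # 4 players, 50 frames
        observed = (torch.rand(4, 50) > 0.3).float()  # some frames "off-screen"
        recon, mu, logvar = model(traj, observed)
        print(imputation_loss(recon, traj, observed, mu, logvar))

    Sampling the latent several times yields a distribution over plausible off-screen trajectories, which is the behaviour the abstract attributes to the learned imputation model.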

    Human behavior understanding and intention prediction

    Human motion, behaviors, and intention are governed by human perception, reasoning, common-sense rules, social conventions, and interactions with others and the surrounding environment. Humans can effectively predict the short-term body motion, behaviors, and intention of others and respond accordingly. The ability for a machine to learn, analyze, and predict human motion, behaviors, and intentions in complex environments is highly valuable, with a wide range of applications in social robots, intelligent systems, smart manufacturing, autonomous driving, and smart homes. In this thesis, we address this research question by focusing on three important problems: human pose estimation; temporal action localization and informatics; and human motion trajectory and intention prediction.

    In the first part of our work, we aim to develop an automatic system to track human pose and to monitor and evaluate a worker's efficiency for smart workforce management, based on human body pose estimation and temporal activity localization. We have developed a deep learning based method to accurately detect human body joints and track human motion. We use generative adversarial networks (GANs) for adversarial training to better learn human pose and body configurations, especially in highly cluttered environments. We then formulate automated worker efficiency analysis as a temporal action localization problem in which the action video performed by the worker is matched against a reference video performed by a teacher using dynamic time warping.

    In the second part of our work, we have developed a new idea, called reciprocal learning, based on the following important observation: the human trajectory is not only forward predictable, but also backward predictable. Both forward and backward trajectories follow the same social norms and obey the same physical constraints, with the only difference being their time directions. Based on this unique property, we design and couple two networks, a forward and a backward prediction network, satisfying the reciprocal constraint, which allows them to be jointly learned. Building on this constraint, we borrow the concept of adversarial attacks on deep neural networks, which iteratively modify the input of a network to match a given or forced output, and develop a new prediction method, called reciprocal attack for matched prediction, which further improves prediction accuracy.

    In the third part of our work, we observe that a human's future trajectory is affected not only by other pedestrians but also by the surrounding objects in the scene. We propose a novel hierarchical framework based on a recurrent sequence-to-sequence architecture to model both human-human and human-scene interactions. Our experimental results on benchmark datasets demonstrate that our new method outperforms state-of-the-art methods for human trajectory prediction.
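
    As a toy illustration of the reciprocal-learning idea described in the second part (not the thesis's actual networks), the sketch below couples a forward predictor and a backward predictor with a consistency term that asks the backward network to recover the observed past from the forward network's predicted future. All module names, horizons, and dimensions are hypothetical.

        import torch
        import torch.nn as nn

        class SeqPredictor(nn.Module):
            # GRU encoder-decoder used for both the forward and the backward network.
            def __init__(self, coord_dim=2, hid=32, horizon=12):
                super().__init__()
                self.enc = nn.GRU(coord_dim, hid, batch_first=True)
                self.dec = nn.GRU(coord_dim, hid, batch_first=True)
                self.out = nn.Linear(hid, coord_dim)
                self.horizon = horizon

            def forward(self, obs):
                # obs: (B, T_obs, 2); autoregressively roll out `horizon` steps
                _, h = self.enc(obs)
                step, preds = obs[:, -1:, :], []
                for _ in range(self.horizon):
                    o, h = self.dec(step, h)
                    step = self.out(o)
                    preds.append(step)
                return torch.cat(preds, dim=1)

        forward_net = SeqPredictor(horizon=12)   # past -> future
        backward_net = SeqPredictor(horizon=8)   # time-reversed future -> time-reversed past

        past = torch.randn(16, 8, 2)             # 16 pedestrians, 8 observed frames
        future = torch.randn(16, 12, 2)          # ground-truth future, 12 frames

        pred_future = forward_net(past)
        pred_past = backward_net(torch.flip(future, dims=[1]))

        # Supervised terms plus a reciprocal-consistency term: feeding the forward
        # network's prediction (time-reversed) to the backward network should
        # recover the observed past.
        loss = ((pred_future - future).pow(2).mean()
                + (pred_past - torch.flip(past, dims=[1])).pow(2).mean()
                + (backward_net(torch.flip(pred_future, dims=[1]))
                   - torch.flip(past, dims=[1])).pow(2).mean())
        loss.backward()

    The reciprocal term couples the two networks during joint training; the thesis additionally exploits this coupling at inference time via its reciprocal-attack procedure, which is not sketched here.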