
    Human behavior understanding and intention prediction

    Human motion, behaviors, and intentions are governed by perception, reasoning, common-sense rules, social conventions, and interactions with others and the surrounding environment. Humans can effectively predict the short-term body motion, behaviors, and intentions of others and respond accordingly. The ability of a machine to learn, analyze, and predict human motion, behaviors, and intentions in complex environments is highly valuable, with a wide range of applications in social robots, intelligent systems, smart manufacturing, autonomous driving, and smart homes. In this thesis, we address the above research question by focusing on three important problems: human pose estimation, temporal action localization and informatics, and human motion trajectory and intention prediction. Specifically, in the first part of our work, we aim to develop an automatic system to track human pose and to monitor and evaluate worker efficiency for smart workforce management, based on human body pose estimation and temporal activity localization. We have developed a deep-learning-based method to accurately detect human body joints and track human motion. We use generative adversarial networks (GANs) for adversarial training to better learn human pose and body configurations, especially in highly cluttered environments. In the second step, we have formulated automated worker efficiency analysis as a temporal action localization problem in which the action video performed by the worker is matched against a reference video performed by a teacher using dynamic time warping. In the second part of our work, we have developed a new idea, called reciprocal learning, based on the following important observation: the human trajectory is not only forward predictable, but also backward predictable. Both forward and backward trajectories follow the same social norms and obey the same physical constraints; the only difference is their time direction. Based on this unique property, we design and couple two networks, forward and backward prediction networks, satisfying the reciprocal constraint, which allows them to be jointly learned. Based on this constraint, we borrow the concept of adversarial attacks on deep neural networks, which iteratively modify the input of the network to match a given or forced network output, and develop a new method for network prediction, called reciprocal attack for matched prediction, which further improves the prediction accuracy. In the third part of our work, we have observed that a person's future trajectory is affected not only by other pedestrians but also by the surrounding objects in the scene. We propose a novel hierarchical framework based on a recurrent sequence-to-sequence architecture to model both human-human and human-scene interactions. Our experimental results on benchmark datasets demonstrate that our new method outperforms the state-of-the-art methods for human trajectory prediction.
    Includes bibliographical references (pages 108-129).
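
The abstract above mentions matching a worker's action video against a reference video with dynamic time warping. A minimal sketch of the textbook DTW recurrence follows. It assumes each video has already been reduced to a sequence of per-frame feature vectors (for example, joint coordinates); the function name `dtw_distance` and the Euclidean frame distance are illustrative choices, not details taken from the thesis.

```python
def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the best monotonic alignment between two sequences.

    seq_a, seq_b: lists of equal-length feature vectors (hypothetical per-frame
    features; the thesis does not specify them here).
    """
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best cost of aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames (an assumed metric).
            d = sum((x - y) ** 2 for x, y in zip(seq_a[i - 1], seq_b[j - 1])) ** 0.5
            # Extend the cheapest of: skip in seq_a, skip in seq_b, or match both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    reference = [[0.0], [1.0], [2.0]]
    slower_worker = [[0.0], [1.0], [1.0], [2.0]]  # same motion, one repeated frame
    print(dtw_distance(reference, slower_worker))
```

Because the alignment may repeat frames, a worker who performs the same motion more slowly than the reference still yields a low cost, which is the property that makes DTW a plausible fit for this kind of matching.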

    On the Design of 2D Human Pose Estimation Networks using Accelerated Neuroevolution and Novel Keypoint Representations

    Motion capture is a very useful technology that is employed across many industries. Biomechanical analysis, film production, video game development, and virtual reality are among its diverse applications. However, traditional marker-based motion capture systems are limited by their invasiveness, excessive cost, and lack of portability. Human pose estimation represents a promising markerless alternative, where 3D human poses are estimated from RGB images obtained using single or multi-camera setups. The estimation of 2D poses serves as the main foundation for these systems. As such, the development of accurate and efficient 2D human pose estimation algorithms is critical to the overall advancement of markerless motion capture, and that is the focus of this thesis. Two novel convolutional neural networks for 2D human pose estimation are presented, one for each of the two multi-person estimation paradigms (i.e., single-stage and two-stage). Motivated by the recent use of neural architecture search for convolutional neural network design, a novel neuroevolution framework is introduced and is leveraged in the design of a computationally efficient heatmap-based human pose estimation network for use in the two-stage paradigm. The neuroevolution was accelerated by a novel weight transfer scheme that relaxes the complete function-preservation constraint imposed by previous methods. Recognizing the drawbacks of heatmaps, including the inherent issue of quantization error and the excessive computation required to generate and post-process large heatmap fields, two novel heatmap-free keypoint representations are introduced for modeling keypoint locations more efficiently. Drawing inspiration from single-stage object detectors, the representations are centered around modeling individual keypoints and sets of spatially related keypoints (i.e., poses) as objects. It is found that pose objects lend themselves well to single-stage human pose estimation, and a method is introduced that jointly learns human pose objects and keypoint objects and fuses the detections to exploit the strengths of both object representations. At the time of development, both networks achieved state-of-the-art accuracy in their respective categories on the most established multi-person human pose estimation benchmarks when scaled. Moreover, they are more computationally efficient than previous networks, as shown by having fewer parameters, fewer floating-point operations, and faster inference speed. The two-stage model is recommended for accuracy-critical scenarios with large computational budgets, whereas the single-stage model is more efficient and has greater potential for future research. It is hoped that these new models advance the current state of markerless motion capture systems based on human pose estimation.
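
The quantization error of heatmaps mentioned in the abstract above can be illustrated with a small sketch. It assumes a heatmap whose resolution is the image resolution divided by an integer `stride`, and a decoder that takes the peak cell with no sub-pixel refinement; both are simplifying assumptions for illustration, and the helper names are hypothetical rather than drawn from the thesis.

```python
import math


def heatmap_roundtrip(x, stride):
    """Encode a coordinate to its nearest heatmap cell, then decode it back.

    Assumes plain argmax decoding with no sub-pixel refinement.
    """
    cell = math.floor(x / stride + 0.5)  # nearest cell index on the heatmap grid
    return cell * stride                 # coordinate recovered in image space


def quantization_error(x, stride):
    """Absolute localization error introduced by the round trip."""
    return abs(x - heatmap_roundtrip(x, stride))


if __name__ == "__main__":
    stride = 4  # assumed downsampling factor between image and heatmap
    for x in (100.0, 101.0, 101.9):
        print(x, heatmap_roundtrip(x, stride), quantization_error(x, stride))
```

Under these assumptions the error per coordinate can reach half the stride, so a coarser heatmap is cheaper to produce but localizes keypoints less precisely. Representations that regress coordinates directly avoid this particular trade-off.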