    Digging Deeper into Egocentric Gaze Prediction

    This paper digs deeper into the factors that influence egocentric gaze. Instead of training deep models for this purpose in a blind manner, we propose to inspect the factors that contribute to gaze guidance during daily tasks. Bottom-up saliency and optical flow are assessed against strong spatial prior baselines. Task-specific cues such as the vanishing point, manipulation point, and hand regions are analyzed as representatives of top-down information. We also look into the contribution of these factors by investigating a simple recurrent neural model for egocentric gaze prediction. First, deep features are extracted for all input video frames. Then, a gated recurrent unit is employed to integrate information over time and to predict the next fixation. We also propose an integrated model that combines the recurrent model with several top-down and bottom-up cues. Extensive experiments over multiple datasets reveal that (1) spatial biases are strong in egocentric videos, (2) bottom-up saliency models perform poorly in predicting gaze and underperform spatial biases, (3) deep features perform better than traditional features, (4) in contrast to hand regions, the manipulation point is a strongly influential cue for gaze prediction, (5) combining the proposed recurrent model with bottom-up cues, vanishing points, and, in particular, the manipulation point yields the best gaze prediction accuracy on egocentric videos, (6) knowledge transfer works best when the tasks or sequences are similar, and (7) task and activity recognition can benefit from gaze prediction. Our findings suggest that (1) there should be more emphasis on hand-object interaction and (2) the egocentric vision community should consider larger datasets that include diverse stimuli and more subjects.
    Comment: presented at WACV 2019
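    As a minimal sketch of the recurrent model described in this abstract (per-frame deep features fed to a gated recurrent unit that predicts the next fixation), the PyTorch snippet below is illustrative only; the FixationGRU name, the 2048-dimensional features, and the 2-D fixation output are assumptions rather than the authors' released code.

```python
# Minimal sketch of a GRU-based egocentric gaze predictor, assuming
# pre-extracted per-frame deep features (e.g. from a CNN backbone).
# Names, dimensions, and the 2-D fixation output are illustrative only.
import torch
import torch.nn as nn

class FixationGRU(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)  # (x, y) of the next fixation

    def forward(self, frame_feats):
        # frame_feats: (batch, time, feat_dim) deep features, one per video frame
        out, _ = self.gru(frame_feats)        # integrate information over time
        return self.head(out[:, -1])          # predict the next fixation point

# Usage: feats = torch.randn(4, 16, 2048); print(FixationGRU()(feats).shape)  # -> (4, 2)
```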

    Human Motion Trajectory Prediction: A Survey

    With growing numbers of intelligent autonomous systems in human environments, the ability of such systems to perceive, understand, and anticipate human behavior becomes increasingly important. Specifically, predicting the future positions of dynamic agents, and planning with such predictions in mind, are key tasks for self-driving vehicles, service robots, and advanced surveillance systems. This paper provides a survey of human motion trajectory prediction. We review, analyze, and structure a large selection of work from different communities and propose a taxonomy that categorizes existing methods based on the motion modeling approach and the level of contextual information used. We provide an overview of the existing datasets and performance metrics. We discuss the limitations of the state of the art and outline directions for further research.
    Comment: Submitted to the International Journal of Robotics Research (IJRR), 37 pages

    Models for Pedestrian Trajectory Prediction and Navigation in Dynamic Environments

    Robots are no longer constrained to cages in factories and are increasingly taking on roles alongside humans. Before robots can accomplish their tasks in these dynamic environments, they must be able to navigate while avoiding collisions with pedestrians or other robots. Humans are able to move through crowds by anticipating the movements of other pedestrians and how their own actions will influence others; developing a method for predicting pedestrian trajectories is therefore a critical component of a robust robot navigation system. A current state-of-the-art approach for predicting pedestrian trajectories is Social-LSTM, a recurrent neural network that incorporates information about neighboring pedestrians to learn how people move cooperatively around each other. This thesis extends and modifies that model to output the parameters of a multimodal distribution, which better captures the uncertainty inherent in pedestrian movements. Additionally, four novel architectures for representing neighboring pedestrians are proposed; these models are more general than current trajectory prediction systems and have fewer hyper-parameters. In both simulations and real-world datasets, the multimodal extension significantly increases the accuracy of trajectory prediction. One of the new neighbor representation architectures achieves state-of-the-art results while reducing the number of both parameters and hyper-parameters compared to existing solutions. Two techniques for incorporating the trajectory predictions into a planning system are also developed and evaluated on a real-world dataset. Both techniques plan routes that include fewer near-collisions than algorithms that do not use trajectory predictions. Finally, a Python library for agent-based modeling and crowd simulation is presented to aid future research.
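    The PyTorch sketch below illustrates the kind of multimodal extension described in this abstract: a recurrent encoder over past positions whose output head parameterizes a mixture of bivariate Gaussians for the next step. The MultimodalTrajectoryLSTM class, its dimensions, and the parameterization are hypothetical assumptions, not the thesis implementation.

```python
# Sketch of a multimodal trajectory head: an LSTM encodes the observed (x, y)
# track and a linear layer emits mixture-of-bivariate-Gaussian parameters for
# the next position. All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalTrajectoryLSTM(nn.Module):
    def __init__(self, hidden_dim=128, num_modes=3):
        super().__init__()
        self.num_modes = num_modes
        self.encoder = nn.LSTM(input_size=2, hidden_size=hidden_dim, batch_first=True)
        # Per mode: weight logit (1) + mean (2) + log-std (2) + correlation (1) = 6
        self.head = nn.Linear(hidden_dim, num_modes * 6)

    def forward(self, past_xy):
        # past_xy: (batch, time, 2) observed pedestrian positions
        _, (h_n, _) = self.encoder(past_xy)
        params = self.head(h_n[-1]).view(-1, self.num_modes, 6)
        weights = F.softmax(params[..., 0], dim=-1)  # mode probabilities
        means = params[..., 1:3]                     # per-mode displacement means
        stds = params[..., 3:5].exp()                # per-mode standard deviations
        corrs = torch.tanh(params[..., 5])           # per-mode x-y correlation
        return weights, means, stds, corrs

# Usage: w, mu, sigma, rho = MultimodalTrajectoryLSTM()(torch.randn(8, 12, 2))
```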