172 research outputs found
Human Motion Trajectory Prediction: A Survey
With growing numbers of intelligent autonomous systems in human environments,
the ability of such systems to perceive, understand and anticipate human
behavior becomes increasingly important. Specifically, predicting future
positions of dynamic agents and planning considering such predictions are key
tasks for self-driving vehicles, service robots and advanced surveillance
systems. This paper provides a survey of human motion trajectory prediction. We
review, analyze and structure a large selection of work from different
communities and propose a taxonomy that categorizes existing methods based on
the motion modeling approach and level of contextual information used. We
provide an overview of the existing datasets and performance metrics. We
discuss limitations of the state of the art and outline directions for further
research. Comment: Submitted to the International Journal of Robotics Research (IJRR),
37 pages
Learning Birds-Eye View Representations for Autonomous Driving
Over the past few years, progress towards the ambitious goal of widespread fully-autonomous vehicles on our roads has accelerated dramatically. This progress has been spurred largely by the success of highly accurate LiDAR sensors, as well as the use of detailed high-resolution maps, which together allow a vehicle to navigate its surroundings effectively. Often, however, one or both of these resources may be unavailable, whether due to cost, sensor failure, or the need to operate in an unmapped environment. The aim of this thesis is therefore to demonstrate that it is possible to build detailed three-dimensional representations of traffic scenes using only 2D monocular camera images as input. Such an approach faces many challenges, most notably that 2D images do not provide explicit 3D structure. We overcome this limitation by applying a combination of deep learning and geometry to transform image-based features into an orthographic birds-eye view representation of the scene, allowing algorithms to reason in a metric, 3D space. This approach is applied to solving two challenging perception tasks central to autonomous driving.
The first part of this thesis addresses the problem of monocular 3D object detection, which involves determining the size and location of all objects in the scene. Our solution was based on a novel convolutional network architecture that processed features in both the image and birds-eye view perspective. Results on the KITTI dataset showed that this network outperformed existing works at the time, and although more recent works have improved on these results, we conducted extensive analysis to find that our solution performed well in many difficult edge-case scenarios such as objects close to or distant from the camera.
In the second part of the thesis, we consider the related problem of semantic map prediction. This consists of estimating a birds-eye view map of the world visible from a given camera, encoding both static elements of the scene such as pavement and road layout, as well as dynamic objects such as vehicles and pedestrians. This was accomplished using a second network that built on the experience from the previous work and achieved convincing performance on two real-world driving datasets. By formulating the maps as occupancy grid maps (a widely used representation from robotics), we were able to demonstrate how predictions could be accumulated across multiple frames, and that doing so further improved the robustness of maps produced by our system. Toyota Motors Europ
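The accumulation of occupancy predictions across frames mentioned in the abstract above can be illustrated with the standard log-odds update for occupancy grids. This is a minimal sketch, not the thesis's implementation: it assumes every frame has already been aligned to a common grid, that cells are independent, and that each frame supplies a per-cell occupancy probability. The names `logit`, `sigmoid` and `accumulate` are hypothetical.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(l):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-l))


def accumulate(frames, prior=0.5):
    """Fuse per-frame occupancy probabilities with a log-odds update.

    Assumption: all frames are 2D lists of equal shape, already aligned
    to the same grid (ego-motion compensation is not shown here).
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    prior_l = logit(prior)
    log_odds = [[prior_l] * cols for _ in range(rows)]
    for frame in frames:
        for r in range(rows):
            for c in range(cols):
                # Clamp to avoid infinite log-odds at exactly 0 or 1.
                p = min(max(frame[r][c], 1e-6), 1.0 - 1e-6)
                log_odds[r][c] += logit(p) - prior_l
    return [[sigmoid(l) for l in row] for row in log_odds]


# Three identical single-row frames: repeated evidence sharpens the estimate.
frames = [[[0.7, 0.5, 0.3]] for _ in range(3)]
fused = accumulate(frames)
```

Under these assumptions, a cell that is repeatedly predicted as likely occupied moves towards 1, a cell at the prior stays at the prior, and a cell repeatedly predicted as likely free moves towards 0.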
Integrating Perception, Prediction and Control for Adaptive Mobile Navigation
Mobile robots capable of navigating seamlessly and safely in pedestrian-rich environments promise to bring robotic assistance closer to our daily lives. A key limitation of existing navigation policies is the difficulty of predicting and reasoning about the environment, including static obstacles and pedestrians. In this thesis, I explore three properties of navigation, namely prediction of occupied spaces, prediction of pedestrians and measurements of uncertainty, to improve crowd-based navigation. The hypothesis is that improving prediction and uncertainty estimation will increase robot navigation performance, resulting in fewer collisions, faster speeds and more socially compliant motion in crowds.
Specifically, this thesis focuses on techniques that allow mobile robots to predict occupied spaces that extend beyond the line of sight of the sensor. This is accomplished through the development of novel generative neural network architectures that enable map prediction that exceeds the limitations of the sensor. Further, I extend the neural network architectures to predict multiple hypotheses and use the variance of the hypotheses as a measure of uncertainty to formulate an information-theoretic map exploration strategy. Finally, control algorithms that leverage the predicted occupancy map were developed to demonstrate more robust, high-speed navigation on a physical small form factor autonomous car.
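The idea of using the variance across multiple predicted hypotheses as an uncertainty measure can be sketched as follows. This is an illustrative simplification, not the thesis's method: it assumes each hypothesis is a 2D grid of occupancy probabilities of the same shape, and it stands in for the information-theoretic exploration strategy with a plain "pick the highest-variance cell" rule. The names `cell_variance` and `most_uncertain_cell` are hypothetical.

```python
def cell_variance(hypotheses):
    """Per-cell population variance across a list of equally shaped 2D grids."""
    n = len(hypotheses)
    rows, cols = len(hypotheses[0]), len(hypotheses[0][0])
    variance = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [h[r][c] for h in hypotheses]
            mean = sum(values) / n
            variance[r][c] = sum((v - mean) ** 2 for v in values) / n
    return variance


def most_uncertain_cell(hypotheses):
    """Return the (row, col) index where the hypotheses disagree the most."""
    variance = cell_variance(hypotheses)
    cells = [(r, c) for r in range(len(variance)) for c in range(len(variance[0]))]
    return max(cells, key=lambda rc: variance[rc[0]][rc[1]])


# Three hypothetical map predictions that disagree only at cell (0, 1).
hypotheses = [
    [[0.9, 0.1], [0.5, 0.5]],
    [[0.9, 0.9], [0.5, 0.5]],
    [[0.9, 0.1], [0.5, 0.5]],
]
target = most_uncertain_cell(hypotheses)
```

Cells where all hypotheses agree have near-zero variance, so under this simplified rule an explorer would be directed to the cell where the predictions diverge.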
I further extend the prediction and uncertainty approaches to include modeling pedestrian motion for dynamic crowd navigation. This includes developing novel techniques that model human intent to predict future motion of pedestrians. I show this approach improves state-of-the-art results in pedestrian prediction. I then show errors in prediction can be used as a measure of uncertainty to adapt the risk sensitivity of the robot controller in real time. Finally, I show that the crowd navigation algorithm extends to socially compliant behavior in groups of pedestrians.
This research demonstrates that combining obstacle and pedestrian prediction with uncertainty estimation achieves more robust navigation policies. This approach results in improved map exploration efficiency, faster robot motion, fewer collisions and more socially compliant robot motion within crowds.
Improved robustness and efficiency for automatic visual site monitoring
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (p. 219-228).
Knowing who people are, where they are, what they are doing, and how they interact with other people and things is valuable from commercial, security, and space utilization perspectives. Video sensors backed by computer vision algorithms are a natural way to gather this data. Unfortunately, key technical issues persist in extracting features and models that are simultaneously efficient to compute and robust to issues such as adverse lighting conditions, distracting background motions, appearance changes over time, and occlusions. In this thesis, we present a set of techniques and model enhancements to better handle these problems, focusing on contributions in four areas. First, we improve background subtraction so it can better handle temporally irregular dynamic textures. This allows us to achieve a 5.5% drop in false positive rate on the Wallflower waving trees video. Second, we adapt the Dalal and Triggs Histogram of Oriented Gradients pedestrian detector to work on large-scale scenes with dense crowds and harsh lighting conditions: challenges which prevent us from easily using a background subtraction solution. These scenes contain hundreds of simultaneously visible people. To make using the algorithm computationally feasible, we have produced a novel implementation that runs on commodity graphics hardware and is up to 76 times faster than our CPU-only implementation. We demonstrate the utility of this detector by modeling scene-level activities with a Hierarchical Dirichlet Process. Third, we show how one can improve the quality of pedestrian silhouettes for recognizing individual people. We combine general appearance information from a large population of pedestrians with semi-periodic shape information from individual silhouette sequences. Finally, we show how one can combine a variety of detection and tracking techniques to robustly handle a variety of event detection scenarios such as theft and left-luggage detection. We present the only complete set of results on a standardized collection of very challenging videos.
by Gerald Edwin Dalley. Ph.D.
Intent prediction of vulnerable road users for trusted autonomous vehicles
This study investigated how future autonomous vehicles could earn greater trust from the vulnerable road users (such as pedestrians and cyclists) with whom they would interact in urban traffic environments. It focused on understanding the behaviours of such road users on a deeper level by predicting their future intentions based solely on vehicle-based sensors and AI techniques. The findings showed that using the personal and body-language attributes of vulnerable road users, in addition to their past motion trajectories and physical attributes of the environment, led to more accurate predictions of their intended actions.
Advances in Artificial Intelligence: Models, Optimization, and Machine Learning
The present book contains all the articles accepted and published in the Special Issue “Advances in Artificial Intelligence: Models, Optimization, and Machine Learning” of the MDPI Mathematics journal, which covers a wide range of topics connected to the theory and applications of artificial intelligence and its subfields. These topics include, among others, deep learning and classic machine learning algorithms, neural modelling, architectures and learning algorithms, biologically inspired optimization algorithms, algorithms for autonomous driving, probabilistic models and Bayesian reasoning, intelligent agents and multiagent systems. We hope that the scientific results presented in this book will serve as valuable sources of documentation and inspiration for anyone willing to pursue research in artificial intelligence, machine learning and their widespread applications.