356 research outputs found

    Long-term future prediction under uncertainty and multi-modality

    Get PDF
    Humans have an innate ability to excel at activities that involve prediction of complex object dynamics such as predicting the possible trajectory of a billiard ball after it has been hit by the player or the prediction of motion of pedestrians while on the road. A key feature that enables humans to perform such tasks is anticipation. There has been continuous research in the area of Computer Vision and Artificial Intelligence to mimic this human ability for autonomous agents to succeed in the real world scenarios. Recent advances in the field of deep learning and the availability of large scale datasets has enabled the pursuit of fully autonomous agents with complex decision making abilities such as self-driving vehicles or robots. One of the main challenges encompassing the deployment of these agents in the real world is their ability to perform anticipation tasks with at least human level efficiency. To advance the field of autonomous systems, particularly, self-driving agents, in this thesis, we focus on the task of future prediction in diverse real world settings, ranging from deterministic scenarios such as prediction of paths of balls on a billiard table to the predicting the future of non-deterministic street scenes. Specifically, we identify certain core challenges for long-term future prediction: long-term prediction, uncertainty, multi-modality, and exact inference. To address these challenges, this thesis makes the following core contributions. Firstly, for accurate long-term predictions, we develop approaches that effectively utilize available observed information in the form of image boundaries in videos or interactions in street scenes. Secondly, as uncertainty increases into the future in case of non-deterministic scenarios, we leverage Bayesian inference frameworks to capture calibrated distributions of likely future events. Finally, to further improve performance in highly-multimodal non-deterministic scenarios such as street scenes, we develop deep generative models based on conditional variational autoencoders as well as normalizing flow based exact inference methods. Furthermore, we introduce a novel dataset with dense pedestrian-vehicle interactions to further aid the development of anticipation methods for autonomous driving applications in urban environments.Menschen haben die angeborene Fähigkeit, Vorgänge mit komplexer Objektdynamik vorauszusehen, wie z. B. die Vorhersage der möglichen Flugbahn einer Billardkugel, nachdem sie vom Spieler gestoßen wurde, oder die Vorhersage der Bewegung von Fußgängern auf der Straße. Eine Schlüsseleigenschaft, die es dem Menschen ermöglicht, solche Aufgaben zu erfüllen, ist die Antizipation. Im Bereich der Computer Vision und der Künstlichen Intelligenz wurde kontinuierlich daran geforscht, diese menschliche Fähigkeit nachzuahmen, damit autonome Agenten in der realen Welt erfolgreich sein können. Jüngste Fortschritte auf dem Gebiet des Deep Learning und die Verfügbarkeit großer Datensätze haben die Entwicklung vollständig autonomer Agenten mit komplexen Entscheidungsfähigkeiten wie selbstfahrende Fahrzeugen oder Roboter ermöglicht. Eine der größten Herausforderungen beim Einsatz dieser Agenten in der realen Welt ist ihre Fähigkeit, Antizipationsaufgaben mit einer Effizienz durchzuführen, die mindestens der menschlichen entspricht. Um das Feld der autonomen Systeme, insbesondere der selbstfahrenden Agenten, voranzubringen, konzentrieren wir uns in dieser Arbeit auf die Aufgabe der Zukunftsvorhersage in verschiedenen realen Umgebungen, die von deterministischen Szenarien wie der Vorhersage der Bahnen von Kugeln auf einem Billardtisch bis zur Vorhersage der Zukunft von nicht-deterministischen Straßenszenen reichen. Insbesondere identifizieren wir bestimmte grundlegende Herausforderungen für langfristige Zukunftsvorhersagen: Langzeitvorhersage, Unsicherheit, Multimodalität und exakte Inferenz. Um diese Herausforderungen anzugehen, leistet diese Arbeit die folgenden grundlegenden Beiträge. Erstens: Für genaue Langzeitvorhersagen entwickeln wir Ansätze, die verfügbare Beobachtungsinformationen in Form von Bildgrenzen in Videos oder Interaktionen in Straßenszenen effektiv nutzen. Zweitens: Da die Unsicherheit in der Zukunft bei nicht-deterministischen Szenarien zunimmt, nutzen wir Bayes’sche Inferenzverfahren, um kalibrierte Verteilungen wahrscheinlicher zukünftiger Ereignisse zu erfassen. Drittens: Um die Leistung in hochmultimodalen, nichtdeterministischen Szenarien wie Straßenszenen weiter zu verbessern, entwickeln wir tiefe generative Modelle, die sowohl auf konditionalen Variations-Autoencodern als auch auf normalisierenden fließenden exakten Inferenzmethoden basieren. Darüber hinaus stellen wir einen neuartigen Datensatz mit dichten Fußgänger-Fahrzeug- Interaktionen vor, um Antizipationsmethoden für autonome Fahranwendungen in urbanen Umgebungen weiter zu entwickeln

    Articulated human tracking and behavioural analysis in video sequences

    Get PDF
    Recently, there has been a dramatic growth of interest in the observation and tracking of human subjects through video sequences. Arguably, the principal impetus has come from the perceived demand for technological surveillance, however applications in entertainment, intelligent domiciles and medicine are also increasing. This thesis examines human articulated tracking and the classi cation of human movement, rst separately and then as a sequential process. First, this thesis considers the development and training of a 3D model of human body structure and dynamics. To process video sequences, an observation model is also designed with a multi-component likelihood based on edge, silhouette and colour. This is de ned on the articulated limbs, and visible from a single or multiple cameras, each of which may be calibrated from that sequence. Second, for behavioural analysis, we develop a methodology in which actions and activities are described by semantic labels generated from a Movement Cluster Model (MCM). Third, a Hierarchical Partitioned Particle Filter (HPPF) was developed for human tracking that allows multi-level parameter search consistent with the body structure. This tracker relies on the articulated motion prediction provided by the MCM at pose or limb level. Fourth, tracking and movement analysis are integrated to generate a probabilistic activity description with action labels. The implemented algorithms for tracking and behavioural analysis are tested extensively and independently against ground truth on human tracking and surveillance datasets. Dynamic models are shown to predict and generate synthetic motion, while MCM recovers both periodic and non-periodic activities, de ned either on the whole body or at the limb level. Tracking results are comparable with the state of the art, however the integrated behaviour analysis adds to the value of the approach.Overseas Research Students Awards Scheme (ORSAS

    Active Object Classification from 3D Range Data with Mobile Robots

    Get PDF
    This thesis addresses the problem of how to improve the acquisition of 3D range data with a mobile robot for the task of object classification. Establishing the identities of objects in unknown environments is fundamental for robotic systems and helps enable many abilities such as grasping, manipulation, or semantic mapping. Objects are recognised by data obtained from sensor observations, however, data is highly dependent on viewpoint; the variation in position and orientation of the sensor relative to an object can result in large variation in the perception quality. Additionally, cluttered environments present a further challenge because key data may be missing. These issues are not always solved by traditional passive systems where data are collected from a fixed navigation process then fed into a perception pipeline. This thesis considers an active approach to data collection by deciding where is most appropriate to make observations for the perception task. The core contributions of this thesis are a non-myopic planning strategy to collect data efficiently under resource constraints, and supporting viewpoint prediction and evaluation methods for object classification. Our approach to planning uses Monte Carlo methods coupled with a classifier based on non-parametric Bayesian regression. We present a novel anytime and non-myopic planning algorithm, Monte Carlo active perception, that extends Monte Carlo tree search to partially observable environments and the active perception problem. This is combined with a particle-based estimation process and a learned observation likelihood model that uses Gaussian process regression. To support planning, we present 3D point cloud prediction algorithms and utility functions that measure the quality of viewpoints by their discriminatory ability and effectiveness under occlusion. The utility of viewpoints is quantified by information-theoretic metrics, such as mutual information, and an alternative utility function that exploits learned data is developed for special cases. The algorithms in this thesis are demonstrated in a variety of scenarios. We extensively test our online planning and classification methods in simulation as well as with indoor and outdoor datasets. Furthermore, we perform hardware experiments with different mobile platforms equipped with different types of sensors. Most significantly, our hardware experiments with an outdoor robot are to our knowledge the first demonstrations of online active perception in a real outdoor environment. Active perception has broad significance in many applications. This thesis emphasises the advantages of an active approach to object classification and presents its assimilation with a wide range of robotic systems, sensors, and perception algorithms. By demonstration of performance enhancements and diversity, our hope is that the concept of considering perception and planning in an integrated manner will be of benefit in improving current systems that rely on passive data collection

    Detecting Road Obstacles by Erasing Them

    Full text link
    Vehicles can encounter a myriad of obstacles on the road, and it is not feasible to record them all beforehand to train a detector. Our method selects image patches and inpaints them with the surrounding road texture, which tends to remove obstacles from those patches. It them uses a network trained to recognize discrepancies between the original patch and the inpainted one, which signals an erased obstacle. We also contribute a new dataset for monocular road obstacle detection, and show that our approach outperforms the state-of-the-art methods on both our new dataset and the standard Fishyscapes Lost & Found benchmark

    Real-time Aerial Vehicle Detection and Tracking using a Multi-modal Optical Sensor

    Get PDF
    Vehicle tracking from an aerial platform poses a number of unique challenges including the small number of pixels representing a vehicle, large camera motion, and parallax error. For these reasons, it is accepted to be a more challenging task than traditional object tracking and it is generally tackled through a number of different sensor modalities. Recently, the Wide Area Motion Imagery sensor platform has received reasonable attention as it can provide higher resolution single band imagery in addition to its large area coverage. However, still, richer sensory information is required to persistently track vehicles or more research on the application of WAMI for tracking is required. With the advancements in sensor technology, hyperspectral data acquisition at video frame rates become possible as it can be cruical in identifying objects even in low resolution scenes. For this reason, in this thesis, a multi-modal optical sensor concept is considered to improve tracking in adverse scenes. The Rochester Institute of Technology Multi-object Spectrometer is capable of collecting limited hyperspectral data at desired locations in addition to full-frame single band imagery. By acquiring hyperspectral data quickly, tracking can be achieved at reasonableframe rates which turns out to be crucial in tracking. On the other hand, the relatively high cost of hyperspectral data acquisition and transmission need to be taken into account to design a realistic tracking. By inserting extended data of the pixels of interest we can address or avoid the unique challenges posed by aerial tracking. In this direction, we integrate limited hyperspectral data to improve measurement-to-track association. Also, a hyperspectral data based target detection method is presented to avoid the parallax effect and reduce the clutter density. Finally, the proposed system is evaluated on realistic, synthetic scenarios generated by the Digital Image and Remote Sensing software

    Predictive World Models from Real-World Partial Observations

    Full text link
    Cognitive scientists believe adaptable intelligent agents like humans perform reasoning through learned causal mental simulations of agents and environments. The problem of learning such simulations is called predictive world modeling. Recently, reinforcement learning (RL) agents leveraging world models have achieved SOTA performance in game environments. However, understanding how to apply the world modeling approach in complex real-world environments relevant to mobile robots remains an open question. In this paper, we present a framework for learning a probabilistic predictive world model for real-world road environments. We implement the model using a hierarchical VAE (HVAE) capable of predicting a diverse set of fully observed plausible worlds from accumulated sensor observations. While prior HVAE methods require complete states as ground truth for learning, we present a novel sequential training method to allow HVAEs to learn to predict complete states from partially observed states only. We experimentally demonstrate accurate spatial structure prediction of deterministic regions achieving 96.21 IoU, and close the gap to perfect prediction by 62% for stochastic regions using the best prediction. By extending HVAEs to cases where complete ground truth states do not exist, we facilitate continual learning of spatial prediction as a step towards realizing explainable and comprehensive predictive world models for real-world mobile robotics applications. Code is available at https://github.com/robin-karlsson0/predictive-world-models.Comment: Accepted for IEEE MOST 202
    corecore