210 research outputs found

    End-to-End Tracking and Semantic Segmentation Using Recurrent Neural Networks

    In this work we present a novel end-to-end framework for tracking and classifying a robot's surroundings in complex, dynamic and only partially observable real-world environments. The approach deploys a recurrent neural network to filter an input stream of raw laser measurements in order to directly infer object locations, along with their identity, in both visible and occluded areas. To achieve this, we first train the network using unsupervised Deep Tracking, a recently proposed theoretical framework for end-to-end space occupancy prediction. We show that by learning to track on a large amount of unsupervised data, the network creates a rich internal representation of its environment, which we in turn exploit, through inductive transfer of knowledge, to perform the task of semantic classification. As a result, we show that only a small amount of labelled data suffices to steer the network towards mastering this additional task. Furthermore, we propose a novel recurrent neural network architecture specifically tailored to tracking and semantic classification in real-world robotics applications. We demonstrate the tracking and classification performance of the method on real-world data collected at a busy road junction. Our evaluation shows that the proposed end-to-end framework compares favourably to a state-of-the-art, model-free tracking solution and that it outperforms a conventional one-shot training scheme for semantic classification.
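    A minimal sketch of this two-stage idea is given below (in PyTorch, which is an assumption, as are the layer sizes and the use of a GRU cell; this is not the authors' architecture). A recurrent network is first trained without labels to predict grid occupancy from the stream of raw scans, and a small semantic head attached to the same hidden state is then fine-tuned on the few available labels.

```python
import torch
import torch.nn as nn

class TrackingRNN(nn.Module):
    """Illustrative recurrent occupancy filter with an auxiliary semantic head."""
    def __init__(self, grid=64, hidden=256, n_classes=3):
        super().__init__()
        self.grid, self.n_classes = grid, n_classes
        self.encode = nn.Sequential(                      # compress each raw occupancy frame
            nn.Conv2d(2, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.rnn = nn.GRUCell(32 * (grid // 4) ** 2, hidden)
        self.occupancy_head = nn.Linear(hidden, grid * grid)             # unsupervised target
        self.semantic_head = nn.Linear(hidden, n_classes * grid * grid)  # trained on few labels

    def forward(self, scans, h=None):
        # scans: (T, B, 2, grid, grid) stream of "visible / occupied" grids from the laser
        occ, sem = [], []
        for frame in scans:
            h = self.rnn(self.encode(frame), h)           # hidden state can carry occluded objects
            occ.append(self.occupancy_head(h).view(-1, self.grid, self.grid))
            sem.append(self.semantic_head(h).view(-1, self.n_classes, self.grid, self.grid))
        return torch.stack(occ), torch.stack(sem), h
```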

    Obstacle Prediction for Automated Guided Vehicles Based on Point Clouds Measured by a Tilted LIDAR Sensor


    Multimodal perception for autonomous driving

    Autonomous driving is set to play an important role among intelligent transportation systems in the coming decades. The advantages of its large-scale implementation (reduced accidents, shorter commuting times, higher fuel efficiency) have made its development a priority for academia and industry. However, there is still a long way to go to achieve fully self-driving vehicles capable of dealing with any scenario without human intervention. To this end, advances in control, navigation and, especially, environment perception technologies are still required. In particular, the detection of other road users that may interfere with the vehicle's trajectory is a key element, since it makes it possible to model the current traffic situation and, thus, to make decisions accordingly.
    The objective of this thesis is to provide solutions to some of the main challenges of on-board perception systems, such as extrinsic calibration of sensors, object detection, and deployment on real platforms. First, a calibration method for obtaining the relative transformation between pairs of sensors is introduced, eliminating the complex manual adjustment of these parameters; the algorithm makes use of an original calibration pattern and supports LiDARs as well as monocular and stereo cameras. Second, different deep learning models for 3D object detection using LiDAR data in its bird's eye view projection are presented: through a novel encoding, architectures tailored to image detection are used to process the 3D information of point clouds in real time, and the effectiveness of combining this projection with image features is analyzed. Finally, a method to mitigate the accuracy drop of LiDAR-based detection networks when deployed in ad-hoc configurations is introduced; for this purpose, virtual signals mimicking the specifications of the desired real device are simulated to generate new annotated datasets for training the models.
    The performance of the proposed methods is evaluated against existing alternatives using reference benchmarks in the field of computer vision (KITTI and nuScenes) and through experiments in open traffic with an automated vehicle. The results demonstrate the relevance of the presented work and its suitability for commercial use.
    Doctoral thesis with International Mention. Doctoral Programme in Electrical, Electronic and Automation Engineering, Universidad Carlos III de Madrid. Committee: Chair: Jesús García Herrero; Secretary: Ignacio Parra Alonso; Member: Gustavo Adolfo Peláez Coronad.
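    The bird's-eye-view encoding used to feed image detection architectures can be illustrated with a short sketch. The ranges, cell size and channel choices below (maximum height, mean intensity, point density) are generic assumptions for illustration, not the exact encoding proposed in the thesis.

```python
import numpy as np

def bev_encode(points, x_range=(0.0, 60.0), y_range=(-30.0, 30.0), cell=0.1):
    """Project a LiDAR point cloud (N, 4: x, y, z, intensity) onto a 3-channel BEV image."""
    w = int((x_range[1] - x_range[0]) / cell)
    h = int((y_range[1] - y_range[0]) / cell)
    bev = np.zeros((3, h, w), dtype=np.float32)      # channels: max height, mean intensity, density
    counts = np.zeros((h, w), dtype=np.float32)

    # Keep only the points inside the region of interest.
    keep = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[keep]
    cols = ((pts[:, 0] - x_range[0]) / cell).astype(int)
    rows = ((pts[:, 1] - y_range[0]) / cell).astype(int)

    for r, c, z, i in zip(rows, cols, pts[:, 2], pts[:, 3]):
        bev[0, r, c] = max(bev[0, r, c], z)          # tallest point in the cell
        bev[1, r, c] += i                            # intensity accumulated, averaged below
        counts[r, c] += 1

    occupied = counts > 0
    bev[1][occupied] /= counts[occupied]             # mean intensity per occupied cell
    bev[2] = np.log1p(counts)                        # compressed point density
    return bev                                       # ready to be fed to a 2D detection CNN
```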

    Multi-Sensor Data Fusion for Detection and Tracking of Moving Objects from an Autonomous Vehicle

    Perception is one of the key steps for the functioning of an autonomous vehicle, or even of a vehicle providing only driver-assistance functions. The vehicle observes the external world using its sensors and builds an internal model of the outer environment, which it keeps updating with the latest sensor data. In this setting, perception can be divided into two sub-parts: the first, called SLAM (Simultaneous Localization And Mapping), is concerned with building an online map of the external environment and localizing the host vehicle in this map; the second, called DATMO (Detection And Tracking of Moving Objects), deals with finding moving objects in the environment and tracking them over time.
    Using high-resolution, accurate laser scanners, many researchers have made successful efforts to solve these problems. However, with low-resolution or noisy laser scanners, solving these problems, especially DATMO, is still a challenge, and there are many false alarms, missed detections, or both. In this thesis we propose that by using a vision sensor (mono or stereo) along with the laser sensor, and by developing an effective fusion scheme at an appropriate level, these problems can be greatly reduced.
    The main contribution of this research is the identification of three fusion levels and the development of fusion techniques for each level within a SLAM- and DATMO-based perception architecture for autonomous vehicles. Depending on the amount of preprocessing required before fusion, we call them low-level, object-detection-level and track-level fusion. For the low level we propose a grid-based fusion technique: by giving appropriate weights (depending on the sensor properties) to each sensor's grid, a fused grid can be obtained that gives a better view of the external environment. For object-detection-level fusion, the lists of objects detected by each sensor are fused into a list of fused objects, where each fused object carries more information than its single-sensor versions; a Bayesian fusion technique is used at this level. Track-level fusion requires tracking moving objects for each sensor separately and then fusing the resulting tracks; fusion at this level helps remove false tracks.
    The second contribution of this research is the development of a fast technique for finding road borders from noisy laser data and then using this border information to remove false moving objects. We have observed that many false moving objects appear near the road borders due to sensor noise; if they are not filtered out, they result in many false tracks close to the vehicle, causing it to apply the brakes or to issue warning messages to the driver unnecessarily.
    The third contribution is the development of a complete perception solution for lidar and stereo-vision sensors and its integration on a real vehicle demonstrator used in the European Union project INTERSAFE-2. This project is concerned with safety at intersections and aims at reducing injuries and fatal accidents there. In this project we worked in collaboration with Volkswagen, the Technical University of Cluj-Napoca (Romania) and INRIA Paris to provide a complete perception and risk-assessment solution for the Volkswagen demonstrator.
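    As a rough illustration of the low-level (occupancy grid) fusion described above, the sketch below combines per-sensor occupancy grids in log-odds space, weighted by a scalar reliability per sensor. The weighting scheme and the example values are assumptions made for illustration, not the exact formulation of the thesis.

```python
import numpy as np

def fuse_occupancy_grids(grids, weights, eps=1e-6):
    """grids: list of (H, W) occupancy probabilities in [0, 1]; weights: per-sensor reliability."""
    log_odds = np.zeros_like(grids[0], dtype=np.float64)
    for prob, w in zip(grids, weights):
        prob = np.clip(prob, eps, 1.0 - eps)             # avoid log(0)
        log_odds += w * np.log(prob / (1.0 - prob))      # weighted evidence accumulation
    return 1.0 / (1.0 + np.exp(-log_odds))               # back to occupancy probabilities

# Example: trust the laser grid more than a noisier stereo-vision grid.
laser_grid = np.random.uniform(0.3, 0.9, (200, 200))
stereo_grid = np.random.uniform(0.3, 0.9, (200, 200))
fused = fuse_occupancy_grids([laser_grid, stereo_grid], weights=[1.0, 0.6])
```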

    Multi-Sensor Data Fusion for Robust Environment Reconstruction in Autonomous Vehicle Applications

    In autonomous vehicle systems, understanding the surrounding environment is mandatory for an intelligent vehicle to make every movement decision on the road. Knowledge about the neighboring environment enables the vehicle to detect moving objects, and especially irregular events such as jaywalking or a sudden lane change by another vehicle, in order to avoid collisions. This local situation awareness mostly depends on the advanced sensors (e.g. camera, LIDAR, RADAR) added to the vehicle. The main focus of this work is to formulate the problem of reconstructing the vehicle environment using point cloud data from the LIDAR and RGB color images from the camera. Based on the widely used iterative closest point (ICP) registration method, an expectation-maximization (EM)-ICP technique is proposed to automatically mosaic multiple point cloud sets into a larger one. Motion trajectories of the moving objects are analyzed to address the issue of irregularity detection. Another contribution of this work is the fusion of color information (from RGB images captured by the camera) with the three-dimensional point cloud data for a better representation of the environment. For a better understanding of the surrounding environment, histogram of oriented gradients (HOG) based techniques are exploited to detect pedestrians and vehicles.
    Using both camera and LIDAR, an autonomous vehicle can gather information and reconstruct a map of the surrounding environment up to a certain distance. The capability of communicating and cooperating among vehicles can improve automated driving decisions by providing an extended and more precise view of the surroundings. In this work, a transmission power control algorithm is studied along with an adaptive content control algorithm to achieve a more accurate map of the vehicle environment. To exchange local sensor data among vehicles, an adaptive communication scheme is proposed that controls the lengths and contents of the messages depending on the load of the communication channel. The exchange of this information can extend the tracking region of a vehicle beyond the area sensed by its own sensors. In this experiment, the combined effect of the power control and the message length and content control algorithms is exploited to improve the accuracy of the map of the surroundings in a cooperative automated vehicle system.
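    For the HOG-based detection step, OpenCV ships a generic HOG descriptor with a pre-trained pedestrian SVM; the snippet below uses that stock detector purely as an illustration of the technique, since the exact detectors and training used in this work are not described here. The image path is a placeholder.

```python
import cv2

img = cv2.imread("camera_frame.png")                     # placeholder path to a camera frame

# OpenCV's built-in HOG descriptor with its default people detector.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Multi-scale sliding-window detection; returns bounding boxes and confidence scores.
boxes, scores = hog.detectMultiScale(img, winStride=(8, 8), padding=(8, 8), scale=1.05)
for (x, y, w, h) in boxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```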

    Deep Sensor Fusion Architecture for Point-Cloud Semantic Segmentation

    This master's thesis develops a complete approach to data analysis and processing for better decision-making, presenting a CNN-based multimodal neural architecture; it includes precise explanations of the systems it integrates and evaluates the approach's behavior in its target environment.
    Self-driving systems are composed of really complex pipelines in which perceiving the vehicle's surroundings is a key source of information used to make real-time maneuver decisions. Semantic segmentation of LiDAR sensor data has played a big role in consolidating a dense understanding of the surrounding objects and events. Although great advances have been made for this task, we believe there is an under-exploitation of sensor fusion strategies. We present a multimodal neural architecture, based on CNNs, that consumes 2D input signals from LiDAR and camera, computes a deep representation from both sensors, and predicts a label mapping for the 3D point-wise segmentation problem. We evaluate the proposed architecture on a dataset derived from the KITTI vision benchmark suite, which contemplates common semantic classes (i.e. car, pedestrian and cyclist). Our model outperforms existing methods and shows improved refinement of the segmentation masks.
    Master's thesis (Magíster en Ingeniería de Sistemas y Computación).
    Table of contents: Abstract; List of Figures; 1 Introduction (1.1 Problem statement, 1.2 Goals, 1.3 Contributions, 1.4 Outline); 2 Autonomous vehicle perception systems (2.1 Semantic segmentation; 2.2 Autonomous vehicles sensing: 2.2.1 Camera, 2.2.2 LiDAR, 2.2.3 Radar, 2.2.4 Ultrasonic; 2.3 Point clouds semantic segmentation: 2.3.1 Raw pointcloud, 2.3.2 Voxelization of pointclouds, 2.3.3 Point cloud projections, 2.3.4 Outlook); 3 Deep multimodal learning for semantic segmentation (3.1 Method overview, 3.2 Point cloud transformation, 3.3 Multimodal fusion: 3.3.1 RGB modality, 3.3.2 LiDAR modality, 3.3.3 Fusion step, 3.3.4 Decoding part, 3.3.5 Optimization statement); 4 Evaluation (4.1 KITTI dataset, 4.2 Evaluation metric, 4.3 Experimental setup, 4.4 Results, 4.4.1 Discussion); 5 Conclusions; Bibliography.
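    A compact sketch of the two-branch fusion idea is shown below (in PyTorch, as an assumption): each modality is encoded by a small CNN, the feature maps are concatenated, and a decoder produces dense class logits that can be mapped back to the 3D points. Channel counts, the concatenation-based fusion and the choice of a 2D LiDAR representation (e.g. depth plus intensity maps) are illustrative, not the exact architecture of the thesis.

```python
import torch
import torch.nn as nn

def branch(in_ch):
    # Small per-modality encoder: two convolutions, the second halving the resolution.
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    )

class FusionSegNet(nn.Module):
    def __init__(self, n_classes=4):                     # e.g. car, pedestrian, cyclist, background
        super().__init__()
        self.rgb = branch(3)                             # camera image
        self.lidar = branch(2)                           # e.g. projected depth + intensity maps
        self.decode = nn.Sequential(                     # fuse, then upsample back to input size
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, n_classes, 4, stride=2, padding=1),
        )

    def forward(self, rgb, lidar):
        fused = torch.cat([self.rgb(rgb), self.lidar(lidar)], dim=1)   # mid-level feature fusion
        return self.decode(fused)                        # per-pixel class logits

logits = FusionSegNet()(torch.rand(1, 3, 64, 512), torch.rand(1, 2, 64, 512))  # (1, 4, 64, 512)
```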