17 research outputs found

    Image Feature Information Extraction for Interest Point Detection: A Comprehensive Review

    Interest point detection is one of the most fundamental and critical problems in computer vision and image processing. In this paper, we carry out a comprehensive review of image feature information (IFI) extraction techniques for interest point detection. To systematically introduce how existing interest point detection methods extract IFI from an input image, we propose a taxonomy of IFI extraction techniques and, following it, discuss the different types of techniques in turn. Furthermore, we identify the main unresolved issues in existing IFI extraction techniques and cover interest point detection methods that have not been reviewed before. Popular datasets and evaluation standards are described, and the performance of eighteen state-of-the-art approaches is evaluated and discussed. Finally, future research directions for IFI extraction techniques for interest point detection are elaborated.
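    As a concrete illustration of one classic intensity-based IFI extraction technique surveyed in such reviews (Harris corners), the following minimal OpenCV sketch is included for orientation; it is not code from the paper, and the image path is a placeholder.

```python
# Minimal sketch of one classic intensity-based IFI extraction technique
# (Harris corner detection); illustrative only, not code from the paper.
import cv2
import numpy as np

img = cv2.imread("scene.png")  # placeholder image path
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

# Harris response measures corner strength from the local structure tensor.
response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)

# Keep pixels whose response exceeds 1% of the maximum as interest points.
corners = np.argwhere(response > 0.01 * response.max())
print(f"Detected {len(corners)} interest points")
```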

    Robustness of multimodal 3D object detection using deep learning approach for autonomous vehicles

    In this thesis, we study the robustness of a multimodal 3D object detection model in the context of autonomous vehicles. Self-driving cars need to accurately detect and localize pedestrians and other vehicles in their 3D surroundings in order to drive on roads safely. Robustness is one of the most critical properties of an algorithm for the self-driving 3D perception problem. Therefore, in this work, we propose a method to evaluate a 3D object detector's robustness. To this end, we trained a representative multimodal 3D object detector on three different datasets and then evaluated the trained models on datasets that we built specifically to assess robustness under diverse weather and lighting conditions. Our method uses two different approaches to build these evaluation datasets: in one, we use artificially corrupted images; in the other, we use real images captured in extreme weather and lighting conditions. To detect objects such as cars and pedestrians in traffic scenes, the multimodal model relies on images and 3D point clouds. Multimodal approaches to 3D object detection exploit different sensors, such as cameras and range detectors, to detect objects of interest in the surrounding environment. We leveraged three well-known datasets in the domain of autonomous driving, namely KITTI, nuScenes, and Waymo. We conducted extensive experiments to investigate the proposed robustness-evaluation method and provide quantitative and qualitative results. We observed that the proposed method can measure the robustness of the model effectively.
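    One of the two dataset-building approaches above relies on artificially corrupted images. A minimal sketch of that idea follows, with simple illustrative corruptions that stand in for whatever corruption suite the thesis actually uses:

```python
# Sketch of generating artificially corrupted test images for robustness
# evaluation; these simple corruptions are illustrative stand-ins.
import numpy as np

def gaussian_noise(img: np.ndarray, severity: float = 0.1) -> np.ndarray:
    """Simulate sensor noise with zero-mean additive Gaussian noise."""
    noisy = img / 255.0 + np.random.normal(0.0, severity, img.shape)
    return (np.clip(noisy, 0.0, 1.0) * 255).astype(np.uint8)

def low_light(img: np.ndarray, factor: float = 0.4) -> np.ndarray:
    """Simulate a dark scene by scaling pixel intensities down."""
    return (img.astype(np.float32) * factor).astype(np.uint8)

# The trained detector is then evaluated on each corrupted copy of the
# test set and compared against its metrics on the clean images.
```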

    3D Ground Truth Generation Using Pre-Trained Deep Neural Networks

    Training 3D object detectors on publicly available data has been limited to small datasets due to the large amount of effort required to generate annotations. The difficulty of labeling in 3D using 2.5D sensors, such as LIDAR, stems from the high spatial reasoning skills required to deal with occlusion and partial viewpoints. Additionally, current methods for labeling 3D objects are cognitively demanding due to frequent task switching. Reducing both task complexity and the amount of task switching done by annotators is key to reducing the effort and time required to generate 3D bounding box annotations. We therefore seek to reduce the burden on annotators by leveraging existing 3D object detectors built on deep neural networks. This work introduces a novel ground truth generation method that combines human supervision with pre-trained neural networks to generate per-instance 3D point cloud segmentation, 3D bounding boxes, and class annotations. The annotators provide object anchor clicks, which act as seeds to generate instance segmentation results in 3D. The points belonging to each instance are then used to regress object centroids, bounding box dimensions, and object orientations. The deep neural network model used to generate the segmentation masks and bounding box parameters is based on the PointNet architecture. We develop our approach on the KITTI dataset to analyze the quality of the generated ground truth. The neural network model is trained on the KITTI training split, and the 3D bounding box outputs are generated using annotation clicks collected from the validation split. The validation split of the KITTI detection dataset contains 3712 frames of point cloud and image scenes, and it took 16.35 hours to label with the proposed method. Based on these results, our approach is 19 times faster than the latest published 3D object annotation scheme. Additionally, annotators spent less time per object as the number of objects in a scene increased, making the approach very efficient for multi-object labeling. Furthermore, the quality of the 3D bounding boxes generated with the labeling method is compared against the KITTI ground truth. The model performs on par with current state-of-the-art 3D detectors, and the labeling procedure does not negatively impact the quality of the output bounding boxes. Lastly, the proposed scheme is applied to previously unseen data from the Autonomoose self-driving vehicle to demonstrate the generalization capabilities of the network.
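    The anchor-click workflow above can be pictured with a crude sketch: a click seeds a neighborhood of LIDAR points, which the network then refines. The radius-based selection below is a simplification for illustration; the actual method regresses segmentation and box parameters with a PointNet-style network.

```python
# Crude sketch of turning an annotator's anchor click into a seed set of
# LIDAR points; the real pipeline refines this with a PointNet-style net.
import numpy as np

def seed_points(cloud: np.ndarray, click_xyz: np.ndarray,
                radius: float = 2.0) -> np.ndarray:
    """Return indices of points within `radius` meters of the click."""
    dists = np.linalg.norm(cloud - click_xyz, axis=1)
    return np.where(dists < radius)[0]

cloud = np.random.rand(10000, 3) * 50            # placeholder point cloud
idx = seed_points(cloud, np.array([25.0, 25.0, 1.0]))
# The seeded points are what the segmentation network would consume to
# regress the object centroid, box dimensions, and orientation.
```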

    A PhD Dissertation on Road Topology Classification for Autonomous Driving

    Road topology classification is a crucial step toward complete and safe autonomous driving systems. It is logical to think that a thorough understanding of the environment surrounding the ego-vehicle, as happens when a human being makes the decisions at the wheel, is an indispensable condition for advancing toward level 4 or 5 autonomous vehicles. If the driver, whether an autonomous system or a human being, has no access to information about the environment, safety degrades critically and an accident is almost instantaneous, e.g., when a driver falls asleep at the wheel. Throughout this doctoral thesis, we present two deep learning systems that help an autonomous driving system understand its current environment. The first, 3D-Deep and its optimization 3D-Deepest, is a new network architecture for semantic road segmentation that integrates data sources of different types. Road segmentation is vital for an autonomous vehicle, since the road is the surface on which it should drive in 99.9% of cases. The second is an urban intersection classification system using several approaches drawn from metric learning, temporal integration, and synthetic image generation. Safety is a crucial requirement for any autonomous system, and even more so for a driving system. Intersections are among the places in cities where safety is most critical: cars follow intersecting trajectories and can therefore collide, and most intersections are used by pedestrians to cross the road whether or not crosswalks exist, which alarmingly increases the risk of run-over accidents and collisions. Combining both systems substantially improves the understanding of the environment and can be considered to increase safety, paving the way in the research toward a fully autonomous vehicle.
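    Since the intersection classifier builds on metric learning, a minimal PyTorch sketch of a triplet-margin training step may help fix the idea; the embedder, input sizes, and data below are placeholders, not the thesis's 3D-Deep or intersection models.

```python
# Minimal metric-learning sketch (triplet margin loss); the embedder and
# data are placeholders, not the architectures from the thesis.
import torch
import torch.nn as nn

embedder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))
loss_fn = nn.TripletMarginLoss(margin=1.0)
opt = torch.optim.Adam(embedder.parameters(), lr=1e-4)

# Anchor and positive share an intersection type; negative differs.
anchor, positive, negative = (torch.randn(8, 3, 64, 64) for _ in range(3))
loss = loss_fn(embedder(anchor), embedder(positive), embedder(negative))
opt.zero_grad()
loss.backward()
opt.step()
# At inference time, an intersection is classified by nearest-neighbor
# lookup in the learned embedding space.
```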

    Deep reinforcement learning for multi-modal embodied navigation

    This work focuses on an Outdoor Micro-Navigation (OMN) task in which the goal is to navigate to a specified street address using multiple modalities, including images, scene text, and GPS. This task is a significant challenge for many Blind and Visually Impaired (BVI) people, which we demonstrate through interviews and market research, and we scope the problem definition to their needs. To investigate the feasibility of solving this task with Deep Reinforcement Learning (DRL), we first introduce two partially observable grid-worlds, Grid-Street and Grid City, containing houses, street numbers, and navigable regions. In these environments, we train an agent to find specific houses using local observations under a variety of training procedures. We parameterize our agent with a neural network and train it using reinforcement learning methods. Next, we introduce the Sidewalk Environment for Visual Navigation (SEVN), which contains panoramic images with labels for house numbers, doors, and street name signs, together with formulations for several navigation tasks. In SEVN, we train another neural network model using Proximal Policy Optimization (PPO) to fuse multi-modal observations in the form of variable-resolution images, visible text, and simulated GPS data, and to use this representation to navigate to goal doors. Our best model used all available modalities and navigated to over 100 goals with an 85% success rate. Models with access to only a subset of these modalities performed significantly worse, supporting the need for a multi-modal approach to the OMN task. We hope that this thesis provides a foundation for further research into agents that can help members of the BVI community navigate the world safely.
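    The multi-modal fusion described above can be sketched as a small policy network that concatenates per-modality embeddings before producing action logits for PPO; all dimensions and encoders below are invented for illustration and differ from SEVN's actual model.

```python
# Schematic sketch of fusing image, scene-text, and GPS observations in a
# navigation policy; dimensions and encoders are invented for illustration.
import torch
import torch.nn as nn

class FusionPolicy(nn.Module):
    def __init__(self, n_actions: int = 4):
        super().__init__()
        self.img_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 84 * 84, 256))
        self.txt_enc = nn.Linear(64, 32)   # embedded visible scene text
        self.gps_enc = nn.Linear(2, 16)    # relative goal coordinates
        self.head = nn.Linear(256 + 32 + 16, n_actions)

    def forward(self, img, txt, gps):
        fused = torch.cat([self.img_enc(img), self.txt_enc(txt),
                           self.gps_enc(gps)], dim=-1)
        return self.head(fused)            # action logits for PPO

policy = FusionPolicy()
logits = policy(torch.randn(1, 3, 84, 84), torch.randn(1, 64),
                torch.randn(1, 2))
```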