
    Integration of a Neural Network and a Clustering-Based Unlearned-Object Detector for a Real-Time Autonomous Driving Perception System

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Mechanical and Aerospace Engineering, August 2020. Advisor: 이경수.
    In recent years, interest in research on autonomous driving systems has grown due to advances in sensing technology and computer science. In the development of an autonomous driving system, knowledge of the subject vehicle's surroundings is the most essential requirement for safe and reliable driving. When making decisions and planning driving scenarios, knowing the location and movement of surrounding objects, and distinguishing whether an object is a car or a pedestrian, gives valuable information to the autonomous driving system. Various sensors are used to understand the surrounding environment; because LiDAR provides the distance to surrounding objects, it has become one of the most commonly used sensors in the development of perception systems. Despite the achievements of the deep neural network research field, applications and research trends in 3D object detection using LiDAR point clouds tend to pursue higher accuracy without considering practical application. A deep-neural-network-based perception module depends heavily on its training dataset, but it is impossible for a dataset to cover all possibilities and corner cases. To be applied in actual driving, the perception module must detect unknown, unlearned objects that may be encountered on the road. To cope with these problems, this dissertation proposes a perception module using LiDAR point clouds and validates its performance through real-vehicle tests. The whole framework is composed of three stages: stage 1 estimates the ground, producing a mask that filters out the points considered non-ground; stage 2 performs feature extraction and object detection; and stage 3 performs object tracking. In the first stage, to cope with the methodological limit of supervised learning, which finds only learned objects, the point cloud is divided into equally spaced 3D voxels, non-ground points are extracted, and the points are clustered to detect unknown objects. In the second stage, voxelization is used to learn the characteristics of point clouds organized in vertical columns.
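    As an illustration of the first stage, the following Python sketch voxelizes a point cloud into vertical columns, takes the lowest point in each column as a rough ground estimate, keeps the points above a height margin, and clusters the remainder into unknown-object candidates. The per-column minimum-height ground criterion, the DBSCAN clustering, and all thresholds are illustrative assumptions, not the thesis's exact method.

        # Stage-1 sketch: voxel columns -> non-ground points -> unknown-object clusters.
        import numpy as np
        from sklearn.cluster import DBSCAN

        def non_ground_clusters(points, voxel=0.2, height_margin=0.3, eps=0.5):
            """points: (N, 3) array of x, y, z in the vehicle frame."""
            cols = np.floor(points[:, :2] / voxel).astype(np.int64)   # column index per point
            keys, inv = np.unique(cols, axis=0, return_inverse=True)
            # Approximate the ground height in each column by its lowest point.
            ground = np.full(len(keys), np.inf)
            np.minimum.at(ground, inv, points[:, 2])
            # Keep points sufficiently above their column's ground estimate.
            non_ground = points[points[:, 2] > ground[inv] + height_margin]
            if len(non_ground) == 0:
                return []
            # Each spatial cluster of non-ground points is an unknown-object candidate.
            labels = DBSCAN(eps=eps, min_samples=5).fit_predict(non_ground[:, :2])
            return [non_ground[labels == k] for k in set(labels) if k != -1]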
    The trained network distinguishes objects through the features extracted from the point clouds. In the non-maximum suppression step, predictions are sorted by the IoU between each prediction and the bounding polygon, so that a prediction close to the object's actual heading angle is selected. The last stage presents a 3D multiple-object tracking solution based on a Kalman filter and the Hungarian algorithm (sketched after the contents listing below): the filter predicts the next movement of both learned and unlearned objects, and the prediction is updated with the detection measurements. Through this process, the proposed detector complements the supervised-learning-based detector by reporting unlearned objects as unknown objects via non-ground point extraction. Research on object detection for autonomous driving has recently been very active, but most works focus on recognizing objects in each single frame and on raising accuracy. To obtain real-time performance, this dissertation focuses on more practical aspects by proposing a performance index that considers detection priority and detection continuity. The performance of the proposed algorithm has been investigated through real-time vehicle tests.
    Contents: 1. Introduction (Background and Motivation; Overview and Previous Researches; Thesis Objectives; Thesis Outline). 2. Overview of Perception in Automated Driving. 3. Object Detector (Voxelization and Feature Extraction; Backbone Network; Detection Head; Loss Function Design; Data Augmentation; Post Process). 4. Non-Ground Point Clustering (Previous Researches for Ground Removal; Non-Ground Estimation using Voxelization; Non-Ground Object Segmentation: Object Clustering, Bounding Polygon). 5. Object Tracking (State Prediction and Update; Data Association). 6. Test Results on the KITTI Dataset (Quantitative Analysis; Qualitative Analysis; Additional Training: Additional Data Acquisition, Qualitative Analysis). 7. Performance Evaluation (Current Evaluation Metrics; Limitations of Evaluation Metrics: Detection Continuity, Detection Priority; Criteria for Performance Index). 8. Vehicle-Test-Based Performance Evaluation (Configuration of Vehicle Tests; Qualitative Analysis; Quantitative Analysis). 9. Conclusions and Future Works. Bibliography. Abstract in Korean (국문 초록).
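    The stage-3 combination referenced above, Kalman prediction plus Hungarian data association, can be sketched as follows. The constant-velocity state (x, y, vx, vy), the noise scales, and the gating distance are illustrative assumptions rather than the thesis's formulation; track creation and deletion for unmatched detections are omitted for brevity.

        # Stage-3 sketch: predict tracks with a constant-velocity Kalman filter,
        # then match predictions to new detections with the Hungarian algorithm.
        import numpy as np
        from scipy.optimize import linear_sum_assignment

        F = np.array([[1, 0, 1, 0],    # x += vx (dt = 1 frame)
                      [0, 1, 0, 1],
                      [0, 0, 1, 0],
                      [0, 0, 0, 1]], dtype=float)
        H = np.array([[1, 0, 0, 0],    # only position is measured
                      [0, 1, 0, 0]], dtype=float)

        def step(tracks, detections, q=0.1, r=0.5, gate=2.0):
            """tracks: list of (state (4,), covariance (4, 4)); detections: (M, 2)."""
            preds = [(F @ x, F @ P @ F.T + q * np.eye(4)) for x, P in tracks]
            if not preds or len(detections) == 0:
                return preds
            # Cost matrix: distance between predicted and detected positions.
            cost = np.linalg.norm(
                np.array([H @ x for x, _ in preds])[:, None, :] - detections[None], axis=2)
            out = list(preds)
            for i, j in zip(*linear_sum_assignment(cost)):
                if cost[i, j] > gate:                      # reject implausible matches
                    continue
                x, P = preds[i]
                S = H @ P @ H.T + r * np.eye(2)            # innovation covariance
                K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
                out[i] = (x + K @ (detections[j] - H @ x), (np.eye(4) - K @ H) @ P)
            return out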

    Vehicular Instrumentation and Data Processing for the Study of Driver Intent

    The primary goal of this thesis is to provide the processed experimental data needed to determine whether driver intentionality and driving-related actions can be predicted from quantitative and qualitative analysis of driver behaviour. Towards this end, an instrumented experimental vehicle was designed and developed, capable of recording several synchronized streams of data in a naturalistic driving environment: the surroundings of the vehicle, the driver's gaze and head pose, and the vehicle state. Several driving data sequences were recorded with the instrumented vehicle in both urban and rural environments. These sequences were automatically annotated for relevant artifacts such as lanes, vehicles, and safely drivable areas within road lanes. A framework and the associated algorithms required for cross-calibrating the gaze tracking system with the world coordinate system of the outdoor stereo system were also designed and implemented, allowing the driver's gaze to be mapped onto the surrounding environment. This instrumentation is currently being used for the study of driver intent, geared towards the development of driver maneuver prediction models.
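    The cross-calibration described above amounts to expressing a gaze ray, measured in the gaze tracker's coordinate frame, in the world frame anchored at the outdoor stereo system. A minimal sketch, assuming the rigid transform (R, t) between the two frames has already been estimated by the calibration procedure:

        # Map a driver gaze ray from the gaze-tracker frame into the world frame
        # of the outdoor stereo rig, given a calibrated rigid transform (R, t).
        import numpy as np

        def gaze_to_world(origin_g, direction_g, R, t):
            """origin_g, direction_g: (3,) gaze ray in the tracker frame.
            R: (3, 3) rotation, t: (3,) translation, tracker -> world."""
            origin_w = R @ origin_g + t
            direction_w = R @ direction_g
            return origin_w, direction_w / np.linalg.norm(direction_w)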

    Automated 3D model generation for urban environments [online]

    In this thesis, we present a fast approach to the automated generation of textured 3D city models with both high detail at ground level and complete coverage for a bird's-eye view. A ground-based facade model is acquired by driving a vehicle equipped with two 2D laser scanners and a digital camera under normal traffic conditions on public roads. One scanner is mounted horizontally and is used to determine the approximate component of relative motion along the movement of the acquisition vehicle via scan matching; the obtained relative motion estimates are concatenated to form an initial path. Assuming that features such as buildings are visible from both the ground-based and airborne views, this initial path is globally corrected with Monte Carlo Localization techniques using an aerial photograph or a Digital Surface Model as a global map (see the sketch after this abstract). The second scanner is mounted vertically and is used to capture the 3D shape of the building facades. Applying a series of automated processing steps, a texture-mapped 3D facade model is reconstructed from the vertical laser scans and the camera images. To obtain an airborne model containing the roof and terrain shape complementary to the facade model, a Digital Surface Model is created from airborne laser scans, then triangulated, and finally texture-mapped with aerial imagery. Finally, the facade model and the airborne model are fused into a single model usable for both walk-throughs and fly-throughs. The developed algorithms are evaluated on a large data set acquired in downtown Berkeley, and the results are shown and discussed.
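    The global correction step mentioned above can be sketched as a basic Monte Carlo Localization update over path poses. The map-scoring function, the noise scales, and the plain multinomial resampling are generic particle-filter choices, not the thesis's exact pipeline:

        # MCL sketch: propagate pose particles with the scan-matched motion
        # increment, weight them against an aerial global map, and resample.
        import numpy as np

        def mcl_step(particles, odom_delta, score_against_map, noise=(0.1, 0.1, 0.02)):
            """particles: (N, 3) poses (x, y, heading); odom_delta: (dx, dy, dheading)
            in the vehicle frame; score_against_map(poses) -> (N,) likelihoods."""
            n = len(particles)
            dx, dy, dth = odom_delta
            c, s = np.cos(particles[:, 2]), np.sin(particles[:, 2])
            # Motion update: apply the increment in each particle's own frame.
            moved = particles + np.random.randn(n, 3) * noise
            moved[:, 0] += c * dx - s * dy
            moved[:, 1] += s * dx + c * dy
            moved[:, 2] += dth
            # Measurement update: weight by agreement with the global map, resample.
            w = np.asarray(score_against_map(moved), dtype=float)
            w /= w.sum()
            return moved[np.random.choice(n, size=n, p=w)]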

    Lidar-based scene understanding for autonomous driving using deep learning

    With over 1.35 million fatalities related to traffic accidents worldwide, autonomous driving was foreseen at the beginning of this century as a feasible solution for improving safety on our roads. It is also expected to disrupt our transportation paradigm, reducing congestion, pollution, and costs while increasing the accessibility, efficiency, and reliability of transportation for both people and goods. Although some advances have gradually been transferred into commercial vehicles in the form of Advanced Driving Assistance Systems (ADAS), such as adaptive cruise control, blind spot detection, or automatic parking, the technology is far from mature. A full understanding of the scene is needed so that vehicles are aware of their surroundings, knowing the existing elements of the scene as well as their motion, intentions, and interactions. In this PhD dissertation, we explore new approaches for understanding driving scenes from 3D LiDAR point clouds using deep learning methods. To this end, in Part I we analyze the scene from a static perspective, using independent frames to detect the neighboring vehicles. Next, in Part II we develop new ways of understanding the dynamics of the scene. Finally, in Part III we apply all the developed methods to higher-level challenges such as segmenting moving obstacles while obtaining their rigid motion vector over the ground. More specifically, in Chapter 2 we develop a 3D vehicle detection pipeline based on a multi-branch deep learning architecture and propose a front view (FR-V) and a bird's-eye view (BE-V) as 2D representations of the 3D point cloud to serve as input for training our models (see the sketch after this abstract). Later on, in Chapter 3 we apply and further test this method on two real use cases: pre-filtering moving obstacles while creating maps, in order to better localize ourselves on subsequent days, and vehicle tracking. From the dynamic perspective, in Chapter 4 we learn from the 3D point cloud a novel dynamic feature that resembles optical flow from RGB images. For that, we develop a new approach that leverages RGB optical flow as pseudo ground truth for training purposes while allowing the use of only 3D LiDAR data at inference time. Additionally, in Chapter 5 we explore the benefits of combining classification and regression learning problems to face the optical flow estimation task in a joint coarse-and-fine manner. Lastly, in Chapter 6 we gather the previous methods and demonstrate that with these independent tasks we can guide the learning of more challenging problems, such as segmentation and motion estimation of moving vehicles from our own moving perspective.
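    The bird's-eye-view (BE-V) input mentioned for Chapter 2 can be sketched as a projection of the point cloud onto a discretized ground-plane grid. The grid extent, the resolution, and the choice of height, intensity, and density channels are illustrative assumptions:

        # BE-V sketch: project a LiDAR point cloud onto a 2D grid with
        # max-height, max-intensity, and log-density channels per cell.
        import numpy as np

        def bev_image(points, x_range=(0.0, 70.0), y_range=(-35.0, 35.0), res=0.1):
            """points: (N, 4) array of x, y, z, intensity -> (H, W, 3) tensor."""
            m = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
                 (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
            pts = points[m]
            ix = ((pts[:, 0] - x_range[0]) / res).astype(int)
            iy = ((pts[:, 1] - y_range[0]) / res).astype(int)
            h = int((x_range[1] - x_range[0]) / res)
            w = int((y_range[1] - y_range[0]) / res)
            bev = np.zeros((h, w, 3), dtype=np.float32)
            np.maximum.at(bev[:, :, 0], (ix, iy), pts[:, 2])   # max height per cell
            np.maximum.at(bev[:, :, 1], (ix, iy), pts[:, 3])   # max intensity per cell
            np.add.at(bev[:, :, 2], (ix, iy), 1.0)             # point count per cell
            bev[:, :, 2] = np.log1p(bev[:, :, 2])              # compress density range
            return bev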

    Operating cycle representations for road vehicles

    This thesis discusses different ways to represent road transport operations mathematically. The intention is to make more realistic predictions of longitudinal performance measures for road vehicles, such as CO2 emissions. It is argued that a driver- and vehicle-independent description of the relevant transport operations increases the chance that a predicted measure later coincides with the actual measure from the vehicle in its real-world application. This allows fair comparisons between vehicle designs and, by extension, effective product development. Three different levels of representation are introduced, each with its own purpose and application. The first representation, called the bird's-eye view, is a broad, high-level description with few details. It can be used to give a rough picture of the collection of all transport operations that a vehicle executes during its lifetime, and it is primarily useful as a classification system to compare different applications and assess their similarity. The second representation, called the stochastic operating cycle (sOC) format, is a statistical, mid-level description with a moderate amount of detail. It can be used to give a comprehensive statistical picture of transport operations, either individually or as a collection, and it is primarily useful for measuring and reproducing variation in operating conditions, as it describes the physical properties of the road as stochastic processes subject to a hierarchical structure. The third representation, called the deterministic operating cycle (dOC) format, is a physical, low-level description with a great amount of detail. It describes individual operations and contains information about the road, the weather, the traffic, and the mission. It is primarily useful as input to dynamic simulations of longitudinal vehicle dynamics. Furthermore, it is discussed how to build a modular, dynamic simulation model that can use data from the dOC format to predict energy usage. At the top level, the complete model has individual modules for the operating cycle, the driver, and the vehicle. These share information only through the same interfaces as in reality, have no other components in common, and can therefore be modelled separately. Implementations are briefly presented for each module, after which the complete model is showcased in a numerical example. The thesis ends with a discussion, some conclusions, and an outlook on possible ways to continue.
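    As an illustration of how a dOC-style road description can feed a longitudinal prediction, the following sketch integrates the tractive force needed to follow a speed profile over a road with grade. The vehicle parameters and the simple resistance model are illustrative assumptions, not the thesis's simulation model:

        # Integrate tractive energy along a route described by distance, speed,
        # and grade samples (a dOC-like description of road and mission).
        import numpy as np

        def tractive_energy(dist, speed, grade, m=40000.0, cd_a=6.0, cr=0.006, rho=1.2):
            """dist (m), speed (m/s), grade (rad): arrays sampled along the route."""
            g = 9.81
            ds = np.diff(dist)
            v = 0.5 * (speed[:-1] + speed[1:])          # midpoint speed per segment
            a = v * np.diff(speed) / ds                 # dv/dt = v * dv/ds
            f = (m * a                                  # inertial force
                 + m * g * np.sin(grade[:-1])           # grade resistance
                 + m * g * cr * np.cos(grade[:-1])      # rolling resistance
                 + 0.5 * rho * cd_a * v ** 2)           # aerodynamic drag
            return float(np.sum(np.maximum(f, 0.0) * ds))   # joules, traction only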