2,072 research outputs found

    Reasoning From Point Clouds

    Get PDF
    Over the past two years, 3D object detection has been a major focus across industry and academia, largely because of the difficulty of learning from point cloud data. While camera images have a fixed size and can therefore be processed directly with convolutions, point clouds are unstructured sets of points in three dimensions: there is no fixed number of features and no regular grid on which to run convolution. Researchers have developed many ways of learning from such data, but there is no clear consensus on the best method, as each has advantages and disadvantages. For this project, I focused on understanding and implementing VoxelNet, a voxel-based method for object detection from point cloud data. I used the VoxelNet architecture to detect objects in the surrounding environment and to produce 3D bounding boxes around them. I trained the models on the Waymo Open Dataset and then measured performance in the Carla simulator. The goal of training on the Waymo Open Dataset was to gain experience with the dataset and familiarity with its features, and then to evaluate the practicality of the Carla simulator by running a model trained on real-world data inside it.
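    The voxelization step referred to above can be illustrated with a minimal sketch: points are clipped to a detection range, binned into a regular 3D grid, and capped at a fixed number of points per voxel, which is the form a VoxelNet-style feature-encoding layer consumes. The grid resolution, range, and per-voxel cap below are illustrative assumptions, not the values used in this project.

```python
import numpy as np

def voxelize(points, voxel_size=(0.2, 0.2, 0.4),
             pc_range=(-40.0, -40.0, -3.0, 40.0, 40.0, 1.0),
             max_points_per_voxel=35):
    """Group an (N, 3+) point cloud into a sparse set of voxels.

    Returns a dict mapping integer voxel indices (ix, iy, iz) to an array of
    at most `max_points_per_voxel` points, which a VoxelNet-style feature
    encoder could then consume.
    """
    pts = np.asarray(points, dtype=np.float32)
    lo = np.array(pc_range[:3], dtype=np.float32)
    hi = np.array(pc_range[3:], dtype=np.float32)
    size = np.array(voxel_size, dtype=np.float32)

    # Keep only points inside the detection range.
    mask = np.all((pts[:, :3] >= lo) & (pts[:, :3] < hi), axis=1)
    pts = pts[mask]

    # Integer voxel coordinates for each remaining point.
    coords = np.floor((pts[:, :3] - lo) / size).astype(np.int32)

    voxels = {}
    for point, coord in zip(pts, coords):
        bucket = voxels.setdefault(tuple(coord), [])
        if len(bucket) < max_points_per_voxel:  # overflow points are simply dropped here
            bucket.append(point)
    return {k: np.stack(v) for k, v in voxels.items()}
```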

    PV-SSD: A Projection and Voxel-based Double Branch Single-Stage 3D Object Detector

    Full text link
    LiDAR-based 3D object detection and classification is crucial for autonomous driving. However, real-time inference from extremely sparse 3D data poses a formidable challenge. A common approach is to project point clouds onto a bird's-eye or perspective view, effectively converting them into an image-like data format; however, this excessive compression of point cloud data often leads to a loss of information. This paper proposes a 3D object detector based on voxel and projection double-branch feature extraction (PV-SSD) to address this information loss. We add a voxel feature input containing rich local semantic information, which is fully fused with the projected features during feature extraction to reduce the local information lost through projection. Good performance is achieved compared to previous work. In addition, this paper makes the following contributions: 1) a voxel feature extraction method with variable receptive fields is proposed; 2) a weighted feature-point sampling method is used to select the feature points most conducive to the detection task; 3) the MSSFA module is proposed, building on the SSFA module. To verify the effectiveness of our method, we conducted comparison experiments.
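    As a rough illustration of the projection branch described above, the following sketch rasterizes a LiDAR point cloud into a bird's-eye-view pseudo-image. The channel choices (max height, max intensity, log point density) and the 0.1 m cell size are assumptions for the example, not PV-SSD's actual configuration.

```python
import numpy as np

def bev_pseudo_image(points, resolution=0.1,
                     x_range=(0.0, 70.0), y_range=(-40.0, 40.0)):
    """Project an (N, 4) LiDAR cloud (x, y, z, intensity) onto a BEV grid.

    Produces a 3-channel pseudo-image: max height, max intensity, and
    log point density per cell.
    """
    x, y, z, intensity = points[:, 0], points[:, 1], points[:, 2], points[:, 3]
    keep = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    x, y, z, intensity = x[keep], y[keep], z[keep], intensity[keep]

    h = int((x_range[1] - x_range[0]) / resolution)
    w = int((y_range[1] - y_range[0]) / resolution)
    rows = ((x - x_range[0]) / resolution).astype(np.int32)
    cols = ((y - y_range[0]) / resolution).astype(np.int32)

    bev = np.zeros((3, h, w), dtype=np.float32)
    np.maximum.at(bev[0], (rows, cols), z)          # max height per cell (zero-initialized grid clamps heights below 0 m)
    np.maximum.at(bev[1], (rows, cols), intensity)  # max intensity per cell
    np.add.at(bev[2], (rows, cols), 1.0)            # point count per cell
    bev[2] = np.log1p(bev[2])                       # compress the density channel
    return bev
```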

    Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving

    Full text link
    Talk2BEV is a large vision-language model (LVLM) interface for bird's-eye view (BEV) maps in autonomous driving contexts. While existing perception systems for autonomous driving have largely focused on a pre-defined (closed) set of object categories and driving scenarios, Talk2BEV blends recent advances in general-purpose language and vision models with BEV-structured map representations, eliminating the need for task-specific models. This enables a single system to serve a variety of autonomous driving tasks encompassing visual and spatial reasoning, predicting the intents of traffic actors, and decision-making based on visual cues. We extensively evaluate Talk2BEV on a large number of scene understanding tasks that rely both on the ability to interpret free-form natural language queries and on grounding these queries in the visual context embedded in the language-enhanced BEV map. To enable further research in LVLMs for autonomous driving, we develop and release Talk2BEV-Bench, a benchmark encompassing 1000 human-annotated BEV scenarios, with more than 20,000 questions and ground-truth responses from the NuScenes dataset. Project page: https://llmbev.github.io/talk2bev
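    A minimal sketch of what a language-enhanced BEV representation could look like in code, assuming a hypothetical schema (object id, category, ego-frame center, free-form caption) rather than Talk2BEV's actual format: the geometry supports spatial grounding of a query, while the captions carry the open-vocabulary description an LVLM can reason over.

```python
import json
import math

# Hypothetical language-enhanced BEV map: each entry pairs BEV geometry
# (ego-frame position in metres) with a free-form caption that an LVLM
# could have produced for the corresponding image crop.
bev_objects = [
    {"id": 1, "category": "vehicle", "center": [12.4, -2.1],
     "caption": "white SUV slowing down with brake lights on"},
    {"id": 2, "category": "pedestrian", "center": [6.8, 3.5],
     "caption": "person pushing a stroller toward the crosswalk"},
]

def objects_within(bev, radius_m):
    """Ground a spatial query ('what is near the ego vehicle?') in BEV geometry."""
    return [o for o in bev if math.hypot(*o["center"]) <= radius_m]

def to_prompt_context(bev):
    """Serialize the map so it can be placed into an LVLM prompt as context."""
    return json.dumps(bev, indent=2)

if __name__ == "__main__":
    print(to_prompt_context(objects_within(bev_objects, 10.0)))
```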

    ์‹ค์‹œ๊ฐ„ ์ž์œจ์ฃผํ–‰ ์ธ์ง€ ์‹œ์Šคํ…œ์„ ์œ„ํ•œ ์‹ ๊ฒฝ ๋„คํŠธ์›Œํฌ์™€ ๊ตฐ์ง‘ํ™” ๊ธฐ๋ฐ˜ ๋ฏธํ•™์Šต ๋ฌผ์ฒด ๊ฐ์ง€๊ธฐ ํ†ตํ•ฉ

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ๊ธฐ๊ณ„ํ•ญ๊ณต๊ณตํ•™๋ถ€, 2020. 8. ์ด๊ฒฝ์ˆ˜.์ตœ๊ทผ ๋ช‡ ๋…„๊ฐ„, ์„ผ์„œ ๊ธฐ์ˆ ์˜ ๋ฐœ์ „๊ณผ ์ปดํ“จํ„ฐ ๊ณตํ•™ ๋ถ„์•ผ์˜ ์„ฑ๊ณผ๋“ค๋กœ ์ธํ•˜์—ฌ ์ž์œจ์ฃผํ–‰ ์—ฐ๊ตฌ๊ฐ€ ๋”์šฑ ํ™œ๋ฐœํ•ด์ง€๊ณ  ์žˆ๋‹ค. ์ž์œจ์ฃผํ–‰ ์‹œ์Šคํ…œ์— ์žˆ์–ด์„œ ์ฐจ๋Ÿ‰ ์ฃผ๋ณ€ ํ™˜๊ฒฝ์„ ์ธ์‹ํ•˜๋Š” ๊ฒƒ์€ ์•ˆ์ „ ๋ฐ ์‹ ๋ขฐ์„ฑ ์žˆ๋Š” ์ฃผํ–‰์„ ํ•˜๊ธฐ ์œ„ํ•ด ํ•„์š”ํ•œ ๊ฐ€์žฅ ์ค‘์š”ํ•œ ๊ธฐ๋Šฅ์ด๋‹ค. ์ž์œจ์ฃผํ–‰ ์‹œ์Šคํ…œ์€ ํฌ๊ฒŒ ์ธ์ง€, ํŒ๋‹จ, ์ œ์–ด๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋Š”๋ฐ, ์ธ์ง€ ๋ชจ๋“ˆ์€ ์ž์œจ์ฃผํ–‰ ์ฐจ๋Ÿ‰์ด ๊ฒฝ๋กœ๋ฅผ ์„ค์ •ํ•˜๊ณ  ํŒ๋‹จ, ์ œ์–ด๋ฅผ ํ•จ์— ์•ž์„œ ์ฃผ๋ณ€ ๋ฌผ์ฒด์˜ ์œ„์น˜์™€ ์›€์ง์ž„์„ ํŒŒ์•…ํ•ด์•ผํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ค‘์š”ํ•œ ์ •๋ณด๋ฅผ ์ œ๊ณตํ•œ๋‹ค. ์ž์œจ์ฃผํ–‰ ์ธ์ง€ ๋ชจ๋“ˆ์€ ์ฃผํ–‰ ํ™˜๊ฒฝ์„ ํŒŒ์•…ํ•˜๊ธฐ ์œ„ํ•ด ๋‹ค์–‘ํ•œ ์„ผ์„œ๊ฐ€ ์‚ฌ์šฉ๋œ๋‹ค. ๊ทธ ์ค‘์—์„œ๋„ LiDAR์€ ํ˜„์žฌ ๋งŽ์€ ์ž์œจ์ฃผํ–‰ ์—ฐ๊ตฌ์—์„œ ๊ฐ€์žฅ ๋„๋ฆฌ ์‚ฌ์šฉ๋˜๋Š” ์„ผ์„œ ์ค‘ ํ•˜๋‚˜๋กœ, ๋ฌผ์ฒด์˜ ๊ฑฐ๋ฆฌ ์ •๋ณด ํš๋“์— ์žˆ์–ด์„œ ๋งค์šฐ ์œ ์šฉํ•˜๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” LiDAR์—์„œ ์ƒ์„ฑ๋˜๋Š” ํฌ์ธํŠธ ํด๋ผ์šฐ๋“œ raw ๋ฐ์ดํ„ฐ๋ฅผ ํ™œ์šฉํ•˜์—ฌ ์žฅ์• ๋ฌผ์˜ 3D ์ •๋ณด๋ฅผ ํŒŒ์•…ํ•˜๊ณ  ์ด๋“ค์„ ์ถ”์ ํ•˜๋Š” ์ธ์ง€ ๋ชจ๋“ˆ์„ ์ œ์•ˆํ•œ๋‹ค. ์ธ์ง€ ๋ชจ๋“ˆ์˜ ์ „์ฒด ํ”„๋ ˆ์ž„์›Œํฌ๋Š” ํฌ๊ฒŒ ์„ธ ๋‹จ๊ณ„๋กœ ๊ตฌ์„ฑ๋œ๋‹ค. 1๋‹จ๊ณ„๋Š” ๋น„์ง€๋ฉด ํฌ์ธํŠธ ์ถ”์ •์„ ์œ„ํ•œ ๋งˆ์Šคํฌ ์ƒ์„ฑ, 2๋‹จ๊ณ„๋Š” ํŠน์ง• ์ถ”์ถœ ๋ฐ ์žฅ์• ๋ฌผ ๊ฐ์ง€, 3๋‹จ๊ณ„๋Š” ์žฅ์• ๋ฌผ ์ถ”์ ์œผ๋กœ ๊ตฌ์„ฑ๋œ๋‹ค. ํ˜„์žฌ ๋Œ€๋ถ€๋ถ„์˜ ์‹ ๊ฒฝ๋ง ๊ธฐ๋ฐ˜์˜ ๋ฌผ์ฒด ํƒ์ง€๊ธฐ๋Š” ์ง€๋„ํ•™์Šต์„ ํ†ตํ•ด ํ•™์Šต๋œ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ง€๋„ํ•™์Šต ๊ธฐ๋ฐ˜ ์žฅ์• ๋ฌผ ํƒ์ง€๊ธฐ๋Š” ํ•™์Šตํ•œ ์žฅ์• ๋ฌผ์„ ์ฐพ๋Š”๋‹ค๋Š” ๋ฐฉ๋ฒ•๋ก ์  ํ•œ๊ณ„๋ฅผ ์ง€๋‹ˆ๊ณ  ์žˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์‹ค์ œ ์ฃผํ–‰์ƒํ™ฉ์—์„œ๋Š” ๋ฏธ์ฒ˜ ํ•™์Šตํ•˜์ง€ ๋ชปํ•œ ๋ฌผ์ฒด๋ฅผ ๋งˆ์ฃผํ•˜๊ฑฐ๋‚˜ ์‹ฌ์ง€์–ด ํ•™์Šตํ•œ ๋ฌผ์ฒด๋„ ๋†“์น  ์ˆ˜ ์žˆ๋‹ค. ์ธ์ง€ ๋ชจ๋“ˆ์˜ 1๋‹จ๊ณ„์—์„œ ์ด๋Ÿฌํ•œ ์ง€๋„ํ•™์Šต์˜ ๋ฐฉ๋ฒ•๋ก ์  ํ•œ๊ณ„์— ๋Œ€์ฒ˜ํ•˜๊ธฐ ์œ„ํ•ด ํฌ์ธํŠธ ํด๋ผ์šฐ๋“œ๋ฅผ ์ผ์ •ํ•œ ๊ฐ„๊ฒฉ์œผ๋กœ ๊ตฌ์„ฑ๋œ 3D ๋ณต์…€(voxel)๋กœ ๋ถ„ํ• ํ•˜๊ณ , ์ด๋กœ๋ถ€ํ„ฐ ๋น„์ ‘์ง€์ ๋“ค์„ ์ถ”์ถœํ•œ ๋’ค ๋ฏธ์ง€์˜ ๋ฌผ์ฒด(Unknown object)๋ฅผ ํƒ์ง€ํ•œ๋‹ค. 2๋‹จ๊ณ„์—์„œ๋Š” ๊ฐ ๋ณต์…€์˜ ํŠน์„ฑ์„ ์ถ”์ถœ ๋ฐ ํ•™์Šตํ•˜๊ณ  ๋„คํŠธ์›Œํฌ๋ฅผ ํ•™์Šต์‹œํ‚ด์œผ๋กœ์จ ๊ฐ์ฒด ๊ฐ์ง€๊ธฐ๋ฅผ ๊ตฌ์„ฑํ•œ๋‹ค. ๋งˆ์ง€๋ง‰ 3๋‹จ๊ณ„์—์„œ๋Š” ์นผ๋งŒ ํ•„ํ„ฐ์™€ ํ—๊ฐ€๋ฆฌ์•ˆ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ํ™œ์šฉํ•œ ๋‹ค์ค‘ ๊ฐ์ฒด ํƒ์ง€๊ธฐ๋ฅผ ์ œ์•ˆํ•œ๋‹ค. ์ด๋ ‡๊ฒŒ ๊ตฌ์„ฑ๋œ ์ธ์ง€ ๋ชจ๋“ˆ์€ ๋น„์ง€๋ฉด ์ ๋“ค์„ ์ถ”์ถœํ•˜์—ฌ ํ•™์Šตํ•˜์ง€ ์•Š์€ ๋ฌผ์ฒด์— ๋Œ€ํ•ด์„œ๋„ ๋ฏธ์ง€์˜ ๋ฌผ์ฒด(Unknown object)๋กœ ๊ฐ์ง€ํ•˜์—ฌ ์‹ค์‹œ๊ฐ„์œผ๋กœ ์žฅ์• ๋ฌผ ํƒ์ง€๊ธฐ๋ฅผ ๋ณด์™„ํ•œ๋‹ค. ์ตœ๊ทผ ๋ผ์ด๋‹ค๋ฅผ ํ™œ์šฉํ•œ ์ž์œจ์ฃผํ–‰ ์šฉ ๊ฐ์ฒด ํƒ์ง€๊ธฐ์— ๋Œ€ํ•œ ์—ฐ๊ตฌ๊ฐ€ ํ™œ๋ฐœํžˆ ์ง„ํ–‰๋˜๊ณ  ์žˆ์œผ๋‚˜ ๋Œ€๋ถ€๋ถ„์˜ ์—ฐ๊ตฌ๋“ค์€ ๋‹จ์ผ ํ”„๋ ˆ์ž„์˜ ๋ฌผ์ฒด ์ธ์‹์— ๋Œ€ํ•ด ์ง‘์ค‘ํ•˜์—ฌ ์ •ํ™•๋„๋ฅผ ์˜ฌ๋ฆฌ๋Š” ๋ฐ ์ง‘์ค‘ํ•˜๊ณ  ์žˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ด๋Ÿฌํ•œ ์—ฐ๊ตฌ๋Š” ๊ฐ์ง€ ์ค‘์š”๋„์™€ ํ”„๋ ˆ์ž„ ๊ฐ„์˜ ๊ฐ์ง€ ์—ฐ์†์„ฑ ๋“ฑ์— ๋Œ€ํ•œ ๊ณ ๋ ค๊ฐ€ ๋˜์–ด์žˆ์ง€ ์•Š๋‹ค๋Š” ํ•œ๊ณ„์ ์ด ์กด์žฌํ•œ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์‹ค์‹œ๊ฐ„ ์„ฑ๋Šฅ์„ ์–ป๊ธฐ ์œ„ํ•ด ์ด๋Ÿฌํ•œ ๋ถ€๋ถ„์„ ๊ณ ๋ คํ•œ ์„ฑ๋Šฅ ์ง€์ˆ˜๋ฅผ ์ œ์•ˆํ•˜๊ณ , ์‹ค์ฐจ ์‹คํ—˜์„ ํ†ตํ•ด ์ œ์•ˆํ•œ ์ธ์ง€ ๋ชจ๋“ˆ์„ ํ…Œ์ŠคํŠธ, ์ œ์•ˆํ•œ ์„ฑ๋Šฅ ์ง€์ˆ˜๋ฅผ ํ†ตํ•ด ํ‰๊ฐ€ํ•˜์˜€๋‹ค.In recent few years, the interest in automotive researches on autonomous driving system has been grown up due to advances in sensing technologies and computer science. In the development of autonomous driving system, knowledge about the subject vehicles surroundings is the most essential function for safe and reliable driving. 
When it comes to making decisions and planning driving scenarios, to know the location and movements of surrounding objects and to distinguish whether an object is a car or pedestrian give valuable information to the autonomous driving system. In the autonomous driving system, various sensors are used to understand the surrounding environment. Since LiDAR gives the distance information of surround objects, it has been the one of the most commonly used sensors in the development of perception system. Despite achievement of the deep neural network research field, its application and research trends on 3D object detection using LiDAR point cloud tend to pursue higher accuracy without considering a practical application. A deep neural-network-based perception module heavily depends on the training dataset, but it is impossible to cover all the possibilities and corner cases. To apply the perception module in actual driving, it needs to detect unknown objects and unlearned objects, which may face on the road. To cope with these problems, in this dissertation, a perception module using LiDAR point cloud is proposed, and its performance is validated via real vehicle test. The whole framework is composed of three stages : stage-1 for the ground estimation playing as a mask for point filtering which are considered as non-ground and stage-2 for feature extraction and object detection, and stage-3 for object tracking. In the first stage, to cope with the methodological limit of supervised learning that only finds learned object, we divide a point cloud into equally spaced 3D voxels the point cloud and extract non-ground points and cluster the points to detect unknown objects. In the second stage, the voxelization is utilized to learn the characteristics of point clouds organized in vertical columns. The trained network can distinguish the object through the extracted features from point clouds. In non-maximum suppression process, we sort the predictions according to IoU between prediction and polygon to select a prediction close to the actual heading angle of the object. The last stage presents a 3D multiple object tracking solution. Through Kalman filter, the learned and unlearned objects next movement is predicted and this prediction updated by measurement detection. Through this process, the proposed object detector complements the detector based on supervised learning by detecting the unlearned object as an unknown object through non-ground point extraction. Recent researches on object detection for autonomous driving have been actively conducted, but recent works tend to focus more on the recognition of the objects at every single frame and developing accurate system. To obtain a real-time performance, this paper focuses on more practical aspects by propose a performance index considering detection priority and detection continuity. The performance of the proposed algorithm has been investigated via real-time vehicle test.Chapter 1 Introduction 1 1.1. Background and Motivation 1 1.2. Overview and Previous Researches 4 1.3. Thesis Objectives 12 1.4. Thesis Outline 14 Chapter 2 Overview of a Perception in Automated Driving 15 Chapter 3 Object Detector 18 3.1. Voxelization & Feature Extraction 22 3.2. Backbone Network 25 3.3. Detection Head & Loss Function Design 28 3.4. Loss Function Design 30 3.5. Data Augmentation 33 3.6. Post Process 39 Chapter 4 Non-Ground Point Clustering 42 4.1. Previous Researches for Ground Removal 44 4.2. Non-Ground Estimation using Voxelization 45 4.3. 
Non-ground Object Segmentation 50 4.3.1. Object Clustering 52 4.3.2. Bounding Polygon 55 Chapter 5 . Object Tracking 57 5.1. State Prediction and Update 58 5.2. Data Matching Association 60 Chapter 6 Test result for KITTI dataset 62 6.1. Quantitative Analysis 62 6.2. Qualitative Analysis 72 6.3. Additional Training 76 6.3.1. Additional data acquisition 78 6.3.2. Qualitative Analysis 81 Chapter 7 Performance Evaluation 85 7.1. Current Evaluation Metrics 85 7.2. Limitations of Evaluation Metrics 87 7.2.1. Detection Continuity 87 7.2.2. Detection Priority 89 7.3. Criteria for Performance Index 91 Chapter 8 Vehicle Tests based Performance Evaluation 95 8.1. Configuration of Vehicle Tests 95 8.2. Qualitative Analysis 100 8.3. Quantitative Analysis 105 Chapter 9 Conclusions and Future Works 107 Bibliography 109 ๊ตญ๋ฌธ ์ดˆ๋ก 114Docto
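    The tracking stage described above relies on a Kalman filter for prediction and the Hungarian algorithm for data association. The sketch below shows only the association step, matching predicted track boxes to detections with an IoU cost solved by scipy's linear_sum_assignment; the axis-aligned BEV box format and the 0.3 gating threshold are assumptions for illustration, not the dissertation's exact setup.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_2d(a, b):
    """IoU of two axis-aligned BEV boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(predicted_boxes, detected_boxes, min_iou=0.3):
    """Match Kalman-predicted track boxes to current detections (Hungarian).

    Returns (matches, unmatched_tracks, unmatched_detections); unmatched
    detections would seed new tracks, e.g. for previously unseen objects.
    """
    if len(predicted_boxes) == 0 or len(detected_boxes) == 0:
        return [], list(range(len(predicted_boxes))), list(range(len(detected_boxes)))

    # Cost = 1 - IoU, so the optimal assignment maximizes total overlap.
    cost = np.array([[1.0 - iou_2d(p, d) for d in detected_boxes]
                     for p in predicted_boxes])
    rows, cols = linear_sum_assignment(cost)

    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= 1.0 - min_iou]
    matched_r = {r for r, _ in matches}
    matched_c = {c for _, c in matches}
    unmatched_tracks = [i for i in range(len(predicted_boxes)) if i not in matched_r]
    unmatched_dets = [j for j in range(len(detected_boxes)) if j not in matched_c]
    return matches, unmatched_tracks, unmatched_dets
```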

    On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

    Full text link
    The pursuit of autonomous driving technology hinges on the sophisticated integration of perception, decision-making, and control systems. Traditional approaches, both data-driven and rule-based, have been hindered by their inability to grasp the nuances of complex driving environments and the intentions of other road users. This has been a significant bottleneck, particularly in the development of the common-sense reasoning and nuanced scene understanding necessary for safe and reliable autonomous driving. The advent of Visual Language Models (VLM) represents a novel frontier in realizing fully autonomous vehicle driving. This report provides an exhaustive evaluation of the latest state-of-the-art VLM, GPT-4V(ision), and its application in autonomous driving scenarios. We explore the model's abilities to understand and reason about driving scenes, make decisions, and ultimately act in the capacity of a driver. Our comprehensive tests span from basic scene recognition to complex causal reasoning and real-time decision-making under varying conditions. Our findings reveal that GPT-4V demonstrates superior performance in scene understanding and causal reasoning compared to existing autonomous systems. It shows the potential to handle out-of-distribution scenarios, recognize intentions, and make informed decisions in real driving contexts. However, challenges remain, particularly in direction discernment, traffic light recognition, vision grounding, and spatial reasoning tasks. These limitations underscore the need for further research and development. The project is available on GitHub for interested parties to access and use: https://github.com/PJLab-ADG/GPT4V-AD-Exploration
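    For readers who want to probe a vision-language model on driving scenes in a similar spirit, a query can be issued along the following lines through the OpenAI Python client. The model name, prompt, and image URL are placeholders, and this is not the report's actual evaluation harness.

```python
from openai import OpenAI  # assumes the official openai Python package (>= 1.x)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def describe_driving_scene(image_url: str, question: str, model: str = "gpt-4o") -> str:
    """Ask a vision-language model a free-form question about a driving scene."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    return response.choices[0].message.content

# Example (hypothetical URL):
# print(describe_driving_scene("https://example.com/front_camera.jpg",
#                              "Is it safe to change into the left lane? Explain."))
```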
    • โ€ฆ
    corecore