261 research outputs found
Multi-Modal 3D Object Detection in Autonomous Driving: a Survey
In the past few years, we have witnessed rapid development of autonomous
driving. However, achieving full autonomy remains a daunting task due to the
complex and dynamic driving environment. As a result, self-driving cars are
equipped with a suite of sensors to conduct robust and accurate environment
perception. As the number and type of sensors keep increasing, combining them
for better perception is becoming a natural trend. So far, there has been no
in-depth review focusing on multi-sensor fusion-based perception. To bridge
this gap and motivate future research, this survey reviews recent
fusion-based 3D detection deep learning models that leverage multiple sensor
data sources, especially cameras and LiDARs. In this survey, we first introduce
the background of popular sensors for autonomous cars, including their common
data representations as well as object detection networks developed for each
type of sensor data. Next, we discuss some popular datasets for multi-modal 3D
object detection, with a special focus on the sensor data included in each
dataset. Then we present in-depth reviews of recent multi-modal 3D detection
networks by considering the following three aspects of the fusion: fusion
location, fusion data representation, and fusion granularity. After a detailed
review, we discuss open challenges and point out possible solutions. We hope
that our detailed review can help researchers embark on investigations in the
area of multi-modal 3D object detection.
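The "fusion location" axis of the taxonomy can be illustrated by its two extremes, early (feature-level) and late (decision-level) fusion. The sketch below is a generic illustration with invented shapes and names, not any specific surveyed model:

```python
# Generic sketch of early vs. late camera-LiDAR fusion. All shapes and
# field names are illustrative assumptions, not from any surveyed model.
import numpy as np

def early_fusion(camera_feat: np.ndarray, lidar_feat: np.ndarray) -> np.ndarray:
    """Feature-level fusion: concatenate spatially aligned feature maps
    along the channel axis before the detection head runs."""
    assert camera_feat.shape[:2] == lidar_feat.shape[:2]
    return np.concatenate([camera_feat, lidar_feat], axis=-1)

def late_fusion(camera_dets: list, lidar_dets: list) -> list:
    """Decision-level fusion: pool per-sensor detections and rank them by
    confidence (a stand-in for cross-modal non-maximum suppression)."""
    return sorted(camera_dets + lidar_dets, key=lambda d: d["score"], reverse=True)

fused = early_fusion(np.zeros((64, 64, 3)), np.zeros((64, 64, 1)))
print(fused.shape)  # (64, 64, 4)
```

The other two axes the survey names, fusion data representation and fusion granularity, vary what is concatenated (raw points, voxels, BEV maps) and at what level (scene, region, point).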
Radars for Autonomous Driving: A Review of Deep Learning Methods and Challenges
Radar is a key component of the suite of perception sensors used for safe and
reliable navigation of autonomous vehicles. Its unique capabilities include
high-resolution velocity imaging, detection of agents in occlusion and over
long ranges, and robust performance in adverse weather conditions. However, the
usage of radar data presents some challenges: it is characterized by low
resolution, sparsity, clutter, high uncertainty, and lack of good datasets.
These challenges have limited radar deep learning research. As a result,
current radar models are often influenced by lidar and vision models, which are
focused on optical features that are relatively weak in radar data, thus
resulting in under-utilization of radar's capabilities and diminishing its
contribution to autonomous perception. This review seeks to encourage further
deep learning research on autonomous radar data by 1) identifying key research
themes, and 2) offering a comprehensive overview of current opportunities and
challenges in the field. Topics covered include early and late fusion,
occupancy flow estimation, uncertainty modeling, and multipath detection. The
paper also discusses radar fundamentals and data representation, presents a
curated list of recent radar datasets, and reviews state-of-the-art lidar and
vision models relevant for radar research. For a summary of the paper and more
results, visit the website: autonomous-radars.github.io
“Deep sensor fusion architecture for point-cloud semantic segmentation”
This degree project develops a complete approach to data analysis and processing for improved decision-making, presenting a multimodal CNN-based neural architecture; it includes precise explanations of the systems the architecture integrates and an evaluation of its behavior in the target environment. Self-driving systems are composed of complex pipelines in which perceiving the vehicle's surroundings is a key source of information for making real-time maneuver decisions. Semantic segmentation of LiDAR sensor data has played a big role in consolidating a dense understanding of the surrounding objects and events. Although great advances have been made on this task, we believe sensor fusion strategies are under-exploited. We present a multimodal neural architecture, based on CNNs, that consumes 2D input signals from LiDAR and camera, computes a deep representation leveraging the strengths of both sensors, and predicts a label mapping for the 3D point-wise segmentation problem. We evaluated the proposed architecture on a dataset derived from the popular KITTI vision benchmark suite, which contemplates common semantic classes (i.e., car, pedestrian, and cyclist). Our model outperforms existing methods and shows improvement in the refinement of the segmentation masks.
Master's thesis (Magíster en Ingeniería de Sistemas y Computación)
Table of Contents
Abstract
List of Figures
1 Introduction
1.1 Problem statement
1.2 Goals
1.3 Contributions
1.4 Outline
2 Autonomous vehicle perception systems
2.1 Semantic segmentation
2.2 Autonomous vehicles sensing
2.2.1 Camera
2.2.2 LiDAR
2.2.3 Radar
2.2.4 Ultrasonic
2.3 Point clouds semantic segmentation
2.3.1 Raw pointcloud
2.3.2 Voxelization of pointclouds
2.3.3 Point cloud projections
2.3.4 Outlook
3 Deep multimodal learning for semantic segmentation
3.1 Method overview
3.2 Point cloud transformation
3.3 Multimodal fusion
3.3.1 RGB modality
3.3.2 LiDAR modality
3.3.3 Fusion step
3.3.4 Decoding part
3.3.5 Optimization statement
4 Evaluation
4.1 KITTI dataset
4.2 Evaluation metric
4.3 Experimental setup
4.4 Results
4.4.1 Discussion
5 Conclusions
Bibliography
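A common way to feed LiDAR data into a 2D CNN alongside a camera image, in the spirit of the thesis's "Point cloud transformation" step, is a spherical range-image projection. The sketch below is a generic illustration with assumed field-of-view and resolution values, not the thesis's actual implementation:

```python
# Rough sketch of projecting a 3D LiDAR point cloud onto a 2D spherical
# (range) image so it can be consumed by a 2D CNN together with the RGB
# image. FOV bounds and resolution are illustrative assumptions.
import numpy as np

def spherical_projection(points, h=64, w=512, fov_up=3.0, fov_down=-25.0):
    """Map an (N, 3) array of xyz points to an h x w range image."""
    fov_up, fov_down = np.radians(fov_up), np.radians(fov_down)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)          # range per point
    yaw = np.arctan2(y, x)                      # horizontal angle
    pitch = np.arcsin(np.clip(z / r, -1, 1))    # vertical angle
    u = ((1 - (yaw / np.pi + 1) / 2) * w).astype(int) % w
    v = (1 - (pitch - fov_down) / (fov_up - fov_down)) * h
    v = np.clip(v, 0, h - 1).astype(int)
    img = np.zeros((h, w))
    img[v, u] = r                               # store range per pixel
    return img

pts = np.array([[10.0, 0.0, 0.0], [5.0, 5.0, -1.0]])
img = spherical_projection(pts)
print(img.shape)  # (64, 512)
```

Once both modalities share this 2D layout, the fusion step can combine them per-pixel, and the predicted labels can be mapped back to the originating 3D points.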
LiDAR Object Detection Utilizing Existing CNNs for Smart Cities
As governments and private companies alike race to achieve the vision of a smart city — where artificial intelligence (AI) technology is used to enable self-driving cars, cashier-less shopping experiences, and connected home devices from thermostats to robot vacuum cleaners — advancements are being made in both software and hardware to enable increasingly real-time, accurate inference at the edge. One hardware solution adopted for this purpose is the LiDAR sensor, which utilizes infrared lasers to accurately detect and map its surroundings in 3D. On the software side, developers have turned to artificial neural networks to make predictions and recommendations with high accuracy. These neural networks have the potential, particularly when run on purpose-built hardware such as GPUs and TPUs, to make inferences in near real-time, allowing AI models to serve as a usable interface for real-world interactions with other AI-powered devices, or with human users. This paper aims to examine the joint use of LiDAR sensors and AI to understand its importance in smart city environments.
Human Detection and Distance Estimation Using a Camera and the YOLOv3 Neural Network
Making machines perceive the environment better than, or at least as well as, humans would be
beneficial in many domains. Various sensors aid in this task, the most widely used of which is
the monocular camera. Object detection is a major part of environment perception, and its
accuracy has greatly improved in the last few years thanks to advanced machine learning
methods called convolutional neural networks (CNNs) that are trained on many labelled
images. A monocular camera image contains two-dimensional information but no depth
information about the scene. On the other hand, depth information about objects is important
in many areas related to autonomous driving, e.g., working next to an automated machine, or a
pedestrian crossing a road in front of an autonomous vehicle.
This thesis presents an approach to detect humans and to predict their distance from an RGB
camera for off-road autonomous driving. This is done by extending YOLOv3 (You Only Look
Once) [1], a state-of-the-art object detection CNN. Outside of this thesis, an off-road scene
depicting a snowy forest with humans in different body poses was simulated using AirSim
and Unreal Engine. Data for training the YOLOv3 network was extracted from the simulation
using custom scripts. The network was also modified to predict not only humans and their
bounding boxes, but also their distance from the camera. An RMSE (Root Mean Square Error)
of 2.99 m for objects at distances up to 50 m was achieved, while maintaining detection
accuracy similar to the original network. Comparable methods, using two separate neural
networks and a LASSO model respectively, gave RMSEs of 4.26 m (on an alternative dataset)
and 4.79 m (on the dataset used in this work), showing a large improvement over the
baselines. Future work includes experiments with real-world data to see whether the proposed
approach generalizes to other environments.
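For reference, the RMSE figure quoted above is the standard root-mean-square error over paired predicted and ground-truth distances. A minimal sketch, where the sample values are invented for illustration:

```python
# Minimal sketch of the RMSE metric used to score distance predictions.
# The sample distances below are made up, not results from the thesis.
import math

def rmse(predicted, actual):
    """Root Mean Square Error over paired distance estimates (in metres)."""
    assert len(predicted) == len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted))

pred = [10.2, 24.5, 48.0]   # distances predicted by the network
true = [11.0, 25.0, 46.5]   # ground-truth distances
print(round(rmse(pred, true), 3))  # 1.023
```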
MS3D++: Ensemble of Experts for Multi-Source Unsupervised Domain Adaptation in 3D Object Detection
Deploying 3D detectors in unfamiliar domains has been demonstrated to result
in a significant 70-90% drop in detection rate due to variations in lidar,
geography, or weather from their training dataset. This domain gap leads to
missing detections for densely observed objects, misaligned confidence scores,
and increased high-confidence false positives, rendering the detector highly
unreliable. To address this, we introduce MS3D++, a self-training framework for
multi-source unsupervised domain adaptation in 3D object detection. MS3D++
generates high-quality pseudo-labels, allowing 3D detectors to achieve high
performance on a range of lidar types, regardless of their density. Our
approach effectively fuses predictions of an ensemble of multi-frame
pre-trained detectors from different source domains to improve domain
generalization. We subsequently refine predictions temporally to ensure
temporal consistency in box localization and object classification.
Furthermore, we present an in-depth study into the performance and
idiosyncrasies of various 3D detector components in a cross-domain context,
providing valuable insights for improved cross-domain detector ensembling.
Experimental results on Waymo, nuScenes and Lyft demonstrate that detectors
trained with MS3D++ pseudo-labels achieve state-of-the-art performance,
comparable to training with human-annotated labels in Bird's Eye View (BEV)
evaluation for both low and high density lidar. Code is available at
https://github.com/darrenjkt/MS3
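The core idea of fusing overlapping boxes from an ensemble of pre-trained detectors into a single pseudo-label can be approximated, in its simplest form, by confidence-weighted box averaging. This is a generic simplification for illustration, not MS3D++'s actual fusion algorithm; all thresholds are assumed:

```python
# Simplified sketch of fusing overlapping detections from several detectors
# into one pseudo-label via confidence-weighted averaging. This illustrates
# the general ensembling idea only, not MS3D++'s actual method.
import numpy as np

def iou(a, b):
    """Intersection-over-union of two axis-aligned BEV boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def fuse_boxes(boxes, scores, iou_thr=0.5):
    """Greedily group boxes that overlap, then average each group by score."""
    order = np.argsort(scores)[::-1]          # highest confidence first
    fused, used = [], set()
    for i in order:
        if i in used:
            continue
        group = [j for j in order if j not in used and iou(boxes[i], boxes[j]) >= iou_thr]
        used.update(group)
        w = scores[group] / scores[group].sum()
        fused.append((w[:, None] * boxes[group]).sum(axis=0))
    return np.array(fused)

boxes = np.array([[0, 0, 2, 2], [0.1, 0, 2.1, 2], [5, 5, 6, 6]], dtype=float)
scores = np.array([0.9, 0.6, 0.8])
print(fuse_boxes(boxes, scores).shape)  # (2, 4): two boxes merged, one kept
```

The paper's pipeline additionally refines such fused boxes temporally across frames, which this sketch omits.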
Real-time spatial modeling to detect and track resources on construction sites
For more than 10 years the U.S. construction industry has experienced over 1,000
fatalities annually. Many fatalities may have been prevented had the individuals and
equipment involved been more aware of and alert to the physical state of the environment
around them. Awareness may be improved by automatic 3D (three-dimensional) sensing
and modeling of the job site environment in real-time. Existing 3D modeling approaches
based on range scanning techniques are capable of modeling static objects only, and thus
cannot model in real-time dynamic objects in an environment comprised of moving
humans, equipment, and materials. Emerging prototype 3D video range cameras offer
another alternative by facilitating affordable, wide field of view, automated static and
dynamic object detection and tracking at frame rates better than 1Hz (real-time).
This dissertation presents empirical work and a methodology to rapidly create a
spatial model of construction sites, and in particular to detect, model, and track the position, dimension, direction, and velocity of static and moving project resources in real-time, based on range data obtained from a three-dimensional video range camera in a
static or moving position. Existing construction site 3D modeling approaches based on
optical range sensing technologies (laser scanners, rangefinders, etc.) and 3D modeling
approaches (dense, sparse, etc.) that offered potential solutions for this research are
reviewed. The choice of an emerging sensing tool and preliminary experiments with this
prototype sensing technology are discussed. These findings led to the development of a
range data processing algorithm based on three-dimensional occupancy grids which is
demonstrated in detail. Testing and validation of the proposed algorithms have been
conducted to quantify the performance of sensor and algorithm through extensive
experimentation involving static and moving objects. Experiments in indoor laboratory
and outdoor construction environments have been conducted with construction resources
such as humans, equipment, materials, or structures to verify the accuracy of the
occupancy grid modeling approach. Results show that modeling objects and measuring
their position, dimension, direction, and speed had an accuracy level compatible with the
requirements of active safety features for construction. Results demonstrate that video
rate 3D data acquisition and analysis of construction environments can support effective
detection, tracking, and convex hull modeling of objects. Exploiting rapidly generated
three-dimensional models for improved visualization, communications, and process
control has inherent value, broad application, and potential impact, e.g. as-built vs. as-planned comparison, condition assessment, maintenance, operations, and construction
activities control. In combination with effective management practices, this sensing
approach has the potential to help equipment operators avoid incidents that result in
human injury, death, or collateral damage on construction sites.
Civil, Architectural, and Environmental Engineering
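The three-dimensional occupancy grid at the core of the range-data processing algorithm can be sketched minimally as follows; grid bounds and cell size are illustrative assumptions, not the dissertation's parameters:

```python
# Minimal 3D occupancy-grid sketch: bin range-camera points into voxels and
# flag cells containing at least one return. Grid origin, cell size, and
# shape are illustrative, not the dissertation's actual parameters.
import numpy as np

def occupancy_grid(points, origin=(0.0, 0.0, 0.0), cell=0.2, shape=(50, 50, 20)):
    """Return a boolean voxel grid: True where any point falls in the cell."""
    grid = np.zeros(shape, dtype=bool)
    idx = np.floor((points - np.asarray(origin)) / cell).astype(int)
    # keep only indices that land inside the grid bounds
    ok = np.all((idx >= 0) & (idx < np.asarray(shape)), axis=1)
    grid[tuple(idx[ok].T)] = True
    return grid

pts = np.array([[1.0, 1.0, 0.5], [1.05, 1.0, 0.5], [9.0, 9.0, 3.0]])
g = occupancy_grid(pts)
print(g.sum())  # 2 occupied cells: the first two points share a voxel
```

Detecting and tracking a moving object then reduces to comparing which cells are occupied from one frame to the next, which is tractable at the better-than-1 Hz rates the dissertation targets.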
Design and implementation of a sensor testing system with use of a cable drone
Abstract. This thesis aims to develop a testing method for various sensors by modifying a commercial cable cam system to drive at constant speed under an automated process. The first goal is to find a way to lift the cables into the air securely, without humans needing to climb ladders to place them afterwards. This is achieved with a hinged truss tower structure that keeps the cables stable while the tower is lifted. Another goal is to achieve automated movement of the cable drone. This is done by connecting a tracking camera to a computer that also controls the cable drone's motor controller, making the drone behave in a desired way depending on the tracking camera's position data. The third goal is to build a portable sensor system that collects and saves the data from the tested sensors. This goal is achieved with an aluminium profile frame equipped with all the necessary equipment, such as a powerful computer.
The research included studying different sensors' performance-evaluation criteria and the effect of wind on the magnitude of the force in this application. It was carried out by studying written sources and by consulting a cable camera company called Motion Compound GbR. The results of this master's thesis are used to evaluate whether the idea of using a cable cam is applicable to this kind of sensor testing system. In conclusion, the cable drone with automated driving is evaluated to be a practical method, which can be developed further to meet the requirements even better.