821 research outputs found
Deep Learning in the Automotive Industry: Applications and Tools
Deep Learning refers to a set of machine learning techniques that utilize
neural networks with many hidden layers for tasks, such as image
classification, speech recognition, language understanding. Deep learning has
been proven to be very effective in these domains and is pervasively used by
many Internet services. In this paper, we describe different automotive uses
cases for deep learning in particular in the domain of computer vision. We
surveys the current state-of-the-art in libraries, tools and infrastructures
(e.\,g.\ GPUs and clouds) for implementing, training and deploying deep neural
networks. We particularly focus on convolutional neural networks and computer
vision use cases, such as the visual inspection process in manufacturing plants
and the analysis of social media data. To train neural networks, curated and
labeled datasets are essential. In particular, both the availability and scope
of such datasets is typically very limited. A main contribution of this paper
is the creation of an automotive dataset, that allows us to learn and
automatically recognize different vehicle properties. We describe an end-to-end
deep learning application utilizing a mobile app for data collection and
process support, and an Amazon-based cloud backend for storage and training.
For training we evaluate the use of cloud and on-premises infrastructures
(including multiple GPUs) in conjunction with different neural network
architectures and frameworks. We assess both the training times as well as the
accuracy of the classifier. Finally, we demonstrate the effectiveness of the
trained classifier in a real world setting during manufacturing process.Comment: 10 page
An out-of-core method for GPU image mapping on large 3D scenarios of the real world
[Abstract] Image mapping on 3D huge scenarios of the real world is one of the most fundamental and computational expensive processes for the integration of multi-source sensing data. Recent studies focused on the observation and characterization of Earth have been enhanced by the proliferation of Unmanned Aerial Vehicle (UAV) and sensors able to capture massive datasets with a high spatial resolution. Despite the advances in manufacturing new cameras and versatile platforms, only a few methods have been developed to characterize the study area by fusing heterogeneous data such as thermal, multispectral or hyperspectral images with high-resolution 3D models. The main reason for this lack of solutions is the challenge to integrate multi-scale datasets and high computational efforts required for image mapping on dense and complex geometric models. In this paper, we propose an efficient pipeline for multi-source image mapping on huge 3D scenarios. Our GPU-based solution significantly reduces the run time and allows us to generate enriched 3D models on-site. The proposed method is out-of-core and it uses available resources of the GPU’s machine to perform two main tasks: (i) image mapping and (ii) occlusion testing. We deploy highly-optimized GPU-kernels for image mapping and detection of self-hidden geometry in the 3D model, as well as a GPU-based parallelization to manage the 3D model considering several spatial partitions according to the GPU capabilities. Our method has been tested on 3D scenarios with different point cloud densities (66M, 271M, 542M) and two sets of multispectral images collected by two drone flights. We focus on launching the proposed method on three platforms: (i) System on a Chip (SoC), (ii) a user-grade laptop and (iii) a PC. The results demonstrate the method’s capabilities in terms of performance and versatility to be computed by commodity hardware. Thus, taking advantage of GPUs, this method opens the door for embedded and edge computing devices for 3D image mapping on large-scale scenarios in near real-time.This work has been partially supported through the research projects TIN2017-84968-R, PID2019-104184RB-I00 funded by MCIN/AEI/10.13039/501100011033 and ERDF funds “A way of doing Europe”, as well as by ED431C 2021/30, ED431F 2021/11 funded by Xunta de Galicia and 1381202 by Junta de AndalucíaXunta de Galicia; ED431C 2021/30Xunta de Galicia; ED431F 2021/11Junta de Andalucía; 138120
Multi-modal Sensor Registration for Vehicle Perception via Deep Neural Networks
The ability to simultaneously leverage multiple modes of sensor information
is critical for perception of an automated vehicle's physical surroundings.
Spatio-temporal alignment of registration of the incoming information is often
a prerequisite to analyzing the fused data. The persistence and reliability of
multi-modal registration is therefore the key to the stability of decision
support systems ingesting the fused information. LiDAR-video systems like on
those many driverless cars are a common example of where keeping the LiDAR and
video channels registered to common physical features is important. We develop
a deep learning method that takes multiple channels of heterogeneous data, to
detect the misalignment of the LiDAR-video inputs. A number of variations were
tested on the Ford LiDAR-video driving test data set and will be discussed. To
the best of our knowledge the use of multi-modal deep convolutional neural
networks for dynamic real-time LiDAR-video registration has not been presented.Comment: 7 pages, double column, IEEE format, accepted at IEEE HPEC 201
Real-time implementation of 3D LiDAR point cloud semantic segmentation in an FPGA
Dissertação de mestrado em Informatics EngineeringIn the last few years, the automotive industry has relied heavily on deep learning applications for
perception solutions. With data-heavy sensors, such as LiDAR, becoming a standard, the task of
developing low-power and real-time applications has become increasingly more challenging. To obtain
the maximum computational efficiency, no longer can one focus solely on the software aspect of such
applications, while disregarding the underlying hardware.
In this thesis, a hardware-software co-design approach is used to implement an inference application
leveraging the SqueezeSegV3, a LiDAR-based convolutional neural network, on the Versal ACAP VCK190
FPGA. Automotive requirements carefully drive the development of the proposed solution, with real-time
performance and low power consumption being the target metrics.
A first experiment validates the suitability of Xilinx’s Vitis-AI tool for the deployment of deep
convolutional neural networks on FPGAs. Both the ResNet-18 and SqueezeNet neural networks are
deployed to the Zynq UltraScale+ MPSoC ZCU104 and Versal ACAP VCK190 FPGAs. The results show
that both networks achieve far more than the real-time requirements while consuming low power.
Compared to an NVIDIA RTX 3090 GPU, the performance per watt during both network’s inference is 12x
and 47.8x higher and 15.1x and 26.6x higher respectively for the Zynq UltraScale+ MPSoC ZCU104 and
the Versal ACAP VCK190 FPGA. These results are obtained with no drop in accuracy in the quantization
step.
A second experiment builds upon the results of the first by deploying a real-time application containing
the SqueezeSegV3 model using the Semantic-KITTI dataset. A framerate of 11 Hz is achieved with a peak
power consumption of 78 Watts. The quantization step results in a minimal accuracy and IoU degradation
of 0.7 and 1.5 points respectively. A smaller version of the same model is also deployed achieving a
framerate of 19 Hz and a peak power consumption of 76 Watts. The application performs semantic
segmentation over all the point cloud with a field of view of 360°.Nos últimos anos a indústria automóvel tem cada vez mais aplicado deep learning para solucionar
problemas de perceção. Dado que os sensores que produzem grandes quantidades de dados, como o
LiDAR, se têm tornado standard, a tarefa de desenvolver aplicações de baixo consumo energético e com
capacidades de reagir em tempo real tem-se tornado cada vez mais desafiante. Para obter a máxima
eficiência computacional, deixou de ser possível focar-se apenas no software aquando do
desenvolvimento de uma aplicação deixando de lado o hardware subjacente.
Nesta tese, uma abordagem de desenvolvimento simultâneo de hardware e software é usada para
implementar uma aplicação de inferência usando o SqueezeSegV3, uma rede neuronal convolucional
profunda, na FPGA Versal ACAP VCK190. São os requisitos automotive que guiam o desenvolvimento da
solução proposta, sendo a performance em tempo real e o baixo consumo energético, as métricas alvo
principais.
Uma primeira experiência valida a aptidão da ferramenta Vitis-AI para a implantação de redes
neuronais convolucionais profundas em FPGAs. As redes ResNet-18 e SqueezeNet são ambas
implantadas nas FPGAs Zynq UltraScale+ MPSoC ZCU104 e Versal ACAP VCK190. Os resultados
mostram que ambas as redes ultrapassam os requisitos de tempo real consumindo pouca energia.
Comparado com a GPU NVIDIA RTX 3090, a performance por Watt durante a inferência de ambas as
redes é superior em 12x e 47.8x e 15.1x e 26.6x respetivamente na Zynq UltraScale+ MPSoC ZCU104
e na Versal ACAP VCK190. Estes resultados foram obtidos sem qualquer perda de accuracy na etapa de
quantização.
Uma segunda experiência é feita no seguimento dos resultados da primeira, implantando uma
aplicação de inferência em tempo real contendo o modelo SqueezeSegV3 e usando o conjunto de dados
Semantic-KITTI. Um framerate de 11 Hz é atingido com um pico de consumo energético de 78 Watts. O
processo de quantização resulta numa perda mínima de accuracy e IoU com valores de 0.7 e 1.5 pontos
respetivamente. Uma versão mais pequena do mesmo modelo é também implantada, atingindo uma
framerate de 19 Hz e um pico de consumo energético de 76 Watts. A aplicação desenvolvida executa
segmentação semântica sobre a totalidade das nuvens de pontos LiDAR, com um campo de visão de
360°
Minuet: Accelerating 3D Sparse Convolutions on GPUs
Sparse Convolution (SC) is widely used for processing 3D point clouds that
are inherently sparse. Different from dense convolution, SC preserves the
sparsity of the input point cloud by only allowing outputs to specific
locations. To efficiently compute SC, prior SC engines first use hash tables to
build a kernel map that stores the necessary General Matrix Multiplication
(GEMM) operations to be executed (Map step), and then use a Gather-GEMM-Scatter
process to execute these GEMM operations (GMaS step). In this work, we analyze
the shortcomings of prior state-of-the-art SC engines, and propose Minuet, a
novel memory-efficient SC engine tailored for modern GPUs. Minuet proposes to
(i) replace the hash tables used in the Map step with a novel segmented sorting
double-traversed binary search algorithm that highly utilizes the on-chip
memory hierarchy of GPUs, (ii) use a lightweight scheme to autotune the tile
size in the Gather and Scatter operations of the GMaS step, such that to adapt
the execution to the particular characteristics of each SC layer, dataset, and
GPU architecture, and (iii) employ a padding-efficient GEMM grouping approach
that reduces both memory padding and kernel launching overheads. Our
evaluations show that Minuet significantly outperforms prior SC engines by on
average (up to ) for end-to-end point cloud network
executions. Our novel segmented sorting double-traversed binary search
algorithm achieves superior speedups by on average (up to
) over prior SC engines in the Map step. The source code of Minuet
is publicly available at https://github.com/UofT-EcoSystem/Minuet
Map Generation from Large Scale Incomplete and Inaccurate Data Labels
Accurately and globally mapping human infrastructure is an important and
challenging task with applications in routing, regulation compliance
monitoring, and natural disaster response management etc.. In this paper we
present progress in developing an algorithmic pipeline and distributed compute
system that automates the process of map creation using high resolution aerial
images. Unlike previous studies, most of which use datasets that are available
only in a few cities across the world, we utilizes publicly available imagery
and map data, both of which cover the contiguous United States (CONUS). We
approach the technical challenge of inaccurate and incomplete training data
adopting state-of-the-art convolutional neural network architectures such as
the U-Net and the CycleGAN to incrementally generate maps with increasingly
more accurate and more complete labels of man-made infrastructure such as roads
and houses. Since scaling the mapping task to CONUS calls for parallelization,
we then adopted an asynchronous distributed stochastic parallel gradient
descent training scheme to distribute the computational workload onto a cluster
of GPUs with nearly linear speed-up.Comment: This paper is accepted by KDD 202
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, namely computer vision (CV), speech
recognition, natural language processing, etc. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, of advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as it relates to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing the DL.Comment: 64 pages, 411 references. To appear in Journal of Applied Remote
Sensin
- …