1,386 research outputs found
DeepPicar: A Low-cost Deep Neural Network-based Autonomous Car
We present DeepPicar, a low-cost deep neural network based autonomous car
platform. DeepPicar is a small scale replication of a real self-driving car
called DAVE-2 by NVIDIA. DAVE-2 uses a deep convolutional neural network (CNN),
which takes images from a front-facing camera as input and produces car
steering angles as output. DeepPicar uses the same network architecture---9
layers, 27 million connections and 250K parameters---and can drive itself in
real-time using a web camera and a Raspberry Pi 3 quad-core platform. Using
DeepPicar, we analyze the Pi 3's computing capabilities to support end-to-end
deep learning based real-time control of autonomous vehicles. We also
systematically compare other contemporary embedded computing platforms using
the DeepPicar's CNN-based real-time control workload. We find that all tested
platforms, including the Pi 3, are capable of supporting the CNN-based
real-time control, from 20 Hz up to 100 Hz, depending on hardware platform.
However, we find that shared resource contention remains an important issue
that must be considered in applying CNN models on shared memory based embedded
computing platforms; we observe up to 11.6X execution time increase in the CNN
based control loop due to shared resource contention. To protect the CNN
workload, we also evaluate state-of-the-art cache partitioning and memory
bandwidth throttling techniques on the Pi 3. We find that cache partitioning is
ineffective, while memory bandwidth throttling is an effective solution.Comment: To be published as a conference paper at RTCSA 201
TensorFlow Doing HPC
TensorFlow is a popular emerging open-source programming framework supporting
the execution of distributed applications on heterogeneous hardware. While
TensorFlow has been initially designed for developing Machine Learning (ML)
applications, in fact TensorFlow aims at supporting the development of a much
broader range of application kinds that are outside the ML domain and can
possibly include HPC applications. However, very few experiments have been
conducted to evaluate TensorFlow performance when running HPC workloads on
supercomputers. This work addresses this lack by designing four traditional HPC
benchmark applications: STREAM, matrix-matrix multiply, Conjugate Gradient (CG)
solver and Fast Fourier Transform (FFT). We analyze their performance on two
supercomputers with accelerators and evaluate the potential of TensorFlow for
developing HPC applications. Our tests show that TensorFlow can fully take
advantage of high performance networks and accelerators on supercomputers.
Running our TensorFlow STREAM benchmark, we obtain over 50% of theoretical
communication bandwidth on our testing platform. We find an approximately 2x,
1.7x and 1.8x performance improvement when increasing the number of GPUs from
two to four in the matrix-matrix multiply, CG and FFT applications
respectively. All our performance results demonstrate that TensorFlow has high
potential of emerging also as HPC programming framework for heterogeneous
supercomputers.Comment: Accepted for publication at The Ninth International Workshop on
Accelerators and Hybrid Exascale Systems (AsHES'19
Exascale Deep Learning for Climate Analytics
We extract pixel-level masks of extreme weather patterns using variants of
Tiramisu and DeepLabv3+ neural networks. We describe improvements to the
software frameworks, input pipeline, and the network training algorithms
necessary to efficiently scale deep learning on the Piz Daint and Summit
systems. The Tiramisu network scales to 5300 P100 GPUs with a sustained
throughput of 21.0 PF/s and parallel efficiency of 79.0%. DeepLabv3+ scales up
to 27360 V100 GPUs with a sustained throughput of 325.8 PF/s and a parallel
efficiency of 90.7% in single precision. By taking advantage of the FP16 Tensor
Cores, a half-precision version of the DeepLabv3+ network achieves a peak and
sustained throughput of 1.13 EF/s and 999.0 PF/s respectively.Comment: 12 pages, 5 tables, 4, figures, Super Computing Conference November
11-16, 2018, Dallas, TX, US
- …