924 research outputs found
An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration
We empirically evaluate an undervolting technique, i.e., underscaling the
circuit supply voltage below the nominal level, to improve the power-efficiency
of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable
Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing
faults due to excessive circuit latency increase. We evaluate the
reliability-power trade-off for such accelerators. Specifically, we
experimentally study the reduced-voltage operation of multiple components of
real FPGAs, characterize the corresponding reliability behavior of CNN
accelerators, propose techniques to minimize the drawbacks of reduced-voltage
operation, and combine undervolting with architectural CNN optimization
techniques, i.e., quantization and pruning. We investigate the effect of
environmental temperature on the reliability-power trade-off of such
accelerators. We perform experiments on three identical samples of modern
Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification
CNN benchmarks. This approach allows us to study the effects of our
undervolting technique for both software and hardware variability. We achieve
more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain
is the result of eliminating the voltage guardband region, i.e., the safe
voltage region below the nominal level that is set by FPGA vendor to ensure
correct functionality in worst-case environmental and circuit conditions. 43%
of the power-efficiency gain is due to further undervolting below the
guardband, which comes at the cost of accuracy loss in the CNN accelerator. We
evaluate an effective frequency underscaling technique that prevents this
accuracy loss, and find that it reduces the power-efficiency gain from 43% to
25%.Comment: To appear at the DSN 2020 conferenc
DeepPicar: A Low-cost Deep Neural Network-based Autonomous Car
We present DeepPicar, a low-cost deep neural network based autonomous car
platform. DeepPicar is a small scale replication of a real self-driving car
called DAVE-2 by NVIDIA. DAVE-2 uses a deep convolutional neural network (CNN),
which takes images from a front-facing camera as input and produces car
steering angles as output. DeepPicar uses the same network architecture---9
layers, 27 million connections and 250K parameters---and can drive itself in
real-time using a web camera and a Raspberry Pi 3 quad-core platform. Using
DeepPicar, we analyze the Pi 3's computing capabilities to support end-to-end
deep learning based real-time control of autonomous vehicles. We also
systematically compare other contemporary embedded computing platforms using
the DeepPicar's CNN-based real-time control workload. We find that all tested
platforms, including the Pi 3, are capable of supporting the CNN-based
real-time control, from 20 Hz up to 100 Hz, depending on hardware platform.
However, we find that shared resource contention remains an important issue
that must be considered in applying CNN models on shared memory based embedded
computing platforms; we observe up to 11.6X execution time increase in the CNN
based control loop due to shared resource contention. To protect the CNN
workload, we also evaluate state-of-the-art cache partitioning and memory
bandwidth throttling techniques on the Pi 3. We find that cache partitioning is
ineffective, while memory bandwidth throttling is an effective solution.Comment: To be published as a conference paper at RTCSA 201
Development of a multi-core and multi-accelerator platform for approximate computing
Proyecto de graduación (Licenciatura en IngenierÃa en Electrónica) Instituto Tecnológico de Costa Rica, Escuela de IngenierÃa Electrónica, 2017.Changing environment in the current technologies have introduce a gap between the
ever growing needs of users and the state of present designs. As high data and hard
computation applications moved forward in the near future, the current trend reaches
for a greater performance. Approximate computing enters this scheme to boost a system
overall attributes, while working with intrinsic and error tolerable characteristics both in
software and hardware. This work proposes a multicore and multi-accelerator platform
design that uses both exact and approximate versions, also providing interaction with a
software counterpart to ensure usage of both layouts. A set of five di↵erent approximate
accelerator versions and one exact, are present for three di↵erent image processing filters,
Laplace, Sobel and Gauss, along with their respective characterization in terms of Power,
Area and Delay time. This will show better results for design versions 2 and 3. Later
it will be seen three di↵erent interfaces designs for accelerators along with a softcore
processor, Altera’s NIOS II. Results gathered demonstrate a definitively improvement
while using approximate accelerators in comparison with software and exact accelerator
implementations. Memory accessing and filter operations times, for two di↵erent matrices
sizes, present a gain of 500, 2000 and 1500 cycles measure for Laplace, Gauss and Sobel
filters respectively, while contrasting software times, and a range of 28-84, 20-40 and
68-100 ticks decrease against the use of an exact accelerator
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal,
201
- …