An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration
We empirically evaluate an undervolting technique, i.e., underscaling the
circuit supply voltage below the nominal level, to improve the power-efficiency
of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable
Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing
faults due to excessive circuit latency increase. We evaluate the
reliability-power trade-off for such accelerators. Specifically, we
experimentally study the reduced-voltage operation of multiple components of
real FPGAs, characterize the corresponding reliability behavior of CNN
accelerators, propose techniques to minimize the drawbacks of reduced-voltage
operation, and combine undervolting with architectural CNN optimization
techniques, i.e., quantization and pruning. We investigate the effect of
environmental temperature on the reliability-power trade-off of such
accelerators. We perform experiments on three identical samples of modern
Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification
CNN benchmarks. This approach allows us to study the effects of our
undervolting technique for both software and hardware variability. We achieve
more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain
is the result of eliminating the voltage guardband region, i.e., the safe
voltage region below the nominal level that is set by the FPGA vendor to ensure
correct functionality in worst-case environmental and circuit conditions. 43%
of the power-efficiency gain is due to further undervolting below the
guardband, which comes at the cost of accuracy loss in the CNN accelerator. We
evaluate an effective frequency underscaling technique that prevents this
accuracy loss, and find that it reduces the power-efficiency gain from 43% to
25%. Comment: To appear at the DSN 2020 conference
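As a rough check on how the figures in this abstract fit together, the sketch below combines the 2.6X guardband gain with the further 43% (or 25%, with frequency underscaling) gain. It assumes the two gains compose multiplicatively, which the abstract does not state explicitly; the function and constant names are illustrative only.

```python
# Figures quoted in the abstract above.
GUARDBAND_GAIN = 2.6          # gain from eliminating the voltage guardband
BELOW_GUARDBAND_EXTRA = 0.43  # further gain below the guardband (with accuracy loss)
FREQ_UNDERSCALED_EXTRA = 0.25 # further gain when frequency underscaling prevents accuracy loss


def combined_gain(base_gain: float, extra_fraction: float) -> float:
    """Compose a base gain with a further fractional gain.

    Assumption (not stated in the abstract): the gains multiply.
    """
    return base_gain * (1.0 + extra_fraction)


total_with_accuracy_loss = combined_gain(GUARDBAND_GAIN, BELOW_GUARDBAND_EXTRA)
total_without_accuracy_loss = combined_gain(GUARDBAND_GAIN, FREQ_UNDERSCALED_EXTRA)

print(f"below guardband, accuracy loss allowed: {total_with_accuracy_loss:.2f}x")
print(f"below guardband, frequency underscaled: {total_without_accuracy_loss:.2f}x")
```

Under this assumption both totals exceed 3X, which is consistent with the "more than 3X" figure reported in the abstract.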
DeepPicar: A Low-cost Deep Neural Network-based Autonomous Car
We present DeepPicar, a low-cost deep neural network based autonomous car
platform. DeepPicar is a small scale replication of a real self-driving car
called DAVE-2 by NVIDIA. DAVE-2 uses a deep convolutional neural network (CNN),
which takes images from a front-facing camera as input and produces car
steering angles as output. DeepPicar uses the same network architecture---9
layers, 27 million connections and 250K parameters---and can drive itself in
real-time using a web camera and a Raspberry Pi 3 quad-core platform. Using
DeepPicar, we analyze the Pi 3's computing capabilities to support end-to-end
deep learning based real-time control of autonomous vehicles. We also
systematically compare other contemporary embedded computing platforms using
the DeepPicar's CNN-based real-time control workload. We find that all tested
platforms, including the Pi 3, are capable of supporting the CNN-based
real-time control, from 20 Hz up to 100 Hz, depending on hardware platform.
However, we find that shared resource contention remains an important issue
that must be considered in applying CNN models on shared memory based embedded
computing platforms; we observe up to 11.6X execution time increase in the CNN
based control loop due to shared resource contention. To protect the CNN
workload, we also evaluate state-of-the-art cache partitioning and memory
bandwidth throttling techniques on the Pi 3. We find that cache partitioning is
ineffective, while memory bandwidth throttling is an effective solution. Comment: To be published as a conference paper at RTCSA 201
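To illustrate why an 11.6X execution time increase matters for a real-time control loop, the sketch below converts a control rate into a per-iteration time budget and checks whether a slowed-down iteration still fits. The 5 ms baseline inference time is a hypothetical value chosen for illustration and is not taken from the paper; only the 20 Hz and 100 Hz rates and the 11.6X factor come from the abstract.

```python
def period_ms(rate_hz: float) -> float:
    """Time budget of one control-loop iteration, in milliseconds."""
    return 1000.0 / rate_hz


def meets_deadline(exec_ms: float, slowdown: float, rate_hz: float) -> bool:
    """True if the slowed-down execution still fits within one control period."""
    return exec_ms * slowdown <= period_ms(rate_hz)


BASELINE_MS = 5.0          # hypothetical inference time without contention
WORST_CASE_SLOWDOWN = 11.6  # worst-case increase reported in the abstract

for rate_hz in (20.0, 100.0):
    alone = meets_deadline(BASELINE_MS, 1.0, rate_hz)
    contended = meets_deadline(BASELINE_MS, WORST_CASE_SLOWDOWN, rate_hz)
    print(f"{rate_hz:.0f} Hz: budget {period_ms(rate_hz):.0f} ms, "
          f"alone ok={alone}, contended ok={contended}")
```

With this hypothetical baseline, the loop fits its budget at both rates when running alone but misses even the 50 ms budget of the 20 Hz loop under worst-case contention.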
FPGA in image processing supported by IOPT-Flow
Image processing is widely used in the most diverse industries. One of the tools widely used to perform image processing is the OpenCV library. Although the implementation of image processing algorithms can be made in software, it is also possible to implement image processing algorithms in hardware. In some cases, the execution time can be smaller than the execution time achieved in software.
The main goal of this work is to evaluate the use of VHDL, DS-Pnets, and IOPT-Flow to develop image processing systems in hardware on FPGA-based platforms. To that end, a validation platform was developed. A set of image processing algorithms was specified in VHDL and/or DS-Pnets and validated using the IOPT-Flow validation tool and/or the Xilinx ISE Simulator. The automatic VHDL code generator from the IOPT-Flow framework was used to translate DS-Pnet models into implementation code. The FPGA-based implementations were compared with software implementations supported by the OpenCV library. The resulting DS-Pnet models were added to a folder of the IOPT-Flow editor to form an image processing library.
We conclude that DS-Pnets and their associated IOPT-Flow tools support the development of image processing systems. These tools, which simplify the development of image processing systems, are available online at http://gres.uninova.pt/iopt-flow/
Implementation of FPGA in the Design of Embedded Systems
The use of FPGAs (Field Programmable Gate Arrays) and configurable processors is an interesting new phenomenon in embedded development. FPGAs offer all of the features needed to implement most complex designs. Clock management is facilitated by on-chip PLL (phase-locked loop) or DLL (delay-locked loop) circuitry. Dedicated memory blocks can be configured as basic single-port RAMs, ROMs, FIFOs, or CAMs. Data processing, as embodied in the devices’ logic fabric, varies widely. The ability to link the FPGA with backplanes, high-speed buses, and memories is afforded by support for various single-ended and differential I/O standards. Also found on today’s FPGAs are system-building resources such as high-speed serial I/Os, arithmetic modules, embedded processors, and large amounts of memory.
In our project, we apply these FPGAs to the design of embedded systems that can be designed, burned, and deployed at the site of operation to handle many kinds of applications. We address two such applications: a prioritized traffic light controller and a speech encryption and decryption system.
Hardware Acceleration of Video analytics on FPGA using OpenCL
With the exponential growth in video content over the last few years, video analysis is becoming more crucial for many applications such as self-driving cars, healthcare, and traffic management. Most of these video analysis applications use deep learning algorithms such as convolutional neural networks (CNNs) because of their high accuracy in object detection. Enhancing the performance of CNN models is therefore crucial for video analysis. CNN models are computationally expensive and often require high-end graphics processing units (GPUs) for acceleration. However, for real-time applications in energy- and thermally-constrained environments such as traffic management, GPUs are less preferred because of their high power consumption and limited energy efficiency. They are also difficult to fit in a small space.
To enable real-time video analytics in emerging large scale Internet of things (IoT) applications, the computation must happen at the network edge (near the cameras) in a distributed fashion. Thus, edge computing must be adopted. Recent studies have shown that field-programmable gate arrays (FPGAs) are highly suitable for edge computing due to their architecture adaptiveness, high computational throughput for streaming processing, and high energy efficiency.
This thesis presents a generic OpenCL-defined CNN accelerator architecture optimized for FPGA-based real-time video analytics at the edge. The proposed CNN OpenCL kernel adopts a highly pipelined and parallelized 1-D systolic array architecture, which exploits both spatial and temporal parallelism for energy-efficient CNN acceleration on FPGAs. The large fan-in and fan-out of computational units to the memory interface are identified as the limiting factor that causes scalability issues in existing designs, and solutions based on compiler automation are proposed to resolve it. The proposed CNN kernel is highly scalable and parameterized by three architecture parameters, namely pe_num, reuse_fac, and vec_fac, which can be adapted to achieve 100% utilization of the coarse-grained computation resources (e.g., DSP blocks) for a given FPGA. The proposed CNN kernel is generic and can be used to accelerate a wide range of CNN models without recompiling the FPGA kernel hardware. The performance of Alexnet, Resnet-50, Retinanet, and Light-weight Retinanet has been measured using the proposed CNN kernel on an Intel Arria 10 GX1150 FPGA. The measurements show that the proposed CNN kernel, when mapped with 100% utilization of computation resources, achieves latencies of 11ms, 84ms, 1614.9ms, and 990.34ms for Alexnet, Resnet-50, Retinanet, and Light-weight Retinanet respectively when the input feature maps and weights are represented using the 32-bit floating-point data type. Dissertation/Thesis: Masters Thesis, Electrical Engineering 201
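To relate the reported latencies to real-time video analytics, the sketch below converts each latency into an approximate frame rate. It assumes frames are processed one at a time with no overlap between consecutive frames, which the abstract does not state; the dictionary and function names are illustrative.

```python
# Per-model latencies in milliseconds, as reported in the abstract above.
LATENCY_MS = {
    "Alexnet": 11.0,
    "Resnet-50": 84.0,
    "Retinanet": 1614.9,
    "Light-weight Retinanet": 990.34,
}


def frames_per_second(latency_ms: float) -> float:
    """Frame rate if each frame takes latency_ms and frames do not overlap (assumption)."""
    return 1000.0 / latency_ms


fps = {model: frames_per_second(ms) for model, ms in LATENCY_MS.items()}

for model, rate in fps.items():
    print(f"{model}: {rate:.2f} frames/s")
```

Under this assumption, Alexnet runs at roughly 90 frames per second and Resnet-50 at roughly 12, while both Retinanet variants stay near or below one frame per second with 32-bit floating-point data.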
FPGA based Embedded System to control an electric vehicle and the driver assistance systems
This Master Thesis involves the development of an FPGA-based embedded system
for controlling an electric vehicle built on a Kart platform, together with its electronic
driving aids. The work consists of two distinct stages of hardware-software co-design.
The first is hardware development, which covers all the processor peripherals
and communication elements, all developed in VHDL. An important part of the hardware
development is the implementation of the electronic driving aids, which include traction
control and a torque vectoring differential gear, as hardware coprocessors, also written in
VHDL. The second stage is the development of the control software, which
is executed by the embedded system’s processor. This Master Thesis will be
used in a range of new electric vehicles to be built in the near future and also provides
the basis for future theses in the fields of automotive engineering, electronics and computing.