Search CORE

438 research outputs found

An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration

Author: Ergin Oguz
Kestelman Adrian Cristal
Koc Fahrettin
Mutlu Onur
Onural Erhan Baturay
Salami Behzad
Sarbazi-Azad Hamid
Unsal Osman S.
Yuksel Ismail Emir
Publication venue
Publication date: 01/01/2020
Field of study

We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply voltage below the nominal level, to improve the power-efficiency of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators. Specifically, we experimentally study the reduced-voltage operation of multiple components of real FPGAs, characterize the corresponding reliability behavior of CNN accelerators, propose techniques to minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning. We investigate the effect of environmental temperature on the reliability-power trade-off of such accelerators. We perform experiments on three identical samples of modern Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification CNN benchmarks. This approach allows us to study the effects of our undervolting technique for both software and hardware variability. We achieve more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain is the result of eliminating the voltage guardband region, i.e., the safe voltage region below the nominal level that is set by FPGA vendor to ensure correct functionality in worst-case environmental and circuit conditions. 43% of the power-efficiency gain is due to further undervolting below the guardband, which comes at the cost of accuracy loss in the CNN accelerator. We evaluate an effective frequency underscaling technique that prevents this accuracy loss, and find that it reduces the power-efficiency gain from 43% to 25%.Comment: To appear at the DSN 2020 conferenc

arXiv.org e-Print Archive

Crossref

UPCommons. Portal del coneixement obert de la UPC

TOBB ETÜ Institutional Repository

DeepPicar: A Low-cost Deep Neural Network-based Autonomous Car

Author: Bechtel Michael G.
Kim Minje
McEllhiney Elise
Yun Heechul
Publication venue
Publication date: 29/07/2018
Field of study

We present DeepPicar, a low-cost deep neural network based autonomous car platform. DeepPicar is a small scale replication of a real self-driving car called DAVE-2 by NVIDIA. DAVE-2 uses a deep convolutional neural network (CNN), which takes images from a front-facing camera as input and produces car steering angles as output. DeepPicar uses the same network architecture---9 layers, 27 million connections and 250K parameters---and can drive itself in real-time using a web camera and a Raspberry Pi 3 quad-core platform. Using DeepPicar, we analyze the Pi 3's computing capabilities to support end-to-end deep learning based real-time control of autonomous vehicles. We also systematically compare other contemporary embedded computing platforms using the DeepPicar's CNN-based real-time control workload. We find that all tested platforms, including the Pi 3, are capable of supporting the CNN-based real-time control, from 20 Hz up to 100 Hz, depending on hardware platform. However, we find that shared resource contention remains an important issue that must be considered in applying CNN models on shared memory based embedded computing platforms; we observe up to 11.6X execution time increase in the CNN based control loop due to shared resource contention. To protect the CNN workload, we also evaluate state-of-the-art cache partitioning and memory bandwidth throttling techniques on the Pi 3. We find that cache partitioning is ineffective, while memory bandwidth throttling is an effective solution.Comment: To be published as a conference paper at RTCSA 201

arXiv.org e-Print Archive

Crossref

FPGA in image processing supported by IOPT-Flow

Author: Carrasqueira Tiago Miguel Saraiva
Publication venue
Publication date: 01/01/2019
Field of study

Image processing is widely used in the most diverse industries. One of the tools widely used to perform image processing is the OpenCV library. Although the implementation of image processing algorithms can be made in software, it is also possible to implement image processing algorithms in hardware. In some cases, the execution time can be smaller than the execution time achieved in software. This work main goal is to evaluate the use of VHDL, DS-Pnets, and IOPT-Flow to develop image processing systems in hardware, in FPGA-based platforms. To enable it, a validation platform was developed. A set of image processing algorithms were specified, during this work, in VHDL and/or in DS-Pnets. These were validated using the IOPT-Flow validation tool and/or the Xilinx ISE Simulator. The automatic VHDL code generator from IOPT-Flow framework was used to translate DS-Pnet models into the implementation code. The FPGA-based implementations were compared with software implementations, supported by the OpenCV library. The created DS-Pnet models were added into a folder of the IOPT-Flow editor, to create an image processing library. It was possible to conclude that the DS-Pnets and their associated tools, IOPT-Flow tools, support the development of image processing systems. These tools, which simplify the development of image processing systems, are available online at http://gres.uninova.pt/iopt-flow/

Repositório da Universidade Nova de Lisboa

Implementation of FPGA in the Design of Embedded Systems

Author: Srikanth P K A S
Srinivas K
Publication venue
Publication date: 01/01/2007
Field of study

The use of FPGAs (Field Programmable Gate Arrays) and configurable processors is an interesting new phenomenon in embedded development. FPGAs offer all of the features needed to implement most complex designs. Clock management is facilitated by on-chip PLL (phase-locked loop) or DLL (delay-locked loop) circuitry. Dedicated memory blocks can be configured as basic single-port RAMs, ROMs, FIFOs, or CAMs. Data processing, as embodied in the devices’ logic fabric, varies widely. The ability to link the FPGA with backplanes, high-speed buses, and memories is afforded by support for various single ended and differential I/O standards. Also found on today’s FPGAs are system-building resources such as high speed serial I/Os, arithmetic modules, embedded processors, and large amounts of memory. Here in our project we have tried to implement such powerful FPGAs in the design of possible embedded systems that can be designed, burned and deployed at the site of operation for handling of many kinds of applications. In our project we have basically dealt with two of such applications –one the prioritized traffic light controller and other a speech encrypting and decrypting system

ethesis@nitr

Implementation of a Car Make and Model Recognition (CMMR) algorithm on a reconfigurable platform

Author: Ξηραδάκης Νικόλαος Ι.
Publication venue
Publication date: 01/01/2022
Field of study

University of Thessaly Institutional Repository

Hardware Acceleration of Video analytics on FPGA using OpenCL

Author
Publication venue
Publication date: 01/01/2019
Field of study

abstract: With the exponential growth in video content over the period of the last few years, analysis of videos is becoming more crucial for many applications such as self-driving cars, healthcare, and traffic management. Most of these video analysis application uses deep learning algorithms such as convolution neural networks (CNN) because of their high accuracy in object detection. Thus enhancing the performance of CNN models become crucial for video analysis. CNN models are computationally-expensive operations and often require high-end graphics processing units (GPUs) for acceleration. However, for real-time applications in an energy-thermal constrained environment such as traffic management, GPUs are less preferred because of their high power consumption, limited energy efficiency. They are challenging to fit in a small place. To enable real-time video analytics in emerging large scale Internet of things (IoT) applications, the computation must happen at the network edge (near the cameras) in a distributed fashion. Thus, edge computing must be adopted. Recent studies have shown that field-programmable gate arrays (FPGAs) are highly suitable for edge computing due to their architecture adaptiveness, high computational throughput for streaming processing, and high energy efficiency. This thesis presents a generic OpenCL-defined CNN accelerator architecture optimized for FPGA-based real-time video analytics on edge. The proposed CNN OpenCL kernel adopts a highly pipelined and parallelized 1-D systolic array architecture, which explores both spatial and temporal parallelism for energy efficiency CNN acceleration on FPGAs. The large fan-in and fan-out of computational units to the memory interface are identified as the limiting factor in existing designs that causes scalability issues, and solutions are proposed to resolve the issue with compiler automation. The proposed CNN kernel is highly scalable and parameterized by three architecture parameters, namely pe_num, reuse_fac, and vec_fac, which can be adapted to achieve 100% utilization of the coarse-grained computation resources (e.g., DSP blocks) for a given FPGA. The proposed CNN kernel is generic and can be used to accelerate a wide range of CNN models without recompiling the FPGA kernel hardware. The performance of Alexnet, Resnet-50, Retinanet, and Light-weight Retinanet has been measured by the proposed CNN kernel on Intel Arria 10 GX1150 FPGA. The measurement result shows that the proposed CNN kernel, when mapped with 100% utilization of computation resources, can achieve a latency of 11ms, 84ms, 1614.9ms, and 990.34ms for Alexnet, Resnet-50, Retinanet, and Light-weight Retinanet respectively when the input feature maps and weights are represented using 32-bit floating-point data type.Dissertation/ThesisMasters Thesis Electrical Engineering 201

ASU Digital Repository

FPGA based Embedded System to control an electric vehicle and the driver assistance systems

Author: Márquez Luciano Mario
Publication venue
Publication date: 19/11/2014
Field of study

This Master Thesis involves the development of an embedded system based on FPGA for controlling an electric vehicle based on a Kart platform and its electronic driving aids. It consists of two distinct stages in the process of hardware-software co-design, hardware development, which includes all the elements of the periphery of the processor and communication elements, all developed in VHDL. An important part of the hardware development also include the development of electronic driving aids, which include traction control and torque vectoring differential gear, in hardware coprocessors, also writen in VHDL. The other part of the co-design is the development of the control software, which is going to be executed by the embedded system’s processor. This Master Thesis will be used in a range of new electric vehicles that will be built in a near future and also gives the base for future thesis in the fields of automotive, electronics and computing

Repositorio de Objetos de Docencia e Investigación de la Universidad de Cádiz