Search CORE

928 research outputs found

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

Author: Blott Michaela
Fraser Nicholas J.
Gambardella Giulio
Jahre Magnus
Leong Philip
Umuroglu Yaman
Vissers Kees
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/12/2016
Field of study

Research has shown that convolutional neural networks contain significant redundancy, and high classification accuracy can be obtained even when weights and activations are reduced from floating point to binary values. In this paper, we present FINN, a framework for building fast and flexible FPGA accelerators using a flexible heterogeneous streaming architecture. By utilizing a novel set of optimizations that enable efficient mapping of binarized neural networks to hardware, we implement fully connected, convolutional and pooling layers, with per-layer compute resources being tailored to user-provided throughput requirements. On a ZC706 embedded FPGA platform drawing less than 25 W total system power, we demonstrate up to 12.3 million image classifications per second with 0.31 {\mu}s latency on the MNIST dataset with 95.8% accuracy, and 21906 image classifications per second with 283 {\mu}s latency on the CIFAR-10 and SVHN datasets with respectively 80.1% and 94.9% accuracy. To the best of our knowledge, ours are the fastest classification rates reported to date on these benchmarks.Comment: To appear in the 25th International Symposium on Field-Programmable Gate Arrays, February 201

arXiv.org e-Print Archive

Crossref

NORA - Norwegian Open Research Archives

An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration

Author: Ergin Oguz
Kestelman Adrian Cristal
Koc Fahrettin
Mutlu Onur
Onural Erhan Baturay
Salami Behzad
Sarbazi-Azad Hamid
Unsal Osman S.
Yuksel Ismail Emir
Publication venue
Publication date: 01/01/2020
Field of study

We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply voltage below the nominal level, to improve the power-efficiency of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators. Specifically, we experimentally study the reduced-voltage operation of multiple components of real FPGAs, characterize the corresponding reliability behavior of CNN accelerators, propose techniques to minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning. We investigate the effect of environmental temperature on the reliability-power trade-off of such accelerators. We perform experiments on three identical samples of modern Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification CNN benchmarks. This approach allows us to study the effects of our undervolting technique for both software and hardware variability. We achieve more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain is the result of eliminating the voltage guardband region, i.e., the safe voltage region below the nominal level that is set by FPGA vendor to ensure correct functionality in worst-case environmental and circuit conditions. 43% of the power-efficiency gain is due to further undervolting below the guardband, which comes at the cost of accuracy loss in the CNN accelerator. We evaluate an effective frequency underscaling technique that prevents this accuracy loss, and find that it reduces the power-efficiency gain from 43% to 25%.Comment: To appear at the DSN 2020 conferenc

arXiv.org e-Print Archive

Crossref

UPCommons. Portal del coneixement obert de la UPC

TOBB ETÜ Institutional Repository

Design exploration and performance strategies towards power-efficient FPGA-based achitectures for sound source localization

Author: Braeken An
da Silva Gomes Bruno
Segers Laurent
Steenhaut Kris
Touhafi Abdellah
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2019
Field of study

Many applications rely on MEMS microphone arrays for locating sound sources prior to their execution. Those applications not only are executed under real-time constraints but also are often embedded on low-power devices. These environments become challenging when increasing the number of microphones or requiring dynamic responses. Field-Programmable Gate Arrays (FPGAs) are usually chosen due to their flexibility and computational power. This work intends to guide the design of reconfigurable acoustic beamforming architectures, which are not only able to accurately determine the sound Direction-Of-Arrival (DoA) but also capable to satisfy the most demanding applications in terms of power efficiency. Design considerations of the required operations performing the sound location are discussed and analysed in order to facilitate the elaboration of reconfigurable acoustic beamforming architectures. Performance strategies are proposed and evaluated based on the characteristics of the presented architecture. This power-efficient architecture is compared to a different architecture prioritizing performance in order to reveal the unavoidable design trade-offs

Ghent University Academic Bibliography

Minimalistic SDHC-SPI hardware reader module for boot loader applications

Author: Bellido Díaz Manuel Jesús
Guerrero Martos David
Juan Chico Jorge
Ostúa Arangüena Enrique
Ruiz de Clavijo Vázquez Paulino
Viejo Cortés Julián
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017
Field of study

This paper introduces a low-footprint full hardware boot loading solution for FPGA-based Programmable Systems on Chip. The proposed module allows loading the system code and data from a standard SD card without having to re-program the whole embedded system. The hardware boot loader is processor independent and removes the need of a software boot loader and the related memory resources. The hardware overhead introduced is manageable, even in low-range FPGA chips, and negligible in mid- and high-range devices. The implementation of the SD card reader module is explained in detail and an example of a multi-boot loader is offered as well. The multi-boot loader is implemented and tested with the Xilinx's Picoblaze microcontroller

Crossref

idUS. Depósito de Investigación Universidad de Sevilla

Hardware Implementations of a Deep Learning Approach to Optimal Configuration of Reconfigurable Intelligence Surfaces

Author: Castillo Morales María Encarnación
García Ríos Antonio
Martín Martín Alberto
Morán Alejandro
Padial-Allué Rubén
Parellada Serrano Ignacio
Parrilla Roure Luis
Publication venue
Publication date: 30/01/2024
Field of study

Reconfigurable intelligent surfaces (RIS) offer the potential to customize the radio propagation environment for wireless networks, and will be a key element for 6G communications. However, due to the unique constraints in these systems, the optimization problems associated to RIS configuration are challenging to solve. This paper illustrates a new approach to the RIS configuration problem, based on the use of artificial intelligence (AI) and deep learning (DL) algorithms. Concretely, a custom convolutional neural network (CNN) intended for edge computing is presented, and implementations on different representative edge devices are compared, including the use of commercial AI-oriented devices and a field-programmable gate array (FPGA) platform. This FPGA option provides the best performance, with x20 performance increase over the closest FP32, GPU-accelerated option, and almost x3 performance advantage when compared with the INT8-quantized, TPU-accelerated implementation. More noticeably, this is achieved even when high-level synthesis (HLS) tools are used and no custom accelerators are developed. At the same time, the inherent reconfigurability of FPGAs opens a new field for their use as enabler hardware in RIS applications.This work is part of the project TED2021-129938B-I00, funded by MCIN/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/PRTR

Repositorio Institucional Universidad de Granada

High Level Synthesis, a Use Case Comparison with Hardware Description Language

Author: Zwagerman Michael D.
Publication venue: ScholarWorks@GVSU
Publication date: 01/04/2015
Field of study

This paper compares Vivado High-Level Synthesis (HLS), a new mainstream technology offered by Xilinx Inc., against the typical Hardware Description Language (HDL) design approach. An example video filter application was implemented via both methods and compared for differences in performance and Non-Reoccurring Engineering (NRE). Lessons learned using HLS are also provided. The objective of this paper is to provide actual comparison data on the current state of mainstream HLS to enable informed decision making for designs considering HLS. The Xilinx Zync System on a Chip (SoC) offering is used as a platform for both the traditional HDL methods and HLS. This platform includes Field Programmable Gate Array (FPGA) fabric combined with a high speed application microprocessor. These single silicon SoC solutions appear to be a platform capable of effectively utilizing HLS. The example video application selected for implementation is a 9 by 9 kernel convolution filter performed on 24 bit 1080p video at 60 frames per second. The 2013 Xilinx Vivado tool suite was used for both HLS and HDL methods. HLS proved to be very easy to use to create a functional RTL design. With naïve implementations in both, HLS did not perform well in resource utilization. HLS also provided a design with a slower maximum clock frequency

Scholarworks@GVSU

HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA

Author: Benini Luca
Capotondi Alessandro
Kurth Andreas
Marongiu Andrea
Vogel Pirmin
Publication venue
Publication date: 01/01/2017
Field of study

Heterogeneous embedded systems on chip (HESoCs) co-integrate a standard host processor with programmable manycore accelerators (PMCAs) to combine general-purpose computing with domain-specific, efficient processing capabilities. While leading companies successfully advance their HESoC products, research lags behind due to the challenges of building a prototyping platform that unites an industry-standard host processor with an open research PMCA architecture. In this work we introduce HERO, an FPGA-based research platform that combines a PMCA composed of clusters of RISC-V cores, implemented as soft cores on an FPGA fabric, with a hard ARM Cortex-A multicore host processor. The PMCA architecture mapped on the FPGA is silicon-proven, scalable, configurable, and fully modifiable. HERO includes a complete software stack that consists of a heterogeneous cross-compilation toolchain with support for OpenMP accelerator programming, a Linux driver, and runtime libraries for both host and PMCA. HERO is designed to facilitate rapid exploration on all software and hardware layers: run-time behavior can be accurately analyzed by tracing events, and modifications can be validated through fully automated hard ware and software builds and executed tests. We demonstrate the usefulness of HERO by means of case studies from our research

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia