Search CORE

360 research outputs found

Low-power FSMs in FPGA: Encoding alternatives

Author: Boemo Eduardo I.
López-Buedo Sergio
Sutter Gustavo D.
Todorovich Elías
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2002
Field of study

The final publication is available at Springer via http://dx.doi.org/10.1007/3-540-45716-X_36Proceedings of 12th International Workshop, PATMOS 2002 Seville, Spain, September 11–13, 2002In this paper, the problem of state encoding of FPGA-based synchronous finite state machines (FSMs) for low-power is addressed. Four codification schemes have been studied: First, the usual binary encoding and the One-Hot approach suggested by the FPGA vendor; then, a code that minimizes the output logic; finally, the so-called Two-Hot code strategy. FSMs of the MCNC and PREP benchmark suites have been analyzed. Main results show that binary state encoding fit well with small machines (up to 8 states), meanwhile One-Hot is better for large FSMs (over 16 states). A power saving of up to the 57% can be achieved selecting the appropriate encoding. An areapower correlation has been observed in spite of the circuit or encoding scheme. Thus, FSMs that make use of fewer resources are good candidates to consume less power.Ministry of Science of Spain, under Contract TIC2001-2688-C03-03, has supported this work. Additional funds have been obtained from Projects 658001 and 658004 of the Fundación General de la Universidad Autónoma de Madrid

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo

Finite State Machines With Input Multiplexing: A Performance Study

Author: García Vargas Ignacio
Senhadji Navarro Raouf
Publication venue: IEEE Computer Society
Publication date: 01/01/2015
Field of study

Finite state machines with input multiplexing (FSMIMs) have been proposed in previous works as a technique for efficient mapping FSMs into ROM memory. In this paper, we propose a new architecture for implementing FSMIMs, called FSMIM with state-based input selection, whose goal is to achieve a further reduction in memory usage. This paper also describes in detail the algorithms for generating FSMIMs used by the tool FSMIM-Gen, which has been developed and made available on the Internet for free public use. A comparative study in terms of speed and area between FSMIM approaches and other field programmable gate array-based techniques is presented. The results show that the FSMIM approaches obtain huge reductions in the look-up table (LUT) usage by using a small number of embedded memory blocks. In addition, speed improvements over conventional LUT-based implementations have been obtained in many cases

idUS. Depósito de Investigación Universidad de Sevilla

NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps

Author: Aimar Alessandro
Calabrese Enrico
Corradi Federico
Delbruck Tobi
Linares-Barranco Alejandro
Liu Shih-Chii
Lungu Iulia-Alexandra
Milde Moritz B.
Mostafa Hesham
Rios-Navarro Antonio
Tapiador-Morales Ricardo
Publication venue
Publication date: 01/01/2017
Field of study

Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks. Even though Graphical Processing Units (GPUs) are most often used in training and deploying CNNs, their power efficiency is less than 10 GOp/s/W for single-frame runtime inference. We propose a flexible and efficient CNN accelerator architecture called NullHop that implements SOA CNNs useful for low-power and low-latency application scenarios. NullHop exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of available computing resources across kernel sizes ranging from 1x1 to 7x7. NullHop can process up to 128 input and 128 output feature maps per layer in a single pass. We implemented the proposed architecture on a Xilinx Zynq FPGA platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs ranging from small ones up to the widely known large VGG16 and VGG19 CNNs. Post-synthesis simulations using Mentor Modelsim in a 28nm process with a clock frequency of 500 MHz show that the VGG19 network achieves over 450 GOp/s. By exploiting sparsity, NullHop achieves an efficiency of 368%, maintains over 98% utilization of the MAC units, and achieves a power efficiency of over 3TOp/s/W in a core area of 6.3mm

^2

. As further proof of NullHop's usability, we interfaced its FPGA implementation with a neuromorphic event camera for real time interactive demonstrations

arXiv.org e-Print Archive

ZORA

Western Sydney ResearchDirect

idUS. Depósito de Investigación Universidad de Sevilla

ROM-Based Finite State Machine Implementation in Low Cost FPGAs

Author: Civit Balcells Antón
García Vargas Ignacio
Guerra Gutiérrez P.
Jiménez Moreno Gabriel
Senhadji Navarro Raouf
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2007
Field of study

This work presents a technique for the resource optimization of input multiplexed ROM-based Finite State Machines. This technique exploits the don't care value of the inputs to reduce the memory size as well as multiplexer complexity. This technique has been applied to a publicly available FSM benchmarks and implemented in a low-cost FPGA. Results have been compared with tools supported ROM and standard logic cells implementations. In a significant number of test cases, the proposed technique is the best design alternative, both in resource requirements and speed

idUS. Depósito de Investigación Universidad de Sevilla

Decomposition and encoding of finite state machines for FPGA implementation

Author: Slusarczyk A.S.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2004
Field of study

xii+187hlm.;24c

Pure OAI Repository

uilis.unsyiah.ac.id

Using Efficient Path Profiling to Optimize Memory Consumption of On-Chip Debugging for High-Level Synthesis

Author: Ferrandi Fabrizio
Fezzardi Pietro
Lattuada Marco
Publication venue
Publication date: 01/01/2017
Field of study

High-Level Synthesis (HLS) for FPGAs is attracting popularity and is increasingly used to handle complex systems with multiple integrated components. To increase performance and efficiency, HLS flows now adopt several advanced optimization techniques. Aggressive optimizations and system level integration can cause the introduction of bugs that are only observable on-chip. Debugging support for circuits generated with HLS is receiving a considerable attention. Among the data that can be collected on chip for debugging, one of the most important is the state of the Finite State Machines (FSM) controlling the components of the circuit. However, this usually requires a large amount of memory to trace the behavior during the execution. This work proposes an approach that takes advantage of the HLS information and of the structure of the FSM to compress control flow traces and to integrate optimized components for on-chip debugging. The generated checkers analyze the FSM execution on-fly, automatically notifying when a bug is detected, localizing it and providing data about its cause. The traces are compressed using a software profiling technique, called Efficient Path Profiling (EPP), adapted for the debugging of hardware accelerators generated with HLS. With this technique, the size of the memory used to store control flow traces can be reduced up to 2 orders of magnitude, compared to state-of-the-art

Archivio istituzionale della ricerca - Politecnico di Milano

Performance Evaluation of RAM-Based Implementation of Finite State Machines in FPGAs

Author: García Vargas Ignacio
Guisado Lízar José Luís
Senhadji Navarro Raouf
Publication venue: IEEE Computer Society
Publication date: 01/01/2012
Field of study

This paper presents a study of performance of RAM-based implementations in FPGAs of Finite State Machines (FSMs). The influence of the FSM characteristics on speed and area has been studied, taking into account the particular features of different FPGA families, like the size of LUTs, the size of memory blocks, the number of embedded multiplexer levels and the specific decoding logic for distributed RAM. Our study can be useful for efficiently implementing FPGA-based state machines

idUS. Depósito de Investigación Universidad de Sevilla

Accelerating String Set Matching in FPGA Hardware for Bioinformatics Research

Author: A Aho
Altera
Altera
AT Alex
AT Castelo
B Kuster
C Lin
D Farre
DE Kalume
DE Knuth
FM McCarthy
H Hyyro
HJ Jung
I Bogdan
I Xilinx
IT Li
J Buhler
JD Jaffe
JD Jaffe
L Tan
M Brudno
M Brudno
M Michael
Mark Lawrence
PG Lokhov
R Sidhu
RS Boyer
S Fide
Shane C Burgess
Susan M Bridges
T Oliver
TST Mak
V Boeva
Xilinx
Yoginder S Dandass
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Abstract Background This paper describes techniques for accelerating the performance of the string set matching problem with particular emphasis on applications in computational proteomics. The process of matching peptide sequences against a genome translated in six reading frames is part of a proteogenomic mapping pipeline that is used as a case-study. The Aho-Corasick algorithm is adapted for execution in field programmable gate array (FPGA) devices in a manner that optimizes space and performance. In this approach, the traditional Aho-Corasick finite state machine (FSM) is split into smaller FSMs, operating in parallel, each of which matches up to 20 peptides in the input translated genome. Each of the smaller FSMs is further divided into five simpler FSMs such that each simple FSM operates on a single bit position in the input (five bits are sufficient for representing all amino acids and special symbols in protein sequences). Results This bit-split organization of the Aho-Corasick implementation enables efficient utilization of the limited random access memory (RAM) resources available in typical FPGAs. The use of on-chip RAM as opposed to FPGA logic resources for FSM implementation also enables rapid reconfiguration of the FPGA without the place and routing delays associated with complex digital designs. Conclusion Experimental results show storage efficiencies of over 80% for several data sets. Furthermore, the FPGA implementation executing at 100 MHz is nearly 20 times faster than an implementation of the traditional Aho-Corasick algorithm executing on a 2.67 GHz workstation.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central