360 research outputs found

    Low-power FSMs in FPGA: Encoding alternatives

    Full text link
    The final publication is available at Springer via http://dx.doi.org/10.1007/3-540-45716-X_36Proceedings of 12th International Workshop, PATMOS 2002 Seville, Spain, September 11–13, 2002In this paper, the problem of state encoding of FPGA-based synchronous finite state machines (FSMs) for low-power is addressed. Four codification schemes have been studied: First, the usual binary encoding and the One-Hot approach suggested by the FPGA vendor; then, a code that minimizes the output logic; finally, the so-called Two-Hot code strategy. FSMs of the MCNC and PREP benchmark suites have been analyzed. Main results show that binary state encoding fit well with small machines (up to 8 states), meanwhile One-Hot is better for large FSMs (over 16 states). A power saving of up to the 57% can be achieved selecting the appropriate encoding. An areapower correlation has been observed in spite of the circuit or encoding scheme. Thus, FSMs that make use of fewer resources are good candidates to consume less power.Ministry of Science of Spain, under Contract TIC2001-2688-C03-03, has supported this work. Additional funds have been obtained from Projects 658001 and 658004 of the Fundación General de la Universidad Autónoma de Madrid

    Finite State Machines With Input Multiplexing: A Performance Study

    Get PDF
    Finite state machines with input multiplexing (FSMIMs) have been proposed in previous works as a technique for efficient mapping FSMs into ROM memory. In this paper, we propose a new architecture for implementing FSMIMs, called FSMIM with state-based input selection, whose goal is to achieve a further reduction in memory usage. This paper also describes in detail the algorithms for generating FSMIMs used by the tool FSMIM-Gen, which has been developed and made available on the Internet for free public use. A comparative study in terms of speed and area between FSMIM approaches and other field programmable gate array-based techniques is presented. The results show that the FSMIM approaches obtain huge reductions in the look-up table (LUT) usage by using a small number of embedded memory blocks. In addition, speed improvements over conventional LUT-based implementations have been obtained in many cases

    NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps

    Get PDF
    Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks. Even though Graphical Processing Units (GPUs) are most often used in training and deploying CNNs, their power efficiency is less than 10 GOp/s/W for single-frame runtime inference. We propose a flexible and efficient CNN accelerator architecture called NullHop that implements SOA CNNs useful for low-power and low-latency application scenarios. NullHop exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of available computing resources across kernel sizes ranging from 1x1 to 7x7. NullHop can process up to 128 input and 128 output feature maps per layer in a single pass. We implemented the proposed architecture on a Xilinx Zynq FPGA platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs ranging from small ones up to the widely known large VGG16 and VGG19 CNNs. Post-synthesis simulations using Mentor Modelsim in a 28nm process with a clock frequency of 500 MHz show that the VGG19 network achieves over 450 GOp/s. By exploiting sparsity, NullHop achieves an efficiency of 368%, maintains over 98% utilization of the MAC units, and achieves a power efficiency of over 3TOp/s/W in a core area of 6.3mm2^2. As further proof of NullHop's usability, we interfaced its FPGA implementation with a neuromorphic event camera for real time interactive demonstrations

    ROM-Based Finite State Machine Implementation in Low Cost FPGAs

    Get PDF
    This work presents a technique for the resource optimization of input multiplexed ROM-based Finite State Machines. This technique exploits the don't care value of the inputs to reduce the memory size as well as multiplexer complexity. This technique has been applied to a publicly available FSM benchmarks and implemented in a low-cost FPGA. Results have been compared with tools supported ROM and standard logic cells implementations. In a significant number of test cases, the proposed technique is the best design alternative, both in resource requirements and speed

    Decomposition and encoding of finite state machines for FPGA implementation

    Get PDF
    xii+187hlm.;24c

    Using Efficient Path Profiling to Optimize Memory Consumption of On-Chip Debugging for High-Level Synthesis

    Get PDF
    High-Level Synthesis (HLS) for FPGAs is attracting popularity and is increasingly used to handle complex systems with multiple integrated components. To increase performance and efficiency, HLS flows now adopt several advanced optimization techniques. Aggressive optimizations and system level integration can cause the introduction of bugs that are only observable on-chip. Debugging support for circuits generated with HLS is receiving a considerable attention. Among the data that can be collected on chip for debugging, one of the most important is the state of the Finite State Machines (FSM) controlling the components of the circuit. However, this usually requires a large amount of memory to trace the behavior during the execution. This work proposes an approach that takes advantage of the HLS information and of the structure of the FSM to compress control flow traces and to integrate optimized components for on-chip debugging. The generated checkers analyze the FSM execution on-fly, automatically notifying when a bug is detected, localizing it and providing data about its cause. The traces are compressed using a software profiling technique, called Efficient Path Profiling (EPP), adapted for the debugging of hardware accelerators generated with HLS. With this technique, the size of the memory used to store control flow traces can be reduced up to 2 orders of magnitude, compared to state-of-the-art

    Performance Evaluation of RAM-Based Implementation of Finite State Machines in FPGAs

    Get PDF
    This paper presents a study of performance of RAM-based implementations in FPGAs of Finite State Machines (FSMs). The influence of the FSM characteristics on speed and area has been studied, taking into account the particular features of different FPGA families, like the size of LUTs, the size of memory blocks, the number of embedded multiplexer levels and the specific decoding logic for distributed RAM. Our study can be useful for efficiently implementing FPGA-based state machines

    Accelerating String Set Matching in FPGA Hardware for Bioinformatics Research

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>This paper describes techniques for accelerating the performance of the string set matching problem with particular emphasis on applications in computational proteomics. The process of matching peptide sequences against a genome translated in six reading frames is part of a proteogenomic mapping pipeline that is used as a case-study. The Aho-Corasick algorithm is adapted for execution in field programmable gate array (FPGA) devices in a manner that optimizes space and performance. In this approach, the traditional Aho-Corasick finite state machine (FSM) is split into smaller FSMs, operating in parallel, each of which matches up to 20 peptides in the input translated genome. Each of the smaller FSMs is further divided into five simpler FSMs such that each simple FSM operates on a single bit position in the input (five bits are sufficient for representing all amino acids and special symbols in protein sequences).</p> <p>Results</p> <p>This bit-split organization of the Aho-Corasick implementation enables efficient utilization of the limited random access memory (RAM) resources available in typical FPGAs. The use of on-chip RAM as opposed to FPGA logic resources for FSM implementation also enables rapid reconfiguration of the FPGA without the place and routing delays associated with complex digital designs.</p> <p>Conclusion</p> <p>Experimental results show storage efficiencies of over 80% for several data sets. Furthermore, the FPGA implementation executing at 100 MHz is nearly 20 times faster than an implementation of the traditional Aho-Corasick algorithm executing on a 2.67 GHz workstation.</p
    corecore