
    A framework for FPGA functional units in high performance computing

    FPGAs make it practical to speed up a program by defining hardware functional units that perform calculations faster than can be achieved in software. Specialised digital circuits avoid the overhead of executing sequences of instructions, and they make available the massive parallelism of the components. The FPGA operates as a coprocessor controlled by a conventional computer. An application that combines software with hardware in this way needs an interface between a communications port to the processor and the signals connected to the functional units. We present a framework that supports the design of such systems. The framework consists of a generic controller circuit defined in VHDL that can be configured by the user according to the needs of the functional units and the I/O channel. The controller contains a register file and a pipelined programmable register transfer machine, and it supports the design of both stateless and stateful functional units. Two examples are described: the implementation of a set of basic stateless arithmetic functional units, and the implementation of a stateful algorithm that exploits circuit parallelism

    Technology Mapping for Circuit Optimization Using Content-Addressable Memory

    The growing complexity of Field Programmable Gate Arrays (FPGA's) is leading to architectures with high input cardinality look-up tables (LUT's). This thesis describes a methodology for area-minimizing technology mapping for combinational logic, specifically designed for such FPGA architectures. This methodology, called LURU, leverages the parallel search capabilities of Content-Addressable Memories (CAM's) to outperform traditional mapping algorithms in both execution time and quality of results. The LURU algorithm is fundamentally different from other techniques for technology mapping in that LURU uses textual string representations of circuit topology in order to efficiently store and search for circuit patterns in a CAM. A circuit is mapped to the target LUT technology using both exact and inexact string matching techniques. Common subcircuit expressions (CSE's) are also identified and used for architectural optimization---a small set of CSE's is shown to effectively cover an average of 96% of the test circuits. LURU was tested with the ISCAS'85 suite of combinational benchmark circuits and compared with the mapping algorithms FlowMap and CutMap. The area reduction shown by LURU is, on average, 20% better compared to FlowMap and CutMap. The asymptotic runtime complexity of LURU is shown to be better than that of both FlowMap and CutMap

    Parametric, Secure and Compact Implementation of RSA on FPGA

    We present a fast, efficient, and parameterized modular multiplier and a secure exponentiation circuit especially intended for FPGAs on the low end of the price range. The design utilizes dedicated block multipliers as the main functional unit and Block-RAM as the storage unit for the operands. The adopted design methodology allows adjusting the number of multipliers, the radix used in the multipliers, and the number of words to meet system requirements such as available resources, precision, and timing constraints. The architecture, based on the Montgomery modular multiplication algorithm, utilizes a pipelining technique that allows concurrent operation of hardwired multipliers. Our design completes 1020-bit and 2040-bit modular multiplications in 7.62 μs and 27.0 μs, respectively. The multiplier uses a moderate amount of system resources while achieving the best area-time product in the literature. A 2040-bit modular exponentiation engine can easily fit into a Xilinx Spartan-3E 500; moreover, the exponentiation circuit withstands known side-channel attacks.
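
For readers unfamiliar with the Montgomery algorithm named in the abstract above, the following is a minimal software sketch of Montgomery modular multiplication on plain Python integers. It is not the paper's word-serial, pipelined hardware architecture; the function names (`montgomery_setup`, `redc`, `mont_mul`) and the choice R = 2**k are assumptions made for illustration only.

```python
def montgomery_setup(n: int, k: int):
    """Precompute constants for an odd modulus n, with R = 2**k > n (assumed)."""
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than 2**k")
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # satisfies n * n_prime ≡ -1 (mod R); needs Python 3.8+
    r2 = (r * r) % n                # R^2 mod n, used to enter Montgomery form
    return r, n_prime, r2


def redc(t: int, n: int, k: int, n_prime: int) -> int:
    """Montgomery reduction: return t * R^{-1} mod n for 0 <= t < n * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask  # m = (t mod R) * n' mod R
    u = (t + m * n) >> k               # t + m*n is divisible by R, so the shift is exact
    return u - n if u >= n else u      # single conditional subtraction


def mont_mul(a: int, b: int, n: int, k: int) -> int:
    """Compute a * b mod n via Montgomery form; requires 0 <= a, b < n."""
    _, n_prime, r2 = montgomery_setup(n, k)
    a_m = redc(a * r2, n, k, n_prime)     # a * R mod n
    b_m = redc(b * r2, n, k, n_prime)     # b * R mod n
    p_m = redc(a_m * b_m, n, k, n_prime)  # a * b * R mod n
    return redc(p_m, n, k, n_prime)       # leave Montgomery form: a * b mod n
```

The appeal for hardware is that `redc` replaces division by `n` with masking and shifting by a power of two, leaving only multiplications and additions, which is the kind of work dedicated block multipliers handle well.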

    Stochastic Digital Circuits for Probabilistic Inference

    We introduce combinational stochastic logic, an abstraction that generalizes deterministic digital circuit design (based on Boolean logic gates) to the probabilistic setting. We show how this logic can be combined with techniques from contemporary digital design to generate stateless and stateful circuits for exact and approximate sampling from a range of probability distributions. We focus on Markov chain Monte Carlo algorithms for Markov random fields, using massively parallel circuits. We implement these circuits on commodity reconfigurable logic and estimate the resulting performance in time, space and price. Using our approach, these simple and general algorithms could be affordably run for thousands of iterations on models with hundreds of thousands of variables in real time
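
As a rough software analogue of the Markov chain Monte Carlo setting described above, the sketch below runs Gibbs sampling on a small Ising model, a simple Markov random field. It is an assumption-laden illustration, not the authors' circuit design: the checkerboard update order, the periodic boundaries, and the names `gibbs_sweep` and `coupling` are all hypothetical. The grid dimensions are assumed to be even so that the two colour classes contain no neighbouring sites.

```python
import math
import random


def gibbs_sweep(spins, coupling, rng):
    """One checkerboard Gibbs sweep over a 2D Ising grid with periodic borders.

    spins is a list of lists holding +1 or -1 and is updated in place.
    Sites of the same colour share no neighbours (for even grid sizes),
    so in hardware each colour class could be resampled in parallel.
    """
    h, w = len(spins), len(spins[0])
    for colour in (0, 1):
        for i in range(h):
            for j in range(w):
                if (i + j) % 2 != colour:
                    continue
                # Sum of the four neighbouring spins.
                s = (spins[(i - 1) % h][j] + spins[(i + 1) % h][j]
                     + spins[i][(j - 1) % w] + spins[i][(j + 1) % w])
                # Conditional probability that this spin is +1 given its neighbours.
                p_up = 1.0 / (1.0 + math.exp(-2.0 * coupling * s))
                spins[i][j] = 1 if rng.random() < p_up else -1
    return spins


if __name__ == "__main__":
    rng = random.Random(0)
    grid = [[rng.choice((-1, 1)) for _ in range(8)] for _ in range(8)]
    for _ in range(100):
        gibbs_sweep(grid, coupling=0.4, rng=rng)
    print(sum(map(sum, grid)) / 64.0)  # average magnetisation of the final sample
```

Each site update needs only its four neighbours and one random draw, which is why this family of algorithms maps naturally onto many small parallel circuits.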

    Division-based versus general decomposition-based multiple-level logic synthesis

    During the last decade, many different approaches have been proposed to solve the multiple-level synthesis problem with different minimum functionally complete systems of primitive logic blocks. The most popular of them is the division-based approach. However, modern microelectronic technology provides a large variety of building blocks which considerably differ from those typically considered. The traditional methods are therefore not suitable for synthesis with many modern building blocks. Furthermore, they often fail to find global optima for complex designs and leave some important design aspects unconsidered. Some of their weaknesses can be eliminated without leaving the paradigm they are based on; others are more fundamental. A paradigm which enables efficient exploitation of the opportunities created by microelectronic technology is the general decomposition paradigm. The aim of this paper is to analyze and compare the general decomposition approach and the division-based approach. The most important advantages of the general decomposition approach are its generality (any network of any building blocks can be considered) and totality (all important design aspects can be considered), as well as its natural handling of incompletely specified functions. In many cases, the general decomposition approach gives much better results than the traditional approaches.

    An empirical evaluation of High-Level Synthesis languages and tools for database acceleration

    High Level Synthesis (HLS) languages and tools are emerging as the most promising technique to make FPGAs more accessible to software developers. Nevertheless, picking the most suitable HLS for a certain class of algorithms depends on requirements such as area and throughput, as well as on programmer experience. In this paper, we explore the different trade-offs present when using a representative set of HLS tools in the context of Database Management Systems (DBMS) acceleration. More specifically, we conduct an empirical analysis of four representative frameworks (Bluespec SystemVerilog, Altera OpenCL, LegUp and Chisel) that we utilize to accelerate commonly-used database algorithms such as sorting, the median operator, and hash joins. Through our implementation experience and empirical results for database acceleration, we conclude that the selection of the most suitable HLS depends on a set of orthogonal characteristics, which we highlight for each HLS framework. Peer reviewed. Postprint (author’s final draft).