4 research outputs found

    Multi-Bit Carry Chains for High-Performance Reconfigurable Fabrics

    Get PDF
    Ripple-carry architectures are the norm in today\u27s reconfigurable fabrics. They are simple, require minimal routing, and are easily formed across arbitrary cells in a fabric. However, their computation delay grows linearly with operand width. Many different fabric carry-chains have been presented in literature offering non-linear delays, but generally require a significant investment in routing and processing area. Carry-skip chains are well-known in arithmetic logic design, and although they too possess a linear delay, their performance is 2times or more faster than simple ripple-carry schemes. They require an expanded carry chain and minimal extra logic, but offer impressive speed-ups for arithmetic. This paper presents a reconfigurable cell that supports carry-skip arithmetic using a multi-bit carry chain achieving 2 middot k middot b+ n/b performance, where b is the block size and k is an architecture constant. The cell is specialized for arithmetic and Boolean operations with reduced configuration memory. Additional resources are provided to reuse the multi-bit carry chain for 3-source operand arithmetic to explore how multi-bit chains can be reused

    Multi-Bit Carry Chains for High-Performance Reconfigurable Fabrics

    No full text
    Ripple-carry architectures are the norm in today's reconfigurable fabrics. They are simple, require minimal routing, and are easily formed across arbitrary cells in a fabric. However, their computation delay grows linearly with operand width. Many different fabric carry-chains have been presented in literature offering non-linear delays, but generally require a significant investment in routing and processing area. Carry-skip chains are well-known in arithmetic logic design, and although they too possess a linear delay, their performance is 2times or more faster than simple ripple-carry schemes. They require an expanded carry chain and minimal extra logic, but offer impressive speed-ups for arithmetic. This paper presents a reconfigurable cell that supports carry-skip arithmetic using a multi-bit carry chain achieving 2 middot k middot b+ n/b performance, where b is the block size and k is an architecture constant. The cell is specialized for arithmetic and Boolean operations with reduced configuration memory. Additional resources are provided to reuse the multi-bit carry chain for 3-source operand arithmetic to explore how multi-bit chains can be reused.This is a manuscript of a proceeding published as Frederick, Michael T., and Arun K. Somani. "Multi-bit carry chains for high-performance reconfigurable fabrics." In 2006 International Conference on Field Programmable Logic and Applications (2006): 275. DOI: 10.1109/FPL.2006.311225. Posted with permission.</p

    Uso eficiente de aritmética redundante en FPGAs

    Get PDF
    Hasta hace pocos años, la utilización de aritmética redundante en FPGAs había sido descartada por dos razones principalmente. En primer lugar, por el buen rendimiento que ofrecían los sumadores de acarreo propagado, gracias a la lógica de de acarreo que poseían de fábrica y al pequeño tamaño de los operandos en las aplicaciones típicas para FPGAs. En segundo lugar, el excesivo consumo de área que las herramientas de síntesis obtenían cuando mapeaban unidades que trabajan en carrysave. En este trabajo, se muestra que es posible la utilización de aritmética redundante carry-save en FPGAs de manera eficiente, consiguiendo un aumento en la velocidad de operación con un consumo de recursos razonable. Se ha introducido un nuevo formato redundante doble carry-save y se ha demostrado que la manera óptima para la realización de multiplicadores de elevado ancho de palabra es la combinación de multiplicadores empotrados con sumadores carry-save.Till a few years ago, redundant arithmetic had been discarded to be use in FPGA mainly for two reasons. First, the efficient results obtained using carry-propagate adders thanks to the carry-logic embedded in FPGAs and the small sizes of operands in typical FPGA applications. Second, the high number of resources that the synthesis tools utilizes to implement carry-save circuits. In this work, it is demonstrated that carry-save arithmetic can be efficiently used in FPGA, obtaining an important speed improvement with a reasonable area cost. A new redundant format, double carry-save, has been introduced, and the optimal implementation of large size multipliers has been shown based on embedded multipliers and carry-save adders

    Closing the Gap between FPGA and ASIC:Balancing Flexibility and Efficiency

    Get PDF
    Despite many advantages of Field-Programmable Gate Arrays (FPGAs), they fail to take over the IC design market from Application-Specific Integrated Circuits (ASICs) for high-volume and even medium-volume applications, as FPGAs come with significant cost in area, delay, and power consumption. There are two main reasons that FPGAs have huge efficiency gap with ASICs: (1) FPGAs are extremely flexible as they have fully programmable soft-logic blocks and routing networks, and (2) FPGAs have hard-logic blocks that are only usable by a subset of applications. In other words, current FPGAs have a heterogeneous structure comprised of the flexible soft-logic and the efficient hard-logic blocks that suffer from inefficiency and inflexibility, respectively. The inefficiency of the soft-logic is a challenge for any application that is mapped to FPGAs, and lack of flexibility in the hard-logic results in a waste of resources when an application cannot use the hard-logic. In this thesis, we approach the inefficiency problem of FPGAs by bridging the efficiency/flexibility gap of the hard- and soft-logic. The main goal of this thesis is to compromise on efficiency of the hard-logic for flexibility, on the one hand, and to compromise on flexibility of the soft-logic for efficiency, on the other hand. In other words, this thesis deals with two issues: (1) adding more generality to the hard-logic of FPGAs, and (2) improving the soft-logic by adapting it to the generic requirements of applications. In the first part of the thesis, we introduce new techniques that expand the functionality of FPGAs hard-logic. The hard-logic includes the dedicated resources that are tightly coupled with the soft-logic –i.e., adder circuitry and carry chains –as well as the stand-alone ones –i.e., DSP blocks. These specialized resources are intended to accelerate critical arithmetic operations that appear in the pre-synthesis representation of applications; we introduce mapping and architectural solutions, which enable both types of the hard-logic to support additional arithmetic operations. We first present a mapping technique that extends the application of FPGAs carry chains for carry-save arithmetic, and then to increase the generality of the hard-logic, we introduce novel architectures; using these architectures, more applications can take advantage of FPGAs hard-logic. In the second part of the thesis, we improve the efficiency of FPGAs soft-logic by exploiting the circuit patterns that emerge after logic synthesis, i.e., connection and logic patterns. Using these patterns, we design new soft-logic blocks that have less flexibility, but more efficiency than current ones. In this part, we first introduce logic chains, fixed connections that are integrated between the soft-logic blocks of FPGAs and are well-suited for long chains of logic that appear post-synthesis. Logic chains provide fast and low cost connectivity, increase the bandwidth of the logic blocks without changing their interface with the routing network, and improve the logic density of soft-logic blocks. In addition to logic chains and as a complementary contribution, we present a non-LUT soft-logic block that comprises simple and pre-connected cells. The structure of this logic block is inspired from the logic patterns that appear post-synthesis. This block has a complexity that is only linear in the number of inputs, it sports the potential for multiple independent outputs, and the delay is only logarithmic in the number of inputs. Although this new block is less flexible than a LUT, we show (1) that effective mapping algorithms exist, (2) that, due to their simplicity, poor utilization is less of an issue than with LUTs, and (3) that a few LUTs can still be used in extreme unfortunate cases. In summary, to bridge the gap between FPGAs and ASICs, we approach the problem from two complementary directions, which balance flexibility and efficiency of the logic blocks of FPGAs. However, we were able to explore a few design points in this thesis, and future work could focus on further exploration of the design space
    corecore