296 research outputs found

    OPTIMAL AREA AND PERFORMANCE MAPPING OF K-LUT BASED FPGAS

    Get PDF
    FPGA circuits are increasingly used in many fields: for rapid prototyping of new products (including fast ASIC implementation), for logic emulation, for producing a small number of a device, or if a device should be reconfigurable in use (reconfigurable computing). Determining if an arbitrary, given wide, function can be implemented by a programmable logic block, unfortunately, it is generally, a very difficult problem. This problem is called the Boolean matching problem. This paper introduces a new implemented algorithm able to map, both for area and performance, combinational networks using k-LUT based FPGAs.k-LUT based FPGAs, combinational circuits, performance-driven mapping.

    AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

    Full text link
    CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to reprogram the FPGAs for flexible acceleration of many workloads. Nonetheless, this advantage is often overshadowed by the poor programmability of FPGAs whose programming is conventionally a RTL design practice. Although recent advances in high-level synthesis (HLS) significantly improve the FPGA programmability, it still leaves programmers facing the challenge of identifying the optimal design configuration in a tremendous design space. This paper aims to address this challenge and pave the path from software programs towards high-quality FPGA accelerators. Specifically, we first propose the composable, parallel and pipeline (CPP) microarchitecture as a template of accelerator designs. Such a well-defined template is able to support efficient accelerator designs for a broad class of computation kernels, and more importantly, drastically reduce the design space. Also, we introduce an analytical model to capture the performance and resource trade-offs among different design configurations of the CPP microarchitecture, which lays the foundation for fast design space exploration. On top of the CPP microarchitecture and its analytical model, we develop the AutoAccel framework to make the entire accelerator generation automated. AutoAccel accepts a software program as an input and performs a series of code transformations based on the result of the analytical-model-based design space exploration to construct the desired CPP microarchitecture. Our experiments show that the AutoAccel-generated accelerators outperform their corresponding software implementations by an average of 72x for a broad class of computation kernels

    Optimal simultaneous mapping and clustering for FPGA delay optimization

    Get PDF

    Design and analysis of efficient synthesis algorithms for EDAC functions in FPGAs

    Get PDF
    Error Detection and Correction (EDAC) functions have been widely used for protecting memories from single event upsets (SEU), which occur in environments with high levels of radiation or in deep submicron manufacturing technologies. This paper presents three novel synthesis algorithms that obtain areaefficient implementations for a given EDAC function, with the ultimate aim of reducing the number of sensitive configuration bits in SRAM-based Field-Programmable Gate Arrays (FPGAs). Having less sensitive bits results in a lower chance of suffering a SEU in the EDAC circuitry, thus improving the overall reliability of the whole system. Besides minimizing area, the proposed algorithms also focus on improving other figures of merit like circuit speed and power consumption. The executed benchmarks show that, when compared to other modern synthesis tools, the proposed algorithms can reduce the number of utilized look-up tables (LUTs) up to a 34.48%. Such large reductions in area usage ultimately result in reliability improvements over 10% for the implemented EDAC cores, measured as MTBF (Mean Time Between Failures). On the other hand, maximum path delays and power consumptions can be reduced up to a 17.72% and 34.37% respectively on the placed and routed designs.This work was supported by the Spanish Ministry of Educacion, Cultura y Deporte under the grant FPU12/05573, and by the Spanish Ministry of Economıa project ESP2013-48362-C2-2-P, in the frame of the activities of the Instrument Control Unit of the Infrarred Instrument of the ESA Euclid Mission carried out by the Dept. of Electronics and Computer Technology of the Universidad Politécnica de Cartagen

    A novel heuristic and provable bounds for reconfigurable architecture design

    No full text
    Published versio

    Cost Effective Implementation of Fixed Point Adders for LUT based FPGAs using Technology Dependent Optimizations

    Get PDF
    Modern day field programmable gate arrays (FPGAs) have very huge and versatile logic resources resulting in the migration of their application domain from prototype designing to low and medium volume production designing. Unfortunately most of the work pertaining to FPGA implementations does not focus on the technology dependent optimizations that can implement a desired functionality with reduced cost. In this paper we consider the mapping of simple ripple carry fixed-point adders (RCA) on look-up table (LUT) based FPGAs. The objective is to transform the given RCA Boolean network into an optimized circuit netlist that can implement the desired functionality with minimum cost. We particularly focus on 6-input LUTs that are inherent in all the modern day FPGAs. Technology dependent optimizations are carried out to utilize this FPGA primitive efficiently and the result is compared against various adder designs. The implementation targets the XC5VLX30-3FF324 device from Xilinx Virtex-5 FPGA family. The cost of the circuit is expressed in terms of the resources utilized, critical path delay and the amount of on-chip power dissipated. Our implementation results show a reduction in resources usage by at least 50%; increase in speed by at least 10% and reduction in dynamic power dissipation by at least 30%. All this is achieved without any technology independent (architectural) modification
    corecore