
    Run-time power and performance scaling in 28 nm FPGAs


    A Micro Power Hardware Fabric for Embedded Computing

    Field Programmable Gate Arrays (FPGAs) mitigate many of the problems encountered with the development of ASICs by offering flexibility, faster time-to-market, and amortized NRE costs, among other benefits. While FPGAs are increasingly being used for complex computational applications such as signal and image processing, networking, and cryptology, they are far from ideal for these tasks due to relatively high power consumption and silicon usage overheads compared to direct ASIC implementation. A reconfigurable device that exhibits ASIC-like power characteristics and FPGA-like costs and tool support is desirable to fill this void. In this research, a parameterized, reconfigurable fabric model named the domain-specific fabric (DSF) is developed that exhibits ASIC-like power characteristics for Digital Signal Processing (DSP) style applications. Using this model, the impact of varying different design parameters on power and performance has been studied. Optimization techniques such as local search and simulated annealing are used to determine the appropriate interconnect for a specific set of applications. A design space exploration tool has been developed to automate the process and generate a tailored architectural instance of the fabric. The fabric has been synthesized using a 160 nm cell-based ASIC fabrication process from OKI and a 130 nm process from IBM. A detailed power-performance analysis has been completed using signal and image processing benchmarks from the MediaBench benchmark suite and elsewhere, with comparisons to other hardware and software implementations. The optimized fabric implemented using the 130 nm process yields energy within 3X of a direct ASIC implementation, 330X better than a Virtex-II Pro FPGA, and 2016X better than an Intel XScale processor.
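
    The abstract mentions local search and simulated annealing for selecting the fabric interconnect. The following minimal C++ sketch shows the general shape of such a simulated-annealing loop; the Config encoding, cost model, and neighbour move are hypothetical placeholders and not the DSF exploration tool itself.

```cpp
// Minimal simulated-annealing sketch for choosing an interconnect
// configuration. The Config encoding, cost model, and neighbour move
// below are hypothetical placeholders, not the DSF tool.
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

struct Config {
    std::vector<int> params;   // e.g. one interconnect choice per fabric region
};

// Placeholder cost model standing in for an energy/delay estimate
// averaged over the target DSP benchmarks (lower is better).
double estimate_cost(const Config& c) {
    return std::accumulate(c.params.begin(), c.params.end(), 0.0);
}

// Placeholder neighbour move: nudge one randomly chosen parameter.
// Assumes a non-empty configuration.
Config perturb(Config c, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, c.params.size() - 1);
    std::uniform_int_distribution<int> step(-1, 1);
    c.params[pick(rng)] += step(rng);
    return c;
}

Config anneal(Config current, double t0, double t_min, double alpha) {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    double cost = estimate_cost(current);
    Config best = current;
    double best_cost = cost;
    for (double t = t0; t > t_min; t *= alpha) {
        Config cand = perturb(current, rng);
        double cand_cost = estimate_cost(cand);
        // Always accept improvements; accept worse moves with a
        // probability that shrinks as the temperature drops.
        if (cand_cost < cost || uni(rng) < std::exp((cost - cand_cost) / t)) {
            current = cand;
            cost = cand_cost;
            if (cost < best_cost) { best = current; best_cost = cost; }
        }
    }
    return best;
}
```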

    Tool for a configurable integrated circuit that uses determination of dynamic power consumption

    A configurable logic tool is described that allows minimization of dynamic power within an FPGA design without changing user-entered specifications. The minimization of power may use minimized clock nets as a first-order operation, together with a second-order operation that minimizes other factors such as placement area, clock area, and/or slack.
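
    The abstract suggests a tiered objective in which clock-net cost dominates and other factors only break ties. The sketch below shows one hypothetical way to encode such an ordering; the field names and the lexicographic metric are assumptions for illustration, not the patented method.

```cpp
// Hypothetical two-tier comparison of candidate implementations:
// clock-net power dominates (first order); area and slack only break
// ties (second order). Field names are illustrative placeholders.
#include <tuple>

struct CandidateMetrics {
    double clock_net_power;   // estimated dynamic power on clock nets
    double placement_area;    // area consumed by the placement
    double worst_slack;       // larger slack is better
};

// Returns true if candidate a is preferable to candidate b.
bool better(const CandidateMetrics& a, const CandidateMetrics& b) {
    // Lexicographic order: minimize clock-net power first, then area,
    // then maximize slack (negated so that "smaller is better").
    return std::make_tuple(a.clock_net_power, a.placement_area, -a.worst_slack)
         < std::make_tuple(b.clock_net_power, b.placement_area, -b.worst_slack);
}
```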

    High Level Synthesis and Evaluation of an Automotive RADAR Signal Processing algorithm for FPGAs

    High Level Synthesis (HLS) is a technology used to design and develop hardware (HW) using high-level languages such as C/C++. An HLS model of an automotive RADAR signal processing algorithm has been developed for the purpose of comparison between the HLS model and the existing HDL model. Register Transfer Level (RTL) programming is a technology used to design and develop hardware at the register transfer level (or low level) using hardware description languages (HDLs) such as Verilog and VHDL. FPGA development usually requires knowledge of RTL technologies. HLS gives software (SW) developers the ability to design and implement their designs on an FPGA without requiring knowledge of RTL technologies and HDLs. Even though HLS is currently gaining popularity, the applications used to evaluate it tend to remain small. We synthesize an automotive RADAR signal processing system of mid to high complexity using an HLS-based design methodology and compare our synthesis results to those of the RTL-based design. We apply a number of techniques to make the high-level program model ready for synthesis while optimizing for both speed and resource usage using the Xilinx Vivado HLS Computer-Aided Design (CAD) tool. We achieved a speedup of 2X compared to the RTL-based design while reducing the design time from approximately 16 weeks to 6 weeks. The FPGA resource utilization increased, but remained under 5% of the total resources available on the FPGA.
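
    The source-level optimizations mentioned above are typically expressed as Vivado HLS pragmas in the C/C++ model. As a purely illustrative example (not the authors' RADAR code), an element-wise windowing stage of the kind that often precedes an FFT might be written and pipelined as follows; the names and array size are placeholders.

```cpp
// Illustrative Vivado HLS kernel, not the authors' RADAR design:
// an element-wise windowing stage. Names and size are placeholders.
#define N 256

void apply_window(const float in[N], const float coeff[N], float out[N]) {
window_loop:
    for (int i = 0; i < N; ++i) {
        // Ask the tool to pipeline the loop with an initiation
        // interval of one, i.e. start a new iteration every cycle.
        #pragma HLS PIPELINE II=1
        out[i] = in[i] * coeff[i];
    }
}
```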

    Smart technologies for effective reconfiguration: the FASTER approach

    Current and future computing systems increasingly require that their functionality stays flexible after the system is operational, in order to cope with changing user requirements and improvements in system features, e.g. changing protocols and data-coding standards, evolving demands for support of different user applications, and newly emerging applications in communication, computing and consumer electronics. Therefore, extending the functionality and the lifetime of products requires the addition of new functionality to track and satisfy customers' needs and market and technology trends. Many contemporary products incorporate hardware accelerators alongside the software for reasons of performance and power efficiency. While adaptivity of software is straightforward, adaptation of the hardware to changing requirements constitutes a challenging problem requiring delicate solutions. The FASTER (Facilitating Analysis and Synthesis Technologies for Effective Reconfiguration) project aims at introducing a complete methodology to allow designers to easily implement a system specification on a platform that includes a general-purpose processor combined with multiple accelerators running on an FPGA, taking as input a high-level description and fully exploiting, both at design time and at run time, the capabilities of partial dynamic reconfiguration. The goal is that, for selected application domains, the FASTER toolchain will be able to reduce the design and verification time of complex reconfigurable systems, providing additional novel verification features that are not available in existing tool flows.

    Configuration Sharing Optimized Placement and Routing

    Reconfigurable systems have been shown to achieve very high computational performance. However, the overhead associated with reconfiguration of hardware remains a critical factor in overall system performance. This paper discusses the development and evaluation of a technique to minimize the delay associated with reconfiguration based upon optimized sharing of configuration bitstreams between design contexts. This is achieved through modified placement and routing algorithms.
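
    As a rough illustration of the underlying idea (the data layout is hypothetical, not the paper's representation), the benefit of configuration sharing can be thought of as minimizing the number of configuration frames that differ between two contexts, since only those frames need to be reloaded when switching:

```cpp
// Hypothetical illustration of configuration sharing between two
// design contexts: the fewer configuration frames that differ, the
// less data must be loaded when switching contexts. The frame layout
// is a placeholder, not the paper's representation.
#include <cstdint>
#include <vector>

using Frame = std::vector<uint32_t>;     // one configuration frame
using Bitstream = std::vector<Frame>;    // all frames for one context

// Number of frames that must be rewritten to switch from context a
// to context b; a sharing-aware placer would try to minimize this.
std::size_t frames_to_reload(const Bitstream& a, const Bitstream& b) {
    std::size_t differing = 0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        if (a[i] != b[i]) ++differing;
    }
    return differing;
}
```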

    A Multi-layer FPGA Framework Supporting Autonomous Runtime Partial Reconfiguration

    Partial reconfiguration is a capability recently provided by several Field Programmable Gate Array (FPGA) vendors, which involves altering part of the programmed design within an SRAM-based FPGA at run-time. In this dissertation, a Multilayer Runtime Reconfiguration Architecture (MRRA) is developed, evaluated, and refined for autonomous runtime partial reconfiguration of FPGA devices. Under the proposed MRRA paradigm, FPGA configurations can be manipulated at runtime using on-chip resources. Operations are partitioned into Logic, Translation, and Reconfiguration layers, along with a standardized set of Application Programming Interfaces (APIs). At each level, resource details are encapsulated and managed for efficiency and portability during operation. An MRRA mapping theory is developed to link general logic function and area allocation information to device-level physical configuration data by using mathematical data structures and physical constraints. In certain scenarios, configuration bitstream data can be read and modified directly for fast operations, relying on the use of similar logic functions and common interconnection resources for communication. A corresponding logic control flow is also developed to make the entire process autonomous. Several prototype MRRA systems are developed on a Xilinx Virtex II Pro platform. The Virtex II Pro on-chip PowerPC core and block RAM are employed to manage control operations while multiple physical interfaces establish and supplement autonomous reconfiguration capabilities. Area, speed and power optimization techniques are developed based on the Xilinx prototype. Evaluations and analysis of these prototypes and techniques are performed on a number of benchmark and hashing algorithm case studies. The results indicate that, across a variety of test benches, up to a 70% reduction in resource utilization, up to a 50% improvement in power consumption, and up to a 10X increase in run-time performance are achieved using the developed architecture and approaches compared with the Xilinx baseline reconfiguration flow. Finally, a Genetic Algorithm (GA) for an FPGA fault tolerance case study is evaluated as a final high-level application running on this architecture. It demonstrates that this hardware and software infrastructure enables an FPGA to dynamically reconfigure itself efficiently under the control of a soft microprocessor core instantiated within the FPGA fabric. Such a system contributes to the observed benefits of intelligent control, fast reconfiguration, and low overhead.
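
    To make the layering concrete, one possible shape for the three MRRA layers is sketched below as C++ interfaces. The class and method names are assumptions made here for illustration only; the actual MRRA APIs are defined in the dissertation, not reproduced by this sketch.

```cpp
// Hypothetical sketch of the three MRRA layers as abstract interfaces.
// All names are illustrative, not the dissertation's APIs.
#include <cstdint>
#include <string>
#include <vector>

// Logic layer: application-visible view in terms of logic functions
// and rectangular area allocations on the fabric.
class LogicLayer {
public:
    virtual ~LogicLayer() = default;
    virtual int placeFunction(const std::string& functionName,
                              int col, int row, int width, int height) = 0;
};

// Translation layer: maps logic-level handles onto device-specific
// frame addresses and bitstream offsets.
class TranslationLayer {
public:
    virtual ~TranslationLayer() = default;
    virtual std::vector<uint32_t> toFrameAddresses(int handle) = 0;
};

// Reconfiguration layer: reads and writes configuration frames on the
// device, e.g. through an on-chip configuration port.
class ReconfigurationLayer {
public:
    virtual ~ReconfigurationLayer() = default;
    virtual void writeFrames(const std::vector<uint32_t>& addresses,
                             const std::vector<uint32_t>& data) = 0;
};
```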

    Hybrid FPGA: Architecture and Interface

    Hybrid FPGAs (Field Programmable Gate Arrays) are composed of general-purpose logic resources with different granularities, together with domain-specific coarse-grained units. This thesis proposes a novel hybrid FPGA architecture with embedded coarse-grained Floating Point Units (FPUs) to improve the floating point capability of FPGAs. Based on the proposed hybrid FPGA architecture, we examine three aspects to optimise speed and area for domain-specific applications. First, we examine the interface between large coarse-grained embedded blocks (EBs) and fine-grained elements in hybrid FPGAs. The interface includes parameters for varying: (1) the aspect ratio of EBs, (2) the position of the EBs in the FPGA, (3) the I/O pin arrangement of EBs, (4) the interconnect flexibility of EBs, and (5) the location of additional embedded elements such as memory. Second, we examine the interconnect structure for hybrid FPGAs. We investigate how large, high-density EBs affect the routing demand of hybrid FPGAs over a set of domain-specific applications. We then propose three routing optimisation methods to meet the additional routing demand introduced by large EBs: (1) identifying the best separation distance between EBs, (2) adding routing switches on EBs to increase routing flexibility, and (3) introducing wider channels near the edge of EBs. We study and compare the trade-offs in delay, area and routability of these three optimisation methods. Finally, we employ common subgraph extraction to determine the number of floating point adders/subtractors, multipliers and wordblocks in the FPUs. The wordblocks include registers and can implement fixed point operations. We study the area, speed and utilisation trade-offs of the selected FPU subgraphs in a set of floating point benchmark circuits. We develop an optimised coarse-grained FPU, taking into account both architectural and system-level issues. Furthermore, we investigate the trade-offs between granularity and performance by composing small FPUs into a large FPU. The results of this thesis help in designing a domain-specific hybrid FPGA that meets user requirements by optimising for speed, area, or a combination of speed and area.
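
    A minimal sketch of the kind of trade-off selection described above is shown below: given candidate FPU compositions with estimated area and delay over a benchmark set, pick the best one. The data structure and the area-delay-product metric are assumptions for illustration, not the thesis methodology.

```cpp
// Hypothetical helper for an FPU composition trade-off study: select
// the candidate with the lowest area-delay product. The structure and
// metric are illustrative assumptions, not the thesis methodology.
#include <limits>
#include <string>
#include <vector>

struct FpuCandidate {
    std::string description;   // e.g. "2 add/sub + 2 mult + 4 wordblocks"
    double area;                // estimated area (normalised units)
    double delay;               // estimated critical-path delay (ns)
};

const FpuCandidate* pick_best(const std::vector<FpuCandidate>& candidates) {
    const FpuCandidate* best = nullptr;
    double best_adp = std::numeric_limits<double>::infinity();
    for (const auto& c : candidates) {
        double adp = c.area * c.delay;   // area-delay product
        if (adp < best_adp) { best_adp = adp; best = &c; }
    }
    return best;   // nullptr if the candidate list is empty
}
```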

    Size, Speed, and Power Analysis for Application-Specific Integrated Circuits Using Synthesis

    An application-specific integrated circuit (ASIC) must not only provide the required functionality at the desired speed, but it must also be economical. In the past, minimizing the size of the ASIC was sufficient to accomplish this goal. Today it is increasingly necessary that the ASIC also achieve minimum power dissipation, or an optimal combination of speed, size and power, especially in communication and portable electronic devices. The research reported in this thesis describes the implementation of a Huffman encoder and a finite impulse response (FIR) filter using a hardware description language (HDL) and the testing of the corresponding register transfer level (RTL) descriptions for functionality. The RTL was targeted to two different libraries, TSMC-0.18 CMOS and the Xilinx Virtex V1000EHQ240-6. The RTL was synthesized and optimized for different sizes, speeds, and power levels using the Synopsys Design Compiler, FPGA Compiler II, and Mentor Graphics Spectrum. Cadence place-and-route tools optimized area, delay, and power at the post-layout stage for TSMC-0.18. Xilinx place-and-route tools were used for the Virtex V1000EHQ240-6. The various ASICs were produced and compared over a range of speed, area, and power.
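
    As a point of reference for what the FIR design computes, a behavioural sketch of a direct-form FIR filter is given below: each output is the dot product of the most recent input samples with the coefficient vector. This is a software reference model only, with placeholder tap count and data types; the thesis RTL is not reproduced here.

```cpp
// Behavioural sketch of a direct-form FIR filter, the function the
// RTL design described above implements in hardware. Tap count and
// types are placeholders; the thesis RTL is not reproduced here.
#include <array>
#include <cstddef>

constexpr std::size_t kTaps = 16;

class FirFilter {
public:
    explicit FirFilter(const std::array<double, kTaps>& coeffs)
        : coeffs_(coeffs) {}

    // Push one input sample and return the filtered output:
    // y[n] = sum_k coeffs[k] * x[n - k]
    double step(double sample) {
        // Shift the delay line and insert the new sample at the front.
        for (std::size_t i = kTaps - 1; i > 0; --i) delay_[i] = delay_[i - 1];
        delay_[0] = sample;

        double acc = 0.0;
        for (std::size_t k = 0; k < kTaps; ++k) acc += coeffs_[k] * delay_[k];
        return acc;
    }

private:
    std::array<double, kTaps> coeffs_;
    std::array<double, kTaps> delay_{};   // zero-initialized sample history
};
```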

    Optimal simultaneous mapping and clustering for FPGA delay optimization
