5 research outputs found

    Power measurements and analysis for dynamic circuit specialization

    Dynamic Circuit Specialization (DCS) is a technique for optimized FPGA implementation and is built on top of Partial Reconfiguration (PR). Dynamic Partial Reconfiguration (DPR) provides an opportunity to share the silicon area between different Partially Reconfigurable Modules (PRMs) and therefore results in smaller and faster designs that potentially also reduce the power consumption. In this paper, we show that energy consumption is an important factor that has to be considered when implementing a parameterized design using DCS. To make a good implementation choice for a parameterized design when the goal is power optimization, a good estimate of the power consumption of DCS is needed. In this context, our paper presents a detailed investigation of the power consumption of a parameterized design implemented using DCS on the Xilinx Zynq-SoC FPGA. We propose an energy analysis of DCS and investigate the benefits of DCS in comparison with a classic static FPGA implementation. We see that the power needed for the reconfiguration is much higher than the power gained by using the reconfigured design instead of the static implementation. An important reason is the involvement of the CPU during the reconfiguration, together with the interface between the AXI bus and the HWICAP. To reduce the reconfiguration power, we add a clock gating technique to the AXI-HWICAP reconfiguration interface, which makes DCS more power efficient. We also relate the power gain to the size of the implementation and to the allowed time to reconfigure versus the useful run time. We conclude that for an implementation with 10 FIR filters, the reconfiguration time should not take more than 30.3% of the total time in order to remain energy efficient. Considering a specific use case with 10 FIR filters at a reconfiguration rate of 0.01, the energy consumption using DCS implementation is 20.5% lower than using the static FIR
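The trade-off between reconfiguration time and useful run time described in this abstract can be illustrated with a simple two-phase energy model. The sketch below is an assumption and not the paper's own analysis: it supposes the DCS design draws a constant power `p_run_dcs` while running and `p_reconf` while reconfiguring, and that the static design draws `p_static` throughout. All function names and power values are hypothetical placeholders, not measurements from the paper.

```python
def static_energy(p_static: float, total_time: float = 1.0) -> float:
    """Energy of the static design: constant power over the whole period."""
    return p_static * total_time


def dcs_energy(p_run_dcs: float, p_reconf: float,
               reconf_fraction: float, total_time: float = 1.0) -> float:
    """Energy of a DCS design that spends `reconf_fraction` of the period
    reconfiguring and the remainder doing useful work."""
    t_reconf = reconf_fraction * total_time
    t_run = total_time - t_reconf
    return p_run_dcs * t_run + p_reconf * t_reconf


def break_even_fraction(p_static: float, p_run_dcs: float,
                        p_reconf: float) -> float:
    """Largest reconfiguration fraction for which DCS is not worse than static.

    Follows from p_run_dcs*(1 - r) + p_reconf*r = p_static, solved for r.
    """
    if p_reconf <= p_run_dcs:
        raise ValueError("model assumes reconfiguration costs more than running")
    return (p_static - p_run_dcs) / (p_reconf - p_run_dcs)


# Hypothetical power values in arbitrary units (assumptions, not measured data).
r_max = break_even_fraction(p_static=1.0, p_run_dcs=0.5, p_reconf=1.5)
```

Under this model, DCS saves energy only while the reconfiguration fraction stays below `r_max`; the paper's reported limit for 10 FIR filters comes from its own measurements, which this sketch does not attempt to reproduce.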

    Efficiently generating FPGA configurations through a stack machine

    Parameterizable configurations are regular FPGA configurations in which some of the configuration bits are expressed as Boolean functions of a set of parameters. These configurations can be rapidly transformed to a specialized configuration by evaluating the Boolean functions for a specific set of parameter values and are therefore ideal for use in run-time reconfigurable systems. To accommodate the use of parameterizable configurations in commercial FPGAs, the concept of the Parameterizable Bitstream was introduced. In this work, we introduce a hardware implementation that evaluates the Parameterizable Bitstream based on a stack machine architecture. We enabled fast generation of specialized configurations with a significant reduction in resources (80%) in comparison to the MicroBlaze soft processor when it is used as a configuration generation engine.
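To make the idea of evaluating configuration bits on a stack machine concrete, here is a minimal software sketch. It assumes each tunable bit's Boolean function is stored as a postfix (reverse Polish) token list; that encoding, the token names, and the functions `eval_postfix` and `specialize` are hypothetical and do not reflect the actual Parameterizable Bitstream format or the hardware described in the abstract.

```python
from typing import Dict, List, Sequence


def eval_postfix(program: Sequence[str], params: Dict[str, bool]) -> bool:
    """Evaluate one Boolean function given in postfix form on a stack."""
    stack: List[bool] = []
    for tok in program:
        if tok == "NOT":
            stack.append(not stack.pop())
        elif tok in ("AND", "OR", "XOR"):
            b = stack.pop()
            a = stack.pop()
            if tok == "AND":
                stack.append(a and b)
            elif tok == "OR":
                stack.append(a or b)
            else:
                stack.append(a != b)
        elif tok in ("0", "1"):
            stack.append(tok == "1")      # constant bit
        else:
            stack.append(params[tok])     # parameter value lookup
    if len(stack) != 1:
        raise ValueError("malformed postfix program")
    return stack[0]


def specialize(bit_functions: Sequence[Sequence[str]],
               params: Dict[str, bool]) -> List[int]:
    """Produce the specialized bits for one set of parameter values."""
    return [int(eval_postfix(prog, params)) for prog in bit_functions]


# Four hypothetical tunable bits expressed over parameters p0 and p1.
bit_functions = [
    ["p0", "p1", "AND"],
    ["p0", "NOT"],
    ["p0", "p1", "XOR"],
    ["1"],
]
bits = specialize(bit_functions, {"p0": True, "p1": False})
```

Whenever the parameter values change, calling `specialize` again yields a new set of bits; in the system described above, the corresponding step is performed by dedicated hardware rather than by a soft processor.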

    Efficiently generating FPGA configurations through a stack machine

    Parameterizable configurations are regular FPGA configurations in which some of the configuration bits are expressed as Boolean functions of a set of parameters. These configurations can be rapidly transformed to a specialized configuration by evaluating the Boolean functions for a specific set of parameter values and are therefore ideal for use in run-time reconfigurable systems. To accommodate the use of parameterizable configurations in commercial FPGAs, the concept of the Parameterizable Bitstream was introduced. In this paper, we introduce a hardware implementation that evaluates the Parameterizable Bitstream based on a stack machine architecture. We enabled fast generation of specialized configurations with a significant reduction in resources (80%) in comparison to the MicroBlaze soft

    FPGA structures for high speed and low overhead dynamic circuit specialization

    A Field Programmable Gate Array (FPGA) is a programmable digital electronic chip. The FPGA does not come with a predefined function from the manufacturer; instead, the developer has to define its function through implementing a digital circuit on the FPGA resources. The functionality of the FPGA can be reprogrammed as desired and hence the name “field programmable”. FPGAs are useful in small volume digital electronic products as the design of a digital custom chip is expensive. Changing the FPGA (also called configuring it) is done by changing the configuration data (in the form of bitstreams) that defines the FPGA functionality. These bitstreams are stored in a memory of the FPGA called configuration memory. The SRAM cells of LookUp Tables (LUTs), Block Random Access Memories (BRAMs) and DSP blocks together form the configuration memory of an FPGA. The configuration data can be modified according to the user’s needs to implement the user-defined hardware. The simplest way to program the configuration memory is to download the bitstreams using a JTAG interface. However, modern techniques such as Partial Reconfiguration (PR) enable us to configure a part of the configuration memory with partial bitstreams during run-time. The reconfiguration is achieved by swapping partial bitstreams into the configuration memory via a configuration interface called Internal Configuration Access Port (ICAP). The ICAP is a hardware primitive (macro) present in the FPGA used to access the configuration memory internally by an embedded processor. The reconfiguration technique adds flexibility to use specialized circuits that are more compact and more efficient than their bulky counterparts. An example of such an implementation is the use of specialized multipliers instead of big generic multipliers in an FIR implementation with constant coefficients.
To specialize these circuits and reconfigure them during run-time, researchers at the HES group proposed a novel technique called parameterized reconfiguration that can be used to efficiently and automatically implement Dynamic Circuit Specialization (DCS), which is built on top of the Partial Reconfiguration method. It uses a run-time reconfiguration technique that is tailored to implementing a parameterized design. An application is said to be parameterized if some of its input values change much less frequently than the rest. These inputs are called parameters. Instead of implementing these parameters as regular inputs, in DCS these inputs are implemented as constants, and the application is optimized for the constants. For every change in parameter values, the design is re-optimized (specialized) during run-time and implemented by reconfiguring the optimized design for the new set of parameters. In DCS, the bitstreams of the parameterized design are expressed as Boolean functions of the parameters. For every infrequent change in parameters, a specialized FPGA configuration is generated by evaluating the corresponding Boolean functions, and the FPGA is reconfigured with the specialized configuration. The primary goal of the dissertation is a detailed study of the overheads of DCS and the provision of suitable solutions through appropriate custom FPGA structures. I also suggest different improvements to the FPGA configuration memory architecture. After offering the custom FPGA structures, I investigate the role of DCS on FPGA overlays and the use of custom FPGA structures that help to reduce the overheads of DCS on FPGA overlays. By doing so, I hope I can convince the developer to use DCS (which now comes with minimal costs) in real-world applications. I start the investigation of the overheads of DCS by implementing an adaptive FIR filter (using the DCS technique) on three different Xilinx FPGA platforms: Virtex-II Pro, Virtex-5, and Zynq-SoC.
Studying how DCS behaves, and what its overhead is, across the evolution of these three FPGA platforms forms the non-trivial basis for discovering the costs of DCS. After that, I propose custom FPGA structures (reconfiguration controllers and reconfiguration drivers) to reduce the main overhead (reconfiguration time) of DCS. These structures not only reduce the reconfiguration time but also help curb the power-hungry part of the DCS system. After these chapters, I study the role of DCS on FPGA overlays. I investigate the effect of the proposed FPGA structures on Virtual-Coarse-Grained Reconfigurable Arrays (VCGRAs). I classify the VCGRA implementations into three types, depending upon the level of parameterization: the conventional VCGRA, the partially parameterized VCGRA and the fully parameterized VCGRA. I have designed two variants of VCGRA grids for HPC image processing applications, namely, the MAC grid and Pixie. Finally, I try to tackle the reconfiguration time overhead at the hardware level of the FPGA by customizing the FPGA configuration memory architecture. In this part of my research, I propose to use a parallel memory structure to improve the reconfiguration time of DCS drastically. However, this improvement comes with a significant overhead of hardware resources which will need to be solved in future research on commercial FPGA configuration memory architectures.
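As a rough software analogy for specializing an FIR filter on its coefficients, the sketch below contrasts a generic filter, which takes the coefficients as regular inputs, with a specialized one that is rebuilt whenever the coefficients (the parameters) change. It only illustrates the idea of optimizing for constant parameters; the function names are hypothetical, and pruning zero taps stands in for the much richer hardware optimizations the dissertation describes.

```python
from typing import Callable, List, Sequence


def generic_fir(samples: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """Generic FIR: coefficients are ordinary inputs, every tap is multiplied."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


def make_specialized_fir(coeffs: Sequence[int]) -> Callable[[Sequence[int]], List[int]]:
    """Build a filter specialized for fixed coefficients.

    Zero taps are removed once, at specialization time, so the filter
    never spends work on them while processing samples.
    """
    taps = [(k, c) for k, c in enumerate(coeffs) if c != 0]

    def fir(samples: Sequence[int]) -> List[int]:
        return [
            sum(c * samples[n - k] for k, c in taps if n - k >= 0)
            for n in range(len(samples))
        ]

    return fir


# "Reconfiguration" in this analogy: rebuild the filter when coefficients change.
fir_a = make_specialized_fir([2, 0, 1])
result = fir_a([1, 2, 3, 4])
```

The specialized filter is cheaper per sample but must be regenerated on every parameter change, which mirrors the trade-off between specialization gains and reconfiguration overhead discussed above.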