648 research outputs found

    A novel tool flow for increased routing configuration similarity in multi-mode circuits

    Get PDF
    A multi-mode circuit implements the functionality of a limited number of circuits, called modes, of which at any given time only one needs to be realised. Using run-time reconfiguration (RTR) of an FPGA, all the modes can be time-multiplexed on the same reconfigurable region, requiring only an area that can contain the biggest mode. Typically, conventional run-time reconfiguration techniques generate a configuration of the reconfigurable region for every mode separately. This results in configurations that are bit-wise very different. Thus, in this case, many bits need to be changed in the configuration memory to switch between modes, leading to long reconfiguration times. In this paper we present a novel tool flow that retains the placement of the conventional RTR flow, but uses TRoute, a reconfiguration-aware connection router, to implement the connections of all modes simultaneously. TRoute stimulates the sharing of routing resources between connections of different modes. This results in a significant increase in the similarity between the routing configurations of the modes. In the experimental results it is shown that the number of routing configuration bits that needs to be rewritten is reduced with a factor between 2 and 4 compared to conventional techniques

    An automatic tool flow for the combined implementation of multi-mode circuits

    Get PDF
    A multi-mode circuit implements the functionality of a limited number of circuits, called modes, of which at any given time only one needs to be realised. Using run-time reconfiguration of an FPGA, all the modes can be implemented on the same reconfigurable region, requiring only an area that can contain the biggest mode. Typically, conventional run-time reconfiguration techniques generate a configuration for every mode separately. To switch between modes the complete reconfigurable region is rewritten, which often leads to very long reconfiguration times. In this paper we present a novel, fully automated tool flow that exploits similarities between the modes and uses Dynamic Circuit Specialization to drastically reduce reconfiguration time. Experimental results show that the number of bits that is rewritten in the configuration memory reduces with a factor from 4.6X to 5.1X without significant performance penalties

    Design of Reconfigurable Crossbar Switch for BiNoC Router

    Get PDF
    this paper presents implementation of 10x10 reconfigurable crossbar switch (RCS) architecture for Dynamic Self-Reconfigurable BiNoC Architecture for Network On Chip. Its main purpose is to increase the performance, flexibility. This paper presents a VHDL based cycle accurate register transfer level model for evaluating the, Power and Area of reconfigurable cross bar switch in BiNoC architectures. We implemented a parameterized register transfer level design of reconfigurable crossbar switch (RCS) architecture. The design is parameterized on (i) size of packets, (ii) length and width of physical links, (iii) number, and depth of arbiters, and (iv) switching technique. The paper discusses in detail the architecture and characterization of the various reconfigurable crossbar switch (RCS) architecture components. The characterized values were integrated into the VHDL based RTL design to build the cycle accurate performance model. In this paper we show the result of simple 10x10 crossbar switch .The results include VHDL simulation of RCS on Xilinx ISE 13.1 software tool

    Techniques for low-overhead dynamic partial reconfiguration of FPGAs

    Get PDF

    FPGA structures for high speed and low overhead dynamic circuit specialization

    Get PDF
    A Field Programmable Gate Array (FPGA) is a programmable digital electronic chip. The FPGA does not come with a predefined function from the manufacturer; instead, the developer has to define its function through implementing a digital circuit on the FPGA resources. The functionality of the FPGA can be reprogrammed as desired and hence the name “field programmable”. FPGAs are useful in small volume digital electronic products as the design of a digital custom chip is expensive. Changing the FPGA (also called configuring it) is done by changing the configuration data (in the form of bitstreams) that defines the FPGA functionality. These bitstreams are stored in a memory of the FPGA called configuration memory. The SRAM cells of LookUp Tables (LUTs), Block Random Access Memories (BRAMs) and DSP blocks together form the configuration memory of an FPGA. The configuration data can be modified according to the user’s needs to implement the user-defined hardware. The simplest way to program the configuration memory is to download the bitstreams using a JTAG interface. However, modern techniques such as Partial Reconfiguration (PR) enable us to configure a part in the configuration memory with partial bitstreams during run-time. The reconfiguration is achieved by swapping in partial bitstreams into the configuration memory via a configuration interface called Internal Configuration Access Port (ICAP). The ICAP is a hardware primitive (macro) present in the FPGA used to access the configuration memory internally by an embedded processor. The reconfiguration technique adds flexibility to use specialized ci rcuits that are more compact and more efficient t han t heir b ulky c ounterparts. An example of such an implementation is the use of specialized multipliers instead of big generic multipliers in an FIR implementation with constant coefficients. To specialize these circuits and reconfigure during the run-time, researchers at the HES group proposed the novel technique called parameterized reconfiguration that can be used to efficiently and automatically implement Dynamic Circuit Specialization (DCS) that is built on top of the Partial Reconfiguration method. It uses the run-time reconfiguration technique that is tailored to implement a parameterized design. An application is said to be parameterized if some of its input values change much less frequently than the rest. These inputs are called parameters. Instead of implementing these parameters as regular inputs, in DCS these inputs are implemented as constants, and the application is optimized for the constants. For every change in parameter values, the design is re-optimized (specialized) during run-time and implemented by reconfiguring the optimized design for a new set of parameters. In DCS, the bitstreams of the parameterized design are expressed as Boolean functions of the parameters. For every infrequent change in parameters, a specialized FPGA configuration is generated by evaluating the corresponding Boolean functions, and the FPGA is reconfigured with the specialized configuration. A detailed study of overheads of DCS and providing suitable solutions with appropriate custom FPGA structures is the primary goal of the dissertation. I also suggest different improvements to the FPGA configuration memory architecture. After offering the custom FPGA structures, I investigated the role of DCS on FPGA overlays and the use of custom FPGA structures that help to reduce the overheads of DCS on FPGA overlays. By doing so, I hope I can convince the developer to use DCS (which now comes with minimal costs) in real-world applications. I start the investigations of overheads of DCS by implementing an adaptive FIR filter (using the DCS technique) on three different Xilinx FPGA platforms: Virtex-II Pro, Virtex-5, and Zynq-SoC. The study of how DCS behaves and what is its overhead in the evolution of the three FPGA platforms is the non-trivial basis to discover the costs of DCS. After that, I propose custom FPGA structures (reconfiguration controllers and reconfiguration drivers) to reduce the main overhead (reconfiguration time) of DCS. These structures not only reduce the reconfiguration time but also help curbing the power hungry part of the DCS system. After these chapters, I study the role of DCS on FPGA overlays. I investigate the effect of the proposed FPGA structures on Virtual-Coarse-Grained Reconfigurable Arrays (VCGRAs). I classify the VCGRA implementations into three types: the conventional VCGRA, partially parameterized VCGRA and fully parameterized VCGRA depending upon the level of parameterization. I have designed two variants of VCGRA grids for HPC image processing applications, namely, the MAC grid and Pixie. Finally, I try to tackle the reconfiguration time overhead at the hardware level of the FPGA by customizing the FPGA configuration memory architecture. In this part of my research, I propose to use a parallel memory structure to improve the reconfiguration time of DCS drastically. However, this improvement comes with a significant overhead of hardware resources which will need to be solved in future research on commercial FPGA configuration memory architectures

    A Modular Approach to Adaptive Reactive Streaming Systems

    Get PDF
    The latest generations of FPGA devices offer large resource counts that provide the headroom to implement large-scale and complex systems. However, there are increasing challenges for the designer, not just because of pure size and complexity, but also in harnessing effectively the flexibility and programmability of the FPGA. A central issue is the need to integrate modules from diverse sources to promote modular design and reuse. Further, the capability to perform dynamic partial reconfiguration (DPR) of FPGA devices means that implemented systems can be made reconfigurable, allowing components to be changed during operation. However, use of DPR typically requires low-level planning of the system implementation, adding to the design challenge. This dissertation presents ReShape: a high-level approach for designing systems by interconnecting modules, which gives a ‘plug and play’ look and feel to the designer, is supported by tools that carry out implementation and verification functions, and is carried through to support system reconfiguration during operation. The emphasis is on the inter-module connections and abstracting the communication patterns that are typical between modules – for example, the streaming of data that is common in many FPGA-based systems, or the reading and writing of data to and from memory modules. ShapeUp is also presented as the static precursor to ReShape. In both, the details of wiring and signaling are hidden from view, via metadata associated with individual modules. ReShape allows system reconfiguration at the module level, by supporting type checking of replacement modules and by managing the overall system implementation, via metadata associated with its FPGA floorplan. The methodology and tools have been implemented in a prototype for a broad domain-specific setting – networking systems – and have been validated on real telecommunications design projects

    Pixie: A heterogeneous Virtual Coarse-Grained Reconfigurable Array for high performance image processing applications

    Full text link
    Coarse-Grained Reconfigurable Arrays (CGRAs) enable ease of programmability and result in low development costs. They enable the ease of use specifically in reconfigurable computing applications. The smaller cost of compilation and reduced reconfiguration overhead enables them to become attractive platforms for accelerating high-performance computing applications such as image processing. The CGRAs are ASICs and therefore, expensive to produce. However, Field Programmable Gate Arrays (FPGAs) are relatively cheaper for low volume products but they are not so easily programmable. We combine best of both worlds by implementing a Virtual Coarse-Grained Reconfigurable Array (VCGRA) on FPGA. VCGRAs are a trade off between FPGA with large routing overheads and ASICs. In this perspective we present a novel heterogeneous Virtual Coarse-Grained Reconfigurable Array (VCGRA) called "Pixie" which is suitable for implementing high performance image processing applications. The proposed VCGRA contains generic processing elements and virtual channels that are described using the Hardware Description Language VHDL. Both elements have been optimized by using the parameterized configuration tool flow and result in a resource reduction of 24% for each processing elements and 82% for each virtual channels respectively.Comment: Presented at 3rd International Workshop on Overlay Architectures for FPGAs (OLAF 2017) arXiv:1704.0880

    Towards Dilated Placement of Dynamic NoC Cores

    Get PDF
    Instead of mapping application task graphs in a compact manner onto reconfigurable devices using a network-on-chip for interconnecting application cores, we propose dilating the mappings as much as the available latencies on critical connections allow. In a dilated mapping, the unused resources between an application\u27s configured components can be used to provide additional flexibility when the configuration needs to change. We motivate the reasons for dilating application task graphs targeted at reconfigurable devices; derive a simulated annealing approach to dilating the placement of such graphs; and present preliminary results of applying the algorithm to synthetic test cases. The method appears to result in successful and meaningful graph dilation and could be further tuned to satisfy desired power constraints

    Interconnect architectures for dynamically partially reconfigurable systems

    Get PDF
    Dynamically partially reconfigurable FPGAs (Field-Programmable Gate Arrays) allow hardware modules to be placed and removed at runtime while other parts of the system keep working. With their potential benefits, they have been the topic of a great deal of research over the last decade. To exploit the partial reconfiguration capability of FPGAs, there is a need for efficient, dynamically adaptive communication infrastructure that automatically adapts as modules are added to and removed from the system. Many bus and network-on-chip (NoC) architectures have been proposed to exploit this capability on FPGA technology. However, few realizations have been reported in the public literature to demonstrate or compare their performance in real world applications. While partial reconfiguration can offer many benefits, it is still rarely exploited in practical applications. Few full realizations of partially reconfigurable systems in current FPGA technologies have been published. More application experiments are required to understand the benefits and limitations of implementing partially reconfigurable systems and to guide their further development. The motivation of this thesis is to fill this research gap by providing empirical evidence of the cost and benefits of different interconnect architectures. The results will provide a baseline for future research and will be directly useful for circuit designers who must make a well-reasoned choice between the alternatives. This thesis contains the results of experiments to compare different NoC and bus interconnect architectures for FPGA-based designs in general and dynamically partially reconfigurable systems. These two interconnect schemes are implemented and evaluated in terms of performance, area and power consumption using FFT (Fast Fourier Transform) andANN(Artificial Neural Network) systems as benchmarks. Conclusions drawn from these results include recommendations concerning the interconnect approach for different kinds of applications. It is found that a NoC provides much better performance than a single channel bus and similar performance to a multi-channel bus in both parallel and parallel-pipelined FFT systems. This suggests that a NoC is a better choice for systems with multiple simultaneous communications like the FFT. Bus-based interconnect achieves better performance and consume less area and power than NoCbased scheme for the fully-connected feed-forward NN system. This suggests buses are a better choice for systems that do not require many simultaneous communications or systems with broadcast communications like a fully-connected feed-forward NN. Results from the experiments with dynamic partial reconfiguration demonstrate that buses have the advantages of better resource utilization and smaller reconfiguration time and memory than NoCs. However, NoCs are more flexible and expansible. They have the advantage of placing almost all of the communication infrastructure in the dynamic reconfiguration region. This means that different applications running on the FPGA can use different interconnection strategies without the overhead of fixed bus resources in the static region. Another objective of the research is to examine the partial reconfiguration process and reconfiguration overhead with current FPGA technologies. Partial reconfiguration allows users to efficiently change the number of running PEs to choose an optimal powerperformance operating point at the minimum cost of reconfiguration. However, this brings drawbacks including resource utilization inefficiency, power consumption overhead and decrease in system operating frequency. The experimental results report a 50% of resource utilization inefficiency with a power consumption overhead of less than 5% and a decrease in frequency of up to 32% compared to a static implementation. The results also show that most of the drawbacks of partial reconfiguration implementation come from the restrictions and limitations of partial reconfiguration design flow. If these limitations can be addressed, partial reconfiguration should still be considered with its potential benefits.Thesis (Ph.D.) -- University of Adelaide, School of Electrical and Electronic Engineering, 201
    corecore