113 research outputs found

    Generic Connectivity-Based CGRA Mapping via Integer Linear Programming

    Full text link
    Coarse-grained reconfigurable architectures (CGRAs) are programmable logic devices with large coarse-grained ALU-like logic blocks, and multi-bit datapath-style routing. CGRAs often have relatively restricted data routing networks, so they attract CAD mapping tools that use exact methods, such as Integer Linear Programming (ILP). However, tools that target general architectures must use large constraint systems to fully describe an architecture's flexibility, resulting in lengthy run-times. In this paper, we propose to derive connectivity information from an otherwise generic device model, and use this to create simpler ILPs, which we combine in an iterative schedule and retain most of the exactness of a fully-generic ILP approach. This new approach has a speed-up geometric mean of 5.88x when considering benchmarks that do not hit a time-limit of 7.5 hours on the fully-generic ILP, and 37.6x otherwise. This was measured using the set of benchmarks used to originally evaluate the fully-generic approach and several more benchmarks representing computation tasks, over three different CGRA architectures. All run-times of the new approach are less than 20 minutes, with 90th percentile time of 410 seconds. The proposed mapping techniques are integrated into, and evaluated using the open-source CGRA-ME architecture modelling and exploration framework.Comment: 8 pages of content; 8 figures; 3 tables; to appear in FCCM 2019; Uses the CGRA-ME framework at http://cgra-me.ece.utoronto.ca

    FPGA dynamic and partial reconfiguration : a survey of architectures, methods, and applications

    Get PDF
    Dynamic and partial reconfiguration are key differentiating capabilities of field programmable gate arrays (FPGAs). While they have been studied extensively in academic literature, they find limited use in deployed systems. We review FPGA reconfiguration, looking at architectures built for the purpose, and the properties of modern commercial architectures. We then investigate design flows, and identify the key challenges in making reconfigurable FPGA systems easier to design. Finally, we look at applications where reconfiguration has found use, as well as proposing new areas where this capability places FPGAs in a unique position for adoption

    Design synthesis for dynamically reconfigurable logic systems

    Get PDF
    Dynamic reconfiguration of logic circuits has been a research problem for over four decades. While applications using logic reconfiguration in practical scenarios have been demonstrated, the design of these systems has proved to be a difficult process demanding the skills of an experienced reconfigurable logic design expert. This thesis proposes an automatic synthesis method which relieves designers of some of the difficulties associated with designing partially dynamically reconfigurable systems. A new design abstraction model for reconfigurable systems is proposed in order to support design exploration using the presented method. Given an input behavioural model, a technology server and a set of design constraints, the method will generate a reconfigurable design solution in the form of a 3D floorplan and a configuration schedule. The approach makes use of genetic algorithms. It facilitates global optimisation to accommodate multiple design objectives common in reconfigurable system design, while making realistic estimates of configuration overheads and of the potential for resource sharing between configurations. A set of custom evolutionary operators has been developed to cope with a multiple-objective search space. Furthermore, the application of a simulation technique verifying the lll results of such an automatic exploration is outlined in the thesis. The qualities of the proposed method are evaluated using a set of benchmark designs taking data from a real reconfigurable logic technology. Finally, some extensions to the proposed method and possible research directions are discussed

    Mapping Framework for Heterogeneous Reconfigurable Architectures:Combining Temporal Partitioning and Multiprocessor Scheduling

    Get PDF

    Automatic mapping of graphical programming applications to microelectronic technologies

    Get PDF
    Adaptive computing systems (ACSs) and application-specific integrated circuits (ASICs) can serve as flexible hardware accelerators for applications in domains such as image processing and digital signal processing. However, the mapping of applications onto ACSs and ASICs using the traditional methods can take months for a hardware engineer to develop and debug. In this dissertation, a new approach for automatic mapping of software applications onto ACSs and ASICs has been developed, implemented and validated. This dissertation presents the design flow of the software environment called CHAMPION, which is being developed at the University of Tennessee. This environment permits high-level design entry using the Cantata graphical programming software fromKRI. Using Cantata as the design entry, CHAMPION hides from the user the low-level details of the hardware architecture and the finer issues of application mapping onto the hardware. Validation of the CHAMPION environment was performed using multiple applications of moderate complexity. In one case, theapplication mapping time which required six weeks to perform manually took only six minutes for CHAMPION, yet comparable results were produced. Furthermore, the CHAMPION environment was constructed such that retargeting to a new adaptive computing system could be accomplished in just a few hours as opposed to weeks using manual methods. Thus, CHAMPION permits both ACSs and ASICs to be utilized by a wider audience and application development accomplished in less time

    An embedded system supporting dynamic partial reconfiguration of hardware resources for morphological image processing

    Get PDF
    Processors for high-performance computing applications are generally designed with a focus on high clock rates, parallelism of operations and high communication bandwidth, often at the expense of large power consumption. However, the emphasis of many embedded systems and untethered devices is on minimal hardware requirements and reduced power consumption. With the incessant growth of computational needs for embedded applications, which contradict chip power and area needs, the burden is put on the hardware designers to come up with designs that optimize power and area requirements. This thesis investigates the efficient design of an embedded system for morphological image processing applications on Xilinx FPGAs (Field Programmable Gate Array) by optimizing both area and power usage while delivering high performance. The design leverages a unique capability of FPGAs called dynamic partial reconfiguration (DPR) which allows changing the hardware configuration of silicon pieces at runtime. DPR allows regions of the FPGA to be reprogrammed with new functionality while applications are still running in the remainder of the device. The main aim of this thesis is to design an embedded system for morphological image processing by accounting for real time and area constraints as compared to a statically configured FPGA. IP (Intellectual Property) cores are synthesized for both static and dynamic time. DPR enables instantiation of more hardware logic over a period of time on an existing device by time-multiplexing the hardware realization of functions. A comparison of power consumption is presented for the statically and dynamically reconfigured designs. Finally, a performance comparison is included for the implementation of the respective algorithms on a hardwired ARM processor as well as on another general-purpose processor. The results prove the viability of DPR for morphological image processing applications

    Exploiting Local Logic Structures to Optimize Multi-Core SoC Floorplanning

    Get PDF
    We present a throughput-driven partitioning and a throughput-preserving merging algorithm for the high-level physical synthesis of latency-insensitive (LI) systems. These two algorithms are integrated along with a published floorplanner in a new iterative physical synthesis flow to optimize system throughput and reduce area occupation. The synthesis flow iterates a floorplanning-partitioning-floorplanning-merging sequence of operations to improve the system topology and the physical locations of cores. The partitioning algorithm performs bottom-up clustering of the internal logic of a given IP core to divide it into smaller ones, each of which has no combinational path from input to output and thus is legal for LI-interface encapsulation. Applying this algorithm to cores on critical feedback loops optimizes their length and in turn enables throughput optimization via the subsequent floorplanning. The merging algorithm reduces the number of cores on non-critical loops, lowering the overall area taken by LI interfaces without hurting the system throughput. Experimental results on a large system-on-chip design show a 16.7% speedup in system throughput and a 2.1% reduction in area occupation

    Automatic synthesis of reconfigurable instruction set accelerators

    Get PDF
    corecore