233 research outputs found

    Flexible Baseband Modulator Architecture for Multi-Waveform 5G Communications

    Get PDF
    The fifth-generation (5G) revolution represents more than a mere performance enhancement of previous generations: it will deeply transform the way humans and/or machines interact, enabling a heterogeneous expansion in the number of use cases and services. Crucial to the realization of this revolution is the design of hardware components characterized by high degrees of flexibility, versatility and resource/power efficiency. This chapter proposes a field-programmable gate array (FPGA)-oriented baseband processing architecture suitable for fast-changing communication environments such as 4G/5G waveform coexistence, noncontiguous carrier aggregation (CA) or centralized cloud radio access network (C-RAN) processing. The proposed architecture supports three 5G waveform candidates and is shown to be upgradable, resource-efficient and cost-effective. Through hardware virtualization, enabled by dynamic partial reconfiguration (DPR), the design space exploration of our architecture exceeds the hardware resources available on the Zynq xc7z020 device. Moreover, dynamic frequency scaling (DFS) enables the runtime adjustment of processing throughput and power reductions by up to 88%. The combined resource overhead for DPR and DFS is very low, and the reconfiguration latency stays two orders of magnitude below the control plane latency requirements proposed for 5G communications

    ARITHMETIC LOGIC UNIT ARCHITECTURES WITH DYNAMICALLY DEFINED PRECISION

    Get PDF
    Modern central processing units (CPUs) employ arithmetic logic units (ALUs) that support statically defined precisions, often adhering to industry standards. Although CPU manufacturers highly optimize their ALUs, industry standard precisions embody accuracy and performance compromises for general purpose deployment. Hence, optimizing ALU precision holds great potential for improving speed and energy efficiency. Previous research on multiple precision ALUs focused on predefined, static precisions. Little previous work addressed ALU architectures with customized, dynamically defined precision. This dissertation presents approaches for developing dynamic precision ALU architectures for both fixed-point and floating-point to enable better performance, energy efficiency, and numeric accuracy. These new architectures enable dynamically defined precision, including support for vectorization. The new architectures also prevent performance and energy loss due to applying unnecessarily high precision on computations, which often happens with statically defined standard precisions. The new ALU architectures support different precisions through the use of configurable sub-blocks, with this dissertation including demonstration implementations for floating point adder, multiply, and fused multiply-add (FMA) circuits with 4-bit sub-blocks. For these circuits, the dynamic precision ALU speed is nearly the same as traditional ALU approaches, although the dynamic precision ALU is nearly twice as large

    Regular Datapaths on Field-Programmable Gate Arrays

    Get PDF
    Field-Programmable Gate Arrays (FPGAs) are a recent kind of programmable logic device. They allow the implementation of integrated digital electronic circuits without requiring the complex optical, chemical and mechanical processes used in a conventional chip fabrication. FPGAs can be embedded in traditional system designflows to perform prototyping and emulation tasks. In addition, they also enable novel applications such as configurable computers with hardware dynamically adaptable to a specific problem. The growing chip capacity now allows even the implementation of CPUs and DSPs on single FPGAs. However, current design automation tools trace their roots to times of very limited FPGA sizes, and are primarily optimized for the implementation of random glue logic. The wide datapaths common to CPUs and DSPs are only processed with reduced performance. This thesis presents Structured Design Implementation (SDI), a suite of specialized tools coordinated by a common strategy, which aims to efficiently map even larger regular datapaths to FPGAs. In all steps, regularity is preserved whenever possible, or restored after disruptive operations were required. The circuits are composed from parametrizable modules providing a variety of logical, arithmetical and storage functions. For each module, multiple target FPGA-specific implementation alternatives may be generated in both gatelevel netlist and layout views. A floorplanner based on a genetic algorithm is then used to simultaneously choose an actual implementation from the set of alternatives for each module, and to arrange the selected module implementations in a linear placement. The floorplanning operation optimizes for short routing delays, high routability, and fit into the target FPGA.Field-Programmable Gate-Arrays (FPGAs) sind eine noch junge Art von programmierbaren Logikbausteinen. Sie erlauben die Implementierung von integrierten Digitalschaltungen ohne die komplizierten optischen, chemischen und mechanischen Prozesse, die normalerweise für die Chipfertigung erforderlich sind. FPGAs können im Rahmen konventioneller Entwurfsmethoden zu Emulationszwecken und Prototyp-Aufbauten herangezogen werden. Sie erlauben aber auch völlig neue Anwendungen wie rekonfigurierbare Computer, deren Hardware dynamisch an ein spezielles Problem angepaßt werden kann. Die gewachsene Chip-Kapazität erlaubt nun sogar die Implementierung von CPUs und digitalen Signalprozessoren (DSPs) auf einem einzelnen FPGA. Die Leistungsfähigkeit der entstandenen Schaltungen wird jedoch durch die zur Zeit erhältlichen CAD-Werkzeuge limitiert, da diese noch auf stark beschränkte FPGA-Größen ausgerichtet sind und primär der platzsparenden Verarbeitung unregelmäßiger Logik dienen. Die breiten Datenpfade in Bit-Slice-Struktur, die den Kern vieler CPUs und DSPs darstellen, werden nur suboptimal behandelt. Diese Arbeit stellt Structured Design Implementation (SDI) vor, ein System von spezialisierten CAD-Werkzeugen, die auch größere reguläre Datenpfade effizient auf FPGAs abbilden. In allen Verarbeitungsschritten wird dabei die bestehende Regularität soweit wie möglich erhalten oder nach regularitätsvernichtenden Operationen wiederhergestellt. Zur Schaltungseingabe steht eine Bibliothek von allgemeinen Modulen aus den Bereichen Logik, Arithmetik und Speicherung bereit. Diese können durch Belegung verschiedener Parameter wie Bit-Breiten und Datentypen an aktuelle Anforderungen angepaßt werden

    Mapping Framework for Heterogeneous Reconfigurable Architectures:Combining Temporal Partitioning and Multiprocessor Scheduling

    Get PDF

    Towards a Holistic CAD Platform for Nanotechnologies

    Get PDF
    Silicon-based CMOS technologies are predicted to reach their ultimate limits by the middle of the next decade. Research on nanotechnologies is actively conducted, in a world-wide effort to develop new technologies able to maintain the Moore's law. They promise revolutionizing the computing systems by integrating tremendous numbers of devices at low cost. These trends will have a profound impact on the architectures of computing systems and will require a new paradigm of CAD. The paper presents a work in progress on this direction. It is aimed at fitting requirements and constraints of nanotechnologies, in an effort to achieve efficient use of the huge computing power promised by them. To achieve this goal we are developing CAD tools able to exploit efficiently these huge computing capabilities promised by nanotechnologies in the domain of simulation of complex systems composed by huge numbers of relatively simple elements.Comment: Submitted on behalf of TIMA Editions (http://irevues.inist.fr/tima-editions

    A scalable, portable, FPGA-based implementation of the Unscented Kalman Filter

    Get PDF
    Sustained technological progress has come to a point where robotic/autonomous systems may well soon become ubiquitous. In order for these systems to actually be useful, an increase in autonomous capability is necessary for aerospace, as well as other, applications. Greater aerospace autonomous capability means there is a need for high performance state estimation. However, the desire to reduce costs through simplified development processes and compact form factors can limit performance. A hardware-based approach, such as using a Field Programmable Gate Array (FPGA), is common when high performance is required, but hardware approaches tend to have a more complicated development process when compared to traditional software approaches; greater development complexity, in turn, results in higher costs. Leveraging the advantages of both hardware-based and software-based approaches, a hardware/software (HW/SW) codesign of the Unscented Kalman Filter (UKF), based on an FPGA, is presented. The UKF is split into an application-specific part, implemented in software to retain portability, and a non-application-specific part, implemented in hardware as a parameterisable IP core to increase performance. The codesign is split into three versions (Serial, Parallel and Pipeline) to provide flexibility when choosing the balance between resources and performance, allowing system designers to simplify the development process. Simulation results demonstrating two possible implementations of the design, a nanosatellite application and a Simultaneous Localisation and Mapping (SLAM) application, are presented. These results validate the performance of the HW/SW UKF and demonstrate its portability, particularly in small aerospace systems. Implementation (synthesis, timing, power) details for a variety of situations are presented and analysed to demonstrate how the HW/SW codesign can be scaled for any application

    Models, Design Methods and Tools for Improved Partial Dynamic Reconfiguration

    Get PDF
    Partial dynamic reconfiguration of FPGAs has attracted high attention from both academia and industry in recent years. With this technique, the functionality of the programmable devices can be adapted at runtime to changing requirements. The approach allows designers to use FPGAs more efficiently: E. g. FPGA resources can be time-shared between different functions and the functions itself can be adapted to changing workloads at runtime. Thus partial dynamic reconfiguration enables a unique combination of software-like flexibility and hardware-like performance. Still there exists no common understanding on how to assess the overhead introduced by partial dynamic reconfiguration. This dissertation presents a new cost model for both the runtime and the memory overhead that results from partial dynamic reconfiguration. It is shown how the model can be incorporated into all stages of the design optimization for reconfigurable hardware. In particular digital circuits can be mapped onto FPGAs such that only small fractions of the hardware must be reconfigured at runtime, which saves time, memory, and energy. The design optimization is most efficient if it is applied during high level synthesis. This book describes how the cost model has been integrated into a new high level synthesis tool. The tool allows the designer to trade-off FPGA resource use versus reconfiguration overhead. It is shown that partial reconfiguration causes only small overhead if the design is optimized with regard to reconfiguration cost. A wide range of experimental results is provided that demonstrates the benefits of the applied method.:1 Introduction 1 1.1 Reconfigurable Computing . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.1.1 Reconfigurable System on a Chip (RSOC) . . . . . . . . . . . . 4 1.1.2 Anatomy of an Application . . . . . . . . . . . . . . . . . . . . . . 6 1.1.3 RSOC Design Characteristics and Trade-offs . . . . . . . . . . . 7 1.2 Classification of Reconfigurable Architectures . . . . . . . . . . . . . . . 10 1.2.1 Partial Reconfiguration . . . . . . . . . . . . . . . . . . . . . . . . 10 1.2.2 Runtime Reconfiguration (RTR) . . . . . . . . . . . . . . . . . . . 10 1.2.3 Multi-Context Configuration . . . . . . . . . . . . . . . . . . . . . 11 1.2.4 Fine-Grain Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.2.5 Coarse-Grain Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.3 Reconfigurable Computing Specific Design Issues . . . . . . . . . . . . 12 1.4 Overview of this Dissertation . . . . . . . . . . . . . . . . . . . . . . . . . 14 2 Reconfigurable Computing Systems – Background 17 2.1 Examples for RSOCs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.2 Partially Reconfigurable FPGAs: Xilinx Virtex Device Family . . . . . . 20 2.2.1 Virtex-II/Virtex-II Pro Logic Architecture . . . . . . . . . . . . . 20 2.2.2 Reconfiguration Architecture and Reconfiguration Control . . 21 2.3 Methods for Design Entry . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.3.1 Behavioural Design Entry . . . . . . . . . . . . . . . . . . . . . . . 25 2.3.2 Design Entry at Register-Transfer Level (RTL) . . . . . . . . . . 25 2.3.3 Xilinx Early Access Partial Reconfiguration Design Flow . . . . 26 2.4 Task Management in Reconfigurable Computing . . . . . . . . . . . . . 27 2.4.1 Online and Offline Task Management . . . . . . . . . . . . . . . 28 2.4.2 Task Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.4.3 Task Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 2.4.4 Reconfiguration Runtime Overhead . . . . . . . . . . . . . . . . 31 2.5 Configuration Data Compression . . . . . . . . . . . . . . . . . . . . . . . 32 2.6 Evaluation of Reconfigurable Systems . . . . . . . . . . . . . . . . . . . . 35 2.6.1 Energy Efficiency Models . . . . . . . . . . . . . . . . . . . . . . . 35 2.6.2 Area Efficiency Models . . . . . . . . . . . . . . . . . . . . . . . . 37 2.6.3 Runtime Efficiency Models . . . . . . . . . . . . . . . . . . . . . . 37 2.7 Similarity Based Reduction of Reconfiguration Overhead . . . . . . . . 38 2.7.1 Configuration Data Generation Methods . . . . . . . . . . . . . 39 2.7.2 Device Mapping Methods . . . . . . . . . . . . . . . . . . . . . . . 40 2.7.3 Circuit Design Methods . . . . . . . . . . . . . . . . . . . . . . . . 41 2.7.4 Model for Partial Configuration . . . . . . . . . . . . . . . . . . . 44 2.8 Contributions of this Work . . . . . . . . . . . . . . . . . . . . . . . . . . 44 3 Runtime Reconfiguration Cost and Optimization Methods 47 3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 3.2 Reconfiguration State Graph . . . . . . . . . . . . . . . . . . . . . . . . . 50 3.2.1 Reconfiguration Time Overhead . . . . . . . . . . . . . . . . . . 52 3.2.2 Dynamic Configuration Data Overhead . . . . . . . . . . . . . . 52 3.3 Configuration Cost at Bitstream Level . . . . . . . . . . . . . . . . . . . . 54 3.4 Configuration Cost at Structural Level . . . . . . . . . . . . . . . . . . . 56 3.4.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 3.4.2 Virtual Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 62 3.4.3 Reconfiguration Costs in the VA Context . . . . . . . . . . . . . 65 3.5 Allocation Functions with Minimal Reconfiguration Costs . . . . . . . 67 3.5.1 Allocation of Node Pairs . . . . . . . . . . . . . . . . . . . . . . . 68 3.5.2 Direct Allocation of Nodes . . . . . . . . . . . . . . . . . . . . . . 76 3.5.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 4 Implementation Tools for Reconfigurable Computing 95 4.1 Mapping of Netlists to FPGA Resources . . . . . . . . . . . . . . . . . . . 96 4.1.1 Mapping to Device Resources . . . . . . . . . . . . . . . . . . . . 96 4.1.2 Connectivity Transformations . . . . . . . . . . . . . . . . . . . . 99 4.1.3 Mapping Variants and Reconfiguration Costs . . . . . . . . . . . 100 4.1.4 Mapping of Circuit Macros . . . . . . . . . . . . . . . . . . . . . . 101 4.1.5 Global Interconnect . . . . . . . . . . . . . . . . . . . . . . . . . . 102 4.1.6 Netlist Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 4.2 Mapping Aware Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . 103 4.2.1 Generalized Node Mapping . . . . . . . . . . . . . . . . . . . . . 104 4.2.2 Successive Node Allocation . . . . . . . . . . . . . . . . . . . . . 105 4.2.3 Node Allocation with Ant Colony Optimization . . . . . . . . . 107 4.2.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 4.3 Netlist Mapping with Minimized Reconfiguration Cost . . . . . . . . . 110 4.3.1 Mapping Database . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 4.3.2 Mapping and Packing of Elements into Logic Blocks . . . . . . 112 4.3.3 Logic Element Selection . . . . . . . . . . . . . . . . . . . . . . . 114 4.3.4 Logic Element Selection for Min. Routing Reconfiguration . . 115 4.3.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 5 High-Level Synthesis for Reconfigurable Computing 125 5.1 Introduction to HLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 5.1.1 HLS Tool Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 5.1.2 Realization of the Hardware Tasks . . . . . . . . . . . . . . . . . 128 5.2 New Concepts for Task-based Reconfiguration . . . . . . . . . . . . . . 131 5.2.1 Multiple Hardware Tasks in one Reconfigurable Module . . . . 132 5.2.2 Multi-Level Reconfiguration . . . . . . . . . . . . . . . . . . . . . 133 5.2.3 Resource Sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138 5.3 Datapath Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 5.3.1 Task Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 5.3.2 Resource Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142 5.3.3 Resource Binding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142 5.3.4 Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 5.3.5 Constraints for Scheduling and Resource Binding . . . . . . . . 151 5.4 Reconfiguration Optimized Datapath Implementation . . . . . . . . . . 153 5.4.1 Effects of Scheduling and Binding on Reconfiguration Costs . 153 5.4.2 Strategies for Resource Type Binding . . . . . . . . . . . . . . . 154 5.4.3 Strategies for Resource Instance Binding . . . . . . . . . . . . . 157 5.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 5.5.1 Summary of Binding Methods and Tool Setup . . . . . . . . . . 163 5.5.2 Cost Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 5.5.3 Implementation Scenarios . . . . . . . . . . . . . . . . . . . . . . 166 5.5.4 Benchmark Characteristics . . . . . . . . . . . . . . . . . . . . . . 168 5.5.5 Benchmark Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 170 5.5.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174 5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 6 Summary and Outlook 185 Bibliography 189 A Simulated Annealing 201Partielle dynamische Rekonfiguration von FPGAs hat in den letzten Jahren große Aufmerksamkeit von Wissenschaft und Industrie auf sich gezogen. Die Technik erlaubt es, die Funktionalität von progammierbaren Bausteinen zur Laufzeit an veränderte Anforderungen anzupassen. Dynamische Rekonfiguration erlaubt es Entwicklern, FPGAs effizienter einzusetzen: z.B. können Ressourcen für verschiedene Funktionen wiederverwendet werden und die Funktionen selbst können zur Laufzeit an veränderte Verarbeitungsschritte angepasst werden. Insgesamt erlaubt partielle dynamische Rekonfiguration eine einzigartige Kombination von software-artiger Flexibilität und hardware-artiger Leistungsfähigkeit. Bis heute gibt es keine Übereinkunft darüber, wie der zusätzliche Aufwand, der durch partielle dynamische Rekonfiguration verursacht wird, zu bewerten ist. Diese Dissertation führt ein neues Kostenmodell für Laufzeit und Speicherbedarf ein, welche durch partielle dynamische Rekonfiguration verursacht wird. Es wird aufgezeigt, wie das Modell in alle Ebenen der Entwurfsoptimierung für rekonfigurierbare Hardware einbezogen werden kann. Insbesondere wird gezeigt, wie digitale Schaltungen derart auf FPGAs abgebildet werden können, sodass nur wenig Ressourcen der Hardware zur Laufzeit rekonfiguriert werden müssen. Dadurch kann Zeit, Speicher und Energie eingespart werden. Die Entwurfsoptimierung ist am effektivsten, wenn sie auf der Ebene der High-Level-Synthese angewendet wird. Diese Arbeit beschreibt, wie das Kostenmodell in ein neuartiges Werkzeug für die High-Level-Synthese integriert wurde. Das Werkzeug erlaubt es, beim Entwurf die Nutzung von FPGA-Ressourcen gegen den Rekonfigurationsaufwand abzuwägen. Es wird gezeigt, dass partielle Rekonfiguration nur wenig Kosten verursacht, wenn der Entwurf bezüglich Rekonfigurationskosten optimiert wird. Eine Anzahl von Beispielen und experimentellen Ergebnissen belegt die Vorteile der angewendeten Methodik.:1 Introduction 1 1.1 Reconfigurable Computing . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.1.1 Reconfigurable System on a Chip (RSOC) . . . . . . . . . . . . 4 1.1.2 Anatomy of an Application . . . . . . . . . . . . . . . . . . . . . . 6 1.1.3 RSOC Design Characteristics and Trade-offs . . . . . . . . . . . 7 1.2 Classification of Reconfigurable Architectures . . . . . . . . . . . . . . . 10 1.2.1 Partial Reconfiguration . . . . . . . . . . . . . . . . . . . . . . . . 10 1.2.2 Runtime Reconfiguration (RTR) . . . . . . . . . . . . . . . . . . . 10 1.2.3 Multi-Context Configuration . . . . . . . . . . . . . . . . . . . . . 11 1.2.4 Fine-Grain Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.2.5 Coarse-Grain Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.3 Reconfigurable Computing Specific Design Issues . . . . . . . . . . . . 12 1.4 Overview of this Dissertation . . . . . . . . . . . . . . . . . . . . . . . . . 14 2 Reconfigurable Computing Systems – Background 17 2.1 Examples for RSOCs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.2 Partially Reconfigurable FPGAs: Xilinx Virtex Device Family . . . . . . 20 2.2.1 Virtex-II/Virtex-II Pro Logic Architecture . . . . . . . . . . . . . 20 2.2.2 Reconfiguration Architecture and Reconfiguration Control . . 21 2.3 Methods for Design Entry . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.3.1 Behavioural Design Entry . . . . . . . . . . . . . . . . . . . . . . . 25 2.3.2 Design Entry at Register-Transfer Level (RTL) . . . . . . . . . . 25 2.3.3 Xilinx Early Access Partial Reconfiguration Design Flow . . . . 26 2.4 Task Management in Reconfigurable Computing . . . . . . . . . . . . . 27 2.4.1 Online and Offline Task Management . . . . . . . . . . . . . . . 28 2.4.2 Task Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.4.3 Task Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 2.4.4 Reconfiguration Runtime Overhead . . . . . . . . . . . . . . . . 31 2.5 Configuration Data Compression . . . . . . . . . . . . . . . . . . . . . . . 32 2.6 Evaluation of Reconfigurable Systems . . . . . . . . . . . . . . . . . . . . 35 2.6.1 Energy Efficiency Models . . . . . . . . . . . . . . . . . . . . . . . 35 2.6.2 Area Efficiency Models . . . . . . . . . . . . . . . . . . . . . . . . 37 2.6.3 Runtime Efficiency Models . . . . . . . . . . . . . . . . . . . . . . 37 2.7 Similarity Based Reduction of Reconfiguration Overhead . . . . . . . . 38 2.7.1 Configuration Data Generation Methods . . . . . . . . . . . . . 39 2.7.2 Device Mapping Methods . . . . . . . . . . . . . . . . . . . . . . . 40 2.7.3 Circuit Design Methods . . . . . . . . . . . . . . . . . . . . . . . . 41 2.7.4 Model for Partial Configuration . . . . . . . . . . . . . . . . . . . 44 2.8 Contributions of this Work . . . . . . . . . . . . . . . . . . . . . . . . . . 44 3 Runtime Reconfiguration Cost and Optimization Methods 47 3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 3.2 Reconfiguration State Graph . . . . . . . . . . . . . . . . . . . . . . . . . 50 3.2.1 Reconfiguration Time Overhead . . . . . . . . . . . . . . . . . . 52 3.2.2 Dynamic Configuration Data Overhead . . . . . . . . . . . . . . 52 3.3 Configuration Cost at Bitstream Level . . . . . . . . . . . . . . . . . . . . 54 3.4 Configuration Cost at Structural Level . . . . . . . . . . . . . . . . . . . 56 3.4.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 3.4.2 Virtual Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 62 3.4.3 Reconfiguration Costs in the VA Context . . . . . . . . . . . . . 65 3.5 Allocation Functions with Minimal Reconfiguration Costs . . . . . . . 67 3.5.1 Allocation of Node Pairs . . . . . . . . . . . . . . . . . . . . . . . 68 3.5.2 Direct Allocation of Nodes . . . . . . . . . . . . . . . . . . . . . . 76 3.5.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 4 Implementation Tools for Reconfigurable Computing 95 4.1 Mapping of Netlists to FPGA Resources . . . . . . . . . . . . . . . . . . . 96 4.1.1 Mapping to Device Resources . . . . . . . . . . . . . . . . . . . . 96 4.1.2 Connectivity Transformations . . . . . . . . . . . . . . . . . . . . 99 4.1.3 Mapping Variants and Reconfiguration Costs . . . . . . . . . . . 100 4.1.4 Mapping of Circuit Macros . . . . . . . . . . . . . . . . . . . . . . 101 4.1.5 Global Interconnect . . . . . . . . . . . . . . . . . . . . . . . . . . 102 4.1.6 Netlist Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 4.2 Mapping Aware Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . 103 4.2.1 Generalized Node Mapping . . . . . . . . . . . . . . . . . . . . . 104 4.2.2 Successive Node Allocation . . . . . . . . . . . . . . . . . . . . . 105 4.2.3 Node Allocation with Ant Colony Optimization . . . . . . . . . 107 4.2.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 4.3 Netlist Mapping with Minimized Reconfiguration Cost . . . . . . . . . 110 4.3.1 Mapping Database . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 4.3.2 Mapping and Packing of Elements into Logic Blocks . . . . . . 112 4.3.3 Logic Element Selection . . . . . . . . . . . . . . . . . . . . . . . 114 4.3.4 Logic Element Selection for Min. Routing Reconfiguration . . 115 4.3.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 5 High-Level Synthesis for Reconfigurable Computing 125 5.1 Introduction to HLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 5.1.1 HLS Tool Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 5.1.2 Realization of the Hardware Tasks . . . . . . . . . . . . . . . . . 128 5.2 New Concepts for Task-based Reconfiguration . . . . . . . . . . . . . . 131 5.2.1 Multiple Hardware Tasks in one Reconfigurable Module . . . . 132 5.2.2 Multi-Level Reconfiguration . . . . . . . . . . . . . . . . . . . . . 133 5.2.3 Resource Sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138 5.3 Datapath Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 5.3.1 Task Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 5.3.2 Resource Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142 5.3.3 Resource Binding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142 5.3.4 Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 5.3.5 Constraints for Scheduling and Resource Binding . . . . . . . . 151 5.4 Reconfiguration Optimized Datapath Implementation . . . . . . . . . . 153 5.4.1 Effects of Scheduling and Binding on Reconfiguration Costs . 153 5.4.2 Strategies for Resource Type Binding . . . . . . . . . . . . . . . 154 5.4.3 Strategies for Resource Instance Binding . . . . . . . . . . . . . 157 5.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 5.5.1 Summary of Binding Methods and Tool Setup . . . . . . . . . . 163 5.5.2 Cost Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 5.5.3 Implementation Scenarios . . . . . . . . . . . . . . . . . . . . . . 166 5.5.4 Benchmark Characteristics . . . . . . . . . . . . . . . . . . . . . . 168 5.5.5 Benchmark Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 170 5.5.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174 5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 6 Summary and Outlook 185 Bibliography 189 A Simulated Annealing 20

    Domain-specific and reconfigurable instruction cells based architectures for low-power SoC

    Get PDF
    corecore