    Virtual Runtime Application Partitions for Resource Management in Massively Parallel Architectures

    This thesis presents a novel design paradigm, called Virtual Runtime Application Partitions (VRAP), to judiciously utilize on-chip resources. As the dark silicon era approaches, where power considerations will allow only a fraction of the chip to be powered on, judicious resource management will become a key consideration in future designs. Most work on resource management treats only the physical components (i.e. computation, communication, and memory blocks) as resources and manipulates the component-to-application mapping to optimize various parameters (e.g. energy efficiency). To further enhance the optimization potential, we propose to manipulate abstract resources (i.e. the voltage/frequency operating point, the fault-tolerance strength, the degree of parallelism, and the configuration architecture) in addition to the physical resources. The proposed framework (i.e. VRAP) encapsulates methods, algorithms, and hardware blocks to provide each application with the abstract resources tailored to its needs. To test the efficacy of this concept, we have developed three distinct self-adaptive environments: (i) Private Operating Environment (POE), (ii) Private Reliability Environment (PRE), and (iii) Private Configuration Environment (PCE), which collectively ensure that each application meets its deadlines using minimal platform resources. Several novel architectural enhancements, algorithms, and policies are presented to realize the virtual runtime application partitions efficiently. Considering future design trends, we have chosen Coarse Grained Reconfigurable Architectures (CGRAs) and Networks on Chip (NoCs) to test the feasibility of our approach. Specifically, we have chosen the Dynamically Reconfigurable Resource Array (DRRA) and McNoC as the representative CGRA and NoC platforms. The proposed techniques are compared and evaluated using a variety of quantitative experiments. Synthesis and simulation results demonstrate that VRAP significantly enhances energy and power efficiency compared to the state of the art.
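
The idea of giving each application its own voltage/frequency operating point can be illustrated with a minimal sketch. It is not the VRAP implementation: the `OperatingPoint` type, the function names, and the simple C·V² energy model are assumptions introduced here purely for illustration. The sketch picks, among the points that still meet a deadline, the one with the lowest estimated dynamic energy.

```python
from typing import NamedTuple, Optional, Sequence


class OperatingPoint(NamedTuple):
    """Hypothetical voltage/frequency pair (not taken from the thesis)."""
    voltage_v: float      # supply voltage in volts
    frequency_mhz: float  # clock frequency in MHz


def dynamic_energy(point: OperatingPoint, cycles: int,
                   capacitance: float = 1.0) -> float:
    # Assumed model: dynamic energy per cycle is proportional to C * V^2,
    # so for a fixed cycle count the frequency cancels out.
    return capacitance * point.voltage_v ** 2 * cycles


def pick_operating_point(points: Sequence[OperatingPoint], cycles: int,
                         deadline_us: float) -> Optional[OperatingPoint]:
    # cycles / MHz gives the execution time in microseconds.
    feasible = [p for p in points if cycles / p.frequency_mhz <= deadline_us]
    if not feasible:
        return None  # no operating point meets the deadline
    return min(feasible, key=lambda p: dynamic_energy(p, cycles))


points = [
    OperatingPoint(0.8, 100.0),
    OperatingPoint(1.0, 200.0),
    OperatingPoint(1.2, 400.0),
]
chosen = pick_operating_point(points, cycles=10_000, deadline_us=60.0)
```

With these made-up numbers the slowest point misses the 60 µs deadline, so the middle point is chosen because it is the lowest-voltage point that is still fast enough.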

    Compiler Discovered Dynamic Scheduling of Irregular Code in High-Level Synthesis

    Dynamically scheduled high-level synthesis (HLS) achieves higher throughput than static HLS for codes with unpredictable memory accesses and control flow. However, excessive dataflow scheduling results in circuits that use more resources and have a slower critical path, even when only a part of the circuit exhibits dynamic behavior. Recent work has shown that marking parts of a dataflow circuit for static scheduling can save resources and improve performance (hybrid scheduling), but the dynamic part of the circuit still bottlenecks the critical path. We propose instead to selectively introduce dynamic scheduling into static HLS. This paper presents an algorithm for identifying code regions amenable to dynamic scheduling and shows a methodology for introducing dynamically scheduled basic blocks, loops, and memory operations into static HLS. Our algorithm is informed by modulo scheduling and can be integrated into any modulo-scheduled HLS tool. On a set of ten benchmarks, we show that our approach achieves on average up to a 3.7× and 3× speedup against dynamic and hybrid scheduling, respectively, with an area overhead of 1.3× and a frequency degradation of 0.74× when compared to static HLS. Comment: To appear in the 33rd International Conference on Field-Programmable Logic and Applications (2023).
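
To make the trade-off between cycle count and clock frequency concrete, the following is a small worked sketch. It is not the paper's algorithm: it uses the standard recurrence-constrained minimum initiation interval from modulo scheduling, plus an idealised stall model and made-up loop parameters that are assumptions for illustration only. Only the 0.74× frequency ratio is taken from the abstract.

```python
import math


def rec_mii(latency_cycles: int, distance_iters: int) -> int:
    """Recurrence-constrained minimum II for one loop-carried dependence cycle."""
    return math.ceil(latency_cycles / distance_iters)


def static_cycles(n_iters: int, latency: int, assumed_distance: int = 1) -> int:
    # A static schedule must honour the worst-case dependence distance
    # on every iteration, even if the dependence rarely occurs.
    return n_iters * rec_mii(latency, assumed_distance)


def dynamic_cycles(n_iters: int, latency: int, conflict_iters: int) -> int:
    # Idealised assumption: one iteration per cycle, with a stall of
    # (latency - 1) cycles only on iterations that really conflict.
    return n_iters + conflict_iters * (latency - 1)


def wall_clock_speedup(baseline_cycles: int, new_cycles: int,
                       freq_ratio: float) -> float:
    # Execution time = cycles / frequency, so the speedup is the
    # cycle ratio scaled by the frequency ratio (new / baseline).
    return (baseline_cycles / new_cycles) * freq_ratio


# Hypothetical loop: 1000 iterations, a 4-cycle dependence, 50 real conflicts.
s = static_cycles(1000, latency=4)
d = dynamic_cycles(1000, latency=4, conflict_iters=50)
speedup = wall_clock_speedup(s, d, freq_ratio=0.74)

# Cycle-count reduction needed just to offset a 0.74x clock frequency.
break_even = 1 / 0.74
```

Under these assumptions, a circuit running at 0.74× the clock frequency needs to cut its cycle count by more than roughly 1.35× before it is faster in wall-clock time, which is why introducing dynamic scheduling only where it pays off matters.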

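    Architecture matérielle et flot de programmation associé pour la conception de systèmes numériques tolérants aux fautes (Hardware Architecture and Associated Programming Flow for the Design of Fault-Tolerant Digital Systems)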

    Whether in automotive systems exposed to heat stress, or in aerospace and nuclear systems subjected to cosmic, neutron, and gamma radiation, the environment can cause faults in electronic systems. These faults, which can be transient or permanent, lead to erroneous results that are unacceptable in some application contexts. The use of so-called rad-hard components is sometimes compromised by their high cost and by supply problems associated with export rules. This thesis proposes a joint hardware and software approach, independent of the integration technology, for using digital programmable devices in environments that generate faults. Our approach includes the definition of a Coarse Grained Reconfigurable Architecture (CGRA) able to execute entire application codes, together with all the hardware and software mechanisms needed to make it tolerant to transient and permanent faults. This is achieved by combining redundancy with dynamic reconfiguration of the CGRA, based on a library of configurations generated by a complete design flow. This tool flow maps a code represented as a Control and Data Flow Graph (CDFG) onto the CGRA, directly producing a large number of different configurations, which makes it possible to exploit the full potential of the architecture. This work, which has been validated through experiments with applications in the field of signal and image processing, has been the subject of two publications in international conferences and of two patents.
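
The combination of redundancy and reconfiguration from a configuration library can be sketched as follows. This is an illustrative assumption, not the thesis's mechanism: the majority voter, the tile-set representation of a configuration, and all names (`majority_vote`, `select_configuration`, `cfg_a`, `cfg_b`) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Set, Tuple


def majority_vote(outputs: Sequence[int]) -> Tuple[Optional[int], List[int]]:
    """Return the majority value and the indices of disagreeing replicas.

    If no value has a strict majority, return (None, all indices).
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        return None, list(range(len(outputs)))
    return value, [i for i, out in enumerate(outputs) if out != value]


def select_configuration(library: Dict[str, Set[int]],
                         faulty_tiles: Set[int]) -> Optional[str]:
    """Pick the first pre-generated configuration that avoids all faulty tiles.

    `library` maps a configuration name to the set of CGRA tiles it uses
    (an assumed representation, chosen here for illustration).
    """
    for name, tiles in library.items():
        if not (tiles & faulty_tiles):
            return name
    return None


# Three redundant replicas: the third disagrees (e.g. a transient fault).
value, suspects = majority_vote([7, 7, 9])

# A tile is later diagnosed as permanently faulty: switch configuration.
library = {"cfg_a": {0, 1, 2}, "cfg_b": {3, 4, 5}}
fallback = select_configuration(library, faulty_tiles={1})
```

In this sketch, voting masks a transient fault immediately, while a permanent fault is handled by switching to a configuration that does not use the faulty tile. Having many different configurations available makes it more likely that such a fallback exists.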

    A Dynamically Scheduled HLS Flow in MLIR

    In High-Level Synthesis (HLS), we consider abstractions that span from software to hardware and target heterogeneous architectures. Therefore, managing the complexity introduced by this is key to implementing good, maintainable, and extendible HLS compilers. Traditionally, HLS flows have been built on top of software compilation infrastructure such as LLVM, with hardware aspects of the flow existing peripherally to the core of the compiler. Through this work, we aim to show that MLIR, a compiler infrastructure with a focus on domain-specific intermediate representations (IR), is a better infrastructure for HLS compilers. Using MLIR, we define HLS and hardware abstractions as first-class citizens of the compiler, simplifying analysis, transformations, and optimization. To demonstrate this, we present a C-to-RTL, dynamically scheduled HLS flow. We find that our flow generates circuits comparable to those of an equivalent LLVM-based HLS compiler. Notably, we achieve this while lacking key optimization passes typically found in HLS compilers and through the use of an experimental front-end. To this end, we show that significant improvements in the generated RTL are but low-hanging fruit, requiring only engineering effort to attain. We believe that our flow is more modular and more extendible than comparable open-source HLS compilers and is thus a good candidate as a basis for future research. Apart from the core HLS flow, we provide MLIR-based tooling for C-to-RTL cosimulation and visual debugging, with the ultimate goal of building an MLIR-based HLS infrastructure that will drive innovation in the field.

    A High-Frequency Load-Store Queue with Speculative Allocations for High-Level Synthesis

    Dynamically scheduled high-level synthesis (HLS) enables the use of load-store queues (LSQs), which can disambiguate data hazards at circuit runtime, increasing throughput in codes with unpredictable memory accesses. However, the increased throughput comes at the price of lower clock frequency and higher resource usage compared to statically scheduled circuits without LSQs. The lower frequency often nullifies any throughput improvements over static scheduling, while the resource usage becomes prohibitively expensive with large queue sizes. This paper presents a method for achieving dynamically scheduled memory operations in HLS without a significant increase in clock period or resource usage. We present a novel LSQ based on shift registers, enabled by the opportunity to specialize queue sizes to a target code in HLS. We show a method to speculatively allocate addresses to our LSQ, significantly increasing pipeline parallelism in codes that could not benefit from an LSQ before. In stark contrast to traditional load value speculation, we do not require pipeline replays and have no overhead on misspeculation. On a set of benchmarks with data hazards, our approach achieves an average speedup of 11× against static HLS and 5× against dynamic HLS that uses a state-of-the-art LSQ from previous work. Our LSQ also uses several times fewer resources, scaling to queues with hundreds of entries, and supports both on-chip and off-chip memory. Comment: To appear in the International Conference on Field Programmable Technology (FPT'23), Yokohama, Japan, 11-14 December 2023.
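
What it means for an LSQ to disambiguate data hazards at runtime can be shown with a toy software model. This is only a behavioural sketch under stated assumptions: the `ToyLSQ` class and its methods are hypothetical, and it models neither the shift-register implementation nor the speculative address allocation described in the abstract. A load is served from the youngest pending store to the same address, and only falls through to memory when no pending store matches.

```python
from collections import deque
from typing import Deque, List, Tuple


class ToyLSQ:
    """Behavioural toy model of a store queue with store-to-load forwarding."""

    def __init__(self, memory: List[int], depth: int) -> None:
        self.memory = memory
        self.depth = depth  # fixed queue size, chosen per target code
        # Pending stores as (address, value), oldest on the left.
        self.pending: Deque[Tuple[int, int]] = deque()

    def commit_oldest(self) -> None:
        # Write the oldest pending store back to memory.
        address, value = self.pending.popleft()
        self.memory[address] = value

    def store(self, address: int, value: int) -> None:
        if len(self.pending) == self.depth:
            self.commit_oldest()  # queue full: make room first
        self.pending.append((address, value))

    def load(self, address: int) -> int:
        # Address comparison happens at runtime: search from the youngest
        # pending store to the oldest and forward the first match.
        for pending_address, value in reversed(self.pending):
            if pending_address == address:
                return value
        return self.memory[address]  # no hazard: read memory directly

    def drain(self) -> None:
        while self.pending:
            self.commit_oldest()


memory = [0, 0, 0, 0]
lsq = ToyLSQ(memory, depth=2)
lsq.store(1, 10)
forwarded = lsq.load(1)   # same address as a pending store: forwarded
independent = lsq.load(2) # different address: no need to wait
lsq.store(1, 11)
lsq.store(3, 5)           # queue is full, so the oldest store commits
latest = lsq.load(1)
lsq.drain()
```

A statically scheduled circuit without such a queue has to assume that every load might depend on every earlier store, whereas the model above only serialises the accesses whose addresses actually match.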