Search CORE

8 research outputs found

Prototyping Methodologies and Design of Communication-centric Heterogeneous Many-core Architectures

Author: Masing Leonard Jannik
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2020
Field of study

Architecture FPGA améliorée et flot de conception pour une reconfiguration matérielle en ligne efficace

Author: Huriaux Christophe
Publication venue: HAL CCSD
Publication date: 02/12/2015
Field of study

The self-reconfiguration capabilities of modern FPGA architectures pave the way for dynamic applications able to adapt to transient events. The CAD flows of modern architectures are nowadays mature but limited by the constraints induced by the complexity of FPGA circuits. In this thesis, multiple contributions are developed to propose an FPGA architecture supporting the dynamic placement of hardware tasks. First, an intermediate representation of these tasks configuration data, independent from their final position, is presented. This representation allows to compress the task data up to 11x with regard to its conventional raw counterpart. An accompanying CAD flow, based on state-of-the-art tools, is proposed to generate relocatable tasks from a high-level description. Then, the online behavior of this mechanism is studied. Two algorithms allowing to decode and create in real-time the conventional bit-stream are described. In addition, an enhancement of the FPGA interconnection network is proposedto increase the placement flexibility of heterogeneous tasks, at the cost of a 10% increase in average of the critical path delay. Eventually, a configurable substitute to the configuration memory found in FPGAs is studied to ease their partial reconfiguration.Les capacités d'auto-reconfiguration des architectures FPGA modernes ouvrent la voie à des applications dynamiques capables d'adapter leur fonctionnement pour répondre à des évènements ponctuels. Les flots de reconfiguration des architectures commerciales sont aujourd'hui aboutis mais limités par des contraintes inhérentes à la complexité de ces circuits. Dans cette thèse, plusieurs contributions sont avancées afin de proposer une architecture FPGA reconfigurable permettant le placement dynamique de tâches matérielles. Dans un premier temps, une représentation intermédiaire des données de configuration de ces tâches, indépendante de leur positionnement final, est présentée. Cette représentation permet notamment d'atteindre des taux de compression allant jusqu'à 11x par rapport à la représentation brute d'une tâche. Un flot de conception basé sur des outils de l'état de l'art accompagne cette représentation et génère des tâches relogeables à partir d'une description haut-niveau. Ensuite, le comportement en ligne de ce mécanisme est étudié. Deux algorithmes permettant le décodage de ces tâches et la génération en temps-réel des données de configuration propres à l'architectures son décrits. Par ailleurs, une amélioration du réseau d'interconnexion d'une architecture FPGA est proposée pour accroître la flexibilité du placement de tâches hétérogènes, avec une augmentation de 10% en moyenne du délai du chemin critique. Enfin, une alternative programmable aux mémoires de configuration de ces circuits est étudiée pour faciliter leur reconfiguration partielle

HAL-CentraleSupelec

Thèses en Ligne

INRIA a CCSD electronic archive server

HAL-Inserm

HAL-Rennes 1

An FPGA implementation of an investigative many-core processor, Fynbos : in support of a Fortran autoparallelising software pipeline

Author: Wyngaard Janet Ruth
Publication venue: Department of Electrical Engineering
Publication date: 01/01/2014
Field of study

Includes bibliographical references.In light of the power, memory, ILP, and utilisation walls facing the computing industry, this work examines the hypothetical many-core approach to finding greater compute performance and efficiency. In order to achieve greater efficiency in an environment in which Moore’s law continues but TDP has been capped, a means of deriving performance from dark and dim silicon is needed. The many-core hypothesis is one approach to exploiting these available transistors efficiently. As understood in this work, it involves trading in hardware control complexity for hundreds to thousands of parallel simple processing elements, and operating at a clock speed sufficiently low as to allow the efficiency gains of near threshold voltage operation. Performance is there- fore dependant on exploiting a new degree of fine-grained parallelism such as is currently only found in GPGPUs, but in a manner that is not as restrictive in application domain range. While removing the complex control hardware of traditional CPUs provides space for more arithmetic hardware, a basic level of control is still required. For a number of reasons this work chooses to replace this control largely with static scheduling. This pushes the burden of control primarily to the software and specifically the compiler, rather not to the programmer or to an application specific means of control simplification. An existing legacy tool chain capable of autoparallelising sequential Fortran code to the degree of parallelism necessary for many-core exists. This work implements a many-core architecture to match it. Prototyping the design on an FPGA, it is possible to examine the real world performance of the compiler-architecture system to a greater degree than simulation only would allow. Comparing theoretical peak performance and real performance in a case study application, the system is found to be more efficient than any other reviewed, but to also significantly under perform relative to current competing architectures. This failing is apportioned to taking the need for simple hardware too far, and an inability to implement static scheduling mitigating tactics due to lack of support for such in the compiler

Cape Town University OpenUCT

Support des communications dans des architectures multicœurs par l’intermédiaire de mécanismes matériels et d’interfaces de programmation standardisées

Author: Rosa Thiago Raupp Da
Publication venue: HAL CCSD
Publication date: 08/04/2016
Field of study

The application constraints driving the design of embedded systems are constantly demanding higher performance and power efficiency. To meet these constraints, current SoC platforms rely on replicating several processing cores while adding dedicated hardware accelerators to handle specific tasks. However, developing embedded applications is becoming a key challenge, since applications workload will continue to grow and the software technologies are not evolving as fast as hardware architectures, leaving a gap in the full system design. Indeed, the increased programming complexity can be associated to the lack of software standards that supports heterogeneity, frequently leading to custom solutions. On the other hand, implementing a standard software solution for embedded systems might induce significant performance and memory usage overheads. Therefore, this Thesis focus on decreasing this gap by implementing hardware mechanisms in co-design with a standard programming interface for embedded systems. The main objectives are to increase programmability through the implementation of a standardized communication application programming interface (MCAPI), and decrease the overheads imposed by the software implementation through the use of the developed hardware mechanisms.The contributions of the Thesis comprise the implementation of MCAPI for a generic multi-core platform and dedicated hardware mechanisms to improve communication connection phase and overall performance of data transfer phase. It is demonstrated that the proposed mechanisms can be exploited by the software implementation without increasing software complexity. Furthermore, performance estimations obtained using a SystemC/TLM simulation model for the reference multi-core architecture show that the proposed mechanisms provide significant gains in terms of latency (up to 97%), throughput (40x increase) and network traffic (up to 68%) while reducing processor workload for both characterization test-cases and real application benchmarks.L’évolution des contraintes applicatives imposent des améliorations continues sur les performances et l’efficacité énergétique des systèmes embarqués. Pour répondre à ces contraintes, les plateformes « SoC » actuelles s’appuient sur la multiplication des cœurs de calcul, tout en ajoutant des accélérateurs matériels dédiés pour gérer des tâches spécifiques. Dans ce contexte, développer des applications embarquées devient un défi complexe, en effet la charge de travail des applications continue à croître alors que les technologies logicielles n’évoluent pas aussi vite que les architectures matérielles, laissant un écart dans la conception complète du système. De fait, la complexité accrue de programmation peut être associée à l’absence de standards logiciels qui prennent en charge l’hétérogénéité des architectures, menant souvent à des solutions ad hoc. A l’opposé, l’utilisation d’une solution logicielle standardisée pour les systèmes embarqués peut induire des surcoûts importants concernant les performances et l’occupation de la mémoire si elle n’est pas adaptée à l’architecture. Par conséquent, le travail de cette thèse se concentre sur la réduction de cet écart en mettant en œuvre des mécanismes matériels dont la conception prend en compte une interface de programmation standard pour systèmes embarqués. Les principaux objectifs sont ainsi d’accroître la programmabilité par la mise en œuvre d’une interface de programmation : MCAPI, et de diminuer la charge logiciel des cœurs grâce à l’utilisation des mécanismes matériels développés.Les contributions de la thèse comprennent la mise en œuvre de MCAPI pour une plate-forme multicœur générique et des mécanismes matériels pour améliorer la performance globale de la configuration de la communication et des transferts de données. Il est démontré que les mécanismes peuvent être pris en charge par les interfaces logicielles sans augmenter leur complexité. En outre, les résultats de performance obtenus en utilisant un modèle SystemC/TLM de l’architecture multicœurs de référence montrent que les mécanismes proposés apportent des gains significatifs en termes de latence, débit, trafic réseau, temps de charge processeur et temps de communication sur des cas d’étude et des applications complètes

Thèses en Ligne

Hal - Université Grenoble Alpes

HAL-CEA

A Modular Platform for Adaptive Heterogeneous Many-Core Architectures

Author: Atef Ahmed Kamaleldin
Publication venue
Publication date: 18/12/2023
Field of study

Multi-/many-core heterogeneous architectures are shaping current and upcoming generations of compute-centric platforms which are widely used starting from mobile and wearable devices to high-performance cloud computing servers. Heterogeneous many-core architectures sought to achieve an order of magnitude higher energy efficiency as well as computing performance scaling by replacing homogeneous and power-hungry general-purpose processors with multiple heterogeneous compute units supporting multiple core types and domain-specific accelerators. Drifting from homogeneous architectures to complex heterogeneous systems is heavily adopted by chip designers and the silicon industry for more than a decade. Recent silicon chips are based on a heterogeneous SoC which combines a scalable number of heterogeneous processing units from different types (e.g. CPU, GPU, custom accelerator). This shifting in computing paradigm is associated with several system-level design challenges related to the integration and communication between a highly scalable number of heterogeneous compute units as well as SoC peripherals and storage units. Moreover, the increasing design complexities make the production of heterogeneous SoC chips a monopoly for only big market players due to the increasing development and design costs. Accordingly, recent initiatives towards agile hardware development open-source tools and microarchitecture aim to democratize silicon chip production for academic and commercial usage. Agile hardware development aims to reduce development costs by providing an ecosystem for open-source hardware microarchitectures and hardware design processes. Therefore, heterogeneous many-core development and customization will be relatively less complex and less time-consuming than conventional design process methods. In order to provide a modular and agile many-core development approach, this dissertation proposes a development platform for heterogeneous and self-adaptive many-core architectures consisting of a scalable number of heterogeneous tiles that maintain design regularity features while supporting heterogeneity. The proposed platform hides the integration complexities by supporting modular tile architectures for general-purpose processing cores supporting multi-instruction set architectures (multi-ISAs) and custom hardware accelerators. By leveraging field-programmable-gate-arrays (FPGAs), the self-adaptive feature of the many-core platform can be achieved by using dynamic and partial reconfiguration (DPR) techniques. This dissertation realizes the proposed modular and adaptive heterogeneous many-core platform through three main contributions. The first contribution proposes and realizes a many-core architecture for heterogeneous ISAs. It provides a modular and reusable tilebased architecture for several heterogeneous ISAs based on open-source RISC-V ISA. The modular tile-based architecture features a configurable number of processing cores with different RISC-V ISAs and different memory hierarchies. To increase the level of heterogeneity to support the integration of custom hardware accelerators, a novel hybrid memory/accelerator tile architecture is developed and realized as the second contribution. The hybrid tile is a modular and reusable tile that can be configured at run-time to operate as a scratchpad shared memory between compute tiles or as an accelerator tile hosting a local hardware accelerator logic. The hybrid tile is designed and implemented to be seamlessly integrated into the proposed tile-based platform. The third contribution deals with the self-adaptation features by providing a reconfiguration management approach to internally control the DPR process through processing cores (RISC-V based). The internal reconfiguration process relies on a novel DPR controller targeting FPGA design flow for RISC-V-based SoC to change the types and functionalities of compute tiles at run-time

Technische Universität Dresden: Qucosa

Dynamic Command Scheduling for Real-Time Memory Controllers

Author: Akesson K.B.
Goossens K.G.W.
Li Y.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014
Field of study

Memory controller design is challenging as real-time embedded systems feature an increasing diversity of real-time and non-real-time applications with variable transaction sizes. To satisfy the requirements of the applications, tight bounds on the worst-case execution time (WCET) of memory transactions must be provided to real-time applications, while the lowest possible average execution time must be given to the rest. Existing real-time memory controllers cannot efficiently achieve this goal as they either bound the WCET by sacrificing the average execution time, or are not scalable to directly support variable transaction sizes, or both. In this paper, we propose to use dynamic command scheduling, which is capable of efficiently dealing with transactions with variable sizes. The three main contributions of this paper are: 1) a back-end architecture for a real-time memory controller with a dynamic command scheduling algorithm, 2) a formalization of the timings of the memory transactions for the proposed architecture and algorithm, and 3) two techniques to bound the WCET of transactions with both fixed and variable sizes, respectively. We experimentally evaluate the proposed memory controller and compare both the worst-case and average-case execution times of transactions to a state-of-the-art semi-static approach. The results demonstrate that dynamic command scheduling outperforms the semi-static approach by 33.4% in the average case and performs at least equally well in the worst case. We also show the WCET is tight for transactions with fixed and variable sizes, respectively

Crossref

Cognition and Interaction: From Computers to Smart Objects and Autonomous Agents

Author: Rapp Amon
Tirassa Maurizio
Tom Ziemke
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2019
Field of study

Institutional Research Information System University of Turin

Designing Applications for Heterogeneous Many-Core Architectures with the FlexTiles Platform

Author: Becker Jürgen
Courtay Antoine
Duhem François
Goossens Kees
Huriaux Christophe
Hübner Michael
Janßen Benedikt
Koedam Martijn
Lemonnier Fabrice
Masing Leonard
Millet Philippe
Schwiegelshohn Fynn
Sentieys Olivier
Werner Stephan
Wheatley Emilie
Publication venue: HAL CCSD
Publication date: 20/07/2015
Field of study

International audienceThe FlexTiles Platform has been developed within a Seventh Framework Programme project which is co-funded by the European Union with ten participants of five countries. It aims to create a self-adaptive heterogeneous many-core architecture which is able to dynamically manage load balancing, power consumption and faulty modules. Its focus is to make the architecture efficient and to keep programming effort low. Therefore, the concept contains a dedicated automated tool-flow for creating both the hardware and the software, a simulation platform that can execute the same binaries as the FPGA prototype and a virtualization layer to manage the final heterogeneous many-core architecture for run-time adaptability. With this approach software development productivity can be increased and thus, the time-to-market and development costs can be decreased. In this paper we present the FlexTiles Development Platform with a many-core architecture demonstration. The steps to implement, validate and integrate two use-cases are discussed

HAL-CentraleSupelec

Crossref

Pure OAI Repository

INRIA a CCSD electronic archive server

HAL-Rennes 1