Search CORE

7 research outputs found

Mapping a guided image filter on the HARP reconfigurable architecture using OpenCL

Author: D'Hollander Erik, Faict Thomas, Goossens Bart
Publication venue: 'MDPI AG'
Publication date: 01/01/2019

Intel recently introduced the Heterogeneous Architecture Research Platform, HARP. In this platform, the Central Processing Unit and a Field-Programmable Gate Array are connected through a high-bandwidth, low-latency interconnect and both share DRAM. For this platform, Open Computing Language (OpenCL), a High-Level Synthesis (HLS) language, is made available. By making use of HLS, a faster design cycle can be achieved compared to programming in a traditional hardware description language. This, however, comes at the cost of having less control over the hardware implementation. We investigate how OpenCL can be applied to implement a real-time guided image filter on the HARP platform. In the first phase, the performance-critical parameters of the OpenCL programming model are defined using several specialized benchmarks. In the second phase, the guided image filter algorithm is implemented using the insights gained in the first phase. Both a floating-point and a fixed-point implementation were developed for this algorithm, based on a sliding window implementation. This resulted in a maximum floating-point performance of 135 GFLOPS, a maximum fixed-point performance of 430 GOPS and a throughput of HD color images at 74 frames per second.

Ghent University Academic Bibliography

Memory models for heterogeneous systems

Author: Iorga Dan
Publication venue: Computing, Imperial College London
Publication date: 01/03/2023

Heterogeneous systems, in which a CPU and an accelerator can execute together while sharing memory, are becoming popular in several computing sectors. Nowadays, programmers can split their computation into multiple specialised threads that can take advantage of each specialised component. FPGAs are popular accelerators with configurable logic for various tasks, and hardware manufacturers are developing platforms with tightly integrated multicore CPUs and FPGAs. In such tightly integrated platforms, the CPU threads and the FPGA threads access shared memory locations in a fine-grained manner. However, architectural optimisations will lead to instructions being observed out of order by different cores. The programmers must consider these reorderings for correct program executions. Memory models can aid in reasoning about these complex systems since they can be used to explore guarantees regarding the systems' behaviours. These models are helpful for low-level programmers, compiler writers, and designers of analysis tools. Memory models are specified according to two main paradigms: operational and axiomatic. An operational model is an abstract representation of the actual machine, described by states that represent idealised components such as buffers and queues, and the legal transitions between these states. Axiomatic models define relations between memory accesses to constrain the allowed and disallowed behaviours. This dissertation makes the following main contributions: an operational model of a CPU/FPGA system, an axiomatic one and an exploration of simulation techniques for operational models. The operational model is implemented in C and validated using all the behaviours described in the available documentation. We will see how the ambiguities from the documentation can be clarified by running tests on the hardware and consulting with the designers. 
Finally, to demonstrate the model's utility, we reason about a producer/consumer buffer implemented across the CPU and the FPGA. The simulation of axiomatic models can be orders of magnitude faster than operational models. For this reason, we also provide an axiomatic version of the memory model. This model allows us to generate small concurrent programs to reveal whether a specific memory model behaviour can occur. However, synthesising a single test for the FPGA requires significant time and prevents us from directly running many tests. To overcome this issue, we develop a soft-core processor that allows us to quickly run large numbers of such tests and gain higher confidence in the accuracy of our models. The simulation of the operational model faces a path-explosion problem that limits the exploration of large models. Observing that program analysis tools tackle a similar path-explosion problem, we investigate the idea of reducing the decision problem of "whether a given memory model allows a given behaviour" to the decision problem of "whether a given C program is safe", which can be handled by a variety of off-the-shelf tools. Using this approach, we can simulate our model more deeply and gain more confidence in its accuracy.
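
The idea of an operational model built from buffers, checked with small concurrent test programs, can be illustrated with a toy example. The sketch below is not the dissertation's CPU/FPGA model: it assumes a generic machine with one FIFO store buffer per thread (in the style of x86-TSO), and the names `SB_TEST`, `outcomes`, `_set` and `_put` are hypothetical. It exhaustively explores the classic store-buffering test and reports which final register values are reachable with and without buffering.

```python
# Toy operational memory model (illustrative assumption, not the thesis model).
# An instruction is ("store", addr, value) or ("load", addr, register).
SB_TEST = (
    (("store", "x", 1), ("load", "y", "r0")),  # thread 0
    (("store", "y", 1), ("load", "x", "r1")),  # thread 1
)


def _set(tup, i, val):
    """Return a copy of tuple `tup` with position `i` replaced by `val`."""
    return tup[:i] + (val,) + tup[i + 1:]


def _put(pairs, key, val):
    """Return a sorted tuple of (key, value) pairs with `key` set to `val`."""
    d = dict(pairs)
    d[key] = val
    return tuple(sorted(d.items()))


def outcomes(program, buffered):
    """Explore every reachable state; return the set of final register values."""
    n = len(program)
    addrs = sorted({addr for thread in program for _, addr, _ in thread})
    # State: (program counters, per-thread store buffers, memory, registers).
    init = ((0,) * n, ((),) * n, tuple((a, 0) for a in addrs), ())
    seen, stack, finals = {init}, [init], set()
    while stack:
        pcs, bufs, mem, regs = stack.pop()
        successors = []
        for t in range(n):
            # Transition: flush the oldest buffered store of thread t to memory.
            if bufs[t]:
                addr, val = bufs[t][0]
                successors.append(
                    (pcs, _set(bufs, t, bufs[t][1:]), _put(mem, addr, val), regs)
                )
            # Transition: thread t executes its next instruction.
            if pcs[t] < len(program[t]):
                op, addr, arg = program[t][pcs[t]]
                new_pcs = _set(pcs, t, pcs[t] + 1)
                if op == "store" and buffered:
                    new_bufs = _set(bufs, t, bufs[t] + ((addr, arg),))
                    successors.append((new_pcs, new_bufs, mem, regs))
                elif op == "store":
                    # Without buffering, a store reaches memory immediately.
                    successors.append((new_pcs, bufs, _put(mem, addr, arg), regs))
                else:
                    # A load sees the thread's own newest pending store first.
                    pending = [v for a, v in bufs[t] if a == addr]
                    val = pending[-1] if pending else dict(mem)[addr]
                    successors.append((new_pcs, bufs, mem, _put(regs, arg, val)))
        if not successors:  # all threads finished and all buffers drained
            finals.add(regs)
        for state in successors:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return finals


if __name__ == "__main__":
    for flag in (False, True):
        print("buffered" if flag else "unbuffered", sorted(outcomes(SB_TEST, flag)))
```

In this sketch, the outcome where both loads return 0 is reachable only when `buffered=True`, which is the kind of question ("can this behaviour occur?") that a litmus-style test is meant to answer. The explicit state exploration also shows, in miniature, why operational simulation runs into path explosion as programs grow.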

Spiral - Imperial College Digital Repository

Analyse de faisabilité de l'implantation d'un protocole de communication sur processeur multicoeurs [Feasibility analysis of implementing a communication protocol on a multicore processor]

Author: Gémieux Michel
Publication date: 01/04/2015

The work presented in this Master's thesis was carried out in the context of an industrially sponsored project. The objective is to understand the runtime behavior of a class of processing systems in specific contexts. We place this project at the intersection of task scheduling principles, runtime systems, Network Functions Virtualisation (NFV) and, especially, the constraints associated with virtualizing an LTE (Long Term Evolution) protocol stack, the most prominent cellular telecommunication standard at the moment. A literature review explains these concepts in detail in order to give a clear picture of the target environment. First, a study of a real-time processing cluster is carried out with a view to implementing a so-called Cloud Radio Access Network (C-RAN), which hosts on a cloud platform the electronics that perform the signal processing required for a cellular access point. The study evaluates the bottlenecks that can occur from the receipt of an LTE packet within a Common Public Radio Interface (CPRI) frame through to the forwarding of that packet from a master server to the slave servers. We evaluate the latencies and bandwidths observed for the different protocols used on the platform. In particular, we characterize the CPRI communications from the antennas to the pool of virtual base stations, a QuickPath Interconnect (QPI) link between processing cores and an FPGA, a dedicated point-to-point link between the FPGA and a NIC (Network Interface Card), and finally the sending of Ethernet frames to the slave servers. This study allows us to infer that the virtualization of an LTE stack is viable on a real-time computation cluster with the implied architecture.
Then, to validate the effectiveness of different scheduling algorithms, an emulation of a virtualized LTE uplink stack is carried out. Using a runtime called StarPU coupled with profiling tools, we deliver results that assess the need for dedicated threads or cores to manage tasks within a server.

PolyPublie

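Architecture matérielle logicielle pour l'exécution à latence réduite d'applications de télécommunications émergentes sur centre de données [Hardware/software architecture for reduced-latency execution of emerging telecommunications applications in data centers]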

Author: Gémieux Michel
Publication date: 01/04/2020

The Information and Communications Technology industry is facing an increasing demand for ubiquitous wireless and Internet services, driven by an explosion in the number of multimedia-rich mobile devices. It is estimated that, starting this year, 2020, the volume of mobile data traffic will double every year for several years. This results in a significant increase in capital expenditures for systems built on current Radio Access Network technologies, which are essentially based on architectures with a fixed (non-reconfigurable) structure using proprietary platforms with distributed network control and management mechanisms. Moreover, to ensure the required quality of service, subsystems are dimensioned with respect to peak demand. Network expansion will therefore have a considerable impact on operating expenditures. This thesis aims to develop a hardware and software architecture suitable for a virtualized Baseband Processing Unit pool in a Cloud Radio Access Network, in order to support real-time processing on a heterogeneous platform based on General Purpose Processors. This raises two main challenges: scheduling tasks in real time and executing them in a manner that reduces variance compared to existing General Purpose Processor based platforms.
Real-time tasks from the radio air interface in the Cloud Radio Access Network must be scheduled at a finer grain and must be completed within a given timeslot. Thus, mechanisms for resource allocation and management in computing clusters must be revisited. The second challenge is obtaining behavior with reduced variability, which involves two major concerns: computing time and communication delay. Variation in computing time is essentially inherent to all general-purpose processors. Nevertheless, the communication infrastructure of existing computing clusters does not provide any support for low-variance communications. The proposed research is divided into the following main subjects: Adaptive computing and network resource allocation and management in (heterogeneous) computing clusters: the algorithms for dynamic resource allocation and real-time task scheduling will form the core functionality that the control plane will support. In order to meet the hard real-time constraints of this class of applications, a parallel Field-Programmable Gate Array (FPGA) based hardware implementation of the control plane is proposed.

PolyPublie

Design Space Exploration and Resource Management of Multi/Many-Core Systems

Publication venue: 'MDPI AG'
Publication date: 11/01/2022

The increasing demand for processing a higher number of applications and related data on computing platforms has resulted in reliance on multi-/many-core chips, as they facilitate parallel processing. However, there is a desire for these platforms to be energy-efficient and reliable, and they need to perform secure computations in the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers in terms of state-of-the-art contributions and upcoming trends.

Directory of Open Access Books (DOAB)