LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing

Author: Alvarez Carlos
Bautista Leonardo
Becker Tobias
Billung-Meyer Gunnar
Carpenter Paul
Christmann Wolfgang
Cristal Adrian
De La Cruz Raul
Dubhashi Devdatt
Etsion Yoav
Felber Pascal
Fetzer Christof
Gaydadjiev Georgi
Göttel Christian
Hadar Elad
Hagemeyer Jens
Jimenez Daniel
Jungeblut Thorsten
Kaiser Martin
Klawonn Frank
Krupop Stefan
Kucza Nils
Madonar Sergi
Martorell Xavier
Mihklafi Amani
Mudge Trevor
Pasin Marcelo
Pericàs Miquel
Pnevmatikatos Dionisios N.
Porrmann Mario
Port Oron
Rocha Isabelly
Salami Behzad
Salomonsson Hans
Schiavoni Valerio
Trancoso Pedro
Unsal Osman S.
vor dem Berge Micha
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2018

LEGaTO is a three-year EU H2020 project that started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to achieve energy savings of one order of magnitude from the edge to the converged cloud/HPC.

Peer reviewed. Postprint (author's final draft).

Crossref

UPCommons. Portal del coneixement obert de la UPC

Chalmers Research

Publications at Bielefeld University

LEGaTO: towards energy-efficient, secure, fault-tolerant toolset for heterogeneous computing

Author: Alvarez Carlos
Bautista Leonardo
Becker Tobias
Billung-Meyer Gunnar
Carpenter Paul
Christmann Wolfgang
Cristal Adrian
De La Cruz Raul
Dubhashi Devdatt
Etsion Yoav
Felber Pascal
Fetzer Christof
Gaydadjiev Georgi
Göttel Christian
Hagemeyer Jens
Jimenez Daniel
Jungeblut Thorsten
Kaeli David
Kaiser Martin
Klawonn Frank
Krupop Stefan
Kucza Nils
Madonar Sergi
Martorell Xavier
Mihklafi Amani
Nowack Vesna
Pasin Marcelo
Pericàs Miquel
Porrmann Mario
Port Oron
Rocha Isabelly
Salami Behzad
Salomonsson Hans
Schiavoni Valerio
Trancoso Pedro
Unsal Osman S.
vor dem Berge Micha
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2018

UPCommons. Portal del coneixement obert de la UPC

Chalmers Research

Publications at Bielefeld University

Evaluation of interconnect fabrics for an embedded MPSoC in 28 nm FD-SOI

Author: Ax Johannes
Flasskamp Martin
Jungeblut Thorsten
Kelly Wayne
Kucza Nils
Porrmann Mario
Rückert Ulrich
Sievers Gregor
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2015

Embedded many-core architectures contain dozens to hundreds of CPU cores that are connected via a highly scalable NoC interconnect. Our Multiprocessor System-on-Chip CoreVA-MPSoC combines the advantages of tightly coupled bus-based communication with the scalability of NoC approaches by adding a CPU cluster as an additional level of hierarchy. In this work, we analyze different cluster interconnect implementations with 8 to 32 CPUs and compare them with hierarchical NoC approaches in terms of resource requirements and performance. Using 28 nm FD-SOI technology, the area requirement for 32 CPUs and an AXI crossbar is 5.59 mm², including 23.61% for the interconnect, at a clock frequency of 830 MHz. In comparison, a hierarchical MPSoC with 4 CPU clusters of 8 CPUs each requires only 4.83 mm², including 11.61% for the interconnect. To evaluate performance, we use a compiler for streaming applications to map programs to the different MPSoC configurations. We use this approach for a design-space exploration to find the most efficient architecture and partitioning for an application.

Queensland University of Technology ePrints Archive
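As a worked check of the area figures quoted in the abstract above, the following Python sketch splits each total area into its interconnect share and the remainder. It assumes the percentages are shares of the total chip area; the design labels and the `split_area` helper are illustrative names, not taken from the paper. Under that assumption, the non-interconnect area comes out at roughly 4.27 mm² in both configurations, which suggests the 0.76 mm² difference between the two designs is almost entirely due to the interconnect.

```python
# Area figures quoted in the abstract (28 nm FD-SOI, 32 CPUs, 830 MHz).
# Assumption: the percentage is the interconnect's share of the total area.
DESIGNS = {
    "flat AXI crossbar (32 CPUs)": (5.59, 23.61),
    "hierarchical (4 clusters x 8 CPUs)": (4.83, 11.61),
}


def split_area(total_mm2: float, interconnect_pct: float) -> tuple[float, float]:
    """Return (interconnect area, remaining area) in mm^2 (hypothetical helper)."""
    interconnect = total_mm2 * interconnect_pct / 100.0
    return interconnect, total_mm2 - interconnect


for name, (total, pct) in DESIGNS.items():
    ic, rest = split_area(total, pct)
    print(f"{name}: interconnect {ic:.2f} mm^2, rest {rest:.2f} mm^2")
```

Keeping the raw figures in one table and the arithmetic in a single helper makes it easy to add further cluster configurations if their totals and percentages are known.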

Comparing synchronous, mesochronous and asynchronous NoCs for GALS based MPSoC

Author: Ax Johannes
Jungeblut Thorsten
Kucza Nils
Porrmann Mario
Rückert Ulrich
Vohrmann Marten
Publication date: 01/01/2017

Ax J, Kucza N, Vohrmann M, Jungeblut T, Porrmann M, Rückert U. Comparing synchronous, mesochronous and asynchronous NoCs for GALS based MPSoC. In: IEEE 11th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-17). Accepted.

Publications at Bielefeld University

Asynchronous network-on-chips (NoCs) for resource efficient many core architectures

Author: Ax Johannes
Di Jia
Jungeblut Thorsten
Kucza Nils
Porrmann Mario
Rückert Ulrich
Smith Scott C.
Publication venue: Institution of Engineering and Technology (IET)
Publication date: 01/01/2019

Ax J, Kucza N, Porrmann M, Rückert U, Jungeblut T. Asynchronous network-on-chips (NoCs) for resource efficient many core architectures. In: Di J, Smith SC, eds. Asynchronous Circuit Applications. Institution of Engineering and Technology (IET); 2019: 173-197.

In this chapter, different GALS approaches for implementing embedded NoC architectures were presented. The GALS approach reduces resource requirements while increasing the scalability of the NoC, without sacrificing performance. Three approaches were compared: synchronous, mesochronous, and asynchronous NoCs. For the mesochronous NoC, special synchronizers were implemented between the links. For the asynchronous NoC, the routers were realized entirely as asynchronous circuits. The results show that modern design methods (the CCOpt design flow) allow MPSoCs to scale well even with synchronous NoCs. Nevertheless, the asynchronous NoC showed lower area and energy requirements than the mesochronous and synchronous implementations while still providing comparable performance. In a place-and-route comparison of an MPSoC, the asynchronous NoC requires 3.1% less area. The power consumption of an asynchronous router is only 22.4% (0.94 mW in the idle state) or 53% (3.94 mW during communication) of that of a clock-based router. In the last section of the chapter, the global clock tree for an MPSoC with 256 CPUs was examined. The synchronous and mesochronous NoCs show almost the same power consumption of about 7.7 mW. Using the asynchronous NoC reduces the power consumption by about 25% (to 5.78 mW). In addition, the mesochronous and asynchronous variants achieve a 2.6% higher clock frequency.

Publications at Bielefeld University
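The router and clock-tree results in the chapter summary above are given partly as ratios. The Python sketch below backs out the clock-based router power those ratios imply and recomputes the clock-tree saving. The implied clock-based values (about 4.2 mW idle and 7.4 mW during communication) are derived here and are not stated in the summary, and the helper names `implied_baseline` and `reduction_pct` are hypothetical. Because the synchronous figure is quoted only as "about 7.7 mW", the recomputed saving (about 24.9%) matches the stated 25% only approximately.

```python
def implied_baseline(async_mw: float, ratio_pct: float) -> float:
    """Clock-based router power implied by 'async uses ratio_pct % of it' (hypothetical helper)."""
    return async_mw / (ratio_pct / 100.0)


def reduction_pct(before_mw: float, after_mw: float) -> float:
    """Relative power reduction in percent (hypothetical helper)."""
    return 100.0 * (before_mw - after_mw) / before_mw


# Router power: asynchronous value and its stated share of the clock-based router.
idle_clocked = implied_baseline(0.94, 22.4)
comm_clocked = implied_baseline(3.94, 53.0)

# Global clock tree for a 256-CPU MPSoC: ~7.7 mW (sync/meso) vs 5.78 mW (async).
clock_tree_saving = reduction_pct(7.7, 5.78)

print(f"clock-based router, idle:          {idle_clocked:.2f} mW")
print(f"clock-based router, communication: {comm_clocked:.2f} mW")
print(f"clock-tree power saving:           {clock_tree_saving:.1f} %")
```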

Evaluation of Interconnect Fabrics for an Embedded MPSoC in 28 nm FD-SOI

Author: Ax Johannes
Flasskamp Martin
Jungeblut Thorsten
Kelly Wayne
Kucza Nils
Porrmann Mario
Rückert Ulrich
Sievers Gregor
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2015

Sievers G, Ax J, Kucza N, et al. Evaluation of Interconnect Fabrics for an Embedded MPSoC in 28 nm FD-SOI. In: 2015 IEEE International Symposium on Circuits & Systems (ISCAS). IEEE; 2015: 1925-1928.

Embedded many-core architectures contain dozens to hundreds of CPU cores that are connected via a highly scalable NoC interconnect. Our Multiprocessor System-on-Chip CoreVA-MPSoC combines the advantages of tightly coupled bus-based communication with the scalability of NoC approaches by adding a CPU cluster as an additional level of hierarchy. In this work, we analyze different cluster interconnect implementations with 8 to 32 CPUs and compare them with hierarchical NoC approaches in terms of resource requirements and performance. Using 28 nm FD-SOI technology, the area requirement for 32 CPUs and an AXI crossbar is 5.59 mm², including 23.61% for the interconnect, at a clock frequency of 830 MHz. In comparison, a hierarchical MPSoC with 4 CPU clusters of 8 CPUs each requires only 4.83 mm², including 11.61% for the interconnect. To evaluate performance, we use a compiler for streaming applications to map programs to the different MPSoC configurations. We use this approach for a design-space exploration to find the most efficient architecture and partitioning for an application.

Publications at Bielefeld University

Evaluation of heterogeneous AIoT Accelerators within VEDLIoT

Author: Azhar Muhammad Waqar
Flottmann M.
Griessl R.
Gugala K.
Hagemeyer Jens
Kaiser Martin
Kucza Nils
Latosinski G.
Mika K.
Mohammad Qararyah Fareed
Odman D.
Petersen Moura Trancoso Pedro
Porrmann F.
Porrmann Mario
Tassemeier M.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2023

Within VEDLIoT, a project targeting the development of energy-efficient Deep Learning for distributed AIoT applications, several accelerator platforms based on technologies such as CPUs, embedded GPUs, FPGAs, or specialized ASICs are evaluated. The VEDLIoT approach is based on modular and scalable cognitive IoT hardware platforms. Modular microserver technology enables the integration of different, heterogeneous accelerators into one platform. Benchmarking of the different accelerators takes into account performance, energy efficiency, and accuracy. The results in this paper provide a solid overview of available accelerator solutions and guidance for hardware selection for AIoT applications from the far edge to the cloud. VEDLIoT is an H2020 EU project that started in November 2020 and is currently at an intermediate stage. This paper focuses on the performance and energy efficiency of hardware accelerators. Apart from the hardware and accelerator focus presented in this paper, the project also covers toolchain, security, and safety aspects. The resulting technology is tested on a wide range of AIoT applications.

Chalmers Research

Publications at Bielefeld University

A Scalable, Heterogeneous Hardware Platform for Accelerated AIoT based on Microservers

Author: Azhar Muhammad Waqar
Flottmann M.
Griessl René
Gugala K.
Hagemeyer Jens
Kaiser Martin
Kucza Nils
Latosinski G.
Mika K.
Mohammad Qararyah Fareed
Odman D.
Petersen Moura Trancoso Pedro
Porrmann Florian
Porrmann Mario
Tassemeier M.
Publication date: 01/01/2023

Performance and energy efficiency are key aspects of next-generation AIoT hardware. This chapter presents a scalable, heterogeneous hardware platform for accelerated AIoT based on microserver technology. It integrates several accelerator platforms based on technologies like CPUs, embedded GPUs, FPGAs, or specialized ASICs, supporting the full range of the cloud-edge-IoT continuum. The modular microserver approach enables the integration of different, heterogeneous accelerators into one platform. Benchmarking the various accelerators takes performance, energy efficiency, and accuracy into account. The results provide a solid overview of available accelerator solutions and guide hardware selection for AIoT applications from the far edge to the cloud.

Chalmers Research

VEDLIoT: Next generation accelerated AIoT systems and applications

Author: Azhar Muhammad Waqar
Brunnegard Oliver
Eriksson Olof
Felber Pascal
Griessl René
Hagemeyer Jens
Kaiser Martin
Kucza Nils
Ménétrey Jämes
Marcus Carina
Mika Kevin
Mohammad Qararyah Fareed
Pasin Marcelo
Petersen Moura Trancoso Pedro
Porrmann Florian
Salomonsson Hans
Tigges Lennart
Zouzoula Stavroula
Publication date: 01/01/2023

The VEDLIoT project aims to develop energy-efficient Deep Learning methodologies for distributed Artificial Intelligence of Things (AIoT) applications. In this project, we propose a holistic approach that focuses on optimizing algorithms while addressing the safety and security challenges inherent to AIoT systems. The foundation of this approach is a modular and scalable cognitive IoT hardware platform, which leverages microserver technology to let users configure the hardware to meet the requirements of a diverse array of applications. Heterogeneous computing is used to boost performance and energy efficiency. In addition, the full spectrum of hardware accelerators is integrated, providing specialized ASICs as well as FPGAs for reconfigurable computing. The project's contributions span trusted computing, remote attestation, and secure execution environments, with the ultimate goal of facilitating the design and deployment of robust and efficient AIoT systems. The overall architecture is validated on use cases ranging from Smart Home to Automotive and Industrial IoT appliances. Ten additional use cases are integrated via an open call, broadening the range of application areas.

Chalmers Research

VEDLIoT. Next generation accelerated AIoT systems and applications

Mika K, Griessl R, Kucza N, et al. VEDLIoT. Next generation accelerated AIoT systems and applications. In: CF '23: Proceedings of the 20th ACM International Conference on Computing Frontiers. New York, NY: ACM; 2023: 291-296

Publications at Bielefeld University