
    LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing

    LEGaTO is a three-year EU H2020 project which started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC. Peer Reviewed. Postprint (author's final draft).

    LEGaTO: towards energy-efficient, secure, fault-tolerant toolset for heterogeneous computing

    LEGaTO is a three-year EU H2020 project which started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC. Peer Reviewed. Postprint (author's final draft).

    Evaluation of interconnect fabrics for an embedded MPSoC in 28 nm FD-SOI

    Embedded many-core architectures contain dozens to hundreds of CPU cores that are connected via a highly scalable NoC interconnect. Our Multiprocessor System-on-Chip, the CoreVA-MPSoC, combines the advantages of tightly coupled bus-based communication with the scalability of NoC approaches by adding a CPU cluster as an additional level of hierarchy. In this work, we analyze different cluster interconnect implementations with 8 to 32 CPUs and compare them in terms of resource requirements and performance to hierarchical NoC approaches. Using 28 nm FD-SOI technology, the area requirement for 32 CPUs and an AXI crossbar is 5.59 mm², including 23.61% for the interconnect, at a clock frequency of 830 MHz. In comparison, a hierarchical MPSoC with 4 CPU clusters of 8 CPUs each requires only 4.83 mm², including 11.61% for the interconnect. To evaluate the performance, we use a compiler for streaming applications to map programs to the different MPSoC configurations. We use this approach for a design-space exploration to find the most efficient architecture and partitioning for an application.
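    The area percentages above can be turned into absolute interconnect areas. The minimal Python sketch below only does that arithmetic on the quoted totals and shares; the derived values (about 1.32 mm² versus 0.56 mm² of interconnect) are estimates computed from the abstract, not numbers reported in the paper, and the helper name is illustrative. It also shows that the remaining, non-interconnect area is about 4.27 mm² in both configurations, so nearly all of the 0.76 mm² difference comes from the interconnect.

```python
# Derive absolute interconnect areas from the totals and percentage shares quoted
# in the abstract. The helper below is a hypothetical illustration, not from the paper.

def interconnect_area_mm2(total_mm2: float, share_percent: float) -> float:
    """Return the interconnect area in mm^2 given the total area and its percentage share."""
    return total_mm2 * share_percent / 100.0

# Flat cluster: 32 CPUs on one AXI crossbar (5.59 mm^2, 23.61 % interconnect)
flat = interconnect_area_mm2(5.59, 23.61)
# Hierarchical: 4 clusters x 8 CPUs (4.83 mm^2, 11.61 % interconnect)
hier = interconnect_area_mm2(4.83, 11.61)

# Area left over once the interconnect is subtracted (CPUs and other logic)
flat_rest = 5.59 - flat
hier_rest = 4.83 - hier

print(f"flat crossbar interconnect:  {flat:.2f} mm^2")       # 1.32 mm^2
print(f"hierarchical interconnect:   {hier:.2f} mm^2")       # 0.56 mm^2
print(f"non-interconnect area:       {flat_rest:.2f} vs {hier_rest:.2f} mm^2")
print(f"total area saved:            {5.59 - 4.83:.2f} mm^2")
```

Because the non-interconnect area comes out nearly identical in both cases, the comparison isolates the interconnect as the source of the area saving.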

    Comparing synchronous, mesochronous and asynchronous NoCs for GALS based MPSoC

    Ax J, Kucza N, Vohrmann M, Jungeblut T, Porrmann M, Rückert U. Comparing synchronous, mesochronous and asynchronous NoCs for GALS based MPSoC. In: IEEE 11th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-17). Accepted.

    Asynchronous network-on-chips (NoCs) for resource efficient many core architectures

    Ax J, Kucza N, Porrmann M, Rückert U, Jungeblut T. Asynchronous network-on-chips (NoCs) for resource efficient many core architectures. In: Di J, Smith SC, eds. Asynchronous Circuit Applications. Institution of Engineering and Technology (IET); 2019: 173-197. In this chapter, different GALS approaches for the implementation of embedded NoC architectures were presented. The GALS approach reduces resource requirements and increases the scalability of the NoC without sacrificing performance. Three approaches were compared: synchronous, mesochronous, and asynchronous NoCs. For the mesochronous NoC, special synchronizers were implemented between the links. For the asynchronous NoC, the routers were realized entirely as asynchronous circuits. The results have shown that modern design methods (the CCOpt design flow) allow good scaling of MPSoCs even with synchronous NoCs. Nevertheless, the asynchronous NoC showed lower area and energy requirements than the mesochronous and synchronous implementations, while still providing comparable performance. In a place-and-route comparison of an MPSoC, the asynchronous NoC requires 3.1% less area. The power consumption of an asynchronous router is only 22.4% (0.94 mW in the idle state) or 53% (3.94 mW during communication) of that of a clock-based router. In the last section of the chapter, the global clock tree for an MPSoC with 256 CPUs was examined. The synchronous and mesochronous NoCs show almost the same power consumption of about 7.7 mW, whereas the asynchronous NoC reduces the power consumption by about 25% (to 5.78 mW). In addition, the mesochronous and asynchronous variants achieve a 2.6% higher clock frequency.
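    The router power ratios above imply absolute values for the clock-based reference router, which the abstract does not state directly. The sketch below back-computes them (roughly 4.20 mW idle and 7.43 mW while communicating) and checks that going from about 7.7 mW to 5.78 mW is indeed a reduction of about 25%. It assumes the quoted percentages apply exactly to the quoted milliwatt values; the results are derived estimates, not figures from the chapter, and the helper names are hypothetical.

```python
# Back-of-the-envelope check of the router and clock-tree power figures in the abstract.
# The clock-based router power is derived from the reported ratios, not reported directly.

def implied_reference(async_mw: float, ratio: float) -> float:
    """If the async router draws `ratio` of the clocked router's power, return the clocked power in mW."""
    return async_mw / ratio

def relative_saving(baseline_mw: float, improved_mw: float) -> float:
    """Fractional power reduction from baseline to improved."""
    return (baseline_mw - improved_mw) / baseline_mw

idle_clocked = implied_reference(0.94, 0.224)   # idle: 0.94 mW is 22.4 % of the clocked router
busy_clocked = implied_reference(3.94, 0.53)    # communicating: 3.94 mW is 53 % of the clocked router
clock_tree_saving = relative_saving(7.7, 5.78)  # 256-CPU MPSoC: ~7.7 mW vs 5.78 mW

print(f"implied clocked router, idle:          {idle_clocked:.2f} mW")
print(f"implied clocked router, communicating: {busy_clocked:.2f} mW")
print(f"clock-tree power saving (256 CPUs):    {clock_tree_saving:.1%}")
```

The computed saving of roughly 24.9% is consistent with the "about 25%" stated in the abstract.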

    Evaluation of Interconnect Fabrics for an Embedded MPSoC in 28 nm FD-SOI

    Sievers G, Ax J, Kucza N, et al. Evaluation of Interconnect Fabrics for an Embedded MPSoC in 28 nm FD-SOI. In: 2015 IEEE International Symposium on Circuits & Systems (ISCAS). IEEE; 2015: 1925-1928. Embedded many-core architectures contain dozens to hundreds of CPU cores that are connected via a highly scalable NoC interconnect. Our Multiprocessor System-on-Chip, the CoreVA-MPSoC, combines the advantages of tightly coupled bus-based communication with the scalability of NoC approaches by adding a CPU cluster as an additional level of hierarchy. In this work, we analyze different cluster interconnect implementations with 8 to 32 CPUs and compare them in terms of resource requirements and performance to hierarchical NoC approaches. Using 28 nm FD-SOI technology, the area requirement for 32 CPUs and an AXI crossbar is 5.59 mm², including 23.61% for the interconnect, at a clock frequency of 830 MHz. In comparison, a hierarchical MPSoC with 4 CPU clusters of 8 CPUs each requires only 4.83 mm², including 11.61% for the interconnect. To evaluate the performance, we use a compiler for streaming applications to map programs to the different MPSoC configurations. We use this approach for a design-space exploration to find the most efficient architecture and partitioning for an application.

    Evaluation of heterogeneous AIoT Accelerators within VEDLIoT

    Within VEDLIoT, a project targeting the development of energy-efficient Deep Learning for distributed AIoT applications, several accelerator platforms based on technologies like CPUs, embedded GPUs, FPGAs, or specialized ASICs are evaluated. The VEDLIoT approach is based on modular and scalable cognitive IoT hardware platforms. Modular microserver technology enables the integration of different, heterogeneous accelerators into one platform. Benchmarking of the different accelerators takes into account performance, energy efficiency and accuracy. The results in this paper provide a solid overview of available accelerator solutions and guidance for hardware selection for AIoT applications from the far edge to the cloud. VEDLIoT is an H2020 EU project which started in November 2020 and is currently at an intermediate stage. This paper focuses on the performance and energy efficiency of hardware accelerators. Apart from the hardware and accelerator focus presented in this paper, the project also covers toolchain, security and safety aspects. The resulting technology is tested on a wide range of AIoT applications.

    A Scalable, Heterogeneous Hardware Platform for Accelerated AIoT based on Microservers

    Performance and energy efficiency are key aspects of next-generation AIoT hardware. This chapter presents a scalable, heterogeneous hardware platform for accelerated AIoT based on microserver technology. It integrates several accelerator platforms based on technologies like CPUs, embedded GPUs, FPGAs, or specialized ASICs, supporting the full range of the cloud-edge-IoT continuum. The modular microserver approach enables the integration of different, heterogeneous accelerators into one platform. Benchmarking the various accelerators takes performance, energy efficiency, and accuracy into account. The results provide a solid overview of available accelerator solutions and guide hardware selection for AIoT applications from the far edge to the cloud.

    VEDLIoT: Next generation accelerated AIoT systems and applications

    The VEDLIoT project aims to develop energy-efficient Deep Learning methodologies for distributed Artificial Intelligence of Things (AIoT) applications. In the project, we propose a holistic approach that focuses on optimizing algorithms while addressing the safety and security challenges inherent to AIoT systems. The foundation of this approach is a modular and scalable cognitive IoT hardware platform, which leverages microserver technology to let users configure the hardware to meet the requirements of a diverse array of applications. Heterogeneous computing is used to boost performance and energy efficiency. In addition, the full spectrum of hardware accelerators is integrated, providing specialized ASICs as well as FPGAs for reconfigurable computing. The project's contributions span trusted computing, remote attestation, and secure execution environments, with the ultimate goal of facilitating the design and deployment of robust and efficient AIoT systems. The overall architecture is validated on use cases ranging from Smart Home to Automotive and Industrial IoT appliances. Ten additional use cases are integrated via an open call, broadening the range of application areas.