
6,828 research outputs found

Streamlining the OpenMP Programming Model on Ultra-Low-Power Multi-core MCUs

Authors: Benini Luca; Garofalo Angelo; Montagna Fabio; Rossi Davide; Tagliavini Giuseppe
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2021

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Fast Shared-Memory Barrier Synchronization for a 1024-Cores RISC-V Many-Core Cluster

Authors: Benini Luca; Bertuletti Marco; Riedel Samuel; Vanelli-Coralli Alessandro; Zhang Yichao
Publication date: 17/07/2023

Synchronization is likely the most critical performance killer in shared-memory parallel programs. With the rise of multi-core and many-core processors, the relative impact of synchronization on performance and energy overhead is bound to grow. This paper focuses on barrier synchronization for TeraPool, a cluster of 1024 RISC-V processors with non-uniform memory access to a tightly coupled 4MB shared L1 data memory. We compare the synchronization strategies available in other multi-core and many-core clusters to identify the optimal native barrier kernel for TeraPool. We benchmark a set of optimized barrier implementations and evaluate their performance in the framework of the widespread fork-join OpenMP-style programming model. We test parallel kernels from the signal-processing and telecommunications domain, achieving less than 10% synchronization overhead over the total runtime for problems that fit TeraPool's L1 memory. By fine-tuning our tree barriers, we achieve 1.6x speed-up with respect to a naive central counter barrier and just 6.2% overhead on a typical 5G application, including a challenging multistage synchronization kernel. To our knowledge, this is the first work where shared-memory barriers are used for the synchronization of a thousand processing elements tightly coupled to shared data memory.

Comment: 15 pages, 7 figures
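For readers unfamiliar with the baseline named in the abstract, a central counter barrier can be sketched as follows. This is a generic, minimal Python illustration using a condition variable, not the TeraPool kernel or any code from the paper; the names `CentralCounterBarrier` and `run_phases` are hypothetical. Every thread increments one shared counter, and the last arrival flips a shared "sense" flag that releases the others. Tree barriers, by contrast, combine arrivals in smaller groups so that fewer threads contend on any single counter.

```python
import threading


class CentralCounterBarrier:
    """Sense-reversing barrier built on one shared counter (illustrative only)."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.count = 0          # arrivals in the current episode
        self.sense = False      # flipped by the last arrival of each episode
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            # The value the shared flag will take once this episode completes.
            local_sense = not self.sense
            self.count += 1
            if self.count == self.num_threads:
                # Last arrival: reset the counter and release everyone.
                self.count = 0
                self.sense = local_sense
                self.cond.notify_all()
            else:
                # Earlier arrivals sleep until the flag flips.
                while self.sense != local_sense:
                    self.cond.wait()


def run_phases(num_threads=4, num_phases=3):
    """Run a fork-join style loop and return the order in which work was logged."""
    barrier = CentralCounterBarrier(num_threads)
    log = []
    log_lock = threading.Lock()

    def worker(tid):
        for phase in range(num_phases):
            with log_lock:
                log.append((phase, tid))   # stand-in for real per-phase work
            barrier.wait()                 # nobody starts phase+1 early

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Calling `run_phases(4, 3)` returns twelve `(phase, thread)` entries in which no entry for a later phase appears before all entries of the earlier phase, which is the property a barrier is meant to guarantee.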

arXiv.org e-Print Archive

Parallel Performance of MPI Sorting Algorithms on Dual-Core Processor Windows-Based Systems

Author: Elnashar Alaa Ismail
Publication venue: Academy and Industry Research Collaboration Center (AIRCC)
Publication date: 30/05/2011

Message Passing Interface (MPI) is widely used to implement parallel programs. Although Windows-based architectures provide facilities for parallel execution and multi-threading, little attention has been paid to using MPI on these platforms. In this paper we use a dual-core Windows-based platform to study the effect of the number of parallel processes and the number of cores on the performance of three MPI parallel implementations of some sorting algorithms.

arXiv.org e-Print Archive

CiteSeerX

Crossref