2,087 research outputs found
An Efficient Thread Mapping Strategy for Multiprogramming on Manycore Processors
The emergence of multicore and manycore processors is set to change the
parallel computing world. Applications are shifting towards increased
parallelism in order to utilise these architectures efficiently. This leads to
a situation where every application creates its desirable number of threads,
based on its parallel nature and the system resources allowance. Task
scheduling in such a multithreaded multiprogramming environment is a
significant challenge. In task scheduling, not only the order of the execution,
but also the mapping of threads to the execution resources is of a great
importance. In this paper we state and discuss some fundamental rules based on
results obtained from selected applications of the BOTS benchmarks on the
64-core TILEPro64 processor. We demonstrate how previously efficient mapping
policies such as those of the SMP Linux scheduler become inefficient when the
number of threads and cores grows. We propose a novel, low-overhead technique,
a heuristic based on the amount of time spent by each CPU doing some useful
work, to fairly distribute the workloads amongst the cores in a
multiprogramming environment. Our novel approach could be implemented as a
pragma similar to those in the new task-based OpenMP versions, or can be
incorporated as a distributed thread mapping mechanism in future manycore
programming frameworks. We show that our thread mapping scheme can outperform
the native GNU/Linux thread scheduler in both single-programming and
multiprogramming environments.Comment: ParCo Conference, Munich, Germany, 201
RPPM : Rapid Performance Prediction of Multithreaded workloads on multicore processors
Analytical performance modeling is a useful complement to detailed cycle-level simulation to quickly explore the design space in an early design stage. Mechanistic analytical modeling is particularly interesting as it provides deep insight and does not require expensive offline profiling as empirical modeling. Previous work in mechanistic analytical modeling, unfortunately, is limited to single-threaded applications running on single-core processors.
This work proposes RPPM, a mechanistic analytical performance model for multi-threaded applications on multicore hardware. RPPM collects microarchitecture-independent characteristics of a multi-threaded workload to predict performance on a previously unseen multicore architecture. The profile needs to be collected only once to predict a range of processor architectures. We evaluate RPPM's accuracy against simulation and report a performance prediction error of 11.2% on average (23% max). We demonstrate RPPM's usefulness for conducting design space exploration experiments as well as for analyzing parallel application performance
Performance Evaluation of MPI, UPC and OpenMP on Multicore Architectures
This is a post-peer-review, pre-copyedit version of an article published in Lecture Notes in Computer Science. The final authenticated version is available online at: https://doi.org/10.1007/978-3-642-03770-2_24[Abstract] The current trend to multicore architectures underscores the need of parallelism. While new languages and alternatives for supporting more efficiently these systems are proposed, MPI faces this new challenge. Therefore, up-to-date performance evaluations of current options for programming multicore systems are needed. This paper evaluates MPI performance against Unified Parallel C (UPC) and OpenMP on multicore architectures. From the analysis of the results, it can be concluded that MPI is generally the best choice on multicore systems with both shared and hybrid shared/distributed memory, as it takes the highest advantage of data locality, the key factor for performance in these systems. Regarding UPC, although it exploits efficiently the data layout in memory, it suffers from remote shared memory accesses, whereas OpenMP usually lacks efficient data locality support and is restricted to shared memory systems, which limits its scalability.Gobierno de España; TIN2007-67537-C03-0
- …