Search CORE

22,531 research outputs found

CoreTSAR: Task Scheduling for Accelerator-aware Runtimes

Author: de Supinski Bronis R.
Feng Wu-chun
Rountree Barry
Scogland Thomas R. W.
Publication venue
Publication date: 01/01/2012
Field of study

Heterogeneous supercomputers that incorporate computational accelerators such as GPUs are increasingly popular due to their high peak performance, energy efficiency and comparatively low cost. Unfortunately, the programming models and frameworks designed to extract performance from all computational units still lack the flexibility of their CPU-only counterparts. Accelerated OpenMP improves this situation by supporting natural migration of OpenMP code from CPUs to a GPU. However, these implementations currently lose one of OpenMP’s best features, its flexibility: typical OpenMP applications can run on any number of CPUs. GPU implementations do not transparently employ multiple GPUs on a node or a mix of GPUs and CPUs. To address these shortcomings, we present CoreTSAR, our runtime library for dynamically scheduling tasks across heterogeneous resources, and propose straightforward extensions that incorporate this functionality into Accelerated OpenMP. We show that our approach can provide nearly linear speedup to four GPUs over only using CPUs or one GPU while increasing the overall flexibility of Accelerated OpenMP

Computer Science Technical Reports @Virginia Tech

An efficient MPI/OpenMP parallelization of the Hartree-Fock method for the second generation of Intel Xeon Phi processor

Author: Alexeev Yuri
D'mello Michael
Gordon Mark S.
Keipert Kristopher
Mironov Vladimir
Moskovsky Alexander
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/08/2017
Field of study

Modern OpenMP threading techniques are used to convert the MPI-only Hartree-Fock code in the GAMESS program to a hybrid MPI/OpenMP algorithm. Two separate implementations that differ by the sharing or replication of key data structures among threads are considered, density and Fock matrices. All implementations are benchmarked on a super-computer of 3,000 Intel Xeon Phi processors. With 64 cores per processor, scaling numbers are reported on up to 192,000 cores. The hybrid MPI/OpenMP implementation reduces the memory footprint by approximately 200 times compared to the legacy code. The MPI/OpenMP code was shown to run up to six times faster than the original for a range of molecular system sizes.Comment: SC17 conference paper, 12 pages, 7 figure

arXiv.org e-Print Archive

Crossref

A Comparison of some recent Task-based Parallel Programming Models

Author: Brorsson Mats
Faxén Karl-Filip
Podobas Artur
Publication venue
Publication date: 01/01/2010
Field of study

The need for parallel programming models that are simple to use and at the same time efficient for current ant future parallel platforms has led to recent attention to task-based models such as Cilk++, Intel TBB and the task concept in OpenMP version 3.0. The choice of model and implementation can have a major impact on the final performance and in order to understand some of the trade-offs we have made a quantitative study comparing four implementations of OpenMP (gcc, Intel icc, Sun studio and the research compiler Mercurium/nanos mcc), Cilk++ and Wool, a high-performance task-based library developed at SICS. Abstract. We use microbenchmarks to characterize costs for task-creation and stealing and the Barcelona OpenMP Tasks Suite for characterizing application performance. By far Wool and Cilk++ have the lowest overhead in both spawning and stealing tasks. This is reflected in application performance when many tasks with small granularity are spawned where Cilk++ and, in particular, has the highest performance. For coarse granularity applications, the OpenMP implementations have quite similar performance as the more light-weight Cilk++ and Wool except for one application where mcc is superior thanks to a superior task scheduler. Abstract. The OpenMP implemenations are generally not yet ready for use when the task granularity becomes very small. There is no inherent reason for this, so we expect future implementations of OpenMP to focus on this issue

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

An Efficient OpenMP Runtime System for Hierarchical Arch

Author: Broquedis François
Goglin Brice
Namyst Raymond
Thibault Samuel
Wacrenier Pierre-André
Publication venue
Publication date: 04/06/2007
Field of study

Exploiting the full computational power of always deeper hierarchical multiprocessor machines requires a very careful distribution of threads and data among the underlying non-uniform architecture. The emergence of multi-core chips and NUMA machines makes it important to minimize the number of remote memory accesses, to favor cache affinities, and to guarantee fast completion of synchronization steps. By using the BubbleSched platform as a threading backend for the GOMP OpenMP compiler, we are able to easily transpose affinities of thread teams into scheduling hints using abstractions called bubbles. We then propose a scheduling strategy suited to nested OpenMP parallelism. The resulting preliminary performance evaluations show an important improvement of the speedup on a typical NAS OpenMP benchmark application

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

A static scheduling approach to enable safety-critical OpenMP applications

Author: Bertogna Marko
Buttazzo Giorgio
Cerutti Isabella
Melani Alessandra
Quiñones Eduardo
Serrano Maria A.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

Parallel computation is fundamental to satisfy the performance requirements of advanced safety-critical systems. OpenMP is a good candidate to exploit the performance opportunities of parallel platforms. However, safety-critical systems are often based on static allocation strategies, whereas current OpenMP implementations are based on dynamic schedulers. This paper proposes two OpenMP-compliant static allocation approaches: an optimal but costly approach based on an ILP formulation, and a sub-optimal but tractable approach that computes a worst-case makespan bound close to the optimal one.This work is funded by the EU projects P-SOCRATES (FP7-ICT-2013-10) and HERCULES (H2020/ICT/2015/688860), and the Spanish Ministry of Science and Innovation under contract TIN2015-65316-P.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Testing Programs That Contain OpenMP Directives

Author: Barnhart Bob
Publication venue: ScholarWorks@GVSU
Publication date: 01/01/2006
Field of study

OpenMP is a standard of compiler directives for C and Fortran programs that allow a developer to parallelize existing code. In this master\u27s project, the topic of tests for code that has been parallelized using OpenMP is addressed. How should a developer test a program to make sure that the directives have not modified the expected results of the code

Scholarworks@GVSU