Search CORE

10 research outputs found

Enabling Locality-Aware Computations in OpenMP

Author
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2010
Field of study

An Efficient OpenMP Loop Scheduler for Irregular Applications on Large-Scale NUMA Machines

Author: A. Mahéo
E. Ayguadé
F. Broquedis
F. Broquedis
L. Huang
M. Frigo
M. Ihmsen
S. Che
S. Subramaniam
S.L. Olivier
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013
Field of study

International audienceNowadays shared memory HPC platforms expose a large number of cores organized in a hierarchical way. Parallel application programmers strug- gle to express more and more fine-grain parallelism and to ensure locality on such NUMA platforms. Independent loops stand as a natural source of paral- lelism. Parallel environments like OpenMP provide ways of parallelizing them efficiently, but the achieved performance is closely related to the choice of pa- rameters like the granularity of work or the loop scheduler. Considering that both can depend on the target computer, the input data and the loop workload, the application programmer most of the time fails at designing both portable and ef- ficient implementations. We propose in this paper a new OpenMP loop scheduler, called adaptive, that dynamically adapts the granularity of work considering the underlying system state. Our scheduler is able to perform dynamic load balancing while taking memory affinity into account on NUMA architectures. Results show that adaptive outperforms state-of-the-art OpenMP loop schedulers on memory- bound irregular applications, while obtaining performance comparable to static on parallel loops with a regular workload

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-Rennes 1

Locality-Aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors

Author
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2015
Field of study

Crossref

Automatic, Abstracted and Portable Topology-Aware Thread Placement

Author: Gustedt Jens
Jeannot Emmanuel
Mansouri Farouk
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2017
Field of study

International audienceEfficiently programming shared-memory machines is a difficult challenge because mapping application threads onto the memory hierarchy has a strong impact on the performance. However, optimizing such thread placement is difficult: architectures become increasingly complex and application behavior changes with implementations and input parameters, e.g problem size and number of threads. In this work, we propose a fully automatic, abstracted and portable affinity module. It produces and implements an optimized affinity strategy that combines knowledge about application characteristics and the platform topology. Implemented in the back-end of our runtime system (ORWL), our approach was used to enhance the performance and the scalability of several unmodified ORWL-coded applications: matrix multiplication, a 2D stencil (Livermore Kernel 23), and a video tracking real world application. On two SMP machines with quite different hardware characteristics, our tests show spectacular performance improvements for these unmodified application codes due to a dramatic decrease of cache misses and pipeline stalls. A comparison to reference implementations using OpenMP confirms this performance gain of almost one order of magnitude

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

Optimizing Data Locality for Fork/Join Programs Using Constrained Work Stealing

Author
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

Recommended from our members

Center for Programming Models for Scalable Parallel Computing - Towards Enhancing OpenMP for Manycore and Heterogeneous Nodes

Author: Chapman Barbara
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date: 01/02/2012
Field of study

OpenMP was not well recognized at the beginning of the project, around year 2003, because of its limited use in DoE production applications and the inmature hardware support for an efficient implementation. Yet in the recent years, it has been graduately adopted both in HPC applications, mostly in the form of MPI+OpenMP hybrid code, and in mid-scale desktop applications for scientific and experimental studies. We have observed this trend and worked deligiently to improve our OpenMP compiler and runtimes, as well as to work with the OpenMP standard organization to make sure OpenMP are evolved in the direction close to DoE missions. In the Center for Programming Models for Scalable Parallel Computing project, the HPCTools team at the University of Houston (UH), directed by Dr. Barbara Chapman, has been working with project partners, external collaborators and hardware vendors to increase the scalability and applicability of OpenMP for multi-core (and future manycore) platforms and for distributed memory systems by exploring different programming models, language extensions, compiler optimizations, as well as runtime library support

UNT Digital Library

Center for Programming Models for Scalable Parallel Computing - Towards Enhancing OpenMP for Manycore and Heterogeneous Nodes

Author: Chapman Barbara
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date: 01/02/2012
Field of study

Crossref

UNT Digital Library

Abstractions for Portable Data Management in Heterogeneous Memory Systems

Author: Foyer Clement M
Publication venue
Publication date: 24/06/2021
Field of study

Explore Bristol Research

Locality Awareness for Task Parallel Computation

Author: Olivier Stephen Lecler
Publication venue: University of North Carolina at Chapel Hill
Publication date: 01/01/2012
Field of study

The task parallel programming model allows programmers to express concurrency at a high level of abstraction and places the burden of scheduling parallel execution on the run time system. Efficient scheduling of tasks on multi-socket multicore shared memory systems requires careful consideration of an increasingly complex memory hierarchy, including shared caches and non-uniform memory access (NUMA) characteristics. In this dissertation, we study the performance impact of these issues and other performance factors that limit parallel speedup in task parallel program executions and propose new scheduling strategies to improve performance. Our performance model characterizes lost efficiency in terms of overhead time, idle time, and work time inflation due to increased data access costs. We introduce a hierarchical run time scheduler that combines the benefits of work stealing and parallel depth-first schedulers. Matching the scheduler design to the memory hierarchy of multicore NUMA systems limits costly remote data accesses while maintaining load balance and exploiting constructive data sharing among threads that share a cache. We also propose a locality- based scheduling framework based on locality domains and comprising an API for programmers to specify application locality and a scheduler that honors those specifications. Implementations of the hierarchical and locality-based schedulers in our OpenMP run time system exhibit performance improvements on several task parallel benchmark applications over existing scheduling strategies and production OpenMP run time systems.Doctor of Philosoph

Carolina Digital Repository

Enabling Locality-Aware Computations in OpenMP

Author: Barbara Chapman
Haoqiang Jin
Lei Huang
Liqi Yi
Publication venue: Hindawi Limited
Publication date: 01/01/2010
Field of study

Locality of computation is key to obtaining high performance on a broad variety of parallel architectures and applications. It is moreover an essential component of strategies for energy-efficient computing. OpenMP is a widely available industry standard for shared memory programming. With the pervasive deployment of multi-core computers and the steady growth in core count, a productive programming model such as OpenMP is increasingly expected to play an important role in adapting applications to this new hardware. However, OpenMP does not provide the programmer with explicit means to program for locality. Rather it presents the user with a “flat” memory model. In this paper, we discuss the need for explicit programmer control of locality within the context of OpenMP and present some ideas on how this might be accomplished. We describe potential extensions to OpenMP that would enable the user to manage a program's data layout and to align tasks and data in order to minimize the cost of data accesses. We give examples showing the intended use of the proposed features, describe our current implementation and present some experimental results. Our hope is that this work will lead to efforts that would help OpenMP to be a major player on emerging, multi- and many-core architectures

Directory of Open Access Journals