    Adaptive, efficient, parallel execution of parallel programs

    Future multicore processors will be heterogeneous, increasingly less reliable, and will operate in dynamically changing conditions. Such environments will result in a constantly varying pool of hardware resources, which can greatly complicate the task of efficiently mapping a program's parallelism onto those resources. Coupled with this uncertainty is the diverse set of efficiency metrics that users may desire. This paper proposes Varuna, a system that dynamically, continuously, rapidly and transparently adapts a program's parallelism to best match the instantaneous capabilities of the hardware resources while satisfying different efficiency metrics. Varuna is applicable to both multithreaded and task-based programs and can be seamlessly inserted between the program and the operating system without changing the source code of either. We demonstrate Varuna's effectiveness in diverse execution environments using unaltered C/C++ parallel programs from various benchmark suites. Regardless of the execution environment, Varuna always outperformed the state-of-the-art approaches for the efficiency metrics considered.
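
    A runtime of this kind can be pictured as a monitor thread that continuously estimates the machine's instantaneous capacity and raises or lowers the number of workers actively drawing tasks. The C++ sketch below is a minimal, hypothetical illustration of that idea only, not Varuna's actual interface or algorithm; the capacity probe in the monitor loop is a stand-in.

        #include <algorithm>
        #include <atomic>
        #include <chrono>
        #include <functional>
        #include <mutex>
        #include <queue>
        #include <thread>
        #include <vector>

        std::queue<std::function<void()>> tasks;  // work shared by all workers
        std::mutex m;
        std::atomic<unsigned> target{1};          // desired degree of parallelism
        std::atomic<bool> done{false};

        void worker(unsigned id) {
            while (!done) {
                if (id >= target.load()) {        // parallelism shrank: park this worker
                    std::this_thread::sleep_for(std::chrono::milliseconds(10));
                    continue;
                }
                std::function<void()> task;
                {
                    std::lock_guard<std::mutex> g(m);
                    if (tasks.empty()) { done = true; break; }
                    task = std::move(tasks.front());
                    tasks.pop();
                }
                task();
            }
        }

        int main() {
            for (int i = 0; i < 1000; ++i)
                tasks.push([] { /* CPU-intensive task body */ });

            unsigned maxWorkers = std::max(1u, std::thread::hardware_concurrency());
            std::vector<std::thread> pool;
            for (unsigned id = 0; id < maxWorkers; ++id)
                pool.emplace_back(worker, id);

            while (!done) {                       // monitor: re-read capacity periodically
                target = maxWorkers;              // stand-in for a real load/reliability probe
                std::this_thread::sleep_for(std::chrono::milliseconds(100));
            }
            for (auto& t : pool) t.join();
        }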

    Adaptive Runtime Selection of Parallel Schedules in the Polytope Model

    There is often no single version of a program that provides the best performance in all circumstances. Compilers should rely on an adaptive runtime decision to choose which optimizing and parallelizing transformations will lead to the best performance in any execution context. We present a new adaptive framework that addresses two drawbacks of existing methods: it is effective from the very first execution, and it handles slight variations in input data shape and size. In our proposal, different code versions of parallel loop nests are statically generated by the compiler. At install time, each version is profiled in different execution contexts. At runtime, the execution time of each code version is predicted using the profiling results, the current input data shape, and the number of available processor cores. The predicted best version is then run. Our framework handles several versions of possibly tiled parallel loops, using the polytope model for both the profiling and the dynamic selection phases. We show on several benchmark programs that our runtime system selects one of the most efficient versions with very low runtime overhead. This quick and efficient selection yields speedups over using a single version in every execution context.
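
    In essence, each statically generated version carries a small cost model fitted from the install-time profiling runs, and at run time the dispatcher evaluates every model and launches the predicted winner. The C++ sketch below shows that selection step with a hypothetical linear cost model and invented names; the paper's actual predictors are constructed in the polytope model.

        #include <cstddef>
        #include <vector>

        struct Version {
            void (*run)(const double* in, double* out, std::size_t n);
            // Coefficients fitted at install time; assumed model:
            //   time = a*n/cores + b*n + c
            double a, b, c;
            double predict(std::size_t n, unsigned cores) const {
                return a * n / cores + b * n + c;
            }
        };

        // Run the version predicted to be fastest for this input size and
        // the number of cores currently available (assumes versions is
        // non-empty).
        void runBest(const std::vector<Version>& versions,
                     const double* in, double* out,
                     std::size_t n, unsigned cores) {
            const Version* best = &versions.front();
            for (const auto& v : versions)
                if (v.predict(n, cores) < best->predict(n, cores))
                    best = &v;
            best->run(in, out, n);
        }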

    Adaptive Performance Modeling on Hierarchical Grid Computing Environments

    In the past, efficient parallel algorithms were developed specifically for each successive generation of parallel systems (vector machines, shared-memory machines, distributed-memory machines, etc.). Today, owing to the inherent heterogeneity, diversity, and continuous evolution of existing parallel execution platforms, it is very hard to efficiently solve a target problem with a single algorithm, or to write portable programs that perform well on any platform. Toward this goal, we propose a generic framework based on communication models and adaptive approaches for adaptively modeling performance on grid computing environments. We apply this methodology to collective communication operations and show, through experiments on a real platform, that the framework delivers significant performance while determining the best model-algorithm combination for the given problem and architecture parameters.
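
    As a concrete instance of choosing the best model-algorithm combination: for a broadcast, each candidate algorithm can be costed under a latency/bandwidth model whose parameters were calibrated on the target platform, and the cheapest candidate is chosen for the given message size and node count. The sketch below uses Hockney-style costs with invented parameter values, purely for illustration; the paper's models and calibration procedure are more elaborate.

        #include <cmath>
        #include <cstdio>

        struct LinkModel { double alpha, beta; };   // latency (s), per-byte cost (s/B)

        // Root sends the message to each of the other p-1 nodes in turn.
        double flatTreeBcast(int p, double bytes, LinkModel l) {
            return (p - 1) * (l.alpha + l.beta * bytes);
        }

        // Log-depth tree: the set of informed nodes doubles each round.
        double binomialBcast(int p, double bytes, LinkModel l) {
            return std::ceil(std::log2(p)) * (l.alpha + l.beta * bytes);
        }

        int main() {
            LinkModel lan{1e-5, 1e-9};              // illustrative calibrated values
            int p = 64; double bytes = 1 << 20;     // 64 nodes, 1 MiB message
            double flat = flatTreeBcast(p, bytes, lan);
            double bino = binomialBcast(p, bytes, lan);
            std::printf("flat: %.4fs  binomial: %.4fs  -> pick %s\n",
                        flat, bino, bino < flat ? "binomial" : "flat");
        }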

    Parallelizing dynamic sequential programs using polyhedral process networks

    The Polyhedral Process Network (PPN) is a suitable parallel model of computation (MoC) used to specify embedded streaming applications in a parallel form, facilitating efficient mapping onto embedded parallel execution platforms. Unfortunately, specifying an application using a parallel MoC is a very difficult and highly error-prone task. To overcome the associated difficulties, we have developed the pn compiler, which derives PPN specifications from sequential static affine nested loop programs (SANLPs). However, many applications have adaptive and dynamic behavior which cannot be expressed as SANLPs. To handle such dynamic applications, in this dissertation we address an important question: whether some of the static restrictions of the SANLPs can be relaxed while keeping the ability to perform compile-time analysis and to derive PPNs in an automated way. Achieving this will significantly extend the range of applications that can be parallelized in an automated way. By studying different dynamic applications, we distinguished three relaxations to SANLP programs that allow one to specify dynamic applications as sequential programs: dynamic if-conditions, for-loops with dynamic bounds, and while-loops. The first relaxation has already been considered; in this dissertation, we consider the other two, more difficult, relaxations.
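
    The distinction between the static and relaxed program classes is easiest to see in code. The first loop nest below is a SANLP (bounds affine in outer iterators and parameters, so the iteration space is exactly analyzable at compile time); the other two functions show the relaxations considered in this dissertation. The code is illustrative only, not pn compiler input or output.

        // SANLP: all bounds and conditions are affine in outer iterators
        // and parameters, so the iteration space is known at compile time.
        void sanlp(int N, const float a[], float b[]) {
            for (int i = 0; i < N; ++i)        // affine bound
                for (int j = 0; j <= i; ++j)   // affine in the outer iterator
                    b[i] += a[j];
        }

        // Relaxation: a for-loop whose bound is read from data at run time.
        void dynamicBound(int N, const int n[], const float a[], float b[]) {
            for (int i = 0; i < N; ++i)
                for (int j = 0; j < n[i]; ++j) // bound unknown until run time
                    b[i] += a[j];
        }

        // Relaxation: a while-loop whose trip count is data dependent.
        float whileLoop(float x, float eps) {
            while (x > eps)                    // termination depends on data
                x *= 0.5f;
            return x;
        }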

    Towards an Adaptive Skeleton Framework for Performance Portability

    The proliferation of widely available but very different parallel architectures makes the ability to deliver good parallel performance on a range of architectures, or performance portability, highly desirable. Irregularly-parallel problems, where the number and size of tasks are unpredictable, are particularly challenging and require dynamic coordination. The paper outlines a novel approach to delivering portable parallel performance for irregularly parallel programs. The approach combines declarative parallelism with JIT technology, dynamic scheduling, and dynamic transformation. We present the design of an adaptive skeleton library, with a task graph implementation, JIT trace costing, and adaptive transformations. We outline the architecture of the prototype adaptive skeleton execution framework in Pycket, describing tasks, serialisation, and the current scheduler. We report a preliminary evaluation of the prototype framework using 4 micro-benchmarks and a small case study on two NUMA servers (24 and 96 cores) and a small cluster (17 hosts, 272 cores). Key results include Pycket delivering good sequential performance, e.g. almost as fast as C for some benchmarks; good absolute speedups on all architectures (up to 120 on 128 cores for sumEuler); and that the adaptive transformations do improve performance.
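
    One of the adaptive transformations, re-chunking work to match task granularity, can be sketched independently of Pycket: measure the cost of a single task, then group the remaining tasks so each scheduled unit carries a worthwhile amount of computation. The C++ sketch below is a hypothetical illustration of that transformation only; the actual framework runs on Pycket/Racket and derives costs from JIT traces.

        #include <algorithm>
        #include <chrono>
        #include <cstddef>
        #include <thread>
        #include <vector>

        // Apply f to every element in parallel, choosing the chunk size
        // from a measured per-task cost (assumes xs is non-empty).
        template <typename F>
        void parMap(std::vector<double>& xs, F f) {
            using clock = std::chrono::steady_clock;

            // Cost one task; a stand-in for the framework's trace costing.
            auto t0 = clock::now();
            xs[0] = f(xs[0]);
            double cost = std::chrono::duration<double>(clock::now() - t0).count();

            // Adaptive chunking: aim for roughly 1 ms of work per task.
            std::size_t chunk = std::max<std::size_t>(
                1, static_cast<std::size_t>(1e-3 / std::max(cost, 1e-9)));

            std::vector<std::thread> workers;
            for (std::size_t lo = 1; lo < xs.size(); lo += chunk) {
                std::size_t hi = std::min(lo + chunk, xs.size());
                workers.emplace_back([&xs, &f, lo, hi] {
                    for (std::size_t i = lo; i < hi; ++i) xs[i] = f(xs[i]);
                });
            }
            for (auto& w : workers) w.join();
        }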

    Multidimensional integration in a heterogeneous network environment

    We consider several issues related to multidimensional integration using a network of heterogeneous computers. Based on these considerations, we develop a new general-purpose scheme which can significantly reduce the time needed to evaluate integrals with CPU-intensive integrands. The scheme is a parallel version of the well-known adaptive Monte Carlo method (the VEGAS algorithm) and is incorporated into a new integration package which uses the standard set of message-passing routines in the PVM software system.
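
    The parallelisation follows the classic pattern for Monte Carlo estimators: give each worker an independent random stream and a share of the sample budget, then combine the partial sums. The sketch below shows that pattern with plain (non-adaptive) sampling, using C++ threads as a stand-in for PVM message passing; VEGAS's adaptive grid refinement is omitted.

        #include <algorithm>
        #include <cmath>
        #include <cstdio>
        #include <random>
        #include <thread>
        #include <vector>

        double integrand(double x, double y) { return std::exp(-x * x - y * y); }

        int main() {
            const long nTotal = 4000000;
            unsigned nWorkers = std::max(1u, std::thread::hardware_concurrency());
            std::vector<double> partial(nWorkers, 0.0);  // one slot per worker: no sharing
            std::vector<std::thread> workers;

            for (unsigned w = 0; w < nWorkers; ++w)
                workers.emplace_back([&partial, w, nTotal, nWorkers] {
                    std::mt19937_64 rng(w + 1);          // independent stream per worker
                    std::uniform_real_distribution<double> u(0.0, 1.0);
                    double sum = 0.0;
                    for (long i = 0; i < nTotal / nWorkers; ++i)
                        sum += integrand(u(rng), u(rng));
                    partial[w] = sum;
                });
            for (auto& t : workers) t.join();

            double total = 0.0;
            for (double s : partial) total += s;
            long nDone = (nTotal / nWorkers) * nWorkers;  // samples actually drawn
            std::printf("Monte Carlo estimate over [0,1]^2: %.6f\n", total / nDone);
        }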