
690 research outputs found

Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime

Author: Cabral Bruno
Correia Ivo
Fonseca Alcides
Rafael João
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/04/2016

There are billions of lines of sequential code in today's software that do not benefit from the parallelism available in modern multicore architectures. Automatically parallelizing sequential code, to promote efficient use of the available parallelism, has been a research goal for some time. This work proposes a new approach to achieving that goal. We created a new parallelizing compiler that analyses the read and write instructions and control-flow modifications in programs to identify a set of dependencies between the instructions in the program. Afterwards, the compiler, based on the generated dependency graph, rewrites and organizes the program in a task-oriented structure. Parallel tasks are composed of instructions that cannot be executed in parallel. A work-stealing-based parallel runtime is responsible for scheduling and managing the granularity of the generated tasks. Furthermore, a compile-time granularity control mechanism avoids creating unnecessary data structures. This work focuses on the Java language, but the techniques are general enough to be applied to other programming languages. We have evaluated our approach on 8 benchmark programs against OoOJava, achieving higher speedups. In some cases, values were close to those of a manual parallelization. The resulting parallel code also has the advantage of being readable and easily configured to further improve its performance manually. Comment: Accepted for Publicatio

arXiv.org e-Print Archive

Crossref

Task migration of DSP application specified with a DFG and implemented with the BSP computing model on a CPU-GPU cluster

Author: Fristot Vincent
Houzet Dominique
Huet Sylvain
Mansouri Farouk
Publication venue: HAL CCSD
Publication date: 08/10/2013

Nowadays, computer applications are becoming heavier while also requiring real-time results. Heterogeneous clusters, with their computing power, are a good solution to this demand. However, during execution a computing element of the cluster may become faulty or need maintenance, or the load may need to be rebalanced. In this paper, we propose a migration strategy for relocating the execution of a task to another computing element. In particular, we are interested in remapping the nodes of a Data Flow Graph (DFG), representing a Digital Signal Processing (DSP) application, onto heterogeneous (CPU-GPU) clusters while keeping up the flow of data and minimizing the temporal perturbation. For our approach, we give a lower bound for the flow of data after the migration and validate it through the real-time construction of a visual saliency map from video input.

Hal - Université Grenoble Alpes

A Survey of Pipelined Workflow Scheduling: Models and Algorithms

Author: Benoit Anne
Catalyurek Umit
Robert Yves
Saule Erik
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013

A large class of applications needs to execute the same workflow on different data sets of identical size. Efficient execution of such applications requires intelligent distribution of the application components and tasks on a parallel machine, and the execution can be orchestrated by using task, data, pipelined, and/or replicated parallelism. The scheduling problem that encompasses all of these techniques is called pipelined workflow scheduling, and it has been widely studied in the last decade. Multiple models and algorithms have flourished to tackle various programming paradigms, constraints, machine behaviors, or optimization goals. This paper surveys the field by summing up and structuring known results and approaches.

HAL-ENS-LYON

CiteSeerX

INRIA a CCSD electronic archive server

HAL: Hyper Article en Ligne

Hal-Diderot

An OpenMP Programming Environment on Mobile Devices

Publication venue: 'Hindawi Limited'
Publication date: 01/01/2016

Crossref

Parameter estimation in large-scale systems biology models: a parallel and self-adaptive cooperative strategy

Author: Banga Julio R.
Doallo Ramón
Egea Jose A.
González Patricia
Penas David R.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

[Abstract] Background: The development of large-scale kinetic models is one of the current key issues in computational systems biology and bioinformatics. Here we consider the problem of parameter estimation in nonlinear dynamic models. Global optimization methods can be used to solve this type of problem, but the associated computational cost is very large. Moreover, many of these methods require tuning a number of adjustable search parameters, which calls for a number of initial exploratory runs and therefore further increases the computation times. Here we present a novel parallel method, self-adaptive cooperative enhanced scatter search (saCeSS), to accelerate the solution of this class of problems. The method is based on the scatter search optimization metaheuristic and incorporates several key new mechanisms: (i) asynchronous cooperation between parallel processes, (ii) coarse and fine-grained parallelism, and (iii) self-tuning strategies. Results: The performance and robustness of saCeSS are illustrated by solving a set of challenging parameter estimation problems, including medium and large-scale kinetic models of the bacterium E. coli, baker's yeast S. cerevisiae, the vinegar fly D. melanogaster, Chinese Hamster Ovary cells, and a generic signal transduction network. The results consistently show that saCeSS is a robust and efficient method, allowing very significant reductions in computation times with respect to several previous state-of-the-art methods (from days to minutes, in several cases) even when only a small number of processors is used. Conclusions: The new parallel cooperative method presented here allows the solution of medium and large-scale parameter estimation problems in reasonable computation times and with small hardware requirements. Further, the method includes self-tuning mechanisms which facilitate its use by non-experts. We believe that this new method can play a key role in the development of large-scale and even whole-cell dynamic models.

Funding: Ministerio de Economía y Competitividad (DPI2011-28112-C04-03, DPI2011-28112-C04-04, DPI2014-55276-C5-2-R, TIN2013-42148-P, TIN2016-75845-P); Galicia. Consellería de Cultura, Educación e Ordenación Universitaria (R2014/041, R2016/045, GRC2013/05)

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Performance Aspects of Synthesizable Computing Systems

Author: Schleuniger Pascal
Publication venue: Technical University of Denmark
Publication date: 01/01/2014

Online Research Database In Technology

The hArtes Tool Chain

Author: A. Antola
A. Cerruto
A. Lattanzi
A. Michelotti
A. Morea
C. Pilato
D. Sciuto
E. Ciavattini
F. Bettarelli
F. Ferrandi
J.G.F. Coutinho
K. Bertels
K. Sigdel
M. Lattuada
M.T. Chiaradia
R. Nutricato
R.J. Meeuws
T. Todman
V.M. Sima
W. Luk
Y. Yankova
Y.M. Lam
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

This chapter describes the design steps needed to go from legacy code to a transformed application that can be efficiently mapped onto the hArtes platform.

Archivio istituzionale della ricerca - Politecnico di Milano

Exploiting Parallelism in GPUs

Author: Hechtman Blake Alan

Heterogeneous processors with accelerators provide an opportunity to improve performance within a given power budget. Many of these heterogeneous processors contain Graphics Processing Units (GPUs) that can perform graphics and embarrassingly parallel computation orders of magnitude faster than a CPU while using less energy. Beyond these obvious applications for GPUs, a larger variety of applications can benefit from a GPU's large computation and memory bandwidth. However, many of these applications are irregular and, as a result, require synchronization and scheduling that are commonly believed to perform poorly on GPUs. The basic building block of synchronization and scheduling is memory consistency, which is, therefore, the first place to look for improving performance on irregular applications. In this thesis, we approach the programmability of irregular applications on GPUs by thinking across traditional boundaries of the compute stack. We think about architecture, microarchitecture, and runtime systems from the programmer's perspective. To this end, we study architectural memory consistency on future GPUs with cache coherence. In addition, we design a GPU memory system microarchitecture that can support fine-grain and coarse-grain synchronization without sacrificing throughput. Finally, we develop a task runtime that embraces the GPU microarchitecture to perform well on the fork/join parallelism desired by many programmers. Overall, this thesis contributes non-intuitive solutions to improve the performance and programmability of irregular applications from the programmer's perspective. Dissertatio

DukeSpace (Duke Univ.)