Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime
Billions of lines of sequential code in today's software
do not benefit from the parallelism available in modern multicore
architectures. Automatically parallelizing sequential code, to promote an
efficient use of the available parallelism, has been a research goal for some
time now. This work proposes a new approach to achieving this goal. We created
a new parallelizing compiler that analyses the read and write instructions and
control-flow modifications in programs to identify a set of dependencies
between the instructions in the program. The compiler then uses the generated
dependency graph to rewrite and organize the program in a
task-oriented structure. Parallel tasks are composed of instructions that
cannot be executed in parallel. A work-stealing-based parallel runtime is
responsible for scheduling and managing the granularity of the generated tasks.
Furthermore, a compile-time granularity control mechanism also avoids creating
unnecessary data structures. This work focuses on the Java language, but the
techniques are general enough to be applied to other programming languages. We
have evaluated our approach on 8 benchmark programs against OoOJava, achieving
higher speedups. In some cases, speedups were close to those of a manual
parallelization. The resulting parallel code also has the advantage of being
readable and easily configured to further improve its performance manually.
Comment: Accepted for Publicatio
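The dependency analysis described in this abstract can be illustrated with a minimal sketch. It is not the authors' compiler: it assumes each instruction is summarized only by the variables it reads and writes, ignores control flow, and groups mutually independent instructions into "waves" rather than building the task structure the paper describes. The names `Instr`, `build_dependencies`, `schedule_levels`, and `PROGRAM` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One instruction, summarized by the variables it reads and writes."""
    name: str
    reads: frozenset
    writes: frozenset


def depends(earlier, later):
    """Return True if `later` must run after `earlier`."""
    return bool(
        earlier.writes & later.reads      # read-after-write
        or earlier.reads & later.writes   # write-after-read
        or earlier.writes & later.writes  # write-after-write
    )


def build_dependencies(program):
    """Map each instruction index to the indices of earlier instructions it depends on."""
    deps = {j: set() for j in range(len(program))}
    for j, later in enumerate(program):
        for i in range(j):
            if depends(program[i], later):
                deps[j].add(i)
    return deps


def schedule_levels(program):
    """Group instructions into waves; instructions in one wave are mutually independent."""
    deps = build_dependencies(program)
    level = {}
    for j in range(len(program)):
        # An instruction runs one wave after the latest instruction it depends on.
        level[j] = 1 + max((level[i] for i in deps[j]), default=-1)
    waves = {}
    for j, lv in level.items():
        waves.setdefault(lv, []).append(program[j].name)
    return [waves[lv] for lv in sorted(waves)]


# Hypothetical sequential program:
#   s1: a = load()    s2: b = load()
#   s3: c = a + 1     s4: d = b * 2
#   s5: e = c + d
PROGRAM = [
    Instr("s1", frozenset(), frozenset({"a"})),
    Instr("s2", frozenset(), frozenset({"b"})),
    Instr("s3", frozenset({"a"}), frozenset({"c"})),
    Instr("s4", frozenset({"b"}), frozenset({"d"})),
    Instr("s5", frozenset({"c", "d"}), frozenset({"e"})),
]

if __name__ == "__main__":
    print(schedule_levels(PROGRAM))
```

In this toy program, `s1` and `s2` have no dependencies, `s3` and `s4` each depend on one of them, and `s5` depends on both `s3` and `s4`. A real compiler would also have to account for aliasing, control flow, and task granularity, which this sketch leaves out.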
Task migration of DSP application specified with a DFG and implemented with the BSP computing model on a CPU-GPU cluster
Computer applications are becoming increasingly demanding while also requiring real-time results. Heterogeneous clusters, with their computing power, are a good answer to this need. However, during execution a computing element of the cluster may fail or need maintenance, or the load may need to be rebalanced. In this paper, we propose a migration strategy for relocating the execution of a task to another computing element. In particular, we are interested in remapping the nodes of a Data Flow Graph (DFG) representing a Digital Signal Processing (DSP) application onto heterogeneous (CPU-GPU) clusters while maintaining the flow of data and minimizing the temporal perturbation. For our approach, we give a lower bound for the flow of data after the migration and validate it through the real-time construction of a visual saliency map from video input.
A Survey of Pipelined Workflow Scheduling: Models and Algorithms
A large class of applications needs to execute the same workflow on different data sets of identical size. Efficient execution of such applications necessitates intelligent distribution of the application components and tasks on a parallel machine, and the execution can be orchestrated by utilizing task-, data-, pipelined-, and/or replicated-parallelism. The scheduling problem that encompasses all of these techniques is called pipelined workflow scheduling, and it has been widely studied in the last decade. Multiple models and algorithms have flourished to tackle various programming paradigms, constraints, machine behaviors or optimization goals. This paper surveys the field by summing up and structuring known results and approaches.
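As a rough illustration of the optimization goals mentioned above, the sketch below evaluates one mapping of a linear pipeline onto processors. It assumes a simplified model that is not taken from the survey text: stages form a chain, each processor receives a contiguous interval of stages, and communication costs are ignored. Under these assumptions the period (the inverse of the throughput) is the time of the slowest processor, and the latency is the sum of all processor times. The function name `evaluate_mapping` and the numbers are hypothetical.

```python
def evaluate_mapping(work, intervals, speeds):
    """Return (period, latency) of an interval mapping of a linear pipeline.

    work      -- computation cost of each stage, in pipeline order
    intervals -- one (first, last) inclusive stage range per processor
    speeds    -- speed of each processor, in work units per time unit
    Communication costs are ignored (an assumption of this sketch).
    """
    times = [
        sum(work[first:last + 1]) / speed
        for (first, last), speed in zip(intervals, speeds)
    ]
    period = max(times)   # a new data set can enter every `period` time units
    latency = sum(times)  # time for one data set to traverse the whole pipeline
    return period, latency


if __name__ == "__main__":
    work = [4, 2, 6, 2]
    # Three processors: stages 0-1, stage 2, stage 3.
    print(evaluate_mapping(work, [(0, 1), (2, 2), (3, 3)], [2, 3, 1]))
    # Everything on a single processor of speed 2.
    print(evaluate_mapping(work, [(0, 3)], [2]))
```

With these made-up numbers, both mappings have the same latency, but splitting the chain across three processors shortens the period and so raises the throughput. Trade-offs of this kind are what pipelined workflow scheduling models aim to capture.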
Parameter estimation in large-scale systems biology models: a parallel and self-adaptive cooperative strategy
[Abstract]
Background
The development of large-scale kinetic models is one of the current key issues in computational systems biology and bioinformatics. Here we consider the problem of parameter estimation in nonlinear dynamic models. Global optimization methods can be used to solve this type of problem, but the associated computational cost is very large. Moreover, many of these methods require tuning a number of adjustable search parameters, which calls for a number of initial exploratory runs and therefore further increases the computation times.
Here we present a novel parallel method, self-adaptive cooperative enhanced scatter search (saCeSS), to accelerate the solution of this class of problems. The method is based on the scatter search optimization metaheuristic and incorporates several key new mechanisms: (i) asynchronous cooperation between parallel processes, (ii) coarse and fine-grained parallelism, and (iii) self-tuning strategies.
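The ideas of cooperation and self-tuning can be sketched on a toy problem. The code below is not saCeSS and does not implement scatter search: it swaps in a simple pattern search, runs the "workers" one after another in a single process instead of asynchronously in parallel, and fits a single rate constant of a hypothetical decay model to synthetic data. All names (`cost`, `cooperative_search`, `TRUE_K`) and values are assumptions made for illustration.

```python
import math

# Synthetic data from a hypothetical one-parameter decay model y = exp(-k * t).
TIMES = [0.0, 1.0, 2.0, 3.0, 4.0]
TRUE_K = 0.73  # rate constant used only to generate the synthetic data
DATA = [math.exp(-TRUE_K * t) for t in TIMES]


def cost(k):
    """Sum of squared errors between the model and the synthetic data."""
    return sum((math.exp(-k * t) - y) ** 2 for t, y in zip(TIMES, DATA))


def cooperative_search(starts, steps, rounds=60):
    """Toy cooperative search with self-tuned step sizes (not scatter search)."""
    workers = [{"k": k, "step": s} for k, s in zip(starts, steps)]
    for _ in range(rounds):
        for w in workers:
            # Local move: try one step in each direction and keep an improvement.
            candidates = [w["k"] + w["step"], w["k"] - w["step"]]
            best = min(candidates, key=cost)
            if cost(best) < cost(w["k"]):
                w["k"] = best
            else:
                # Self-tuning: shrink the step when neither direction helps.
                w["step"] /= 2.0
        # Cooperation: every worker that is behind adopts the best solution so far.
        leader = min(workers, key=lambda w: cost(w["k"]))
        for w in workers:
            if cost(leader["k"]) < cost(w["k"]):
                w["k"] = leader["k"]
    return min((w["k"] for w in workers), key=cost)


if __name__ == "__main__":
    estimate = cooperative_search(starts=[0.1, 2.0, 5.0], steps=[0.5, 0.1, 0.02])
    print(estimate, cost(estimate))
```

Each worker starts from a different guess and a different step size, so no single setting has to be tuned by hand, and sharing the best solution lets workers that started far away catch up quickly. In a genuinely parallel version the workers would run as separate processes and exchange solutions asynchronously.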
Results
The performance and robustness of saCeSS are illustrated by solving a set of challenging parameter estimation problems, including medium and large-scale kinetic models of the bacterium E. coli, baker's yeast S. cerevisiae, the vinegar fly D. melanogaster, Chinese Hamster Ovary cells, and a generic signal transduction network.
The results consistently show that saCeSS is a robust and efficient method, allowing a very significant reduction of computation times with respect to several previous state-of-the-art methods (from days to minutes, in several cases), even when only a small number of processors is used.
Conclusions
The new parallel cooperative method presented here allows the solution of medium and large scale parameter estimation problems in reasonable computation times and with small hardware requirements. Further, the method includes self-tuning mechanisms which facilitate its use by non-experts. We believe that this new method can play a key role in the development of large-scale and even whole-cell dynamic models.
Ministerio de Economía y Competitividad; DPI2011-28112-C04-03
Ministerio de Economía y Competitividad; DPI2011-28112-C04-04
Ministerio de Economía y Competitividad; DPI2014-55276-C5-2-R
Ministerio de Economía y Competitividad; TIN2013-42148-P
Ministerio de Economía y Competitividad; TIN2016-75845-P
Galicia. Consellería de Cultura, Educación e Ordenación Universitaria; R2014/041
Galicia. Consellería de Cultura, Educación e Ordenación Universitaria; R2016/045
Galicia. Consellería de Cultura, Educación e Ordenación Universitaria; GRC2013/05
The hArtes Tool Chain
This chapter describes the different design steps needed to go from legacy code to a transformed application that can be efficiently mapped on the hArtes platform.
Exploiting Parallelism in GPUs
Heterogeneous processors with accelerators provide an opportunity to improve performance within a given power budget. Many of these heterogeneous processors contain Graphics Processing Units (GPUs) that can perform graphics and embarrassingly parallel computation orders of magnitude faster than a CPU while using less energy. Beyond these obvious applications for GPUs, a larger variety of applications can benefit from a GPU's large computation and memory bandwidth. However, many of these applications are irregular and, as a result, require synchronization and scheduling that are commonly believed to perform poorly on GPUs. The basic building block of synchronization and scheduling is memory consistency, which is, therefore, the first place to look for improving performance on irregular applications. In this thesis, we approach the programmability of irregular applications on GPUs by thinking across traditional boundaries of the compute stack. We think about architecture, microarchitecture and runtime systems from the programmer's perspective. To this end, we study architectural memory consistency on future GPUs with cache coherence. In addition, we design a GPU memory system microarchitecture that can support fine-grain and coarse-grain synchronization without sacrificing throughput. Finally, we develop a task runtime that embraces the GPU microarchitecture to perform well on the fork/join parallelism desired by many programmers. Overall, this thesis contributes non-intuitive solutions to improve the performance and programmability of irregular applications from the programmer's perspective.
Dissertatio
