Search CORE

9,722 research outputs found

Scheduling independent tasks on multi-cores with GPU accelerators

Author: Bleuse Raphaël
Kedad-Sidhoum Safia
Monna Florence
Mounié Grégory
Trystram Denis
Publication venue: 'Wiley'
Publication date: 07/09/2014
Field of study

International audienceMore and more computers use hybrid architectures combining multi-core processors and hardware accelerators like GPUs (Graphics Process-ing Units). We present in this paper a new method for scheduling efficiently parallel applications with m CPUs and k GPUs, where each task of the appli-cation can be processed either on a core (CPU) or on a GPU. The objective is to minimize the maximum completion time (makespan). The corresponding scheduling problem is NP-hard, we propose an efficient approximation algo-rithm which achieves an approximation ratio of 4 3 + 1 3k . We first detail and analyze the method, based on a dual approximation scheme, that uses dynamic programming to balance evenly the load between the heterogeneous resources. Then, we present a faster approximation algorithm for a special case of the previous problem, where all the tasks are accelerated when affected to GPU, with a performance guarantee of 3 2 for any number of GPUs. We run some simulations based on realistic benchmarks and compare the solutions obtained by a relaxed version of the generic method to the one provided by a classical scheduling algorithm (HEFT). Finally, we present an implementation of the 4/3-approximation and its relaxed version on a classical linear algebra kernel into the scheduler of the xKaapi runtime system

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

On Designing Multicore-aware Simulators for Biological Systems

Author: Aldinucci Marco
Coppo Mario
Damiani Ferruccio
Drocco Maurizio
Torquati Massimo
Troina Angelo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/10/2010
Field of study

The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics. It often is an enlightening technique, which may however result in being computational expensive. We discuss the main opportunities to speed it up on multi-core platforms, which pose new challenges for parallelisation techniques. These opportunities are developed in two general families of solutions involving both the single simulation and a bulk of independent simulations (either replicas of derived from parameter sweep). Proposed solutions are tested on the parallelisation of the CWC simulator (Calculus of Wrapped Compartments) that is carried out according to proposed solutions by way of the FastFlow programming framework making possible fast development and efficient execution on multi-cores.Comment: 19 pages + cover pag

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Institutional Research Information System University of Turin

A Framework for Approximate Optimization of BoT Application Deployment in Hybrid Cloud Environment

Author: Hoseinyfarahabady Mohammadreza
Publication venue: Faculty of Engineering and Information Technologies, School of Information Technologies
Publication date: 01/01/2015
Field of study

We adopt a systematic approach to investigate the efficiency of near-optimal deployment of large-scale CPU-intensive Bag-of-Task applications running on cloud resources with the non-proportional cost to performance ratios. Our analytical solutions perform in both known and unknown running time of the given application. It tries to optimize users' utility by choosing the most desirable tradeoff between the make-span and the total incurred expense. We propose a schema to provide a near-optimal deployment of BoT application regarding users' preferences. Our approach is to provide user with a set of Pareto-optimal solutions, and then she may select one of the possible scheduling points based on her internal utility function. Our framework can cope with uncertainty in the tasks' execution time using two methods, too. First, an estimation method based on a Monte Carlo sampling called AA algorithm is presented. It uses the minimum possible number of sampling to predict the average task running time. Second, assuming that we have access to some code analyzer, code profiling or estimation tools, a hybrid method to evaluate the accuracy of each estimation tool in certain interval times for improving resource allocation decision has been presented. We propose approximate deployment strategies that run on hybrid cloud. In essence, proposed strategies first determine either an estimated or an exact optimal schema based on the information provided from users' side and environmental parameters. Then, we exploit dynamic methods to assign tasks to resources to reach an optimal schema as close as possible by using two methods. A fast yet simple method based on First Fit Decreasing algorithm, and a more complex approach based on the approximation solution of the transformed problem into a subset sum problem. Extensive experiment results conducted on a hybrid cloud platform confirm that our framework can deliver a near optimal solution respecting user's utility function

Sydney eScholarship

Recommended from our members

Preparing sparse solvers for exascale computing.

Author: Anzt Hartwig
Boman Erik
Curfman McInnes Lois
Falgout Rob
Ghysels Pieter
Heroux Michael
Li Xiaoye
Meier Yang Ulrike
Rajamanickam Sivasankaran
Rupp Karl
Smith Barry
Tran Mills Richard
Yamazaki Ichitaro
Publication venue: eScholarship, University of California
Publication date: 01/03/2020
Field of study

Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'

eScholarship - University of California

Pipelining the Fast Multipole Method over a Runtime System

Author: Agullo Emmanuel
Bramas Béranger
Coulaud Olivier
Darve Eric
Messner Matthias
Toru Takahashi
Publication venue
Publication date: 01/01/2012
Field of study

Fast Multipole Methods (FMM) are a fundamental operation for the simulation of many physical problems. The high performance design of such methods usually requires to carefully tune the algorithm for both the targeted physics and the hardware. In this paper, we propose a new approach that achieves high performance across architectures. Our method consists of expressing the FMM algorithm as a task flow and employing a state-of-the-art runtime system, StarPU, in order to process the tasks on the different processing units. We carefully design the task flow, the mathematical operators, their Central Processing Unit (CPU) and Graphics Processing Unit (GPU) implementations, as well as scheduling schemes. We compute potentials and forces of 200 million particles in 48.7 seconds on a homogeneous 160 cores SGI Altix UV 100 and of 38 million particles in 13.34 seconds on a heterogeneous 12 cores Intel Nehalem processor enhanced with 3 Nvidia M2090 Fermi GPUs.Comment: No. RR-7981 (2012

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

Scheduling Independent Moldable Tasks on Multi-Cores with GPUs

Author: Bleuse Raphaël
Hunold Sascha
Kedad-Sidhoum Safia
Monna Florence
Mounié Grégory
Trystram Denis
Publication venue: HAL CCSD
Publication date: 01/01/2016
Field of study

The number of parallel systems using accelerators is growing up.The technology is now mature enough to allow sustainedpetaflop/s. However, reaching this performance scale requiresefficient scheduling algorithms to manage the heterogeneouscomputing resources.We present a new approach for scheduling independent tasks onmultiple CPUs and multiple GPUs. The tasks are assumed to beparallelizable on CPUs using the moldable model: the final numberof cores allotted to a task can be decided and set by thescheduler. More precisely, we design an algorithm aiming atminimizing the makespan---the maximum completion time of alltasks---for this scheduling problem. The proposed algorithmcombines a dual approximation scheme with a fast integer linearprogram (ILP). It determines both the partitioning of the tasks,ie whether a task should be mapped to CPUs or a GPU, and thenumber of CPUs allotted to a moldable task if mapped to the CPUs.A worst case analysis shows that the algorithm has anapproximation ratio of

\frac{3}{2} + \epsilon

. However, sincethe complexity of the ILP-based algorithm could benon-polynomial, we also present a proved polynomial-timealgorithm with an approximation ratio of

2+\epsilon

.We complement the theoretical analysis of our two novelalgorithms with an experimental study. In these experiments, wecompare our algorithms to a modified version of the classical\heft algorithm, adapted to handle moldable tasks. Theexperimental results show that our algorithm with the

\frac{3}{2} + \epsilon

approximation ratio producessignificantly shorter schedules than the modified \heft for mostof the instances. In addition, the experiments provide evidencethat this ILP-based algorithm is also practically able to solvelarger problem instances in a reasonable amount of time

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server