22 research outputs found
Towards Scheduling Evolving Applications
Most high-performance computing resource managers only allow applications to request a static allocation of resources. However, evolving applications have resource requirements which change (evolve) during their execution. Currently, such applications are forced to request an allocation based on their peak resource requirements, which leads to inefficient resource usage. This paper studies whether it makes sense for resource managers to support evolving applications. It focuses on scheduling fully-predictably evolving applications on homogeneous resources, for which it proposes several algorithms and evaluates them through simulations. Results show that resource usage and application response time can be significantly improved with short scheduling times.
An RMS for Non-predictably Evolving Applications
Non-predictably evolving applications are applications that change their resource requirements during execution. Such applications arise, for example, from the use of adaptive numeric methods, such as adaptive mesh refinement and adaptive particle methods. There is increasing interest in letting such applications acquire resources on the fly. However, current HPC Resource Management Systems (RMSs) only allow a static allocation of resources, which cannot be changed after the allocation has started. Therefore, non-predictably evolving applications cannot make efficient use of HPC resources, being forced to request an allocation based on their maximum expected requirements. This paper presents CooRMv2, an RMS which supports efficient scheduling of non-predictably evolving applications. An application can make "pre-allocations" to specify its peak resource usage. The application can then dynamically allocate resources as long as the pre-allocation is not outgrown. Resources which are pre-allocated but not used can be filled by other applications. Results show that the approach is feasible and leads to more efficient resource usage.
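The pre-allocation mechanism described above can be illustrated with a small sketch (a toy model with hypothetical names, not the actual CooRMv2 interface): an application reserves its peak node count up front, then grows and shrinks within that bound, while pre-allocated-but-unused nodes remain visible as spare capacity that other applications could fill.

```python
# Toy sketch of pre-allocation vs. dynamic allocation (hypothetical API,
# not CooRMv2's): an application reserves a peak, resizes within it, and
# unused pre-allocated nodes stay available to other applications.

class ToyRMS:
    def __init__(self, total_nodes):
        self.total_nodes = total_nodes
        self.prealloc = {}   # app_id -> reserved peak node count
        self.alloc = {}      # app_id -> nodes currently in use

    def preallocate(self, app_id, peak):
        # admission control: peaks of all applications must fit the machine
        if sum(self.prealloc.values()) + peak > self.total_nodes:
            raise RuntimeError("not enough capacity for pre-allocation")
        self.prealloc[app_id] = peak
        self.alloc[app_id] = 0

    def resize(self, app_id, nodes):
        # a dynamic allocation must stay within its pre-allocation
        if nodes > self.prealloc[app_id]:
            raise RuntimeError("pre-allocation outgrown")
        self.alloc[app_id] = nodes

    def spare_nodes(self):
        # pre-allocated but unused nodes can be filled by other applications
        return self.total_nodes - sum(self.alloc.values())

rms = ToyRMS(total_nodes=64)
rms.preallocate("amr_app", peak=48)
rms.resize("amr_app", 16)
print(rms.spare_nodes())  # 48 nodes currently idle, fillable by others
```

The key design point is that admission control is done against the declared peaks, while actual usage may be far below them.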
Diet-ethic: Fair Scheduling of Optional Computations in GridRPC Middleware
Most HPC platforms require users to submit a pre-determined number of computation requests (also called jobs). Unfortunately, this is cumbersome when some of the computations are optional, i.e., they are not critical, but their completion would improve results. For example, given a deadline, the number of requests to submit for a Monte Carlo experiment is difficult to choose. The more requests are completed, the better the results are; however, submitting too many might overload the platform. Conversely, submitting too few requests may leave resources unused and miss an opportunity to improve the results. This paper introduces and solves the problem of scheduling optional computations. An architecture which auto-tunes the number of requests is proposed, then implemented in the DIET GridRPC middleware. Experiments on real platforms, such as Grid'5000, show that several metrics are improved, such as user satisfaction, fairness and the number of completed requests. Moreover, the solution is shown to be scalable.
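The core difficulty the abstract describes, picking a request count the user cannot know in advance, can be sketched with a back-of-the-envelope model (ours, not DIET's policy): given a deadline, a per-request duration, and a degree of platform parallelism, compute how many optional requests actually fit.

```python
# Minimal sketch (hypothetical model, not the DIET middleware's logic) of
# auto-tuning the number of optional requests: keep adding requests while
# the estimated completion time still fits within the deadline.

def autotune_requests(deadline, request_time, workers):
    """Return how many optional requests fit before `deadline`, assuming
    each request takes `request_time` seconds and the platform runs
    `workers` requests in parallel (a batch takes one request_time)."""
    submitted = 0
    while -(-(submitted + 1) // workers) * request_time <= deadline:
        submitted += 1  # one more request still completes in time
    return submitted

# With a 100 s deadline, 10 s requests, and 4 parallel workers:
print(autotune_requests(deadline=100, request_time=10, workers=4))  # 40
```

The point of auto-tuning is precisely that these quantities are observed by the middleware at runtime rather than estimated by the user.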
Integrating multiple clusters for compute-intensive applications
Multicluster grids provide one promising solution to satisfying the growing computational demands of compute-intensive applications. However, it is challenging to seamlessly integrate all participating clusters in different domains into a single virtual computational platform. In order to fully utilize the capabilities of multicluster grids, computer scientists need to deal with the issue of joining together participating autonomic systems practically and efficiently to execute grid-enabled applications. Driven by several compute-intensive applications, this thesis develops a multicluster grid management toolkit called Pelecanus to bridge the gap between users' needs and the system's heterogeneity. Application scientists will be able to conduct very large-scale execution across multiclusters with transparent QoS assurance. A novel model called DA-TC (Dynamic Assignment with Task Containers) is developed and integrated into Pelecanus. This model uses the concept of a task container, which decouples resource allocation from resource binding. It employs static load balancing for task container distribution and dynamic load balancing for task assignment. In this manner, the slowest resources become useful rather than bottlenecks. A cluster abstraction is implemented, which not only provides various cluster information for the DA-TC execution model, but can also be used as a standalone toolkit to monitor and evaluate the clusters' functionality and performance. The performance of the proposed DA-TC model is evaluated both theoretically and experimentally. Results demonstrate the importance of reducing queuing time in decreasing the total turnaround time for an application. Experiments were conducted to understand the performance of various aspects of the DA-TC model. Experiments showed that the model could significantly reduce turnaround time and increase resource utilization for the targeted application scenarios.
Four applications are implemented as case studies to determine the applicability of the DA-TC model. In each case the turnaround time is greatly reduced, which demonstrates that the DA-TC model is efficient for assisting application scientists in conducting their research. In addition, virtual resources were integrated into the DA-TC model for application execution. Experiments show that the execution model proposed in this thesis can work seamlessly with multiple hybrid grid/cloud resources to achieve reduced turnaround time.
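The DA-TC idea of decoupling resource allocation from resource binding can be illustrated with a small simulation (our illustration; the names are not the Pelecanus API): task containers are placed on clusters statically, but individual tasks are bound to containers dynamically in a pull fashion, so a slow resource simply completes fewer tasks instead of stalling the whole run.

```python
# Illustrative sketch of dynamic assignment with task containers (our toy
# model, not the Pelecanus implementation): containers pull tasks from a
# shared queue, so slower containers are not bottlenecks.

from collections import deque

def run_da_tc(tasks, container_speeds):
    """Simulate pull-based task binding. `container_speeds` maps each
    container id to how many tasks it finishes per simulation step."""
    queue = deque(tasks)                      # tasks not yet bound
    done = {cid: [] for cid in container_speeds}
    while queue:
        for cid, speed in container_speeds.items():
            for _ in range(speed):            # each step: pull up to `speed`
                if not queue:
                    break
                done[cid].append(queue.popleft())
    return done

# A fast and a slow container share 12 tasks; the slow one still contributes
# without delaying completion:
result = run_da_tc(list(range(12)), {"fast": 3, "slow": 1})
print(len(result["fast"]), len(result["slow"]))  # 9 3
```

Contrast this with static binding, where assigning 6 tasks to each container up front would make total runtime hinge on the slow container.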
Scheduling Rigid, Evolving Applications on Homogeneous Resources
Classical applications executed on clusters or grids are either rigid/moldable or workflow-based. However, the growth of computing and storage capabilities has enabled more complex applications. For example, some code-coupling applications exhibit changing resource requirements without being workflows. Executing them on current batch schedulers leads to inefficient resource usage, as a block of resources has to be reserved for the whole duration of the application. This paper studies the problem of offline scheduling of rigid and evolving applications on homogeneous resources. It proposes several scheduling algorithms and evaluates them based on simulations. Results show that significant makespan and resource usage improvements can be achieved with short scheduling computation times.
Decentralized Online Scheduling of Malleable NP-hard Jobs
In this work, we address an online job scheduling problem in a large distributed computing environment. Each job has a priority and a demand of resources, takes an unknown amount of time, and is malleable, i.e., the number of allotted workers can fluctuate during its execution. We subdivide the problem into (a) determining a fair amount of resources for each job and (b) assigning each job to a corresponding number of processing elements. Our approach is fully decentralized, uses lightweight communication, and arranges each job as a binary tree of workers which can grow and shrink as necessary. Using the NP-complete problem of propositional satisfiability (SAT) as a case study, we experimentally show on up to 128 machines (6144 cores) that our approach leads to near-optimal utilization, imposes minimal computational overhead, and performs fair scheduling of incoming jobs within a few milliseconds.
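The binary-tree arrangement of workers mentioned above can be sketched briefly (our illustration of the general idea, not the paper's implementation): workers fill the tree level by level, so growing or shrinking the allotment only touches the last workers.

```python
# Sketch of a malleable job's workers arranged as a binary tree (our toy
# illustration): worker i has children 2i+1 and 2i+2, and resizing the job
# adds or removes workers at the end of the last level.

def tree_levels(num_workers):
    """Return the worker ids grouped by tree level for a job that has been
    allotted `num_workers` workers."""
    levels = []
    i = 0
    depth = 0
    while i < num_workers:
        width = min(2 ** depth, num_workers - i)  # last level may be partial
        levels.append(list(range(i, i + width)))
        i += width
        depth += 1
    return levels

print(tree_levels(6))  # [[0], [1, 2], [3, 4, 5]]
# Growing from 6 to 7 workers only appends worker 6 to the last level.
```

This shape keeps communication local (each worker talks to at most three neighbors) while making grow/shrink operations cheap.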
Malleable task-graph scheduling with a practical speed-up model
Scientific workloads are often described by Directed Acyclic task Graphs. Indeed, DAGs represent both a model frequently studied in the theoretical literature and the structure employed by dynamic runtime schedulers to handle HPC applications. A natural problem is then to compute a makespan-minimizing schedule of a given graph. In this paper, we are motivated by task graphs arising from multifrontal factorizations of sparse matrices and therefore work under the following practical model. We focus on malleable tasks (i.e., a single task can be allotted a time-varying number of processors) and specifically on a simple yet realistic speedup model: each task can be perfectly parallelized, but only up to a limited number of processors. We first prove that the associated decision problem of minimizing the makespan is NP-complete. Then, we study a widely used algorithm, PropScheduling, under this practical model and propose a new strategy, GreedyFilling. Even though both strategies are 2-approximations, experiments on real and synthetic data sets show that GreedyFilling achieves significantly lower makespans.
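The speed-up model described above admits a one-line formulation (our sketch of the stated model): a task with total work w and parallelism cap p_max, run on p processors, takes w / min(p, p_max) time units, i.e., perfect speedup up to the cap and no benefit beyond it.

```python
# The bounded-parallelism speedup model from the abstract, written out:
# perfect parallelism up to p_max processors, flat beyond that.

def task_time(w, p, p_max):
    """Execution time of a malleable task with work `w` and parallelism
    cap `p_max` when allotted `p` processors."""
    return w / min(p, p_max)

print(task_time(w=100, p=4, p_max=8))   # 25.0 -- below the cap, perfect speedup
print(task_time(w=100, p=16, p_max=8))  # 12.5 -- processors beyond 8 are wasted
```

A scheduler under this model thus has no incentive to allot a task more than its cap, which is what strategies like GreedyFilling exploit when packing processors across ready tasks.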
Enhancing the performance of malleable MPI applications by using performance-aware dynamic reconfiguration
The work in this paper focuses on providing malleability to MPI applications by using a novel performance-aware dynamic reconfiguration technique. This paper describes the design and implementation of Flex-MPI, an MPI library extension which can automatically monitor and predict the performance of applications, balance and redistribute the workload, and reconfigure the application at runtime by changing the number of processes. Unlike existing approaches, our reconfiguration policy is guided by user-defined performance criteria. We focus on iterative SPMD programs, a class of applications widely used within the scientific community. Extensive experiments show that Flex-MPI can improve the performance, parallel efficiency, and cost-efficiency of MPI programs with minimal effort from the programmer. This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under the project TIN2013-41350-P, Scalable Data Management Techniques for High-End Computing Systems, and the EU under the COST Program Action IC1305, Network for Sustainable Ultrascale Computing (NESUS).
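A performance-aware reconfiguration policy of the kind described above can be sketched as a simple control loop (hypothetical logic in the spirit of the description, not the Flex-MPI API): monitor the observed iteration time, compare it against a user-defined performance target, and change the process count when the criterion is missed.

```python
# Hedged sketch of performance-aware dynamic reconfiguration (our toy
# policy, not Flex-MPI's): grow the process count when iterations run
# slower than the user-defined target, shrink when there is ample slack.

def reconfigure(nprocs, iter_time, target_time, max_procs):
    """Return a new process count for the next phase of an iterative
    SPMD application, based on the last measured iteration time."""
    if iter_time > target_time and nprocs < max_procs:
        return min(max_procs, nprocs * 2)   # too slow: spawn more processes
    if iter_time < 0.5 * target_time and nprocs > 1:
        return max(1, nprocs // 2)          # large slack: release processes
    return nprocs                           # within criteria: keep as is

print(reconfigure(nprocs=8, iter_time=3.0, target_time=2.0, max_procs=32))  # 16
print(reconfigure(nprocs=8, iter_time=0.8, target_time=2.0, max_procs=32))  # 4
```

In a real MPI setting the grow step would rely on dynamic process management (e.g., spawning new processes and redistributing data), which is where the workload-balancing machinery described in the abstract comes in.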