
58 research outputs found

A Bag-of-Tasks Scheduler Tolerant to Temporal Failures in Clouds

Author: Arantes Luciana
Drummond Lúcia Maria de A.
Sens Pierre
Teylo Luan
Publication date: 24/10/2018

Cloud platforms have emerged as a prominent environment for executing high performance computing (HPC) applications, providing on-demand resources as well as scalability. They usually offer different classes of Virtual Machines (VMs) with different guarantees in terms of availability and volatility, provisioning the same resource through multiple pricing models. For instance, in the Amazon EC2 cloud, the user pays per hour for on-demand VMs, while spot VMs are unused instances available at a lower price. Despite the monetary advantages, a spot VM can be terminated, stopped, or hibernated by EC2 at any moment. Using both hibernation-prone spot VMs (for the sake of cost) and on-demand VMs, we propose in this paper a static scheduling for HPC applications composed of independent tasks (bag-of-tasks) with deadline constraints. If a spot VM hibernates and does not resume within a time that guarantees the application's deadline, a temporal failure takes place. Our scheduling thus aims at minimizing the monetary costs of bag-of-tasks applications in the EC2 cloud, respecting their deadlines and avoiding temporal failures. To this end, our algorithm statically creates two scheduling maps: (i) the first contains, for each task, its starting time and the VM (i.e., an available spot or on-demand VM with the current lowest price) on which the task should execute; (ii) the second contains, for each task allocated to a spot VM in the first map, its starting time and the on-demand VM on which it should be executed to meet the application deadline and so avoid temporal failures. The latter is used whenever the hibernation period of a spot VM exceeds a time limit. Performance results from simulation with task execution traces, the configuration of Amazon EC2 VM classes, and VM market history confirm the effectiveness of our scheduling and show that it tolerates temporal failures.
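The two-map idea in this abstract can be illustrated with a small sketch. It is not the authors' algorithm: the greedy cheapest-first placement, the `VM` and `Task` types, the prices, and the rule that backups go to the on-demand VM that becomes idle earliest are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    kind: str      # "spot" or "on-demand"
    price: float   # hypothetical price per time unit


@dataclass
class Task:
    name: str
    runtime: float


def build_maps(tasks, vms, deadline):
    """Return (primary, backup): task name -> (start time, VM name)."""
    primary, backup = {}, {}
    free_at = {vm.name: 0.0 for vm in vms}          # when each VM becomes idle
    by_price = sorted(vms, key=lambda vm: vm.price)  # cheapest first (stable sort)

    # Primary map: cheapest VM on which the task still finishes by the deadline.
    for task in tasks:
        for vm in by_price:
            start = free_at[vm.name]
            if start + task.runtime <= deadline:
                primary[task.name] = (start, vm.name)
                free_at[vm.name] = start + task.runtime
                break
        else:
            raise ValueError(f"{task.name} cannot meet the deadline")

    # Backup map: every task placed on a spot VM also gets an on-demand slot.
    # Assumption: backup slots are queued after the VM's primary tasks.
    kind_of = {vm.name: vm.kind for vm in vms}
    on_demand = [vm for vm in vms if vm.kind == "on-demand"]
    for task in tasks:
        if kind_of[primary[task.name][1]] != "spot":
            continue
        vm = min(on_demand, key=lambda v: free_at[v.name])
        start = free_at[vm.name]
        if start + task.runtime > deadline:
            raise ValueError(f"no on-demand backup for {task.name}")
        backup[task.name] = (start, vm.name)
        free_at[vm.name] = start + task.runtime
    return primary, backup
```

In this sketch the backup map is consulted only for tasks whose spot VM has been hibernated for too long; how that time limit is chosen is left out.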

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

On reducing the complexity of matrix clocks

Author: Lúcia M.A. Drummond
Valmir C. Barbosa
Publication venue: Elsevier BV
Publication date: 23/09/2003

Matrix clocks are a generalization of the notion of vector clocks that allows the local representation of causal precedence to reach into an asynchronous distributed computation's past with depth $x$, where $x \ge 1$ is an integer. Maintaining matrix clocks correctly in a system of $n$ nodes requires that every message be accompanied by $O(n^x)$ numbers, which reflects an exponential dependency of the complexity of matrix clocks upon the desired depth $x$. We introduce a novel type of matrix clock, one that requires only $nx$ numbers to be attached to each message while maintaining what for many applications may be the most significant portion of the information that the original matrix clock carries. In order to illustrate the new clock's applicability, we demonstrate its use in the monitoring of certain resource-sharing computations.
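For readers unfamiliar with the base notion, the following is a minimal sketch of a standard vector clock, which is what matrix clocks generalize. It is not the reduced matrix clock introduced in the paper. The helper `numbers_per_message` only compares the two piggyback sizes quoted in the abstract, and it takes the constant hidden in the O-notation to be 1, which is an assumption.

```python
class VectorClock:
    """Standard vector clock kept by one node in a system of n nodes."""

    def __init__(self, n, node_id):
        self.node_id = node_id
        self.clock = [0] * n

    def tick(self):
        """Local event: advance this node's own component."""
        self.clock[self.node_id] += 1

    def send(self):
        """Sending counts as an event; the message carries a copy of all n counters."""
        self.tick()
        return list(self.clock)

    def receive(self, stamp):
        """Take the component-wise maximum, then count the receive event."""
        self.clock = [max(a, b) for a, b in zip(self.clock, stamp)]
        self.tick()


def happened_before(a, b):
    """True if timestamp a causally precedes timestamp b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def numbers_per_message(n, x):
    """(n**x, n*x): numbers attached per message for the original and the reduced clock.

    Assumption: the constant factor in O(n^x) is taken to be 1.
    """
    return n ** x, n * x
```

With 10 nodes and depth 3, `numbers_per_message(10, 3)` gives 1000 against 30, which shows the gap the abstract describes.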

arXiv.org e-Print Archive

Crossref

Multi-FedLS: a Framework for Cross-Silo Federated Learning Applications on Multi-Cloud Environments

Author: Arantes Luciana
Brum Rafaela C.
de Castro Maria Clicia Stelling
Drummond Lúcia Maria de A.
Sens Pierre
Publication date: 17/08/2023

Federated Learning (FL) is a distributed Machine Learning (ML) technique that can benefit from cloud environments while preserving data privacy. We propose Multi-FedLS, a framework that manages multi-cloud resources, reducing the execution time and financial costs of Cross-Silo Federated Learning applications by using preemptible VMs, which are cheaper than on-demand ones but can be revoked at any time. Our framework comprises four modules: Pre-Scheduling, Initial Mapping, Fault Tolerance, and Dynamic Scheduler. This paper extends our previous work \cite{brum2022sbac} by formally describing the Multi-FedLS resource manager framework and its modules. Experiments were conducted with three Cross-Silo FL applications on CloudLab, and a proof of concept confirms that Multi-FedLS can be executed on a multi-cloud composed of AWS and GCP, two commercial cloud providers. Results show that the problem of executing Cross-Silo FL applications in multi-cloud environments with preemptible VMs can be efficiently resolved using a mathematical formulation, fault tolerance techniques, and a simple heuristic to choose a new VM in case of revocation.
Comment: In review by Journal of Parallel and Distributed Computin

arXiv.org e-Print Archive

Evaluating Execution Times and Costs of a Federated Learning Application on different Cloud Providers

Author: Arantes Luciana
Brum Rafaela
Castro Maria
Drummond Lúcia
Sens Pierre
Publication venue: HAL CCSD
Publication date: 05/07/2022

Federated Learning (FL) is a new area of distributed Machine Learning (ML) that emerged to deal with data privacy concerns. In FL, each client has access to a local private dataset. At every round, a client trains the model with its local dataset and sends the weights to a central server. The latter aggregates all client weights and then sends the final weights back to the clients. This approach is attractive in many domains as it allows multiple institutions to collaborate on an ML task without sharing their data. However, most ML models used in FL have millions of weights, all of which are exchanged in each message. The messages sent between a client and the server can reach gigabytes in size and are exchanged several times over the whole FL execution. This work presents a preliminary analysis of the execution times and costs of an FL application in a multi-cloud scenario. Experiments were conducted considering executions on Amazon Web Services, on Google Cloud Platform, and on both cloud providers at the same time.
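The round structure described in this abstract (local training, upload, aggregation, broadcast) can be sketched as follows. Several parts are assumptions, not details from the paper: the aggregation is a plain unweighted average, `local_update` is a toy stand-in for real training, and the size estimate assumes 4 bytes per weight (32-bit floats).

```python
def local_update(weights, local_data, lr=0.5):
    """Toy stand-in for local training: one gradient step that pulls every
    weight toward the mean of the client's private data (hypothetical model)."""
    target = sum(local_data) / len(local_data)
    return [w - lr * (w - target) for w in weights]


def aggregate(client_weights):
    """Server side: unweighted average of the clients' weight vectors (assumption)."""
    n = len(client_weights)
    return [sum(ws) / n for ws in zip(*client_weights)]


def fl_round(global_weights, datasets):
    """One round: each client trains locally, the server averages the results."""
    updates = [local_update(global_weights, data) for data in datasets]
    return aggregate(updates)


def message_size_bytes(num_weights, bytes_per_weight=4):
    """Approximate payload of one weight message, ignoring protocol overhead."""
    return num_weights * bytes_per_weight
```

Under these assumptions a model with 25 million weights produces messages of about 100 MB each, sent once per client per direction in every round, which is why communication cost matters in a multi-cloud setting.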

INRIA a CCSD electronic archive server

A Hibernation Aware Dynamic Scheduler for Cloud Environments

Author: Arantes Luciana
de A. Drummond Lúcia Maria
Sens Pierre
Teylo Luan
Publication venue: Association for Computing Machinery (ACM)
Publication date: 05/08/2019

Nowadays, cloud platforms usually offer several types of Virtual Machines (VMs) which have different guarantees in terms of availability and volatility, provisioning the same resource through multiple pricing models. For instance, in the Amazon EC2 cloud, the user pays per hour for on-demand VMs, while spot VMs are unused instances available at a lower price. Despite the monetary advantages, a spot VM can be terminated or hibernated by EC2 at any moment. In this work, we propose the Hibernation-Aware Dynamic Scheduler (HADS) to schedule applications composed of independent tasks (bag-of-tasks) with deadline constraints on both hibernation-prone spot VMs (for the sake of cost) and on-demand VMs. We also consider the problem of temporal failures, which occur when a spot VM hibernates and does not resume within a time that guarantees the application's deadline. Our dynamic scheduling approach aims at minimizing the monetary cost of executing bag-of-tasks applications while respecting their deadlines even in the presence of hibernation. It is also able to avoid temporal failures by using task migration and work-stealing techniques. Experimental results with real executions using Amazon EC2 VMs confirm the effectiveness of our scheduling, in terms of monetary costs and execution times, when compared with approaches based only on on-demand VMs. It is also shown that our strategy can tolerate temporal failures.
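The temporal-failure condition described above amounts to a simple time check, sketched below. This is not HADS itself: the function names, the assumption that the remaining tasks of the hibernated VM run one after another, and the optional `migration_overhead` parameter are all hypothetical.

```python
def latest_safe_resume(deadline, remaining_runtimes):
    """Latest time at which a hibernated VM may resume and still finish its
    remaining tasks, run sequentially, by the deadline."""
    return deadline - sum(remaining_runtimes)


def must_migrate(now, deadline, remaining_runtimes, migration_overhead=0.0):
    """True once waiting any longer for the spot VM risks a temporal failure,
    so its tasks should be moved to another VM."""
    return now + migration_overhead >= latest_safe_resume(deadline, remaining_runtimes)
```

For example, with a deadline of 100 and two pending tasks of 20 and 30 time units, the VM must be back by time 50; a scheduler using this check would start migrating at that point, or earlier if moving the tasks itself takes time.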

INRIA a CCSD electronic archive server