13 research outputs found
A Bag-of-Tasks Scheduler Tolerant to Temporal Failures in Clouds
Cloud platforms have emerged as a prominent environment to execute high
performance computing (HPC) applications providing on-demand resources as well
as scalability. They usually offer different classes of Virtual Machines (VMs)
which ensure different guarantees in terms of availability and volatility,
provisioning the same resource through multiple pricing models. For instance,
in the Amazon EC2 cloud, the user pays per hour for on-demand VMs while spot VMs
are unused instances available at a lower price. Despite the monetary
advantages, a spot VM can be terminated, stopped, or hibernated by EC2 at any
moment.
Using both hibernation-prone spot VMs (to reduce cost) and on-demand VMs, we
propose in this paper a static scheduling approach for HPC applications that are
composed of independent tasks (bag-of-tasks) with deadline constraints. However,
if a spot VM hibernates and it does not resume within a time which guarantees
the application's deadline, a temporal failure takes place. Our scheduling
thus aims at minimizing the monetary cost of bag-of-tasks applications in the EC2
cloud while respecting their deadlines and avoiding temporal failures. To this end, our
algorithm statically creates two scheduling maps: (i) the first one contains,
for each task, its starting time and on which VM (i.e., an available spot or
on-demand VM with the current lowest price) the task should execute; (ii) the
second one contains, for each task allocated to a spot VM in the first map, its
starting time and on which on-demand VM it should be executed to meet the
application deadline in order to avoid temporal failures. The latter will be
used whenever the hibernation period of a spot VM exceeds a time limit.
Performance results from simulations using task execution traces, the configuration
of Amazon EC2 VM classes, and VM market history confirm the effectiveness of
our scheduling and show that it tolerates temporal failures.
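The two-map idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the greedy "cheapest VM that still meets the deadline" rule, the backward packing of backup slots, and every name here (`VM`, `build_maps`, the prices) are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    market: str            # "spot" or "on-demand"
    price_per_hour: float  # illustrative value, not real EC2 pricing


def build_maps(tasks, vms, deadline):
    """Return (primary, backup) maps: task -> (vm_name, start_time).

    tasks maps a task id to its runtime in seconds. All times are in
    seconds, measured from the start of the application.
    """
    busy_until = {vm.name: 0.0 for vm in vms}
    primary = {}
    # Primary map: place the longest tasks first on the cheapest VM
    # that can still finish them before the deadline.
    for task, runtime in sorted(tasks.items(), key=lambda kv: -kv[1]):
        fits = [vm for vm in vms if busy_until[vm.name] + runtime <= deadline]
        if not fits:
            raise ValueError(f"task {task} cannot meet the deadline")
        vm = min(fits, key=lambda v: (v.price_per_hour, busy_until[v.name]))
        primary[task] = (vm.name, busy_until[vm.name])
        busy_until[vm.name] += runtime

    # Backup map: every task placed on a spot VM gets a slot on an
    # on-demand VM, packed backwards from the deadline so that each
    # backup starts as late as possible.
    spot_names = {vm.name for vm in vms if vm.market == "spot"}
    on_demand = [vm for vm in vms if vm.market == "on-demand"]
    latest_end = {vm.name: deadline for vm in on_demand}
    backup = {}
    for task, (vm_name, _) in primary.items():
        if vm_name not in spot_names:
            continue
        runtime = tasks[task]
        fits = [vm for vm in on_demand
                if latest_end[vm.name] - runtime >= busy_until[vm.name]]
        if not fits:
            raise ValueError(f"no on-demand backup slot for task {task}")
        vm = min(fits, key=lambda v: v.price_per_hour)
        start = latest_end[vm.name] - runtime
        backup[task] = (vm.name, start)
        latest_end[vm.name] = start
    return primary, backup
```

With two tasks of 100 s and 50 s, one spot VM, one on-demand VM, and a 400 s deadline, both tasks land on the spot VM in the primary map, and the backup map reserves the last 150 s before the deadline on the on-demand VM.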
Multi-FedLS: a Framework for Cross-Silo Federated Learning Applications on Multi-Cloud Environments
Federated Learning (FL) is a distributed Machine Learning (ML) technique that
can benefit from cloud environments while preserving data privacy. We propose
Multi-FedLS, a framework that manages multi-cloud resources, reducing execution
time and financial costs of Cross-Silo Federated Learning applications by using
preemptible VMs, which are cheaper than on-demand ones but can be revoked at any
time. Our framework comprises four modules: Pre-Scheduling, Initial Mapping,
Fault Tolerance, and Dynamic Scheduler. This paper extends our previous work
\cite{brum2022sbac} by formally describing the Multi-FedLS resource manager
framework and its modules. Experiments were conducted with three Cross-Silo FL
applications on CloudLab, and a proof of concept confirms that Multi-FedLS can
be executed on a multi-cloud environment composed of AWS and GCP, two commercial cloud
providers. Results show that the problem of executing Cross-Silo FL
applications in multi-cloud environments with preemptible VMs can be
efficiently solved using a mathematical formulation, fault tolerance
techniques, and a simple heuristic to choose a new VM in case of revocation.
Comment: In review by Journal of Parallel and Distributed Computing
A Hibernation Aware Dynamic Scheduler for Cloud Environments
Nowadays, cloud platforms usually offer several types of Virtual Machines (VMs) with different guarantees in terms of availability and volatility, provisioning the same resource through multiple pricing models. For instance, in the Amazon EC2 cloud, the user pays per hour for on-demand VMs while spot VMs are unused instances available at a lower price. Despite the monetary advantages, a spot VM can be terminated or hibernated by EC2 at any moment. In this work, we propose the Hibernation-Aware Dynamic Scheduler (HADS) to schedule applications composed of independent tasks (bag-of-tasks) with deadline constraints on both hibernation-prone spot VMs (to reduce cost) and on-demand VMs. We also consider the problem of temporal failures, which occur when a spot VM hibernates and does not resume within a time that guarantees the application's deadline. Our dynamic scheduling approach aims at minimizing the monetary cost of executing bag-of-tasks applications while respecting their deadlines even in the presence of hibernation. It is also able to avoid temporal failures by using task migration and work-stealing techniques. Experimental results with real executions using Amazon EC2 VMs confirm the effectiveness of our scheduling, in terms of monetary cost and execution time, when compared with approaches based only on on-demand VMs. They also show that our strategy can tolerate temporal failures.
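The temporal-failure condition mentioned above can be made concrete with a small sketch. It assumes a simplified model that is not taken from the paper: the work left on a hibernated spot VM is a single known duration, and migration to an on-demand VM has a fixed overhead. The function and parameter names are hypothetical.

```python
def latest_migration_time(deadline, remaining_runtime, migration_overhead=0.0):
    """Latest moment (in seconds) at which the remaining work must start
    moving to an on-demand VM to still finish by the deadline."""
    return deadline - remaining_runtime - migration_overhead


def must_migrate(now, deadline, remaining_runtime, migration_overhead=0.0):
    """True once waiting any longer for the spot VM to resume would
    cause a temporal failure under this simplified model."""
    limit = latest_migration_time(deadline, remaining_runtime, migration_overhead)
    return now >= limit
```

For a deadline of 3600 s, 900 s of remaining work, and a 60 s migration overhead, the scheduler in this model can wait for the spot VM to resume until second 2640 and must migrate after that.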
MScheduler: Leveraging Spot Instances for High-Performance Reservoir Simulation in the Cloud
Petroleum reservoir simulation uses computer models to predict fluid flow in porous media, helping to forecast oil production. Engineers execute numerous simulations with different geological realizations to refine the accuracy of the model. These experiments require considerable computational resources, which are not always available within the on-premises infrastructure. Commercial public cloud platforms can offer many advantages, such as virtually unlimited scalability and pay-per-use pricing. This paper introduces MSCHEDULER, a meta scheduler framework for reservoir simulations at Petrobras, a Brazilian energy company. It efficiently executes jobs in the cloud, using spot Virtual Machines (VMs) to reduce costs while ensuring job completion even when VMs are terminated. Contributions include a novel methodology for reservoir simulation checkpointing, a cost-based scheduler, and an analysis of the strategy using real production jobs from Petrobras.
Design and analyses of web scraping on burstable virtual machines
Web scraping is a widely used technique for decision-making, collecting, and structuring public data from the internet. As the volume of data continues to grow, the need for more efficient methods of data extraction becomes crucial. This article introduces a novel web scraping framework that utilizes Burstable virtual machines (VMs) on Amazon Web Services with the objective of reducing the monetary cost of execution while ensuring compliance with service level agreements (SLAs). To achieve this, the framework utilizes a combination of fixed and temporary Burstable VMs in a mixed cluster, which can be elastically scaled up to fulfill the SLA and scaled down to minimize monetary costs. Two strategies for handling VM allocation are proposed and evaluated: (i) a queue and SLA-based strategy that employs queue size information and SLA criteria to determine the required number of VMs for the current scraping requests, and (ii) a credit-based strategy that incorporates information about Burstable VM credits to effectively manage instance creation and termination. Experimental tests show that the proposed framework meets the defined SLA while achieving cost reductions of up to 74% compared to an approach that executes on fixed-size clusters of Burstable instances.
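A queue and SLA-based sizing rule of the kind described in strategy (i) could look like the sketch below. The formula is an assumption, not the one from the article: it supposes that each request takes a known average time on one VM and that all queued requests must finish within the SLA window. The function and parameter names are hypothetical.

```python
import math


def temporary_vms_needed(queue_size, seconds_per_request, sla_seconds, fixed_vms):
    """Number of temporary VMs to add on top of the fixed ones so that
    the queued requests can finish within the SLA window."""
    if sla_seconds <= 0:
        raise ValueError("sla_seconds must be positive")
    # Total sequential work in the queue, in seconds.
    total_work = queue_size * seconds_per_request
    # VMs required if the work is spread evenly over the SLA window.
    required = math.ceil(total_work / sla_seconds)
    # The fixed part of the cluster already covers some of the demand.
    return max(0, required - fixed_vms)
```

With 1000 queued requests at 2 s each, a 600 s SLA, and 2 fixed VMs, this rule asks for 2 temporary VMs; when the queue is empty it asks for none, which corresponds to scaling the cluster back down to its fixed part.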