Reliable Provisioning of Spot Instances for Compute-intensive Applications
Cloud computing providers are now offering their unused resources for leasing
in the spot market, which has been considered the first step towards a
full-fledged market economy for computational resources. Spot instances are
virtual machines (VMs) available at lower prices than their standard on-demand
counterparts. These VMs will run for as long as the current price is lower than
the maximum bid price users are willing to pay per hour. Spot instances have
been increasingly used for executing compute-intensive applications. Despite
the apparent economic advantage, the intermittent nature of biddable
resources means that application execution may be prolonged or may never
finish. This paper proposes a resource allocation strategy that addresses the
problem of running compute-intensive jobs on a pool of intermittent virtual
machines, while also aiming to run applications in a fast and economical way.
To mitigate potential unavailability periods, a multifaceted fault-aware
resource provisioning policy is proposed. Our solution employs price and
runtime estimation mechanisms, as well as three fault tolerance techniques,
namely checkpointing, task duplication and migration. We evaluate our
strategies using trace-driven simulations, which take as input real price
variation traces, as well as an application trace from the Parallel Workload
Archive. Our results demonstrate the effectiveness of executing applications on
spot instances, respecting QoS constraints, despite occasional failures.
Comment: 8 pages, 4 figure
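The availability rule stated in the abstract above (a spot VM runs for as long as the current price is lower than the user's maximum bid) can be sketched in a few lines. This is only an illustration of that rule, not the paper's provisioning policy: the function name `running_intervals`, the hourly price granularity, and the sample prices are all assumptions made for the example.

```python
from typing import List, Tuple


def running_intervals(prices: List[float], bid: float) -> List[Tuple[int, int]]:
    """Return half-open [start, end) hour ranges during which a spot VM runs.

    Assumption: one price per hour, and the VM runs only while the
    current price is strictly lower than the bid.
    """
    intervals: List[Tuple[int, int]] = []
    start = None  # hour at which the current running period began
    for hour, price in enumerate(prices):
        if price < bid:
            if start is None:
                start = hour  # VM (re)starts
        elif start is not None:
            intervals.append((start, hour))  # VM is terminated
            start = None
    if start is not None:
        intervals.append((start, len(prices)))  # still running at end of trace
    return intervals


# Hypothetical hourly price trace (illustrative values, not real data).
trace = [0.030, 0.032, 0.051, 0.047, 0.029, 0.028]
print(running_intervals(trace, bid=0.040))
```

The gaps between the returned intervals are the unavailability periods that techniques such as checkpointing, task duplication and migration are meant to mitigate.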
A Dual Digraph Approach for Leaderless Atomic Broadcast (Extended Version)
Many distributed systems work on a common shared state; in such systems,
distributed agreement is necessary for consistency. With an increasing number
of servers, these systems become more susceptible to single-server failures,
increasing the relevance of fault-tolerance. Atomic broadcast enables
fault-tolerant distributed agreement, yet it is costly to solve. Most practical
algorithms entail linear work per broadcast message. AllConcur -- a leaderless
approach -- reduces the work, by connecting the servers via a sparse resilient
overlay network; yet, this resiliency entails redundancy, limiting the
reduction of work. In this paper, we propose AllConcur+, an atomic broadcast
algorithm that lifts this limitation: During intervals with no failures, it
achieves minimal work by using a redundancy-free overlay network. When failures
do occur, it automatically recovers by switching to a resilient overlay
network. In our performance evaluation of non-failure scenarios, AllConcur+
achieves comparable throughput to AllGather -- a non-fault-tolerant distributed
agreement algorithm -- and outperforms AllConcur, LCR and Libpaxos both in
terms of throughput and latency. Furthermore, our evaluation of failure
scenarios shows that AllConcur+'s expected performance is robust with regard to
occasional failures. Thus, for realistic use cases, leveraging redundancy-free
distributed agreement during intervals with no failures improves performance
significantly.
Comment: Overview: 24 pages, 6 sections, 3 appendices, 8 figures, 3 tables.
Modifications from previous version: extended the evaluation of AllConcur+
with a simulation of a multiple datacenters deploymen
Datacenter Traffic Control: Understanding Techniques and Trade-offs
Datacenters provide cost-effective and flexible access to scalable compute
and storage resources necessary for today's cloud computing needs. A typical
datacenter is made up of thousands of servers connected with a large network
and usually managed by one operator. To provide quality access to the variety
of applications and services hosted on datacenters and to maximize performance,
it is necessary to use datacenter networks effectively and efficiently.
Datacenter traffic is often a mix of several classes with different priorities
and requirements. This includes user-generated interactive traffic, traffic
with deadlines, and long-running traffic. To this end, custom transport
protocols and traffic management techniques have been developed to improve
datacenter network performance.
In this tutorial paper, we review the general architecture of datacenter
networks, various topologies proposed for them, their traffic properties,
general traffic control challenges in datacenters and general traffic control
objectives. The purpose of this paper is to bring out the important
characteristics of traffic control in datacenters and not to survey all
existing solutions (which is virtually impossible given the massive body of
existing research). We hope to provide readers with a wide range of options and
factors while considering a variety of traffic control mechanisms. We discuss
various characteristics of datacenter traffic control including management
schemes, transmission control, traffic shaping, prioritization, load balancing,
multipathing, and traffic scheduling. Next, we point to several open challenges
as well as new and interesting networking paradigms. At the end of this paper,
we briefly review inter-datacenter networks that connect geographically
dispersed datacenters, which have been receiving increasing attention recently
and pose interesting and novel research problems.
Comment: Accepted for Publication in IEEE Communications Surveys and Tutorial