Timely-Throughput Optimal Coded Computing over Cloud Networks
In modern distributed computing systems, unpredictable and unreliable
infrastructures result in high variability of computing resources. Meanwhile,
there is significantly increasing demand for timely and event-driven services
with deadline constraints. Motivated by measurements over Amazon EC2 clusters,
we consider a two-state Markov model for variability of computing speed in
cloud networks. In this model, each worker can be either in a good state or a
bad state in terms of the computation speed, and the transition between these
states is modeled as a Markov chain which is unknown to the scheduler. We then
consider a Coded Computing framework, in which the data is possibly encoded and
stored at the worker nodes in order to provide robustness against nodes that
may be in a bad state. With timely computation requests submitted to the system
with computation deadlines, our goal is to design the optimal computation-load
allocation scheme and the optimal data encoding scheme that maximize the timely
computation throughput (i.e., the average number of computation tasks that are
accomplished before their deadline). Our main result is the development of a
dynamic computation strategy called Lagrange Estimate-and-Allocate (LEA)
strategy, which achieves the optimal timely computation throughput. It is shown
that compared to the static allocation strategy, LEA increases the timely
computation throughput by 1.4X - 17.5X in various scenarios via simulations and
by 1.27X - 6.5X in experiments over Amazon EC2 clusters.
Comment: to appear in MobiHoc 201
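The two-state worker model and the timely-throughput metric described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the LEA strategy itself: the function names, the fixed per-worker load, the speed values, and the rule that a round succeeds when at least `needed` workers finish before the deadline are all hypothetical choices made for the example.

```python
import random

def simulate_worker_states(p_gb, p_bg, steps, rng, start_good=True):
    """Simulate one worker as a two-state Markov chain (True = good state).

    p_gb is the assumed probability of moving good -> bad in one round,
    p_bg the assumed probability of moving bad -> good.
    """
    states = []
    good = start_good
    for _ in range(steps):
        states.append(good)
        if good:
            good = rng.random() >= p_gb   # stay good with probability 1 - p_gb
        else:
            good = rng.random() < p_bg    # recover with probability p_bg
    return states

def timely_throughput(worker_states, load, speed_good, speed_bad, deadline, needed):
    """Fraction of rounds in which at least `needed` workers finish
    `load` units of work within `deadline` (a static, equal allocation)."""
    rounds = len(worker_states[0])
    completed = 0
    for t in range(rounds):
        finished = 0
        for states in worker_states:
            speed = speed_good if states[t] else speed_bad
            if load / speed <= deadline:
                finished += 1
        if finished >= needed:
            completed += 1
    return completed / rounds

if __name__ == "__main__":
    rng = random.Random(0)
    workers = [simulate_worker_states(0.2, 0.5, 1000, rng) for _ in range(5)]
    print(timely_throughput(workers, load=2, speed_good=2, speed_bad=1,
                            deadline=1, needed=3))
```

A dynamic strategy would replace the fixed `load` with a per-round allocation based on estimated worker states; that part is deliberately left out here.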
Edge Computing in the Dark: Leveraging Contextual-Combinatorial Bandit and Coded Computing
With recent advancements in edge computing capabilities, there has been a
significant increase in utilizing the edge cloud for event-driven and
time-sensitive computations. However, large-scale edge computing networks can
suffer substantially from unpredictable and unreliable computing resources
which can result in high variability of service quality. Thus, it is crucial to
design efficient task scheduling policies that guarantee quality of service and
the timeliness of computation queries. In this paper, we study the problem of
computation offloading over unknown edge cloud networks with a sequence of
timely computation jobs. Motivated by the MapReduce computation paradigm, we
assume each computation job can be partitioned into smaller Map functions that
are processed at the edge, and the Reduce function is computed at the user
after the Map results are collected from the edge nodes. We model the service
quality (the probability of returning the result to the user within the
deadline) of each edge device as a function of context (a collection of factors
that affect edge devices). The user decides the computations to offload to each
device with the goal of receiving a recoverable set of computation results within
the given deadline. Our goal is to design an efficient edge computing policy in
the dark without the knowledge of the context or computation capabilities of
each device. By leveraging the coded computing framework in order to
tackle failures or stragglers in computation, we formulate this problem using
contextual-combinatorial multi-armed bandits (CC-MAB), and aim to maximize the
cumulative expected reward. We propose an online learning policy called
online coded edge computing policy, which provably achieves
asymptotically-optimal performance in terms of regret loss compared with the
optimal offline policy for the proposed CC-MAB problem.
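The idea of receiving a "recoverable set" of results can be illustrated with a small calculation. The sketch below assumes a threshold-style code in which any `k` returned results out of those offloaded are enough to recover the job, and assumes devices succeed independently with known probabilities; both are assumptions made for illustration and are not taken from the abstract, where these probabilities are unknown and must be learned online.

```python
def recovery_probability(success_probs, k):
    """Probability that at least k devices return their result in time,
    given independent per-device success probabilities (assumed known here)."""
    # dist[j] = probability that exactly j of the devices seen so far succeeded
    dist = [1.0]
    for p in success_probs:
        new = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            new[j] += mass * (1 - p)      # this device misses the deadline
            new[j + 1] += mass * p        # this device returns in time
        dist = new
    return sum(dist[k:])

if __name__ == "__main__":
    # Three hypothetical devices, any two results suffice to recover the job.
    print(recovery_probability([0.9, 0.9, 0.9], k=2))
```

Offloading to more devices than the recovery threshold requires raises this probability, which is the redundancy that coding provides against stragglers.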
Datacenter Traffic Control: Understanding Techniques and Trade-offs
Datacenters provide cost-effective and flexible access to scalable compute
and storage resources necessary for today's cloud computing needs. A typical
datacenter is made up of thousands of servers connected with a large network
and usually managed by one operator. To provide quality access to the variety
of applications and services hosted on datacenters and maximize performance, it
is necessary to use datacenter networks effectively and efficiently.
Datacenter traffic is often a mix of several classes with different priorities
and requirements. This includes user-generated interactive traffic, traffic
with deadlines, and long-running traffic. To this end, custom transport
protocols and traffic management techniques have been developed to improve
datacenter network performance.
In this tutorial paper, we review the general architecture of datacenter
networks, various topologies proposed for them, their traffic properties,
general traffic control challenges in datacenters and general traffic control
objectives. The purpose of this paper is to bring out the important
characteristics of traffic control in datacenters and not to survey all
existing solutions (as it is virtually impossible due to the massive body of
existing research). We hope to provide readers with a wide range of options and
factors while considering a variety of traffic control mechanisms. We discuss
various characteristics of datacenter traffic control including management
schemes, transmission control, traffic shaping, prioritization, load balancing,
multipathing, and traffic scheduling. Next, we point to several open challenges
as well as new and interesting networking paradigms. At the end of this paper,
we briefly review inter-datacenter networks that connect geographically
dispersed datacenters which have been receiving increasing attention recently
and pose interesting and novel research problems.
Comment: Accepted for Publication in IEEE Communications Surveys and Tutorial
DCCast: Efficient Point to Multipoint Transfers Across Datacenters
Using multiple datacenters allows for higher availability, load balancing and
reduced latency to customers of cloud services. To distribute multiple copies
of data, cloud providers depend on inter-datacenter WANs that ought to be used
efficiently considering their limited capacity and the ever-increasing data
demands. In this paper, we focus on applications that transfer objects from one
datacenter to several datacenters over dedicated inter-datacenter networks. We
present DCCast, a centralized Point to Multi-Point (P2MP) algorithm that uses
forwarding trees to efficiently deliver an object from a source datacenter to
required destination datacenters. With low computational overhead, DCCast
selects forwarding trees that minimize bandwidth usage and balance load across
all links. With simulation experiments on Google's GScale network, we show that
DCCast can reduce total bandwidth usage and tail Transfer Completion Times
(TCT) by up to compared to delivering the same objects via independent
point-to-point (P2P) transfers.
Comment: 9th USENIX Workshop on Hot Topics in Cloud Computing,
https://www.usenix.org/conference/hotcloud17/program/presentation/noormohammadpou
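Why a forwarding tree can use less bandwidth than independent P2P transfers can be seen on a toy topology. The sketch below is not DCCast's tree-selection algorithm: it simply takes hop-count shortest paths from the source (an assumption), counts every link traversal for P2P delivery, and counts each distinct link once for tree delivery. The graph and function names are hypothetical.

```python
from collections import deque

def bfs_parents(graph, source):
    """Breadth-first search returning each reachable node's parent."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return parents

def path_edges(parents, dest):
    """Links on the BFS path from the source to dest."""
    edges = []
    node = dest
    while parents[node] is not None:
        edges.append((parents[node], node))
        node = parents[node]
    return edges

def compare_bandwidth(graph, source, dests, object_size=1):
    """Return (p2p_usage, tree_usage) in units of object_size per link."""
    parents = bfs_parents(graph, source)
    p2p_edges = [e for d in dests for e in path_edges(parents, d)]
    tree_edges = set(p2p_edges)  # a shared link carries the object only once
    return len(p2p_edges) * object_size, len(tree_edges) * object_size

if __name__ == "__main__":
    # Source A reaches destinations C and D through a shared link A-B.
    graph = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
    print(compare_bandwidth(graph, "A", ["C", "D"]))
```

In this toy case the shared link A-B is used twice by P2P transfers but once by the tree; a real selection scheme would also weigh link load, as the abstract notes.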