Timely-Throughput Optimal Coded Computing over Cloud Networks
In modern distributed computing systems, unpredictable and unreliable
infrastructures result in high variability of computing resources. Meanwhile,
there is significantly increasing demand for timely and event-driven services
with deadline constraints. Motivated by measurements over Amazon EC2 clusters,
we consider a two-state Markov model for variability of computing speed in
cloud networks. In this model, each worker can be either in a good state or a
bad state in terms of the computation speed, and the transition between these
states is modeled as a Markov chain which is unknown to the scheduler. We then
consider a Coded Computing framework, in which the data is possibly encoded and
stored at the worker nodes in order to provide robustness against nodes that
may be in a bad state. Given computation requests that arrive with deadlines,
our goal is to design the optimal computation-load allocation scheme and the
optimal data encoding scheme that maximize the timely computation throughput
(i.e., the average number of computation tasks that are accomplished before
their deadlines). Our main result is the development of a dynamic computation
strategy called the Lagrange Estimate-and-Allocate (LEA) strategy, which
achieves the optimal timely computation throughput. Compared to a static
allocation strategy, LEA is shown to increase the timely computation throughput
by 1.4X - 17.5X in various scenarios via simulations and by 1.27X - 6.5X in
experiments over Amazon EC2 clusters.
Comment: to appear in MobiHoc 201
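To make the two-state model concrete, the following Python sketch simulates a worker whose speed follows a good/bad Markov chain, estimates the unknown transition probabilities from an observed state trace, and splits a computation load across workers in proportion to each worker's predicted probability of being in the good state next round. It is a toy under stated assumptions, not the LEA strategy itself: the function names and the proportional allocation rule are hypothetical, and the Lagrange coded computing part of LEA is omitted entirely.

```python
import random

GOOD, BAD = 0, 1

def simulate_worker(p_gb, p_bg, steps, seed=0):
    """Simulate a two-state Markov chain.

    p_gb = P(good -> bad), p_bg = P(bad -> good). Starts in GOOD.
    """
    rng = random.Random(seed)
    state, trace = GOOD, []
    for _ in range(steps):
        trace.append(state)
        if state == GOOD:
            state = BAD if rng.random() < p_gb else GOOD
        else:
            state = GOOD if rng.random() < p_bg else BAD
    return trace

def estimate_transitions(trace):
    """Estimate (p_gb, p_bg) from observed transitions, with add-one smoothing."""
    counts = [[1, 1], [1, 1]]  # Laplace prior on each row of the transition matrix
    for prev, nxt in zip(trace, trace[1:]):
        counts[prev][nxt] += 1
    p_gb = counts[GOOD][BAD] / sum(counts[GOOD])
    p_bg = counts[BAD][GOOD] / sum(counts[BAD])
    return p_gb, p_bg

def prob_good_next(current_state, p_gb, p_bg):
    """One-step-ahead probability that a worker is in the good state."""
    return 1 - p_gb if current_state == GOOD else p_bg

def allocate_load(states, p_gb, p_bg, total_load):
    """Hypothetical rule: split total_load in proportion to P(good next round)."""
    probs = [prob_good_next(s, p_gb, p_bg) for s in states]
    z = sum(probs)
    return [total_load * p / z for p in probs]

if __name__ == "__main__":
    trace = simulate_worker(p_gb=0.1, p_bg=0.3, steps=20000, seed=0)
    print(estimate_transitions(trace))  # close to (0.1, 0.3)
    print(allocate_load([GOOD, BAD], 0.1, 0.3, total_load=12))  # approximately [9.0, 3.0]
```

The estimate-then-allocate loop mirrors the idea that the scheduler does not know the chain and must learn it online; a real scheduler would also decide how much redundancy to encode, which this sketch does not attempt.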
Coded Distributed Tracking
We consider the problem of tracking the state of a process that evolves over
time in a distributed setting, with multiple observers each observing parts of
the state, which is a fundamental information processing problem with a wide
range of applications. We propose a cloud-assisted scheme where the tracking is
performed over the cloud. In particular, to provide timely and accurate
updates, and alleviate the straggler problem of cloud computing, we propose a
coded distributed computing approach where coded observations are distributed
over multiple workers. The proposed scheme is based on a coded version of the
Kalman filter that operates on data encoded with an erasure correcting code,
such that the state can be estimated from partial updates computed by a subset
of the workers. We apply the proposed scheme to the problem of tracking
multiple vehicles. We show that replication achieves significantly higher
accuracy than the corresponding uncoded scheme. The use of maximum distance
separable (MDS) codes further improves accuracy for larger update intervals. In
both cases, the proposed scheme approaches the accuracy of an ideal centralized
scheme when the update interval is large enough. Finally, we observe a
trade-off between age-of-information and estimation accuracy for MDS codes.
Comment: Accepted for publication at IEEE GLOBECOM 201
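The erasure-coding ingredient can be sketched in Python with NumPy. This is an illustration under assumptions, not the coded Kalman filter itself (whose update equations are not given here): it uses a real-valued (n, k) MDS code built from a Vandermonde generator matrix and applies it to a matrix-vector product. Because any k rows of a Vandermonde matrix with distinct nodes form an invertible matrix, the results of any k of the n workers suffice to recover the uncoded product, which is what lets such a scheme tolerate stragglers. The helper names are hypothetical.

```python
import numpy as np

def vandermonde_generator(n, k):
    """n x k real Vandermonde matrix; any k rows are invertible (distinct nodes)."""
    nodes = np.arange(1, n + 1, dtype=float)
    return np.vander(nodes, k, increasing=True)

def encode(A, n, k):
    """Split A row-wise into k blocks and produce n coded blocks."""
    if A.shape[0] % k != 0:
        raise ValueError("number of rows of A must be divisible by k")
    m = A.shape[0] // k
    blocks = A.reshape(k, m, A.shape[1])          # block j = rows j*m .. (j+1)*m - 1
    G = vandermonde_generator(n, k)
    coded = np.einsum("nk,kmd->nmd", G, blocks)   # coded_i = sum_j G[i, j] * block_j
    return G, coded

def worker_compute(coded_block, x):
    """Work done by one worker on its coded block."""
    return coded_block @ x

def decode(results, worker_ids, G):
    """Recover A @ x from the results of any k distinct workers."""
    Gs = G[list(worker_ids)]             # k x k invertible submatrix
    stacked = np.stack(results)          # shape (k, m)
    uncoded = np.linalg.solve(Gs, stacked)  # row j equals block_j @ x
    return uncoded.reshape(-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 4))
    x = rng.standard_normal(4)
    G, coded = encode(A, n=5, k=3)
    survivors = [0, 2, 4]  # workers 1 and 3 straggle
    y = decode([worker_compute(coded[i], x) for i in survivors], survivors, G)
    print(np.allclose(y, A @ x))  # True
```

Real-valued Vandermonde codes become ill-conditioned as n grows, so this construction is only meant for small examples; replication corresponds to the simpler case where each block is copied to several workers.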
Reliable and timely event notification for publish/subscribe services over the internet
The publish/subscribe paradigm is gaining attention for the development of several applications in wide area networks (WANs) due to its intrinsic time, space, and synchronization decoupling properties, which meet the scalability and asynchrony requirements of those applications. However, communication in a WAN may be affected by the unpredictable behavior of the network, with messages that can be dropped or delayed, and existing publish/subscribe solutions pay little attention to these issues. Yet applications such as business intelligence, critical infrastructures, and financial services require delivery guarantees with strict temporal deadlines. In this paper, we propose a framework that enforces both reliability and timeliness for publish/subscribe services over WANs. Specifically, we combine two different approaches: gossiping, to retrieve missing packets in case of incomplete information, and network coding, to reduce the number of retransmissions and, consequently, the latency. We provide an analytical model that describes the information recovery capabilities of our algorithm and a simulation-based study, using a real workload from the Air Traffic Control domain, which shows that the proposed solution can ensure reliable event notification over a WAN within a reasonably bounded time window. © 2013 IEEE
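To see why network coding can reduce retransmissions, here is a minimal Python sketch using plain XOR coding. This is a simplification assumed for illustration; the abstract does not detail the actual coding scheme or how it is combined with gossiping. If two subscribers each lost a different event, a single XOR of the two events repairs both, whereas uncoded recovery would need two separate retransmissions. The function names are hypothetical, and all packets are assumed to have equal length.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length payloads."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode_repair(packets, missing_ids):
    """Build one repair packet: the XOR of every distinct missing packet."""
    ids = sorted(set(missing_ids))
    out = bytes(len(packets[0]))  # all-zero payload of the common length
    for i in ids:
        out = xor_bytes(out, packets[i])
    return out, ids

def recover(received, repair, ids):
    """At one subscriber, recover its single lost packet from the repair packet.

    received: dict mapping packet id -> payload already held by this subscriber.
    """
    missing = [i for i in ids if i not in received]
    if len(missing) != 1:
        raise ValueError("this simple XOR repair recovers exactly one lost packet")
    out = repair
    for i in ids:
        if i in received:
            out = xor_bytes(out, received[i])  # cancel out the packets we hold
    return missing[0], out

if __name__ == "__main__":
    packets = [b"evt0", b"evt1", b"evt2"]
    sub_a = {0: packets[0], 2: packets[2]}  # lost event 1
    sub_b = {0: packets[0], 1: packets[1]}  # lost event 2
    repair, ids = encode_repair(packets, [1, 2])  # one transmission for both
    print(recover(sub_a, repair, ids))  # (1, b'evt1')
    print(recover(sub_b, repair, ids))  # (2, b'evt2')
```

A practical scheme would typically use random linear combinations over a larger finite field so that subscribers missing several packets can also decode; XOR keeps the example short.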
Edge Computing in the Dark: Leveraging Contextual-Combinatorial Bandit and Coded Computing
With recent advancements in edge computing capabilities, there has been a
significant increase in utilizing the edge cloud for event-driven and
time-sensitive computations. However, large-scale edge computing networks can
suffer substantially from unpredictable and unreliable computing resources
which can result in high variability of service quality. Thus, it is crucial to
design efficient task scheduling policies that guarantee quality of service and
the timeliness of computation queries. In this paper, we study the problem of
computation offloading over unknown edge cloud networks with a sequence of
timely computation jobs. Motivated by the MapReduce computation paradigm, we
assume each computation job can be partitioned into smaller Map functions that
are processed at the edge, and the Reduce function is computed at the user
after the Map results are collected from the edge nodes. We model the service
quality (the success probability of returning the result to the user within
the deadline) of each edge device as a function of context (the collection of
factors that affect edge devices). The user decides which computations to
offload to each device, with the goal of receiving a recoverable set of
computation results within the given deadline. Our goal is to design an
efficient edge computing policy in the dark, without knowledge of the context
or computation capabilities of each device. By leveraging the coded computing
framework to tackle failures or stragglers in computation, we formulate this
problem using contextual-combinatorial multi-armed bandits (CC-MAB), and aim to
maximize the cumulative expected reward. We propose an online learning policy
called the online coded edge computing policy, which provably achieves
asymptotically optimal performance in terms of regret loss compared with the
optimal offline policy for the proposed CC-MAB problem.
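The bandit formulation can be illustrated with a generic contextual-combinatorial UCB policy in Python. This is a swapped-in textbook technique, not the paper's online coded edge computing policy: it assumes contexts are scalars in [0, 1) partitioned uniformly into bins (a hypothetical choice), keeps success counts per (device, context bin), and each round offloads to the `budget` devices with the highest UCB index (empirical success rate plus an exploration bonus). It ignores coding and the recoverability threshold. All names and the simulated success probabilities are hypothetical.

```python
import math
import random

class ContextualCombUCB:
    """Generic contextual-combinatorial UCB over uniformly binned scalar contexts."""

    def __init__(self, n_devices, n_bins, budget):
        self.n, self.bins, self.budget = n_devices, n_bins, budget
        self.pulls = [[0] * n_bins for _ in range(n_devices)]
        self.succ = [[0] * n_bins for _ in range(n_devices)]
        self.t = 0

    def bin_of(self, context):
        # Assumes context in [0, 1); uniform partition into self.bins cells.
        return min(int(context * self.bins), self.bins - 1)

    def select(self, contexts):
        """Pick `budget` devices with the highest UCB index for their current context."""
        self.t += 1
        scores = []
        for d in range(self.n):
            b = self.bin_of(contexts[d])
            k = self.pulls[d][b]
            if k == 0:
                score = float("inf")  # force exploration of unseen (device, bin) pairs
            else:
                score = self.succ[d][b] / k + math.sqrt(2 * math.log(self.t) / k)
            scores.append((score, d))
        scores.sort(reverse=True)
        return [d for _, d in scores[: self.budget]]

    def update(self, contexts, chosen, outcomes):
        """Record whether each chosen device returned its result before the deadline."""
        for d, ok in zip(chosen, outcomes):
            b = self.bin_of(contexts[d])
            self.pulls[d][b] += 1
            self.succ[d][b] += int(ok)

def simulate(rounds=5000, seed=1):
    """Toy environment: devices 0-1 are reliable in low contexts, 2-3 in high contexts."""
    rng = random.Random(seed)
    policy = ContextualCombUCB(n_devices=4, n_bins=2, budget=2)

    def true_p(d, c):
        return 0.9 if (d < 2) == (c < 0.5) else 0.2

    for _ in range(rounds):
        ctx = [rng.random() for _ in range(4)]
        chosen = policy.select(ctx)
        outcomes = [rng.random() < true_p(d, ctx[d]) for d in chosen]
        policy.update(ctx, chosen, outcomes)
    return policy

if __name__ == "__main__":
    policy = simulate()
    for d in range(4):
        print(d, [round(policy.succ[d][b] / policy.pulls[d][b], 2) for b in range(2)])
```

Binning the context space is one common way to make per-context estimates tractable; its cost is that bins that are too coarse blur differences between contexts, while bins that are too fine slow learning.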
- …