    Timely-Throughput Optimal Coded Computing over Cloud Networks

    In modern distributed computing systems, unpredictable and unreliable infrastructures result in high variability of computing resources. Meanwhile, there is significantly increasing demand for timely and event-driven services with deadline constraints. Motivated by measurements over Amazon EC2 clusters, we consider a two-state Markov model for variability of computing speed in cloud networks. In this model, each worker can be either in a good state or a bad state in terms of the computation speed, and the transition between these states is modeled as a Markov chain which is unknown to the scheduler. We then consider a Coded Computing framework, in which the data is possibly encoded and stored at the worker nodes in order to provide robustness against nodes that may be in a bad state. With timely computation requests submitted to the system with computation deadlines, our goal is to design the optimal computation-load allocation scheme and the optimal data encoding scheme that maximize the timely computation throughput (i.e., the average number of computation tasks that are accomplished before their deadline). Our main result is the development of a dynamic computation strategy called the Lagrange Estimate-and-Allocate (LEA) strategy, which achieves the optimal timely computation throughput. Compared to the static allocation strategy, LEA increases the timely computation throughput by 1.4X to 17.5X in simulations across various scenarios and by 1.27X to 6.5X in experiments over Amazon EC2 clusters. Comment: to appear in MobiHoc 201
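The two-state worker model in this abstract can be illustrated with a minimal Python sketch. It is not the LEA strategy and not the authors' code: the transition probabilities `p_gb` (good to bad) and `p_bg` (bad to good), the function names, and the fixed seed are all assumptions made for illustration. The sketch only shows how a good/bad Markov chain evolves and how its long-run fraction of good slots compares with the closed-form stationary probability.

```python
import random


def stationary_good(p_gb, p_bg):
    """Stationary probability of the good state for a two-state chain.

    p_gb and p_bg are hypothetical per-slot transition probabilities
    (good -> bad and bad -> good); they are not taken from the paper.
    """
    return p_bg / (p_gb + p_bg)


def simulate_worker(p_gb, p_bg, steps, seed=0):
    """Simulate one worker and return the fraction of slots spent in the good state."""
    rng = random.Random(seed)
    state = "good"
    good_steps = 0
    for _ in range(steps):
        if state == "good":
            good_steps += 1
            if rng.random() < p_gb:
                state = "bad"
        elif rng.random() < p_bg:
            state = "good"
    return good_steps / steps


if __name__ == "__main__":
    print(stationary_good(0.1, 0.3))
    print(simulate_worker(0.1, 0.3, steps=200_000))
```

In the abstract's setting the scheduler does not know these transition probabilities, so a scheduler would have to estimate them from observed worker speeds rather than read them off as this sketch does.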

    Dynamic Cloud Network Control under Reconfiguration Delay and Cost

    Network virtualization and programmability allow operators to deploy a wide range of services over a common physical infrastructure and elastically allocate cloud and network resources according to changing requirements. While the elastic reconfiguration of virtual resources enables dynamically scaling capacity in order to support service demands with minimal operational cost, reconfiguration operations make resources unavailable during a given time period and may incur additional cost. In this paper, we address the dynamic cloud network control problem under non-negligible reconfiguration delay and cost. We show that while the capacity region remains unchanged regardless of the reconfiguration delay/cost values, a reconfiguration-agnostic policy may fail to guarantee throughput-optimality and minimum cost under nonzero reconfiguration delay/cost. We then present an adaptive dynamic cloud network control policy that allows network nodes to make local flow scheduling and resource allocation decisions while controlling the frequency of reconfiguration in order to support any input rate in the capacity region and achieve arbitrarily close to minimum cost for any finite reconfiguration delay/cost values. Comment: 15 pages, 7 figures

    Learning to detect an oddball target with observations from an exponential family

    The problem of detecting an odd arm from a set of K arms of a multi-armed bandit, with fixed confidence, is studied in a sequential decision-making scenario. Each arm's signal follows a distribution from a vector exponential family. All arms have the same parameters except the odd arm. The actual parameters of the odd and non-odd arms are unknown to the decision maker. Further, the decision maker incurs a cost for switching from one arm to another. This is a sequential decision-making problem where the decision maker gets only a limited view of the true state of nature at each stage, but can control his view by choosing the arm to observe at each stage. Of interest are policies that satisfy a given constraint on the probability of false detection. An information-theoretic lower bound on the total cost (expected time for a reliable decision plus total switching cost) is first identified, and a variation on a sequential policy based on the generalised likelihood ratio statistic is then studied. Thanks to the vector exponential family assumption, the signal processing in this policy at each stage turns out to be very simple, in that the associated conjugate prior enables easy updates of the posterior distribution of the model parameters. The policy, with a suitable threshold, is shown to satisfy the given constraint on the probability of false detection. Further, the proposed policy is asymptotically optimal in terms of the total cost among all policies that satisfy the constraint on the probability of false detection.

    Learning Algorithms for Minimizing Queue Length Regret

    We consider a system consisting of a single transmitter/receiver pair and $N$ channels over which they may communicate. Packets randomly arrive to the transmitter's queue and wait to be successfully sent to the receiver. The transmitter may attempt a frame transmission on one channel at a time, where each frame includes a packet if one is in the queue. For each channel, an attempted transmission is successful with an unknown probability. The transmitter's objective is to quickly identify the best channel to minimize the number of packets in the queue over $T$ time slots. To analyze system performance, we introduce queue length regret, which is the expected difference between the total queue length of a learning policy and a controller that knows the rates, a priori. One approach to designing a transmission policy would be to apply algorithms from the literature that solve the closely-related stochastic multi-armed bandit problem. These policies would focus on maximizing the number of successful frame transmissions over time. However, we show that these methods have $\Omega(\log T)$ queue length regret. On the other hand, we show that there exists a set of queue-length based policies that can obtain order optimal $O(1)$ queue length regret. We use our theoretical analysis to devise heuristic methods that are shown to perform well in simulation. Comment: 28 pages, 11 figures
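The definition of queue length regret in this abstract can be made concrete with a small Python sketch. Everything here is an assumption for illustration: Bernoulli arrivals with rate `lam`, the success probabilities `mus`, the class and function names, and the explore-then-commit policy, which stands in for a generic bandit-style learner and is not one of the paper's queue-length based policies. The function returns a single sample-path difference; the regret in the abstract is the expectation of this quantity, which one could estimate by averaging over many seeds.

```python
import random


class Oracle:
    """Controller that knows the channel success rates a priori."""

    def __init__(self, mus):
        self.best = max(range(len(mus)), key=lambda k: mus[k])

    def choose(self, t):
        return self.best

    def update(self, k, success):
        pass  # nothing to learn


class ExploreThenCommit:
    """Hypothetical learner: round-robin exploration, then commit to the empirical best."""

    def __init__(self, n_channels, rounds):
        self.n = n_channels
        self.rounds = rounds
        self.tries = [0] * n_channels
        self.wins = [0] * n_channels

    def choose(self, t):
        if t < self.n * self.rounds:
            return t % self.n
        return max(range(self.n), key=lambda k: self.wins[k] / self.tries[k])

    def update(self, k, success):
        self.tries[k] += 1
        self.wins[k] += int(success)


def total_queue_length(policy, mus, lam, T, seed=0):
    """Sum of queue lengths over T slots under the given policy."""
    rng = random.Random(seed)
    queue, total = 0, 0
    for t in range(T):
        arrival = rng.random() < lam
        u = rng.random()            # shared draw: channel k succeeds iff u < mus[k]
        k = policy.choose(t)
        success = u < mus[k]
        policy.update(k, success)   # a frame is sent every slot, so feedback is always observed
        if queue > 0 and success:
            queue -= 1
        if arrival:
            queue += 1
        total += queue
    return total


def queue_length_regret_sample(policy, mus, lam, T, seed=0):
    """One sample-path difference between the policy and the oracle, using the same seed."""
    return (total_queue_length(policy, mus, lam, T, seed)
            - total_queue_length(Oracle(mus), mus, lam, T, seed))
```

Using the same seed for both runs couples the arrivals and channel outcomes, so on every sample path the oracle's queue is never longer than the learner's and the difference is non-negative.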

    Sequential Multi-hypothesis Testing in Multi-armed Bandit Problems: An Approach for Asymptotic Optimality

    We consider a multi-hypothesis testing problem involving a K-armed bandit. Each arm's signal follows a distribution from a vector exponential family. The actual parameters of the arms are unknown to the decision maker. The decision maker incurs a delay cost for delay until a decision and a switching cost whenever he switches from one arm to another. His goal is to minimise the overall cost until a decision is reached on the true hypothesis. Of interest are policies that satisfy a given constraint on the probability of false detection. This is a sequential decision-making problem where the decision maker gets only a limited view of the true state of nature at each stage, but can control his view by choosing the arm to observe at each stage. An information-theoretic lower bound on the total cost (expected time for a reliable decision plus total switching cost) is first identified, and a variation on a sequential policy based on the generalised likelihood ratio statistic is then studied. Due to the vector exponential family assumption, the signal processing at each stage is simple; the associated conjugate prior distribution on the unknown model parameters enables easy updates of the posterior distribution. The proposed policy, with a suitable threshold for stopping, is shown to satisfy the given constraint on the probability of false detection. Under a continuous selection assumption, the policy is also shown to be asymptotically optimal in terms of the total cost among all policies that satisfy the constraint on the probability of false detection.

    Augmenting Max-Weight with Explicit Learning for Wireless Scheduling with Switching Costs

    In small-cell wireless networks where users are connected to multiple base stations (BSs), it is often advantageous to opportunistically switch off a subset of BSs to minimize energy costs. We consider two types of energy cost: (i) the cost of maintaining a BS in the active state, and (ii) the cost of switching a BS from the active state to the inactive state. The problem is to operate the network at the lowest possible energy cost (sum of activation and switching costs) subject to queue stability. In this setting, the traditional approach (a Max-Weight algorithm along with a Lyapunov-based stability argument) does not suffice to show queue stability, essentially due to the temporal co-evolution between channel scheduling and the BS activation decisions induced by the switching cost. Instead, we develop a learning and BS activation algorithm with slow temporal dynamics, and a Max-Weight based channel scheduler that has fast temporal dynamics. We show, using convergence of time-inhomogeneous Markov chains, that the co-evolving dynamics of learning, BS activation and queue lengths lead to near-optimal average energy costs along with queue stability.