64 research outputs found

    Asynchronous Load Balancing and Auto-scaling: Mean-Field Limit and Optimal Design

    We introduce a Markovian framework for load balancing in which classical algorithms such as Power-of-d are combined with asynchronous auto-scaling features. These features allow the net service capacity to scale up or down in response to the current load, on the same timescale as the job dynamics. This is inspired by serverless frameworks such as Knative, used by Google Cloud Run among others, where servers are software functions that can be instantiated in milliseconds according to user-defined scaling rules. In this context, load balancing and auto-scaling are used together to optimize both user-perceived delay and energy consumption. In the existing literature, these mechanisms are either synchronous or rely on a central queue. The architectural novelty of our work is to consider an asynchronous and decentralized system, as in Knative, which further improves scalability. Under a general assumption on the auto-scaling process, we prove a mean-field limit theorem that provides an accurate approximation of the system dynamics when the mean demand and the nominal service capacity grow large in proportion. We characterize the fixed points of the mean-field limit model and provide a simple condition that tells whether all the available servers need to be turned on to handle the incoming demand. Then, we investigate how to design optimal auto-scaling rules and find a general condition that drives the mean-field dynamics to delay and relative energy optimality, a situation in which the user-perceived delay and the relative energy wastage induced by idle servers vanish. The proposed optimality condition suggests scaling up capacity if and only if the mean demand exceeds the overall rate at which servers become idle and active. This leads to tractable optimization frameworks for trading off energy against performance, which we present as an application of our work.
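
    The dispatching half of this framework can be illustrated on its own. Below is a minimal Python sketch of Power-of-d routing only; the auto-scaling dynamics and the asynchronous, decentralized architecture are not modeled. The function name `power_of_d` and the tie-breaking rule are illustrative assumptions, not the authors' code.

```python
import random
from typing import Sequence


def power_of_d(queue_lengths: Sequence[int], d: int, rng: random.Random) -> int:
    """Return the index of the server that receives the next job.

    Power-of-d: sample d distinct servers uniformly at random and join the
    one with the fewest queued jobs (ties go to the first sampled server).
    """
    if not 1 <= d <= len(queue_lengths):
        raise ValueError("d must be between 1 and the number of servers")
    sampled = rng.sample(range(len(queue_lengths)), d)
    return min(sampled, key=lambda i: queue_lengths[i])
```

    For example, calling `queues[power_of_d(queues, 2, rng)] += 1` in a loop dispatches jobs with d = 2. Only d queue lengths are inspected per job, which is what keeps the scheme cheap when the number of servers is large.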

    Closed queueing networks under congestion: non-bottleneck independence and bottleneck convergence

    We analyze the behavior of closed product-form queueing networks when the number of customers grows to infinity and remains proportionate on each route (or class). First, we focus on the stationary behavior and prove the conjecture that the stationary distribution at non-bottleneck queues converges weakly to the stationary distribution of an ergodic, open product-form queueing network. This open network is obtained by replacing bottleneck queues with per-route Poisson sources whose rates are determined by the solution of a strictly concave optimization problem. Then, we focus on the transient behavior of the network and use fluid limits to prove that the amount of fluid, or customers, on each route eventually concentrates on the bottleneck queues only, and that the long-term proportions of fluid in each route and in each queue solve the dual of the concave optimization problem that determines the throughputs of the above open network. Comment: 22 pages
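
    The bottleneck effect can be observed numerically in the simplest special case. The sketch below uses exact Mean Value Analysis (MVA) for a single-class closed network of load-independent queues with no think time, a simplification of the multi-route setting studied here. With service demands 1.0 and 0.5, the throughput approaches 1/max demand = 1 as the population grows, and the non-bottleneck queue behaves like an M/M/1 queue with utilization 0.5, whose mean length is 0.5/(1 - 0.5) = 1. The function name and parameters are illustrative.

```python
from typing import List, Tuple


def mva(demands: List[float], n_customers: int) -> Tuple[float, List[float]]:
    """Exact Mean Value Analysis for a single-class closed network of
    load-independent FCFS queues with no think time.

    demands[k] is the service demand (visits x mean service time) at queue k.
    Returns the throughput and the mean number of customers at each queue.
    """
    q = [0.0] * len(demands)
    x = 0.0
    for n in range(1, n_customers + 1):
        # Arrival theorem: an arriving customer sees the network with n-1 customers.
        r = [d * (1.0 + qk) for d, qk in zip(demands, q)]
        x = n / sum(r)            # throughput with n customers
        q = [x * rk for rk in r]  # Little's law at each queue
    return x, q
```

    With `mva([1.0, 0.5], 200)`, almost all customers accumulate at the first (bottleneck) queue, while the second queue holds about one customer on average.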

    Asymptotically Optimal Size-Interval Task Assignments

    Size-based routing provides robust strategies for improving the performance of computer and communication systems with highly variable workloads, because it statically isolates small jobs from large ones. The basic idea is that each server is assigned all jobs whose sizes belong to a distinct and continuous interval. In the literature, dispatching rules of this type are referred to as SITA (Size Interval Task Assignment) policies. Despite their evident benefits, the problem of finding a SITA policy that minimizes the overall mean (steady-state) waiting time is known to be intractable. In particular, it is not clear when it is preferable to balance or unbalance server loads and, in the latter case, how. In this paper, we answer these questions in the celebrated limiting regime where the system capacity grows linearly with the system demand to infinity. Within this framework, we prove that the minimum mean waiting time achievable by a SITA policy necessarily converges to the mean waiting time achieved by SITA-E, the SITA policy that equalizes server loads, provided that servers are homogeneous. However, within the set of SITA policies, we also show that SITA-E can perform arbitrarily badly if servers are heterogeneous. In this case, we prove that there exist exactly C! asymptotically optimal policies, where C denotes the number of server types, and all of them are linked to the solution of a single strictly convex optimization problem. It turns out that the mean waiting time achieved by any such asymptotically optimal policy does not depend on how job-size intervals are mapped to servers. Our theoretical results are validated by numerical simulations with realistic parameters, which also suggest that these insights hold in small systems composed of a few servers, such as ten.
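
    To make SITA-E concrete, here is a minimal Python sketch that derives size cutoffs from a sample of job sizes so that each of n homogeneous servers receives roughly the same total work. Working from an empirical sample rather than a known size distribution is an assumption made for illustration, as are the names `sita_e_cutoffs` and `route`.

```python
import bisect
from typing import List, Sequence


def sita_e_cutoffs(sizes: Sequence[float], n_servers: int) -> List[float]:
    """Size cutoffs that split the total work of `sizes` into n_servers
    roughly equal parts (SITA-E for homogeneous servers).

    Server i receives jobs with cutoffs[i-1] < size <= cutoffs[i].
    """
    xs = sorted(sizes)
    total = sum(xs)
    cutoffs: List[float] = []
    acc, k = 0.0, 1
    for x in xs:
        acc += x
        # Close interval k once its share k/n of the total work is reached.
        while k < n_servers and acc >= k * total / n_servers:
            cutoffs.append(x)
            k += 1
    return cutoffs


def route(size: float, cutoffs: List[float]) -> int:
    """Index of the server whose size interval contains `size`."""
    return bisect.bisect_left(cutoffs, size)
```

    For the sizes `[1, 1, 1, 1, 2, 2, 4, 4]` and two servers, the cutoff is 2: jobs of size at most 2 go to server 0 and larger jobs go to server 1, so each server receives 8 units of work. With discrete samples the split is only approximate, and, per the abstract, equal loads are the right target only when servers are homogeneous.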

    On the Price of Anarchy and the Optimal Routing of Parallel non-Observable Queues

    We consider a network of parallel, non-observable queues and analyze the "price of anarchy", an index measuring the worst-case performance loss of a decentralized system with respect to its centralized counterpart. Our analysis takes the new point of view in which the router has memory of previous dispatching choices, which significantly complicates the problem. In the limiting regime where demands grow proportionally with the network capacity, we use convex programming to provide a tight lower bound on the socially optimal response time and a tight upper bound on the price of anarchy. We then use this result to show, by simulation, that the billiard routing scheme yields a response time remarkably close to our lower bound, implying that billiards minimize response time. To study the added value of non-Bernoulli routers, we introduce the "price of forgetting" and prove that it is bounded from above by two, a bound that is tight in heavy traffic. Finally, we derive other structural properties of the price of forgetting numerically. They indicate that the benefit of having memory in the router is independent of the network size and heterogeneity, and depends monotonically on the network load only. These properties yield simple product forms that closely approximate the socially optimal response time.
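
    A billiard router is deterministic and has memory. The sketch below generates one billiard sequence under stated assumptions: a ball moves through the unit cube with velocity p, where p[i] is the long-run fraction of jobs sent to queue i, and every time it crosses a face orthogonal to axis i, the next job goes to queue i. The starting point, the tie-breaking rule for corner hits, and the function name are all illustrative choices. This is not presented as the exact scheme simulated in the paper.

```python
from typing import List, Optional, Sequence


def billiard_sequence(p: Sequence[float], n_jobs: int,
                      start: Optional[Sequence[float]] = None) -> List[int]:
    """Routing sequence induced by a billiard ball in the unit cube.

    The ball moves with velocity p from position `start` (default: origin);
    each time it crosses an integer coordinate on axis i, the next job is
    routed to queue i. Simultaneous hits go to the lowest index.
    """
    if start is None:
        start = [0.0] * len(p)
    hits = [0] * len(p)
    seq: List[int] = []
    for _ in range(n_jobs):
        # Time at which the ball next crosses an integer coordinate on axis i.
        times = [(hits[i] + 1 - start[i]) / p[i] for i in range(len(p))]
        i = min(range(len(p)), key=times.__getitem__)
        hits[i] += 1
        seq.append(i)
    return seq
```

    With `p = [0.5, 0.25, 0.25]`, the first eight jobs are routed as 0, 0, 1, 2, 0, 0, 1, 2, which spreads each queue's arrivals evenly in time. A memoryless Bernoulli router with the same probabilities would produce random gaps between consecutive arrivals instead, which is the gap the "price of forgetting" quantifies.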

    Performance Analysis of Work Stealing for Streaming Systems and Optimizations

    This paper studies the performance of parallel stream computations on a multiprocessor architecture using a work-stealing strategy. Incoming tasks are split into a number of jobs allocated to the processors, and whenever a processor becomes idle, it steals a fraction (typically half) of the jobs from a busy processor. We propose a new model for the performance analysis of such parallel stream computations. This model takes into account both the algorithmic behavior of work stealing and the hardware constraints of the architecture (synchronizations and bus contention). We then show that this model can be solved using a recursive formula, and that this recursive analytical approach is more efficient than the classic global balance technique. However, our method remains computationally impractical when tasks are split into many jobs or when many processors are considered. We therefore propose bounds that solve very large models efficiently in an approximate manner. Experimental results show that these bounds are tight and robust, so they can be used directly in optimization studies. We provide an example of optimizing energy consumption under performance constraints. In addition, our framework is flexible, and we show how it can be adapted to several stealing strategies.
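
    The steal-half rule itself is simple to state in code. The Python sketch below simulates it in synchronous time steps. It deliberately ignores the synchronization and bus-contention costs that the paper's model includes. The victim choice (the most loaded processor) and the unit-time jobs are assumptions made for illustration.

```python
from typing import List


def steal_half(queues: List[int], thief: int) -> None:
    """Idle processor `thief` takes half (rounded down) of the jobs of the
    most loaded processor, modifying `queues` in place."""
    victim = max(range(len(queues)), key=queues.__getitem__)
    amount = queues[victim] // 2
    queues[victim] -= amount
    queues[thief] += amount


def makespan(queues: List[int]) -> int:
    """Number of synchronous steps until all unit-time jobs are done.

    Each step: idle processors try to steal, then every busy processor
    completes one job. Steal costs and bus contention are ignored.
    """
    queues = list(queues)
    steps = 0
    while sum(queues) > 0:
        for p in range(len(queues)):
            if queues[p] == 0:
                steal_half(queues, p)
        queues = [q - 1 if q > 0 else 0 for q in queues]
        steps += 1
    return steps
```

    For example, 8 jobs initially on one of two processors finish in 4 steps, because the idle processor takes 4 jobs in the first step. Without stealing, they would take 8 steps.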

    Ergodic transition in a simple model of the continuous double auction

    We study a phenomenological model for the continuous double auction, whose aggregate order process is equivalent to two independent M/M/1 queues. The continuous double auction defines a continuous-time random walk for trade prices. We derive the conditions for ergodicity of the auction and, as a consequence, identify three possible regimes in the behavior of prices and logarithmic returns. In the ergodic regime, prices are unstable, and the logarithmic returns exhibit heteroskedastic behavior. Conversely, non-ergodicity leads to stable prices, although two different regimes can be distinguished.
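
    Because the order process reduces to M/M/1 queues, the ergodicity question for each side reduces to the standard condition that the arrival rate lam is smaller than the service rate mu. The Python sketch below simulates a single M/M/1 queue event by event. How orders map to arrivals and services, and how queue lengths translate into prices, are not specified here. The rates and function name are hypothetical.

```python
import random


def mm1_time_avg_length(lam: float, mu: float, n_events: int,
                        seed: int = 0) -> float:
    """Simulate an M/M/1 queue event by event and return the time-averaged
    queue length.

    lam: arrival rate; mu: service rate (active only when the queue is
    non-empty). The queue is ergodic if and only if lam < mu.
    """
    rng = random.Random(seed)
    q, t, area = 0, 0.0, 0.0
    for _ in range(n_events):
        rate = lam + (mu if q > 0 else 0.0)   # total event rate in state q
        dt = rng.expovariate(rate)            # exponential holding time
        area += q * dt
        t += dt
        if rng.random() < lam / rate:         # arrival with probability lam/rate
            q += 1
        else:
            q -= 1
    return area / t
```

    When lam < mu, the estimate settles near the stationary mean rho/(1 - rho) with rho = lam/mu (for example, 1 when lam = 0.5 and mu = 1). When lam > mu, the queue drifts upward and the time average grows without bound as more events are simulated.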

    Optimization Techniques Applied to Railway Systems

    We study the problem of minimizing the usage of electrical energy in rail systems. The aim is to determine a train speed profile that minimizes energy consumption given a time schedule. In collaboration with an industrial partner, we propose a new model that is more complete than those existing in the literature; in particular, it takes into account several non-linearities that emerge in a real setting. First, we formulate the problem within the framework of optimal control; our solution approach consists of discretizing the control problem and numerically solving the resulting finite-dimensional optimization problem. To do so, we develop a platform based on AMPL and Ipopt that allows a fast and accurate solution. We then reformulate the problem within the framework of dynamic programming, which yields the optimal action for any initial point. Solving the dynamic program is very time-consuming, so we develop C++ code to solve some simple examples. Finally, we implement our solution in a train simulator to estimate the energy reduction obtained in several real examples provided by INGETEAM S.A. The simulator results indicate an energy reduction between 8% and 25%. We thus conclude that our first approach could be implemented by industry to solve real-life cases.
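
    The discretize-then-optimize idea can be shown on a deliberately small toy problem. This is not the authors' model, which includes real non-linearities and was solved with AMPL and Ipopt. In the Python sketch below, the track is split into equal segments and one cruising speed is chosen per segment from a grid. Energy counts traction to accelerate (braking is assumed free, with no regeneration) plus a quadratic drag term, and the total running time must fit a time budget. Every parameter and name is hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

State = Tuple[float, float]        # (current speed, elapsed time)
Entry = Tuple[float, List[float]]  # (energy so far, speed profile so far)


def min_energy_profile(n_seg: int, seg_len: float, speeds: Sequence[float],
                       time_budget: float, mass: float,
                       drag: float) -> Optional[Entry]:
    """Toy dynamic program: pick one cruising speed per track segment.

    Energy per segment = traction energy to reach the new speed (braking is
    free, no regeneration) + drag * v**2 * seg_len. The total running time
    sum(seg_len / v) must not exceed time_budget. Gradients, speed limits
    and final braking are ignored.
    Returns (energy, speed profile), or None if the schedule is infeasible.
    """
    states: Dict[State, Entry] = {(0.0, 0.0): (0.0, [])}  # start at standstill
    for _ in range(n_seg):
        nxt: Dict[State, Entry] = {}
        for (v_prev, t), (e, prof) in states.items():
            for v in speeds:
                t_new = t + seg_len / v
                if t_new > time_budget:
                    continue  # this choice would miss the timetable
                traction = max(0.0, 0.5 * mass * (v * v - v_prev * v_prev))
                e_new = e + traction + drag * v * v * seg_len
                key = (v, t_new)
                # Keep only the cheapest way to reach each (speed, time) state.
                if key not in nxt or e_new < nxt[key][0]:
                    nxt[key] = (e_new, prof + [v])
        states = nxt
    if not states:
        return None
    return min(states.values(), key=lambda s: s[0])
```

    With three 100 m segments and speeds of 10, 20 or 25 m/s, a loose time budget yields the slowest profile, while a 12 s budget forces 25 m/s everywhere at a much higher energy cost. This mirrors, in miniature, the energy-versus-schedule trade-off described above.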

    Combining Size-Based Load Balancing with Round-Robin for Scalable Low Latency
