
    Robust Queueing Theory

    We propose an alternative approach for studying queues based on robust optimization. We model the uncertainty in the arrivals and services via polyhedral uncertainty sets, which are inspired by the limit laws of probability. Using the generalized central limit theorem, this framework allows us to model heavy-tailed behavior characterized by bursts of rapidly occurring arrivals and long service times. We take a worst-case approach and obtain closed-form upper bounds on the system time in a multi-server queue. These expressions provide qualitative insights that mirror the conclusions obtained in the probabilistic setting for light-tailed arrivals and services and generalize them to the case of heavy-tailed behavior. We also develop a calculus for analyzing a network of queues based on the following key principles: (a) the departure from a queue, (b) the superposition, and (c) the thinning of arrival processes have the same uncertainty set representation as the original arrival processes. The proposed approach (a) yields results with error percentages in single digits relative to simulation, and (b) is to a large extent insensitive to the number of servers per queue, network size, degree of feedback, and traffic intensity; it is somewhat sensitive to the degree of diversity of external arrival distributions in the network.

    Control-theoretic Analysis of Admission Control Mechanisms for Web Server Systems

    Web sites are exposed to high rates of incoming requests. The servers may become overloaded during temporary traffic peaks, when more requests arrive than the server is designed for. An admission control mechanism rejects some requests whenever the arriving traffic is too high and thereby maintains an acceptable load in the system. This paper shows how admission control mechanisms can be designed with a combination of queueing theory and control theory. We model an Apache web server as a GI/G/1 system and then design a PI controller, commonly used in automatic control, for the server. The controller has been implemented as a module inside the Apache source code. Measurements from the laboratory setup show how robust the implemented controller is, and how it corresponds to the results from the theoretical analysis.
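
The control loop described in this abstract can be illustrated with a minimal sketch. It is not the paper's Apache module: the server is replaced by a deterministic fluid approximation rather than a GI/G/1 queue, and the names `PIController` and `simulate`, the gains, the rates, and the queue-length reference are all assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Discrete-time PI controller whose output is an admission probability."""
    kp: float              # proportional gain (illustrative, not from the paper)
    ki: float              # integral gain (illustrative, not from the paper)
    integral: float = 0.0  # accumulated error

    def update(self, reference: float, measured: float) -> float:
        error = reference - measured
        candidate = self.integral + error
        output = self.kp * error + self.ki * candidate
        # Simple anti-windup: keep the new integral only if the output is not saturated.
        if 0.0 <= output <= 1.0:
            self.integral = candidate
        return min(1.0, max(0.0, output))


def simulate(arrival_rate: float, service_rate: float, reference: float,
             steps: int, controller: PIController) -> tuple[float, float]:
    """Fluid approximation: the queue grows by admitted arrivals minus service capacity."""
    queue = 0.0
    admit = 0.0
    for _ in range(steps):
        admit = controller.update(reference, queue)
        queue = max(0.0, queue + admit * arrival_rate - service_rate)
    return queue, admit


# Assumed overload scenario: 150 requests offered per step, capacity for 100.
final_queue, final_admit = simulate(
    arrival_rate=150.0, service_rate=100.0, reference=20.0,
    steps=1000, controller=PIController(kp=0.004, ki=0.0005))
print(f"queue={final_queue:.2f}, admission probability={final_admit:.3f}")
```

In this toy setting the integral term is what drives the admission probability toward the ratio of capacity to offered load, while the proportional term damps the response to queue-length deviations.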

    Resource Management in Computing Systems

    Resource management is an essential building block of any modern computer and communication network. In this thesis, the results of our research in the following two tracks are summarized in four papers. The first track includes three papers and covers modeling, prediction and control for multi-tier computing systems. In the first paper, a NARX-based multi-step-ahead response time predictor for single server queuing systems is presented which can be applied to CPU-constrained computing systems. The second paper introduces a NARX-based multi-step-ahead query response time predictor for database servers. Both predictors can predict the dynamics of response times over the whole operating range, particularly in high load scenarios, without requiring changes to the current protocols and operating systems. In the third paper, queuing theory is used to model the dynamics of a database server. Several heuristics are presented to tune the parameters of the proposed model to the measured data from the database. Furthermore, an admission controller is presented, and its parameters are tuned so that the response time of queries sent to the database stays below a predefined reference value. The second track includes one paper, covering a problem formulation and optimal solution for a content replication problem in telecom operators' content delivery networks (Telco-CDNs). The problem is formulated as an integer programming problem that minimizes the communication delay and cost subject to several constraints, such as a limited content replication budget, limited storage size and limited downlink bandwidth of each regional content server. The solution of this problem is a performance bound for any distributed content replication algorithm which addresses the same problem.

    Dynamical Modeling of Cloud Applications for Runtime Performance Management

    Cloud computing has quickly grown to become an essential component in many modern-day software applications. It allows consumers, such as a provider of some web service, to quickly and on demand obtain the necessary computational resources to run their applications. It is desirable for these service providers to keep the running cost of their cloud application low while adhering to various performance constraints. This is made difficult by the dynamics imposed by, e.g., resource contentions or changing arrival rate of users, and by the fact that there exist multiple ways of influencing the performance of a running cloud application. To facilitate decision making in this environment, performance models can be introduced that relate the workload and different actions to important performance metrics. In this thesis, such performance models of cloud applications are studied. In particular, we focus on modeling using queueing theory and on the fluid model for approximating the often intractable dynamics of the queue lengths. First, existing results on how the fluid model can be obtained from the mean-field approximation of a closed queueing network are simplified and extended to allow for mixed networks. The queues are allowed to follow the processor sharing or delay disciplines, and can have multiple classes with phase-type service times. An improvement to this fluid model is then presented to increase accuracy when the system size, i.e., number of servers, initial population, and arrival rate, is small. Furthermore, a closed-form approximation of the response time CDF is presented. The methods are tested in a series of simulation experiments and shown to be accurate. This mean-field fluid model is then used to derive a general fluid model for microservices with interservice delays. The model is shown to be completely extractable at runtime in a distributed fashion. 
It is further evaluated on a simple microservice application and found to accurately predict important performance metrics in most cases. Furthermore, a method is devised to reduce the cost of a running application by tuning load balancing parameters between replicas. The method is built on gradient stepping by applying automatic differentiation to the fluid model. This allows for arbitrarily defined cost functions and constraints, most notably including different response time percentiles. The method is tested on a simple application distributed over multiple computing clusters and is shown to reduce costs while adhering to percentile constraints. Finally, modeling of request cloning is studied using the novel concept of synchronized service. This allows certain forms of cloning over servers, each modeled with a single queue, to be equivalently expressed as one single queue. The concept is very general regarding the involved queueing discipline and distributions, but instead introduces new, less realistic assumptions. How the equivalent queue model is affected by relaxing these assumptions is studied considering the processor sharing discipline, and an extension to enable modeling of speculative execution is made. In a simulation campaign, it is shown that these relaxations have only a minor effect in certain cases.

    Using Hybrid Simulation/Analytical Queueing Networks to Capacitate USAF Air Mobility Command Passenger Terminals

    The objective of this study is to model operations at an airport passenger terminal to determine the optimal service capacities at each station given estimated passenger flow patterns and service rates. The central formulation is an open Jackson queueing network that can be applied to any USAF Air Mobility Command (AMC) terminal regardless of passenger type mix and flow data. A complete methodology for analyzing passenger flows and queue performance of a single flight is produced and then embedded in a framework to analyze the same for multiple departing flights. Queueing network analysis (QNA) is used because no special software license or methodological training is required, results are obtained in a spreadsheet model with computational response times that are instantaneous, and data requirements are substantially reduced compared with discrete-event simulation (DES). However, because of the assumptions of QNA, additional research contributions were required. First, arrivals of passengers are time-dependent, not steady-state. Theoretical results for time-dependent queue networks in the literature are limited, so a method for using DES to adjust for arrival time-dependency in QNA is developed. Second, beyond quality of service in the network, a key performance measure is the percentage of passengers who do not clear the system by a fixed time. To populate the QNA mean value system sojourn time, DES is used to develop a generic sojourn time probability distribution. All DES computations have been pre-calculated off-line in this thesis and complete a hybrid DES/QNA analytical model. The model is exercised and validated through analysis of the facility at Hickam AFB, which is currently undergoing redesign. For larger flights, adding a server at the high-utilization queues, namely the USDA inspection and security screening stations, halves system congestion and dramatically increases throughput.
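
As a rough illustration of the steady-state part of such an analysis, the sketch below solves the traffic equations of a small open Jackson network and compares mean sojourn times before and after adding a server at the busiest station. The three stations, routing probabilities, and rates are hypothetical and not taken from the thesis; the function names are likewise assumptions, and the time-dependent DES adjustment described in the abstract is not modeled.

```python
from math import factorial


def solve_traffic_equations(external, routing, tol=1e-12, max_iter=10_000):
    """Solve lambda_j = external_j + sum_i lambda_i * routing[i][j] by fixed-point iteration."""
    n = len(external)
    lam = list(external)
    for _ in range(max_iter):
        new = [external[j] + sum(lam[i] * routing[i][j] for i in range(n))
               for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, lam)) < tol:
            return new
        lam = new
    raise RuntimeError("traffic equations did not converge")


def mmc_mean_number(lam, mu, servers):
    """Mean number in an M/M/c station (Erlang C formula); requires lam < servers * mu."""
    a = lam / mu          # offered load
    rho = a / servers     # per-server utilization
    if rho >= 1.0:
        raise ValueError("station is unstable")
    tail = a ** servers / factorial(servers)
    p_wait = tail / ((1.0 - rho) * sum(a ** k / factorial(k) for k in range(servers)) + tail)
    return p_wait * rho / (1.0 - rho) + a


def mean_sojourn_time(external, routing, service_rates, servers):
    """Little's law for the whole network: W = (sum of L_i) / (total external rate)."""
    lam = solve_traffic_equations(external, routing)
    total = sum(mmc_mean_number(l, mu, c) for l, mu, c in zip(lam, service_rates, servers))
    return total / sum(external)


# Hypothetical terminal: check-in -> screening -> gate, with 10% of screened
# passengers sent back to check-in. All rates are passengers per hour.
external = [10.0, 0.0, 0.0]
routing = [[0.0, 1.0, 0.0],
           [0.1, 0.0, 0.9],
           [0.0, 0.0, 0.0]]
service_rates = [15.0, 12.0, 20.0]

rates = solve_traffic_equations(external, routing)
base = mean_sojourn_time(external, routing, service_rates, [1, 1, 1])
extra = mean_sojourn_time(external, routing, service_rates, [1, 2, 1])
print(f"arrival rates: {[round(r, 3) for r in rates]}")
print(f"mean sojourn time: {base:.3f} h with one screener, {extra:.3f} h with two")
```

Fixed-point iteration is used here only to keep the sketch dependency-free; a linear solver on (I - P^T) would give the same arrival rates.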

    Internet performance modeling: the state of the art at the turn of the century

    Seemingly overnight, the Internet has gone from an academic experiment to a worldwide information matrix. Along the way, computer scientists have come to realize that understanding the performance of the Internet is a remarkably challenging and subtle problem. This challenge is all the more important because of the increasingly significant role the Internet has come to play in society. To take stock of the field of Internet performance modeling, the authors organized a workshop at Schloß Dagstuhl. This paper summarizes the results of discussions, both plenary and in small groups, that took place during the four-day workshop. It identifies successes, points to areas where more work is needed, and poses “Grand Challenges” for the performance evaluation community with respect to the Internet.

    The effect of workload dependence in systems: Experimental evaluation, analytic models, and policy development

    This dissertation presents an analysis of performance effects of burstiness (formalized by the autocorrelation function) in multi-tiered systems via a three-pronged approach, i.e., experimental measurements, analytic models, and policy development. This analysis considers (a) systems with finite buffers (e.g., systems with admission control that effectively operate as closed systems) and (b) systems with infinite buffers (i.e., systems that operate as open systems). For multi-tiered systems with a finite buffer size, experimental measurements show that if autocorrelation exists in any of the tiers in a multi-tiered system, then autocorrelation propagates to all tiers of the system. The presence of autocorrelated flows in all tiers significantly degrades performance. Workload characterization in a real experimental environment driven by the TPC-W benchmark confirms the existence of autocorrelated flows, which originate from the autocorrelated service process of one of the tiers. A simple model is devised that captures the observed behavior. The model is in excellent agreement with experimental measurements and captures the propagation of autocorrelation in the multi-tiered system as well as the resulting performance trends. For systems with an infinite buffer size, this study focuses on analytic models by proposing and comparing two families of approximations for the departure process of a BMAP/MAP/1 queue that admits batch correlated flows, and whose service time process may be autocorrelated. One approximation is based on the ETAQA methodology for the solution of M/G/1-type processes and the other arises from lumpability rules. 
Formal proofs are provided: both approximations preserve the marginal distribution of the inter-departure times and their initial correlation structures. This dissertation also demonstrates how the knowledge of autocorrelation can be used to effectively improve system performance: D_EQAL, a new load balancing policy for clusters with dependent arrivals, is proposed. D_EQAL separates jobs to servers according to their sizes as traditional load balancing policies do, but this separation is biased by the effort to reduce performance loss due to autocorrelation in the streams of jobs that are directed to each server. As a result, not all servers are equally utilized (i.e., the load in the system becomes unbalanced), but the performance benefits of this load unbalancing are significant.
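
Since this abstract formalizes burstiness through the autocorrelation function, a small sketch may help show what that statistic measures. The function below computes the standard (biased) sample autocorrelation at a given lag. The two service-time sequences are made-up examples with identical marginal distributions, and nothing here reproduces the dissertation's models or the D_EQAL policy.

```python
def autocorrelation(samples, lag):
    """Sample autocorrelation at the given lag (biased estimator)."""
    n = len(samples)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(samples)")
    mean = sum(samples) / n
    variance_sum = sum((x - mean) ** 2 for x in samples)
    if variance_sum == 0.0:
        raise ValueError("autocorrelation is undefined for a constant series")
    covariance_sum = sum((samples[i] - mean) * (samples[i + lag] - mean)
                         for i in range(n - lag))
    return covariance_sum / variance_sum


# Hypothetical service times: the same values, ordered in runs or alternating.
bursty = [1.0, 1.0, 1.0, 1.0, 9.0, 9.0, 9.0, 9.0]
alternating = [1.0, 9.0, 1.0, 9.0, 1.0, 9.0, 1.0, 9.0]
print(autocorrelation(bursty, 1), autocorrelation(alternating, 1))
```

The lag-1 value is positive when long jobs cluster together and negative when they alternate with short ones, even though both sequences have the same mean and variance.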