Modeling and Control of Server-based Systems
When deploying networked, computing-based applications, proper management of server-side resources is essential for maintaining quality of service and cost efficiency. The work presented in this thesis is based on six papers, all investigating problems related to resource management of server-based systems. Using a queueing-system approach, we model the performance of a database system subjected to write-heavy traffic. We then evaluate the model using simulations and validate that it accurately mimics the behavior of a real test bed. In collaboration with Ericsson, we model and design a per-request admission control scheme for a Mobile Service Support System (MSS). The model is then validated, and the control scheme is evaluated in a test bed. We also investigate the feasibility of estimating the state of a server in an MSS using an event-based Extended Kalman Filter. In the brownout paradigm of server resource management, the amount of work required to serve a client is adjusted to compensate for temporary resource shortages. In this thesis, we investigate how to perform load balancing over such self-adaptive server instances. The load-balancing schemes are evaluated in both simulations and test bed experiments. Further, we investigate how to employ delay-compensated feedback control to automatically adjust the amount of resources deployed to a cloud application in the presence of a large, stochastic delay. The delay-compensated control scheme is evaluated in simulations, and the conclusion is that it can be made fast and responsive compared to an industry-standard solution.
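The brownout idea of adjusting the amount of work per request can be illustrated with a minimal sketch. It assumes a single "dimmer" value theta (the fraction of requests served with optional content) and a plain integral controller acting on measured response time. The class name, setpoint, and gain are hypothetical, and this is not presented as the controller used in the thesis.

```python
from dataclasses import dataclass


@dataclass
class DimmerController:
    """Hypothetical brownout dimmer: theta is the fraction of requests
    served with optional content (1.0 = full user experience)."""
    setpoint: float  # target response time in seconds (assumed)
    gain: float      # integral gain (assumed; must be tuned per system)
    theta: float = 1.0

    def update(self, measured_rt: float) -> float:
        # Integral action: a response time above the setpoint lowers theta
        # (less optional work), one below the setpoint raises it again.
        error = self.setpoint - measured_rt
        self.theta = min(1.0, max(0.0, self.theta + self.gain * error))
        return self.theta


# Usage with a made-up response-time model: rt = 0.2 + 0.8 * theta.
ctrl = DimmerController(setpoint=0.6, gain=0.5)
for _ in range(50):
    ctrl.update(0.2 + 0.8 * ctrl.theta)
```

With this made-up model, theta settles at 0.5, where the response time equals the 0.6 s setpoint; clamping keeps theta in [0, 1] when the overload is too large to absorb.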
Performance Modeling and Analysis of a Database Server with Write-Heavy Workload
Resource optimization of the infrastructure for service-oriented applications requires accurate performance models. In this paper, we investigate the performance dynamics of a MySQL/InnoDB database server with a write-heavy workload. The main objective of our investigation was to understand the system dynamics caused by the buffering of disk operations that occurs in database servers with write-heavy workloads. In the paper, we characterize the traffic and its periodic anomalies caused by flushing of the buffer. Further, we present a performance model for the response time of the requests and show how this model can be configured to fit actual database measurements. We also show that window-based admission control outperforms rate-based admission control for this type of system.
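Window-based admission control can be sketched as a cap on the number of outstanding requests. This is a minimal illustration under assumptions: the class and method names are hypothetical, and requests over the window are simply rejected rather than queued. One plausible intuition for the result above is that a window is self-clocking: when the server stalls while flushing its buffer, completions stop, and admissions stop with them, whereas a rate-based controller keeps admitting at a fixed rate.

```python
import threading


class WindowAdmissionControl:
    """Hypothetical window-based admission control: at most `window`
    requests may be outstanding at the database at any time."""

    def __init__(self, window: int):
        self.window = window
        self.outstanding = 0
        self._lock = threading.Lock()

    def try_admit(self) -> bool:
        # Admit only if there is room in the window; otherwise reject.
        with self._lock:
            if self.outstanding < self.window:
                self.outstanding += 1
                return True
            return False

    def release(self) -> None:
        # Call when an admitted request completes, freeing one slot.
        with self._lock:
            self.outstanding -= 1
```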
Model-Based Deadtime Compensation of Virtual Machine Startup Times
Scaling the amount of resources allocated to an application according to the actual load is a challenging problem in cloud computing. The emergence of autoscaling techniques allows decisions about when to acquire or release resources to be taken autonomously. The actuation of these decisions is, however, affected by time delays. It is therefore critical for the autoscaler to account for this phenomenon in order to avoid over- or under-provisioning. This paper presents a delay compensator inspired by the Smith predictor. The compensator allows one to close a simple feedback loop around a cloud application with a large, time-varying delay while preserving the stability of the controlled system. It also makes it possible for the closed-loop system to converge to a steady state, even in the presence of resource quantization. The presented approach is compared to a threshold-based controller with a cooldown period, which is typically adopted in industrial applications.
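The Smith-predictor idea can be sketched for a VM autoscaler whose "plant" is the VM count: instead of feeding back only the running VMs (the delayed measurement), the controller feeds back running plus still-booting VMs, which is what a delay-free model would output. This is a simplified sketch, not the paper's compensator: it assumes a fixed startup delay of `delay` steps (the paper handles time-varying delay), immediate VM releases, and a hypothetical proportional gain with whole-VM quantization. The uncompensated variant is a naive feedback loop, not the threshold-and-cooldown baseline from the paper.

```python
import math
from collections import deque


def simulate_autoscaler(target, delay, steps, gain=0.5, compensate=True):
    """Return the number of running VMs at each step (hypothetical model)."""
    running = 0
    booting = deque([0] * delay)  # VMs requested but still in startup deadtime
    trace = []
    for _ in range(steps):
        running += booting.popleft()  # VMs that finished booting this step
        pending = sum(booting)        # VMs still booting
        # Smith-predictor idea: use the delay-free model output
        # (running + pending) instead of the delayed measurement alone.
        feedback = running + pending if compensate else running
        error = target - feedback
        # Quantize to whole VMs; any nonzero error moves at least one VM.
        request = int(math.copysign(math.ceil(abs(gain * error)), error))
        if request > 0:
            booting.append(request)   # new VMs enter the deadtime
        else:
            booting.append(0)
            running = max(0, running + request)  # releases assumed immediate
        trace.append(running)
    return trace
```

With `target=10` and `delay=5`, the compensated loop reaches exactly 10 VMs without overshoot, while the uncompensated loop keeps requesting VMs during the deadtime and overshoots.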
Application of Control Theory to a Commercial Mobile Service Support System
The Mobile Service Support system (MSS), which Ericsson AB develops, handles the setup of new subscribers and services in a mobile network. Experience from deployed systems shows that traffic monitoring and control of the system will be crucial for handling overload situations that may occur during sudden traffic surges. In this paper, we identify and explore some important control challenges for this type of system. Further, we present analysis and experiments showing some advantages of the proposed solutions. First, we develop a load-dependent server model for the system, which is validated in testbed experiments. Then, we propose a control design based on the model and a method for estimating response times and arrival rates. The main contribution of this paper is to show how control-theoretic methods and analysis can be used for commercial telecom systems. Parts of our results have been implemented in commercial products, validating the strength of our work.
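The abstract does not say how response times and arrival rates are estimated, so the following is only an illustration of the general task: a hypothetical event-driven estimator that updates exponentially weighted moving averages (EWMA) of inter-arrival times and response times whenever a request arrives or departs. The class name and the smoothing factor `alpha` are assumptions.

```python
class EventEstimator:
    """Hypothetical EWMA estimator updated on arrival and departure events."""

    def __init__(self, alpha: float):
        self.alpha = alpha            # smoothing factor in (0, 1] (assumed)
        self.last_arrival = None
        self.mean_interarrival = None
        self.mean_response = None

    def on_arrival(self, t: float) -> None:
        # Smooth the gap between consecutive arrival timestamps.
        if self.last_arrival is not None:
            gap = t - self.last_arrival
            if self.mean_interarrival is None:
                self.mean_interarrival = gap
            else:
                self.mean_interarrival += self.alpha * (gap - self.mean_interarrival)
        self.last_arrival = t

    def on_departure(self, response_time: float) -> None:
        # Smooth the response time reported when a request completes.
        if self.mean_response is None:
            self.mean_response = response_time
        else:
            self.mean_response += self.alpha * (response_time - self.mean_response)

    @property
    def arrival_rate(self):
        # Rate = 1 / mean inter-arrival time; undefined until two arrivals.
        if not self.mean_interarrival:
            return None
        return 1.0 / self.mean_interarrival
```

Arrivals every 0.1 s drive `arrival_rate` toward 10 requests per second; a smaller `alpha` gives smoother but slower-reacting estimates.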
Control-theoretical load-balancing for cloud applications with brownout
Cloud applications are often subject to unexpected events like flash crowds and hardware failures. Without predictable behaviour, users may abandon an unresponsive application. This problem has been partially solved on two separate fronts: first, by adding a self-adaptive feature called brownout inside cloud applications to bound response times by modulating user experience, and, second, by introducing replicas (copies of the application with the same functionality) for redundancy and adding a load-balancer to direct incoming traffic. However, existing load-balancing strategies interfere with brownout self-adaptivity. Load-balancers are often based on response times, which are already controlled by the self-adaptive features of the application; hence, they are not a good indicator of how well a replica is performing. In this paper, we present novel load-balancing strategies specifically designed to support brownout applications. They base their decisions not on response time, but on user experience degradation. We implemented our strategies in a self-adaptive application simulator, together with some state-of-the-art solutions. Results obtained in multiple scenarios show that the proposed strategies bring significant improvements compared to the state-of-the-art ones.
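A load-balancer that decides on user experience degradation rather than response time can be sketched as follows. This is a hypothetical heuristic, not one of the paper's strategies: each replica reports its dimmer value (lower means more optional content is being dropped), and routing weights are shifted toward replicas whose dimmer is above the average. The function names, `gain`, and `floor` are assumptions.

```python
import random


def update_weights(weights, dimmers, gain=0.1, floor=0.01):
    """Shift routing weights toward replicas that degrade user experience least.

    weights: current routing fractions (summing to 1)
    dimmers: per-replica dimmer values in [0, 1]; lower = more degradation
    """
    mean_dimmer = sum(dimmers) / len(dimmers)
    # Above-average dimmer -> more traffic; the floor keeps every replica probed.
    raw = [max(floor, w + gain * (d - mean_dimmer)) for w, d in zip(weights, dimmers)]
    total = sum(raw)
    return [r / total for r in raw]


def pick_replica(weights, rng=random):
    # Weighted random choice of the replica index for the next request.
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]
```

If all replicas report the same dimmer, the weights are left unchanged, so the scheme only reacts to differences in degradation.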
Event-Based Response Time Estimation
Response time is a measure of quality of service in computer systems. Estimation techniques suitable for support systems for mobile phone systems are explored. These systems are complex queueing systems with large databases. The traffic generated by users and system administrators changes rapidly; some loads can be measured, others cannot. Attempts to capture all details give models that are not suitable for on-line control. Estimators based on continuous flow models with event-based measurements are designed using extended Kalman filtering. The estimators are compared with simple data-based estimators.
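An event-based extended Kalman filter on a continuous flow model can be sketched in scalar form. The model here is an assumption, not the paper's: a single-server fluid model dx/dt = lam - mu * x / (1 + x) for the number of jobs x, whose equilibrium x = lam / (mu - lam) matches the M/M/1 mean, and a measured response time y = (1 + x) / mu reported at each departure event. The time update runs every step; the measurement update runs only when an event delivers a measurement.

```python
def ekf_predict(x, P, lam, mu, dt, q):
    """Time update of the fluid model dx/dt = lam - mu * x / (1 + x)."""
    f = lam - mu * x / (1.0 + x)
    F = 1.0 - dt * mu / (1.0 + x) ** 2   # Jacobian of the Euler step at x
    x_next = max(0.0, x + dt * f)        # queue length cannot be negative
    P_next = F * P * F + q               # q: process noise variance (assumed)
    return x_next, P_next


def ekf_update(x, P, y, mu, r):
    """Measurement update, run only when a departure event reports y."""
    h = (1.0 + x) / mu                   # predicted response time
    H = 1.0 / mu                         # dh/dx
    S = H * P * H + r                    # r: measurement noise variance (assumed)
    K = P * H / S                        # Kalman gain
    x_next = max(0.0, x + K * (y - h))
    P_next = (1.0 - K * H) * P
    return x_next, P_next


# Usage: lam = 8, mu = 10, a departure event every 10 steps reporting y = 0.5 s.
lam, mu, dt = 8.0, 10.0, 0.01
x, P = 0.0, 10.0
for k in range(200):
    x, P = ekf_predict(x, P, lam, mu, dt, q=1e-3)
    if k % 10 == 9:
        x, P = ekf_update(x, P, y=1.0 / (mu - lam), mu=mu, r=1e-3)
```

Starting from an empty-queue guess, the estimate moves to about 4 jobs, the equilibrium of the assumed model for these rates.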
Performance modelling of database servers in a Telecommunication Service Management system
Resource optimization mechanisms, such as admission control and traffic management, require accurate performance models that capture the dynamics of the system during high loads. The main objective of this paper is to develop an accurate performance model for database servers in a telecommunication service management system. We investigate the use of a server model with load dependency: concurrent requests add load to the system and decrease the server capacity. We derive explicit equations for the state probabilities, the average number of jobs in the system, and the average response times. Further, we present some heuristics for tuning the parameters to given measurement data. Finally, using testbed experiments, we validate that the model accurately captures the dynamics of a database server with a write-heavy workload.
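A load-dependent server of this kind can be modelled as a birth-death chain in which the total service rate depends on the number of concurrent jobs n. The sketch below computes the state probabilities p_n = p_0 * prod(lam / mu(i)) for i = 1..n, the mean number of jobs, and the mean response time via Little's law. The capacity function `degrading_rate` (total rate mu / (1 + alpha * (n - 1))) and the finite buffer `capacity` are hypothetical choices, not the paper's equations.

```python
def state_probabilities(lam, service_rate, capacity):
    """Birth-death chain on states 0..capacity with load-dependent service."""
    weights = [1.0]
    for n in range(1, capacity + 1):
        weights.append(weights[-1] * lam / service_rate(n))
    total = sum(weights)
    return [w / total for w in weights]


def mean_jobs(p):
    return sum(n * pn for n, pn in enumerate(p))


def mean_response_time(lam, p):
    # Little's law with the accepted (non-blocked) arrival rate.
    accepted = lam * (1.0 - p[-1])
    return mean_jobs(p) / accepted


def degrading_rate(mu, alpha):
    # Hypothetical capacity model: each extra concurrent job lowers the
    # total service rate; alpha = 0 gives a plain M/M/1/K queue.
    return lambda n: mu / (1.0 + alpha * (n - 1))
```

With `alpha=0` the chain reduces to M/M/1/K, where p_0 = (1 - rho) / (1 - rho^(K+1)); a positive `alpha` makes jobs accumulate faster as concurrency grows.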
Control-based load-balancing techniques: Analysis and performance evaluation via a randomized optimization approach
Cloud applications are often subject to unexpected events like flash crowds and hardware failures. Users who expect predictable behavior may abandon an unresponsive application when these events occur. Researchers and engineers have addressed this problem on two separate fronts: first, they introduced replicas (copies of the application with the same functionality) for redundancy and scalability; second, they added a self-adaptive feature called brownout inside cloud applications to bound response times by modulating user experience. The presence of multiple replicas requires a dedicated component to direct incoming traffic: a load-balancer. Existing load-balancing strategies based on response times interfere with the response time controller developed for brownout-compliant applications. In fact, the brownout approach bounds response times using a control action. Hence, the response time, which was used to aid load-balancing decisions, is not a good indicator of how well a replica is performing. To fix this issue, this paper reviews several proposals for brownout-aware load-balancing and provides a comprehensive experimental evaluation that compares them. To provide formal guarantees on the load-balancing performance, we use a randomized optimization approach and apply scenario theory. We perform an extensive set of experiments on a real machine, extending the popular lighttpd web server and load-balancer and obtaining a production-ready implementation. Experimental results show an improvement in user experience over Shortest Queue First (SQF), believed to be near-optimal in the non-adaptive case. The improved user experience is obtained while preserving response time predictability.
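Scenario theory can turn sampled experiments into a probabilistic performance guarantee. The sketch below treats "find the smallest bound gamma such that cost(scenario) <= gamma for every sampled scenario" as a scenario program with one decision variable; its solution is the worst sampled cost, and with N samples satisfying (1 - eps)^N <= beta, a new scenario exceeds that bound with probability at most eps, with confidence 1 - beta. The function `run_scenario` stands in for a hypothetical load-balancing experiment; the abstract does not give the paper's exact formulation.

```python
import math
import random


def required_scenarios(eps, beta):
    """Smallest N with (1 - eps)**N <= beta (one decision variable)."""
    return math.ceil(math.log(beta) / math.log(1.0 - eps))


def scenario_bound(run_scenario, eps, beta, seed=0):
    """Worst observed cost over N i.i.d. scenarios, plus N.

    With confidence 1 - beta, the cost of a fresh scenario exceeds the
    returned bound with probability at most eps.
    """
    rng = random.Random(seed)
    n = required_scenarios(eps, beta)
    return max(run_scenario(rng) for _ in range(n)), n


# Usage with a placeholder experiment drawing a random cost in [0, 1).
bound, n = scenario_bound(lambda rng: rng.random(), eps=0.05, beta=1e-6)
```

For eps = 0.05 and beta = 1e-6, 270 scenarios are needed; the required N grows only logarithmically as beta shrinks.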
Improving Cloud Service Resilience using Brownout-Aware Load-Balancing
We focus on improving the resilience of cloud services (e.g., e-commerce websites) when correlated or cascading failures lead to a shortage of computing capacity. We study how to extend the classical cloud service architecture, composed of a load-balancer and replicas, with a recently proposed self-adaptive paradigm called brownout. Such services are able to reduce their capacity requirements by degrading user experience (e.g., disabling recommendations). Combining resilience with the brownout paradigm is, to date, an open practical problem. The issue is to ensure that replica self-adaptivity does not confuse the load-balancing algorithm, overloading replicas that are already struggling with capacity shortage. For example, load-balancing strategies based on response times are not able to decide which replicas should be selected, since the response times are already controlled by the brownout paradigm. In this paper, we propose two novel brownout-aware load-balancing algorithms. To test their practical applicability, we extended the popular lighttpd web server and load-balancer, thus obtaining a production-ready implementation. Experimental evaluation shows that the approach enables cloud services to remain responsive despite cascading failures. Moreover, when compared to Shortest Queue First (SQF), believed to be near-optimal in the non-adaptive case, our algorithms improve user experience by 5%, with high statistical significance, while preserving response time predictability.
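For reference, the Shortest Queue First (SQF) baseline mentioned above routes each request to the replica with the fewest queued requests. A minimal sketch follows; the function name and the random tie-breaking are assumptions. Note that it sees only queue lengths: a replica that has lowered its dimmer serves requests faster and can look attractive even though its users are already receiving a degraded experience.

```python
import random


def shortest_queue_first(queue_lengths, rng=random):
    """Return the index of a replica with the fewest queued requests.

    Ties are broken uniformly at random so equally loaded replicas share traffic.
    """
    shortest = min(queue_lengths)
    candidates = [i for i, q in enumerate(queue_lengths) if q == shortest]
    return rng.choice(candidates)
```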