
    Dynamical Modeling of Cloud Applications for Runtime Performance Management

    Cloud computing has quickly grown to become an essential component in many modern-day software applications. It allows consumers, such as a provider of some web service, to quickly and on demand obtain the computational resources needed to run their applications. It is desirable for these service providers to keep the running cost of their cloud application low while adhering to various performance constraints. This is made difficult by the dynamics imposed by, e.g., resource contention or a changing user arrival rate, and by the fact that there exist multiple ways of influencing the performance of a running cloud application. To facilitate decision making in this environment, performance models can be introduced that relate the workload and different actions to important performance metrics.

    In this thesis, such performance models of cloud applications are studied. In particular, we focus on modeling using queueing theory and on the fluid model for approximating the often intractable dynamics of the queue lengths. First, existing results on how the fluid model can be obtained from the mean-field approximation of a closed queueing network are simplified and extended to allow for mixed networks. The queues are allowed to follow the processor sharing or delay disciplines, and can have multiple classes with phase-type service times. An improvement to this fluid model is then presented to increase accuracy when the \emph{system size}, i.e., the number of servers, initial population, and arrival rate, is small. Furthermore, a closed-form approximation of the response time CDF is presented. The methods are tested in a series of simulation experiments and shown to be accurate.

    This mean-field fluid model is then used to derive a general fluid model for microservices with interservice delays. The model is shown to be completely extractable at runtime in a distributed fashion. It is further evaluated on a simple microservice application and found to accurately predict important performance metrics in most cases. Furthermore, a method is devised to reduce the cost of a running application by tuning load balancing parameters between replicas. The method is built on gradient stepping, applying automatic differentiation to the fluid model. This allows for arbitrarily defined cost functions and constraints, most notably including different response time percentiles. The method is tested on a simple application distributed over multiple computing clusters and is shown to reduce costs while adhering to percentile constraints.

    Finally, modeling of request cloning is studied using the novel concept of synchronized service. This allows certain forms of cloning over servers, each modeled with a single queue, to be equivalently expressed as one single queue. The concept is very general regarding the involved queueing disciplines and distributions, but it introduces new, less realistic assumptions. How the equivalent queue model is affected by relaxing these assumptions is studied for the processor sharing discipline, and an extension to enable modeling of speculative execution is made. In a simulation campaign, it is shown that these relaxations have only a minor effect in certain cases.
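
    As a rough illustration of the fluid-model idea described above (a minimal sketch, not the thesis's general mean-field model), the snippet below integrates a single multi-server fluid queue, dx/dt = lambda - mu * min(x, C), and reads off a mean response-time estimate via Little's law. The arrival rate, service rate, and server count are assumed values.

        # Minimal fluid-model sketch for one multi-server queue (illustrative only;
        # the thesis treats general mixed networks with phase-type service times).
        from scipy.integrate import solve_ivp

        LAMBDA = 80.0   # assumed arrival rate (requests/s)
        MU = 10.0       # assumed per-server service rate (requests/s)
        C = 10          # assumed number of servers

        def fluid_rhs(t, x):
            # dx/dt = arrivals - departures; departures saturate at C busy servers
            return [LAMBDA - MU * min(x[0], C)]

        sol = solve_ivp(fluid_rhs, (0.0, 5.0), [0.0])
        x_end = sol.y[0, -1]
        print(f"fluid queue length after 5 s: {x_end:.2f}")
        # Little's law gives a rough mean response-time estimate near steady state
        print(f"approximate mean response time: {x_end / LAMBDA:.3f} s")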

    Detecting Similarities in Virtual Machine Behavior for Cloud Monitoring using Smoothed Histograms

    The growing size and complexity of cloud systems create scalability issues for resource monitoring and management. While most existing solutions consider each Virtual Machine (VM) as a black box with independent characteristics, we embrace a new perspective where VMs with similar behaviors in terms of resource usage are clustered together. We argue that this new approach has the potential to address scalability issues in cloud monitoring and management. In this paper, we propose a technique to cluster VMs starting from the usage of multiple resources, assuming no knowledge of the services executed on them. This innovative technique models VM behavior by exploiting the probability histograms of their resource usage, and performs smoothing-based noise reduction and selection of the most relevant information to consider in the clustering process. Through extensive evaluation, we show that our proposal achieves high and stable performance in terms of automatic VM clustering, and can reduce the monitoring requirements of cloud systems.
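
    A minimal sketch of the clustering idea under stated assumptions: each VM is represented by a smoothed histogram of one resource's usage, histograms are compared with the Bhattacharyya distance, and similar VMs are grouped by hierarchical clustering. The bin count, smoothing width, distance measure, and clustering threshold are all illustrative assumptions, not the paper's parameters.

        # Sketch: model each VM by a smoothed usage histogram and cluster VMs with
        # similar histograms. All parameters below are illustrative assumptions.
        import numpy as np
        from scipy.ndimage import gaussian_filter1d
        from scipy.cluster.hierarchy import linkage, fcluster
        from scipy.spatial.distance import squareform

        def smoothed_histogram(samples, bins=50, sigma=2.0):
            hist, _ = np.histogram(samples, bins=bins, range=(0.0, 100.0), density=True)
            hist = gaussian_filter1d(hist, sigma=sigma)  # smoothing-based noise reduction
            return hist / hist.sum()                     # normalize to a distribution

        def bhattacharyya(p, q):
            # 0 for identical distributions, larger for dissimilar ones
            return -np.log(np.sum(np.sqrt(p * q)) + 1e-12)

        rng = np.random.default_rng(0)
        vms = [rng.normal(30, 5, 1000), rng.normal(31, 5, 1000), rng.normal(70, 8, 1000)]
        hists = [smoothed_histogram(v) for v in vms]

        n = len(hists)
        dist = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                dist[i, j] = dist[j, i] = bhattacharyya(hists[i], hists[j])

        labels = fcluster(linkage(squareform(dist), method="average"),
                          t=0.5, criterion="distance")
        print(labels)  # the two similar VMs share a cluster, the third is separate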

    On Allocation Policies for Power and Performance

    With the increasing popularity of Internet-based services and applications, power efficiency is becoming a major concern for data center operators, as high electricity consumption not only increases greenhouse gas emissions, but also increases the cost of running the server farm itself. In this paper we address the problem of maximizing the revenue of a service provider by means of dynamic allocation policies that run the minimum number of servers necessary to meet users' performance requirements. The results of several experiments executed using Wikipedia traces are described, showing that the proposed schemes work well even if the workload is non-stationary. Since any resource allocation policy requires the use of forecasting mechanisms, various schemes for compensating errors in the load forecasts are presented and evaluated. (Comment: 8 pages, 11 figures; 2010 11th IEEE/ACM International Conference on Grid Computing (GRID), pp. 313-320, E2GC2-2010 workshop.)
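
    As a toy sketch of such a dynamic allocation policy (not one of the schemes evaluated in the paper), the snippet below sizes the server pool from a load forecast padded by a safety margin that absorbs forecast errors; the per-server capacity and the 15% margin are assumptions.

        # Toy dynamic allocation policy: run just enough servers for the forecast
        # load, padded by a safety margin to absorb forecast errors (assumed values).
        import math

        SERVER_CAPACITY = 100.0  # assumed requests/s one server handles within the SLA
        SAFETY_MARGIN = 0.15     # assumed head-room compensating under-forecasting

        def servers_to_run(forecast_rps, min_servers=1):
            padded = forecast_rps * (1.0 + SAFETY_MARGIN)
            return max(min_servers, math.ceil(padded / SERVER_CAPACITY))

        for forecast in (250.0, 800.0, 40.0):
            print(forecast, "->", servers_to_run(forecast), "servers")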

    Utility-based Allocation of Resources to Virtual Machines in Cloud Computing

    In recent years, cloud computing has gained widespread use as a new computing model that offers elastic resources on demand, in a pay-as-you-go fashion. One important goal of a cloud provider is dynamic allocation of Virtual Machines (VMs) according to workload changes in order to keep application performance at Service Level Agreement (SLA) levels while reducing resource costs. The problem is to find an adequate trade-off between the two conflicting objectives of application performance and resource costs. In this dissertation, resource allocation solutions for this trade-off are proposed by expressing application performance and resource costs in a utility function. The proposed solutions allocate VM resources at the global data center level and at the local physical machine level by optimizing the utility function. The utility function, given as the difference between performance and costs, represents the profit of the cloud provider and captures the performance-cost trade-off in a flexible and natural way.

    For global-level resource allocation, a two-tier resource management solution is developed. The first tier consists of local node controllers that dynamically allocate resource shares to VMs so as to maximize a local node utility function. The second tier is a global controller that makes VM live-migration decisions in order to maximize a global utility function. Experimental results show that optimizing the global utility function by changing the number of physical nodes according to the workload maintains performance at acceptable levels while reducing costs.

    To allocate multiple resources at the local physical machine level, a solution based on feedback control theory and utility function optimization is proposed. It dynamically allocates shares of multiple VM resources such as CPU, memory, disk, and network I/O bandwidth. To address the complex non-linearities between VM performance and resource allocations that exist in shared virtualized infrastructures, a solution is proposed that allocates VM resources to optimize a utility function based on application performance and power modelling. An Artificial Neural Network (ANN) is used to build an online model of the relationship between VM resource allocations and application performance, and another between VM resource allocations and physical machine power. To cope with long utility optimization times when the number of VMs grows, a distributed resource manager is proposed. It consists of several ANNs, each responsible for modelling and resource allocation of one VM, while exchanging information with the other ANNs to coordinate resource allocations. Experiments, in simulated and realistic environments, show that the distributed ANN resource manager achieves better performance-power trade-offs than a centralized version and a distributed non-coordinated resource manager.

    To deal with the difficulty of building an accurate online application model and the long model adaptation time, a model-free resource management solution based on fuzzy control is proposed. It optimizes a utility function using a hill-climbing search heuristic implemented as fuzzy rules. To cope with long utility optimization times for larger numbers of VMs, a multi-agent fuzzy controller is developed in which each agent, in parallel with the others, optimizes its own local utility function. The fuzzy control approach eliminates the need to build a model beforehand and provides a robust solution even for noisy measurements. Experimental results show that the multi-agent fuzzy controller performs better in terms of utility value than a centralized fuzzy control version and a state-of-the-art adaptive optimal control approach, especially for larger numbers of VMs.

    Finally, to address some of the problems of reactive VM resource allocation approaches, a proactive resource allocation solution is proposed. This approach decides on VM resource allocations based on resource demand prediction, using a machine learning technique called Support Vector Machine (SVM). To deal with interdependencies between VMs of the same multi-tier application, cross-correlation demand prediction over the resource usage time series of all VMs of the multi-tier application is applied. As experiments show, this results in improved prediction accuracy and application performance.
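
    The core utility-based idea can be sketched with a toy example: a utility defined as performance minus cost, maximized by hill climbing over a single VM's CPU share, loosely mirroring the hill-climbing search heuristic mentioned above. The performance and cost curves below are invented placeholders; the dissertation learns such relationships online (with ANNs) or avoids explicit models altogether (fuzzy control).

        # Toy utility = performance - cost, maximized by hill climbing over a
        # VM's CPU share. Both curves are invented placeholders, not learned models.
        def performance(cpu_share):
            return 100.0 * cpu_share / (cpu_share + 0.2)  # assumed saturating curve

        def cost(cpu_share):
            return 40.0 * cpu_share                       # assumed linear resource cost

        def utility(cpu_share):
            return performance(cpu_share) - cost(cpu_share)

        def hill_climb(share=0.1, step=0.05, iters=100):
            for _ in range(iters):
                candidates = [share, min(1.0, share + step), max(0.0, share - step)]
                share = max(candidates, key=utility)
            return share

        best = hill_climb()
        print(f"best CPU share {best:.2f}, utility {utility(best):.1f}")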

    Towards Inference Delivery Networks: Distributing Machine Learning with Optimality Guarantees

    An increasing number of applications rely on complex inference tasks that are based on machine learning (ML). Currently, there are two options to run such tasks: either they are served directly by the end device (e.g., smartphones, IoT equipment, smart vehicles), or they are offloaded to a remote cloud. Both options may be unsatisfactory for many applications: local models may have inadequate accuracy, while the cloud may fail to meet delay constraints. In this paper, we present the novel idea of \emph{inference delivery networks} (IDNs), networks of computing nodes that coordinate to satisfy ML inference requests, achieving the best trade-off between latency and accuracy. IDNs bridge the dichotomy between device and cloud execution by integrating inference delivery at the various tiers of the infrastructure continuum (access, edge, regional data center, cloud). We propose a distributed dynamic policy for ML model allocation in an IDN, by which each node dynamically updates its local set of inference models based on requests observed during the recent past plus limited information exchange with its neighboring nodes. Our policy offers strong performance guarantees in an adversarial setting and shows improvements over greedy heuristics with similar complexity in realistic scenarios.
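
    A heavily simplified, static sketch of the per-node allocation decision: keep the inference models with the highest demand-weighted benefit that fit the node's capacity. The paper's actual policy is dynamic and comes with adversarial guarantees that this greedy snapshot does not attempt to reproduce; the model names, sizes, accuracy gains, and demand counts are made up for illustration.

        # A much-simplified snapshot of per-node model placement in an IDN:
        # greedily keep the models with the highest recent demand-weighted
        # accuracy gain that fit the node's capacity (illustrative only).
        from dataclasses import dataclass

        @dataclass
        class Model:
            name: str
            size_gb: float
            accuracy_gain: float  # benefit vs. falling back to a remote tier

        def allocate(models, recent_requests, capacity_gb):
            scored = sorted(models,
                            key=lambda m: recent_requests.get(m.name, 0) * m.accuracy_gain,
                            reverse=True)
            chosen, used = [], 0.0
            for m in scored:
                if used + m.size_gb <= capacity_gb:
                    chosen.append(m.name)
                    used += m.size_gb
            return chosen

        catalog = [Model("resnet50", 0.1, 0.05), Model("bert-large", 1.3, 0.12),
                   Model("yolo-v5", 0.3, 0.08)]
        demand = {"resnet50": 900, "bert-large": 150, "yolo-v5": 400}
        print(allocate(catalog, demand, capacity_gb=0.5))  # ['resnet50', 'yolo-v5']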

    Auto-scaling techniques for cloud-based Complex Event Processing

    One key topic in cloud computing is elasticity, the ability of the cloud environment to adapt resource assignment to the workload demand in a timely manner. According to the cloud on-demand model, the infrastructure should be able to scale up and down in response to unpredictable workloads, in order to achieve both a guaranteed service level and cost efficiency. This work addresses the cloud elasticity problem, with particular reference to Complex Event Processing (CEP) systems. CEP systems are designed to process large volumes of event-driven data streams and to continuously provide results with low latency and in real time. CEP systems need to adapt to changing query and event loads. Because of their high computational requirements and varying loads, CEP systems are distributed systems running on cloud infrastructures. In this work we review cloud computing auto-scaling solutions and study their suitability in the CEP model. We implement some of the solutions in a CEP prototype and evaluate the experimental results.
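
    One of the simplest auto-scaling techniques of the kind surveyed can be sketched as a reactive threshold rule: add an operator replica after sustained high utilization and remove one after sustained low utilization. The thresholds, observation window, and replica bounds below are assumed values.

        # Reactive threshold auto-scaler sketch: scale out after sustained high
        # utilization, scale in after sustained low utilization (assumed values).
        def autoscale(replicas, utilization_history, high=0.8, low=0.3,
                      window=3, min_replicas=1, max_replicas=32):
            recent = utilization_history[-window:]
            if len(recent) < window:
                return replicas                          # not enough samples yet
            if all(u > high for u in recent):
                return min(max_replicas, replicas + 1)   # sustained overload: scale out
            if all(u < low for u in recent):
                return max(min_replicas, replicas - 1)   # sustained idling: scale in
            return replicas

        print(autoscale(4, [0.85, 0.90, 0.88]))  # -> 5
        print(autoscale(4, [0.20, 0.25, 0.10]))  # -> 3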