1,029 research outputs found

    Model-Based Dynamic Resource Management for Service Oriented Clouds

    Cloud computing is a flexible platform for software as a service, as more and more applications are deployed on the cloud. Major challenges in cloud computing include how to characterize the workload of the applications and how to manage the cloud resources efficiently by sharing them among many applications. The current state of the art considers a simplified model of the system, either ignoring the software components altogether or ignoring the relationship between individual software services. This thesis considers the following resource management problems for cloud-based service providers: (i) how to estimate the parameters of the current workload, (ii) how to meet Quality of Service (QoS) targets while minimizing infrastructure cost, and (iii) how to allocate resources considering the performance costs of virtual machine reconfigurations. To address the above problems, we propose a model-based feedback loop approach. The cloud infrastructure, the services, and the applications are modelled using Layered Queuing Models (LQM). These models are then optimized. Mathematical techniques are used to reduce the complexity of the models and address the scalability issues. The main contributions of this thesis are: (i) Extended Kalman Filter (EKF) based techniques improved by dynamic clustering for scalable estimation of workload parameters, (ii) a combination of adaptive empirical models (tuned during runtime) and stepwise optimizations for improving the overall allocation performance, and (iii) dynamic service placement algorithms that consider the cost of virtual machine reconfiguration.
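As a rough illustration of EKF-based workload parameter estimation, the sketch below tracks a single service demand from measured response times. It is not the thesis's method: it assumes one M/M/1 station with response time R = D / (1 - lambda * D) instead of a layered queuing model, omits the dynamic clustering, and every name and noise setting (`ekf_estimate_demand`, `q`, `r`, the clamp on `d`) is hypothetical.

```python
def ekf_estimate_demand(response_times, arrival_rate,
                        d0=0.02, p0=1e-3, q=1e-6, r=1e-4):
    """Scalar extended Kalman filter for a service demand D (seconds).

    Assumed (illustrative) measurement model: R = D / (1 - arrival_rate * D),
    the mean response time of an M/M/1 queue. The state follows a random walk.
    """
    d, p = d0, p0
    d_max = 0.99 / arrival_rate          # keep the assumed queue stable
    for z in response_times:
        # Predict: random-walk state, so only the variance grows.
        p = p + q
        # Linearize the nonlinear measurement model around the estimate.
        denom = 1.0 - arrival_rate * d
        h = d / denom                    # predicted response time
        jac = 1.0 / denom ** 2           # dh/dD
        # Update with the measured response time z.
        gain = p * jac / (jac * p * jac + r)
        d = d + gain * (z - h)
        d = min(max(d, 1e-9), d_max)     # hypothetical safety clamp
        p = (1.0 - gain * jac) * p
    return d


if __name__ == "__main__":
    # Noise-free measurements generated with D = 0.05 s at 10 requests/s.
    print(ekf_estimate_demand([0.1] * 50, arrival_rate=10.0))
```

The Jacobian grows quickly as utilization approaches one, so the clamp is a pragmatic guard against a single update pushing the assumed model into an unstable region.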

    Evolutionary Neural Network Based Energy Consumption Forecast for Cloud Computing

    The success of Hadoop, an open-source framework for massively parallel and distributed computing, is expected to drive energy consumption of cloud data centers to new highs as service providers continue to add new infrastructure, services and capabilities to meet market demands. While current research on data center airflow management, HVAC (Heating, Ventilation and Air Conditioning) system design, workload distribution and optimization, and energy-efficient computing hardware and software is contributing to improved energy efficiency, energy forecasting in cloud computing remains a challenge. This paper reports an evolutionary-computation-based modeling and forecasting approach to this problem. In particular, an evolutionary neural network is developed and structurally optimized to forecast the energy load of a cloud data center. The results, both in terms of forecasting speed and accuracy, suggest that the evolutionary neural network approach to energy consumption forecasting for cloud computing is highly promising.

    Dynamical Modeling of Cloud Applications for Runtime Performance Management

    Cloud computing has quickly grown to become an essential component in many modern-day software applications. It allows consumers, such as a provider of some web service, to quickly and on demand obtain the necessary computational resources to run their applications. It is desirable for these service providers to keep the running cost of their cloud application low while adhering to various performance constraints. This is made difficult due to the dynamics imposed by, e.g., resource contentions or changing arrival rate of users, and the fact that there exist multiple ways of influencing the performance of a running cloud application. To facilitate decision making in this environment, performance models can be introduced that relate the workload and different actions to important performance metrics. In this thesis, such performance models of cloud applications are studied. In particular, we focus on modeling using queueing theory and on the fluid model for approximating the often intractable dynamics of the queue lengths. First, existing results on how the fluid model can be obtained from the mean-field approximation of a closed queueing network are simplified and extended to allow for mixed networks. The queues are allowed to follow the processor sharing or delay disciplines, and can have multiple classes with phase-type service times. An improvement to this fluid model is then presented to increase accuracy when the system size, i.e., number of servers, initial population, and arrival rate, is small. Furthermore, a closed-form approximation of the response time CDF is presented. The methods are tested in a series of simulation experiments and shown to be accurate. This mean-field fluid model is then used to derive a general fluid model for microservices with interservice delays. The model is shown to be completely extractable at runtime in a distributed fashion. 
It is further evaluated on a simple microservice application and found to accurately predict important performance metrics in most cases. Furthermore, a method is devised to reduce the cost of a running application by tuning load balancing parameters between replicas. The method is built on gradient stepping by applying automatic differentiation to the fluid model. This allows for arbitrarily defined cost functions and constraints, most notably including different response time percentiles. The method is tested on a simple application distributed over multiple computing clusters and is shown to reduce costs while adhering to percentile constraints. Finally, modeling of request cloning is studied using the novel concept of synchronized service. This allows certain forms of cloning over servers, each modeled with a single queue, to be equivalently expressed as one single queue. The concept is very general regarding the involved queueing discipline and distributions, but instead introduces new, less realistic assumptions. How the equivalent queue model is affected by relaxing these assumptions is studied considering the processor sharing discipline, and an extension to enable modeling of speculative execution is made. In a simulation campaign, it is shown that these relaxations have only a minor effect in certain cases.
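To make the idea of a fluid approximation concrete, here is a minimal sketch for one open, single-class queue with exponential service, using the standard ODE dx/dt = lambda - mu * min(x, k) integrated with forward Euler. It is far simpler than the mean-field model described above (no multiple classes, phase-type service times, or interservice delays), and the function name and parameters are hypothetical.

```python
def simulate_fluid_queue(arrival_rate, service_rate, servers,
                         x0=0.0, horizon=50.0, dt=0.01):
    """Forward-Euler integration of dx/dt = lambda - mu * min(x, k).

    x is the (continuous) number of requests in the queue, lambda the
    arrival rate, mu the per-server service rate and k the number of servers.
    Returns the fluid queue length at time `horizon`.
    """
    x = x0
    steps = int(round(horizon / dt))
    for _ in range(steps):
        # At most k requests are in service at once.
        drain = service_rate * min(x, servers)
        x = max(0.0, x + dt * (arrival_rate - drain))
    return x


if __name__ == "__main__":
    lam = 8.0
    x_end = simulate_fluid_queue(lam, service_rate=1.0, servers=10)
    # Little's law turns the fluid queue length into a mean response time.
    print(x_end, x_end / lam)
```

When lambda < mu * k the trajectory settles at x = lambda / mu; otherwise the queue length grows without bound, which is how the fluid model signals overload.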

    Dynamic service placement in geographically distributed clouds

    Large-scale online service providers have been increasingly relying on geographically distributed cloud infrastructures for service hosting and delivery. In this context, a key challenge faced by service providers is to determine the locations where service applications should be placed such that the hosting cost is minimized while key performance requirements (e.g. response time) are assured. Furthermore, the dynamic nature of both demand pattern and infrastructure cost favors a dynamic solution to this problem. Currently, most existing solutions for service placement either ignore dynamics or fail to achieve both objectives at the same time. In this paper, we present a framework for dynamic service placement problems based on control- and game-theoretic models. In particular, we present a solution that optimizes the desired objective dynamically over time according to both demand and resource price fluctuations. We further consider the case where multiple service providers compete for resources in a dynamic manner, and show that there is a Nash equilibrium solution that is socially optimal. Using simulations based on realistic topologies, demand and resource prices, we demonstrate the effectiveness of our solution in realistic settings.

    Self-management for large-scale distributed systems

    Autonomic computing aims at making computing systems self-managing by using autonomic managers in order to reduce obstacles caused by management complexity. This thesis presents results of research on self-management for large-scale distributed systems. This research was motivated by the increasing complexity of computing systems and their management. In the first part, we present our platform, called Niche, for programming self-managing component-based distributed applications. In our work on Niche, we have faced and addressed the following four challenges in achieving self-management in a dynamic environment characterized by volatile resources and high churn: resource discovery, robust and efficient sensing and actuation, management bottleneck, and scale. We present results of our research on addressing the above challenges. Niche implements the autonomic computing architecture, proposed by IBM, in a fully decentralized way. Niche supports a network-transparent view of the system architecture simplifying the design of distributed self-management. Niche provides a concise and expressive API for self-management. The implementation of the platform relies on the scalability and robustness of structured overlay networks. We proceed by presenting a methodology for designing the management part of a distributed self-managing application. We define design steps that include partitioning of management functions and orchestration of multiple autonomic managers. In the second part, we discuss robustness of management and data consistency, which are necessary in a distributed system. Dealing with the effect of churn on management increases the complexity of the management logic and thus makes its development time consuming and error prone. We propose the abstraction of Robust Management Elements, which are able to heal themselves under continuous churn. 
Our approach is based on replicating a management element using finite state machine replication with a reconfigurable replica set. Our algorithm automates the reconfiguration (migration) of the replica set in order to tolerate continuous churn. For data consistency, we propose a majority-based distributed key-value store supporting multiple consistency levels that is based on a peer-to-peer network. The store enables a tradeoff between high availability and data consistency. Using majority allows avoiding potential drawbacks of a master-based consistency control, namely, a single point of failure and a potential performance bottleneck. In the third part, we investigate self-management for Cloud-based storage systems with the focus on elasticity control using elements of control theory and machine learning. We have conducted research on a number of different designs of an elasticity controller, including a State-Space feedback controller and a controller that combines feedback and feedforward control. We describe our experience in designing an elasticity controller for a Cloud-based key-value store using a state-space model that enables trading off performance for cost. We describe the steps in designing an elasticity controller. We continue by presenting the design and evaluation of ElastMan, an elasticity controller for Cloud-based elastic key-value stores that combines feedforward and feedback control.
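The majority-based consistency idea can be sketched with versioned replicas: a write must reach a majority, a read must query a majority, and because any two majorities overlap, a read always sees the newest completed write without a master. This is only an in-memory illustration under simplifying assumptions (no concurrency, no peer-to-peer routing, a single key); it is not the thesis's algorithm, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass
class Replica:
    version: int = 0
    value: Any = None


def majority(n: int) -> int:
    """Smallest group size such that any two groups share a member."""
    return n // 2 + 1


def write(replicas: List[Replica], reachable: Sequence[int], value: Any) -> int:
    """Store `value` on the reachable replicas if they form a majority."""
    if len(reachable) < majority(len(replicas)):
        raise RuntimeError("no majority reachable")
    # The reachable majority overlaps the previous write's majority,
    # so the new version is strictly larger than any completed write.
    new_version = max(replicas[i].version for i in reachable) + 1
    for i in reachable:
        replicas[i].version = new_version
        replicas[i].value = value
    return new_version


def read(replicas: List[Replica], reachable: Sequence[int]) -> Any:
    """Return the value with the highest version among a majority."""
    if len(reachable) < majority(len(replicas)):
        raise RuntimeError("no majority reachable")
    newest = max((replicas[i] for i in reachable), key=lambda rep: rep.version)
    return newest.value


if __name__ == "__main__":
    store = [Replica() for _ in range(5)]
    write(store, [0, 1, 2], "a")
    write(store, [2, 3, 4], "b")
    print(read(store, [0, 1, 2]))  # replica 2 carries the newest version
```

Weaker consistency levels could be obtained by letting reads contact fewer replicas, at the cost of possibly returning stale values.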