    Managing server energy and reducing operational cost for online service providers

    The past decade has seen the energy consumption of servers and Internet Data Centers (IDCs) skyrocket. A recent survey estimated that worldwide spending on powering and cooling servers has risen to above $30 billion and is likely to exceed spending on new server hardware. The rapid rise in energy consumption poses a serious threat to both energy resources and the environment, which makes green computing not only worthwhile but also necessary. This dissertation tackles the challenges of reducing the energy consumption of server systems and reducing costs for Online Service Providers (OSPs). Two distinct subsystems account for most of an IDC's power: the server system, which accounts for 56% of the total power consumption of an IDC, and the cooling and humidification systems, which account for about 30%. The server system dominates the energy consumption of an IDC, and its power draw can vary drastically with data center utilization. In this dissertation, we propose three models to achieve energy efficiency in web server clusters: an energy proportional model, an optimal server allocation and frequency adjustment strategy, and a constrained Markov model. The proposed models combine Dynamic Voltage/Frequency Scaling (DV/FS) and Vary-On/Vary-Off (VOVF) mechanisms, which work together to achieve greater energy savings. Corresponding strategies are also proposed to deal with the transition overheads. We further extend server energy management to IDC cost management, helping OSPs conserve energy, manage their electricity costs, and lower their carbon emissions. We have developed an optimal energy-aware load dispatching strategy that periodically maps more requests to locations with lower electricity prices. A carbon emission limit is imposed, and the volatility of the carbon offset market is also considered.
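The core idea of price-aware dispatching, sending more requests to cheaper locations, can be illustrated with a minimal sketch. It assumes a linear cost per unit of load at each location, fixed per-location capacities, and no carbon cap; under those simplifying assumptions, filling the cheapest locations first minimizes cost. The function `dispatch_by_price` and the site names are hypothetical, and this greedy rule is not the dissertation's optimal strategy, which also handles carbon limits and offset-market volatility.

```python
from typing import Dict


def dispatch_by_price(demand: float,
                      prices: Dict[str, float],
                      capacity: Dict[str, float]) -> Dict[str, float]:
    """Greedily assign request load to the cheapest locations first.

    Assumes (hypothetically) that cost grows linearly with load at each
    location, so filling the cheapest sites up to capacity minimizes cost.
    """
    allocation = {site: 0.0 for site in prices}
    remaining = demand
    # Visit locations in order of increasing electricity price.
    for site in sorted(prices, key=prices.get):
        if remaining <= 0:
            break
        share = min(capacity[site], remaining)
        allocation[site] = share
        remaining -= share
    if remaining > 0:
        raise ValueError("demand exceeds total capacity")
    return allocation


if __name__ == "__main__":
    prices = {"east": 0.12, "west": 0.08, "central": 0.10}   # hypothetical $/kWh
    capacity = {"east": 600, "west": 500, "central": 300}    # hypothetical req/s
    print(dispatch_by_price(1000, prices, capacity))
```

Re-running the function once per dispatch period with that period's prices gives the "periodic" behavior; a real strategy would replace the linear cost with a server power model and add the emission constraint.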
Two energy-efficient strategies are applied to the server system and the cooling system, respectively. With the rapid development of cloud services, we also carry out research on reducing server energy in cloud computing environments. In this work, we propose a new live virtual machine (VM) placement scheme that can effectively map VMs to Physical Machines (PMs) with substantial energy savings in a heterogeneous server cluster. A VM/PM mapping probability matrix is constructed, in which each VM request is assigned a probability of running on each PM. The VM/PM mapping probability matrix takes into account resource limitations, VM operation overheads, server reliability, and energy efficiency. The evolution of Internet Data Centers and the increasing demands of web services pose great challenges to improving the energy efficiency of IDCs. We also outline several potential areas for future research in each chapter.
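A VM/PM mapping probability matrix can be sketched as follows, assuming a hypothetical scoring rule: a PM without enough free capacity for a VM gets probability 0, and the remaining PMs are weighted by the product of an energy-efficiency score and a reliability score. VM operation overheads are omitted for brevity. The names `mapping_matrix` and `sample_pm` and the scoring rule are illustrative only, not the dissertation's construction.

```python
import random
from typing import List, Optional, Sequence


def mapping_matrix(vm_demand: Sequence[float],
                   pm_free: Sequence[float],
                   pm_efficiency: Sequence[float],
                   pm_reliability: Sequence[float]) -> List[List[float]]:
    """Build a VM-by-PM matrix of placement probabilities.

    Hypothetical scoring: a PM lacking capacity for a VM scores 0; otherwise
    its score is efficiency * reliability. Each row is normalized to sum to 1
    (or left all zeros if no PM can host the VM).
    """
    matrix = []
    for demand in vm_demand:
        scores = [eff * rel if free >= demand else 0.0
                  for free, eff, rel in zip(pm_free, pm_efficiency, pm_reliability)]
        total = sum(scores)
        matrix.append([s / total if total > 0 else 0.0 for s in scores])
    return matrix


def sample_pm(row: Sequence[float], rng: random.Random) -> Optional[int]:
    """Draw a PM index according to one matrix row (None if no PM is feasible)."""
    if sum(row) == 0:
        return None
    return rng.choices(range(len(row)), weights=row, k=1)[0]
```

For example, `mapping_matrix([2, 6], [4, 8], [1.0, 0.5], [0.9, 0.9])` gives the first VM a 2/3 chance on the more efficient PM, while the second VM fits only on the larger PM and so maps there with probability 1.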

    PIASA: A power and interference aware resource management strategy for heterogeneous workloads in cloud data centers

    Cloud data centers have been progressively adopted in different scenarios, as reflected in the execution of heterogeneous applications with diverse workloads and diverse quality of service (QoS) requirements. Virtual machine (VM) technology eases resource management in physical servers and helps cloud providers achieve goals such as optimizing energy consumption. However, the performance of an application running inside a VM is not guaranteed, due to interference among co-hosted workloads sharing the same physical resources. Moreover, the different types of co-hosted applications with diverse QoS requirements, as well as the dynamic behavior of the cloud, make efficient resource provisioning a challenging problem in cloud data centers. In this paper, we address the problem of resource allocation within a data center that runs different types of application workloads, particularly CPU- and network-intensive applications. To address these challenges, we propose an interference- and power-aware management mechanism that combines a performance deviation estimator and a scheduling algorithm to guide resource allocation in virtualized environments. We conduct simulations by injecting synthetic workloads whose characteristics follow the latest version of the Google Cloud tracelogs. The results indicate that our performance-enforcing strategy is able to fulfill the contracted SLAs of real-world environments while reducing energy costs by as much as 21%.
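Combining a performance deviation estimator with a power-aware scheduler can be sketched as a host-selection rule: discard hosts whose estimated interference would exceed an SLA bound, then pick the host with the smallest power increase. Everything here is an assumption for illustration: the linear power model, the estimator that charges a fixed `penalty` per co-hosted VM of the same workload type, and the names `Host`, `estimated_deviation`, and `choose_host`. The paper's estimator and scheduling algorithm are not reproduced.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Host:
    name: str
    cpu_util: float   # current CPU utilization in [0, 1]
    p_idle: float     # idle power in watts (linear power model assumed)
    p_max: float      # power at full utilization in watts
    apps: List[str] = field(default_factory=list)  # workload types, e.g. "cpu", "net"


def estimated_deviation(host: Host, app_type: str, penalty: float = 0.1) -> float:
    """Hypothetical estimator: each co-hosted VM of the same type adds
    `penalty` to the expected fractional performance deviation."""
    return penalty * host.apps.count(app_type)


def choose_host(hosts: List[Host], app_type: str, vm_util: float,
                max_deviation: float) -> Optional[Host]:
    """Pick the feasible host whose power increase is smallest."""
    best, best_extra = None, float("inf")
    for h in hosts:
        if h.cpu_util + vm_util > 1.0:
            continue  # not enough CPU capacity
        if estimated_deviation(h, app_type) > max_deviation:
            continue  # expected interference would violate the SLA bound
        # Under a linear power model, adding vm_util costs this many extra watts.
        extra = (h.p_max - h.p_idle) * vm_util
        if extra < best_extra:
            best, best_extra = h, extra
    return best
```

With a tight deviation bound, a CPU-intensive VM is steered away from a host already running two CPU-intensive VMs even if that host is more power-efficient; loosening the bound lets power efficiency decide.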

    Cloud-scale VM Deflation for Running Interactive Applications On Transient Servers

    Transient computing has become popular in public cloud environments for running delay-insensitive batch and data processing applications at low cost. Since transient cloud servers can be revoked at any time by the cloud provider, they are considered unsuitable for running interactive applications such as web services. In this paper, we present VM deflation as an alternative to server preemption for reclaiming resources from transient cloud servers under resource pressure. Using real traces from top-tier cloud providers, we show the feasibility of using VM deflation as a resource reclamation mechanism for interactive applications in public clouds. We show how current hypervisor mechanisms can be used to implement VM deflation and present cluster deflation policies for resource management of transient and on-demand cloud VMs. Experimental evaluation of our deflation system on a Linux cluster shows that microservice-based applications can be deflated by up to 50% with negligible performance overhead. Our cluster-level deflation policies allow overcommitment levels as high as 50%, with less than a 1% decrease in application throughput, and can enable cloud platforms to increase revenue by 30%. Comment: To appear at ACM HPDC 202
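The difference between preemption and deflation is that a deflated VM keeps running with a smaller allocation. A minimal sketch of one possible policy, proportional deflation, is shown below: every transient VM gives up the same fraction of its allocation, capped by a maximum deflation fraction. The function `deflate`, the proportional rule, and the 50% default cap are assumptions for illustration; the sketch only computes target sizes and does not call any hypervisor interface, and the paper's cluster policies may differ.

```python
from typing import Dict


def deflate(allocations: Dict[str, float], reclaim: float,
            max_fraction: float = 0.5) -> Dict[str, float]:
    """Shrink transient VMs in proportion to their size to free `reclaim` units.

    Every VM loses the same fraction reclaim / total, which must not exceed
    `max_fraction` (an assumed policy parameter).
    """
    total = sum(allocations.values())
    if total == 0 or reclaim > max_fraction * total:
        raise ValueError("cannot reclaim that much without exceeding max deflation")
    fraction = reclaim / total
    return {vm: size * (1 - fraction) for vm, size in allocations.items()}
```

For example, reclaiming 3 units from VMs holding 8 and 4 units deflates each by 25%, leaving 6 and 3; a request that would require more than 50% deflation is rejected, where a real system might fall back to preemption.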

    aMOSS: Automated Multi-objective Server Provisioning with Stress-Strain Curving

    A modern data center built on virtualized server clusters for hosting Internet applications has multiple correlated and conflicting objectives. Utility-based approaches are often used to optimize multiple objectives. However, it is difficult to define a local utility function that suitably represents one objective and to apply different weights to multiple local utility functions. Furthermore, choosing weights statically may not be effective in the face of highly dynamic workloads. In this paper, we propose an automated multi-objective server provisioning with stress-strain curving approach (aMOSS). First, we formulate a multi-objective optimization problem that minimizes the number of physical machines used, the average response time, and the total number of virtual servers allocated for multi-tier applications. Second, we propose a novel stress-strain curving method to automatically select the most efficient solution from a Pareto-optimal set obtained by a non-dominated-sorting-based optimization technique. Third, we enhance the method to reduce server switching costs and improve the utilization of physical machines. Simulation results demonstrate that, compared to utility-based approaches, aMOSS automatically achieves the most efficient tradeoff between performance and resource allocation efficiency. We implement aMOSS in a testbed of virtualized blade servers and demonstrate that it outperforms a representative dynamic server provisioning approach in meeting the average response time guarantee and in resource allocation efficiency for a multi-tier Internet service. aMOSS provides a unique perspective on the challenging autonomic server provisioning problem.
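Selecting one solution from a Pareto-optimal set can be sketched in two steps: filter out dominated candidates, then pick a "knee" where improving one objective starts to cost disproportionately in the other. This sketch is deliberately simpler than aMOSS: it uses two objectives instead of three, a brute-force O(n²) dominance filter instead of non-dominated sorting, and a normalized-sum knee heuristic swapped in for stress-strain curving. All names and values are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (physical machines used, average response time), both minimized


def dominates(p: Point, q: Point) -> bool:
    """p dominates q if it is no worse in every objective and differs in at least one."""
    return all(a <= b for a, b in zip(p, q)) and p != q


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only candidates that no other candidate dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def knee_point(front: List[Point]) -> Point:
    """Simple knee heuristic (not stress-strain curving): normalize each
    objective to [0, 1] and return the point with the smallest normalized sum."""
    xs = [p[0] for p in front]
    ys = [p[1] for p in front]
    x_span = (max(xs) - min(xs)) or 1.0
    y_span = (max(ys) - min(ys)) or 1.0
    return min(front, key=lambda p: (p[0] - min(xs)) / x_span + (p[1] - min(ys)) / y_span)
```

With candidates `[(2, 200), (3, 120), (4, 100), (6, 95), (5, 130), (8, 94)]`, the filter drops `(5, 130)` (dominated by `(4, 100)`), and the heuristic picks `(4, 100)`: beyond four machines, response time barely improves.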

    Towards auto-scaling in the cloud: online resource allocation techniques

    Cloud computing provides easy access to computing resources. Customers can acquire and release resources at any time. However, it is not trivial to determine when and how many resources to allocate. Many applications running in the cloud face workload changes that affect their resource demand. The first thought is to plan capacity either for the average load or for the peak load. In the first case, less cost is incurred, but performance suffers when the peak load occurs. The second case wastes money, since resources remain underutilized most of the time. Therefore, more sophisticated resource provisioning techniques are needed that can automatically scale application resources according to workload demand and performance constraints. Providers and platforms such as Amazon, Microsoft, and RightScale offer auto-scaling services. However, without proper configuration and testing, such services can do more harm than good. In this work, I investigate application-specific online resource allocation techniques that dynamically adapt to the incoming workload, minimize the cost of virtual resources, and meet user-specified performance objectives.
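The average-versus-peak tradeoff described above can be made concrete with a small sketch that compares static plans against per-interval scaling on a load trace. It assumes a known, constant per-VM capacity, a hypothetical 70% target utilization, independent intervals, and no VM startup delay; `vms_needed` and `compare_plans` are illustrative names, not the techniques investigated in the thesis or any provider's auto-scaling API.

```python
import math
from typing import Dict, List


def vms_needed(arrival_rate: float, per_vm_rate: float,
               target_util: float = 0.7, min_vms: int = 1) -> int:
    """Smallest VM count that keeps per-VM utilization at or below target_util."""
    return max(min_vms, math.ceil(arrival_rate / (per_vm_rate * target_util)))


def compare_plans(trace: List[float], per_vm_rate: float) -> Dict[str, int]:
    """VM-intervals consumed by static average/peak plans vs. per-interval scaling."""
    avg_vms = vms_needed(sum(trace) / len(trace), per_vm_rate)
    peak_vms = vms_needed(max(trace), per_vm_rate)
    dynamic = [vms_needed(r, per_vm_rate) for r in trace]
    return {
        "average_plan": avg_vms * len(trace),
        "peak_plan": peak_vms * len(trace),
        "dynamic": sum(dynamic),
        # intervals in which the static average plan falls short of demand
        "average_plan_underprovisioned": sum(1 for n in dynamic if n > avg_vms),
    }
```

For a hypothetical trace of 90, 310, 650, and 180 requests/s with 100 requests/s per VM, the peak plan uses 40 VM-intervals, while per-interval scaling uses 20, the same as the average plan, but without the one underprovisioned interval the average plan suffers.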

    Autonomic Provisioning with Self-Adaptive Neural Fuzzy Control for End-to-end Delay Guarantee

    Autonomic server provisioning for performance assurance is a critical issue in data centers. It is important but challenging to guarantee a key performance metric, the percentile-based end-to-end delay of requests flowing through a virtualized multi-tier server cluster. The difficulty stems mainly from dynamically varying workloads and the lack of an accurate system performance model. In this paper, we propose a novel autonomic server allocation approach based on model-independent, self-adaptive neural fuzzy control. Existing model-independent fuzzy controllers use heuristic knowledge in the form of a rule base for performance assurance. Those controllers are designed manually on a trial-and-error basis and are often ineffective in the face of highly dynamic workloads. We design the neural fuzzy controller as a hybrid of control-theoretic and machine learning techniques. It can construct its own structure and adapt its parameters through fast online learning. Unlike other supervised machine learning techniques, it does not require offline training. We further enhance the neural fuzzy controller to compensate for the effect of server switching delays. Extensive simulations demonstrate the effectiveness of our new approach in achieving percentile-based end-to-end delay guarantees. Compared to a server allocation approach enabled by a rule-based fuzzy controller, the new approach delivers superior performance in the face of highly dynamic workloads. It is robust to workload variation, changes in the delay target, and server switching delays.
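To show what a manually designed, rule-based fuzzy controller looks like (the kind of baseline the abstract contrasts with its neural fuzzy controller), the sketch below maps a normalized delay error to a change in server count using three triangular membership functions and weighted-average defuzzification. The rule base, membership breakpoints, and output values are hypothetical, and the sketch has none of the online learning or self-constructing structure of the proposed approach.

```python
def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical rule base: (membership breakpoints, change in server count)
RULES = [
    ((-2.0, -1.0, 0.0), -1.0),  # delay well below target -> release a server
    ((-1.0, 0.0, 1.0), 0.0),    # delay near target        -> keep allocation
    ((0.0, 1.0, 2.0), 2.0),     # delay well above target  -> add two servers
]


def fuzzy_adjust(measured_delay: float, target_delay: float) -> float:
    """Return the recommended change in server count (before rounding)."""
    # Normalized delay error, clipped to [-1, 1] so at least one rule fires.
    error = max(-1.0, min(1.0, (measured_delay - target_delay) / target_delay))
    weights = [tri(error, *params) for params, _ in RULES]
    # Weighted-average defuzzification over the rule outputs.
    return sum(w * out for w, (_, out) in zip(weights, RULES)) / sum(weights)
```

A delay 50% above target fires the "near" and "above" rules equally and yields +1 server. Every breakpoint and output value here had to be chosen by hand, which is the trial-and-error tuning the abstract criticizes.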

    Dynamical Modeling of Cloud Applications for Runtime Performance Management

    Cloud computing has quickly grown to become an essential component of many modern software applications. It allows consumers, such as the provider of a web service, to obtain the computational resources needed to run their applications quickly and on demand. These service providers want to keep the running cost of their cloud applications low while adhering to various performance constraints. This is made difficult by the dynamics imposed by, e.g., resource contention or a changing user arrival rate, and by the fact that there are multiple ways of influencing the performance of a running cloud application. To facilitate decision making in this environment, performance models can be introduced that relate the workload and different actions to important performance metrics. In this thesis, such performance models of cloud applications are studied. In particular, we focus on modeling using queueing theory and on the fluid model for approximating the often intractable dynamics of the queue lengths. First, existing results on how the fluid model can be obtained from the mean-field approximation of a closed queueing network are simplified and extended to allow for mixed networks. The queues may follow the processor sharing or delay disciplines and can have multiple classes with phase-type service times. An improvement to this fluid model is then presented to increase accuracy when the system size, i.e., the number of servers, initial population, and arrival rate, is small. Furthermore, a closed-form approximation of the response time CDF is presented. The methods are tested in a series of simulation experiments and shown to be accurate. This mean-field fluid model is then used to derive a general fluid model for microservices with interservice delays. The model is shown to be completely extractable at runtime in a distributed fashion.
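To illustrate what a fluid model of queue lengths looks like, the sketch below integrates the textbook fluid approximation of a single station with several parallel servers, dx/dt = λ − μ·min(x, c), using forward Euler. This generic approximation was chosen for brevity; it is not the mean-field model for mixed networks, phase-type service, or microservices derived in the thesis. The function name `fluid_queue` and all parameter values are hypothetical.

```python
def fluid_queue(lam: float, mu: float, servers: int, x0: float = 0.0,
                t_end: float = 20.0, dt: float = 0.01) -> float:
    """Integrate dx/dt = lam - mu * min(x, servers) with forward Euler.

    x(t) is a deterministic approximation of the number of requests at a
    station with `servers` parallel servers, arrival rate lam, and per-server
    service rate mu. Returns x(t_end).
    """
    x = x0
    for _ in range(int(round(t_end / dt))):
        x += dt * (lam - mu * min(x, servers))
        x = max(x, 0.0)  # queue lengths cannot go negative
    return x
```

When λ < cμ, the trajectory settles at λ/μ (here 4 requests for λ = 4, μ = 1, c = 8); when λ > cμ, it grows without bound at rate λ − cμ, which mirrors the stable and overloaded regimes of the underlying queue.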
It is further evaluated on a simple microservice application and found to accurately predict important performance metrics in most cases. Furthermore, a method is devised to reduce the cost of a running application by tuning load balancing parameters between replicas. The method is built on gradient stepping, applying automatic differentiation to the fluid model. This allows for arbitrarily defined cost functions and constraints, most notably including different response time percentiles. The method is tested on a simple application distributed over multiple computing clusters and is shown to reduce costs while adhering to percentile constraints. Finally, modeling of request cloning is studied using the novel concept of synchronized service. This allows certain forms of cloning over servers, each modeled with a single queue, to be equivalently expressed as one single queue. The concept is very general with respect to the queueing discipline and distributions involved, but it introduces new, less realistic assumptions. How the equivalent queue model is affected by relaxing these assumptions is studied for the processor sharing discipline, and an extension to enable modeling of speculative execution is made. A simulation campaign shows that these relaxations have only a minor effect in certain cases.
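Tuning load balancing parameters by gradient stepping can be sketched for the simplest case: two replicas and a single routing fraction p. To keep the example short, two substitutions are made: each replica is modeled as a steady-state M/M/1 processor-sharing queue (mean response time 1/(μ − arrival rate)) instead of a fluid model, and the gradient of the mean response time J(p) = p/(μ1 − λp) + (1 − p)/(μ2 − λ(1 − p)) is derived by hand instead of by automatic differentiation. The function names and parameters are hypothetical.

```python
def mean_response_time(p: float, lam: float, mu1: float, mu2: float) -> float:
    """Mean response time when a fraction p of traffic goes to replica 1.

    Each replica is a steady-state M/M/1 processor-sharing queue whose mean
    response time is 1 / (mu - arrival_rate).
    """
    q = 1.0 - p
    return p / (mu1 - lam * p) + q / (mu2 - lam * q)


def optimize_split(lam: float, mu1: float, mu2: float,
                   lr: float = 0.5, steps: int = 200) -> float:
    """Projected gradient descent on the routing fraction p."""
    lo = max(0.0, 1.0 - mu2 / lam) + 1e-6  # keep replica 2 stable
    hi = min(1.0, mu1 / lam) - 1e-6        # keep replica 1 stable
    if lo >= hi:
        raise ValueError("total capacity is too small for the arrival rate")
    p = (lo + hi) / 2
    for _ in range(steps):
        q = 1.0 - p
        # Hand-derived dJ/dp = mu1/(mu1 - lam*p)^2 - mu2/(mu2 - lam*q)^2.
        grad = mu1 / (mu1 - lam * p) ** 2 - mu2 / (mu2 - lam * q) ** 2
        p = min(hi, max(lo, p - lr * grad))
    return p
```

Setting the gradient to zero gives √μ1/(μ1 − λp) = √μ2/(μ2 − λ(1 − p)). For λ = 10, μ1 = 20, μ2 = 10, this yields p = 2/(1 + √2) ≈ 0.828, which `optimize_split(10.0, 20.0, 10.0)` should approach. With automatic differentiation, the same loop could minimize percentile-based or cost-weighted objectives without deriving gradients by hand.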

    A Comparison of wide area network performance using virtualized and non-virtualized client architectures

    The goal of this thesis is to determine whether there is a significant performance difference between two network computer architecture models. The study measures latency and throughput for both client-server and virtualized client architectures. In the client-server environment, the client computer performs a significant portion of the work and frequently needs to download and upload files to and from a remote location. The virtual client architecture turns the client machine into a terminal that sends only keystrokes and mouse clicks and receives only display pixel or sound changes. I accomplished this goal by comparing completion times for ping replies, a file download, a small set of common work tasks, and a moderately large SQL database query. I compared these tasks using simulated wide area network, local area network, and virtual client network architectures. The study limits the architecture to one in which the virtual client and server are in the same data center.