
    Optimizing Virtual Resource Management in Cloud Datacenters

    Datacenter clouds (e.g., Microsoft's Azure, Google's App Engine, and Amazon's EC2) are emerging as a popular infrastructure for computing and storage due to their high scalability and elasticity. More and more companies and organizations are shifting their services (e.g., online social networks, Dropbox file hosting) to clouds to avoid large capital expenditures. Cloud systems use virtualization technology to provide the resources of physical machines (PMs) in the form of virtual machines (VMs). Users create VMs that are deployed on the cloud, and each VM consumes resources (e.g., CPU, memory, and bandwidth) from its host PM. Cloud providers supply services by signing Service Level Agreements (SLAs) with cloud customers, which serve as both the blueprint and the warranty for cloud computing. Under-provisioning of resources leads to SLA violations, while over-provisioning leads to resource underutilization and, in turn, lower revenue for cloud providers. Thus, a formidable challenge is managing virtual resources effectively to maximize energy efficiency and resource utilization while satisfying the SLA. This proposal is devoted to tackling this challenge by addressing three fundamental issues: i) initial VM allocation, ii) VM migration for load balance, and iii) proactive VM migration for long-term load balance. Accordingly, the proposal consists of three innovative components: (1) Initial Complementary VM Consolidation. Previous resource provisioning strategies either allocate physical resources to VMs based on static VM resource demands or dynamically handle variations in VM resource requirements through live VM migrations. However, the former fail to maximize energy efficiency and resource utilization, while the latter incur high migration overhead. To handle these problems, we propose an initial VM allocation mechanism that consolidates complementary VMs with spatial/temporal awareness. 
Complementary VMs are VMs whose total demand in each resource dimension (the spatial space) nearly reaches their host's capacity during the VMs' lifetime (the temporal space). Based on our observation that VMs exhibit resource utilization patterns, the mechanism predicts the lifetime resource utilization patterns of short-term VMs or the periodic resource utilization patterns of long-term VMs. Based on the predicted patterns, it coordinates the requirements of different resources and consolidates complementary VMs in the same PM. This reduces the number of PMs needed to provide VM service, which increases energy efficiency and resource utilization and also reduces the number of VM migrations and SLA violations. (2) Resource Intensity Aware VM Migration for Load Balance. The unique features of clouds pose formidable challenges to effective and efficient load balancing. First, VMs in clouds use different resources (e.g., CPU, bandwidth, memory) to serve a variety of services (e.g., high performance computing, web services, file services), so different resources become overutilized in different PMs. Also, the overutilized resources in a PM may vary over time due to time-varying heterogeneous service requests. Second, VMs communicate intensively over the network. However, previous load balancing methods statically assign equal or predefined weights to different resources, which slows load balancing and raises its cost. They also do not strive to minimize VM communication between PMs. The proposed mechanism dynamically assigns weights to resources according to their usage intensity in the PM, which significantly reduces the time and cost of achieving load balance and avoids future load imbalance. 
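The complementarity test in mechanism (1) can be made concrete with a small sketch. It assumes, hypothetically, that each VM's predicted demand is a list of time slots over a shared horizon, each slot holding one value per resource dimension, and it uses a simple greedy rule: put a new VM on the PM where it fits in every slot and dimension and where the combined demand comes closest to the host's capacity. The names `fits`, `fill_ratio`, and `place` and the scoring rule are illustrative assumptions, not the proposal's actual algorithm.

```python
from typing import List, Optional, Sequence

# Hypothetical input format: a VM's predicted demand is a list of time slots,
# and each slot holds one value per resource dimension (e.g. CPU, memory).
# All VMs are assumed to share the same slot horizon.
Demand = List[Sequence[float]]


def fits(vms: List[Demand], capacity: Sequence[float]) -> bool:
    """True if the summed demand stays within capacity in every slot and dimension."""
    for t in range(len(vms[0])):
        for r, cap in enumerate(capacity):
            if sum(vm[t][r] for vm in vms) > cap:
                return False
    return True


def fill_ratio(vms: List[Demand], capacity: Sequence[float]) -> float:
    """Mean fraction of host capacity used over all slots and dimensions.

    Values close to 1.0 mean the VMs are complementary: together they nearly
    reach the host's capacity in every dimension throughout their lifetime."""
    slots = len(vms[0])
    used = sum(
        sum(vm[t][r] for vm in vms) / cap
        for t in range(slots)
        for r, cap in enumerate(capacity)
    )
    return used / (slots * len(capacity))


def place(new_vm: Demand, pms: List[List[Demand]], capacity: Sequence[float]) -> int:
    """Greedy placement: choose the PM where the new VM fits and the fill ratio is highest."""
    best: Optional[int] = None
    best_score = -1.0
    for i, hosted in enumerate(pms):
        candidate = hosted + [new_vm]
        if fits(candidate, capacity):
            score = fill_ratio(candidate, capacity)
            if score > best_score:
                best, best_score = i, score
    if best is None:  # no existing PM can host it: open a new one
        pms.append([new_vm])
        return len(pms) - 1
    pms[best].append(new_vm)
    return best
```

In a two-slot example with capacity (1.0, 1.0), a VM using (0.7 CPU, 0.2 memory) and then (0.2, 0.7) complements one with the opposite pattern, so both share a PM at about 90% fill, while a third VM identical to the first would push CPU demand to 1.6 and therefore opens a new PM.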
The mechanism also tries to keep frequently communicating VMs in the same PM to reduce bandwidth cost, and migrates VMs to PMs that minimize VM performance degradation. (3) Proactive VM Migration for Long-Term Load Balance. Previous reactive load balancing algorithms migrate VMs when load imbalance occurs, while previous proactive load balancing algorithms predict PM overload to trigger VM migration. However, neither approach maintains long-term load balance, and both incur high overhead and delay when selecting the VMs to migrate and their destination PMs. To overcome these problems, we propose a proactive Markov Decision Process (MDP)-based load balancing algorithm. We address the challenges of applying MDP to virtual resource management in cloud datacenters, which allows a PM to proactively find an optimal action to transition to a lightly loaded state that it will maintain for a longer period of time. We also apply the MDP to determine destination PMs so as to achieve a long-term PM load balance state. Our algorithm reduces the number of SLA violations by maintaining long-term load balance, and also reduces the load balancing overhead (e.g., CPU time, energy) and delay by quickly identifying the VMs to migrate and their destination PMs. Finally, we conducted extensive experiments to evaluate the three proposed mechanisms. i) Simulation experiments based on two real traces and real-world testbed experiments show that the initial complementary VM consolidation mechanism significantly reduces the number of PMs used, SLA violations, and VM migrations compared with previous resource provisioning strategies. ii) Trace-driven simulation and real-world testbed experiments show that RIAL, the resource intensity aware load balancing mechanism, outperforms other load balancing approaches in terms of the number of VM migrations, VM performance degradation, and VM communication cost. 
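A minimal sketch of resource intensity aware weighting follows. The weight formula w_r proportional to 1/(1 - u_r), the `intensity_weights` and `pick_vm_to_migrate` names, and the rule of migrating the VM with the highest weighted usage are assumptions chosen for illustration; RIAL's exact formulas may differ.

```python
from typing import Dict

Usage = Dict[str, float]  # resource name -> utilization fraction in [0, 1]


def intensity_weights(pm_util: Usage) -> Usage:
    """Hypothetical weighting: w_r proportional to 1 / (1 - u_r), normalized to sum to 1.

    Resources close to saturation get sharply larger weights than lightly used ones."""
    eps = 1e-6  # avoid division by zero for a fully saturated resource
    raw = {r: 1.0 / max(1.0 - u, eps) for r, u in pm_util.items()}
    total = sum(raw.values())
    return {r: v / total for r, v in raw.items()}


def pick_vm_to_migrate(pm_util: Usage, vms: Dict[str, Usage]) -> str:
    """Choose the VM whose usage is concentrated in the PM's most intensive resources."""
    w = intensity_weights(pm_util)
    return max(vms, key=lambda name: sum(w[r] * vms[name][r] for r in w))
```

With PM utilization of 0.9 for CPU and 0.5 for memory and bandwidth, CPU receives about 71% of the weight. Two VMs that each use 0.5 in total would tie under equal static weights, but the intensity weights select the CPU-heavy VM, which relieves the overloaded resource.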
iii) Trace-driven experiments show that the MDP-based load balancing algorithm outperforms previous reactive and proactive load balancing algorithms in terms of SLA violations, load balancing efficiency, and long-term load balance maintenance.
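The MDP idea in mechanism (3) can be illustrated with textbook value iteration on a toy model. The three PM load states, the transition probabilities, and the rewards below are all hypothetical, and the algorithm is standard value iteration rather than the proposal's own formulation.

```python
from typing import Dict, Tuple

STATES = ("light", "medium", "heavy")
ACTIONS = ("stay", "migrate")

# Hypothetical transition model: P[state][action] = {next_state: probability}.
P: Dict[str, Dict[str, Dict[str, float]]] = {
    "light": {"stay": {"light": 0.8, "medium": 0.2},
              "migrate": {"light": 1.0}},
    "medium": {"stay": {"light": 0.1, "medium": 0.6, "heavy": 0.3},
               "migrate": {"light": 0.7, "medium": 0.3}},
    "heavy": {"stay": {"medium": 0.2, "heavy": 0.8},
              "migrate": {"light": 0.3, "medium": 0.6, "heavy": 0.1}},
}


def reward(state: str, action: str) -> float:
    """Hypothetical reward: heavy load risks SLA violations; each migration has a cost."""
    base = {"light": 1.0, "medium": 0.5, "heavy": -10.0}[state]
    return base - (2.0 if action == "migrate" else 0.0)


def q_value(s: str, a: str, V: Dict[str, float], gamma: float) -> float:
    """Immediate reward plus discounted expected value of the next state."""
    return reward(s, a) + gamma * sum(p * V[s2] for s2, p in P[s][a].items())


def value_iteration(gamma: float = 0.9,
                    tol: float = 1e-9) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Apply Bellman optimality updates until values converge, then extract a greedy policy."""
    V = {s: 0.0 for s in STATES}
    while True:
        new_V = {s: max(q_value(s, a, V, gamma) for a in ACTIONS) for s in STATES}
        delta = max(abs(new_V[s] - V[s]) for s in STATES)
        V = new_V
        if delta < tol:
            break
    policy = {s: max(ACTIONS, key=lambda a: q_value(s, a, V, gamma)) for s in STATES}
    return V, policy
```

With these numbers, the optimal policy is to stay in the light state and to migrate in both the medium and heavy states. Migrating at medium load, before the PM becomes overloaded, is the kind of proactive behavior the abstract describes.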

    Orchestrating datacenters and networks to facilitate the telecom cloud

    In the Internet of services, information technology (IT) infrastructure providers play a critical role in making services accessible to end users. IT infrastructure providers host platforms and services in their datacenters (DCs). The cloud initiative has been accompanied by new computing paradigms, such as Infrastructure as a Service (IaaS) and Software as a Service (SaaS), which have dramatically reduced the time and cost required to develop and deploy a service. However, transport networks are crucial both for making services accessible to users and for operating DCs. Transport networks are currently configured with large static fat pipes based on capacity over-provisioning, aiming to guarantee the traffic demand and other parameters committed in Service Level Agreement (SLA) contracts. Yet such over-dimensioning adds high operational costs for DC operators and service providers. Therefore, new mechanisms are required to reconfigure and adapt the transport network so as to reduce the amount of over-provisioned bandwidth. Although a cloud-ready transport network architecture was introduced to handle dynamic cloud and network interaction, and Elastic Optical Networks (EONs) can facilitate elastic network operations, orchestration between the cloud and the interconnection network is ultimately required to coordinate resources in both strata coherently. In addition, the explosion of Internet Protocol (IP)-based services that require not only dynamic cloud and network interaction but also additional service-specific SLA parameters, together with the expected benefits of Network Functions Virtualization (NFV), gives telecom operators the opportunity to exploit that cloud-ready transport network and their current infrastructure to efficiently satisfy the services' network requirements. 
In the telecom cloud, a pay-per-use model can be offered to support services requiring resources from the transport network and its infrastructure. In this thesis, we study the connectivity requirements of representative cloud-based services and explore connectivity models, architectures, and orchestration schemes that satisfy them, with the aim of facilitating the telecom cloud. The main objective of this thesis is to demonstrate, by means of analytical models and simulation, the viability of orchestrating DCs and networks to facilitate the telecom cloud. To achieve this goal, we first study the connectivity requirements for DC interconnection and for services in a number of scenarios that require connectivity from the transport network. Specifically, we focus on DC federations, live-TV distribution, and 5G mobile networks. Next, we study different connectivity schemes, algorithms, and architectures aimed at satisfying those requirements. In particular, we study polling-based models for dynamic inter-DC connectivity and propose a novel notification-based connectivity scheme in which inter-DC connectivity can be delegated to the network operator. Additionally, we explore virtual network topology provisioning models to support services that require service-specific SLA parameters on the telecom cloud. Finally, we study DC and network orchestration to simultaneously fulfill the SLA contracts of a set of customers requiring connectivity from the transport network. Postprint (published version).

    Computing at massive scale: Scalability and dependability challenges

    Large-scale Cloud systems and big data analytics frameworks are now widely used for practical services and applications. However, with the increase in data volume, together with the heterogeneity of workloads and resources and the dynamic nature of massive user requests, the uncertainty and complexity of resource management and service provisioning increase dramatically, often resulting in poor resource utilization, weak system dependability, and degraded user-perceived performance. In this paper we report our latest understanding of the current and future challenges in this area and discuss both existing and potential solutions to the problems, especially those concerned with system efficiency, scalability, and dependability. We first introduce a data-driven analysis methodology for characterizing resource and workload patterns and tracing performance bottlenecks in a massive-scale distributed computing environment. We then examine several fundamental challenges and the solutions we are developing to tackle them, including incremental but decentralized resource scheduling, incremental messaging communication, rapid system failover, and request handling parallelism. We integrate these solutions with our data analysis methodology to establish an engineering approach that facilitates the optimization, tuning, and verification of massive-scale distributed systems. We aim to develop and offer innovative methods and mechanisms for future computing platforms that will provide strong support for new big data and IoE (Internet of Everything) applications.

    Inter-Datacenter Connectivity in Flexgrid-based Optical Networks

    The huge energy consumption of datacenters (DCs) requires elastic resource management, e.g., turning servers off when they are not used or turning them on to satisfy increases in demand. Thanks to virtualization, jobs (e.g., web applications) can be encapsulated in virtual machines (VMs), mixed with other workloads, and consolidated on the most suitable server according to their performance goals. Local resource managers in DCs can migrate VMs from one server to another to reduce energy consumption while ensuring the committed quality of experience (QoE). Additionally, cloud providers can create DC federations based on a geographically distributed infrastructure, so they can make appropriate use of the green energy resources available in each DC and thus reduce energy expenditure. Scheduling algorithms can perform VM migration not only within a single DC but also across DCs, transferring huge amounts of raw data from one DC to another to minimize operational costs while ensuring the QoE. Since traffic between DCs is generated by VM migration, the connectivity required between two DCs varies greatly over the day, with dramatic differences at an hourly time scale. Therefore, a flexgrid-based optical network is an option worth considering for interconnecting DCs, since that technology provides fine and multiple granularity. In flexgrid optical networks, the available optical spectrum is divided into frequency slices of fixed spectrum width. Optical connections can be allocated a variable number of these slices, and their capacity can be managed dynamically by allocating or releasing slices, provided that the spectrum allocated to an optical connection remains contiguous. Network providers can facilitate the interconnection of federated DCs by allowing them to request connection setup on demand with the desired bitrate, and to tear down connections when they are no longer needed. 
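The contiguity constraint on flexgrid connections can be shown with a small model of one link. This sketch assumes a fixed number of slices per link (in the ITU-T G.694.1 flexible grid, slot widths come in multiples of 12.5 GHz), a first-fit assignment policy, and growth or shrinkage at the edges of a connection; the `FlexgridLink` class and its methods are hypothetical, not part of the thesis or of any control-plane API.

```python
from typing import List, Optional


class FlexgridLink:
    """One fiber link's spectrum as a list of frequency slices; None marks a free slice."""

    def __init__(self, n_slices: int) -> None:
        self.slices: List[Optional[str]] = [None] * n_slices

    def _owned(self, conn: str) -> List[int]:
        return [i for i, owner in enumerate(self.slices) if owner == conn]

    def allocate(self, conn: str, width: int) -> Optional[int]:
        """First fit: take the lowest block of `width` contiguous free slices."""
        run = 0
        for i, owner in enumerate(self.slices):
            run = run + 1 if owner is None else 0
            if run == width:
                start = i - width + 1
                for j in range(start, i + 1):
                    self.slices[j] = conn
                return start
        return None  # no contiguous block is free: the request is blocked

    def expand(self, conn: str, extra: int) -> bool:
        """Elastic increase: grow into adjacent free slices on one side, keeping contiguity.

        Assumes `conn` already holds at least one slice."""
        owned = self._owned(conn)
        lo, hi = owned[0], owned[-1]
        for side in ([hi + k for k in range(1, extra + 1)],   # try the upper side first
                     [lo - k for k in range(1, extra + 1)]):  # then the lower side
            if all(0 <= i < len(self.slices) and self.slices[i] is None for i in side):
                for i in side:
                    self.slices[i] = conn
                return True
        return False

    def release(self, conn: str, count: int) -> None:
        """Elastic decrease: free `count` slices (0 <= count <= owned) from the upper edge."""
        owned = self._owned(conn)
        for i in owned[len(owned) - count:]:
            self.slices[i] = None
```

Because a connection must stay contiguous, a connection occupying the first three slices of a 10-slice link cannot grow once another connection holds the slices next to it, even though other slices are free; this is why slice placement matters for elastic operations.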
To support such on-demand connectivity, extensive standardization work in recent years has defined control plane architectures and protocols to automate connection provisioning. The Internet Engineering Task Force (IETF) is defining the Application-Based Network Operations (ABNO) architecture, which is based on standard components such as the active stateful Path Computation Element (PCE). This thesis is devoted to characterizing, evaluating, and analyzing the problem of optimal VM placement that minimizes operational costs, assuming that those costs are dominated by energy and communication costs. For this purpose, analytical models to optimize energy consumption in DC federations are provided. Both cloud and core optical network control architectures are explored, and new connectivity models for elastic operations are proposed. Mixed integer linear programming models as well as heuristic algorithms are developed, and simulations are carried out. More specifically, the main objective is attained through three goals covering different open issues. First, we propose the Elastic Operations in Federated Datacenters for Performance and Cost Optimization (ELFADO) problem for scheduling workload and orchestrating federated DCs; both a distributed and a centralized approach are studied. Second, we propose architectures based on ABNO, using cross-stratum orchestration and carrier SDN, together with the elastic connectivity models they respectively support: the dynamic elastic model and the transfer mode model. Finally, we consider the centralized ELFADO with both proposed connectivity models, the dynamic elastic and the transfer mode models, and evaluate their performance.

    A Survey on Load Balancing Algorithms for VM Placement in Cloud Computing

    The emergence of cloud computing based on virtualization technologies brings huge opportunities to host virtual resources at low cost without the need to own any infrastructure. Virtualization technologies enable users to acquire and configure resources and to be charged on a pay-per-use basis. However, cloud data centers mostly comprise heterogeneous commodity servers hosting multiple virtual machines (VMs) with potentially different specifications and fluctuating resource usage, which may cause imbalanced resource utilization within servers, leading to performance degradation and service level agreement (SLA) violations. To achieve efficient scheduling, these challenges should be addressed with load balancing strategies, a problem that has been proven to be NP-hard. This work identifies the challenges from multiple perspectives and analyzes existing algorithms for allocating VMs to physical machines (PMs) in infrastructure clouds, with a particular focus on load balancing. A detailed classification of load balancing algorithms for VM placement in cloud data centers is developed, and the surveyed algorithms are classified according to it. The goal of this paper is to provide a comprehensive and comparative understanding of the existing literature and to aid researchers by offering insight into potential future enhancements. Comment: 22 pages, 4 figures, 4 tables, in press.
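Because optimal load-balanced VM placement is NP-hard, practical schedulers rely on heuristics. A minimal sketch of one common greedy strategy, assumed here for illustration rather than taken from any surveyed algorithm, places each VM on the feasible PM whose highest per-resource utilization after placement is lowest, which handles heterogeneous server capacities naturally. The function name `place_balanced` is hypothetical.

```python
from typing import List, Optional, Sequence


def place_balanced(vm: Sequence[float],
                   used: List[List[float]],
                   capacity: List[Sequence[float]]) -> Optional[int]:
    """Greedy load-balancing placement on heterogeneous PMs.

    Picks the feasible PM whose highest per-resource utilization after placement
    is lowest, updates its usage in place, and returns its index (None if no PM fits)."""
    best: Optional[int] = None
    best_peak = float("inf")
    for i, (u, cap) in enumerate(zip(used, capacity)):
        after = [u[r] + vm[r] for r in range(len(vm))]
        if any(after[r] > cap[r] for r in range(len(vm))):
            continue  # placing the VM here would overload this PM
        peak = max(after[r] / cap[r] for r in range(len(vm)))
        if peak < best_peak:
            best, best_peak = i, peak
    if best is not None:
        for r in range(len(vm)):
            used[best][r] += vm[r]
    return best
```

For instance, with one PM of capacity (8 CPU, 16 GB) and one of (4, 8), VMs of size (2, 4) and (4, 2) go to the larger PM, a (1, 1) VM goes to the smaller one because the larger would reach 87.5% CPU, and a (10, 1) VM is rejected.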

    Evaluating impacts of traffic migration and virtual network functions consolidation on power aware resource allocation algorithms

    This manuscript version is made available under the CC-BY-NC-ND 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Minimizing power consumption and speeding up the solution of the resource allocation problem in cloud datacenters that adopt a network function virtualization architecture are among the hot topics for future Internet networks. Therefore, this paper proposes a new power-aware resource allocation algorithm that supports physical server consolidation combined with virtual network consolidation to minimize datacenters' total costs in an offline scenario. In addition, the new algorithm is integrated with an optional standalone traffic migration algorithm that can be triggered under specific conditions and at any time. Simulations and evaluations showed that the algorithm achieved 30% lower total costs than recent algorithms from Eramo et al. (2017), and that activating the virtual network function consolidation option lowered total costs by 25% compared with leaving it inactive. However, activating the migration option did not provide any significant savings in total power consumption, mainly because of the allocation strategy the algorithm uses in the first place, which already allocates precisely and efficiently utilizes the fewest physical resources. Finally, the results showed that allocation without migrations was 10 times faster than with migrations activated, suggesting that the migration option be reserved for emergency or maintenance conditions and that the algorithm be used without migrations for faster allocation and efficient power consumption. Peer Reviewed. Postprint (author's final draft).
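The benefit of server consolidation can be seen with the widely used linear server power model P(u) = P_idle + (P_max - P_idle) * u. The wattages, the assumption that empty servers are switched off, and the use of first-fit decreasing packing are illustrative choices, not the paper's algorithm or data.

```python
from typing import List


def server_power(u: float, p_idle: float = 100.0, p_max: float = 200.0) -> float:
    """Linear power model in watts (hypothetical values); an empty server is assumed off."""
    return 0.0 if u == 0 else p_idle + (p_max - p_idle) * u


def first_fit_decreasing(loads: List[float], capacity: float = 1.0) -> List[float]:
    """Consolidate normalized loads with first-fit decreasing; returns per-server utilization."""
    servers: List[float] = []
    for load in sorted(loads, reverse=True):
        for i, used in enumerate(servers):
            if used + load <= capacity:
                servers[i] += load
                break
        else:  # no open server has room: power on another one
            servers.append(load)
    return servers


def total_power(utilizations: List[float]) -> float:
    """Sum of per-server power for the given utilizations."""
    return sum(server_power(u) for u in utilizations)
```

Four jobs with loads 0.5, 0.25, 0.25, and 0.5 draw 550 W when each runs on its own server but 350 W once packed onto two servers, because the idle power of the servers that are switched off disappears.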

    Climbing Up Cloud Nine: Performance Enhancement Techniques for Cloud Computing Environments

    With the transformation of cloud computing technologies from an attractive trend into a business reality, the need for efficient cloud service management tools and techniques is more pressing than ever. As cloud technologies continue to mature, the service model, resource allocation methodologies, energy efficiency models, and general service management schemes are still evolving. The burden of making all of this work falls on cloud providers. Economies of scale, existing infrastructure, and a large workforce certainly work in their favor, but operations remain far from straightforward. Performance and service delivery still depend on the providers' algorithms and policies, which affect all operational areas. With that in mind, this thesis tackles a set of the most critical challenges faced by cloud providers, with the purpose of enhancing cloud service performance and reducing providers' costs. This is done by exploring innovative resource allocation techniques and developing novel tools and methodologies in the context of cloud resource management, power efficiency, high availability, and solution evaluation. Optimal and suboptimal solutions to the resource allocation problem in cloud data centers are proposed for both the computational and the network sides. Next, the energy efficiency challenge in cloud data centers is examined in depth. Consolidation-based and non-consolidation-based solutions incorporating a novel dynamic virtual machine idleness prediction technique are proposed and evaluated. An investigation of the problem of simulating cloud environments follows: available simulation solutions are comprehensively evaluated, and a novel design framework for cloud simulators covering multiple variations of the problem is presented. Moreover, the thesis addresses the challenge of evaluating the performance of cloud resource management solutions in terms of high availability. 
An extensive framework is introduced for designing high availability-aware cloud simulators, and a prominent cloud simulator (GreenCloud) is extended to implement it. Finally, the new tool is used to evaluate real cloud application scenarios. The primary argument of this thesis is that the proposed resource allocation and simulation techniques can serve as a basis for effective solutions that mitigate the performance and cost challenges cloud providers face in resource utilization, energy efficiency, and client satisfaction.

    On the feasibility of collaborative green data center ecosystems

    The increasing awareness of the IT sector's impact on the environment, together with economic factors, has fueled many research efforts to reduce the energy expenditure of data centers. Recent work proposes achieving additional energy savings by exploiting service workloads in concert with customers, and reducing data centers' carbon footprints by adopting demand-response mechanisms between data centers and their energy providers. In this paper, we discuss the incentives that customers and data centers may have to adopt such measures, and we propose a new service type and pricing scheme that is economically attractive and technically realizable. Simulation results based on real measurements confirm that our scheme can achieve additional energy savings while preserving service performance and the interests of both data centers and customers. Peer Reviewed. Postprint (author's final draft).

    Performance Modeling and Optimization of Resource Allocation in Cloud Computing Systems

    Cloud computing offers on-demand network access to computing resources through virtualization. This paradigm shifts computing resources to the cloud, which yields cost savings because users lease these resources instead of owning them. Clouds also give power-constrained mobile users access to computing resources. In this thesis, we develop performance models of these systems and optimize their resource allocation. For performance modeling, we assume that jobs arrive at the system according to a Poisson process and may have quite general service time distributions. Each job may consist of multiple tasks, each of which requires a virtual machine (VM) for its execution. The size of a job is determined by its number of tasks, which may be constant or variable. In the constant case, we allow different classes of jobs, each class being determined by its arrival and service rates and the number of tasks per job. In the variable case, a job randomly generates new tasks during its service time. The latter requires dynamic assignment of VMs to a job, which will be needed to serve mobile users. We model systems with both constant and variable job sizes using birth-death processes. For constant job size, we determined the joint probability distribution of the number of jobs from each class in the system, the job blocking probabilities, and the distribution of resource utilization for systems with both homogeneous and heterogeneous types of VMs. We have also analyzed the tradeoffs of turning idle servers off to save power. For variable job sizes, we have determined the distribution of the number of jobs in the system and the average service time of a job for systems with both infinite and finite resources. We have presented numerical results, and all approximations are verified by simulation. 
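For constant-size jobs with several classes, blocking probabilities of this kind of loss system can be computed with the Kaufman-Roberts recursion, a standard result for multi-rate loss models whose occupancy distribution is insensitive to the service time distribution. This sketch assumes complete sharing of a pool of `capacity` VMs and is offered as a related textbook technique, not as the thesis's own derivation; the function names are hypothetical.

```python
from typing import List, Tuple


def kaufman_roberts(capacity: int, classes: List[Tuple[float, int]]) -> List[float]:
    """Per-class blocking probabilities in a multi-rate loss system.

    capacity: total number of VMs in the system.
    classes:  (offered load a_k = lambda_k / mu_k, VMs per job b_k) for each class.
    """
    assert all(1 <= b <= capacity for _, b in classes)
    q = [0.0] * (capacity + 1)  # unnormalized distribution of busy VMs
    q[0] = 1.0
    for j in range(1, capacity + 1):
        # Kaufman-Roberts recursion: j * q(j) = sum_k a_k * b_k * q(j - b_k)
        q[j] = sum(a * b * q[j - b] for a, b in classes if j >= b) / j
    total = sum(q)
    q = [x / total for x in q]
    # A class-k job is blocked when fewer than b_k VMs are free.
    return [sum(q[capacity - b + 1:]) for _, b in classes]


def erlang_b(servers: int, load: float) -> float:
    """Erlang B via the standard recursion B(n) = a*B(n-1) / (n + a*B(n-1))."""
    b = 1.0
    for n in range(1, servers + 1):
        b = load * b / (n + load * b)
    return b
```

With a single class that needs one VM per job, the recursion reduces to the Erlang B formula, which gives a convenient sanity check; classes that need more VMs per job always see blocking at least as high as classes that need fewer.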
The performance results may be used in dimensioning cloud computing centers. Next, we have developed an optimization model that determines the job schedule minimizing the total power consumption of a cloud computing center. Power consumption in a computing center is assumed to be due to communications and server activities. We have assumed a distributed model in which a job may be assigned VMs on different servers, referred to as fragmented service. In this model, communication among the VMs of a job on different servers is proportional to the product of the numbers of VMs assigned to the job on each pair of servers, which makes network power consumption quadratic in the number of job fragments. We have then applied integer quadratic programming and the column generation method to solve the optimization problem for large-scale systems, in conjunction with two different algorithms that reduce the complexity and the time needed to obtain the solution. In the second phase of this work, we have formulated this optimization problem in discrete time. At each discrete time step, the job load of the system consists of jobs newly arriving during the present slot and unfinished jobs from previous slots. We have developed a technique to solve this optimization problem with full, partial, and no migration of the old jobs in the system. Numerical results show that this optimization yields significant operating cost savings in cloud computing systems.
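The quadratic network term follows from counting VM pairs: a job split into n_1, n_2, ... VMs on different servers exchanges traffic proportional to the sum of n_i * n_j over server pairs, which equals ((sum of n_i)^2 - sum of n_i^2) / 2. In the sketch below, the coefficient `c` (traffic or power per VM pair) and both function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def fragment_traffic(vms_per_server: Sequence[int], c: float = 1.0) -> float:
    """Inter-server traffic of one job: c times the sum of n_i * n_j over server pairs."""
    return c * sum(a * b for a, b in combinations(vms_per_server, 2))


def fragment_traffic_closed_form(vms_per_server: Sequence[int], c: float = 1.0) -> float:
    """Same quantity via ((sum n)^2 - sum n^2) / 2, which makes the quadratic form explicit."""
    total = sum(vms_per_server)
    return c * (total * total - sum(n * n for n in vms_per_server)) / 2
```

A 4-VM job costs 0 when kept on one server, 3 when split 3+1, 4 when split 2+2, and 6 when spread over four servers, so the more finely and evenly a job is fragmented, the higher its network cost.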