
    On Reliability-Aware Server Consolidation in Cloud Datacenters

    In the past few years, datacenter (DC) energy consumption has become an important issue in the technology world. Server consolidation using virtualization and virtual machine (VM) live migration allows cloud DCs to improve resource utilization and hence energy efficiency. To save energy, consolidation techniques try to turn off idle servers, but because of workload fluctuations these offline servers must later be turned back on to support increased resource demands. These repeated on-off cycles affect hardware reliability, accelerate server wear-and-tear, and consequently increase maintenance and replacement costs. In this paper, we propose a holistic mathematical model for reliability-aware server consolidation with the objective of minimizing total DC costs, including both energy and reliability costs. In effect, we minimize the number of active physical machines (PMs) and racks in a reliability-aware manner. We formulate the problem as a Mixed Integer Linear Programming (MILP) model, which is NP-complete. Finally, we evaluate the performance of our approach in different scenarios using extensive numerical MATLAB simulations. Comment: International Symposium on Parallel and Distributed Computing (ISPDC), Innsbruck, Austria, 201
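
    As a rough illustration of the kind of formulation the abstract describes, the sketch below encodes a tiny reliability-aware consolidation MILP in Python with PuLP: it minimizes the energy cost of active servers plus a wear cost for every off-to-on transition, subject to placement and capacity constraints. The VM demands, server capacities, cost coefficients and the server-only scope (the paper's model also covers racks) are illustrative assumptions, not the authors' actual model.

        # Minimal reliability-aware consolidation sketch (illustrative data, not the paper's model).
        from pulp import LpBinary, LpMinimize, LpProblem, LpVariable, lpSum

        vms = {"vm1": 2, "vm2": 3, "vm3": 1}          # CPU demand per VM (hypothetical units)
        servers = {"s1": 4, "s2": 4, "s3": 8}          # CPU capacity per server
        was_on = {"s1": 1, "s2": 0, "s3": 0}           # current power state of each server
        ENERGY_COST = 10.0                             # cost of keeping a server active
        WEAR_COST = 4.0                                # reliability cost of one extra on/off cycle

        prob = LpProblem("reliability_aware_consolidation", LpMinimize)
        assign = LpVariable.dicts("assign", (vms, servers), cat=LpBinary)
        active = LpVariable.dicts("active", servers, cat=LpBinary)
        switched_on = LpVariable.dicts("switched_on", servers, cat=LpBinary)

        # Objective: energy cost of active servers + wear cost of powering servers back on.
        prob += lpSum(ENERGY_COST * active[s] + WEAR_COST * switched_on[s] for s in servers)

        for v in vms:                                  # every VM is placed exactly once
            prob += lpSum(assign[v][s] for s in servers) == 1
        for s in servers:
            prob += lpSum(vms[v] * assign[v][s] for v in vms) <= servers[s] * active[s]
            prob += switched_on[s] >= active[s] - was_on[s]   # wear cost only for off->on transitions

        prob.solve()
        for s in servers:
            placed = [v for v in vms if assign[v][s].value() == 1]
            print(s, "active" if active[s].value() == 1 else "off", placed)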

    Server Placement with Shared Backups for Disaster-Resilient Clouds

    A key strategy for building disaster-resilient clouds is to employ backups of virtual machines in a geo-distributed infrastructure. Today, the continuous and acknowledged replication of virtual machines across different servers is a service provided by several hypervisors. This strategy guarantees that the virtual machines lose no disk or memory content if a disaster occurs, at the cost of strict bandwidth and latency requirements. Considering this kind of service, in this work we propose an optimization problem to place servers in a wide area network. The goal is to guarantee that backup machines do not fail at the same time as their primary counterparts. In addition, by using virtualization, we aim to reduce the number of backup servers required. The optimal results, achieved on real topologies, reduce the number of backup servers by at least 40%. Moreover, this work highlights several characteristics of the backup service according to the employed network, such as the fulfillment of latency requirements. Comment: Computer Networks 201
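
    The core constraint in this abstract (a backup must never share fate with its primary) and the sharing of backup servers enabled by virtualization can be illustrated with a small greedy sketch. The disaster zones, candidate sites, slot capacity and the single-zone-failure assumption below are hypothetical, and the greedy heuristic merely stands in for the paper's optimization problem.

        # Greedy shared-backup placement sketch (hypothetical zones/capacities, not the paper's model).
        # Assumption: at most one disaster zone fails at a time, so a backup server only needs enough
        # virtualization slots for the largest group of primaries it protects from any single zone.
        from collections import Counter

        SLOTS = 2  # backup VMs a single backup server can activate concurrently (assumed)

        primaries = {          # primary VM -> disaster zone of its site
            "p1": "zoneA", "p2": "zoneA", "p3": "zoneB", "p4": "zoneC", "p5": "zoneB",
        }
        candidate_sites = {"s_east": "zoneA", "s_west": "zoneB", "s_north": "zoneC"}

        backup_servers = []    # each entry: {"site": ..., "zone": ..., "protected": [(vm, zone), ...]}

        def fits(server, vm_zone):
            """A backup server can take one more backup if it sits in a different zone than the
            primary and the worst-case per-zone activation still fits in its slots."""
            if server["zone"] == vm_zone:
                return False
            load = Counter(z for _, z in server["protected"])
            return load[vm_zone] + 1 <= SLOTS

        for vm, zone in primaries.items():
            server = next((s for s in backup_servers if fits(s, zone)), None)
            if server is None:                          # open a new backup server outside the VM's zone
                site = next(s for s, z in candidate_sites.items() if z != zone)
                server = {"site": site, "zone": candidate_sites[site], "protected": []}
                backup_servers.append(server)
            server["protected"].append((vm, zone))

        print(f"{len(backup_servers)} backup servers used")
        for s in backup_servers:
            print(s["site"], s["zone"], [vm for vm, _ in s["protected"]])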

    Scheduling of data-intensive workloads in a brokered virtualized environment

    Providing performance predictability guarantees is increasingly important in cloud platforms, especially for data-intensive applications, whose performance depends greatly on the available rates of data transfer between the various computing/storage hosts underlying the virtualized resources assigned to the application. With the increased prevalence of brokerage services in cloud platforms, there is a need for resource management solutions that consider the brokered nature of these workloads, as well as the special demands of their intra-dependent components. In this paper, we present an offline mechanism for scheduling batches of brokered data-intensive workloads, which can be extended to an online setting. The objective of the mechanism is to decide on a packing of the workloads in a batch that minimizes the broker's incurred costs. Moreover, considering the brokered nature of such workloads, we define a payment model that provides incentives for these workloads to be scheduled as part of a batch, and we analyze this model theoretically. Finally, we evaluate the proposed scheduling algorithm and exemplify the fairness of the payment model in practical settings via trace-based experiments.
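
    To make the packing and payment ideas concrete, the toy sketch below packs a batch of workloads onto rented instances with first-fit decreasing, uses the number of instances as a proxy for the broker's cost, and splits that cost among workloads in proportion to their demands. The capacities, prices and the proportional split are illustrative assumptions, not the paper's scheduling mechanism or payment model.

        # Toy batch-packing and cost-sharing sketch (illustrative; not the paper's mechanism).
        INSTANCE_CAPACITY = 10     # resource units per rented instance (assumed)
        INSTANCE_PRICE = 5.0       # broker's cost per rented instance (assumed)

        workloads = {"w1": 6, "w2": 4, "w3": 7, "w4": 3}   # resource demand per workload

        # First-fit decreasing: a classic bin-packing heuristic used here to stand in for the
        # broker's cost-minimizing packing of a batch.
        instances = []             # each instance is a list of (workload, demand) pairs
        for name, demand in sorted(workloads.items(), key=lambda kv: -kv[1]):
            for inst in instances:
                if sum(d for _, d in inst) + demand <= INSTANCE_CAPACITY:
                    inst.append((name, demand))
                    break
            else:
                instances.append([(name, demand)])

        batch_cost = INSTANCE_PRICE * len(instances)
        standalone_cost = INSTANCE_PRICE                    # each workload alone would rent one instance

        # Proportional cost sharing: each workload pays a share of the batch cost proportional
        # to its demand, which in this example never exceeds its standalone cost.
        total_demand = sum(workloads.values())
        for name, demand in workloads.items():
            payment = batch_cost * demand / total_demand
            print(f"{name}: pays {payment:.2f} (standalone would cost {standalone_cost:.2f})")
        print("instances rented:", len(instances), "broker cost:", batch_cost)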

    Impact of Link Failures on the Performance of MapReduce in Data Center Networks

    In this paper, we utilize Mixed Integer Linear Programming (MILP) models to determine the impact of link failures on the performance of shuffling operations in MapReduce when different data center network (DCN) topologies are used. For a set of non-fatal single- and multi-link failures, the results indicate that different DCNs experience different completion-time degradations, ranging between 5% and 40%. The best performance under link failures is achieved by a server-centric PON-based DCN.
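
    A very rough way to reproduce the flavour of this analysis without a MILP solver is to route shuffle flows over shortest paths in a small topology graph and compare the bottleneck link load with and without a failed link, as in the sketch below. The leaf-spine topology, flow sizes, ECMP-style splitting and capacities are invented for illustration and are far simpler than the MILP models the paper uses.

        # Rough shuffle-completion estimate under link failures (illustrative; not the paper's MILP).
        import networkx as nx

        def build_topology(failed_links=()):
            """Tiny leaf-spine sample: 4 servers, 2 ToR switches, 2 core switches, 10 Gb/s links."""
            g = nx.Graph()
            links = [("srv1", "tor1"), ("srv2", "tor1"), ("srv3", "tor2"), ("srv4", "tor2"),
                     ("tor1", "core1"), ("tor1", "core2"), ("tor2", "core1"), ("tor2", "core2")]
            for u, v in links:
                if (u, v) not in failed_links and (v, u) not in failed_links:
                    g.add_edge(u, v, capacity_gbps=10.0)
            return g

        # Shuffle traffic: (mapper server, reducer server, gigabits to transfer)
        shuffle_flows = [("srv1", "srv3", 40.0), ("srv2", "srv4", 40.0)]

        def shuffle_time(g):
            """Split each flow evenly over its shortest paths (ECMP-style) and take the most
            loaded link as the bottleneck that determines shuffle completion time."""
            load = {}
            for src, dst, gbits in shuffle_flows:
                paths = list(nx.all_shortest_paths(g, src, dst))
                for path in paths:
                    for u, v in zip(path, path[1:]):
                        key = tuple(sorted((u, v)))
                        load[key] = load.get(key, 0.0) + gbits / len(paths)
            return max(load[k] / g[k[0]][k[1]]["capacity_gbps"] for k in load)   # seconds

        baseline = shuffle_time(build_topology())
        degraded = shuffle_time(build_topology(failed_links=[("tor1", "core2")]))
        print(f"baseline ~{baseline:.1f}s, with one failed core uplink ~{degraded:.1f}s "
              f"({100 * (degraded - baseline) / baseline:.0f}% longer)")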

    Scalable traffic-aware virtual machine management for cloud data centers

    Virtual Machine (VM) management is a powerful mechanism for providing elastic services over Cloud Data Centers (DCs). At the same time, the resulting network congestion has been repeatedly reported as the main bottleneck in DCs, even when the overall resource utilization of the infrastructure remains low. However, most current VM management strategies are traffic-agnostic, while the few that are traffic-aware only address a static initial allocation, ignore bandwidth oversubscription, or do not scale. In this paper we present S-CORE, a scalable VM migration algorithm that dynamically reallocates VMs to servers while minimizing the overall communication footprint of active traffic flows. We formulate the aggregate VM communication as an optimization problem and then define a novel distributed migration scheme that iteratively adapts to dynamic traffic changes. Through extensive simulation and implementation results, we show that S-CORE achieves significant (up to 87%) communication cost reduction while incurring minimal overhead and downtime.
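
    The idea of iteratively migrating VMs to shrink the communication footprint can be sketched as follows. This is a simplified, centralized caricature of the kind of scheme the abstract describes rather than S-CORE itself: the topology-distance weights, host capacities and traffic matrix are made-up examples, and migration overhead is ignored.

        # Simplified traffic-aware migration sketch (illustrative; not the actual S-CORE algorithm).
        placement = {"vm1": "h1", "vm2": "h2", "vm3": "h2", "vm4": "h3"}     # VM -> host
        capacity = {"h1": 2, "h2": 2, "h3": 2}                                # VM slots per host
        rack = {"h1": "r1", "h2": "r1", "h3": "r2"}                           # host -> rack
        traffic = {("vm1", "vm2"): 8.0, ("vm1", "vm3"): 1.0, ("vm2", "vm4"): 2.0}  # Mb/s per VM pair

        def link_weight(host_a, host_b):
            """Communication cost weight by topology distance (assumed values)."""
            if host_a == host_b:
                return 0.0
            return 1.0 if rack[host_a] == rack[host_b] else 4.0

        def total_cost(pl):
            return sum(rate * link_weight(pl[a], pl[b]) for (a, b), rate in traffic.items())

        def one_round():
            """Each VM greedily moves to the host (with a free slot) that lowers the total cost most."""
            moved = False
            for vm in placement:
                best_host, best_cost = placement[vm], total_cost(placement)
                for host in capacity:
                    used = sum(1 for v, h in placement.items() if h == host and v != vm)
                    if used >= capacity[host]:
                        continue
                    trial = {**placement, vm: host}
                    if total_cost(trial) < best_cost:
                        best_host, best_cost = host, total_cost(trial)
                if best_host != placement[vm]:
                    placement[vm] = best_host
                    moved = True
            return moved

        print("initial cost:", total_cost(placement))
        while one_round():        # iterate until no single migration helps
            pass
        print("final cost:", total_cost(placement), placement)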

    Enabling Work-conserving Bandwidth Guarantees for Multi-tenant Datacenters via Dynamic Tenant-Queue Binding

    Today's cloud networks are shared among many tenants. Bandwidth guarantees and work conservation are two key properties for ensuring predictable performance for tenant applications and high network utilization for providers. Despite significant efforts, very little prior work actually achieves both properties simultaneously, even though some claim to do so. In this paper, we present QShare, an in-network solution that achieves bandwidth guarantees and work conservation simultaneously. QShare leverages weighted fair queuing on commodity switches to slice network bandwidth for tenants, and solves the challenge of queue scarcity through balanced tenant placement and dynamic tenant-queue binding. QShare is readily implementable with existing switching chips. We have implemented a QShare prototype and evaluated it via both testbed experiments and simulations. Our results show that QShare ensures bandwidth guarantees while driving network utilization to over 91%, even under unpredictable traffic demands. Comment: The initial work is published in IEEE INFOCOM 201
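
    The queue-scarcity problem and the dynamic binding mentioned above can be illustrated with a small sketch: a switch port exposes only a few weighted-fair-queuing queues, so the tenants with the largest current demands receive dedicated queues weighted by their guarantees while the remaining tenants share a default queue. The tenant names, guarantees and the largest-demand-first binding rule are illustrative assumptions, not QShare's actual placement and binding logic.

        # Tenant-to-WFQ-queue binding sketch (illustrative; not QShare's actual algorithm).
        NUM_HW_QUEUES = 4          # hardware WFQ queues available on the switch port (assumed)

        guarantees_mbps = {"tA": 400, "tB": 300, "tC": 150, "tD": 100, "tE": 50}   # per-tenant guarantees
        current_demand = {"tA": 900, "tB": 100, "tC": 500, "tD": 120, "tE": 30}    # measured demand

        def bind_tenants(guarantees, demand, num_queues):
            """Give dedicated queues to the tenants with the highest current demand; everyone else
            shares the last queue, whose weight is the sum of their guarantees (work conservation
            comes from WFQ redistributing unused weight)."""
            dedicated = sorted(demand, key=demand.get, reverse=True)[: num_queues - 1]
            binding = {t: i for i, t in enumerate(dedicated)}                 # tenant -> queue id
            weights = {i: guarantees[t] for i, t in enumerate(dedicated)}
            default_queue = num_queues - 1
            shared = [t for t in guarantees if t not in binding]
            for t in shared:
                binding[t] = default_queue
            weights[default_queue] = sum(guarantees[t] for t in shared)
            return binding, weights

        # Rebind whenever measured demands shift (here: one snapshot).
        binding, weights = bind_tenants(guarantees_mbps, current_demand, NUM_HW_QUEUES)
        print("binding:", binding)
        print("queue weights:", weights)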

    Cicada: Predictive Guarantees for Cloud Network Bandwidth

    In cloud-computing systems, network-bandwidth guarantees have been shown to improve the predictability of application performance and cost. Most previous work on cloud bandwidth guarantees has assumed that cloud tenants know what bandwidth guarantees they want. However, application bandwidth demands can be complex and time-varying, and many tenants might lack sufficient information to request a bandwidth guarantee that is well matched to their needs. A tenant's lack of accurate knowledge about its future bandwidth demands can lead to over-provisioning (and thus reduced cost-efficiency) or under-provisioning (and thus poor user experience in latency-sensitive user-facing applications). We analyze traffic traces gathered over six months from an HP Cloud Services datacenter, finding that application bandwidth consumption is both time-varying and spatially inhomogeneous. This variability makes it hard to predict requirements. To solve this problem, we develop a prediction algorithm usable by a cloud provider to suggest an appropriate bandwidth guarantee to a tenant. The key idea in the prediction algorithm is to treat a set of previously observed traffic matrices as "experts" and learn online the best weighted linear combination of these experts to make its prediction. With tenant VM placement using these predictive guarantees, we find that the inter-rack network utilization in certain datacenter topologies can be more than doubled.
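
    The "experts" idea in the prediction algorithm can be sketched with a generic multiplicative-weights learner: each previously observed traffic matrix acts as an expert, the prediction is a weighted linear combination of the experts, and the weights shrink according to each expert's recent error. The learning rate, synthetic matrices and squared-error loss below are assumptions for illustration, not Cicada's exact update rule.

        # Experts-style traffic-matrix prediction sketch (illustrative; not Cicada's exact algorithm).
        import numpy as np

        rng = np.random.default_rng(0)
        N = 4                                              # number of VMs in the tenant (toy size)
        ETA = 0.5                                          # multiplicative-weights learning rate (assumed)

        # "Experts": traffic matrices observed in earlier measurement intervals (synthetic here).
        experts = [rng.uniform(0, 100, size=(N, N)) for _ in range(5)]
        weights = np.ones(len(experts)) / len(experts)

        def predict():
            """Prediction = weighted linear combination of the expert matrices."""
            return sum(w * m for w, m in zip(weights, experts))

        def update(actual):
            """Shrink the weight of each expert in proportion to its normalized squared error."""
            global weights
            losses = np.array([np.mean((m - actual) ** 2) for m in experts])
            losses = losses / (losses.max() + 1e-9)        # scale losses into [0, 1]
            weights = weights * np.exp(-ETA * losses)
            weights = weights / weights.sum()

        # Toy online loop: the "actual" matrix stays close to one expert, whose weight should grow.
        for step in range(10):
            actual = 0.8 * experts[2] + 0.2 * rng.uniform(0, 100, size=(N, N))
            err = np.mean((predict() - actual) ** 2)       # error of the current prediction
            update(actual)
            print(f"step {step}: prediction MSE {err:.1f}")
        print("final expert weights:", np.round(weights, 3))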