
    Self-managing cloud-native applications: design, implementation and experience

    Running applications in the cloud efficiently requires much more than deploying software in virtual machines. Cloud applications have to be continuously managed: (1) to adjust their resources to the incoming load and (2) to cope with transient failures, replicating and restarting components to provide resiliency on unreliable infrastructure. Continuous management monitors application and infrastructural metrics to provide automated and responsive reactions to failures (health management) and to changing environmental conditions (auto-scaling), minimizing human intervention. In current practice, management functionalities are provided as infrastructural or third-party services; in both cases they are external to the application deployment. We claim that this approach has intrinsic limits: separating management functionalities from the application prevents them from naturally scaling with the application and requires additional management code and human intervention. Moreover, using infrastructure provider services for management results in vendor lock-in, effectively preventing cloud applications from adapting and running on the most effective cloud for the job. In this paper we discuss the main characteristics of cloud-native applications, propose a novel architecture that enables scalable and resilient self-managing applications in the cloud, and report on our experience in porting a legacy application to the cloud applying cloud-native principles.
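    To make the monitor-and-react idea concrete, the following is a minimal sketch of such a self-management loop combining health management and auto-scaling. The app handle, its methods (get_metrics, replicas, set_replicas, restart) and the CPU thresholds are illustrative assumptions, not the architecture proposed in the paper.

        import time

        # Hypothetical thresholds for the illustration; a real system would tune these.
        SCALE_UP_CPU = 0.80      # average CPU above which a replica is added
        SCALE_DOWN_CPU = 0.30    # average CPU below which a replica is removed

        def manage(app, poll_interval_s=10):
            """Toy self-management loop: health management plus auto-scaling.

            `app` is a hypothetical handle exposing get_metrics(), replicas(),
            set_replicas(n) and restart(instance); it stands in for whatever
            coordination layer the application itself embeds.
            """
            while True:
                metrics = app.get_metrics()              # per-instance CPU and liveness
                # Health management: restart instances that stopped responding.
                for instance, alive in metrics["liveness"].items():
                    if not alive:
                        app.restart(instance)
                # Auto-scaling: adjust the replica count to the incoming load.
                cpu = metrics["cpu"]
                avg_cpu = sum(cpu.values()) / max(len(cpu), 1)
                n = app.replicas()
                if avg_cpu > SCALE_UP_CPU:
                    app.set_replicas(n + 1)
                elif avg_cpu < SCALE_DOWN_CPU and n > 1:
                    app.set_replicas(n - 1)
                time.sleep(poll_interval_s)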

    On Reliability-Aware Server Consolidation in Cloud Datacenters

    In the past few years, datacenter (DC) energy consumption has become an important issue in the technology world. Server consolidation using virtualization and virtual machine (VM) live migration allows cloud DCs to improve resource utilization and hence energy efficiency. To save energy, consolidation techniques try to turn off idle servers; because of workload fluctuations, however, these offline servers must later be turned back on to support increased resource demands. These repeated on-off cycles can accelerate hardware wear-and-tear, reduce server reliability, and as a result increase maintenance and replacement costs. In this paper we propose a holistic mathematical model for reliability-aware server consolidation with the objective of minimizing total DC costs, including energy and reliability costs. In effect, we try to minimize the number of active physical machines (PMs) and racks in a reliability-aware manner. We formulate the problem as a Mixed Integer Linear Programming (MILP) model, which is NP-complete. Finally, we evaluate the performance of our approach in different scenarios using extensive numerical MATLAB simulations.
    Comment: International Symposium on Parallel and Distributed Computing (ISPDC), Innsbruck, Austria, 201
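    As a rough illustration of this kind of model, the sketch below formulates a toy reliability-aware consolidation problem as a MILP using the PuLP library: VMs are packed onto PMs, paying an energy cost per active PM and a wear penalty whenever a previously idle PM is switched on. The parameters and cost terms are assumptions chosen for illustration, not the paper's model.

        import pulp

        # Illustrative instance: resource demand per VM, capacity and previous
        # on/off state per PM, and two cost coefficients (all made-up numbers).
        vm_demand = {"vm1": 2, "vm2": 3, "vm3": 1}
        pm_capacity = {"pm1": 4, "pm2": 4, "pm3": 4}
        was_on = {"pm1": 1, "pm2": 0, "pm3": 0}
        ENERGY_COST = 10.0       # cost of keeping a PM active
        SWITCH_ON_COST = 3.0     # reliability (wear) penalty for an on-off cycle

        prob = pulp.LpProblem("reliability_aware_consolidation", pulp.LpMinimize)
        x = pulp.LpVariable.dicts("assign", (vm_demand, pm_capacity), cat="Binary")
        y = pulp.LpVariable.dicts("active", pm_capacity, cat="Binary")
        s = pulp.LpVariable.dicts("switch_on", pm_capacity, cat="Binary")

        # Objective: energy for active PMs plus wear penalty for newly activated PMs.
        prob += pulp.lpSum(ENERGY_COST * y[p] + SWITCH_ON_COST * s[p] for p in pm_capacity)

        for v in vm_demand:      # each VM is placed on exactly one PM
            prob += pulp.lpSum(x[v][p] for p in pm_capacity) == 1
        for p in pm_capacity:    # capacity, activation linking, and switch-on detection
            prob += pulp.lpSum(vm_demand[v] * x[v][p] for v in vm_demand) <= pm_capacity[p] * y[p]
            prob += s[p] >= y[p] - was_on[p]

        prob.solve()
        print({p: int(y[p].value()) for p in pm_capacity})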

    Datacenter Traffic Control: Understanding Techniques and Trade-offs

    Datacenters provide cost-effective and flexible access to the scalable compute and storage resources required for today's cloud computing needs. A typical datacenter is made up of thousands of servers connected through a large network and usually managed by one operator. To provide quality access to the variety of applications and services hosted in datacenters and to maximize performance, it is necessary to use datacenter networks effectively and efficiently. Datacenter traffic is often a mix of several classes with different priorities and requirements, including user-generated interactive traffic, traffic with deadlines, and long-running traffic. To this end, custom transport protocols and traffic management techniques have been developed to improve datacenter network performance. In this tutorial paper, we review the general architecture of datacenter networks, various topologies proposed for them, their traffic properties, general traffic control challenges in datacenters, and general traffic control objectives. The purpose of this paper is to bring out the important characteristics of traffic control in datacenters, not to survey all existing solutions (which is virtually impossible given the massive body of existing research). We hope to provide readers with a wide range of options and factors to consider when evaluating traffic control mechanisms. We discuss various aspects of datacenter traffic control, including management schemes, transmission control, traffic shaping, prioritization, load balancing, multipathing, and traffic scheduling. Next, we point to several open challenges as well as new and interesting networking paradigms. At the end of the paper, we briefly review inter-datacenter networks, which connect geographically dispersed datacenters, have been receiving increasing attention recently, and pose interesting and novel research problems.
    Comment: Accepted for Publication in IEEE Communications Surveys and Tutorial
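    As a small illustration of class-based traffic control, the sketch below implements a strict-priority transmission scheduler over the three traffic classes mentioned above. The class names and the strict-priority policy are illustrative assumptions; production datacenter schedulers combine several of the mechanisms this survey reviews.

        from collections import deque

        # Highest to lowest priority; names mirror the traffic classes above.
        CLASSES = ["interactive", "deadline", "bulk"]

        class PriorityScheduler:
            """Toy strict-priority scheduler with one FIFO queue per class."""

            def __init__(self):
                self.queues = {c: deque() for c in CLASSES}

            def enqueue(self, packet, traffic_class):
                self.queues[traffic_class].append(packet)

            def dequeue(self):
                # Serve the highest-priority non-empty queue; bulk traffic only
                # gets the link when nothing latency-sensitive is waiting.
                for c in CLASSES:
                    if self.queues[c]:
                        return self.queues[c].popleft()
                return None

        sched = PriorityScheduler()
        sched.enqueue("rpc-response", "interactive")
        sched.enqueue("backup-chunk", "bulk")
        print(sched.dequeue())   # -> "rpc-response"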

    QuickCast: Fast and Efficient Inter-Datacenter Transfers using Forwarding Tree Cohorts

    Large inter-datacenter transfers are crucial for cloud service efficiency and are increasingly used by organizations that have dedicated wide area networks between datacenters. Recent work uses multicast forwarding trees to reduce the bandwidth needs and improve completion times of point-to-multipoint transfers. Using a single forwarding tree per transfer, however, leads to poor performance because the slowest receiver dictates the completion time for all receivers. Using multiple forwarding trees per transfer alleviates this concern, since the average receiver can finish earlier; however, if done naively, bandwidth usage also increases, and it is a priori unclear how best to partition receivers, how to construct the multiple trees, and how to determine the rate and schedule of flows on these trees. This paper presents QuickCast, a first solution to these problems. Using simulations on real-world network topologies, we see that QuickCast can speed up the average receiver's completion time by as much as 10× while using only 1.04× more bandwidth; further, the completion time for all receivers also improves by as much as 1.6× at high loads.
    Comment: [Extended Version] Accepted for presentation in IEEE INFOCOM 2018, Honolulu, H
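    The sketch below illustrates the multiple-tree idea in simplified form: receivers are partitioned into cohorts by achievable rate, so slow receivers no longer drag fast ones down, and one forwarding tree is built per cohort as a union of shortest paths. The grouping rule and tree construction here are assumptions for illustration, not QuickCast's actual partitioning or tree-selection algorithms.

        import networkx as nx

        def cohort_forwarding_trees(topology, source, receiver_rates, num_groups=2):
            """Split receivers into rate-based cohorts and build one tree per cohort."""
            # Sort receivers fastest-first and cut the list into equal-size cohorts.
            ordered = sorted(receiver_rates, key=receiver_rates.get, reverse=True)
            size = max(1, -(-len(ordered) // num_groups))    # ceiling division
            cohorts = [ordered[i:i + size] for i in range(0, len(ordered), size)]

            trees = []
            for cohort in cohorts:
                tree = nx.DiGraph()
                for r in cohort:
                    # Union of source-rooted shortest paths; on tree-like topologies
                    # this yields a single forwarding tree for the cohort.
                    nx.add_path(tree, nx.shortest_path(topology, source, r))
                trees.append((cohort, tree))
            return trees

        # Tiny example topology; rates (in Gbps) are purely illustrative.
        G = nx.Graph([("s", "a"), ("a", "r1"), ("a", "r2"), ("s", "b"), ("b", "r3")])
        for cohort, tree in cohort_forwarding_trees(G, "s", {"r1": 10, "r2": 9, "r3": 1}):
            print(cohort, sorted(tree.edges()))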