813 research outputs found

    Datacenter Traffic Control: Understanding Techniques and Trade-offs

    Get PDF
    Datacenters provide cost-effective and flexible access to scalable compute and storage resources necessary for today's cloud computing needs. A typical datacenter is made up of thousands of servers connected with a large network and usually managed by one operator. To provide quality access to the variety of applications and services hosted on datacenters and maximize performance, it deems necessary to use datacenter networks effectively and efficiently. Datacenter traffic is often a mix of several classes with different priorities and requirements. This includes user-generated interactive traffic, traffic with deadlines, and long-running traffic. To this end, custom transport protocols and traffic management techniques have been developed to improve datacenter network performance. In this tutorial paper, we review the general architecture of datacenter networks, various topologies proposed for them, their traffic properties, general traffic control challenges in datacenters and general traffic control objectives. The purpose of this paper is to bring out the important characteristics of traffic control in datacenters and not to survey all existing solutions (as it is virtually impossible due to massive body of existing research). We hope to provide readers with a wide range of options and factors while considering a variety of traffic control mechanisms. We discuss various characteristics of datacenter traffic control including management schemes, transmission control, traffic shaping, prioritization, load balancing, multipathing, and traffic scheduling. Next, we point to several open challenges as well as new and interesting networking paradigms. At the end of this paper, we briefly review inter-datacenter networks that connect geographically dispersed datacenters which have been receiving increasing attention recently and pose interesting and novel research problems.Comment: Accepted for Publication in IEEE Communications Surveys and Tutorial

    A Framework for Controlling Quality of Sessions in Multimedia Systems

    Get PDF
    Collaborative multimedia systems demand overall session quality control beyond the level of quality of service (QoS) pertaining to individual connections in isolation of others. At every instant in time, the quality of the session depends on the actual QoS offered by the system to each of the application streams, as well as on the relative priorities of these streams according to the application semantics. We introduce a framework for achieving QoSess control and address the architectural issues involved in designing a QoSess control laver that realizes the proposed framework. In addition, we detail our contributions for two main components of the QoSess control layer. The first component is a scalable and robust feedback protocol, which allows for determining the worst case state among a group of receivers of a stream. This mechanism is used for controlling the transmission rates of multimedia sources in both cases of layered and single-rate multicast streams. The second component is a set of inter-stream adaptation algorithms that dynamically control the bandwidth shares of the streams belonging to a session. Additionally, in order to ensure stability and responsiveness in the inter-stream adaptation process, several measures are taken, including devising a domain rate control protocol. The performance of the proposed mechanisms is analyzed and their advantages are demonstrated by simulation and experimental results

    Enabling Work-conserving Bandwidth Guarantees for Multi-tenant Datacenters via Dynamic Tenant-Queue Binding

    Full text link
    Today's cloud networks are shared among many tenants. Bandwidth guarantees and work conservation are two key properties to ensure predictable performance for tenant applications and high network utilization for providers. Despite significant efforts, very little prior work can really achieve both properties simultaneously even some of them claimed so. In this paper, we present QShare, an in-network based solution to achieve bandwidth guarantees and work conservation simultaneously. QShare leverages weighted fair queuing on commodity switches to slice network bandwidth for tenants, and solves the challenge of queue scarcity through balanced tenant placement and dynamic tenant-queue binding. QShare is readily implementable with existing switching chips. We have implemented a QShare prototype and evaluated it via both testbed experiments and simulations. Our results show that QShare ensures bandwidth guarantees while driving network utilization to over 91% even under unpredictable traffic demands.Comment: The initial work is published in IEEE INFOCOM 201

    Theories and Models for Internet Quality of Service

    Get PDF
    We survey recent advances in theories and models for Internet Quality of Service (QoS). We start with the theory of network calculus, which lays the foundation for support of deterministic performance guarantees in networks, and illustrate its applications to integrated services, differentiated services, and streaming media playback delays. We also present mechanisms and architecture for scalable support of guaranteed services in the Internet, based on the concept of a stateless core. Methods for scalable control operations are also briefly discussed. We then turn our attention to statistical performance guarantees, and describe several new probabilistic results that can be used for a statistical dimensioning of differentiated services. Lastly, we review recent proposals and results in supporting performance guarantees in a best effort context. These include models for elastic throughput guarantees based on TCP performance modeling, techniques for some quality of service differentiation without access control, and methods that allow an application to control the performance it receives, in the absence of network support

    Advances in Internet Quality of Service

    Get PDF
    We describe recent advances in theories and architecture that support performance guarantees needed for quality of service networks. We start with deterministic computations and give applications to integrated services, differentiated services, and playback delays. We review the methods used for obtaining a scalable integrated services support, based on the concept of a stateless core. New probabilistic results that can be used for a statistical dimensioning of differentiated services are explained; some are based on classical queuing theory, while others capitalize on the deterministic results. Then we discuss performance guarantees in a best effort context; we review: methods to provide some quality of service in a pure best effort environment; methods to provide some quality of service differentiation without access control, and methods that allow an application to control the performance it receives, in the absence of network support

    Data-Driven Intelligent Scheduling For Long Running Workloads In Large-Scale Datacenters

    Get PDF
    Cloud computing is becoming a fundamental facility of society today. Large-scale public or private cloud datacenters spreading millions of servers, as a warehouse-scale computer, are supporting most business of Fortune-500 companies and serving billions of users around the world. Unfortunately, modern industry-wide average datacenter utilization is as low as 6% to 12%. Low utilization not only negatively impacts operational and capital components of cost efficiency, but also becomes the scaling bottleneck due to the limits of electricity delivered by nearby utility. It is critical and challenge to improve multi-resource efficiency for global datacenters. Additionally, with the great commercial success of diverse big data analytics services, enterprise datacenters are evolving to host heterogeneous computation workloads including online web services, batch processing, machine learning, streaming computing, interactive query and graph computation on shared clusters. Most of them are long-running workloads that leverage long-lived containers to execute tasks. We concluded datacenter resource scheduling works over last 15 years. Most previous works are designed to maximize the cluster efficiency for short-lived tasks in batch processing system like Hadoop. They are not suitable for modern long-running workloads of Microservices, Spark, Flink, Pregel, Storm or Tensorflow like systems. It is urgent to develop new effective scheduling and resource allocation approaches to improve efficiency in large-scale enterprise datacenters. In the dissertation, we are the first of works to define and identify the problems, challenges and scenarios of scheduling and resource management for diverse long-running workloads in modern datacenter. They rely on predictive scheduling techniques to perform reservation, auto-scaling, migration or rescheduling. It forces us to pursue and explore more intelligent scheduling techniques by adequate predictive knowledges. We innovatively specify what is intelligent scheduling, what abilities are necessary towards intelligent scheduling, how to leverage intelligent scheduling to transfer NP-hard online scheduling problems to resolvable offline scheduling issues. We designed and implemented an intelligent cloud datacenter scheduler, which automatically performs resource-to-performance modeling, predictive optimal reservation estimation, QoS (interference)-aware predictive scheduling to maximize resource efficiency of multi-dimensions (CPU, Memory, Network, Disk I/O), and strictly guarantee service level agreements (SLA) for long-running workloads. Finally, we introduced a large-scale co-location techniques of executing long-running and other workloads on the shared global datacenter infrastructure of Alibaba Group. It effectively improves cluster utilization from 10% to averagely 50%. It is far more complicated beyond scheduling that involves technique evolutions of IDC, network, physical datacenter topology, storage, server hardwares, operating systems and containerization. We demonstrate its effectiveness by analysis of newest Alibaba public cluster trace in 2017. We are the first of works to reveal the global view of scenarios, challenges and status in Alibaba large-scale global datacenters by data demonstration, including big promotion events like Double 11 . Data-driven intelligent scheduling methodologies and effective infrastructure co-location techniques are critical and necessary to pursue maximized multi-resource efficiency in modern large-scale datacenter, especially for long-running workloads

    Spectrum Trading: An Abstracted Bibliography

    Full text link
    This document contains a bibliographic list of major papers on spectrum trading and their abstracts. The aim of the list is to offer researchers entering this field a fast panorama of the current literature. The list is continually updated on the webpage \url{http://www.disp.uniroma2.it/users/naldi/Ricspt.html}. Omissions and papers suggested for inclusion may be pointed out to the authors through e-mail (\textit{[email protected]})

    Deliverable JRA1.1: Evaluation of current network control and management planes for multi-domain network infrastructure

    Get PDF
    This deliverable includes a compilation and evaluation of available control and management architectures and protocols applicable to a multilayer infrastructure in a multi-domain Virtual Network environment.The scope of this deliverable is mainly focused on the virtualisation of the resources within a network and at processing nodes. The virtualization of the FEDERICA infrastructure allows the provisioning of its available resources to users by means of FEDERICA slices. A slice is seen by the user as a real physical network under his/her domain, however it maps to a logical partition (a virtual instance) of the physical FEDERICA resources. A slice is built to exhibit to the highest degree all the principles applicable to a physical network (isolation, reproducibility, manageability, ...). Currently, there are no standard definitions available for network virtualization or its associated architectures. Therefore, this deliverable proposes the Virtual Network layer architecture and evaluates a set of Management- and Control Planes that can be used for the partitioning and virtualization of the FEDERICA network resources. This evaluation has been performed taking into account an initial set of FEDERICA requirements; a possible extension of the selected tools will be evaluated in future deliverables. The studies described in this deliverable define the virtual architecture of the FEDERICA infrastructure. During this activity, the need has been recognised to establish a new set of basic definitions (taxonomy) for the building blocks that compose the so-called slice, i.e. the virtual network instantiation (which is virtual with regard to the abstracted view made of the building blocks of the FEDERICA infrastructure) and its architectural plane representation. These definitions will be established as a common nomenclature for the FEDERICA project. Other important aspects when defining a new architecture are the user requirements. It is crucial that the resulting architecture fits the demands that users may have. Since this deliverable has been produced at the same time as the contact process with users, made by the project activities related to the Use Case definitions, JRA1 has proposed a set of basic Use Cases to be considered as starting point for its internal studies. When researchers want to experiment with their developments, they need not only network resources on their slices, but also a slice of the processing resources. These processing slice resources are understood as virtual machine instances that users can use to make them behave as software routers or end nodes, on which to download the software protocols or applications they have produced and want to assess in a realistic environment. Hence, this deliverable also studies the APIs of several virtual machine management software products in order to identify which best suits FEDERICA’s needs.Postprint (published version
    • …
    corecore