510 research outputs found

    Datacenter Traffic Control: Understanding Techniques and Trade-offs

    Get PDF
    Datacenters provide cost-effective and flexible access to scalable compute and storage resources necessary for today's cloud computing needs. A typical datacenter is made up of thousands of servers connected with a large network and usually managed by one operator. To provide quality access to the variety of applications and services hosted on datacenters and maximize performance, it deems necessary to use datacenter networks effectively and efficiently. Datacenter traffic is often a mix of several classes with different priorities and requirements. This includes user-generated interactive traffic, traffic with deadlines, and long-running traffic. To this end, custom transport protocols and traffic management techniques have been developed to improve datacenter network performance. In this tutorial paper, we review the general architecture of datacenter networks, various topologies proposed for them, their traffic properties, general traffic control challenges in datacenters and general traffic control objectives. The purpose of this paper is to bring out the important characteristics of traffic control in datacenters and not to survey all existing solutions (as it is virtually impossible due to massive body of existing research). We hope to provide readers with a wide range of options and factors while considering a variety of traffic control mechanisms. We discuss various characteristics of datacenter traffic control including management schemes, transmission control, traffic shaping, prioritization, load balancing, multipathing, and traffic scheduling. Next, we point to several open challenges as well as new and interesting networking paradigms. At the end of this paper, we briefly review inter-datacenter networks that connect geographically dispersed datacenters which have been receiving increasing attention recently and pose interesting and novel research problems.Comment: Accepted for Publication in IEEE Communications Surveys and Tutorial

    TimeTrader: Exploiting Latency Tail to Save Datacenter Energy for On-line Data-Intensive Applications

    Get PDF
    Datacenters running on-line, data-intensive applications (OLDIs) consume significant amounts of energy. However, reducing their energy is challenging due to their tight response time requirements. A key aspect of OLDIs is that each user query goes to all or many of the nodes in the cluster, so that the overall time budget is dictated by the tail of the replies' latency distribution; replies see latency variations both in the network and compute. Previous work proposes to achieve load-proportional energy by slowing down the computation at lower datacenter loads based directly on response times (i.e., at lower loads, the proposal exploits the average slack in the time budget provisioned for the peak load). In contrast, we propose TimeTrader to reduce energy by exploiting the latency slack in the sub- critical replies which arrive before the deadline (e.g., 80% of replies are 3-4x faster than the tail). This slack is present at all loads and subsumes the previous work's load-related slack. While the previous work shifts the leaves' response time distribution to consume the slack at lower loads, TimeTrader reshapes the distribution at all loads by slowing down individual sub-critical nodes without increasing missed deadlines. TimeTrader exploits slack in both the network and compute budgets. Further, TimeTrader leverages Earliest Deadline First scheduling to largely decouple critical requests from the queuing delays of sub- critical requests which can then be slowed down without hurting critical requests. A combination of real-system measurements and at-scale simulations shows that without adding to missed deadlines, TimeTrader saves 15-19% and 41-49% energy at 90% and 30% loading, respectively, in a datacenter with 512 nodes, whereas previous work saves 0% and 31-37%.Comment: 13 page

    Exploiting the power of multiplicity: a holistic survey of network-layer multipath

    Get PDF
    The Internet is inherently a multipath network: For an underlying network with only a single path, connecting various nodes would have been debilitatingly fragile. Unfortunately, traditional Internet technologies have been designed around the restrictive assumption of a single working path between a source and a destination. The lack of native multipath support constrains network performance even as the underlying network is richly connected and has redundant multiple paths. Computer networks can exploit the power of multiplicity, through which a diverse collection of paths is resource pooled as a single resource, to unlock the inherent redundancy of the Internet. This opens up a new vista of opportunities, promising increased throughput (through concurrent usage of multiple paths) and increased reliability and fault tolerance (through the use of multiple paths in backup/redundant arrangements). There are many emerging trends in networking that signify that the Internet's future will be multipath, including the use of multipath technology in data center computing; the ready availability of multiple heterogeneous radio interfaces in wireless (such as Wi-Fi and cellular) in wireless devices; ubiquity of mobile devices that are multihomed with heterogeneous access networks; and the development and standardization of multipath transport protocols such as multipath TCP. The aim of this paper is to provide a comprehensive survey of the literature on network-layer multipath solutions. We will present a detailed investigation of two important design issues, namely, the control plane problem of how to compute and select the routes and the data plane problem of how to split the flow on the computed paths. The main contribution of this paper is a systematic articulation of the main design issues in network-layer multipath routing along with a broad-ranging survey of the vast literature on network-layer multipathing. We also highlight open issues and identify directions for future work

    NUMFabric: Fast and Flexible Bandwidth Allocation in Datacenters

    Get PDF
    We present xFabric, a novel datacenter transport design that provides flexible and fast bandwidth allocation control. xFabric is flexible: it enables operators to specify how bandwidth is allocated amongst contending flows to optimize for different service-level objectives such as minimizing flow completion times, weighted allocations, different notions of fairness, etc. xFabric is also very fast, it converges to the specified allocation one-to-two order of magnitudes faster than prior schemes. Underlying xFabric, is a novel distributed algorithm that uses in-network packet scheduling to rapidly solve general network utility maximization problems for bandwidth allocation. We evaluate xFabric using realistic datacenter topologies and highly dynamic workloads and show that it is able to provide flexibility and fast convergence in such stressful environments.Google Faculty Research Awar
    corecore