
    Datacenter Traffic Control: Understanding Techniques and Trade-offs

    Datacenters provide cost-effective and flexible access to the scalable compute and storage resources required by today's cloud computing needs. A typical datacenter is made up of thousands of servers connected by a large network and usually managed by a single operator. To provide quality access to the variety of applications and services hosted on datacenters and to maximize performance, it is necessary to use datacenter networks effectively and efficiently. Datacenter traffic is often a mix of several classes with different priorities and requirements, including user-generated interactive traffic, traffic with deadlines, and long-running traffic. To this end, custom transport protocols and traffic management techniques have been developed to improve datacenter network performance. In this tutorial paper, we review the general architecture of datacenter networks, various topologies proposed for them, their traffic properties, and the general challenges and objectives of traffic control in datacenters. The purpose of this paper is to bring out the important characteristics of traffic control in datacenters, not to survey all existing solutions (which is virtually impossible given the massive body of existing research). We hope to give readers a broad view of the options and factors to consider when evaluating traffic control mechanisms. We discuss various aspects of datacenter traffic control, including management schemes, transmission control, traffic shaping, prioritization, load balancing, multipathing, and traffic scheduling. Next, we point to several open challenges as well as new and interesting networking paradigms. At the end of this paper, we briefly review inter-datacenter networks, which connect geographically dispersed datacenters; these have been receiving increasing attention recently and pose interesting and novel research problems.

    Comment: Accepted for publication in IEEE Communications Surveys and Tutorials.
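
    Among the mechanisms the survey covers, traffic shaping is perhaps the simplest to illustrate concretely. Below is a minimal token-bucket shaper sketch in Python; the class name, parameters, and rate/burst values are illustrative assumptions, not anything specified in the paper.

```python
import time

class TokenBucketShaper:
    """Token-bucket traffic shaper: a packet may be sent only when enough
    byte credits ("tokens") have accumulated at the configured rate."""

    def __init__(self, rate_bps: float, burst_bytes: float):
        self.rate = rate_bps / 8.0    # token refill rate, bytes per second
        self.burst = burst_bytes      # bucket depth: largest permitted burst
        self.tokens = burst_bytes     # start with a full bucket
        self.last = time.monotonic()

    def try_send(self, pkt_bytes: int) -> bool:
        now = time.monotonic()
        # Refill tokens for the elapsed interval, capped at the bucket depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= pkt_bytes:
            self.tokens -= pkt_bytes
            return True    # packet conforms to the configured rate: send now
        return False       # rate exceeded: caller should queue or drop

# Example: shape a flow to 1 Gbps with a 64 KiB burst allowance.
shaper = TokenBucketShaper(rate_bps=1e9, burst_bytes=64 * 1024)
print(shaper.try_send(1500))   # True: the bucket starts full
```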

    Controlling Network Latency in Mixed Hadoop Clusters: Do We Need Active Queue Management?

    With the advent of big data, data center applications process vast amounts of unstructured and semi-structured data in parallel on large clusters, across hundreds to thousands of nodes. The highest performance for these batch big data workloads is achieved using expensive network equipment with large buffers, which accommodate bursts in network traffic and allocate bandwidth fairly even when the network is congested. Throughput-sensitive big data applications are, however, often executed in the same data center as latency-sensitive workloads. For both workloads to be supported well, the network must provide both maximum throughput and low latency. Progress has been made in this direction, as modern network switches support Active Queue Management (AQM) and Explicit Congestion Notification (ECN), both mechanisms to control the level of queue occupancy and reduce total network latency. This paper is the first study of the effect of Active Queue Management on both throughput and latency in the context of Hadoop and the MapReduce programming model. We give a quantitative comparison of four different approaches for controlling buffer occupancy and latency: RED and CoDel, both standalone and combined with ECN and the DCTCP network protocol, and identify the AQM configurations that keep Hadoop execution time gains from larger buffers within 5%, while reducing network packet latency caused by bufferbloat by up to 85%. Finally, we provide recommendations to administrators of Hadoop clusters on how to improve latency without degrading the throughput of batch big data workloads.

    The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007–2013) under grant agreement number 610456 (Euroserver). The research was also supported by the Ministry of Economy and Competitiveness of Spain under the contracts TIN2012-34557 and TIN2015-65316-P, Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), HiPEAC-3 Network of Excellence (ICT-287759), and the Severo Ochoa Program (SEV-2011-00067) of the Spanish Government.

    Peer reviewed. Postprint (author's final draft).
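
    CoDel, one of the AQM schemes compared in the paper, makes drop decisions from the time packets spend in the queue rather than from queue length. The sketch below is a simplified rendering of the published CoDel control law (the target and interval values are CoDel's common defaults); it omits several refinements of the real algorithm and is not the paper's experimental configuration.

```python
import math

TARGET = 0.005     # 5 ms: acceptable standing queue delay (common CoDel default)
INTERVAL = 0.100   # 100 ms: how long delay must stay high before dropping starts

class CoDel:
    """Simplified CoDel control law: drop (or ECN-mark) when packet sojourn
    time stays above TARGET for at least INTERVAL; while dropping, the gap
    between drops shrinks as INTERVAL / sqrt(drop_count)."""

    def __init__(self):
        self.first_above = None   # deadline set when sojourn first exceeds TARGET
        self.dropping = False
        self.drop_next = 0.0
        self.count = 0

    def on_dequeue(self, sojourn: float, now: float) -> bool:
        """Return True if the packet being dequeued should be dropped/marked."""
        if sojourn < TARGET:
            self.first_above = None    # queue drained below target: back off
            self.dropping = False
            return False
        if self.first_above is None:
            self.first_above = now + INTERVAL    # start the grace period
            return False
        if not self.dropping and now >= self.first_above:
            self.dropping = True                 # delay stayed high: start dropping
            self.count = 1
            self.drop_next = now + INTERVAL / math.sqrt(self.count)
            return True
        if self.dropping and now >= self.drop_next:
            self.count += 1                      # still high: drop more often
            self.drop_next = now + INTERVAL / math.sqrt(self.count)
            return True
        return False
```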

    A study on fairness and latency issues over high speed networks and data center networks

    Newly emerging computer networks, such as high speed networks and data center networks, are characterized by high bandwidth and high burstiness, which make it difficult to address issues such as fairness, queuing latency, and link utilization. In this study, we first conduct an extensive experimental evaluation of the performance of 10 Gbps high speed networks. We find that inter-protocol unfairness and large queuing latency are two outstanding issues in high speed networks and data center networks. There have been several proposals to address fairness and latency issues at the switch level via queuing schemes. These queuing schemes have been fairly successful at addressing either the fairness issue or large latency, but not both at the same time. We propose a new queuing scheme called Approximated-Fair and Controlled-Delay (AFCD) queuing that meets the following goals for high speed networks: approximated fairness, controlled low queuing delay, high link utilization, and simple implementation. The design of AFCD utilizes a novel synergistic approach, forming an alliance between approximated fair queuing and controlled delay queuing. AFCD maintains a very small amount of state information for estimating the sending rates of flows and makes drop decisions based on a per-flow target delay. We then present FaLL, a Fair and Low Latency queuing scheme that meets the stringent performance requirements of data center networks: fair sharing of bandwidth, low queuing latency, high throughput, and ease of deployment. FaLL uses an efficiency module, a fairness module, and a target-delay-based dropping scheme to meet these goals. Through rigorous experiments on a real testbed, we show that FaLL outperforms various peer solutions under a variety of network conditions in data center networks.
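
    The abstract describes the common core of AFCD and FaLL as per-flow sending rate estimation combined with a target-delay-based drop decision. The exact algorithms are not given in the abstract, so the sketch below only illustrates that general idea, pairing an EWMA rate estimator with a drop test against a per-flow delay target; all names and parameter values are hypothetical, not taken from the paper.

```python
import time
from collections import defaultdict

class TargetDelayDropper:
    """Illustrative sketch: estimate each flow's sending rate with an EWMA
    and drop arrivals whose flow backlog would exceed a per-flow delay target."""

    def __init__(self, target_delay_s: float = 0.001, alpha: float = 0.125):
        self.target = target_delay_s      # per-flow queuing delay target (1 ms here)
        self.alpha = alpha                # EWMA smoothing factor
        self.rate = defaultdict(float)    # flow -> estimated sending rate (bytes/s)
        self.backlog = defaultdict(int)   # flow -> bytes currently queued
        self.last_arrival = {}            # flow -> timestamp of previous arrival

    def on_arrival(self, flow: str, pkt_bytes: int) -> bool:
        """Return True to enqueue the packet, False to drop it."""
        now = time.monotonic()
        prev = self.last_arrival.get(flow)
        self.last_arrival[flow] = now
        if prev is not None and now > prev:
            sample = pkt_bytes / (now - prev)   # instantaneous rate sample
            self.rate[flow] += self.alpha * (sample - self.rate[flow])
        rate = self.rate[flow]
        # Drop if the flow's backlog, drained at its estimated rate, would
        # take longer than the target delay to clear.
        if rate > 0 and (self.backlog[flow] + pkt_bytes) / rate > self.target:
            return False
        self.backlog[flow] += pkt_bytes
        return True

    def on_departure(self, flow: str, pkt_bytes: int) -> None:
        """Account for a transmitted packet so the backlog estimate drains."""
        self.backlog[flow] = max(0, self.backlog[flow] - pkt_bytes)
```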

    Energy Efficient Ethernet on MapReduce Clusters: Packet Coalescing To Improve 10GbE Links

    An important challenge of modern data centers is to reduce energy consumption, a substantial proportion of which is due to the network. Switches and NICs supporting the recent Energy Efficient Ethernet (EEE) standard are now available, but current practice is to disable EEE in production use, since its effect on real-world application performance is poorly understood. This paper contributes to this discussion by analyzing the impact of EEE on MapReduce workloads, in terms of performance overheads and energy savings. MapReduce is the central programming model of Apache Hadoop, one of the most widely used application frameworks in modern data centers. We find that, while 1 GbE links (edge links) achieve good energy savings using the standard EEE implementation, optimum energy savings in the 10 GbE links (aggregation and core links) are only possible if these links employ packet coalescing. Packet coalescing must, however, be carefully configured to avoid excessive performance degradation. With our new analysis of how the static parameters of packet coalescing perform under different cluster loads, we are able to cover both the idle and heavy load periods that can exist in this type of environment. Finally, we evaluate our recommendation for packet coalescing on 10 GbE links using the energy-delay metric. This paper is an extension of our previous work [1], which was published in the Proceedings of the 40th Annual IEEE Conference on Local Computer Networks (LCN 2015).

    This work was supported in part by the European Union's Seventh Framework Programme (FP7/2007-2013) under Grant 610456 (EUROSERVER), in part by the Spanish Government through the Severo Ochoa programme (SEV-2011-00067 and SEV-2015-0493), in part by the Spanish Ministry of Economy and Competitiveness under Contract TIN2012-34557 and Contract TIN2015-65316-P, and in part by the Generalitat de Catalunya under Contract 2014-SGR-1051 and Contract 2014-SGR-1272.

    Peer reviewed. Postprint (author's final draft).
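
    Packet coalescing, as studied in the paper, holds outgoing packets until a count or timer threshold is reached, so an EEE link can sleep in Low Power Idle between bursts instead of waking for every packet. The sketch below shows that mechanism in Python; the threshold values are illustrative assumptions, not the paper's recommended configuration, and a real NIC would arm a hardware timer rather than checking on the next enqueue.

```python
import time

class PacketCoalescer:
    """Sketch of EEE packet coalescing: buffer outgoing packets and release
    them as one burst when either a packet-count or a timer threshold is
    reached, letting the link sleep in Low Power Idle (LPI) between bursts."""

    def __init__(self, max_pkts: int = 32, max_wait_s: float = 0.003):
        self.max_pkts = max_pkts      # flush after this many packets...
        self.max_wait = max_wait_s    # ...or after this much waiting time
        self.buf = []
        self.first_ts = None          # arrival time of the oldest buffered packet

    def enqueue(self, pkt: bytes) -> list:
        """Buffer a packet; return a burst to transmit if a threshold is hit."""
        now = time.monotonic()
        if not self.buf:
            self.first_ts = now
        self.buf.append(pkt)
        if len(self.buf) >= self.max_pkts or now - self.first_ts >= self.max_wait:
            return self.flush()       # wake the link once for the whole burst
        return []                     # keep sleeping: nothing to send yet

    def flush(self) -> list:
        """Release all buffered packets as one burst (e.g. on timer expiry)."""
        burst, self.buf, self.first_ts = self.buf, [], None
        return burst
```

    The trade-off the paper analyzes falls directly out of these two parameters: larger thresholds lengthen LPI sleep intervals and save more energy, but add up to max_wait_s of latency to every coalesced packet.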

    Incast mitigation in a data center storage cluster through a dynamic fair-share buffer policy

    Incast is a phenomenon that occurs when multiple devices interact with a single device at the same time: multiple storage senders overflow either the switch buffer or the single receiver's memory. This pattern causes all concurrent senders to stop and wait for buffer/memory availability, leading to packet loss and retransmission, and thus to very high latency. We present a software-defined technique for tackling this many-to-one communication pattern in a data center storage cluster. Our proposed method decouples the default TCP windowing mechanism from all storage servers and delegates it to the software-defined storage controller. The proposed method removes the TCP sawtooth behavior, provides global flow awareness, and implements a dynamic fair-share buffer policy along the end-to-end I/O path. It considers all I/O stages (applications, device drivers, NICs, switches/routers, file systems, I/O schedulers, main memory, and physical disks) while achieving maximum I/O throughput. The policy, which is part of the proposed method, allocates fair-share bandwidth utilization to all storage servers, and priority queues are incorporated to handle the most important data flows. In addition, the proposed method provides better manageability and maintainability compared with traditional storage networks, where the data plane and control plane reside in the same device.
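
    The core of the described approach is that a central controller, rather than each sender's TCP stack, divides the bottleneck buffer among concurrent senders so their combined in-flight data cannot overflow it. The paper's actual algorithm is not given in the abstract; the sketch below is a hypothetical weighted fair-share computation, with the function name, weighting scheme, and values all illustrative assumptions.

```python
def fair_share_windows(buffer_bytes: int, senders: list, weights: dict = None) -> dict:
    """Sketch of a controller-side fair-share buffer policy: split the
    bottleneck buffer among concurrent senders, optionally weighted so
    that priority flows receive a larger advertised window."""
    if not senders:
        return {}
    weights = weights or {s: 1.0 for s in senders}
    total = sum(weights[s] for s in senders)
    # Each sender's window is its weighted share of the bottleneck buffer,
    # so the sum of in-flight data can never overflow it (no incast collapse).
    return {s: int(buffer_bytes * weights[s] / total) for s in senders}


# Example: a 256 KiB switch buffer shared by three storage servers,
# with server "s1" carrying a high-priority flow (weight 2).
windows = fair_share_windows(256 * 1024, ["s1", "s2", "s3"],
                             {"s1": 2.0, "s2": 1.0, "s3": 1.0})
print(windows)   # {'s1': 131072, 's2': 65536, 's3': 65536}
```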