25 research outputs found

    Incast mitigation in a data center storage cluster through a dynamic fair-share buffer policy

    Incast is a phenomenon in which many devices communicate with a single device at the same time: multiple storage senders overflow either the switch buffer or the single receiver's memory. This pattern forces all concurrent senders to stop and wait for buffer/memory availability, and leads to packet loss and retransmission, resulting in very high latency. We present a software-defined technique that tackles this many-to-one communication pattern (Incast) in a data center storage cluster. Our proposed method decouples the default TCP windowing mechanism from all storage servers and delegates it to the software-defined storage controller. The proposed method removes the TCP saw-tooth behavior, provides global flow awareness, and implements a dynamic fair-share buffer policy for the end-to-end I/O path. It considers all I/O stages (applications, device drivers, NICs, switches/routers, file systems, I/O schedulers, main memory, and physical disks) while achieving the maximum I/O throughput. The policy, which is part of the proposed method, allocates fair-share bandwidth utilization across all storage servers. Priority queues are incorporated to handle the most important data flows. In addition, the proposed method provides better manageability and maintainability compared with traditional storage networks, where the data plane and control plane reside in the same device.
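
    A minimal sketch of the fair-share idea described above, assuming a hypothetical controller that divides a receiver's buffer among active storage senders while reserving a configurable share for priority flows. The class and parameter names (FairShareAllocator, priority_fraction) are illustrative, not taken from the paper.

        # Hypothetical sketch of a controller-side fair-share budget allocator.
        # Names and the reservation rule are illustrative assumptions.

        class FairShareAllocator:
            def __init__(self, receiver_buffer_bytes, priority_fraction=0.4):
                self.buffer = receiver_buffer_bytes          # shared receiver/switch buffer
                self.priority_fraction = priority_fraction   # share reserved for priority flows

            def allocate(self, normal_senders, priority_senders):
                """Return per-sender window budgets (bytes) for one allocation round."""
                reserved = int(self.buffer * self.priority_fraction) if priority_senders else 0
                remaining = self.buffer - reserved
                shares = {}
                for s in priority_senders:
                    shares[s] = reserved // len(priority_senders)
                for s in normal_senders:
                    shares[s] = remaining // max(len(normal_senders), 1)
                return shares

        # Example: 1 MiB of buffer shared by 8 storage servers, 2 of them priority.
        alloc = FairShareAllocator(1 << 20)
        budgets = alloc.allocate(normal_senders=[f"srv{i}" for i in range(6)],
                                 priority_senders=["srv6", "srv7"])
        print(budgets)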

    Re-architecting datacenter networks and stacks for low latency and high performance

    Modern datacenter networks provide very high capacity via redundant Clos topologies and low switch latency, but transport protocols rarely deliver matching performance. We present NDP, a novel datacenter transport architecture that achieves near-optimal completion times for short transfers and high flow throughput in a wide range of scenarios, including incast. NDP switch buffers are very shallow, and when they fill, the switches trim packets to headers and priority-forward the headers. This gives receivers a full view of instantaneous demand from all senders, and is the basis for our novel, high-performance, multipath-aware transport protocol that can deal gracefully with massive incast events and prioritize traffic from different senders on RTT timescales. We implemented NDP in Linux hosts with DPDK, in a software switch, in a NetFPGA-based hardware switch, and in P4. We evaluate NDP's performance in our implementations and in large-scale simulations, simultaneously demonstrating support for very low latency and high throughput. This work was partly funded by the SSICLOPS H2020 project (644866).
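
    The trimming behaviour can be illustrated with a small, hypothetical switch-queue model: when the shallow data queue is full, the payload is dropped and only the header is forwarded on a higher-priority queue, so the receiver still learns of the sender's demand. The queue limit and field names below are assumptions, not NDP's actual parameters or code.

        from collections import deque

        # Hypothetical model of a packet-trimming switch port (not NDP's real code).
        DATA_QUEUE_LIMIT = 8      # very shallow per-port data queue, in packets

        data_queue = deque()      # full packets awaiting transmission
        header_queue = deque()    # trimmed headers, forwarded with priority

        def enqueue(packet):
            """packet = {'header': ..., 'payload': ...}"""
            if len(data_queue) < DATA_QUEUE_LIMIT:
                data_queue.append(packet)
            else:
                # Queue full: trim to the header only; the receiver still sees the
                # demand and can request retransmission of the payload.
                header_queue.append({'header': packet['header'], 'trimmed': True})

        def dequeue():
            """Headers are priority-forwarded ahead of full packets."""
            if header_queue:
                return header_queue.popleft()
            if data_queue:
                return data_queue.popleft()
            return None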

    Enhancing programmability for adaptive resource management in next generation data centre networks

    Recently, Data Centre (DC) infrastructures have been growing rapidly to support a wide range of emerging services and provide the underlying connectivity and compute resources that facilitate the "*-as-a-Service" model. This has led to the deployment of a multitude of services multiplexed over a few, very large-scale centralised infrastructures. In order to cope with the ebb and flow of users, services and traffic, infrastructures have been provisioned for peak demand, resulting in low average resource utilisation. This overprovisioning has been further motivated by the complexity of predicting traffic demands over diverse timescales and the stringent economic impact of outages. At the same time, the emergence of Software Defined Networking (SDN) is offering new means to monitor and manage the network infrastructure to address this underutilisation. This dissertation aims to show how measurement-based resource management can improve performance and resource utilisation by adaptively tuning the infrastructure to the changing operating conditions. To achieve this dynamicity, the infrastructure must be able to centrally monitor, notify and react based on the current operating state, from per-packet dynamics to long-standing traffic trends and topological changes. However, the management and orchestration abilities of current SDN realisations are too limited and must evolve for next generation networks. The current focus has been on logically centralising the routing and forwarding decisions; however, in order to achieve the necessary fine-grained insight, the data plane of each individual device must be programmable to collect and disseminate the metrics of interest. The results of this work demonstrate that a logically centralised controller can dynamically collect and measure network operating metrics to subsequently compute and disseminate fine-tuned, environment-specific settings. They show how this approach can prevent TCP incast throughput collapse and improve TCP performance by an order of magnitude for partition-aggregate traffic patterns. Furthermore, the paradigm is generalised to show the benefits for other services widely used in DCs, such as routing, telemetry, and security.
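
    A hedged sketch of the measurement-driven control loop argued for above: a logically centralised controller periodically collects data-plane metrics, computes environment-specific settings (here, a window cap sized so that all concurrent senders fit the bottleneck buffer, one simple rule that avoids incast overflow), and pushes them back to the hosts. The function names, metric fields and tuning rule are assumptions for illustration, not the dissertation's implementation.

        # Illustrative measurement-compute-disseminate round; names and rule assumed.

        def compute_settings(metrics, buffer_bytes=256 * 1024, mss=1460):
            """Cap each sender's window so all concurrent senders fit the buffer."""
            senders = max(sum(m['active_flows'] for m in metrics), 1)
            window_pkts = max(buffer_bytes // (senders * mss), 2)
            return {'tcp_rwnd_pkts': window_pkts}

        def control_round(collect, disseminate):
            """One iteration: measure the data plane, compute, push settings back."""
            metrics = collect()
            settings = compute_settings(metrics)
            disseminate(settings)
            return settings

        # Stub data plane for demonstration: 3 ToR switches reporting flow counts.
        fake_metrics = [{'active_flows': 12}, {'active_flows': 30}, {'active_flows': 5}]
        print(control_round(lambda: fake_metrics, lambda s: None))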

    Performance of Quantized Congestion Notification in TCP Incast in Data Centers

    This thesis analyzes the performance of Quantized Congestion Notification (QCN) during data access from clustered servers in data centers. The reasons why QCN does not perform adequately in these situations are examined, and several modifications to the protocol are proposed to improve its performance in these scenarios. The causes of QCN performance degradation are traced to flow rate variability, and it is shown that adaptive sampling at the switch and adaptive self-increase of flow rates at the QCN rate limiter significantly enhance QCN performance in a TCP Incast setup. The performance of QCN is compared against TCP modifications in a heterogeneous environment, and it is shown that the modifications to QCN yield better performance. Finally, the performance of QCN with the proposed modifications is compared with that of unmodified QCN under other workloads to show that the modifications do not negatively affect QCN performance in general.
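
    A simplified sketch of a QCN-style reaction point: the sending rate is cut multiplicatively when a congestion notification arrives and self-increases towards a target rate otherwise. The "adaptive" self-increase step that scales with the number of competing flows is a stand-in for the thesis's modification, since the exact rule is not given in the abstract; the constants only follow the spirit of IEEE 802.1Qau.

        # Simplified QCN-style rate limiter (reaction point); constants and the
        # adaptive self-increase rule are assumptions for illustration.

        GD = 1 / 128.0          # gain applied to the feedback value Fb on decrease
        R_AI = 5.0              # base self-increase step, Mbit/s

        class QcnRateLimiter:
            def __init__(self, rate_mbps):
                self.current_rate = rate_mbps
                self.target_rate = rate_mbps

            def on_congestion_notification(self, fb):
                """Multiplicative decrease driven by the quantized feedback Fb."""
                self.target_rate = self.current_rate
                self.current_rate *= max(1.0 - GD * fb, 0.5)

            def on_timer(self, competing_flows=1):
                """Self-increase towards the target; the step adapts to the flow
                count so many small flows do not overshoot the switch buffer."""
                step = R_AI / max(competing_flows, 1)
                self.target_rate += step
                self.current_rate = (self.current_rate + self.target_rate) / 2

        rl = QcnRateLimiter(rate_mbps=10000)
        rl.on_congestion_notification(fb=32)
        rl.on_timer(competing_flows=16)
        print(round(rl.current_rate, 1))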

    Tuning the aggressive TCP behavior for highly concurrent HTTP connections in intra-datacenter

    Modern data centers host diverse Hypertext Transfer Protocol (HTTP)-based services, which employ persistent Transmission Control Protocol (TCP) connections to send HTTP requests and responses. However, the ON/OFF pattern of HTTP traffic disturbs the growth of the TCP congestion window, potentially triggering packet loss at the beginning of the ON period. Furthermore, transmission performance becomes worse due to severe congestion in the concurrent transfer of HTTP responses. In this paper, we provide the first extensive study to investigate the root cause of the performance degradation of highly concurrent HTTP connections in data center networks. We further present the design and implementation of TCP-TRIM, which employs probe packets to smooth the aggressive increase of the congestion window in persistent TCP connections, and leverages congestion detection and control at the end host to limit the growth of the switch queue length under highly concurrent TCP connections. The experimental results of at-scale simulations and real implementations demonstrate that TCP-TRIM reduces the completion time of HTTP responses by up to 80%, while introducing little deployment overhead, and only at the end hosts. This work is supported by the National Natural Science Foundation of China (61572530, 61502539, 61402541, 61462007 and 61420106009).
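
    A hedged sketch of the two ideas in the abstract: (i) after an HTTP OFF period, start from a small probe window rather than the stale congestion window, and (ii) bound window growth when the measured queueing delay suggests the switch queue is building up. The thresholds and variable names are assumptions, not TCP-TRIM's actual parameters.

        # Illustrative end-host logic in the spirit of TCP-TRIM (not its real code).

        PROBE_WINDOW = 2             # packets sent as probes after an idle (OFF) period
        QUEUE_DELAY_LIMIT_MS = 2.0   # assumed queueing-delay budget

        def window_after_idle(old_cwnd, idle_ms, rto_ms=200):
            """Do not reuse a stale window after the ON/OFF gap; probe instead."""
            return PROBE_WINDOW if idle_ms > rto_ms else old_cwnd

        def clamp_window(cwnd, rtt_ms, base_rtt_ms):
            """Limit growth once queueing delay (rtt - base_rtt) exceeds the budget,
            which keeps the shared switch queue short under many concurrent flows."""
            queueing_delay = rtt_ms - base_rtt_ms
            if queueing_delay > QUEUE_DELAY_LIMIT_MS:
                return max(cwnd // 2, PROBE_WINDOW)
            return cwnd + 1   # otherwise grow roughly once per RTT

        cwnd = window_after_idle(old_cwnd=64, idle_ms=500)   # -> 2 probe packets
        cwnd = clamp_window(cwnd, rtt_ms=1.3, base_rtt_ms=0.4)
        print(cwnd)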

    Network and Server Resource Management Strategies for Data Centre Infrastructures: A Survey

    The advent of virtualisation and the increasing demand for outsourced, elastic compute charged on a pay-as-you-use basis have stimulated the development of large-scale Cloud Data Centres (DCs) housing tens of thousands of computer clusters. Of the significant capital outlay required for building and operating such infrastructures, server and network equipment account for 45% and 15% of the total cost, respectively, making resource utilisation efficiency paramount in order to increase the operators' Return-on-Investment (RoI). In this paper, we present an extensive survey on the management of server and network resources over virtualised Cloud DC infrastructures, highlighting key concepts and results, and critically discussing their limitations and implications for future research opportunities. We highlight the need for and benefits of adaptive resource provisioning that alleviates reliance on static utilisation prediction models and exploits direct measurement of resource utilisation on servers and network nodes. Coupling such distributed measurement with logically-centralised Software Defined Networking (SDN) principles, we subsequently discuss the challenges and opportunities for converged resource management over converged ICT environments, through unifying control loops to globally orchestrate adaptive and load-sensitive resource provisioning.

    MMPTCP: a novel transport protocol for data centre networks

    Modern data centres provide large aggregate capacity in the network backbone so that servers can theoretically communicate with each other at their maximum rates. However, the Transmission Control Protocol (TCP) cannot efficiently use this large capacity even if Equal-Cost Multi-Path (ECMP) routing is enabled to exploit the existence of parallel paths. MultiPath TCP (MPTCP) can effectively use the network resources of such topologies by performing fast, distributed load balancing. MPTCP is an appealing technique for data centres, which are very dynamic in nature; however, it is ill-suited for handling short flows since it increases their flow completion time. To mitigate these problems, we propose Maximum MultiPath TCP (MMPTCP), a novel transport protocol for modern data centres. Unlike MPTCP, it provides high performance for all network flows. It also decreases the burstiness of data centre traffic, which is essentially rooted in the traffic patterns of short flows. MMPTCP achieves these properties by randomising a flow's packets across all parallel paths to a destination during the initial phase of data transmission, until a certain amount of data is delivered. It then switches to MPTCP with several subflows, in which data transmission is governed by MPTCP congestion control. In this way, short flows are delivered very fast via the initial phase only, and long flows are delivered by MPTCP with several subflows. We evaluate MMPTCP in a FatTree topology under various network conditions. We find that MMPTCP decreases the loss rate on all links throughout the network and helps competing flows achieve better performance. Unlike MPTCP with a fixed number of subflows, MMPTCP offers high burst tolerance and low latency for short flows while maintaining high overall network utilisation. MMPTCP is incrementally deployable in existing data centres because it does not require any modification to the network or application layers.
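
    A minimal sketch of the two-phase idea described above: scatter packets across all available paths until a byte threshold is reached, then hand over to a fixed set of MPTCP subflows. The threshold value, subflow count and helper names are illustrative assumptions, not MMPTCP's actual parameters.

        import random

        # Illustrative model of MMPTCP's two-phase transmission (names/values assumed).

        SCATTER_THRESHOLD_BYTES = 100 * 1024   # assumed cutover point for short flows
        SUBFLOWS = 4                           # MPTCP subflows used after the cutover

        class MmptcpSender:
            def __init__(self, paths):
                self.paths = paths             # all parallel paths (e.g. ECMP next hops)
                self.bytes_sent = 0
                self.subflow_paths = None

            def pick_path(self, segment_len):
                if self.bytes_sent < SCATTER_THRESHOLD_BYTES:
                    # Phase 1: packet scatter -- spray each segment on a random path,
                    # which spreads bursts and keeps short flows fast.
                    path = random.choice(self.paths)
                else:
                    # Phase 2: regular MPTCP over a fixed subset of paths.
                    if self.subflow_paths is None:
                        self.subflow_paths = random.sample(self.paths,
                                                           min(SUBFLOWS, len(self.paths)))
                    path = self.subflow_paths[self.bytes_sent % len(self.subflow_paths)]
                self.bytes_sent += segment_len
                return path

        sender = MmptcpSender(paths=list(range(8)))
        print([sender.pick_path(1460) for _ in range(5)])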