48 research outputs found
F-DCTCP: Fair Congestion Control for SDN-Based Data Center Networks
Network congestion control and management has long been a major issue in providing low-latency, high throughput and high link-utilisation. In particular, Data Center Network (DCN) environments, often dominated by partition-aggregate workloads, suffer performance collapse due to TCP Incast caused by inadequate congestion control parameters. This paper proposes a step in solving this problem. In particular, we describe a novel framework to overcome and control congestion in DCNs based on Software Defined Networking (SDN). We propose a native SDN-based congestion control mechanism, termed Fair Data Center TCP (F-DCTCP). As we shall see, the experimental results show that F-DCTCP outperforms all previous proposals by providing the best combined overall performance in terms of throughout, fairness and flow completion times, making it highly attractive for next generation networks
Datacenter Traffic Control: Understanding Techniques and Trade-offs
Datacenters provide cost-effective and flexible access to scalable compute
and storage resources necessary for today's cloud computing needs. A typical
datacenter is made up of thousands of servers connected with a large network
and usually managed by one operator. To provide quality access to the variety
of applications and services hosted on datacenters and maximize performance, it
deems necessary to use datacenter networks effectively and efficiently.
Datacenter traffic is often a mix of several classes with different priorities
and requirements. This includes user-generated interactive traffic, traffic
with deadlines, and long-running traffic. To this end, custom transport
protocols and traffic management techniques have been developed to improve
datacenter network performance.
In this tutorial paper, we review the general architecture of datacenter
networks, various topologies proposed for them, their traffic properties,
general traffic control challenges in datacenters and general traffic control
objectives. The purpose of this paper is to bring out the important
characteristics of traffic control in datacenters and not to survey all
existing solutions (as it is virtually impossible due to massive body of
existing research). We hope to provide readers with a wide range of options and
factors while considering a variety of traffic control mechanisms. We discuss
various characteristics of datacenter traffic control including management
schemes, transmission control, traffic shaping, prioritization, load balancing,
multipathing, and traffic scheduling. Next, we point to several open challenges
as well as new and interesting networking paradigms. At the end of this paper,
we briefly review inter-datacenter networks that connect geographically
dispersed datacenters which have been receiving increasing attention recently
and pose interesting and novel research problems.Comment: Accepted for Publication in IEEE Communications Surveys and Tutorial
Incast mitigation in a data center storage cluster through a dynamic fair-share buffer policy
Incast is a phenomenon when multiple devices interact with only one device at a given time. Multiple storage senders overflow either the switch buffer or the single-receiver memory. This pattern causes all concurrent-senders to stop and wait for buffer/memory availability, and leads to a packet loss and retransmissionāresulting in a huge latency. We present a software-defined technique tackling the many-to-one communication patternāIncastāin a data center storage cluster. Our proposed method decouples the default TCP windowing mechanism from all storage servers, and delegates it to the software-defined storage controller. The proposed method removes the TCP saw-tooth behavior, provides a global flow awareness, and implements the dynamic fair-share buffer policy for end-to-end I/O path. It considers all I/O stages (applications, device drivers, NICs, switches/routers, file systems, I/O schedulers, main memory, and physical disks) while achieving the maximum I/O throughput. The policy, which is part of the proposed method, allocates fair-share bandwidth utilization for all storage servers. Priority queues are incorporated to handle the most important data flows. In addition, the proposed method provides better manageability and maintainability compared with traditional storage networks, where data plane and control plane reside in the same device
Congestion control, energy efficiency and virtual machine placement for data centers
Data centers, facilities with communications network equipment and servers for data processing and/or storage, are prevalent and essential to provide a myriad of services and applications for various private, non-profit, and government systems, and they also form the foundation of cloud computing, which is transforming the technological landscape of the Internet. With rapid deployment of modern high-speed low-latency large-scale data centers, many issues have emerged in data centers, such as data center architecture design, congestion control, energy efficiency, virtual machine placement, and load balancing.
The objective of this thesis is multi-fold. First, an enhanced Quantized Congestion Notification (QCN) congestion notification algorithm, called fair QCN (FQCN), is proposed to improve rate allocation fairness of multiple flows sharing one bottleneck link in data center networks. Detailed analysis on FQCN and simulation results is provided to validate the fair share rate allocation while maintaining the queue length stability. Furthermore, the effects of congestion notification algorithms, including QCN, AF-QCN and FQCN, are investigated with respect to TCP throughput collapse. The results show that FQCN can significantly enhance TCP throughput performance, and achieve better TCP throughput than QCN and AF-QCN in a TCP Incast setting.
Second, a unified congestion detection, notification and control system for data center networks is designed to efficiently resolve network congestion in a uniform solution and to ensure convergence to statistical fairness with āno stateā switches simultaneously. The architecture of the proposed system is described in detail and the FQCN algorithm is implemented in the proposed framework. The simulation results of the FQCN algorithm implemented in the proposed framework validate the robustness and efficiency of the proposed congestion control system.
Third, a two-level power optimization model, namely, Hierarchical EneRgy Optimization (HERO), is established to reduce the power consumption of data center networks by switching off network switches and links while still guaranteeing full connectivity and maximizing link utilization. The power-saving performance of the proposed HERO model is evaluated by simulations with different traffic patterns. The simulation results have shown that HERO can reduce the power consumption of data center networks effectively with reduced complexity.
Last, several heterogeneity aware dominant resource assistant heuristic algorithms, namely, dominant residual resource aware first-fit decreasing (DRR-FFD), individual DRR-FFD (iDRR-FFD) and dominant residual resource based bin fill (DRR-BinFill), are proposed for virtual machine (VM) consolidation. The proposed heuristic algorithms exploit the heterogeneity of the VMsā requirements for different resources by capturing the differences among VMsā demands, and the heterogeneity of the physical machinesā resource capacities by capturing the differences among physical machinesā resources. The performance of the proposed heuristic algorithms is evaluated with different classes of synthetic workloads under different VM requirement heterogeneity conditions, and the simulation results demonstrate that the proposed heuristics achieve quite similar consolidation performance as dimension-aware heuristics with almost the same computational cost as those of the single dimensional heuristics
Tuning the aggressive TCP behavior for highly concurrent HTTP connections in intra-datacenter
This is the author accepted manuscript. The final version is available from the publisher via the DOI in this record.IEEE Modern data centers host diverse hyper text transfer protocol (HTTP)-based services, which employ persistent transmission control protocol (TCP) connections to send HTTP requests and responses. However, the ON/OFF pattern of HTTP traffic disturbs the increase of TCP congestion window, potentially triggering packet loss at the beginning of ON period. Furthermore, the transmission performance becomes worse due to severe congestion in the concurrent transfer of HTTP response. In this paper, we provide the first extensive study to investigate the root cause of performance degradation of highly concurrent HTTP connections in data center network. We further present the design and implementation of TCP-TRIM, which employs probe packets to smooth the aggressive increase of congestion window in persistent TCP connection and leverages congestion detection and control at end-host to limit the growth of switch queue length under highly concurrent TCP connections. The experimental results of at-scale simulations and real implementations demonstrate that TCP-TRIM reduces the completion time of HTTP response by up to 80 & #x0025;, while introducing little deployment overhead only at the end hosts.This work is supported by the National Natural Science
Foundation of China (61572530, 61502539, 61402541,
61462007 and 61420106009)
Recommended from our members
Understanding the characteristics of Internet traffic and designing an efficient RaptorQ-based data transport protocol for modern data centres
This thesis is the amalgamation of research on efficient data transport protocols for data centres and a comprehensive and systematic study of Internet traffic, which came as a result of the need to understand traffic patterns and workloads in modern computer networks.
The first part of the thesis is on the development of efficient data transport pro- tocols for data centres. We study modern data transport protocols for data centres through large scale simulations using the OMNeT++ simulator. We developed and experimented with an OMNeT++ model of NDP. This has led to the identification of limitations of the state of the art and the formulation of research questions with respect to data transport protocols for modern data centres. The developed model includes an implementation of a Fat-tree topology and per-packet ECMP load bal- ancing. We discuss how we integrated the model with the INET Framework and validated it by running various experiments that test different model parameters and components. This work revealed limitations of NDP with respect to efficient one-to-many and many-to-one communication in data centres, which led to the de- velopment of SCDP, a novel and general-purpose data transport protocol for data centres that, in contrast to all other protocols proposed to date, natively supports one-to-many and many-to-one data communication, which is extremely common in modern data centres. SCDP does so without compromising on efficiency for short and long unicast flows. SCDP achieves this by integrating RaptorQ codes with receiver-driven data transport, in-network packet trimming and Multi-Level Feed- back Queuing (MLFQ); (1) RaptorQ codes enable efficient one-to-many and many- to-one data transport; (2) on top of RaptorQ codes, receiver- driven flow control, in combination with in-network packet trimming, enable efficient usage of network re- sources as well as multi-path transport and packet spraying for all transport modes. Incast and Outcast are eliminated; (3) the systematic nature of RaptorQ codes, in combination with MLFQ, enable fast, decoding-free completion of short flows. We extensively evaluated SCDP in a wide range of simulated scenarios with realistic data centre workloads. For one-to-many and many-to-one transport sessions, SCDP performs significantly better than NDP. For short and long unicast flows, SCDP performs equally well or better compared to NDP.
In the second part of the thesis, we extensively study Internet traffic. Getting good statistical models of traffic on network links is a well-known, often-studied problem. A lot of attention has been given to correlation patterns and flow duration. The distribution of the amount of traffic per unit time is an equally important but less studied problem. We study a large number of traffic traces from many different networks including academic, commercial and residential networks using state-of-the-art statistical techniques. We show that the log-normal distribution is a better fit than the Gaussian distribution. We also investigate a second, heavy- tailed distribution and show that its performance is better than Gaussian but worse than log-normal. We examine anomalous traces which are a poor fit for all tested distributions and show that this is often due to traffic outages or links that hit maximum capacity. Stationarity tests showed that the traffic is stationary at some range of aggregation times. We demonstrate the utility of the log-normal distribution in two contexts: predicting the proportion of time traffic will exceed a given level (for link capacity estimation) and predicting 95th percentile pricing. We also show the log-normal distribution is a better predictor than Gaussian orWeibull distributions
Recommended from our members
Optimising data centre operation by removing the transport bottleneck
Data centres lie at the heart of almost every service on the Internet. Data centres are used to provide search results, to power social media, to store and index email, to host ācloudā applications, for online retail and to provide a myriad of other web services. Consequently the more efficient they can be made the better for all of us. The power of modern data centres is in combining commodity off-the-shelf server hardware and network equipment to provide what Googleās Barrosso and Ho Ģlzle describe as āwarehouse scaleā computers.
Data centres rely on TCP, a transport protocol that was originally designed for use in the Internet. Like other such protocols, TCP has been optimised to maximise throughput, usually by filling up queues at the bottleneck. However, for most applications within a data centre network latency is more critical than throughput. Consequently the choice of transport protocol becomes a bottleneck for performance. My thesis is that the solution to this is to move away from the use of one-size-fits-all transport protocols towards ones that have been designed to reduce latency across the data centre and which can dynamically respond to the needs of the applications.
This dissertation focuses on optimising the transport layer in data centre networks. In particular I address the question of whether any single transport mechanism can be flexible enough to cater to the needs of all data centre traffic. I show that one leading protocol (DCTCP) has been heavily optimised for certain network conditions. I then explore approaches that seek to minimise latency for applications that care about it while still allowing throughput-intensive applications to receive a good level of service. My key contributions to this are Silo and Trevi.
Trevi is a novel transport system for storage traffic that utilises fountain coding to max- imise throughput and minimise latency while being agnostic to drop, thus allowing storage traffic to be pushed out of the way when latency sensitive traffic is present in the network. Silo is an admission control system that is designed to give tenants of a multi-tenant data centre guaranteed low latency network performance. Both of these were developed in collaboration with others