Datacenter Traffic Control: Understanding Techniques and Trade-offs
Datacenters provide cost-effective and flexible access to scalable compute
and storage resources necessary for today's cloud computing needs. A typical
datacenter is made up of thousands of servers connected with a large network
and usually managed by one operator. To provide quality access to the variety
of applications and services hosted on datacenters and maximize performance, it
is necessary to use datacenter networks effectively and efficiently.
Datacenter traffic is often a mix of several classes with different priorities
and requirements. This includes user-generated interactive traffic, traffic
with deadlines, and long-running traffic. To this end, custom transport
protocols and traffic management techniques have been developed to improve
datacenter network performance.
In this tutorial paper, we review the general architecture of datacenter
networks, various topologies proposed for them, their traffic properties,
general traffic control challenges in datacenters and general traffic control
objectives. The purpose of this paper is to bring out the important
characteristics of traffic control in datacenters and not to survey all
existing solutions (as it is virtually impossible due to the massive body of
existing research). We hope to provide readers with a wide range of options and
factors while considering a variety of traffic control mechanisms. We discuss
various characteristics of datacenter traffic control including management
schemes, transmission control, traffic shaping, prioritization, load balancing,
multipathing, and traffic scheduling. Next, we point to several open challenges
as well as new and interesting networking paradigms. At the end of this paper,
we briefly review inter-datacenter networks that connect geographically
dispersed datacenters which have been receiving increasing attention recently
and pose interesting and novel research problems.
Comment: Accepted for publication in IEEE Communications Surveys and Tutorials
An edge-queued datagram service for all datacenter traffic
Modern datacenters support a wide range of protocols and in-network switch enhancements aimed at improving performance. Unfortunately, the resulting protocols often do not coexist gracefully because they inevitably interact via queuing in the network. In this paper we describe EQDS, a new datagram service for datacenters that moves almost all of the queuing out of the core network and into the sending host. This enables it to support multiple (conflicting) higher layer protocols, while only sending packets into the network according to any receiver-driven credit scheme. EQDS can transparently speed up legacy TCP and RDMA stacks, and enables transport protocol evolution, while benefiting from future switch enhancements without needing to modify higher layer stacks. We show through simulation and multiple implementations that EQDS can reduce FCT of legacy TCP by 2x, improve the NVMeOF-RDMA throughput by 30%, and safely run TCP alongside RDMA on the same network
Optimising data centre operation by removing the transport bottleneck
Data centres lie at the heart of almost every service on the Internet. Data centres are used to provide search results, to power social media, to store and index email, to host “cloud” applications, for online retail and to provide a myriad of other web services. Consequently, the more efficient they can be made, the better for all of us. The power of modern data centres is in combining commodity off-the-shelf server hardware and network equipment to provide what Google’s Barroso and Hölzle describe as “warehouse scale” computers.
Data centres rely on TCP, a transport protocol that was originally designed for use in the Internet. Like other such protocols, TCP has been optimised to maximise throughput, usually by filling up queues at the bottleneck. However, for most applications within a data centre network latency is more critical than throughput. Consequently the choice of transport protocol becomes a bottleneck for performance. My thesis is that the solution to this is to move away from the use of one-size-fits-all transport protocols towards ones that have been designed to reduce latency across the data centre and which can dynamically respond to the needs of the applications.
This dissertation focuses on optimising the transport layer in data centre networks. In particular I address the question of whether any single transport mechanism can be flexible enough to cater to the needs of all data centre traffic. I show that one leading protocol (DCTCP) has been heavily optimised for certain network conditions. I then explore approaches that seek to minimise latency for applications that care about it while still allowing throughput-intensive applications to receive a good level of service. My key contributions to this are Silo and Trevi.
Trevi is a novel transport system for storage traffic that utilises fountain coding to maximise throughput and minimise latency while being agnostic to drop, thus allowing storage traffic to be pushed out of the way when latency sensitive traffic is present in the network. Silo is an admission control system that is designed to give tenants of a multi-tenant data centre guaranteed low latency network performance. Both of these were developed in collaboration with others.
Understanding the characteristics of Internet traffic and designing an efficient RaptorQ-based data transport protocol for modern data centres
This thesis is the amalgamation of research on efficient data transport protocols for data centres and a comprehensive and systematic study of Internet traffic, which came as a result of the need to understand traffic patterns and workloads in modern computer networks.
The first part of the thesis is on the development of efficient data transport protocols for data centres. We study modern data transport protocols for data centres through large scale simulations using the OMNeT++ simulator. We developed and experimented with an OMNeT++ model of NDP. This has led to the identification of limitations of the state of the art and the formulation of research questions with respect to data transport protocols for modern data centres. The developed model includes an implementation of a Fat-tree topology and per-packet ECMP load balancing. We discuss how we integrated the model with the INET Framework and validated it by running various experiments that test different model parameters and components. This work revealed limitations of NDP with respect to efficient one-to-many and many-to-one communication in data centres, which led to the development of SCDP, a novel and general-purpose data transport protocol for data centres that, in contrast to all other protocols proposed to date, natively supports one-to-many and many-to-one data communication, which is extremely common in modern data centres. SCDP does so without compromising on efficiency for short and long unicast flows. SCDP achieves this by integrating RaptorQ codes with receiver-driven data transport, in-network packet trimming and Multi-Level Feedback Queuing (MLFQ): (1) RaptorQ codes enable efficient one-to-many and many-to-one data transport; (2) on top of RaptorQ codes, receiver-driven flow control, in combination with in-network packet trimming, enables efficient usage of network resources as well as multi-path transport and packet spraying for all transport modes. Incast and Outcast are eliminated; (3) the systematic nature of RaptorQ codes, in combination with MLFQ, enables fast, decoding-free completion of short flows. We extensively evaluated SCDP in a wide range of simulated scenarios with realistic data centre workloads.
For one-to-many and many-to-one transport sessions, SCDP performs significantly better than NDP. For short and long unicast flows, SCDP performs equally well or better compared to NDP.
In the second part of the thesis, we extensively study Internet traffic. Getting good statistical models of traffic on network links is a well-known, often-studied problem. A lot of attention has been given to correlation patterns and flow duration. The distribution of the amount of traffic per unit time is an equally important but less studied problem. We study a large number of traffic traces from many different networks including academic, commercial and residential networks using state-of-the-art statistical techniques. We show that the log-normal distribution is a better fit than the Gaussian distribution. We also investigate a second, heavy-tailed distribution and show that its performance is better than Gaussian but worse than log-normal. We examine anomalous traces which are a poor fit for all tested distributions and show that this is often due to traffic outages or links that hit maximum capacity. Stationarity tests showed that the traffic is stationary at some range of aggregation times. We demonstrate the utility of the log-normal distribution in two contexts: predicting the proportion of time traffic will exceed a given level (for link capacity estimation) and predicting 95th percentile pricing. We also show the log-normal distribution is a better predictor than Gaussian or Weibull distributions.
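The two uses of the log-normal model mentioned above (exceedance probability for capacity estimation, and a 95th percentile estimate) can be sketched as follows. This is a minimal illustration, not the thesis's method: the function names are hypothetical, the fit simply takes the mean and sample standard deviation of the log of the samples, and the sample volumes used in the usage note are made up.

```python
import math
import statistics


def fit_lognormal(samples):
    """Estimate (mu, sigma) from the mean and sample std. dev. of log(samples)."""
    logs = [math.log(x) for x in samples]
    return statistics.fmean(logs), statistics.stdev(logs)


def exceedance_probability(mu, sigma, level):
    """P(X > level) for X ~ LogNormal(mu, sigma), via the normal tail function."""
    z = (math.log(level) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))


def percentile(mu, sigma, p):
    """The p-quantile (0 < p < 1) of LogNormal(mu, sigma)."""
    z = statistics.NormalDist().inv_cdf(p)
    return math.exp(mu + sigma * z)


# Hypothetical per-interval traffic volumes (arbitrary units), for illustration only.
volumes = [120.0, 95.0, 143.0, 110.0, 180.0, 102.0, 131.0, 99.0]
mu, sigma = fit_lognormal(volumes)
p95 = percentile(mu, sigma, 0.95)
```

Under these assumptions, `exceedance_probability(mu, sigma, link_capacity)` estimates the fraction of intervals in which traffic would exceed a given capacity, and `p95` plays the role of the 95th percentile used in percentile-based billing.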
Consistent high performance and flexible congestion control architecture
The part of the TCP software stack that controls how fast a data sender transfers packets is usually referred to as congestion control, because it was originally introduced to avoid network congestion among multiple competing flows. Over the past 30 years of Internet evolution, the traditional TCP congestion control architecture, despite an army of specially engineered implementations and improvements over the original software, has increasingly suffered from surprisingly poor performance in today's complicated network conditions. We argue the traditional TCP congestion control family has little hope of achieving consistent high performance due to a fundamental architectural deficiency: hardwiring packet-level events to control responses.
In this thesis, we propose Performance-oriented Congestion Control (PCC), a new congestion control architecture in which each sender continuously observes the connection between its rate control actions and empirically experienced performance, enabling it to use intelligent control algorithms to consistently adopt actions that result in high performance. We first build the foundation of the PCC architecture and analytically prove the viability of this new congestion control architecture. Specifically, we show that, contrary to intuition, with a certain form of utility function and a theoretically simplified rate control algorithm, selfishly competing senders converge to a fair and stable Nash Equilibrium. With this architectural and theoretical guideline, we then design and implement the first congestion control protocol in the PCC family: PCC Allegro. PCC Allegro immediately demonstrates its architectural benefits with significant, often more than 10X, performance gain on a wide spectrum of challenging network conditions. With this very encouraging performance validation, we further advance PCC's architecture on both the utility function framework and the learning rate control algorithm. Taking a principled approach using online learning theory, we designed PCC Vivace with a new strictly socially concave utility function framework and a gradient-ascent-based learning rate control algorithm. PCC Vivace significantly improves performance on fast-changing networks, and yields a better tradeoff in convergence speed and stability and better TCP friendliness compared to PCC Allegro and other state-of-the-art new congestion control protocols. Moreover, PCC Vivace's expressive utility function framework can be tuned differently at different competing flows to produce predictable converged throughput ratios for each flow. This opens significant future potential for PCC Vivace in centrally controlled networking paradigms like Software Defined Networks (SDN).
Finally, with all these research advances, we aim to push the PCC architecture to production use with a user-space tunneling proxy and successful integration with Google's QUIC transport framework.
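The idea of steering a sending rate by gradient ascent on an observed utility can be sketched with a toy model. Everything below is an assumption made for illustration and is not taken from the thesis: a single bottleneck that drops all traffic above its capacity, a utility with a sub-linear rate reward and a linear loss penalty, hypothetical constants, and a fixed step size. In a real sender the utility would be measured from live traffic instead of being computed from a known capacity.

```python
def loss_rate(rate, capacity):
    """Toy bottleneck: the fraction of traffic sent above capacity is dropped."""
    return max(0.0, 1.0 - capacity / rate)


def utility(rate, capacity, exponent=0.9, loss_penalty=11.35):
    """Illustrative utility: sub-linear reward for rate, linear penalty for loss."""
    return rate ** exponent - loss_penalty * rate * loss_rate(rate, capacity)


def gradient_ascent_rate(capacity, rate=10.0, probe=0.05, step=1.0, rounds=1000):
    """Probe slightly above and below the current rate, then move uphill."""
    for _ in range(rounds):
        higher, lower = rate * (1 + probe), rate * (1 - probe)
        # Finite-difference estimate of d(utility)/d(rate) from the two probes.
        gradient = (utility(higher, capacity) - utility(lower, capacity)) / (higher - lower)
        rate = max(1.0, rate + step * gradient)  # keep the rate positive
    return rate


final_rate = gradient_ascent_rate(capacity=100.0)
```

With these toy settings the rate climbs from 10 towards the capacity of 100 and then oscillates in a band just around it, because the loss penalty makes the gradient sharply negative as soon as a probe overshoots the bottleneck.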
Network and Server Resource Management Strategies for Data Centre Infrastructures: A Survey
The advent of virtualisation and the increasing demand for outsourced, elastic compute charged on a pay-as-you-use basis has stimulated the development of large-scale Cloud Data Centres (DCs) housing tens of thousands of computer clusters. Of the significant capital outlay required for building and operating such infrastructures, server and network equipment account for 45% and 15% of the total cost, respectively, making resource utilisation efficiency paramount in order to increase the operators' Return-on-Investment (RoI).
In this paper, we present an extensive survey on the management of server and network resources over virtualised Cloud DC infrastructures, highlighting key concepts and results, and critically discussing their limitations and implications for future research opportunities. We highlight the need for and benefits of adaptive resource provisioning that alleviates reliance on static utilisation prediction models and exploits direct measurement of resource utilisation on servers and network nodes. Coupling such distributed measurement with logically-centralised Software Defined Networking (SDN) principles, we subsequently discuss the challenges and opportunities for converged resource management over converged ICT environments, through unifying control loops to globally orchestrate adaptive and load-sensitive resource provisioning.
Enhancing programmability for adaptive resource management in next generation data centre networks
Recently, Data Centre (DC) infrastructures have been growing rapidly to support a wide range of emerging services, and provide the underlying connectivity and compute resources that facilitate the "*-as-a-Service" model. This has led to the deployment of a multitude of services multiplexed over a few, very large-scale centralised infrastructures. In order to cope with the ebb and flow of users, services and traffic, infrastructures have been provisioned for peak demand, resulting in low average resource utilisation. This overprovisioning has been further motivated by the complexity in predicting traffic demands over diverse timescales and the stringent economic impact of outages. At the same time, the emergence of Software Defined Networking (SDN) is offering new means to monitor and manage the network infrastructure to address this underutilisation.
This dissertation aims to show how measurement-based resource management can improve performance and resource utilisation by adaptively tuning the infrastructure to the changing operating conditions. To achieve this dynamicity, the infrastructure must be able to centrally monitor, notify and react based on the current operating state, from per-packet dynamics to longstanding traffic trends and topological changes. However, the management and orchestration abilities of current SDN realisations are too limiting and must evolve for next generation networks. The current focus has been on logically centralising the routing and forwarding decisions. However, in order to achieve the necessary fine-grained insight, the data plane of the individual device must be programmable to collect and disseminate the metrics of interest.
The results of this work demonstrate that a logically centralised controller can dynamically collect and measure network operating metrics to subsequently compute and disseminate fine-tuned environment-specific settings. They show how this approach can prevent TCP throughput incast collapse and improve TCP performance by an order of magnitude for partition-aggregate traffic patterns. Furthermore, the paradigm is generalised to show the benefits for other services widely used in DCs, such as routing, telemetry, and security.
Mitigating interconnect and end host congestion in modern networks
One of the most critical building blocks of the Internet is the mechanism to mitigate network congestion. While existing congestion control approaches have served their purpose well in the last decades, the last few years saw a significant increase in new applications and user demand, stressing the network infrastructure to the extent that new ways of handling congestion are required. This dissertation identifies the congestion problems caused by the increased scale of the network usage, both in inter-AS connects and on end hosts in data centers, and presents abstractions and frameworks that allow for improved solutions to mitigate congestion. To mitigate inter-AS congestion, we develop Unison, a framework that allows an ISP to jointly optimize its intra-domain routes and inter-domain routes, in collaboration with content providers. The basic idea is to provide the ISP operator and the neighbors of the ISP with an abstraction of the ISP network in the form of a virtual switch (vSwitch). Unison allows the ISP to provide hints to its neighbors, suggesting alternative routes that can improve their performance. We investigate how the vSwitch abstraction can be used to maximize the throughput of the ISP. To mitigate end-host congestion in data center networks, we develop a backpressure mechanism for queuing architecture in congested end hosts to cope with tens of thousands of flows. We show that current end-host mechanisms can lead to high CPU utilization, high tail latency, and low throughput in cases of congestion of egress traffic. We introduce the design, implementation, and evaluation of zero-drop networking (zD) stack, a new architecture for handling congestion of scheduled buffers. Besides queue overflow, another cause of congestion is CPU resource exhaustion. The CPU cost of processing packets in networking stacks, however, has not been fully investigated in the literature. 
Much of the focus of the community has been on scaling servers in terms of aggregate traffic intensity, but bottlenecks caused by the increasing number of concurrent flows have received little attention. We conduct a comprehensive analysis on the CPU cost of processing packets and identify the root cause that leads to high CPU overhead and degraded performance in terms of throughput and RTT. Our work highlights considerations beyond packets per second for the design of future stacks that scale to millions of flows.
Ph.D.
Improved algorithms for TCP congestion control
Reliable and efficient data transfer on the Internet is an important issue. Since the
late 1970s the protocol responsible for that has been the de facto standard TCP, which
has proven to be successful throughout the years; its self-managed congestion
control algorithms have retained the stability of the Internet for decades. However,
the variety of new technologies such as high-speed networks (e.g. fibre
optics) with high-speed long-delay set-ups (e.g. cross-Atlantic links) and wireless
technologies have posed many challenges to TCP congestion control algorithms.
The congestion control research community has proposed solutions to most of these
challenges. This dissertation adds to the existing work in three ways. Firstly, tackling
the high-speed long-delay problem of TCP, we propose enhancements to one of the existing
TCP variants (part of the Linux kernel stack). We then propose our own variant:
TCP-Gentle. Secondly, tackling the challenge of differentiating wireless loss
from congestive loss in a passive way, we propose a novel loss differentiation
algorithm which quantifies the noise in packet inter-arrival times and uses this
information together with the span (ratio of maximum to minimum packet
inter-arrival times) to adapt the multiplicative decrease factor according to a predefined
logical formula. Finally, extending the well-known drift model of TCP to account
for wireless loss and some hypothetical cases (e.g. variable multiplicative decrease),
we have undertaken stability analysis for the new version of the model.
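The two inputs named above can be computed from packet arrival timestamps, as in the small sketch below. The span follows the definition given in the abstract (maximum over minimum inter-arrival time). The noise measure is an assumption: the abstract does not say how noise is quantified, so the coefficient of variation stands in here. The function names are hypothetical, timestamps are assumed to be strictly increasing, and the formula that maps these values to a multiplicative decrease factor is not reproduced because the abstract does not give it.

```python
import statistics


def inter_arrival_times(arrival_times):
    """Gaps between consecutive packet arrival timestamps (in seconds)."""
    return [later - earlier for earlier, later in zip(arrival_times, arrival_times[1:])]


def span(gaps):
    """Ratio of the maximum to the minimum inter-arrival time."""
    return max(gaps) / min(gaps)


def noise(gaps):
    """Assumed noise proxy: coefficient of variation of the inter-arrival times."""
    return statistics.pstdev(gaps) / statistics.fmean(gaps)


# Hypothetical timestamps: three evenly spaced packets, then one delayed packet.
gaps = inter_arrival_times([0.0, 1.0, 2.0, 4.0])
```

For a perfectly paced stream the span is 1 and the noise proxy is 0; both grow as arrivals become more irregular.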
Application-Aware Network Design Using Software Defined Networking for Application Performance Optimization for Big Data and Video Streaming
Title from PDF of title page viewed October 30, 2017
Dissertation advisor: Deep Medhi
Vita
Includes bibliographical references (pages 122-135)
Thesis (Ph.D.)--School of Computing and Engineering. University of Missouri--Kansas City, 2017
This dissertation investigates improvement in application performance. For applications, we consider two classes: Hadoop MapReduce and video streaming. The Hadoop MapReduce (M/R) framework has become the de facto standard for Big Data analytics. However, the lack of network-awareness of the default MapReduce resource manager in a traditional IP network can cause unbalanced job scheduling and network bottlenecks; such factors can eventually lead to an increase in the Hadoop MapReduce job completion time. Dynamic Adaptive Streaming over HTTP (MPEG-DASH) is becoming the de facto dominant transport for today’s video applications. It has been implemented by today’s major media carriers such as YouTube and Netflix. It enables new video applications to fully utilize the existing physical IP network infrastructure. New 3D immersive media such as Virtual Reality and 360-degree video have drawn great attention from both consumers and researchers in recent years. One of the biggest challenges in streaming such 3D media is the high bandwidth demand and video quality. A new tile-based video is introduced in both the video codec and the streaming layer to reduce the transferred media size.
In this dissertation, we propose a Software-Defined Network (SDN) approach in an Application-Aware Network (AAN) platform. We first present an architecture for our approach and then show how this architecture can be applied to the two aforementioned application areas. Our approach provides both underlying network functions and application-level forwarding logic for Hadoop MapReduce and video streaming. By incorporating a comprehensive view of the network, the SDN controller can optimize MapReduce workloads and DASH flows for videos by application-aware traffic rerouting. We quantify the improvement for both Hadoop and MPEG-DASH in terms of job completion time and user’s quality of experience (QoE), respectively. Based on our experiments, we observed that our AAN platform for Hadoop MapReduce job optimization offers a significant improvement compared to a static, traditional IP network environment by reducing job run time by 16% to 300% for various MapReduce benchmark jobs. As for MPEG-DASH based video streaming, we can increase user perceived video bitrate by 100%.
Introduction -- Research survey -- Proposed architecture -- AAN-SDN for Hadoop -- Study of User QoE Improvement for Dynamic Adaptive Streaming over HTTP (MPEG-DASH) -- AAN-SDN For MPEG-DASH -- Conclusion -- Appendix A. Mininet Topology Source Code For DASH Setup -- Appendix B. Hadoop Installation Source Code -- Appendix C. Openvswitch Installation Source Code -- Appendix D. HiBench Installation Guid