16 research outputs found
Empowering Cloud Data Centers with Network Programmability
Cloud data centers are critical infrastructure for modern Internet services such as web search, social networking and e-commerce. However, the gradual slowdown of Moore’s law has constrained the growth of data centers’ performance and energy efficiency. In addition, the growing number of millisecond-scale and microsecond-scale tasks places higher demands on the throughput and latency of cloud applications. Today’s server-based solutions struggle to meet the performance requirements in many scenarios, such as resource management, scheduling, and high-speed traffic monitoring and testing.
In this dissertation, we study these problems from a network perspective. We investigate a new architecture that leverages the programmability of new-generation network switches to improve the performance and reliability of clouds. As programmable switches provide only very limited memory and functionality, we exploit compact data structures and deeply co-design software and hardware to make the best use of these resources. More specifically, this dissertation presents four systems:
(i) NetLock: A new centralized lock management architecture that co-designs programmable switches and servers to simultaneously achieve high performance and rich policy support. It provides orders-of-magnitude higher throughput than existing systems with microsecond-level latency, and supports many commonly-used policies such as performance isolation.
(ii) HCSFQ: A scalable and practical solution to implement hierarchical fair queueing on commodity hardware at line rate. Instead of relying on a hierarchy of queues with complex queue management, HCSFQ does not keep per-flow states and uses only one queue to achieve hierarchical fair queueing.
(iii) AIFO: A new approach for programmable packet scheduling that uses only a single FIFO queue. AIFO utilizes an admission control mechanism to approximate PIFO, which is theoretically ideal but hard to implement with commodity devices.
(iv) Lumina: A tool that enables fine-grained analysis of hardware network stacks. By exploiting network programmability to emulate various network scenarios, Lumina helps users understand the micro-behaviors of hardware network stacks.
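The idea behind item (iii), approximating a priority queue with one FIFO plus admission control, can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the class name, the sliding window of recent ranks, the `headroom` parameter, and the exact threshold formula are all assumptions chosen for illustration. A packet is admitted when the queue is nearly empty, or when its rank is low enough relative to recently seen ranks given how full the queue is.

```python
from collections import deque


class AdmissionFifo:
    """Single FIFO queue with rank-based admission (illustrative sketch only)."""

    def __init__(self, capacity, window=8, headroom=0.1):
        self.capacity = capacity            # maximum number of queued packets
        self.headroom = headroom            # assumed fraction admitted unconditionally
        self.queue = deque()                # the only queue: plain FIFO order
        self.recent = deque(maxlen=window)  # sliding window of recent ranks

    def _quantile(self, rank):
        # Fraction of recently seen ranks that are strictly smaller (better).
        if not self.recent:
            return 0.0
        smaller = sum(1 for r in self.recent if r < rank)
        return smaller / len(self.recent)

    def enqueue(self, rank, packet):
        quantile = self._quantile(rank)
        self.recent.append(rank)
        occupied = len(self.queue)
        free_fraction = (self.capacity - occupied) / self.capacity
        threshold = free_fraction / (1.0 - self.headroom)
        admit = occupied < self.capacity and (
            occupied <= self.headroom * self.capacity or quantile <= threshold
        )
        if admit:
            self.queue.append((rank, packet))
        return admit

    def dequeue(self):
        # Packets always leave in arrival order; no sorting is performed.
        return self.queue.popleft() if self.queue else None


fifo = AdmissionFifo(capacity=4, window=4, headroom=0.25)
results = [fifo.enqueue(1, "a"), fifo.enqueue(1, "b"),
           fifo.enqueue(9, "c"), fifo.enqueue(0, "d")]
first_out = fifo.dequeue()
```

In this run the high-rank packet "c" is rejected once the queue is half full, while the low-rank packet "d" is still admitted. Ordering among admitted packets stays first-in, first-out.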
ResTP: A Configurable and Adaptable Multipath Transport Protocol for Future Internet Resilience
Motivated by the shortcomings of common transport protocols such as TCP, UDP, and MPTCP in modern networking, and by the belief that a general-purpose transport-layer protocol can operate efficiently over diverse network environments while providing the desired services for various application types, we design a new transport protocol, ResTP. The rapid advancement of networking technology and use paradigms is continually supporting new applications. The configurable and adaptable multipath-capable ResTP is distinct from the standard protocols not only in its flexibility in satisfying the requirements of different traffic classes while considering the characteristics of the underlying networks, but also in its emphasis on providing resilience. Resilience is an essential property that is unfortunately missing in the current Internet. In this dissertation, we present the design of ResTP, including the services that it supports and the set of algorithms that implement each service. We also discuss our modular implementation of ResTP in the open-source network simulator ns-3. Finally, the protocol is simulated under various network scenarios, and the results are analyzed in comparison with conventional protocols such as TCP, UDP, and MPTCP to demonstrate that ResTP is a promising new transport-layer protocol providing resilience in the Future Internet (FI).
On Improving Efficiency of Data-Intensive Applications in Geo-Distributed Environments
Distributed systems are now widely demanded and adopted for processing data-intensive workloads, since they greatly accelerate large-scale data processing with scalable parallelism and improved data locality. Traditional distributed systems initially targeted computing clusters but have since evolved to data centers with multiple clusters. These systems are mostly built on top of homogeneous, tightly integrated resources connected by high-speed local-area networks (LANs), and typically require data to be ingested into a central data center for processing. Today, with enormous volumes of data continuously generated from geographically distributed locations, direct adoption of such systems is prohibitively inefficient due to the limited system scalability and the high cost of centralizing the geo-distributed data over wide-area networks (WANs). Increasingly, the trend is to build geo-distributed systems in which data processing jobs are performed on top of geo-distributed, heterogeneous resources in proximity to the data at widely distributed geo-locations. However, the critical challenges and mechanisms for efficient execution of data-intensive applications in such geo-distributed environments remain unclear. The goal of this dissertation is to identify such challenges and mechanisms, by extensively using the research principles and methodology of conventional distributed systems to investigate the geo-distributed environment, and by developing new techniques to tackle these challenges and run data-intensive applications efficiently at scale. The contributions of this dissertation are threefold. Firstly, the dissertation shows that the high level of resource heterogeneity exhibited in the geo-distributed environment undermines the scalability of geo-distributed systems.
Virtualization-based resource abstraction mechanisms are introduced to abstract the hardware, network, and OS resources throughout the system, to mitigate the underlying resource heterogeneity and enhance the system scalability. Secondly, the dissertation reveals the overwhelming performance and monetary cost incurred by indulgent data sharing over the WANs in geo-distributed systems. Network optimization approaches, including linear-programming-based global optimization, greedy bin-packing heuristics, and TCP enhancement, are developed to optimize the network resource utilization and circumvent unnecessary expenses imposed on data sharing in WANs. Lastly, the dissertation highlights the importance of data locality for data-intensive applications running in the geo-distributed environment. Novel data caching and locality-aware scheduling techniques are devised to improve the data locality.
Understanding the characteristics of Internet traffic and designing an efficient RaptorQ-based data transport protocol for modern data centres
This thesis is the amalgamation of research on efficient data transport protocols for data centres and a comprehensive and systematic study of Internet traffic, which came as a result of the need to understand traffic patterns and workloads in modern computer networks.
The first part of the thesis is on the development of efficient data transport protocols for data centres. We study modern data transport protocols for data centres through large scale simulations using the OMNeT++ simulator. We developed and experimented with an OMNeT++ model of NDP. This has led to the identification of limitations of the state of the art and the formulation of research questions with respect to data transport protocols for modern data centres. The developed model includes an implementation of a Fat-tree topology and per-packet ECMP load balancing. We discuss how we integrated the model with the INET Framework and validated it by running various experiments that test different model parameters and components. This work revealed limitations of NDP with respect to efficient one-to-many and many-to-one communication in data centres, which led to the development of SCDP, a novel and general-purpose data transport protocol for data centres that, in contrast to all other protocols proposed to date, natively supports one-to-many and many-to-one data communication, which is extremely common in modern data centres. SCDP does so without compromising on efficiency for short and long unicast flows. SCDP achieves this by integrating RaptorQ codes with receiver-driven data transport, in-network packet trimming and Multi-Level Feedback Queuing (MLFQ): (1) RaptorQ codes enable efficient one-to-many and many-to-one data transport; (2) on top of RaptorQ codes, receiver-driven flow control, in combination with in-network packet trimming, enables efficient usage of network resources as well as multi-path transport and packet spraying for all transport modes, eliminating Incast and Outcast; (3) the systematic nature of RaptorQ codes, in combination with MLFQ, enables fast, decoding-free completion of short flows. We extensively evaluated SCDP in a wide range of simulated scenarios with realistic data centre workloads.
For one-to-many and many-to-one transport sessions, SCDP performs significantly better than NDP. For short and long unicast flows, SCDP performs equally well or better compared to NDP.
In the second part of the thesis, we extensively study Internet traffic. Getting good statistical models of traffic on network links is a well-known, often-studied problem. A lot of attention has been given to correlation patterns and flow duration. The distribution of the amount of traffic per unit time is an equally important but less studied problem. We study a large number of traffic traces from many different networks including academic, commercial and residential networks using state-of-the-art statistical techniques. We show that the log-normal distribution is a better fit than the Gaussian distribution. We also investigate a second, heavy-tailed distribution and show that its performance is better than Gaussian but worse than log-normal. We examine anomalous traces which are a poor fit for all tested distributions and show that this is often due to traffic outages or links that hit maximum capacity. Stationarity tests showed that the traffic is stationary at some range of aggregation times. We demonstrate the utility of the log-normal distribution in two contexts: predicting the proportion of time traffic will exceed a given level (for link capacity estimation) and predicting 95th percentile pricing. We also show the log-normal distribution is a better predictor than Gaussian or Weibull distributions.
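The two uses of the log-normal model mentioned above can be sketched with Python's standard library. The fitting method here is an assumption (maximum-likelihood estimation on the logarithms of the samples); the abstract does not state which technique the thesis uses, and the function names and sample values are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev


def fit_lognormal(samples):
    """Fit a log-normal by fitting a normal distribution to log(samples)."""
    logs = [math.log(x) for x in samples]
    return NormalDist(fmean(logs), pstdev(logs))


def exceedance_probability(samples, level):
    """Estimated fraction of time the traffic volume exceeds `level`."""
    fitted = fit_lognormal(samples)
    return 1.0 - fitted.cdf(math.log(level))


def traffic_percentile(samples, p=0.95):
    """Traffic volume below which a fraction `p` of intervals fall."""
    fitted = fit_lognormal(samples)
    return math.exp(fitted.inv_cdf(p))


# Hypothetical per-interval traffic volumes, chosen so the logs are -1, 0, 1.
volumes = [math.exp(-1), 1.0, math.exp(1)]
p_exceed = exceedance_probability(volumes, level=1.0)
median = traffic_percentile(volumes, p=0.5)
p95 = traffic_percentile(volumes, p=0.95)
```

For link capacity estimation one would pass the candidate capacity as `level`; for 95th percentile pricing one would read off `traffic_percentile(volumes, 0.95)`.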
Tuning the aggressive TCP behavior for highly concurrent HTTP connections in intra-datacenter
This is the author accepted manuscript. The final version is available from the publisher (IEEE) via the DOI in this record. Modern data centers host diverse hypertext transfer protocol (HTTP)-based services, which employ persistent transmission control protocol (TCP) connections to send HTTP requests and responses. However, the ON/OFF pattern of HTTP traffic disturbs the increase of the TCP congestion window, potentially triggering packet loss at the beginning of the ON period. Furthermore, the transmission performance becomes worse due to severe congestion in the concurrent transfer of HTTP responses. In this paper, we provide the first extensive study to investigate the root cause of performance degradation of highly concurrent HTTP connections in data center networks. We further present the design and implementation of TCP-TRIM, which employs probe packets to smooth the aggressive increase of the congestion window in persistent TCP connections and leverages congestion detection and control at the end host to limit the growth of switch queue length under highly concurrent TCP connections. The experimental results of at-scale simulations and real implementations demonstrate that TCP-TRIM reduces the completion time of HTTP responses by up to 80%, while introducing little deployment overhead only at the end hosts. This work is supported by the National Natural Science Foundation of China (61572530, 61502539, 61402541, 61462007 and 61420106009).
Implementation of the Algorithm for Congestion control in the Dynamic Circuit Network (DCN)
Transmission Control Protocol (TCP) incast congestion happens when a number of senders transmit in parallel to the same server in a high-bandwidth, low-latency network. For many data center network applications such as a search engine, heavy traffic is present on such a server. Incast congestion degrades overall performance because packets are lost at the server side due to buffer overflow, and as a result, the response time becomes longer. In this work, we focus on TCP throughput, round-trip time (RTT), receive window and retransmission. Our method is based on proactively adjusting the TCP receive window before packet loss occurs. We aim to avoid wasting bandwidth by adjusting the window size according to the number of packets. To avoid packet loss, the ICTCP algorithm has been implemented in the data center network (ToR).
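The proactive receive-window adjustment described above can be sketched as a per-connection control step. This is a simplified illustration in the spirit of ICTCP, not the implementation from this work: the thresholds `low` and `high`, the one-MSS step, the minimum window, and the `spare_bandwidth` flag are all assumptions.

```python
MSS = 1460  # assumed maximum segment size in bytes


def adjust_rwnd(rwnd, rtt, measured, spare_bandwidth=True,
                low=0.1, high=0.5, min_rwnd=2 * MSS):
    """Return the next receive window (bytes) for one connection.

    rwnd     -- current receive window in bytes
    rtt      -- round-trip time in seconds
    measured -- measured throughput in bytes per second
    """
    # Throughput the current window would allow if fully used.
    expected = max(measured, rwnd / rtt)
    gap = (expected - measured) / expected
    if gap <= low and spare_bandwidth:
        # The window is the bottleneck and the link has room: grow it.
        return rwnd + MSS
    if gap >= high:
        # The sender uses far less than the window allows: shrink it
        # before a synchronized burst can overflow the switch buffer.
        return max(min_rwnd, rwnd - MSS)
    return rwnd


grown = adjust_rwnd(14600, 0.001, measured=14.6e6)
shrunk = adjust_rwnd(14600, 0.001, measured=5.0e6)
held = adjust_rwnd(14600, 0.001, measured=10.0e6)
floored = adjust_rwnd(2 * MSS, 0.001, measured=1000.0)
```

The receiver would call such a function once per RTT for each connection and advertise the returned value as the TCP receive window.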
Measurement-Driven Algorithm and System Design for Wireless and Datacenter Networks
The growing number of mobile devices and data-intensive applications pose unique challenges for wireless access networks as well as datacenter networks that enable modern cloud-based services. With the enormous increase in volume and complexity of traffic from applications such as video streaming and cloud computing, the interconnection networks have become a major performance bottleneck. In this thesis, we study algorithms and architectures spanning several layers of the networking protocol stack that enable and accelerate novel applications and that are easily deployable and scalable. The design of these algorithms and architectures is motivated by measurements and observations in real world or experimental testbeds.
In the first part of this thesis, we address the challenge of wireless content delivery in crowded areas. We present the AMuSe system, whose objective is to enable scalable and adaptive WiFi multicast. AMuSe is based on accurate receiver feedback and incurs a small control overhead. This feedback information can be used by the multicast sender to optimize multicast service quality, e.g., by dynamically adjusting transmission bitrate. Specifically, we develop an algorithm for dynamic selection of a subset of the multicast receivers as feedback nodes which periodically send information about the channel quality to the multicast sender. Further, we describe the Multicast Dynamic Rate Adaptation (MuDRA) algorithm that utilizes AMuSe's feedback to optimally tune the physical layer multicast rate. MuDRA balances fast adaptation to channel conditions and stability, which is essential for multimedia applications.
We implemented the AMuSe system on the ORBIT testbed and evaluated its performance in large groups with approximately 200 WiFi nodes. Our extensive experiments demonstrate that AMuSe can provide accurate feedback in a dense multicast environment. It outperforms several alternatives even in the case of external interference and changing network conditions. Further, our experimental evaluation of MuDRA on the ORBIT testbed shows that MuDRA outperforms other schemes and supports high throughput multicast flows to hundreds of nodes while meeting quality requirements. As an example application, MuDRA can support multiple high quality video streams, where 90% of the nodes report excellent or very good video quality.
Next, we specifically focus on ensuring high Quality of Experience (QoE) for video streaming over WiFi multicast. We formulate the problem of joint adaptation of multicast transmission rate and video rate for ensuring high video QoE as a utility maximization problem and propose an online control algorithm called DYVR which is based on Lyapunov optimization techniques. We evaluated the performance of DYVR through analysis, simulations, and experiments using a testbed composed of Android devices and off-the-shelf APs. Our evaluation shows that DYVR can ensure high video rates while guaranteeing a low but acceptable number of segment losses, buffer underflows, and video rate switches.
We leverage the lessons learnt from AMuSe for WiFi to address the performance issues with LTE evolved Multimedia Broadcast/Multicast Service (eMBMS). We present the Dynamic Monitoring (DyMo) system which provides low-overhead and real-time feedback about eMBMS performance. DyMo employs eMBMS for broadcasting instructions which indicate the reporting rates as a function of the observed Quality of Service (QoS) for each UE. This simple feedback mechanism collects very limited QoS reports which can be used for network optimization. We evaluated the performance of DyMo analytically and via simulations. DyMo infers the optimal eMBMS settings with extremely low overhead, while meeting strict QoS requirements under different UE mobility patterns and presence of network component failures.
In the second part of the thesis, we study datacenter networks which are key enablers of the end-user applications such as video streaming and storage. Datacenter applications such as distributed file systems, one-to-many virtual machine migrations, and large-scale data processing involve bulk multicast flows. We propose a hardware and software system for enabling physical layer optical multicast in datacenter networks using passive optical splitters. We built a prototype and developed a simulation environment to evaluate the performance of the system for bulk multicasting. Our evaluation shows that the optical multicast architecture can achieve higher throughput and lower latency than IP multicast and peer-to-peer multicast schemes with lower switching energy consumption.
Finally, we study the problem of congestion control in datacenter networks. Quantized Congestion Notification (QCN), a switch-supported standard, utilizes direct multi-bit feedback from the network for hardware rate limiting. Although QCN has been shown to be fast-reacting and effective, being a Layer-2 technology limits its adoption in IP-routed Layer-3 datacenters. We address several design challenges to overcome QCN feedback's Layer-2 limitation and use it to design window-based congestion control (QCN-CC) and load balancing (QCN-LB) schemes. Our extensive simulations, based on real-world workloads, demonstrate the advantages of explicit, multi-bit congestion feedback, especially in a typical environment where intra-datacenter traffic with short Round Trip Times (RTT: tens of µs) runs in conjunction with web-facing traffic with long RTTs (tens of milliseconds).
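To make "multi-bit feedback" concrete, the following Python sketch shows how a switch could compute a QCN-style congestion value from its queue state. The form of the feedback, combining the queue's offset from a target with its recent growth, follows the general QCN design; the weight `w`, the normalization, and the 6-bit quantization used here are assumptions for illustration and are not taken from the thesis.

```python
def qcn_feedback(queue_len, prev_queue_len, q_eq, w=2.0, bits=6):
    """Return a quantized congestion level (0 means no congestion message).

    queue_len      -- current queue occupancy
    prev_queue_len -- occupancy at the previous sample
    q_eq           -- target (equilibrium) occupancy
    """
    q_off = queue_len - q_eq               # how far above the target we are
    q_delta = queue_len - prev_queue_len   # how fast the queue is growing
    fb = -(q_off + w * q_delta)
    if fb >= 0:
        return 0                           # not congested: send nothing
    # Assumed normalization: treat (1 + 2w) * q_eq as the largest magnitude.
    max_magnitude = (1 + 2 * w) * q_eq
    level = int(min(1.0, -fb / max_magnitude) * (2 ** bits - 1))
    return max(1, level)


congested = qcn_feedback(queue_len=30, prev_queue_len=20, q_eq=20)
idle = qcn_feedback(queue_len=10, prev_queue_len=10, q_eq=20)
```

A sender receiving a larger level would cut its rate or window more sharply, which is what distinguishes this from single-bit marking schemes.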
Evaluation of a Set of TCP Features over Narrowband Radio Bearer for Train Communication
This report describes an engineering approach to evaluating TCP over a narrowband bearer for short messages in a low-latency train-trackside communication scenario. The project was developed in cooperation with Bombardier Transportation Sweden AB as a part of the “ETCS over GPRS” venture. As demands from the railway industry increase, the currently used circuit-switched GSM-R technology is becoming unsatisfactory in terms of radio system capacity, and a new solution is needed. The packet-switched GPRS solution using TCP as the transport protocol is under research for this specific scenario. The problem investigated in this report concerns the tuning of the retransmission mechanism, which includes the TCP features TCP_RTO_MIN and TCP_KEEPALIVE. The aim is to tune those features so that a loss of communication is detected and the protocol reacts less aggressively to short, instantaneous changes in the network delay. This thesis work began with a preparation phase, in which a broad literature analysis of the background theory was made, followed by the development of applications that realize the traffic model. Later, in the performance phase, the required changes were applied to the system and finally tested in a lab. The tests were performed using one and four pairs of client-server applications communicating over an emulated link. The TCP features were modified at two levels: TCP_RTO_MIN by a kernel recompilation and TCP_KEEPALIVE by changes on the live system. The test results show that raising TCP_RTO_MIN above its default value triggers fewer retransmissions. TCP_KEEPALIVE has proven to be a sufficient feature to indicate a loss of the link. The achieved improvement in performance was not as high as expected, but it is acceptable for this scenario. The train-trackside communication system could benefit from the proposed changes.
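As an illustration of keepalive tuning on a live system, the Python sketch below enables TCP keepalive on a single socket and sets its timers. This is one possible approach and not necessarily the one used in the report, which may have changed system-wide settings instead. The timer values are hypothetical, and the `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` options are platform-dependent, so the code applies them only where the `socket` module exposes them.

```python
import socket


def enable_keepalive(sock, idle_s=10, interval_s=5, probes=3):
    """Turn on TCP keepalive and, where supported, tune its timers.

    idle_s     -- seconds of silence before the first probe
    interval_s -- seconds between unanswered probes
    probes     -- unanswered probes before the link is declared lost
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tunables = (
        ("TCP_KEEPIDLE", idle_s),
        ("TCP_KEEPINTVL", interval_s),
        ("TCP_KEEPCNT", probes),
    )
    for name, value in tunables:
        option = getattr(socket, name, None)  # absent on some platforms
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

With these example values, a silent link would be reported as lost after roughly `idle_s + interval_s * probes` seconds. The minimum retransmission timeout has no comparable per-socket option here, which is consistent with the report changing it through a kernel recompilation.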