180 research outputs found
Recommended from our members
Understanding the characteristics of Internet traffic and designing an efficient RaptorQ-based data transport protocol for modern data centres
This thesis is the amalgamation of research on efficient data transport protocols for data centres and a comprehensive and systematic study of Internet traffic, which came as a result of the need to understand traffic patterns and workloads in modern computer networks.
The first part of the thesis is on the development of efficient data transport pro- tocols for data centres. We study modern data transport protocols for data centres through large scale simulations using the OMNeT++ simulator. We developed and experimented with an OMNeT++ model of NDP. This has led to the identification of limitations of the state of the art and the formulation of research questions with respect to data transport protocols for modern data centres. The developed model includes an implementation of a Fat-tree topology and per-packet ECMP load bal- ancing. We discuss how we integrated the model with the INET Framework and validated it by running various experiments that test different model parameters and components. This work revealed limitations of NDP with respect to efficient one-to-many and many-to-one communication in data centres, which led to the de- velopment of SCDP, a novel and general-purpose data transport protocol for data centres that, in contrast to all other protocols proposed to date, natively supports one-to-many and many-to-one data communication, which is extremely common in modern data centres. SCDP does so without compromising on efficiency for short and long unicast flows. SCDP achieves this by integrating RaptorQ codes with receiver-driven data transport, in-network packet trimming and Multi-Level Feed- back Queuing (MLFQ); (1) RaptorQ codes enable efficient one-to-many and many- to-one data transport; (2) on top of RaptorQ codes, receiver- driven flow control, in combination with in-network packet trimming, enable efficient usage of network re- sources as well as multi-path transport and packet spraying for all transport modes. Incast and Outcast are eliminated; (3) the systematic nature of RaptorQ codes, in combination with MLFQ, enable fast, decoding-free completion of short flows. We extensively evaluated SCDP in a wide range of simulated scenarios with realistic data centre workloads. For one-to-many and many-to-one transport sessions, SCDP performs significantly better than NDP. For short and long unicast flows, SCDP performs equally well or better compared to NDP.
In the second part of the thesis, we extensively study Internet traffic. Getting good statistical models of traffic on network links is a well-known, often-studied problem. A lot of attention has been given to correlation patterns and flow duration. The distribution of the amount of traffic per unit time is an equally important but less studied problem. We study a large number of traffic traces from many different networks including academic, commercial and residential networks using state-of-the-art statistical techniques. We show that the log-normal distribution is a better fit than the Gaussian distribution. We also investigate a second, heavy- tailed distribution and show that its performance is better than Gaussian but worse than log-normal. We examine anomalous traces which are a poor fit for all tested distributions and show that this is often due to traffic outages or links that hit maximum capacity. Stationarity tests showed that the traffic is stationary at some range of aggregation times. We demonstrate the utility of the log-normal distribution in two contexts: predicting the proportion of time traffic will exceed a given level (for link capacity estimation) and predicting 95th percentile pricing. We also show the log-normal distribution is a better predictor than Gaussian orWeibull distributions
Wireless Communication in Data Centers: A Survey
Data centers (DCs) is becoming increasingly an integral part of the computing infrastructures of most enterprises. Therefore, the concept of DC networks (DCNs) is receiving an increased attention in the network research community. Most DCNs deployed today can be classified as wired DCNs as copper and optical fiber cables are used for intra- and inter-rack connections in the network. Despite recent advances, wired DCNs face two inevitable problems; cabling complexity and hotspots. To address these problems, recent research works suggest the incorporation of wireless communication technology into DCNs. Wireless links can be used to either augment conventional wired DCNs, or to realize a pure wireless DCN. As the design spectrum of DCs broadens, so does the need for a clear classification to differentiate various design options. In this paper, we analyze the free space optical (FSO) communication and the 60 GHz radio frequency (RF), the two key candidate technologies for implementing wireless links in DCNs. We present a generic classification scheme that can be used to classify current and future DCNs based on the communication technology used in the network. The proposed classification is then used to review and summarize major research in this area. We also discuss open questions and future research directions in the area of wireless DCs
Learning algorithms for the control of routing in integrated service communication networks
There is a high degree of uncertainty regarding the nature of traffic on future integrated service networks. This uncertainty motivates the use of adaptive resource allocation policies that can take advantage of the statistical fluctuations in the traffic demands. The adaptive control mechanisms must be 'lightweight', in terms of their overheads, and scale to potentially large networks with many traffic flows. Adaptive routing is one form of adaptive resource allocation, and this thesis considers the application of Stochastic Learning Automata (SLA) for distributed, lightweight adaptive routing in future integrated service communication networks. The thesis begins with a broad critical review of the use of Artificial Intelligence (AI) techniques applied to the control of communication networks. Detailed simulation models of integrated service networks are then constructed, and learning automata based routing is compared with traditional techniques on large scale networks. Learning automata are examined for the 'Quality-of-Service' (QoS) routing problem in realistic network topologies, where flows may be routed in the network subject to multiple QoS metrics, such as bandwidth and delay. It is found that learning automata based routing gives considerable blocking probability improvements over shortest path routing, despite only using local connectivity information and a simple probabilistic updating strategy. Furthermore, automata are considered for routing in more complex environments spanning issues such as multi-rate traffic, trunk reservation, routing over multiple domains, routing in high bandwidth-delay product networks and the use of learning automata as a background learning process. Automata are also examined for routing of both 'real-time' and 'non-real-time' traffics in an integrated traffic environment, where the non-real-time traffic has access to the bandwidth 'left over' by the real-time traffic. It is found that adopting learning automata for the routing of the real-time traffic may improve the performance to both real and non-real-time traffics under certain conditions. In addition, it is found that one set of learning automata may route both traffic types satisfactorily. Automata are considered for the routing of multicast connections in receiver-oriented, dynamic environments, where receivers may join and leave the multicast sessions dynamically. Automata are shown to be able to minimise the average delay or the total cost of the resulting trees using the appropriate feedback from the environment. Automata provide a distributed solution to the dynamic multicast problem, requiring purely local connectivity information and a simple updating strategy. Finally, automata are considered for the routing of multicast connections that require QoS guarantees, again in receiver-oriented dynamic environments. It is found that the distributed application of learning automata leads to considerably lower blocking probabilities than a shortest path tree approach, due to a combination of load balancing and minimum cost behaviour
Recommended from our members
Optimising data centre operation by removing the transport bottleneck
Data centres lie at the heart of almost every service on the Internet. Data centres are used to provide search results, to power social media, to store and index email, to host “cloud” applications, for online retail and to provide a myriad of other web services. Consequently the more efficient they can be made the better for all of us. The power of modern data centres is in combining commodity off-the-shelf server hardware and network equipment to provide what Google’s Barrosso and Ho ̈lzle describe as “warehouse scale” computers.
Data centres rely on TCP, a transport protocol that was originally designed for use in the Internet. Like other such protocols, TCP has been optimised to maximise throughput, usually by filling up queues at the bottleneck. However, for most applications within a data centre network latency is more critical than throughput. Consequently the choice of transport protocol becomes a bottleneck for performance. My thesis is that the solution to this is to move away from the use of one-size-fits-all transport protocols towards ones that have been designed to reduce latency across the data centre and which can dynamically respond to the needs of the applications.
This dissertation focuses on optimising the transport layer in data centre networks. In particular I address the question of whether any single transport mechanism can be flexible enough to cater to the needs of all data centre traffic. I show that one leading protocol (DCTCP) has been heavily optimised for certain network conditions. I then explore approaches that seek to minimise latency for applications that care about it while still allowing throughput-intensive applications to receive a good level of service. My key contributions to this are Silo and Trevi.
Trevi is a novel transport system for storage traffic that utilises fountain coding to max- imise throughput and minimise latency while being agnostic to drop, thus allowing storage traffic to be pushed out of the way when latency sensitive traffic is present in the network. Silo is an admission control system that is designed to give tenants of a multi-tenant data centre guaranteed low latency network performance. Both of these were developed in collaboration with others
Dynamic Optical Networks for Data Centres and Media Production
This thesis explores all-optical networks for data centres, with a particular focus on network designs for live media production. A design for an all-optical data centre network is presented, with experimental verification of the feasibility of the network data plane. The design uses fast tunable (< 200 ns) lasers and coherent receivers across a passive optical star coupler core, forming a network capable of reaching over 1000 nodes. Experimental transmission of 25 Gb/s data across the network core, with combined wavelength switching and time division multiplexing (WS-TDM), is demonstrated. Enhancements to laser tuning time via current pre-emphasis are discussed, including experimental demonstration of fast wavelength switching (< 35 ns) of a single laser between all combinations of 96 wavelengths spaced at 50 GHz over a range wider than the optical C-band. Methods of increasing the overall network throughput by using a higher complexity modulation format are also described, along with designs for line codes to enable pulse amplitude modulation across the WS-TDM network core. The construction of an optical star coupler network core is investigated, by evaluating methods of constructing large star couplers from smaller optical coupler components. By using optical circuit switches to rearrange star coupler connectivity, the network can be partitioned, creating independent reserves of bandwidth and resulting in increased overall network throughput. Several topologies for constructing a star from optical couplers are compared, and algorithms for optimum construction methods are presented. All of the designs target strict criteria for the flexible and dynamic creation of multicast groups, which will enable future live media production workflows in data centres. The data throughput performance of the network designs is simulated under synthetic and practical media production traffic scenarios, showing improved throughput when reconfigurable star couplers are used compared to a single large star. An energy consumption evaluation shows reduced network power consumption compared to incumbent and other proposed data centre network technologies
Live media production: multicast optimization and visibility for clos fabric in media data centers
Media production data centers are undergoing a major architectural shift to introduce digitization concepts to media creation and media processing workflows. Content companies such as NBC Universal, CBS/Viacom and Disney are modernizing their workflows to take advantage of the flexibility of IP and virtualization.
In these new environments, multicast is utilized to provide point-to-multi-point communications. In order to build point-to-multi-point trees, Multicast has an established set of control protocols such as IGMP and PIM. The existing multicast protocols do not optimize multicast tree formation for maximizing network throughput which lead to decreased fabric utilization and decreased total number of admitted flows. In addition, existing multicast protocols are not bandwidth-aware and could cause links to over-subscribe leading to packet loss and lower video quality.
TV production traffic patterns are unique due to ultra high bandwidth requirements and high sensitivity to packet loss that leads to video impairments. In such environments, operators need monitoring tools that are able to proactively monitor video flows and provide actionable alerts. Existing network monitoring tools are inadequate because they are reactive by design and perform generic monitoring of flows with no insights into video domain.
The first part of this dissertation includes a design and implementation of a novel Intelligent Rendezvous Point algorithm iRP for bandwidth-aware multicast routing in media DC fabrics. iRP utilizes a controller-based architecture to optimize multicast tree formation and to increase bandwidth availability in the fabric. The system offers up to 50\% increase in fabric capacity to handle multicast flows passing through the fabric.
In the second part of this dissertation, DiRP algorithm is presented. DiRP is based on a distributed decision-making approach to achieve multicast tree capacity optimization while maintaining low multicast tree setup time. DiRP algorithm is tested using commercially available data center switches. DiRP algorithm offers substantially lower path setup time compared to centralized systems while maintaining bandwidth awareness when setting up the fabric.
The third part of this dissertation studies the utilization of machine learning algorithms to improve on multicast efficiency in the fabric. The work includes implementation and testing of LiRP algorithm to increase iRP\u27s fabric efficiency by implementing k-fold cross validation method to predict future multicast group memberships for time-series analysis. Testing results confirm that LiRP system increases the efficiency of iRP by up to 40\% through prediction of multicast group memberships with online arrival.
In the fourth part of this dissertation, The problem of live video monitoring is studied. Existing network monitoring tools are either reactive by design or perform generic monitoring of flows with no insights into video domain. MediaFlow is a robust system for active network monitoring and reporting of video quality for thousands of flows simultaneously using a fraction of the cost of traditional monitoring solutions. MediaFlow is able to detect and report on integrity of video flows at a granularity of 100 mSec at line rate for thousands of flows. The system increases video monitoring scale by a thousand-fold compared to edge monitoring solutions
Software defined networking for radio telescopes: a case study on the applicability of SDN for MeerKAT
Scientific instruments like radio telescopes depend on high-performance networks for internal data exchange. The high bandwidth data exchange between the components of a radio telescope makes use of multicast networking. Complex multicast networks are hard to maintain and grow, and specific installations require modified network switches. This study evaluates Software Defined Networking (SDN) for use in the MeerKAT radio telescope to alleviate the management complexity and allow for a vendor-neutral implementation. The purpose of this dissertation is to verify that an SDN multicast network can produce suitable paths for data flow through the network and to see if such an implementation is easier to maintain and grow. There is little literature regarding SDN for radio telescope networks; however, there is considerable work where different aspects of SDN are discussed and demonstrated for video streaming. SDN with multicast for video streaming, although simpler, forms the background research. Considerable work was put into understanding and documenting the different aspects of a radio telescope affecting the data network. The telescope network controller generates the OpenFlow rules required by the SDN controller and is a new concept introduced in this work. The telescope network controller is fitted with two placement algorithms to demonstrate its flexibility. Both algorithms are suitable for the expected workload, but they produce very different traffic patterns. The two algorithms are not compared to one another, they were created to demonstrate the ease of adding domain specific knowledge to an SDN. The telescope network controller makes it easy to introduce and use new flow placement algorithms, thus making traffic engineering feasible for the radio telescope. Complex multicast networks are easier to maintain and grow with SDN. SDN allows customised packet forwarding rules typically unattainable with standard routing and other standard network protocols and implementations. A radio telescope with a software-defined data network is resilient, easier to maintain, vendor-neutral, and possesses advanced traffic engineering mechanisms
- …