208 research outputs found

    Datacenter Traffic Control: Understanding Techniques and Trade-offs

    Get PDF
    Datacenters provide cost-effective and flexible access to scalable compute and storage resources necessary for today's cloud computing needs. A typical datacenter is made up of thousands of servers connected with a large network and usually managed by one operator. To provide quality access to the variety of applications and services hosted on datacenters and maximize performance, it deems necessary to use datacenter networks effectively and efficiently. Datacenter traffic is often a mix of several classes with different priorities and requirements. This includes user-generated interactive traffic, traffic with deadlines, and long-running traffic. To this end, custom transport protocols and traffic management techniques have been developed to improve datacenter network performance. In this tutorial paper, we review the general architecture of datacenter networks, various topologies proposed for them, their traffic properties, general traffic control challenges in datacenters and general traffic control objectives. The purpose of this paper is to bring out the important characteristics of traffic control in datacenters and not to survey all existing solutions (as it is virtually impossible due to massive body of existing research). We hope to provide readers with a wide range of options and factors while considering a variety of traffic control mechanisms. We discuss various characteristics of datacenter traffic control including management schemes, transmission control, traffic shaping, prioritization, load balancing, multipathing, and traffic scheduling. Next, we point to several open challenges as well as new and interesting networking paradigms. At the end of this paper, we briefly review inter-datacenter networks that connect geographically dispersed datacenters which have been receiving increasing attention recently and pose interesting and novel research problems.Comment: Accepted for Publication in IEEE Communications Surveys and Tutorial

    FatPaths: Routing in Supercomputers and Data Centers when Shortest Paths Fall Short

    Full text link
    We introduce FatPaths: a simple, generic, and robust routing architecture that enables state-of-the-art low-diameter topologies such as Slim Fly to achieve unprecedented performance. FatPaths targets Ethernet stacks in both HPC supercomputers as well as cloud data centers and clusters. FatPaths exposes and exploits the rich ("fat") diversity of both minimal and non-minimal paths for high-performance multi-pathing. Moreover, FatPaths uses a redesigned "purified" transport layer that removes virtually all TCP performance issues (e.g., the slow start), and incorporates flowlet switching, a technique used to prevent packet reordering in TCP networks, to enable very simple and effective load balancing. Our design enables recent low-diameter topologies to outperform powerful Clos designs, achieving 15% higher net throughput at 2x lower latency for comparable cost. FatPaths will significantly accelerate Ethernet clusters that form more than 50% of the Top500 list and it may become a standard routing scheme for modern topologies

    Effectiveness of segment routing technology in reducing the bandwidth and cloud resources provisioning times in network function virtualization architectures

    Get PDF
    Network Function Virtualization is a new technology allowing for a elastic cloud and bandwidth resource allocation. The technology requires an orchestrator whose role is the service and resource orchestration. It receives service requests, each one characterized by a Service Function Chain, which is a set of service functions to be executed according to a given order. It implements an algorithm for deciding where both to allocate the cloud and bandwidth resources and to route the SFCs. In a traditional orchestration algorithm, the orchestrator has a detailed knowledge of the cloud and network infrastructures and that can lead to high computational complexity of the SFC Routing and Cloud and Bandwidth resource Allocation (SRCBA) algorithm. In this paper, we propose and evaluate the effectiveness of a scalable orchestration architecture inherited by the one proposed within the European Telecommunications Standards Institute (ETSI) and based on the functional separation of an NFV orchestrator in Resource Orchestrator (RO) and Network Service Orchestrator (NSO). Each cloud domain is equipped with an RO whose task is to provide a simple and abstract representation of the cloud infrastructure. These representations are notified of the NSO that can apply a simplified and less complex SRCBA algorithm. In addition, we show how the segment routing technology can help to simplify the SFC routing by means of an effective addressing of the service functions. The scalable orchestration solution has been investigated and compared to the one of a traditional orchestrator in some network scenarios and varying the number of cloud domains. We have verified that the execution time of the SRCBA algorithm can be drastically reduced without degrading the performance in terms of cloud and bandwidth resource costs

    Reducing the Cost of Operating a Datacenter Network

    Get PDF
    Datacenters are a significant capital expense for many enterprises. Yet, they are difficult to manage and are hard to design and maintain. The initial design of a datacenter network tends to follow vendor guidelines, but subsequent upgrades and expansions to it are mostly ad hoc, with equipment being upgraded piecemeal after its amortization period runs out and equipment acquisition is tied to budget cycles rather than changes in workload. These networks are also brittle and inflexible. They tend to be manually managed, and cannot perform dynamic traffic engineering. The high-level goal of this dissertation is to reduce the total cost of owning a datacenter by improving its network. To achieve this, we make the following contributions. First, we develop an automated, theoretically well-founded approach to planning cost-effective datacenter upgrades and expansions. Second, we propose a scalable traffic management framework for datacenter networks. Together, we show that these contributions can significantly reduce the cost of operating a datacenter network. To design cost-effective network topologies, especially as the network expands over time, updated equipment must coexist with legacy equipment, which makes the network heterogeneous. However, heterogeneous high-performance network designs are not well understood. Our first step, therefore, is to develop the theory of heterogeneous Clos topologies. Using our theory, we propose an optimization framework, called LEGUP, which designs a heterogeneous Clos network to implement in a new or legacy datacenter. Although effective, LEGUP imposes a certain amount of structure on the network. To deal with situations when this is infeasible, our second contribution is a framework, called REWIRE, which using optimization to design unstructured DCN topologies. Our results indicate that these unstructured topologies have up to 100-500\% more bisection bandwidth than a fat-tree for the same dollar cost. Our third contribution is two frameworks for datacenter network traffic engineering. Because of the multiplicity of end-to-end paths in DCN fabrics, such as Clos networks and the topologies designed by REWIRE, careful traffic engineering is needed to maximize throughput. This requires timely detection of elephant flows---flows that carry large amount of data---and management of those flows. Previously proposed approaches incur high monitoring overheads, consume significant switch resources, or have long detection times. We make two proposals for elephant flow detection. First, in the Mahout framework, we suggest that such flows be detected by observing the end hosts' socket buffers, which provide efficient visibility of flow behavior. Second, in the DevoFlow framework, we add efficient stats-collection mechanisms to network switches. Using simulations and experiments, we show that these frameworks reduce traffic engineering overheads by at least an order of magnitude while still providing near-optimal performance

    Designing Scalable Networks for Future Large Datacenters

    Get PDF
    Modern datacenters require a network with high cross-section bandwidth, fine-grained security, support for virtualization, and simple management that can scale to hundreds of thousands of hosts at low cost. This thesis first presents the firmware for Rain Man, a novel datacenter network architecture that meets these requirements, and then performs a general scalability study of the design space. The firmware for Rain Man, a scalable Software-Defined Networking architecture, employs novel algorithms and uses previously unused forwarding hardware. This allows Rain Man to scale at high performance to networks of forty thousand hosts on arbitrary network topologies. In the general scalability study of the design space of SDN architectures, this thesis identifies three different architectural dimensions common among the networks: source versus hop-by-hop routing, the granularity at which flows are routed, and arbitrary versus restrictive routing and finds that a source-routed, host-pair granularity network with arbitrary routes is the most scalable

    On energy consumption of switch-centric data center networks

    Get PDF
    Data center network (DCN) is the core of cloud computing and accounts for 40% energy spend when compared to cooling system, power distribution and conversion of the whole data center (DC) facility. It is essential to reduce the energy consumption of DCN to esnure energy-efficient (green) data center can be achieved. An analysis of DC performance and efficiency emphasizing the effect of bandwidth provisioning and throughput on energy proportionality of two most common switch-centric DCN topologies: three-tier (3T) and fat tree (FT) based on the amount of actual energy that is turned into computing power are presented. Energy consumption of switch-centric DCNs by realistic simulations is analyzed using GreenCloud simulator. Power related metrics were derived and adapted for the information technology equipment (ITE) processes within the DCN. These metrics are acknowledged as subset of the major metrics of power usage effectiveness (PUE) and data center infrastructure efficiency (DCIE), known to DCs. This study suggests that despite in overall FT consumes more energy, it spends less energy for transmission of a single bit of information, outperforming 3T

    Leveraging Commodity Photonics to Reduce Datacenter Network Latency

    Get PDF
    Most datacenter network (DCN) designs focus on maximizing bisection bandwidth rather than minimizing server-to-server latency. They are, therefore, ill-suited for important latency-sensitive applications, such as high performance computing, realtime analytic systems and high-frequency financial trading. Although there are a number of existing approaches to reduce network latency, they are only partially effective, workload dependent, and often require network protocol changes. In this thesis, we explore architectural approaches to building a low-latency DCN and introduce Quartz, a new optical design element consisting of a full mesh of switches connected by an optical ring. We can reduce the network latency of a hierarchical or random network by replacing portions of it with a Quartz ring. Our analysis shows that, in a standard 3-tier DCN, replacing high port-count core switches with Quartz can significantly reduce switching delays, and replacing groups of top-of-rack and aggregation switches with Quartz can significantly reduce congestion-related delays from cross-traffic. We overcome the complexity of wiring a complete mesh by using low-cost optical multiplexers that enable us to efficiently implement a logical mesh as a physical ring. We evaluate our performance using both simulations and a small working prototype. Our evaluation results confirm our analysis, and demonstrate that it is possible to build low-latency DCNs using inexpensive commodity elements without significant concessions to cost, scalability, or wiring complexity

    P4TE: PISA Switch Based Traffic Engineering in Fat-Tree Data Center Networks

    Full text link
    This work presents P4TE, an in-band traffic monitoring, load-aware packet forwarding, and flow rate controlling mechanism for traffic engineering in fat-tree topology-based data center networks using PISA switches. It achieves sub-RTT reaction time to change in network conditions, improved flow completion time, and balanced link utilization. Unlike the classical probe-based monitoring approach, P4TE uses an in-band monitoring approach to identify traffic events in the data plane. Based on these events, it re-adjusts the priorities of the paths. It uses a heuristic-based load-aware forwarding path selection mechanism to respond to changing network conditions and control the flow rate by sending feedback to the end hosts. It is implementable on emerging v1model.p4 architecture-based programmable switches and capable of maintaining the line-rate performance. Our evaluation shows that P4TE uses a small amount of resources in the PISA pipeline and achieves an improved flow completion time than ECMP and HULA
    • …
    corecore