128 research outputs found
Datacenter Traffic Control: Understanding Techniques and Trade-offs
Datacenters provide cost-effective and flexible access to scalable compute
and storage resources necessary for today's cloud computing needs. A typical
datacenter is made up of thousands of servers connected with a large network
and usually managed by one operator. To provide quality access to the variety
of applications and services hosted on datacenters and maximize performance, it
deems necessary to use datacenter networks effectively and efficiently.
Datacenter traffic is often a mix of several classes with different priorities
and requirements. This includes user-generated interactive traffic, traffic
with deadlines, and long-running traffic. To this end, custom transport
protocols and traffic management techniques have been developed to improve
datacenter network performance.
In this tutorial paper, we review the general architecture of datacenter
networks, various topologies proposed for them, their traffic properties,
general traffic control challenges in datacenters and general traffic control
objectives. The purpose of this paper is to bring out the important
characteristics of traffic control in datacenters and not to survey all
existing solutions (as it is virtually impossible due to massive body of
existing research). We hope to provide readers with a wide range of options and
factors while considering a variety of traffic control mechanisms. We discuss
various characteristics of datacenter traffic control including management
schemes, transmission control, traffic shaping, prioritization, load balancing,
multipathing, and traffic scheduling. Next, we point to several open challenges
as well as new and interesting networking paradigms. At the end of this paper,
we briefly review inter-datacenter networks that connect geographically
dispersed datacenters which have been receiving increasing attention recently
and pose interesting and novel research problems.Comment: Accepted for Publication in IEEE Communications Surveys and Tutorial
An edge-queued datagram service for all datacenter traffic
Modern datacenters support a wide range of protocols and in-network switch enhancements aimed at improving performance. Unfortunately, the resulting protocols often do not coexist gracefully because they inevitably interact via queuing in the network. In this paper we describe EQDS, a new datagram service for datacenters that moves almost all of the queuing out of the core network and into the sending host. This enables it to support multiple (conflicting) higher layer protocols, while only sending packets into the network according to any receiver-driven credit scheme. EQDS can transparently speed up legacy TCP and RDMA stacks, and enables transport protocol evolution, while benefiting from future switch enhancements without needing to modify higher layer stacks. We show through simulation and multiple implementations that EQDS can reduce FCT of legacy TCP by 2x, improve the NVMeOF-RDMA throughput by 30%, and safely run TCP alongside RDMA on the same network
OSMOSIS: Enabling Multi-Tenancy in Datacenter SmartNICs
Multi-tenancy is essential for unleashing SmartNIC's potential in
datacenters. Our systematic analysis in this work shows that existing on-path
SmartNICs have resource multiplexing limitations. For example, existing
solutions lack multi-tenancy capabilities such as performance isolation and QoS
provisioning for compute and IO resources. Compared to standard NIC data paths
with a well-defined set of offloaded functions, unpredictable execution times
of SmartNIC kernels make conventional approaches for multi-tenancy and QoS
insufficient. We fill this gap with OSMOSIS, a SmartNICs resource manager
co-design. OSMOSIS extends existing OS mechanisms to enable dynamic hardware
resource multiplexing on top of the on-path packet processing data plane. We
implement OSMOSIS within an open-source RISC-V-based 400Gbit/s SmartNIC. Our
performance results demonstrate that OSMOSIS fully supports multi-tenancy and
enables broader adoption of SmartNICs in datacenters with low overhead.Comment: 12 pages, 14 figures, 103 reference
FatPaths: Routing in Supercomputers and Data Centers when Shortest Paths Fall Short
We introduce FatPaths: a simple, generic, and robust routing architecture
that enables state-of-the-art low-diameter topologies such as Slim Fly to
achieve unprecedented performance. FatPaths targets Ethernet stacks in both HPC
supercomputers as well as cloud data centers and clusters. FatPaths exposes and
exploits the rich ("fat") diversity of both minimal and non-minimal paths for
high-performance multi-pathing. Moreover, FatPaths uses a redesigned "purified"
transport layer that removes virtually all TCP performance issues (e.g., the
slow start), and incorporates flowlet switching, a technique used to prevent
packet reordering in TCP networks, to enable very simple and effective load
balancing. Our design enables recent low-diameter topologies to outperform
powerful Clos designs, achieving 15% higher net throughput at 2x lower latency
for comparable cost. FatPaths will significantly accelerate Ethernet clusters
that form more than 50% of the Top500 list and it may become a standard routing
scheme for modern topologies
Application-centric bandwidth allocation in datacenters
Today's datacenters host a large number of concurrently executing applications with diverse intra-datacenter latency and bandwidth requirements.
Some of these applications, such as data analytics, graph processing, and machine learning training, are data-intensive and require high bandwidth to function properly.
However, these bandwidth-hungry applications can often congest the datacenter network, leading to queuing delays that hurt application completion time.
To remove the network as a potential performance bottleneck, datacenter operators have begun deploying high-end HPC-grade networks like InfiniBand.
These networks offer fully offloaded network stacks, remote direct memory access (RDMA) capability, and non-discarding links, which allow them to provide both low latency and high bandwidth for a single application.
However, it is unclear how well such networks accommodate a mix of latency- and bandwidth-sensitive traffic in a real-world deployment.
In this thesis, we aim to answer the above question.
To do so, we develop RPerf, a latency measurement tool for RDMA-based networks that can precisely measure the InfiniBand switch latency without hardware support.
Using RPerf, we benchmark a rack-scale InfiniBand cluster in both isolated and mixed-traffic scenarios.
Our key finding is that the evaluated switch can provide either low latency or high bandwidth, but not both simultaneously in a mixed-traffic scenario.
We also evaluate several options to improve the latency-bandwidth trade-off and demonstrate that none are ideal.
We find that while queue separation is a solution to protect latency-sensitive applications, it fails to properly manage the bandwidth of other applications.
We also aim to resolve the problem with bandwidth management for non-latency-sensitive applications.
Previous efforts to address this problem have generally focused on achieving max-min fairness at the flow level.
However, we observe that different workloads exhibit varying levels of sensitivity to network bandwidth.
For some workloads, even a small reduction in available bandwidth can significantly increase completion time, while for others, completion time is largely insensitive to available network bandwidth.
As a result, simply splitting the bandwidth equally among all workloads is sub-optimal for overall application-level performance.
To address this issue, we first propose a robust methodology capable of effectively measuring the sensitivity of applications to bandwidth.
We then design Saba, an application-aware bandwidth allocation framework that distributes network bandwidth based on application-level sensitivity.
Saba combines ahead-of-time application profiling to determine bandwidth sensitivity with runtime bandwidth allocation using lightweight software support, with no modifications to network hardware or protocols.
Experiments with a 32-server hardware testbed show that Saba can significantly increase overall performance by reducing the job completion time for bandwidth-sensitive jobs
- …