Datacenter Traffic Control: Understanding Techniques and Trade-offs
Datacenters provide cost-effective and flexible access to scalable compute
and storage resources necessary for today's cloud computing needs. A typical
datacenter is made up of thousands of servers connected with a large network
and usually managed by one operator. To provide quality access to the variety
of applications and services hosted on datacenters and maximize performance, it
is necessary to use datacenter networks effectively and efficiently.
Datacenter traffic is often a mix of several classes with different priorities
and requirements. This includes user-generated interactive traffic, traffic
with deadlines, and long-running traffic. To this end, custom transport
protocols and traffic management techniques have been developed to improve
datacenter network performance.
In this tutorial paper, we review the general architecture of datacenter
networks, various topologies proposed for them, their traffic properties,
general traffic control challenges in datacenters and general traffic control
objectives. The purpose of this paper is to bring out the important
characteristics of traffic control in datacenters and not to survey all
existing solutions (which is virtually impossible given the massive body of
existing research). We hope to provide readers with a wide range of options and
factors while considering a variety of traffic control mechanisms. We discuss
various characteristics of datacenter traffic control including management
schemes, transmission control, traffic shaping, prioritization, load balancing,
multipathing, and traffic scheduling. Next, we point to several open challenges
as well as new and interesting networking paradigms. At the end of this paper,
we briefly review inter-datacenter networks that connect geographically
dispersed datacenters, which have been receiving increasing attention recently
and pose interesting and novel research problems.
Comment: Accepted for Publication in IEEE Communications Surveys and Tutorial
RepFlow: Minimizing Flow Completion Times with Replicated Flows in Data Centers
Short TCP flows that are critical for many interactive applications in data
centers are plagued by large flows and head-of-line blocking in switches.
Hash-based load balancing schemes such as ECMP aggravate the matter and result
in long-tailed flow completion times (FCT). Previous work on reducing FCT
usually requires custom switch hardware and/or protocol changes. We propose
RepFlow, a simple yet practically effective approach that replicates each short
flow to reduce the completion times, without any change to switches or host
kernels. With ECMP, the original and replicated flows traverse distinct paths
with different congestion levels, thereby reducing the probability of having
long queueing delay. We develop a simple analytical model to demonstrate the
potential improvement of RepFlow. Extensive NS-3 simulations and a Mininet
implementation show that RepFlow provides a 50% to 70% speedup in both mean and
99th percentile FCT for all loads, and offers near-optimal FCT when used with
DCTCP.
Comment: To appear in IEEE INFOCOM 201
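The intuition behind replicating a short flow can be sketched with a toy model. This is not the analytical model from the RepFlow paper; it only assumes that each copy lands on an independently chosen path, that a path is congested with probability `p_congested`, and that a flow finishes as soon as its fastest copy does. The function names and the `fast`/`slow` completion times are hypothetical values chosen for illustration.

```python
import random


def prob_all_copies_congested(p_congested: float, copies: int) -> float:
    """Probability that every copy of a flow lands on a congested path.

    Assumes paths are congested independently, which is a simplification.
    """
    if not 0.0 <= p_congested <= 1.0:
        raise ValueError("p_congested must be in [0, 1]")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return p_congested ** copies


def simulate_mean_fct(p_congested: float, copies: int, trials: int = 100_000,
                      fast: float = 1.0, slow: float = 10.0,
                      seed: int = 0) -> float:
    """Monte Carlo estimate of mean flow completion time (arbitrary units).

    Each copy takes `slow` on a congested path and `fast` otherwise;
    the flow completes when its fastest copy completes.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        copy_times = [slow if rng.random() < p_congested else fast
                      for _ in range(copies)]
        total += min(copy_times)
    return total / trials


if __name__ == "__main__":
    p = 0.2
    print("P(long delay), 1 copy  :", prob_all_copies_congested(p, 1))
    print("P(long delay), 2 copies:", prob_all_copies_congested(p, 2))
    print("mean FCT, 1 copy  :", simulate_mean_fct(p, 1))
    print("mean FCT, 2 copies:", simulate_mean_fct(p, 2))
```

Under these assumptions, one extra copy lowers the chance of a long delay from p to p squared. The sketch ignores the extra load that the replicas themselves add to the network.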
Distributed VNF Scaling in Large-scale Datacenters: An ADMM-based Approach
Network Functions Virtualization (NFV) is a promising network architecture
where network functions are virtualized and decoupled from proprietary
hardware. In modern datacenters, user network traffic requires a set of Virtual
Network Functions (VNFs) as a service chain to process traffic demands. Traffic
fluctuations in Large-scale DataCenters (LDCs) could result in overload and
underload phenomena in service chains. In this paper, we propose a distributed
approach based on Alternating Direction Method of Multipliers (ADMM) to jointly
load balance the traffic and horizontally scale up and down VNFs in LDCs with
minimum deployment and forwarding costs. First, we formulate the targeted
optimization problem as a Mixed Integer Linear Programming (MILP) model, which
is NP-complete. Second, we relax it into two Linear Programming (LP) models
to cope with overloaded and underloaded service chains. In the case of small or
medium size datacenters, LP models could be run in a central fashion with a low
time complexity. However, in LDCs, increasing the number of LP variables
results in additional time consumption in the central algorithm. To mitigate
this, our study proposes a distributed approach based on ADMM. The
effectiveness of the proposed mechanism is validated in different scenarios.
Comment: IEEE International Conference on Communication Technology (ICCT),
Chengdu, China, 201
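To illustrate how ADMM splits a problem into local subproblems that agree on a shared value, here is a minimal sketch of consensus ADMM on a toy problem. It is not the paper's VNF scaling formulation. Each hypothetical worker i holds a private number a_i and minimizes 0.5 * (x - a_i)^2, and all workers must agree on a common x. The optimum is the mean of the a_i. The function name, `rho`, and the iteration count are assumptions made for illustration.

```python
def consensus_admm(local_targets, rho=1.0, iterations=100):
    """Consensus ADMM for: minimize sum_i 0.5 * (x - a_i)^2.

    Each worker keeps a local copy x_i and a scaled dual variable u_i;
    z is the shared consensus value.
    """
    n = len(local_targets)
    if n == 0:
        raise ValueError("need at least one local target")
    x = [0.0] * n
    u = [0.0] * n
    z = 0.0
    for _ in range(iterations):
        # Local step: each worker solves its own small problem in closed form,
        # using only its private a_i and the shared z.
        x = [(a + rho * (z - u_i)) / (1.0 + rho)
             for a, u_i in zip(local_targets, u)]
        # Coordination step: average the local proposals.
        z = sum(x_i + u_i for x_i, u_i in zip(x, u)) / n
        # Dual step: accumulate each worker's disagreement with the consensus.
        u = [u_i + x_i - z for x_i, u_i in zip(x, u)]
    return z


if __name__ == "__main__":
    targets = [1.0, 2.0, 6.0]
    print("consensus value:", consensus_admm(targets))
    print("expected (mean):", sum(targets) / len(targets))
```

Only the averaging step needs communication between workers, which is the property that makes ADMM attractive when a central solver becomes too slow.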