Towards Efficient, Work-Conserving, and Fair Bandwidth Guarantee in Cloud Datacenters
Bandwidth guarantee is a critical feature to enable performance predictability in cloud datacenters. It is expected to meet three requirements: work conservation, fairness, and simplicity. However, the distributed nature of datacenters makes it challenging to attain all three at the same time. In this paper, we propose an efficient approach that satisfies the three requirements simultaneously. Our scheme takes advantage of multipath TCP (MPTCP) to generate explicit bandwidth guarantee (BG) traffic and work conservation (WC) traffic. We further prioritize the BG traffic over the WC traffic in the network fabric. Because of this priority setting, WC traffic cannot harm bandwidth guarantees and is thus effectively supported. We show that MPTCP fits this direction well but presents some new issues when the WC subflows have a low priority. We therefore adapt MPTCP to handle these issues through a customized scheduler (which strictly prioritizes the BG subflow during packet scheduling) and a large receive buffer. In addition, we enable tenants to share unused bandwidth fairly by managing the overall aggressiveness of the WC traffic. The proposed system can be easily implemented with commercial off-the-shelf servers and switches. We have implemented it with the Linux kernel MPTCP for experiments. Extensive experiments in a small cluster (including one MapReduce experiment) and trace-driven simulations show that our scheme achieves the design goals effectively.
Enabling Work-conserving Bandwidth Guarantees for Multi-tenant Datacenters via Dynamic Tenant-Queue Binding
Today's cloud networks are shared among many tenants. Bandwidth guarantees
and work conservation are two key properties to ensure predictable performance
for tenant applications and high network utilization for providers. Despite
significant efforts, very little prior work actually achieves both properties
simultaneously, even though some of it claims to.
In this paper, we present QShare, an in-network solution to achieve
bandwidth guarantees and work conservation simultaneously. QShare leverages
weighted fair queuing on commodity switches to slice network bandwidth for
tenants, and solves the challenge of queue scarcity through balanced tenant
placement and dynamic tenant-queue binding. QShare is readily implementable
with existing switching chips. We have implemented a QShare prototype and
evaluated it via both testbed experiments and simulations. Our results show
that QShare ensures bandwidth guarantees while driving network utilization to
over 91% even under unpredictable traffic demands.
Comment: The initial work is published in IEEE INFOCOM 201
Datacenter Traffic Control: Understanding Techniques and Trade-offs
Datacenters provide cost-effective and flexible access to scalable compute
and storage resources necessary for today's cloud computing needs. A typical
datacenter is made up of thousands of servers connected with a large network
and usually managed by one operator. To provide quality access to the variety
of applications and services hosted on datacenters and maximize performance, it
is necessary to use datacenter networks effectively and efficiently.
Datacenter traffic is often a mix of several classes with different priorities
and requirements. This includes user-generated interactive traffic, traffic
with deadlines, and long-running traffic. To this end, custom transport
protocols and traffic management techniques have been developed to improve
datacenter network performance.
In this tutorial paper, we review the general architecture of datacenter
networks, various topologies proposed for them, their traffic properties,
general traffic control challenges in datacenters and general traffic control
objectives. The purpose of this paper is to bring out the important
characteristics of traffic control in datacenters and not to survey all
existing solutions (which is virtually impossible given the massive body of
existing research). We hope to provide readers with a wide range of options and
factors while considering a variety of traffic control mechanisms. We discuss
various characteristics of datacenter traffic control including management
schemes, transmission control, traffic shaping, prioritization, load balancing,
multipathing, and traffic scheduling. Next, we point to several open challenges
as well as new and interesting networking paradigms. At the end of this paper,
we briefly review inter-datacenter networks that connect geographically
dispersed datacenters, which have been receiving increasing attention recently
and pose interesting and novel research problems.
Comment: Accepted for Publication in IEEE Communications Surveys and Tutorial
Minimal deployable endpoint-driven network forwarding: principle, designs and applications
Networked systems now have a significant impact on human lives: the Internet, connecting the world globally, is the foundation of our information age; data centers, running hundreds of thousands of servers, drive the era of cloud computing; and even the Tor project, a networked system providing online anonymity, now serves millions of daily users.
Guided by the end-to-end principle, many computer networks have been designed with a simple and flexible core offering a general data transfer service, whereas the bulk of the application-level functionality has been implemented on endpoints attached to the edge of the network. Although the end-to-end design principle has brought these networked systems tremendous success, a number of new requirements have emerged for computer networks and the applications running on them, including untrustworthy endpoints, the privacy requirements of endpoints, more demanding applications, the rise of third-party intermediaries, and the asymmetric capabilities of endpoints. These emerging requirements have created various challenges in different networked systems.
To address these challenges, there are no obvious solutions that avoid adding in-network functions to the network core. However, no design principle has ever been proposed to guide the implementation of in-network functions. In this thesis, we propose the first such principle and apply it to propose four designs in three different networked systems that address four separate challenges. We demonstrate through detailed implementation and extensive evaluation that the proposed principle can live in harmony with the end-to-end principle, and that a combination of the two principles offers a more complete, effective, and accurate guide for innovating modern computer networks and their applications.
Application-driven Bandwidth Guarantees in Datacenters
Providing bandwidth guarantees to specific applications is becoming increasingly important as applications compete for shared cloud network resources. We present CloudMirror, a solution that provides bandwidth guarantees to cloud applications based on a new network abstraction and workload placement algorithm. An effective network abstraction would enable applications to easily and accurately specify their requirements, while simultaneously enabling the infrastructure to provision resources efficiently for deployed applications. Prior research has approached the bandwidth guarantee specification by using abstractions that resemble physical network topologies. We present a contrasting approach of deriving a network abstraction based on application communication structure, called Tenant Application Graph or TAG. CloudMirror also incorporates a new workload placement algorithm that efficiently meets bandwidth requirements specified by TAGs while factoring in high availability considerations. Extensive simulations using real application traces and datacenter topologies show that CloudMirror can handle 40% more bandwidth demand than the state of the art (e.g., the Oktopus system), while improving high availability from 20% to 70%.
Trustworthy Knowledge Planes For Federated Distributed Systems
In federated distributed systems, such as the Internet and the public cloud, the constituent systems can differ in their configuration and provisioning, resulting in significant impacts on the performance, robustness, and security of applications. Yet these systems lack support for distinguishing such characteristics, resulting in uninformed service selection and poor inter-operator coordination. This thesis presents the design and implementation of a trustworthy knowledge plane that can determine such characteristics about autonomous networks on the Internet. A knowledge plane collects the state of network devices and participants. Using this state, applications infer whether a network possesses some characteristic of interest. The knowledge plane uses attestation to attribute state descriptions to the principals that generated them, thereby making the results of inference more trustworthy. Trustworthy knowledge planes enable applications to establish stronger assumptions about their network operating environment, resulting in improved robustness and reduced deployment barriers. We have prototyped the knowledge plane and associated devices. Experience with deploying analyses over production networks demonstrates that knowledge planes impose low cost and can scale to support Internet-scale networks.