Multi-controller Based Software-Defined Networking: A Survey
Software-Defined Networking (SDN) is a novel network paradigm that enables flexible network management. As network size increases, a single centralized controller cannot meet the growing demand for flow processing. Thus, the promising solution for SDN in large-scale networks is the multi-controller approach. In this paper, we present a comprehensive survey of multi-controller research in SDN. First, we give an overview of the multi-controller approach, including its origin and its challenges. Then, we classify multi-controller research into four aspects (scalability, consistency, reliability, load balancing) according to the process of implementing a multi-controller system. Finally, we propose some relevant research issues to address in the future and conclude the multi-controller research.
Improving Cloud Middlebox Infrastructure for Online Services
Middleboxes are an indispensable part of datacenter networks that provide high availability, scalability and performance to online services. Using load balancers as an example, this thesis shows that the prevalent scale-out middlebox designs using commodity servers are plagued by three fundamental problems: (1) Server-based layer-4 middleboxes are costly and inflate round-trip time by as much as 2x because they process packets in software. (2) The middlebox instances cause traffic detouring en route from sources to destinations, which inflates network bandwidth usage by as much as 3.2x and can cause transient congestion. (3) Existing cloud providers do not support layer-7 middleboxes as a service, and third-party proxy-based layer-7 middlebox designs exhibit poor availability because TCP state stored locally on middlebox instances is lost upon instance failure. This thesis examines the root causes of the above problems and proposes new cloud-scale middlebox design principles that systematically address all three problems.
First, to address the performance problem, we make the key observation that existing commodity switches have resources available to implement key layer-4 middlebox functionalities such as load balancing. By processing packets in hardware, switches offer low latency and high capacity at no additional cost, because the switch resources are otherwise idle. Motivated by this observation, we propose the design principle of using idle switch resources to accelerate middlebox functionalities. To demonstrate the principle, we developed a complete L4 load balancer design that uses commodity switches for low cost and high performance, and carefully fuses in a few software load balancer instances to provide high availability.
Second, to address the high network overhead problem from traffic detouring through middlebox instances, we propose to exploit the principles of locality and flexibility in placing the middlebox instances and servers to handle the traffic closer to the sources and reduce the overall traffic and link utilization in the network.
Third, to provide high availability in layer-7 middleboxes, we propose a novel middlebox design principle of decoupling the TCP state from middlebox instances and storing it in a persistent key-value store, so that any middlebox instance can seamlessly take over any TCP connection when middlebox instances fail. We demonstrate the effectiveness of the above cloud-scale middlebox design principles using load balancers as an example. Specifically, we have prototyped the three design principles in three cloud-scale load balancers: Duet, Rubik, and Yoda, respectively. Our evaluation using a datacenter testbed and large-scale simulations shows that Duet lowers costs by 12x and latency overhead by 1000x, Rubik further lowers the datacenter network traffic overhead by 3x, and Yoda L7 load balancer-as-a-service is practical; decoupling TCP state from load balancer instances has a negligible
State-preserving container orchestration in failover scenarios
Containers have been widely adopted for the deployment of high-availability applications and services. This adoption is in part due to the native support for fault tolerance mechanisms in container orchestration frameworks such as Kubernetes. While Kubernetes provides service replication as a fault tolerance mechanism for stateless applications, service replication does not satisfy the requirements of stateful applications. Currently, this shortcoming is addressed by data replication in databases, which requires tight coupling with, and modification of, the stateful application to support high availability. Thus, this thesis proposes a new Checkpoint/Restore (C/R) Kubernetes operator to achieve fault tolerance for stateful applications without any modification of the application. The operator takes a checkpoint at a configurable interval. In case of a fault, a new application container is created automatically from the most recent checkpoint. We compare the proposed approach with a more conventional approach in which we pull and restore the application state from the application through an API. We measure the overhead of both methods, the service interruption and the recovery time in case of faults. We find that the C/R Operator has similar recovery time to the traditional approach, but does not need any application modification. The results indicate that C/R is a promising technology for a fault tolerance mechanism for stateful applications
- …
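The checkpoint-at-an-interval and restore-from-latest behaviour described in the abstract above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the thesis's operator: it does not use Kubernetes or any container checkpointing tool, and the names `CheckpointStore`, `run_with_checkpoints`, and `restore` are hypothetical, introduced only for illustration. The "application" is a plain dictionary holding a counter.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Checkpoint:
    taken_at: int            # simulated time at which the snapshot was taken
    state: Dict[str, Any]    # opaque copy of the application state


@dataclass
class CheckpointStore:
    """Hypothetical in-memory stand-in for wherever checkpoints are persisted."""
    checkpoints: List[Checkpoint] = field(default_factory=list)

    def save(self, taken_at: int, state: Dict[str, Any]) -> None:
        # Deep-copy so later changes to the live state do not alter the snapshot.
        self.checkpoints.append(Checkpoint(taken_at, copy.deepcopy(state)))

    def latest(self) -> Optional[Checkpoint]:
        # Most recent checkpoint, or None if nothing has been saved yet.
        return max(self.checkpoints, key=lambda c: c.taken_at, default=None)


def run_with_checkpoints(store: CheckpointStore, state: Dict[str, Any],
                         interval: int, ticks: int) -> Dict[str, Any]:
    """Run a toy counter app for `ticks` steps, checkpointing every `interval` steps."""
    for t in range(1, ticks + 1):
        state["counter"] += 1          # the application does some work
        if t % interval == 0:          # the configurable checkpoint interval
            store.save(t, state)
    return state


def restore(store: CheckpointStore) -> Dict[str, Any]:
    """Build the state for a replacement instance from the most recent checkpoint."""
    cp = store.latest()
    return copy.deepcopy(cp.state) if cp else {"counter": 0}


store = CheckpointStore()
live = run_with_checkpoints(store, {"counter": 0}, interval=3, ticks=10)
recovered = restore(store)             # simulate a fault after tick 10
lost_work = live["counter"] - recovered["counter"]
```

In this toy run the work done after the last checkpoint is lost on recovery, which illustrates the trade-off a configurable interval implies: shorter intervals reduce lost work but take checkpoints more often.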