49 research outputs found

    Empowering Cloud Data Centers with Network Programmability

    Cloud data centers are critical infrastructure for modern Internet services such as web search, social networking, and e-commerce. However, the gradual slowdown of Moore’s law has constrained the growth of data center performance and energy efficiency. In addition, the growing number of millisecond-scale and microsecond-scale tasks places stricter throughput and latency requirements on cloud applications. Today’s server-based solutions struggle to meet these performance requirements in many scenarios, such as resource management, scheduling, and high-speed traffic monitoring and testing. In this dissertation, we study these problems from a network perspective. We investigate a new architecture that leverages the programmability of new-generation network switches to improve the performance and reliability of clouds. Because programmable switches provide only very limited memory and functionality, we exploit compact data structures and deeply co-design software and hardware to make the best use of these resources. More specifically, this dissertation presents four systems:
    (i) NetLock: A new centralized lock management architecture that co-designs programmable switches and servers to simultaneously achieve high performance and rich policy support. It provides orders-of-magnitude higher throughput than existing systems with microsecond-level latency, and supports many commonly used policies such as performance isolation.
    (ii) HCSFQ: A scalable and practical solution for implementing hierarchical fair queueing on commodity hardware at line rate. Instead of relying on a hierarchy of queues with complex queue management, HCSFQ keeps no per-flow state and uses only one queue to achieve hierarchical fair queueing.
    (iii) AIFO: A new approach to programmable packet scheduling that uses only a single FIFO queue. AIFO uses an admission control mechanism to approximate PIFO, which is theoretically ideal but hard to implement on commodity devices.
    (iv) Lumina: A tool that enables fine-grained analysis of hardware network stacks. By exploiting network programmability to emulate various network scenarios, Lumina helps users understand the micro-behaviors of hardware network stacks.
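    The single-FIFO admission idea mentioned for AIFO can be illustrated with a small sketch. This is not the dissertation's implementation: the sliding-window quantile estimate, the threshold form, and the names `SingleFifoAdmission`, `capacity`, `window_size`, and `burst_k` are all assumptions made for illustration. The intent is only to show how admission control in front of one FIFO queue can favor low-rank (high-priority) packets as the queue fills.

```python
from collections import deque


class SingleFifoAdmission:
    """Illustrative rank-aware admission in front of one FIFO queue (hypothetical)."""

    def __init__(self, capacity, window_size, burst_k=0.1):
        self.capacity = capacity                 # maximum packets held in the FIFO
        self.queue = deque()                     # the single FIFO queue
        self.window = deque(maxlen=window_size)  # ranks of recently seen packets
        self.burst_k = burst_k                   # assumed burst-tolerance knob, 0 <= k < 1

    def _quantile(self, rank):
        # Fraction of recently seen packets with a strictly smaller rank.
        if not self.window:
            return 0.0
        smaller = sum(1 for r in self.window if r < rank)
        return smaller / len(self.window)

    def enqueue(self, rank):
        # Admit only if the packet's rank quantile fits in the (scaled) free share
        # of the queue: the fuller the queue, the lower the rank must be.
        quantile = self._quantile(rank)
        self.window.append(rank)
        free_share = (self.capacity - len(self.queue)) / self.capacity
        admit = (len(self.queue) < self.capacity
                 and quantile <= free_share / (1.0 - self.burst_k))
        if admit:
            self.queue.append(rank)
        return admit

    def dequeue(self):
        # Plain FIFO service: no sorting is ever performed on the queue.
        return self.queue.popleft() if self.queue else None
```

    In this sketch, packets leave in arrival order, and all prioritization happens at admission time, which is what lets a single queue stand in for a sorted one.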

    Online learning on the programmable dataplane

    This thesis makes the case for managing computer networks with data-driven methods (automated statistical inference and control based on measurement data and runtime observations) and argues for their tight integration with programmable dataplane hardware to make management decisions faster and from more precise data. Optimisation, defence, and measurement of networked infrastructure are each challenging tasks in their own right, and are currently dominated by hand-crafted heuristic methods. These become harder to reason about and deploy as networks scale in rates and number of forwarding elements, and their design requires expert knowledge and care around unexpected protocol interactions. This makes tailored, per-deployment or per-workload solutions infeasible to develop. Recent advances in machine learning offer capable function approximation and closed-loop control which suit many of these tasks. New, programmable dataplane hardware enables more agility in the network: runtime reprogrammability, precise traffic measurement, and low-latency on-path processing. The synthesis of these two developments allows complex decisions to be made on previously unusable state, and made quicker by offloading inference to the network. To justify this argument, I advance the state of the art in data-driven defence of networks, novel dataplane-friendly online reinforcement learning algorithms, and in-network data reduction to allow classification of switch-scale data. Each requires co-design that is aware of the network, and of the failure modes of systems and carried traffic. To make online learning possible in the dataplane, I use fixed-point arithmetic and modify classical (non-neural) approaches to take advantage of the SmartNIC compute model and make use of rich device-local state.
I show that data-driven solutions still require great care to design correctly, but with the right domain expertise they can improve on pathological cases in DDoS defence, such as protecting legitimate UDP traffic. In-network aggregation to histograms is shown to enable accurate classification from fine temporal effects, and allows hosts to scale such classification to far larger flow counts and traffic volumes. Moving reinforcement learning to the dataplane is shown to offer substantial benefits to state-action latency and online learning throughput versus host machines, allowing policies to react faster to fine-grained network events. The dataplane environment is key in making reactive online learning feasible; to port further algorithms and learnt functions, I collate and analyse the strengths of current and future hardware designs, as well as individual algorithms.
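    To illustrate why fixed-point arithmetic suits dataplane learning, the following is a minimal sketch of an integer-only update step. The Q16.16 format and the helper names `to_fixed`, `fx_mul`, and `td_update` are assumptions for illustration; the abstract does not state which format or update rule the thesis uses.

```python
FRAC_BITS = 16          # assumed Q16.16 layout: 16 fractional bits
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0


def to_fixed(x):
    """Convert a float to its nearest fixed-point integer."""
    return int(round(x * ONE))


def to_float(f):
    """Convert a fixed-point integer back to a float (for inspection only)."""
    return f / ONE


def fx_mul(a, b):
    """Multiply two fixed-point values using integer operations only.

    Python's >> floors toward negative infinity for negative products.
    """
    return (a * b) >> FRAC_BITS


def td_update(value, target, alpha):
    """Move `value` toward `target` by step size `alpha`, all in fixed point."""
    return value + fx_mul(alpha, target - value)
```

    For example, `td_update(to_fixed(0.0), to_fixed(1.0), to_fixed(0.5))` returns `to_fixed(0.5)`, using only integer addition, multiplication, and a shift, which are the kinds of operations typically available on devices without floating-point units.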

    Traffic Optimization in Data Center and Software-Defined Programmable Networks

    The abstract is in the attachment.

    ResTP: A Configurable and Adaptable Multipath Transport Protocol for Future Internet Resilience

    Motivated by the shortcomings of common transport protocols (e.g., TCP, UDP, and MPTCP) in modern networking, and by the belief that a general-purpose transport-layer protocol is needed, one that can operate efficiently over diverse network environments while providing the desired services for various application types, we design a new transport protocol, ResTP. The rapid advancement of networking technology and use paradigms is continually supporting new applications. The configurable and adaptable multipath-capable ResTP is distinct from the standard protocols not only in its flexibility in satisfying the requirements of different traffic classes given the characteristics of the underlying networks, but also in its emphasis on providing resilience. Resilience is an essential property that is unfortunately missing in the current Internet. In this dissertation, we present the design of ResTP, including the services that it supports and the set of algorithms that implement each service. We also discuss our modular implementation of ResTP in the open-source network simulator ns-3. Finally, the protocol is simulated under various network scenarios, and the results are analyzed in comparison with conventional protocols such as TCP, UDP, and MPTCP to demonstrate that ResTP is a promising new transport-layer protocol providing resilience in the Future Internet (FI).

    On Improving Efficiency of Data-Intensive Applications in Geo-Distributed Environments

    Distributed systems are now widely demanded and adopted for processing data-intensive workloads, since they greatly accelerate large-scale data processing with scalable parallelism and improved data locality. Traditional distributed systems initially targeted computing clusters but have since evolved to data centers with multiple clusters. These systems are mostly built on top of homogeneous, tightly integrated resources connected by high-speed local-area networks (LANs), and typically require data to be ingested into a central data center for processing. Today, with enormous volumes of data continuously generated from geographically distributed locations, direct adoption of such systems is prohibitively inefficient due to limited system scalability and the high cost of centralizing geo-distributed data over wide-area networks (WANs). Increasingly, the trend is to build geo-distributed systems in which data processing jobs run on geo-distributed, heterogeneous resources close to the data at widely dispersed locations. However, the critical challenges and mechanisms for efficient execution of data-intensive applications in such geo-distributed environments remain unclear so far. The goal of this dissertation is to identify such challenges and mechanisms, by extensively using the research principles and methodology of conventional distributed systems to investigate the geo-distributed environment, and by developing new techniques to tackle these challenges and run data-intensive applications efficiently at scale. The contributions of this dissertation are threefold. Firstly, the dissertation shows that the high level of resource heterogeneity exhibited in the geo-distributed environment undermines the scalability of geo-distributed systems.
Virtualization-based resource abstraction mechanisms are introduced to abstract the hardware, network, and OS resources throughout the system, mitigating the underlying resource heterogeneity and enhancing system scalability. Secondly, the dissertation reveals the overwhelming performance and monetary cost incurred by unrestrained data sharing over WANs in geo-distributed systems. Network optimization approaches, including linear-programming-based global optimization, greedy bin-packing heuristics, and TCP enhancement, are developed to optimize network resource utilization and avoid unnecessary expenses imposed on data sharing in WANs. Lastly, the dissertation highlights the importance of data locality for data-intensive applications running in the geo-distributed environment. Novel data caching and locality-aware scheduling techniques are devised to improve data locality.
    Doctor of Philosophy

    An adaptive network coding scheme for multipath transmission in cellular-based vehicular networks

    With the emergence of vehicular Internet-of-Things (IoT) applications, achieving higher throughput in vehicle-to-cloud multipath transmission is a significant challenge for vehicular IoT systems. Network Coding (NC) has been recognized as a promising paradigm for improving vehicular wireless network throughput by reducing packet loss in transmission. However, existing research on NC does not consider the influence of rapid changes in wireless link quality on NC schemes. This makes it difficult to adjust the coding rate dynamically as link quality varies in vehicle-to-cloud multipath transmission, which is needed to avoid consuming unnecessary bandwidth and to increase network throughput. Therefore, we propose an Adaptive Network Coding (ANC) scheme that integrates the Hidden Markov Model (HMM) into the NC scheme to efficiently adjust the coding rate according to the estimated packet loss rate (PLR). The ANC scheme copes with rapid changes in wireless link quality to maximize throughput and reduce packet loss in transmission. In terms of throughput, simulation and real-world experiment results show that the ANC scheme outperforms state-of-the-art NC schemes for vehicular wireless multipath transmission in vehicular IoT systems. This work was supported in part by the Fundamental Research Funds for the Central Universities under Grant No. 2019YJS015, in part by the National Natural Science Foundation of China (NSFC) under Grant 61872029, and in part by the Beijing Municipal Natural Science Foundation under Grant 4182048.
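    The relationship between an estimated packet loss rate and the coding rate can be illustrated with a simple sketch. This is not the ANC scheme from the paper: the HMM-based estimator is omitted, and the sizing rule (send enough coded packets that, on average, `k` survive the link), the clamp `max_plr`, and the function names are assumptions chosen for illustration.

```python
import math


def coded_packets_needed(k, plr, max_plr=0.9):
    """Number of coded packets to send so that about k arrive on average.

    k       -- source packets in one coding generation
    plr     -- estimated packet loss rate in [0, 1] (assumed to come from an estimator)
    max_plr -- hypothetical clamp that keeps redundancy bounded on very bad links
    """
    p = min(max(plr, 0.0), max_plr)
    return math.ceil(k / (1.0 - p))


def coding_rate(k, plr):
    """Ratio of source packets to transmitted packets (1.0 means no redundancy)."""
    return k / coded_packets_needed(k, plr)
```

    For example, with `k = 10` and an estimated loss rate of 0.2, this sketch sends 13 coded packets; with no estimated loss it sends exactly 10, so no bandwidth is spent on redundancy when the link is clean.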