
    Conflict detection in software-defined networks

    The SDN architecture facilitates the flexible deployment of network functions. While promoting innovation, it also introduces a higher chance of conflicts than conventional networks. The detection of conflicts in SDN is the focus of this work. Limitations of the formal analytical approach motivate our choice of an experimental approach, in which we determine a parameter space and a methodology for performing experiments. We have created a dataset covering a range of situations occurring in SDN. The investigation of the dataset yields a conflict taxonomy composed of various classes organized into three broad types: local, distributed, and hidden conflicts. Notably, hidden conflicts caused by side effects of control applications' behaviour are entirely new. We introduce the new concept of the multi-property set and the ·r ("dot r") operator for the effective comparison of SDN rules. With these means, we present algorithms to detect conflicts and develop a conflict detection prototype. The evaluation of the prototype confirms the correctness and practicality of our proposed concepts and methodologies for classifying as well as detecting conflicts. Altogether, our work establishes a foundation for further conflict handling efforts in SDN, e.g., conflict resolution and avoidance. In addition, we point out challenges to be explored. Cuong Tran won a DAAD scholarship for his doctoral research with the Munich Network Management Team at Ludwig-Maximilians-Universität München and received his doctorate in 2022. His research interests include policy conflicts in networked systems, IP multicast and its alternatives, network security, and virtualized systems; he also enjoys teaching and sharing knowledge.
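
    As a rough illustration of the rule-comparison problem that the multi-property set and ·r operator address, the following Python sketch flags pairs of same-priority rules whose match sets overlap but whose actions differ. The rule model and helper names are hypothetical simplifications, not the thesis's actual formalism.

    # Illustrative only: a simplified SDN rule model and overlap check.
    from dataclasses import dataclass

    WILDCARD = None  # an unspecified match field matches any value

    @dataclass
    class Rule:
        priority: int
        match: dict   # field name -> concrete value; absent fields are wildcards
        action: str   # e.g. "forward:1" or "drop"

    def overlaps(r1, r2):
        """Two rules overlap if some packet can match both."""
        for f in set(r1.match) | set(r2.match):
            v1, v2 = r1.match.get(f, WILDCARD), r2.match.get(f, WILDCARD)
            if v1 is not WILDCARD and v2 is not WILDCARD and v1 != v2:
                return False  # contradictory concrete values -> disjoint
        return True

    def local_conflicts(rules):
        """Flag overlapping same-priority rules with differing actions."""
        return [(a, b) for i, a in enumerate(rules) for b in rules[i + 1:]
                if a.priority == b.priority and a.action != b.action
                and overlaps(a, b)]

    # Example: a forwarding rule and a drop rule that can match the same packet.
    print(local_conflicts([Rule(10, {"ip_dst": "10.0.0.1"}, "forward:1"),
                           Rule(10, {"tcp_dport": 80}, "drop")]))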

    Failure Resilience and Traffic Engineering for Multi-Controller Software Defined Networking

    This thesis explores and proposes solutions to the challenges faced by Multi-Controller SDN (MCSDN) systems when deploying TE optimisation on WANs. Despite interest from the research community, existing MCSDN systems have limitations. For example, TE optimisation systems are computationally complex, have high consistency requirements, and need network-wide state to operate. Because of such requirements, MCSDN systems can encounter performance overheads and state consistency problems when implementing TE. Moreover, performance and consistency problems are more prominent on WANs, where higher inter-device latency delays state propagation. Unlike existing literature, this thesis presents several design choices that address all four challenges affecting MCSDN systems (scalability, consistency, resilience, and coordination). We use the presented design choices to build Helix, a hierarchical MCSDN system. Helix provides better scalability, performance, and failure resilience than existing MCSDN systems by sharing minimal state between controllers, offloading operations closer to the data plane, and deploying lightweight tasks. A challenge we faced when building Helix was that existing TE algorithms did not fit Helix's design choices. This thesis presents a new CSPF-based TE algorithm that needs minimal state to operate and supports offloading inter-area TE to local controllers, fulfilling Helix's requirements. Helix's TE algorithm provides better performance and forwarding stability, addressing 1.6x more congestion while performing up to 29x fewer path modifications than the other algorithms evaluated in our experiments. While the MCSDN literature has explored evaluating different aspects of system performance, readily available tools and concrete testing methodologies are lacking. To this end, this thesis provides concrete testing methodologies and readily available tools for the MCSDN community to evaluate the data plane failure resilience, control plane failure resilience, and TE optimisation performance of MCSDN systems.
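
    A hedged sketch of the core of a CSPF-style computation, in the spirit of (but not identical to) Helix's TE algorithm: links without enough spare capacity for the new demand are pruned, and a shortest-path search runs over the remainder. The graph encoding is an assumption for illustration.

    # Illustrative CSPF: constrained shortest path over links with spare capacity.
    import heapq

    def cspf_path(graph, src, dst, demand):
        """graph: {u: {v: (cost, spare_capacity)}}. Returns the min-cost
        src->dst path using only links that can absorb `demand`, else None."""
        dist, prev, pq = {src: 0}, {}, [(0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if u == dst:
                break
            if d > dist.get(u, float("inf")):
                continue  # stale queue entry
            for v, (cost, spare) in graph.get(u, {}).items():
                if spare < demand:
                    continue  # the CSPF constraint: prune overloaded links
                if d + cost < dist.get(v, float("inf")):
                    dist[v], prev[v] = d + cost, u
                    heapq.heappush(pq, (d + cost, v))
        if dst not in dist:
            return None
        path, node = [], dst
        while node != src:
            path.append(node)
            node = prev[node]
        return [src] + path[::-1]

    graph = {"a": {"b": (1, 10), "c": (5, 10)}, "b": {"c": (1, 2)}}
    print(cspf_path(graph, "a", "c", demand=1))  # -> ['a', 'b', 'c']
    print(cspf_path(graph, "a", "c", demand=5))  # b->c too full -> ['a', 'c']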

    Type-Safe Data Plane Programming

    Since the mid-1990s, there have been efforts to enable more flexible processing of network packets by making packet processing programmable. With the advent of software-defined networking (SDN), this idea has become a reality. Early approaches focused on control plane programming, with the goal of implementing centralized network policies at a high level of abstraction without having to use low-level, device-specific configuration mechanisms. For this purpose, various network programming languages have been developed, which provide correctness guarantees and make the formal verification of network policies possible. More recently, it has also become possible to program the network data plane. Being able to define the structure of network packet headers freely opens up a whole new range of applications, from implementing new network protocols to moving application logic directly into the network. By now, the P4 language has become the de facto standard for programming data planes. While P4 provides declarative abstractions for programming data planes, it lacks basic safety guarantees that help avoid errors and support the implementation of correct data plane applications. Modern programming languages use static type systems to provide basic safety guarantees that eliminate entire categories of errors. Surprisingly, however, the use of type systems in the field of network programming has hardly been investigated. This dissertation investigates what appropriate type systems must look like in order to provide data plane programming languages, in particular P4, with static correctness guarantees. As a first step, we present SafeP4, a domain-specific language for programmable data planes equipped with a static type system that guarantees that every header that is read or written is valid, ruling out a common cause of errors. We then present Π4, whose type system is based on dependent types and is thus able to bridge the gap in expressiveness between SafeP4 and full-fledged verification tools. At the same time, Π4 enables modular verification of programs. Our evaluation using open-source programs confirms that accessing invalid packet headers is a common source of errors in practice and that SafeP4's type system is capable of identifying buggy programs. Using case studies, we show that Π4's type system can express and verify a variety of real-world correctness properties.
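
    To make the header-validity property concrete, here is a toy static check in the spirit of SafeP4's guarantee (the instruction set and program encoding are invented for illustration; real P4 control flow is far richer):

    # Toy check: reject reads/writes of headers not currently known to be valid.
    def check(program):
        """program: list of (op, header) pairs, with op in
        {"extract", "add", "remove", "read", "write"}."""
        valid, errors = set(), []
        for i, (op, hdr) in enumerate(program):
            if op in ("extract", "add"):
                valid.add(hdr)        # header becomes valid
            elif op == "remove":
                valid.discard(hdr)    # header invalidated
            elif hdr not in valid:    # "read" or "write" of an invalid header
                errors.append(f"op {i}: access to possibly-invalid header {hdr!r}")
        return errors

    # Reading ipv4 after removing it is flagged:
    print(check([("extract", "ipv4"), ("remove", "ipv4"), ("read", "ipv4")]))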

    Hyperscale Data Processing With Network-Centric Designs

    Today’s largest data processing workloads are hosted in cloud data centers. Due to unprecedented data growth and the end of Moore’s Law, these workloads have ballooned to the hyperscale level, encompassing billions to trillions of data items and hundreds to thousands of machines per query. These workloads are enabled by, and grow with, highly scalable data center networks that connect up to hundreds of thousands of servers. These massive scales fundamentally challenge the designs of both data processing systems and data center networks, and classic layered designs are no longer sustainable. Rather than optimizing these massive layers in silos, we build systems across them with principled network-centric designs. In current networks, we redesign data processing systems with network awareness to minimize the cost of moving data in the network. In future networks, we propose new interfaces and services that the cloud infrastructure offers to applications and co-design data processing systems to achieve optimal query processing performance. To transform the network toward these future designs, we facilitate network innovation at scale. This dissertation presents a line of systems work that covers all three directions. It first discusses GraphRex, a network-aware system that combines classic database and systems techniques to push the performance of massive graph queries in current data centers. It then introduces data processing in disaggregated data centers, a promising new cloud proposal. It details TELEPORT, a compute pushdown feature that eliminates data processing performance bottlenecks in disaggregated data centers, and Redy, which provides high-performance caches using remote disaggregated memory. Finally, it presents MimicNet, a fine-grained simulation framework that evaluates network proposals at data center scale with machine learning approximation. These systems demonstrate that our network-centric designs achieve orders of magnitude higher efficiency than the state of the art at hyperscale.
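
    As a hedged illustration of the compute-pushdown idea behind TELEPORT-style designs (all names and the data model here are invented), compare filtering after pulling raw data across the network with shipping the predicate to the node that holds the data:

    # Illustrative pushdown: execute the filter near the data, not the caller.
    class StorageNode:
        """Stand-in for a remote node, e.g. disaggregated memory."""
        def __init__(self, rows):
            self.rows = rows
        def read_all(self):
            return list(self.rows)   # simulates moving every row over the network
        def run(self, func):
            return func(self.rows)   # executes the function next to the data

    def scan_pull(node, predicate):
        return [r for r in node.read_all() if predicate(r)]  # filter at caller

    def scan_pushdown(node, predicate):
        return node.run(lambda rows: [r for r in rows if predicate(r)])

    node = StorageNode(range(1_000_000))
    hot = lambda r: r % 1000 == 0
    assert scan_pull(node, hot) == scan_pushdown(node, hot)
    # Same answer, but pushdown moves ~1,000 values instead of a million.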

    Systems Support for Trusted Execution Environments

    Cloud computing has become a default choice for data processing by both large corporations and individuals due to its economy of scale and ease of system management. However, the question of trust and trustworthy computing inside cloud environments has long been neglected in practice, a problem further exacerbated by the proliferation of AI and its use for processing sensitive user data. Attempts to implement mechanisms for trustworthy computing in the cloud had previously remained theoretical due to the lack of hardware primitives in commodity CPUs, while the combination of Secure Boot, TPMs, and virtualization has seen only limited adoption. The situation changed in 2016, when Intel introduced the Software Guard Extensions (SGX) and its enclaves to x86 CPUs: for the first time, it became possible to build trustworthy applications relying on a commonly available technology. However, Intel SGX posed challenges to practitioners, who discovered the limitations of this technology, from the limited support for legacy applications and the integration of SGX enclaves into existing systems, to performance bottlenecks in communication, startup, and memory utilization. In this thesis, our goal is to enable trustworthy computing in the cloud by relying on these imperfect SGX primitives. To this end, we develop and evaluate solutions to issues stemming from the limited systems support of Intel SGX: we investigate mechanisms for runtime support of POSIX applications with SCONE, an efficient SGX runtime library developed with the performance limitations of SGX in mind. We further develop this topic with FFQ, a concurrent queue for SCONE's asynchronous system call interface. ShieldBox is our study of the interplay of kernel bypass and trusted execution technologies for NFV, which also tackles the problem of low-latency clocks inside enclaves. The last two systems, Clemmys and T-Lease, are built on the more recent SGXv2 ISA extension. In Clemmys, SGXv2 allows us to significantly reduce the startup time of SGX-enabled functions inside a Function-as-a-Service platform. Finally, in T-Lease we solve the problem of trusted time by introducing a trusted lease primitive for distributed systems. We evaluate all of these systems and show that they can be used in existing systems with minimal overhead and can be combined with both legacy systems and other SGX-based solutions. In the course of the thesis, we enable trusted computing for individual applications, high-performance network functions, and a distributed computing framework, making the vision of trusted cloud computing a reality.
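
    The asynchronous system call interface that FFQ supports can be pictured with the following Python sketch (threads stand in for enclave and host; the real FFQ is a lock-free queue shared between untrusted and enclave memory, not shown here):

    # Illustrative async syscalls: enclave threads enqueue requests; a host
    # thread executes them, avoiding a costly enclave exit per call.
    import os, queue, threading

    requests = queue.Queue()

    def host_worker():
        """'Host' side: performs syscalls on behalf of 'enclave' callers."""
        while True:
            fn, args, done = requests.get()
            if fn is None:
                break                    # shutdown sentinel
            done["result"] = fn(*args)   # the actual system call
            done["event"].set()

    def async_syscall(fn, *args):
        """'Enclave' side: enqueue a request and wait for its completion."""
        done = {"event": threading.Event()}
        requests.put((fn, args, done))
        done["event"].wait()
        return done["result"]

    threading.Thread(target=host_worker, daemon=True).start()
    print(async_syscall(os.getpid))      # e.g. getpid routed through the queue
    requests.put((None, None, None))     # stop the worker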

    On the Importance of Infrastructure-Awareness in Large-Scale Distributed Storage Systems

    Big data applications put significant latency and throughput demands on distributed storage systems. Meeting these demands requires storage systems to use a significant amount of infrastructure resources, such as network capacity and storage devices. Resource demands largely depend on the workloads and can vary significantly over time. Moreover, demand hotspots can move rapidly between different infrastructure locations. Existing storage systems are largely infrastructure-oblivious, as they are designed to support a broad range of hardware and deployment scenarios. Most use only basic configuration information about the infrastructure to make important placement and routing decisions. In the case of cloud-based storage systems, cloud services have their own infrastructure-specific limitations, such as minimum request sizes and maximum numbers of concurrent requests. By ignoring infrastructure-specific details, these storage systems are unable to react to resource demand changes and may incur additional inefficiencies from performing redundant network operations. As a result, provisioning enough resources for these systems to handle all possible workloads and scenarios would be cost prohibitive. This thesis studies the performance problems of commonly used distributed storage systems and introduces novel infrastructure-aware design methods to improve their performance. First, it addresses the problem of slow reads due to network congestion induced by disjoint replica and path selection. Selecting a read replica separately from the network path can perform poorly if all paths to the pre-selected endpoints are congested. Second, this thesis looks at the scalability limitations of consensus protocols commonly used in geo-distributed key-value stores and distributed ledgers. Due to their network-oblivious designs, existing protocols communicate redundantly over highly oversubscribed WAN links, which wastes network resources and limits consistent replication at large scale. Finally, this thesis addresses the need for a cloud-specific real-time storage system for capital market use cases. Public cloud infrastructures provide feature-rich and cost-effective storage services. However, existing real-time time-series databases are not built to take advantage of cloud storage services, and therefore do not effectively utilize them to provide high performance while minimizing deployment cost. This thesis presents three systems that address these problems using infrastructure-aware design methods. Our performance evaluation shows that infrastructure-aware design is highly effective in improving the performance of large-scale distributed storage systems.
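
    The first contribution's key idea, selecting the replica and the network path jointly rather than in isolation, can be sketched as follows (the data model and congestion scores are invented for illustration):

    # Illustrative joint selection: score every (replica, path) pair together.
    def best_read(replicas, paths, congestion):
        """replicas: replica ids; paths: {replica: [path ids]};
        congestion: {path: utilization in [0, 1]}."""
        options = [(congestion[p], r, p) for r in replicas for p in paths[r]]
        _, replica, path = min(options)   # least-congested combination wins
        return replica, path

    # Disjoint selection might pin replica "A" even though both of its paths
    # are congested; the joint choice falls back to "B" with a quiet path.
    paths = {"A": ["p1", "p2"], "B": ["p3"]}
    congestion = {"p1": 0.9, "p2": 0.8, "p3": 0.2}
    print(best_read(["A", "B"], paths, congestion))   # -> ('B', 'p3')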

    Rethinking Software Network Data Planes in the Era of Microservices

    The abstract is in the attachment.

    Towards Scalable Network Traffic Measurement With Sketches

    Driven by ever-increasing data volumes on the Internet, the per-port speed of network devices has reached 400 Gbps, and high-end switches are capable of processing 25.6 Tbps of network traffic. To improve the efficiency and security of networks, traffic measurement has become more important than ever. For fast and accurate traffic measurement, managing an accurate working set of active flows (WSAF) at line rate is a key challenge. The WSAF is usually located in high-speed but expensive memory, such as TCAM or SRAM, whose capacity is quite limited. To scale up per-flow measurement, we pursue three thrusts. In the first thrust, we propose an in-DRAM WSAF and place a compact data structure (i.e., a sketch) called FlowRegulator in front of the WSAF to compensate for DRAM's slow access time. Our results show that FlowRegulator can substantially reduce massive influxes to the WSAF without compromising measurement accuracy. In the second thrust, we integrate our sketch into a network system and propose an SDN-based WLAN monitoring and management framework called RFlow+, which overcomes the limitations of existing traffic measurement solutions (e.g., OpenFlow and sFlow), such as a limited view, incomplete flow statistics, and a poor trade-off between measurement accuracy and CPU/network overheads. In the third thrust, we introduce a novel sampling scheme to address the poor trade-off provided by standard simple random sampling (SRS). Even though SRS has been widely used in practice because of its simplicity, it yields non-uniform sampling rates for different flows, because it samples packets over an aggregated data flow. Starting from the simple observation that independent per-flow packet sampling provides the most accurate estimation of each flow, we introduce the new concept of per-flow systematic sampling, which aims to provide the same sampling rate across all flows. In addition, we provide a concrete sampling method called SketchFlow, which approximates per-flow systematic sampling using a sketch saturation event.
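
    FlowRegulator, RFlow+, and SketchFlow all build on compact sketch structures. As background, here is a minimal count-min sketch, the canonical structure of this kind (illustrative; not the exact design used in the thesis):

    # Minimal count-min sketch: fixed memory, never undercounts, may overcount.
    import hashlib

    class CountMinSketch:
        def __init__(self, width=1024, depth=4):
            self.width, self.depth = width, depth
            self.table = [[0] * width for _ in range(depth)]

        def _index(self, row, key):
            h = hashlib.blake2b(key.encode(), salt=bytes([row])).digest()
            return int.from_bytes(h[:8], "big") % self.width

        def add(self, flow_key, count=1):
            for row in range(self.depth):
                self.table[row][self._index(row, flow_key)] += count

        def estimate(self, flow_key):
            """Min over rows bounds the error from hash collisions."""
            return min(self.table[row][self._index(row, flow_key)]
                       for row in range(self.depth))

    cms = CountMinSketch()
    for _ in range(500):
        cms.add("10.0.0.1->10.0.0.2:443")
    print(cms.estimate("10.0.0.1->10.0.0.2:443"))  # ~500, never below 500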

    Scalable and Reliable Middlebox Deployment

    Middleboxes are pervasive in modern computer networks, providing functionality beyond mere packet forwarding. Load balancers, intrusion detection systems, and network address translators are typical examples. Despite their benefits, middleboxes come with several challenges with respect to their scalability and reliability. The goal of this thesis is to devise middlebox deployment solutions that are cost-effective, scalable, and fault-tolerant. The thesis makes three main contributions: first, distributed service function chaining, with multiple instances of a middlebox deployed on different physical servers to optimize resource usage; second, Constellation, a geo-distributed middlebox framework that enables a middlebox application to operate with high performance across wide area networks; third, a fault-tolerant service function chaining system.
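
    A minimal sketch of the distributed chaining idea, assuming a flow hash pins each flow to one instance per middlebox so per-flow state stays local (names and the packet model are invented for illustration):

    # Illustrative service function chain across replicated middlebox instances.
    import zlib

    def pick_instance(instances, flow_key):
        """Deterministically map a flow to one instance of a middlebox."""
        return instances[zlib.crc32(flow_key.encode()) % len(instances)]

    def traverse_chain(packet, flow_key, chain):
        """chain: one list of instances per middlebox, in traversal order."""
        for instances in chain:
            packet = pick_instance(instances, flow_key)(packet)
            if packet is None:
                return None               # e.g. dropped by the firewall
        return packet

    firewall = [lambda p: p if p.get("port") != 23 else None]   # block telnet
    nat = [lambda p: {**p, "src": "203.0.113.7"}]               # rewrite source
    print(traverse_chain({"src": "10.0.0.5", "port": 80}, "10.0.0.5:80",
                         [firewall, nat]))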