
    Scalability and Resilience Analysis of Software-Defined Networking

    Software-Defined Networking (SDN) is a novel architecture for communication networks that has been developed to ease the introduction of new network services and functions. It leverages the separation of the data plane and the control plane to allow network services to be deployed solely in software. Although SDN provides great flexibility, the applicability of SDN in communication networks raises several questions with regard to scalability and resilience against network failures. These concerns are not prevalent in current decentralized network architectures. In this thesis, we address scalability and resilience issues with regard to unicast and multicast traffic for SDN-based networks. We propose a new compression method for inter-domain routing tables to address hardware limitations of current SDN switches and analyze its effectiveness. We propose various resilience methods for SDN and identify their key performance indicators in the context of carrier-grade and datacenter networks. We discuss the advantages and disadvantages of these proposals and their appropriate use cases. Finally, we propose a scalable and resilient software-defined multicast architecture. We study the effectiveness of our approach and show its feasibility using a prototype implementation.
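
    The abstract above mentions a compression method for inter-domain routing tables. As a point of reference only, the sketch below shows one classic idea behind FIB compression: merging sibling prefixes that share a next hop into their common parent. It illustrates the general principle under invented data and is not the thesis's actual method.

```python
# Illustration only: one classic FIB-compression idea, not the thesis's method.
# Two sibling prefixes (same length, differing only in their last bit) that
# share a next hop can be merged into their common parent prefix.

def compress(fib):
    """fib maps (prefix, length) -> next_hop, where `prefix` holds the top
    `length` bits of the network address as an integer."""
    changed = True
    while changed:
        changed = False
        # Visit longest prefixes first so merges can cascade upwards.
        for (prefix, length), nh in sorted(fib.items(), key=lambda e: -e[0][1]):
            sibling = (prefix ^ 1, length)
            if length > 0 and (prefix, length) in fib and fib.get(sibling) == nh:
                del fib[(prefix, length)], fib[sibling]
                fib[(prefix >> 1, length - 1)] = nh  # common parent prefix
                changed = True
    return fib

# 10.0.0.0/9 and 10.128.0.0/9 via next hop "A" collapse into 10.0.0.0/8.
fib = {(0b000010100, 9): "A", (0b000010101, 9): "A"}
print(compress(fib))  # {(10, 8): 'A'}, i.e. 10.0.0.0/8 -> A
```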

    Resilient and Scalable Forwarding for Software-Defined Networks with P4-Programmable Switches

    Traditional networking devices support only fixed features and limited configurability. Network softwarization leverages programmable software and hardware platforms to remove those limitations. In this context, the concept of programmable data planes allows the packet processing pipeline of networking devices to be programmed directly and custom control plane algorithms to be created. This flexibility enables the design of novel networking mechanisms where the status quo struggles to meet the high demands of next-generation networks like 5G, the Internet of Things, cloud computing, and Industry 4.0. P4 is the most popular technology for implementing programmable data planes. However, programmable data planes, and in particular the P4 technology, emerged only recently. Thus, P4 support for some well-established networking concepts is still lacking, and several issues remain unsolved due to the different characteristics of programmable data planes in comparison to traditional networking. The research in this thesis focuses on two open issues of programmable data planes. First, it develops resilient and efficient forwarding mechanisms for the P4 data plane, as there are no satisfying state-of-the-art best practices yet. Second, it enables BIER in high-performance P4 data planes. BIER is a novel, scalable, and efficient transport mechanism for IP multicast traffic that so far has only very limited support on high-performance forwarding platforms. The main results of this thesis are published as eight peer-reviewed publications and one post-publication peer-reviewed publication. The results cover the development of suitable resilience mechanisms for P4 data planes, the development and implementation of resilient BIER forwarding in P4, and extensive evaluations of all developed and implemented mechanisms. Furthermore, the results contain a comprehensive P4 literature study. Two more peer-reviewed papers contain additional content that is not directly related to the main results. They implement congestion avoidance mechanisms in P4 and develop a scheduling concept to find cost-optimized load schedules based on day-ahead forecasts.
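
    For context on the BIER mechanism named above: per RFC 8279, a BIER packet carries a bitstring with one bit per egress router, and each forwarding step clones the packet per neighbor while masking the bits that neighbor covers. The Python sketch below illustrates that standard forwarding loop; it is not the thesis's P4 implementation, and the table contents are invented.

```python
# Sketch of the standard BIER forwarding loop from RFC 8279 (illustrative
# Python, not the thesis's P4 code). Each egress router owns one bit of the
# packet's bitstring; the Bit Index Forwarding Table (BIFT) maps each bit to
# a neighbor and a forwarding bitmask (F-BM) of all bits reachable via it.

def bier_forward(bitstring, bift, send):
    """bift: dict bit_index -> (neighbor, f_bm); send(neighbor, bits)."""
    remaining = bitstring
    while remaining:
        low = remaining & -remaining            # lowest set bit
        neighbor, f_bm = bift[low.bit_length() - 1]
        send(neighbor, bitstring & f_bm)        # copy carries only this neighbor's bits
        remaining &= ~f_bm                      # mark those egresses as handled

# Invented example: egress bits 0 and 1 sit behind R1, bit 2 behind R2.
bift = {0: ("R1", 0b011), 1: ("R1", 0b011), 2: ("R2", 0b100)}
bier_forward(0b101, bift, lambda n, b: print(n, bin(b)))
# -> R1 0b1, then R2 0b100
```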

    Technology-related disasters: a survey towards disaster-resilient software defined networks

    Resilience against disaster scenarios is essential to network operators, not only because of the potential economic impact of a disaster but also because communication networks form the basis of crisis management. COST RECODIS aims at studying measures, rules, techniques and prediction mechanisms for different disaster scenarios. This paper gives an overview of different solutions in the context of technology-related disasters. After a general overview, the paper focuses on resilient Software Defined Networks.

    Resiliency in numerical algorithm design for extreme scale simulations

    This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale Simulations’ held March 1–6, 2020, at Schloss Dagstuhl, that was attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the cost of involving an enormous amount of resources and costs. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 10^23 floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features and specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation, and (2) how do we best design the algorithms and software to meet these requirements? While the analysis of use cases can help understand the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications and systems in achieving resilience for extreme scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically.
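
    The cost figures quoted above can be sanity-checked with a few lines of arithmetic; the electricity price of roughly 0.10 Euro/kWh is an assumption implied by the quoted numbers.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# The electricity price is an assumed ~0.10 EUR/kWh.
power_kw = 20 * 1_000                 # 20 MW exascale system
hours = 48
energy_kwh = power_kw * hours         # 960,000 kWh, roughly a million
cost_eur = energy_kwh * 0.10          # ~96,000 EUR, roughly 100k
flops = 1e18 * hours * 3600           # exascale rate: ~1.7e23 operations
print(f"{energy_kwh:,} kWh, {cost_eur:,.0f} EUR, {flops:.1e} flop")
```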

    Foutbestendige toekomstige internetarchitecturen (Fault-tolerant future Internet architectures)


    Efficient Routing Protection Algorithm Based on Optimized Network Topology

    Network failures are unavoidable and occur frequently. When the network fails, the intra-domain routing protocols deployed on the Internet need to undergo a long convergence process. During this period, a large number of packets are discarded, which degrades the user experience and severely affects the quality of service of Internet Service Providers (ISPs). Therefore, improving the availability of intra-domain routing is a pressing research question. Industry usually employs routing protection algorithms to improve intra-domain routing availability. However, existing routing protection schemes compute as many backup paths as possible to reduce packet loss due to network failures, which increases the cost of the network and impedes deployment of these methods in practice. To address these issues, this study proposes an efficient routing protection algorithm based on optimized network topology (ERPBONT). ERPBONT uses the optimized network topology to calculate, for all source-destination pairs, a backup path with the minimum path coincidence degree with the shortest path. Firstly, finding the backup path with the minimum path coincidence with the shortest path is formulated as an integer programming problem. Then simulated annealing is used to find the optimal solution. Finally, the algorithm is tested on simulated and real topologies. The experimental results show that ERPBONT effectively reduces the path coincidence between the shortest path and the backup path, and significantly improves routing availability.
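
    To make the objective concrete: the path coincidence degree is the number of edges a backup path shares with the primary shortest path. The sketch below minimizes it greedily by penalizing primary edges; it is an illustrative stand-in, not the paper's simulated-annealing search over an optimized topology, and the example graph is invented.

```python
# Illustrative stand-in for ERPBONT's objective (not the paper's simulated-
# annealing algorithm): find a backup path whose coincidence degree, i.e. the
# number of edges shared with the primary shortest path, is small, here by
# greedily inflating the cost of primary edges.
import networkx as nx

def backup_path(g, src, dst, penalty=1_000):
    primary = nx.shortest_path(g, src, dst, weight="weight")
    on_primary = set(zip(primary, primary[1:]))

    def cost(u, v, attrs):  # inflate edges that lie on the primary path
        hit = (u, v) in on_primary or (v, u) in on_primary
        return attrs.get("weight", 1) + (penalty if hit else 0)

    backup = nx.shortest_path(g, src, dst, weight=cost)
    coincidence = sum((u, v) in on_primary or (v, u) in on_primary
                      for u, v in zip(backup, backup[1:]))
    return primary, backup, coincidence

g = nx.Graph()
g.add_weighted_edges_from([("s", "a", 1), ("a", "t", 1), ("s", "b", 2), ("b", "t", 2)])
print(backup_path(g, "s", "t"))  # (['s', 'a', 't'], ['s', 'b', 't'], 0)
```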

    Resilient routing in the Internet

    Although it is widely known that the Internet is robust to random failures, unplanned failures due to attacks can be very damaging. This prevents many organisations from deploying beneficial operations through the Internet. In general, data is delivered from a source to a destination via a series of routers (i.e., a routing path). These routers employ routing protocols to compute the best paths based on the routing information they possess. However, when a failure occurs, the routers must reconstruct their routing tables, which may take several seconds to complete. Evidently, most losses occur during this period. IP Fast Re-Route (IPFRR), Multi-Topology (MT) routing, and overlays are examples of solutions proposed to handle network failures. These techniques alleviate packet losses to different extents, yet none provides an optimal solution. This thesis focuses on identifying the fundamental routing problems caused by the convergence process. It describes the mechanisms of each existing technique as well as its pros and cons. Furthermore, it presents new techniques for fast re-routing as follows. Enhanced Loop-Free Alternates (E-LFAs) increase the repair coverage of the existing technique, Loop-Free Alternates (LFAs). In addition, two techniques, namely Full Fast Failure Recovery (F3R) and fast re-route using Alternate Next Hop Counters (ANHC), offer full protection against any single link failure. Nevertheless, the former technique requires significantly higher computational overheads and incurs longer backup routes. Both techniques are proved to be complete and correct, while ANHC neither requires any major modifications to the traditional routing paradigm nor incurs significant overheads. Furthermore, in the presence of failures, ANHC does not jeopardise other operable parts of the network. As emerging applications require higher reliability, multiple-failure scenarios cannot be ignored. Most existing fast re-route techniques are able to handle only single or dual failure cases. This thesis provides insight into a novel approach known as Packet Re-cycling (PR), which is capable of handling any number of failures in an oriented network. That is, packets can be forwarded successfully as long as a path between a source and a destination is available. Since Internet-based services and applications continue to advance, improving network resilience will be a challenging research topic for the decades to come.
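
    For reference, the basic loop-free alternate condition (RFC 5286) that E-LFAs extend fits in one inequality: a neighbor N of source S can safely serve as an alternate next hop towards destination D if dist(N, D) < dist(N, S) + dist(S, D). A minimal sketch, assuming all-pairs distances are already computed:

```python
# The basic loop-free alternate (LFA) condition from RFC 5286, which E-LFAs
# extend: neighbor n of source s is a safe alternate towards destination d
# if a packet handed to n cannot loop back through s.
def is_lfa(dist, s, n, d):
    """dist: all-pairs shortest-path distances as a dict of dicts."""
    return dist[n][d] < dist[n][s] + dist[s][d]

# Unit-weight triangle S-N-D: N reaches D directly, so it is an LFA for S.
dist = {"S": {"S": 0, "N": 1, "D": 1},
        "N": {"S": 1, "N": 0, "D": 1},
        "D": {"S": 1, "N": 1, "D": 0}}
print(is_lfa(dist, "S", "N", "D"))  # True: 1 < 1 + 1
```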

    Carrier grade resilience in geographically distributed software defined networks

    The Internet is a fundamental infrastructure in modern life, supporting many different communication services. One of the most critical properties of the Internet is its ability to recover from failures, such as link or equipment failure. The goal of network resilience heavily influenced the design of the Internet, leading to the use of distributed routing protocols. While distributed algorithms largely solve the issue of network resilience, other concerns remain. A significant concern is network management, as it is a complex and error-prone process. In addition, network control logic is tightly integrated into the forwarding devices, making it difficult to upgrade the logic to introduce new features. Finally, the lack of a common control platform requires new network functions to provide their own solutions to common, but challenging, issues related to operating in a distributed environment. A new network architecture, software-defined networking (SDN), aims to alleviate many of these network challenges by introducing useful abstractions into the control plane. In an SDN architecture, control functions are implemented as network applications, and run in a logically centralized network operating system (NOS). The NOS provides the applications with abstractions for common functions, such as network discovery, installation of forwarding behaviour, and state distribution. Network management can be handled programmatically instead of manually, and new features can be introduced by simply updating or adding a control application in the NOS. Given proper design, an SDN architecture could improve the performance of reactive approaches to restoring traffic after a network failure. However, it has been shown in this dissertation that a reactive approach to traffic restoration will not meet the requirements of carrier grade networks, which require that traffic is redirected onto a back-up route less than 50 ms after the failure is detected. To achieve 50 ms recovery, a proactive approach must be used, where back-up rules are calculated and installed before a failure occurs. Several different protocols implement this proactive approach in traditional networks, and some work has also been done in the SDN space. However, current SDN solutions for fast recovery are not necessarily suitable for a carrier grade environment. This dissertation proposes a new failure recovery strategy for SDN, based on existing protocols used in traditional carrier grade networks. The use of segment routing allows for back-up routes to be encoded into the packet header when a failure occurs, without needing to inform other switches of the failure. Back-up routes follow the post-convergence path, meaning that they will not violate traffic engineering constraints on the network. An MPLS (multiprotocol label switching) data plane is used to ensure compatibility with current carrier networks, as MPLS is currently a common protocol in carrier networks. The proposed solution was implemented as a network application, on top of an open-source network operating system. A geographically distributed network testbed was used to verify the suitability for a geographically distributed carrier network. Proof of concept tests showed that the proposed solution provides complete protection for any single link, link aggregate or node failure in the network. In addition, communication latencies in the network do not influence the restoration time, as they do in reactive approaches. 
Finally, analysis of the back-up path metrics, such as back-up path lengths and the number of labels required, showed that the application installed efficient back-up paths.
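
    The core of the protection scheme described above is pre-computing, for each protected link, the post-convergence path and encoding it in the packet header at the point of local repair. The sketch below illustrates that idea in the spirit of TI-LFA-style segment routing; it is not the dissertation's application, and the label encoding shown is hypothetical.

```python
# Sketch of post-convergence backup-path computation: on failure of link
# (u, v), router u pushes a label stack steering packets along the shortest
# path the network would converge to anyway. Illustrative only; the label
# names are hypothetical.
import networkx as nx

def post_convergence_stack(g, u, v, dst):
    h = g.copy()
    h.remove_edge(u, v)                    # topology after the failure
    path = nx.shortest_path(h, u, dst, weight="weight")
    return [f"label({hop})" for hop in path[1:]]

g = nx.Graph()
g.add_weighted_edges_from([("u", "v", 1), ("v", "t", 1), ("u", "w", 1), ("w", "t", 3)])
print(post_convergence_stack(g, "u", "v", "t"))  # ['label(w)', 'label(t)']
```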

    Traffic Re-engineering: Extending Resource Pooling Through the Application of Re-feedback

    Parallelism pervades the Internet, yet efficiently pooling this increasing path diversity has remained elusive. With no holistic solution for resource pooling, each layer of the Internet architecture attempts to balance traffic according to its own needs, potentially at the expense of others. From the edges, traffic is implicitly pooled over multiple paths by retrieving content from different sources. Within the network, traffic is explicitly balanced across multiple links through the use of traffic engineering. This work explores how the current architecture can be realigned to facilitate resource pooling at both the network and transport layers, where tension between stakeholders is strongest. The central theme of this thesis is that traffic engineering can be performed more efficiently, flexibly and robustly through the use of re-feedback. A cross-layer architecture is proposed for sharing the responsibility for resource pooling across both hosts and network. Building on this framework, two novel forms of traffic management are evaluated. Efficient pooling of traffic across paths is achieved through the development of an in-network congestion balancer, which can function in the absence of multipath transport. Network and transport mechanisms are then designed and implemented to facilitate path fail-over, greatly improving resilience without requiring receiver-side cooperation. These contributions are framed by a longitudinal measurement study which provides evidence for many of the design choices taken. A methodology for scalably recovering flow metrics from passive traces is developed, which in turn is systematically applied to over five years of interdomain traffic data. The resulting findings challenge traditional assumptions on the preponderance of congestion control in resource sharing, with over half of all traffic being constrained by limits other than network capacity. All of the above represent concerted attempts to rethink and reassert traffic engineering in an Internet where competing solutions for resource pooling proliferate. By delegating responsibilities currently overloading the routing architecture towards hosts and re-engineering traffic management around the core strengths of the network, the proposed architectural changes allow the tussle surrounding resource pooling to be drawn out without compromising the scalability and evolvability of the Internet.
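
    As a toy illustration of the pooling principle behind an in-network congestion balancer: split traffic across parallel paths in inverse proportion to the congestion each path signals (for example, its ECN marking rate). The thesis's re-feedback-based balancer is considerably more sophisticated; the numbers below are invented.

```python
# Toy illustration of the pooling principle only; the thesis's re-feedback
# balancer is far more involved. Traffic is split over parallel paths in
# inverse proportion to each path's congestion signal (e.g. ECN mark rate).
def split_weights(mark_rates, eps=1e-6):
    inv = [1.0 / (r + eps) for r in mark_rates]
    return [w / sum(inv) for w in inv]

print(split_weights([0.02, 0.01]))  # less-congested second path gets ~2/3
```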