601 research outputs found

    Randomized Local Fast Rerouting for Datacenter Networks with Almost Optimal Congestion

    Get PDF
    To ensure high availability, datacenter networks must rely on local fast rerouting mechanisms that allow routers to quickly react to link failures, in a fully decentralized manner. However, configuring these mechanisms to provide a high resilience against multiple failures while avoiding congestion along failover routes is algorithmically challenging, as the rerouting rules can only depend on local failure information and must be defined ahead of time. This paper presents a randomized local fast rerouting algorithm for Clos networks, the predominant datacenter topologies. Given a graph G=(V,E)G=(V,E) describing a Clos topology, our algorithm defines local routing rules for each node v∈Vv\in V, which only depend on the packet's destination and are conditioned on the incident link failures. We prove that as long as number of failures at each node does not exceed a certain bound, our algorithm achieves an asymptotically minimal congestion up to polyloglog factors along failover paths. Our lower bounds are developed under some natural routing assumptions

    Segment Routing: a Comprehensive Survey of Research Activities, Standardization Efforts and Implementation Results

    Full text link
    Fixed and mobile telecom operators, enterprise network operators and cloud providers strive to face the challenging demands coming from the evolution of IP networks (e.g. huge bandwidth requirements, integration of billions of devices and millions of services in the cloud). Proposed in the early 2010s, Segment Routing (SR) architecture helps face these challenging demands, and it is currently being adopted and deployed. SR architecture is based on the concept of source routing and has interesting scalability properties, as it dramatically reduces the amount of state information to be configured in the core nodes to support complex services. SR architecture was first implemented with the MPLS dataplane and then, quite recently, with the IPv6 dataplane (SRv6). IPv6 SR architecture (SRv6) has been extended from the simple steering of packets across nodes to a general network programming approach, making it very suitable for use cases such as Service Function Chaining and Network Function Virtualization. In this paper we present a tutorial and a comprehensive survey on SR technology, analyzing standardization efforts, patents, research activities and implementation results. We start with an introduction on the motivations for Segment Routing and an overview of its evolution and standardization. Then, we provide a tutorial on Segment Routing technology, with a focus on the novel SRv6 solution. We discuss the standardization efforts and the patents providing details on the most important documents and mentioning other ongoing activities. We then thoroughly analyze research activities according to a taxonomy. We have identified 8 main categories during our analysis of the current state of play: Monitoring, Traffic Engineering, Failure Recovery, Centrally Controlled Architectures, Path Encoding, Network Programming, Performance Evaluation and Miscellaneous...Comment: SUBMITTED TO IEEE COMMUNICATIONS SURVEYS & TUTORIAL

    Routing on the Channel Dependency Graph:: A New Approach to Deadlock-Free, Destination-Based, High-Performance Routing for Lossless Interconnection Networks

    Get PDF
    In the pursuit for ever-increasing compute power, and with Moore's law slowly coming to an end, high-performance computing started to scale-out to larger systems. Alongside the increasing system size, the interconnection network is growing to accommodate and connect tens of thousands of compute nodes. These networks have a large influence on total cost, application performance, energy consumption, and overall system efficiency of the supercomputer. Unfortunately, state-of-the-art routing algorithms, which define the packet paths through the network, do not utilize this important resource efficiently. Topology-aware routing algorithms become increasingly inapplicable, due to irregular topologies, which either are irregular by design, or most often a result of hardware failures. Exchanging faulty network components potentially requires whole system downtime further increasing the cost of the failure. This management approach becomes more and more impractical due to the scale of today's networks and the accompanying steady decrease of the mean time between failures. Alternative methods of operating and maintaining these high-performance interconnects, both in terms of hardware- and software-management, are necessary to mitigate negative effects experienced by scientific applications executed on the supercomputer. However, existing topology-agnostic routing algorithms either suffer from poor load balancing or are not bounded in the number of virtual channels needed to resolve deadlocks in the routing tables. Using the fail-in-place strategy, a well-established method for storage systems to repair only critical component failures, is a feasible solution for current and future HPC interconnects as well as other large-scale installations such as data center networks. Although, an appropriate combination of topology and routing algorithm is required to minimize the throughput degradation for the entire system. This thesis contributes a network simulation toolchain to facilitate the process of finding a suitable combination, either during system design or while it is in operation. On top of this foundation, a key contribution is a novel scheduling-aware routing, which reduces fault-induced throughput degradation while improving overall network utilization. The scheduling-aware routing performs frequent property preserving routing updates to optimize the path balancing for simultaneously running batch jobs. The increased deployment of lossless interconnection networks, in conjunction with fail-in-place modes of operation and topology-agnostic, scheduling-aware routing algorithms, necessitates new solutions to solve the routing-deadlock problem. Therefore, this thesis further advances the state-of-the-art by introducing a novel concept of routing on the channel dependency graph, which allows the design of an universally applicable destination-based routing capable of optimizing the path balancing without exceeding a given number of virtual channels, which are a common hardware limitation. This disruptive innovation enables implicit deadlock-avoidance during path calculation, instead of solving both problems separately as all previous solutions
    • …
    corecore