507 research outputs found

    Routing on the Channel Dependency Graph: A New Approach to Deadlock-Free, Destination-Based, High-Performance Routing for Lossless Interconnection Networks

    In the pursuit of ever-increasing compute power, and with Moore's law slowly coming to an end, high-performance computing started to scale out to larger systems. Alongside the increasing system size, the interconnection network is growing to accommodate and connect tens of thousands of compute nodes. These networks have a large influence on total cost, application performance, energy consumption, and overall system efficiency of the supercomputer. Unfortunately, state-of-the-art routing algorithms, which define the packet paths through the network, do not utilize this important resource efficiently. Topology-aware routing algorithms become increasingly inapplicable due to irregular topologies, which are either irregular by design or, most often, a result of hardware failures. Exchanging faulty network components potentially requires whole-system downtime, further increasing the cost of the failure. This management approach becomes more and more impractical due to the scale of today's networks and the accompanying steady decrease of the mean time between failures. Alternative methods of operating and maintaining these high-performance interconnects, both in terms of hardware and software management, are necessary to mitigate negative effects experienced by scientific applications executed on the supercomputer. However, existing topology-agnostic routing algorithms either suffer from poor load balancing or are not bounded in the number of virtual channels needed to resolve deadlocks in the routing tables. Using the fail-in-place strategy, a well-established method for storage systems to repair only critical component failures, is a feasible solution for current and future HPC interconnects as well as other large-scale installations such as data center networks. However, an appropriate combination of topology and routing algorithm is required to minimize the throughput degradation for the entire system.
This thesis contributes a network simulation toolchain to facilitate the process of finding a suitable combination, either during system design or while it is in operation. On top of this foundation, a key contribution is a novel scheduling-aware routing, which reduces fault-induced throughput degradation while improving overall network utilization. The scheduling-aware routing performs frequent property-preserving routing updates to optimize the path balancing for simultaneously running batch jobs. The increased deployment of lossless interconnection networks, in conjunction with fail-in-place modes of operation and topology-agnostic, scheduling-aware routing algorithms, necessitates new solutions to solve the routing-deadlock problem. Therefore, this thesis further advances the state of the art by introducing a novel concept of routing on the channel dependency graph, which allows the design of a universally applicable destination-based routing capable of optimizing the path balancing without exceeding a given number of virtual channels, which are a common hardware limitation. This disruptive innovation enables implicit deadlock-avoidance during path calculation, instead of solving both problems separately as all previous solutions
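
    For readers unfamiliar with the channel dependency graph mentioned in this abstract, the following is a minimal sketch of the classic after-the-fact deadlock check, not the thesis's own routing method: each directed link is a channel, a route that uses one channel and then the next creates a dependency between them, and a set of routes on a single virtual channel can deadlock only if those dependencies form a cycle. The function names and the example routes are assumptions made for illustration.

```python
def channel_dependency_graph(paths):
    """Build the channel dependency graph for a set of routes.

    A channel is a directed link (u, v). A route that traverses (u, v)
    and then (v, w) adds the dependency (u, v) -> (v, w).
    """
    cdg = {}
    for path in paths:
        channels = list(zip(path, path[1:]))
        for channel in channels:
            cdg.setdefault(channel, set())
        for held, requested in zip(channels, channels[1:]):
            cdg[held].add(requested)
    return cdg


def has_cycle(graph):
    """Iterative depth-first search; True if the directed graph has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    for root in graph:
        if color[root] != WHITE:
            continue
        color[root] = GREY
        stack = [(root, iter(graph[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if color[nxt] == GREY:
                    return True  # back edge: cyclic dependency
                if color[nxt] == WHITE:
                    color[nxt] = GREY
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                color[node] = BLACK
                stack.pop()
    return False


# Hypothetical example: four switches 0..3 connected in a ring.
# Routing every two-hop route clockwise closes a dependency cycle.
clockwise_routes = [[0, 1, 2], [1, 2, 3], [2, 3, 0], [3, 0, 1]]
# Routes along a line segment in both directions use disjoint channel chains.
line_routes = [[0, 1, 2], [1, 2, 3], [3, 2, 1], [2, 1, 0]]

print(has_cycle(channel_dependency_graph(clockwise_routes)))
print(has_cycle(channel_dependency_graph(line_routes)))
```

    In this sketch the cycle check runs only after the routes are fixed, which is the separate-steps style the abstract contrasts with computing paths directly on the dependency graph.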

    Survivability in layered networks

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 195-204). In layered networks, a single failure at the lower (physical) layer may cause multiple failures at the upper (logical) layer. As a result, traditional schemes that protect against single failures may not be effective in layered networks. This thesis studies the problem of maximizing network survivability in the layered setting, with a focus on optimizing the embedding of the logical network onto the physical network. In the first part of the thesis, we start with an investigation of the fundamental properties of layered networks, and show that basic network connectivity structures, such as cuts, paths and spanning trees, exhibit fundamentally different characteristics from their single-layer counterparts. This leads to our development of a new cross-layer survivability metric that properly quantifies the resilience of the layered network against physical failures. Using this new metric, we design algorithms to embed the logical network onto the physical network based on multi-commodity flows, to maximize the cross-layer survivability. In the second part of the thesis, we extend our model to a random failure setting and study the cross-layer reliability of the networks, defined to be the probability that the upper layer network stays connected under the random failure events. We generalize the classical polynomial expression for network reliability to the layered setting. Using Monte-Carlo techniques, we develop efficient algorithms to compute an approximate polynomial expression for reliability, as a function of the link failure probability. The construction of the polynomial eliminates the need to resample when the cross-layer reliability under different link failure probabilities is assessed.
Furthermore, the polynomial expression provides important insight into the connection between the link failure probability, the cross-layer reliability and the structure of a layered network. We show that in general the optimal embedding depends on the link failure probability, and characterize the properties of embeddings that maximize the reliability under different failure probability regimes. Based on these results, we propose new iterative approaches to improve the reliability of the layered networks. We demonstrate via extensive simulations that these new approaches result in embeddings with significantly higher reliability than existing algorithms. by Kayi Lee. Ph.D.
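
    The idea of sampling once and then evaluating reliability for any link failure probability can be illustrated with a small sketch. It assumes independent physical link failures with probability p and estimates, for each failure count i, the fraction f_i of i-link failure sets that leave the logical layer connected; reliability is then approximated by the sum over i of C(m, i) f_i p^i (1 - p)^(m - i) for m physical links. The function names, the toy embedding, and the plain uniform sampling are assumptions for illustration, not the thesis's algorithm.

```python
import random
from math import comb


def logical_connected(nodes, logical_edges, failed):
    """True if the logical graph stays connected once every logical edge
    whose physical route uses a failed physical link is removed."""
    adjacency = {n: [] for n in nodes}
    for (u, v), route in logical_edges.items():
        if not (route & failed):
            adjacency[u].append(v)
            adjacency[v].append(u)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        current = stack.pop()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(nodes)


def estimate_survival_fractions(nodes, logical_edges, physical_links, samples, rng):
    """Estimate f_i for i = 0..m by sampling i-link failure sets uniformly."""
    fractions = []
    for i in range(len(physical_links) + 1):
        survived = 0
        for _ in range(samples):
            failed = frozenset(rng.sample(physical_links, i))
            survived += logical_connected(nodes, logical_edges, failed)
        fractions.append(survived / samples)
    return fractions


def reliability(fractions, p):
    """Evaluate the approximate polynomial at link failure probability p."""
    m = len(fractions) - 1
    return sum(comb(m, i) * f * p**i * (1 - p) ** (m - i)
               for i, f in enumerate(fractions))


# Hypothetical embedding: logical triangle A-B-C over three physical links.
# Logical edge A-C is routed over "ab" and "bc", so one physical failure
# can remove two logical edges at once.
nodes = ["A", "B", "C"]
physical_links = ["ab", "bc", "ca"]
logical_edges = {
    ("A", "B"): frozenset({"ab"}),
    ("B", "C"): frozenset({"bc"}),
    ("A", "C"): frozenset({"ab", "bc"}),
}

fractions = estimate_survival_fractions(
    nodes, logical_edges, physical_links, samples=2000, rng=random.Random(0))
for p in (0.01, 0.1, 0.5):
    print(p, reliability(fractions, p))  # no resampling between values of p
```

    The sampled fractions are computed once; evaluating a different p only re-evaluates the polynomial, which is the property the abstract highlights.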

    A Survey of Scheduling in Time-Sensitive Networking (TSN)

    TSN is an enhancement of Ethernet which provides various mechanisms for real-time communication. Time-triggered (TT) traffic represents periodic data streams with strict real-time requirements. Among other mechanisms, TSN supports scheduled transmission of TT streams, i.e., the transmission of their packets by edge nodes is coordinated in such a way that no or very little queuing delay occurs in intermediate nodes. TSN supports multiple priority queues per egress port. The Time-Aware Shaper (TAS) uses so-called gates to explicitly allow and block these queues for transmission on a short periodic timescale. The TAS is utilized to protect scheduled traffic from other traffic to minimize its queuing delay. In this work, we consider scheduling in TSN, which comprises the computation of periodic transmission instants at edge nodes and the periodic opening and closing of queue gates. In this paper, we first give a brief overview of TSN features and standards. We state the TSN scheduling problem and explain common extensions which also include optimization problems. We review scheduling and optimization methods that have been used in this context. Then, the contribution of currently available research work is surveyed. We extract and compile optimization objectives, solved problem instances, and evaluation results. Research domains are identified, and specific contributions are analyzed. Finally, we discuss potential research directions and open problems. Comment: 34 pages, 19 figures, 9 tables 110 reference
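
    The periodic opening and closing of queue gates described above can be pictured as a cyclic list of (duration, gate mask) entries. The sketch below is a simplified illustration under stated assumptions: eight queues per egress port, durations in microseconds, and a hypothetical two-entry schedule. The function name and the data layout are invented for this example and do not reproduce any standard's data model.

```python
def open_queues(gate_control_list, t_us):
    """Return the queue indices whose gates are open at time t_us.

    gate_control_list is a list of (duration_us, mask) entries that repeats
    cyclically; bit q of mask set means the gate of queue q is open.
    """
    cycle_us = sum(duration for duration, _ in gate_control_list)
    offset = t_us % cycle_us
    for duration, mask in gate_control_list:
        if offset < duration:
            return [q for q in range(8) if (mask >> q) & 1]
        offset -= duration
    return []  # unreachable for a non-empty list with positive durations


# Hypothetical 1000 us cycle: queue 7 (scheduled traffic) has exclusive
# access for the first 200 us, then queues 0-6 share the remaining 800 us.
schedule = [
    (200, 0b10000000),
    (800, 0b01111111),
]

print(open_queues(schedule, 100))
print(open_queues(schedule, 1250))
```

    Closing the other gates while queue 7 is open is what keeps lower-priority frames from delaying the scheduled stream in this toy schedule.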

    Survivable Cloud Network Mapping for Disaster Recovery Support

    Network virtualization is a key provision for improving the scalability and reliability of cloud computing services. In recent years, various mapping schemes have been developed to reserve virtual network (VN) resources over substrate networks. However, many cloud providers are very concerned about improving service reliability under catastrophic disaster conditions that yield multiple system failures. To address this challenge, this work presents a novel failure region-disjoint VN mapping scheme to improve VN mapping survivability. The problem is first formulated as a mixed integer linear programming problem, and then two heuristic solutions are proposed to compute a pair of failure region-disjoint VN mappings. The solution also takes into account mapping costs and load balancing concerns to help improve resource efficiency. The schemes are then analyzed in detail for a variety of networks, and their overall performance is compared to some existing survivable VN mapping scheme

    Logical topology design for IP rerouting: ASONs versus static OTNs

    IP-based backbone networks are gradually moving to a network model consisting of high-speed routers that are flexibly interconnected by a mesh of light paths set up by an optical transport network that consists of wavelength division multiplexing (WDM) links and optical cross-connects. In such a model, the generalized MPLS protocol suite could provide the IP-centric control plane component that will be used to deliver rapid and dynamic circuit provisioning of end-to-end optical light paths between the routers. This is called an automatic switched optical (transport) network (ASON). An ASON enables reconfiguration of the logical IP topology by setting up and tearing down light paths. This allows link capacities to be upgraded or downgraded during a router failure to the capacities needed by the new routing of the affected traffic. Such survivability against (single) IP router failures is cost-effective, as capacity to the IP layer can be provided flexibly when necessary. We present and investigate a logical topology optimization problem that minimizes the total amount or cost of the needed resources (interfaces, wavelengths, WDM line-systems, amplifiers, etc.) in both the IP and the optical layer. A novel optimization aspect in this problem is the possibility, as a result of the ASON, to reuse the physical resources (like interface cards and WDM line-systems) over the different network states (the failure-free and all the router failure scenarios). We devised a simple optimization strategy to investigate the cost of the ASON approach and compare it with other schemes that survive single router failures

    Circuit design and analysis for on-FPGA communication systems

    On-chip communication systems have emerged as a prominent subject in Very-Large-Scale-Integration (VLSI) design, as the trend of technology scaling favours logic more than interconnects. Interconnects often dictate the system performance, and, therefore, research into new methodologies and system architectures that deliver high-performance communication services across the chip is mandatory. The interconnect challenge is exacerbated in the Field-Programmable Gate Array (FPGA), a type of ASIC where the hardware can be programmed post-fabrication. Communication across an FPGA will deteriorate as a result of interconnect scaling. The programmable fabrics, switches and the specific routing architecture also introduce additional latency and bandwidth degradation, further hindering intra-chip communication performance. Past research efforts mainly focused on optimizing logic elements and functional units in FPGAs. Communication with programmable interconnect received little attention and is inadequately understood. This thesis is among the first to research on-chip communication systems that are built on top of programmable fabrics and proposes methodologies to maximize the interconnect throughput performance. There are three major contributions in this thesis: (i) an analysis of on-chip interconnect fringing, which degrades the bandwidth of communication channels due to routing congestion in reconfigurable architectures; (ii) a new analogue wave signalling scheme that significantly improves the interconnect throughput by exploiting the fundamental electrical characteristics of the reconfigurable interconnect structures, and that can potentially mitigate the interconnect scaling challenges; and (iii) a novel Dynamic Programming (DP)-network to provide adaptive routing in network-on-chip (NoC) systems.
The DP-network architecture performs runtime optimization for route planning and dynamic routing, which effectively utilizes the in-silicon bandwidth. This thesis explores a new horizon in reconfigurable system design, in which new methodologies and concepts are proposed to enhance the on-FPGA communication throughput performance that is of vital importance in new technology processes