470 research outputs found

    Optimal fault-tolerant routings with small routing tables for k-connected graphs

    Get PDF
    AbstractWe study the problem of designing fault-tolerant routings with small routing tables for a k-connected network of n processors in the surviving route graph model. The surviving route graph R(G,ρ)/F for a graph G, a routing ρ and a set of faults F is a directed graph consisting of nonfaulty nodes of G with a directed edge from a node x to a node y iff there are no faults on the route from x to y. The diameter of the surviving route graph could be one of the fault-tolerance measures for the graph G and the routing ρ and it is denoted by D(R(G,ρ)/F). We want to reduce the total number of routes defined in the routing, and the maximum of the number of routes defined for a node (called route degree) as least as possible. In this paper, we show that we can construct a routing λ for every n-node k-connected graph such that n⩾2k2, in which the route degree is O(kn), the total number of routes is O(k2n) and D(R(G,λ)/F)⩽3 for any fault set F(|F|<k). In particular, in the case that k=2 we can construct a routing λ′ for every biconnected graph in which the route degree is O(n), the total number of routes is O(n) and D(R(G,λ′)/{f})⩽3 for any fault f. We also show that we can construct a routing ρ1 for every n-node biconnected graph, in which the total number of routes is O(n) and D(R(G,ρ1)/{f})⩽2 for any fault f, and a routing ρ2 (using ρ1) for every n-node biconnected graph, in which the route degree is O(n), the total number of routes is O(nn) and D(R(G,ρ2)/{f})⩽2 for any fault f

    Routing on the Channel Dependency Graph:: A New Approach to Deadlock-Free, Destination-Based, High-Performance Routing for Lossless Interconnection Networks

    Get PDF
    In the pursuit for ever-increasing compute power, and with Moore's law slowly coming to an end, high-performance computing started to scale-out to larger systems. Alongside the increasing system size, the interconnection network is growing to accommodate and connect tens of thousands of compute nodes. These networks have a large influence on total cost, application performance, energy consumption, and overall system efficiency of the supercomputer. Unfortunately, state-of-the-art routing algorithms, which define the packet paths through the network, do not utilize this important resource efficiently. Topology-aware routing algorithms become increasingly inapplicable, due to irregular topologies, which either are irregular by design, or most often a result of hardware failures. Exchanging faulty network components potentially requires whole system downtime further increasing the cost of the failure. This management approach becomes more and more impractical due to the scale of today's networks and the accompanying steady decrease of the mean time between failures. Alternative methods of operating and maintaining these high-performance interconnects, both in terms of hardware- and software-management, are necessary to mitigate negative effects experienced by scientific applications executed on the supercomputer. However, existing topology-agnostic routing algorithms either suffer from poor load balancing or are not bounded in the number of virtual channels needed to resolve deadlocks in the routing tables. Using the fail-in-place strategy, a well-established method for storage systems to repair only critical component failures, is a feasible solution for current and future HPC interconnects as well as other large-scale installations such as data center networks. Although, an appropriate combination of topology and routing algorithm is required to minimize the throughput degradation for the entire system. This thesis contributes a network simulation toolchain to facilitate the process of finding a suitable combination, either during system design or while it is in operation. On top of this foundation, a key contribution is a novel scheduling-aware routing, which reduces fault-induced throughput degradation while improving overall network utilization. The scheduling-aware routing performs frequent property preserving routing updates to optimize the path balancing for simultaneously running batch jobs. The increased deployment of lossless interconnection networks, in conjunction with fail-in-place modes of operation and topology-agnostic, scheduling-aware routing algorithms, necessitates new solutions to solve the routing-deadlock problem. Therefore, this thesis further advances the state-of-the-art by introducing a novel concept of routing on the channel dependency graph, which allows the design of an universally applicable destination-based routing capable of optimizing the path balancing without exceeding a given number of virtual channels, which are a common hardware limitation. This disruptive innovation enables implicit deadlock-avoidance during path calculation, instead of solving both problems separately as all previous solutions

    NoCo: ILP-based worst-case contention estimation for mesh real-time manycores

    Get PDF
    Manycores are capable of providing the computational demands required by functionally-advanced critical applications in domains such as automotive and avionics. In manycores a network-on-chip (NoC) provides access to shared caches and memories and hence concentrates most of the contention that tasks suffer, with effects on the worst-case contention delay (WCD) of packets and tasks' WCET. While several proposals minimize the impact of individual NoC parameters on WCD, e.g. mapping and routing, there are strong dependences among these NoC parameters. Hence, finding the optimal NoC configurations requires optimizing all parameters simultaneously, which represents a multidimensional optimization problem. In this paper we propose NoCo, a novel approach that combines ILP and stochastic optimization to find NoC configurations in terms of packet routing, application mapping, and arbitration weight allocation. Our results show that NoCo improves other techniques that optimize a subset of NoC parameters.This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2015- 65316-P and the HiPEAC Network of Excellence. It also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (agreement No. 772773). Carles Hernández is jointly supported by the MINECO and FEDER funds through grant TIN2014-60404-JIN. Jaume Abella has been partially supported by the Spanish Ministry of Economy and Competitiveness under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717. Enrico Mezzetti has been partially supported by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva-Incorporaci´on postdoctoral fellowship number IJCI-2016-27396.Peer ReviewedPostprint (author's final draft

    Proxy Circuits for Fault-Tolerant Primitive Interfacing in Reconfigurable Devices Targeting Extreme Environments

    Get PDF
    Continuous interface access to device-level primitives in reconfigurable devices in extreme environments is key to reliable operation. However, it is possible for a primitive's interface controller, which is static to be rendered non-operational by a permanent damage in the controller's circuitry. In order to mitigate this, this paper proposes the use of relocatable proxy circuits to provide remote interfacing capability to primitives from anywhere on a reconfigurable device. A demonstration with device register read controller shows that an improvement in fault-tolerance can be achieved

    Fault-tolerant networks-on-chip routing with coarse and fine-grained look-ahead

    Get PDF
    Fault tolerance and adaptive capabilities are challenges for modern networks-on-chip (NoC) due to the increase in physical defects in advanced manufacturing processes. Two novel adaptive routing algorithms, namely coarse and fine-grained (FG) look-ahead algorithms, are proposed in this paper to enhance 2-D mesh/torus NoC system fault-tolerant capabilities. These strategies use fault flag codes from neighboring nodes to obtain the status or conditions of real-time traffic in an NoC region, then calculate the path weights and choose the route to forward packets. This approach enables the router to minimize congestion for the adjacent connected channels and also to bypass a path with faulty channels by looking ahead at distant neighboring router paths. The novelty of the proposed routing algorithms is the weighted path selection strategies, which make near-optimal routing decisions to maintain the NoC system performance under high fault rates. Results show that the proposed routing algorithms can achieve performance improvement compared to other state of the art works under various traffic loads and high fault rates. The routing algorithm with FG look-ahead capability achieves a higher throughput compared with the coarse-grained approach under complex fault patterns. The hardware area/power overheads of both routing approaches are relatively low which does not prohibit scalability for large-scale NoC implementations

    On Achieving the Shortest-Path Routing in 2-D Meshes

    Get PDF
    corecore