Search CORE

13 research outputs found

A general, fault tolerant, adaptive, deadlock-free routing protocol for network-on-chip

Author: Abadal S.
Alarcón E.
Colle Didier
Pickavet Mario
Stroobant Pieter
Tavernier Wouter
Publication venue
Publication date: 01/01/2018
Field of study

A General, Fault tolerant, Adaptive, Deadlock-free Routing Protocol for Network-on-chip

Author: Abadal Sergi
Alarcón Eduard
Colle Didier
Pickavet Mario
Stroobant Pieter
Tavernier Wouter
Publication venue
Publication date: 01/01/2018
Field of study

The paper presents a topology-agnostic greedy protocol for network-on-chip routing. The proposed routing algorithm can tolerate any number of permanent faults, and is proven to be deadlock-free. We introduce a specialized variant of the algorithm, which is optimized for 2D mesh networks, both flat and wireless. The adaptiveness and minimality of several variants this algorithm are analyzed through graph-based simulations.Comment: Presented at 11th International Workshop on Network on Chip Architectures (NoCArc 2018

arXiv.org e-Print Archive

Crossref

UPCommons. Portal del coneixement obert de la UPC

Ghent University Academic Bibliography

A General, Fault tolerant, Adaptive, Deadlock-free Routing Protocol for Network-on-chip

Author: Abadal Sergi
Alarcón Eduard
Colle Didier
Pickavet Mario
Stroobant Pieter
Tavernier Wouter
Publication venue
Publication date: 01/01/2018
Field of study

arXiv.org e-Print Archive

Crossref

Wageningen University & Research Publications

Online Research Database In Technology

Fail-in-Place Network Design: Interaction Between Topology, Routing Algorithm and Failures

Author: Jens Domke
Satoshi Matsuoka
Torsten Hoefler
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/12/2014
Field of study

Abstract—The growing system size of high performance com-puters results in a steady decrease of the mean time between failures. Exchanging network components often requires whole system downtime which increases the cost of failures. In this work, we study a fail-in-place strategy where broken network elements remain untouched. We show, that a fail-in-place strategy is feasible for todays networks and the degradation is manageable, and provide guidelines for the design. Our network failure simulation toolchain allows system designers to extrapolate the performance degradation based on expected failure rates, and it can be used to evaluate the current state of a system. In a case study of real-world HPC systems, we will analyze the performance degradation throughout the systems lifetime under the assumption that faulty network components are not repaired, which results in a recommendation to change the used routing algorithm to improve the network performance as well as the fail-in-place characteristic. Keywords—Network design, network simulations, network man-agement, fail-in-place, routing protocols, fault tolerance, availability I

CiteSeerX

Crossref

Slim Fly: A Cost Effective Low-Diameter Network Topology

Author: Maciej Besta
Torsten Hoefler
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014
Field of study

Abstract—We introduce a high-performance cost-effective net-work topology called Slim Fly that approaches the theoretically optimal network diameter. Slim Fly is based on graphs that approximate the solution to the degree-diameter problem. We analyze Slim Fly and compare it to both traditional and state-of-the-art networks. Our analysis shows that Slim Fly has significant advantages over other topologies in latency, bandwidth, resiliency, cost, and power consumption. Finally, we propose deadlock-free routing schemes and physical layouts for large computing centers as well as a detailed cost and power model. Slim Fly enables constructing cost effective and highly resilient datacenter and HPC networks that offer low latency and high bandwidth under different HPC workloads such as stencil or graph computations. I

CiteSeerX

Crossref

Teichien sogo ketsugomo no tame no sukeraburuna rutingu shuho

Author: Kawano Ryuta
カワノリュウタ
河野隆太
Publication venue: 慶應義塾大学大学院理工学研究科
Publication date
Field of study

KeiO Academic Resource Archive

Performance evaluation of different routing algorithms in network on chip

Author: Singh J K
Publication venue
Publication date: 02/06/2014
Field of study

Network on Chip (NoC) is one of the efficient on-chip communication architecture for System on Chip (SoC) where a large number of computational and storage blocks are integrated on a single chip. NoCs have tackled the disadvantages of SoCs as well as they are scalable. But an efficient routing algorithm can enhance the performance of NoC. In one chapter of the thesis three different types of routing algorithms are compared i.e. XY, OE, and DyAD. XY routing algorithm is a distributed deterministic algorithm. Odd-Even (OE) routing algorithm is distributed adaptive routing algorithm with deadlock-free ability. DyAD is a smart routing algorithm which combines the features of both deterministic and adaptive routing. In another chapter of thesis three different types of deadlock free routing algorithms are compared i.e. one deterministic routing (XY routing algorithm), three partially adaptive routing (West first, North last and Negative first) and two adaptive routing (DyXY, OE) are being compared with % of load for various traffic patterns. In another chapter of thesis, a fault tolerant algorithm is described and its performance is compared with all the deadlock free routing algorithms in a NoC having link faults and node faults. All these simulation is done in NIRGAM 2.1 simulator which is a cycle accurate systemC based simulator

ethesis@nitr

Routing on the Channel Dependency Graph:: A New Approach to Deadlock-Free, Destination-Based, High-Performance Routing for Lossless Interconnection Networks

Author: Domke Jens
Publication venue
Publication date: 16/06/2017
Field of study

In the pursuit for ever-increasing compute power, and with Moore's law slowly coming to an end, high-performance computing started to scale-out to larger systems. Alongside the increasing system size, the interconnection network is growing to accommodate and connect tens of thousands of compute nodes. These networks have a large influence on total cost, application performance, energy consumption, and overall system efficiency of the supercomputer. Unfortunately, state-of-the-art routing algorithms, which define the packet paths through the network, do not utilize this important resource efficiently. Topology-aware routing algorithms become increasingly inapplicable, due to irregular topologies, which either are irregular by design, or most often a result of hardware failures. Exchanging faulty network components potentially requires whole system downtime further increasing the cost of the failure. This management approach becomes more and more impractical due to the scale of today's networks and the accompanying steady decrease of the mean time between failures. Alternative methods of operating and maintaining these high-performance interconnects, both in terms of hardware- and software-management, are necessary to mitigate negative effects experienced by scientific applications executed on the supercomputer. However, existing topology-agnostic routing algorithms either suffer from poor load balancing or are not bounded in the number of virtual channels needed to resolve deadlocks in the routing tables. Using the fail-in-place strategy, a well-established method for storage systems to repair only critical component failures, is a feasible solution for current and future HPC interconnects as well as other large-scale installations such as data center networks. Although, an appropriate combination of topology and routing algorithm is required to minimize the throughput degradation for the entire system. This thesis contributes a network simulation toolchain to facilitate the process of finding a suitable combination, either during system design or while it is in operation. On top of this foundation, a key contribution is a novel scheduling-aware routing, which reduces fault-induced throughput degradation while improving overall network utilization. The scheduling-aware routing performs frequent property preserving routing updates to optimize the path balancing for simultaneously running batch jobs. The increased deployment of lossless interconnection networks, in conjunction with fail-in-place modes of operation and topology-agnostic, scheduling-aware routing algorithms, necessitates new solutions to solve the routing-deadlock problem. Therefore, this thesis further advances the state-of-the-art by introducing a novel concept of routing on the channel dependency graph, which allows the design of an universally applicable destination-based routing capable of optimizing the path balancing without exceeding a given number of virtual channels, which are a common hardware limitation. This disruptive innovation enables implicit deadlock-avoidance during path calculation, instead of solving both problems separately as all previous solutions

Technische Universität Dresden: Qucosa