70 research outputs found

    An Efficient Implementation of Distributed Routing Algorithms for NoCs

    Full text link
    The design of NoCs for multi-core chips introduces new design constraints like power consumption, area, and ul-tra low latencies. Although 2D meshes are preferred, het-erogeneous blocks, fabrication faults, reliability issues, and chip virtualization may lead to the need of irregular topolo-gies or regions. In this situation, efficient routing becomes a challenge. Although the use of routing tables at switches is flexible, it does not scale in terms of latency and area due to its memory requirements. LBDR (Logic-Based Distributed Routing) is proposed as a new routing method that removes the need of using rout-ing tables at all. LBDR enables the implementation of many routing algorithms on most of the practical topologies we might find in the near future in a multi-core system. From an initial topology and routing algorithm, a set of three bits per switch/output port is computed. Evaluation results show that, by using a small logic, LBDR mimics the performance of routing algorithms when implemented with routing ta-bles, both in regular and irregular topologies.

    Edinet: An Execution Driven Interconnection Network Simulator for DSM Systems

    Get PDF
    Abstract. Evaluation studies on interconnection networks for distributed memory multiprocessors usually assume synthetic or trace-driven workloads. However, when the final design choices must be done a more precise evaluation study should be performed. In this paper, we describe a new execution-driven simulation tool to evaluate interconnection networks for distributed memory multiprocessors using real application workloads. As an example, we have developed a NCC-NUMA memory model and obtained some simulation results from the SPLASH-2 suite, using different network routing algorithms

    Boosting Ethernet Performance by Segment-Based Routing

    Full text link
    Ethernet is turning out to be a cost-effective solution for building Cluster networks offering compatibility, simpli-city, high bandwidth, scalability and a good performance-to-cost ratio. Nevertheless, Ethernet still makes inefficient use of network resources (links) and suffers from long fail-ure recovery time due to the lack of a suitable routing algo-rithm. In this paper we embed an efficient routing algorithm into 802.3 Ethernet technology, making it possible to use off-the-shelf equipment to build high-performance and cost-effective Ethernet clusters, with an efficient use of link band-width and with fault tolerant capabilities. The algorithm, referred to as Segment-Based Routing (SR), is a determinis-tic routing algorithm that achieves high performance with-out the need for virtual channels (not available in Ether-net). Moreover, SR is topology agnostic, meaning it can be applied to any topology, and tolerates any combination of faults derived from the original topology when combined with static reconfiguration. Through simulations we ver-ify an overall improvement in throughput by a factor of 1.2 to 10.0 when compared to the conventional Ethernet rou-ting algorithm, the Spanning Tree Protocol (STP), and other topology agnostic routing algorithms such as Up*/Down* and Tree-based Turn-prohibition, the last one being recently proposed for Ethernet.

    RECN-IQ: A Cost-Effective Input-Queued Switch Architecture with Congestion Management

    Full text link
    As the number of computing and storage nodes keeps in-creasing, the interconnection network is becoming a key element of many computing and communication systems, where the overall performance directly depends on network performance. This performance may dramatically drop during congestion situations. Although congestion may be avoided by overdimensioning the network, the current trend is to reduce overall cost and power consumption by reduc-ing the number of network components. Thus, the network will be prone to congestion, thereby becoming mandatory the use of congestion management techniques. In that sense, the technique known as Regional Explicit Congestion Notification (RECN) completely eliminates the Head-of-Line (HOL) blocking produced by congested pack

    Segment-based routing: an efficient fault-tolerant routing algorithm for meshes and tori

    Full text link
    Computers get faster every year, but the demand for computing resources seems to grow at an even faster rate. Depending on the problem domain, this demand for more power can be satisfied by either, massively parallel com-puters, or clusters of computers. Common for both ap-proaches is the dependence on high performance intercon-nect networks such as Myrinet, Infiniband, or 10 Giga-bit Ethernet. While high throughput and low latency are key features of interconnection networks, the issue of fault-tolerance is now becoming increasingly important. As the number of network components grows so does the probabil-ity for failure, thus it becomes important to also consider the fault-tolerance mechanism of interconnection networks. The main challenge then lies in combining performance and fault-tolerance, while still keeping cost and complexity low. This paper proposes a new deterministic routing method-ology for tori and meshes, which achieves high performance without the use of virtual channels. Furthermore, it is topol-ogy agnostic in nature, meaning it can handle any topol-ogy derived from any combination of faults when combined with static reconfiguration. The algorithm, referred to as Segment-based Routing (SR), works by partitioning a topol-ogy into subnets, and subnets into segments. This allows us to place bidirectional turn restrictions locally within a seg-ment. As segments are independent, we gain the freedom to place turn restrictions within a segment independently from other segments. This results in a larger degree of freedom when placing turn restrictions compared to other routing strategies. In this paper a way to compute segment-based routing tables is presented and applied to meshes and tori. Evalua-tion results show that SR increases performance by a factor of 1.8 over FX and up*/down * routing. ∗This work was supported by the Spanish CICYT under Gran

    Efficient Routing in Heterogeneous SoC Designs with Small Implementation Overhead

    Get PDF
    In application-specific SoCs, the irregularity of the topology ends up in a complex and customized implementation of the routing algorithm, usually relying on routing tables implemented with memory structures at source end nodes. As system size increases, the routing tables also increase in size with nonnegligible impact on power, area, and latency overheads. In this paper, we present a routing implementation for application-specific SoCs able to implement in an efficient manner (with no routing tables and using a small logic block in every switch) a deadlock-free routing algorithm in these irregular networks. The mechanism relies on a tool that maps the initial irregular topology of the SoC system into a logical regular structure where the mechanism can be applied. We provide details for both the mapping tool and the proposed routing mechanism. Evaluation results show the effectiveness of the mapping tool as well as the low area and timing requirements of the mechanism. With the mapping tool and the routing mechanism, complex irregular SoC topologies can now be supported without the need for routing tables

    Enabling dynamic and intelligent workflows for HPC, data analytics, and AI convergence

    Get PDF
    The evolution of High-Performance Computing (HPC) platforms enables the design and execution of progressively larger and more complex workflow applications in these systems. The complexity comes not only from the number of elements that compose the workflows but also from the type of computations they perform. While traditional HPC workflows target simulations and modelling of physical phenomena, current needs require in addition data analytics (DA) and artificial intelligence (AI) tasks. However, the development of these workflows is hampered by the lack of proper programming models and environments that support the integration of HPC, DA, and AI, as well as the lack of tools to easily deploy and execute the workflows in HPC systems. To progress in this direction, this paper presents use cases where complex workflows are required and investigates the main issues to be addressed for the HPC/DA/AI convergence. Based on this study, the paper identifies the challenges of a new workflow platform to manage complex workflows. Finally, it proposes a development approach for such a workflow platform addressing these challenges in two directions: first, by defining a software stack that provides the functionalities to manage these complex workflows; and second, by proposing the HPC Workflow as a Service (HPCWaaS) paradigm, which leverages the software stack to facilitate the reusability of complex workflows in federated HPC infrastructures. Proposals presented in this work are subject to study and development as part of the EuroHPC eFlows4HPC project.This work has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955558. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and Spain, Germany, France, Italy, Poland, Switzerland and Norway. In Spain, it has received complementary funding from MCIN/AEI/10.13039/501100011033, Spain and the European Union NextGenerationEU/PRTR (contracts PCI2021-121957, PCI2021-121931, PCI2021-121944, and PCI2021-121927). In Germany, it has received complementary funding from the German Federal Ministry of Education and Research (contracts 16HPC016K, 6GPC016K, 16HPC017 and 16HPC018). In France, it has received financial support from Caisse des dépôts et consignations (CDC) under the action PIA ADEIP (project Calculateurs). In Italy, it has been preliminary approved for complimentary funding by Ministero dello Sviluppo Economico (MiSE) (ref. project prop. 2659). In Norway, it has received complementary funding from the Norwegian Research Council, Norway under project number 323825. In Switzerland, it has been preliminary approved for complimentary funding by the State Secretariat for Education, Research, and Innovation (SERI), Norway. In Poland, it is partially supported by the National Centre for Research and Development under decision DWM/EuroHPCJU/4/2021. The authors also acknowledge financial support by MCIN/AEI /10.13039/501100011033, Spain through the “Severo Ochoa Programme for Centres of Excellence in R&D” under Grant CEX2018-000797-S, the Spanish Government, Spain (contract PID2019-107255 GB) and by Generalitat de Catalunya, Spain (contract 2017-SGR-01414). Anna Queralt is a Serra Húnter Fellow.With funding from the Spanish government through the ‘Severo Ochoa Centre of Excellence’ accreditation (CEX2018-000797-S)
    corecore