47 research outputs found

    Comparative Analysis of the Efficiency of Pseudo-Optimal and Regular Network-on-Chip Topologies Using Netmaker

    Approaches to optimizing the communication subsystem of networks on chip are considered. Pseudo-optimal and regular network-on-chip topologies with 9 nodes were simulated using the System Verilog library Netmaker. The simulation results show that pseudo-optimal topologies are highly efficient when designing networks with a number of nodes and connecting links that cannot be achieved with standard regular topologies.

    Research on Balancing Traffic to Enhance Performance in Meshes

    Project number: NSC89-2213-E032-039. Research period: 2000-08 to 2001-07. Research funding: 401,000. Sponsorship: National Science Council, Executive Yuan.

    Effects of injection pressure on network throughput

    ©2006 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
    Recent parallel systems use multiple injection ports and various injection policies, but little is known about their impact on network performance. This paper evaluates the influence that these injection interfaces have on maximum sustained throughput in adaptive cut-through torus networks by modeling the number of injection queues (1 or 4) and the allocation of new packets to those queues. Network evaluations for medium to large 2D tori show that designs with multiple injection ports do not improve performance under uniform traffic. On the contrary, they result in more pressure from the injection interface to acquire the scarce network resources of an already clogged system. Interestingly, for small networks, a single injection FIFO queue, with the head-of-line blocking (HOLB) it entails, indirectly provides the much-needed injection control. For networks with thousands of nodes and multiple injection channels, such as those being implemented in current massively parallel processors, this implicit form of congestion control is not enough. In such systems, restrictive injection policies are required to prevent routers from being flooded with new packets for loads beyond saturation.
    C. Izu, J. Miguel-Alonso, J.A. Gregori
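    The head-of-line blocking effect mentioned in this abstract can be illustrated with a toy model. The sketch below is not the paper's simulator: the function name, the port labels, and the one-packet-per-queue-per-cycle rule are all assumptions made for illustration. It only shows why a single FIFO injects fewer packets than four separate queues when the head packet's output port is busy.

```python
from collections import deque

def injected_this_cycle(queues, free_ports):
    """Count packets injected in one cycle (toy model, assumed rules):
    each queue may inject only its head packet, and only if the output
    port that packet needs is still free."""
    free = set(free_ports)
    injected = 0
    for q in queues:
        if q and q[0] in free:
            free.remove(q[0])  # the port is taken for the rest of this cycle
            q.popleft()
            injected += 1
    return injected

# Hypothetical workload: four packets, labelled by the output port they need.
packets = ["X+", "Y+", "X-", "Y-"]
free_ports = {"Y+", "X-", "Y-"}  # port "X+" is busy

single_fifo = [deque(packets)]               # one injection queue
four_queues = [deque([p]) for p in packets]  # four injection queues

print(injected_this_cycle(single_fifo, free_ports))  # 0: the head packet blocks the rest
print(injected_this_cycle(four_queues, free_ports))  # 3: only the "X+" packet waits
```

    In this toy setting the single FIFO throttles injection as a side effect of blocking, which is the implicit injection control the abstract describes for small networks.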

    Performance analysis of the doubly-linked list protocol family for distributed shared memory systems

    The 2nd International Conference on Algorithms and Architectures for Parallel Processing, Singapore, 11-13 June 1996
    The doubly-linked list (DLL) protocol provides a memory efficient, scalable, high-performance and yet easy to implement method to maintain memory coherence in distributed shared memory (DSM) systems. In this paper, the performance analysis of the DLL family of protocols is presented. Theoretically, the DLL protocol with stable owners has the shortest remote memory access latency among the DLL protocol family. According to the simulated performance evaluation, the DLL-S protocol is 65.7% faster than the DDM algorithm for the linear equation solver, and is 16.5% faster for the matrix multiplier. From the trend of the performance figures, it is predicted that the improvement in performance due to the DLL-S protocol will be considerably greater when a larger number of processors are used, indicating that the DLL-S protocol is also the most scalable of the protocols tested.

    More Improvement by Helping Ant to Fault-Tolerant Heuristic Routing Algorithm in Mesh Networks

    Abstract: Routing with fault-tolerant mechanisms has a crucial effect on the fast exchange of information in a variety of networks, including mesh networks. This study attempts to choose an optimal path in terms of fault tolerance to transmit messages from source to destination while taking into account faulty nodes in such mesh networks. We use the ant colony optimization algorithm to propose Adaptive Heuristic Routing algorithms for this problem, and we use color pheromone ants to handle the fail-recover behavior of network components. The proposed method is compared with a fault-tolerant routing algorithm for mesh networks that uses the balanced ring. Simulation results show that the method reacts quickly to network faults, while in each time step the data can choose the optimal path to reach its destination. We further improve the performance of the proposed method by using update ants to inform other nodes about the discovered shortest path. Simulation results show that the proposed method dramatically increases the efficiency of the routing mechanism in mesh networks.

    A fault-tolerant routing strategy for k-ary n-direct s-indirect topologies based on intermediate nodes

    Exascale computing systems are being built with thousands of nodes. The high number of components of these systems significantly increases the probability of failure. A key component for them is the interconnection network. If failures occur in the interconnection network, they may isolate a large fraction of the machine. For this reason, an efficient fault-tolerant mechanism is needed to keep the system interconnected, even in the presence of faults. A recently proposed topology for these large systems is the hybrid k-ary n-direct s-indirect family, which provides optimal performance and connectivity at a reduced hardware cost. This paper presents a fault-tolerant routing methodology for the k-ary n-direct s-indirect topology that degrades performance gracefully in the presence of faults and tolerates a large number of faults without disabling any healthy computing node. In order to tolerate network failures, the methodology uses a simple mechanism. For any source-destination pair, if necessary, packets are forwarded to the destination node through a set of intermediate nodes (without being ejected from the network) with the aim of circumventing faults. The evaluation results show that the proposed methodology tolerates a large number of faults. For instance, it is able to tolerate more than 99.5% of fault combinations when there are 10 faults in a 3-D network with 1000 nodes using only 1 intermediate node, and more than 99.98% if 2 intermediate nodes are used. Furthermore, the methodology offers graceful performance degradation. As an example, performance degrades only by 1% for a 2-D network with 1024 nodes and 1% faulty links.
    This work was supported by the Spanish Ministerio de Economía y Competitividad (MINECO), by FEDER funds under Grant TIN2015-66972-C5-1-R, by Programa de Ayudas de Investigación y Desarrollo (PAID) from Universitat Politècnica de València and by the financial support of the FP7 HiPEAC Network of Excellence under grant agreement 287759.
    Peñaranda Cebrián, R.; Gómez Requena, ME.; López Rodríguez, PJ.; Gran, EG.; Skeie, T. (2017). A fault-tolerant routing strategy for k-ary n-direct s-indirect topologies based on intermediate nodes. Concurrency and Computation Practice and Experience. 29(13):1-11. https://doi.org/10.1002/cpe.4065
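    The idea of circumventing faults through an intermediate node can be sketched on a much simpler network than the one the paper targets. The code below assumes a plain 2D mesh with dimension-order (X then Y) routing and an exhaustive search for one intermediate node; it is not the k-ary n-direct s-indirect topology, it does not reproduce the authors' selection rule, and it ignores deadlock handling. All function names are hypothetical.

```python
def xy_path(src, dst):
    """Links traversed by dimension-order (X first, then Y) routing on a 2D mesh."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links

def is_clear(path, faulty):
    """True if no link on the path is faulty (links are treated as undirected)."""
    return all(frozenset(link) not in faulty for link in path)

def route(src, dst, faulty, size):
    """Return [] if the direct route avoids all faults, [node] if one
    intermediate node gives a fault-free route, or None if neither works."""
    if is_clear(xy_path(src, dst), faulty):
        return []
    for ix in range(size):
        for iy in range(size):
            mid = (ix, iy)
            if mid in (src, dst):
                continue
            # Route src -> mid -> dst, using XY routing on each leg.
            if is_clear(xy_path(src, mid) + xy_path(mid, dst), faulty):
                return [mid]
    return None

# Hypothetical 3x3 mesh with one faulty link between (0, 0) and (1, 0).
faulty = {frozenset({(0, 0), (1, 0)})}
print(route((0, 0), (2, 0), faulty, 3))  # [(0, 1)]
```

    Extending the search to pairs of intermediate nodes would correspond to the two-intermediate-node case mentioned in the abstract, at a higher search cost.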

    A Dag Based Wormhole Routing Strategy

    The wormhole routing (WR) technique is replacing the hitherto popular store-and-forward routing in message passing multicomputers, because the latter has speed and node size constraints. Wormhole routing is, on the other hand, susceptible to deadlock. A few WR schemes suggested recently in the literature concentrate on avoiding deadlock. This thesis presents a Directed Acyclic Graph (DAG) based WR technique. At low traffic levels the proposed method follows a minimal path, but the routing is adaptive at higher traffic levels. We prove that the algorithm is deadlock-free. The method is compared for performance with a deterministic algorithm that is a de facto standard. We also compare its implementation costs with those of other adaptive routing algorithms, and the relative merits and demerits are highlighted in the text.

    VLPW: The Very Long Packet Window Architecture for High Throughput Network-On-Chip Router Designs

    Chip Multi-Processor (CMP) architectures have become mainstream for designing processors. With a large number of cores, Network-On-Chip (NOC) provides a scalable communication method for CMPs. NOC must be carefully designed to provide low latencies and high throughput in the resource-constrained environment. To improve the network throughput, we propose the Very Long Packet Window (VLPW) architecture for the NOC router design, which tries to close the throughput gap between state-of-the-art on-chip routers and the ideal interconnect fabric. To improve throughput, VLPW optimizes Switch Allocation (SA) efficiency. Existing SA normally applies Round-Robin scheduling to arbitrate among the packets targeting the same output port. However, this simple approach suffers from low arbitration efficiency and yields low network throughput. Instead of relying solely on simple switch scheduling, the VLPW router design globally schedules all the input packets, resolves the output conflicts and achieves high throughput. With the VLPW architecture, we propose two scheduling schemes: Global Fairness and Global Diversity. Our simulation results show that the VLPW router achieves more than 20% throughput improvement without negative effects on zero-load latency.
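    For context, the Round-Robin arbitration that this abstract describes as the existing baseline can be sketched as follows. This is a generic per-output-port round-robin arbiter, not the VLPW scheduler and not code from the paper; the class and method names are assumptions.

```python
class RoundRobinArbiter:
    """Generic round-robin arbiter for one output port (illustrative sketch)."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.next_priority = 0  # input index that currently has highest priority

    def grant(self, requests):
        """Grant one of the requesting input indices, or return None if there
        are no requests. The search starts at the highest-priority input and
        wraps around; priority then moves to just past the winner."""
        requesting = set(requests)
        for offset in range(self.num_inputs):
            candidate = (self.next_priority + offset) % self.num_inputs
            if candidate in requesting:
                self.next_priority = (candidate + 1) % self.num_inputs
                return candidate
        return None

# Inputs 0 and 2 keep requesting the same output port.
arbiter = RoundRobinArbiter(4)
print([arbiter.grant({0, 2}) for _ in range(3)])  # [0, 2, 0]
```

    Each output port decides independently in this sketch, which is the local view that the abstract contrasts with scheduling all input packets globally.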