80 research outputs found

New Fault Tolerant Multicast Routing Techniques to Enhance Distributed-Memory Systems Performance

Author: Shaheen Masoud Esmail Masoud
Publication venue: The Aquila Digital Community
Publication date: 01/12/2013

Distributed-memory systems are key to achieving high-performance computing and are the most favored architectures for advanced research problems. Mesh-connected multicomputers are among the most popular architectures and have been implemented in many distributed-memory systems. These systems must support communication operations efficiently to achieve good performance. The wormhole switching technique, in which a packet is divided into small flits, has been widely used in the design of distributed-memory systems. Multicast communication, in which one source node sends the same message to several destination nodes, is also widely used in these systems. Fault tolerance refers to the ability of the system to operate correctly in the presence of faults. The development of fault-tolerant multicast routing algorithms for 2D mesh networks is an important issue. This dissertation presents new fault-tolerant multicast routing algorithms that improve the performance of distributed-memory systems built on wormhole-routed 2D meshes. The algorithms are described for fault-tolerant routing in 2D mesh networks, but they can also be extended to other topologies. They combine a unicast-based multicast algorithm with tree-based multicast algorithms. They work effectively for the most commonly encountered faults in mesh networks: f-rings, f-chains, and concave fault regions. It is shown that the proposed routing algorithms remain effective even in the presence of many fault regions and large fault regions. The algorithms are proved to be deadlock-free, and the problem of overlapping fault regions is solved. Four essential performance metrics in mesh networks are considered and calculated. The algorithms use limited-global-information-based multicasting, a compromise between the local-information-based and global-information-based approaches. Data mining is used to validate the results and to enlarge the sample.
The proposed multicast routing techniques are used to enhance the performance of distributed-memory systems. Simulation results are presented to demonstrate the efficiency of the proposed algorithms.

Aquila Digital Community

Quarc: a novel network-on-chip architecture

Author: Moadeli M.
Shahrabi A.
Vanderbauwhede W.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2008

This paper introduces the Quarc NoC, a novel NoC architecture inspired by the Spidergon NoC. The Quarc scheme significantly outperforms the Spidergon NoC by balancing traffic, a result of the modifications applied to the topology and the routing elements. The proposed architecture is highly efficient in performing collective communication operations, including broadcast and multicast. We present the topology, routing discipline, and switch architecture of the Quarc NoC and demonstrate its performance with results obtained from discrete-event simulations.

Crossref

Enlighten

ResearchOnline@GCU

Quarc: an architecture for efficient on-chip communication

Author: Moadeli Mahmoud
Publication date: 01/01/2010

The exponential downscaling of feature size has forced a paradigm shift from computation-based design to communication-based design in system-on-chip development. Buses, the traditional communication architecture in systems on chip, cannot meet the increasing bandwidth requirements of future large systems. Networks on chip have emerged as an interconnection architecture offering unique solutions to the technological and design issues related to communication in future systems on chip. The transition from buses as a shared medium to networks on chip as a segmented medium has given rise to new challenges in the system-on-chip realm. By leveraging the shared nature of the communication medium, buses have been highly efficient in delivering multicast communication. The segmented nature of networks, however, prevents networks on chip from delivering multicast messages as efficiently. Relying on extensive research on multicast communication in parallel computers, several network-on-chip architectures have offered mechanisms to perform the operation while conforming to the resource constraints of the network-on-chip paradigm. In the majority of these networks on chip, multicast communication is implemented by establishing a connection between the source and all multicast destinations before message transmission commences. Establishing the connections incurs an overhead and is therefore undesirable, particularly in latency-sensitive services such as cache coherence. To address high-performance multicast communication, this research presents Quarc, a novel network-on-chip architecture. The Quarc architecture targets an area-efficient, low-power, high-performance implementation. The thesis gives a detailed description of the building blocks of the architecture, including the topology, router, and network interface.
A cost and performance comparison of the Quarc architecture against other network-on-chip architectures shows that Quarc is highly efficient. Moreover, the thesis introduces novel performance models of complex traffic patterns, including multicast and quality-of-service-aware communication.

Glasgow Theses Service

CiteSeerX

OpenGrey Repository

A performance model of communication in the quarc NoC

Author: Moadeli M.
Shahrabi A.
Vanderbauwhede W.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2008

Networks-on-chip (NoCs) have emerged as a promising communication medium for future MPSoC development. To serve this purpose, NoCs must be able to exchange all types of traffic efficiently, including collective communications, at a reasonable cost. The Quarc NoC is introduced as a NoC that is highly efficient in performing collective communication operations such as broadcast and multicast. This paper presents an introduction to the Quarc scheme and an analytical model for computing the average message latency in the architecture. To validate the model, we compare its latency predictions against results obtained from discrete-event simulations.

Crossref

Enlighten

ResearchOnline@GCU

Quarc: a high-efficiency network on-chip architecture

Author: Maji Partha
Moadeli Mahmoud
Vanderbauwhede Wim
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2009

The novel Quarc NoC architecture, inspired by the Spidergon scheme, is introduced as a NoC architecture that is highly efficient in performing collective communication operations, including broadcast and multicast. The efficiency of the Quarc architecture is achieved by balancing traffic, a result of the modifications applied to the topology and the routing elements of the Spidergon NoC. This paper provides an ASIC implementation of both architectures using UMC's 0.13 μm CMOS technology and presents an analysis and comparison of the cost and performance of the Quarc and Spidergon NoCs.

Crossref

Enlighten

On the performance of broadcast algorithms in interconnection networks

Author: Al-Dubai A.Y.
Ould-Khaoua M.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2005

Broadcast communication is among the most primitive collective capabilities of any message-passing network. Broadcast algorithms for the mesh have been widely reported in the literature. However, most existing algorithms have been studied under limited conditions, such as light traffic load and fixed network sizes. In other words, most of these algorithms have not been studied at different Quality of Service (QoS) levels. In contrast, this study examines the broadcast operation, taking into account scalability, parallelism, and a wide range of traffic loads in the propagation of broadcast messages. To the best of our knowledge, this study is the first to consider broadcast latency at both the network and node levels across different traffic loads. Results from a comparative analysis confirm that the coded-path-based broadcast algorithms exhibit superior performance characteristics over some existing algorithms.

Enlighten

High-Speed Message Routing Mechanisms for Massively Parallel Computers

Author: Kazumi Tsutada
蔦田和美
Publication date: 06/12/2017

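Massively parallel processing (MPP) systems are now advancing into many fields that were once the stronghold of traditional vector processors and SIMD machines. These systems exploit the rapid progress of readily available high-performance CPUs, connecting hundreds to thousands of them into a homogeneous multiprocessor system. However, their performance on real-world problems is not necessarily good and currently always falls far short of the nominal peak performance. Because all interprocessor communication in these systems takes place over the interconnection network, the decisive factors determining the achievable peak performance are the interconnection network and the communication mechanisms it uses. This thesis proposes two methods for realizing efficient communication mechanisms for MPP interconnection networks. The first is the proposal of an "express router", whose suitability for use in interconnection networks is examined. The express router uses multiple unidirectional register-insertion paths to realize a mixed time-division and space-division network. Accurate models were developed to predict the performance of the express router's switch and buffer circuits for different radices and numbers of dimensions. As a result, it was confirmed that the express router satisfies all the conditions for efficient communication. More importantly, the express router can perform routing that preserves low latency and high throughput even when the network contains faults or traffic is congested. Simulation confirmed that the express router outperforms previously published routers that use fixed path selection, and that it matches or exceeds other routers that use adaptive routing. The second is the proposal of path-length-limited multicast communication. Multicast communication contributes to speedup in many parallel processing problems. The thesis therefore studies the deadlock problem that multicast communication raises under wormhole routing. Path-length-limited multicast communication is proposed as a solution to this problem, its communication performance is evaluated by simulation, and it is compared with multicast communication implemented by the unicast and multipath methods. The results show that the proposed path-length-limited multicast communication is a particularly good solution for multicast to clusters for barrier synchronization, multicast to nearest-neighbor nodes, and broadcast to all nodes.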

Kansai Gaidai University Repository

Institutional Repositories DataBase (IRDB)

Tokushima University Institutional Repository

A Dag Based Wormhole Routing Strategy

Author: Roy John
Publication venue: LSU Digital Commons
Publication date: 01/05/1994

The wormhole routing (WR) technique is replacing the hitherto popular store-and-forward routing in message-passing multicomputers, because the latter has speed and node-size constraints. Wormhole routing is, on the other hand, susceptible to deadlock. A few WR schemes suggested recently in the literature concentrate on avoiding deadlock. This thesis presents a Directed Acyclic Graph (DAG) based WR technique. At low traffic levels the proposed method follows a minimal path, but the routing is adaptive at higher traffic levels. We prove that the algorithm is deadlock-free. The method's performance is compared with that of a deterministic algorithm that is a de facto standard. We also compare its implementation costs with those of other adaptive routing algorithms, and the relative merits and demerits are highlighted in the text.

Louisiana State University

High-Speed Message Routing Mechanisms for Massively Parallel Computers

Author: Flavell Andrew Colin
Publication date: 06/12/2017

Tokushima University Institutional Repository

Multicast communication in wormhole-routed star graph interconnection networks

Author: Akers
Chen
Chih-Ping Chu
Dally
Day
Lin
McKinley
McKinley
Nen-Chung Wang
Ni
Qiu
Robinson
Sheu
Sheu
Tseng
Tzung-Shi Chen
Publication venue: Elsevier BV

Crossref