
New Fault Tolerant Multicast Routing Techniques to Enhance Distributed-Memory Systems Performance

Author: Shaheen Masoud Esmail Masoud
Publication venue: The Aquila Digital Community
Publication date: 01/12/2013

Distributed-memory systems are key to achieving high-performance computing and are among the most favored architectures for advanced research problems. Mesh-connected multicomputers are one of the most popular architectures and have been implemented in many distributed-memory systems. These systems must support communication operations efficiently to achieve good performance. Wormhole switching, in which each packet is divided into small flits, has been widely used in the design of distributed-memory systems. Multicast communication, in which one source node sends the same message to several destination nodes, is also widely used in these systems. Fault tolerance refers to the ability of a system to operate correctly in the presence of faults, and developing fault-tolerant multicast routing algorithms for 2D mesh networks is an important issue. This dissertation presents new fault-tolerant multicast routing algorithms that use wormhole-routed 2D meshes to enhance the performance of distributed-memory systems. Although the algorithms are described for fault-tolerant routing in 2D mesh networks, they can also be extended to other topologies. They combine a unicast-based multicast algorithm with tree-based multicast algorithms, and they work effectively for the faults most commonly encountered in mesh networks: f-rings, f-chains, and concave fault regions. The proposed algorithms are shown to be effective even in the presence of many fault regions and of large fault regions. They are proved to be deadlock-free, and the problem of overlapping fault regions is solved. Four essential performance metrics for mesh networks are considered and calculated. The algorithms also use limited-global-information-based multicasting, a compromise between the local-information-based and global-information-based approaches. Data mining is used to validate the results and to enlarge the sample.
The proposed multicast routing techniques are used to enhance the performance of distributed-memory systems. Simulation results are presented to demonstrate the efficiency of the proposed algorithms.
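The abstract names f-rings and f-chains without defining them. Under the commonly used rectangular-block fault model (an assumption here; the dissertation also handles concave fault regions, which this sketch does not), the f-ring of a fault block is the ring of fault-free nodes immediately surrounding it, corners included, and it degenerates into an f-chain when the block touches the mesh edge. The minimal Python sketch below computes that boundary; the function name `fault_boundary` and the (x, y) coordinate convention are hypothetical illustrations, not taken from the dissertation.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y) coordinates, 0-indexed

def fault_boundary(width: int, height: int,
                   x1: int, y1: int, x2: int, y2: int) -> Tuple[str, List[Node]]:
    """Boundary of the rectangular fault block [x1..x2] x [y1..y2] in a
    width x height 2D mesh (hypothetical helper, rectangular faults only).

    Returns ("f-ring", nodes) when the block is interior, or ("f-chain", nodes)
    when it touches the mesh edge, so part of the ring falls outside the mesh.
    Nodes are listed in perimeter order, skipping positions outside the mesh.
    """
    touches_edge = x1 == 0 or y1 == 0 or x2 == width - 1 or y2 == height - 1
    # The ring lies one hop outside the block on every side, corners included.
    lx, ly, hx, hy = x1 - 1, y1 - 1, x2 + 1, y2 + 1
    perimeter = (
        [(x, ly) for x in range(lx, hx + 1)]            # bottom row, left to right
        + [(hx, y) for y in range(ly + 1, hy + 1)]      # right column, upward
        + [(x, hy) for x in range(hx - 1, lx - 1, -1)]  # top row, right to left
        + [(lx, y) for y in range(hy - 1, ly, -1)]      # left column, downward
    )
    nodes = [(x, y) for (x, y) in perimeter if 0 <= x < width and 0 <= y < height]
    return ("f-chain" if touches_edge else "f-ring"), nodes
```

For a 2 × 2 block spanning (2, 2) to (3, 3) in a 6 × 6 mesh, the sketch returns an f-ring of 12 nodes; moving the block into the corner at (0, 0) to (1, 1) of a 4 × 4 mesh yields a 5-node f-chain instead. A fault-tolerant router typically detours blocked messages along such a ring or chain.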

Aquila Digital Community


Torus routing in the presence of multicasts

Author: Ishibashi Hiroki
Publication venue: CSUSB ScholarWorks
Publication date: 01/01/1996

CSUSB ScholarWorks

Efficient All-to-All Collective Communication Schedules for Direct-Connect Topologies

Author: Basu Prithwish
Fantl Jason
Khoury Joud
Krishnamurthy Arvind
Pal Siddharth
Zhao Liangyu
Publication date: 23/09/2023

The all-to-all collective communication primitive is widely used in machine learning (ML) and high-performance computing (HPC) workloads, and optimizing its performance is of interest to both the ML and HPC communities. All-to-all is a particularly challenging workload that can severely strain the underlying interconnect bandwidth at scale. This is mainly because the number of messages that must be serviced simultaneously grows quadratically with the number of nodes, and those messages are often large. This paper takes a holistic approach to optimizing the performance of all-to-all collective communications on supercomputer-scale direct-connect interconnects. We address several algorithmic and practical challenges: developing efficient and bandwidth-optimal all-to-all schedules for any topology, lowering the schedules to various backends and fabrics that may or may not expose additional forwarding bandwidth, establishing an upper bound on all-to-all throughput, and exploring novel topologies that deliver near-optimal all-to-all performance.
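To make the quadratic growth concrete: with N nodes, each node sends a distinct block to every other node, so one all-to-all exchange involves N(N - 1) point-to-point messages. The sketch below builds the classic linear-shift schedule, in which at step k every rank i sends to rank (i + k) mod N. It is a textbook baseline included only for illustration; it ignores the topology entirely and is not the bandwidth-optimal schedule construction the paper describes. The names `shift_schedule` and `message_count` are hypothetical.

```python
from typing import List, Tuple

def shift_schedule(n: int) -> List[List[Tuple[int, int]]]:
    """Linear-shift all-to-all schedule for n ranks (topology-oblivious baseline).

    Returns n - 1 steps; step k (k = 1..n-1) is a list of (sender, receiver)
    pairs in which every rank i sends its block destined for rank (i + k) % n.
    """
    return [[(i, (i + k) % n) for i in range(n)] for k in range(1, n)]

def message_count(n: int) -> int:
    """Total point-to-point messages in one all-to-all: n * (n - 1)."""
    return n * (n - 1)

if __name__ == "__main__":
    # The message count grows quadratically with the number of ranks.
    for n in (8, 64, 512):
        print(n, message_count(n))
```

Each step is a permutation, so every rank sends and receives exactly one block per step; for N = 8 that is 7 steps and 56 messages, and for N = 512 it is already 261,632 messages.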

arXiv.org e-Print Archive

Time-Step Optimal Broadcasting in 3-D Meshes with Minimum Total Communication Distance

Author: Dally
Duato
Jie Wu
Johnsson
Koeninger
Lamport
Lin
Nelson
Ramanathan
Songluan Cang
Suh
Tsai
Publication venue: Elsevier BV

Crossref

On the Potential of NoC Virtualization for Multicore Chips

Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2008

Crossref

Efficient Multicast Algorithms for Mesh and Torus Networks

Author: Malani Ankit
Publication date: 20/12/2012

With the increasing popularity of multicomputers, efficient communication among their processors has become a popular area of research. Multicomputers are computer systems with multiple processors; they have high computational power and can perform multiple tasks concurrently. Mesh and torus are among the most commonly used network topologies for building multicomputer systems. Their performance depends heavily on the underlying network communication, such as multicast. Multicast is a communication method in which a message is sent from a source node to a certain number of destinations. Two major parameters are used to evaluate multicast: time, the time a multicast takes to deliver the message to all destinations, and traffic, the number of links used in the process. Research indicates that, in general, finding a multicast that is optimal in both time and traffic is NP-complete. This thesis proposes two new algorithms for multicast in mesh and torus networks. Extensive simulations of these algorithms show that in practice they perform better than existing ones.
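The two metrics can be illustrated with a simple baseline that is not one of the thesis's algorithms: a multicast tree formed by merging the dimension-order (XY) routes from the source to each destination in a 2D mesh. Assuming one hop per time unit and tree branches that proceed in parallel, time is the hop count to the farthest destination and traffic is the number of distinct links in the tree. The names `xy_path`, `multicast_cost`, and `unicast_traffic` are hypothetical, and torus wraparound links are not modeled.

```python
from typing import Iterable, List, Set, Tuple

Node = Tuple[int, int]
Link = Tuple[Node, Node]  # directed link (from, to)

def xy_path(src: Node, dst: Node) -> List[Node]:
    """Nodes visited by dimension-order routing: first along X, then along Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

def multicast_cost(src: Node, dests: Iterable[Node]) -> Tuple[int, int]:
    """(time, traffic) of the tree obtained by merging XY paths from src.

    time    = hops to the farthest destination (XY paths are minimal in a mesh)
    traffic = number of distinct directed links the merged tree uses
    """
    links: Set[Link] = set()
    time = 0
    for d in dests:
        p = xy_path(src, d)
        time = max(time, len(p) - 1)
        links.update(zip(p, p[1:]))  # shared prefixes are counted once
    return time, len(links)

def unicast_traffic(src: Node, dests: Iterable[Node]) -> int:
    """Link traversals if each destination gets its own separate XY unicast."""
    return sum(len(xy_path(src, d)) - 1 for d in dests)
```

For a source at (0, 0) and destinations (3, 0), (3, 2), and (0, 2), the merged tree needs 5 time units and 7 links, whereas three separate unicasts would use 3 + 5 + 2 = 10 link traversals because the shared X segment is repeated. Trading off these two numbers against each other is what makes optimal multicast hard.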

Concordia University Research Repository

High-Speed Message Routing Mechanisms for Massively Parallel Computers

Author: Kazumi Tsutada
蔦田和美
Publication date: 06/12/2017

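Massively parallel processing systems (MPPs) are now advancing into many fields that were traditionally the stronghold of vector processors and SIMD machines. These systems take advantage of the rapid progress of readily available high-performance CPUs, connecting hundreds to thousands of them to form homogeneous multiprocessor systems. However, their performance on real problems is not necessarily good and always falls far short of the nominal peak performance. Because all interprocessor communication in these systems goes through the interconnection network, the decisive factors that determine the achievable peak performance are the interconnection network and the communication mechanisms it uses. This thesis proposes two methods for realizing efficient communication mechanisms in MPP interconnection networks. The first is the "express router," whose suitability for interconnection networks is examined. The express router uses multiple unidirectional register-insertion paths to realize a hybrid time-division and space-division network. Accurate models were developed to predict the performance of the express router's switch circuits and buffer circuits for different radices and numbers of dimensions. The results confirmed that the express router satisfies all the conditions for efficient communication. More importantly, the express router can route messages without sacrificing low latency and high throughput even when the network contains faults or traffic is congested. Simulation showed that the express router outperforms previously published routers that use fixed-path (deterministic) routing, and that it matches or exceeds routers that use other adaptive routing schemes. The second method is path-length-limited multicast communication. Multicast communication contributes to speedup in many parallel processing problems, so the problem of deadlock in multicast under wormhole switching was studied. As a solution, path-length-limited multicast is proposed; its communication performance was evaluated by simulation and compared with that of unicast-based and multipath-based multicast. The results showed that the proposed path-length-limited multicast is a particularly good solution for multicast to clusters for barrier synchronization, multicast to nearest-neighbor nodes, and broadcast to all nodes.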

Kansai Gaidai University Repository

Institutional Repositories DataBase (IRDB)

Tokushima University Institutional Repository

High-Speed Message Routing Mechanisms for Massively Parallel Computers

Author: Flavell Andrew Colin
Publication date: 06/12/2017

Tokushima University Institutional Repository

Efficient mechanisms to provide fault tolerance in interconnection networks for pc clusters

Author: Montañana Aliaga José Miguel
Publication venue: Universitat Politècnica de València
Publication date: 21/07/2008

PC clusters are currently a cost-effective alternative to parallel computers. In these systems, thousands of components (processors and/or hard disks) are connected through high-performance interconnection networks. Among the network technologies currently available for building clusters, InfiniBand (IBA) has emerged as a new interconnection standard for clusters. In fact, it has been adopted by many of the most powerful systems built today (Top500 list). As the number of nodes in these systems grows, the interconnection network grows as well. Along with the number of components, the probability of failures increases dramatically, so fault tolerance in the system in general, and in the interconnection network in particular, becomes a necessity. Unfortunately, most of the fault-tolerant routing strategies proposed for massively parallel computers cannot be applied, because routing and virtual channel transitions are deterministic in IBA, which prevents packets from avoiding faults. New strategies are therefore needed to tolerate faults. For this reason, this thesis focuses on providing adequate levels of fault tolerance to PC clusters, and to IBA networks in particular. In this thesis we propose and evaluate several mechanisms suitable for cluster interconnection networks. The first mechanism for providing fault tolerance in IBA (which we refer to as transition-based fault-tolerant routing, TFTR) uses several disjoint routes between each source-destination pair and selects the appropriate route at the source node using the APM mechanism provided by IBA. Routes affected by a fault are migrated to the alternative fault-free routes.
However, for this purpose, an efficient routing algorithm capable of providing sufficient ...

Montañana Aliaga, JM. (2008). Efficient mechanisms to provide fault tolerance in interconnection networks for pc clusters [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/2603
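TFTR's core idea, keeping disjoint routes for each source-destination pair and switching to a fault-free alternate at the source, can be sketched on a 2D mesh, where the XY and YX dimension-order routes between two nodes that differ in both coordinates share no intermediate nodes. This is only an analogy under stated assumptions: the mesh, the route functions, and `select_path` are hypothetical, and neither InfiniBand forwarding tables nor the APM mechanism itself is modeled.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

Node = Tuple[int, int]
Link = FrozenSet[Node]  # undirected link between two adjacent nodes

def dim_order_route(src: Node, dst: Node, x_first: bool) -> List[Node]:
    """Minimal dimension-order route on a mesh: XY if x_first, otherwise YX."""
    cur = [src[0], src[1]]
    path: List[Node] = [src]
    for axis in ((0, 1) if x_first else (1, 0)):
        while cur[axis] != dst[axis]:
            cur[axis] += 1 if dst[axis] > cur[axis] else -1
            path.append((cur[0], cur[1]))
    return path

def links_of(path: List[Node]) -> Set[Link]:
    """Undirected links traversed by a path."""
    return {frozenset(hop) for hop in zip(path, path[1:])}

def select_path(src: Node, dst: Node, failed: Set[Link]) -> Optional[List[Node]]:
    """Hypothetical source-side selection: try the primary (XY) route, then the
    alternate (YX) route; return the first that avoids every failed link, or
    None if both are blocked."""
    for x_first in (True, False):
        path = dim_order_route(src, dst, x_first)
        if not (links_of(path) & failed):
            return path
    return None
```

With a failed link between (1, 0) and (2, 0), a request from (0, 0) to (2, 2) falls back to the YX route through (0, 2). If the two nodes share a row or column, the XY and YX routes coincide, which is why a real design needs a routing algorithm that supplies genuinely disjoint alternatives.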

Crossref

RiuNet