
53 research outputs found

CLEX: Yet Another Supercomputer Architecture?

Author: Lenzen Christoph
Wattenhofer Roger
Publication date: 01/01/2016

We propose the CLEX supercomputer topology and routing scheme. We prove that CLEX can utilize a constant fraction of the total bandwidth for point-to-point communication, at delays proportional to the sum of the number of intermediate hops and the maximum physical distance between any two nodes. Moreover, applying an asymmetric bandwidth assignment to the links, all-to-all communication can be realized $(1+o(1))$-optimally both with regard to bandwidth and delays. This is achieved at node degrees of $n^{\varepsilon}$, for an arbitrarily small constant $\varepsilon\in (0,1]$. In contrast, these results are impossible in any network featuring constant or polylogarithmic node degrees. Through simulation, we assess the benefits of an implementation of the proposed communication strategy. Our results indicate that, for a million processors, CLEX can increase bandwidth utilization and reduce average routing path length by factors of at least 10 and 5, respectively, in comparison to a torus network. Furthermore, the CLEX communication scheme features several other properties, such as deadlock-freedom, inherent fault-tolerance, and canonical partition into smaller subsystems.
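To give a feel for the torus baseline mentioned in the abstract, the following is a minimal sketch that computes the mean shortest-path hop count in a k-ary torus under uniform random traffic. The abstract does not state the dimensionality of the torus used in the simulations; the 100 x 100 x 100 layout (one million nodes) and all function names below are assumptions made for illustration only.

```python
from itertools import product


def ring_distance(a, b, k):
    """Hop count between positions a and b on a ring of k nodes."""
    d = abs(a - b)
    return min(d, k - d)


def mean_ring_distance(k):
    """Mean hops from node 0 to a uniformly random node of a k-ring (self included)."""
    return sum(ring_distance(0, b, k) for b in range(k)) / k


def mean_torus_distance(k, dims):
    """Mean shortest-path hops in a k-ary torus with `dims` dimensions.

    The torus distance is the sum of the per-dimension ring distances,
    and the coordinates of a uniformly random destination are independent,
    so the per-dimension means simply add up.
    """
    return dims * mean_ring_distance(k)


def brute_force_mean(k, dims):
    """Cross-check for small tori: enumerate every destination from the origin.

    The torus is vertex-transitive, so measuring from one node is enough.
    """
    nodes = list(product(range(k), repeat=dims))
    origin = nodes[0]
    total = sum(
        sum(ring_distance(x, y, k) for x, y in zip(origin, node))
        for node in nodes
    )
    return total / len(nodes)


if __name__ == "__main__":
    # Assumed layout: 100 x 100 x 100 = one million nodes.
    print(mean_torus_distance(100, 3))  # 75.0
```

Under these assumptions a packet crosses 75 links on average, which is the kind of path length a higher-degree topology aims to cut down. The figure depends entirely on the assumed dimensionality and is not taken from the paper.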

arXiv.org e-Print Archive

MPG.PuRe

Adaptive Routing Strategies for Modern High Performance Networks

Author: Patrick Geoffray
Torsten Hoefler
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008

Today’s scalable high-performance applications heavily depend on the bandwidth characteristics of their communication patterns. Contemporary multi-stage interconnection networks suffer from network contention which might decrease application performance. Our experiments show that the effective bisection bandwidth of a non-blocking 512-node Clos network is as low as 38% if the network is routed statically. In this paper, we propose and analyze different adaptive routing schemes for those networks. We chose Myrinet/MX to implement our proposed routing schemes. Our best adaptive routing scheme is able to increase the effective bisection bandwidth to 77% for 512 nodes and 100% for smaller node counts. Thus, we show that our proposed adaptive routing schemes are able to improve network throughput significantly.
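The effect of static routing on effective bisection bandwidth can be illustrated with a toy model. The sketch below is not the paper's methodology or its Myrinet/MX setup: it assumes a two-level folded Clos in which every leaf switch has as many uplinks as hosts, a hypothetical destination-based static route (spine index = destination host id modulo hosts per leaf), and equal sharing of a congested link among its flows. All names and parameters are illustrative.

```python
import random
from collections import Counter


def effective_bisection_bandwidth(num_leaves, hosts_per_leaf, patterns=200, seed=0):
    """Mean per-flow bandwidth share over random bisection patterns.

    Each pattern splits the hosts into two random halves and pairs them up;
    every pair carries one flow. A flow's share is 1 / (flows on its uplink).
    """
    rng = random.Random(seed)
    n = num_leaves * hosts_per_leaf
    hosts = list(range(n))
    half = n // 2
    total, flows = 0.0, 0

    for _ in range(patterns):
        rng.shuffle(hosts)
        pairs = zip(hosts[:half], hosts[half:2 * half])

        uplink_load = Counter()
        routes = []
        for src, dst in pairs:
            src_leaf = src // hosts_per_leaf
            dst_leaf = dst // hosts_per_leaf
            if src_leaf == dst_leaf:
                routes.append(None)  # stays inside the leaf switch
                continue
            # Static, destination-based choice of spine (assumed scheme).
            link = (src_leaf, dst % hosts_per_leaf)
            uplink_load[link] += 1
            routes.append(link)

        # Downlinks never contend in this model: each (spine, leaf) downlink
        # serves exactly one destination host, and each host receives one flow.
        for link in routes:
            total += 1.0 if link is None else 1.0 / uplink_load[link]
            flows += 1

    return total / flows


if __name__ == "__main__":
    # 32 leaves x 16 hosts = 512 hosts, as an assumed example size.
    print(effective_bisection_bandwidth(32, 16))
```

The value printed is a property of this simplified model only and should not be compared directly with the 38% reported in the abstract. An adaptive scheme would, in this picture, move flows away from overloaded uplinks instead of fixing the spine by destination.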

CiteSeerX

Crossref

Performance Evaluation of Checkpoint/Restart Techniques

Author: Azeem Basma Abdel
Helal Manal
Publication date: 29/11/2023

Distributed applications running in a large cluster environment, such as cloud instances, will have shorter execution times. However, an application might suffer sudden termination due to unpredicted compute node failures, thus losing the whole computation. Checkpoint/restart is a fault tolerance technique used to solve this problem. In this work we evaluated the performance of two of the most commonly used checkpoint/restart techniques (Distributed Multithreaded Checkpointing (DMTCP) and Berkeley Lab Checkpoint/Restart library (BLCR) integrated into the OpenMPI framework). We aimed to test their validity and evaluate their performance in both local and Amazon Elastic Compute Cloud (EC2) environments. The experiments were conducted on Amazon EC2 as a well-known proprietary cloud computing service provider. Results obtained were reported and compared to evaluate checkpoint and restart time values, data scalability and compute process scalability. The findings showed that DMTCP performs better than BLCR in the checkpoint and restart speed, data scalability and compute process scalability experiments.
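The basic checkpoint/restart idea can be sketched at application level as follows. Note that DMTCP and BLCR checkpoint whole processes transparently; the sketch below is a different, much simpler technique (the program saves its own state with `pickle`) and is only meant to show why a restart does not have to repeat finished work. The function names, the checkpoint interval, and the simulated failure are all assumptions for illustration.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temp file, then rename it over the old checkpoint.

    The rename keeps the previous checkpoint intact if we crash mid-write.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def run_sum(n, path, every=100, fail_at=None):
    """Sum 0..n-1, checkpointing every `every` iterations.

    `fail_at` is a hypothetical hook that simulates a node failure.
    """
    state = load_checkpoint(path, {"i": 0, "total": 0})
    i, total = state["i"], state["total"]
    while i < n:
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated node failure")
        total += i
        i += 1
        if i % every == 0:
            save_checkpoint(path, {"i": i, "total": total})
    return total


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "sum.ckpt")
    try:
        run_sum(1000, ckpt, fail_at=550)   # dies after the checkpoint at i=500
    except RuntimeError:
        pass
    print(run_sum(1000, ckpt))             # resumes from i=500; prints 499500
```

The checkpoint interval trades checkpoint overhead against the amount of work lost on failure, which is the kind of cost the checkpoint and restart timings in the study measure.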

arXiv.org e-Print Archive

Benchmarking the computation and communication performance of the CM-5

Author: Geoffrey Fox
Kivanc Dincer
Sanjay Ranka
Zeki Bozkus
Publication venue: 'Wiley'
Publication date: 01/01/2002

Crossref

Performance Evaluation and Implementation of two Adaptive Routing Algorithms for XGFT Networks

Author: Kariniemi Heikki
Nurmi Jari
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 20/02/2012

EXtended Generalized Fat Trees (XGFT) are Bidirectional Multistage Interconnection Networks (BMIN). They are more scalable for different system sizes and different performance requirements than the fat trees from which they have evolved. The improved scalability has been achieved by allowing switches with different numbers of ports to be used in different switch stages of these hierarchical networks. XGFTs can be constructed from two separate networks for routing packets upwards and downwards in the XGFT. These up-routing and down-routing networks can be implemented separately with small switches which are connected to each other within the switch nodes of the XGFT. This kind of XGFT achieves higher performance if its topmost root switches are connected to each other with additional links, and if the adaptive Turn-Back-When-Possible (TBWP) routing algorithm is used instead of shortest-path routing algorithms. This paper shows that the TBWP always has simple and feasible hardware implementations independently of the structure of the XGFT. This is achieved by address space encoding which eliminates complex computations from the routing decision functions. This paper also presents a new shortest-path routing algorithm named Turn-Back (TB). The TB algorithm was designed for such XGFT implementations where the up-routing and down-routing of the packets is performed with one larger switch block within the switch nodes, and where shortest-path routing produces good performance. It is shown in this paper that the TBWP and TB route packets correctly to their destinations. In addition, the performances of the routing algorithms are evaluated with simulations and compared. Simulation results show that the TB is able to produce higher performance than the TBWP with different traffic patterns. They also show that the performance of the XGFTs could be improved by suitable mapping of the communicating

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

On random wiring in practicable folded clos networks for modern datacenters

Author: Beivide Palacio Ramón
Camarero Coterillo Cristobal
Martínez Fernández María del Carmen
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2018

Large scale, high performance, fault-tolerance, low cost and graceful expandability are pursued features in current datacenter networks (DCN). Although there have been many proposals for DCNs, most modern installations are equipped with classical folded Clos networks. Recently, regular random topologies, such as the Jellyfish, have been proposed for DCNs. However, their completely unstructured nature entails serious design problems. In this paper we propose Random Folded Clos (RFC) and Hydra networks in which the interconnection between certain switch levels is made randomly. Both RFCs and Hydras preserve important properties of Clos networks that provide a straightforward deadlock-free multi-path routing. The proposed networks leverage randomness to be gracefully expandable, thereby allowing for fine grain upgrading. RFCs and Hydras are compared in the paper, in topological and cost terms, against fat-trees, orthogonal fat-trees and random regular networks. Also, experiments are carried out to simulate their performance under synthetic traffic patterns emulating common loads present in warehouse scale computers. These theoretical and empirical studies reveal the interest of these topologies, concluding that Hydra constitutes a practicable alternative to current datacenter networks since it appropriately balances all the main design requirements. Moreover, Hydras perform better than the fat-trees, their natural competitor, being able to connect the same or more computing nodes with significantly lower cost and latency while exhibiting comparable throughput.

UCrea

SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures

Author: Fernandez Ivan
Giannoula Christina
Goumas Georgios
Gómez-Luna Juan
Karakostas Vasileios
Koziris Nectarios
Mutlu Onur
Orosa Lois
Papadopoulou Nikela
Vijaykumar Nandita
Publication date: 13/02/2021

Near-Data-Processing (NDP) architectures present a promising way to alleviate data movement costs and can provide significant performance and energy benefits to parallel applications. Typically, NDP architectures support several NDP units, each including multiple simple cores placed close to memory. To fully leverage the benefits of NDP and achieve high performance for parallel workloads, efficient synchronization among the NDP cores of a system is necessary. However, supporting synchronization in many NDP systems is challenging because they lack shared caches and hardware cache coherence support, which are commonly used for synchronization in multicore systems, and communication across different NDP units can be expensive. This paper comprehensively examines the synchronization problem in NDP systems, and proposes SynCron, an end-to-end synchronization solution for NDP systems. SynCron adds low-cost hardware support near memory for synchronization acceleration, and avoids the need for hardware cache coherence support. SynCron has three components: 1) a specialized cache memory structure to avoid memory accesses for synchronization and minimize latency overheads, 2) a hierarchical message-passing communication protocol to minimize expensive communication across NDP units of the system, and 3) a hardware-only overflow management scheme to avoid performance degradation when hardware resources for synchronization tracking are exceeded. We evaluate SynCron using a variety of parallel workloads, covering various contention scenarios. SynCron improves performance by 1.27