Search CORE

5,851 research outputs found

OFAR-CM: Efficient Dragonfly networks with simple congestion management

Author: Beivide Palacio Ramon
García Marina
Rodríguez Germán
Valero Cortés Mateo
Vallejo Enrique
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013
Field of study

Dragonfly networks are appealing topologies for large-scale Data center and HPC networks, that provide high throughput with low diameter and moderate cost. However, they are prone to congestion under certain frequent traffic patterns that saturate specific network links. Adaptive non-minimal routing can be used to avoid such congestion. That kind of routing employs longer paths to circumvent local or global congested links. However, if a distance-based deadlock avoidance mechanism is employed, more Virtual Channels (VCs) are required, what increases design complexity and cost. OFAR (On-the-Fly Adaptive Routing) is a previously proposed routing that decouples VCs from deadlock avoidance, making local and global misrouting affordable. However, the severity of congestion with OFAR is higher, as it relies on an escape sub network with low bisection bandwidth. Additionally, OFAR allows for unlimited misroutings on the escape sub network, leading to unbounded paths in the network and long latencies. In this paper we propose and evaluate OFAR-CM, a variant of OFAR combined with a simple congestion management (CM) mechanism which only relies on local information, specifically the credit count of the output ports in the local router. With simple escape sub networks such as a Hamiltonian ring or a tree, OFAR outperforms former proposals with distance-based deadlock avoidance. Additionally, although long paths are allowed in theory, in practice packets arrive at their destination in a small number of hops. Altogether, OFAR-CM constitutes the first practicable mechanism to the date that supports both local and global misrouting in Dragonfly networks.The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement n. ERC-2012-Adg-321253- RoMoL, the Spanish Ministry of Science under contracts TIN2010-21291-C02-02, TIN2012-34557, and by the European HiPEAC Network of Excellence. M. García participated in this work while affiliated with the University of Cantabria.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

Recommended from our members

Data-Mining Synthesised Schedulers for Hard Real-Time Systems

Author: Kloukinas C.
Verimag G.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2004
Field of study

The analysis of hard real-time systems, traditionally performed using RMA/PCP or simulation, is nowadays also studied as a scheduler synthesis problem, where one automatically constructs a scheduler which can guarantee avoidance of deadlock and deadline-miss system states. Even though this approach has the potential for a finer control of a hard real-time system, using fewer resources and easily adapting to further quality aspects (memory/energy consumption, jitter minimisation, etc.), synthesised schedulers are usually extremely large and difficult to understand. Their big size is a consequence of their inherent precision, since they attempt to describe exactly the frontier among the safe and unsafe system states. It nevertheless hinders their application in practise, since it is extremely difficult to validate them or to use them for better understanding the behaviour of the system. In this paper, we show how one can adapt data-mining techniques to decrease the size of a synthesised scheduler and force its inherent structure to appear, thus giving the system designer a wealth of additional information for understanding and optimising the scheduler and the underlying system. We present, in particular, how it can be used for obtaining hints for a good task distribution to different processing units, for optimising the scheduler itself (sometimes even removing it altogether in a safe manner) and obtaining both per-task and per-system views of the schedulability of the system

City Research Online

APEnet+: a 3D toroidal network enabling Petaflops scale Lattice QCD simulations on commodity clusters

Author: Ammendola Roberto
Biagioni Andrea
Cicero Francesca Lo
Frezza Ottorino
Lonardo Alessandro
Paolucci Pier
Petronzio Roberto
Rossetti Davide
Salamon Andrea
Salina Gaetano
Simula Francesco
Tantalo Nazario
Tosoratto Laura
Vicini Piero
Publication venue
Publication date: 01/01/2010
Field of study

Many scientific computations need multi-node parallelism for matching up both space (memory) and time (speed) ever-increasing requirements. The use of GPUs as accelerators introduces yet another level of complexity for the programmer and may potentially result in large overheads due to the complex memory hierarchy. Additionally, top-notch problems may easily employ more than a Petaflops of sustained computing power, requiring thousands of GPUs orchestrated with some parallel programming model. Here we describe APEnet+, the new generation of our interconnect, which scales up to tens of thousands of nodes with linear cost, thus improving the price/performance ratio on large clusters. The project target is the development of the Apelink+ host adapter featuring a low latency, high bandwidth direct network, state-of-the-art wire speeds on the links and a PCIe X8 gen2 host interface. It features hardware support for the RDMA programming model and experimental acceleration of GPU networking. A Linux kernel driver, a set of low-level RDMA APIs and an OpenMPI library driver are available, allowing for painless porting of standard applications. Finally, we give an insight of future work and intended developments

arXiv.org e-Print Archive

ART

Composable Models for Timing and Liveness Analysis in Distributed Real-Time Embedded Systems Middleware

Author: Gill Christopher
Sanchez Cesar
Sipma Henry B.
Subramonian Venika
Publication venue: Washington University Open Scholarship
Publication date: 23/11/2005
Field of study

Middleware for distributed real-time embedded (DRE) systems has grown increasingly complex, to address functional and temporal requirements of diverse applications. While current approaches to modeling middleware have eased the task of assembling, deploying and conﬁguring middleware and the applications that use it, a lower-level set of formal models is needed to uncover subtle timing and liveness hazards introduced by interference between and within distributed computations, particularly in the face of alternative middleware concurrency strategies. In this paper, we propose timed automata as a formal model of low-level middleware building blocks from which a variety different middleware conﬁgurations can be constructed. When combined with analysis techniques such as model checking, this formal model can help developers in verifying the correctness of various middleware conﬁgurations with respect to the timing and liveness constraints of each particular application

Washington University St. Louis: Open Scholarship