150 research outputs found

    Shared memory with hidden latency on a family of mesh-like networks

    Get PDF

    Aspects of k-k-Routing in Meshes and OTIS Networks

    Get PDF
    Aspects of k-k Routing in Meshes and OTIS-Networks Abstract Efficient data transport in parallel computers build on sparse interconnection networks is crucial for their performance. A basic transport problem in such a computer is the k-k routing problem. In this thesis, aspects of the k-k routing problem on r-dimensional meshes and OTIS-G networks are discussed. The first oblivious routing algorithms for these networks are presented that solve the k-k routing problem in an asymptotically optimal running time and a constant buffer size. Furthermore, other aspects of the k-k routing problem for OTIS-G networks are analysed. In particular, lower bounds for the problem based on the diameter and bisection width of OTIS-G networks are given, and the k-k sorting problem on the OTIS-Mesh is considered. Based on OTIS-G networks, a new class of networks, called Extended OTIS-G networks, is introduced, which have smaller diameters than OTIS-G networks.Für die Leistungfähigkeit von Parallelrechnern, die über ein Verbindungsnetzwerk kommunizieren, ist ein effizienter Datentransport entscheidend. Ein grundlegendes Transportproblem in einem solchen Rechner ist das k-k Routing Problem. In dieser Arbeit werden Aspekte dieses Problems in r-dimensionalen Gittern und OTIS-G Netzwerken untersucht. Es wird der erste vergessliche (oblivious) Routing Algorithmus vorgestellt, der das k-k Routing Problem in diesen Netzwerken in einer asymptotisch optimalen Laufzeit bei konstanter Puffergröße löst. Für OTIS-G Netzwerke werden untere Laufzeitschranken für das untersuchte Problem angegeben, die auf dem Durchmesser und der Bisektionsweite der Netzwerke basieren. Weiterhin wird ein Algorithmus vorgestellt, der das k-k Sorting Problem mit einer Laufzeit löst, die nahe an der Bisektions- und Durchmesserschranke liegt. Basierend auf den OTIS-G Netzwerken, wird eine neue Klasse von Netzwerken eingeführt, die sogenannten Extended OTIS-G Netzwerke, die sich durch einen kleineren Durchmesser von OTIS-G Netzwerken unterscheiden

    The Effect Of Hot Spots On The Performance Of Mesh--Based Networks

    Get PDF
    Direct network performance is affected by different design parameters which include number of virtual channels, number of ports, routing algorithm, switching technique, deadlock handling technique, packet size, and buffer size. Another factor that affects network performance is the traffic pattern. In this thesis, we study the effect of hotspot traffic on system performance. Specifically, we study the effect of hotspot factor, hotspot number, and hot spot location on the performance of mesh-based networks. Simulations are run on two network topologies, both the mesh and torus. We pay more attention to meshes because they are widely used in commercial machines. Comparisons between oblivious wormhole switching and chaotic packet switching are reported. Overall packet switching proved to be more efficient in terms of throughput when compared to wormhole switching. In the case of uniform random traffic, it is shown that the differences between chaotic and oblivious routing are indistinguishable. Networks with low number of hotspots show better performance. As the number of hotspots increases network latency tends to increase. It is shown that when the hotspot factor increases, performance of packet switching is better than that of wormhole switching. It is also shown that the location of hotspots affects network performance particularly with the oblivious routers since their achieved latencies proved to be more vulnerable to changes in the hotspot location. It is also shown that the smaller the size of the network the earlier network saturation occurs. Further, it is shown that the chaos router’s adaptivity is useful in this case. Finally, for tori, performance is not greatly affected by hotspot presence. This is mostly due to the symmetric nature of tori

    Routing on the Channel Dependency Graph:: A New Approach to Deadlock-Free, Destination-Based, High-Performance Routing for Lossless Interconnection Networks

    Get PDF
    In the pursuit for ever-increasing compute power, and with Moore's law slowly coming to an end, high-performance computing started to scale-out to larger systems. Alongside the increasing system size, the interconnection network is growing to accommodate and connect tens of thousands of compute nodes. These networks have a large influence on total cost, application performance, energy consumption, and overall system efficiency of the supercomputer. Unfortunately, state-of-the-art routing algorithms, which define the packet paths through the network, do not utilize this important resource efficiently. Topology-aware routing algorithms become increasingly inapplicable, due to irregular topologies, which either are irregular by design, or most often a result of hardware failures. Exchanging faulty network components potentially requires whole system downtime further increasing the cost of the failure. This management approach becomes more and more impractical due to the scale of today's networks and the accompanying steady decrease of the mean time between failures. Alternative methods of operating and maintaining these high-performance interconnects, both in terms of hardware- and software-management, are necessary to mitigate negative effects experienced by scientific applications executed on the supercomputer. However, existing topology-agnostic routing algorithms either suffer from poor load balancing or are not bounded in the number of virtual channels needed to resolve deadlocks in the routing tables. Using the fail-in-place strategy, a well-established method for storage systems to repair only critical component failures, is a feasible solution for current and future HPC interconnects as well as other large-scale installations such as data center networks. Although, an appropriate combination of topology and routing algorithm is required to minimize the throughput degradation for the entire system. This thesis contributes a network simulation toolchain to facilitate the process of finding a suitable combination, either during system design or while it is in operation. On top of this foundation, a key contribution is a novel scheduling-aware routing, which reduces fault-induced throughput degradation while improving overall network utilization. The scheduling-aware routing performs frequent property preserving routing updates to optimize the path balancing for simultaneously running batch jobs. The increased deployment of lossless interconnection networks, in conjunction with fail-in-place modes of operation and topology-agnostic, scheduling-aware routing algorithms, necessitates new solutions to solve the routing-deadlock problem. Therefore, this thesis further advances the state-of-the-art by introducing a novel concept of routing on the channel dependency graph, which allows the design of an universally applicable destination-based routing capable of optimizing the path balancing without exceeding a given number of virtual channels, which are a common hardware limitation. This disruptive innovation enables implicit deadlock-avoidance during path calculation, instead of solving both problems separately as all previous solutions

    (,k)(\ell,k)-Routing on Plane Grids

    Get PDF
    The packet routing problem plays an essential role in communication networks. It involves how to transfer data from some origins to some destinations within a reasonable amount of time. In the (,k)(\ell,k)-routing problem, each node can send at most \ell packets and receive at most kk packets. Permutation routing is the particular case =k=1\ell=k=1. In the rr-central routing problem, all nodes at distance at most rr from a fixed node vv want to send a packet to vv. In this article we study the permutation routing, the rr-central routing and the general (,k)(\ell,k)-routing problems on plane grids, that is square grids, triangular grids and hexagonal grids. We use the \emph{store-and-forward} Δ\Delta-port model, and we consider both full and half-duplex networks. The main contributions are the following: \begin{itemize} \item[1.] Tight permutation routing algorithms on full-duplex hexagonal grids, and half duplex triangular and hexagonal grids. \item[2.] Tight rr-central routing algorithms on triangular and hexagonal grids. \item[3.] Tight (k,k)(k,k)-routing algorithms on square, triangular and hexagonal grids. \item[4.] Good approximation algorithms (in terms of running time) for (,k)(\ell,k)-routing on square, triangular and hexagonal grids, together with new lower bounds on the running time of any algorithm using shortest path routing. \end{itemize} \noindent All these algorithms are completely distributed, i.e. can be implemented independently at each node. Finally, we also formulate the (,k)(\ell,k)-routing problem as a \textsc{Weighted Edge Coloring} problem on bipartite graphs

    Universal Wormhole Routing

    Get PDF
    In this paper, we examine the wormhole routing problem in terms of the “congestion” c and “dilation” d for a set of packet paths. We show, with mild restrictions, that there is a simple randomized algorithm for routing any set of P packets in O(cdη+cLηlogP) time with high probability, where L is the number of flits in a packet, and η=min{d,L}; only a constant number of flits are stored in each queue at any time. Using this result, we show that a fat-tree network of area Θ(A) can simulate wormhole routing on any network of comparable area with O(log^3 A) slowdown, when all worms have the same length. Variable-length worms are also considered. We run some simulations on the fat-tree which show that not only does wormhole routing tend to perform better than the more heavily studied store-and-forward routing in this context, but that performance superior to our provable bound is attainable in practice

    An optimal permutation routing algorithm on full-duplex hexagonal networks

    Get PDF
    Distributed Computing and NetworkingInternational audienceIn the permutation routing problem, each processor is the origin of at most one packet and the destination of no more than one packet. The goal is to minimize the number of time steps required to route all packets to their respective destinations, under the constraint that each link can be crossed simultaneously by no more than one packet. We study this problem in a hexagonal network, i.e. a finite subgraph of a triangular grid, which is a widely used network in practical applications. We present an optimal distributed permutation routing algorithm on full-duplex hexagonal networks, using the addressing scheme described by F.G. Nocetti, I. Stojmenovic and J. Zhang (IEEE TPDS 13(9): 962-971, 2002). Furthermore, we prove that this algorithm is oblivious and translation invariant
    corecore