On the design of a high-performance adaptive router for CC-NUMA multiprocessors
Copyright © 2003 IEEE. This work presents the design and evaluation of an adaptive packet router aimed at supporting CC-NUMA traffic. We exploit a simple and efficient packet injection mechanism to avoid deadlock, which yields fully adaptive routing with only three virtual channels. In addition, we selectively use output buffers to implement the most heavily used virtual paths in order to reduce head-of-line blocking. The careful implementation of these features has resulted in a good trade-off between network performance and hardware cost. The outcome of this research is a High-Performance Adaptive Router (HPAR), which balances the needs of parallel applications: minimal network latency at low loads and high throughput at heavy loads. The paper includes an evaluation in which HPAR is compared with other adaptive routers using FIFO input buffering, with or without additional virtual channels to reduce head-of-line blocking. This evaluation considers both the VLSI cost of each router and its performance under synthetic and real application workloads. To make the comparison fair, all the routers use the same efficient deadlock avoidance mechanism. In all the experiments, HPAR exhibited the best response among all the routers tested. The throughput gains ranged from 10 percent to 40 percent with respect to its most direct rival, which employs more hardware resources. Other results showed that HPAR achieves up to 83 percent of its theoretical maximum throughput under random traffic and up to 70 percent when running real applications. Moreover, the observed packet latencies were comparable to those exhibited by simpler routers. Therefore, HPAR can be considered a suitable candidate to implement packet interchange in future generations of CC-NUMA multiprocessors.
Valentín Puente, José-Ángel Gregorio, Ramón Beivide, and Cruz Iz
Distance-hereditary embeddings of circulant graphs
© 2003 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
In this paper we present a distance-hereditary decomposition of optimal chordal rings of 2k² nodes into a set of rings of 2k nodes, where k is the diameter. All the rings belonging to this set have the same length, and their diameter equals the diameter of the chordal ring in which they are embedded. The members of this embedded set of rings are non-disjoint and preserve the minimal routing of the original circulant graph. Besides its practical consequences, our research allows these optimal circulant graphs to be presented as a particular evolution of the traditional ring topology.
Carmen Martinez, Ramon Beivide, Jaime Gutierrez, [Maria] Cruz Iz
The case of chaotic routing revisited
This paper presents a new evaluation of the Chaos router, a cut-through non-minimal adaptive router, which was reported to reach 95% of its theoretical throughput limit at a time when most router proposals reached only 60 to 80%. We revisit the Chaos router design, provide a new view of its strengths, and relate them to the state of the art in adaptive router design. In particular, our analysis has identified a parameter of the router design that was not emphasized in the network evaluation presented by its authors, but that is the key to its outstanding performance. This parameter is the channel operation mode. By using the links in half-duplex mode, the router allows adjacent network nodes to allocate their bandwidth to one direction or the other in response to traffic needs. This channel operation mode reduces base latency and increases network throughput compared to full-duplex mode for most synthetic traffic patterns.
Cruz Izu, Ramon Beivide and Jose Angel Gregori
A new derivative of midimew-connected mesh network
In this paper, we present a derivative of the Midimew-connected Mesh Network (MMN), called the Derived MMN (DMMN), obtained by reassigning the free links to higher-level interconnection in order to optimize the performance of the MMN. We present the architecture of the DMMN, its node addressing, and its message routing, and we evaluate its static network performance. It is shown that the proposed DMMN possesses several attractive features, including constant degree, small diameter, low cost, small average distance, moderate bisection width, and the same fault-tolerant performance as other conventional and hierarchical interconnection networks. With the same node degree, arc connectivity, bisection width, and wiring complexity, the average distance of the DMMN is lower than that of other networks.
Necessary and Sufficient Conditions for Deadlock-free Networks
In this paper we develop a new and generic theory of the necessary and sufficient conditions for deadlock-free routing in interconnection networks. An extension of the channel dependency graph described by Dally is defined: the channel dynamic dependency graph. The main achievement of this new concept is a consequence of introducing the notion of time and the flow control function into its definition. Our theory remains valid for different routing and flow control functions, showing that even if the conditions of Duato's theorem are not fulfilled, the network can be deadlock-free.
Index Terms - Multicomputer networks, deadlock, flow control, routing
1 Introduction Many recent experimental and commercial parallel computers [13] use direct networks for low latency, high bandwidth interprocessor communication. The typical direct networks are k-ary n-cube structures [7], which are cubes with dimension n and k nodes in each dimension. Rings, meshes and tori are included in this class of networ..
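For readers unfamiliar with the static channel dependency graph that this abstract extends, the following Python sketch builds one for a unidirectional ring and checks it for cycles. It illustrates only the classic acyclicity criterion attributed to Dally, not the dynamic graph proposed in the paper. The function names, the `(vc, node)` channel encoding, and the dateline rule (switch to virtual channel 1 after the wraparound link) are assumptions made for this example.

```python
from itertools import product


def ring_dependencies(n, use_dateline):
    """Channel dependency edges for a unidirectional n-node ring.

    A channel is encoded as (vc, node): the link leaving `node` on
    virtual channel `vc`. Encoding and dateline rule are illustrative.
    """
    edges = set()
    for src, dst in product(range(n), repeat=2):
        if src == dst:
            continue
        node, vc, prev = src, 0, None
        while node != dst:
            channel = (vc, node)
            if prev is not None:
                # A packet holding `prev` may request `channel` next.
                edges.add((prev, channel))
            prev = channel
            if use_dateline and node == n - 1:
                vc = 1  # switch VC after crossing the wraparound link
            node = (node + 1) % n
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as edge pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    WHITE, GREY, BLACK = 0, 1, 2
    color = dict.fromkeys(graph, WHITE)

    def visit(u):
        color[u] = GREY
        for v in graph[u]:
            if color[v] == GREY or (color[v] == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and visit(u) for u in graph)


if __name__ == "__main__":
    print("single VC, cyclic:", has_cycle(ring_dependencies(4, False)))
    print("dateline VCs, cyclic:", has_cycle(ring_dependencies(4, True)))
```

With a single virtual channel the dependencies close a loop around the ring, while the two-channel dateline variant breaks it. An acyclic static graph is sufficient for deadlock freedom but, as the abstract argues, not necessary.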
Parallel Simulation of Message Routing Networks
A realistic model of a message router designed to be used in a distributed memory parallel computer is presented. The behaviour of this model is analyzed using two discrete-event simulators: a classical, sequential one and a parallel one that uses a conservative synchronization strategy. The use of parallel simulators allows us to run faster simulations if a parallel computer is available. Different factors that improve the performance of the parallel simulation are discussed, focusing on the model under study and the available computer: a network of transputers. These factors are the load of the model being simulated, the grain size of the simulator, and the simulator's ability to exploit the lookahead property of the model.
1 Introduction Our research group is interested in the analysis and design of message routers to be used in distributed memory multicomputer systems. The effectiveness of our proposals has, of course, to be demonstrated. Three possibilities are commonly considered..
Dense Gaussian networks: Suitable topologies for on-chip multiprocessors
This paper explores the suitability of dense circulant graphs of degree four for the design of on-chip interconnection networks. Networks based on these graphs reduce the torus diameter by a factor of √2, which translates into significant performance gains for unicast traffic. In addition, they are clearly superior to tori when managing collective communications. This paper introduces a new two-dimensional node labeling of the networks explored, which simplifies their analysis and exploitation. In particular, it provides simple and optimal solutions to two important architectural issues: routing and broadcasting. Other implementation issues, such as network folding and scalability through hierarchical networks, are also explored in this work.
Carmen Martínez, Enrique Vallejo, Ramón Beivide, Cruz Izu and Miquel Moret
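The √2 diameter reduction can be checked numerically with a short breadth-first search. The sketch below assumes the usual dense degree-four circulant construction, with N = 2k² + 2k + 1 nodes and jumps k and k + 1. The abstract does not spell out this construction, so it is an assumption here, and all function names are illustrative.

```python
from collections import deque


def diameter(n, neighbours):
    """BFS eccentricity of node 0; equals the diameter for vertex-transitive graphs."""
    dist = {0: 0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in neighbours(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    assert len(dist) == n, "graph is not connected"
    return max(dist.values())


def circulant_neighbours(n, jumps):
    """Neighbour function of the circulant graph on n nodes with the given jumps."""
    return lambda u: [(u + s * j) % n for j in jumps for s in (1, -1)]


def torus_neighbours(side):
    """Neighbour function of a side x side 2D torus, nodes numbered row-major."""
    def neighbours(u):
        x, y = divmod(u, side)
        steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
        return [((x + dx) % side) * side + (y + dy) % side for dx, dy in steps]
    return neighbours


def compare(k, side):
    """Diameters of the assumed dense circulant and of a torus with equal node count."""
    n = 2 * k * k + 2 * k + 1
    assert n == side * side, "choose k so that the node count is a perfect square"
    d_circulant = diameter(n, circulant_neighbours(n, (k, k + 1)))
    d_torus = diameter(n, torus_neighbours(side))
    return d_circulant, d_torus


if __name__ == "__main__":
    for k, side in ((3, 5), (20, 29)):
        d_c, d_t = compare(k, side)
        print(f"N={side * side}: circulant {d_c}, torus {d_t}, ratio {d_t / d_c:.2f}")
```

The values k = 3 and k = 20 are chosen because 25 and 841 are perfect squares, so a square torus with exactly the same number of nodes exists for comparison. The ratio approaches √2 as k grows.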