Search CORE

618 research outputs found

Routing with locality in partitioned-bus meshes

Author: Cheung Steven
Lau Francis CM
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1994
Field of study

We show that adding partitioned-buses (as opposed to long buses that span an entire row or column) to ordinary meshes can reduce the routing time by approximately one-third for permutation routing with locality. A matching time lower bound is also proved. The result can be generalized to multi-packet routing.published_or_final_versio

HKU Scholars Hub

OutFlank Routing: Increasing Throughput in Toroidal Interconnection Networks

Author: Versaci Francesco
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/10/2013
Field of study

We present a new, deadlock-free, routing scheme for toroidal interconnection networks, called OutFlank Routing (OFR). OFR is an adaptive strategy which exploits non-minimal links, both in the source and in the destination nodes. When minimal links are congested, OFR deroutes packets to carefully chosen intermediate destinations, in order to obtain travel paths which are only an additive constant longer than the shortest ones. Since routing performance is very sensitive to changes in the traffic model or in the router parameters, an accurate discrete-event simulator of the toroidal network has been developed to empirically validate OFR, by comparing it against other relevant routing strategies, over a range of typical real-world traffic patterns. On the 16x16x16 (4096 nodes) simulated network OFR exhibits improvements of the maximum sustained throughput between 14% and 114%, with respect to Adaptive Bubble Routing.Comment: 9 pages, 5 figures, to be presented at ICPADS 201

arXiv.org e-Print Archive

Crossref

AMR on the CM-2

Author: Berger Marsha J.
Saltzman Jeff S.
Publication venue
Publication date
Field of study

We describe the development of a structured adaptive mesh algorithm (AMR) for the Connection Machine-2 (CM-2). We develop a data layout scheme that preserves locality even for communication between fine and coarse grids. On 8K of a 32K machine we achieve performance slightly less than 1 CPU of the Cray Y-MP. We apply our algorithm to an inviscid compressible flow problem

NASA Technical Reports Server

Simulation Of Multi-core Systems And Interconnections And Evaluation Of Fat-Mesh Networks

Author: Zhang Yu
Publication venue
Publication date: 28/01/2009
Field of study

Simulators are very important in computer architecture research as they enable the exploration of new architectures to obtain detailed performance evaluation without building costly physical hardware. Simulation is even more critical to study future many-core architectures as it provides the opportunity to assess currently non-existing computer systems. In this thesis, a multiprocessor simulator is presented based on a cycle accurate architecture simulator called SESC. The shared L2 cache system is extended into a distributed shared cache (DSC) with a directory-based cache coherency protocol. A mesh network module is extended and integrated into SESC to replace the bus for scalable inter-processor communication. While these efforts complete an extended multiprocessor simulation infrastructure, two interconnection enhancements are proposed and evaluated. A novel non-uniform fat-mesh network structure similar to the idea of fat-tree is proposed. This non-uniform mesh network takes advantage of the average traffic pattern, typically all-to-all in DSC, to dedicate additional links for connections with heavy traffic (e.g., near the center) and fewer links for lighter traffic (e.g., near the periphery). Two fat-mesh schemes are implemented based on different routing algorithms. Analytical fat-mesh models are constructed by presenting the expressions for the traffic requirements of personalized all-to-all traffic. Performance improvements over the uniform mesh are demonstrated in the results from the simulator. A hybrid network consisting of one packet switching plane and multiple circuit switching planes is constructed as the second enhancement. The circuit switching planes provide fast paths between neighbors with heavy communication traffic. A compiler technique that abstracts the symbolic expressions of benchmarks' communication patterns can be used to help facilitate the circuit establishment

D-Scholarship@Pitt

On chip interconnects for multiprocessor turbo decoding architectures

Author: A. Baghdadi
Angiolini
Bahl
Benedetto
Benedetto
Benes
Benini
Boutillon
Cooley
Dinoi
Dobkin
Dolinar
Douillard
G. Masera
H. Moussa
Imase
Imase
Kwak
M. Martina
Martina
Masera
Opferman
Papaharalabos
Tarable
Viterbi
Wang
Zhang
Publication venue: Elsevier
Publication date: 01/01/2011
Field of study

International audienc

Crossref

HAL-Université de Bretagne Occidentale

HAL Descartes

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Introduction to a system for implementing neural net connections on SIMD architectures

Author: Tomboulian Sherryl
Publication venue
Publication date
Field of study

Neural networks have attracted much interest recently, and using parallel architectures to simulate neural networks is a natural and necessary application. The SIMD model of parallel computation is chosen, because systems of this type can be built with large numbers of processing elements. However, such systems are not naturally suited to generalized communication. A method is proposed that allows an implementation of neural network connections on massively parallel SIMD architectures. The key to this system is an algorithm permitting the formation of arbitrary connections between the neurons. A feature is the ability to add new connections quickly. It also has error recovery ability and is robust over a variety of network topologies. Simulations of the general connection system, and its implementation on the Connection Machine, indicate that the time and space requirements are proportional to the product of the average number of connections per neuron and the diameter of the interconnection network

NASA Technical Reports Server

Magic-State Functional Units: Mapping and Scheduling Multi-Level Distillation Circuits for Fault-Tolerant Quantum Architectures

Author: Chong Frederic T.
Ding Yongshan
Franklin Diana
Holmes Adam
Javadi-Abhari Ali
Martonosi Margaret
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 04/09/2018
Field of study

Quantum computers have recently made great strides and are on a long-term path towards useful fault-tolerant computation. A dominant overhead in fault-tolerant quantum computation is the production of high-fidelity encoded qubits, called magic states, which enable reliable error-corrected computation. We present the first detailed designs of hardware functional units that implement space-time optimized magic-state factories for surface code error-corrected machines. Interactions among distant qubits require surface code braids (physical pathways on chip) which must be routed. Magic-state factories are circuits comprised of a complex set of braids that is more difficult to route than quantum circuits considered in previous work [1]. This paper explores the impact of scheduling techniques, such as gate reordering and qubit renaming, and we propose two novel mapping techniques: braid repulsion and dipole moment braid rotation. We combine these techniques with graph partitioning and community detection algorithms, and further introduce a stitching algorithm for mapping subgraphs onto a physical machine. Our results show a factor of 5.64 reduction in space-time volume compared to the best-known previous designs for magic-state factories.Comment: 13 pages, 10 figure

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref