Search CORE

2,445 research outputs found

Directed taskgraph scheduling using simulated annealing

Author: D'Hollander Erik
Devis Yves
Publication venue: 'Informa UK Limited'
Publication date: 01/01/1991
Field of study

Measuring and Understanding Throughput of Network Topologies

Author: Godfrey P. Brighten
Jyothi Sangeetha Abdu
Kolla Alexandra
Singla Ankit
Publication venue
Publication date: 14/11/2016
Field of study

High throughput is of particular interest in data center and HPC networks. Although myriad network topologies have been proposed, a broad head-to-head comparison across topologies and across traffic patterns is absent, and the right way to compare worst-case throughput performance is a subtle problem. In this paper, we develop a framework to benchmark the throughput of network topologies, using a two-pronged approach. First, we study performance on a variety of synthetic and experimentally-measured traffic matrices (TMs). Second, we show how to measure worst-case throughput by generating a near-worst-case TM for any given topology. We apply the framework to study the performance of these TMs in a wide range of network topologies, revealing insights into the performance of topologies with scaling, robustness of performance across TMs, and the effect of scattered workload placement. Our evaluation code is freely available

arXiv.org e-Print Archive

CiteSeerX

Crossref

A system for routing arbitrary directed graphs on SIMD architectures

Author: Tomboulian Sherryl
Publication venue
Publication date
Field of study

There are many problems which can be described in terms of directed graphs that contain a large number of vertices where simple computations occur using data from connecting vertices. A method is given for parallelizing such problems on an SIMD machine model that is bit-serial and uses only nearest neighbor connections for communication. Each vertex of the graph will be assigned to a processor in the machine. Algorithms are given that will be used to implement movement of data along the arcs of the graph. This architecture and algorithms define a system that is relatively simple to build and can do graph processing. All arcs can be transversed in parallel in time O(T), where T is empirically proportional to the diameter of the interconnection network times the average degree of the graph. Modifying or adding a new arc takes the same time as parallel traversal

NASA Technical Reports Server

A study of the communication cost of the FFT on torus multicomputers

Author: Díaz de Cerio Ripalda Luis Manuel
González Colás Antonio María
Valero García Miguel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1995
Field of study

The computation of a one-dimensional FFT on a c-dimensional torus multicomputer is analyzed. Different approaches are proposed which differ in the way they use the interconnection network. The first approach is based on the multidimensional index mapping technique for the FFT computation. The second approach starts from a hypercube algorithm and then embeds the hypercube onto the torus. The third approach reduces the communication cost of the hypercube algorithm by pipelining the communication operations. A novel methodology to pipeline the communication operations on a torus is proposed. Analytical models are presented to compare the different approaches. This comparison study shows that the best approach depends on the number of dimensions of the torus and the communication start-up and transfer times. The analytical models allow us to select the most efficient approach for the available machine.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor

Author: Rao Hariprasad Nannapaneni
Publication venue
Publication date
Field of study

The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing

NASA Technical Reports Server

Hypercube algorithms on mesh connected multicomputers

Author: Díaz de Cerio Ripalda Luis Manuel
González Colás Antonio María
Valero García Miguel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002
Field of study

A new methodology named CALMANT (CC-cube Algorithms on Meshes and Tori) for mapping a type of algorithm that we call CC-cube algorithm onto multicomputers with hypercube, mesh, or torus interconnection topology is proposed. This methodology is suitable when the initial problem can be expressed as a set of processes that communicate through a hypercube topology (a CC-cube algorithm). There are many important algorithms that fit into the CC-cube type. CALMANT is based on three different techniques: (a) the standard embedding to assign the processes of the algorithm to the nodes of the mesh multicomputer; (b) the communication pipelining technique to increase the level of communication parallelism inherent in the CC-cube algorithms; and (c) optimal message-scheduling algorithms proposed in this work in order to avoid conflicts and minimizing in this way the communication time. Although CALMANT is proposed for multicomputers with different interconnection network topologies, the paper only focuses on the particular case of meshes.Peer ReviewedPostprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

A parallel simulated annealing algorithm for standard cell placement on a hypercube computer

Author: Jones Mark Howard
Publication venue
Publication date: 01/01/1987
Field of study

A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm

Illinois Digital Environment for Access to Learning and Scholarship Repository

NASA Technical Reports Server

Speeding up multiprocessor machines with reconfigurable optical interconnects - art. no. 61240K

Author: ARTUNDO I
Dambre Joni
DEBAES C
DESMET L
Heirman Wim
Thienpont Hugo
Van Campenhout Jan
Publication venue: SPIE-INT SOCIETY OPTICAL ENGINEERING
Publication date: 01/01/2006
Field of study

Ghent University Academic Bibliography

Hypercube matrix computation task

Author: Calalo R.
Imbriale W.
Liewer P.
Lyons J.
Manshadi F.
Patterson J.
Publication venue
Publication date
Field of study

The Hypercube Matrix Computation (Year 1986-1987) task investigated the applicability of a parallel computing architecture to the solution of large scale electromagnetic scattering problems. Two existing electromagnetic scattering codes were selected for conversion to the Mark III Hypercube concurrent computing environment. They were selected so that the underlying numerical algorithms utilized would be different thereby providing a more thorough evaluation of the appropriateness of the parallel environment for these types of problems. The first code was a frequency domain method of moments solution, NEC-2, developed at Lawrence Livermore National Laboratory. The second code was a time domain finite difference solution of Maxwell's equations to solve for the scattered fields. Once the codes were implemented on the hypercube and verified to obtain correct solutions by comparing the results with those from sequential runs, several measures were used to evaluate the performance of the two codes. First, a comparison was provided of the problem size possible on the hypercube with 128 megabytes of memory for a 32-node configuration with that available in a typical sequential user environment of 4 to 8 megabytes. Then, the performance of the codes was anlyzed for the computational speedup attained by the parallel architecture

NASA Technical Reports Server