Search CORE

1,008 research outputs found

Sorting Integers on the AP1000

Author: Lynes Andrew
Weaver Lex
Publication venue
Publication date: 23/04/2000
Field of study

Sorting is one of the classic problems of computer science. Whilst well understood on sequential machines, the diversity of architectures amongst parallel systems means that algorithms do not perform uniformly on all platforms. This document describes the implementation of a radix based algorithm for sorting positive integers on a Fujitsu AP1000 Supercomputer, which was constructed as an entry in the Joint Symposium on Parallel Processing (JSPP) 1994 Parallel Software Contest (PSC94). Brief consideration is also given to a full radix sort conducted in parallel across the machine.Comment: 1994 Project Report, 23 page

arXiv.org e-Print Archive

CERN Document Server

A parallel algorithm for global routing

Author: Banerjee Prithviraj
Brouwer Randall J.
Publication venue
Publication date
Field of study

A Parallel Hierarchical algorithm for Global Routing (PHIGURE) is presented. The router is based on the work of Burstein and Pelavin, but has many extensions for general global routing and parallel execution. Main features of the algorithm include structured hierarchical decomposition into separate independent tasks which are suitable for parallel execution and adaptive simplex solution for adding feedthroughs and adjusting channel heights for row-based layout. Alternative decomposition methods and the various levels of parallelism available in the algorithm are examined closely. The algorithm is described and results are presented for a shared-memory multiprocessor implementation

NASA Technical Reports Server

Integration of tools for the Design and Assessment of High-Performance, Highly Reliable Computing Systems (DAHPHRS), phase 1

Author: Baker R.
Frank G.
Gray G.
Scheper C.
Yalamanchili S.
Publication venue
Publication date
Field of study

Systems for Space Defense Initiative (SDI) space applications typically require both high performance and very high reliability. These requirements present the systems engineer evaluating such systems with the extremely difficult problem of conducting performance and reliability trade-offs over large design spaces. A controlled development process supported by appropriate automated tools must be used to assure that the system will meet design objectives. This report describes an investigation of methods, tools, and techniques necessary to support performance and reliability modeling for SDI systems development. Models of the JPL Hypercubes, the Encore Multimax, and the C.S. Draper Lab Fault-Tolerant Parallel Processor (FTPP) parallel-computing architectures using candidate SDI weapons-to-target assignment algorithms as workloads were built and analyzed as a means of identifying the necessary system models, how the models interact, and what experiments and analyses should be performed. As a result of this effort, weaknesses in the existing methods and tools were revealed and capabilities that will be required for both individual tools and an integrated toolset were identified

NASA Technical Reports Server

Parallel Kalman Filtering on the Connection Machine

Author: Krecker Donald K
Palis Michael A
Publication venue: ScholarlyCommons
Publication date: 01/11/1990
Field of study

A parallel algorithm for square root Kalman filtering is developed and implemented on the Connection Machine (CM). The algorithm makes efficient use of parallel prefix or scan operations which are primitive instructions in the CM. Performance measurements show that the CM filter runs in time linear in the state vector size. This represents a great improvement over serial implementations which run in cubic time. A specific multiple target tracking application is also considered, in which several targets (e.g., satellites, aircrafts and missiles) are to be tracked simultaneously, each requiring one or more filters. A parallel algorithm is developed which, for fixed size filters, runs in constant time, independent of the number of filters simultaneously processed

ScholarlyCommons@Penn

Multiphase complete exchange: A theoretical analysis

Author: Bokhari Shahid H.
Publication venue
Publication date
Field of study

Complete Exchange requires each of N processors to send a unique message to each of the remaining N-1 processors. For a circuit switched hypercube with N = 2(sub d) processors, the Direct and Standard algorithms for Complete Exchange are optimal for very large and very small message sizes, respectively. For intermediate sizes, a hybrid Multiphase algorithm is better. This carries out Direct exchanges on a set of subcubes whose dimensions are a partition of the integer d. The best such algorithm for a given message size m could hitherto only be found by enumerating all partitions of d. The Multiphase algorithm is analyzed assuming a high performance communication network. It is proved that only algorithms corresponding to equipartitions of d (partitions in which the maximum and minimum elements differ by at most 1) can possibly be optimal. The run times of these algorithms plotted against m form a hull of optimality. It is proved that, although there is an exponential number of partitions, (1) the number of faces on this hull is Theta(square root of d), (2) the hull can be found in theta(square root of d) time, and (3) once it has been found, the optimal algorithm for any given m can be found in Theta(log d) time. These results provide a very fast technique for minimizing communication overhead in many important applications, such as matrix transpose, Fast Fourier transform, and ADI

NASA Technical Reports Server

Hierarchical gate-array routing on a hypercube multiprocessor

Author: Mudge Trevor N.
Olukotun O. A.
Publication venue: 'Elsevier BV'
Publication date: 01/04/1990
Field of study

Gate-arrays are the most common design style for semicustom VLSI integrated circuits. An important part of the gate-array design process is the routing of wires between the logic elements, which is an extremely compute-intensive operation. This paper presents an algorithm for routing gate-arrays that uses a hypercube connected parallel processor to provide the necessary computation power. In order to make optimal use of the hypercube, the routing algorithm is organized so that interprocessor communication is kept to minimum. It occurs only during the global routing and crossing placement phases of the algorithm, which constitute less than 15% of the total running time of the algorithm. On the basis of the results of executing the algorithm on two gate-array benchmarks the case is made for using hypercube multiprocessors as accelerators for compute-intensive CAD operations.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/28650/1/0000466.pd

Deep Blue Documents at the University of Michigan

A class of hierarchical graphs as topologies for interconnection networks

Author: Hsu H.-C.
Lai P.-L.
Stewart I.A.
Tsai C.-H.
Publication venue: Elsevier
Publication date: 01/06/2010
Field of study

We study some topological and algorithmic properties of a recently defined hierarchical interconnection network, the hierarchical crossed cube HCC(k,n), which draws upon constructions used within the well-known hypercube and also the crossed cube. In particular, we study: the construction of shortest paths between arbitrary vertices in HCC(k,n); the connectivity of HCC(k,n); and one-to-all broadcasts in parallel machines whose underlying topology is HCC(k,n) (with both one-port and multi-port store-and-forward models of communication). Moreover, (some of) our proofs are applicable not just to hierarchical crossed cubes but to hierarchical interconnection networks formed by replacing crossed cubes with other families of interconnection networks. As such, we provide a generic construction with accompanying generic results relating to some topological and algorithmic properties of a wide range of hierarchical interconnection networks

Durham Research Online

Elsevier - Publisher Connector

Parallel Computation on Hypercube-Like Machines.

Author: Kwon Kyung Hee
Publication venue: LSU Digital Commons
Publication date: 01/01/1991
Field of study

The hypercube interconnection network has been recognized to be very suitable for a parallel computing architecture due to its attractive topological properties. Recently, several modified hypercubes have been propose to improve the performance of a hypercube. This dissertation deals with two modified hypercubes, the X-hypercube and the Z-cube. The X-hypercube is a variant of the hypercube, with the same amount of hardware but a diameter of only

\lceil

(n + 1)/2

\rceil

in a hypercube of dimension n. The Z-cube has only 75 percent of the edges of a hypercube with the same number vertices and the same diameter as the hypercube. In this dissertation, we investigate some topological properties and the effectiveness of the X-hypercube and the Z-cube in their combinatorial and computational aspects. We give the optimal or nearly optimal data communication algorithms including routing, broadcasting, and census function for the X-hypercube and the Z-cube. We also give the optimal embedding algorithms between the X-hypercube and the hypercube. It is shown that the average distance between vertices in a X-hypercube is roughly 13/16 of that in a hypercube. This implies that a X-hypercube achieves the better average communication performance than a hypercube. In addition, a set of fundamental SIMD algorithms for a X-hypercube is given. Our results indicate that the X-hypercube makes an improvement in performance over the hypercube, but not as much as the reduction in a diameter, and the Z-cube is a good alternative for the hypercube as far as the VLSI implementation is of major concern

Louisiana State University