2,954 research outputs found
Ordered fast fourier transforms on a massively parallel hypercube multiprocessor
Design alternatives for ordered Fast Fourier Transformation (FFT) algorithms were examined on massively parallel hypercube multiprocessors such as the Connection Machine. Particular emphasis is placed on reducing communication which is known to dominate the overall computing time. To this end, the order and computational phases of the FFT were combined, and the sequence to processor maps that reduce communication were used. The class of ordered transforms is expanded to include any FFT in which the order of the transform is the same as that of the input sequence. Two such orderings are examined, namely, standard-order and A-order which can be implemented with equal ease on the Connection Machine where orderings are determined by geometries and priorities. If the sequence has N = 2 exp r elements and the hypercube has P = 2 exp d processors, then a standard-order FFT can be implemented with d + r/2 + 1 parallel transmissions. An A-order sequence can be transformed with 2d - r/2 parallel transmissions which is r - d + 1 fewer than the standard order. A parallel method for computing the trigonometric coefficients is presented that does not use trigonometric functions or interprocessor communication. A performance of 0.9 GFLOPS was obtained for an A-order transform on the Connection Machine
Sorting Integers on the AP1000
Sorting is one of the classic problems of computer science. Whilst well
understood on sequential machines, the diversity of architectures amongst
parallel systems means that algorithms do not perform uniformly on all
platforms. This document describes the implementation of a radix based
algorithm for sorting positive integers on a Fujitsu AP1000 Supercomputer,
which was constructed as an entry in the Joint Symposium on Parallel Processing
(JSPP) 1994 Parallel Software Contest (PSC94). Brief consideration is also
given to a full radix sort conducted in parallel across the machine.Comment: 1994 Project Report, 23 page
On the average running time of odd-even merge sort
This paper is concerned with the average running time of Batcher's odd-even merge sort when implemented on a collection of processors. We consider the case where , the size of the input, is an arbitrary multiple of the number of processors used. We show that Batcher's odd-even merge (for two sorted lists of length each) can be implemented to run in time on the average, and that odd-even merge sort can be implemented to run in time on the average. In the case of merging (sorting), the average is taken over all possible outcomes of the merging (all possible permutations of elements). That means that odd-even merge and odd-even merge sort have an optimal average running time if . The constants involved are also quite small
Hypercube algorithms on mesh connected multicomputers
A new methodology named CALMANT (CC-cube Algorithms on Meshes and Tori) for mapping a type of algorithm that we call CC-cube algorithm onto multicomputers with hypercube, mesh, or torus interconnection topology is proposed. This methodology is suitable when the initial problem can be expressed as a set of processes that communicate through a hypercube topology (a CC-cube algorithm). There are many important algorithms that fit into the CC-cube type. CALMANT is based on three different techniques: (a) the standard embedding to assign the processes of the algorithm to the nodes of the mesh multicomputer; (b) the communication pipelining technique to increase the level of communication parallelism inherent in the CC-cube algorithms; and (c) optimal message-scheduling algorithms proposed in this work in order to avoid conflicts and minimizing in this way the communication time. Although CALMANT is proposed for multicomputers with different interconnection network topologies, the paper only focuses on the particular case of meshes.Peer ReviewedPostprint (published version
- …