On the average running time of odd-even merge sort
This paper is concerned with the average running time of Batcher's odd-even merge sort when implemented on a collection of processors. We consider the case where the size of the input is an arbitrary multiple of the number of processors used. We show that Batcher's odd-even merge (for two sorted lists of equal length) can be implemented to run efficiently on the average, as can odd-even merge sort. In the case of merging (sorting), the average is taken over all possible outcomes of the merging (all possible permutations of the elements). Under these circumstances, odd-even merge and odd-even merge sort achieve an optimal average running time. The constants involved are also quite small
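The compare-exchange structure of Batcher's network can be illustrated with a plain sequential sketch. This is not the paper's parallel implementation, only the underlying recursion; a power-of-two input length is assumed:

```python
def odd_even_merge(a):
    """Batcher's odd-even merge: `a` holds two sorted halves of equal,
    power-of-two length; returns the fully merged list."""
    n = len(a)
    if n <= 2:
        return sorted(a)
    # Recursively merge the even-indexed and odd-indexed subsequences.
    even = odd_even_merge(a[0::2])
    odd = odd_even_merge(a[1::2])
    merged = [0] * n
    merged[0::2] = even
    merged[1::2] = odd
    # One final round of compare-exchanges on adjacent interior pairs.
    for i in range(1, n - 1, 2):
        if merged[i] > merged[i + 1]:
            merged[i], merged[i + 1] = merged[i + 1], merged[i]
    return merged

def odd_even_merge_sort(a):
    """Merge sort built on Batcher's odd-even merge (len(a) a power of two)."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    return odd_even_merge(odd_even_merge_sort(a[:mid]) +
                          odd_even_merge_sort(a[mid:]))
```

In a parallel setting, each compare-exchange round maps to one step across the processors, which is what makes the average-case analysis of the rounds interesting.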
Ordered fast fourier transforms on a massively parallel hypercube multiprocessor
Design alternatives for ordered Fast Fourier Transform (FFT) algorithms were examined on massively parallel hypercube multiprocessors such as the Connection Machine. Particular emphasis is placed on reducing communication, which is known to dominate the overall computing time. To this end, the ordering and computational phases of the FFT were combined, and sequence-to-processor maps that reduce communication were used. The class of ordered transforms is expanded to include any FFT in which the order of the transform is the same as that of the input sequence. Two such orderings are examined, namely standard-order and A-order, which can be implemented with equal ease on the Connection Machine, where orderings are determined by geometries and priorities. If the sequence has N = 2^r elements and the hypercube has P = 2^d processors, then a standard-order FFT can be implemented with d + r/2 + 1 parallel transmissions. An A-order sequence can be transformed with 2d - r/2 parallel transmissions, which is r - d + 1 fewer than the standard order. A parallel method for computing the trigonometric coefficients is presented that does not use trigonometric functions or interprocessor communication. A performance of 0.9 GFLOPS was obtained for an A-order transform on the Connection Machine
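The notion of an ordered transform (output delivered in the same order as the input) can be illustrated with a generic sequential radix-2 FFT: the decimation-in-time butterfly leaves results in bit-reversed order unless the input is first permuted. This is a plain-Python sketch, not the Connection Machine algorithm, and it uses library trigonometry rather than the paper's coefficient method:

```python
import cmath

def bit_reverse_indices(n):
    """Permutation that bit-reverses log2(n)-bit indices (n a power of two)."""
    bits = n.bit_length() - 1
    return [int(format(i, f'0{bits}b')[::-1], 2) for i in range(n)]

def ordered_fft(x):
    """Iterative radix-2 DIT FFT; output in the same (natural) order as input."""
    n = len(x)
    a = [x[j] for j in bit_reverse_indices(n)]  # pre-permute the input
    m = 2
    while m <= n:
        w_m = cmath.exp(-2j * cmath.pi / m)  # principal m-th root of unity
        for k in range(0, n, m):
            w = 1.0
            for j in range(m // 2):
                t = w * a[k + j + m // 2]
                u = a[k + j]
                a[k + j] = u + t        # butterfly
                a[k + j + m // 2] = u - t
                w *= w_m
        m *= 2
    return a
```

On a hypercube, the cost of the pre-permutation and of the butterfly exchanges is what the sequence-to-processor maps in the paper are designed to reduce.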
Sorting Integers on the AP1000
Sorting is one of the classic problems of computer science. Whilst well
understood on sequential machines, the diversity of architectures amongst
parallel systems means that algorithms do not perform uniformly on all
platforms. This document describes the implementation of a radix-based
algorithm for sorting positive integers on a Fujitsu AP1000 Supercomputer,
which was constructed as an entry in the Joint Symposium on Parallel Processing
(JSPP) 1994 Parallel Software Contest (PSC94). Brief consideration is also
given to a full radix sort conducted in parallel across the machine. Comment: 1994 Project Report, 23 pages
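A radix-based integer sort of the kind the report describes can be sketched sequentially, assuming non-negative keys of bounded width; the parallel AP1000 version distributes the buckets across cells, which this sketch omits:

```python
def radix_sort(keys, bits_per_pass=8, key_bits=32):
    """LSD radix sort for non-negative integers.

    Each pass is a stable bucket (counting) sort on `bits_per_pass` bits,
    from least to most significant, so the whole sort is stable and runs
    in O(n * key_bits / bits_per_pass).
    """
    radix = 1 << bits_per_pass
    mask = radix - 1
    for shift in range(0, key_bits, bits_per_pass):
        buckets = [[] for _ in range(radix)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)  # stable within a bucket
        keys = [k for bucket in buckets for k in bucket]
    return keys
```

The choice of `bits_per_pass` trades the number of passes against bucket-table size, a trade-off that matters even more when buckets must be exchanged between processors.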
Novel Approach to Super Yang-Mills Theory on Lattice - Exact fermionic symmetry and "Ichimatsu" pattern -
We present a lattice theory with an exact fermionic symmetry, which mixes the
link and the fermionic variables. The staggered fermionic variables may be
reconstructed into a Majorana fermion in the continuum limit. The gauge action
has a novel structure. Though it is the ordinary plaquette action, two
different couplings are assigned in the "Ichimatsu pattern" or the checkered
pattern. In the naive continuum limit, the fermionic symmetry survives as a
continuum (or an ) symmetry. The transformation of the fermion is
proportional to the field strength multiplied by the difference of the two
gauge couplings in this limit. This work is an extension of our recently
proposed cell model toward the realization of supersymmetric Yang-Mills theory
on the lattice. Comment: 26 pages, 4 figures
Highly parallel computation
Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines. Current research focuses on which architectures are best suited to scientific computation; both multiple-instruction multiple-datastream (MIMD) and single-instruction multiple-datastream (SIMD) designs have produced good results to date, and neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or dataflow machines may be needed
Energy Scaling of Minimum-Bias Tunes
We propose that the flexibility offered by modern event-generator tuning
tools allows for more than just obtaining "best fits" to a collection of data.
In particular, we argue that the universality of the underlying physics model
can be tested by performing several, mutually independent, optimizations of the
generator parameters in different physical regions. For regions in which these
optimizations return similar and self-consistent parameter values, the model
can be considered universal. Deviations from this behavior can be associated
with a breakdown of the modeling, with the nature of the deviations giving
clues as to the nature of the breakdown. We apply this procedure to study the
energy scaling of a class of minimum-bias models based on multiple parton
interactions (MPI) and pT-ordered showers, implemented in the Pythia 6.4
generator. We find that a parameter controlling the strength of color
reconnections in the final state is the most important source of
non-universality in this model. Comment: 17 pages, 3 figures, 4 tables
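The universality test the abstract describes, independently optimizing model parameters in different physical regions and comparing the results, can be illustrated with a deliberately toy example (hypothetical data and a one-parameter fit, nothing from Pythia):

```python
def fit_slope(xs, ys):
    """Least-squares slope through the origin: argmin_a sum (y - a*x)^2."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Toy "measurements" drawn from the same underlying law y ~ 2x,
# observed in two disjoint kinematic regions.
region_lo = ([1, 2, 3], [2.1, 3.9, 6.0])
region_hi = ([10, 20, 30], [19.8, 40.4, 59.9])

a_lo = fit_slope(*region_lo)   # independent tune in the low region
a_hi = fit_slope(*region_hi)   # independent tune in the high region

# Similar best-fit values in independent regions support universality of
# the model; a large discrepancy would signal where the modeling breaks down.
universal = abs(a_lo - a_hi) / a_lo < 0.05
```

In the paper's setting the "slope" is replaced by generator parameters such as the color-reconnection strength, and the regions by different collision energies, but the logic of the consistency check is the same.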
Concurrent Image Processing Executive (CIPE)
The design and implementation of a Concurrent Image Processing Executive (CIPE), which is intended to become the support system software for a prototype high-performance science analysis workstation, are discussed. The target machine for this software is a JPL/Caltech Mark IIIfp Hypercube hosted by either a MASSCOMP 5600 or a Sun-3 or Sun-4 workstation; however, the design will accommodate other concurrent machines of similar architecture, i.e., local-memory, multiple-instruction multiple-data (MIMD) machines. The CIPE system provides both a multimode user interface and an applications programmer interface, and has been designed around four loosely coupled modules: (1) the user interface, (2) the host-resident executive, (3) the hypercube-resident executive, and (4) the application functions. The loose coupling between modules allows modification of a particular module without significantly affecting the other modules in the system. To enhance hypercube memory utilization and to allow expansion of image processing capabilities, a specialized program management method, incremental loading, was devised. To minimize data transfer between the host and the hypercube, a data management method that distributes, redistributes, and tracks data set information was implemented
Parallel smoothing algorithms for causal and acausal systems
By Darrin Taylor and Alan S. Willsky. Includes bibliographical references (p. 17-18). Caption title. Research supported in part by the Air Force Office of Scientific Research (AFOSR-88-0032), the U.S. Army Research Office (DAAL03-86-K-0171), and the Office of Naval Research (N00014-91-J-1001)