659 research outputs found

    On the average running time of odd-even merge sort

    This paper is concerned with the average running time of Batcher's odd-even merge sort when implemented on a collection of processors. We consider the case where $n$, the size of the input, is an arbitrary multiple of the number $p$ of processors used. We show that Batcher's odd-even merge (for two sorted lists of length $n$ each) can be implemented to run in time $O((n/p)(\log (2+p^2/n)))$ on the average, and that odd-even merge sort can be implemented to run in time $O((n/p)(\log n+\log p\log (2+p^2/n)))$ on the average. In the case of merging (sorting), the average is taken over all possible outcomes of the merging (all possible permutations of $n$ elements). That means that odd-even merge and odd-even merge sort have an optimal average running time if $n\geq p^2$. The constants involved are also quite small.
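
    As a rough illustration of the comparator network this abstract refers to, the following is a minimal sequential sketch of Batcher's odd-even merge sort. It is not the paper's multiprocessor implementation (which assigns n/p elements to each of p processors); the function name `odd_even_merge_sort` and the restriction to power-of-two input lengths are assumptions made for this example.

```python
def odd_even_merge_sort(values):
    """Sort a list with Batcher's odd-even merge sort network (sequential sketch).

    Assumption for this sketch: len(values) is a power of two (or 0).
    """
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two input length")

    p = 1
    while p < n:              # p: length of the sorted runs being merged
        k = p
        while k >= 1:         # k: distance between compared elements
            for j in range(k % p, n - k, 2 * k):
                for i in range(min(k, n - j - k)):
                    lo, hi = i + j, i + j + k
                    # Only compare elements that belong to the same 2p-block.
                    if lo // (2 * p) == hi // (2 * p) and a[lo] > a[hi]:
                        a[lo], a[hi] = a[hi], a[lo]
            k //= 2
        p *= 2
    return a


print(odd_even_merge_sort([5, 3, 8, 1, 9, 2, 7, 4]))
```

    The sequence of compare-exchange positions does not depend on the data, which is what makes the network suitable for parallel execution: all comparisons for a fixed pair (p, k) touch disjoint positions and could run concurrently.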

    Ordered fast fourier transforms on a massively parallel hypercube multiprocessor

    Design alternatives for ordered Fast Fourier Transform (FFT) algorithms were examined on massively parallel hypercube multiprocessors such as the Connection Machine. Particular emphasis is placed on reducing communication, which is known to dominate the overall computing time. To this end, the ordering and computational phases of the FFT were combined, and sequence-to-processor maps that reduce communication were used. The class of ordered transforms is expanded to include any FFT in which the order of the transform is the same as that of the input sequence. Two such orderings are examined, namely standard-order and A-order, which can be implemented with equal ease on the Connection Machine, where orderings are determined by geometries and priorities. If the sequence has N = 2^r elements and the hypercube has P = 2^d processors, then a standard-order FFT can be implemented with d + r/2 + 1 parallel transmissions. An A-order sequence can be transformed with 2d - r/2 parallel transmissions, which is r - d + 1 fewer than the standard order. A parallel method for computing the trigonometric coefficients is presented that does not use trigonometric functions or interprocessor communication. A performance of 0.9 GFLOPS was obtained for an A-order transform on the Connection Machine.
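
    The transmission counts quoted in this abstract can be checked with a few lines of arithmetic. The sketch below only evaluates the two formulas as stated; the function names and the sample values r = 20, d = 16 are hypothetical and chosen for illustration, not taken from the report.

```python
from fractions import Fraction


def standard_order_transmissions(d, r):
    """Parallel transmissions for a standard-order FFT: d + r/2 + 1."""
    return d + Fraction(r, 2) + 1


def a_order_transmissions(d, r):
    """Parallel transmissions for an A-order FFT: 2d - r/2."""
    return 2 * d - Fraction(r, 2)


# Hypothetical example: N = 2^20 elements on P = 2^16 processors.
r, d = 20, 16
standard = standard_order_transmissions(d, r)
a_order = a_order_transmissions(d, r)
saving = standard - a_order

print(standard, a_order, saving)
# The saving matches the r - d + 1 figure stated in the abstract.
assert saving == r - d + 1
```

    Subtracting the two formulas gives (d + r/2 + 1) - (2d - r/2) = r - d + 1 for any r and d, so the check holds beyond the sample values.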

    Sorting Integers on the AP1000

    Sorting is one of the classic problems of computer science. Whilst well understood on sequential machines, the diversity of architectures amongst parallel systems means that algorithms do not perform uniformly on all platforms. This document describes the implementation of a radix-based algorithm for sorting positive integers on a Fujitsu AP1000 Supercomputer, which was constructed as an entry in the Joint Symposium on Parallel Processing (JSPP) 1994 Parallel Software Contest (PSC94). Brief consideration is also given to a full radix sort conducted in parallel across the machine. Comment: 1994 Project Report, 23 pages
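
    For readers unfamiliar with radix-based sorting, here is a minimal sequential sketch of a least-significant-digit radix sort for non-negative integers. It is not the AP1000 implementation described in the report, whose details are not given in this abstract; the function name `radix_sort` and the `bits_per_pass` parameter are assumptions made for this example.

```python
def radix_sort(values, bits_per_pass=8):
    """LSD radix sort for non-negative integers (sequential sketch)."""
    a = list(values)
    if any(v < 0 for v in a):
        raise ValueError("this sketch assumes non-negative integers")
    if not a:
        return a

    radix = 1 << bits_per_pass
    mask = radix - 1
    max_value = max(a)
    shift = 0

    # One stable counting-sort pass per digit, least significant digit first.
    while (max_value >> shift) > 0:
        counts = [0] * radix
        for v in a:
            counts[(v >> shift) & mask] += 1

        # Turn digit counts into starting offsets (exclusive prefix sums).
        total = 0
        for digit in range(radix):
            counts[digit], total = total, total + counts[digit]

        # Scatter each value to its slot, preserving the existing order.
        out = [0] * len(a)
        for v in a:
            digit = (v >> shift) & mask
            out[counts[digit]] = v
            counts[digit] += 1

        a = out
        shift += bits_per_pass
    return a


print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
```

    In a parallel setting, the counting and prefix-sum steps are typically the parts that require communication between processors, while the per-element digit extraction is local.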

    Novel Approach to Super Yang-Mills Theory on Lattice - Exact fermionic symmetry and "Ichimatsu" pattern -

    We present a lattice theory with an exact fermionic symmetry, which mixes the link and the fermionic variables. The staggered fermionic variables may be reconstructed into a Majorana fermion in the continuum limit. The gauge action has a novel structure. Though it is the ordinary plaquette action, two different couplings are assigned in the "Ichimatsu pattern" or the checkered pattern. In the naive continuum limit, the fermionic symmetry survives as a continuum (or an $O(a^0)$) symmetry. The transformation of the fermion is proportional to the field strength multiplied by the difference of the two gauge couplings in this limit. This work is an extension of our recently proposed cell model toward the realization of supersymmetric Yang-Mills theory on lattice. Comment: 26 pages, 4 figures

    Highly parallel computation

    Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines and current research focuses on which architectures designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed.

    Energy Scaling of Minimum-Bias Tunes

    We propose that the flexibility offered by modern event-generator tuning tools allows for more than just obtaining "best fits" to a collection of data. In particular, we argue that the universality of the underlying physics model can be tested by performing several, mutually independent, optimizations of the generator parameters in different physical regions. For regions in which these optimizations return similar and self-consistent parameter values, the model can be considered universal. Deviations from this behavior can be associated with a breakdown of the modeling, with the nature of the deviations giving clues as to the nature of the breakdown. We apply this procedure to study the energy scaling of a class of minimum-bias models based on multiple parton interactions (MPI) and pT-ordered showers, implemented in the Pythia 6.4 generator. We find that a parameter controlling the strength of color reconnections in the final state is the most important source of non-universality in this model. Comment: 17 pages, 3 figures, 4 tables

    Concurrent Image Processing Executive (CIPE)

    The design and implementation of a Concurrent Image Processing Executive (CIPE), which is intended to become the support system software for a prototype high-performance science analysis workstation, are discussed. The target machine for this software is a JPL/Caltech Mark IIIfp Hypercube hosted by either a MASSCOMP 5600 or a Sun-3 or Sun-4 workstation; however, the design will accommodate other concurrent machines of similar architecture, i.e., local-memory, multiple-instruction-multiple-data (MIMD) machines. The CIPE system provides both a multimode user interface and an applications programmer interface, and has been designed around four loosely coupled modules: (1) user interface, (2) host-resident executive, (3) hypercube-resident executive, and (4) application functions. The loose coupling between modules allows modification of a particular module without significantly affecting the other modules in the system. In order to enhance hypercube memory utilization and to allow expansion of image processing capabilities, a specialized program management method, incremental loading, was devised. To minimize data transfer between host and hypercube, a data management method that distributes, redistributes, and tracks data set information was implemented.

    Parallel smoothing algorithms for causal and acausal systems

    Includes bibliographical references (p. 17-18). Caption title. Research supported in part by the Air Force Office of Scientific Research, AFOSR-88-0032. Research supported in part by the U.S. Army Research Office, DAAL03-86-K-0171. Research supported in part by the Office of Naval Research, N00014-91-J-1001. Darrin Taylor and Alan S. Willsky.