On the average running time of odd-even merge sort
This paper is concerned with the average running time of Batcher's odd-even merge sort when implemented on a collection of processors. We consider the case where the size of the input is an arbitrary multiple of the number of processors used. We show that Batcher's odd-even merge (for two sorted lists of equal length) can be implemented to run efficiently on the average, as can odd-even merge sort. In the case of merging (sorting), the average is taken over all possible outcomes of the merging (all possible permutations of the elements). Under these circumstances, odd-even merge and odd-even merge sort achieve an optimal average running time. The constants involved are also quite small
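The compare-exchange structure of Batcher's network can be illustrated with a plain sequential sketch. This is not the paper's parallel implementation, only the underlying recursion; a power-of-two input length is assumed:

```python
def odd_even_merge(a):
    """Batcher's odd-even merge: `a` holds two sorted halves of equal,
    power-of-two length; returns the fully merged list."""
    n = len(a)
    if n <= 2:
        return sorted(a)
    # Recursively merge the even-indexed and odd-indexed subsequences.
    even = odd_even_merge(a[0::2])
    odd = odd_even_merge(a[1::2])
    merged = [0] * n
    merged[0::2] = even
    merged[1::2] = odd
    # One final round of compare-exchanges on adjacent interior pairs.
    for i in range(1, n - 1, 2):
        if merged[i] > merged[i + 1]:
            merged[i], merged[i + 1] = merged[i + 1], merged[i]
    return merged

def odd_even_merge_sort(a):
    """Merge sort built on Batcher's odd-even merge (len(a) a power of two)."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    return odd_even_merge(odd_even_merge_sort(a[:mid]) +
                          odd_even_merge_sort(a[mid:]))
```

In a parallel setting, each compare-exchange round maps to one step across the processors, which is what makes the average-case analysis of the rounds interesting.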
Ordered fast fourier transforms on a massively parallel hypercube multiprocessor
Design alternatives for ordered Fast Fourier Transform (FFT) algorithms were examined on massively parallel hypercube multiprocessors such as the Connection Machine. Particular emphasis is placed on reducing communication, which is known to dominate the overall computing time. To this end, the ordering and computational phases of the FFT were combined, and sequence-to-processor maps that reduce communication were used. The class of ordered transforms is expanded to include any FFT in which the order of the transform is the same as that of the input sequence. Two such orderings are examined, namely standard-order and A-order, which can be implemented with equal ease on the Connection Machine, where orderings are determined by geometries and priorities. If the sequence has N = 2^r elements and the hypercube has P = 2^d processors, then a standard-order FFT can be implemented with d + r/2 + 1 parallel transmissions. An A-order sequence can be transformed with 2d - r/2 parallel transmissions, which is r - d + 1 fewer than the standard order. A parallel method for computing the trigonometric coefficients is presented that does not use trigonometric functions or interprocessor communication. A performance of 0.9 GFLOPS was obtained for an A-order transform on the Connection Machine
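The notion of an ordered transform (output delivered in the same order as the input) can be illustrated with a generic sequential radix-2 FFT: the decimation-in-time butterfly leaves results in bit-reversed order unless the input is first permuted. This is a plain-Python sketch, not the Connection Machine algorithm, and it uses library trigonometry rather than the paper's coefficient method:

```python
import cmath

def bit_reverse_indices(n):
    """Permutation that bit-reverses log2(n)-bit indices (n a power of two)."""
    bits = n.bit_length() - 1
    return [int(format(i, f'0{bits}b')[::-1], 2) for i in range(n)]

def ordered_fft(x):
    """Iterative radix-2 DIT FFT; output in the same (natural) order as input."""
    n = len(x)
    a = [x[j] for j in bit_reverse_indices(n)]  # pre-permute the input
    m = 2
    while m <= n:
        w_m = cmath.exp(-2j * cmath.pi / m)  # principal m-th root of unity
        for k in range(0, n, m):
            w = 1.0
            for j in range(m // 2):
                t = w * a[k + j + m // 2]
                u = a[k + j]
                a[k + j] = u + t        # butterfly
                a[k + j + m // 2] = u - t
                w *= w_m
        m *= 2
    return a
```

On a hypercube, the cost of the pre-permutation and of the butterfly exchanges is what the sequence-to-processor maps in the paper are designed to reduce.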
Sorting Integers on the AP1000
Sorting is one of the classic problems of computer science. Whilst well
understood on sequential machines, the diversity of architectures amongst
parallel systems means that algorithms do not perform uniformly on all
platforms. This document describes the implementation of a radix-based
algorithm for sorting positive integers on a Fujitsu AP1000 Supercomputer,
which was constructed as an entry in the Joint Symposium on Parallel Processing
(JSPP) 1994 Parallel Software Contest (PSC94). Brief consideration is also
given to a full radix sort conducted in parallel across the machine. Comment: 1994 Project Report, 23 pages
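A radix-based integer sort of the kind the report describes can be sketched sequentially, assuming non-negative keys of bounded width; the parallel AP1000 version distributes the buckets across cells, which this sketch omits:

```python
def radix_sort(keys, bits_per_pass=8, key_bits=32):
    """LSD radix sort for non-negative integers.

    Each pass is a stable bucket (counting) sort on `bits_per_pass` bits,
    from least to most significant, so the whole sort is stable and runs
    in O(n * key_bits / bits_per_pass).
    """
    radix = 1 << bits_per_pass
    mask = radix - 1
    for shift in range(0, key_bits, bits_per_pass):
        buckets = [[] for _ in range(radix)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)  # stable within a bucket
        keys = [k for bucket in buckets for k in bucket]
    return keys
```

The choice of `bits_per_pass` trades the number of passes against bucket-table size, a trade-off that matters even more when buckets must be exchanged between processors.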
Novel Approach to Super Yang-Mills Theory on Lattice - Exact fermionic symmetry and "Ichimatsu" pattern -
We present a lattice theory with an exact fermionic symmetry, which mixes the
link and the fermionic variables. The staggered fermionic variables may be
reconstructed into a Majorana fermion in the continuum limit. The gauge action
has a novel structure. Though it is the ordinary plaquette action, two
different couplings are assigned in the "Ichimatsu pattern" or the checkered
pattern. In the naive continuum limit, the fermionic symmetry survives as a
continuum (or an ) symmetry. The transformation of the fermion is
proportional to the field strength multiplied by the difference of the two
gauge couplings in this limit. This work is an extension of our recently
proposed cell model toward the realization of supersymmetric Yang-Mills theory
on the lattice. Comment: 26 pages, 4 figures
Highly parallel computation
Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines. Current research focuses on which architectures are best suited to scientific computation; both multiple-instruction multiple-datastream (MIMD) and single-instruction multiple-datastream (SIMD) designs have produced good results to date, and neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or dataflow machines may be needed
Energy Scaling of Minimum-Bias Tunes
We propose that the flexibility offered by modern event-generator tuning
tools allows for more than just obtaining "best fits" to a collection of data.
In particular, we argue that the universality of the underlying physics model
can be tested by performing several, mutually independent, optimizations of the
generator parameters in different physical regions. For regions in which these
optimizations return similar and self-consistent parameter values, the model
can be considered universal. Deviations from this behavior can be associated
with a breakdown of the modeling, with the nature of the deviations giving
clues as to the nature of the breakdown. We apply this procedure to study the
energy scaling of a class of minimum-bias models based on multiple parton
interactions (MPI) and pT-ordered showers, implemented in the Pythia 6.4
generator. We find that a parameter controlling the strength of color
reconnections in the final state is the most important source of
non-universality in this model. Comment: 17 pages, 3 figures, 4 tables
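The universality test the abstract describes, independently optimizing model parameters in different physical regions and comparing the results, can be illustrated with a deliberately toy example (hypothetical data and a one-parameter fit, nothing from Pythia):

```python
def fit_slope(xs, ys):
    """Least-squares slope through the origin: argmin_a sum (y - a*x)^2."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Toy "measurements" drawn from the same underlying law y ~ 2x,
# observed in two disjoint kinematic regions.
region_lo = ([1, 2, 3], [2.1, 3.9, 6.0])
region_hi = ([10, 20, 30], [19.8, 40.4, 59.9])

a_lo = fit_slope(*region_lo)   # independent tune in the low region
a_hi = fit_slope(*region_hi)   # independent tune in the high region

# Similar best-fit values in independent regions support universality of
# the model; a large discrepancy would signal where the modeling breaks down.
universal = abs(a_lo - a_hi) / a_lo < 0.05
```

In the paper's setting the "slope" is replaced by generator parameters such as the color-reconnection strength, and the regions by different collision energies, but the logic of the consistency check is the same.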
Concurrent Image Processing Executive (CIPE)
The design and implementation of a Concurrent Image Processing Executive (CIPE), which is intended to become the support system software for a prototype high-performance science analysis workstation, are discussed. The target machine for this software is a JPL/Caltech Mark IIIfp Hypercube hosted by either a MASSCOMP 5600 or a Sun-3 or Sun-4 workstation; however, the design will accommodate other concurrent machines of similar architecture, i.e., local-memory, multiple-instruction multiple-data (MIMD) machines. The CIPE system provides both a multimode user interface and an applications programmer interface, and has been designed around four loosely coupled modules: (1) the user interface, (2) the host-resident executive, (3) the hypercube-resident executive, and (4) the application functions. The loose coupling between modules allows modification of a particular module without significantly affecting the other modules in the system. To enhance hypercube memory utilization and to allow expansion of image processing capabilities, a specialized program management method, incremental loading, was devised. To minimize data transfer between the host and the hypercube, a data management method that distributes, redistributes, and tracks data set information was implemented
Parallel smoothing algorithms for causal and acausal systems
By Darrin Taylor and Alan S. Willsky. Includes bibliographical references (p. 17-18). Caption title. Research supported in part by the Air Force Office of Scientific Research (AFOSR-88-0032), the U.S. Army Research Office (DAAL03-86-K-0171), and the Office of Naval Research (N00014-91-J-1001)