On the average running time of odd-even merge sort

Author: Rüb C.
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/1995

This paper is concerned with the average running time of Batcher's odd-even merge sort when implemented on a collection of processors. We consider the case where $n$, the size of the input, is an arbitrary multiple of the number $p$ of processors used. We show that Batcher's odd-even merge (for two sorted lists of length $n$ each) can be implemented to run in time $O((n/p)(\log (2+p^2/n)))$ on the average, and that odd-even merge sort can be implemented to run in time $O((n/p)(\log n+\log p\log (2+p^2/n)))$ on the average. In the case of merging (sorting), the average is taken over all possible outcomes of the merging (all possible permutations of $n$ elements). That means that odd-even merge and odd-even merge sort have an optimal average running time if $n\geq p^2$. The constants involved are also quite small.
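As a point of reference for the abstract above, the following is a minimal sequential sketch of Batcher's odd-even merge sort as a comparator network. It is not the multiprocessor implementation analyzed in the paper: it assumes one element per position, an input length that is a power of two, and the helper names (`compare_exchange`, `odd_even_merge`, `odd_even_merge_sort`) are chosen here for illustration only.

```python
def compare_exchange(a, i, j):
    """Comparator: put the smaller of a[i], a[j] at index i."""
    if a[i] > a[j]:
        a[i], a[j] = a[j], a[i]


def odd_even_merge(a, lo, n, r):
    """Merge the elements lo, lo+r, lo+2r, ... within a[lo:lo+n].

    Assumes both halves of that strided subsequence are already sorted.
    """
    m = 2 * r
    if m < n:
        odd_even_merge(a, lo, n, m)        # even-indexed subsequence
        odd_even_merge(a, lo + r, n, m)    # odd-indexed subsequence
        # Final clean-up layer of comparators between neighbours.
        for i in range(lo + r, lo + n - r, m):
            compare_exchange(a, i, i + r)
    else:
        compare_exchange(a, lo, lo + r)


def _sort_range(a, lo, n):
    """Sort a[lo:lo+n] in place; n must be a power of two."""
    if n > 1:
        half = n // 2
        _sort_range(a, lo, half)
        _sort_range(a, lo + half, half)
        odd_even_merge(a, lo, n, 1)


def odd_even_merge_sort(a):
    """Sort list a in place (length must be a power of two) and return it."""
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    _sort_range(a, 0, n)
    return a


data = [5, 2, 7, 1, 8, 3, 6, 4]
odd_even_merge_sort(data)
```

The sequence of comparisons is fixed in advance and does not depend on the data, which is what makes the network suitable for parallel execution. In the setting of the paper each processor would hold $n/p$ elements rather than one; that blocking step is not shown here.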

MPG.PuRe

A Parallel Monte Carlo Code for Simulating Collisional N-body Systems

Author: Alok Choudhary, Bharath Pattabiraman, Frederic A. Rasio, Gokhan Memik, Stefan Umbreit, Vassiliki Kalogera, Wei-keng Liao
Publication venue: IOP Publishing
Publication date: 15/11/2012

We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N ~ 10^7 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and assumes spherical symmetry and dynamical equilibrium. The principal algorithmic developments are optimized data structures, a parallel random number generation scheme, and a parallel sorting algorithm, which is required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce, along with our choice of decomposition scheme, minimize communication costs and ensure optimal distribution of data and workload among the processing units. The implementation uses the Message Passing Interface (MPI) library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse, with the number of stars, N, spanning three orders of magnitude, from 10^5 to 10^7. We find that our results are in good agreement with self-similar core-collapse solutions, and the core collapse times generally agree with expectations from the literature. We also observe good total energy conservation, to within 0.04% throughout all simulations. We analyze the performance of the code and demonstrate near-linear scaling of the runtime with the number of processors, up to 64 processors for N=10^5, 128 for N=10^6, and 256 for N=10^7. Beyond these limits the runtime saturates as more processors are added, which is characteristic of the parallel sorting algorithm. The resulting maximum speedups are approximately 60x, 100x, and 220x, respectively.

Comment: 53 pages, 13 figures, accepted for publication in ApJ Supplement
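To illustrate why sorting matters for the potential calculation under spherical symmetry, here is a small serial sketch. It is not taken from the paper: it assumes each star is treated as a thin spherical shell and uses the standard shell-theorem result, so the potential at shell k is -G (M(r_k)/r_k + sum over outer shells of m_i/r_i). The function name `shell_potentials` and the default `G = 1.0` are illustrative assumptions.

```python
def shell_potentials(radii, masses, G=1.0):
    """Potential at each star's radius, treating stars as spherical shells.

    Assumption (not from the source): shell-theorem formula
        phi_k = -G * (M_enclosed(r_k) / r_k + sum_{r_i > r_k} m_i / r_i).
    Returns a list aligned with the input order.
    """
    n = len(radii)
    # Sorting by radius is the step that a parallel code must distribute.
    order = sorted(range(n), key=lambda i: radii[i])

    # outer[k] = sum of m_i / r_i over shells at sorted positions k, k+1, ...
    outer = [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):
        i = order[k]
        outer[k] = outer[k + 1] + masses[i] / radii[i]

    phi = [0.0] * n
    enclosed = 0.0
    for k, i in enumerate(order):
        enclosed += masses[i]  # mass at or inside shell k
        phi[i] = -G * (enclosed / radii[i] + outer[k + 1])
    return phi


# Two unit-mass shells at r = 2 and r = 1 (given out of order on purpose).
potentials = shell_potentials([2.0, 1.0], [1.0, 1.0])
```

Once the stars are ordered by radius, a single prefix sum and a single suffix sum give every potential in linear time, and neighbors in the sorted order are also neighbors in radius. That is consistent with the abstract's remark that the sort limits scaling, although the code's actual parallel scheme is not described there.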

arXiv.org e-Print Archive

Crossref

Crosslinking in parallel

Author: Asuri Hari S.
Publication venue: eScholarship, University of California
Publication date: 01/01/1992

A crosslink is a double link established between the two entries of an edge in an adjacency list representation of a graph. Crosslinks play important roles in several parallel algorithms because they provide constant-time access between the two entries of an edge; the existence of crosslinks is usually assumed. We consider the problem of establishing crosslinks in a crosslink-less adjacency list for graphs that belong to a class called the linearly contractible graphs, and show that crosslinks can be established optimally in O(log n log* n) time using a CREW PRAM and optimally in O(log n) time using a CRCW PRAM for such graphs.
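To make the data structure concrete, the sketch below builds crosslinks for a small adjacency list. It is a plain sequential baseline using a hash table, not the PRAM algorithm of the paper, and it assumes a simple undirected graph with vertices numbered from 0. The name `build_crosslinks` and the `(vertex, position)` representation are illustrative choices.

```python
def build_crosslinks(adj):
    """Return cross, where cross[u][i] == (v, j) means that entry i of
    adj[u] (which names v) is paired with entry j of adj[v] (which names u).

    Assumes a simple undirected graph: no parallel edges, and every
    edge {u, v} appears once in adj[u] and once in adj[v].
    """
    # Record where each directed entry (u -> v) sits in adj[u].
    position = {}
    for u, neighbours in enumerate(adj):
        for i, v in enumerate(neighbours):
            if (u, v) in position:
                raise ValueError("parallel edges are not supported")
            position[(u, v)] = i

    # The twin of entry (u -> v) is the entry (v -> u).
    cross = []
    for u, neighbours in enumerate(adj):
        row = []
        for v in neighbours:
            if (v, u) not in position:
                raise ValueError("adjacency list is not symmetric")
            row.append((v, position[(v, u)]))
        cross.append(row)
    return cross


# Triangle on vertices 0, 1, 2.
triangle = [[1, 2], [2, 0], [0, 1]]
links = build_crosslinks(triangle)
```

With the links in place, moving from one entry of an edge to its twin is a single lookup, which is the constant-time access the abstract refers to.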

eScholarship - University of California