A parallel butterfly algorithm

Author: Demanet Laurent
Maxwell Nicholas
Poulson Jack
Ying Lexing
Publication date: 25/11/2013

The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform \int K(x,y) g(y) dy at large numbers of target points when the kernel, K(x,y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(N^d) quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most O(r^2 N^d log N). A parallelization of the butterfly algorithm is introduced which, assuming a message latency of \alpha and per-process inverse bandwidth of \beta, executes in at most O(r^2 N^d/p log N + (\beta r N^d/p + \alpha) log p) time using p processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where K(x,y)=exp(i \Phi(x,y)), where \Phi(x,y) is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms and an analogue of a 3D generalized Radon transform were observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively.
Comment: To appear in SIAM Journal on Scientific Computing
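
The parallel running-time bound above can be read as a simple cost model. The Python sketch below evaluates it to estimate strong-scaling efficiency. The function names, the choice to set every constant hidden by the O-notation to 1, and the sample parameter values are illustrative assumptions; none of them come from the paper or from the DistButterfly library.

```python
import math

def modeled_time(r, N, d, p, alpha, beta):
    """Evaluate r^2 N^d/p log N + (beta r N^d/p + alpha) log p.

    Assumption: all constants hidden by the O-notation are set to 1, and
    logarithms are taken base 2 (the base only changes a constant factor).
    """
    work_per_process = N ** d / p
    compute = r ** 2 * work_per_process * math.log2(N)
    communicate = (beta * r * work_per_process + alpha) * math.log2(p)
    return compute + communicate

def strong_scaling_efficiency(r, N, d, p, alpha, beta):
    """Ratio T(1) / (p * T(p)) under the model above; 1.0 is ideal scaling."""
    t1 = modeled_time(r, N, d, 1, alpha, beta)
    tp = modeled_time(r, N, d, p, alpha, beta)
    return t1 / (p * tp)

# Hypothetical inputs, chosen only to exercise the formula.
eff = strong_scaling_efficiency(r=8, N=1024, d=2, p=1024, alpha=1e3, beta=1.0)
```

In this model the communication term vanishes for p = 1, so the efficiency there is exactly 1 and it drops below 1 as p grows. How quickly it drops depends on the relative sizes of \alpha, \beta, and r, which is why the real constants matter for any quantitative prediction.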

arXiv.org e-Print Archive

Crossref

DSpace@MIT

Simple parallel and distributed algorithms for spectral graph sparsification

Author: Jonathan
Koutis Ioannis
Livne Oren E.
Peng Richard
Publication date: 17/04/2014

We describe a simple algorithm for spectral graph sparsification, based on iterative computations of weighted spanners and uniform sampling. Leveraging the algorithms of Baswana and Sen for computing spanners, we obtain the first distributed spectral sparsification algorithm. We also obtain a parallel algorithm with improved work and time guarantees. Combining this algorithm with the parallel framework of Peng and Spielman for solving symmetric diagonally dominant linear systems, we get a parallel solver which is much closer to being practical and significantly more efficient in terms of the total work.
Comment: replaces "A simple parallel and distributed algorithm for spectral sparsification". Minor change

arXiv.org e-Print Archive

Crossref

Scalable parallel computation of the translation operator in three dimensions

Author: Bogaert Ignace
De Zutter Daniël
Fostier Jan
Michiels Bart
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2013

We propose a novel algorithm for the parallel, distributed-memory computation of the translation operator in the three-dimensional multilevel fast multipole algorithm (MLFMA). Sequential algorithms can compute the translation operator with L multipoles and O(L^2) sampling points in O(L^2) time. State-of-the-art hierarchical parallelization schemes of the MLFMA rely on the distribution of radiation patterns and associated translation operators among P = O(L^2) parallel processes, necessitating the development of distributed-memory algorithms for the computation of the translation operator. Whereas a baseline parallel algorithm computes this translation operator in O(L) time, we propose an algorithm that achieves this in only O(log L) time. For large translation operators and a high number of parallel processes, our algorithm proves to be roughly ten times faster than the baseline algorithm.

Crossref

Ghent University Academic Bibliography

Parallel Algorithm and Dynamic Exponent for Diffusion-limited Aggregation

Author: A. Gibbons
C. Amitrano
C. H. Bennett
C. H. Papadimitriou
H. Kaufman
J. Machta
P. Ossadnik
R. C. Ball
R. F. Voss
R. Greenlaw
R. J. Anderson
S. Tolman
T. A. Witten
T. C. Halsey
T. Vicsek
Publication venue: American Physical Society (APS)
Publication date: 17/12/1996

A parallel algorithm for "diffusion-limited aggregation" (DLA) is described and analyzed from the perspective of computational complexity. The dynamic exponent z of the algorithm is defined with respect to the probabilistic parallel random-access machine (PRAM) model of parallel computation according to T \sim L^{z}, where L is the cluster size, T is the running time, and the algorithm uses a number of processors polynomial in L. It is argued that z=D-D_2/2, where D is the fractal dimension and D_2 is the second generalized dimension. Simulations of DLA are carried out to measure D_2 and to test scaling assumptions employed in the complexity analysis of the parallel algorithm. It is plausible that the parallel algorithm attains the minimum possible value of the dynamic exponent, in which case z characterizes the intrinsic history dependence of DLA.
Comment: 24 pages Revtex and 2 figures. A major improvement to the algorithm and smaller dynamic exponent in this version
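
As a worked instance of the relations T \sim L^{z} and z=D-D_2/2, the short Python sketch below computes the dynamic exponent and the resulting growth in parallel running time. The function names and the numeric inputs are illustrative assumptions; the abstract does not report the measured values of D or D_2.

```python
def dynamic_exponent(D, D2):
    """Return z = D - D2/2 for fractal dimension D and second generalized dimension D2."""
    return D - D2 / 2.0

def running_time_ratio(L_small, L_large, z):
    """Factor by which T ~ L^z grows when the cluster size goes from L_small to L_large.

    The unknown prefactor in T ~ L^z cancels in the ratio.
    """
    return (L_large / L_small) ** z

# Hypothetical dimensions, chosen only to exercise the formula.
z = dynamic_exponent(D=1.7, D2=1.0)
growth = running_time_ratio(100.0, 1000.0, z)
```

With these made-up inputs z is about 1.2, so a tenfold increase in cluster size would raise the modeled parallel running time by a factor of roughly 10^{1.2}.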

arXiv.org e-Print Archive

Crossref

An SMP Soft Classification Algorithm for Remote Sensing

Author: Easterling David R.
Phillips Rhonda D.
Watson Layne T.
Wynne Randolph H.
Publication date: 01/01/2012

This work introduces a symmetric multiprocessing (SMP) version of the continuous iterative guided spectral class rejection (CIGSCR) algorithm, a semiautomated classification algorithm for remote sensing (multispectral) images. The algorithm uses soft data clusters to produce a soft classification containing inherently more information than a comparable hard classification at an increased computational cost. Previous work suggests that similar algorithms achieve good parallel scalability, motivating the parallel algorithm development work here. Experimental results of applying parallel CIGSCR to an image with approximately 10^8 pixels and six bands demonstrate superlinear speedup. A soft two-class classification is generated in just over four minutes using 32 processors.
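
Superlinear speedup means that the speedup on p processors exceeds p, or equivalently that parallel efficiency is above 1. The Python sketch below makes that check explicit. The function names and both timings are hypothetical: the abstract gives only the 32-processor figure of just over four minutes and does not state the serial running time.

```python
def speedup(t_serial, t_parallel):
    """Classical speedup: serial running time divided by parallel running time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    """Speedup per processor; values above 1.0 indicate superlinear speedup."""
    return speedup(t_serial, t_parallel) / p

def is_superlinear(t_serial, t_parallel, p):
    return efficiency(t_serial, t_parallel, p) > 1.0

# Hypothetical timings in seconds (250 s is "just over four minutes").
t_serial, t_parallel, p = 9000.0, 250.0, 32
s = speedup(t_serial, t_parallel)
e = efficiency(t_serial, t_parallel, p)
```

For these made-up timings the speedup is 36 on 32 processors, which gives an efficiency of 1.125 and therefore counts as superlinear.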

Computer Science Technical Reports @Virginia Tech