A parallel butterfly algorithm
The butterfly algorithm is a fast algorithm which approximately evaluates a
discrete analogue of the integral transform \int K(x,y) g(y) dy at large
numbers of target points when the kernel, K(x,y), is approximately low-rank
when restricted to subdomains satisfying a certain simple geometric condition.
In d dimensions with O(N^d) quasi-uniformly distributed source and target
points, when each appropriate submatrix of K is approximately rank-r, the
running time of the algorithm is at most O(r^2 N^d log N). A parallelization of
the butterfly algorithm is introduced which, assuming a message latency of
\alpha and per-process inverse bandwidth of \beta, executes in at most O(r^2
N^d/p log N + (\beta r N^d/p + \alpha) log p) time using p processes. This
parallel algorithm was then instantiated in the form of the open-source
DistButterfly library for the special case where K(x,y)=exp(i \Phi(x,y)), where
\Phi(x,y) is a black-box, sufficiently smooth, real-valued phase function.
Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for
important classes of phase functions. Using quasi-uniform sources, hyperbolic
Radon transforms and an analogue of a 3D generalized Radon transform were
observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores
with greater than 90% and 82% efficiency, respectively.
Comment: To appear in SIAM Journal on Scientific Computing
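The running-time bound quoted in this abstract can be explored numerically. The following is a minimal sketch, not the DistButterfly implementation: it assumes every hidden constant in the bound equals 1, takes logarithms base 2, and uses hypothetical function names and parameter values. It evaluates the modeled time and the strong-scaling efficiency T(1) / (p T(p)) that the model implies.

```python
import math


def butterfly_time_model(N, d, r, p, alpha, beta):
    """Modeled time r^2 N^d/p log N + (beta r N^d/p + alpha) log p.

    Assumption: all constants hidden by the O(.) are set to 1 and
    logarithms are base 2, so only relative trends are meaningful.
    """
    compute = r**2 * N**d / p * math.log2(N)
    communicate = (beta * r * N**d / p + alpha) * math.log2(p)
    return compute + communicate


def strong_scaling_efficiency(N, d, r, p, alpha, beta):
    """Efficiency T(1) / (p * T(p)) under the model above."""
    t1 = butterfly_time_model(N, d, r, 1, alpha, beta)
    tp = butterfly_time_model(N, d, r, p, alpha, beta)
    return t1 / (p * tp)


# Hypothetical parameters, chosen only to exercise the formula.
eff_1 = strong_scaling_efficiency(N=1024, d=2, r=8, p=1, alpha=1e-6, beta=1e-9)
eff_16 = strong_scaling_efficiency(N=1024, d=2, r=8, p=16, alpha=1e-6, beta=1e-9)
```

With a single process the log p factor vanishes, so the modeled efficiency is exactly 1. For larger p the communication term lowers it, which matches the qualitative behavior the abstract reports.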
Simple parallel and distributed algorithms for spectral graph sparsification
We describe a simple algorithm for spectral graph sparsification, based on
iterative computations of weighted spanners and uniform sampling. Leveraging
the algorithms of Baswana and Sen for computing spanners, we obtain the first
distributed spectral sparsification algorithm. We also obtain a parallel
algorithm with improved work and time guarantees. Combining this algorithm with
the parallel framework of Peng and Spielman for solving symmetric diagonally
dominant linear systems, we get a parallel solver which is much closer to being
practical and significantly more efficient in terms of the total work.
Comment: replaces "A simple parallel and distributed algorithm for spectral
sparsification". Minor changes
Scalable parallel computation of the translation operator in three dimensions
We propose a novel algorithm for the parallel, distributed-memory computation of the translation operator in the three-dimensional multilevel fast multipole algorithm (MLFMA). Sequential algorithms can compute the translation operator with L multipoles and O(L^2) sampling points in O(L^2) time. State-of-the-art hierarchical parallelization schemes of the MLFMA rely on the distribution of radiation patterns and associated translation operators among P = O(L^2) parallel processes, necessitating the development of distributed-memory algorithms for the computation of the translation operator. Whereas a baseline parallel algorithm computes this translation operator in O(L) time, we propose an algorithm that achieves this in only O(log L) time. For large translation operators and a high number of parallel processes, our algorithm proves to be roughly ten times faster than the baseline algorithm.
Parallel Algorithm and Dynamic Exponent for Diffusion-limited Aggregation
A parallel algorithm for "diffusion-limited aggregation" (DLA) is described
and analyzed from the perspective of computational complexity. The dynamic
exponent z of the algorithm is defined with respect to the probabilistic
parallel random-access machine (PRAM) model of parallel computation according
to T ~ L^z, where L is the cluster size, T is the running time, and the
algorithm uses a number of processors polynomial in L. It is argued that
z=D-D_2/2, where D is the fractal dimension and D_2 is the second generalized
dimension. Simulations of DLA are carried out to measure D_2 and to test
scaling assumptions employed in the complexity analysis of the parallel
algorithm. It is plausible that the parallel algorithm attains the minimum
possible value of the dynamic exponent in which case z characterizes the
intrinsic history dependence of DLA.
Comment: 24 pages Revtex and 2 figures. A major improvement to the algorithm
and smaller dynamic exponent in this version
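A dynamic exponent of this kind is usually estimated as the slope of a log-log fit of running time against cluster size. The following is a minimal sketch, assuming the power-law form T ~ L^z. The function name and the data are hypothetical and synthetic, not measurements from the paper.

```python
import math


def fit_dynamic_exponent(sizes, times):
    """Least-squares slope of log T against log L, i.e. z in T ~ L^z."""
    xs = [math.log(L) for L in sizes]
    ys = [math.log(T) for T in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic data generated with a known exponent of 0.5 (an assumption
# made purely to check that the fit recovers it).
sizes = [100, 200, 400, 800, 1600]
times = [3.0 * L**0.5 for L in sizes]
z = fit_dynamic_exponent(sizes, times)
```

On exact power-law data the fit returns the generating exponent up to rounding. Real measurements would scatter around the line and would call for an error estimate on the slope.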
An SMP Soft Classification Algorithm for Remote Sensing
This work introduces a symmetric multiprocessing (SMP) version of the continuous iterative
guided spectral class rejection (CIGSCR) algorithm, a semiautomated classification algorithm for remote
sensing (multispectral) images. The algorithm uses soft data clusters to produce a soft classification
containing inherently more information than a comparable hard classification at an increased computational
cost. Previous work suggests that similar algorithms achieve good parallel scalability, motivating the parallel
algorithm development work here. Experimental results of applying parallel CIGSCR to an image with
approximately 10^8 pixels and six bands demonstrate superlinear speedup. A soft two-class classification is
generated in just over four minutes using 32 processors.
