A parallel butterfly algorithm
The butterfly algorithm is a fast algorithm which approximately evaluates a
discrete analogue of the integral transform \int K(x,y) g(y) dy at large
numbers of target points when the kernel, K(x,y), is approximately low-rank
once restricted to subdomains satisfying a certain simple geometric condition.
In d dimensions with O(N^d) quasi-uniformly distributed source and target
points, when each appropriate submatrix of K is approximately rank-r, the
running time of the algorithm is at most O(r^2 N^d log N). A parallelization of
the butterfly algorithm is introduced which, assuming a message latency of
\alpha and per-process inverse bandwidth of \beta, executes in at most
O(r^2 (N^d/p) log N + (\beta r N^d/p + \alpha) log p) time using p processes. This
parallel algorithm was then instantiated in the form of the open-source
DistButterfly library for the special case where K(x,y)=exp(i \Phi(x,y)), in which
\Phi(x,y) is a black-box, sufficiently smooth, real-valued phase function.
Experiments on Blue Gene/Q demonstrate efficient strong scaling for
important classes of phase functions. Using quasi-uniform sources, hyperbolic
Radon transforms and an analogue of a 3D generalized Radon transform were
observed to strong-scale from 1 node/16 cores up to
1024 nodes/16,384 cores with greater than 90% and 82% efficiency, respectively.
Comment: To appear in SIAM Journal on Scientific Computing
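The low-rank property that the butterfly algorithm relies on can be checked numerically on a small example. The Python sketch below is an illustration under stated assumptions, not DistButterfly code: it assumes a 1D Fourier-type phase \Phi(x,y) = 2\pi N x y with sources and targets on the grid k/N in [0,1), and takes the geometric condition to be that the product of the target-box and source-box widths is at most 1/N. The helper names (`direct_transform`, `numerical_rank`, `kernel_block`) are hypothetical, and the rank is estimated with pivoted Gram-Schmidt instead of an SVD so that only the standard library is needed.

```python
import cmath
import math


def direct_transform(phase, targets, sources, g):
    """Direct evaluation of u(x) = sum_j exp(i * phase(x, y_j)) * g_j.

    Costs O(len(targets) * len(sources)); this is the discrete sum that the
    butterfly algorithm approximates more cheaply.
    """
    return [sum(cmath.exp(1j * phase(x, y)) * gj for y, gj in zip(sources, g))
            for x in targets]


def numerical_rank(rows, tol=1e-8):
    """Estimate the numerical rank of a complex matrix given as a list of rows.

    Uses modified Gram-Schmidt with column pivoting: a column whose remaining
    norm is below tol * (largest original column norm) counts as dependent.
    """
    cols = [list(c) for c in zip(*rows)]

    def norm(v):
        return math.sqrt(sum(abs(z) ** 2 for z in v))

    scale = max(norm(c) for c in cols)
    rank = 0
    while cols:
        # Pivot: pick the remaining column with the largest residual norm.
        k = max(range(len(cols)), key=lambda i: norm(cols[i]))
        nk = norm(cols[k])
        if nk <= tol * scale:
            break
        q = [z / nk for z in cols.pop(k)]
        rank += 1
        # Remove the new direction q from every remaining column.
        for c in cols:
            proj = sum(qi.conjugate() * ci for qi, ci in zip(q, c))
            for i in range(len(c)):
                c[i] -= proj * q[i]
    return rank


N = 1024  # number of grid points on [0, 1); an assumption for illustration


def phase(x, y):
    # Hypothetical Fourier-type phase, so K(x, y) = exp(2*pi*i*N*x*y).
    return 2 * math.pi * N * x * y


def kernel_block(xs, ys):
    """Dense submatrix K[i][j] = exp(i * phase(xs[i], ys[j]))."""
    return [[cmath.exp(1j * phase(x, y)) for y in ys] for x in xs]


ys = [j / N for j in range(32)]         # source box of width 32/N
xs_a = [i / N for i in range(32)]       # target box of width 32/N: product 1/N
xs_b = [32 * i / N for i in range(32)]  # targets spread over [0, 1): product ~32/N

rank_a = numerical_rank(kernel_block(xs_a, ys))  # condition satisfied
rank_b = numerical_rank(kernel_block(xs_b, ys))  # condition violated
print(rank_a, rank_b)
```

Block (b) is exactly a 32-point DFT matrix and so has full rank 32, while block (a) reduces to a smooth kernel whose numerical rank should come out well below 32; a butterfly-type algorithm exploits blocks of the first kind at every level of its source and target trees.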
Fast hyperbolic Radon transform represented as convolutions in log-polar coordinates
The hyperbolic Radon transform is a commonly used tool in seismic processing,
for instance in seismic velocity analysis, data interpolation, and multiple
removal. A direct implementation by summation of traces with different moveouts
is computationally expensive for large data sets. In this paper we present a
new method for fast computation of hyperbolic Radon transforms. It is based
on log-polar sampling, with which the main computational steps reduce to
computing convolutions. This allows for fast implementations by means of the FFT.
In addition to the FFT operations, interpolation procedures are required for
switching between coordinates in the time-offset, Radon, and log-polar domains.
Graphics Processing Units (GPUs) are a suitable computational
platform for this purpose, owing to their hardware-supported interpolation routines
as well as optimized FFT routines. Performance tests show large speed-ups
of the proposed algorithm. Hence, it is suitable for use in iterative methods,
and we provide examples of data interpolation and multiple removal using this
approach.
Comment: 21 pages, 10 figures, 2 tables
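For comparison, the direct summation that the abstract describes as expensive can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not the paper's log-polar method: it uses the stacking (adjoint) form of the transform, parameterized by zero-offset time tau and velocity v with traveltime t = sqrt(tau^2 + (x/v)^2), and takes the nearest time sample instead of interpolating. The function names and the synthetic gather are hypothetical. Its cost grows as (number of velocities) x (number of tau values) x (number of offsets), which is the work a fast method has to avoid.

```python
import math


def hyperbolic_time(tau, offset, velocity):
    """Traveltime of a hyperbolic event: t = sqrt(tau^2 + (offset / velocity)^2)."""
    return math.sqrt(tau * tau + (offset / velocity) ** 2)


def hyperbolic_radon_direct(data, offsets, dt, taus, velocities):
    """Direct hyperbolic Radon transform by summing traces along hyperbolas.

    data[ix][it] holds the trace recorded at offsets[ix], sampled every dt
    seconds. For each (velocity, tau) pair, the sample nearest to the moveout
    time is taken from every trace (no interpolation). Cost:
    O(len(velocities) * len(taus) * len(offsets)).
    """
    nt = len(data[0])
    model = [[0.0] * len(taus) for _ in velocities]
    for iv, v in enumerate(velocities):
        for itau, tau in enumerate(taus):
            total = 0.0
            for trace, x in zip(data, offsets):
                it = round(hyperbolic_time(tau, x, v) / dt)
                if it < nt:  # skip times beyond the end of the record
                    total += trace[it]
            model[iv][itau] = total
    return model


# Hypothetical synthetic gather: one hyperbolic event at tau = 0.4 s, v = 2000 m/s.
dt, nt = 0.004, 500
offsets = [100.0 * i for i in range(24)]
velocities = [1500.0, 2000.0, 2500.0, 3000.0]
taus = [it * dt for it in range(nt)]
data = [[0.0] * nt for _ in offsets]
for ix, x in enumerate(offsets):
    data[ix][round(hyperbolic_time(taus[100], x, velocities[1]) / dt)] = 1.0

model = hyperbolic_radon_direct(data, offsets, dt, taus, velocities)
print(model[1][100])  # 24.0: one unit from each of the 24 traces
```

Because each trace contributes at most one unit, the stack reaches its maximum exactly at the (tau, v) of the synthetic event; shifting tau away from it loses at least the zero-offset trace.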
Butterfly Factorization
The paper introduces the butterfly factorization as a data-sparse
approximation for matrices that satisfy a complementary low-rank property.
The factorization can be constructed efficiently if either fast algorithms for
applying the matrix and its adjoint are available or the entries of the matrix
can be sampled individually. For an N x N matrix, the resulting
factorization is a product of O(log N) sparse matrices, each with O(N)
non-zero entries. Hence, it can be applied rapidly in O(N log N) operations.
Numerical results are provided to demonstrate the effectiveness of the
butterfly factorization and its construction algorithms.
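A product of sparse factors that can be applied quickly can be made concrete with the best-known exact instance of butterfly structure: the radix-2 Cooley-Tukey FFT, which factors the N-point DFT matrix (the standard example of a complementary low-rank matrix) into a bit-reversal permutation followed by log2 N sparse butterfly factors with 2N non-zeros each. This Python sketch illustrates that special case only; the paper's butterfly factorization is an approximate construction for general complementary low-rank matrices, and its factors are not these. All function names are hypothetical.

```python
import cmath
import math


def bit_reversal(n):
    """Permutation factor as sparse rows: row i holds [(column, value)]."""
    bits = n.bit_length() - 1
    return [[(int(format(i, f"0{bits}b")[::-1], 2), 1.0)] for i in range(n)]


def butterfly_stage(n, m):
    """One radix-2 butterfly factor mixing pairs (k, k + m/2) in each block of size m.

    Every row has exactly two non-zero entries, so the factor has 2n of them.
    """
    rows = [None] * n
    half = m // 2
    for start in range(0, n, m):
        for k in range(half):
            w = cmath.exp(-2j * math.pi * k / m)  # twiddle factor
            top, bottom = start + k, start + k + half
            rows[top] = [(top, 1.0), (bottom, w)]
            rows[bottom] = [(top, 1.0), (bottom, -w)]
    return rows


def dft_factors(n):
    """Sparse factors of the n-point DFT (n a power of two), in application order."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    factors = [bit_reversal(n)]
    m = 2
    while m <= n:  # log2(n) butterfly stages
        factors.append(butterfly_stage(n, m))
        m *= 2
    return factors


def apply_factors(factors, x):
    """Apply the sparse factors one after another: O(n) work per factor."""
    for rows in factors:
        x = [sum(v * x[c] for c, v in row) for row in rows]
    return x


def dense_dft(x):
    """Reference O(n^2) evaluation of X_k = sum_j x_j exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
            for k in range(n)]


x = [complex(j % 3, -j) for j in range(16)]
fast = apply_factors(dft_factors(16), x)
err = max(abs(a - b) for a, b in zip(fast, dense_dft(x)))
print(err < 1e-9)  # True
```

With log2 N factors of O(N) non-zeros each, applying the product costs O(N log N) operations instead of the O(N^2) of the dense matrix-vector product.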