Fast evaluation of real and complex exponential sums
Recently, the butterfly approximation scheme and hierarchical approximations
have been proposed for the efficient computation of integral transforms with
oscillatory and with asymptotically smooth kernels. Combining both approaches,
we propose a certain fast Fourier-Laplace transform, which in particular allows
for the fast evaluation of polynomials at nodes in the complex unit disk. All
theoretical results are illustrated by numerical experiments.
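To make the target computation concrete, here is a minimal sketch (our illustration, not the paper's algorithm) of the direct evaluation that such a fast Fourier-Laplace transform accelerates: evaluating a degree-(n-1) polynomial at m nodes in the complex unit disk costs O(mn) operations when done naively. Writing z_j = r_j e^{2\pi\i t_j} splits each term c_k z_j^k into a radial (Laplace-type) and an angular (Fourier-type) factor, which is the kind of structure a combined scheme can exploit.

```python
import numpy as np

# Direct evaluation of p(z) = sum_k c_k z^k at nodes z_j in the complex
# unit disk; this O(m*n) computation is what a fast transform replaces.
rng = np.random.default_rng(0)
n, m = 512, 1024                                    # illustrative sizes
c = rng.standard_normal(n)                          # polynomial coefficients
# Nodes z_j = r_j * exp(2*pi*i*t_j) with r_j < 1 (complex unit disk).
z = rng.uniform(0, 0.99, m) * np.exp(2j * np.pi * rng.uniform(0, 1, m))

V = z[:, None] ** np.arange(n)[None, :]             # Vandermonde matrix
p = V @ c                                           # all m values at once
```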
A Fast Butterfly Algorithm for the Computation of Fourier Integral Operators
This paper is concerned with the fast computation of Fourier integral
operators of the general form \int_{\R^d} e^{2\pi\i \Phi(x,k)} f(k) dk,
where k is a frequency variable, \Phi(x,k) is a phase function obeying a
standard homogeneity condition, and f is a given input. Such fundamental
computations are of interest because they are connected with the problem of
finding numerical solutions to wave equations, and they also frequently arise
in many applications including reflection seismology and curvilinear
tomography. In two dimensions, when the input and output are sampled on
N x N Cartesian grids, a direct evaluation requires O(N^4) operations, which
is often prohibitively expensive.
This paper introduces a novel algorithm running in O(N^2 log N) time, i.e.
with near-optimal computational complexity, and whose overall structure
follows that of the butterfly algorithm [Michielssen and Boag, IEEE Trans
Antennas Propagat 44 (1996), 1086-1093]. Underlying this algorithm is a
mathematical insight concerning the restriction of the kernel e^{2\pi\i
\Phi(x,k)} to subsets of the time and frequency domains. Whenever these
subsets obey a simple geometric condition, the restricted kernel is
approximately low-rank; we propose constructing such low-rank approximations
using a special interpolation scheme, which prefactors the oscillatory
component, interpolates the remaining nonoscillatory part and, lastly,
remodulates the outcome. A byproduct of this scheme is that the whole algorithm
is highly efficient in terms of memory requirement. Numerical results
demonstrate the performance and illustrate the empirical properties of this
algorithm.
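The low-rank claim is easy to probe numerically. The sketch below (a minimal illustration with a made-up 1D phase, homogeneous of degree one in k; the paper treats general phases in d dimensions) demodulates the kernel on a time-frequency box pair obeying the geometric condition and inspects the singular value decay of the residual:

```python
import numpy as np

# A made-up 1D phase, homogeneous of degree 1 in k (an assumption for
# illustration; the paper treats general phases in d dimensions).
phi = lambda x, k: x * k * (1.0 + 0.1 * np.cos(2 * np.pi * x))

# A time box A and a frequency box B obeying the geometric condition
# (roughly, width(A) * width(B) = O(1)).
x = np.linspace(0.0, 0.125, 64)[:, None]    # box A (column)
k = np.linspace(256.0, 264.0, 64)[None, :]  # box B (row); widths multiply to 1
x0, k0 = x.mean(), k.mean()                 # box centers

K = np.exp(2j * np.pi * phi(x, k))

# Prefactor the oscillatory component: the residual kernel
#   exp(2*pi*i*(phi(x,k) - phi(x0,k) - phi(x,k0) + phi(x0,k0)))
# is nonoscillatory on A x B and hence numerically low-rank.
R = K * np.exp(-2j * np.pi * (phi(x0, k) + phi(x, k0) - phi(x0, k0)))

s = np.linalg.svd(R, compute_uv=False)
print("normalized singular values:", s[:8] / s[0])  # rapid decay expected
```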
A Multiscale Butterfly Algorithm for Multidimensional Fourier Integral Operators
This paper presents an efficient multiscale butterfly algorithm for computing
Fourier integral operators (FIOs) of the form (\mathcal{L} f)(x) =
\int_{\R^d} a(x,\xi) e^{2\pi \i \Phi(x,\xi)} \hat{f}(\xi) d\xi, where
\Phi(x,\xi) is a phase function, a(x,\xi) is an amplitude function, and f
is a given input. The frequency domain is hierarchically decomposed into
a union of Cartesian coronas. The integral kernel a(x,\xi) e^{2\pi \i
\Phi(x,\xi)} in each corona satisfies a special low-rank property that enables
the application of a butterfly algorithm on the Cartesian phase-space grid.
This leads to an algorithm with quasi-linear operation complexity and linear
memory complexity. Unlike previous butterfly methods for FIOs, this new
approach is simpler and reduces the computational cost by avoiding extra
coordinate transformations. Numerical examples in two and three dimensions are
provided to demonstrate the practical advantages of the new algorithm.
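As a minimal illustration of the hierarchical decomposition (our sketch, with assumed dyadic scaling; not the paper's code), the following partitions a 2D frequency grid into Cartesian coronas by the sup-norm:

```python
import numpy as np

# Partition a 2D frequency grid into dyadic Cartesian coronas
# {xi : 2^(j-1) <= |xi|_inf < 2^j}; sizes here are illustrative.
n = 256
xi = np.fft.fftfreq(n, d=1.0 / n)              # integer frequencies
XI1, XI2 = np.meshgrid(xi, xi, indexing="ij")
r = np.maximum(np.abs(XI1), np.abs(XI2))       # |xi|_inf

coronas, lo = [], 1.0
while lo <= n // 2:
    coronas.append((r >= lo) & (r < 2 * lo))   # corona lo <= |xi|_inf < 2*lo
    lo *= 2

# The masks tile all nonzero frequencies; a butterfly algorithm is then
# applied to the kernel restricted to each corona.
print([int(m.sum()) for m in coronas])
```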
Butterfly-Net: Optimal Function Representation Based on Convolutional Neural Networks
Deep networks, especially convolutional neural networks (CNNs), have been
successfully applied in various areas of machine learning as well as to
challenging problems in other scientific and engineering fields. This paper
introduces Butterfly-Net, a low-complexity CNN with structured and sparse
cross-channel connections, together with a Butterfly initialization strategy
for a family of networks. Theoretical analysis of the approximation power of
Butterfly-Net to the Fourier representation of input data shows that the error
decays exponentially as the depth increases. Combining Butterfly-Net with a
fully connected neural network, a large class of problems is proved to be well
approximated with network complexity depending on the effective frequency
bandwidth instead of the input dimension. The regular CNN is covered as a special
case in our analysis. Numerical experiments validate the analytical results on
the approximation of Fourier kernels and energy functionals of Poisson's
equations. Moreover, all experiments support that training from Butterfly
initialization outperforms training from random initialization. Also, adding
the remaining cross-channel connections, although it significantly increases
the parameter count, does little to improve the post-training accuracy and
makes the network more sensitive to the data distribution.
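For intuition on the Butterfly initialization, the sketch below builds the radix-2 butterfly factorization of the size-n DFT: log2(n) sparse stages composed with a bit-reversal permutation. Butterfly-Net's structured sparse cross-channel connections mirror this pattern; the code is a generic illustration, not the paper's implementation.

```python
import numpy as np

def butterfly_stages(n):
    """Stages of the radix-2 decimation-in-time FFT (stored densely here
    for clarity; each stage has only 2 nonzeros per row)."""
    stages, m = [], 2
    while m <= n:
        B = np.zeros((n, n), dtype=complex)
        w = np.exp(-2j * np.pi / m)
        for start in range(0, n, m):
            for j in range(m // 2):
                a, b = start + j, start + j + m // 2
                B[a, a], B[a, b] = 1.0, w**j      # top of the butterfly
                B[b, a], B[b, b] = 1.0, -(w**j)   # bottom of the butterfly
        stages.append(B)
        m *= 2
    return stages

n = 16
bits = int(np.log2(n))
rev = [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]
P = np.zeros((n, n)); P[np.arange(n), rev] = 1.0  # bit-reversal permutation

F = P.astype(complex)
for B in butterfly_stages(n):
    F = B @ F                                     # compose the sparse stages

dft = np.exp(-2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n)
assert np.allclose(F, dft)                        # product equals the DFT
```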
A Unified Framework for Oscillatory Integral Transform: When to use NUFFT or Butterfly Factorization?
This paper concerns the fast evaluation of the matvec g = Kf for
K \in \C^{N \times N}, which is the discretization of the oscillatory
integral transform g(x) = \int K(x,\xi) f(\xi) d\xi with a kernel function
K(x,\xi)=\alpha(x,\xi)e^{2\pi\i \Phi(x,\xi)}, where \alpha(x,\xi) is a
smooth amplitude function, and \Phi(x,\xi) is a piecewise smooth phase
function with O(1) discontinuous points in x and \xi. A unified framework
is proposed to compute Kf with O(N log N) time and memory complexity via
the non-uniform fast Fourier transform (NUFFT) or the butterfly factorization
(BF), together with an O(N log N) fast algorithm to determine whether NUFFT or BF
is more suitable. This framework works for two cases: 1) explicit formulas for
the amplitude and phase functions are known, 2) only indirect access to the
amplitude and phase functions is available. Especially in the case of indirect
access, our main contributions are: 1) an O(N log N) algorithm for recovering
the amplitude and phase functions is proposed, based on a new low-rank matrix
recovery algorithm; 2) a new stable and nearly optimal BF with amplitude and
phase functions in the form of a low-rank factorization (IBF-MAT) is proposed
to evaluate the matvec Kf. Numerical results are provided to demonstrate the
effectiveness of the proposed framework.
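A hedged sketch of one plausible decision rule in the spirit of this framework (our simplification, not the paper's exact algorithm): sample the phase on a coarse grid and test its numerical rank, since a phase that is numerically a short sum of separable terms p_i(x) q_i(\xi) is amenable to NUFFT-style evaluation, while a genuinely high-rank phase calls for BF.

```python
import numpy as np

def choose_method(phi, n=64, tol=1e-9, max_rank=4):
    """Sample Phi on a coarse grid and test its numerical rank; the
    threshold max_rank is an assumed tuning parameter."""
    x = np.linspace(0.0, 1.0, n)[:, None]
    xi = np.arange(n)[None, :].astype(float)
    s = np.linalg.svd(phi(x, xi), compute_uv=False)
    rank = int(np.sum(s > tol * s[0]))
    return "NUFFT" if rank <= max_rank else "BF"

print(choose_method(lambda x, xi: x * xi))          # separable phase -> NUFFT
print(choose_method(lambda x, xi:                   # oscillatory, high rank
                    np.sqrt(1 + x**2) * xi + np.cos(np.pi * x * xi / 8)))
```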
Fast and backward stable transforms between spherical harmonic expansions and bivariate Fourier series
A rapid transformation is derived between spherical harmonic expansions and
their analogues in a bivariate Fourier series. The change of basis is described
in two steps: firstly, expansions in normalized associated Legendre functions
of all orders are converted to those of order zero and one; then, these
intermediate expressions are re-expanded in trigonometric form. The first step
proceeds with a butterfly factorization of the well-conditioned matrices of
connection coefficients. The second step proceeds with fast orthogonal
polynomial transforms via hierarchically off-diagonal low-rank matrix
decompositions. Total pre-computation requires at best O(n^3 log n) flops;
and, an asymptotically optimal execution time of O(n^2 log^2 n) is rigorously
proved via connection to Fourier integral operators.
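To illustrate the flavor of the second step (a toy version only: the route below goes through the numerically poor monomial basis, whereas the paper uses fast, stable structured transforms), note that a Chebyshev expansion is a cosine series in disguise via T_k(cos t) = cos(kt), so converting Legendre coefficients to Chebyshev coefficients re-expands the series in trigonometric form. This sketch uses numpy's leg2poly/poly2cheb:

```python
import numpy as np
from numpy.polynomial import legendre, chebyshev

# Re-expand a (low-degree) Legendre series in Chebyshev form; the Chebyshev
# coefficients are the coefficients of a cosine series in t with x = cos(t).
c_leg = np.random.default_rng(0).standard_normal(16)    # Legendre coeffs
c_cheb = chebyshev.poly2cheb(legendre.leg2poly(c_leg))  # connection coeffs

t = 0.3
x = np.cos(t)
assert np.isclose(legendre.legval(x, c_leg), chebyshev.chebval(x, c_cheb))
```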
Sparse Recovery and Representation Learning
This dissertation focuses on sparse representation and dictionary learning, with three related topics. First, in chapter 1, we study the problem of low-rank matrix recovery in the presence of prior information. We begin with a necessary and sufficient condition, called the Null Space Property, for exact recovery of low-rank matrices from compressively sampled measurements using nuclear norm minimization, and we provide an alternative theoretical analysis of the bound on the number of random Gaussian measurements needed for the condition to be satisfied with high probability. We then study low-rank matrix recovery when prior information is available. We analyze an existing algorithm, provide necessary and sufficient conditions for exact recovery, and show that the existing algorithm is limited in certain cases; we then propose an alternative recovery algorithm that addresses this drawback, together with sufficient recovery conditions for it. In chapter 2, we study the problem of learning a sparsifying dictionary for a set of data, focusing on dictionaries that admit fast transforms. Inspired by the Fast Fourier Transform, we propose a learning algorithm that treats the entries of a structured linear transformation matrix as unknown parameters. Empirically, our algorithm produces dictionaries that yield lower numerical sparsity for sparse representations of images than the Discrete Fourier Transform (DFT). Additionally, due to its structure, the learned dictionary can recover the original signal from the sparse representation with fast computations. In chapter 3, we study representation learning in a more complex setting. We use the concept of dictionary learning and apply it in a deep generative model. Motivated by an application in the computer gaming industry, where designers need an urban layout generation tool that allows fast generation and modification, we present a novel solution that synthesizes high-quality building placements using conditional generative latent optimization together with adversarial training. The capability of the proposed method is demonstrated in various examples. Inference runs in near real time, so the tool can help designers iterate their designs of virtual cities quickly.
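As a generic illustration of the chapter 1 setting (standard singular value soft-thresholding for low-rank completion, not the dissertation's algorithm; the rank, threshold, and sampling rate below are assumptions), the following recovers a low-rank matrix from a random subset of its entries:

```python
import numpy as np

# Recover a rank-2 matrix from ~50% of its entries by alternating data
# consistency with singular value soft-thresholding (SoftImpute-style).
rng = np.random.default_rng(1)
n, r, tau = 40, 2, 2.0
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # ground truth
mask = rng.random((n, n)) < 0.5                                # observed set

X = np.zeros_like(M)
for _ in range(500):
    X[mask] = M[mask]                                  # enforce observations
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X = (U * np.maximum(s - tau, 0.0)) @ Vt            # shrink singular values
print("relative error:", np.linalg.norm(X - M) / np.linalg.norm(M))
```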
A fast butterfly algorithm for the hyperbolic Radon transform
We introduce a fast butterfly algorithm for the hyperbolic Radon transform commonly used in seismic data processing. For two-dimensional data, the algorithm runs in complexity O(N^2 log N), where N is representative of the number of points in either dimension of data space or model space. Using a series of examples, we show that the proposed algorithm is significantly more efficient than conventional integration.
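For reference, the direct evaluation that the butterfly algorithm replaces stacks data along hyperbolic traveltime curves t = sqrt(tau^2 + q^2 x^2); below is a minimal sketch with assumed conventions (nearest-sample time lookup, our parameter names):

```python
import numpy as np

def hyperbolic_radon_direct(d, t, x, tau, q):
    """Slow reference: m(tau, q) = sum over x of d(sqrt(tau^2 + q^2 x^2), x),
    with nearest-sample lookup in time."""
    nt, dt = len(t), t[1] - t[0]
    m = np.zeros((len(tau), len(q)))
    for i, tau_i in enumerate(tau):
        for j, q_j in enumerate(q):
            tt = np.sqrt(tau_i**2 + (q_j * x) ** 2)   # hyperbola in (t, x)
            idx = np.rint((tt - t[0]) / dt).astype(int)
            ok = idx < nt                             # drop out-of-range times
            m[i, j] = d[idx[ok], np.nonzero(ok)[0]].sum()
    return m

t, x = np.linspace(0, 2, 201), np.linspace(0, 1, 50)
d = np.random.default_rng(0).standard_normal((len(t), len(x)))
m = hyperbolic_radon_direct(d, t, x, tau=t[:100], q=np.linspace(0, 2, 40))
```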
Randomized estimation of spectral densities of large matrices made accurate
For a large Hermitian matrix A \in \C^{N \times N}, it is often the
case that the only affordable operation is matrix-vector multiplication. In
such a case, randomized methods are a powerful way to estimate the spectral
density (or density of states) of A. However, randomized methods developed so
far for estimating spectral densities only extract information from different
random vectors independently, and the accuracy is therefore inherently limited
to O(N_v^{-1/2}), where N_v is the number of random vectors. In this paper we
demonstrate that the "O(N_v^{-1/2}) barrier" can be overcome by taking
advantage of the correlated information of random vectors when properly
filtered by polynomials of A. Our method uses the fact that the estimation of
the spectral density essentially requires the computation of the trace of a
series of matrix functions that are numerically low rank. By repeatedly
applying A to the same set of random vectors and taking different linear
combinations of the results, we can sweep through the entire spectrum of A by
building such low-rank decompositions at different parts of the spectrum.
Under some assumptions, we demonstrate that a robust and efficient
implementation of such a spectrum sweeping method can compute the spectral
density accurately with O(N^2) computational cost and O(N) memory cost.
Numerical results indicate that the new method can significantly outperform
existing randomized methods in terms of accuracy. As an application, we
demonstrate a way to accurately compute the trace of a smooth matrix function
by carefully balancing the smoothness of the integrand and the regularized
density of states using a deconvolution procedure.
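For context, the baseline this paper improves on is Hutchinson-type stochastic trace estimation, whose error decays like O(N_v^{-1/2}); the sketch below (a generic illustration, not the spectrum sweeping method) exhibits that rate:

```python
import numpy as np

# Hutchinson estimator for tr(f(A)) with f(A) = A^2: independent random
# probes give an error decaying like O(N_v^{-1/2}).
rng = np.random.default_rng(0)
N = 500
A = rng.standard_normal((N, N)); A = (A + A.T) / 2   # Hermitian test matrix
fA = A @ A                                           # f(A) formed for clarity

exact = np.trace(fA)
for Nv in (10, 100, 1000):
    V = rng.choice([-1.0, 1.0], size=(N, Nv))        # Rademacher probes
    est = np.einsum("ij,ij->", V, fA @ V) / Nv       # mean of v^T f(A) v
    print(Nv, abs(est - exact) / abs(exact))
```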
Monarch: Expressive Structured Matrices for Efficient and Accurate Training
Large neural networks excel in many domains, but they are expensive to train
and fine-tune. A popular approach to reduce their compute or memory
requirements is to replace dense weight matrices with structured ones (e.g.,
sparse, low-rank, Fourier transform). These methods have not seen widespread
adoption (1) in end-to-end training due to unfavorable efficiency--quality
tradeoffs, and (2) in dense-to-sparse fine-tuning due to lack of tractable
algorithms to approximate a given dense weight matrix. To address these issues,
we propose a class of matrices (Monarch) that is hardware-efficient (they are
parameterized as products of two block-diagonal matrices for better hardware
utilization) and expressive (they can represent many commonly used transforms).
Surprisingly, the problem of approximating a dense weight matrix with a Monarch
matrix, though nonconvex, has an analytical optimal solution. These properties
of Monarch matrices unlock new ways to train and fine-tune sparse and dense
models. We empirically validate that Monarch can achieve favorable
accuracy-efficiency tradeoffs in several end-to-end sparse training
applications: speeding up ViT and GPT-2 training on ImageNet classification and
Wikitext-103 language modeling by 2x with comparable model quality, and
reducing the error on PDE solving and MRI reconstruction tasks by 40%. In
sparse-to-dense training, with a simple technique called "reverse
sparsification," Monarch matrices serve as a useful intermediate representation
to speed up GPT-2 pretraining on OpenWebText by 2x without quality drop. The
same technique brings 23% faster BERT pretraining than even the very optimized
implementation from Nvidia that set the MLPerf 1.1 record. In dense-to-sparse
fine-tuning, as a proof-of-concept, our Monarch approximation algorithm speeds
up BERT fine-tuning on GLUE by 1.7x with comparable accuracy.
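A hedged sketch of the structure described in the abstract (the exact parameterization, including how permutations interleave the factors, is illustrative rather than the paper's definition): a Monarch-style matvec applies two block-diagonal factors around a stride permutation, costing O(n sqrt(n)) for an n x n matrix with sqrt(n) blocks of size sqrt(n).

```python
import numpy as np

def blockdiag_matvec(blocks, v):
    """Apply a block-diagonal matrix stored as a (nblocks, b, b) array."""
    nb, b, _ = blocks.shape
    return np.einsum("kij,kj->ki", blocks, v.reshape(nb, b)).reshape(-1)

rng = np.random.default_rng(0)
n, b = 16, 4                              # n = b^2, so nblocks = sqrt(n)
L = rng.standard_normal((n // b, b, b))   # first block-diagonal factor
R = rng.standard_normal((n // b, b, b))   # second block-diagonal factor
perm = np.arange(n).reshape(b, n // b).T.reshape(-1)  # stride permutation

x = rng.standard_normal(n)
# Two block-diagonal multiplies around a permutation: O(n*b) = O(n^1.5)
# work, versus O(n^2) for a dense matvec.
y = blockdiag_matvec(R, blockdiag_matvec(L, x)[perm])
```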
- …