637 research outputs found

    Fast evaluation of real and complex exponential sums

    Recently, the butterfly approximation scheme and hierarchical approximations have been proposed for the efficient computation of integral transforms with oscillatory and with asymptotically smooth kernels, respectively. Combining both approaches, we propose a fast Fourier-Laplace transform, which in particular allows for the fast evaluation of polynomials at nodes in the complex unit disk. All theoretical results are illustrated by numerical experiments.
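
    For reference, the computation being accelerated is the evaluation of $p(z_j) = \sum_{k=0}^{N-1} c_k z_j^k$ at $M$ nodes $z_j$ in the closed unit disk, which costs $O(MN)$ when done directly. The sketch below (function name and setup are illustrative, not from the paper) gives this direct baseline; on the unit circle the sum is a nonequispaced Fourier sum, while for $|z| < 1$ the terms decay geometrically, which is roughly the smoothness the hierarchical part exploits.

```python
import numpy as np

def eval_poly_unit_disk(coeffs, nodes):
    """Direct O(M*N) Horner evaluation of p(z) = sum_k c_k z^k at complex
    nodes; the baseline a fast Fourier-Laplace transform replaces with a
    near-linear-time approximation (illustrative, not the paper's code)."""
    result = np.zeros(len(nodes), dtype=complex)
    for c in coeffs[::-1]:              # highest-degree coefficient first
        result = result * nodes + c
    return result

rng = np.random.default_rng(0)
r, t = rng.uniform(0, 1, 100), rng.uniform(0, 1, 100)
nodes = r * np.exp(2j * np.pi * t)      # nodes z = r e^{2 pi i t} in the disk
coeffs = rng.standard_normal(64) + 1j * rng.standard_normal(64)
values = eval_poly_unit_disk(coeffs, nodes)
```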

    A Fast Butterfly Algorithm for the Computation of Fourier Integral Operators

    This paper is concerned with the fast computation of Fourier integral operators of the general form $\int_{\mathbb{R}^d} e^{2\pi i \Phi(x,k)} f(k)\, dk$, where $k$ is a frequency variable, $\Phi(x,k)$ is a phase function obeying a standard homogeneity condition, and $f$ is a given input. This is of interest because such fundamental computations are connected with the problem of finding numerical solutions to wave equations, and also frequently arise in applications including reflection seismology, curvilinear tomography and others. In two dimensions, when the input and output are sampled on $N \times N$ Cartesian grids, a direct evaluation requires $O(N^4)$ operations, which is often prohibitively expensive. This paper introduces a novel algorithm running in $O(N^2 \log N)$ time, i.e., with near-optimal computational complexity, whose overall structure follows that of the butterfly algorithm [Michielssen and Boag, IEEE Trans Antennas Propagat 44 (1996), 1086-1093]. Underlying this algorithm is a mathematical insight concerning the restriction of the kernel $e^{2\pi i \Phi(x,k)}$ to subsets of the time and frequency domains. Whenever these subsets obey a simple geometric condition, the restricted kernel is approximately low-rank; we propose constructing such low-rank approximations using a special interpolation scheme, which prefactors the oscillatory component, interpolates the remaining nonoscillatory part and, lastly, remodulates the outcome. A byproduct of this scheme is that the whole algorithm is highly efficient in terms of memory requirements. Numerical results demonstrate the performance and illustrate the empirical properties of this algorithm.
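
    The low-rank insight can be checked numerically: restrict the kernel to a pair of boxes whose width product is $O(1)$ and look at the singular value decay. The sketch below uses an illustrative phase (not one from the paper) and confirms that such a restricted block has small numerical rank, which is what the interpolation scheme exploits.

```python
import numpy as np

# Restrict exp(2*pi*i*Phi(x,k)) to a box pair with width product O(1) and
# inspect its singular values; the fast decay is the low-rank property
# behind the butterfly algorithm. Phase and box sizes are illustrative.
N = 1024
wx, wk = 1.0 / 32, 32.0                     # box widths with wx * wk = 1
x = np.linspace(0.5, 0.5 + wx, 64)          # box in space
k = np.linspace(N / 2, N / 2 + wk, 64)      # box in frequency

def Phi(x, k):
    # a simple phase, homogeneous of degree 1 in k (illustrative choice)
    return x[:, None] * k[None, :] * (1 + 0.3 * np.sin(2 * np.pi * x))[:, None]

K = np.exp(2j * np.pi * Phi(x, k))
s = np.linalg.svd(K, compute_uv=False)
print("numerical rank:", int(np.sum(s / s[0] > 1e-8)))   # small
```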

    A Multiscale Butterfly Algorithm for Multidimensional Fourier Integral Operators

    This paper presents an efficient multiscale butterfly algorithm for computing Fourier integral operators (FIOs) of the form $(\mathcal{L} f)(x) = \int_{\mathbb{R}^d} a(x,\xi)\, e^{2\pi i \Phi(x,\xi)} \hat{f}(\xi)\, d\xi$, where $\Phi(x,\xi)$ is a phase function, $a(x,\xi)$ is an amplitude function, and $f(x)$ is a given input. The frequency domain is hierarchically decomposed into a union of Cartesian coronas. The integral kernel $a(x,\xi) e^{2\pi i \Phi(x,\xi)}$ in each corona satisfies a special low-rank property that enables the application of a butterfly algorithm on the Cartesian phase-space grid. This leads to an algorithm with quasi-linear operation complexity and linear memory complexity. Unlike previous butterfly methods for FIOs, this new approach is simple and reduces the computational cost by avoiding extra coordinate transformations. Numerical examples in two and three dimensions are provided to demonstrate the practical advantages of the new algorithm.
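
    A minimal sketch of the corona decomposition, assuming dyadic annuli in the $\ell^\infty$ norm on a centered Cartesian grid (the paper's exact partition may differ in details):

```python
import numpy as np

def corona_masks(n):
    """Partition an n x n frequency grid centered at the origin into a
    center box plus dyadic Cartesian coronas {xi : lo <= |xi|_inf < 2 lo}.
    Illustrative helper, not code from the paper."""
    xi = np.arange(n) - n // 2
    r = np.maximum(np.abs(xi)[:, None], np.abs(xi)[None, :])   # |xi|_inf
    masks, lo = [r < 1], 1                                     # center box
    while lo < n // 2:
        hi = min(2 * lo, n // 2)
        upper = (r < hi) if hi < n // 2 else (r <= hi)         # close the outermost corona
        masks.append((r >= lo) & upper)
        lo = hi
    return masks

masks = corona_masks(64)
assert np.all(sum(m.astype(int) for m in masks) == 1)   # the masks tile the grid exactly
```

    The multiscale algorithm then masks $\hat{f}$ with each corona and runs a butterfly factorization per corona, so the kernel's oscillation is controlled scale by scale.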

    Butterfly-Net: Optimal Function Representation Based on Convolutional Neural Networks

    Deep networks, especially convolutional neural networks (CNNs), have been successfully applied in various areas of machine learning as well as to challenging problems in other scientific and engineering fields. This paper introduces Butterfly-Net, a low-complexity CNN with structured and sparse cross-channel connections, together with a Butterfly initialization strategy for a family of networks. Theoretical analysis of the approximation power of Butterfly-Net to the Fourier representation of input data shows that the error decays exponentially as the depth increases. Combining Butterfly-Net with a fully connected neural network, a large class of problems is proved to be well approximated with network complexity depending on the effective frequency bandwidth instead of the input dimension. Regular CNNs are covered as a special case in our analysis. Numerical experiments validate the analytical results on the approximation of Fourier kernels and energy functionals of the Poisson equation. Moreover, all experiments support that training from Butterfly initialization outperforms training from random initialization. Also, adding the remaining cross-channel connections, although it significantly increases the parameter count, does not much improve post-training accuracy and makes training more sensitive to the data distribution.
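
    The connectivity that Butterfly initialization encodes can be illustrated with the classical radix-2 factorization of the DFT into $\log_2 N$ sparse butterfly stages; Butterfly-Net hard-wires this pattern into its cross-channel connections. A didactic numpy sketch (not the paper's code):

```python
import numpy as np

def butterfly_factors(N):
    """Radix-2 Cooley-Tukey factorization: DFT_N = A_{log2 N} ... A_1 P,
    with each A_s block-diagonal (a 'butterfly' stage) and P a bit-reversal
    permutation. Didactic sketch of the sparsity pattern."""
    logN = int(np.log2(N))
    bitrev = [int(format(i, f"0{logN}b")[::-1], 2) for i in range(N)]
    P = np.eye(N)[bitrev]                       # (P x)[i] = x[bitrev(i)]
    factors = []
    for s in range(1, logN + 1):
        half = 2 ** (s - 1)                     # half the butterfly size at stage s
        D = np.diag(np.exp(-2j * np.pi * np.arange(half) / (2 * half)))
        block = np.block([[np.eye(half), D], [np.eye(half), -D]])
        factors.append(np.kron(np.eye(N // (2 * half)), block))
    return factors, P

N = 16
factors, P = butterfly_factors(N)
F = np.eye(N, dtype=complex)
for A in factors:                               # smallest butterflies act first
    F = A @ F
assert np.allclose(F @ P, np.fft.fft(np.eye(N)))   # recovers the DFT matrix
```

    Each stage carries only about $2N$ nonzeros, so the full product costs $O(N \log N)$, which is the complexity scaling Butterfly-Net inherits.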

    A Unified Framework for Oscillatory Integral Transform: When to use NUFFT or Butterfly Factorization?

    This paper concerns the fast evaluation of the matvec $g = Kf$ for $K \in \mathbb{C}^{N \times N}$, the discretization of the oscillatory integral transform $g(x) = \int K(x,\xi) f(\xi)\, d\xi$ with a kernel function $K(x,\xi) = \alpha(x,\xi) e^{2\pi i \Phi(x,\xi)}$, where $\alpha(x,\xi)$ is a smooth amplitude function and $\Phi(x,\xi)$ is a piecewise smooth phase function with $O(1)$ discontinuous points in $x$ and $\xi$. A unified framework is proposed to compute $Kf$ with $O(N \log N)$ time and memory complexity via either the non-uniform fast Fourier transform (NUFFT) or the butterfly factorization (BF), together with an $O(N)$ fast algorithm to determine whether NUFFT or BF is more suitable. This framework works in two cases: 1) explicit formulas for the amplitude and phase functions are known; 2) only indirect access to the amplitude and phase functions is available. In the case of indirect access, our main contributions are: 1) an $O(N \log N)$ algorithm for recovering the amplitude and phase functions, based on a new low-rank matrix recovery algorithm; 2) a new stable and nearly optimal BF with amplitude and phase functions in the form of a low-rank factorization (IBF-MAT), used to evaluate the matvec $Kf$. Numerical results are provided to demonstrate the effectiveness of the proposed framework.
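
    A toy version of the NUFFT-vs-BF decision: sample the residual kernel $e^{2\pi i (\Phi(x,\xi) - x\xi)}$ on a coarse grid and check its numerical rank. This is only in the spirit of the paper's $O(N)$ test; the sampling sizes, tolerance, and rank threshold below are illustrative.

```python
import numpy as np

def prefers_nufft(Phi, N, samples=64, tol=1e-4, max_rank=8):
    """Low residual rank suggests NUFFT (a Fourier kernel times a rank-r
    correction); high rank suggests a butterfly factorization. All
    thresholds are illustrative, not the paper's."""
    x = np.linspace(0, 1, samples)[:, None]
    xi = np.linspace(0, N, samples)[None, :]
    R = np.exp(2j * np.pi * (Phi(x, xi) - x * xi))   # residual kernel
    s = np.linalg.svd(R, compute_uv=False)
    rank = int(np.sum(s / s[0] > tol))
    return rank <= max_rank, rank

fourier_like = lambda x, xi: x * xi + np.sin(2 * np.pi * x) + np.log(1 + xi)
radon_like = lambda x, xi: xi * np.sqrt(1 + x**2)
print(prefers_nufft(fourier_like, N=256))   # (True, 1): residual is exactly rank one
print(prefers_nufft(radon_like, N=256))     # (False, ...): fall back to BF
```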

    Fast and backward stable transforms between spherical harmonic expansions and bivariate Fourier series

    A rapid transformation is derived between spherical harmonic expansions and their analogues in a bivariate Fourier series. The change of basis is described in two steps: firstly, expansions in normalized associated Legendre functions of all orders are converted to those of order zero and one; then, these intermediate expressions are re-expanded in trigonometric form. The first step proceeds with a butterfly factorization of the well-conditioned matrices of connection coefficients. The second step proceeds with fast orthogonal polynomial transforms via hierarchically off-diagonal low-rank matrix decompositions. Total pre-computation requires at best $\mathcal{O}(n^3 \log n)$ flops; and, asymptotically optimal execution time of $\mathcal{O}(n^2 \log^2 n)$ is rigorously proved via connection to Fourier integral operators.
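
    The second step, re-expansion in trigonometric form, can be seen in its simplest instance: a Legendre series is converted to a Chebyshev (cosine) series by sampling at Chebyshev points and applying a DCT-II built from the FFT. The dense $O(n^2)$ sampling below stands in for the fast structured transforms the paper actually uses; only the trigonometric step is fast here.

```python
import numpy as np
from numpy.polynomial import chebyshev, legendre

def legendre_to_chebyshev(c_leg):
    """Re-expand a Legendre series in the Chebyshev basis via values at
    Chebyshev points and an FFT-based DCT-II. Didactic sketch; the paper
    replaces the dense sampling with fast structured transforms."""
    n = len(c_leg)
    theta = (np.arange(n) + 0.5) * np.pi / n
    vals = legendre.legval(np.cos(theta), c_leg)     # O(n^2) sampling step
    v = np.concatenate([vals, vals[::-1]])           # even extension for the DCT
    c_cheb = np.real(np.exp(-1j * np.pi * np.arange(n) / (2 * n))
                     * np.fft.fft(v)[:n]) / n
    c_cheb[0] /= 2
    return c_cheb

c_leg = np.random.default_rng(1).standard_normal(32)
c_cheb = legendre_to_chebyshev(c_leg)
t = np.linspace(-1, 1, 7)
assert np.allclose(legendre.legval(t, c_leg), chebyshev.chebval(t, c_cheb))
```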

    A fast butterfly algorithm for the hyperbolic Radon transform

    We introduce a fast butterfly algorithm for the hyperbolic Radon transform commonly used in seismic data processing. For two-dimensional data, the algorithm runs in complexity $O(N^2 \log N)$, where $N$ is representative of the number of points in either dimension of data space or model space. Using a series of examples, we show that the proposed algorithm is significantly more efficient than conventional integration.
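
    For reference, the conventional integration being accelerated is the direct $O(N^3)$ evaluation of $m(\tau, q) = \sum_x d(\sqrt{\tau^2 + q^2 x^2}, x)$. A didactic sketch with nearest-sample lookup in time (discretization choices are illustrative):

```python
import numpy as np

def hyperbolic_radon_direct(d, t, x, tau, q):
    """Direct hyperbolic Radon transform of data d(t, x): O(N^3) for N
    points per axis, vs. O(N^2 log N) for the butterfly algorithm."""
    dt = t[1] - t[0]
    m = np.zeros((len(tau), len(q)))
    for i, t0 in enumerate(tau):
        for j, qj in enumerate(q):
            tt = np.sqrt(t0**2 + (qj * x) ** 2)          # hyperbolic moveout
            idx = np.rint((tt - t[0]) / dt).astype(int)  # nearest time sample
            ok = (idx >= 0) & (idx < len(t))             # stay inside the record
            m[i, j] = d[idx[ok], np.nonzero(ok)[0]].sum()
    return m

nt, nx = 128, 64
t, x = np.linspace(0, 2, nt), np.linspace(0, 3, nx)
d = np.random.default_rng(2).standard_normal((nt, nx))   # toy gather d(t, x)
m = hyperbolic_radon_direct(d, t, x, tau=t, q=np.linspace(0, 0.5, 32))
```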

    Randomized estimation of spectral densities of large matrices made accurate

    For a large Hermitian matrix $A \in \mathbb{C}^{N \times N}$, it is often the case that the only affordable operation is matrix-vector multiplication. In such cases, randomized methods are a powerful way to estimate the spectral density (or density of states) of $A$. However, randomized methods developed so far for estimating spectral densities only extract information from different random vectors independently, and the accuracy is therefore inherently limited to $\mathcal{O}(1/\sqrt{N_v})$, where $N_v$ is the number of random vectors. In this paper we demonstrate that the "$\mathcal{O}(1/\sqrt{N_v})$ barrier" can be overcome by taking advantage of the correlated information of random vectors when properly filtered by polynomials of $A$. Our method uses the fact that the estimation of the spectral density essentially requires the computation of the trace of a series of matrix functions that are numerically low rank. By repeatedly applying $A$ to the same set of random vectors and taking different linear combinations of the results, we can sweep through the entire spectrum of $A$ by building such low-rank decompositions at different parts of the spectrum. Under some assumptions, we demonstrate that a robust and efficient implementation of such a spectrum sweeping method can compute the spectral density accurately with $\mathcal{O}(N^2)$ computational cost and $\mathcal{O}(N)$ memory cost. Numerical results indicate that the new method can significantly outperform existing randomized methods in terms of accuracy. As an application, we demonstrate a way to accurately compute the trace of a smooth matrix function by carefully balancing the smoothness of the integrand and the regularized density of states using a deconvolution procedure.
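
    The baseline whose $\mathcal{O}(1/\sqrt{N_v})$ error the paper improves on is the Hutchinson-type estimator of the smoothed density $\phi_\sigma(E) = \frac{1}{N} \operatorname{tr} g_\sigma(EI - A)$. A minimal sketch (a dense eigendecomposition applies the Gaussian filter purely for clarity; practical codes use Chebyshev or Lanczos recurrences on matvecs):

```python
import numpy as np

rng = np.random.default_rng(0)
N, Nv, sigma = 200, 30, 0.05
A = rng.standard_normal((N, N))
A = (A + A.T) / np.sqrt(2 * N)           # Hermitian test matrix (semicircle spectrum)
lam, U = np.linalg.eigh(A)               # used here only to apply g(E*I - A)

def dos_estimate(E):
    """Hutchinson estimate of (1/N) tr g_sigma(E*I - A); the error decays
    only like 1/sqrt(Nv), which is the barrier the paper overcomes."""
    g = np.exp(-((E - lam) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    V = rng.choice([-1.0, 1.0], size=(N, Nv))        # random sign probes
    W = U @ (g[:, None] * (U.T @ V))                 # g(E*I - A) @ V
    return np.sum(V * W) / (Nv * N)                  # mean of v^T g v, over N

for E in np.linspace(-1.5, 1.5, 4):
    exact = np.mean(np.exp(-((E - lam) ** 2) / (2 * sigma**2))) / (sigma * np.sqrt(2 * np.pi))
    print(f"E={E:+.2f}  estimate={dos_estimate(E):.3f}  exact={exact:.3f}")
```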

    Monarch: Expressive Structured Matrices for Efficient and Accurate Training

    Large neural networks excel in many domains, but they are expensive to train and fine-tune. A popular approach to reduce their compute or memory requirements is to replace dense weight matrices with structured ones (e.g., sparse, low-rank, Fourier transform). These methods have not seen widespread adoption (1) in end-to-end training due to unfavorable efficiency-quality tradeoffs, and (2) in dense-to-sparse fine-tuning due to lack of tractable algorithms to approximate a given dense weight matrix. To address these issues, we propose a class of matrices (Monarch) that is hardware-efficient (they are parameterized as products of two block-diagonal matrices for better hardware utilization) and expressive (they can represent many commonly used transforms). Surprisingly, the problem of approximating a dense weight matrix with a Monarch matrix, though nonconvex, has an analytical optimal solution. These properties of Monarch matrices unlock new ways to train and fine-tune sparse and dense models. We empirically validate that Monarch can achieve favorable accuracy-efficiency tradeoffs in several end-to-end sparse training applications: speeding up ViT and GPT-2 training on ImageNet classification and Wikitext-103 language modeling by 2x with comparable model quality, and reducing the error on PDE solving and MRI reconstruction tasks by 40%. In sparse-to-dense training, with a simple technique called "reverse sparsification," Monarch matrices serve as a useful intermediate representation to speed up GPT-2 pretraining on OpenWebText by 2x without quality drop. The same technique brings 23% faster BERT pretraining than even the very optimized implementation from Nvidia that set the MLPerf 1.1 record. In dense-to-sparse fine-tuning, as a proof-of-concept, our Monarch approximation algorithm speeds up BERT fine-tuning on GLUE by 1.7x with comparable accuracy.
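
    The core structure is easy to sketch: for $N = m^2$, a Monarch-style matrix is a product of two block-diagonal factors interleaved with reshape-transpose permutations, so a matvec costs $O(N^{1.5})$ rather than $O(N^2)$. The parameterization below is one common way to write this structure, not necessarily the authors' exact convention.

```python
import numpy as np

def monarch_matvec(B1, B2, x):
    """y = M x with M = P^T blkdiag(B2) P blkdiag(B1), where P is the
    (m, m) reshape-transpose permutation and B1, B2 are stacks of m dense
    (m, m) blocks. O(N^1.5) work for N = m^2; a structural sketch only."""
    m = B1.shape[0]
    X = x.reshape(m, m)
    X = np.einsum("bij,bj->bi", B1, X)   # block-diagonal B1 on contiguous chunks
    X = X.T                              # permutation P (reshape-transpose)
    X = np.einsum("bij,bj->bi", B2, X)   # block-diagonal B2
    return X.T.reshape(-1)               # P^T, back to a flat vector

def blkdiag(B):                          # dense reference, for checking only
    m = B.shape[0]
    M = np.zeros((m * m, m * m))
    for b in range(m):
        M[b * m:(b + 1) * m, b * m:(b + 1) * m] = B[b]
    return M

rng = np.random.default_rng(3)
m = 4
B1, B2 = rng.standard_normal((2, m, m, m))
P = np.eye(m * m)[np.arange(m * m).reshape(m, m).T.reshape(-1)]
M = P.T @ blkdiag(B2) @ P @ blkdiag(B1)  # the dense Monarch-style matrix
x = rng.standard_normal(m * m)
assert np.allclose(M @ x, monarch_matvec(B1, B2, x))
```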