532 research outputs found

    Evaluating parametric holonomic sequences using rectangular splitting

    Full text link
    We adapt the rectangular splitting technique of Paterson and Stockmeyer to the problem of evaluating terms in holonomic sequences that depend on a parameter. This approach allows computing the nn-th term in a recurrent sequence of suitable type using O(n1/2)O(n^{1/2}) "expensive" operations at the cost of an increased number of "cheap" operations. Rectangular splitting has little overhead and can perform better than either naive evaluation or asymptotically faster algorithms for ranges of nn encountered in applications. As an example, fast numerical evaluation of the gamma function is investigated. Our work generalizes two previous algorithms of Smith.Comment: 8 pages, 2 figure

    Parallel Integer Polynomial Multiplication

    Get PDF
    We propose a new algorithm for multiplying dense polynomials with integer coefficients in a parallel fashion, targeting multi-core processor architectures. Complexity estimates and experimental comparisons demonstrate the advantages of this new approach

    A Streaming Multi-GPU Implementation of Image Simulation Algorithms for Scanning Transmission Electron Microscopy

    Full text link
    Simulation of atomic resolution image formation in scanning transmission electron microscopy can require significant computation times using traditional methods. A recently developed method, termed plane-wave reciprocal-space interpolated scattering matrix (PRISM), demonstrates potential for significant acceleration of such simulations with negligible loss of accuracy. Here we present a software package called Prismatic for parallelized simulation of image formation in scanning transmission electron microscopy (STEM) using both the PRISM and multislice methods. By distributing the workload between multiple CUDA-enabled GPUs and multicore processors, accelerations as high as 1000x for PRISM and 30x for multislice are achieved relative to traditional multislice implementations using a single 4-GPU machine. We demonstrate a potentially important application of Prismatic, using it to compute images for atomic electron tomography at sufficient speeds to include in the reconstruction pipeline. Prismatic is freely available both as an open-source CUDA/C++ package with a graphical user interface and as a Python package, PyPrismatic

    A fast and well-conditioned spectral method for singular integral equations

    Get PDF
    We develop a spectral method for solving univariate singular integral equations over unions of intervals by utilizing Chebyshev and ultraspherical polynomials to reformulate the equations as almost-banded infinite-dimensional systems. This is accomplished by utilizing low rank approximations for sparse representations of the bivariate kernels. The resulting system can be solved in O(m2n){\cal O}(m^2n) operations using an adaptive QR factorization, where mm is the bandwidth and nn is the optimal number of unknowns needed to resolve the true solution. The complexity is reduced to O(mn){\cal O}(m n) operations by pre-caching the QR factorization when the same operator is used for multiple right-hand sides. Stability is proved by showing that the resulting linear operator can be diagonally preconditioned to be a compact perturbation of the identity. Applications considered include the Faraday cage, and acoustic scattering for the Helmholtz and gravity Helmholtz equations, including spectrally accurate numerical evaluation of the far- and near-field solution. The Julia software package SingularIntegralEquations.jl implements our method with a convenient, user-friendly interface

    Correcting soft errors online in fast fourier transform

    Get PDF
    While many algorithm-based fault tolerance (ABFT) schemes have been proposed to detect soft errors offline in the fast Fourier transform (FFT) after computation finishes, none of the existing ABFT schemes detect soft errors online before the computation finishes. This paper presents an online ABFT scheme for FFT so that soft errors can be detected online and the corrupted computation can be terminated in a much more timely manner. We also extend our scheme to tolerate both arithmetic errors and memory errors, develop strategies to reduce its fault tolerance overhead and improve its numerical stability and fault coverage, and finally incorporate it into the widely used FFTW library - one of the today's fastest FFT software implementations. Experimental results demonstrate that: (1) the proposed online ABFT scheme introduces much lower overhead than the existing offline ABFT schemes; (2) it detects errors in a much more timely manner; and (3) it also has higher numerical stability and better fault coverage

    Fast algorithms and efficient GPU implementations for the Radon transform and the back-projection operator represented as convolution operators

    Full text link
    The Radon transform and its adjoint, the back-projection operator, can both be expressed as convolutions in log-polar coordinates. Hence, fast algorithms for the application of the operators can be constructed by using FFT, if data is resampled at log-polar coordinates. Radon data is typically measured on an equally spaced grid in polar coordinates, and reconstructions are represented (as images) in Cartesian coordinates. Therefore, in addition to FFT, several steps of interpolation have to be conducted in order to apply the Radon transform and the back-projection operator by means of convolutions. Both the interpolation and the FFT operations can be efficiently implemented on Graphical Processor Units (GPUs). For the interpolation, it is possible to make use of the fact that linear interpolation is hard-wired on GPUs, meaning that it has the same computational cost as direct memory access. Cubic order interpolation schemes can be constructed by combining linear interpolation steps which provides important computation speedup. We provide details about how the Radon transform and the back-projection can be implemented efficiently as convolution operators on GPUs. For large data sizes, speedups of about 10 times are obtained in relation to the computational times of other software packages based on GPU implementations of the Radon transform and the back-projection operator. Moreover, speedups of more than a 1000 times are obtained against the CPU-implementations provided in the MATLAB image processing toolbox
    • …
    corecore