Evaluating parametric holonomic sequences using rectangular splitting
We adapt the rectangular splitting technique of Paterson and Stockmeyer to
the problem of evaluating terms in holonomic sequences that depend on a
parameter. This approach allows computing the $n$-th term in a recurrent
sequence of suitable type using $O(n^{1/2})$ "expensive" operations at the cost
of an increased number of "cheap" operations.
Rectangular splitting has little overhead and can perform better than either
naive evaluation or asymptotically faster algorithms for ranges of $n$
encountered in applications. As an example, fast numerical evaluation of the
gamma function is investigated. Our work generalizes two previous algorithms of
Smith.
Comment: 8 pages, 2 figures
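To make the trade-off concrete, here is a minimal sketch of classical Paterson-Stockmeyer rectangular splitting for a plain polynomial, not the paper's extension to parametric holonomic sequences. It assumes the "expensive" operation is a 2x2 integer matrix product and the "cheap" one is adding a scalar multiple of an already computed power; the helper names (`mat_mul`, `scale_add`, `horner`, `rectangular_splitting`) are hypothetical. For a polynomial of degree $d$ it spends about $2\sqrt{d}$ expensive products instead of $d$.

```python
import math


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def zeros(n):
    return [[0] * n for _ in range(n)]


def mat_mul(a, b, counter):
    """Expensive operation: a full matrix product, counted in counter[0]."""
    counter[0] += 1
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def scale_add(acc, c, a):
    """Cheap operation: acc + c * a for a scalar c (not counted)."""
    n = len(a)
    return [[acc[i][j] + c * a[i][j] for j in range(n)] for i in range(n)]


def horner(coeffs, x, counter):
    """Naive Horner evaluation of sum(coeffs[i] * x**i): d expensive products."""
    n = len(x)
    result = scale_add(zeros(n), coeffs[-1], identity(n))
    for c in reversed(coeffs[:-1]):
        result = mat_mul(result, x, counter)
        result = scale_add(result, c, identity(n))
    return result


def rectangular_splitting(coeffs, x, counter):
    """Paterson-Stockmeyer splitting: about 2*sqrt(d) expensive products."""
    n = len(x)
    d = len(coeffs) - 1
    m = max(1, math.isqrt(d + 1))  # block size ~ sqrt(d)
    # Precompute x^0 .. x^m using m - 1 expensive products.
    powers = [identity(n), x]
    for _ in range(2, m + 1):
        powers.append(mat_mul(powers[-1], x, counter))
    y = powers[m]
    # Split coefficients into blocks of length m: p(x) = sum_j B_j(x) * y^j.
    blocks = [coeffs[i:i + m] for i in range(0, d + 1, m)]
    result = zeros(n)
    for j, block in enumerate(reversed(blocks)):
        if j > 0:
            result = mat_mul(result, y, counter)  # Horner step in y = x^m
        for k, c in enumerate(block):
            result = scale_add(result, c, powers[k])  # cheap: reuse x^k
    return result


x = [[1, 1], [1, 0]]
coeffs = list(range(1, 17))  # p(t) = 1 + 2t + ... + 16t^15
naive, split = [0], [0]
assert horner(coeffs, x, naive) == rectangular_splitting(coeffs, x, split)
print(naive[0], split[0])  # 15 6
```

With 16 coefficients, Horner's rule uses 15 expensive products, while the split version uses 6 (3 to build $x^2, x^3, x^4$ and 3 Horner steps in $x^4$), paying instead with more cheap scalar-times-power additions.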
Parallel Integer Polynomial Multiplication
We propose a new algorithm for multiplying dense polynomials with integer
coefficients in a parallel fashion, targeting multi-core processor
architectures. Complexity estimates and experimental comparisons demonstrate
the advantages of this new approach.
A Streaming Multi-GPU Implementation of Image Simulation Algorithms for Scanning Transmission Electron Microscopy
Simulation of atomic resolution image formation in scanning transmission
electron microscopy can require significant computation times using traditional
methods. A recently developed method, termed plane-wave reciprocal-space
interpolated scattering matrix (PRISM), demonstrates potential for significant
acceleration of such simulations with negligible loss of accuracy. Here we
present a software package called Prismatic for parallelized simulation of
image formation in scanning transmission electron microscopy (STEM) using both
the PRISM and multislice methods. By distributing the workload between multiple
CUDA-enabled GPUs and multicore processors, accelerations as high as 1000x for
PRISM and 30x for multislice are achieved relative to traditional multislice
implementations using a single 4-GPU machine. We demonstrate a potentially
important application of Prismatic, using it to compute images for atomic
electron tomography at sufficient speeds to include in the reconstruction
pipeline. Prismatic is freely available both as an open-source CUDA/C++ package
with a graphical user interface and as a Python package, PyPrismatic.
A fast and well-conditioned spectral method for singular integral equations
We develop a spectral method for solving univariate singular integral
equations over unions of intervals by utilizing Chebyshev and ultraspherical
polynomials to reformulate the equations as almost-banded infinite-dimensional
systems. This is accomplished by using low-rank approximations for sparse
representations of the bivariate kernels. The resulting system can be solved in
$O(m^2 n)$ operations using an adaptive QR factorization, where $m$ is
the bandwidth and $n$ is the optimal number of unknowns needed to resolve the
true solution. The complexity is reduced to $O(mn)$ operations by
pre-caching the QR factorization when the same operator is used for multiple
right-hand sides. Stability is proved by showing that the resulting linear
operator can be diagonally preconditioned to be a compact perturbation of the
identity. Applications considered include the Faraday cage and acoustic
scattering for the Helmholtz and gravity Helmholtz equations, including
spectrally accurate numerical evaluation of the far- and near-field solution.
The Julia software package SingularIntegralEquations.jl implements our method
with a convenient, user-friendly interface.
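The two complexity figures can be illustrated with a minimal sketch of Givens-rotation QR on a square banded matrix with lower bandwidth `ml` and upper bandwidth `mu`. This is a deliberate simplification under stated assumptions: it handles only a fixed-size banded matrix, not the adaptive, almost-banded (banded plus a few dense rows), infinite-dimensional systems described above, and `banded_qr` and `solve_cached` are hypothetical names. Factorizing costs $O(m^2 n)$ with $m = ml + mu$; the cached rotations and $R$ then let each further right-hand side be solved in $O(mn)$.

```python
import math


def banded_qr(a, ml, mu):
    """Givens QR of a square banded matrix (lower bandwidth ml, upper mu).

    Returns the list of rotations (the implicit Q) and R, whose upper
    bandwidth is at most ml + mu. The input matrix is not modified.
    """
    n = len(a)
    r = [row[:] for row in a]
    rotations = []
    for j in range(n):
        for i in range(j + 1, min(j + ml + 1, n)):
            x, y = r[j][j], r[i][j]
            if y == 0.0:
                continue
            h = math.hypot(x, y)
            c, s = x / h, y / h
            # Rows j and i can be nonzero only in columns j .. j + ml + mu.
            for k in range(j, min(j + ml + mu + 1, n)):
                rj, ri = r[j][k], r[i][k]
                r[j][k] = c * rj + s * ri
                r[i][k] = -s * rj + c * ri
            rotations.append((j, i, c, s))
    return rotations, r


def solve_cached(rotations, r, b, ml, mu):
    """Solve A x = b reusing cached factors: O((ml + mu) * n) work."""
    n = len(r)
    y = list(b)
    for j, i, c, s in rotations:  # apply Q^T to b
        y[j], y[i] = c * y[j] + s * y[i], -s * y[j] + c * y[i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # banded back substitution with R
        upper = min(i + ml + mu + 1, n)
        x[i] = (y[i] - sum(r[i][k] * x[k] for k in range(i + 1, upper))) / r[i][i]
    return x


n = 6
a = [[4.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
rotations, r = banded_qr(a, 1, 1)            # factor once
for b in ([1.0] * n, [float(k) for k in range(n)]):
    x = solve_cached(rotations, r, b, 1, 1)  # reuse for each right-hand side
    print(max(abs(sum(a[i][k] * x[k] for k in range(n)) - b[i]) for i in range(n)))
```

The printed values are the residuals for the two right-hand sides; only the first call does the $O(m^2 n)$ work.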
Correcting soft errors online in fast Fourier transform
While many algorithm-based fault tolerance (ABFT) schemes have been proposed to detect soft errors offline in the fast Fourier transform (FFT) after computation finishes, none of the existing ABFT schemes detect soft errors online before the computation finishes. This paper presents an online ABFT scheme for FFT so that soft errors can be detected online and the corrupted computation can be terminated in a much more timely manner. We also extend our scheme to tolerate both arithmetic errors and memory errors, develop strategies to reduce its fault tolerance overhead and improve its numerical stability and fault coverage, and finally incorporate it into the widely used FFTW library, one of today's fastest FFT software implementations. Experimental results demonstrate that: (1) the proposed online ABFT scheme introduces much lower overhead than the existing offline ABFT schemes; (2) it detects errors in a much more timely manner; and (3) it also has higher numerical stability and better fault coverage.
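The basic checksum idea behind ABFT for the FFT can be sketched as follows, assuming a radix-2 transform of length $N$ and checksum weights $w$ drawn uniformly from [1, 2). Because the DFT matrix $F$ is symmetric, $w^T(Fx) = (Fw)^T x$, so precomputing $r = Fw$ once lets any output $X$ be checked against its input at $O(N)$ extra cost. This is a simplified, offline-style check on the final output only, not the paper's online scheme, which verifies the computation before it finishes; `fft`, `make_checksum`, and `checksum_ok` are hypothetical helpers.

```python
import cmath
import random


def fft(x):
    """Recursive radix-2 DFT, X[k] = sum_n x[n] exp(-2 pi i k n / N); N a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def make_checksum(n, seed=0):
    """Random weights w and precomputed r = F w (valid because F is symmetric)."""
    rng = random.Random(seed)
    w = [rng.uniform(1.0, 2.0) for _ in range(n)]
    return w, fft(w)


def checksum_ok(x, X, w, r, rtol=1e-9):
    """Check w^T X == r^T x up to a rounding tolerance."""
    lhs = sum(wk * Xk for wk, Xk in zip(w, X))
    rhs = sum(rn * xn for rn, xn in zip(r, x))
    scale = 1.0 + sum(abs(wk * Xk) for wk, Xk in zip(w, X))
    return abs(lhs - rhs) <= rtol * scale


rng = random.Random(1)
x = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(16)]
w, r = make_checksum(len(x))
X = fft(x)
print(checksum_ok(x, X, w, r))  # True: clean transform
X[3] += 1.0                     # simulate a soft error in one output
print(checksum_ok(x, X, w, r))  # False: corruption detected
```

Drawing the weights from [1, 2) keeps every $|w_k| \ge 1$, so a unit-sized corruption in any single output shifts the checksum well above the rounding tolerance.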
Fast algorithms and efficient GPU implementations for the Radon transform and the back-projection operator represented as convolution operators
The Radon transform and its adjoint, the back-projection operator, can both
be expressed as convolutions in log-polar coordinates. Hence, fast algorithms
for the application of the operators can be constructed by using the FFT if the data
are resampled onto log-polar coordinates. Radon data are typically measured on an
equally spaced grid in polar coordinates, and reconstructions are represented
(as images) in Cartesian coordinates. Therefore, in addition to the FFT, several
steps of interpolation have to be conducted in order to apply the Radon
transform and the back-projection operator by means of convolutions.
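The FFT helps because of the convolution theorem: a circular convolution becomes a pointwise product in the frequency domain, cutting the cost from $O(N^2)$ to $O(N \log N)$. The sketch below shows this in one dimension with a textbook radix-2 FFT and compares it with the direct sum. It is only an illustration under stated assumptions: the actual operators act on two-dimensional log-polar grids with kernels not given here, and `convolve_fft` and `convolve_direct` are hypothetical names.

```python
import cmath


def fft(x):
    """Recursive radix-2 DFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(X):
    """Inverse DFT via conjugation: ifft(X) = conj(fft(conj(X))) / N."""
    n = len(X)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in X])]


def convolve_fft(a, b):
    """Circular convolution via the convolution theorem, O(N log N)."""
    A, B = fft(a), fft(b)
    return [v.real for v in ifft([p * q for p, q in zip(A, B)])]


def convolve_direct(a, b):
    """Reference circular convolution, O(N^2)."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]


a = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
b = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(convolve_fft(a, b))     # agrees with the next line up to rounding
print(convolve_direct(a, b))
```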
Both the interpolation and the FFT operations can be efficiently implemented
on graphics processing units (GPUs). For the interpolation, it is possible to
make use of the fact that linear interpolation is hard-wired on GPUs, meaning
that it has the same computational cost as direct memory access. Cubic-order
interpolation schemes can be constructed by combining linear interpolation
steps, which provides a significant computational speedup.
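One well-known way to build cubic interpolation from hardware linear fetches is the cubic B-spline trick described by Sigg and Hadwiger in GPU Gems 2 ("Fast Third-Order Texture Filtering"): the four B-spline weights are merged pairwise so that two linear fetches at shifted positions replace four separate reads. Whether the paper uses exactly this scheme is an assumption. The sketch emulates a clamp-to-edge linear texture fetch in 1D (`fetch_linear` is a hypothetical stand-in for the GPU sampler) and checks the result against the direct four-tap sum.

```python
import math


def fetch_linear(f, p):
    """Emulated hardware linear fetch at real position p, clamp-to-edge."""
    n = len(f)
    i = math.floor(p)
    a = p - i
    i0 = min(max(i, 0), n - 1)
    i1 = min(max(i + 1, 0), n - 1)
    return (1.0 - a) * f[i0] + a * f[i1]


def bspline_weights(t):
    """Uniform cubic B-spline weights for taps i-1, i, i+1, i+2 (sum to 1)."""
    w0 = (1 - t) ** 3 / 6
    w1 = (3 * t**3 - 6 * t**2 + 4) / 6
    w2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6
    w3 = t**3 / 6
    return w0, w1, w2, w3


def cubic_direct(f, x):
    """Reference: four separate reads weighted by the B-spline weights."""
    n = len(f)
    i = math.floor(x)
    w = bspline_weights(x - i)
    return sum(wk * f[min(max(i - 1 + k, 0), n - 1)] for k, wk in enumerate(w))


def cubic_two_fetches(f, x):
    """Same value from two linear fetches: g0 * lerp(i-1, i) + g1 * lerp(i+1, i+2)."""
    i = math.floor(x)
    w0, w1, w2, w3 = bspline_weights(x - i)
    g0, g1 = w0 + w1, w2 + w3
    return (g0 * fetch_linear(f, i - 1 + w1 / g0)
            + g1 * fetch_linear(f, i + 1 + w3 / g1))


f = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
for x in (0.3, 2.75, 4.2):
    print(x, cubic_direct(f, x), cubic_two_fetches(f, x))  # columns agree
```

On real GPUs the interpolation fraction of a linear fetch is stored in low-precision fixed point, so the two-fetch result only approximates the exact four-tap sum there; the emulation above uses full floating-point weights.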
We provide details about how the Radon transform and the back-projection can
be implemented efficiently as convolution operators on GPUs. For large data
sizes, speedups of about 10 times are obtained relative to the computational
times of other software packages based on GPU implementations of the Radon
transform and the back-projection operator. Moreover, speedups of more than
1000 times are obtained against the CPU implementations provided in the MATLAB
Image Processing Toolbox.