545 research outputs found
Accelerated Modeling of Near and Far-Field Diffraction for Coronagraphic Optical Systems
Accurately predicting the performance of coronagraphs and tolerancing optical
surfaces for high-contrast imaging requires a detailed accounting of
diffraction effects. Unlike simple Fraunhofer diffraction modeling, near and
far-field diffraction effects, such as the Talbot effect, are captured by
plane-to-plane propagation using Fresnel and angular spectrum propagation. This
approach requires a sequence of computationally intensive Fourier transforms
and quadratic phase functions, which limit the design and aberration
sensitivity parameter space which can be explored at high-fidelity in the
course of coronagraph design. This study presents the results of optimizing the
multi-surface propagation module of the open source Physical Optics Propagation
in PYthon (POPPY) package. This optimization was performed by implementing and
benchmarking Fourier transforms and array operations on graphics processing
units, as well as optimizing multithreaded numerical calculations using the
NumExpr python library where appropriate, to speed the end-to-end simulation of
observatory and coronagraph optical systems. Using realistic systems, this
study demonstrates a greater than five-fold decrease in wall-clock runtime over
POPPY's previous implementation and describes opportunities for further
improvements in diffraction modeling performance.Comment: Presented at SPIE ASTI 2018, Austin Texas. 11 pages, 6 figure
Application of graphics processing units to search pipelines for gravitational waves from coalescing binaries of compact objects
We report a novel application of a graphics processing unit (GPU) for the purpose of accelerating the search pipelines for gravitational waves from coalescing binaries of compact objects. A speed-up of 16-fold in total has been achieved with an NVIDIA GeForce 8800 Ultra GPU card compared with one core of a 2.5 GHz Intel Q9300 central processing unit (CPU). We show that substantial improvements are possible and discuss the reduction in CPU count required for the detection of inspiral sources afforded by the use of GPUs
BioEM: GPU-accelerated computing of Bayesian inference of electron microscopy images
In cryo-electron microscopy (EM), molecular structures are determined from
large numbers of projection images of individual particles. To harness the full
power of this single-molecule information, we use the Bayesian inference of EM
(BioEM) formalism. By ranking structural models using posterior probabilities
calculated for individual images, BioEM in principle addresses the challenge of
working with highly dynamic or heterogeneous systems not easily handled in
traditional EM reconstruction. However, the calculation of these posteriors for
large numbers of particles and models is computationally demanding. Here we
present highly parallelized, GPU-accelerated computer software that performs
this task efficiently. Our flexible formulation employs CUDA, OpenMP, and MPI
parallelization combined with both CPU and GPU computing. The resulting BioEM
software scales nearly ideally both on pure CPU and on CPU+GPU architectures,
thus enabling Bayesian analysis of tens of thousands of images in a reasonable
time. The general mathematical framework and robust algorithms are not limited
to cryo-electron microscopy but can be generalized for electron tomography and
other imaging experiments
Efficient Spherical Harmonic Transforms aimed at pseudo-spectral numerical simulations
In this paper, we report on very efficient algorithms for the spherical
harmonic transform (SHT). Explicitly vectorized variations of the algorithm
based on the Gauss-Legendre quadrature are discussed and implemented in the
SHTns library which includes scalar and vector transforms. The main
breakthrough is to achieve very efficient on-the-fly computations of the
Legendre associated functions, even for very high resolutions, by taking
advantage of the specific properties of the SHT and the advanced capabilities
of current and future computers. This allows us to simultaneously and
significantly reduce memory usage and computation time of the SHT. We measure
the performance and accuracy of our algorithms. Even though the complexity of
the algorithms implemented in SHTns are in (where N is the maximum
harmonic degree of the transform), they perform much better than any third
party implementation, including lower complexity algorithms, even for
truncations as high as N=1023. SHTns is available at
https://bitbucket.org/nschaeff/shtns as open source software.Comment: 8 page
BlackNUFFT: Modular customizable black box hybrid parallelization of type 3 NUFFT in 3D
Many applications benefit from an efficient Discrete Fourier Transform (DFT) between arbitrarily spaced points. The Non Uniform Fast Fourier Transform reduces the computational cost of such operation from to exploiting gridding algorithms and a standard Fast Fourier Transform on an equi-spaced grid. The parallelization of the NUFFT of type 3 (between arbitrary points in space and frequency) still poses some challenges: we present a novel and flexible hybrid parallelization in a MPI-multithreaded environment exploiting existing HPC libraries on modern architectures. To ensure the reliability of the developed library, we exploit continuous integration strategies using Travis CI. We present performance analyses to prove the effectiveness of our implementation, possible extensions to the existing library, and an application of NUFFT type 3 to MRI image processing
- …