Accelerated Modeling of Near and Far-Field Diffraction for Coronagraphic Optical Systems
Accurately predicting the performance of coronagraphs and tolerancing optical
surfaces for high-contrast imaging requires a detailed accounting of
diffraction effects. Unlike simple Fraunhofer diffraction modeling, near and
far-field diffraction effects, such as the Talbot effect, are captured by
plane-to-plane propagation using Fresnel and angular spectrum propagation. This
approach requires a sequence of computationally intensive Fourier transforms
and quadratic phase functions, which limit the design and aberration
sensitivity parameter space that can be explored at high fidelity in the
course of coronagraph design. This study presents the results of optimizing the
multi-surface propagation module of the open source Physical Optics Propagation
in PYthon (POPPY) package. This optimization was performed by implementing and
benchmarking Fourier transforms and array operations on graphics processing
units, as well as optimizing multithreaded numerical calculations using the
NumExpr python library where appropriate, to speed the end-to-end simulation of
observatory and coronagraph optical systems. Using realistic systems, this
study demonstrates a greater than five-fold decrease in wall-clock runtime over
POPPY's previous implementation and describes opportunities for further
improvements in diffraction modeling performance.
Comment: Presented at SPIE ASTI 2018, Austin, Texas. 11 pages, 6 figures
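The plane-to-plane propagation described in this abstract can be illustrated with a minimal angular spectrum sketch in NumPy. This is a generic textbook formulation, not POPPY's implementation; the function name `angular_spectrum_propagate`, the grid size, the wavelength, and the grating period are all assumptions chosen for illustration. The example propagates a cosine amplitude grating by one paraxial Talbot length, 2 d^2 / lambda, where the intensity pattern is expected to reproduce itself.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, z):
    """Propagate a sampled 2-D complex field by distance z (lengths in metres)."""
    n_y, n_x = field.shape
    fx = np.fft.fftfreq(n_x, d=dx)  # spatial frequencies in cycles per metre
    fy = np.fft.fftfreq(n_y, d=dx)
    fxx, fyy = np.meshgrid(fx, fy)
    # Squared longitudinal frequency; negative values correspond to evanescent waves.
    arg = (1.0 / wavelength) ** 2 - fxx ** 2 - fyy ** 2
    # The complex square root makes evanescent components decay for z > 0.
    kz = 2.0 * np.pi * np.sqrt(arg.astype(complex))
    transfer = np.exp(1j * kz * z)
    # One forward FFT, one multiplication by the transfer function, one inverse FFT.
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

# Illustrative values (assumptions, not taken from the paper).
wavelength = 1e-6          # 1 micron
dx = 5e-6                  # 5 micron sampling
n = 256
period = 32 * dx           # grating period d, an integer number of samples
x = np.arange(n) * dx
grating = 1.0 + 0.5 * np.cos(2.0 * np.pi * x / period)
field_in = np.tile(grating, (n, 1)).astype(complex)

talbot_length = 2.0 * period ** 2 / wavelength
field_out = angular_spectrum_propagate(field_in, wavelength, dx, talbot_length)

intensity_in = np.abs(field_in) ** 2
intensity_out = np.abs(field_out) ** 2
```

Each propagation step costs two 2-D FFTs plus an element-wise multiplication, which is why a multi-surface system with many such steps benefits from faster transforms and array operations.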
A Multi-GPU Programming Library for Real-Time Applications
We present MGPU, a C++ programming library targeted at single-node multi-GPU
systems. Such systems combine disproportionate floating point performance with
high data locality and are thus well suited to implement real-time algorithms.
We describe the library design, programming interface and implementation
details in light of this specific problem domain. The core concepts of this
work are a novel kind of container abstraction and MPI-like communication
methods for intra-system communication. We further demonstrate how MGPU is used
as a framework for porting existing GPU libraries to multi-device
architectures. Putting our library to the test, we accelerate an iterative
non-linear image reconstruction algorithm for real-time magnetic resonance
imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs
and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us
to conclude that multi-GPU systems are a viable solution for real-time MRI
reconstruction as well as signal-processing applications in general.
Comment: 15 pages, 10 figures
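To put the reported speed-ups in context, parallel efficiency (speed-up divided by device count) can be worked out directly from the two figures quoted in the abstract. The helper name `parallel_efficiency` is hypothetical, and the sketch assumes both speed-ups are measured against the same single-GPU baseline.

```python
def parallel_efficiency(speedup, n_devices):
    """Fraction of ideal linear scaling achieved with n_devices."""
    return speedup / n_devices

# Speed-ups quoted in the abstract: about 1.7 with 2 GPUs, 2.1 with 4 GPUs.
reported = {2: 1.7, 4: 2.1}
efficiency = {n: parallel_efficiency(s, n) for n, s in reported.items()}

for n, e in efficiency.items():
    print(f"{n} GPUs: {e:.1%} of ideal scaling")
```

Under that assumption the efficiency falls from roughly 85% with two GPUs to a little over 50% with four.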
A pseudospectral matrix method for time-dependent tensor fields on a spherical shell
We construct a pseudospectral method for the solution of time-dependent,
non-linear partial differential equations on a three-dimensional spherical
shell. The problem we address is the treatment of tensor fields on the sphere.
As a test case we consider the evolution of a single black hole in numerical
general relativity. A natural strategy would be the expansion in tensor
spherical harmonics in spherical coordinates. Instead, we consider the simpler
and potentially more efficient possibility of a double Fourier expansion on the
sphere for tensors in Cartesian coordinates. As usual for the double Fourier
method, we employ a filter to address time-step limitations and certain
stability issues. We find that a tensor filter based on spin-weighted spherical
harmonics is successful, while two simplified, non-spin-weighted filters do not
lead to stable evolutions. The derivatives and the filter are implemented by
matrix multiplication for efficiency. A key technical point is the construction
of a matrix multiplication method for the spin-weighted spherical harmonic
filter. As an example of the efficient parallelization of the double Fourier,
spin-weighted filter method we discuss an implementation on a GPU, which
achieves a speed-up of up to a factor of 20 compared to a single core CPU
implementation.
Comment: 33 pages, 9 figures
Fast hyperbolic Radon transform represented as convolutions in log-polar coordinates
The hyperbolic Radon transform is a commonly used tool in seismic processing,
for instance in seismic velocity analysis, data interpolation and for multiple
removal. A direct implementation by summation of traces with different moveouts
is computationally expensive for large data sets. In this paper we present a
new method for fast computation of the hyperbolic Radon transform. It is based
on using a log-polar sampling with which the main computational parts reduce to
computing convolutions. This allows for fast implementations by means of FFT.
In addition to the FFT operations, interpolation procedures are required for
switching between coordinates in the time-offset, Radon, and log-polar domains.
Graphics processing units (GPUs) are a suitable computational
platform for this purpose, due to the hardware-supported interpolation routines
as well as optimized routines for FFT. Performance tests show large speed-ups
of the proposed algorithm. Hence, it is suitable for use in iterative methods,
and we provide examples for data interpolation and multiple removal using this
approach.
Comment: 21 pages, 10 figures, 2 tables
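The step from "convolution" to "fast implementation by means of FFT" rests on the convolution theorem. The sketch below shows only that generic building block for a 1-D circular convolution; it does not implement the log-polar resampling or the hyperbolic Radon transform itself, and both function names are hypothetical.

```python
import numpy as np

def circular_convolve_fft(signal, kernel):
    """Circular convolution in O(n log n) via the convolution theorem."""
    return np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(kernel)))

def circular_convolve_direct(signal, kernel):
    """Reference O(n^2) circular convolution by explicit summation."""
    n = len(signal)
    return np.array([
        sum(signal[j] * kernel[(i - j) % n] for j in range(n))
        for i in range(n)
    ])

rng = np.random.default_rng(0)
signal = rng.standard_normal(64)
kernel = rng.standard_normal(64)

fast = circular_convolve_fft(signal, kernel)
slow = circular_convolve_direct(signal, kernel)
```

The two results agree to floating-point precision, while the FFT route replaces the quadratic summation with three transforms and one element-wise product.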
Accelerating the Fourier split operator method via graphics processing units
Current generations of graphics processing units have turned into highly
parallel devices with general computing capabilities. Thus, graphics processing
units may be utilized, for example, to solve time dependent partial
differential equations by the Fourier split operator method. In this
contribution, we demonstrate that graphics processing units are capable of
calculating fast Fourier transforms much more efficiently than traditional
central processing units. Thus, graphics processing units render efficient
implementations of the Fourier split operator method possible. Performance
gains of more than an order of magnitude as compared to implementations for
traditional central processing units are reached in the solution of the time
dependent Schrödinger equation and the time dependent Dirac equation.
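For readers unfamiliar with the Fourier split operator method, a minimal CPU sketch for the one-dimensional time dependent Schrödinger equation follows. It uses the standard symmetric (Strang) splitting in NumPy and is not the authors' GPU code; the function name `split_operator_step`, the units (hbar = mass = 1), and the harmonic-oscillator test case are assumptions made for illustration.

```python
import numpy as np

def split_operator_step(psi, potential, k, dt, hbar=1.0, mass=1.0):
    """Advance psi by one time step dt using symmetric (Strang) splitting."""
    half_v = np.exp(-0.5j * potential * dt / hbar)       # half step in the potential
    kinetic = np.exp(-0.5j * hbar * k ** 2 * dt / mass)  # full kinetic step in k-space
    psi = half_v * psi
    psi = np.fft.ifft(kinetic * np.fft.fft(psi))         # the FFT pair dominates the cost
    return half_v * psi

# Illustrative test case (an assumption): harmonic oscillator ground state.
n = 256
length = 20.0
dx = length / n
x = -length / 2 + dx * np.arange(n)
k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)
potential = 0.5 * x ** 2
psi0 = np.pi ** -0.25 * np.exp(-0.5 * x ** 2)

psi = psi0.astype(complex)
dt = 0.01
for _ in range(100):
    psi = split_operator_step(psi, potential, k, dt)

norm = np.sum(np.abs(psi) ** 2) * dx
density_change = np.max(np.abs(np.abs(psi) ** 2 - psi0 ** 2))
```

Every time step consists of one forward and one inverse FFT plus element-wise multiplications, so the speed of the FFT largely determines the speed of the whole method.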
ARKCoS: Artifact-Suppressed Accelerated Radial Kernel Convolution on the Sphere
We describe a hybrid Fourier/direct space convolution algorithm for compact
radial (azimuthally symmetric) kernels on the sphere. For high resolution maps
covering a large fraction of the sky, our implementation takes advantage of the
inexpensive massive parallelism afforded by consumer graphics processing units
(GPUs). Applications involve modeling of instrumental beam shapes in terms of
compact kernels, computation of fine-scale wavelet transformations, and optimal
filtering for the detection of point sources. Our algorithm works for any
pixelization where pixels are grouped into isolatitude rings. Even for kernels
that are not bandwidth limited, ringing features are completely absent on an
ECP grid. We demonstrate that they can be highly suppressed on the popular
HEALPix pixelization, for which we develop a freely available implementation of
the algorithm. As an example application, we show that running on a high-end
consumer graphics card our method speeds up beam convolution for simulations of
a characteristic Planck high frequency instrument channel by two orders of
magnitude compared to the commonly used HEALPix implementation on one CPU core
while typically maintaining a fractional RMS accuracy of about 1 part in 10^5.
Comment: 10 pages, 6 figures. Submitted to Astronomy and Astrophysics.
Replaced to match published version. Code can be downloaded at
https://github.com/elsner/arkco