ARKCoS: Artifact-Suppressed Accelerated Radial Kernel Convolution on the Sphere
We describe a hybrid Fourier/direct space convolution algorithm for compact
radial (azimuthally symmetric) kernels on the sphere. For high resolution maps
covering a large fraction of the sky, our implementation takes advantage of the
inexpensive massive parallelism afforded by consumer graphics processing units
(GPUs). Applications involve modeling of instrumental beam shapes in terms of
compact kernels, computation of fine-scale wavelet transformations, and optimal
filtering for the detection of point sources. Our algorithm works for any
pixelization where pixels are grouped into isolatitude rings. Even for kernels
that are not bandwidth limited, ringing features are completely absent on an
ECP grid. We demonstrate that they can be highly suppressed on the popular
HEALPix pixelization, for which we develop a freely available implementation of
the algorithm. As an example application, we show that running on a high-end
consumer graphics card our method speeds up beam convolution for simulations of
a characteristic Planck high frequency instrument channel by two orders of
magnitude compared to the commonly used HEALPix implementation on one CPU core
while typically maintaining a fractional RMS accuracy of about 1 part in 10^5.
Comment: 10 pages, 6 figures. Submitted to Astronomy and Astrophysics.
Replaced to match published version. Code can be downloaded at
https://github.com/elsner/arkco
Using hybrid GPU/CPU kernel splitting to accelerate spherical convolutions
We present a general method for accelerating by more than an order of
magnitude the convolution of pixelated functions on the sphere with a
radially-symmetric kernel. Our method splits the kernel into a compact
real-space component and a compact spherical harmonic space component. These
components can then be convolved in parallel using an inexpensive commodity GPU
and a CPU. We provide models for the computational cost of both real-space and
Fourier space convolutions and an estimate for the approximation error. Using
these models we can determine the optimum split that minimizes the wall clock
time for the convolution while satisfying the desired error bounds. We apply
this technique to the problem of simulating a cosmic microwave background (CMB)
anisotropy sky map at the resolution typical of the high resolution maps
produced by the Planck mission. For the main Planck CMB science channels we
achieve a speedup of over a factor of ten, assuming an acceptable fractional
rms error of order 10^-5 in the power spectrum of the output map.
Comment: 9 pages, 11 figures, 1 table, accepted by Astronomy & Computing with
minor revisions. arXiv admin note: substantial text overlap with
arXiv:1211.355
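The harmonic-space side of a radial-kernel convolution, as described in the abstract above, can be illustrated with a minimal sketch. For an azimuthally symmetric kernel, convolution reduces to multiplying each harmonic coefficient of degree l by a window value b_l. The example below assumes a Gaussian beam and a purely zonal (m = 0) input expressed as Legendre coefficients; the function names `gaussian_beam_window` and `smooth_zonal` are hypothetical and are not taken from the paper or its code.

```python
import numpy as np


def gaussian_beam_window(lmax, fwhm_rad):
    """Window b_l = exp(-l(l+1) sigma^2 / 2) of a Gaussian beam (assumed kernel)."""
    # Convert the full width at half maximum to the Gaussian sigma.
    sigma = fwhm_rad / np.sqrt(8.0 * np.log(2.0))
    ell = np.arange(lmax + 1)
    return np.exp(-0.5 * ell * (ell + 1) * sigma**2)


def smooth_zonal(coeffs, window):
    """Convolve a zonal function with a radial kernel in harmonic space.

    coeffs[l] is the Legendre coefficient of degree l; the convolution is a
    per-degree multiplication by the kernel window.
    """
    return np.asarray(coeffs, dtype=float) * window


# Usage: smooth a flat spectrum with a 1-degree beam.
window = gaussian_beam_window(lmax=10, fwhm_rad=np.radians(1.0))
smoothed = smooth_zonal(np.ones(11), window)
```

The monopole (l = 0) is left unchanged because b_0 = 1, while higher degrees are progressively damped. A real-space treatment of the compact part of the kernel, as the abstract describes, is not shown here.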
Spherical harmonic transform with GPUs
We describe an algorithm for computing an inverse spherical harmonic
transform suitable for graphics processing units (GPUs). We use CUDA and base our
implementation on a Fortran90 routine included in a publicly available parallel
package, S2HAT. We focus our attention on the two major sequential steps
involved in the transform computation, retaining the efficient parallel
framework of the original code. We detail optimization techniques used to
enhance the performance of the CUDA-based code and contrast them with those
implemented in the Fortran90 version. We also present performance comparisons
of a single CPU plus GPU unit with the S2HAT code running on either one or
four processors. In particular we find that use of the latest generation of GPUs,
such as NVIDIA GF100 (Fermi), can accelerate the spherical harmonic transforms
by as much as 18 times with respect to S2HAT executed on one core, and by as
much as 5.5 times with respect to S2HAT on four cores, with the overall performance
being limited by the fast Fourier transforms. The work presented here has been
performed in the context of the Cosmic Microwave Background simulations and
analysis. However, we expect that the developed software will be of more
general interest and applicability.
Efficient Spherical Harmonic Transforms aimed at pseudo-spectral numerical simulations
In this paper, we report on very efficient algorithms for the spherical
harmonic transform (SHT). Explicitly vectorized variations of the algorithm
based on the Gauss-Legendre quadrature are discussed and implemented in the
SHTns library which includes scalar and vector transforms. The main
breakthrough is to achieve very efficient on-the-fly computations of the
Legendre associated functions, even for very high resolutions, by taking
advantage of the specific properties of the SHT and the advanced capabilities
of current and future computers. This allows us to simultaneously and
significantly reduce memory usage and computation time of the SHT. We measure
the performance and accuracy of our algorithms. Even though the complexity of
the algorithms implemented in SHTns is O(N^3) (where N is the maximum
harmonic degree of the transform), they perform much better than any third
party implementation, including lower complexity algorithms, even for
truncations as high as N=1023. SHTns is available at
https://bitbucket.org/nschaeff/shtns as open source software.
Comment: 8 page
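The combination of Gauss-Legendre quadrature with on-the-fly Legendre evaluation mentioned in the abstract above can be sketched for the simplest case. The example below is an illustration only: it handles zonal (m = 0) functions with ordinary Legendre polynomials, whereas a full transform needs the associated Legendre functions for every order m. It does not use or reproduce the SHTns API; `legendre_on_the_fly` and `zonal_analysis` are hypothetical names.

```python
import numpy as np


def legendre_on_the_fly(lmax, x):
    """Yield P_0(x), ..., P_lmax(x) using the three-term recurrence.

    Only the two most recent polynomials are kept, so no (lmax x nodes)
    table is stored.
    """
    p_prev = np.ones_like(x)
    yield p_prev
    if lmax == 0:
        return
    p = x.copy()
    yield p
    for l in range(1, lmax):
        # (l + 1) P_{l+1} = (2l + 1) x P_l - l P_{l-1}
        p_prev, p = p, ((2 * l + 1) * x * p - l * p_prev) / (l + 1)
        yield p


def zonal_analysis(f, lmax):
    """Legendre coefficients c_l of f(x), x = cos(theta), for l = 0..lmax."""
    # lmax + 1 Gauss-Legendre nodes integrate polynomials up to degree
    # 2 * lmax + 1 exactly, enough for band-limited f times P_l.
    x, w = np.polynomial.legendre.leggauss(lmax + 1)
    fx = f(x)
    return np.array([
        0.5 * (2 * l + 1) * np.sum(w * fx * p)
        for l, p in enumerate(legendre_on_the_fly(lmax, x))
    ])


# Usage: 3 x^2 = P_0 + 2 P_2, so only degrees 0 and 2 are non-zero.
coeffs = zonal_analysis(lambda x: 3.0 * x**2, lmax=4)
```

Regenerating the polynomials inside the loop trades a small amount of arithmetic for much lower memory traffic, which is the trade-off the abstract attributes to on-the-fly computation.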
Parallel Spherical Harmonic Transforms on heterogeneous architectures (GPUs/multi-core CPUs)
Spherical Harmonic Transforms (SHT) are at the heart of many scientific and
practical applications ranging from climate modelling to cosmological
observations. In many of these areas new, cutting-edge science goals have been
recently proposed requiring simulations and analyses of experimental or
observational data at very high resolutions and of unprecedented volumes. Both
these aspects pose a formidable challenge for the currently existing
implementations of the transforms.
This paper describes parallel algorithms for computing SHT with two variants
of intra-node parallelism appropriate for novel supercomputer architectures,
multi-core processors and Graphics Processing Units (GPUs). It also discusses
their performance, alone and embedded within a top-level, MPI-based
parallelisation layer ported from the S2HAT library, in terms of their
accuracy, overall efficiency and scalability. We show that our inverse SHT run
on GeForce 400 Series GPUs equipped with the latest CUDA architecture ("Fermi")
outperforms the state-of-the-art implementation for a multi-core processor
executed on a current Intel Core i7-2600K. Furthermore, we show that an
MPI/CUDA version of the inverse transform run on a cluster of 128 Nvidia Tesla
S1070 is as much as 3 times faster than the hybrid MPI/OpenMP version executed
on the same number of quad-core Intel Nehalem processors for problem sizes
motivated by our target applications. Performance of the direct transforms is
however found to be at best comparable in these cases. We discuss in detail
the algorithmic solutions devised for major steps involved in the transforms
calculation, emphasising those with a major impact on their overall
performance, and elucidate the sources of the dichotomy between the direct and
the inverse operations.
Spherical Fourier Neural Operators: Learning Stable Dynamics on the Sphere
Fourier Neural Operators (FNOs) have proven to be an efficient and effective
method for resolution-independent operator learning in a broad variety of
application areas across scientific machine learning. A key reason for their
success is their ability to accurately model long-range dependencies in
spatio-temporal data by learning global convolutions in a computationally
efficient manner. To this end, FNOs rely on the discrete Fourier transform
(DFT); however, DFTs cause visual and spectral artifacts as well as pronounced
dissipation when learning operators in spherical coordinates since they
incorrectly assume a flat geometry. To overcome this limitation, we generalize
FNOs on the sphere, introducing Spherical FNOs (SFNOs) for learning operators
on spherical geometries. We apply SFNOs to forecasting atmospheric dynamics,
and demonstrate stable autoregressive rollouts for a year of simulated time
(1,460 steps), while retaining physically plausible dynamics. The SFNO has
important implications for machine learning-based simulation of climate
dynamics that could eventually help accelerate our response to climate change.
Robust Object Classification Approach using Spherical Harmonics
Point clouds produced by either 3D scanners or multi-view images are often imperfect and contain noise or outliers. This paper presents an end-to-end robust spherical harmonics approach to classifying 3D objects. The proposed framework first uses the voxel grid of concentric spheres to learn features over the unit ball. We then limit the spherical harmonics order level to suppress the effect of noise and outliers. In addition, the entire classification operation is performed in the Fourier domain. As a result, our proposed model learns features that are less sensitive to data perturbations and corruptions. We tested our proposed model against several types of data perturbations and corruptions, such as noise and outliers. Our results show that the proposed model has fewer parameters, competes with state-of-the-art networks in terms of robustness to data inaccuracies, and is faster than other robust methods. Our implementation code is also publicly available.
Scalable and equivariant spherical CNNs by discrete-continuous (DISCO) convolutions
No existing spherical convolutional neural network (CNN) framework is both
computationally scalable and rotationally equivariant. Continuous approaches
capture rotational equivariance but are often prohibitively computationally demanding. Discrete approaches offer more favorable computational performance
but at the cost of equivariance. We develop a hybrid discrete-continuous (DISCO)
group convolution that is simultaneously equivariant and computationally scalable
to high-resolution. While our framework can be applied to any compact group, we
specialize to the sphere. Our DISCO spherical convolutions exhibit SO(3) rotational equivariance, where SO(n) is the special orthogonal group representing
rotations in n-dimensions. When restricting rotations of the convolution to the
quotient space SO(3)/SO(2) for further computational enhancements, we recover
a form of asymptotic SO(3) rotational equivariance. Through a sparse tensor implementation we achieve linear scaling in number of pixels on the sphere for both
computational cost and memory usage. For 4k spherical images we realize a saving of 10^9
in computational cost and 10^4 in memory usage when compared to the
most efficient alternative equivariant spherical convolution. We apply the DISCO
spherical CNN framework to a number of benchmark dense-prediction problems
on the sphere, such as semantic segmentation and depth estimation, on all of which
we achieve state-of-the-art performance.