Surface Reconstruction from Scattered Point via RBF Interpolation on GPU
In this paper we describe a parallel implicit method based on radial basis
functions (RBF) for surface reconstruction. The applicability of RBF methods is
hindered by their computational demand, which requires the solution of linear
systems of size equal to the number of data points. Our reconstruction
implementation relies on parallel scientific libraries and targets
massively multi-core architectures, namely Graphics Processing Units (GPUs). The
performance of the proposed method in terms of reconstruction accuracy
and computing time shows that the RBF interpolant can be very effective for
such problems.
Comment: arXiv admin note: text overlap with arXiv:0909.5413 by other author
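To illustrate why the cost grows with the number of data points, here is a minimal sketch that fits a Gaussian RBF interpolant to scattered 2-D samples by solving the dense N×N system Aλ = f. The Gaussian kernel, the shape parameter `eps`, and the helper names are assumptions for illustration only. The paper's implicit surface formulation (off-surface constraints, GPU library solvers) is not reproduced here.

```python
import math


def gaussian(r, eps=1.0):
    # Gaussian radial basis function phi(r) = exp(-(eps * r)^2)
    return math.exp(-(eps * r) ** 2)


def solve(A, b):
    # Dense Gaussian elimination with partial pivoting: O(N^3) time, O(N^2) memory
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def rbf_fit(points, values, eps=1.0):
    # Interpolation matrix A_ij = phi(|x_i - x_j|): one row and column per data point
    A = [[gaussian(math.dist(p, q), eps) for q in points] for p in points]
    return solve(A, values)


def rbf_eval(x, points, weights, eps=1.0):
    # s(x) = sum_j lambda_j * phi(|x - x_j|)
    return sum(w * gaussian(math.dist(x, p), eps) for w, p in zip(weights, points))


# Example: interpolate f(x, y) = x + y from five scattered samples
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
vals = [x + y for x, y in pts]
lam = rbf_fit(pts, vals, eps=1.5)
```

The dense solve is the step whose cost scales with the number of points, which is what motivates moving the linear algebra onto GPUs.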
Optimal, scalable forward models for computing gravity anomalies
We describe three approaches for computing a gravity signal from a density
anomaly. The first approach consists of the classical "summation" technique,
whilst the remaining two methods solve the Poisson problem for the
gravitational potential using either a Finite Element (FE) discretization
employing a multilevel preconditioner, or a Green's function evaluated with the
Fast Multipole Method (FMM). The methods utilizing the PDE formulation
described here differ from previously published approaches used in gravity
modeling in that they are optimal, implying that both the memory and
computational time required scale linearly with respect to the number of
unknowns in the potential field. Additionally, all of the implementations
presented here are developed such that the computations can be performed in a
massively parallel, distributed memory computing environment. Through numerical
experiments, we compare the methods on the basis of their discretization error,
CPU time and parallel scalability. We demonstrate the parallel scalability of
all these techniques by running forward models with up to $10^8$ voxels on
1000s of cores.
Comment: 38 pages, 13 figures; accepted by Geophysical Journal International
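The classical summation technique can be sketched as a direct sum over voxels. To keep the example short, each voxel is approximated as a point mass at its centre. This is a simplifying assumption: prism formulas are more accurate near the anomaly. The cost is O(N·M) for N voxels and M stations, which is the scaling the optimal FE and FMM approaches avoid. Function and argument names are illustrative.

```python
import math

G = 6.674e-11  # gravitational constant [m^3 kg^-1 s^-2]


def gz_summation(stations, voxels):
    """Vertical gravity anomaly at each station by direct summation over voxels.

    stations: list of (x, y, z) observation points [m], z positive upward
    voxels:   list of ((x, y, z), delta_rho, volume), with density contrast
              delta_rho [kg/m^3] and volume [m^3]; each voxel is approximated
              as a point mass at its centre (simplifying assumption)
    Returns g_z [m/s^2] per station, positive downward (towards mass below).
    Cost is O(len(stations) * len(voxels)).
    """
    result = []
    for sx, sy, sz in stations:
        gz = 0.0
        for (vx, vy, vz), drho, vol in voxels:
            dx, dy, dz = sx - vx, sy - vy, sz - vz
            r = math.sqrt(dx * dx + dy * dy + dz * dz)
            # Vertical component of the attraction G * m / r^2 of mass m = drho * vol
            gz += G * drho * vol * dz / r ** 3
        result.append(gz)
    return result
```

For example, `gz_summation([(0.0, 0.0, 0.0)], [((0.0, 0.0, -10.0), 1000.0, 1.0)])` gives the point-mass value G·1000/10².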
Optimal, scalable forward models for computing gravity anomalies
We describe three approaches for computing a gravity signal from a density anomaly. The first approach consists of the classical ‘summation’ technique, while the remaining two methods solve the Poisson problem for the gravitational potential using either a finite-element (FE) discretization employing a multilevel preconditioner, or a Green's function evaluated with the fast multipole method (FMM). The methods using the Poisson formulation described here differ from previously published approaches used in gravity modelling in that they are optimal, implying that both the memory and computational time required scale linearly with respect to the number of unknowns in the potential field. Additionally, all of the implementations presented here are developed such that the computations can be performed in a massively parallel, distributed-memory computing environment. Through numerical experiments, we compare the methods on the basis of their discretization error, CPU time and parallel scalability. We demonstrate the parallel scalability of all these techniques by running forward models with up to $10^8$ voxels on 1000s of cores.
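For reference, the Poisson problem solved by the two PDE-based methods and its free-space Green's-function solution (the integral the FMM evaluates) are given below. The sign convention is the common one in which the potential is negative and g = -∇φ. The paper may use a different convention.

```latex
\nabla^2 \phi(\mathbf{x}) = 4\pi G\,\rho(\mathbf{x}), \qquad
\mathbf{g}(\mathbf{x}) = -\nabla \phi(\mathbf{x}), \qquad
\phi(\mathbf{x}) = -G \int_{\Omega} \frac{\rho(\mathbf{y})}{\lvert \mathbf{x} - \mathbf{y} \rvert}\, d\mathbf{y}
```

The integral form follows from $\nabla^2 (1/\lvert\mathbf{x}\rvert) = -4\pi\,\delta(\mathbf{x})$.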
Algorithmic patterns for $\mathcal{H}$-matrices on many-core processors
In this work, we consider the reformulation of hierarchical ($\mathcal{H}$)
matrix algorithms for many-core processors with a model implementation on
graphics processing units (GPUs). $\mathcal{H}$-matrices approximate specific
dense matrices, e.g., from discretized integral equations or kernel ridge
regression, leading to log-linear time complexity in dense matrix-vector
products. The parallelization of $\mathcal{H}$-matrix operations on many-core
processors is difficult due to the complex nature of the underlying algorithms.
While previous algorithmic advances for many-core hardware focused on
accelerating existing $\mathcal{H}$-matrix CPU implementations with many-core
processors, here we aim to rely entirely on that processor type. As our main
contribution, we introduce the parallel algorithmic patterns needed
to map the full $\mathcal{H}$-matrix construction and the fast matrix-vector
product to many-core hardware. Crucial ingredients here are space-filling
curves, parallel tree traversal and batching of linear algebra operations. The
resulting model GPU implementation, hmglib, is, to the best of the authors'
knowledge, the first entirely GPU-based open-source $\mathcal{H}$-matrix library of
this kind. We conclude this work with an in-depth performance analysis and a
comparative performance study against a standard $\mathcal{H}$-matrix library,
highlighting significant speedups of our many-core parallel approach.
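Space-filling curves are listed above as a crucial ingredient. A common choice for ordering points before building a cluster tree is the Morton (Z-order) curve. The abstract does not say whether hmglib uses exactly this curve or bit layout, so treat the following as an illustrative sketch. Each point in the unit square is quantized to a grid, and the bits of its integer coordinates are interleaved into one key. Sorting by that key places spatially nearby points at nearby indices, so contiguous index ranges can serve as tree clusters.

```python
def spread_bits(v, bits):
    # Insert a zero bit between each of the low `bits` bits of v: b2 b1 b0 -> b2 0 b1 0 b0
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (2 * i)
    return out


def morton_key(x, y, bits=10):
    # Quantize (x, y) in [0, 1]^2 to a 2^bits x 2^bits grid, then interleave:
    # x bits go to even positions, y bits to odd positions
    scale = (1 << bits) - 1
    ix = min(int(x * scale), scale)
    iy = min(int(y * scale), scale)
    return spread_bits(ix, bits) | (spread_bits(iy, bits) << 1)


def morton_order(points, bits=10):
    # Permutation that sorts the points along the Z-order curve
    return sorted(range(len(points)), key=lambda i: morton_key(*points[i], bits=bits))
```

On a GPU, the key computation is embarrassingly parallel and the sort would typically be a parallel radix sort. The sequential `sorted` here only stands in for that step.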
Block preconditioners for linear systems arising from multiscale collocation with compactly supported RBFs
Symmetric collocation methods with radial basis functions allow
approximation of the solution of a partial differential equation, even if the
right-hand side is only known at scattered data points, without needing to
generate a grid. However, the benefit of a guaranteed symmetric positive
definite block system comes at a high computational cost. This cost can be
alleviated somewhat by considering compactly supported radial basis functions
and a multiscale technique. Even so, the condition number and sparsity still
deteriorate as the number of data points increases. Therefore, we study certain block
diagonal and triangular preconditioners. We investigate ideal preconditioners
and determine the spectra of the preconditioned matrices before proposing
more practical preconditioners based on a restricted additive Schwarz method
with coarse grid correction (ARASM). Numerical results verify the
effectiveness of the preconditioners.
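The sparsity trade-off described above can be seen with a compactly supported kernel. This sketch uses Wendland's C² function φ(r) = (1 − r)⁴₊(4r + 1), scaled by a support radius δ, and stores only the nonzero entries of the symmetric kernel matrix in a dictionary. The specific kernel, δ, and names are assumptions, not necessarily the paper's choices. In a multiscale scheme, δ would shrink on finer levels to keep each level sparse.

```python
import math


def wendland_c2(r):
    # Wendland's C^2 function, positive definite in up to 3 dimensions; zero for r >= 1
    return (1.0 - r) ** 4 * (4.0 * r + 1.0) if r < 1.0 else 0.0


def sparse_kernel_matrix(points, delta):
    # Entries A_ij = phi(|x_i - x_j| / delta); only pairs closer than delta are stored.
    # O(N^2) pair loop for clarity; a real code would use a spatial search structure.
    A = {}
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            v = wendland_c2(math.dist(p, q) / delta)
            if v != 0.0:
                A[(i, j)] = v
    return A


# Ten points with spacing 0.1 and support 0.25: each row couples to at most 5 points
A = sparse_kernel_matrix([(0.1 * k, 0.0) for k in range(10)], delta=0.25)
```

Enlarging δ relative to the point spacing improves approximation quality but fills in more entries. This is one reason the number of data points affects both conditioning and sparsity.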
PARALLEL MESHLESS RADIAL BASIS FUNCTION COLLOCATION METHOD FOR NEUTRON DIFFUSION PROBLEMS
The meshless global radial basis function (RBF) collocation method is widely used to model physical phenomena in science and engineering. The method produces highly accurate solutions with an exponential convergence rate. However, because of the method's global approximation structure, dense node distributions lead to long computation times and limit the applicability of the technique. To overcome this issue, this study proposes a parallel meshless global RBF collocation algorithm. The algorithm is applied to 2-D neutron diffusion problems, with the multiquadric as the RBF. The algorithm is developed in Mathematica, and the calculations use eight virtual processors on a multicore computer with four physical cores. The method provides accurate numerical results in a stable manner. Parallel speedup increases with the number of processors up to five processors for external source problems and up to seven for fission source problems. The speedup values are limited by the constrained sharing of the multicore computer's memory. Nevertheless, parallel computation yields significant time savings: for the four-group fission source problem with 4316 interpolation nodes, using seven processors instead of sequential computation reduces the computation time of the meshless approach by 716 s.
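To make the global structure concrete, here is a minimal sketch of unsymmetric (Kansa-type) multiquadric collocation for a 1-D model problem, u''(x) = f(x) on [0, 1] with u(0) = u(1) = 0. It is not the paper's 2-D multigroup neutron diffusion solver. Every node couples to every other node, so the collocation matrix is dense, and building and solving it is the work a parallel implementation would distribute. The shape parameter `c`, the node count, and the test problem are assumptions.

```python
import math


def mq(x, xj, c):
    # Multiquadric RBF phi(x) = sqrt((x - xj)^2 + c^2)
    return math.sqrt((x - xj) ** 2 + c * c)


def mq_xx(x, xj, c):
    # Its second derivative: c^2 / ((x - xj)^2 + c^2)^(3/2)
    return c * c / ((x - xj) ** 2 + c * c) ** 1.5


def solve(A, b):
    # Dense Gaussian elimination with partial pivoting (the global O(N^3) step)
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def kansa_1d(f, n=21, c=0.15):
    # Collocate u'' = f at interior nodes and u = 0 at x = 0 and x = 1
    nodes = [i / (n - 1) for i in range(n)]
    A, rhs = [], []
    for i, x in enumerate(nodes):
        if i in (0, n - 1):
            A.append([mq(x, xj, c) for xj in nodes])  # boundary row: u(x) = 0
            rhs.append(0.0)
        else:
            A.append([mq_xx(x, xj, c) for xj in nodes])  # interior row: u''(x) = f(x)
            rhs.append(f(x))
    lam = solve(A, rhs)
    # Global approximation u(x) = sum_j lam_j * phi(x; x_j)
    return lambda x: sum(l * mq(x, xj, c) for l, xj in zip(lam, nodes))


# Model problem with exact solution u(x) = sin(pi x)
u = kansa_1d(lambda x: -math.pi ** 2 * math.sin(math.pi * x))
```

Each matrix row can be assembled independently, which makes row-wise assembly a natural first target for parallelism. The dense solve remains the costly global step.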
FMM-based vortex method for simulation of isotropic turbulence on GPUs, compared with a spectral method
The Lagrangian vortex method offers an alternative numerical approach for
direct numerical simulation (DNS) of turbulence. Because it uses the fast
multipole method (FMM), a hierarchical algorithm for N-body problems with
highly scalable parallel implementations, as its numerical engine, it is a
potentially good candidate for exascale systems. However, there have been few
validation studies of Lagrangian vortex simulations, and the insufficient
comparisons against standard DNS codes have left ample room for skepticism. This
paper presents a comparison between a Lagrangian vortex method and a
pseudo-spectral method for the simulation of decaying homogeneous isotropic
turbulence. This flow was chosen, despite not being the most
favorable problem for particle methods (which shine in wake flows or where
vorticity is compact), because it is ideal for the quantitative
validation of DNS codes. We use a $256^3$ grid with $Re_\lambda = 50$ and 100 and look
at the turbulence statistics, including high-order moments. The focus is on the
effect of the various parameters in the vortex method, e.g., order of the FMM
series expansion, frequency of reinitialization, overlap ratio and time step.
The vortex method uses an FMM code (exaFMM) that runs on GPU hardware using
CUDA, while the spectral code (hit3d) runs on CPU only. Results indicate that,
for this application (and with the current code implementations), the spectral
method is an order of magnitude faster than the vortex method when using a
single GPU for the FMM and six CPU cores for the FFT.
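In a vortex method, the FMM accelerates the Biot-Savart sum that recovers the velocity from the vortex particles. Below is the direct O(N²) version that an FMM approximates. It uses a simple algebraic smoothing of the singular kernel with core radius `sigma`. That smoothing is an assumption: exaFMM and the paper's method may use a different regularization, such as a Gaussian core.

```python
import math


def cross(a, b):
    # Vector cross product a x b for 3-tuples
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def biot_savart_direct(targets, positions, strengths, sigma=1e-3):
    """Direct O(N*M) regularized Biot-Savart sum:

    u(x) = (1 / 4 pi) * sum_j alpha_j x (x - x_j) / (|x - x_j|^2 + sigma^2)^(3/2)

    positions: particle locations x_j; strengths: vector strengths alpha_j.
    With sigma = 0 the kernel is singular when a target coincides with a particle.
    """
    out = []
    for x in targets:
        u = [0.0, 0.0, 0.0]
        for xj, aj in zip(positions, strengths):
            d = (x[0] - xj[0], x[1] - xj[1], x[2] - xj[2])
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + sigma ** 2
            w = cross(aj, d)
            k = 1.0 / (4.0 * math.pi * r2 ** 1.5)
            for m in range(3):
                u[m] += k * w[m]
        out.append(tuple(u))
    return out
```

A particle at the origin with strength along +z induces a +y velocity at (1, 0, 0), the expected counterclockwise swirl. The FMM replaces the inner loop over all particles with multipole and local expansions, whose order is one of the parameters studied above.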
Pole assignment control design for time-varying time-delay systems using radial basis functions
Systems with time-varying time delays present a particularly challenging control problem. They have been observed across a wide array of domains, from hydraulic actuators to insulin delivery control systems. Control systems that address time delays, nonlinearities and uncertainty are the subject of much research but, whilst the specific concept of varying time delays is sometimes acknowledged (for example in the control of hydraulic manipulators), it appears to be less widely investigated than some other types of nonlinearity. Motivated in part by recent research into internal multi-model control, as similarly applied to systems with unknown time-varying delays, the present work utilises a Gaussian radial basis function to switch between two or more partial controllers. Each partial controller is based on a linear model with a (time-invariant) time delay. The new algorithm is developed and evaluated via simulation using a non-minimal state space (NMSS) framework, with pole assignment as the design criterion. Simulation results suggest that it yields improved performance in comparison to a simpler switching approach and the equivalent linear control system. However, laboratory examples and further research into robustness and stability are required as the next step.
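One plausible reading of using a Gaussian radial basis function to switch between partial controllers is a normalized Gaussian weighting of the partial controllers' outputs. Each output would be weighted by how close the current delay estimate is to that controller's design delay. The sketch below shows only this blending step. The function names, the width `sigma`, and the availability of a delay estimate are hypothetical. The NMSS pole-assignment design of each partial controller is not shown.

```python
import math


def rbf_weights(tau_est, design_delays, sigma):
    # Gaussian RBF membership of the delay estimate in each partial controller's region
    expo = [-((tau_est - d) ** 2) / (2.0 * sigma ** 2) for d in design_delays]
    top = max(expo)  # shift exponents so the sum cannot underflow to zero
    raw = [math.exp(e - top) for e in expo]
    total = sum(raw)
    # Normalize so the weights form a convex combination
    return [r / total for r in raw]


def blended_control(tau_est, design_delays, controller_outputs, sigma=0.5):
    # u = sum_i w_i(tau_est) * u_i, with u_i the i-th partial controller's output
    w = rbf_weights(tau_est, design_delays, sigma)
    return sum(wi * ui for wi, ui in zip(w, controller_outputs))
```

A small `sigma` approaches hard switching, where the nearest design delay wins outright. A larger `sigma` blends the partial controllers more smoothly as the delay varies.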