Recent progress and challenges in exploiting graphics processors in computational fluid dynamics
The progress made in accelerating simulations of fluid flow using GPUs, and
the challenges that remain, are surveyed. The review first provides an
introduction to GPU computing and programming, and discusses various
considerations for improved performance. Case studies comparing the performance
of CPU- and GPU- based solvers for the Laplace and incompressible Navier-Stokes
equations are performed in order to demonstrate the potential improvement even
with simple codes. Recent efforts to accelerate CFD simulations using GPUs are
reviewed for laminar, turbulent, and reactive flow solvers. Also, GPU
implementations of the lattice Boltzmann method are reviewed. Finally,
recommendations for implementing CFD codes on GPUs are given and remaining
challenges are discussed, such as the need to develop new strategies and
redesign algorithms to enable GPU acceleration.
Comment: In press in the Journal of Supercomputing
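As an illustration of the kind of "simple code" the abstract refers to for the Laplace equation, the following is a minimal CPU-side sketch of Jacobi iteration on a 2D grid. It is not taken from the paper: the function names `jacobi_laplace` and `make_grid`, the grid size, and the boundary values are assumptions chosen for the example. The point it shows is that each interior update depends only on values from the previous sweep, which is the property that makes the stencil easy to parallelize.

```python
def jacobi_laplace(grid, iterations):
    """Run a fixed number of Jacobi sweeps on a 2D grid with fixed boundary values."""
    rows, cols = len(grid), len(grid[0])
    current = [row[:] for row in grid]
    for _ in range(iterations):
        # Every interior point is computed from the previous sweep only, so all
        # updates in one sweep are independent (one GPU thread per point would work).
        updated = [row[:] for row in current]
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                updated[i][j] = 0.25 * (current[i - 1][j] + current[i + 1][j]
                                        + current[i][j - 1] + current[i][j + 1])
        current = updated
    return current


def make_grid(n, top_value=1.0):
    """Hypothetical test problem: top edge held at top_value, other edges at 0."""
    grid = [[0.0] * n for _ in range(n)]
    grid[0] = [top_value] * n
    return grid


solution = jacobi_laplace(make_grid(5), 500)
```

On this 5 x 5 test problem the centre value converges to one quarter of the top-edge value, which follows from the symmetry of the square domain.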
Layered Depth-Normal Images: a Sparse Implicit Representation of Solid Models
This paper presents a novel implicit representation of solid models. With
this representation, every solid model can be effectively represented by three
layered depth-normal images (LDNIs) that are perpendicular to three orthogonal
axes respectively. The layered depth-normal images for a solid model whose
boundary is represented by a polygonal mesh can be generated efficiently with
the help of graphics-hardware-accelerated sampling. Based on this implicit
representation, solid modeling operations including Boolean
operations and the offsetting operation have been developed. A contouring
algorithm is also introduced in this paper to generate mesh surfaces that
preserve thin structures and sharp features from the layered depth-normal images.
Comparisons between LDNIs and other implicit representations of solid models are
given at the end of the paper to demonstrate the advantages of LDNIs.
Biomolecular electrostatics using a fast multipole BEM on up to 512 GPUs and a billion unknowns
We present teraflop-scale calculations of biomolecular electrostatics enabled
by the combination of algorithmic and hardware acceleration. The algorithmic
acceleration is achieved with the fast multipole method (FMM) in conjunction
with a boundary element method (BEM) formulation of the continuum electrostatic
model, as well as the BIBEE approximation to BEM. The hardware acceleration is
achieved through graphics processors, GPUs. We demonstrate the power of our
algorithms and software for the calculation of the electrostatic interactions
between biological molecules in solution. The applications demonstrated include
the electrostatics of protein-drug binding and several multi-million-atom
systems consisting of hundreds to thousands of copies of lysozyme molecules.
The parallel scalability of the software was studied in a cluster at the
Nagasaki Advanced Computing Center, using 128 nodes, each with 4 GPUs. Delicate
tuning has resulted in strong scaling with parallel efficiency of 0.8 for 256
and 0.5 for 512 GPUs. The largest application run, with over 20 million atoms
and one billion unknowns, required only one minute on 512 GPUs. We are
currently adapting our BEM software to solve the linearized Poisson-Boltzmann
equation for dilute ionic solutions, and it is also designed to be flexible
enough to be extended for a variety of integral equation problems, ranging from
Poisson problems to Helmholtz problems in electromagnetics and acoustics to
high-Reynolds-number flow.
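The strong-scaling figures quoted in this abstract can be turned into implied speedups with a short calculation. The sketch below assumes the usual definition of parallel efficiency (speedup divided by the number of GPUs, measured against a common baseline); the abstract does not state the baseline, so the absolute speedups are only indicative, and the function name `implied_speedup` is introduced here for illustration.

```python
def implied_speedup(efficiency, num_gpus):
    """Speedup implied by efficiency = speedup / num_gpus (assumed definition)."""
    return efficiency * num_gpus


# Efficiencies reported in the abstract: 0.8 on 256 GPUs, 0.5 on 512 GPUs.
speedup_256 = implied_speedup(0.8, 256)
speedup_512 = implied_speedup(0.5, 512)

# The ratio does not depend on the baseline, as long as both runs share it.
relative_gain = speedup_512 / speedup_256
```

Under that assumption, doubling from 256 to 512 GPUs yields a gain of about 1.25x rather than 2x, which is what an efficiency drop from 0.8 to 0.5 means in practice.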
Haptic Assembly Using Skeletal Densities and Fourier Transforms
Haptic-assisted virtual assembly and prototyping has seen significant
attention over the past two decades. However, in spite of the appealing
prospects, its adoption has been slower than expected. We identify the main
roadblocks as the inherent geometric complexities faced when assembling objects
of arbitrary shape, and the computation time limitation imposed by the
notorious 1 kHz haptic refresh rate. We addressed the first problem in a recent
work by introducing a generic energy model for geometric guidance and
constraints between features of arbitrary shape. In the present work, we
address the second challenge by leveraging Fourier transforms to compute the
constraint forces and torques. Our new concept of 'geometric energy' field is
computed automatically from a cross-correlation of 'skeletal densities' in the
frequency domain, and serves as a generalization of the manually specified
virtual fixtures or heuristically identified mating constraints proposed in the
literature. The formulation of the energy field as a convolution enables
efficient computation using fast Fourier transforms (FFT) on the graphics
processing unit (GPU). We show that our method is effective for low-clearance
assembly of objects of arbitrary geometric and syntactic complexity.
Comment: A shorter version was presented at the ASME Computers and Information in
Engineering Conference (CIE'2015) (Best Paper Award)
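The abstract relies on the fact that a cross-correlation can be evaluated in the frequency domain. The following is a one-dimensional toy sketch of that identity, not the paper's 3D skeletal-density formulation: the signals, the function names, and the naive O(n^2) DFT (used here only to stay within the standard library, where an FFT would be used in practice) are all assumptions made for illustration.

```python
import cmath


def dft(signal, inverse=False):
    """Naive discrete Fourier transform (an FFT computes the same result faster)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(signal[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out


def cross_correlation_direct(f, g):
    """Circular cross-correlation: c[s] = sum over t of f[t] * g[(t + s) % n]."""
    n = len(f)
    return [sum(f[t] * g[(t + s) % n] for t in range(n)) for s in range(n)]


def cross_correlation_fourier(f, g):
    """Same quantity for real signals via IDFT(conj(DFT(f)) * DFT(g))."""
    spectrum_f, spectrum_g = dft(f), dft(g)
    product = [a.conjugate() * b for a, b in zip(spectrum_f, spectrum_g)]
    return [value.real for value in dft(product, inverse=True)]


f = [1, 2, 0, 0]
g = [0, 1, 2, 0]  # f shifted by one sample
direct = cross_correlation_direct(f, g)
via_fourier = cross_correlation_fourier(f, g)
```

Both routes give the same values up to rounding, with the largest entry at shift 1, the offset at which the two signals overlap best.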
A Fast Free-viewpoint Video Synthesis Algorithm for Sports Scenes
In this paper, we report on a parallel free-viewpoint video synthesis
algorithm that can efficiently reconstruct a high-quality 3D scene
representation of sports scenes. The proposed method focuses on a scene that is
captured by multiple synchronized cameras featuring wide-baselines. The
following strategies are introduced to accelerate the production of a
free-viewpoint video taking the improvement of visual quality into account: (1)
a sparse point cloud is reconstructed using a volumetric visual hull approach,
and an exact 3D ROI is found for each object using an efficient connected
components labeling algorithm. Next, the reconstruction of a dense point cloud
is accelerated by implementing visual hull only in the ROIs; (2) an accurate
polyhedral surface mesh is built by estimating the exact intersections between
grid cells and the visual hull; (3) the appearance of the reconstructed
presentation is reproduced in a view-dependent manner that renders
the non-occluded and occluded regions with the nearest camera and its
neighboring cameras, respectively. The production for volleyball and judo sequences
demonstrates the effectiveness of our method in terms of both execution time
and visual quality.
Comment: 7 pages, 11 figures
GPU accelerated fast multipole boundary element method for simulation of 3D bubble dynamics in potential flow
A numerical method for simulation of bubble dynamics in three-dimensional
potential flows is presented. The approach is based on the boundary element
method for the Laplace equation accelerated via the fast multipole method
implemented on a heterogeneous CPU/GPU architecture. For mesh stabilization, a
new smoothing technique using a surface filter is presented. This technique
relies on spherical harmonics expansion of surface functions for bubbles
topologically equivalent to a sphere (or Fourier series for toroidal bubbles).
The method is validated by comparisons with solutions available in the
literature and convergence studies for bubbles in acoustic fields. The accuracy
and performance of the algorithm are discussed. It is demonstrated that the
approach enables simulation of dynamics of bubble clusters with thousands of
bubbles and millions of boundary elements on contemporary personal
workstations. The algorithm is scalable and can be extended to larger systems.
Comment: This paper is intended to be published in some journal and prepared
in the format required by Computational Mechanics
Efficient Tsunami Modeling on Adaptive Grids with Graphics Processing Units (GPUs)
Solving the shallow water equations efficiently is critical to the study of
natural hazards induced by tsunami and storm surge, since it provides more
response time in an early warning system and allows more runs to be done for
probabilistic assessment where thousands of runs may be required. Using
Adaptive Mesh Refinement (AMR) speeds up the process by greatly reducing
computational demands, while accelerating the code using the Graphics
Processing Unit (GPU) does so through using faster hardware. Combining both, we
present an efficient CUDA implementation of GeoClaw, an open source
Godunov-type high-resolution finite volume numerical scheme on adaptive grids
for the shallow water system with varying topography. The use of AMR and spherical
coordinates allows transoceanic tsunamis to be modeled. Numerical
experiments on several realistic tsunami modeling problems illustrate the
correctness and efficiency of the code, which implements a simplified
dimensionally-split version of the algorithms. This implementation is shown to
be accurate and faster than the original when using CPUs alone. The GPU
implementation, when running on a single GPU, is observed to be 3.6 to 6.4
times faster than the original model running in parallel on a 16-core CPU.
Three metrics are proposed to evaluate the relative performance of the model, and
they show efficient usage of hardware resources.
Memory footprint reduction for the FFT-based volume integral equation method via tensor decompositions
We present a method of memory footprint reduction for FFT-based,
electromagnetic (EM) volume integral equation (VIE) formulations. The arising
Green's function tensors have low multilinear rank, which allows Tucker
decomposition to be employed for their compression, thereby greatly reducing
the required memory storage for numerical simulations. Consequently, the
compressed components are able to fit inside a graphical processing unit (GPU)
on which highly parallelized computations can vastly accelerate the iterative
solution of the arising linear system. In addition, the element-wise products
throughout the iterative solver's process require additional flops; thus, we
provide a variety of novel and efficient methods that maintain the linear
complexity of the classic element-wise product with an additional
multiplicative small constant. We demonstrate the utility of our approach via
its application to VIE simulations for the Magnetic Resonance Imaging (MRI) of
a human head. For these simulations we report an order of magnitude
acceleration over standard techniques.
Comment: 11 pages, 10 figures, 5 tables, 2 algorithms, journal
A GPU accelerated Barnes-Hut Tree Code for FLASH4
We present a GPU accelerated CUDA-C implementation of the Barnes Hut (BH)
tree code for calculating the gravitational potential on octree adaptive
meshes. The tree code algorithm is implemented within the FLASH4 adaptive mesh
refinement (AMR) code framework and is therefore fully MPI parallel. We describe
the algorithm and present test results that demonstrate its accuracy and
performance in comparison to the algorithms available in the current FLASH4
version. We use a MacLaurin spheroid to test the accuracy of our new
implementation and use spherical, collapsing cloud cores with effective AMR to
carry out performance tests also in comparison with previous gravity solvers.
Depending on the setup and the GPU/CPU ratio, we find a speedup for the gravity
unit of at least a factor of 3 and up to 60 in comparison to the gravity
solvers implemented in the FLASH4 code. We find an overall speedup factor for
full simulations of at least 1.6 and up to 10.
Comment: For further information see: http://www.hs.uni-hamburg.de/gpub
3-D nonlinear force-free field reconstruction of solar active region 11158 by direct boundary integral equation
A 3-D coronal magnetic field is reconstructed for NOAA 11158 on Feb 14, 2011.
A GPU-accelerated direct boundary integral equation (DBIE) method is
implemented. This is about 1000 times faster than the original DBIE used on
solar NLFFF modeling. With the SDO/HMI vector magnetogram as the bottom
boundary condition, the reconstructed magnetic field lines are compared,
simultaneously and for the first time, with the projected EUV loop structures
seen from the different viewpoints of the SDO/AIA and STEREO A/B spacecraft.
They show very good agreement, so the topological configurations of the magnetic
fields can be analyzed and their role in the flare process of the active region
can be better understood. A quantitative comparison with some stereoscopically
reconstructed coronal loops shows that the present averaged misalignment angles
are of the same order as the state-of-the-art results obtained with
reconstructed coronal loops as prescribed conditions and better than other
NLFFF methods. It is found that the observed coronal loop structures can be
grouped into bundles of closed and open loops with some central bright coronal
loops around the polarity inversion line (PIL). The reconstructed
highly-shearing magnetic field lines agree very well with the low-lying
S-shaped filament channel along PIL. They are in a pivot position to all other
surrounding coronal structures, and a group of electric current lines
co-aligned with the central bright EUV loops overlying the filament channel is
also obtained. This central lower-lying magnetic field loop system must have
played a key role in powering the flare. It should be noted that while a
strand-like coronal feature along PIL may be related to the filament, one
cannot simply attribute all the coronal bright features along PIL to
manifestation of the filament without any stereoscopic information. It
shows that DBIE is rigorous and effective.
Comment: Solar Physics, accepted