6,225 research outputs found

    Recent progress and challenges in exploiting graphics processors in computational fluid dynamics

    The progress made in accelerating simulations of fluid flow using GPUs, and the challenges that remain, are surveyed. The review first provides an introduction to GPU computing and programming, and discusses various considerations for improved performance. Case studies comparing the performance of CPU- and GPU-based solvers for the Laplace and incompressible Navier-Stokes equations are performed in order to demonstrate the potential improvement even with simple codes. Recent efforts to accelerate CFD simulations using GPUs are reviewed for laminar, turbulent, and reactive flow solvers. Also, GPU implementations of the lattice Boltzmann method are reviewed. Finally, recommendations for implementing CFD codes on GPUs are given and remaining challenges are discussed, such as the need to develop new strategies and redesign algorithms to enable GPU acceleration. Comment: In press in the Journal of Supercomputing
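
The Laplace case study mentioned above can be illustrated with a minimal sketch. It assumes a Jacobi iteration on a square grid with one heated edge; the function name `jacobi_laplace`, the grid size, and the boundary values are illustrative choices, not taken from the paper. Each interior point is updated only from the previous sweep, which is why this kind of solver maps naturally onto one GPU thread per grid point.

```python
def jacobi_laplace(n, top=1.0, tol=1e-6, max_iters=10_000):
    """Solve Laplace's equation on an n x n grid with Jacobi iteration.

    Boundary values (an assumption for this sketch): `top` on the first
    row, 0 on the other three edges. Returns (grid, sweeps_used).
    """
    u = [[0.0] * n for _ in range(n)]
    u[0] = [top] * n
    for sweep in range(1, max_iters + 1):
        new = [row[:] for row in u]
        max_change = 0.0
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # Five-point stencil: each point becomes the mean of its
                # four neighbours from the previous sweep.
                new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                    + u[i][j - 1] + u[i][j + 1])
                max_change = max(max_change, abs(new[i][j] - u[i][j]))
        u = new
        if max_change < tol:
            return u, sweep
    return u, max_iters


grid, sweeps = jacobi_laplace(17)
center = grid[8][8]  # by symmetry the converged value here is 0.25
```

Because every update within a sweep is independent, the two inner loops are the part a GPU version would parallelize; the convergence check is a reduction over the grid.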

    Layered Depth-Normal Images: a Sparse Implicit Representation of Solid Models

    This paper presents a novel implicit representation of solid models. With this representation, every solid model can be effectively represented by three layered depth-normal images (LDNIs) that are perpendicular to three orthogonal axes respectively. The layered depth-normal images for a solid model whose boundary is represented by a polygonal mesh can be generated efficiently with the help of graphics-hardware-accelerated sampling. Based on this implicit representation (LDNIs), solid modeling operations including the Boolean operations and the offsetting operation have been developed. A contouring algorithm is also introduced in this paper to generate mesh surfaces from the layered depth-normal images while preserving thin structures and sharp features. Comparisons between LDNIs and other implicit representations of solid models are given at the end of the paper to demonstrate the advantages of LDNIs.

    Biomolecular electrostatics using a fast multipole BEM on up to 512 GPUs and a billion unknowns

    We present teraflop-scale calculations of biomolecular electrostatics enabled by the combination of algorithmic and hardware acceleration. The algorithmic acceleration is achieved with the fast multipole method (FMM) in conjunction with a boundary element method (BEM) formulation of the continuum electrostatic model, as well as the BIBEE approximation to BEM. The hardware acceleration is achieved through graphics processors, GPUs. We demonstrate the power of our algorithms and software for the calculation of the electrostatic interactions between biological molecules in solution. The applications demonstrated include the electrostatics of protein-drug binding and several multi-million atom systems consisting of hundreds to thousands of copies of lysozyme molecules. The parallel scalability of the software was studied in a cluster at the Nagasaki Advanced Computing Center, using 128 nodes, each with 4 GPUs. Delicate tuning has resulted in strong scaling with parallel efficiency of 0.8 for 256 and 0.5 for 512 GPUs. The largest application run, with over 20 million atoms and one billion unknowns, required only one minute on 512 GPUs. We are currently adapting our BEM software to solve the linearized Poisson-Boltzmann equation for dilute ionic solutions, and it is also designed to be flexible enough to be extended for a variety of integral equation problems, ranging from Poisson problems to Helmholtz problems in electromagnetics and acoustics to high Reynolds number flow.
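
The strong-scaling figures quoted above can be turned into implied speedups with a little arithmetic. This is a minimal sketch that assumes efficiency is measured against a single-GPU baseline; the abstract does not state the reference run, so `n_ref=1` and the helper name `implied_speedup` are assumptions.

```python
def implied_speedup(efficiency, n_gpus, n_ref=1):
    """Speedup over the reference run implied by a strong-scaling efficiency.

    Strong-scaling efficiency is speedup / (n_gpus / n_ref), so the
    speedup is efficiency * n_gpus / n_ref. n_ref=1 is an assumption.
    """
    return efficiency * n_gpus / n_ref


s256 = implied_speedup(0.8, 256)   # efficiency quoted for 256 GPUs
s512 = implied_speedup(0.5, 512)   # efficiency quoted for 512 GPUs
gain_from_doubling = s512 / s256   # extra speedup from 256 -> 512 GPUs
```

Under that assumption, doubling the GPU count from 256 to 512 gives a factor of about 1.25 rather than 2, which is what the drop in efficiency from 0.8 to 0.5 expresses.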

    Haptic Assembly Using Skeletal Densities and Fourier Transforms

    Haptic-assisted virtual assembly and prototyping has seen significant attention over the past two decades. However, in spite of the appealing prospects, its adoption has been slower than expected. We identify the main roadblocks as the inherent geometric complexities faced when assembling objects of arbitrary shape, and the computation time limitation imposed by the notorious 1 kHz haptic refresh rate. We addressed the first problem in a recent work by introducing a generic energy model for geometric guidance and constraints between features of arbitrary shape. In the present work, we address the second challenge by leveraging Fourier transforms to compute the constraint forces and torques. Our new concept of 'geometric energy' field is computed automatically from a cross-correlation of 'skeletal densities' in the frequency domain, and serves as a generalization of the manually specified virtual fixtures or heuristically identified mating constraints proposed in the literature. The formulation of the energy field as a convolution enables efficient computation using fast Fourier transforms (FFT) on the graphics processing unit (GPU). We show that our method is effective for low-clearance assembly of objects of arbitrary geometric and syntactic complexity. Comment: A shorter version was presented in ASME Computers and Information in Engineering Conference (CIE'2015) (Best Paper Award)
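
The frequency-domain cross-correlation that the abstract relies on can be sketched in one dimension. This is only an illustration of the correlation theorem for real-valued signals, not the paper's energy formulation: a naive O(n^2) DFT stands in for an FFT, and the signals `f` and `g` are made-up toy data.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out


def circular_xcorr_direct(f, g):
    """c[s] = sum_t f[t] * g[(t + s) mod n], computed term by term."""
    n = len(f)
    return [sum(f[t] * g[(t + s) % n] for t in range(n)) for s in range(n)]


def circular_xcorr_fourier(f, g):
    """Same correlation via the transform: C[k] = conj(F[k]) * G[k]."""
    F, G = dft(f), dft(g)
    product = [a.conjugate() * b for a, b in zip(F, G)]
    return [v.real for v in dft(product, inverse=True)]


f = [1.0, 2.0, 0.0, 0.0]
g = [0.0, 1.0, 2.0, 0.0]          # f shifted by one sample
direct = circular_xcorr_direct(f, g)
fourier = circular_xcorr_fourier(f, g)
best_shift = max(range(len(direct)), key=lambda s: direct[s])
```

Both routes give the same values, and the peak sits at the shift that aligns the two signals. With a true FFT the transform route costs O(n log n) instead of O(n^2), which is the property that makes a tight refresh budget reachable.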

    A Fast Free-viewpoint Video Synthesis Algorithm for Sports Scenes

    In this paper, we report on a parallel free-viewpoint video synthesis algorithm that can efficiently reconstruct a high-quality 3D scene representation of sports scenes. The proposed method focuses on a scene that is captured by multiple synchronized cameras featuring wide-baselines. The following strategies are introduced to accelerate the production of a free-viewpoint video taking the improvement of visual quality into account: (1) a sparse point cloud is reconstructed using a volumetric visual hull approach, and an exact 3D ROI is found for each object using an efficient connected components labeling algorithm. Next, the reconstruction of a dense point cloud is accelerated by implementing visual hull only in the ROIs; (2) an accurate polyhedral surface mesh is built by estimating the exact intersections between grid cells and the visual hull; (3) the appearance of the reconstructed presentation is reproduced in a view-dependent manner that respectively renders the non-occluded and occluded region with the nearest camera and its neighboring cameras. The production for volleyball and judo sequences demonstrates the effectiveness of our method in terms of both execution time and visual quality. Comment: 7 pages, 11 figures
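
Step (1) above, finding one 3D ROI per object by connected components labeling, can be sketched as follows. This is a minimal sequential version under stated assumptions: occupied voxels are given as a set of integer coordinates, 6-connectivity is used, and the ROI is an axis-aligned bounding box. The function name `label_rois` and the sample voxels are hypothetical, and the paper's own labeling algorithm may differ.

```python
from collections import deque


def label_rois(occupied):
    """Return one bounding box (lo, hi) per 6-connected voxel component.

    `occupied` is a set of (x, y, z) integer voxel coordinates.
    """
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    seen = set()
    boxes = []
    for start in sorted(occupied):
        if start in seen:
            continue
        # Breadth-first flood fill from an unvisited voxel.
        seen.add(start)
        queue = deque([start])
        lo, hi = list(start), list(start)
        while queue:
            v = queue.popleft()
            for axis in range(3):
                lo[axis] = min(lo[axis], v[axis])
                hi[axis] = max(hi[axis], v[axis])
            for d in neighbours:
                nb = (v[0] + d[0], v[1] + d[1], v[2] + d[2])
                if nb in occupied and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        boxes.append((tuple(lo), tuple(hi)))
    return boxes


voxels = {(0, 0, 0), (1, 0, 0), (1, 1, 0), (5, 5, 5), (5, 5, 6)}
rois = label_rois(voxels)
```

Restricting the dense reconstruction to these boxes is what saves work: voxels outside every ROI are never revisited at the finer resolution.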

    GPU accelerated fast multipole boundary element method for simulation of 3D bubble dynamics in potential flow

    A numerical method for simulation of bubble dynamics in three-dimensional potential flows is presented. The approach is based on the boundary element method for the Laplace equation accelerated via the fast multipole method implemented on a heterogeneous CPU/GPU architecture. For mesh stabilization, a new smoothing technique using a surface filter is presented. This technique relies on spherical harmonics expansion of surface functions for bubbles topologically equivalent to a sphere (or Fourier series for toroidal bubbles). The method is validated by comparisons with solutions available in the literature and convergence studies for bubbles in acoustic fields. The accuracy and performance of the algorithm are discussed. It is demonstrated that the approach enables simulation of dynamics of bubble clusters with thousands of bubbles and millions of boundary elements on contemporary personal workstations. The algorithm is scalable and can be extended to larger systems. Comment: This paper is intended to be published in some journal and prepared in the format required by Computational Mechanics

    Efficient Tsunami Modeling on Adaptive Grids with Graphics Processing Units (GPUs)

    Solving the shallow water equations efficiently is critical to the study of natural hazards induced by tsunami and storm surge, since it provides more response time in an early warning system and allows more runs to be done for probabilistic assessment where thousands of runs may be required. Using Adaptive Mesh Refinement (AMR) speeds up the process by greatly reducing computational demands, while accelerating the code using the Graphics Processing Unit (GPU) does so through using faster hardware. Combining both, we present an efficient CUDA implementation of GeoClaw, an open source Godunov-type high-resolution finite volume numerical scheme on adaptive grids for the shallow water system with varying topography. The use of AMR and spherical coordinates allows transoceanic tsunami simulation. Numerical experiments on several realistic tsunami modeling problems illustrate the correctness and efficiency of the code, which implements a simplified dimensionally-split version of the algorithms. This implementation is shown to be accurate and faster than the original when using CPUs alone. The GPU implementation, when running on a single GPU, is observed to be 3.6 to 6.4 times faster than the original model running in parallel on a 16-core CPU. Three metrics are proposed to evaluate relative performance of the model, which shows efficient usage of hardware resources.

    Memory footprint reduction for the FFT-based volume integral equation method via tensor decompositions

    We present a method of memory footprint reduction for FFT-based, electromagnetic (EM) volume integral equation (VIE) formulations. The arising Green's function tensors have low multilinear rank, which allows Tucker decomposition to be employed for their compression, thereby greatly reducing the required memory storage for numerical simulations. Consequently, the compressed components are able to fit inside a graphical processing unit (GPU) on which highly parallelized computations can vastly accelerate the iterative solution of the arising linear system. In addition, the element-wise products throughout the iterative solver's process require additional flops; we therefore provide a variety of novel and efficient methods that maintain the linear complexity of the classic element-wise product with an additional multiplicative small constant. We demonstrate the utility of our approach via its application to VIE simulations for the Magnetic Resonance Imaging (MRI) of a human head. For these simulations we report an order of magnitude acceleration over standard techniques. Comment: 11 pages, 10 figures, 5 tables, 2 algorithms, journal
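
The memory saving from a Tucker decomposition follows from a simple count of stored entries: a core tensor of size r1 x r2 x r3 plus one n_i x r_i factor matrix per mode, instead of the full n1 x n2 x n3 tensor. The sketch below works that count through; the grid size and ranks are hypothetical values chosen for illustration, not figures from the paper.

```python
def full_storage(shape):
    """Number of entries in the uncompressed tensor."""
    total = 1
    for n in shape:
        total *= n
    return total


def tucker_storage(shape, ranks):
    """Entries stored by a Tucker decomposition: core plus factor matrices."""
    core = 1
    for r in ranks:
        core *= r
    factors = sum(n * r for n, r in zip(shape, ranks))
    return core + factors


shape = (200, 200, 200)   # hypothetical grid size
ranks = (20, 20, 20)      # hypothetical multilinear ranks
full = full_storage(shape)
compressed = tucker_storage(shape, ranks)
ratio = full / compressed
```

For these made-up sizes the count drops from 8,000,000 entries to 20,000, a factor of 400. The benefit depends entirely on the multilinear ranks staying small relative to the grid dimensions.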

    A GPU accelerated Barnes-Hut Tree Code for FLASH4

    We present a GPU accelerated CUDA-C implementation of the Barnes-Hut (BH) tree code for calculating the gravitational potential on octree adaptive meshes. The tree code algorithm is implemented within the FLASH4 adaptive mesh refinement (AMR) code framework and is therefore fully MPI parallel. We describe the algorithm and present test results that demonstrate its accuracy and performance in comparison to the algorithms available in the current FLASH4 version. We use a MacLaurin spheroid to test the accuracy of our new implementation and use spherical, collapsing cloud cores with effective AMR to carry out performance tests, also in comparison with previous gravity solvers. Depending on the setup and the GPU/CPU ratio, we find a speedup for the gravity unit of at least a factor of 3 and up to 60 in comparison to the gravity solvers implemented in the FLASH4 code. We find an overall speedup factor for full simulations of at least a factor of 1.6 and up to a factor of 10. Comment: For further information see: http://www.hs.uni-hamburg.de/gpub

    3-D nonlinear force-free field reconstruction of solar active region 11158 by direct boundary integral equation

    A 3-D coronal magnetic field is reconstructed for NOAA 11158 on Feb 14, 2011. A GPU-accelerated direct boundary integral equation (DBIE) method is implemented. This is about 1000 times faster than the original DBIE used in solar NLFFF modeling. Using the SDO/HMI vector magnetogram as the bottom boundary condition, the reconstructed magnetic field lines are compared with the projected EUV loop structures from different views three-dimensionally by SDO/AIA and STEREO A/B spacecraft simultaneously for the first time. They show very good agreement, so that the topological configurations of the magnetic fields can be analyzed and their role in the flare process of the active region can be better understood. A quantitative comparison with some stereoscopically reconstructed coronal loops shows that the present averaged misalignment angles are of the same order as the state-of-the-art results obtained with reconstructed coronal loops as prescribed conditions, and better than other NLFFF methods. It is found that the observed coronal loop structures can be grouped into bundles of closed and open loops with some central bright coronal loops around the polarity inversion line (PIL). The reconstructed highly-shearing magnetic field lines agree very well with the low-lying S-shaped filament channel along PIL. They are in a pivot position to all other surrounding coronal structures, and a group of electric current lines co-aligned with the central bright EUV loops overlying the filament channel is also obtained. This central lower-lying magnetic field loop system must have played a key role in powering the flare. It should be noted that while a strand-like coronal feature along PIL may be related to the filament, one cannot simply attribute all the coronal bright features along PIL to manifestation of the filament without any stereoscopic information. It shows that DBIE is rigorous and effective. Comment: Solar Physics, accepted