Search CORE

14,074 research outputs found

A scalable H-matrix approach for the solution of boundary integral equations on multi-GPU clusters

Author: Harbrecht Helmut
Zaspel Peter
Publication venue
Publication date: 01/06/2018
Field of study

In this work, we consider the solution of boundary integral equations by means of a scalable hierarchical matrix approach on clusters equipped with graphics hardware, i.e. graphics processing units (GPUs). To this end, we extend our existing single-GPU hierarchical matrix library hmglib such that it is able to scale on many GPUs and such that it can be coupled to arbitrary application codes. Using a model GPU implementation of a boundary element method (BEM) solver, we are able to achieve more than 67 percent relative parallel speed-up going from 128 to 1024 GPUs for a model geometry test case with 1.5 million unknowns and a real-world geometry test case with almost 1.2 million unknowns. On 1024 GPUs of the cluster Titan, it takes less than 6 minutes to solve the 1.5 million unknowns problem, with 5.7 minutes for the setup phase and 20 seconds for the iterative solver. To the best of the authors' knowledge, we here discuss the first fully GPU-based distributed-memory parallel hierarchical matrix Open Source library using the traditional H-matrix format and adaptive cross approximation with an application to BEM problems

arXiv.org e-Print Archive

edoc

A minimalistic approach for fast computation of geodesic distances on triangular meshes

Author: Calla Luciano A. Romero
Montenegro Anselmo A.
Perez Lizeth J. Fuentes
Publication venue: 'Elsevier BV'
Publication date: 23/08/2019
Field of study

The computation of geodesic distances is an important research topic in Geometry Processing and 3D Shape Analysis as it is a basic component of many methods used in these areas. In this work, we present a minimalistic parallel algorithm based on front propagation to compute approximate geodesic distances on meshes. Our method is practical and simple to implement and does not require any heavy pre-processing. The convergence of our algorithm depends on the number of discrete level sets around the source points from which distance information propagates. To appropriately implement our method on GPUs taking into account memory coalescence problems, we take advantage of a graph representation based on a breadth-first search traversal that works harmoniously with our parallel front propagation approach. We report experiments that show how our method scales with the size of the problem. We compare the mean error and processing time obtained by our method with such measures computed using other methods. Our method produces results in competitive times with almost the same accuracy, especially for large meshes. We also demonstrate its use for solving two classical geometry processing problems: the regular sampling problem and the Voronoi tessellation on meshes.Comment: Preprint submitted to Computers & Graphic

arXiv.org e-Print Archive

ZORA

Finite Element Integration on GPUs

Author: Andy R. Terrel
Datta K.
Markall G.
Maruyama N.
Matthew G. Knepley
Murthy G.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 28/02/2011
Field of study

We present a novel finite element integration method for low order elements on GPUs. We achieve more than 100GF for element integration on first order discretizations of both the Laplacian and Elasticity operators.Comment: 16 pages, 3 figure

arXiv.org e-Print Archive

Crossref

GPU-driven recombination and transformation of YCoCg-R video samples

Author: De Neve Wesley
Van de Walle Rik
Van Rijsselbergen Dieter
Publication venue: ACTA Press Anaheim
Publication date: 01/01/2006
Field of study

Common programmable Graphics Processing Units (GPU) are capable of more than just rendering real-time effects for games. They can also be used for image processing and the acceleration of video decoding. This paper describes an extended implementation of the H.264/AVC YCoCg-R to RGB color space transformation on the GPU. Both the color space transformation and recombination of the color samples from a nontrivial data layout are performed by the GPU. Using mid- to high-range GPUs, this extended implementation offers a significant gain in processing speed compared to an existing basic GPU version and an optimized CPU implementation. An ATI X1900 GPU was capable of processing more than 73 high-resolution 1080p YCoCg-R frames per second, which is over twice the speed of the CPU-only transformation using a Pentium D 820

Ghent University Academic Bibliography

OpenACC Based GPU Parallelization of Plane Sweep Algorithm for Geometric Intersection

Author: AB Khlopotine
JL Bentley
M McKenney
MT Goodrich
MT Goodrich
Publication venue: e-Publications@Marquette
Publication date: 01/01/2019
Field of study

Line segment intersection is one of the elementary operations in computational geometry. Complex problems in Geographic Information Systems (GIS) like finding map overlays or spatial joins using polygonal data require solving segment intersections. Plane sweep paradigm is used for finding geometric intersection in an efficient manner. However, it is difficult to parallelize due to its in-order processing of spatial events. We present a new fine-grained parallel algorithm for geometric intersection and its CPU and GPU implementation using OpenMP and OpenACC. To the best of our knowledge, this is the first work demonstrating an effective parallelization of plane sweep on GPUs. We chose compiler directive based approach for implementation because of its simplicity to parallelize sequential code. Using Nvidia Tesla P100 GPU, our implementation achieves around 40X speedup for line segment intersection problem on 40K and 80K data sets compared to sequential CGAL library

epublications@Marquette

Crossref

GPU in Physics Computation: Case Geant4 Navigation

Author: Kommeri Jukka
Niemi Tapio
Seiskari Otto
Publication venue
Publication date: 24/09/2012
Field of study

General purpose computing on graphic processing units (GPU) is a potential method of speeding up scientific computation with low cost and high energy efficiency. We experimented with the particle physics simulation toolkit Geant4 used at CERN to benchmark its geometry navigation functionality on a GPU. The goal was to find out whether Geant4 physics simulations could benefit from GPU acceleration and how difficult it is to modify Geant4 code to run in a GPU. We ported selected parts of Geant4 code to C99 & CUDA and implemented a simple gamma physics simulation utilizing this code to measure efficiency. The performance of the program was tested by running it on two different platforms: NVIDIA GeForce 470 GTX GPU and a 12-core AMD CPU system. Our conclusion was that GPUs can be a competitive alternate for multi-core computers but porting existing software in an efficient way is challenging

arXiv.org e-Print Archive

CiteSeerX

CERN Document Server