
2,850 research outputs found

Accelerating the Fourier split operator method via graphics processing units

Author: Bauke Heiko
Keitel Christoph H.
Publication venue: 'Elsevier BV'
Publication date: 17/12/2010

Current generations of graphics processing units have turned into highly parallel devices with general computing capabilities. Thus, graphics processing units may be utilized, for example, to solve time-dependent partial differential equations by the Fourier split operator method. In this contribution, we demonstrate that graphics processing units are capable of calculating fast Fourier transforms much more efficiently than traditional central processing units. Thus, graphics processing units render efficient implementations of the Fourier split operator method possible. Performance gains of more than an order of magnitude as compared to implementations for traditional central processing units are reached in the solution of the time-dependent Schrödinger equation and the time-dependent Dirac equation.
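As a rough illustration of the Fourier split operator method named in this abstract, the following CPU-only NumPy sketch propagates a one-dimensional wave packet under the time-dependent Schrödinger equation. The harmonic potential, grid size, time step, and the function name `split_operator_step` are assumptions chosen for the example and do not come from the paper; the two FFT calls per step are the part the paper moves to the GPU.

```python
import numpy as np

def split_operator_step(psi, potential, k, dt, hbar=1.0, mass=1.0):
    """Advance psi by one time step with a symmetric (Strang) splitting."""
    # Half step with the potential, which is diagonal in position space.
    half_potential = np.exp(-0.5j * potential * dt / hbar)
    # Full step with the kinetic energy, which is diagonal in Fourier space.
    kinetic = np.exp(-0.5j * hbar * k**2 * dt / mass)
    psi = half_potential * psi
    psi = np.fft.ifft(kinetic * np.fft.fft(psi))
    return half_potential * psi

# Hypothetical test problem: displaced Gaussian in a harmonic trap (hbar = mass = omega = 1).
n, length = 512, 40.0
x = np.linspace(-length / 2, length / 2, n, endpoint=False)
dx = x[1] - x[0]
k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)   # angular wave numbers of the grid
potential = 0.5 * x**2

psi = np.exp(-(x - 1.0) ** 2 / 2.0).astype(complex)
psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * dx)

dt, steps = 0.005, 1000                      # total time t = 5
for _ in range(steps):
    psi = split_operator_step(psi, potential, k, dt)

norm = np.sum(np.abs(psi) ** 2) * dx
mean_x = np.sum(x * np.abs(psi) ** 2) * dx
print(f"norm = {norm:.12f}, <x> = {mean_x:.4f}")
```

Each factor in the step has unit modulus and the FFT pair is unitary, so the norm should be conserved up to rounding error; for this particular initial state the mean position is expected to follow cos(t).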

arXiv.org e-Print Archive

MPG.PuRe

A sparse octree gravitational N-body code that runs entirely on the GPU processor

Author: Jeroen Bédorf
Evghenii Gaburov
Simon Portegies Zwart
Publication venue: 'Elsevier BV'
Publication date: 01/04/2012

We present parallel algorithms for constructing and traversing sparse octrees on graphics processing units (GPUs). The algorithms are based on parallel-scan and sort methods. To test the performance and feasibility, we implemented them in CUDA in the form of a gravitational tree-code which runs entirely on the GPU. (The code is publicly available at: http://castle.strw.leidenuniv.nl/software.html) The tree construction and traversal algorithms are portable to many-core devices that support the CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree construction and shows a performance improvement of more than a factor of 20 overall, resulting in a processing rate of more than 2.8 million particles per second. Comment: Accepted version. Published in Journal of Computational Physics. 35 pages, 12 figures, single column.
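To make the phrase "parallel-scan and sort methods" more concrete, here is a small serial NumPy sketch of one common way to prepare particles for octree construction: map each particle to a space-filling-curve key, sort by key, and use an exclusive prefix sum (scan) to find where each top-level octant starts. The Morton (Z-order) key, the helper name `morton_keys`, and the random test particles are assumptions made for illustration; the abstract does not say which key the authors use, and their implementation runs these steps in CUDA.

```python
import numpy as np

def morton_keys(cells, bits=10):
    """Interleave the bits of integer (x, y, z) cell indices into one key."""
    cells = np.asarray(cells, dtype=np.int64)
    keys = np.zeros(cells.shape[0], dtype=np.int64)
    for b in range(bits):
        for axis in range(3):
            bit = (cells[:, axis] >> b) & 1
            keys |= bit << (3 * b + axis)
    return keys

bits = 10
rng = np.random.default_rng(0)
positions = rng.random((1000, 3))                      # particles in the unit cube
cells = (positions * (1 << bits)).astype(np.int64)     # integer grid coordinates

keys = morton_keys(cells, bits)
order = np.argsort(keys, kind="stable")                # the "sort" step
sorted_keys = keys[order]

# The top three key bits identify the level-1 octant of each particle.
octant = sorted_keys >> (3 * (bits - 1))
counts = np.bincount(octant, minlength=8)
offsets = np.cumsum(counts) - counts                   # exclusive scan

for o in range(8):
    print(f"octant {o}: particles [{offsets[o]}, {offsets[o] + counts[o]})")
```

After the sort, every octree cell corresponds to a contiguous range of particles, which is why sort and scan primitives are enough to build the tree level by level.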

arXiv.org e-Print Archive

Crossref

Leiden University Scholarly Publications

Massively parallel split-step Fourier techniques for simulating quantum systems on graphics processing units

Author: James Schloss
Publication date: 27/12/2019

The split-step Fourier method is a powerful technique for solving partial differential equations and simulating ultracold atomic systems of various forms. In this body of work, we focus on several variations of this method to allow for simulations of one, two, and three-dimensional quantum systems, along with several notable methods for controlling these systems. In particular, we use quantum optimal control and shortcuts to adiabaticity to study the non-adiabatic generation of superposition states in strongly correlated one-dimensional systems, analyze chaotic vortex trajectories in two dimensions by using rotation and phase imprinting methods, and create stable, three-dimensional vortex structures in Bose–Einstein condensates through artificial magnetic fields generated by the evanescent field of an optical nanofiber. We also discuss algorithmic optimizations for implementing the split-step Fourier method on graphics processing units. All computational methods presented in this work are demonstrated on physical systems and have been incorporated into a state-of-the-art and open-source software suite known as GPUE, which is currently the fastest quantum simulator of its kind. Okinawa Institute of Science and Technology Graduate University

OIST Institutional Repository

Institutional Repositories DataBase (IRDB)

Accelerated computational micromechanics

Author: Bhattacharya Kaushik
Zhou Hao
Publication date: 09/10/2020

We present an approach to solving problems in micromechanics that is amenable to massively parallel calculations through the use of graphics processing units and other accelerators. The problems lead to nonlinear differential equations that are typically second order in space and first order in time. This combination of nonlinearity and nonlocality makes such problems difficult to solve in parallel. However, this combination is a result of collapsing nonlocal, but linear and universal physical laws (kinematic compatibility, balance laws), and nonlinear but local constitutive relations. We propose an operator-splitting scheme inspired by this structure. The governing equations are formulated as (incremental) variational problems, the differential constraints like compatibility are introduced using an augmented Lagrangian, and the resulting incremental variational principle is solved by the alternating direction method of multipliers. The resulting algorithm has a natural connection to physical principles, and also enables massively parallel implementation on structured grids. We present this method and use it to study two examples. The first concerns the long wavelength instability of finite elasticity, and allows us to verify the approach against previous numerical simulations. We also use this example to study convergence and parallel performance. The second example concerns microstructure evolution in liquid crystal elastomers and provides new insights into some counter-intuitive properties of these materials. We use this example to validate the model and the approach against experimental observations.

Development of a Chemically Reacting Flow Solver on the Graphic Processing Units

Author: Le Hai
Publication venue: SJSU ScholarWorks
Publication date: 01/01/2011

The focus of the current research is to develop a numerical framework on the Graphic Processing Units (GPU) capable of modeling chemically reacting flow. The framework incorporates a high-order finite volume method coupled with an implicit solver for the chemical kinetics. Both the fluid solver and the kinetics solver are designed to take advantage of the GPU architecture to achieve high performance. The structure of the numerical framework is shown, detailing different aspects of the optimization implemented on the solver. The mathematical formulation of the core algorithms is presented along with a series of standard test cases, including both nonreactive and reactive flows, in order to validate the capability of the numerical solver. The performance results obtained with the current framework show the parallelization efficiency of the solver and emphasize the capability of the GPU in performing scientific calculations. Distribution A: Approved for public release; distribution unlimited. PA #1117

SJSU ScholarWorks

Implementation and Validation of a Computationally Efficient DNS Solver for Reacting Flows in OpenFOAM

Author: Bockhorn H.
Denev J.
Habisreuther P.
Trimis D.
Zhang F.
Zirwes T.
Publication venue: Scipedia S.L.
Publication date: 11/03/2021

To meet future climate goals, the efficiency of combustion devices has to be increased. This requires a better understanding of the underlying physics. The simulation of turbulent flames is a challenge because of the multi-scale nature of combustion processes: relevant length scales span over five orders of magnitude and time scales over more than ten. Because of this, the direct numerical simulation (DNS) of turbulent flames is only possible on large supercomputers. A DNS solver for chemically reacting flows implemented in the open-source framework OpenFOAM is presented. The thermo-chemical library Cantera is used to compute detailed transport coefficients based on kinetic gas theory. The disparity in time scales, which are much shorter for the combustion chemistry than for the flow, is bridged by an operator splitting approach, which employs the open-source solver Sundials to integrate chemical reaction rates. Because the direct simulation of turbulent flames has to be performed on supercomputers, special care has been taken to improve the computational performance. A tool was developed which generates highly optimized C++ source code for the computation of chemical reaction rates. Additionally, a load balancing approach specifically made for the computation of chemical reaction rates is employed. In total, these optimizations can reduce total simulation times by up to 70 %. The accuracy of the new solver is assessed from different canonical test cases: 2D and 3D Taylor-Green vortex simulations show that the solver can reach up to fourth order convergence rates and that results differ by less than 1 % when compared to spectral DNS codes. Molecular diffusion and chemical reaction rates are compared to solutions of 1D flames from Cantera, showing perfect agreement. The solver is used to simulate the Sydney/Sandia burner. The simulation is performed on one of Germany's largest supercomputers on 28 800 CPU cores, employing 150 million cells and a chemical reaction mechanism with 19 species and about 200 reactions. Comparison with experimental data shows excellent agreement for time averaged and RMS values.
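The operator splitting idea mentioned in this abstract can be sketched on a toy problem: a one-dimensional periodic diffusion step stands in for the flow solver, and a local reaction with a closed-form solution stands in for the chemistry that the paper integrates with Sundials. Everything in the block (the reaction law du/dt = -rate * u^2, the spectral diffusion step, the parameter values, and the function names) is an assumption for illustration and is unrelated to the OpenFOAM implementation.

```python
import numpy as np

def react(u, rate, dt):
    """Exact solution of the local reaction du/dt = -rate * u**2 over dt."""
    return u / (1.0 + rate * u * dt)

def diffuse(u, diffusivity, k, dt):
    """Exact periodic diffusion over dt, applied in Fourier space."""
    return np.fft.ifft(np.exp(-diffusivity * k**2 * dt) * np.fft.fft(u)).real

def strang_step(u, rate, diffusivity, k, dt):
    """Half reaction step, full transport step, half reaction step."""
    u = react(u, rate, 0.5 * dt)
    u = diffuse(u, diffusivity, k, dt)
    return react(u, rate, 0.5 * dt)

n, length = 128, 2.0 * np.pi
x = np.linspace(0.0, length, n, endpoint=False)
k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)

rate, diffusivity, dt, steps = 3.0, 0.1, 0.01, 100

# Case 1: uniform field, where diffusion does nothing and the exact answer is known.
uniform = np.full(n, 2.0)
for _ in range(steps):
    uniform = strang_step(uniform, rate, diffusivity, k, dt)

# Case 2: a non-uniform field, where both sub-steps are active.
field = 1.0 + 0.5 * np.sin(x)
for _ in range(steps):
    field = strang_step(field, rate, diffusivity, k, dt)

print(uniform[0], field.min(), field.max())
```

The point of the splitting is that the reaction sub-step is purely local, so each cell can be advanced independently with a solver suited to stiff chemistry, while the transport sub-step never sees the fast chemical time scales.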

KITopen

Scipedia

Towards Lattice Quantum Chromodynamics on FPGA devices

Author: Korcyl Grzegorz
Korcyl Piotr
Publication venue: 'Elsevier BV'
Publication date: 04/12/2019

In this paper we describe a single-node, double precision Field Programmable Gate Array (FPGA) implementation of the Conjugate Gradient algorithm in the context of Lattice Quantum Chromodynamics. As a benchmark of our proposal we invert numerically the Dirac-Wilson operator on a 4-dimensional grid on three Xilinx hardware solutions: Zynq Ultrascale+ evaluation board, the Alveo U250 accelerator and the largest device available on the market, the VU13P device. In our implementation we separate software/hardware parts in such a way that the entire multiplication by the Dirac operator is performed in hardware, and the rest of the algorithm runs on the host. We find that the FPGA implementation can offer a performance comparable with that obtained using current CPUs or Intel's many core Xeon Phi accelerators. A possible multiple node FPGA-based system is discussed and we argue that power-efficient High Performance Computing (HPC) systems can be implemented using FPGA devices only. Comment: 17 pages, 4 figures
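The division of labour described in this abstract, with the operator multiplication offloaded and the remaining Conjugate Gradient bookkeeping kept on the host, can be sketched with a plain NumPy solver that only touches the operator through a callback. The stand-in operator below (a periodic one-dimensional Laplacian plus a mass term, which is symmetric positive definite) and the names `conjugate_gradient` and `apply_operator` are assumptions for illustration; it is not the Dirac-Wilson operator and no FPGA is involved.

```python
import numpy as np

def conjugate_gradient(apply_operator, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a Hermitian positive definite A given only as a callback."""
    x = np.zeros_like(b)
    r = b - apply_operator(x)          # initial residual
    p = r.copy()                       # initial search direction
    rs_old = np.vdot(r, r).real
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        ap = apply_operator(p)         # the only place the operator is used
        alpha = rs_old / np.vdot(p, ap).real
        x = x + alpha * p
        r = r - alpha * ap
        rs_new = np.vdot(r, r).real
        if np.sqrt(rs_new) < tol * b_norm:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

mass_squared = 0.5

def apply_operator(v):
    """Stand-in operator: periodic 1D Laplacian plus a mass term."""
    return (2.0 + mass_squared) * v - np.roll(v, 1) - np.roll(v, -1)

rng = np.random.default_rng(1)
b = rng.standard_normal(64)
x = conjugate_gradient(apply_operator, b)
residual = np.linalg.norm(apply_operator(x) - b) / np.linalg.norm(b)
print(f"relative residual = {residual:.2e}")
```

Because the solver sees the operator only through `apply_operator`, swapping in an accelerated implementation would leave the host-side loop unchanged, which is the separation the abstract describes.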

arXiv.org e-Print Archive

University of Regensburg Publication Server

Jagiellonian University Repository