QCD simulations with staggered fermions on GPUs

Author: Bonati Claudio
Cossu Guido
D'Elia Massimo
Incardona Pietro
Publication venue: 'Elsevier BV'
Publication date: 28/12/2011

We report on our implementation of the RHMC algorithm for the simulation of lattice QCD with two staggered flavors on Graphics Processing Units, using the NVIDIA CUDA programming language. The main feature of our code is that the GPU is not used just as an accelerator; instead, the whole Molecular Dynamics trajectory is performed on it. After pointing out the main bottlenecks and how to circumvent them, we discuss the performance obtained. We present some preliminary results regarding OpenCL and multi-GPU extensions of our code and discuss future perspectives.

Comment: 22 pages, 14 eps figures, final version to be published in Computer Physics Communications

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

UnipiEprints
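
The abstract above refers to a Molecular Dynamics trajectory inside RHMC. As a rough illustration of what such a trajectory involves, here is a minimal sketch of a leapfrog integrator on a toy one-variable action. The leapfrog scheme, the action S(q) = q^2/2, and every name below are assumptions made for illustration; the listing does not say which integrator the authors use, and nothing here is taken from their code.

```python
def leapfrog(q, p, grad_action, dt, n_steps):
    """Integrate H = p^2/2 + S(q) with a kick-drift-kick leapfrog scheme."""
    p = p - 0.5 * dt * grad_action(q)        # initial half step in momentum
    for step in range(n_steps):
        q = q + dt * p                       # full step in position
        if step < n_steps - 1:
            p = p - dt * grad_action(q)      # full step in momentum
    p = p - 0.5 * dt * grad_action(q)        # final half step in momentum
    return q, p


def toy_grad(q):
    # Gradient of the toy action S(q) = q^2 / 2 (hypothetical, not lattice QCD).
    return q


def energy(q, p):
    # Hamiltonian of the toy system.
    return 0.5 * p * p + 0.5 * q * q


q0, p0 = 1.0, 0.5
q1, p1 = leapfrog(q0, p0, toy_grad, dt=0.1, n_steps=10)

# The energy violation is what a Metropolis accept/reject step would use.
delta_h = energy(q1, p1) - energy(q0, p0)

# Reversibility check: flip the momentum and integrate again.
q_back, p_back = leapfrog(q1, -p1, toy_grad, dt=0.1, n_steps=10)
```

In a hybrid Monte Carlo scheme the trajectory is followed by an accept/reject step based on `delta_h`; keeping the whole loop on one device avoids moving `q` and `p` back and forth at every step, which is the design point the abstract emphasizes.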

Exascale Deep Learning for Climate Analytics

Author: Deslippe Jack
Fatica Massimiliano
Houston Michael
Kurth Thorsten
Luehr Nathan
Mahesh Ankur
Matheson Michael
Mudigonda Mayur
Phillips Everett
Prabhat
Romero Joshua
Treichler Sean
Publication date: 03/10/2018

We extract pixel-level masks of extreme weather patterns using variants of Tiramisu and DeepLabv3+ neural networks. We describe improvements to the software frameworks, input pipeline, and the network training algorithms necessary to efficiently scale deep learning on the Piz Daint and Summit systems. The Tiramisu network scales to 5300 P100 GPUs with a sustained throughput of 21.0 PF/s and a parallel efficiency of 79.0%. DeepLabv3+ scales up to 27360 V100 GPUs with a sustained throughput of 325.8 PF/s and a parallel efficiency of 90.7% in single precision. By taking advantage of the FP16 Tensor Cores, a half-precision version of the DeepLabv3+ network achieves a peak and sustained throughput of 1.13 EF/s and 999.0 PF/s, respectively.

Comment: 12 pages, 5 tables, 4 figures, Super Computing Conference November 11-16, 2018, Dallas, TX, US

arXiv.org e-Print Archive

Crossref

eScholarship - University of California
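
The throughput and efficiency figures quoted in the abstract above can be turned into per-GPU numbers with simple arithmetic. The sketch below assumes that parallel efficiency means sustained throughput divided by (GPU count × some baseline per-GPU throughput); the abstract does not define the baseline, so `implied_baseline` is an inference under that assumption, and both function names are hypothetical.

```python
PF = 1e15  # one petaflop per second, in flop/s


def per_gpu_throughput(sustained_flops, n_gpus):
    """Average sustained throughput per GPU, in flop/s."""
    return sustained_flops / n_gpus


def implied_baseline(sustained_flops, n_gpus, parallel_efficiency):
    """Baseline per-GPU throughput implied by the quoted efficiency.

    Assumes efficiency = sustained / (n_gpus * baseline); this definition
    is an assumption, not stated in the abstract.
    """
    return sustained_flops / (n_gpus * parallel_efficiency)


# Figures quoted in the abstract (single precision).
tiramisu_per_gpu = per_gpu_throughput(21.0 * PF, 5300)
deeplab_per_gpu = per_gpu_throughput(325.8 * PF, 27360)

tiramisu_baseline = implied_baseline(21.0 * PF, 5300, 0.790)
deeplab_baseline = implied_baseline(325.8 * PF, 27360, 0.907)
```

With the quoted numbers this works out to roughly 4.0 TF/s per P100 for Tiramisu and roughly 11.9 TF/s per V100 for DeepLabv3+.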

MuMax: a new high-performance micromagnetic simulation tool

Author: A. Vansteenkiste
B. Van de Wiele
Berger
Bohlens
Brown
Donahue
Donahue
Fischbacher
Frigo
Kakay
Li
McMichael
Najafi
Parkin
Scholz
Van de Wiele
Van de Wiele
Zhang
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011

We present MuMax, a general-purpose micromagnetic simulation tool running on Graphical Processing Units (GPUs). MuMax is designed for high-performance computations and specifically targets large simulations. For such simulations, speedups of more than 100x can easily be obtained compared to the CPU-based OOMMF program developed at NIST. MuMax aims to be general and broadly applicable. It solves the classical Landau-Lifshitz equation taking into account the magnetostatic, exchange and anisotropy interactions, thermal effects and spin-transfer torque. Periodic boundary conditions can optionally be imposed. A spatial discretization using finite differences in 2 or 3 dimensions can be employed. MuMax is publicly available as open source software. It can thus be freely used and extended by the community. Due to its high computational performance, MuMax should open up the possibility of running extensive simulations that would be nearly inaccessible with typical CPU-based simulators.

Comment: To be published in JMM

arXiv.org e-Print Archive

Crossref

Ghent University Academic Bibliography
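
To make the Landau-Lifshitz equation mentioned in the abstract above concrete, here is a minimal sketch for a single magnetic moment in a fixed field, using dm/dt = -gamma (m × H) - lambda m × (m × H). It is not MuMax code: the single-spin setup, the constant field, the explicit Euler step with renormalization, and all parameter values are assumptions chosen for brevity, and none of the interactions listed in the abstract are modelled.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    """Rescale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def ll_rhs(m, h, gamma, lam):
    """Right-hand side: precession term plus damping term."""
    mxh = cross(m, h)
    mxmxh = cross(m, mxh)
    return tuple(-gamma * p - lam * d for p, d in zip(mxh, mxmxh))


def relax(m, h, gamma=1.0, lam=0.5, dt=0.01, n_steps=2000):
    """Explicit Euler steps, renormalizing m after each step (illustrative values)."""
    m = normalize(m)
    for _ in range(n_steps):
        dm = ll_rhs(m, h, gamma, lam)
        m = normalize(tuple(c + dt * d for c, d in zip(m, dm)))
    return m


# Start perpendicular to the field; damping drives m toward the field direction.
m_final = relax((1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
```

In a full micromagnetic solver the same update is applied to every cell of a finite-difference grid, with the field recomputed from neighbouring cells at each step, which is the kind of per-cell work that maps well onto a GPU.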

Design and optimization of a portable LQCD Monte Carlo code using OpenACC

Author: Bonati Claudio
Calore Enrico
Coscetti Simone
D'Elia Massimo
Mesiti Michele
Negro Francesco
Schifano Sebastiano Fabio
Silvi Giorgio
Tripiccione Raffaele
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/2017

The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPUs, supporting a wide class of applications but delivering moderate computing performance, to many-core GPUs, exploiting aggressive data-parallelism and delivering higher performance for streaming computing applications. In this scenario, code portability (and performance portability) becomes necessary for easy maintainability of applications; this is very relevant in scientific computing, where code changes are very frequent, making it tedious and error-prone to keep different code versions aligned. In this work we present the design and optimization of a state-of-the-art production-level LQCD Monte Carlo application, using the directive-based OpenACC programming model. OpenACC abstracts parallel programming to a descriptive level, relieving programmers from specifying how codes should be mapped onto the target architecture. We describe the implementation of a code fully written in OpenACC, and show that we are able to target several different architectures, including state-of-the-art traditional CPUs and GPUs, with the same code. We also measure performance, evaluating the computing efficiency of our OpenACC code on several architectures, comparing with GPU-specific implementations and showing that a good level of performance portability can be reached.

Comment: 26 pages, 2 png figures, preprint of an article submitted for consideration in International Journal of Modern Physics

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Università di Ferrara

Juelich Shared Electronic Resources

GPU driven finite difference WENO scheme for real time solution of the shallow water equations

Author: Falconer R.
Meyer K.
Parna P.
Publication date: 15/01/2018

The shallow water equations are applicable to many common engineering problems involving modelling of waves dominated by motions in the horizontal directions (e.g. tsunami propagation, dam breaks). As such events pose substantial economic costs, as well as potential loss of life, accurate real-time simulation and visualization methods are of great importance. For this purpose, we propose a new finite difference scheme for the 2D shallow water equations that is specifically formulated to take advantage of modern GPUs. The new scheme is based on the so-called Picard integral formulation of conservation laws combined with Weighted Essentially Non-Oscillatory reconstruction. The emphasis of the work is on third order in space and second order in time solutions (in both single and double precision). Further, the scheme is well-balanced for bathymetry functions that are not surface piercing and can handle wetting and drying in a GPU-friendly manner without resorting to long and specific case-by-case procedures. We also present a fast single-kernel GPU implementation with a novel boundary condition application technique that allows for simultaneous real-time visualization and single precision simulations even on large (> 2000 × 2000) grids on consumer-level hardware. The full kernel source codes are also provided online at https://github.com/pparna/swe_pifweno3

Abertay Research Portal

University of Dundee Online Publications
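
The abstract above combines a Picard integral formulation with third-order Weighted Essentially Non-Oscillatory reconstruction. As a hedged illustration of the WENO ingredient only, the sketch below implements the classical third-order, left-biased WENO reconstruction of a value at the interface i+1/2 from three neighbouring values. It uses the textbook linear weights (1/3, 2/3) and smoothness indicators; whether the linked repository uses exactly this variant is an assumption, and the function name and `eps` default are hypothetical.

```python
def weno3_left(u_m1, u_0, u_p1, eps=1e-6):
    """Classical WENO3 reconstruction at i+1/2 from u[i-1], u[i], u[i+1]."""
    # Two second-order candidate reconstructions on the sub-stencils.
    p0 = -0.5 * u_m1 + 1.5 * u_0      # stencil {i-1, i}
    p1 = 0.5 * u_0 + 0.5 * u_p1       # stencil {i, i+1}

    # Smoothness indicators: large where the data jump.
    beta0 = (u_0 - u_m1) ** 2
    beta1 = (u_p1 - u_0) ** 2

    # Nonlinear weights built from the linear weights 1/3 and 2/3.
    a0 = (1.0 / 3.0) / (eps + beta0) ** 2
    a1 = (2.0 / 3.0) / (eps + beta1) ** 2

    return (a0 * p0 + a1 * p1) / (a0 + a1)


smooth = weno3_left(0.0, 1.0, 2.0)   # linear data: both candidates agree
jump = weno3_left(0.0, 0.0, 1.0)     # discontinuity: weight shifts to the smooth stencil
```

On smooth data the weights approach the linear ones and the reconstruction is third-order accurate; next to a jump the stencil that crosses the discontinuity is almost switched off, which is what suppresses spurious oscillations. The computation is branch-free and purely local, which is one reason schemes of this kind suit GPU kernels.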