The Mont-Blanc Project: First Phase Successfully Finished
Running from October 2011 to June 2015, the aim of the European project
Mont-Blanc has been to develop an approach to Exascale computing based on
embedded power-efficient technology. The main goals of the project were to i)
build an HPC prototype using currently available energy-efficient embedded
technology, ii) design a Next Generation system to overcome the limitations of
the built prototype and iii) port a set of representative Exascale applications
to the system. This article summarises the contributions from the Leibniz
Supercomputing Centre (LRZ) and the Juelich Supercomputing Centre (JSC),
Germany, to the Mont-Blanc project.
Comment: 5 pages, 3 figures
Integrating an N-Body Problem with SDC and PFASST
Vortex methods for the Navier-Stokes equations are based on a Lagrangian particle discretization, which reduces the governing equations to a first-order initial value system of ordinary differential equations for the position and vorticity of N particles. In this paper, the accuracy of solving this system by time-serial spectral deferred corrections (SDC) as well as by the time-parallel Parallel Full Approximation Scheme in Space and Time (PFASST) is investigated. PFASST is based on intertwining SDC iterations with differing resolution in a manner similar to the Parareal algorithm and uses a Full Approximation Scheme (FAS) correction to improve the accuracy of coarser SDC iterations. It is demonstrated that SDC and PFASST can generate highly accurate solutions, and the performance in terms of function evaluations required for a certain accuracy is analyzed and compared to a standard Runge-Kutta method.
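As an illustration of the serial building block used here, a spectral-deferred-corrections step for a scalar ODE can be sketched as follows. This is a minimal sketch, assuming equidistant nodes and explicit-Euler sweeps for simplicity; production SDC typically uses Gauss collocation nodes, and PFASST adds the multi-level FAS machinery on top.

```python
import numpy as np

def quad_matrix(tau):
    """S[m, j] approximates the integral of the Lagrange basis polynomial
    l_j (defined on the nodes tau) over the subinterval [tau[m], tau[m+1]]."""
    M = len(tau)
    S = np.zeros((M - 1, M))
    for j in range(M):
        e = np.zeros(M)
        e[j] = 1.0
        coeffs = np.polyfit(tau, e, M - 1)   # interpolating polynomial for l_j
        anti = np.polyint(coeffs)            # its antiderivative
        for m in range(M - 1):
            S[m, j] = np.polyval(anti, tau[m + 1]) - np.polyval(anti, tau[m])
    return S

def sdc_step(f, tau, y0, sweeps):
    """One SDC time step on the nodes tau: an explicit-Euler provisional
    solution is refined by correction sweeps against the high-order
    node-to-node quadrature S."""
    M = len(tau)
    S = quad_matrix(tau)
    y = np.empty(M)
    y[0] = y0
    for m in range(M - 1):                   # provisional run: explicit Euler
        y[m + 1] = y[m] + (tau[m + 1] - tau[m]) * f(tau[m], y[m])
    for _ in range(sweeps):
        F = np.array([f(t, v) for t, v in zip(tau, y)])  # previous iterate
        for m in range(M - 1):
            dtm = tau[m + 1] - tau[m]
            # Euler correction term plus spectral quadrature of the old F
            y[m + 1] = y[m] + dtm * (f(tau[m], y[m]) - F[m]) + S[m] @ F
    return y[-1]

# decay test problem y' = -y on [0, 0.5]: the result approaches exp(-0.5)
y_end = sdc_step(lambda t, y: -y, np.linspace(0.0, 0.5, 5), 1.0, sweeps=8)
```

Each sweep raises the formal order of the provisional solution by one, up to the order of the underlying quadrature, which is how SDC trades cheap low-order substeps for high-order accuracy.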
A space-time parallel solver for the three-dimensional heat equation
The paper presents a combination of the time-parallel "parallel full approximation scheme in space and time" (PFASST) with a parallel multigrid method (PMG) in space, resulting in a mesh-based solver for the three-dimensional heat equation with a uniquely high degree of efficient concurrency. Parallel scaling tests are reported on the Cray XE6 machine "Monte Rosa" and on the IBM Blue Gene/Q system "JUQUEEN". The efficacy of the combined spatial and temporal parallelization is shown by demonstrating that using PFASST in addition to PMG significantly extends the strong-scaling limit. Implications of using spatial coarsening strategies in PFASST's multi-level hierarchy in large-scale parallel simulations are discussed.
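For reference, the PDE being parallelized is the standard heat equation. A minimal serial discretization (forward Euler in time, second-order central differences in space, one dimension for brevity) can be sketched as below; this shows only the underlying problem, not PMG or PFASST themselves.

```python
import numpy as np

def heat_explicit(u0, dx, dt, steps):
    """Explicit (forward-Euler) finite differences for u_t = u_xx with
    homogeneous Dirichlet boundaries. Stable for dt <= dx**2 / 2."""
    u = u0.copy()
    for _ in range(steps):
        u[1:-1] += dt / dx**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return u

# a single sine mode decays analytically: u(x, t) = exp(-pi**2 t) sin(pi x)
n = 201
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]
dt = 0.4 * dx**2          # within the explicit stability limit
steps = 2000
u = heat_explicit(np.sin(np.pi * x), dx, dt, steps)
exact = np.exp(-np.pi**2 * dt * steps) * np.sin(np.pi * x)
```

The severe time-step restriction dt = O(dx**2) of such explicit schemes is one reason time stepping dominates the cost and makes parallelization across the time dimension attractive.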
The gravitational billion body problem: The billion particle problem
The increased availability of accelerator technology in modern supercomputers forces users to redesign their algorithms. These accelerators are specifically designed to offer huge amounts of parallel compute power. In this thesis I show how to harness the power of these parallel processors for astrophysical simulations. I start with an introduction that presents the developments in astrophysical algorithms and the hardware used from the 1960s until today. In the following scientific chapters I discuss the use of GPU accelerator technology for direct N-body methods and for more advanced hierarchical algorithms. These advanced algorithms are more complex to implement on large parallel architectures, but by redesigning them it is possible to take advantage of the GPU. The developed algorithms are applied to simulate galaxy mergers in order to explain discrepancies in observational results. In the simulations we test different merger configurations and try to match the results with observational data. The final chapter shows how to scale the developed software to thousands of GPUs, as available in the Titan supercomputer. The algorithms developed and presented in this thesis allow astronomers to take advantage of the new GPU technology and thereby run simulations that contain a thousand times more particles than was possible before.
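The direct N-body method mentioned above evaluates every pairwise gravitational interaction; exactly this all-pairs kernel is what maps so well onto GPU threads. A vectorized O(N^2) sketch is shown below (the softening parameter and G = 1 code units are illustrative assumptions):

```python
import numpy as np

def direct_accelerations(pos, mass, eps=1e-3):
    """All-pairs gravitational accelerations with Plummer softening eps.
    pos: (N, 3) positions, mass: (N,) masses; G = 1 in code units."""
    # pairwise separation vectors r_ij = pos_j - pos_i, shape (N, N, 3)
    dr = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]
    r2 = np.sum(dr * dr, axis=-1) + eps**2
    inv_r3 = r2 ** -1.5
    np.fill_diagonal(inv_r3, 0.0)          # no self-interaction
    # a_i = sum_j m_j * r_ij / (|r_ij|^2 + eps^2)^(3/2)
    return np.einsum('j,ij,ijk->ik', mass, inv_r3, dr)

# two unit masses a distance 2 apart: |a| is close to G*m/d**2 = 0.25
pos = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
acc = direct_accelerations(pos, np.array([1.0, 1.0]))
```

Hierarchical (tree) codes replace the inner sum over all j by a sum over a much smaller set of cells, which is the harder-to-parallelize case the thesis addresses.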
Fast Gravitational Approach for Rigid Point Set Registration with Ordinary Differential Equations
This article introduces a new physics-based method for rigid point set
alignment called Fast Gravitational Approach (FGA). In FGA, the source and
target point sets are interpreted as rigid particle swarms with masses
interacting in a globally multiply-linked manner while moving in a simulated
gravitational force field. The optimal alignment is obtained by explicit
modeling of forces acting on the particles as well as their velocities and
displacements with second-order ordinary differential equations of motion.
Additional alignment cues (point-based or geometric features, and other
boundary conditions) can be integrated into FGA through particle masses. We
propose a smooth-particle mass function for point mass initialization, which
improves robustness to noise and structural discontinuities. To avoid
prohibitive quadratic complexity of all-to-all point interactions, we adapt a
Barnes-Hut tree for accelerated force computation and achieve quasilinear
computational complexity. We show that the new method class has characteristics
not found in previous alignment methods such as efficient handling of partial
overlaps, inhomogeneous point sampling densities, and coping with large point
clouds with reduced runtime compared to the state of the art. Experiments show
that our method performs on par with or outperforms all compared competing
non-deep-learning-based and general-purpose techniques (which do not assume the
availability of training data and a scene prior) in resolving transformations
for LiDAR data and gains state-of-the-art accuracy and speed when coping with
different types of data disturbances.
Comment: 18 pages, 18 figures and two tables
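The force-driven motion idea can be sketched as below. This is not the published FGA formulation: the damping term, parameter values, and the brute-force O(N*M) force sum (which the paper replaces by a Barnes-Hut tree for quasilinear complexity) are all assumptions for illustration.

```python
import numpy as np

def gravitational_step(src, vel, tgt, masses, dt=0.05, eps=0.1, damping=0.9):
    """One explicit integration step of second-order equations of motion:
    every source point is attracted by every target point (illustrative
    sketch; damping and parameters are assumptions, not the FGA paper's).
    src, tgt: (N, d) and (M, d) point arrays; masses: (M,) target masses."""
    dr = tgt[np.newaxis, :, :] - src[:, np.newaxis, :]   # (N, M, d)
    r2 = np.sum(dr * dr, axis=-1) + eps**2
    # softened gravitational accelerations a_i = sum_j m_j r_ij / r_ij^3
    acc = np.einsum('j,ij,ijk->ik', masses, r2 ** -1.5, dr)
    vel = damping * (vel + dt * acc)   # update velocities first...
    src = src + dt * vel               # ...then the displacements
    return src, vel

# a source point at x = 1 is pulled toward a unit-mass target at the origin
src = np.array([[1.0, 0.0]])
vel = np.zeros((1, 2))
tgt = np.array([[0.0, 0.0]])
new_src, new_vel = gravitational_step(src, vel, tgt, np.array([1.0]))
```

In a registration setting one would alternate such force steps with a rigidity projection so that the swarm moves as a single rigid body; that projection is omitted here.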
A Space and Bandwidth Efficient Multicore Algorithm for the Particle-in-Cell Method
The Particle-in-Cell (PIC) method allows solving partial differential equations through simulations, with important applications in plasma physics. To simulate thousands of billions of particles on clusters of multicore machines, prior work has proposed hybrid algorithms that combine domain decomposition and particle decomposition with carefully optimized algorithms for handling the particles processed on each multicore socket. Regarding the multicore processing, existing algorithms either suffer from suboptimal execution time, due to sorting operations or the use of atomic instructions, or from suboptimal space usage. In this paper, we propose a novel parallel algorithm for two-dimensional PIC simulations on multicore hardware that features asymptotically optimal memory consumption and does not perform unnecessary accesses to main memory. In practice, our algorithm reaches 65% of the maximum bandwidth and shows excellent scalability on the classical Landau damping and two-stream instability test cases.
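For concreteness, the per-particle grid interaction at the heart of PIC, depositing charge onto the mesh, can be sketched in serial one-dimensional form as below. The paper's contribution is precisely how to organize these scattered memory accesses across cores; the function shape and normalization here are illustrative assumptions.

```python
import numpy as np

def deposit_charge(x, q, n_cells, length):
    """Cloud-in-cell (linear-weighting) charge deposition on a periodic
    1-D grid: each particle's charge is split between its two nearest
    grid nodes in proportion to its distance from them."""
    dx = length / n_cells
    rho = np.zeros(n_cells)
    s = x / dx
    i = np.floor(s).astype(int)
    w = s - i                               # fractional offset in the cell
    # np.add.at handles repeated indices correctly (unlike fancy indexing)
    np.add.at(rho, i % n_cells, q * (1.0 - w))
    np.add.at(rho, (i + 1) % n_cells, q * w)
    return rho / dx                         # convert charge to density

# one unit charge at x = 0.3 on a 4-cell grid of length 1 (dx = 0.25)
rho = deposit_charge(np.array([0.3]), np.array([1.0]), 4, 1.0)
```

The scatter via `np.add.at` is where race conditions arise once particles are processed in parallel, which is why multicore PIC codes resort to sorting, atomics, or per-core private grids.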
Afterlive: A performant code for Vlasov-Hybrid simulations
A parallelized implementation of the Vlasov-Hybrid method [Nunn, 1993] is
presented. This method is a hybrid between a gridded Eulerian description and
Lagrangian meta-particles. Unlike the Particle-in-Cell method [Dawson, 1983]
which simply adds up the contribution of meta-particles, this method does a
reconstruction of the distribution function in every time step for each
species. This interpolation method combines meta-particles with different
weights in such a way that particles with large weight do not drown out
particles that represent small contributions to the phase space density. These
core properties allow the use of a much larger range of macro factors and can
thus represent a much larger dynamic range in phase space density.
The reconstructed phase space density is used to calculate moments of the
distribution function, such as the charge density. The charge density is also
used as input to a spectral solver that calculates the self-consistent
electrostatic field, which is used to update the particles for the next
time step.
Afterlive (A Fourier-based Tool in the Electrostatic limit for the Rapid
Low-noise Integration of the Vlasov Equation) is fully parallelized using MPI
and writes output using parallel HDF5. The input to the simulation is read from
a JSON description that sets the initial particle distributions as well as
domain size and discretization constraints. The implementation presented here
is intentionally limited to one spatial dimension and resolves one or three
dimensions in velocity space. Additional spatial dimensions can be added in a
straightforward way, but make runs computationally even more costly.
Comment: Accepted for publication in Computer Physics Communications
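The spectral field solve described above, taking the charge density in and returning the electrostatic field, reduces in its simplest periodic 1-D electrostatic form to a single division in Fourier space. The sketch below assumes vacuum units and a particular normalization; it illustrates the technique, not Afterlive's actual implementation.

```python
import numpy as np

def electric_field_spectral(rho, length):
    """Solve dE/dx = rho (Gauss's law, eps0 = 1) on a periodic 1-D grid
    with an FFT: in Fourier space, E_k = rho_k / (i k) for k != 0."""
    n = rho.size
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    rho_k = np.fft.fft(rho)
    E_k = np.zeros_like(rho_k)
    nonzero = k != 0
    E_k[nonzero] = rho_k[nonzero] / (1j * k[nonzero])
    # the k = 0 mode (net charge) has no periodic solution and is dropped
    return np.fft.ifft(E_k).real

# a cosine charge perturbation yields a sine field: E = sin(x) for rho = cos(x)
length = 2.0 * np.pi
x = np.linspace(0.0, length, 64, endpoint=False)
E = electric_field_spectral(np.cos(x), length)
```

Because the solve is exact to machine precision for resolved modes, the field introduces no grid heating of its own, which suits the low-noise goal stated in the abstract.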