The Mont-Blanc Project: First Phase Successfully Finished
Running from October 2011 to June 2015, the aim of the European project
Mont-Blanc has been to develop an approach to Exascale computing based on
embedded power-efficient technology. The main goals of the project were to i)
build an HPC prototype using currently available energy-efficient embedded
technology, ii) design a Next Generation system to overcome the limitations of
the built prototype and iii) port a set of representative Exascale applications
to the system. This article summarises the contributions from the Leibniz
Supercomputing Centre (LRZ) and the Juelich Supercomputing Centre (JSC),
Germany, to the Mont-Blanc project.
Comment: 5 pages, 3 figures
Integrating an N-Body Problem with SDC and PFASST
Vortex methods for the Navier-Stokes equations are based on a Lagrangian particle discretization, which reduces the governing equations to a first-order initial value system of ordinary differential equations for the position and vorticity of N particles. In this paper, the accuracy of solving this system by time-serial spectral deferred corrections (SDC) as well as by the time-parallel Parallel Full Approximation Scheme in Space and Time (PFASST) is investigated. PFASST is based on intertwining SDC iterations with differing resolution in a manner similar to the Parareal algorithm and uses a Full Approximation Scheme (FAS) correction to improve the accuracy of coarser SDC iterations. It is demonstrated that SDC and PFASST can generate highly accurate solutions, and the performance in terms of function evaluations required for a certain accuracy is analyzed and compared to a standard Runge-Kutta method.
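As an illustration of the serial building block used here, a spectral-deferred-corrections step for a scalar ODE can be sketched as follows. This is a minimal sketch, assuming equidistant nodes and explicit-Euler sweeps for simplicity; production SDC typically uses Gauss collocation nodes, and PFASST adds the multi-level FAS machinery on top.

```python
import numpy as np

def quad_matrix(tau):
    """S[m, j] approximates the integral of the Lagrange basis polynomial
    l_j (defined on the nodes tau) over the subinterval [tau[m], tau[m+1]]."""
    M = len(tau)
    S = np.zeros((M - 1, M))
    for j in range(M):
        e = np.zeros(M)
        e[j] = 1.0
        coeffs = np.polyfit(tau, e, M - 1)   # interpolating polynomial for l_j
        anti = np.polyint(coeffs)            # its antiderivative
        for m in range(M - 1):
            S[m, j] = np.polyval(anti, tau[m + 1]) - np.polyval(anti, tau[m])
    return S

def sdc_step(f, tau, y0, sweeps):
    """One SDC time step on the nodes tau: an explicit-Euler provisional
    solution is refined by correction sweeps against the high-order
    node-to-node quadrature S."""
    M = len(tau)
    S = quad_matrix(tau)
    y = np.empty(M)
    y[0] = y0
    for m in range(M - 1):                   # provisional run: explicit Euler
        y[m + 1] = y[m] + (tau[m + 1] - tau[m]) * f(tau[m], y[m])
    for _ in range(sweeps):
        F = np.array([f(t, v) for t, v in zip(tau, y)])  # previous iterate
        for m in range(M - 1):
            dtm = tau[m + 1] - tau[m]
            # Euler correction term plus spectral quadrature of the old F
            y[m + 1] = y[m] + dtm * (f(tau[m], y[m]) - F[m]) + S[m] @ F
    return y[-1]

# decay test problem y' = -y on [0, 0.5]: the result approaches exp(-0.5)
y_end = sdc_step(lambda t, y: -y, np.linspace(0.0, 0.5, 5), 1.0, sweeps=8)
```

Each sweep raises the formal order of the provisional solution by one, up to the order of the underlying quadrature, which is how SDC trades cheap low-order substeps for high-order accuracy.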
A space-time parallel solver for the three-dimensional heat equation
The paper presents a combination of the time-parallel "parallel full approximation scheme in space and time" (PFASST) with a parallel multigrid method (PMG) in space, resulting in a mesh-based solver for the three-dimensional heat equation with a uniquely high degree of efficient concurrency. Parallel scaling tests are reported on the Cray XE6 machine "Monte Rosa" and on the IBM Blue Gene/Q system "JUQUEEN". The efficacy of the combined spatial and temporal parallelization is shown by demonstrating that using PFASST in addition to PMG significantly extends the strong-scaling limit. Implications of using spatial coarsening strategies in PFASST's multi-level hierarchy in large-scale parallel simulations are discussed.
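For reference, the PDE being parallelized is the standard heat equation. A minimal serial discretization (forward Euler in time, second-order central differences in space, one dimension for brevity) can be sketched as below; this shows only the underlying problem, not PMG or PFASST themselves.

```python
import numpy as np

def heat_explicit(u0, dx, dt, steps):
    """Explicit (forward-Euler) finite differences for u_t = u_xx with
    homogeneous Dirichlet boundaries. Stable for dt <= dx**2 / 2."""
    u = u0.copy()
    for _ in range(steps):
        u[1:-1] += dt / dx**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return u

# a single sine mode decays analytically: u(x, t) = exp(-pi**2 t) sin(pi x)
n = 201
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]
dt = 0.4 * dx**2          # within the explicit stability limit
steps = 2000
u = heat_explicit(np.sin(np.pi * x), dx, dt, steps)
exact = np.exp(-np.pi**2 * dt * steps) * np.sin(np.pi * x)
```

The severe time-step restriction dt = O(dx**2) of such explicit schemes is one reason time stepping dominates the cost and makes parallelization across the time dimension attractive.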
The gravitational billion body problem: The billion particle problem
The increased availability of accelerator technology in modern supercomputers forces users to redesign their algorithms. These accelerators are specifically designed to offer huge amounts of parallel compute power. In this thesis I show how to harness the power of these parallel processors for astrophysical simulations. I start with an introduction that presents the developments in astrophysical algorithms and the hardware used from the 1960s until today. In the following scientific chapters I discuss the use of GPU accelerator technology for direct N-body methods and for more advanced hierarchical algorithms. These advanced algorithms are more complex to implement on large parallel architectures, but by redesigning them it is possible to take advantage of the GPU. The developed algorithms are applied to simulate galaxy mergers in order to explain discrepancies in observational results. In the simulations we test different merger configurations and try to match the results with observational data. The final chapter shows how to scale the developed software to thousands of GPUs, as available in the Titan supercomputer. The algorithms developed and presented in this thesis allow astronomers to take advantage of the new GPU technology and thereby run simulations that contain a thousand times more particles than was possible before.
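The direct N-body method mentioned above evaluates every pairwise gravitational interaction; exactly this all-pairs kernel is what maps so well onto GPU threads. A vectorized O(N^2) sketch is shown below (the softening parameter and G = 1 code units are illustrative assumptions):

```python
import numpy as np

def direct_accelerations(pos, mass, eps=1e-3):
    """All-pairs gravitational accelerations with Plummer softening eps.
    pos: (N, 3) positions, mass: (N,) masses; G = 1 in code units."""
    # pairwise separation vectors r_ij = pos_j - pos_i, shape (N, N, 3)
    dr = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]
    r2 = np.sum(dr * dr, axis=-1) + eps**2
    inv_r3 = r2 ** -1.5
    np.fill_diagonal(inv_r3, 0.0)          # no self-interaction
    # a_i = sum_j m_j * r_ij / (|r_ij|^2 + eps^2)^(3/2)
    return np.einsum('j,ij,ijk->ik', mass, inv_r3, dr)

# two unit masses a distance 2 apart: |a| is close to G*m/d**2 = 0.25
pos = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
acc = direct_accelerations(pos, np.array([1.0, 1.0]))
```

Hierarchical (tree) codes replace the inner sum over all j by a sum over a much smaller set of cells, which is the harder-to-parallelize case the thesis addresses.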
Fast Gravitational Approach for Rigid Point Set Registration with Ordinary Differential Equations
This article introduces a new physics-based method for rigid point set
alignment called Fast Gravitational Approach (FGA). In FGA, the source and
target point sets are interpreted as rigid particle swarms with masses
interacting in a globally multiply-linked manner while moving in a simulated
gravitational force field. The optimal alignment is obtained by explicit
modeling of forces acting on the particles as well as their velocities and
displacements with second-order ordinary differential equations of motion.
Additional alignment cues (point-based or geometric features, and other
boundary conditions) can be integrated into FGA through particle masses. We
propose a smooth-particle mass function for point mass initialization, which
improves robustness to noise and structural discontinuities. To avoid
prohibitive quadratic complexity of all-to-all point interactions, we adapt a
Barnes-Hut tree for accelerated force computation and achieve quasilinear
computational complexity. We show that the new method class has characteristics
not found in previous alignment methods such as efficient handling of partial
overlaps, inhomogeneous point sampling densities, and coping with large point
clouds with reduced runtime compared to the state of the art. Experiments show
that our method performs on par with or outperforms all compared competing
non-deep-learning-based and general-purpose techniques (which do not assume the
availability of training data and a scene prior) in resolving transformations
for LiDAR data and gains state-of-the-art accuracy and speed when coping with
different types of data disturbances.
Comment: 18 pages, 18 figures and two tables
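The force-driven motion idea can be sketched as below. This is not the published FGA formulation: the damping term, parameter values, and the brute-force O(N*M) force sum (which the paper replaces by a Barnes-Hut tree for quasilinear complexity) are all assumptions for illustration.

```python
import numpy as np

def gravitational_step(src, vel, tgt, masses, dt=0.05, eps=0.1, damping=0.9):
    """One explicit integration step of second-order equations of motion:
    every source point is attracted by every target point (illustrative
    sketch; damping and parameters are assumptions, not the FGA paper's).
    src, tgt: (N, d) and (M, d) point arrays; masses: (M,) target masses."""
    dr = tgt[np.newaxis, :, :] - src[:, np.newaxis, :]   # (N, M, d)
    r2 = np.sum(dr * dr, axis=-1) + eps**2
    # softened gravitational accelerations a_i = sum_j m_j r_ij / r_ij^3
    acc = np.einsum('j,ij,ijk->ik', masses, r2 ** -1.5, dr)
    vel = damping * (vel + dt * acc)   # update velocities first...
    src = src + dt * vel               # ...then the displacements
    return src, vel

# a source point at x = 1 is pulled toward a unit-mass target at the origin
src = np.array([[1.0, 0.0]])
vel = np.zeros((1, 2))
tgt = np.array([[0.0, 0.0]])
new_src, new_vel = gravitational_step(src, vel, tgt, np.array([1.0]))
```

In a registration setting one would alternate such force steps with a rigidity projection so that the swarm moves as a single rigid body; that projection is omitted here.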
A Space and Bandwidth Efficient Multicore Algorithm for the Particle-in-Cell Method
The Particle-in-Cell (PIC) method allows solving partial differential equations through simulations, with important applications in plasma physics. To simulate thousands of billions of particles on clusters of multicore machines, prior work has proposed hybrid algorithms that combine domain decomposition and particle decomposition with carefully optimized algorithms for handling the particles processed on each multicore socket. Regarding the multicore processing, existing algorithms either suffer from suboptimal execution time, due to sorting operations or the use of atomic instructions, or from suboptimal space usage. In this paper, we propose a novel parallel algorithm for two-dimensional PIC simulations on multicore hardware that features asymptotically optimal memory consumption and does not perform unnecessary accesses to main memory. In practice, our algorithm reaches 65% of the maximum bandwidth and shows excellent scalability on the classical Landau damping and two-stream instability test cases.
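For concreteness, the per-particle grid interaction at the heart of PIC, depositing charge onto the mesh, can be sketched in serial one-dimensional form as below. The paper's contribution is precisely how to organize these scattered memory accesses across cores; the function shape and normalization here are illustrative assumptions.

```python
import numpy as np

def deposit_charge(x, q, n_cells, length):
    """Cloud-in-cell (linear-weighting) charge deposition on a periodic
    1-D grid: each particle's charge is split between its two nearest
    grid nodes in proportion to its distance from them."""
    dx = length / n_cells
    rho = np.zeros(n_cells)
    s = x / dx
    i = np.floor(s).astype(int)
    w = s - i                               # fractional offset in the cell
    # np.add.at handles repeated indices correctly (unlike fancy indexing)
    np.add.at(rho, i % n_cells, q * (1.0 - w))
    np.add.at(rho, (i + 1) % n_cells, q * w)
    return rho / dx                         # convert charge to density

# one unit charge at x = 0.3 on a 4-cell grid of length 1 (dx = 0.25)
rho = deposit_charge(np.array([0.3]), np.array([1.0]), 4, 1.0)
```

The scatter via `np.add.at` is where race conditions arise once particles are processed in parallel, which is why multicore PIC codes resort to sorting, atomics, or per-core private grids.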
Afterlive: A performant code for Vlasov-Hybrid simulations
A parallelized implementation of the Vlasov-Hybrid method [Nunn, 1993] is
presented. This method is a hybrid between a gridded Eulerian description and
Lagrangian meta-particles. Unlike the Particle-in-Cell method [Dawson, 1983]
which simply adds up the contribution of meta-particles, this method does a
reconstruction of the distribution function in every time step for each
species. This interpolation method combines meta-particles with different
weights in such a way that particles with large weight do not drown out
particles that represent small contributions to the phase space density. These
core properties allow the use of a much larger range of macro factors and can
thus represent a much larger dynamic range in phase space density.
The reconstructed phase space density is used to calculate moments of the
distribution function, such as the charge density. The charge density is also
used as input to a spectral solver that calculates the self-consistent
electrostatic field, which is used to update the particles for the next
time step.
Afterlive (A Fourier-based Tool in the Electrostatic limit for the Rapid
Low-noise Integration of the Vlasov Equation) is fully parallelized using MPI
and writes output using parallel HDF5. The input to the simulation is read from
a JSON description that sets the initial particle distributions as well as
domain size and discretization constraints. The implementation presented here
is intentionally limited to one spatial dimension and resolves one or three
dimensions in velocity space. Additional spatial dimensions can be added in a
straightforward way, but make runs computationally even more costly.
Comment: Accepted for publication in Computer Physics Communications
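The spectral field solve described above, taking the charge density in and returning the electrostatic field, reduces in its simplest periodic 1-D electrostatic form to a single division in Fourier space. The sketch below assumes vacuum units and a particular normalization; it illustrates the technique, not Afterlive's actual implementation.

```python
import numpy as np

def electric_field_spectral(rho, length):
    """Solve dE/dx = rho (Gauss's law, eps0 = 1) on a periodic 1-D grid
    with an FFT: in Fourier space, E_k = rho_k / (i k) for k != 0."""
    n = rho.size
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    rho_k = np.fft.fft(rho)
    E_k = np.zeros_like(rho_k)
    nonzero = k != 0
    E_k[nonzero] = rho_k[nonzero] / (1j * k[nonzero])
    # the k = 0 mode (net charge) has no periodic solution and is dropped
    return np.fft.ifft(E_k).real

# a cosine charge perturbation yields a sine field: E = sin(x) for rho = cos(x)
length = 2.0 * np.pi
x = np.linspace(0.0, length, 64, endpoint=False)
E = electric_field_spectral(np.cos(x), length)
```

Because the solve is exact to machine precision for resolved modes, the field introduces no grid heating of its own, which suits the low-noise goal stated in the abstract.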