Enabling Radiative Transfer on AMR grids in CRASH
We introduce CRASH-AMR, a new version of the cosmological Radiative Transfer
(RT) code CRASH, enabled to use refined grids. This new feature allows us to
attain higher resolution in our RT simulations and thus to describe
ionisation and temperature patterns in high density regions more accurately. We have
tested CRASH-AMR by simulating the evolution of an ionised region produced by a
single source embedded in gas at constant density, as well as by a more
realistic configuration of multiple sources in an inhomogeneous density field.
We find excellent agreement with the previous version of CRASH when the AMR feature is disabled, showing that no numerical artifacts have been introduced in CRASH-AMR; when additional refinement levels are used, the code simulates the physics of ionised gas in high density regions more accurately. This result has been attained at no computational loss, as RT simulations on AMR grids with maximum resolution equivalent to that of a uniform Cartesian grid can be run with a gain of up to 60% in computational time.
Comment: 19 pages, 17 figures. MNRAS, in press.
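As an illustration of why refined grids pay off in high density regions, the following is a minimal sketch of cell-based refinement on a quadtree, assuming a toy Gaussian density clump and a simple threshold criterion; the field, threshold, and sampling stencil are hypothetical and unrelated to CRASH's actual data structures.

```python
# Minimal quadtree refinement sketch: cells are subdivided only where a toy
# density field is high, concentrating resolution in the dense region.
import numpy as np

def density(x, y):
    # Hypothetical density field: a single Gaussian clump.
    return np.exp(-((x - 0.7)**2 + (y - 0.3)**2) / 0.01)

def needs_refinement(x, y, size, threshold):
    # Sample the cell on a coarse 5x5 stencil and test the peak density.
    xs, ys = np.meshgrid(np.linspace(x, x + size, 5),
                         np.linspace(y, y + size, 5))
    return density(xs, ys).max() > threshold

def refine(x, y, size, level, max_level, threshold=0.5):
    """Return the leaf cells (x, y, size, level) of the refined grid."""
    if level == max_level or not needs_refinement(x, y, size, threshold):
        return [(x, y, size, level)]
    half = size / 2
    leaves = []
    for dx in (0.0, half):
        for dy in (0.0, half):
            leaves += refine(x + dx, y + dy, half, level + 1,
                             max_level, threshold)
    return leaves

leaves = refine(0.0, 0.0, 1.0, level=0, max_level=6)
print(f"{len(leaves)} leaf cells vs {4**6} cells on the uniform grid")
```

Only the cells around the clump reach the maximum level, which is the resolution/cost trade-off the abstract quantifies.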
Cluster-based communication and load balancing for simulations on dynamically adaptive grids
The present paper introduces a new communication and load-balancing scheme based on a clustering of the grid, which we use for the efficient parallelization of simulations on dynamically adaptive grids.
A partitioning based on space-filling curves (SFCs) yields several advantageous properties regarding memory requirements and load balancing. However, for such an SFC-based partitioning, additional connectivity information has to be stored and updated for dynamically changing grids.
In this work, we present our approach of keeping this connectivity information run-length encoded (RLE) only for the interfaces shared between partitions. Using special properties of the underlying grid traversal and of the communication scheme, we update this connectivity information implicitly for dynamically changing grids and represent it as a sparse communication graph: graph nodes (partitions) represent bulks of connected grid cells, and each graph edge (RLE connectivity information) represents a unique relation between adjacent partitions. This directly leads to an efficient shared-memory parallelization with graph nodes assigned to computing cores, and to an efficient en bloc data exchange via graph edges. We refer to such a partitioning approach with RLE meta information as a cluster-based domain decomposition and to each partition as a cluster. With the sparse communication graph in mind, we then extend the connectivity information represented by the graph edges with MPI ranks, yielding en bloc communication for distributed-memory systems and a hybrid parallelization. The stack-based intra-cluster communication allows a very low memory footprint for data migration, and the RLE leads to efficient updates of the connectivity information.
Our benchmark is based on a shallow water simulation on a dynamically adaptive grid. We conducted performance studies for MPI-only and hybrid parallelizations, yielding an efficiency of over 90% on 256 cores. Furthermore, we demonstrate the applicability of cluster-based optimizations on distributed-memory systems.
We would like to thank the Munich Centre of Advanced Computing for funding this project by providing computing time on the MAC Cluster. This work was partly supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Centre “Invasive Computing” (SFB/TR 89).
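To make the cluster/RLE idea concrete, here is a minimal sketch under simplifying assumptions: a Morton (Z-order) curve on a 16x16 grid, clusters as contiguous SFC ranges, and the interface cells between adjacent clusters stored run-length encoded over SFC indices. Grid size, cluster count, and all names are illustrative, not the paper's implementation.

```python
from collections import defaultdict

def morton(x, y, bits=4):
    """Interleave the bits of (x, y) into a Z-order (Morton) index."""
    z = 0
    for b in range(bits):
        z |= ((x >> b) & 1) << (2 * b) | ((y >> b) & 1) << (2 * b + 1)
    return z

n, parts = 16, 4
order = sorted(((x, y) for x in range(n) for y in range(n)),
               key=lambda c: morton(*c))
sfc_index = {cell: i for i, cell in enumerate(order)}
owner = lambda i: i * parts // (n * n)   # clusters = contiguous SFC ranges

# Collect the SFC indices of interface cells per (cluster, neighbour) pair ...
interfaces = defaultdict(set)
for i, (x, y) in enumerate(order):
    for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
        if 0 <= nx < n and 0 <= ny < n:
            j = sfc_index[(nx, ny)]
            if owner(j) != owner(i):
                interfaces[(owner(i), owner(j))].add(i)

# ... and store them run-length encoded as (start, length) pairs.
def rle(indices):
    runs, start, prev = [], indices[0], indices[0]
    for i in indices[1:]:
        if i != prev + 1:
            runs.append((start, prev - start + 1))
            start = i
        prev = i
    runs.append((start, prev - start + 1))
    return runs

for (a, b), cells in sorted(interfaces.items()):
    print(f"cluster {a} -> {b}: runs {rle(sorted(cells))}")
```

Because the SFC keeps each cluster's cells contiguous, the interface lists compress into a handful of runs per graph edge, which is what makes the en bloc exchange cheap.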
SFC-based Communication Metadata Encoding for Adaptive Mesh Refinement
This volume of the series “Advances in Parallel Computing” contains the proceedings of the International Conference on Parallel Programming – ParCo 2013 – held from 10 to 13 September 2013 in Garching, Germany. The conference was hosted by the Technische Universität München (Department of Informatics) and the Leibniz Supercomputing Centre.
The present paper studies two adaptive mesh refinement (AMR) codes
whose grids rely on recursive subdivision in combination with space-filling curves
(SFCs). A non-overlapping domain decomposition based upon these SFCs yields
several well-known advantageous properties with respect to communication demands,
balancing, and partition connectivity. However, the administration of the metadata, i.e. tracking which partitions exchange data and in which cardinality, is nontrivial due to the SFC’s fractal meandering and the dynamic adaptivity. We introduce an analysed tree grammar for the metadata that restricts it, without loss of information, hierarchically along the subdivision tree and applies run-length encoding. Hence, the metadata’s memory footprint is very small, and it can be computed and maintained on the fly even for permanently changing grids. The grammar facilitates a fork-join pattern for shared data parallelism, and it facilitates replicated data parallelism that tackles latency and bandwidth constraints through communication in the background while reducing memory requirements by avoiding adjacency information stored per element. We demonstrate this with shared- and distributed-memory parallelized domain decompositions.
This work was supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Centre “Invasive Computing” (SFB/TR 89). It is partially based on work supported by Award No. UK-c0020, made by the King Abdullah University of Science and Technology (KAUST).
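As a loose illustration of the fork-join pattern for shared data parallelism mentioned above, the sketch below processes clusters independently and joins their partial results; the per-cluster kernel, cluster count, and cell array are placeholders and do not reflect the codes studied in the paper.

```python
# Fork-join over clusters: fork one task per cluster, join by reduction.
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def process_cluster(cells):
    # Placeholder per-cluster kernel, e.g. one explicit time-step update.
    return np.sin(cells).sum()

if __name__ == "__main__":
    cells = np.arange(2**16, dtype=float)    # all grid cells, SFC-ordered
    clusters = np.array_split(cells, 8)      # contiguous SFC ranges
    with ProcessPoolExecutor() as pool:      # fork: one task per cluster
        partial = pool.map(process_cluster, clusters)
    print("joined result:", sum(partial))    # join: reduce partial results
```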
Efficient cosmological parameter sampling using sparse grids
We present a novel method to significantly speed up cosmological parameter
sampling. The method relies on constructing an interpolation of the CMB log-likelihood based on sparse grids, which is used as a shortcut for the likelihood evaluation. We obtain excellent results over a large region in parameter space, extending to about 25 log-likelihood units around the peak, and we reproduce the one-dimensional projections of the likelihood almost perfectly.
In speed and accuracy, our technique is competitive with existing approaches that accelerate parameter estimation based on polynomial interpolation or neural networks, while having some advantages over them. In our method, there is no danger of creating unphysical wiggles, as can be the case for high-degree polynomial fits. Furthermore, we do not require a long training time as neural networks do; rather, the construction of the interpolation is determined by the time it takes to evaluate the likelihood at the sampling points, which can be parallelised to an arbitrary degree. Our approach is completely general, and it
can adaptively exploit the properties of the underlying function. We can thus
apply it to any problem where an accurate interpolation of a function is
needed.
Comment: Submitted to MNRAS, 13 pages, 13 figures.
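The following sketch shows how such a likelihood surrogate is used in practice, with an ordinary grid interpolant and a toy Gaussian standing in for the paper's sparse-grid interpolant and the CMB log-likelihood; the target function, grid bounds, and proposal width are all hypothetical.

```python
# Surrogate-accelerated Metropolis sampling: the expensive log-likelihood is
# evaluated only while building the interpolant, never inside the chain.
import numpy as np
from scipy.interpolate import RegularGridInterpolator

rng = np.random.default_rng(0)

def expensive_loglike(theta):
    # Stand-in for an expensive likelihood call (a correlated Gaussian here).
    x, y = theta
    return -0.5 * (x**2 + 2.0 * (y - 0.5 * x)**2)

# Build the interpolant once, up front (this part parallelises trivially).
xs = np.linspace(-4, 4, 41)
ys = np.linspace(-4, 4, 41)
vals = np.array([[expensive_loglike((x, y)) for y in ys] for x in xs])
surrogate = RegularGridInterpolator((xs, ys), vals)

# Metropolis sampling against the cheap surrogate.
theta = np.zeros(2)
logp = surrogate(theta)[0]
chain = []
for _ in range(20000):
    prop = theta + 0.5 * rng.standard_normal(2)
    if np.all(np.abs(prop) < 4):           # stay inside the interpolated box
        logq = surrogate(prop)[0]
        if np.log(rng.random()) < logq - logp:
            theta, logp = prop, logq
    chain.append(theta.copy())

print("posterior mean ~", np.mean(chain, axis=0))
```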
Invasive Computing in HPC with X10
High performance computing with thousands of cores relies on distributed
memory for memory consistency reasons. The resource
management on such systems usually relies on static assignment of
resources at the start of each application. Such static scheduling is incapable of starting an application whose required resources are in use by others, since resources assigned to running applications cannot be reduced without stopping them. This lack of dynamic adaptive scheduling leads to idling resources until the remaining requested resources become available. Additionally, applications with changing resource requirements lead to idling or less
efficiently used resources. The invasive computing paradigm suggests
dynamic resource scheduling and applications able to dynamically
adapt to changing resource requirements.
As a case study, we developed an invasive resource manager as
well as a multigrid with dynamically changing resource demands.
Such a multigrid has changing scalability behavior during its execution
and requires data migration upon reallocation on distributed-memory systems.
To counteract the additional complexity introduced by these interfaces, e.g. for data migration, we use the X10 programming language for improved programmability. Our results show improved application throughput and dynamic adaptivity. In addition, we present our extension of X10's distributed arrays to support data migration.
This work was supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Centre “Invasive Computing” (SFB/TR 89).
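A toy sketch of the resource-management idea, under strong simplifications: a hypothetical manager grants cores while an application still scales ("invade") and reclaims them when a phase change reduces its scalability ("retreat"). The class and method names are invented for illustration and are not the X10 interfaces of the paper.

```python
# Toy dynamic resource reallocation between two applications.
class App:
    def __init__(self, name, scalability_limit):
        self.name = name
        self.cores = 1
        self.limit = scalability_limit   # cores beyond this are wasted

    def wants_more(self):
        return self.cores < self.limit

free_cores = 14
apps = [App("multigrid", 8), App("solver", 4)]

# Invade: grant cores while an application still scales.
for app in apps:
    while free_cores > 0 and app.wants_more():
        app.cores += 1                   # data migration would happen here
        free_cores -= 1

# Retreat: a phase change shrinks the multigrid's scalability limit.
apps[0].limit = 2
while apps[0].cores > apps[0].limit:
    apps[0].cores -= 1                   # migrate data off released cores
    free_cores += 1

for app in apps:
    print(app.name, "runs on", app.cores, "cores;", free_cores, "free")
```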
High-Dimensional Stochastic Design Optimization by Adaptive-Sparse Polynomial Dimensional Decomposition
This paper presents a novel adaptive-sparse polynomial dimensional
decomposition (PDD) method for stochastic design optimization of complex
systems. The method entails an adaptive-sparse PDD approximation of a
high-dimensional stochastic response for statistical moment and reliability
analyses; a novel integration of the adaptive-sparse PDD approximation and
score functions for estimating the first-order design sensitivities of the
statistical moments and failure probability; and standard gradient-based
optimization algorithms. New analytical formulae are presented for the design
sensitivities that are simultaneously determined along with the moments or the
failure probability. Numerical results stemming from mathematical functions
indicate that the new method provides more computationally efficient design
solutions than the existing methods. Finally, stochastic shape optimization of
a jet engine bracket with 79 variables was performed, demonstrating the power
of the new method to tackle practical engineering problems.Comment: 18 pages, 2 figures, to appear in Sparse Grids and
Applications--Stuttgart 2014, Lecture Notes in Computational Science and
Engineering 109, edited by J. Garcke and D. Pfl\"{u}ger, Springer
International Publishing, 201
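The score-function idea at the heart of the sensitivity formulae can be illustrated in a few lines: the derivative of a statistical moment with respect to a distribution parameter is itself a moment, so it is estimated from the same samples with no extra model evaluations. Plain Monte Carlo stands in here for the adaptive-sparse PDD surrogate, with a hypothetical response f(X) = X^2 and a Gaussian design variable.

```python
# Score-function design sensitivity: d/dmu E[f(X)] = E[f(X) * score(X)]
# for X ~ N(mu, sigma^2), where score(x) = d log p(x; mu) / d mu.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 1.5, 0.8, 200_000

x = rng.normal(mu, sigma, n)
f = x**2                                  # hypothetical response f(X)
score = (x - mu) / sigma**2               # Gaussian score w.r.t. mu

moment = f.mean()                         # E[f(X)]
sensitivity = (f * score).mean()          # sensitivity from the same samples

print(f"E[f]      = {moment:.4f}  (exact {mu**2 + sigma**2:.4f})")
print(f"dE[f]/dmu = {sensitivity:.4f}  (exact {2 * mu:.4f})")
```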
Smolyak's algorithm: A powerful black box for the acceleration of scientific computations
We provide a general discussion of Smolyak's algorithm for the acceleration
of scientific computations. The algorithm first appeared in Smolyak's work on
multidimensional integration and interpolation. Since then, it has been
generalized in multiple directions and has been associated with the keywords:
sparse grids, hyperbolic cross approximation, combination technique, and
multilevel methods. Variants of Smolyak's algorithm have been employed in the
computation of high-dimensional integrals in finance, chemistry, and physics,
in the numerical solution of partial and stochastic differential equations, and
in uncertainty quantification. Motivated by this broad and ever-increasing
range of applications, we describe a general framework that summarizes
fundamental results and assumptions in a concise application-independent
manner.
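A minimal instance of the framework, assuming nested trapezoidal rules on [0,1]^2: the classical combination technique, one variant of Smolyak's algorithm, sums coefficient-weighted tensor rules whose level sums lie on the two outermost diagonals. The rule choice and test integrand are arbitrary.

```python
# Smolyak combination technique for 2D quadrature from 1D trapezoid rules.
import numpy as np
from itertools import product

def trapezoid_rule(level):
    """1D trapezoid rule with 2**level + 1 points on [0, 1]."""
    n = 2**level + 1
    x = np.linspace(0.0, 1.0, n)
    w = np.full(n, 1.0 / (n - 1))
    w[[0, -1]] *= 0.5
    return x, w

def smolyak_integrate_2d(f, q):
    """Sum tensor rules with |l| in {q-1, q}, coefficients +1 / -1."""
    total = 0.0
    for l1, l2 in product(range(q + 1), repeat=2):
        if q - 1 <= l1 + l2 <= q:
            coeff = 1.0 if l1 + l2 == q else -1.0
            x1, w1 = trapezoid_rule(l1)
            x2, w2 = trapezoid_rule(l2)
            vals = f(x1[:, None], x2[None, :])
            total += coeff * (w1 @ vals @ w2)
    return total

f = lambda x, y: np.exp(x + y)            # exact integral: (e - 1)**2
exact = (np.e - 1.0)**2
for q in range(2, 8):
    print(q, abs(smolyak_integrate_2d(f, q) - exact))
```

The error decays with the level q while the point count grows far more slowly than for the full tensor grid, which is the essence of the sparse-grid acceleration.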
ELPA: A parallel solver for the generalized eigenvalue problem
For symmetric (Hermitian) dense or banded matrices, the computation of eigenvalues and eigenvectors of Ax = λBx is an important task, e.g. in electronic structure calculations. If a larger number of eigenvectors is needed, direct solvers are often applied. On parallel architectures the ELPA implementation has proven to be very efficient, also compared to other parallel solvers like EigenExa or MAGMA. The main improvement that allows better parallel efficiency in ELPA is the two-step transformation from dense to banded to tridiagonal form. This was the achievement of the ELPA project. The continuation of this project has been targeting additional improvements like allowing monitoring and autotuning of the ELPA code, optimizing the code for different architectures, developing curtailed algorithms for banded A and B, and applying the improved code to solve typical examples in electronic structure calculations. In this paper we present the outcome of this project.
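For orientation, this is the problem class in a few lines of SciPy: the LAPACK-backed dense solver below is a single-node reference for Ax = λBx, not ELPA's interface, and the matrix sizes and construction are arbitrary.

```python
# Generalized symmetric eigenproblem A x = lambda B x with SPD B.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 500
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                   # symmetric A
B = M @ M.T + n * np.eye(n)         # symmetric positive definite B

# Full decomposition; passing `subset_by_index` would return only some
# eigenpairs, the common case in electronic structure calculations.
w, V = eigh(A, B)

# Check the residual of the first eigenpair.
print(np.linalg.norm(A @ V[:, 0] - w[0] * B @ V[:, 0]))
```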
A posteriori error analysis and adaptive non-intrusive numerical schemes for systems of random conservation laws
In this article we consider one-dimensional random systems of hyperbolic
conservation laws. We first establish existence and uniqueness of random
entropy admissible solutions for initial value problems of conservation laws
which involve random initial data and random flux functions. Based on these
results we present an a posteriori error analysis for a numerical approximation
of the random entropy admissible solution. For the stochastic discretization,
we consider a non-intrusive approach, the Stochastic Collocation method. The
spatio-temporal discretization relies on the Runge--Kutta Discontinuous
Galerkin method. We derive the a posteriori estimator using continuous
reconstructions of the discrete solution. Combined with the relative entropy
stability framework this yields computable error bounds for the entire
space-stochastic discretization error. The estimator admits a splitting into a
stochastic and a deterministic (space-time) part, allowing for a novel
residual-based space-stochastic adaptive mesh refinement algorithm. We conclude
with various numerical examples investigating the scaling properties of the
residuals and illustrating the efficiency of the proposed adaptive algorithm.
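A minimal sketch of the non-intrusive stochastic discretization, assuming a scalar advection equation with a uniformly distributed random wave speed whose exact solution stands in for the Runge-Kutta Discontinuous Galerkin solver; moments then follow from Gauss-Legendre quadrature in the random variable. The node count and the toy problem are illustrative.

```python
# Non-intrusive Stochastic Collocation: independent deterministic solves at
# quadrature nodes in the random parameter, moments by quadrature.
import numpy as np

def deterministic_solve(xi, x, t):
    """Stand-in solver: exact solution of u_t + (1 + 0.5*xi) u_x = 0."""
    a = 1.0 + 0.5 * xi
    return np.sin(2 * np.pi * (x - a * t))   # smooth initial datum

x, t = np.linspace(0, 1, 101), 0.3
nodes, weights = np.polynomial.legendre.leggauss(8)   # xi in [-1, 1]
weights = weights / 2.0                # expectation w.r.t. uniform density

# One independent deterministic solve per collocation node.
samples = np.array([deterministic_solve(xi, x, t) for xi in nodes])
mean = weights @ samples
var = weights @ (samples - mean)**2

print("max std dev over x:", np.sqrt(var).max())
```

Because each node is an independent solve, any existing deterministic code can be reused unchanged, which is what "non-intrusive" means here.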
Efficient Resolution of Anisotropic Structures
We highlight some recent developments concerning the sparse
representation of possibly high-dimensional functions exhibiting strong
anisotropic features and low regularity in isotropic Sobolev or Besov scales.
Specifically, we focus on the solution of transport equations which exhibit
propagation of singularities where, additionally, high-dimensionality enters
when the convection field, and hence the solutions, depend on parameters
varying over some compact set. Important constituents of our approach are
directionally adaptive discretization concepts motivated by compactly supported
shearlet systems, and well-conditioned stable variational formulations that
support trial spaces with anisotropic refinements with arbitrary
directionalities. We prove that they provide tight error-residual relations, which are used to contrive rigorously founded adaptive refinement schemes that converge in L2. Moreover, in the context of parameter-dependent problems we
discuss two approaches serving different purposes and working under different
regularity assumptions. For frequent query problems, making essential use of
the novel well-conditioned variational formulations, a new Reduced Basis Method
is outlined which exhibits a certain rate-optimal performance for indefinite,
unsymmetric or singularly perturbed problems. For the radiative transfer
problem with scattering a sparse tensor method is presented which mitigates or
even overcomes the curse of dimensionality under suitable (so far still
isotropic) regularity assumptions. Numerical examples for both methods
illustrate the theoretical findings.
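As a point of reference for the Reduced Basis idea, the sketch below runs a generic residual-driven greedy loop on a parameterized linear system; the paper's method additionally rests on its well-conditioned variational formulations and tight error-residual estimators, for which the plain algebraic residual below is only a crude stand-in.

```python
# Generic greedy Reduced Basis sketch for A(mu) x = b.
import numpy as np

rng = np.random.default_rng(0)
n = 200
A1 = rng.standard_normal((n, n)) / n
b = rng.standard_normal(n)
A = lambda mu: np.eye(n) + mu * A1         # affine parameter dependence

train = np.linspace(0.0, 1.0, 50)          # training set of parameters
basis = np.empty((n, 0))

for it in range(6):
    # Residual-based greedy: pick the worst-approximated parameter.
    residuals = []
    for mu in train:
        if basis.shape[1]:
            c, *_ = np.linalg.lstsq(A(mu) @ basis, b, rcond=None)
            residuals.append(np.linalg.norm(A(mu) @ (basis @ c) - b))
        else:
            residuals.append(np.linalg.norm(b))
    mu_star = train[int(np.argmax(residuals))]
    # Enrich the basis with the full ("truth") solution at mu_star.
    snapshot = np.linalg.solve(A(mu_star), b)
    q = snapshot - basis @ (basis.T @ snapshot)    # Gram-Schmidt step
    basis = np.column_stack([basis, q / np.linalg.norm(q)])
    print(f"iter {it}: worst residual {max(residuals):.2e} at mu={mu_star:.2f}")
```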