Enabling Radiative Transfer on AMR grids in CRASH
We introduce CRASH-AMR, a new version of the cosmological Radiative Transfer
(RT) code CRASH, extended to use refined grids. This new feature allows us to
attain higher resolution in our RT simulations and thus to describe more
accurately ionisation and temperature patterns in high density regions. We have
tested CRASH-AMR by simulating the evolution of an ionised region produced by a
single source embedded in gas at constant density, as well as by a more
realistic configuration of multiple sources in an inhomogeneous density field.
We find excellent agreement with the previous version of CRASH when the AMR
feature is disabled, showing that no numerical artifacts have been introduced
in CRASH-AMR; when additional refinement levels are used, the code simulates
the physics of ionised gas in high density regions more accurately. This
result has been attained at no computational loss, as RT
simulations on AMR grids with maximum resolution equivalent to that of a
uniform Cartesian grid can be run with a gain of up to 60% in computational
time.

Comment: 19 pages, 17 figures. MNRAS, in press.
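The single-source, constant-density test is the classical Strömgren-sphere setup; a minimal sketch of the equilibrium ionised radius such a test should converge to (all parameter values are illustrative assumptions, and this is not CRASH's actual ray-tracing scheme):

```python
import math

def stromgren_radius(ndot_gamma, n_H, alpha_B=2.59e-13):
    """Equilibrium Stroemgren radius in cm for a source emitting
    ndot_gamma ionising photons per second into hydrogen of number
    density n_H (cm^-3), with case-B recombination coefficient
    alpha_B (cm^3/s, roughly valid at 1e4 K)."""
    return (3.0 * ndot_gamma / (4.0 * math.pi * alpha_B * n_H**2)) ** (1.0 / 3.0)

# Illustrative numbers: a 1e48 photons/s source in n_H = 1 cm^-3 gas.
r_cm = stromgren_radius(1e48, 1.0)
r_kpc = r_cm / 3.086e21   # cm per kpc
```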
Cluster-based communication and load balancing for simulations on dynamically adaptive grids
The present paper introduces a new communication and load-balancing scheme, based on a clustering of the grid, which we use for the efficient parallelization of simulations on dynamically adaptive grids.
With a partitioning based on space-filling curves (SFCs), this yields several advantageous properties regarding memory requirements and load balancing. However, for such an SFC-based partitioning, additional connectivity information has to be stored and updated for dynamically changing grids.
In this work, we present our approach of keeping this connectivity information run-length encoded (RLE), only for the interfaces shared between partitions. Using special properties of the underlying grid traversal and of the communication scheme used, we update this connectivity information implicitly for dynamically changing grids and can represent it as a sparse communication graph: graph nodes (partitions) represent bulks of connected grid cells, and each graph edge (RLE connectivity information) represents a unique relation between adjacent partitions. This directly leads to an efficient shared-memory parallelization, with graph nodes assigned to computing cores, and an efficient en bloc data exchange via graph edges. We further refer to such a partitioning approach with RLE meta information as a cluster-based domain decomposition and to each partition as a cluster. With the sparse communication graph in mind, we then extend the connectivity information represented by the graph edges with MPI ranks, yielding en bloc communication for distributed-memory systems and a hybrid parallelization. The stack-based intra-cluster communication allows a very low memory footprint for data migration, and the RLE leads to efficient updates of the connectivity information.
Our benchmark is based on a shallow water simulation on a dynamically adaptive grid. We conducted performance studies for MPI-only and hybrid parallelizations, yielding an efficiency of over 90% on 256 cores. Furthermore, we demonstrate the applicability of cluster-based optimizations on distributed-memory systems.

We would like to thank the Munich Centre of Advanced Computing for funding this project by providing computing time on the MAC Cluster. This work was partly supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Centre "Invasive Computing" (SFB/TR 89).
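The per-interface run-length encoding described in this abstract can be sketched in a few lines (Python used for illustration; the cluster ids and the boundary traversal order are hypothetical):

```python
from itertools import groupby

def rle(adjacent_partitions):
    """Run-length encode the sequence of partition ids adjacent to a
    cluster's boundary edges, traversed in grid-traversal order.
    Each run becomes one graph edge (neighbour id, number of shared
    edges), i.e. one en-bloc message instead of per-edge metadata."""
    return [(p, sum(1 for _ in g)) for p, g in groupby(adjacent_partitions)]

# Hypothetical boundary of cluster 0: six edges adjacent to clusters 1, 2, 3.
edges = rle([1, 1, 1, 2, 2, 3])   # [(1, 3), (2, 2), (3, 1)]
```

Because the grid traversal visits boundary edges of one neighbour consecutively, the encoded list stays short even for large shared interfaces.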
Efficient cosmological parameter sampling using sparse grids
We present a novel method to significantly speed up cosmological parameter
sampling. The method relies on constructing an interpolation of the
CMB-log-likelihood based on sparse grids, which is used as a shortcut for the
likelihood-evaluation. We obtain excellent results over a large region in
parameter space, comprising about 25 log-likelihoods around the peak, and we
reproduce the one-dimensional projections of the likelihood almost perfectly.
In speed and accuracy, our technique is competitive with existing approaches
to accelerate parameter estimation based on polynomial interpolation or neural
networks, while having some advantages over them. In our method, there is no
danger of creating unphysical wiggles, as can be the case for polynomial fits
of high degree. Furthermore, we do not require a long training time as for
neural networks, but the construction of the interpolation is determined by the
time it takes to evaluate the likelihood at the sampling points, which can be
parallelised to an arbitrary degree. Our approach is completely general, and it
can adaptively exploit the properties of the underlying function. We can thus
apply it to any problem where an accurate interpolation of a function is
needed.

Comment: Submitted to MNRAS, 13 pages, 13 figures.
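The surrogate idea is simple to sketch: build a cheap interpolant of the log-likelihood once, then sample against the interpolant instead of the expensive function. In the toy below, a 1-D piecewise-linear interpolant and a Gaussian stand in for the paper's multi-dimensional sparse-grid interpolant and the CMB likelihood:

```python
import math, random, bisect

def expensive_loglike(x):
    """Stand-in for a costly CMB log-likelihood (here a simple Gaussian)."""
    return -0.5 * x * x

# Build a cheap interpolant once.  A piecewise-linear fit on a uniform 1-D
# grid stands in for the sparse-grid construction, which generalises this
# idea to many dimensions.
xs = [-5.0 + 0.1 * i for i in range(101)]
ys = [expensive_loglike(x) for x in xs]

def surrogate(x):
    """Piecewise-linear interpolation of the tabulated log-likelihood."""
    i = min(max(bisect.bisect_left(xs, x), 1), len(xs) - 1)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Metropolis sampling that calls only the cheap surrogate.
random.seed(0)
x, chain = 0.0, []
for _ in range(5000):
    prop = x + random.gauss(0.0, 1.0)
    if -5.0 <= prop <= 5.0 and math.log(random.random()) < surrogate(prop) - surrogate(x):
        x = prop
    chain.append(x)
mean = sum(chain) / len(chain)   # close to the true posterior mean 0
```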
SFC-based Communication Metadata Encoding for Adaptive Mesh Refinement
This volume of the series "Advances in Parallel Computing" contains the proceedings of the International Conference on Parallel Programming – ParCo 2013 – held from 10 to 13 September 2013 in Garching, Germany. The conference was hosted by the Technische Universität München (Department of Informatics) and the Leibniz Supercomputing Centre.

The present paper studies two adaptive mesh refinement (AMR) codes
whose grids rely on recursive subdivision in combination with space-filling curves
(SFCs). A non-overlapping domain decomposition based upon these SFCs yields
several well-known advantageous properties with respect to communication demands,
balancing, and partition connectivity. However, the administration of the
metadata, i.e. tracking which partitions exchange data in which cardinality, is
nontrivial due to the SFC's fractal meandering and the dynamic adaptivity. We
introduce an analysed tree grammar for the metadata that restricts it, without
loss of information, hierarchically along the subdivision tree and applies
run-length encoding. Its memory footprint is therefore very small, and it can
be computed and maintained on the fly even for permanently changing grids. It
facilitates a fork-join pattern for shared-data parallelism as well as
replicated-data parallelism that tackles latency and bandwidth constraints,
respectively, through communication in the background, and it reduces memory
requirements by avoiding adjacency information stored per element. We
demonstrate this with shared- and distributed-memory parallelized domain
decompositions.

This work was supported by the German Research Foundation (DFG) as part of the
Transregional Collaborative Research Centre "Invasive Computing" (SFB/TR 89).
It is partially based on work supported by Award No. UK-c0020, made by the
King Abdullah University of Science and Technology (KAUST)
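The SFC-induced cell order that such partitioning builds on is easiest to illustrate with the simple Morton (Z-order) curve; this is only a sketch of the general idea, not the particular curves the studied codes use:

```python
def morton_index(x, y, bits=16):
    """Interleave the bits of integer cell coordinates (x, y) into a
    single Z-order index.  Cells that are close on this 1-D index tend
    to be close in 2-D, which is the locality property SFC-based
    domain decomposition exploits."""
    idx = 0
    for b in range(bits):
        idx |= ((x >> b) & 1) << (2 * b)        # x-bit goes to even position
        idx |= ((y >> b) & 1) << (2 * b + 1)    # y-bit goes to odd position
    return idx

# The four cells of a 2x2 grid appear in Z order along the curve.
order = sorted([(0, 0), (1, 0), (0, 1), (1, 1)], key=lambda c: morton_index(*c))
```

Cutting this 1-D order into contiguous pieces yields the non-overlapping, well-connected partitions mentioned above.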
Invasive Computing in HPC with X10
High-performance computing with thousands of cores relies on distributed
memory for memory-consistency reasons. Resource management on such systems
usually relies on a static assignment of resources at the start of each
application. Such static scheduling cannot start an application whose required
resources are currently in use by others, since resources assigned to running
applications cannot be reduced without stopping them. This lack of dynamically
adaptive scheduling leaves resources idle until the full amount of requested
resources becomes available. Additionally, applications with changing resource
requirements lead to idle or less efficiently used resources. The invasive
computing paradigm suggests dynamic resource scheduling and applications able
to adapt dynamically to changing resource availability.

As a case study, we developed an invasive resource manager as well as a
multigrid solver with dynamically changing resource demands. Such a multigrid
has changing scalability behavior during its execution and, on
distributed-memory systems, requires data migration upon reallocation.

To counteract the additional complexity introduced by the extra interfaces,
e.g. for data migration, we use the X10 programming language for improved
programmability. Our results show improved application throughput and dynamic
adaptivity. In addition, we present our extension of X10's distributed arrays
to support data migration.

This work was supported by the German Research Foundation
(DFG) as part of the Transregional Collaborative Research Centre
"Invasive Computing" (SFB/TR 89)
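The data-migration step for distributed arrays can be illustrated, in spirit, with a block redistribution sketch (Python stands in for X10 here; the function names and block layout are assumptions, not the paper's actual interface):

```python
def block_distribution(n, workers):
    """Map n array elements to `workers` contiguous blocks (an X10-style
    block distribution), returning a (start, end) pair per worker."""
    base, rem = divmod(n, workers)
    bounds, start = [], 0
    for w in range(workers):
        size = base + (1 if w < rem else 0)
        bounds.append((start, start + size))
        start += size
    return bounds

def migrate(data_per_worker, new_workers):
    """Redistribute element blocks after the resource manager grows or
    shrinks the worker set -- a sketch of what happens when resources
    are invaded or retreated from."""
    flat = [x for block in data_per_worker for x in block]
    return [flat[a:b] for a, b in block_distribution(len(flat), new_workers)]

# 10 elements on 2 workers; then the manager grants a third worker.
old = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
new = migrate(old, 3)   # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```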
Smolyak's algorithm: A powerful black box for the acceleration of scientific computations
We provide a general discussion of Smolyak's algorithm for the acceleration
of scientific computations. The algorithm first appeared in Smolyak's work on
multidimensional integration and interpolation. Since then, it has been
generalized in multiple directions and has been associated with the keywords:
sparse grids, hyperbolic cross approximation, combination technique, and
multilevel methods. Variants of Smolyak's algorithm have been employed in the
computation of high-dimensional integrals in finance, chemistry, and physics,
in the numerical solution of partial and stochastic differential equations, and
in uncertainty quantification. Motivated by this broad and ever-increasing
range of applications, we describe a general framework that summarizes
fundamental results and assumptions in a concise application-independent
manner.
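The combination-technique variant mentioned among the keywords can be sketched in two dimensions with nested trapezoidal rules; this toy instance (integrand and levels chosen for illustration) sums full tensor rules on two diagonals of the level grid with coefficients +1 and -1:

```python
import math

def trap_nodes_weights(level):
    """1-D trapezoidal rule on [0, 1] with 2**level + 1 points."""
    n = 2 ** level + 1
    h = 1.0 / (n - 1)
    nodes = [i * h for i in range(n)]
    weights = [h if 0 < i < n - 1 else h / 2 for i in range(n)]
    return nodes, weights

def tensor_quad(f, lx, ly):
    """Full tensor-product rule with levels (lx, ly)."""
    xs, wx = trap_nodes_weights(lx)
    ys, wy = trap_nodes_weights(ly)
    return sum(wx[i] * wy[j] * f(xs[i], ys[j])
               for i in range(len(xs)) for j in range(len(ys)))

def smolyak(f, L):
    """2-D combination technique: tensor rules on the diagonals
    |l| = L (coefficient +1) and |l| = L - 1 (coefficient -1)."""
    total = 0.0
    for lx in range(1, L):          # lx + ly = L, with lx, ly >= 1
        total += tensor_quad(f, lx, L - lx)
    for lx in range(1, L - 1):      # lx + ly = L - 1
        total -= tensor_quad(f, lx, L - 1 - lx)
    return total

f = lambda x, y: math.exp(x + y)    # exact integral: (e - 1)**2
approx = smolyak(f, 8)
```

The sum uses far fewer points than a full tensor grid of the same maximum level, which is the source of the acceleration discussed above.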
A Dimension-Adaptive Multi-Index Monte Carlo Method Applied to a Model of a Heat Exchanger
We present an adaptive version of the Multi-Index Monte Carlo method,
introduced by Haji-Ali, Nobile and Tempone (2016), for simulating PDEs with
coefficients that are random fields. A classical technique for sampling from
these random fields is the Karhunen-Loève expansion. Our adaptive algorithm
is based on the adaptive algorithm used in sparse grid cubature as introduced
by Gerstner and Griebel (2003), and automatically chooses the number of terms
needed in this expansion, as well as the required spatial discretizations of
the PDE model. We apply the method to a simplified model of a heat exchanger
with random insulator material, where the stochastic characteristics are
modeled as a lognormal random field, and we show consistent computational
savings.
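The adaptive driver of Gerstner and Griebel (2003) is, at its core, a greedy loop over error indicators; the following toy sketch illustrates that idea on a small anisotropic integral (the integrand, tolerance, and midpoint quadrature are invented here, and this is not the Multi-Index Monte Carlo estimator itself):

```python
import math

def tensor(f, levels):
    """Full tensor-product midpoint rule on [0,1]^d with per-dimension
    levels (2**level points in each dimension)."""
    def rec(dim, point):
        if dim == len(levels):
            return f(point)
        n = 2 ** levels[dim]
        return sum(rec(dim + 1, point + [(i + 0.5) / n]) for i in range(n)) / n
    return rec(0, [])

def dimension_adaptive(f, d, tol=1e-4, max_steps=20):
    """Greedy loop in the Gerstner-Griebel spirit: repeatedly refine the
    dimension whose refinement changes the result most, until all
    indicators fall below tol."""
    levels = [1] * d
    value = tensor(f, levels)
    for _ in range(max_steps):
        gains = []
        for k in range(d):
            trial = levels[:]
            trial[k] += 1
            v = tensor(f, trial)
            gains.append((abs(v - value), k, trial, v))
        gain, k, trial, v = max(gains)
        if gain < tol:
            break
        levels, value = trial, v
    return value, levels

# Anisotropic integrand: dimension 0 matters far more than dimension 2,
# so the loop should invest its refinements there.
f = lambda x: math.exp(x[0] + 0.1 * x[1] + 0.01 * x[2])
value, levels = dimension_adaptive(f, 3)
```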
The Superposition Principle: A Conceptual Perspective on Pedestrian Stream Simulations
Models using a superposition of scalar fields for navigation are prevalent in microscopic pedestrian stream simulations. However, classifications, differences, and similarities of models are not clear at the conceptual level of navigation mechanisms. In this paper, we describe the superposition of scalar fields as an approach to microscopic crowd modelling and corresponding motion schemes. We use this background discussion to focus on the similarities and differences of models, and find that many models make use of similar mechanisms for the navigation of virtual agents. In some cases, the differences between models can be reduced to differences between discretisation schemes. The interpretation of scalar fields varies across models, but most of the time this variation does not have a large impact on simulation outcomes. The conceptual analysis of different models of pedestrian dynamics allows for a better understanding of their capabilities and limitations and may lead to better model development and validation.
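A minimal sketch of the superposition-plus-motion-scheme idea discussed above: an agent descends the sum of an attractive target field and a repulsive obstacle field (all fields, positions, and constants here are invented for illustration, not taken from any particular model):

```python
def target_field(x, y):
    """Attractive scalar field: distance to the target at (9, 9)."""
    return ((x - 9.0) ** 2 + (y - 9.0) ** 2) ** 0.5

def obstacle_field(x, y):
    """Repulsive scalar field decaying around a point obstacle at (5, 5)."""
    d = ((x - 5.0) ** 2 + (y - 5.0) ** 2) ** 0.5
    return 3.0 / (d + 0.1)

def superposed(x, y):
    """The navigation field is simply the sum of the two scalar fields."""
    return target_field(x, y) + obstacle_field(x, y)

def step(x, y, h=0.2, eps=1e-4):
    """One motion step: move a fixed distance h downhill on the field,
    with the gradient estimated by central finite differences."""
    gx = (superposed(x + eps, y) - superposed(x - eps, y)) / (2 * eps)
    gy = (superposed(x, y + eps) - superposed(x, y - eps)) / (2 * eps)
    norm = (gx * gx + gy * gy) ** 0.5
    return x - h * gx / norm, y - h * gy / norm

# An agent starting off the obstacle-target axis detours around the obstacle.
traj = [(1.0, 2.0)]
for _ in range(100):
    traj.append(step(*traj[-1]))
```

Swapping the interpretation of the fields or the discretisation of `step` reproduces exactly the kind of model differences the paper classifies.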
A posteriori error analysis and adaptive non-intrusive numerical schemes for systems of random conservation laws
In this article we consider one-dimensional random systems of hyperbolic
conservation laws. We first establish existence and uniqueness of random
entropy admissible solutions for initial value problems of conservation laws
which involve random initial data and random flux functions. Based on these
results we present an a posteriori error analysis for a numerical approximation
of the random entropy admissible solution. For the stochastic discretization,
we consider a non-intrusive approach, the Stochastic Collocation method. The
spatio-temporal discretization relies on the Runge-Kutta Discontinuous
Galerkin method. We derive the a posteriori estimator using continuous
reconstructions of the discrete solution. Combined with the relative entropy
stability framework this yields computable error bounds for the entire
space-stochastic discretization error. The estimator admits a splitting into a
stochastic and a deterministic (space-time) part, allowing for a novel
residual-based space-stochastic adaptive mesh refinement algorithm. We conclude
with various numerical examples investigating the scaling properties of the
residuals and illustrating the efficiency of the proposed adaptive algorithm.
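The non-intrusive character of Stochastic Collocation is easy to illustrate on a much simpler random problem than a conservation-law system: the deterministic solver is called unchanged at a few collocation nodes, and the results are combined with quadrature weights (the ODE, the Euler solver, and the 3-point rule below are toy choices, not the paper's RKDG setup):

```python
def euler_solve(xi, n=1000):
    """Deterministic solver: explicit Euler for u' = -xi * u, u(0) = 1,
    integrated to t = 1.  The stochastic method never modifies it."""
    u, dt = 1.0, 1.0 / n
    for _ in range(n):
        u -= dt * xi * u
    return u

# 3-point Gauss-Legendre rule mapped to [0, 1] for a uniform random input xi.
s = (3.0 / 5.0) ** 0.5
nodes = [(1 - s) / 2, 0.5, (1 + s) / 2]
weights = [5.0 / 18.0, 8.0 / 18.0, 5.0 / 18.0]

# Collocation: evaluate the solver at each node, combine with the weights.
mean_u = sum(w * euler_solve(x) for w, x in zip(weights, nodes))
# Exact E[u(1)] = E[exp(-xi)] = 1 - exp(-1) ~ 0.6321
```

Because only pointwise solver runs are needed, the spatial and stochastic discretizations can be refined independently, which is what the adaptive splitting of the estimator exploits.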
Research and Education in Computational Science and Engineering
Over the past two decades the field of computational science and engineering
(CSE) has penetrated both basic and applied research in academia, industry, and
laboratories to advance discovery, optimize systems, support decision-makers,
and educate the scientific and engineering workforce. Informed by centuries of
theory and experiment, CSE performs computational experiments to answer
questions that neither theory nor experiment alone is equipped to answer. CSE
provides scientists and engineers of all persuasions with algorithmic
inventions and software systems that transcend disciplines and scales. Carried
on a wave of digital technology, CSE brings the power of parallelism to bear on
troves of data. Mathematics-based advanced computing has become a prevalent
means of discovery and innovation in essentially all areas of science,
engineering, technology, and society; and the CSE community is at the core of
this transformation. However, a combination of disruptive
developments---including the architectural complexity of extreme-scale
computing, the data revolution that engulfs the planet, and the specialization
required to follow the applications to new frontiers---is redefining the scope
and reach of the CSE endeavor. This report describes the rapid expansion of CSE
and the challenges to sustaining its bold advances. The report also presents
strategies and directions for CSE research and education for the next decade.

Comment: Major revision, to appear in SIAM Review.