57 research outputs found
Automatic Synthesis of Low-Complexity Translation Operators for the Fast Multipole Method
We demonstrate a new, hybrid symbolic-numerical method for the automatic
synthesis of all families of translation operators required for the execution
of the Fast Multipole Method (FMM). Our method is applicable in any
dimensionality and to any translation-invariant kernel. The Fast Multipole
Method, of course, is the leading approach for attaining linear complexity in
the evaluation of long-range (e.g. Coulomb) many-body interactions. Low
complexity in translation operators for the Fast Multipole Method (FMM) is
usually achieved by algorithms specialized for a potential obeying a specific
partial differential equation (PDE). Absent a PDE or specialized algorithms,
Taylor series based FMMs or kernel-independent FMM have been used, at
asymptotically higher expense.
When symbolically provided with a constant-coefficient elliptic PDE obeyed by
the potential, our algorithm can automatically synthesize translation operators
requiring $\mathcal{O}(p^{d})$ operations, where $p$ is the expansion order and
$d$ is dimension, compared with $\mathcal{O}(p^{2d})$ operations in a naive
approach carried out on (Cartesian) Taylor expansions. This is achieved by
using a compression scheme that asymptotically reduces the number of terms in
the Taylor expansion and then operating directly on this "compressed"
representation. Judicious exploitation of shared subexpressions permits
formation, translation, and evaluation of local and multipole expansions to be
performed in $\mathcal{O}(p^{d})$ operations, while an FFT-based scheme permits
multipole-to-local translations in $\mathcal{O}(p^{d-1}\log p)$ operations. We
demonstrate computational scaling of code generation and evaluation as well as
numerical accuracy through numerical experiments on a number of potentials from
classical physics.
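To make the term-count reduction concrete, here is a minimal Python sketch, assuming the Laplace equation as the PDE (the method described above handles general constant-coefficient elliptic PDEs, and this is not its code generator). Because a harmonic potential satisfies d_1^2 = -(d_2^2 + ... + d_d^2), every Taylor coefficient whose first index is at least 2 can be rewritten in terms of coefficients whose first index is 0 or 1. The helper names `taylor_multi_indices` and `laplace_compressed_indices` are hypothetical.

```python
from itertools import product


def taylor_multi_indices(p: int, d: int) -> list[tuple[int, ...]]:
    """All multi-indices alpha in N^d with |alpha| <= p.

    Each one labels a coefficient of a Cartesian Taylor expansion of order p,
    so the count grows like p^d.
    """
    return [a for a in product(range(p + 1), repeat=d) if sum(a) <= p]


def laplace_compressed_indices(p: int, d: int) -> list[tuple[int, ...]]:
    """Multi-indices kept after eliminating derivatives with the Laplace PDE.

    For a harmonic function, d_1^2 = -(d_2^2 + ... + d_d^2), so any derivative
    with alpha[0] >= 2 can be rewritten (recursively, at the same total order)
    in terms of derivatives with alpha[0] in {0, 1}. Only those are stored,
    and their count grows like p^(d-1).
    """
    return [a for a in taylor_multi_indices(p, d) if a[0] < 2]


if __name__ == "__main__":
    for p in (4, 8, 16):
        full = len(taylor_multi_indices(p, 3))
        kept = len(laplace_compressed_indices(p, 3))
        print(f"p={p:2d}  full={full:4d}  compressed={kept:4d}")
```

In three dimensions the compressed count is (p+1)^2, the familiar size of a spherical-harmonic expansion, while the full Taylor count is (p+1)(p+2)(p+3)/6.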
Algorithmic and Code Optimizations of Molecular Dynamics Simulations for Process Engineering
The focus of this work lies on implementation improvements and, in particular, node-level performance optimization of the simulation software ls1-mardyn. Through data structure improvements, SIMD vectorization and, especially, OpenMP parallelization, the world's first simulation of 2×10^13 molecules at over 1 PFLOP/s was enabled. To allow for long-range interactions, the Fast Multipole Method was introduced into ls1-mardyn. The algorithm was optimized for sequential, shared-memory, and distributed-memory execution on up to 32,768 MPI processes.
GPU fast multipole method with lambda-dynamics features
A significant and the most computationally demanding part of molecular dynamics (MD) simulations is the calculation of long-range electrostatic interactions. Such interactions can be evaluated directly by the naïve pairwise summation algorithm, which is a ubiquitous showcase example for the compute power of graphics processing units (GPUs). However, pairwise summation has O(N^2) computational complexity for N interacting particles; thus, an approximation method with better scaling is required. Today, the prevalent approximation method in the field is particle mesh Ewald (PME). PME takes advantage of fast Fourier transforms (FFTs) to approximate the solution efficiently. However, because the underlying FFTs require all-to-all communication between ranks, PME runs into a communication bottleneck. This communication overhead is negligible only for moderate parallelization. With increased parallelization, as needed for high-performance applications, PME becomes unprofitable. Another drawback of PME is its inability to perform constant-pH simulations efficiently. In such simulations, the protonation states of a protein are allowed to change dynamically during the simulation. Describing this process requires a separate evaluation of the energies for each protonation state. This cannot be computed efficiently with PME, because the algorithm requires a repeated FFT for each state, which leads to an overhead linear in the number of states. For a fast approximation of pairwise Coulomb interactions that does not suffer from these PME drawbacks, the Fast Multipole Method (FMM) has been implemented and fully parallelized with CUDA. To ensure optimal FMM performance for diverse MD systems, multiple parallelization strategies have been developed. The algorithm has been efficiently incorporated into GROMACS and subsequently tested to determine the optimal FMM parameter set for MD simulations.
Finally, the FMM has been incorporated into GROMACS to allow for out-of-the-box electrostatic calculations. The performance of the single-GPU FMM implementation, tested in GROMACS 2019, reaches about a third of the performance of highly optimized CUDA PME when simulating systems with uniform particle distributions. However, the FMM is expected to outperform PME at high parallelization because its global communication overhead is minimal compared with that of PME. Further, the FMM has been extended to provide the energies of an arbitrary number of titratable sites, as needed in the constant-pH method. The extension is not fully optimized yet, but the first results show the strength of the FMM for constant-pH simulations. For a relatively large system with half a million particles and more than a hundred titratable sites, a straightforward approach to computing the alternative energies requires repeating the simulation for each state of the sites. The FMM calculates all energy terms only 1.5 times slower than a single simulation step. Further improvements of the GPU implementation are expected to yield even more speedup compared with the current implementation.
2021-11-1
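The O(N^2) pairwise baseline mentioned above can be sketched in a few lines of Python, in reduced units with the Coulomb constant set to 1 (an assumption for brevity). The third helper illustrates, under the simplifying assumption of a single titratable site whose state changes only its own charge, why alternative-state energies need not require a full re-evaluation: the total energy is linear in that site's charge, so one potential evaluation serves every state. This is neither the GROMACS FMM nor its lambda-dynamics extension, and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def coulomb_energy(positions: Sequence[Point], charges: Sequence[float]) -> float:
    """Naive pairwise Coulomb energy, O(N^2), reduced units (k_e = 1)."""
    n = len(positions)
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):  # visit each unordered pair exactly once
            energy += charges[i] * charges[j] / math.dist(positions[i], positions[j])
    return energy


def potential_at(site: int, positions: Sequence[Point], charges: Sequence[float]) -> float:
    """Electrostatic potential at particle `site` from all other particles, O(N)."""
    return sum(
        charges[j] / math.dist(positions[site], positions[j])
        for j in range(len(positions))
        if j != site
    )


def energies_for_site_states(site, state_charges, positions, charges):
    """Total energy for each candidate charge of one site.

    E depends on the site charge q only through q * phi_site, so
    E(q_new) = E(q_old) + (q_new - q_old) * phi_site.
    """
    base = coulomb_energy(positions, charges)
    phi = potential_at(site, positions, charges)
    return [base + (q - charges[site]) * phi for q in state_charges]


if __name__ == "__main__":
    pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    q = [1.0, -1.0, 1.0]
    print(coulomb_energy(pos, q))
    print(energies_for_site_states(0, [1.0, 0.0], pos, q))
```

With many interacting titratable sites the bookkeeping is more involved, since site-site terms change together; the sketch only shows the linearity that makes per-state energies cheap once potentials are known.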
Integral-equation-based fast algorithms and graph-theoretic methods for large-scale simulations
In this dissertation, we extend Greengard and Rokhlin's seminal work on the fast multipole method (FMM) in three aspects. First, we have implemented and released new open-source versions of FMM solvers for the Laplace, Yukawa, and low-frequency Helmholtz equations to broaden and facilitate the application of the FMM in different scientific fields. Second, we propose a graph-theoretic parallelization scheme to map the FMM onto modern parallel computer architectures. In particular, we establish a critical path analysis, an exponential node growth condition for concurrency breadth, and a spatio-temporal graph partition strategy. Third, we introduce a new kernel-independent FMM based on Fourier series expansions and discuss how information can be collected, compressed, and transmitted through the tree structure for a wide class of kernels.
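As a rough illustration of what a critical path analysis measures, the following Python sketch computes the longest cost-weighted chain through a task DAG using the standard-library `graphlib.TopologicalSorter`. The toy FMM task graph, its task names, and its costs are hypothetical and invented for illustration; the dissertation's own analysis (node growth condition, spatio-temporal partitioning) is not reproduced here.

```python
from graphlib import TopologicalSorter

# Hypothetical toy FMM task graph: names and costs are invented for illustration.
TOY_DURATIONS = {
    "P2M_a": 1.0, "P2M_b": 1.0,   # form multipoles in two leaf cells
    "M2M_root": 2.0,              # shift them to the parent cell
    "M2L": 4.0,                   # far-field translation
    "L2L": 2.0,                   # push the local expansion down
    "L2P_a": 1.0, "L2P_b": 1.0,   # evaluate at particles
    "P2P": 3.0,                   # near-field direct sum, independent of the far field
}
TOY_DEPS = {
    "M2M_root": {"P2M_a", "P2M_b"},
    "M2L": {"M2M_root"},
    "L2L": {"M2L"},
    "L2P_a": {"L2L"},
    "L2P_b": {"L2L"},
    "P2P": set(),
}


def critical_path(durations, deps):
    """Longest cost-weighted path through a task DAG.

    `deps` maps each task to the set of tasks it must wait for. Returns the
    path length (a lower bound on runtime with unlimited, overhead-free
    workers) and the path itself.
    """
    finish, pred = {}, {}
    for task in TopologicalSorter(deps).static_order():
        start, pred[task] = 0.0, None
        for d in deps.get(task, ()):
            if finish[d] > start:  # the latest-finishing prerequisite bounds the start
                start, pred[task] = finish[d], d
        finish[task] = start + durations[task]
    node = max(finish, key=finish.get)
    length, path = finish[node], []
    while node is not None:
        path.append(node)
        node = pred[node]
    return length, path[::-1]


if __name__ == "__main__":
    span, path = critical_path(TOY_DURATIONS, TOY_DEPS)
    work = sum(TOY_DURATIONS.values())
    print(span, " -> ".join(path), "max speedup:", work / span)
```

Here the span is 10 cost units against 15 units of total work, so even unlimited parallelism could speed up this toy graph by at most 1.5×; the upward-and-downward chain through M2L dominates.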
Simulating cosmic structure formation with the GADGET-4 code
Numerical methods have become a powerful tool for research in astrophysics,
but their utility depends critically on the availability of suitable simulation
codes. This calls for continuous efforts in code development, which is
necessitated also by the rapidly evolving technology underlying today's
computing hardware. Here we discuss recent methodological progress in the
GADGET code, which has been widely applied in cosmic structure formation over
the past two decades. The new version offers improvements in force accuracy, in
time-stepping, in adaptivity to a large dynamic range in timescales, in
computational efficiency, and in parallel scalability through a special
MPI/shared-memory parallelization and communication strategy, and a
more sophisticated domain decomposition algorithm. A manifestly momentum
conserving fast multipole method (FMM) can be employed as an alternative to the
one-sided TreePM gravity solver introduced in earlier versions. Two different
flavours of smoothed particle hydrodynamics, a classic entropy-conserving
formulation and a pressure-based approach, are supported for dealing with
gaseous flows. The code is able to cope with very large problem sizes, thus
allowing accurate predictions for cosmic structure formation in support of
future precision tests of cosmology, and at the same time is well adapted to
high dynamic range zoom-calculations with extreme variability of the particle
number density in the simulated volume. The GADGET-4 code is publicly released
to the community and contains infrastructure for on-the-fly group and
substructure finding and tracking, as well as merger tree building, a simple
model for radiative cooling and star formation, a high dynamic range power
spectrum estimator, and an initial conditions generator based on second-order
Lagrangian perturbation theory.
Comment: 82 pages, 65 figures, accepted by MNRAS, for the code see
https://wwwmpa.mpa-garching.mpg.de/gadget
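Why symmetric interactions make momentum conservation "manifest" can be shown with a direct-summation sketch in Python (assumptions: G = 1, Plummer-type softening, hypothetical function names). When every pair contributes +f to one body and exactly -f to the other, the forces cancel in the sum regardless of how f itself was approximated. An FMM that applies each cell-cell interaction symmetrically inherits this property, whereas a one-sided tree walk, in which each particle evaluates its own force independently, does not enforce it. This O(N^2) loop is not GADGET-4's FMM.

```python
import math
import random


def pairwise_gravity(positions, masses, G=1.0, softening=0.01):
    """Softened Newtonian forces by direct summation over unordered pairs.

    Each pair's force is computed once and applied with opposite signs to the
    two bodies (Newton's third law), so the forces sum to zero up to rounding.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(c * c for c in dx) + softening ** 2
            scale = G * masses[i] * masses[j] / (r2 * math.sqrt(r2))
            for k in range(3):
                f = scale * dx[k]   # attraction of i toward j
                forces[i][k] += f
                forces[j][k] -= f   # equal and opposite reaction on j
    return forces


def net_force(forces):
    """Sum of all forces; zero total force means total momentum is conserved."""
    return [sum(f[k] for f in forces) for k in range(3)]


if __name__ == "__main__":
    rng = random.Random(42)
    pos = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(50)]
    m = [rng.uniform(0.5, 2.0) for _ in range(50)]
    print(net_force(pairwise_gravity(pos, m)))
```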
Gratings: Theory and Numeric Applications
The book contains 11 chapters written by an international team of specialists in electromagnetic theory and in numerical methods for modelling light diffraction by periodic structures with one-, two-, or three-dimensional periodicity, targeting numerous applications in classical domains such as optical engineering, spectroscopy, and optical telecommunications, as well as in emerging fields such as photonics, plasmonics, photovoltaics, metamaterials, cloaking, negative refraction, and super-lensing. Each chapter presents in detail a specific theoretical method geared toward direct numerical application by university and industrial researchers and engineers.
Gratings: Theory and Numeric Applications, Second Revisited Edition
The second edition of the book contains 13 chapters, written by an international team of specialists in electromagnetic theory and in numerical methods for modelling light diffraction by periodic structures with one-, two-, or three-dimensional periodicity, targeting numerous applications in classical domains such as optical engineering, spectroscopy, and optical telecommunications, as well as in emerging fields such as photonics, plasmonics, photovoltaics, metamaterials, cloaking, negative refraction, and super-lensing. Each chapter presents in detail a specific theoretical method geared toward direct numerical application by university and industrial researchers and engineers. In comparison with the first edition, we have added two more chapters (ch. 12 and ch. 13) and revised four other chapters (ch. 6, ch. 7, ch. 10, and ch. 11).
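Whatever rigorous method a chapter uses for the field inside the grating, the directions of the propagating diffraction orders follow from the grating equation alone. Below is a minimal Python sketch, assuming a grating in air and the sign convention sin(theta_m) = sin(theta_i) + m·λ/Λ (texts differ on signs); it says nothing about diffraction efficiencies, which is what the numerical methods in the book compute. The function name is hypothetical.

```python
import math


def propagating_orders(wavelength, period, incidence_deg, max_order=100):
    """Diffraction angles (degrees) of the propagating orders of a 1D grating.

    Uses sin(theta_m) = sin(theta_i) + m * wavelength / period for a grating in
    air. Orders with |sin(theta_m)| > 1 are evanescent and are omitted.
    `wavelength` and `period` must be in the same unit.
    """
    s_i = math.sin(math.radians(incidence_deg))
    orders = {}
    for m in range(-max_order, max_order + 1):
        s_m = s_i + m * wavelength / period
        if abs(s_m) <= 1.0:
            orders[m] = math.degrees(math.asin(s_m))
    return orders


if __name__ == "__main__":
    # 600 nm light on a 1000 nm period grating at normal incidence.
    for m, angle in sorted(propagating_orders(0.6, 1.0, 0.0).items()):
        print(f"order {m:+d}: {angle:6.2f} deg")
```

In this example only orders -1, 0, and +1 propagate; the ±2 orders would need sin(theta) = ±1.2 and are therefore evanescent.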
Annual Research Briefs: 1995
This report contains the 1995 annual progress reports of the Research Fellows and students of the Center for Turbulence Research (CTR). In 1995, CTR continued its concentration on the development and application of large-eddy simulation (LES) to complex flows, the development of novel modeling concepts for engineering computations in the Reynolds-averaged framework, and turbulent combustion. In large-eddy simulation, a number of numerical and experimental issues have surfaced, which are being addressed. The first group of reports in this volume is on large-eddy simulation. A key finding in this area was the revelation of possibly significant numerical errors that may overwhelm the effects of the subgrid-scale model. We also commissioned a new experiment to support the LES validation studies. The remaining articles in this report are concerned with Reynolds-averaged modeling, studies of turbulence physics and flow-generated sound, combustion, and simulation techniques. Fundamental studies of turbulent combustion using direct numerical simulations, which started at CTR, will continue to be emphasized. These studies and their counterparts carried out during the summer programs have had a noticeable impact on combustion research worldwide.
On non-Gaussian beams and optomechanical parametric instabilities in interferometric gravitational wave detectors
Direct detection of gravitational radiation, predicted by Einstein's general theory of relativity, remains one of the most exciting challenges in experimental physics. Due to their relatively weak interaction with matter, gravitational waves promise to allow exploration of hitherto inaccessible objects and epochs. Unfortunately, this weak coupling also hinders detection: strain amplitudes at the Earth are estimated to be of order 10^-21.
Due to their wide bandwidth and theoretical sensitivity, kilometre-scale Michelson-style interferometers have become the preferred instrument with which to attempt ground-based detection. A worldwide network of first-generation instruments has been constructed and prodigious volumes of data recorded. Despite each instrument approaching or having reached its design sensitivity, a confirmed detection remains elusive.
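To see what a strain of order 10^-21 means for a kilometre-scale instrument, the differential arm-length change is, to order of magnitude, ΔL ≈ hL. A quick Python check, assuming a 4 km arm as in the LIGO detectors (the abstract itself only says "kilometre-scale"); the helper name is hypothetical.

```python
def arm_length_change(strain: float, arm_length_m: float) -> float:
    """Order-of-magnitude differential arm-length change, delta_L ~ h * L (metres)."""
    return strain * arm_length_m


if __name__ == "__main__":
    # Assumed values: h ~ 1e-21 (from the abstract), L = 4 km (LIGO-like arm).
    delta = arm_length_change(1e-21, 4_000.0)
    print(f"delta_L ~ {delta:.0e} m")
```

The result, about 4×10^-18 m, is roughly 200 times smaller than the radius of a proton (about 0.84×10^-15 m), which is why displacement noise sources such as mirror thermal noise matter so much.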
Planned upgrades to these instruments aim to increase strain sensitivity by an order of magnitude, commencing the era of second-generation detectors. Entry into this regime will be accompanied by an entirely new set of challenges, two of which are addressed in this work.
As advanced interferometers are commissioned, instrumental artifacts will give way to fundamental noise sources. In the region of peak sensitivity it is expected that thermal noise in the interferometers' dielectric mirror coatings will be the principal source of displacement noise. Theory suggests that increasing the spot size of laser light incident on these mirrors will reduce the measured thermal noise. In the first part of this work we examine one method of realising larger spots.
By adopting non-spherical mirrors in the interferometers' arms it is possible to create resonators which support a wide, flat-topped field known as the mesa beam. This beam has been shown to theoretically reduce all forms of mirror thermal noise without being significantly more difficult to control. In this work we investigate these claims using a bespoke prototype mirror. The first results regarding a non-Gaussian beam created in a manner applicable to a gravitational wave interferometer are presented.
A common theme among all second-generation interferometer designs is the desire to maximise circulating power. This increased power is accompanied by commensurately increased thermal perturbations. Since the attractive properties of the mesa beam are produced by the fine structure of its supporting mirrors, it is important that we understand the effects absorption of stored optical power could have on mesa fields. In the second part of this work we report on numerical evaluations of measured thermal noise and mesa beam intensity profile as a function of absorbed power.
Increased optical power also has less obvious consequences. As a result of radiation pressure, there exists a pathway between optical energy stored in an interferometer's arms and mechanical energy stored in the acoustic modes of its test masses. Under appropriate conditions, this coupling can excite one or more test masses to such a degree that interferometer operation becomes impossible. In the final part of this work we determine whether it is possible to mitigate these parametric instabilities using electrostatic actuators originally designed to control the position and orientation of the test masses.