57 research outputs found

    Automatic Synthesis of Low-Complexity Translation Operators for the Fast Multipole Method

    We demonstrate a new, hybrid symbolic-numerical method for the automatic synthesis of all families of translation operators required for the execution of the Fast Multipole Method (FMM). Our method is applicable in any dimensionality and to any translation-invariant kernel. The Fast Multipole Method, of course, is the leading approach for attaining linear complexity in the evaluation of long-range (e.g., Coulomb) many-body interactions. Low complexity in FMM translation operators is usually achieved by algorithms specialized for a potential obeying a specific partial differential equation (PDE). Absent a PDE or specialized algorithms, Taylor-series-based FMMs or kernel-independent FMMs have been used, at asymptotically higher expense. When symbolically provided with a constant-coefficient elliptic PDE obeyed by the potential, our algorithm can automatically synthesize translation operators requiring $\mathrm{O}(p^d)$ operations, where $p$ is the expansion order and $d$ is the dimension, compared with $\mathrm{O}(p^{2d})$ operations in a naive approach carried out on (Cartesian) Taylor expansions. This is achieved by using a compression scheme that asymptotically reduces the number of terms in the Taylor expansion and then operating directly on this "compressed" representation. Judicious exploitation of shared subexpressions permits formation, translation, and evaluation of local and multipole expansions to be performed in $\mathrm{O}(p^d)$ operations, while an FFT-based scheme permits multipole-to-local translations in $\mathrm{O}(p^{d-1}\log p)$ operations. We demonstrate computational scaling of code generation and evaluation as well as numerical accuracy through numerical experiments on a number of potentials from classical physics.
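The coefficient savings from compression can be illustrated with a small counting sketch. It assumes the Laplace equation, where $\partial_1^2 u = -(\partial_2^2 + \dots + \partial_d^2)u$ lets every Taylor coefficient with $\alpha_1 \ge 2$ be rewritten in terms of coefficients with $\alpha_1 \le 1$. This shows only the counting argument behind a compressed representation, not the paper's synthesis algorithm, and the function names are hypothetical.

```python
from itertools import product
from math import comb


def taylor_indices(d, p):
    """All multi-indices alpha in N^d with total order |alpha| <= p."""
    return [a for a in product(range(p + 1), repeat=d) if sum(a) <= p]


def laplace_compressed_indices(d, p):
    """Multi-indices kept after eliminating d_1^2 via Laplace's equation.

    For a harmonic potential, d_1^2 u = -(d_2^2 + ... + d_d^2) u, so every
    derivative with alpha_1 >= 2 is a linear combination of derivatives
    with alpha_1 <= 1. Only the latter need to be stored.
    """
    return [a for a in taylor_indices(d, p) if a[0] <= 1]


def harmonic_dim(d, n):
    """Dimension of homogeneous harmonic polynomials of degree n in d variables."""
    def c(k):  # number of monomials of degree k in d variables (0 if k < 0)
        return comb(k + d - 1, d - 1) if k >= 0 else 0
    return c(n) - c(n - 2)


if __name__ == "__main__":
    d = 3
    for p in (4, 8, 16):
        full = len(taylor_indices(d, p))
        compressed = len(laplace_compressed_indices(d, p))
        # For d = 3: full = C(p+3, 3), compressed = (p+1)^2.
        print(f"p={p:2d}: full={full:4d}, compressed={compressed:4d}")
```

For $d = 3$ the full expansion stores $\binom{p+3}{3} = \Theta(p^3)$ coefficients, while the compressed set stores $(p+1)^2 = \Theta(p^2)$, which matches the dimension of the space of harmonic polynomials of degree at most $p$.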

    Algorithmische und Code-Optimierungen Molekulardynamiksimulationen für Verfahrenstechnik (Algorithmic and Code Optimizations of Molecular Dynamics Simulations for Process Engineering)

    The focus of this work lies on implementation improvements and, in particular, node-level performance optimization of the simulation software ls1-mardyn. Through data structure improvements, SIMD vectorization and, especially, OpenMP parallelization, the world's first simulation of $2 \times 10^{13}$ molecules at over 1 PFLOP/s was enabled. To allow for long-range interactions, the Fast Multipole Method was introduced to ls1-mardyn. The algorithm was optimized for sequential, shared-memory, and distributed-memory execution on up to 32,768 MPI processes.

    GPU fast multipole method with lambda-dynamics features

    A significant and computationally most demanding part of molecular dynamics (MD) simulations is the calculation of long-range electrostatic interactions. Such interactions can be evaluated directly by the naïve pairwise summation algorithm, which is a ubiquitous showcase example for the compute power of graphics processing units (GPUs). However, pairwise summation has O(N^2) computational complexity for N interacting particles; thus, an approximation method with better scaling is required. Today, the prevalent approximation method in the field is particle mesh Ewald (PME). PME takes advantage of fast Fourier transforms (FFTs) to approximate the solution efficiently. However, as the underlying FFTs require all-to-all communication between ranks, PME runs into a communication bottleneck. This communication overhead is negligible only at moderate levels of parallelization; with increased parallelization, as needed for high-performance applications, PME becomes unprofitable. Another drawback of PME is its inability to perform constant-pH simulations efficiently. In such simulations, the protonation states of a protein are allowed to change dynamically during the simulation. Describing this process requires a separate evaluation of the energies for each protonation state. This cannot be done efficiently with PME, because the algorithm requires a repeated FFT for each state, which leads to an overhead linear in the number of states. For a fast approximation of pairwise Coulomb interactions that does not suffer from these drawbacks, the Fast Multipole Method (FMM) has been implemented and fully parallelized with CUDA. To ensure optimal FMM performance for diverse MD systems, multiple parallelization strategies have been developed. The algorithm has been efficiently incorporated into GROMACS and subsequently tested to determine the optimal FMM parameter set for MD simulations.
Finally, the FMM has been incorporated into GROMACS to allow for out-of-the-box electrostatic calculations. The single-GPU FMM implementation, tested in GROMACS 2019, reaches about one third of the performance of the highly optimized CUDA PME when simulating systems with uniform particle distributions. However, the FMM is expected to outperform PME at high parallelization, because its global communication overhead is minimal compared to that of PME. Further, the FMM has been extended to provide the energies of an arbitrary number of titratable sites, as needed in the constant-pH method. The extension is not yet fully optimized, but first results show the strength of the FMM for constant-pH simulations. For a relatively large system with half a million particles and more than a hundred titratable sites, a straightforward approach to computing alternative energies requires repeating the simulation for each state of the sites, whereas the FMM calculates all energy terms only a factor of 1.5 slower than a single simulation step. Further improvements of the GPU implementation are expected to yield even more speedup compared with the current implementation.
2021-11-1
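The O(N^2) direct summation that serves as the baseline above can be sketched in a few lines. This minimal NumPy version assumes reduced units (Coulomb constant 1 by default), no periodic boundary conditions, and no cutoff or exclusions, so it is not how GROMACS computes electrostatics. The usage lines also mimic the naive constant-pH cost: each alternative charge set (here a hypothetical single titratable site) triggers a complete recomputation.

```python
import numpy as np


def coulomb_energy_direct(positions, charges, k_e=1.0):
    """Naive O(N^2) Coulomb energy, summing over all unordered pairs i < j.

    positions: (N, 3) float array; charges: (N,) float array.
    k_e: Coulomb constant in the caller's unit system (1.0 = reduced units).
    No periodic images, cutoffs, or exclusions are applied.
    """
    n = len(charges)
    energy = 0.0
    for i in range(n - 1):
        # Distances from particle i to all j > i: N(N-1)/2 pairs in total.
        r = np.linalg.norm(positions[i + 1:] - positions[i], axis=1)
        energy += charges[i] * np.sum(charges[i + 1:] / r)
    return k_e * energy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    pos = rng.uniform(0.0, 10.0, size=(n, 3))
    base = rng.choice([-1.0, 1.0], size=n)

    # Hypothetical titratable site: particle 0 is either charged or neutral.
    states = [base, np.where(np.arange(n) == 0, 0.0, base)]

    # Naive approach: one full O(N^2) evaluation per protonation state,
    # so the total cost grows linearly with the number of states.
    energies = [coulomb_energy_direct(pos, q) for q in states]
    print(energies)
```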

    Integral-equation-based fast algorithms and graph-theoretic methods for large-scale simulations

    In this dissertation, we extend Greengard and Rokhlin's seminal work on the fast multipole method (FMM) in three respects. First, we have implemented and released new open-source versions of FMM solvers for the Laplace, Yukawa, and low-frequency Helmholtz equations to further broaden and facilitate the application of the FMM in different scientific fields. Second, we propose a graph-theoretic parallelization scheme to map the FMM onto modern parallel computer architectures. In particular, we have established a critical path analysis, an exponential node growth condition for concurrency breadth, and a spatio-temporal graph partition strategy. Third, we introduce a new kernel-independent FMM based on Fourier series expansions and discuss how information can be collected, compressed, and transmitted through the tree structure for a wide class of kernels.
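A critical path analysis of this kind looks for the longest cost-weighted chain of dependent tasks, which bounds the runtime of any parallel schedule from below. The sketch below computes it for a generic task DAG using Kahn's topological sort; the FMM-flavored task names and costs are hypothetical, and this is not the dissertation's own analysis or partitioning scheme.

```python
from collections import defaultdict, deque


def critical_path(costs, edges):
    """Critical (longest cost-weighted) path of a task DAG.

    costs: dict mapping task -> execution cost.
    edges: iterable of (u, v) pairs meaning task v depends on task u.
    Returns (length, path). With unlimited processors no schedule can finish
    faster than `length`, so sum(costs.values()) / length bounds the speedup.
    """
    succ = defaultdict(list)
    indeg = {t: 0 for t in costs}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    start = {t: 0.0 for t in costs}  # earliest start time of each task
    finish, pred = {}, {}
    ready = deque(t for t, k in indeg.items() if k == 0)
    while ready:  # process tasks in topological order
        u = ready.popleft()
        finish[u] = start[u] + costs[u]
        for v in succ[u]:
            # v cannot start before its latest-finishing predecessor.
            if v not in pred or finish[u] > start[v]:
                start[v], pred[v] = finish[u], u
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(finish) != len(costs):
        raise ValueError("task graph contains a cycle")

    end = max(finish, key=finish.get)
    path = [end]
    while path[-1] in pred:
        path.append(pred[path[-1]])
    return finish[end], path[::-1]


if __name__ == "__main__":
    # Hypothetical FMM-like task graph: upward pass, M2L, downward pass.
    costs = {"P2M_a": 2, "P2M_b": 3, "M2M": 1, "M2L": 4,
             "L2L": 1, "L2P_a": 2, "L2P_b": 2}
    edges = [("P2M_a", "M2M"), ("P2M_b", "M2M"), ("M2M", "M2L"),
             ("M2L", "L2L"), ("L2L", "L2P_a"), ("L2L", "L2P_b")]
    length, path = critical_path(costs, edges)
    print(length, path)  # 11.0 ['P2M_b', 'M2M', 'M2L', 'L2L', 'L2P_a']
```

Here the total work is 15 units and the critical path is 11, so no schedule on any number of processors can achieve a speedup above 15/11 on this toy graph.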

    Simulating cosmic structure formation with the GADGET-4 code

    Numerical methods have become a powerful tool for research in astrophysics, but their utility depends critically on the availability of suitable simulation codes. This calls for continuous efforts in code development, which is necessitated also by the rapidly evolving technology underlying today's computing hardware. Here we discuss recent methodological progress in the GADGET code, which has been widely applied to cosmic structure formation over the past two decades. The new version offers improvements in force accuracy, in time-stepping, in adaptivity to a large dynamic range in timescales, in computational efficiency, and in parallel scalability through a special MPI/shared-memory parallelization and communication strategy, and a more sophisticated domain decomposition algorithm. A manifestly momentum-conserving fast multipole method (FMM) can be employed as an alternative to the one-sided TreePM gravity solver introduced in earlier versions. Two different flavours of smoothed particle hydrodynamics, a classic entropy-conserving formulation and a pressure-based approach, are supported for dealing with gaseous flows. The code is able to cope with very large problem sizes, thus allowing accurate predictions for cosmic structure formation in support of future precision tests of cosmology, and at the same time is well adapted to high-dynamic-range zoom calculations with extreme variability of the particle number density in the simulated volume. The GADGET-4 code is publicly released to the community and contains infrastructure for on-the-fly group and substructure finding and tracking, as well as merger tree building, a simple model for radiative cooling and star formation, a high-dynamic-range power spectrum estimator, and an initial conditions generator based on second-order Lagrangian perturbation theory.
    Comment: 82 pages, 65 figures, accepted by MNRAS, for the code see https://wwwmpa.mpa-garching.mpg.de/gadget
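The distinction between a one-sided tree walk and a momentum-conserving scheme comes down to Newton's third law: if the force on particle i from particle j is approximated separately from the force on j from i, the two approximations need not cancel, and total momentum can drift. The toy sketch below uses direct summation and applies each pair force once with opposite signs, so the forces sum to zero up to round-off. An FMM typically obtains the same property by treating interactions between pairs of tree nodes symmetrically; this sketch does not attempt that, and the Plummer softening and G = 1 units are assumptions, not GADGET-4's choices.

```python
import numpy as np


def pairwise_gravity_forces(pos, mass, G=1.0, eps=0.01):
    """Direct-sum softened gravity that applies each pair force once, +f and -f.

    pos: (N, 3) array; mass: (N,) array. Plummer softening length eps
    avoids the singularity at r = 0. Because every interaction is added to
    particle i and subtracted from particle j, the total force vanishes up
    to round-off, i.e. total momentum is conserved.
    """
    n = len(mass)
    forces = np.zeros(pos.shape, dtype=float)
    for i in range(n):
        for j in range(i + 1, n):
            dx = pos[j] - pos[i]
            r2 = dx @ dx + eps**2
            f = G * mass[i] * mass[j] * dx / r2**1.5  # force on i due to j
            forces[i] += f
            forces[j] -= f  # Newton's third law: equal and opposite
    return forces


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    pos = rng.normal(size=(50, 3))
    mass = rng.uniform(0.5, 2.0, size=50)
    total = pairwise_gravity_forces(pos, mass).sum(axis=0)
    print("net force (should be ~0):", total)
```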

    Gratings: Theory and Numeric Applications

    The book contains 11 chapters written by an international team of specialists in electromagnetic theory and in numerical methods for modelling light diffraction by periodic structures with one-, two-, or three-dimensional periodicity, aimed at numerous applications in classical domains such as optical engineering, spectroscopy, and optical telecommunications, as well as newly emerging fields such as photonics, plasmonics, photovoltaics, metamaterials, cloaking, negative refraction, and super-lensing. Each chapter presents in detail a specific theoretical method aimed at direct numerical application by university and industrial researchers and engineers.
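The propagation directions of the diffracted orders follow from the grating equation; determining how much power goes into each order is what requires the rigorous electromagnetic methods described above. The sketch assumes a one-dimensional grating under in-plane incidence and the sign convention $n_{\text{out}} \sin\theta_m = n_{\text{in}} \sin\theta_i + m\lambda/\Lambda$ (other texts flip the sign of $m$ or of the angles); for reflected orders, set `n_out = n_in`. The function name and example values are hypothetical.

```python
import math


def propagating_orders(wavelength, period, theta_i_deg, n_in=1.0, n_out=1.0):
    """Propagating diffraction orders of a 1D grating (in-plane incidence).

    Grating equation: n_out*sin(theta_m) = n_in*sin(theta_i) + m*wavelength/period,
    with the vacuum wavelength and the period in the same length unit.
    Returns {m: theta_m in degrees} for every order with |sin(theta_m)| <= 1;
    all other orders are evanescent.
    """
    s_i = n_in * math.sin(math.radians(theta_i_deg))
    # |s_i + m*wavelength/period| <= n_out bounds the range of m to scan.
    m_max = int(math.ceil((n_out + abs(s_i)) * period / wavelength))
    orders = {}
    for m in range(-m_max, m_max + 1):
        s = (s_i + m * wavelength / period) / n_out
        if abs(s) <= 1.0:
            orders[m] = math.degrees(math.asin(s))
    return orders


if __name__ == "__main__":
    # Hypothetical example: 0.633 um light, 1.2 um period, 20 deg incidence.
    for m, theta in propagating_orders(0.633, 1.2, 20.0).items():
        print(f"order {m:+d}: {theta:6.2f} deg")
```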

    Gratings: Theory and Numeric Applications, Second Revisited Edition

    The second edition of the book contains 13 chapters, written by an international team of specialists in electromagnetic theory and in numerical methods for modelling light diffraction by periodic structures with one-, two-, or three-dimensional periodicity, aimed at numerous applications in classical domains such as optical engineering, spectroscopy, and optical telecommunications, as well as newly emerging fields such as photonics, plasmonics, photovoltaics, metamaterials, cloaking, negative refraction, and super-lensing. Each chapter presents in detail a specific theoretical method aimed at direct numerical application by university and industrial researchers and engineers. In comparison with the first edition, we have added two chapters (ch. 12 and ch. 13) and revised four others (ch. 6, ch. 7, ch. 10, and ch. 11).

    Annual Research Briefs: 1995

    This report contains the 1995 annual progress reports of the Research Fellows and students of the Center for Turbulence Research (CTR). In 1995, CTR continued to concentrate on the development and application of large-eddy simulation (LES) to complex flows, the development of novel modeling concepts for engineering computations in the Reynolds-averaged framework, and turbulent combustion. In large-eddy simulation, a number of numerical and experimental issues have surfaced and are being addressed. The first group of reports in this volume concerns large-eddy simulation. A key finding in this area was the revelation of possibly significant numerical errors that may overwhelm the effects of the subgrid-scale model. We also commissioned a new experiment to support the LES validation studies. The remaining articles in this report are concerned with Reynolds-averaged modeling, studies of turbulence physics and flow-generated sound, combustion, and simulation techniques. Fundamental studies of turbulent combustion using direct numerical simulation, which started at CTR, will continue to be emphasized. These studies and their counterparts carried out during the summer programs have had a noticeable impact on combustion research worldwide.

    On non-Gaussian beams and optomechanical parametric instabilities in interferometric gravitational wave detectors

    Direct detection of gravitational radiation, predicted by Einstein’s general theory of relativity, remains one of the most exciting challenges in experimental physics. Due to their relatively weak interaction with matter, gravitational waves promise to allow exploration of hitherto inaccessible objects and epochs. Unfortunately, this weak coupling also hinders detection: strain amplitudes at the Earth are estimated to be of order $10^{-21}$. Due to their wide bandwidth and theoretical sensitivity, kilometre-scale Michelson-style interferometers have become the preferred instrument for attempting ground-based detection. A worldwide network of first-generation instruments has been constructed and prodigious volumes of data recorded. Despite each instrument approaching or having reached its design sensitivity, a confirmed detection remains elusive. Planned upgrades to these instruments aim to increase strain sensitivity by an order of magnitude, commencing the era of second-generation detectors. Entry into this regime will be accompanied by an entirely new set of challenges, two of which are addressed in this work. As advanced interferometers are commissioned, instrumental artifacts will give way to fundamental noise sources. In the region of peak sensitivity, thermal noise in the interferometers’ dielectric mirror coatings is expected to be the principal source of displacement noise. Theory suggests that increasing the spot size of laser light incident on these mirrors will reduce the measured thermal noise. In the first part of this work we examine one method of realising larger spots. By adopting non-spherical mirrors in the interferometers’ arms, it is possible to create resonators that support a wide, flat-topped field known as the mesa beam. This beam has been shown theoretically to reduce all forms of mirror thermal noise without being significantly more difficult to control.
In this work we investigate these claims using a bespoke prototype mirror. The first results regarding a non-Gaussian beam created in a manner applicable to a gravitational wave interferometer are presented. A common theme among all second-generation interferometer designs is the desire to maximise circulating power. This increased power is accompanied by commensurately increased thermal perturbations. Since the attractive properties of the mesa beam are produced by the fine structure of its supporting mirrors, it is important to understand the effects that absorption of stored optical power could have on mesa fields. In the second part of this work we report on numerical evaluations of measured thermal noise and mesa beam intensity profile as a function of absorbed power. Increased optical power also has less obvious consequences. As a result of radiation pressure, there exists a pathway between optical energy stored in an interferometer’s arms and mechanical energy stored in the acoustic modes of its test masses. Under appropriate conditions, this coupling can excite one or more test masses to such a degree that interferometer operation becomes impossible. In the final part of this work we determine whether it is possible to mitigate these parametric instabilities using electrostatic actuators originally designed to control the position and orientation of the test masses.
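The mesa beam is commonly described as a coherent superposition of Gaussian beams whose axes are parallel to the optical axis and spread uniformly over a disk of radius D. The sketch below illustrates the resulting flat top using a simplified, real-valued version of that superposition: it ignores the propagation phase, samples the disk on a grid, and uses hypothetical parameter values (D = 4, w0 = 1 in arbitrary units). It is not a model of the prototype mirror studied in this work.

```python
import numpy as np


def mesa_like_profile(r, D, w0, n=201):
    """Normalized intensity |U(r)|^2 / |U(0)|^2 of a flat-topped superposition.

    Simplified, real-valued sketch (propagation phase ignored):
        U(r) = sum over beam axes r0 in the disk |r0| <= D of
               exp(-|r - r0|^2 / (2 w0^2)),
    with the axes sampled on an n x n grid. Because the profile is
    rotationally symmetric, field points are taken on the x axis.
    """
    g = np.linspace(-D, D, n)
    x0, y0 = np.meshgrid(g, g)
    inside = x0**2 + y0**2 <= D**2
    x0, y0 = x0[inside], y0[inside]

    r = np.atleast_1d(np.asarray(r, dtype=float))
    u = np.array([np.exp(-((x - x0) ** 2 + y0**2) / (2 * w0**2)).sum() for x in r])
    u0 = np.exp(-(x0**2 + y0**2) / (2 * w0**2)).sum()  # field on axis, U(0)
    return (u / u0) ** 2


if __name__ == "__main__":
    radii = np.linspace(0.0, 7.0, 8)
    for r, i in zip(radii, mesa_like_profile(radii, D=4.0, w0=1.0)):
        print(f"r = {r:.1f}  I/I0 = {i:.3f}")
```

Well inside the disk the intensity stays close to its central value (above 0.9 at r = 2 for these parameters), whereas a single Gaussian with the same convention, field $\exp(-r^2/2w_0^2)$, drops to $e^{-4} \approx 0.018$ of its peak intensity at $r = 2w_0$.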