2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation
We report on improvements made over the past two decades to our adaptive
treecode N-body method (HOT). A mathematical and computational approach to the
cosmological N-body problem is described, with performance and scalability
measured up to 256k ($2^{18}$) processors. We present error analysis and
scientific application results from a series of more than ten 69 billion
($4096^3$) particle cosmological simulations, accounting for
floating point operations. These results include the first simulations using
the new constraints on the standard model of cosmology from the Planck
satellite. Our simulations set a new standard for accuracy and scientific
throughput, while meeting or exceeding the computational efficiency of the
latest generation of hybrid TreePM N-body methods.
Comment: 12 pages, 8 figures, 77 references; To appear in Proceedings of SC '1
ASKIT: Approximate Skeletonization Kernel-Independent Treecode in High Dimensions
We present a fast algorithm for kernel summation problems in high-dimensions.
These problems appear in computational physics, numerical approximation,
non-parametric statistics, and machine learning. In our context, the sums
depend on a kernel function that is a pair potential defined on a dataset of
points in a high-dimensional Euclidean space. A direct evaluation of the sum
scales quadratically with the number of points. Fast kernel summation methods
can reduce this cost to linear complexity, but the constants involved do not
scale well with the dimensionality of the dataset.
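As a point of reference for the quadratic cost mentioned above, a brute-force Gaussian kernel sum can be sketched as follows. This is only an illustrative baseline, not ASKIT itself; the function names, the fixed bandwidth parameter, and the kernel normalization are assumptions made for the example.

```python
import math

def gaussian_kernel(x, y, bandwidth):
    """Gaussian kernel exp(-||x - y||^2 / (2 h^2)) for two points given as sequences."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def direct_kernel_sum(points, weights, bandwidth):
    """Evaluate u_i = sum_j K(x_i, x_j) * w_j for every i by brute force.

    Every pair of points is visited, so the work is O(N^2 d) for N points
    in d dimensions. Names and signature are hypothetical.
    """
    return [
        sum(gaussian_kernel(xi, xj, bandwidth) * wj for xj, wj in zip(points, weights))
        for xi in points
    ]

if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    w = [1.0, 1.0, 1.0]
    print(direct_kernel_sum(pts, w, bandwidth=1.0))
```

Doubling the number of points quadruples the number of kernel evaluations, which is the cost that fast summation methods aim to avoid.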
The main algorithmic components of fast kernel summation algorithms are the
separation of the kernel sum between near and far field (which is the basis for
pruning) and the efficient and accurate approximation of the far field.
We introduce novel methods for pruning and approximating the far field. Our
far field approximation requires only kernel evaluations and does not use
analytic expansions. Pruning is not done using bounding boxes but rather
combinatorially using a sparsified nearest-neighbor graph of the input. The
time complexity of our algorithm depends linearly on the ambient dimension. The
error in the algorithm depends on the low-rank approximability of the far
field, which in turn depends on the kernel function and on the intrinsic
dimensionality of the distribution of the points. The error of the far field
approximation does not depend on the ambient dimension.
We present the new algorithm along with experimental results that demonstrate
its performance. We report results for Gaussian kernel sums for 100 million
points in 64 dimensions, for one million points in 1000 dimensions, and for
problems in which the Gaussian kernel has a variable bandwidth. To the best of
our knowledge, all of these experiments are impossible or prohibitively
expensive with existing fast kernel summation methods.
Comment: 22 pages, 6 figure
Development and Application of Numerical Methods in Biomolecular Solvation
This work addresses the development of fast summation methods for long range particle interactions and their application to problems in biomolecular solvation, which describes the interaction of proteins or other biomolecules with their solvent environment. At the core of this work are treecodes, tree-based fast summation methods which, for N particles, reduce the cost of computing particle interactions from O(N^2) to O(N log N). Background on fast summation methods and treecodes in particular, as well as several treecode improvements developed in the early stages of this work, are presented.
Building on treecodes, dual tree traversal (DTT) methods are another class of tree-based fast summation methods which reduce the cost of computing particle interactions for N particles to O(N). The primary result of this work is the development of an O(N) dual tree traversal fast summation method based on barycentric Lagrange polynomial interpolation (BLDTT). This method is implemented to run across multiple GPU compute nodes in the software package BaryTree. Across different problem sizes, particle distributions, geometries, and interaction kernels, the BLDTT shows consistently better performance than the previously developed barycentric Lagrange treecode (BLTC).
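The interpolation underlying the BLTC and BLDTT can be illustrated in one dimension. The sketch below assumes Chebyshev points of the second kind and the standard barycentric formula; the function names are hypothetical and are not part of BaryTree's API, and the actual methods apply this idea to kernels on three-dimensional clusters.

```python
import math

def chebyshev_points(n, a=-1.0, b=1.0):
    """Return n + 1 Chebyshev points of the second kind mapped to [a, b]."""
    return [0.5 * (a + b) + 0.5 * (b - a) * math.cos(math.pi * k / n)
            for k in range(n + 1)]

def barycentric_weights(n):
    """Weights for Chebyshev points of the second kind: (-1)^k, halved at both ends."""
    w = [(-1.0) ** k for k in range(n + 1)]
    w[0] *= 0.5
    w[n] *= 0.5
    return w

def barycentric_interpolate(x, nodes, values, weights):
    """Evaluate the degree-n interpolant at x using the barycentric formula."""
    num = 0.0
    den = 0.0
    for xk, fk, wk in zip(nodes, values, weights):
        if x == xk:          # x coincides with a node: return the data value
            return fk
        t = wk / (x - xk)
        num += t * fk
        den += t
    return num / den

if __name__ == "__main__":
    n = 12
    nodes = chebyshev_points(n)
    values = [math.cos(t) for t in nodes]
    weights = barycentric_weights(n)
    x = 0.37
    print(barycentric_interpolate(x, nodes, values, weights), math.cos(x))
```

Because the formula needs only function values at the nodes, the same routine works for any smooth kernel without kernel-specific analytic expansions.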
The first major biomolecular solvation application of fast summation methods presented is to the Poisson–Boltzmann implicit solvent model, and in particular, the treecode-accelerated boundary integral Poisson–Boltzmann solver (TABI-PB). The work on TABI-PB consists of three primary projects and an application. The first project investigates the impact of various biomolecular surface meshing codes on TABI-PB and integrates the NanoShaper software into the package, resulting in significantly better performance. Second, a node patch method for discretizing the system of integral equations is introduced to replace the previous centroid collocation scheme, resulting in faster convergence of solvation energies. Third, a new version of TABI-PB with GPU acceleration based on the BLDTT is developed, further improving scalability. An application investigating the binding of biomolecular complexes is undertaken using the previous Taylor treecode-based version of TABI-PB. In addition to these projects, work performed over the course of this thesis integrated TABI-PB into the popular Adaptive Poisson–Boltzmann Solver (APBS) developed at Pacific Northwest National Laboratory.
The second major application of fast summation methods is to the 3D reference interaction site model (3D-RISM), a statistical mechanics-based continuum solvation model. This work applies cluster-particle Taylor expansion treecodes to treat long-range asymptotic Coulomb-like potentials in 3D-RISM, and results in significant speedups and improved scalability for the 3D-RISM package implemented in AmberTools. Additionally, preliminary work on specialized GPU-accelerated treecodes based on BaryTree for 3D-RISM long-range asymptotic functions is presented.
PHD
Applied and Interdisciplinary Mathematics
University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/168120/1/lwwilson_1.pd
Vorticity structure and evolution in a transverse jet with new algorithms for scalable particle simulation
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 2004.
Includes bibliographical references (p. 188-200).
Transverse jets arise in many applications, including propulsion, effluent dispersion, oil field flows, V/STOL aerodynamics, and drug delivery. Furthermore, they exemplify flows dominated by coherent structures that cascade into smaller scales, a source of many current challenges in fluid dynamics. This study seeks a fundamental, mechanistic understanding of the relationship between the dispersion of jet fluid and the underlying vortical structures of the transverse jet, and of how to develop actuation that optimally manipulates their dynamics to affect mixing. We develop a massively parallel 3-D vortex simulation of a high-momentum transverse jet at large Reynolds number, featuring a discrete filament representation of the vorticity field with local mesh refinement to capture stretching and folding and hair-pin removal to regularize the formation of small scales. A novel formulation of the vorticity flux boundary conditions rigorously accounts for the interaction of channel vorticity with the jet boundary layer. This formulation yields analytical expressions for vortex lines in the near field of the jet and suggests effective modes of unsteady actuation at the nozzle. The present computational approach requires hierarchical N-body methods for velocity evaluation at each timestep, as direct summation is prohibitively expensive. We introduce new clustering algorithms for parallel domain decomposition of N-body interactions and demonstrate the optimality of the resulting cluster geometries. We also develop compatible techniques for dynamic load balancing, including adaptive scaling of cluster metrics and adaptive redistribution of their centroids. These tools extend to parallel hierarchical simulation of N-body problems in gravitational astrophysics, molecular dynamics, and other fields.
Simulations reveal the mechanisms by which vortical structures evolve; previous computational and experimental investigations of these processes have been incomplete at best, limited to low Reynolds numbers, transient early-stage dynamics, or Eulerian diagnostics of essentially Lagrangian phenomena. Transformation of the cylindrical shear layer emanating from the nozzle, initially dominated by azimuthal vorticity, begins with axial elongation of its lee side to form sections of counter-rotating vorticity aligned with the jet trajectory. Periodic rollup of the shear layer accompanies this deformation, creating arcs carrying azimuthal vorticity of alternating signs, curved toward the windward side of the jet. Following the pronounced bending of the trajectory into the crossflow, we observe a catastrophic breakdown of these sparse periodic structures into a dense distribution of smaller scales, with an attendant complexity of tangled vortex filaments. Nonetheless, spatial filtering of this region reveals the persistence of counter-rotating streamwise vorticity. We further characterize the flow by calculating maximum direct Lyapunov exponents of particle trajectories, identifying repelling material surfaces that organize finite-time mixing.
by Youssef Mohamed Marzouk.
Ph.D
Geometry-Oblivious FMM for Compressing Dense SPD Matrices
We present GOFMM (geometry-oblivious FMM), a novel method that creates a
hierarchical low-rank approximation, "compression," of an arbitrary dense
symmetric positive definite (SPD) matrix. For many applications, GOFMM enables
an approximate matrix-vector multiplication in $N \log N$ or even $N$ time,
where $N$ is the matrix size. Compression requires $N \log N$ storage and work.
In general, our scheme belongs to the family of hierarchical matrix
approximation methods. In particular, it generalizes the fast multipole method
(FMM) to a purely algebraic setting by only requiring the ability to sample
matrix entries. Neither geometric information (i.e., point coordinates) nor
knowledge of how the matrix entries have been generated is required, thus the
term "geometry-oblivious." Also, we introduce a shared-memory parallel scheme
for hierarchical matrix computations that reduces synchronization barriers. We
present results on the Intel Knights Landing and Haswell architectures, and on
the NVIDIA Pascal architecture for a variety of matrices.
Comment: 13 pages, accepted by SC'1
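To see where a log-linear matrix-vector cost can come from in hierarchical matrix methods, consider the simplest such format, in which the off-diagonal blocks of a symmetric matrix are replaced by low-rank factors at every level of a binary partition. This is a generic illustration, not GOFMM's actual scheme; the symbols $U$, $V$, $s$, and $T(N)$ are introduced here only for the example.

```latex
\[
K = \begin{pmatrix} K_{11} & K_{12} \\ K_{12}^{T} & K_{22} \end{pmatrix},
\qquad
K_{12} \approx U V^{T},
\quad U, V \in \mathbb{R}^{(N/2) \times s},
\quad s \ll N .
\]
% Applying K to a vector: two half-size products with the diagonal blocks,
% plus O(sN) work for the two rank-s off-diagonal blocks.
\[
T(N) = 2\,T(N/2) + O(sN)
\quad\Longrightarrow\quad
T(N) = O(sN \log N).
\]
```

Under this assumption the rank $s$ acts as the accuracy knob: a larger $s$ lowers the approximation error at proportionally higher cost.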