24 research outputs found

    2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation

    We report on improvements made over the past two decades to our adaptive treecode N-body method (HOT). A mathematical and computational approach to the cosmological N-body problem is described, with performance and scalability measured up to 256k ($2^{18}$) processors. We present error analysis and scientific application results from a series of more than ten 69 billion ($4096^3$) particle cosmological simulations, accounting for $4 \times 10^{20}$ floating point operations. These results include the first simulations using the new constraints on the standard model of cosmology from the Planck satellite. Our simulations set a new standard for accuracy and scientific throughput, while meeting or exceeding the computational efficiency of the latest generation of hybrid TreePM N-body methods. Comment: 12 pages, 8 figures, 77 references; To appear in Proceedings of SC '1
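
As a quick check of the figures quoted in the abstract above, the processor and particle counts work out as follows. This is plain arithmetic added for the reader, not a result taken from the paper.

```latex
\begin{align*}
2^{18} &= 262{,}144 = 256 \times 1024 \quad (\text{``256k''}),\\
4096^3 &= (2^{12})^3 = 2^{36} = 68{,}719{,}476{,}736 \approx 6.9 \times 10^{10} \quad (\text{about 69 billion}).
\end{align*}
```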

    ASKIT: Approximate Skeletonization Kernel-Independent Treecode in High Dimensions

    We present a fast algorithm for kernel summation problems in high dimensions. These problems appear in computational physics, numerical approximation, non-parametric statistics, and machine learning. In our context, the sums depend on a kernel function that is a pair potential defined on a dataset of points in a high-dimensional Euclidean space. A direct evaluation of the sum scales quadratically with the number of points. Fast kernel summation methods can reduce this cost to linear complexity, but the constants involved do not scale well with the dimensionality of the dataset. The main algorithmic components of fast kernel summation algorithms are the separation of the kernel sum between near and far field (which is the basis for pruning) and the efficient and accurate approximation of the far field. We introduce novel methods for pruning and approximating the far field. Our far field approximation requires only kernel evaluations and does not use analytic expansions. Pruning is not done using bounding boxes but rather combinatorially using a sparsified nearest-neighbor graph of the input. The time complexity of our algorithm depends linearly on the ambient dimension. The error in the algorithm depends on the low-rank approximability of the far field, which in turn depends on the kernel function and on the intrinsic dimensionality of the distribution of the points. The error of the far field approximation does not depend on the ambient dimension. We present the new algorithm along with experimental results that demonstrate its performance. We report results for Gaussian kernel sums for 100 million points in 64 dimensions, for one million points in 1000 dimensions, and for problems in which the Gaussian kernel has a variable bandwidth. To the best of our knowledge, all of these experiments are impossible or prohibitively expensive with existing fast kernel summation methods. Comment: 22 pages, 6 figure
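
For orientation, the quadratic-cost baseline that the abstract above contrasts against can be sketched as a direct Gaussian kernel sum. This is a minimal illustration and not the ASKIT algorithm. The function name `gaussian_kernel_sum`, the fixed `bandwidth` parameter, the form exp(-|x_i - x_j|^2 / (2 h^2)), and the inclusion of the self-interaction term j = i are all assumptions made for the example.

```python
import math


def gaussian_kernel_sum(points, weights, bandwidth):
    """Direct evaluation of u_i = sum_j exp(-|x_i - x_j|^2 / (2 h^2)) * w_j.

    Assumed conventions (not from the paper): a single fixed bandwidth h,
    and the j == i term is included in the sum.
    Cost is O(N^2 * d) for N points in d dimensions.
    """
    inv_two_h_sq = 1.0 / (2.0 * bandwidth * bandwidth)
    result = []
    for xi in points:
        total = 0.0
        for xj, wj in zip(points, weights):
            # Squared Euclidean distance between the two points.
            sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
            total += math.exp(-sq_dist * inv_two_h_sq) * wj
        result.append(total)
    return result


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    w = [1.0, 2.0, 0.5]
    print(gaussian_kernel_sum(pts, w, bandwidth=1.0))
```

The double loop makes the quadratic scaling explicit: doubling the number of points roughly quadruples the work, which is the cost that fast summation methods aim to avoid.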

    Development and Application of Numerical Methods in Biomolecular Solvation

    This work addresses the development of fast summation methods for long range particle interactions and their application to problems in biomolecular solvation, which describes the interaction of proteins or other biomolecules with their solvent environment. At the core of this work are treecodes, tree-based fast summation methods which, for N particles, reduce the cost of computing particle interactions from O(N^2) to O(N log N). Background on fast summation methods and treecodes in particular, as well as several treecode improvements developed in the early stages of this work, are presented. Building on treecodes, dual tree traversal (DTT) methods are another class of tree-based fast summation methods which reduce the cost of computing particle interactions for N particles to O(N). The primary result of this work is the development of an O(N) dual tree traversal fast summation method based on barycentric Lagrange polynomial interpolation (BLDTT). This method is implemented to run across multiple GPU compute nodes in the software package BaryTree. Across different problem sizes, particle distributions, geometries, and interaction kernels, the BLDTT shows consistently better performance than the previously developed barycentric Lagrange treecode (BLTC). The first major biomolecular solvation application of fast summation methods presented is to the Poisson–Boltzmann implicit solvent model, and in particular, the treecode-accelerated boundary integral Poisson–Boltzmann solver (TABI-PB). The work on TABI-PB consists of three primary projects and an application. The first project investigates the impact of various biomolecular surface meshing codes on TABI-PB, and integrates the NanoShaper software into the package, resulting in significantly better performance. Second, a node patch method for discretizing the system of integral equations is introduced to replace the previous centroid collocation scheme, resulting in faster convergence of solvation energies. Third, a new version of TABI-PB with GPU acceleration based on the BLDTT is developed, further improving scalability. An application investigating the binding of biomolecular complexes is undertaken using the previous Taylor treecode-based version of TABI-PB. In addition to these projects, work performed over the course of this thesis integrated TABI-PB into the popular Adaptive Poisson–Boltzmann Solver (APBS) developed at Pacific Northwest National Laboratory. The second major application of fast summation methods is to the 3D reference interaction site model (3D-RISM), a statistical-mechanics based continuum solvation model. This work applies cluster-particle Taylor expansion treecodes to treat long-range asymptotic Coulomb-like potentials in 3D-RISM, and results in significant speedups and improved scalability to the 3D-RISM package implemented in AmberTools. Additionally, preliminary work on specialized GPU-accelerated treecodes based on BaryTree for 3D-RISM long-range asymptotic functions is presented. PHD, Applied and Interdisciplinary Mathematics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/168120/1/lwwilson_1.pd
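
The barycentric Lagrange interpolation that the abstract above names as the basis of BLDTT and BLTC can be illustrated in one dimension. This is a minimal sketch of the standard barycentric formula and not code from BaryTree. The choice of Chebyshev points of the second kind, the helper names, and the 1D test kernel 1/(3 - x) are assumptions made for the example.

```python
import math


def chebyshev_points(n, a=-1.0, b=1.0):
    """Return n + 1 Chebyshev points of the second kind mapped to [a, b] (n >= 1)."""
    return [0.5 * (a + b) + 0.5 * (b - a) * math.cos(math.pi * j / n)
            for j in range(n + 1)]


def barycentric_weights(n):
    """Barycentric weights for the points above: (-1)^j, halved at both endpoints."""
    w = [(-1.0) ** j for j in range(n + 1)]
    w[0] *= 0.5
    w[n] *= 0.5
    return w


def barycentric_interpolate(x, nodes, weights, values):
    """Evaluate the interpolating polynomial at x via the barycentric formula."""
    num = 0.0
    den = 0.0
    for xj, wj, fj in zip(nodes, weights, values):
        if x == xj:
            # Exactly on a node: return the data value, avoiding division by zero.
            return fj
        t = wj / (x - xj)
        num += t * fj
        den += t
    return num / den


if __name__ == "__main__":
    # Hypothetical smooth "far-field" function: a source at y = 3 seen from [-1, 1].
    n = 8
    nodes = chebyshev_points(n)
    weights = barycentric_weights(n)
    values = [1.0 / (3.0 - s) for s in nodes]
    x = 0.37
    approx = barycentric_interpolate(x, nodes, weights, values)
    print(approx, 1.0 / (3.0 - x))
```

The interpolant needs only function values at the nodes, so a smooth kernel can be approximated on a well-separated cluster without deriving kernel-specific expansions. How the thesis assembles this into a 3D treecode or dual tree traversal is not shown here.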

    Vorticity structure and evolution in a transverse jet with new algorithms for scalable particle simulation

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 2004. Includes bibliographical references (p. 188-200). Transverse jets arise in many applications, including propulsion, effluent dispersion, oil field flows, V/STOL aerodynamics, and drug delivery. Furthermore, they exemplify flows dominated by coherent structures that cascade into smaller scales, a source of many current challenges in fluid dynamics. This study seeks a fundamental, mechanistic understanding of the relationship between the dispersion of jet fluid and the underlying vortical structures of the transverse jet, and of how to develop actuation that optimally manipulates their dynamics to affect mixing. We develop a massively parallel 3-D vortex simulation of a high-momentum transverse jet at large Reynolds number, featuring a discrete filament representation of the vorticity field with local mesh refinement to capture stretching and folding and hair-pin removal to regularize the formation of small scales. A novel formulation of the vorticity flux boundary conditions rigorously accounts for the interaction of channel vorticity with the jet boundary layer. This formulation yields analytical expressions for vortex lines in the near field of the jet and suggests effective modes of unsteady actuation at the nozzle. The present computational approach requires hierarchical N-body methods for velocity evaluation at each timestep, as direct summation is prohibitively expensive. We introduce new clustering algorithms for parallel domain decomposition of N-body interactions and demonstrate the optimality of the resulting cluster geometries. We also develop compatible techniques for dynamic load balancing, including adaptive scaling of cluster metrics and adaptive redistribution of their centroids. These tools extend to parallel hierarchical simulation of N-body problems in gravitational astrophysics, molecular dynamics, and other fields.
    Simulations reveal the mechanisms by which vortical structures evolve; previous computational and experimental investigations of these processes have been incomplete at best, limited to low Reynolds numbers, transient early-stage dynamics, or Eulerian diagnostics of essentially Lagrangian phenomena. Transformation of the cylindrical shear layer emanating from the nozzle, initially dominated by azimuthal vorticity, begins with axial elongation of its lee side to form sections of counter-rotating vorticity aligned with the jet trajectory. Periodic rollup of the shear layer accompanies this deformation, creating arcs carrying azimuthal vorticity of alternating signs, curved toward the windward side of the jet. Following the pronounced bending of the trajectory into the crossflow, we observe a catastrophic breakdown of these sparse periodic structures into a dense distribution of smaller scales, with an attendant complexity of tangled vortex filaments. Nonetheless, spatial filtering of this region reveals the persistence of counter-rotating streamwise vorticity. We further characterize the flow by calculating maximum direct Lyapunov exponents of particle trajectories, identifying repelling material surfaces that organize finite-time mixing. By Youssef Mohamed Marzouk. Ph.D

    Geometry-Oblivious FMM for Compressing Dense SPD Matrices

    We present GOFMM (geometry-oblivious FMM), a novel method that creates a hierarchical low-rank approximation, "compression," of an arbitrary dense symmetric positive definite (SPD) matrix. For many applications, GOFMM enables an approximate matrix-vector multiplication in $N \log N$ or even $N$ time, where $N$ is the matrix size. Compression requires $N \log N$ storage and work. In general, our scheme belongs to the family of hierarchical matrix approximation methods. In particular, it generalizes the fast multipole method (FMM) to a purely algebraic setting by only requiring the ability to sample matrix entries. Neither geometric information (i.e., point coordinates) nor knowledge of how the matrix entries have been generated is required, thus the term "geometry-oblivious." Also, we introduce a shared-memory parallel scheme for hierarchical matrix computations that reduces synchronization barriers. We present results on the Intel Knights Landing and Haswell architectures, and on the NVIDIA Pascal architecture for a variety of matrices. Comment: 13 pages, accepted by SC'1