Passively parallel regularized stokeslets
Stokes flow, discussed by G.G. Stokes in 1851, describes many microscopic
biological flow phenomena, including cilia-driven transport and flagellar
motility; the need to quantify and understand these flows has motivated decades
of mathematical and computational research. Regularized stokeslet methods,
which have been used and refined over the past twenty years, offer significant
advantages in simplicity of implementation, with a recent modification based on
nearest-neighbour interpolation providing significant improvements in
efficiency and accuracy. Moreover, this method can be implemented with the
majority of the computation taking place through built-in linear algebra, so
that state-of-the-art hardware and software developments in the latter, in
particular multicore and GPU computing, can be exploited through minimal
modifications ('passive parallelism') to existing MATLAB code. Hence, with
widely available GPU hardware, significant improvements in the efficiency of
the regularized stokeslet method can be obtained. The approach is
demonstrated through computational experiments on three model biological flows:
undulatory propulsion of multiple C. elegans, simulation of progression and
transport by multiple sperm in a geometrically confined region, and left-right
symmetry-breaking particle transport in the ventral node of the mouse embryo.
In general an order-of-magnitude improvement in efficiency is observed. This
development further widens the complexity of biological flow systems that are
accessible without the need for extensive code development or specialist
facilities.
Comment: 21 pages, 7 figures, submitted
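The 'passive parallelism' idea can be sketched outside MATLAB as well. Below is a minimal NumPy illustration (not the paper's code): the dense regularized-stokeslet matrix is assembled with vectorized array operations using Cortez's standard blob kernel, and a resistance problem is solved with built-in linear algebra; the nearest-neighbour interpolation refinement is omitted. Swapping NumPy for a GPU array library such as CuPy would move essentially all of the work to the GPU with no other changes, which is the passive-parallelism point.

```python
import numpy as np

def stokeslet_matrix(x, eps=0.01, mu=1.0):
    """Dense regularized-stokeslet matrix (Cortez blob kernel).

    x: (N, 3) array of collocation points; returns the (3N, 3N) matrix A
    such that u = A f relates point forces f to fluid velocities u.
    """
    N = x.shape[0]
    d = x[:, None, :] - x[None, :, :]              # pairwise displacements
    r2 = np.sum(d * d, axis=-1)                    # squared distances
    denom = 8.0 * np.pi * mu * (r2 + eps ** 2) ** 1.5
    A = np.einsum("ija,ijb->ijab", d, d)           # d_a d_b outer products
    for a in range(3):
        A[:, :, a, a] += r2 + 2.0 * eps ** 2       # isotropic part
    A /= denom[:, :, None, None]
    return A.transpose(0, 2, 1, 3).reshape(3 * N, 3 * N)

# Resistance problem: prescribe unit velocities, solve for forces using
# built-in dense linear algebra (the step that parallelizes "passively").
rng = np.random.default_rng(0)
pts = rng.random((50, 3))
A = stokeslet_matrix(pts)
u = np.ones(3 * pts.shape[0])
f = np.linalg.solve(A, u)
```

The only hardware-specific code here is the array library itself; the assembly and solve are expressed entirely through linear algebra primitives.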
Power Bounded Computing on Current & Emerging HPC Systems
Power has become a critical constraint on the evolution of large-scale High Performance Computing (HPC) systems and commercial data centers. This constraint spans almost every level of computing technology, from IC chips all the way up to data centers, for physical, technical, and economic reasons. To cope with this reality, it is necessary to understand how available or permissible power impacts the design and performance of emergent computer systems. For this reason, we propose power bounded computing and corresponding technologies to optimize performance on HPC systems with limited power budgets.
We have multiple research objectives in this dissertation. They center on understanding the interaction between performance, power bounds, and a hierarchical power management strategy. First, we develop heuristics and application-aware power allocation methods to improve application performance on a single node. Second, we develop algorithms to coordinate power across nodes and components based on application characteristics and the power budget on a cluster. Third, we investigate performance interference induced by hardware and power contention, and propose contention-aware job scheduling to maximize system throughput under given power budgets for node-sharing systems. Fourth, we extend this work to GPU-accelerated systems and workloads and develop an online dynamic performance and power approach to meet both performance requirements and power-efficiency goals.
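As a hypothetical illustration of power allocation under a budget (the component names and performance curves below are invented for this sketch, not taken from the dissertation), a greedy marginal-gain allocator distributes a node's power budget across components in fixed increments:

```python
import math

# Hypothetical greedy power allocator: give each fixed power increment to
# the component whose predicted performance gain from it is largest.
def allocate_power(perf_curves, budget, step=5):
    """perf_curves: {component: f(watts) -> predicted performance}.
    Returns {component: watts allocated}, staying within the budget."""
    alloc = {c: 0 for c in perf_curves}
    remaining = budget
    while remaining >= step:
        gains = {c: f(alloc[c] + step) - f(alloc[c])
                 for c, f in perf_curves.items()}
        best = max(gains, key=gains.get)           # steepest marginal gain
        alloc[best] += step
        remaining -= step
    return alloc

# Invented diminishing-returns curves: CPU saturates earlier than GPU.
curves = {"cpu": lambda w: 10.0 * math.log1p(w / 10.0),
          "gpu": lambda w: 8.0 * math.log1p(w / 25.0)}
alloc = allocate_power(curves, budget=100)
```

With concave curves like these, greedy marginal-gain allocation equalizes marginal utilities across components, which is the usual shape of budget-partitioning heuristics.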
Power bounded computing improves performance scalability and power efficiency and decreases the operating costs of HPC systems and data centers. This dissertation opens up several new avenues for research in power bounded computing to address the power challenges in HPC systems. The proposed power and resource management techniques provide new directions and guidelines for green exascale computing and other computing systems.
Development and Application of Numerical Methods in Biomolecular Solvation
This work addresses the development of fast summation methods for long range particle interactions and their application to problems in biomolecular solvation, which describes the interaction of proteins or other biomolecules with their solvent environment. At the core of this work are treecodes, tree-based fast summation methods which, for N particles, reduce the cost of computing particle interactions from O(N^2) to O(N log N). Background on fast summation methods and treecodes in particular, as well as several treecode improvements developed in the early stages of this work, are presented.
Building on treecodes, dual tree traversal (DTT) methods are another class of tree-based fast summation methods which reduce the cost of computing particle interactions for N particles to O(N). The primary result of this work is the development of an O(N) dual tree traversal fast summation method based on barycentric Lagrange polynomial interpolation (BLDTT). This method is implemented to run across multiple GPU compute nodes in the software package BaryTree. Across different problem sizes, particle distributions, geometries, and interaction kernels, the BLDTT shows consistently better performance than the previously developed barycentric Lagrange treecode (BLTC).
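The barycentric Lagrange interpolation at the heart of the BLTC and BLDTT can be illustrated in one dimension (a sketch of the building block, not the BaryTree implementation): a far-field interaction between two well-separated clusters is compressed through proxy values at Chebyshev points.

```python
import numpy as np

def cheb_points(a, b, n):
    """Chebyshev points of the second kind on [a, b]."""
    return 0.5 * (a + b) + 0.5 * (b - a) * np.cos(np.pi * np.arange(n) / (n - 1))

def bary_weights(n):
    """Barycentric weights for Chebyshev points of the second kind."""
    w = (-1.0) ** np.arange(n)
    w[0] *= 0.5
    w[-1] *= 0.5
    return w

def bary_matrix(x, s, w):
    """Matrix L with (L @ f(s)) ≈ f(x), in stable barycentric form."""
    M = w[None, :] / (x[:, None] - s[None, :])
    return M / M.sum(axis=1, keepdims=True)

# Targets in [0, 1], sources in [3, 4] (well separated), kernel 1/|x - y|.
rng = np.random.default_rng(1)
targets, sources = rng.random(200), 3.0 + rng.random(200)
charges = rng.random(200)
direct = np.array([np.sum(charges / np.abs(t - sources)) for t in targets])

n = 12                                         # proxy points per cluster
st, ss = cheb_points(0.0, 1.0, n), cheb_points(3.0, 4.0, n)
Lt = bary_matrix(targets, st, bary_weights(n))
Ls = bary_matrix(sources, ss, bary_weights(n))
K = 1.0 / np.abs(st[:, None] - ss[None, :])    # small proxy-to-proxy kernel
approx = Lt @ (K @ (Ls.T @ charges))           # rank-n far-field apply
```

Because the kernel is smooth when the clusters are well separated, a handful of Chebyshev proxy points reproduces the direct sum to near machine precision, and each far-field apply costs O(Nn) rather than O(N^2).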
The first major biomolecular solvation application of fast summation methods presented is to the Poisson–Boltzmann implicit solvent model, and in particular, the treecode-accelerated boundary integral Poisson–Boltzmann solver (TABI-PB). The work on TABI-PB consists of three primary projects and an application. The first project investigates the impact of various biomolecular surface meshing codes on TABI-PB, and integrates the NanoShaper software into the package, resulting in significantly better performance. Second, a node patch method for discretizing the system of integral equations is introduced to replace the previous centroid collocation scheme, resulting in faster convergence of solvation energies. Third, a new version of TABI-PB with GPU acceleration based on the BLDTT is developed, resulting in further improved scalability. An application investigating the binding of biomolecular complexes is undertaken using the previous Taylor treecode-based version of TABI-PB. In addition to these projects, work performed over the course of this thesis integrated TABI-PB into the popular Adaptive Poisson–Boltzmann Solver (APBS) developed at Pacific Northwest National Laboratory.
The second major application of fast summation methods is to the 3D reference interaction site model (3D-RISM), a statistical-mechanics-based continuum solvation model. This work applies cluster-particle Taylor expansion treecodes to treat long-range asymptotic Coulomb-like potentials in 3D-RISM, resulting in significant speedups and improved scalability for the 3D-RISM package implemented in AmberTools. Additionally, preliminary work on specialized GPU-accelerated treecodes based on BaryTree for 3D-RISM long-range asymptotic functions is presented.
PhD, Applied and Interdisciplinary Mathematics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/168120/1/lwwilson_1.pd
Biomolecular electrostatics with continuum models: a boundary integral implementation and applications to biosensors
The implicit-solvent model uses continuum electrostatic theory to represent the salt solution around dissolved biomolecules, leading to a coupled system of the Poisson-Boltzmann and Poisson equations. This thesis uses the implicit-solvent model to study solvation, binding and adsorption of proteins.
We developed an implicit-solvent model solver that uses the boundary element method (BEM), called PyGBe. The BEM numerically solves integral equations along the biomolecule-solvent interface only; therefore, it does not need to discretize the entire domain. PyGBe accelerates the BEM with a treecode algorithm and runs on graphics processing units. We performed extensive verification and validation of the code, comparing it with experimental observations, analytical solutions, and other numerical tools. Our results suggest that a BEM approach is more appropriate than volumetric methods, like finite-difference or finite-element methods, for high-accuracy calculations. We also discussed the effect of features like solvent-filled cavities and Stern layers in the implicit-solvent model, and found that they become relevant in binding energy calculations.
The application that drove this work was nano-scale biosensors: devices designed to detect biomolecules. Biosensors are built with a functionalized layer of ligand molecules, to which the target molecule binds when it is detected. With our code, we performed a study of the orientation of proteins near charged surfaces, and investigated the ideal conditions for ligand molecule adsorption. Using immunoglobulin G as a test case, we found that low salt concentration in the solvent and high positive surface charge density lead to favorable orientations of the ligand molecule for biosensing applications.
We also studied the plasmonic response of localized surface plasmon resonance (LSPR) biosensors. LSPR biosensors monitor the plasmon resonance frequency of metallic nanoparticles, which shifts when a target molecule binds to a ligand molecule. Electrostatics is a valid approximation to the LSPR biosensor optical phenomenon in the long-wavelength limit, and the BEM was able to reproduce the shift in the plasmon resonance frequency as proteins approach the nanoparticle.
Fast and Accurate Boundary Element Methods in Three Dimensions
The Laplace and Helmholtz equations are two of the most important partial differential equations (PDEs) in science, governing problems in electromagnetism, acoustics, astrophysics, and aerodynamics. The boundary element method (BEM) is a powerful method for solving these PDEs. The BEM reduces the dimensionality of the problem by one, and treats complex boundary shapes and multi-domain problems well. The BEM also suffers from a few problems. The entries in the system matrices require computing boundary integrals, which can be difficult to do accurately, especially in the Galerkin formulation. These matrices are also dense, requiring O(N^2) memory to store and O(N^3) work to solve using direct matrix decompositions, where N is the number of unknowns. This can effectively restrict the size of a problem.
Methods are presented for computing the boundary integrals that arise in the Galerkin formulation to any accuracy. Integrals involving geometrically separated triangles are non-singular, and are computed using a technique based on spherical harmonics and multipole expansions and translations. Integrals involving triangles that have common vertices, edges, or are coincident are treated via scaling and symmetry arguments, combined with recursive geometric decomposition of the integrals.
The fast multipole method (FMM) is used to accelerate the BEM. The FMM is usually designed around point sources, not the integral expressions in the BEM. To apply the FMM to these expressions, the internal logic of the FMM must be changed, but this can be difficult. The correction factor matrix method is presented, which approximates the integrals using a quadrature. The quadrature points are treated as point sources, which are plugged directly into current FMM codes. Any inaccuracies are corrected during a correction factor step. This method reduces the quadratic and cubic scalings of the BEM to linear.
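A hypothetical one-dimensional toy conveys the correction factor idea (this is an illustration, not the thesis code: a direct sum stands in for the FMM engine, and the "near" set is just each panel's own self-interaction). Panel integrals are approximated by quadrature point sources that a point-source engine can consume directly; the entries where the quadrature is inaccurate are then fixed by a sparse correction step.

```python
import numpy as np

# 20 flat "panels" on [0, 1]; the matrix entry S[i, j] approximates the
# boundary integral ∫_{panel j} log|c_i - y| dy at the panel centers c_i.
panels = np.linspace(0.0, 1.0, 21)
centers = 0.5 * (panels[:-1] + panels[1:])
h = panels[1] - panels[0]

# 2-point Gauss-Legendre rule per panel -> quadrature point sources
gl = np.array([-1.0, 1.0]) / np.sqrt(3.0)
nodes = centers[:, None] + 0.5 * h * gl[None, :]   # (20, 2) source points
wq = 0.5 * h                                       # weight per node

# Stage 1, "FMM": sum over all quadrature point sources at once
# (a direct sum stands in for the fast engine here).
S = wq * np.log(np.abs(centers[:, None, None] - nodes[None, :, :])).sum(axis=2)

# Stage 2, correction factors: the quadrature is inaccurate when target
# and panel coincide, so swap in the analytically integrated self term.
def exact_self(c, a, b):
    """∫_a^b log|c - y| dy for c inside (a, b)."""
    return (c - a) * (np.log(c - a) - 1.0) + (b - c) * (np.log(b - c) - 1.0)

for i in range(20):
    quad_ii = wq * np.log(np.abs(centers[i] - nodes[i])).sum()
    S[i, i] += exact_self(centers[i], panels[i], panels[i + 1]) - quad_ii
```

The appeal of this structure is that stage 1 needs no changes to an existing point-source FMM code, and stage 2 touches only a sparse set of near entries, so the overall cost stays linear.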
Software is developed for computing the solutions to acoustic scattering problems involving spheroids and disks. This software uses spheroidal wave functions to build the solutions to these problems analytically, and it is used to verify the accuracy of the BEM for the Helmholtz equation.
The product of these contributions is a fast and accurate BEM solver for the Laplace and Helmholtz equations.
Geometry-Oblivious FMM for Compressing Dense SPD Matrices
We present GOFMM (geometry-oblivious FMM), a novel method that creates a
hierarchical low-rank approximation, "compression," of an arbitrary dense
symmetric positive definite (SPD) matrix. For many applications, GOFMM enables
an approximate matrix-vector multiplication in O(N log N) or even O(N) time,
where N is the matrix size. Compression requires O(N log N) storage and work.
In general, our scheme belongs to the family of hierarchical matrix
approximation methods. In particular, it generalizes the fast multipole method
(FMM) to a purely algebraic setting by only requiring the ability to sample
matrix entries. Neither geometric information (i.e., point coordinates) nor
knowledge of how the matrix entries have been generated is required, thus the
term "geometry-oblivious." Also, we introduce a shared-memory parallel scheme
for hierarchical matrix computations that reduces synchronization barriers. We
present results on the Intel Knights Landing and Haswell architectures, and on
the NVIDIA Pascal architecture for a variety of matrices.
Comment: 13 pages, accepted by SC'17
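The "sample matrix entries only" interface can be illustrated with a plain Nyström low-rank approximation (a simple, non-hierarchical relative of GOFMM's scheme, not the algorithm itself): the compression touches the matrix only through an entry evaluator, never through point coordinates.

```python
import numpy as np

rng = np.random.default_rng(2)
pts = rng.random((500, 2))                 # hidden from the algorithm

def entry(I, J):
    """Entry evaluator for an SPD Gaussian kernel matrix. The compression
    below never sees the coordinates; it only calls this function."""
    d2 = np.sum((pts[I][:, None, :] - pts[J][None, :, :]) ** 2, axis=-1)
    return np.exp(-4.0 * d2)

n, k = 500, 100
J = rng.choice(n, size=k, replace=False)   # sampled pivot columns
I = np.arange(n)
C = entry(I, J)                            # n x k sampled columns
W = entry(J, J)                            # k x k core block
K_approx = C @ np.linalg.pinv(W, rcond=1e-8) @ C.T   # K ≈ C W^+ C^T

K_exact = entry(I, I)                      # formed only to check the error
rel_err = np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact)
```

Only O(nk) entries are ever sampled to build the factors, which is the geometry-oblivious point: the same code works whether the SPD matrix comes from a kernel, a Hessian, or any other source.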
GPU-based Private Information Retrieval for On-Device Machine Learning Inference
On-device machine learning (ML) inference can enable the use of private user
data on user devices without revealing them to remote servers. However, a pure
on-device solution to private ML inference is impractical for many applications
that rely on embedding tables that are too large to be stored on-device. In
particular, recommendation models typically use multiple embedding tables,
each on the order of 1-10 GB of data, making them impractical to store on-device.
To overcome this barrier, we propose the use of private information retrieval
(PIR) to efficiently and privately retrieve embeddings from servers without
sharing any private information. As off-the-shelf PIR algorithms are usually
too computationally intensive to directly use for latency-sensitive inference
tasks, we 1) propose novel GPU-based acceleration of PIR, and 2) co-design PIR
with the downstream ML application to obtain further speedup. Our GPU
acceleration strategy improves system throughput by more than over
an optimized CPU PIR implementation, and our PIR-ML co-design provides an over
additional throughput improvement at fixed model quality. Together,
for various on-device ML applications such as recommendation and language
modeling, our system on a single V100 GPU can serve up to queries per
second -- a throughput improvement over a CPU-based baseline --
while maintaining model accuracy.
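The core principle of retrieving an embedding row without revealing its index can be illustrated with a classical two-server XOR scheme (a toy sketch of PIR only; the paper's single-server construction is lattice-based and considerably more involved):

```python
import secrets
import numpy as np

def make_db(num_rows=16, dim=8, seed=0):
    """Toy embedding table: num_rows rows of dim bytes each."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 256, size=(num_rows, dim), dtype=np.uint8)

def make_queries(num_rows, index):
    """Client: two uniformly random bit vectors differing only at `index`,
    so each server's view in isolation is independent of the wanted row."""
    q0 = np.frombuffer(secrets.token_bytes(num_rows), dtype=np.uint8) & 1
    q1 = q0.copy()
    q1[index] ^= 1
    return q0, q1

def answer(db, q):
    """Server: XOR of the rows selected by the query bits."""
    acc = np.zeros(db.shape[1], dtype=np.uint8)
    for i in np.flatnonzero(q):
        acc ^= db[i]
    return acc

db = make_db()
q0, q1 = make_queries(len(db), index=5)
row = answer(db, q0) ^ answer(db, q1)      # all rows except index 5 cancel
```

Each server performs work linear in the table size regardless of which row is wanted, which is exactly the cost that makes GPU acceleration and ML co-design attractive for latency-sensitive inference.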