281 research outputs found
Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 1. Generalized Born
We present an implementation of generalized Born implicit
solvent
all-atom classical molecular dynamics (MD) within the AMBER program
package that runs entirely on CUDA enabled NVIDIA graphics processing
units (GPUs). We discuss the algorithms that are used to exploit the
processing power of the GPUs and show the performance that can be
achieved in comparison to simulations on conventional CPU clusters.
The implementation supports three different precision models in which
the contributions to the forces are calculated in single precision
floating point arithmetic but accumulated in double precision (SPDP),
or everything is computed in single precision (SPSP) or double precision
(DPDP). In addition to performance, we have focused on understanding
the implications of the different precision models on the outcome
of implicit solvent MD simulations. We show results for a range of
tests including the accuracy of single point force evaluations and
energy conservation as well as structural properties pertainining
to protein dynamics. The numerical noise due to rounding errors within
the SPSP precision model is sufficiently large to lead to an accumulation
of errors which can result in unphysical trajectories for long time
scale simulations. We recommend the use of the mixed-precision SPDP
model since the numerical results obtained are comparable with those
of the full double precision DPDP model and the reference double precision
CPU implementation but at significantly reduced computational cost.
Our implementation provides performance for GB simulations on a single
desktop that is on par with, and in some cases exceeds, that of traditional
supercomputers
High-performance and hardware-aware computing: proceedings of the second International Workshop on New Frontiers in High-performance and Hardware-aware Computing (HipHaC\u2711), San Antonio, Texas, USA, February 2011 ; (in conjunction with HPCA-17)
High-performance system architectures are increasingly exploiting heterogeneity. The HipHaC workshop aims at combining new aspects of parallel, heterogeneous, and reconfigurable microprocessor technologies with concepts of high-performance computing and, particularly, numerical solution methods. Compute- and memory-intensive applications can only benefit from the full
hardware potential if all features on all levels are taken into account in a holistic approach
Constraint programming on a heterogeneous multicore architecture
As bibliotecas para programação com restrições são úteis ao desenvolverem-se aplicações em linguagens de programação normalmente mais utilizadas pois não necessitam que os programadores aprendam uma. Nova, linguagem, fornecendo ferramentas de programação declarativa para utilização com os sistemas convencionais. Algumas soluções para programação com restrições favorecem completude, tais como sistemas baseados em propagação. Outras estão mais interessadas em obter uma boa solução rapidamente, rejeitando a necessidade de encontram todas as soluções; esta sendo a alternativa utilizada nos sistemas de pesquisa local. Conceber soluções hÃbridas (propagação + pesquisa local) parece prometedor pois as vantagens de ambas alternativas podem ser combinadas numa única solução. As arquiteturas paralelas são cada vez mais comuns, em parte devido à disponibilidade em grande escala, de sistemas individuais mas também devido à tendência em generalizar o uso de processadores multicore ou seja., processadores com várias unidades de processamento. Nesta tese é proposta uma. Arquitetura para resolvedores de restrições mistos, de pendendo de métodos de propagação e pesquisa local, a qual foi concebida para funcionar eficazmente numa arquitetura. Heterogéneo multiprocessador. /ABSTRACT - Constraint programming libraries are useful when building applications developed mostly in mainstrearn programming languages: they do not require the developers to acquire skills for a new language, providing instead declarative programming tools for use within conventional systems. Some approaches to constraint programming favour completeness, such as propagation-based systems. Others are more interested in getting to a good solution fast, regardless of whether all solutions may be found; this approach is used in local search systems. Designing hybrid approaches (propagation + local search) seems promising since the advantages may be combined into a single approach. Parallel architectures are becoming more commonplace, partly due to the large-scale availability of individual systems but also because of the trend towards generalizing the use of multicore microprocessors. In this thesis an architecture for mixed constraint solvers is proposed, relying both on propagation and local search, which is designed to function effectively in a heterogeneous multicore architecture
Computational Methods in Science and Engineering : Proceedings of the Workshop SimLabs@KIT, November 29 - 30, 2010, Karlsruhe, Germany
In this proceedings volume we provide a compilation of article contributions equally covering applications from different research fields and ranging from capacity up to capability computing. Besides classical computing aspects such as parallelization, the focus of these proceedings is on multi-scale approaches and methods for tackling algorithm and data complexity. Also practical aspects regarding the usage of the HPC infrastructure and available tools and software at the SCC are presented
Parallel cryptanalysis
Most of today’s cryptographic primitives are based on computations that are hard to perform for a potential attacker but easy to perform for somebody who is in possession of some secret information, the key, that opens a back door in these hard computations and allows them to be solved in a small amount of time. To estimate the strength of a cryptographic primitive it is important to know how hard it is to perform the computation without knowledge of the secret back door and to get an understanding of how much money or time the attacker has to spend. Usually a cryptographic primitive allows the cryptographer to choose parameters that make an attack harder at the cost of making the computations using the secret key harder as well. Therefore designing a cryptographic primitive imposes the dilemma of choosing the parameters strong enough to resist an attack up to a certain cost while choosing them small enough to allow usage of the primitive in the real world, e.g. on small computing devices like smart phones. This thesis investigates three different attacks on particular cryptographic systems: Wagner’s generalized birthday attack is applied to the compression function of the hash function FSB. Pollard’s rho algorithm is used for attacking Certicom’s ECC Challenge ECC2K-130. The implementation of the XL algorithm has not been specialized for an attack on a specific cryptographic primitive but can be used for attacking some cryptographic primitives by solving multivariate quadratic systems. All three attacks are general attacks, i.e. they apply to various cryptographic systems; the implementations of Wagner’s generalized birthday attack and Pollard’s rho algorithm can be adapted for attacking other primitives than those given in this thesis. The three attacks have been implemented on different parallel architectures. XL has been parallelized using the Block Wiedemann algorithm on a NUMA system using OpenMP and on an Infiniband cluster using MPI. Wagner’s attack was performed on a distributed system of 8 multi-core nodes connected by an Ethernet network. The work on Pollard’s Rho algorithm is part of a large research collaboration with several research groups; the computations are embarrassingly parallel and are executed in a distributed fashion in several facilities with almost negligible communication cost. This dissertation presents implementations of the iteration function of Pollard’s Rho algorithm on Graphics Processing Units and on the Cell Broadband Engine
Recommended from our members
ATOMISTIC SIMULATIONS OF INTRINSICALLY DISORDERED PROTEIN FOLDING AND DYNAMICS
Intrinsically disordered proteins (IDPs) are crucial in biology and human diseases, necessitating a comprehensive understanding of their structure, dynamics, and interactions. Atomistic simulations have emerged as a key tool for unraveling the molecular intricacies and establishing mechanistic insights into how these proteins facilitate diverse biological functions. However, achieving accurate simulations requires both an appropriate protein force field capable of describing the energy landscape of functionally relevant IDP conformations and sufficient conformational sampling to capture the free energy landscape of IDP dynamics. These factors are fundamental in comprehending potential IDP structures, dynamics, and interactions.
I first conducted explicit solvent simulations to assess the performance of two state-of-the-art protein force fields, namely CHARMM36m and a99SB-disp, in capturing the stability of small protein-protein interactions. To evaluate their accuracy, I selected a set of 46 amino acid backbone and side chain pairs with representative configurations and computed the free energy profiles of their interactions. The results demonstrated that CHARMM36m consistently predicted stronger protein-protein interactions compared to a99SB-disp. Notably, the most significant overestimation in CHARMM36m occurred in charged pairs involving Arg and Glu side chains, with an overestimation of up to 2.9 kcal/mol. Through free energy decomposition analysis, I determined that these overestimations were primarily driven by protein-water electrostatic interactions rather than van der Waals (vdW) interactions. Consequently, these findings suggest that careful rebalancing of electrostatic interactions should be considered in the further optimization of protein force fields.
In order to enhance the conformational sampling of IDPs, I developed an integrated approach that combines an improved implicit solvent model called Generalized Born with molecular volume and solvent accessible surface area (GBMV2/SA) with a multiscale enhanced sampling (MSES) technique. To make this approach more efficient, I implemented it as a standalone OpenMM plugin on Graphics Processing Units (GPUs). The results demonstrated that the GPU-GBMV2/SA model achieved numerical equivalence to the original CPU-GBMV2/SA models, while providing a remarkable ~60x speedup on a single NVIDIA TITAN X (Pascal) graphics card for molecular dynamic simulations of both folded and unstructured proteins. This significant acceleration greatly facilitated the application of the approach in biomolecular simulations.
In addition, I conducted an evaluation of the reliability of GBMV2/SA models in simulating both folded and unfolded proteins. The results revealed that the GBMV2/SA model accurately describes small proteins, but its applicability is limited when it comes to larger proteins such as KID and p53-TAD proteins. This limitation can be attributed to the absence of long-range solute-solvent dispersion interactions in the model. To address this issue, I introduced a comprehensive treatment of nonpolar solvation free energy called GBMV2/NP model. Unfortunately, the GBMV2/NP model exhibited a destabilizing effect on well-folded proteins, particularly larger ones, due to an inaccurate representation of the repulsive solvent accessible surface area (SASA) model caused by the utilization of unphysical van der Waals volume. This observation highlights the need for further improvements in accurately describing the nonpolar term in the model
A hybrid parallel framework for computational solid mechanics
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2011.Cataloged from PDF version of thesis.Includes bibliographical references (p. 95-98).A novel, hybrid parallel C++ framework for computational solid mechanics is developed and presented. The modular and extensible design of this framework allows it to support a wide variety of numerical schemes including discontinuous Galerkin formulations and higher order methods, multiphysics problems, hybrid meshes made of different types of elements and a number of different linear and non-linear solvers. In addition, native, seamless support is included for hardware acceleration by Graphics Processing Units (GPUs) via NVIDIA's CUDA architecture for both single GPU workstations and heterogenous clusters of GPUs. The capabilities of the framework are demonstrated through a series of sample problems, including a laser induced cylindrical shock propagation, a dynamic problem involving a micro-truss array made of millions of elements, and a tension problem involving a shape memory alloy with a multifield formulation to model the superelastic effect.by Piotr Fidkowski.S.M
Dense and sparse parallel linear algebra algorithms on graphics processing units
Una lÃnea de desarrollo seguida en el campo de la supercomputación es el uso de procesadores de propósito especÃfico para acelerar determinados tipos de cálculo. En esta tesis estudiamos el uso de tarjetas gráficas como aceleradores de la computación y lo aplicamos al ámbito del álgebra lineal. En particular trabajamos con la biblioteca SLEPc para resolver problemas de cálculo de autovalores en matrices de gran dimensión, y para aplicar funciones de matrices en los cálculos de aplicaciones cientÃficas. SLEPc es una biblioteca paralela que se basa en el estándar MPI y está desarrollada con la premisa de ser escalable, esto es, de permitir resolver problemas más grandes al aumentar las unidades de procesado.
El problema lineal de autovalores, Ax = lambda x en su forma estándar, lo abordamos con el uso de técnicas iterativas, en concreto con métodos de Krylov, con los que calculamos una pequeña porción del espectro de autovalores. Este tipo de algoritmos se basa en generar un subespacio de tamaño reducido (m) en el que proyectar el problema de gran dimensión (n), siendo m << n. Una vez se ha proyectado el problema, se resuelve este mediante métodos directos, que nos proporcionan aproximaciones a los autovalores del problema inicial que querÃamos resolver. Las operaciones que se utilizan en la expansión del subespacio varÃan en función de si los autovalores deseados están en el exterior o en el interior del espectro. En caso de buscar autovalores en el exterior del espectro, la expansión se hace mediante multiplicaciones matriz-vector. Esta operación la realizamos en la GPU, bien mediante el uso de bibliotecas o mediante la creación de funciones que aprovechan la estructura de la matriz. En caso de autovalores en el interior del espectro, la expansión requiere resolver sistemas de ecuaciones lineales. En esta tesis implementamos varios algoritmos para la resolución de sistemas de ecuaciones lineales para el caso especÃfico de matrices con estructura tridiagonal a bloques, que se ejecutan en GPU.
En el cálculo de las funciones de matrices hemos de diferenciar entre la aplicación directa de una función sobre una matriz, f(A), y la aplicación de la acción de una función de matriz sobre un vector, f(A)b. El primer caso implica un cálculo denso que limita el tamaño del problema. El segundo permite trabajar con matrices dispersas grandes, y para resolverlo también hacemos uso de métodos de Krylov. La expansión del subespacio se hace mediante multiplicaciones matriz-vector, y hacemos uso de GPUs de la misma forma que al resolver autovalores. En este caso el problema proyectado comienza siendo de tamaño m, pero se incrementa en m en cada reinicio del método. La resolución del problema proyectado se hace aplicando una función de matriz de forma directa. Nosotros hemos implementado varios algoritmos para calcular las funciones de matrices raÃz cuadrada y exponencial, en las que el uso de GPUs permite acelerar el cálculo.One line of development followed in the field of supercomputing is the use of specific purpose processors to speed up certain types of computations. In this thesis we study the use of graphics processing units as computer accelerators and apply it to the field of linear algebra. In particular, we work with the SLEPc library to solve large scale eigenvalue problems, and to apply matrix functions in scientific applications. SLEPc is a parallel library based on the MPI standard and is developed with the premise of being scalable, i.e. to allow solving larger problems by increasing the processing units.
We address the linear eigenvalue problem, Ax = lambda x in its standard form, using iterative techniques, in particular with Krylov's methods, with which we calculate a small portion of the eigenvalue spectrum. This type of algorithms is based on generating a subspace of reduced size (m) in which to project the large dimension problem (n), being m << n. Once the problem has been projected, it is solved by direct methods, which provide us with approximations of the eigenvalues of the initial problem we wanted to solve. The operations used in the expansion of the subspace vary depending on whether the desired eigenvalues are from the exterior or from the interior of the spectrum. In the case of searching for exterior eigenvalues, the expansion is done by matrix-vector multiplications. We do this on the GPU, either by using libraries or by creating functions that take advantage of the structure of the matrix. In the case of eigenvalues from the interior of the spectrum, the expansion requires solving linear systems of equations. In this thesis we implemented several algorithms to solve linear systems of equations for the specific case of matrices with a block-tridiagonal structure, that are run on GPU.
In the computation of matrix functions we have to distinguish between the direct application of a matrix function, f(A), and the action of a matrix function on a vector, f(A)b. The first case involves a dense computation that limits the size of the problem. The second allows us to work with large sparse matrices, and to solve it we also make use of Krylov's methods. The expansion of subspace is done by matrix-vector multiplication, and we use GPUs in the same way as when solving eigenvalues. In this case the projected problem starts being of size m, but it is increased by m on each restart of the method. The solution of the projected problem is done by directly applying a matrix function. We have implemented several algorithms to compute the square root and the exponential matrix functions, in which the use of GPUs allows us to speed up the computation.Una lÃnia de desenvolupament seguida en el camp de la supercomputació és l'ús de processadors de propòsit especÃfic per a accelerar determinats tipus de cà lcul. En aquesta tesi estudiem l'ús de targetes grà fiques com a acceleradors de la computació i ho apliquem a l'à mbit de l'à lgebra lineal. En particular treballem amb la biblioteca SLEPc per a resoldre problemes de cà lcul d'autovalors en matrius de gran dimensió, i per a aplicar funcions de matrius en els cà lculs d'aplicacions cientÃfiques. SLEPc és una biblioteca paral·lela que es basa en l'està ndard MPI i està desenvolupada amb la premissa de ser escalable, açò és, de permetre resoldre problemes més grans en augmentar les unitats de processament.
El problema lineal d'autovalors, Ax = lambda x en la seua forma està ndard, ho abordem amb l'ús de tècniques iteratives, en concret amb mètodes de Krylov, amb els quals calculem una xicoteta porció de l'espectre d'autovalors. Aquest tipus d'algorismes es basa a generar un subespai de grandà ria reduïda (m) en el qual projectar el problema de gran dimensió (n), sent m << n. Una vegada s'ha projectat el problema, es resol aquest mitjançant mètodes directes, que ens proporcionen aproximacions als autovalors del problema inicial que volÃem resoldre. Les operacions que s'utilitzen en l'expansió del subespai varien en funció de si els autovalors desitjats estan en l'exterior o a l'interior de l'espectre. En cas de cercar autovalors en l'exterior de l'espectre, l'expansió es fa mitjançant multiplicacions matriu-vector. Aquesta operació la realitzem en la GPU, bé mitjançant l'ús de biblioteques o mitjançant la creació de funcions que aprofiten l'estructura de la matriu. En cas d'autovalors a l'interior de l'espectre, l'expansió requereix resoldre sistemes d'equacions lineals. En aquesta tesi implementem diversos algorismes per a la resolució de sistemes d'equacions lineals per al cas especÃfic de matrius amb estructura tridiagonal a blocs, que s'executen en GPU.
En el cà lcul de les funcions de matrius hem de diferenciar entre l'aplicació directa d'una funció sobre una matriu, f(A), i l'aplicació de l'acció d'una funció de matriu sobre un vector, f(A)b. El primer cas implica un cà lcul dens que limita la grandà ria del problema. El segon permet treballar amb matrius disperses grans, i per a resoldre-ho també fem ús de mètodes de Krylov. L'expansió del subespai es fa mitjançant multiplicacions matriu-vector, i fem ús de GPUs de la mateixa forma que en resoldre autovalors. En aquest cas el problema projectat comença sent de grandà ria m, però s'incrementa en m en cada reinici del mètode. La resolució del problema projectat es fa aplicant una funció de matriu de forma directa. Nosaltres hem implementat diversos algorismes per a calcular les funcions de matrius arrel quadrada i exponencial, en les quals l'ús de GPUs permet accelerar el cà lcul.Lamas Daviña, A. (2018). Dense and sparse parallel linear algebra algorithms on graphics processing units [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/112425TESI
- …