178 research outputs found
OpenCL-accelerated first-principles calculations of all-electron quantum perturbations on HPC resources
We propose, for the first time, an OpenCL implementation of all-electron density-functional perturbation theory (DFPT) calculations in FHI-aims. It can effectively compute all of the time-consuming simulation stages, i.e., the real-space integration of the response density, the Poisson solver for the calculation of the electrostatic potential, and the response Hamiltonian matrix, by utilizing various heterogeneous accelerators. Furthermore, to fully exploit the massively parallel computing capabilities, we have performed a series of general-purpose graphics processing unit (GPGPU)-targeted optimizations that significantly improved the execution efficiency by reducing register requirements, branch divergence, and memory transactions. Evaluations on the Sugon supercomputer have shown that notable speedups can be achieved across various materials.
Interactive drug-design: using advanced computing to evaluate the induced fit effect
This thesis describes the efforts made to provide protein flexibility in a molecular modelling
software application which, prior to this work, operated on rigid proteins and
semi-flexible ligands. Protein flexibility during molecular modelling simulations is a non-trivial
task requiring a great number of floating-point operations, and it could not be accomplished
without the help of high-performance hardware such as GPGPUs (or possibly Xeon Phi).
The thesis is structured as follows. It provides a background section, where the reader can
find the necessary context and references in order to be able to understand this report.
Next is a state of the art section, which describes what had been done in the fields of
molecular dynamics and flexible haptic protein-ligand docking prior to this work. An
implementation section follows, which lists failed efforts that provided the necessary
feedback in order to design efficient algorithms to accomplish this task.
Chapter 6 describes in detail an irregular-grid decomposition approach that provides
fast non-bonded interaction computations for GPGPUs. This technique is also associated
with algorithms that provide fast bonded interaction computations and exclusion handling
for 1-4 bonded atoms during the non-bonded force computation. Performance
benchmarks as well as accuracy tables for energy and force computations are provided to
demonstrate the efficiency of the methodologies explained in this chapter.
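The idea of restricting non-bonded work to spatially nearby atoms, while skipping excluded bonded pairs, can be illustrated with a small CPU sketch. This is a plain uniform cell list, not the irregular-grid scheme of the thesis, whose details the abstract does not give. The function names, the Lennard-Jones potential, and the `excluded` set of index pairs are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import product

def build_cells(positions, cutoff):
    """Bin atom indices into cubic cells whose edge length equals the cutoff."""
    cells = defaultdict(list)
    for i, (x, y, z) in enumerate(positions):
        key = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        cells[key].append(i)
    return cells

def lj_energy(positions, cutoff, excluded=frozenset(), epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy over pairs closer than the cutoff.

    `excluded` holds (i, j) index pairs with i < j that must be skipped,
    standing in for bonded exclusions (hypothetical input format).
    """
    cells = build_cells(positions, cutoff)
    energy = 0.0
    for (cx, cy, cz), members in cells.items():
        # Only the 27 surrounding cells can contain atoms within the cutoff.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            neighbours = cells.get((cx + dx, cy + dy, cz + dz), ())
            for i in members:
                for j in neighbours:
                    # j > i counts each pair exactly once.
                    if j <= i or (i, j) in excluded:
                        continue
                    r = math.dist(positions[i], positions[j])
                    if r < cutoff:
                        sr6 = (sigma / r) ** 6
                        energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy
```

On a GPU the same decomposition is typically mapped so that one work-group handles one cell, but that mapping is outside the scope of this sketch.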
Chapter 7 provides an overview of an evolutionary strategy used to overcome the problems
associated with the limited capabilities of local search strategies such as steepest descent,
which get trapped in the first local minimum they find. Our proposed method is able to
explore the potential energy landscape in such a way that it can pick competitive uphill
solutions to escape local minima in the hope of finding deeper valleys. This methodology
also provides a good number of conformational updates, so
that it is able to restore the areas of interaction between the protein and the ligand while
searching for globally optimal solutions.
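How an evolutionary strategy can accept uphill moves is easiest to see in a toy setting. The sketch below is a generic textbook (1, lambda) scheme, not the method of the thesis: the parent is always replaced by its best offspring, even when that offspring is worse, and the best point seen so far is recorded separately. The function names, the parameters, and the one-dimensional double-well landscape are assumptions chosen for illustration.

```python
import random

def comma_es(f, x0, sigma=0.5, offspring=5, generations=200, seed=0):
    """(1, lambda) evolution strategy on a scalar function.

    The parent is replaced by the best offspring in every generation,
    whether or not it improves on the parent, so the search can move
    uphill. Returns the best point seen and its function value.
    """
    rng = random.Random(seed)
    parent = x0
    best_x, best_f = x0, f(x0)
    for _ in range(generations):
        children = [parent + rng.gauss(0.0, sigma) for _ in range(offspring)]
        parent = min(children, key=f)
        fp = f(parent)
        if fp < best_f:
            best_x, best_f = parent, fp
    return best_x, best_f

def double_well(x):
    """Toy landscape: a shallow minimum at x > 0 and a deeper one at x < 0."""
    return x**4 - 4.0 * x**2 + x

if __name__ == "__main__":
    # Start in the shallow valley; whether the run reaches the deeper
    # valley depends on the seed and on sigma.
    print(comma_es(double_well, 1.35))
```

A steepest-descent search started at the same point would stay in the shallow valley by construction, which is the limitation the chapter sets out to address.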
Numerical methods for electronic structure calculations
In this thesis, several numerical methods for electronic structure calculations are presented. The first is a quadrature scheme for the accurate and efficient computation of electrostatic potentials. The quadrature is applied to calculations on real-space grids, and to Coulomb integrals over Gaussian-type orbitals. Second, we introduce a real-space representation for three-dimensional scalar functions encountered in electronic structure calculations. In this representation, each function is partitioned into numerical atom-centred parts (the bubbles), and the remainder is represented on a three-dimensional Cartesian grid. The algorithms to carry out the required operations are discussed, along with benchmarks of their computer implementations. The presented methods are all of a divide-and-conquer nature, breaking the problem into simple pieces which are suitable for execution on emerging massively parallel computer architectures, such as general-purpose graphics processing units.
Numerical methods for calculating the electronic structure of molecular systems are presented in this thesis. First, a quadrature for the accurate and efficient computation of electrostatic potentials is discussed. The quadrature is used for the numerical evaluation of Coulomb integrals over Gaussian orbitals. Next, a new numerical representation of three-dimensional scalar functions is introduced. The representation is used to describe functions that occur in electronic structure calculations. Each function is expressed numerically as atom-centred functions (bubbles) around each atom, and the remainder is represented numerically on a three-dimensional grid of points. The algorithms used to carry out mathematical operations and manipulations of the scalar functions are discussed, and the performance of their computer implementation is examined. The numerical approach belongs to the "divide and conquer" category, i.e. the problem is split into a number of simpler problems that are well suited to modern massively parallel computer architectures such as general-purpose graphics cards (GPGPU), which can be used for more demanding computational tasks.
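One standard way to turn the Coulomb kernel into a quadrature, given here as background and not necessarily in the exact form used in the thesis, is the Gaussian integral identity for 1/r. The quadrature nodes t_k and weights w_k below are placeholder symbols, not values taken from the source.

```latex
\frac{1}{r} = \frac{2}{\sqrt{\pi}} \int_0^{\infty} e^{-t^2 r^2}\,\mathrm{d}t
\qquad\Longrightarrow\qquad
V(\mathbf{r}) = \int \frac{\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,\mathrm{d}^3 r'
\approx \frac{2}{\sqrt{\pi}} \sum_k w_k \int \rho(\mathbf{r}')\,
e^{-t_k^2 |\mathbf{r}-\mathbf{r}'|^2}\,\mathrm{d}^3 r'
```

Each Gaussian factorizes into a product of one-dimensional factors in x, y and z, so every term in the sum is an independent, separable convolution, which fits the divide-and-conquer structure the abstract describes.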
Parallel Spherical Harmonic Transforms on heterogeneous architectures (GPUs/multi-core CPUs)
Spherical Harmonic Transforms (SHT) are at the heart of many scientific and
practical applications ranging from climate modelling to cosmological
observations. In many of these areas new, cutting-edge science goals have been
recently proposed requiring simulations and analyses of experimental or
observational data at very high resolutions and of unprecedented volumes. Both
these aspects pose a formidable challenge for the currently existing
implementations of the transforms.
This paper describes parallel algorithms for computing SHT with two variants
of intra-node parallelism appropriate for novel supercomputer architectures,
multi-core processors and Graphics Processing Units (GPUs). It also discusses
their performance, alone and embedded within a top-level, MPI-based
parallelisation layer ported from the S2HAT library, in terms of their
accuracy, overall efficiency and scalability. We show that our inverse SHT run
on GeForce 400 Series GPUs equipped with the latest CUDA architecture ("Fermi")
outperforms the state-of-the-art implementation for a multi-core processor
executed on a current Intel Core i7-2600K. Furthermore, we show that an
MPI/CUDA version of the inverse transform run on a cluster of 128 Nvidia Tesla
S1070 is as much as 3 times faster than the hybrid MPI/OpenMP version executed
on the same number of quad-core Intel Nehalem processors for problem sizes
motivated by our target applications. Performance of the direct transforms is
however found to be at best comparable in these cases. We discuss in detail
the algorithmic solutions devised for major steps involved in the transforms
calculation, emphasising those with a major impact on their overall
performance, and elucidate the sources of the dichotomy between the direct and
the inverse operations.