
    OpenCL-accelerated first-principles calculations of all-electron quantum perturbations on HPC resources

    We have proposed, for the first time, an OpenCL implementation of all-electron density-functional perturbation theory (DFPT) calculations in FHI-aims that efficiently computes all of their time-consuming simulation stages, i.e., the real-space integration of the response density, the Poisson solver for the electrostatic potential, and the response Hamiltonian matrix, on a variety of heterogeneous accelerators. Furthermore, to fully exploit massively parallel computing capabilities, we have applied a series of general-purpose graphics processing unit (GPGPU)-targeted optimizations that significantly improve execution efficiency by reducing register requirements, branch divergence, and memory transactions. Evaluations on the Sugon supercomputer show that notable speedups can be achieved across various materials.
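    The response Hamiltonian matrix mentioned above is assembled by real-space integration on a grid. As a hedged, generic illustration only (not FHI-aims code; the names `integrate_matrix`, `phi`, `weights` and `potential` are hypothetical), a matrix element of a local potential can be approximated by the quadrature H_ij ≈ Σ_p w_p φ_i(r_p) v(r_p) φ_j(r_p). Every grid point contributes independently, which is the kind of data parallelism a GPU kernel can exploit.

```python
def integrate_matrix(phi, weights, potential):
    """Approximate H[i][j] = sum_p w_p * phi_i(r_p) * v(r_p) * phi_j(r_p).

    phi[i][p]    -- value of (hypothetical) basis function i at grid point p
    weights[p]   -- quadrature weight of grid point p
    potential[p] -- local potential v evaluated at grid point p
    """
    nbasis = len(phi)
    npts = len(weights)
    # Fold weight and potential together once per grid point.
    wv = [weights[p] * potential[p] for p in range(npts)]
    H = [[0.0] * nbasis for _ in range(nbasis)]
    for i in range(nbasis):
        # The matrix is symmetric, so only the upper triangle is integrated.
        for j in range(i, nbasis):
            acc = sum(phi[i][p] * wv[p] * phi[j][p] for p in range(npts))
            H[i][j] = acc
            H[j][i] = acc
    return H
```

    For example, with the three-point trapezoidal weights [0.25, 0.5, 0.25] on [0, 1], v = 1, and basis values sampled from φ_1(x) = 1 and φ_2(x) = x, the sketch returns the overlap-like matrix [[1.0, 0.5], [0.5, 0.375]].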

    Interactive drug-design: using advanced computing to evaluate the induced fit effect

    This thesis describes the efforts made to provide protein flexibility in a molecular modelling software application which, prior to this work, operated with rigid proteins and semi-flexible ligands. Modelling protein flexibility during molecular simulations is a non-trivial task that requires a great number of floating-point operations, and it could not be accomplished without the help of high-performance computing hardware such as GPGPUs (or possibly Xeon Phi). The thesis is structured as follows. A background section provides the context and references necessary to understand this report. Next, a state-of-the-art section describes what had been done in the fields of molecular dynamics and flexible haptic protein-ligand docking prior to this work. An implementation section follows, which lists failed efforts that provided the feedback needed to design efficient algorithms for this task. Chapter 6 describes in detail an irregular-grid decomposition approach that provides fast non-bonded interaction computations on GPGPUs. This technique is accompanied by algorithms for fast bonded interaction computations and for handling exclusions of 1-4 bonded atoms during the non-bonded force computation. Performance benchmarks and accuracy tables for energy and force computations are provided to demonstrate the efficiency of the methods explained in this chapter. Chapter 7 gives an overview of an evolutionary strategy used to overcome the limitations of local search strategies such as steepest descent, which get trapped in the first local minimum they find. Our proposed method explores the potential energy landscape in a way that lets it accept competitive uphill solutions to escape local minima, in the hope of finding deeper valleys. This methodology also provides a good number of conformational updates, so that it can restore the areas of interaction between the protein and the ligand while searching for globally optimal solutions.
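    The idea behind grid decomposition for non-bonded interactions is that atoms are binned spatially so each atom is only tested against atoms in nearby cells. The sketch below is a simplified stand-in, not the thesis's method: it uses a uniform cell list (cell edge equal to the cutoff) rather than an irregular grid, a plain Lennard-Jones potential, and simply skips every pair listed in `excluded`. Many force fields scale 1-4 pairs instead of dropping them, which this sketch does not model. The function name and parameters are hypothetical.

```python
import itertools
import math
from collections import defaultdict


def lj_energy_cell_list(coords, cutoff, excluded, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy over atom pairs closer than `cutoff`.

    Atoms are binned into cubic cells of edge `cutoff`, so any partner within
    the cutoff lies in the same or one of the 26 adjacent cells. Pairs given
    as frozenset({i, j}) in `excluded` (e.g. bonded neighbours) are skipped.
    """
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(coords):
        key = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        cells[key].append(idx)

    cutoff2 = cutoff * cutoff
    energy = 0.0
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            # .get() does not insert empty cells into the defaultdict.
            neighbours = cells.get((cx + dx, cy + dy, cz + dz), [])
            for i in members:
                for j in neighbours:
                    # j <= i visits each unordered pair once and skips self-pairs.
                    if j <= i or frozenset((i, j)) in excluded:
                        continue
                    r2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
                    if r2 < cutoff2:
                        sr6 = (sigma * sigma / r2) ** 3
                        energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy
```

    The cell list reduces the pair search from all N² pairs to a neighbourhood of each atom; the irregular grid described in the thesis is presumably meant to balance work better when atom density is uneven, which a uniform grid does not address.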

    Parallel Triplet Finding for Particle Track Reconstruction [With a detailed German summary]


    Numerical methods for electronic structure calculations

    In this thesis, several numerical methods for electronic structure calculations of molecular systems are presented. The first is a quadrature scheme for the accurate and efficient computation of electrostatic potentials. The quadrature is applied to calculations on real-space grids and to Coulomb integrals over Gaussian-type orbitals. Second, we introduce a real-space representation for three-dimensional scalar functions encountered in electronic structure calculations. In this representation, each function is partitioned into numerical atom-centred parts (the bubbles), and the remainder is represented on a three-dimensional Cartesian grid. The algorithms that carry out the required operations are discussed, along with benchmarks of their computer implementations. The presented methods are all of a divide-and-conquer nature, breaking the problem into simple pieces that are suitable for execution on emerging massively parallel computer architectures, such as general-purpose graphics processing units.
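    A common way to build such a quadrature (an assumption here; the abstract does not state the exact scheme) starts from the identity 1/r = (2/√π) ∫_0^∞ exp(−t²r²) dt. Discretizing the t-integral turns 1/r into a weighted sum of Gaussians, and each Gaussian factorizes into x, y and z parts, which is what makes the idea convenient on Cartesian grids and for Gaussian-type orbitals. The sketch substitutes t = e^s and applies the trapezoidal rule on a truncated s-range; the function name and default parameters are hypothetical.

```python
import math


def coulomb_quadrature(r, s_min=-30.0, s_max=10.0, h=0.1):
    """Approximate 1/r as a finite sum of Gaussians exp(-t_k^2 r^2).

    Uses 1/r = (2/sqrt(pi)) * int_0^inf exp(-t^2 r^2) dt with t = exp(s),
    which gives the smooth, rapidly decaying integrand exp(-r^2 e^{2s}) e^s
    on the real line; the trapezoidal rule converges very fast for it.
    """
    n = int(round((s_max - s_min) / h))
    total = 0.0
    for k in range(n + 1):
        s = s_min + k * h
        t = math.exp(s)  # quadrature node t_k
        weight = 0.5 if k in (0, n) else 1.0  # trapezoidal end-point weights
        total += weight * math.exp(-(r * t) ** 2) * t
    return 2.0 / math.sqrt(math.pi) * h * total
```

    With these defaults the sum uses 401 Gaussians and reproduces 1/r to roughly machine precision for r between about 0.5 and 5; production schemes typically tune the node placement to reach a target accuracy with far fewer terms.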

    Parallel Spherical Harmonic Transforms on heterogeneous architectures (GPUs/multi-core CPUs)

    Spherical Harmonic Transforms (SHT) are at the heart of many scientific and practical applications, ranging from climate modelling to cosmological observations. In many of these areas, new cutting-edge science goals have recently been proposed that require simulations and analyses of experimental or observational data at very high resolutions and in unprecedented volumes. Both aspects pose a formidable challenge for currently existing implementations of the transforms. This paper describes parallel algorithms for computing SHT with two variants of intra-node parallelism suited to novel supercomputer architectures: multi-core processors and Graphics Processing Units (GPUs). It also discusses their performance, both alone and embedded within a top-level MPI-based parallelisation layer ported from the S2HAT library, in terms of accuracy, overall efficiency, and scalability. We show that our inverse SHT run on GeForce 400 Series GPUs equipped with the latest CUDA architecture ("Fermi") outperforms the state-of-the-art implementation for a multi-core processor executed on a current Intel Core i7-2600K. Furthermore, we show that an MPI/CUDA version of the inverse transform run on a cluster of 128 Nvidia Tesla S1070 is as much as 3 times faster than the hybrid MPI/OpenMP version executed on the same number of quad-core Intel Nehalem processors, for problem sizes motivated by our target applications. Performance of the direct transforms, however, is found to be at best comparable in these cases. We discuss in detail the algorithmic solutions devised for the major steps involved in computing the transforms, emphasising those with a major impact on overall performance, and elucidate the sources of the dichotomy between the direct and the inverse operations.
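    A standard way to organise an inverse SHT on iso-latitude rings is in two steps per ring θ: a Legendre step Δ_m(θ) = Σ_l a_lm λ_lm(θ), which dominates the cost, followed by an azimuthal sum f(θ, φ) = Σ_m Δ_m(θ) e^{imφ}. The sketch below illustrates that structure for a real field; it is not the S2HAT or paper code, the function names are hypothetical, λ_lm are orthonormalised associated Legendre functions with the Condon-Shortley phase, and the azimuthal step uses a direct sum where real implementations use an FFT per ring.

```python
import cmath
import math


def legendre_lambda(lmax, m, x):
    """Orthonormalised associated Legendre values lambda_lm(x) for l = m..lmax.

    lambda_lm(cos theta) * exp(i m phi) equals Y_lm(theta, phi), including
    the Condon-Shortley phase (-1)^m. Entries below index m are left at 0.
    """
    s = math.sqrt(max(0.0, 1.0 - x * x))  # sin(theta)
    prod = 1.0
    for k in range(1, m + 1):
        prod *= (2 * k - 1) / (2 * k)
    lam = [0.0] * (lmax + 1)
    lam[m] = (-1) ** m * math.sqrt((2 * m + 1) / (4 * math.pi) * prod) * s ** m
    if m + 1 <= lmax:
        lam[m + 1] = x * math.sqrt(2 * m + 3) * lam[m]
    # Stable three-term recurrence in l at fixed m.
    for l in range(m + 2, lmax + 1):
        a = math.sqrt((4 * l * l - 1) / (l * l - m * m))
        b = math.sqrt(((l - 1) ** 2 - m * m) / (4 * (l - 1) ** 2 - 1))
        lam[l] = a * (x * lam[l - 1] - b * lam[l - 2])
    return lam


def inverse_sht(alm, lmax, thetas, nphi):
    """Synthesise a real field on rings of constant theta from a_lm with m >= 0.

    alm maps (l, m) to a coefficient; missing entries are treated as zero.
    Returns one list of nphi samples (phi_j = 2*pi*j/nphi) per ring.
    """
    rings = []
    for theta in thetas:
        x = math.cos(theta)
        # Step 1 (Legendre step): Delta_m(theta) = sum_l a_lm * lambda_lm(theta).
        delta = []
        for m in range(lmax + 1):
            lam = legendre_lambda(lmax, m, x)
            delta.append(sum(alm.get((l, m), 0) * lam[l] for l in range(m, lmax + 1)))
        # Step 2 (azimuthal step): a real field has a_{l,-m} = (-1)^m conj(a_lm),
        # so the m and -m terms combine into 2 * Re(Delta_m * e^{i m phi}).
        ring = []
        for j in range(nphi):
            phi = 2.0 * math.pi * j / nphi
            value = delta[0].real
            for m in range(1, lmax + 1):
                value += 2.0 * (delta[m] * cmath.exp(1j * m * phi)).real
            ring.append(value)
        rings.append(ring)
    return rings
```

    Because Step 1 costs O(l_max²) per ring while Step 2 is an FFT of length n_φ, the Legendre step is the natural target for GPU parallelisation across rings and m-values.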