9 research outputs found

    A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks

    No full text
    The adoption of hybrid CPU–GPU nodes in traditional supercomputing platforms such as the Cray-XK6 opens acceleration opportunities for electronic structure calculations in materials science and chemistry applications, where medium-sized generalized eigenvalue problems must be solved many times. These eigenvalue problems are too small to effectively solve on distributed systems, but can benefit from the massive computing power concentrated on a single-node, hybrid CPU–GPU system. However, hybrid systems call for the development of new algorithms that efficiently exploit heterogeneity and massive parallelism of not just GPUs, but of multicore/manycore CPUs as well. Addressing these demands, we developed a generalized eigensolver featuring novel algorithms of increased computational intensity (compared with the standard algorithms), decomposition of the computation into fine-grained memory aware tasks, and their hybrid execution. The resulting eigensolvers are state-of-the-art in high-performance computing, significantly outperforming existing libraries. We describe the algorithm and analyze its performance impact on applications of interest when different fractions of eigenvectors are needed by the host electronic structure code. </jats:p

    Performance optimizations for porting the openQ*D package to GPUs

    No full text
    OpenQ*D code has been used by the RC* collaboration for the generation of fully dynamical QCD+QED gauge configurations with C* boundary conditions. In this talk, optimization of solvers provided with the openQ*D package relevant for porting the code on GPU-accelerated supercomputing platforms is discussed. We present the analysis of the current implementations of the GCR solver preconditioned with Schwarz alternating procedure for ill-conditioned Dirac-operators. With the goal of enabling support for GPUs from various vendors, a novel method of adaptive CPU/GPU-hybrid implementation is proposed.ISSN:1824-803

    Red-Blue Pebbling Revisited: Near Optimal Parallel Matrix-Matrix Multiplication

    No full text
    We propose COSMA: a parallel matrix-matrix multiplication algorithm that is near communication-optimal for all combinations of matrix dimensions, processor counts, and memory sizes. The key idea behind COSMA is to derive an optimal (up to a factor of 0.03% for 10MB of fast memory) sequential schedule and then parallelize it, preserving I/O optimality. To achieve this, we use the red-blue pebble game to precisely model MMM dependencies and derive a constructive and tight sequential and parallel I/O lower bound proofs. Compared to 2D or 3D algorithms, which fix processor decomposition upfront and then map it to the matrix dimensions, it reduces communication volume by up to √ times. COSMA outperforms the established ScaLAPACK, CARMA, and CTF algorithms in all scenarios up to 12.8x (2.2x on average), achieving up to 88% of Piz Daint's peak performance. Our work does not require any hand tuning and is maintained as an open source implementation

    DCA++ project: Sustainable and scalable development of a high-performance research code

    No full text
    Scientific discoveries across all fields, from physics to biology, are increasingly driven by computer simulations. At the same time, the computational demand of many problems necessitates large-scale calculations on high-performance supercomputers. Developing and maintaining the underlying codes, however, has become a challenging task due to a combination of factors. Leadership computer systems require massive parallelism, while their architectures are diversifying. New sophisticated algorithms are continuously developed and have to be implemented efficiently for such complex systems. Finally, the multidisciplinary nature of modern science involves large, changing teams to work on a given codebase. Using the example of the DCA++ project, a highly scalable and efficient research code to solve quantum many-body problems, we explore how computational science can overcome these challenges by adopting modern software engineering approaches. We present our principles for scientific software development and describe concrete practices to meet them, adapted from agile software development frameworks.ISSN:1742-6588ISSN:1742-659

    DCA++: A software framework to solve correlated electron problems with modern quantum cluster methods

    No full text
    We present the first open release of the DCA++ project, a high-performance research software framework to solve quantum many-body problems with cutting edge quantum cluster algorithms. DCA++ implements the dynamical cluster approximation (DCA) and its DCA+ extension with a continuous self-energy. The algorithms capture nonlocal correlations in strongly correlated electron systems, thereby giving insight into high-Tc superconductivity. The code's scalability allows efficient usage of systems at all scales, from workstations to leadership computers. With regard to the increasing heterogeneity of modern computing machines, DCA++ provides portable performance on conventional and emerging new architectures, such as hybrid CPU–GPU, sustaining multiple petaflops on ORNL's Titan and CSCS’ Piz Daint supercomputers. Moreover, we show how sustainable and scalable development of the code base has been achieved by adopting standard techniques of the software industry. These include employing a distributed version control system, applying test-driven development and following continuous integration.ISSN:0010-4655ISSN:1879-294
    corecore