
    A parallel algorithm for Hamiltonian matrix construction in electron-molecule collision calculations: MPI-SCATCI

    Construction and diagonalization of the Hamiltonian matrix is the rate-limiting step in most low-energy electron-molecule collision calculations. Tennyson (J Phys B, 29 (1996) 1817) implemented a novel algorithm for Hamiltonian construction which took advantage of the structure of the wavefunction in such calculations. This algorithm is re-engineered to make use of modern computer architectures, and the use of appropriate diagonalizers is considered. Test calculations demonstrate that significant speed-ups can be gained using multiple CPUs. This opens the way to calculations which consider higher collision energies, larger molecules and/or more target states. The methodology, which is implemented as part of the UK molecular R-matrix codes (UKRMol and UKRMol+), can also be used for studies of bound molecular Rydberg states, photoionisation and positron-molecule collisions. Comment: Write-up of the computer program MPI-SCATCI; Computer Physics Communications, in press.

    A Parallel, Distributed Memory Implementation of the Adaptive Sampling Configuration Interaction Method

    Many-body simulation of quantum systems is an active field of research that involves many different methods targeting various computing platforms. Many commonly employed methods, particularly coupled cluster methods, have been adapted to leverage the latest advances in modern high-performance computing. Selected configuration interaction (sCI) methods have seen extensive usage and development in recent years. However, the development of sCI methods targeting massively parallel resources has been explored in only a few research works. In this work, we present a parallel, distributed memory implementation of the adaptive sampling configuration interaction approach (ASCI) for sCI. In particular, we address key concerns pertaining to the parallelization of the determinant search and selection, Hamiltonian formation, and the variational eigenvalue calculation for the ASCI method. Load balancing in the search step is achieved through the application of memory-efficient determinant constraints originally developed for the ASCI-PT2 method. Presented benchmarks demonstrate parallel efficiency exceeding 95% for the variational ASCI calculation of Cr$_2$ (24e,30o) with $10^6$, $10^7$, and $3 \times 10^8$ variational determinants on up to 16,384 CPUs. To the best of the authors' knowledge, this is the largest variational ASCI calculation to date. Comment: 32 pages, 4 figures.

    The parallel computation of the smallest eigenpair of an acoustic problem with damping

    Acoustic problems with damping may give rise to large quadratic eigenproblems. Efficient and parallelizable algorithms are required for solving these problems. The recently proposed Jacobi-Davidson method is well suited for parallel computing: no matrix decomposition and no back or forward substitutions are needed. This paper describes the parallel computation of the smallest eigenpair of a realistic and very large quadratic eigenproblem with the Jacobi-Davidson method.

    Parallel sparse matrix-vector multiplication as a test case for hybrid MPI+OpenMP programming

    We evaluate optimized parallel sparse matrix-vector operations for two representative application areas on widespread multicore-based cluster configurations. First, the single-socket baseline performance is analyzed and modeled with respect to basic architectural properties of standard multicore chips. Going beyond the single node, parallel sparse matrix-vector operations often suffer from an unfavorable communication-to-computation ratio. Starting from the observation that nonblocking MPI is not able to hide communication cost using standard MPI implementations, we demonstrate that explicit overlap of communication and computation can be achieved by using a dedicated communication thread, which may run on a virtual core. We compare our approach to pure MPI and the widely used "vector-like" hybrid programming strategy. Comment: 12 pages, 6 figures.
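
For orientation, the serial kernel underlying a sparse matrix-vector operation can be sketched as follows. This is a minimal illustration only: the compressed sparse row (CSR) layout, the function name `csr_matvec`, and the small example matrix are assumptions made here and are not taken from the paper, and no MPI or threading is shown.

```python
import numpy as np

def csr_matvec(data, indices, indptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative sketch)."""
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        # Nonzeros of row i occupy positions indptr[i] .. indptr[i+1]-1.
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

# Hypothetical 3x3 example: [[4, 0, 1], [0, 3, 0], [2, 0, 5]]
data = np.array([4.0, 1.0, 3.0, 2.0, 5.0])
indices = np.array([0, 2, 1, 0, 2])
indptr = np.array([0, 2, 3, 5])
x = np.array([1.0, 2.0, 3.0])
y = csr_matvec(data, indices, indptr, x)
```

In a distributed setting, each process would typically own a block of rows, and the entries of `x` referenced through `indices` that live on other processes are what must be communicated before the corresponding products can be formed.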

    PHIST: a Pipelined, Hybrid-parallel Iterative Solver Toolkit

    The increasing complexity of hardware and software environments in high-performance computing poses big challenges for the development of sustainable and hardware-efficient numerical software. This paper addresses these challenges in the context of sparse solvers. Existing solutions typically target sustainability, flexibility or performance, but rarely all of them. Our new library PHIST provides implementations of solvers for sparse linear systems and eigenvalue problems. It is a productivity platform for performance-aware developers of algorithms and application software with abstractions that do not obscure the view on hardware-software interaction. The PHIST software architecture and the PHIST development process were designed to overcome shortcomings of existing packages. An interface layer for basic sparse linear algebra functionality that can be provided by multiple backends ensures sustainability, and PHIST supports common techniques for improving scalability and performance of algorithms such as blocking and kernel fusion. We showcase these concepts using the PHIST implementation of a block Jacobi-Davidson solver for non-Hermitian and generalized eigenproblems. We study its performance on a multi-core CPU, a GPU and a large-scale many-core system. Furthermore, we show how an existing implementation of a block Krylov-Schur method in the Trilinos package Anasazi can benefit from the performance engineering techniques used in PHIST.

    Approximation spectrale de matrices issues d'opérateurs discrétisés

    In this thesis, we consider the numerical solution of a large eigenvalue problem in which the integral operator comes from a radiative transfer problem. We consider the use of hierarchical matrices, an efficient data-sparse representation of matrices that is especially useful for large dimensional problems. The representation divides a matrix into a hierarchy of blocks and approximates certain blocks by low-rank matrices, leading to low memory requirements as well as cheap computational costs. We discuss the use of the hierarchical matrix technique in the numerical solution of a large scale eigenvalue problem arising from a finite rank discretization of an integral operator. The operator is of convolution type; it is defined through the first exponential-integral function and hence is weakly singular. We use HLIB (Hierarchical matrices LIBrary), which provides, among others, routines for the construction of hierarchical matrix structures and arithmetic algorithms to perform approximate matrix operations. Moreover, we incorporate the matrix-vector multiply routines from HLIB, as well as LU factorization for preconditioning, into SLEPc (Scalable Library for Eigenvalue Problem Computations) in order to exploit the available algorithms to solve eigenvalue problems. We also develop analytical expressions for the approximate degenerate kernels and deduce error upper bounds for these approximations. The numerical results obtained with other approaches to solve the problem are compared with the ones obtained with this technique, illustrating the efficiency of the techniques developed and implemented in this work.

    Computing and deflating eigenvalues while solving multiple right hand side linear systems in Quantum Chromodynamics

    We present a new algorithm that computes eigenvalues and eigenvectors of a Hermitian positive definite matrix while solving a linear system of equations with Conjugate Gradient (CG). Traditionally, all the CG iteration vectors could be saved and recombined through the eigenvectors of the tridiagonal projection matrix, which is equivalent theoretically to unrestarted Lanczos. Our algorithm capitalizes on the iteration vectors produced by CG to update only a small window of vectors that approximate the eigenvectors. While this window is restarted in a locally optimal way, the CG algorithm for the linear system is unaffected. Yet, in all our experiments, this small window converges to the required eigenvectors at a rate identical to unrestarted Lanczos. After the solution of the linear system, eigenvectors that have not accurately converged can be improved in an incremental fashion by solving additional linear systems. In this case, eigenvectors identified in earlier systems can be used to deflate, and thus accelerate, the convergence of subsequent systems. We have used this algorithm with excellent results in lattice QCD applications, where hundreds of right hand sides may be needed. Specifically, about 70 eigenvectors are obtained to full accuracy after solving 24 right hand sides. Deflating these from the large number of subsequent right hand sides removes the dreaded critical slowdown, where the conditioning of the matrix increases as the quark mass reaches a critical value. Our experiments show an almost constant number of iterations for our method, regardless of quark mass, and speedups of a factor of 8 over the original CG for light quark masses. Comment: 22 pages, 26 eps figures.
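
The "traditional" connection mentioned above, between CG and the tridiagonal projection matrix of unrestarted Lanczos, can be sketched as follows. This is a minimal illustration of the classical textbook relation only; it is not the windowed, restarted algorithm the abstract proposes. The function name `cg_with_lanczos` and the small diagonal test matrix are assumptions made here for the example.

```python
import numpy as np

def cg_with_lanczos(A, b, steps):
    """Run plain CG on A x = b and assemble the Lanczos tridiagonal matrix T
    from the CG coefficients (classical relation; illustrative sketch)."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    alphas, betas = [], []
    for _ in range(steps):
        Ap = A @ p
        alpha = rs / (p @ Ap)      # CG step length
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        beta = rs_new / rs         # CG direction-update coefficient
        p = r + beta * p
        rs = rs_new
        alphas.append(alpha)
        betas.append(beta)
    T = np.zeros((steps, steps))
    for j in range(steps):
        T[j, j] = 1.0 / alphas[j]
        if j > 0:
            T[j, j] += betas[j - 1] / alphas[j - 1]
        if j < steps - 1:
            T[j, j + 1] = T[j + 1, j] = np.sqrt(betas[j]) / alphas[j]
    return x, T

# Hypothetical test problem: a 6x6 SPD matrix with eigenvalues 1..6.
A = np.diag(np.arange(1.0, 7.0))
b = np.ones(6)
x, T = cg_with_lanczos(A, b, steps=6)
ritz = np.linalg.eigvalsh(T)  # Ritz values approximate eigenvalues of A
```

The eigenvalues of `T` are the Ritz values; recovering the corresponding eigenvectors this way would require storing every CG residual vector, which is the storage cost the abstract's small-window approach is designed to avoid.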

    A bibliography on parallel and vector numerical algorithms

    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are also listed.