A parallel algorithm for Hamiltonian matrix construction in electron-molecule collision calculations: MPI-SCATCI
Construction and diagonalization of the Hamiltonian matrix is the
rate-limiting step in most low-energy electron-molecule collision
calculations. Tennyson (J Phys B, 29 (1996) 1817) implemented a novel algorithm
for Hamiltonian construction which took advantage of the structure of the
wavefunction in such calculations. This algorithm is re-engineered to make use
of modern computer architectures and the use of appropriate diagonalizers is
considered. Test calculations demonstrate that significant speed-ups can be
gained using multiple CPUs. This opens the way to calculations which consider
higher collision energies, larger molecules and/or more target states. The
methodology, which is implemented as part of the UK molecular R-matrix codes
(UKRMol and UKRMol+), can also be used for studies of bound molecular Rydberg
states, photoionisation and positron-molecule collisions.
Comment: Write-up of the computer program MPI-SCATCI. Computer Physics Communications, in press.
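The core idea the abstract describes, distributing construction of a large symmetric Hamiltonian over many processes, can be sketched in a few lines. This is an illustrative stand-in, not the UKRMol/MPI-SCATCI API: the function names and the toy matrix element are hypothetical, and a plain list comprehension stands in for the MPI gather.

```python
import numpy as np

def partition_rows(n, nprocs):
    """Split n matrix rows into near-equal contiguous blocks, one per process."""
    base, extra = divmod(n, nprocs)
    blocks, start = [], 0
    for p in range(nprocs):
        size = base + (1 if p < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

def element(i, j):
    # Hypothetical matrix element; a real scattering code evaluates
    # integrals between configuration state functions here.
    return 1.0 / (1.0 + abs(i - j))

def build_local_rows(rows, n):
    """Each rank builds only its own rows of the symmetric Hamiltonian."""
    return np.array([[element(i, j) for j in range(n)] for i in rows])

n, nprocs = 8, 3
blocks = partition_rows(n, nprocs)
H = np.vstack([build_local_rows(r, n) for r in blocks])  # stands in for a gather
print(H.shape)  # (8, 8)
```

Because each rank touches disjoint rows, no communication is needed during construction itself, which is why this step parallelizes well.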
A Parallel, Distributed Memory Implementation of the Adaptive Sampling Configuration Interaction Method
Many-body simulations of quantum systems are an active field of research that
involves many different methods targeting various computing platforms. Many
commonly employed methods, particularly coupled cluster methods, have been
adapted to leverage the latest advances in modern high-performance computing.
Selected configuration interaction (sCI) methods have seen extensive usage and
development in recent years. However, the development of sCI methods targeting
massively parallel resources has been explored only in a few research
targeting massively parallel resources has been explored only in a few research
works. In this work, we present a parallel, distributed memory implementation
of the adaptive sampling configuration interaction approach (ASCI) for sCI. In
particular, we will address key concerns pertaining to the parallelization of
the determinant search and selection, Hamiltonian formation, and the
variational eigenvalue calculation for the ASCI method. Load balancing in the
search step is achieved through the application of memory-efficient determinant
constraints originally developed for the ASCI-PT2 method. Presented benchmarks
demonstrate parallel efficiency exceeding 95% for the variational ASCI
calculation of Cr (24e,30o) with very large numbers of variational
determinants on up to 16,384 CPUs. To the best of the authors' knowledge, this
is the largest variational ASCI calculation to date.
Comment: 32 pages, 4 figures
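The determinant search-and-selection step mentioned above reduces, at its core, to ranking candidate determinants and keeping the best ones. The sketch below illustrates only that selection step; the scores are taken as given (in ASCI-like schemes they are perturbative estimates such as |Σᵢ H_ji cᵢ| / |E − H_jj|), and the function name is hypothetical.

```python
import heapq
import numpy as np

def asci_select(scores, k):
    """Keep the indices of the k determinants with the largest |ranking score|.

    Only the selection step is sketched here; computing the scores themselves
    (and load-balancing that computation across ranks) is the hard part that
    the paper's determinant constraints address.
    """
    return heapq.nlargest(k, range(len(scores)), key=lambda j: abs(scores[j]))

rng = np.random.default_rng(0)
scores = rng.normal(size=1000)   # stand-in ranking scores
chosen = asci_select(scores, 50)
kept = set(chosen)
# every kept score dominates every discarded one
assert min(abs(scores[j]) for j in kept) >= max(
    abs(scores[i]) for i in range(1000) if i not in kept)
```

Using a heap-based top-k avoids a full sort of the candidate list, which matters when the candidate space is orders of magnitude larger than the kept set.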
The parallel computation of the smallest eigenpair of an acoustic problem with damping
Acoustic problems with damping may give rise to large quadratic eigenproblems. Efficient and parallelizable algorithms are required for solving these problems. The recently proposed Jacobi-Davidson method is well suited for parallel computing: no matrix decomposition and no back or forward substitutions are needed. This paper describes the parallel solution of the smallest eigenpair of a realistic and very large quadratic eigenproblem with the Jacobi-Davidson method.
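The quadratic structure can be made concrete: a damped problem (λ²M + λC + K)x = 0 is commonly linearized to a standard eigenproblem of twice the size. The sketch below uses a dense NumPy solver as a stand-in for Jacobi-Davidson (which is designed precisely to avoid such factorizations on large problems); the function name is hypothetical.

```python
import numpy as np

def quadratic_eigs(M, C, K):
    """Linearize (lam^2 M + lam C + K) x = 0 into A z = lam z
    with z = [x, lam*x] and the 2n x 2n companion matrix A."""
    n = M.shape[0]
    Minv = np.linalg.inv(M)
    A = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-Minv @ K, -Minv @ C]])
    lam, Z = np.linalg.eig(A)
    return lam, Z[:n]

# 1-D damped oscillator: eigenvalues are the roots of lam^2 + 0.2 lam + 1 = 0
M = np.eye(1); C = 0.2 * np.eye(1); K = np.eye(1)
lam, _ = quadratic_eigs(M, C, K)
assert np.allclose(np.sort_complex(lam), np.sort_complex(np.roots([1, 0.2, 1])))
```

For the large problems the paper targets, one would apply an iterative method to the linearization (or work on the quadratic problem directly, as Jacobi-Davidson allows) rather than form and factor these dense blocks.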
Parallel sparse matrix-vector multiplication as a test case for hybrid MPI+OpenMP programming
We evaluate optimized parallel sparse matrix-vector operations for two
representative application areas on widespread multicore-based cluster
configurations. First the single-socket baseline performance is analyzed and
modeled with respect to basic architectural properties of standard multicore
chips. Going beyond the single node, parallel sparse matrix-vector operations
often suffer from an unfavorable communication to computation ratio. Starting
from the observation that nonblocking MPI is not able to hide communication
cost using standard MPI implementations, we demonstrate that explicit overlap
of communication and computation can be achieved by using a dedicated
communication thread, which may run on a virtual core. We compare our approach
to pure MPI and the widely used "vector-like" hybrid programming strategy.
Comment: 12 pages, 6 figures
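The dedicated-communication-thread idea can be sketched as follows. This is a minimal single-process illustration, not the paper's MPI code: a Python thread stands in for the communication thread, a callback stands in for the MPI receive, and dense blocks stand in for the sparse local/remote column partition of the matrix.

```python
import threading
import numpy as np

def spmv_with_overlap(A_local, A_remote, x_local, recv_remote):
    """y = A_local @ x_local + A_remote @ x_remote, where the remote part of
    x arrives via a routine running on a dedicated thread, so its cost is
    hidden behind the local part of the multiplication."""
    box = {}
    comm = threading.Thread(target=lambda: box.update(x=recv_remote()))
    comm.start()                  # communication proceeds in the background
    y = A_local @ x_local         # overlap: local SpMV runs meanwhile
    comm.join()                   # wait for the remote entries
    return y + A_remote @ box["x"]

rng = np.random.default_rng(1)
A = rng.random((6, 10))
x = rng.random(10)
# columns 0..5 are "local", columns 6..9 live on other ranks
y = spmv_with_overlap(A[:, :6], A[:, 6:], x[:6], lambda: x[6:])
assert np.allclose(y, A @ x)
```

The split of A into local and remote column blocks mirrors the usual distributed SpMV structure: rows needing only local vector entries can proceed immediately, and only the remote contribution waits on communication.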
PHIST: a Pipelined, Hybrid-parallel Iterative Solver Toolkit
The increasing complexity of hardware and software environments in high-performance computing poses big challenges for the
development of sustainable and hardware-efficient numerical software. This paper addresses these challenges in the context of sparse
solvers. Existing solutions typically target sustainability, flexibility or performance, but rarely all of them.
Our new library PHIST provides implementations of solvers for sparse linear systems and eigenvalue problems. It is a productivity
platform for performance-aware developers of algorithms and application software with abstractions that do not obscure the view on
hardware-software interaction.
The PHIST software architecture and the PHIST development process were designed to overcome shortcomings of existing packages.
An interface layer for basic sparse linear algebra functionality that can be provided by multiple backends ensures sustainability, and
PHIST supports common techniques for improving scalability and performance of algorithms such as blocking and kernel fusion.
We showcase these concepts using the PHIST implementation of a block Jacobi-Davidson solver for non-Hermitian and generalized
eigenproblems. We study its performance on a multi-core CPU, a GPU and a large-scale many-core system. Furthermore, we show
how an existing implementation of a block Krylov-Schur method in the Trilinos package Anasazi can benefit from the performance
engineering techniques used in PHIST.
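The blocking technique mentioned above has a simple performance rationale: applying the operator to k vectors in one blocked kernel streams the matrix through memory once instead of k times. The sketch below illustrates the numerical equivalence with dense NumPy arrays as a stand-in for PHIST's sparse kernels; the function names are illustrative.

```python
import numpy as np

def matvec_one_by_one(A, X):
    # k separate matrix-vector products: A is streamed through memory k times
    return np.stack([A @ X[:, j] for j in range(X.shape[1])], axis=1)

def matvec_blocked(A, X):
    # one blocked kernel: A is streamed once for all k vectors, which is the
    # memory-bandwidth saving that block methods exploit
    return A @ X

rng = np.random.default_rng(2)
A = rng.random((50, 50))
X = rng.random((50, 4))        # a block of 4 vectors
assert np.allclose(matvec_one_by_one(A, X), matvec_blocked(A, X))
```

Kernel fusion follows the same logic one level up: combining, say, a matvec with a following dot product or axpy into one loop avoids writing intermediate vectors back to main memory.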
Computing and deflating eigenvalues while solving multiple right hand side linear systems in Quantum Chromodynamics
We present a new algorithm that computes eigenvalues and eigenvectors of a
Hermitian positive definite matrix while solving a linear system of equations
with Conjugate Gradient (CG). Traditionally, all the CG iteration vectors could
be saved and recombined through the eigenvectors of the tridiagonal projection
matrix, which is equivalent theoretically to unrestarted Lanczos. Our algorithm
capitalizes on the iteration vectors produced by CG to update only a small
window of vectors that approximate the eigenvectors. While this window is
restarted in a locally optimal way, the CG algorithm for the linear system is
unaffected. Yet, in all our experiments, this small window converges to the
required eigenvectors at a rate identical to unrestarted Lanczos. After the
solution of the linear system, eigenvectors that have not accurately converged
can be improved in an incremental fashion by solving additional linear systems.
In this case, eigenvectors identified in earlier systems can be used to
deflate, and thus accelerate, the convergence of subsequent systems. We have
used this algorithm with excellent results in lattice QCD applications, where
hundreds of right hand sides may be needed. Specifically, about 70 eigenvectors
are obtained to full accuracy after solving 24 right hand sides. Deflating
these from the large number of subsequent right hand sides removes the dreaded
critical slowdown, where the conditioning of the matrix increases as the quark
mass reaches a critical value. Our experiments show almost a constant number of
iterations for our method, regardless of quark mass, and speedups of 8 over
the original CG for light quark masses.
Comment: 22 pages, 26 eps figures
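The deflation step described above can be sketched in a generic form: given approximate eigenvectors W of the smallest eigenvalues, solve for the solution components along W directly, then run CG on the remainder, whose effective spectrum no longer contains those small eigenvalues. This is a textbook init-CG deflation sketch under stated assumptions (exact SPD matrix, exact eigenvectors), not the paper's eigCG algorithm; the function name is hypothetical.

```python
import numpy as np

def deflated_cg(A, b, W, tol=1e-10, maxit=500):
    """CG with an eigenvector deflation step: components of the solution
    along the columns of W are solved directly first (Galerkin correction),
    removing the smallest eigenvalues from CG's effective spectrum."""
    x = W @ np.linalg.solve(W.T @ A @ W, W.T @ b)
    r = b - A @ x
    p = r.copy()
    rr = r @ r
    for it in range(maxit):
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) < tol * np.linalg.norm(b):
            return x, it + 1
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x, maxit

# SPD test matrix with three tiny eigenvalues that the deflation removes
rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.random((80, 80)))
d = np.concatenate([[1e-4, 1e-3, 1e-2], np.linspace(1, 10, 77)])
A = Q @ np.diag(d) @ Q.T
b = rng.random(80)
x_defl, it_defl = deflated_cg(A, b, Q[:, :3])
assert np.allclose(A @ x_defl, b, atol=1e-6)
```

Because the initial residual is orthogonal to the deflated (invariant) subspace, the CG iterates never re-excite those directions, so the iteration count is governed by the remaining, well-conditioned part of the spectrum, which is what removes the critical slowdown.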
Spectral approximation of matrices arising from discretized operators
In this thesis, we consider the numerical solution of a large eigenvalue problem in which the integral operator comes from a radiative transfer problem. We consider the use of hierarchical matrices, an efficient data-sparse representation of matrices that is especially useful for large-dimensional problems. The basic idea is the division of a matrix into a hierarchy of blocks and the approximation of certain blocks by low-rank matrices, leading to low memory requirements as well as cheap computational costs. We discuss the use of the hierarchical matrix technique in the numerical solution of a large-scale eigenvalue problem arising from a finite rank discretization of an integral operator. The operator is of convolution type, is defined through the first exponential-integral function, and hence is weakly singular. We use HLIB (Hierarchical matrices LIBrary), which provides, among others, routines for the construction of hierarchical matrix structures and arithmetic algorithms to perform approximate matrix operations. Moreover, the matrix-vector multiplication routines from HLIB, as well as its LU factorization for preconditioning, are incorporated into SLEPc (Scalable Library for Eigenvalue Problem Computations) in order to exploit the available algorithms for solving eigenvalue problems. Analytical expressions are also developed for the approximate degenerate kernels used in the thesis, and error upper bounds are deduced for these approximations. Numerical results obtained with other approaches to the problem are used for comparison with those obtained with this technique, illustrating the efficiency of the techniques developed and implemented in this work.
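The compression idea behind hierarchical matrices can be demonstrated directly: an off-diagonal block of a matrix discretizing a smooth kernel on well-separated point clusters is numerically low rank. The sketch below uses a truncated SVD as a simple stand-in for the adaptive cross approximation typically used in practice; the kernel and function name are illustrative, not the thesis's exponential-integral kernel.

```python
import numpy as np

def lowrank_compress(B, eps=1e-8):
    """Truncated SVD of a matrix block: the core compression behind
    hierarchical matrices (admissible blocks are stored as U @ V.T)."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    k = int(np.searchsorted(-s, -eps * s[0]))  # rank for relative accuracy eps
    return U[:, :k] * s[:k], Vt[:k]

# Off-diagonal block of a smooth kernel on well-separated clusters
x = np.linspace(0.0, 1.0, 200)
y = np.linspace(2.0, 3.0, 200)   # separation makes the block "admissible"
B = 1.0 / (1.0 + np.abs(x[:, None] - y[None, :]))
Us, Vt = lowrank_compress(B)
assert Us.shape[1] < 25                  # far smaller than the full rank 200
assert np.allclose(Us @ Vt, B, atol=1e-5)
```

Storing U and V instead of the full 200 x 200 block cuts both memory and matrix-vector cost from O(n^2) to O(kn) per block, which is the saving the thesis exploits for the eigenvalue computation.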
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.