Parallel alternating iterative algorithms with and without overlapping on multicore architectures
We consider the problem of solving large sparse linear systems where the coefficient matrix is possibly singular but the equations are consistent. Block two-stage methods in which the inner iterations are performed using alternating methods are studied. These methods are well suited to parallel processing and provide a very general setting for studying parallel block methods, including overlapping. Convergence properties of these methods are established when the matrix in question is either an M-matrix or a symmetric matrix. Different parallel versions of these methods and implementation strategies, with and without overlapping blocks, are explored. The reported experiments show the behavior and effectiveness of the designed parallel algorithms, which exploit the benefits of shared memory inside the nodes of current SMP supercomputers. This research was partially supported by the Spanish Ministry of Science and Innovation under grant number TIN2011-26254, and by the European Union FEDER (CAPAP-H5 network TIN2014-53522-REDT).
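As a rough illustration of the block two-stage idea, the sketch below uses an outer block Jacobi splitting and, inside each block, a fixed number of alternating steps (a forward Gauss-Seidel sweep followed by a backward one). This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `two_stage_alternating`, the choice of Gauss-Seidel sweeps as the alternating pair, the non-overlapping blocks, and the small nonsingular test matrix are all illustrative choices that do not come from the abstract.

```python
def two_stage_alternating(A, b, blocks, inner=2, tol=1e-10, max_outer=5000):
    """Outer block Jacobi iteration with inexact inner solves.

    Each inner solve runs `inner` alternating steps; one step is a forward
    Gauss-Seidel sweep followed by a backward sweep (an assumed choice).
    `blocks` is a list of non-overlapping (lo, hi) index ranges.
    """
    n = len(b)
    x = [0.0] * n
    for outer in range(1, max_outer + 1):
        x_old = x[:]  # frozen outer iterate used by every block
        for lo, hi in blocks:  # blocks are independent, so they could run in parallel
            m = hi - lo
            # Right-hand side seen by this block: b minus the off-block coupling.
            rhs = [b[i] - sum(A[i][j] * x_old[j]
                              for j in range(n) if not lo <= j < hi)
                   for i in range(lo, hi)]
            y = x_old[lo:hi]
            for _ in range(inner):
                for order in (range(m), range(m - 1, -1, -1)):
                    for p in order:
                        s = sum(A[lo + p][lo + q] * y[q]
                                for q in range(m) if q != p)
                        y[p] = (rhs[p] - s) / A[lo + p][lo + p]
            x[lo:hi] = y
        residual = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n)))
                       for i in range(n))
        if residual < tol:
            return x, outer
    return x, max_outer


# Hypothetical test problem: 1D Laplacian (a nonsingular M-matrix), two blocks.
n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
      for j in range(n)] for i in range(n)]
b = [sum(row) for row in A]  # chosen so the exact solution is all ones
x, outer_iters = two_stage_alternating(A, b, blocks=[(0, 4), (4, 8)])
```

Because every block reads only the frozen outer iterate, the loop over `blocks` is the part that a shared-memory implementation would distribute across cores.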
Fermion Low Modes in Lattice QCD: Topology, the η' Mass and Algorithm Development
Lattice gauge theory is an important approach to understanding quantum chromodynamics (QCD) due to the large coupling constant in the theory at low energy. In this thesis, we report our study of the topological properties of the gauge fields and we calculate m_η and m_η', which are related to the topology of the gauge fields. We also develop two algorithms to speed up the inversion of the Dirac equation, which is computationally demanding in lattice QCD calculations.
The topology of lattice gauge fields is important but difficult to study because of the large local fluctuations of the gauge fields. In chapter 2, we probe the topological properties of the gauge fields through the measurement of closed quark loops, field strength and low-lying eigenvectors of the Shamir domain wall operator. The closed quark loops suggest the slow evolution of topological modes during the generation of QCD configurations. The chirality of the low-lying eigenvectors is studied and the lattice eigenvectors are compared to the eigenvectors in the continuous theory. The topological charges are calculated from the eigenvectors and the results agree with the topological charges calculated from the smoothed gauge fields. The fermion correlators are also obtained from the eigenvectors.
The non-trivial topological properties of QCD gauge fields are important to the masses of the η and η', m_η and m_η'. Lattice QCD is an area where m_η can be calculated by using gauge fields that are sampled over different topological sectors. We calculate m_η and m_η' in chapter 3 by including the fermion correlators and the topological charge density correlators. The errors of m_η and m_η' are reduced to the percent level, and the mixing angle between the octet and singlet states in the SU(3) limit and the physical eigenstates is calculated.
An algorithm that reduces communication and increases the usage of the local computational power is developed in chapter 4. The algorithm uses the multisplitting algorithm as a preconditioner in the preconditioned conjugate gradient method. It speeds up the inversion of the Dirac equation during the evolution phase.
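To make the structure of such a solver concrete, here is a minimal sketch of a preconditioned conjugate gradient iteration whose preconditioner only solves local diagonal blocks, so applying it needs no data from other blocks. It is an assumption-laden stand-in: the thesis's multisplitting preconditioner is not specified in the abstract, and block Jacobi with exact dense block solves is substituted here. The names `solve_dense`, `block_jacobi_preconditioner` and `pcg`, and the small symmetric positive definite test matrix, are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def solve_dense(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense block."""
    n = len(rhs)
    aug = [row[:] + [v] for row, v in zip(M, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def block_jacobi_preconditioner(A, blocks):
    """Return z = M^{-1} r with M the block diagonal of A (local solves only)."""
    def apply(r):
        z = [0.0] * len(r)
        for lo, hi in blocks:
            sub = [row[lo:hi] for row in A[lo:hi]]
            z[lo:hi] = solve_dense(sub, r[lo:hi])
        return z
    return apply


def pcg(A, b, precond, tol=1e-10, max_iter=100):
    """Standard preconditioned conjugate gradient for symmetric positive definite A."""
    x = [0.0] * len(b)
    r = b[:]
    z = precond(r)
    p = z[:]
    rz = dot(r, z)
    for k in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Hypothetical test problem: 1D Laplacian split into two local blocks.
n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
      for j in range(n)] for i in range(n)]
b = [sum(row) for row in A]  # exact solution is all ones
x, iters = pcg(A, b, block_jacobi_preconditioner(A, [(0, 4), (4, 8)]))
```

The design trade-off the paragraph describes shows up in `apply`: more work is done inside each block per iteration in exchange for fewer global iterations, and therefore fewer rounds of communication.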
In chapter 5, we utilize two lattices, called the coarse lattice and the fine lattice, that lie on the renormalization group trajectory and have different lattice spacings. We find that the low-mode space of the coarse lattice corresponds to the low-mode space of the fine lattice. Because of this correspondence, the coarse lattice can be used to solve for the low modes of the fine lattice. The coarse lattice is used in the restart algorithm and in the preconditioned conjugate gradient algorithm, where the latter is called the renormalization group based preconditioned conjugate gradient algorithm (RGPCG). By using the near-null vectors as the filter, RGPCG could reduce the number of matrix multiplications on the fine lattice by 33% to 44% for the inversion of the Dirac equation. The algorithm works better than the conjugate gradient algorithm when multiple equations are solved.
Lattice QCD Simulations towards Strong and Weak Coupling Limits
Lattice gauge theory is a special regularization of continuum gauge theories, and the numerical simulation of lattice quantum chromodynamics (QCD) remains the only first-principles method to study non-perturbative QCD at low energy. The lattice spacing a, which serves as the ultraviolet cutoff, plays a significant role in determining the error on any lattice simulation result. Physical results come from extrapolating a series of simulations with different values of a to a=0. Reducing the size of these errors for non-zero a improves the extrapolation and minimizes the error.
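The extrapolation to a=0 can be sketched as a least-squares fit. The example below assumes that the leading discretization error is proportional to a^2, which is an assumption made here for illustration, and it uses made-up, noise-free data; the function name `continuum_extrapolate` and all numbers are hypothetical.

```python
def continuum_extrapolate(spacings, values):
    """Fit O(a) = O_0 + c * a^2 by ordinary least squares.

    Returns (O_0, c), where O_0 is the value extrapolated to a = 0.
    """
    xs = [a * a for a in spacings]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic data (not from any simulation): O(a) = 0.5 + 0.3 * a^2.
spacings = [0.12, 0.09, 0.06]
values = [0.5 + 0.3 * a * a for a in spacings]
o_continuum, a2_coefficient = continuum_extrapolate(spacings, values)
```

With real data each point carries a statistical error, so a weighted fit would normally replace the unweighted one used here.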
In the strong coupling limit the coarse lattice spacing pushes the analysis of the finite lattice spacing error to its limit. Section 4 measures two renormalized physical observables, the neutral kaon mixing parameter B_K and the ΔI = 3/2 K → ππ decay amplitude A_2, on a lattice with a coarse lattice spacing of a ~ 1 GeV and explores the a^2 scaling properties at this scale.
In the weak coupling limit the lattice simulations suffer from critical slowing down where for the Monte Carlo Markov evolution the cost of generating decorrelated samples increases significantly as the lattice spacing decreases, which makes reliable error analysis on the results expensive. Among the observables the topological charge of the configurations appears to have the longest integrated autocorrelation time. Based on a previous work where a diffusion model is proposed to describe the evolution of the topological charge, section 2 extends this model to lattices with dynamical fermions using a new numerical method that captures the behavior for different Fourier modes.
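For readers unfamiliar with the quantity, the integrated autocorrelation time of a Monte Carlo time series can be estimated as tau_int = 1/2 + sum of the normalized autocorrelation function up to a cutoff. The sketch below uses a fixed summation window and a synthetic AR(1) series in place of a topological charge history; the function name `integrated_autocorrelation_time`, the window size, and the test series are illustrative assumptions, not the method of the thesis.

```python
import random


def integrated_autocorrelation_time(series, window):
    """Estimate tau_int = 1/2 + sum_{t=1}^{window} rho(t) for a 1D series."""
    n = len(series)
    mean = sum(series) / n
    centered = [s - mean for s in series]
    c0 = sum(c * c for c in centered) / n
    tau = 0.5
    for t in range(1, window + 1):
        ct = sum(centered[i] * centered[i + t] for i in range(n - t)) / (n - t)
        tau += ct / c0
    return tau


def ar1_series(phi, length, seed):
    """Synthetic correlated series x_{k+1} = phi * x_k + Gaussian noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(length):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


# For an AR(1) process the exact value is (1 + phi) / (2 * (1 - phi)).
tau_correlated = integrated_autocorrelation_time(ar1_series(0.8, 20000, seed=1), window=50)
tau_white = integrated_autocorrelation_time(ar1_series(0.0, 20000, seed=2), window=50)
```

A larger tau_int means fewer effectively independent samples per unit of Monte Carlo time, which is why the slowest observable sets the cost of a reliable error analysis.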
Section 3 describes our effort to find a practical renormalization group transformation to transform lattice QCD between two different scales, knowledge of which could ultimately lead to a multi-scale evolution algorithm that solves the problem of critical slowing down. For a particular choice of action, we have found that doubling the lattice spacing of a fine lattice yields observables that agree at the few percent level with direct simulations on the coarser lattice.
Section 5 aims at speeding up the lattice simulations in the weak coupling limit from the numerical method and hardware perspective. It proposes a preconditioner for solving the Dirac equation that targets the ensemble generation phase, and details its implementation on what is currently the fastest supercomputer in the world.
Algorithmes Parallèles Asynchrones pour la Simulation Numérique
In numerical simulation, the discretization of boundary value problems leads to the solution of large sparse linear systems. Given the current evolution of computer architectures, parallelizing the algorithms is a natural way to solve these problems. However, on parallel computers the waiting time caused by synchronization between cooperating processes becomes a penalty, and the loss is all the greater when the load is unbalanced. Parallel asynchronous algorithms make it possible to minimize the synchronization overhead without resorting to load-balancing techniques. They are iterative algorithms in which the components of the iteration vector are updated in parallel, in an arbitrary order and without synchronization. The restrictions imposed on these algorithms are very weak. Furthermore, the mathematical models that describe them accommodate very flexible parallel computation schemes and ensure, under certain hypotheses, the convergence of the iterative algorithms. The thesis is structured as follows. First, the mathematical models and convergence theorems for classical asynchronous iterations and for asynchronous iterations with flexible communication are presented. Then the implementation of the parallel asynchronous Schwarz algorithm using MPI (Message Passing Interface) is described. The synchronous and asynchronous versions of the parallel algorithm are compared on the solution of a three-dimensional convection-diffusion problem. The numerical experiments are carried out on the supercomputer of IDRIS (Institut du Développement et des Ressources en Informatique Scientifique), and they show the time savings obtained through asynchronism. Finally, the algorithms are applied to three-dimensional problems such as continuous-flow zone electrophoresis (whose mathematical model couples an incompressible Navier-Stokes equation, a convection-diffusion equation and a generalized Poisson equation) and the obstacle problem (which arises in mechanics and in financial mathematics). Performance studies have also been carried out for these applications.
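The defining features of an asynchronous iteration (components updated in arbitrary order, using data that may be out of date) can be imitated sequentially. The sketch below is a simulation only, under stated assumptions: it is a scalar Jacobi-type relaxation rather than the Schwarz algorithm, it uses no MPI, the delays are drawn at random with a fixed bound, and the matrix is chosen strictly diagonally dominant so that convergence is expected. The function name `simulated_async_jacobi` and all parameters are hypothetical.

```python
import random


def simulated_async_jacobi(A, b, steps=5000, max_delay=3, seed=0):
    """Relax one randomly chosen component per step using possibly stale values.

    `history` keeps the last max_delay + 1 iterates; each off-diagonal value is
    read from an iterate that is between 0 and max_delay steps old.
    """
    rng = random.Random(seed)
    n = len(b)
    history = [[0.0] * n]
    for _ in range(steps):
        x = history[-1][:]
        i = rng.randrange(n)  # arbitrary update order
        s = 0.0
        for j in range(n):
            if j != i:
                delay = rng.randint(0, min(max_delay, len(history) - 1))
                s += A[i][j] * history[-1 - delay][j]  # possibly outdated value
        x[i] = (b[i] - s) / A[i][i]
        history.append(x)
        if len(history) > max_delay + 1:
            history.pop(0)
    return history[-1]


# Hypothetical test problem: strictly diagonally dominant tridiagonal matrix.
n = 6
A = [[4.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
      for j in range(n)] for i in range(n)]
b = [sum(row) for row in A]  # exact solution is all ones
x = simulated_async_jacobi(A, b)
```

No step waits for any other, which is the property that removes synchronization idle time on a real parallel machine; the price is that convergence has to be guaranteed by the mathematical model rather than by a fixed update order.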
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.
Convergence of Non-Stationary Parallel Multisplitting Methods for Hermitian Positive Definite Matrices
Non-stationary multisplitting algorithms for the solution of linear systems are studied. Convergence of these algorithms is analyzed when the coefficient matrix of the linear system is Hermitian positive definite. Asynchronous versions of these algorithms are considered and their convergence is investigated.
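For orientation, a non-stationary multisplitting iteration is commonly written as follows. The notation is an assumption taken from the general multisplitting literature, not from this abstract: A = M_l - N_l for l = 1, ..., L are the splittings, E_l are nonnegative diagonal weighting matrices that sum to the identity, and s(l,k) is the number of local iterations that processor l performs at outer step k.

```latex
\begin{align*}
  x^{(k+1)} &= \sum_{l=1}^{L} E_l \, y_l^{(k)}, \qquad \sum_{l=1}^{L} E_l = I, \\
  y_l^{(k)} &= \left(M_l^{-1} N_l\right)^{s(l,k)} x^{(k)}
             + \sum_{j=0}^{s(l,k)-1} \left(M_l^{-1} N_l\right)^{j} M_l^{-1} b .
\end{align*}
```

Taking s(l,k) = 1 for all l and k gives the stationary multisplitting method; letting s(l,k) vary is what makes the scheme non-stationary.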