
    A Novel Partitioning Method for Accelerating the Block Cimmino Algorithm

    We propose a novel block-row partitioning method that improves the convergence rate of the block Cimmino algorithm for solving general sparse linear systems of equations. The convergence rate of the block Cimmino algorithm depends on the orthogonality among the block rows produced by the partitioning. The proposed method accounts for numerical orthogonality among block rows through a row inner-product graph model of the coefficient matrix. In the graph partitioning formulation defined on this model, minimizing the cutsize directly corresponds to minimizing the sum of inter-block inner products between block rows, which improves the eigenvalue spectrum of the iteration matrix and, in turn, significantly reduces the number of iterations required for convergence. Extensive experiments on a large set of matrices confirm the validity of the proposed method against a state-of-the-art method.
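    As a rough illustration of the objective described in this abstract, the sketch below (NumPy, with a small hypothetical dense matrix, and squared inner products as edge weights since the abstract does not specify the exact weighting) builds a row inner-product graph and checks that the cutsize of a two-way row partition equals the sum of inter-block inner products computed directly.

```python
import numpy as np

# Hypothetical 6x6 matrix; its rows are to be split into block rows.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))

# Row inner-product graph: vertices = rows, edge weight between rows i and j
# taken here as the squared inner product (an assumption; the paper defines
# the exact weighting).
G = (A @ A.T) ** 2
np.fill_diagonal(G, 0.0)

def cutsize(blocks):
    """Sum of edge weights crossing between different blocks."""
    total = 0.0
    for bi, rows_i in enumerate(blocks):
        for rows_j in blocks[bi + 1:]:
            total += G[np.ix_(rows_i, rows_j)].sum()
    return total

# Two candidate 2-way partitions of the rows; the one with the smaller
# cutsize has block rows that are closer to mutually orthogonal, which is
# what speeds up block Cimmino.
p1 = [[0, 1, 2], [3, 4, 5]]
p2 = [[0, 2, 4], [1, 3, 5]]
print(cutsize(p1), cutsize(p2))
```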

    Distributed Solution of Large-Scale Linear Systems via Accelerated Projection-Based Consensus

    Solving a large-scale system of linear equations is a key step at the heart of many algorithms in machine learning, scientific computing, and beyond. When the problem dimension is large, computational and/or memory constraints make it desirable, or even necessary, to perform the task in a distributed fashion. In this paper, we consider a common scenario in which a taskmaster intends to solve a large-scale system of linear equations by distributing subsets of the equations among a number of computing machines/cores. We propose an accelerated distributed consensus algorithm, in which at each iteration every machine updates its solution by adding a scaled version of the projection of an error signal onto the nullspace of its system of equations, and where the taskmaster averages the solutions with momentum. The convergence behavior of the proposed algorithm is analyzed in detail and analytically shown to compare favorably with the convergence rates of alternative distributed methods, namely distributed gradient descent, distributed versions of Nesterov's accelerated gradient descent and the heavy-ball method, the block Cimmino method, and ADMM. On randomly chosen linear systems, as well as on real-world data sets, the proposed method offers significant speed-up relative to all the aforementioned methods. Finally, our analysis suggests a novel variation of the distributed heavy-ball method, which employs a particular distributed preconditioning, and which achieves the same theoretical convergence rate as the proposed consensus-based method.
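    The non-accelerated core of this scheme can be sketched as follows. This is a minimal illustrative reading with plain averaging (no momentum) and unit step sizes, not the paper's APC algorithm or its tuned parameters: each machine keeps a solution of its own subsystem and moves toward the master's average inside its local solution set.

```python
import numpy as np

# Toy consistent system A x = b split between two "machines".
rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
x_true = rng.standard_normal(n)
b = A @ x_true
parts = [(A[:2], b[:2]), (A[2:], b[2:])]

# P_i projects onto the nullspace of A_i; x_i starts as one solution of the
# local subsystem, so A_i x_i = b_i is preserved at every iteration.
states = []
for Ai, bi in parts:
    Pi = np.eye(n) - np.linalg.pinv(Ai) @ Ai
    xi = np.linalg.pinv(Ai) @ bi
    states.append((Pi, xi))

xbar = np.zeros(n)
for _ in range(5000):
    # Machines move toward the master's average inside their solution sets.
    states = [(Pi, xi + Pi @ (xbar - xi)) for Pi, xi in states]
    # Master averages (plain averaging; APC adds momentum at this step).
    xbar = np.mean([xi for _, xi in states], axis=0)

print(np.linalg.norm(xbar - x_true))  # error should be small
```

    With unit steps, the master's error evolves under the average of the local nullspace projections, whose spectral radius is below one whenever the subsystems' nullspaces intersect only at zero, so the plain scheme already converges; the momentum terms accelerate this rate.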

    MUMPS-based approach to parallelize the block Cimmino algorithm

    The Cimmino method is a row projection method in which the original linear system is divided into subsystems. At every iteration, it computes one projection per subsystem and uses these projections to construct an approximation to the solution of the linear system. The usual parallelization strategy in block algorithms is to distribute the different blocks among the available processors. In this paper, we follow another approach: instead of performing this block distribution to processors explicitly within the code, we let the multifrontal sparse solver MUMPS handle the data distribution and parallelism. The data coming from the subsystems defined by the block partition in the block Cimmino method are gathered into a unique matrix, which is analysed, distributed, and factorized in parallel by MUMPS. Our goal is to define a methodology for parallelism based only on the functionalities provided by general sparse solver libraries, and to assess how efficient this approach can be. We relate the development of this new approach from an existing code written in Fortran 77 to the MUMPS-embedded version. The results of the ongoing numerical experiments will be presented at the conference.
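    A minimal serial sketch of the block Cimmino iteration described above (NumPy; the pseudoinverse applications stand in for the subsystem solves that MUMPS performs in the paper, and the damping factor omega is one simple convergent choice, not the paper's):

```python
import numpy as np

# Toy consistent system A x = b split into three block rows.
rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n))
x_true = rng.standard_normal(n)
b = A @ x_true

# One subsystem per block row, with its pseudoinverse precomputed (a direct
# solver such as MUMPS plays this role in the paper).
blocks = [(A[i:i + 2], b[i:i + 2], np.linalg.pinv(A[i:i + 2]))
          for i in (0, 2, 4)]

# Each term below is an orthogonal projection step onto {y : A_i y = b_i},
# so omega = 1/(number of blocks) keeps the iteration matrix's spectrum
# inside [0, 1) and guarantees convergence for a nonsingular A.
omega = 1.0 / len(blocks)

x = np.zeros(n)
for _ in range(50000):
    # One projection per subsystem, then a damped sum of the corrections.
    x = x + omega * sum(Ai_pinv @ (bi - Ai @ x)
                        for Ai, bi, Ai_pinv in blocks)

print(np.linalg.norm(A @ x - b))  # residual should be small
```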

    Hybrid methods for the solution of large sparse linear systems on parallel computers

    We are interested in solving large sparse systems of linear equations in parallel. Computing the solution of such systems requires a large amount of memory and computational power. The two main approaches are direct and iterative methods. The former is fast and accurate but has a large memory footprint, while the latter is memory friendly but can be slow to reach a solution of sufficient quality. In this work we first combine both approaches to create a hybrid solver that is memory efficient while remaining fast and robust. We then improve this solver by introducing a novel pseudo-direct method that circumvents some drawbacks of the earlier approach. In the first chapters we examine row projection techniques, in particular the block Cimmino method, some of their numerical aspects, and how these affect convergence. We then analyse the acceleration of these techniques with the conjugate gradient method and show how this acceleration can be improved with a block version of conjugate gradients. Next, we look at how the partitioning of the linear system also affects convergence and how its quality can be improved. We then examine the parallel implementation of the hybrid solver, its performance, and possible improvements. The last two chapters introduce an improvement to this hybrid solver: the numerical properties of the linear system are improved so that convergence is reached in a single iteration, yielding a pseudo-direct solver. We examine the numerical properties of the resulting system, analyse the parallel solution, and compare it with the hybrid solver and with a direct solver. Finally, we consider possible improvements to the pseudo-direct solver. This work led to the implementation of a hybrid solver, the "ABCD solver" (Augmented Block Cimmino Distributed solver), which can work either in an iterative mode or in a pseudo-direct mode.

    On the Effects of Data Heterogeneity on the Convergence Rates of Distributed Linear System Solvers

    We consider the fundamental problem of solving a large-scale system of linear equations. In particular, we consider the setting where a taskmaster intends to solve the system in a distributed/federated fashion with the help of a set of machines, each of which has a subset of the equations. Although there exist several approaches for solving this problem, what is missing is a rigorous comparison between the convergence rates of the projection-based methods and those of the optimization-based ones. In this paper, we analyze and compare these two classes of algorithms with a particular focus on the most efficient method from each class, namely, the recently proposed Accelerated Projection-Based Consensus (APC) and the Distributed Heavy-Ball Method (D-HBM). To this end, we first propose a geometric notion of data heterogeneity called angular heterogeneity and discuss its generality. Using this notion, we bound and compare the convergence rates of the studied algorithms and capture the effects of both cross-machine and local data heterogeneity on these quantities. Our analysis yields a number of novel insights, besides showing that APC is the most efficient method in realistic scenarios with large data heterogeneity. Our numerical analyses validate our theoretical results. Comment: 11 pages, 5 figures.
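    One simple geometric proxy for such heterogeneity is the set of principal angles between the row spaces of two machines' equation blocks; the sketch below computes them with NumPy. This is only an illustration: the paper's angular heterogeneity is its own precisely defined notion and need not coincide with this computation.

```python
import numpy as np

# Two hypothetical machines, each holding two equations in five unknowns.
rng = np.random.default_rng(3)
A1 = rng.standard_normal((2, 5))
A2 = rng.standard_normal((2, 5))

def principal_angles(X, Y):
    """Principal angles (radians, ascending) between row spaces of X, Y."""
    # Orthonormal bases of the two row spaces, then the SVD of their
    # cross-product gives the cosines of the principal angles.
    Qx, _ = np.linalg.qr(X.T)
    Qy, _ = np.linalg.qr(Y.T)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

angles = principal_angles(A1, A2)
print(np.degrees(angles))  # small angles = similar data, large = heterogeneous
```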

    The solution of large sparse linear systems on parallel computers using a hybrid implementation of the block Cimmino method

    We are interested in solving large sparse systems of linear equations in parallel. Computing the solution of such systems requires a large amount of memory and computational power. The two main ways to obtain the solution are direct and iterative approaches. The former achieves this goal fast but with a large memory footprint, while the latter is memory friendly but can be slow to converge. In this work we first try to combine both approaches to create a hybrid solver that can be memory efficient while being fast. Then we discuss a novel approach that creates a pseudo-direct solver compensating for the drawback of the earlier approach. In the first chapters we take a look at row projection techniques, especially the block Cimmino method, and examine some of their numerical aspects and how they affect the convergence. We then discuss the acceleration of convergence using conjugate gradients and show that a block version improves the convergence. Next, we see how partitioning the linear system affects the convergence and show how to improve its quality. We finish by discussing the parallel implementation of the hybrid solver, its performance, and how it can be improved. The last two chapters focus on an improvement to this hybrid solver: we improve the numerical properties of the linear system so that we converge in a single iteration, which results in a pseudo-direct solver. We first discuss the numerical properties of the new system, see how it works in parallel, and see how it performs versus the iterative version and versus a direct solver. We finally consider some possible improvements to the solver. This work led to the implementation of a hybrid solver, our "ABCD solver" (Augmented Block Cimmino Distributed solver), that can either work in a fully iterative mode or in a pseudo-direct mode.
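    A toy illustration of the augmentation idea for two block rows (a hypothetical construction; the actual ABCD augmentation and the recovery of the original solution are more involved): appending the columns A1 A2^T to the first block and -I to the second makes the augmented block rows exactly orthogonal, which is the property that removes the need for iteration.

```python
import numpy as np

# Two hypothetical block rows of a 6-column matrix.
rng = np.random.default_rng(4)
A1 = rng.standard_normal((3, 6))
A2 = rng.standard_normal((3, 6))

# Augment: Abar1 = [A1 | A1 A2^T], Abar2 = [A2 | -I]. Then
# Abar1 Abar2^T = A1 A2^T + (A1 A2^T)(-I) = 0, i.e. the augmented block
# rows are mutually orthogonal by construction.
Abar1 = np.hstack([A1, A1 @ A2.T])
Abar2 = np.hstack([A2, -np.eye(3)])

print(np.abs(Abar1 @ Abar2.T).max())  # ~0 up to rounding
```

    With mutually orthogonal block rows, the block Cimmino projections no longer interfere with each other, which is what lets the augmented solver behave pseudo-directly instead of iterating.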

    Hybrid direct and iterative solvers for sparse indefinite and overdetermined systems on future exascale architectures

    In scientific computing, the numerical simulation of systems is crucial to get a deep understanding of the physics underlying real-world applications. The models used in simulation are often based on partial differential equations (PDEs) which, after fine discretisation, give rise to huge sparse systems of equations to solve. Historically, two classes of methods were designed for the solution of such systems: direct methods, robust but expensive in both computation and memory; and iterative methods, cheap but with very problem-dependent convergence properties. In the context of high performance computing, hybrid direct-iterative methods were then introduced in order to combine the advantages of both, while efficiently using the increasingly large and fast supercomputing facilities. In this thesis, we focus on the latter type of methods along two complementary research axes.
    In the first chapter, we detail the mechanisms behind the efficient implementation of multigrid methods. These use several levels of increasingly refined grids to solve linear systems with a combination of fine-grid smoothing and coarse-grid corrections. The efficient parallel implementation of such a scheme is a difficult task. We focus on the solution of the problem on the coarse grid, whose scalability is often observed to be limiting at very large scales. We propose an agglomeration technique that gathers the data of the coarse-grid problem on a subset of the computing resources in order to minimise the execution time of a direct solver. Combined with a relaxation of the solution accuracy, we demonstrate an increased overall scalability of the multigrid scheme when using our approach compared to classical iterative methods, when the problem is numerically difficult. At extreme scale, this study is carried out in the HHG (Hierarchical Hybrid Grids) framework for the solution of a Stokes problem with jumping coefficients, inspired by Earth's mantle convection simulation. The direct solver used on the coarse grid is MUMPS, combined with block low-rank approximation and single precision arithmetic.
    In the following chapters, we study some hybrid methods derived from the classical row-projection method block Cimmino, interpreted as domain decomposition methods. These methods are based on the partitioning of the matrix into blocks of rows. Due to its known slow convergence, the original iterative scheme is accelerated with a stabilised block version of the conjugate gradient algorithm. While an optimal choice of block size improves the efficiency of this approach, the convergence remains problem dependent. An alternative solution is then introduced which enforces convergence in one iteration by embedding the linear system into a carefully augmented space. These two approaches are extended in order to compute the minimum-norm solution of indefinite systems and the solution of least-squares problems; the latter require a partitioning into blocks of columns. We show how to improve the numerical properties of the iterative and pseudo-direct methods with scaling, partitioning, and better augmentation methods. Both methods are implemented in the parallel ABCD-Solver (Augmented Block Cimmino Distributed solver), whose parallelisation we improve through a combination of load balancing and communication-minimising techniques.
    Finally, for the solution of discretised PDE problems, we propose a new approach which augments the linear system using a coarse representation of the space. The size of the augmentation is controlled by the choice of a more or less refined mesh. We obtain an iterative method with fast linear convergence, demonstrated on Helmholtz and convection-diffusion problems. The central point of the approach is the iterative construction and solution of a Schur complement.