306 research outputs found
A Novel Partitioning Method for Accelerating the Block Cimmino Algorithm
We propose a novel block-row partitioning method in order to improve the
convergence rate of the block Cimmino algorithm for solving general sparse
linear systems of equations. The convergence rate of the block Cimmino
algorithm depends on the orthogonality among the block rows obtained by the
partitioning method. The proposed method takes numerical orthogonality among
block rows into account by proposing a row inner-product graph model of the
coefficient matrix. In the graph partitioning formulation defined on this graph
model, the partitioning objective of minimizing the cutsize directly
corresponds to minimizing the sum of inter-block inner products between block
rows thus leading to an improvement in the eigenvalue spectrum of the iteration
matrix. This in turn leads to a significant reduction in the number of
iterations required for convergence. Extensive experiments conducted on a large
set of matrices confirm the validity of the proposed method against a
state-of-the-art method
Distributed Solution of Large-Scale Linear Systems via Accelerated Projection-Based Consensus
Solving a large-scale system of linear equations is a key step at the heart
of many algorithms in machine learning, scientific computing, and beyond. When
the problem dimension is large, computational and/or memory constraints make it
desirable, or even necessary, to perform the task in a distributed fashion. In
this paper, we consider a common scenario in which a taskmaster intends to
solve a large-scale system of linear equations by distributing subsets of the
equations among a number of computing machines/cores. We propose an accelerated
distributed consensus algorithm, in which at each iteration every machine
updates its solution by adding a scaled version of the projection of an error
signal onto the nullspace of its system of equations, and where the taskmaster
conducts an averaging over the solutions with momentum. The convergence
behavior of the proposed algorithm is analyzed in detail and analytically shown
to compare favorably with the convergence rate of alternative distributed
methods, namely distributed gradient descent, distributed versions of
Nesterov's accelerated gradient descent and heavy-ball method, the block
Cimmino method, and ADMM. On randomly chosen linear systems, as well as on
real-world data sets, the proposed method offers significant speed-up relative
to all the aforementioned methods. Finally, our analysis suggests a novel
variation of the distributed heavy-ball method, which employs a particular
distributed preconditioning, and which achieves the same theoretical
convergence rate as the proposed consensus-based method
MUMPS based approach to parallelize the block cimmino algorithm
The Cimmino method is a row projection method in which the original linear system is divided into subsystems. At every iteration, it computes one projection per subsystem and uses these projections to construct an approximation to the solution of the linear system. The usual parallelization strategy applied in block algorithms is to distribute the different blocks on the different available processors. In this paper, we follow another approach where we do not perform explicitely this block distribution to processors whithin the code, but let the multi-frontal sparse solver MUMPS handle the data distribution and parallelism. The data coming from the subsystems defined by the block partition in the Block Cimmino method are gathered in an unique matrix which is analysed, distributed and factorized in parallel by MUMPS. Our target is to define a methodology for parallelism based only on the functionalities provided by general sparse solver libraries and how efficient this way of doing can be. We relate the development of this new approach from an existing code written in Fortran 77 to the MUMPS-embedded version. The results of the ongoing numerical experiments will be presented in the conferenc
Méthodes hybrides pour la résolution de grands systèmes linéaires creux sur calculateurs parallèles
Nous nous intéressons à la résolution en parallèle de système d’équations linéaires creux et de large taille. Le calcul de la solution d’un tel type de système requiert un grand espace mémoire et une grande puissance de calcul. Il existe deux principales méthodes de résolution de systèmes linéaires. Soit la méthode est directe et de ce fait est rapide et précise, mais consomme beaucoup de mémoire. Soit elle est itérative, économe en mémoire, mais assez lente à atteindre une solution de qualité suffisante. Notre travail consiste à combiner ces deux techniques pour créer un solveur hybride efficient en consommation mémoire tout en étant rapide et robuste. Nous essayons ensuite d’améliorer ce solveur en introduisant une nouvelle méthode pseudo directe qui contourne certains inconvénients de la méthode précédente. Dans les premiers chapitres nous examinons les méthodes de projections par lignes, en particulier la méthode Cimmino en bloc, certains de leurs aspects numériques et comment ils affectent la convergence. Ensuite, nous analyserons l’accélération de ces techniques avec la méthode des gradients conjugués et comment cette accélération peut être améliorée avec une version en bloc du gradient conjugué. Nous regarderons ensuite comment le partitionnement du système linéaire affecte lui aussi la convergence et comment nous pouvons améliorer sa qualité. Finalement, nous examinerons l’implantation en parallèle du solveur hybride, ses performances ainsi que les améliorations possible. Les deux derniers chapitres introduisent une amélioration à ce solveur hybride, en améliorant les propriétés numériques du système linéaire, de sorte à avoir une convergence en une seule itération et donc un solveur pseudo direct. Nous commençons par examiner les propriétés numériques du système résultants, analyser la solution parallèle et comment elle se comporte face au solveur hybride et face à un solveur direct. Finalement, nous introduisons de possible amélioration au solveur pseudo direct. Ce travail a permis d’implanter un solveur hybride "ABCD solver" (Augmented Block Cimmino Distributed solver) qui peut soit fonctionner en mode itératif ou en mode pseudo direct. ABSTRACT : We are interested in solving large sparse systems of linear equations in parallel. Computing the solution of such systems requires a large amount of memory and computational power. The two main ways to obtain the solution are direct and iterative approaches. The former achieves this goal fast but with a large memory footprint while the latter is memory friendly but can be slow to converge. In this work we try first to combine both approaches to create a hybrid solver that can be memory efficient while being fast. Then we discuss a novel approach that creates a pseudo-direct solver that compensates for the drawback of the earlier approach. In the first chapters we take a look at row projection techniques, especially the block Cimmino method and examine some of their numerical aspects and how they affect the convergence. We then discuss the acceleration of convergence using conjugate gradients and show that a block version improves the convergence. Next, we see how partitioning the linear system affects the convergence and show how to improve its quality. We finish by discussing the parallel implementation of the hybrid solver, discussing its performance and seeing how it can be improved. The last two chapters focus on an improvement to this hybrid solver. We try to improve the numerical properties of the linear system so that we converge in a single iteration which results in a pseudo-direct solver. We first discuss the numerical properties of the new system, see how it works in parallel and see how it performs versus the iterative version and versus a direct solver. We finally consider some possible improvements to the solver. This work led to the implementation of a hybrid solver, our "ABCD solver" (Augmented Block Cimmino Distributed solver), that can either work in a fully iterative mode or in a pseudo-direct mode
On the Effects of Data Heterogeneity on the Convergence Rates of Distributed Linear System Solvers
We consider the fundamental problem of solving a large-scale system of linear
equations. In particular, we consider the setting where a taskmaster intends to
solve the system in a distributed/federated fashion with the help of a set of
machines, who each have a subset of the equations. Although there exist several
approaches for solving this problem, missing is a rigorous comparison between
the convergence rates of the projection-based methods and those of the
optimization-based ones. In this paper, we analyze and compare these two
classes of algorithms with a particular focus on the most efficient method from
each class, namely, the recently proposed Accelerated Projection-Based
Consensus (APC) and the Distributed Heavy-Ball Method (D-HBM). To this end, we
first propose a geometric notion of data heterogeneity called angular
heterogeneity and discuss its generality. Using this notion, we bound and
compare the convergence rates of the studied algorithms and capture the effects
of both cross-machine and local data heterogeneity on these quantities. Our
analysis results in a number of novel insights besides showing that APC is the
most efficient method in realistic scenarios where there is a large data
heterogeneity. Our numerical analyses validate our theoretical results.Comment: 11 pages, 5 figure
The solution of large sparse linear systems on parallel computers using a hybrid implementation of the block Cimmino method
We are interested in solving large sparse systems of linear equations in parallel. Computing the solution of such systems requires a large amount of memory and computational power. The two main ways to obtain the solution are direct and iterative approaches. The former achieves this goal fast but with a large memory footprint while the latter is memory friendly but can be slow to converge. In this work we try first to combine both approaches to create a hybrid solver that can be memory efficient while being fast. Then we discuss a novel approach that creates a pseudo-direct solver that compensates for the drawback of the earlier approach. In the first chapters we take a look at row projection techniques, especially the block Cimmino method and examine some of their numerical aspects and how they affect the convergence. We then discuss the acceleration of convergence using conjugate gradients and show that a block version improves the convergence. Next, we see how partitioning the linear system affects the convergence and show how to improve its quality. We finish by discussing the parallel implementation of the hybrid solver, discussing its performance and seeing how it can be improved. The last two chapters focus on an improvement to this hybrid solver. We try to improve the numerical properties of the linear system so that we converge in a single iteration which results in a pseudo-direct solver. We first discuss the numerical properties of the new system, see how it works in parallel and see how it performs versus the iterative version and versus a direct solver. We finally consider some possible improvements to the solver. This work led to the implementation of a hybrid solver, our "ABCD solver" (Augmented Block Cimmino Distributed solver), that can either work in a fully iterative mode or in a pseudo-direct mode
Hybrid direct and interactive solvers for sparse indefinite and overdetermined systems on future exascale architectures
In scientific computing, the numerical simulation of systems is crucial to get a deep understanding of the physics underlying real world applications. The models used in simulation are often based on partial differential equations (PDE) which, after fine discretisation, give rise to huge sparse systems of equations to solve. Historically, 2 classes of methods were designed for the solution of such systems: direct methods, robust but expensive in both computations and memory; and iterative methods, cheap but with a very problem-dependent convergence properties. In the context of high performance computing, hybrid direct-iterative methods were then introduced inorder to combine the advantages of both methods, while using efficiently the increasingly largeand fast supercomputing facilities. In this thesis, we focus on the latter type of methods with two complementary research axis.In the first chapter, we detail the mechanisms behind the efficient implementation of multigrid methods. The latter makes use of several levels of increasingly refined grids to solve linear systems with a combination of fine grid smoothing and coarse grid corrections. The efficient parallel implementation of such a scheme is a difficult task. We focus on the solution of the problem on the coarse grid whose scalability is often observed as limiting at very large scales. We propose an agglomeration technique to gather the data of the coarse grid problem on a subset ofthe computing resources in order to minimise the execution time of a direct solver. Combined with a relaxation of the solution accuracy, we demonstrate an increased overall scalability of the multigrid scheme when using our approach compared to classical iterative methods, when the problem is numerically difficult. At extreme scale, this study is carried in the HHG framework(Hierarchical Hybrid Grids) for the solution of a Stokes problem with jumping coefficients, inspired from Earth's mantle convection simulation. The direct solver used on the coarse grid is MUMPS,combined with block low-rank approximation and single precision arithmetic.In the following chapters, we study some hybrid methods derived from the classical row-projection method block Cimmino, and interpreted as domain decomposition methods. These methods are based on the partitioning of the matrix into blocks of rows. Due to its known slow convergence, the original iterative scheme is accelerated with a stabilised block version of the conjugate gradient algorithm. While an optimal choice of block size improves the efficiency of this approach, the convergence stays problem dependent. An alternative solution is then introduced which enforces a convergence in one iteration by embedding the linear system into a carefully augmented space.These two approaches are extended in order to compute the minimum norm solution of in definite systems and the solution of least-squares problems. The latter problems require a partitioning in blocks of columns. We show how to improve the numerical properties of the iterative and pseudo-direct methods with scaling, partitioning and better augmentation methods. Both methods are implemented in the parallel solver ABCD-Solver (Augmented Block Cimmino Distributed solver)whose parallelisation we improve through a combination of load balancing and communication minimising techniques.Finally, for the solution of discretised PDE problems, we propose a new approach which augments the linear system using a coarse representation of the space. The size of the augmentation is controlled by the choice of a more or less refined mesh. We obtain an iterative method with fast linear convergence demonstrated on Helmholtz and Convection-Diffusion problems. The central point of the approach is the iterative construction and solution of a Schur complemen
- …