Low-Rank Approximation of Sparse Matrices
In this paper we present an algorithm for computing a low rank approximation of a sparse matrix based on a truncated LU factorization with column and row permutations. We present various approaches for determining the column and row permutations that show a trade-off between speed and deterministic/probabilistic accuracy. We show that if the permutations are chosen by using tournament pivoting based on QR factorization, then the obtained truncated LU factorization with column/row tournament pivoting, LU_CRTP, satisfies bounds on the singular values which have similarities with those obtained by a communication avoiding rank revealing QR factorization. Experiments on challenging matrices show that LU_CRTP provides a good low rank approximation of the input matrix and is less expensive than the rank revealing QR factorization in terms of computation and memory usage, while also minimizing the communication cost. We also compare the computational complexity of our algorithm with randomized algorithms and show that for sparse matrices and high enough but still modest accuracies, our approach is faster.
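The cross/skeleton idea behind a truncated factorization with selected rows and columns can be illustrated with a small dense sketch. This is not the LU_CRTP algorithm itself: the QRCP-based column and row selection below is a simplified stand-in for tournament pivoting, and the helper name is an assumption for the example.

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)

def truncated_lu_approx(A, k):
    """Rank-k cross approximation: A ~= A[:, cols] @ inv(cross) @ A[rows, :]."""
    # Rank the columns by importance with QR with column pivoting (QRCP).
    _, _, col_piv = qr(A, mode="economic", pivoting=True)
    cols = col_piv[:k]
    # Rank the rows by applying QRCP to the transpose of the chosen columns.
    _, _, row_piv = qr(A[:, cols].T, mode="economic", pivoting=True)
    rows = row_piv[:k]
    # Reconstruct from the selected k-by-k cross.
    mid = np.linalg.pinv(A[np.ix_(rows, cols)])
    return A[:, cols] @ mid @ A[rows, :]

# Usage: a 200 x 100 matrix of exact rank 5 is recovered almost exactly with k = 5.
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
Ak = truncated_lu_approx(A, 5)
```

When the numerical rank of the matrix exceeds k, the same routine returns a rank-k approximation whose quality depends on how well the pivoted selection captures the dominant subspaces, which is exactly what the singular value bounds in the paper quantify for the tournament-pivoted variant.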
Reducing Communication in the Solution of Linear Systems
There is a growing performance gap between computation and communication on modern computers, making it crucial to develop algorithms with lower latency and bandwidth requirements. Because systems of linear equations are important for numerous scientific and engineering applications, I have studied several approaches for reducing communication in their solution. First, I developed optimizations to dense LU with partial pivoting, which downstream applications can adopt with little to no effort. Second, I considered two techniques that completely replace pivoting in dense LU and can provide significantly higher speedups, albeit without the numerical guarantees of partial pivoting: one uses randomized preprocessing, while the other is a novel combination of block factorization and additive perturbation. Finally, I investigated the use of mixed precision in GMRES for solving sparse systems, which reduces the volume of data movement and thus the pressure on memory bandwidth.
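The mixed-precision idea in the last point can be illustrated with classical iterative refinement: factor once in float32 (cheaper, and less data moved), then correct the solution with float64 residuals. This is a minimal sketch under the assumption of a well-conditioned system, not the thesis's GMRES-based implementation; the function name and test matrix are invented for the example.

```python
import numpy as np
import scipy.linalg as sla

def mixed_precision_solve(A, b, iters=5):
    """Solve Ax = b using a float32 factorization refined in float64."""
    lu, piv = sla.lu_factor(A.astype(np.float32))        # low-precision LU
    x = sla.lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                                    # residual in float64
        d = sla.lu_solve((lu, piv), r.astype(np.float32))  # cheap correction
        x += d.astype(np.float64)
    return x

rng = np.random.default_rng(1)
G = rng.standard_normal((100, 100))
# Strictly diagonally dominant, hence well conditioned, so refinement converges.
A = G + np.diag(np.abs(G).sum(axis=1) + 1.0)
b = rng.standard_normal(100)
x = mixed_precision_solve(A, b)
```

The refinement loop restores full float64 accuracy as long as the condition number times float32 unit roundoff stays below one, which is the standard requirement for iterative refinement to converge.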
Linear-time CUR approximation of BEM matrices
In this paper we propose linear-time CUR approximation algorithms for admissible matrices obtained from the hierarchical form of Boundary Element matrices. We propose a new approach, called geometric sampling, to obtain the indices of the most significant rows and columns using information from the domains where the problem is posed. Our strategy is tailored to Boundary Element Methods (BEM), since it directly and explicitly uses the cluster tree containing information about the problem geometry. Our CUR algorithm has precision comparable with low-rank approximations created with the truncated QR factorization with column pivoting (QRCP) and the Adaptive Cross Approximation (ACA) with full pivoting, both of which are quadratic-cost methods. Compared to the well-known linear-time algorithm ACA with partial pivoting, we show that our algorithm generally improves the convergence error and handles some cases where ACA fails. We provide a general relative error bound for CUR approximations created with geometric sampling. Finally, we evaluate the performance of our algorithms on traditional BEM problems defined over different geometries.
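The linear-time baseline mentioned above, ACA with partial pivoting, fits in a few lines because it only ever touches individual rows and columns of the (possibly implicit) matrix. The sketch below is a generic dense-matrix illustration; the row-selection heuristic and the smooth test kernel are assumptions for the example, not taken from the paper.

```python
import numpy as np

def aca_partial(get_row, get_col, m, n, tol=1e-8, max_rank=50):
    """Adaptive Cross Approximation with partial pivoting: A ~= U @ V."""
    U, V = [], []
    i = 0                                                   # start from row 0
    for _ in range(max_rank):
        row = get_row(i) - sum(u[i] * v for u, v in zip(U, V))  # residual row
        j = int(np.argmax(np.abs(row)))                     # pivot column
        if abs(row[j]) < tol:                               # pivot too small: stop
            break
        col = get_col(j) - sum(u * v[j] for u, v in zip(U, V))  # residual column
        U.append(col / row[j])                              # rank-1 update factors
        V.append(row)
        nxt = np.abs(col)
        nxt[i] = 0.0                                        # pick the next row greedily
        i = int(np.argmax(nxt))
    return np.column_stack(U), np.vstack(V)

# Usage: a smooth kernel on well-separated point sets has rapidly decaying
# singular values, so ACA converges after a few cross updates.
x = np.linspace(0.0, 1.0, 80)
y = np.linspace(2.0, 3.0, 90)
A = 1.0 / np.abs(x[:, None] - y[None, :])
Umat, Vmat = aca_partial(lambda i: A[i], lambda j: A[:, j], 80, 90)
```

ACA only needs `get_row`/`get_col` callbacks, which is what makes it linear-cost for hierarchical matrices; the paper's point is that its greedy pivoting can fail on some geometries, which geometric sampling avoids by choosing indices from the cluster tree instead.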
On Updating Preconditioners for the Iterative Solution of Linear Systems
The main topic of this thesis is updating preconditioners for solving large sparse linear systems Ax=b with Krylov iterative methods. Two types of problems are considered. The first is the iterative solution of non-singular, non-symmetric linear systems in which the coefficient matrix A has a skew-symmetric part of low rank, or can be well approximated by a matrix with a low-rank skew-symmetric part. Such systems arise from the discretization of PDEs with certain Neumann boundary conditions, the discretization of integral equations, and path-following methods; examples include the Bratu problem and Love's integral equation. The second type is least squares (LS) problems, solved through the equivalent system of normal equations. More precisely, we consider modified and rank-deficient LS problems. In a modified LS problem, the set of linear relations is updated with new information, a new variable is added or, conversely, some information or variable is removed from the set. In rank-deficient LS problems, the coefficient matrix does not have full rank, which makes it difficult to compute an incomplete factorization of the normal equations. LS problems arise in many large-scale applications in science and engineering, for instance neural networks, linear programming, exploration seismology, and image processing.
Incomplete LU factorizations, or incomplete Cholesky factorizations when the matrix is symmetric positive definite, are commonly used as preconditioners for iterative methods. The main contribution of this thesis is the development of a technique for updating preconditioners by bordering: an approximate decomposition is computed for an equivalent augmented linear system and used as a preconditioner for the original problem.
The theoretical study and the numerical experiments presented in this thesis show the performance of the proposed preconditioning technique and its competitiveness with other methods available in the literature for computing preconditioners for the problems studied.
Guerrero Flores, DJ. (2018). On Updating Preconditioners for the Iterative Solution of Linear Systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/10492
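As a point of reference for the setting described above, the following is a minimal sketch of an incomplete LU factorization used to precondition a Krylov solver, here SciPy's `spilu` and GMRES on an assumed 1-D convection-diffusion matrix. It shows the baseline that the thesis starts from, not the bordering update itself.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 400
# A sparse nonsymmetric test system: a 1-D convection-diffusion stencil.
main = 2.0 * np.ones(n)
lower = -1.2 * np.ones(n - 1)
upper = -0.8 * np.ones(n - 1)
A = sp.diags([lower, main, upper], [-1, 0, 1], format="csc")
b = np.ones(n)

# Incomplete LU factorization of A, wrapped as a preconditioner M ~= A^{-1}.
ilu = spla.spilu(A, drop_tol=1e-4)
M = spla.LinearOperator((n, n), matvec=ilu.solve)

# Preconditioned GMRES; info == 0 signals convergence.
x, info = spla.gmres(A, b, M=M)
```

When the matrix or the right-hand side changes, recomputing `spilu` from scratch is the expensive step; the bordering technique described above instead reuses the existing factorization through an equivalent augmented system.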
Two Geometric Results regarding Hölder-Brascamp-Lieb Inequalities, and Two Novel Algorithms for Low-Rank Approximation
Broadly speaking, this thesis investigates mathematical questions motivated by computer science. The topics involved include communication-avoiding algorithms, classical analysis, convex geometry, and low-rank matrix approximation. In total, the thesis consists of four self-contained sections, each adapted from papers the author has been a part of.
The first two sections are both motivated by the Brascamp-Lieb inequalities, often referred to as Hölder-Brascamp-Lieb inequalities. These inequalities have featured prominently in recent theoretical computer science work, due to connections to geometric complexity theory, harmonic analysis, communication avoidance, and many other areas. Moreover, work generalizing the inequalities in various ways, such as to nonlinear versions, has been impactful to the study of differential equations.
Section 1 studies the application of Hölder-Brascamp-Lieb (HBL) inequalities to the design of communication-optimal algorithms. In particular, it describes optimal tiling (blocking) strategies for nested loops that lack data dependencies and exhibit affine memory access patterns. The problem roughly amounts to maximizing the volume of an object given that some of its linear images have bounded volume. The methods used are algorithmic.
Another reason for interest in these inequalities is that they form an interesting test case for non-convex optimization techniques. The optimal constant for a particular instance of the inequality is given by solving a non-convex optimization problem that is nevertheless highly structured. Of particular relevance to this thesis is that it can be formulated as a geodesically convex problem, considered on the manifold of positive definite matrices of fixed determinant.
Even using the methods of Section 1, the procedure is not necessarily polynomial time, which motivates further study of geodesic convexity. This led to the work of Section 2, which discusses a notion of halfspace for Hadamard manifolds that is natural in the context of convex optimization. For this notion of halfspace, we generalize a classic result of Grunbaum, itself a corollary of Helly's theorem: given a probability distribution on the manifold, there is a point such that every halfspace based at that point contains at least 1/(n+1) of the mass, n being the dimension of the manifold. As an application, the gradient oracle complexity of geodesically convex optimization is polynomial in the parameters defining the problem; in particular, it is polynomial in -log(epsilon), where epsilon is the desired error. This is a step toward the open question of whether such a polynomial-time algorithm exists.
The remaining two sections of the thesis present a different research direction, randomized numerical linear algebra. Numerical linear algebra has long been an important part of scientific computing. With matrix sizes increasing and fast, approximate solutions growing in industrial importance, randomized methods are quickly gaining popularity. Sections 3 and 4 aim to show that randomized low-rank approximation algorithms satisfy many of the properties of classical rank-revealing factorizations.
Section 3 introduces a generalized randomized QR decomposition (RURV) that may be applied to arbitrary products of matrices and their inverses, without explicitly computing the products or inverses. This factorization is a critical part of a communication-optimal spectral divide-and-conquer algorithm for the nonsymmetric eigenvalue problem. We establish that this randomized QR factorization satisfies strong rank-revealing properties, and we formally prove its stability, making it suitable for applications. Finally, we present numerical experiments demonstrating that our theoretical bounds capture the empirical behavior of the factorization.
Section 4 concerns a generalized LU factorization (GLU) for low-rank matrix approximation. We relate it to past approaches and extensively analyze its approximation properties. The established deterministic guarantees are combined with sketching ensembles satisfying Johnson-Lindenstrauss properties to obtain complete bounds. Particularly good performance is shown for the subsampled randomized Hadamard transform (SRHT) ensemble. Moreover, the factorization is shown to unify and generalize many past algorithms, and it helps to explain the effect of sketching on the growth factor during Gaussian elimination.
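The basic RURV construction behind Section 3 can be sketched in a few lines: multiply A by a Haar-distributed random orthogonal matrix and take a QR factorization of the result. This is a minimal single-matrix version; the rank-revealing bounds, the stability analysis, and the extension to products and inverses are the thesis's contributions and are not reproduced here.

```python
import numpy as np

def rurv(A, rng):
    """Randomized URV: A = U @ R @ V with V Haar-random orthogonal, U orthonormal."""
    n = A.shape[1]
    G = rng.standard_normal((n, n))
    V, _ = np.linalg.qr(G)        # QR of a Gaussian gives a Haar-distributed V
    U, R = np.linalg.qr(A @ V.T)  # QR of A V^T yields the U and R factors
    return U, R, V

# Usage: a 6 x 5 matrix of exact rank 4, so the trailing diagonal of R
# should reveal the rank deficiency.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 5))
U, R, V = rurv(A, rng)
```

Because V is random, no pivoting is needed: with high probability the leading diagonal blocks of R capture the dominant singular values, which is the rank-revealing behavior the thesis proves.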
Low Rank Approximation of a Sparse Matrix Based on LU Factorization with Column and Row Tournament Pivoting
In this paper we present an algorithm for computing a low rank approximation of a sparse matrix based on a truncated LU factorization with column and row permutations. We present various approaches for determining the column and row permutations that show a trade-off between speed and deterministic/probabilistic accuracy. We show that if the permutations are chosen by using tournament pivoting based on QR factorization, then the obtained truncated LU factorization with column/row tournament pivoting, LU_CRTP, satisfies bounds on the singular values which have similarities with those obtained by a communication avoiding rank revealing QR factorization. Experiments on challenging matrices show that LU_CRTP provides a good low rank approximation of the input matrix and is less expensive than the rank revealing QR factorization in terms of computation and memory usage, while also minimizing the communication cost. We also compare the computational complexity of our algorithm with randomized algorithms and show that for sparse matrices and high enough but still modest accuracies, our approach is faster.
Parallel Tensor Train through Hierarchical Decomposition
We consider the problem of developing parallel decomposition and approximation algorithms for high-dimensional tensors. We focus on the Tensor Train (TT) representation, which stores a d-dimensional tensor using O(ndr^2) entries, far fewer than the O(n^d) entries of the original tensor, where r is usually very small and depends on the application. Sequential algorithms to compute the TT decomposition and TT approximation of a tensor have been proposed in the literature. Here we propose a parallel algorithm to compute the TT decomposition of a tensor. We prove that the ranks of the TT representation produced by our algorithm are bounded by the ranks of the unfolding matrices of the tensor. Additionally, we propose a parallel algorithm to compute an approximation of a tensor in TT representation. Our algorithm relies on a hierarchical partitioning of the dimensions of the tensor into a balanced binary tree and on the transmission of the leading singular values of the associated unfolding matrices from a parent node to its children. We consider several approaches based on how the leading singular values are transmitted in the tree. We present an in-depth experimental analysis of our approaches on different low-rank tensors, and also assess them on tensors obtained from quantum chemistry simulations. Our results show that the approach which transmits the leading singular values to both children performs best in practice. The compression ratios and accuracies of the approximations obtained by our approaches are comparable with those of the sequential algorithm and, in some cases, even better. We also show that our algorithms send only O(log^2(P) log(d)) messages along the critical path for a d-dimensional tensor on P processors. The lower bound on the number of messages for any algorithm that exchanges data among P processors is log(P), and our algorithms achieve this bound up to a polylog factor.
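For context, the sequential TT-SVD that the parallel algorithm builds on sweeps over the dimensions, at each step taking a truncated SVD of an unfolding matrix. A minimal NumPy sketch, where the function names, the relative truncation threshold, and the rank-2 test tensor are assumptions for the example:

```python
import numpy as np

def tt_svd(T, eps=1e-10):
    """Decompose a d-way tensor into TT cores G_k of shape (r_{k-1}, n_k, r_k)."""
    dims = T.shape
    cores, r = [], 1
    M = T.reshape(r * dims[0], -1)                 # first unfolding matrix
    for k in range(len(dims) - 1):
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        rank = max(1, int(np.sum(s > eps * s[0])))  # relative truncation rank
        cores.append(U[:, :rank].reshape(r, dims[k], rank))
        r = rank
        # Carry the remainder forward as the next unfolding matrix.
        M = (s[:rank, None] * Vt[:rank]).reshape(r * dims[k + 1], -1)
    cores.append(M.reshape(r, dims[-1], 1))        # last core absorbs the rest
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into a full tensor."""
    out = cores[0]
    for G in cores[1:]:
        out = np.tensordot(out, G, axes=([out.ndim - 1], [0]))
    return out.reshape([G.shape[1] for G in cores])

# Usage: a 4 x 5 x 6 tensor built from two rank-1 terms has TT ranks at most 2.
rng = np.random.default_rng(3)
u = [rng.standard_normal(n) for n in (4, 5, 6)]
v = [rng.standard_normal(n) for n in (4, 5, 6)]
T = np.einsum("i,j,k->ijk", *u) + np.einsum("i,j,k->ijk", *v)
cores = tt_svd(T)
```

The ranks produced by each truncated SVD are exactly the unfolding-matrix ranks that bound the TT ranks in the statement above; the parallel algorithm replaces this left-to-right sweep with a balanced binary tree over the dimensions.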