32 research outputs found

    Approximation de rang faible pour les matrices creuses

    In this paper we present an algorithm for computing a low rank approximation of a sparse matrix based on a truncated LU factorization with column and row permutations. We present various approaches for determining the column and row permutations that show a trade-off between speed and deterministic/probabilistic accuracy. We show that if the permutations are chosen by using tournament pivoting based on QR factorization, then the resulting truncated LU factorization with column/row tournament pivoting, LU_CRTP, satisfies bounds on the singular values that are similar to those obtained by a communication avoiding rank revealing QR factorization. Experiments on challenging matrices show that LU_CRTP provides a good low rank approximation of the input matrix and that it is less expensive than the rank revealing QR factorization in terms of computational and memory usage costs, while also minimizing the communication cost. We also compare the computational complexity of our algorithm with that of randomized algorithms and show that for sparse matrices and high enough but still modest accuracies, our approach is faster.
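As a rough sketch of how a truncated LU factorization yields a rank-k approximation, consider the standard block identity below. The notation is assumed here for illustration and is not taken from the abstract: P_r and P_c are the row and column permutations, and the leading k-by-k block is assumed nonsingular.

```latex
\[
P_r A P_c =
\begin{pmatrix} \bar A_{11} & \bar A_{12} \\ \bar A_{21} & \bar A_{22} \end{pmatrix}
=
\begin{pmatrix} I & 0 \\ \bar A_{21}\bar A_{11}^{-1} & I \end{pmatrix}
\begin{pmatrix} \bar A_{11} & \bar A_{12} \\ 0 & S \end{pmatrix},
\qquad
S = \bar A_{22} - \bar A_{21}\bar A_{11}^{-1}\bar A_{12},
\]
\[
\tilde A_k =
\begin{pmatrix} I \\ \bar A_{21}\bar A_{11}^{-1} \end{pmatrix}
\begin{pmatrix} \bar A_{11} & \bar A_{12} \end{pmatrix},
\qquad
P_r A P_c - \tilde A_k = \begin{pmatrix} 0 & 0 \\ 0 & S \end{pmatrix}.
\]
```

Under these assumptions the approximation error is exactly the Schur complement S, so the quality of the approximation depends on choosing permutations that keep S small.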

    Reducing Communication in the Solution of Linear Systems

    There is a growing performance gap between computation and communication on modern computers, making it crucial to develop algorithms with lower latency and bandwidth requirements. Because systems of linear equations are important for numerous scientific and engineering applications, I have studied several approaches for reducing communication in those problems. First, I developed optimizations to dense LU with partial pivoting, which downstream applications can adopt with little to no effort. Second, I considered two techniques to completely replace pivoting in dense LU, which can provide significantly higher speedups, albeit without the same numerical guarantees as partial pivoting. One technique uses randomized preprocessing, while the other is a novel combination of block factorization and additive perturbation. Finally, I investigated using mixed precision in GMRES for solving sparse systems, which reduces the volume of data movement and thus the pressure on the memory bandwidth.
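To illustrate why dropping pivoting gives up numerical guarantees, here is a minimal textbook sketch. It is not one of the thesis's algorithms: the helper `solve_2x2_lu` and the test matrix are assumptions chosen for illustration. It solves a 2-by-2 system by LU factorization with and without partial pivoting.

```python
def solve_2x2_lu(A, b, pivot):
    """Solve a 2x2 system via LU; hypothetical helper for illustration only."""
    (a11, a12), (a21, a22) = A
    b1, b2 = b
    # Partial pivoting: swap rows so the larger entry of column 1 is the pivot.
    if pivot and abs(a21) > abs(a11):
        a11, a12, a21, a22 = a21, a22, a11, a12
        b1, b2 = b2, b1
    l21 = a21 / a11            # multiplier (entry of L)
    u22 = a22 - l21 * a12      # remaining entry of U
    y2 = b2 - l21 * b1         # forward substitution
    x2 = y2 / u22              # back substitution
    x1 = (b1 - a12 * x2) / a11
    return [x1, x2]


eps = 1e-20                    # tiny leading entry: a poor pivot
A = [[eps, 1.0], [1.0, 1.0]]
b = [1.0, 2.0]                 # exact solution is close to [1, 1]

x_no_pivot = solve_2x2_lu(A, b, pivot=False)
x_pivot = solve_2x2_lu(A, b, pivot=True)
print("without pivoting:", x_no_pivot)
print("with pivoting:   ", x_pivot)
```

Without pivoting, the huge multiplier 1/eps wipes out the information in the second row and the first component of the solution is lost. Techniques that replace pivoting aim to make such bad pivots unlikely without the row exchanges.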

    Linear-time CUR approximation of BEM matrices

    In this paper we propose linear-time CUR approximation algorithms for admissible matrices obtained from the hierarchical form of Boundary Element matrices. We propose a new approach called geometric sampling to obtain the indices of the most significant rows and columns using information from the domains where the problem is posed. Our strategy is tailored to Boundary Element Methods (BEM) since it uses directly and explicitly the cluster tree containing information from the problem geometry. Our CUR algorithm has precision comparable with low-rank approximations created with the truncated QR factorization with column pivoting (QRCP) and the Adaptive Cross Approximation (ACA) with full pivoting, which are quadratic-cost methods. When compared to the well-known linear-time algorithm ACA with partial pivoting, we show that our algorithm improves, in general, the convergence error and overcomes some cases where ACA fails. We provide a general relative error bound for CUR approximations created with geometric sampling. Finally, we evaluate the performance of our algorithms on traditional BEM problems defined over different geometries, using integral kernels frequently used in the literature.
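For context, the quadratic-cost baseline named in the abstract, ACA with full pivoting, can be sketched in a few lines of plain Python. This is a minimal sketch and not the paper's geometric-sampling CUR algorithm. The function names and the small rank-2 test matrix are assumptions made for illustration.

```python
def aca_full_pivoting(A, max_rank, tol=1e-12):
    """Cross approximation A ~ sum_k outer(us[k], vs[k]) with full pivoting."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]          # residual, updated at every step
    us, vs = [], []
    for _ in range(max_rank):
        # Full pivoting: scan the whole residual for its largest entry.
        i, j = max(((i, j) for i in range(m) for j in range(n)),
                   key=lambda ij: abs(R[ij[0]][ij[1]]))
        pivot = R[i][j]
        if abs(pivot) <= tol:
            break
        u = [R[r][j] / pivot for r in range(m)]   # scaled pivot column
        v = R[i][:]                               # pivot row
        for r in range(m):
            for c in range(n):
                R[r][c] -= u[r] * v[c]            # rank-1 update
        us.append(u)
        vs.append(v)
    return us, vs


def reconstruct(us, vs, m, n):
    """Sum the rank-1 terms back into a dense m-by-n matrix."""
    return [[sum(u[i] * v[j] for u, v in zip(us, vs)) for j in range(n)]
            for i in range(m)]


# Assumed test data: an exactly rank-2 matrix x y^T + z w^T.
x, z = [1.0, 2.0, 3.0, 4.0], [1.0, -1.0, 1.0, -1.0]
y, w = [1.0, 0.0, 2.0, 1.0, 3.0], [2.0, 1.0, 0.0, 1.0, 1.0]
A = [[x[i] * y[j] + z[i] * w[j] for j in range(5)] for i in range(4)]

us, vs = aca_full_pivoting(A, max_rank=2)
B = reconstruct(us, vs, 4, 5)
err = max(abs(A[i][j] - B[i][j]) for i in range(4) for j in range(5))
print("rank used:", len(us), "max error:", err)
```

The pivot search scans every residual entry at each step, which is the source of the quadratic cost. Linear-time variants avoid that scan by choosing rows and columns from partial information.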

    On Updating Preconditioners for the Iterative Solution of Linear Systems

    The main topic of this thesis is updating preconditioners for solving large sparse linear systems Ax=b with Krylov iterative methods. Two types of problems are considered. The first is the iterative solution of non-singular, non-symmetric linear systems whose coefficient matrix A has a skew-symmetric part that is of low rank or can be well approximated by a skew-symmetric low-rank matrix. Such systems arise from the discretization of PDEs with certain Neumann boundary conditions, the discretization of integral equations, and path following methods, for example, the Bratu problem and Love's integral equation. The second type consists of least squares (LS) problems that are solved through the equivalent system of normal equations. More precisely, we consider modified and rank deficient LS problems. In a modified LS problem, the set of linear relations is updated with some new information, a new variable is added or, conversely, some information or a variable is removed from the set. Rank deficient LS problems are characterized by a coefficient matrix that does not have full rank, which makes it difficult to compute an incomplete factorization of the normal equations. LS problems arise in many large-scale applications in science and engineering, for instance neural networks, linear programming, exploration seismology, and image processing. Usually, incomplete LU or incomplete Cholesky factorizations are used as preconditioners for iterative methods. The main contribution of this thesis is the development of a technique for updating preconditioners by bordering. It consists of computing an approximate decomposition of an equivalent augmented linear system, which is used as a preconditioner for the original problem. The theoretical study and the numerical experiments presented in this thesis show the performance of the proposed preconditioning technique and its competitiveness with other methods available in the literature for computing preconditioners for the problems studied.
    Guerrero Flores, DJ. (2018). On Updating Preconditioners for the Iterative Solution of Linear Systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/10492

    Low Rank Approximation of a Sparse Matrix Based on LU Factorization with Column and Row Tournament Pivoting

    In this paper we present an algorithm for computing a low rank approximation of a sparse matrix based on a truncated LU factorization with column and row permutations. We present various approaches for determining the column and row permutations that show a trade-off between speed and deterministic/probabilistic accuracy. We show that if the permutations are chosen by using tournament pivoting based on QR factorization, then the resulting truncated LU factorization with column/row tournament pivoting, LU_CRTP, satisfies bounds on the singular values that are similar to those obtained by a communication avoiding rank revealing QR factorization. Experiments on challenging matrices show that LU_CRTP provides a good low rank approximation of the input matrix and that it is less expensive than the rank revealing QR factorization in terms of computational and memory usage costs, while also minimizing the communication cost. We also compare the computational complexity of our algorithm with that of randomized algorithms and show that for sparse matrices and high enough but still modest accuracies, our approach is faster.

    Parallel Tensor Train through Hierarchical Decomposition

    We consider the problem of developing parallel decomposition and approximation algorithms for high dimensional tensors. We focus on a tensor representation named Tensor Train (TT). It stores a d-dimensional tensor in O(ndr^2) entries, far fewer than the O(n^d) entries of the original tensor, where r is usually a very small number that depends on the application. Sequential algorithms to compute the TT decomposition and TT approximation of a tensor have been proposed in the literature. Here we propose a parallel algorithm to compute the TT decomposition of a tensor. We prove that the ranks of the TT-representation produced by our algorithm are bounded by the ranks of the unfolding matrices of the tensor. Additionally, we propose a parallel algorithm to compute an approximation of a tensor in TT-representation. Our algorithm relies on a hierarchical partitioning of the dimensions of the tensor in a balanced binary tree shape and on the transmission of leading singular values of the associated unfolding matrix from the parent to its children. We consider several approaches that differ in how the leading singular values are transmitted in the tree. We present an in-depth experimental analysis of our approaches for different low rank tensors and also assess them for tensors obtained from quantum chemistry simulations. Our results show that the approach which transmits leading singular values to both of its children performs better in practice. Compression ratios and accuracies of the approximations obtained by our approaches are comparable with those of the sequential algorithm and, in some cases, even better. We also show that our algorithms transmit only O(log^2(P) log(d)) messages along the critical path for a d-dimensional tensor on P processors. The lower bound on the number of messages for any algorithm which exchanges data on P processors is log(P), and our algorithms achieve this bound up to a polylogarithmic factor.
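The storage claim can be checked with a short calculation. The sketch below assumes, for simplicity, that every mode has size n and every internal TT rank equals r (with boundary ranks 1). The function names are hypothetical.

```python
def tt_storage(n, d, r):
    """Entries in a TT representation with uniform mode size n and rank r."""
    if d == 1:
        return n                      # a single 1 x n x 1 core
    # First and last cores hold n*r entries each; the d-2 middle cores
    # are r x n x r and hold n*r*r entries each.
    return 2 * n * r + (d - 2) * n * r * r


def full_storage(n, d):
    """Entries in the full d-dimensional tensor."""
    return n ** d


n, d, r = 10, 6, 3
print("TT entries:  ", tt_storage(n, d, r))
print("full entries:", full_storage(n, d))
```

For n = 10, d = 6 and r = 3 this gives 420 entries against one million for the full tensor, consistent with the O(ndr^2) versus O(n^d) comparison in the abstract.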