35 research outputs found

    Analytical Methods for Structured Matrix Computations

    The design of fast algorithms is not only about achieving speed but also about retaining the ability to control error and numerical stability, which is crucial to the reliability of computed numerical solutions. This dissertation studies topics in structured matrix computations with an emphasis on their numerical analysis and algorithms; the methods discussed are all based on rich, mathematically justified analytical results. In Chapter 2, we present a series of comprehensive error analyses of an analytical matrix compression method, which serve as a theoretical explanation of the proxy point method; these results also give concrete guidance for optimizing its performance. In Chapter 3, we propose a non-Hermitian eigensolver that combines HSS (hierarchically semiseparable) matrix techniques with a contour-integral based method; in addition to manipulating the HSS representation algebraically, probabilistic analysis enables further acceleration of the method. An application of HSS matrices is discussed in Chapter 4, where we design a structured preconditioner for linear systems generated by AIIM. We improve the numerical stability of the matrix-free HSS construction process and make additional modifications tailored to this particular problem.
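
    A contour-integral eigensolver of the kind named in Chapter 3 filters a random probe block through a quadrature discretization of the spectral projector, then extracts Ritz values from the filtered subspace. The sketch below is a generic dense illustration of that filtering idea (in the FEAST/Sakurai-Sugiura style); it does not reproduce the HSS acceleration or the probabilistic analysis, and all names and parameter choices are our own.

```python
import numpy as np

def contour_projected_eigs(A, center, radius, n_probe=2, n_nodes=32, seed=0):
    """Approximate the eigenvalues of A inside a circular contour by
    applying a trapezoidal-rule discretization of the spectral projector
    P = (1/(2*pi*i)) \oint (zI - A)^{-1} dz to a random probe block,
    then computing Ritz values from the resulting subspace."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    Y = rng.standard_normal((n, n_probe))
    S = np.zeros((n, n_probe), dtype=complex)
    for j in range(n_nodes):
        theta = 2.0 * np.pi * j / n_nodes
        z = center + radius * np.exp(1j * theta)
        # dz/(2*pi*i) contributes the weight radius*exp(i*theta)/n_nodes
        S += (radius * np.exp(1j * theta) / n_nodes) * \
            np.linalg.solve(z * np.eye(n) - A, Y)
    Q, _ = np.linalg.qr(S)          # orthonormal basis of the filtered block
    ritz = np.linalg.eigvals(Q.conj().T @ A @ Q)
    return np.sort_complex(ritz)
```

For a matrix with eigenvalues {0.5, 1.0, 3.0, 4.0} and a circle of radius 0.6 about 0.75, the two interior eigenvalues are recovered to near machine precision because the trapezoidal filter decays exponentially away from the contour.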

    GENERALIZATIONS OF AN INVERSE FREE KRYLOV SUBSPACE METHOD FOR THE SYMMETRIC GENERALIZED EIGENVALUE PROBLEM

    Symmetric generalized eigenvalue problems arise in many physical applications, and frequently only a few of the eigenpairs are of interest. Typically the problems are large and sparse, so traditional methods such as the QZ algorithm are not feasible. Moreover, it may be impractical to apply shift-and-invert Lanczos, a favored method for problems of this type, due to difficulties in applying the inverse of the shifted matrix. With these difficulties in mind, Golub and Ye developed an inverse-free Krylov subspace algorithm for the symmetric generalized eigenvalue problem. This method does not rely on shift-and-invert transformations for convergence acceleration; instead, a preconditioner is used. The algorithm suffers, however, in the presence of multiple or clustered eigenvalues, and it can locate only extreme eigenvalues. In this work, we extend the method of Golub and Ye by developing a block generalization of their algorithm, which enjoys considerably faster convergence than the original method in the presence of multiplicities and clusters. Preconditioning techniques for these problems are discussed at length, and some insight is given into how these preconditioners accelerate the method. Finally, we discuss a transformation that can be applied so that the algorithm extracts interior eigenvalues. A preconditioner based on a QR factorization with respect to the B^{-1} inner product is developed and applied to locating interior eigenvalues.
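
    For orientation, here is a minimal dense sketch of the single-vector, unpreconditioned Golub-Ye iteration that this work generalizes: at each step, project the pencil (A, B) onto the Krylov subspace K_m(A - rho*B, x) and take the smallest Ritz pair, never applying any matrix inverse. The block generalization and the preconditioning discussed in the abstract are omitted; parameter choices are illustrative.

```python
import numpy as np

def inverse_free_krylov(A, B, m=8, iters=200, tol=1e-10, seed=0):
    """Inverse-free Krylov iteration for the smallest eigenpair of the
    symmetric pencil (A, B).  No inverse of A, B, or A - rho*B is applied."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    rho = (x @ A @ x) / (x @ B @ x)
    for _ in range(iters):
        C = A - rho * B
        # Orthonormal basis of K_m(C, x) via Gram-Schmidt with
        # full reorthogonalization (for simplicity, not efficiency).
        Z = np.zeros((n, m))
        Z[:, 0] = x
        k = m
        for j in range(1, m):
            w = C @ Z[:, j - 1]
            w -= Z[:, :j] @ (Z[:, :j].T @ w)
            w -= Z[:, :j] @ (Z[:, :j].T @ w)
            nw = np.linalg.norm(w)
            if nw < 1e-13:
                k = j
                break
            Z[:, j] = w / nw
        Z = Z[:, :k]
        Am, Bm = Z.T @ A @ Z, Z.T @ B @ Z
        # Smallest eigenpair of the projected pencil (Am, Bm); Bm is SPD.
        L = np.linalg.cholesky(Bm)
        Li = np.linalg.inv(L)
        vals, vecs = np.linalg.eigh(Li @ Am @ Li.T)
        x_new = Z @ (Li.T @ vecs[:, 0])
        x_new /= np.linalg.norm(x_new)
        rho_new = (x_new @ A @ x_new) / (x_new @ B @ x_new)
        done = abs(rho_new - rho) < tol * max(1.0, abs(rho_new))
        rho, x = rho_new, x_new
        if done:
            break
    return rho, x
```

The Rayleigh quotient rho decreases monotonically because the previous iterate x always lies in the projection subspace.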

    Spectral two-level preconditioners for sequences of linear systems

    Many numerical simulations in scientific and engineering applications require the solution of a sequence of large linear systems involving the same coefficient matrix but different right-hand sides. Efficient methods for this problem attempt to benefit from the previously solved systems when solving the next ones. This goal can be achieved either by recycling Krylov subspaces or by building preconditioner updates from near-invariant subspace information. In this thesis, we focus on the latter approach, which attempts to improve a selected preconditioner. In the first part, we consider a single update of the preconditioner for all the systems. This update consists of a spectral low-rank correction that shifts by one the smallest eigenvalues in magnitude of the original preconditioned matrix. We perform experiments in the context of the GMRES method with an approximate inverse preconditioner; the spectral information is computed by an eigensolver in a preprocessing phase. In the second part, we allow an update of the preconditioner between each system and propose an incremental spectral correction. We perform experiments using the GMRES-DR method, chosen both for its efficiency as a linear solver and for its ability to recover reliable approximations of the desired eigenpairs at run time; suitable strategies are investigated for selecting reliable eigenpairs. The efficiency of the proposed approaches is assessed in particular on large and challenging problems from electromagnetic applications; for this purpose, they have been implemented in a parallel industrial code developed by EADS-CCR.
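
    The "shift by one" effect of a spectral low-rank correction can be checked on a toy case. In the sketch below the first-level preconditioner M is taken to be the identity purely for illustration (in the thesis M is an approximate inverse and the spectral information comes from an eigensolver), so the preconditioned matrix is A itself and its eigenvectors are available exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 5

# Synthetic SPD matrix with k tiny eigenvalues that would slow Krylov solvers.
Qmat, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.concatenate([np.linspace(1e-3, 1e-2, k), np.linspace(1.0, 10.0, n - k)])
A = Qmat @ np.diag(lam) @ Qmat.T

# Eigenvectors of the preconditioned matrix (here A itself, since M = I)
# associated with the k smallest eigenvalues.
_, V = np.linalg.eigh(A)
U = V[:, :k]

# Spectral low-rank correction: Mc = M + U (U^T A U)^{-1} U^T
Mc = np.eye(n) + U @ np.linalg.solve(U.T @ A @ U, U.T)

# The k smallest eigenvalues are shifted by exactly one; the rest are untouched.
shifted = np.sort(np.linalg.eigvals(Mc @ A).real)
expected = np.sort(np.concatenate([lam[:k] + 1.0, lam[k:]]))
```

For an eigenvector u with A u = lam*u and u in the span of U, the correction adds U (U^T A U)^{-1} U^T A u = u, so Mc A u = (lam + 1) u, which is the claimed shift.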

    High-Performance Software for Quantum Chemistry and Hierarchical Matrices

    Linear algebra is the underpinning of a significant portion of modern computation. Applications relying on linear algebra include physical and chemical simulations, machine learning, artificial intelligence, optimization, partial differential equations, and many more. However, the direct use of mathematically exact linear algebra is often infeasible for the large problems of today. Numerical and iterative methods provide a way of solving the underlying problems only to the required accuracy, allowing problems that are orders of magnitude larger to be solved orders of magnitude more quickly than with exact linear algebra. In this dissertation, we discuss and test existing methods, and develop new high-performance numerical methods, for scientific computing kernels, including matrix multiplications, linear solves, and eigensolves, which accelerate applications including Gaussian processes and quantum chemistry simulations. Notably, we use preconditioned hierarchical matrices for the hyperparameter optimization and prediction phases of Gaussian process regression, develop a sparse triple matrix product on GPUs, and investigate 3D matrix-matrix multiplications for Chebyshev-filtered subspace iteration in Kohn-Sham density functional theory calculations. Exploiting the structural sparsity of many practical scientific problems can achieve a significant speedup over dense formulations of the same problems. Even so, many problems cannot be accurately represented or approximated in a structurally sparse manner. Many of these, such as kernels arising from machine learning and the Electron-Repulsion-Integral (ERI) matrices from electronic structure computations, can be accurately represented in data-sparse structures, which allow for rapid calculations. We investigate hierarchical matrices, which provide a data-sparse representation of kernel matrices.
In particular, our SMASH approximation can be constructed and applied in near-linear time, which can then be used in matrix-free methods to find the optimal hyperparameters for Gaussian processes and to perform prediction asymptotically more rapidly than direct methods. To further accelerate the use of hierarchical matrices, we provide a data-driven approach (which considers the distribution of the data points associated with a kernel matrix) that reduces a given problem's memory and computation requirements. Furthermore, we investigate the use of preconditioning in Gaussian process regression: matrix-free algorithms can be used for both the hyperparameter optimization and prediction phases. This yields a framework for Gaussian process regression that scales to large problems and is asymptotically faster than state-of-the-art methods. We provide an exploration and analysis of the conditioning and numerical issues that arise from the near-rank-deficient matrices encountered during hyperparameter optimization. Density Functional Theory (DFT) is a valuable method for electronic structure calculations for simulating quantum chemical systems due to its high accuracy-to-cost ratio. However, even with the computational power of modern computers, the O(n^3) complexity of the eigensolves and other kernels demands that new methods be developed so that larger problems can be solved. Two promising directions are the use of modern architectures (including state-of-the-art accelerators and multicore systems) and 3D matrix-multiplication algorithms. We investigate these methods to determine whether they result in an overall speedup, and using these kernels we provide a high-performance framework for Chebyshev-filtered subspace iteration. GPUs are a family of accelerators that provide immense computational power but must be used correctly to achieve good efficiency.
In algebraic multigrid there arises a sparse triple matrix product which, due to its sparse (and relatively unstructured) nature, is challenging to perform efficiently on GPUs and is typically done as two successive matrix-matrix products. By performing a single fused triple-matrix product, however, it may be possible to reduce the overhead associated with sparse matrix-matrix products on the GPU. We develop a sparse triple-matrix product that reduces the computation time required for several classes of problems.
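
    For concreteness, the triple product in question is the Galerkin coarse-grid operator Ac = R A P of algebraic multigrid. The SciPy sketch below forms it as the two successive sparse products that a fused GPU kernel would replace; the 1-D Poisson matrix and the pairwise-aggregation prolongator are toy examples of our own choosing.

```python
import numpy as np
import scipy.sparse as sp

# 1-D Poisson matrix on 8 fine points
n = 8
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")

# Piecewise-constant aggregation prolongator: each pair of fine points
# maps to one coarse point.
P = sp.csr_matrix(
    (np.ones(n), (np.arange(n), np.arange(n) // 2)), shape=(n, n // 2)
)
R = P.T.tocsr()

# Galerkin coarse operator via two successive sparse products (R*A, then *P);
# a fused triple-product kernel would form Ac in a single pass over the data.
Ac = (R @ A) @ P
```

The intermediate product R*A is a full sparse matrix that must be materialized; avoiding it is precisely the overhead reduction a fused kernel targets.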

    Optimized Sparse Matrix Operations for Reverse Mode Automatic Differentiation

    Sparse matrix representations are ubiquitous in computational science and machine learning, leading to significant reductions in compute time, in comparison to dense representations, for problems that have local connectivity. The adoption of sparse representations in leading ML frameworks such as PyTorch is incomplete, however, with support for both automatic differentiation and GPU acceleration missing. In this work, we present an implementation of a CSR-based sparse matrix wrapper for PyTorch with CUDA acceleration for basic matrix operations, as well as automatic differentiability. We also present several applications of the resulting sparse kernels to optimization problems, demonstrating ease of implementation and measuring performance relative to their dense counterparts.
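
    For readers unfamiliar with the format, here is a minimal pure-Python/NumPy sketch of CSR storage and the matrix-vector product it accelerates. This is illustrative only and unrelated to the authors' PyTorch/CUDA implementation; note that the reverse-mode rule for y = A x is simply grad_x = A^T grad_y, i.e. a second sparse product with the transpose.

```python
import numpy as np

class CSRMatrix:
    """Minimal CSR container: nonzero values, their column indices,
    and row pointers delimiting each row's slice of the two arrays."""

    def __init__(self, dense):
        dense = np.asarray(dense, dtype=float)
        self.shape = dense.shape
        self.indptr = [0]
        self.indices, self.data = [], []
        for row in dense:
            nz = np.nonzero(row)[0]
            self.indices.extend(nz.tolist())
            self.data.extend(row[nz].tolist())
            self.indptr.append(len(self.indices))

    def matvec(self, x):
        """y = A x, touching only stored nonzeros."""
        y = np.zeros(self.shape[0])
        for i in range(self.shape[0]):
            for p in range(self.indptr[i], self.indptr[i + 1]):
                y[i] += self.data[p] * x[self.indices[p]]
        return y
```

Each row's work is proportional to its nonzero count, which is the source of the speedup over dense matvec for locally connected problems.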

    Spectral approximation of matrices arising from discretized operators

    In this thesis, we consider the numerical solution of a large eigenvalue problem in which the integral operator comes from a radiative transfer problem, and we study the use of hierarchical matrices, an efficient data-sparse representation of matrices that is especially useful for large-dimensional problems. The basic idea is to partition a matrix into a hierarchy of blocks and to approximate certain blocks by low-rank matrices, leading to low memory requirements as well as cheap computational costs. We discuss the use of the hierarchical matrix technique in the numerical solution of a large-scale eigenvalue problem arising from a finite-rank discretization of an integral operator. The operator is of convolution type and is defined through the first exponential-integral function, and hence it is weakly singular. We use HLIB (Hierarchical matrices LIBrary), which provides, among other things, routines for the construction of hierarchical matrix structures and arithmetic algorithms to perform approximate matrix operations. Moreover, the matrix-vector multiplication routines from HLIB, as well as its LU factorization for preconditioning, are incorporated into SLEPc (Scalable Library for Eigenvalue Problem Computations) in order to exploit the available algorithms for solving eigenvalue problems. We also develop analytical expressions for the approximate degenerate kernels and deduce error upper bounds for these approximations. Numerical results obtained with other approaches to the same problem are used for comparison, illustrating the efficiency of the techniques developed and implemented in this work.
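
    The structural fact exploited by the hierarchical representation is that a block of the kernel matrix connecting two well-separated clusters is numerically low-rank. The sketch below demonstrates this for the first exponential-integral kernel E1 named in the abstract, using a truncated SVD as a stand-in for the degenerate-kernel expansions developed in the thesis; the grids and the rank are our own choices.

```python
import numpy as np
from scipy.special import exp1

# Two well-separated 1-D clusters; on this block the distances stay in
# [1, 3], so K_ij = E1(|x_i - y_j|) is smooth (the singularity at 0 is
# never sampled) and the block is numerically low-rank.
x = np.linspace(0.0, 1.0, 200)
y = np.linspace(2.0, 3.0, 200)
K = exp1(np.abs(x[:, None] - y[None, :]))

# Truncated SVD of the block: a rank-r degenerate-kernel surrogate.
U, s, Vt = np.linalg.svd(K)
r = 16
K_r = (U[:, :r] * s[:r]) @ Vt[:r]

# Spectral-norm relative error equals s[r] / s[0] and decays rapidly with r.
rel_err = np.linalg.norm(K - K_r, 2) / np.linalg.norm(K, 2)
```

Storing the two rank-r factors costs O(r(m+n)) instead of O(mn) for the full block, which is where the memory and arithmetic savings of hierarchical matrices come from.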

    Simultaneous-FETI and Block-FETI: robust domain decomposition with multiple search directions.

    Get PDF
    Domain decomposition methods often exhibit very poor performance when applied to engineering problems with large heterogeneities. In particular, for heterogeneities along domain interfaces, the iterative techniques used to solve the interface problem lack an efficient preconditioner. Recently a robust approach, named FETI-Geneo, was proposed in which troublesome modes are precomputed and deflated from the interface problem. The cost of FETI-Geneo is, however, high. In this paper we propose techniques that share similar ideas with FETI-Geneo but require no pre-processing and can be easily and efficiently implemented as an alternative to standard domain decomposition methods. In the block iterative approaches presented here, the search space at every iteration on the interface problem contains as many directions as there are domains in the decomposition. Those search directions originate either from the domain-wise preconditioner (in the Simultaneous FETI method) or from the block structure of the right-hand side of the interface problem (in Block FETI). We show on 2D structural examples that both methods are robust and provide good convergence in the presence of high heterogeneities, even when the interface is jagged or when the domains have a bad aspect ratio. Simultaneous FETI was also efficiently implemented in an optimized parallel code and exhibited excellent performance compared to the regular FETI method.
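
    The block-direction machinery underlying such methods can be sketched with a generic block conjugate gradient: several search directions per iteration, advanced and kept A-conjugate as a block. This is not the Simultaneous-FETI or Block-FETI algorithm itself (those build their directions from domain-wise preconditioner contributions on the FETI interface problem); it only illustrates how multiple simultaneous directions are handled.

```python
import numpy as np

def block_cg(A, B, tol=1e-10, maxiter=200):
    """Block conjugate gradient for SPD A with a block of right-hand
    sides B: one search direction per column, all directions updated
    together and kept A-conjugate as a block."""
    X = np.zeros_like(B)
    R = B - A @ X
    P = R.copy()
    for _ in range(maxiter):
        Q = A @ P
        PtQ = P.T @ Q                      # small s-by-s system per step
        alpha = np.linalg.solve(PtQ, P.T @ R)
        X += P @ alpha
        R -= Q @ alpha
        if np.linalg.norm(R) <= tol * np.linalg.norm(B):
            break
        # beta enforces A-conjugacy of the new block against the old one
        beta = -np.linalg.solve(PtQ, Q.T @ R)
        P = R + P @ beta
    return X
```

Enlarging the search space this way is what buys robustness: directions contributed by well-behaved subproblems keep the iteration moving even when some directions are nearly stagnant.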