A Parallel Implementation of the Invariant Subspace Decomposition Algorithm for Dense Symmetric Matrices
We give an overview of the Invariant Subspace Decomposition Algorithm for dense symmetric matrices (SYISDA), first describing the algorithm and then discussing a parallel implementation of SYISDA on the Intel Delta. Our implementation utilizes an optimized parallel matrix multiplication routine that we have developed. Load balancing in the costly early stages of the algorithm is accomplished without redistribution of data between stages through the use of the block scattered decomposition. Computation of the invariant subspaces at each stage is done using a new tridiagonalization scheme due to Bischof and Sun.

1. Introduction
Computation of all the eigenvalues and eigenvectors of a dense symmetric matrix is an essential kernel in many applications. The ever-increasing computational power available from parallel computers offers the potential for solving much larger problems than could have been contemplated previously. Hardware scalability of parallel machines is freque..
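The abstract does not reproduce the algorithm itself, but the core SYISDA idea (map the spectrum into [0, 1], iterate a polynomial that pushes eigenvalues toward 0 or 1, and read invariant subspaces off the resulting projector) can be sketched in a few lines of NumPy/SciPy. This is a serial illustration under our own assumptions, not the paper's parallel implementation; the function name `syisda_split` and the use of a pivoted QR for subspace extraction are our choices.

```python
import numpy as np
from scipy.linalg import qr

def syisda_split(A, tol=1e-10, max_iter=60):
    """One SYISDA-style divide step for a dense symmetric A (serial sketch).

    Shift/scale A so its spectrum lies in [0, 1], then iterate
    p(x) = 3x^2 - 2x^3, which drives eigenvalues below 1/2 to 0 and
    above 1/2 to 1 (assuming none sits exactly at the fixed point 1/2).
    The limit is a spectral projector; its range and null space are
    complementary invariant subspaces of A.
    """
    n = A.shape[0]
    # Gershgorin bounds give cheap, safe over-estimates of the spectrum.
    radii = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
    lo = np.min(np.diag(A) - radii)
    hi = np.max(np.diag(A) + radii)
    C = (A - lo * np.eye(n)) / (hi - lo)        # eigenvalues now in [0, 1]
    for _ in range(max_iter):
        C2 = C @ C
        C_new = 3.0 * C2 - 2.0 * C2 @ C         # p(C) = 3C^2 - 2C^3, two GEMMs
        if np.linalg.norm(C_new - C, 'fro') < tol:
            C = C_new
            break
        C = C_new
    # Split R^n into range(C) and its orthogonal complement via pivoted QR.
    # (The full algorithm chooses the shift so both subspaces are nonempty.)
    Q, R, _ = qr(C, pivoting=True)
    k = int(np.sum(np.abs(np.diag(R)) > tol * np.abs(R[0, 0])))
    return Q[:, :k], Q[:, k:]   # bases for the two invariant subspaces
```

Each polynomial evaluation costs two matrix multiplications, which is why an optimized parallel multiplication routine dominates the performance of this approach.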
A Parallelizable Eigensolver for Real Diagonalizable Matrices with Real Eigenvalues
In this paper, we present preliminary research results on a new algorithm for finding all the eigenvalues and eigenvectors of a real diagonalizable matrix with real eigenvalues. The basic mathematical theory behind this approach is reviewed, followed by a discussion of the numerical considerations of the actual implementation. The numerical algorithm has been tested on thousands of matrices on both a Cray-2 and an IBM RS/6000 Model 580 workstation, and the results of these tests are presented. Finally, issues concerning the parallel implementation of the algorithm are discussed. The algorithm's heavy reliance on matrix-matrix multiplication, coupled with its divide-and-conquer structure, should yield a highly parallelizable algorithm.

1. Introduction
Computation of all the eigenvalues and eigenvectors of a dense matrix is essential for solving problems in many fields. The ever-increasing computational power available from modern supercomputers offers the potenti..
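To make the divide-and-conquer structure concrete, here is a minimal sketch of a recursive driver built around a spectral-splitting kernel. The `split` callback interface is hypothetical (e.g., the `syisda_split` sketch above would serve for symmetric matrices); the paper's actual algorithm for nonsymmetric matrices computes the subspaces differently.

```python
import numpy as np

def dc_eigenvalues(A, split, min_size=2):
    """Recursive divide-and-conquer driver (illustrative sketch).

    `split(A)` is assumed to return V1, V2 with orthonormal columns
    spanning complementary invariant subspaces of A.  Each level
    deflates A into two smaller blocks via matrix-matrix products --
    the operation the abstract identifies as the source of
    parallelism -- and the two halves recurse independently.
    """
    if A.shape[0] <= min_size:
        return np.linalg.eigvals(A).real     # tiny base case: direct solve
    V1, V2 = split(A)
    A1 = V1.T @ A @ V1                       # deflated diagonal blocks
    A2 = V2.T @ A @ V2
    return np.concatenate([dc_eigenvalues(A1, split, min_size),
                           dc_eigenvalues(A2, split, min_size)])
```

With `syisda_split` as the callback, `dc_eigenvalues(A, syisda_split)` recovers all eigenvalues of a symmetric A; a production version must additionally guard against degenerate splits where all eigenvalues land on one side.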
The PRISM Project: Infrastructure and Algorithms for Parallel Eigensolvers
The goal of the PRISM project is the development of infrastructure and algorithms for the parallel solution of eigenvalue problems. We are currently investigating a complete eigensolver based on the Invariant Subspace Decomposition Algorithm for dense symmetric matrices (SYISDA). After briefly reviewing SYISDA, we discuss the algorithmic highlights of a distributed-memory implementation of this approach. These include a fast matrix-matrix multiplication algorithm, a new approach to parallel band reduction and tridiagonalization, and a harness for coordinating the divide-and-conquer parallelism in the problem. We also present performance results of these kernels as well as the overall SYISDA implementation on the Intel Touchstone Delta prototype.

1. Introduction
Computation of eigenvalues and eigenvectors is an essential kernel in many applications, and several promising parallel algorithms have been investigated [29, 24, 3, 27, 21]. The work presented in this paper is part of the PRI..
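The band-reduction scheme itself is not reproduced in the abstract. As a point of reference, the textbook serial Householder tridiagonalization that such schemes improve upon looks as follows; this baseline sketch is ours, not the PRISM kernel, which restructures the reduction into blocked, matrix-multiply-rich band-reduction steps.

```python
import numpy as np

def householder_tridiag(A):
    """Classic serial Householder tridiagonalization of a symmetric A.

    Returns (T, Q) with T tridiagonal and A = Q @ T @ Q.T.  Each step
    applies a reflector from both sides (written here as dense products
    for clarity); the low ratio of flops to memory traffic in these
    narrow updates is what blocked band-reduction schemes avoid.
    """
    T = np.array(A, dtype=float, copy=True)
    n = T.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        x = T[k + 1:, k]
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])  # avoid cancellation
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:                 # column already reduced
            continue
        v /= norm_v
        H = np.eye(n - k - 1) - 2.0 * np.outer(v, v)  # Householder reflector
        T[k + 1:, :] = H @ T[k + 1:, :]   # apply from the left ...
        T[:, k + 1:] = T[:, k + 1:] @ H   # ... and the right (similarity)
        Q[:, k + 1:] = Q[:, k + 1:] @ H   # accumulate the transform
    return T, Q

# Quick check on a random symmetric matrix.
A = np.random.default_rng(1).standard_normal((6, 6))
A = (A + A.T) / 2
T, Q = householder_tridiag(A)
assert np.allclose(Q @ T @ Q.T, A)
assert np.allclose(np.triu(np.tril(T, 1), -1), T)    # T is tridiagonal
```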
Comparison of Scalable Parallel Matrix Multiplication Libraries
This paper compares two general library routines for performing parallel distributed matrix multiplication. The PUMMA algorithm utilizes a block scattered data layout, whereas BiMMeR utilizes a virtual 2-D torus wrap. The algorithmic differences resulting from these different layouts are discussed, as well as the general issues associated with different data layouts for library routines. Results on the Intel Delta for the two matrix multiplication algorithms are presented.

1. Introduction
Matrix multiplication is an important computational kernel in many applications, including eigensolvers [3] and LU factorization [15]. Utilizing matrix multiplication is one of the principal ways of achieving high-efficiency block algorithms in packages such as LAPACK [2]. The BLAS 3 routines were added to achieve this block performance, and optimized versions are available on most serial machines [10]. For matrix multiplication, the BLAS 3 routine XGEMM is availa..
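The two layouts differ only in how a global index is mapped to an owning process and a local index along each dimension of the process grid; the torus wrap is essentially the block-size-1 special case of the block scattered layout. A compact way to state the difference, with hypothetical function names of our own:

```python
def block_scattered_owner(g, nb, p):
    """Block scattered (block-cyclic) layout, as in PUMMA: global index g
    is grouped into blocks of size nb, and blocks are dealt out cyclically
    over the p processes along this dimension.
    Returns (owner process, local index)."""
    blk, off = divmod(g, nb)
    return blk % p, (blk // p) * nb + off

def torus_wrap_owner(g, p):
    """Virtual 2-D torus wrap, as in BiMMeR: along each dimension,
    indices are wrapped cyclically -- the nb = 1 special case."""
    return g % p, g // p

# Example: 8 rows, block size 2, 2 processes along this dimension.
print([block_scattered_owner(g, 2, 2)[0] for g in range(8)])
# -> [0, 0, 1, 1, 0, 0, 1, 1]
print([torus_wrap_owner(g, 2)[0] for g in range(8)])
# -> [0, 1, 0, 1, 0, 1, 0, 1]
```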
Parallel Spectral Division Via The Generalized Matrix Sign Function
In this paper we demonstrate the parallelism of spectral division via the matrix sign function for the generalized nonsymmetric eigenproblem. We employ the so-called generalized Newton iterative scheme to compute the sign function of a matrix pair. A recent study has reduced the computational cost of this iteration considerably (by 75%), making the approach competitive with the traditional QZ algorithm. The matrix sign function is thus revealed as an efficient and reliable spectral division method for applications that require only partial information of the eigenspectrum. For applications that require complete information of the eigendistribution, the matrix sign function can be used as an initial divide-and-conquer method, combined with the QZ algorithm for the last stages. The experimental results on an IBM SP2 multicomputer demonstrate the parallel performance (efficiency around 60--80%) and scalability of this approach.

Key words. General..
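The generalized Newton iteration admits a very short dense sketch. The version below is serial, unscaled, and without the cost-reducing reformulation the abstract cites; the function name and tolerances are ours.

```python
import numpy as np

def generalized_sign(A, B, tol=1e-12, max_iter=50):
    """Generalized Newton iteration for the sign function of the pencil
    A - lambda*B:  A_{k+1} = (A_k + B A_k^{-1} B) / 2.

    Writing S_k = inv(B) @ A_k shows this is the standard Newton sign
    iteration S_{k+1} = (S_k + inv(S_k)) / 2 in disguise, so A_k
    converges to B @ S with S = sign(inv(B) @ A), provided the pencil
    has no eigenvalues on the imaginary axis.  The deflating subspaces
    for each half-plane are then spanned by the ranges of S +/- I.
    """
    Ak = np.array(A, dtype=float, copy=True)
    for _ in range(max_iter):
        An = 0.5 * (Ak + B @ np.linalg.solve(Ak, B))   # B A_k^{-1} B via a solve
        if np.linalg.norm(An - Ak, 1) <= tol * np.linalg.norm(An, 1):
            return An
        Ak = An
    return Ak
```

Note that the explicit inverse is avoided in favor of a linear solve; a distributed implementation would replace it with a parallel LU factorization across the process grid.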
Parallel Studies of the Invariant Subspace Decomposition Approach for Banded Symmetric Matrices
We present an overview of the banded Invariant Subspace Decomposition Algorithm for symmetric matrices and describe a parallel implementation of this algorithm. The algorithm described here is a promising variant of the Invariant Subspace Decomposition Algorithm for dense symmetric matrices (SYISDA) that retains the property of using scalable primitives while requiring significantly less overall computation than SYISDA.

1 Introduction
Computation of eigenvalues and eigenvectors is an essential kernel in many applications, and several promising parallel algorithms have been investigated. The work presented in this paper is part of the PRISM (Parallel Research on Invariant Subspace Methods) Project, which involves researchers from Argonne National Laboratory, the Supercomputing Research Center, the University of California at Berkeley, and the University of Kentucky. The goal of the PRISM project is the development of algorithms and software for solving large-scale eigenvalue problems ..
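One reason bandwidth control matters in any banded variant of this approach: every multiplication in the eigenvalue-splitting polynomial grows the bandwidth, so keeping the cost below the dense algorithm's requires repeatedly reducing back to banded form. A tiny NumPy check (our own illustration, not code from the paper) makes the growth explicit:

```python
import numpy as np

def half_bandwidth(M, tol=1e-12):
    """Largest |i - j| over entries with |M[i, j]| > tol."""
    i, j = np.nonzero(np.abs(M) > tol)
    return int(np.max(np.abs(i - j))) if i.size else 0

rng = np.random.default_rng(0)
n, b = 12, 2
A = rng.standard_normal((n, n))
A = np.triu(np.tril(A, b), -b)        # keep a band of half-width b
A = (A + A.T) / 2                     # symmetric, half-bandwidth 2

print(half_bandwidth(A))                              # 2
print(half_bandwidth(A @ A))                          # generically 2b = 4
print(half_bandwidth(3 * (A @ A) - 2 * (A @ A @ A)))  # generically 3b = 6
```

Left unchecked, a few polynomial iterations would fill the matrix in entirely, which is why interleaving band reduction with the iteration is central to any banded variant.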
A Case Study of MPI: Portable and Efficient Libraries
In this paper, we discuss the performance achieved by several implementations of the recently defined Message Passing Interface (MPI) standard. In particular, performance results for different implementations of the broadcast operation are analyzed and compared on the Delta, Paragon, SP1, and CM5.

1 Introduction
For the past several years, members of the Parallel Research on Invariant Subspace Methods (PRISM) project have been investigating scalable parallel eigensolvers for distributed memory systems [1, 3]. The ultimate objective of this research is the development of portable and efficient libraries for this fundamental numerical linear algebra kernel. In the course of our work, we, like many other library developers, have been faced with many issues relating to portable programming. Previously, a notable obstacle to library development was the lack of standardization in message passing, from both a programming and a functional point of view. This lack of standardization made it dif..
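The broadcast algorithms under comparison are not spelled out in the abstract, but a classic candidate implementation is a binomial tree built from point-to-point messages. Below is a minimal sketch using the modern mpi4py bindings (not the original codes, which predate these bindings); the function name is ours.

```python
from mpi4py import MPI

def binomial_bcast(obj, root, comm):
    """Broadcast `obj` from `root` over a binomial tree of point-to-point
    messages -- one of the standard ways an MPI library (or a benchmark
    comparing libraries) might implement the broadcast collective.
    Completes in ceil(log2(size)) communication rounds."""
    rank, size = comm.Get_rank(), comm.Get_size()
    rel = (rank - root) % size            # rank relative to the root
    mask = 1
    while mask < size:                    # receive phase: find our parent
        if rel & mask:
            obj = comm.recv(source=(rank - mask) % size)
            break
        mask <<= 1
    mask >>= 1
    while mask > 0:                       # send phase: forward to children
        if rel + mask < size:
            comm.send(obj, dest=(rank + mask) % size)
        mask >>= 1
    return obj

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    data = {"msg": "hello"} if comm.Get_rank() == 0 else None
    data = binomial_bcast(data, root=0, comm=comm)
    print(comm.Get_rank(), data)
```

The best tree shape and messaging protocol differ across machines like the Delta, Paragon, SP1, and CM5, which is precisely the implementation variation the paper's measurements expose.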