A Parallel Implementation of the Invariant Subspace Decomposition Algorithm for Dense Symmetric Matrices
We give an overview of the Invariant Subspace Decomposition Algorithm for dense symmetric matrices (SYISDA), first describing the algorithm and then discussing a parallel implementation of SYISDA on the Intel Delta. Our implementation utilizes an optimized parallel matrix multiplication routine that we have developed. Load balancing in the costly early stages of the algorithm is accomplished without redistribution of data between stages through the use of the block scattered decomposition. Computation of the invariant subspaces at each stage is done using a new tridiagonalization scheme due to Bischof and Sun.

1. Introduction
Computation of all the eigenvalues and eigenvectors of a dense symmetric matrix is an essential kernel in many applications. The ever-increasing computational power available from parallel computers offers the potential for solving much larger problems than could have been contemplated previously. Hardware scalability of parallel machines is freque..
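The abstract does not reproduce the algorithm itself, but the core SYISDA idea (map the spectrum into [0, 1], iterate a polynomial that pushes eigenvalues toward 0 or 1, and read invariant subspaces off the resulting projector) can be sketched in a few lines of NumPy/SciPy. This is a serial illustration under our own assumptions, not the paper's parallel implementation; the function name `syisda_split` and the use of a pivoted QR for subspace extraction are our choices.

```python
import numpy as np
from scipy.linalg import qr

def syisda_split(A, tol=1e-10, max_iter=60):
    """One SYISDA-style divide step for a dense symmetric A (serial sketch).

    Shift/scale A so its spectrum lies in [0, 1], then iterate
    p(x) = 3x^2 - 2x^3, which drives eigenvalues below 1/2 to 0 and
    above 1/2 to 1 (assuming none sits exactly at the fixed point 1/2).
    The limit is a spectral projector; its range and null space are
    complementary invariant subspaces of A.
    """
    n = A.shape[0]
    # Gershgorin bounds give cheap, safe over-estimates of the spectrum.
    radii = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
    lo = np.min(np.diag(A) - radii)
    hi = np.max(np.diag(A) + radii)
    C = (A - lo * np.eye(n)) / (hi - lo)        # eigenvalues now in [0, 1]
    for _ in range(max_iter):
        C2 = C @ C
        C_new = 3.0 * C2 - 2.0 * C2 @ C         # p(C) = 3C^2 - 2C^3, two GEMMs
        if np.linalg.norm(C_new - C, 'fro') < tol:
            C = C_new
            break
        C = C_new
    # Split R^n into range(C) and its orthogonal complement via pivoted QR.
    # (The full algorithm chooses the shift so both subspaces are nonempty.)
    Q, R, _ = qr(C, pivoting=True)
    k = int(np.sum(np.abs(np.diag(R)) > tol * np.abs(R[0, 0])))
    return Q[:, :k], Q[:, k:]   # bases for the two invariant subspaces
```

Each polynomial evaluation costs two matrix multiplications, which is why an optimized parallel multiplication routine dominates the performance of this approach.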
A Parallelizable Eigensolver for Real Diagonalizable Matrices with Real Eigenvalues
In this paper, we present preliminary research results on a new algorithm for finding all the eigenvalues and eigenvectors of a real diagonalizable matrix with real eigenvalues. The basic mathematical theory behind this approach is reviewed, followed by a discussion of the numerical considerations of the actual implementation. The numerical algorithm has been tested on thousands of matrices on both a Cray-2 and an IBM RS/6000 Model 580 workstation, and the results of these tests are presented. Finally, issues concerning the parallel implementation of the algorithm are discussed. The algorithm's heavy reliance on matrix-matrix multiplication, coupled with its divide-and-conquer structure, should yield a highly parallelizable algorithm.

1. Introduction
Computation of all the eigenvalues and eigenvectors of a dense matrix is essential for solving problems in many fields. The ever-increasing computational power available from modern supercomputers offers the potenti..
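To make the divide-and-conquer structure concrete, here is a minimal sketch of a recursive driver built around a spectral-splitting kernel. The `split` callback interface is hypothetical (e.g., the `syisda_split` sketch above would serve for symmetric matrices); the paper's actual algorithm for nonsymmetric matrices computes the subspaces differently.

```python
import numpy as np

def dc_eigenvalues(A, split, min_size=2):
    """Recursive divide-and-conquer driver (illustrative sketch).

    `split(A)` is assumed to return V1, V2 with orthonormal columns
    spanning complementary invariant subspaces of A.  Each level
    deflates A into two smaller blocks via matrix-matrix products --
    the operation the abstract identifies as the source of
    parallelism -- and the two halves recurse independently.
    """
    if A.shape[0] <= min_size:
        return np.linalg.eigvals(A).real     # tiny base case: direct solve
    V1, V2 = split(A)
    A1 = V1.T @ A @ V1                       # deflated diagonal blocks
    A2 = V2.T @ A @ V2
    return np.concatenate([dc_eigenvalues(A1, split, min_size),
                           dc_eigenvalues(A2, split, min_size)])
```

With `syisda_split` as the callback, `dc_eigenvalues(A, syisda_split)` recovers all eigenvalues of a symmetric A; a production version must additionally guard against degenerate splits where all eigenvalues land on one side.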
The PRISM Project: Infrastructure and Algorithms for Parallel Eigensolvers
The goal of the PRISM project is the development of infrastructure and algorithms for the parallel solution of eigenvalue problems. We are currently investigating a complete eigensolver based on the Invariant Subspace Decomposition Algorithm for dense symmetric matrices (SYISDA). After briefly reviewing SYISDA, we discuss the algorithmic highlights of a distributed-memory implementation of this approach. These include a fast matrix-matrix multiplication algorithm, a new approach to parallel band reduction and tridiagonalization, and a harness for coordinating the divide-and-conquer parallelism in the problem. We also present performance results of these kernels as well as the overall SYISDA implementation on the Intel Touchstone Delta prototype.

1. Introduction
Computation of eigenvalues and eigenvectors is an essential kernel in many applications, and several promising parallel algorithms have been investigated [29, 24, 3, 27, 21]. The work presented in this paper is part of the PRI..
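The band-reduction scheme itself is not reproduced in the abstract. As a point of reference, the textbook serial Householder tridiagonalization that such schemes improve upon looks as follows; this baseline sketch is ours, not the PRISM kernel, which restructures the reduction into blocked, matrix-multiply-rich band-reduction steps.

```python
import numpy as np

def householder_tridiag(A):
    """Classic serial Householder tridiagonalization of a symmetric A.

    Returns (T, Q) with T tridiagonal and A = Q @ T @ Q.T.  Each step
    applies a reflector from both sides (written here as dense products
    for clarity); the low ratio of flops to memory traffic in these
    narrow updates is what blocked band-reduction schemes avoid.
    """
    T = np.array(A, dtype=float, copy=True)
    n = T.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        x = T[k + 1:, k]
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])  # avoid cancellation
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:                 # column already reduced
            continue
        v /= norm_v
        H = np.eye(n - k - 1) - 2.0 * np.outer(v, v)  # Householder reflector
        T[k + 1:, :] = H @ T[k + 1:, :]   # apply from the left ...
        T[:, k + 1:] = T[:, k + 1:] @ H   # ... and the right (similarity)
        Q[:, k + 1:] = Q[:, k + 1:] @ H   # accumulate the transform
    return T, Q

# Quick check on a random symmetric matrix.
A = np.random.default_rng(1).standard_normal((6, 6))
A = (A + A.T) / 2
T, Q = householder_tridiag(A)
assert np.allclose(Q @ T @ Q.T, A)
assert np.allclose(np.triu(np.tril(T, 1), -1), T)    # T is tridiagonal
```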
Comparison of Scalable Parallel Matrix Multiplication Libraries
This paper compares two general library routines for performing parallel distributed matrix multiplication. The PUMMA algorithm utilizes a block scattered data layout, whereas BiMMeR utilizes a virtual 2-D torus wrap. The algorithmic differences resulting from these different layouts are discussed, as well as the general issues associated with different data layouts for library routines. Results on the Intel Delta for the two matrix multiplication algorithms are presented.

1. Introduction
Matrix multiplication is an important computational kernel in many applications, including eigensolvers [3] and LU factorization [15]. Utilizing matrix multiplication is one of the principal ways of achieving high-efficiency block algorithms in packages such as LAPACK [2]. The BLAS 3 routines were added to achieve this block performance, and optimized versions are available on most serial machines [10]. For matrix multiplication, the BLAS 3 routine XGEMM is availa..
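The two layouts differ only in how a global index is mapped to an owning process and a local index along each dimension of the process grid; the torus wrap is essentially the block-size-1 special case of the block scattered layout. A compact way to state the difference, with hypothetical function names of our own:

```python
def block_scattered_owner(g, nb, p):
    """Block scattered (block-cyclic) layout, as in PUMMA: global index g
    is grouped into blocks of size nb, and blocks are dealt out cyclically
    over the p processes along this dimension.
    Returns (owner process, local index)."""
    blk, off = divmod(g, nb)
    return blk % p, (blk // p) * nb + off

def torus_wrap_owner(g, p):
    """Virtual 2-D torus wrap, as in BiMMeR: along each dimension,
    indices are wrapped cyclically -- the nb = 1 special case."""
    return g % p, g // p

# Example: 8 rows, block size 2, 2 processes along this dimension.
print([block_scattered_owner(g, 2, 2)[0] for g in range(8)])
# -> [0, 0, 1, 1, 0, 0, 1, 1]
print([torus_wrap_owner(g, 2)[0] for g in range(8)])
# -> [0, 1, 0, 1, 0, 1, 0, 1]
```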
Parallel Spectral Division Via The Generalized Matrix Sign Function
In this paper we demonstrate the parallelism of spectral division via the matrix sign function for the generalized nonsymmetric eigenproblem. We employ the so-called generalized Newton iterative scheme to compute the sign function of a matrix pair. A recent study has reduced the computational cost of this iteration considerably (by 75%), making the approach competitive with the traditional QZ algorithm. The matrix sign function is thus revealed as an efficient and reliable spectral division method for applications that require only partial information of the eigenspectrum. For applications that require complete information of the eigendistribution, the matrix sign function can be used as an initial divide-and-conquer method, combined with the QZ algorithm for the last stages. The experimental results on an IBM SP2 multicomputer demonstrate the parallel performance (efficiency around 60--80%) and scalability of this approach.

Key words. General..
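The generalized Newton iteration admits a very short dense sketch. The version below is serial, unscaled, and without the cost-reducing reformulation the abstract cites; the function name and tolerances are ours.

```python
import numpy as np

def generalized_sign(A, B, tol=1e-12, max_iter=50):
    """Generalized Newton iteration for the sign function of the pencil
    A - lambda*B:  A_{k+1} = (A_k + B A_k^{-1} B) / 2.

    Writing S_k = inv(B) @ A_k shows this is the standard Newton sign
    iteration S_{k+1} = (S_k + inv(S_k)) / 2 in disguise, so A_k
    converges to B @ S with S = sign(inv(B) @ A), provided the pencil
    has no eigenvalues on the imaginary axis.  The deflating subspaces
    for each half-plane are then spanned by the ranges of S +/- I.
    """
    Ak = np.array(A, dtype=float, copy=True)
    for _ in range(max_iter):
        An = 0.5 * (Ak + B @ np.linalg.solve(Ak, B))   # B A_k^{-1} B via a solve
        if np.linalg.norm(An - Ak, 1) <= tol * np.linalg.norm(An, 1):
            return An
        Ak = An
    return Ak
```

Note that the explicit inverse is avoided in favor of a linear solve; a distributed implementation would replace it with a parallel LU factorization across the process grid.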
Parallel Studies of the Invariant Subspace Decomposition Approach for Banded Symmetric Matrices
We present an overview of the banded Invariant Subspace Decomposition Algorithm for symmetric matrices and describe a parallel implementation of this algorithm. The algorithm described here is a promising variant of the Invariant Subspace Decomposition Algorithm for dense symmetric matrices (SYISDA) that retains the property of using scalable primitives while requiring significantly less overall computation than SYISDA.

1 Introduction
Computation of eigenvalues and eigenvectors is an essential kernel in many applications, and several promising parallel algorithms have been investigated. The work presented in this paper is part of the PRISM (Parallel Research on Invariant Subspace Methods) Project, which involves researchers from Argonne National Laboratory, the Supercomputing Research Center, the University of California at Berkeley, and the University of Kentucky. The goal of the PRISM project is the development of algorithms and software for solving large-scale eigenvalue problems ..
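One reason bandwidth control matters in any banded variant of this approach: every multiplication in the eigenvalue-splitting polynomial grows the bandwidth, so keeping the cost below the dense algorithm's requires repeatedly reducing back to banded form. A tiny NumPy check (our own illustration, not code from the paper) makes the growth explicit:

```python
import numpy as np

def half_bandwidth(M, tol=1e-12):
    """Largest |i - j| over entries with |M[i, j]| > tol."""
    i, j = np.nonzero(np.abs(M) > tol)
    return int(np.max(np.abs(i - j))) if i.size else 0

rng = np.random.default_rng(0)
n, b = 12, 2
A = rng.standard_normal((n, n))
A = np.triu(np.tril(A, b), -b)        # keep a band of half-width b
A = (A + A.T) / 2                     # symmetric, half-bandwidth 2

print(half_bandwidth(A))                              # 2
print(half_bandwidth(A @ A))                          # generically 2b = 4
print(half_bandwidth(3 * (A @ A) - 2 * (A @ A @ A)))  # generically 3b = 6
```

Left unchecked, a few polynomial iterations would fill the matrix in entirely, which is why interleaving band reduction with the iteration is central to any banded variant.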
A Case Study of MPI: Portable and Efficient Libraries
In this paper, we discuss the performance achieved by several implementations of the recently defined Message Passing Interface (MPI) standard. In particular, performance results for different implementations of the broadcast operation are analyzed and compared on the Delta, Paragon, SP1, and CM5.

1 Introduction
For the past several years, members of the Parallel Research on Invariant Subspace Methods (PRISM) project have been investigating scalable parallel eigensolvers for distributed memory systems [1, 3]. The ultimate objective of this research is the development of portable and efficient libraries for this fundamental numerical linear algebra kernel. In the course of our work, we, like many other library developers, have been faced with many issues relating to portable programming. Previously, a notable obstacle to library development was the lack of standardization in message passing, from both a programming and a functional point of view. This lack of standardization made it dif..
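The broadcast algorithms under comparison are not spelled out in the abstract, but a classic candidate implementation is a binomial tree built from point-to-point messages. Below is a minimal sketch using the modern mpi4py bindings (not the original codes, which predate these bindings); the function name is ours.

```python
from mpi4py import MPI

def binomial_bcast(obj, root, comm):
    """Broadcast `obj` from `root` over a binomial tree of point-to-point
    messages -- one of the standard ways an MPI library (or a benchmark
    comparing libraries) might implement the broadcast collective.
    Completes in ceil(log2(size)) communication rounds."""
    rank, size = comm.Get_rank(), comm.Get_size()
    rel = (rank - root) % size            # rank relative to the root
    mask = 1
    while mask < size:                    # receive phase: find our parent
        if rel & mask:
            obj = comm.recv(source=(rank - mask) % size)
            break
        mask <<= 1
    mask >>= 1
    while mask > 0:                       # send phase: forward to children
        if rel + mask < size:
            comm.send(obj, dest=(rank + mask) % size)
        mask >>= 1
    return obj

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    data = {"msg": "hello"} if comm.Get_rank() == 0 else None
    data = binomial_bcast(data, root=0, comm=comm)
    print(comm.Get_rank(), data)
```

The best tree shape and messaging protocol differ across machines like the Delta, Paragon, SP1, and CM5, which is precisely the implementation variation the paper's measurements expose.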