14 research outputs found

Workshop report on large-scale matrix diagonalization methods in chemistry theory institute

Author: Bischof C.H.
Huss-Lederman S.
Shepard R.L.
Publication venue: Argonne National Laboratory
Publication date: 01/10/1996

The Large-Scale Matrix Diagonalization Methods in Chemistry theory institute brought together 41 computational chemists and numerical analysts. The goal was to understand the needs of the computational chemistry community in problems that utilize matrix diagonalization techniques. This was accomplished by reviewing the current state of the art and looking toward future directions in matrix diagonalization techniques. This institute occurred about 20 years after a related meeting of similar size. During those 20 years the Davidson method continued to dominate the problem of finding a few extremal eigenvalues for many computational chemistry problems. Work on non-diagonally dominant and non-Hermitian problems as well as parallel computing has also brought new methods to bear. The changes and similarities in problems and methods over the past two decades offered an interesting viewpoint for the success in this area. One important area covered by the talks was overviews of the source and nature of the chemistry problems. The numerical analysts were uniformly grateful for the efforts to convey a better understanding of the problems and issues faced in computational chemistry. An important outcome was an understanding of the wide range of eigenproblems encountered in computational chemistry. The workshop covered problems involving self-consistent-field (SCF), configuration interaction (CI), intramolecular vibrational relaxation (IVR), and scattering problems. In atomic structure calculations using the Hartree-Fock method (SCF), the symmetric matrices can range from order hundreds to thousands. These matrices often include large clusters of eigenvalues which can be as much as 25% of the spectrum. However, if CI methods are also used, the matrix size can be between 10^4 and 10^9 where only one or a few extremal eigenvalues and eigenvectors are needed. Working with very large matrices has led to the development o

UNT Digital Library

MPI-2: Extending the Message-Passing Interface

Author: Geist A.
Gropp W.
Huss-Lederman S.
Lumsdaine A.
Lusk E.
Saphir W.
Skjellum T.
Snir M.
Publication venue: Argonne National Laboratory
Publication date: 01/10/1996

This paper describes current activities of the MPI-2 Forum. The MPI-2 Forum is a group of parallel computer vendors, library writers, and application specialists working together to define a set of extensions to MPI (Message Passing Interface). MPI was defined by the same process and now has many implementations, both vendor-proprietary and publicly available, for a wide variety of parallel computing environments. In this paper we present the salient aspects of the evolving MPI-2 document as it now stands. We discuss proposed extensions and enhancements to MPI in the areas of dynamic process management, one-sided operations, collective operations, new language binding, real-time computing, external interfaces, and miscellaneous topics.

UNT Digital Library

Parallel Spectral Division Via The Generalized Matrix Sign Function

Author: Enrique S. Quintana-Ortí
Steven Huss-Lederman
Yuan-Jye Y. Wu

In this paper we demonstrate the parallelism of the spectral division via the matrix sign function for the generalized nonsymmetric eigenproblem. We employ the so-called generalized Newton iterative scheme in order to compute the sign function of a matrix pair. A recent study has allowed considerable reduction (by 75%) in the computational cost of this iteration, making this approach competitive when compared to the traditional QZ algorithm. The matrix sign function is thus revealed as an efficient and reliable spectral division method for applications that only require partial information of the eigenspectrum. For applications which require complete information of the eigendistribution, the matrix sign function can be used as an initial divide-and-conquer method, combined with the QZ algorithm for the last stages. The experimental results on an IBM SP2 multicomputer demonstrate the parallel performance (efficiency around 60-80%) and scalability of this approach.

CiteSeerX

The impact of HPF data layout on the design of efficient and maintainable parallel linear algebra libraries

Author: Bischof C. H.
Huss-Lederman S.
Jacobson E. M.
Sun Xiaobai
Tsao A.
Publication venue: Argonne National Laboratory
Publication date: 01/03/1994

In this document, the authors are concerned with the effects of data layouts for nonsquare processor meshes on the implementation of common dense linear algebra kernels such as matrix-matrix multiplication, LU factorizations, or eigenvalue solvers. In particular, they address ease of programming and tunability of the resulting software. They introduce a generalization of the torus wrap data layout that results in a decoupling of "local" and "global" data layout view. As a result, it allows for intuitive programming of linear algebra algorithms and for tuning of the algorithm for a particular mesh aspect ratio or machine characteristics. This layout is as simple as the proposed HPF layout but, in the authors' opinion, enhances ease of programming as well as ease of performance tuning. They emphasize that they do not advocate that all users need be concerned with these issues. They do, however, believe that for the foreseeable future "assembler coding" (as message-passing code is likely to be viewed from an HPF programmer's perspective) will be needed to deliver high performance for computationally intensive kernels. As a result, they believe that the adoption of this approach not only would accelerate the generation of efficient linear algebra software libraries but also would accelerate the adoption of HPF. They point out, however, that the adoption of this new layout would necessitate that an HPF compiler ensure that data objects are operated on in a consistent fashion across subroutine and function calls.

UNT Digital Library

Using Recursion to Boost ATLAS’s Performance

Author: B. Kagstrom
D.H. Bailey
N.J. Higham
R. Whaley
R.P. Brent
S. Huss-Lederman
V. Strassen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

Crossref

MPI-2: Extending the message-passing interface

Author: Geist A.
Gropp W.
Huss-Lederman S.
Lumsdaine A.
Lusk E.
Saphir W.
Skjellum T.
Snir M.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/10/1996

Crossref

UNT Digital Library

A Unified Approach to Parallel Block-Jacobi Methods for the Symmetric Eigenvalue Problem

Author: C.H. Bischof
D. Giménez
D. Mostafa El
G. Schroff
J. Demmel
L. Auslander
R. Schreiber
R.P. Brent
S. Domas
S. Huss-Lederman
X. Sun
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/1999

Crossref

Wisconsin Wind Tunnel II: A Fast and Portable Parallel Architecture Simulator

Author: Falsafi Babak
Hill Mark D.
Huss-Lederman Steven
Larus James R.
Litzkow Mike
Mukherjee Shubhendu S.
Reinhardt Steven K.
Wood David A.
Publication date: 06/04/2009

The design of future parallel computers requires rapid simulation of target designs running realistic workloads. These simulations have been accelerated using two techniques: direct execution and the use of a parallel host. Historically, these techniques have been considered to have poor portability. This paper identifies and describes the implementation of four key operations necessary to make such simulation portable across a variety of parallel computers. These four operations are: calculation of target execution time, simulation of features of interest, communication of target messages, and synchronization of host processors. Portable implementations of these four operations have allowed us to easily run the Wisconsin Wind Tunnel II (WWT II), a parallel, discrete-event, direct-execution simulator, across a wide range of platforms, such as desktop workstations, a SUN Enterprise server, a cluster of workstations, and a cluster of symmetric multiprocessing nodes. We plan to release WWT II in August, 1997. We also plan to port WWT II to the IBM SP2. We find that for two benchmarks, WWT II demonstrates both good performance and good scalability. Uniprocessor WWT II simulates one target cycle of a 32-node target machine in 114 and 166 host cycles respectively for the two benchmarks on a SUN UltraSPARC. Parallel WWT II achieves speedups between 4.1 and 5.4 on 8 host processors in our three parallel machine configurations.

Infoscience - École polytechnique fédérale de Lausanne

A framework for practical parallel fast matrix multiplication

Author: AMD.
Huss-Lederman S.
Knuth D. E.
Kurzak J.
Lipshitz B.
McCalpin J. D.
Thottethodi M.
Van Zee F. G.
Publication venue: Association for Computing Machinery (ACM)

Crossref