8,683 research outputs found
Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion
We describe a new data format for storing triangular, symmetric, and
Hermitian matrices called RFPF (Rectangular Full Packed Format). The standard
two dimensional arrays of Fortran and C (also known as full format) that are
used to represent triangular and symmetric matrices waste nearly half of the
storage space but provide high performance via the use of Level 3 BLAS.
Standard packed format arrays fully utilize storage (array space) but provide
low performance as there is no Level 3 packed BLAS. We combine the good
features of packed and full storage using RFPF to obtain high performance via
using Level 3 BLAS as RFPF is a standard full format representation. Also, RFPF
requires exactly the same minimal storage as packed format. Each LAPACK full
and/or packed triangular, symmetric, and Hermitian routine becomes a single new
RFPF routine based on eight possible data layouts of RFPF. This new RFPF
routine usually consists of two calls to the corresponding LAPACK full format
routine and two calls to Level 3 BLAS routines. This means {\it no} new
software is required. As examples, we present LAPACK routines for Cholesky
factorization, Cholesky solution and Cholesky inverse computation in RFPF to
illustrate this new work and to describe its performance on several commonly
used computer platforms. Performance of LAPACK full routines using RFPF versus
LAPACK full routines using standard format for both serial and SMP parallel
processing is about the same while using half the storage. Performance gains
are roughly one to a factor of 43 for serial and one to a factor of 97 for SMP
parallel times faster using vendor LAPACK full routines with RFPF than with
using vendor and/or reference packed routines
Efficient numerical diagonalization of hermitian 3x3 matrices
A very common problem in science is the numerical diagonalization of
symmetric or hermitian 3x3 matrices. Since standard "black box" packages may be
too inefficient if the number of matrices is large, we study several
alternatives. We consider optimized implementations of the Jacobi, QL, and
Cuppen algorithms and compare them with an analytical method relying on
Cardano's formula for the eigenvalues and on vector cross products for the
eigenvectors. Jacobi is the most accurate, but also the slowest method, while
QL and Cuppen are good general purpose algorithms. The analytical algorithm
outperforms the others by more than a factor of 2, but becomes inaccurate or
may even fail completely if the matrix entries differ greatly in magnitude.
This can mostly be circumvented by using a hybrid method, which falls back to
QL if conditions are such that the analytical calculation might become too
inaccurate. For all algorithms, we give an overview of the underlying
mathematical ideas, and present detailed benchmark results. C and Fortran
implementations of our code are available for download from
http://www.mpi-hd.mpg.de/~globes/3x3/ .Comment: 13 pages, no figures, new hybrid algorithm added, matches published
version, typo in Eq. (39) corrected; software library available at
http://www.mpi-hd.mpg.de/~globes/3x3
- …