646 research outputs found
Fast linear algebra is stable
In an earlier paper, we showed that a large class of fast recursive matrix
multiplication algorithms is stable in a normwise sense, and that in fact if
multiplication of -by- matrices can be done by any algorithm in
operations for any , then it can be done
stably in operations for any . Here we extend
this result to show that essentially all standard linear algebra operations,
including LU decomposition, QR decomposition, linear equation solving, matrix
inversion, solving least squares problems, (generalized) eigenvalue problems
and the singular value decomposition can also be done stably (in a normwise
sense) in operations.Comment: 26 pages; final version; to appear in Numerische Mathemati
Studies in Rheology: Molecular Simulation and Theory
With an enormous advance in the capability of computers during the last fewdecades, the computer simulation has become an important tool for scientific researches in many areas such as physics, chemistry, biology, and so on. In particular, moleculardynamics (MD) simulations have been proven to be of a great help in understanding the rheology of complex fluids from the fundamental microscopic viewpoint. There are two important standard flows in rheology: shear flow and elongational flow. While there exist suitable nonequilibrium MD (NEMD) algorithms of shear flows, such as the Lees-Edwards purely boundary-driven algorithm and the so-called SLLOD algorithm as a field-driven algorithm, a proper NEMD algorithm for elongational flow has been lacking. The main difficulty of simulating elongational flow lies in the limited simulation time available due to the contraction of one or two dimensions dictated by itskinematics. This problem, however, has been partially resolved by Kraynik and Reinelt’s ingenious discovery of the temporal and spatial periodicity of lattice vectors in planar elongational flow (PEF). Although there have been a few NEMD simulations of PEF using their idea, another serious defect has recently been reported when using the SLLOD algorithm in PEF: for adiabatic systems, the total linear momentum of the system in the contracting direction grows exponentially with time, which eventually leads to an aphysical phase transition.This problem has been completely resolved by using the so-called ‘proper-SLLOD’ or ‘p-SLLOD’ algorithm, whose development has been one of the mainaccomplishments of this study. The fundamental correctness of the p-SLLOD algorithm has been demonstrated quite thoroughly in this work through detailed theoretical analyses together with direct simulation results. Both theoretical and simulation works achieved in this research are expected to play a significant role in advancing the knowledge of rheology, as well as that of NEMD simulation itself for other types of flow in general. Another important achievement in this work is the demonstration of the possibility of predicting a liquid structure in nonequilibrium states by employing a concept of ‘hypothetical’ nonequilibrium potentials. The methodology developed in this work has been shown to have good potential for further developments in this field
Minimizing Communication for Eigenproblems and the Singular Value Decomposition
Algorithms have two costs: arithmetic and communication. The latter
represents the cost of moving data, either between levels of a memory
hierarchy, or between processors over a network. Communication often dominates
arithmetic and represents a rapidly increasing proportion of the total cost, so
we seek algorithms that minimize communication. In \cite{BDHS10} lower bounds
were presented on the amount of communication required for essentially all
-like algorithms for linear algebra, including eigenvalue problems and
the SVD. Conventional algorithms, including those currently implemented in
(Sca)LAPACK, perform asymptotically more communication than these lower bounds
require. In this paper we present parallel and sequential eigenvalue algorithms
(for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms
that do attain these lower bounds, and analyze their convergence and
communication costs.Comment: 43 pages, 11 figure
Computing the k-th Eigenvalue of Symmetric -Matrices
The numerical solution of eigenvalue problems is essential in various
application areas of scientific and engineering domains. In many problem
classes, the practical interest is only a small subset of eigenvalues so it is
unnecessary to compute all of the eigenvalues. Notable examples are the
electronic structure problems where the -th smallest eigenvalue is closely
related to the electronic properties of materials. In this paper, we consider
the -th eigenvalue problems of symmetric dense matrices with low-rank
off-diagonal blocks. We present a linear time generalized LDL decomposition of
matrices and combine it with the bisection eigenvalue algorithm
to compute the -th eigenvalue with controllable accuracy. In addition, if
more than one eigenvalue is required, some of the previous computations can be
reused to compute the other eigenvalues in parallel. Numerical experiments show
that our method is more efficient than the state-of-the-art dense eigenvalue
solver in LAPACK/ScaLAPACK and ELPA. Furthermore, tests on electronic state
calculations of carbon nanomaterials demonstrate that our method outperforms
the existing HSS-based bisection eigenvalue algorithm on 3D problems.Comment: 14 pages, 11 figure
MRRR-based Eigensolvers for Multi-core Processors and Supercomputers
The real symmetric tridiagonal eigenproblem is of outstanding importance in
numerical computations; it arises frequently as part of eigensolvers for
standard and generalized dense Hermitian eigenproblems that are based on a
reduction to tridiagonal form. For its solution, the algorithm of Multiple
Relatively Robust Representations (MRRR or MR3 in short) - introduced in the
late 1990s - is among the fastest methods. To compute k eigenpairs of a real
n-by-n tridiagonal T, MRRR only requires O(kn) arithmetic operations; in
contrast, all the other practical methods require O(k^2 n) or O(n^3) operations
in the worst case. This thesis centers around the performance and accuracy of
MRRR.Comment: PhD thesi
Mixed-Precision Numerical Linear Algebra Algorithms: Integer Arithmetic Based LU Factorization and Iterative Refinement for Hermitian Eigenvalue Problem
Mixed-precision algorithms are a class of algorithms that uses low precision in part of the algorithm in order to save time and energy with less accurate computation and communication. These algorithms usually utilize iterative refinement processes to improve the approximate solution obtained from low precision to the accuracy we desire from doing all the computation in high precision. Due to the demand of deep learning applications, there are hardware developments offering different low-precision formats including half precision (FP16), Bfloat16 and integer operations for quantized integers, which uses integers with a shared scalar to represent a set of equally spaced numbers. As new hardware architectures focus on bringing performance in these formats, the mixed-precision algorithms have more potential leverage on them and outmatch traditional fixed-precision algorithms. This dissertation consists of two articles. In the first article, we adapt one of the most fundamental algorithms in numerical linear algebra---LU factorization with partial pivoting--- to use integer arithmetic. With the goal of obtaining a low accuracy factorization as the preconditioner of generalized minimal residual (GMRES) to solve systems of linear equations, the LU factorization is adapted to use two different fixed-point formats for matrices L and U. A left-looking variant is also proposed for matrices with unbounded column growth. Finally, GMRES iterative refinement has shown that it can work on matrices with condition numbers up to 10000 with the algorithm that uses int16 as input and int32 accumulator for the update step. The second article targets symmetric and Hermitian eigenvalue problems. In this section we revisit the SICE algorithm from Dongarra et al. By applying the Sherman-Morrison formula on the diagonally-shifted tridiagonal systems, we propose an updated SICE-SM algorithm. By incorporating the latest two-stage algorithms from the PLASMA and MAGMA software libraries for numerical linear algebra, we achieved up to 3.6x speedup using the mixed-precision eigensolver with the blocked SICE-SM algorithm for iterative refinement when compared with full double complex precision solvers for the cases with a portion of eigenvalues and eigenvectors requested
Efficient Algorithms for Solving Structured Eigenvalue Problems Arising in the Description of Electronic Excitations
Matrices arising in linear-response time-dependent density functional theory and many-body perturbation theory, in particular in the Bethe-Salpeter approach, show a 2 × 2 block structure. The motivation to devise new algorithms, instead of using general purpose eigenvalue solvers, comes from the need to solve large problems on high performance computers. This requires parallelizable and communication-avoiding algorithms and implementations. We point out various novel directions for diagonalizing structured matrices. These include the solution of skew-symmetric eigenvalue problems in ELPA, as well as structure preserving spectral divide-and-conquer schemes employing generalized polar decompostions
Optimising Matrix Product State Simulations of Shor's Algorithm
We detail techniques to optimise high-level classical simulations of Shor's
quantum factoring algorithm. Chief among these is to examine the entangling
properties of the circuit and to effectively map it across the one-dimensional
structure of a matrix product state. Compared to previous approaches whose
space requirements depend on , the solution to the underlying order-finding
problem of Shor's algorithm, our approach depends on its factors. We performed
a matrix product state simulation of a 60-qubit instance of Shor's algorithm
that would otherwise be infeasible to complete without an optimised
entanglement mapping.Comment: 8 pages, 2 figures, 2 tables. v2 using PDFLaTeX compiler. v3 to
include extra references. v4 for publication in Quantu
- …