Minimizing Communication for Eigenproblems and the Singular Value Decomposition
Algorithms have two costs: arithmetic and communication. The latter
represents the cost of moving data, either between levels of a memory
hierarchy, or between processors over a network. Communication often dominates
arithmetic and represents a rapidly increasing proportion of the total cost, so
we seek algorithms that minimize communication. In \cite{BDHS10} lower bounds
were presented on the amount of communication required for essentially all
O(n^3)-like algorithms for linear algebra, including eigenvalue problems and
the SVD. Conventional algorithms, including those currently implemented in
(Sca)LAPACK, perform asymptotically more communication than these lower bounds
require. In this paper we present parallel and sequential eigenvalue algorithms
(for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms
that do attain these lower bounds, and analyze their convergence and
communication costs. Comment: 43 pages, 11 figures.
Minimizing Communication in Linear Algebra
In 1981 Hong and Kung proved a lower bound on the amount of communication
needed to perform dense, n-by-n matrix multiplication using the conventional O(n^3)
algorithm, where the input matrices were too large to fit in the small, fast
memory. In 2004 Irony, Toledo and Tiskin gave a new proof of this result and
extended it to the parallel case. In both cases the lower bound may be
expressed as Omega(#arithmetic operations / sqrt(M)), where M is the size
of the fast memory (or local memory in the parallel case). Here we generalize
these results to a much wider variety of algorithms, including LU
factorization, Cholesky factorization, LDL^T factorization, QR factorization,
algorithms for eigenvalues and singular values, i.e., essentially all direct
methods of linear algebra. The proof works for dense or sparse matrices, and
for sequential or parallel algorithms. In addition to lower bounds on the
amount of data moved (bandwidth) we get lower bounds on the number of messages
required to move it (latency). We illustrate how to extend our lower bound
technique to compositions of linear algebra operations (like computing powers
of a matrix), to decide whether it is enough to call a sequence of simpler
optimal algorithms (like matrix multiplication) to minimize communication, or
if we can do better. We give examples of both. We also show how to extend our
lower bounds to certain graph theoretic problems.
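The bandwidth bound Omega(#arithmetic operations / sqrt(M)) can be made concrete with a back-of-the-envelope calculation; the sketch below (my illustration, not from the paper, with hypothetical n and M) compares the bound for conventional n-by-n matrix multiplication against the roughly n^3 words an unblocked algorithm moves:

```python
# Illustrative calculation of the bandwidth lower bound
# Omega(#flops / sqrt(M)) for n-by-n matrix multiplication,
# versus the traffic of an unblocked (non-tiled) algorithm.
import math

def bandwidth_lower_bound(n_flops: float, fast_mem_words: float) -> float:
    """Words that must move between slow and fast memory, up to a constant."""
    return n_flops / math.sqrt(fast_mem_words)

n = 4096           # matrix dimension (hypothetical)
M = 2**20          # fast-memory size in words (hypothetical, ~1M words)
flops = 2 * n**3   # multiply-adds of the conventional algorithm

bound = bandwidth_lower_bound(flops, M)
naive = n**3       # roughly one operand load per inner-product step, unblocked
print(f"lower bound ~ {bound:.3e} words, unblocked ~ {naive:.3e} words")
```

Blocked (tiled) matrix multiplication attains the bound to within a constant factor, which is why the gap between the two numbers above is what communication-avoiding algorithms close.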
We point out recently designed algorithms for dense LU, Cholesky, QR,
eigenvalue and the SVD problems that attain these lower bounds; implementations
of LU and QR show large speedups over conventional linear algebra algorithms in
standard libraries like LAPACK and ScaLAPACK. Many open problems remain. Comment: 27 pages, 2 tables.
Parallel Krylov Solvers for the Polynomial Eigenvalue Problem in SLEPc
Polynomial eigenvalue problems are often found in scientific computing applications. When the coefficient matrices of the polynomial are large and sparse, usually only a few eigenpairs are required and projection methods are the best choice. We focus on Krylov methods that operate on the companion linearization of the polynomial but exploit the block structure with the aim of being memory-efficient in the representation of the Krylov subspace basis. The problem may appear in the form of a low-degree polynomial (quartic or quintic, say) expressed in the monomial basis, or a high-degree polynomial (coming from interpolation of a nonlinear eigenproblem) expressed in a nonmonomial basis. We have implemented a parallel solver in SLEPc covering both cases that is able to compute exterior as well as interior eigenvalues via spectral transformation. We discuss important issues such as scaling and restart, and illustrate the robustness and performance of the solver with some numerical experiments.
The first author was supported by the Spanish Ministry of Education, Culture and Sport through an FPU grant with reference AP2012-0608.
Campos, C.; Román Moltó, J. E. (2016). Parallel Krylov Solvers for the Polynomial Eigenvalue Problem in SLEPc. SIAM Journal on Scientific Computing, 38(5), 385-411. https://doi.org/10.1137/15M1022458
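The companion linearization the abstract refers to can be sketched for the quadratic case (lam^2 M + lam C + K) x = 0: the polynomial problem becomes a linear pencil A z = lam B z of twice the size. SLEPc exploits this block structure implicitly for large sparse matrices; the small dense NumPy example below (random stand-in matrices, my illustration) just shows the pencil structure itself:

```python
# Sketch: first companion linearization of a quadratic eigenproblem
# (lam^2 M + lam C + K) x = 0, solved densely via the pencil (A, B)
# with z = [x; lam x]. Tiny random example for illustration only.
import numpy as np
from scipy.linalg import eig

rng = np.random.default_rng(0)
n = 4
K = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))
M = np.eye(n) + 0.1 * rng.standard_normal((n, n))

Z, I = np.zeros((n, n)), np.eye(n)
A = np.block([[Z, I], [-K, -C]])   # pencil left-hand side
B = np.block([[I, Z], [Z, M]])     # pencil right-hand side

lams, Zvecs = eig(A, B)
# Check one recovered eigenpair against the original polynomial
lam, x = lams[0], Zvecs[:n, 0]
res = np.linalg.norm((lam**2 * M + lam * C + K) @ x) / np.linalg.norm(x)
print(f"relative residual: {res:.2e}")
```

The memory-efficiency issue the abstract raises comes precisely from this doubling (or d-folding, for degree d): a Krylov basis for the linearization naively costs d times the storage of a basis for the original problem unless the block structure is exploited.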
ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers
Solving the electronic structure from a generalized or standard eigenproblem
is often the bottleneck in large scale calculations based on Kohn-Sham
density-functional theory. This problem must be addressed by essentially all
current electronic structure codes, based on similar matrix expressions, and by
high-performance computation. We here present a unified software interface,
ELSI, to access different strategies that address the Kohn-Sham eigenvalue
problem. Currently supported algorithms include the dense generalized
eigensolver library ELPA, the orbital minimization method implemented in
libOMM, and the pole expansion and selected inversion (PEXSI) approach with
lower computational complexity for semilocal density functionals. The ELSI
interface aims to simplify the implementation and optimal use of the different
strategies, by offering (a) a unified software framework designed for the
electronic structure solvers in Kohn-Sham density-functional theory; (b)
reasonable default parameters for a chosen solver; (c) automatic conversion
between input and internal working matrix formats, and in the future (d)
recommendation of the optimal solver depending on the specific problem.
Comparative benchmarks are shown for system sizes up to 11,520 atoms (172,800
basis functions) on distributed memory supercomputing architectures. Comment: 55 pages, 14 figures, 2 tables.
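The algebraic problem behind this abstract is the generalized eigenproblem H c = eps S c, with Hamiltonian matrix H and overlap matrix S. ELSI's supported solvers (ELPA, libOMM, PEXSI) target it at distributed-memory scale; the dense sketch below (random stand-in matrices, not ELSI's actual API) shows the same problem at toy scale:

```python
# Sketch of a Kohn-Sham-type generalized eigenproblem H c = eps S c,
# solved densely. H and S here are random stand-ins: H symmetric,
# S symmetric positive definite, as an overlap matrix must be.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
n = 50
H = rng.standard_normal((n, n))
H = (H + H.T) / 2                      # symmetric "Hamiltonian"
A = rng.standard_normal((n, n))
S = A @ A.T + n * np.eye(n)            # SPD "overlap" matrix

eps, C = eigh(H, S)                    # eigenvalues ascending, vectors in columns
# Generalized orthonormality of the eigenvectors: C^T S C = I
assert np.allclose(C.T @ S @ C, np.eye(n), atol=1e-8)
print(f"lowest eigenvalue: {eps[0]:.4f}")
```

For a dense solver this costs O(n^3) in the basis size, which is why the abstract highlights PEXSI's lower-complexity alternative for large systems.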
MRRR-based Eigensolvers for Multi-core Processors and Supercomputers
The real symmetric tridiagonal eigenproblem is of outstanding importance in
numerical computations; it arises frequently as part of eigensolvers for
standard and generalized dense Hermitian eigenproblems that are based on a
reduction to tridiagonal form. For its solution, the algorithm of Multiple
Relatively Robust Representations (MRRR or MR3 for short) - introduced in the
late 1990s - is among the fastest methods. To compute k eigenpairs of a real
n-by-n tridiagonal T, MRRR only requires O(kn) arithmetic operations; in
contrast, all the other practical methods require O(k^2 n) or O(n^3) operations
in the worst case. This thesis centers around the performance and accuracy of
MRRR. Comment: PhD thesis.
Waveform Design for Secure SISO Transmissions and Multicasting
Wireless physical-layer security is an emerging field of research aiming at
preventing eavesdropping in an open wireless medium. In this paper, we propose
a novel waveform design approach to minimize the likelihood that a message
transmitted between trusted single-antenna nodes is intercepted by an
eavesdropper. In particular, with knowledge first of the eavesdropper's channel
state information (CSI), we find the optimum waveform and transmit energy that
minimize the signal-to-interference-plus-noise ratio (SINR) at the output of
the eavesdropper's maximum-SINR linear filter, while at the same time provide
the intended receiver with a required pre-specified SINR at the output of its
own max-SINR filter. Next, if prior knowledge of the eavesdropper's CSI is
unavailable, we design a waveform that maximizes the amount of energy available
for generating disturbance to eavesdroppers, termed artificial noise (AN),
while the SINR of the intended receiver is maintained at the pre-specified
level. The extensions of the secure waveform design problem to multiple
intended receivers are also investigated and semidefinite relaxation (SDR) -an
approximation technique based on convex optimization- is utilized to solve the
arising NP-hard design problems. Extensive simulation studies confirm our
analytical performance predictions and illustrate the benefits of the designed
waveforms on securing single-input single-output (SISO) transmissions and
multicasting.
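The max-SINR linear filter appearing throughout the abstract has the closed form w proportional to R^{-1} h, where h is the effective channel signature and R the interference-plus-noise covariance, and its output SINR is E h^H R^{-1} h. The sketch below (synthetic channel and covariance, my illustration rather than the paper's setup) shows the filter and checks its optimality against an arbitrary competitor:

```python
# Sketch: max-SINR (MVDR-type) linear receive filter w = R^{-1} h with
# output SINR E * h^H R^{-1} h; the secure-waveform designs optimize the
# eavesdropper's and receiver's versions of this quantity.
# Channel h and covariance R below are synthetic.
import numpy as np

rng = np.random.default_rng(3)
L = 16                                                       # processing gain (illustrative)
h = rng.standard_normal(L) + 1j * rng.standard_normal(L)     # effective signature
N = rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L))
R = N @ N.conj().T + np.eye(L)        # interference-plus-noise covariance (HPD)
E = 2.0                               # transmit energy

w = np.linalg.solve(R, h)             # max-SINR filter (up to scaling)
sinr = E * np.real(h.conj() @ w)      # = E * h^H R^{-1} h, the optimal SINR

# Any other linear filter achieves no higher output SINR:
w2 = rng.standard_normal(L) + 1j * rng.standard_normal(L)
sinr2 = E * np.abs(w2.conj() @ h) ** 2 / np.real(w2.conj() @ R @ w2)
assert sinr2 <= sinr + 1e-9
print(f"optimal SINR: {sinr:.3f}")
```

The designs in the paper work on the transmit side: they shape the waveform so that this quantity is small at the eavesdropper while meeting a pre-specified value at the intended receiver.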
Two Geometric Results regarding Hölder-Brascamp-Lieb Inequalities, and Two Novel Algorithms for Low-Rank Approximation
Broadly speaking, this thesis investigates mathematical questions motivated by computer science. The topics involved include communication-avoiding algorithms, classical analysis, convex geometry, and low-rank matrix approximation. In total, the thesis consists of four self-contained sections, each adapted from papers the author has been a part of.
The first two sections are both motivated by the Brascamp-Lieb inequalities, which are also often referred to as Hölder-Brascamp-Lieb inequalities. These inequalities have featured prominently in recent theoretical computer science work, due to connections to geometric complexity theory, harmonic analysis, communication avoidance, and many other areas. Moreover, work generalizing the inequalities in various ways, such as to nonlinear versions, has been impactful in the study of differential equations.
Section 1 studies the application of Hölder-Brascamp-Lieb (HBL) inequalities to the design of communication-optimal algorithms. In particular, it describes optimal tiling (blocking) strategies for nested loops that lack data dependencies and exhibit affine memory access patterns. The problem roughly amounts to maximizing the volume of an object given that some of its linear images have bounded volume. The methods used are algorithmic.
Another reason for the interest in these inequalities is that they are an interesting test case for non-convex optimization techniques. The optimal constant for a particular instance of the inequality is given by solving a non-convex optimization problem that is nonetheless highly structured. Of particular relevance to this thesis is that it can be formulated as a geodesically convex problem, considered in the context of the manifold of positive definite matrices of determinant 1.
Even using the methods of Section 1, the procedure is not necessarily polynomial time, and this motivates further study of geodesic convexity. This led to the work of Section 2, which discusses a notion of halfspace for Hadamard manifolds that is natural in the context of convex optimization. For this notion of halfspace, we generalize a classic result of Grünbaum, which is itself a corollary of Helly's theorem: given a probability distribution on the manifold, there is a point at which every halfspace based at that point has at least 1/(n+1) of the mass, n being the dimension of the manifold. As an application, the gradient oracle complexity of geodesically convex optimization is polynomial in the parameters defining the problem; in particular, it is polynomial in -log(epsilon), where epsilon is the desired error. This is a step toward the open question of whether such an algorithm exists.
The remaining two sections of the thesis present a different research direction: randomized numerical linear algebra. Numerical linear algebra has long been an important part of scientific computing. Due to the current trend of increasing matrix sizes and the growing importance of fast, approximate solutions in industry, randomized methods are quickly increasing in popularity. Sections 3 and 4 of this thesis aim to show that randomized low-rank approximation algorithms satisfy many of the properties of classical rank-revealing factorizations.
Section 3 introduces a Generalized Randomized QR-decomposition (RURV) that may be applied to arbitrary products of matrices and their inverses, without needing to explicitly compute the products or inverses. This factorization is a critical part of a communication-optimal spectral divide-and-conquer algorithm for the nonsymmetric eigenvalue problem. We establish that this randomized QR factorization satisfies the strong rank-revealing properties, and we formally prove its stability, making it suitable for applications.
Finally, we present numerical experiments which demonstrate that our theoretical bounds capture the empirical behavior of the factorization.
Section 4 concerns a Generalized LU-Factorization (GLU) for low-rank matrix approximation. We relate it to past approaches and extensively analyze its approximation properties. The established deterministic guarantees are combined with sketching ensembles satisfying Johnson-Lindenstrauss properties to give complete bounds. Particularly good performance is shown for the subsampled randomized Hadamard transform (SRHT) ensemble. Moreover, the factorization is shown to unify and generalize many past algorithms, and it helps to explain the effect of sketching on the growth factor during Gaussian elimination.
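The basic RURV construction behind Section 3 can be sketched densely: draw a random orthogonal V (via QR of a Gaussian matrix), then QR-factor A V^T, which yields A = U R V with U, V orthogonal and R upper triangular. The minimal NumPy illustration below covers only the real, single-matrix case, not the inverse-free product form the thesis actually treats:

```python
# Minimal sketch of RURV (randomized URV): A = U R V with U, V orthogonal,
# R upper triangular, V random orthogonal. Real dense single-matrix case;
# the thesis version also handles products of matrices and inverses
# without forming them explicitly.
import numpy as np

rng = np.random.default_rng(4)
n = 30
A = rng.standard_normal((n, n))

G = rng.standard_normal((n, n))
V, _ = np.linalg.qr(G)            # random orthogonal factor (Haar up to signs)
U, R = np.linalg.qr(A @ V.T)      # QR of A V^T

# Exact reconstruction: U R V = A V^T V = A
assert np.allclose(U @ R @ V, A, atol=1e-10)
print(np.sort(np.abs(np.diag(R)))[::-1][:3])   # |diag(R)|, largest first
```

The rank-revealing claim of Section 3 is that, with high probability, the leading diagonal blocks of R track the singular values of A, which is what makes the factorization usable inside spectral divide-and-conquer.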