149 research outputs found

    Minimizing Communication for Eigenproblems and the Singular Value Decomposition

    Full text link
    Algorithms have two costs: arithmetic and communication. The latter represents the cost of moving data, either between levels of a memory hierarchy, or between processors over a network. Communication often dominates arithmetic and represents a rapidly increasing proportion of the total cost, so we seek algorithms that minimize communication. In \cite{BDHS10} lower bounds were presented on the amount of communication required for essentially all O(n3)O(n^3)-like algorithms for linear algebra, including eigenvalue problems and the SVD. Conventional algorithms, including those currently implemented in (Sca)LAPACK, perform asymptotically more communication than these lower bounds require. In this paper we present parallel and sequential eigenvalue algorithms (for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms that do attain these lower bounds, and analyze their convergence and communication costs.Comment: 43 pages, 11 figure

    Minimizing Communication in Linear Algebra

    Full text link
    In 1981 Hong and Kung proved a lower bound on the amount of communication needed to perform dense, matrix-multiplication using the conventional O(n3)O(n^3) algorithm, where the input matrices were too large to fit in the small, fast memory. In 2004 Irony, Toledo and Tiskin gave a new proof of this result and extended it to the parallel case. In both cases the lower bound may be expressed as Ω\Omega(#arithmetic operations / M\sqrt{M}), where M is the size of the fast memory (or local memory in the parallel case). Here we generalize these results to a much wider variety of algorithms, including LU factorization, Cholesky factorization, LDLTLDL^T factorization, QR factorization, algorithms for eigenvalues and singular values, i.e., essentially all direct methods of linear algebra. The proof works for dense or sparse matrices, and for sequential or parallel algorithms. In addition to lower bounds on the amount of data moved (bandwidth) we get lower bounds on the number of messages required to move it (latency). We illustrate how to extend our lower bound technique to compositions of linear algebra operations (like computing powers of a matrix), to decide whether it is enough to call a sequence of simpler optimal algorithms (like matrix multiplication) to minimize communication, or if we can do better. We give examples of both. We also show how to extend our lower bounds to certain graph theoretic problems. We point out recently designed algorithms for dense LU, Cholesky, QR, eigenvalue and the SVD problems that attain these lower bounds; implementations of LU and QR show large speedups over conventional linear algebra algorithms in standard libraries like LAPACK and ScaLAPACK. Many open problems remain.Comment: 27 pages, 2 table

    Parallel Krylov Solvers for the Polynomial Eigenvalue Problem in SLEPc

    Full text link
    Polynomial eigenvalue problems are often found in scientific computing applications. When the coefficient matrices of the polynomial are large and sparse, usually only a few eigenpairs are required and projection methods are the best choice. We focus on Krylov methods that operate on the companion linearization of the polynomial but exploit the block structure with the aim of being memory-efficient in the representation of the Krylov subspace basis. The problem may appear in the form of a low-degree polynomial (quartic or quintic, say) expressed in the monomial basis, or a high-degree polynomial (coming from interpolation of a nonlinear eigenproblem) expressed in a nonmonomial basis. We have implemented a parallel solver in SLEPc covering both cases that is able to compute exterior as well as interior eigenvalues via spectral transformation. We discuss important issues such as scaling and restart and illustrate the robustness and performance of the solver with some numerical experiments.The first author was supported by the Spanish Ministry of Education, Culture and Sport through an FPU grant with reference AP2012-0608.Campos, C.; Román Moltó, JE. (2016). Parallel Krylov Solvers for the Polynomial Eigenvalue Problem in SLEPc. SIAM Journal on Scientific Computing. 38(5):385-411. https://doi.org/10.1137/15M1022458S38541138

    ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers

    Full text link
    Solving the electronic structure from a generalized or standard eigenproblem is often the bottleneck in large scale calculations based on Kohn-Sham density-functional theory. This problem must be addressed by essentially all current electronic structure codes, based on similar matrix expressions, and by high-performance computation. We here present a unified software interface, ELSI, to access different strategies that address the Kohn-Sham eigenvalue problem. Currently supported algorithms include the dense generalized eigensolver library ELPA, the orbital minimization method implemented in libOMM, and the pole expansion and selected inversion (PEXSI) approach with lower computational complexity for semilocal density functionals. The ELSI interface aims to simplify the implementation and optimal use of the different strategies, by offering (a) a unified software framework designed for the electronic structure solvers in Kohn-Sham density-functional theory; (b) reasonable default parameters for a chosen solver; (c) automatic conversion between input and internal working matrix formats, and in the future (d) recommendation of the optimal solver depending on the specific problem. Comparative benchmarks are shown for system sizes up to 11,520 atoms (172,800 basis functions) on distributed memory supercomputing architectures.Comment: 55 pages, 14 figures, 2 table

    MRRR-based Eigensolvers for Multi-core Processors and Supercomputers

    Get PDF
    The real symmetric tridiagonal eigenproblem is of outstanding importance in numerical computations; it arises frequently as part of eigensolvers for standard and generalized dense Hermitian eigenproblems that are based on a reduction to tridiagonal form. For its solution, the algorithm of Multiple Relatively Robust Representations (MRRR or MR3 in short) - introduced in the late 1990s - is among the fastest methods. To compute k eigenpairs of a real n-by-n tridiagonal T, MRRR only requires O(kn) arithmetic operations; in contrast, all the other practical methods require O(k^2 n) or O(n^3) operations in the worst case. This thesis centers around the performance and accuracy of MRRR.Comment: PhD thesi

    Waveform Design for Secure SISO Transmissions and Multicasting

    Full text link
    Wireless physical-layer security is an emerging field of research aiming at preventing eavesdropping in an open wireless medium. In this paper, we propose a novel waveform design approach to minimize the likelihood that a message transmitted between trusted single-antenna nodes is intercepted by an eavesdropper. In particular, with knowledge first of the eavesdropper's channel state information (CSI), we find the optimum waveform and transmit energy that minimize the signal-to-interference-plus-noise ratio (SINR) at the output of the eavesdropper's maximum-SINR linear filter, while at the same time provide the intended receiver with a required pre-specified SINR at the output of its own max-SINR filter. Next, if prior knowledge of the eavesdropper's CSI is unavailable, we design a waveform that maximizes the amount of energy available for generating disturbance to eavesdroppers, termed artificial noise (AN), while the SINR of the intended receiver is maintained at the pre-specified level. The extensions of the secure waveform design problem to multiple intended receivers are also investigated and semidefinite relaxation (SDR) -an approximation technique based on convex optimization- is utilized to solve the arising NP-hard design problems. Extensive simulation studies confirm our analytical performance predictions and illustrate the benefits of the designed waveforms on securing single-input single-output (SISO) transmissions and multicasting
    corecore