High-Performance Solvers for Dense Hermitian Eigenproblems
We introduce a new collection of solvers - subsequently called EleMRRR - for
large-scale dense Hermitian eigenproblems. EleMRRR solves various types of
problems: generalized, standard, and tridiagonal eigenproblems. Among these,
the last is of particular importance as it is a solver in its own right, as
well as the computational kernel for the first two; we present a fast and
scalable tridiagonal solver based on the Algorithm of Multiple Relatively
Robust Representations - referred to as PMRRR. Like the other EleMRRR solvers,
PMRRR is part of the freely available Elemental library, and is designed to
fully support both message-passing (MPI) and multithreading parallelism (SMP).
As a result, the solvers can equally be used in pure MPI or in hybrid MPI-SMP
fashion. We conducted a thorough performance study of EleMRRR and ScaLAPACK's
solvers on two supercomputers. Such a study, performed with up to 8,192 cores,
provides precise guidelines to assemble the fastest solver within the ScaLAPACK
framework; it also indicates that EleMRRR outperforms even the fastest solvers
built from ScaLAPACK's components.
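The symmetric tridiagonal eigenproblem that this abstract treats as the computational kernel can be illustrated with a much simpler classical technique than MRRR: bisection driven by a Sturm-sequence count. The sketch below is not PMRRR or any Elemental routine. The function names (`count_below`, `gershgorin_interval`, `kth_eigenvalue`, `all_eigenvalues`) and the tolerance are assumptions chosen for illustration, and only eigenvalues are computed.

```python
import math


def count_below(d, e, x):
    """Count eigenvalues smaller than x for the symmetric tridiagonal
    matrix with diagonal d and off-diagonal e (len(e) == len(d) - 1)."""
    count = 0
    q = 1.0
    for i in range(len(d)):
        # Pivot recurrence of the LDL^T factorization of T - x*I.
        q = d[i] - x - (e[i - 1] ** 2 / q if i > 0 else 0.0)
        if q == 0.0:
            q = 1e-30  # nudge an exact zero pivot (illustrative safeguard)
        if q < 0.0:
            count += 1
    return count


def gershgorin_interval(d, e):
    """Return an interval guaranteed to contain every eigenvalue."""
    n = len(d)
    lo, hi = float("inf"), float("-inf")
    for i in range(n):
        r = (abs(e[i - 1]) if i > 0 else 0.0) + (abs(e[i]) if i < n - 1 else 0.0)
        lo = min(lo, d[i] - r)
        hi = max(hi, d[i] + r)
    return lo, hi


def kth_eigenvalue(d, e, k, tol=1e-12):
    """Bisect for the k-th smallest eigenvalue (k is 0-based)."""
    lo, hi = gershgorin_interval(d, e)
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if count_below(d, e, mid) > k:
            hi = mid  # at least k+1 eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def all_eigenvalues(d, e):
    return [kth_eigenvalue(d, e, k) for k in range(len(d))]


if __name__ == "__main__":
    # 1D discrete Laplacian; exact eigenvalues are 2 - 2*cos(j*pi/(n+1)).
    n = 4
    d, e = [2.0] * n, [-1.0] * (n - 1)
    for j, lam in enumerate(all_eigenvalues(d, e), start=1):
        print(lam, 2.0 - 2.0 * math.cos(j * math.pi / (n + 1)))
```

Each eigenvalue is found independently of the others, which is one reason bisection-style counts parallelize well. Production solvers add eigenvector computation and far more careful handling of clustered eigenvalues.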
Fast modal extraction in NASTRAN via the FEER computer program
A new eigensolution routine, FEER (Fast Eigensolution Extraction Routine), used in conjunction with NASTRAN at Israel Aircraft Industries is described. The FEER program is based on an automatic matrix reduction scheme whereby the lower modes of structures with many degrees of freedom can be accurately extracted from a tridiagonal eigenvalue problem whose size is of the same order of magnitude as the number of required modes. The process is effected without arbitrary lumping of masses at selected node points or selection of nodes to be retained in the analysis set. The results of computational efficiency studies are presented, showing major arithmetic operation counts and actual computer run times of FEER as compared to other methods of eigenvalue extraction, including those available in the NASTRAN READ module. It is concluded that the tridiagonal reduction method used in FEER would serve as a valuable addition to NASTRAN for highly increased efficiency in obtaining structural vibration modes.
Minimizing Communication for Eigenproblems and the Singular Value Decomposition
Algorithms have two costs: arithmetic and communication. The latter
represents the cost of moving data, either between levels of a memory
hierarchy, or between processors over a network. Communication often dominates
arithmetic and represents a rapidly increasing proportion of the total cost, so
we seek algorithms that minimize communication. In \cite{BDHS10} lower bounds
were presented on the amount of communication required for essentially all
$O(n^3)$-like algorithms for linear algebra, including eigenvalue problems and
the SVD. Conventional algorithms, including those currently implemented in
(Sca)LAPACK, perform asymptotically more communication than these lower bounds
require. In this paper we present parallel and sequential eigenvalue algorithms
(for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms
that do attain these lower bounds, and analyze their convergence and
communication costs. Comment: 43 pages, 11 figures
Improved Accuracy and Parallelism for MRRR-based Eigensolvers -- A Mixed Precision Approach
The real symmetric tridiagonal eigenproblem is of outstanding importance in
numerical computations; it arises frequently as part of eigensolvers for
standard and generalized dense Hermitian eigenproblems that are based on a
reduction to tridiagonal form. For its solution, the algorithm of Multiple
Relatively Robust Representations (MRRR) is among the fastest methods. Although
fast, the solvers based on MRRR do not deliver the same accuracy as competing
methods like Divide & Conquer or the QR algorithm. In this paper, we
demonstrate that the use of mixed precisions leads to improved accuracy of
MRRR-based eigensolvers with limited or no performance penalty. As a result, we
obtain eigensolvers that are not only equally or more accurate than the best
available methods, but also, in most circumstances, faster and more scalable
than the competition.
ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers
Solving the electronic structure from a generalized or standard eigenproblem
is often the bottleneck in large scale calculations based on Kohn-Sham
density-functional theory. This problem must be addressed by essentially all
current electronic structure codes, based on similar matrix expressions, and by
high-performance computation. We here present a unified software interface,
ELSI, to access different strategies that address the Kohn-Sham eigenvalue
problem. Currently supported algorithms include the dense generalized
eigensolver library ELPA, the orbital minimization method implemented in
libOMM, and the pole expansion and selected inversion (PEXSI) approach with
lower computational complexity for semilocal density functionals. The ELSI
interface aims to simplify the implementation and optimal use of the different
strategies, by offering (a) a unified software framework designed for the
electronic structure solvers in Kohn-Sham density-functional theory; (b)
reasonable default parameters for a chosen solver; (c) automatic conversion
between input and internal working matrix formats, and in the future (d)
recommendation of the optimal solver depending on the specific problem.
Comparative benchmarks are shown for system sizes up to 11,520 atoms (172,800
basis functions) on distributed memory supercomputing architectures. Comment: 55 pages, 14 figures, 2 tables
Conditional quasi-exact solvability of the quantum planar pendulum and of its anti-isospectral hyperbolic counterpart
We have subjected the planar pendulum eigenproblem to a symmetry analysis
with the goal of explaining the relationship between its conditional
quasi-exact solvability (C-QES) and the topology of its eigenenergy surfaces,
established in our earlier work [Frontiers in Physical Chemistry and Chemical
Physics 2, 1-16, (2014)]. The present analysis revealed that this relationship
can be traced to the structure of the tridiagonal matrices representing the
symmetry-adapted pendular Hamiltonian, as well as enabled us to identify many
more -- forty in total to be exact -- analytic solutions. Furthermore, an
analogous analysis of the hyperbolic counterpart of the planar pendulum, the
Razavy problem, which was shown to be also C-QES [American Journal of Physics
48, 285 (1980)], confirmed that it is anti-isospectral with the pendular
eigenproblem. Of key importance for both eigenproblems proved to be the
topological index $\kappa$, as it determines the loci of the intersections
(genuine and avoided) of the eigenenergy surfaces spanned by the dimensionless
interaction parameters $\eta$ and $\zeta$. It also encapsulates the conditions
under which analytic solutions to the two eigenproblems obtain and provides the
number of analytic solutions. At a given $\kappa$, the anti-isospectrality
occurs for single states only (i.e., not for doublets), like C-QES holds solely
for integer values of $\kappa$, and only occurs for the lowest eigenvalues of
the pendular and Razavy Hamiltonians, with the order of the eigenvalues
reversed for the latter. For all other states, the pendular and Razavy spectra
become in fact qualitatively different, as higher pendular states appear as
doublets whereas all higher Razavy states are singlets.
Parallel implementation for large and sparse eigenproblems
This paper analyses and evaluates the computational aspects of an efficient parallel implementation for the eigenproblem. This parallel implementation makes it possible to solve the eigenproblem for very large, sparse, symmetric matrices. Mathematically, the algorithm is supported by the Lanczos and Divide and Conquer methods. The Lanczos method transforms the eigenproblem of a symmetric matrix into an eigenproblem of a tridiagonal matrix, which is easier to solve. The Divide and Conquer method provides the solution for the eigenproblem of a large tridiagonal matrix by decomposing it into a set of smaller subproblems. The method has been implemented for a distributed memory multiprocessor system with the PVM parallel interface. A Cray T3E system with up to 32 nodes has been used to evaluate the performance of our parallel implementation. Because superlinear speed-up values were obtained for all the studied matrices, a detailed analysis of the experimental results is carried out. It will be shown that the management of the memory hierarchy plays an important role in the performance of the parallel implementation.
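The Lanczos step described in this abstract, which reduces a symmetric matrix to tridiagonal form, can be sketched in a few lines. This is a minimal serial illustration, not the paper's PVM implementation. It stores the matrix densely and reorthogonalizes against every previous basis vector, which is affordable only for small examples. The function names and the breakdown threshold are assumptions made for this sketch.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def lanczos(A, v0, m, breakdown_tol=1e-12):
    """Run up to m Lanczos steps on the symmetric matrix A from start vector v0.

    Returns (alpha, beta): the diagonal and off-diagonal of the tridiagonal
    matrix T = Q^T A Q, where the columns of Q are the Lanczos vectors.
    """
    norm = math.sqrt(dot(v0, v0))
    basis = [[x / norm for x in v0]]
    alpha, beta = [], []
    for j in range(m):
        w = matvec(A, basis[j])
        alpha.append(dot(w, basis[j]))
        # Full reorthogonalization, applied twice for numerical safety.
        # It subsumes the usual three-term recurrence.
        for _ in range(2):
            for u in basis:
                c = dot(w, u)
                w = [wi - c * ui for wi, ui in zip(w, u)]
        b = math.sqrt(dot(w, w))
        if j == m - 1 or b < breakdown_tol:
            break  # done, or the Krylov subspace is exhausted
        beta.append(b)
        basis.append([wi / b for wi in w])
    return alpha, beta


if __name__ == "__main__":
    A = [
        [4.0, 1.0, 2.0, 0.0],
        [1.0, 3.0, 0.0, 1.0],
        [2.0, 0.0, 5.0, 1.0],
        [0.0, 1.0, 1.0, 2.0],
    ]
    alpha, beta = lanczos(A, [1.0, 0.0, 0.0, 0.0], m=4)
    print("diagonal:", alpha)
    print("off-diagonal:", beta)
```

The resulting `alpha` and `beta` define the tridiagonal eigenproblem that a method such as Divide and Conquer would then solve. For large sparse matrices, only the `matvec` routine needs to know the matrix structure.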