6,279 research outputs found
A nearly-linear computational-cost scheme for the forward dynamics of an N-body pendulum
The dynamic equations of motion of an n-body pendulum with spherical joints are derived to be a mixed system of differential and algebraic equations (DAE's). The DAE's are kept in implicit form to save arithmetic and preserve the sparsity of the system and are solved by the robust implicit integration method. At each solution point, the predicted solution is corrected to its exact solution within given tolerance using Newton's iterative method. For each iteration, a linear system of the form J delta X = E has to be solved. The computational cost for solving this linear system directly by LU factorization is O(n exp 3), and it can be reduced significantly by exploring the structure of J. It is shown that by recognizing the recursive patterns and exploiting the sparsity of the system the multiplicative and additive computational costs for solving J delta X = E are O(n) and O(n exp 2), respectively. The formulation and solution method for an n-body pendulum is presented. The computational cost is shown to be nearly linearly proportional to the number of bodies
A Householder-based algorithm for Hessenberg-triangular reduction
The QZ algorithm for computing eigenvalues and eigenvectors of a matrix
pencil requires that the matrices first be reduced to
Hessenberg-triangular (HT) form. The current method of choice for HT reduction
relies entirely on Givens rotations regrouped and accumulated into small dense
matrices which are subsequently applied using matrix multiplication routines. A
non-vanishing fraction of the total flop count must nevertheless still be
performed as sequences of overlapping Givens rotations alternately applied from
the left and from the right. The many data dependencies associated with this
computational pattern leads to inefficient use of the processor and poor
scalability.
In this paper, we therefore introduce a fundamentally different approach that
relies entirely on (large) Householder reflectors partially accumulated into
block reflectors, by using (compact) WY representations. Even though the new
algorithm requires more floating point operations than the state of the art
algorithm, extensive experiments on both real and synthetic data indicate that
it is still competitive, even in a sequential setting. The new algorithm is
conjectured to have better parallel scalability, an idea which is partially
supported by early small-scale experiments using multi-threaded BLAS. The
design and evaluation of a parallel formulation is future work
Recommended from our members
Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'
Introduction to StarNEig -- A Task-based Library for Solving Nonsymmetric Eigenvalue Problems
In this paper, we present the StarNEig library for solving dense
non-symmetric (generalized) eigenvalue problems. The library is built on top of
the StarPU runtime system and targets both shared and distributed memory
machines. Some components of the library support GPUs. The library is currently
in an early beta state and only real arithmetic is supported. Support for
complex data types is planned for a future release. This paper is aimed for
potential users of the library. We describe the design choices and capabilities
of the library, and contrast them to existing software such as ScaLAPACK.
StarNEig implements a ScaLAPACK compatibility layer that should make it easy
for a new user to transition to StarNEig. We demonstrate the performance of the
library with a small set of computational experiments.Comment: 10 pages, 4 figures (10 when counting sub-figures), 2 tex-files.
Submitted to PPAM 2019, 13th international conference on parallel processing
and applied mathematics, September 8-11, 2019. Proceedings will be published
after the conference by Springer in the LNCS series. Second author's first
name is "Carl Christian" and last name "Kjelgaard Mikkelsen
Minimizing Communication for Eigenproblems and the Singular Value Decomposition
Algorithms have two costs: arithmetic and communication. The latter
represents the cost of moving data, either between levels of a memory
hierarchy, or between processors over a network. Communication often dominates
arithmetic and represents a rapidly increasing proportion of the total cost, so
we seek algorithms that minimize communication. In \cite{BDHS10} lower bounds
were presented on the amount of communication required for essentially all
-like algorithms for linear algebra, including eigenvalue problems and
the SVD. Conventional algorithms, including those currently implemented in
(Sca)LAPACK, perform asymptotically more communication than these lower bounds
require. In this paper we present parallel and sequential eigenvalue algorithms
(for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms
that do attain these lower bounds, and analyze their convergence and
communication costs.Comment: 43 pages, 11 figure
- …