6,279 research outputs found

    A nearly-linear computational-cost scheme for the forward dynamics of an N-body pendulum

    Get PDF
    The dynamic equations of motion of an n-body pendulum with spherical joints are derived to be a mixed system of differential and algebraic equations (DAE's). The DAE's are kept in implicit form to save arithmetic and preserve the sparsity of the system and are solved by the robust implicit integration method. At each solution point, the predicted solution is corrected to its exact solution within given tolerance using Newton's iterative method. For each iteration, a linear system of the form J delta X = E has to be solved. The computational cost for solving this linear system directly by LU factorization is O(n exp 3), and it can be reduced significantly by exploring the structure of J. It is shown that by recognizing the recursive patterns and exploiting the sparsity of the system the multiplicative and additive computational costs for solving J delta X = E are O(n) and O(n exp 2), respectively. The formulation and solution method for an n-body pendulum is presented. The computational cost is shown to be nearly linearly proportional to the number of bodies

    A Householder-based algorithm for Hessenberg-triangular reduction

    Full text link
    The QZ algorithm for computing eigenvalues and eigenvectors of a matrix pencil A−λBA - \lambda B requires that the matrices first be reduced to Hessenberg-triangular (HT) form. The current method of choice for HT reduction relies entirely on Givens rotations regrouped and accumulated into small dense matrices which are subsequently applied using matrix multiplication routines. A non-vanishing fraction of the total flop count must nevertheless still be performed as sequences of overlapping Givens rotations alternately applied from the left and from the right. The many data dependencies associated with this computational pattern leads to inefficient use of the processor and poor scalability. In this paper, we therefore introduce a fundamentally different approach that relies entirely on (large) Householder reflectors partially accumulated into block reflectors, by using (compact) WY representations. Even though the new algorithm requires more floating point operations than the state of the art algorithm, extensive experiments on both real and synthetic data indicate that it is still competitive, even in a sequential setting. The new algorithm is conjectured to have better parallel scalability, an idea which is partially supported by early small-scale experiments using multi-threaded BLAS. The design and evaluation of a parallel formulation is future work

    Introduction to StarNEig -- A Task-based Library for Solving Nonsymmetric Eigenvalue Problems

    Full text link
    In this paper, we present the StarNEig library for solving dense non-symmetric (generalized) eigenvalue problems. The library is built on top of the StarPU runtime system and targets both shared and distributed memory machines. Some components of the library support GPUs. The library is currently in an early beta state and only real arithmetic is supported. Support for complex data types is planned for a future release. This paper is aimed for potential users of the library. We describe the design choices and capabilities of the library, and contrast them to existing software such as ScaLAPACK. StarNEig implements a ScaLAPACK compatibility layer that should make it easy for a new user to transition to StarNEig. We demonstrate the performance of the library with a small set of computational experiments.Comment: 10 pages, 4 figures (10 when counting sub-figures), 2 tex-files. Submitted to PPAM 2019, 13th international conference on parallel processing and applied mathematics, September 8-11, 2019. Proceedings will be published after the conference by Springer in the LNCS series. Second author's first name is "Carl Christian" and last name "Kjelgaard Mikkelsen

    Minimizing Communication for Eigenproblems and the Singular Value Decomposition

    Full text link
    Algorithms have two costs: arithmetic and communication. The latter represents the cost of moving data, either between levels of a memory hierarchy, or between processors over a network. Communication often dominates arithmetic and represents a rapidly increasing proportion of the total cost, so we seek algorithms that minimize communication. In \cite{BDHS10} lower bounds were presented on the amount of communication required for essentially all O(n3)O(n^3)-like algorithms for linear algebra, including eigenvalue problems and the SVD. Conventional algorithms, including those currently implemented in (Sca)LAPACK, perform asymptotically more communication than these lower bounds require. In this paper we present parallel and sequential eigenvalue algorithms (for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms that do attain these lower bounds, and analyze their convergence and communication costs.Comment: 43 pages, 11 figure
    • …
    corecore