40,766 research outputs found

    Incomplete Orthogonal Factorization Methods Using Givens Rotations II: Implementation and Results

    Get PDF
    We present, implement and test a series of incomplete orthogonal factorization methods based on Givens rotations for large sparse unsymmetric matrices. These methods include: column-Incomplete Givens Orthogonalization (cIGO-method), which drops entries by position only; column-Threshold Incomplete Givens Orthogonalization (cTIGO-method) which drops entries dynamically by both their magnitudes and positions and where the reduction via Givens rotations is done in a column-wise fashion; and, row-Threshold Incomplete Givens Orthogonalization (r-TIGO-method) which again drops entries dynamically, but only magnitude is now taken into account and reduction is performed in a row-wise fashion. We give comprehensive accounts of how one would code these algorithms using a high level language to ensure efficiency of computation and memory use. The methods are then applied to a variety of square systems and their performance as preconditioners is tested against standard incomplete LU factorization techniques. For rectangular matrices corresponding to least-squares problems, the resulting incomplete factorizations are applied as preconditioners for conjugate gradients for the system of normal equations. A comprehensive discussion about the uses, advantages and shortcomings of these preconditioners is given

    Uniform random generation of large acyclic digraphs

    Full text link
    Directed acyclic graphs are the basic representation of the structure underlying Bayesian networks, which represent multivariate probability distributions. In many practical applications, such as the reverse engineering of gene regulatory networks, not only the estimation of model parameters but the reconstruction of the structure itself is of great interest. As well as for the assessment of different structure learning algorithms in simulation studies, a uniform sample from the space of directed acyclic graphs is required to evaluate the prevalence of certain structural features. Here we analyse how to sample acyclic digraphs uniformly at random through recursive enumeration, an approach previously thought too computationally involved. Based on complexity considerations, we discuss in particular how the enumeration directly provides an exact method, which avoids the convergence issues of the alternative Markov chain methods and is actually computationally much faster. The limiting behaviour of the distribution of acyclic digraphs then allows us to sample arbitrarily large graphs. Building on the ideas of recursive enumeration based sampling we also introduce a novel hybrid Markov chain with much faster convergence than current alternatives while still being easy to adapt to various restrictions. Finally we discuss how to include such restrictions in the combinatorial enumeration and the new hybrid Markov chain method for efficient uniform sampling of the corresponding graphs.Comment: 15 pages, 2 figures. To appear in Statistics and Computin

    A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures

    Full text link
    As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these new processors. Fine grain parallelism becomes a major requirement and introduces the necessity of loose synchronization in the parallel execution of an operation. This paper presents an algorithm for the Cholesky, LU and QR factorization where the operations can be represented as a sequence of small tasks that operate on square blocks of data. These tasks can be dynamically scheduled for execution based on the dependencies among them and on the availability of computational resources. This may result in an out of order execution of the tasks which will completely hide the presence of intrinsically sequential tasks in the factorization. Performance comparisons are presented with the LAPACK algorithms where parallelism can only be exploited at the level of the BLAS operations and vendor implementations

    A parallel algorithm for Hamiltonian matrix construction in electron-molecule collision calculations: MPI-SCATCI

    Full text link
    Construction and diagonalization of the Hamiltonian matrix is the rate-limiting step in most low-energy electron -- molecule collision calculations. Tennyson (J Phys B, 29 (1996) 1817) implemented a novel algorithm for Hamiltonian construction which took advantage of the structure of the wavefunction in such calculations. This algorithm is re-engineered to make use of modern computer architectures and the use of appropriate diagonalizers is considered. Test calculations demonstrate that significant speed-ups can be gained using multiple CPUs. This opens the way to calculations which consider higher collision energies, larger molecules and / or more target states. The methodology, which is implemented as part of the UK molecular R-matrix codes (UKRMol and UKRMol+) can also be used for studies of bound molecular Rydberg states, photoionisation and positron-molecule collisions.Comment: Write up of a computer program MPI-SCATCI Computer Physics Communications, in pres

    Theory and computation of covariant Lyapunov vectors

    Full text link
    Lyapunov exponents are well-known characteristic numbers that describe growth rates of perturbations applied to a trajectory of a dynamical system in different state space directions. Covariant (or characteristic) Lyapunov vectors indicate these directions. Though the concept of these vectors has been known for a long time, they became practically computable only recently due to algorithms suggested by Ginelli et al. [Phys. Rev. Lett. 99, 2007, 130601] and by Wolfe and Samelson [Tellus 59A, 2007, 355]. In view of the great interest in covariant Lyapunov vectors and their wide range of potential applications, in this article we summarize the available information related to Lyapunov vectors and provide a detailed explanation of both the theoretical basics and numerical algorithms. We introduce the notion of adjoint covariant Lyapunov vectors. The angles between these vectors and the original covariant vectors are norm-independent and can be considered as characteristic numbers. Moreover, we present and study in detail an improved approach for computing covariant Lyapunov vectors. Also we describe, how one can test for hyperbolicity of chaotic dynamics without explicitly computing covariant vectors.Comment: 21 pages, 5 figure

    Improvements in sparse matrix operations of NASTRAN

    Get PDF
    A "nontransmit" packing routine was added to NASTRAN to allow matrix data to be refered to directly from the input/output buffer. Use of the packing routine permits various routines for matrix handling to perform a direct reference to the input/output buffer if data addresses have once been received. The packing routine offers a buffer by buffer backspace feature for efficient backspacing in sequential access. Unlike a conventional backspacing that needs twice back record for a single read of one record (one column), this feature omits overlapping of READ operation and back record. It eliminates the necessity of writing, in decomposition of a symmetric matrix, of a portion of the matrix to its upper triangular matrix from the last to the first columns of the symmetric matrix, thus saving time for generating the upper triangular matrix. Only a lower triangular matrix must be written onto the secondary storage device, bringing 10 to 30% reduction in use of the disk space of the storage device

    Communication-optimal Parallel and Sequential Cholesky Decomposition

    Full text link
    Numerical algorithms have two kinds of costs: arithmetic and communication, by which we mean either moving data between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case). Communication costs often dominate arithmetic costs, so it is of interest to design algorithms minimizing communication. In this paper we first extend known lower bounds on the communication cost (both for bandwidth and for latency) of conventional (O(n^3)) matrix multiplication to Cholesky factorization, which is used for solving dense symmetric positive definite linear systems. Second, we compare the costs of various Cholesky decomposition implementations to these lower bounds and identify the algorithms and data structures that attain them. In the sequential case, we consider both the two-level and hierarchical memory models. Combined with prior results in [13, 14, 15], this gives a set of communication-optimal algorithms for O(n^3) implementations of the three basic factorizations of dense linear algebra: LU with pivoting, QR and Cholesky. But it goes beyond this prior work on sequential LU by optimizing communication for any number of levels of memory hierarchy.Comment: 29 pages, 2 tables, 6 figure
    corecore