40,766 research outputs found
Incomplete Orthogonal Factorization Methods Using Givens Rotations II: Implementation and Results
We present, implement and test a series of incomplete orthogonal factorization methods based on Givens rotations for large sparse unsymmetric matrices. These methods include: column-Incomplete Givens Orthogonalization (cIGO-method), which drops entries by position only; column-Threshold Incomplete Givens Orthogonalization (cTIGO-method) which drops entries dynamically by both their magnitudes and positions and where the reduction via Givens rotations is done in a column-wise fashion; and, row-Threshold Incomplete Givens Orthogonalization (r-TIGO-method) which again drops entries dynamically, but only magnitude is now taken into account and reduction is performed in a row-wise fashion. We give comprehensive accounts of how one would code these algorithms using a high level language to ensure efficiency of computation and memory use. The methods are then applied to a variety of square systems and their performance as preconditioners is tested against standard incomplete LU factorization techniques. For rectangular matrices corresponding to least-squares problems, the resulting incomplete factorizations are applied as preconditioners for conjugate gradients for the system of normal equations. A comprehensive discussion about the uses, advantages and shortcomings of these preconditioners is given
Uniform random generation of large acyclic digraphs
Directed acyclic graphs are the basic representation of the structure
underlying Bayesian networks, which represent multivariate probability
distributions. In many practical applications, such as the reverse engineering
of gene regulatory networks, not only the estimation of model parameters but
the reconstruction of the structure itself is of great interest. As well as for
the assessment of different structure learning algorithms in simulation
studies, a uniform sample from the space of directed acyclic graphs is required
to evaluate the prevalence of certain structural features. Here we analyse how
to sample acyclic digraphs uniformly at random through recursive enumeration,
an approach previously thought too computationally involved. Based on
complexity considerations, we discuss in particular how the enumeration
directly provides an exact method, which avoids the convergence issues of the
alternative Markov chain methods and is actually computationally much faster.
The limiting behaviour of the distribution of acyclic digraphs then allows us
to sample arbitrarily large graphs. Building on the ideas of recursive
enumeration based sampling we also introduce a novel hybrid Markov chain with
much faster convergence than current alternatives while still being easy to
adapt to various restrictions. Finally we discuss how to include such
restrictions in the combinatorial enumeration and the new hybrid Markov chain
method for efficient uniform sampling of the corresponding graphs.Comment: 15 pages, 2 figures. To appear in Statistics and Computin
A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures
As multicore systems continue to gain ground in the High Performance
Computing world, linear algebra algorithms have to be reformulated or new
algorithms have to be developed in order to take advantage of the architectural
features on these new processors. Fine grain parallelism becomes a major
requirement and introduces the necessity of loose synchronization in the
parallel execution of an operation. This paper presents an algorithm for the
Cholesky, LU and QR factorization where the operations can be represented as a
sequence of small tasks that operate on square blocks of data. These tasks can
be dynamically scheduled for execution based on the dependencies among them and
on the availability of computational resources. This may result in an out of
order execution of the tasks which will completely hide the presence of
intrinsically sequential tasks in the factorization. Performance comparisons
are presented with the LAPACK algorithms where parallelism can only be
exploited at the level of the BLAS operations and vendor implementations
A parallel algorithm for Hamiltonian matrix construction in electron-molecule collision calculations: MPI-SCATCI
Construction and diagonalization of the Hamiltonian matrix is the
rate-limiting step in most low-energy electron -- molecule collision
calculations. Tennyson (J Phys B, 29 (1996) 1817) implemented a novel algorithm
for Hamiltonian construction which took advantage of the structure of the
wavefunction in such calculations. This algorithm is re-engineered to make use
of modern computer architectures and the use of appropriate diagonalizers is
considered. Test calculations demonstrate that significant speed-ups can be
gained using multiple CPUs. This opens the way to calculations which consider
higher collision energies, larger molecules and / or more target states. The
methodology, which is implemented as part of the UK molecular R-matrix codes
(UKRMol and UKRMol+) can also be used for studies of bound molecular Rydberg
states, photoionisation and positron-molecule collisions.Comment: Write up of a computer program MPI-SCATCI Computer Physics
Communications, in pres
Theory and computation of covariant Lyapunov vectors
Lyapunov exponents are well-known characteristic numbers that describe growth
rates of perturbations applied to a trajectory of a dynamical system in
different state space directions. Covariant (or characteristic) Lyapunov
vectors indicate these directions. Though the concept of these vectors has been
known for a long time, they became practically computable only recently due to
algorithms suggested by Ginelli et al. [Phys. Rev. Lett. 99, 2007, 130601] and
by Wolfe and Samelson [Tellus 59A, 2007, 355]. In view of the great interest in
covariant Lyapunov vectors and their wide range of potential applications, in
this article we summarize the available information related to Lyapunov vectors
and provide a detailed explanation of both the theoretical basics and numerical
algorithms. We introduce the notion of adjoint covariant Lyapunov vectors. The
angles between these vectors and the original covariant vectors are
norm-independent and can be considered as characteristic numbers. Moreover, we
present and study in detail an improved approach for computing covariant
Lyapunov vectors. Also we describe, how one can test for hyperbolicity of
chaotic dynamics without explicitly computing covariant vectors.Comment: 21 pages, 5 figure
Improvements in sparse matrix operations of NASTRAN
A "nontransmit" packing routine was added to NASTRAN to allow matrix data to be refered to directly from the input/output buffer. Use of the packing routine permits various routines for matrix handling to perform a direct reference to the input/output buffer if data addresses have once been received. The packing routine offers a buffer by buffer backspace feature for efficient backspacing in sequential access. Unlike a conventional backspacing that needs twice back record for a single read of one record (one column), this feature omits overlapping of READ operation and back record. It eliminates the necessity of writing, in decomposition of a symmetric matrix, of a portion of the matrix to its upper triangular matrix from the last to the first columns of the symmetric matrix, thus saving time for generating the upper triangular matrix. Only a lower triangular matrix must be written onto the secondary storage device, bringing 10 to 30% reduction in use of the disk space of the storage device
Communication-optimal Parallel and Sequential Cholesky Decomposition
Numerical algorithms have two kinds of costs: arithmetic and communication,
by which we mean either moving data between levels of a memory hierarchy (in
the sequential case) or over a network connecting processors (in the parallel
case). Communication costs often dominate arithmetic costs, so it is of
interest to design algorithms minimizing communication. In this paper we first
extend known lower bounds on the communication cost (both for bandwidth and for
latency) of conventional (O(n^3)) matrix multiplication to Cholesky
factorization, which is used for solving dense symmetric positive definite
linear systems. Second, we compare the costs of various Cholesky decomposition
implementations to these lower bounds and identify the algorithms and data
structures that attain them. In the sequential case, we consider both the
two-level and hierarchical memory models. Combined with prior results in [13,
14, 15], this gives a set of communication-optimal algorithms for O(n^3)
implementations of the three basic factorizations of dense linear algebra: LU
with pivoting, QR and Cholesky. But it goes beyond this prior work on
sequential LU by optimizing communication for any number of levels of memory
hierarchy.Comment: 29 pages, 2 tables, 6 figure
- …