Scalability Analysis of Parallel GMRES Implementations
Applications involving large sparse nonsymmetric linear systems encourage parallel implementations of robust iterative solution methods such as GMRES(k). Two parallel versions of GMRES(k), based on different data distributions and using Householder reflections in the orthogonalization phase, together with variations that adapt the restart value k, are analyzed with respect to scalability (their ability to maintain fixed efficiency as problem size and number of processors increase). A theoretical algorithm-machine model for scalability is derived and validated by experiments on three parallel computers, each with different machine characteristics.
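As a point of reference for the restarted method analyzed above, restarted GMRES is available in SciPy. The sketch below is a serial illustration only, with a made-up tridiagonal test system, not the parallel implementations studied in the paper; it merely shows how the restart value k enters as a parameter:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres

# Small nonsymmetric sparse test system (made up for illustration).
n = 200
A = diags([-1.0, 2.5, -1.2], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# GMRES(k): restart after k inner iterations, trading the memory and
# orthogonalization cost of a long Krylov basis against convergence speed.
res = {}
for k in (5, 20):
    x, info = gmres(A, b, restart=k, maxiter=1000)
    res[k] = np.linalg.norm(b - A @ x)
```

Smaller k reduces the cost per cycle but may slow or stall convergence; the scalability trade-off analyzed in the paper adds the parallel orthogonalization cost on top of this.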
Computational Efficiency in Bayesian Model and Variable Selection
Large scale Bayesian model averaging and variable selection exercises present, despite the great increase in desktop computing power, considerable computational challenges. Due to the large scale, it is impossible to evaluate all possible models, and estimates of posterior probabilities are instead obtained from stochastic (MCMC) schemes designed to converge on the posterior distribution over the model space. While this frees us from the requirement of evaluating all possible models, the computational effort is still substantial and efficient implementation is vital. Efficient implementation is concerned with two issues: the efficiency of the MCMC algorithm itself and efficient computation of the quantities needed to obtain a draw from the MCMC algorithm. We evaluate several different MCMC algorithms and find that relatively simple algorithms with local moves perform competitively, except possibly when the data are highly collinear. For the second aspect, efficient computation within the sampler, we focus on the important case of linear models, where the computations essentially reduce to least squares calculations. Least squares solvers that update a previous model estimate are appealing when the MCMC algorithm makes local moves, and we find that the Cholesky update is both fast and accurate.
Keywords: Bayesian model averaging; sweep operator; Cholesky decomposition; QR decomposition; Swendsen-Wang algorithm
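The Cholesky update the abstract refers to can be illustrated in matrix terms: when a local move adds one variable, the factor of the cross-product matrix can be extended by a single triangular solve and a square root rather than refactored from scratch (O(p²) instead of O(p³) work). The sketch below uses made-up data and is a minimal illustration of the idea, not the authors' implementation:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.standard_normal((n, p))
x_new = rng.standard_normal(n)   # candidate column for the local "add" move

# Upper Cholesky factor R of X'X for the current model.
R = cholesky(X.T @ X, lower=False)

# Adding a column extends R by one triangular solve plus a square root.
r12 = solve_triangular(R, X.T @ x_new, trans="T", lower=False)
r22 = np.sqrt(x_new @ x_new - r12 @ r12)
R_new = np.block([[R, r12[:, None]],
                  [np.zeros((1, p)), np.array([[r22]])]])

# Check against a full refactorization of the augmented model.
X_aug = np.column_stack([X, x_new])
R_full = cholesky(X_aug.T @ X_aug, lower=False)
err = np.max(np.abs(R_new - R_full))
```

Dropping a variable similarly reduces to a cheap downdate of the factor, which is why local moves pair so well with updating solvers.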
On some orthogonalization schemes in Tensor Train format
In the framework of tensor spaces, we consider orthogonalization kernels to
generate an orthogonal basis of a tensor subspace from a set of linearly
independent tensors. In particular, we experimentally study the loss of
orthogonality of six orthogonalization methods, namely Classical and Modified
Gram-Schmidt with (CGS2, MGS2) and without (CGS, MGS) re-orthogonalization, the
Gram approach, and the Householder transformation. To overcome the curse of
dimensionality, we represent tensors with a low-rank approximation using the
Tensor Train (TT) formalism. In addition, we introduce recompression steps in
the standard algorithm outline through the TT-rounding method at a prescribed
accuracy. After describing the structure and properties of the algorithms, we
illustrate their loss of orthogonality with numerical experiments. The
theoretical bounds from the classical matrix computation round-off analysis,
obtained over several decades, seem to be maintained, with the unit round-off
replaced by the TT-rounding accuracy. A computational analysis of each
orthogonalization kernel, in terms of memory requirements and computational
complexity measured as a function of the number of TT-rounding operations
(which turn out to be the most computationally expensive step), completes
the study.
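In plain matrix (rather than TT) form, the classical loss-of-orthogonality behavior the abstract alludes to can be observed directly: CGS degrades roughly like u·κ(A)², MGS like u·κ(A), and CGS2 stays near the unit round-off u. A minimal sketch, using an illustrative ill-conditioned test matrix not taken from the paper:

```python
import numpy as np

def cgs(A, reorth=False):
    """Classical Gram-Schmidt; reorth=True gives CGS2 (re-orthogonalization)."""
    n, m = A.shape
    Q = np.zeros((n, m))
    for j in range(m):
        v = A[:, j].copy()
        for _ in range(2 if reorth else 1):
            v -= Q[:, :j] @ (Q[:, :j].T @ v)
        Q[:, j] = v / np.linalg.norm(v)
    return Q

def mgs(A):
    """Modified Gram-Schmidt: project out each new q from later columns."""
    Q = A.astype(float).copy()
    m = A.shape[1]
    for j in range(m):
        Q[:, j] /= np.linalg.norm(Q[:, j])
        for k in range(j + 1, m):
            Q[:, k] -= (Q[:, j] @ Q[:, k]) * Q[:, j]
    return Q

# Ill-conditioned test matrix with kappa ~ 1e7 (illustrative).
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((80, 20)))
V, _ = np.linalg.qr(rng.standard_normal((20, 20)))
A = U @ np.diag(np.logspace(0, -7, 20)) @ V.T

def loss(Q):
    """Loss of orthogonality measured as ||I - Q'Q||."""
    return np.linalg.norm(np.eye(Q.shape[1]) - Q.T @ Q)

loss_cgs, loss_mgs, loss_cgs2 = loss(cgs(A)), loss(mgs(A)), loss(cgs(A, reorth=True))
```

The paper's observation is that these matrix-case bounds appear to carry over to the TT setting with the unit round-off replaced by the TT-rounding accuracy.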
Computational Efficiency in Bayesian Model and Variable Selection
This paper is concerned with the efficient implementation of Bayesian model averaging (BMA) and Bayesian variable selection when the number of candidate variables and models is large and estimation of posterior model probabilities must be based on a subset of the models. Efficient implementation is concerned with two issues: the efficiency of the MCMC algorithm itself and efficient computation of the quantities needed to obtain a draw from the MCMC algorithm. For the first aspect, it is desirable that the chain moves well and quickly through the model space and takes draws from regions with high probabilities. In this context there is a natural trade-off between local moves, which make use of the current parameter values to propose plausible values for model parameters, and more global transitions, which potentially allow exploration of the distribution of interest in fewer steps, but where each step is more computationally intensive. We assess the convergence properties of simple samplers based on local moves and some recently proposed algorithms intended to improve on the basic samplers. For the second aspect, efficient computation within the sampler, we focus on the important case of linear models, where the computations essentially reduce to least squares calculations. When the chain makes local moves, adding or dropping a variable, substantial gains in efficiency can be made by updating the previous least squares solution.
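A minimal illustration of a sampler with local moves of the kind described above (flipping one inclusion indicator per step) is sketched below. The BIC-based model score is a stand-in assumption for the log marginal likelihood, and the data are synthetic; this is not the samplers evaluated in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 8
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.5, 0, 0, 1.0, 0, 0, 0])
y = X @ beta_true + rng.standard_normal(n)

def log_score(gamma):
    """BIC-style model score, used here as a stand-in (an assumption
    for illustration) for the log marginal likelihood."""
    k = int(gamma.sum())
    if k == 0:
        rss = y @ y
    else:
        Xg = X[:, gamma]
        beta = np.linalg.lstsq(Xg, y, rcond=None)[0]
        r = y - Xg @ beta
        rss = r @ r
    return -0.5 * n * np.log(rss / n) - 0.5 * k * np.log(n)

# Metropolis sampler with local moves: flip one inclusion indicator per step.
gamma = np.zeros(p, dtype=bool)
cur = log_score(gamma)
counts = np.zeros(p)
for it in range(2000):
    j = rng.integers(p)
    prop = gamma.copy()
    prop[j] = ~prop[j]
    new = log_score(prop)
    if np.log(rng.random()) < new - cur:   # Metropolis accept/reject
        gamma, cur = prop, new
    counts += gamma
incl_prob = counts / 2000   # posterior inclusion frequencies along the chain
```

Each local move re-fits the model from scratch here; the paper's point is precisely that this inner least squares step is where updating solvers pay off.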
GMRES implementations and residual smoothing techniques for solving ill-posed linear systems
Abstract: There are a variety of useful Krylov subspace methods for solving nonsymmetric linear systems of equations. GMRES is one of the best Krylov solvers, with several variants for solving large sparse linear systems, each with its own advantages. Since the solution of ill-posed problems is important, this paper discusses some GMRES variants and applies them to such problems. Residual smoothing techniques are efficient ways to accelerate the convergence of some iterative methods, such as CG variants. Finally, several residual smoothing techniques are applied to different GMRES methods to test the influence of these techniques on GMRES implementations.
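Minimal residual smoothing of the kind referred to above can be sketched as follows: alongside the base iterates, an auxiliary sequence is formed whose residual norm is non-increasing by construction, however erratically the base iteration's residuals behave. The base iteration below is steepest descent on a made-up SPD system, purely for illustration, not one of the GMRES variants from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
# Ill-conditioned SPD test system (illustrative choice).
Qm, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Qm @ np.diag(np.logspace(0, 4, n)) @ Qm.T
b = rng.standard_normal(n)

# Base iteration: steepest descent, whose residual norms can oscillate.
x = np.zeros(n)
r = b - A @ x
s, t = x.copy(), r.copy()          # smoothed iterate and smoothed residual
smoothed_norms = [np.linalg.norm(t)]
for _ in range(200):
    Ar = A @ r
    alpha = (r @ r) / (r @ Ar)
    x = x + alpha * r
    r = r - alpha * Ar
    # Minimal residual smoothing: choose eta minimizing ||t + eta*(r - t)||.
    d = r - t
    eta = -(t @ d) / (d @ d)
    s = s + eta * (x - s)
    t = t + eta * d
    smoothed_norms.append(np.linalg.norm(t))
```

Because eta is the exact one-dimensional minimizer, ||t_k|| <= min(||t_{k-1}||, ||r_k||), so the smoothed residual history is monotone even when the base history is not.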
An experimental comparison of several approaches to the linear least squares problem
Algorithms and original data matrix approaches are compared for the linear least squares problem.
Least squares residuals and minimal residual methods
We study Krylov subspace methods for solving unsymmetric linear algebraic systems that minimize the norm of the residual at each step (minimal residual (MR) methods). MR methods are often formulated in terms of a sequence of least squares (LS) problems of increasing dimension. We present several basic identities and bounds for the LS residual. These results are interesting in the general context of solving LS problems. When applied to MR methods, they show that the size of the MR residual is strongly related to the conditioning of different bases of the same Krylov subspace. Using different bases is useful in theory because relating convergence to the characteristics of different bases offers new insight into the behavior of MR methods.
Different bases also lead to different implementations which are mathematically equivalent but can differ numerically. Our theoretical results are used for a finite precision analysis of implementations of the GMRES method [Y. Saad and M. H. Schultz, SIAM J. Sci. Statist. Comput., 7 (1986), pp. 856--869]. We explain that the choice of the basis is fundamental for the numerical stability of the implementation. As demonstrated in the case of Simpler GMRES [H. F. Walker and L. Zhou, Numer. Linear Algebra Appl., 1 (1994), pp. 571--581], the best orthogonalization technique used for computing the basis does not compensate for the loss of accuracy due to an inappropriate choice of the basis. In particular, we prove that Simpler GMRES is inherently less numerically stable than the Classical GMRES implementation due to Saad and Schultz [SIAM J. Sci. Statist. Comput., 7 (1986), pp. 856--869]
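The least squares formulation of MR methods discussed above can be made concrete with a short sketch: Arnoldi with modified Gram-Schmidt builds an orthonormal Krylov basis V and a small (m+1) x m Hessenberg matrix H with A V_m = V_{m+1} H, and the GMRES residual norm equals the residual of the small Hessenberg least squares problem. The test matrix below is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 60, 20
A = rng.standard_normal((n, n)) + 5 * np.eye(n)   # illustrative test matrix
b = rng.standard_normal(n)

# Arnoldi with modified Gram-Schmidt.
beta = np.linalg.norm(b)
V = np.zeros((n, m + 1))
H = np.zeros((m + 1, m))
V[:, 0] = b / beta
for j in range(m):
    w = A @ V[:, j]
    for i in range(j + 1):
        H[i, j] = V[:, i] @ w
        w -= H[i, j] * V[:, i]
    H[j + 1, j] = np.linalg.norm(w)
    V[:, j + 1] = w / H[j + 1, j]

# GMRES step m: solve the small LS problem min_y ||beta*e1 - H y||.
e1 = np.zeros(m + 1)
e1[0] = beta
y = np.linalg.lstsq(H, e1, rcond=None)[0]
x = V[:, :m] @ y

# The small LS residual equals the true residual norm of the approximation.
true_res = np.linalg.norm(b - A @ x)
small_res = np.linalg.norm(e1 - H @ y)
```

In finite precision this equality holds only as well as the computed basis behaves, which is exactly the sensitivity to the choice of basis that the analysis above makes precise.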
Randomized block Gram-Schmidt process for solution of linear systems and eigenvalue problems
We propose a block version of the randomized Gram-Schmidt process for
computing a QR factorization of a matrix. Our algorithm inherits the major
properties of its single-vector analogue from [Balabanov and Grigori, 2020]
such as higher efficiency than the classical Gram-Schmidt algorithm and
stability of the modified Gram-Schmidt algorithm, which can be refined even
further by using multi-precision arithmetic. As in [Balabanov and Grigori,
2020], our algorithm has an advantage of performing standard high-dimensional
operations, that define the overall computational cost, with a unit roundoff
independent of the dominant dimension of the matrix. This unique feature makes
the methodology especially useful for large-scale problems computed on
low-precision arithmetic architectures. Block algorithms are advantageous in
terms of performance as they are mainly based on cache-friendly matrix-wise
operations, and can reduce communication cost in high-performance computing.
The block Gram-Schmidt orthogonalization is the key element of the block
Arnoldi procedure for constructing a Krylov basis, which in turn is used in
GMRES and Rayleigh-Ritz methods for the solution of linear systems and
clustered eigenvalue problems. In this article, we develop randomized
versions of these methods, based on the proposed randomized Gram-Schmidt
algorithm, and validate them on nontrivial numerical examples.
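A single-vector sketch of the randomized Gram-Schmidt idea (following the structure described in [Balabanov and Grigori, 2020], but simplified, and not the block algorithm of this paper): inner products and norms are taken in a lower-dimensional sketch space defined by a random matrix Theta, so the expensive high-dimensional work reduces to matrix-vector products. Sketch size and test matrix below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, s = 2000, 50, 200   # sketch size s > m is an illustrative choice
A = rng.standard_normal((n, m)) @ np.diag(np.logspace(0, -6, m))

# Gaussian sketching matrix Theta: inner products and norms are computed
# in the s-dimensional sketch space instead of R^n.
Theta = rng.standard_normal((s, n)) / np.sqrt(s)

Q = np.zeros((n, m))      # computed basis
S = np.zeros((s, m))      # S = Theta @ Q, maintained incrementally
R = np.zeros((m, m))
for j in range(m):
    w = A[:, j]
    p = Theta @ w
    # Projection coefficients from a small sketched least squares problem.
    r = np.linalg.lstsq(S[:, :j], p, rcond=None)[0] if j else np.zeros(0)
    q = w - Q[:, :j] @ r
    sq = Theta @ q
    rho = np.linalg.norm(sq)          # sketched norm used for normalization
    Q[:, j] = q / rho
    S[:, j] = sq / rho
    R[:j, j], R[j, j] = r, rho

factor_err = np.linalg.norm(A - Q @ R) / np.linalg.norm(A)
```

The computed Q is not exactly orthonormal, but when Theta is an epsilon-embedding of the Krylov subspace it is provably well-conditioned, which is what the randomized GMRES and Rayleigh-Ritz variants above rely on.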