Numerically Stable Recurrence Relations for the Communication Hiding Pipelined Conjugate Gradient Method
Pipelined Krylov subspace methods (also referred to as communication-hiding
methods) have been proposed in the literature as a scalable alternative to
classic Krylov subspace algorithms for iteratively computing the solution to a
large linear system in parallel. For symmetric and positive definite system
matrices the pipelined Conjugate Gradient method outperforms its classic
Conjugate Gradient counterpart on large scale distributed memory hardware by
overlapping global communication with essential computations like the
matrix-vector product, thus hiding global communication. A well-known drawback
of the pipelining technique is the (possibly significant) loss of numerical
stability. In this work a numerically stable variant of the pipelined Conjugate
Gradient algorithm is presented that avoids the propagation of local rounding
errors in the finite precision recurrence relations that construct the Krylov
subspace basis. The multi-term recurrence relation for the basis vector is
replaced by two-term recurrences, improving stability without increasing the
overall computational cost of the algorithm. The proposed modification ensures
that the pipelined Conjugate Gradient method is able to attain a highly
accurate solution independently of the pipeline length. Numerical experiments
demonstrate a combination of excellent parallel performance and improved
maximal attainable accuracy for the new pipelined Conjugate Gradient algorithm.
This work thus resolves one of the major practical restrictions for the
usability of pipelined Krylov subspace methods.
Comment: 15 pages, 5 figures, 1 table, 2 algorithms
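As a point of reference for the recurrences discussed above, the following is a minimal NumPy sketch of the classic Conjugate Gradient iteration, with comments marking the global reductions that pipelined variants overlap with the matrix-vector product; it is not the stable pipelined algorithm proposed in this work, and the test problem is illustrative only.

import numpy as np

def cg(A, b, tol=1e-10, maxiter=500):
    # Classic CG. On distributed-memory hardware each dot product below is a
    # global reduction; pipelined variants reorganise the recurrences so these
    # reductions overlap with the sparse matrix-vector product.
    x = np.zeros_like(b)
    r = b - A @ x                      # initial residual
    p = r.copy()                       # first search direction
    rr = r @ r                         # global reduction (residual norm)
    for _ in range(maxiter):
        Ap = A @ p                     # matrix-vector product (neighbour communication)
        alpha = rr / (p @ Ap)          # global reduction (step length)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r                 # global reduction for the next iteration
        if np.sqrt(rr_new) <= tol * np.linalg.norm(b):
            break
        p = r + (rr_new / rr) * p      # two-term recurrence for the search direction
        rr = rr_new
    return x

# Small SPD test problem: 1D Laplacian.
n = 100
A = np.diag(2.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1) + np.diag(-np.ones(n - 1), -1)
b = np.ones(n)
print(np.linalg.norm(A @ cg(A, b) - b))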
Accelerated Discontinuous Galerkin Solvers with the Chebyshev Iterative Method on the Graphics Processing Unit
This work demonstrates implementations of the discontinuous Galerkin (DG) method on graphics processing units (GPUs), which deliver improved computational times compared to the conventional central processing unit (CPU). The linear system arising when the DG method is applied to an elliptic problem is solved on the GPU. The conjugate gradient (CG) method and the Chebyshev iterative method are compared as linear system solvers, to determine which is more efficient on the GPU's parallel architecture. With both methods, computational times decreased for large problems executed on the GPU compared to the CPU; however, CG proves more efficient than the Chebyshev iterative method. In addition, a constant-free upper bound for the spectrum of the DG discretization of the elliptic problem is developed. Few previous works combine the DG method and the GPU. This thesis provides useful guidelines for the numerical solution of elliptic problems using DG on the GPU.
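For context on the solver comparison above, here is a minimal NumPy sketch of the classical three-term Chebyshev iteration; lam_min and lam_max are assumed spectral bounds, and the DG discretization, the GPU kernels, and the constant-free bound developed in the thesis are not reproduced.

import numpy as np

def chebyshev(A, b, lam_min, lam_max, tol=1e-10, maxiter=2000):
    # Chebyshev iteration for an SPD system, given bounds 0 < lam_min <= lam(A) <= lam_max.
    # Unlike CG it needs no inner products, which suits GPUs, but its convergence
    # depends on the quality of the spectral bounds.
    theta = 0.5 * (lam_max + lam_min)      # centre of the spectral interval
    delta = 0.5 * (lam_max - lam_min)      # half-width of the interval
    sigma = theta / delta
    rho = 1.0 / sigma
    x = np.zeros_like(b)
    r = b - A @ x
    d = r / theta
    for _ in range(maxiter):
        x = x + d
        r = r - A @ d
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * r
        rho = rho_new
    return x

# 1D Laplacian test: eigenvalues lie between 4*sin^2(pi/(2(n+1))) and 4.
n = 100
A = np.diag(2.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1) + np.diag(-np.ones(n - 1), -1)
b = np.ones(n)
x = chebyshev(A, b, lam_min=4.0 * np.sin(np.pi / (2 * (n + 1))) ** 2, lam_max=4.0)
print(np.linalg.norm(A @ x - b))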
Matrix-equation-based strategies for convection-diffusion equations
We are interested in the numerical solution of nonsymmetric linear systems
arising from the discretization of convection-diffusion partial differential
equations with separable coefficients and dominant convection. Preconditioners
based on the matrix equation formulation of the problem are proposed, which
naturally approximate the original discretized problem. For certain types of
convection coefficients, we show that the explicit solution of the matrix
equation can effectively replace the linear system solution. Numerical
experiments with data stemming from two and three dimensional problems are
reported, illustrating the potential of the proposed methodology.
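To make the matrix-equation viewpoint concrete, the following NumPy/SciPy sketch treats a two-dimensional convection-diffusion model problem with constant (hence separable) coefficients; the helper conv_diff_1d and the parameters eps and w are illustrative assumptions, and the preconditioners proposed in the paper are not reproduced here.

import numpy as np
from scipy.linalg import solve_sylvester

def conv_diff_1d(n, eps, w):
    # Centred-difference 1D convection-diffusion operator on a uniform grid;
    # eps is the diffusion coefficient, w a constant convection speed.
    h = 1.0 / (n + 1)
    diff = eps / h**2 * (np.diag(2.0 * np.ones(n))
                         - np.diag(np.ones(n - 1), 1)
                         - np.diag(np.ones(n - 1), -1))
    conv = w / (2.0 * h) * (np.diag(np.ones(n - 1), 1)
                            - np.diag(np.ones(n - 1), -1))
    return diff + conv

n = 40
T1 = conv_diff_1d(n, eps=0.01, w=1.0)      # acts along the x-direction
T2 = conv_diff_1d(n, eps=0.01, w=0.5)      # acts along the y-direction
F = np.ones((n, n))                        # right-hand side on the grid

# Matrix-equation form of the discretization: T1 X + X T2^T = F ...
X = solve_sylvester(T1, T2.T, F)

# ... which is equivalent to the Kronecker-structured linear system
# (I kron T1 + T2 kron I) vec(X) = vec(F), with column-major vec.
K = np.kron(np.eye(n), T1) + np.kron(T2, np.eye(n))
x_vec = np.linalg.solve(K, F.flatten(order="F"))
print(np.linalg.norm(x_vec - X.flatten(order="F")))   # agreement check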
Preconditioners for Krylov subspace methods: An overview
When simulating a mechanism from science or engineering, or an industrial process, one is frequently required to construct a mathematical model and then solve it numerically. If accurate numerical solutions are necessary or desirable, this can involve solving large-scale systems of equations. One major class of solution methods is that of preconditioned iterative methods, involving preconditioners which are computationally cheap to apply while also capturing information contained in the linear system. In this article, we give a short survey of the field of preconditioning. We introduce a range of preconditioners for partial differential equations, followed by optimization problems, before discussing preconditioners constructed with less standard objectives in mind.
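As a small illustration of the trade-off described above, the following SciPy sketch compares CG iteration counts with and without a Jacobi (diagonal) preconditioner, which is cheap to apply yet captures the bad scaling of the test matrix; the test problem and its parameters are illustrative assumptions.

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Badly scaled SPD test matrix: a 2D Laplacian rescaled by widely varying weights.
n = 50
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
L = sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))
D = sp.diags(np.logspace(0, 4, n * n))
A = (D @ L @ D).tocsr()
b = np.ones(n * n)

# Jacobi preconditioner: the inverse of the diagonal of A.
M = sp.diags(1.0 / A.diagonal())

def cg_iterations(A, b, M=None, maxiter=5000):
    count = 0
    def cb(xk):
        nonlocal count
        count += 1
    spla.cg(A, b, M=M, callback=cb, maxiter=maxiter)
    return count

print("CG iterations, no preconditioner    :", cg_iterations(A, b))   # may hit the iteration cap
print("CG iterations, Jacobi preconditioner:", cg_iterations(A, b, M=M))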
A Bayesian conjugate gradient method (with Discussion)
A fundamental task in numerical computation is the solution of large linear
systems. The conjugate gradient method is an iterative method which offers
rapid convergence to the solution, particularly when an effective
preconditioner is employed. However, for more challenging systems a substantial
error can be present even after many iterations have been performed. The
estimates obtained in this case are of little value unless further information
can be provided about the numerical error. In this paper we propose a novel
statistical model for this numerical error set in a Bayesian framework. Our
approach is a strict generalisation of the conjugate gradient method, which is
recovered as the posterior mean for a particular choice of prior. The estimates
obtained are analysed with Krylov subspace methods and a contraction result for
the posterior is presented. The method is then analysed in a simulation study
as well as being applied to a challenging problem in medical imaging.
CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra
Many areas of machine learning and science involve large linear algebra
problems, such as eigendecompositions, solving linear systems, computing matrix
exponentials, and trace estimation. The matrices involved often have Kronecker,
convolutional, block diagonal, sum, or product structure. In this paper, we
propose a simple but general framework for large-scale linear algebra problems
in machine learning, named CoLA (Compositional Linear Algebra). By combining a
linear operator abstraction with compositional dispatch rules, CoLA
automatically constructs memory and runtime efficient numerical algorithms.
Moreover, CoLA provides memory efficient automatic differentiation, low
precision computation, and GPU acceleration in both JAX and PyTorch, while also
accommodating new objects, operations, and rules in downstream packages via
multiple dispatch. CoLA can accelerate many algebraic operations, while making
it easy to prototype matrix structures and algorithms, providing an appealing
drop-in tool for virtually any computational effort that requires linear
algebra. We showcase its efficacy across a broad range of applications,
including partial differential equations, Gaussian processes, equivariant model
construction, and unsupervised learning.
Comment: Code available at https://github.com/wilson-labs/col
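To illustrate the compositional-dispatch idea in plain NumPy (this is deliberately not CoLA's actual API), the sketch below represents a Kronecker-structured operator by its factors and lets a solve routine dispatch on the operator type, so the structure is exploited instead of forming the dense matrix.

import numpy as np
from functools import singledispatch

class Dense:
    def __init__(self, A):
        self.A = np.asarray(A)

class Kronecker:
    def __init__(self, A, B):      # represents kron(A.A, B.A) without forming it
        self.A, self.B = A, B

@singledispatch
def solve(op, b):
    raise NotImplementedError(type(op))

@solve.register
def _(op: Dense, b):
    return np.linalg.solve(op.A, b)                 # generic dense rule

@solve.register
def _(op: Kronecker, b):
    # kron(A, B) x = b  <=>  A X B^T = mat(b): two small solves replace one huge one.
    m = op.A.A.shape[0]
    k = op.B.A.shape[0]
    X = np.linalg.solve(op.A.A, b.reshape(m, k))
    return np.linalg.solve(op.B.A, X.T).T.reshape(-1)

rng = np.random.default_rng(0)
A, B = rng.standard_normal((50, 50)), rng.standard_normal((60, 60))
b = rng.standard_normal(50 * 60)
x_structured = solve(Kronecker(Dense(A), Dense(B)), b)
x_dense = np.linalg.solve(np.kron(A, B), b)          # reference: forms a 3000 x 3000 matrix
print(np.linalg.norm(x_structured - x_dense))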
Faster Randomized Interior Point Methods for Tall/Wide Linear Programs
Linear programming (LP) is an extremely useful tool which has been
successfully applied to solve various problems in a wide range of areas,
including operations research, engineering, economics, or even more abstract
mathematical areas such as combinatorics. It is also used in many machine
learning applications, such as ℓ1-regularized SVMs, basis pursuit,
nonnegative matrix factorization, etc. Interior Point Methods (IPMs) are one of
the most popular methods to solve LPs both in theory and in practice. Their
underlying complexity is dominated by the cost of solving a system of linear
equations at each iteration. In this paper, we consider both feasible and
infeasible IPMs for the special case where the number of variables is much
larger than the number of constraints. Using tools from Randomized Linear
Algebra, we present a preconditioning technique that, when combined with
iterative solvers such as Conjugate Gradient or Chebyshev Iteration, provably
guarantees that IPM algorithms (suitably modified to account for the error
incurred by the approximate solver) converge to a feasible, approximately
optimal solution, without increasing their iteration complexity. Our empirical
evaluations verify our theoretical results on both real-world and synthetic
data.
Comment: Extended version of the NeurIPS 2020 submission. arXiv admin note: substantial text overlap with arXiv:2003.0807
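A hedged sketch of the sketch-and-precondition idea behind this approach is given below: each IPM step requires solving normal equations of the form (A D^2 A^T) y = r with n >> m, and a Gaussian sketch of A D yields an effective preconditioner for CG. The dimensions, the sketch size, and all variable names are illustrative assumptions rather than the paper's exact construction.

import numpy as np
import scipy.sparse.linalg as spla

rng = np.random.default_rng(0)
m, n = 100, 10000                              # wide LP: far more variables than constraints
A = rng.standard_normal((m, n)) / np.sqrt(n)
d = rng.uniform(1e-3, 1e3, size=n)             # ill-conditioned IPM-style scaling
rhs = rng.standard_normal(m)

AD = A * d                                     # A @ diag(d); so AD @ AD.T = A D^2 A^T
G = spla.LinearOperator((m, m), matvec=lambda y: AD @ (AD.T @ y), dtype=np.float64)

# Randomized preconditioner: sketch AD from the right with k = 4m Gaussian columns,
# then invert the small sketched Gram matrix via an SVD of the sketch.
k = 4 * m
S = AD @ rng.standard_normal((n, k)) / np.sqrt(k)
U, sig, _ = np.linalg.svd(S, full_matrices=False)
M = spla.LinearOperator((m, m), matvec=lambda y: U @ ((U.T @ y) / sig**2), dtype=np.float64)

def cg_iterations(M=None, maxiter=1000):
    count = 0
    def cb(xk):
        nonlocal count
        count += 1
    spla.cg(G, rhs, M=M, callback=cb, maxiter=maxiter)
    return count

print("CG iterations without preconditioner    :", cg_iterations())   # typically hits the cap
print("CG iterations with sketched preconditioner:", cg_iterations(M))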
Asynchronous and Multiprecision Linear Solvers - Scalable and Fault-Tolerant Numerics for Energy Efficient High Performance Computing
Asynchronous methods minimize idle times by removing synchronization barriers, and therefore allow the efficient usage of computer systems. The resulting high tolerance of communication latencies also improves fault tolerance. As asynchronous methods further enable the use of the power- and energy-saving mechanisms provided by the hardware, they are suitable candidates for the highly parallel and heterogeneous hardware platforms expected in the near future.
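As a toy illustration of removing synchronization between updates (not the solvers developed in this work), the following NumPy sketch contrasts a synchronous Jacobi sweep, where every component is updated from the same iterate, with a chaotic, asynchronous-style relaxation in which each component update immediately uses whatever values are currently available.

import numpy as np

rng = np.random.default_rng(0)
n = 500
A = np.diag(4.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1) + np.diag(-np.ones(n - 1), -1)
b = np.ones(n)
D = A.diagonal()

def jacobi_sync(x, sweeps):
    for _ in range(sweeps):
        x = x + (b - A @ x) / D            # all components use the same iterate (barrier)
    return x

def jacobi_async(x, sweeps):
    x = x.copy()
    for _ in range(sweeps):
        for i in rng.permutation(n):       # random order, no barrier between updates
            x[i] += (b[i] - A[i] @ x) / D[i]   # uses the freshest available values
    return x

x0 = np.zeros(n)
print("synchronous residual :", np.linalg.norm(b - A @ jacobi_sync(x0, 50)))
print("asynchronous residual:", np.linalg.norm(b - A @ jacobi_async(x0, 50)))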