422 research outputs found
Scalable iterative methods for sampling from massive Gaussian random vectors
Sampling from Gaussian Markov random fields (GMRFs), that is, multivariate
Gaussian random vectors that are parameterised by the inverse of their
covariance matrix, is a fundamental problem in computational statistics. In
this paper, we show how we can exploit arbitrarily accurate approximations to
a GMRF to speed up Krylov subspace sampling methods. We also show that these
methods can be used when computing the normalising constant of a large
multivariate Gaussian distribution, which is needed for any
likelihood-based inference method. The method we derive is also applicable to
other structured Gaussian random vectors and, in particular, we show that
when the precision matrix is a perturbation of a (block) circulant matrix, it
is still possible to derive O(n log n) sampling schemes.
Comment: 17 Pages, 4 Figures
Study of preconditioners based on Markov Chain Monte Carlo methods
The analysis and design of novel scalable
methods and algorithms for fundamental linear algebra
problems, such as solving Systems of Linear Algebraic Equations
with a focus on large-scale systems, is currently a subject of study. This
research focuses on novel mathematical methods
and scalable algorithms for computationally intensive problems,
such as Monte Carlo and hybrid methods and algorithms.
Optimization of a parallel Monte Carlo method for linear algebra problems
Many problems in science and engineering can be represented by Systems of
Linear Algebraic Equations (SLAEs). Numerical methods such as direct or
iterative ones are used to solve these kinds of systems. Depending on their size
and other characteristics, these systems can sometimes be
very difficult to solve even for iterative methods, requiring a long time and
large amounts of computational resources. In these cases a preconditioning
approach should be applied.
Preconditioning is a technique used to transform a SLAE into an equivalent
but simpler system which requires less time and effort to be solved. The
matrix which performs this transformation is called the preconditioner [7].
There are preconditioners for both direct and iterative methods, but they
are more commonly used with the latter.
In the general case a preconditioned system will require less effort to
be solved than the original one. For example, when an iterative method is
being used, fewer iterations will be required or each iteration will require less
time, depending on the quality and the efficiency of the preconditioner.
There are different classes of preconditioners, but we focus only on
those that are based on the SParse Approximate Inverse (SPAI) approach.
These algorithms are based on the fact that the approximate inverse of a
given SLAE matrix can be used to approximate the solution of the system or to reduce its
complexity.
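The effect of an approximate inverse on an iterative solver can be illustrated with a minimal sketch. It assumes a simple Richardson iteration and a small hand-picked test matrix, neither of which comes from the work summarised here; the preconditioner M is a two-term truncated Neumann series, chosen only for illustration. All function and variable names are hypothetical.

```python
# Sketch: Richardson iteration x <- x + M (b - A x), with and without an
# approximate-inverse preconditioner M. Illustrative names and data only.

def matvec(M, v):
    """Dense matrix-vector product for small list-of-lists matrices."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def richardson(A, b, M=None, tol=1e-10, max_iter=1000):
    """Return (x, iterations). M approximates A^-1; M=None means M = I."""
    x = [0.0] * len(b)
    for k in range(max_iter):
        r = [b_i - ax_i for b_i, ax_i in zip(b, matvec(A, x))]
        if max(abs(r_i) for r_i in r) < tol:
            return x, k
        step = r if M is None else matvec(M, r)
        x = [x_i + s_i for x_i, s_i in zip(x, step)]
    return x, max_iter


# Assumed test system: A = I - C, where C has small entries.
A = [[0.9, -0.2, -0.1],
     [-0.2, 0.9, -0.1],
     [-0.1, -0.1, 0.8]]
b = [1.0, 1.0, 1.0]

# Assumed approximate inverse: M = I + C = 2I - A (two Neumann terms).
n = len(A)
M = [[(2.0 if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]

x_plain, it_plain = richardson(A, b)
x_prec, it_prec = richardson(A, b, M)
print(it_plain, it_prec)  # the preconditioned run needs fewer iterations
```

With this choice the error is multiplied by C at each plain step and by C squared at each preconditioned step, which is why the iteration count drops.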
Monte Carlo methods are probabilistic methods that use random numbers
either to simulate a stochastic behaviour or to estimate the solution of
a problem. They are good candidates for parallelization because
many independent samples are used to estimate the solution. These samples
can be calculated in parallel, thereby speeding up the solution-finding
process [27].
In the past there has been a lot of research on the use of Monte
Carlo methods to calculate SPAI preconditioners [1] [27] [10]. In this work
we present the implementation of a SPAI preconditioner that is based on a
Monte Carlo method. This algorithm calculates the matrix inverse by sampling
a random variable which approximates the Neumann series expansion.
Using the Neumann series, it is possible to calculate the inverse of
a system matrix A by successively adding the powers of a matrix
given by the series expansion of $(I - A)^{-1}$.
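As a deterministic point of reference, the following sketch sums a truncated Neumann series. It relies on the standard identity that, writing C = I − A, the inverse of A equals the sum of the powers of C whenever the spectral radius of C is below 1. The test matrix, the number of terms, and the function names are assumptions made for illustration.

```python
# Sketch: approximate A^-1 by a truncated Neumann series.
# With C = I - A and spectral radius of C below 1: A^-1 = I + C + C^2 + ...

def matmul(X, Y):
    """Dense product of two small list-of-lists matrices."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(row[k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for row in X]


def neumann_inverse(A, terms=60):
    """Return sum_{k < terms} C^k with C = I - A (assumes convergence)."""
    n = len(A)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    C = [[identity[i][j] - A[i][j] for j in range(n)] for i in range(n)]
    total = [row[:] for row in identity]   # k = 0 term
    power = [row[:] for row in identity]   # holds C^k
    for _ in range(1, terms):
        power = matmul(power, C)
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    return total


# Assumed test matrix (not from the source).
A = [[0.9, -0.2, -0.1],
     [-0.2, 0.9, -0.1],
     [-0.1, -0.1, 0.8]]
A_inv = neumann_inverse(A)
check = matmul(A, A_inv)  # should be close to the identity matrix
```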
Given the stochastic approach of the Monte Carlo algorithm, the computational
effort required to find an element of the inverse matrix is independent
of the size of the matrix. This makes it possible to target systems that, due
to their size, can be prohibitive for common deterministic approaches [27].
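A single element of the inverse can be estimated with random walks, in the spirit of the classical Ulam-von Neumann scheme. The sketch below is not the implementation described in this work: the transition probabilities, the fixed walk length, the test matrix, and all names are assumptions. Rows are stored densely for brevity, so each step here costs O(n); a sparse row layout would remove that dependence.

```python
import random

# Sketch: estimate one entry of A^-1 = sum_k C^k (C = I - A) by random walks.


def mc_inverse_entry(A, i, j, walks=20000, max_steps=30, seed=0):
    """Estimate (A^-1)[i][j]; assumes the Neumann series of C converges."""
    n = len(A)
    C = [[(1.0 if r == c else 0.0) - A[r][c] for c in range(n)]
         for r in range(n)]
    abs_rows = [[abs(v) for v in row] for row in C]
    row_sums = [sum(row) for row in abs_rows]
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walks):
        state, weight = i, 1.0
        for _ in range(max_steps):
            if state == j:
                total += weight          # contribution of (C^k)[i][j]
            if row_sums[state] == 0.0:
                break                    # row of C is empty: walk ends
            # Move to nxt with probability |C[state][nxt]| / row_sums[state].
            nxt = rng.choices(range(n), weights=abs_rows[state])[0]
            # Importance weight: C[state][nxt] divided by that probability.
            sign = 1.0 if C[state][nxt] > 0 else -1.0
            weight *= sign * row_sums[state]
            state = nxt
    return total / walks


# Assumed test matrix (not from the source).
A = [[0.9, -0.2, -0.1],
     [-0.2, 0.9, -0.1],
     [-0.1, -0.1, 0.8]]
estimate = mc_inverse_entry(A, 0, 0)
print(estimate)  # exact value of this entry is 0.71 / 0.594
```

Only walks starting from row i are needed for entries of row i, and the walks are independent of one another, which is what makes the approach easy to parallelize.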
A large part of this work is focused on the enhancement of this algorithm.
First, the errors in the existing implementation were fixed, making the algorithm
able to target larger systems. Then multiple optimizations were
applied at different stages of the implementation, making better use of the
resources and improving the performance of the algorithm.
Four optimizations, each with consistent improvements, were performed:
1. An inefficient implementation of the realloc function within the MPI
library was causing the application to run out of memory rapidly.
This function was replaced by the malloc function, together with some slight
modifications to estimate the size of matrix A.
2. A coordinate format (COO) was introduced within the algorithm’s
core to make more efficient use of memory, avoiding several
unnecessary memory accesses.
3. A method to produce an intermediate matrix P was shown to produce
similar results to the default one, with matrix P being reduced to a
single vector and thus requiring less data. Since this data is
broadcast, reducing its size translated into a reduction of the broadcast
time.
4. Four individual procedures which accessed the whole initial matrix
memory were merged into two procedures, thereby reducing the number
of memory accesses.
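The coordinate format mentioned in item 2 stores only the non-zero entries as (row, column, value) triples. The sketch below shows the general idea with a matrix-vector product; the layout as three parallel lists and the function names are assumptions, not the data structures of the implementation described here.

```python
# Sketch: coordinate (COO) storage keeps one (row, col, value) triple per
# non-zero entry, so work is proportional to the number of non-zeros.


def dense_to_coo(dense):
    """Convert a list-of-lists matrix into three parallel COO lists."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0.0:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals


def coo_matvec(rows, cols, vals, x, n_rows):
    """Compute y = A x by touching each stored non-zero exactly once."""
    y = [0.0] * n_rows
    for i, j, v in zip(rows, cols, vals):
        y[i] += v * x[j]
    return y


# Assumed example matrix (not from the source).
dense = [[4.0, 0.0, 0.0],
         [0.0, 3.0, 1.0],
         [2.0, 0.0, 5.0]]
rows, cols, vals = dense_to_coo(dense)
y = coo_matvec(rows, cols, vals, [1.0, 2.0, 3.0], n_rows=3)
print(y)  # [4.0, 9.0, 17.0]
```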
For each optimization applied, a comparison was performed to show the
particular improvements achieved. A set of different matrices, representing
different SLAEs, was used to show the consistency of these improvements.
To provide insight into the scalability issues of the algorithm,
two further approaches are presented:
1. Given that the original version of this algorithm was designed for a
cluster of single-core machines, a hybrid MPI + OpenMP approach
was proposed to target today's multi-core architectures. Surprisingly,
this new approach did not show any improvement, but it was
useful in exposing a scalability problem related to the random pattern
used to access the memory.
2. Given that common MPI implementations of the broadcast operation
do not take into account the different latencies of inter-node and
intra-node communications [25], we decided to implement
the broadcast in two steps: first reaching a single process in each
of the compute nodes, and then using those processes to perform a
local broadcast within their compute nodes. Results for this approach
showed that it could lead to improvements when very large
systems are used.
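The idea behind the two-step broadcast in item 2 can be illustrated with a toy message count. This is not MPI code and not the implementation described here: it assumes a naive model in which the sender transmits one message per receiver, whereas real MPI libraries typically use tree-based schedules. The rank-to-node layout and function names are hypothetical.

```python
# Toy model: count inter-node and intra-node messages for two broadcast
# schedules. rank_to_node[r] is the compute node that hosts rank r.


def flat_broadcast(rank_to_node, root=0):
    """Root sends directly to every other rank, wherever it lives."""
    root_node = rank_to_node[root]
    inter = sum(1 for r, node in enumerate(rank_to_node)
                if r != root and node != root_node)
    intra = len(rank_to_node) - 1 - inter
    return inter, intra


def two_step_broadcast(rank_to_node):
    """Step 1: root sends to one leader on each other node.
    Step 2: root and leaders send to the remaining ranks on their node."""
    n_nodes = len(set(rank_to_node))
    inter = n_nodes - 1
    intra = len(rank_to_node) - n_nodes
    return inter, intra


# Assumed layout: 16 ranks spread over 4 nodes, 4 ranks per node.
layout = [rank // 4 for rank in range(16)]
flat = flat_broadcast(layout)
two_step = two_step_broadcast(layout)
print(flat, two_step)  # (12, 3) (3, 12)
```

Both schedules send the same total number of messages; the two-step variant moves most of them onto the cheaper intra-node links.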
Finally, a comparison is carried out between the optimized version of the
Monte Carlo algorithm and the state-of-the-art Modified SPAI (MSPAI).
Four metrics are used to compare these approaches:
1. The amount of time needed for the preconditioner construction.
2. The time needed by the solver to calculate the solution of the preconditioned
system.
3. The sum of the two previous metrics, which gives an overview of the
quality and efficiency of the preconditioner.
4. The number of cores used in the preconditioner construction. This
gives an idea of the energy efficiency of the algorithm.
Results from this comparison showed that the Monte Carlo algorithm
can deal with both symmetric and nonsymmetric matrices, while MSPAI
only performs well with the nonsymmetric ones. Furthermore, the
Monte Carlo algorithm is always faster in the preconditioner construction
and most of the time also in the solver calculation. This means that Monte
Carlo produces preconditioners of equal or better quality than MSPAI. Finally,
the number of cores used in the Monte Carlo approach is always equal to
or smaller than in the case of MSPAI.
Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond
In this and a set of companion whitepapers, the USQCD Collaboration lays out
a program of science and computing for lattice gauge theory. These whitepapers
describe how calculation using lattice QCD (and other gauge theories) can aid
the interpretation of ongoing and upcoming experiments in particle and nuclear
physics, as well as inspire new ones.
Comment: 44 pages. 1 of USQCD whitepapers
Computational linear algebra over finite fields
We present here algorithms for efficient computation of linear algebra
problems over finite fields
- …