Computing the R of the QR factorization of tall and skinny matrices using MPI_Reduce
A QR factorization of a tall and skinny matrix with n columns can be represented as a reduction. The operation used along the reduction tree takes as input two n-by-n upper triangular matrices and produces as output the n-by-n upper triangular matrix defined as the R factor of the two input matrices stacked one on top of the other. This operation is binary, associative, and commutative. We can therefore leverage the MPI library by using user-defined MPI operations and MPI_Reduce to perform this reduction. The resulting code is compact and portable. In this context, the user relies on the MPI library to select a reduction tree appropriate for the underlying architecture.
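
To make the idea concrete, here is a minimal sketch of how such a reduction can be wired up with a user-defined MPI operation. This is not the paper's code: the column count N, the placeholder local R factors, and the helper stack_and_qr are illustrative assumptions; the combine step simply stacks two R factors and calls LAPACKE_dgeqrf on the stack.

    #include <mpi.h>
    #include <lapacke.h>
    #include <string.h>

    #define N 4   /* number of columns (small illustrative value) */

    /* User-defined MPI reduction: combine two N-by-N upper triangular R factors
     * by stacking them into a 2N-by-N matrix and taking the R factor of the stack. */
    static void stack_and_qr(void *in, void *inout, int *len, MPI_Datatype *dt)
    {
        (void)dt;
        for (int m = 0; m < *len; ++m) {
            double *Rin = (double *)in    + (size_t)m * N * N;
            double *Rio = (double *)inout + (size_t)m * N * N;
            double stacked[2 * N * N], tau[N];
            /* Column-major stacking: column j of 'in' on top of column j of 'inout'. */
            for (int j = 0; j < N; ++j) {
                memcpy(&stacked[j * 2 * N],     &Rin[j * N], N * sizeof(double));
                memcpy(&stacked[j * 2 * N + N], &Rio[j * N], N * sizeof(double));
            }
            LAPACKE_dgeqrf(LAPACK_COL_MAJOR, 2 * N, N, stacked, 2 * N, tau);
            /* Keep only the leading N-by-N upper triangle: the combined R factor. */
            for (int j = 0; j < N; ++j)
                for (int i = 0; i < N; ++i)
                    Rio[j * N + i] = (i <= j) ? stacked[j * 2 * N + i] : 0.0;
        }
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Placeholder local R factor (in practice: the R of this process's block of rows). */
        double Rlocal[N * N] = {0.0}, Rglobal[N * N];
        for (int i = 0; i < N; ++i) Rlocal[i * N + i] = 1.0 + rank;

        /* One reduction element = one N-by-N matrix of doubles. */
        MPI_Datatype rtype;
        MPI_Type_contiguous(N * N, MPI_DOUBLE, &rtype);
        MPI_Type_commit(&rtype);

        MPI_Op op;
        MPI_Op_create(stack_and_qr, /* commute = */ 1, &op);
        MPI_Reduce(Rlocal, Rglobal, 1, rtype, op, 0, MPI_COMM_WORLD);

        MPI_Op_free(&op);
        MPI_Type_free(&rtype);
        MPI_Finalize();
        return 0;
    }

The MPI library is then free to choose whatever reduction tree it deems best for the machine; the user code never prescribes one.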
The Problem with the Linpack Benchmark Matrix Generator
We characterize the matrix sizes for which the Linpack Benchmark matrix generator constructs a matrix with identical columns.
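
The mechanism behind such repeated columns is easy to reproduce with a toy generator (this is deliberately not the Linpack generator; the sizes, the period P, and the recurrence below are made-up illustrative values): when an m-by-n matrix is filled column by column from a pseudo-random stream of period P, columns i and j coincide whenever (j - i) * m is a multiple of P.

    #include <stdio.h>

    #define M     8    /* rows (hypothetical) */
    #define NCOLS 6    /* columns (hypothetical) */
    #define P     16   /* period of the toy stream (hypothetical) */

    static unsigned state = 1;
    static double next_value(void)          /* stream with period P */
    {
        state = (state + 1) % P;
        return (double)state / P;
    }

    int main(void)
    {
        double a[M * NCOLS];
        for (int j = 0; j < NCOLS; ++j)     /* column-major fill */
            for (int i = 0; i < M; ++i)
                a[j * M + i] = next_value();

        /* Columns 0 and 2 are identical here because 2 * M = 16 = P. */
        int identical = 1;
        for (int i = 0; i < M; ++i)
            if (a[0 * M + i] != a[2 * M + i]) identical = 0;
        printf("columns 0 and 2 identical: %s\n", identical ? "yes" : "no");
        return 0;
    }

With identical columns the generated matrix is exactly singular, which is what makes this property problematic for a benchmark that factors the matrix.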
Algorithmic Based Fault Tolerance Applied to High Performance Computing
We present a new approach to fault tolerance for High Performance Computing systems. Our approach is based on a careful adaptation of the Algorithmic Based Fault Tolerance technique (Huang and Abraham, 1984) to the needs of parallel distributed computation. We obtain a strongly scalable mechanism for fault tolerance. We can also detect and correct errors (bit flips) on the fly during a computation. To assess the viability of our approach, we have developed a fault-tolerant matrix-matrix multiplication subroutine and we propose models to predict its running time. Our parallel fault-tolerant matrix-matrix multiplication scores 1.4 TFLOPS on 484 processors (cluster jacquard.nersc.gov) and returns a correct result even when one process failure has occurred. This represents 65% of the machine's peak efficiency and less than 12% overhead with respect to the fastest failure-free implementation. We predict (and have observed) that, as the processor count increases, the overhead of the fault tolerance drops significantly.
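
As a rough illustration of the checksum idea behind Huang and Abraham's technique (a sequential toy, not the paper's parallel fault-tolerant subroutine; the matrices, the injected error, and the tolerance are illustrative assumptions): A is extended with a checksum row and B with a checksum column, so the extended product carries row and column sums of C that locate and correct a single corrupted entry.

    #include <stdio.h>
    #include <math.h>

    #define N   3
    #define TOL 1e-9

    int main(void)
    {
        double A[N][N] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 10}};
        double B[N][N] = {{2, 0, 1}, {1, 3, 0}, {0, 1, 4}};
        double Ac[N + 1][N], Br[N][N + 1], Cf[N + 1][N + 1] = {{0.0}};

        /* Augment A with a column-checksum row and B with a row-checksum column. */
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j) { Ac[i][j] = A[i][j]; Br[i][j] = B[i][j]; }
        for (int j = 0; j < N; ++j) {
            Ac[N][j] = 0.0;
            for (int i = 0; i < N; ++i) Ac[N][j] += A[i][j];
        }
        for (int i = 0; i < N; ++i) {
            Br[i][N] = 0.0;
            for (int j = 0; j < N; ++j) Br[i][N] += B[i][j];
        }

        /* The augmented product Cf contains C = A*B plus its row and column sums. */
        for (int i = 0; i <= N; ++i)
            for (int j = 0; j <= N; ++j)
                for (int k = 0; k < N; ++k)
                    Cf[i][j] += Ac[i][k] * Br[k][j];

        Cf[1][2] += 5.0;   /* inject a single silent error (e.g. a bit flip) */

        /* Detection/correction: the one row and the one column whose sums disagree
         * with their checksums intersect at the corrupted entry. */
        int bad_i = -1, bad_j = -1;
        double delta = 0.0;
        for (int i = 0; i < N; ++i) {
            double s = 0.0;
            for (int j = 0; j < N; ++j) s += Cf[i][j];
            if (fabs(s - Cf[i][N]) > TOL) { bad_i = i; delta = s - Cf[i][N]; }
        }
        for (int j = 0; j < N; ++j) {
            double s = 0.0;
            for (int i = 0; i < N; ++i) s += Cf[i][j];
            if (fabs(s - Cf[N][j]) > TOL) bad_j = j;
        }
        if (bad_i >= 0 && bad_j >= 0) {
            Cf[bad_i][bad_j] -= delta;
            printf("corrected C[%d][%d] by %g\n", bad_i, bad_j, -delta);
        }
        return 0;
    }

Here the error is injected after the product is formed for simplicity; the point of carrying the checksums through the multiplication itself is that errors occurring during the computation are caught as well.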
Computing the Conditioning of the Components of a Linear Least Squares Solution
In this paper, we address the accuracy of the results for the overdetermined full rank linear least squares problem. We recall theoretical results obtained in Arioli, Baboulin and Gratton, SIMAX 29(2):413--433, 2007, on the conditioning of the least squares solution and of the components of the solution when the matrix perturbations are measured in the Frobenius or spectral norm. We then define computable estimates for these condition numbers and interpret them in terms of statistical quantities. In particular, we show that, in the classical linear statistical model, the ratio of the variance of one component of the solution to the variance of the right-hand side is exactly the condition number of this solution component when perturbations on the right-hand side are considered. We also provide code fragments using LAPACK routines to compute the variance-covariance matrix and the least squares conditioning, and we give the corresponding computational cost. Finally, we present a small historical numerical example that was used by Laplace in Théorie Analytique des Probabilités, 1820, for computing the mass of Jupiter, together with experiments from the space industry using real physical data.
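
The flavor of such code fragments can be sketched as follows (a hedged sketch, not the paper's fragments: the data, sizes, and the particular use of DGELS/DPOTRI here are assumptions). After the least squares solve, the upper triangle of A holds the R factor of its QR factorization, and since A^T A = R^T R, LAPACKE_dpotri applied to R yields (A^T A)^{-1}; scaling by the residual-based variance estimate gives the variance-covariance matrix, whose diagonal carries the variances of the individual solution components.

    #include <stdio.h>
    #include <lapacke.h>

    #define M  5   /* rows    (illustrative) */
    #define NC 3   /* columns (illustrative) */

    int main(void)
    {
        /* Column-major storage: a small quadratic-fit problem made up for the sketch. */
        double A[M * NC] = {
            1, 1, 1, 1,  1,      /* column 0: ones */
            1, 2, 3, 4,  5,      /* column 1: t    */
            1, 4, 9, 16, 25      /* column 2: t^2  */
        };
        double b[M] = {1.1, 1.9, 3.2, 3.9, 5.1};

        /* Solve min ||b - A x||_2; A is overwritten by its QR factorization
         * (R in the upper triangle) and b by the solution followed by the
         * residual components. */
        LAPACKE_dgels(LAPACK_COL_MAJOR, 'N', M, NC, 1, A, M, b, M);

        double rss = 0.0;
        for (int i = NC; i < M; ++i) rss += b[i] * b[i];
        double sigma2 = rss / (M - NC);   /* unbiased variance estimate */

        /* (A^T A)^{-1} from the R factor: A^T A = R^T R, so R acts as a Cholesky
         * factor and dpotri inverts A^T A in place (upper triangle). */
        LAPACKE_dpotri(LAPACK_COL_MAJOR, 'U', NC, A, M);

        /* Variance-covariance matrix = sigma2 * (A^T A)^{-1}; its diagonal gives
         * the variance of each solution component. */
        for (int i = 0; i < NC; ++i)
            printf("x[%d] = % .6f   var(x[%d]) = % .6e\n",
                   i, b[i], i, sigma2 * A[i + i * M]);
        return 0;
    }

Under the statistical interpretation recalled above, dividing such a component variance by the variance of the right-hand side recovers the condition number of that component with respect to right-hand side perturbations.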
Tiled QR factorization algorithms
This work revisits existing algorithms for the QR factorization of rectangular matrices composed of p-by-q tiles, where p >= q. Within this framework, we study the critical paths and performance of algorithms such as Sameh and Kuck, Modi and Clarke, Greedy, and those found within PLASMA. Although neither Modi and Clarke nor Greedy is optimal, both are shown to be asymptotically optimal for all matrices of size p = q^2 f(q), where f is any function such that \lim_{q \to +\infty} f(q) = 0. This novel and important complexity result applies to all matrices where p and q are proportional, p = \lambda q, with \lambda >= 1, thereby encompassing many important situations in practice (least squares). We provide an extensive set of experiments that show the superiority of the new algorithms for tall matrices.
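
The role of the elimination tree can be illustrated with a toy step count (unit-cost eliminations on a single column of p tiles; the paper's analysis instead weights the individual kernels and treats the whole p-by-q tile grid, which is where the asymptotic results above come from):

    #include <stdio.h>

    /* Flat tree: annihilate one tile per step against the top tile. */
    static int flat_tree_steps(int p)   { return p - 1; }

    /* Binary reduction tree: pair up surviving tiles at every step. */
    static int binary_tree_steps(int p)
    {
        int steps = 0;
        while (p > 1) { p = (p + 1) / 2; ++steps; }
        return steps;                    /* = ceil(log2(p)) */
    }

    int main(void)
    {
        for (int p = 2; p <= 1024; p *= 4)
            printf("p = %4d   flat = %4d   binary = %2d\n",
                   p, flat_tree_steps(p), binary_tree_steps(p));
        return 0;
    }

The gap between the two trees grows quickly with p, which is the tall-matrix regime in which the experiments report the new algorithms winning.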
