Exact Sparse Matrix-Vector Multiplication on GPU's and Multicore Architectures
We propose different implementations of the sparse matrix-dense vector
multiplication (SpMV) for finite fields and rings Z/mZ. We take advantage of
graphics card processors (GPUs) and multi-core architectures. Our aim is to
improve the speed of SpMV in the LinBox library, and hence the speed of its
black-box algorithms. In addition, we use this kernel, together with a new
parallelization of the sigma-basis algorithm, in a parallel block Wiedemann
rank implementation over finite fields.
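At its core, each of these variants is an ordinary compressed-sparse-row (CSR) matrix-vector product in which every accumulation is reduced modulo m. The serial Python sketch below is purely illustrative; the paper's LinBox and CUDA kernels, data layouts, and modular-reduction strategies are not reproduced here.

```python
def spmv_mod(values, col_idx, row_ptr, x, m):
    """y = A*x over Z/mZ for a sparse matrix A stored in CSR form.

    values, col_idx, row_ptr follow the usual CSR convention.  GPU and
    multicore versions parallelize the outer loop over rows; here the
    reduction modulo m is simply applied once per row.
    """
    n = len(row_ptr) - 1
    y = [0] * n
    for i in range(n):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc % m
    return y

# 3x3 example over Z/17Z with A = [[2,0,5],[0,3,0],[7,0,1]]
values, col_idx, row_ptr = [2, 5, 3, 7, 1], [0, 2, 1, 0, 2], [0, 2, 3, 5]
print(spmv_mod(values, col_idx, row_ptr, [1, 2, 3], 17))  # [0, 6, 10]
```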
Computational linear algebra over finite fields
We present algorithms for the efficient computation of linear algebra problems
over finite fields.
Faster Inversion and Other Black Box Matrix Computations Using Efficient Block Projections
Block projections have been used, in [Eberly et al. 2006], to obtain an
efficient algorithm to find solutions for sparse systems of linear equations. A
bound of softO(n^(2.5)) machine operations is obtained assuming that the input
matrix can be multiplied by a vector with constant-sized entries in softO(n)
machine operations. Unfortunately, the correctness of this algorithm depends on
the existence of efficient block projections, and this has been conjectured. In
this paper we establish the correctness of the algorithm from [Eberly et al.
2006] by proving the existence of efficient block projections over sufficiently
large fields. We demonstrate the usefulness of these projections by deriving
improved bounds for the cost of several matrix problems, considering, in
particular, "sparse" matrices that can be multiplied by a vector using
softO(n) field operations. We show how to compute the inverse of a sparse
matrix over a field F using an expected number of softO(n^(2.27)) operations in
F. A basis for the null space of a sparse matrix, and a certification of its
rank, are obtained at the same cost. An application to Kaltofen and Villard's
Baby-Steps/Giant-Steps algorithms for the determinant and Smith Form of an
integer matrix yields algorithms requiring softO(n^(2.66)) machine operations.
The derived algorithms are all probabilistic of the Las Vegas type.
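For intuition, the black-box model only assumes a routine that applies the matrix A to a vector, and a block projection pairs A with n-by-b blocks U and V so that an algorithm can work with the small projected matrices U^T A^i V instead of A itself. The toy Python sketch below only produces that projected sequence; the paper's contribution, proving that suitably efficient block projections exist, is not addressed, and all names are illustrative.

```python
def block_krylov_sequence(apply_A, U, V, length, p):
    """Return S_i = U^T (A^i V) mod p for i = 0 .. length-1.

    apply_A(v) is the black box returning A*v mod p; U is given as a list of
    rows of U^T and V as a list of columns, all of length n.
    """
    def dot(u, w):
        return sum(a * b for a, b in zip(u, w)) % p

    cols = [v[:] for v in V]                       # current block A^i V
    seq = []
    for _ in range(length):
        seq.append([[dot(u, c) for c in cols] for u in U])
        cols = [apply_A(c) for c in cols]          # advance to A^(i+1) V
    return seq

# black box for a 3x3 sparse matrix over Z/7Z, stored as {(i, j): value}
entries, n, p = {(0, 1): 3, (1, 2): 1, (2, 0): 5}, 3, 7
def apply_A(v):
    out = [0] * n
    for (i, j), a in entries.items():
        out[i] = (out[i] + a * v[j]) % p
    return out

print(block_krylov_sequence(apply_A, [[1, 0, 0], [0, 1, 0]],
                            [[1, 1, 1], [0, 1, 0]], 3, p))
```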
Solving Sparse Integer Linear Systems
We propose a new algorithm to solve sparse linear systems of equations over
the integers. This algorithm is based on a p-adic lifting technique combined
with the use of block matrices with structured blocks. It achieves a sub-cubic
complexity in terms of machine operations subject to a conjecture on the
effectiveness of certain sparse projections. A LinBox-based implementation of
this algorithm is demonstrated, and emphasizes the practical benefits of this
new method over the previous state of the art.
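The p-adic lifting idea itself is compact: solve repeatedly modulo a prime p, divide the residual by p at each step, and recover the rational solution by rational reconstruction at the end. The dense Python sketch below shows only this lifting loop; it omits the block matrices and structured projections that give the paper its sub-cubic complexity, and all function names are illustrative.

```python
from fractions import Fraction
from math import isqrt

def solve_mod_p(A, b, p):
    # Gauss-Jordan elimination over Z/pZ (A assumed nonsingular mod p).
    n = len(A)
    M = [[A[i][j] % p for j in range(n)] + [b[i] % p] for i in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], -1, p)
        M[c] = [v * inv % p for v in M[c]]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [(M[r][j] - f * M[c][j]) % p for j in range(n + 1)]
    return [M[i][n] for i in range(n)]

def rat_recon(a, m):
    # rational reconstruction: find n/d with n ≡ a*d (mod m) and |n|, |d| small
    bound = isqrt(m // 2)
    r0, r1, s0, s1 = m, a % m, 0, 1
    while r1 > bound:
        q = r0 // r1
        r0, r1, s0, s1 = r1, r0 - q * r1, s1, s0 - q * s1
    return (r1, s1) if s1 > 0 else (-r1, -s1)

def dixon_solve(A, b, p=10007, iters=25):
    # p-adic (Dixon) lifting: build x with A*x ≡ b (mod p^iters), then
    # reconstruct the rational solution coordinate by coordinate.
    n = len(A)
    x, r, pk = [0] * n, list(b), 1
    for _ in range(iters):
        xk = solve_mod_p(A, r, p)
        x = [x[i] + xk[i] * pk for i in range(n)]
        Axk = [sum(A[i][j] * xk[j] for j in range(n)) for i in range(n)]
        r = [(r[i] - Axk[i]) // p for i in range(n)]   # exact division by p
        pk *= p
    return [Fraction(*rat_recon(xi % pk, pk)) for xi in x]

print(dixon_solve([[4, 1], [2, 3]], [1, 2]))   # [Fraction(1, 10), Fraction(3, 5)]
```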
Fast Computation of Smith Forms of Sparse Matrices Over Local Rings
We present algorithms to compute the Smith Normal Form of matrices over two
families of local rings.
The algorithms use the black-box model, which is suitable for sparse
and structured matrices. The algorithms depend on a number of tools, such as
matrix rank computation over finite fields, for which the best-known time- and
memory-efficient algorithms are probabilistic.
For an n x n matrix over the ring F[z]/(f^e), where f^e is a power of an
irreducible polynomial f in F[z] of degree d, our algorithm requires
O(eta d e^2 n) operations in F, where our black-box is assumed to
require O(eta) operations in F to compute a matrix-vector product by
a vector over F[z]/(f^e) (and eta is assumed greater than nde). The
algorithm only requires additional storage for O(nde) elements of F.
In particular, if eta = softO(nde), then our algorithm requires only
softO(n^2 d^2 e^3) operations in F, which is an improvement on known dense
methods for small d and e.
For the ring Z/p^eZ, where p is a prime, we give an algorithm which
is time- and memory-efficient when the number of nontrivial invariant factors
is small. We describe a method for dimension reduction while preserving the
invariant factors. The time complexity is essentially linear in mu and r,
where mu is the number of operations in Z/pZ to evaluate the black-box
(assumed greater than n) and r is the total number of non-zero invariant
factors.
To avoid the practical cost of conditioning, we give a Monte Carlo
certificate which, at low cost, provides either a high probability of success
or a proof of failure. The quest for a time- and memory-efficient solution
without restrictions on the number of nontrivial invariant factors remains
open. We offer a conjecture which may contribute toward that end. Comment:
Preliminary version to appear at ISSAC 201
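As a point of reference for the objects being computed, the sketch below finds the nonzero invariant factors of a matrix over Z/p^eZ by dense elimination: every nonzero element of this local ring is a unit times a power of p, so an entry of minimal p-adic valuation can always serve as a pivot. This is the memory-hungry dense baseline rather than the black-box algorithm of the paper, and the helper names are illustrative.

```python
def val(a, p, e):
    # p-adic valuation of a modulo p^e (the valuation of 0 is taken to be e)
    a %= p ** e
    v = 0
    while v < e and a % p == 0:
        a //= p
        v += 1
    return v

def smith_form_local(A, p, e):
    """Nonzero invariant factors of A over Z/p^eZ, by dense elimination."""
    m = p ** e
    A = [[x % m for x in row] for row in A]
    rows, cols = len(A), len(A[0])
    factors, top = [], 0
    while top < min(rows, cols):
        # pivot = entry of minimal p-adic valuation in the trailing block
        v, pi, pj = min((val(A[i][j], p, e), i, j)
                        for i in range(top, rows) for j in range(top, cols))
        if v == e:          # trailing block is zero: remaining factors vanish
            break
        A[top], A[pi] = A[pi], A[top]
        for row in A:
            row[top], row[pj] = row[pj], row[top]
        # scale the pivot row by the unit inverse so the pivot becomes exactly p^v
        uinv = pow(A[top][top] // p ** v, -1, m)
        A[top] = [uinv * x % m for x in A[top]]
        # clear the pivot column, then the pivot row (all entries have valuation >= v)
        for i in range(top + 1, rows):
            f = A[i][top] // p ** v
            A[i] = [(A[i][j] - f * A[top][j]) % m for j in range(cols)]
        for j in range(top + 1, cols):
            f = A[top][j] // p ** v
            A[top][j] = (A[top][j] - f * A[top][top]) % m
        factors.append(p ** v)
        top += 1
    return factors

print(smith_form_local([[2, 4], [6, 8]], 2, 4))   # [2, 4]  (over Z/16Z)
```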
Efficient Computation of the Characteristic Polynomial
This article deals with the computation of the characteristic polynomial of
dense matrices over small finite fields and over the integers. We first present
two algorithms for finite fields: the first is based on Krylov iterates and
Gaussian elimination; we compare it to an improvement of the second algorithm
of Keller-Gehrig. Then we show that a generalization of Keller-Gehrig's third
algorithm could improve both complexity and computational time. We use these
results as a basis for the computation of the characteristic polynomial of
integer matrices. We first use early termination and Chinese remaindering for
dense matrices. Then a probabilistic approach, based on the integer minimal
polynomial and Hensel factorization, proves particularly well suited to sparse
and/or structured matrices.
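One concrete way to see the Chinese-remaindering layer is to compute the characteristic polynomial modulo a few machine-word primes and recombine the coefficients. The sketch below is only illustrative: it uses the Faddeev-LeVerrier recurrence as the modular kernel instead of the Krylov or Keller-Gehrig algorithms studied in the article, and it has no early termination.

```python
def charpoly_mod(A, p):
    # Coefficients of det(xI - A) over Z/pZ, from x^n down to x^0, via the
    # Faddeev-LeVerrier recurrence (requires p > n so that 1..n are invertible).
    n = len(A)
    coeffs = [1]                                   # leading coefficient of x^n
    M = [row[:] for row in A]                      # M_1 = A
    for k in range(1, n + 1):
        c = (-pow(k, -1, p) * sum(M[i][i] for i in range(n))) % p
        coeffs.append(c)                           # coefficient of x^(n-k)
        if k < n:
            for i in range(n):                     # M <- M + c*I
                M[i][i] = (M[i][i] + c) % p
            M = [[sum(A[i][t] * M[t][j] for t in range(n)) % p
                  for j in range(n)] for i in range(n)]   # M <- A*M
    return coeffs

def crt(c1, m1, c2, m2):
    # combine c ≡ c1 (mod m1) and c ≡ c2 (mod m2) into c mod m1*m2
    return (c1 + m1 * ((c2 - c1) * pow(m1, -1, m2) % m2)) % (m1 * m2)

A = [[0, 1], [-2, 3]]                      # characteristic polynomial x^2 - 3x + 2
p1, p2 = 101, 103
img1 = charpoly_mod([[a % p1 for a in row] for row in A], p1)
img2 = charpoly_mod([[a % p2 for a in row] for row in A], p2)
mod = p1 * p2
lifted = [crt(a, p1, b, p2) for a, b in zip(img1, img2)]
print([c - mod if c > mod // 2 else c for c in lifted])   # [1, -3, 2]
```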
Interactive certificate for the verification of Wiedemann's Krylov sequence: application to the certification of the determinant, the minimal and the characteristic polynomials of sparse matrices
Certificates to a linear algebra computation are additional data structures
for each output, which can be used by a (possibly randomized) verification
algorithm that proves the correctness of each output. Wiedemann's algorithm
projects the Krylov sequence obtained by repeatedly multiplying a vector by a
matrix to obtain a linearly recurrent sequence. The minimal polynomial of this
sequence divides the minimal polynomial of the matrix. For instance, if the
input matrix is sparse with n^(1+o(1)) non-zero entries, the
computation of the sequence is quadratic in the dimension of the matrix while
the computation of the minimal polynomial is n^(1+o(1)), once that projected
Krylov sequence is obtained. In this paper we give algorithms that compute
certificates for the Krylov sequence of sparse or structured
matrices over an abstract field, whose Monte Carlo verification complexity can
be made essentially linear. As an application this gives certificates for the
determinant, the minimal and characteristic polynomials of sparse or structured
matrices at the same cost.
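The computation being certified is short enough to sketch: project the Krylov sequence with a fixed vector u to obtain the scalar sequence a_i = u^T A^i v, then run Berlekamp-Massey on it; the minimal polynomial of that sequence divides the minimal polynomial of A. The Python sketch below shows this (uncertified) computation only; the certificates themselves are not implemented, and all names are illustrative.

```python
def berlekamp_massey(s, p):
    """Minimal polynomial (monic, highest degree first) of the linearly
    recurrent sequence s over Z/pZ."""
    C, B, L, m, b = [1], [1], 0, 1, 1
    for n in range(len(s)):
        d = s[n] % p                               # discrepancy
        for i in range(1, L + 1):
            d = (d + C[i] * s[n - i]) % p
        if d == 0:
            m += 1
            continue
        if len(C) < len(B) + m:
            C += [0] * (len(B) + m - len(C))
        coef, T = d * pow(b, -1, p) % p, C[:]
        for i in range(len(B)):                    # C <- C - coef * x^m * B
            C[i + m] = (C[i + m] - coef * B[i]) % p
        if 2 * L <= n:
            L, B, b, m = n + 1 - L, T, d, 1
        else:
            m += 1
    C += [0] * (L + 1 - len(C))
    return C[:L + 1]

def wiedemann_minpoly(apply_A, n, u, v, p):
    # Minimal polynomial of the projected sequence a_i = u^T A^i v; it divides
    # the minimal polynomial of A and equals it with high probability for
    # random u, v.
    seq, w = [], v[:]
    for _ in range(2 * n):
        seq.append(sum(ui * wi for ui, wi in zip(u, w)) % p)
        w = apply_A(w)
    return berlekamp_massey(seq, p)

# 3x3 sparse black box over Z/101Z with minimal polynomial (x-2)(x-3)^2
p, n = 101, 3
entries = {(0, 0): 2, (1, 1): 3, (2, 1): 1, (2, 2): 3}
def apply_A(x):
    y = [0] * n
    for (i, j), a in entries.items():
        y[i] = (y[i] + a * x[j]) % p
    return y

print(wiedemann_minpoly(apply_A, n, [1, 2, 3], [1, 1, 1], p))  # [1, 93, 21, 83]
```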
Solution of Large Sparse System of Linear Equations over GF(2) on a Multi Node Multi GPU Platform
We provide an efficient multi-node, multi-GPU implementation of the Block Wiedemann Algorithm (BWA) to find the solution of a large sparse system of linear equations over GF(2). One of the important applications of solving such systems arises in most integer factorization algorithms, such as the Number Field Sieve. In this paper, we describe how hybrid parallelization can be adapted to speed up the most time-consuming sequence generation stage of BWA. This stage involves generating a sequence of matrix-matrix products and matrix transpose-matrix products where the matrices are very large, highly sparse, and have entries over GF(2). We describe a GPU-accelerated parallel method for the computation of these matrix-matrix products using techniques like row-wise parallel distribution of the first matrix over a multi-node multi-GPU platform using MPI and CUDA, and word-wise XORing of rows of the second matrix. We also describe the hybrid parallelization of the matrix transpose-matrix product computation, where we divide both matrices row-wise into equal-sized blocks using MPI. Then, after GPU-accelerated matrix transpose-matrix product generation, we combine all those blocks using the MPI_BXOR operation in MPI_Reduce to obtain the result. The performance of hybrid parallelization of the sequence generation step on a hybrid cluster using multiple GPUs is compared with parallelization on multiple MPI processors alone. We have used this hybrid parallel sequence generation tool for the benchmarking of an HPC cluster. Detailed timings of the complete solution of Number Field Sieve matrices of RSA-130, RSA-140, and RSA-170 are also compared in this paper, using up to 4 NVIDIA V100 GPUs of a DGX station. We obtained a speedup of 2.8 with 4 V100 GPUs compared to a single GPU.
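The sequence-generation kernel described here reduces to one primitive: a row of the sparse GF(2) matrix selects rows of the dense operand, and those rows, packed into machine words, are XORed together. A serial Python sketch of that primitive follows (illustrative only; the paper's distribution of rows across MPI ranks and GPUs is not shown, and the names are made up).

```python
def pack_rows(M):
    # pack each GF(2) row (a list of 0/1) into one integer, bit j = column j
    return [sum(bit << j for j, bit in enumerate(row)) for row in M]

def spmm_gf2(A_rows, B_packed):
    """C = A * B over GF(2), one packed integer per row of C.

    A_rows[i] lists the column indices of the nonzero entries in row i of the
    sparse matrix A; B_packed[k] is row k of the dense matrix B packed into a
    word.  Row i of the product is the XOR of the B-rows selected by row i of
    A, which is the word-wise XOR kernel that the GPU code parallelizes.
    """
    out = []
    for cols in A_rows:
        acc = 0
        for k in cols:
            acc ^= B_packed[k]
        out.append(acc)
    return out

# A = [[1,0,1],[0,1,1]] stored by nonzero columns; B is 3x2 over GF(2)
A_rows = [[0, 2], [1, 2]]
B = [[1, 0], [1, 1], [0, 1]]
print(spmm_gf2(A_rows, pack_rows(B)))   # [3, 1] -> rows [1,1] and [1,0] of C
```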