    Exact Sparse Matrix-Vector Multiplication on GPU's and Multicore Architectures

    We propose different implementations of sparse matrix--dense vector multiplication (SpMV) over finite fields and the rings $\mathbb{Z}/m\mathbb{Z}$. We take advantage of graphics processing units (GPUs) and multi-core architectures. Our aim is to improve the speed of SpMV in the LinBox library, and hence the speed of its black-box algorithms. We also use this, together with a new parallelization of the sigma-basis algorithm, in a parallel block Wiedemann rank implementation over finite fields.
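    A minimal sequential sketch of the kernel in question, SpMV with all arithmetic reduced modulo $m$, is given below for orientation; the CSR layout and function name are illustrative and do not reflect LinBox's actual kernels or their GPU ports.

```python
# Illustrative sketch only: sparse matrix - dense vector product over Z/mZ,
# with the matrix in CSR form. LinBox's real kernels (and the GPU variants
# in the paper) are organized differently.

def spmv_mod(indptr, indices, data, x, m):
    """Return y = A x (mod m) for A given in CSR form."""
    n = len(indptr) - 1
    y = [0] * n
    for i in range(n):
        acc = 0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i] = acc % m   # one modular reduction per row
    return y

# 3x3 example over Z/7Z
indptr  = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data    = [1, 3, 5, 2, 4]
print(spmv_mod(indptr, indices, data, [1, 2, 3], 7))   # [3, 3, 0]
```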

    Computational linear algebra over finite fields

    We present algorithms for the efficient solution of linear algebra problems over finite fields.

    Faster Inversion and Other Black Box Matrix Computations Using Efficient Block Projections

    Block projections were used in [Eberly et al. 2006] to obtain an efficient algorithm for solving sparse systems of linear equations. A bound of $\tilde{O}(n^{2.5})$ machine operations was obtained under the assumption that the input matrix can be multiplied by a vector with constant-sized entries in $\tilde{O}(n)$ machine operations. Unfortunately, the correctness of that algorithm depends on the existence of efficient block projections, which had only been conjectured. In this paper we establish the correctness of the algorithm from [Eberly et al. 2006] by proving the existence of efficient block projections over sufficiently large fields. We demonstrate the usefulness of these projections by deriving improved bounds for the cost of several matrix problems, considering, in particular, "sparse" matrices that can be multiplied by a vector using $\tilde{O}(n)$ field operations. We show how to compute the inverse of a sparse matrix over a field $F$ using an expected number of $\tilde{O}(n^{2.27})$ operations in $F$. A basis for the null space of a sparse matrix, and a certification of its rank, are obtained at the same cost. An application to Kaltofen and Villard's Baby-Steps/Giant-Steps algorithms for the determinant and Smith form of an integer matrix yields algorithms requiring $\tilde{O}(n^{2.66})$ machine operations. The derived algorithms are all probabilistic of the Las Vegas type.
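    To make the role of block projections concrete, the toy sketch below computes a block-Krylov sequence $U^T A^i V$ over GF($p$), but with plain random dense blocks; the point of the paper is that suitably structured "efficient" blocks, applicable in $\tilde{O}(n)$ operations, can stand in for these. All names are illustrative.

```python
import random

# Toy block-Krylov sequence U^T A^i V over GF(p), i = 0..steps-1, with
# unstructured random blocks U and V (the paper's efficient block
# projections are structured so that applying them is cheap).

def block_krylov_sequence(apply_A, n, s, steps, p, seed=0):
    rng = random.Random(seed)
    U = [[rng.randrange(p) for _ in range(s)] for _ in range(n)]  # n x s
    V = [[rng.randrange(p) for _ in range(s)] for _ in range(n)]  # n x s
    seq, W = [], [row[:] for row in V]
    for _ in range(steps):
        # record the s x s block U^T W (mod p)
        seq.append([[sum(U[r][i] * W[r][j] for r in range(n)) % p
                     for j in range(s)] for i in range(s)])
        # advance every column of W by one application of A
        cols = [apply_A([W[r][j] for r in range(n)]) for j in range(s)]
        W = [[cols[j][r] for j in range(s)] for r in range(n)]
    return seq

# Example: a 3x3 matrix over GF(101), applied densely for simplicity
p = 101
A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
apply_A = lambda v: [sum(a * x for a, x in zip(row, v)) % p for row in A]
print(block_krylov_sequence(apply_A, n=3, s=2, steps=4, p=p)[0])  # U^T V
```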

    Solving Sparse Integer Linear Systems

    We propose a new algorithm to solve sparse linear systems of equations over the integers. The algorithm is based on a $p$-adic lifting technique combined with the use of block matrices with structured blocks. It achieves sub-cubic complexity in terms of machine operations, subject to a conjecture on the effectiveness of certain sparse projections. A LinBox-based implementation of the algorithm is demonstrated, emphasizing the practical benefits of the new method over the previous state of the art.
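    For orientation, here is a hedged sketch of the classical Dixon-style $p$-adic lifting loop that such algorithms build on. The paper's structured block matrices and the final rational reconstruction are omitted; instead, the toy below lifts far enough that an assumed small integral solution can be read off in the symmetric range.

```python
# Sketch of p-adic lifting for A x = b over the integers (Dixon's scheme).
# Dense Gauss-Jordan stands in for the paper's structured-block machinery.

def matvec(A, x, mod=None):
    y = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return [v % mod for v in y] if mod else y

def inverse_mod_p(A, p):
    """Gauss-Jordan inverse of A modulo a prime p (A assumed invertible mod p)."""
    n = len(A)
    M = [[a % p for a in row] + [int(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c])
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], -1, p)
        M[c] = [v * inv % p for v in M[c]]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [(v - f * w) % p for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]

def dixon_lift(A, b, p=10007, iters=8):
    n = len(A)
    Ainv = inverse_mod_p(A, p)
    x, r, pk = [0] * n, b[:], 1
    for _ in range(iters):
        d = matvec(Ainv, r, p)          # next p-adic digit: A d = r (mod p)
        x = [xi + pk * di for xi, di in zip(x, d)]
        r = [(ri - ai) // p for ri, ai in zip(r, matvec(A, d))]  # exact division
        pk *= p
    # symmetric-range read-off (valid only for small integral solutions)
    return [xi if xi <= pk // 2 else xi - pk for xi in x]

A, b = [[2, 1], [1, 3]], [8, 9]
print(dixon_lift(A, b))   # [3, 2]
```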

    Fast Computation of Smith Forms of Sparse Matrices Over Local Rings

    We present algorithms to compute the Smith normal form of matrices over two families of local rings. The algorithms use the black-box model, which is suitable for sparse and structured matrices. They depend on a number of tools, such as matrix rank computation over finite fields, for which the best-known time- and memory-efficient algorithms are probabilistic. For an $n \times n$ matrix $A$ over the ring $F[z]/(f^e)$, where $f^e$ is a power of an irreducible polynomial $f \in F[z]$ of degree $d$, our algorithm requires $O(\eta d e^2 n)$ operations in $F$, where our black box is assumed to require $O(\eta)$ operations in $F$ to compute a matrix-vector product by a vector over $F[z]/(f^e)$ (and $\eta$ is assumed greater than $den$). The algorithm only requires additional storage for $O(den)$ elements of $F$. In particular, if $\eta = \tilde{O}(den)$, then our algorithm requires only $\tilde{O}(n^2 d^2 e^3)$ operations in $F$, which is an improvement on known dense methods for small $d$ and $e$. For the ring $\mathbb{Z}/p^e\mathbb{Z}$, where $p$ is a prime, we give an algorithm which is time- and memory-efficient when the number of nontrivial invariant factors is small. We describe a method for dimension reduction while preserving the invariant factors. The time complexity is essentially linear in $\mu n r e \log p$, where $\mu$ is the number of operations in $\mathbb{Z}/p\mathbb{Z}$ to evaluate the black box (assumed greater than $n$) and $r$ is the total number of non-zero invariant factors. To avoid the practical cost of conditioning, we give a Monte Carlo certificate which, at low cost, provides either a high probability of success or a proof of failure. The quest for a time- and memory-efficient solution without restrictions on the number of nontrivial invariant factors remains open. We offer a conjecture which may contribute toward that end.
    Comment: Preliminary version to appear at ISSAC 201
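    To illustrate the black-box model itself (not the paper's algorithms), the sketch below shows the only access such algorithms get to the matrix: a matrix-vector product, here over $\mathbb{Z}/p^e\mathbb{Z}$. Class and function names are hypothetical.

```python
# A "black box" exposes a matrix only through its action on vectors, which
# is what keeps black-box algorithms memory-efficient for sparse inputs.

class BlackBox:
    def __init__(self, n, apply_fn):
        self.n = n              # dimension
        self.apply = apply_fn   # v -> A v over the coefficient ring

def sparse_blackbox(triples, n, pe):
    """Wrap a COO list [(i, j, a), ...] as a black box over Z/p^e Z."""
    def apply_fn(v):
        y = [0] * n
        for i, j, a in triples:
            y[i] = (y[i] + a * v[j]) % pe
        return y
    return BlackBox(n, apply_fn)

# Rank, minimal-polynomial, or Smith-form routines would only call bb.apply,
# never inspect entries, so storage stays at O(n) ring elements per vector.
bb = sparse_blackbox([(0, 0, 1), (1, 2, 3), (2, 1, 2)], 3, 5**2)
print(bb.apply([1, 1, 1]))   # [1, 3, 2]
```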

    Efficient Computation of the Characteristic Polynomial

    This article deals with the computation of the characteristic polynomial of dense matrices over small finite fields and over the integers. We first present two algorithms for finite fields: the first is based on Krylov iterates and Gaussian elimination, and we compare it to an improvement of the second algorithm of Keller-Gehrig. We then show that a generalization of Keller-Gehrig's third algorithm could improve both the complexity and the computational time. We use these results as a basis for the computation of the characteristic polynomial of integer matrices: we first use early termination and Chinese remaindering for dense matrices, and then a probabilistic approach, based on the integer minimal polynomial and Hensel factorization, which is particularly well suited to sparse and/or structured matrices.
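    As a small illustration of the Krylov approach (under a strong simplifying assumption, and not the article's algorithm), if $v$ happens to be a cyclic vector for $A$ over GF($p$), the characteristic polynomial can be read off by expressing $A^n v$ in the Krylov basis $v, Av, \dots, A^{n-1}v$:

```python
# Toy Krylov-based characteristic polynomial over GF(p). Assumes the Krylov
# matrix [v | Av | ... | A^{n-1} v] is invertible, i.e. v is cyclic for A;
# in that case the minimal polynomial of (A, v) is the characteristic one.

def solve_mod_p(M, rhs, p):
    """Solve M y = rhs over GF(p) by Gauss-Jordan (M assumed invertible)."""
    n = len(M)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        piv = next(r for r in range(c, n) if aug[r][c])
        aug[c], aug[piv] = aug[piv], aug[c]
        inv = pow(aug[c][c], -1, p)
        aug[c] = [v * inv % p for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c]:
                f = aug[r][c]
                aug[r] = [(v - f * w) % p for v, w in zip(aug[r], aug[c])]
    return [aug[i][n] for i in range(n)]

def charpoly_krylov(A, v, p):
    n = len(A)
    krylov = [v[:]]
    for _ in range(n):                 # build v, Av, ..., A^n v
        w = krylov[-1]
        krylov.append([sum(a * x for a, x in zip(row, w)) % p for row in A])
    K = [[krylov[j][i] for j in range(n)] for i in range(n)]  # Krylov columns
    c = solve_mod_p(K, krylov[n], p)   # A^n v = sum_j c_j A^j v
    # char(x) = x^n - c_{n-1} x^{n-1} - ... - c_0, coefficients low-to-high
    return [(-ci) % p for ci in c] + [1]

A = [[0, 1], [1, 1]]                   # over GF(5)
print(charpoly_krylov(A, [1, 0], 5))   # [4, 4, 1], i.e. x^2 - x - 1
```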

    Interactive certificate for the verification of Wiedemann's Krylov sequence: application to the certification of the determinant, the minimal and the characteristic polynomials of sparse matrices

    Certificates for a linear algebra computation are additional data structures for each output, which can be used by a (possibly randomized) verification algorithm that proves the correctness of each output. Wiedemann's algorithm projects the Krylov sequence obtained by repeatedly multiplying a vector by a matrix to obtain a linearly recurrent sequence. The minimal polynomial of this sequence divides the minimal polynomial of the matrix. For instance, if the $n \times n$ input matrix is sparse with $n^{1+o(1)}$ non-zero entries, the computation of the sequence is quadratic in the dimension of the matrix, while the computation of the minimal polynomial is $n^{1+o(1)}$ once that projected Krylov sequence is obtained. In this paper we give algorithms that compute certificates for the Krylov sequence of sparse or structured $n \times n$ matrices over an abstract field, whose Monte Carlo verification complexity can be made essentially linear. As an application, this gives certificates for the determinant and the minimal and characteristic polynomials of sparse or structured matrices at the same cost.
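    For context, the sketch below produces the object being certified, the projected sequence $a_i = u^T A^i v$, and recovers its minimal polynomial with a textbook Berlekamp-Massey routine over GF($p$); the interactive certificates themselves are not shown. Names are illustrative.

```python
# Projected Krylov sequence a_i = u^T A^i v and its minimal polynomial
# via Berlekamp-Massey over GF(p).

def krylov_scalar_sequence(apply_A, u, v, length, p):
    seq, w = [], v[:]
    for _ in range(length):
        seq.append(sum(ui * wi for ui, wi in zip(u, w)) % p)
        w = apply_A(w)
    return seq

def berlekamp_massey(seq, p):
    """Monic minimal polynomial (coefficients low-to-high) of a linearly
    recurrent sequence over GF(p)."""
    C, B = [1], [1]        # current and previous connection polynomials
    L, m, b = 0, 1, 1
    for n, s in enumerate(seq):
        d = s % p
        for i in range(1, L + 1):      # discrepancy of the next term
            d = (d + C[i] * seq[n - i]) % p
        if d == 0:
            m += 1
            continue
        T, coef = C[:], d * pow(b, -1, p) % p
        if len(C) < len(B) + m:
            C += [0] * (len(B) + m - len(C))
        for i, bi in enumerate(B):     # C(x) -= coef * x^m * B(x)
            C[i + m] = (C[i + m] - coef * bi) % p
        if 2 * L <= n:
            L, B, b, m = n + 1 - L, T, d, 1
        else:
            m += 1
    return list(reversed(C[:L + 1]))   # reversed connection poly = minimal poly

fib = [0, 1, 1, 2, 3, 5, 1, 6]         # Fibonacci mod 7
print(berlekamp_massey(fib, 7))        # [6, 6, 1], i.e. x^2 - x - 1 (mod 7)
```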

    Solution of Large Sparse System of Linear Equations over GF(2) on a Multi Node Multi GPU Platform

    We provide an efficient multi-node, multi-GPU implementation of the Block Wiedemann Algorithm (BWA) to find the solution of a large sparse system of linear equations over GF(2). One of the important applications of solving such systems arises in most integer factorization algorithms, such as the Number Field Sieve. In this paper, we describe how hybrid parallelization can be adapted to speed up the most time-consuming stage of BWA, sequence generation. This stage generates a sequence of matrix-matrix products and matrix transpose-matrix products in which the matrices are very large, highly sparse, and have entries over GF(2). We describe a GPU-accelerated parallel method for computing these matrix-matrix products using techniques such as row-wise distribution of the first matrix over a multi-node, multi-GPU platform with MPI and CUDA, and word-wise XORing of rows of the second matrix. We also describe the hybrid parallelization of the matrix transpose-matrix product computation: we divide both matrices row-wise into equal-sized blocks using MPI, and after GPU-accelerated generation of each block product, we combine the blocks with the MPI_BXOR operation in MPI_Reduce to obtain the result. The performance of the hybrid parallelization of the sequence generation step on a hybrid cluster using multiple GPUs is compared with parallelization on multiple MPI processors alone. We have used this hybrid parallel sequence generation tool for benchmarking an HPC cluster. Detailed timings of the complete solution of Number Field Sieve matrices for RSA-130, RSA-140, and RSA-170 are also compared, using up to 4 NVIDIA V100 GPUs of a DGX station. We obtained a speedup of 2.8 on 4 V100 GPUs relative to 1 GPU.
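    A single-node sketch of the core GF(2) kernel described above: with rows bit-packed into words (arbitrary-size Python integers here), row i of AB is the XOR of the rows of B selected by the set bits of row i of A. The MPI distribution, CUDA kernels, and MPI_BXOR reduction are omitted.

```python
# Bit-packed matrix product over GF(2): addition mod 2 is XOR, so a whole
# row combination costs one XOR per machine word instead of one add per bit.

def gf2_matmul(A_rows, B_rows):
    """A_rows, B_rows: lists of ints, bit j of A_rows[i] holds A[i][j]."""
    C_rows = []
    for a in A_rows:
        acc, j = 0, 0
        while a:
            if a & 1:
                acc ^= B_rows[j]   # word-wise XOR of a selected row of B
            a >>= 1
            j += 1
        C_rows.append(acc)
    return C_rows

# A = [[1,1],[0,1]], B = [[1,0],[1,1]] over GF(2); A B = [[0,1],[1,1]]
A = [0b11, 0b10]
B = [0b01, 0b11]
print([format(r, '02b') for r in gf2_matmul(A, B)])   # ['10', '11']
```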