Subquadratic computation of vector generating polynomials and improvement of the block Wiedemann algorithm
This paper describes a new algorithm for computing linear generators (vector generating polynomials) for matrix sequences, running in subquadratic time. This algorithm applies in particular to the sequential stage of Coppersmith's block Wiedemann algorithm. Experiments showed that our method can be substituted for the quadratic one proposed by Coppersmith, yielding significant speedups even for realistic matrix sizes. The base fields we were interested in were finite fields of large characteristic. As an example, we were able to compute a linear generator for a sequence of 4×4 matrices of length 242304 defined over GF(2^607) in less than two days on one 667 MHz Alpha EV67 CPU.
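For readers unfamiliar with the object being computed: a linear generator of a scalar sequence is exactly what the Berlekamp-Massey algorithm finds, and Coppersmith's quadratic sequential stage is its matrix generalization. The C sketch below shows the scalar, quadratic baseline over GF(p); it is our own illustration, not the paper's subquadratic algorithm, and the modulus and test sequence are arbitrary toy choices.

```c
/* Scalar Berlekamp-Massey over GF(p): finds the shortest linear
 * recurrence satisfied by s[0..n-1].  This is the quadratic scalar
 * analogue of the sequential stage; the paper computes the matrix
 * (vector-generating-polynomial) version in subquadratic time. */
#include <stdio.h>
#include <stdint.h>

#define P 2147483647ULL     /* toy word-sized prime modulus */
#define MAXN 1024

static uint64_t inv_mod(uint64_t a) {        /* a^(P-2) mod P (Fermat) */
    uint64_t r = 1, e = P - 2;
    while (e) { if (e & 1) r = r * a % P; a = a * a % P; e >>= 1; }
    return r;
}

/* Returns L = degree of the minimal generator; C[0..L] holds it, C[0]=1. */
static int berlekamp_massey(const uint64_t *s, int n, uint64_t *C) {
    uint64_t B[MAXN] = {1}, T[MAXN];
    uint64_t b = 1;
    int L = 0, m = 1;
    for (int i = 0; i < MAXN; i++) C[i] = 0;
    C[0] = 1;
    for (int i = 0; i < n; i++) {
        uint64_t d = 0;                       /* discrepancy at step i */
        for (int j = 0; j <= L; j++) d = (d + C[j] * s[i - j]) % P;
        if (d == 0) { m++; continue; }
        uint64_t coef = d * inv_mod(b) % P;
        if (2 * L <= i) {                     /* generator must grow */
            for (int j = 0; j < MAXN; j++) T[j] = C[j];
            for (int j = 0; j + m < MAXN; j++)
                C[j + m] = (C[j + m] + P - coef * B[j] % P) % P;
            L = i + 1 - L; b = d; m = 1;
            for (int j = 0; j < MAXN; j++) B[j] = T[j];
        } else {                              /* same degree, adjust */
            for (int j = 0; j + m < MAXN; j++)
                C[j + m] = (C[j + m] + P - coef * B[j] % P) % P;
            m++;
        }
    }
    return L;
}

int main(void) {
    uint64_t s[8] = {1, 1, 2, 3, 5, 8, 13, 21};   /* Fibonacci mod P */
    uint64_t C[MAXN];
    int L = berlekamp_massey(s, 8, C);
    /* Expect x^2 - x - 1, i.e. C = [1, P-1, P-1]. */
    printf("generator degree %d: C = [1, %llu, %llu]\n", L,
           (unsigned long long)C[1], (unsigned long long)C[2]);
    return 0;
}
```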
Computing the Characteristic Polynomial of a Finite Rank Two Drinfeld Module
Motivated by finding analogues of elliptic curve point counting techniques,
we introduce one deterministic and two new Monte Carlo randomized algorithms to
compute the characteristic polynomial of a finite rank-two Drinfeld module. We
compare their asymptotic complexity to that of previous algorithms given by
Gekeler, Narayanan and Garai-Papikian and discuss their practical behavior. In
particular, we find that all three approaches represent either an improvement
in complexity or an expansion of the parameter space over which the algorithm
may be applied. Some experimental results are also presented.
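As a quick reminder of the object being computed (our notation, following the elliptic-curve analogy the abstract alludes to; see Gekeler's work for precise statements):

```latex
% For a rank-two Drinfeld module \phi over a finite field L = \mathbb{F}_{q^n},
% the Frobenius endomorphism \tau^n satisfies a quadratic characteristic
% polynomial with coefficients in A = \mathbb{F}_q[T]:
P_\phi(X) = X^2 - a\,X + b, \qquad a, b \in A, \qquad P_\phi(\tau^n) = 0
\ \text{in } \operatorname{End}(\phi),
% the analogue of X^2 - tX + q for the Frobenius of an elliptic curve over
% \mathbb{F}_q.  Computing a and b is the point-counting analogue that the
% algorithms in this paper address.
```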
Solution of Large Sparse System of Linear Equations over GF(2) on a Multi Node Multi GPU Platform
We provide an efficient multi-node, multi-GPU implementation of the Block Wiedemann Algorithm (BWA) to find the solution of a large sparse system of linear equations over GF(2). One of the important applications of solving such systems arises in most integer factorization algorithms like the Number Field Sieve. In this paper, we describe how hybrid parallelization can be adapted to speed up the most time-consuming sequence generation stage of BWA. This stage involves generating a sequence of matrix-matrix products and matrix transpose-matrix products where the matrices are very large, highly sparse, and have entries over GF(2). We describe a GPU-accelerated parallel method for the computation of these matrix-matrix products using techniques like row-wise parallel distribution of the first matrix over a multi-node, multi-GPU platform using MPI and CUDA, and word-wise XORing of rows of the second matrix. We also describe the hybrid parallelization of the matrix transpose-matrix product computation, where we divide both matrices row-wise into equal-sized blocks using MPI. After a GPU-accelerated matrix transpose-matrix product generation, we combine all the blocks using the MPI_BXOR operation in MPI_Reduce to obtain the result. The performance of hybrid parallelization of the sequence generation step on a hybrid cluster using multiple GPUs is compared with parallelization on multiple MPI processors alone. We have used this hybrid parallel sequence generation tool for benchmarking an HPC cluster. Detailed timings of the complete solution of Number Field Sieve matrices of RSA-130, RSA-140, and RSA-170 are also compared, using up to 4 NVIDIA V100 GPUs of a DGX station. We obtained a speedup of 2.8 on 4 V100 GPUs compared to 1 GPU.
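A minimal plain-C sketch of the two GF(2) kernels the abstract describes, under our own toy data layout (the paper's version runs the inner loops as CUDA kernels; matrix contents and widths here are placeholders). Rows of dense matrices are packed 64 columns per 64-bit word, so GF(2) addition is a word-wise XOR, and per-rank blocks of the transpose-product are combined with MPI_BXOR inside MPI_Reduce:

```c
/* Hedged sketch, not the paper's code.  Run with: mpirun -np 1 ./a.out
 * (the toy data is replicated on every rank). */
#include <mpi.h>
#include <stdint.h>
#include <stdio.h>

#define WORDS 1                        /* 64 packed columns (toy width) */
#define NBITS (64 * WORDS)

/* y = A*V for a sparse row block A in CSR form (entries are 1 over GF(2)):
 * for every nonzero A[i][k], XOR row k of V into row i of y. */
static void spmv_block(int nrows, const int *rowptr, const int *colidx,
                       uint64_t V[][WORDS], uint64_t y[][WORDS]) {
    for (int i = 0; i < nrows; i++) {
        for (int w = 0; w < WORDS; w++) y[i][w] = 0;
        for (int e = rowptr[i]; e < rowptr[i + 1]; e++)
            for (int w = 0; w < WORDS; w++)
                y[i][w] ^= V[colidx[e]][w];       /* XOR = GF(2) add */
    }
}

/* P ^= Y^T * V over GF(2) for row blocks Y, V with nrows rows each:
 * bit r of Y's row i set means row i of V contributes to row r of P. */
static void tmm_block(int nrows, uint64_t Y[][WORDS],
                      uint64_t V[][WORDS], uint64_t P[][WORDS]) {
    for (int i = 0; i < nrows; i++)
        for (int r = 0; r < NBITS; r++)
            if ((Y[i][r / 64] >> (r % 64)) & 1)
                for (int w = 0; w < WORDS; w++)
                    P[r][w] ^= V[i][w];
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Toy local row block: 2 rows, nonzeros in cols {0,1} and {1}. */
    int rowptr[3] = {0, 2, 3}, colidx[3] = {0, 1, 1};
    uint64_t V[2][WORDS] = {{5}, {9}}, Y[2][WORDS];
    spmv_block(2, rowptr, colidx, V, Y);

    /* Row blocks contribute additively over GF(2), so the global
     * transpose-product is the XOR of the per-rank partial blocks. */
    static uint64_t P[NBITS][WORDS] = {{0}}, total[NBITS][WORDS];
    tmm_block(2, Y, V, P);
    MPI_Reduce(P, total, NBITS * WORDS, MPI_UINT64_T, MPI_BXOR,
               0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("P[0][0] = %llx\n", (unsigned long long)total[0][0]);

    MPI_Finalize();
    return 0;
}
```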
A kilobit hidden SNFS discrete logarithm computation
We perform a special number field sieve discrete logarithm computation in a
1024-bit prime field. To our knowledge, this is the first kilobit-sized
discrete logarithm computation ever reported for prime fields. This computation
took a little over two months of calendar time on an academic cluster using the
open-source CADO-NFS software. Our chosen prime p looks random, and p-1 has a 160-bit prime factor, in line with recommended parameters for the Digital Signature Algorithm. However, our p has been trapdoored in such a way that the special number field sieve can be used to compute discrete logarithms in GF(p), yet detecting that p has this trapdoor seems out of reach.
Twenty-five years ago, there was considerable controversy around the
possibility of back-doored parameters for DSA. Our computations show that
trapdoored primes are entirely feasible with current computing technology. We
also describe special number field sieve discrete log computations carried out
for multiple weak primes found in use in the wild. As can be expected from a
trapdoor mechanism which we say is hard to detect, our research did not reveal
any trapdoored prime in wide use. The only way for a user to defend against a
hypothetical trapdoor of this kind is to require verifiably random primes.
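To make the DSA-parameter claim concrete: the visible structure of such a prime (p prime, with a 160-bit prime q dividing p-1) is easy to verify, but by the paper's argument that check says nothing about a hidden SNFS trapdoor. A hedged GMP sketch with toy stand-in values (real parameters would use a 1024-bit p and a 160-bit q):

```c
/* Illustration only, not the paper's tooling; toy values stand in for
 * the published (p, q). */
#include <gmp.h>
#include <stdio.h>

int main(void) {
    mpz_t p, q, r;
    mpz_inits(p, q, r, NULL);
    mpz_set_str(p, "29", 10);   /* toy stand-in for the 1024-bit prime p */
    mpz_set_str(q, "7", 10);    /* toy stand-in for the 160-bit  prime q */

    mpz_sub_ui(r, p, 1);        /* r = p - 1 */
    int ok = mpz_probab_prime_p(p, 30) && mpz_probab_prime_p(q, 30)
          && mpz_divisible_p(r, q);          /* q | p - 1 */
    /* Real DSA parameters additionally require mpz_sizeinbase(q, 2) == 160
     * and a 1024-bit p; the toy values here just exercise the check. */
    printf("DSA-style structure: %s\n", ok ? "valid" : "invalid");

    mpz_clears(p, q, r, NULL);
    return 0;
}
```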
Faster Sparse Matrix Inversion and Rank Computation in Finite Fields
We improve the current best running time value to invert sparse matrices over
finite fields, lowering it to an expected O(n^{2.2131}) time for the
current values of fast rectangular matrix multiplication. We achieve the same
running time for the computation of the rank and nullspace of a sparse matrix
over a finite field. This improvement relies on two key techniques. First, we
adopt the decomposition of an arbitrary matrix into block Krylov and Hankel
matrices from Eberly et al. (ISSAC 2007). Second, we show how to recover the
explicit inverse of a block Hankel matrix using low displacement rank
techniques for structured matrices and fast rectangular matrix multiplication
algorithms. We generalize our inversion method to block structured matrices
with other displacement operators and strengthen the best known upper bounds
for explicit inversion of block Toeplitz-like and block Hankel-like matrices,
as well as for explicit inversion of block Vandermonde-like matrices with
structured blocks. As a further application, we improve the complexity of
several algorithms in topological data analysis and in finite group theory.
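The "low displacement rank" ingredient can be stated in one line (standard notation, ours rather than the paper's):

```latex
% Let Z be the down-shift matrix (ones on the subdiagonal, zeros elsewhere).
% For a Hankel matrix H = (h_{i+j}), the displacement
\nabla(H) \;=\; Z H - H Z^{\mathsf{T}}
% vanishes except in its first row and first column, because
% (ZH)_{ij} = (HZ^{\mathsf{T}})_{ij} = h_{i+j-1} for all i, j \ge 1; hence
\operatorname{rank} \nabla(H) \;\le\; 2 .
% For block-Hankel matrices with b-by-b blocks the same argument gives
% displacement rank O(b), which is what the fast explicit inversion exploits.
```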
Solving Quadratic Equations with XL on Parallel Architectures - extended version
Solving a system of multivariate quadratic equations (MQ) is an NP-complete problem whose complexity estimates are relevant to many cryptographic scenarios. In some cases it is required in the best known attack; sometimes it is a generic attack (such as for the multivariate PKCs), and sometimes it determines a provable level of security (such as for the QUAD stream ciphers).
Under reasonable assumptions, the best way to solve generic MQ systems is the XL algorithm implemented with a sparse matrix solver such as Wiedemann's algorithm. Knowing how much time an implementation of this attack requires gives us a good idea of how future cryptosystems related to MQ can be broken, similar to how implementations of the General Number Field Sieve that factor smaller RSA numbers give us more insight into the security of actual RSA-based cryptosystems.
This paper describes such an implementation of XL using the block
Wiedemann algorithm. In 5 days we are able to solve a system with 32 variables and 64 equations over (a computation of about bit operations) on a small cluster of 8 nodes, each with 8 CPU cores and 36 GB of RAM. We do not expect system solvers of the F4/F5 family to accomplish this due to their much higher memory demand. Our software also offers implementations for and and can be easily adapted to other small fields. More importantly, it scales nicely for small clusters, NUMA machines, and a combination of both.
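For context, the XL step that feeds the block Wiedemann solver can be summarized as follows (standard description of XL, our notation):

```latex
% XL at degree D: starting from quadratic equations f_1 = \dots = f_m = 0,
% generate all products of each f_i with monomials of degree at most D-2,
\mathcal{E}_D \;=\; \{\, x^{\alpha} f_i \;:\; \deg x^{\alpha} \le D - 2,\ 1 \le i \le m \,\},
% then linearize: treat every monomial of degree at most D as a fresh
% unknown and solve the resulting Macaulay system.  Each row of the
% Macaulay matrix has only as many nonzeros as the corresponding f_i has
% terms, so the system is extremely sparse -- which is why a sparse solver
% such as block Wiedemann is the method of choice.
```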