Computational linear algebra over finite fields
We present algorithms for the efficient solution of linear algebra problems over finite fields.
A Study of High Performance Multiple Precision Arithmetic on Graphics Processing Units
Multiple precision (MP) arithmetic is a core building block of a wide variety of algorithms in computational mathematics and computer science. In mathematics, MP is used in computational number theory, geometric computation, experimental mathematics, and some random matrix problems. In computer science, MP arithmetic is primarily used in cryptographic algorithms: securing communications, digital signatures, and code breaking. In most of these application areas, the factor that limits performance is the MP arithmetic itself. The focus of our research is to build and analyze highly optimized libraries that allow MP operations to be offloaded from the CPU to the GPU. Our goal is to achieve an order-of-magnitude improvement over the CPU in three key metrics: operations per second per socket, operations per watt, and operations per second per dollar. What we find is that the SIMD design and the balance of compute, cache, and bandwidth resources on the GPU are quite different from the CPU, so libraries such as GMP cannot simply be ported to the GPU. New approaches and algorithms are required to achieve high performance and high utilization of GPU resources. Further, we find that low-level ISA differences between GPU generations mean that an approach that works well on one generation might not run well on the next.
Here we report on our progress towards MP arithmetic libraries on the GPU in four areas: (1) large integer addition, subtraction, and multiplication; (2) high performance modular multiplication and modular exponentiation (the key operations for cryptographic algorithms) across generations of GPUs; (3) high precision floating point addition, subtraction, multiplication, division, and square root; (4) parallel short division, which we prove is asymptotically optimal on EREW and CREW PRAMs.
High performance SIMD modular arithmetic for polynomial evaluation
Two essential problems in Computer Algebra, namely polynomial factorization and polynomial greatest common divisor computation, can be solved efficiently via multiple evaluations of polynomials in two variables using modular arithmetic. In this article, we focus on the efficient computation of such polynomial evaluations on a single CPU core. We first show how to leverage SIMD computing for modular arithmetic on AVX2 and AVX-512 units, using both intrinsics and OpenMP compiler directives. We then increase the operational intensity and exploit instruction-level parallelism to improve the compute efficiency of these polynomial evaluations. Altogether, this yields performance gains of up to about 5x on AVX2 and 10x on AVX-512.