Analysis of Parallel Montgomery Multiplication in CUDA
For a given level of security, elliptic curve cryptography (ECC) offers improved efficiency over classic public key implementations. Point multiplication is the most common operation in ECC and, consequently, any significant improvement in performance will likely require accelerating point multiplication. In ECC, the Montgomery algorithm is widely used for point multiplication. The primary purpose of this project is to implement and analyze a parallel implementation of the Montgomery algorithm as it is used in ECC. Specifically, the performance of CPU-based Montgomery multiplication and a GPU-based implementation in CUDA are compared.
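As a rough illustration of the operation named in the title, the following is a minimal sequential sketch of Montgomery modular multiplication (the REDC reduction) in Python. It is not the project's CUDA implementation, and it assumes that "Montgomery multiplication" here means the modular-multiplication technique rather than the Montgomery ladder. The function names `montgomery_setup`, `redc`, and `mont_mul` and the parameter `k` (with R = 2^k) are hypothetical choices made for this example. It requires Python 3.8 or later for `pow(n, -1, r)`.

```python
def montgomery_setup(n, k):
    """Precompute constants for an odd modulus n and R = 2**k > n (illustrative helper)."""
    if n % 2 == 0 or (1 << k) <= n:
        raise ValueError("n must be odd and smaller than 2**k")
    r = 1 << k
    # n_prime satisfies n * n_prime ≡ -1 (mod R)
    n_prime = (-pow(n, -1, r)) % r
    return r, n_prime


def redc(t, n, k, n_prime):
    """Montgomery reduction: return t * R^{-1} mod n for 0 <= t < n * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask   # m = (t mod R) * n' mod R
    u = (t + m * n) >> k                # t + m*n is divisible by R, so the shift is exact
    return u - n if u >= n else u


def mont_mul(a, b, n, k):
    """Compute a * b mod n by way of the Montgomery domain (sequential sketch)."""
    r, n_prime = montgomery_setup(n, k)
    a_bar = (a * r) % n                          # map operands into Montgomery form
    b_bar = (b * r) % n
    prod_bar = redc(a_bar * b_bar, n, k, n_prime)  # Montgomery form of a*b
    return redc(prod_bar, n, k, n_prime)           # map back to an ordinary residue
```

In practice the conversion into and out of Montgomery form is amortized over many multiplications, which is why the technique suits long chains of modular multiplications such as those in point multiplication; a single call like `mont_mul(7, 9, 13, 4)` only demonstrates correctness.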
Stable normal forms for polynomial system solving
This paper describes and analyzes a method for computing border bases of a
zero-dimensional ideal. The criterion used in the computation involves
specific commutation polynomials and leads to an algorithm and an
implementation extending the one provided in [MT'05]. This general border basis
algorithm weakens the monomial ordering requirement for Gröbner basis
computations. It is, to date, the most general setting for representing
quotient algebras, embedding into a single formalism Gröbner bases, Macaulay
bases and new representations that do not fit into the previous categories. With
this formalism we show how the syzygies of the border basis are generated by
commutation relations. We also show that our construction of normal form is
stable under small perturbations of the ideal, if the number of solutions
remains constant. This new feature for a symbolic algorithm has a huge impact
on practical efficiency, as illustrated by the experiments on
classical benchmark polynomial systems at the end of the paper.
Neural computation of arithmetic functions
A neuron is modeled as a linear threshold gate, and the network architecture considered is the layered feedforward network. It is shown how common arithmetic functions such as multiplication and sorting can be efficiently computed in a shallow neural network. Some known results are improved by showing that the product of two n-bit numbers and sorting of n n-bit numbers can be computed by a polynomial-size neural network using only four and five unit delays, respectively. Moreover, the weights of each threshold element in the neural networks require O(log n)-bit (instead of n-bit) accuracy. These results can be extended to more complicated functions such as multiple products, division, rational functions, and approximation of analytic functions.
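To make the neuron model concrete, here is a small Python sketch of a linear threshold gate and a two-layer feedforward network built from such gates. It only illustrates the model described in the abstract; it does not reproduce the paper's constructions for multiplication or sorting. The names `threshold_gate`, `majority`, and `xor2`, and the particular weights and thresholds, are assumptions chosen for the example.

```python
def threshold_gate(weights, threshold, inputs):
    """Linear threshold gate: output 1 iff the weighted input sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def majority(bits):
    """Depth-1 network: a single gate with unit weights computes majority."""
    n = len(bits)
    return threshold_gate([1] * n, n // 2 + 1, bits)


def xor2(a, b):
    """Depth-2 network: XOR is not computable by one threshold gate, but two layers suffice."""
    h_or = threshold_gate([1, 1], 1, [a, b])    # layer 1: OR
    h_and = threshold_gate([1, 1], 2, [a, b])   # layer 1: AND
    # layer 2: fires when OR is on and AND is off
    return threshold_gate([1, -1], 1, [h_or, h_and])
```

Each layer contributes one unit delay, so `xor2` has depth two in the sense used above; the weights here are small integers, the kind of low-precision weights the abstract's O(log n)-bit accuracy claim concerns.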
A Randomized Sublinear Time Parallel GCD Algorithm for the EREW PRAM
We present a randomized parallel algorithm that computes the greatest common
divisor of two integers of n bits in length with probability 1-o(1) that takes
O(n loglog n / log n) expected time using n^{6+\epsilon} processors on the EREW
PRAM parallel model of computation. We believe this to be the first randomized
sublinear time algorithm on the EREW PRAM for this problem.