7,560 research outputs found
Efficient long division via Montgomery multiply
We present a novel right-to-left long division algorithm based on the
Montgomery modular multiply, consisting of separate highly efficient loops with
simply carry structure for computing first the remainder (x mod q) and then the
quotient floor(x/q). These loops are ideally suited for the case where x
occupies many more machine words than the divide modulus q, and are strictly
linear time in the "bitsize ratio" lg(x)/lg(q). For the paradigmatic
performance test of multiword dividend and single 64-bit-word divisor,
exploitation of the inherent data-parallelism of the algorithm effectively
mitigates the long latency of hardware integer MUL operations, as a result of
which we are able to achieve respective costs for remainder-only and full-DIV
(remainder and quotient) of 6 and 12.5 cycles per dividend word on the Intel
Core 2 implementation of the x86_64 architecture, in single-threaded execution
mode. We further describe a simple "bit-doubling modular inversion" scheme,
which allows the entire iterative computation of the mod-inverse required by
the Montgomery multiply at arbitrarily large precision to be performed with
cost less than that of a single Newtonian iteration performed at the full
precision of the final result. We also show how the Montgomery-multiply-based
powering can be efficiently used in Mersenne and Fermat-number trial
factorization via direct computation of a modular inverse power of 2, without
any need for explicit radix-mod scalings.Comment: 23 pages; 8 tables v2: Tweak formatting, pagecount -= 2. v3: Fix
incorrect powers of R in formulae [7] and [11] v4: Add Eldridge & Walter ref.
v5: Clarify relation between Algos A/A',D and Hensel-div; clarify
true-quotient mechanics; Add Haswell timings, refs to Agner Fog timings pdf
and GMP asm-timings ref-page. v6: Remove stray +bw in MULL line of Algo D
listing; add note re byte-LUT for qinv_
Generalised Mersenne Numbers Revisited
Generalised Mersenne Numbers (GMNs) were defined by Solinas in 1999 and
feature in the NIST (FIPS 186-2) and SECG standards for use in elliptic curve
cryptography. Their form is such that modular reduction is extremely efficient,
thus making them an attractive choice for modular multiplication
implementation. However, the issue of residue multiplication efficiency seems
to have been overlooked. Asymptotically, using a cyclic rather than a linear
convolution, residue multiplication modulo a Mersenne number is twice as fast
as integer multiplication; this property does not hold for prime GMNs, unless
they are of Mersenne's form. In this work we exploit an alternative
generalisation of Mersenne numbers for which an analogue of the above property
--- and hence the same efficiency ratio --- holds, even at bitlengths for which
schoolbook multiplication is optimal, while also maintaining very efficient
reduction. Moreover, our proposed primes are abundant at any bitlength, whereas
GMNs are extremely rare. Our multiplication and reduction algorithms can also
be easily parallelised, making our arithmetic particularly suitable for
hardware implementation. Furthermore, the field representation we propose also
naturally protects against side-channel attacks, including timing attacks,
simple power analysis and differential power analysis, which is essential in
many cryptographic scenarios, in constrast to GMNs.Comment: 32 pages. Accepted to Mathematics of Computatio
Integer Division by Constants: Optimal Bounds
The integer division of a numerator n by a divisor d gives a quotient q and a
remainder r. Optimizing compilers accelerate software by replacing the division
of n by d with the division of c * n (or c * n + c) by m for convenient
integers c and m chosen so that they approximate the reciprocal: c/m ~= 1/d.
Such techniques are especially advantageous when m is chosen to be a power of
two and when d is a constant so that c and m can be precomputed. The literature
contains many bounds on the distance between c/m and the divisor d. Some of
these bounds are optimally tight, while others are not. We present optimally
tight bounds for quotient and remainder computations
Elliptic Curve Cryptography on Modern Processor Architectures
Abstract
Elliptic Curve Cryptography (ECC) has been adopted by the US National Security Agency (NSA) in Suite "B" as part of its "Cryptographic Modernisation Program ". Additionally,
it has been favoured by an entire host of mobile devices due to its superior performance characteristics. ECC is also the building block on which the exciting field of pairing/identity based cryptography is based. This widespread use means that there is potentially a lot to be gained by researching efficient implementations on modern processors such as IBM's Cell Broadband Engine and Philip's next generation smart card cores. ECC operations can be thought of as a pyramid of building blocks, from instructions on a core, modular operations on a finite field, point addition & doubling, elliptic curve scalar
multiplication to application level protocols. In this thesis we examine an implementation of these components for ECC focusing on a range of optimising techniques for the Cell's SPU and the MIPS smart card. We show significant performance improvements that can be achieved through of adoption of EC
Deterministic elliptic curve primality proving for a special sequence of numbers
We give a deterministic algorithm that very quickly proves the primality or
compositeness of the integers N in a certain sequence, using an elliptic curve
E/Q with complex multiplication by the ring of integers of Q(sqrt(-7)). The
algorithm uses O(log N) arithmetic operations in the ring Z/NZ, implying a bit
complexity that is quasi-quadratic in log N. Notably, neither of the classical
"N-1" or "N+1" primality tests apply to the integers in our sequence. We
discuss how this algorithm may be applied, in combination with sieving
techniques, to efficiently search for very large primes. This has allowed us to
prove the primality of several integers with more than 100,000 decimal digits,
the largest of which has more than a million bits in its binary representation.
At the time it was found, it was the largest proven prime N for which no
significant partial factorization of N-1 or N+1 is known.Comment: 16 pages, corrected a minor sign error in 5.
- …