2,096 research outputs found
Generalised Mersenne Numbers Revisited
Generalised Mersenne Numbers (GMNs) were defined by Solinas in 1999 and
feature in the NIST (FIPS 186-2) and SECG standards for use in elliptic curve
cryptography. Their form is such that modular reduction is extremely efficient,
thus making them an attractive choice for modular multiplication
implementation. However, the issue of residue multiplication efficiency seems
to have been overlooked. Asymptotically, using a cyclic rather than a linear
convolution, residue multiplication modulo a Mersenne number is twice as fast
as integer multiplication; this property does not hold for prime GMNs, unless
they are of Mersenne's form. In this work we exploit an alternative
generalisation of Mersenne numbers for which an analogue of the above property
--- and hence the same efficiency ratio --- holds, even at bitlengths for which
schoolbook multiplication is optimal, while also maintaining very efficient
reduction. Moreover, our proposed primes are abundant at any bitlength, whereas
GMNs are extremely rare. Our multiplication and reduction algorithms can also
be easily parallelised, making our arithmetic particularly suitable for
hardware implementation. Furthermore, the field representation we propose also
naturally protects against side-channel attacks, including timing attacks,
simple power analysis and differential power analysis, which is essential in
many cryptographic scenarios, in constrast to GMNs.Comment: 32 pages. Accepted to Mathematics of Computatio
An algorithmic and architectural study on Montgomery exponentiation in RNS
The modular exponentiation on large numbers is computationally intensive. An effective way for performing this operation consists in using Montgomery exponentiation in the Residue Number System (RNS). This paper presents an algorithmic and architectural study of such exponentiation approach. From the algorithmic point of view, new and state-of-the-art opportunities that come from the reorganization of operations and precomputations are considered. From the architectural perspective, the design opportunities offered by well-known computer arithmetic techniques are studied, with the aim of developing an efficient arithmetic cell architecture. Furthermore, since the use of efficient RNS bases with a low Hamming weight are being considered with ever more interest, four additional cell architectures specifically tailored to these bases are developed and the tradeoff between benefits and drawbacks is carefully explored. An overall comparison among all the considered algorithmic approaches and cell architectures is presented, with the aim of providing the reader with an extensive overview of the Montgomery exponentiation opportunities in RNS
Realizing arbitrary-precision modular multiplication with a fixed-precision multiplier datapath
Within the context of cryptographic hardware, the term scalability refers to the ability to process operands of any size, regardless of the precision of the underlying data path or registers. In this paper we present a simple yet effective technique for increasing the scalability of a fixed-precision Montgomery multiplier. Our idea is to extend the datapath of a Montgomery multiplier in such a way that it can also perform an ordinary multiplication of two n-bit operands (without modular reduction), yielding a 2n-bit result. This
conventional (nxn->2n)-bit multiplication is then used as a “sub-routine” to realize arbitrary-precision Montgomery multiplication according to standard software algorithms such as Coarsely Integrated Operand Scanning (CIOS). We
show that performing a 2n-bit modular multiplication on an n-bit multiplier can be done in 5n clock cycles, whereby we assume that the n-bit modular multiplication takes n cycles. Extending a Montgomery multiplier for this extra
functionality requires just some minor modifications of the datapath and entails a slight increase in silicon area
Efficient long division via Montgomery multiply
We present a novel right-to-left long division algorithm based on the
Montgomery modular multiply, consisting of separate highly efficient loops with
simply carry structure for computing first the remainder (x mod q) and then the
quotient floor(x/q). These loops are ideally suited for the case where x
occupies many more machine words than the divide modulus q, and are strictly
linear time in the "bitsize ratio" lg(x)/lg(q). For the paradigmatic
performance test of multiword dividend and single 64-bit-word divisor,
exploitation of the inherent data-parallelism of the algorithm effectively
mitigates the long latency of hardware integer MUL operations, as a result of
which we are able to achieve respective costs for remainder-only and full-DIV
(remainder and quotient) of 6 and 12.5 cycles per dividend word on the Intel
Core 2 implementation of the x86_64 architecture, in single-threaded execution
mode. We further describe a simple "bit-doubling modular inversion" scheme,
which allows the entire iterative computation of the mod-inverse required by
the Montgomery multiply at arbitrarily large precision to be performed with
cost less than that of a single Newtonian iteration performed at the full
precision of the final result. We also show how the Montgomery-multiply-based
powering can be efficiently used in Mersenne and Fermat-number trial
factorization via direct computation of a modular inverse power of 2, without
any need for explicit radix-mod scalings.Comment: 23 pages; 8 tables v2: Tweak formatting, pagecount -= 2. v3: Fix
incorrect powers of R in formulae [7] and [11] v4: Add Eldridge & Walter ref.
v5: Clarify relation between Algos A/A',D and Hensel-div; clarify
true-quotient mechanics; Add Haswell timings, refs to Agner Fog timings pdf
and GMP asm-timings ref-page. v6: Remove stray +bw in MULL line of Algo D
listing; add note re byte-LUT for qinv_
Notes on the Riemann Hypothesis
These notes were written from a series of lectures given in March 2010 at the
Universidad Complutense of Madrid and then in Barcelona for the centennial
anniversary of the Spanish Mathematical Society (RSME). Our aim is to give an
introduction to the Riemann Hypothesis and a panoramic view of the world of
zeta and L-functions. We first review Riemann's foundational article and
discuss the mathematical background of the time and his possible motivations
for making his famous conjecture. We discuss some of the most relevant
developments after Riemann that have contributed to a better understanding of
the conjecture.Comment: 2 sections added, 55 pages, 6 figure
A high-speed integrated circuit with applications to RSA Cryptography
Merged with duplicate record 10026.1/833 on 01.02.2017 by CS (TIS)The rapid growth in the use of computers and networks in government, commercial and
private communications systems has led to an increasing need for these systems to be
secure against unauthorised access and eavesdropping. To this end, modern computer
security systems employ public-key ciphers, of which probably the most well known is the
RSA ciphersystem, to provide both secrecy and authentication facilities.
The basic RSA cryptographic operation is a modular exponentiation where the modulus
and exponent are integers typically greater than 500 bits long. Therefore, to obtain reasonable
encryption rates using the RSA cipher requires that it be implemented in hardware.
This thesis presents the design of a high-performance VLSI device, called the WHiSpER
chip, that can perform the modular exponentiations required by the RSA cryptosystem
for moduli and exponents up to 506 bits long. The design has an expected throughput
in excess of 64kbit/s making it attractive for use both as a general RSA processor within
the security function provider of a security system, and for direct use on moderate-speed
public communication networks such as ISDN.
The thesis investigates the low-level techniques used for implementing high-speed arithmetic
hardware in general, and reviews the methods used by designers of existing modular
multiplication/exponentiation circuits with respect to circuit speed and efficiency.
A new modular multiplication algorithm, MMDDAMMM, based on Montgomery arithmetic,
together with an efficient multiplier architecture, are proposed that remove the
speed bottleneck of previous designs.
Finally, the implementation of the new algorithm and architecture within the WHiSpER
chip is detailed, along with a discussion of the application of the chip to ciphering and key
generation
- …