29,718 research outputs found
Bipartite Modular Multiplication
This paper proposes a new fast method for calculating modular multiplication. The calculation is performed using a new represen- tation of residue classes modulo M that enables the splitting of the multiplier into two parts. These two parts are then processed separately, in parallel, potentially doubling the calculation speed. The upper part and the lower part of the multiplier are processed using the interleaved modular multiplication algorithm and the Montgomery algorithm respectively. Conversions back and forth between the original integer set and the new residue system can be performed at speeds up to twice that of the Montgomery method without the need for precomputed constants. This new method is suitable for both hardware implementation; and software implementation in a multiprocessor environment. Although this paper is focusing on the application of the new method in the integer eld, the technique used to speed up the calculation can also easily be adapted for operation in the binary extended eld GF(2m)
Montgomery Algorithm Implementation on an Embedded System for a 256-bit Input Size
The Montgomery multiplication is a leading method to compute modular multiplications faster over large prime fields. Numerous algorithms in number theory use Montgomery multiplication computations. This fast data processing makes it appealing to cryptosystem analysis. The objective of this work is to implement the Montgomery algorithm on an embedded system. For this application, the following 256-bit arithmetic functions were executed in the MCUXpresso IDE software: adder, subtraction, multiplication, and Barret reduction. The obtained results in the FRDM-K64F board show the Montgomery form values, and the product out of the Montgomery domain. The operations computed in the embedded board also demonstrate that the applied algorithms are congruent with the values obtained in C programming, Python, and the FRDM-K64F board.ITESO, A. C
Improving Cryptographic Architectures by Adopting Efficient Adders in their Modular Multiplication Hardware VLSI
This work studies and compares different modular multiplication algorithms with emphases on the underlying binary adders. The method of interleaving multiplication and reduction, Montgomery’s method, and high-radix method were studied using the carry-save adder, carry-lookahead adder and carry-skip adder. Two recent implementations of the first two methods were modeled and synthesized for practical analysis. A modular multiplier following Koc’s implementation [6] based on carry-save adders and the use of carry-skip adders in the final addition step is expected to be of a fast speed with fair area requirement and reduced power consumption
Improving Cryptographic Architectures by Adopting Efficient Adders in their Modular Multiplication Hardware VLSI
This work studies and compares different modular multiplication algorithms with emphases on the underlying binary adders. The method of interleaving multiplication and reduction, Montgomery’s method, and high-radix method were studied using the carry-save adder, carry-lookahead adder and carry-skip adder. Two recent implementations of the first two methods were modeled and synthesized for practical analysis. A modular multiplier following Koc’s implementation [6] based on carry-save adders and the use of carry-skip adders in the final addition step is expected to be of a fast speed with fair area requirement and reduced power consumption
An introspective algorithm for the integer determinant
We present an algorithm computing the determinant of an integer matrix A. The
algorithm is introspective in the sense that it uses several distinct
algorithms that run in a concurrent manner. During the course of the algorithm
partial results coming from distinct methods can be combined. Then, depending
on the current running time of each method, the algorithm can emphasize a
particular variant. With the use of very fast modular routines for linear
algebra, our implementation is an order of magnitude faster than other existing
implementations. Moreover, we prove that the expected complexity of our
algorithm is only O(n^3 log^{2.5}(n ||A||)) bit operations in the dense case
and O(Omega n^{1.5} log^2(n ||A||) + n^{2.5}log^3(n||A||)) in the sparse case,
where ||A|| is the largest entry in absolute value of the matrix and Omega is
the cost of matrix-vector multiplication in the case of a sparse matrix.Comment: Published in Transgressive Computing 2006, Grenade : Espagne (2006
Parametric, Secure and Compact Implementation of RSA on FPGA
We present a fast, efficient, and parameterized modular multiplier and a secure exponentiation circuit especially intended for FPGAs on the low end of the price range. The design utilizes dedicated block multipliers as the main functional unit and Block-RAM as storage unit for the operands. The adopted design methodology allows adjusting the number of multipliers, the radix used in the multipliers, and number of words to meet the system requirements such as
available resources, precision and timing constraints. The architecture, based on the Montgomery modular multiplication algorithm, utilizes a pipelining technique that allows concurrent operation of hardwired multipliers. Our
design completes 1020-bit and 2040-bit modular multiplications in 7.62 μs and 27.0 μs, respectively. The multiplier uses a moderate amount of system resources while achieving the best area-time product in literature. 2040-bit modular exponentiation engine can easily fit into Xilinx Spartan-3E 500; moreover the exponentiation circuit withstands known side channel attacks
Enhancing an Embedded Processor Core with a Cryptographic Unit for Performance and Security
We present a set of low-cost architectural enhancements to accelerate the execution of certain arithmetic operations common in cryptographic applications on an extensible embedded processor core. The proposed enhancements are generic in the sense that they can be beneficially applied in almost any RISC processor. We implemented the enhancements in form of a cryptographic unit (CU) that offers the programmer an extended instruction set. The CU features a 128-bit wide register file and datapath, which enables it to process 128-bit words and perform 128-bit loads/stores. We analyze the speed-up factors for some arithmetic operations and public-key cryptographic algorithms obtained through
these enhancements. In addition, we evaluate the hardware overhead (i.e. silicon area) of integrating the CU into an embedded RISC processor. Our experimental results show that the proposed architectural enhancements allow for a
significant performance gain for both RSA and ECC at the expense of an acceptable increase in silicon area. We also demonstrate that the proposed enhancements facilitate the protection of cryptographic algorithms against certain types of side-channel attacks and present an AES implementation
hardened against cache-based attacks as a case study
Generalised Mersenne Numbers Revisited
Generalised Mersenne Numbers (GMNs) were defined by Solinas in 1999 and
feature in the NIST (FIPS 186-2) and SECG standards for use in elliptic curve
cryptography. Their form is such that modular reduction is extremely efficient,
thus making them an attractive choice for modular multiplication
implementation. However, the issue of residue multiplication efficiency seems
to have been overlooked. Asymptotically, using a cyclic rather than a linear
convolution, residue multiplication modulo a Mersenne number is twice as fast
as integer multiplication; this property does not hold for prime GMNs, unless
they are of Mersenne's form. In this work we exploit an alternative
generalisation of Mersenne numbers for which an analogue of the above property
--- and hence the same efficiency ratio --- holds, even at bitlengths for which
schoolbook multiplication is optimal, while also maintaining very efficient
reduction. Moreover, our proposed primes are abundant at any bitlength, whereas
GMNs are extremely rare. Our multiplication and reduction algorithms can also
be easily parallelised, making our arithmetic particularly suitable for
hardware implementation. Furthermore, the field representation we propose also
naturally protects against side-channel attacks, including timing attacks,
simple power analysis and differential power analysis, which is essential in
many cryptographic scenarios, in constrast to GMNs.Comment: 32 pages. Accepted to Mathematics of Computatio
Recommended from our members
High-speed implementation of the RSA cryptosystem
A public key cryptosystem allows two or more parties to securely communicate
over an insecure channel without establishing a physically secure channel for key
exchange. The RSA cryptosystem is the most popular public key cryptosystem ever
invented. It is based on the difficulty of factoring large composite numbers. Once the RSA
system is setup, i.e., the modulus, the private and public exponents are determined, and the
public components have been published, the senders as well as the receivers perform a
single operation for signing, encryption, decryption, and verification. This operation is the
computation of modular exponentiation. In this thesis, we focus on fast implementations
of the modular exponentiation operation. Several methods for modular exponentiation are
presented, including the binary method and the m-ary method. We give a general algorithm
of implementing the m-ary method, and some examples of the quaternary method
and the octal method. The standard multiplication and squaring algorithms are also discussed
as methods to implement the modular multiplication and squaring operations. Two
methods for performing the modular multiplication operation are given: the multiply and
reduce method and the Montgomery method. The Montgomery product algorithm is used
in the implementation of the modular exponentiation operation. The algorithms presented
in this thesis are implemented in C and 16-bit in-line 80486 assembly code. We have performed
extensive testing of the code, and obtained timing results which are given in the
last chapter of the thesis
- …