Search CORE

108 research outputs found

Realizing arbitrary-precision modular multiplication with a fixed-precision multiplier datapath

Author: Grossschaedl Johann
Savas Erkay
Savaş Erkay
Yumbul Kazım
Yumbul Kazim
Publication venue: IEEE (Institute of Electrical and Electronics Engineers)
Publication date: 30/09/2009
Field of study

Within the context of cryptographic hardware, the term scalability refers to the ability to process operands of any size, regardless of the precision of the underlying data path or registers. In this paper we present a simple yet effective technique for increasing the scalability of a fixed-precision Montgomery multiplier. Our idea is to extend the datapath of a Montgomery multiplier in such a way that it can also perform an ordinary multiplication of two n-bit operands (without modular reduction), yielding a 2n-bit result. This conventional (nxn->2n)-bit multiplication is then used as a “sub-routine” to realize arbitrary-precision Montgomery multiplication according to standard software algorithms such as Coarsely Integrated Operand Scanning (CIOS). We show that performing a 2n-bit modular multiplication on an n-bit multiplier can be done in 5n clock cycles, whereby we assume that the n-bit modular multiplication takes n cycles. Extending a Montgomery multiplier for this extra functionality requires just some minor modifications of the datapath and entails a slight increase in silicon area

Crossref

Sabanci University Research Database

Open Repository and Bibliography - Luxembourg

A high-speed integrated circuit with applications to RSA Cryptography

Author: Onions Paul David
Publication venue: 'University of Plymouth'
Publication date: 01/01/1995
Field of study

Merged with duplicate record 10026.1/833 on 01.02.2017 by CS (TIS)The rapid growth in the use of computers and networks in government, commercial and private communications systems has led to an increasing need for these systems to be secure against unauthorised access and eavesdropping. To this end, modern computer security systems employ public-key ciphers, of which probably the most well known is the RSA ciphersystem, to provide both secrecy and authentication facilities. The basic RSA cryptographic operation is a modular exponentiation where the modulus and exponent are integers typically greater than 500 bits long. Therefore, to obtain reasonable encryption rates using the RSA cipher requires that it be implemented in hardware. This thesis presents the design of a high-performance VLSI device, called the WHiSpER chip, that can perform the modular exponentiations required by the RSA cryptosystem for moduli and exponents up to 506 bits long. The design has an expected throughput in excess of 64kbit/s making it attractive for use both as a general RSA processor within the security function provider of a security system, and for direct use on moderate-speed public communication networks such as ISDN. The thesis investigates the low-level techniques used for implementing high-speed arithmetic hardware in general, and reviews the methods used by designers of existing modular multiplication/exponentiation circuits with respect to circuit speed and efficiency. A new modular multiplication algorithm, MMDDAMMM, based on Montgomery arithmetic, together with an efficient multiplier architecture, are proposed that remove the speed bottleneck of previous designs. Finally, the implementation of the new algorithm and architecture within the WHiSpER chip is detailed, along with a discussion of the application of the chip to ciphering and key generation

Plymouth Electronic Archive and Research Library

OpenGrey Repository

Modular Exponentiation on Reconfigurable Hardware

Author: Blum Thomas
Publication venue: Digital WPI
Publication date: 03/09/1999
Field of study

It is widely recognized that security issues will play a crucial role in the majority of future computer and communication systems. A central tool for achieving system security are cryptographic algorithms. For performance as well as for physical security reasons, it is often advantageous to realize cryptographic algorithms in hardware. In order to overcome the well-known drawback of reduced flexibility that is associated with traditional ASIC solutions, this contribution proposes arithmetic architectures which are optimized for modern field programmable gate arrays (FPGAs). The proposed architectures perform modular exponentiation with very long integers. This operation is at the heart of many practical public-key algorithms such as RSA and discrete logarithm schemes. We combine two versions of Montgomery modular multiplication algorithm with new systolic array designs which are well suited for FPGA realizations. The first one is based on a radix of two and is capable of processing a variable number of bits per array cell leading to a low cost design. The second design uses a radix of sixteen, resulting in a speed-up of a factor three at the cost of more used resources. The designs are flexible, allowing any choice of operand and modulus. Unlike previous approaches, we systematically implement and compare several versions of our new architecture for different bit lengths. We provide absolute area and timing measures for each architecture on Xilinx XC4000 series FPGAs. As a first practical result we show that it is possible to implement modular exponentiation at secure bit lengths on a single commercially available FPGA. Secondly we present faster processing times than previously reported. The Diffie-Hellman key exchange scheme with a modulus of 1024 bits and an exponent of 160 bits is computed in 1.9 ms. Our fastest design computes a 1024 bit RSA decryption in 3.1 ms when the Chinese remainder theorem is applied. These times are more than ten times faster than any reported software implementation. They also outperform most of the hardware-implementations presented in technical literature

DigitalCommons@WPI

Montgomery Modular Multiplication on Reconfigurable Hardware: Systolic versus Multiplexed Implementation

Author: Daniel Gomes Mesquita
Guilherme Perin
João Baptista Martins
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2011
Field of study

This paper describes a comparison of two Montgomery modular multiplication architectures: a systolic and a multiplexed. Both implementations target FPGA devices. The modular multiplication is employed in modular exponentiation processes, which are the most important operations of some public-key cryptographic algorithms, including the most popular of them, the RSA. The proposed systolic architecture presents a high-radix implementation with a one-dimensional array of Processing Elements. The multiplexed implementation is a new alternative and is composed of multiplier blocks in parallel with the new simplified Processing Elements, and it provides a pipelined operation mode. We compare the time × area efficiency for both architectures as well as an RSA application. The systolic implementation can run the 1024 bits RSA decryption process in just 3.23 ms, and the multiplexed architecture executes the same operation in 4.36 ms, but the second approach saves up to 28% of logical resources. These results are competitive with the state-of-the-art performance

Crossref

Directory of Open Access Journals

Revisiting sum of residues modular multiplication

Author: Kong Y.
Phillips B.
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2010
Field of study

the 1980s,when the introduction of public key cryptography spurred interest in modularmultiplication, many implementations performed modularmultiplication using a sumof residues. As the fieldmatured, sum of residues modularmultiplication lost favour to the extent that all recent surveys have either overlooked it or incorporated it within a larger class of reduction algorithms. In this paper, we present a new taxonomy of modular multiplication algorithms. We include sum of residues as one of four classes and argue why it should be considered different to the other, now more common, algorithms.We then apply techniques developed for other algorithms to reinvigorate sum of residues modular multiplication. We compare FPGA implementations of modular multiplication up to 24 bits wide. The Sum of Residues multipliers demonstrate reduced latency at nearly 50% compared to Montgomery architectures at the cost of nearly doubled circuit area. The new multipliers are useful for systems based on the Residue Number System (RNS).Yinan Kong and Braden Phillip

Crossref

Adelaide Research & Scholarship

Directory of Open Access Journals

Macquarie University ResearchOnline

A Scalable GF(p) Elliptic Curve Processor Architecture for Programmable Hardware

Author: B. Parhami
E. Brickell
G. Orlando
I. Blake
K. Leung
L. Gao
M. Rosner
T. Blum
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

Montgomery and RNS for RSA Hardware Implementation

Author: Manochehri Kooroush
Pourmozafari Saadat
Sadeghian Babak
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 26/01/2012
Field of study

There are many architectures for RSA hardware implementation which improve its performance. Two main methods for this purpose are Montgomery and RNS. These are fast methods to convert plaintext to ciphertext in RSA algorithm with hardware implementation. RNS is faster than Montgomery but it uses more area. The goal of this paper is to compare these two methods based on the speed and on the used area. For this purpose the architecture that has a better performance for each method is selected, and some modification is done to enhance their performance. This comparison can be used to select the proper method for hardware implementation in both FPGA and ASIC design

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Arithmetic Considerations for Isogeny Based Cryptography

Author: Joppe W. Bos
Simon Friedberger
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 01/05/2018
Field of study

In this paper we investigate various arithmetic techniques which can be used to potentially enhance the performance in the supersingular isogeny Diffie-Hellman (SIDH) key-exchange protocol which is one of the more recent contenders in the post-quantum public-key arena. Firstly, we give a systematic overview of techniques to compute efficient arithmetic modulo

2^xp^y\pm 1

. Our overview shows that in the SIDH setting, where arithmetic over a quadratic extension field is required, the approaches based on Montgomery reduction for such primes of a special shape are to be preferred. Moreover, the outcome of our investigation reveals that there exist moduli which allow even faster implementations. Secondly, we investigate if it is beneficial to use other curve models to speed-up the elliptic curve scalar multiplication. The use of twisted Edwards curves allows one to search for efficient addition-subtraction chains for fixed scalars while this is not possible with the differential addition law when using Montgomery curves. Our preliminary results show that despite the fact that we found such efficient chains, using twisted Edwards curves does not result in faster scalar multiplication arithmetic in the setting of SIDH

Cryptology ePrint Archive

Efficient and Fast Hardware Architectures for SIKE Round 2 on FPGA

Author: Mehran Mozaffari-Kermani
Rami Elkhatib
Reza Azarderakhsh
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 25/05/2020
Field of study

New primes were proposed for Supersingular Isogeny Key Encapsulation (SIKE) in NIST standardization process of Round 2 after further cryptanalysis research showed that the security levels of the initial primes chosen were over-estimated. In this paper, we develop a highly optimized

\mathbb{F}_{p}

Montgomery multiplication algorithm and architecture that further utilizes the special form of SIKE prime compared to previous implementations available in the literature. We then implement SIKE for all Round 2 NIST security levels (SIKEp434 for NIST security level 1, SIKEp503 for NIST security level 2, SIKEp610 for NIST security level 3, and SIKEp751 for NIST security level 5) on Xilinx Virtex 7 using the proposed multiplier. Our best implementation (NIST security level 1) runs 29\% faster and occupies 30\% less hardware resources in comparison to the leading counterpart available in the literature and implementations for other security levels achieved similar improvement

Cryptology ePrint Archive