259 research outputs found

    Realizing arbitrary-precision modular multiplication with a fixed-precision multiplier datapath

    Get PDF
    Within the context of cryptographic hardware, the term scalability refers to the ability to process operands of any size, regardless of the precision of the underlying data path or registers. In this paper we present a simple yet effective technique for increasing the scalability of a fixed-precision Montgomery multiplier. Our idea is to extend the datapath of a Montgomery multiplier in such a way that it can also perform an ordinary multiplication of two n-bit operands (without modular reduction), yielding a 2n-bit result. This conventional (nxn->2n)-bit multiplication is then used as a “sub-routine” to realize arbitrary-precision Montgomery multiplication according to standard software algorithms such as Coarsely Integrated Operand Scanning (CIOS). We show that performing a 2n-bit modular multiplication on an n-bit multiplier can be done in 5n clock cycles, whereby we assume that the n-bit modular multiplication takes n cycles. Extending a Montgomery multiplier for this extra functionality requires just some minor modifications of the datapath and entails a slight increase in silicon area

    Modular Exponentiation on Reconfigurable Hardware

    Get PDF
    It is widely recognized that security issues will play a crucial role in the majority of future computer and communication systems. A central tool for achieving system security are cryptographic algorithms. For performance as well as for physical security reasons, it is often advantageous to realize cryptographic algorithms in hardware. In order to overcome the well-known drawback of reduced flexibility that is associated with traditional ASIC solutions, this contribution proposes arithmetic architectures which are optimized for modern field programmable gate arrays (FPGAs). The proposed architectures perform modular exponentiation with very long integers. This operation is at the heart of many practical public-key algorithms such as RSA and discrete logarithm schemes. We combine two versions of Montgomery modular multiplication algorithm with new systolic array designs which are well suited for FPGA realizations. The first one is based on a radix of two and is capable of processing a variable number of bits per array cell leading to a low cost design. The second design uses a radix of sixteen, resulting in a speed-up of a factor three at the cost of more used resources. The designs are flexible, allowing any choice of operand and modulus. Unlike previous approaches, we systematically implement and compare several versions of our new architecture for different bit lengths. We provide absolute area and timing measures for each architecture on Xilinx XC4000 series FPGAs. As a first practical result we show that it is possible to implement modular exponentiation at secure bit lengths on a single commercially available FPGA. Secondly we present faster processing times than previously reported. The Diffie-Hellman key exchange scheme with a modulus of 1024 bits and an exponent of 160 bits is computed in 1.9 ms. Our fastest design computes a 1024 bit RSA decryption in 3.1 ms when the Chinese remainder theorem is applied. These times are more than ten times faster than any reported software implementation. They also outperform most of the hardware-implementations presented in technical literature

    A versatile Montgomery multiplier architecture with characteristic three support

    Get PDF
    We present a novel unified core design which is extended to realize Montgomery multiplication in the fields GF(2n), GF(3m), and GF(p). Our unified design supports RSA and elliptic curve schemes, as well as the identity-based encryption which requires a pairing computation on an elliptic curve. The architecture is pipelined and is highly scalable. The unified core utilizes the redundant signed digit representation to reduce the critical path delay. While the carry-save representation used in classical unified architectures is only good for addition and multiplication operations, the redundant signed digit representation also facilitates efficient computation of comparison and subtraction operations besides addition and multiplication. Thus, there is no need for a transformation between the redundant and the non-redundant representations of field elements, which would be required in the classical unified architectures to realize the subtraction and comparison operations. We also quantify the benefits of the unified architectures in terms of area and critical path delay. We provide detailed implementation results. The metric shows that the new unified architecture provides an improvement over a hypothetical non-unified architecture of at least 24.88%, while the improvement over a classical unified architecture is at least 32.07%

    A high-speed integrated circuit with applications to RSA Cryptography

    Get PDF
    Merged with duplicate record 10026.1/833 on 01.02.2017 by CS (TIS)The rapid growth in the use of computers and networks in government, commercial and private communications systems has led to an increasing need for these systems to be secure against unauthorised access and eavesdropping. To this end, modern computer security systems employ public-key ciphers, of which probably the most well known is the RSA ciphersystem, to provide both secrecy and authentication facilities. The basic RSA cryptographic operation is a modular exponentiation where the modulus and exponent are integers typically greater than 500 bits long. Therefore, to obtain reasonable encryption rates using the RSA cipher requires that it be implemented in hardware. This thesis presents the design of a high-performance VLSI device, called the WHiSpER chip, that can perform the modular exponentiations required by the RSA cryptosystem for moduli and exponents up to 506 bits long. The design has an expected throughput in excess of 64kbit/s making it attractive for use both as a general RSA processor within the security function provider of a security system, and for direct use on moderate-speed public communication networks such as ISDN. The thesis investigates the low-level techniques used for implementing high-speed arithmetic hardware in general, and reviews the methods used by designers of existing modular multiplication/exponentiation circuits with respect to circuit speed and efficiency. A new modular multiplication algorithm, MMDDAMMM, based on Montgomery arithmetic, together with an efficient multiplier architecture, are proposed that remove the speed bottleneck of previous designs. Finally, the implementation of the new algorithm and architecture within the WHiSpER chip is detailed, along with a discussion of the application of the chip to ciphering and key generation

    Software and hardware implementation of the RSA public key cipher

    Get PDF
    Cryptographic systems and their use in communications are presented. The advantages obtained by the use of a public key cipher and the importance of this in a commercial environment are stressed. Two two main public key ciphers are considered. The RSA public key cipher is introduced and various methods for implementing this cipher on a standard, nondedicated, 8 bit microprocessor are investigated. The performance of the different algorithms are evaluated and compared. Various ways of increasing the performance are considered. The limitations imposed by the performance on the practical use of the cipher are discussed. The importance of the key to the security of the cipher is assessed. Different forms of attack are mentioned and a procedure for generating keys, which minimise the probability of a sucessful attack is presented. This procedure is implemented on a minicomputer. Use of the method on personal computers or microprocessors is examined. Methods for performing multiplication in hardware, with particular emphasis on the use of these methods in modular multiplication, are detailed. An algorithm for performing part of the encryption function in hardware and the hardware necessary for it is described. Different methods for implementing the hardware are discussed and one is choosen. A description of the hardware unit is given. The design and development of an application specific integrated circuit (ASIC) to perform key elements of the encryption function is described. The various stages of the design process are detailed. The results expected from this device and its integration into the overall encryption scheme are presented

    Tamper-Resistant Arithmetic for Public-Key Cryptography

    Get PDF
    Cryptographic hardware has found many uses in many ubiquitous and pervasive security devices with a small form factor, e.g. SIM cards, smart cards, electronic security tokens, and soon even RFIDs. With applications in banking, telecommunication, healthcare, e-commerce and entertainment, these devices use cryptography to provide security services like authentication, identification and confidentiality to the user. However, the widespread adoption of these devices into the mass market, and the lack of a physical security perimeter have increased the risk of theft, reverse engineering, and cloning. Despite the use of strong cryptographic algorithms, these devices often succumb to powerful side-channel attacks. These attacks provide a motivated third party with access to the inner workings of the device and therefore the opportunity to circumvent the protection of the cryptographic envelope. Apart from passive side-channel analysis, which has been the subject of intense research for over a decade, active tampering attacks like fault analysis have recently gained increased attention from the academic and industrial research community. In this dissertation we address the question of how to protect cryptographic devices against this kind of attacks. More specifically, we focus our attention on public key algorithms like elliptic curve cryptography and their underlying arithmetic structure. In our research we address challenges such as the cost of implementation, the level of protection, and the error model in an adversarial situation. The approaches that we investigated all apply concepts from coding theory, in particular the theory of cyclic codes. This seems intuitive, since both public key cryptography and cyclic codes share finite field arithmetic as a common foundation. The major contributions of our research are (a) a generalization of cyclic codes that allow embedding of finite fields into redundant rings under a ring homomorphism, (b) a new family of non-linear arithmetic residue codes with very high error detection probability, (c) a set of new low-cost arithmetic primitives for optimal extension field arithmetic based on robust codes, and (d) design techniques for tamper resilient finite state machines

    Montgomery and RNS for RSA Hardware Implementation

    Get PDF
    There are many architectures for RSA hardware implementation which improve its performance. Two main methods for this purpose are Montgomery and RNS. These are fast methods to convert plaintext to ciphertext in RSA algorithm with hardware implementation. RNS is faster than Montgomery but it uses more area. The goal of this paper is to compare these two methods based on the speed and on the used area. For this purpose the architecture that has a better performance for each method is selected, and some modification is done to enhance their performance. This comparison can be used to select the proper method for hardware implementation in both FPGA and ASIC design

    Power Efficient Fpga Implementation Of Rsa Algortihm

    Get PDF
    Tez (Yüksek Lisans) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 2010Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2010Bu çalışmada Rivest, Shamir, Adleman (RSA) algoritması sahada programlanabilir kapı dizisi üzerinde gerçeklenmekte ve güç tasarruf yöntemlerinden yararlanılarak dinamik güç harcamaları azaltılmaktadır. RSA algoritması en yaygın kullanıma sahip açık anahtarlı şifreleme algoritmalarından biridir. RSA algoritmasını oluşturan matematiksel temel işlemleri iki ana başlıkta toplamak mümkündür: moduler çarpma işlemi ve moduler üs alma, exponent işlemi. RSA algoritmasında kullanılan aritmetik işlem ME mod N işlemidir. Bu işlemdeki N sayısı aralarında asal iki sayının çarpımından oluşan modulo değeri, M mesaj ya da düz metin dediğimiz bilgi, E ise açık anahtar olarak bilinen değerdir. İyi bir RSA gerçeklemesi oluşturmak istenirse; yapılması gereken en önemli şey, iyi bir modular çarpma devresi oluşturmaktır. Bu matematiksel açıklamalardan yola çıkararak anlamalıyız ki; bir RSA gerçeklemesinde en çok güç tüketen blok modular çarpma devresidir. Bu nedenle güç tüketimlerinin karşılaştırılması açısından modular çarpma devresine farklı teknikler uygulanmıştır. Daha sonra çok yaygın bir kullanıma sahip olan ardışıl ikili modular üs alma tekniği ile RSA algoritması gerçeklenmiştir. Bilgisayar benzetim programı ile yapıların test vektörü girişlerine karşılık doğru sonuçlar verdiği gösterilmiştir.In this study, dynamic power consumptions of Field Programmable Gate Array (FPGA) implementations of the Rivest, Shamir, Adleman (RSA) has been reduced by using low power design methods. RSA is one of the most popular public key cryptographic algorithms. The mathematics behind RSA algorithm, are summarized in two operations, modular multiplication and modular exponentiation. In the RSA cryptosystem, the arithmetic operation ME mod N is used, where N is a prime product of two relative prime numbers, M is the message and E is the public key. In order to create an efficient implementation of RSA, one has to design efficiently the multiplication of two modular numbers. So this mathematical background provides a good understanding that Modular Multiplication block dissipates the most of the power, dissipated in RSA. For comparison of power dissipations, different methods are used to implement Modular Multiplication block. Then RSA implemented by using Sequential Binary Modular Exponentiation which has widespread applications. Computer simulations have been used to show that the implementations of the algorithm generate correct outputs against test vectors.Yüksek LisansM.Sc
    corecore