1,394 research outputs found

    A versatile Montgomery multiplier architecture with characteristic three support

    Get PDF
    We present a novel unified core design which is extended to realize Montgomery multiplication in the fields GF(2n), GF(3m), and GF(p). Our unified design supports RSA and elliptic curve schemes, as well as the identity-based encryption which requires a pairing computation on an elliptic curve. The architecture is pipelined and is highly scalable. The unified core utilizes the redundant signed digit representation to reduce the critical path delay. While the carry-save representation used in classical unified architectures is only good for addition and multiplication operations, the redundant signed digit representation also facilitates efficient computation of comparison and subtraction operations besides addition and multiplication. Thus, there is no need for a transformation between the redundant and the non-redundant representations of field elements, which would be required in the classical unified architectures to realize the subtraction and comparison operations. We also quantify the benefits of the unified architectures in terms of area and critical path delay. We provide detailed implementation results. The metric shows that the new unified architecture provides an improvement over a hypothetical non-unified architecture of at least 24.88%, while the improvement over a classical unified architecture is at least 32.07%

    Realizing arbitrary-precision modular multiplication with a fixed-precision multiplier datapath

    Get PDF
    Within the context of cryptographic hardware, the term scalability refers to the ability to process operands of any size, regardless of the precision of the underlying data path or registers. In this paper we present a simple yet effective technique for increasing the scalability of a fixed-precision Montgomery multiplier. Our idea is to extend the datapath of a Montgomery multiplier in such a way that it can also perform an ordinary multiplication of two n-bit operands (without modular reduction), yielding a 2n-bit result. This conventional (nxn->2n)-bit multiplication is then used as a “sub-routine” to realize arbitrary-precision Montgomery multiplication according to standard software algorithms such as Coarsely Integrated Operand Scanning (CIOS). We show that performing a 2n-bit modular multiplication on an n-bit multiplier can be done in 5n clock cycles, whereby we assume that the n-bit modular multiplication takes n cycles. Extending a Montgomery multiplier for this extra functionality requires just some minor modifications of the datapath and entails a slight increase in silicon area

    Residue Number System Hardware Emulator and Instructions Generator

    Get PDF
    Residue Number System (RNS) is an alternative form of representing integers on which a large value gets represented by a set of smaller and independent integers. Cryptographic and signal filtering algorithms benefit from the use of RNS, due to its capabilities to increase performance and security. Herein, a simulation tool is presented which emulates the hardware implementation of an actual RNS co-processor. An “high-level to assembly” instructions generator is also built into this tool. The programmability and scalable architecture of the considered processor along with the high level description of the algorithm allows researchers and developers to easily evaluate and test their RNS algorithms on an actual architecture, using Java

    Efficient Unified Arithmetic for Hardware Cryptography

    Get PDF
    The basic arithmetic operations (i.e. addition, multiplication, and inversion) in finite fields, GF(q), where q = pk and p is a prime integer, have several applications in cryptography, such as RSA algorithm, Diffie-Hellman key exchange algorithm [1], the US federal Digital Signature Standard [2], elliptic curve cryptography [3, 4], and also recently identity based cryptography [5, 6]. Most popular finite fields that are heavily used in cryptographic applications due to elliptic curve based schemes are prime fields GF(p) and binary extension fields GF(2n). Recently, identity based cryptography based on pairing operations defined over elliptic curve points has stimulated a significant level of interest in the arithmetic of ternary extension fields, GF(3^n)

    A Systolic Hardware Architectures of Montgomery Modular Multiplication for Public Key Cryptosystems

    Get PDF
    The arithmetic in a finite field constitutes the core of Public Key Cryptography like RSA, ECC or pairing-based cryptography. This paper discusses an efficient hardware implementation of the Coarsely Integrated Operand Scanning method (CIOS) of Montgomery modular multiplication combined with an effective systolic architecture designed with a Two-dimensional array of Processing Elements. The systolic architecture increases the speed of calculation by combining the concepts of pipelining and the parallel processing into a single concept. We propose the CIOS method for the Montgomery multiplication using a systolic architecture. As far as we know this is the first implementation of such design. The proposed architectures are designed for Field Programmable Gate Array platforms. They targeted to reduce the number of clock cycles of the modular multiplication. The presented implementation results of the CIOS algorithms focuses on different security levels useful in cryptography. This architecture have been designed in order to use the flexible DSP48 on Xilinx FPGAs. Our architecture is scalable and depends only on the number and size of words. For instance, we provide results of implementation for 8, 16, 32 and 64 bit long words in 33, 66, 132 and 264 clock cycles. We highlight the fact that for a given number of word, the number of clock cycles is constant

    Efficient unified Montgomery inversion with multibit shifting

    Get PDF
    Computation of multiplicative inverses in finite fields GF(p) and GF(2/sup n/) is the most time-consuming operation in elliptic curve cryptography, especially when affine co-ordinates are used. Since the existing algorithms based on the extended Euclidean algorithm do not permit a fast software implementation, projective co-ordinates, which eliminate almost all of the inversion operations from the curve arithmetic, are preferred. In the paper, the authors demonstrate that affine co-ordinate implementation provides a comparable speed to that of projective co-ordinates with careful hardware realisation of existing algorithms for calculating inverses in both fields without utilising special moduli or irreducible polynomials. They present two inversion algorithms for binary extension and prime fields, which are slightly modified versions of the Montgomery inversion algorithm. The similarity of the two algorithms allows the design of a single unified hardware architecture that performs the computation of inversion in both fields. They also propose a hardware structure where the field elements are represented using a multi-word format. This feature allows a scalable architecture able to operate in a broad range of precision, which has certain advantages in cryptographic applications. In addition, they include statistical comparison of four inversion algorithms in order to help choose the best one amongst them for implementation onto hardware

    Versatile Montgomery Multiplier Architectures

    Get PDF
    Several algorithms for Public Key Cryptography (PKC), such as RSA, Diffie-Hellman, and Elliptic Curve Cryptography, require modular multiplication of very large operands (sizes from 160 to 4096 bits) as their core arithmetic operation. To perform this operation reasonably fast, general purpose processors are not always the best choice. This is why specialized hardware, in the form of cryptographic co-processors, become more attractive. Based upon the analysis of recent publications on hardware design for modular multiplication, this M.S. thesis presents a new architecture that is scalable with respect to word size and pipelining depth. To our knowledge, this is the first time a word based algorithm for Montgomery\u27s method is realized using high-radix bit-parallel multipliers working with two different types of finite fields (unified architecture for GF(p) and GF(2n)). Previous approaches have relied mostly on bit serial multiplication in combination with massive pipelining, or Radix-8 multiplication with the limitation to a single type of finite field. Our approach is centered around the notion that the optimal delay in bit-parallel multipliers grows with logarithmic complexity with respect to the operand size n, O(log3/2 n), while the delay of bit serial implementations grows with linear complexity O(n). Our design has been implemented in VHDL, simulated and synthesized in 0.5μ CMOS technology. The synthesized net list has been verified in back-annotated timing simulations and analyzed in terms of performance and area consumption

    Parametric, Secure and Compact Implementation of RSA on FPGA

    Get PDF
    We present a fast, efficient, and parameterized modular multiplier and a secure exponentiation circuit especially intended for FPGAs on the low end of the price range. The design utilizes dedicated block multipliers as the main functional unit and Block-RAM as storage unit for the operands. The adopted design methodology allows adjusting the number of multipliers, the radix used in the multipliers, and number of words to meet the system requirements such as available resources, precision and timing constraints. The architecture, based on the Montgomery modular multiplication algorithm, utilizes a pipelining technique that allows concurrent operation of hardwired multipliers. Our design completes 1020-bit and 2040-bit modular multiplications in 7.62 μs and 27.0 μs, respectively. The multiplier uses a moderate amount of system resources while achieving the best area-time product in literature. 2040-bit modular exponentiation engine can easily fit into Xilinx Spartan-3E 500; moreover the exponentiation circuit withstands known side channel attacks

    Comparison of Scalable Montgomery Modular Multiplication Implementations Embedded in Reconfigurable Hardware

    No full text
    International audienceThis paper presents a comparison of possible approaches for an efficient implementation of Multiple-word radix-2 Montgomery Modular Multiplication (MM) on modern Field Programmable Gate Arrays (FPGAs). The hardware implementation of MM coprocessor is fully scalable what means that it can be reused in order to generate long-precision results independently on the word length of the originally proposed coprocessor. The first of analyzed implementations uses a data path based on traditionally used redundant carry-save adders, the second one exploits, in scalable designs not yet applied, standard carry-propagate adders with fast carry chain logic. As a control unit and a platform for purely software implementation an embedded soft-core processor Altera NIOS is employed. All implementations use large embedded memory blocks available in recent FPGAs. Speed and logic requirements comparisons are performed on the optimized software and combined hardware-software designs in Altera FPGAs. The issues of targeting a design specifically for a FPGA are considered taking into account the underlying architecture imposed by the target FPGA technology. It is shown that the coprocessors based on carry-save adders and carry-propagate adders provide comparable results in constrained FPGA implementations but in case of carry-propagate logic, the solution requires less embedded memory and provides some additional implementation advantages presented in the paper
    corecore