Search CORE

14 research outputs found

Realizing arbitrary-precision modular multiplication with a fixed-precision multiplier datapath

Author: Grossschaedl Johann
Savas Erkay
Savaş Erkay
Yumbul Kazım
Yumbul Kazim
Publication venue: IEEE (Institute of Electrical and Electronics Engineers)
Publication date: 30/09/2009
Field of study

Within the context of cryptographic hardware, the term scalability refers to the ability to process operands of any size, regardless of the precision of the underlying data path or registers. In this paper we present a simple yet effective technique for increasing the scalability of a fixed-precision Montgomery multiplier. Our idea is to extend the datapath of a Montgomery multiplier in such a way that it can also perform an ordinary multiplication of two n-bit operands (without modular reduction), yielding a 2n-bit result. This conventional (nxn->2n)-bit multiplication is then used as a “sub-routine” to realize arbitrary-precision Montgomery multiplication according to standard software algorithms such as Coarsely Integrated Operand Scanning (CIOS). We show that performing a 2n-bit modular multiplication on an n-bit multiplier can be done in 5n clock cycles, whereby we assume that the n-bit modular multiplication takes n cycles. Extending a Montgomery multiplier for this extra functionality requires just some minor modifications of the datapath and entails a slight increase in silicon area

Crossref

Sabanci University Research Database

Open Repository and Bibliography - Luxembourg

Design of a RSA crypto-processor using a systolic array

Author: Kuipers E.A.M.
Publication venue
Publication date: 30/06/1996
Field of study

Pure OAI Repository

Montgomery Multiplication Using Vector Instructions

Author: Daniel Shumow
Gregory M. Zaverucha
Joppe W. Bos
Peter L. Montgomery
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 21/08/2013
Field of study

In this paper we present a parallel approach to compute interleaved Montgomery multiplication. This approach is particularly suitable to be computed on 2-way single instruction, multiple data platforms as can be found on most modern computer architectures in the form of vector instruction set extensions. We have implemented this approach for tablet devices which run the x86 architecture (Intel Atom Z2760) using SSE2 instructions as well as devices which run on the ARM platform (Qualcomm MSM8960, NVIDIA Tegra 3 and 4) using NEON instructions. When instantiating modular exponentiation with this parallel version of Montgomery multiplication we observed a performance increase of more than a factor of 1.5 compared to the sequential implementation in OpenSSL for the classical arithmetic logic unit on the Atom platform for 2048-bit moduli

Cryptology ePrint Archive

A high-speed integrated circuit with applications to RSA Cryptography

Author: Onions Paul David
Publication venue: 'University of Plymouth'
Publication date: 01/01/1995
Field of study

Merged with duplicate record 10026.1/833 on 01.02.2017 by CS (TIS)The rapid growth in the use of computers and networks in government, commercial and private communications systems has led to an increasing need for these systems to be secure against unauthorised access and eavesdropping. To this end, modern computer security systems employ public-key ciphers, of which probably the most well known is the RSA ciphersystem, to provide both secrecy and authentication facilities. The basic RSA cryptographic operation is a modular exponentiation where the modulus and exponent are integers typically greater than 500 bits long. Therefore, to obtain reasonable encryption rates using the RSA cipher requires that it be implemented in hardware. This thesis presents the design of a high-performance VLSI device, called the WHiSpER chip, that can perform the modular exponentiations required by the RSA cryptosystem for moduli and exponents up to 506 bits long. The design has an expected throughput in excess of 64kbit/s making it attractive for use both as a general RSA processor within the security function provider of a security system, and for direct use on moderate-speed public communication networks such as ISDN. The thesis investigates the low-level techniques used for implementing high-speed arithmetic hardware in general, and reviews the methods used by designers of existing modular multiplication/exponentiation circuits with respect to circuit speed and efficiency. A new modular multiplication algorithm, MMDDAMMM, based on Montgomery arithmetic, together with an efficient multiplier architecture, are proposed that remove the speed bottleneck of previous designs. Finally, the implementation of the new algorithm and architecture within the WHiSpER chip is detailed, along with a discussion of the application of the chip to ciphering and key generation

Plymouth Electronic Archive and Research Library

OpenGrey Repository

Montgomery Arithmetic from a Software Perspective

Author: Joppe W. Bos
Peter L. Montgomery
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 31/10/2017
Field of study

This chapter describes Peter L. Montgomery\u27s modular multiplication method and the various improvements to reduce the latency for software implementations on devices which have access to many computational units

Cryptology ePrint Archive

VLSI architectures for public key cryptology

Author: Tomlinson Allan
Publication venue: The University of Edinburgh
Publication date: 01/01/1991
Field of study

Edinburgh Research Archive

Cryptarray A Scalable And Reconfigurable Architecture For Cryptographic Applications

Author: Lomonaco Michael John
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2004
Field of study

Cryptography is increasingly viewed as a critical technology to fulfill the requirements of security and authentication for information exchange between Internet applications. However, software implementations of cryptographic applications are unable to support the quality of service from a bandwidth perspective required by most Internet applications. As a result, various hardware implementations, from Application-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), to programmable processors, were proposed to improve this inadequate quality of service. Although these implementations provide performances that are considered better than those produced by software implementations, they still fall short of addressing the bandwidth requirements of most cryptographic applications in the context of the Internet for two major reasons: (i) The majority of these architectures sacrifice flexibility for performance in order to reach the performance level needed for cryptographic applications. This lack of flexibility can be detrimental considering that cryptographic standards and algorithms are still evolving. (ii) These architectures do not consider the consequences of technology scaling in general, and particularly interconnect related problems. As a result, this thesis proposes an architecture that attempts to address the requirements of cryptographic applications by overcoming the obstacles described in (i) and (ii). To this end, we propose a new reconfigurable, two-dimensional, scalable architecture, called CRYPTARRAY, in which bus-based communication is replaced by distributed shared memory communication. At the physical level, the length of the wires will be kept to a minimum. CRYPTARRAY is organized as a chessboard in which the dark and light squares represent Processing Elements (PE) and memory blocks respectively. The granularity and resource composition of the PEs is specifically designed to support the computing operations encountered in cryptographic algorithms in general, and symmetric algorithms in particular. Communication can occur only between neighboring PEs through locally shared memory blocks. Because of the chessboard layout, the architecture can be reconfigured to allow computation to proceed as a pipelined wave in any direction. This organization offers a high computational density in terms of datapath resources and a large number of distributed storage resources that easily support a high degree of parallelism and pipelining. Experimental prototyping a small array on FPGA chips shows that this architecture can run at 80.9 MHz producing 26,968,716 outputs every second in static reconfiguration mode and 20,226,537 outputs every second in dynamic reconfiguration mode

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Recommended from our members

Rapid hardware implementations of classical modular multiplication

Author: Johnson Scott Andrew
Publication venue: 'Oregon State University'
Publication date
Field of study

Modular multiplication is a mathematical operation fundamental to the RSA cryptosystern, a public-key cryptosystem with many applications in privacy, security, and authenticity. However, cryptosecurity requires that the numbers involved be extremely large, typically ranging from 512-1024 bits in length. Calculations on numbers of this magnitude are cumbersome and lengthy; this limits the speed of RSA. This thesis examines the problem of speeding up modular multiplication of large numbers in hardware, using the classical (add-and-shift) multiplication algorithm. The problem is broken down, and it is shown that the primary computational bottleneck occurs in the modular reduction step performed on each cycle. This reduction consists of an integer division step, a broadcast step, and a multiplication step. Various methods of speeding up these steps are examined, both for the special case of radix-2 multipliers (those shifting a single bit at a time) and the general case of radix-2r multipliers (those shifting r bits on every cycle.) The impacts of these techniques, both on cycle time and on chip area, are discussed. The scalability of these systems is examined, and several implementations of modular multiplication found in the literature are analyzed. Most significantly, the technique of pipelining of modular multipliers is examined. It is shown that it is possible to pipeline the modular reduction sequence, effectively eliminating the cycle time's dependence on either the size of the modulus, or on the size of the radius. Furthermore, a technique for constructing such multipliers is given. It is demonstrated that this technique is scalable with respect to time, and that pipelining eliminates many of the disadvantages inherent in previous high-radix implementations. It is also demonstrated that such multipliers have an area requirement which is linear with respect to both radix and modulus size

ScholarsArchive@OSU

Efficient Low-Energy, Digit-Serial Exponentiator for large Finite Field GF

Author: Allah Cherigui F.
Mlynek D.
Publication venue
Publication date: 14/06/2006
Field of study

Infoscience - École polytechnique fédérale de Lausanne

Concurrent Error Detection in Finite Field Arithmetic Operations

Author: Bayat Sarmadi Siavash
Publication venue: 'University of Waterloo'
Publication date: 01/01/2007
Field of study

With significant advances in wired and wireless technologies and also increased shrinking in the size of VLSI circuits, many devices have become very large because they need to contain several large units. This large number of gates and in turn large number of transistors causes the devices to be more prone to faults. These faults specially in sensitive and critical applications may cause serious failures and hence should be avoided. On the other hand, some critical applications such as cryptosystems may also be prone to deliberately injected faults by malicious attackers. Some of these faults can produce erroneous results that can reveal some important secret information of the cryptosystems. Furthermore, yield factor improvement is always an important issue in VLSI design and fabrication processes. Digital systems such as cryptosystems and digital signal processors usually contain finite field operations. Therefore, error detection and correction of such operations have become an important issue recently. In most of the work reported so far, error detection and correction are applied using redundancies in space (hardware), time, and/or information (coding theory). In this work, schemes based on these redundancies are presented to detect errors in important finite field arithmetic operations resulting from hardware faults. Finite fields are used in a number of practical cryptosystems and channel encoders/decoders. The schemes presented here can detect errors in arithmetic operations of finite fields represented in different bases, including polynomial, dual and/or normal basis, and implemented in various architectures, including bit-serial, bit-parallel and/or systolic arrays

University of Waterloo's Institutional Repository