Search CORE

6 research outputs found

Efficient NTRU Implementations

Author: O\u27Rourke Colleen Marie
Publication venue: Digital WPI
Publication date: 30/04/2002
Field of study

In this paper, new software and hardware designs for the NTRU Public Key Cryptosystem are proposed. The first design attempts to improve NTRU\u27s polynomial multiplication through applying techniques from the Chinese Remainder Theorem (CRT) to the convolution algorithm. Although the application of CRT shows promise for the creation of the inverse polynomials in the setup procedure, it does not provide any benefits to the procedures that are critical to the performance of NTRU (public key creation, encryption, and decryption). This research has identified that this is due to the small coefficients of one of the operands, which can be a common misunderstanding. The second design focuses on improving the performance of the polynomial multiplications within NTRU\u27s key creation, encryption, and decryption procedures through hardware. This design exploits the inherent parallelism within a polynomial multiplication to make scalability possible. The advantage scalability provides is that it allows the user to customize the design for low and high power applications. In addition, the support for arbitrary precision allows the user to meet the desired security level. The third design utilizes the Montgomery Multiplication algorithm to develop an unified architecture that can perform a modular multiplication for GF(p) and GF(2^k) and a polynomial multiplication for NTRU. The unified design only requires an additional 10 gates in order for the Montgomery Multiplier core to compute the polynomial multiplication for NTRU. However, this added support for NTRU presents some restrictions on the supported lengths of the moduli and on the chosen value for the residue for the GF(p) and GF(2^k) cases. Despite these restrictions, this unified architecture is now capable of supporting public key operations for the majority of Public-Key Cryptosystems

DigitalCommons@WPI

Versatile Montgomery Multiplier Architectures

Author: Gaubatz Gunnar
Publication venue: Digital WPI
Publication date: 30/04/2002
Field of study

Several algorithms for Public Key Cryptography (PKC), such as RSA, Diffie-Hellman, and Elliptic Curve Cryptography, require modular multiplication of very large operands (sizes from 160 to 4096 bits) as their core arithmetic operation. To perform this operation reasonably fast, general purpose processors are not always the best choice. This is why specialized hardware, in the form of cryptographic co-processors, become more attractive. Based upon the analysis of recent publications on hardware design for modular multiplication, this M.S. thesis presents a new architecture that is scalable with respect to word size and pipelining depth. To our knowledge, this is the first time a word based algorithm for Montgomery\u27s method is realized using high-radix bit-parallel multipliers working with two different types of finite fields (unified architecture for GF(p) and GF(2n)). Previous approaches have relied mostly on bit serial multiplication in combination with massive pipelining, or Radix-8 multiplication with the limitation to a single type of finite field. Our approach is centered around the notion that the optimal delay in bit-parallel multipliers grows with logarithmic complexity with respect to the operand size n, O(log3/2 n), while the delay of bit serial implementations grows with linear complexity O(n). Our design has been implemented in VHDL, simulated and synthesized in 0.5μ CMOS technology. The synthesized net list has been verified in back-annotated timing simulations and analyzed in terms of performance and area consumption

DigitalCommons@WPI

Efficient Algorithms for Finite Fields, with Applications in Elliptic Curve Cryptography

Author: Baktir Selcuk
Publication venue: Digital WPI
Publication date: 01/05/2003
Field of study

This thesis introduces a new tower field representation, optimal tower fields (OTFs), that facilitates efficient finite field operations. The recursive direct inversion method presented for OTFs has significantly lower complexity than the known best method for inversion in optimal extension fields (OEFs), i.e., Itoh-Tsujii\u27s inversion technique. The complexity of OTF inversion algorithm is shown to be O(m^2), significantly better than that of the Itoh-Tsujii algorithm, i.e. O(m^2(log_2 m)). This complexity is further improved to O(m^(log_2 3)) by utilizing the Karatsuba-Ofman algorithm. In addition, it is shown that OTFs are in fact a special class of OEFs and OTF elements may be converted to OEF representation via a simple permutation of the coefficients. Hence, OTF operations may be utilized to achieve the OEF arithmetic operations whenever a corresponding OTF representation exists. While the original OTF multiplication and squaring operations require slightly more additions than their OEF counterparts, due to the free conversion, both OTF operations may be achieved with the complexity of OEF operations. Furthermore, efficient finite field algorithms are introduced which significantly improve OTF multiplication and squaring operations. The OTF inversion algorithm was implemented on the ARM family of processors for a medium and a large sized field whose elements can be represented with 192 and 320 bits, respectively. In the implementation, the new OTF inversion algorithm ran at least six to eight times faster than the known best method for inversion in OEFs, i.e., Itoh-Tsujii inversion technique. According to the implementation results obtained, it is indicated that using the OTF inversion method an elliptic curve scalar point multiplication operation can be performed at least two to three times faster than the known best implementation for the selected fields

DigitalCommons@WPI

Recommended from our members

Side channel attack resistant elliptic curves cryptosystem on multi-cores for power efficiency

Author: Yoo Jaewon
Publication venue: 'Oregon State University'
Publication date
Field of study

The Advent of multi-cores allows programs to be executed much faster than before. Cryptoalgorithms use long-bit words thus parallelizing these operations on multi-cores will achieve significant performance improvement. However, not all long-bit word operations in cryptosystems are suitable for parallel execution on multi-cores. In particular, long-bit words used in Elliptic Curves Cryptography (ECC) do not efficiently divide by the system word size. This causes some of the cores to be idle, which makes it vulnerable for attackers to guess how many operations occurred and thus what field size is being used. Multiplication is the most important part of public key cryptosystems. Long-bit word multiplication operations are needed for encryption and decryption. J. Fan et al. proposed using Montgomery multiplication on multi-cores using GF(2²⁵⁶) [25, 26], which is suitable for comput-er systems with 16-bit or 32-bit word size. Fan‟s Montgomery multiplication is suitable for most RSA. However, in ECC, some GFs will cause idle cores. For example, suppose GF(2¹³¹) is used (which is one of the recommended word size by NIST) on a quad-core with a 32-bit word size, which requires [132/32] =5 iterations with the last iteration requiring just a 3-bit operation. This cause three of the cores to be idle during this time causing needless power consumption. The most general and the easiest way to make side channel attacks difficult is to insert dummy instructions to cover the idle processors. However, dummy instructions result in extra workloads that lead to performance degradation and increases in power consumption. In this thesis, we will present a multiplier adjuster technique to improve the execution time and the power consumption for the last unbalanced iteration. By appropriately applying dummy instructions between point-addition and point-doubling operations, a balanced point operation can be achieved in ECC. The performance and power-efficiency of the proposed method on multi-cores are analyzed for each GF used in ECC

ScholarsArchive@OSU

Frequency Domain Finite Field Arithmetic for Elliptic Curve Cryptography

Author: baktir selcuk
Publication venue: Digital WPI
Publication date: 05/05/2008
Field of study

Efficient implementation of the number theoretic transform(NTT), also known as the discrete Fourier transform(DFT) over a finite field, has been studied actively for decades and found many applications in digital signal processing. In 1971 Schonhage and Strassen proposed an NTT based asymptotically fast multiplication method with the asymptotic complexity O(m log m log log m) for multiplication of

m

-bit integers or (m-1)st degree polynomials. Schonhage and Strassen\u27s algorithm was known to be the asymptotically fastest multiplication algorithm until Furer improved upon it in 2007. However, unfortunately, both algorithms bear significant overhead due to the conversions between the time and frequency domains which makes them impractical for small operands, e.g. less than 1000 bits in length as used in many applications. With this work we investigate for the first time the practical application of the NTT, which found applications in digital signal processing, to finite field multiplication with an emphasis on elliptic curve cryptography(ECC). We present efficient parameters for practical application of NTT based finite field multiplication to ECC which requires key and operand sizes as short as 160 bits in length. With this work, for the first time, the use of NTT based finite field arithmetic is proposed for ECC and shown to be efficient. We introduce an efficient algorithm, named DFT modular multiplication, for computing Montgomery products of polynomials in the frequency domain which facilitates efficient multiplication in GF(p^m). Our algorithm performs the entire modular multiplication, including modular reduction, in the frequency domain, and thus eliminates costly back and forth conversions between the frequency and time domains. We show that, especially in computationally constrained platforms, multiplication of finite field elements may be achieved more efficiently in the frequency domain than in the time domain for operand sizes relevant to ECC. This work presents the first hardware implementation of a frequency domain multiplier suitable for ECC and the first hardware implementation of ECC in the frequency domain. We introduce a novel area/time efficient ECC processor architecture which performs all finite field arithmetic operations in the frequency domain utilizing DFT modular multiplication over a class of Optimal Extension Fields(OEF). The proposed architecture achieves extension field modular multiplication in the frequency domain with only a linear number of base field GF(p) multiplications in addition to a quadratic number of simpler operations such as addition and bitwise rotation. With its low area and high speed, the proposed architecture is well suited for ECC in small device environments such as smart cards and wireless sensor networks nodes. Finally, we propose an adaptation of the Itoh-Tsujii algorithm to the frequency domain which can achieve efficient inversion in a class of OEFs relevant to ECC. This is the first time a frequency domain finite field inversion algorithm is proposed for ECC and we believe our algorithm will be well suited for efficient constrained hardware implementations of ECC in affine coordinates

DigitalCommons@WPI

Recommended from our members

A Study of High Performance Multiple Precision Arithmetic on Graphics Processing Units

Author: Emmart Niall
Publication venue: ScholarWorks@UMass Amherst
Publication date: 21/03/2018
Field of study

Multiple precision (MP) arithmetic is a core building block of a wide variety of algorithms in computational mathematics and computer science. In mathematics MP is used in computational number theory, geometric computation, experimental mathematics, and in some random matrix problems. In computer science, MP arithmetic is primarily used in cryptographic algorithms: securing communications, digital signatures, and code breaking. In most of these application areas, the factor that limits performance is the MP arithmetic. The focus of our research is to build and analyze highly optimized libraries that allow the MP operations to be offloaded from the CPU to the GPU. Our goal is to achieve an order of magnitude improvement over the CPU in three key metrics: operations per second per socket, operations per watt, and operation per second per dollar. What we find is that the SIMD design and balance of compute, cache, and bandwidth resources on the GPU is quite different from the CPU, so libraries such as GMP cannot simply be ported to the GPU. New approaches and algorithms are required to achieve high performance and high utilization of GPU resources. Further, we find that low-level ISA differences between GPU generations means that an approach that works well on one generation might not run well on the next. Here we report on our progress towards MP arithmetic libraries on the GPU in four areas: (1) large integer addition, subtraction, and multiplication; (2) high performance modular multiplication and modular exponentiation (the key operations for cryptographic algorithms) across generations of GPUs; (3) high precision floating point addition, subtraction, multiplication, division, and square root; (4) parallel short division, which we prove is asymptotically optimal on EREW and CREW PRAMs

ScholarWorks@UMass Amherst