Search CORE

71 research outputs found

Area- Efficient VLSI Implementation of Serial-In Parallel-Out Multiplier Using Polynomial Representation in Finite Field GF(2m)

Author: Fatin Gholamreza Zare
Javidan Javad
Nabipour Saeideh
Publication venue
Publication date: 16/07/2020
Field of study

Finite field multiplier is mainly used in elliptic curve cryptography, error-correcting codes and signal processing. Finite field multiplier is regarded as the bottleneck arithmetic unit for such applications and it is the most complicated operation over finite field GF(2m) which requires a huge amount of logic resources. In this paper, a new modified serial-in parallel-out multiplication algorithm with interleaved modular reduction is suggested. The proposed method offers efficient area architecture as compared to proposed algorithms in the literature. The reduced finite field multiplier complexity is achieved by means of utilizing logic NAND gate in a particular architecture. The efficiency of the proposed architecture is evaluated based on criteria such as time (latency, critical path) and space (gate-latch number) complexity. A detailed comparative analysis indicates that, the proposed finite field multiplier based on logic NAND gate outperforms previously known resultsComment: 19 pages, 4 figure

arXiv.org e-Print Archive

Throughput/Area-efficient ECC Processor Using Montgomery Point Multiplication on FPGA

Author: Benaissa M.
Khan Z.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/07/2015
Field of study

High throughput while maintaining low resource is a key issue for elliptic curve cryptography (ECC) hardware implementations in many applications. In this brief, an ECC processor architecture over Galois fields is presented, which achieves the best reported throughput/area performance on field-programmable gate array (FPGA) to date. A novel segmented pipelining digit serial multiplier is developed to speed up ECC point multiplication. To achieve low latency, a new combined algorithm is developed for point addition and point doubling with careful scheduling. A compact and flexible distributed-RAM-based memory unit design is developed to increase speed while keeping area low. Further optimizations were made via timing constraints and logic level modifications at the implementation level. The proposed architecture is implemented on Virtex4 (V4), Virtex5 (V5), and Virtex7 (V7) FPGA technologies and, respectively, achieved throughout/slice figures of 19.65, 65.30, and 64.48 (106/(Seconds × Slices))

Crossref

White Rose Research Online

Efficient Low-Energy, Digit-Serial Exponentiator for large Finite Field GF

Author: Allah Cherigui F.
Mlynek D.
Publication venue
Publication date: 14/06/2006
Field of study

Infoscience - École polytechnique fédérale de Lausanne

Novel Single and Hybrid Finite Field Multipliers over GF(2m) for Emerging Cryptographic Systems

Author: Xie Jiafeng
Publication venue
Publication date: 28/01/2015
Field of study

With the rapid development of economic and technical progress, designers and users of various kinds of ICs and emerging embedded systems like body-embedded chips and wearable devices are increasingly facing security issues. All of these demands from customers push the cryptographic systems to be faster, more efficient, more reliable and safer. On the other hand, multiplier over GF(2m) as the most important part of these emerging cryptographic systems, is expected to be high-throughput, low-complexity, and low-latency. Fortunately, very large scale integration (VLSI) digital signal processing techniques offer great facilities to design efficient multipliers over GF(2m). This dissertation focuses on designing novel VLSI implementation of high-throughput low-latency and low-complexity single and hybrid finite field multipliers over GF(2m) for emerging cryptographic systems. Low-latency (latency can be chosen without any restriction) high-speed pentanomial basis multipliers are presented. For the first time, the dissertation also develops three high-throughput digit-serial multipliers based on pentanomials. Then a novel realization of digit-level implementation of multipliers based on redundant basis is introduced. Finally, single and hybrid reordered normal basis bit-level and digit-level high-throughput multipliers are presented. To the authors knowledge, this is the first time ever reported on multipliers with multiple throughput rate choices. All the proposed designs are simple and modular, therefore suitable for VLSI implementation for various emerging cryptographic systems

D-Scholarship@Pitt

Bit Serial Systolic Architectures for Multiplicative Inversion and Division over GF(2<sup>m</sup>)

Author: Daneshbeh Amir
Publication venue: 'University of Waterloo'
Publication date: 01/01/2005
Field of study

Systolic architectures are capable of achieving high throughput by maximizing pipelining and by eliminating global data interconnects. Recursive algorithms with regular data flows are suitable for systolization. The computation of multiplicative inversion using algorithms based on EEA (Extended Euclidean Algorithm) are particularly suitable for systolization. Implementations based on EEA present a high degree of parallelism and pipelinability at bit level which can be easily optimized to achieve local data flow and to eliminate the global interconnects which represent most important bottleneck in todays sub-micron design process. The net result is to have high clock rate and performance based on efficient systolic architectures. This thesis examines high performance but also scalable implementations of multiplicative inversion or field division over Galois fields GF(2m) in the specific case of cryptographic applications where field dimension m may be very large (greater than 400) and either m or defining irreducible polynomial may vary. For this purpose, many inversion schemes with different basis representation are studied and most importantly variants of EEA and binary (Stein's) GCD computation implementations are reviewed. A set of common as well as contrasting characteristics of these variants are discussed. As a result a generalized and optimized variant of EEA is proposed which can compute division, and multiplicative inversion as its subset, with divisor in either polynomial or triangular basis representation. Further results regarding Hankel matrix formation for double-basis inversion is provided. The validity of using the same architecture to compute field division with polynomial or triangular basis representation is proved. Next, a scalable unidirectional bit serial systolic array implementation of this proposed variant of EEA is implemented. Its complexity measures are defined and these are compared against the best known architectures. It is shown that assuming the requirements specified above, this proposed architecture may achieve a higher clock rate performance w. r. t. other designs while being more flexible, reliable and with minimum number of inter-cell interconnects. The main contribution at system level architecture is the substitution of all counter or adder/subtractor elements with a simpler distributed and free of carry propagation delays structure. Further a novel restoring mechanism for result sequences of EEA is proposed using a double delay element implementation. Finally, using this systolic architecture a CMD (Combined Multiplier Divider) datapath is designed which is used as the core of a novel systolic elliptic curve processor. This EC processor uses affine coordinates to compute scalar point multiplication which results in having a very small control unit and negligible with respect to the datapath for all practical values of m. The throughput of this EC based on this bit serial systolic architecture is comparable with designs many times larger than itself reported previously

University of Waterloo's Institutional Repository

Efficient ASIC Architectures for Low Latency Niederreiter Decryption

Author: Daniel Fallnich
Shutao Zhang
Tobias Gemmeke
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 22/04/2022
Field of study

Post-quantum cryptography addresses the increasing threat that quantum computing poses to modern communication systems. Among the available quantum-resistant systems, the Niederreiter cryptosystem is positioned as a conservative choice with strong security guarantees. As a code-based cryptosystem, the Niederreiter system enables high performance operations and is thus ideally suited for applications such as the acceleration of server workloads. However, until now, no ASIC architecture is available for low latency computation of Niederreiter operations. Therefore, the present work targets the design, implementation and optimization of tailored archi- tectures for low latency Niederreiter decryption. Two architectures utilizing different decoding algorithms are proposed and implemented using a 22nm FDSOI CMOS technology node. One of these optimized architectures improves the decryption latency by 27% compared to a state-of-the-art reference and requires at the same time only 25% of the area

Cryptology ePrint Archive

Bit-parallel word-serial polynomial basis finite field multiplier in GF(2(233)).

Author: Tang Wenkai
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2004
Field of study

Smart card gains extensive uses as a cryptographic hardware in security applications in daily life. The characteristics of smart card require that the cryptographic hardware inside the smart card have the trade-off between area and speed. There are two main public key cryptosystems, these are RSA cryptosystem and elliptic curve (EC) cryptosystem. EC has many advantages compared with RSA such as shorter key length and more suitable for VLSI implementation. Such advantages make EC an ideal candidate for smart card. Finite field multiplier is the key component in EC hardware. In this thesis, bit-parallel word-serial (BPWS) polynomial basis (PB) finite field multipliers are designed. Such architectures trade-off area with speed and are very useful for smart card. An ASIC chip which can perform finite field multiplication and finite field squaring using the BPWS PB finite field multiplier is designed in this thesis. The proposed circuit has been implemented using TSMC 0.18 CMOS technology. A novel 8 x 233 bit-parallel partial product generator is also designed. This new partial product generator has low circuit complexity. The design algorithm can be easily extended to w x m bit-parallel partial product generator for GF(2m).Dept. of Electrical and Computer Engineering. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .T36. Source: Masters Abstracts International, Volume: 43-01, page: 0286. Advisers: H. Wu; M. Ahmadi. Thesis (M.A.Sc.)--University of Windsor (Canada), 2004

Scholarship at UWindsor

New bit-parallel Montgomery multiplier for trinomials using squaring operation

Author: Yin Li
Yiyang Chen
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 03/07/2015
Field of study

In this paper, a new bit-parallel Montgomery multiplier for

GF(2^m)

is presented, where the field is generated with an irreducible trinomial. We first present a slightly generalized version of a newly proposed divide and conquer approach. Then, by combining this approach and a carefully chosen Montgomery factor, the Montgomery multiplication can be transformed into a composition of small polynomial multiplications and Montgomery squarings, which are simpler and more efficient. Explicit complexity formulae in terms of gate counts and time delay of our architecture are investigated. As a result, the proposed multiplier has generally 25\% lower space complexity than the fastest multipliers, with time complexity as good as or better than previous Karatsuba-based multipliers for the same class of fields. Among the five irreducible polynomials recommended by NIST for the ECDSA (Elliptic Curve Digital Signature Algorithm), there are two trinomials which are available for our architecture. We show that our proposal outperforms the previous best known results if the space and time complexity are both considered

Cryptology ePrint Archive

Modular Exponentiation on Reconfigurable Hardware

Author: Blum Thomas
Publication venue: Digital WPI
Publication date: 03/09/1999
Field of study

It is widely recognized that security issues will play a crucial role in the majority of future computer and communication systems. A central tool for achieving system security are cryptographic algorithms. For performance as well as for physical security reasons, it is often advantageous to realize cryptographic algorithms in hardware. In order to overcome the well-known drawback of reduced flexibility that is associated with traditional ASIC solutions, this contribution proposes arithmetic architectures which are optimized for modern field programmable gate arrays (FPGAs). The proposed architectures perform modular exponentiation with very long integers. This operation is at the heart of many practical public-key algorithms such as RSA and discrete logarithm schemes. We combine two versions of Montgomery modular multiplication algorithm with new systolic array designs which are well suited for FPGA realizations. The first one is based on a radix of two and is capable of processing a variable number of bits per array cell leading to a low cost design. The second design uses a radix of sixteen, resulting in a speed-up of a factor three at the cost of more used resources. The designs are flexible, allowing any choice of operand and modulus. Unlike previous approaches, we systematically implement and compare several versions of our new architecture for different bit lengths. We provide absolute area and timing measures for each architecture on Xilinx XC4000 series FPGAs. As a first practical result we show that it is possible to implement modular exponentiation at secure bit lengths on a single commercially available FPGA. Secondly we present faster processing times than previously reported. The Diffie-Hellman key exchange scheme with a modulus of 1024 bits and an exponent of 160 bits is computed in 1.9 ms. Our fastest design computes a 1024 bit RSA decryption in 3.1 ms when the Chinese remainder theorem is applied. These times are more than ten times faster than any reported software implementation. They also outperform most of the hardware-implementations presented in technical literature

DigitalCommons@WPI