Search CORE

3,414 research outputs found

High-Speed Area-Efficient Hardware Architecture for the Efficient Detection of Faults in a Bit-Parallel Multiplier Utilizing the Polynomial Basis of GF(2m)

Author: Javidan Javad
Nabipour Saeideh
Publication venue
Publication date: 23/06/2023
Field of study

The utilization of finite field multipliers is pervasive in contemporary digital systems, with hardware implementation for bit parallel operation often necessitating millions of logic gates. However, various digital design issues, whether natural or stemming from soft errors, can result in gate malfunction, ultimately leading to erroneous multiplier outputs. Thus, to prevent susceptibility to error, it is imperative to employ an effective finite field multiplier implementation that boasts a robust fault detection capability. This study proposes a novel fault detection scheme for a recent bit-parallel polynomial basis multiplier over GF(2m), intended to achieve optimal fault detection performance for finite field multipliers while simultaneously maintaining a low-complexity implementation, a favored attribute in resource-constrained applications like smart cards. The primary concept behind the proposed approach is centered on the implementation of a BCH decoder that utilizes re-encoding technique and FIBM algorithm in its first and second sub-modules, respectively. This approach serves to address hardware complexity concerns while also making use of Berlekamp-Rumsey-Solomon (BRS) algorithm and Chien search method in the third sub-module of the decoder to effectively locate errors with minimal delay. The results of our synthesis indicate that our proposed error detection and correction architecture for a 45-bit multiplier with 5-bit errors achieves a 37% and 49% reduction in critical path delay compared to existing designs. Furthermore, the hardware complexity associated with a 45-bit multiplicand that contains 5 errors is confined to a mere 80%, which is significantly lower than the most exceptional BCH-based fault recognition methodologies, including TMR, Hamming's single error correction, and LDPC-based procedures within the realm of finite field multiplication.Comment: 9 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:2209.1338

arXiv.org e-Print Archive

Area- Efficient VLSI Implementation of Serial-In Parallel-Out Multiplier Using Polynomial Representation in Finite Field GF(2m)

Author: Fatin Gholamreza Zare
Javidan Javad
Nabipour Saeideh
Publication venue
Publication date: 16/07/2020
Field of study

Finite field multiplier is mainly used in elliptic curve cryptography, error-correcting codes and signal processing. Finite field multiplier is regarded as the bottleneck arithmetic unit for such applications and it is the most complicated operation over finite field GF(2m) which requires a huge amount of logic resources. In this paper, a new modified serial-in parallel-out multiplication algorithm with interleaved modular reduction is suggested. The proposed method offers efficient area architecture as compared to proposed algorithms in the literature. The reduced finite field multiplier complexity is achieved by means of utilizing logic NAND gate in a particular architecture. The efficiency of the proposed architecture is evaluated based on criteria such as time (latency, critical path) and space (gate-latch number) complexity. A detailed comparative analysis indicates that, the proposed finite field multiplier based on logic NAND gate outperforms previously known resultsComment: 19 pages, 4 figure

arXiv.org e-Print Archive

Hardware/Software Co-design Applied to Reed-Solomon Decoding for the DMB Standard

Author: Dam A.C.
Lammertink M.G.J.
Rauwerda G.K.
Rovers K.C.
Slagman J.
Smit G.J.M.
Wellink A.M.
Publication venue: IEEE Computer Society
Publication date: 01/01/2006
Field of study

This paper addresses the implementation of Reed- Solomon decoding for battery-powered wireless devices. The scope of this paper is constrained by the Digital Media Broadcasting (DMB). The most critical element of the Reed-Solomon algorithm is implemented on two different reconfigurable hardware architectures: an FPGA and a coarse-grained architecture: the Montium, The remaining parts are executed on an ARM processor. The results of this research show that a co-design of the ARM together with an FPGA or a Montium leads to a substantial decrease in energy consumption. The energy consumption of syndrome calculation of the Reed- Solomon decoding algorithm is estimated for an FPGA and a Montium by means of simulations. The Montium proves to be more efficient

University of Twente Research Information

Low-Resource and Fast Elliptic Curve Implementations over Binary Edwards Curves

Author: Koziel Brian
Publication venue: RIT Scholar Works
Publication date: 01/05/2016
Field of study

Elliptic curve cryptography (ECC) is an ideal choice for low-resource applications because it provides the same level of security with smaller key sizes than other existing public key encryption schemes. For low-resource applications, designing efficient functional units for elliptic curve computations over binary fields results in an effective platform for an embedded co-processor. This thesis investigates co-processor designs for area-constrained devices. Particularly, we discuss an implementation utilizing state of the art binary Edwards curve equations over mixed point addition and doubling. The binary Edwards curve offers the security advantage that it is complete and is, therefore, immune to the exceptional points attack. In conjunction with Montgomery ladder, such a curve is naturally immune to most types of simple power and timing attacks. Finite field operations were performed in the small and efficient Gaussian normal basis. The recently presented formulas for mixed point addition by K. Kim, C. Lee, and C. Negre at Indocrypt 2014 were found to be invalid, but were corrected such that the speed and register usage were maintained. We utilize corrected mixed point addition and doubling formulas to achieve a secure, but still fast implementation of a point multiplication on binary Edwards curves. Our synthesis results over NIST recommended fields for ECC indicate that the proposed co-processor requires about 50% fewer clock cycles for point multiplication and occupies a similar silicon area when compared to the most recent in literature

RIT Scholar Works

Frequency Domain Finite Field Arithmetic for Elliptic Curve Cryptography

Author: baktir selcuk
Publication venue: Digital WPI
Publication date: 05/05/2008
Field of study

Efficient implementation of the number theoretic transform(NTT), also known as the discrete Fourier transform(DFT) over a finite field, has been studied actively for decades and found many applications in digital signal processing. In 1971 Schonhage and Strassen proposed an NTT based asymptotically fast multiplication method with the asymptotic complexity O(m log m log log m) for multiplication of

m

-bit integers or (m-1)st degree polynomials. Schonhage and Strassen\u27s algorithm was known to be the asymptotically fastest multiplication algorithm until Furer improved upon it in 2007. However, unfortunately, both algorithms bear significant overhead due to the conversions between the time and frequency domains which makes them impractical for small operands, e.g. less than 1000 bits in length as used in many applications. With this work we investigate for the first time the practical application of the NTT, which found applications in digital signal processing, to finite field multiplication with an emphasis on elliptic curve cryptography(ECC). We present efficient parameters for practical application of NTT based finite field multiplication to ECC which requires key and operand sizes as short as 160 bits in length. With this work, for the first time, the use of NTT based finite field arithmetic is proposed for ECC and shown to be efficient. We introduce an efficient algorithm, named DFT modular multiplication, for computing Montgomery products of polynomials in the frequency domain which facilitates efficient multiplication in GF(p^m). Our algorithm performs the entire modular multiplication, including modular reduction, in the frequency domain, and thus eliminates costly back and forth conversions between the frequency and time domains. We show that, especially in computationally constrained platforms, multiplication of finite field elements may be achieved more efficiently in the frequency domain than in the time domain for operand sizes relevant to ECC. This work presents the first hardware implementation of a frequency domain multiplier suitable for ECC and the first hardware implementation of ECC in the frequency domain. We introduce a novel area/time efficient ECC processor architecture which performs all finite field arithmetic operations in the frequency domain utilizing DFT modular multiplication over a class of Optimal Extension Fields(OEF). The proposed architecture achieves extension field modular multiplication in the frequency domain with only a linear number of base field GF(p) multiplications in addition to a quadratic number of simpler operations such as addition and bitwise rotation. With its low area and high speed, the proposed architecture is well suited for ECC in small device environments such as smart cards and wireless sensor networks nodes. Finally, we propose an adaptation of the Itoh-Tsujii algorithm to the frequency domain which can achieve efficient inversion in a class of OEFs relevant to ECC. This is the first time a frequency domain finite field inversion algorithm is proposed for ECC and we believe our algorithm will be well suited for efficient constrained hardware implementations of ECC in affine coordinates

DigitalCommons@WPI

Readiness of Quantum Optimization Machines for Industrial Applications

Author: Biswas Rupak
de Kleer Johan
Diedrich Alexander
Feldman Alexander
Isakov Sergei V.
Katzgraber Helmut G.
Lackey Brad
Neven Hartmut
O'Gorman Bryan
Ozaeta Asier
Perdomo-Ortiz Alejandro
Zhu Zheng
Publication venue: 'American Physical Society (APS)'
Publication date: 02/07/2019
Field of study

There have been multiple attempts to demonstrate that quantum annealing and, in particular, quantum annealing on quantum annealing machines, has the potential to outperform current classical optimization algorithms implemented on CMOS technologies. The benchmarking of these devices has been controversial. Initially, random spin-glass problems were used, however, these were quickly shown to be not well suited to detect any quantum speedup. Subsequently, benchmarking shifted to carefully crafted synthetic problems designed to highlight the quantum nature of the hardware while (often) ensuring that classical optimization techniques do not perform well on them. Even worse, to date a true sign of improved scaling with the number of problem variables remains elusive when compared to classical optimization techniques. Here, we analyze the readiness of quantum annealing machines for real-world application problems. These are typically not random and have an underlying structure that is hard to capture in synthetic benchmarks, thus posing unexpected challenges for optimization techniques, both classical and quantum alike. We present a comprehensive computational scaling analysis of fault diagnosis in digital circuits, considering architectures beyond D-wave quantum annealers. We find that the instances generated from real data in multiplier circuits are harder than other representative random spin-glass benchmarks with a comparable number of variables. Although our results show that transverse-field quantum annealing is outperformed by state-of-the-art classical optimization algorithms, these benchmark instances are hard and small in the size of the input, therefore representing the first industrial application ideally suited for testing near-term quantum annealers and other quantum algorithmic strategies for optimization problems.Comment: 22 pages, 12 figures. Content updated according to Phys. Rev. Applied versio

arXiv.org e-Print Archive

Fraunhofer-ePrints

Reconfigurable elliptic curve cryptography

Author: Malik Aarti
Publication venue: RIT Scholar Works
Publication date: 01/02/2005
Field of study

Elliptic Curve Cryptosystems (ECC) have been proposed as an alternative to other established public key cryptosystems such as RSA (Rivest Shamir Adleman). ECC provide more security per bit than other known public key schemes based on the discrete logarithm problem. Smaller key sizes result in faster computations, lower power consumption and memory and bandwidth savings, thus making ECC a fast, flexible and cost-effective solution for providing security in constrained environments. Implementing ECC on reconfigurable platform combines the speed, security and concurrency of hardware along with the flexibility of the software approach. This work proposes a generic architecture for elliptic curve cryptosystem on a Field Programmable Gate Array (FPGA) that performs an elliptic curve scalar multiplication in 1.16milliseconds for GF (2163), which is considerably faster than most other documented implementations. One of the benefits of the proposed processor architecture is that it is easily reprogrammable to use different algorithms and is adaptable to any field order. Also through reconfiguration the arithmetic unit can be optimized for different area/speed requirements. The mathematics involved uses binary extension field of the form GF (2n) as the underlying field and polynomial basis for the representation of the elements in the field. A significant gain in performance is obtained by using projective coordinates for the points on the curve during the computation process

RIT Scholar Works

Improved throughput of Elliptic Curve Digital Signature Algorithm (ECDSA) processor implementation over Koblitz curve k-163 on Field Programmable Gate Array (FPGA)

Author: Abdul-Hadi Alaa M.
Tawfeeq Firas Ghanim
Publication venue: College of Science for Women - University of Baghdad
Publication date: 08/09/2020
Field of study

يقـدم البحث دراسة عن تصميم وتنفيذ دائرة الكترونية لتوليد التوقيع الالكتروني والتاكد من صحته ,بالاعتماد على مواصفات المنحني الاهليجي الموصى بها من  قبل المعهد الوطني للمعايير والتكنولوجيا(NIST) .حيث أرتكز العمل على إختيار منحني كوبلتز وتطبيقه على الحقول المنتهية أو ما تسمى بحقول غالو(2163)GF، ونظراً لأهمية تحسين الأداء في المعالجات الحديثة المبنية في بيئة البوابات المنطقية القابلة للبرمجة (FPGA)،  فقد أظهرت نتائج المحاكاة والتنفيذ للتصميم المقترح على الجهاز نوع Virtex5-xc5vlx155t-3ff1738  زيادة في معدل البيانات التي يتم معالجتها اثناء عمليتي توليد التوقيع واثبات صحته الى 0.08187 Mbit/s وبنسبة تصل الى 6.95% ,بالمقارنة مع التصميمات السابقة ، كما أستغرقت مدة تنفيذ العمليتين 1.66 ملي ثانية وبتردد أقصاه 83.477 ميكاهرتز. تم الاخذ بنظرالاعتبار تصميم المنفذ التسلسلي غير المتزامن (UART) والمستخدم في عملية نقل البيانات بين الحاسبة وFPGA .            The widespread use of the Internet of things (IoT) in different aspects of an individual’s life like banking, wireless intelligent devices and smartphones has led to new security and performance challenges under restricted resources. The Elliptic Curve Digital Signature Algorithm (ECDSA) is the most suitable choice for the environments due to the smaller size of the encryption key and changeable security related parameters. However, major performance metrics such as area, power, latency and throughput are still customisable and based on the design requirements of the device. The present paper puts forward an enhancement for the throughput performance metric by proposing a more efficient design for the hardware implementation of ECDSA. The design raised the throughput to 0.08207 Mbit/s, leading to an increase of 6.95% from the existing design. It also includes the design and implementation of the Universal Asynchronous Receiver Transmitter (UART) module. The present work is based on a 163-bit key-size over Koblitz curve k-163 and secure hash function SHA-1. A serial module for the underlying modular layer, high-speed architecture of Koblitz point addition and Koblitz point multiplication have been considered in this work, in addition to utilising the carry-save-multiplier, modular adder-subtractor and Extended Euclidean module for ECDSA protocols. All modules are designed using VHDL and implemented on the platform Virtex5 xc5vlx155t-3ff1738. Signature generation requires 0.55360ms, while its validation consumes 1.10947288ms. Thus, the total time required to complete both processes is equal to 1.66ms and the maximum frequency is approximately 83.477MHZ, consuming a power of 99mW with the efficiency approaching 3.39 * 10-6

Baghdad Science Journal

Efficient Arithmetic for the Implementation of Elliptic Curve Cryptography

Author: Hasan Abdulrahman Ebrahim Abdulrahman
Publication venue: Scholarship@Western
Publication date: 22/11/2013
Field of study

The technology of elliptic curve cryptography is now an important branch in public-key based crypto-system. Cryptographic mechanisms based on elliptic curves depend on the arithmetic of points on the curve. The most important arithmetic is multiplying a point on the curve by an integer. This operation is known as elliptic curve scalar (or point) multiplication operation. A cryptographic device is supposed to perform this operation efficiently and securely. The elliptic curve scalar multiplication operation is performed by combining the elliptic curve point routines that are defined in terms of the underlying finite field arithmetic operations. This thesis focuses on hardware architecture designs of elliptic curve operations. In the first part, we aim at finding new architectures to implement the finite field arithmetic multiplication operation more efficiently. In this regard, we propose novel schemes for the serial-out bit-level (SOBL) arithmetic multiplication operation in the polynomial basis over F_2^m. We show that the smallest SOBL scheme presented here can provide about 26-30\% reduction in area-complexity cost and about 22-24\% reduction in power consumptions for F_2^{163} compared to the current state-of-the-art bit-level multiplier schemes. Then, we employ the proposed SOBL schemes to present new hybrid-double multiplication architectures that perform two multiplications with latency comparable to the latency of a single multiplication. Then, in the second part of this thesis, we investigate the different algorithms for the implementation of elliptic curve scalar multiplication operation. We focus our interest in three aspects, namely, the finite field arithmetic cost, the critical path delay, and the protection strength from side-channel attacks (SCAs) based on simple power analysis. In this regard, we propose a novel scheme for the scalar multiplication operation that is based on processing three bits of the scalar in the exact same sequence of five point arithmetic operations. We analyse the security of our scheme and show that its security holds against both SCAs and safe-error fault attacks. In addition, we show how the properties of the proposed elliptic curve scalar multiplication scheme yields an efficient hardware design for the implementation of a single scalar multiplication on a prime extended twisted Edwards curve incorporating 8 parallel multiplication operations. Our comparison results show that the proposed hardware architecture for the twisted Edwards curve model implemented using the proposed scalar multiplication scheme is the fastest secure SCA protected scalar multiplication scheme over prime field reported in the literature

Scholarship@Western