3,414 research outputs found
High-Speed Area-Efficient Hardware Architecture for the Efficient Detection of Faults in a Bit-Parallel Multiplier Utilizing the Polynomial Basis of GF(2m)
The utilization of finite field multipliers is pervasive in contemporary
digital systems, with hardware implementation for bit parallel operation often
necessitating millions of logic gates. However, various digital design issues,
whether natural or stemming from soft errors, can result in gate malfunction,
ultimately leading to erroneous multiplier outputs. Thus, to prevent
susceptibility to error, it is imperative to employ an effective finite field
multiplier implementation that boasts a robust fault detection capability. This
study proposes a novel fault detection scheme for a recent bit-parallel
polynomial basis multiplier over GF(2m), intended to achieve optimal fault
detection performance for finite field multipliers while simultaneously
maintaining a low-complexity implementation, a favored attribute in
resource-constrained applications like smart cards. The primary concept behind
the proposed approach is centered on the implementation of a BCH decoder that
utilizes re-encoding technique and FIBM algorithm in its first and second
sub-modules, respectively. This approach serves to address hardware complexity
concerns while also making use of Berlekamp-Rumsey-Solomon (BRS) algorithm and
Chien search method in the third sub-module of the decoder to effectively
locate errors with minimal delay. The results of our synthesis indicate that
our proposed error detection and correction architecture for a 45-bit
multiplier with 5-bit errors achieves a 37% and 49% reduction in critical path
delay compared to existing designs. Furthermore, the hardware complexity
associated with a 45-bit multiplicand that contains 5 errors is confined to a
mere 80%, which is significantly lower than the most exceptional BCH-based
fault recognition methodologies, including TMR, Hamming's single error
correction, and LDPC-based procedures within the realm of finite field
multiplication.Comment: 9 pages, 4 figures. arXiv admin note: substantial text overlap with
arXiv:2209.1338
Area- Efficient VLSI Implementation of Serial-In Parallel-Out Multiplier Using Polynomial Representation in Finite Field GF(2m)
Finite field multiplier is mainly used in elliptic curve cryptography,
error-correcting codes and signal processing. Finite field multiplier is
regarded as the bottleneck arithmetic unit for such applications and it is the
most complicated operation over finite field GF(2m) which requires a huge
amount of logic resources. In this paper, a new modified serial-in parallel-out
multiplication algorithm with interleaved modular reduction is suggested. The
proposed method offers efficient area architecture as compared to proposed
algorithms in the literature. The reduced finite field multiplier complexity is
achieved by means of utilizing logic NAND gate in a particular architecture.
The efficiency of the proposed architecture is evaluated based on criteria such
as time (latency, critical path) and space (gate-latch number) complexity. A
detailed comparative analysis indicates that, the proposed finite field
multiplier based on logic NAND gate outperforms previously known resultsComment: 19 pages, 4 figure
Hardware/Software Co-design Applied to Reed-Solomon Decoding for the DMB Standard
This paper addresses the implementation of Reed-
Solomon decoding for battery-powered wireless
devices. The scope of this paper is constrained by the
Digital Media Broadcasting (DMB). The most critical
element of the Reed-Solomon algorithm is implemented
on two different reconfigurable hardware
architectures: an FPGA and a coarse-grained
architecture: the Montium, The remaining parts are
executed on an ARM processor. The results of this
research show that a co-design of the ARM together
with an FPGA or a Montium leads to a substantial
decrease in energy consumption. The energy
consumption of syndrome calculation of the Reed-
Solomon decoding algorithm is estimated for an FPGA
and a Montium by means of simulations. The Montium
proves to be more efficient
Low-Resource and Fast Elliptic Curve Implementations over Binary Edwards Curves
Elliptic curve cryptography (ECC) is an ideal choice for low-resource applications because it provides the same level of security with smaller key sizes than other existing public key encryption schemes. For low-resource applications, designing efficient functional units for elliptic curve computations over binary fields results in an effective platform for an embedded co-processor. This thesis investigates co-processor designs for area-constrained devices. Particularly, we discuss an implementation utilizing state of the art binary Edwards curve equations over mixed point addition and doubling. The binary Edwards curve offers the security advantage that it is complete and is, therefore, immune to the exceptional points attack. In conjunction with Montgomery ladder, such a curve is naturally immune to most types of simple power and timing attacks. Finite field operations were performed in the small and efficient Gaussian normal basis. The recently presented formulas for mixed point addition by K. Kim, C. Lee, and C. Negre at Indocrypt 2014 were found to be invalid, but were corrected such that the speed and register usage were maintained. We utilize corrected mixed point addition and doubling formulas to achieve a secure, but still fast implementation of a point multiplication on binary Edwards curves. Our synthesis results over NIST recommended fields for ECC indicate that the proposed co-processor requires about 50% fewer clock cycles for point multiplication and occupies a similar silicon area when compared to the most recent in literature
Frequency Domain Finite Field Arithmetic for Elliptic Curve Cryptography
Efficient implementation of the number theoretic transform(NTT), also known as the discrete Fourier transform(DFT) over a finite field, has been studied actively for decades and found many applications in digital signal processing. In 1971 Schonhage and Strassen proposed an NTT based asymptotically fast multiplication method with the asymptotic complexity O(m log m log log m) for multiplication of -bit integers or (m-1)st degree polynomials. Schonhage and Strassen\u27s algorithm was known to be the asymptotically fastest multiplication algorithm until Furer improved upon it in 2007. However, unfortunately, both algorithms bear significant overhead due to the conversions between the time and frequency domains which makes them impractical for small operands, e.g. less than 1000 bits in length as used in many applications. With this work we investigate for the first time the practical application of the NTT, which found applications in digital signal processing, to finite field multiplication with an emphasis on elliptic curve cryptography(ECC). We present efficient parameters for practical application of NTT based finite field multiplication to ECC which requires key and operand sizes as short as 160 bits in length. With this work, for the first time, the use of NTT based finite field arithmetic is proposed for ECC and shown to be efficient. We introduce an efficient algorithm, named DFT modular multiplication, for computing Montgomery products of polynomials in the frequency domain which facilitates efficient multiplication in GF(p^m). Our algorithm performs the entire modular multiplication, including modular reduction, in the frequency domain, and thus eliminates costly back and forth conversions between the frequency and time domains. We show that, especially in computationally constrained platforms, multiplication of finite field elements may be achieved more efficiently in the frequency domain than in the time domain for operand sizes relevant to ECC. This work presents the first hardware implementation of a frequency domain multiplier suitable for ECC and the first hardware implementation of ECC in the frequency domain. We introduce a novel area/time efficient ECC processor architecture which performs all finite field arithmetic operations in the frequency domain utilizing DFT modular multiplication over a class of Optimal Extension Fields(OEF). The proposed architecture achieves extension field modular multiplication in the frequency domain with only a linear number of base field GF(p) multiplications in addition to a quadratic number of simpler operations such as addition and bitwise rotation. With its low area and high speed, the proposed architecture is well suited for ECC in small device environments such as smart cards and wireless sensor networks nodes. Finally, we propose an adaptation of the Itoh-Tsujii algorithm to the frequency domain which can achieve efficient inversion in a class of OEFs relevant to ECC. This is the first time a frequency domain finite field inversion algorithm is proposed for ECC and we believe our algorithm will be well suited for efficient constrained hardware implementations of ECC in affine coordinates
Readiness of Quantum Optimization Machines for Industrial Applications
There have been multiple attempts to demonstrate that quantum annealing and,
in particular, quantum annealing on quantum annealing machines, has the
potential to outperform current classical optimization algorithms implemented
on CMOS technologies. The benchmarking of these devices has been controversial.
Initially, random spin-glass problems were used, however, these were quickly
shown to be not well suited to detect any quantum speedup. Subsequently,
benchmarking shifted to carefully crafted synthetic problems designed to
highlight the quantum nature of the hardware while (often) ensuring that
classical optimization techniques do not perform well on them. Even worse, to
date a true sign of improved scaling with the number of problem variables
remains elusive when compared to classical optimization techniques. Here, we
analyze the readiness of quantum annealing machines for real-world application
problems. These are typically not random and have an underlying structure that
is hard to capture in synthetic benchmarks, thus posing unexpected challenges
for optimization techniques, both classical and quantum alike. We present a
comprehensive computational scaling analysis of fault diagnosis in digital
circuits, considering architectures beyond D-wave quantum annealers. We find
that the instances generated from real data in multiplier circuits are harder
than other representative random spin-glass benchmarks with a comparable number
of variables. Although our results show that transverse-field quantum annealing
is outperformed by state-of-the-art classical optimization algorithms, these
benchmark instances are hard and small in the size of the input, therefore
representing the first industrial application ideally suited for testing
near-term quantum annealers and other quantum algorithmic strategies for
optimization problems.Comment: 22 pages, 12 figures. Content updated according to Phys. Rev. Applied
versio
Reconfigurable elliptic curve cryptography
Elliptic Curve Cryptosystems (ECC) have been proposed as an alternative to other established public key cryptosystems such as RSA (Rivest Shamir Adleman). ECC provide more security per bit than other known public key schemes based on the discrete logarithm problem. Smaller key sizes result in faster computations, lower power consumption and memory and bandwidth savings, thus making ECC a fast, flexible and cost-effective solution for providing security in constrained environments. Implementing ECC on reconfigurable platform combines the speed, security and concurrency of hardware along with the flexibility of the software approach.
This work proposes a generic architecture for elliptic curve cryptosystem on a Field Programmable Gate Array (FPGA) that performs an elliptic curve scalar multiplication in 1.16milliseconds for GF (2163), which is considerably faster than most other documented implementations. One of the benefits of the proposed processor architecture is that it is easily reprogrammable to use different algorithms and is adaptable to any field order. Also through reconfiguration the arithmetic unit can be optimized for different area/speed requirements. The mathematics involved uses binary extension field of the form GF (2n) as the underlying field and polynomial basis for the representation of the elements in the field. A significant gain in performance is obtained by using projective coordinates for the points on the curve during the computation process
Improved throughput of Elliptic Curve Digital Signature Algorithm (ECDSA) processor implementation over Koblitz curve k-163 on Field Programmable Gate Array (FPGA)
يقـدم البحث دراسة عن تصميم وتنفيذ دائرة الكترونية لتوليد التوقيع الالكتروني والتاكد من صحته ,بالاعتماد على مواصفات المنحني الاهليجي الموصى بها من قبل المعهد الوطني للمعايير والتكنولوجيا(NIST) .حيث أرتكز العمل على إختيار منحني كوبلتز وتطبيقه على الحقول المنتهية أو ما تسمى بحقول غالو(2163)GF، ونظراً لأهمية تحسين الأداء في المعالجات الحديثة المبنية في بيئة البوابات المنطقية القابلة للبرمجة (FPGA)، فقد أظهرت نتائج المحاكاة والتنفيذ للتصميم المقترح على الجهاز نوع Virtex5-xc5vlx155t-3ff1738 زيادة في معدل البيانات التي يتم معالجتها اثناء عمليتي توليد التوقيع واثبات صحته الى 0.08187 Mbit/s وبنسبة تصل الى 6.95% ,بالمقارنة مع التصميمات السابقة ، كما أستغرقت مدة تنفيذ العمليتين 1.66 ملي ثانية وبتردد أقصاه 83.477 ميكاهرتز. تم الاخذ بنظرالاعتبار تصميم المنفذ التسلسلي غير المتزامن (UART) والمستخدم في عملية نقل البيانات بين الحاسبة وFPGA . The widespread use of the Internet of things (IoT) in different aspects of an individual’s life like banking, wireless intelligent devices and smartphones has led to new security and performance challenges under restricted resources. The Elliptic Curve Digital Signature Algorithm (ECDSA) is the most suitable choice for the environments due to the smaller size of the encryption key and changeable security related parameters. However, major performance metrics such as area, power, latency and throughput are still customisable and based on the design requirements of the device.
The present paper puts forward an enhancement for the throughput performance metric by proposing a more efficient design for the hardware implementation of ECDSA. The design raised the throughput to 0.08207 Mbit/s, leading to an increase of 6.95% from the existing design. It also includes the design and implementation of the Universal Asynchronous Receiver Transmitter (UART) module. The present work is based on a 163-bit key-size over Koblitz curve k-163 and secure hash function SHA-1. A serial module for the underlying modular layer, high-speed architecture of Koblitz point addition and Koblitz point multiplication have been considered in this work, in addition to utilising the carry-save-multiplier, modular adder-subtractor and Extended Euclidean module for ECDSA protocols. All modules are designed using VHDL and implemented on the platform Virtex5 xc5vlx155t-3ff1738. Signature generation requires 0.55360ms, while its validation consumes 1.10947288ms. Thus, the total time required to complete both processes is equal to 1.66ms and the maximum frequency is approximately 83.477MHZ, consuming a power of 99mW with the efficiency approaching 3.39 * 10-6
Efficient Arithmetic for the Implementation of Elliptic Curve Cryptography
The technology of elliptic curve cryptography is now an important branch in public-key based crypto-system. Cryptographic mechanisms based on elliptic curves depend on the arithmetic of points on the curve. The most important arithmetic is multiplying a point on the curve by an integer. This operation is known as elliptic curve scalar (or point) multiplication operation. A cryptographic device is supposed to perform this operation efficiently and securely. The elliptic curve scalar multiplication operation is performed by combining the elliptic curve point routines that are defined in terms of the underlying finite field arithmetic operations. This thesis focuses on hardware architecture designs of elliptic curve operations. In the first part, we aim at finding new architectures to implement the finite field arithmetic multiplication operation more efficiently. In this regard, we propose novel schemes for the serial-out bit-level (SOBL) arithmetic multiplication operation in the polynomial basis over F_2^m. We show that the smallest SOBL scheme presented here can provide about 26-30\% reduction in area-complexity cost and about 22-24\% reduction in power consumptions for F_2^{163} compared to the current state-of-the-art bit-level multiplier schemes. Then, we employ the proposed SOBL schemes to present new hybrid-double multiplication architectures that perform two multiplications with latency comparable to the latency of a single multiplication. Then, in the second part of this thesis, we investigate the different algorithms for the implementation of elliptic curve scalar multiplication operation. We focus our interest in three aspects, namely, the finite field arithmetic cost, the critical path delay, and the protection strength from side-channel attacks (SCAs) based on simple power analysis. In this regard, we propose a novel scheme for the scalar multiplication operation that is based on processing three bits of the scalar in the exact same sequence of five point arithmetic operations. We analyse the security of our scheme and show that its security holds against both SCAs and safe-error fault attacks. In addition, we show how the properties of the proposed elliptic curve scalar multiplication scheme yields an efficient hardware design for the implementation of a single scalar multiplication on a prime extended twisted Edwards curve incorporating 8 parallel multiplication operations. Our comparison results show that the proposed hardware architecture for the twisted Edwards curve model implemented using the proposed scalar multiplication scheme is the fastest secure SCA protected scalar multiplication scheme over prime field reported in the literature
- …