
860 research outputs found

Low Power Reversible Parallel Binary Adder/Subtractor

Author: Muralidhara K N
Raja K B
Rangaraju H G
Venugopal U.
Publication venue: Academy and Industry Research Collaboration Center (AIRCC)
Publication date: 01/09/2010

In recent years, reversible logic has become an increasingly prominent technology, with applications in low-power CMOS, quantum computing, nanotechnology, and optical computing. Reversibility plays an important role when energy-efficient computation is considered. In this paper, three designs (Design I, Design II and Design III) of a reversible eight-bit parallel binary adder/subtractor are proposed. In all three design approaches, the full adder and full subtractor are realized in a single unit, whereas the existing design realizes only a full subtractor. Performance is evaluated in terms of the number of reversible gates, garbage inputs/outputs and quantum cost. The reversible eight-bit parallel binary adder/subtractor with Design III is observed to be more efficient than Design I, Design II and the existing design. Comment: 12 pages, VLSICS Journa

arXiv.org e-Print Archive

Crossref

ePrints@Bangalore University
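
The abstract above does not give the gate-level structure of Designs I to III, so the following is only a minimal sketch of the general idea, assuming a textbook construction: a full adder built from two Peres gates, with a Feynman (CNOT) gate conditionally inverting the second operand so that one unit performs both addition and subtraction. The function names and the ripple-carry arrangement are illustrative assumptions, not the authors' designs.

```python
def peres(a, b, c):
    """Peres gate: (a, b, c) -> (a, a^b, (a&b)^c). A reversible 3x3 mapping."""
    return a, a ^ b, (a & b) ^ c


def feynman(a, b):
    """Feynman (CNOT) gate: (a, b) -> (a, a^b)."""
    return a, a ^ b


def reversible_full_adder(a, b, cin):
    """Full adder from two Peres gates and one constant-0 input (assumed layout)."""
    _g1, p, ab = peres(a, b, 0)        # p = a^b, ab = a&b; _g1 is a garbage output
    _g2, s, cout = peres(p, cin, ab)   # s = a^b^cin, cout = (p&cin)^(a&b)
    return s, cout


def add_sub_8bit(x, y, ctrl):
    """ctrl = 0 computes x + y; ctrl = 1 computes x - y (two's complement).

    Returns (8-bit result, carry-out). Hypothetical helper for illustration.
    """
    carry = ctrl                       # carry-in of 1 completes the two's complement
    result = 0
    for i in range(8):
        a = (x >> i) & 1
        b = (y >> i) & 1
        _, b_eff = feynman(ctrl, b)    # invert b only when subtracting
        s, carry = reversible_full_adder(a, b_eff, carry)
        result |= s << i
    return result, carry
```

With these definitions, `add_sub_8bit(100, 27, 0)` returns `(127, 0)` and `add_sub_8bit(100, 27, 1)` returns `(73, 1)`; in subtraction mode a carry-out of 1 means no borrow occurred.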

Efficient Bit-parallel Multiplication with Subquadratic Space Complexity in Binary Extension Field

Author: Duan Xiaolin
Publication venue: University of Windsor Leddy Library
Publication date: 31/08/2018

Bit-parallel multiplication in GF(2^n) with subquadratic space complexity has been explored in recent years because of its lower area cost compared with traditional parallel multiplication. Based on the 'divide and conquer' technique, several algorithms have been proposed to build multipliers with subquadratic space complexity. Among them, the Karatsuba algorithm and its generalizations are most often used to construct multiplication architectures with significantly improved efficiency. However, recursively applying a single type of Karatsuba formula may not yield an optimal structure for many finite fields. It has been shown that multiplier complexity can be improved by combining several methods. Following a detailed study of existing subquadratic multipliers, this thesis proposes a new algorithm that finds the best combination of selected methods through a comprehensive search for constructing polynomial multiplication over GF(2^n). Using this algorithm, improved architectures with a shortened critical path or reduced gate cost are obtained for a given value of n, where n lies in the range [126, 600], reflecting the key sizes of current cryptographic applications. With different input constraints, the proposed algorithm can also yield subquadratic space multiplier architectures optimized for trade-offs between space and time. Optimized multiplication architectures over NIST-recommended fields generated by the proposed algorithm are presented and analyzed in detail. Compared with existing works with subquadratic space complexity, the proposed architectures are highly modular and have improved space or time complexity. Finally, a generalization of the proposed algorithm to much larger fields is discussed

Scholarship at UWindsor
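
To illustrate the divide-and-conquer idea this abstract builds on, here is a minimal sketch of the basic two-way Karatsuba formula for polynomials over GF(2), with polynomials encoded as Python integers (bit i is the coefficient of x^i). It is not the thesis's search algorithm; the function names, the `threshold` parameter and the fallback to schoolbook multiplication are assumptions made for illustration, and reduction modulo a field polynomial is omitted.

```python
def clmul_schoolbook(a, b):
    """Carry-less (GF(2)[x]) product of two polynomials encoded as ints."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition over GF(2) is XOR
        a <<= 1
        b >>= 1
    return result


def clmul_karatsuba(a, b, n, threshold=8):
    """Two-way Karatsuba product of polynomials with fewer than n coefficients.

    Below `threshold` coefficients (an assumed tuning knob) the schoolbook
    method is used instead of recursing further.
    """
    if n <= threshold or n <= 1:
        return clmul_schoolbook(a, b)
    half = (n + 1) // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    lo = clmul_karatsuba(a_lo, b_lo, half, threshold)
    hi = clmul_karatsuba(a_hi, b_hi, n - half, threshold)
    mid = clmul_karatsuba(a_lo ^ a_hi, b_lo ^ b_hi, half, threshold)
    mid ^= lo ^ hi               # subtraction over GF(2) is also XOR
    return lo ^ (mid << half) ^ (hi << (2 * half))
```

Each level replaces four half-size products with three, which is where the subquadratic gate count comes from; for example, `clmul_karatsuba(0b11, 0b11, 2, threshold=1)` returns `0b101`, i.e. (x + 1)^2 = x^2 + 1.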

Study on bit parallel and serial arithmetic logic approaches

Author: Vähäsöyrinki V. (Veikko)
Publication venue: University of Oulu
Publication date: 10/03/2023

Abstract. This paper provides a general overview of how computers represent numbers and perform arithmetic. Different ways to implement digital arithmetic logic are presented. Bit-serial designs can save chip real estate, but require more clock cycles for arithmetic operations such as addition and multiplication. Bit-parallel designs produce results in fewer clock cycles, but require more gates, e.g., due to carry-look-ahead generators. This may translate into higher power dissipation. This BSc thesis presents an exploration of bit-serial-parallel and bit-parallel arithmetic logic designs. The intention is to gain an understanding of their basic design characteristics

University of Oulu Repository - Jultika
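
The trade-off described in this abstract can be sketched in a few lines. The model below is an assumption for illustration only: the bit-serial adder reuses one full adder and is charged one cycle per bit, while the carry-look-ahead adder is charged a single cycle. The function names are hypothetical, and the software loop over the carry recurrence stands in for logic that hardware would flatten into parallel gates.

```python
def bit_serial_add(x, y, width):
    """One full adder reused for `width` cycles; the carry lives in a flip-flop."""
    carry = 0
    result = 0
    cycles = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s = a ^ b ^ carry
        carry = (a & b) | (carry & (a ^ b))
        result |= s << i
        cycles += 1                      # one bit per clock cycle
    return result, carry, cycles


def carry_lookahead_add(x, y, width):
    """Bit-parallel adder using generate/propagate signals (modelled as 1 cycle)."""
    g = [((x >> i) & (y >> i)) & 1 for i in range(width)]   # generate
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(width)]   # propagate
    carries = [0]
    for i in range(width):
        # c[i+1] = g[i] | (p[i] & c[i]); hardware expands this recurrence
        # into extra gates so that all carries are available at once.
        carries.append(g[i] | (p[i] & carries[i]))
    result = 0
    for i in range(width):
        result |= (p[i] ^ carries[i]) << i
    return result, carries[width], 1
```

Under this model, `bit_serial_add(13, 7, 8)` returns `(20, 0, 8)` and `carry_lookahead_add(13, 7, 8)` returns `(20, 0, 1)`: the same sum, with cycles traded for gates.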

Error Detecting Dual Basis Bit Parallel Systolic Multiplication Architecture over GF(2^m)

Author: Bera A.
Mathew J.
Pradhan D.
Rahaman H.
Singh Ashutosh Kumar
Publication venue: University of Electronic Science and Technology
Publication date: 01/01/2009

An error-tolerant, hardware-efficient very large scale integration (VLSI) architecture for bit-parallel systolic multiplication over the dual basis, which can be pipelined, is presented. Because this architecture has the features of regularity, modularity and unidirectional data flow, it is well suited to VLSI implementation. Both the longest delay path and the area of this architecture are smaller than those of the bit-parallel systolic multiplication architectures reported earlier. The architecture is implemented using Austria Micro System's 0.35 µm CMOS (complementary metal oxide semiconductor) technology. This architecture can also operate over both the dual basis and the polynomial basis

espace@Curtin

High-Speed Area-Efficient Hardware Architecture for the Efficient Detection of Faults in a Bit-Parallel Multiplier Utilizing the Polynomial Basis of GF(2^m)

Author: Javidan Javad
Nabipour Saeideh
Publication date: 23/06/2023

Finite field multipliers are pervasive in contemporary digital systems, and hardware implementations for bit-parallel operation often require millions of logic gates. However, various digital design issues, whether natural or stemming from soft errors, can cause gates to malfunction, ultimately leading to erroneous multiplier outputs. To prevent susceptibility to error, it is therefore imperative to employ an effective finite field multiplier implementation with a robust fault detection capability. This study proposes a novel fault detection scheme for a recent bit-parallel polynomial basis multiplier over GF(2^m), intended to achieve optimal fault detection performance for finite field multipliers while maintaining a low-complexity implementation, a favored attribute in resource-constrained applications such as smart cards. The proposed approach centers on a BCH decoder that uses a re-encoding technique and the FIBM algorithm in its first and second sub-modules, respectively. This addresses hardware complexity concerns, while the third sub-module of the decoder uses the Berlekamp-Rumsey-Solomon (BRS) algorithm and the Chien search method to locate errors with minimal delay. The results of our synthesis indicate that our proposed error detection and correction architecture for a 45-bit multiplier with 5-bit errors achieves a 37% and 49% reduction in critical path delay compared to existing designs. Furthermore, the hardware complexity associated with a 45-bit multiplicand that contains 5 errors is confined to a mere 80%, which is significantly lower than the most exceptional BCH-based fault recognition methodologies, including TMR, Hamming's single error correction, and LDPC-based procedures within the realm of finite field multiplication. Comment: 9 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:2209.1338

arXiv.org e-Print Archive

An Efficient CRT-based Bit-parallel Multiplier for Special Pentanomials

Author: Yin Li
Yu Zhang
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 10/06/2020

The Chinese remainder theorem (CRT)-based multiplier is a new type of hybrid bit-parallel multiplier, which can achieve nearly the same time complexity compared with the fastest multiplier known to date with reduced space complexity. However, the current CRT-based multipliers are only applicable to trinomials. In this paper, we propose an efficient CRT-based bit-parallel multiplier for a special type of pentanomial $x^m+x^{m-k}+x^{m-2k}+x^{m-3k}+1$, $5k<m\leq 7k$. Through transforming the non-constant part $x^m+x^{m-k}+x^{m-2k}+x^{m-3k}$ into a binomial, we can obtain relatively simpler quotient and remainder computations, which lead to faster implementation with reduced space complexity compared with classic quadratic multipliers. Moreover, for some $m$, our proposal can achieve the same time delay as the fastest multipliers for irreducible Type II and Type C.1 pentanomials of the same degree, but the space complexities are reduced

Cryptology ePrint Archive
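
The abstract does not spell out how the non-constant part becomes a binomial. One plausible reading, stated here as an assumption and not as the authors' derivation, uses the fact that coefficients are taken over GF(2), so equal terms cancel in pairs:

```latex
\begin{aligned}
x^m + x^{m-k} + x^{m-2k} + x^{m-3k}
  &= x^{m-3k}\left(x^{3k} + x^{2k} + x^{k} + 1\right), \\
(x^{k} + 1)\left(x^{3k} + x^{2k} + x^{k} + 1\right)
  &= x^{4k} + 1 \quad \text{over } \mathrm{GF}(2), \\
(x^{k} + 1)\left(x^m + x^{m-k} + x^{m-2k} + x^{m-3k}\right)
  &= x^{m+k} + x^{m-3k}.
\end{aligned}
```

Under this reading, multiplying the four-term part by $x^k+1$ leaves only two terms, which is the kind of binomial structure that makes quotient and remainder computations simpler.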

BP-NTT: Fast and Compact in-SRAM Number Theoretic Transform with Bit-Parallel Modular Multiplication

Author: Imani Mohsen
Sadredini Elaheh
Zhang Jingyao
Publication date: 22/04/2023

The Number Theoretic Transform (NTT) is an essential mathematical tool for computing polynomial multiplication in promising lattice-based cryptography. However, costly division operations and complex data dependencies make efficient and flexible hardware design challenging, especially on resource-constrained edge devices. Existing approaches either focus on only limited parameter settings or impose substantial hardware overhead. In this paper, we introduce a hardware-algorithm methodology to efficiently accelerate NTT in various settings using in-cache computing. By leveraging an optimized bit-parallel modular multiplication and introducing costless shift operations, our proposed solution provides up to 29x higher throughput-per-area and 2.8-100x better throughput-per-area-per-joule compared to the state-of-the-art. Comment: This work is accepted to the 60th Design Automation Conference (DAC), 202

arXiv.org e-Print Archive
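
As background for this entry, the following is a minimal sketch of how an NTT turns polynomial multiplication into pointwise multiplication. It uses a direct O(n^2) transform with toy parameters chosen for illustration (modulus 17, length 8, root of unity 2); these values, the function names, and the cyclic (mod x^n - 1) convolution are assumptions and are unrelated to the in-SRAM design of the paper. The `%` operator used here is exactly the kind of costly reduction that hardware designs try to avoid.

```python
Q = 17       # toy prime modulus (assumed for illustration)
N = 8        # transform length
OMEGA = 2    # 2 has multiplicative order 8 modulo 17


def ntt(coeffs, omega=OMEGA, q=Q):
    """Direct O(n^2) number theoretic transform of a coefficient list."""
    n = len(coeffs)
    return [
        sum(c * pow(omega, i * j, q) for j, c in enumerate(coeffs)) % q
        for i in range(n)
    ]


def intt(values, omega=OMEGA, q=Q):
    """Inverse transform: use omega^-1 and scale by n^-1 (needs Python 3.8+)."""
    n = len(values)
    inv_n = pow(n, -1, q)
    inv_omega = pow(omega, -1, q)
    return [(inv_n * v) % q for v in ntt(values, inv_omega, q)]


def cyclic_multiply(a, b, omega=OMEGA, q=Q):
    """Multiply two polynomials modulo (x^n - 1, q) via pointwise products."""
    fa = ntt(a, omega, q)
    fb = ntt(b, omega, q)
    return intt([(x * y) % q for x, y in zip(fa, fb)], omega, q)
```

For example, multiplying 1 + 2x by 3 + 4x with `cyclic_multiply([1, 2, 0, 0, 0, 0, 0, 0], [3, 4, 0, 0, 0, 0, 0, 0])` returns `[3, 10, 8, 0, 0, 0, 0, 0]`, i.e. 3 + 10x + 8x^2.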

Bit-level pipelined digit-serial array processors

Author: Aggoun A
Ashur A
Ibrahim MK
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/07/1998

A new architecture for a high-performance digit-serial vector inner product (VIP) that can be pipelined to the bit level is introduced. The design of the digit-serial vector inner product is based on a new systematic design methodology using radix-2^n arithmetic. The proposed architecture allows a high degree of bit-level pipelining to increase the throughput rate with minimum initial delay and minimum area. This gives designers greater flexibility in finding the best trade-off between hardware cost and throughput rate. It is shown that a sub-digit pipelined digit-serial structure can achieve a higher throughput rate with much less area than an equivalent bit-parallel structure. A twin-pipe architecture that doubles the throughput rate of digit-serial multipliers, and consequently that of the digit-serial vector inner product, is also presented. The effects of the number of pipelining levels and of the twin-pipe architecture on the throughput rate and hardware cost are discussed. A two's complement digit-serial architecture which can operate on both negative and positive numbers is also presented

Crossref

Brunel University Research Archive
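
The digit-serial idea in the last entry can be sketched as follows, under stated assumptions: one operand is consumed one radix-2^n digit per cycle, so the digit width sets the trade-off between cycle count and per-cycle hardware. This models only unsigned arithmetic and cycle counts; the function names are hypothetical, and the pipelining and twin-pipe techniques from the abstract are not represented.

```python
def digit_serial_multiply(a, b, width, digit_bits):
    """Unsigned multiply that consumes `b` one radix-2^digit_bits digit per cycle.

    Returns (product, cycles). `width` is the bit width of `b`.
    """
    mask = (1 << digit_bits) - 1
    acc = 0
    cycles = 0
    for shift in range(0, width, digit_bits):
        digit = (b >> shift) & mask      # next digit of the serial operand
        acc += (a * digit) << shift      # partial product at the digit's weight
        cycles += 1
    return acc, cycles


def vector_inner_product(xs, ys, width, digit_bits):
    """Inner product built from digit-serial multiplies (assumed structure)."""
    total = 0
    for x, y in zip(xs, ys):
        product, _cycles = digit_serial_multiply(x, y, width, digit_bits)
        total += product
    return total
```

With an 8-bit operand, `digit_serial_multiply(13, 11, 8, 2)` returns `(143, 4)` and `digit_serial_multiply(13, 11, 8, 4)` returns `(143, 2)`; a digit width of 1 corresponds to bit-serial operation and a digit width of 8 to fully bit-parallel operation.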