860 research outputs found

    Low Power Reversible Parallel Binary Adder/Subtractor

    Get PDF
    In recent years, Reversible Logic is becoming more and more prominent technology having its applications in Low Power CMOS, Quantum Computing, Nanotechnology, and Optical Computing. Reversibility plays an important role when energy efficient computations are considered. In this paper, Reversible eight-bit Parallel Binary Adder/Subtractor with Design I, Design II and Design III are proposed. In all the three design approaches, the full Adder and Subtractors are realized in a single unit as compared to only full Subtractor in the existing design. The performance analysis is verified using number reversible gates, Garbage input/outputs and Quantum Cost. It is observed that Reversible eight-bit Parallel Binary Adder/Subtractor with Design III is efficient compared to Design I, Design II and existing design.Comment: 12 pages,VLSICS Journa

    Efficient Bit-parallel Multiplication with Subquadratic Space Complexity in Binary Extension Field

    Get PDF
    Bit-parallel multiplication in GF(2^n) with subquadratic space complexity has been explored in recent years due to its lower area cost compared with traditional parallel multiplications. Based on \u27divide and conquer\u27 technique, several algorithms have been proposed to build subquadratic space complexity multipliers. Among them, Karatsuba algorithm and its generalizations are most often used to construct multiplication architectures with significantly improved efficiency. However, recursively using one type of Karatsuba formula may not result in an optimal structure for many finite fields. It has been shown that improvements on multiplier complexity can be achieved by using a combination of several methods. After completion of a detailed study of existing subquadratic multipliers, this thesis has proposed a new algorithm to find the best combination of selected methods through comprehensive search for constructing polynomial multiplication over GF(2^n). Using this algorithm, ameliorated architectures with shortened critical path or reduced gates cost will be obtained for the given value of n, where n is in the range of [126, 600] reflecting the key size for current cryptographic applications. With different input constraints the proposed algorithm can also yield subquadratic space multiplier architectures optimized for trade-offs between space and time. Optimized multiplication architectures over NIST recommended fields generated from the proposed algorithm are presented and analyzed in detail. Compared with existing works with subquadratic space complexity, the proposed architectures are highly modular and have improved efficiency on space or time complexity. Finally generalization of the proposed algorithm to be suitable for much larger size of fields discussed

    Study on bit parallel and serial arithmetic logic approaches

    Get PDF
    Abstract. This paper provides general overview of how computers process numbers and how computers do arithmetic. Different ways to implement digital arithmetic logic are presented. Bit-serial designs can save chip real estate, but require more clock cycles for arithmetic operations such as additions and multiplications. Bit-parallel designs produce results with fewer clock cycles, but require more gates, e.g., due to carry-look-ahead generators. This may translate into higher power dissipation. This BSc thesis presents an exploration of bit-serial-parallel and bit-parallel arithmetic logic designs. The intention is to gain understanding of their basic design characteristics

    Error Detecting Dual Basis Bit Parallel Systolic Multiplication Architecture over GF(2m)

    Get PDF
    An error tolerant hardware efficient very large scale integration (VLSI) architecture for bit parallel systolic multiplication over dual base, which can be pipelined, is presented. Since this architecture has the features of regularity, modularity and unidirectional data flow, this structure is well suited to VLSI implementations. The length of the largest delay path and area of this architecture are less compared to the bit parallel systolic multiplication architectures reported earlier. The architecture is implemented using Austria Micro System's 0.35 m CMOS (complementary metal oxide semiconductor) technology. This architecture can also operate over both the dual-base and polynomial base

    High-Speed Area-Efficient Hardware Architecture for the Efficient Detection of Faults in a Bit-Parallel Multiplier Utilizing the Polynomial Basis of GF(2m)

    Full text link
    The utilization of finite field multipliers is pervasive in contemporary digital systems, with hardware implementation for bit parallel operation often necessitating millions of logic gates. However, various digital design issues, whether natural or stemming from soft errors, can result in gate malfunction, ultimately leading to erroneous multiplier outputs. Thus, to prevent susceptibility to error, it is imperative to employ an effective finite field multiplier implementation that boasts a robust fault detection capability. This study proposes a novel fault detection scheme for a recent bit-parallel polynomial basis multiplier over GF(2m), intended to achieve optimal fault detection performance for finite field multipliers while simultaneously maintaining a low-complexity implementation, a favored attribute in resource-constrained applications like smart cards. The primary concept behind the proposed approach is centered on the implementation of a BCH decoder that utilizes re-encoding technique and FIBM algorithm in its first and second sub-modules, respectively. This approach serves to address hardware complexity concerns while also making use of Berlekamp-Rumsey-Solomon (BRS) algorithm and Chien search method in the third sub-module of the decoder to effectively locate errors with minimal delay. The results of our synthesis indicate that our proposed error detection and correction architecture for a 45-bit multiplier with 5-bit errors achieves a 37% and 49% reduction in critical path delay compared to existing designs. Furthermore, the hardware complexity associated with a 45-bit multiplicand that contains 5 errors is confined to a mere 80%, which is significantly lower than the most exceptional BCH-based fault recognition methodologies, including TMR, Hamming's single error correction, and LDPC-based procedures within the realm of finite field multiplication.Comment: 9 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:2209.1338

    An Efficient CRT-based Bit-parallel Multiplier for Special Pentanomials

    Get PDF
    The Chinese remainder theorem (CRT)-based multiplier is a new type of hybrid bit-parallel multiplier, which can achieve nearly the same time complexity compared with the fastest multiplier known to date with reduced space complexity. However, the current CRT-based multipliers are only applicable to trinomials. In this paper, we propose an efficient CRT-based bit-parallel multiplier for a special type of pentanomial xm+xmk+xm2k+xm3k+1,5k<m7kx^m+x^{m-k}+x^{m-2k}+x^{m-3k}+1, 5k<m\leq 7k. Through transforming the non-constant part xm+xmk+xm2k+xm3kx^m+x^{m-k}+x^{m-2k}+x^{m-3k} into a binomial, we can obtain relatively simpler quotient and remainder computations, which lead to faster implementation with reduced space complexity compared with classic quadratic multipliers. Moreover, for some mm, our proposal can achieve the same time delay as the fastest multipliers for irreducible Type II and Type C.1 pentanomials of the same degree, but the space complexities are reduced

    BP-NTT: Fast and Compact in-SRAM Number Theoretic Transform with Bit-Parallel Modular Multiplication

    Full text link
    Number Theoretic Transform (NTT) is an essential mathematical tool for computing polynomial multiplication in promising lattice-based cryptography. However, costly division operations and complex data dependencies make efficient and flexible hardware design to be challenging, especially on resource-constrained edge devices. Existing approaches either focus on only limited parameter settings or impose substantial hardware overhead. In this paper, we introduce a hardware-algorithm methodology to efficiently accelerate NTT in various settings using in-cache computing. By leveraging an optimized bit-parallel modular multiplication and introducing costless shift operations, our proposed solution provides up to 29x higher throughput-per-area and 2.8-100x better throughput-per-area-per-joule compared to the state-of-the-art.Comment: This work is accepted to the 60th Design Automation Conference (DAC), 202

    Bit-level pipelined digit-serial array processors

    Get PDF
    A new architecture for high performance digit-serial vector inner product (VIP) which can be pipelined to the bit-level is introduced. The design of the digit-serial vector inner product is based on a new systematic design methodology using radix-2n arithmetic. The proposed architecture allows a high level of bit-level pipelining to increase the throughput rate with minimum initial delay and minimum area. This will give designers greater flexibility in finding the best tradeoff between hardware cost and throughput rate. It is shown that sub-digit pipelined digit-serial structure can achieve a higher throughput rate with much less area consumption than an equivalent bit-parallel structure. A twin-pipe architecture to double the throughput rate of digit-serial multipliers and consequently that of the digit-serial vector inner product is also presented. The effect of the number of pipelining levels and the twin-pipe architecture on the throughput rate and hardware cost are discussed. A two's complement digit-serial architecture which can operate on both negative and positive numbers is also presented
    corecore