702 research outputs found

    The MFIBVP real-time multiplier

    Get PDF
    This paper presents the architecture of the MFIBVP real-time multiplier which is The MFIBVP technique is a combination of the MSB–First computation, the Interval-Bounded Arithmetic and the Variable-Precision computation techniques.The MFIBVP computation guarantees the computation carried out will produce high accuracy from the early computation time, self error estimation and time-optimal computation. This paper shows the performance of the MFIBVP real-time multiplier unit that can gives accuracy of it’s intermediate-result more than 99% since the second phase of its process

    Serial-data computation in VLSI

    Get PDF

    Improved Memoryless RNS Forward Converter Based on the Periodicity of Residues

    Get PDF
    The residue number system (RNS) is suitable for DSP architectures because of its ability to perform fast carry-free arithmetic. However, this advantage is over-shadowed by the complexity involved in the conversion of numbers between binary and RNS representations. Although the reverse conversion (RNS to binary) is more complex, the forward transformation is not simple either. Most forward converters make use of look-up tables (memory). Recently, a memoryless forward converter architecture for arbitrary moduli sets was proposed by Premkumar in 2002. In this paper, we present an extension to that architecture which results in 44% less hardware for parallel conversion and achieves 43% improvement in speed for serial conversions. It makes use of the periodicity properties of residues obtained using modular exponentiation

    Efficient Arithmetic for the Implementation of Elliptic Curve Cryptography

    Get PDF
    The technology of elliptic curve cryptography is now an important branch in public-key based crypto-system. Cryptographic mechanisms based on elliptic curves depend on the arithmetic of points on the curve. The most important arithmetic is multiplying a point on the curve by an integer. This operation is known as elliptic curve scalar (or point) multiplication operation. A cryptographic device is supposed to perform this operation efficiently and securely. The elliptic curve scalar multiplication operation is performed by combining the elliptic curve point routines that are defined in terms of the underlying finite field arithmetic operations. This thesis focuses on hardware architecture designs of elliptic curve operations. In the first part, we aim at finding new architectures to implement the finite field arithmetic multiplication operation more efficiently. In this regard, we propose novel schemes for the serial-out bit-level (SOBL) arithmetic multiplication operation in the polynomial basis over F_2^m. We show that the smallest SOBL scheme presented here can provide about 26-30\% reduction in area-complexity cost and about 22-24\% reduction in power consumptions for F_2^{163} compared to the current state-of-the-art bit-level multiplier schemes. Then, we employ the proposed SOBL schemes to present new hybrid-double multiplication architectures that perform two multiplications with latency comparable to the latency of a single multiplication. Then, in the second part of this thesis, we investigate the different algorithms for the implementation of elliptic curve scalar multiplication operation. We focus our interest in three aspects, namely, the finite field arithmetic cost, the critical path delay, and the protection strength from side-channel attacks (SCAs) based on simple power analysis. In this regard, we propose a novel scheme for the scalar multiplication operation that is based on processing three bits of the scalar in the exact same sequence of five point arithmetic operations. We analyse the security of our scheme and show that its security holds against both SCAs and safe-error fault attacks. In addition, we show how the properties of the proposed elliptic curve scalar multiplication scheme yields an efficient hardware design for the implementation of a single scalar multiplication on a prime extended twisted Edwards curve incorporating 8 parallel multiplication operations. Our comparison results show that the proposed hardware architecture for the twisted Edwards curve model implemented using the proposed scalar multiplication scheme is the fastest secure SCA protected scalar multiplication scheme over prime field reported in the literature

    A Gate-Array Realization of an Algorithm for Division

    Get PDF
    A realization of a division algorithm suitable for high speed pipeline and realtime processors is presented. Implementation of the divide algorithm can be achieved by utilizing LSI / VLSI gate array technology. The divider performs precision, high speed 9 bit sign magnitude division. The design consist of combinational logic, where input and output data are latched into input and output registers. Data propagates through 16 divide stages. The n\u27th stage generates the n\u27th quotient bit upon receiving the updated dividend and controls from the previous stage. A simulation program is developed to verify the algorithm, and an analysis for speed performance and cost is provided. Other division algorithms are discussed

    High speed world level finite field multipliers in F2m

    Get PDF
    Finite fields have important applications in number theory, algebraic geometry, Galois theory, cryptography, and coding theory. Recently, the use of finite field arithmetic in the area of cryptography has increasingly gained importance. Elliptic curve and El-Gamal cryptosystems are two important examples of public key cryptosystems widely used today based on finite field arithmetic. Research in this area is moving toward finding new architectures to implement the arithmetic operations more efficiently. Two types of finite fields are commonly used in practice, prime field GF(p) and the binary extension field GF(2 m). The binary extension fields are attractive for high speed cryptography applications since they are suitable for hardware implementations. Hardware implementation of finite field multipliers can usually be categorized into three categories: bit-serial, bit-parallel, and word-level architectures. The word-level multipliers provide architectural flexibility and trade-off between the performance and limitations of VLSI implementation and I/O ports, thus it is of more practical significance. In this work, different word level architectures for multiplication using binary field are proposed. It has been shown that the proposed architectures are more efficient compared to similar proposals considering area/delay complexities as a measure of performance. Practical size multipliers for cryptography applications have been realized in hardware using FPGA or standard CMOS technology, to similar proposals considering area/delay complexities as a measure of performance. Practical size multipliers for cryptography applications have been realized in hardware using FPGA or standard CMOS technology. Also different VLSI implementations for multipliers were explored which resulted in more efficient implementations for some of the regular architectures. The new implementations use a simple module designed in domino logic as the main building block for the multiplier. Significant speed improvements was achieved designing practical size multipliers using the proposed methodology

    ACCELERATION OF SPARSE MATRIX MULTIPLICATION USING BIT-SERIAL ARITHMETIC

    Get PDF
    Machine Learning inference requires the multiplication of large, sparse matrices. We argue that direct spatial implementation of these fixed matrices minimizes the work per- formed in the computation, and allows for significant reduction in latency and power through constant propagation and logic minimization. Bit-serial arithmetic enables massive static matrices to be implemented. We present the structure of our bit-serial matrix multiplier, and evaluate using canonical signed digit representation to further reduce logic utilization. We have implemented these matrices on a large FPGA and provide a cost model that is simple and extensible. These FPGA implementations, on average, reduce latency by 50x up to 86x versus GPU libraries. Comparing against a recent sparse DNN accelerator, we measure a 4.1x to 47x reduction in latency depending on matrix dimension and sparsity. Throughput of the FPGA solution is also competitive for a wide range of matrix dimensions and batch sizes. Finally, we discuss ways these techniques could be deployed in ASICs, making them applicable for dynamic sparse matrix computations.M.S

    One way Doppler extractor. Volume 1: Vernier technique

    Get PDF
    A feasibility analysis, trade-offs, and implementation for a One Way Doppler Extraction system are discussed. A Doppler error analysis shows that quantization error is a primary source of Doppler measurement error. Several competing extraction techniques are compared and a Vernier technique is developed which obtains high Doppler resolution with low speed logic. Parameter trade-offs and sensitivities for the Vernier technique are analyzed, leading to a hardware design configuration. A detailed design, operation, and performance evaluation of the resulting breadboard model is presented which verifies the theoretical performance predictions. Performance tests have verified that the breadboard is capable of extracting Doppler, on an S-band signal, to an accuracy of less than 0.02 Hertz for a one second averaging period. This corresponds to a range rate error of no more than 3 millimeters per second

    MSB-First Interval-Bounded Variable-Precision RealTime Arithmetic Unit

    Get PDF
    This paper presents a paradigm of real-time processing on the lowest level of computing systems: the arithmetic unit. The arithmetic unit based on this principle containing addition, subtraction, multiplication and division operations is  described.  The  development  of  the  computation  model  is  based  on  the  Soft Computing and the Imprecise Computation paradigms, combined with the MSBFirst  and  the  Interval  Arithmetic  techniques.  Those  paradigms  and  techniques give  the  arithmetic  unit  design  the  ability  to  compute  with  precisions  as  a function  of time available or accuracy needed. The predictability of processing time and result's accuracy are obtained by means of processing granularity of k bits and by  using look-up tables. We present an evaluation of  the operation in time  delay  and  computation  accuracy  that  shows  significant  performance improvement over conventional arithmetic unit architecture, that is,  the ability to produce  intermediate-result  during  execution  time,  to  give  certainty  in computation  accuracy  even  before  the  process  finish  time  by  providing  two intermediate-results,  which  act  as  the  lower  and  upper  bound  of  the  real  and complete computation result, and finally, gain high computation accuracy from the early time of the execution process

    High Speed and Low-Complexity Hardware Architectures for Elliptic Curve-Based Crypto-Processors

    Get PDF
    The elliptic curve cryptography (ECC) has been identified as an efficient scheme for public-key cryptography. This thesis studies efficient implementation of ECC crypto-processors on hardware platforms in a bottom-up approach. We first study efficient and low-complexity architectures for finite field multiplications over Gaussian normal basis (GNB). We propose three new low-complexity digit-level architectures for finite field multiplication. Architectures are modified in order to make them more suitable for hardware implementations specially focusing on reducing the area usage. Then, for the first time, we propose a hybrid digit-level multiplier architecture which performs two multiplications together (double-multiplication) with the same number of clock cycles required as the one for one multiplication. We propose a new hardware architecture for point multiplication on newly introduced binary Edwards and generalized Hessian curves. We investigate higher level parallelization and lower level scheduling for point multiplication on these curves. Also, we propose a highly parallel architecture for point multiplication on Koblitz curves by modifying the addition formulation. Several FPGA implementations exploiting these modifications are presented in this thesis. We employed the proposed hybrid multiplier architecture to reduce the latency of point multiplication in ECC crypto-processors as well as the double-exponentiation. This scheme is the first known method to increase the speed of point multiplication whenever parallelization fails due to the data dependencies amongst lower level arithmetic computations. Our comparison results show that our proposed multiplier architectures outperform the counterparts available in the literature. Furthermore, fast computation of point multiplication on different binary elliptic curves is achieved
    corecore