732 research outputs found

    Efficient Bit-parallel Multiplication with Subquadratic Space Complexity in Binary Extension Field

    Get PDF
    Bit-parallel multiplication in GF(2^n) with subquadratic space complexity has been explored in recent years due to its lower area cost compared with traditional parallel multiplications. Based on \u27divide and conquer\u27 technique, several algorithms have been proposed to build subquadratic space complexity multipliers. Among them, Karatsuba algorithm and its generalizations are most often used to construct multiplication architectures with significantly improved efficiency. However, recursively using one type of Karatsuba formula may not result in an optimal structure for many finite fields. It has been shown that improvements on multiplier complexity can be achieved by using a combination of several methods. After completion of a detailed study of existing subquadratic multipliers, this thesis has proposed a new algorithm to find the best combination of selected methods through comprehensive search for constructing polynomial multiplication over GF(2^n). Using this algorithm, ameliorated architectures with shortened critical path or reduced gates cost will be obtained for the given value of n, where n is in the range of [126, 600] reflecting the key size for current cryptographic applications. With different input constraints the proposed algorithm can also yield subquadratic space multiplier architectures optimized for trade-offs between space and time. Optimized multiplication architectures over NIST recommended fields generated from the proposed algorithm are presented and analyzed in detail. Compared with existing works with subquadratic space complexity, the proposed architectures are highly modular and have improved efficiency on space or time complexity. Finally generalization of the proposed algorithm to be suitable for much larger size of fields discussed

    Synthesis and Optimization of Reversible Circuits - A Survey

    Full text link
    Reversible logic circuits have been historically motivated by theoretical research in low-power electronics as well as practical improvement of bit-manipulation transforms in cryptography and computer graphics. Recently, reversible circuits have attracted interest as components of quantum algorithms, as well as in photonic and nano-computing technologies where some switching devices offer no signal gain. Research in generating reversible logic distinguishes between circuit synthesis, post-synthesis optimization, and technology mapping. In this survey, we review algorithmic paradigms --- search-based, cycle-based, transformation-based, and BDD-based --- as well as specific algorithms for reversible synthesis, both exact and heuristic. We conclude the survey by outlining key open challenges in synthesis of reversible and quantum logic, as well as most common misconceptions.Comment: 34 pages, 15 figures, 2 table

    Secure and Efficient RNS Approach for Elliptic Curve Cryptography

    Get PDF
    Scalar multiplication, the main operation in elliptic curve cryptographic protocols, is vulnerable to side-channel (SCA) and fault injection (FA) attacks. An efficient countermeasure for scalar multiplication can be provided by using alternative number systems like the Residue Number System (RNS). In RNS, a number is represented as a set of smaller numbers, where each one is the result of the modular reduction with a given moduli basis. Under certain requirements, a number can be uniquely transformed from the integers to the RNS domain (and vice versa) and all arithmetic operations can be performed in RNS. This representation provides an inherent SCA and FA resistance to many attacks and can be further enhanced by RNS arithmetic manipulation or more traditional algorithmic countermeasures. In this paper, extending our previous work, we explore the potentials of RNS as an SCA and FA countermeasure and provide an description of RNS based SCA and FA resistance means. We propose a secure and efficient Montgomery Power Ladder based scalar multiplication algorithm on RNS and discuss its SCAFA resistance. The proposed algorithm is implemented on an ARM Cortex A7 processor and its SCA-FA resistance is evaluated by collecting preliminary leakage trace results that validate our initial assumptions

    A new class of irreducible pentanomials for polynomial-based multipliers in binary fields

    Get PDF
    We introduce a new class of irreducible pentanomials over F2\mathbb{F}_2 of the form f(x)=x2b+c+xb+c+xb+xc+1f(x) = x^{2b+c} + x^{b+c} + x^b + x^c + 1. Let m=2b+cm=2b+c and use ff to define the finite field extension of degree mm. We give the exact number of operations required for computing the reduction modulo ff. We also provide a multiplier based on Karatsuba algorithm in F2[x]\mathbb{F}_2[x] combined with our reduction process. We give the total cost of the multiplier and found that the bit-parallel multiplier defined by this new class of polynomials has improved XOR and AND complexity. Our multiplier has comparable time delay when compared to other multipliers based on Karatsuba algorithm

    High Speed and Low Latency ECC Implementation over GF(2m) on FPGA

    Get PDF
    In this paper, a novel high-speed elliptic curve cryptography (ECC) processor implementation for point multiplication (PM) on field-programmable gate array (FPGA) is proposed. A new segmented pipelined full-precision multiplier is used to reduce the latency, and the Lopez-Dahab Montgomery PM algorithm is modified for careful scheduling to avoid data dependency resulting in a drastic reduction in the number of clock cycles (CCs) required. The proposed ECC architecture has been implemented on Xilinx FPGAs' Virtex4, Virtex5, and Virtex7 families. To the best of our knowledge, our single- and three-multiplier-based designs show the fastest performance to date when compared with reported works individually. Our one-multiplier-based ECC processor also achieves the highest reported speed together with the best reported area-time performance on Virtex4 (5.32 ÎĽs at 210 MHz), on Virtex5 (4.91 ÎĽs at 228 MHz), and on the more advanced Virtex7 (3.18 ÎĽs at 352 MHz). Finally, the proposed three-multiplier-based ECC implementation is the first work reporting the lowest number of CCs and the fastest ECC processor design on FPGA (450 CCs to get 2.83 ÎĽs on Virtex7)

    A Chinese Remainder Theorem Approach to Bit-Parallel GF(2^n) Polynomial Basis Multipliers for Irreducible Trinomials

    Get PDF
    We show that the step “modulo the degree-n field generating irreducible polynomial” in the classical definition of the GF(2^n) multiplication operation can be avoided. This leads to an alternative representation of the finite field multiplication operation. Combining this representation and the Chinese Remainder Theorem, we design bit-parallel GF(2^n) multipliers for irreducible trinomials u^n + u^k + 1 on GF(2) where 1 < k &#8804; n=2. For some values of n, our architectures have the same time complexity as the fastest bit-parallel multipliers – the quadratic multipliers, but their space complexities are reduced. Take the special irreducible trinomial u^(2k) + u^k + 1 for example, the space complexity of the proposed design is reduced by about 1=8, while the time complexity matches the best result. Our experimental results show that among the 539 values of n such that 4 < n < 1000 and x^n+x^k+1 is irreducible over GF(2) for some k in the range 1 < k &#8804; n=2, the proposed multipliers beat the current fastest parallel multipliers for 290 values of n when (n &#8722; 1)=3 &#8804; k &#8804; n=2: they have the same time complexity, but the space complexities are reduced by 8.4% on average

    Overlap-free Karatsuba-Ofman Polynomial Multiplication Algorithms

    Get PDF
    We describe how a simple way to split input operands allows for fast VLSI implementations of subquadratic GF(2)[x]GF(2)[x] Karatsuba-Ofman multipliers. The theoretical XOR gate delay of the resulting multipliers is reduced significantly. For example, it is reduced by about 33\% and 25\% for n=2tn=2^{t} and n=3tn=3^{t} (t>1)(t>1), respectively. To the best of our knowledge, this parameter has never been improved since the original Karatsuba-Ofman algorithm was first used to design GF(2n)GF(2^n) multipliers in 1990

    Fast Modular Reduction for Large-Integer Multiplication

    Get PDF
    The work contained in this thesis is a representation of the successful attempt to speed-up the modular reduction as an independent step of modular multiplication, which is the central operation in public-key cryptosystems. Based on the properties of Mersenne and Quasi-Mersenne primes, four distinct sets of moduli have been described, which are responsible for converting the single-precision multiplication prevalent in many of today\u27s techniques into an addition operation and a few simple shift operations. A novel algorithm has been proposed for modular folding. With the backing of the special moduli sets, the proposed algorithm is shown to outperform (speed-wise) the Modified Barrett algorithm by 80% for operands of length 700 bits, the least speed-up being around 70% for smaller operands, in the range of around 100 bits

    N-term Karatsuba Algorithm and its Application to Multiplier designs for Special Trinomials

    Get PDF
    In this paper, we propose a new type of non-recursive Mastrovito multiplier for GF(2m)GF(2^m) using a nn-term Karatsuba algorithm (KA), where GF(2m)GF(2^m) is defined by an irreducible trinomial, xm+xk+1,m=nkx^m+x^k+1, m=nk. We show that such a type of trinomial combined with the nn-term KA can fully exploit the spatial correlation of entries in related Mastrovito product matrices and lead to a low complexity architecture. The optimal parameter nn is further studied. As the main contribution of this study, the lower bound of the space complexity of our proposal is about O(m22+m3/2)O(\frac{m^2}{2}+m^{3/2}). Meanwhile, the time complexity matches the best Karatsuba multiplier known to date. To the best of our knowledge, it is the first time that Karatsuba-based multiplier has reached such a space complexity bound while maintaining relatively low time delay
    • …
    corecore