17 research outputs found

    Fast hybrid Karatsuba multiplier for Type II pentanomials

    Get PDF
    We continue the study of Mastrovito form of Karatsuba multipliers under the shifted polynomial basis (SPB), recently introduced by Li et al. (IEEE TC (2017)). A Mastrovito-Karatsuba (MK) multiplier utilizes the Karatsuba algorithm (KA) to optimize polynomial multiplication and the Mastrovito approach to combine it with the modular reduction. The authors developed a MK multiplier for all trinomials, which obtain a better space and time trade-off compared with previous non-recursive Karatsuba counterparts. Based on this work, we make two types of contributions in our paper. FORMULATION. We derive a new modular reduction formulation for constructing Mastrovito matrix associated with Type II pentanomial. This formula can also be applied to other special type of pentanomials, e.g. Type I pentanomial and Type C.1 pentanomial. Through related formulations, we demonstrate that Type I pentanomial is less efficient than Type II one because of a more complicated modular reduction under the same SPB; conversely, Type C.1 pentanomial is as good as Type II pentanomial under an alternative generalized polynomial basis (GPB). EXTENSION. We introduce a new MK multiplier for Type II pentanomial. It is shown that our proposal is only one TXT_X slower than the fastest bit-parallel multipliers for Type II pentanomial, but its space complexity is roughly 3/4 of those schemes, where TXT_X is the delay of one 2-input XOR gate. To the best of our knowledge, it is the first time for hybrid multiplier to achieve such a time delay bound

    An Efficient CRT-based Bit-parallel Multiplier for Special Pentanomials

    Get PDF
    The Chinese remainder theorem (CRT)-based multiplier is a new type of hybrid bit-parallel multiplier, which can achieve nearly the same time complexity compared with the fastest multiplier known to date with reduced space complexity. However, the current CRT-based multipliers are only applicable to trinomials. In this paper, we propose an efficient CRT-based bit-parallel multiplier for a special type of pentanomial xm+xmk+xm2k+xm3k+1,5k<m7kx^m+x^{m-k}+x^{m-2k}+x^{m-3k}+1, 5k<m\leq 7k. Through transforming the non-constant part xm+xmk+xm2k+xm3kx^m+x^{m-k}+x^{m-2k}+x^{m-3k} into a binomial, we can obtain relatively simpler quotient and remainder computations, which lead to faster implementation with reduced space complexity compared with classic quadratic multipliers. Moreover, for some mm, our proposal can achieve the same time delay as the fastest multipliers for irreducible Type II and Type C.1 pentanomials of the same degree, but the space complexities are reduced

    Novel Single and Hybrid Finite Field Multipliers over GF(2m) for Emerging Cryptographic Systems

    Get PDF
    With the rapid development of economic and technical progress, designers and users of various kinds of ICs and emerging embedded systems like body-embedded chips and wearable devices are increasingly facing security issues. All of these demands from customers push the cryptographic systems to be faster, more efficient, more reliable and safer. On the other hand, multiplier over GF(2m) as the most important part of these emerging cryptographic systems, is expected to be high-throughput, low-complexity, and low-latency. Fortunately, very large scale integration (VLSI) digital signal processing techniques offer great facilities to design efficient multipliers over GF(2m). This dissertation focuses on designing novel VLSI implementation of high-throughput low-latency and low-complexity single and hybrid finite field multipliers over GF(2m) for emerging cryptographic systems. Low-latency (latency can be chosen without any restriction) high-speed pentanomial basis multipliers are presented. For the first time, the dissertation also develops three high-throughput digit-serial multipliers based on pentanomials. Then a novel realization of digit-level implementation of multipliers based on redundant basis is introduced. Finally, single and hybrid reordered normal basis bit-level and digit-level high-throughput multipliers are presented. To the authors knowledge, this is the first time ever reported on multipliers with multiple throughput rate choices. All the proposed designs are simple and modular, therefore suitable for VLSI implementation for various emerging cryptographic systems

    Efficient Bit-parallel Multiplication with Subquadratic Space Complexity in Binary Extension Field

    Get PDF
    Bit-parallel multiplication in GF(2^n) with subquadratic space complexity has been explored in recent years due to its lower area cost compared with traditional parallel multiplications. Based on \u27divide and conquer\u27 technique, several algorithms have been proposed to build subquadratic space complexity multipliers. Among them, Karatsuba algorithm and its generalizations are most often used to construct multiplication architectures with significantly improved efficiency. However, recursively using one type of Karatsuba formula may not result in an optimal structure for many finite fields. It has been shown that improvements on multiplier complexity can be achieved by using a combination of several methods. After completion of a detailed study of existing subquadratic multipliers, this thesis has proposed a new algorithm to find the best combination of selected methods through comprehensive search for constructing polynomial multiplication over GF(2^n). Using this algorithm, ameliorated architectures with shortened critical path or reduced gates cost will be obtained for the given value of n, where n is in the range of [126, 600] reflecting the key size for current cryptographic applications. With different input constraints the proposed algorithm can also yield subquadratic space multiplier architectures optimized for trade-offs between space and time. Optimized multiplication architectures over NIST recommended fields generated from the proposed algorithm are presented and analyzed in detail. Compared with existing works with subquadratic space complexity, the proposed architectures are highly modular and have improved efficiency on space or time complexity. Finally generalization of the proposed algorithm to be suitable for much larger size of fields discussed

    A new approach in building parallel finite field multipliers

    Get PDF
    A new method for building bit-parallel polynomial basis finite field multipliers is proposed in this thesis. Among the different approaches to build such multipliers, Mastrovito multipliers based on a trinomial, an all-one-polynomial, or an equally-spacedpolynomial have the lowest complexities. The next best in this category is a conventional multiplier based on a pentanomial. Any newly presented method should have complexity results which are at least better than those of a pentanomial based multiplier. By applying our method to certain classes of finite fields we have gained a space complexity as n2 + H - 4 and a time complexity as TA + ([ log2(n-l) ]+3)rx which are better than the lowest space and time complexities of a pentanomial based multiplier found in literature. Therefore this multiplier can serve as an alternative in those finite fields in which no trinomial, all-one-polynomial or equally-spaced-polynomial exists

    Mastrovito Form of Non-recursive Karatsuba Multiplier for All Trinomials

    Get PDF
    We present a new type of bit-parallel non-recursive Karatsuba multiplier over GF(2m)GF(2^m) generated by an arbitrary irreducible trinomial. This design effectively exploits Mastrovito approach and shifted polynomial basis (SPB) to reduce the time complexity and Karatsuba algorithm to reduce its space complexity. We show that this type of multiplier is only one TXT_X slower than the fastest bit-parallel multiplier for all trinomials, where TXT_X is the delay of one 2-input XOR gate. Meanwhile, its space complexity is roughly 3/4 of those multipliers. To the best of our knowledge, it is the first time that our scheme has reached such a time delay bound. This result outperforms previously proposed non-recursive Karatsuba multipliers

    Efficient Implementation of Elliptic Curve Cryptography on FPGAs

    Get PDF
    This work presents the design strategies of an FPGA-based elliptic curve co-processor. Elliptic curve cryptography is an important topic in cryptography due to its relatively short key length and higher efficiency as compared to other well-known public key crypto-systems like RSA. The most important contributions of this work are: - Analyzing how different representations of finite fields and points on elliptic curves effect the performance of an elliptic curve co-processor and implementing a high performance co-processor. - Proposing a novel dynamic programming approach to find the optimum combination of different recursive polynomial multiplication methods. Here optimum means the method which has the smallest number of bit operations. - Designing a new normal-basis multiplier which is based on polynomial multipliers. The most important part of this multiplier is a circuit of size O(nlogn)O(n \log n) for changing the representation between polynomial and normal basis

    Polynomial multiplication over binary finite fields : new upper bounds

    Get PDF
    When implementing a cryptographic algorithm, efficient operations have high relevance both in hardware and in software. Since a number of operations can be performed via polynomial multiplication, the arithmetic of polynomials over finite fields plays a key role in real-life implementations\u2014e.g., accelerating cryptographic and cryptanalytic software (pre- and post-quantum) (Chou in Accelerating pre-and post-quantum cryptography. Ph.D. thesis, Technische Universiteit Eindhoven, 2016). One of the most interesting papers that addressed the problem has been published in 2009. In Bernstein (in: Halevi (ed) Advances in Cryptology\u2014CRYPTO 2009: 29th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 16\u201320, 2009. Proceedings, pp 317\u2013336. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009), Bernstein suggests to split polynomials into parts and presents a new recursive multiplication technique which is faster than those commonly used. In order to further reduce the number of bit operations (Bernstein in High-speed cryptography in characteristic 2: minimum number of bit operations for multiplication, 2009. http://binary.cr.yp.to/m.html) required to multiply n-bit polynomials, researchers adopt different approaches. In CMT: Circuit minimization work. http://www.cs.yale.edu/homes/peralta/CircuitStuff/CMT.html a greedy heuristic has been applied to linear straight-line sequences listed in Bernstein (High-speed cryptography in characteristic 2: minimum number of bit operations for multiplication, 2009. http://binary.cr.yp.to/m.html). In 2013, D\u2019angella et al. (Applied computing conference, 2013. ACC\u201913. WEAS. pp. 31\u201337. WEAS, 2013) skip some redundant operations of the multiplication algorithms described in Bernstein (in: Halevi (ed) Advances in Cryptology\u2014CRYPTO 2009: 29th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 16\u201320, 2009. Proceedings, pp 317\u2013336. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009). In 2015, Cenk et al. (J Cryptogr Eng 5(4):289\u2013303, 2015) suggest new multiplication algorithms. In this paper, (a) we present a \u201ck-1\u201d-level recursion algorithm that can be used to reduce the effective number of bit operations required to multiply n-bit polynomials, and (b) we use algebraic extensions of F 2 combined with Lagrange interpolation to improve the asymptotic complexity

    UpWB: An Uncoupled Architecture Design for White-box Cryptography Using Vectorized Montgomery Multiplication

    Get PDF
    White-box cryptography (WBC) seeks to protect secret keys even if the attacker has full control over the execution environment. One of the techniques to hide the key is space hardness approach, which conceals the key into a large lookup table generated from a reliable small block cipher. Despite its provable security, space-hard WBC also suffers from heavy performance overhead when executed on general purpose hardware platform, hundreds of magnitude slower than conventional block ciphers. Specifically, recent studies adopt nested substitution permutation network (NSPN) to construct dedicated white-box block cipher [BIT16], whose performance is limited by a massive number of rounds, nested loop dependency and high-dimension dynamic maximal distance separable (MDS) matrices. To address these limitations, we put forward UpWB, an uncoupled and efficient accelerator for NSPN-structure WBC. We propose holistic optimization techniques across timing schedule, algorithms and operators. For the high-level timing schedule, we propose a fine-grained task partition (FTP) mechanism to decouple the parameteroriented nested loop with different trip counts. The FTP mechanism narrows down the idle time for synchronization and avoids the extra usage of FIFO, which efficiently increases the computation throughput. For the optimization of arithmetic operators, we devise a flexible and vectorized modular multiplier (VMM) based on the complexity-reduced Montgomery algorithm, which can process multi-precision variable data, multi-size matrix-vector multiplication and different irreducible polynomials. Then, a configurable matrix-vector multiplication (MVM) architecture with diagonal-major dataflow is presented to handle the dynamic MDS matrix. The multi-scale (Inv)Mixcolumns are also unified in a compact manner by intensively sharing the common sub-operations and customizing the constant multiplier. To verify the proposed methodology, we showcase the unified design implementation for three recent families of WBCs, including SPNbox-8/16/24/32, Yoroi-16/32 and WARX-16. Evaluated on FPGA platform, UpWB outperforms the optimized software counterpart (executed on 3.2 GHz Intel CPU with AES-NI and AVX2 instructions) by 7x to 30x in terms of computation throughput. Synthesized under TSMC 28nm technology, 36x to 164x improvement of computation throughput is achieved when UpWB operates at the maximum frequency of 1.3 GHz and consumes a modest area 0.14 mm2. Besides, the proposed VMM also offers about 30% improvement of area efficiency without pulling flexibility down when compared to state-of-the-art work
    corecore