14 research outputs found

    Efficient Bit-parallel Multiplication with Subquadratic Space Complexity in Binary Extension Field

    Get PDF
    Bit-parallel multiplication in GF(2^n) with subquadratic space complexity has been explored in recent years due to its lower area cost compared with traditional parallel multiplications. Based on \u27divide and conquer\u27 technique, several algorithms have been proposed to build subquadratic space complexity multipliers. Among them, Karatsuba algorithm and its generalizations are most often used to construct multiplication architectures with significantly improved efficiency. However, recursively using one type of Karatsuba formula may not result in an optimal structure for many finite fields. It has been shown that improvements on multiplier complexity can be achieved by using a combination of several methods. After completion of a detailed study of existing subquadratic multipliers, this thesis has proposed a new algorithm to find the best combination of selected methods through comprehensive search for constructing polynomial multiplication over GF(2^n). Using this algorithm, ameliorated architectures with shortened critical path or reduced gates cost will be obtained for the given value of n, where n is in the range of [126, 600] reflecting the key size for current cryptographic applications. With different input constraints the proposed algorithm can also yield subquadratic space multiplier architectures optimized for trade-offs between space and time. Optimized multiplication architectures over NIST recommended fields generated from the proposed algorithm are presented and analyzed in detail. Compared with existing works with subquadratic space complexity, the proposed architectures are highly modular and have improved efficiency on space or time complexity. Finally generalization of the proposed algorithm to be suitable for much larger size of fields discussed

    Trade-Off Approach for GHASH Computation Based on a Block-Merging Strategy

    Get PDF
    In the Galois counter mode (GCM) of encryption an authentication tag is computed with a sequence of multiplications and additions in F 2 m. In this paper we focus on multiply-and-add architecture with a suquadratic space complexity multiplier in F 2 m. We propose a recom-bination of the architecture of P. Patel (Master Thesis, U. Waterloo, ON. Canada, 2008) which is based on a subquadratic space complexity Toeplitz matrix vector product. We merge some blocks of the recombined architecture in order to reduce the critical path delay. We obtain an architecture with a subquadratic space complexity of O(log 2 (m)m log 2 (m)) and a reduced delay of (1.59 log 2 (m) + log 2 (δ))D X + D A where δ is a small constant. To the best of our knowledge, this is the first multiply-and-add architecture with subquadratic space complexity and delay smaller than 2 log 2 (m)D X

    Multiplication in Finite Fields and Elliptic Curves

    Get PDF
    La cryptographie à clef publique permet de s'échanger des clefs de façon distante, d'effectuer des signatures électroniques, de s'authentifier à distance, etc. Dans cette thèse d'HDR nous allons présenter quelques contributions concernant l'implantation sûre et efficace de protocoles cryptographiques basés sur les courbes elliptiques. L'opération de base effectuée dans ces protocoles est la multiplication scalaire d'un point de la courbe. Chaque multiplication scalaire nécessite plusieurs milliers d'opérations dans un corps fini.Dans la première partie du manuscrit nous nous intéressons à la multiplication dans les corps finis car c'est l'opération la plus coûteuse et la plus utilisée. Nous présentons d'abord des contributions sur les multiplieurs parallèles dans les corps binaires. Un premier résultat concerne l'approche sous-quadratique dans une base normale optimale de type 2. Plus précisément, nous améliorons un multiplieur basé sur un produit de matrice de Toeplitz avec un vecteur en utilisant une recombinaison des blocs qui supprime certains calculs redondants. Nous présentons aussi un multiplieur pous les corps binaires basé sur une extension d'une optimisation de la multiplication polynomiale de Karatsuba.Ensuite nous présentons des résultats concernant la multiplication dans un corps premier. Nous présentons en particulier une approche de type Montgomery pour la multiplication dans une base adaptée à l'arithmétique modulaire. Cette approche cible la multiplication modulo un premier aléatoire. Nous présentons alors une méthode pour la multiplication dans des corps utilisés dans la cryptographie sur les couplages : les extensions de petits degrés d'un corps premier aléatoire. Cette méthode utilise une base adaptée engendrée par une racine de l'unité facilitant la multiplication polynomiale basée sur la FFT. Dans la dernière partie de cette thèse d'HDR nous nous intéressons à des résultats qui concernent la multiplication scalaire sur les courbes elliptiques. Nous présentons une parallélisation de l'échelle binaire de Montgomery dans le cas de E(GF(2^n)). Nous survolons aussi quelques contributions sur des formules de division par 3 dans E(GF(3^n)) et une parallélisation de type (third,triple)-and-add. Dans le dernier chapitre nous développons quelques directions de recherches futures. Nous discutons d'abord de possibles extensions des travaux faits sur les corps binaires. Nous présentons aussi des axes de recherche liés à la randomisation de l'arithmétique qui permet une protection contre les attaques matérielles

    A Chinese Remainder Theorem Approach to Bit-Parallel GF(2^n) Polynomial Basis Multipliers for Irreducible Trinomials

    Get PDF
    We show that the step “modulo the degree-n field generating irreducible polynomial” in the classical definition of the GF(2^n) multiplication operation can be avoided. This leads to an alternative representation of the finite field multiplication operation. Combining this representation and the Chinese Remainder Theorem, we design bit-parallel GF(2^n) multipliers for irreducible trinomials u^n + u^k + 1 on GF(2) where 1 < k &#8804; n=2. For some values of n, our architectures have the same time complexity as the fastest bit-parallel multipliers – the quadratic multipliers, but their space complexities are reduced. Take the special irreducible trinomial u^(2k) + u^k + 1 for example, the space complexity of the proposed design is reduced by about 1=8, while the time complexity matches the best result. Our experimental results show that among the 539 values of n such that 4 < n < 1000 and x^n+x^k+1 is irreducible over GF(2) for some k in the range 1 < k &#8804; n=2, the proposed multipliers beat the current fastest parallel multipliers for 290 values of n when (n &#8722; 1)=3 &#8804; k &#8804; n=2: they have the same time complexity, but the space complexities are reduced by 8.4% on average

    Fast hybrid Karatsuba multiplier for Type II pentanomials

    Get PDF
    We continue the study of Mastrovito form of Karatsuba multipliers under the shifted polynomial basis (SPB), recently introduced by Li et al. (IEEE TC (2017)). A Mastrovito-Karatsuba (MK) multiplier utilizes the Karatsuba algorithm (KA) to optimize polynomial multiplication and the Mastrovito approach to combine it with the modular reduction. The authors developed a MK multiplier for all trinomials, which obtain a better space and time trade-off compared with previous non-recursive Karatsuba counterparts. Based on this work, we make two types of contributions in our paper. FORMULATION. We derive a new modular reduction formulation for constructing Mastrovito matrix associated with Type II pentanomial. This formula can also be applied to other special type of pentanomials, e.g. Type I pentanomial and Type C.1 pentanomial. Through related formulations, we demonstrate that Type I pentanomial is less efficient than Type II one because of a more complicated modular reduction under the same SPB; conversely, Type C.1 pentanomial is as good as Type II pentanomial under an alternative generalized polynomial basis (GPB). EXTENSION. We introduce a new MK multiplier for Type II pentanomial. It is shown that our proposal is only one TXT_X slower than the fastest bit-parallel multipliers for Type II pentanomial, but its space complexity is roughly 3/4 of those schemes, where TXT_X is the delay of one 2-input XOR gate. To the best of our knowledge, it is the first time for hybrid multiplier to achieve such a time delay bound

    Digit-Level Serial-In Parallel-Out Multiplier Using Redundant Representation for a Class of Finite Fields

    Get PDF
    Two digit-level finite field multipliers in F2m using redundant representation are presented. Embedding F2m in cyclotomic field F2(n) causes a certain amount of redundancy and consequently performing field multiplication using redundant representation would require more hardware resources. Based on a specific feature of redundant representation in a class of finite fields, two new multiplication algorithms along with their pertaining architectures are proposed to alleviate this problem. Considering area-delay product as a measure of evaluation, it has been shown that both the proposed architectures considerably outperform existing digit-level multipliers using the same basis. It is also shown that for a subset of the fields, the proposed multipliers are of higher performance in terms of area-delay complexities among several recently proposed optimal normal basis multipliers. The main characteristics of the postplace&route application specific integrated circuit implementation of the proposed multipliers for three practical digit sizes are also reported

    On Large Polynomial Multiplication in Certain Rings

    Get PDF
    Multiplication of polynomials with large integer coefficients and very high degree is used in cryptography. Residue number system (RNS) helps distribute a very large integer over a set of smaller integers, which makes the computations faster. In this thesis, multiplication of polynomials in ring Z_p/(x^n + 1) where n is a power of two is analyzed using the Schoolbook method, Karatsuba algorithm, Toeplitz matrix vector product (TMVP) method and Number Theoretic Transform (NTT) method. All coefficients are residues of p which is a 30-bit integer that has been selected from the set of 30-bit moduli for RNS in NFLlib. NTT has a computational complexity of O(n log n) and hence, it has the best performance among all these methods for the multiplication of large polynomials. NTT method limits applications in ring Z_p/(x^n + 1). This restricts size of the polynomials to only powers of two. We consider multiplication in other cyclotomic rings using TMVP method which has a subquadratic complexity of O(n log_2 3). An attempt is made to improve the performance of TMVP method by designing a hybrid method that switches to schoolbook method when n reaches a certain low value. It is first implemented in Zp/(x^n + 1) to improve the performance of TMVP for large polynomials. This method performs almost as good as NTT for polynomials of size 2^(10). TMVP method is then exploited to design multipliers in other rings Z_p/Φ_k where Φ_k is a cyclotomic trinomial. Similar hybrid designs are analyzed to improve performance in the trinomial rings. This allows a wider range of polynomials in terms of size to work with and helps avoid unnecessary use of larger key size that might slow down computations

    Hardware Implementations for Symmetric Key Cryptosystems

    Get PDF
    The utilization of global communications network for supporting new electronic applications is growing. Many applications provided over the global communications network involve exchange of security-sensitive information between different entities. Often, communicating entities are located at different locations around the globe. This demands deployment of certain mechanisms for providing secure communications channels between these entities. For this purpose, cryptographic algorithms are used by many of today\u27s electronic applications to maintain security. Cryptographic algorithms provide set of primitives for achieving different security goals such as: confidentiality, data integrity, authenticity, and non-repudiation. In general, two main categories of cryptographic algorithms can be used to accomplish any of these security goals, namely, asymmetric key algorithms and symmetric key algorithms. The security of asymmetric key algorithms is based on the hardness of the underlying computational problems, which usually require large overhead of space and time complexities. On the other hand, the security of symmetric key algorithms is based on non-linear transformations and permutations, which provide efficient implementations compared to the asymmetric key ones. Therefore, it is common to use asymmetric key algorithms for key exchange, while symmetric key counterparts are deployed in securing the communications sessions. This thesis focuses on finding efficient hardware implementations for symmetric key cryptosystems targeting mobile communications and resource constrained applications. First, efficient lightweight hardware implementations of two members of the Welch-Gong (WG) family of stream ciphers, the WG(29,11)\left(29,11\right) and WG-1616, are considered for the mobile communications domain. Optimizations in the WG(29,11)\left(29,11\right) stream cipher are considered when the GF(229)GF\left(2^{29}\right) elements are represented in either the Optimal normal basis type-II (ONB-II) or the Polynomial basis (PB). For WG-1616, optimizations are considered only for PB representations of the GF(216)GF\left(2^{16}\right) elements. In this regard, optimizations for both ciphers are accomplished mainly at the arithmetic level through reducing the number of field multipliers, based on novel trace properties. In addition, other optimization techniques such as serialization and pipelining, are also considered. After this, the thesis explores efficient hardware implementations for digit-level multiplication over binary extension fields GF(2m)GF\left(2^{m}\right). Efficient digit-level GF(2m)GF\left(2^{m}\right) multiplications are advantageous for ultra-lightweight implementations, not only in symmetric key algorithms, but also in asymmetric key algorithms. The thesis introduces new architectures for digit-level GF(2m)GF\left(2^{m}\right) multipliers considering the Gaussian normal basis (GNB) and PB representations of the field elements. The new digit-level GF(2m)GF\left(2^{m}\right) single multipliers do not require loading of the two input field elements in advance to computations. This feature results in high throughput fast multiplication in resource constrained applications with limited capacity of input data-paths. The new digit-level GF(2m)GF\left(2^{m}\right) single multipliers are considered for both the GNB and PB. In addition, for the GNB representation, new architectures for digit-level GF(2m)GF\left(2^{m}\right) hybrid-double and hybrid-triple multipliers are introduced. The new digit-level GF(2m)GF\left(2^{m}\right) hybrid-double and hybrid-triple GNB multipliers, respectively, accomplish the multiplication of three and four field elements using the latency required for multiplying two field elements. Furthermore, a new hardware architecture for the eight-ary exponentiation scheme is proposed by utilizing the new digit-level GF(2m)GF\left(2^{m}\right) hybrid-triple GNB multipliers

    Hardware processors for pairing-based cryptography

    Get PDF
    Bilinear pairings can be used to construct cryptographic systems with very desirable properties. A pairing performs a mapping on members of groups on elliptic and genus 2 hyperelliptic curves to an extension of the finite field on which the curves are defined. The finite fields must, however, be large to ensure adequate security. The complicated group structure of the curves and the expensive field operations result in time consuming computations that are an impediment to the practicality of pairing-based systems. The Tate pairing can be computed efficiently using the ɳT method. Hardware architectures can be used to accelerate the required operations by exploiting the parallelism inherent to the algorithmic and finite field calculations. The Tate pairing can be performed on elliptic curves of characteristic 2 and 3 and on genus 2 hyperelliptic curves of characteristic 2. Curve selection is dependent on several factors including desired computational speed, the area constraints of the target device and the required security level. In this thesis, custom hardware processors for the acceleration of the Tate pairing are presented and implemented on an FPGA. The underlying hardware architectures are designed with care to exploit available parallelism while ensuring resource efficiency. The characteristic 2 elliptic curve processor contains novel units that return a pairing result in a very low number of clock cycles. Despite the more complicated computational algorithm, the speed of the genus 2 processor is comparable. Pairing computation on each of these curves can be appealing in applications with various attributes. A flexible processor that can perform pairing computation on elliptic curves of characteristic 2 and 3 has also been designed. An integrated hardware/software design and verification environment has been developed. This system automates the procedures required for robust processor creation and enables the rapid provision of solutions for a wide range of cryptographic applications

    16th Scandinavian Symposium and Workshops on Algorithm Theory: SWAT 2018, June 18-20, 2018, Malmö University, Malmö, Sweden

    Get PDF
    corecore