19 research outputs found

    GF(2n)GF(2^n) Bit-Parallel Squarer Using Generalized Polynomial Basis For a New Class of Irreducible Pentanomials

    Get PDF
    We present explicit formulae and complexities of bit-parallel GF(2n)GF(2^{n}) squarers for a new class of irreducible pentanomials xn+xn−1+xk+x+1x^{n}+x^{n-1}+x^{k}+x+1, where nn is odd and 1<k<(n−1)/21<k<(n-1)/2. The squarer is based on the generalized polynomial basis of GF(2n)GF(2^{n}). Its gate delay matches the best results, while its XOR gate complexity is n+1n+1, which is only about 2/3 of the current best results

    Efficient Square-based Montgomery Multiplier for All Type C.1 Pentanomials

    Get PDF
    In this paper, we present a low complexity bit-parallel Montgomery multiplier for GF(2m)GF(2^m) generated with a special class of irreducible pentanomials xm+xm−1+xk+x+1x^m+x^{m-1}+x^k+x+1. Based on a combination of generalized polynomial basis (GPB) squarer and a newly proposed square-based divide and conquer approach, we can partition field multiplications into a composition of sub-polynomial multiplications and Montgomery/GPB squarings, which have simpler architecture and thus can be implemented efficiently. Consequently, the proposed multiplier roughly saves 1/4 logic gates compared with the fastest multipliers, while the time complexity matches previous multipliers using divide and conquer algorithms

    Reconfigurable elliptic curve cryptography

    Get PDF
    Elliptic Curve Cryptosystems (ECC) have been proposed as an alternative to other established public key cryptosystems such as RSA (Rivest Shamir Adleman). ECC provide more security per bit than other known public key schemes based on the discrete logarithm problem. Smaller key sizes result in faster computations, lower power consumption and memory and bandwidth savings, thus making ECC a fast, flexible and cost-effective solution for providing security in constrained environments. Implementing ECC on reconfigurable platform combines the speed, security and concurrency of hardware along with the flexibility of the software approach. This work proposes a generic architecture for elliptic curve cryptosystem on a Field Programmable Gate Array (FPGA) that performs an elliptic curve scalar multiplication in 1.16milliseconds for GF (2163), which is considerably faster than most other documented implementations. One of the benefits of the proposed processor architecture is that it is easily reprogrammable to use different algorithms and is adaptable to any field order. Also through reconfiguration the arithmetic unit can be optimized for different area/speed requirements. The mathematics involved uses binary extension field of the form GF (2n) as the underlying field and polynomial basis for the representation of the elements in the field. A significant gain in performance is obtained by using projective coordinates for the points on the curve during the computation process

    New bit-parallel Montgomery multiplier for trinomials using squaring operation

    Get PDF
    In this paper, a new bit-parallel Montgomery multiplier for GF(2m)GF(2^m) is presented, where the field is generated with an irreducible trinomial. We first present a slightly generalized version of a newly proposed divide and conquer approach. Then, by combining this approach and a carefully chosen Montgomery factor, the Montgomery multiplication can be transformed into a composition of small polynomial multiplications and Montgomery squarings, which are simpler and more efficient. Explicit complexity formulae in terms of gate counts and time delay of our architecture are investigated. As a result, the proposed multiplier has generally 25\% lower space complexity than the fastest multipliers, with time complexity as good as or better than previous Karatsuba-based multipliers for the same class of fields. Among the five irreducible polynomials recommended by NIST for the ECDSA (Elliptic Curve Digital Signature Algorithm), there are two trinomials which are available for our architecture. We show that our proposal outperforms the previous best known results if the space and time complexity are both considered

    N-term Karatsuba Algorithm and its Application to Multiplier designs for Special Trinomials

    Get PDF
    In this paper, we propose a new type of non-recursive Mastrovito multiplier for GF(2m)GF(2^m) using a nn-term Karatsuba algorithm (KA), where GF(2m)GF(2^m) is defined by an irreducible trinomial, xm+xk+1,m=nkx^m+x^k+1, m=nk. We show that such a type of trinomial combined with the nn-term KA can fully exploit the spatial correlation of entries in related Mastrovito product matrices and lead to a low complexity architecture. The optimal parameter nn is further studied. As the main contribution of this study, the lower bound of the space complexity of our proposal is about O(m22+m3/2)O(\frac{m^2}{2}+m^{3/2}). Meanwhile, the time complexity matches the best Karatsuba multiplier known to date. To the best of our knowledge, it is the first time that Karatsuba-based multiplier has reached such a space complexity bound while maintaining relatively low time delay

    Efficient Bit-parallel Multiplication with Subquadratic Space Complexity in Binary Extension Field

    Get PDF
    Bit-parallel multiplication in GF(2^n) with subquadratic space complexity has been explored in recent years due to its lower area cost compared with traditional parallel multiplications. Based on \u27divide and conquer\u27 technique, several algorithms have been proposed to build subquadratic space complexity multipliers. Among them, Karatsuba algorithm and its generalizations are most often used to construct multiplication architectures with significantly improved efficiency. However, recursively using one type of Karatsuba formula may not result in an optimal structure for many finite fields. It has been shown that improvements on multiplier complexity can be achieved by using a combination of several methods. After completion of a detailed study of existing subquadratic multipliers, this thesis has proposed a new algorithm to find the best combination of selected methods through comprehensive search for constructing polynomial multiplication over GF(2^n). Using this algorithm, ameliorated architectures with shortened critical path or reduced gates cost will be obtained for the given value of n, where n is in the range of [126, 600] reflecting the key size for current cryptographic applications. With different input constraints the proposed algorithm can also yield subquadratic space multiplier architectures optimized for trade-offs between space and time. Optimized multiplication architectures over NIST recommended fields generated from the proposed algorithm are presented and analyzed in detail. Compared with existing works with subquadratic space complexity, the proposed architectures are highly modular and have improved efficiency on space or time complexity. Finally generalization of the proposed algorithm to be suitable for much larger size of fields discussed

    Computer Architectures for Cryptosystems Based on Hyperelliptic Curves

    Get PDF
    Security issues play an important role in almost all modern communication and computer networks. As Internet applications continue to grow dramatically, security requirements have to be strengthened. Hyperelliptic curve cryptosystems (HECC) allow for shorter operands at the same level of security than other public-key cryptosystems, such as RSA or Diffie-Hellman. These shorter operands appear promising for many applications. Hyperelliptic curves are a generalization of elliptic curves and they can also be used for building discrete logarithm public-key schemes. A major part of this work is the development of computer architectures for the different algorithms needed for HECC. The architectures are developed for a reconfigurable platform based on Field Programmable Gate Arrays (FPGAs). FPGAs combine the flexibility of software solutions with the security of traditional hardware implementations. In particular, it is possible to easily change all algorithm parameters such as curve coefficients and underlying finite field. In this work we first summarized the theoretical background of hyperelliptic curve cryptosystems. In order to realize the operation addition and doubling on the Jacobian, we developed architectures for the composition and reduction step. These in turn are based on architectures for arithmetic in the underlying field and for arithmetic in the polynomial ring. The architectures are described in VHDL (VHSIC Hardware Description Language) and the code was functionally verified. Some of the arithmetic modules were also synthesized. We provide estimates for the clock cycle count for a group operation in the Jacobian. The system targeted was HECC of genus four over GF(2^41)

    On Space-Time Trade-Off for Montgomery Multipliers over Finite Fields

    Get PDF
    La multiplication dans le corps de Galois Ă  2^m Ă©lĂ©ments (i.e. GF(2^m)) est une opĂ©rations trĂšs importante pour les applications de la thĂ©orie des correcteurs et de la cryptographie. Dans ce mĂ©moire, nous nous intĂ©ressons aux rĂ©alisations parallĂšles de multiplicateurs dans GF(2^m) lorsque ce dernier est gĂ©nĂ©rĂ© par des trinĂŽmes irrĂ©ductibles. Notre point de dĂ©part est le multiplicateur de Montgomery qui calcule A(x)B(x)x^(-u) efficacement, Ă©tant donnĂ© A(x), B(x) in GF(2^m) pour u choisi judicieusement. Nous Ă©tudions ensuite l'algorithme diviser pour rĂ©gner PCHS qui permet de partitionner les multiplicandes d'un produit dans GF(2^m) lorsque m est impair. Nous l'appliquons pour la partitionnement de A(x) et de B(x) dans la multiplication de Montgomery A(x)B(x)x^(-u) pour GF(2^m) mĂȘme si m est pair. BasĂ© sur cette nouvelle approche, nous construisons un multiplicateur dans GF(2^m) gĂ©nĂ©rĂ© par des trinĂŽme irrĂ©ductibles. Une nouvelle astuce de rĂ©utilisation des rĂ©sultats intermĂ©diaires nous permet d'Ă©liminer plusieurs portes XOR redondantes. Les complexitĂ©s de temps (i.e. le dĂ©lais) et d'espace (i.e. le nombre de portes logiques) du nouveau multiplicateur sont ensuite analysĂ©es: 1. Le nouveau multiplicateur demande environ 25% moins de portes logiques que les multiplicateurs de Montgomery et de Mastrovito lorsque GF(2^m) est gĂ©nĂ©rĂ© par des trinĂŽmes irrĂ©ductible et m est suffisamment grand. Le nombre de portes du nouveau multiplicateur est presque identique Ă  celui du multiplicateur de Karatsuba proposĂ© par Elia. 2. Le dĂ©lai de calcul du nouveau multiplicateur excĂšde celui des meilleurs multiplicateurs d'au plus deux Ă©valuations de portes XOR. 3. Nous determinons le dĂ©lai et le nombre de portes logiques du nouveau multiplicateur sur les deux corps de Galois recommandĂ©s par le National Institute of Standards and Technology (NIST). Nous montrons que notre multiplicateurs contient 15% moins de portes logiques que les multiplicateurs de Montgomery et de Mastrovito au coĂ»t d'un dĂ©lai d'au plus une porte XOR supplĂ©mentaire. De plus, notre multiplicateur a un dĂ©lai d'une porte XOR moindre que celui du multiplicateur d'Elia au coĂ»t d'une augmentation de moins de 1% du nombre total de portes logiques.The multiplication in a Galois field with 2^m elements (i.e. GF(2^m)) is an important arithmetic operation in coding theory and cryptography. In this thesis, we focus on the bit- parallel multipliers over the Galois fields generated by trinomials. We start by introducing the GF(2^m) Montgomery multiplication, which calculates A(x)B(x)x^{-u} in GF(2^m) with two polynomials A(x), B(x) in GF(2^m) and a properly chosen u. Then, we investigate the rule for multiplicand partition used by a divide-and-conquer algorithm PCHS originally proposed for the multiplication over GF(2^m) with odd m. By adopting similar rules for splitting A(x) and B(x) in A(x)B(x)x^{-u}, we develop new Montgomery multiplication formulae for GF(2^m) with m either odd or even. Based on this new approach, we develop the corresponding bit-parallel Montgomery multipliers for the Galois fields generated by trinomials. A new bit-reusing trick is applied to eliminate redundant XOR gates from the new multiplier. The time complexity (i.e. the delay) and the space complexity (i.e. the logic gate number) of the new multiplier are explicitly analysed: 1. This new multiplier is about 25% more efficient in the number of logic gates than the previous trinomial-based Montgomery multipliers or trinomial-based Mastrovito multipliers on GF(2^m) with m big enough. It has a number of logic gates very close to that of the Karatsuba multiplier proposed by Elia. 2. While having a significantly smaller number of logic gates, this new multiplier is at most two T_X larger in the total delay than the fastest bit-parallel multiplier on GF(2^m), where T_X is the XOR gate delay. 3. We determine the space and time complexities of our multiplier on the two fields recommended by the National Institute of Standards and Technology (NIST). Having at most one more T_X in the total delay, our multiplier has a more-than-15% reduced logic gate number compared with the other Montgomery or Mastrovito multipliers. Moreover, our multiplier is one T_X smaller in delay than the Elia's multiplier at the cost of a less-than-1% increase in the logic gate number

    Bit Serial Systolic Architectures for Multiplicative Inversion and Division over GF(2<sup>m</sup>)

    Get PDF
    Systolic architectures are capable of achieving high throughput by maximizing pipelining and by eliminating global data interconnects. Recursive algorithms with regular data flows are suitable for systolization. The computation of multiplicative inversion using algorithms based on EEA (Extended Euclidean Algorithm) are particularly suitable for systolization. Implementations based on EEA present a high degree of parallelism and pipelinability at bit level which can be easily optimized to achieve local data flow and to eliminate the global interconnects which represent most important bottleneck in todays sub-micron design process. The net result is to have high clock rate and performance based on efficient systolic architectures. This thesis examines high performance but also scalable implementations of multiplicative inversion or field division over Galois fields GF(2m) in the specific case of cryptographic applications where field dimension m may be very large (greater than 400) and either m or defining irreducible polynomial may vary. For this purpose, many inversion schemes with different basis representation are studied and most importantly variants of EEA and binary (Stein's) GCD computation implementations are reviewed. A set of common as well as contrasting characteristics of these variants are discussed. As a result a generalized and optimized variant of EEA is proposed which can compute division, and multiplicative inversion as its subset, with divisor in either polynomial or triangular basis representation. Further results regarding Hankel matrix formation for double-basis inversion is provided. The validity of using the same architecture to compute field division with polynomial or triangular basis representation is proved. Next, a scalable unidirectional bit serial systolic array implementation of this proposed variant of EEA is implemented. Its complexity measures are defined and these are compared against the best known architectures. It is shown that assuming the requirements specified above, this proposed architecture may achieve a higher clock rate performance w. r. t. other designs while being more flexible, reliable and with minimum number of inter-cell interconnects. The main contribution at system level architecture is the substitution of all counter or adder/subtractor elements with a simpler distributed and free of carry propagation delays structure. Further a novel restoring mechanism for result sequences of EEA is proposed using a double delay element implementation. Finally, using this systolic architecture a CMD (Combined Multiplier Divider) datapath is designed which is used as the core of a novel systolic elliptic curve processor. This EC processor uses affine coordinates to compute scalar point multiplication which results in having a very small control unit and negligible with respect to the datapath for all practical values of m. The throughput of this EC based on this bit serial systolic architecture is comparable with designs many times larger than itself reported previously
    corecore