19 research outputs found
Bit-Parallel Squarer Using Generalized Polynomial Basis For a New Class of Irreducible Pentanomials
We present explicit formulae and complexities of bit-parallel squarers for a new class of irreducible pentanomials
, where is odd and . The squarer is based on the generalized polynomial basis of .
Its gate delay matches the best results, while its XOR gate complexity is , which is only about 2/3 of the current best results
Efficient Square-based Montgomery Multiplier for All Type C.1 Pentanomials
In this paper, we present a low complexity bit-parallel Montgomery multiplier for generated with a special class of irreducible pentanomials . Based on a combination of generalized polynomial basis (GPB) squarer and a newly proposed square-based divide and conquer approach, we can partition field multiplications into a composition of sub-polynomial multiplications and Montgomery/GPB squarings, which have simpler architecture and thus can be implemented efficiently.
Consequently, the proposed multiplier roughly saves 1/4 logic gates compared with the fastest multipliers, while the time complexity matches previous multipliers using divide and conquer algorithms
Reconfigurable elliptic curve cryptography
Elliptic Curve Cryptosystems (ECC) have been proposed as an alternative to other established public key cryptosystems such as RSA (Rivest Shamir Adleman). ECC provide more security per bit than other known public key schemes based on the discrete logarithm problem. Smaller key sizes result in faster computations, lower power consumption and memory and bandwidth savings, thus making ECC a fast, flexible and cost-effective solution for providing security in constrained environments. Implementing ECC on reconfigurable platform combines the speed, security and concurrency of hardware along with the flexibility of the software approach.
This work proposes a generic architecture for elliptic curve cryptosystem on a Field Programmable Gate Array (FPGA) that performs an elliptic curve scalar multiplication in 1.16milliseconds for GF (2163), which is considerably faster than most other documented implementations. One of the benefits of the proposed processor architecture is that it is easily reprogrammable to use different algorithms and is adaptable to any field order. Also through reconfiguration the arithmetic unit can be optimized for different area/speed requirements. The mathematics involved uses binary extension field of the form GF (2n) as the underlying field and polynomial basis for the representation of the elements in the field. A significant gain in performance is obtained by using projective coordinates for the points on the curve during the computation process
New bit-parallel Montgomery multiplier for trinomials using squaring operation
In this paper, a new bit-parallel Montgomery multiplier for is presented, where the field is generated with an irreducible trinomial. We first present a slightly generalized version of a newly proposed divide and conquer approach. Then, by combining this approach and a carefully chosen Montgomery factor, the Montgomery multiplication can be transformed into a composition of small polynomial multiplications and Montgomery squarings, which are simpler and more efficient. Explicit complexity formulae in terms of gate counts and time delay of our architecture are investigated. As a result, the proposed multiplier has generally 25\% lower space complexity than the fastest multipliers, with time complexity as good as or better than previous Karatsuba-based multipliers for the same class of fields. Among the five irreducible polynomials recommended by NIST for the ECDSA (Elliptic Curve Digital Signature Algorithm), there are two trinomials which are available for our architecture. We show that our proposal outperforms the previous best known results if the space and time complexity are both considered
N-term Karatsuba Algorithm and its Application to Multiplier designs for Special Trinomials
In this paper, we propose a new type of non-recursive Mastrovito multiplier for using a -term Karatsuba algorithm (KA), where is defined by an irreducible trinomial, . We show that such a type of trinomial combined with the -term KA can fully exploit the spatial correlation of entries in related Mastrovito product matrices and lead to a low complexity architecture. The optimal parameter is further studied.
As the main contribution of this study, the lower bound of the space complexity of our proposal is about . Meanwhile, the time complexity matches the best Karatsuba multiplier known to date. To the best of our knowledge, it is the first time that Karatsuba-based multiplier has reached such a space complexity bound while maintaining relatively low time delay
Efficient Bit-parallel Multiplication with Subquadratic Space Complexity in Binary Extension Field
Bit-parallel multiplication in GF(2^n) with subquadratic space complexity has been explored in recent years due to its lower area cost compared with traditional parallel multiplications. Based on \u27divide and conquer\u27 technique, several algorithms have been proposed to build subquadratic space complexity multipliers. Among them, Karatsuba algorithm and its generalizations are most often used to construct multiplication architectures with significantly improved efficiency. However, recursively using one type of Karatsuba formula may not result in an optimal structure for many finite fields. It has been shown that improvements on multiplier complexity can be achieved by using a combination of several methods. After completion of a detailed study of existing subquadratic multipliers, this thesis has proposed a new algorithm to find the best combination of selected methods through comprehensive search for constructing polynomial multiplication over GF(2^n). Using this algorithm, ameliorated architectures with shortened critical path or reduced gates cost will be obtained for the given value of n, where n is in the range of [126, 600] reflecting the key size for current cryptographic applications. With different input constraints the proposed algorithm can also yield subquadratic space multiplier architectures optimized for trade-offs between space and time. Optimized multiplication architectures over NIST recommended fields generated from the proposed algorithm are presented and analyzed in detail. Compared with existing works with subquadratic space complexity, the proposed architectures are highly modular and have improved efficiency on space or time complexity. Finally generalization of the proposed algorithm to be suitable for much larger size of fields discussed
Computer Architectures for Cryptosystems Based on Hyperelliptic Curves
Security issues play an important role in almost all modern communication and computer networks. As Internet applications continue to grow dramatically, security requirements have to be strengthened. Hyperelliptic curve cryptosystems (HECC) allow for shorter operands at the same level of security than other public-key cryptosystems, such as RSA or Diffie-Hellman. These shorter operands appear promising for many applications. Hyperelliptic curves are a generalization of elliptic curves and they can also be used for building discrete logarithm public-key schemes. A major part of this work is the development of computer architectures for the different algorithms needed for HECC. The architectures are developed for a reconfigurable platform based on Field Programmable Gate Arrays (FPGAs). FPGAs combine the flexibility of software solutions with the security of traditional hardware implementations. In particular, it is possible to easily change all algorithm parameters such as curve coefficients and underlying finite field. In this work we first summarized the theoretical background of hyperelliptic curve cryptosystems. In order to realize the operation addition and doubling on the Jacobian, we developed architectures for the composition and reduction step. These in turn are based on architectures for arithmetic in the underlying field and for arithmetic in the polynomial ring. The architectures are described in VHDL (VHSIC Hardware Description Language) and the code was functionally verified. Some of the arithmetic modules were also synthesized. We provide estimates for the clock cycle count for a group operation in the Jacobian. The system targeted was HECC of genus four over GF(2^41)
On Space-Time Trade-Off for Montgomery Multipliers over Finite Fields
La multiplication dans le corps de Galois Ă 2^m Ă©lĂ©ments (i.e. GF(2^m)) est une opĂ©rations trĂšs importante pour les applications de la thĂ©orie des correcteurs et de la cryptographie. Dans ce mĂ©moire, nous nous intĂ©ressons aux rĂ©alisations parallĂšles de multiplicateurs dans GF(2^m) lorsque ce dernier est gĂ©nĂ©rĂ© par des trinĂŽmes irrĂ©ductibles. Notre point de dĂ©part est le multiplicateur de Montgomery qui calcule A(x)B(x)x^(-u) efficacement, Ă©tant donnĂ© A(x), B(x) in GF(2^m) pour u choisi judicieusement. Nous Ă©tudions ensuite l'algorithme diviser pour rĂ©gner PCHS qui permet de partitionner les multiplicandes d'un produit dans GF(2^m) lorsque m est impair. Nous l'appliquons pour la partitionnement de A(x) et de B(x) dans la multiplication de Montgomery A(x)B(x)x^(-u) pour GF(2^m) mĂȘme si m est pair. BasĂ© sur cette nouvelle approche, nous construisons un multiplicateur dans GF(2^m) gĂ©nĂ©rĂ© par des trinĂŽme irrĂ©ductibles. Une nouvelle astuce de rĂ©utilisation des rĂ©sultats intermĂ©diaires nous permet d'Ă©liminer plusieurs portes XOR redondantes. Les complexitĂ©s de temps (i.e. le dĂ©lais) et d'espace (i.e. le nombre de portes logiques) du nouveau multiplicateur sont ensuite analysĂ©es:
1. Le nouveau multiplicateur demande environ 25% moins de portes logiques que les multiplicateurs de Montgomery et de Mastrovito lorsque GF(2^m) est généré par des trinÎmes irréductible et m est suffisamment grand. Le nombre de portes du nouveau multiplicateur est presque identique à celui du multiplicateur de Karatsuba proposé par Elia.
2. Le délai de calcul du nouveau multiplicateur excÚde celui des meilleurs multiplicateurs d'au plus deux évaluations de portes XOR.
3. Nous determinons le délai et le nombre de portes logiques du nouveau multiplicateur sur les deux corps de Galois recommandés par le National Institute of Standards and Technology (NIST). Nous montrons que notre multiplicateurs contient 15% moins de portes logiques que les multiplicateurs de Montgomery et de Mastrovito au coût d'un délai d'au plus une porte XOR supplémentaire. De plus, notre multiplicateur a un délai d'une porte XOR moindre que celui du multiplicateur d'Elia au coût d'une augmentation de moins de 1% du nombre total de portes logiques.The multiplication in a Galois field with 2^m elements (i.e. GF(2^m)) is an important arithmetic operation in coding theory and cryptography. In this thesis, we focus on the bit-
parallel multipliers over the Galois fields generated by trinomials. We start by introducing the GF(2^m) Montgomery multiplication, which calculates A(x)B(x)x^{-u} in GF(2^m)
with two polynomials A(x), B(x) in GF(2^m) and a properly chosen u. Then, we investigate the rule for multiplicand partition used by a divide-and-conquer algorithm PCHS
originally proposed for the multiplication over GF(2^m) with odd m. By adopting similar rules for splitting A(x) and B(x) in A(x)B(x)x^{-u}, we develop new Montgomery
multiplication formulae for GF(2^m) with m either odd or even. Based on this new approach, we develop the corresponding bit-parallel Montgomery multipliers for the Galois
fields generated by trinomials. A new bit-reusing trick is applied to eliminate redundant XOR gates from the new multiplier. The time complexity (i.e. the delay) and the
space complexity (i.e. the logic gate number) of the new multiplier are explicitly analysed:
1. This new multiplier is about 25% more efficient in the number of logic gates
than the previous trinomial-based Montgomery multipliers or trinomial-based Mastrovito multipliers on GF(2^m) with m big enough. It has a number of logic gates very close to
that of the Karatsuba multiplier proposed by Elia. 2. While having a significantly smaller number of logic gates, this new multiplier is at most two T_X larger in the total
delay than the fastest bit-parallel multiplier on GF(2^m), where T_X is the XOR gate delay. 3. We determine the space and time complexities of our multiplier on the two
fields
recommended by the National Institute of Standards and Technology (NIST). Having at most one more T_X in the total delay, our multiplier has a more-than-15% reduced
logic gate number compared with the other Montgomery or Mastrovito multipliers. Moreover, our multiplier is one T_X smaller in delay than the Elia's multiplier at the cost of
a less-than-1% increase in the logic gate number
Recommended from our members
Formal Analysis of Arithmetic Circuits using Computer Algebra - Verification, Abstraction and Reverse Engineering
Despite a considerable progress in verification and abstraction of random and control logic, advances in formal verification of arithmetic designs have been lagging. This can be attributed mostly to the difficulty in an efficient modeling of arithmetic circuits and datapaths without resorting to computationally expensive Boolean methods, such as Binary Decision Diagrams (BDDs) and Boolean Satisfiability (SAT), that require âbit blastingâ, i.e., flattening the design to a bit-level netlist. Approaches that rely on computer algebra and Satisfiability Modulo Theories (SMT) methods are either too abstract to handle the bit-level nature of arithmetic designs or require solving computationally expensive decision or satisfiability problems. The work proposed in this thesis aims at overcoming the limitations of analyzing arithmetic circuits, specifically at the post-synthesized phase. It addresses the verification, abstraction and reverse engineering problems of arithmetic circuits at an algebraic level, treating an arithmetic circuit and its specification as a properly constructed algebraic system. The proposed technique solves these problems by function extraction, i.e., by deriving arithmetic function computed by the circuit from its low-level circuit implementation using computer algebraic rewriting technique. The proposed techniques work on large integer arithmetic circuits and finite field arithmetic circuits, up to 512-bit wide containing millions of logic gates
Bit Serial Systolic Architectures for Multiplicative Inversion and Division over GF(2<sup>m</sup>)
Systolic architectures are capable of achieving high throughput by maximizing pipelining and by eliminating global data interconnects. Recursive algorithms with regular data flows are suitable for systolization. The computation of multiplicative inversion using algorithms based on EEA (Extended Euclidean Algorithm) are particularly suitable for systolization. Implementations based on EEA present a high degree of parallelism and pipelinability at bit level which can be easily optimized to achieve local data flow and to eliminate the global interconnects which represent most important bottleneck in todays sub-micron design process. The net result is to have high clock rate and performance based on efficient systolic architectures.
This thesis examines high performance but also scalable implementations of multiplicative inversion or field division over Galois fields GF(2m) in the specific case of cryptographic applications where field dimension m may be very large (greater than 400) and either m or defining irreducible polynomial may vary. For this purpose, many inversion schemes with different basis representation are studied and most importantly variants of EEA and binary (Stein's) GCD computation implementations are reviewed. A set of common as well as contrasting characteristics of these variants are discussed. As a result a generalized and optimized variant of EEA is proposed which can compute division, and multiplicative inversion as its subset, with divisor in either polynomial or triangular basis representation. Further results regarding Hankel matrix formation for double-basis inversion is provided. The validity of using the same architecture to compute field division with polynomial or triangular basis representation is proved.
Next, a scalable unidirectional bit serial systolic array implementation of this proposed variant of EEA is implemented. Its complexity measures are defined and these are compared against the best known architectures. It is shown that assuming the requirements specified above, this proposed architecture may achieve a higher clock rate performance w. r. t. other designs while being more flexible, reliable and with minimum number of inter-cell interconnects.
The main contribution at system level architecture is the substitution of all counter or adder/subtractor elements with a simpler distributed and free of carry propagation delays structure. Further a novel restoring mechanism for result sequences of EEA is proposed using a double delay element implementation.
Finally, using this systolic architecture a CMD (Combined Multiplier Divider) datapath is designed which is used as the core of a novel systolic elliptic curve processor. This EC processor uses affine coordinates to compute scalar point multiplication which results in having a very small control unit and negligible with respect to the datapath for all practical values of m. The throughput of this EC based on this bit serial systolic architecture is comparable with designs many times larger than itself reported previously