Abstract-Finite fields (Galois fields) play an important role in the efficient architecture design and implementation of elliptic curve cryptosystems. A lot of research is going on in this area, since finite fields are suitable for cryptography as well as for the error correcting codes used in digital communication, compact disks, etc. In this paper we discuss the basic concepts of finite fields and their application to elliptic curve cryptography (ECC). The various implementation options available in finite fields are studied, analyzed and highlighted for effective system design. In section IX we discuss a few efficient hardware design approaches, adopted by many researchers, that are useful for ECC.
I. INTRODUCTION
Elliptic curve cryptography (ECC) was invented independently in the mid-1980s by Victor Miller and Neal Koblitz. Since its invention, it has proved itself a strong alternative to public key cryptographic systems such as RSA and Diffie-Hellman (DH). It offers equivalent security with smaller key sizes, resulting in faster computations, lower power consumption, and memory and bandwidth savings. This is because of the flexibility of its underlying mathematical concepts, efficient algorithm development, and subsequent efficient implementation in both hardware and software. These advantages are especially important in applications on constrained devices such as smart cards, mobile phones and PDAs.
The security of every public key cryptosystem depends on a hard mathematical problem that is computationally intractable. For instance, RSA and Diffie-Hellman (DH) rely on the hardness of integer factorization and the discrete logarithm problem respectively. Unlike these cryptosystems, which operate over integer rings and fields, elliptic curve cryptosystems (ECC) operate over a group of points on an elliptic curve defined over a finite field.
The basic mathematical operation in RSA and Diffie-Hellman is modular integer exponentiation. Elliptic curve arithmetic, by contrast, relies on an operation called scalar point multiplication, which computes Q = kP, where P is a point on a selected elliptic curve and k is a sufficiently large integer that is kept private. This multiplication yields a new point Q on the same elliptic curve. Scalar multiplication is performed through a combination of point additions (which add two distinct points together) and point doublings (which add two copies of a point together). For example, 11P can be expressed as 11P = 2(2(2P) + P) + P.
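The decomposition of 11P follows the binary expansion of k (11 = 1011 in binary), scanned from the most significant bit. The sketch below illustrates this left-to-right double-and-add schedule. The function name `scalar_mult` and the use of plain integers as a stand-in group are assumptions made for illustration only; in a real system `add` and `double` would be elliptic curve point operations.

```python
def scalar_mult(k, P, add, double):
    """Left-to-right binary double-and-add: returns (kP, #doublings, #additions)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    Q = P                      # accounts for the leading 1 bit of k
    doublings = additions = 0
    # Scan the remaining bits of k, most significant first.
    for bit in bin(k)[3:]:     # bin(k) is '0b1...'; skip '0b' and the leading 1
        Q = double(Q)
        doublings += 1
        if bit == "1":
            Q = add(Q, P)
            additions += 1
    return Q, doublings, additions


# Stand-in group (assumption): integers under addition, so kP is simply k * P.
result, doublings, additions = scalar_mult(
    11, 1, lambda a, b: a + b, lambda a: 2 * a
)
print(result, doublings, additions)  # 11 3 2
```

For k = 11 this performs three doublings and two additions, matching the expression 2(2(2P) + P) + P.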
The security of ECC depends on the difficulty of computing k given P and Q. Computing Q from k and P is easy, but recovering k from P and Q is hard. This is called the Elliptic Curve Discrete Logarithm Problem (ECDLP).
A brute-force approach, computing all multiples of P until Q is found, looks sound, but k is so large in a real cryptographic application that it would be infeasible to determine k in this way. Mathematicians have worked on attacking this problem for many years. However, the best known algorithm to attack ECDLP so far takes exponential time [1], whereas sub-exponential time algorithms do exist for the hard problems underlying RSA [2] and DH [3].
The remaining part of this paper is organized as follows. Section 2 discusses the issues related to finite fields and basis representation. A brief overview of finite field arithmetic operations is presented in section 3. The mathematical concepts of elliptic curves over finite fields and the representation of points in different coordinate systems are presented in section 4. Performance and complexity metrics are explained in section 5. A note on how to choose an architecture is given in section 6. Finally, section 7 gives the conclusion.
II. FINITE FIELD
A finite field is an algebraic system consisting of a finite set F together with two binary operations "+" and ".", defined on F, satisfying the following axioms:
• F is an abelian group with respect to "+";
• F\{0} is an abelian group with respect to ".";
• Distributive: for all x, y and z in F we have (x + y) . z = x . z + y . z.
The concept of the optimal normal basis (ONB) was introduced in [11] to reduce the complexity of hardware architecture. It comes in two types, Type I and Type II. The basic difference between these two variants is that Type I has fewer irreducible polynomials than Type II.
VI. FINITE FIELD OPERATIONS
The general operations on a finite field are addition/subtraction, multiplication, squaring, inversion, exponentiation, reduction and division. However, not all of these operations are realized individually in hardware or software, because some can be built from others: modulo 2, addition also performs subtraction; multiplication can be used to realize inversion; division can be implemented by multiplication and inversion; and so on. All of these operations except addition/subtraction are basis dependent.
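As a small illustration of why subtraction needs no separate circuit in a binary field, the sketch below stores an element as a bit mask of its coefficients, so that addition is a bitwise XOR. The function names and the sample operands are assumptions chosen for this example.

```python
def gf2m_add(a, b):
    """Add two binary-field elements stored as bit masks of coefficients."""
    return a ^ b  # coefficient-wise addition modulo 2


# Modulo 2 we have -1 = 1, so subtraction is the very same operation.
gf2m_sub = gf2m_add

a, b = 0b1011, 0b0110          # sample coefficient vectors (assumed operands)
s = gf2m_add(a, b)
print(bin(s))                  # 0b1101
print(gf2m_sub(s, b) == a)     # True: subtracting b undoes adding b
print(gf2m_add(a, a))          # 0: every element is its own additive inverse
```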
Multiplication in a polynomial basis is performed by multiplying the polynomials a(x) and b(x) and reducing the result modulo a reduction polynomial f(x). The following procedure is commonly used to choose a reduction polynomial: if an irreducible trinomial ...
Inversion in finite fields can be achieved in two ways: one uses the extended Euclidean algorithm and the other employs Fermat's little theorem. A method for efficiently implementing division was proposed by Itoh and Tsujii [14].
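The following sketch shows polynomial basis multiplication with interleaved reduction, and inversion through Fermat's little theorem (a^(2^m - 2) is the inverse of a nonzero a in GF(2^m)). The field GF(2^4), the trinomial f(x) = x^4 + x + 1 and all function names are assumptions chosen to keep the example small; this is plain square-and-multiply exponentiation, not the optimized method of [14].

```python
M = 4
F_POLY = 0b10011  # f(x) = x^4 + x + 1, an irreducible trinomial (assumed example)


def gf_mul(a, b, m=M, f=F_POLY):
    """Compute a(x) * b(x) mod f(x) by shift-and-add with interleaved reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add the current multiple of a(x)
        b >>= 1
        a <<= 1                  # multiply a(x) by x
        if (a >> m) & 1:         # degree reached m: reduce by XORing f(x)
            a ^= f
    return result


def gf_inv(a, m=M, f=F_POLY):
    """Invert a nonzero element via Fermat's little theorem: a^(2^m - 2)."""
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    result, base, e = 1, a, (1 << m) - 2
    while e:                     # square-and-multiply exponentiation
        if e & 1:
            result = gf_mul(result, base, m, f)
        base = gf_mul(base, base, m, f)
        e >>= 1
    return result


def gf_div(a, b, m=M, f=F_POLY):
    """Division as multiplication by the inverse of the divisor."""
    return gf_mul(a, gf_inv(b, m, f), m, f)


print(bin(gf_mul(0b0010, 0b1000)))   # x * x^3 = x^4 = x + 1 -> 0b11
print(all(gf_mul(a, gf_inv(a)) == 1 for a in range(1, 1 << M)))  # True
```

Building inversion and division on top of the multiplier mirrors the reuse of primitive operations described above.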
A. Finite Field Multipliers
Since the central operation of ECC is elliptic curve scalar multiplication (described in section 1), this section gives some more insight into finite field multipliers. Finite field multipliers can be grouped into two major categories, namely serial and parallel. Serial multipliers have a less complex structure than parallel ones but produce only a few result bits per clock cycle [9]. They are more attractive under hard area constraints, since they require fewer gates than parallel ones. Parallel multipliers, on the other hand, can perform the whole multiplication in one clock cycle at the cost of a large chip area [10]. Efficient bit-parallel multipliers for both polynomial and normal basis representations have been proposed [15, 16, 17], including the Mastrovito multiplier [18]. In a recent paper [19], Huapeng Wu proposed low-complexity bit-parallel multipliers for three classes of finite fields.
Polynomial multiplication can be efficiently implemented using well-known techniques such as the Karatsuba-Ofman method [16]. Field multiplication, i.e. polynomial multiplication combined with reduction, can be implemented using techniques such as the least significant digit (LSD) first or most significant digit (MSD) first multiplication method [8].
Division in F is defined as the multiplication of the dividend a(x) by the multiplicative inverse of the divisor b(x). This also necessitates efficient exponentiation and division circuits.
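As a rough sketch of the Karatsuba idea for polynomials with coefficients modulo 2, the code below splits each operand into a low and a high half and replaces four half-size products with three. The function names, the bit-mask representation and the recursion threshold of 4 are assumptions made for this illustration; no reduction step is included.

```python
def clmul_schoolbook(a, b):
    """Carry-less product of two binary polynomials by shift-and-XOR."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def clmul_karatsuba(a, b, n):
    """Karatsuba product of two binary polynomials with fewer than n coefficients.

    n is assumed to be a power of two; small operands fall back to schoolbook.
    """
    if n <= 4:
        return clmul_schoolbook(a, b)
    h = n // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h          # a(x) = a0(x) + a1(x) * x^h
    b0, b1 = b & mask, b >> h
    low = clmul_karatsuba(a0, b0, h)
    high = clmul_karatsuba(a1, b1, h)
    # (a0 + a1)(b0 + b1) + low + high equals a0*b1 + a1*b0 modulo 2.
    mid = clmul_karatsuba(a0 ^ a1, b0 ^ b1, h) ^ low ^ high
    return low ^ (mid << h) ^ (high << (2 * h))


a, b = 0b1011_0110_1101_0011, 0b0111_1000_0101_1110
print(clmul_karatsuba(a, b, 16) == clmul_schoolbook(a, b))  # True
```

In hardware the same trade-off applies: fewer sub-multipliers at the price of extra XOR additions.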
VII. ELLIPTIC CURVES OVER FINITE FIELDS
There are several ways of defining equations for elliptic curves, which depend on whether the field is a prime field $F_p$ or a binary field $F_{2^m}$.
B. Elliptic curves over GF(p)
Let ... The operation in $E(F_p)$ is specified as follows: ... Adding two distinct elliptic curve points requires one inversion, two multiplications, one squaring and six additions. Similarly, doubling an elliptic curve point requires one inversion, two multiplications, two squarings and eight additions.
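The operation counts quoted above can be traced in the usual affine formulas for a curve of the form y^2 = x^3 + ax + b. The sketch below assumes that form and uses a toy textbook curve, y^2 = x^3 + 2x + 2 over $F_{17}$ with the point (5, 1); the parameters and function names are illustrative assumptions. The point at infinity and the case P = -Q are deliberately left out.

```python
P_MOD, A = 17, 2   # toy parameters (assumed): y^2 = x^3 + 2x + 2 over F_17


def inv(x, p=P_MOD):
    """Modular inverse; three-argument pow with exponent -1 needs Python 3.8+."""
    return pow(x % p, -1, p)


def point_add(P, Q, p=P_MOD):
    """Add two distinct affine points with different x-coordinates."""
    (x1, y1), (x2, y2) = P, Q
    lam = (y2 - y1) * inv(x2 - x1, p) % p    # 1 inversion, 1 multiplication
    x3 = (lam * lam - x1 - x2) % p           # 1 squaring
    y3 = (lam * (x1 - x3) - y1) % p          # 1 multiplication
    return x3, y3


def point_double(P, a=A, p=P_MOD):
    """Double an affine point with y != 0."""
    x1, y1 = P
    lam = (3 * x1 * x1 + a) * inv(2 * y1, p) % p   # 1 squaring, 1 inversion, 1 multiplication
    x3 = (lam * lam - 2 * x1) % p                  # 1 squaring
    y3 = (lam * (x1 - x3) - y1) % p                # 1 multiplication
    return x3, y3


P = (5, 1)
P2 = point_double(P)
P3 = point_add(P2, P)
print(P2, P3)   # (6, 3) (10, 6)
```

The multiplications by the small constants 2 and 3 are counted as additions, which is how the totals of six and eight additions arise.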
Since inversion in $F_p$ is an expensive operation, an alternative method to compute the sum of two elliptic curve points is to use projective coordinates. In such a case, the inversion is replaced by additional multiplications and other less expensive finite field operations. ... can be found in [12] and [13] respectively, where an inversion costs about 24 and 10 multiplications respectively.
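To illustrate how projective coordinates postpone the inversion, the sketch below uses Jacobian coordinates, where (X, Y, Z) represents the affine point (X/Z^2, Y/Z^3). Jacobian coordinates are one common choice and are an assumption here, as are the toy curve y^2 = x^3 + 2x + 2 over $F_{17}$ and the function names. Two doublings are performed with no inversion, and a single inversion converts the result back.

```python
P_MOD, A = 17, 2   # toy parameters (assumed): y^2 = x^3 + 2x + 2 over F_17


def jacobian_double(P, a=A, p=P_MOD):
    """Double a point in Jacobian coordinates without any field inversion."""
    X, Y, Z = P
    YY = Y * Y % p
    S = 4 * X * YY % p
    M = (3 * X * X + a * pow(Z, 4, p)) % p
    X3 = (M * M - 2 * S) % p
    Y3 = (M * (S - X3) - 8 * YY * YY) % p
    Z3 = 2 * Y * Z % p
    return X3, Y3, Z3


def to_affine(P, p=P_MOD):
    """Convert back to affine coordinates; this is the only inversion needed."""
    X, Y, Z = P
    z_inv = pow(Z, -1, p)          # Python 3.8+
    z_inv2 = z_inv * z_inv % p
    return X * z_inv2 % p, Y * z_inv2 * z_inv % p


P = (5, 1, 1)                      # affine point (5, 1) with Z = 1
P4 = jacobian_double(jacobian_double(P))
print(to_affine(jacobian_double(P)), to_affine(P4))   # (6, 3) (3, 1)
```

Whether this pays off depends on the ratio of inversion cost to multiplication cost in the target field, which is the comparison the cited figures address.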
VIII. PERFORMANCE AND COMPLEXITY METRICS

Several performance and complexity metrics are used to compare and evaluate finite field multipliers. Following is a short description of those metrics.
Gate count: This is the main complexity metric, usually taken as the number of 2-input AND and XOR gates, flip-flops, and switches or 2-to-1 multiplexers. It is sometimes tied to the silicon area of the implementation, using the area and count of an equivalent 2-input NAND gate to represent the hardware complexity.
Latency: The delay between the first input and the first output of the multiplier, expressed in clock cycles. This measure is of special importance in systolic and semi-systolic designs, where the output appears multiple clock cycles after the arrival of the input.
Regularity and Modularity: These two metrics are interrelated. Design regularity helps in extending a system to perform more operations easily, and modularity helps in viewing the entire system as a combination of independent modules. Polynomial basis multipliers are the most preferred in terms of regularity, while normal basis multipliers are less preferred. The polynomial basis representation has an advantage over the other bases because its operations can be performed using ordinary polynomial arithmetic. It is also easier to extend to high-order finite fields than the dual or normal basis [8]. In the normal basis, squaring is only a cyclic shift operation, which is crucial to the inversion operation. For example, the Massey-Omura multiplier [7] is very effective in performing squaring, exponentiation, and inversion operations. The dual basis yields the simplest architecture: the dual basis multiplier [6], for example, needs the fewest gates, which leads to the smallest area required for VLSI implementation.
Selecting serial or parallel architecture depends on the availability of operands during computation.
As for systolic versus non-systolic designs, a systolic architecture allows for pipelining, while non-systolic architectures are more hardware efficient.
Taking all these factors into account, a single best hardware architecture is difficult to realize, but the analysis helps in focusing on the target. For example, if the target is low hardware complexity, then a non-systolic design is a better choice. Although a semi-systolic design has lower hardware complexity than a fully systolic one, its common control signal makes it difficult to expand the multiplier to higher-order fields. In a recent paper, Meher [20] proposed a bit-level-pipelined systolic design that, through appropriate cut-set retiming and logic optimization in the processing elements (PEs), takes nearly half the time complexity of the corresponding existing designs of Wang and Lin [21], Lee [22], and Lee et al. [23].
Using composite fields to construct the multiplier architecture also helps lower the area complexity and increases the modularity of the architecture, because multiplication over the composite field GF((2^n)^m) is performed using arithmetic modules of GF(2^n). Their complexity analysis can be found in [24].
X. CONCLUSION
This study of different implementation options for finite field multipliers helps in identifying a suitable target easily and effectively. Although there are many papers on arithmetic operations in finite fields, increasingly optimal hardware architectures are still being sought. Because of the miniaturization of electronic devices, achieving power efficiency and cost-effectiveness without compromising security will be the real challenge of future ECC-based system design.
