Abstract-In this paper we propose the use of Identity Based Encryption (IBE) for ensuring a secure wireless sensor network. In this context we have implemented the arithmetic operations required for the most computationally expensive part of IBE, which is the Tate pairing, in 90nm CMOS and obtained area, timing and energy figures for the designs. Initial results indicate that a hardware implementation of IBE would meet the strict energy constraint of a wireless sensor network node.
there is no need for a certificate to bind a node's identity to its public key, as the node's identity can be used as the public key.
The remainder of the paper is organized as follows: the Tate pairing, a key component of IBE, and methods for calculating it, are discussed in Section II in the context of low energy hardware implementation. In light of this discussion, descriptions of the hardware units required for the implementation are outlined in Section III. Finally, conclusions and directions for future work are presented in Section IV.
II. TATE PAIRING
The Boneh-Franklin scheme is one example of IBE [5]. This approach requires arithmetic on the supersingular elliptic curve $E$, defined over $GF(2^{283})$ by (1), and the Tate pairing for its implementation.

$$E(GF(2^{283})): \; y^2 + y = x^3 + 1 \qquad (1)$$

The Tate pairing is the map from two points on the elliptic curve to the multiplicative group $(GF(2^{283 \times 4}))^*$,

$$E(GF(2^{283}))[l] \times E(GF(2^{283 \times 4}))[l] \rightarrow (GF(2^{283 \times 4}))^* \qquad (2)$$

which has the property of bilinearity. It can be defined as

$$e_l(P, Q) = f_P(X(Q))^{(2^{283 \times 4} - 1)/l} \qquad (3)$$

where $P, Q \in E(GF(2^{283}))[l]$, $f_P$ is a rational function on the curve such that its divisor $(f_P) = l(P) - l(O)$, and $X$ is a distortion map [6]. Such large finite fields are required as the discrete logarithm problem must be hard to solve in the field that the pairing maps to.
It is well known that the Tate pairing is the most computationally expensive operation in an IBE algorithm. In a wireless sensor node it should therefore be implemented in hardware, in order to reduce the amount of energy required to compute the pairing. The pairing can be calculated using Miller's algorithm [7], and improvements have recently been proposed to reduce its execution time [6], [8], [9], [10]. We are currently implementing the algorithm of [9], [10], as it is the most amenable to a low energy hardware implementation: it is in characteristic two and has a regular structure which maps well to hardware (see Algorithm 1). In the subfield $GF(2)$, addition is carried out using modulo two arithmetic, and hence can be performed in hardware using an XOR gate. Addition is equivalent to subtraction in $GF(2)$. Multiplication is performed using an AND gate.
The polynomial basis representation is used for the elements of the two finite fields. The irreducible polynomials which generate the fields $GF(2^{283})$ and $GF(2^{283 \times 4})$ are

$$f(x) = x^{283} + x^{119} + x^{97} + x^{93} + 1, \qquad p(x) = x^4 + x + 1 \qquad (4)$$

and are defined over $GF(2)$ and $GF(2^{283})$, respectively.
A. Addition
Addition in a binary extension field is trivial to implement in hardware: it is an array of XOR gates, one for each pair of corresponding bits of the operands that are to be added. As these circuits are much smaller in area and energy consumption than the circuits for the other operations outlined in this paper, their contribution to the overall energy and timing figures for the Tate pairing is neglected.
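As a minimal software sketch of this XOR array, assume a field element is stored as a Python integer whose bit i holds the coefficient of x^i. This encoding and the function name `gf2m_add` are illustrative assumptions, not part of the hardware design.

```python
def gf2m_add(a: int, b: int) -> int:
    """Add two binary-field elements: one XOR per pair of coefficient bits."""
    return a ^ b


# (x^3 + x + 1) + (x^3 + x^2) = x^2 + x + 1
total = gf2m_add(0b1011, 0b1100)
print(bin(total))

# Addition and subtraction coincide, so adding an element to itself gives zero.
print(gf2m_add(0b1011, 0b1011))
```

No reduction step is needed, since the sum of two polynomials of degree below 283 again has degree below 283.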
B. Multiplication in $GF(2^{283})$
The aim of this work is to reduce the energy consumption of the circuits that carry out the various operations. As we are implementing our design in 90 nm CMOS, and static energy dominates dynamic energy consumption in deep sub-micron technologies, it would be best if each operation could be performed as fast as possible. The circuits could then be powered off when not in use, saving static energy, but at the cost of greatly increased area: a fast bit-parallel multiplier is approximately 300,000 gates in area. This would be prohibitive in terms of manufacturing cost for a wireless sensor node.
Thus, a bit-serial approach to designing the multiplier is warranted. Multiplication is performed using the modified shift-register (MSR) multiplier [11], which has superior low energy attributes compared to the standard shift-register multiplier due to its early exit mechanism, as outlined below.
The MSR multiplier is based on the following observation for $A(x), B(x), C(x) \in GF(2^{283})$:

$$C(x) = A(x)B(x) \bmod f(x) = \sum_{i=0}^{282} b_i \, x^i A(x) \bmod f(x)$$

$C(x)$ can be calculated using a shift and add algorithm where the first partial product is $b_0 A(x)$. $B(x)$ is then shifted right one bit while at the same time $A(x)$ is multiplied by $x$ and reduced mod $f(x)$. The result is added to the previous product if $b_i$ is equal to 1. The algorithm terminates when the value of the right shift register is equal to zero (see Algorithm 2). This is the early exit mechanism referred to above: the multiplication could finish after as few as one clock cycle or as many as 283 clock cycles. The datapath circuitry is shown in Fig. 1. We use a right shift register and a linear feedback barrel shift register to improve the performance of the circuit. Two, three, four or five consecutive zero bits are searched for, and the registers are shifted accordingly. Five bits is considered the optimum choice, as for each bit searched there is a cost of approximately 400 gates, and the probability of five zeros is 1/32, which is quite high. When the system is complete we will implement clock tree gating at a higher level to reduce the energy being dissipated on the clock nets.

C. Multiplication in $GF(2^{283 \times 4})$
From (9) it can be seen that nine multiplications and twenty-two additions are required in $GF(2^{283})$.
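The shift-and-add procedure with early exit described in Section III-B can be sketched in software as follows. This is a behavioural model under stated assumptions, not the authors' Algorithm 2: the function names, the `max_skip` parameter and the exact zero-skipping policy (skip a run of up to five zero bits in one cycle) are hypothetical, and the toy field $GF(2^8)$ with $x^8 + x^4 + x^3 + x + 1$ stands in for $GF(2^{283})$. Inputs are assumed to be already reduced (degree below m).

```python
def mul_by_x(a: int, m: int, poly: int) -> int:
    """Multiply a by x and reduce modulo poly (poly includes the x^m term)."""
    a <<= 1
    if a >> m:
        a ^= poly
    return a


def shift_add_multiply(a: int, b: int, m: int, poly: int):
    """Standard shift-and-add: always takes m cycles. Returns (product, cycles)."""
    acc = 0
    for _ in range(m):
        if b & 1:
            acc ^= a
        a = mul_by_x(a, m, poly)
        b >>= 1
    return acc, m


def msr_multiply(a: int, b: int, m: int, poly: int, max_skip: int = 5):
    """Shift-and-add with early exit and zero-run skipping. Returns (product, cycles)."""
    acc, cycles = 0, 0
    while b:  # early exit: stop as soon as the B register is zero
        if b & 1:
            acc ^= a
            step = 1
        else:
            # Length of the run of zero bits at the bottom of B, capped at max_skip.
            step = 1
            while step < max_skip and not (b >> step) & 1:
                step += 1
        for _ in range(step):  # a barrel shifter would do this within one cycle
            a = mul_by_x(a, m, poly)
        b >>= step
        cycles += 1
    return acc, cycles


# Toy example in GF(2^8); the two multipliers agree on the product.
print(msr_multiply(0x57, 0x83, 8, 0x11B))
print(shift_add_multiply(0x57, 0x83, 8, 0x11B))
```

In this model the cycle count depends on the bit pattern of $B(x)$, which is the property that the clock gating on the "done" signal exploits.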
The datapath circuitry is shown in Fig. 2. The datapath is 283 bits wide. As well as using the MSR multiplier, which reduces energy consumption, further reductions are achieved by holding wires at a constant value when not in use, thereby reducing dynamic energy dissipation. This is accomplished through the signals enaddlO and enaddl2 (not shown), which gate the inputs and the combinational logic, respectively. Clock tree gating is used at a higher level to reduce the energy being dissipated on the clock nets. The MSR multipliers' clocks are gated with their "done" signals. This technique takes advantage of the early exit of the MSR multipliers.
D. Squaring
The bit-serial multiplier described in Section III-B could be used for squaring, but as squaring is used 283 times in each loop and in the inversion circuitry, this is not the optimum choice. Instead, we have implemented a bit-parallel squaring circuit which does not require a significant amount of area (see Table I). In our architecture there are two squaring circuits that operate in parallel.
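One reason a bit-parallel squarer can be small is that squaring in a binary field is linear: bit i of the operand simply moves to bit 2i, after which the result is reduced. The sketch below illustrates this under the assumption of a polynomial basis stored in a Python integer; the function names are hypothetical and the toy field $GF(2^8)$ stands in for $GF(2^{283})$.

```python
def gf2m_square(a: int, m: int, poly: int) -> int:
    """Square a in GF(2^m): spread the bits apart, then reduce modulo poly."""
    # Squaring is linear over GF(2): coefficient i moves to position 2*i.
    spread = 0
    i = 0
    while a >> i:
        if (a >> i) & 1:
            spread |= 1 << (2 * i)
        i += 1
    # Fold every bit at position >= m back down (poly includes the x^m term).
    for k in range(spread.bit_length() - 1, m - 1, -1):
        if (spread >> k) & 1:
            spread ^= poly << (k - m)
    return spread


def gf2m_mul(a: int, b: int, m: int, poly: int) -> int:
    """Reference shift-and-add multiplication, used only to check the squarer."""
    acc = 0
    while b:
        if b & 1:
            acc ^= a
        b >>= 1
        a <<= 1
        if a >> m:
            a ^= poly
    return acc


# The dedicated squarer agrees with a general multiplication of a by itself.
a = 0x57
print(gf2m_square(a, 8, 0x11B) == gf2m_mul(a, a, 8, 0x11B))
```

In hardware the spreading step is pure wiring and the reduction is a fixed XOR network, so no clocked iteration is needed.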
E. Exponentiation
The only exponentiation that is required is $\beta = \alpha^{2^{283}}$ where $\alpha, \beta \in GF(2^{283 \times 4})$. This is also known as the Frobenius map.
The exponentiation is as follows:

$$\beta = \left( \sum_{i=0}^{3} a_i x^i \right)^{2^{283}}$$

This can now be decomposed using (14) and (15). For a proof see [13]. Due to the fact that $x^{2^4} = x$, the Frobenius map can be implemented in hardware with two additions in $GF(2^{283})$ and a reordering of the coefficients.
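The claim of two additions plus a reordering can be checked on a toy model. Assuming $p(x) = x^4 + x + 1$ as in (4), we have $x^4 = x + 1$ and $x^{16} = x$, so for a subfield degree congruent to 3 modulo 4 (as 283 is) $x$ is mapped to $x^8 = x^2 + 1$. This gives $\beta = (a_0 + a_1) + (a_2 + a_3)x + a_1 x^2 + a_3 x^3$. The sketch below is our own derivation rather than the authors' equations (14) and (15), and it replaces $GF(2^{283})$ with the toy subfield $GF(2^3)$; all names are hypothetical.

```python
M = 3            # toy subfield degree standing in for 283 (both are 3 mod 4)
F_POLY = 0b1011  # z^3 + z + 1, irreducible over GF(2)


def base_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in the toy subfield GF(2^M)."""
    acc = 0
    while b:
        if b & 1:
            acc ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= F_POLY
    return acc


def ext_mul(a, b):
    """Schoolbook product of two degree-3 polynomials, reduced mod x^4 + x + 1."""
    prod = [0] * 7
    for i in range(4):
        for j in range(4):
            prod[i + j] ^= base_mul(a[i], b[j])
    # x^4 = x + 1, hence x^k = x^(k-3) + x^(k-4) for k >= 4.
    for k in range(6, 3, -1):
        prod[k - 3] ^= prod[k]
        prod[k - 4] ^= prod[k]
    return prod[:4]


def frobenius_by_squaring(a):
    """Compute a^(2^M) the slow way, with M successive squarings."""
    for _ in range(M):
        a = ext_mul(a, a)
    return list(a)


def frobenius_by_reordering(a):
    """Compute a^(2^M) with two subfield additions and a coefficient reordering."""
    a0, a1, a2, a3 = a
    return [a0 ^ a1, a2 ^ a3, a1, a3]


alpha = [5, 3, 7, 1]  # coefficients a0..a3, each an element of GF(2^3)
print(frobenius_by_squaring(alpha) == frobenius_by_reordering(alpha))
```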
F. Inversion in $GF(2^{283})$
The technique for inversion of $\beta \in GF(2^{283})$ is based on Fermat's little theorem (12). We have used this method as it allows reuse of the squaring and multiplication circuits.

$$\beta^{2^{283} - 1} \equiv 1 \pmod{f(x)} \qquad (12)$$

This means that $\beta \cdot \beta^{2^{283} - 2} \equiv 1 \pmod{f(x)}$ and therefore $\beta^{2^{283} - 2}$ is the inverse of $\beta$. The inverse of $\beta$ can be calculated with the square and multiply technique, so that only 11 multiplications and 283 squarings are required to obtain the inverse of $\beta$. The datapath circuitry is shown in Fig. 3.
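A multiplication count of 11 is what an Itoh-Tsujii style addition chain on the exponent $2^{283} - 2$ yields, and the sketch below illustrates that chain. It is an assumption that this matches the authors' decomposition; the way squarings are counted here may also differ slightly from the figure quoted above. The function names are hypothetical, and a dedicated squarer is modelled simply as a multiplication of an element by itself.

```python
def gf2m_mul(a: int, b: int, m: int, poly: int) -> int:
    """Shift-and-add multiplication in GF(2^m); poly includes the x^m term."""
    acc = 0
    while b:
        if b & 1:
            acc ^= a
        b >>= 1
        a <<= 1
        if a >> m:
            a ^= poly
    return acc


def fermat_inverse(a: int, m: int, poly: int):
    """Return (a^(2^m - 2), multiplications used, squarings used)."""
    if a == 0:
        raise ZeroDivisionError("zero has no multiplicative inverse")
    result, k = a, 1  # invariant: result == a^(2^k - 1)
    mults = sqrs = 0
    for bit in bin(m - 1)[3:]:  # bits of m - 1 after the leading one
        # Doubling step: a^(2^(2k) - 1) = (a^(2^k - 1))^(2^k) * a^(2^k - 1)
        t = result
        for _ in range(k):
            t = gf2m_mul(t, t, m, poly)
            sqrs += 1
        result = gf2m_mul(t, result, m, poly)
        mults += 1
        k *= 2
        if bit == "1":
            # Increment step: a^(2^(k+1) - 1) = (a^(2^k - 1))^2 * a
            result = gf2m_mul(gf2m_mul(result, result, m, poly), a, m, poly)
            sqrs += 1
            mults += 1
            k += 1
    result = gf2m_mul(result, result, m, poly)  # final squaring: a^(2^m - 2)
    sqrs += 1
    return result, mults, sqrs


# Toy check in GF(2^8) with x^8 + x^4 + x^3 + x + 1 (not the paper's field).
inv, mults, sqrs = fermat_inverse(0x53, 8, 0x11B)
print(gf2m_mul(0x53, inv, 8, 0x11B), mults, sqrs)
```

The operation counts depend only on the field degree m, not on the reduction polynomial, so the count for m = 283 can be obtained with any degree-283 modulus.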
G. Inversion in $GF(2^{283 \times 4})$
Fermat's little theorem (12), [15] can also be used to obtain the inverse of an element $\alpha \in GF(2^{283 \times 4})$ (see Algorithm 3). Again, this approach has been used to reduce area, whilst achieving maximum performance in terms of energy and time per operation. If the general case $\alpha \in GF(2^{n})$ is considered, then the inverse is $\alpha^{-1} = \alpha^{2^{n} - 2}$ [...] $GF(2^{283})$, and finally the last step is multiplication in $GF(2^{283 \times 4})$. The datapath circuitry is shown in Fig. 4.
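One standard way to organise such an inversion, which we assume here and which may differ in detail from Algorithm 3, writes $2^{n} - 2 = (r - 1) + r(q - 2)$ with $q = 2^{283}$ and $r = q^3 + q^2 + q + 1$. Since $\alpha^r$ lies in the subfield, $\alpha^{-1} = \alpha^{r-1} \cdot (\alpha^r)^{-1}$ needs only Frobenius maps, a few extension-field multiplications, one subfield inversion and a final multiplication. The sketch below uses the toy subfield $GF(2^3)$ in place of $GF(2^{283})$ and assumes $p(x) = x^4 + x + 1$; all names are hypothetical.

```python
M = 3            # toy subfield degree standing in for 283 (both are 3 mod 4)
F_POLY = 0b1011  # z^3 + z + 1, irreducible over GF(2)


def base_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in the toy subfield GF(2^M)."""
    acc = 0
    while b:
        if b & 1:
            acc ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= F_POLY
    return acc


def base_inv(a: int) -> int:
    """Fermat inversion in GF(2^M): a^(2^M - 2) = a^2 * a^4 * ... * a^(2^(M-1))."""
    if a == 0:
        raise ZeroDivisionError("zero has no multiplicative inverse")
    result, power = 1, a
    for _ in range(M - 1):
        power = base_mul(power, power)
        result = base_mul(result, power)
    return result


def ext_mul(a, b):
    """Schoolbook product of two degree-3 polynomials, reduced mod x^4 + x + 1."""
    prod = [0] * 7
    for i in range(4):
        for j in range(4):
            prod[i + j] ^= base_mul(a[i], b[j])
    for k in range(6, 3, -1):  # x^k = x^(k-3) + x^(k-4) for k >= 4
        prod[k - 3] ^= prod[k]
        prod[k - 4] ^= prod[k]
    return prod[:4]


def frobenius(a):
    """a^(2^M) via two additions and a reordering (valid when M is 3 mod 4)."""
    a0, a1, a2, a3 = a
    return [a0 ^ a1, a2 ^ a3, a1, a3]


def ext_inv(alpha):
    """Invert alpha in the degree-4 extension using a single subfield inversion."""
    f1 = frobenius(alpha)             # alpha^q
    f2 = frobenius(f1)                # alpha^(q^2)
    f3 = frobenius(f2)                # alpha^(q^3)
    t = ext_mul(ext_mul(f1, f2), f3)  # alpha^(r - 1)
    norm = ext_mul(alpha, t)          # alpha^r, which lies in the subfield
    n_inv = base_inv(norm[0])
    return [base_mul(n_inv, c) for c in t]


alpha = [5, 3, 7, 1]
print(ext_mul(alpha, ext_inv(alpha)))
```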
IV. CONCLUSION
The primitives were implemented in VHDL and synthesised targeting a 90 nm library. Power measurements were taken using the PrimePower tool from Synopsys. All circuits in our design operate at 250 MHz. From an analysis of the algorithm we are implementing (see Algorithm 1), and using the results given in Table I, it is estimated that the Tate pairing would require 18.36 µJ. Given that the Tate pairing is the most computationally expensive process in IBE, we believe that this figure is appropriate in the context of meeting the extremely restrictive energy constraint of a wireless sensor node. We estimate the time taken to compute the pairing to be 0.84 ms, which compares favourably with the latency estimate of 1.48 ms presented in [16]. In our future work we will integrate these arithmetic primitives into a low energy, low cost implementation of the Tate pairing.
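The quoted estimates can be combined into a few derived quantities. The following is a back-of-envelope calculation only: it assumes the 18.36 µJ and 0.84 ms figures refer to the same pairing computation and that the circuits are clocked at 250 MHz throughout. The variable names are illustrative.

```python
energy_j = 18.36e-6            # estimated energy per pairing (from the text)
latency_s = 0.84e-3            # estimated time per pairing (from the text)
clock_hz = 250e6               # operating frequency (from the text)
reference_latency_s = 1.48e-3  # latency estimate quoted from [16]

average_power_w = energy_j / latency_s           # mean power while computing
clock_cycles = latency_s * clock_hz              # cycles per pairing
latency_ratio = reference_latency_s / latency_s  # how much faster than [16]

print(f"average power: {average_power_w * 1e3:.2f} mW")
print(f"clock cycles : {clock_cycles:.0f}")
print(f"latency ratio: {latency_ratio:.2f}")
```

Under these assumptions the pairing draws roughly 22 mW on average for about 210,000 clock cycles.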
