Abstract-This work describes the smallest known hardware implementation for Elliptic/Hyperelliptic Curve Cryptography (ECC/HECC). We propose two solutions for Public-key Cryptography (PKC), both based on arithmetic on elliptic/hyperelliptic curves. One solution relies on ECC over binary fields $\mathbb{F}_{2^n}$, where n is a composite number of the form 2p (p is a prime), and the other on HECC on curves of genus 2 over $\mathbb{F}_{2^p}$. Both cases can therefore share the same arithmetic unit, which supports arithmetic in the field $\mathbb{F}_{2^p}$. Our best solution that still achieves feasible performance requires fewer than 5 kgates, with an average power consumption below 10 µW.
I. INTRODUCTION
The field of embedded systems is growing at a rapid rate, as devices such as mobile phones, PDAs, smart cards and, more recently, RFID tags, sensor nodes and key immobilizers have become unavoidable in our everyday life. The distinguishing characteristics of embedded security fall into two categories: resource limitation and physical accessibility. The former imposes severe constraints on the security architecture in terms of memory, computational capacity, and energy for embedded devices. The most challenging tasks for embedded security are implementations of Public-key Cryptography (PKC).
RFID tags are small wireless devices for pervasive computing. Their constraints are severe, with an extremely low budget for power and die size, yet they also give rise to serious security and privacy issues. Typical security services include authentication, key management and encryption. Although some experts a priori give up on public-key solutions, assuming them to be too expensive and too power-hungry, there is a line of research exploring the limits of compact public-key implementations for low-cost applications such as RFIDs and sensor networks.
In our previous work we investigated standardized low-cost solutions for Elliptic Curve Cryptography (ECC) processors supporting security algorithms and protocols for RFID [1]. Namely, standards mainly recommend the use of ECC over a field $\mathbb{F}_{2^p}$, where p is a prime. In this work we describe a new solution based on Hyperelliptic Curve Cryptography (HECC) and on ECC over composite fields. HECC has some advantages over ECC because it allows working in a smaller field: for genus 2 curves, one can work in the field $\mathbb{F}_{2^n}$ while obtaining the same level of security as ECC over fields of twice the bit-length. The same holds for ECC over composite fields. This property allows for a more compact ALU than in the case of standardized ECC.
We revisit the algorithms for HECC and ECC over composite fields and optimize them with respect to the number of registers, resulting in an area-optimized solution. Our results show that the arithmetic unit for those two cases is smaller than the one for standardized ECC, but the memory requirements are slightly larger. Using RAM for storage, which would result in a smaller number of gates than register files, one could have a feasible low-cost solution using all kinds of curve-based cryptography.
The paper is organized as follows. Section II lists some related work. In Sect. III we give some background information on curve-based cryptography and supporting arithmetic. In Sect. IV we elaborate on a suitable selection of parameters and algorithms and in Sect. V we outline our architecture and describe our hardware implementation. Our results are discussed in Sect. VI. Section VII concludes the paper.
II. RELATED WORK
As related previous work we mention implementations of Public-key cryptosystems for resource constrained environments such as RFID tags and sensor networks.
Gaubatz et al. [5] investigated ECC implementations for wireless sensor networks. The architecture of their ECC processor occupied an area of 18 720 gates and consumed less than 400 µW of power at 500 kHz. The field used was a prime field of order $\approx 2^{100}$. PKC processors for RFID tags include the results of Wolkerstorfer [12] and Kumar and Paar [8]. Wolkerstorfer [12] showed that ECC-based PKC is feasible on RFID tags by implementing ECDSA on a small IC. The chip has an area complexity of around 23 000 gates and a latency of 6.67 ms for one point multiplication at 68.5 MHz. However, it can be used for both types of fields, e.g. $\mathbb{F}_{2^{191}}$ and $\mathbb{F}_{p_{192}}$. The results of Kumar and Paar [8] include an area complexity of almost 12 kgates and a latency of 18 ms for one point multiplication over $\mathbb{F}_{2^{131}}$ at 13.56 MHz. In both cases the operating frequency is too high for these applications, so the results cannot be properly evaluated: at such a high frequency the power consumption becomes too large, and power has the most crucial impact on the feasibility of the implementations. Our previous work [1] considers an ECC processor over a range of fields from $\mathbb{F}_{2^{131}}$ to $\mathbb{F}_{2^{163}}$. The best solution features 6718 gates for the arithmetic and control unit (data memory not included) in 0.13 µm CMOS technology over the field $\mathbb{F}_{2^{131}}$, which provides a reasonable level of security for the time being. In this case the consumed power is less than 15 µW at an operating frequency of 200 kHz. We compare the previous implementations with our new results in more detail in Section VI.
III. MATHEMATICAL BACKGROUND
In general, ECC relies on a group structure induced on an elliptic curve. The set of points on an elliptic curve together with the point at infinity, denoted ∞, and with point addition as binary operation has the structure of an abelian group. Here we consider finite fields of characteristic two. A non-supersingular elliptic curve E over $\mathbb{F}_{2^n}$ is defined as the set of solutions $(x, y) \in \mathbb{F}_{2^n} \times \mathbb{F}_{2^n}$ of the equation $y^2 + xy = x^3 + ax^2 + b$, where $a, b \in \mathbb{F}_{2^n}$ and $b \neq 0$.
Hyperelliptic Curve Cryptography (HECC) was proposed in 1988 by Koblitz [7] as a generalization of ECC. More details can be found in [2] , [10] .
The scalar multiplication i.e. operation k * P (where k is an integer and P a point or divisor on EC/HEC respectively) is the basic operation for all ECC/HECC protocols. At the next (lower) level are the group operations i.e. point/divisor addition and doubling. The lowest level consists of finite field operations such as addition, subtraction, multiplication and inversion required to perform the group operations. The scalar multiplication is easily performed via repeated group operations. The basic scheme is called double-and-add or the binary method [2] .
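The three-level hierarchy described above can be illustrated with a minimal sketch of the binary (double-and-add) method. The function name `double_and_add` and the toy group (integers modulo 1009 under addition) are assumptions made purely for illustration; in a real ECC/HECC setting `add` and `double` would be the point or divisor operations.

```python
from typing import Callable, TypeVar

G = TypeVar("G")


def double_and_add(k: int, P: G,
                   add: Callable[[G, G], G],
                   double: Callable[[G], G],
                   identity: G) -> G:
    """Left-to-right binary method: return k*P using only group operations."""
    if k < 0:
        raise ValueError("k must be non-negative")
    result = identity
    for bit in bin(k)[2:]:            # scan the bits of k from MSB to LSB
        result = double(result)       # one doubling per bit
        if bit == "1":
            result = add(result, P)   # one addition per set bit
    return result


# Toy stand-in group (hypothetical): integers modulo 1009 under addition.
M = 1009


def toy_add(a: int, b: int) -> int:
    return (a + b) % M


def toy_double(a: int) -> int:
    return (2 * a) % M


print(double_and_add(77, 5, toy_add, toy_double, 0))
```

In this toy group k * P is simply the integer product reduced modulo 1009, which makes the routine easy to check by hand.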
Elliptic curves can be viewed as a special case of hyperelliptic curves, i.e. an elliptic curve is a hyperelliptic curve of genus g = 1. Here we consider a hyperelliptic curve C of genus g = 2 over $GF(2^n)$, which is defined by an equation of the form $C: y^2 + h(x)y = f(x)$, where h(x) is a polynomial of degree at most g and f(x) is a monic polynomial of degree 2g + 1. For genus 2 curves, in the general case the following equation is used: $y^2 + (h_2 x^2 + h_1 x + h_0)y = x^5 + f_4 x^4 + f_3 x^3 + f_2 x^2 + f_1 x + f_0$.
For ECC over a composite field we consider $\mathbb{F}_{2^{2p}}$ as a quadratic extension of $\mathbb{F}_{2^p}$, so we can write $\mathbb{F}_{2^{2p}} = \mathbb{F}_{2^p}[t]/(f(t))$, where f is irreducible over $\mathbb{F}_{2^p}$ and deg(f) = 2. In this case each element of the field $\mathbb{F}_{2^{2p}}$ is represented as $c = c_1 t + c_0$, where $c_0, c_1 \in \mathbb{F}_{2^p}$, and all operations in this field can be done by means of operations in $\mathbb{F}_{2^p}$.
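As an illustration of how arithmetic in the quadratic extension reduces to subfield operations, the sketch below multiplies two elements $c_1 t + c_0$ using three subfield multiplications. All concrete choices are assumptions for the example and are not taken from this work: a toy subfield degree p = 3 with modulus $x^3 + x + 1$, and the extension polynomial $f(t) = t^2 + t + 1$, which is irreducible over $\mathbb{F}_{2^p}$ whenever p is odd.

```python
P_BITS = 3          # toy subfield degree p (assumed, far below real sizes)
SUB_POLY = 0b1011   # x^3 + x + 1, irreducible over F_2


def sub_mul(a: int, b: int) -> int:
    """Multiply in F_{2^p}; elements are ints whose bits are coefficients."""
    r = 0
    for i in reversed(range(P_BITS)):
        r <<= 1                     # multiply the accumulator by x
        if r >> P_BITS:             # degree reached p: reduce modulo SUB_POLY
            r ^= SUB_POLY
        if (b >> i) & 1:            # add a when the current bit of b is set
            r ^= a
    return r


def ext_mul(a, b):
    """Multiply (a1*t + a0)(b1*t + b0) in F_{2^p}[t]/(t^2 + t + 1).

    Elements are pairs (c1, c0). Uses t^2 = t + 1 and a Karatsuba-style
    cross term, so only three subfield multiplications are needed.
    """
    a1, a0 = a
    b1, b0 = b
    low = sub_mul(a0, b0)
    high = sub_mul(a1, b1)
    mid = sub_mul(a1 ^ a0, b1 ^ b0)  # a1b1 + a1b0 + a0b1 + a0b0
    c1 = mid ^ low                   # a1b0 + a0b1 + a1b1
    c0 = low ^ high                  # a0b0 + a1b1
    return (c1, c0)


print(ext_mul((1, 0), (1, 0)))       # t * t
```

The addition of two extension elements is a componentwise XOR, so the subfield multiplier and XOR logic are the only arithmetic resources this model needs.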
In [4], the Weil descent attack is introduced against elliptic curves defined over binary fields of composite degree n = k · m. This work cast some doubt on the security of composite-field implementations of EC in general. However, further investigations have shown that composite fields of degree n = 2 · p (i.e., an extension of degree two), where p is prime, remain secure against Weil descent attacks and their variants.
IV. ALGORITHMS SELECTION AND OPTIMIZATIONS
For the ECC point multiplication we chose the method of Montgomery [11] that maintains the relationship P 2 − P 1 as invariant. It uses a representation where computations are performed on the x-coordinate only in affine coordinates (or on the X and Z coordinates in projective representation). That fact allows us to save registers which is one of the main criteria for obtaining a compact solution.
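The ladder structure and its invariant can be sketched generically as follows. This is only a model under stated assumptions: `montgomery_ladder` is a hypothetical name, and the group is again a toy one (integers modulo 1009 under addition) instead of x-only curve arithmetic.

```python
def montgomery_ladder(k, P, add, double, identity):
    """Return (k*P, (k+1)*P); the difference P2 - P1 = P holds after every step."""
    P1, P2 = identity, P
    for bit in bin(k)[2:]:                    # MSB to LSB
        if bit == "1":
            P1, P2 = add(P1, P2), double(P2)  # (m, m+1) -> (2m+1, 2m+2)
        else:
            P1, P2 = double(P1), add(P1, P2)  # (m, m+1) -> (2m, 2m+1)
    return P1, P2


# Toy stand-in group (hypothetical): integers modulo 1009 under addition.
M = 1009


def toy_add(a, b):
    return (a + b) % M


def toy_double(a):
    return (2 * a) % M


kP, kP_plus_P = montgomery_ladder(100, 7, toy_add, toy_double, 0)
print(kP, kP_plus_P)
```

Every bit costs exactly one addition and one doubling regardless of its value, and because the difference of the two running values is always P, the addition step can use formulae that need only x-coordinates.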
As a starting point for our optimizations we use the formulae of Lopez and Dahab [9]. The original formulae in [9] [...] From the formulae for point operations in projective coordinates [2] it is evident that we need to implement only multiplications and additions. Squaring is treated as a special case of multiplication in order to minimize the area, and inversion is avoided by the use of projective coordinates. We assume that the conversion to affine coordinates can be computed at the reader's side. Note also that, if necessary, the one inversion that is required can be calculated by means of multiplications. In this way the area remains almost intact and only some small control logic has to be added.
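For orientation, the x-only projective formulae in the style of Lopez and Dahab are commonly stated as below for a curve $y^2 + xy = x^3 + ax^2 + b$. This is the textbook form, given here as an assumption about the starting point; it is not the register-optimized variant developed in this work. Here $x$ denotes the affine x-coordinate of the base point P, and $(X_1, Z_1)$, $(X_2, Z_2)$ represent two points whose difference is P.

```latex
% Differential addition (uses the x-coordinate of the base point P)
Z_3 = (X_1 Z_2 + X_2 Z_1)^2, \qquad
X_3 = x \, Z_3 + (X_1 Z_2)(X_2 Z_1)

% Doubling
X_4 = X_1^4 + b \, Z_1^4, \qquad
Z_4 = X_1^2 \, Z_1^2
```

Both steps use only field multiplications, squarings and additions, which is consistent with the observation that no inversion is needed until the final conversion to affine coordinates.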
For our HECC implementation we used so-called type II curves [3], which are defined by $h_2 = 0$, $h_1 = 1$. Those curves allow for faster doublings than a general curve, while security remains intact. As a starting point for divisor operations we used the formulae from [6]. However, we optimized the formulae with respect to the number of intermediate variables, which resulted in a small increase in the number of multiplications. In this way, we trade performance for area.
V. CURVE-BASED PROCESSORS FOR LOW-COST APPLICATIONS
Our solution for a curve-based processor is shown in Fig. 1. The architecture consists of the following blocks: a control unit (FSM), a modular arithmetic unit (MALU), and some memory (RAM and ROM). The ROM stores the ECC/HECC parameters and some constants used in the algorithms. The RAM, on the other hand, contains all input and output variables and therefore communicates with both the ROM and the MALU.
The FSMs control the scalar multiplication k * P and the point/divisor operations. In addition, the controller commands the MALU, which performs field operations. When the START signal is set, the bits of $k = \sum_{i=0}^{n_k - 1} k_i 2^i$, $k_i \in \{0, 1\}$, $n_k = \lceil \log_2 k \rceil$, are evaluated from MSB to LSB. The control consists of a number of simple state machines and a counter, and its area cost is small.
Fig. 1. Architecture of our curve-based processor.

The datapath of the MALU is an MSB-first bit-serial $\mathbb{F}_{2^n}$ multiplier with digit size d. This arithmetic unit computes $A(x)B(x) \bmod P(x)$, where $A(x) = \sum a_i x^i$, $B(x) = \sum b_i x^i$ and $P(x) = \sum p_i x^i$. Modular addition is also supported by the same hardware logic. This operation requires additional multiplexers and XORs. However, this solution is much cheaper than having a separate modular adder. This type of hardware sharing is very important for such low-cost applications. The proposed datapath is scalable in the digit size d, which can be chosen arbitrarily by exploring the best combination of performance and cost. Details of the MALU are given in [1].
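A bit-level software model can make the role of the digit size d concrete. The sketch below is an assumption-laden illustration and not the MALU of [1]: the function names, the toy field $\mathbb{F}_{2^7}$ and its modulus $x^7 + x + 1$ are all chosen for the example. It consumes d bits of one operand per outer iteration, most significant digit first, and is checked against a plain multiply-then-reduce reference.

```python
def msb_first_digit_serial_mul(a, b, poly, n, d):
    """Model of a*b mod poly in F_{2^n}, taking d bits of b per iteration."""
    num_digits = -(-n // d)                    # ceil(n / d) outer iterations
    acc = 0
    for j in reversed(range(num_digits)):      # most significant digit first
        digit = (b >> (j * d)) & ((1 << d) - 1)
        # The d inner steps model what a digit-serial datapath would
        # evaluate combinationally for one digit.
        for i in reversed(range(d)):
            acc <<= 1                          # multiply accumulator by x
            if (acc >> n) & 1:                 # degree reached n: reduce
                acc ^= poly
            if (digit >> i) & 1:
                acc ^= a                       # conditional add of operand a
    return acc


def reference_mul(a, b, poly, n):
    """Reference: carry-less multiplication followed by reduction."""
    prod = 0
    for i in range(n):
        if (b >> i) & 1:
            prod ^= a << i
    for i in reversed(range(n, 2 * n - 1)):
        if (prod >> i) & 1:
            prod ^= poly << (i - n)
    return prod


N = 7
POLY = 0b10000011                              # x^7 + x + 1 (toy modulus)
print(msb_first_digit_serial_mul(0b1010101, 0b0110011, POLY, N, 4))
```

In this model a larger d means fewer outer iterations, ⌈n/d⌉, at the price of more logic per iteration, which mirrors the area/performance trade-off explored through the digit size. The exact cycle count of the hardware is not modeled here.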
VI. RESULTS
Now we give the results for area complexity, power consumption and latency in the case of ECC/HECC scalar multiplication. The designs were synthesized by Synopsys Design Vision using a 0.13 µm CMOS library. We rely on arithmetic in binary fields $\mathbb{F}_{2^p}$, where p varies from 67 to 83 bits.
The results of the area complexity for various architectures with respect to the choice of fields and the digit size d of the MALU are given in Table II. The results for the complete architecture in gates are given in Table III.

TABLE II
THE AREA COMPLEXITY OF MALU IN GATES.

Field size   d=1    d=2    d=4    d=6    d=8
67           2295   2551   3041   3574   4056
71           2426   2694   3201   3709   4217
79           2705   2953   3558   4157   4676
83           2844   3154   3739   4328   4956

The graphical representations of our results for area in µm² and for the total power consumed are shown in Fig. 2, Fig. 3, Fig. 4 and Fig. 5 for ECC and HECC respectively. The power estimates were made assuming an operating frequency of 200 kHz. At this frequency the power stays between 8 and 16 µW, which is assumed to be acceptable for these applications.

Next we give the numbers for the performance. For the point multiplication we used the Montgomery ladder and for the divisor scalar multiplication the binary method. We calculate the total number of cycles for each field operation by use of the following formulae for field operations. The total number of cycles for one field multiplication is [...]

The results for the total number of cycles of one point multiplication for ECC are given in Table IV. To calculate the time for one point multiplication we need an operating frequency. However, the frequency that can be used is strictly limited by the total power. We assumed an operating frequency of 200 kHz in order to estimate the actual timing, as our power results at this frequency proved reasonable for RFID applications. We get [...]

We underline again that our results for the area complexity do not include RAM. Our implementation requires 13n and 28n bits of storage for ECC and HECC respectively, where n is the number of bits of an element of the subfield. We can conclude that our architecture presents the smallest known curve-based crypto processor for low-cost applications.
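To give a feeling for the storage figures quoted above, the following sketch evaluates 13n and 28n for the four subfield sizes of Table II. The function and constant names are hypothetical; only the multipliers 13 and 28 and the field sizes come from the text.

```python
FIELD_SIZES = (67, 71, 79, 83)     # subfield bit-lengths from Table II
ECC_WORDS = 13                     # ECC needs 13n bits of storage
HECC_WORDS = 28                    # HECC needs 28n bits of storage


def storage_bits(n: int, words: int) -> int:
    """Storage in bits for `words` field elements of n bits each."""
    return words * n


for n in FIELD_SIZES:
    print(f"n={n}: ECC {storage_bits(n, ECC_WORDS)} bits, "
          f"HECC {storage_bits(n, HECC_WORDS)} bits")
```

For the largest subfield, n = 83, this gives 1079 bits for ECC and 2324 bits for HECC, which is why the choice between RAM and register files matters for the total gate count.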
VII. CONCLUSIONS
This work presents architectures for low-cost applications of curve-based cryptography for RFID tags. Several solutions for various field sizes are given. Our results show that PKC for RFIDs is an option but further investigations are necessary with respect to the memory requirements and more precise power estimates.
