Abstract - We present an area-efficient, high-speed FPGA implementation of scalar multiplication using a Vedic multiplier. Scalar multiplication is the most important operation in Elliptic Curve Cryptography (ECC). It is used for public key generation, and the performance of ECC depends heavily on it. The scalar multiplier is designed over the binary Galois field GF(2^233), a 233-bit field size that corresponds to a secure curve recommended by NIST. The performance of the proposed design is evaluated in terms of area and delay by comparing it with a Karatsuba-based scalar multiplier. The results show that the proposed Vedic-multiplier-based scalar multiplier consumes 22% less FPGA area and has 12% less delay than the Karatsuba-based scalar multiplier. The scalar multipliers are coded in Verilog HDL, synthesized, and simulated in Xilinx ISE 13.2 for a Virtex6 FPGA.
I. INTRODUCTION
Elliptic Curve Cryptography is a public key cryptosystem proposed independently by Miller and Koblitz in 1985. ECC is gaining acceptance in security standards in place of well-known algorithms such as RSA. ECC achieves the same security with a smaller key: a 160-bit ECC key provides roughly the same security level as a 1024-bit RSA key. This makes the cryptosystem suitable for devices with limited computation power, limited storage, and limited battery capacity. Elliptic curve cryptosystems provide protocols for key generation, key exchange, digital signatures, and data encryption.
In these ECC protocols, scalar multiplication is used to generate the public key at both the sender and the receiver. The performance of an ECC protocol therefore depends heavily on an efficient implementation of scalar multiplication. Many techniques for optimizing scalar multiplication have been proposed in the literature, and the optimization can be applied at different levels of the computation. The first approach works at the upper level: the scalar k is represented in a form that reduces its Hamming weight, which reduces the number of point addition and doubling operations executed. The methods in [1], [2] follow this approach, which is discussed in Section 2. The second approach works at the bottom level, through fast and efficient implementation of the underlying finite field operations such as addition, multiplication, squaring, and inversion. Of these, multiplication and inversion are the most time-consuming operations and occupy the most device area.
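As an illustration of the upper-level approach, the sketch below computes the non-adjacent form (NAF) of k, one well-known recoding that lowers the Hamming weight of the scalar. It is not necessarily the method used in [1], [2]; it only shows how signed digits reduce the number of nonzero digits, and hence point additions. A digit of -1 means adding -P instead of P, and point negation is cheap on binary curves, where -(x, y) = (x, x + y). The function names are hypothetical.

```python
def naf(k):
    """Non-adjacent form of k >= 0, least-significant digit first.

    Every digit is -1, 0 or 1, and no two adjacent digits are nonzero.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)  # +1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d           # k becomes divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        k //= 2
    return digits


def hamming_weight_binary(k):
    """Number of nonzero digits in the ordinary binary form of k."""
    return bin(k).count("1")


def hamming_weight_naf(k):
    """Number of nonzero digits in the NAF of k."""
    return sum(1 for d in naf(k) if d != 0)


if __name__ == "__main__":
    k = 0b1110111  # 119
    print(naf(k), hamming_weight_binary(k), hamming_weight_naf(k))
```

For example, 119 = 2^7 - 2^3 - 1 has six nonzero digits in binary but only three in NAF, so a double-and-add loop driven by the NAF digits performs three point additions (or subtractions) instead of six.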
In [3], the author proposes and implements finite field multipliers based on the Binary, Simple, General, and Hybrid Karatsuba multipliers in the projective coordinate system; the results show that the Hybrid Karatsuba multiplier is more area efficient than the other designs. In [4], an elliptic curve scalar multiplier architecture (ECSMA) for a 163-bit field is presented, in which the delay is reduced by pipelining the point addition, point doubling, and Karatsuba multiplier; the architecture uses three- and four-stage pipelining. Scalar multiplication has also been optimized in [5]-[12] using different approaches and methods. In this paper, we propose performing the finite field multiplication required by scalar multiplication with a Vedic multiplier. To evaluate the proposed scheme, we also implement scalar multiplication using a Hybrid Karatsuba multiplier and present a comparative analysis of the two multipliers in terms of area and delay. The scalar multipliers are coded in Verilog HDL and implemented on a Virtex6 FPGA in Xilinx ISE 13.2.
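The rest of the paper is organized as follows. Section 2 explains how scalar multiplication works. Section 3 presents the mathematical background of the Karatsuba and Vedic multipliers. Section 4 presents the FPGA implementation of scalar multiplication for the binary field GF(2^233), Section 5 reports the implementation results, and Section 6 concludes the paper.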
II. SCALAR MULTIPLICATION
Scalar multiplication is the most important operation in elliptic curve cryptography [13]. It computes Q = [k]P, where k is a scalar and P(x1, y1) and Q(x2, y2) are points on an elliptic curve E.
The scalar multiplication has the form

$$Q = [k]P \qquad (1)$$

It can be calculated by adding the point P to itself k-1 times, as shown in equation (2):

$$Q = [k]P = \underbrace{P + P + \cdots + P}_{k\ \text{terms}} \qquad (2)$$
The security of ECC depends on the difficulty of the Discrete Logarithm Problem (DLP): finding k given P and Q ∈ E. In practice, it is very difficult to find k even when P and Q are known. Figure 1 shows the layer model of scalar multiplication for computing Q.
Given two points P and Q on E, point addition (ECS_ADD) is performed if P ≠ Q, and point doubling (ECS_DBL) is performed if P = Q. The result of ECS_ADD or ECS_DBL is a new point R that always lies on the same elliptic curve E. Figure 3 shows point addition and Figure 4 shows point doubling on E, each producing a third point R(x3, y3) on E. ECS_ADD and ECS_DBL use finite field operations such as addition, subtraction, multiplication, squaring, and inversion to compute R(x3, y3). Among these operations, multiplication dominates the speed of elliptic curve scalar multiplication (ECSM), so we propose computing the finite field multiplication with a Vedic multiplier.
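To make the ADD/DBL layer concrete, the following minimal sketch implements affine point addition and doubling on a binary curve y^2 + xy = x^3 + ax^2 + b and drives them with left-to-right double-and-add, checking the result against the repeated addition of equation (2). Several choices here are assumptions made for illustration: the toy field GF(2^4) with reduction polynomial x^4 + x + 1, the curve coefficients a = b = 1, and all function names. None of these are the B-233 parameters or the authors' Verilog design. Affine formulas (one field inversion per operation) are used only for brevity.

```python
M = 4
POLY = 0b10011          # x^4 + x + 1, irreducible over GF(2) (toy choice)
A, B = 1, 1             # hypothetical curve y^2 + xy = x^3 + A*x^2 + B


def gf_mul(a, b):
    """Carry-less multiply in GF(2^M), reducing modulo POLY at each shift."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= POLY
    return r


def gf_inv(a):
    """Inverse of a nonzero element as a^(2^M - 2)."""
    r, e = 1, (1 << M) - 2
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def on_curve(P):
    if P is None:                       # None stands for the point at infinity O
        return True
    x, y = P
    lhs = gf_mul(y, y) ^ gf_mul(x, y)
    rhs = gf_mul(gf_mul(x, x), x) ^ gf_mul(A, gf_mul(x, x)) ^ B
    return lhs == rhs


def ec_double(P):
    """ECS_DBL: lambda = x1 + y1/x1, x3 = lambda^2 + lambda + a, y3 = x1^2 + (lambda + 1)x3."""
    if P is None or P[0] == 0:          # P = -P, so 2P = O
        return None
    x1, y1 = P
    lam = x1 ^ gf_mul(y1, gf_inv(x1))
    x3 = gf_mul(lam, lam) ^ lam ^ A
    y3 = gf_mul(x1, x1) ^ gf_mul(lam ^ 1, x3)
    return (x3, y3)


def ec_add(P, Q):
    """ECS_ADD for P != +-Q; falls back to doubling or O in the special cases."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2:                        # Q = P, or Q = -P = (x1, x1 + y1)
        return ec_double(P) if y1 == y2 else None
    lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))
    x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
    y3 = gf_mul(lam, x1 ^ x3) ^ x3 ^ y1
    return (x3, y3)


def scalar_mult(k, P):
    """Left-to-right double-and-add computing [k]P for k >= 0."""
    Q = None
    for bit in bin(k)[2:]:
        Q = ec_double(Q)
        if bit == "1":
            Q = ec_add(Q, P)
    return Q


def scalar_mult_naive(k, P):
    """Equation (2): sum of k copies of P."""
    Q = None
    for _ in range(k):
        Q = ec_add(Q, P)
    return Q


# Use the first affine point with x != 0 as a toy base point.
P = next((x, y) for x in range(1, 1 << M) for y in range(1 << M) if on_curve((x, y)))
```

Double-and-add needs about log2(k) doublings and at most as many additions, whereas equation (2) needs k-1 additions. Both call the same field multiplier, which is why the speed of that multiplier dominates.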
III. KARATSUBA AND VEDIC MULTIPLIER FOR FINITE FIELD MULTIPLICATION
Multipliers play a vital role in digital circuit design, and among all arithmetic operations, multiplication is the most expensive. The computation time of a multiplication depends on the sizes of the multiplier and the multiplicand, and the naive multiplier is not suitable for large operands. Digital designs use different multipliers, such as the Array [14], Booth [15], Wallace-Tree [16], Dadda, and Karatsuba [7] multipliers. The performance of different multipliers and their variants is analyzed in [7], [17]. In this section, we present the working of the Karatsuba multiplier and the Vedic multiplier.
A. Karatsuba Multiplier
The Karatsuba multiplier uses a divide-and-conquer method to multiply two numbers: it splits each large operand into smaller parts and calls the algorithm recursively on the parts. It applies to polynomial multiplication as well as to integers. In [7], the author evaluates the Padded, Binary, Simple, and Generalized Karatsuba multipliers and proposes a new Hybrid Karatsuba multiplier that combines the Simple and Generalized Karatsuba multipliers. The Generalized Karatsuba multiplier is more area efficient than the other designs. In this section, we discuss the Simple, Generalized, and Hybrid Karatsuba multipliers.
The product of two n-digit numbers is computed using three multiplications of half the size plus some additions. Let x and y be two n-digit numbers in any base B (for example base 2 or base 10). Each number is divided into its higher half, denoted H, and its lower half, denoted L:

$$x = x_H B^{n/2} + x_L, \qquad y = y_H B^{n/2} + y_L$$

The Karatsuba multiplier then computes the product as

$$x y = x_H y_H\, B^{n} + \big[(x_H + x_L)(y_H + y_L) - x_H y_H - x_L y_L\big]\, B^{n/2} + x_L y_L$$
This requires only three multiplications instead of four, and the multiplier is called recursively until the numbers being multiplied are single-digit numbers.
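The recursion above can be sketched in Python for non-negative integers in an arbitrary base. This models the arithmetic only, not a hardware multiplier, and the function names are our own. Splitting at half the digit count of the longer operand reproduces the subproblems of the 1234 * 4321 example worked later in this section.

```python
def num_digits(v, base):
    """Number of base-`base` digits of v > 0."""
    n = 0
    while v:
        v //= base
        n += 1
    return n


def karatsuba(x, y, base=10):
    """Karatsuba product of non-negative integers x and y."""
    if x < base or y < base:            # a single-digit operand: multiply directly
        return x * y
    half = max(num_digits(x, base), num_digits(y, base)) // 2
    p = base ** half                    # B^(n/2)
    x_h, x_l = divmod(x, p)             # higher (H) and lower (L) halves
    y_h, y_l = divmod(y, p)
    a = karatsuba(x_h, y_h, base)                       # x_H * y_H
    d = karatsuba(x_l, y_l, base)                       # x_L * y_L
    e = karatsuba(x_h + x_l, y_h + y_l, base) - a - d   # cross terms from one product
    return a * p * p + e * p + d
```

`karatsuba(1234, 4321)` returns 5332114. The three recursive calls per level give roughly n^1.585 digit multiplications instead of the n^2 of the schoolbook method.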
A1. Method for Polynomial Multiplication
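The Karatsuba multiplier can also be used to multiply polynomials. The finite field product of two polynomials A(x), B(x) ∈ GF(2^n), each of degree at most n-1, is defined as

$$C(x) = A(x)\,B(x) \bmod F(x)$$

where F(x) is the irreducible polynomial of degree n that defines the field.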
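The n-bit multiplicand is divided into two-term polynomials, and the product is computed with three n/2-bit multiplications, as shown below [7]:

$$A(x) = A_H(x)\,x^{n/2} + A_L(x), \qquad B(x) = B_H(x)\,x^{n/2} + B_L(x)$$

$$A(x)\,B(x) = A_H B_H\, x^{n} + \big[(A_H + A_L)(B_H + B_L) + A_H B_H + A_L B_L\big]\, x^{n/2} + A_L B_L$$

Since the coefficients lie in GF(2), subtraction is the same as addition (XOR).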
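A minimal sketch of this polynomial Karatsuba over GF(2), followed by reduction into GF(2^233), is shown below. Polynomials are stored as Python integers whose bit i is the coefficient of x^i. The reduction polynomial f(x) = x^233 + x^74 + 1 is the one FIPS 186 specifies for B-233. Below a threshold of 16 bits, which is our assumption, the sketch falls back to schoolbook multiplication. This only loosely mirrors the hybrid idea of switching algorithms for small operands: it is a stand-in for, not a model of, the General Karatsuba stage in [7]. All names are hypothetical.

```python
import random

M = 233
# NIST B-233 reduction polynomial f(x) = x^233 + x^74 + 1
F = (1 << 233) | (1 << 74) | 1


def clmul_schoolbook(a, b):
    """Carry-less (GF(2)[x]) product by shift-and-XOR."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def clmul_karatsuba(a, b, n, threshold=16):
    """Karatsuba product of polynomials with at most n coefficients each."""
    if n <= threshold:
        return clmul_schoolbook(a, b)
    h = n // 2
    mask = (1 << h) - 1
    a_l, a_h = a & mask, a >> h
    b_l, b_h = b & mask, b >> h
    low = clmul_karatsuba(a_l, b_l, h, threshold)          # A_L * B_L
    high = clmul_karatsuba(a_h, b_h, n - h, threshold)     # A_H * B_H
    # (A_H + A_L)(B_H + B_L) + A_H B_H + A_L B_L, where "+" is XOR in GF(2)
    mid = clmul_karatsuba(a_l ^ a_h, b_l ^ b_h, n - h, threshold) ^ low ^ high
    return (high << (2 * h)) ^ (mid << h) ^ low


def gf_reduce(c):
    """Reduce modulo F by clearing every bit of degree >= M from the top down."""
    for i in range(c.bit_length() - 1, M - 1, -1):
        if (c >> i) & 1:
            c ^= F << (i - M)
    return c


def gf_mul(a, b):
    """Multiply two elements of GF(2^233)."""
    return gf_reduce(clmul_karatsuba(a, b, M))


if __name__ == "__main__":
    rng = random.Random(0)
    a, b = rng.getrandbits(M), rng.getrandbits(M)
    assert clmul_karatsuba(a, b, M) == clmul_schoolbook(a, b)
    print(hex(gf_mul(a, b)))
```

The bit-serial reduction is written for clarity; because f(x) is a trinomial, hardware usually performs the same reduction with a few fixed XOR networks.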
A2. Hybrid Karatsuba Multiplier
The Hybrid Karatsuba multiplier [7] combines the Simple and General Karatsuba multipliers, as shown in Figure 5. In the Hybrid multiplier, the large multiplications at the upper levels are performed with the Simple Karatsuba multiplier, and the final small multiplications are performed with the General Karatsuba multiplier. The author of [7] implemented a 233-bit Hybrid multiplier on an FPGA, and the results show that it is more area efficient, but relatively slower, than the other Karatsuba designs. Consider, as an example, a 4-digit Karatsuba multiplication:
Compute 1234 * 4321. The subproblems are
a1 = 12 * 43
d1 = 34 * 21
e1 = (12 + 34) * (43 + 21) - a1 - d1 = 46 * 64 - a1 - d1

The first subproblem, a1 = 12 * 43, has the following subproblems:
a2 = 1 * 4 = 4, d2 = 2 * 3 = 6, e2 = (1 + 2)(4 + 3) - a2 - d2 = 11
Answer: 4 * 10^2 + 11 * 10 + 6 = 516

The second subproblem, d1 = 34 * 21, has the following subproblems:
a2 = 3 * 2 = 6, d2 = 4 * 1 = 4, e2 = (3 + 4)(2 + 1) - a2 - d2 = 11
Answer: 6 * 10^2 + 11 * 10 + 4 = 714

The third subproblem, e1 = 46 * 64 - a1 - d1, has the following subproblems:
a2 = 4 * 6 = 24, d2 = 6 * 4 = 24, e2 = (4 + 6)(6 + 4) - a2 - d2 = 52
Answer: 24 * 10^2 + 52 * 10 + 24 - 714 - 516 = 2944 - 1230 = 1714

The final answer is 1234 * 4321 = 516 * 10^4 + 1714 * 10^2 + 714 = 5,332,114. This is how the Karatsuba multiplier works for large numbers.
B. Vedic Multiplier
Jagadguru Shankaracharya Bharti Krishna Teerthaji Maharaj proposed simple methods for a wide range of mathematical calculations. Calculations performed using Vedic mathematics are simple to implement and fast, and the Vedic multiplier is more area- and delay-efficient than other multipliers [16]. Jagadguru Shankaracharya derived 16 sutras and 13 sub-sutras of Vedic mathematics from the Atharva Veda. Of these 16 sutras, the following two are used for multiplying two numbers:
i. Nikhilam Navatascaramam sutra
ii. Urdhva-Tiryagbhyam sutra

Of these, the Urdhva-Tiryagbhyam sutra is the more efficient, and our scalar multiplier performs the finite field multiplication using it. The Urdhva-Tiryagbhyam technique can be applied directly to both decimal and binary numbers.
B1. Urdhva Tiryagbhyam
The Urdhva-Tiryagbhyam sutra is one of the 16 Vedic sutras and performs the multiplication of two numbers [18]. It is a general technique that applies directly to small and large numbers, and the same method works for both decimal and binary numbers. "Urdhva" means vertically and "Tiryagbhyam" means crosswise, so it is also called the Vertically and Crosswise algorithm [18]. Figure 6 shows the steps for multiplying two 3-digit decimal numbers with the vertically and crosswise method, and Figure 2 shows an alternative method for multiplying two 4-digit numbers using the Urdhva-Tiryagbhyam sutra [4]. To demonstrate a typical Vedic multiplication, consider multiplying m = 42 and n = 21 to obtain o = m * n [12]. The steps are as follows:
Step1. Multiply the two most significant digits, MSB(m) = 4 and MSB(n) = 2, giving 8.
Step2. For the next digit, cross-multiply MSB(m) with LSB(n), 4*1 = 4, and LSB(m) with MSB(n), 2*2 = 4, and add the two products, giving the middle digit 8.
Step3. For the lowest digit, multiply the two least significant digits, LSB(m) and LSB(n), 2*1 = 2.
Step4. Put the digits together to obtain the product, 882. In general, when a column sum exceeds 9, its tens part is carried into the next higher digit; no carries occur in this example.
The order in which the Vedic steps are carried out does not matter, so one can equally start with the lowest digit and work up to the highest digit.
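The steps above can be generalized in a short sketch: column k of the vertically and crosswise pattern collects every digit product a_i * b_j with i + j = k, and carries are then propagated from the lowest column upward, which is the natural order once carries appear. The function names are hypothetical, and the same code works in base 10 or base 2.

```python
def to_digits(v, base):
    """Digits of v >= 0 in the given base, least significant first."""
    digits = []
    while True:
        digits.append(v % base)
        v //= base
        if v == 0:
            return digits


def urdhva_columns(m, n, base=10):
    """Column sums of the vertically-and-crosswise pattern, before carries.

    Column 0 is the vertical LSB product, the middle columns are the
    crosswise sums, and the last column is the vertical MSB product.
    """
    a, b = to_digits(m, base), to_digits(n, base)
    cols = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            cols[i + j] += ai * bj
    return cols


def urdhva_multiply(m, n, base=10):
    """Propagate carries from the lowest column upward to form the product."""
    result, carry, place = 0, 0, 1
    for c in urdhva_columns(m, n, base):
        total = c + carry
        result += (total % base) * place
        carry = total // base
        place *= base
    return result + carry * place


if __name__ == "__main__":
    print(urdhva_columns(42, 21))                   # [2, 8, 8], lowest column first
    print(urdhva_multiply(42, 21))                  # 882
    print(urdhva_multiply(0b1011, 0b1101, base=2))  # 143
```

For 42 * 21 the columns are [2, 8, 8], matching Steps 3, 2, and 1 above. Because every column product is independent, hardware can form them all in parallel and leave only the carry propagation as a serial step.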
B2. Algorithm for 4X4 Vedic Multiplier
The multiplication steps for a 4x4 multiplier using the vertically and crosswise technique are given below. Once the 4x4 multiplier is designed, it is used recursively to build 8x8, 16x16, 32x32, and larger multipliers.
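The recursive construction can be modelled behaviourally in Python. This is a sketch, not the authors' Verilog. `vedic_2x2` mirrors a common gate-level 2x2 Urdhva-Tiryagbhyam block (AND gates plus half adders). `vedic_mul` composes four half-size multipliers: the two vertical products of the halves and the two crosswise products. The `carry_less` flag is our assumption: it replaces the adders with XOR, so the same structure yields the GF(2)[x] products needed for GF(2^233) multiplication. The sketch accepts only power-of-two widths, so a 233-bit operand would have to be zero-padded to 256 bits.

```python
def vedic_2x2(a, b, carry_less=False):
    """2x2 Urdhva-Tiryagbhyam multiplier at bit level."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    p0 = a0 & b0                           # vertical: LSB x LSB
    x1, x2 = a1 & b0, a0 & b1              # crosswise products
    p1 = x1 ^ x2
    c1 = 0 if carry_less else (x1 & x2)    # half-adder carry, dropped in GF(2)
    hh = a1 & b1                           # vertical: MSB x MSB
    p2 = hh ^ c1
    p3 = hh & c1
    return p0 | (p1 << 1) | (p2 << 2) | (p3 << 3)


def vedic_mul(x, y, n, carry_less=False):
    """n x n multiplier built from four (n/2) x (n/2) multipliers; n is a power of two >= 2."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    if n == 2:
        return vedic_2x2(x, y, carry_less)
    h = n // 2
    mask = (1 << h) - 1
    xl, xh = x & mask, x >> h
    yl, yh = y & mask, y >> h
    ll = vedic_mul(xl, yl, h, carry_less)  # vertical: low halves
    lh = vedic_mul(xl, yh, h, carry_less)  # crosswise
    hl = vedic_mul(xh, yl, h, carry_less)  # crosswise
    hh = vedic_mul(xh, yh, h, carry_less)  # vertical: high halves
    if carry_less:                         # GF(2)[x]: partial products combine with XOR
        return (hh << (2 * h)) ^ ((lh ^ hl) << h) ^ ll
    return (hh << (2 * h)) + ((lh + hl) << h) + ll
```

Unlike Karatsuba, this construction uses four sub-multipliers per level rather than three, but the four products are independent of one another. In hardware they are evaluated in parallel rather than one after another as in the Python recursion.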
IV. FPGA IMPLEMENTATION OF SCALAR MULTIPLICATION
Scalar multiplication involves multiplying a vector quantity by a scalar quantity, which produces a vector output. This type of multiplication is the most basic operation in vector computation and is used in point-multiplication-based applications such as ECC encryption. Scalar multiplication usually involves several ordinary multiplications to produce the vector result. Figure 8 shows the operation of scalar multiplication: an N-dimensional vector requires N simple multiplier instances (Mui1, Mui2, ..., MuiN), so the number of multipliers grows linearly with the dimension of the vector. If the complexity of a simple multiplier is O(n), the complexity of the scalar multiplier for an N-dimensional vector is N·O(n), and its area and power follow the same pattern. It is therefore essential to optimize the simple multiplier unit in order to optimize the performance of the scalar multiplier.
The Karatsuba multiplier is generally used as the basic building block of the scalar multiplier. Its advantages include, but are not limited to:
i. higher speed of operation than the shift-and-add method;
ii. fewer computations, and thus less area, than the shift-and-add method;
iii. low power consumption.

However, the performance of the Karatsuba-based scalar multiplier can be further improved by using a Vedic multiplier in place of the Karatsuba multiplier. The Vedic-multiplier-based scalar multiplication is shown in Figure 9.

Figure 9. Scalar multiplication using Vedic Multiplier.

As Figure 9 shows, we replace the existing multiplier with the Vedic multiplier, using the Urdhva-Tiryagbhyam sutra described in the previous section. Using the Vedic multiplier in the scalar multiplication design gives the following advantages:
i. the delay of the Vedic multiplier is one clock cycle, so the scalar multiplication completes quickly;
ii. the power consumption of the circuit is reduced, because the number of clock cycles for which the multiplier is active is reduced to one, which lowers the overall energy requirement of the system;
iii. the Vedic multiplier uses fewer operations than the Karatsuba multiplier, which reduces the overall area of the scalar multiplier.
The block diagram of a 2x2 Vedic multiplier is shown below. In Figure 11, the 2x2 multipliers are the same Vedic multiplier described previously. The 4x4 multiplier uses the same Urdhva-Tiryagbhyam sutra: it first multiplies the lower halves of X and Y, then the upper half of X with the lower half of Y and the lower half of X with the upper half of Y, and finally the upper halves of X and Y. The result appears on the P vector in the figure. The complete operation requires no recursion (unlike the Karatsuba multiplier), so all 4 bits are multiplied in a single clock cycle, which reduces the delay of the system to one clock cycle. The same process applies to 8x8, 16x16, and NxN Vedic multipliers to perform parallel multiplication. Because of its simple construction, the design also has lower power and area requirements. Based on these advantages, we evaluated the performance of the Vedic-multiplier-based scalar multiplier; the results are described in the next section.
V. IMPLEMENTATION RESULTS
This section presents the implementation results of elliptic curve scalar multiplication (ECSM) using the Karatsuba multiplier (ECS_KM) and the Vedic multiplier (ECS_VM). The scalar multiplier is designed over the 233-bit binary field GF(2^233), using the secure curve recommended by the National Institute of Standards and Technology (NIST) in Federal Information Processing Standard (FIPS) 186-3 [19]. The curve constant b and the base point are taken from this standard and are given below [19].
Curve: B-233
Curve Constant:
Base Point P(x, y):
Gx = 0fa c9dfcbac 8313bb21 39f1bb75 5fef65bc 391f8b36 f8f8eb73 71fd558b
Gy = 100 6a08a419 03350678 e58528be bf8a0bef f867a7ca 36716f7e 01f81052
The 32-bit key k used in the scalar multiplication is the ECC private key. The scalar multipliers using the Karatsuba multiplier (ECS_KM) and the Vedic multiplier (ECS_VM) are coded in Verilog HDL and implemented on a Virtex6 FPGA in Xilinx ISE 13.2. The synthesis and place-and-route (PAR) reports are used to obtain the device utilization and delay of each design. A test bench was created to test the designs, which were simulated with the ISim simulator. ECS_KM and ECS_VM are tested with the same data set on the Virtex6-xc6vlx760-ff1760 FPGA device. Figure 13 and Figure 14 show the simulation outputs, where Q(sx, sy) is the point obtained by scalar multiplication of the base point BP(x, y) by the key value key(31:0). The point obtained from the scalar multiplication is the public key Q used to encrypt data in elliptic curve cryptography.
VI. CONCLUSION
The proposed work indicates that the Vedic multiplier has definite advantages over the Karatsuba multiplier. We exploit these advantages to propose a scalar multiplier based on the Vedic multiplication technique, which outperforms the Karatsuba-based multiplier in delay, power consumption, and area. The Vedic-multiplier-based implementation has about 12% less delay than the Karatsuba-based implementation and 22% lower device utilization, which also reduces the overall power consumption. These advantages make the Vedic-based scalar multiplication circuit well suited to low-power, high-speed embedded systems and allow it to perform better in high-complexity applications such as encryption and communication. In future work, we plan to integrate the optimized scalar multiplier into a highly complex elliptic curve cryptosystem and analyze its performance.
