The paper presents the concepts behind the "Urdhva Tiryagbhyam Sutra" and "Nikhilam Sutra" multiplication techniques. It then shows the architecture of a 16×16 Vedic multiplier module using the Urdhva Tiryagbhyam Sutra, and extends the design to a 16×16 Vedic multiplier using the Nikhilam Sutra. The 16×16 Vedic multiplier module using the Urdhva Tiryagbhyam Sutra uses four 8×8 Vedic multiplier modules, one 16-bit carry save adder, and two 17-bit full adder stages. The carry save adder in the multiplier architecture speeds up the addition of partial products. The 16×16 Vedic multiplier is coded in VHDL, and synthesized and simulated using Xilinx ISE 10.1 software. The multiplier is implemented on the Spartan 2 FPGA device XC2S30-5pq208. The performance evaluation results in terms of speed and device utilization are compared with an earlier multiplier architecture. The proposed design shows speed improvements over the multiplier architecture presented in [5].
INTRODUCTION
Vedic mathematics was rediscovered in the early twentieth century from ancient Indian scriptures (Vedas). The ancient Indian system of mathematics was derived from Vedic Sutras. Conventional mathematical algorithms can be simplified and even optimized by the use of Vedic mathematics. The Vedic algorithms can be applied to arithmetic, trigonometry, plane and spherical geometry, and calculus.
In [1], the authors proposed a new multiplier based on a Vedic algorithm for low power and high speed applications. Their multiplier architecture is based on generating all partial products and their sums in one step. They claim that their proposed Vedic multiplier is faster than the corresponding array multiplier and Booth multiplier. The authors in [2] tested and compared various multiplier implementations for optimum speed, namely an array multiplier, a multiplier macro, a Vedic multiplier with full partitioning, a Vedic multiplier using a 4-bit macro, a fully recursive Vedic multiplier, and a Vedic multiplier using an 8-bit macro. They claim that the Vedic method is not fundamentally different from the conventional method of multiplication. An implementation of the Rivest, Shamir & Adleman (RSA) encryption/decryption algorithm using Vedic mathematics is proposed in [3] to improve performance. The authors used Vedic multiplier and division architectures in the RSA circuitry for improved efficiency. Their results show that RSA circuitry implemented using Vedic division and multiplication is efficient in terms of area/speed compared to its implementation using conventional multiplication and division architectures. Dhillon and Mitra [4] proposed a multiplier using the "Urdhva Tiryagbhyam" algorithm, which is optimized by the "Nikhilam" algorithm. They suggested a reduced-bit multiplication algorithm using the "Urdhva Tiryagbhyam" and "Nikhilam" Sutras.
We have developed a new Vedic multiplier structure using the "Nikhilam" Sutra. The carry save adder implemented in the proposed architecture reduces propagation delay significantly. We believe that our architecture may open a new path for future research.
Vedic Multiplier using 'Urdhva Tiryagbhyam' Sutra
The "Urdhva Tiryagbhyam" Sutra [5] [6] [7] [8] [9] [10] is a general multiplication formula applicable to all cases of multiplication. The words "Urdhva" and "Tiryagbhyam" are derived from Sanskrit literature. "Urdhva" means "vertically" and "Tiryagbhyam" means "crosswise".
The multiplication of two 2-digit decimal numbers, 21 and 32, is shown in Figure 1. The least significant digit 1 of the multiplicand is multiplied vertically by the least significant digit 2 of the multiplier; their product 2 is set down as the least significant part of the answer. Then 2 and 2, and 1 and 3, are multiplied crosswise and the two products are added; the sum 7 is set down as the middle part of the answer. Finally, 2 and 3 are multiplied vertically, and their product 6 is set down as the leftmost part of the answer.
So 21 × 32 = 672. The "Urdhva Tiryagbhyam" algorithm can be implemented for the binary number system in the same way as for the decimal number system. Let us consider the multiplication of two 2-bit binary numbers a1a0 and b1b0. Assuming that the result of this multiplication is 4 bits wide, we express it as p2 p1 p0, where p2 is 2 bits wide. The least significant bit a0 of the multiplicand is multiplied vertically by the least significant bit b0 of the multiplier, and their product is set down as the least significant part of the answer (p0). Then a1 and b0, and a0 and b1, are multiplied crosswise and the two products are added to give sum1 and carry1; the sum bit is the middle part of the answer (p1). Then a1 and b1 are multiplied vertically and the product is added to the previous carry (carry1) to give the 2-bit result p2, which is set down as the leftmost part of the answer. So a1a0 × b1b0 = p2 p1 p0. The 2×2 Vedic multiplier module is then used to implement higher-level multipliers (4×4 multiplier, 8×8 multiplier, 16×16 multiplier).
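The 2-bit procedure described above can be sketched at the bit level as follows. This is a minimal illustrative model in Python, not the VHDL used for the implementation; the function name vedic_2x2 is hypothetical, and the 2-bit leftmost part p2 is split here into two separate bits (p2 and p3) for clarity.

```python
def vedic_2x2(a, b):
    """Multiply two 2-bit integers using the vertical-and-crosswise steps.

    Illustrative sketch only: the name and the p2/p3 split are assumptions,
    not taken from the original design.
    """
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1

    p0 = a0 & b0                 # vertical product of the least significant bits
    x0, x1 = a1 & b0, a0 & b1    # the two crosswise products
    p1 = x0 ^ x1                 # sum1: sum bit of the crosswise products
    carry1 = x0 & x1             # carry1: carry out of the crosswise addition
    v = a1 & b1                  # vertical product of the most significant bits
    p2 = v ^ carry1              # low bit of the leftmost part
    p3 = v & carry1              # high bit of the leftmost part

    return (p3 << 3) | (p2 << 2) | (p1 << 1) | p0
```

For example, vedic_2x2(3, 3) gives p0 = 1, p1 = 0, carry1 = 1, p2 = 0 and p3 = 1, that is, binary 1001.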
16x16 Vedic Multiplier Module
The architecture of the 16x16 Vedic multiplier using the "Urdhva Tiryagbhyam" Sutra is shown in Fig. 2. The 16x16 Vedic multiplier architecture is implemented using four 8x8 Vedic multiplier modules, one 16-bit carry save adder, and two 17-bit binary adder stages.
The proposed architecture uses a 16-bit carry save adder and 17-bit adder modules to generate the final 32-bit product (p31-p16) & (p15-p8) & (p7-p0). The p7-p0 part (8 bits) of the product is the least significant 8 bits of the 16-bit output of the rightmost 8x8 multiplier module. The 16-bit carry save adder adds three 16-bit input operands: the concatenated 16-bit value ("00000000" & the most significant eight bits of the output of the rightmost 8x8 multiplier module) and the 16-bit outputs of the second and third 8x8 multiplier modules. The carry save adder produces two 16-bit output operands, a sum vector and a carry vector. These outputs are fed into the first 17-bit adder to generate a 17-bit sum. The middle part (p15-p8) is the least significant eight bits of the 17-bit sum. The 16-bit output of the leftmost 8x8 multiplier module and the concatenated 16-bit value ("0000000" & the most significant nine bits of the 17-bit sum) are fed into the second 17-bit adder. The p31-p16 part is the resulting sixteen-bit sum. The 33rd (carry) bit is omitted when taking the final product.
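The datapath described above can be modelled behaviourally as in the following sketch. It is written in Python for illustration rather than in VHDL, and it rests on some assumptions that the text does not state explicitly: the rightmost module is taken to multiply the two low bytes, the leftmost module the two high bytes, and the second and third modules the two cross terms. All function names are hypothetical, and mul8 is a plain stand-in for an 8x8 Vedic multiplier module.

```python
def mul8(a, b):
    """Stand-in for one 8x8 Vedic multiplier module (16-bit result)."""
    return (a & 0xFF) * (b & 0xFF)

def carry_save_add(x, y, z):
    """3:2 carry save adder: returns the sum vector and the carry vector.

    The carry vector has twice the weight of the sum vector, so the
    following adder must take it shifted left by one bit.
    """
    s = x ^ y ^ z
    c = (x & y) | (y & z) | (x & z)
    return s, c

def vedic_16x16(a, b):
    al, ah = a & 0xFF, (a >> 8) & 0xFF
    bl, bh = b & 0xFF, (b >> 8) & 0xFF

    q0 = mul8(al, bl)   # rightmost module (assumed: low x low)
    q1 = mul8(ah, bl)   # second module (assumed: high x low)
    q2 = mul8(al, bh)   # third module (assumed: low x high)
    q3 = mul8(ah, bh)   # leftmost module (assumed: high x high)

    # "00000000" & upper byte of q0, added to q1 and q2 by the carry save adder
    s, c = carry_save_add(q0 >> 8, q1, q2)
    mid = (s + (c << 1)) & 0x1FFFF          # first 17-bit adder

    # "0000000" & upper nine bits of mid, added to q3 by the second adder;
    # the result is masked to 16 bits, dropping any further carry
    hi = (q3 + (mid >> 8)) & 0xFFFF

    # (p31-p16) & (p15-p8) & (p7-p0)
    return (hi << 16) | ((mid & 0xFF) << 8) | (q0 & 0xFF)
```

Under these assumptions the three-operand sum is at most 255 + 2 × 65025 = 130305, which fits in 17 bits, and the second addition is at most 65025 + 509 = 65534, which fits in 16 bits, so no information is lost when the final carry is dropped.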
PROPOSED 16X16 VEDIC MULTIPLIER USING NIKHILAM SUTRA
The Nikhilam Sutra literally means "all from 9 and last from 10". It is more efficient when the numbers involved are large. The Nikhilam Sutra algorithm is efficient for multiplication only when the magnitudes of both operands are more than half their maximum values. For n-bit numbers, therefore, both operands must be larger than 2^(n-1). The Nikhilam Sutra is explained by considering the multiplication of two single-digit decimal numbers, 8 and 7, where the chosen base is 10, which is the nearest base greater than both numbers [6]. As shown in Fig. 3, the multiplier and the multiplicand are written in two rows, followed by the differences of each of them from the chosen base, i.e., their complements. There are two columns of numbers, one consisting of the numbers to be multiplied (Column 1) and the other consisting of their complements (Column 2). The product also consists of two parts, which are demarcated by a vertical line for the purpose of illustration. The right hand side (RHS) of the product is obtained by simply multiplying the numbers of Column 2, i.e., 2 × 3 = 6. Any surplus portion on the RHS is carried over to the left. The left hand side (LHS) of the product is found by cross-subtracting the second number of Column 2 from the first number of Column 1 or vice versa, i.e., 8 - 3 = 5 or 7 - 2 = 5. The final result is obtained by concatenating the LHS and the RHS (Answer = 56).
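The decimal procedure above can be sketched in a few lines of Python. The function name nikhilam_decimal and the base parameter are hypothetical and introduced only for illustration; the sketch assumes both operands are positive and no larger than the chosen base.

```python
def nikhilam_decimal(a, b, base=10):
    """Multiply a and b using complements from a chosen base.

    Illustrative sketch: assumes 0 < a <= base and 0 < b <= base.
    """
    ca, cb = base - a, base - b        # Column 2: complements from the base
    rhs = ca * cb                      # RHS: product of the complements
    lhs = a - cb                       # LHS: cross subtraction (equals b - ca)
    carry, rhs = divmod(rhs, base)     # surplus on the RHS is carried to the left
    lhs += carry
    return lhs * base + rhs            # concatenate LHS and RHS
```

For 8 × 7 the complements are 2 and 3, the RHS is 6, the LHS is 5, and the result is 56. The method is exact because (a + b - base) × base + (base - a)(base - b) = a × b.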
Nikhilam Sutra in Binary Number
The Nikhilam Sutra is also applicable to the binary number system. The complement of the multiplicand or multiplier is obtained by taking the 2's complement of that number. The right hand side part of the product is implemented using 16x16-bit multiplication. The left hand side part is implemented using a 16-bit carry save adder. Hence the multiplication of two 16-bit numbers is reduced to the multiplication of their complements and an addition. It has been observed that for the 16x16 Vedic multiplier module using the Nikhilam Sutra, the gate delay is 33.729 ns with a device utilization (number of slices) of 45%, while it is 41.751 ns with a device utilization (number of slices) of 87% for the 16x16 Vedic multiplier module using the "Urdhva Tiryagbhyam" Sutra. The high propagation delay of the 16x16 Vedic multiplier using the "Urdhva Tiryagbhyam" Sutra has been reduced significantly by using the Nikhilam Sutra, because the Nikhilam Sutra reduces the multiplication of two large numbers to the multiplication of two small numbers and an addition. Hence, there is a significant speed improvement for the 16x16 Vedic multiplier implementation using the Nikhilam Sutra.
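The binary form of the Nikhilam procedure can be sketched as follows, assuming a base of 2^16 and operands larger than 2^15 as required above. This is a behavioural Python model only: it does not reproduce the carry save adder or the timing of the hardware, and the names N, MASK, twos_complement and nikhilam_16x16 are hypothetical.

```python
N = 16
MASK = (1 << N) - 1

def twos_complement(x):
    """N-bit 2's complement of x, i.e. 2**N - x for 0 < x < 2**N."""
    return (-x) & MASK

def nikhilam_16x16(a, b):
    """Multiply two 16-bit operands through their complements from 2**16.

    Illustrative sketch: only accepts operands larger than 2**(N-1),
    the range in which the method is described as efficient.
    """
    half = 1 << (N - 1)
    if not (half < a <= MASK and half < b <= MASK):
        raise ValueError("operands must be larger than 2**(N-1) and fit in N bits")

    ca = twos_complement(a)      # complement of the multiplicand
    cb = twos_complement(b)      # complement of the multiplier
    rhs = ca * cb                # RHS: product of the two (smaller) complements
    lhs = a - cb                 # LHS: cross subtraction (equals b - ca)
    return (lhs << N) + rhs      # any surplus of the RHS carries into the LHS
```

With both operands above 2^15, each complement is below 2^15, so the only multiplication left is between two numbers of at most 15 bits.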
RESULTS AND DISCUSSION

CONCLUSION
The proposed Vedic multiplier architecture shows speed improvements over the multiplier architecture presented in [5]. The 16x16 Vedic multiplier using the "Nikhilam" Sutra is found to be better than the 16x16 Vedic multiplier using the "Urdhva Tiryagbhyam" Sutra in terms of speed when the magnitudes of both operands are more than half of their maximum values. This approach may be well suited for the multiplication of numbers more than 16 bits wide.
