Abstract-This paper proposes a design method for an 8-bit multiplier with reduced delay time. Normally, two numbers can be multiplied by repeated addition. For binary multiplication, a combinational circuit can be designed using the manual multiplication method, which requires binary addition. The carry generated during addition limits the speed of multiplication, since each addition depends on the value of the previous carry. To overcome this problem, addition with the help of a multiplexer is introduced, and the result is an increase in multiplication speed. Even though the proposed design is mainly intended for FPGA implementation, it can also be implemented as an ASIC, since the logic delay is reduced in the comparison carried out on a Xilinx device.
I. INTRODUCTION
Multiplication is used in almost all applications and designs, such as digital signal processing [1], image processing, embedded systems and the design of Arithmetic Logic Units [2]. Researchers are constantly competing with the speed of technology. When a typical algorithm is implemented in an FPGA, it executes faster than the same algorithm implemented in a DSP or any other processor. This is because a processor uses its hardware architecture through sequential instructions, whereas an FPGA activates its hardware in parallel to do a job. Hence, many applications are nowadays being implemented in FPGAs instead. VLSI technology is categorized into ASIC and FPGA implementation. FPGA is widely chosen since it offers the flexibility of implementing different types of applications in a single chip [3]. The application can be changed by loading a different HDL design into the device.
The adder is the basic circuit needed to do multiplication. The efficiency of multiplication automatically improves when the efficiency of the adder is increased. In VLSI design, efficiency is measured in terms of area, speed and power. Since there is a trade-off among these parameters, only one of them is considered for optimization, and the other two are brought up to a satisfactory level [4]. Delay time of the multiplier is reduced in this paper, since many applications require high-speed operation. The delay time of an adder circuit mainly depends on how the carry generated when two bits are added is processed. In a ripple carry adder, the present addition waits for its previous addition to generate the carry. This increases delay time as the size of the addition increases. The carry look-ahead adder is the best adder when speed is given more importance.
S. Tamil Selvi is with National Engineering College, Kovilpatti, India (e-mail: tamilgopal2004@yahoo.co.in).
II. ARRAY MULTIPLIER
Based on performance comparison [5], [6], the array multiplier is selected for the purpose of evaluation. Fig. 1 shows a 5×5 multiplication. In an array multiplier, multiplication is achieved in three steps. In the first step, each multiplicand bit is logically ANDed with each multiplier bit. These logical AND operations can be done simultaneously. In the second step, the partial product in each row is shifted by its row position minus one. In the third step, the partial products are added to get the end result. This addition can be performed in whichever way the designer chooses, and the design of the adder circuit decides the efficiency of the multiplier. A 5×5 multiplier requires 20 AND gates, 15 shifters and 16 full adders. It is also possible to perform multiplication by repeated addition using a single adder circuit. The hardware requirement increases further if the designer chooses a carry look-ahead adder. However, a pipelined method of multiplication reduces delay time. The logical AND used to form the partial products also contributes to the delay time. Delay time varies logarithmically with the bit size of the multiplicand and multiplier.
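The three steps above can be sketched as a behavioural model. This is a minimal Python illustration, not the hardware description used in the paper; the function name `array_multiply` and the bit-width parameter `n` are hypothetical names introduced here.

```python
def array_multiply(a: int, b: int, n: int = 5) -> int:
    """Behavioural sketch of the three array-multiplier steps for n-bit operands."""
    a_bits = [(a >> i) & 1 for i in range(n)]  # multiplicand bits, LSB first
    b_bits = [(b >> j) & 1 for j in range(n)]  # multiplier bits, LSB first

    # Step 1: AND every multiplicand bit with every multiplier bit.
    # In hardware all of these AND gates operate at the same time.
    rows = [[a_bit & b_bit for a_bit in a_bits] for b_bit in b_bits]

    # Step 2: shift row j (0-based) left by j places,
    # i.e. by "row position minus one" when rows are counted from 1.
    shifted = [
        sum(bit << i for i, bit in enumerate(row)) << j
        for j, row in enumerate(rows)
    ]

    # Step 3: add the shifted partial products.
    return sum(shifted)
```

For example, `array_multiply(19, 27)` returns the same value as `19 * 27`. The model says nothing about gate counts or delay; it only mirrors the order of operations.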
III. MULTIPLIER DESIGN
FPGA Implementation of 8-bit Multiplier with Reduced Delay Time
Dhanabalan and Tamil Selvi
Multiplication of two binary numbers involves logical AND and OR operations. As in the array multiplier, partial products are obtained by logically ANDing each bit of the multiplicand with each bit of the multiplier [7]. After the product terms are generated, addition is done using a suitable adder circuit. In this design, the product terms are grouped together so that all the addition operations are done simultaneously [8], as shown in Fig. 2(a). r1, r2, r3 and r4 are the resultant additions. These results are again grouped for addition to get x1 and x2, as shown in Fig. 2(b). The final result is achieved after adding x1 and x2.
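The grouping of additions can be illustrated with a small behavioural sketch in Python. The exact grouping used in Fig. 2 is not reproduced in the text, so pairing adjacent partial products is an assumption made here, and `grouped_multiply` is a hypothetical name.

```python
def grouped_multiply(a: int, b: int) -> int:
    """Sketch of an 8x8 multiplication with partial products added in groups."""
    # Eight shifted partial products: row j is the multiplicand ANDed with
    # multiplier bit j, then shifted left by j places.
    pp = [(a if (b >> j) & 1 else 0) << j for j in range(8)]

    # First level: four independent additions (can run simultaneously).
    # Pairing adjacent rows is an assumption, not taken from Fig. 2.
    r1 = pp[0] + pp[1]
    r2 = pp[2] + pp[3]
    r3 = pp[4] + pp[5]
    r4 = pp[6] + pp[7]

    # Second level: two independent additions.
    x1 = r1 + r2
    x2 = r3 + r4

    # Final addition gives the 16-bit product.
    return x1 + x2
```

With this arrangement the eight partial products pass through three levels of addition instead of a chain of seven sequential additions, which is the motivation for grouping.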
A. Adder Using Multiplexer
Addition is achieved in three stages. In the first stage, the 8-bit data is divided into two 4-bit halves, which are added simultaneously using two 4-bit adders. In the second stage, the sum from the addition of a4-a7 and b4-b7 and the carry from the addition of a0-a3 and b0-b3 are added using a 4-bit incrementer. In the third stage, the carry from the incrementer and the carry from the addition of a4-a7 and b4-b7 are logically ORed to get the final carry.
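The three stages can be checked with a short behavioural model. This Python sketch assumes the incrementer acts on the upper-half sum (a4-a7, b4-b7) with the lower-half carry as its increment, which is what the final OR stage implies; `add4` and `add8_split` are hypothetical names.

```python
def add4(a: int, b: int) -> tuple[int, int]:
    """4-bit adder: returns (4-bit sum, carry out)."""
    total = a + b
    return total & 0xF, total >> 4


def add8_split(a: int, b: int) -> tuple[int, int]:
    """8-bit addition built from two 4-bit adders, an incrementer and an OR."""
    # Stage 1: both halves are added independently (in parallel in hardware).
    lo_sum, lo_carry = add4(a & 0xF, b & 0xF)
    hi_sum, hi_carry = add4(a >> 4, b >> 4)

    # Stage 2: 4-bit incrementer adds the lower-half carry to the upper-half sum.
    inc = hi_sum + lo_carry
    inc_sum, inc_carry = inc & 0xF, inc >> 4

    # Stage 3: the two carries can never both be 1, so OR gives the final carry.
    carry = hi_carry | inc_carry
    return (inc_sum << 4) | lo_sum, carry
```

The OR in stage 3 is sufficient because an upper-half carry of 1 leaves an upper-half sum of at most 14, so incrementing it cannot overflow again.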
The adder circuit using a multiplexer is designed from the adder truth table. For an 8-bit addition, the truth table has 65,536 rows (16 input bits). In this design, the 8-bit addition is split into two 4-bit additions, so each truth table has only 256 rows. Six of the inputs are used as selector lines; hence a 64:1 multiplexer is designed as a 4-bit adder. With two such 4-bit adders implemented in the FPGA, 8-bit addition is done with less delay time, as shown in Fig. 3.
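A truth-table adder of this kind can be modelled as a table lookup. The Python sketch below is only an illustration: the text does not say which six inputs drive the selector lines, so using a3-a1 and b3-b1 as selectors and a0, b0 as the remaining inputs is an assumption, and the names `build_mux_table`, `MUX_TABLE` and `mux_add4` are hypothetical.

```python
def build_mux_table() -> list[tuple[int, ...]]:
    """Build the 64 multiplexer data inputs from the 4-bit adder truth table.

    Each entry holds the 5-bit result (carry and sum) for the four
    combinations of the two inputs that are not used as selector lines.
    """
    table = []
    for sel in range(64):
        a_hi, b_hi = sel >> 3, sel & 0b111  # assumed selectors: a3..a1, b3..b1
        entry = []
        for rest in range(4):
            a0, b0 = rest >> 1, rest & 1    # remaining inputs: a0, b0
            a = (a_hi << 1) | a0
            b = (b_hi << 1) | b0
            entry.append(a + b)             # one truth-table row
        table.append(tuple(entry))
    return table


MUX_TABLE = build_mux_table()  # 64 entries x 4 rows = 256 truth-table rows


def mux_add4(a: int, b: int) -> tuple[int, int]:
    """4-bit addition by selection only: returns (4-bit sum, carry out)."""
    sel = ((a >> 1) << 3) | (b >> 1)   # six selector bits
    rest = ((a & 1) << 1) | (b & 1)    # two remaining input bits
    result = MUX_TABLE[sel][rest]
    return result & 0xF, result >> 4
```

No carry propagates inside `mux_add4`; the result is selected directly from precomputed rows, which is the property the multiplexer-based adder relies on.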
The multiplexer design for 4-bit addition is derived from the truth table shown in Table I. The design is written in Verilog HDL and verified for correct output using the ModelSim-Altera 6.6c simulation software. Fig. 4 shows the simulated result for sample data (0d × 15, 8d × 55 and ff × ff). Using a test bench, the outputs for all possible combinations of inputs are verified. For implementing the multiplier in hardware, we used a Spartan 3E FPGA board [9]. Synthesis and comparison are done using the target device xc5vlx30-3-ff324. The cell usage is shown in Table II.

TABLE II: CELL USAGE
LUTs & IO Buffers    In numbers
LUT2                 4
LUT3                 13
LUT4                 13
LUT5                 41
LUT6                 62
MUXF7                10
IBUF                 16
OBUF                 17

International Journal of Computer and Communication Engineering, Vol. 2, No. 6, November 2013

For comparison, multiplication is also done using the * symbol in the program for the same target device. Cell usage for this design is shown in Table III. The RTL and technology schematics of the proposed multiplier are shown in Fig. 6, 7, 8 and 9. The array multiplier performs better than the other multipliers; its delay time is 14.4 ns. The delay time of the proposed multiplier when synthesized for a Virtex-5 FPGA is 12.294 ns. Hence the proposed design is able to execute multiplication faster than the array multiplier.
The proposed design is also compared with the multiplier that is included as a standard cell by the vendor [10], [11]. The comparison is shown in Table IV. Delay time in the proposed design is higher than that of the standard cell multiplier. Delay time is the sum of logic delay and route delay. The logic delay of the proposed design is 1.525 ns less than that of the standard cell multiplier. If the logic delay of the proposed design were combined with the route delay of the standard cell multiplier, the delay time would be less than 5.622 ns.
VI. CONCLUSION
In this paper, we concentrated on how to increase the speed of multiplication. The logic delay in this design is reduced; hence, the design can also be considered for ASIC implementation. In the FPGA implementation, route delay is higher than in the implementation using DSP48E. Route delay can be reduced with the help of an efficient algorithm. Defining a mechanism for the routing algorithm in the FPGA may pave the way for reduced route delay and hence reduced delay time.
