This paper presents the implementation of a MAC (multiplier-accumulator) unit using a Vedic multiplier. The speed of the MAC depends on the speed of the multiplier. The Vedic multiplier uses the "Urdhva Tiryagbhyam" algorithm. The proposed MAC unit is coded in VHDL, then synthesized and simulated using Xilinx ISE 10.1. It is implemented on an FPGA device, XC2S200-6PQ208. The proposed design shows a speed improvement over the design presented in [1].
INTRODUCTION
A conventional MAC unit consists of a multiplier and an accumulator that holds the sum of the previous consecutive products. The main goal of a DSP processor design is to enhance the speed of the MAC unit.
A high-speed, energy-efficient ALU design using Vedic mathematics is discussed in [1]. The authors implemented the ALU using an adder, a subtractor, a Vedic multiplier, and a MAC unit, with the MAC unit built on the Vedic multiplier. Their Vedic multiplier architecture shows speed improvements over the conventional shift-and-add algorithm.
In [2], the authors compared implementations of normal multiplication and Vedic multiplication. They claim that the same number of multiplication and addition operations is required in both the normal multiplier and the Vedic multiplier. For optimum speed, they tested and compared various multiplier implementations: an array multiplier, a multiplier macro, a Vedic multiplier with full partitioning, a Vedic multiplier using a 4-bit macro, a fully recursive Vedic multiplier, and a Vedic multiplier using an 8-bit macro.
Dhillon and Mitra [3] proposed a multiplier using the "Urdhva Tiryagbhyam" algorithm, optimized by the "Nikhilam" algorithm. They suggested a reduced-bit multiplication algorithm using the "Urdhva Tiryagbhyam" and "Nikhilam" Sutras. Their multiplier architecture is very similar to the array multiplier.
THE PROPOSED MAC UNIT
The multiply-accumulate unit computes the product of two numbers and adds that product to an accumulator. The MAC unit consists of a multiplier followed by an adder and an accumulator register, which stores the result when clocked [4] [5]. The output of the register is fed back to one input of the adder, so that on each clock the output of the multiplier is added to the register. Combinational multipliers require a large amount of logic, but can compute a product much more quickly than the shift-and-add method typical of earlier computers.
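The datapath described above can be sketched as a small behavioural model. This is only an illustration in Python, not the VHDL of the proposed design: the class name `MacUnit`, the method `clock`, and the default register width of 32 bits are assumptions made for the example, and the multiplier is modelled with the built-in `*` operator.

```python
class MacUnit:
    """Behavioural model of a multiply-accumulate unit (hypothetical names)."""

    def __init__(self, acc_bits=32):
        # Register width is an assumption for illustration.
        self.mask = (1 << acc_bits) - 1
        self.acc = 0  # accumulator register, cleared at reset

    def clock(self, a, b):
        """One clock edge: multiply, add the fed-back register value, store."""
        product = a * b  # stands in for the combinational multiplier
        # Adder output is latched into the register and wraps at the register width.
        self.acc = (self.acc + product) & self.mask
        return self.acc


mac = MacUnit()
for a, b in [(0, 0), (3, 4), (5, 6)]:
    mac.clock(a, b)
print(mac.acc)  # 0 + 12 + 30 = 42
```

Each call to `clock` corresponds to one clock cycle in which the register output is fed back to the adder and updated with the new product.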
The MAC circuit must check for overflow, which may occur when the number of MAC operations is large. Overflow in a signed adder occurs when two operands with the same sign produce a result with a different sign.
The proposed design uses one 16x16 Vedic multiplier based on the "Urdhva Tiryagbhyam" algorithm [6] [7] [8] [9] [10], a 32-bit accumulator built from a carry save adder, and one 32-bit register. The Vedic multiplier and the carry save adder together increase the speed of the MAC unit and thereby improve system performance. Each product Ai x Bi is fed into the 32-bit carry save adder and added to the accumulated sum of the previous products, so the MAC unit multiplies and accumulates consecutively (Output = Σ Ai Bi).
When three or more operands are added using two-operand adders, the time-consuming carry propagation is repeated several times: if the number of operands is "n", then carries have to propagate (n-1) times. In carry save addition, the carry propagates only in the last step, while in all the other steps the partial sum and the sequence of carries are generated separately.
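The three ideas in this section (vertical-and-crosswise multiplication, carry save accumulation, and the signed overflow condition) can be illustrated with a minimal Python sketch. It is not the VHDL of the proposed design. The function names are hypothetical, operands are assumed to be unsigned for the multiplier, and keeping the running total as a separate (sum, carry) pair until the end is an assumption about how a carry save adder can serve as the accumulator.

```python
def urdhva_multiply(a, b, n=16):
    """Multiply two n-bit unsigned integers by vertical-and-crosswise column sums."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    result, carry = 0, 0
    for k in range(2 * n - 1):
        # Add all cross products a_i * b_j whose bit weights satisfy i + j == k.
        lo, hi = max(0, k - n + 1), min(k, n - 1)
        column = carry + sum(a_bits[i] * b_bits[k - i] for i in range(lo, hi + 1))
        result |= (column & 1) << k  # low bit of the column is the result bit
        carry = column >> 1          # the rest moves to the next column
    return result | (carry << (2 * n - 1))


def carry_save_add(x, y, z, width=32):
    """Reduce three operands to a partial sum and a carry word, without propagation."""
    mask = (1 << width) - 1
    partial_sum = (x ^ y ^ z) & mask
    carries = (((x & y) | (y & z) | (x & z)) << 1) & mask
    return partial_sum, carries


def signed_overflow(x, y, width=32):
    """True if adding two width-bit two's complement patterns overflows."""
    sign = 1 << (width - 1)
    total = (x + y) & ((1 << width) - 1)
    return (x & sign) == (y & sign) and (total & sign) != (x & sign)


def mac_accumulate(pairs, width=32):
    """Accumulate sum(Ai * Bi) modulo 2**width with one final carry propagation."""
    s, c = 0, 0
    for a, b in pairs:
        s, c = carry_save_add(s, c, urdhva_multiply(a, b), width)
    return (s + c) & ((1 << width) - 1)  # the only carry-propagate step


pairs = [(3, 4), (5, 6), (65535, 65535)]
print(mac_accumulate(pairs))
```

In `mac_accumulate`, carries are propagated once, in the final addition, rather than once per product, which is the property of carry save addition described above.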
The register module of the MAC unit is implemented using a 32-bit register. The 32-bit output of the accumulator is the input to this register, which in turn produces a 32-bit output.
For the proposed 16x16 and 32x32 MAC modules, the gate delays are 6.884 ns and 7.556 ns, whereas they are 22.604 ns and 35.76 ns for the corresponding optimized Vedic multipliers described in [1]. The total number of additions required in MAC units of different bit sizes is lower than in the corresponding optimized Vedic multipliers because of the carry save adder used in the MAC architecture, so the MAC module uses fewer slices than the optimized Vedic multiplier. Hence, the proposed MAC module implementation is found to be more efficient in terms of speed than the scheme presented in [1].
Initially, the 16-bit operand "A" with decimal value "0" is multiplied by the 16-bit operand "B" with decimal value "0" using the 16x16 Vedic multiplier to produce the result "0", which is the 32-bit value of "Q". The result is stored in the 32-bit accumulator register. Then the decimal equivalent of next two
RESULT AND DISCUSSION

