In this paper, we propose a new multiplier-and-accumulator (MAC) 
INTRODUCTION
The important operations in digital signal processing are filtering, convolution, and inner products. The essential elements for such operations are the multiplier and the multiplier-and-accumulator (MAC). Most digital signal processing methods use transforms such as the discrete cosine transform (DCT) [2] or the discrete wavelet transform (DWT) [3]. These operations consist of repeated multiplication and addition, so their speed and performance depend on the speed of multiplication and addition. For high-speed multiplication, the modified radix-4 Booth's algorithm (MBA) [4] is commonly used. However, this cannot completely solve the problem because of the long critical path for multiplication [5], [6].
Multiplication is an important operation in digital signal processing algorithms. It needs a large area and consumes considerable power. Therefore, there is a need to design low-power multipliers for DSP applications. Extensive work has been carried out on low-power multipliers at the technology, physical, circuit and logic levels. These low-level techniques are not unique to multiplier modules and are generally applicable to other types of modules. Moreover, power consumption is directly related to data switching patterns. However, it is difficult to consider application-specific data characteristics in low-level power optimization.
The objective of a good multiplier design is small size, high speed and low power consumption. To significantly reduce the power consumption of a VLSI design, the focus should be on reducing its dynamic power, which makes up the bulk of total power dissipation.
The purpose of this work is to design and implement a low-power MAC unit that uses a block enabling technique to save power. First, a 1-bit MAC unit is designed with appropriate geometries that give optimized power, area and delay. To lower power and delay, the path in the pipeline stages for data flow between the MAC blocks is reduced.
A multiplier design consists of three operational steps. The first is radix-2 Booth encoding, in which a partial product is generated from the multiplicand X and the multiplier Y. The second is the adder array, or partial product compression, which adds all partial products and converts them into sum and carry form. The last is the final addition, in which the final multiplication result is produced by adding the sum and the carry. When the multiplier results are to be accumulated, an additional step is needed, as shown in Figure 1.
Fill the least significant (rightmost) bit with a zero.
2. Determine the two least significant (rightmost) bits of P.
(a) If they are 01, find the value of P + A. Ignore any overflow.
(b) If they are 10, find the value of P + S. Ignore any overflow.
(c) If they are 00, do nothing. Use P directly in the next step.
(d) If they are 11, do nothing. Use P directly in the next step.
3. Arithmetically shift the value obtained in the second step by a single place to the right. Let P now equal this new value.
4. Repeat steps 2 and 3 until they have been performed r times.
5. Drop the least significant (rightmost) bit from P. The result is the product of X and Y.
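The procedure above can be traced in software. The following Python sketch is a minimal behavioral model, not the hardware implementation. It assumes the standard formulation of Booth's algorithm, in which A holds the multiplicand X in its upper bits, S holds -X in its upper bits, and P initially holds the multiplier Y followed by a single zero bit. The function name `booth_multiply`, the parameter `bits`, the use of the multiplier width as the repetition count, and the one guard bit in the multiplicand field (so that -X is representable for the most negative X) are assumptions made for illustration.

```python
def booth_multiply(x: int, y: int, bits: int) -> int:
    """Multiply two signed `bits`-wide integers with Booth's algorithm."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not (lo <= x <= hi and lo <= y <= hi):
        raise ValueError("operands must fit in `bits` two's complement bits")

    field = bits + 1              # multiplicand field with one guard bit (assumption)
    width = field + bits + 1      # common width of A, S and P
    mask = (1 << width) - 1
    field_mask = (1 << field) - 1

    a = (x & field_mask) << (bits + 1)      # A: X in the upper bits, zeros below
    s = ((-x) & field_mask) << (bits + 1)   # S: -X in the upper bits, zeros below
    p = (y & ((1 << bits) - 1)) << 1        # P: Y followed by a zero LSB

    for _ in range(bits):                   # step 4: repeat steps 2 and 3
        pair = p & 0b11                     # step 2: inspect the two LSBs
        if pair == 0b01:
            p = (p + a) & mask              # overflow is ignored
        elif pair == 0b10:
            p = (p + s) & mask              # overflow is ignored
        sign = p >> (width - 1)             # step 3: arithmetic right shift
        p = (p >> 1) | (sign << (width - 1))

    p >>= 1                                 # step 5: drop the LSB
    if p >> (width - 2):                    # reinterpret as a signed value
        p -= 1 << (width - 1)
    return p


print(booth_multiply(3, -4, 4))
```

Python integers are unbounded, so the sketch masks every addition to `width` bits to mimic a fixed-width register.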
IMPLEMENTATION
The low power multiplier with SPST consists of i) a modified Booth encoder, ii) a detection unit, and iii) a register, as shown in Figure 4.
MODIFIED BOOTH ENCODER
The modified Booth algorithm has been proposed for high-speed multiplication. This type of multiplier operates much faster than an array multiplier for longer operands because its computation time is proportional to the logarithm of the word length of the operands. Booth multiplication is a technique that allows faster multiplication by grouping the multiplier bits. The grouping of multiplier bits and radix-2 Booth encoding halve the number of partial products. Instead of shifting and adding for every column of the multiplier term and multiplying by 1 or 0, we take every second column and multiply by ±1, ±2, or 0. The advantage of this method is that the number of partial products is halved. For Booth encoding, the multiplier bits are formed into blocks of three, such that each block overlaps the previous block by one bit. Grouping starts from the LSB, and the first block uses only two bits of the multiplier, with a zero assumed to the right of the LSB. Figure 6 shows the grouping of bits from the multiplier term.
Figure 6: Grouping of bits from the multiplier term
Each block is decoded to obtain the correct partial product. Table 1 shows the encoding of the multiplier Y using the modified Booth algorithm, which generates the five signed digits -2, -1, 0, +1 and +2. Each encoded digit of the multiplier performs a certain operation on the multiplicand X.
SPURIOUS POWER SUPPRESSION TECHNIQUES AND REGISTER
Spurious transitions (also called glitches) in combinational CMOS logic are a well-known source of unnecessary power dissipation. Reducing glitch power is highly desirable because, in the vast majority of digital CMOS circuits, only one signal transition per clock cycle is functionally meaningful. Unfortunately, glitch power depends heavily on low-level implementation details, namely gate propagation delays and the misalignment of input transitions. The procedure for glitch minimization is based on a well-known idea: glitches are eliminated by adding redundant logic that prevents spurious transitions. This can be done by inserting latches in a gate-level netlist. Figure 7 shows a 16-bit adder/subtractor based on the proposed SPST. The 16-bit adder/subtractor is divided into the MSP (Most Significant Part) and the LSP (Least Significant Part) between the 8th and 9th bits. The MSP of the original adder is modified to include detection logic circuits, data controlling circuits, sign extension circuits, and logic for calculating the carry-in and carry-out signals.
Simple logic gates are used to implement the latches and the sign extension circuits in order to reduce the additional overhead as far as possible. The low-power adder/subtractor consists of i) a latch, ii) detection logic, and iii) sign extension logic.
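The detect-and-bypass idea behind the split adder can be illustrated with a small behavioral model. The Python sketch below is a simplification under stated assumptions, not the gate-level circuit of Figure 7. It treats the MSP as unnecessary whenever the upper bytes of both operands are all zeros or all ones, and in that case derives the upper byte of the result from the LSP carry alone. The function name `spst_add16` and the returned `bypassed` flag are hypothetical.

```python
from typing import Tuple


def spst_add16(a: int, b: int) -> Tuple[int, bool]:
    """Add two 16-bit two's complement patterns; report whether the MSP adder was bypassed."""
    a &= 0xFFFF
    b &= 0xFFFF

    lsp_sum = (a & 0xFF) + (b & 0xFF)       # LSP: lower 8 bits, always computed
    carry = lsp_sum >> 8                    # carry from the LSP into the MSP
    a_msp, b_msp = a >> 8, b >> 8

    # Detection logic (assumed form): is each upper byte pure sign extension?
    a_ext = a_msp in (0x00, 0xFF)
    b_ext = b_msp in (0x00, 0xFF)

    if a_ext and b_ext:
        # MSP inputs would be frozen; the upper byte follows from the carry
        # and the number of all-ones (value -1) upper bytes.
        negatives = (a_msp & 1) + (b_msp & 1)
        msp = (carry - negatives) & 0xFF
        bypassed = True
    else:
        msp = (a_msp + b_msp + carry) & 0xFF   # full MSP addition is required
        bypassed = False

    return (msp << 8) | (lsp_sum & 0xFF), bypassed


print(spst_add16(0xFFFF, 0x0001))
```

In hardware the saving comes from the frozen MSP inputs not toggling. The model only reports how often that condition holds for a given data stream.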
(a) Detection Logic
The most important part of the detection logic is the design of the control signal asserting circuit, shown in Figure 7. Although this asserting circuit brings an evident power reduction, it may introduce additional delay. One approach to implementing the control signal assertion circuit uses registers and is illustrated by the shaded area in Figures 9 and 10.
(b) Applying SPST to the Modified Booth Encoder
The SPST-equipped modified Booth encoder is controlled by a detection unit. One of the two operands is the input to the detection unit, which decides whether the Booth encoder would perform redundant computations. As shown in Figure 10, the latches can freeze the inputs of MUX-4 to MUX-7, or only those of MUX-6 to MUX-7, when PP4 to PP7 or PP6 to PP7, respectively, are zero, to reduce the transition power dissipation.
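One way to picture the detection unit is as a check on the upper bits of the 16-bit operand that is Booth encoded. Under the standard modified Booth encoding, the digit for block i is zero exactly when its three bits are equal, so PP4 to PP7 are zero when bits 7 to 15 are all equal, and PP6 to PP7 are zero when bits 11 to 15 are all equal. The Python sketch below encodes this reasoning. The function name `frozen_mux_range` and the bit-level conditions are assumptions derived from that encoding, not the paper's circuit.

```python
from typing import Optional, Tuple


def frozen_mux_range(y: int) -> Optional[Tuple[int, int]]:
    """Return the (first, last) MUX indices whose inputs can be frozen, or None."""
    y &= 0xFFFF

    upper9 = (y >> 7) & 0x1FF        # bits 15..7 cover the blocks of PP4..PP7
    if upper9 in (0x000, 0x1FF):     # all equal, so PP4..PP7 are all zero
        return (4, 7)

    upper5 = (y >> 11) & 0x1F        # bits 15..11 cover the blocks of PP6..PP7
    if upper5 in (0x00, 0x1F):       # all equal, so PP6..PP7 are all zero
        return (6, 7)

    return None                      # no partial product is known to be zero


print(frozen_mux_range(0x0005))
print(frozen_mux_range(0x0100))
print(frozen_mux_range(0x0800))
```

Small-magnitude operands, positive or negative, fall into the first case, which is where the approach would be expected to save the most switching activity.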
ARRAY MULTIPLIER AND ACCUMULATOR
A 4x4 array multiplier, shown in Figure 11, consists of 16 AND gates, 4 half adders (HAs) and 8 full adders (FAs), for a total of 12 adders. In general, an m x n array multiplier requires m*n AND gates, n HAs and (m-2)*n FAs, i.e. a total of (m-1)*n adders. Figure 13 also illustrates the generation of partial products in a 4x4 array multiplier. A total of 8 partial products are generated by the 4x4 array multiplier. Similarly, a 16x16 array multiplier takes a 16-bit multiplicand and a 16-bit multiplier and generates 32 partial products.
Table 2 shows the synthesis report for the array MAC and the radix-2 modified Booth algorithm MAC with SPST adder. Table 3 compares the power consumption and delay of the two designs. The code is downloaded to the target device, a Spartan 3E (Xc3s500eft256-4). For inputs and signals, the frequency of asynchronous nets is set to 10 MHz, and for outputs the capacitive load is set to 28000 pF.
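The component counts quoted for the m x n array multiplier can be checked with a few lines of Python. The helper name `array_multiplier_resources` is hypothetical; the formulas are the ones stated in the text.

```python
def array_multiplier_resources(m: int, n: int) -> dict:
    """Component counts for an m x n array multiplier, using the formulas in the text."""
    if m < 2 or n < 1:
        raise ValueError("the formulas assume m >= 2 and n >= 1")
    half_adders = n
    full_adders = (m - 2) * n
    return {
        "and_gates": m * n,
        "half_adders": half_adders,
        "full_adders": full_adders,
        "adders": half_adders + full_adders,   # equals (m - 1) * n
    }


print(array_multiplier_resources(4, 4))
print(array_multiplier_resources(16, 16))
```

For the 16x16 case the same formulas give 256 AND gates and 240 adders, which indicates how quickly the array grows with operand width.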
CONCLUSION
The SPST adder avoids unwanted glitches and thus minimizes the switching power dissipation. The radix-2 modified Booth algorithm halves the number of partial products by grouping bits of the multiplier term, which improves speed. The implemented radix-2 modified Booth algorithm MAC with SPST achieves a delay lower by a factor of 5 and 7% lower power consumption than the array MAC.
