# Review of MAC Unit for Complex Numbers

Priyanka W. Kinge<sup>1</sup>, Prof. Dr. D. R. Dandekar<sup>2</sup>

<sup>1</sup>M.Tech Student, <sup>2</sup>HOD Professor, Department of Electronics Engineering, B.D. College of Engineering, Sewagram, Wardha.

**Abstract:** In digital communication, the digital signal processor (DSP) is an important block that performs signal processing operations such as convolution, the Discrete Cosine Transform (DCT), the Fourier transform, and so on. Every digital signal processor contains a multiply-accumulate (MAC) unit. The MAC unit performs multiplication and accumulation repeatedly in order to carry out the continuous and complex operations of digital signal processing. The MAC unit also has clock and reset inputs to control its operation. Many researchers have focused on the design of advanced MAC unit architectures for complex numbers so as to minimize resource utilization and delay.

\*\*\*\*\*

## I. Introduction

With the increasing popularity of smartphones and tablets, processor speed has become very important. Although the demand for processor speed is driven mostly by gaming and multimedia applications, it also benefits fields such as medicine, for faster diagnosis, and industrial automation, for higher throughput. Since most signal processing operations are carried out by adder and multiplier units, efficient design of these units increases the speed of the processor. The MAC unit is used in many multimedia applications, so careful design of the MAC unit leads to a high-performance processor. The speed of the conventional MAC unit is optimized by using various leading multipliers such as the Wallace tree multiplier, the Booth multiplier and the Baugh-Wooley multiplier.



Fig 1: Block Diagram of MAC

In general, digital signal processors are used to perform digital signal processing operations such as convolution, correlation, transforms and filtering. All of these operations take the form of multiplication followed by repeated addition, so the multiply-accumulate circuit (MAC) is the heart of the digital signal processor. The signal sequences can be represented as fixed-point or floating-point complex numbers. Complex numbers play a vital role in electronics and digital signal processing (DSP) because they are an easy way to represent and manipulate the most useful real-world sinusoidal waveforms. Signal attributes such as amplitude and phase are revealed more easily by complex numbers than by real numbers.

The basic blocks of the MAC are shown in Fig. 1, where the inputs A and B are multiplied and the multiplication result is then added to the previous MAC result. If A and B are $n$ bits wide, the multiplication result is $2n$ bits wide. To avoid overflow during accumulation, the accumulation register therefore has $k$ extra bits in addition to its nominal length of $2n$ bits.

The multiplier is the part of the MAC that can be designed in many ways. The array multiplier and the Wallace tree multiplier are the popular multipliers used in hardware implementations. The Wallace tree multiplier has a time complexity of $O(\log_2 n)$. Array multipliers can be further classified into two categories, namely the ripple carry array multiplier and the carry save array multiplier. The ripple carry array multiplier can be designed with a time complexity of $O(n^2)$. The carry save array multiplier can be designed in several ways, such as the Braun multiplier and the Baugh-Wooley multiplier, with a time complexity of $O(n)$.

The second part of the MAC is the accumulator, which can be designed in several ways, for example as a ripple carry adder or a carry look-ahead adder. The ripple carry adder can be designed with a time complexity of $O(n)$. The recursive-doubling-based carry look-ahead adder can be designed in $\Theta(\log_2 n)$. The conventional fixed-point complex number multiplier-cum-accumulator is shown in Fig. 2, where four fixed-point multipliers and four fixed-point adders are used. In this architecture, two fixed-point multiplier-cum-accumulators, two fixed-point multipliers and two fixed-point adders are involved.
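The accumulator sizing described above (a $2n$-bit product plus $k$ extra bits) can be illustrated with a small behavioural sketch. It is not taken from any of the reviewed designs. It assumes unsigned operands and assumes that $k$ is chosen as $\lceil \log_2 N \rceil$ for $N$ accumulated products, which is one common way to guarantee that no overflow occurs. The names `guard_bits` and `mac` are hypothetical.

```python
def guard_bits(num_terms: int) -> int:
    """Extra accumulator bits k = ceil(log2(num_terms)) (assumed sizing rule)."""
    return (num_terms - 1).bit_length()


def mac(pairs, n: int):
    """Multiply-accumulate unsigned n-bit operand pairs.

    The accumulator is modelled as a (2n + k)-bit register by masking
    after every addition. Returns (accumulated value, register width).
    """
    k = guard_bits(len(pairs))
    width = 2 * n + k
    mask = (1 << width) - 1
    acc = 0
    for a, b in pairs:
        assert 0 <= a < (1 << n) and 0 <= b < (1 << n), "operands must fit in n bits"
        acc = (acc + a * b) & mask  # each product is at most 2n bits wide
    return acc, width


# Worst case for n = 8 and four products: the sum fits in 2n + k = 18 bits.
acc, width = mac([(255, 255)] * 4, n=8)
```

With four worst-case 8-bit products the sum is 260100, which is below $2^{18}$, so the two guard bits are sufficient in this example.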



Fig 2: Complex Multiplier
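As a behavioural illustration of the conventional structure with four multipliers and four adders, the sketch below performs one complex multiply-accumulate step on integer (real, imaginary) pairs, using $(a_r + j a_i)(b_r + j b_i) = (a_r b_r - a_i b_i) + j(a_r b_i + a_i b_r)$. The function name `complex_mac` and the tuple representation are assumptions made for this example, and bit widths and overflow handling are deliberately left out.

```python
def complex_mac(acc, a, b):
    """One step of acc += a * b, with acc, a and b given as (re, im) integer pairs.

    Mirrors the conventional structure: four real multiplications,
    two additions/subtractions to combine the products, and two
    additions to accumulate the real and imaginary parts.
    """
    acc_re, acc_im = acc
    a_re, a_im = a
    b_re, b_im = b

    # Four real multipliers
    p1 = a_re * b_re
    p2 = a_im * b_im
    p3 = a_re * b_im
    p4 = a_im * b_re

    # Two adders combine the products (the first one subtracts)
    prod_re = p1 - p2
    prod_im = p3 + p4

    # Two adders accumulate onto the previous MAC result
    return acc_re + prod_re, acc_im + prod_im


acc = complex_mac((0, 0), (1, 2), (3, 4))   # (1 + 2j) * (3 + 4j) = -5 + 10j
acc = complex_mac(acc, (2, -1), (1, 1))     # adds (2 - 1j) * (1 + 1j) = 3 + 1j
```

Grouping `p1` and `p3` with the two accumulating additions gives the alternative view of the same datapath as two multiplier-cum-accumulators plus two multipliers and two adders.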

## II. Literature Review

In [1], a high-performance 32-bit radix-2 fixed-point complex number MAC is proposed, in which the real and imaginary parts are computed by sending the previous MAC result as one of the partial products to the present multiplication. The depth of the MAC is therefore equal to the depth of the multiplier, and a separate accumulator circuit is avoided. The experimental results show that the proposed fixed-point complex number MAC performs better than the conventional fixed-point complex number MAC. Using a 45 nm technology library, the proposed architecture achieves an improvement of 32.4% with the Wallace tree and 19.1% with the Braun multiplier in the fixed-point complex number MAC without pipelining. With pipelining, the same architecture achieves an improvement of 14.6% with the Wallace tree and 12.2% with the Braun multiplier.

In [2], a fixed-width modified Baugh-Wooley multiplier is realized on Virtex-7, Artix-7 and Zynq-7000 FPGAs. The design is coded in VHDL using a modular approach. The performance of the multiplier is evaluated in terms of area, speed and power under different design optimization goals: balanced, area reduction, timing performance and power optimization. The balanced approach gives the best results compared with the other goals. As future work, the authors plan to realize a generic, reconfigurable, power-efficient and low-error fixed-width Baugh-Wooley multiplier on an FPGA and to compare its performance with other multiplier architectures.

In [3], an optimized co-processor unit designed specifically for executing DSP applications is proposed. It can be used as a co-processor for the ACORN ARM processor. The co-processor comprises one MAC unit, a control unit, 32-bit output registers, and register files for storing the input values and other coefficients. The co-processor is designed to execute an FIR filter. A Vedic multiplier and a Booth multiplier are used in the MAC unit, and the two are compared on the basis of power, speed and area.

The work in [4] proposes a low-power pipelined MAC architecture that incorporates a 16x16 multiplier using the Baugh-Wooley algorithm with a high-performance multiplier tree, together with clock gating of the idle pipeline stages to reduce power consumption. By clock gating independent pipeline stages of the MAC architecture, the authors show that the power dissipation of the proposed architecture is lower than that of existing low-power MAC units with the same performance. Simulations show that the power consumption of the proposed architecture is 30% to 80% lower than that of other contemporary MAC architectures, without compromising computational performance.

In [5], MAC unit models are designed by incorporating various multipliers, namely the array multiplier, the ripple carry array multiplier with row bypassing technique, the Wallace tree multiplier and the Dadda multiplier [6] [7], in the multiplier module, and the performance of the models is analyzed in terms of area, delay and power. The models are written in Verilog HDL and then simulated and synthesized in Xilinx ISE 13.2 for the Virtex-6 family (40 nm technology). This work can be extended by designing MAC units with larger bit widths such as 16, 32 and 64, and by designing applications such as ALUs and filters.

The notable observations from the literature survey are as follows. The Vedic, Wallace and Dadda multipliers are all unsigned multipliers, whereas the Baugh-Wooley multiplier is a signed multiplier. The Baugh-Wooley multiplier is preferred because it offers better sign-bit management, a uniform VLSI structure and no complex encoding circuits, which results in a compact circuit. The biggest advantage of the compact and uniform structure is that pipelining is easy to implement: it divides the partial product generation stages and increases the speed of operation [8] [9].
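To make the sign-bit handling of the Baugh-Wooley scheme concrete, the following is a minimal bit-level sketch of the commonly described modified Baugh-Wooley formulation for $n$-bit two's-complement operands. Partial-product bits that involve exactly one sign bit are inverted, and two correction constants ($2^n$ and $2^{2n-1}$) are added, so every row can be summed as an unsigned quantity. The function name `baugh_wooley` is hypothetical, and the code models only the arithmetic, not the array structure or timing of any design reviewed here.

```python
def baugh_wooley(a: int, b: int, n: int) -> int:
    """Signed n-bit by n-bit product using modified Baugh-Wooley partial products."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1))
    assert lo <= a < hi and lo <= b < hi, "operands must fit in n-bit two's complement"

    # Two's-complement bit patterns, least significant bit first
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]

    total = 0
    for i in range(n):
        for j in range(n):
            bit = a_bits[i] & b_bits[j]
            # Invert the terms that involve exactly one of the two sign bits
            if (i == n - 1) != (j == n - 1):
                bit ^= 1
            total += bit << (i + j)

    # Correction constants at bit positions n and 2n - 1
    total += (1 << n) + (1 << (2 * n - 1))
    total &= (1 << (2 * n)) - 1  # keep the 2n-bit result

    # Reinterpret the 2n-bit pattern as a two's-complement value
    if total >> (2 * n - 1):
        total -= 1 << (2 * n)
    return total


product = baugh_wooley(-128, 127, 8)  # -16256
```

Because all rows are non-negative bit patterns, no sign extension of the partial products is needed, which is consistent with the regular structure noted above.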

| Reference | Multiplier                     | Number of pipeline stages | Speed of Operation |
|-----------|--------------------------------|---------------------------|--------------------|
| [1]       | 32-bit Wallace, pipelined      | 3                         | 8 ns               |
| [2]       | Fixed-width 8-bit Baugh-Wooley | 3                         | 4.8 ns             |
| [3]       | 16-bit Vedic, non-pipelined    | -                         | 1.169 ns           |
| [3]       | 16-bit Booth, non-pipelined    | -                         | 11.489 ns          |
| [4]       | 16-bit Baugh-Wooley, pipelined | Not mentioned             | 1.71 ns            |
| [5]       | 16-bit Dadda, non-pipelined    | -                         | 3.65 ns            |

Table 1 shows the comparative chart for the various MAC units.

## III. Conclusion

A MAC unit capable of multiplying two 16-bit complex numbers can be used in any digital signal processor or digital signal processing system for the implementation of filters and similar functions. In a high-performance 32-bit fixed-point complex number MAC, the real and imaginary parts can be computed by sending the previous MAC result as one of the partial products to the present multiplication, so the depth of the MAC is equal to the depth of the multiplier. Such a fixed-point complex number MAC gives better performance than the conventional fixed-point complex number multiplier.
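The idea of feeding the previous MAC result into the multiplier as one more partial product can be sketched as follows. This is an illustrative model only, assuming unsigned operands, simple shift-and-AND partial products, and bitwise 3:2 carry-save compression. It is not the architecture of [1], and the names `carry_save_add` and `merged_mac` are hypothetical.

```python
def carry_save_add(x: int, y: int, z: int):
    """Bitwise 3:2 compressor: returns (s, c) such that x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def merged_mac(acc: int, a: int, b: int, n: int) -> int:
    """Compute acc + a * b for unsigned n-bit a and b.

    The accumulator is treated as an additional partial-product row, so
    the only carry-propagate addition is the final one.
    """
    # One shifted copy of a for every set bit of b
    rows = [a << i for i in range(n) if (b >> i) & 1]
    rows.append(acc)  # the previous MAC result joins the partial products

    # Compress three rows into two until at most two rows remain
    while len(rows) > 2:
        s, c = carry_save_add(rows[0], rows[1], rows[2])
        rows = rows[3:] + [s, c]

    return sum(rows)  # final carry-propagate addition


result = merged_mac(100, 13, 11, 4)  # 100 + 13 * 11 = 243
```

The extra row adds at most a small amount of work to the reduction stage instead of a separate full-width accumulation step, which is the intuition behind the statement that the depth of the MAC matches the depth of the multiplier. The sketch shows the arithmetic only and does not model the delay of a real reduction tree.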

## References

- [1] Mohamed Asan Basiri M, Noor Mahammad Sk, "An Efficient Hardware Based MAC Design in Digital Filters with Complex Numbers", in IEEE International Conference on Signal Processing and Integrated Networks, 2014.
- [2] Aiman Badawi, et al, "FPGA Realization and Performance Evaluation of Fixed-Width Modified Baugh-Wooley Multiplier", in IEEE International Conference on Technological Advances in Electrical, Electronics & Computer Engineering, 2015.
- [3] Rahul Narasimhan. A, R. Siva Subramanian, "High Speed Multiply-Accumulator Coprocessor Realized for Digital Filters", in IEEE International Conference on Electrical, Computer & Communication Technologies, 2015.
- [4] Rakesh Warrier, C.H. Vun, Wei Zhang, "A Low-Power Pipelined MAC Architecture using Baugh-Wooley based Multiplier", in IEEE Global Conference on Consumer Electronics, 2014.
- [5] Maroju Sai Kumar, D. Ashok Kumar, Dr. P. Samundiswary, "Design and Performance Analysis of Multiply-Accumulate (MAC) Unit", in IEEE International Conference on Circuit, Power and Computing Technologies, 2014.
- [6] Charles Roth, "Digital System Design using VHDL", Cengage Learning, 2010.
- [7] Douglas Perry, "VHDL Programming by Examples", Tata McGraw-Hill, 2002.
- [8] K. Paldurai, K. Hariharan, G.C. Karthikeyan, K. Lakshmanan, "Implementation of MAC using Area Efficient and Reduced Delay Vedic Multiplier Targeted at FPGA Architectures", in IEEE International Conference on Communication and Network Technologies, 2014.
- [9] Ramandeep Kaur, Rahul Malhotra, Sujay Deb, "MAC based FIR Filter: A novel approach for Low-Power Real-Time De-noising of ECG signals", in IEEE 19th International Symposium on VLSI Design & Test, 2015.