© Universiti Tun Hussein Onn Malaysia Publisher's Office





The International Journal of Integrated Engineering

Journal homepage: http://penerbit.uthm.edu.my/ojs/index.php/ijie

# **Power-Effective Vedic-Hierarchical Multiplier**

Wong Moon Cheng<sup>1</sup>, Mohd Rizal Arshad<sup>1</sup>\*

<sup>1</sup>Universiti Sains Malaysia Engineering Campus, 14300 Nibong Tebal, Pulau Pinang, MALAYSIA

\*Corresponding author

DOI: https://doi.org/10.30880/ijie.2020.12.02.009

Received 1 March 2019; Accepted 2 January 2020; Available online 28 February 2020

**Abstract:** The motivation of this project is to design a power-effective multiplier without significant drawbacks in timing and area utilization. Overall power dissipation increases in direct proportion to power density, and area and timing constraints have become pertinent design parameters as market demand grows for high-performance, complex portable systems. Current hierarchical multiplier designs suffer from a long critical path and large area utilization, which make them less effective at saving power. The objective of this research is to reduce overall power dissipation while ensuring that the multiplier meets its timing requirement. Design 1 uses two 4-bit Binary-to-Excess-One Converters (BECs) instead of one 8-bit BEC, which shortens the critical path and lowers power dissipation. Design 2 implements the Carry Select Adder (CSIA) with tri-state buffers to reduce the number of logic elements; power dissipation falls because fewer logic elements means fewer logic gates. Design 3 combines the advantages of Design 1 and Design 2. It has the lowest power dissipation because it has the lowest number of logic elements and the shortest critical path. The timing requirement is met for all designs, as shown by positive slack times of 9.034 ns, 8.686 ns and 9.404 ns respectively. The power dissipation of the combinational multiplier designs is reduced by 60.96%, 60.89% and 61.02% respectively.

Keywords: Low power dissipation, Hierarchical multiplier, Vedic mathematics, Carry Select Adder, Binary-to-Excess-one Converter, FPGA

# 1. Introduction

Multiplication involves partial product generation, partial product reduction and partial product addition. Area, speed and power dissipation are three essential design parameters in VLSI design. High speed, small area and low power consumption are the ideal case, and designers typically focus on any two of these parameters [1, 2]. The multiplier has a significant impact on computational performance: multiplication is an arithmetic operation used in many applications, and it dominates the execution time of most DSP algorithms.

Power dissipation includes leakage power, which depends on the on-chip temperature profile: a higher temperature results in higher power dissipation. One way to reduce power is to reduce the size (number of transistors) of the gates, which also affects circuit delay. Power dissipation applies to all hardware components, including FPGAs, where the multiplier is one of the blocks implemented in the programmable logic array [3].

The appeal of the hierarchical multiplier is that it completes a multiplication in one clock cycle. A hierarchical multiplier based on the Vedic algorithm can additionally reduce power dissipation through the way it operates [4, 5]. The Vedic multiplier consumes less power than other multipliers and has a regular structure. Its power consumption is reduced by lowering the number of operations and the dynamic component of total power dissipation [5]. Conventionally, delay and area are also reduced to minimize power dissipation [6].

### **1.1 Vedic Multiplication**

The Vedic multiplier is based on the Urdhva Tiryakbhyam Sutra, which describes "Vertical and Crosswise" multiplication. Figure 1 shows the "Vertical and Crosswise" concept of Vedic multiplication. The advantage of the Vedic multiplier is that its delay and area increase very slowly with the number of bits compared to other multipliers. Employing Vedic multiplication in the multiplier therefore reduces complexity, execution time and power.

To implement the Vedic algorithm, each input is separated into two equal halves: A\_L, A\_H, B\_L and B\_H. In a 16-bit Vedic multiplier, A\_L is bit 0 to bit 7 and A\_H is bit 8 to bit 15 of the multiplicand A, and likewise for the multiplier B. Figure 2 shows how the Vedic multiplication concept in Figure 1 maps onto the Vedic multiplication algorithm [7, 8].



Fig. 1 - Vedic multiplication concept



Fig. 2 - Block diagram of 16-bit Vedic algorithm
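The split into low and high halves can be illustrated with a short behavioural sketch. It is not the hardware description used in this work; the function names `split_halves` and `hierarchical_multiply` are hypothetical, and the four half-width products are computed with ordinary integer multiplication in place of the 8x8 base multiplier blocks.

```python
def split_halves(value: int, width: int = 16) -> tuple[int, int]:
    """Return the (low, high) halves of an unsigned `width`-bit value."""
    half = width // 2
    mask = (1 << half) - 1
    return value & mask, (value >> half) & mask


def hierarchical_multiply(a: int, b: int, width: int = 16) -> int:
    """Multiply two unsigned `width`-bit values from four half-width products."""
    half = width // 2
    a_l, a_h = split_halves(a, width)
    b_l, b_h = split_halves(b, width)

    # Four half-width partial products, one per base multiplier block.
    p_ll = a_l * b_l  # vertical: low halves
    p_lh = a_l * b_h  # crosswise
    p_hl = a_h * b_l  # crosswise
    p_hh = a_h * b_h  # vertical: high halves

    # Shift each partial product to its bit weight, then add them.
    return p_ll + ((p_lh + p_hl) << half) + (p_hh << (2 * half))


if __name__ == "__main__":
    print(hierarchical_multiply(350, 500))
```

In hardware the final additions are carried out by the adder stages rather than by a single `+`, but the weighting of the four partial products is the same.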

Consider two 8-bit binary numbers, A0A1A2A3A4A5A6A7 and B0B1B2B3B4B5B6B7. Multiplying them results in the following equations.

$$R_0 = A_0B_0 \tag{1}$$

$$C_1R_1 = A_0B_1 + A_1B_0 \tag{2}$$

$$C_2R_2 = C_1 + A_0B_2 + A_2B_0 + A_1B_1 \tag{3}$$

$$C_3R_3 = C_2 + A_3B_0 + A_0B_3 + A_1B_2 + A_2B_1 \tag{4}$$

$$C_4R_4 = C_3 + A_4B_0 + A_0B_4 + A_3B_1 + A_1B_3 + A_2B_2 \tag{5}$$

$$C_5R_5 = C_4 + A_5B_0 + A_0B_5 + A_4B_1 + A_1B_4 + A_3B_2 + A_2B_3 \tag{6}$$

$$C_6R_6 = C_5 + A_6B_0 + A_0B_6 + A_5B_1 + A_1B_5 + A_4B_2 + A_2B_4 + A_3B_3 \tag{7}$$

$$C_7R_7 = C_6 + A_7B_0 + A_0B_7 + A_6B_1 + A_1B_6 + A_5B_2 + A_2B_5 + A_4B_3 + A_3B_4 \tag{8}$$

$$C_8R_8 = C_7 + A_7B_1 + A_1B_7 + A_6B_2 + A_2B_6 + A_5B_3 + A_3B_5 + A_4B_4 \tag{9}$$

$$C_9R_9 = C_8 + A_7B_2 + A_2B_7 + A_6B_3 + A_3B_6 + A_5B_4 + A_4B_5 \tag{10}$$

$$C_{10}R_{10} = C_9 + A_7B_3 + A_3B_7 + A_6B_4 + A_4B_6 + A_5B_5 \tag{11}$$

$$C_{11}R_{11} = C_{10} + A_7B_4 + A_4B_7 + A_6B_5 + A_5B_6 \tag{12}$$

$$C_{12}R_{12} = C_{11} + A_7B_5 + A_5B_7 + A_6B_6 \tag{13}$$

$$C_{13}R_{13} = C_{12} + A_7B_6 + A_6B_7 \tag{14}$$

$$C_{14}R_{14} = C_{13} + A_7B_7 \tag{15}$$
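Equations 1 to 15 can be exercised with a small bit-level sketch. It assumes that A0 and B0 are the least significant bits, that each product term is a single-bit AND, and that the carry passed between columns may be wider than one bit; the function name `urdhva_multiply` is hypothetical.

```python
def urdhva_multiply(a: int, b: int, n: int = 8) -> int:
    """Column-wise ("vertical and crosswise") product of two n-bit values."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]

    result = 0
    carry = 0
    for k in range(2 * n - 1):  # columns 0 .. 2n-2, one per equation
        # Sum every crosswise term A_i * B_j with i + j == k,
        # plus the carry from the previous column.
        column = carry
        for i in range(n):
            j = k - i
            if 0 <= j < n:
                column += a_bits[i] & b_bits[j]
        result |= (column & 1) << k  # R_k is the low bit of the column sum
        carry = column >> 1          # C_k is everything above that bit
    # The last carry forms the most significant part of the product.
    return result | (carry << (2 * n - 1))


if __name__ == "__main__":
    print(urdhva_multiply(255, 255))
```

Each loop iteration corresponds to one of the equations, so for 8-bit operands the loop runs 15 times and the final carry supplies the sixteenth result bit.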

# 2. Proposed Design

The architecture of the hierarchical multiplier from previous work [4, 9] is illustrated in Figure 3(a), whereas Figure 3(b) shows the proposed architecture of the hierarchical multiplier. Both multipliers consist of four $8 \times 8$ bit base Vedic multiplier blocks, a 16-bit Carry Save Adder (CSA) and a 16-bit Carry Select Adder (CSIA). Each 16-bit input, X and Y, is divided into two equal 8-bit halves, which serve as the multiplicands and multipliers of the $8 \times 8$ bit blocks.

The multiplier architecture from previous work has an 8-bit BEC and a 16:8 MUX, whereas the proposed architecture has two 4-bit BECs and two 8:4 MUXes. The purpose of using two 4-bit BECs and two 8:4 MUXes is to shorten the critical path and minimize total power dissipation. A shorter critical path results in a lower temperature and less power dissipation.

Three designs are implemented. In Design 1, only the 8-bit BEC of the hierarchical multiplier is replaced, by two 4-bit BECs. Design 2 uses the proposed CSIA and leaves the 8-bit BEC unmodified. Design 3 uses both the proposed BEC and the proposed CSIA.



Fig. 3 - (a) Hierarchical multiplier design [4, 9]; (b) Proposed hierarchical multiplier design

### **2.1 Carry Select Adder (CSIA)**

The proposed CSIA in Figure 4 uses tri-state buffers to replace the MUX of the conventional CSIA [4, 9, 10]. The tri-state buffer cuts off the unused input, which saves logic elements and power compared to a conventional MUX [11].



Fig. 4 - Proposed CSIA topology
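The idea of selecting between two precomputed sums with tri-state buffers can be sketched behaviourally. This is a generic carry-select model and not the exact topology of Figure 4: the 4-bit block size, the helper names `ripple_add`, `tristate` and `resolve`, and the use of `None` to represent a high-impedance output are all assumptions made for illustration.

```python
def ripple_add(a: int, b: int, carry_in: int, width: int) -> tuple[int, int]:
    """Add two `width`-bit values; return (sum, carry_out)."""
    total = a + b + carry_in
    return total & ((1 << width) - 1), total >> width


def tristate(value: int, enable: bool):
    """Tri-state buffer model: drive `value` when enabled, else high-Z (None)."""
    return value if enable else None


def resolve(*drivers):
    """Resolve a shared node; exactly one driver may be active."""
    active = [d for d in drivers if d is not None]
    assert len(active) == 1, "bus contention or floating node"
    return active[0]


def carry_select_add(a: int, b: int, width: int = 16, block: int = 4) -> int:
    """Behavioural carry-select adder; returns a (width + 1)-bit result."""
    mask = (1 << block) - 1
    result, carry = 0, 0
    for shift in range(0, width, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        # Both candidate sums exist in parallel in hardware.
        sum0, cout0 = ripple_add(a_blk, b_blk, 0, block)
        sum1, cout1 = ripple_add(a_blk, b_blk, 1, block)
        # The incoming carry enables one buffer and cuts off the other.
        sum_sel = resolve(tristate(sum0, carry == 0), tristate(sum1, carry == 1))
        carry = resolve(tristate(cout0, carry == 0), tristate(cout1, carry == 1))
        result |= sum_sel << shift
    return result | (carry << width)


if __name__ == "__main__":
    print(carry_select_add(65535, 1))
```

The `resolve` helper makes the one-driver-at-a-time property explicit: for each block only the buffer enabled by the incoming carry drives the output, which is the behaviour a MUX would otherwise provide.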

### **2.2 Proposed BEC and MUX Topology**

Two 4-bit BECs with two 8:4 MUXes are used to shorten the critical path compared to the conventional design, which uses an 8-bit BEC and a 16:8 MUX. As shown in Figure 5, the selector of the second BEC comes from the AND gates of the first 4-bit BEC, so the critical path is shortened to save power. The proposed design has fewer XOR gates and AND gates than the conventional design: it has one extra NOT gate but one fewer XOR gate and one fewer AND gate (Table 3.1). The equations of a 4-bit BEC are given in Equations 16 to 19.
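The behaviour of the 4-bit BEC, and of an 8-bit increment built from two of them, can be checked with a short sketch. The expression used for X3 is the usual 4-bit BEC form and is assumed here, as is the way the second BEC is selected (by the AND of the four low input bits); the function names `bec4` and `bec8_from_two_bec4` are hypothetical and the code is not a netlist of Figure 5.

```python
def bec4(b: int) -> int:
    """4-bit Binary-to-Excess-One Converter: returns (b + 1) mod 16."""
    b0, b1, b2, b3 = ((b >> i) & 1 for i in range(4))
    x0 = b0 ^ 1                # X0 = NOT B0
    x1 = b0 ^ b1               # X1 = B0 XOR B1
    x2 = b2 ^ (b0 & b1)        # X2 = B2 XOR (B0 AND B1)
    x3 = b3 ^ (b0 & b1 & b2)   # X3 = B3 XOR (B0 AND B1 AND B2), assumed form
    return x0 | (x1 << 1) | (x2 << 2) | (x3 << 3)


def bec8_from_two_bec4(b: int) -> int:
    """8-bit excess-one conversion from two 4-bit BECs: (b + 1) mod 256."""
    low = b & 0xF
    high = (b >> 4) & 0xF
    # The upper nibble changes only when all four low bits are 1,
    # i.e. when the AND of the low input bits is 1.
    select_high = (low & 1) & ((low >> 1) & 1) & ((low >> 2) & 1) & ((low >> 3) & 1)
    high_out = bec4(high) if select_high else high
    return bec4(low) | (high_out << 4)


if __name__ == "__main__":
    print(bec4(0b0111), bec8_from_two_bec4(0x0F))
```

Splitting the converter this way keeps each XOR/AND chain four bits long, which is the intuition behind the shorter critical path described above.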

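$$X_0 = \overline{B_0} \tag{16}$$

$$X_1 = B_0 \oplus B_1 \tag{17}$$

$$X_2 = B_2 \oplus (B_0 \mathbin{\&} B_1) \tag{18}$$

$$X_3 = B_3 \oplus (B_0 \mathbin{\&} B_1 \mathbin{\&} B_2) \tag{19}$$

# 3. Results and Discussions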

A 16x16 bit hierarchical multiplier is designed. In Figure 6, x and y are the inputs and z is the 32-bit output of the hierarchical multiplier. The waveform in Figure 6 is obtained from simulation. The inputs range from the minimum to the maximum 16-bit value ($2^{0}$ to $2^{16}-1$ in decimal). The outputs are correct, since 1x1=1, 20x40=800, 350x500=175000 and 65535x65535=4294836225.

| Signal | Direction | Vector 1 | Vector 2 | Vector 3 | Vector 4   |
|--------|-----------|----------|----------|----------|------------|
| x      | input     | 1        | 20       | 350      | 65535      |
| y      | input     | 1        | 40       | 500      | 65535      |
| z      | output    | 1        | 800      | 175000   | 4294836225 |

Fig. 6 - Simulation result of Hierarchical multiplier

Design 1 has a shorter critical path than the reference design, as seen by comparing Table 1 and Table 2. Its power dissipation is 60.96% lower than that of the reference design. The percentage improvement is calculated using Equation 20. Design 2 exhibits the highest power dissipation of the three proposed designs, but it has the lowest propagation delay and fewer logic elements than Design 1. Its power dissipation is 60.89% lower than that of the reference design (Table 3). Design 3 achieves the lowest power dissipation among the designs because it has the lowest number of logic elements, 591, and its critical path is almost the same as that of Design 1. Compared to the benchmark design in Table 4, its power dissipation is improved by 61.02%. The drawback of Design 3 is that it has the highest propagation delay of the three designs.

$$\%\ \text{improved} = \frac{\text{Parameter of journal [9]} - \text{Proposed design}}{\text{Parameter of journal [9]}} \times 100\% \tag{20}$$
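As a worked instance of Equation 20, the sketch below recomputes the percentages for Design 1 from the reference and proposed values listed in Table 2. The function name `percent_improved` and the dictionary keys are hypothetical labels chosen for this example.

```python
def percent_improved(reference: float, proposed: float) -> float:
    """Equation 20: relative reduction of a parameter, in percent."""
    return (reference - proposed) / reference * 100.0


# Values taken from Table 2 (reference design [9] and proposed Design 1).
reference = {"power_mw": 200.73, "logic_elements": 613, "delay_ns": 31.173}
design_1 = {"power_mw": 78.36, "logic_elements": 600, "delay_ns": 24.110}

if __name__ == "__main__":
    for name in reference:
        value = percent_improved(reference[name], design_1[name])
        print(f"{name}: {value:.2f}%")
```

Because every parameter here is one where smaller is better, a positive result means the proposed design improves on the reference.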

Table 1 - Comparison of parameters among the proposed designs

| Type of design                 | Total power dissipation (mW) | Number of logic elements | Critical path | Worst-case propagation delay (ns) |
|--------------------------------|------------------------------|--------------------------|---------------|-----------------------------------|
| Proposed BEC                   | 78.36                        | 600                      | x[6] to z[27] | 24.110                            |
| Proposed CSIA                  | 78.50                        | 591                      | x[3] to z[30] | 23.683                            |
| Proposed BEC and proposed CSIA | 77.96                        | 591                      | x[3] to z[25] | 24.921                            |

Table 2 - Comparison of parameters between reference design and proposed design 1

| Parameters                        | Reference design [9] | Proposed design 1 | Percentage improved |
|-----------------------------------|----------------------|-------------------|---------------------|
| Total power dissipation (mW)      | 200.73               | 78.36             | 60.96%              |
| Number of logic elements          | 613                  | 600               | 2.12%               |
| Worst-case propagation delay (ns) | 31.173               | 24.110            | 22.66%              |

Table 3 - Comparison of parameters between reference design and proposed design 2

| Parameters                        | Reference design [9] | Proposed design 2 | Percentage improved |
|-----------------------------------|----------------------|-------------------|---------------------|
| Total power dissipation (mW)      | 200.73               | 78.50             | 60.89%              |
| Number of logic elements          | 613                  | 591               | 3.59%               |
| Worst-case propagation delay (ns) | 31.173               | 23.683            | 24.03%              |

Table 4 - Comparison of parameters between reference design and proposed design 3

| Parameters                        | Reference design [9] | Proposed design 3 | Percentage improved |
|-----------------------------------|----------------------|-------------------|---------------------|
| Total power dissipation (mW)      | 200.73               | 77.96             | 61.02%              |
| Number of logic elements          | 613                  | 591               | 3.59%               |
| Worst-case propagation delay (ns) | 31.173               | 24.921            | 19.60%              |

# Conclusion

The power dissipation of the combinational multiplier designs has been reduced by 60.96%, 60.89% and 61.02% respectively. The multiplier was verified by functional and timing verification and successfully implemented on a DE2-115 board. In conclusion, the proposed hierarchical multiplier reduces overall power dissipation, meets the timing requirement and performs the expected function on hardware.

# Acknowledgement

I would like to express my sincere appreciation to everyone who helped me during the research. I would also like to thank Universiti Sains Malaysia (USM), especially the School of Electrical and Electronic Engineering, for providing easy access to catalogues of publications and materials as well as the infrastructure required for this project.

# References

- [1] P. R. Aparna and N. Thomas, "Design and implementation of a high performance multiplier using HDL," 2012 International Conference on Computing, Communication and Applications, 2012.
- [2] S. R. Vaidya and D. R. Dandekar, "A hierarchical design of high performance 8x8 bit multiplier based on Vedic mathematics," Proceedings of the 2011 International Conference on Communication, Computing & Security -ICCCS 11, 2011.

- [3] L. Deng, K. Sobti, Y. Zhang, and C. Chakrabarti, "Accurate Area, Time and Power Models for FPGA-Based Implementations," Journal of Signal Processing Systems, vol. 63, no. 1, pp. 39-50, 2009.
- [4] M. Shoba and R. Nakkeeran, "Energy and area efficient hierarchy multiplier architecture based on Vedic mathematics and GDI logic," Engineering Science and Technology, an International Journal, vol. 20, no. 1, pp. 321-331, 2017.
- [5] S. Vaidya and D. Dandekar, "Delay-Power Performance Comparison of Multipliers in VLSI Circuit Design," International Journal of Computer Networks & Communications, vol. 2, no. 4, pp. 47-56, July 2010.
- [6] S. R. Vaidya and D. R. Dandekar, "A hierarchical design of high performance 8x8 bit multiplier based on Vedic mathematics," Proceedings of the 2011 International Conference on Communication, Computing & Security, pp. 383-386, 2011.
- [7] A. Thomas, A. Jacob, S. Shibu, S. Sudhakaran, "Comparison of Vedic Multiplier with Conventional Array and Wallace Tree Multiplier," International Journal of VLSI System Design and Communication Systems, vol. 04, no. 04, pp. 244-248, April 2016.
- [8] J. Thomas, R. Pushpangadan, and S. Jinesh, "Comparative Study of Performance Vedic Multiplier on the Basis of Adders used," 2015 IEEE International WIE Conference on Electrical and Computer Engineering (WIECON-ECE), pp. 325-328, December 2015.
- [9] M. C. Wong and R. Hussin, "Speed and area analysis on hierarchy multiplier," EPJ Web of Conferences, vol. 162, pp. 1-5, 2017.
- [10] B. Ramkumar and H. M. Kittur, "Low-Power and Area-Efficient Carry Select Adder," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 2, pp. 371-375, 2012.
- [11] Renuka Jaiswal, Ranbir Paul, Vikas Ranjan Mahto, "Power Reduction in CMOS Technology by using Tri-State Buffer and Clock Gating," International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), vol. 3, no. 5, pp. 1853-1860, May 2014.