Novel design of array multiplier by Razavi, Seyyed Masoud & Talebiyan, Seyyed Reza
 
Received: day/month/year Accepted: day/month/year
Ciência e Natura, v. 37 Part 2  2015, p. 312−319 
 
Print ISSN: 0100-8307  Online ISSN: 2179-460X
 
 
Novel design of array multiplier 
Seyyed Masoud Razavi¹, Seyyed Reza Talebiyan¹
¹Department of Electrical Engineering, Imam Reza International University, Mashhad, Iran
 
Abstract 
In this paper, a new array multiplier is proposed that consumes less power than regular array multipliers. The technique is applied to two structures: the conventional and the leapfrog array multipliers. All designs proposed in this paper were implemented as 8×8 multipliers in HSPICE using 180 nm TSMC technology at a supply voltage of 1 V. To verify the performance of the proposed structures, they were also simulated in 130 nm and 65 nm PTM technologies. The simulation results show that applying the return technique to the array structures reduces power consumption and, consequently, the power-delay product (PDP). For 180 nm technology, the PDP improvement is 13.32 % in the conventional array structure and 23.27 % in the leapfrog array structure. In addition, the technique substantially reduces the number of transistors and, as a result, the area.
Keywords: Array multiplier, return technique, return leapfrog array multiplier, power, delay 
1 Introduction 
Many application systems, such as digital signal processing systems, require the processing of large amounts of digital data (Chong, Bah-Hwee, & Chang, 2004; Mathew, Latha, Ravi, & Logashanmugam, 2013; Ravi, Rao, & Prasad, 2011; Ravi, Subbaiah, Prasad, & Rao, 2011; Srivastava, Vishant, Singh, & Nagaria, 2013). To implement algorithms such as convolution and various filters, a multiplication unit is included in digital signal processor systems. In many algorithms, multiplication lies on the critical path and is consequently the most critical operation (Mathew et al., 2013). In recent years, researchers have focused on three aspects: power, speed, and area (Ravi, Rao, et al., 2011). The need for specialized designs increases both the power consumption and the number of transistors on a chip; therefore, power is the most important of the three. To achieve high operating speed, array multipliers are the most suitable structure, because their regular organization leads to an orderly layout (Mathew et al., 2013).

This paper focuses on power consumption and proposes a new method for array multipliers that reduces both power and area. In Section 2, the mathematical relationships and the multiplication algorithm for two 8-bit numbers are explained. In Section 3, the conventional array multiplier and the leapfrog structure are analyzed, and in Section 4, a new design is obtained by applying the return technique to both structures. Section 5 describes the simulation process and presents the results, and Section 6 draws general conclusions about the work.
2 Parallel Multiplier  
A serial multiplier consumes less power, but its delay is larger because of the ripple. A parallel multiplier has less delay, but because of its more complex circuitry it consumes more power.
Consider the multiplication of two unsigned n-bit numbers, where $X = x_{n-1}x_{n-2}\dots x_0$ is the multiplicand and $Y = y_{n-1}y_{n-2}\dots y_0$ is the multiplier. These numbers and their product can be written as follows (Mathew et al., 2013; Ravi, Rao, et al., 2011; Ravi, Subbaiah, et al., 2011):
$$X = x_{n-1}x_{n-2}\dots x_0 = \sum_{i=0}^{n-1} x_i 2^i \qquad (1)$$

$$Y = y_{n-1}y_{n-2}\dots y_0 = \sum_{j=0}^{n-1} y_j 2^j \qquad (2)$$

$$P = XY = \left(\sum_{i=0}^{n-1} x_i 2^i\right)\left(\sum_{j=0}^{n-1} y_j 2^j\right) = \sum_{i=0}^{n-1}\sum_{j=0}^{n-1} x_i y_j 2^{i+j} \qquad (3)$$
 
In the example discussed in this paper, the multiplicand and the multiplier are both 8-bit numbers. Using Equation (3), eight rows of partial products are produced, as shown in Figure 1.
 
 
Figure 1: 8×8 array multiplication algorithm.  
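To make Equation (3) concrete, the Python sketch below builds the eight partial-product rows of Figure 1 for two 8-bit operands (row j is X AND-ed with bit y_j and shifted left by j positions) and checks that their sum equals X·Y. The function names are illustrative only and are not part of the original design.

```python
def partial_product_rows(x: int, y: int, n: int = 8) -> list[int]:
    """Return the n partial-product rows of Eq. (3) as integers.

    Row j holds the bits x_i * y_j (i = 0..n-1), placed so that
    bit x_i * y_j has weight 2**(i + j).
    """
    rows = []
    for j in range(n):
        y_j = (y >> j) & 1
        row = 0
        for i in range(n):
            x_i = (x >> i) & 1
            row |= (x_i & y_j) << (i + j)
        rows.append(row)
    return rows


def multiply_by_rows(x: int, y: int, n: int = 8) -> int:
    """Sum all partial-product rows: P = sum_i sum_j x_i * y_j * 2**(i + j)."""
    return sum(partial_product_rows(x, y, n))


if __name__ == "__main__":
    x, y = 0b10110101, 0b01101110  # 181 and 110
    for j, row in enumerate(partial_product_rows(x, y)):
        print(f"row {j}: {row:016b}")
    print(multiply_by_rows(x, y), x * y)
```

Adding the rows with ordinary integer addition is exactly what the adder array has to do in hardware; the structures below differ only in how the full adders that perform this sum are wired.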
3 Array Multiplier 
In this section, the conventional and leapfrog array multipliers are briefly reviewed. They serve as the starting point for our design in Section 4.
3.1 Conventional Array Multiplier 
The block diagram of an 8×8-bit conventional array multiplier is shown in Figure 3. In the conventional array multiplier (Chong et al., 2004; Mahant-Shetti, Balsara, & Lemonds, 1999; Ravi, Rao, et al., 2011), the output signals (Sum and Carry) of the carry save adders (CSAs) are directly connected to the next row of CSAs. Finally, an n-bit ripple carry adder (RCA) is used to produce the 8 most significant bits (MSBs) of the product.
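The following Python sketch is a behavioral, bit-level model (not a circuit netlist) of the conventional structure described above: each row of full adders passes its Sum and Carry outputs directly to the next row, the least significant sum bit of each row is a finished product bit, and a final n-bit RCA produces the upper half of the product. It assumes every cell is a full adder (real designs typically use half adders where an input is constant 0) and ignores timing; the function names are hypothetical.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def conventional_array_multiply(x: int, y: int, n: int = 8) -> int:
    """Bit-level model of an n x n carry-save array multiplier.

    Row i adds partial-product row i to the sum and carry outputs of row i-1
    (carry-save: no carry propagates inside a row). A final n-bit ripple
    carry adder (RCA) merges the last sum and carry vectors.
    """
    # pp[i][j] = x_j AND y_i, weight 2**(i + j)
    pp = [[((x >> j) & 1) & ((y >> i) & 1) for j in range(n)] for i in range(n)]

    product_bits = [0] * (2 * n)
    sums = pp[0][:]        # sums[j] has weight i + j for the current row i
    carries = [0] * n      # carries[j] has weight i + j + 1
    product_bits[0] = sums[0]

    for i in range(1, n):
        new_sums, new_carries = [0] * n, [0] * n
        for j in range(n):
            # pp[i][j], sums[j + 1] and carries[j] all have weight i + j.
            shifted_sum = sums[j + 1] if j + 1 < n else 0
            new_sums[j], new_carries[j] = full_adder(pp[i][j], shifted_sum, carries[j])
        sums, carries = new_sums, new_carries
        product_bits[i] = sums[0]  # the LSB of each row is a final product bit

    # Final RCA: sums[k + 1] and carries[k] both have weight n + k.
    # Its last carry out is always 0 because the product fits in 2n bits.
    carry = 0
    for k in range(n):
        a = sums[k + 1] if k + 1 < n else 0
        product_bits[n + k], carry = full_adder(a, carries[k], carry)

    return sum(bit << w for w, bit in enumerate(product_bits))
```

For example, `conventional_array_multiply(255, 255)` returns 65025. Because each full adder preserves a + b + c_in = s + 2·c_out, the weighted sum of all signals stays equal to X·Y from row to row.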
3.2 Leapfrog Array Multiplier 
The block diagram of an 8×8-bit leapfrog array multiplier is shown in Figure 4. In the leapfrog array structure (Chong et al., 2004; Mahant-Shetti et al., 1999), on the other hand, the interconnections of the CSAs are rearranged so that the propagation delays of the CSAs are better synchronized within the intermediate rows. This can result in higher speed and less spurious switching (lower power dissipation), because the carry signal of a full adder is generally generated earlier than the sum signal of the same full adder. To take advantage of this, the sum outputs of the CSAs in row 1 are connected to the CSAs in row 3 instead of row 2 (as in the conventional array structure), while the carry signals of the CSAs in row 1 remain connected to the CSAs in row 2. Put simply, in a leapfrog array structure, the carry signals (from row 2) and the sum signals (from row 1) arrive at the CSAs in row 3 at better-synchronized times. Consequently, the structure achieves higher data-propagation speed and less spurious switching (lower power dissipation) (Chong et al., 2004; Mahant-Shetti et al., 1999).
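The sketch below models only the leapfrog interconnection rule, in Python and at the functional level: each CSA row takes its carry inputs from the row directly above and its sum inputs from two rows above, and a final CSA row plus a carry-propagate addition merge what is left. Signal timing, which is the actual motivation for leapfrogging, is not modeled; the row indexing (with virtual rows -1 and 0), the {weight: bit} representation, and all names are assumptions for this illustration, not the exact cell-level wiring of Figure 4.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    return s, (a & b) | (cin & (a ^ b))


def csa_row(a: dict[int, int], b: dict[int, int], c: dict[int, int]):
    """One carry-save row over three bit vectors stored as {weight: bit}.

    A full adder at weight w emits its sum at weight w and its carry at
    weight w + 1, so the weighted value of the inputs is preserved.
    """
    sums, carries = {}, {}
    for w in set(a) | set(b) | set(c):
        sums[w], carries[w + 1] = full_adder(a.get(w, 0), b.get(w, 0), c.get(w, 0))
    return sums, carries


def leapfrog_array_multiply(x: int, y: int, n: int = 8) -> int:
    """Functional model of leapfrog wiring: carries go to the next row,
    sums skip one row and go to the row after it."""
    # Partial-product row r as {weight: bit}; bit x_j * y_r has weight r + j.
    pp = [{r + j: ((x >> j) & 1) & ((y >> r) & 1) for j in range(n)} for r in range(n)]

    # Virtual rows -1 and 0 seed the array: "row 0" outputs partial-product row 0.
    sums = {-1: {}, 0: pp[0]}
    carries = {0: {}}
    for r in range(1, n):
        # Leapfrog connection: carry vector from row r-1, sum vector from row r-2.
        sums[r], carries[r] = csa_row(pp[r], carries[r - 1], sums[r - 2])

    # Final stage: a CSA row merges the two sum vectors not yet consumed
    # (rows n-2 and n-1) with the last carry vector ...
    s, c = csa_row(sums[n - 2], sums[n - 1], carries[n - 1])
    # ... and a carry-propagate addition (here simply Python's +) gives the product.
    return sum(bit << w for w, bit in s.items()) + sum(bit << w for w, bit in c.items())
```

Since every full adder preserves a + b + c_in = s + 2·c_out, rerouting the sum outputs changes when bits arrive but not the final product; `leapfrog_array_multiply(181, 110)` returns the same value as `181 * 110`.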
4 Return Technique  
When the return technique is used in these structures, the addition is performed in two cycles. In the first cycle, the first four rows of partial products are added; in the second cycle, the second four rows of partial products are added to the final result of the first cycle. Figure 2 shows the multiplication algorithm for two 8-bit numbers with the return technique applied.
 
Figure 2: 8-bit multiplication algorithm with the return technique applied.
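Arithmetically, the two-cycle schedule rests on splitting the multiplier into two 4-bit halves, Y = Y_L + 2^4·Y_H, so that X·Y = X·Y_L + 2^4·X·Y_H. The word-level Python sketch below follows that schedule: cycle 1 adds partial-product rows 0 to 3, its 4 LSBs are already final and are set aside, and the remaining bits are returned and added to rows 4 to 7 in cycle 2. It is a hypothetical model that assumes an even n and ignores the carry-save form in which the hardware actually holds the returned value.

```python
def return_multiply(x: int, y: int, n: int = 8) -> tuple[int, int]:
    """Word-level model of the two-cycle return technique (n even).

    Returns (product, low_bits), where low_bits are the n LSBs of the
    product, collected half per cycle.
    """
    half = n // 2
    mask = (1 << half) - 1

    # Cycle 1: add partial-product rows 0 .. half-1 (the term X * Y_L).
    acc = sum((x * ((y >> j) & 1)) << j for j in range(half))
    low_first = acc & mask        # rows >= half cannot change these bits
    returned = acc >> half        # value fed back for the second cycle

    # Cycle 2: add rows half .. n-1, re-aligned to the returned value.
    acc = returned + sum((x * ((y >> (half + j)) & 1)) << j for j in range(half))
    low_second = acc & mask       # the next `half` LSBs
    high = acc >> half            # resolved into the n MSBs by the final adder

    low_bits = (low_second << half) | low_first
    product = (high << n) | low_bits
    return product, low_bits
```

For X = 181 and Y = 110 this returns the product 19910 together with its 8 LSBs, 19910 & 0xFF. The design choice it illustrates is that only the upper bits ever need to be returned, because the 4 LSBs of each cycle are complete as soon as that cycle ends.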
4.1 Return Conventional Array Multiplier 
The block diagram of the return conventional array multiplier is shown in Figure 5. In this structure, the number of full-adder rows is half that of the conventional array multiplier. In addition, a row of registers is used to save the outputs of the last full-adder row at the end of the first cycle and to return them to the inputs of the first full-adder row for the second cycle. T-1…T-4 are 1-bit registers and T0…T7 are 2-bit registers. Over the two cycles, registers T-1…T-4 deliver the 8 least significant bits (LSBs) of the final product. If these 8 LSBs are considered as two groups of 4 bits, then in the first cycle the first 4 bits of the final product are produced and saved in registers T-1…T-4, while the sum of the first four partial-product rows is saved in registers T0…T7. These values are returned to the input of the first full-adder row for the second cycle, where they are added to the second four rows of partial products. In the second cycle, the next 4 LSBs of the final product are produced and saved in registers T-1…T-4. Finally, the bits saved in registers T0…T7 are applied to the final full-adder stage, a carry ripple adder (CRA), which produces the 8 most significant bits (MSBs) of the final product.
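The following Python sketch is a hypothetical bit-level model of the return conventional array multiplier described above: n/2 rows of full adders are used twice, the (sum, carry) vectors at the end of the first cycle are latched and fed back to the first adder row, one finished product bit leaves each row per cycle, and a final CRA merges the latched vectors. Mapping the Python lists to registers T-1…T-4 and T0…T7 is an assumption about Figure 5; clocking and timing are not modeled.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    return s, (a & b) | (cin & (a ^ b))


def adder_row(pp_row: list[int], sums: list[int], carries: list[int]):
    """One row of n full adders in the carry-save array.

    pp_row[j], sums[j + 1] and carries[j] share the same weight, so the
    row's new sums[0] is a finished product bit.
    """
    n = len(pp_row)
    out_s, out_c = [0] * n, [0] * n
    for j in range(n):
        shifted = sums[j + 1] if j + 1 < n else 0
        out_s[j], out_c[j] = full_adder(pp_row[j], shifted, carries[j])
    return out_s, out_c


def return_conventional_multiply(x: int, y: int, n: int = 8) -> int:
    """Two-cycle model: n/2 adder rows reused, with a returned register row."""
    half = n // 2
    pp = [[((x >> j) & 1) & ((y >> i) & 1) for j in range(n)] for i in range(n)]

    # Assumed role of T0..T(n-1): one (sum, carry) pair per column, reset to 0.
    reg_sums, reg_carries = [0] * n, [0] * n
    low_bits = []  # assumed role of T-1..T-4: `half` product bits per cycle

    for cycle in range(2):
        sums, carries = reg_sums, reg_carries   # returned to the first adder row
        for k in range(half):                    # the same `half` rows each cycle
            sums, carries = adder_row(pp[cycle * half + k], sums, carries)
            low_bits.append(sums[0])
        reg_sums, reg_carries = sums, carries    # latched at the end of the cycle

    # Final carry ripple adder (CRA) on the latched sum and carry vectors.
    high_bits, carry = [], 0
    for k in range(n):
        a = reg_sums[k + 1] if k + 1 < n else 0
        bit, carry = full_adder(a, reg_carries[k], carry)
        high_bits.append(bit)

    return sum(bit << w for w, bit in enumerate(low_bits + high_bits))
```

Only `half` adder rows exist in the model, matching the claim that the return structure halves the number of full-adder rows; the price is a second pass through the same rows.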
 
 
 
 
 
 
 
 
 
 
 
 
Figure 3: Block diagram of conventional array multiplier. 
 
 
Figure 4: Block diagram of leapfrog array multiplier.
4.2 Return Leapfrog Array Multiplier 
The main structure presented in this paper, which has the lowest power consumption, is the return leapfrog array multiplier, whose block diagram is shown in Figure 6. In this structure, the first adder row is n bits long, equal to the length of the multiplicand, and each of the next three rows is n+1 bits long. The extra full adder in each of these three rows accommodates the sum output of the previous row as well as the leapfrog sum from two rows earlier. Because of the leapfrog connections, two register rows are used in this structure. The first register row contains 3n/2 registers: T1-1…T1-4 are single-bit registers, and T10…T17 are two-bit registers that hold the output carry of the last adder row and the sum output of the penultimate row. The second register row is n bits long and consists of registers T20…T27: the first register holds the carry of the single adder in the fifth row, and the remaining registers hold the sum outputs of the last (fourth) adder row.
 
Figure 5: Block diagram of return conventional array multiplier. 
 
 
Figure 6: Block diagram of return leapfrog array multiplier. 
 
Like the previous structure, this structure operates over two cycles. The first four LSBs are produced at the outputs of registers T1-1…T1-4 in the first cycle, and the second four LSBs in the second cycle. In the first cycle, the sum outputs of the third adder row and the carry outputs of the fourth adder row are saved in registers T10…T17 and returned to the first adder row. The sum outputs of the last adder row are saved in registers T21…T27, and the carry of the fifth row is saved in T20; these are returned to the second adder row, where they are added to the second group of partial-product rows in the second cycle. As in the conventional leapfrog array multiplier, the last stage of this structure consists of a row of CSAs and a row of CRAs. In the second cycle, the outputs of registers T10…T17 and T21…T27 are applied to the CSA row to reduce the number of product rows, and finally a CRA row produces the final result. The full adder and register architectures used in this paper are shown in Figures 7(a) and 7(b).

In both structures, the reduced number of full-adder rows decreases the area and, consequently, the power consumption.
 
Figure 7: (a) Full adder architecture and (b) C2CMOS flip-flop architecture (Tambat & Lakhotiya, 2014).
5 Simulation Results 
The simulations in this paper were performed in HSPICE using the 180 nm TSMC library and the 130 nm and 65 nm PTM libraries, for the multiplication of two 8-bit numbers. To demonstrate the performance and very low power consumption of the proposed return leapfrog array multiplier, this structure and the return conventional array multiplier were designed and simulated with the full adder cell and register shown in Figures 7(a) and 7(b).

The simulation results for the different libraries are shown in the following tables. For each technology, the results for each structure are given first as real values and then as normalized values. The normalization was performed separately for each structure: each return structure is normalized to its original array structure (ReturnCAM to CAM and ReturnLAM to LAM). Since this paper focuses on the array structure, the simulations assume that all partial-product bits have already been generated, so the reported delay and power consumption are those of the array structure alone. In all technologies, the return leapfrog array multiplier has the lowest PDP of all the structures.
Table 1: Simulation results of the array and return array multipliers in 180 nm technology

Multiplier (180 nm)   Values       Avg. Power (E-6 W)   Delay (E-9 s)   PDP (E-15 J)   No. of Transistors
CAM                   Real         3.1260               1.4806          4.6283         2736
                      Normalized   1                    1               1              1
ReturnCAM             Real         1.6524               2.4279          4.0119         1760
                      Normalized   0.5285               1.6398          0.8668         0.6432
LAM                   Real         2.3131               1.7486          4.0447         3344
                      Normalized   1                    1               1              1
ReturnLAM             Real         1.6601               1.8696          3.1037         2350
                      Normalized   0.7176               1.0691          0.7673         0.7027

CAM: conventional array multiplier; LAM: leapfrog array multiplier; ReturnCAM and ReturnLAM: the corresponding return structures.
 
 
Table 2: Simulation results of the array and return array multipliers in 130 nm technology

Multiplier (130 nm)   Values       Avg. Power (E-4 W)   Delay (E-10 s)   PDP (E-14 J)   No. of Transistors
CAM                   Real         2.8774               2.6334           7.5775         2736
                      Normalized   1                    1                1              1
ReturnCAM             Real         1.6507               4.4419           7.3324         1760
                      Normalized   0.5736               1.6867           0.9676         0.6432
LAM                   Real         2.1033               4.4923           9.4487         3344
                      Normalized   1                    1                1              1
ReturnLAM             Real         1.5959               3.5190           5.6161         2350
                      Normalized   0.7587               0.7833           0.5943         0.7027
 
Table 3: Simulation results of the array and return array multipliers in 65 nm technology

Multiplier (65 nm)    Values       Avg. Power (E-4 W)   Delay (E-10 s)   PDP (E-14 J)   No. of Transistors
CAM                   Real         1.7218               1.4746           2.5390         2736
                      Normalized   1                    1                1              1
ReturnCAM             Real         0.9972               2.3547           2.3482         1760
                      Normalized   0.5791               1.5968           0.9248         0.6432
LAM                   Real         1.2867               2.5073           3.2261         3344
                      Normalized   1                    1                1              1
ReturnLAM             Real         0.97995              1.8475           1.8104         2350
                      Normalized   0.7615               0.7368           0.5611         0.7027
 
 
The maximum operating frequencies of the return structures for the different libraries are as follows:

F_max (180 nm) = 2 MHz
F_max (130 nm) = 266.66 MHz
F_max (65 nm) = 400 MHz
 
6 Conclusion 
The simulation results show that applying the return technique to the array structures reduces power consumption and, consequently, PDP. For 180 nm technology, this improvement is 13.32 % in the conventional array structure and 23.27 % in the leapfrog array structure. In addition, the technique substantially reduces the number of transistors and, as a result, the area: by 29.73 % for the leapfrog array structure and by 35.68 % for the conventional array structure.
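The percentages above can be reproduced from Table 1. The short Python sketch below recomputes them as 1 - (return value / original value) and also checks that each PDP entry equals average power times delay. The dictionary layout and function names are illustrative. Computed from the raw transistor counts, the area-related reductions round to 35.67 % and 29.72 %; the 35.68 % and 29.73 % quoted above follow from the truncated normalized values 0.6432 and 0.7027.

```python
# Table 1 (180 nm): (avg. power in E-6 W, delay in E-9 s, PDP in E-15 J, transistors)
TABLE_1 = {
    "CAM":       (3.1260, 1.4806, 4.6283, 2736),
    "ReturnCAM": (1.6524, 2.4279, 4.0119, 1760),
    "LAM":       (2.3131, 1.7486, 4.0447, 3344),
    "ReturnLAM": (1.6601, 1.8696, 3.1037, 2350),
}


def reduction_percent(original: float, improved: float) -> float:
    """Percentage reduction of `improved` relative to `original`."""
    return 100.0 * (1.0 - improved / original)


def summarize(base: str, ret: str) -> dict[str, float]:
    """Compare a return structure with its original array structure."""
    _, _, pdp0, t0 = TABLE_1[base]
    p1, d1, pdp1, t1 = TABLE_1[ret]
    return {
        "pdp_check": p1 * d1,  # power x delay; should match the PDP column (E-15 J)
        "pdp_reduction": reduction_percent(pdp0, pdp1),
        "transistor_reduction": reduction_percent(t0, t1),
    }


if __name__ == "__main__":
    for base, ret in (("CAM", "ReturnCAM"), ("LAM", "ReturnLAM")):
        s = summarize(base, ret)
        print(f"{ret}: PDP -{s['pdp_reduction']:.2f} %, "
              f"transistors -{s['transistor_reduction']:.2f} %")
```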
 
 
References 
Chong, K.-S., Bah-Hwee, G., & Chang, J. S. (2004, 
23-26 May 2004). A low power 16-bit Booth 
Leapfrog array multiplier using Dynamic 
Adders. Paper presented at the Circuits and 
Systems, 2004. ISCAS '04. Proceedings of the 
2004 International Symposium on. 
Mahant-Shetti, S. S., Balsara, P. T., & Lemonds, 
C. (1999). High performance low power array 
multiplier using temporal tiling. Very Large 
Scale Integration (VLSI) Systems, IEEE 
Transactions on, 7(1), 121-124. doi: 
10.1109/92.748208 
Mathew, K., Latha, S. A., Ravi, T., & 
Logashanmugam, E. (2013). Design and 
Analysis of Array Multiplier using an Area 
Efficient Full Adder Cell in 32nm CMOS 
Technology. International Journal of 
Engineering and Science, 2(3), 8-16.  
Ravi, N., Rao, D. T., & Prasad, D. T. (2011). 
Performance Evaluation of Bypassing Array 
Multiplier with Optimized Design. 
International Journal of Computer 
Applications (0975–8887) Volume. 
Ravi, N., Subbaiah, Y., Prasad, T. J., & Rao, T. S. 
(2011). A novel low power, low area array 
multiplier design for DSP applications. Paper 
presented at the Signal Processing, 
Communication, Computing and Networking 
Technologies (ICSCCN), 2011 International 
Conference on. 
Srivastava, P., Vishant, V., Singh, R. K., & 
Nagaria, R. K. (2013, 12-14 April 2013). 
Design and implementation of high 
performance array multipliers for digital 
circuits. Paper presented at the Engineering 
and Systems (SCES), 2013 Students 
Conference on. 
Tambat, R. V., & Lakhotiya, S. A. (2014). Design 
of Flip-Flops for High Performance VLSI 
Applications using Deep Submicron CMOS 
Technology.  
 
 
