Traditional and Truncation schemes for Different  Multiplier by Yogesh M. Motey & Tejaswini G. Panse
International Journal of Electronics and Computer Science Engineering              627 
 
                          Available Online at www.ijecse.org                                           ISSN- 2277-1956 
ISSN 2277-1956/V2N2-627-633                                                                       
Traditional and Truncation schemes for Different 
Multiplier 
Yogesh M. Motey 1, Tejaswini G. Panse 2
1 2  Department of Electronics and Communication Engineering 
1 M. Tech(Electronics Engg), YCCE, Nagpur 
2 Asst. Prof.(Electronics Engg), YCCE, Nagpur 
1 Email-   yogesh.motey@gmaill.com  
 
Abstract- A fast and power-efficient multiplier is vital in electronic applications such as DSP, image processing, and the ALUs of microprocessors. The multiplier is a critical block with respect to the power consumed and the area occupied in a system. To meet the demand for high speed, a number of authors have proposed various parallel array multiplication algorithms. Array multipliers use a large amount of hardware and consequently consume a large amount of power. One method of multiplication is based on Indian Vedic mathematics, which rests on sixteen sutras (word formulae) and presents a unified structure of mathematics. Parallel multipliers, for example radix-2 and radix-4 Booth multipliers, perform the computation with fewer adders and fewer iterative steps, and as a result occupy less area than a serial multiplier. Truncated multipliers offer noteworthy improvements in area, delay, and power. Truncated multiplication provides a method for reducing the power dissipation and area of rounded parallel multipliers in DSP systems. In a truncated multiplier the x least-significant bits of the full-width product are discarded; the corresponding partial products are removed and replaced by suitable compensation equations that balance accuracy against hardware cost. The pseudo-carry compensation truncation (PCT) scheme, intended for the multiplexer-based array multiplier, yields a lower average error than other existing truncation methods.
A survey of the literature shows that some multiplier schemes are suitable because of their distinctive approach to multiplication. Such schemes are listed in this paper; for example, truncation schemes such as constant-correction truncation (CCT), variable-correction truncation (VCT), and pseudo-carry compensation truncation (PCT) are most suitable for truncated multipliers.
 
Keywords – Digital multiplier, Wallace tree multiplier, truncated multiplier, Vedic multiplier, and variable correction 
truncation scheme. 
I. INTRODUCTION 
Almost  all  signal  processing  applications  demand  a  considerable  number  of  multiplications.  Some  of  these 
applications require the multiplication to be performed at a faster rate, and others concentrate on less hardware and 
moderate speed. Multiplication is a fundamental arithmetic operation used pervasively in digital signal processing 
(DSP) applications like filtering, convolution, and compression.  Multipliers are important components of many high 
performance systems such as filters in signal processors, microprocessors, and digital signal processors. The performance of a system generally depends on the performance of its multiplier, because the multiplier is generally the slowest element in the overall system. In addition, it generally consumes a large area. Thus, optimizing the speed and area of the multiplier is a major design issue. Area and speed are usually conflicting constraints, so improving speed leads to larger area. Because of this, multipliers with a range of area-speed trade-offs have been designed as fully parallel structures. These multipliers have satisfactory performance in both area and speed. Nowadays, the existing digit-serial multipliers are designed with complicated switching systems and/or irregularities in design. There are a variety of different implementations for parallel multipliers. Radix-2^n multipliers, which operate on digits in parallel instead of on bits, bring the pipelining to the digit level and avoid most of the above problems. However, most algorithms involve a shift-and-add technique in which the multiplicand is conditionally added to obtain the final result. Although there are many algorithms for accomplishing this, they do not reduce the height of the partial-product array that must be summed to produce the final result. The Booth algorithm attempts to reduce the number of partial products by recoding the multiplier so that groups of its bits select multiples of the multiplicand.
Although Booth encoding can typically reduce the number of operations in a typical DSP application by 45% to 80%, Booth-encoded multipliers typically dissipate more power than multipliers that are not Booth-encoded. The Vedic mathematics approach is very different, and its working is considered very close to the way the human mind works. Among the various methods used for multiplication in Vedic mathematics, Urdhava Tiryakbhyam is the general multiplication formula applicable to all cases of multiplication. The Vedic multiplier is used in matrix multiplication, and the performance of a DSP can be greatly improved by using it. Although there are many techniques for reducing the power dissipation of parallel multipliers, one technique that is well suited to digital signal processing systems is truncated multiplication. Several research papers show that truncated multiplication significantly reduces power compared with a standard parallel multiplier for different operand sizes.
 
II. THE VEDIC MULTIPLIER AND VEDIC MULTIPLICATION METHOD 
A great deal of research is being carried out to develop powerful and direct applications of the Vedic sutras in geometry, calculus, and computing. Most engineering system designs are based on various mathematical approaches, so conventional mathematics is an integral part of engineering education. The requirement for faster processing speed continuously gives rise to major improvements in processor technologies and to the search for new algorithms [2]. The proposed Vedic multiplier is based on the Vedic multiplication formulae (Sutras). Traditionally these Sutras have been used for the multiplication of two numbers in the decimal number system. Here, to make the proposed algorithm compatible with digital hardware, we apply the same ideas to the binary number system [2]. Vedic multiplication based on the Urdhava Tiryakbhyam sutra is discussed below:
A.   Urdhava Tiryakbhyam sutra:– 
The proposed multiplier uses the Urdhava Tiryakbhyam (Vertical & Crosswise) algorithm of ancient Indian Vedic Mathematics. The Urdhava Tiryakbhyam Sutra is a general multiplication formula applicable to all types of multiplication. It literally means “Vertically and crosswise”. It rests on a concept in which all partial products are generated concurrently with their addition. The parallel generation of partial products and their summation using Urdhava Tiryakbhyam is explained in Figure 1. The algorithm can be generalized to m x m bit numbers. Since the partial products and their sums are calculated in parallel, the multiplier is independent of the clock frequency of the processor: it requires the same amount of time to calculate the product regardless of the clock frequency. The main advantage is that it reduces the need for microprocessors with higher clock frequencies. A higher clock frequency generally results in increased processing power, but it also increases power dissipation, which leads to higher device operating temperatures. By adopting the Vedic multiplier, microprocessor designers can avoid these problems and the catastrophic device failures they cause. The processing power of the multiplier can easily be increased by increasing the input and output data bus widths, since it has quite a regular structure [3]. Owing to its regular structure, it can easily be laid out on a silicon chip [4]. A further advantage is that, as the number of bits increases, gate delay and area grow very slowly in this multiplier compared with other multipliers. It is therefore efficient in time, space, and power.
Multiplication of two decimal numbers (325 * 738): To illustrate this multiplication scheme, let us consider the multiplication of two decimal numbers (325 * 738). The line diagram for the multiplication is shown in Figure 1. The digits at the two ends of each line are multiplied and the result is added to the carry from the previous step. This generates one digit of the result and a carry. This carry is added in the next step, and the process goes on. If there is more than one line in a step, all the results are added to the previous carry. In each step, the least significant digit acts as the result digit and all other digits act as the carry for the next step. Initially the carry is taken to be zero.
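The step-by-step procedure above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `urdhva_decimal` is hypothetical, the operands are assumed to be non-negative integers, and each loop iteration plays the role of one step of the line diagram.

```python
def urdhva_decimal(x, y):
    """Multiply two non-negative integers digit by digit, vertically and crosswise."""
    a = [int(d) for d in reversed(str(x))]  # digits of x, least significant first
    b = [int(d) for d in reversed(str(y))]  # digits of y, least significant first
    digits = []
    carry = 0  # initially the carry is zero
    for step in range(len(a) + len(b) - 1):
        # Sum of all crosswise digit products that belong to this step.
        cross = sum(a[i] * b[step - i]
                    for i in range(len(a)) if 0 <= step - i < len(b))
        total = cross + carry
        digits.append(total % 10)  # least significant digit is the result digit
        carry = total // 10        # remaining digits carry into the next step
    result = carry                 # the final carry forms the leading digits
    for d in reversed(digits):
        result = result * 10 + d
    return result


print(urdhva_decimal(325, 738))  # 239850
```

For 325 * 738 the five steps give the totals 40, 35, 68, 29 and 23, so the result digits are 0, 5, 8, 9, 3 with a final carry of 2, which reads as 239850.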
To illustrate the multiplication algorithm, let us consider the multiplication of two binary numbers a3a2a1a0 and b3b2b1b0. As the result of this multiplication would be more than 4 bits, we express it as ... r3r2r1r0. The line diagram for multiplication of two 4-bit numbers is shown in Figure 2, which is simply the mapping of Figure 1 to the binary system. For simplicity, each bit is represented by a circle.
Figure 1.   Multiplication of Two decimal numbers by Urdhava Tiryakbhyam [7] 
The least significant bit r0 is obtained by multiplying the least significant bits of the multiplicand and the multiplier. The process follows the steps shown in Figure 2.
First, the least significant bits are multiplied, which gives the least significant bit of the product (vertical). Then the LSB of the multiplicand is multiplied by the next higher bit of the multiplier and added to the product of the LSB of the multiplier and the next higher bit of the multiplicand (crosswise). The sum gives the second bit of the product, and the carry is added to the output of the next stage, which is obtained by crosswise and vertical multiplication and addition of the three least significant bits of the two numbers.
 
Figure 2.   Line diagram for multiplication of two 4 – bit numbers  
Next, all four bits are processed with crosswise multiplication and addition to give the sum and carry. The sum is the corresponding bit of the product, and the carry is again added to the next stage, the multiplication and addition of the three bits excluding the LSB. The same operation continues until the multiplication of the two MSBs gives the MSB of the product.
For example, if some intermediate step gives 110, then 0 acts as the result bit (referred to as rn) and 11 as the carry (referred to as cn). Note that cn may be a multi-bit number.
Thus we get the following expressions: 
r0=a0b0;                                               (1) 
c1r1=a1b0+a0b1;                                    (2) 
c2r2=c1+a2b0+a1b1 + a0b2;                    (3) 
c3r3=c2+a3b0+a2b1 + a1b2 + a0b3;          (4)
c4r4=c3+a3b1+a2b2 + a1b3;                    (5) 
c5r5=c4+a3b2+a2b3;                               (6) 
c6r6=c5+a3b3;                                        (7) 
 
The final product is c6r6r5r4r3r2r1r0. This is the general mathematical formula applicable to all cases of multiplication.
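Expressions (1) to (7) can be checked with a small software model. The following is a minimal sketch, not a hardware description: the function name `vedic_4x4` is an assumption made for illustration, the operands are assumed to be unsigned 4-bit values, and the multi-bit carry cn is held in an ordinary integer.

```python
def vedic_4x4(a, b):
    """Software model of expressions (1)-(7) for unsigned 4-bit operands."""
    abits = [(a >> i) & 1 for i in range(4)]  # a0..a3
    bbits = [(b >> i) & 1 for i in range(4)]  # b0..b3
    result = 0
    carry = 0  # c_n, which may be a multi-bit number
    for n in range(7):
        # Crosswise sum for column n: every a_i * b_j with i + j == n.
        cross = sum(abits[i] & bbits[n - i]
                    for i in range(4) if 0 <= n - i < 4)
        total = carry + cross
        result |= (total & 1) << n  # r_n is the least significant bit
        carry = total >> 1          # the remaining bits form c_(n+1)
    result |= carry << 7            # c6 becomes the most significant product bit
    return result


print(vedic_4x4(13, 11))  # 143
```

In hardware the seven column sums would be formed by parallel adders; the loop here only mirrors the order in which the carries are consumed.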
III. WALLACE TREE MULTIPLIER
The Wallace tree multiplier is described in [5]. The reduction scheme published by Wallace [6] begins by grouping the partial-product matrix into sets of three rows. Each set is reduced to two rows using half-adders on sets of two bits and full-adders on sets of three bits. Excess rows that do not belong to a set of three are passed to the next reduction stage unmodified. Each reduction stage is processed in a similar way until only two rows remain. Then a final CPA (Carry Propagate Adder) is used. Fig. 3 shows the dot diagram illustrating Wallace reduction for an 8 × 8 multiplier. Dot diagrams, developed by Dadda, are a convenient means of visualizing the placement of full-adders and half-adders. In such diagrams, dots represent the bits given by the partial products. For example, the upper-right dot in the multiplication matrix is x8y8. The circle with three dots represents a Full-Adder (FA).
 
 
Figure 3.   Wallace reduction for an 8*8 multiplier. Full circle: Full Adder, Half circle: Half Adder. 
In the next stage the circle is replaced by its outputs: a sum bit, shown as a dot in the same column, and a carry bit, shown as a dot in the column to the left. The same reasoning holds for the half-adder, represented by a circle with a dashed line.
In summary, in the Wallace tree the number of operands is reduced at the earliest opportunity: if there are m dots in a column, we immediately apply ⌊m/3⌋ full adders to that column. This tends to minimize the overall delay by making the final CPA as short as possible.
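The row-grouping reduction described in this section can be modelled in software as follows. This is a sketch under stated assumptions: the names `carry_save_add` and `wallace_multiply` are hypothetical, operands are unsigned, and each set of three rows is compressed with a word-wide 3:2 carry-save step, so positions holding only two bits behave like half-adders instead of being modelled separately.

```python
def carry_save_add(a, b, c):
    """3:2 compression of three rows into a sum row and a carry row."""
    sum_row = a ^ b ^ c
    carry_row = ((a & b) | (a & c) | (b & c)) << 1  # carries move one column left
    return sum_row, carry_row


def wallace_multiply(x, y, n=8):
    """Return (product, number of reduction stages) for unsigned n-bit operands."""
    # One partial-product row per multiplier bit.
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(n)]
    stages = 0
    while len(rows) > 2:
        grouped = len(rows) - len(rows) % 3
        next_rows = []
        for i in range(0, grouped, 3):
            next_rows.extend(carry_save_add(*rows[i:i + 3]))
        next_rows.extend(rows[grouped:])  # excess rows pass through unmodified
        rows = next_rows
        stages += 1
    return sum(rows), stages  # the final addition stands in for the CPA


print(wallace_multiply(200, 123))  # (24600, 4)
```

For eight rows the row count goes 8, 6, 4, 3, 2, which is why the 8 × 8 case takes four reduction stages before the final adder.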
IV. TRUNCATED MULTIPLIERS
Parallel multipliers are typically implemented as either carry-save array or tree multipliers [9]. In many computer 
systems, the (n+m)-bit products produced by parallel multipliers are rounded to r bits to avoid growth in word size. As 
presented in [10], truncated multiplication provides an efficient method for reducing the hardware requirements of 
rounded parallel multipliers. With truncated multiplication, only the r+k most-significant columns of the multiplication 
matrix are used to compute the product. The error produced by omitting the m+n−r−k least-significant columns and 
rounding the final result to r bits is estimated, and this estimate is added along with the r + k most-significant columns 
to produce the rounded product. Although this leads to additional error in the rounded product, various techniques have 
been developed to help limit this error [8]. 
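To make the column selection concrete, the sketch below sums only the partial-product bits that lie in the r + k most-significant columns. It is an illustration under stated assumptions: m = n, unsigned operands, a hypothetical function name `truncated_column_sum`, and no correction term or final rounding, so the difference from the exact product is the raw reduction error.

```python
def truncated_column_sum(a, b, n, r, k):
    """Sum the partial-product bits in the r + k most-significant columns (m = n)."""
    first_kept = 2 * n - r - k  # columns 0 .. first_kept - 1 are omitted
    total = 0
    for i in range(n):
        for j in range(n):
            if i + j >= first_kept and (a >> i) & 1 and (b >> j) & 1:
                total += 1 << (i + j)
    return total


exact = 255 * 255
kept = truncated_column_sum(255, 255, n=8, r=8, k=2)
print(exact - kept)  # 321
```

With all input bits set, every omitted bit is one, so 321 is the largest possible reduction error for n = 8, r = 8, k = 2 in units of the least-significant product bit.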
One method to compensate for truncation is the Constant Correction Truncated (CCT) Multiplier [11]. In this method, a constant is added to columns n+m−r−1 to n+m−r−k of the multiplication matrix. The constant helps compensate for the error introduced by omitting the n+m−r−k least-significant columns (called reduction error), and the error due to rounding the product to r bits (called rounding error). The expected value of the sum of these errors, Etotal, is computed by assuming that each bit in A, B and P has an equal probability of being one or zero. Consequently, the expected value of the total error is the sum of the expected reduction error and the expected rounding error:
       Etotal = Ereduction + Erounding                                                     (8) 
$$E_{total} = \frac{1}{4}\sum_{q=0}^{S-k-1}(q+1)\cdot 2^{-m-n+q} + \frac{1}{2}\cdot\sum_{z=S-k}^{S-1}2^{-m-n+z} \qquad (9)$$
where S = m + n − r [14]. The constant Ctotal is obtained by rounding Etotal to r + k fractional bits, such that
$$C_{total} = \frac{\mathrm{round}(2^{r+k}\times E_{total})}{2^{r+k}} \qquad (10)$$
where round(x) indicates that x is rounded to the nearest integer. Although the value of k can be chosen to limit the maximum absolute error to a specific precision, this paper assumes the maximum absolute error is limited to one unit in the last place (i.e., 2^−r).
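As a numerical check, the expected error and the correction constant can be evaluated with exact rational arithmetic. The sketch below assumes the usual reading of the two formulas, namely Etotal = (1/4) Σ_{q=0}^{S−k−1} (q+1) 2^(−m−n+q) + (1/2) Σ_{z=S−k}^{S−1} 2^(−m−n+z) and Ctotal = round(2^(r+k) Etotal) / 2^(r+k). The function name `cct_constant` is hypothetical.

```python
from fractions import Fraction


def cct_constant(m, n, r, k):
    """Return (E_total, C_total) for an m x n truncated multiplier rounded to r bits."""
    S = m + n - r
    # Expected reduction error over the S - k omitted columns.
    e_reduction = Fraction(1, 4) * sum(
        (q + 1) * Fraction(2) ** (-m - n + q) for q in range(S - k))
    # Expected rounding error over the k kept columns below the result.
    e_rounding = Fraction(1, 2) * sum(
        Fraction(2) ** (-m - n + z) for z in range(S - k, S))
    e_total = e_reduction + e_rounding
    scale = 2 ** (r + k)
    # Python's round() sends exact ties to the even integer; the rounding
    # convention intended at ties is an assumption here.
    c_total = Fraction(round(e_total * scale), scale)
    return e_total, c_total


e_total, c_total = cct_constant(m=8, n=8, r=8, k=2)
print(e_total, c_total)  # 705/262144 3/1024
```

For m = n = r = 8 and k = 2 the scaled expected error is 705/256, about 2.75, so the constant rounds to 3 units of 2^−10.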
Another method to compensate for the truncation is using the Variable Correction Truncated (VCT) Multiplier 
[12]. With this type of multiplier, the values of the partial product bits in column m+n−r−k−1 are used to estimate the 
error due to leaving off the m+n−r−k least-significant columns. This is accomplished by adding the partial products 
bits in column m + n − r − k − 1 to column m + n − r − k. To compensate for the rounding error, a constant is added to 
columns m+n−r−2 to m+n−r−k of the multiplication matrix. The value for this constant is 
$$C_{total} = 2^{-S-1}\,(1 - 2^{-k+1}) \qquad (11)$$
This corresponds to the expected value of the rounding error truncated to r + k bits. 
Another method, called a Hybrid Correction Truncated (HCT) Multiplier, uses both constant and variable correction techniques to reduce the overall error [13]. To implement an HCT multiplier, a new parameter p is introduced that represents the percentage of variable correction to use for the correction. This percentage determines how many partial products from column m + n − r − k − 1 are added into column m + n − r − k. The number of variable correction bits, NHCT, is calculated from the number of bits used in the variable correction method, NVCT, as
$$N_{HCT} = \mathrm{floor}(N_{VCT}\times p) \qquad (12)$$
Similar to both the CCT and the VCT multipliers, an HCT multiplier uses a correction constant to compensate for the error. However, since the correction constant is based on a smaller number of bits than in a VCT multiplier, the correction constant is modified as follows:
$$C_{VCT}' = 2^{-r-k-2}\cdot N_{HCT} \qquad (13)$$
This produces a new correction constant based on the difference between the new variable correction constant and the 
constant correction constant.  
$$C_{total} = \frac{\mathrm{round}((C_{CCT} - C_{VCT}')\times 2^{r+k})}{2^{r+k}} \qquad (14)$$
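The three HCT quantities can be traced numerically with the sketch below. It assumes that the final constant is the difference between the CCT constant and the modified variable correction constant, rounded to r + k fractional bits, as the preceding sentence describes. The function name `hct_constant` is hypothetical, and C_CCT, N_VCT and p are supplied by the caller; the example values are illustrative only.

```python
from fractions import Fraction
from math import floor


def hct_constant(c_cct, n_vct, p, r, k):
    """Return (N_HCT, C'_VCT, C_total) for the hybrid correction scheme."""
    n_hct = floor(n_vct * p)                         # variable-correction bits kept
    c_vct_prime = Fraction(n_hct, 2 ** (r + k + 2))  # 2^(-r-k-2) * N_HCT
    scale = 2 ** (r + k)
    # Difference of the two constants, rounded to r + k fractional bits.
    # Python's round() sends exact ties to the even integer (an assumption here).
    c_total = Fraction(round((c_cct - c_vct_prime) * scale), scale)
    return n_hct, c_vct_prime, c_total


# Illustrative inputs: C_CCT = 3/1024, six bits in the correction column, p = 50%.
print(hct_constant(Fraction(3, 1024), n_vct=6, p=Fraction(1, 2), r=8, k=2))
# (3, Fraction(3, 4096), Fraction(1, 512))
```

Passing p as a Fraction keeps the floor operation exact; a floating-point percentage would also work but can round unexpectedly near integer boundaries.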
 
V. CONCLUSION AND FUTURE DIRECTIONS 
The Vedic multiplier proves to be highly efficient in terms of speed. Its main advantage is that delay increases slowly as the number of input bits increases. At the same time, speed is a function of the number of input bits, so for wide operands the speed is low. For multimedia and digital signal processing applications that do not require correctly rounded multiplication, truncated multipliers offer significant hardware savings while introducing a small amount of additional error. Simulations indicate that, for applications where correct rounding of the result is not needed, truncated multipliers offer significant savings in area, delay, and power. The efficiency of digital multiplication can be improved tremendously by truncation methods, provided that precise outputs are not required. The table below compares area and power as reported by the Synopsys compiler tool.
Table -1 Comparative Result
Multiplier                              Power        Area
Wallace tree                            339.65 µW    3658.2716
Modified Booth-Wallace tree             288.23 µW    5103.555
Modified Wallace tree                   99.54 µW     2996.9652
Modified Booth-Modified Wallace tree    288.47 µW    5096.3124
 
When the VCT multiplier using the new truncated multiplication was incorporated into JPEG image compression, the differences in PSNR of the reconstructed images were within 1 dB of those obtained with full-precision multiplication.
                             
VI. REFERENCES
 
[1]   C. H. Chang, R. K. Satzoda, and S. Sekar, “A novel multiplexer based     truncated array multiplier,” in Proc. IEEE Int. Symp. Circuits Syst.  
(ISCAS), Kobe, Japan, May 2005, pp. 85–88. 
[2]   S. S. Kerur,  Prakash Narchi, Jayashree C N, Harish M Kittur and Girish V “A Implementation of Vedic Multiplier for Digital Signal 
Processing”   International Conference on VLSI, Communication & Instrumentation (ICVCI) 2011 Proceedings published by International 
Journal of     Computer Applications® (IJCA). 
[3]  Himanshu Thapliyal and M.B Srinivas, “An Efficient Method of Elliptic Curve Encryption Using Ancient Indian Vedic Mathematics”,
IEEE, 2005.
[4]   H. Thapliyal and M. B. Shrinivas and H. Arbania, “Design and Analysis   of a VLSI Based High Performance Low Power Parallel Square 
Architecture”, Int. Conf. Algo. Math.Comp. Sc., Las Vegas, June 2005, pp. 72-76. 
[5]  VALERIA GAROFALO Thesis on “Truncated Binary Multipliers with Minimum Mean Square Error: Analytical Characterization, Circuit 
Implementation and Applications.” 
[6]  C.Wallace, “A suggestion for fast multiplier,” IEEE Transaction on Electronic Computers, vol. EC-13, no. 1, pp. 14–17, Feb. 1964. 
[7]  P. D. Chidgupkar and M. T. Karad, “The Implementation of Vedic Algorithms in Digital Signal Processing”, Global J. of Engg. Edu, vol.8,
no.2, 2004.
[8]  Alok A. Katkar and James E. Stine, “Modified Booth Truncated Multipliers”, VLSI Computer Architecture, Arithmetic, and CAD Research
Laboratory, Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, Illinois 60616, USA.
[9]  K. Bickerstaff, M. J. Schulte, and E. E. Swartzlander, Jr., “Parallel Reduced Area Multipliers,” Journal of VLSI Signal Processing, vol. 9, pp.  
181–192, April 1995. 
[10]  Y.  C.  Lim,  “Single-precision  multiplier  with  reduced  circuit  complexity  for  signal  processing  applications,”  IEEE  Transactions  on 
Computers, vol. 41, no. 10, pp. 1333–1336, 1992. 
[11]  M. J. Schulte and E. E. Swartzlander, Jr., “Truncated multiplication with correction constant,” in VLSI Signal Processing VI, pp. 388–396,
October 1993.
[12]   E. J. King and E. E. Swartzlander, Jr., “Data-dependent truncated scheme for parallel multiplication,” in Proceedings of the Thirty First 
Asilomar Conference on Signals, Circuits and Systems, pp. 1178–1182, 1998. 
[13]   J. E. Stine and O. M. Duverne, “Variations on truncated multiplication,”  in Euromicro Symposium on Digital System Design, pp. 112–119, 
2003. 
[14]  K. Z. Pekmestzi, “Multiplexer-based array multipliers,” IEEE Trans. Comput., vol. 48, no. 1, pp. 15–23, Jan. 1999. 
[15]  L.-D. Van and C.-C. Yang, “Generalized low-error area-efficient fixed-width multipliers,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52,
no. 8, pp. 1608–1619, Aug. 2005.
[16]  M. J. Schulte, J. E. Stine, and J. G. Jansen, “Reduced power dissipation  through truncated multiplication,” in Proc. IEEE Alessandro Volta 
Memorial Workshop Low-Power Des., Mar. 1999, pp. 61–69. 
[17]  S. Kidambi, F. El-Guibaly, and A. Antonious, “Area-efficient multipliers for digital signal processing applications,” IEEE Transaction on 
Circuits and Systems II: Analog and digital signal processing, vol. 43, no. 2, pp.  90–95, Feb. 1996. 