Abstract: Embedded and mobile computing devices are frequently required to execute some key digital signal processing applications.
Introduction
Recent years have witnessed tremendous advances in battery-powered portable electronic devices with intensive computational capabilities. The power requirements of these devices have increased manyfold with the increasing complexity of ICs.
Multiplication is one of the basic arithmetic operations, but it requires considerable hardware resources. Researchers have therefore worked on multiplier designs that increase speed and reduce power in digital signal processing applications. The number of partial products to be added is the main parameter that determines the performance of a multiplier. A serial or a parallel multiplier is selected depending on the application.
The serial-parallel multiplier improves other performance measures at the expense of speed. However, a typical central processing unit devotes a considerable amount of its processing time to arithmetic operations, particularly multiplication. Multiplication dominates the execution time of most DSP algorithms, so DSP system performance is limited by multiplication performance.
Multiplication has always been a hardware-, time- and power-consuming arithmetic operation, especially for large operands. This bottleneck is even more pronounced in digital signal processing applications that involve a huge number of multiplications [4]. Achieving high energy efficiency has become a key design objective for embedded and mobile computing devices due to their limited battery capacity and power budget.
Parallel multipliers provide a high-speed method for multiplication but require a large area for VLSI implementation. In most digital signal processing applications a rounded product is required to avoid growth in word size. Thus an important aim is to design a multiplier that requires less area, which is possible with truncated multipliers. These multipliers are extensively used in digital signal processing, where multiplication speed, area and power consumption are important [8].
Fixed-point arithmetic is used because of the area and cost of supporting floating-point (FP) units in embedded computing devices. Although the conversion to fixed point leads to some loss of computational accuracy, it does not affect the quality of digital signal processing applications because they tolerate computational error. The approximate multiplication technique takes m consecutive bits from each n-bit operand [1].
This paper proposes an approximate multiplier that uses an m-bit segment from each n-bit operand. With the truncation method, this approach provides higher accuracy in the LSB segment because all of its bits are considered accurately. The array, Booth and modified Booth multipliers are compared on the basis of delay and power.
The rest of this brief is organized as follows. Section II explains the existing method on which the proposed architecture is based. Section III presents the architecture and operation of the proposed method and analyzes the multipliers in terms of delay and accuracy. Section IV presents the results and discussion, and Section V concludes the paper.
Existing Method
Digital signal processing algorithms often rely heavily on a large number of multiplications, which are both time and power consuming. Various types of multipliers are used to improve accuracy and energy efficiency. A 16*16 multiplier has been proposed that uses inaccurate 2*2 partial product generators while guaranteeing minimum and maximum accuracy bounds fixed at design time.
Approximate Array Multiplier
The array multiplier is well known for its regular structure. The multiplier circuit is based on the add-and-shift algorithm. Each partial product is generated by multiplying the multiplicand by one multiplier bit. The partial products are shifted according to their bit order and then added. The additions can be performed with ordinary carry-propagate adders. Both the multiplicand and the multiplier may be positive or negative, so the 2's complement number system is used to represent them. If the multiplier operand is positive, essentially the same technique can be used, but care must be taken with sign-bit extension. To motivate and describe this multiplier, define an m-bit segment as m contiguous bits starting with the leading one in an n-bit positive operand. This method is dubbed the dynamic segment method (DSM), in contrast to the static segment method (SSM).
With two m-bit segments taken from two n-bit operands, the multiplication is performed using an m*m multiplier. Figure 1 shows a simple example of a multiplication after taking 8-b segments from 16-b operands.
Figure 1: Example of multiplication with 8-b segments
Furthermore, an m*m multiplier consumes much less energy than an n*n multiplier, because the complexity of multipliers increases quadratically with n. For example, 4*4 and 8*8 multipliers consume almost 20 times and 5 times less energy, respectively, than a 16*16 multiplier per operation on average.
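The dynamic segment method described above can be illustrated with a short behavioural sketch in Python. This is only a software model under stated assumptions, not the hardware design: the function names `dsm_segment` and `dsm_multiply` are hypothetical, operands are assumed to be non-negative integers, and the bits below each segment are simply truncated.

```python
def dsm_segment(x, m):
    """Return (segment, shift): the m contiguous bits of x that start at
    its leading one, and the number of low-order bits that were dropped."""
    shift = max(x.bit_length() - m, 0)  # 0 when x already fits in m bits
    return x >> shift, shift


def dsm_multiply(a, b, m=8):
    """Approximate a*b using an m*m multiplication of the two segments."""
    seg_a, shift_a = dsm_segment(a, m)
    seg_b, shift_b = dsm_segment(b, m)
    # Multiply the short segments, then shift the result back into place.
    return (seg_a * seg_b) << (shift_a + shift_b)


if __name__ == "__main__":
    a, b = 0x1234, 0x00FF
    print(dsm_multiply(a, b), a * b)  # approximate vs. exact product
```

For these two operands the model gives 1183200 against an exact product of 1188300, an error of about 0.4%. Operands that already fit in m bits are multiplied exactly.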
The area and energy penalties of the DSM come from capturing an m-bit segment that starts at an arbitrary bit position in an n-bit operand, because the leading one bit can be anywhere. To overcome this, the SSM can be used: for a multiplication, it chooses the m-bit segment that contains the leading one bit of each operand and applies the chosen segments from both operands to the m*m multiplier. The SSM greatly simplifies the circuit that chooses m-bit segments and steers them to the m*m multiplier by replacing the two n-bit leading-one detectors (LODs) and shifters required by the DSM.
The accuracy of an SSM with m = n/2 can be significantly low for some operands. On the other hand, this problem becomes less severe as m becomes larger than n/2. An SSM takes an m-bit segment from one of two possible bit positions of an n-bit operand. Its advantage is scalability for various m and n, because the complexity of the circuits that choose and steer m-bit segments and expand a 2m-bit result to a 2n-bit result scales linearly with m.
The approximate multiplier takes m consecutive bits of an n-bit operand, either starting from the MSB or ending at the LSB, and applies the two segments that include the leading ones of the two operands to an m*m multiplier. Compared with an approach that identifies the exact leading-one positions of the two operands and applies two m-bit segments starting from those positions, this approach consumes much less energy and area.
This improved energy and area efficiency comes at the cost of slightly lower compute accuracy. The small loss of compute accuracy with the SSM does not noticeably impact the QoC of image, audio and recognition applications.
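The static segment selection described above can be sketched in Python as follows. This is a behavioural model under assumptions rather than the circuit itself: `ssm_segment` and `ssm_multiply` are hypothetical names, operands are assumed to be non-negative and smaller than 2^n, and the test `x >> m` stands in for the check that decides between the MSB-side and LSB-side segment.

```python
def ssm_segment(x, n=16, m=8):
    """Pick the m-bit segment at the MSB end or at the LSB end of x.

    Returns (segment, shift), where shift is the number of dropped low bits.
    """
    if x >> m:                       # a one exists above the low m bits
        return x >> (n - m), n - m   # upper segment: bits n-1 down to n-m
    return x, 0                      # x already fits in the lower segment


def ssm_multiply(a, b, n=16, m=8):
    """Approximate a*b with one m*m multiplication of static segments."""
    seg_a, shift_a = ssm_segment(a, n, m)
    seg_b, shift_b = ssm_segment(b, n, m)
    # The total shift is 0, n-m or 2*(n-m): three cases to select from
    # when the 2m-bit product is expanded to a 2n-bit result.
    return (seg_a * seg_b) << (shift_a + shift_b)


if __name__ == "__main__":
    print(ssm_multiply(0x1234, 0x00FF), 0x1234 * 0x00FF)
    print(ssm_multiply(0x01FF, 1), 0x01FF * 1)  # poor case for m = n/2
```

The second call shows why m = n/2 can be inaccurate: for 0x01FF the upper segment holds only a single significant bit, so the model returns 256 instead of 511.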
Proposed Method
Embedded and mobile computing devices are frequently required to execute digital signal processing (DSP). Nowadays embedded systems sacrifice power for speed. Speed is the major concern in digital signal processing. The speed of the Booth multiplier and the modified Booth multiplier is considered and compared with that of the array multiplier.
The multiplicand and product are typically represented in two's complement, like the multiplier, but any number system that supports addition and subtraction will work as well. The algorithm typically proceeds from LSB to MSB, starting at i = 0; the Booth and modified Booth multipliers work on this basis.
Booth Multiplier
A fast multiplier needs to reduce the number of partial products generated. The Booth recoding multiplier is such a multiplier; it examines three bits at a time to reduce the number of partial products. These three bits are the two bits of the present pair and a third bit, the high-order bit of the adjacent lower-order pair.
To speed up the multiplication, Booth encoding performs several steps of multiplication at once. Booth's algorithm takes advantage of the fact that an adder-subtractor is nearly as fast and small as a simple adder.
If successive bits of the multiplier are the same, the addition or subtraction can be skipped in the Booth multiplier. This reduces the delay and the number of adders, and complexity also falls because the adder circuit is smaller. The power consumption, however, is increased, which is one of the drawbacks of this type of multiplier. Booth multiplication is a powerful algorithm for signed multiplication: it handles positive and negative numbers uniformly and reduces the number of multiplicand multiples.
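The skipping behaviour described above can be seen in a small Python model of radix-2 Booth multiplication. It is a sketch under assumptions, not the paper's VHDL: `booth_multiply` is a hypothetical name, the multiplier is assumed to fit in n-bit two's complement, and Python's unbounded integers stand in for the sign-extended accumulator.

```python
def booth_multiply(multiplicand, multiplier, n=16):
    """Radix-2 Booth multiplication of signed integers.

    The multiplier bits are scanned from LSB to MSB together with the bit
    to their right (an implicit 0 below the LSB).
    """
    y = multiplier & ((1 << n) - 1)   # n-bit two's complement pattern
    product = 0
    prev = 0                          # implicit bit below the LSB
    for i in range(n):
        cur = (y >> i) & 1
        if cur == 1 and prev == 0:    # start of a run of ones: subtract
            product -= multiplicand << i
        elif cur == 0 and prev == 1:  # end of a run of ones: add
            product += multiplicand << i
        # cur == prev: equal successive bits, no add or subtract needed
        prev = cur
    return product


if __name__ == "__main__":
    print(booth_multiply(7, -3, n=8))
    print(booth_multiply(1234, 567))
```

With the multiplier -3 in 8 bits (11111101), only three add or subtract steps are performed; the long run of ones in the upper bits is skipped.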
For the standard add-shift operation, each multiplier bit generates one multiple of the multiplicand to be added to the partial product. A large number of multiplicand multiples have to be added when the multiplier is large. Performance improves if there is a way to reduce the number of additions.
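As a baseline for the add-shift operation just described, the following Python sketch multiplies two unsigned integers and counts how many nonzero multiples of the multiplicand are accumulated. The name `shift_add_multiply` and the returned addition count are illustrative assumptions; a hardware array would generate one partial-product row per multiplier bit regardless of its value.

```python
def shift_add_multiply(multiplicand, multiplier):
    """Unsigned add-shift multiplication.

    Returns (product, additions), where additions counts the nonzero
    multiples of the multiplicand that were accumulated.
    """
    product = 0
    additions = 0
    i = 0
    while multiplier >> i:            # stop once no higher bits remain
        if (multiplier >> i) & 1:     # this bit contributes a multiple
            product += multiplicand << i
            additions += 1
        i += 1
    return product, additions


if __name__ == "__main__":
    print(shift_add_multiply(13, 11))
    print(shift_add_multiply(3, 0xFFFF))
```

A multiplier of all ones, such as 0xFFFF, needs 16 additions here, which is the kind of case that recoding schemes aim to improve.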
Modified Booth Multiplier
The modified Booth multiplier consists of three steps: 1. generate the partial products; 2. add the generated partial products until only the last two rows remain; 3. compute the final product by adding the last two rows. It halves the number of partial products in the first step. It is regarded as the most efficient Booth encoding and decoding scheme.
The modified Booth algorithm is mainly used for high-speed multiplier circuits. The computation time of the modified Booth multiplier is proportional to the logarithm of the word length of the operands. It halves the number of partial products. Overlapping groups are used so that three bits are examined at a time. The grouping starts from the least significant bit (LSB): the first block uses only two bits of the multiplier, and a zero is assumed as the third bit. The algorithm generates an irregular partial product array.
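The overlapping three-bit grouping described above can be sketched in Python as radix-4 Booth recoding. This is an illustrative model under assumptions: the names `booth_radix4_digits` and `modified_booth_multiply` are hypothetical, n is assumed to be even, and the multiplier is assumed to fit in n-bit two's complement.

```python
def booth_radix4_digits(multiplier, n=16):
    """Recode an n-bit two's complement multiplier into n/2 digits in
    {-2, -1, 0, 1, 2}, least significant digit first."""
    y = (multiplier & ((1 << n) - 1)) << 1   # append the assumed zero bit
    digits = []
    for i in range(0, n, 2):
        group = (y >> i) & 0b111             # three overlapping bits
        high, mid, low = (group >> 2) & 1, (group >> 1) & 1, group & 1
        digits.append(-2 * high + mid + low)
    return digits


def modified_booth_multiply(multiplicand, multiplier, n=16):
    """Sum one partial product per radix-4 digit (n/2 rows in total)."""
    product = 0
    for k, digit in enumerate(booth_radix4_digits(multiplier, n)):
        product += (digit * multiplicand) << (2 * k)
    return product


if __name__ == "__main__":
    print(booth_radix4_digits(-3, n=8))
    print(modified_booth_multiply(1234, -567))
```

For a 16-bit multiplier the recoding yields 8 digits, so only 8 partial products are summed instead of 16.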
ISSN (Online): 2319-7064
Silicon area is saved, and speed increases because the number of stages is halved. The speed of a multiplier depends mainly on how the partial products are generated and how fast they can be added.
The modified Booth algorithm is used to reduce the number of partial products to be added. It also reduces power consumption significantly, and area and delay are reduced at the same time.
Fig. 2 shows an SSM that takes an m-bit segment from one of two possible bit positions of an n-bit operand. The accuracy of the SSM depends on m; here m = n/2. Two separate processes handle the MSB and LSB segments. Two 16-bit multiplexers are used on the input side together with OR gates, and their outputs are given to the multiplier and the half adder. The adder is chosen according to the requirements of the application and the multiplier technique. The Booth and modified Booth multipliers reduce the number of adders, so their hardware complexity is lower than that of the array multiplier. The multiplier and half adder outputs are then given to the 3-to-1 multiplexer. The output of this multiplexer is 32 bits wide, while each input is 16 bits wide and covers the MSB and LSB segments. With the truncation method the LSB segment is taken into account, which increases the accuracy.
The architecture can also support three possible starting bit positions for picking an m-bit segment where m = n/2. In that case the two 2-to-1 multiplexers at the input stage and the 3-to-1 multiplexer at the output stage are replaced with 3-to-1 and 5-to-1 multiplexers, respectively. This requires only minor changes to the logic functions that generate the multiplexer control signals.
Operation of the Multipliers
The computational accuracy and energy consumption of the various multipliers are evaluated. VHDL code is used to verify the output, and power and delay are measured using the Xilinx software.
For the computational accuracy evaluation, four sets of 16-b operand pairs are used: 1) pairs drawn at random from all possible pairs of 16-b values; 2) an audio application, noise cancellation; 3) an image application, 2-D optical tomography; 4) a recognition application, isolated spoken digit recognition.
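The random-operand part of such an evaluation can be sketched in Python. Everything here is an assumption made for illustration: the accuracy metric (one minus the relative error, averaged over the pairs), the sample size, the seed, and the stand-in `truncating_multiply`, which keeps only the leading m bits of each operand. It does not reproduce the paper's test bench or its application data sets.

```python
import random


def truncating_multiply(a, b, m=8):
    """Hypothetical stand-in approximate multiplier: keep the m bits that
    start at each operand's leading one and truncate the rest."""
    shift_a = max(a.bit_length() - m, 0)
    shift_b = max(b.bit_length() - m, 0)
    return ((a >> shift_a) * (b >> shift_b)) << (shift_a + shift_b)


def mean_accuracy(approx_mul, pairs):
    """Average of 1 - |exact - approx| / exact over pairs whose exact
    product is nonzero (an assumed definition of accuracy)."""
    scores = [1 - abs(a * b - approx_mul(a, b)) / (a * b)
              for a, b in pairs if a * b]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    pairs = [(rng.randrange(1, 1 << 16), rng.randrange(1, 1 << 16))
             for _ in range(10_000)]
    print(f"mean accuracy: {mean_accuracy(truncating_multiply, pairs):.4f}")
```

Passing a different multiplier function to `mean_accuracy` allows several approximate designs to be compared on the same operand pairs.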
The Xilinx software is used to evaluate delay and power. Two different multipliers are used, the Booth multiplier and the modified Booth multiplier. The Booth multiplier is superior in speed, power and area. The modified Booth multiplier sacrifices speed for power: it reduces power, while its delay is increased compared with the Booth multiplier.
The area is also reduced because the adder circuit in the architecture is smaller. Both multipliers reduce the number of adder operations, and their complexity is reduced as well. Their performance is then compared with that of the array multiplier. The array multiplier reduces power, while its speed and complexity are increased. The loss of accuracy is small: the array multiplier gives 99% computational accuracy with negligible noise in the image.
Result Analysis
The delay, power and area of the array multiplier, the Booth multiplier and the modified Booth multiplier are analysed using ModelSim. Compared with the other two multipliers, the modified Booth multiplier reduces area, power and delay. Table 1 shows the comparison of the three multipliers based on power and delay. The power consumption is reduced. This is a 2-input multiplier. Its output uses either the MSB segment or the LSB segment only. Because of this the accuracy is increased.
Licensed Under Creative Commons Attribution CC BY
Conclusion
In this paper the array multiplier, the Booth multiplier and the modified Booth multiplier were compared on the basis of delay and power. The array multiplier has higher power and higher delay. The Booth multiplier reduces both delay and power. The modified Booth multiplier reduces power and delay further compared with the other two multipliers. Speed is the major concern in DSP applications. Of the three multipliers, the modified Booth multiplier gives the lowest delay and power.
