Compressors are mostly used in multipliers to reduce partial products in a parallel manner. First, this paper draws a comparison between the conventional m:2 and m:3 compressors. Second, a new hybrid 16-bit×16-bit multiplier is proposed with the aim of benefiting from both kinds of compressors. The new design decreases the number of carry signals by employing m:3 compressors in the first stage. It also accelerates partial product reduction by using m:2 compressors in the following stages. The second and third phases of multiplication are considered together in this paper. Synthesizable structural VHDL code is used to simulate and implement the different architectures. Our investigations demonstrate that the new multiplier is the fastest one, with reasonable power consumption and area.
I. INTRODUCTION
As the feature size of transistors keeps shrinking, more and more complicated signal processing systems are being implemented on a single nano-scale chip. In today's IC design industry, interconnections have become a serious challenge because they occupy a large amount of the chip area and restrict the placement and routing of logic elements [1]. As the number of connections inside a chip grows, they also consume a lot of power and cause parasitic effects. Wire capacitance can contribute up to 70% of the total chip capacitance [2], [3]. On the other hand, faster logic elements and arithmetic units have always been in high demand.
Multiplication is considered a rather complicated and time-consuming arithmetic operation. It is widely used in DSP algorithms such as filtering and Fourier transforms. It is also a fundamental computational unit in microprocessors, embedded systems, and crypto processors [4]-[7].
A multiplier architecture is divisible into three phases [8]: the partial product generation phase, the partial product reduction phase, and the final addition phase. First, partial products are produced by multiplying each bit of the multiplicand with each bit of the multiplier. This phase results in a huge number of partial products in different bit-weight positions. As a result, the partial product reduction is the most critical and time-consuming phase. All of the partial products must be accumulated up to the point where there are only two remaining partial products in each column. The final phase is the addition of these remaining partial products.
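The first phase can be illustrated with a minimal Python sketch. It assumes unsigned operands and a plain AND array (no Booth recoding), which the text does not state explicitly; the function names are hypothetical and chosen only for this illustration. It groups the partial products by bit-weight column and checks that their weighted sum equals the product.

```python
def partial_product_columns(a: int, b: int, n: int = 16) -> list[list[int]]:
    """Group the n*n AND-array partial products by bit-weight column.

    Assumption: unsigned n-bit operands, one AND gate per bit pair.
    """
    columns: list[list[int]] = [[] for _ in range(2 * n - 1)]
    for i in range(n):            # bit i of the multiplicand
        for j in range(n):        # bit j of the multiplier
            bit = ((a >> i) & 1) & ((b >> j) & 1)
            columns[i + j].append(bit)   # weight 2**(i + j)
    return columns


def column_heights(n: int = 16) -> list[int]:
    """Number of partial products in each column before reduction."""
    return [len(col) for col in partial_product_columns(0, 0, n)]


def weighted_sum(columns: list[list[int]]) -> int:
    """Add every partial product with its column weight."""
    return sum(sum(col) << k for k, col in enumerate(columns))


if __name__ == "__main__":
    heights = column_heights(16)
    print(len(heights), max(heights), sum(heights))
    cols = partial_product_columns(0xBEEF, 0x1234)
    print(weighted_sum(cols) == 0xBEEF * 0x1234)
```

For 16-bit operands this gives 31 columns (0 to 30) holding 256 partial products in total, with the tallest column in the middle. The reduction phase has to bring every column down to a height of two.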
A compressor is simply an adder circuit. It takes a number of equally weighted bits, adds them, and produces some sum signals. Compressors are commonly used to reduce a large number of inputs to a smaller number in a parallel manner. Their main application is within a multiplier, where a huge number of partial products have to be summed concurrently. The inner structure of compressors avoids carry propagation: either there are no carry signals at all, or the carries arrive at the same time as the internal values.
There are several different types of compressors. In general, an m:n compressor takes m input variables and returns n outputs. Carry signals are needed if the sum of the input variables (ΣInputs) is greater than the sum of the outputs (ΣOutputs). The m:2 compressors are the best known because of their high bit-rate compression. However, their efficiency must be investigated inside a multiplier, not in isolation. On the other hand, it is shown in [9] how m:3 compressors might accelerate the partial product reduction phase. Nevertheless, the third phase of multiplication is not taken into consideration in [9]. This paper takes the second and third phases into account, and presents a new hybrid multiplier architecture that uses both m:2 and m:3 compressors. It combines the advantages of both kinds of compressors.
The rest of the paper is organized as follows: Section II includes a brief review of the conventional m:2 and m:3 compressors. Section III shows how compressors are connected together to form a 16-bit×16-bit multiplier. The proposed architecture is also presented in this section. Implementation results and comparisons are included in Section IV. Finally, Section V concludes the paper.
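As an illustration of how a compressor avoids carry propagation, the following sketch models the textbook 4:2 compressor built from two Full Adders. This is an assumed, commonly used structure and not necessarily the one used in this paper; the names `full_adder`, `compressor_4_2`, and `check_all` are hypothetical.

```python
from itertools import product


def full_adder(x: int, y: int, z: int) -> tuple[int, int]:
    """Return (sum, carry) of three equally weighted bits."""
    s = x ^ y ^ z
    c = (x & y) | (x & z) | (y & z)
    return s, c


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int):
    """Textbook 4:2 compressor made of two cascaded Full Adders.

    Returns (s, carry, cout): s has the input weight, while carry and
    cout both have twice that weight. cout is computed from x1..x3
    only, so it never waits for cin.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def check_all() -> bool:
    """Exhaustively verify x1+x2+x3+x4+cin == s + 2*(carry + cout)."""
    for x1, x2, x3, x4, cin in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(x1, x2, x3, x4, cin)
        if x1 + x2 + x3 + x4 + cin != s + 2 * (carry + cout):
            return False
    return True


if __name__ == "__main__":
    print(check_all())
```

Because `cout` does not depend on `cin`, a row of such cells passes carries sideways by only one position, so no carry ripples along the row.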
II. THE CONVENTIONAL M:2 AND M:3 COMPRESSORS
Compressors are widely used in the second phase of a multiplier to accumulate partial products in a concurrent manner. Their inner structure avoids carry propagation. The conventional m:2 and m:3 compressors are briefly reviewed in this section.
The overall structure of an m:2 compressor is shown in Fig. 1, where the weights of the partial products are indicated by different colors. It takes m equally weighted input signals and returns two main outputs (Sum1 and Sum2). In addition, the m:2 compressor requires p input and output carries.
Fig. 1. Overall structure of an m:2 compressor (inputs: partial products and input carries; outputs: main outputs and output carries).
The overall structure of an m:3 compressor is shown in Fig. 2. It produces one more main output signal (Sum3) than the m:2 compressor does. This additional output provides a wider range of output values. Therefore, the m:3 compressor requires fewer carry signals than the m:2 compressor does. Generally, it needs […]
Fig. 2. Overall structure of an m:3 compressor (inputs: partial products and input carries; outputs: main outputs and output carries).
Tables I and II show how many Half Adders (HA) and Full Adders (FA) are required to form the conventional compressors. The delays of the Half and Full Adders are denoted by τ_HA and τ_FA, respectively. We assume that HAs are almost twice as fast as FAs (τ_FA ≈ 2τ_HA).
To estimate how rapidly an m:n compressor accumulates partial products, its bit-rate compression is quantified by Equation (3) [10], where the delay parameter is normalized by τ_HA. The m:2 compressors reduce partial products faster (Tables II and III), at the price of more hardware components and interconnections. Note that although the m:2 compressors have a higher bit-rate compression, the m:3 compressors are internally faster because of their shorter critical path. Therefore, their speed has to be investigated within a practical multiplier block. In the following sections, their impact on a 16-bit×16-bit multiplier is examined.
III. MULTIPLIERS BY M:2 AND M:3 COMPRESSORS
A 16-bit16-bit multiplier is exemplified in this paper to compare m:2 and m:3 compressors. The first multiplier is the one with only m:2 compressors (Fig. 5 ). The second one is only composed of m:3 compressors (Fig. 6 ). The second phase of both multipliers is made of three stages, where the unattached input carries are connected to the ground (GND). There are a large amount of interconnections within the first stage of the first multiplier ( Fig. 5 ). On the other hand, there
International Journal of Information and Electronics Engineering, Vol. 6, No. 2, March 2016 are more remaining partial products in the second stage of the multiplier by m:3 compressors (Fig. 6 ). The m:2 compressors produce fewer partial products in comparison with the m:3 ones. The proposed 16-bit16-bit multiplier is depicted in Fig. 7 . It is a hybrid architecture, which is made of both kinds of m:2 and m:3 compressors. The first stage is built by the m:3 compressors in order to reduce carry signals and interconnections subsequently. Then, the m:2 compressors are exploited in the second stage to accelerate partial product reduction. As a result, the proposed design benefits from the strong points of both m:2 and m:3 compressors.
Output carries are always connected to the following compressors, situated in the next columns (Fig. 7). Although there are three partial products in the 21st column of the second stage, a 4:2 compressor is inserted in this column. The reason is to avoid four partial products in the next stage. If a Full Adder had been placed in that position, there would have been four partial products in the same column of the third stage. The 4:2 compressor absorbs the previous carry signal and prevents it from being passed to the next stage. As a result, the second phase is completed in three stages (Fig. 7), the same as in the other two designs (Fig. 5 and Fig. 6).
Furthermore, the third phase of multiplication, the final addition, directly affects the overall performance. This is because the third phase can partly cancel out the parallelism of the second phase. In spite of improved techniques such as the Carry-Lookahead Adder (CLA) and the Carry-Skip Adder (CSA), carry propagation cannot be entirely eliminated in the final phase. Therefore, the third phase is as important as the second one, and it cannot be ignored. To consider the worst-case scenario, a simple ripple adder is used in this paper for the final addition. Moreover, to reveal the importance of the third phase, simulations are performed twice: once with the second phase alone, and once with the final ripple adder included. Implementation results are given in the next section.
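To make the worst case of a ripple adder concrete, the sketch below adds two rows bit by bit through a chain of Full Adders and records the longest run of consecutive positions that produce a carry. It is a behavioral illustration only, not the VHDL used in this paper; `ripple_add` and its `width` parameter are hypothetical names, and the run length is a simple proxy for the carry chain rather than a timing model.

```python
def ripple_add(x: int, y: int, width: int) -> tuple[int, int]:
    """Add two unsigned rows with a chain of Full Adders.

    Returns (result, longest_run), where longest_run is the longest
    run of consecutive bit positions whose carry-out is 1.
    """
    carry = 0
    result = 0
    run = 0
    longest_run = 0
    for k in range(width):
        a = (x >> k) & 1
        b = (y >> k) & 1
        s = a ^ b ^ carry                          # Full Adder sum
        carry = (a & b) | (a & carry) | (b & carry)  # Full Adder carry
        result |= s << k
        run = run + 1 if carry else 0
        longest_run = max(longest_run, run)
    result |= carry << width                       # final carry-out bit
    return result, longest_run


if __name__ == "__main__":
    # All-ones plus one: the carry passes through every position.
    print(ripple_add(2**32 - 1, 1, 32))
    # No carries at all.
    print(ripple_add(5, 2, 4))
```

When one row is all ones and the other is one, the carry travels through every Full Adder in the chain, which is the case the ripple adder is chosen to expose.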
It is also worth mentioning that although the m:2 compressors have a higher bit-rate compression, they do not necessarily result in a higher speed for the entire multiplier. The reason is that although an m:2 compressor (5 ≤ m ≤ 10) produces fewer partial products, it has a longer critical path than the equivalent m:3 compressor (5 ≤ m ≤ 10). For example, τ_10:2 = τ_10:3 + τ_FA (Fig. 3(g) and Fig. 4(g)). Thus, the speed of the whole multiplier depends on the critical path from the first stage to the last Full Adder of the final ripple adder.
IV. IMPLEMENTATION RESULTS
All of the multipliers are implemented in structural VHDL code and synthesized with the Synopsys synthesis tool without performing any specific optimization. The synthesis process is carried out with a 90 nm CMOS technology and a 1 V power supply. As mentioned earlier, the initial simulation results do not include the ripple adder (Table III). In this case, the proposed multiplier is the slowest one. However, when the final ripple adder is also included (Table IV), the new hybrid architecture operates 0.22% and 5.12% faster than the first and second multipliers, respectively. It also consumes 7.73% less power than the multiplier built from m:2 compressors, due to the elimination of carry signals in the first stage. Finally, the new design occupies 1.2% less area than the multiplier built from m:3 compressors.
V. CONCLUSION
The proposed hybrid 16-bit16-bit multiplier is an excellent trade-off between the ones which use only m:2 or m:3 compressors. While it is the fastest design, it consumes less power than one of the designs, and it occupies less area than the other one. The usage of m:3 compressors leads to fewer interconnections. On the other hand, m:2 compressors reduce partial products quicker. Their combination is considered in the new design. Another important conclusion is that although the second phase is known to be the most critical one, the correct analysis must include the second and third phases together.
