The inner product of two vectors is among the most frequently used mathematical operations in digital computation. The design style of an inner product processor is therefore critical to performance, as is that of its basic building block, the 3-2 compressor. In this work, we present improved C²PL-based 3-2 compressors that can be used to build a fast inner product processor. Their features include a short delay minimized through HSPICE optimization, a lower transistor count, and high fan-out.
INTRODUCTION
In digital computation, the inner product of two vectors is among the most frequently used mathematical operations [1]. If the dimension of the vectors is large, the carry propagation of the inner product is likely to become the critical delay. Many high-speed logic design styles have been proposed to reduce this propagation delay, but each has its own difficulties. For example, domino logic [2] cannot be non-inverting; NORA [3] has a charge-sharing problem; all-N-logic [4] and robust single-phase clocking [5] cannot operate correctly under clocks with short rise or fall times, which makes them difficult to integrate with other parts of a logic design; single-phase logic [6] and Zipper CMOS [7] contain slow P-logic blocks. Complementary pass-transistor logic (CPL), proposed by Yano et al. [8], is twice as fast as conventional CMOS but, like conventional CMOS, requires more silicon area because of the mixed interconnection. Moreover, when CPL is implemented, the noise margin and speed degradation caused by the mismatch between the input signal level and the logic threshold voltage of the CMOS driver need to be taken into consideration. Although Zhang et al. proposed C²PL (complex CPL) in Ref. [9] and demonstrated that it fixes the problems of CPL, several physical design factors were not fully considered or implemented. First, the NMOS transistors used for the pass logic cannot be of minimum size. Second, the sizes of the driving inverters have to be properly tuned. Third, the original design in Ref. [9] not only has poor fan-in and fan-out capability, but also produces very asymmetric rise and fall delays, which are very likely to cause glitch hazards and unwanted power consumption. In this paper, we propose improved 3-2 compressors that resolve the problems mentioned above. HSPICE simulation results are presented to verify our observations.
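As a rough illustration of why 3-2 compressors matter for an inner product, the following Python sketch sums the products with bitwise 3-2 compression (carry-save addition), so that only one carry-propagating addition remains at the end. It is a behavioural model under stated assumptions, not the processor of Ref. [1]: the function names are hypothetical, the operands are non-negative integers, and the products themselves are formed with ordinary multiplication rather than from partial-product bits.

```python
def compress_3_2(x, y, z):
    """Bitwise 3-2 compression of three non-negative integers.

    Returns (s, c) with x + y + z == s + c. No carry ripples between
    bit positions: each position is an independent 3-2 compressor.
    """
    s = x ^ y ^ z                                # per-bit sum
    c = ((x & y) | (x & z) | (y & z)) << 1       # per-bit carry, shifted left
    return s, c


def carry_save_sum(terms):
    """Reduce a list of operands to two with 3-2 compressors, then add once."""
    terms = list(terms)
    while len(terms) > 2:
        reduced = []
        # Compress every complete group of three operands into two.
        for i in range(0, len(terms) - 2, 3):
            reduced.extend(compress_3_2(terms[i], terms[i + 1], terms[i + 2]))
        # Operands that did not fill a group pass through unchanged.
        leftover = len(terms) % 3
        if leftover:
            reduced.extend(terms[-leftover:])
        terms = reduced
    # The only carry-propagating addition in the whole reduction.
    return sum(terms)


def inner_product(a, b):
    """Inner product of two equal-length vectors of non-negative integers."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return carry_save_sum(x * y for x, y in zip(a, b))
```

For example, `inner_product([3, 5, 7, 2], [4, 6, 1, 9])` reduces four products to two operands in two compression rounds before the final addition.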
FRAMEWORK OF IMPROVED COMPRESSORS

Basic Compressor Building Block Design
where F denotes (A ⊕ Cin). As shown in Fig. 1, the logic structure of a typical 3-2 compressor can be split into two logic layers. One of the three inputs, i.e., B (B′), is not required in the first logic layer. These unequal delays in the 3-2 compressor allow us to reduce the total delay of the inner product computation by arranging the input signals to the 3-2 compressors inside the compressor tree in a proper order.
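The two-layer split and its timing consequence can be sketched in Python. The second-layer equations used here (Sum = F ⊕ B, and Cout equal to B when F = 1 and to A otherwise) are the textbook full-adder decomposition and are an assumption, not a transcription of Fig. 1. The per-layer delays `d1` and `d2` are hypothetical unit values, not HSPICE results.

```python
from itertools import product


def compressor_3_2(a, b, cin):
    """Two-layer 3-2 compressor on single bits; returns (sum, cout)."""
    f = a ^ cin               # layer 1: B is not needed here
    s = f ^ b                 # layer 2: sum output
    cout = b if f else a      # layer 2: F selects B, otherwise A (== Cin)
    return s, cout


def check_truth_table():
    """Confirm the decomposition against ordinary addition for all inputs."""
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = compressor_3_2(a, b, cin)
        assert 2 * cout + s == a + b + cin
    return True


def output_arrival(t_a, t_b, t_cin, d1=1.0, d2=1.0):
    """Arrival time of the outputs given input arrival times.

    A and Cin pass through both layers; B enters at the second layer only.
    d1 and d2 are assumed (hypothetical) layer delays.
    """
    return max(max(t_a, t_cin) + d1, t_b) + d2
```

With two inputs arriving at time 0 and one at time 1, routing the late signal to B gives `output_arrival(0, 1, 0)`, which is 2.0, whereas routing it to A gives `output_arrival(1, 0, 0)`, which is 3.0. This is the ordering freedom that a compressor tree can exploit.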
Prior 3-2 Compressor Design
Although a 3-2 compressor can be realized by a full adder, and Zhang et al. [9] proposed a C²PL design for 3-2 and 7-3 compressors, the design issues addressed above were ignored in their work. Figures 2 and 3 show the schematic diagrams of the two types of 3-2 compressors based on complex CPL (C²PL) proposed in Ref. [9]. To test whether these two 3-2 compressors have enough fan-out, we add a 0.1 pF capacitor at the output side and perform HSPICE simulations. The results are shown in Figs. 4 and 5.
It is obvious that Zhang's original design does not have enough driving capability. Hence, it is not suitable for cascading or for constructing an inner product processor. However, the original design still has its advantages: first, these two 3-2 compressors are functionally correct; second, their transistor count is lower than that of the traditional full adder; third, these two 3-2 compressors do not contain the two logic layers shown in Fig. 1.
Improved C 2 PL 3-2 Compressors
We therefore improve the original design to achieve three goals:
(1) Minimize the delay time to make a single 3-2 compressor as fast as possible.
(2) Increase the fan-out.
(3) Reduce the transistor count.
Minimize the Delay Time
Both 3-2 compressors have three inputs: "A", "B", and "Cin" (carry in). However, all three inputs in Figs. 2 and 3 need special buffers because each connects to the source of a transistor. It is necessary to tune the size of each buffer, particularly that of the "Cin" pin, because each pin needs to act as a nearly ideal power source to drive its own load. In addition, to analyze the relation between input and output more precisely, we choose the input vectors carefully to ensure that the output responds to the switching of only one input. Figure 6 shows an example of how we measure the delay. We use HSPICE to measure the delay between each input and each output. The delay measurements are tabulated in Table I.
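One way to choose such input vectors is to enumerate, for each input and output pair, the settings of the other two inputs under which toggling the chosen input actually changes the output. The Python sketch below does this on a behavioural full-adder model. It is an illustration only: the function names are hypothetical, and it does not reproduce the specific vectors of Fig. 6 or Table I.

```python
from itertools import product

INPUTS = ("A", "B", "Cin")
OUTPUTS = ("Sum", "Cout")


def full_adder(a, b, cin):
    """Behavioural 3-2 compressor: returns (sum, cout) for single-bit inputs."""
    total = a + b + cin
    return total & 1, total >> 1


def sensitizing_vectors(toggled, output):
    """List the held-input settings under which `output` follows `toggled`.

    Each entry is a dict giving the fixed values of the two inputs that
    are not switched during the delay measurement.
    """
    i = INPUTS.index(toggled)
    j = OUTPUTS.index(output)
    result = []
    for held in product((0, 1), repeat=2):
        low = list(held)
        low.insert(i, 0)       # toggled input at logic 0
        high = list(held)
        high.insert(i, 1)      # toggled input at logic 1
        if full_adder(*low)[j] != full_adder(*high)[j]:
            result.append({name: value
                           for name, value in zip(INPUTS, low)
                           if name != toggled})
    return result
```

In this model `sensitizing_vectors("A", "Cout")` returns only the two settings in which B and Cin differ, since Cout is otherwise fixed by the other two inputs, while every setting sensitizes Sum because it is the XOR of all three inputs.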
Increase the Fan-out
The analysis above confirms that Zhang's original 3-2 compressor design has poor fan-out.
To overcome this shortcoming, the output buffers are re-tuned to give the compressor an enhanced fan-out capability. The resulting sizes are given in Table II, where the columns "invsum" and "invcout" show our results.
Reduce Transistor Count
The improved compressor has eight fewer transistors than the corresponding prior compressor. Consequently, the area of the 3-2 compressors is reduced.
PHYSICAL IMPLEMENTATION
We use the Taiwan Semiconductor Manufacturing Company (TSMC) 0.6 µm 1P3M technology to implement the improved 3-2 compressors. The schematic diagrams of the two improved 3-2 compressors are shown in Figs. 7 and 8, respectively, and Table II lists their transistor sizes. Figures 9 and 10 show the simulation results of the new designs. These results confirm that the fan-out is strengthened under a 0.1 pF load and that the delay is markedly reduced.
CONCLUSION
In this paper, two improved designs of 3-2 compressors are presented. They are proposed to overcome several problems in Zhang's work [9]. Our simulation results show that the improved 3-2 compressors are capable of driving large loads and that the transistor count is reduced. The improved 3-2 compressors are therefore solid cells for constructing an inner product processor [1].
