Abstract. The 3-input AND/XOR gate is the basic complex gate of Reed-Muller logic, and low energy consumption is important for implementing Reed-Muller logic circuits. To address the drawbacks of published gate-level and transistor-level 3-input AND/XOR gate designs in power and power-delay product (PDP), a low-energy 3-input AND/XOR gate is proposed. It employs multi-rail and hybrid-CMOS techniques to improve speed and shorten the signal transmission path. In a 55 nm CMOS process, post-layout simulations at different process corners are carried out with HSPICE and compared with the published circuits. The simulation results show that the proposed circuit has advantages over the published designs. For typical process corners, the improvements of the proposed circuit in power, delay and PDP are up to 27.21%, 19.23% and 35.39%, respectively.
Introduction
With the rapid development of design technology and the shrinking of CMOS processes, design concerns have expanded to comprehensive requirements on performance, area and power. Currently, almost all digital circuits are implemented in traditional Boolean (TB) logic based on AND, OR and NOT. Digital circuits can also be implemented in Reed-Muller (RM) logic based on AND/XOR. Studies show that, statistically, half of all circuits achieve better performance when implemented in RM logic than in TB logic. Moreover, compared with TB logic, RM logic has the following advantages. First, RM logic circuits are easy to map onto field-programmable gate arrays (FPGAs) and have good testability properties [1]-[2], which provides an effective way to address the "verification" problem in integrated circuit design. Second, RM logic implements some logic functions, such as arithmetic components and parity functions, much more simply [3], which not only yields smaller area but also offers potential advantages in power and speed. However, RM logic has not been widely used because of the lack of corresponding complex gate circuits and EDA tools. In recent years, several gate-level and transistor-level 3-input AND/XOR gates have been published, but they remain far from satisfactory in terms of power consumption, delay and power-delay product (PDP).
In this paper, a novel transistor-level 3-input AND/XOR gate design is proposed. In a 55 nm CMOS process, post-layout simulations of the circuit at different process corners are carried out with HSPICE and compared with the published circuits. The simulation results show that the proposed circuit has advantages in power consumption and delay.
Published 3-Input AND/XOR gates
The 3-input AND/XOR gate is the basic complex gate of RM logic and implements the following function:
$$F = (A \cdot B) \oplus C$$
XOR and AND, denoted by $\oplus$ and $\cdot$ respectively, are binary operations. According to its structure, the 3-input AND/XOR gate can be designed in two ways: at gate level and at transistor level.
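As a quick reference, the logic function of the gate can be tabulated with a few lines of Python. This is a minimal sketch that assumes the gate computes (A AND B) XOR C, which agrees with the input-by-input behaviour described later for the proposed circuit. The output name `F` and the function names are illustrative choices, not taken from the paper.

```python
from itertools import product


def and_xor3(a, b, c):
    """3-input AND/XOR on single bits: (a AND b) XOR c (assumed function)."""
    return (a & b) ^ c


def truth_table():
    """Map every input combination (A, B, C) to the output F."""
    return {(a, b, c): and_xor3(a, b, c) for a, b, c in product((0, 1), repeat=3)}


# Print the eight rows of the truth table.
for (a, b, c), f in truth_table().items():
    print(f"A={a} B={b} C={c} -> F={f}")
```

The output is 1 for exactly four of the eight input combinations, as expected for a function whose last stage is an XOR.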
A gate-level 3-input AND/XOR gate is implemented by cascading one AND (or NAND) gate and one XOR (or XNOR) gate. In recent years, researchers have paid close attention to XOR-XNOR circuits because they are basic building blocks of various circuits, especially arithmetic circuits, parity checkers and error-detecting circuits.
Depending on the XOR gate used, various AND/XOR gate circuits have been proposed, as shown in Fig. 1 (a)-(c). In Fig. 1 (a), a 3-input AND/XOR gate is implemented by cascading one NAND gate and one XOR gate in complementary CMOS structure [4] [5], which consists of a pull-up PMOS network and a pull-down NMOS network. As a result, the gate circuit has a symmetrical structure and operates with full output voltage swing, but it requires a large number of transistors and consumes considerable power. In Fig. 1 (b), a 3-input AND/XOR gate [6] is implemented by cascading one AND gate and one XOR gate [7] in pass-transistor logic (PTL) structure. PTL allows the source terminal of a MOS transistor to be connected to an input line, which reduces the number of transistors and the related parasitic capacitance. However, a PMOS transistor passes a weak "LOW" and an NMOS transistor passes a weak "HIGH" [8]. In addition, a static CMOS inverter is cascaded to improve the driving capability of the PTL circuit. Based on transmission gate (TG) logic, a 3-input AND/XOR gate is implemented by cascading one NAND gate and one XOR gate [9], as shown in Fig. 1 (c). The XOR gate is composed of two transmission gates and three static CMOS inverters. It therefore has a full voltage swing at all nodes for all input patterns and can be operated at a lower supply voltage. Similarly, the inverters enhance the driving capability.
At transistor level, a 3-input AND/XOR gate [10] is designed as shown in Fig. 1 (d). It is composed of one XOR gate, two transmission gates and two static CMOS inverters. The input signal A and its complement control the two transmission gates so that only one TG conducts at any time. This design has a full voltage swing at all nodes for all input patterns and negligible short-circuit power dissipation, which leads to lower power consumption. Furthermore, two static CMOS inverters are cascaded after the circuit to improve the driving capability.
Based on the discussion above, the published AND/XOR gates have different disadvantages. The complementary CMOS 3-input AND/XOR gate requires the largest number of MOS transistors and has high power consumption. The PTL-based and TG-based 3-input AND/XOR gates have long critical paths, and their larger number of internal nodes consumes extra power. The transistor-level 3-input AND/XOR gate also has a long critical path. Therefore, the performance of the AND/XOR gate can still be improved.
Proposed 3-Input AND/XOR gate
Based on hybrid-CMOS techniques, a 16T 3-input AND/XOR gate circuit is proposed, as shown in Fig. 2 (a). The circuit contains two substructures and has a full voltage swing at all nodes for all input combinations.
The basic function of the gate circuit is as follows. When AC=00, P1 and P3 turn on and pass a strong "HIGH" to node Q, which yields a strong "LOW" at the output. When BC=00, P2 and P3 turn on and pass a strong "HIGH" to node Q, which yields a strong "LOW" at the output. When AC=01, N1 and N2 turn on and pass a strong "LOW" to node Q, which yields a strong "HIGH" at the output. For the five input combinations covered by these cases, only substructure 1 works. Because node Q is connected to the power supply or ground, it has a full voltage swing even at a lower supply voltage, and there is no direct path between the power supply and ground, which reduces the power consumption.
When ABC=111, N3, N4 and N5 turn on and pass two weak "HIGH" signals to node Q. To eliminate the threshold voltage loss, a third transmission path consisting of P4 and P5 is added, which also improves performance. When ABC=101, P4, P5, N3 and N5 turn on, and N3 and N5 pass a strong "LOW" to node Q, which yields a strong "HIGH" at the output. When ABC=110, N3 and N4 turn on and pass a strong "LOW" to node Q, which yields a strong "HIGH" at the output. For these three input combinations, only substructure 2 works, and all internal nodes have a full voltage swing. Because of the multiple rails of substructure 2 when ABC=111 or 101, the equivalent resistance of the transmission path is smaller, which speeds up the charging and discharging of the output and thus improves the efficiency of the circuit. The cascaded inverter strengthens the driving capability of the circuit. The proposed design is modular and simple, which simplifies the layout and potentially reduces chip area. The layout of the proposed design is shown in Fig. 2 (b).
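The case-by-case description above can be cross-checked with a small behavioural model. The sketch below is not a circuit simulation: it only encodes which substructure is stated to be active for each input combination and the logic level stated for node Q, then confirms that the inverted node Q equals (A AND B) XOR C. The function names are hypothetical, and the target function is an assumption inferred from the described behaviour.

```python
from itertools import product


def node_q(a, b, c):
    """Return (logic level at node Q, active substructure) for inputs A, B, C."""
    if (a, c) == (0, 0) or (b, c) == (0, 0):
        return 1, 1  # substructure 1 pulls Q HIGH (P1/P3 or P2/P3)
    if (a, c) == (0, 1):
        return 0, 1  # substructure 1 pulls Q LOW (N1, N2)
    if (a, b, c) == (1, 1, 1):
        return 1, 2  # substructure 2 drives Q HIGH (N3-N5, restored via P4/P5)
    if (a, b, c) in ((1, 0, 1), (1, 1, 0)):
        return 0, 2  # substructure 2 pulls Q LOW
    raise ValueError("inputs must be 0 or 1")


def gate_output(a, b, c):
    """The output is node Q inverted by the cascaded inverter."""
    q, _ = node_q(a, b, c)
    return 1 - q


combos = list(product((0, 1), repeat=3))
by_sub1 = [abc for abc in combos if node_q(*abc)[1] == 1]
by_sub2 = [abc for abc in combos if node_q(*abc)[1] == 2]
# Compare against the assumed target function (A AND B) XOR C.
all_match = all(gate_output(a, b, c) == ((a & b) ^ c) for a, b, c in combos)
print(len(by_sub1), len(by_sub2), all_match)
```

In this model, five input combinations fall to substructure 1 and three to substructure 2, matching the counts given in the text.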
Simulation and comparison
The circuit simulation is carried out using HSPICE in a 55 nm CMOS process at a 1.20 V supply voltage. The operating frequency is 100 MHz. The gate area ratio of PMOS to NMOS is about 2:1, with physical W/L sizes of 240 nm/60 nm for PMOS and 120 nm/60 nm for NMOS. To model a realistic environment, each input terminal is driven through a two-stage inverter chain and the output drives four parallel inverters. The simulation test bench used is shown in the accompanying figure.
To compare the performance of the proposed gate with that of the published ones in Fig. 1, all circuits are post-simulated under the same conditions. The average power consumption is measured over 100 operating cycles. The transient performance is characterized by the 50% delay, defined as the time at which the output voltage reaches 50% of its steady-state value. The PDP is used to evaluate the overall performance of the circuit. The improvements in power consumption, delay and PDP are expressed as Power%, Delay% and PDP%, respectively. The comparison of the circuits in Fig. 1 and Fig. 2 is shown in Table 1.
As shown in Table 1, the proposed circuit has the lowest PDP, with a PDP improvement of up to 35.39%. The reason is that circuits such as TG, HU and CMOS have cascaded structures, which increase the capacitance of internal nodes and therefore the power consumption. Moreover, because cascaded structures have longer transmission paths, these circuits also have longer delays. Because the output of the complementary CMOS structure is directly connected to the power supply or ground, it has good driving capability without an additional trailing inverter. Circuit LH has a larger delay because, for some input patterns, the signal must pass from the XOR gate through the transmission gate. Overall, the proposed circuit is the more competitive one.
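The PDP and the improvement percentages discussed above can be illustrated with a short calculation. The paper does not spell out the formula behind Power%, Delay% and PDP%, so the sketch below assumes the usual relative reduction with respect to a published circuit. All numerical values are hypothetical and are not taken from Table 1.

```python
def pdp(power_w, delay_s):
    """Power-delay product in joules: average power (W) times delay (s)."""
    return power_w * delay_s


def improvement_pct(reference, proposed):
    """Relative reduction of `proposed` versus `reference` in percent (assumed definition)."""
    return (reference - proposed) / reference * 100.0


# Hypothetical numbers for illustration only; they are NOT values from Table 1.
ref_power, ref_delay = 2.0e-6, 100e-12  # a published circuit: 2.0 uW, 100 ps
new_power, new_delay = 1.5e-6, 80e-12   # the proposed circuit: 1.5 uW, 80 ps

power_gain = improvement_pct(ref_power, new_power)
delay_gain = improvement_pct(ref_delay, new_delay)
pdp_gain = improvement_pct(pdp(ref_power, ref_delay), pdp(new_power, new_delay))

print(f"Power%: {power_gain:.2f}  Delay%: {delay_gain:.2f}  PDP%: {pdp_gain:.2f}")
```

With these made-up inputs, a 25% power reduction and a 20% delay reduction against the same reference combine to a 40% PDP reduction, since 1 - 0.75 x 0.80 = 0.40. When "up to" figures are quoted, the maxima for power, delay and PDP need not come from the same reference circuit, so they do not have to combine in this way.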
To evaluate these circuits at different operating frequencies, they are post-simulated from 100 MHz to 1400 MHz at a 1.20 V supply voltage. The curves of power and PDP versus operating frequency are shown in Fig. 5(a) and (b). It can be seen that the proposed circuit works reliably over this range and has an advantage in power and PDP.
Simulations are also carried out to check the supply-voltage scaling capability. Fig. 6 shows the curve of PDP versus supply voltage, varied from 1.6 V down to 0.8 V at 100 MHz. It can be seen that the proposed circuit has the best voltage scaling capability in terms of PDP.
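As background for the frequency sweep, a first-order textbook model relates the dynamic power of a CMOS gate to its operating frequency. It is not derived from the simulation data here. The symbols $\alpha$ (switching activity), $C_L$ (switched load capacitance), $P_{\mathrm{avg}}$ (average power) and $t_d$ (the 50% delay) are introduced for illustration only.

```latex
P_{\mathrm{dyn}} \approx \alpha \, C_L \, V_{DD}^{2} \, f,
\qquad
\mathrm{PDP} = P_{\mathrm{avg}} \cdot t_d
```

Under this model, power would be expected to grow roughly linearly with $f$ at a fixed supply voltage, so a gate with less switched internal capacitance should tend to keep its power advantage across the sweep.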
Conclusion
In this paper, a transistor-level 3-input AND/XOR gate is proposed. Multiple transmission paths are employed to eliminate the threshold voltage loss and improve performance. Moreover, to decrease the delay, hybrid-CMOS techniques are used to shorten the signal transmission path. The 16T 3-input AND/XOR gate is post-simulated in a 55 nm CMOS process, and the results are compared with four published circuits in terms of power consumption, delay and PDP. The comparison shows that the proposed AND/XOR gate has the lowest PDP among the published AND/XOR gates, with a PDP improvement of up to 35.39%. Moreover, post-simulations under five different PVT combinations are also carried out and show that the proposed circuit has the lowest PDP.
