INTRODUCTION
One major difficulty encountered when scaling CMOS technology is that the 60mV/decade minimum subthreshold slope (SS) of CMOS devices makes it impossible to lower the threshold voltage without producing unacceptable off-state leakage currents. Consequently, supply voltage cannot be reduced without significantly degrading circuit speed. This results in power density problems for high performance applications requiring nominal supply voltages and in energy inefficiency in low voltage applications. The latter is related to the large delay increases, which raise the energy associated with leakage current so much that any advantage obtained by scaling dynamic power with supply voltage is cancelled out. Intensive research is being conducted into devices with steeper subthreshold slopes (SS<60mV/dec). A smaller SS makes it possible to lower the threshold voltage while keeping leakage current under control, facilitating low voltage operation with acceptable speed and thus generating savings in power and energy.
Tunnel transistors (TFETs) are among the most attractive steep subthreshold slope devices [1]-[4]. Subthreshold swing under 60mV/dec has been experimentally obtained in different material systems. Research on group III-V TFETs has been advancing rapidly in recent years since this type of transistor has higher ON currents than TFETs made from group IV materials (Si or Ge). The limited ON current is, in fact, one of the major uncertainties of these devices. However, projections now exist for ON currents of 1900 μA per micrometer of channel width with 0.4 V supply voltage [5], which would be competitive with high performance MOSFETs. More recently, band-to-band tunnel field-effect transistors based on a two-dimensional transition metal dichalcogenide semiconductor, exhibiting an average subthreshold swing of 31.1mV/dec over four decades of drain current at a supply voltage of 0.1V at room temperature, have been demonstrated [6], which paves the way for improved TFETs. At the circuit level, there are design challenges associated with the distinguishing characteristics of TFETs with respect to MOSFETs, including super-linear onset, ambipolarity, enhanced Miller capacitance effect due to the dominance of gate-to-drain capacitance, and asymmetric conduction, which are being addressed [7]-[11].
Pipelining is a well-known optimization technique to increase operating frequency and throughput. It is a generic technique which breaks a task into a sequence of smaller subtasks (stages), dedicates separate resources to the individual subtasks and operates such that, at any point in time, all the units are carrying out some subtask. That is, data moves through the stages like parts move through a factory assembly line, and all stages operate concurrently. At the circuit level, the pipeline concept is implemented by adding registers that cut combinational paths in order to increase clock frequency.
Pipelining can also be applied to reduce power consumption if it is used not to increase operating frequency but to reduce supply voltage (VDD) while maintaining speed. The amount by which the supply voltage can be lowered for a particular architecture depends on how much speed degrades with supply voltage reduction. That is, the delay versus VDD behavior of the logic gates determines the power savings that can be achieved by applying pipelining techniques. It is well known that the delay versus VDD behavior of tunnel transistor gates differs markedly from that of CMOS gates and so
Fig. 1. Circuit used to evaluate the impact of fan-in on the performance of inverter, NAND2 and NAND3 gates.
of merit, we use the average of these delays. Fig. 2 depicts FO4 delay versus VDD for the three gates using both transistor technologies. It is observed that TFET gates are faster at low supply voltages but exhibit larger delays at the higher VDD values. In fact, it is widely accepted that TFETs are useful at moderate frequencies. At higher frequencies, either no power savings are obtained or the TFET solution does not work at all. Fig. 3 depicts average power for each gate versus frequency. For each frequency, the minimum VDD at which that frequency is achieved has been used. Results correspond to a switching activity equal to 0.1 and assume the gate is operated within a logic circuit with logic depth (LD) equal to 50. The depicted curves illustrate the previous statements. For the inverter and the NAND2, there is a frequency range in which TFET gates are clearly advantageous in terms of power (and also energy), while they consume more above a given frequency. On the other hand, CMOS NAND3 power consumption is always larger than that of its TFET counterpart. Finally, note that CMOS can operate up to a higher frequency. However, the shapes of the CMOS and TFET delay curves (Fig. 2) are quite different.
The latter exhibit, in comparison with the CMOS curves, wider flat regions in which delay increases smoothly when VDD is reduced. This translates into a very steep rise in power above a given frequency. Conversely, for CMOS gates larger speed degradations occur when VDD is slightly reduced from 0.8V. The differences in supply-voltage-speed behavior discussed motivate the analysis carried out in the next section.
III. IMPACT OF LOGIC DEPTH ON GATE PERFORMANCE
Average power versus frequency curves have been obtained at different logic depths (LD=12.5, 25, 50) for each of the gates under characterization. Fig. 4a compares results for LD=25 and LD=50 for the inverter. Note that each point in the curves is associated with a VDD value (the minimum VDD required to work at that frequency). The first point on the left corresponds to VDD=0.2V and the last point on the right to VDD=0.8V. The VDD values of any two consecutive points in one curve are separated by 0.05V. TFET curves start (on the left) at higher frequencies than their CMOS equivalents because of the speed advantage of TFET over CMOS at low supply voltages. TFET architectures can be operated at frequencies below the first point of their curves by lowering the supply voltage below 0.2V, the smallest value in our experiment.
As expected, the frequency range in which TFET is advantageous is larger for LD=25. That is, this frequency region widens when logic depth is reduced. For a given frequency, the power associated with LD=25 is lower than the power associated with LD=50. More interesting is the analysis of how much power is saved by lowering LD from 50 to 25 in each technology. To that end, consider the frequency at which the CMOS-50 and TFET-50 powers are equal (point A in Fig. 4a, FA). It can be observed that, at that frequency, the power of TFET-25 is smaller than the power of CMOS-25. Hence, the power savings achieved are larger in the TFET technology. TFET-25 consumes around 13% of the TFET-50 power (saving 87%), while CMOS-25 consumes around 70% of the CMOS-50 power (saving 30%). This result can be explained on the basis of the delay versus VDD performance discussed in the previous section. Note that, in order to operate at the same frequency, the maximum allowed delay of the inverter in the designs with LD=25 is twice the delay in the designs with LD=50. In the TFET implementation, this means that VDD can be reduced from 0.65V (fourth point from the right in the TFET-50 curve) to around 0.25V (second point from the left in the TFET-50 curve). In the CMOS case, VDD cannot be reduced as much. It can be lowered to a value between 0.55V and 0.6V. The large difference in supply voltage level explains the huge power savings. It can also be observed in Fig. 4a that there is a frequency range in which the power of the TFET-50 architecture is smaller than the power of the CMOS-25 architecture. Note that, assuming LD=50 is the original circuit, the other two LD values can be interpreted as pipelined versions with two stages (LD=25) or four stages (LD=12.5) with ideal pipeline registers. Of course, there are power and delay overheads associated with the actual pipeline registers that should be taken into account.
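The way each point of these curves is obtained (the minimum VDD that meets a target frequency at a given logic depth) can be sketched as follows. This is a minimal illustration rather than the simulation flow used in the experiments: the delay table, the function name `min_vdd` and the target frequency are hypothetical, and the path delay is assumed to be the logic depth times the delay of a single gate, with ideal registers.

```python
# Hypothetical per-gate delay (seconds) versus VDD (volts); illustrative values only,
# not taken from the characterization reported in the text.
GATE_DELAY = {
    0.2: 400e-12, 0.3: 200e-12, 0.4: 120e-12, 0.5: 90e-12,
    0.6: 80e-12, 0.7: 75e-12, 0.8: 72e-12,
}


def min_vdd(delay_by_vdd, logic_depth, target_freq_hz):
    """Return the smallest VDD whose path delay fits in one clock period.

    The path delay is modeled as logic_depth * gate_delay(VDD).
    Returns None if no tabulated VDD reaches the target frequency.
    """
    period = 1.0 / target_freq_hz
    for vdd in sorted(delay_by_vdd):
        if logic_depth * delay_by_vdd[vdd] <= period:
            return vdd
    return None


# Halving the logic depth doubles the delay allowed per gate at the same frequency,
# so a lower supply voltage becomes sufficient.
target = 240e6  # hypothetical target frequency in Hz
vdd_ld50 = min_vdd(GATE_DELAY, 50, target)
vdd_ld25 = min_vdd(GATE_DELAY, 25, target)
print(f"LD=50: VDD={vdd_ld50} V, LD=25: VDD={vdd_ld25} V")
```

How far VDD drops when the logic depth is halved depends entirely on the shape of the delay table, which is the point made above about the flatter TFET delay curves.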
The next section evaluates the impact of applying pipelining and compares results for both technologies.
IV. IMPACT OF PIPELINE ON CIRCUIT PERFORMANCE
The previous discussion assumes ideal pipeline registers. We now estimate the actual power savings when the power overhead of the flip-flops is taken into account. To that end, assume that one flip-flop consumes as much power as 10 gates. This is a conservative estimate, justified on the basis of widely used static implementations of a D flip-flop, which require 8 to 10 gates. Power reduction depends on the particular circuit to which pipelining is applied. In particular, it depends on how many flip-flops are added to support pipelining in relation to the number of gates in the circuit. Let $N_{ff}$ be the number of flip-flops, $P_{avg}$ the average power consumption of one typical logic gate and $N_{gates}$ the number of gates in the original circuit.
Power savings associated with the use of pipelining can be estimated by:
$\Delta P = P_{WP} - P_{P}$ (1)
where $P_{WP}$ stands for the power of the circuit without pipelining and $P_{P}$ is the power when pipelining is applied. They can be evaluated as:
$P_{WP} = N_{gates} \, P_{avg}(V_{DD,WP}), \quad P_{P} = (N_{gates} + 10 N_{ff}) \, P_{avg}(V_{DD,P})$
where $V_{DD,WP}$ ($V_{DD,P}$) is the supply voltage level required to operate the non-pipelined (pipelined) circuit.
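The model above can be written as a short function. This is a sketch under stated assumptions: it reads the model as $P_{WP} = N_{gates} P_{avg}(V_{DD,WP})$ and $P_{P} = (N_{gates} + 10 N_{ff}) P_{avg}(V_{DD,P})$, with each flip-flop counted as 10 gates as stated in the text. The function name, parameter names and the example circuit size are hypothetical.

```python
def pipeline_power_savings(n_gates, n_ff, p_avg_wp, p_avg_p, ff_gate_equiv=10):
    """Return dP = P_WP - P_P for a circuit pipelined with n_ff flip-flops.

    p_avg_wp: average gate power at the supply needed without pipelining.
    p_avg_p:  average gate power at the (lower) supply needed with pipelining.
    ff_gate_equiv: power of one flip-flop expressed in equivalent gates.
    """
    p_wp = n_gates * p_avg_wp
    p_p = (n_gates + ff_gate_equiv * n_ff) * p_avg_p
    return p_wp - p_p


# Hypothetical circuit: 1000 gates and 47 flip-flops, power in arbitrary units,
# with gate power at the pipelined supply equal to one tenth of the original.
savings = pipeline_power_savings(n_gates=1000, n_ff=47, p_avg_wp=1.0, p_avg_p=0.1)
print(f"dP = {savings:.1f} (arbitrary units)")
```

A positive return value means that the reduction in gate power outweighs the power of the added flip-flops.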
The relation between $P_{avg}(V_{DD,P})$ and $P_{avg}(V_{DD,WP})$ was analyzed in the previous section. It depends on the technology, the target frequency and the number of pipeline stages. For the incorporation of one level of pipelining at the frequency at which the power of CMOS and TFET are equal, we obtained:
$P_{avg}(V_{DD,WP}) = 10 \, P_{avg}(V_{DD,P})$
For TFETs: $\Delta P = (9 N_{gates} - 10 N_{ff}) \, P_{avg}(V_{DD,P})$
This corresponds to taking the average value of the power reductions reported in Table III, 90%, which is equivalent to the relationship in the previous equation: average power without pipelining is ten times the power consumption with pipelining. Equivalently, considering average power reductions of 31% for CMOS:
$P_{avg}(V_{DD,WP}) \approx 1.5 \, P_{avg}(V_{DD,P})$
For CMOS: $\Delta P \approx (0.5 N_{gates} - 10 N_{ff}) \, P_{avg}(V_{DD,P})$
In TFET technology there are power savings ($\Delta P > 0$) if $N_{ff} < 0.9 N_{gates}$, that is, if the number of flip-flops is smaller than 90% of the number of gates. In CMOS, the number of added flip-flops for which power advantages are achieved is much more restricted: $N_{ff} < 0.05 N_{gates}$, or fewer than 5% of the number of gates in the circuit. Fig. 5 shows power reductions ($100 \cdot \Delta P / P_{WP}$) for different values of $N_{ff}/N_{gates}$ for both technologies. The differences are evident.
Table (Inverter): Power (μW) and VDD (V) for LD=50, LD=25 and LD=12.5.
Finally, in order to support our previous analysis, we have carried out an experiment involving a simple adder tree. It has been implemented both as a purely combinational circuit (C) (Fig. 6a) and as a two-stage pipeline (P) (Fig. 6b) in each technology. Each adder in those figures is a ripple carry adder (RCA) built from full adders implemented using inverters and NAND gates, as depicted in Fig. 6c. Pipeline registers are realized with D flip-flops. A well-known implementation with 8 gates has been used. A frequency target has been selected, and the minimum VDD value achieving it has been determined for each of the four designs: C-CMOS, C-TFET, P-CMOS and P-TFET. Fig. 7 shows waveforms corresponding to the simulation of the P-TFET version biased at 0.3V. Input transitions have been applied to excite the critical path of the RCA adders. The expected behavior of the output carry of the second level adder for the applied stimulus is an alternation of logic 0 and logic 1 values. Correct operation is observed in the waveforms shown. P-CMOS fails to work at the target frequency at this supply voltage. As in the experiment described in Section III, the power reduction obtained by pipelining the designs has been evaluated through simulation of a long sequence of random input patterns. Results are given in Table IV. To further analyze these results, it is necessary to take into account the $N_{ff}/N_{gates}$ value, which is 0.047 (or 4.7%). According to the previous analytical model, this value is very close to the 5% at which power benefits disappear in CMOS technology. Concerning the TFET designs, the power reduction obtained is slightly lower than the approximately 85% predicted by the analytical model for this flip-flop to gate ratio. Several considerations are in order to evaluate the results. First, the power consumption of a flip-flop was estimated to be ten times the power consumption of an average gate, while an actual flip-flop has been simulated to obtain the power figures in this experiment.
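The figures quoted from the analytical model can be checked with a normalized form of the savings expression. The sketch below assumes the reading $100 \cdot \Delta P / P_{WP} = 100 \cdot (1 - (1 + 10 r)/k)$, where $r = N_{ff}/N_{gates}$ and $k = P_{avg}(V_{DD,WP})/P_{avg}(V_{DD,P})$, with $k = 10$ for TFET and $k \approx 1.5$ for CMOS as discussed above. The function and variable names are hypothetical.

```python
def relative_savings_percent(ff_ratio, power_ratio, ff_gate_equiv=10):
    """Return 100 * dP / P_WP.

    ff_ratio:    N_ff / N_gates.
    power_ratio: P_avg(VDD_WP) / P_avg(VDD_P), gate power ratio without/with pipelining.
    """
    return 100.0 * (1.0 - (1.0 + ff_gate_equiv * ff_ratio) / power_ratio)


def break_even_ratio(power_ratio, ff_gate_equiv=10):
    """Return the N_ff / N_gates value at which the power savings vanish."""
    return (power_ratio - 1.0) / ff_gate_equiv


ADDER_TREE_RATIO = 0.047  # N_ff / N_gates reported for the adder tree

for name, k in (("TFET", 10.0), ("CMOS", 1.5)):
    saving = relative_savings_percent(ADDER_TREE_RATIO, k)
    limit = break_even_ratio(k)
    print(f"{name}: predicted saving {saving:.1f}%, break-even N_ff/N_gates = {limit:.2f}")
```

For the adder tree ratio this gives a predicted saving of about 85% for TFET and only about 2% for CMOS, consistent with the ratio being close to the CMOS break-even point.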
Second, the power consumption at $V_{DD,P}$ was estimated to be 10% of the power consumption at $V_{DD,WP}$ from the experiments at the gate level. In those experiments, $V_{DD,P}$ was 0.4V below $V_{DD,WP}$; however, the difference is only 0.35V for the adder example. This is a consequence of the fact that the pipeline registers have now also been simulated. The critical path delay constraint that the pipelined design must now fulfill is not half of the critical delay of the combinational version, since the set-up or propagation times of the flip-flops are added. Finally,
Design    Power savings, adders    Power, flip-flops (%)
P-CMOS    33%                      43%
P-TFET    87%                      40%
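The timing effect mentioned in the second consideration above can be illustrated with a simple delay budget. This is only a sketch: it assumes that the flip-flop set-up and propagation times are subtracted from the clock period before the remaining time is shared among the gates of a stage, and all numeric values, the function name and the parameter names are hypothetical.

```python
def gate_delay_budget(clock_period, logic_depth, ff_overhead=0.0):
    """Return the maximum delay allowed per gate in one pipeline stage.

    clock_period: clock period (any time unit).
    logic_depth:  number of gates in the critical path of the stage.
    ff_overhead:  flip-flop set-up plus propagation time (same unit).
    """
    available = clock_period - ff_overhead
    if available <= 0:
        raise ValueError("flip-flop overhead exceeds the clock period")
    return available / logic_depth


PERIOD_NS = 4.0  # hypothetical clock period in ns

combinational = gate_delay_budget(PERIOD_NS, 50)          # original circuit
ideal_pipe = gate_delay_budget(PERIOD_NS, 25)             # ideal registers
real_pipe = gate_delay_budget(PERIOD_NS, 25, ff_overhead=0.5)  # 0.5 ns assumed overhead

print(f"per-gate budget: {combinational:.3f} ns (LD=50), "
      f"{ideal_pipe:.3f} ns (LD=25, ideal), {real_pipe:.3f} ns (LD=25, with flip-flops)")
```

With real registers the per-gate budget is less than twice the combinational one, so the supply voltage cannot be lowered quite as far as the gate-level analysis suggests.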
V. CONCLUSIONS
The experiments carried out confirm that the supply voltage reductions achieved when reducing logic depth while keeping speed are larger in TFET circuits than in CMOS, as expected from the delay versus VDD behavior exhibited by TFET logic gates due to the steep subthreshold slope of tunnel transistors. It has been shown that applying pipelining to decrease logic depth is an efficient power reduction technique for tunnel transistor circuits. Power savings close to 80% have been obtained for a two-stage adder tree used as an example circuit. This result suggests that architectural issues should be considered in the evaluation of this type of transistor. That is, benchmarking to determine the range of frequencies at which a given tunnel device is competitive in terms of power or energy, and thus to assess its application domain, should not be limited to comparing logic gates or identical circuit structures, since the impact of techniques like pipelining can be very different in the two technologies.
