ABSTRACT We use mixed device-circuit simulations to predict the performance of 6T static RAM (SRAM) cells implemented with tunnel-FETs (TFETs). Idealized template devices are used to assess the impact of device unidirectionality, which is inherent to TFETs, and to identify the most promising configuration for the access transistors. The same template devices are used to investigate the V DD range in which TFETs may be advantageous compared to conventional CMOS. The impact of device ambipolarity on SRAM operation is also analyzed. Realistic device templates extracted from experimental data of fabricated state-of-the-art silicon pTFETs are then used to estimate the performance gap between the simulation of idealized TFETs and the best experimental implementations.
I. INTRODUCTION
The tunnel-FET is one of the most promising candidates to complement or replace CMOS in ultra-low-power (ULP) applications [1] - [5] , featuring a sub-threshold swing (SS) below the 60 mV/decade limit of the metal-oxide-semiconductor field-effect transistor (MOSFET) at room temperature. Steep I D -V GS characteristics with a minimum SS of 30 mV/decade [6] or 21 mV/decade [7] have been demonstrated experimentally. However, since such a low SS is achieved only over a small voltage range and for a drain current (I D ) on the order of pA/µm, there is a significant lag between modeling projections and experiments [1] , [2] .
Nowadays, the static RAM (SRAM) cell is one of the most relevant digital building blocks, largely deployed as on-board cache in processors (occupying up to 70% of a processor area [8] ). Estimating the impact of TFETs on SRAM performance is thus an important step in assessing their deployment in advanced digital circuits [9] - [17] .
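The 60 mV/decade figure quoted above follows from the thermal limit ln(10)·kT/q. The short sketch below evaluates that limit and extracts a point SS from two (V GS , I D ) samples. The function names and the sample values are illustrative assumptions and are not taken from the simulation decks used in this work.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def thermal_ss_limit_mv_per_dec(temperature_k=300.0):
    """Thermionic limit of the sub-threshold swing, ln(10)*kT/q, in mV/decade."""
    return math.log(10.0) * K_B * temperature_k / Q_E * 1e3


def point_ss_mv_per_dec(vgs1, id1, vgs2, id2):
    """Average swing between two (V_GS [V], I_D) samples, in mV/decade."""
    return (vgs2 - vgs1) / (math.log10(id2) - math.log10(id1)) * 1e3


# Hypothetical samples: one decade of current gained over 30 mV of gate voltage.
print(thermal_ss_limit_mv_per_dec())               # close to 59.5 at 300 K
print(point_ss_mv_per_dec(0.0, 1e-12, 0.03, 1e-11))
```

A device whose point SS falls below the first number over some current range is "steep" in the sense used in this section.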
Since the availability of compact models for TFETs is limited, mixed device/circuit simulation decks [18] , [19] are a powerful alternative to analyze simple circuits using a microscopic description of the devices.
We report here on TCAD mixed device-circuit simulations of TFET-based SRAM cells implemented with different TFET structures, with the aim of assessing the impact of some specific features of TFETs on the static and dynamic SRAM performance.
One of the main intrinsic limitations of TFETs stems from the asymmetry of the drain (D) and source (S) regions, which makes I D inherently unidirectional. This is critical in SRAM cells since the access transistors should be bi-directional [8] - [17] . In this regard, we address in Section II the issues related to current unidirectionality by considering idealized template devices. These are far from what can be fabricated today, since they feature dimensions comparable to ultra-scaled MOSFETs and a Band-to-Band Tunneling (BtBT) rate that is increased to provide drain currents comparable to CMOS. We use such idealized TFETs because they allow us to: 1) explore hybrid circuit topologies based on both TFETs (in the inverters) and conventional MOSFETs (as access transistors); 2) make a meaningful comparison between TFET and CMOS cells with comparable devices at different V DD ; 3) obtain an upper bound for the attainable performance, which can be used to benchmark more realistic simulations performed after calibration against experimental results.
In Section III we consider structures and model parameters calibrated on fabricated devices. We resort to a simulation deck [17] calibrated on ambipolar devices [6] to analyze the impact of TFET ambipolarity on SRAM cells. Then, we perform a new calibration on state-of-the-art experimental strained-Si pTFET nanowire to evaluate the performance mismatch between idealized and experimental devices.
II. EFFECT OF THE UNIDIRECTIONALITY CONSIDERING IDEALIZED TEMPLATE DEVICES
In this section we use idealized devices to analyze the impact of TFET unidirectionality on the performance of the cell and identify the best cell architecture. Realistic devices will be considered in the next section.
A. IDEALIZED TEMPLATE DESCRIPTION
The n-type SiGe/Si ( Fig. 1(a) ) and p-type strained-Si ( Fig. 1(b) ) TFETs considered in the mixed device/circuit simulations of this section were designed in [4] . Fig. 1 also shows the corresponding band diagrams, highlighting the direction of electrons tunneling from the valence band (VB) of the source to the conduction band (CB) of the channel region in the on-state NTFET ( Fig. 1(c) ), and from the channel region VB to the source CB in the on-state PTFET ( Fig. 1(d) ).
Tunneling conduction mechanisms are taken into account in the TCAD simulator by means of a (static) nonlocal tunneling model, activating the BtBT option [18] . The adjustable calibration parameters are the tunneling masses m c and m v and the scaling factors g c and g v for the generation/recombination terms that are added to the carrier continuity equations [18] . Even if a more physically accurate dynamic non-local BtBT model is available, we chose the static one since it is computationally more robust in the mixed device/circuit scheme.
The same base structure in Fig. 1 , which is here interpreted as a 2D cut of a 3D nanowire (NW), has also been used to implement n- and p-type conventional MOSFETs for comparison purposes. The use of idealized models and templates for both TFETs and MOSFETs ensures a fair comparison between these two competing technologies. In fact, due to the poor maturity of fabricated TFETs, it would be unfair to benchmark experimental TFETs against experimental CMOS.
The I D -V GS and I D -V DS characteristics of the considered transistors are reported in Fig. 2 . The gate work-functions have been adjusted to match the LSTP off-current target (I OFF = 10 pA/µm) [3] , [4] . Under this condition, the low SS of the TFETs is confined to low current levels, which makes them competitive with CMOS only for very low V DD applications (below ∼300 mV); for larger supply voltages their current drivability becomes much lower than that of CMOS.
Even if this choice meets the ITRS specifications, it makes it difficult to highlight the potential of TFETs, because the steepest part of the simulated I D -V GS corresponds to negative V GS and the crossover voltage at which the TFET on-current is surpassed by the CMOS one is very low. In fact, while the CMOS I D -V GS features an almost constant sub-threshold slope, the TFET I D -V GS characteristic takes full advantage of its steepest part when the target I OFF is decreased. In this respect, Fig. 3 illustrates that the lower the off-current, the larger the voltage range where TFETs outperform CMOS. In the rest of our analysis we use I OFF = 10 pA/µm (except for the more detailed discussion in Section II-G), because a lower off-current would be difficult to achieve in real devices due to additional leakage paths such as gate leakage and trap-assisted tunneling.
The I D -V DS characteristics in the right plot of Fig. 2 point out that the TFET current is essentially unidirectional. When an n-type TFET is biased with a negative V DS there is only a small linear region where the current increases, after which it quickly decreases to zero. Furthermore, as V DS approaches −0.6 V, the current increases again due to forward biasing of the parasitic p-n diode. Since in this condition the drain current is not controlled by V GS , TFETs should not be used in circuit topologies enforcing these biasing conditions.
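The sensitivity to the I OFF target can be illustrated with a back-of-the-envelope estimate. For a device with a constant sub-threshold swing, lowering the off-current target by N decades shifts the whole sub-threshold characteristic by N·SS along the V GS axis. The sketch below assumes an ideal 60 mV/decade MOSFET swing, which is an assumption made here for illustration and not a value extracted from the simulated devices.

```python
import math


def constant_ss_vgs_shift_mv(i_off_old, i_off_new, ss_mv_per_dec=60.0):
    """Gate-voltage shift (mV) that moves a constant-SS sub-threshold
    characteristic from i_off_old to i_off_new at V_GS = 0.

    ss_mv_per_dec is an assumed, ideal swing (not a simulated value).
    """
    return ss_mv_per_dec * math.log10(i_off_old / i_off_new)


# From the 10 pA/um target to a 10 fA/um target: three decades of current.
shift = constant_ss_vgs_shift_mv(10e-12, 10e-15)
print(f"{shift:.0f} mV")
```

Under this assumption the constant-SS curve moves by about 180 mV, whereas a characteristic that is steeper at low currents needs a smaller shift to reach the same target, consistent with the trend described for Fig. 3 .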
B. SRAM CELL DESCRIPTION
In the symmetric 6T SRAM cell sketched in Fig. 4 (6T), n-type and p-type TFETs are employed for the two cross-coupled inverters. M1 and M3 are the pull-down transistors (PD) whereas M2 and M4 are the pull-up transistors (PU). Due to unidirectional current of TFETs, three alternatives are considered for the access transistors (AT) M5 and M6: the first two employing n-type TFETs, the last one employing n-type MOSFETs. The TFET ATs can be either (a) inward facing (I-AT), or (b) outward facing (O-AT) [9] - [17] . Since the two configurations with TFET-ATs suffer from the limitation of asymmetric current flow, we also investigate a hybrid TFET/CMOS SRAM cell using conventional MOSFETs as ATs (c) [16] .
The impact of I D unidirectionality on the I-AT and O-AT configurations has already been addressed by other groups [9] - [15] , with a consensus that a full-TFET symmetric 6T SRAM does not work properly. Therefore, modified SRAM architectures have been proposed to overcome the unidirectionality issue: an asymmetric 6T cell with one I-AT and one O-AT and a write-assist (WA) technique [9] , a 7T cell with O-ATs and one additional transistor for the read [10] , a 6T cell with p-type I-ATs and read-assist (RA) [11] , 8T and 10T Schmitt-Trigger cells [12] , [13] , a 7T driver-less (DL) cell [14] and an 8T hybrid TFET/CMOS cell [15] .
In [16] we have shown that proper cell sizing and a BL pre-charge ( BL ) to V DD /2 make the O-AT configuration work. We extend this analysis here in order to compare this trade-off on the O-AT 6T cell with the 8T cell sketched in the lower plot of Fig. 4 (8T). The 8T configuration is basically a 6T topology with O-ATs employed only for the write, and with two more transistors to perform the read. As a result, the write and read operations are effectively decoupled, leading to a robust solution to the unidirectional I D issue. The 8T cell has been employed in other works as a reference topology to benchmark more innovative schemes [14] , [15] .
C. DEFINITION OF THE STATIC AND DYNAMIC FIGURES-OF-MERIT
The aim of this sub-section is to briefly review the write and read operations of a 6T SRAM cell to facilitate the forthcoming discussions.
The storage element of an SRAM cell consists of the two cross-coupled inverters, which store the data as high and low voltage levels in the Q and QB nodes (see Fig. 4 ), while the two ATs allow the stored data to be forced or accessed during the write and read operations, respectively.
The symmetry of the 6T cells makes the write, read and hold operations symmetric with respect to the differential stored logic values: '0' (Q = 0 and QB = V DD ) and '1' (Q = V DD and QB = 0). This is not the case of any fabricated SRAM cell, where variability leads to differences in the mirrored parts of the cell. However, since the implications of process variability are beyond the scope of this work, we consider here nominal devices and treat the cells as perfectly symmetric.
The static performance is evaluated by means of the half-circuit voltage transfer characteristic (VTC) method [8] , which requires the V Q = f(V QB ) static characteristic taken for various BL and WL voltage levels. The Static Noise Margins in hold (HSNM) and read (RSNM) operations are calculated as in [8] (Fig. 5(a) ). To compute the write SNM (WSNM), the butterfly curves are obtained for two different conditions, that is, for the two differential voltage levels of the BLs (0 and V DD ) when the WL is at V DD ( Fig. 5(b) ). For bidirectional ATs, the deformation of these curves with respect to the inverter VTC is stronger for the inverter with the AT connected to the bit-line driven at the low voltage level. In fact, even if both of them appear deformed due to the on-state ATs, the drivability of the n-type AT is larger when BL is at 0 V. It follows that the change of cell status is forced mainly by the side with the bit-line at the low voltage level. The write butterfly graph of a well-sized cell features only one crossing point between the VTCs, which coincides with the logic value to be written. The more deformed the VTCs are with respect to the butterfly of the inverter, the more robust the write is. Therefore, the WSNM is the size of the minimum square between the writing VTCs (Fig. 5(b) ). Unidirectionality of the ATs severely limits their drivability when they are biased with a negative V DS , and this affects the duration of the read and write operations. For this reason, besides the SNMs we also compute the write and read delays, defined respectively as the time needed by the storage node Q to flip from 0 V to 90% of V DD and as the time at which the difference V BL -V BLB reaches 10% of V DD . These delays are computed by means of transient simulations performed on the full cells (including all six transistors). Two capacitors of 20 fF schematically describe the parasitic capacitance of the BLs.
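For readers who want to reproduce hold or read SNMs from sampled VTCs, the sketch below implements one common graphical construction: the butterfly plot is rotated by 45 degrees, and the largest separation between the two curves in each lobe gives the diagonal of the largest nested square. This is an illustrative post-processing routine, not necessarily the exact procedure of [8] , and the function name and sampling choices are assumptions. It expects each VTC to be sampled on a strictly increasing input grid with a non-increasing output.

```python
import numpy as np


def snm_largest_square(vin_a, vout_a, vin_b, vout_b, n_grid=4001):
    """Side of the largest square nested in the smaller butterfly lobe.

    Curve A is plotted as (x, y) = (vin_a, vout_a); curve B is the second
    inverter plotted mirrored, (x, y) = (vout_b, vin_b).
    """
    vin_a, vout_a = np.asarray(vin_a, float), np.asarray(vout_a, float)
    vin_b, vout_b = np.asarray(vin_b, float), np.asarray(vout_b, float)
    s = np.sqrt(2.0)

    # Rotated axes: u runs along the anti-diagonal, v along the diagonal.
    u_a = (vin_a - vout_a) / s          # increasing for a falling VTC
    v_a = (vin_a + vout_a) / s
    u_b = ((vout_b - vin_b) / s)[::-1]  # reversed so that u_b increases
    v_b = ((vout_b + vin_b) / s)[::-1]

    # Compare the two curves on a common u grid.
    u = np.linspace(max(u_a[0], u_b[0]), min(u_a[-1], u_b[-1]), n_grid)
    gap = np.interp(u, u_a, v_a) - np.interp(u, u_b, v_b)

    # The gap is a square diagonal; one lobe has gap > 0, the other gap < 0.
    diag_first_lobe = gap.max()
    diag_second_lobe = (-gap).max()
    return float(min(diag_first_lobe, diag_second_lobe) / s)


# Sanity check with an ideal step inverter switching at V_DD / 2 (V_DD = 1 V).
x = np.linspace(0.0, 1.0, 2001)
step_vtc = np.where(x < 0.5, 1.0, 0.0)
print(snm_largest_square(x, step_vtc, x, step_vtc))  # approaches V_DD / 2
```

A result that is zero or negative indicates that one lobe has collapsed, i.e., the two curves no longer define a bistable cell under the applied bias.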
D. RESULTS: WRITE OPERATION (6T CELLS)
Considering the WSNMs for the 6T cell in Fig. 6(a) and (b) , the O-ATs appear to perform better than the I-ATs. In fact, the write is performed by forcing the BL pair to differential levels of '1' and '0' before raising the WL to '1'. It follows that only one of the two unidirectional ATs can propagate the data, that is only the logic '1' with the I-AT and only the logic '0' with O-AT. Furthermore, since only transfer of the '0' is really efficient with an n-type transistor, the cell with the O-AT has a better write-ability than the cell with the I-AT, where a successful write can be performed only by sizing the ATs more than ten times larger than the nTFETs of the inverters ( Fig. 6(a) ). The arguments discussed above are still valid for p-type ATs, just substituting the n-type O-AT with the p-type I-AT, since p-type transistors propagate the '1' better than the '0' [11] . Conversely, the write in the hybrid solution ( Fig. 6(c) ) is quite similar to the CMOS case ( Fig. 6(d) ), where both the ATs can propagate the differential data (although with different strength). Interestingly, although with the nMOS ATs it is possible to force the data from both sides, by comparing the WSNMs of Fig. 6(b) and (c), we observe that the O-ATs WSNMs are larger than in the hybrid solution. This is due to the better drivability of the nTFET with respect to the nMOS at V DD = 0.3 V. In fact, the hybrid solution is strongly affected by V DD scaling, due to the different shape of the I D -V GS curves of the nMOS-AT and of the nTFET PD (Fig. 2 ).
E. RESULTS: READ OPERATION (6T CELLS)
Given that O-ATs feature a better write than I-ATs, we discard the I-AT configuration since, as demonstrated by Fig. 6(a) , it cannot feature both WSNM and RSNM sufficiently larger than zero for a given cell sizing.
At the same time, the read operation with the O-ATs cannot be performed correctly if the BL pair is precharged to V DD , because the O-ATs have a negative V DS . Consequently, even if the RSNM is essentially equal to the HSNM (Fig. 6(b) ), the cell is isolated, since the O-ATs are off, resulting in a huge read delay of about 10 ms at V DD = 0.3 V (not shown).
In [10] the read-ability with O-ATs is assessed assuming BL = 0 V, leading to the conclusion that neither I-ATs nor O-ATs provide acceptable write and read SNMs for a given sizing. However, the O-AT configuration features good SNMs and reasonable delays if the read is performed with BL = V DD /2. As shown in Fig. 6(b) , a W AT /W PD ratio close to 2 results in acceptable R- and W-SNMs. Furthermore, even if the C BL(B) connected to the O-AT on the side of the storage node holding '0' remains stuck at V DD /2 after the WL voltage rises, the opposite O-AT pulls the corresponding C BL(B) from V DD /2 toward V DD , thus enabling a differential sense amplifier to detect the data stored in the cell. Fig. 7 summarizes the dynamic performance of the 6T TFET SRAM cell with O-ATs, of the hybrid case and of the CMOS cell. The read with BL = V DD /2 is performed only for the O-AT configuration, while BL = V DD is used for the other configurations. In fact, in the full CMOS and hybrid configurations the read at half V DD does not lead to an appreciable reduction of the read delay, unlike in the O-AT configuration. According to Fig. 7 , the 6T SRAM with O-ATs becomes the best 6T choice when V DD is scaled down below 300 mV, considering that in the hybrid case the nMOS AT should be sized ten times the TFET to guarantee the write operation. For this reason, the associated delays are not shown for V DD = 0.2 V. However, for higher supply voltages the CMOS cell shows better performance.
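To give a feeling for the orders of magnitude involved in the read at V DD /2, the sketch below applies a first-order constant-current estimate: a single bit-line capacitance is charged by the access transistor until the bit-line differential reaches 10% of V DD . The 20 fF capacitance and the 10% criterion come from Section II-C, while the access-transistor current is a hypothetical placeholder. The estimate is only a rough illustration and does not replace the transient simulations.

```python
def read_delay_estimate(c_bl, v_dd, i_at, sense_fraction=0.10):
    """First-order read delay t = C_BL * (sense_fraction * V_DD) / I_AT.

    Assumes the access transistor sources a constant current i_at (A) into
    one bit-line of capacitance c_bl (F) while the other bit-line stays at
    its pre-charge level.
    """
    return c_bl * sense_fraction * v_dd / i_at


# C_BL = 20 fF and V_DD = 0.3 V as in the text; 1 nA is a hypothetical current.
t_read = read_delay_estimate(20e-15, 0.3, 1e-9)
print(f"{t_read * 1e9:.0f} ns")
```

The linear dependence on 1/I AT shows why an access transistor that is nearly off, as with a pre-charge to V DD , leads to read delays that are orders of magnitude longer.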
F. READ AND WRITE IN THE 8T CELL
The 8T SRAM cell is a robust solution for both CMOS and TFET technologies since write and read operations are decoupled. In the TFET version (Fig. 4 (8T) ) we employed outward ATs for writing. The write operation is in fact very similar to that in the 6T cell (the impact of the capacitive load represented by the M7 read transistor is negligible), thus the WSNMs of the 8T TFET and 8T CMOS cells are practically the same as those reported in Fig. 6 (b) (6T O-AT TFET cell) and 6(d) (6T CMOS cell), respectively. On the other hand, the read is performed through the stack formed by the n-type transistors M7 and M8. The data stored in QB determines whether M7 is in the on- or in the off-state. Thus, when the line capacitor of BL (R) is pre-charged to V DD and the WL (R) is activated, it is possible to interpret the QB value according to whether C BLR discharges or remains stuck at V DD . Since the state of the cross-coupled inverters is not perturbed during the read, the RSNMs of the 8T cells correspond practically to the HSNMs of the corresponding 6T cells. Furthermore, the RSNMs of the 6T cells correspond to the immunity margins of the (8T) half-selected cell [15] .
Since the static behavior can be derived from the 6T SNMs, and the write is in fact the same as in the 6T cell, the benchmark of our proposal (i.e., the O-AT 6T cell with read at V DD /2) against the TFET 8T cell (and the corresponding CMOS topologies) reduces to a comparison of the read delays reported in Fig. 8 . The 8T cell performs better than the 6T cell with pre-charge at V DD /2, since the boundary V DD below which TFET cells outperform the corresponding CMOS cells increases by ∼50 mV. However, since the 8T solution incurs an area penalty of two additional transistors, one can still use the 6T TFET cell with pre-charge at half V DD . If performance is more important, the 8T TFET cell is preferable.
In [15] , where the proposed topology featured a much slower write delay than the 8T one, write-assist techniques have been investigated in order to mitigate the gap. However, since in the 6T topology with pre-charge at V DD /2 the dynamic performance is comparable with that of the 8T cell, we believe that write-assist techniques are not necessary. Moreover, the read at half V DD can be interpreted as a read-assist technique.
G. INFLUENCE OF THE I OFF TARGET
The boundary V DD voltage of approximately 300 mV at which TFET and CMOS cells performance cross is close to the one found from the comparison of the TFET and CMOS I D -V GS characteristics in Fig. 3 at I OFF = 10 pA/µm. Since from Figs. 7 and 8 one concludes that TFETs are recommended only for such low V DD , we performed further simulations at I OFF = 10 fA/µm (see I D -V GS in Fig. 3 ) to estimate the crossover V DD in such case. Fig. 9 shows that the voltage range where TFETs may be advantageous over CMOS is critically affected by the target I OFF , and the application window for TFET widens with decreasing off-current.
III. 6T SRAM CELLS WITH CALIBRATED TFETS
Now we repeat the analysis of the SNMs and delays considering TFET structures and parameters calibrated on experimental data with the aim of assessing the impact of TFET ambipolarity and low on-current values.
A. THE EFFECTS OF TFET AMBIPOLARITY ON SRAM CELLS
The transistor considered in this sub-section is the trigate TFET employed to make the TFET inverter reported in [6] . Since the N/PTFETs are physically identical [6] , the n- or p-operation mode was exclusively determined by the biasing. It is worth noting that the S region in the n-mode (i.e., the p+ doped pocket) becomes the D region in the p-mode convention. At the same time, the n+ pocket is the D/S for the N/PTFET, respectively. Unfortunately, since both junctions are designed to be used as tunneling junctions (but in different operation modes), when the gate voltage is near 0 V the band diagram is sufficiently steep at both interfaces, leading to an ambipolar behavior (see Fig. 10 ).
Although ambipolarity is a parasitic effect and some techniques for reducing it have already been demonstrated experimentally [21] , [22] , we investigated the feasibility of symmetric TFET-based 6T SRAM cells using the ambipolar devices whose calibration has been shown in [17] and the 6T TFET topologies (with either I-ATs or O-ATs) discussed so far. Since the TFET can operate both as n- and p-type (depending on the biasing), the off-state is not strictly controlled by the gate-to-source voltage when it is employed as AT. In fact, for a positive V DS (terminal names related to the n-mode convention), it can switch to the on-state both if V GS increases (for V GS > 0) and if V GD decreases (for V GD < 0). In Fig. 11 , the data retention of a minimum-size (W AT = W PD = W PU ) SRAM cell implemented with the devices of Fig. 10 [17] is evaluated through the butterfly curves of the I-AT configuration. The BL and BLB were either set to logic '1' and '0' (a) or both to logic '1' (b), to measure the hold-ability (WL = '0') of the cell under test during read and write operations of a cell in the same column. The figure also reports the butterfly curves obtained from the pure inverter VTCs without considering the ATs (dotted grey lines). When the I-AT is added, even if it is biased with V G AT = V WL = 0 V, the VTC related to the side with the bit-line at '1' is considerably deformed and the transition to the low logic level eventually takes place at V QB larger than V DD (i.e., when the drivability of the PD transistor becomes stronger than that of the PU and AT transistors), so that the cell is no longer bi-stable. This is due to the fact that, when the WL is at the low level and the BL (and/or BLB) is at V DD , the I-AT turns on as a PTFET since the V GD (i.e., V GS if the terminal names refer to the p-type convention) is −V DD .
228 VOLUME 3, NO. 3, MAY 2015
A similar situation occurs for the O-AT configuration when the BL (and/or BLB) is set to logic '0' and for different V DD values. We can conclude that the ambipolarity of such TFETs degrades the operation of the SRAM cells so severely that it prevents the storage operation. Although we have simulated only 6T cells, similar considerations apply also to other proposed SRAM topologies [9] - [15] that employ TFETs as ATs.
In [17] we presented a further model calibration on pulsed measurements performed on the same device that featured much less ambipolarity, possibly due to suppression of trap-assisted-tunneling using pulse widths shorter than the trap time constants. The simulations reported in [17] on the SRAM performance were performed with such calibration deck. However, in the last part of the present paper we propose a completely new calibration on a less ambipolar p-type NW employed in simple p-type Transistor/Resistor logic gates (i.e., Inverter and NAND) [22] .
B. MODELS CALIBRATED ON STATE-OF-THE ART PTFET WITH REDUCED AMBIPOLARITY
In this section, we consider the p-type gate-all-around (GAA) NW TFET published in [22] . Device processing is very similar to the one employed for the aforementioned trigate TFET [6] . However, solutions were adopted to suppress the ambipolar I D -V GS characteristics, such as scaling the dimensions toward a 20 nm diameter GAA NW (the channel length is 200 nm) and an asymmetric doping strategy with an n+ pocket at the S side and a low doping at the D side.
As in [17] we reproduced the NW structure with a 3D template ( Fig. 12(a) ) and a 2D one (Fig. 12(b) ) that can be seen as a horizontal cut, except for the oxide thickness, which is reduced by 20% in the 2D structure to match the electrostatics of the 3D one. The doping levels are N D = 4·10^20 cm^−3 and N A = 10^19 cm^−3 , and the pocket lengths are 10 nm and 5 nm for the n+ pocket and for the p+ pocket, respectively. Regarding the adjustable model parameters, the effective masses m c and m v were set to 0.35·m 0 in the S and channel regions, and to 0.55·m 0 in the D region, while the pre-factors g c and g v [18] were kept at their default values. In addition, the E G0 (i.e., energy gap at 0 K) of sSi was modified to 1 eV. This deck of calibrated parameters leads to the satisfactory agreement between simulated and experimental characteristics illustrated in Fig. 13(a) . Even if logic gates have already been fabricated with this device, the lack of an n-type GAA NW with a similar behavior forced the group to design pull-down resistors instead of NTFETs, as in the case of the p-logic NAND gate [22] (see the schematic and the mixed-mode simulations compared with experimental data in Fig. 14) .
In this respect, we have defined a virtual n-type device ( Fig. 13(b) ) simply by reversing the S/D doping type to minimize the p/n imbalance, in order to simulate complementary TFET circuits in reasonable agreement with the performance of the existing TFET technology.
C. PERFORMANCE OF 6T SRAM CELLS
In Section II we have demonstrated that the 6T O-AT SRAM cell may outperform the equivalent CMOS cell at a V DD < V Boundary that depends on the targeted off-current. In this section the performance of the O-AT SRAM cell implemented with the p-type GAA NW of Section III-B and the equivalent virtual n-type device, whose characteristics are in Fig. 13(b) , is compared with that of the same 6T SRAM cell configuration implemented with the idealized template devices of Section II. However, the devices of Section III-B feature an off-current I OFF of about 0.75 nA/µm (Fig. 13) . For this reason, here we translate the I D -V GS characteristics of the TFET templates of Section II to ensure a comparison at the same I OFF .
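The translation of an I D -V GS characteristic to a new I OFF can be sketched as a rigid shift along the V GS axis, equivalent to a gate work-function adjustment. The routine below is an illustrative post-processing example working on tabulated data. The function names and the synthetic exponential curve are assumptions, and in the mixed-mode deck the shift would be applied through the device work-function rather than on tabulated data.

```python
import numpy as np


def vgs_shift_for_ioff(vgs, i_d, i_off_target):
    """V_GS at which the original curve carries i_off_target.

    Shifting the curve by this amount places i_off_target at V_GS = 0.
    Requires i_d to increase monotonically with vgs.
    """
    return float(np.interp(np.log10(i_off_target), np.log10(i_d), vgs))


def shifted_current(vgs, i_d, v_shift, v_query):
    """Current of the shifted curve, I'(V) = I(V + v_shift), log-interpolated."""
    return 10.0 ** np.interp(v_query + v_shift, vgs, np.log10(i_d))


# Synthetic curve (hypothetical): 10 pA/um at V_GS = 0 and 50 mV/decade.
vgs = np.linspace(-0.2, 0.5, 71)
i_d = 1e-11 * 10.0 ** (vgs / 0.05)

shift = vgs_shift_for_ioff(vgs, i_d, 0.75e-9)    # target: 0.75 nA/um
print(shift, shifted_current(vgs, i_d, shift, 0.0))
```

Interpolating log10(I D ) rather than I D keeps the shift accurate over the many decades spanned by a sub-threshold characteristic.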
Due to minimized ambipolarity, the storage operation is ensured by the good switching off capability of the AT, as demonstrated by the overlap between the SRAM butterfly curves simulated in hold operation (WL = '0', Fig. 15 ) and the inverter VTCs (in contrast to the results in Fig. 11 ).
The trends of the static simulations (Fig. 16) are basically in line with what we found in Section II ( Fig. 6(a) and (b)), but with a general reduction of the SNMs, mainly due to the reduced intrinsic voltage gain of the inverter. Furthermore, the read and write delays are 1-2 decades longer than the ones expected from the idealized templates used in Section II (Fig. 17) . This emphasizes that the lag between fabricated TFETs and the template TFETs with proven advantages over conventional CMOS is still large and requires considerable effort at the device level.
IV. CONCLUSION
In this paper we presented a study on symmetric 6T SRAM cells implemented with both idealized template TFETs and TFET structures calibrated against state of the art devices [6] , [22] .
We used TFET templates with much better characteristics than actual fabricated samples to address the unidirectional current limit of TFETs when they are employed in 6T SRAM cells. Our results show that only the configuration with outward ATs could achieve both acceptable read and write SNMs, but the read delay was unacceptable (i.e., 10 ms at V DD = 0.3 V, not shown). However, a BL pre-charge to V DD /2 allowed a reasonable read delay and a reduced performance degradation at scaled supply voltage.
Calibrations of the effective model parameters were then performed on realistic devices. The first samples considered were severely affected by ambipolarity, which allowed us to investigate the effects of such behavior on 6T SRAM cells. Our results show that ambipolarity affects the gate control of the TFET in the off-state, thus preventing data retention in the SRAM cell. Calibration on a less ambipolar PTFET allowed us to make a reliable estimation of the performance of actually fabricated devices. Our results demonstrate an alarming lag between simulations of idealized template transistors and fabricated devices. Since further optimizations are required at the device level in order to bridge this gap, we hope that our benchmark will encourage people to look for solutions that can boost the performance of this potentially disruptive technology.
