Monolayer heterojunction FETs based on vertical heterogeneous transition metal dichalcogenides (TMDCFETs) and planar black phosphorus FETs (BPFETs) have demonstrated excellent subthreshold swing, high I ON /I OFF , and high scalability, making them attractive candidates for post-CMOS memory design. This article explores TMDCFET and BPFET SRAM design by combining atomistic self-consistent device modeling with SRAM circuit design and simulation. We perform detailed evaluations of the TMDCFET/BPFET SRAMs at a single bitcell and at SRAM array level. Our simulations show that at low operating voltages, TMDCFET/BPFET SRAMs exhibit significant advantages in static power, dynamic read/write noise margin, and read/write delay over nominal 16nm CMOS SRAMs at both bitcell and array-level implementations. We also analyze the effect of process variations on the performance of TMDCFET/BPFET SRAMs. Our simulations demonstrate that TMDCFET/BPFET SRAMs exhibit high tolerance to process variations, which is desirable for low operating voltages.
INTRODUCTION
Static power is a dominant component of system power at present technology nodes [ITRS 2011]. Occupying more than 50% of die area, static random access memory (SRAM) is a major contributor to processor static power [Zhang et al. 2005; Pavlov and Sachdev 2008]. Multiple techniques have been explored to reduce the static power of SRAMs, such as sleep transistors [Zhang et al. 2005], multiple threshold voltages [Hamzaoglu et al. 2000], virtual ground [Sharifkhani and Sachdev 2007], and DRG-cache [Agarwal et al. 2002]. However, the subthreshold slope limit of 60mV/dec imposes a ceiling on the achievable reduction in static power [Taur and
This work was supported by grant CCF-1217738 from the National Science Foundation. This work is an extended version of the work published in Design Automation Conference (DAC), 2015, titled "Monolayer Transition Metal Dichalcogenide and Black Phosphorus Transistors for Low Power Robust SRAM Design." Authors' addresses: J. Rakshit and K. Mohanram (corresponding author), ECE Department, University of Pittsburgh, 1238 Benedum Hall, Pittsburgh, PA 15261; emails: joydeep.rakshit@pitt.edu, kartik.mohanram@gmail.com and kmram@pitt.edu; R. Wan, K. T. Lam, and J. Guo, Department of Electrical and Computer Engineering, University of Florida, 551 New Engineering Building, P.O. Box 116130, Gainesville, FL 32611-6130; emails: {wanrunlai, lamkt, guoj}@ufl.edu.
2007; Chandra et al. 2010; Zimmer et al. 2012; Nalam et al. 2011] for nominal CMOS SRAMs to make them competitive with TMDCFET and BPFET SRAMs. Simulation results show that on average, below 0.5V, the DRNM and DWNM of TMDCFET SRAMs are comparable to those of the best read-assisted and the best write-assisted CMOS SRAMs, respectively. However, both the DRNM and DWNM of BPFET SRAMs are lower than those of the best read-assisted and the best write-assisted CMOS SRAMs, respectively. In conclusion, only TMDCFET SRAMs preserve their advantages over read/write-assisted CMOS SRAMs at low operating voltages, though RA/WA does close the gap between BPFET and CMOS SRAMs. However, it should also be noted that read-assist and write-assist techniques introduce additional design complexity and an area overhead, neither of which is present for the nominal TMDCFET and BPFET SRAMs.
Third, we perform a detailed array-level study of TMDCFET and BPFET SRAMs, guided by prior work for determining a uniform optimal array design, and analyze how the improvements exhibited by TMDCFET/BPFET SRAMs over CMOS SRAMs in the previous sections are affected at the memory array level. To accurately capture the performance of the TMDCFET/BPFET SRAMs at the array level, we perform a detailed bitline capacitance estimation for all the SRAM designs and incorporate it into our SRAM simulation framework, replacing the uniform bitline capacitance of 1fF considered in our single-bitcell evaluations. Our simulation results demonstrate that, on average, below 0.5V, the read delay advantage of the TMDCFET/BPFET SRAMs over nominal CMOS SRAMs increases to 251×/303×, whereas the improvements in static power, read stability and writability, and write delay remain unaltered. In conclusion, TMDCFET/BPFET SRAMs retain their advantages over CMOS SRAMs at the array level.
Finally, we analyze the effects of process variations on the static power, read stability and writability, and read/write delay of TMDCFET and BPFET SRAMs. To the best of our knowledge, there is no published work that reports experimental data on the performance of TMDCFETs and BPFETs under process variations. We use the threshold voltage variation data available for sub-20nm CMOS devices to evaluate TMDCFET and BPFET robustness under process variations. Our simulations show that the static power, read stability and writability, and read/write delay of TMDCFET and BPFET SRAMs exhibit a low standard deviation of 10%-15% under a 3σ threshold voltage variation of 30%.
In summary, TMDCFET and BPFET SRAMs outperform CMOS SRAMs at low supply voltages, at both the single bitcell and the array level, and are potential replacements for them. We expect TMDCFETs and BPFETs to demonstrate similar advantages in logic circuits as well.
This article is an extended version of Rakshit et al. [2015] and is organized as follows. Section 2 provides background on TMDCFETs and BPFETs, and describes the integrated simulation framework for circuit design. Section 3 summarizes technology exploration results for TMDCFET and BPFET SRAMs. Section 4 assesses RA/WA CMOS SRAMs against nominal TMDCFET and BPFET SRAMs. Section 5 presents a study of the TMDCFET/BPFET SRAMs at the array level. Section 6 provides a brief comparison between the TMDCFET and BPFET SRAMs. Section 7 presents a conclusion.
TMDCFETS/BPFETS: BACKGROUND
The basic monolayer FET integrates one or more layers of the same or different monolayer materials, which function as the channel material sandwiched between a substrate and a gate insulator or between two gate insulators. The gate terminal on top of the gate insulator modulates the conductivity of the channel region and thereby controls the switching of the FET. Two types of FETs, TMDCFETs and BPFETs, with this basic structure have been proposed and studied [Lam et al. 2014a, 2014b]. The TMDCFET structure is presented in Figure 1(a). The TMDCFET has one WTe2 monolayer and one MoS2 monolayer forming a vertical heterojunction channel between two 3nm HfO2 layers with a dielectric constant (κ) of 20 acting as the gate insulators. The portion of the WTe2/MoS2 monolayer outside the gated region is assumed to be heavily doped so that these extension regions are rendered ineffective in intrinsic transistor switching. The BPFET structure is presented in Figure 2(a). The planar BPFET has a black phosphorus (BP) monolayer as the channel material between two 3nm ZrO2 layers with κ = 25 as the gate insulator material. The gate is assumed to control the BPFET perfectly and short channel effects are ignored.
I-V Characteristics
This section discusses the general I D -V G characteristics of TMDCFETs and BPFETs using the plots in Figures 3(a) and 4(a), respectively, for V D of 0.2V and 0.6V.
TMDCFET: Both p-type and n-type characteristics can be achieved with a single TMDCFET in practice. p-type behavior is obtained using the top gate as the switching gate, whereas n-type behavior is obtained using the bottom gate as the switching gate. For p-type behavior, the voltage at the nonswitching bottom gate (V BG) is fixed to electrostatically dope the MoS2 layer to n-type. Similarly, during n-type switching, the voltage at the nonswitching top gate (V TG) is fixed to dope the WTe2 layer to p-type. I ON for both pTMDCFETs (V BG = 0.3V) and nTMDCFETs (V TG = -0.5V) is of the order of 10^2 μA/μm^2 and 10^4 μA/μm^2 at |V DS| of 0.2V and 0.6V, respectively. The nTMDCFETs have I OFF of around 10^-10 μA/μm^2 at V DS of 0.2V and 0.6V. The I OFF for pTMDCFETs increases from 10^-6 μA/μm^2 to 1 μA/μm^2 as V DS is changed from -0.2V
BPFET: The I ON of both pBPFETs and nBPFETs is of the order of 10^3 μA/μm and 10^4 μA/μm for |V DS| of 0.2V and 0.6V, respectively. I OFF is of the order of 10
for both nBPFETs and pBPFETs. Note that in Figure 4(a), the plots for |V DS| of 0.2V and 0.6V overlap. Black phosphorus has a highly anisotropic band structure [Lam et al. 2014a] with different carrier mobilities for orthogonal transport directions, namely the x-direction (armchair direction) and y-direction (zigzag direction). The effective masses of electrons and holes in the x-direction are 0.17m0 and 0.16m0, while those in the y-direction are 1.20m0 and 6.49m0, respectively, where m0 is the mass of the free electron [Lam et al. 2014a]. Because the carrier effective mass is larger in the y-direction than in the x-direction, the carrier velocity, and thereby I ON, is lower for y-direction BPFETs than for x-direction BPFETs. Figure 4(b) plots the I D -V G characteristics for x- and y-direction pBPFETs. The I ON for x-direction pBPFETs is 2-3× that of y-direction pBPFETs. We consider only x-direction BPFETs for the rest of this article because their higher I ON results in faster circuits.
Circuit Simulation and Design
Since no compact models for TMDCFETs and BPFETs are presently available, we built a lookup table-based Verilog-A model of the TMDCFET and the BPFET for high-level circuit simulation of SRAMs using Cadence Spectre. These Verilog-A models characterize both the DC and the transient behavior of the transistors using first-order current-voltage-charge differential equations. This method is suitable for accurately and efficiently modeling these emerging devices, and has been used in studying other emerging devices (e.g., Singh et al. [2010], Yang and Mohanram [2011], Choudhury et al. [2008], and Yang et al. [2010]). Atomistic self-consistent device simulation techniques have been adopted to simulate the intrinsic TMDCFETs and BPFETs in this article. These techniques are a well-established device simulation methodology frequently used to model nanoscale emerging devices for which compact simulation models are yet to be developed. A detailed discussion of these simulation techniques is outside the scope of this work, and we refer the reader to Datta [2005] and Lundstrom and Guo [2006] for an extensive discussion of this topic. These simulation techniques have been shown to provide excellent agreement with experimental measurements on fabricated devices for other emerging devices [Choudhury et al. 2008; Yang et al. 2010; Yang and Mohanram 2011]. The device simulations provide
data, which are used to populate 2D lookup tables at discrete steps of V GS and V DS ranging from 0V to 0.8V. We briefly summarize appropriate device simulation techniques for the TMDCFET and BPFET, and the integration of extrinsic effects characteristic of fabricated devices, in the rest of this section. Please note that it is currently not possible to compare modeling results against experimental data for the nanometer-length TMDCFET and BPFET devices considered in this work. State-of-the-art fabricated TMDCFETs and BPFETs have a channel length on the order of micrometers, whereas we simulate nanometer-scale devices in this work. Hence, we do not have nanometer-scale experimental data available for comparison. Note also that we have used experimental data from micrometer-scale devices to estimate the extrinsic effects (contact resistance and parasitic capacitance) in our simulations, in order to reflect the extrinsic effects that are inevitable in fabricated devices. We observe rapid progress in fabrication technology for TMDCFETs and BPFETs, mainly focused on uniform doping and reduction of contact resistance. We believe that with all the active research interest in 2D transistors [Liu et al. 2014; Haratipour et al. 2015; Ryder et al. 2016], nanoscale TMDCFETs and BPFETs will be reported in the near future.
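To illustrate how a table of this kind can be evaluated between grid points, the following is a minimal sketch of bilinear interpolation over a 2D table indexed by V GS and V DS. The 0.1V grid step, the placeholder table contents, and the names `make_grid` and `interp_id` are illustrative assumptions; they are not taken from the Verilog-A model used in this work.

```python
from bisect import bisect_right

def make_grid(v_max=0.8, step=0.1):
    """Uniform voltage grid from 0 V to v_max (the step is an assumed value)."""
    n = round(v_max / step)
    return [i * step for i in range(n + 1)]

def interp_id(vgs, vds, grid, table):
    """Bilinearly interpolate table[i][j] = I_D(grid[i], grid[j]) at (vgs, vds)."""
    def locate(v):
        # Clamp to the table range, then find the lower grid index and weight.
        v = min(max(v, grid[0]), grid[-1])
        i = min(bisect_right(grid, v) - 1, len(grid) - 2)
        t = (v - grid[i]) / (grid[i + 1] - grid[i])
        return i, t

    i, tx = locate(vgs)
    j, ty = locate(vds)
    low = table[i][j] * (1 - ty) + table[i][j + 1] * ty
    high = table[i + 1][j] * (1 - ty) + table[i + 1][j + 1] * ty
    return low * (1 - tx) + high * tx

# Placeholder table: a made-up smooth surface, NOT device data.
grid = make_grid()
table = [[vg * vd for vd in grid] for vg in grid]
print(interp_id(0.25, 0.55, grid, table))
```

Queries outside the 0V to 0.8V range are clamped to the table boundary in this sketch; a production model might extrapolate or flag such points instead.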
Monolayer WTe2-MoS2 vertical heterojunction transistor: The carrier statistics equations in WTe2 and MoS2 monolayers are solved self-consistently using Poisson's equation in the form of a capacitance model [Kumar et al. 2012]. The source-drain current I is computed using the Landauer-Büttiker formula [Datta 1997]:
where g s is the spin degeneracy factor, e is the electron charge, h is Planck's constant, T tb (E) is the interlayer transmission between the wave state with a wave vector of k t in the top layer and k b in the bottom layer, and f t (E) and f b (E) are the Fermi-Dirac distribution functions of the top and bottom layers, respectively. The interlayer coupling is assumed to be weak compared to the intralayer binding in the simulations.
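For reference, one form of the Landauer-Büttiker expression that is consistent with the symbols defined above is sketched below. The summation over the wave vectors k_t and k_b and the sign convention of the occupation difference are assumptions made here for illustration and should be checked against Datta [1997].

```latex
I = \frac{g_s\, e}{h} \sum_{k_t,\, k_b} \int T_{tb}(E)\, \bigl[ f_t(E) - f_b(E) \bigr]\, dE
```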
Monolayer black phosphorus transistor: Ballistic device performance of the monolayer BPFET is obtained by adopting the popular top-of-the-barrier approach [Rahman et al. 2003] . The E-k relation of monolayer black phosphorus is computed using ab initio simulations based on density functional theory (DFT) with advanced functionals. The current I is evaluated as the difference between the fluxes from the source and drain terminals using the formula:
where q is the elementary charge, v(E) is the carrier velocity computed from the E-k relation, D(E) is the density-of-states calculated from band structure, and f (E) is the Fermi-Dirac distribution.
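A commonly used form of the top-of-the-barrier current that matches the symbols defined above is sketched below. The factor of 1/2 (half of the states carry positive velocity) and the labels f_S and f_D, denoting the Fermi-Dirac distribution evaluated at the source and drain Fermi levels, are assumptions introduced here and are not notation from the original formula.

```latex
I = q \int \frac{D(E)}{2}\, v(E)\, \bigl[ f_S(E) - f_D(E) \bigr]\, dE
```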
Extrinsic Effects
Extrinsic features like parasitic capacitances and contact resistances are characteristic of fabricated devices, and they represent one of the major factors affecting circuit performance of scaled devices [Balasubramanian et al. 2003 ]. In order to capture the extrinsic effects on fabricated devices, the extrinsic TMDCFETs and BPFETs are modeled by adding parasitic capacitances and contact resistances around the intrinsic FET terminals, as shown in Figures 1(b) and 2(b). We utilize recent experimental data and modeling results to estimate these extrinsic features, and in the following text, we demonstrate the methodology for evaluation of these extrinsic features for our nanoscale devices, from microscale device experimental data.
2.3.1. TMDCFET Extrinsic Effects. The TMDCFET contact resistance arises from the formation of Schottky barriers between the metal contact and the TMDC monolayer. The major TMDCFET parasitic capacitance arises from the fringing electric fields between the sidewalls of the gate and the drain/source terminals, which is typical of sub-20nm technology nodes.
Contact resistance: We adopt the contact resistivity of 200 Ω·μm reported in recent work [Kappera et al. 2014] to evaluate the contact resistance of the nanoscale TMDCFETs considered in this work. The value reported in Kappera et al. [2014] is for a microscale device, and we employ linear scaling to obtain the contact resistance for the nanoscale TMDCFET. Therefore, the contact resistance at each terminal for a TMDCFET with 32nm channel width is 200 Ω·μm divided by 0.032μm (i.e., 6.25 kΩ).
Parasitic capacitance: The vertical TMDCFET, as shown in Figure 1(a), has a structure similar to the double-gate MOSFET with an extremely thin semiconductor body. The fringe capacitance is the major parasitic capacitance of these devices [Bansal et al. 2005]. A fringe capacitance value of 0.05 fF/μm is reported in Bansal et al. [2005] for an optimized 16nm device model. Since this value is derived for an optimized device model, we adopt this value with a scaling factor of 10 (i.e., 0.5 fF/μm) to account for the nonidealities in a fabricated device. We utilize linear scaling to obtain the value for the nanoscale device. Therefore, the parasitic capacitance at each terminal for a TMDCFET with 32nm channel width is 0.5 fF/μm times 0.032μm (i.e., 16×10^-3 fF).
2.3.2. BPFET Extrinsic Effects. The BPFET contact resistance, similar to the TMDCFET, arises due to formation of Schottky barriers between the metal contact and the BP monolayer. The contact resistance varies with different metal contacts. The BPFET parasitic capacitance also arises due to the fringing electric fields between the sidewalls of gate and the drain/source terminals.
Contact resistance: The lowest experimentally reported BPFET contact resistivity is 1.14 Ω·mm [Haratipour et al. 2015]. We adopt linear scaling to obtain the contact resistance for the nanoscale device. Therefore, the contact resistance at each terminal for a BPFET with 32nm channel width is 1.14 Ω·mm divided by 32×10^-6 mm (i.e., 35.6 kΩ).
Note that the value calculated above is very high, which reduces the drive currents to negligible magnitudes and inhibits proper switching behavior of the FET. However, there is active interest in the research community in scaling down this high contact resistivity (e.g., Liu et al. [2014] and Haratipour et al. [2015]), and we expect contact resistance to be scaled down as the BP device technology becomes more mature, such that the effective resistance at nanoscale feature sizes is in the 1-5 kΩ range. Also, we should note that the nanoscale CMOS transistors in present technology nodes have a contact resistance of 0.1-1 kΩ. Therefore, in order to hold an advantage over the CMOS transistors, the nanoscale BPFET contact resistance should at least be in the range of the nanoscale CMOS contact resistance, and we expect developments in process technology to achieve this goal. The contact resistance of the BPFET is assumed to be 2 kΩ in this work.
Parasitic capacitance: Similar to the TMDCFET, the planar double-gate BPFET shown in Figure 2(a) has a structure similar to the double-gate MOSFET with an extremely thin semiconductor body. Therefore, we adopt the parasitic capacitance value calculated in Section 2.3.1 for the BPFETs as well (i.e., 16×10^-3 fF).
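The linear scaling used for the extrinsic parameters in this section can be reproduced with a few lines of arithmetic. The sketch below assumes that the resistivities are width-normalized values in ohm-micrometers (so 1.14 Ω·mm becomes 1140 Ω·μm); the function names are hypothetical and chosen only for this illustration.

```python
def scale_contact_resistance(resistivity_ohm_um, width_um):
    """Contact resistance (ohm) from a width-normalized resistivity (ohm*um)."""
    return resistivity_ohm_um / width_um

def scale_capacitance(cap_ff_per_um, width_um):
    """Parasitic capacitance (fF) from a width-normalized capacitance (fF/um)."""
    return cap_ff_per_um * width_um

WIDTH_UM = 0.032  # 32 nm channel width

# TMDCFET: 200 ohm*um contact resistivity.
r_tmdc = scale_contact_resistance(200.0, WIDTH_UM)
# BPFET: 1.14 ohm*mm, converted to ohm*um before scaling.
r_bp = scale_contact_resistance(1.14 * 1e3, WIDTH_UM)
# Fringe capacitance: 0.05 fF/um with the 10x margin for nonidealities.
c_fringe = scale_capacitance(0.05 * 10, WIDTH_UM)

print(f"TMDCFET contact resistance: {r_tmdc / 1e3:.2f} kOhm")
print(f"BPFET contact resistance:   {r_bp / 1e3:.1f} kOhm")
print(f"Fringe capacitance:         {c_fringe * 1e3:.0f}e-3 fF")
```

Note that the BPFET simulations in this work use the assumed 2 kΩ value rather than the scaled value computed here.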
EVALUATION OF TMDCFET/BPFET SRAMS
In this section, we present a comparative study of the static power, read stability and writability, read/write delay, and area of the classical 6T-SRAM bitcell in three technologies: CMOS, TMDCFET, and BPFET. We use the high-performance (HP) and low-power (LP) 16nm predictive technology models (PTM) [ASU 2012] to implement the CMOS SRAM, and use the TMDCFET and BPFET extrinsic models presented in Section 2 to implement monolayer FET SRAMs. The classical 6T-SRAM cell circuit is presented in Figure 5. The data is stored in the SRAM cell at nodes "Q" and "QB." The access transistors M3 and M6 serve as access ports to the nodes Q and QB for read and write operations. For the pTMDCFET, the bottom gate voltage is 0.02V, and for the nTMDCFET, the top gate voltage is -0.5V [Lam et al. 2014b]. The bitline capacitance is 1fF for all designs. SRAM sizing is a very important aspect of SRAM design because it affects the performance of the SRAM bitcell, and arbitrary sizing may result in suboptimal performance. There are three categories of transistors that can be sized to optimize an SRAM bitcell: the pull-down nFETs (M1, M4), the nFET access transistors (M3, M6), and the pull-up pFETs (M2, M5). The two transistor dimensions that can be tuned are the channel length and the channel width. In the following sections, we discuss the effect of channel length and channel width on static power, read stability and writability, and read/write delay, and describe the sizing approach followed in this work to ensure a fair comparison of the CMOS, TMDCFET, and BPFET SRAMs.
Static Power
The static or leakage power of the bitcells depends on the leakage current of the access, pull-up, and pull-down transistors. The channel length (L) inversely affects the leakage current (I OFF), whereas the channel width (W) directly affects the leakage current, that is, $I_{OFF} = k\,(W/L)$, where k is a constant. Since the leakage power can be decreased (increased) by increasing (decreasing) the channel length, we maintain a uniform channel length of 16nm for all the transistors for fairness in static power comparison. Figure 5 shows that the pFET (M2) and the nFET (M4) in the cross-coupled inverters also affect the leakage power of the bitcell. We size the nFETs and pFETs to realize balanced inverters, thereby equalizing their effect on the leakage power. Depending on the ratio of the drive strengths of the nFETs and pFETs of CMOS, TMDCFET, and BPFET, we obtain a (W P /W N ) of (3/1), (1/3), and (1/1.4), respectively.
To compare the static power of LP-CMOS, HP-CMOS, TMDCFET, and BPFET SRAMs for V DD varying from 0.2V to 0.8V, the bitlines are clamped to V DD during the hold phase of the SRAM. The access transistor connected to the node storing 0 has V DS ≈ V DD , which leads to subthreshold current flow. This study also measures leakage current through the cross-coupled inverters. Figure 6(a) plots the static power of CMOS, TMDCFET, and BPFET SRAMs for varying V DD . On average, below 0.5V, the TMDCFET and BPFET SRAMs show 6 orders and 1 order of magnitude improvement in static power over HP-CMOS SRAMs, respectively. In comparison to LP-CMOS SRAMs, only TMDCFET SRAMs show an improvement in static power below 0.5V, of 2 orders of magnitude. The improvement in static power stems from the lower I OFF of the TMDCFETs and BPFETs compared with 16nm CMOS. Also, the improvements exhibited by the TMDCFET SRAMs are higher than those of the BPFET SRAMs. This can be attributed to the lower I OFF of TMDCFETs compared with BPFETs, as evident from Figures 3(a) and 4(a).
For the TMDCFETs, the leakage current is partially modulated by the back gate voltage V BG , and it decreases with decreasing V BG . The effect of tuning V BG on the static power of TMDCFET SRAMs is demonstrated in Figure 6(b). A reduction of V BG leads to a diminished carrier concentration in the vertical channel of TMDCFETs and causes a left shift of the pTMDCFET I D -V GS curve (refer to Figure 3(b)), thereby increasing its threshold voltage. This lowers the leakage current and hence the static power.
Stability: Dynamic Noise Margins
This section compares the read stability and writability of TMDCFET, BPFET, and HP-CMOS SRAMs, characterized by DRNM and DWNM, respectively. Note that LP-CMOS SRAMs are not considered henceforth because the high threshold voltage of LP-CMOS (≈0.5V) precludes the operation of LP-CMOS circuits below 0.5V (the region of interest in this work). A read-disturb failure occurs when there is an undesired flip in the SRAM state during a read operation. DRNM is defined as the minimum voltage difference between the storage nodes during a read operation [Dehaene et al. 2007]. A write failure is the failure to flip the state of the SRAM cell in a write cycle. DRNM and DWNM capture the dynamic behavior of read and write operations and are hence better measures than the static noise margin, which assumes an infinite wordline pulse. In this work, we consider the critical wordline pulse width (WL CRIT ), defined as the minimum wordline pulse required to flip the SRAM state, as the measure of DWNM [Wang et al. 2008].
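As an illustration of these two definitions, the sketch below extracts DRNM from sampled node voltages and searches for WL CRIT by bisection. The waveform samples are made up, and the callback `flips` is a hypothetical stand-in for a transient simulation that reports whether a given wordline pulse width flips the cell; the search also assumes that flipping is monotonic in pulse width.

```python
def drnm(v_q, v_qb):
    """DRNM: minimum of (V_QB - V_Q) over a read, with Q storing 0 and QB storing 1."""
    return min(b - a for a, b in zip(v_q, v_qb))

def wl_crit(flips, lo, hi, tol=1e-13):
    """Bisect for the shortest wordline pulse (s) for which flips(width) is True.

    Assumes flips(lo) is False, flips(hi) is True, and that flipping is
    monotonic in the pulse width.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if flips(mid):
            hi = mid
        else:
            lo = mid
    return hi

# Made-up waveform samples (V); not simulation results.
v_q = [0.00, 0.05, 0.09, 0.07, 0.04]
v_qb = [0.40, 0.39, 0.37, 0.38, 0.40]
print(drnm(v_q, v_qb))

# Stand-in for a transient simulation: the cell flips for pulses >= 20 ps.
print(wl_crit(lambda width: width >= 20e-12, 0.0, 100e-12))
```

Under this convention a larger DRNM indicates better read stability, whereas a smaller WL CRIT indicates better writability.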
DRNM and DWNM depend on the SRAM bitcell beta (β) and alpha (α) ratios, respectively. β is defined as the ratio of the width of the nFET in the inverter (W M1 ) to the width of the nFET access transistor (W M3 ). A higher β signifies a stronger inverter nFET, resulting in a higher DRNM. α is defined as the ratio of the width of the nFET access transistor (W M3 ) to the width of the inverter pFET (W M2 ). A higher α signifies a stronger access transistor, resulting in a higher DWNM.
Traditional SRAM designs use static read noise margins (SRNMs) and static write noise margins (SWNMs), measured in units of voltage (mV), to characterize read stability and writability. For sizing of generic SRAMs, α and β are chosen so that SRNM and SWNM are equal. In contrast, DRNM is measured in mV and DWNM is measured in units of time (ps), so we cannot apply the scheme of equalizing DRNM and DWNM to obtain the optimum α and β for sizing. The choice of α and β depends on the required DRNM and DWNM for the SRAM bitcell. Since we present a comparative evaluation of read stability and writability, we cannot size the SRAMs based on a predefined value of DRNM and DWNM, as that would defeat the purpose of this study. We consider a β of 1.5 for all the SRAM designs to ensure the same ratio of the drive strength of the nFET in the inverter to that of the nFET access transistor. However, we cannot consider the same α for the CMOS, TMDCFET, and BPFET SRAMs because they have different ratios of the drive strengths of pFETs and nFETs.
• CMOS SRAM: We consider α=1.5 for the CMOS SRAM. Note that the effective ratio of the drive strengths of nMOS (access transistor) to that of pMOS (inverter) with α=1.5 is 1.5/1 (since the inverters are balanced).
• TMDCFET SRAM: The ratio of the drive strength of an nTMDCFET to the drive strength of a pTMDCFET is 1/3. Hence, we consider α=(3×1.5)=4.5 to obtain an effective ratio of the drive strength of the nTMDCFET (access transistor) to the drive strength of the pTMDCFET (inverter) of 1.5/1, similar to the CMOS SRAM.
• BPFET SRAM: The ratio of the drive strength of an nBPFET to the drive strength of a pBPFET is 1/1.4. Hence, we consider α=(1.5×1.4)=2.1 to obtain an effective ratio of the drive strength of the nBPFET (access transistor) to the drive strength of the pBPFET (inverter) of 1.5/1, similar to the CMOS SRAM.
Figure 7. (a) Comparison of DRNM of CMOS, TMDCFET, and BPFET SRAMs at various supply voltages. On average, TMDCFET and BPFET SRAMs have 15.8% and 5.1% higher DRNM than CMOS SRAMs, respectively, for V DD below 0.5V. This offers better stability and reliability at low operating voltages, which is one of the main concerns for low-voltage operation. The improvement that the TMDCFET/BPFET SRAMs offer over CMOS SRAMs arises primarily from the low subthreshold current (I OFF ) exhibited by the TMDCFETs/BPFETs. (b) Comparison of DWNM, as measured using WL CRIT , of CMOS, TMDCFET, and BPFET SRAMs at various supply voltages. On average, TMDCFET and BPFET SRAMs offer 6.6× and 0.5× improvement, respectively, over CMOS SRAMs for V DD below 0.5V, reducing the probability of write failures. This improvement is due to lower TMDCFET/BPFET SRAM node (Q and QB) capacitance and higher write current.
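The α values listed above follow from dividing the target access-to-pull-up drive ratio of 1.5/1 by the n-to-p drive-strength ratio of each technology. A minimal sketch of that arithmetic is given below; the dictionary layout and the function name `alpha_for` are illustrative assumptions, and the CMOS entry is set to 1 because α is applied directly to the balanced CMOS inverter.

```python
from fractions import Fraction

# Target drive ratio of access transistor to inverter pFET (1.5/1).
TARGET_RATIO = Fraction(3, 2)

# n-to-p drive-strength ratio per unit width, as quoted in the list above.
# CMOS is entered as 1 because alpha=1.5 is applied to the balanced inverter.
N_TO_P_DRIVE = {
    "CMOS": Fraction(1),
    "TMDCFET": Fraction(1, 3),
    "BPFET": Fraction(10, 14),  # 1/1.4
}

def alpha_for(tech):
    """Alpha that yields the target effective drive ratio for a technology."""
    return TARGET_RATIO / N_TO_P_DRIVE[tech]

for tech in N_TO_P_DRIVE:
    print(tech, float(alpha_for(tech)))
```

Exact fractions are used only to avoid floating-point rounding in the 1.5×1.4 product.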
The aforementioned sizing methodology ensures equal drive strength ratios between the access transistors and the inverter nFETs and pFETs across CMOS, TMDCFET, and BPFET SRAMs, thereby ensuring fairness in comparison. Note that to achieve the α and β values discussed earlier, we vary the access transistor width while keeping the inverters balanced. This results in different access transistor widths for the read stability and writability evaluations. We adopt this sizing methodology to equalize the relative effect of the active transistors during read (M3, M1) and write operations (M5, M6) across the CMOS, TMDCFET, and BPFET SRAMs. This is in contrast to the traditional sizing scheme, where the FETs in the inverters are sized according to α and β while the access transistor width is kept constant. Note that once a particular technology is determined, we can adopt the traditional sizing scheme to achieve the desired DRNM and DWNM.
Figure 7(a) shows the variation of DRNM for varying V DD . On average, the DRNM of TMDCFET and BPFET SRAMs is 15.8% and 5.1% higher, respectively, than that of HP-CMOS SRAMs for V DD below 0.5V. Higher DRNM results in higher read stability against read-disturb failure for TMDCFET and BPFET SRAMs. The improvement that the TMDCFET/BPFET SRAMs offer over CMOS SRAMs arises primarily from the low subthreshold current (I OFF ) exhibited by the TMDCFETs/BPFETs. Referring to Figure 5, during a read operation, the voltage (V Q ) of the SRAM node storing 0 (Q) increases as the precharged bitline (BL) discharges through the access transistor (M3) and the pull-down transistor (M1). As V Q increases, the gate voltage of the nFET (M4) starts to rise. The subthreshold current of this nFET then starts discharging the node storing 1 (QB), decreasing its voltage (V QB ). A higher subthreshold current results in a lower V QB and therefore a lower DRNM (V QB - V Q ). The TMDCFET/BPFET SRAMs therefore demonstrate better DRNM than CMOS SRAMs, owing to their low subthreshold current.
indicates a relatively lower probability of experiencing a write failure in TMDCFET and BPFET SRAMs. The improvement in DWNM demonstrated by the TMDCFET/BPFET SRAMs over CMOS SRAMs can be attributed to the lower SRAM node (Q and QB) capacitance and higher write current of the TMDCFET/BPFET SRAMs. The SRAM node capacitance of TMDCFET/BPFET SRAMs is lower than that of CMOS SRAMs and hence requires a shorter wordline pulse for adequate discharge during a write. Also, the TMDCFET/BPFET SRAMs have a higher write current, arising from the higher I ON of TMDCFETs/BPFETs. A higher write current results in faster discharge of the SRAM node capacitance, so a shorter wordline pulse suffices for a successful write. Together, these two factors contribute to a lower WL CRIT , representing better writability of the TMDCFET/BPFET SRAMs.
The high I ON /I OFF for TMDCFETs and BPFETs results in improvement of both read and write margins, in contrast to the general trend where an improvement in one usually degrades the other. Figures 7(a) and 7(b) show that DRNM and DWNM of TMDCFET and BPFET SRAMs scale linearly with supply voltage, unlike that of CMOS SRAMs. This shows the potential of the TMDCFET and BPFET systems to scale down to very low voltages.
Performance: Read/Write Delay
This section compares the read/write delays of TMDCFET, BPFET, and HP-CMOS SRAMs. Read/write delays measure SRAM performance and directly affect memory latencies. The read and write delays are affected by α and β in a similar manner to read stability and writability. Hence, we adopt the same sizing scheme for the read and write delay evaluations as discussed in the previous section for read stability and writability, respectively.
Read delay: Read delay is measured as the time between 50% of wordline activation and the development of a voltage difference between the bitlines equal to 10% of the precharge voltage, assuming differential sensing. The read delay for varying V DD is shown in Figure 8(a). TMDCFET and BPFET SRAMs have better read delay than CMOS SRAMs below 0.7V. Below 0.5V, on average, the read delay of TMDCFET and BPFET SRAMs shows 89× and 99× improvement, respectively, over CMOS SRAMs. The primary reason for this improvement is the high I ON of TMDCFETs and BPFETs at lower supply voltages. A higher read current leads to faster discharge of the bitline capacitance, and hence quicker development of the voltage differential between the two bitlines required for a successful read, which lowers the read delay of the TMDCFET/BPFET SRAMs.
Write delay: The write delay is measured as the time between 50% activation of the wordline and a decrease from V DD to 0.1V DD at the internal node storing 1. The write delay for varying V DD is shown in Figure 8(b). At V DD below 0.5V, the write delay of TMDCFET and BPFET SRAMs is 4.25× and 1.5× better, respectively, than that of CMOS SRAMs. The reason for the advantage in write delay is similar to that for the advantage in WL CRIT , that is, the lower SRAM node (Q and QB) capacitance and higher write current of the TMDCFET/BPFET SRAMs. A lower SRAM node capacitance requires a shorter time to adequately discharge during a write operation due to a smaller time constant, and a higher write current further reduces the discharge time. Together, these two factors contribute to the lower write delay of TMDCFET/BPFET SRAMs when compared to CMOS SRAMs.
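The two delay definitions above can be applied to sampled transient waveforms as sketched below. The sketch assumes that the bitline precharge voltage equals V DD and locates threshold crossings by linear interpolation between samples; the function names and the waveform samples are hypothetical and serve only to illustrate the measurement.

```python
def crossing_time(t, v, level, rising=True):
    """First time at which waveform v(t) crosses `level` (linear interpolation)."""
    for k in range(1, len(t)):
        a, b = v[k - 1], v[k]
        crossed = (a < level <= b) if rising else (a > level >= b)
        if crossed:
            return t[k - 1] + (level - a) * (t[k] - t[k - 1]) / (b - a)
    raise ValueError("waveform never crosses the requested level")

def read_delay(t, v_wl, v_bl, v_blb, vdd):
    """50% wordline rise to a bitline differential of 10% of the precharge voltage.

    Assumes the bitlines are precharged to vdd.
    """
    t_start = crossing_time(t, v_wl, 0.5 * vdd)
    diff = [abs(x - y) for x, y in zip(v_bl, v_blb)]
    return crossing_time(t, diff, 0.1 * vdd) - t_start

def write_delay(t, v_wl, v_node, vdd):
    """50% wordline rise to the node storing 1 falling to 0.1*vdd."""
    t_start = crossing_time(t, v_wl, 0.5 * vdd)
    return crossing_time(t, v_node, 0.1 * vdd, rising=False) - t_start

# Made-up samples (time in ps, voltage in V); not simulation results.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
v_wl = [0.0, 0.4, 0.4, 0.4, 0.4]
v_bl = [0.40, 0.40, 0.38, 0.34, 0.30]
v_blb = [0.40, 0.40, 0.40, 0.40, 0.40]
v_node = [0.4, 0.4, 0.2, 0.0, 0.0]
print(read_delay(t, v_wl, v_bl, v_blb, vdd=0.4))
print(write_delay(t, v_wl, v_node, vdd=0.4))
```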
Area
In this section, we discuss the relative areas of the CMOS, TMDCFET, and BPFET SRAMs. As elaborated in the previous sections, the access transistor size is considered different during the read and write operations. Hence, we discuss the relative areas during both read and write operations. The 2D transistors represent a nascent technology and hence standard cell layouts are not available for area calculation. Hence, we report the approximate areas obtained from the sizing information.
Read:
In this section, we discuss the relative areas of the CMOS, TMDCFET, and BPFET SRAMs based on transistor sizing discussed in Section 3.2 for the read operation. Note that the β value considered is 1.5 for all the SRAMs.
• CMOS SRAM: The total width of the nMOS (2×W) in the inverters, the pMOS in the inverters (2×3×W), and the nMOS access transistors (2×W/β) is 9.3×W.
• TMDCFET SRAM: The total width of the nTMDCFETs (2×3×W) in the inverters, the pTMDCFETs in the inverters (2×W), and the nTMDCFET access transistors (2×(3/β)×W) is 12×W.
• BPFET SRAM: The total width of the nBPFETs (2×1.4×W) in the inverters, the pBPFETs in the inverters (2×W), and the nBPFET access transistors (2×(1.4/β)×W) is 6.7×W.
Hence, if we consider the area of a CMOS SRAM bitcell as A, the TMDCFET SRAM will have an area of 1.3×A, whereas the BPFET SRAM will have an area of 0.72×A. Please note that these areas are approximate, obtained by following a marginally different sizing methodology.
Write:
In this section, we discuss the relative areas of the CMOS, TMDCFET, and BPFET SRAMs based on the sizing discussed in Section 3.2 for the write operation.
• CMOS SRAM: The total width of the nMOS (2×W) and the pMOS in the inverters (2×3×W), and the nMOS access transistors (2×α×W) is 11×W, where α=1.5 for CMOS SRAMs.
• TMDCFET SRAM: The total width of the nTMDCFETs (2×3×W) and the pTMDCFETs in the inverters (2×W), and the nTMDCFET access transistors (2×α×W) is 17×W, where α=4.5 for TMDCFET SRAMs.
• BPFET SRAM: The total width of the nBPFETs (2×1.4×W) and the pBPFETs in the inverters (2×W), and the nBPFET access transistors (2×α×W) is 9×W, where α=2.1 for BPFET SRAMs.
Hence, if we consider the area of a CMOS SRAM bitcell as A, the TMDCFET SRAM will have an area of 1.5×A, whereas the BPFET SRAM will have an area of 0.8×A. Again, note that these areas are approximate, obtained by following a marginally different sizing methodology.
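The width totals and relative areas listed above for the read and write sizing can be cross-checked with a short script. This is a minimal sketch that only repeats the arithmetic of this section, assuming area scales with total transistor width; the function and dictionary names are illustrative and not part of the simulation framework.

```python
# Relative bitcell area from total transistor width, in units of W.
# Sizing factors are taken from the text; all names are illustrative.

def total_width(n_inv, p_inv, access):
    """Total width of a 6T bitcell: two inverter nFETs, two inverter pFETs,
    and two access nFETs."""
    return 2 * n_inv + 2 * p_inv + 2 * access

BETA = 1.5  # cell ratio used for the read sizing

# (inverter nFET width, inverter pFET width) per technology, in units of W
INVERTER = {"CMOS": (1.0, 3.0), "TMDCFET": (3.0, 1.0), "BPFET": (1.4, 1.0)}
# access transistor width factor (alpha) used for the write sizing
ALPHA = {"CMOS": 1.5, "TMDCFET": 4.5, "BPFET": 2.1}

# Read sizing: access width = (inverter nFET width) / beta
read_width = {t: total_width(n, p, n / BETA) for t, (n, p) in INVERTER.items()}
# Write sizing: access width = alpha
write_width = {t: total_width(n, p, ALPHA[t]) for t, (n, p) in INVERTER.items()}

for name, widths in (("read", read_width), ("write", write_width)):
    for tech, w in widths.items():
        rel = w / widths["CMOS"]  # area relative to the CMOS bitcell (A)
        print(f"{name:5s} {tech:8s} width = {w:5.2f} W, area = {rel:.2f} A")
```

The computed ratios agree with the values quoted in the text to within rounding of the intermediate widths.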
RA/WA TECHNIQUES
In the previous section, we discussed improvements in static power, stability, and performance of TMDCFET and BPFET SRAMs over CMOS SRAMs. However, it is also necessary to consider read-/write-assisted CMOS SRAMs, because they improve SRAM performance to a considerable extent [Pilo et al. 2007; Chandra et al. 2010; Zimmer et al. 2012; Nalam et al. 2011] . In this section, we evaluate whether TMDCFET and BPFET SRAMs preserve their advantages over read/write-assisted CMOS SRAMs. We will also discuss the overhead of using the RA/WA techniques.
RA Techniques
In this section, we adopt four leading RA techniques for CMOS SRAMs and evaluate whether the nominal TMDCFET and BPFET SRAMs maintain their advantages in read operation against these read-assisted CMOS SRAMs. We consider V DD raising, GND lowering, wordline lowering, and bitline precharge voltage-lowering techniques for RA. The same increase and decrease (0.3V DD ) in voltage levels is adopted for all the RA techniques for fairness of comparison.
V DD raising: An increased V DD increases V GS across the pull-down device in the cross-coupled inverter, strengthening it with respect to the access transistor and enhancing read stability.
GND lowering: Similar to V DD raising, GND lowering increases V GS across the pull-down device in the cross-coupled inverter and strengthens it with respect to the access transistor, thereby enhancing read stability.
Wordline lowering: Wordline lowering decreases the V GS of the access transistor by decreasing the gate voltage and hence its drive strength, thereby making it weaker with respect to the pull-down device in the cross-coupled inverter and improving read stability.
Bitline precharge voltage lowering:
In this scheme, the bitlines are precharged to a value lower than V DD during the read cycle. Therefore, as the V DS of the access transistor decreases, its drive strength decreases, improving read stability.
Figure 9 demonstrates the effectiveness of the RA techniques and the improvement in DRNM for CMOS SRAMs. V DD raising and GND lowering show better improvement at lower supply voltages in comparison to wordline lowering and bitline precharge voltage lowering. However, even when CMOS SRAMs use V DD raising and GND lowering, the DRNM is, on average, 0.8% lower than the DRNM of TMDCFET SRAMs below 0.5V. Hence, RA techniques do not raise the DRNM of CMOS SRAMs above that of TMDCFET SRAMs at low voltages. In contrast, CMOS SRAMs with V DD raising and GND lowering exhibit better DRNM than BPFET SRAMs.
WA Techniques
In this section, we adopt four leading WA techniques for CMOS SRAMs and evaluate whether the nominal TMDCFET and BPFET SRAMs preserve their advantages in write operation against write-assisted CMOS SRAMs. We consider V DD lowering, GND raising, wordline boosting, and bitline lowering (negative bitline) techniques for WA. The same increase and decrease (0.3V DD ) in voltage levels is adopted for all the WA techniques for fairness of comparison.
V DD lowering: A decreased V DD reduces V GS across the pull-up device in the cross-coupled inverter, weakening it with respect to the access transistor and making it easier to write to the SRAM cell.
GND raising: Similar to V DD lowering, a raised GND weakens the pull-up device in the cross-coupled inverter with respect to the access transistor, making it easier to write to the SRAM cell.
Wordline boosting: Wordline boosting increases the V GS of the access transistor by increasing the gate voltage and hence its drive strength, thereby making it stronger with respect to the pull-up device in the cross-coupled inverter.
Negative Bitline: Negative bitline increases the V GS of the access transistor by decreasing the source voltage, increasing its drive strength and making it stronger with respect to the pull-up device in the cross-coupled inverter.
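The bias changes behind the four RA techniques of Section 4.1 and the four WA techniques above can be tabulated with simple bookkeeping. The following is a minimal sketch under an idealized assumption that the internal nodes sit exactly at the (possibly shifted) supply rails; it involves no device model, and the function and key names are illustrative.

```python
# First-order bias bookkeeping for the RA/WA techniques.
# Idealized assumption: node storing 1 is at the cell supply, node storing 0
# is at the cell ground. No transistor model is used.

def assist_biases(vdd, frac=0.3):
    """Return the terminal voltage (in volts) targeted by each assist
    technique; the nominal value without assist is vdd in every case."""
    dv = frac * vdd  # common assist step (0.3 V_DD in the text)
    return {
        # Read assist: strengthen the pull-down or weaken the access transistor
        "ra_vdd_raising_pulldown_vgs": (vdd + dv) - 0.0,    # gate at raised supply
        "ra_gnd_lowering_pulldown_vgs": vdd - (-dv),        # source at lowered ground
        "ra_wordline_lowering_access_vgs": (vdd - dv) - 0.0,  # lowered wordline
        "ra_bitline_lowering_access_vds": (vdd - dv) - 0.0,   # lowered precharge
        # Write assist: weaken the pull-up or strengthen the access transistor
        "wa_vdd_lowering_pullup_vsg": (vdd - dv) - 0.0,     # source at lowered supply
        "wa_gnd_raising_pullup_vsg": vdd - dv,              # gate at raised ground
        "wa_wordline_boosting_access_vgs": (vdd + dv) - 0.0,  # boosted wordline
        "wa_negative_bitline_access_vgs": vdd - (-dv),      # source at negative bitline
    }

if __name__ == "__main__":
    for name, volts in assist_biases(0.5).items():
        print(f"{name:34s} {volts:.2f} V (nominal 0.50 V)")
```

In this idealized view, every technique shifts the relevant gate or drain bias by the same 0.3V DD , which is the sense in which the comparison across techniques is uniform; the actual effect on DRNM or WL CRIT depends on the device characteristics and is obtained from simulation.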
Figure 10(a) demonstrates the effectiveness of the WA techniques and the resultant improvement in WL CRIT for CMOS SRAMs. Wordline boosting and negative bitline show better improvement at lower supply voltages in comparison to V DD lowering and GND raising, which is consistent with Chandra et al. [2010] . We observe that TMDCFET SRAMs demonstrate better DWNM than negative bitline CMOS SRAMs 
Overhead of RA/WA Techniques
RA: V DD raising and GND lowering are the best RA techniques, as discussed in Section 4.1. For both these techniques, a second external power supply is utilized during the read operation, which is connected to the write-selected row via a multiplexer. The multiplexer is used to switch between the two power supplies. These schemes incur a switching latency penalty. The wordline of a row has a high capacitance, contributed by the wire capacitance of the wordline and the gate capacitance of the access transistors. Therefore, switching the voltage of a wordline requires a considerably high charge/discharge time. This switching latency is added to the actual read latency of the SRAM array. Since memory reads lie on the critical path of program execution, this may lower the performance of the system as a whole.
WA: Due to column multiplexing, some cells are half-selected during write operations. For these half-selected cells, the wordline is asserted, but bitlines are left precharged at V DD . Essentially, these cells experience a read operation when fully selected columns are written. Half-selected cells have their wordline enabled, and thus WA techniques that focus on wordline voltage negatively affect DRNM. V DD - and GND-based techniques also affect the DRNM of half-selected cells if the core array voltages are scaled. Figure 10(b) shows the DRNM of the half-selected cells when WA techniques are used. It is observed that all WA schemes deteriorate DRNM of half-selected cells, with V DD lowering exhibiting the least desirable behavior.
The discussion in this section demonstrates that read/write-assisted CMOS SRAMs have read stability and writability comparable to those of TMDCFET/BPFET SRAMs. However, we must also consider the overheads in terms of latency (RA) and degradation of read stability of half-selected cells (WA). Table I summarizes the improvements that TMDCFET/BPFET SRAMs offer over CMOS SRAMs for the static power, stability, and performance metrics.
SRAM ARRAY EVALUATION: TMDCFET/BPFET SRAMS
In the previous sections, we focused on performing a comparative study of the static power, read stability and writability, and read/write delays of a single CMOS, TMDCFET, and BPFET SRAM cell, with a uniform bitline capacitance for all the designs. However, it is also necessary to analyze the advantages of the TMDCFET/BPFET SRAMs at the SRAM array level, because the bitline capacitance in an array depends on the extrinsic capacitance of the access transistors attached to it, which is design- and technology-dependent. In this section, we perform a detailed study of the TMDCFET and BPFET SRAMs at the array level, guided by prior work for determining a uniform optimal array design. We analyze how the improvements exhibited by TMDCFET/BPFET SRAMs over CMOS SRAMs are affected during array-level operations.
We utilize prior work in the area of optimal SRAM array design to decide on the number of rows to be considered for the CMOS, TMDCFET, and BPFET SRAM arrays. For a fixed memory capacity, Kim et al. [2008] demonstrate that the maximum operational frequency decreases as the number of rows in an SRAM array increases. However, Calhoun and Chandrakasan [2007] report that increasing the number of rows reduces the number of requisite peripheral circuits and hence reduces circuit area, which is very important to achieve high-density memory structures. Modern SRAM array designs generally choose from 128, 256, and 512 rows, depending on the target applications. We consider the central value of 256 rows from this range for our SRAM array, to make our evaluations agnostic to target applications. SRAM arrays with 256 rows have also been widely adopted in prior work [Calhoun and Chandrakasan 2007; Verma and Chandrakasan 2008; Do et al. 2009, 2015] for optimal array design.
The speed of SRAM array operations depends on the bitline capacitance, bitcell read/write current, and the minimum required voltage differential between the two bitlines for a successful read [Pavlov and Sachdev 2008]. The effective bitline capacitance of any column in an SRAM array is composed of the bitline wire capacitance and the extrinsic capacitance of the access transistors of that array column. In current technology nodes, the wire capacitance is responsible for half of the effective bitline capacitance, while the other half is attributed to the access transistors' extrinsic capacitance [Pavlov and Sachdev 2008]. Since the extrinsic capacitances of the CMOS FETs, TMDCFETs, and BPFETs differ, the bitline capacitance will be different for their SRAM arrays.
In prior sections, all the SRAM designs were loaded with a uniform bitline capacitance of 1fF. We should also note that the SRAM array performance is dependent on the ratio of the bitline capacitance to the node (Q and QB) capacitance of the bitcells in a column. This ratio must be equal for all the SRAM designs to ensure a fair comparison, which is not true when a uniform bitline capacitance is used to load all the designs. In this section, we focus on a detailed and accurate estimation of the bitline capacitance for the individual technologies (CMOS, TMDCFET, and BPFET) independently. Then, we integrate those values in the SRAM simulation framework to evaluate the static power, read stability and writability, and read/write delay for the CMOS, TMDCFET, and BPFET SRAM arrays. We also interpret any change in the improvements over CMOS SRAMs at the array level in comparison to the bitcell-level studies.
SRAM Array Design: Bitline Capacitance Estimation
In this section, we describe the details of the methodology adopted for bitline capacitance estimation of the CMOS, TMDCFET, and BPFET SRAM arrays. First, the single bitcell node capacitance (C Q ) is evaluated from simulations. Second, it is evident from Figure 5 that the bitcell node capacitance is mainly attributed to the access transistor (M3), and the nFET (M1) and pFET (M2) of one of the cross-coupled inverters of the SRAM structure. With a simplifying assumption that all the three transistors contribute uniformly to the SRAM node capacitance, the contribution of the access transistor is obtained by dividing the SRAM node capacitance by 3. Third, the bitline capacitance component that can be attributed to the access transistors is the value obtained in the second step scaled by the total number of rows in a column, which is 256 for our SRAM array design. The result is then doubled to obtain the effective bitline capacitance, since access transistors constitute only half of the total bitline capacitance.
Fig. 11. Comparison of static power of CMOS, TMDCFET, and BPFET SRAM arrays at varying V DD . On average, TMDCFET SRAM arrays show 6 and 2 orders of magnitude reduction over HP-CMOS and LP-CMOS SRAM arrays, respectively, below 0.5V. BPFET SRAM arrays, on average, exhibit 10× reduction in static power over HP-CMOS SRAM arrays above 0.5V, and no improvement over LP-CMOS SRAM arrays. Therefore, for static power, we do not observe any change in the improvements from the bitcell level, primarily due to the constant contact resistances of the bitcell and access transistors.
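The bitcell node capacitance evaluations provide us with values of 350×10⁻³ fF, 128×10⁻³ fF, and 87×10⁻³ fF for the CMOS, TMDCFET, and BPFET bitcells, respectively. Therefore, dividing each value by 3, scaling by 256 rows, and doubling the result, the effective bitline capacitances for the CMOS, TMDCFET, and BPFET SRAM arrays are as follows:
\[ C_{BL}^{\mathrm{CMOS}} = 2 \times 256 \times \frac{350 \times 10^{-3}\,\mathrm{fF}}{3} \approx 60\,\mathrm{fF} \]
\[ C_{BL}^{\mathrm{TMDCFET}} = 2 \times 256 \times \frac{128 \times 10^{-3}\,\mathrm{fF}}{3} \approx 22\,\mathrm{fF} \]
\[ C_{BL}^{\mathrm{BPFET}} = 2 \times 256 \times \frac{87 \times 10^{-3}\,\mathrm{fF}}{3} \approx 15\,\mathrm{fF} \]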
Note that the estimated bitline capacitances for all the SRAM arrays are greater than the initial lumped bitline capacitance of 1fF. This estimation provides a uniform ratio between the bitline capacitance and the bitcell node capacitance for all the SRAM arrays for fairness of comparison.
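The three-step estimation procedure of this section can be written as a short calculation. This is a minimal sketch of the arithmetic only, using the node capacitances and the 256-row array quoted in the text; the function and variable names are illustrative.

```python
# Bitline capacitance estimate from the bitcell node capacitance (all in fF).
ROWS = 256  # rows per column in the SRAM array

def bitline_capacitance_fF(c_node_fF, rows=ROWS):
    """Effective bitline capacitance following the three steps in the text."""
    # Step 2: the access transistor is assumed to contribute one third of C_Q.
    c_access = c_node_fF / 3.0
    # Step 3: one access transistor per row loads the bitline.
    c_access_total = c_access * rows
    # Wire capacitance is taken to equal the access-transistor component,
    # so the effective bitline capacitance is twice that component.
    return 2.0 * c_access_total

NODE_CAP_FF = {"CMOS": 0.350, "TMDCFET": 0.128, "BPFET": 0.087}
BITLINE_CAP_FF = {t: bitline_capacitance_fF(c) for t, c in NODE_CAP_FF.items()}

for tech, c_bl in BITLINE_CAP_FF.items():
    ratio = c_bl / NODE_CAP_FF[tech]  # identical for all technologies
    print(f"{tech:8s} C_BL = {c_bl:5.1f} fF, C_BL / C_Q = {ratio:.1f}")
```

Because the same factor (2 × 256 / 3) multiplies every node capacitance, the ratio of bitline capacitance to node capacitance is identical across the three technologies, which is the uniformity property noted above.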
Evaluation of TMDCFET/BPFET SRAM Arrays
In this section, we perform a comparative study of the CMOS, TMDCFET, and BPFET SRAM memory arrays. We specifically focus on the evaluation of static power, read stability and writability, and read/write delays of the SRAM arrays, following the same procedure adopted in Section 3. Therefore, we do not describe the implementation details of the evaluation procedure again in this section but focus on interpreting the changes in the advantages exhibited by TMDCFET/BPFET SRAMs in these evaluations. The bitline capacitances for the SRAM array designs determined in Section 5.1 are incorporated in the simulation framework to provide accurate estimates for the performance of the TMDCFET/BPFET SRAM arrays in comparison to the CMOS SRAM arrays.
5.2.1. Static Power. Figure 11 plots the static power of CMOS, TMDCFET, and BPFET SRAM arrays for varying V DD . At the array level, on average, below 0.5V, the TMDCFET and BPFET SRAMs show 6 and 1 orders of magnitude improvement in static power over HP-CMOS SRAMs. In comparison to LP-CMOS SRAM arrays, only TMDCFET SRAM arrays show 2 orders of magnitude improvement in static power below 0.5V. Therefore, for static power, we note that there is no significant difference in advantages offered by the TMDCFET/BPFET SRAMs at the bitcell level, evaluated in Section 3.1, and at the array level.
Fig. 12. (a) Comparison of DRNM of CMOS, TMDCFET, and BPFET SRAM arrays at various supply voltages, after incorporating the bitline capacitance estimates. On average, TMDCFET and BPFET SRAM arrays have 15.5% and 4.9% higher DRNM than CMOS SRAM arrays, respectively, for V DD below 0.5V. Only the TMDCFET SRAM arrays exhibit an average 0.7% improvement over V DD -raised (best read-assisted) CMOS SRAM arrays. We observe that the DRNM advantages exhibited by TMDCFET/BPFET SRAMs in Section 3.2 at the bitcell level have not been significantly compromised at the array level, primarily due to the similar cell ratio β of the SRAMs at the bitcell and the array level. (b) Comparison of DWNM, as measured using WL CRIT , of CMOS, TMDCFET, and BPFET SRAM arrays at various supply voltages. On average, TMDCFET and BPFET SRAM arrays offer 6.6× and 0.5× improvement, respectively, over CMOS SRAM arrays for V DD below 0.5V. However, they exhibit no noteworthy improvement over the negative-bitline (best write-assisted) CMOS SRAM arrays. We observe that the DWNM improvements exhibited by TMDCFET/BPFET SRAMs at the bitcell level in Section 3.2 have not been significantly compromised at the array level. This is due to the unaltered node capacitance and the write current for a bitcell in the SRAM array.
The similarity in improvements can be attributed to the similar contact resistance values for the TMDCFETs/BPFETs at both the single bitcell and the array level. Leakage current is primarily responsible for the static power, and the contact resistances of the bitcell and access transistors are a major factor in determining this leakage current. Since the contact resistances remain unaltered, the static power for each bitcell in the array remains unchanged, and hence, the advantages exhibited are also unaltered in the SRAM array. Even though an increment in bitline capacitance results in an increase in leakage current, it is not significant enough to affect the advantages that the TMDCFET/BPFET SRAMs exhibit over CMOS SRAMs at the array level.
5.2.2. Stability: Dynamic Noise Margins. We utilize the DRNM and DWNM measures, similar to Section 3.2, to compare the dynamic read and write stability of the CMOS, TMDCFET, and BPFET SRAM arrays.
DRNM:
A comparison of the DRNM for CMOS, TMDCFET, and BPFET SRAM arrays at varying V DD is reported in Figure 12(a). On average, the DRNM for TMDCFET and BPFET SRAM arrays is 15.5% and 4.9% higher, respectively, than that of HP-CMOS SRAM arrays for V DD below 0.5V. Also, only the TMDCFET SRAM arrays show an average improvement of 0.7%, below 0.5V, over the best read-assisted CMOS SRAM arrays, that is, V DD -raised CMOS SRAMs, whereas BPFET SRAM arrays show no improvement. Hence, we observe that the DRNM advantages exhibited by TMDCFET/BPFET SRAMs over CMOS SRAMs in Section 3.2 for a single bitcell have not been significantly affected at the array level.
The DRNM depends only on the cell ratio β, which is defined as the ratio of the width of the pull-down transistors and the access transistors. This is explained by referring to Figure 5. The voltage (V Q ) of the node storing 0 (Q) increases during the read operation, due to discharge from the precharged bitline (BL), through the access transistor (M3) and the pull-down transistor (M1). A higher β denotes a stronger pull-down transistor. Hence, for a higher β, the ultimate V Q after a read operation is lower, and the DRNM (V QB -V Q ) is higher. The β value is unaltered from Section 3.2 for each bitcell in the array, and hence, the DRNM is almost unchanged.
DWNM:
Figure 12(b) plots the WL CRIT of CMOS, TMDCFET, and BPFET SRAM arrays for varying V DD . The average improvement in WL CRIT for TMDCFET and BPFET SRAM arrays is 6.6× and 0.5×, respectively, over HP-CMOS SRAM arrays for V DD below 0.5V. Also, the TMDCFET and BPFET SRAM arrays offer no advantage over the best write-assisted CMOS SRAM arrays, that is, negative-bitline CMOS SRAMs. We observe that the DWNM improvements exhibited by TMDCFET/BPFET SRAMs over CMOS SRAMs in Section 3.2 for a single bitcell have not been significantly affected at the array level.
The similarity in improvements can be attributed to the unchanged SRAM node capacitance and the write current for a single bitcell from Section 3.2. Since the SRAM node capacitance and the write current remain consistent, the requisite shortest wordline pulse for a successful write (WL CRIT ) of each bitcell in the array remains unchanged. The DWNM of the SRAM array is dependent on the DWNM of each of its bitcells and, therefore, the DWNM enhancement is unaltered.
5.2.3. Performance: Read/Write Delay. The performance of the CMOS, TMDCFET, and BPFET SRAM arrays is compared in this section, using the read and write delay statistics.
Read delay:
The read delay of TMDCFET, BPFET, and CMOS SRAM arrays for varying V DD is shown in Figure 13(a). Below 0.5V, on average, the read delay of TMDCFET and BPFET SRAM arrays shows 251× and 303× improvement, respectively, over CMOS SRAM arrays. We observe that the read delay advantages of the TMDCFET/BPFET SRAM arrays over CMOS SRAM arrays are further improved in comparison to the read delay evaluations of a single bitcell in Section 3.3.
The higher bitline capacitance of the CMOS SRAM array, when compared to TMDCFET/BPFET SRAM array, is primarily responsible for this enhancement in the read delay advantages of TMDCFET/BPFET SRAMs. An increased bitline capacitance results in a higher read delay, since it takes longer for the bitline capacitance, connected to the SRAM node storing 0, to discharge and develop the required voltage differential for a successful read operation. In the bitcell-level evaluations, all the SRAM designs had a 1fF bitline capacitance. However, the bitline capacitance of the CMOS SRAM arrays (60fF) is higher than that of TMDCFET/BPFET SRAM arrays (22fF/15fF). Therefore, even though the read delay increases for the CMOS, TMDCFET, and BPFET SRAMs at the array level for increased bitline capacitance, the increase in read delay is more for the CMOS SRAM arrays in comparison to the TMDCFET/BPFET SRAM arrays. Therefore, the read delay advantages for TMDCFET/BPFET SRAMs are enhanced at the array level.
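The direction and rough size of this enhancement can be checked with a first-order scaling argument. The sketch below assumes that read delay is proportional to the bitline capacitance for a fixed read current and sensing threshold (t ≈ C·ΔV/I), and that the bitcell-level improvements of 89× and 99× from Section 3.3 were obtained with the same 1fF load on every design; both are simplifying assumptions, and the names are illustrative.

```python
# First-order scaling of the read-delay advantage from bitcell to array level.
# Assumption: read delay is proportional to C_BL at fixed read current.

BITLINE_CAP_FF = {"CMOS": 60.0, "TMDCFET": 22.0, "BPFET": 15.0}  # Section 5.1
BITCELL_READ_GAIN = {"TMDCFET": 89.0, "BPFET": 99.0}  # Section 3.3, 1 fF load

def array_read_gain(tech):
    """Scale the bitcell-level gain by the CMOS-to-technology C_BL ratio."""
    return BITCELL_READ_GAIN[tech] * BITLINE_CAP_FF["CMOS"] / BITLINE_CAP_FF[tech]

for tech in BITCELL_READ_GAIN:
    print(f"{tech:8s} estimated array-level read-delay gain: {array_read_gain(tech):.0f}x")
```

This estimate gives roughly 243× for TMDCFET and 396× for BPFET SRAM arrays, compared with the simulated 251× and 303×. The simple proportionality therefore captures the enhancement qualitatively but not the simulated values exactly, which is expected since the read current is not constant during the bitline discharge.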
Write delay: Figure 13(b) plots the write delays of TMDCFET, BPFET, and CMOS SRAM arrays for varying V DD . On average, at V DD below 0.5V, the write delay of TMDCFET and BPFET SRAM arrays is 4.25× and 1.5× better, respectively, than that of CMOS SRAM arrays. Note that the write delay advantages of the TMDCFET/BPFET SRAM arrays over CMOS SRAM arrays remain unchanged from the write delay evaluations of a single bitcell in Section 3.3. The reason for this consistency is similar to the reason for DWNM. The SRAM node capacitance and the write current remain unchanged from Section 3.2, and hence the time required for the SRAM node storing 1 to discharge adequately for a successful write operation stays the same, thus leaving the write delay advantages from a single bitcell unchanged at the array level.
Tables II and III summarize a comparison of the single bitcell-level and the array-level advantages that TMDCFET/BPFET SRAMs exhibit over CMOS SRAMs.
TMDCFET AND BPFET SRAMS UNDER PROCESS VARIATIONS
With aggressive scaling of transistor geometries in the nanoscale domain to achieve higher integration density, we observe increasing difficulty in control of critical device dimensions and parameters like channel length, dopant concentration, and so on. This leads to a significant variation in critical transistor properties like threshold voltage, which affects circuit performance. Intra-die variations affecting transistors in an SRAM bitcell can degrade the noise margins and performance metrics of the SRAM bitcell [Agarwal et al. 2005] . Hence, it is important to analyze the effects of process variations on the nanoscale TMDCFET and BPFET SRAMs to evaluate their robustness.
To the best of our knowledge, there is no published work that reports experimental data on the performance of TMDCFETs and BPFETs under process variations. The primary reason for nonavailability of process variation data is that TMDCFETs and BPFETs are a nascent technology and no large-scale fabrication effort has been undertaken for these FETs. However, we take a cue from the variability data available for sub-20nm CMOS devices and apply the same statistical distribution of parameters to the TMDCFET and BPFET SRAMs. The process variations (oxide length, random discrete doping, etc.) in MOSFETs are primarily reflected in the statistical variability of a single parameter, the threshold voltage (V T ) [Agarwal et al. 2005]. The layer thickness variation (number of layers) in TMDCFETs and BPFETs affects the electronic properties and hence also affects the threshold voltage of these transistors [Yun et al. 2012]. The V T variation within the devices of a 6T-SRAM in the 16nm node is an independent Gaussian distribution with a 3σ variation of 30% [Vaddi et al. 2010]. Please note that we have already evaluated the TMDCFET/BPFET SRAM static power, read stability and writability, and read/write delays across a range of operating voltages (Sections 3 and 5). Also, our TMDCFET/BPFET models do not incorporate support to reflect the complex relation of electronic properties with temperature.
The impact of process variation on TMDCFET and BPFET SRAMs, characterized by the variation of V T , is evaluated using Monte-Carlo simulations of 1,500 iterations. The Monte-Carlo simulations are set up for mismatch variation simulation, which reflects intra-die variations. We consider a nominal V T of 0.1V for TMDCFETs and 0.15V for BPFETs. The evaluations consider an operating voltage of 0.5V.
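The sampling step of such a mismatch Monte-Carlo run can be sketched as follows. This is an illustrative sketch, not the simulation setup used in this work: it reads "3σ variation of 30%" as 3σ = 0.3 × nominal V T , draws six independent Gaussian V T values per iteration (one per transistor of the 6T bitcell), and uses hypothetical function names. The circuit simulation that would consume each sample is omitted.

```python
import random
import statistics

def sample_bitcells(vt_nominal, iterations=1500, three_sigma_frac=0.30, seed=0):
    """Draw independent Gaussian V_T magnitudes for the six transistors of a
    6T bitcell, once per Monte-Carlo iteration (intra-die mismatch)."""
    sigma = three_sigma_frac * vt_nominal / 3.0  # 3*sigma = 30% of nominal
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [[rng.gauss(vt_nominal, sigma) for _ in range(6)]
            for _ in range(iterations)]

# Nominal V_T values from the text (volts)
NOMINAL_VT = {"TMDCFET": 0.10, "BPFET": 0.15}

for tech, vt in NOMINAL_VT.items():
    cells = sample_bitcells(vt)
    flat = [v for cell in cells for v in cell]
    print(f"{tech:8s} mean V_T = {statistics.mean(flat):.4f} V, "
          f"sigma = {statistics.pstdev(flat) * 1e3:.2f} mV")
```

Under this reading, σ is 10mV for the TMDCFETs and 15mV for the BPFETs; each sampled bitcell would then be simulated at 0.5V to obtain one point of the distributions reported in the following subsections.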
Static Power
The effect of process variations on the static power of TMDCFET and BPFET SRAMs is reported in Figures 14(a) and 14(b), respectively, with the static power profiles normalized to the static power value obtained at the nominal V T . The standard deviation of static power for TMDCFET and BPFET SRAMs is 12.4% and 13.3%, respectively. From Figures 14(a) and 14(b), we see that the distribution is slightly skewed toward higher static power. The steep subthreshold slope of these 2D devices is the reason for this skew. The devices that have a lower than nominal V T experience considerably larger leakage current, since they are almost on the verge of being switched ON (conducting) because of the steep subthreshold slope. The BPFET SRAM has a slightly higher standard deviation because BPFETs have a less steep subthreshold slope in comparison to TMDCFETs.
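The origin of the skew can be illustrated with a generic subthreshold model in which leakage varies exponentially with V T , so that a symmetric V T distribution maps to an asymmetric leakage distribution. The sketch below is qualitative only: the exponential model, the subthreshold swing value, and the function names are assumptions for illustration and are not taken from the device models of this work, so the magnitudes are not meant to reproduce Figure 14.

```python
import random
import statistics

def normalized_leakage(vt_nominal, swing_v_per_dec, n=1500,
                       three_sigma_frac=0.30, seed=1):
    """Leakage normalized to its value at nominal V_T, assuming
    I_leak ~ 10 ** (-(V_T - V_T,nom) / SS) (generic subthreshold model)."""
    sigma = three_sigma_frac * vt_nominal / 3.0
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        dvt = rng.gauss(0.0, sigma)  # symmetric V_T deviation
        samples.append(10.0 ** (-dvt / swing_v_per_dec))
    return samples

# Hypothetical swing of 30 mV/dec, chosen only to make the effect visible.
samples = normalized_leakage(vt_nominal=0.10, swing_v_per_dec=0.030)
mean_leak = statistics.mean(samples)
median_leak = statistics.median(samples)
print(f"median = {median_leak:.2f}, mean = {mean_leak:.2f}")
```

In this model the median stays near the nominal value while the mean is pulled upward by the low-V T samples, which is the same direction of skew as observed in the static power profiles.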
Read Stability and Writability
Read stability: The effect of process variations on the read stability (DRNM) of TMDCFET and BPFET SRAMs is reported in Figures 15(a) and 15(b), respectively, with the DRNM profiles normalized to the DRNM value obtained at the nominal V T . The standard deviation of DRNM for TMDCFET and BPFET SRAMs is 9.1% and 10.4%, respectively. The standard deviation is low for both TMDCFET and BPFET SRAMs. The TMDCFETs and BPFETs have a steep subthreshold slope and a virtually flat I D -V G characteristic after switching to the conductive phase (refer to Figures 3(a) and 4(a)). The DRNM of the SRAM depends on the relative drive strengths of the nFET in the inverter and the nFET access transistor, both of which are in the ON state during a read operation, and a 30% variation in the V T does not change the I ON significantly. Hence, the DRNM does not vary significantly with process variations for TMDCFET and BPFET SRAMs.
Writability:
The effect of process variations on the writability (DWNM) of TMDCFET and BPFET SRAMs is reported in Figures 16(a) and 16(b), respectively, with the DWNM profiles normalized to the DWNM value obtained at the nominal V T . The standard deviation of DWNM for TMDCFET and BPFET SRAMs is 9.4% and 9.9%, respectively. Similar to DRNM, the standard deviation of the DWNM of TMDCFET and BPFET SRAMs is low, because there is no significant change in I ON for a 30% variation in V T for the 2D monolayer transistors.
Fig. 16. Effect of process variations on the writability (DWNM) of (a) TMDCFET and (b) BPFET SRAMs. The standard deviation in the DWNM of TMDCFET and BPFET SRAMs is 9.4% and 9.9%, respectively.
Read/Write Delay
The effect of process variations on the read delay of TMDCFET and BPFET SRAMs is reported in Figures 17(a) and 17(b) , respectively. We also evaluate the effect of process variations on the write delay of TMDCFET and BPFET SRAMs, as shown in Figures 18(a) and 18(b) , respectively. The read and write delay variations are shown normalized to the nominal read and write delay at the nominal V T of the TMDCFETs and BPFETs, respectively. The standard deviation of read delay for TMDCFET and BPFET SRAMs is 9.2% and 10.7%, respectively. The standard deviation of write delay for TMDCFET and BPFET SRAMs is 9.9% and 11.3%, respectively. Similar to the DRNM and DWNM, the read/write delay depends on the I ON of the active transistors during a read/write operation. Since the I ON for TMDCFETs and BPFETs is not significantly affected by a 30% variation in the V T , the standard deviation of the variations in read/write delay as a result of process variations is not significant.
TMDCFET/BPFET SRAMS: COMPARISON AND CMOS COMPATIBILITY
We evaluated the advantages of the TMDCFET and BPFET SRAMs over CMOS SRAMs in the previous sections. In this section, we compare the TMDCFET and BPFET SRAMs, first at the bitcell level and then at the array level.
Fig. 18. Effect of process variations on write delay of (a) TMDCFET and (b) BPFET SRAMs. The standard deviation in the write delay of TMDCFET and BPFET SRAMs is 9.9% and 11.3%, respectively.
Bitcell:
The TMDCFET SRAMs have lower leakage than the BPFET SRAMs at supply voltages lower than 0.6V, as shown in Figure 6. The read stability (DRNM) of the TMDCFET SRAMs is better than that of the BPFET SRAMs, as evident from Figure 7(a). This is due to the lower subthreshold current, as explained in Section 3.2. The TMDCFET SRAMs exhibit improved writability (DWNM) and write delay over the BPFET SRAMs, as demonstrated in Figures 7(b) and 8(b), respectively. This is primarily due to the higher effective I ON of the pBPFETs compared to the pTMDCFETs. During a write operation, the pull-up transistor (pFET) of the inverters in the SRAM structure resists the discharge of the SRAM node. Therefore, a stronger pFET (higher I ON ) increases the write delay and WL CRIT . BPFET SRAMs demonstrate better read delay than TMDCFET SRAMs, as shown in Figure 8(a). This can be attributed to the higher effective I ON of the nBPFETs compared to the nTMDCFETs. The bitline connected to the node storing 0 discharges through the access transistor during a read operation. Higher I ON for the access transistor and the pull-down transistor (which are both nFETs) results in faster discharge of the bitline and hence lower read delay.
Array:
The TMDCFET SRAMs retain their advantages in static power, DRNM, DWNM, and write delay at the array-level implementations, as evident from Figures 11, 12(a), 12(b), and 13(b), respectively. The advantage in read delay of BPFET SRAMs over TMDCFET SRAMs increases at the array-level implementation, because of their lower bitline capacitance (15fF) compared to TMDCFET SRAMs (22fF), as estimated in Section 5.1. A lower bitline capacitance discharges faster and hence enhances the existing advantage of BPFET SRAMs in read delay.
In conclusion, TMDCFET SRAMs are favorable in comparison to BPFET SRAMs because they exhibit better performance in all the metrics considered except read delay. The TMDCFET SRAMs retain their advantages over the BPFET SRAMs at the array level, as evident from the evaluations in Section 5.2.
CMOS compatibility:
The present challenge that the TMDCFET and BPFET SRAMs face is scaling down to nanoscale dimensions, since current technology nodes for on-chip devices like the processor and cache are 14-16nm. The reported fabricated devices are in the micrometer scale, and the fabrication technology for these monolayer transistors needs to be improved to produce nanoscale monolayer devices. The extrinsic effects, that is, contact resistance and parasitic capacitance, need to be scaled down effectively for fabrication of functional nanoscale TMDCFETs/BPFETs.
To the best of our knowledge, there is no published work that reports integration of CMOS devices and monolayer FETs on the same chip. A discussion of CMOS compatibility challenges based on the fabrication techniques is out of the scope of this work.
However, TMDCFETs and BPFETs are an emerging technology that has captured research interest as a potential replacement for CMOS devices at the sub-20nm scale, due to their steep subthreshold slope, low leakage power, and high I ON at low operating voltages. These transistors can be used to build logic circuits in addition to memory circuits. Basic circuit blocks such as inverters, NAND/NOR gates, and oscillators have been experimentally demonstrated [Wang et al. 2012]. 3D integration of monolayer transistors has also been achieved experimentally [Sachid et al. 2016]. So, if TMDCFETs and BPFETs are successfully scaled to nanoscale dimensions, TMDCFET and BPFET SRAMs could be fabricated alongside on-chip TMDCFET/BPFET logic blocks, and CMOS compatibility of TMDCFET and BPFET SRAMs would not be an issue.
CONCLUSIONS
This article presents a study of monolayer material FET SRAMs: transition metal dichalcogenide (TMDCFET) and black phosphorus (BPFET) SRAMs. We have adopted a bottom-up simulation flow that combines quantum atomistic device simulations with circuit modeling to evaluate low-voltage TMDCFET and BPFET SRAMs. The monolayer FETs considered in this article promise fast, low-energy, and stable SRAM circuits, outperforming both nominal and read/write-assisted 16nm CMOS SRAMs in all aspects at low supply voltages. Although reducing contact resistance and parasitics remains an immediate challenge to scaling TMDCFETs and BPFETs to the nanometer regime, this is an active area of research. Our simulations indicate that TMDCFET and BPFET SRAMs can outperform scaled CMOS SRAMs once these fabrication and technology hurdles are overcome. Directions for future research include investigating the impact of variability and defects on TMDCFETs, BPFETs, and circuits built from them.
