Monolayer MoS₂, MoSe₂, MoTe₂, WS₂, WSe₂, and black phosphorus field effect transistors (FETs) operating in the low-voltage (LV) regime (0.3 V) with geometries from the 2019 and 2028 nodes of the 2013 International Technology Roadmap for Semiconductors (ITRS) are benchmarked along with an ultra-thin-body Si FET. Current can increase or decrease with scaling, and the trend is strongly correlated with the effective mass. For LV operation at the 2028 node, an effective mass of ∼0.4 m_0, corresponding to that of WSe₂, gives the maximum drive current. The short 6 nm gate length combined with LV operation is forgiving in its requirements for material quality and contact resistance. In this LV regime, device and circuit performance are competitive using currently measured values of mobility and contact resistance for the monolayer two-dimensional materials.
I. INTRODUCTION
There is significant interest in understanding how two-dimensional (2D) semiconductors compare with traditional semiconductors as the channel material in ultra-scaled field effect transistors (FETs). The FET also serves as a baseline device for determining targets for material parameters. For example, given a set of FET performance specifications such as drive current, switching energy, and switching delay, one can ask, "What material parameters, such as mobility, effective mass, bandgap, or contact resistance, are sufficient to achieve these device performance metrics?" One can also ask, "What material parameters optimize the device performance?" Thus, benchmarking a baseline device provides top-down targets for materials benchmarking [1].
Promising 2D semiconductors include the transition metal dichalcogenides (TMDs) with the chemical form MX₂, where M = Mo or W and X = S, Se, or Te [2]-[9], with bandgaps in the range of 1-2 eV [3], [6]. A more recent addition to the van der Waals (vdW) class of materials for FET applications is black phosphorus (BP) [10]-[12]. Its large field effect mobility and highly anisotropic bandstructure make BP a promising FET channel material [10], [11], [13]-[16].
A number of articles have theoretically predicted the performance of these alternative materials for future device applications. While most of the performance predictions are for MoS₂ FETs [17]-[24] and BP FETs [14], some compare devices within the TMD group for conventional FETs [25]-[27] and for tunnel FETs (TFETs) [28], [29]. The BP FET was compared against the MoS₂ FET in Ref. [30], and a BP-based TFET was proposed in Ref. [31].
Two operation regimes, high performance (HP) and low power (LP), are defined in the 2013 ITRS [32]. A low voltage (LV) regime was also considered in Ref. [33] and benchmarked in Refs. [34], [35]. It is this LV regime, with a supply voltage of 0.3 V, that we consider in this work. There is currently a large number of material candidates for future CMOS devices, but little is known about their relative performance in the LV regime since, to the best of our knowledge, they have never been compared in a single systematic study. In general, LV has been given less attention than HP or LP operation. Inspired by the device benchmarking of the Nanoelectronics Research Initiative (NRI) [34], [35] and the materials benchmarking of the STARnet centers [1], in this work we present and compare FETs based on BP and 5 different TMDs: MoS₂, MoSe₂, MoTe₂, WS₂, and WSe₂. For a baseline comparison, we also simulate an ultra-thin-body (UTB) Si FET using the same model and code. Performance metrics are compared for individual devices as well as for a standard integrated circuit, a 32-bit adder, using the beyond-CMOS benchmarking (BCB) scheme 3.0 [35].
S. S. Sylvia and R. K. Lake are with the Department of Electrical and Computer Engineering, University of California, Riverside, CA 92521-0204, USA (e-mail: ssylvia@ece.ucr.edu; rlake@ece.ucr.edu).
K. Alam is with the Department of Electrical & Electronic Engineering, East West University, Dhaka, Bangladesh (e-mail: kalam@ewubd.edu).
We thank Prof. E. Tutuc for sharing his unpublished mobility data. This work is supported in part by FAME, one of six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO.
II. SIMULATION METHOD
The structural parameters for the devices were taken from the 2019 and 2028 columns of the LP technology requirement tables of the 2013 ITRS [32]. The values are summarized in Table I. The devices operate in the LV regime with V_DD = 0.3 V [35]. Two different production years were selected to examine the effect of scaling on the devices of interest. We primarily considered single gate (SG) FETs, and a few exemplary simulations were performed for double gate (DG) structures. Figure 1 shows the device structures used for the simulations in this work [36]. The buried oxide and extended oxide regions are SiO₂ with a dielectric constant of 3.9. The gate oxide consists of a high-K oxide (specified in Table I) under the gate and SiO₂ [37] in the source-drain extensions for improved gate control. For the Si FET, transport from source to drain is in the (100) direction. For the BP FET, transport is in the X direction, the direction of the light mass. For the circuit metrics, the FET width is set to the default value of 4 times the pitch [35].
For the TMD and Si FETs, electron conduction is considered. For the BP FET, both electron and hole conduction are considered, since most recent experimental work on BP focuses on hole transport [38]. For the vdW materials, the source and drain doping densities were swept from 1 × 10¹⁹ to 1 × 10²⁰ cm⁻³ (∼5.7 × 10¹¹ to ∼7.3 × 10¹² cm⁻²). For each node and geometry, two results are recorded: one for the doping density that gives the maximum drive current, and one for the maximum doping density of 1 × 10²⁰ cm⁻³. The drive currents versus source doping are shown in Fig. S1 of the Supplementary Information. This optimization is performed with the contact resistance set to zero. For the 3 nm Si UTB FETs, a source and drain doping density of 1 × 10¹⁹ cm⁻³ (3 × 10¹² cm⁻²) is used [19].
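The volume and sheet doping densities quoted above are related by n_2D = N_3D × t, where t is the layer thickness. The short Python sketch below checks the Si value and back-calculates the layer thicknesses implied by the quoted vdW range. Those thicknesses (about 0.57 nm and 0.73 nm) are inferred from the numbers rather than stated in this section, and the function names are illustrative.

```python
NM_TO_CM = 1e-7  # 1 nm = 1e-7 cm


def sheet_density(n3d_cm3: float, thickness_nm: float) -> float:
    """Sheet density (cm^-2) of a uniformly doped layer: n_2D = N_3D * t."""
    return n3d_cm3 * thickness_nm * NM_TO_CM


def implied_thickness_nm(n3d_cm3: float, n2d_cm2: float) -> float:
    """Layer thickness (nm) implied by a quoted volume/sheet density pair."""
    return n2d_cm2 / n3d_cm3 / NM_TO_CM


# 3 nm UTB Si at 1e19 cm^-3 should give the quoted 3e12 cm^-2
si_sheet = sheet_density(1e19, 3.0)

# Thicknesses implied by the endpoints of the quoted vdW range (inferred, not stated)
t_low = implied_thickness_nm(1e19, 5.7e11)
t_high = implied_thickness_nm(1e20, 7.3e12)

print(f"Si sheet density: {si_sheet:.2e} cm^-2")
print(f"implied monolayer thickness range: {t_low:.2f}-{t_high:.2f} nm")
```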
Material properties for all of the materials considered in this work are summarized in Table S1 of the Supplementary Information. The UTB Si channel has a finite thickness of 3 nm, and all of the vdW materials are monolayers. It has been shown that adding layers on top of a monolayer cannot boost the on-current [24]. The listed mobilities for the monolayer vdW materials are experimentally measured values from the literature [7], [39]-[43], except for MoTe₂. The mobility of monolayer MoTe₂ was unknown at the time of this work, so it was approximated from that of MoSe₂ using the electron effective masses of both materials (see the footnote of Table S1).
As Table S1 shows, the conduction band Λ-valley to K-valley energy separation, ∆_KΛ, is less than V_DD for all of the TMDs; therefore, the Λ valleys affect electron transport in the TMD FETs. There is also considerable spin splitting in many of the conduction band K and Λ valleys. We therefore account for the different spins and valleys with a multiple single-band approach, as depicted in Figure 2. In this approach, each spin and valley is treated as an independent band with its own effective mass.
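The multiple single-band bookkeeping can be illustrated with a minimal sketch. Each spin-split valley is stored as an independent parabolic band with its own masses, degeneracies, and band edge, and the total charge is the sum over bands. For simplicity, the sketch uses the textbook equilibrium 2D density of a parabolic band, n = s n_v m_d k_BT/(2πħ²) ln(1 + e^η) with m_d = (m_x m_y)^{1/2}, rather than the NEGF charge used in the paper. The band parameters are hypothetical placeholders, not the Table S1 values; only the valley degeneracies (2 for K, 6 for Λ) follow the Supplementary Information.

```python
import math
from dataclasses import dataclass

HBAR = 1.054571817e-34   # J s
M0 = 9.1093837015e-31    # kg
Q = 1.602176634e-19      # C
KB = 1.380649e-23        # J/K


@dataclass(frozen=True)
class Band:
    """One independent parabolic band (one spin branch of one valley type)."""
    label: str
    valley_deg: int   # 2 for K, 6 for Lambda, 1 for Gamma
    spin_deg: int     # 1 for a spin-split band, 2 if spin-degenerate
    mx: float         # transport mass / m0
    my: float         # width-direction mass / m0
    edge_ev: float    # band edge relative to the lowest K edge (eV)


def band_density_cm2(band: Band, ef_ev: float, temp_k: float = 300.0) -> float:
    """Equilibrium 2D electron density of one parabolic band (cm^-2)."""
    kt = KB * temp_k
    m_dos = math.sqrt(band.mx * band.my) * M0
    prefactor = band.spin_deg * band.valley_deg * m_dos * kt / (2.0 * math.pi * HBAR**2)
    eta = (ef_ev - band.edge_ev) * Q / kt
    # log1p(exp(eta)) is the order-0 Fermi-Dirac integral for a 2D band
    return prefactor * math.log1p(math.exp(eta)) * 1e-4  # m^-2 -> cm^-2


def total_density_cm2(bands, ef_ev: float, temp_k: float = 300.0) -> float:
    """Total charge is simply the sum over the independent bands."""
    return sum(band_density_cm2(b, ef_ev, temp_k) for b in bands)


def share(bands, prefix: str, ef_ev: float) -> float:
    """Fraction of the total density carried by bands whose label starts with prefix."""
    part = sum(band_density_cm2(b, ef_ev) for b in bands if b.label.startswith(prefix))
    return part / total_density_cm2(bands, ef_ev)


# Hypothetical band set, NOT the Table S1 parameters
BANDS = [
    Band("K (lower spin)", 2, 1, 0.50, 0.50, 0.00),
    Band("K (upper spin)", 2, 1, 0.55, 0.55, 0.03),
    Band("Lambda (lower spin)", 6, 1, 0.60, 1.00, 0.10),
]

for ef in (-0.10, 0.05):
    print(f"E_F = {ef:+.2f} eV: n = {total_density_cm2(BANDS, ef):.2e} cm^-2, "
          f"Lambda share = {share(BANDS, 'Lambda', ef):.2f}")
```

Raising E_F increases the share of charge in the Λ band, which illustrates why a valley lying less than V_DD above the K edge can become populated in the on-state.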
For each band, the discretized effective mass Schrödinger equation is solved for the charge density using a nonequilibrium Green function (NEGF) approach similar to that described in [19]. The heavily doped source and drain regions are treated as contacts in equilibrium with their respective Fermi levels [44]. The total charge at each site is the sum of the charges calculated for each band, and the charge is solved self-consistently with Poisson's equation. The electrostatic potential within the device is calculated from a 2D finite difference solution of Poisson's equation discretized on a 0.2 nm grid within the channel and a 0.5 nm grid within the oxide. Dirichlet boundary conditions are set at the metal gate, and Neumann boundary conditions are used at all other exterior boundaries.
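The boundary conditions can be illustrated with a one-dimensional reduction of the Poisson solve. The sketch below (Python with NumPy) assumes a uniform permittivity and grid, fixes the potential at the gate node (Dirichlet), and imposes zero normal field at the opposite boundary (Neumann) with a mirrored ghost node. It is not the authors' 2D solver; the 0.2 nm spacing and 3 nm depth only echo the channel grid described above. For uniform charge, the finite difference result reproduces the analytic solution φ(x) = V_G + (ρ/ε)(Lx − x²/2).

```python
import numpy as np

EPS0 = 8.8541878128e-12  # F/m


def solve_poisson_1d(rho, eps_r, h, v_gate):
    """Solve d^2(phi)/dx^2 = -rho/eps on a uniform grid (uniform permittivity).

    rho: charge density at each node (C/m^3); h: grid spacing (m).
    Node 0 is the gate contact (Dirichlet, phi = v_gate). The last node is an
    exterior boundary with zero normal field (Neumann), enforced with a
    mirrored ghost node phi[n] = phi[n-2].
    """
    rho = np.asarray(rho, dtype=float)
    n = rho.size
    eps = eps_r * EPS0
    a = np.zeros((n, n))
    b = -rho * h**2 / eps
    a[0, 0] = 1.0            # Dirichlet row
    b[0] = v_gate
    for i in range(1, n - 1):
        a[i, i - 1], a[i, i], a[i, i + 1] = 1.0, -2.0, 1.0
    a[n - 1, n - 2], a[n - 1, n - 1] = 2.0, -2.0  # Neumann row via ghost node
    return np.linalg.solve(a, b)


# Usage: 3 nm region on a 0.2 nm grid with uniform 1e19 cm^-3 (1e25 m^-3) positive charge
h = 0.2e-9
x = np.arange(16) * h
rho = np.full(x.size, 1.602176634e-19 * 1e25)
phi = solve_poisson_1d(rho, 11.7, h, v_gate=0.3)

L = x[-1]
exact = 0.3 + rho[0] / (11.7 * EPS0) * (L * x - x**2 / 2)
print(f"max |phi - exact| = {np.max(np.abs(phi - exact)):.2e} V")
```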
Once the charge calculation has converged, the current is calculated for each band, and the contributions from all bands are summed to give the total current. Scattering in the channel is included through a backscattering (reflection) coefficient determined from a mean free path, which is related to the mobility, and an effective channel length [45], [46]. Details are provided in the Supplementary Information.
The off-current is set at 1.5 nA/µm for all devices. The drain bias V_DS and the on-state gate voltage V_GS are both 0.3 V. The maximum allowable total source-drain contact resistance (R_SD) is estimated following the methodology used by the ITRS [32], [36]. First, a reference current was calculated with scattering included and R_SD set to 0. A set of simulations including scattering was then performed for a range of R_SD values, with R_SD divided equally between the source and drain. In the self-consistent loop, the internal gate and drain potentials with respect to the source, V′_GS and V′_DS, were updated at each iteration by shifting the source potential by I_D R_SD/2, which lowers the internal gate-to-source voltage by the same amount. The value of R_SD that reduces the current by 33.3% relative to the reference current was then taken as the maximum allowable contact resistance for the LV devices. Two performance metrics are the switching delay and the switching energy, defined as
Here, I_on is the on-current, and C is the total capacitance, which includes the oxide capacitance, the semiconductor (quantum) capacitance, and any parasitic capacitance that might be present. The capacitance is determined as C = ∂Q/∂V_G, where Q is the total charge in the entire semiconductor region, including the source, channel, and drain. In this manner, the gate and fringing capacitances are taken into account at the same time. This requires that no external input other than the applied gate bias changes, so Q is calculated with R_SD = 0, since R_SD alters the effective gate voltage V′_GS. The calculated drive currents and capacitances are input into the BCB 3.0 scripts [35]. The BCB 3.0 scripts take the input for one type of transistor and approximate the on-current of the pFET as equal to that of the nFET. Delay times and switching energies are calculated using empirical rules chosen to match SPICE simulations. For circuits, a per-unit-length interconnect capacitance of 126 aF/µm is used, and the interconnect length associated with each transistor is 20F, where F is the DRAM half pitch of the technology node. Full details of the BCB 3.0 method are given in Ref. [35].
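A minimal sketch of how C = ∂Q/∂V_G and the switching delay could be evaluated. The delay is written as τ = CV_DD/I_on = CR with R = V_DD/I_on, consistent with the statement that the switching delay is the product of device capacitance and resistance. The switching energy is written in the common form E = CV_DD², which is an assumption here and should be checked against Eq. (1). The smooth function toy_charge and its parameters are hypothetical stand-ins for the self-consistently computed Q(V_G).

```python
import math

V_DD = 0.3  # V

# Hypothetical Q(V_G) per unit width (C/m) with a smooth turn-on; placeholders only
C_MAX = 3e-10     # F/m (0.3 fF/um), placeholder strong-inversion capacitance
V_TH = 0.25       # V, placeholder threshold-like voltage
V_SLOPE = 0.0259  # V, about k_BT/q at 300 K


def toy_charge(v_g):
    """Hypothetical channel charge per width (C/m) versus gate voltage."""
    return C_MAX * V_SLOPE * math.log1p(math.exp((v_g - V_TH) / V_SLOPE))


def gate_capacitance(charge_fn, v_g, dv=1e-3):
    """C = dQ/dV_G by central difference; all other biases held fixed (R_SD = 0)."""
    return (charge_fn(v_g + dv) - charge_fn(v_g - dv)) / (2.0 * dv)


def switching_delay(c_per_width, i_on_per_width, v_dd=V_DD):
    """tau = C V_DD / I_on = C R, with R = V_DD / I_on (per-width units cancel)."""
    return c_per_width * v_dd / i_on_per_width


def switching_energy(c_per_width, width, v_dd=V_DD):
    """ASSUMED form E = C V_DD^2; check against the paper's Eq. (1)."""
    return c_per_width * width * v_dd ** 2


c_on = gate_capacitance(toy_charge, V_DD)
tau = switching_delay(c_on, 30.0)         # hypothetical I_on = 30 uA/um = 30 A/m
energy = switching_energy(c_on, 28.4e-9)  # 28.4 nm default width at the 2028 node
print(f"C = {c_on * 1e9:.3f} fF/um, tau = {tau * 1e12:.2f} ps, E = {energy * 1e18:.2f} aJ")
```

A weakly turned-on device sits on the low-slope part of Q(V_G), so its C is small regardless of the geometric gate capacitance, which is the behavior discussed for BP in Sec. III.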
III. RESULTS
The on-current, optimum doping, and series resistance for each material, node, and geometry are tabulated in Table II. I_ball is the ballistic on-current calculated with both the contact resistance R_SD and the backscattering coefficient r_c set to 0. R_SD is the maximum allowable total contact resistance (source plus drain), i.e., the value that degrades the current calculated in the presence of scattering by 33.3%. I_scatt is the on-current with both r_c and the maximum allowable R_SD included. For the rest of our discussion, unless otherwise noted, the on-current will refer to the
The physical mechanisms governing FET performance are the same as those analyzed in Ref. [47] for III-V FETs: the balance between source exhaustion and tunneling leakage. The range of transport effective masses, from 0.15 m_0 for p-type BP to 0.53 m_0 for MoTe₂, makes this balance different for the different materials. The optimum source doping is lower for the lighter-mass materials. The lower doping results in longer screening lengths of the channel potential into the source and drain regions, which increases the effective channel length and decreases the off-state direct tunneling.
For X-directed transport in p-type BP, the low mass in the transport direction provides a high velocity, and the large transverse effective mass provides many modes for transport. Because of the low transport mass, the optimum source doping of 2 × 10¹⁹ cm⁻³ is the lowest among the vdW FETs. As shown in Fig. 4, in the off-state, the low doping results in long screening lengths of the channel potential into the source and drain regions. The off-state channel potential decays approximately 10 nm into the source and 15 nm into the drain, giving a total effective off-state source-to-drain length of 30 nm at the 2028 node. In the on-state, the small channel potential decays within a few nanometers into the source, and the high-field region extends approximately 10 nm into the drain, so the effective source-to-drain length in the on-state is approximately 15 nm. One advantage is that the longer depletion lengths in the source and drain reduce the fringing capacitance between the source and drain and therefore reduce the RC delay time. A disadvantage is that the transit time increases. A saturation velocity of 10⁷ cm/s gives a transit time that is 10 times less than the RC delay time; at 10⁶ cm/s, the two times are comparable.
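The transit-time comparison is simple arithmetic, τ_t = L/v, using the ∼15 nm on-state effective length quoted above. The sketch below evaluates it for both velocities. Taken together, the two statements in the text imply an RC delay on the order of 1.5 ps for this device; that figure is an inference from the quoted ratios, not a reported value.

```python
L_ON_NM = 15.0  # on-state effective source-to-drain length for BP at 2028 (from the text)


def transit_time_ps(length_nm: float, velocity_cm_s: float) -> float:
    """Transit time tau_t = L / v, returned in picoseconds."""
    return length_nm * 1e-7 / velocity_cm_s * 1e12


for v in (1e7, 1e6):
    print(f"v = {v:.0e} cm/s: transit time = {transit_time_ps(L_ON_NM, v):.2f} ps")
```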
As the gate length is scaled from 13 nm to 6 nm with optimized doping, the on-current of BP drops slightly, and the on-currents of all of the TMD FETs increase. The TMD FETs with the heavier effective masses benefit from scaling, while the BP FET with the lightest transport mass is degraded by it. In every case, the ballistic current decreases as the channel length decreases from 13 nm to 6 nm, in agreement with previous work [36], and the ballistic current of p-type BP, with the lightest transport mass, decreases the most. For BP, the large decrease in the ballistic current dominates, and the total current including scattering decreases. For the heavier-mass TMDs, the ballistic current is only slightly reduced. As the channel length becomes comparable to the mean free path, backscattering is reduced. This process dominates for the TMDs with heavier effective masses, and their on-current increases as the gate length is scaled down to 6 nm.
The effective mass affects two processes that determine whether the current increases or decreases with scaling, and the trends become very clear with a fixed source and drain doping of 1 × 10²⁰ cm⁻³, as shown in the inset of Fig. 3. The first process is direct tunneling through the channel, and the second is scattering in the channel. Direct tunneling is governed by the effective mass of the channel material. A heavier mass suppresses the off-state leakage, which enhances the drive current for a fixed V_DD, because a smaller fraction of V_DD is required to turn the device off. This effect is illustrated in Fig. 5, where the background color indicates the current spectrum (on a log scale), with the brightest yellow indicating the highest current. A comparison of Figs. 5(a) and (b) shows that, in the off-state, tunneling is significant through the BP barrier but is suppressed in the MoTe₂ barrier. For BP, with the lightest transport mass, the barrier height required to attain the off-state current of 1.5 nA/µm is 365 meV. For MoTe₂, with the heaviest transport mass, the barrier height required to attain the same off-state current is 307 meV, approximately 60 meV lower than that for BP. Applying 0.3 V to the gate lowers the channel potential by 254 meV for BP and 247 meV for MoTe₂, so that the on-state barrier height is 111 meV for BP and 60 meV for MoTe₂. Thus, the on-state barrier height for BP is almost twice that for MoTe₂. This effect is responsible for the reduction in I_ball as the gate length is scaled from 13 nm to 6 nm.
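The barrier arithmetic above, and the role of the mass in direct tunneling, can be made concrete with a short sketch. The on-state barrier is the off-state barrier minus the gate-induced lowering. As a crude illustration of why a heavier mass suppresses tunneling, the sketch also evaluates the rectangular-barrier WKB factor exp(−2κL) with κ = (2m*U)^{1/2}/ħ. The rectangular shape and the use of the 5.9 nm gate length as the tunneling distance are simplifying assumptions; the paper's NEGF calculation uses the actual self-consistent potential.

```python
import math

HBAR = 1.054571817e-34   # J s
M0 = 9.1093837015e-31    # kg
Q = 1.602176634e-19      # J per eV
KT_MEV = 25.852          # k_BT at 300 K (meV)

# Values quoted in the text for the 2028 node
OFF_BARRIER_MEV = {"p-BP": 365.0, "MoTe2": 307.0}
GATE_LOWERING_MEV = {"p-BP": 254.0, "MoTe2": 247.0}
TRANSPORT_MASS_M0 = {"p-BP": 0.15, "MoTe2": 0.53}


def on_barrier_mev(material: str) -> float:
    """On-state barrier = off-state barrier minus gate-induced lowering."""
    return OFF_BARRIER_MEV[material] - GATE_LOWERING_MEV[material]


def wkb_rectangular(mass_m0: float, height_mev: float, length_nm: float) -> float:
    """Crude tunneling factor exp(-2 kappa L) for a rectangular barrier."""
    kappa = math.sqrt(2.0 * mass_m0 * M0 * height_mev * 1e-3 * Q) / HBAR  # 1/m
    return math.exp(-2.0 * kappa * length_nm * 1e-9)


GATE_LENGTH_NM = 5.9  # 2028 gate length, used here as an ASSUMED tunneling distance

for mat in ("p-BP", "MoTe2"):
    t_wkb = wkb_rectangular(TRANSPORT_MASS_M0[mat], OFF_BARRIER_MEV[mat], GATE_LENGTH_NM)
    print(f"{mat}: on-state barrier = {on_barrier_mev(mat):.0f} meV, "
          f"off-state WKB factor ~ {t_wkb:.1e}")

# Thermionic (Boltzmann) penalty of the higher on-state barrier in BP
delta = on_barrier_mev("p-BP") - on_barrier_mev("MoTe2")
print(f"exp(-delta/kT) = {math.exp(-delta / KT_MEV):.2f}")
```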
The second process, scattering in the channel, is also strongly correlated with the effective mass. A heavy mass is associated with a short mean free path, so as the channel is scaled down to 6 nm, the device becomes more ballistic, r_c decreases, and the current increases. The Mo compounds have the highest effective masses, the lowest measured electron mobilities, and the shortest mean free paths, as shown in Table S1. These materials therefore benefit most from scaling, since direct leakage through the channel is not a problem and they become more ballistic as the channel length is scaled. For BP, with the lightest mass in the transport direction, the first process of tunneling dominates the performance, and there is a significant reduction in I_scatt going from the 2019 to the 2028 node when the doping is fixed at 1.0 × 10²⁰ cm⁻³. Even at the optimum doping, BP is the only 2D material whose current decreases with scaling.
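The scaling argument can be sketched with the standard backscattering model, in which the current is reduced from its ballistic value by (1 − r_c)/(1 + r_c) with r_c = L_eff/(L_eff + λ). That form, and the nondegenerate-limit mean free path λ = 2(k_BT/q)μ/v_T with v_T = (2k_BT/(πm*))^{1/2}, are assumptions here; the paper's expressions (Supplementary Information, Ref. [46]) include degeneracy through η_top. The MoTe₂ mass (0.53 m_0) and mobility (42.74 cm²/Vs) are taken from the text, and the L_eff values are illustrative.

```python
import math

KB = 1.380649e-23        # J/K
Q = 1.602176634e-19      # C
M0 = 9.1093837015e-31    # kg


def thermal_velocity_m_s(mass_m0: float, temp_k: float = 300.0) -> float:
    """2D unidirectional thermal velocity sqrt(2 k_B T / (pi m*))."""
    return math.sqrt(2.0 * KB * temp_k / (math.pi * mass_m0 * M0))


def mean_free_path_nm(mobility_cm2_vs: float, mass_m0: float, temp_k: float = 300.0) -> float:
    """Nondegenerate-limit mean free path: lambda = 2 (k_B T / q) mu / v_T."""
    mu_si = mobility_cm2_vs * 1e-4  # cm^2/Vs -> m^2/Vs
    kt_over_q = KB * temp_k / Q
    return 2.0 * kt_over_q * mu_si / thermal_velocity_m_s(mass_m0, temp_k) * 1e9


def backscattering(l_eff_nm: float, lam_nm: float) -> float:
    """r_c = L_eff / (L_eff + lambda)."""
    return l_eff_nm / (l_eff_nm + lam_nm)


def transmission_factor(l_eff_nm: float, lam_nm: float) -> float:
    """I_scatt / I_ball = (1 - r_c) / (1 + r_c)."""
    r_c = backscattering(l_eff_nm, lam_nm)
    return (1.0 - r_c) / (1.0 + r_c)


lam = mean_free_path_nm(42.74, 0.53)  # MoTe2 values quoted in the text
print(f"lambda (MoTe2, nondegenerate estimate) = {lam:.2f} nm")
for l_eff in (10.0, 5.0, 3.0):  # illustrative effective lengths (nm)
    print(f"L_eff = {l_eff:4.1f} nm: r_c = {backscattering(l_eff, lam):.2f}, "
          f"I/I_ball = {transmission_factor(l_eff, lam):.2f}")
```

Shrinking L_eff at fixed λ lowers r_c and raises the fraction of the ballistic current that survives, which is the mechanism by which the short-mean-free-path Mo compounds gain current with scaling.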
Adding a second gate to create a DG structure increases the current, and the increase is qualitatively different for the vdW channels and the UTB Si channel. At the 2019 node, adding a second gate increases I_ball by a factor of 1.7 for the TMD FETs and 1.63 for p-type BP. The increase in I_scatt is slightly less. At the 2028 node, adding the second gate increases I_ball by factors of 1.8-1.94 for the TMDs and 1.9 for both n-type and p-type BP. The increase in I_scatt is identical to the increase in I_ball within numerical error. The larger increases in current due to the second gate in the 2028 2D FETs indicate that the single gate loses some control of the channel when the gate length is scaled down to 5.9 nm. In the DG geometry, the second gate provides greater electrostatic control of the channel. The increased gate control moves the position where ∆V_ch = k_BT/q further towards the drain, which increases L_eff and, consequently, r_c; this is why the increase in I_scatt resulting from a second gate may not be quite as large as the increase in I_ball.
The maximum allowable total contact resistance (source plus drain), R_SD, for each node and material is also included in Table II. For the SG devices, the current is small, so relatively high contact resistances can be tolerated: 0.48 to 0.95 kΩ·µm per contact at the 2019 node, and 0.42 to 0.52 kΩ·µm per contact at the 2028 node. The higher current densities of the DG TMD devices require lower contact resistances, 265-450 Ω·µm per contact at the 2019 node and 215-300 Ω·µm per contact at the 2028 node. Contact resistances of 240 Ω·µm have already been reported in the literature [48].
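The R_SD extraction described in Sec. II is a nested root-finding problem. For each trial R_SD, the drain current must satisfy I_D = f(V_GS − I_D R_SD/2, V_DS − I_D R_SD), and R_SD is then adjusted until I_D falls to 2/3 of the reference current. The sketch below solves both levels by bisection. The function intrinsic_current is a hypothetical smooth I-V model standing in for the NEGF result with scattering, and applying the full I_D R_SD drop to V′_DS (half at each contact) is an assumption about the drain side. Units are µA/µm and kΩ·µm, so I × R × 10⁻³ is in volts.

```python
import math

V_DD = 0.3   # V; V_GS = V_DS = V_DD in the on-state
V_T = 0.2    # V, hypothetical threshold-like parameter
V_S = 0.0259 # V, hypothetical slope parameter (~k_BT/q)
I_0 = 10.0   # uA/um, hypothetical current scale


def intrinsic_current(v_gs: float, v_ds: float) -> float:
    """Hypothetical smooth I-V model (uA/um) standing in for the NEGF result."""
    return I_0 * math.log1p(math.exp((v_gs - V_T) / V_S)) * (1.0 - math.exp(-v_ds / V_S))


def drain_current(r_sd: float, v_gs: float = V_DD, v_ds: float = V_DD, iters: int = 100) -> float:
    """Self-consistent I_D (uA/um) for a total R_SD (kOhm*um) split equally."""
    lo, hi = 0.0, intrinsic_current(v_gs, v_ds)
    for _ in range(iters):
        i_d = 0.5 * (lo + hi)
        drop = i_d * r_sd * 1e-3  # total I*R drop in volts
        # Source rises by drop/2, drain falls by drop/2 (assumed symmetric split)
        residual = i_d - intrinsic_current(v_gs - drop / 2, v_ds - drop)
        if residual > 0:
            hi = i_d
        else:
            lo = i_d
    return 0.5 * (lo + hi)


def max_allowable_rsd(reduction: float = 1.0 / 3.0, r_hi: float = 1.0, iters: int = 60) -> float:
    """Total R_SD (kOhm*um) that lowers the current by `reduction` of the R_SD = 0 value."""
    target = (1.0 - reduction) * drain_current(0.0)
    while drain_current(r_hi) > target:  # expand until the root is bracketed
        r_hi *= 2.0
    lo, hi = 0.0, r_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if drain_current(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


r_sd = max_allowable_rsd()
print(f"I_ref = {drain_current(0.0):.1f} uA/um, max R_SD = {r_sd:.2f} kOhm*um total "
      f"({r_sd / 2:.2f} per contact)")
```

Because the tolerable drop is a fixed fraction of the bias, a lower current permits a proportionally larger R_SD, which is why the low-current SG devices tolerate higher contact resistance than the DG devices.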
From Eq. (1), the product of the device capacitance and resistance gives the switching delay of each individual device. Fig. 6 shows the capacitance versus resistance for each material, node, and geometry, with arrows showing the effect of going from the SG geometry to the DG geometry. We first discuss the SG geometry at each node. At the 2019 node, among the SG vdW FETs, MoSe₂ and MoTe₂ have the highest resistance and capacitance, and BP has the lowest. At the 2028 node, among the SG vdW FETs, WSe₂ has the lowest resistance since it has the highest drive current, and BP has the lowest capacitance. To understand the low capacitance, recall that the device capacitance is determined by C = ∂Q/∂V_G. If the device is only weakly turned on, there is little charge in the channel, and C is small, irrespective of the geometrical gate capacitance. The band diagram of BP at the 2028 node in Fig. 4 shows that it is weakly turned on, since the top of the barrier is 83 meV above the source Fermi level. In comparison, MoTe₂, with the heaviest mass, is more strongly turned on, and its capacitance is the highest even though its current is the lowest among the vdW FETs. Its low current (high resistance) results from its low mobility and short mean free path.
Both the 2028 SG and DG Si FETs stand out in Fig. 6. Adding a second gate to the 2028 UTB Si FET gives a capacitance slightly below that of the DG vdW FET using p-type BP. There are several reasons for the low capacitance of the Si DG FET. First, the 3 nm thick channel requires a double gate to accumulate significant charge and turn the device on. Second, even when charge is accumulated in the channel, the relatively low effective mass of the lowest quantized state in the channel, 0.22 m_0 [49], results in a lower quantum capacitance [19]. Finally, the lower source and drain doping of 10¹⁹ cm⁻³, compared with 4 × 10¹⁹ to 5 × 10¹⁹ cm⁻³ for the DG vdW FETs, results in longer depletion regions in the source and drain that reduce the fringing capacitance associated with the sidewalls of the gates. The UTB Si band diagrams in Fig. S3 illustrate these points. The intrinsic switching energies versus switching delay times are shown in Fig. S4. At the 2019 node, the SG WS₂ and WSe₂ FETs and the DG Si FET have very similar switching energies and delay times. Adding a second gate to the 2D materials is detrimental in all cases, increasing both the energy and the delay. At the 2028 node, adding a second gate still moves all of the 2D materials to a higher energy-delay product; only Si is moved to a lower energy-delay product by the addition of a second gate.
Energy-delay benchmarks for a 32-bit adder are shown in Fig. S5. Here, the added capacitance of the interconnects is included. For a per-unit-length capacitance of 126 aF/µm, the interconnect capacitance per transistor, c_i, is 50 aF at the 2019 node and 18 aF at the 2028 node. The default FET widths are 4 times the pitch: 80 nm at the 2019 node and 28.4 nm at the 2028 node. Multiplying these widths by the capacitance values in Fig. 6 gives the actual FET capacitances. For the vdW FETs at the 2019 node, c_i ranges from 1.33 to 2.05 times the SG-FET capacitances and from 0.82 to 1.16 times the DG-FET capacitances. At the 2028 node, c_i ranges from 2.18 to 3.73 times the SG-FET capacitances and from 1.35 to 2.18 times the DG-FET capacitances. The interconnect contribution to the delay depends on the current that flows through the interconnect, which is the device current. As a result, the drive current becomes more important for circuit performance. For an SG TMD FET at either the 2019 or 2028 node, adding a second gate increases the intrinsic device switching energy more than it decreases the delay, so the device energy-delay product increases. The same trend applies to the 2019 circuit. However, for the 2028 circuit, adding a second gate leaves the energy-delay product almost unchanged for BP, WS₂, and MoS₂ and slightly increases it for WSe₂.
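The interconnect and device capacitances can be checked with a few lines of arithmetic. The value of F at each node is not stated explicitly in this section; the sketch infers F = 20 nm (2019) and F = 7.1 nm (2028) from the quoted widths of 4 pitches (80 nm and 28.4 nm), which reproduces the quoted c_i of about 50 aF and 18 aF. Under these assumptions both c_i and the FET capacitance scale with F, so their ratio depends only on the device capacitance per unit width. The per-width device capacitance in the usage lines is hypothetical.

```python
C_WIRE_AF_PER_UM = 126.0  # interconnect capacitance per unit length (from the text)
WIRE_LENGTH_IN_F = 20     # interconnect length per transistor = 20F (from the text)
WIDTH_IN_PITCHES = 4      # default FET width = 4 x pitch (from the text)

# F (nm) inferred from the quoted widths: 80 nm / 4 and 28.4 nm / 4
F_NM = {2019: 20.0, 2028: 7.1}


def interconnect_cap_af(f_nm: float) -> float:
    """Interconnect capacitance per transistor (aF) for a wire of length 20F."""
    length_um = WIRE_LENGTH_IN_F * f_nm * 1e-3
    return C_WIRE_AF_PER_UM * length_um


def fet_cap_af(c_per_width_af_per_um: float, f_nm: float) -> float:
    """FET capacitance (aF) = per-width capacitance x default width."""
    width_um = WIDTH_IN_PITCHES * f_nm * 1e-3
    return c_per_width_af_per_um * width_um


C_DEVICE_AF_PER_UM = 400.0  # hypothetical device capacitance per unit width
for node, f_nm in F_NM.items():
    c_i = interconnect_cap_af(f_nm)
    c_fet = fet_cap_af(C_DEVICE_AF_PER_UM, f_nm)
    print(f"{node}: c_i = {c_i:.1f} aF, C_FET = {c_fet:.1f} aF, c_i/C_FET = {c_i / c_fet:.2f}")
```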
The power density as a function of computational throughput is shown in Fig. S6. Computational throughput is defined as the number of integer operations per second per unit area (32-bit additions in the case of the 32-bit adder) [34]. The throughput is the inverse of the circuit delay time in Fig. S5 divided by the circuit area. Since the areas of all adders at a given node are taken to be the same, the throughput is proportional to the inverse of the adder delay time. At the 2028 node, SG WSe₂, WS₂, and BP all have significantly higher throughputs than DG Si, with slightly higher power density. Following Refs. [34] and [35], we set the power density limit to 10 W/cm². All of the FETs lie within this power density constraint, since they all operate at low voltage (0.3 V).
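A sketch of the throughput and power-density bookkeeping, assuming that power density equals the energy per operation multiplied by the throughput, P/A = E_op/(t_delay A). That relation is an assumption about how the BCB scripts combine these quantities, and the adder energy, delay, and area values below are hypothetical.

```python
POWER_LIMIT_W_CM2 = 10.0  # power density limit used in the text


def throughput_ops_per_s_cm2(delay_s: float, area_cm2: float) -> float:
    """Integer operations per second per unit area: (1 / delay) / area."""
    return 1.0 / delay_s / area_cm2


def power_density_w_cm2(energy_j: float, delay_s: float, area_cm2: float) -> float:
    """ASSUMED: power density = energy per operation x throughput."""
    return energy_j * throughput_ops_per_s_cm2(delay_s, area_cm2)


# Hypothetical adder: 0.5 fJ per 32-bit addition, 20 um^2 area (20e-8 cm^2)
ENERGY_J = 0.5e-15
AREA_CM2 = 20e-8
for delay_ns in (0.5, 1.0):
    tp = throughput_ops_per_s_cm2(delay_ns * 1e-9, AREA_CM2)
    pd = power_density_w_cm2(ENERGY_J, delay_ns * 1e-9, AREA_CM2)
    print(f"delay {delay_ns} ns: throughput {tp:.2e} ops/s/cm^2, "
          f"power density {pd:.1f} W/cm^2, within limit: {pd <= POWER_LIMIT_W_CM2}")
```

With equal areas, halving the delay doubles the throughput, which is why the ranking in Fig. S6 follows the adder delays.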
IV. SUMMARY AND CONCLUSIONS
We performed quantum mechanical simulations of vdW FETs with monolayer MoS₂, MoSe₂, MoTe₂, WS₂, WSe₂, and BP channels operating in the LV regime for geometries corresponding to the 2019 and 2028 nodes of the 2013 ITRS. A UTB Si FET was simulated using the same approach for comparison. The FET serves as a baseline device for determining targets for material parameters. As the gate length is scaled from 13.3 nm to 5.9 nm, blocking the leakage current becomes more critical, and the TMD materials with heavier effective masses benefit most from extreme scaling. For all materials, the ballistic current decreases with scaling, in agreement with previous work [36]. However, the full current including scattering can either increase or decrease, depending on two competing processes that are both closely tied to the effective mass: direct tunneling through the channel and backscattering in the channel. An optimum effective mass of ∼0.4 m_0, corresponding to that of WSe₂, provides the maximum drive current for LV operation with V_DD = 0.3 V. The short 6 nm gate length combined with LV operation is forgiving in its requirements for material quality and contact resistance. Low-voltage operation results in low current and thus a small IR drop across the contact resistances, and the short 6 nm gate length becomes less than the mean free path of the low-mobility materials. At the 2028 node, the single gate vdW FETs show competitive performance in terms of drive current and power density. These performance metrics are obtained using currently measured mobilities, shown in Table S1, and contact resistances, shown in Table II, that are comparable to the best measured contact resistances [48].
Supplementary Material
Table S1 provides the material parameters used in the calculations described in Sec. II of the paper. The measured mobility of monolayer MoTe₂ was unknown at the time of this work, so μ_MoTe₂ was calculated as
During the review process, we became aware of measurements on multilayer MoTe₂ flakes showing a room-temperature mobility of approximately 21 cm²/Vs [50]. Since TMD layers are weakly coupled by van der Waals interactions, these mobility values can also be representative of the monolayer mobility (or an upper bound on it) [51]. One caveat is that the devices were not encapsulated during the measurements, so the mobility values represent a lower bound. A mobility of 21 cm²/Vs (in contrast to the 42.74 cm²/Vs used in this work) would further shorten the mean free path and degrade the overall performance of MoTe₂. For Si, we followed Ref. [52] and used a mobility of 200 cm²/Vs, which could be considered optimistic, since at lower inversion charge densities the mobility can be reduced by a factor of 2 [53].
Fig. S1 shows how the drive currents vary as a function of the source and drain doping densities. The doping densities that gave the maximum drive currents (in the absence of contact resistance) were chosen. At the highest doping of 1 × 10²⁰ cm⁻³, the Fermi level lies close to the band edge for all of the vdW materials: the source (and drain) degeneracy, E_Fs − E_cs (E_vs − E_Fs for holes), varies from −21.65 meV to 10.5 meV, where E_Fs is the source Fermi energy and E_cs (E_vs) is the source conduction (valence) band edge. Even though the source doping of Si is one order of magnitude less than the highest doping used for the TMD FETs, the source degeneracy of the Si Fermi level, E_Fs − E_cs ≈ 35 meV, is the largest among all of the FETs. This is because its density-of-states mass (0.22 m_0) times its degeneracy (2 orbitals and 2 spins) is the smallest. For comparison, BP has the smallest transport mass, but, because of its strong anisotropy, its density-of-states masses of 0.98 m_0 for holes and 0.44 m_0 for electrons are large.
While source exhaustion sets the lower limit on the doping in an unconstrained layout, design rules limit the extent of the depletion regions into the source and drain. The source and drain depletion regions are terminated at the n⁺ vias for the metal 1 contacts to the source and drain. Following the layout of Fig. 26 in Ref. [34], for the 2028 node, these via regions lie 7.1 nm to the left and right of the physical gate, limiting the depletion lengths to 7.1 nm into the source and drain. To determine whether this layout constraint at the 2028 node affects the performance trends, we simulated the SG p-type BP and WSe₂ FETs with 1 × 10²⁰ cm⁻³ doping in the via regions on the left and right sides of the gate and optimized doping between the via and the gate. The value of the optimum doping does not change, and the band diagrams for the p-type BP FET with and without the heavily doped via are shown in Fig. S2. The currents of both the WSe₂ FET and the BP FET decrease slightly: for WSe₂, I_ball = 59.2 µA/µm and I_scatt = 32.3 µA/µm; for p-type BP, I_ball = 53.3 µA/µm and I_scatt = 28.0 µA/µm. In both cases, I_ball decreases more than I_scatt. The reason is that, in the on-state, the heavily doped drain via pulls more strongly on the channel, driving the point at which the channel potential drops by k_BT/q back towards the source. This reduces the effective channel length L_eff, which reduces the backscattering coefficient r_c. Since the trends and relative performance are not affected by the proximity of the via, we did not consider it in the main text.
For 2019 UTB Si, going to a DG structure increases I_ball by a factor of 3.9 and I_scatt by a factor of 3.7 compared with the SG geometry. For 2028 UTB Si, going to a DG structure increases I_ball by a factor of 7.3 and I_scatt by a factor of 6.7. The much larger increases in the UTB Si currents going from an SG to a DG geometry at the 2028 node, compared with those of the 2D materials, result from the different channel thicknesses. At the 2028 node, a double gate is required to control the potential throughout the 3 nm Si channel, as illustrated in Fig. S3. The set of green curves in Fig. S3 shows the conduction band edges for SG Si at each grid point through the depth of the Si channel. The highest curves are at the top of the channel adjacent to the gate oxide, and the lowest curves are at the bottom of the channel adjacent to the substrate. The large energy spread of the curves illustrates the single gate's loss of control of the channel potential. The set of blue curves shows the same conduction band edges for the DG device, where the double gate provides good control of the potential throughout the channel. For the thinner monolayer vdW FETs, a single gate is adequate.
Figs. S4-S6 are enlarged versions of Figs. 7-9 of the main article.
NEGF Details
The heavily doped source and drain regions are treated as contacts in equilibrium with their respective Fermi levels [44] , and the charge in those regions is calculated from the equilibrium expression,
where ν is the band index, s_ν is the spin degeneracy, and n_ν is the valley degeneracy, which is 2 for the K valleys, 6 for the Λ valleys, and 1 for the Γ valley. m^ν_y is the effective mass in the width direction, k_B is Boltzmann's constant, T is the temperature, and A^ν_i is the spectral function on site i for band ν, given by −2 Im G^R_{i,i;ν}(E). The factors η_{S(D)} = (μ_{S(D)} − E)/k_BT are the reduced Fermi factors resulting from analytically integrating over the transverse momentum, where μ_{S(D)} is the Fermi level of the source (drain). Within the device region, the charge is calculated from the non-equilibrium expression,
where A^ν_{i;S(D)} is the source- (drain-) connected spectral function at site i for band ν, given by A^ν_{i;S} = |G^R_{i,1;ν}|² Γ_{1,1;ν} and A^ν_{i;D} = |G^R_{i,N;ν}|² Γ_{N,N;ν}. The drain current is calculated within the self-consistent loop from
where T_ν(E) is the transmission coefficient for band ν, and r_c is the backscattering coefficient,
. L_eff is the critical length, given by the distance from the top of the barrier in the channel to the position towards the drain where the potential drops by k_BT/q. The mean free path λ is calculated using [46]
where v_T is the thermal velocity and η_top = (μ_S − E_c,top)/k_BT. The same equations are used for the UTB Si FET. The discretization within the 3 nm thick channel is 0.2 nm in the z direction and 0.25 nm in the x direction, as shown in Fig. 1 of the main article. The charge is calculated everywhere using the non-equilibrium expression, Eq. (3).
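The connected spectral functions defined above satisfy a useful consistency check: for a Hermitian device Hamiltonian coupled only to the source and drain, A^ν_{i,i} = A^ν_{i;S} + A^ν_{i;D}, and a uniform chain transmits perfectly inside the band. The sketch below verifies both for a single band on a 1D effective-mass chain with nearest-neighbor hopping t = ħ²/(2m*a²) and open-boundary self-energies Σ = −t e^{ika}. It is a minimal illustration under those assumptions, not the authors' code; it omits the transverse-momentum integration and assumes the energy lies inside both lead bands.

```python
import numpy as np

HBAR2_OVER_2M0 = 0.0380998  # eV nm^2, hbar^2 / (2 m0)


def lead_self_energy(energy_ev, u_lead_ev, t_ev):
    """Retarded self-energy -t exp(ika) of a semi-infinite 1D lead.

    Valid only for energies inside the lead band, u < E < u + 4t.
    """
    ka = np.arccos(1.0 - (energy_ev - u_lead_ev) / (2.0 * t_ev))
    return -t_ev * np.exp(1j * ka)


def negf_1d(potential_ev, energy_ev, mass_m0=0.5, a_nm=0.2):
    """Return (A_ii, A_i;S, A_i;D, T) for one band at one energy."""
    u = np.asarray(potential_ev, dtype=float)
    n = u.size
    t = HBAR2_OVER_2M0 / (mass_m0 * a_nm**2)
    # Discretized effective-mass Hamiltonian: onsite u + 2t, hopping -t
    h = np.diag(u + 2.0 * t) - t * np.eye(n, k=1) - t * np.eye(n, k=-1)
    sig_s = lead_self_energy(energy_ev, u[0], t)
    sig_d = lead_self_energy(energy_ev, u[-1], t)
    sigma = np.zeros((n, n), dtype=complex)
    sigma[0, 0] = sig_s
    sigma[-1, -1] = sig_d
    g = np.linalg.inv(energy_ev * np.eye(n) - h - sigma)  # retarded Green function
    gamma_s = -2.0 * sig_s.imag  # Gamma_11
    gamma_d = -2.0 * sig_d.imag  # Gamma_NN
    a_total = -2.0 * np.diag(g).imag
    a_src = np.abs(g[:, 0]) ** 2 * gamma_s   # |G_i1|^2 Gamma_11
    a_drn = np.abs(g[:, -1]) ** 2 * gamma_d  # |G_iN|^2 Gamma_NN
    transmission = gamma_s * gamma_d * np.abs(g[0, -1]) ** 2
    return a_total, a_src, a_drn, transmission


# Usage: a flat 20-site chain and the same chain with a 0.3 eV, 0.8 nm barrier
flat = np.zeros(20)
barrier = np.zeros(20)
barrier[8:12] = 0.3
for name, pot in (("flat", flat), ("barrier", barrier)):
    a_tot, a_s, a_d, trans = negf_1d(pot, 0.1)
    print(f"{name}: T = {trans:.4f}, sum rule holds: {np.allclose(a_tot, a_s + a_d)}")
```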
