Abstract. Although MoS2 field-effect transistors (FETs) with high-k dielectrics are promising for electron device applications, the physical origin of interface degradation remains largely unexplored.
Introduction
MoS2 field-effect transistors (FETs) with high-k dielectrics have attracted significant attention in research on ultimately scaled devices [1-7] because the naturally thin body (0.65 nm per layer) is expected to suppress short-channel effects. The electrostatic field-effect control of carriers determines most of the device characteristics and must be fully understood before the underlying physics of electrical transport can be explored. For example, the metal-insulator transition (MIT) has been widely studied in MoS2 and other two-dimensional (2D) materials [3, 8-10]. However, a poor understanding of the high-k/MoS2 interface properties may lead to erroneous physical analysis, because the field-effect control by the gate is extrinsically affected by the interface state density (Dit), which may arise from defects in MoS2 and/or dangling bonds in the high-k oxide. Specifically, detailed observations by scanning transmission electron microscopy (STEM) and scanning tunneling microscopy (STM) have revealed sulfur vacancies with densities on the order of ~10¹³ cm⁻² in both mechanically exfoliated and chemical vapor deposited (CVD) MoS2. According to density functional theory (DFT) calculations, these vacancies introduce defect states below the conduction band [11-13], and they severely degrade the electrical properties [14-17]. However, it has not yet been determined whether the electrically active interface states originate from the defect states associated with S vacancies, because Dit has been evaluated as a function of gate voltage rather than of the Fermi energy (EF).
Moreover, the field-effect control by the gate is intrinsically reduced by the small density of states (DOS) of thin MoS2 [18, 19]. Inducing carriers in the MoS2 channel requires extra kinetic energy, which gives rise to the quantum capacitance (CQ = e²·DOS). The evolution of the equivalent circuit from bulk to monolayer MoS2 is shown in figure S1. For multilayer MoS2, the semiconductor capacitance consists of the depletion capacitance (CDep) and CQ in series, whereas for monolayer MoS2, CQ is the only constituent because CDep becomes so large that it can be neglected. Capacitance-voltage (C-V) measurement is a powerful way to probe both CQ and Dit directly and thus to fully understand the mechanism of field-effect control [20, 21]. Although Dit has been extracted as a function of EF for multilayer and monolayer MoS2 by C-V measurements on both capacitor [10, 17, 22, 23] and FET [16, 24-26] structures, the lack of a detailed study of CQ makes the resulting Dit energy distributions questionable. CQ has been formulated theoretically for monolayer MoS2 from the 2D DOS and the Fermi distribution [19], but it has been neither measured accurately nor fitted experimentally [24]. One of the main reasons is that the effect of the interface trap capacitance (Cit = e²Dit) on the CQ extraction has not been considered. Therefore, to elucidate all the constituents of the electrostatic field-effect control, and ultimately the whole picture of transport in MoS2 FETs, the focus should be on monolayer MoS2. In this work, C-V and current-voltage (I-V) measurements are systematically performed on the same samples of relatively high-quality, mechanically exfoliated monolayer MoS2 FETs. The interface properties are evaluated as a function of EF to elucidate the physical origin of interface degradation.
A band-tail-shaped Dit is observed, with a lowest value of 8 × 10¹¹ cm⁻² eV⁻¹ for monolayer MoS2. By carefully accounting for the effect of the interface states, CQ is extracted experimentally over the temperature range of 75-300 K for the first time. The relation between the top-gate bias (VTG) and EF is obtained from the CQ analysis. With Cit and CQ evaluated quantitatively by C-V measurements, the I-V characteristics are reproduced and understood with a drift current model. Finally, the origin of MIT in monolayer MoS2 is discussed, and MIT is suggested to be an extrinsic outcome of Cit and CQ.
Results and discussion
In this work, monolayer MoS2 films were mechanically exfoliated from natural bulk MoS2 flakes onto SiO2 (90 nm)/n⁺-Si substrates. Raman spectroscopy and atomic force microscopy (AFM) were used to determine the layer number (details are shown in figure S2). Ni/Au was deposited as the source/drain electrodes. Then, a 1-nm-thick Y metal layer was deposited by thermal evaporation from a PBN crucible in an Ar atmosphere at a partial pressure of 10⁻¹ Pa and oxidized in the laboratory atmosphere at room temperature to form the buffer layer [27, 28]. A 10-nm-thick Al2O3 layer was deposited by atomic layer deposition, followed by formation of the Al top-gate electrode. I-V and C-V measurements were conducted using a Keysight B1500 semiconductor device analyzer and a 4980A LCR meter, respectively. All electrical measurements were performed in a vacuum prober with a cryogenic system. Figure 1 shows a schematic drawing and an optical image of the dual-gate monolayer MoS2 FET. Monolayer MoS2 flakes with large areas (> 30 μm²) were selected for device fabrication and characterization so that the measured capacitance exceeds the stray capacitance (~10 fF) of the measurement system. Figure 2a,b shows the IDS-VTG characteristics at VDS = 0.1 V for three different MoS2 FET samples measured at room temperature. The device performance scatters from device to device, indicating the relatively low reliability of monolayer MoS2. The top-gate oxide capacitance (CTG) can be determined from the ratio of the capacitive coupling of the top and back gates to the MoS2 channel (details are provided in figure S3, and the extracted physical properties are summarized in table S1). Using these CTG values, the two-probe field-effect mobilities (μFE) of samples 1, 2 and 3 are estimated to be 9.5, 6.0 and 2.0 cm² V⁻¹ s⁻¹, respectively.
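The two extraction steps in the paragraph above can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes that at fixed IDS the channel charge is unchanged, so that CTG·ΔVTG + CBG·ΔVBG = 0 and therefore CTG = CBG·|ΔVBG/ΔVTG|. It also uses the standard linear-region expression μFE = (L/W)·gm/(CTG·VDS) with gm = dIDS/dVTG. The function names and the synthetic gate-voltage pairs are hypothetical; CBG = 0.038 μF cm⁻² is the 90-nm SiO2 value quoted later in the text.

```python
import numpy as np

def ctg_from_coupling(vtg, vbg, c_bg):
    """Top-gate capacitance (F cm^-2) from (V_TG, V_BG) pairs giving the same I_DS.

    Assumes C_TG * dV_TG + C_BG * dV_BG = 0 at constant channel charge,
    so C_TG = C_BG * |dV_BG / dV_TG|; a linear fit averages over all pairs.
    """
    slope = np.polyfit(np.asarray(vtg, dtype=float), np.asarray(vbg, dtype=float), 1)[0]
    return c_bg * abs(slope)

def field_effect_mobility(vtg, ids, c_tg, vds, length, width):
    """Two-probe field-effect mobility (cm^2 V^-1 s^-1) from the peak transconductance.

    vtg, ids : top-gate voltage (V) and drain current (A) of a linear-region sweep
    c_tg     : top-gate capacitance per area (F cm^-2)
    vds      : drain bias (V); length, width: channel dimensions (cm)
    Contact and access resistances are ignored, so the intrinsic value is underestimated.
    """
    gm = np.gradient(ids, vtg)  # transconductance dI_DS/dV_TG (A/V)
    return length / (width * c_tg * vds) * gm.max()

# Hypothetical example: V_BG must change by 15 V per 1 V of V_TG to keep I_DS fixed.
c_tg = ctg_from_coupling([-1.0, -0.5, 0.0, 0.5], [18.0, 10.5, 3.0, -4.5], c_bg=3.8e-8)
print(f"C_TG = {c_tg * 1e6:.2f} uF cm^-2")
```

Fitting a line through several constant-current (VTG, VBG) pairs, rather than using a single pair, averages out the noise in any one bias point.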
Although the mobility is considerably underestimated because of the access region (figure 1b) and the contact resistance, the difference among the three samples still reflects differences in their interface properties. Indeed, the sample with the highest mobility (sample 1) exhibits the sharpest subthreshold region, that is, the smallest subthreshold swing (S.S.). The S.S. values in the IDS range of 10⁻¹¹-10⁻¹⁰ A are estimated to be 110, 300 and 300 mV/dec for samples 1, 2 and 3, respectively. Since S.S. depends on VTG, Dit can be extracted from S.S. as a function of VTG - VTH in the subthreshold region, as shown in figure 4d, where VTH is the threshold voltage (details are provided in note S1). The sample with the highest mobility exhibits the lowest Dit, spread over the narrowest VTG range.
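Note S1 is not reproduced here, so the following sketch assumes the textbook subthreshold relation S.S. = (kT/e)·ln10·(1 + Cit/CTG), with CQ neglected because it is exponentially small below threshold, and Cit = e·Dit for Dit in cm⁻² eV⁻¹. The function name and the CTG value are hypothetical. Applied point by point to S.S.(VTG), the same function gives Dit as a function of VTG - VTH.

```python
import numpy as np

K_B = 8.617333262e-5   # Boltzmann constant (eV/K); kT in eV equals kT/e in volts
Q = 1.602176634e-19    # elementary charge (C)

def dit_from_ss(ss, c_tg, temperature=300.0):
    """Dit (cm^-2 eV^-1) from subthreshold swing ss (V/dec) and C_TG (F cm^-2).

    Assumes S.S. = ln(10) kT/e (1 + Cit/C_TG), i.e. C_Q << Cit below threshold,
    and Cit = e * Dit for Dit in cm^-2 eV^-1.
    """
    ss_ideal = np.log(10.0) * K_B * temperature       # thermionic limit (V/dec)
    c_it = c_tg * (np.asarray(ss, dtype=float) / ss_ideal - 1.0)
    return c_it / Q

# Hypothetical C_TG = 0.5 uF cm^-2; the S.S. values are those quoted in the text.
for ss in (0.110, 0.300):
    print(f"S.S. = {1e3 * ss:.0f} mV/dec -> Dit = {float(dit_from_ss(ss, 0.5e-6)):.2e} cm^-2 eV^-1")
```

With the hypothetical CTG = 0.5 μF cm⁻², an S.S. of 110 mV/dec at 300 K corresponds to Dit ≈ 2.6 × 10¹² cm⁻² eV⁻¹. The result scales linearly with CTG, so an accurate CTG is a prerequisite.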
Dit extraction from S.S. in I-V

Dit extraction from the equivalent circuit analysis of C-V
The interface properties of the same three samples are studied by capacitance measurements. Figure 3 shows the corrected Ctotal-VTG curves for frequencies from 1 kHz to 1 MHz. The parasitic capacitance (Cpara) was carefully evaluated and removed (details are provided in figure S4). Ideally, the measured capacitance (Ctotal) is zero in the deep depletion region, that is, in the off state of the I-V characteristics, and saturates asymptotically to CTG in the strong accumulation region, where the CQ of monolayer MoS2 (~84 μF cm⁻²) is much larger than CTG. Therefore, all C-V curves at different frequencies were shifted to start from zero in the off state. This procedure is justified because the CTG obtained in the strong accumulation region after the correction agrees with the CTG estimated from the capacitive coupling between the top and back gates in the I-V measurements to within 10%. Hereafter, Ctotal denotes the measured capacitance with Cpara removed. Frequency dispersion is observed in figure 3 for all samples and reflects their interface quality: the sample with the best interface (sample 1) has the smallest dispersion. In general, frequency dispersion has two origins. One is a large Cit, which directly reflects the interface properties [16]. The other is the series-resistance effect [17], which can remain even though ohmic contacts are realized with Ni/Au (details are provided in figure S5). However, the large Cpara from the SiO2/n⁺-Si substrate prevents us from extracting reliable conductance signals, so a quantitative analysis of the series-resistance effect is beyond the scope of this paper. The next task is to clarify Cit and CQ separately and quantitatively.
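The Cpara correction described above amounts to subtracting the off-state level from each curve and then checking the accumulation value against CTG. The minimal sketch below assumes that the off state is selected with a boolean mask on the VTG grid and that Cpara is taken as the mean off-state capacitance. Both choices, the function names, and the synthetic curve are hypothetical.

```python
import numpy as np

def remove_parasitic(c_meas, off_mask):
    """Shift one measured C-V curve so that it starts from zero in the off state.

    c_meas   : measured capacitance (F) versus V_TG at one frequency
    off_mask : boolean array selecting deep-depletion (off-state) bias points
    Returns the corrected curve and the estimated C_para (mean off-state value).
    """
    c_meas = np.asarray(c_meas, dtype=float)
    c_para = c_meas[off_mask].mean()
    return c_meas - c_para, c_para

def ctg_consistent(c_accumulation, c_tg_from_iv, tol=0.10):
    """True if the corrected accumulation capacitance matches C_TG from I-V within tol."""
    return abs(c_accumulation - c_tg_from_iv) / c_tg_from_iv <= tol

# Synthetic example: a 10 fF offset on a 200 fF accumulation step (hypothetical numbers).
vtg = np.linspace(-3.0, 1.0, 81)
c_meas = 10e-15 + 200e-15 / (1.0 + np.exp(-(vtg + 1.0) / 0.1))
c_corr, c_para = remove_parasitic(c_meas, vtg < -2.5)
print(c_para, ctg_consistent(c_corr[-1], 200e-15))
```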
To quantitatively estimate the value of Dit via its frequency response, the capacitance is measured as a function of frequency (f). The equivalent circuit is modeled as shown in figure 4a. Ctotal can be calculated based on the following equation:
where τit is the time constant of the interface states, and A and B denote two types of interface states. This equation differs slightly from that used previously for a CVD monolayer MoS2 FET [16] because the multi-level model is more realistic than the single-level model [20]. Figure 4b shows the Ctotal-f curves at different VTG for sample 1. Ctotal decreases with increasing frequency because Cit cannot fully respond at high frequencies. Therefore, 1/Ctotal saturates to 1/CTG + 1/CQ in the high-frequency limit according to eq. (1). Using CQ and the two sets of Dit and τit as fitting parameters, the experimental data are well reproduced, as shown by the solid black lines. Although the number of fitting parameters is large, Dit and τit are determined accurately enough for quantitative analysis because they characterize different physical properties (the magnitude of Cit and its frequency response, respectively). Although two types of interface states are considered, Dit originates mainly from type A for most of the measured samples (DitA > DitB, τitA > τitB) (details are provided in figure S6). Thus, DitA and τitA are simply referred to as Dit and τit. CQ, τit and Dit from this fitting are plotted as functions of VTG - VTH in figure 4c,d. VTH is theoretically defined as VTG at CQ = CTG [19], as explained later in the CQ analysis. Both CQ and τit vary approximately exponentially with VTG, i.e., linearly on a logarithmic scale [29, 30]. Note that CQ can be extracted accurately only when the saturation of Ctotal in figure 4b is clearly observed, which restricts the VTG range of the CQ extraction.
Therefore, considering the Ctotal-f curves for samples 2 and 3 shown in figure S7, CQ can be extracted for sample 2 but not for sample 3. Now let us compare the Dit values extracted from the C-V and I-V measurements. As shown in figure 4d, the values are comparable, indicating that the interface properties were successfully evaluated by the electrical measurements. The lowest Dit obtained in this study is ~8 × 10¹¹ cm⁻² eV⁻¹, one order of magnitude lower than that of a CVD MoS2 FET [16]. This improvement is expected to come from the higher quality of the bulk MoS2 crystals and the Y2O3 buffer layer. A Dit tail close to the conduction band is still observed for all samples. Note that Dit is still presented as a function of VTG - VTH rather than EF, because the experimental relation between VTG and the channel potential (VCH, with EF = eVCH) is not yet established. In the next section, the effect of Cit on CQ is discussed in detail, which reveals the relation between VTG and VCH.
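Eq. (1) itself is not reproduced in this text. As a hedged sketch of a multi-level model consistent with the equivalent circuit of figure 4a, the code below assumes CTG in series with CQ in parallel with two trap continua (types A and B). Each continuum is given the Nicollian-Brews continuum form, with capacitance e·Dit·arctan(ωτit)/(ωτit). The exact expression used by the authors may differ, and all parameter values are hypothetical. In the high-frequency limit both trap terms vanish and 1/Ctotal approaches 1/CTG + 1/CQ, as stated above.

```python
import numpy as np

Q = 1.602176634e-19  # elementary charge (C)

def c_it_continuum(freq, dit, tau):
    """Trap capacitance (F cm^-2) of a trap continuum: e Dit arctan(w tau)/(w tau).

    Assumed Nicollian-Brews continuum form; Dit in cm^-2 eV^-1, tau in s, freq in Hz.
    """
    wt = 2.0 * np.pi * np.asarray(freq, dtype=float) * tau
    return Q * dit * np.arctan(wt) / wt

def c_total(freq, c_tg, c_q, dit_a, tau_a, dit_b, tau_b):
    """C_TG in series with C_Q || C_itA || C_itB (all per unit area)."""
    c_channel = c_q + c_it_continuum(freq, dit_a, tau_a) + c_it_continuum(freq, dit_b, tau_b)
    return 1.0 / (1.0 / c_tg + 1.0 / c_channel)

# Hypothetical parameters at one V_TG.
params = dict(c_tg=0.5e-6, c_q=0.2e-6, dit_a=2e12, tau_a=1e-5, dit_b=5e11, tau_b=1e-7)
freqs = np.logspace(3, 6, 4)     # 1 kHz to 1 MHz
print(c_total(freqs, **params))
```

For fitting measured Ctotal-f data, `c_total` could be wrapped as a function of frequency and passed to `scipy.optimize.curve_fit` with CQ, DitA, τitA, DitB and τitB as free parameters and CTG fixed.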
Quantum capacitance analysis
Quantum capacitance was originally derived from the finite DOS of a 2D electron gas [18, 31, 32] and has been successfully extracted in graphene [33, 34]. Here, following the same procedure as for graphene, which differs from the C-f analysis in figure 4b, CQ is extracted again as a continuous function of VTG. Samples 1 and 2 are used for this analysis because of their relatively high quality. First, CQ is extracted from the C-V measurements at 1 MHz in figure 3 to obtain the overall picture. At the high-frequency limit of 1 MHz, the interface states are assumed to be unable to respond, so eq. (1) reduces to 1/Ctotal = 1/CTG + 1/CQ when Cit is neglected. Since CTG has already been determined in the strong accumulation region, CQ for samples 1 and 2 is extracted as a function of VTG in figure 5a,b. Alternatively, CQ can be calculated theoretically from the Fermi distribution and the DOS of 2D materials as follows [19]:
where g2D = gs·gv·m*/(2πħ²) is the band-edge DOS and EG = 1.9 eV is the band gap of monolayer MoS2. gs and gv are the spin and valley degeneracy factors, respectively, and m* is assumed to be 0.6m0, where m0 is the free-electron mass. The mid-gap energy is taken as EF = 0 eV, so the conduction band edge (CBE) is located at EG/2 = 0.95 eV. In eq. (2), CQ is expressed as a function of VCH, and the relation between VCH and VTG is required for comparison with experiment. Based on the ideal equivalent circuit in the high-frequency limit without Cit, the theoretical relation between VCH and VTG is expressed as
where VTG,mid-gap is a fitting parameter equal to VTG at EF = 0 eV; it compensates for the intrinsic n-type doping of MoS2. By combining eqs. (2) and (3), CQ is calculated as a function of VTG. The experimental and theoretical CQ-VTG curves are compared in figure 5a. For sample 1, the experimental CQ-VTG curve agrees well with the theoretical curve over a wide VTG range (-1.8 to 0.1 V), whereas for sample 2 it deviates substantially.
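Eqs. (2) and (3) are not reproduced in this text. The sketch below assumes forms consistent with the definitions above. For eq. (2), it takes a single 2D conduction band with Fermi-Dirac statistics, nCH = g2D·kT·ln[1 + exp((EF - EC)/kT)] and CQ = e²·dnCH/dEF = e²g2D/[1 + exp((EC - EF)/kT)], with EC = EG/2 and EF = eVCH. The valence band is neglected, which is reasonable only for EF in the upper half of the gap. For eq. (3), it takes the Cit-free charge balance VTG = VTG,mid-gap + VCH + e·nCH/CTG, which is equivalent to dVTG/dVCH = 1 + CQ/CTG. VTH is located from CQ = CTG as in the text. With m* = 0.6m0 and gs = gv = 2, e²g2D evaluates to about 80 μF cm⁻², the same order as the ~84 μF cm⁻² quoted earlier. The function names and the CTG and VTG,mid-gap values are hypothetical.

```python
import numpy as np

K_B = 8.617333262e-5        # Boltzmann constant (eV/K)
Q = 1.602176634e-19         # elementary charge (C)
HBAR = 1.054571817e-34      # reduced Planck constant (J s)
M0 = 9.1093837015e-31       # free-electron mass (kg)
EC = 1.9 / 2                # conduction band edge (eV) for E_G = 1.9 eV, mid-gap = 0

def g2d(m_eff=0.6, gs=2, gv=2):
    """Band-edge DOS g_s g_v m*/(2 pi hbar^2), converted from J^-1 m^-2 to eV^-1 cm^-2."""
    return gs * gv * m_eff * M0 / (2 * np.pi * HBAR**2) * Q * 1e-4

def electron_density(vch, temperature=300.0):
    """n_CH (cm^-2) at E_F = e V_CH; logaddexp(0, x) = ln(1 + e^x) without overflow."""
    kt = K_B * temperature
    return g2d() * kt * np.logaddexp(0.0, (np.asarray(vch, dtype=float) - EC) / kt)

def quantum_capacitance(vch, temperature=300.0):
    """C_Q = e^2 dn/dE_F (F cm^-2); 1/(1 + e^-x) written as (1 + tanh(x/2))/2."""
    kt = K_B * temperature
    x = (np.asarray(vch, dtype=float) - EC) / kt
    return Q * g2d() * 0.5 * (1.0 + np.tanh(0.5 * x))

def vtg_from_vch(vch, c_tg, vtg_midgap, temperature=300.0):
    """Cit-free gate bias from C_TG (V_TG - V_TG,mid-gap - V_CH) = e n_CH."""
    vch = np.asarray(vch, dtype=float)
    return vtg_midgap + vch + Q * electron_density(vch, temperature) / c_tg

def threshold_voltage(c_tg, vtg_midgap, temperature=300.0):
    """V_TH = V_TG at which C_Q equals C_TG."""
    kt = K_B * temperature
    ef_th = EC - kt * np.log(Q * g2d() / c_tg - 1.0)
    return vtg_from_vch(ef_th, c_tg, vtg_midgap, temperature)

# Hypothetical device: C_TG = 0.5 uF cm^-2, V_TG,mid-gap = -1.5 V.
vch = np.linspace(0.3, 1.2, 9001)
vtg = vtg_from_vch(vch, 0.5e-6, -1.5)
cq = quantum_capacitance(vch)
print(f"e^2 g2D = {Q * g2d() * 1e6:.1f} uF cm^-2")
print(f"V_TH = {float(threshold_voltage(0.5e-6, -1.5)):.3f} V")
```

Plotting `cq` against `vtg` gives a theoretical CQ-VTG curve of the kind compared with experiment in figure 5a. VTG,mid-gap only shifts the curve rigidly along the VTG axis.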
However, even for sample 1, a deviation of CQ is visible on the logarithmic scale, as shown in figure 5b. The deviations from the theoretical curve along the horizontal and vertical axes have two different origins. The deviation along the horizontal axis is due to the "stretch-out" effect [35]. Although the interface traps do not respond to the alternating current (AC) signal at 1 MHz with an amplitude of 50 mV in the C-V measurement, they do respond to the slowly varying direct current (DC) bias VTG; as the trap occupancy changes with VTG, the C-V curve stretches out along the VTG axis. The deviation along the vertical axis arises because the high-frequency limit is not always satisfied at 1 MHz, since τit is quite short near the CBE, as shown in figure 4c. Thus, the extracted CQ may partially include the contribution of Cit. As a result, the experimental relation between VCH and VTG obtained by the conventional high-frequency method (the so-called Terman method) [20] needs to be reconsidered. In both cases, the interface traps cause deviations from the theoretical CQ curve in the range CQ < Cit.
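The 1 MHz extraction reduces to inverting the series formula, CQ = (1/Ctotal - 1/CTG)⁻¹. The minimal sketch below (function name hypothetical) also illustrates the vertical-axis error discussed above. If part of Cit still responds at the measurement frequency, the same inversion returns CQ plus that residual Cit rather than CQ alone. The result also diverges as Ctotal approaches CTG, so small errors in either quantity are strongly amplified in strong accumulation.

```python
import numpy as np

def cq_from_high_frequency(c_total, c_tg):
    """C_Q from 1/C_total = 1/C_TG + 1/C_Q, assuming the interface states are frozen.

    Any Cit that still responds is silently folded into the result.
    Points with C_total >= C_TG have no physical solution and are returned as NaN.
    """
    c_total = np.asarray(c_total, dtype=float)
    with np.errstate(divide="ignore"):
        inv = 1.0 / c_total - 1.0 / c_tg
        return np.where(inv > 0, 1.0 / inv, np.nan)

# Hypothetical values (F cm^-2): true C_Q = 0.2 uF, plus 0.05 uF of Cit that still responds.
c_tg = 0.5e-6
ct_ideal = 1.0 / (1.0 / c_tg + 1.0 / 0.2e-6)
ct_residual = 1.0 / (1.0 / c_tg + 1.0 / (0.2e-6 + 0.05e-6))
print(cq_from_high_frequency([ct_ideal, ct_residual], c_tg))
```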
As discussed above, the stretch-out effect and the limited measurement frequency complicate the relation between VCH and VTG. Here, we propose a simple method to determine VCH, i.e., to find EF, by using the CQ values obtained from the C-f analysis in figure 4c instead of those extracted from the C-V measurement at 1 MHz, because the former do not include Cit. Figure 5c shows the theoretical CQ-EF curve calculated via eq. (2). The experimental CQ values extracted from the C-f analysis for samples 1 and 2 are plotted on the theoretical CQ-EF curve as blue open circles and red open triangles, respectively, which compensates for the stretch-out along the horizontal axis of the C-V curve. The corresponding EF values can then be read by following the arrows. We emphasize that CQ obtained via the C-f analysis is significantly more accurate than that obtained from the C-V curve at 1 MHz because Cit is strictly excluded. Sample 2 has a narrower EF range owing to its larger Dit. This means that the modulation of EF by VTG is suppressed by a larger Dit, which is often referred to as Fermi level pinning at the semiconductor/insulator interface [36-38].
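Reading EF off the theoretical CQ-EF curve, as done with the arrows in figure 5c, is equivalent to inverting eq. (2) analytically. Assuming the single-band Fermi-Dirac form CQ = e²g2D/[1 + exp((EC - EF)/kT)], the inversion gives EF = EC - kT·ln(e²g2D/CQ - 1). In the minimal sketch below, g2D ≈ 5.01 × 10¹⁴ cm⁻² eV⁻¹ (m* = 0.6m0, gs = gv = 2) is an assumption, and the function name and input values are hypothetical. The inversion is only defined for CQ strictly below the band-edge plateau e²g2D.

```python
import numpy as np

K_B = 8.617333262e-5   # Boltzmann constant (eV/K)
Q = 1.602176634e-19    # elementary charge (C)
G2D = 5.01e14          # band-edge DOS for m* = 0.6 m0, gs = gv = 2 (cm^-2 eV^-1), assumed
EC = 0.95              # conduction band edge (eV), mid-gap = 0

def fermi_level_from_cq(c_q, temperature=300.0):
    """E_F (eV) from C_Q (F cm^-2) by inverting C_Q = e^2 g2D / (1 + exp((EC - E_F)/kT)).

    Only 0 < C_Q < e^2 g2D has a solution; other values raise ValueError.
    """
    c_max = Q * G2D
    c_q = np.asarray(c_q, dtype=float)
    if np.any((c_q <= 0) | (c_q >= c_max)):
        raise ValueError("C_Q must lie strictly between 0 and e^2 g2D")
    return EC - K_B * temperature * np.log(c_max / c_q - 1.0)

# Hypothetical C_Q values (F cm^-2), e.g. from C-f fits at several V_TG.
print(fermi_level_from_cq([1e-8, 1e-7, 1e-6]))
```

Pairing each returned EF with the VTG at which the corresponding CQ was fitted gives an experimental VTG-EF relation that, by construction, excludes Cit.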
The stretch-out effect due to a large Dit can be clearly understood by comparing the IDS-VTG curves in figure 2a with the CQ-VTG curves in figure 5b. Theoretically, VTH is defined as VTG at CQ = CTG in figure 5b [19]. The VTH determined experimentally for sample 2 is shifted considerably toward negative VTG by the stretch-out due to its large Dit. This is consistent with the VTH position in the IDS-VTG curve in figure 2a. Evidently, the apparent VTH in the I-V characteristics is strongly affected by Dit.
Temperature dependence of C-V and physical origin of Dit
The slope of CQ becomes steeper at low temperatures owing to the intrinsic nature of the Fermi distribution, which provides an independent check of the validity of the CQ extraction. On the basis of this idea, both C-V and I-V measurements were performed at 75, 150 and 300 K on an additionally prepared monolayer MoS2 FET of relatively high quality (two-probe mobility ~10 cm² V⁻¹ s⁻¹, S.S. = 240 mV/dec at room temperature). CQ is again extracted from the C-V curves at 1 MHz and fitted as a function of VTG at different temperatures, as shown in figure 6a,b. The extracted CQ-VTG curves are divided into two regions. The first is the CQ-dominant region, where CQ > Cit; here, CQ shows a clear temperature dependence and agrees well with the theoretical calculation. The second is the Cit-dominant region, where CQ < Cit; here, the CQ-VTG curve deviates from the theoretical curve and changes only gradually with decreasing temperature. Now consider the VTH shift with decreasing temperature in the CQ-VTG relation, shown in figure 6b. VTH clearly shifts positively with decreasing temperature owing to the temperature dependences of CQ and Dit. This is important for studying temperature-dependent transport properties and is discussed later in relation to MIT. To show the frequency dispersion at different temperatures, the capacitance difference (∆Ctotal) between 10 kHz and 1 MHz is plotted as a function of VTG at different temperatures in figure 6c. ∆Ctotal gradually increases and broadens with decreasing temperature, which is reasonable for band-tail behavior. The exact Dit value is extracted based on eq. (1). Having confirmed the CQ analysis at the measured temperatures, we plot Dit as a function of EF in figure 6d. The band-tail distribution of Dit is thus confirmed by the temperature-dependent C-V measurements and is similar to that of the SiO2/Si case [39].
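The sharpening of CQ with decreasing temperature can be quantified under the Fermi-function form usually assumed for eq. (2). Under that form, CQ rises from 10% to 90% of e²g2D over an energy window of 2kT·ln9 ≈ 4.4kT around the CBE, independent of g2D. This sketch (function name hypothetical) evaluates that window at the three measurement temperatures.

```python
import numpy as np

K_B = 8.617333262e-5  # Boltzmann constant (eV/K)

def cq_rise_width(temperature):
    """Energy window (eV) over which C_Q rises from 10% to 90% of e^2 g2D.

    For C_Q / (e^2 g2D) = 1 / (1 + exp((E_C - E_F)/kT)), 10% is reached at
    E_F = E_C - kT ln 9 and 90% at E_F = E_C + kT ln 9, so the width is 2 kT ln 9.
    """
    return 2.0 * K_B * np.asarray(temperature, dtype=float) * np.log(9.0)

for t in (75, 150, 300):
    print(f"{t:>3d} K: {1e3 * float(cq_rise_width(t)):.1f} meV")
# Prints 28.4, 56.8 and 113.6 meV, respectively.
```

The window is four times narrower at 75 K than at 300 K. A measured CQ-EF rise that does not narrow in this way would point to residual Cit in the extracted CQ.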
Let us discuss the physical origin of Dit in monolayer MoS2. According to DFT calculations [11], an S vacancy introduces an isolated Dit peak at 0.46 eV below the CBE, as also indicated in figure 6d. Clearly, the present band-tail behavior of Dit is not directly related to the S vacancy. This band-tail distribution of Dit, also called U-shaped band-edge states, has been widely observed at Si/SiO2 [39-41], Ge/GeO2 [42, 43] and other conventional oxide/semiconductor interfaces [44]. For the Si/SiO2 interface, many models have been proposed to explain the U-shaped band-edge states. For example, stretched Si-Si bonds at the interface [40] and distortion of the Si-O-Si bond angle [45] are expected to create trap levels because the conduction band is composed of antibonding states of sp³ hybrid orbitals. Strain is concentrated at the Si/SiO2 interface owing to the density difference, which may shift the antibonding-state energy and result in the U-shaped band-edge states. By analogy, since the conduction and valence bands of MoS2 are formed mainly by the energy splitting of the Mo d orbitals [11, 46], possible origins include Mo-S bond bending due to strain caused by lattice mismatch at the MoS2/high-k interface [47, 48], the roughness of the SiO2 surface, and bond bending related to S vacancies. However, further study is required to clarify the physical origin of the U-shaped Dit in MoS2.
For a bulk MoS2 capacitor, an isolated Dit peak has been measured by the Terman method [17] and was suggested to arise from the S vacancy. Our multilayer MoS2 FET also shows a hump in the C-V curves (data not shown). More interestingly, a similar hump is observed in monolayer MoS2 with poor interface quality (sample 3), as indicated by the arrows in the C-V curve of figure 3c and the IDS-VTG curve of figure 2b. Thus, this hump may originate from the sulfur vacancy or a related defect (e.g., a disulfur vacancy). For samples 1 and 2, which have relatively high-quality interfaces, however, the hump is not obvious. Since the band-tail Dit of a monolayer is much higher than that of a multilayer, the high Dit of a monolayer may hide the isolated Dit peak of the S vacancy. If this is the case, the observation of the hump in sample 3 suggests that sample 3 has the highest S vacancy concentration.
Let us discuss why the C-V curves of monolayer MoS2 show significantly larger frequency dispersion than those of Si even though the Dit energy distributions are roughly comparable. One reason is that the large band gap of monolayer MoS2 broadens the Dit energy distribution. The most important reason, however, is the smaller DOS of monolayer MoS2. As discussed in the previous section, the effect of Cit on Ctotal is determined by the ratio Cit/CQ. When CQ is smaller than Cit over a certain energy range, Cit degrades the C-V curve through large frequency dispersion, a large VTH shift, limited modulation of EF by VTG, and other effects. In the Si FET structure, the semiconductor capacitance is the inversion-layer capacitance (CInv) rather than CQ, as shown in figure S1. The DOS of the Si inversion layer is much larger than that of monolayer MoS2, which suppresses the effect of Cit on Ctotal. This is supported by the reduced frequency dispersion in the C-V curves of multilayer MoS2, which has a larger CQ. As a result, ultrathin 2D materials are more sensitive to interface disorder because of their reduced DOS.
CQ and Cit effect on I-V characteristics
Since all the constituents of Ctotal are now understood, the I-V characteristics can be reproduced in two steps: determining the carrier density controlled by the electrostatic field effect of the top gate, and then characterizing the electron transport of these carriers in the conduction band. The carrier density control by VTG is therefore modeled with the equivalent circuit established above, and VCH is related to VTG as follows:
Here, Cit is added to eq. (3) because Cit can respond completely in conventional I-V characteristics, which are measured under DC conditions. The relation between the channel carrier density (nCH) and VCH is calculated using the equation [19]
Then, the drift current equation σ = enCHμD is applied to simulate carrier transport, where σ and μD are the conductivity and drift mobility, respectively. Since CQ(EF) is calculated analytically, Cit(EF), μD(EF) and VTG,mid-gap are used as fitting parameters. For simplicity, μD(EF) is assumed to be a constant independent of EF. Although this is a rough assumption, it is valid in the linear region of the I-V. In the subthreshold region, the drift mobility may decrease owing to reduced screening, but IDS there is determined mainly by the carrier density, which depends exponentially on VTG, rather than by the drift mobility. Then, the drift mobilities for samples 1, 2 and 3 are estimated to be 12. The drift mobilities are slightly higher than the experimental two-probe field-effect mobilities (table S1) because Cit reduces the gate controllability of carriers even in the linear region of the IDS-VTG curves. Band-tail-shaped Cit curves with three different levels, corresponding to the Cit levels extracted experimentally from the C-V measurements, are assumed for the samples, as shown in figure 2c. For sample 3, an additional Cit peak with a Gaussian distribution centered 0.1 eV below the CBE is introduced to reproduce the hump observed in figure 2b. Although DFT calculations place the level of a single S vacancy at 0.46 eV below the CBE [11], the present Cit peak is much shallower, suggesting clustering of S vacancies [49]. Finally, the experimental IDS-VTG characteristics of samples 1, 2 and 3 are well fitted by the above model, as shown by the black solid lines in figure 2a,b. In addition, an ideal I-V curve without Cit for sample 1 is shown by the green solid line in figure 2a,b; it exhibits the ideal S.S. of ~60 mV/dec and a sharp transition between the linear and subthreshold regions. In this case, VTH can be determined uniquely.
However, Cit degrades the subthreshold region, giving an S.S. of ~100 mV/dec for sample 1 and over 300 mV/dec for samples 2 and 3. This degradation makes the experimental VTH extraction ambiguous, as was also encountered in the C-V analysis.
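Eqs. (4) and (5) are not reproduced in this text. The sketch below assumes natural forms given the description. Eq. (4) is taken as eq. (3) with the trapped charge added, VTG = VTG,mid-gap + VCH + e(nCH + Nit)/CTG, and eq. (5) as the single-band nCH = g2D·kT·ln[1 + exp((EF - EC)/kT)]. The band tail Dit(E) = Dit0·exp((E - EC)/E0), the zero-temperature trap occupancy, and all parameter values (including μD = 10 cm² V⁻¹ s⁻¹ and W/L = 1) are hypothetical. The sketch reproduces the qualitative point above: without Cit the S.S. approaches the ~60 mV/dec thermionic limit, while a band tail raises it as EF nears the CBE.

```python
import numpy as np

K_B = 8.617333262e-5   # Boltzmann constant (eV/K)
Q = 1.602176634e-19    # elementary charge (C)
G2D = 5.01e14          # band-edge DOS for m* = 0.6 m0, gs = gv = 2 (cm^-2 eV^-1), assumed
EC = 0.95              # conduction band edge (eV), mid-gap = 0

def simulate_ids(vch, c_tg, mu_d, dit0, e0, vtg_mid,
                 vds=0.1, w_over_l=1.0, temperature=300.0):
    """Return (V_TG, I_DS) on a grid of channel potentials V_CH (V), with E_F = e V_CH.

    Hypothetical band tail Dit(E) = dit0 * exp((E - EC) / e0) in cm^-2 eV^-1; traps
    between mid-gap and E_F are counted as filled (zero-temperature occupancy).
    """
    vch = np.asarray(vch, dtype=float)
    kt = K_B * temperature
    n_ch = G2D * kt * np.logaddexp(0.0, (vch - EC) / kt)              # free electrons (cm^-2)
    n_it = dit0 * e0 * (np.exp((vch - EC) / e0) - np.exp(-EC / e0))    # trapped electrons (cm^-2)
    vtg = vtg_mid + vch + Q * (n_ch + n_it) / c_tg                    # charge balance on C_TG
    sigma = Q * n_ch * mu_d                                           # sigma = e n_CH mu_D (S/sq)
    return vtg, sigma * w_over_l * vds

def subthreshold_swing(vtg, ids):
    """Point-wise S.S. = dV_TG / dlog10(I_DS) in V/dec."""
    return np.gradient(vtg, np.log10(ids))

# Hypothetical parameters: C_TG = 0.5 uF cm^-2, mu_D = 10 cm^2/Vs, E0 = 0.1 eV.
vch = np.linspace(0.3, 1.1, 801)
i = int(np.argmin(np.abs(vch - 0.8)))    # E_F 0.15 eV below the CBE
j = int(np.argmin(np.abs(vch - 0.5)))    # deep subthreshold
vtg0, ids0 = simulate_ids(vch, 0.5e-6, 10.0, dit0=0.0, e0=0.1, vtg_mid=-1.5)
vtg1, ids1 = simulate_ids(vch, 0.5e-6, 10.0, dit0=1e13, e0=0.1, vtg_mid=-1.5)
ss0 = subthreshold_swing(vtg0, ids0)
ss1 = subthreshold_swing(vtg1, ids1)
print(f"no Cit:    S.S. = {1e3 * ss0[j]:.1f} mV/dec at E_F = 0.5 eV")
print(f"band tail: S.S. = {1e3 * ss1[i]:.0f} mV/dec at E_F = 0.8 eV")
```

Parameterizing by VCH rather than VTG avoids solving the charge-balance equation numerically: VTG follows explicitly, and plotting `ids` against `vtg` gives the IDS-VTG curve.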
Interpretation of MIT
In this final section, we discuss the contribution of Cit to MIT. The top-gate FET structure in figure 1b is unsuitable for precisely studying IDS-VG in the linear region because the access region leads to an underestimation of the intrinsic drift mobility. Thus, a back-gate four-probe FET with monolayer MoS2 on a 90-nm SiO2/n⁺-Si substrate was prepared. The experimental σ-VBG curves, excluding the series resistance, are shown in figure 7a. Clear MIT behavior is observed for the present device quality. MIT in MoS2 has been discussed with different models for both I-V [8, 9] and C-V [10]. Here, the temperature dependence of the experimental σ-VBG curves is again reproduced by the above model using σ = enCHμD. The temperature dependence of CQ, which follows naturally from the Fermi distribution, is calculated in figure S8a. Band-tail Cit with three different levels, namely high, low and no Cit, is again assumed in figure S8a, and CBG is estimated to be 0.038 μF cm⁻² for the 90-nm SiO2 back-gate oxide. VBG,mid-gap, which replaces VTG,mid-gap, is held constant for all temperatures. Then, nCH can be calculated using eqs. (4) and (5). Moreover, the μD used in this modeling is the same for all three Cit cases and slightly larger than the experimental μFE at all temperatures (details are shown in figure S8b). Figure 7b shows the simulated σ-VBG curves for the three Cit levels. MIT is well reproduced with the high Cit. As the Cit level decreases, the MIT crossover point shifts toward negative VBG and finally enters the subthreshold region in the no-Cit case, which hides the MIT. Recently, the absence of MIT has been reported for an h-BN-encapsulated monolayer CVD MoS2 FET [50], suggesting quite low Cit owing to the superior 2D/2D interface.
Generally, MIT can be understood intuitively as a combination of (i) an increase in mobility and (ii) a positive VTH shift with decreasing temperature. In the present model, the mobility is assumed to increase with decreasing temperature owing to the suppression of phonon scattering, as observed in the experiment. Therefore, the key factor for MIT is the positive VTH shift with decreasing temperature. This shift occurs because EF at VTH approaches the CBE at lower temperatures, so more interface states must be filled with electrons before VTH is reached. This also explains why VTH shifts more with temperature in the high-Cit case. Many models have been developed for MIT in 2D layered channels; the present model indicates that the Cit-induced positive VTH shift is one of the main origins of "extrinsic" MIT.
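The argument above can be checked with a minimal back-gate sketch. It uses assumed forms for CQ and nCH (a single 2D conduction band with Fermi-Dirac statistics), a hypothetical band tail Dit(E) = Dit0·exp((E - EC)/E0) with zero-temperature occupancy, and VTH defined by CQ = CBG with CBG = 0.038 μF cm⁻² as quoted above. Mobility is left out because only the threshold position is compared, and the function name and Dit0 levels are hypothetical. In this sketch, EF at VTH lies about 7.7kT below the CBE, since ln(e²g2D/CBG - 1) ≈ 7.66, so it moves toward the CBE on cooling. VTH shifts positively even without Cit, and the shift grows with the Cit level.

```python
import numpy as np

K_B = 8.617333262e-5   # Boltzmann constant (eV/K)
Q = 1.602176634e-19    # elementary charge (C)
G2D = 5.01e14          # band-edge DOS for m* = 0.6 m0 (cm^-2 eV^-1), assumed
EC = 0.95              # conduction band edge (eV), mid-gap = 0
C_BG = 3.8e-8          # 90-nm SiO2 back gate (F cm^-2), value quoted in the text

def vth_back_gate(temperature, dit0, e0=0.1, vbg_mid=0.0):
    """V_BG at which C_Q = C_BG, for a hypothetical band tail Dit0 * exp((E - EC)/e0)."""
    kt = K_B * temperature
    ef_th = EC - kt * np.log(Q * G2D / C_BG - 1.0)                     # E_F at threshold
    n_ch = G2D * kt * np.logaddexp(0.0, (ef_th - EC) / kt)             # free electrons
    n_it = dit0 * e0 * (np.exp((ef_th - EC) / e0) - np.exp(-EC / e0))  # filled traps
    return vbg_mid + ef_th + Q * (n_ch + n_it) / C_BG

for label, dit0 in (("no Cit", 0.0), ("low Cit", 1e12), ("high Cit", 1e13)):
    shift = vth_back_gate(75.0, dit0) - vth_back_gate(300.0, dit0)
    print(f"{label:>8}: VTH(75 K) - VTH(300 K) = {shift:+.2f} V")
```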
Conclusion
The degradation of the electrostatic field-effect control in mechanically exfoliated monolayer MoS2 FETs is systematically studied by both C-V and I-V characterization in terms of CQ and Cit. CQ was confirmed over the entire measured temperature range (75-300 K). Dit was therefore evaluated as a function of EF by the newly developed CQ analysis, which can also be applied to other monolayer TMDs. Dit was extracted as 10¹²-10¹³ cm⁻² eV⁻¹ with a band-tail shape close to the conduction band, comparable to that of Si/SiO2. However, ultrathin 2D materials are more sensitive to interface disorder because of their reduced DOS, which drastically degrades the subthreshold properties. Multilayer MoS2 is more suitable for device applications owing to its larger DOS. With all the constituents of Ctotal quantitatively elucidated by C-V measurements, the I-V characteristics are well reproduced and understood with a drift current model. One of the physical origins of MIT is suggested to be an extrinsic outcome of the VTH shift caused by Cit and CQ. Capacitance measurement is highly informative for detecting interface states and the density of states in ultrathin 2D materials, which helps us understand device physics and improve device performance.
Based on the equivalent circuit in figure S1b, the total charge in the channel is induced by both the top gate and the bottom gate.
where CMoS2 is the capacitance of MoS2, consisting of CQ and Cit in parallel. In principle, the carrier density is constant when the source/drain current is kept unchanged. As a result, VCH and CMoS2 are also constant¹.
Therefore, after rearrangement,
By modulating the top gate and bottom gate simultaneously at constant source/drain current,
equation 2 becomes
By comparing equations 2 and 3,
Experimentally, the source/drain current is held constant at a level below or close to that at VTH, because the carrier density is highly sensitive to the gate bias in the subthreshold region.
Figure S4. The open circles indicate the raw C-V data measured at different frequencies for sample 1, which include Cpara. Note that no back-gate voltage was applied; the back gate was left floating. Ideally, the measured capacitance (Ctotal) is zero in the deep depletion region, that is, in the off state of the I-V, and saturates asymptotically to CTG in the strong accumulation region because the CQ (~84 μF cm⁻²) of monolayer MoS2 in this region is much larger than CTG. Therefore, all C-V curves at different frequencies were shifted to start from zero in the off state. The solid circles indicate Ctotal after removing Cpara. After this correction, Ctotal saturates asymptotically to CTG in the strong accumulation region. CTG is therefore extracted in the accumulation region and is consistent with the CTG estimated from I-V in figure S3 to within 10%. As a cross-check, the black line is a theoretical calculation based on 1/Ctotal = 1/CQ + 1/CTG, where CQ was calculated using eq. (2) in the main text and the constant CTG obtained above was used. In the saturation region, the calculation reproduces the C-V data, suggesting that the present correction is reasonable.
Dit is then extracted as a function of VTG.
Table S1. Physical properties extracted from the measured devices.
CTG values differ among the three samples because they were fabricated in different batches.
