is an order of magnitude larger than previously thought, yet near the low end of known solid-solid interfaces. Our study also reveals unexpected insight into non-uniformities of the MoS2 transistors (small bilayer regions), which do not cause significant self-heating, suggesting that such semiconductors are less sensitive to inhomogeneity than expected. These results provide key insights into energy dissipation of 2D semiconductors and pave the way for the future design of energy-efficient 2D electronics.
The performance of nanoelectronics is most often constrained by thermal challenges, 1, 2 memory bottlenecks, 3 and nanoscale contacts. 4 The first of these has become particularly acute, with high integration densities leading to high power density, and numerous interfaces (e.g. between silicon, copper, SiO2) leading to high thermal resistance. New applications and new form-factors call for dense vertical integration into multi-layer "high-rise" processors for high-performance computing, 3 or integration with poor thermal substrates like flexible plastics (of thermal conductivity 5x lower than SiO2 and nearly 500x lower than silicon) for wearable computing. 5 These are the two most likely platforms for incorporating 2D semiconductors into electronics, yet very little is known about fundamental limits or practical implications of energy dissipation in these contexts.
At its most basic level, energy dissipation begins in the ultra-thin transistor channel and is immediately limited by the insulating regions and thermal resistance with the interfaces surrounding it. Herbert Kroemer's observation 6 that "the interface is the device" is remarkably apt for 2D semiconductors such as monolayer MoS2. These have no bulk, and are thus strongly limited by their interfaces. For instance, even some of the best electrical contacts known today add >50% parasitic resistance to MoS2 transistors when these are scaled to sub-100 nm dimensions. 7 Similarly, thermal interfaces may be expected to limit energy dissipation from 2D electronics, and their understanding is essential. Nevertheless, a key challenge is the need to differentiate heating of the sub-nanometer thin 2D material from its environment. Here, Raman spectroscopy holds a unique advantage, 8, 9 as the temperature of even a monolayer semiconductor can be distinguished from the material directly under (or above) it, if the Raman signatures are distinct. 10

Figure 1a shows our typical device structure and measurement setup. We utilize high-quality MoS2 films grown by chemical vapor deposition (CVD) on SiO2, with Si substrates which serve as back-gates 11 (see Methods). Micron-scale channel dimensions are chosen to minimize power dissipation at the contacts (Supporting Information Section 1) and to obtain good spatial resolution. Some transistors are entirely monolayer (1L) and others contain small (<0.5 µm²) bilayer (2L) regions, 11 as seen in Figures 1b-d. In the main manuscript we focus on the latter, partly because they represent a more extreme case of material variability, and partly to reveal insight into energy dissipation at such 1L/2L interfaces. (Supporting Information Section 2 describes measurements of 1L exfoliated MoS2, with similar results.)

Figure 1e displays the characteristic Raman peaks of a MoS2 channel in thermal equilibrium and when power is applied (P ≈ 1 mW/µm²). The Raman peaks red-shift due to heating and phonon softening, which serves as the temperature marker (see Methods).
9, 12-14 Importantly, both the MoS2 temperature and the Si substrate temperature (directly underneath the MoS2 channel) are acquired simultaneously in this measurement since their Raman peaks are both measurable and spectrally resolved. This has not been previously implemented, to our knowledge, yet we find it is crucial to avoid the need for any assumptions regarding heat sinking from the Si substrate. The MoS2 temperature is obtained from the out-of-plane A1' mode to avoid uncertainty of strain effects on the in-plane E' mode, and additional corrections are described in Supporting Information Sections 3 and 4.

[...] Figures 1c and 2c). The temperature uniformity of the device is confirmed by scanning thermal microscopy (SThM) 15 in Figure 2b and Supporting Information Figure S6. Unlike Raman, SThM only samples the temperature of the top AlOx capping layer (not the MoS2 channel temperature), but the lack of temperature variation around 2L regions remains clearly evident. Similarly uniform temperature maps were obtained from exfoliated 1L devices, as shown in Supporting Information Figure S2.
Minor, randomly distributed non-uniformities in the temperature seen in Figure 2 are within the uncertainty of the measurement and are also visible in the reference map taken at VDS = 0 (on a hot stage), for which the temperature is known to be uniform, as shown in Supporting Information Figure S4. The uniform self-heating of transistors from CVD-grown MoS2 suggests that any change in energy dissipation around the 2L spots or other non-uniformities is small, and below the resolution of the Raman thermometry technique. In fact, we utilize this information to place an upper bound on potential variations, like conduction band (CB) discontinuities at 1L-2L junctions, that could lead to measurable self-heating, and find these must be <120 meV (Supporting Information Figure S7). This finding is consistent with the previously estimated ~50 meV CB discontinuity at 1L-2L interfaces, 16, 17 18, 19 revealing that MoS2 monolayers are much more immune to atomic-scale thickness variations than Si in this atomically thin limit.

Figure 3a shows the average temperature rise in the MoS2 channel versus electrical input power density (P). No measurable difference is observed between CVD-grown (red) and exfoliated (blue) monolayer transistors, suggesting that their energy dissipation (and MoS2-SiO2 interface, as we will see below) is effectively the same. Importantly, our measurements simultaneously reveal the temperature rise at the underlying Si substrate surface (purple) directly beneath the MoS2 channel. Knowledge of the Si temperature is essential to understand the energy dissipation and to validate the thermal model shown in Figure 3b,c.
The lines in Figure 3a represent the thermal resistance Rth normalized by the device area.
The Rth (= ∆TMoS2/P) of the MoS2 channel is the sum of contributions from the Si substrate (Rth,Si = ∆TSi/P), the SiO2 layer (Rth,ox), and the MoS2-SiO2 interface (Rth,int), as illustrated in Figure 3c. This is a good approximation here, as the device dimensions are significantly larger than the lateral thermal healing length (~100 nm). 20, 21 We note that the SiO2-Si interface TBC is > 125 MW m⁻² K⁻¹, equivalent to < 10 nm Kapitza length in terms of SiO2 thickness. 22, 23 This accounts for < 5% of Rth, and is not shown in Figure 3c (see Supporting Information Section 8). [...] /(2Rth,Si) = 95 ± 8 W m⁻¹ K⁻¹, in good agreement with known values for highly doped Si. 24 The analytic term for thermal spreading resistance into the Si substrate is that of a circular disk heater, which is within < 5% error from the numerical solution of the rectangular transistor heat source (see Supporting Information Section 8).
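The series decomposition described above can be illustrated with a minimal Python sketch. The SiO2 thickness (~90 nm) and conductivity (1.4 W/m/K) are the values quoted in this document; the two resistance inputs `rth_total` and `rth_si` are hypothetical placeholders rather than measured data, and the function name is chosen here for illustration only.

```python
# Sketch: split the area-normalized thermal resistance into its series parts,
# Rth = Rth_Si + Rth_ox + Rth_int, and solve for the interface term.

K_OX = 1.4      # W/m/K, thermally grown SiO2 (value quoted in the text)
T_OX = 90e-9    # m, SiO2 thickness (~90 nm, as quoted in the text)


def interface_tbc(rth_total, rth_si, t_ox=T_OX, k_ox=K_OX):
    """Return the MoS2-SiO2 TBC in W/m^2/K.

    rth_total: dT_MoS2 / P in m^2 K/W (P is the power per unit area)
    rth_si:    dT_Si / P in m^2 K/W
    """
    rth_ox = t_ox / k_ox                     # oxide slab resistance
    rth_int = rth_total - rth_si - rth_ox    # what remains is the interface
    if rth_int <= 0:
        raise ValueError("Si and SiO2 terms already exceed the total resistance")
    return 1.0 / rth_int


# Hypothetical inputs, for illustration only (not measured values).
rth_total = 1.6e-7   # m^2 K/W
rth_si = 0.25e-7     # m^2 K/W
tbc = interface_tbc(rth_total, rth_si)
print(f"TBC = {tbc / 1e6:.1f} MW/m^2/K")
```

Because the Si term is taken directly from the measured substrate temperature, no assumption about heat sinking in the Si is needed in this subtraction.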
The TBC found here is nearly an order of magnitude higher than recently reported for exfoliated 1L MoS2 by Raman thermometry with optical heating, 12-14 but similar to that of metal interfaces with bulk MoS2 (~25 MW m⁻² K⁻¹). 25 The higher TBC cannot be explained solely by additional phonon coupling channels due to the presence of our AlOx capping layer, 26 but it could be due to better interface quality of our devices (see Methods). Our measurement accuracy is also improved by the precision of electrical heating power (used here for the first time to probe this interface), and our improved analysis which accounts for the thermal resistance of the SiO2 while directly measuring the Si substrate (Figure 3a). In contrast, in optical heating experiments one must account for the temperature-dependent absorption, the precise laser spot size and shape, and for Raman shifts unrelated to temperature induced by high laser power. The latter are difficult to decouple from heating when the laser acts as both heater and thermometer. 12-14 The agreement between our exfoliated and CVD-grown devices (as well as our MD simulations) also suggests that the TBC measured here likely approaches the upper limit of the "atomically intimate" interface. Nevertheless, we note that the MoS2-SiO2 TBC is near the very low end of known solid-solid interfaces (which range from ~10 MW m⁻² K⁻¹ for Bi-diamond to 14 GW m⁻² K⁻¹ for Pd-Ir), 2, 27 with a thermal resistance comparable to that of the underlying SiO2 (~90 nm). This is an important result, because it highlights that energy dissipation from such 2D electronics is strongly limited by their interfaces, in addition to any thermal resistance of poor substrates (e.g. flexible plastics 5 or multi-layered "high-rise" processors 3). The TBC of MoS2-SiO2 is also two to four times lower than that of graphene-SiO2 interfaces, 28 which is consistent with the four times heavier mass per unit area of MoS2 compared to graphene. 26 Similar TBC values are expected for other 2D atomically thin layers (on SiO2), the lowest potentially belonging to WTe2, which has twice the mass density per unit area of MoS2.
Before concluding, we note that our investigation also sheds light on the breakdown (BD) mechanism of such 2D devices.

In summary, we investigated energy dissipation in functioning monolayer MoS2 transistors for the first time. Raman thermometry takes advantage of material selectivity, simultaneously measuring the temperature of the transistor and substrate. We uncover relatively uniform heating, even near small bilayer regions present in some CVD-grown films, revealing that 2D semiconductors are more immune to such variability than expected. However, thermal breakdown occurs at the drain of such devices, when the (localized) temperature exceeds the oxidation threshold of MoS2. We find that the MoS2 interface will ultimately limit energy dissipation, and its TBC is among the lowest presently known for solid-solid interfaces. Such 2D electronics can nonetheless benefit from better thermal substrates (e.g. thinner SiO2), while poor thermal substrates like flexible plastics could severely limit their performance. 20 Partial device cooling could be obtained from capping layers with higher thermal conductivity (e.g. h-BN), used in short-channel devices (<100 nm) where partial heat sinking can occur directly to the contacts 21 (Supporting Information Section 8). Overall, our findings shed new light on energy dissipation mechanisms in 2D semiconductor devices, paving the way towards energy-aware design of 2D electronics.
Methods

CVD growth of monolayer
Supporting Information
S1. Electrical contact resistance
The electrical contact resistance was evaluated by the transfer length method (TLM). Figure S1 shows the total resistance RTOT (normalized by width) vs. channel length. We extract contact resistance RC = 1.6 ± 2.5 kΩ·µm, with the uncertainty reflecting 95% confidence intervals from a least-squares fit of the TLM plot. Because the goodness of the TLM fit is limited by our shortest channel length Lmin = 0.5 µm, we err on the conservative side and only set an upper bound of RC ≤ 4 kΩ·µm, to be used when estimating the fraction of power dissipated at the contacts (see Supporting Information Section 10). For the extraction of thermal boundary conductance (TBC) discussed in this work we only used transistors with L > 4 µm, for which RC < 0.1 RTOT. We also subtracted the power dissipated at the contacts (2I²RC) from the total power input of all data in the main text Figure 3a.
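As an illustration of the TLM extraction and the contact-power subtraction described above, the following Python sketch fits a straight line to width-normalized resistance versus channel length. The data points are synthetic (generated from an assumed sheet resistance and contact resistance), and the function names are introduced here only for illustration.

```python
import numpy as np


def tlm_fit(lengths_um, rtot_kohm_um):
    """Least-squares TLM fit of RTOT = Rsheet * L + 2 * RC.

    lengths_um:   channel lengths in um
    rtot_kohm_um: width-normalized total resistance in kOhm*um
    Returns (sheet resistance in kOhm/sq, contact resistance in kOhm*um).
    """
    slope, intercept = np.polyfit(lengths_um, rtot_kohm_um, 1)
    return slope, intercept / 2.0


def channel_power(current_a, v_ds, rc_ohm_um, width_um):
    """Total input power minus the power dissipated at the two contacts."""
    rc_ohm = rc_ohm_um / width_um            # contact resistance of this device
    return current_a * v_ds - 2.0 * current_a**2 * rc_ohm


# Synthetic, noise-free data (assumed Rsheet = 13 kOhm/sq, RC = 1.6 kOhm*um).
lengths = np.array([0.5, 1.0, 2.0, 4.0, 6.8])
rtot = 13.0 * lengths + 2.0 * 1.6
r_sheet, r_contact = tlm_fit(lengths, rtot)

# Hypothetical bias point: 100 uA at 5 V in a 5 um wide device.
p_channel = channel_power(100e-6, 5.0, 1600.0, 5.0)
```

With real data the intercept carries a large relative uncertainty when the shortest channel is much longer than the transfer length, which is why only an upper bound on RC is used in the text.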
S2. Temperature maps of exfoliated 1L MoS2 devices
We compared our measurements of 1L CVD MoS2 transistors to similar devices fabricated from exfoliated 1L MoS2 channels. The exfoliated monolayer MoS2 flakes were prepared using a gold-assisted exfoliation method 2 onto substrates identical to those of the CVD-grown devices. The exfoliated devices were also capped by ~15 nm AlOx (see Methods), and are expected to be doped similarly to the CVD-grown devices. The obtained temperature distribution of the exfoliated devices from Raman spectroscopy (Figure S2) is uniform and the thermal resistance is comparable to the one ob-
S3. Non-temperature related Raman peak shifts
The temperature in our experiment is measured by monitoring the softening of the Raman modes, and it is therefore important to account for any Raman shifts not induced directly by temperature, such as strain and doping. We have used the A1' mode in our measurements to avoid the uncertainty in the Raman shift due to strain present in the E' mode during the temperature calibration. We also calibrated Raman peak shifts of the A1' mode vs. temperature for 1L and 2L (bilayer) separately, and found that the Raman peak shift of the A1g mode vs. temperature of 2L is 0.015 ± 0.002 cm⁻¹/°C (not shown here) and is very close to that of the A1' mode obtained from 1L.
In addition, the A1' mode peak position has a slight dependence on carrier concentration 3. We decoupled the carrier concentration dependence (induced by back-gate voltage, VGS) from the temperature dependence in our measurement by calibrating the peak shift vs. VGS at VDS = 0, as shown in Figure S3. We then corrected the Raman signal across the device length by subtracting the peak shift induced by VGS − V(x), where V(x) is the voltage at position x in the channel, assuming a linear voltage distribution from source (x = −L/2) to drain (x = L/2).
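The gating correction can be sketched in Python as follows. This is only one possible reading of the procedure: it assumes the peak shift vs. VGS calibration is linear with a slope `gate_slope` (a hypothetical value here), that the source is grounded, and that the raw shift is measured relative to a reference map taken at the same VGS with VDS = 0.

```python
import numpy as np


def channel_voltage(x, length, v_ds):
    """Linear potential: 0 at the source (x = -L/2), v_ds at the drain (x = +L/2)."""
    return v_ds * (x + length / 2.0) / length


def remove_gating_shift(raw_shift, x, length, v_gs, v_ds, gate_slope):
    """Subtract the carrier-density-induced part of the Raman peak shift.

    raw_shift:  peak shift (cm^-1) relative to the VDS = 0 reference map
    gate_slope: assumed linear calibration, cm^-1 per volt of gate voltage
    """
    v_eff = v_gs - channel_voltage(x, length, v_ds)   # local effective gate bias
    gating_part = gate_slope * (v_eff - v_gs)         # change vs. the reference
    return raw_shift - gating_part


# Hypothetical example: 4 um channel, VGS = 20 V, VDS = 4 V.
x = np.array([-2.0, 0.0, 2.0])          # um: source, mid-channel, drain
raw = np.array([-1.0, -1.0, -1.0])      # cm^-1, placeholder raw shifts
thermal = remove_gating_shift(raw, x, 4.0, 20.0, 4.0, gate_slope=-0.02)
```

The remaining shift (`thermal`) is what would then be converted to temperature using the hot-stage calibration.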
S4. Corrections for stage drift
We obtain the spatially resolved temperature by Raman mapping of our devices with and without electrical bias, and comparing the Raman peak shifts to their temperature calibration done on a hot stage. Since the peak position of the MoS2 out-of-plane Raman mode (A1' in 1L and A1g in 2L) depends on the number of layers, the Raman signal in the presence of small 2L regions is non-uniform. The 2L A1g mode in our samples is higher by ~2 cm⁻¹ compared with the 1L A1' mode, in agreement with previous reports for the same laser wavelength. During data analysis, this non-uniform Raman signal across the device, induced by the presence of 2L regions, must be carefully examined. In addition, small shifts in the sample position (~100 nm) result in misalignment between the reference and the biased Raman maps and must be corrected.
We present our correction method in Figure S4 by comparing a reference map (a) acquired at room temperature and a "hot" map (b) acquired at stage temperature Tstage = 175 °C. In this case the device temperature should be uniform as no bias is applied. Figure S4c shows the raw temperature extraction obtained directly by subtracting map (b) from (a) and dividing by the calibration value (Raman peak shift to temperature) from Figure S5d. It is evident that the extracted raw ΔT map is non-uniform in temperature and includes artificially hot and cold spots. These artificial non-uniformities in temperature can be associated with the drift of the stage during the measurement. We note that the typical acquisition time of these Raman maps is of the order of ~10 minutes, and even drift of ~150 nm is sufficient to induce the observed changes.
We have therefore developed a correction procedure that includes dividing the map into areas and sorting the spectra of different pixels by their Raman peak position (or intensity). We then subtract the pixels of each area one by one in their order (as they were sorted), such that the pixel with the highest Raman signal of one area is aligned with the pixel of the highest Raman signal in the same area of the reference map. We assume the temperature does not shift one pixel significantly more than the other such that, for example, a 2L pixel having its Raman peak 1 cm⁻¹ higher than a 1L pixel at room temperature will not shift to a lower wavenumber than the 1L pixel at high temperature. The reason is that the difference in peak position between A1' in 1L and A1g in 2L (~2 cm⁻¹) is large compared with any possible temperature variations across the sample. The uniform temperature map (within the uncertainty of the Raman measurement) in Figure S4d confirms our correction procedure since the temperature across the device is expected to be uniform when heated on a hot stage (rather than heated by electrical bias). We note that this correction procedure is necessary only when the Raman signal across the measured area is non-uniform.
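The rank-matching idea behind this correction can be sketched in Python. This is an illustrative reading of the procedure for a single area, not the authors' analysis code: pixels are sorted by peak position in both maps and subtracted in rank order. The peak positions below are made up, and the calibration slope of −0.015 cm⁻¹/°C is an assumed value whose magnitude is borrowed from the 2L calibration quoted in Section S3.

```python
import numpy as np


def rank_matched_delta_t(ref_area, hot_area, calib_slope):
    """Temperature rise per pixel of one area, using rank-ordered subtraction.

    ref_area, hot_area: 2D arrays of Raman peak positions (cm^-1)
    calib_slope:        peak shift per degree (cm^-1/degC), negative for softening
    """
    ref = np.asarray(ref_area, dtype=float)
    hot = np.asarray(hot_area, dtype=float)
    if ref.shape != hot.shape:
        raise ValueError("reference and hot areas must have the same shape")
    order = np.argsort(hot, axis=None)             # ranks of the hot-map pixels
    shift_sorted = np.sort(hot, axis=None) - np.sort(ref, axis=None)
    delta_t = np.empty(hot.size)
    delta_t[order] = shift_sorted / calib_slope    # put results back on hot-map pixels
    return delta_t.reshape(hot.shape)


# Made-up 2x2 area with one 2L pixel (higher peak position).
ref = np.array([[405.00, 405.10],
                [407.00, 405.05]])
# Same area, uniformly shifted by -1.5 cm^-1, with the 2L pixel displaced by drift.
hot = np.array([[405.50, 403.60],
                [403.50, 403.55]])

slope = -0.015                                     # cm^-1/degC (assumed)
dt_corrected = rank_matched_delta_t(ref, hot, slope)
dt_naive = (hot - ref) / slope                     # pixel-by-pixel subtraction
```

In this toy case the pixel-by-pixel subtraction produces artificial hot and cold spots at the displaced 2L pixel, while the rank-matched version returns a uniform temperature rise.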
S5. Temperature-dependent Raman spectroscopy of monolayer MoS2
Raman temperature measurements were carried out by comparing the shifts in spectral peak position under applied bias (with respect to the unbiased case) to a calibration measurement on a hot stage, where the sample temperature was known. We note that the Stokes to anti-Stokes intensity ratio can also be used as a thermometer 4, 5 , however it relies on the measurement of intensity rather than spectral peak position; the latter being more accurate in our measurements. The Stokes to anti-Stokes ratio is also not suitable for measuring temperature when the incident laser energy lies close to an excitonic state energy and resonance effects dominate the measured intensity, as was the case in this study.
The calibration of Raman peak shift with temperature was carried out up to 250 °C in five different locations on films similar to the ones measured electrically: monolayer (1L) CVD and exfoliated MoS2 capped by AlOx. Data from a representative location is shown in Figure S5. In addition, we carried out the same procedure on the MoS2 transistors that were measured electrically, but only up to 125 °C in order not to degrade their performance. We found that the absolute peak position slightly varied between samples, however the peak shift with temperature (the slope in Figure S5) was similar across different locations and different samples, within the uncertainty of the measurement (error-bars in Figure S5). The absorbed laser power here and in the electrical measurement is kept below 20 µW, such that the temperature rise induced by the laser is always < 8 °C. This is confirmed by the observation that the Raman modes do not shift within the uncertainty of the measurement between 1.5 µW and 20 µW incident laser power. For the Si substrate, the absorption depth of the 532 nm laser in highly doped Si is ~0.65 µm. 6 Given the dimensions of the device (4 × 5 µm²) and the Si substrate thickness (500 µm) we can consider the measured temperature as that of the Si surface.
The temperature-dependent Raman spectra in our devices agree with previous reports of 1L MoS2 on SiO2 7, 8. The temperature dependence of the out-of-plane A1' mode was consistent in all measured devices, whether the MoS2 was grown by CVD or exfoliated, and capped with AlOx as well as uncapped. The in-plane E' mode showed some variations between different types of samples, possibly due to strain (e.g. grown by CVD vs. exfoliated). In addition we note that for MoS2 grown on quartz we measured the E' mode peak at higher frequency (~1.5 cm⁻¹ higher than E' of MoS2 on Si/SiO2) whereas the A1' mode maintained its peak position. Similarly, previous studies showed E' mode spectral response was different between MoS2 on Si3N4 and sapphire substrates, whereas A1' maintained its spectral response with both substrates 7. We have therefore used the shifts in the A1' Raman mode as the thermometer in our measurements. The uncertainty in temperature measurement of the MoS2 is 5-10 K (Figure 3a), whereas the uncertainty in the temperature measurement of the Si is about half, since the sensitivity of its Raman shift to temperature is almost double, as evident in Figure S5.
S6. Scanning thermal microscopy (SThM)
We confirmed the uniform distribution of temperature rise in our devices by scanning thermal microscopy (SThM) measurements, as follows. A commercial SThM module from Anasys® Instruments was added onto an Atomic Force Microscope (AFM) from Veeco® Instruments. SThM usually consists of a thermo-resistive probe that is connected to a Wheatstone bridge, a DC voltage source and an amplifier specifically designed to avoid small electrical spikes that could break the probe. Temperature sensing occurs when the sample (here the AlOx capping layer covering the MoS2 transistor) heats up, and the SThM tip changes its electrical resistance. Using this technique, a thermal map of the sample surface with nanoscale resolution can be obtained 10 . The thermal probe used in this work is DM-GLA-5 provided by Anasys®, made of a thin Pd layer on SiN.
The MoS2 device was placed on the AFM holder, and its electrical pads were wirebonded to small pieces of Au on SiO2/Si substrates with areas of ~0.5 x 0.5 cm 2 and total thickness of ~500 μm. Thin copper wires with radius ~50 μm were contacted to these substrates using silver epoxy. These wires were used to apply current through the MoS2 film using an electrical source. The MoS2 transistor is capped with 15 nm of AlOx, which prevents the SThM probe from electrical discharges that could break the probe, but results in measurement of the top AlOx surface rather than direct measurement of the MoS2 channel.
We heated the transistor electrically by applying voltage to the MoS2 channel, while using the SThM probe to obtain a thermal map of the device. The measured signal is proportional to temperature but is qualitative. The temperature scale-bar used in Figure S6 is estimated from our thermal model calibrated by Raman thermometry. We note, however that the temperature resolution of the SThM measurement (<5 K) is better than that of Raman (~10 K). The SThM detects temperature rise at low input power for which ΔT is lower than the uncertainty of the Raman measurement, confirming the higher temperature sensitivity of the SThM.
The high spatial resolution of SThM confirms that the small 2L regions of MoS2 do not act as hot spots. Another interesting feature observed in the SThM maps is some cooling at the MoS2 channel edges. The small asymmetry in the two gradients observed at both sides of the film edges can be considered as a probe artifact. The slightly larger gradient on one of the sides happens when the probe lifts from the SiO2 to the MoS2 film, which causes some instability in the thermal scan, while the gradient observed on the other edge of the film is better represented, since the probe goes from MoS2 film to the SiO2. The decrease in temperature at the edges of the MoS2 film is also found in finite element thermal simulations shown in Figure S10c . 
S7. Temperature estimates at 1L-2L junctions
The uniform heating observed in our CVD MoS2 channels which include some 2L features implies that heating at 1L-2L junctions is smaller than the uncertainty in the Raman and SThM temperature measurements. This finding allows us to estimate: 1) the maximum electrical resistance of the 2L-1L junctions (R2L-1L), and 2) the maximum conduction band (CB) offset between 1L and 2L MoS2. The former results in Joule heating when electrons cross the ΔECB barrier from 2L to 1L, the latter results in thermionic heating when hot electrons dissipate their energy after injection from 1L to 2L. (The CB of 1L MoS2 is nominally expected to be ~50 meV higher than for 2L 11 .)
In Figure S7 , we carried out thermal simulations of our device with channel length L = 4 µm with uniform power density, and placed an additional power generation source at the center of the channel to simulate (possible) additional heating at a 1L-2L junction. We set the length of the additional heat source to be of the order of the electron mean free path (λMFP ≈ 2 nm, see Figure  S10 in Supplement of Ref. 12). We varied the power density of the junction heat source to find the conditions that would have resulted in measurable heating. Figure S7a shows the temperature rise along the channel for different power densities at the junction (P is the uniform power density in the channel). Figure S7b shows the temperature rise in the channel when the heat source at the junction is set to 20P along with the temperature that would be measured by Raman (Gaussian average across the laser spot size) and SThM (due to heat spread in the capping layer and thermal exchange radius of the tip). 13 We find that in order to detect over-heating at the junction by Raman thermometry and SThM the power density at the junction must be: 1) higher than Pmin = 20 µW/µm, and 2) higher than 20P, where P is the (uniform) power density in the rest of the channel. The former condition is derived from a similar plot to Figure S7a but with the background temperature rise of ΔT ≈ 0 (not shown here). Since over-heating at the junction is not observed experimentally by Raman and SThM, we can estimate the power dissipated at the junction is smaller than the conditions outlined above. We note that 1L-2L and 2L-1L junctions could lead to either thermionic heating or cooling (depending on current flow direction), and neither effect is detectable here.
In Figure S7c we derive the minimum CB offset that would result in measurable heating at the junction based on these two conditions. The minimum power density curve is shown in blue, the 20P curve is shown in black and the red curve satisfies both conditions. We use an Ohmic current-voltage relation and assume uniform electric field (E) distribution in the channel, such that P = V²/R = E²L²/R, where the channel lengths (L = 2.5 to 6.8 µm) and sheet resistance (R ~ 13 kΩ/□) are obtained from our measured devices. We assume the power dissipated by hot electron injection at the junction is P1L-2L = ΔECB·I, which determines the condition P1L-2L = 20P as ΔECB = 20EλMFP (black dashed line in Figure S7c).
The minimum CB offset required to induce measurable heating is found to be ~120 meV. Since no over-heating was detected we conclude that the CB offset between 1L and 2L in our devices is ΔECB < 120 meV. This finding agrees with recent experimental reports on the surface potential difference between 1L and 2L of the order of ~50 meV 14. Similarly, based on the same power dissipation requirements (R2L-1L < 20RλMFP), one can estimate that the maximum electrical resistance of the junction between 2L and 1L MoS2 is R2L-1L < 500 Ω·µm.

Figure S7 | [...] and the temperature that would be measured by Raman (green circles, Gaussian average across the laser spot size) and SThM (black circles, representing the heat spread in the capping layer and the thermal exchange radius of the SThM tip). 13 (c) Maximum conduction band offset (ΔECB) between 1L and 2L MoS2 vs. E-field illustrating the regime for which measurable heating would be generated at a line defect such as a 1L-2L junction. Black dashed line satisfies P1L-2L = 20P, blue dashed line satisfies P1L-2L = Pmin (= 20 µW/µm) and the red curve satisfies both. Heating would be measurable in the area shaded gray. Since no heating was measured at the junction, we estimate ΔECB < 120 meV at 1L-2L junctions.
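The two detectability conditions can be explored numerically with a short Python sketch. It assumes the width-normalized current is I = E/R for an Ohmic channel and uses the rounded inputs quoted in the text (R ≈ 13 kΩ/□, λMFP ≈ 2 nm, Pmin = 20 µW/µm, factor of 20). The function names are illustrative, and the exact curves of Figure S7c are not reproduced here.

```python
import math

LAMBDA_MFP_UM = 2e-3   # um, electron mean free path (~2 nm, from the text)
R_SHEET = 13e3         # Ohm/sq, sheet resistance (approximate, from the text)
P_MIN = 20e-6          # W/um, minimum detectable junction power per unit width
RATIO = 20.0           # junction power must exceed RATIO * P


def offset_from_ratio(e_field):
    """CB offset (eV) satisfying P_junction = 20 P, i.e. dE_CB = 20 * E * lambda."""
    return RATIO * e_field * LAMBDA_MFP_UM        # e_field in V/um


def offset_from_pmin(e_field):
    """CB offset (eV) satisfying P_junction = P_min with I = E / R per unit width."""
    current_per_width = e_field / R_SHEET         # A/um
    return P_MIN / current_per_width


def measurable_offset(e_field):
    """Smallest offset meeting both conditions at a given field."""
    return max(offset_from_ratio(e_field), offset_from_pmin(e_field))


# Field at which the two curves cross, and the offset at that crossing.
e_cross = math.sqrt(P_MIN * R_SHEET / (RATIO * LAMBDA_MFP_UM))   # V/um
min_offset = measurable_offset(e_cross)                          # eV

# Same argument applied to a resistive junction: R_junction < 20 * R * lambda.
r_junction_max = RATIO * R_SHEET * LAMBDA_MFP_UM                 # Ohm*um
```

With these rounded inputs the crossing falls near 0.1 eV, the same order as the ~120 meV bound quoted above, and the junction resistance bound evaluates to about 520 Ω·µm, consistent with the quoted ~500 Ω·µm.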
S8. Thermal analysis and modeling
We used the analytical model reported in Ref. 15 to extract the TBC from the measurements shown in Figure 3 of the main text. In the model we have used the Si thermal conductivity extracted from the slope of Si temperature vs. power density (~95 W m⁻¹ K⁻¹, which agrees well with known values for highly doped Si 16). We also used known values for the thermal conductivity of thermally-grown SiO2 (1.4 W m⁻¹ K⁻¹) and for the Si-SiO2 TBC (> 125 MW m⁻² K⁻¹). 17, 18 We note that the role of the Si-SiO2 TBC here is negligible (see Figure S10d), accounting for < 5% of the total thermal resistance, but it could play a greater role in devices on thinner oxides (e.g. < ~25 nm). This is evident in Fig. 3 of Ref. 19 where a measurable effect of the TBC is only observed for SiO2 thinner than 25 nm.
We approximate the expression for the spreading thermal resistance to the Si substrate in Figure 3c of the main text with the shape factor of a circular disk heater on a semi-infinite substrate. To test the validity of this expression we carried out finite element thermal simulations of the structure used in this study (rectangular heater W × L = 5 × 4 µm²). We found that the circular disk expression is within less than 5% error of the numerically accurate thermal resistance for the average temperature of a rectangular heat source. Figure S8 shows the temperature distribution of the thermal spreading to the Si substrate in the finite element simulation, illustrating the circular profile of the temperature.

Figure S8 | [...] and thermal spreading to the Si substrate. The temperature profile is circular and the approximation of a circular disk shape heater on a semi-infinite substrate is good within less than 5% error.
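For orientation, the disk approximation can be evaluated with the two standard textbook expressions for a disk of radius a on a semi-infinite solid: R = 1/(4ka) for an isothermal disk, and R = 8/(3π²ka) for the average temperature of a uniform-flux disk. Which form the authors used is not stated in this section, so the Python sketch below simply evaluates both for an equal-area disk, using the device dimensions and Si conductivity quoted in the text.

```python
import math


def equal_area_radius(width, length):
    """Radius of the disk with the same area as a width x length rectangle."""
    return math.sqrt(width * length / math.pi)


def rth_isothermal_disk(k, width, length):
    """Area-normalized spreading resistance (m^2 K/W), isothermal disk."""
    a = equal_area_radius(width, length)
    return (width * length) / (4.0 * k * a)


def rth_uniform_flux_disk(k, width, length):
    """Area-normalized spreading resistance (m^2 K/W), uniform flux, average T."""
    a = equal_area_radius(width, length)
    return (width * length) * 8.0 / (3.0 * math.pi**2 * k * a)


K_SI = 95.0              # W/m/K, value extracted in the text
W, L = 5e-6, 4e-6        # m, rectangular heater from the text

r_iso = rth_isothermal_disk(K_SI, W, L)
r_flux = rth_uniform_flux_disk(K_SI, W, L)
print(f"isothermal: {r_iso:.3e} m^2K/W, uniform flux: {r_flux:.3e} m^2K/W")
```

The two expressions differ by a fixed factor of 32/(3π²) ≈ 1.08, so the choice between them matters at the level of a few percent, comparable to the < 5% error quoted for the disk approximation itself.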
The MoS2-SiO2 TBC is extracted by subtracting the Si and SiO2 thermal resistance contributions from the total thermal resistance Rth = ΔT/P, as in Figure 3 of the main text. Here P is the power density in the MoS2 channel after the contact power dissipation (2I²RC) was subtracted, as stated in Supporting Information Section 1. We note that the doping induced by the AlOx capping layer prevents pinch-off and results in uniform heating in the channel, except at the onset of breakdown when heating becomes more significant at the drain (Figure 4 of main text). The measured uniform temperature rise justifies the use of the analytic model, whereas a non-uniform power dissipation model should be invoked at the onset of breakdown. We also compare our experimental results with finite element electrothermal simulations (COMSOL Multiphysics®) to confirm the analytic model. The simulation results are summarized in Figs. S10-S12. The voltage drop and consequent heat generation at the contacts are included in the electrothermal simulations, yet most of the power (>90%) is dissipated in the channel as indicated in Supporting Information Section 1.
The lateral temperature distribution shows some cooling to the contacts (along the channel) and sideways (across channel width) within a characteristic thermal healing length 20 LH ~ 100 nm. The AlOx capping adds a parallel path for lateral heat flow to the contacts, hence increasing LH compared to the uncapped devices (Figure S11). The temperature decay sideways (across channel width) within a length scale LH is qualitatively captured by the SThM (Figure S6), but since LH < laser spot size, the effect is not captured by the Raman measurements.

Figure S11 | The thermal role of a capping layer. Simulated steady-state temperature rise across (a) channel width and (b) length with (blue) and without (red) a capping layer of 15 nm AlOx. Axes (y for channel width and x for channel length) are as defined in Figure S10a. The temperature distribution with a capping layer is very similar to the one without it, except the thermal healing length is slightly longer since some of the heat is carried laterally by the capping layer.

Finally, we note that for devices used in this study, where W and L ≫ LH and no top-gate is present, the simplified lumped model presented in Figure 3c of the main text can readily be used.
We also illustrate via thermal simulations how the peak device temperature at nanoscale hot spots near the drain could be higher than the one measured by Raman. The temperature measured by Raman follows a Gaussian-weighted average with laser beam size r0. Figure S12 shows the simulated temperature profile along the MoS2 channel and the temperature that would be measured by Raman with a beam size r0 ≈ 300 nm (measured experimentally in our devices by the knife edge method 21). The simulated temperature was chosen to represent the onset of thermal breakdown, having a peaked profile with a ~20 nm hot spot at the drain exceeding the oxidation temperature of MoS2 (T ≈ 400 °C > TBD ≈ 380 °C). The peak temperature "seen" by Raman thermometry is ~300 °C. The difference between the local temperature (on nm-scale) and the one measured by Raman can account for the difference between the maximum temperature measured in Figure 4a
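The effect of the finite laser spot can be illustrated with a small Python sketch. The weighting function exp(−(x − x0)²/r0²) is an assumed form of the Gaussian average (the exact expression is not given in this section), and the temperature profile is a made-up step rather than the simulated profile of Figure S12.

```python
import numpy as np


def raman_weighted_temperature(x, temp, x0, r0):
    """Gaussian-weighted average of temp(x) for a laser spot centered at x0.

    Assumes a uniform grid x and a weight exp(-(x - x0)^2 / r0^2).
    """
    weight = np.exp(-((x - x0) ** 2) / r0**2)
    return np.sum(weight * temp) / np.sum(weight)


# Made-up profile along a 4 um channel: flat background plus a 20 nm hot spot
# at the drain edge (x = +2 um).
x = np.linspace(-2.0, 2.0, 4001)              # um, 1 nm grid
background, peak = 250.0, 400.0               # degC, hypothetical values
temp = np.full_like(x, background)
temp[x >= 1.98] = peak

t_raman = raman_weighted_temperature(x, temp, x0=1.99, r0=0.3)
print(f"local peak: {peak:.0f} degC, Raman-averaged: {t_raman:.0f} degC")
```

Because the hot spot is much narrower than r0, the averaged value stays close to the background even when the laser is centered on the hot spot, which is the qualitative point made in the text.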
S9. Molecular dynamics (MD) simulations
To replicate the experimental setup within MD simulations, we use a simulation box containing a single layer of MoS2 and SiO2 as the substrate, as shown in Figure S13a. The substrate is a block of amorphous SiO2 with dimensions 5.7 × 5.7 × 5.7 nm created by the Visual Molecular Dynamics (VMD) package 22 . Periodic boundary conditions (PBC) were applied in all three directions. The x-y PBCs are chosen to create a continuous MoS2 sheet. A vacuum region of 20 nm above the MoS2 sheet is created to avoid interaction between the adjacent unit cells in the z-direction (perpendicular to the MoS2 sheet). Initially, the distance between MoS2 and SiO2 is set to 3 Å. 24 The interaction between MoS2 and SiO2 is modeled using Lorentz-Berthelot mixing rules. The Lennard-Jones (LJ) parameters for MoS2 and SiO2 are taken from the universal force field 25 and are shown in Table S1.
Table S1 | LJ parameters (σ and ϵ) used for the interaction between 1L MoS2 and SiO2.
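For reference, the Lorentz-Berthelot rules combine like-atom LJ parameters as σij = (σi + σj)/2 and ϵij = √(ϵi ϵj). The short Python sketch below applies them to one pair of species. The numerical values are placeholders chosen for illustration and are not the entries of Table S1.

```python
import math

def lorentz_berthelot(sigma_i, eps_i, sigma_j, eps_j):
    """Cross-species LJ parameters: arithmetic mean of sigma, geometric mean of epsilon."""
    return 0.5 * (sigma_i + sigma_j), math.sqrt(eps_i * eps_j)

def lj_energy(r, sigma, eps):
    """12-6 Lennard-Jones pair energy at separation r (same length unit as sigma)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

# Placeholder like-atom parameters (Angstrom, eV); NOT the Table S1 values.
sigma_a, eps_a = 3.0, 0.010
sigma_b, eps_b = 4.0, 0.040

sigma_ab, eps_ab = lorentz_berthelot(sigma_a, eps_a, sigma_b, eps_b)
r_min = 2.0 ** (1.0 / 6.0) * sigma_ab  # separation at the LJ energy minimum
print(sigma_ab, eps_ab, lj_energy(r_min, sigma_ab, eps_ab))
```

The pair energy evaluated at r_min equals −ϵij, which gives a quick check that the mixed parameters were combined as intended.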
In order to stabilize the SiO2 block, we first performed a separate equilibration simulation of SiO2 alone. This equilibration is performed in an NPT ensemble at a temperature of 300 K and a constant pressure of 1 bar. The total simulation time for NPT was 200 ps with a time step of 0.01 fs. The small time step ensures the relaxation of the SiO2 atoms. In all simulations, we used Nosé-Hoover [...] We performed the energy minimization of the system using the steepest descent algorithm. The tolerances for energy and force are set to 10⁻⁶ and 10⁻⁶ eV/Å, respectively. We then perform a final equilibration step in an NPT ensemble (at T = 300 K and P = 1 bar) for 200 ps with a time step of 0.01 fs.
To compute the TBC between MoS2 and SiO2, we set the temperatures of MoS2 and SiO2 to 480 K and 300 K, respectively. This is achieved by using separate thermostats for MoS2 and SiO2. At the set temperatures, the system is allowed to equilibrate for 1 ns in an NVT ensemble with a time step of 0.1 fs. After the temperatures of MoS2 and SiO2 reach equilibrium, we switch to an NVE ensemble, where the energy of the whole system is conserved. We simulate in the NVE ensemble for 300 ps with a time step of 0.05 fs. As a result, the temperature of MoS2 decreases while the temperature of SiO2 increases slightly.
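The asymmetry between the two temperature changes follows from energy conservation in the NVE stage: the MoS2 layer contains far fewer atoms, and hence has a far smaller heat capacity, than the SiO2 block. A minimal Python sketch of this balance is given below. The atom counts are rough, hypothetical estimates for a box of this size, and the classical heat capacity of 3kB per atom is an assumption, so the resulting numbers are indicative only.

```python
K_B = 1.380649e-23  # Boltzmann constant, J/K

def common_temperature(t_a, c_a, t_b, c_b):
    """Final temperature of two bodies that exchange heat at constant total energy."""
    return (c_a * t_a + c_b * t_b) / (c_a + c_b)

# Hypothetical atom counts (not taken from the simulations described here).
n_mos2, n_sio2 = 1100, 12000
c_mos2 = 3.0 * K_B * n_mos2  # classical (Dulong-Petit) heat capacity, J/K
c_sio2 = 3.0 * K_B * n_sio2

t_final = common_temperature(480.0, c_mos2, 300.0, c_sio2)
print(f"MoS2 cools by {480.0 - t_final:.1f} K, SiO2 warms by {t_final - 300.0:.1f} K")
```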
We calculate the difference in temperature between the MoS2 layer and the SiO2 block (ΔT = TMoS2 − TSiO2) and fit it to an exponential decay as 26
$$\Delta T(\tau) = \Delta T_0 \exp\left[-\left(\frac{1}{m_{\mathrm{MoS_2}} C_{\mathrm{MoS_2}}} + \frac{1}{m_{\mathrm{SiO_2}} C_{\mathrm{SiO_2}}}\right) A G \tau\right]$$
where ΔT0 is the initial temperature difference between MoS2 and SiO2 (here set to 180 K), mSiO2 and mMoS2 are the masses of the SiO2 block and the MoS2 layer, respectively, and CSiO2 and CMoS2 are the specific heats per unit mass of SiO2 and MoS2, respectively. A is the total contact area between MoS2 and SiO2, τ is the simulation time, and G is the TBC.
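A minimal Python sketch of such a fit is shown below. It assumes the lumped two-body decay ΔT(τ) = ΔT0 exp(−τ/τd) with 1/τd = A G [1/(mMoS2 CMoS2) + 1/(mSiO2 CSiO2)], and extracts G from a log-linear least-squares fit. The heat capacities, the test value of G, and the synthetic data are hypothetical inputs used only to check the procedure; they are not simulation outputs.

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant, J/K

def fit_tbc(time_s, delta_t, heat_cap_a, heat_cap_b, area_m2):
    """Extract G (W m^-2 K^-1) and dT0 (K) from an exponentially decaying dT(t).

    heat_cap_a and heat_cap_b are total heat capacities m*C in J/K.
    """
    slope, intercept = np.polyfit(time_s, np.log(delta_t), 1)
    inv_c = 1.0 / heat_cap_a + 1.0 / heat_cap_b
    return -slope / (area_m2 * inv_c), np.exp(intercept)

# Hypothetical inputs for a self-consistency check.
area = (5.7e-9) ** 2        # x-y cross-section of the box, m^2
c_mos2 = 3.0 * K_B * 1100   # assumed atom count, classical 3 k_B per atom
c_sio2 = 3.0 * K_B * 12000
g_true = 15e6               # W m^-2 K^-1, test value only

t = np.linspace(0.0, 300e-12, 601)  # 300 ps window, as in the NVE stage
rate = area * g_true * (1.0 / c_mos2 + 1.0 / c_sio2)
dT = 180.0 * np.exp(-rate * t)      # noiseless synthetic decay

g_fit, dT0_fit = fit_tbc(t, dT, c_mos2, c_sio2, area)
print(f"G = {g_fit / 1e6:.2f} MW m^-2 K^-1, dT0 = {dT0_fit:.1f} K")
```

For noisy MD temperature traces, the logarithm amplifies fluctuations once ΔT approaches the noise floor, so restricting the fit window or using a nonlinear least-squares fit may be more robust than this log-linear version.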
In order to get sufficient statistics, we performed 9 simulations with different starting velocities, and the error bar is obtained from these samples. Finally, we obtain a TBC of G = 15.46 ± 1.49 MW m⁻² K⁻¹. This value is consistent with the experimental value of 14 ± 4 MW m⁻² K⁻¹, as discussed in the main manuscript and in Section 10 below.
We note that the size of the MoS2 sheet is chosen large enough to obtain sufficient statistics and to avoid the non-idealities that might be introduced by an extremely small unit cell. 27 We also examined the dependence of the TBC on the thickness of SiO2 and observed that the TBC does not change for SiO2 thicknesses greater than 2 nm.
S10. MoS2 oxidation
We measured the MoS2 oxidation temperature in air ambient with and without the AlOx capping layer in order to compare it to the thermal breakdown (BD) temperature of our devices in air, and to test the role of AlOx encapsulating the channel. We increased the stage temperature between 
