As one of the most important members of the two dimensional chalcogenide family, molybdenum disulphide (MoS2) has played a fundamental role in the advancement of low dimensional electronic, optoelectronic and piezoelectric designs. Here, we demonstrate a new approach to solid state synaptic transistors using two dimensional MoS2 floating gate memories. By using an extended floating gate architecture which allows the device to be operated at near-ideal subthreshold swing of 77 mV/decade over four decades of drain current, we have realised a charge tunneling based synaptic memory with performance comparable to the state of the art in neuromorphic designs. The device successfully demonstrates various features of a biological synapse, including pulsed potentiation and relaxation of channel conductance, as well as spike time dependent plasticity (STDP). Our device returns excellent energy efficiency figures and provides a robust platform based on ultrathin two dimensional nanosheets for future neuromorphic applications.
Understanding the complexities in the functioning of the human brain has been one of the foremost challenges in the field of neuroscience. Among the several proposed models, only a few can explain the operation of the human brain, and even those cover only a very limited set of functionalities [1-3]. From an electronic point of view, the computational architecture of the brain is vastly different from that of a traditional von Neumann architecture based system [4, 5]. This has led to the emergence of neuromorphic computation schemes [6-10]. Conventional computation follows an architecture where processing and storage of data are handled by separate entities, whereas in neuromorphic computation both are handled by a single element which acts as the electrical analogue of a synapse. Mimicking the functionality and density of synapses in the brain would lead to a massive reduction in energy consumption and immensely enhance computational capabilities like parallel processing. Given the high density of synapses required, traditional silicon based devices, which are plagued by power dissipation and short channel effects, are rendered unsuitable for scalable neuromorphic applications [11, 12]. This makes ultrathin two dimensional materials a perfect candidate for the active element of a synaptic transistor, given their immunity to short channel effects and excellent gate coupling at nanometer length scales [12, 13].
‡ e-mail: tathagata@iisc.ac.in, arindam@iisc.ac.in

Biologically, a synapse functions by changing its conductivity based on the sequence of synaptic pulses it receives. This is accomplished by varying the concentration of neurotransmitters or chemical stimulants which control the conductivity of the junction between two neurons [14]. An ideal synaptic transistor must possess the twin qualities of being a non-volatile memory while incorporating a learning based mechanism to deduce its conductance from the history of applied inputs [15-30]. A considerable amount of literature currently exists on transition metal oxide based synaptic devices in both two terminal memristor and three terminal transistor geometry [17, 20]. However, oxides in general have a large band gap and require ionic liquid gating, which diminishes the long term usability of these devices because of the short lifetime of most liquid gates. Furthermore, most of these devices utilise some form of electrochemical reaction to alter the concentration of an ionic species, and hence the channel conductance, making them very sensitive to environmental conditions like humidity and temperature [20]. The requirement of a liquid gate can be avoided by substituting the transition metal oxide with a chalcogenide like molybdenum disulphide (MoS2) because of its comparatively lower band gap and better coupling to metallic gates [12]. MoS2 has already been used as an active element in high quality non-volatile memory cells with high ON/OFF ratio [31-34] and appears to be a prime candidate for a completely solid state synaptic transistor.
It is a scalable semiconducting platform with a layer dependent bandgap in the visible range [35-37], exhibits a respectable carrier mobility (1-30 cm²/Vs) and displays unique transport properties like variable range hopping, percolative switching and valleytronic effects [38-45].
However, the current architecture of floating gate (FG) memory with MoS2 is not conducive to realistic neuromorphic applications, as it needs large gate voltage pulses (∼ 30 V) in three terminal geometry [31], while a large energy dissipation per pulse is observed when the device is operated in two terminal mode [46]. In this paper, we address this difficulty by adopting an extended FG device architecture for the MoS2 FET. Owing to its two dimensional nature, MoS2 can be readily inserted in a planar FG architecture, where one or more metallic layers (the FGs) act as temporary storage of charge induced by a global back or top gate [31, 46, 47]. FG memory devices have been deployed in MOS architecture for a considerable period of time, where the tunneling of charge between the channel and the FG enables storage of information [48, 49]. With improvements in fabrication techniques for two dimensional systems, it is possible to create a two dimensional analogue of a FG memory by stacking different van der Waals layered materials on top of each other in an atomic lego or heterostructure [50]. We incorporate this idea in our work and demonstrate the performance of a FG memory device with MoS2 as the active element. We have implemented an extended graphene FG in our devices, which improves the gating efficiency and consequently leads to an almost ideal subthreshold swing while reducing the drain bias and switching pulse required for stable memory action. These benefits extend to neuromorphic applications: the pulse heights required for long term potentiation and depression of the channel are reduced, which lowers the stress on the gate dielectric and improves the integrability of the device with current neuromorphic systems. The FG and the channel are separated by a hexagonal boron nitride (hBN) tunnel barrier which controls the charge transfer between them, enabling us to tune the channel conductance.
Distinct from previous reports of MoS2 based synaptic memtransistors, which utilized bias induced motion of defect states in chemical vapour deposition (CVD) grown thin films to demonstrate the effect [18], here we explore the possibility of controlled charge tunneling mediated multiple conductance states and synaptic activity in defect-free exfoliated MoS2 layers. Using an extended FG architecture, we demonstrate hysteretic switching at near ideal subthreshold swing (77 mV/dec) in a trilayer stack of MoS2, hBN and graphene. We establish quantitatively that the hysteresis is caused by charge tunneling through hBN, and exploit the same to emulate spike time dependent plasticity at energy dissipation below 0.3 pJ.
The experiments were performed on a heterostructure of mechanically exfoliated flakes of MoS2, hBN and single/few layer graphene placed on a conventional p++-Si/SiO2 (285 nm) substrate (Figure 1(a)) (details of the devices used are provided in Supplementary Table S1). Individual layers were first exfoliated separately, searched under an optical microscope for suitable flakes using optical contrast, and characterized by Raman spectroscopy for MoS2 and graphene (see Supplementary Figure S1). The thickness of the hBN flake (≈ 5-7 nm) was obtained via AFM measurements (see Supplementary Figure S2). We fabricated the heterostructure (Figure 1(b)) using a dry transfer method in an optical microscope with precision rotation and translation stages which assisted in the alignment of the individual layers [51]. Electrical contacts were defined using electron beam lithography followed by metallization via thermal evaporation of Cr (5 nm)/Au (50 nm). The extended FG was fabricated by lithographically connecting the graphene layer to a large area floating gold pad as shown in Figure 1(c). The use of hBN as the intermediate layer was prompted by its excellent dielectric properties in the single crystalline form and its large band gap (∼ 6 eV), which allows controlled charge tunneling while reducing unintentional leakage of charge and providing a defect free substrate for the MoS2 channel [52-56]. Extension of the floating gate increases the total area of the SiO2 capacitor (≈ 45000 µm², the area of the FG), which results in $C_1 \gg C_2$ in Figure 1(c), where $C_1$ and $C_2$ are the SiO2 (≈ 5.72 pF) and hBN (≈ 5.9 fF) capacitances respectively. Since the two capacitors are in series between the Si++ gate and the channel, this brings the effective Si++-channel capacitance up to that across the hBN layer only. [...] (Figure 1(d),(e)). This is important since control over the threshold voltage is an essential component in designing a power efficient FET [57-59].
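The capacitive argument above can be checked with a few lines of arithmetic. The sketch below assumes the SiO2 and hBN capacitors are simply in series between the Si++ gate and the channel (a simplification of Figure 1(c)) and uses only the two capacitance values quoted in the text; the function and variable names are illustrative.

```python
def series_capacitance(c1: float, c2: float) -> float:
    """Equivalent capacitance (F) of two capacitors connected in series."""
    return c1 * c2 / (c1 + c2)


C_SIO2 = 5.72e-12  # F, SiO2 capacitor under the extended FG (value quoted in the text)
C_HBN = 5.9e-15    # F, hBN capacitor between the FG and the channel (value quoted in the text)

# Effective gate-to-channel capacitance of the series combination
c_eff = series_capacitance(C_SIO2, C_HBN)

# Fraction of a gate voltage step that drops across the hBN layer
# (series capacitors carry equal charge, so the smaller one takes most of the voltage)
hbn_voltage_fraction = C_SIO2 / (C_SIO2 + C_HBN)

print(f"effective capacitance: {c_eff * 1e15:.3f} fF")
print(f"fraction of gate bias across hBN: {hbn_voltage_fraction:.4f}")
```

Under this simple series model the effective capacitance is within a fraction of a percent of the hBN capacitance, and nearly all of the applied gate bias drops across the hBN layer.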
Figure 1 (f) compares the subthreshold slope for five devices with varying configurations of the FG. Devices with an extended FG (D1, D2 and D3) demonstrate an almost ideal subthreshold slope of ≈ 80 mV/decade which increases to ≈ 300 mV/decade on removing the extension of the FG (D9) while devices with no FG (D10) operate at an even larger subthreshold slope of ≈ 1000 mV/decade. Capacitance engineering via extension of the FG leads to faster ON/OFF transitions with improved energy efficiency, both of which are of considerable importance in neuromorphic applications (Supplementary section II).
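To put these slopes in context, the following sketch computes the room temperature thermal limit of the subthreshold swing and the body factor implied by each quoted slope. It assumes the standard relation S = ln(10)·(kT/q)·(body factor), with body factor = 1 + C_s/C_eqv, and a temperature of 300 K, which is an assumption rather than a value given in the text; the function names and device labels are illustrative.

```python
import math

K_B = 1.380649e-23     # J/K, Boltzmann constant
Q_E = 1.602176634e-19  # C, elementary charge


def thermal_limit_mv_per_decade(temperature_k: float = 300.0) -> float:
    """Ideal subthreshold swing ln(10) * kT/q in mV/decade."""
    return math.log(10) * K_B * temperature_k / Q_E * 1e3


def body_factor(swing_mv_per_decade: float, temperature_k: float = 300.0) -> float:
    """Body factor (1 + C_s/C_eqv) implied by a measured subthreshold swing."""
    return swing_mv_per_decade / thermal_limit_mv_per_decade(temperature_k)


# Approximate slopes quoted in the text (mV/decade)
quoted_slopes = {
    "extended FG (D1-D3)": 80.0,
    "FG without extension (D9)": 300.0,
    "no FG (D10)": 1000.0,
}

for label, swing in quoted_slopes.items():
    bf = body_factor(swing)
    print(f"{label}: body factor = {bf:.2f}, C_s/C_eqv = {bf - 1:.2f}")
```

A body factor close to 1 means the equivalent gate capacitance dominates over the channel capacitance, which is the regime the extended FG is designed to reach.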
To explain the hysteresis in these devices, we postulate a charge trapping mechanism as shown in Figure 1(g). [...]

Figure 2: (a) [...] We use a pulse height of −4 V and +3 V for potentiation and depression respectively. Pulse width in both cases is 100 ms. The initial current values for potentiation (second panel from top) and depression (bottom panel) are different since the depression measurements were performed after a set of potentiation pulses had been applied, which led to an increase in the channel conductance. Change in channel conductance for multiple potentiation and depression pulses for a device with (b) and without (c) an extended FG. In the figures, pulses 1 to 12 and 25 to 36 are potentiation pulses while pulses 13 to 24 and 37 to 48 are depression pulses. Different potentiation and depression curves are obtained by varying the drain bias, which is given in volts beside the respective plots. Pulses used are similar to those in panel (a). (d) Repeatability of synaptic plasticity demonstrated for 20 cycles of potentiation (−3 V) and depression (+3 V) pulses. Comparison of the potentiation effect for different pulse heights at constant pulse width (e) and for different pulse widths at constant pulse height (f).
[...] screening leads to the flatband condition at an effective negative bias when we start the forward run, resulting in the anti-hysteretic transfer characteristics. The sweep rate independence (Supplementary Figure S4) and range tunability of the anti-hysteresis (Figure 1(d) and (e)) suggest (nearly) relaxation-free charge transfer between the channel and the FG, which is facilitated by the crystallinity of the hBN layer and the atomically pristine van der Waals interfaces.
The plasticity of vertical charge transfer in the MoS2 floating gate device allows non-volatile conductance change under pulsed gate operation. This behaviour is analogous to biological synapses, where the application of an excitatory or inhibitory pre-synaptic pulse has the effect of increasing or reducing the conductance of the synapse respectively. In this case, the gate acts as the pre-synaptic terminal and controls the conductance of the MoS2 channel/synapse using a sequence of pulses. The increase and decrease in conductance are known as potentiation and depression of the synapse respectively. This is performed by applying short (0.1 s) voltage pulses at the gate terminal while simultaneously tracking the change in drain current. The channel conductance continuously increases for every excitatory pulse (−4 V pulse in top panel of Figure 2 [...]. Starting from the rest condition, a set of twelve excitatory pulses (−4 V pulse height and 0.1 s pulse width) followed by twelve inhibitory ones (+3 V pulse height and 0.1 s pulse width) was applied at the gate terminal twice, and the change in drain current was recorded after each pulse. The device with an extended FG (D4) shows a considerable change (≲ 80%) in channel conductance (Figure 2(b)), while a negligible change (≲ 2%) is observed in the device without an extended FG (D9) (Figure 2(c)). The current values plotted in Figure 2(b) and (c) show the average current over a period of one second after the pre-synaptic pulse has been removed and the channel conductance has settled down to its final value (see Supplementary Section IX). The long term plasticity is robust and persists over repeated potentiation and depression cycles, the number of which was limited to 20 in the current experiment (Figure 2(d)).
We observe potentiation and depression curves similar to previously reported synaptic devices [15-20, 22], although the shape of the excitatory post-synaptic current ($I_{sd}$ vs time plots in Figure 2(a)) in our case is different from that observed in previous reports [15-17, 20-22]. As a result of the unique transport mechanism of these devices, we observe low conductance values during the time period of an excitatory (potentiation) pulse, while higher values of conductance are seen during an inhibitory (depression) pulse (Figure 2(a)). Additionally, the inhibitory nature of positive gate voltage pulses leads to negative values for the short term plasticity based paired-pulse facilitation (PPF) index (see Supplementary Section X for details). We find that pulses of similar duration but larger magnitude produce a larger change in conductance. This is illustrated in Figure 2(e) for multiple potentiation cycles. A similar effect is observed on increasing the duration of the pulse while keeping the magnitude the same (Figure 2(f)).
For a quantitative analysis of the change in $I_{sd}$ during both potentiation and depression pulses, we consider the bi-directional tunneling of charge across the hBN layer. As shown in Figure 1(g), the channel conductance varies due to the tunneling of charges in or out of the channel through a hBN tunnel barrier. In Figure 3(a), we plot the absolute value of charge transferred per excitatory (−4 V) or inhibitory (+3 V) pulse for the device D4. This is computed by finding the effective gate bias necessary to induce the change in drain current ($\Delta I_{sd}$) observed for a single potentiation/depression of the channel. The magnitude of charge exchanged during a potentiation or depression event can be estimated from $\Delta Q = \Delta V_g \times C_{self}$, where $\Delta V_g = \Delta I_{sd}/g_m$ is the effective change in gate voltage for a single pre-synaptic pulse and $g_m$ is the transconductance. Here, [...] Figure 3(a). We find $\Delta Q$ to be reasonably constant, being ≈ 2×10⁻¹⁶ coulomb per pulse.
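The charge estimate described above can be written out as a short calculation. In the sketch below, the drain current step, transconductance and capacitance are hypothetical placeholder values chosen only so that the result lands at the ≈ 2×10⁻¹⁶ C quoted in the text; they are not measured device parameters, and the function names are illustrative.

```python
Q_E = 1.602176634e-19  # C, elementary charge


def effective_gate_shift(delta_i_sd: float, g_m: float) -> float:
    """Effective gate voltage change (V) from a drain current step and transconductance."""
    return delta_i_sd / g_m


def charge_per_pulse(delta_i_sd: float, g_m: float, c_self: float) -> float:
    """Charge exchanged per pulse: delta_Q = delta_V_g * C_self (coulomb)."""
    return effective_gate_shift(delta_i_sd, g_m) * c_self


def electrons_transferred(delta_q: float) -> float:
    """Number of elementary charges corresponding to delta_Q."""
    return abs(delta_q) / Q_E


# Hypothetical inputs (assumptions for illustration only)
delta_q = charge_per_pulse(delta_i_sd=1e-9, g_m=1e-8, c_self=2e-15)
n_electrons = electrons_transferred(2e-16)  # charge per pulse quoted in the text

print(f"delta_Q = {delta_q:.2e} C")
print(f"~{n_electrons:.0f} elementary charges per pulse")
```

The last line gives a sense of scale: a charge of 2×10⁻¹⁶ C corresponds to roughly a thousand elementary charges tunneling per pulse.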
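To estimate the tunneling current ($I_{tunnel}$), we assume Fowler-Nordheim type electric field dependent tunneling in our devices, as reported previously [60] for hBN tunnel barriers. The tunneling current is given by

$$I_{tunnel} = \frac{A_{ch}\, q^{3}\, m\, V_{tunnel}^{2}}{8\pi h\, \phi_b\, d^{2}\, m^{*}} \exp\left(-\frac{8\pi \sqrt{2 m^{*}}\, \phi_b^{3/2}\, d}{3 h q V_{tunnel}}\right) \tag{1}$$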
where $A_{ch}$ is the channel area and $\phi_b$ the barrier height for tunneling. The effective electron mass for hBN is $m^{*} = 0.26 \times m$, where $m$ is the free electron mass. Here, $h$ and $q$ represent Planck's constant and the electron charge respectively, while $d$ ≈ 5.8 nm is the thickness of the hBN layer (see Supplementary Figure S2). The barrier height ($\phi_b$) is computed from the device band structure using known values for the work function of graphene and MoS2 along with the electron affinity and band gap of hBN, as shown in Figure 3(b) [31]. We find a barrier height of 3.1 eV for potentiation, which involves transfer of holes from MoS2 to the FG, and 2.6 eV for depression, which involves transfer of electrons (Figure 3(b)). In Figure 3(c) we have plotted the tunneling charge ($I_{tunnel}$ × pulse width), calculated from Eq. 1 for both potentiation and depression, as a function of the tunneling bias ($V_{tunnel}$). $V_{tunnel}$ for the current devices is obtained by graphically solving Eq. 1 for the known values of the tunneling charge from Figure 3(a). We find the tunneling bias across the hBN layer to be 3.36 V and 2.61 V for potentiation and depression events respectively (Figure 3(c)).
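The graphical solution described above can also be sketched numerically. The code below assumes the standard Fowler-Nordheim expression with the symbols defined in the text, and replaces the graphical step with a bisection search. The channel area `AREA_ASSUMED` is a hypothetical value (the text does not state it), so the resulting biases are illustrative rather than a reproduction of Figure 3(c); all function names are likewise illustrative.

```python
import math

H = 6.62607015e-34      # J s, Planck's constant
Q_E = 1.602176634e-19   # C, electron charge
M_E = 9.1093837015e-31  # kg, free electron mass


def fn_current(v_tunnel, phi_b_ev, d, area, m_eff_ratio=0.26):
    """Fowler-Nordheim tunneling current (A) for bias v_tunnel (V) across a barrier."""
    phi_b = phi_b_ev * Q_E     # barrier height in joules
    m_eff = m_eff_ratio * M_E  # effective mass in hBN
    prefactor = area * Q_E**3 * M_E * v_tunnel**2 / (8 * math.pi * H * phi_b * d**2 * m_eff)
    exponent = -8 * math.pi * math.sqrt(2 * m_eff) * phi_b**1.5 * d / (3 * H * Q_E * v_tunnel)
    return prefactor * math.exp(exponent)


def solve_tunnel_bias(target_charge, pulse_width, phi_b_ev, d, area,
                      v_lo=0.5, v_hi=10.0, iterations=100):
    """Bisection for the bias at which I_tunnel * pulse_width equals target_charge."""
    def charge(v):
        return fn_current(v, phi_b_ev, d, area) * pulse_width

    if not charge(v_lo) < target_charge < charge(v_hi):
        raise ValueError("target charge is outside the bracketing interval")
    for _ in range(iterations):
        v_mid = 0.5 * (v_lo + v_hi)
        # The tunneling charge increases monotonically with bias
        if charge(v_mid) < target_charge:
            v_lo = v_mid
        else:
            v_hi = v_mid
    return 0.5 * (v_lo + v_hi)


AREA_ASSUMED = 1e-12  # m^2 (1 um^2), hypothetical channel area
D_HBN = 5.8e-9        # m, hBN thickness from the text
DELTA_Q = 2e-16       # C per pulse, from the text
PULSE_WIDTH = 0.1     # s, from the text

v_pot = solve_tunnel_bias(DELTA_Q, PULSE_WIDTH, 3.1, D_HBN, AREA_ASSUMED)
v_dep = solve_tunnel_bias(DELTA_Q, PULSE_WIDTH, 2.6, D_HBN, AREA_ASSUMED)
print(f"V_tunnel (potentiation, 3.1 eV barrier): {v_pot:.2f} V")
print(f"V_tunnel (depression, 2.6 eV barrier): {v_dep:.2f} V")
```

Because the exponential term dominates, the solved bias depends only weakly on the assumed area, which is why a rough area estimate is enough for an order-of-magnitude check.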
To verify this, we measure the effective bias across the hBN tunnel barrier (denoted by the difference in Fermi level between the graphene and MoS2 layers in panel V (potentiation) and panel II (depression) in Figure 1(g)) from the device transfer characteristics (Figure 3(d)). The tunneling voltage for potentiation (depression) is given by the difference between the threshold voltage for the forward (reverse) sweep and the excitatory (inhibitory) pulse height. This method yields $V_{tunnel}$ values of 3.52 V for potentiation and 2.5 V for depression (Figure 3(d)), which are similar to those obtained from Fowler-Nordheim modelling (Figure 3(c)), confirming the charge tunneling mediated synaptic behaviour in our devices. Since the synaptic activity originates from the tunneling of charges between the channel and the FG, we also observe synaptic plasticity in a two terminal geometry. However, the device then operates at large current levels (≈ few µA), making it energetically unfavorable for neuromorphic applications (see Supplementary Section VIII for details).
Apart from the systematic modification of channel conductance in response to pre-synaptic pulsing, synaptic memories are also meant to follow specific learning mechanisms which guide their response to a train of applied pulses. Here, we demonstrate a very common learning process of the human brain known as spike time dependent plasticity (STDP) using the current device [61] [62] [63] .
In this case, the conductivity of the synapse is a function of the time difference between the pre and post synaptic pulses. This is performed using a mapping function which converts the time difference between the pulses to the magnitude of pre-synaptic pulse applied.
The experimental procedure followed is demonstrated in Figure 4(a) and is similar to the process detailed in Ref. [16] (see Supplementary sections V and VI for more details). Depending on the mapping function used (details provided in Supplementary section VI), we obtain synaptic responses which are symmetric (symmetric STDP) (Figure 4(c)) or asymmetric (asymmetric STDP) (Figure 4(b)) with respect to the time difference between the pre and post synaptic pulses. To demonstrate the effect, we have plotted the percentage change in channel conductance $\Delta G\%$ against the time difference $\Delta t$. $\Delta G\%$ is given by

$$\Delta G\% = \frac{G_{final} - G_{initial}}{G_{initial}} \times 100 \tag{2}$$
where $G_{initial}$ and $G_{final}$ are the channel conductance before and after the application of the synaptic pulse respectively. We observe large changes in the synaptic weight for small time differences between the pre and post synaptic pulse in both types of synaptic learning (Figure 4(b) and (c)). For the asymmetric case (Figure 4(b)), we see a sharp decrease in channel conductance for a non causal event, i.e. $\Delta t < 0$, while there is a sharp increase in conductivity for a causal event, i.e. $\Delta t \geq 0$. To obtain a time constant for the potentiation and depression pulses we fit an exponential to the STDP data in Figure 4(b) (black solid lines) as follows [62]

$$\Delta G\% = \begin{cases} A_{+} \exp\left(-\Delta t/\tau_{+}\right), & \Delta t \geq 0 \\ -A_{-} \exp\left(\Delta t/\tau_{-}\right), & \Delta t < 0 \end{cases} \tag{3}$$

Here $A_{+}$ and $A_{-}$ are the fitted amplitudes.
$\tau_{+}$ and $\tau_{-}$ denote the characteristic scale of the time difference between the pre and post synaptic pulses over which there is a considerable change in the synaptic weight.
For the current device we find these values to be 0.34 s and 0.6 s for potentiation and depression pulses respectively. These values can be tuned by changing the mapping function (Supplementary section VI). For the symmetric STDP case (Figure 4(c)) we find that the channel conductivity depends only on the absolute time difference between the synaptic inputs, $|\Delta t|$. The change in channel conductivity is ≈ 100%, leading to a very robust demonstration of spike time dependent learning which is independent of the applied bias (Figure 4(b) and (c)).
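As an illustration of the asymmetric STDP window, the sketch below evaluates an exponential window with the time constants quoted above. It assumes the commonly used form in which the conductance change decays exponentially with |∆t| on either side of zero; the amplitudes `a_plus` and `a_minus` are hypothetical (set to 100% only because the text reports changes of that order), and the function name is illustrative.

```python
import math

TAU_PLUS = 0.34  # s, potentiation time constant quoted in the text
TAU_MINUS = 0.6  # s, depression time constant quoted in the text


def stdp_delta_g_percent(delta_t, a_plus=100.0, a_minus=100.0,
                         tau_plus=TAU_PLUS, tau_minus=TAU_MINUS):
    """Percentage conductance change for a pre/post time difference delta_t (s).

    Causal events (delta_t >= 0) potentiate, non causal events (delta_t < 0) depress.
    The amplitudes a_plus and a_minus are assumed values, not fitted ones.
    """
    if delta_t >= 0:
        return a_plus * math.exp(-delta_t / tau_plus)
    return -a_minus * math.exp(delta_t / tau_minus)


for dt in (-1.2, -0.6, -0.1, 0.0, 0.1, 0.34, 1.0):
    print(f"delta_t = {dt:+.2f} s -> delta_G = {stdp_delta_g_percent(dt):+.1f} %")
```

At ∆t equal to one time constant the change has fallen to 1/e of its amplitude, which is what makes τ+ and τ− a natural measure of the learning window.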
To evaluate the energy efficiency of our synaptic transistor, note that the energy dissipated for a single pulse is given by

$$E = I_{sd} \times V_{sd} \times t_{pulse} \tag{4}$$
where $I_{sd}$ is the average current during the pulse, $t_{pulse}$ is the duration of the pulse and $V_{sd}$ the drain bias. Figure 5(a) plots the energy dissipation as a function of pulse width for both potentiation (−4 V pulse height) and depression (+3 V pulse height) pulses at a drain bias ($V_{sd}$) of 0.01 V for the synaptic device D4. Since the channel conductance is lower during a potentiation pulse and higher during a depression pulse, we observe a higher energy loss during depression (Figure 5(a)). The observed energy dissipation of ≈ 20 pJ per pulse for depression is similar to that of synaptic devices previously reported [15, 17, 20, 23-30] (Figure 5(b)). Notably, this is about five decades lower than similar devices operated in two terminal geometry (≈ 1 µJ per pulse for the same pulse duration) [46] and ∼ 1-2 decades lower than complementary MOS devices [20, 64]. We also note that the energy dissipation in our devices scales linearly with pulse width (Figure 5(a)), leading to a decrease in energy consumption for lower values of $t_{pulse}$ (Eq. 4). For our MoS2 based synaptic transistor, we find that the extrapolated energy dissipation for a pulse width of ≈ 100 µs is ≈ 20 fJ (indicated in Figure 5(a)), which is comparable to that in Ref. [15], reiterating the benefits of using TMDC based synaptic transistors for enhanced power efficiency. Additionally, we now know that both in-plane and cross-plane charge and heat transport in van der Waals heterostructures are strongly temperature dependent and can be tuned accurately with external electric fields [37, 65]. This allows a holistic integration of transport layer, heating layer and floating gate in a bottom up fashion [66], opening up a wide range of possibilities including biorealistic neuromorphic realizations, for example by electro-thermal pulsing in a second order memristor [67].

In conclusion, we have successfully fabricated a charge-tunneling based synaptic transistor using ultrathin molybdenum disulphide channels. Repeated potentiation and depression of the channel conductance is demonstrated along with spike timing dependent synaptic plasticity while maintaining a desirable energy efficiency. We provide a new framework for solid state synaptic devices free of electrochemical reactions which may be utilised in future neuromorphic applications.

I. Devices used

One of the key factors limiting the performance of FETs is the subthreshold swing. It is the amount of gate bias required to change the drain current by one decade and generally determines how fast a transistor switches. The subthreshold swing (S) may be represented as

$$S = \frac{\partial V_g}{\partial (\log_{10} I_{sd})} = \frac{\partial V_g}{\partial \psi_s} \times \frac{\partial \psi_s}{\partial (\log_{10} I_{sd})} \tag{1}$$

where $V_g$ is the gate bias, $I_{sd}$ the drain current and $\psi_s$ the surface potential in the channel. The second term, $\partial \psi_s / \partial (\log_{10} I_{sd})$, is theoretically pegged at a value of 60 mV/decade at room temperature, while the first term, $\partial V_g / \partial \psi_s$, also known as the body factor, is given by

$$\frac{\partial V_g}{\partial \psi_s} = 1 + \frac{C_s}{C_{eqv}} \tag{2}$$
This relation arises because we can picture the capacitive gate circuit as a series combination of $C_{eqv}$, the equivalent gate capacitance, and $C_s$, the surface capacitance, which is the quantum capacitance of the channel. For the case of a MoS2 device fabricated on SiO2, $C_{eqv}$ would be the capacitance of the SiO2 dielectric. However, as observed in the main text, the equivalent capacitance for our extended floating gate structure is the capacitance of the hBN dielectric. Due to the two dimensional nature of the hBN dielectric (thickness ≈ 10 nm), its capacitance is much larger than that of an SiO2 capacitor of the same area, since the thickness of the SiO2 dielectric is ≈ 285 nm. Hence, the application of an extended floating gate effectively allows us to use the hBN dielectric as the back gate rather than the 285 nm SiO2. From Eqn. 2 it is clear that the higher the equivalent capacitance, the lower the value of the subthreshold swing (S), leading to faster switching. The higher capacitance per unit area of hBN allows us to reduce the body factor significantly, leading to an almost ideal subthreshold slope in all the measured devices over four decades of drain current ($I_{sd}$), as shown in Figure 1(f).

We map the time difference between the pre and post-synaptic pulse into a voltage value using the mapping functions demonstrated in the next section. This voltage value ($V_{pre}$) is fed into one of the inputs of a multiplexer while the other is held at 0 V. The select line of the multiplexer is controlled by the post-synaptic pulse, with the output connected to the Si++ gate. Whenever there is a pulse in the post-synaptic channel, $V_{pre}$ is applied at the gate terminal, which is otherwise held at zero. Hence, the timing of the post-synaptic pulse determines the value of the gate pulse ($V_{pre}$) and consequently the change in channel conductance, leading to a spike time dependent conductance change or plasticity.
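The multiplexer scheme described above can be sketched as a small piece of logic. The code below is an illustrative software model, not the measurement setup itself: the piecewise-linear mapping uses the α and β values listed in Section VI, while the function names and the boolean representation of the post-synaptic pulse are assumptions made for the example.

```python
def map_time_to_voltage(t_pre: float) -> float:
    """Piecewise-linear mapping V_pre = alpha * t_pre + beta (t_pre in s, V_pre in V)."""
    if -15.0 < t_pre <= -10.0:
        alpha, beta = 1.0, 15.0
    elif -10.0 < t_pre <= 0.0:
        alpha, beta = -1.0, -5.0
    elif 0.0 < t_pre <= 10.0:
        alpha, beta = 1.0, -5.0
    elif 10.0 < t_pre <= 15.0:
        alpha, beta = -1.0, 15.0
    else:
        raise ValueError("t_pre is outside the range covered by the mapping")
    return alpha * t_pre + beta


def multiplexer(select: bool, v_pre: float, v_rest: float = 0.0) -> float:
    """Two-input multiplexer: pass v_pre when the select line is high, else v_rest."""
    return v_pre if select else v_rest


def gate_voltage(t_pre: float, post_spike_active: bool) -> float:
    """Voltage reaching the Si++ gate for a given time difference and post-synaptic state."""
    return multiplexer(post_spike_active, map_time_to_voltage(t_pre))


for t in (-12.0, -5.0, 2.0, 12.0):
    print(f"t_pre = {t:+.0f} s: gate sees {gate_voltage(t, True):+.1f} V during a post-synaptic pulse")
```

When the post-synaptic pulse is absent, `gate_voltage` returns 0 V regardless of `t_pre`, mirroring the gate being held at zero outside the select window.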
VI. Mapping functions used in STDP
We perform the STDP measurements by simulating time difference between the pre and post-synaptic pulse as a voltage value which is denoted by V pre in Fig. S3 and is different for symmetric and asymmetric STDP. The timing of the post synaptic pulse (select pin of multiplexer) determines the pulse height of the presynaptic pulse (V pre ) and consequently the percentage change in conductance. The mapping function used in our measurements is listed below
Asymmetric STDP: $V_{pre} = \alpha \times t_{pre} + \beta$, with
α = 1 V/s, β = 15 V for −15 s < $t_{pre}$ ≤ −10 s
α = −1 V/s, β = −5 V for −10 s < $t_{pre}$ ≤ 0 s
α = 1 V/s, β = −5 V for 0 s < $t_{pre}$ ≤ 10 s
α = −1 V/s, β = 15 V for 10 s < $t_{pre}$ ≤ 15 s

Here the pulses are applied to the drain contact and the corresponding change in conductance is measured. The readout voltage is set at $V_{sd}$ = 2 V. Though we observe potentiation and depression as in the three terminal geometry, the energy dissipation per pulse in this case (≈ 47 mJ) is excessive for neuromorphic applications.

[...] between the pulses and observing its effect on the post-synaptic current due to the second pulse. In the current device geometry, positive pulses are inhibitory pulses and hence the post-synaptic current due to the second pulse is always lower than the first one, with a stronger inhibitory action being demonstrated for smaller values of $\Delta t$. This is quantified by computing the PPF index, which denotes the percentage change in the post-synaptic current due to the second pulse and is given by PPF index % = $(I_{D2} - I_{D1})/I_{D1} \times 100$. Subsection (b) demonstrates the PPF index % as a function of $\Delta t$. The PPF index is negative, indicating an inhibitory behaviour with an exponential decrease in the inhibition strength with increasing $\Delta t$. The black dashed line is a double exponential fit to the experimental data which is given by

$$\mathrm{PPF\ index\ \%} = C_1 \exp\left(-t/\tau_1\right) + C_2 \exp\left(-t/\tau_2\right)$$
where $t$ is the time separation between the pulses, $\tau_1$ and $\tau_2$ are the relaxation times of the two decay phases, while $C_1$ and $C_2$ are the facilitation amplitudes of the respective phases. For this fit we find $C_1$ = 9%, $C_2$ = 8%, $\tau_1$ = 50 ms and $\tau_2$ = 1000 ms. The characteristic relaxation times for both phases compare well with previous reports of two dimensional system based artificial synapses.
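The double exponential decay with the fitted parameters can be evaluated directly. The sketch below assumes the fit describes the magnitude of the (negative) PPF index, which is an interpretation of the text rather than something it states explicitly; the function names are illustrative.

```python
import math


def ppf_index_percent(i_d1: float, i_d2: float) -> float:
    """PPF index (%) from the post-synaptic currents of the first and second pulse."""
    return (i_d2 - i_d1) / i_d1 * 100.0


def ppf_fit_magnitude_percent(t, c1=9.0, c2=8.0, tau1=0.05, tau2=1.0):
    """Double exponential decay C1*exp(-t/tau1) + C2*exp(-t/tau2), t in seconds.

    Defaults are the fitted values quoted in the text (C in %, tau in s).
    Treating the result as the magnitude of the inhibition is an assumption.
    """
    return c1 * math.exp(-t / tau1) + c2 * math.exp(-t / tau2)


for t in (0.0, 0.05, 0.2, 1.0, 3.0):
    print(f"t = {t:.2f} s -> fit magnitude = {ppf_fit_magnitude_percent(t):.2f} %")
```

The fast phase (50 ms) dominates at short pulse separations, while the slow phase (1000 ms) sets how long the inhibition persists.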
