The operation of a novel unified memory device using two floating-gates is described through experimental characterization of a fabricated proof-of-concept device and confirmed through simulation. The dynamic, nonvolatile, and concurrent modes of the device are described in detail. Simulations show that the device compares favorably to conventional memory devices. Applications enabled by this unified memory device are discussed, highlighting the dramatic impact this device could have on next generation memory architectures.
Introduction
This chapter is based in part on previously published work on the demonstration of a novel double floating-gate unified memory device [1]. Here, that work is extended through device simulations and additional details on the fabrication, operation, and design of circuits based on such a device. Such a unified memory device could store both volatile (dynamic) and nonvolatile states simultaneously. This could have a dramatic impact on traditional memory hierarchies [2] [3] [4]. For example, the data stored in the nonvolatile mode of the device when the computer is powered down could quickly be written to the dynamic state when the power is turned on, allowing for instant-on computing. This data transfer could also operate in reverse, as dynamic data could be written to nonvolatile states to allow for full or partial hibernation of the memory fabric. Alternatively, writing dynamic data quickly to nonvolatile states could enable fast in-situ checkpointing. Finally, there are a number of novel logic applications for such a device that could impact numerous areas of computation [4].
Two floating-gates (FGs) have been used previously for enhancing memory operation [5] [6] [7] [8]. However, these designs have typically been used in an effort to increase the memory window and data retention compared to single FG devices. For example, the size of the nanocrystals in the two FG layers can be engineered to exploit the Coulomb Blockade effect [8]. In this research, however, two FGs are used to enable a device which can store both dynamic and nonvolatile states concurrently. The two modes of operation are distinguished by the charge condition of the two FGs. The device is in the dynamic mode when charge is simply redistributed between the two FGs. It is not until charge is drawn up from the substrate that the device enters its nonvolatile mode. Therefore, operation of the device requires the existence of a window between when charge is merely redistributed between the FGs and when charge is drawn up from the channel, allowing for the coexistence and selective control of both states. This requires engineering the vertical stack to create and fine tune such a window. The fabrication of the proof-of-concept device is discussed in the next section, and the design tradeoffs based on those decisions are considered throughout.
Device Fabrication and Modeling
The double FG MOSCAPs shown in Fig. 1 were fabricated to experimentally demonstrate and confirm the device operation. The process flow is outlined in Fig. 1a. The wafers were cleaned before a SiO2 gate oxide was grown through thermal oxidation. For the bottom FG, palladium was deposited through e-beam evaporation and patterned using liftoff. HfO2 was used as the inter-FG dielectric and deposited through Atomic Layer Deposition (ALD). The top FG was then fabricated, once again using palladium. ALD was used to deposit the control dielectric of HfAlO, which has been previously shown to have the low leakage that is required of ultra-scaled FLASH memory technologies [9]. A palladium control gate was deposited and patterned, followed by a backside etch and aluminum deposition. A Transmission Electron Micrograph (TEM) of the cross section of the fabricated device is shown in Fig. 1b.
Fig. 1. Device Fabrication. (a) Recipe and (b) TEM cross section.
Being a proof-of-concept structure, the layer thicknesses of the vertical stack were not aggressively scaled, which, as will be shown later in this chapter, has led to relatively high operating voltages. Further device scaling combined with engineering the materials has been shown through simulation to lower operating voltages, optimize device performance, and achieve high storage density [4]. Towards these ends, a 65-nm gate length MOSFET, with the properties listed in Table 1, was modeled in Sentaurus TCAD.
This model is used to confirm the characterized device operation and is the link between the proof-of-concept structure and the circuit simulations discussed in Section 6. In addition to more aggressive thickness scaling, some important material distinctions can be made between the fabricated device and the simulated device. In contrast to the fabricated device, different materials were chosen for the two FGs, creating an asymmetry across the inter-FG dielectric, as can be seen in the energy band diagrams for the simulated and fabricated devices shown in Fig. 2. A high work function metal, Pt, was used for the top FG, whereas a low work function metal, Mg, was used for the bottom FG. This results in fast dynamic programming, as electrons tunnel from the bottom FG to the top FG relatively easily. Once the charge is trapped, the deep potential well of the top FG sustains sufficiently long retention times, but this comes at the expense of dynamic erasing, as will be shown later in the circuit simulations. The characterization and operation of this device are discussed in the following sections, starting with the dynamic mode.
Dynamic Mode Operation
The mode of operation is determined by the applied voltage envelope. For voltages of small magnitude and short duration, the device's dynamic mode results. The dynamic operation of the device is illustrated in Fig. 3. The device is swept from a negative voltage, to a positive voltage, and then back to the negative voltage. Initially, as the device has a small negative voltage applied, electrons will move to the bottom FG, leaving behind a positive charge on the top FG, as pictured on the right-side of Fig. 3.
The negative charge on the bottom FG, being closer to the substrate, will cause a slight positive shift of the flat-band voltage, as seen in the measured CV characteristic of the fabricated device. As the voltage applied to the gate becomes positive, the opposite results. Electrons now move to the top FG, resulting in a positive charge on the bottom FG, closer to the substrate, as pictured on the left-side of Fig. 3. The positive charge near the substrate will cause a slight negative shift of the flat-band voltage, as demonstrated by the measured CV characteristic. Thus, the hysteresis is counter-clockwise, which would be the opposite of traditional single FG devices. Notice that for dynamic mode operation, the voltage applied to the control gate is insufficient to draw up charge from the substrate, but rather is only strong enough to simply redistribute charge across the relatively thin inter-FG dielectric. Thus, there is no net increase in charge on the FGs. This condition is what distinguishes the mode of operation as dynamic.
The flat-band voltage shift of the fabricated device relative to the applied voltage envelope is shown in Fig. 4. As greater negative voltage envelopes are applied to the device, there is a greater positive shift in the flat-band voltage, whereas increasing positive voltage envelopes cause a greater negative shift in the flat-band voltage. The symmetry in the characteristics is indicative of the use of the same metal for the two FGs. As shown in Fig. 2b, this results in a symmetric energy barrier between the two FGs such that the program and erase characteristics are also symmetric.
The simulations of the dynamic mode of the modeled device are shown in Fig. 5. The initial uncharged device characteristics are shown (1). A 5 V pulse is applied to the control gate for 50 ns, causing the threshold voltage to shift about -330 mV (2). After about 300 ms, the threshold voltage decays about 220 mV back towards the initial state of the device (3). For these simulations, it is assumed that a 100 mV difference is needed to distinguish between the two distinct states, thus a refresh is required. A 5 V refresh pulse is applied to the control gate. Since the device has not fully decayed back to the initial state, this refresh pulse only needs to be applied for 40 ns, rather than the initial 50 ns applied to redistribute charge in the fresh device. As can be seen in Fig. 5a, this refresh returns the device back to the charged state threshold voltage (4). This volatile cycle continues, requiring a refresh period of about 300 ms to retain the charged state, and thus demonstrating the dynamic mode of the device.
Fig. 5. Dynamic Mode Simulations. The (a) drain current and (b) capacitance vs. control gate voltage.
The dynamic operation illustrated by the device simulations was confirmed in the fabricated device, as shown in the dynamic retention characteristics of Fig. 6. A +10 V bias was applied to the control gate of the fabricated device, which caused a flat-band voltage shift to the left, as shown in Fig. 6a. This is directly analogous to the simulations shown in Fig. 5. When the bias on the control gate was removed, the CV characteristics would decay back towards the initial curve. This is illustrated in Fig. 6b, in which the capacitance at 0.5 V is plotted over time. As the charge difference between the two FGs decays, the CV curve shifts back to the right and the capacitance at 0.5 V decays. At this point, after ~22 s, the +10 V is reapplied to the control gate, restoring the charge difference between the two FGs and causing the flat-band voltage to shift back to the left.
This is shown for five cycles in Fig. 6b.
Finally, the dynamic mode endurance of the fabricated device is shown in Fig. 7. A consistent 300 mV window between the programmed and erased states is maintained as the device was cycled over 10^5 times, though a cycling drift is present. The inter-FG dielectric is critical in ensuring stable operation over the extensive number of cycles required of DRAM. As this dielectric is further scaled, this will permit the reduction of voltages and fields, and thus the use of lower energy tunneling mechanisms that will reduce the stress on this dielectric. Choosing an appropriate inter-FG dielectric is actively being investigated.
Nonvolatile Mode Operation
The nonvolatile mode of the device is entered when a voltage pulse is applied to the control gate that is sufficient to draw up a net charge from the substrate to the FGs, as illustrated in Fig. 8. Once again the device is swept from a negative voltage, to a positive voltage, and then back to the negative voltage. However, unlike in dynamic mode operation, the bias is large enough to draw up charge from the substrate. As the voltage applied to the gate starts out negative, electrons are repelled towards the substrate, leaving behind a positive charge on the FGs. A net positive charge on the FGs causes a negative shift in the flat-band voltage, as can be seen in the measured CV characteristic. As the sweep continues, the voltage applied to the gate becomes positive, and electrons are now drawn up from the substrate, resulting in a negative charge on the FGs. The voltage is then swept in reverse, and the negative charge on the FGs results in a positive flat-band voltage shift, once again witnessed in the measured CV characteristic. This results in a clockwise hysteresis, which is expected for traditional single FG nonvolatile devices.
The dynamic mode operation depicted in Fig. 3 and the nonvolatile mode operation depicted in Fig. 8 are compared in the characteristics shown in Fig. 9. This clearly shows that the mode of operation is set by the applied voltage envelope.
Fig. 10. Nonvolatile Program/Erase Characteristics.
As shown in Fig. 2b for the fabricated device, the electrons can more easily tunnel between the two FGs than they can tunnel back to the substrate. In Fig. 10, both the initial and settled flat-band voltage shifts are plotted, and in every case, the settled shift is more pronounced than the initial shift. This is due to the fact that when a voltage is applied to the control gate, more of the charge is drawn up to the top FG than to the bottom FG. When the external bias is removed, the charge redistributes between the two FGs, resulting in an increase in the charge on the bottom FG. Since the bottom FG is closer to the substrate, this charge redistribution leads to a greater flat-band voltage shift over time as the charge settles. This is confirmed by the simulation of the nonvolatile mode of the modeled device shown in Fig. 11.
Fig. 11. Nonvolatile Mode Simulations. The (a) drain current and (b) capacitance vs. control gate voltage.
The initial uncharged device characteristics are shown (1). A 9 V pulse is applied to the control gate for 30 μs (2). This pulse is large enough to pull up charge from the channel, resulting in a net increase of charge on the FGs. Initially, most of the charge is drawn up to the top FG, limiting the impact on the channel. Thus, only a relatively minor positive threshold voltage shift occurs immediately after the pulse, as shown in Fig. 11a (2). However, after the voltage is removed from the control gate, the charge on the FGs redistributes, resulting in a much more pronounced positive threshold voltage shift of about 1.52 V after about 1 s (3). This is the same phenomenon that occurred in the fabricated devices, though not to the same extent. The device does not reach its stable state until after some time passes. The relationship between the initial applied pulse and the charge redistribution settling time is currently being investigated and will have to be accounted for at the circuit level. Finally, as shown in the simulation, a -8.5 V pulse applied for 30 μs (4) followed by a charge settling period of 1 s (5) returns the device approximately back to its uncharged state.
To confirm the nonvolatile nature of the fabricated device, the retention of the device is shown in Fig. 12a. A window of at least 4.5 V is maintained out to 10 years. Finally, the nonvolatile endurance of the device is demonstrated over 10,000 cycles, as shown in Fig. 12b.
Fig. 12. Nonvolatile mode (a) retention and (b) endurance characteristics.
Concurrent Mode Operation
The device is not limited to operation in either the dynamic mode or the nonvolatile mode, but rather it can operate in both the dynamic and nonvolatile modes at the same time. The experimental verification of concurrent mode operation is shown in Fig. 13. The device is first programmed into its nonvolatile state, as shown in Fig. 13a, in which charge is drawn up from the substrate. To demonstrate concurrent operation, a dynamic state is then embedded on top of the nonvolatile state by the application of a smaller voltage that redistributes some of the negative charge between the two FGs. When the bias is removed, the charge difference between the two FGs decays and the flat-band voltage shifts back to the original nonvolatile programmed state. The dynamic state embedded on top of the programmed nonvolatile state is refreshed five times, as shown in Fig. 13c. Combining this with Fig. 13d, which represents the dynamic state embedded on top of the erased nonvolatile state, successfully demonstrates that the dynamic state can be embedded on both the programmed and erased nonvolatile states. Thus, concurrent mode operation of the fabricated device is experimentally verified.
The concurrent mode simulations are shown in Fig. 14. The device is first placed in the charged nonvolatile state (1). A dynamic pulse of 5 V for 50 ns results in a threshold voltage shift (2). Upon cessation of the bias, the threshold voltage starts decaying back to the charged nonvolatile state (3). The CV curve of Fig. 14b is directly analogous to the experimental characteristics shown in Fig. 13b. A relatively small dynamic pulse shifts the flat-band voltage in the negative direction, at which point it begins to decay back to the charged nonvolatile flat-band voltage, as shown in Fig. 13c. Upon a refresh, the flat-band voltage shifts back to the left (4). Thus, it is shown through simulation that a dynamic state can be embedded on the charged nonvolatile state.
Dynamic, nonvolatile, and concurrent mode operation have been experimentally demonstrated. The characterization of the fabricated devices has been confirmed through device simulations, fully verifying the operation of this novel unified memory device. A memory array using this device is discussed in the next section.
Circuit Simulations
The memory array shown in Fig. 15 was designed in Cadence Virtuoso 2010 using a BSIM4.0 MOSFET model with a 45-nm gate length and device parameters similar to the simulated device described in Table 1. However, instead of bulk silicon, the substrate was SOI with a thickness of 13 nm; an SiO2 back gate dielectric of 1.2 nm was used, and the control and back gates were composed of aluminum. The operation of this memory array is described in Table 2.
Table 2. Operation: Dynamic Program, Dynamic Erase, Dynamic Refresh, Nonvolatile Program, Nonvolatile Erase, Low Vt Re…, High Vt Re…
To dynamically program a target device, 3 V is placed on the appropriate WL and -2 V is placed on the appropriate SL. This results in a 5 V bias across its gate stack, which, when applied for 50 ns, results in the target device being dynamically programmed, as previously described in the device simulation. However, to prevent inadvertent programming of non-target devices on that WL, the non-target SLs need to be biased to 2 V such that there is only a 1 V bias across their gate stacks. This represents the dynamic retain condition.
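The biasing scheme described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' simulation setup: it assumes the gate-stack bias is simply the WL voltage minus the SL voltage, and the function and constant names are hypothetical. The 5 V program bias is taken from the text; treating anything below it as a retain condition is a simplification made for illustration.

```python
# Illustrative half-select check for dynamic programming.
# Assumption: gate-stack bias = V(WL) - V(SL); all names are hypothetical.

DYNAMIC_PROGRAM_BIAS_V = 5.0  # bias that programs the dynamic state in 50 ns (from the text)


def gate_stack_bias(v_wl: float, v_sl: float) -> float:
    """Return the bias across the gate stack for a given WL/SL voltage pair."""
    return v_wl - v_sl


def is_dynamically_programmed(v_wl: float, v_sl: float) -> bool:
    """True if the cell sees the full dynamic-program bias (simplified threshold test)."""
    return gate_stack_bias(v_wl, v_sl) >= DYNAMIC_PROGRAM_BIAS_V


target = gate_stack_bias(3.0, -2.0)     # selected WL, selected SL
non_target = gate_stack_bias(3.0, 2.0)  # selected WL, inhibited SL
print(f"target cell bias: {target} V, non-target cell bias: {non_target} V")
```

Under these assumptions, only the cell on the selected SL sees the full program bias, while cells on inhibited SLs stay in the retain condition.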
Fig. 15. Simulated memory array architecture.
As previously discussed, the device was engineered to have a low work function metal for the bottom FG and a high work function metal for the top FG. This resulted in an asymmetric barrier that allowed for fast dynamic programming and increased dynamic retention, as charge tunneled easily from the bottom FG into the deeper energy well of the top FG, as shown in Fig. 2a. This resulted in a dynamic retention of 300 ms. However, this came at the expense of the dynamic erase, which, as shown in Table 2, takes 10 μs. This is much longer than conventional DRAM. If, on the other hand, the materials are chosen to be symmetric, as was the case for the fabricated device in which palladium was used for both the top and bottom FGs, the dynamic erase time would be reduced to 200 ns. Of course, with a symmetric barrier, there is no longer the deeper potential well for the charge in the dynamically programmed state, and so the retention time would also be reduced. However, for this device the retention time would only be reduced from 300 ms to 100 ms, which could prove a wise tradeoff for reducing the dynamic erase time from 10 μs to 200 ns. Certainly, further work function engineering can be performed to tailor the device performance towards target applications.
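The tradeoff described above can be made concrete with the figures quoted in the text. This is a small sketch with hypothetical names that only computes ratios between the asymmetric and symmetric floating-gate choices; it does not model the device, and the refresh rate is taken simply as the inverse of the retention time.

```python
# Ratios for the work-function tradeoff, using only values quoted in the text.
# The dictionary layout and all names are illustrative assumptions.

configs = {
    "asymmetric": {"erase_s": 10e-6, "retention_s": 300e-3},  # low/high work function FGs
    "symmetric": {"erase_s": 200e-9, "retention_s": 100e-3},  # same metal for both FGs
}

erase_speedup = configs["asymmetric"]["erase_s"] / configs["symmetric"]["erase_s"]
retention_loss = configs["asymmetric"]["retention_s"] / configs["symmetric"]["retention_s"]

# Assumption: one refresh per retention period, so rate = 1 / retention time.
refreshes_per_s = {name: 1.0 / c["retention_s"] for name, c in configs.items()}

print(f"dynamic erase is about {erase_speedup:.0f}x faster with symmetric FGs")
print(f"dynamic retention is about {retention_loss:.0f}x shorter with symmetric FGs")
print(refreshes_per_s)
```

Read this way, the symmetric choice gives up a small factor in retention, and so refreshes more often, in exchange for a much larger factor in erase speed.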
Another advantage of the device is that it operates more like an SRAM than a DRAM, and thus the read operation takes only 2.2 ns, which is much faster than DRAM. The read is also nondestructive, unlike DRAM. The memory array should also have a higher density than DRAM due to the difficulty of scaling the DRAM capacitor and maintaining sufficient charge sharing with the bitline. This device is scalable, in bulk form, to at least the 16-nm node. Through stacking, it has the potential to reach densities equivalent to the 8-nm node.
Overall, the device offers several advantages compared to conventional DRAM [3]. However, such a comparison is ill-conceived. The device may not be wholly superior to DRAM, nor to a similarly scaled single FG nonvolatile device, since it requires an extra FG and the addition of an ultra-thin inter-FG dielectric layer. What it offers is a capability that neither of the other devices provides on its own: it can store both dynamic (DRAM) and nonvolatile (FLASH) states concurrently. Such a unified memory device has enormous potential to impact next generation memory architectures.
Applications
There are a number of applications for such a unified memory device. For example, the device could be used to enable instant-on computing. The computer could quickly be powered down by simply moving all of the dynamic states into their nonvolatile states. If the entire memory array is written to its nonvolatile state in parallel, this would take only about 30 ms. When the user wants to power the computer back on, the memory controller simply needs to write back all of the nonvolatile data into the dynamic state. Once again, when performed in parallel, this would take only about 14 ms. In theory, the user could power up and power down the computer in only a fraction of a second. Beyond user convenience, this could allow the operating system to power down during moments of inactivity. For example, if the user walked away from the computer to get a drink or take a phone call, the operating system could power down and conserve battery life. When the user returned, the power-up penalty would only be a fraction of a second.
This device could also enable partial hibernation. Parts of the memory that are not currently being used could be written to the nonvolatile state in the background as the user continues to operate the computer. This could enable a flexible memory fabric that could be selectively powered down, which could have a significant impact on energy-proportional computing. An example application for this would be Google servers. A study of their server power usage showed that at common utilization levels (20-30%), the servers operated at less than half their peak energy efficiency [10]. Given the nature of their utilization, current solutions for transferring to inactive modes are impractical because of both the time latency and the energy penalty. The device described in this chapter could make such transitions practical by significantly reducing the wake-up penalties. Alternatively, partial hibernation enabled by this device could be used to further enhance active energy-saving schemes.
Another example application in which these devices could be beneficial is in-situ checkpointing. The device could be running continuously in dynamic mode, and then, when a checkpoint is desired, the entire memory array could be quickly written to the nonvolatile state in only about 30 ms. This would be much more efficient than writing through narrow channels to disk. Thus, more checkpoints could be efficiently taken, improving the resiliency of the computer. Upon detection of an error, the correct state could be recovered much faster than traditional memory hierarchies would permit. Instant-on computers, energy-proportional computing, and in-situ checkpointing are just a few examples of the potential that could be realized with a memory array composed of this new unified memory device.
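To give a sense of scale for the checkpointing and instant-on arguments, the sketch below estimates the fraction of time spent writing checkpoints for a given checkpoint interval. Only the 30 ms parallel write time and the 14 ms restore time come from the text; the interval values and function names are hypothetical, and the model ignores any memory controller or charge settling overhead.

```python
# Back-of-the-envelope checkpoint overhead, assuming a full-array parallel write.

NONVOLATILE_WRITE_S = 30e-3  # dynamic -> nonvolatile, whole array (from the text)
DYNAMIC_RESTORE_S = 14e-3    # nonvolatile -> dynamic, whole array (from the text)


def checkpoint_overhead(interval_s: float) -> float:
    """Fraction of wall-clock time spent writing checkpoints."""
    return NONVOLATILE_WRITE_S / (interval_s + NONVOLATILE_WRITE_S)


def suspend_resume_latency() -> float:
    """Total time to save the dynamic state and later restore it."""
    return NONVOLATILE_WRITE_S + DYNAMIC_RESTORE_S


for interval in (1.0, 10.0, 60.0):  # hypothetical checkpoint intervals in seconds
    print(f"{interval:5.1f} s interval -> {checkpoint_overhead(interval):.2%} overhead")
print(f"suspend + resume: {suspend_resume_latency() * 1e3:.0f} ms")
```

Under these simplifying assumptions, the overhead falls as the checkpoint interval grows, which is the sense in which more checkpoints could be taken efficiently.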
Conclusion
New unified memory devices using two FGs were modeled, simulated, fabricated, and characterized. The operation of the devices in dynamic, nonvolatile, and concurrent modes was demonstrated in proof-of-concept MOSCAPs and confirmed through device simulations. The programming, retention, and endurance characteristics were demonstrated for the different modes. A memory array based on these devices was designed and simulated. It was shown that these devices compare favorably to both conventional DRAM and FLASH devices. However, the true potential of these devices is not in their use as either a DRAM or FLASH replacement, but rather as a new unified memory device that can store both dynamic and nonvolatile states concurrently. Applications for such a device were discussed that highlight the significant impact this device could have on next generation memory architectures.
