Abstract: Scaling bulk CMOS SRAM technology for on-chip caches beyond the 22nm node is questionable because of high leakage power consumption, performance degradation, and instability due to process variations. Recently, two/three-transistor, one-gated-diode (2T/3T1D) DRAMs were proposed as alternatives that address the SRAM variability problem, with an emphasis on high-activity embedded cache applications. They are highly competitive with SRAM in terms of performance, while having smaller power and area footprints at lower technology nodes. Because transistor structures are evolving toward multi-gate devices, it is necessary to identify the design issues and advantages of gated-diode DRAMs implemented in a multi-gate technology.
I. INTRODUCTION
Static random access memory (SRAM) is currently the dominant memory architecture for caches in ICs, often occupying as much as 70% of total chip area [1]. Increased scaling has placed considerable stress on SRAM technology because of the effects of process variations on performance, stability, and standby leakage power consumption. To circumvent the SRAM scaling/variability problem, researchers have considered replacing bulk SRAM with 2T/3T1D bulk DRAM [2], [3], or switching to a multi-gate implementation, such as FinFET SRAM [8], [9], [10]. In [2], 3T1D DRAM was shown to meet the performance requirements of an L1 cache in the temporal window of repeated accesses/writes, thereby obviating the need for a static memory. It is scalable, robust to process variations, and has a smaller area footprint than 6T SRAM, leading to higher density. However, to our knowledge, there has been no attempt to explore the gated-diode DRAM design space in a multi-gate technology.
In this work, we address the above design problem in FinFET technology, taking process variations into account, by performing mixed-mode 2D device-level simulations in FinE, a double-gate design environment that we have developed. The main contributions of this work can be summarized as follows:
• We extend the model of internal voltage gain (ζ) in planar gated-diodes to fin variants of gated-diodes. The model provides quantitative insight into optimum device sizing, mode of operation, and operating voltage.
• We explore the design space of 2T/3T1D FinFET DRAM cells in an effort to enhance retention time and read current, while minimizing cell leakage.
• We contrast bulk and FinFET 2T1D DRAM cells under variations and compare them with 6T PGFB cells [8].
• We present a new tunable-threshold gated-diode sense amplifier that uses an n-type gated-diode for voltage-boosting and a p-type gated-diode for zero-suppression.
The work is organized as follows. Section II reviews planar gated-diodes and develops an augmented model of internal voltage gain. Section III describes the simulation setup. Sections IV and V analyze gated-diode DRAM design in FinFET technology. Section VI draws comparisons between 2T1D bulk cells, FinFET cells, and 6T PGFB cells under process variations. Section VII presents a new tunable-threshold gated-diode sense amplifier with zero-suppression. Section VIII presents the discussion and conclusions.
II. OPERATING PRINCIPLE OF THE GATED-DIODE
A preliminary analysis of gated-diode operation in planar single-gate technology is available in [3], [4], [5]. In this section, we develop an augmented model of internal voltage gain [3] to aid the design of fin gated-diodes for voltage-boosting. We show that it matches device simulation well in Section V and extend it to zero-suppression using p-type gated-diodes in Section VII.
A. Internal Voltage Gain
A gated-diode (T_G, Fig. 1) can be implemented in bulk silicon either by shorting the source and drain of a FET or by fabricating a 'partial' FET with a source and no drain [3], to form a two-terminal device. The nonlinear C-V characteristic of T_G can be leveraged to obtain internal voltage gain in gain memory cells such as 2T/3T1D DRAMs (Fig. 1), permitting low-voltage bitline operation as well as non-destructive read-out.
In Fig. 2(a), node G is initially at V_HIGH. On raising the source voltage by V_B, an amount of charge ΔQ_HIGH, given by the area under the C-V curve, is transferred to C_L, raising its voltage to V_Gf. Hence, the voltage boost at node G is ΔV_HIGH = ΔQ_HIGH/C_L. Fig. 2(b) shows the same operation with V_GS = V_LOW. As C_GS,LOW < C_GS,HIGH, ΔQ_LOW < ΔQ_HIGH, so that ΔV_LOW < ΔV_HIGH. Hence, internal voltage gain enables greater separation between the '1' and '0' levels even when the stored voltage representing a '1' is small. By this principle, read transistor T_R1 in Fig. 1, whose gate is tied to G, is strongly overdriven when a '1' is read and does not turn on when a '0' is read. In Fig. 2, when the source voltage is lowered, the charge ΔQ_HIGH (ΔQ_LOW) returns to T_G, restoring its V_GS to V_HIGH (V_LOW). This results in the highly beneficial non-destructive read-out feature. An important design decision is the choice of V_HIGH, which determines both the mode of T_G operation and the internal voltage gain, as quantified below.
Fig. 2. ΔQ_HIGH (ΔQ_LOW) charge transferred when storage node G is at V_HIGH (V_LOW) and node S is raised by V_B.
1) Constrained charge transfer mode: In Fig. 2, when node S is raised by V_B, part of the stored charge is transferred to C_L, raising its voltage to V_Gf. We use a step model to approximate the C-V characteristic of T_G as well as C_L, i.e., C_GS = C_GS,ON for V_GS > V_t and C_GS = C_GS,OFF for V_GS ≤ V_t, with C_L switching analogously between C_L,ON and C_L,OFF.
We define internal voltage gain ζ as
Assuming ON , it can be shown that
where θ , α are
Our assumption of partial charge transfer implies V_Gf − V_B > V_t, which imposes the necessary condition:
$$V_{HIGH} > V_t + \alpha V_B$$
While the above approximates the true C-V characteristic with a step model at V_t, a piecewise-linear C-V model would capture η(V_HIGH) and χ(V_HIGH) more accurately, at the expense of greater mathematical complexity.
2) Complete charge transfer mode:
In this mode, on raising the source by V_B, all the charge stored in T_G is transferred to C_L. Using the earlier formulation, in the V_HIGH case:
Here, ζ and α have the same form as in Eq. (3) and θ is
In the complete charge transfer mode, V_GS = V_HIGH > V_t initially, and on raising node S by V_B, V_Gf − V_B < V_t, which gives the following necessary condition:
$$V_t < V_{HIGH} < V_t + \alpha V_B$$
From the above model, the design values chosen for V_B, V_HIGH, and V_t are closely coupled with one another and with the gated-diode and read-FET configurations, which dictate the on/off-state capacitances and, hence, η, χ, and κ.
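The two modes can be illustrated with a small charge-sharing calculation. The following Python sketch is our own reading of the step C-V model, not a reproduction of the paper's equations, and it is not expected to match the reported ζ values exactly. It assumes the normalizations η = C_GS,OFF/C_GS,ON, χ = C_L,ON/C_GS,ON, and κ = C_L,OFF/C_GS,ON, a load that stays at C_L,ON for a stored '1' and at C_L,OFF for a stored '0', and the definition ζ = (V_Gf,HIGH − V_Gf,LOW)/(V_HIGH − V_LOW). Under these assumptions, T_G remains on after the step exactly when V_HIGH > V_t + αV_B with α = χ/(1 + χ), the same boundary at which ζ peaks in Section V.

```python
def alpha(chi):
    """alpha = chi / (1 + chi); sets the constrained/complete mode boundary."""
    return chi / (1.0 + chi)


def transfer_mode(v_high, v_t, v_b, chi):
    """Classify a stored '1' as constrained or complete charge transfer."""
    if v_high <= v_t:
        raise ValueError("step model assumes V_HIGH > V_t")
    return "constrained" if v_high > v_t + alpha(chi) * v_b else "complete"


def boost_high(v_high, v_t, v_b, eta, chi):
    """Boost at node G for a stored '1' (T_R1 on, load = C_L,ON)."""
    if transfer_mode(v_high, v_t, v_b, chi) == "constrained":
        # T_G stays on for the whole step: divider C_GS,ON : C_L,ON.
        return v_b / (1.0 + chi)
    # Complete transfer: T_G is on until V_GS drops to V_t ...
    dv_s_on = (v_high - v_t) / alpha(chi)  # source rise while T_G is on
    dv_g_on = (v_high - v_t) / chi         # gate rise during that phase
    # ... then it couples only through C_GS,OFF for the rest of the step.
    return dv_g_on + (v_b - dv_s_on) * eta / (eta + chi)


def boost_low(v_b, eta, kappa):
    """Boost for a stored '0' (T_G and T_R1 off): divider C_GS,OFF : C_L,OFF."""
    return v_b * eta / (eta + kappa)


def zeta(v_high, v_low, v_t, v_b, eta, chi, kappa):
    """Assumed definition: boosted level separation / stored level separation."""
    high = v_high + boost_high(v_high, v_t, v_b, eta, chi)
    low = v_low + boost_low(v_b, eta, kappa)
    return (high - low) / (v_high - v_low)


# chi, eta, kappa, V_t from the low-k SG-mode extraction in Section V;
# V_B = 1.0 V and V_LOW = 0 V are assumptions for illustration.
CHI, ETA, KAPPA, V_T, V_B, V_LOW = 0.18, 0.015, 0.44, 0.22, 1.0, 0.0

sweep = [0.23 + 0.001 * i for i in range(371)]  # V_HIGH from 0.23 V to 0.60 V
best_v_high = max(sweep, key=lambda v: zeta(v, V_LOW, V_T, V_B, ETA, CHI, KAPPA))
boundary = V_T + alpha(CHI) * V_B
print(f"boundary {boundary:.3f} V, zeta peaks at {best_v_high:.3f} V")
print("mode at V_HIGH = 0.4 V:", transfer_mode(0.4, V_T, V_B, CHI))
```

The sweep uses the χ, η, κ, and V_t values quoted in Section V; V_B and V_LOW are assumed. In this simplified model, ζ rises through the complete-transfer region and falls through the constrained region, so its maximum lands at the mode boundary.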
III. SIMULATION SETUP
In this section, we briefly describe our simulation setup. We have developed an environment called FinE (Fig. 3), which integrates Sentaurus TCAD [6] and the Spice3-UFDG [7] model into a single framework, enabling designers to perform high-level experiments with ease. Fig. 4(a) shows the X-Y cross-section that was simulated in TCAD. The heavily doped extended source and extended drain regions (H_CON × L_CON) aid in forming contacts to the device. They lead into the source/drain regions in the fin, where the dopant concentration decreases gradually toward the relatively undoped body region. The V_t of FinFETs is typically tuned by directly adjusting the workfunction of the gate material [11]. From a fabrication standpoint, owing to the high H_FIN/T_SI aspect ratio, we choose a single gate workfunction for all devices. We also include the effects of a high-k gate dielectric (ε_high-k = 20) in the FinFET structure shown in Fig. 4.
Fig. 4(b) shows the I_DS vs. V_GS characteristics for the device in Table I predicted using TCAD and the Spice3-UFDG model in FinE. Spice3-UFDG [7] is a physics-based compact model that simulates double-gate devices accurately, and it shows excellent agreement with our device simulation in both the weak- and strong-inversion regions. Owing to better convergence behavior, we use TCAD device simulations in FinE for all subsequent results. Using the step C-V models from Section II, we operate T_G in constrained charge transfer mode [SG, SB, or IG (V_b = 0)] with V_HIGH = 0.4V. We maximize ζ by minimizing η, or equivalently, by minimizing C_GS,OFF and maximizing C_GS,ON. Since C_GS,OFF for a single-fin T_G with source and drain externally shorted is greater than that of a T_G with a source region alone, we use the latter configuration hereafter.
There are two ways to implement T_G: multiple fins with minimum gate length, or a single elongated fin, as shown in Fig. 5(b) and 5(c). In the multi-fin case, using L_G = 30nm, η ∼ 0.1. In the single elongated-fin case, η ∼ 0.015 for L_G = 300nm, which is chosen as the nominal T_G gate length for the rest of this work. As η is primarily set by the gate length (Fig. 6(a)), using multiple fins with minimum-sized gates is disadvantageous: η increases marginally in spite of the increase in C_GS,ON, owing to proportionately higher parasitic capacitances. Multiple minimum-sized gates therefore result in poor ζ as well as higher layout area. On the other hand, with a single elongated fin, η is small enough to provide good ζ, and the layout area is much lower. Fig. 6 shows the dependence of η on L_G and T_SI for SG, SB, and IG (V_b = 0) mode T_Gs. For the SG mode, η increases as L_G decreases, because the inversion capacitance drops while C_GS,OFF remains unchanged. η is lower in the high-k case because the gate dielectric has higher ε, leading to higher C_GS,ON, while the rest of the structure is identical to the low-k case. SB and IG modes show nearly an order of magnitude higher η, and its variation with L_G and T_SI is minimal, as C_GS,OFF scales proportionately with C_GS,ON.
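The argument against multiple minimum-length fins can be checked with a deliberately crude capacitance model. In the hypothetical Python sketch below, each fin contributes an inversion capacitance proportional to L_G that exists only in the ON state, plus a fixed parasitic capacitance present in both states, so that C_GS,ON = N(c_g·L_G + C_par) and C_GS,OFF = N·C_par for N fins. The constants c_g and C_par are arbitrary illustrative values, not TCAD extractions.

```python
def eta(gate_length_nm, n_fins=1, c_gate_per_nm=1.0, c_par=3.0):
    """eta = C_GS,OFF / C_GS,ON for n_fins identical fins.

    Hypothetical model: each fin adds an inversion capacitance proportional
    to gate length (ON state only) plus a fixed parasitic capacitance that is
    present in both states. Units are arbitrary.
    """
    c_on = n_fins * (c_gate_per_nm * gate_length_nm + c_par)
    c_off = n_fins * c_par
    return c_off / c_on


for n in (1, 4, 10):
    print(f"{n:2d} fins x 30 nm: eta = {eta(30.0, n_fins=n):.3f}")
print(f" 1 fin  x 300 nm: eta = {eta(300.0):.4f}")
```

In this idealized model the fin count N cancels, so adding minimum-length fins leaves η unchanged while multiplying layout area; the TCAD results show a marginal increase instead, which the sketch does not capture. Lengthening a single fin is the only knob here that lowers η.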
V. GATED-DIODE FINFET DRAM CELLS
Over the past decade, considerable effort has been directed toward integrating one-transistor, one-capacitor (1T1C) DRAM as an embedded cache memory. However, read-out in 1T1C cells is destructive, requiring a write-after-read mechanism that increases the read cycle time. Also, high-speed, low-voltage operation is difficult because of a low signal-to-noise ratio. Gain memory cells such as the bulk 2T/3T1D DRAMs described in [3] (Fig. 1) score over 1T1C cells due to their non-destructive read-out, low read latency, and high signal-to-noise ratio even under low-voltage operation. They are 'gain' cells because the amount of charge stored is considerably lower than the charge discharged from the bitline, and the storage capacitor is smaller than the 1T1C trench capacitor. In this section, we study the tradeoffs involved in 2T/3T1D FinFET DRAM design. Fig. 1 shows a dual-port bulk 2T1D (Type B1) and 3T1D (Type B2) cell, where BLW is the write port and BLR is the read port. TW is turned on by raising WL_W to write a '0' (V_GS,TG = V_LOW) or a '1' (V_GS,TG = V_HIGH) at node G. To perform a read, WL_R is raised and a stored '1' is boosted, thereby turning on T_R1. A stored '0' should provide no gain. If ζ is high enough, the resulting read current in the '1' case can discharge the highly capacitive bitline very quickly and provide low-latency read access. An additional transistor can be introduced along the read path of the 2T1D cell to convert it to a 3T1D cell. In the 3T1D case, we can connect the source of T_R1 to ground instead of V_BIAS to increase the gate overdrive on T_R1. The read current can be enhanced further by using a low-V_t version of T_R1 [3].
To design the optimal 2T/3T1D FinFET cell configuration, we examined ζ under various modes of operation of T_G and T_R1. We chose T_R1 as an SG-mode FinFET with the parameters in Table I. From Fig. 8(a), the high-k SG-mode T_G offers the best ζ, which approaches ζ_MAX because of its lowest η, while the SB mode shows the worst ζ owing to the highest C_GS,OFF (V_b = V_B). While the difference in ζ between low-k and high-k gate dielectrics is minor for the SG mode, it is considerable for the IG and SB modes. Fig. 8(b) shows that the low-k SG-mode ζ vs. V_HIGH measured in TCAD agrees well with the analytical model described in Section II, with parameters χ ∼ 0.18, η ∼ 0.015, κ ∼ 0.44, and V_t ∼ 0.22V extracted from ac simulations. ζ peaks at the boundary between the constrained and complete charge transfer modes. Here, α = χ/(χ + 1), so that V_HIGH|ζ_MAX = V_t + αV_B ∼ 0.22V + 0.15V = 0.37V. Note that the analytical model in Fig. 8(b) is applicable only when V_HIGH > V_t or α > 0, since the step C-V approximation breaks down as V_HIGH approaches V_t.
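The boundary value quoted above can be checked with the extracted parameters. The step below is a worked arithmetic check; V_B is not stated in this section, so the value V_B ≈ 1V in the last step is our inference from αV_B ≈ 0.15V.

```latex
\alpha = \frac{\chi}{\chi + 1} = \frac{0.18}{1.18} \approx 0.153,
\qquad
V_{HIGH}\big|_{\zeta_{MAX}} = V_t + \alpha V_B \approx 0.22\,\mathrm{V} + 0.15\,\mathrm{V} = 0.37\,\mathrm{V},
\qquad
V_B \approx \frac{0.15\,\mathrm{V}}{0.153} \approx 1\,\mathrm{V}.
```

Since the nominal V_HIGH = 0.4V used in Section III lies above 0.37V, the nominal low-k SG-mode cell operates in the constrained charge transfer mode, as intended.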
For completeness, Table II shows the approximate ζ with T_G in SG, SB, and IG (V_b = 0V) modes and T_R1 in SG and IG (V_GBS = 0V) modes in the constrained charge transfer case. From Fig. 9, for T_R1 in SG mode, κ remains relatively unchanged at high V_DS and decreases when V_DS < V_t. For T_R1 in IG mode, κ is higher owing to higher C_L,OFF, and it increases marginally when V_GBS or V_DS is increased.
With η_TR1,SG = 0.015, κ_TR1,SG = 0.44, χ_TR1,SG = 0.18, and V_HIGH = 0.4V, we find ζ_TR1,SG ∼ 2.72. The IG-mode configuration, however, operates in the complete charge transfer mode owing to an increase in V_t, as V_GBS,TR1 = 0V. From Eq. (6), we obtain an upper bound ζ_TR1,IG ∼ 1.3, which is in good agreement with the corresponding measured ζ curve in Fig. 8(a). For other combinations of T_G and T_R1, ζ is lower, as the ΔV_LOW component increases. In the IG/SB mode,
and, hence, ζ is difficult to model analytically without externally extracting V_t, especially under the complete charge transfer mode. However, the step C-V approximation is still reasonable for relative bounding comparisons. Owing to the poor ζ and increased layout area in the IG/SB modes, T_G is chosen to be in SG mode hereafter. Next, we proceed to the design of gated-diode FinFET DRAM cells using the above insights.
FinFET DRAM cells which are derived from 2T1D cells. Here, TW, T_R1, and T_R2 are instances of the FinFET described in Table I, and T_G is in SG mode with L_G = 300nm. The operating voltages for the cells are shown in Table III. The metrics of interest for gain memory cells are standby cell leakage (I_LEAK), cell read current (I_READ), and retention time (τ_RET). The main contribution to I_LEAK is sub-threshold leakage along the read path. For a stored '1', if V_HIGH is high enough, T_R1 is strongly turned on irrespective of WL_R. In the worst case, all cells of a column store '1', and bitline leakage is significant. In the 2T1D cells, this is alleviated by setting V_BIAS = V_HIGH. If V_BIAS is increased further, I_LEAK decreases because V_GS < 0 for T_R1. However, the gate overdrive during a read operation also decreases, thereby reducing I_READ. In Types 3, 4, 5, and 6, V_BIAS is set to ground, as with Type B2.
τ_RET is the period up to which a degraded V_G still provides sufficient read current upon voltage-boosting. Fig. 13(a) shows the stored 1 → 0 transition for different V_HIGH for a low-k Type 1 cell. As V_G approaches V_t, the stored charge decreases considerably and |dV_G/dt| increases. Note that τ_RET is a function of the read frequency f_READ: f_READ sets the maximum sensing time and, hence, determines the minimum read current I_MIN required to maintain read fidelity. Fig. 13(b) shows the reduction in retention time as f_READ increases for low-k and high-k Type 1 cells. τ_RET can be as high as 500μs even with f_READ > 2.5 GHz. While the high-k version shows nearly a two-fold improvement in τ_RET over the low-k case at low frequencies, the difference erodes at higher frequencies. During a read operation, the read current of the cell competes with the total leakage current of the remaining cells in the column.
In order to improve I_READ, T_R1 can be upsized. Here, the width quantization property of FinFETs implies that device widths can be changed only in discrete steps of a single fin. Hence, adding fins to increase the electrical width increases the cell area considerably, as consecutive fins are separated by a fin pitch. Also, using multiple fins for T_R1 would yield the same ζ only if T_G is correspondingly sized up so that χ remains unchanged, as owing to T_R2 and IG-mode T_R1, respectively. Also, a multi-fin T_G implementation of the Type 3 cell occupies 1.97× the layout area of the implementation in Fig. 16(b). From Table IV, Type 1 cells show 26% higher I_READ than Type 2 cells. Owing to the IG-mode T_R1, Type 2 cells exhibit lower standby leakage, as the V_t of T_R1 is higher when WL_R is low. While higher ζ implies better I_READ, it does not guarantee a high retention time. This is demonstrated in Fig. 14(b), where a Type 2 cell with lower I_READ at V_G = V_HIGH shows a larger zone of retention than a Type 1 cell at f_READ = 2 GHz. Overall, Type 2 cells appear to strike the best tradeoffs among cell area, I_LEAK, τ_RET, and I_READ at high f_READ.
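The dependence of τ_RET on f_READ can be illustrated with a toy model. The Python sketch below defines τ_RET as the first time at which the read current from a decaying stored '1' falls below I_MIN, and takes I_MIN as the current needed to swing the bitline by a sensing margin within one read period, I_MIN = C_BL·ΔV_SENSE·f_READ. Everything quantitative here is hypothetical: exponential decay of V_G with a 1 ms time constant, a square-law read current with a fixed boost factor of 3 and a 0.3V threshold, a 50 fF bitline, and a 100 mV sensing swing. The exponential decay in particular does not capture the accelerating discharge near V_t seen in Fig. 13(a).

```python
import math


def read_current(v_g, k=1e-3, gain=3.0, v_t_read=0.3):
    """Hypothetical square-law read current (A) of T_R1 after the boost."""
    overdrive = gain * v_g - v_t_read
    return k * overdrive ** 2 if overdrive > 0 else 0.0


def min_read_current(f_read_hz, c_bl=50e-15, dv_sense=0.1):
    """I_MIN (A): swing the bitline by dv_sense within one read period."""
    return c_bl * dv_sense * f_read_hz


def retention_time(f_read_hz, v_high=0.4, tau_leak=1e-3, t_max=1.0, iters=60):
    """Bisection for the time at which the decaying '1' stops meeting I_MIN."""
    i_min = min_read_current(f_read_hz)

    def stored(t):
        # Assumed exponential leakage of the stored node toward 0 V.
        return v_high * math.exp(-t / tau_leak)

    if read_current(stored(0.0)) < i_min:
        return 0.0  # even a freshly written '1' is too weak at this frequency
    lo, hi = 0.0, t_max  # invariant: reads succeed at lo, fail at hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if read_current(stored(mid)) >= i_min:
            lo = mid
        else:
            hi = mid
    return lo


for f in (0.5e9, 1e9, 2e9, 5e9):
    print(f"f_READ = {f / 1e9:.1f} GHz -> tau_RET = {retention_time(f) * 1e6:.0f} us")
```

Because a higher f_READ raises I_MIN, the stored level can decay less before a read fails, so τ_RET shrinks monotonically with frequency, which is the qualitative trend of Fig. 13(b).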
VI. COMPARISONS UNDER PROCESS VARIATIONS
In this section, we draw comparisons between the Type 1, Type B1, and 6T PGFB cells [8] under process variations. Due to the time-consuming nature of mixed-mode device simulations, we limited the number of quasi-Monte Carlo (QMC) samples to 1000 and set 3σ/μ ∼ 10% for the nominal variations of the physical parameters in Table I.
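The text specifies 1000 QMC samples and 3σ/μ ≈ 10%, but not the low-discrepancy sequence or the parameter distributions. The Python sketch below assumes independent Gaussian variations and draws them from a Halton sequence (one prime base per parameter) mapped through the inverse normal CDF, using only the standard library. Apart from L_G = 30nm, the parameter names and nominal values are hypothetical stand-ins for Table I.

```python
from statistics import NormalDist, fmean, pstdev

PRIME_BASES = (2, 3, 5, 7, 11, 13, 17, 19)


def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value


def qmc_gaussian_samples(nominal, n_samples=1000, three_sigma_over_mu=0.10):
    """Halton-sequence samples of independent Gaussian parameter variations."""
    if len(nominal) > len(PRIME_BASES):
        raise ValueError("need one prime base per parameter")
    dists = {name: NormalDist(mu, three_sigma_over_mu * mu / 3.0)
             for name, mu in nominal.items()}
    samples = []
    for i in range(1, n_samples + 1):  # start at 1: index 0 maps to u = 0
        samples.append({name: dist.inv_cdf(radical_inverse(i, base))
                        for (name, dist), base in zip(dists.items(), PRIME_BASES)})
    return samples


# Hypothetical nominal values standing in for Table I (only L_G = 30 nm is in the text).
nominal = {"L_G_nm": 30.0, "T_SI_nm": 10.0, "H_FIN_nm": 40.0}
samples = qmc_gaussian_samples(nominal)
lg = [s["L_G_nm"] for s in samples]
print(f"L_G: mean {fmean(lg):.2f} nm, sigma {pstdev(lg):.3f} nm (target 1.000 nm)")
```

With 3σ/μ = 10%, the target standard deviation is μ/30, i.e., 1nm for L_G = 30nm. Each sample dictionary would then drive one mixed-mode simulation.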
We compared Type B1 and Type 1 cells to show the large difference in performance obtained by moving to FinFETs under approximately iso-area conditions and identical circuit topologies. For Type B1 cells, we used a T_G with L_G = 300nm, W = 150nm, and V_t ∼ 0.22V. TW was minimum-sized with L_G = 30nm, W = 30nm, and V_t ∼ 0.3V. T_R1 was sized with L_G = 30nm, W = 45nm, and V_t ∼ 0.22V. From Table IV, we can see that bulk cells fail to compete with the corresponding FinFET cells under approximate iso-area constraints owing to poor ζ, I_READ, and τ_RET (the operating f_READ is considerably lower, at 0.5 GHz, compared to 2 GHz for FinFET DRAM cells). Fig. 17 compares the retention times of Type B1 and Type 1 cells. The Type 1 cell shows three orders of magnitude higher retention time at a higher read frequency. The spread in τ_RET for Type B1 is very large, with retention times ranging from a few ms down to tens of ps. This is mainly due to the high sensitivity of sub-threshold leakage and gate leakage through TW to variations. It is also partly due to the difficulty of designing highly scaled planar bulk gated-diode DRAM cells with cell areas close to those of their FinFET counterparts while retaining FETs with good electrostatic integrity.
To compare with 6T PGFB cells, we used the FinFET described in Table I (except for L_UNDERLAP = 6nm, chosen to improve I_READ, which tips the scales in favor of 6T PGFB) to simulate the pull-down and access FETs, and complementary p-channel devices with identical parameters for the pull-up FETs. Fig. 19 shows the layout, with a cell area 1.9× that of Type 1 cells. Note that 6T PGFB cells can only be implemented using FinFETs, not bulk single-gate FETs. Fig. 20(a) shows the variation in I_READ and I_LEAK vs. Φ_G ∈ (4.4eV, 4.8eV) for the 6T PGFB cell. I_READ is maximum at Φ_G = 4.4eV and decreases progressively with increasing Φ_G. Hence, we used Φ_G = 4.4eV in an attempt to match the I_READ of Type 1 cells.
This is not optimal from a static noise margin (SNM) and I_LEAK perspective. Also, 6T PGFB yields higher SNM than most FinFET SRAMs in the literature, with minimal decline in I_READ, for a given Φ_G. With L_UNDERLAP = 12nm, I_LEAK = 5.4nA and I_READ = 34.5μA; this read current is insufficient for operation at f_READ = 2 GHz. Decreasing L_UNDERLAP to 6nm improves I_READ but also increases I_LEAK. From Fig. 20(b), even with Φ_G = 4.4eV, the mean I_READ under nominal variations is 25% smaller than that of Type 1 cells, suggesting that gated-diode FinFET DRAMs can outperform 6T FinFET SRAMs in terms of both read current and cell area.
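Combining the two figures above, a mean I_READ 25% lower than that of Type 1 cells and a cell area 1.9× larger, gives the ratio of read current per unit cell area for the two cells (A_cell denotes cell area):

```latex
\frac{(I_{READ}/A_{cell})_{\text{Type 1}}}{(I_{READ}/A_{cell})_{\text{6T PGFB}}}
  = \frac{I_{READ,\text{Type 1}}}{0.75\, I_{READ,\text{Type 1}}}
    \cdot \frac{1.9\, A_{cell,\text{Type 1}}}{A_{cell,\text{Type 1}}}
  = \frac{1.9}{0.75} \approx 2.5
```

This ratio of about 2.5 is consistent with the more-than-two-fold advantage in read current per unit area stated in Section VIII.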
VII. GATED-DIODE FINFET AMPLIFIER
Gated-diode DRAMs employ gated-diode sense amplifiers, which are designed based on the same voltage-boosting principles discussed in Section II. Fig. 21(a) shows a gated-diode FinFET amplifier that modifies the configuration used in [4], [5]. The amplifier consists of a pass-gate transistor and gated-diode(s) for voltage-boosting/suppression, which feed an inverter whose output can be latched. In the absence of the p-type gated-diode shown in Fig. 21(a), the sensing mechanism works as follows: enabling V_C allows V_S to charge the n-type gated-diode. Next, V_READ is asserted, and the boosted voltage at G trips the inverter if V_G = V_HIGH.
A significant problem with voltage-boosting at close-to-zero levels is the inability to adequately suppress zeros from being boosted. This can be resolved by connecting a p-type gated-diode at G, with its source terminal connected to V_DD for the static case and to V_TUNE for the tunable-threshold case. Applying the step model to the C-V curve in Fig. 21(b), we set C_GS = C_P,ON for V_GS ≤ −|V_tp| and C_GS = C_P,OFF for V_GS > −|V_tp|. In the absence of the p-type gated-diode,
, we have χ = χ_0(1 + Ωβχ_0) and κ = (κ_0 + β)/(1 + Ωβ). Therefore, θ under the complete charge transfer mode is given by Eq. (8). Comparing Eq. (8) with Eq. (6), we see that the ΔV_LOW component is suppressed as κ → κ + β, while the ΔV_HIGH term remains unchanged. Fig. 22(a) shows this for V_LOW = 0V and V_HIGH = 0.4V, with and without zero-suppression. Typically, β ∼ 10 and Ω ∼ η ∼ 0.01, so that voltage-boosting of the zero level is virtually eliminated.
Furthermore, by varying V_TUNE, different operating points on the p-type C-V curve are chosen (Fig. 21(b)), resulting in different values of β and different thresholds for voltage-boosting. While V_TUNE sets β, Ωβ is independent of V_TUNE; hence, from Eq. (8), if Ωβ ≪ 1, the ΔV_HIGH component is unchanged as V_TUNE is varied. Fig. 22(b) shows the effect of changing V_TUNE on voltage-boosting at node G. At V_TUNE = 0.6-0.8V ⇒ V_GS,p-type = 0.3-0.5V, the voltage boost is large enough to trip the inverter. This is consistent with the low capacitance seen in Fig. 21(b). For V_TUNE = 0.8-1.2V, the operating point shifts to the inverted region in Fig. 21(b), resulting in a high β. Therefore, from the step C-V model, the voltage boost is suppressed if V_HIGH + |V_tp| < V_TUNE. Since zero-suppression is controlled by β, it is possible to use T_Gs with poor η as long as χ is low enough to ensure that ΔV_HIGH is high. Hence, IG/SB-mode T_Gs, which have poor η, can be used in voltage-boosting applications with the option of tuning their C-V characteristics, and still achieve a reasonably high ζ.
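The effect of the p-type gated-diode can be illustrated with the step C-V model. The Python sketch below is our own formulation and does not use the β and Ω expressions above: it adds the p-type diode's capacitance to the load at G, choosing C_P,ON or C_P,OFF from the pre-boost node voltage (on when V_G − V_TUNE ≤ −|V_tp|), and applies the constrained/complete boost of a stored '1'. The values of η, χ_0, κ_0, and V_t follow Section V; V_B = 1V, |V_tp| = 0.3V, C_P,ON = 3·C_GS,ON, and C_P,OFF = 0.03·C_GS,ON are hypothetical.

```python
def boost_one(v_high, v_t, v_b, eta, chi):
    """Step-model boost of a stored '1' (constrained or complete transfer)."""
    a = chi / (1.0 + chi)
    if v_high > v_t + a * v_b:
        return v_b / (1.0 + chi)  # T_G stays on during the whole step
    dv_s_on = (v_high - v_t) / a  # source rise while T_G is still on
    dv_g_on = (v_high - v_t) / chi
    return dv_g_on + (v_b - dv_s_on) * eta / (eta + chi)


def boost_zero(v_b, eta, kappa):
    """Boost of a stored '0': T_G is off, so only C_GS,OFF couples to G."""
    return v_b * eta / (eta + kappa)


def p_diode_load(v_g, v_tune, vtp_abs, c_p_on, c_p_off):
    """Step C-V of the p-type gated-diode at G, with its source at V_TUNE."""
    return c_p_on if v_g - v_tune <= -vtp_abs else c_p_off


# From Section V: eta, chi_0, kappa_0, V_t (capacitances normalized to C_GS,ON).
ETA, CHI0, KAPPA0, V_T = 0.015, 0.18, 0.44, 0.22
# Hypothetical: V_B, |V_tp|, and p-diode capacitances normalized to C_GS,ON.
V_B, VTP, C_P_ON, C_P_OFF = 1.0, 0.3, 3.0, 0.03
V_HIGH, V_LOW = 0.4, 0.0


def boosts(v_tune=None):
    """Return (dV_HIGH, dV_LOW); v_tune=None means no p-type gated-diode."""
    def extra(v_g):
        if v_tune is None:
            return 0.0
        return p_diode_load(v_g, v_tune, VTP, C_P_ON, C_P_OFF)

    dv_high = boost_one(V_HIGH, V_T, V_B, ETA, CHI0 + extra(V_HIGH))
    dv_low = boost_zero(V_B, ETA, KAPPA0 + extra(V_LOW))
    return dv_high, dv_low


for label, v_tune in (("no p-diode", None), ("V_TUNE=0.6V", 0.6), ("V_TUNE=1.0V", 1.0)):
    dv_h, dv_l = boosts(v_tune)
    print(f"{label:12s} dV_HIGH = {dv_h:.3f} V, dV_LOW = {dv_l:.4f} V")
```

At V_TUNE = 0.6V, only the '0' level sees C_P,ON, so ΔV_LOW drops by almost an order of magnitude while ΔV_HIGH falls only slightly; at V_TUNE = 1.0V, V_HIGH + |V_tp| < V_TUNE, so the '1' is suppressed as well.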
The above approach of adding a p-type gated-diode is not specific to either bulk or FinFET technology; however, owing to the requirement of a second long gated-diode with Ω ∼ η, a FinFET implementation would be the most area-efficient. The methodology can also be adapted to other types of circuits that require greater separation between high and low voltage levels because of a low signal-to-noise ratio.
VIII. DISCUSSION AND CONCLUSIONS
Gated-diode FinFET DRAMs are an attractive choice for low-power, high-activity cache memories of the future. Fin gated-diode structures have a better chance of scaling, as they can easily implement the storage capacitances needed for high internal voltage gain and retention time under tight area constraints. 2T/3T1D FinFET DRAMs offer a variety of possible cell topologies (based on SG and IG modes of FinFET operation), some of which have been explored in this work. While SG-mode fin gated-diodes yield the highest voltage gain, in the IG mode the C-V curve can be tuned by the back-gate bias to vary the gain. Also, read access time can be traded off against cell leakage along the read path by statically or dynamically biasing the back gate of the read FET.
Overall, gated-diode FinFET DRAMs demonstrate excellent robustness to variations, unlike their planar counterparts, and show more than two-fold higher read current per unit cell area than 6T PGFB FinFET SRAMs under similar conditions. The retention time of gated-diode FinFET DRAMs can approach millions of cycles, so the amortized impact of refreshes on memory performance is negligible. Gated-diode FinFET amplifiers (and their variants) with a tunable voltage-boosting threshold and zero-suppression can be used to combat variability in sensing low-voltage swings under low signal-to-noise ratio conditions. With dual-port operation, non-destructive read-out, long retention time, and high read current, gated-diode FinFET DRAMs constitute a versatile class of memories that have the potential to replace SRAMs in mainstream applications as well, given appropriate support at the architecture level.
IX. ACKNOWLEDGMENTS
This work was supported in part by SRC under Contract no. 2007-HJ-1602 and in part by NSF under Grant no. CNS-0719936.
