Introduction
The intrinsic-body double-gate MOSFET has emerged as one of the leading candidates to replace bulk and partially-depleted SOI (PDSOI) CMOS due to its superior scalability for a given gate insulator thickness, better short-channel behaviour without complex channel engineering, higher mobility and the absence of random dopant fluctuation effects. The ideal MOSFET is essentially a gate-voltage-controlled switch, and the short-channel effect reflects the negative influence of drain voltage on channel electrostatics as channel length decreases. The double-gate fully-depleted MOSFET diminishes the short-channel effect by bringing the gate closer to all regions of the channel, and thus improves scalability.
The quasi-planar SOI FinFET [6] and other variants have been proposed as more easily manufacturable alternatives to planar double-gate devices. Researchers have begun to develop design machinery for the migration of microprocessor designs from PDSOI to FinFET CMOS [10]. Unlike planar single- and double-gate devices, the FinFET effective channel width is perpendicular to the semiconductor plane. Hence it is possible to increase the effective channel width (and hence drive current) per unit planar area by increasing fin-height. Increasing drive current at the expense of gate area does not achieve performance benefits in gate-capacitance-dominated logic: delay is proportional to the ratio C_load/I_drive, and both terms grow with device width. On the other hand, interconnect-dominated circuits such as memory arrays are likely to benefit from the increased drive current.

Figure 1. Multi-fin FinFET structure
An estimated 70% of the transistors in a billion-transistor superscalar microprocessor are expected to be used in memory arrays, especially large L2 and L3 SRAM data caches [12]. Thus, chip area and leakage are determined primarily by these arrays. Further, with around 3 to 5 cache accesses occurring per cycle in a 16-wide-issue machine, the performance of the pipeline depends to a large extent on cache access time. The performance of an SRAM subsystem is determined primarily by the delay involved in driving large loads on the bitline and the wordline. In fully-depleted SOI, junction capacitance is negligible, so the bitline load is entirely interconnect. Hence increasing cell device widths (and hence drive current), even at the cost of higher gate capacitance, decreases delay. Alternatively, under a power-constrained design scenario, higher widths can accommodate a decrease in V_dd and an increase in V_t (transistor threshold voltage) to save power while maintaining performance. Planar CMOS technologies (bulk or double-gate) do not allow a "free" increase in channel width; the associated area penalty decreases array density and diminishes delay advantages because of the increase in wordline and bitline lengths. The quasi-planar FinFET allows an increase in effective channel width without any area penalty simply by increasing fin-height.
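The contrast between gate-capacitance-dominated and interconnect-dominated loads can be illustrated with a small Python sketch. It assumes only that delay scales as C_load/I_drive, that gate capacitance and drive current are both proportional to fin-height, and that interconnect capacitance is fixed. The function name and the capacitance splits are hypothetical and are not taken from this work.

```python
def relative_delay(h_ratio, c_gate, c_int):
    """Delay after scaling fin-height by h_ratio, relative to h_ratio = 1.

    Assumes delay ~ C_load / I_drive, with gate capacitance and drive
    current both proportional to h and interconnect capacitance fixed.
    """
    c_load_before = c_gate + c_int
    c_load_after = h_ratio * c_gate + c_int
    i_drive_ratio = h_ratio
    return (c_load_after / c_load_before) / i_drive_ratio


# Illustrative (hypothetical) capacitance splits, in arbitrary units.
logic_like = relative_delay(h_ratio=2.0, c_gate=1.0, c_int=0.0)
array_like = relative_delay(h_ratio=2.0, c_gate=0.1, c_int=0.9)
print(f"gate-dominated load:         {logic_like:.2f}x delay")
print(f"interconnect-dominated load: {array_like:.2f}x delay")
```

With a purely gate-capacitance load the fin-height cancels and the delay is unchanged, whereas with a mostly fixed load the delay falls almost in proportion to the added drive current.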
In this paper, we explore the joint design space of V_dd, fin-height and V_t for a 65nm 32K FinFET SRAM array. We estimate the impact on array sub-threshold and gate leakage, dynamic energy, static noise margin and soft error immunity at iso-performance. Since all FinFETs on a die are expected to have the same height (essentially the SOI thickness), we also estimate the impact of this exploration on the performance and power of gate-capacitance-dominated logic.

Figure 1 shows the structure of a multi-fin FinFET. A silicon fin of thickness t_si is patterned on an SOI wafer. The gate wraps around on either side of the fin (over the gate insulator), and t_si is the body thickness of the resulting double-gate structure, in which both gates are tied together. Current flow is parallel to the wafer plane (though occurring in an orthogonal crystal plane), while channel width is perpendicular to the plane. The effective channel width of a two-channel single-fin FinFET is thus equal to 2h (h = height = SOI thickness); higher widths are achieved by drawing multiple fins in parallel and wrapping the gate around them. The effective channel width for a multi-fin FinFET on a given planar area of silicon is determined by h and the fin-pitch p. The fin-pitch is expected to scale as the lithography half-pitch using spacer technology [4]. The minimum h required to achieve equivalent planar area efficiency is thus p/2, since each pitch p of planar width then provides an effective width of 2h = p; increasing h beyond p/2 increases area efficiency. The upper bound on h is set by the maximum fin aspect ratio (a_max = h_max/t_si) allowed by the process. Another consideration for the upper bound is the minimum width and width-increment required in the design, since width is quantised in integer multiples of 2h. Thus, there exists a design space for h between p/2 and a_max·t_si [17].

Figure 2 shows the two-dimensional device structure used for the symmetrical-gate FinFETs.
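The width relations described above (2h per fin, one fin per pitch p, a fin-height window from p/2 to a_max·t_si, and width quantised in multiples of 2h) can be written out as a short Python sketch. The function names and the numerical values of p and t_si are hypothetical and chosen only for illustration; the 5:1 aspect ratio is the one assumed later in the text.

```python
import math


def area_efficiency(h, p):
    """Effective channel width per unit planar width: one fin (2h) per pitch p."""
    return 2.0 * h / p


def fin_height_window(p, t_si, a_max):
    """(h_min, h_max): planar-equivalent lower bound, aspect-ratio upper bound."""
    return p / 2.0, a_max * t_si


def quantised_width(target_width, h):
    """Smallest realisable width >= target_width, in integer multiples of 2h."""
    n_fins = max(1, math.ceil(target_width / (2.0 * h)))
    return n_fins, n_fins * 2.0 * h


# Hypothetical dimensions in nm, for illustration only.
p, t_si, a_max = 100.0, 20.0, 5.0
h_min, h_max = fin_height_window(p, t_si, a_max)
print(h_min, h_max)                   # 50.0 100.0
print(area_efficiency(h_min, p))      # 1.0
print(area_efficiency(h_max, p))      # 2.0
print(quantised_width(250.0, h_max))  # (2, 400.0)
```

The last line shows the granularity cost of tall fins: a 250nm target width must be rounded up to two fins, that is, 400nm.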
The gate workfunctions are determined such that the on- and off-current requirements of the 65nm logic technology ITRS node [7] are approximately met at the nominal height (h = p/2). Several metals and alloys with near-mid-gap adjustable workfunctions have been demonstrated for FinFETs [3, 8]. In this work, a 70mV increase in NMOS and PMOS V_t is assumed to be achievable by adjusting the gate workfunction. In reality, this may be achieved through other means such as body ...

The physical t_ox in this work is somewhat smaller than is required for double-gate MOSFETs of this dimension. Using a thicker oxide necessitates the use of a thinner fin to suppress the short-channel effect; this worsens the impact of process variations when fin-thickness is controlled lithographically [16]. Using a thinner fin also decreases the fin-height design space, given that the maximum aspect ratio (a_max) assumed is 5:1 [17]. However, researchers have reported FinFETs with higher aspect ratios [9]. We assume a small-t_ox, large-t_si scenario to demonstrate the benefits achievable over a large fin-height design space. It stands to reason that our observations remain valid under a larger-t_ox, smaller-t_si scenario; however, the gains are smaller over ...

A commercial device simulator, TAURUS [13], is used to run two-dimensional device-circuit simulations. The Caughey-Thomas high-field mobility model is assumed for drift-diffusion transport. Quantum confinement effects are accounted for by solving the one-dimensional Schrödinger equation (in the gate-field direction) self-consistently. Gate-oxide tunneling is solved self-consistently with majority and minority carrier transport for leakage estimation.

Figure 3 shows the circuit model and the parameters used for the array. A 32K 6-T SRAM is organized as a 128-column by 256-row array. A thin-cell layout is assumed, and interconnect RCs are adapted from previous 3-D simulations [14, 15].
Cell device dimensions are extrapolated from a previously reported FinFET SRAM structure [11]. The cell is verified to be readable and writable under worst-case ±30mV (15%) mismatch in the cell device V_t's. The wordline and the bitline are modeled as distributed pi-RC networks. The pass-transistor gate capacitances are derived from C-V simulations. Junction capacitances are neglected because of the fully-depleted nature of the devices. A four-stage fan-out-of-4 wordline driver (fan-out of 6 in the last stage) is designed with symmetrical rise and fall times. The input signal at the wordline driver (WL) is assumed to have a slew (transition time) of 10ps.

Figure 4 shows the design space explored in this experiment. Design 1 is the starting nominal fin-height design (with larger V_dd and smaller V_t), and Design 3 is the final maximum fin-height design (with smaller V_dd and larger V_t). V_dd and h are varied for the cell and the wordline driver. The gate workfunctions are varied only for the cell devices; the wordline driver is maintained at the nominal (low-V_t) value. We assume that the S/D extension sheet resistance is the dominant component of R_sd; hence increasing h increases the extension cross-sectional area and decreases R_sd in inverse proportion.
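To make the distributed pi-RC wire model concrete, the following Python sketch computes the Elmore (first-moment) delay of a line split into pi sections. It is an illustration only: the array results in this work come from device-circuit simulation, not from this estimate, and the function name and all component values are hypothetical.

```python
import math


def elmore_delay_pi_ladder(r_driver, r_line, c_line, c_load, n_segments):
    """Elmore delay from the driver to the far end of a line split into
    n_segments pi sections (half of each section's capacitance at each end)."""
    r_seg = r_line / n_segments
    c_seg = c_line / n_segments
    # Node 0 sits at the driver output; node n_segments is the far end.
    node_caps = [c_seg / 2.0] + [c_seg] * (n_segments - 1) + [c_seg / 2.0 + c_load]
    delay = 0.0
    for k, cap in enumerate(node_caps):
        upstream_r = r_driver + k * r_seg  # resistance shared with the far-end path
        delay += upstream_r * cap
    return delay


# Hypothetical values: 1 kOhm driver, 2 kOhm / 50 fF line, 5 fF far-end load.
for n in (1, 4, 64):
    d = elmore_delay_pi_ladder(1e3, 2e3, 50e-15, 5e-15, n)
    print(n, f"{d * 1e12:.1f} ps")

# Closed form that the pi ladder reproduces for any number of segments.
expected = 1e3 * (50e-15 + 5e-15) + 2e3 * (50e-15 / 2.0 + 5e-15)
assert math.isclose(elmore_delay_pi_ladder(1e3, 2e3, 50e-15, 5e-15, 16), expected)
```

A convenient property of the pi model is that the Elmore delay does not depend on the number of segments, which is one reason it is a common choice for lumped approximations of distributed wires.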
Device Design and Simulation

Circuit Model
Delay
We consider two components of SRAM delay: the wordline-driver delay (wordline driver input → SRAM cell) and the bitline delay. BL and BLB are as shown in Figure 3. The delay can be expressed as -
τ_bl is the time required for a differential voltage ΔV_sense (50mV) to develop between BL and BLB, after which the sense amplifier is activated. I_on-cell is the cell pull-down current through transistors M_5 and M_1 (from Figure 6(a)) that discharges the bitline. The wordline and bitline interconnect capacitances (C_wl-int and C_bl-int) and ΔV_sense are assumed to be constant for all 3 designs. The pass-transistor component of the wordline load (C_wl-pass) increases linearly with h. Figure 5 shows the array waveforms. Both components of τ_wldriver remain nearly invariant over the design space. From Design 1 to 3, the increase in τ_inv because of the ...
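The definitions of τ_bl, ΔV_sense and I_on-cell suggest a simple constant-current estimate of the bitline delay. The Python sketch below assumes τ_bl ≈ C_bl-int·ΔV_sense/I_on-cell, which is a first-order reading of those definitions rather than the expression used in this work. It then shows how much fin-height is needed to hold the delay constant when the on-current per unit width drops (for example, after lowering V_dd and raising V_t). Function names and all values other than the 50mV sense voltage are hypothetical.

```python
import math


def bitline_delay(c_bl, dv_sense, i_on_cell):
    """Time for the cell current to develop dv_sense on a bitline of capacitance c_bl."""
    return c_bl * dv_sense / i_on_cell


def iso_delay_fin_height(h_nominal, j_nominal, j_scaled):
    """Fin height that keeps I_on-cell (and hence the bitline delay) unchanged
    when the on-current per unit width drops from j_nominal to j_scaled."""
    return h_nominal * j_nominal / j_scaled


# Hypothetical bitline: 50 fF interconnect load, 50 uA cell read current.
tau = bitline_delay(c_bl=50e-15, dv_sense=0.050, i_on_cell=50e-6)
print(f"{tau * 1e12:.0f} ps")

# If the per-width on-current halves, the fin must be twice as tall.
h_new = iso_delay_fin_height(h_nominal=50.0, j_nominal=1.0, j_scaled=0.5)
assert math.isclose(h_new, 100.0)
```

Because C_bl-int does not depend on h, any per-width current lost to a lower V_dd or higher V_t can be recovered by a proportional increase in fin-height, within the aspect-ratio limit.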
Array Leakage
Figure 6(a) shows the various leakage paths in a 6-T cell. The cell leakage power can be expressed as -
where I_ds and I_g are defined per unit width. Figure 7(a) shows the decrease in sub-threshold (-87%) and gate conduction-band-electron tunneling (-50%) cell leakage power from Design 1 to 3. Increasing h increases P_leak linearly. Decreasing V_dd improves the sub-threshold slope and thus decreases I_ds. The smaller gate field decreases I_g exponentially. Increasing V_t decreases I_ds exponentially. These exponential effects, coupled with the decrease in V_dd, override the impact of increasing h.
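A back-of-the-envelope Python sketch shows why the exponential V_t dependence can outweigh the linear increase with h. It assumes P_leak ∝ h·V_dd·I_ds with I_ds ∝ 10^(-V_t/S), and ignores the secondary dependence of I_ds on V_dd. The sub-threshold swing S, the fin-height ratio and the V_dd ratio are hypothetical; only the 70mV V_t shift comes from the text, and the result is not meant to reproduce the simulated -87%.

```python
import math


def subthreshold_leakage_ratio(h_ratio, vdd_ratio, delta_vt, swing):
    """P_leak(new) / P_leak(old) for the sub-threshold component.

    Assumes P ~ h * V_dd * I_ds with I_ds ~ 10 ** (-V_t / swing); the
    secondary dependence of I_ds on V_dd (drain field) is ignored.
    """
    return h_ratio * vdd_ratio * 10.0 ** (-delta_vt / swing)


# Hypothetical: fin twice as tall, V_dd scaled to 80%, V_t raised by 70 mV,
# and an assumed swing of 70 mV/decade.
ratio = subthreshold_leakage_ratio(h_ratio=2.0, vdd_ratio=0.8,
                                   delta_vt=0.070, swing=0.070)
print(f"leakage ratio: {ratio:.2f}")
assert math.isclose(ratio, 0.16)
```

Under these assumptions a one-decade reduction in I_ds more than offsets the doubling of width.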
Gate Leakage
The gate leakage results include conduction-band-electron tunneling (CBET) for all cell devices. This accounts for the major portion of tunneling current in the NMOS devices. Valence-band-electron and valence-band-hole tunneling (VBET and VBHT) results are not available because of convergence difficulties with the simulator. VBHT accounts for the gate-to-channel current (I_gc) in PMOS and is expected to follow a similar trend; the value of the current is typically much smaller than in NMOS [5]. I_gc is expected to be the dominant mechanism for gate tunneling in these bias regimes [2]; VBET, which accounts for gate-to-body tunneling (I_gb), is thus expected to be small as well. Edge-direct tunneling (EDT) from gate to source and gate to drain (I_gso and I_gdo) is dominated by CBET [1] and is thus accounted for; however, because of the absence of overlap in our devices, it does not play a significant role.
Thus, we expect that CBET is a good indication of the overall gate leakage current. Further, all components of gate tunneling have a similar exponential dependence on V_dd [1] and a linear dependence on h, so the overall gain is expected to follow a similar trend.
Array Dynamic Energy
Array dynamic energy is expended in charging and discharging the wordline and the bitline. The total energy during a read/write operation can be expressed as -
where n_word = number of bits per word, and C_wl = C_wl-pass + C_wl-int.
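One plausible first-order form of the access energy, built from the quantities defined above, is sketched below in Python. It is an assumption rather than the expression used in this work: the wordline is taken to swing rail-to-rail (C_wl·V_dd²), and each of the n_word accessed bitlines is taken to draw charge C_bl·ΔV_bl from the V_dd supply, where the bitline swing ΔV_bl is a hypothetical parameter (small for a read, up to V_dd for a write). All numerical values are illustrative.

```python
import math


def array_access_energy(c_wl_pass, c_wl_int, c_bl, n_word, v_dd, bitline_swing):
    """Assumed first-order energy of one access: full-swing wordline plus
    n_word bitlines each swinging by bitline_swing from a V_dd supply."""
    c_wl = c_wl_pass + c_wl_int
    e_wordline = c_wl * v_dd ** 2
    e_bitline = n_word * c_bl * v_dd * bitline_swing
    return e_wordline + e_bitline


# Hypothetical values: 10 fF + 20 fF wordline, 50 fF bitlines, 32-bit word.
e_read = array_access_energy(10e-15, 20e-15, 50e-15, 32, v_dd=1.0, bitline_swing=0.05)
print(f"{e_read * 1e15:.0f} fJ")
assert math.isclose(e_read, 110e-15)
```

In this form, only C_wl-pass grows with fin-height, while every term shrinks with V_dd, which is the lever a taller fin makes available at iso-performance.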
Static Noise Margin and Soft Error Rate
Static noise margin (SNM) is defined as the side of the largest square that fits inside a lobe of the SRAM cross-coupled inverter (butterfly) characteristic measured during the read condition (BL = BLB = V_dd, and WL = V_dd). Figure 8(a) shows the SNM curves for the SRAM cell for Designs 1 and 3. Figure 7(b) shows the increase in SNM (+13%) from Design 1 to 3. Increasing V_t dominates the effect of decreasing V_dd and thus SNM increases.
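The largest-square definition can be evaluated numerically from a voltage transfer curve (VTC). The Python sketch below is a minimal illustration, assuming two identical inverters and a toy tanh-shaped VTC in place of the simulated read-condition characteristic; the function names and the steepness parameter are hypothetical. For each diagonal line y = x + c, it finds the crossings with the curve y = f(x) and its mirror x = f(y); the horizontal distance between them is the side of the square whose diagonal lies on that line.

```python
import math


def tanh_vtc(v_in, v_dd=1.0, steepness=5.0):
    """Toy inverter transfer curve (an assumption, not a simulated read VTC)."""
    return 0.5 * v_dd * (1.0 - math.tanh(steepness * (v_in - 0.5 * v_dd) / v_dd))


def _solve_increasing(g, target, lo, hi, iters=60):
    """Bisection for g(t) = target, where g is increasing on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def static_noise_margin(vtc, v_dd, steps=2000):
    """Side of the largest square in one lobe of the butterfly plot formed by
    y = vtc(x) and its mirror x = vtc(y), for two identical inverters."""
    g = lambda t: t - vtc(t)  # increasing because vtc is decreasing
    best = 0.0
    for i in range(steps + 1):
        c = v_dd * i / steps                       # diagonal line y = x + c
        x_a = _solve_increasing(g, -c, 0.0, v_dd)  # crossing with y = vtc(x)
        y_b = _solve_increasing(g, c, 0.0, v_dd)   # crossing with x = vtc(y)
        x_b = y_b - c                              # x-coordinate on the same diagonal
        best = max(best, x_a - x_b)                # horizontal span = square side
    return best


snm_soft = static_noise_margin(lambda v: tanh_vtc(v, 1.0, 5.0), 1.0)
snm_sharp = static_noise_margin(lambda v: tanh_vtc(v, 1.0, 20.0), 1.0)
print(f"SNM (soft VTC):  {snm_soft:.3f} V")
print(f"SNM (sharp VTC): {snm_sharp:.3f} V")
```

A sharper transfer curve gives a larger SNM, approaching the ideal limit of V_dd/2. In practice the two inverter curves differ under the read condition because the pass transistor pulls up the "0" node, so each lobe must be evaluated with its own pair of curves and the smaller square taken.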
The charge stored at the "1" node of the cell (critical charge) is usually considered a first-order indication of the extent of immunity to soft errors - 
Impact on gate-capacitance-dominated logic
All FinFETs on a die are likely to have the same height (defined by SOI thickness), and possibly the same V_dd. So, we estimate the impact of this design space exploration on the delay and power of a 5-stage ring oscillator. This is assumed to be representative of SRAM peripheral circuitry (decoder, wordline driver, output driver, etc.) and arithmetic units that are present on the same die as the SRAM. These devices are assumed to remain at low-V_t workfunctions.
The delay of a velocity-saturated inverter loaded by an identical gate is given by -
Dynamic energy and leakage power (for a single inverter) are given by -
where I_ds and I_g are defined per unit width. Figure 9 shows the impact on delay (+7%), dynamic energy (+30%) and leakage power (sub-threshold: +11%, CBE gate: -29%) from Design 1 to 3. Decreasing V_dd increases V_on/V_dd and contributes to the increase in delay; increasing h has no effect as the driver and the load widths cancel each other. Dynamic energy increases faster than h·V_dd^2 due to the increase in E_short-circuit resulting from slower slew rates. Sub-threshold leakage power increases more slowly than h·V_dd since I_ds decreases with decreasing V_dd due to the smaller drain field. CBE gate leakage power decreases due to the exponential dependence of I_g on V_dd.
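The scaling trends quoted above can be summarised in a short Python sketch. It rests on assumed first-order proportionalities that are consistent with the text but are not the expressions used in this work: delay ∝ 1/(1 - V_on/V_dd) with the widths cancelling, switching energy ∝ h·V_dd² (excluding short-circuit energy), and an upper bound on leakage power ∝ h·V_dd (holding the per-width currents fixed). The function name and all numerical values are hypothetical.

```python
import math


def inverter_scaling(h_ratio, vdd_old, vdd_new, v_on):
    """Return (delay, switching-energy, leakage-bound) ratios, new / old.

    Assumed forms: delay ~ V_dd / (V_dd - V_on), independent of h;
    energy ~ h * V_dd**2; leakage power bound ~ h * V_dd.
    """
    delay = (vdd_new / (vdd_new - v_on)) / (vdd_old / (vdd_old - v_on))
    energy = h_ratio * (vdd_new / vdd_old) ** 2
    leakage_bound = h_ratio * (vdd_new / vdd_old)
    return delay, energy, leakage_bound


# Hypothetical design step: fin 1.5x taller, V_dd from 1.0 V to 0.9 V, V_on = 0.3 V.
delay, energy, leak = inverter_scaling(h_ratio=1.5, vdd_old=1.0, vdd_new=0.9, v_on=0.3)
print(f"delay x{delay:.3f}, switching energy x{energy:.3f}, leakage bound x{leak:.3f}")

# Fin-height has no effect on the delay ratio: driver and load widths cancel.
assert math.isclose(delay, inverter_scaling(3.0, 1.0, 0.9, 0.3)[0])
```

The sketch makes the trade-off explicit: for logic that sees only gate capacitance, a taller fin buys no speed but costs energy in proportion to h, so the V_dd reduction has to pay for both.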
A larger h increases the minimum possible device width, and the quantum by which device width can be changed anywhere on the die. This might cause difficulty in designing circuits where careful balancing of widths is required, such as sense amplifiers, latches and dynamic gates [10] .
Conclusion
The FinFET is a promising candidate for mainstream CMOS integration. The unique quasi-planar structure allows an increase in effective channel width (and hence drive current) without any area penalty by increasing device height. We exploit this property to demonstrate power savings at iso-performance in an SRAM by reducing V_dd and increasing V_t. In effect, we demonstrate the benefits unique to quasi-planar technologies such as FinFET (equivalent to design points 2 and 3) compared to planar bulk and double-gate technologies (equivalent to design point 1).
Similar techniques could be employed for other interconnect-dominated structures such as register files and DRAMs. Alternatively, power density is becoming an important issue in circuits such as ALUs and clock buffers; a smaller increase in h and/or a larger decrease in V_dd, accompanied by a decrease in V_t, could enable the designer to improve power density while trading off leakage at iso-performance. Overall, a careful joint optimization of V_dd, h and V_t is required to meet system design goals.
