Abstract: Vertical integration technology offers numerous advantages over conventional structures. Double-gate transistors can be easily fabricated for better device characteristics, and multiple device layers can be vertically stacked for better interconnect performance. In the paper, the authors explore suitable device structures and interconnect architectures for multi-device-layer three-dimensional (3D) integrated circuits and study how 3D silicon-on-insulator (SOI) circuits can better meet the performance and power dissipation requirements projected by the International Technology Roadmap for Semiconductors (ITRS) for future technology generations. Results demonstrate that double-gate SOI circuits can achieve as much as 20% performance gain and 30% power-delay product reduction over single-gate SOI. More importantly, for interconnect-dominated circuits, 3D integration offers significant performance improvement. Compared to 2D integration, most 3D circuits can be clocked at much higher frequencies (double or even triple). 3D circuits, with suitable SOI device structures, can be a viable solution for future low-power high-performance applications.
Introduction
Scaling has been the primary approach in the past few decades to meet circuit performance and power consumption requirements in VLSI circuits. However, as device dimensions shrink to submicron and below, short channel effects and quantum effects are becoming prominent. Simply scaling down device dimensions without altering device structures is no longer sufficient to maintain good device characteristics, circuit performance, or power consumption [1]. Moreover, transistor count and chip size continually increase, and hence, more and more transistors are closely packed and connected [2]. To deal with such a challenge, state-of-the-art process technologies rely heavily on adding more metal layers. It is believed that up to ten metal layers are needed in the next decade when the circuit speed reaches several gigahertz [3]. Adding more metal layers increases the complexity and cost of the process technology and degrades circuit reliability. More seriously, interconnect delay is increasing significantly due to the reduced wire width and increased interconnect length. In fact, interconnect delay is becoming a dominant factor in determining circuit performance [4]. It is, therefore, necessary to look for new device structures and circuit integration concepts to continually fuel the growth of the VLSI industry in the nanometer generations when conventional device structures reach the projected physical limits, and when the interconnect becomes the limiting factor to the integration capacity as well as circuit speed.
Silicon-on-insulator (SOI) technology has demonstrated many advantages over bulk silicon technology, such as low parasitic junction capacitance, high soft error immunity, elimination of CMOS latch-up, no threshold voltage degradation by the body effect, and a simple device isolation process [5]. Moreover, the unique features of SOI technology make it easy to realise vertical structures for fully or partially depleted transistors with single or double gates [6-8]. Double-gate SOI (DGSOI) transistors have a better on-off current ratio and may result in better circuits [9-15]. In addition, vertical integration enables a process technology to stack the device layers [16-18]. Stacking device layers significantly reduces the interconnect complexity and delay [19-22]. Hence, vertical integration technology is able to simultaneously meet the device, circuit and interconnect requirements, which makes it a serious contender for future low-power high-performance applications.
In this paper, the authors present, to the best of their knowledge, the first study of applications that combine two highly promising technologies: DGSOI transistors for superior device characteristics and multi-device-layer integration for enhancing interconnect performance. Vertical device structures and interconnect architectures for low-power high-performance applications are explored. A system-level performance and power dissipation evaluation is presented for multi-device-layer integration with double-gate or single-gate transistors. By comparing the clock speed, power dissipation and power-delay product of single-gate (SG) SOI and double-gate (DG) SOI circuits with various numbers of device layers, this paper provides a vision of how 3D SOI circuits can better meet the requirements of future technology generations.
2 Vertical SOI device and circuit structures
Fig. 1a shows the cross-section of a single-gate SOI (SGSOI) structure. When the silicon film is thicker than the maximum gate depletion width, the SOI exhibits a floating body effect and is regarded as a partially depleted (PD) SOI MOSFET. If the silicon film is thin enough that the entire film is depleted, the SOI device is considered a fully depleted (FD) SOI MOSFET. Another structure is double-gate SOI (DGSOI) (see Fig. 1b), where t_of, t_si and t_b represent the front-gate oxide thickness, silicon film thickness and back-gate oxide thickness, respectively. Double-gate fully depleted SOI MOSFETs can have ideal subthreshold slope, high drive current and superb short channel effect immunity. This makes them very attractive in low-voltage low-power and high-performance CMOS circuit designs [5].
Traditional double-gate SOI MOSFETs have a symmetric structure: the gate oxides on both sides of the silicon film are equally thin. Hence, the drive current is very high. However, the gate capacitance is also large when the two gates are connected, which affects circuit performance and power dissipation. For asymmetric double-gate SOI CMOS, if the back-gate oxide is thicker than the front-gate oxide and the back surface is depleted, the back-gate voltage can be used to control the front-gate threshold voltage owing to the strong coupling between the front and back surface potentials. Hence, the threshold voltage of the transistor can be altered dynamically to suit the operating state of the circuit. A high threshold voltage in the standby mode gives low leakage current, while a low threshold voltage allows higher drive current in the active mode of operation. Compared to dynamic threshold bulk CMOS (DTMOS: body and gate are tied together), there is no supply voltage limitation for double-gate dynamic threshold SOI (DGDTSOI) CMOS [9].
Besides double-gated fully depleted devices, vertical process technology enables a new integration style: multi-device-layer integration, or 3D integration. Instead of simply spanning devices in a single 2D layer, 3D technology can have multiple device layers stacked over each other. Stacked structures achieve an increase of packing density and a decrease of interconnection length, saving chip area and improving circuit performance. A 3-device-layer SOI structure is illustrated in Fig. 2 .
The interlayer dielectric used to isolate multiple wiring layers from one another can also be used to isolate the device layers. Vertical channels are used to connect devices at different layers. The stacked active layers can be fabricated by selectively growing an epitaxial silicon layer on top of an insulating layer [23], by beam-recrystallising a polysilicon layer into a pure crystal silicon layer [18, 24], or by bonding two wafers [25, 26].
Published work suggests that performance improvement can be achieved by reduced wire length with 3D integration [20, 22]. Fig. 3 illustrates how the wire length can be reduced in a 4-device-layer integration. The original wire AB is replaced by A′B′ and the length is reduced by half.
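The halving in the 4-device-layer example can be checked with a small sketch. It assumes, as a simplification not stated in the text at this point, that the N gates sit on a uniform square grid with unit gate pitch and are divided evenly among m layers, so each layer has a side of sqrt(N/m) gate pitches. Vertical wire length is ignored, and the gate count used is a hypothetical value.

```python
import math

def footprint_side(num_gates: int, num_layers: int) -> float:
    """Side length, in gate pitches, of a square layer holding num_gates / num_layers gates."""
    return math.sqrt(num_gates / num_layers)

def longest_horizontal_wire(num_gates: int, num_layers: int) -> float:
    """Corner-to-corner Manhattan distance across one layer, in gate pitches."""
    return 2.0 * footprint_side(num_gates, num_layers)

N = 1_000_000  # hypothetical gate count, for illustration only
for m in (1, 2, 4, 8, 16):
    ratio = longest_horizontal_wire(N, m) / longest_horizontal_wire(N, 1)
    print(f"m = {m:2d}: side = {footprint_side(N, m):7.1f} pitches, "
          f"longest horizontal wire = {ratio:.3f} x 2D")
```

Under these assumptions the longest horizontal wire scales as 1/sqrt(m), which gives the factor of one half for m = 4.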
Three-dimensional delay distribution
To study the performance and power dissipation of circuits with new device structures and interconnect architectures, an accurate model that encompasses device, interconnect, as well as circuit and logic, is needed. This model is represented by 3D delay distribution.
Modelling of fully depleted SOI MOSFETs
A well-established general analytical model, which can be applied to both symmetric and asymmetric DGSOI transistors, is derived in [10].
Consider the DGSOI NMOSFET shown in Fig. 1b. V_gf and V_gb denote the front-gate and back-gate voltages, while t_si, t_of and t_ob represent the silicon film thickness, front-gate oxide thickness and back-gate oxide thickness, respectively (usually, t_of ≤ t_ob; if t_of = t_ob, the transistor is regarded as a symmetric DGSOI MOSFET). Thus, the front-gate threshold voltage with depleted back surface (V_tf,depb) and the front-gate threshold voltage with inverted back surface (V_tf,invb) can be expressed by the following equations:
For a DGSOI MOSFET, the saturation current I_on can be expressed as
where μ_eff is the effective mobility and v_sat is the carrier saturation drift velocity.
Accurately modelling the off-current is very difficult. Theoretically, I_off can be expressed by the following equations:
where V_tf,0 = V_tf,depb(0) and V_tb,0 = V_tb,depb(0) represent the front-gate threshold voltage at zero bias V_gb and the back-gate threshold voltage at zero bias V_gf, respectively; V_T is the thermal voltage; m_f and m_b are body-effect coefficients and are related to the subthreshold slope; δV_tf and δV_tb are the threshold voltage lowering due to short-channel effects; m_f and δV_tf,0 are represented as [27]:
Here, W_deff is the equivalent depletion width seen by the front gate. For a device with thick back-gate oxide, part of the back-gate oxide can be treated as equivalent to the depletion region of the front gate [27]. m_b and δV_tb,0 can be estimated with similar equations.
For an SGSOI transistor, t_ob → ∞. Thus, I_on and I_off can be expressed as:
3.2 Three-dimensional interconnect modelling
A 3D interconnect model was developed for investigating the impact of multi-device-layer structures on circuit performance and power consumption [22]. To understand the wiring requirements, the wires are divided into two parts: horizontal wires and vertical wires. For a wire connecting one gate to another, the portions that are parallel to the device layers are defined as horizontal wires and the portions that are perpendicular to the device layers are defined as vertical wires (Fig. 4). Horizontal wires determine the overall routing resources on top of each device layer and are generally realised by metal layers. Vertical wires are realised by vertical channels and have an impact on the area of the device layer. Both horizontal and vertical wires contribute to the overall interconnection delay. For a system with N gates distributed in m device layers, the closed-form expressions of the horizontal and vertical wirelength distributions are obtained by extending Rent's rule [4] and are listed as follows:
Horizontal wirelength distribution:
$$
h(\ell) =
\begin{cases}
\theta \left( \dfrac{\ell^{3}}{3} - 2\ell^{2}\sqrt{\dfrac{N}{m}} + 2\ell\,\dfrac{N}{m} \right) \ell^{2p-4}, & 1 \le \ell < \sqrt{\dfrac{N}{m}} \\[2ex]
\dfrac{\theta}{3} \left( 2\sqrt{\dfrac{N}{m}} - \ell \right)^{3} \ell^{2p-4}, & \sqrt{\dfrac{N}{m}} \le \ell \le 2\sqrt{\dfrac{N}{m}}
\end{cases}
\tag{9}
$$
where h(ℓ) is the number of connections with a horizontal distance of ℓ gate pitches, and
$$
\theta = \frac{\alpha A m^{p}\,(N/m)\left(1 - (N/m)^{p-1}\right)}{-(N/m)^{p}\,\dfrac{1 + 2p - 2^{2p-1}}{p(2p-1)(p-1)(2p-3)} - \dfrac{1}{6p} + \dfrac{2\sqrt{N/m}}{2p-1} - \dfrac{N/m}{p-1}}
$$
Vertical wirelength distribution:
where V(k) is the number of connections with a vertical distance of k device layers, k = 1, 2, ..., m − 1. A, α and p are Rent's parameters for interconnect complexity [4]. When m = 1, the expressions give the wirelength distributions of 2D circuits (no vertical wires).
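The horizontal wirelength distribution can be evaluated numerically with a short sketch. It follows one reading of expression (9) and of the normalisation factor that accompanies it (called `theta` here); the function names, the default parameter values (taken from the values quoted later in the text) and the example gate count are assumptions made for illustration. The vertical distribution is not sketched because its expression is not reproduced here.

```python
import math

def theta(n_gates, m_layers, p, alpha, A):
    """Normalisation factor of the horizontal distribution (one reading of the source)."""
    n = n_gates / m_layers  # gates per device layer
    numerator = alpha * A * m_layers**p * n * (1.0 - n**(p - 1.0))
    denominator = (
        -n**p * (1.0 + 2.0 * p - 2.0**(2.0 * p - 1.0))
        / (p * (2.0 * p - 1.0) * (p - 1.0) * (2.0 * p - 3.0))
        - 1.0 / (6.0 * p)
        + 2.0 * math.sqrt(n) / (2.0 * p - 1.0)
        - n / (p - 1.0)
    )
    return numerator / denominator

def h(length, n_gates, m_layers, p=0.45, alpha=0.74, A=3.8):
    """Number of connections with horizontal length `length`, in gate pitches."""
    s = math.sqrt(n_gates / m_layers)  # side of one device layer
    if length < 1.0 or length > 2.0 * s:
        return 0.0
    if length < s:
        shape = length**3 / 3.0 - 2.0 * length**2 * s + 2.0 * length * s * s
    else:
        shape = (2.0 * s - length)**3 / 3.0
    return theta(n_gates, m_layers, p, alpha, A) * shape * length**(2.0 * p - 4.0)

N = 1_000_000  # hypothetical gate count
for m in (1, 4, 16):
    print(m, [round(h(l, N, m), 2) for l in (1, 10, 100)])
```

The closed form is undefined at p = 0.5, 1 and 1.5, where the denominator factors vanish, so the sketch should only be used away from those values.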
Three-dimensional delay distribution
With the device and interconnect models presented in the preceding subsections, the delay distribution can be readily obtained for 3D integrated SOI circuits. This subsection studies the delay distribution of single-gate and double-gate circuits with multi-device-layer integration. The delay distribution is calculated for the 180 nm technology generation as projected by the International Technology Roadmap for Semiconductors (ITRS) [3]. To simplify the calculation, the authors assume that each gate is a two-input NAND gate. Based on the transistor density predicted by ITRS, the sizes of the transistors are estimated to be W_n/L_n = 10 and W_p/L_p = 10. Also, for a net with both horizontal and vertical wires, the authors assume that the vertical wire is in the middle of the net regardless of the length of the net (Fig. 5). The resistance and capacitance of a horizontal wire one gate pitch long are denoted R and C, respectively. For a vertical wire one device layer deep (the distance between neighbouring device layers), the resistance and capacitance are denoted Rv·R and Cv·C, respectively, where Rv and Cv are the coefficients that relate the resistances and capacitances of vertical wires to those of horizontal wires. Under the assumption that tungsten plugs are used as vertical channels, Rv and Cv are estimated to be 10 and 0.85, respectively. The fan-out of a gate is assumed to be 3, which gives A = 3.8 and α = 0.74 [28]. Rent's constant p is set to 0.45 because this value roughly reflects the interconnect complexity of high-performance MPUs. However, the authors' discussions and conclusions are independent of the value of p.
The circuit model of Fig. 6 is used to evaluate the delay of each net [4]. Fig. 6a shows a NAND gate with a fan-out of 3, with Fig. 6b being its RC model for delay estimation. The time required for the output to reach 50% of its final value is used as the gate delay t. Thus,
where f_g is the fan-out, R_sw is the gate switching resistance, C_g is the gate capacitance, and R_1, R_2, C_1 and C_2 are the resistances and capacitances of the horizontal and vertical wires, as defined in Fig. 5. Here, the first term is the delay due to wiring capacitance, the second term is the delay due to the input capacitance of the gates at the next stage, the third term is the distributed-RC delay of the wires, and the fourth term is due to the resistance of the wires and the input capacitance of the gates. For a fully depleted SOI MOSFET, the junction capacitance is very small [5] and, hence, is ignored in the delay estimation. Table 1 lists the device and interconnect parameters for the 180 nm technology node identified by ITRS [3]. Although the authors only calculate and show the delay distribution for the 180 nm node in this paper, the calculation can be easily extended to other technology nodes.
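The four delay contributions described above can be illustrated with a rough sketch. It is not the authors' expression: it lumps the horizontal and vertical segments into one uniform wire and borrows the common textbook 50% delay coefficients (0.693 for lumped terms, 0.377 for the distributed-RC term), both of which are assumptions. The coefficients Rv = 10 and Cv = 0.85 are the values quoted in the text; the switching resistance, gate capacitance and per-pitch wire values are hypothetical placeholders.

```python
RV = 10.0   # vertical/horizontal resistance coefficient Rv quoted in the text
CV = 0.85   # vertical/horizontal capacitance coefficient Cv quoted in the text

def net_delay(h_pitches, v_layers, r_sw, c_g, r_pitch, c_pitch, fan_out=3):
    """Approximate 50% delay of one net (assumed textbook coefficients)."""
    r1, c1 = h_pitches * r_pitch, h_pitches * c_pitch          # horizontal wire
    r2, c2 = v_layers * RV * r_pitch, v_layers * CV * c_pitch  # vertical wire
    r_wire, c_wire = r1 + r2, c1 + c2   # simplification: one lumped uniform wire
    load = fan_out * c_g                # input capacitance of the next stage
    return (0.693 * r_sw * c_wire       # driver charging the wire capacitance
            + 0.693 * r_sw * load       # driver charging the next-stage gates
            + 0.377 * r_wire * c_wire   # distributed RC of the wire itself
            + 0.693 * r_wire * load)    # wire resistance into the gate load

# Hypothetical placeholder values, not taken from Table 1.
params = dict(r_sw=10e3, c_g=1e-15, r_pitch=1.0, c_pitch=0.2e-15)
for length in (10, 100, 1000):
    print(length, net_delay(length, 0, **params), net_delay(length, 1, **params))
```

Because the third term grows with the square of the wire length, the delay of a long net is not proportional to its length, which is why shortening the longest nets matters most.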
Figs. 7 and 8 illustrate the delay distributions of single-gate and double-gate SOI circuits with respect to the number of device layers, m = 1, 4, 8, 16. Clearly, every 3D delay distribution shows two distinct regions compared to the 2D distribution (solid line). The authors refer to the region that contains the short-delay nets as the local region, and to the region that contains the long-delay nets as the global region.
It is observed that, as the number of device layers increases, the range of the local region increases and the range of the global region decreases. Two factors contribute to this scenario. First, with more device layers, more long-delay nets in 2D are shortened and converted into short-delay nets in 3D. Thus, the local region is enlarged. The increase of nets in the local region compensates for the decrease of nets in the global region, so that the total number of nets is conserved. It can therefore be concluded that 3D structures effectively reduce the long-delay nets to achieve high performance. Secondly, the influence of vertical wires becomes more pronounced as the number of device layers increases and may even turn short-delay nets in 2D into moderate-delay nets in 3D. When the contribution of vertical wires is significant enough, the local region may be extended even further and may completely cancel the benefit brought by 3D structures. This implies that vertical wires may limit the number of device layers that can be integrated. To compare the SGSOI and DGSOI circuits, Figs. 7 and 8 are combined into Fig. 9. It is observed that the DGSOI distribution looks like a left-shifted version of the SGSOI distribution. This implies that the delay of all nets is consistently reduced by DGSOI circuits. Hence, performance improvement can be expected for DGSOI circuits.
Performance and power dissipation evaluation
The evaluation is based on the 1999 International Technology Roadmap for Semiconductors [3]. ITRS identifies the trends and challenges of VLSI technology from 1999 up to 2014. The period is divided into six technology generations. Clear targets and technology requirements for each generation are projected. Table 2 lists the parameters of high-performance MPUs for the six technology generations. The authors have again assumed that the gates are 2-input NAND gates. The wire widths are two times the minimum gate length. The normalisation coefficients Rv and Cv remain the same across the six technology generations (Rv = 10.0 and Cv = 0.85). Rent's constant p is taken as 0.45. The back-gate oxide thickness and body film thickness are set as 5t_of and 10t_of, respectively, where t_of represents the front-gate oxide thickness. These parameters roughly represent an optimum DGFD SOI device [29].
Performance evaluation
Fig. 10 illustrates the logic depth concept for the critical path [4]. In the widely accepted critical path delay model [4, 30-32], every stage but one has the average gate delay, and the delay of the remaining stage is determined by the longest global interconnect. Therefore, the minimum clock period T_c is given by
where f_d is the logic depth, estimated at around 20 [4], t_avg is the average gate delay and T_LD is the delay of the longest global interconnect in the system. For giga-scale integration, the performance is dominated by the interconnect, and T_c is mostly determined by the longest global interconnect delay [4, 33]. In most critical path models, the average gate delay is calculated by taking the average interconnection length as the wire load [4, 33, 34]. Such an approach does not capture the fact that the relation between delay and wire length is not linear, so the delay at the average length is not the average delay. A more accurate approach is to calculate the average delay directly. With the delay distribution calculated in Section 3, the average net delay t_avg can be easily obtained by
where the function N(t) gives the number of nets N with delay t. Fig. 11 plots the clock speed with respect to the number of device layers for SGSOI and DGSOI circuits. It should be noted that estimations of the clock speed with the critical path delay model indicate that not every circuit can meet the performance requirement of ITRS, even with repeater insertion. To meet the clock speed, other techniques, such as pipelining [35] and parallelism [36], are needed. Hence, the authors assume that a 2D SGSOI circuit at each technology generation does meet the ITRS clock speed requirement (e.g. 2D SGSOI runs at 1.2 GHz for the 180 nm generation) and use the critical path delay as the relative speed measurement among the circuits.
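The critical-path estimate can be sketched as follows. The sketch reads the model described above as T_c = (f_d − 1)·t_avg + T_LD, takes t_avg as the count-weighted mean of the delay distribution N(t), and takes T_LD as the largest delay present in the distribution. These readings, the function names and the toy distribution are assumptions made for illustration.

```python
def average_net_delay(distribution):
    """Count-weighted mean delay; `distribution` is a list of (delay, count) pairs."""
    total_nets = sum(count for _, count in distribution)
    return sum(delay * count for delay, count in distribution) / total_nets

def min_clock_period(distribution, logic_depth=20):
    """(logic_depth - 1) average stages plus one stage with the longest net delay."""
    t_avg = average_net_delay(distribution)
    t_ld = max(delay for delay, count in distribution if count > 0)
    return (logic_depth - 1) * t_avg + t_ld

# Toy N(t): 900 nets at 10 ps, 90 nets at 100 ps, 10 nets at 1 ns (hypothetical).
toy = [(10e-12, 900), (100e-12, 90), (1e-9, 10)]
t_c = min_clock_period(toy)
print(f"t_avg = {average_net_delay(toy) * 1e12:.1f} ps, "
      f"T_c = {t_c * 1e9:.3f} ns, f_max = {1.0 / t_c / 1e9:.2f} GHz")
```

In this toy case the single longest net contributes about two thirds of the clock period, which matches the observation that the longest global interconnect dominates T_c.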
From Fig. 11 it can be observed that DGSOI circuits consistently show better performance than SGSOI circuits. DGSOIs can be clocked at 13%-20% higher speed than the corresponding SGSOI circuits due to their higher drive currents.
Moreover, substantial performance improvement is obtained with multi-device-layer integration. Most of the circuits can be clocked at rates double or even triple those of 2D. For interconnect-dominated circuits, the delay reduction by multi-device-layer integration is significant. Fig. 12 further plots the clock speed for all technology generations. DGSOI and multi-device-layer circuits demonstrate significantly better performance across all technology nodes. Multi-device-layer integration shows an advantage of two or three technology generations over conventional 2D integration. It is therefore concluded that multi-device-layer integration, together with DGSOI transistors, can be a serious contender for future high-performance applications.
It should be noted that, although repeaters are not considered in this performance evaluation, they can easily be included in the delay estimation using the approach shown in [4, 22].
Power dissipation
In CMOS digital circuits, power dissipation consists of dynamic and static components. Ignoring power dissipation due to direct-path short-circuit current, the total average power dissipation of a CMOS inverter is given by
where activity is the switching activity (the average number of switching events per clock period), I_off denotes the subthreshold leakage current, which is given by (4) or (8), and C_L is the sum of the gate and interconnect capacitances. While DGSOI circuits show better performance than SGSOI circuits, it is estimated that DGSOI circuits consume about 5% more dynamic power than the corresponding SGSOI circuits. In estimating the power consumption, the authors assume that the supply voltage remains the same within a technology generation, regardless of the number of device layers. Hence, the power due to gate capacitance is the same for any number of device layers in the same technology generation.
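The two power components can be illustrated with a minimal sketch. It assumes the standard form of the CMOS power expression, a dynamic term activity·C_L·V_dd²·f plus a static term I_off·V_dd, which is consistent with the quantities described above but is not copied from the paper. The load capacitance, supply voltage and leakage current below are hypothetical placeholders; the activity of 0.05 and the 1.2 GHz clock are values mentioned in the text.

```python
def gate_power(activity, c_load, v_dd, f_clk, i_off):
    """Return (dynamic, static) average power of one gate, in watts."""
    dynamic = activity * c_load * v_dd**2 * f_clk  # switching of C_L
    static = i_off * v_dd                          # subthreshold leakage
    return dynamic, static

dynamic, static = gate_power(
    activity=0.05,   # switching activity used for the plots
    c_load=10e-15,   # hypothetical gate + interconnect capacitance, in farads
    v_dd=1.8,        # hypothetical supply voltage, in volts
    f_clk=1.2e9,     # 1.2 GHz, the 180 nm clock quoted in the text
    i_off=1e-9,      # hypothetical leakage current, in amperes
)
total = dynamic + static
print(f"dynamic = {dynamic * 1e6:.3f} uW, static = {static * 1e9:.2f} nW, "
      f"static share = {100.0 * static / total:.2f}%")
```

Since only C_L depends on the number of device layers under the stated assumptions, any power saving from 3D integration in this form enters through the interconnect part of C_L.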
Figs. 13 and 14 plot the power consumption of an individual gate and of the whole chip for a switching activity of 0.05. From the plots, an interesting phenomenon is observed. Down to the 70 nm technology node, DGSOI circuits dissipate more power than SGSOI circuits, due to the added back-gate capacitance. However, at the 50 nm node and beyond, DGSOI circuits actually consume less power than SGSOI circuits, as can be seen more clearly in Fig. 15. This is due to the better short-channel effect immunity of DGFD SOI devices, which results in lower leakage and hence lower static power dissipation. Fig. 16 estimates the fractions of dynamic and static power in a circuit at room temperature. It shows that, for deep submicron circuits, static power dissipation can become comparable to, or even higher than, dynamic power and cannot be ignored. A double-gate fully depleted SOI circuit, with its inherently better short-channel effect immunity, has significant advantages for deep submicron circuits.
Despite the continuing reduction of power consumption in each gate, the total power for the whole system continually increases over technology generations, due to the continuous increase in the transistor count. Nevertheless, 3D integration can provide a relief, as shown in Figs. 13 and 14. This mainly comes from the reduction of the interconnect capacitance, because the authors do not alter the device or circuit structures.
Based on the authors' estimation, overall power consumption is still dominated by the gates. However, power consumed by interconnects is becoming substantial, taking about 15% to 25% of the total power. This is not an insignificant portion and should call for attention. Hence, 3D integration offers advantages, not only in performance, but also in power dissipation.
However, it is observed that there is a limit on how much reduction can be achieved by multi-device-layer integration. In fact, integrating more device layers may increase the power consumption. To understand this phenomenon, take the 180 nm technology node as an example and plot the horizontal wire capacitance, vertical wire capacitance and total interconnect capacitance against the number of device layers (Fig. 17). The plot shows that, while the horizontal wire capacitance is reduced by multi-device-layer integration, the vertical wire capacitance increases. For a small number of device layers, the decrease of the horizontal wire capacitance outweighs the increase of the vertical wire capacitance. Therefore, the total interconnect capacitance decreases and so does the power consumption. However, with a large number of device layers, the decrease of the horizontal wire capacitance slows down, while the rate at which the vertical wire capacitance increases remains almost the same. Thus, the total wire capacitance increases. This, in turn, increases the power consumption.
Power-delay product
The power-delay product (PDP = T_c × P_T) is a quality measure of a circuit. Despite the higher power consumption of DGSOI circuits at the 180 nm, 130 nm, 100 nm and 70 nm technology nodes, the power-delay product of DGSOI circuits is still lower than that of the corresponding SGSOI circuits. Fig. 18 plots the power-delay product over technology generations for various numbers of device layers. The power-delay product of DGSOI circuits is as much as 30% lower than that of SGSOI circuits (at the 35 nm node). This makes DGSOI circuits very attractive for lower power and higher performance applications.
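A short worked example shows why a circuit that draws more power can still have a lower power-delay product. It combines two figures quoted earlier, the roughly 20% higher clock speed and the roughly 5% higher dynamic power of DGSOI, and assumes for illustration that the 5% applies to total power, which is only reasonable at nodes where dynamic power dominates. The helper name is hypothetical.

```python
def relative_pdp(speedup, power_ratio):
    """PDP of circuit B relative to circuit A.

    speedup:     clock frequency of B divided by that of A (T_c scales as 1/speedup)
    power_ratio: total power of B divided by that of A
    """
    return power_ratio / speedup

ratio = relative_pdp(speedup=1.20, power_ratio=1.05)
print(f"DGSOI PDP is {100.0 * (1.0 - ratio):.1f}% lower than SGSOI in this example")
```

The larger reduction reported at the 35 nm node also reflects the lower leakage of DGSOI devices, which this simple ratio does not model.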
Multi-device-layer integration also achieves a better power-delay product. As shown in the preceding sections, multi-device-layer integration can improve circuit performance and reduce power consumption, thus reducing the power-delay product. From Fig. 18, the authors again observe that there exists an optimum number of device layers that gives the best power-delay product. This implies that there is a limit on vertical integration of device layers in terms of power dissipation and power-delay product.
Conclusions
The authors have explored device structures and interconnect architectures for 3D integrated SOI circuits. Based on the projections of ITRS, SGSOI and DGSOI circuits with various numbers of device layers have been compared in terms of circuit speed, power dissipation and power-delay product. Their applications for future technology generations have been investigated. The results show that, compared to SGSOI circuits, DGSOI circuits can have up to 20% performance gain and 30% power-delay product reduction. Moreover, for interconnect-dominated circuits, multi-device-layer integration offers significant performance improvement and can have an advantage of two or three technology generations over 2D. The authors have also shown that multi-device-layer integration offers power and power-delay product reduction. Hence, it is concluded that 3D integration can be a viable solution for future low-power high-performance applications.
Acknowledgment
This research was supported in part by DARPA (N66001-97-1-8903), SRC (98-HJ-638), NSF CAREER award (CCR-9984553) and Intel Corporation.
References

