Dissertation

Optimum Design of Subquarter-micrometer-gate CMOS/SOI Circuits for High-speed and Low-power Operation

(高速低消費電力のためのサブクォータミクロンゲートCMOS/SOI回路の最適設計)

## December 1992

## Minoru Fujishima




Supervisor: Professor Kunihiro Asada

Minoru Fujishima

Department of Electronic Engineering, Faculty of Engineering, The University of Tokyo

December 21, 1992

# Contents

- 1 Introduction
  - 1.1 Brief History of SOI MOSFETs
  - 1.2 Target and Outline of This Dissertation
- Part I Characterization of Subquarter-micrometer-gate MOSFETs
- 2 Drain Current Characterization
  - 2.1 Introduction
  - 2.2 Non-pinch-off Model
  - 2.3 Conclusion
- 3 Delay Time Characterization
  - 3.1 Introduction
  - 3.2 Analytical Modeling of Ring Oscillators
    - 3.2.1 Delay Time Model of CMOS Inverter Using Equivalent Linear Resistance
    - 3.2.2 Relation Between the Equivalent Linear Resistance and Current Dissipation of a Ring Oscillator
  - 3.3 Experimental Results
    - 3.3.1 Process Parameters and Device Features
    - 3.3.2 Calculation of Load Capacitance from 3-dimensional shape
    - 3.3.3 Comparison with Calculation and Measurement
  - 3.4 Discussions
    - 3.4.1 Derivation of the Effective Load Capacitance
    - 3.4.2 Effects of Leak Current on Power-Delay Product
  - 3.5 Conclusions
- 4 Test Structures for Characterization
  - 4.1 Introduction
  - 4.2 Characterization Method
    - 4.2.1 Equivalent Linear Resistance
    - 4.2.2 Extension for Leaky Circuits
    - 4.2.3 Confirmation Method
  - 4.3 Experimental Results
  - 4.4 Conclusion
- Part II Design for Low-power and High-speed CMOS Circuits
- 5 Gate-Width Optimization
  - 5.1 Introduction
  - 5.2 The Case of a Straight Circuit without Branch
  - 5.3 The Case of General Circuit with Branches
    - 5.3.1 Minimum gate width of the first stage
    - 5.3.2 The Case Including Branches
  - 5.4 The Case of Merging Output to the Same Logic
  - 5.5 An optimization example
  - 5.6 Conclusion
- 6 Low Power Frequency Dividers
  - 6.1 Introduction
  - 6.2 Summary of Device Features
  - 6.3 Frequency Dividers
  - 6.4 Experimental Results
  - 6.5 Conclusion
- 7 High Speed Frequency Dividers
  - 7.1 Introduction
  - 7.2 CMOS-Divider Circuits
    - 7.2.1 CMOS Static Frequency Dividers
    - 7.2.2 CMOS Dynamic Frequency Dividers
  - 7.3 Verification by Simulation
    - 7.3.1 The Effect of the Gate-Width of Clocked MOSFET's
    - 7.3.2 MOSFET sizing in clocked-inverter type circuit
  - 7.4 Conclusion
- 8 High-Speed Adder and Counter
  - 8.1 Introduction
  - 8.2 Algorithm for High-speed Adder Based on BCLA
  - 8.3 Conclusion
- 9 1 GHz Operation RISC Micro-Computer
  - 9.1 Introduction
  - 9.2 Architecture of SOI RISC CPU
  - 9.3 Conclusion
- 10 Conclusions

## Chapter 1

## Introduction

## 1.1 Brief History of SOI MOSFETs

Device sizes in integrated circuits have been reduced in pursuit of both high speed and high density. In particular, intensive development effort by large companies on high-density DRAM has driven a year-by-year reduction in the gate length of MOSFETs, which are now the most widely used devices in semiconductor integrated circuits, as shown in Fig. 1.1. Although size reduction has improved the density and performance of integrated circuits, it has also caused problems that were previously negligible.

The major problems accompanying size reduction are (1) degradation by hot carriers and (2) short-channel effects. The former is conventionally addressed by so-called drain engineering, such as LDD (Lightly Doped Drain) technology, or by supply voltage reduction for deep-submicron gate circuits. To suppress the latter, it is effective to make the source and drain junctions shallow while increasing the substrate doping density. However, reducing the junction depth in bulk MOSFETs or increasing the doping density reduces the breakdown voltage and degrades the carrier mobility.

To solve these problems, a fully depleted SOI (Silicon On Insulator) technology was proposed, which made it possible to fabricate short-channel MOSFETs[1.2]. MOSFETs with an SOI structure had existed before the fully depleted type was proposed, but they were aimed not at suppressing short-channel effects but at three-dimensional integrated circuits or high-speed, high-voltage devices. The SOI layer on which those MOSFETs were made was therefore relatively thick, so that they not only inherited the problems of short-channel bulk MOSFETs but also suffered from problems particular to thick SOI MOSFETs, such as the kink effect shown in Fig. 1.2 (a)[1.3]. In contrast, the reported fully depleted MOSFETs, in which the SOI layer is thinned enough to be fully depleted, are effective in controlling the kink effect, as shown in Fig. 1.2 (b). At first, the thinned SOI layer was found only to suppress punch-through by means of the buried oxide[1.2]; it was found later that the kink effect could also be controlled[1.3] and that the subthreshold characteristics could be improved to the theoretical limit[1.4]. Additionally, the substrate doping density can be reduced in SOI MOSFETs, whereas it must be kept high in bulk MOSFETs to reduce short-channel effects. The ultra-thin SOI substrate may therefore improve the carrier mobility in the inversion layer, and the buried oxide reduces the wiring capacitance, which causes delay-time degradation in large-scale integrated circuits.

Fig. 1.1 Progressive reduction of feature sizes and resolution limits in optical lithography[1.1].

Although fully depleted SOI MOSFETs are theoretically superior to conventional bulk MOSFETs, their performance depends strongly on the process technology used to form an ultra-thin single-crystal silicon layer on the buried oxide. SOI layers were conventionally grown on a sapphire substrate, so-called SOS (Silicon On Sapphire), but they were not thin enough to be depleted completely. In 1978, SIMOX (Separation by IMplanted OXygen) technology was first proposed by Izumi *et al.*[1.6]. Because the buried oxide in SIMOX is made by oxygen implantation and thermal annealing, an ultra-thin SOI layer can be made more easily than with SOS. The fabrication process flow is shown in Fig. 1.4. Recently, by reducing the amount of implanted oxygen and annealing at 1350 °C, the defect density in the SOI layer was reduced to $10^2 / \text{cm}^2$ and the quality of the interface between the SOI layer and the buried oxide was improved to almost the same level as that of an interface made by thermal oxidation[1.7]. A cross section of an nMOSFET fabricated on a SIMOX substrate is shown in Fig. 1.3. Ultra-thin SOI films thinner than 1000 Å are now available over areas large enough for integrated circuits to be fabricated using SIMOX technology. For example, a ring oscillator fabricated with 0.29-$\mu$m-gate SIMOX/SOI MOSFETs showed a delay time of 21.5 ps[1.8].

Fig. 1.3 Cross-section of a SIMOX MOS transistor[1.5].

## 1.2 Target and Outline of This Dissertation

The target of this study is to provide guidelines for the optimum design of CMOS circuits using ultra-short-channel MOSFETs with gate lengths near 0.1 $\mu$m. SOI MOSFETs were used in this study because their fabrication technology is currently the most advanced for ultra-short-channel MOSFETs. The results, however, are intended to apply not only to SOI MOSFETs but also to any type of ultra-short-channel MOSFET, including bulk ones, once their fabrication technology advances.

The studies described in this dissertation are summarized in Fig. 1.5. They are described from the device level, shown at the bottom of the pyramid in Fig. 1.5, to the system level, shown at the top. First, the modeling of drain current and delay time is described. In Chapter 2, a precise drain current model for ultra-short-channel MOSFETs is proposed by showing the difference in drain current characteristics between long- and short-channel MOSFETs. Chapter 3 describes the delay-time modeling of a CMOS inverter, which is the basis of integrated circuit design, and Chapter 4 then clarifies the essential parameters that determine circuit performance, such as power dissipation and delay time. In Chapter 5, a speed optimization method for large-scale integrated circuits fabricated on an SOI substrate is proposed using a simple gate-width optimization theory. Examples of SOI CMOS circuits that operate at very low power and at high speed are described in Chapters 6 and 7, respectively, taking the frequency divider as a basic circuit. In Chapter 8, a new high-speed carry generation algorithm is proposed for a high-speed adder. Using the results of Chapters 2 to 8, the design of a RISC microprocessor is reported briefly as a practical example, to show the feasibility of applying ultra-short-channel SOI MOSFETs to large-scale integrated circuits. Finally, the studies on ultra-short-channel CMOS circuits are discussed and concluded.

Fig. 1.5 Outline of this dissertation.

## References

- [1.1] T. Masuhara, K. Itoh, K. Seki and K. Sasaki, "VLSI Memories: Present Status and Future Prospects," *IEICE Trans.*, vol. E 74, no. 1, pp. 130–141, Jan., 1991.
- [1.2] S. D. S. Malhi, H. W. Lam, R. F. Pinizzoto, A. H. Hamdi and F. D. MacDaniel, "Novel SOI CMOS design using ultra thin near intrinsic substrate," *IEDM Tech. Dig.*, pp. 107–110, Dec., 1982.

- [1.3] J. P. Colinge, "Reduction of floating substrate effect in thin-film SOI MOSFETs," *Electron. Lett.*, vol. 22, no. 4, pp. 187–188, Feb., 1986.
- [1.4] J. P. Colinge, "Subthreshold slope of thin-film SOI MOSFET's," *IEEE Electron Device Lett.*, vol. EDL-7, no. 4, pp. 244–246, Apr., 1986.
- [1.5] N. Ieda, "Technology Trends in ASIC," *IEICE Trans.*, vol. E 74, no. 1, pp. 148–156, Jan., 1991.
- [1.6] K. Izumi, M. Doken and H. Ariyoshi, "C.M.O.S. devices fabricated on buried SiO<sub>2</sub> layers formed by oxygen implantation into silicon," *Electron. Lett.*, vol. 14, no. 18, pp. 593–594, Aug., 1978.
- [1.7] S. Nakashima and K. Izumi, "Practical reduction of dislocation density in SIMOX wafers," *Electron. Lett.*, vol. 26, no. 20, pp. 1647–1649, 1990.
- [1.8] H. Miki, T. Ohmameuda, M. Kumon, K. Asada, T. Sugano, Y. Ohmura and K. Izumi, "Subfemtojoule deep submicrometer-gate CMOS built in ultra-thin Si film on SIMOX substrate," *IEEE Trans. on Electron Devices*, vol. 38, no. 2, pp. 373–377, Feb., 1991.

## Part I

Characterization of Subquarter-micrometer-gate MOSFETs

## Chapter 2

## **Drain Current Characterization**

#### Abstract

A new drain current model for short-channel MOSFETs, named the non-pinch-off model, is proposed. In this model, the effect of the horizontal electric field is precisely taken into account in solving the two-dimensional Poisson's equation. The pinch-off point, where the horizontal electric field tends to infinity in the conventional gradual-channel approximation, disappears in the non-pinch-off model, so that the linear and saturation regions are smoothly connected. As a result, the ambiguity of the boundary between the linear and saturation regions in a short-channel MOSFET can be understood using a single equation for the drain current.

## Notation

 $t_{CH}$ : effective surface-channel thickness.

- x: distance from source edge.
- $C_{ox}$  : gate-oxide capacitance.
- $F_B$ : the coefficient for short channel effect.

 $I_D$ : drain current.

 $I_{D0}$ : drain current calculated using gradual-channel approximation.

L: gate length.

W: gate width.

Q(x): total surface charge sheet density.

 $Q_t(x)$ : surface charge sheet density corresponding to longitudinal field difference.

 $V_{TH}$ : threshold voltage.

 $V_{GS}$  : gate voltage.

 $\phi(x)$  : surface potential.

 $\varepsilon_{Si}$ : permittivity of silicon.

 $\mu_0$ : carrier mobility on low field.

 $\mu_s$ : carrier mobility taking into account transverse field dependence.

 $\mu_{eff}(x)$ : effective mobility; velocity saturation and transverse field dependence are taken into account.

 $\theta$ : coefficient of longitudinal field effect.

## 2.1 Introduction

Models for the current-voltage characteristics of MOSFETs, such as the gradual-channel approximation[2.1], were conventionally derived assuming that the boundary between the linear and saturation regions is clear[2.2][2.3]. This is reasonable when a MOSFET in the saturation region can be regarded as a constant current source, with channel-length modulation taken into account as modulation of the distance from the pinch-off point to the drain edge, as shown in Fig. 2.1. However, this boundary tends to be ambiguous in deep-submicron MOSFETs, as shown in Fig. 2.2. In reality, the surface charge never physically disappears, even at a pinch-off point. Moreover, the concept of channel-length modulation loses its physical meaning for MOSFETs on SOI substrates with nearly undoped channels[2.4].

Fig. 2.2 Comparison of *I-V* characteristics and drain conductance between (a) 0.1 $\mu$m gate length for short channel and (b) 0.8 $\mu$m gate length for long channel. Gate voltage is 0.5 V.

Fig. 2.3 The cross-sectional view of surface carrier in the non-pinch-off model. The effective surface carrier thickness, $t_{CH}$, is introduced in the new model.

In this chapter, a novel non-pinch-off gradual-channel model is proposed. It takes into account the gradient of the horizontal electric field, which is neglected in the conventional model, so that the surface charge does not disappear and the effective channel-length modulation factor of the conventional models is automatically included.

## 2.2 Non-pinch-off Model

In the non-pinch-off gradual-channel model, an effective surface-channel thickness is introduced as  $t_{CH}$  as shown in Fig. 2.3. Surface charge sheet density Q(x) is then derived as

$$Q(x) = -\left(\varepsilon_{\rm Si} t_{CH} \frac{\mathrm{d}^2 \phi(x)}{\mathrm{d}x^2} + Q_t(x)\right),\tag{2.1}$$

where

$$Q_t(x) \equiv C_{ox}(V_{GS} - V_{TH} - (1 + F_B)\phi(x)). \tag{2.2}$$

The first term on the right side of (2.1) is the charge sheet density corresponding to the gradient of the electric field parallel to the channel, while $Q_t(x)$ is the surface charge of the conventional gradual-channel model. Note that $Q_t(x)$ can be negative in the pinch-off region in the conventional sense. When the diffusion current is negligible, the drain current $I_D$ is given as

$$I_D = -WvQ(x), \qquad (2.3)$$

where v is carrier velocity given as

$$v = \frac{\mu_s \frac{\mathrm{d}\phi(x)}{\mathrm{d}x}}{1 + \frac{\mu_s}{v_{sat}} \frac{\mathrm{d}\phi(x)}{\mathrm{d}x}}. \tag{2.4}$$

Substituting (2.1) into (2.3) and integrating with respect to x, we get the following equation:

$$I_D = \frac{1}{2} \varepsilon_{\rm Si} \mu_{eff}(x) t_{CH} \frac{W}{x} \left\{ \left( \frac{\mathrm{d}\phi(x)}{\mathrm{d}x} \right)^2 - \left( \frac{\mathrm{d}\phi}{\mathrm{d}x} \right)_{x=0}^2 \right\} + I_{D0}, \tag{2.5}$$

where  $I_{D0}$  is the expression of drain current in linear region in the conventional gradual-channel model given as

$$I_{D0} \equiv \mu_{eff}(x) C_{ox} \frac{W}{x} \left( V_{GS} - V_{TH} - (1 + F_B) \frac{\phi(x)}{2} \right) \phi(x), \tag{2.6}$$

$\left.\frac{\mathrm{d}\phi}{\mathrm{d}x}\right|_{x=0}$ is the electric field at the source edge, obtained by assuming that the second derivative of the surface potential at the source edge is zero so that the field there equals that of the gradual-channel model:

$$\left. \frac{\mathrm{d}\phi}{\mathrm{d}x} \right|_{x=0} = \frac{I_D}{\mu_s C_{ox} W(V_{GS} - V_{TH}) - \frac{\mu_s}{v_{sat}} I_D},\tag{2.7}$$



Fig. 2.4 Comparison of *I-V* characteristics between non-pinch-off model and measurement of (a) an nMOSFET and (b) a pMOSFET. Dotted lines are measured data. Gate length is 0.1 $\mu$m. 11 Å is employed for $t_{CH}$.

and the effective mobility,  $\mu_{eff}(x)$ , is given as

$$\mu_{eff}(x) = \frac{\mu_s}{1 + \frac{\mu_s}{v_{sat}} \frac{\phi(x)}{x}}. \tag{2.8}$$
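The source-edge field (2.7) can be cross-checked against the velocity relation (2.4): at the source edge $\phi = 0$ and the second derivative is assumed to be zero, so (2.1) to (2.3) reduce to $I_D = W v\, C_{ox}(V_{GS} - V_{TH})$. The following minimal Python sketch evaluates (2.7) for a given drain current and then recovers that current from (2.4). All numerical values (mobility, saturation velocity, oxide capacitance, bias and current) are illustrative assumptions, not parameters extracted in this study.

```python
def carrier_velocity(field, mu_s, v_sat):
    """Carrier velocity of Eq. (2.4); field is d(phi)/dx in V/m."""
    return mu_s * field / (1.0 + mu_s * field / v_sat)


def source_edge_field(i_d, mu_s, v_sat, c_ox, width, v_gs, v_th):
    """Electric field at the source edge, Eq. (2.7)."""
    return i_d / (mu_s * c_ox * width * (v_gs - v_th) - (mu_s / v_sat) * i_d)


# Illustrative values only (assumptions of this sketch).
MU_S = 0.03         # mobility including transverse-field effect [m^2/(V s)]
V_SAT = 1.0e5       # saturation velocity [m/s]
C_OX = 4.9e-3       # gate-oxide capacitance per unit area [F/m^2]
WIDTH = 16e-6       # gate width [m]
V_GS, V_TH = 0.5, 0.2   # gate voltage and threshold voltage [V]
I_D = 1.0e-4        # assumed drain current [A]

e_source = source_edge_field(I_D, MU_S, V_SAT, C_OX, WIDTH, V_GS, V_TH)

# With phi(0) = 0 and zero second derivative, I_D = W * v * C_ox * (V_GS - V_TH).
i_back = WIDTH * carrier_velocity(e_source, MU_S, V_SAT) * C_OX * (V_GS - V_TH)

print(f"source-edge field = {e_source:.3e} V/m")
print(f"recovered current = {i_back:.3e} A")
```

The recovered current agrees with the assumed one, which confirms that (2.7) is the inverse of (2.4) combined with the gradual-channel charge at the source edge.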

(2.5) is rewritten as:

$$\left(\frac{\mathrm{d}\phi(x)}{\mathrm{d}x}\right)^2 = \frac{2(I_D - I_{D0})x}{\varepsilon_{\mathrm{Si}}\mu_{eff}(x)t_{CH}W} + \left(\frac{\mathrm{d}\phi}{\mathrm{d}x}\Big|_{x=0}\right)^2. \tag{2.9}$$

In this study, the above equation was solved numerically. An example of current-voltage characteristics of a 0.1- $\mu$ m-gate-length SOI/MOSFET is shown in Fig. 2.4, where the effective surface channel thickness is assumed as 11 Å[2.5] and  $\mu_s$  is approximated as

$$\mu_s = \frac{\mu_0}{1 + \theta(V_{GS} - V_{TH})},\tag{2.10}$$

which is independent of the position x.

Fig. 2.5 Comparison of drain conductance characteristics between (a) SPICE level 3 for conventional model and (b) non-pinch-off model. Dots are measured data. Gate length is 0.1 $\mu$m.

Fig. 2.6 *I-V* characteristics and drain conductance for various effective surface-channel thickness. Gate length is 0.1 $\mu$m. Gate voltage is 0.5 V.

Drain conductance characteristics are shown in Fig. 2.5, together with the results obtained by the level-3 model of SPICE3. The results obtained by our model show better agreement with the measured data[2.1] than the conventional model, because the I-V characteristics are expressed by a single equation for both the *linear* and *saturation* regions. Note that $I_D$ in (2.5) increases monotonically even when $I_{D0}$ decreases in the *saturation region*, because the first term in (2.5) compensates for the decrease of $I_{D0}$. When $t_{CH}$ goes to zero, $I_D$ in (2.5) tends to $I_{D0}$ in the *linear region* and to a constant in the *saturation region*, as shown in Fig. 2.6.
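The decrease of $I_{D0}$ mentioned above can be illustrated numerically. The following Python sketch evaluates (2.6) at the drain edge ($x = L$, $\phi = V_{DS}$) using (2.8) and (2.10), and locates the drain voltage at which $I_{D0}$ peaks before it starts to fall. The velocity term in (2.8) is taken here to be the saturation velocity $v_{sat}$ of (2.4), which is an assumption of this sketch, and all parameter values ($\mu_0$, $\theta$, $F_B$, $C_{ox}$ and the bias) are illustrative rather than fitted.

```python
# Illustrative parameters (assumptions of this sketch, not extracted values).
MU_0 = 0.04        # low-field mobility [m^2/(V s)]
THETA = 1.0        # mobility degradation coefficient of Eq. (2.10) [1/V]
V_SAT = 1.0e5      # saturation velocity [m/s]
C_OX = 4.9e-3      # gate-oxide capacitance per unit area [F/m^2]
WIDTH = 16e-6      # gate width [m]
LENGTH = 0.1e-6    # gate length [m]
V_GS, V_TH = 0.5, 0.2   # gate voltage and threshold voltage [V]
F_B = 0.1          # short-channel coefficient


def mobility_transverse():
    """Eq. (2.10): mobility including the gate-voltage dependence."""
    return MU_0 / (1.0 + THETA * (V_GS - V_TH))


def mobility_effective(phi, x, v_sat):
    """Eq. (2.8): effective mobility including velocity saturation."""
    mu_s = mobility_transverse()
    return mu_s / (1.0 + (mu_s / v_sat) * phi / x)


def drain_current_gca(phi, x, v_sat):
    """Eq. (2.6): gradual-channel drain current I_D0."""
    mu_eff = mobility_effective(phi, x, v_sat)
    return mu_eff * C_OX * (WIDTH / x) * (V_GS - V_TH - (1.0 + F_B) * phi / 2.0) * phi


def peak_drain_voltage(v_sat, v_max=0.5, n_points=1001):
    """Scan V_DS on a uniform grid and return (V_DS, I_D0) at the maximum of I_D0."""
    best_v, best_i = 0.0, 0.0
    for k in range(n_points):
        v_ds = v_max * k / (n_points - 1)
        i_d0 = drain_current_gca(v_ds, LENGTH, v_sat)
        if i_d0 > best_i:
            best_v, best_i = v_ds, i_d0
    return best_v, best_i


v_peak, i_peak = peak_drain_voltage(V_SAT)
i_end = drain_current_gca(0.5, LENGTH, V_SAT)
print(f"I_D0 peaks at V_DS = {v_peak:.3f} V, I_D0 = {i_peak:.3e} A")
print(f"I_D0 at V_DS = 0.5 V: {i_end:.3e} A")
```

Without velocity saturation the peak lies at $(V_{GS} - V_{TH})/(1 + F_B)$; with it, the peak moves to a lower drain voltage. Beyond the peak, $I_{D0}$ alone would predict a falling current, which is the decrease that the first term of (2.5) compensates.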

## 2.3 Conclusion

A non-pinch-off model that takes the horizontal electric field into account was proposed for the drain current of short-channel MOSFETs. It showed better agreement with the measured data of a 0.1-$\mu$m-gate MOSFET than a conventional model. Equation (2.9) will be a good starting point for establishing an explicit form, which is desired in deep-submicron MOSFET modeling.

## References

- [2.1] W. Shockley, "A unipolar 'field-effect' transistor," *Proc. IRE*, vol. 40, pp. 1365–1376, Aug. 1952.
- [2.2] H. Masuda, J. Mano, R. Ikematsu, H. Sugihara and Y. Aoki, "A submicrometer MOS transistor *I-V* model for circuit simulation," *IEEE Trans. on Computer-Aided Design*, vol. CAD-10, no. 2, pp. 161–170, Feb. 1991.
- [2.3] B. Moon, C. Park, K. Lee and M. Shur, "New short-channel n-MOSFET current-voltage model in strong inversion and unified parameter extraction method," *IEEE Trans. on Electron Devices*, vol. ED-38, no. 3, pp. 592–602, Mar. 1991.

- [2.4] K. Throngnumchai, K. Asada, and T. Sugano, "Modeling of 0.1-μm MOSFET on SOI structure using monte carlo simulation technique," *IEEE Trans. on Electron Devices*, vol. ED-33, no. 7, pp. 1005–1011, July, 1986.
- [2.5] C. Park, C. Lee, K. Lee, B. Moon, Y. Byun and M. Shur, "A unified current-voltage model for long-channel nMOSFET's," *IEEE Trans. on Electron Devices*, vol. ED-38, no. 2, pp. 399–406, Feb. 1991.
- [2.6] Y. Omura, S. Nakashima, K. Izumi and T. Ishii, "0.1-µm-gate, ultrathin-film CMOS devices using SIMOX substrate with 80-nm-thick buried oxide layer," in *IEDM Tech. Dig.*, pp. 675–678, 1991.

## Chapter 3

## **Delay Time Characterization**

#### Abstract

The dynamic performance of ultra-thin SIMOX (Separation by IMplanted OXygen) CMOS circuits has been studied using ring oscillators. A novel concept of current-delay product, along with an equivalent linear resistance of MOSFETs, is applied to derive the effective load capacitance of CMOS circuits with gate lengths near 0.1 $\mu$m. Calculation results showed quantitative agreement with measurement data. It was found that the gate-fringing capacitance limits the delay time for gate lengths under 0.2 $\mu$m. The lower bound of the power-delay product of SIMOX/SOI is expected to be as low as 0.2 fJ for a gate length of 0.15 $\mu$m at a supply voltage of 1.5 V.

## 3.1 Introduction

Low-power operation is one of the serious problems in future VLSIs, and SOI (Silicon On Insulator) is expected to be a promising technology for addressing it. Its advantages derive from the low effective capacitance in circuits. Evaluating the effective load capacitance of logic circuits fabricated on SOI substrates is, however, more complex than for conventional bulk circuits[3.1], because the devices, including MOSFETs and wiring, are placed on an insulator film, which is in turn placed on a semiconductor substrate. The capacitance of a drain node and wiring in bulk circuits can be easily evaluated using the conventional junction capacitance model and the parallel-plate capacitance model[3.2], because bulk devices are fabricated on a highly doped substrate, where the effect of the depletion layer beneath the field oxide is relatively small. On the other hand, ultra-thin SOI devices do not need a highly doped substrate, because the insulator film frees them from the punch-through problem of the channel region[3.3]. A lightly doped or undoped, high-resistance substrate is therefore frequently used, in the expectation of both high speed and low power dissipation owing to its low stray capacitance[3.4]. Such a lightly doped substrate, however, is sensitive to the device fabrication process in terms of the impurity profile around the interface between the buried oxide and the substrate, which makes it difficult to determine the effective depletion layer under the buried oxide.

Fig. 3.1 nMOS pull-down circuit.

In this chapter, a method for evaluating the effective load capacitance is described, based on the analysis of the dynamic operation of CMOS circuits. In the following sections, after the derivation of the effective load capacitance is described, the model is compared with measured data from ring oscillators fabricated on SOI/SIMOX substrates.

## 3.2 Analytical Modeling of Ring Oscillators

#### 3.2.1 Delay Time Model of CMOS Inverter Using Equivalent Linear Resistance

In this section, the equivalent linear resistance of a MOSFET is derived for an nMOS pull-down circuit with load capacitance $C_L$, as shown in Fig. 3.1. Let the source-drain voltage of the nMOSFET be $v_{DS}$. The drain current, $i_{DS}$, is given as[3.5]:

$$i_{DS} = -C_L \frac{\mathrm{d}v_{DS}}{\mathrm{d}t}.\tag{3.1}$$

When the drain voltage falls from $V_0$ to $V_1$, the transition time, t, is given by

$$t = C_L \int_{V_1}^{V_0} \frac{\mathrm{d}v_{DS}}{i_{DS}}. \tag{3.2}$$

Defining the equivalent linear resistance as

$$R_{eff} \equiv \int_{V_1}^{V_0} \frac{\mathrm{d}v_{DS}}{i_{DS}} \quad \text{, for step input,} \tag{3.3}$$

(3.2) is simply written as:

$$t = C_L R_{eff}. \tag{3.4}$$

Note that the equivalent linear resistance $R_{eff}$ can be calculated directly from the measured I-V characteristics of a MOSFET by numerical integration.
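The numerical integration of (3.3) can be sketched as follows. This Python example is illustrative only: it applies a simple trapezoidal rule to sampled $(v_{DS}, i_{DS})$ pairs and, in place of measured data, uses an ideal linear resistor (an assumption made so that the result can be compared with the closed form $R \ln(V_0/V_1)$). The function name, the sample grid and the load capacitance value are hypothetical.

```python
import math


def equivalent_linear_resistance(v_ds, i_ds):
    """Trapezoidal estimate of the integral of dv_DS / i_DS in Eq. (3.3).

    v_ds must be sorted in ascending order from V_1 to V_0, and i_ds holds
    the (non-zero) drain current sampled at each of those voltages.
    """
    if len(v_ds) != len(i_ds) or len(v_ds) < 2:
        raise ValueError("need two equally long sequences with at least two samples")
    r_eff = 0.0
    for k in range(1, len(v_ds)):
        dv = v_ds[k] - v_ds[k - 1]
        r_eff += 0.5 * (1.0 / i_ds[k] + 1.0 / i_ds[k - 1]) * dv
    return r_eff


# Stand-in for measured data: an ideal resistor, i_DS = v_DS / R (assumption).
R_TEST = 1.0e3            # [ohm]
V_1, V_0 = 0.15, 1.5      # final and initial drain voltage [V]
N_SAMPLES = 2001
voltages = [V_1 + (V_0 - V_1) * k / (N_SAMPLES - 1) for k in range(N_SAMPLES)]
currents = [v / R_TEST for v in voltages]

r_eff = equivalent_linear_resistance(voltages, currents)
r_exact = R_TEST * math.log(V_0 / V_1)   # closed form for a linear resistor

C_LOAD = 10e-15                          # hypothetical load capacitance [F]
transition_time = C_LOAD * r_eff         # Eq. (3.4)

print(f"R_eff (numerical) = {r_eff:.2f} ohm, closed form = {r_exact:.2f} ohm")
print(f"transition time   = {transition_time:.3e} s")
```

For the ideal resistor the result reproduces the familiar RC discharge time $R C_L \ln(V_0/V_1)$. With measured MOSFET data, the same function would be applied to the sampled I-V curve at the gate voltage of interest.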

The input signal to inverters in the real ring oscillator is not a step function. The delay time  $t_{pd}$  for the inverter in a ring oscillator is, however, empirically known to be given as the following equation[3.8]:

$$t_{pd} = \frac{5}{8}(t_{09} - t_{01}), \qquad (3.5)$$

where $(t_{09} - t_{01})$ is the 10 % to 90 % transition time of the inverter output for a step input. By applying this equation to (3.3), the delay time $t_{pd}$ is given as

$$t_{pd} = C_L R_D, \tag{3.6}$$

where  $R_D$  is an equivalent linear resistance for a ring oscillator defined as follows:

$$R_D \equiv \frac{5}{8} \int_{0.1V_{DD}}^{0.9V_{DD}} \frac{\mathrm{d}v_{DS}}{i_{DS}} \quad \text{for a ring oscillator.} \tag{3.7}$$
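As a quick sanity check of (3.7), consider the idealized case in which the pull-down element is an ideal linear resistor $R$, so that $i_{DS} = v_{DS}/R$. This case is an assumption made only for illustration and is not a device measured in this study:

$$R_D = \frac{5}{8} \int_{0.1V_{DD}}^{0.9V_{DD}} \frac{R\,\mathrm{d}v_{DS}}{v_{DS}} = \frac{5}{8} R \ln 9 \approx 1.37\,R, \qquad t_{pd} = C_L R_D \approx 1.37\,R\,C_L.$$

For a linear resistor $R_D$ is independent of $V_{DD}$. For a real MOSFET the integrand depends on the shape of the I-V curve, so $R_D$ generally varies with the supply voltage.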

Although it was reported that the heating effect might degrade the drain current statically[3.9], the error of the dynamic delay-time evaluation using (3.7) is expected to be small for the following two reasons:

- The heating effect is small when the buried oxide is thin. A buried oxide of 800 Å was used in this study.
- The equivalent linear resistance is calculated by integrating the *reciprocal* of the drain current from low to high drain voltage, so the effect of drain-current degradation, which occurs only in the high drain-voltage, high-current region, is small.

For thick buried oxide, a kind of pulsed I-V measurement will be needed. Note, however, that the difference between the measured delay time and the value calculated from (3.6) and (3.7), caused by the heating effect with thick buried oxide, can also be evaluated from the current dissipation of a ring oscillator, as discussed in the next section.

#### 3.2.2 Relation Between the Equivalent Linear Resistance and Current Dissipation of a Ring Oscillator

This section discusses the total current dissipation of the ring oscillator shown in Fig. 3.2.

The current dissipation consists of the static leak current and the dynamic current that charges the load capacitance, and is given as

$$I_{total} = C_L V_{DD} N f + (I_{lp} + I_{ln}) N/2, \tag{3.8}$$



Fig. 3.2 Ring oscillator circuit.

where $I_{total}$ is the total current dissipation of a ring oscillator, $C_L$ is the load capacitance, $V_{DD}$ is the supply voltage, N is the number of stages, f is the oscillation frequency, and $I_{lp}$ and $I_{ln}$ are the sub- or near-threshold leak currents of a pMOSFET and an nMOSFET, respectively, at a gate-source voltage of zero. Note that the leak current can no longer be neglected in the 0.1-$\mu$m-device regime because of the reduced threshold voltage. In (3.8), the leak current is independent of the oscillation frequency, so the dynamic charging current $I_0$ is given as:

$$I_0 = I_{total} - (I_{lp} + I_{ln})N/2 = C_L V_{DD} Nf. \tag{3.9}$$

Substituting the delay time of an inverter given as

$$t_{pd} = \frac{1}{2Nf},\tag{3.10}$$

(3.9) can be simplified as

$$I_0 = \frac{C_L V_{DD}}{2t_{pd}}. \tag{3.11}$$



Fig. 3.3 Relation between parameters of a MOSFET and a ring oscillator

Using the equivalent linear resistances $R_D^{(d)}$ of an nMOSFET for the pull-down circuit and $R_D^{(u)}$ of a pMOSFET for the pull-up circuit, the delay time of an inverter averaged over rising and falling outputs is also given from (3.6) as

$$t_{pd} = \frac{C_L}{2} (R_D^{(d)} + R_D^{(u)}). \tag{3.12}$$

From this equation and (3.11), we get finally

$$I_0 = \frac{V_{DD}}{R_D^{(d)} + R_D^{(u)}}. \tag{3.13}$$

This equation means that, from the viewpoint of average power consumption, the ring oscillator can be regarded as a series connection of the resistors $R_D^{(d)}$ and $R_D^{(u)}$.

The above derivation gives the relations among the equivalent linear resistance and load capacitance of a MOSFET and the current dissipation and delay time of a ring oscillator, as shown in Fig. 3.3.
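The relations (3.9) to (3.13) can be checked numerically with a minimal sketch such as the one below. The function name and all device values (10 fF load, 3 kΩ and 5 kΩ equivalent linear resistances, 51 stages) are illustrative assumptions, not measured data from this work; the leak current is assumed to be zero.

```python
def ring_oscillator_model(c_load, v_dd, r_down, r_up, n_stages):
    """Evaluate (3.10)-(3.13) for a ring oscillator without leak current."""
    t_pd = 0.5 * c_load * (r_down + r_up)          # delay per inverter, (3.12)
    freq = 1.0 / (2.0 * n_stages * t_pd)           # oscillation frequency, (3.10)
    i_dyn_cv = c_load * v_dd * n_stages * freq     # charging current, (3.9)
    i_dyn_r = v_dd / (r_down + r_up)               # series-resistor view, (3.13)
    return t_pd, freq, i_dyn_cv, i_dyn_r


# Hypothetical values chosen only to exercise the formulas.
t_pd, freq, i_cv, i_r = ring_oscillator_model(
    c_load=10e-15, v_dd=1.5, r_down=3e3, r_up=5e3, n_stages=51
)
print(f"t_pd = {t_pd:.3e} s, f = {freq:.3e} Hz")
print(f"I_0 from (3.9) = {i_cv:.4e} A, I_0 from (3.13) = {i_r:.4e} A")
```

Both expressions for $I_0$ agree because the number of stages cancels when (3.10) is substituted into (3.9).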



Fig. 3.4 Schematic cross-sectional view of a MOSFET built with SIMOX substrate.

## 3.3 Experimental Results

#### 3.3.1 Process Parameters and Device Features

Starting substrates were nearly intrinsic, slightly p-type, (100)-oriented Si wafers. A dose of $0.4 \times 10^{18}$ /cm<sup>2</sup> of $^{16}\text{O}^+$ was implanted into the wafers with a 100-mA-class implanter. The wafers were then annealed at 1350°C. The buried oxide layers were about 80 nm thick. The SOI film was thinned to 50 nm or 30 nm by thermal oxidation and etching. The gate oxide, about 7 nm thick, was grown by dry thermal oxidation at 850°C.

Fig. 3.4 shows a schematic cross-sectional view of a MOSFET. Fig. 3.5 shows drain current  $I_{DS}$  and voltage  $V_{DS}$  characteristics of 0.12- $\mu$ m gate n-channel and p-channel MOSFETs fabricated on 50-nm-thick SOI substrates. Fig. 3.6 also shows  $I_{DS}$  and  $V_{DS}$  characteristics of 0.1- $\mu$ m gate n-channel and p-channel MOSFETs fabricated on 30-nm-thick SOI substrates. The gate width  $W_G$  is 16  $\mu$ m for both n-channel and p-channel MOSFETs. The effective linear




Fig. 3.5 Drain current-voltage,  $I_{DS}$ - $V_{DS}$ , characteristics of MOSFET built with 50-nm-thick SOI film. The gate length is 0.12  $\mu$ m. (a) n-channel MOSFET. (b) p-channel MOSFET. No substrate bias is applied. The parameter is the gate-to-source voltage,  $V_{GS}$ , in volt.



Fig. 3.6 Drain current-voltage,  $I_{DS}$ - $V_{DS}$ , characteristics of MOSFET built with 30-nm-thick SOI film. The gate length is 0.1  $\mu$ m. (a) n-channel MOSFET. (b) p-channel MOSFET. No substrate bias is applied. The parameter is the gate-to-source voltage,  $V_{GS}$ , in volt.



Fig. 3.7 Capacitance elements contributing to the total load capacitance. Intrinsic gate-oxide capacitance $C_{ox}$ is omitted in this figure.

resistance, as used in the following discussions, was numerically calculated using these  $I_{DS}-V_{DS}$ characteristics based on (3.7).

## 3.3.2 Calculation of Load Capacitance from 3-dimensional shape

Fig. 3.7 shows capacitances contributing to the total load capacitance  $C_L$ . In a ring oscillator,  $C_L$  is

$$\begin{aligned} C_L &= C_{ox}^{(n)} + C_{ox}^{(p)} + 2C_{GD}^{(n)} + 2C_{GD}^{(p)} + C_{GS}^{(n)} + C_{GS}^{(p)} + C_w \\ &= C_{ox}^{(n)} + C_{ox}^{(p)} + 3C_f^{(n)} + 3C_f^{(p)} + C_w, \end{aligned} \tag{3.14}$$

$$C_{GD}^{(n)} = C_{GS}^{(n)} = C_f^{(n)}, \tag{3.15}$$

$$C_{GD}^{(p)} = C_{GS}^{(p)} = C_{f}^{(p)}, \tag{3.16}$$

where  $C_{GD}$ ,  $C_{GS}$ ,  $C_w$ ,  $C_{ox}$  and  $C_f$  are the gate-drain capacitance, the gate-source capacitance,



Fig. 3.8 Calculation results of gate-fringing and gate-oxide capacitances based on [3.7] as a function of gate length. The gate-oxide thickness is (a) 7 nm and (b) 3.5 nm. Capacitances are normalized by gate width, $W_G$.

wiring capacitance including drain-substrate capacitance, gate-oxide capacitance and gate-fringing capacitance, respectively. Indices (n) and (p) represent nMOSFET and pMOSFET, respectively. $C_f$ is counted three times in $C_L$ for each of the nMOSFET and the pMOSFET, because a MOSFET drives its own $C_{GD}$ as well as the $C_{GD}$ and $C_{GS}$ of a MOSFET in the next stage. Details of the calculation of the fringing capacitance $C_f$ are described in Appendix A. The calculation results are shown in Fig. 3.8, where the gate-electrode height is 350 nm. Panel (a) shows the results for a gate-oxide thickness of 7 nm, the value used in this process, and panel (b) shows the results for a thinned gate oxide of 3.5 nm for comparison. The gate-fringing capacitance becomes larger than the gate-oxide capacitance for gate lengths below 0.15 $\mu$m in the case of 7-nm gate-oxide thickness, which affects the delay time of the inverter as described below. In the case of thinned gate oxide, however, the effect of the gate-fringing capacitance will be kept small even for 0.1-$\mu$m gate



Fig. 3.9 The delay time of an inverter obtained by (a) measurement of ring oscillators and (b) calculation from DC characteristics of MOSFETs built with 30-nm-thick SOI film.  $C_{box}$  represents drain-substrate capacitance calculated using parallel plate model.

devices as shown in Fig. 3.8 (b). $C_w$ comprises both the metal-wiring stray capacitance $C_{metal}$ and the drain-substrate capacitance $C_{box}$; the former, however, is negligible in SOI circuits.
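As a minimal sketch of how (3.14) to (3.16) combine the individual capacitances, the helper below sums the terms and uses the parallel-plate model for the oxide capacitances. The function names, the relative permittivity of 3.9 assumed for the oxide, and all numerical inputs (including the fringing and drain-substrate values) are illustrative assumptions; the fringing capacitance itself would come from the model in Appendix A.

```python
EPS0 = 8.854e-12  # permittivity of free space in F/m


def parallel_plate(eps_r, area, thickness):
    """Parallel-plate capacitance in farads (area in m^2, thickness in m)."""
    return EPS0 * eps_r * area / thickness


def load_capacitance(c_ox_n, c_ox_p, c_f_n, c_f_p, c_w):
    """Total load capacitance of one ring-oscillator stage, following (3.14)."""
    return c_ox_n + c_ox_p + 3.0 * c_f_n + 3.0 * c_f_p + c_w


# Hypothetical geometry: 0.1-um gate length, 16-um gate width, 7-nm gate oxide.
c_ox = parallel_plate(3.9, 0.1e-6 * 16e-6, 7e-9)
# Hypothetical fringing and drain-substrate capacitances in farads.
c_total = load_capacitance(c_ox, c_ox, 5e-15, 5e-15, 4e-15)
print(f"C_ox = {c_ox:.3e} F, C_L = {c_total:.3e} F")
```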

#### 3.3.3 Comparison with Calculation and Measurement

Calculated and measured delay times for circuits built with 30-nm-thick SOI film are shown in Fig. 3.9 for comparison. Calculation results with and without the drain-substrate capacitance, $C_{box}$, which is used as the wiring capacitance in (3.14) and is obtained from the parallel-plate model for the buried oxide, are shown in Fig. 3.9 (b). The results with $C_{box}$ agree better with measurement than those without $C_{box}$, which implies that the depletion layer under the buried oxide in the MOSFET region can differ from that in other regions, such as the metal wiring region, and that the substrate capacitance may not be negligible even for a lightly doped substrate. It is



Fig. 3.10 Threshold voltage of MOSFETs built with 30-nm-thick SOI film.

noted that the measured delay time is a little larger than the calculated one in the case of the 0.1-$\mu$m gate, because the internal signal swing is reduced by leak current. To take the swing reduction into account in this case, the upper and lower bounds of the integration in (3.7), as well as the drain current, $i_{DS}$, should be modified to correspond to the reduced gate voltage. For the other gate lengths, however, the leak current does not affect the signal swing, despite the slightly negative threshold voltage of the nMOSFETs shown in Fig. 3.10, because the on-current is much larger than the leak current.

Calculated and measured delay times for circuits built with 50-nm-thick SOI film are also shown in Fig. 3.11 for comparison, where $C_{box}$ is considered in (3.14). The calculated delay time agrees with the measured data. It is also shown that the delay time saturates as the gate length is reduced, because $C_f$ and $C_w$ become dominant over $C_{ox}$ unless vertical scaling is carried out as well.

Conversely, the effective load capacitance has also been evaluated from delay time


Fig. 3.11 The delay time of an inverter obtained by (a) measurement of ring oscillators and (b) calculation from DC characteristics of MOSFETs built with 50-nm-thick SOI film.



Fig. 3.12 Evaluation of effective load capacitance. Triangles and circles are results calculated using (3.6) for supply voltages of 1.5 V and 2.0 V. The solid line is the result calculated from (3.14).

and equivalent linear resistance using (3.7) as shown in Fig.3.12, in which theoretical load capacitance based on (3.14) is also shown.

Current dissipation derived from (3.13) and measured data are shown in Fig. 3.13. In this figure, the difference in current dissipation between calculation and measurement gives an estimate of the D.C. leak current based on (3.8). Figure 3.14 shows the leak current calculated directly from the $I_{DS}$-$V_{DS}$ characteristics of typical MOSFETs. Comparing Figs. 3.13 and 3.14, the two estimates are in fair agreement, but the current in Fig. 3.14 is a little smaller than that in Fig. 3.13. This is reasonable, because the distribution of the nMOSFET threshold voltage near 0 V, shown in Fig. 3.10, increases the total leak current of the 51-stage inverters. Parasitic bipolar effects in dynamic operation might also enhance the leak current, although this contribution is difficult to quantify.



Fig. 3.13 The current dissipation of a ring oscillator built with 30-nm-thick SOI film at a supply voltage of (a) 1.5 V and (b) 2 V.



Fig. 3.14 Evaluation of leak current from D.C. characteristics of MOSFETs built with 30-nm-thick SOI film.

# 3.4 Discussions

### 3.4.1 Derivation of the Effective Load Capacitance

In this section, the derivation method of the effective load capacitance will be discussed, the outline of which is shown in Fig. 3.3.

First, using the delay time and the equivalent linear resistance, the effective load capacitance is given by

$$C_L = \frac{2t_{pd}}{(R_D^{(d)} + R_D^{(u)})}. \tag{3.17}$$

The effective load capacitance obtained by this equation shows good agreement with theoretical load capacitance calculated from (3.14), as shown in Fig. 3.12. This method, however, needs I-V characteristics of a typical MOSFET.

On the other hand, the effective load capacitance can also be derived from the current-delay product as

$$C_L = \frac{2t_{pd}I_0}{V_{DD}}. \tag{3.18}$$

One advantage of (3.18) is that the effective load capacitance can be obtained without using D.C. characteristics, which are usually measured on MOSFETs different from those in the ring oscillators. The static leak current of the ring oscillator, however, needs to be evaluated precisely, because $I_0$ is the dynamic charging current excluding leak. Note that the leak current estimated from D.C. characteristics can be an underestimate, as shown in Figs. 3.13 and 3.14. Direct measurement of the static leak current of an inverter array with the same number of stages as the ring oscillator is a better evaluation than that from D.C. characteristics, because it includes the effect of the leak-current distribution. Finally, the two evaluation methods of effective load capacitance are expected to give the same result when the leak current is measured precisely, which gives better confidence in the result.
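The two extraction routes, (3.17) and (3.18), can be compared with a short sketch like the following. The function names and the numerical inputs are hypothetical; the leak term is subtracted according to (3.9) before the current-delay product is formed.

```python
def c_load_from_resistance(t_pd, r_down, r_up):
    """Effective load capacitance from delay and ELR, following (3.17)."""
    return 2.0 * t_pd / (r_down + r_up)


def c_load_from_current(t_pd, i_total, i_leak_p, i_leak_n, n_stages, v_dd):
    """Effective load capacitance from the current-delay product, (3.18)."""
    # Remove the static leak contribution first, as in (3.9).
    i_dynamic = i_total - (i_leak_p + i_leak_n) * n_stages / 2.0
    return 2.0 * t_pd * i_dynamic / v_dd


# Hypothetical measurement set for a 51-stage ring oscillator at 1.5 V.
c_from_r = c_load_from_resistance(t_pd=40e-12, r_down=3e3, r_up=5e3)
c_from_i = c_load_from_current(
    t_pd=40e-12, i_total=192.6e-6, i_leak_p=1e-7, i_leak_n=1e-7,
    n_stages=51, v_dd=1.5,
)
print(f"C_L from (3.17) = {c_from_r:.3e} F")
print(f"C_L from (3.18) = {c_from_i:.3e} F")
```

With these self-consistent inputs the two values coincide; with real data, a gap between them would point to an error in the leak-current estimate, as discussed above.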

### 3.4.2 Effects of Leak Current on Power-Delay Product

It was found that the leak current was dominant in the total current in deep-submicron-CMOS circuits. In this section, power-delay product and its theoretical minimum of inverters will be discussed.

When the power dissipation of an inverter in an N-stage ring oscillator is P, the power-delay product for each inverter, E, is given as

$$E = P t_{pd} = V_{DD} I_{total} t_{pd} / N. \tag{3.19}$$

It is noted that  $I_{total}$  tends to  $I_0$  when the leak current becomes negligible, so that the theoretical minimum of power-delay product,  $E_0$  can be obtained using (3.13) as follows:

$$E_0 = V_{DD} I_0 t_{pd} / N = \frac{V_{DD}^2 t_{pd}}{N(R_D^{(d)} + R_D^{(u)})}. \tag{3.20}$$

The power-delay product obtained from measured data, together with the lower bound of the power-delay product from (3.20), is shown in Fig. 3.15. Although the measured power-delay product of an inverter with 0.15- to 0.25-$\mu$m gate length was under 1 fJ, which is similar to the results reported in [3.4], it is expected to be reduced to 0.2 fJ using SOI inverters with gate lengths below 0.15 $\mu$m at a supply voltage of 1.5 V.
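A minimal sketch of the comparison between the measured power-delay product (3.19) and its lower bound (3.20) is given below. The function names and input values are assumptions for illustration only and do not reproduce the data of Fig. 3.15.

```python
def power_delay_product(v_dd, i_total, t_pd, n_stages):
    """Power-delay product per inverter from total current, following (3.19)."""
    return v_dd * i_total * t_pd / n_stages


def power_delay_lower_bound(v_dd, t_pd, r_down, r_up, n_stages):
    """Lower bound when leak current is negligible, following (3.20)."""
    return v_dd ** 2 * t_pd / (n_stages * (r_down + r_up))


# Hypothetical 51-stage ring oscillator with 5.1 uA of total leak current.
e_measured = power_delay_product(1.5, 192.6e-6, 40e-12, 51)
e_bound = power_delay_lower_bound(1.5, 40e-12, 3e3, 5e3, 51)
print(f"E = {e_measured:.3e} J, E_0 = {e_bound:.3e} J")
```

The gap between the two values is the energy attributable to static leak current.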

# 3.5 Conclusions

Dynamic performance of ultra-thin SIMOX/CMOS circuits has been studied. An effective load capacitance in dynamic operation has been successfully derived from a novel concept of current-delay product of a ring oscillator. Results obtained from this study are;

1. Delay time of CMOS inverters can be accurately estimated from the product of load capacitance and equivalent linear resistance of MOSFETs, the latter of which can be



Fig. 3.15 Power-delay product obtained from measured data and lower bound of power-delay product using (3.20) at a supply voltage of (a) 1.5 V and (b) 2.0 V.

calculated from DC characteristics of MOSFETs. This fact also means that the effective load capacitance can be accurately estimated by measuring the delay time of ring oscillators, and  $I_{DS}$ - $V_{DS}$  DC characteristics of MOSFETs.

2. Current dissipation of ring oscillators contains static leak current of inverters and dynamic charging current of load capacitance; the latter current is equal to the supply voltage divided by the sum of equivalent linear resistances of an nMOSFET and a pMOSFET.
3. Finally, the effective load capacitance in dynamic operation of CMOS inverters can be derived from the delay-current product of ring oscillators without using $I_{DS}$-$V_{DS}$ DC characteristics of MOSFETs, when the static leak current is measured or estimated separately.

By applying the above results to experimental ring oscillators fabricated on SIMOX/SOI, it was shown that the gate-fringing capacitance limits the delay time for gate lengths below 0.2 $\mu$m.

# Appendix A Gate Capacitance

Capacitances were calculated taking into account the gate-fringing capacitance $C_f$ and the intrinsic gate-oxide capacitance $C_{ox}$. The parallel-plate model was employed for $C_{ox}$, and the following equations were used for $C_f$ [3.7]:

$$C_f = \frac{\epsilon_0}{\pi} [\epsilon_{ox} \{ 2 - \ln(4) + \ln(u/a) \} + \epsilon_{ni} \ln(a) ] \cdot L_G, \tag{3.21}$$

where  $\epsilon_0$ ,  $\epsilon_{ox}$  and  $\epsilon_{ni}$  are the permittivity of free space, a relative dielectric constant of the oxide and a relative dielectric constant of the nitride used for gate side-spacer, respectively.  $L_G$  is the gate length, while u and a are constants determined by the following equations:

$$a = 2K(K^2 - 1)^{1/2} + 2K^2 - 1, \text{ and} \tag{3.22}$$

$$u = (R^2 a - 1)/(R^2 - 1), \tag{3.23}$$

where K and R are calculated as follows:

$$K = 1 + t_m/t_{ox}, \text{ and} \tag{3.24}$$

$$\frac{\pi}{2} \cdot \frac{L_G}{t_{ox}} = \frac{a-1}{a^{1/2}} \cdot \frac{R}{(R^2-a)} + \ln\left(\frac{a^{1/2}R+1}{a^{1/2}R-1}\right) - \frac{a+1}{2a^{1/2}} \cdot \ln\left(\frac{R+1}{R-1}\right). \tag{3.25}$$

Here,  $t_m$  and  $t_{ox}$  are the thickness of gate poly-silicon and gate oxide respectively.

## References

- [3.1] Y. Omura and K. Izumi, "A new model of switching operation in fully depleted ultrathin-film CMOS/SIMOX," *IEEE Electron Device Lett.*, vol. 12, no. 12, pp. 655–657, Dec. 1991.
- [3.2] S. M. Sze, Physics of semiconductor devices, Wiley, 1969.
- [3.3] K. Throngnumchai, K. Asada, and T. Sugano, "Modeling of 0.1-μm MOSFET on SOI structure using Monte Carlo simulation technique," *IEEE Trans. on Electron Devices*, vol. ED-33, no. 7, pp. 1005–1011, July 1986.
- [3.4] H. Miki, T. Ohmameuda, M. Kumon, K. Asada, T. Sugano, Y. Ohmura and K. Izumi, "Subfemtojoule deep submicrometer-gate CMOS built in ultra-thin Si film on SIMOX substrates," *IEEE Trans. Electron Devices*, vol. ED-38, no. 2, pp. 373–377, Feb. 1991.
- [3.5] M. Fujishima, K. Asada, and T. Sugano, "Evaluation of delay-time degradation of lowvoltage BiCMOS based on a novel analytical delay-time modeling," *IEEE J. Solid-State Circuits*, vol. 26, no. 1, pp. 25–31, Jan. 1991.
- [3.6] Y. Omura, S. Nakashima, K. Izumi and T. Ishii, "0.1-µm-gate, ultrathin-film CMOS devices using SIMOX substrate with 80-nm-thick buried oxide layer," in *IEDM Tech. Dig.*, pp. 675–678, 1991.
- [3.7] E. Greeneich, "An analytical model for the gate capacitance of small-geometry MOS structures," *IEEE Trans. on Electron Devices*, vol. ED-30, no. 12, pp. 1838–1839, Dec. 1983.
- [3.8] T. Sugano and T. Iizuka, CMOS VLSI design, Chapter 4.3, Baifukan, 1989, (in Japanese).
- [3.9] L. J. MacDaid, S. Hall, W. Eccleston and J. C. Alderman, "Negative resistance in the output characteristics of SOI MOSFETs," *IEEE SOS/SOI Tech. Conf.*, pp. 33–34, 1989

# Chapter 4

# Test Structures for Characterization

### Abstract

A reliable characterization method is proposed for evaluating effective load capacitance, effective current drivability and effective leak current in dynamic operation. In this method, two test structures are utilized in order to make the evaluation reliable; one is an open-loop inverter array for extracting parameters and the other is a conventional closed-loop ring oscillator for confirmation. The method is easily extended for general high-speed circuits such as ECL and compound-semiconductor circuits though CMOS circuits are used in this chapter.

# 4.1 Introduction

Recent high-speed circuits, such as ECL or compound-semiconductor circuits, consume static power as well as dynamic power. Even in CMOS circuits, static leak current is not a negligible problem in the deep-submicron region because of the reduced threshold voltage required for lowered supply voltage. For these statically power-consuming circuits, conventional characterization methods were composed of DC and AC measurement using a single device and a ring oscillator, respectively. It was difficult, however, to directly relate measured data, such as delay time and current dissipation, with dynamic performance parameters, such as effective load capacitance, effective current drivability and effective leak current. In this chapter, a new characterization method of dynamic performance parameters is described using an open-loop inverter array and a single device, the results of which can be directly related with AC measurement of a ring oscillator in order to make them more reliable.

## 4.2 Characterization Method

### 4.2.1 Equivalent Linear Resistance

First, note that the delay time and average current dissipation can be derived from an effective load capacitance and an equivalent linear resistance (ELR) [4.1], assuming the leak current is negligible. The ELR in a ring oscillator is defined so that the delay time of an inverter, $t_{pd}$, is given by Eq. (3.12). When the signal swings fully from ground to the supply voltage, the ELR is given by

$$R_D^{(d)} = \frac{5}{4} \int_{0.5V_{DD}}^{V_{DD}} \frac{\mathrm{d}v_O}{i_O} \tag{4.1}$$

for a pull-down device and

$$R_D^{(u)} = \frac{5}{4} \int_0^{0.5V_{DD}} \frac{\mathrm{d}v_O}{-i_O} \tag{4.2}$$

for a pull-up device, where $v_O$ and $i_O$ are the output voltage and current, and $V_{DD}$ is the supply voltage. These equations are based on the empirical fact that the delay time, $t_{pd}$, is about 5/4 of the step-input response time of an inverter for the output voltage to change from $V_{DD}$ to 50% of $V_{DD}$. Although $t_{pd}$ was conventionally believed to be about 5/8 of the step-input response time for the output voltage to change from 10% to 90% [4.2], as given by Eq. (3.7), the validity of Eqs. (4.1) and (4.2) will be shown by measurement in a later section.

Current dissipation of a ring oscillator,  $I_0$ , is given by the following, as usual:

$$I_0 = f C_L V_{DD} N, \tag{4.3}$$




Fig. 4.1 Two evaluation methods for ELR of a CMOS pull-down circuit for the case of (a) negligible leak current and (b) considerable leak current.

where f is the oscillation frequency and N is the number of stages of the ring oscillator. Note that the parameters $C_L$ and $R_D$ can be obtained from the DC *I-V* characteristics of a single device and the delay time of a ring oscillator, based on Eqs. (3.12), (4.1) and (4.2). Namely, dynamic performance can be evaluated even with conventional test patterns when the leak current is negligible, as in long-channel CMOS circuits.

### 4.2.2 Extension for Leaky Circuits

When the leak current is considerable, the above equations should be modified as follows. Equations (4.1) and (4.2) are no longer valid for leaky circuits, because the reduced signal swing should be taken into account. In this case, the ELR is defined as an integral of the output current measured as shown in Fig. 4.1 (b), where $V_H$ and $V_L$ are the logic high and low voltages, respectively. Using the inverter shown in Fig. 4.1 (b), Eqs. (4.1) and (4.2) are modified as:

$$R_D^{(d)} = \frac{5}{4} \int_{(V_L + V_H)/2}^{V_H} \frac{\mathrm{d}v_O}{i_O} \tag{4.4}$$

for a pull-down operation and

$$R_D^{(u)} = \frac{5}{4} \int_{V_L}^{(V_L + V_H)/2} \frac{\mathrm{d}v_O}{-i_O} \tag{4.5}$$



Fig. 4.2 A proposed inverter array to evaluate dynamic performance of circuits with leak current.

for a pull-up operation.
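The integrals in Eqs. (4.4) and (4.5) can be evaluated numerically from an output *I-V* curve. The sketch below uses the trapezoidal rule and takes the output current as a function of output voltage; the function names are hypothetical, and ideal 1-kΩ resistors stand in for the measured characteristics purely as a check case, since their integral has the closed form $\frac{5}{4} R \ln 2$. Setting $V_L = 0$ and $V_H = V_{DD}$ recovers Eqs. (4.1) and (4.2).

```python
import math


def equivalent_linear_resistance(current_of_v, v_start, v_end, steps=2000):
    """Return 5/4 times the integral of dv / i(v) from v_start to v_end."""
    h = (v_end - v_start) / steps
    total = 0.0
    for k in range(steps):
        v_a = v_start + k * h
        v_b = v_a + h
        # Trapezoidal rule applied to the integrand 1 / i(v).
        total += 0.5 * h * (1.0 / current_of_v(v_a) + 1.0 / current_of_v(v_b))
    return 1.25 * total


def elr_pull_down(i_out, v_low, v_high):
    """ELR of the pull-down path, Eq. (4.4)."""
    return equivalent_linear_resistance(i_out, 0.5 * (v_low + v_high), v_high)


def elr_pull_up(i_out, v_low, v_high):
    """ELR of the pull-up path, Eq. (4.5); the output current is negated."""
    return equivalent_linear_resistance(
        lambda v: -i_out(v), v_low, 0.5 * (v_low + v_high)
    )


# Check case (an assumption, not a MOSFET): ideal resistors of 1 kOhm.
R_TEST = 1.0e3
V_DD = 2.0
r_down = elr_pull_down(lambda v: v / R_TEST, 0.0, V_DD)
r_up = elr_pull_up(lambda v: -(V_DD - v) / R_TEST, 0.0, V_DD)
print(f"R_D(d) = {r_down:.2f} ohm, R_D(u) = {r_up:.2f} ohm")
```

For measured data, `current_of_v` would be replaced by an interpolation of the sampled *I-V* points.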

A test circuit shown in Fig. 4.2 was designed to measure the above parameters. This circuit consists of an inverter array to provide  $V_L$  or  $V_H$  to the input of the last stage. Connections are shown in Fig. 4.3 for measuring *I-V* characteristics for the ELRs.

Leak current needs to be added to eq. (4.3) for current dissipation of a ring oscillator,  $I_1$ , as follows:

$$I_1 = fC_L (V_H - V_L) N + I_L N, \tag{4.6}$$

where $I_L$ is the average of the leak currents for pull-up and pull-down operation. Although the delay time and current dissipation are expressed by Eqs. (3.12) and (4.6), the leak current cannot be reliably separated from the dynamic current with a ring oscillator alone, because its oscillation frequency, f, is not controllable. These two components of current can be separated by measuring current dissipation at several frequencies using a test circuit



Fig. 4.3 The circuit diagrams to evaluate ELR of an nMOSFET with leak current. Odd number is employed for the stage number of inverter array. The circuit (a) is for pull-down operation and (b) is for pull-up operation. Illustrative I-V characteristics of the output are shown in (c) and (d), respectively. ELR for pull-up and pull-down transistors are obtained by integrating the hatched area in (c) and (d).



Fig. 4.4 The circuit configuration to separate dynamic current and leak current. An illustrative current dissipation is shown in the graph as a function of operation frequency "f".

shown in Fig. 4.4. Note that the first stage of the inverter array is added in order to provide a typical input swing, and the last stage has a separate power supply to remove the effect of parasitic capacitances due to the measurement equipment. An illustrative current dissipation as a function of input frequency is shown in Fig. 4.4. The y-axis intercept gives the static leak current, and the slope is proportional to the load capacitance, which is given by rewriting Eq. (4.6):

$$C_L = \frac{I_1 - I_L N}{f(V_H - V_L)N}. \tag{4.7}$$

Although the static leak current can be measured directly by applying DC signals to Fig. 4.4 as input, the above method is considered more reliable even for time-dependent load-capacitance[4.4] or for the case of SOI circuits, where threshold voltage can be changed depending on operation frequency by the floating body effect.
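The separation of leak current and load capacitance described above amounts to a straight-line fit of current dissipation against input frequency. A minimal sketch using an ordinary least-squares fit is shown below; the function name is hypothetical and the data points are synthetic values generated from Eq. (4.6), not measurements.

```python
def separate_leak_and_load(freqs, currents, v_high, v_low, n_stages):
    """Fit I_1 = slope * f + intercept and map the result onto Eq. (4.6)."""
    n = len(freqs)
    mean_f = sum(freqs) / n
    mean_i = sum(currents) / n
    s_ff = sum((f - mean_f) ** 2 for f in freqs)
    s_fi = sum((f - mean_f) * (i - mean_i) for f, i in zip(freqs, currents))
    slope = s_fi / s_ff                    # equals C_L (V_H - V_L) N
    intercept = mean_i - slope * mean_f    # equals I_L N
    c_load = slope / ((v_high - v_low) * n_stages)
    i_leak = intercept / n_stages
    return c_load, i_leak


# Synthetic data from assumed values: C_L = 10 fF, I_L = 0.1 uA, N = 51.
C_TRUE, I_LEAK_TRUE, N_STAGES, V_H, V_L = 10e-15, 1e-7, 51, 1.5, 0.0
freqs = [1e8, 2e8, 3e8, 4e8]
currents = [f * C_TRUE * (V_H - V_L) * N_STAGES + I_LEAK_TRUE * N_STAGES
            for f in freqs]
c_fit, i_leak_fit = separate_leak_and_load(freqs, currents, V_H, V_L, N_STAGES)
print(f"C_L = {c_fit:.3e} F, I_L = {i_leak_fit:.3e} A")
```

Fitting over several frequencies, rather than using a single point, reduces the sensitivity of the extracted $C_L$ to noise in any one measurement.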



Fig. 4.5 Summary of the characterization method described in this study.

### 4.2.3 Confirmation Method

Additionally total current dissipation of a ring oscillator,  $I_R$ , is also related to the ELR as follows[4.3]:

$$I_R = \frac{V_H - V_L}{R_D^{(u)} + R_D^{(d)}} + I_L N. \tag{4.8}$$

To make the evaluation reliable, the current dissipation from Eq. (4.8), the delay time from Eq. (3.12), and the load capacitance and leak current from Fig. 4.4 must agree well with the results obtained from a ring oscillator. The relations among these characterization methods are summarized in Fig. 4.5.



Fig. 4.6 Drain current-voltage,  $I_{DS}$ - $V_{DS}$ , characteristics of MOSFET on substrate #1. The gate length is 0.3  $\mu$ m and the gate width is 20  $\mu$ m. (a) n-channel MOSFET. (b) p-channel MOSFET. No substrate bias is applied.

## 4.3 Experimental Results

The proposed characterization method was experimentally confirmed using ultra-thin SIMOX SOI/MOSFET circuits. The thicknesses of the SOI layer, buried oxide and gate oxide were 50 nm, 85 nm and 7 nm, respectively. Two types of substrates with different threshold voltages were measured. Typical drain current-voltage characteristics for 0.3-$\mu$m gate length are shown in Fig. 4.6. Output current-voltage characteristics of an inverter array, measured with the circuit shown in Fig. 4.3, are shown in Fig. 4.7. The input-frequency dependence of the current dissipation of 0.3-$\mu$m-gate inverter arrays is shown in Fig. 4.8. Note that the y-axis intercept differs from the DC leak current, which may be caused by a threshold voltage shift due to the floating body effect. Effective load capacitances extracted from the slope of the current dissipation in Fig. 4.8 are shown in Fig. 4.9, where the gate widths of nMOSFETs and pMOSFETs are 2.1 $\mu$m



Fig. 4.7 Output current-voltage,  $I_O$ - $V_O$ , characteristics of an inverter array. (a) pull down current and (b) pull up current of substrate #1. (c) pull down current and (d) pull up current of substrate #2. The gate length is 0.3  $\mu$ m and the gate widths of nMOSFET and pMOSFET are 2.1  $\mu$ m and 3.3  $\mu$ m, respectively. No substrate bias is applied.



Fig. 4.8 Current dissipation of an inverter array as a function of input frequency on (a) substrate #1 and (b) substrate #2. The gate length is 0.3 $\mu$m and the gate widths of nMOSFET and pMOSFET are 2.1 $\mu$m and 3.3 $\mu$m, respectively. No substrate bias is applied. The current dissipation of ring oscillators at their oscillation frequency is also shown using black marks for comparison.



Fig. 4.9 Effective load capacitances for various gate lengths extracted from the slope of current dissipation in Fig. 4.8. The gate widths of nMOSFET and pMOSFET are 2.1 $\mu$m and 3.3 $\mu$m for all gate lengths, respectively.

and 3.3 $\mu$m for all gate lengths, respectively. The change of load capacitance with gate length agrees well with the change of gate-oxide capacitance. Namely, the sum of parasitic capacitances other than the gate-oxide capacitance, such as gate-fringing capacitance, drain-substrate capacitance and wiring capacitance, is independent of gate length for each substrate. The relationship between the ELRs obtained by AC and DC measurement is shown in Fig. 4.10. Here $t_{pd}$ is the delay time measured with ring oscillators. The data plotted in Fig. 4.10 range from 0.2 $\mu$m to 0.35 $\mu$m in gate length, from 1.5 V to 2.5 V in supply voltage and from about 20 ps to 100 ps in delay time. The empirical coefficient 5/4 in Eqs. (4.4) and (4.5) was chosen based on Fig. 4.10 (b). Note that the coefficient 5/4 is independent of gate length, supply voltage and substrate type. Finally, chip microphotographs of a ring oscillator and an inverter array are shown in Fig. 4.11. The details of the circuit performance and fabrication technology will be reported in the future.



Fig. 4.10 The relationship of equivalent linear resistances obtained by AC and DC measurements using (a) 10 to 90 % [4.2] and (b) 50 to 100 % of output swing. The equivalent linear resistance calculated from DC measurement is the average of pull-up and pull-down cases. $t_{pd}$ is obtained from measurement of ring oscillators.





Fig. 4.11 Chip micro photographs of (a) a ring oscillator and (b) an inverter array.

# 4.4 Conclusion

In this chapter, a characterization method of dynamic circuit performance has been proposed using effective load capacitance, effective drivability (ELR), and effective leak current as parameters. Experimental results show that the characterization of the load capacitance, the leak current and the ELR gives self-consistent results. It is also shown that the ELRs obtained from AC and DC measurement are correlated, and that this correlation is independent of gate length, supply voltage and substrate type. As a result, the proposed method is expected to be a possible standard for characterizing high-speed circuits in which the leak current is no longer negligible.

# References

- [4.1] M. Fujishima, K. Asada, and T. Sugano, "Evaluation of delay-time degradation of lowvoltage BiCMOS based on a novel analytical delay-time modeling," *IEEE J. Solid-State Circuits*, vol. 26, no. 1, pp. 25–31, Jan 1991.
- [4.2] T. Sugano and T. Iizuka, CMOS VLSI design, Chapter 4.3, Baifukan, 1989, (in Japanese).
- [4.3] M. Fujishima, M. Ikeda, K. Asada, Y. Omura and K. Izumi, "Analytical Modeling of Dynamic Performance of Deep Sub-micron SOI/SIMOX Based on Current-Delay Product," *IEICE Trans. Electron.*, vol. E75-C, no. 12, pp. 1506–1514, Dec 1992.
- [4.4] Y. Omura and K. Izumi, "A new model of switching operation in fully depleted ultrathinfilm CMOS/SIMOX," *IEEE Electron Device Lett.*, vol. 12, no. 12, pp. 655–657, Dec. 1991.

# Part II

# Design for Low-power and High-speed CMOS Circuits

# Chapter 5

# **Gate-Width Optimization**

### Abstract

A simple gate-width optimization theory is proposed for CMOS circuits. It is found that the gate-width-dependent part of the delay time should be equalized not only for an inverter array but also for circuits of general structure, under the condition of negligible wiring capacitance. This theory is especially useful for circuits within a module of an integrated circuit.

# 5.1 Introduction

Speed optimization is a serious problem in CMOS circuits because the delay time is almost proportional to the load capacitance, whereas the delay time of a BiCMOS or an ECL circuit increases little as the load capacitance increases. Many optimization methods for the gate width of a MOSFET have been proposed. These studies were either too simple for practical circuits, because the outputs and inputs of logic gates were assumed to be connected one to one [5.1][5.2], or too complex to be calculated analytically, so that they consumed large CPU time and were inconvenient for full-custom design [5.3][5.4].

In this study, a simple optimization method for the gate widths of MOSFETs is proposed under the condition of negligible wiring capacitance. It serves as a good guideline for designing a module in an integrated circuit.



Fig. 5.1 A circuit buffered by inverters without branch.

# 5.2 The Case of a Straight Circuit without Branch

The delay time of CMOS and BiCMOS gates,  $t_{pd}$  can be approximated by Af+B, where A and B are constants and f is fan-out. Af and B are named size-dependent and size-independent delay time hereafter, respectively.

When certain *m*-stage cascade logics are buffered by k-stage inverters without any branch as shown in Fig. 5.1, the total delay time, D, is described as

$$D = \sum_{i=0}^{m-1} \left( B_i + A_i \frac{w_{i+1}}{w_i} \right) + \sum_{i=m}^{m+k-1} \left( B_I + A_I \frac{w_{i+1}}{w_i} \right), \tag{5.1}$$

where $w_i$ is the gate width measured in size units, $w_{m+k}$ is the equivalent gate width whose input capacitance is equal to the load capacitance, and the subscript $I$ denotes an inverter. The optimum total delay time is given [5.2] as

$$D = \sum_{i=0}^{m+k-1} B_i + (m+k) \left( A_0 \cdots A_{m-1} \frac{w_{m+k}}{w_0} \right)^{1/(m+k)}. \tag{5.2}$$

The total number of optimum stages, n = m + k, is given as

$$n = m + k = \frac{\ln\left(\frac{A_0}{A_I} \cdots \frac{A_{m-1}}{A_I} \frac{w_{m+k}}{w_0}\right)}{\ln f_I}, \tag{5.3}$$

where  $f_I$  is approximated as

$$f_I \approx e + \frac{B_I}{1.5A_I},\tag{5.4}$$

which is independent of circuit topology. Each gate width,  $w_i$ , can be calculated as

$$w_i = \frac{w_0}{A_0} \cdots \frac{w_{i-1}}{A_{i-1}} f_I^{\ i}. \tag{5.5}$$
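For the special case in which every stage is an inverter (so that all $A_i = A_I$ and all $B_i = B_I$), the sizing procedure of (5.3) and (5.4) can be sketched as follows. This is a simplified illustration under that assumption, not the general procedure for mixed logic stages; the function name, the rounding of the stage count to an integer and the numerical values of $A_I$ and $B_I$ are all assumptions.

```python
import math


def optimum_inverter_chain(a_inv, b_inv, w_first, w_load):
    """Size an all-inverter buffer chain driving an equivalent width w_load."""
    f_opt = math.e + b_inv / (1.5 * a_inv)                   # (5.4)
    n_real = math.log(w_load / w_first) / math.log(f_opt)   # (5.3), A_i = A_I
    n_stages = max(1, round(n_real))                         # integer stage count
    # With an integer number of stages, use the fan-out that exactly
    # spans the width ratio in equal steps.
    fanout = (w_load / w_first) ** (1.0 / n_stages)
    widths = [w_first * fanout ** i for i in range(n_stages + 1)]
    delay = n_stages * (b_inv + a_inv * fanout)              # sum of A f + B
    return n_stages, fanout, widths, delay


# Hypothetical delay coefficients (seconds) and a 100x width ratio.
n_stages, fanout, widths, delay = optimum_inverter_chain(
    a_inv=10e-12, b_inv=15e-12, w_first=1.0, w_load=100.0
)
print(n_stages, round(fanout, 3), [round(w, 2) for w in widths], delay)
```

The last entry of `widths` is the equivalent load width itself, so the chain contains `n_stages` driving stages.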

# 5.3 The Case of General Circuit with Branches

### 5.3.1 Minimum Gate Width of the First Stage

In this section, the minimum gate width of the first stage is derived for a given total delay time. Suppose that D is the minimum delay time when the gate width of the first stage is fixed at W. If, conversely, the minimum gate width achieving the delay time D were smaller than W, the first assumption would be false, because the delay of the first stage could then be reduced by changing its gate width to W without changing any other delay. As a result, the minimum gate width and the minimum delay time correspond one to one. The relationship between the minimum gate width and the delay time follows from (5.3) and (5.2) as

$$D_A = \frac{A_I f_I}{\ln f_I} \ln \left( \frac{A_0}{A_I} \cdots \frac{A_{m-1}}{A_I} \frac{w_n}{w_0} \right),\tag{5.6}$$

where  $D_A$  is the total size-dependent delay time. The minimum gate width  $w_0$  is given as

$$w_0 = \frac{w_n}{(A_I f_I)^n} A_0 \cdots A_{m-1},$$
(5.7)

and from (5.3) and (5.6), the optimum number of stages, n, can be derived as

$$n = \frac{D_A}{A_I f_I}.$$
(5.8)

It is noted that this equation means that the optimum stage number for the minimum gate width is independent of the circuit topology when  $D_A$  is given.
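The relations (5.7) and (5.8) can be sketched numerically as follows. This is a minimal illustration rather than the author's procedure: the function names are hypothetical, and the product in (5.7) is taken over all n stages with $A_i = A_I$ for every inverter stage, which is an interpretation assumed here. The example uses the inverter values $A_I = 24.0$ ps and $f_I = 3.3$ listed in Table 5.1.

```python
def stages_for_delay(d_a, a_inv, f_inv):
    """Optimum stage count for a given size-dependent delay D_A, following (5.8)."""
    return d_a / (a_inv * f_inv)

def minimum_first_width(w_load, a_stages, a_inv, f_inv):
    """Minimum first-stage width, following (5.7).

    a_stages lists A_i for all n stages (A_I for each inverter), so that
    w_0 = w_n * prod(A_i) / (A_I * f_I)**n.
    """
    w0 = w_load
    for a in a_stages:
        w0 *= a / (a_inv * f_inv)
    return w0

# Example: a delay budget of four inverter-equivalent stage delays (in ps).
A_I, F_I = 24.0, 3.3
n = stages_for_delay(4 * A_I * F_I, A_I, F_I)
w0 = minimum_first_width(100.0, [A_I] * 4, A_I, F_I)
print(n, w0)
```

The sketch makes the topology independence visible: the stage count depends only on $D_A$, $A_I$ and $f_I$, while the gate coefficients enter only through the width.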



Fig. 5.2 The circuit model with one branch. The number of stages before the branch, m, is fixed. Each path is buffered by an inverter array.

### 5.3.2 The Case Including Branches

First, consider the case where a branch exists after the (m-1)-th stage, where m is fixed. When $w_i^{(j)}$ denotes the *j*-th gate width in the *i*-th stage as shown in Fig. 5.2, the optimum delay time is given as

$$D = \sum_{i=0}^{m-2} A_i \frac{w_{i+1}}{w_i} + \frac{A_{m-1}}{w_{m-1}} \sum_{j=0}^N w_m^{(j)} + \sum_{i=m}^{m+n-1} A_i^{(j)} \frac{w_{i+1}^{(j)}}{w_i^{(j)}} + \sum_{i=0}^{m+n-1} B_i,$$
(5.9)

where the size-independent term, $\sum_{i=m}^{m+n-1} B_i$, is approximated to be equal in each path. It is noted that the delay time after the *m*-th stage should be equal in each path to minimize the delay time, so that the number of stages in each path is also equal, as shown in Fig. 5.2. The minimum gate width $w_m^{(j)}$ is given as

$$w_m^{(j)} = \frac{w_{m+n}^{(j)}}{(A_I f_I)^n} A_m^{(j)} \cdots A_{m+n-1}^{(j)},$$
(5.10)

and the size-dependent delay time in each stage is equalized as

$$A_i^{(j)} \frac{w_{i+1}^{(j)}}{w_i^{(j)}} = A_I f_I.\tag{5.11}$$

Substituting (5.10) and (5.11) into (5.9) gives

$$D = \sum_{i=0}^{m-2} A_i \frac{w_{i+1}}{w_i} + \frac{A_{m-1}}{w_{m-1}(A_I f_I)^n} \sum_{j=0}^N (A_m^{(j)} \cdots A_{m+n-1}^{(j)} w_{m+n}^{(j)}) + n(A_I f_I) + \sum_{i=0}^{m+n-1} B_i.$$
(5.12)

Minimizing (5.12) with respect to $w_i$ and $(A_I f_I)$ by differentiation, the delay time is rewritten as

$$D = (m+n) \left( A_0 \cdots A_{m-1} \sum_{j=0}^N \left( A_m^{(j)} \cdots A_{m+n-1}^{(j)} \frac{w_{m+n}^{(j)}}{w_0} \right) \right)^{1/(m+n)} + \sum_{i=0}^{m+n-1} B_i.\tag{5.13}$$

This equation corresponds to (5.2), so that the optimum number of total stages  $n_{total}$  can be given by applying (5.3) as

$$n_{total} = m + n = \frac{\ln F_v}{\ln f_I},\tag{5.14}$$

where  $F_v$  is defined as a virtual fan-out in the case of having a branch given as

$$F_{v} = \left(\frac{A_{0}}{A_{I}} \cdots \frac{A_{m-1}}{A_{I}} \sum_{j=1}^{N} \left(\frac{A_{m}^{(j)}}{A_{I}} \cdots \frac{A_{m+n-1}^{(j)}}{A_{I}} \frac{w_{m+n}^{(j)}}{w_{0}}\right)\right).$$
(5.15)

Applying these derivations recursively, the optimum number of total stages of general tree-structured circuits as shown in Fig. 5.3 can be given by (5.14) using

$$F_{v} = \left(\frac{A_{0}}{A_{I}} \cdots \frac{A_{m-1}}{A_{I}} \sum_{j=1}^{N_{1}} \left(\frac{A_{m}^{(j)}}{A_{I}} \cdots \frac{A_{m+n-1}^{(j)}}{A_{I}} \right) \times \sum_{k=1}^{N_{2}} \left(\frac{A_{m+n}^{(k)}}{A_{I}} \cdots \frac{A_{m+n+o-1}^{(k)}}{A_{I}} \sum_{l=1}^{N_{3}} \cdots \frac{w_{m+n+o+\dots}^{(j)}}{w_{0}}\right) \right),\tag{5.16}$$

and the optimum size-dependent delay time in each gate is  $A_I f_I$ .
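The recursive structure of (5.15) and (5.16) can be sketched as a small tree evaluation. The `Segment` data structure and the function names below are assumptions introduced for illustration and do not appear in the source. Only logic gates need to be listed in each chain, since every inverter stage contributes a factor $A_I/A_I = 1$ to the virtual fan-out. The coefficients in the example are the $t_{pd}$ values of INV and ND2 from Table 5.1, and the load ratios are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Segment:
    """A chain of gates that ends either at a load or at a set of branches."""
    A: List[float]                       # size-dependent coefficients along the chain
    load_ratio: Optional[float] = None   # w_load / w_0 when the chain ends at a load
    branches: List["Segment"] = field(default_factory=list)

def virtual_fanout(seg, a_inv):
    """Virtual fan-out F_v of a tree, applying (5.15) recursively."""
    gain = 1.0
    for a in seg.A:
        gain *= a / a_inv
    if seg.branches:
        tail = sum(virtual_fanout(b, a_inv) for b in seg.branches)
    else:
        tail = seg.load_ratio
    return gain * tail

def total_stages(f_v, f_inv):
    """Real-valued optimum number of total stages, following (5.14)."""
    return math.log(f_v) / math.log(f_inv)

# Hypothetical tree: a 2-input NAND driving two inverter-buffered loads.
tree = Segment(A=[31.1], branches=[
    Segment(A=[24.0], load_ratio=10.0),
    Segment(A=[24.0], load_ratio=30.0),
])
f_v = virtual_fanout(tree, a_inv=24.0)
print(f_v, total_stages(f_v, f_inv=3.3))
```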







Fig. 5.4 The case where the outputs merge into the same logic.

Table 5.1 Calculated A and B for 0.2-µm CMOS circuits ($f_I = 3.3$).

| cell | $t_{pLH}$ A (ps) | $t_{pLH}$ B (ps) | $t_{pHL}$ A (ps) | $t_{pHL}$ B (ps) | $t_{pd}$ A (ps) | $t_{pd}$ B (ps) |
|------|------|------|------|------|------|------|
| INV  | 21.4 | 12.3 | 26.7 | 14.4 | 24.0 | 13.4 |
| ND2  | 27.9 | 21.8 | 34.4 | 29.4 | 31.1 | 25.6 |
| NR2  | 33.9 | 24.8 | 42.5 | 25.5 | 38.2 | 25.2 |
| ND3  | 34.0 | 30.8 | 41.6 | 44.8 | 37.8 | 37.8 |
| NR3  | 45.8 | 38.1 | 57.6 | 35.2 | 51.7 | 36.6 |

### 5.4 The Case of Merging Output to the Same Logic

When the outputs of several paths merge into the same logic as shown in Fig. 5.4, the delay time can be optimized by substituting $w_{m+n}$ for $w_{m+n}^{(j)}$ in (5.15) as

$$F_{v} = \left(\frac{A_{0}}{A_{I}} \cdots \frac{A_{m-1}}{A_{I}} \sum_{j=1}^{N} \left(\frac{A_{m}^{(j)}}{A_{I}} \cdots \frac{A_{m+n-1}^{(j)}}{A_{I}} \frac{w_{m+n}}{w_{0}}\right)\right).$$
(5.17)

The optimum size-dependent delay time is again $A_I f_I$, as in the case of tree-structured circuits. Namely, whether one output branches to different logics or several outputs merge into the same logic, the optimum size-dependent delay time of each gate is $A_I f_I$.

### 5.5 An Optimization Example

The coefficients of the size-dependent delay time, A, and of the size-independent delay time, B, assuming 0.2-$\mu$m CMOS technology are shown in Table 5.1. The 10-bit decoder circuit shown in Fig. 5.5 is employed as an optimization example. The virtual fan-out of each node for a real fan-out of 100 is also shown in Fig. 5.5. Because the critical paths run from the input of the 3-input NAND to the output, the number of stages is optimized using its virtual fan-out. An optimized result is shown in Fig. 5.6, in which the gate widths in the path from the 2-input NAND are minimized in order to reduce the load capacitance of the previous circuit. To confirm the optimum point, the dependence of delay time on gate width is shown in Fig. 5.7. It is noted that all



Fig. 5.5 Circuit of a 10-bit decoder. The virtual fan-out is also shown in this figure.

[Fig. 5.6: optimized 10-bit decoder circuit, annotated with the fan-out and gate size of each stage.]





Fig. 5.7 Delay-time dependence against gate width, which is normalized by the theoretical optimum width.

curves for the individual gates tend to be identical when the increase of delay time is proportional to the load capacitance, as in CMOS circuits.
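One way to see why the normalized curves coincide is the following sketch, which is a simplified model assumed here and not taken from the source. Consider one gate on an unbranched path whose width is scaled by a factor x from its optimum while its neighbours stay fixed. From (5.11), the delay of the preceding stage becomes $A_I f_I x$ and the delay of the gate itself becomes $A_I f_I / x$, so the added delay does not depend on the coefficient $A_i$ of the gate. The function name is hypothetical.

```python
def delay_penalty(x, a_inv, f_inv):
    """Added size-dependent delay when one gate is x times its optimum width.

    Assumes an unbranched path with all other gates kept at their optimum widths.
    """
    return a_inv * f_inv * (x + 1.0 / x - 2.0)

# Inverter values from Table 5.1 (A_I in ps, f_I dimensionless).
for x in (0.5, 1.0, 2.0):
    print(x, delay_penalty(x, a_inv=24.0, f_inv=3.3))
```

The penalty is zero at x = 1 and symmetric under x to 1/x in this model.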

# 5.6 Conclusion

A simple optimization method for CMOS digital circuits was proposed. It was found that the delay time of general logic circuits can be optimized simply by making the size-dependent delay of each gate equal to $A_I f_I$, that is, $f_I$ times the size-dependent coefficient of an inverter. Namely, the conventional e-power law can be extended to general logic circuits, because $f_I$ tends to e when $B_I$ goes to zero.

# References

- [5.1] C. Mead and L. Conway, Introduction to VLSI Systems. Reading, MA: Addison-Wesley, 1980.
- [5.2] T. Sakurai, "A unified theory for mixed CMOS/BiCMOS buffer optimization," IEEE J. Solid-State Circuits, vol. SC-27, no. 7, pp. 1014–1019, Jul. 1992.
- [5.3] J. P. Fishburn and A. E. Dunlop, "TILOS: a polynomial programming approach to transistor sizing," *Proc. Int. Conf. Computer Aided Design*, pp. 326–328, 1985.
- [5.4] Z. J. Dai and K. Asada, "MOSIZ: a two-step transistor sizing algorithm based on optimal timing assignment method for multi-stage complex gates," *IEEE Proc. Custom Integrated Circuits Conf.*, pp. 17.3.1–17.3.4, May 1989.

# Chapter 6

# Low Power Frequency Dividers

### Abstract

Four types of frequency dividers were fabricated on SIMOX/SOI (Separation by IMplanted OXygen/Silicon On Insulator) substrates. A novel circuit among these four, with gate lengths of 0.15 $\mu$m and 0.1 $\mu$m, showed the highest operation frequency of 1.2 GHz at a supply voltage of 1 V. Its power consumption was no more than 50 $\mu$W and 62 $\mu$W for the 0.15- and 0.1-$\mu$m gate designs, respectively.

# 6.1 Introduction

Currently, the short delay time of inverters built with sub-micron-gate MOSFETs/SOI encourages the use of CMOS circuits in ultra-high-speed integrated circuits as an alternative to bipolar circuits. In particular, deep-submicron CMOS/SOI circuits with small fan-out, such as frequency dividers, are expected to show high speed and very low power consumption, because the buried oxide and the high resistivity of the SOI substrate considerably reduce parasitic capacitance. The supply voltage can also be reduced in the case of near-0.1-µm-gate CMOS circuits.

Unlike the half micron SOI CMOS circuits studied in [6.3], we have used SIMOX substrates whose quality has been drastically improved by oxygen-implantation technology and high temperature annealing technology [6.4]. Four types of frequency divider circuits have been




Fig. 6.1 Drain current-voltage,  $I_{DS}$ - $V_{DS}$ , characteristics of MOSFET built in 50-nm-thick SOI film. The gate length and width are 0.12  $\mu$ m and 16.5  $\mu$ m, respectively. The gate oxide thickness is 7 nm. (a) n-channel MOSFET. (b) p-channel MOSFET. No substrate bias is applied. The gate-to-source voltage,  $V_{GS}$ , varies from 0 V to 2 V.

fabricated and measured, with gate lengths ranging from 0.1 to 0.2 µm.

# 6.2 Summary of Device Features

Frequency dividers were fabricated on SIMOX substrates with a 50-nm SOI layer, an 80-nm buried oxide and a 7-nm-thick gate oxide. Fig. 6.1 shows typical drain current $I_{DS}$ versus drain voltage $V_{DS}$ characteristics of 0.12-$\mu$m gate n-channel and p-channel MOSFETs without substrate bias. Fig. 6.2 shows the corresponding $I_{DS}$-$V_{DS}$ characteristics of 0.16-$\mu$m gate MOSFETs. The gate width $W_G$ is 16.5 $\mu$m for both n-channel and p-channel MOSFETs. As shown in Fig. 6.1 (b), a slight punch-through effect occurred in the 0.12-$\mu$m gate pMOSFET. As a result, a 0.1-$\mu$m gate frequency divider drew a larger leakage current than 0.15- or 0.2-$\mu$m




Fig. 6.2 Drain current-voltage,  $I_{DS}$ - $V_{DS}$ , characteristics of MOSFET built in 50-nm-thick SOI film. The gate length and width are 0.16  $\mu$ m and 16.5  $\mu$ m, respectively. The gate oxide thickness is 7 nm. (a) n-channel MOSFET. (b) p-channel MOSFET. No substrate bias is applied. The gate-to-source voltage,  $V_{GS}$ , varies from 0 V to 2 V.


Fig. 6.3 Four types of frequency dividers built on SIMOX substrates. (a) 8-NAND type master slave circuit. (b) Complex-gate type master slave circuit. (c) 6-NAND type edge trigger circuit. (d) Novel dynamic master slave circuit (MC divider).

gate dividers, which will be described in Section 6.4.

### 6.3 Frequency Dividers

Four types of frequency dividers used in this study are shown in Fig. 6.3; (a) and (b) are of master-slave types [6.5], (c) is of an edge-trigger type [6.5], while (d) is a new circuit developed in this study. Circuit (a) needs an inverter to generate a complementary clock, whereas the other three circuits operate on a single-phase clock. Circuit (d) is a novel circuit designed to realize both high speed and low power dissipation by modifying circuit (b); it is called an MC (Modified Complex-gate) divider hereafter. Fig. 6.4 shows a transistor network of Fig. 6.3 (b) for convenience. Here, the MOSFETs M1~M8 can be omitted as in Fig. 6.3 (d)



Fig. 6.4 Transistor network of the circuit in Fig. 6.3 (b). $M1 \sim M8$ can be omitted, resulting in the circuit in Fig. 6.3 (d). Nodes $N1 \sim N4$ are floating when $M1 \sim M8$ are omitted and clocked MOSFETs are turned off.

when the nodes N1~N4 are allowed to float, that is, under dynamic operation. The state-transition diagram is shown in Fig. 6.5. The MC divider operates properly from state I to IV as shown in this figure even after the MOSFETs are omitted from Fig. 6.4. In the MC divider, not only the gate capacitance but also the drain area and wiring area can be reduced because the topological configuration is considerably simplified. A comparison of the layout patterns using 0.1-$\mu$m CMOS circuits is shown in Fig. 6.6. Although the numbers of transistors in the complex-gate and MC dividers are 24 and 16, respectively, the layout area shown in Fig. 6.6 (b) is about half of that in (a). The gate widths of the circuit are 4 $\mu$m for both nMOSFETs and pMOSFETs; gate-width optimization by transistor sizing may improve the performance further, which will be experimentally verified in the next fabrication. In the case of 0.15- and 0.2-$\mu$m circuits, on the other hand, the gate widths are 6 and 8 $\mu$m, respectively, to keep the gate aspect ratio constant. Wiring is based on about 0.7-$\mu$m process


Fig. 6.5 State-transition diagram of the MC divider. States I through IV in the waveform correspond to those in the circuits.










Fig. 6.7 Performance data of four types of frequency dividers as a function of supply voltage. (a) Maximum operation frequency and (b) power dissipation at 1 GHz operation. These data were obtained from 0.15-$\mu$m-gate CMOS/SOI circuits. No substrate bias is applied.

technology in all cases.

### 6.4 Experimental Results

Fig. 6.7 (a) and (b) show the maximum operation frequency and power dissipation using 0.15-$\mu$m CMOS circuits, respectively. It is noted that no substrate bias is applied in the measurements of the frequency divider circuits hereafter. For the MC divider, a maximum frequency of 2.5 GHz was obtained at a supply voltage of 2 V, and a power dissipation of 50 $\mu$W was achieved at 1 GHz operation. These were the highest speed and the lowest power dissipation measured among the four types of frequency dividers.

Maximum operational frequency and power dissipation of the complex-gate type and the MC divider shown in Fig. 6.8 were measured for several gate lengths from 0.1 to 0.2  $\mu$ m. The MC type divider shows higher maximum frequency and lower power dissipation than the



Fig. 6.8 Performance data of MC and complex-gate dividers as a function of supply voltage. (a) Maximum operation frequency and (b) power dissipation at 1 GHz operation. These data were obtained from 0.1-, 0.15- and 0.2-$\mu$m-gate CMOS/SOI circuits. No substrate bias is applied.

complex-gate type divider for all gate lengths, as shown in Fig. 6.8. The maximum frequencies of the 0.1- and 0.15-$\mu$m MC dividers were similar, whereas that of the 0.2-$\mu$m MC divider was slightly lower. In contrast, the 0.2-$\mu$m complex-gate divider showed the highest maximum frequency among the three complex-gate dividers, because static complex-gate dividers are more sensitive to leakage current, which degrades the current drivability of the MOSFETs. That is, to hold an internal state in a static complex-gate type divider, the pull-up current must compensate for an extra pull-down current due to a leaky nMOSFET. Pull-down drivability is likewise degraded by an additional leaky pMOSFET, which is omitted in the MC divider.

Power dissipation is almost proportional to the square of the supply voltage, which means that the charging current is much larger than the leakage current at 1 GHz operation, except in the case of the 0.1-$\mu$m gate. Although the power dissipation of the 0.1-$\mu$m MC divider degrades at a supply voltage of 2 V, it still consumes no more than 62 $\mu$W at a supply voltage of 1 V.
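The quadratic dependence on supply voltage corresponds to the usual charging-power model $P = C_{\mathit{eff}} V^2 f$, which is assumed in the following sketch. The function names and the operating point (50 µW at 1 V and 1 GHz) are hypothetical values chosen for illustration, and leakage is neglected, so the sketch does not apply to the 0.1-µm case at high voltage.

```python
def effective_capacitance(power_w, vdd, freq_hz):
    """Effective switched capacitance implied by P = C * V^2 * f (leakage neglected)."""
    return power_w / (vdd ** 2 * freq_hz)

def dynamic_power(c_eff, vdd, freq_hz):
    """Charging power for a given effective capacitance."""
    return c_eff * vdd ** 2 * freq_hz

# Hypothetical operating point: 50 uW at 1 V and 1 GHz.
c_eff = effective_capacitance(50e-6, 1.0, 1e9)
print(c_eff)                              # effective capacitance in farads
print(dynamic_power(c_eff, 2.0, 1e9))     # predicted power when the supply is doubled
```

Under this model, doubling the supply voltage at a fixed frequency quadruples the power.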

In [6.6], it was reported that the delay time of an SOI ring oscillator was about half of that of a bulk one. This is mainly because stray capacitances such as the drain/source-to-substrate capacitance and the wiring capacitance are, owing to the buried oxide layer, about half of those in bulk devices. Therefore, we estimate that SOI circuits improve performance by approximately 100 % over bulk circuits built with the same fine technologies in terms of power or maximum frequency. It is noted, however, that the MC divider keeps its advantage over the other dividers regardless of fabrication technology because of its circuit simplicity.

Finally, a comparison of power dissipation with other studies is shown in Fig. 6.9. For multi-stage frequency dividers, the power dissipation plotted is that of the first stage, which is estimated to be half of the total power. The MC divider consumes only 50 fJ per input cycle, less than 1/100 of that of the other dividers.
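The energy per input cycle follows from dividing the power by the operating frequency. The short sketch below reproduces the 50 fJ figure from the 50 µW at 1 GHz reported for the 0.15-µm MC divider and applies the same relation to the 62 µW reported for the 0.1-µm design; the function name is an assumption introduced here.

```python
def energy_per_cycle(power_w, freq_hz):
    """Energy consumed per input cycle in joules: power times the clock period."""
    return power_w / freq_hz

for label, power in (("0.15-um MC divider", 50e-6), ("0.1-um MC divider", 62e-6)):
    e_fj = energy_per_cycle(power, 1e9) * 1e15   # convert J to fJ
    print(f"{label}: {e_fj:.0f} fJ per cycle")
```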




Fig. 6.9 Comparison of power dissipation with other studies. X-axis stands for the reciprocal of maximum frequency. The product of the power and the reciprocal of frequency corresponds to the energy consumption per input cycle.

### 6.5 Conclusion

Deep submicron CMOS frequency dividers built on SIMOX substrates have been evaluated. It was shown that the newly designed MC divider for 1/2 frequency division can operate at up to 2.5 GHz and that its power dissipation is 50 $\mu$W at 1 GHz in the case of the 0.15-$\mu$m gate. Additionally, a maximum frequency of 2.6 GHz and a power dissipation of 62 $\mu$W at 1 GHz were achieved in the case of the 0.1-$\mu$m gate. These results demonstrate that deep submicron CMOS circuits on SOI structures are highly promising for future battery-operated portable equipment because of their low power dissipation.

### References

- [6.1] Y. Omura, S. Nakashima, K. Izumi and T. Ishii, "0.1-μm-gate, ultrathin-film CMOS devices using SIMOX substrate with 80-nm-thick buried oxide layer," in *IEDM Tech. Dig.*, pp. 675–678, 1991.
- [6.2] H. Miki, T. Ohmameuda, M. Kumon, K. Asada, T. Sugano, Y. Ohmura and K. Izumi, "Subfemtojoule deep submicrometer-gate CMOS built in ultra-thin Si film on SIMOX substrates," *IEEE Trans. Electron Devices*, vol. ED-38, no. 2, pp. 373–377, Feb. 1991.
- [6.3] A. Kamgar, S.J. Hillenius, H-I Cong, R.L. Field, W.S. Lindenberger, G.K. Celler, L.E. Trimble and J.C. Sturm, "Ultra-high speed CMOS circuits in thin SIMOX films," in *IEDM Tech. Dig.*, pp. 829–832, 1989.
- [6.4] S. Nakashima and K. Izumi, "Practical reduction of dislocation density in SIMOX wafers," *Electron. Lett.*, vol. 26, no. 20, pp. 1647–1649, 1990.
- [6.5] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A System Perspective. Reading, MA: Addison-Wesley, 1985.
- [6.6] N. Ieda, "Technology trends in ASIC," *IEICE Trans.*, vol. E 74, no. 1, pp. 148–156, Jan. 1991.
- [6.7] H. Klose, M. Kerber, T. Meister, M. Ohnemus, R. Köpl, P. Weger and J. Weng, "Process optimization for sub-30ps BiCMOS technologies for mixed ECL/CMOS applications," in *IEDM Tech. Dig.*, pp. 89–92, 1991.
- [6.8] T. Nittono, K. Nagata, Y. Yamauchi, T. Makimura, H. Ito and O. Nakajima, "Advanced IC fabrication technology using reliable, small-size, and high-speed AlGaAs/GaAs HBTs," in *IEDM Tech. Dig.*, pp. 931–934, 1991.
- [6.9] H. Yamada, T. Futatsugi, H. Shigematsu, T. Tomioka, T. Fujii and N. Yokoyama, "InAlAs/InGaAs double heterojunction bipolar transistors with a collector launcher structure for high-speed ECL applications," in *IEDM Tech. Dig.*, pp. 964–966, 1991.

# Chapter 7

# High Speed Frequency Dividers

#### Abstract

The operation speed of several kinds of CMOS frequency dividers has been studied from a topological viewpoint. It has been found that a divider based on a clocked-inverter type circuit is the fastest among static CMOS circuits. It has also been found that the fastest CMOS divider circuit can be transformed into a topology quite similar to that of an ECL divider circuit designed with NAND-AND logic. Circuit simulation by SPICE shows that the maximum operation frequency is about half of the inverse of the inverter's delay time; this indicates that high-performance CMOS frequency dividers are expected to be faster than ECL frequency dividers, taking into account the reported ring-oscillator frequency records.

# 7.1 Introduction

Frequency divider circuits are practically important components used in phase-locked loops (PLL) and counters. Recent personal communication equipment requires dividers that work with low power dissipation, reduced supply voltage and low cost. From this point of view, CMOS circuits are more suitable than ECL circuits [7.1]–[7.4] or other circuits using MESFET's or HBT's [7.5], [7.6]. Although there are several papers in which CMOS frequency dividers were



Fig. 7.1 Static frequency divider circuits using toggle flip-flop: (a) master-slave type, and (b) edge-trigger type.

studied [7.7]–[7.10], it has not yet been clarified which type of CMOS circuit is the best.

In this chapter, the best circuit among CMOS static frequency dividers is discussed. A CMOS dynamic circuit is also discussed, which needs only a one-phase clock and has performance similar to that of the best static circuit.

In Section 7.2, static and dynamic CMOS dividers are analyzed topologically. In Section 7.3, these analytical results are verified by simulation, and conclusions are given in Section 7.4.

## 7.2 CMOS-Divider Circuits

### 7.2.1 CMOS Static Frequency Dividers

CMOS static dividers made of toggle-type flip-flops are classified into two groups as shown in Fig. 7.1. Master-slave type flip-flops as shown in Fig. 7.1(a) have the following features: (1a) a two-phase clock is needed, (2a) the number of series MOSFET's between the output node and the ground or power line in each gate is less than or equal to two, and (3a) the fan-out in the feed-back loop is less than two.

Compared with the master-slave circuits, edge-trigger circuits as shown in Fig.7.1(b) have



Fig. 7.2 Master-slave type toggle flip-flop circuits: (a) 8-NAND type, (b) complex-gate type, (c) clocked-inverter type, and (d) transmission-gate type.

the following features: (1b) they can operate on a one-phase clock, (2b) 3-input NAND or NOR logic is indispensably included, and (3b) the fan-out factor ranges from one to three. Although master-slave circuits have the undesirable feature (1a), they are considered suitable for high-speed CMOS frequency dividers because of the attractive features (2a) and (3a).

The master-slave circuits are classified into four types as shown in Fig. 7.2. Circuit types (a) and (b) hold their state in SR flip-flops composed of two cross-coupled NAND circuits, while circuit types (c) and (d) hold their state in double inverter-ring circuits. The essential difference between these two types of circuits is that the output q0 and its complement q1 of the double inverter-ring circuits change *simultaneously*, while the outputs of the two cross-coupled NAND circuits change sequentially, as shown in Fig. 7.3. Namely, the logic state








Fig. 7.3 Comparison of transient voltage between (a) a cross-coupled two NAND circuit and (b) a double inverter-ring circuit.





Fig. 7.4 Clocked-inverter type toggle flip-flop: (a) is clocked-inverter type using CMOS circuit and (b) is the circuit whose clocked-MOSFET's are merged,

has to propagate through two NAND circuits to change the state in the latter case. Because the states of both the master and the slave have to change within one cycle, the maximum operation frequency $f_{max}$ of types (a) and (b) is $1/(4t_{pd}^*)$, while $f_{max}$ of types (c) and (d) is $1/(2t_{pd}^*)$, where $t_{pd}^*$ is the average delay time of one logic stage.
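These topological limits can be evaluated directly once a stage delay is chosen. The sketch below uses the 100-ps delay quoted for the ring oscillator in Section 7.3.1 as $t_{pd}^*$, which is an assumption made for illustration since the average stage delay of a real divider may differ; the function name is hypothetical.

```python
def fmax_from_stages(t_pd, stages_per_cycle):
    """Maximum operation frequency when a clock cycle must cover the given number of stage delays."""
    return 1.0 / (stages_per_cycle * t_pd)

T_PD = 100e-12  # assumed average stage delay (s)

# Types (a) and (b): state propagates through two NAND stages in both master and slave.
print(fmax_from_stages(T_PD, 4) / 1e9, "GHz")
# Types (c) and (d): both outputs of the inverter ring change simultaneously.
print(fmax_from_stages(T_PD, 2) / 1e9, "GHz")
```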

Fig. 7.4 (a) and (b) show transistor networks of the same circuit as Fig. 7.2 (c). The circuit in Fig. 7.4 (a) is obtained by directly translating the gates of Fig. 7.2 (c) into transistors. On the other hand, Fig. 7.4 (b) is obtained by merging the clocked MOSFET's. A master-slave ECL circuit composed of NAND-AND circuits is also shown in Fig. 7.5 (a) for comparison. It is noted that the nMOS transistor logic in Fig. 7.5 (b), like a source-coupled logic, has the same topology as the n-p-n bipolar transistor



Fig. 7.5 Comparison of clocked-inverter type CMOS and ECL circuits: (a) is NAND-AND type using ECL circuit and (b) is the same circuit as (a) but described like source-coupled logic.

logic in Fig. 7.5 (c). The circuit shown in Fig. 7.5 (c), usually followed by emitter follower buffer circuits, is the most popular circuit in the bipolar frequency divider, because of its high performance.

MOSFET's whose gate nodes share the clock signal can be merged as shown in Fig. 7.4 (b). The input load capacitance of this circuit is half of that of Fig. 7.4 (a).

Now let us compare the three sub-circuits shown in Figs. 7.2 (b), (c) and (d). All of these circuits are composed of 24 transistors. Because each circuit is symmetric between the master and slave parts and between the set and reset parts, we need only compare one quarter of the circuit. The nMOSFET circuit of the quarter part, which consists of three MOSFET's, is shown in Fig. 7.6. It is noted that the main difference among these circuits is the position of the clocked nMOSFET, whose gate terminal is connected to the clock signal. When the input oscillator has sufficient drivability, the gate width of the clocked MOSFET's can be made larger than that of the other MOSFET's, so that the channel resistance of the enlarged clocked MOSFET's is negligible.





Fig. 7.6 1/4 part of nMOS circuits from (a) clocked-inverter type, (b) transmission-gate type and (c) complex-gate type circuits.



Fig. 7.7 CMOS dynamic divider circuits: (a) conventional transfer-gate type and (b) double-rail dynamic circuit using one-phase clock.

In this case, the circuits shown in Figs. 7.6 (a) and (b) have the same drivability as an inverter circuit, while the circuit shown in Fig. 7.6 (c) has the same drivability as a NAND circuit. It is noted that the parasitic capacitances of the source nodes of clocked MOSFET's in Fig. 7.6(b) will become large, because the source nodes of clocked MOSFET's are floating from ground. From this point, the operation speed of Fig. 7.6 (b) is slower than that of Fig. 7.6 (a) for enlarged clocked MOSFET's.

### 7.2.2 CMOS Dynamic Frequency Dividers

The most popular CMOS dynamic frequency divider is shown in Fig. 7.7(a). The maximum operation frequency of this circuit is $1/(3t_{pd}^*)$, where $t_{pd}^*$ is the mean delay time per stage. This circuit needs a two-phase clock like static master-slave circuits. On the other hand, the double-rail dynamic frequency divider shown in Fig. 7.7(b), which is a modified version of the circuit shown in Fig. 7.4(d), needs only a one-phase clock. The maximum operation frequency of the circuit shown in Fig. 7.7(b) is $1/(2t_{pd}^*)$. Although its maximum frequency is comparable to that of the static clocked-inverter circuit, as shown in Table 7.1, this dynamic frequency divider has an advantage over the static circuit because a two-phase clock is not needed.




Fig. 7.8 Comparison of maximum frequency among CMOS static frequency divider circuits.

Table 7.1 Comparison of maximum operation frequency of clocked-inverter circuits.

| Circuit  | Maximum operation frequency |
|----------|-----------------------------|
| Fig.3(d) | 6.1 GHz                     |
| Fig.6(b) | 5.5 GHz                     |
| Fig.10   | 6.3 GHz                     |

Table 7.2 Device parameters used in circuit simulation.

| Device | Parameter | Value            |
|--------|-----------|------------------|
| pMOS   | kp        | $16 \ \mu A/V^2$ |
| pMOS   | vto       | -0.5 V           |
| pMOS   | tox       | 15 nm            |
| nMOS   | kp        | $40 \ \mu A/V^2$ |
| nMOS   | vto       | 0.5 V            |
| nMOS   | tox       | 15 nm            |

## 7.3 Verification by Simulation

In this section, the above considerations are verified first, and the effect of sizing the non-clocked MOSFET's is then described in Section 7.3.2.

### 7.3.1 The Effect of the Gate-Width of Clocked MOSFET's

The above qualitative considerations on frequency dividers were verified by circuit simulation. The simulation was carried out assuming the device parameters of a 1-$\mu$m process shown in Table 7.2. The propagation delay time of a ring oscillator is 100 ps for these parameters. It is noted that process-dependent parasitic capacitances originating from wiring and p-n junctions were ignored in order to consider the theoretical upper limit of the performance of each circuit.
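For orientation, the parameters of Table 7.2 can be related to the gate-oxide capacitance per unit area. The sketch below assumes the SPICE level-1 convention kp = µ Cox and a relative permittivity of 3.9 for SiO2; neither assumption is stated in the source, and the function names are hypothetical.

```python
EPS0 = 8.854e-12     # vacuum permittivity (F/m)
EPS_R_SIO2 = 3.9     # relative permittivity of SiO2 (assumed)

def oxide_capacitance(tox_m):
    """Gate-oxide capacitance per unit area (F/m^2) for an oxide thickness in metres."""
    return EPS0 * EPS_R_SIO2 / tox_m

def implied_mobility(kp, tox_m):
    """Carrier mobility (m^2/Vs) implied by kp = mu * Cox (level-1 convention, assumed)."""
    return kp / oxide_capacitance(tox_m)

cox = oxide_capacitance(15e-9)                 # tox = 15 nm from Table 7.2
print(cox)
print(implied_mobility(40e-6, 15e-9) * 1e4)    # nMOS mobility in cm^2/Vs
print(implied_mobility(16e-6, 15e-9) * 1e4)    # pMOS mobility in cm^2/Vs
```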

Simulation results are shown in Fig. 7.8, where the gate width of the clocked MOSFET's denoted "enlarged" is 10 times that of the others. The results show that the clocked-inverter type circuit using enlarged clocked MOSFET's is the fastest, which supports the consideration given in Section 7.2.1. Although the maximum frequency of the transmission-gate type divider was comparable to that of the clocked-inverter type divider for the normal gate width of the clocked MOSFET's, it did not work when the clocked MOSFET's were enlarged. This is due to the position of the clocked MOSFET's, as described in Section 7.2.1.



Fig. 7.9 Dependence of maximum operation frequency on gate width: the x-axis of (a) is the gate-width ratio of a holding inverter to a transferring inverter, and that of (b) is the gate-width ratio of clocked MOSFET's to other MOSFET's.

### 7.3.2 MOSFET Sizing in Clocked-Inverter Type Circuit

The optimum gate-width of non-clocked MOSFET's is discussed in this section.

Master-slave circuits can be broken down into two parts, one for holding the state and one for transferring the state. The state kept in one part is transferred to the other part in synchronization with a clock signal. It is noted that, in order to improve the transfer speed, the transfer circuit needs to have large drivability and the holding circuit needs to be toggled fast. In the circuits of Fig. 7.2 (a), (b) and (d), reducing the gate width of the MOSFET's in the holding circuits in order to reduce the load capacitance of the transfer circuit degrades the transferring speed. On the other hand, the gate width of the holding MOSFET's in the clocked-inverter type circuit shown in Fig. 7.2 (c) can be reduced without degrading the transferring speed.

The MOSFET's size dependence on the maximum-operation frequency for the circuit shown in Fig. 7.4(d) is shown in Fig.7.9. In Fig.7.9(a), "inverter ratio" means the ratio of gate-widths



Fig. 7.10 Transient voltage of clocked-inverter circuits on high-speed operation



Fig. 7.11 Modified clocked-inverter circuit, in which the clocked-MOSFET's are omitted.

of the holding circuit to the transferring circuit and "clocked-MOSFET ratio" in Fig. 7.9(b) means the ratio of the gate width of the clocked MOSFET's to that of the transferring circuit. The figure shows that sizing the gate width of the MOSFET's can improve the dividing speed drastically. As shown in this figure, the maximum performance exceeds 5 GHz, that is, more than $1/(2t_d)$, where the delay time $t_d$ of the inverter chain is 100 ps. This is because the coupling capacitance between the drain and gate nodes of the clocked MOSFET's raises the effective supply voltage above the power supply voltage, as in a bootstrap circuit. The transient waveform in high-speed operation is shown in Fig. 7.10, where the drain voltage of the clocked MOSFET's is lowered as the gate voltage is lowered. This indicates that power is supplied not only from the power line but also from the clock line. The performance of the divider is little improved, as shown in Table 7.1, when

the holding inverters enclosed by dotted line are connected directly to power and ground line as shown in Fig.7.11, because the swing of the drain voltage of clocked MOSFET's becomes smaller in high-speed operation.
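The amount by which the simulated dividers exceed the topological limit $1/(2t_d)$ can be tabulated with a few lines of Python. The frequencies are those listed in Table 7.1 and are keyed by the labels used in that table; the variable and function names are assumptions introduced for illustration.

```python
T_D = 100e-12  # inverter-chain delay quoted in the text (s)

# Maximum operation frequencies from Table 7.1 (Hz), keyed by the table's own labels.
simulated = {"Fig.3(d)": 6.1e9, "Fig.6(b)": 5.5e9, "Fig.10": 6.3e9}

def topological_limit(t_d):
    """Limit 1/(2 t_d) for circuits whose complementary outputs change simultaneously."""
    return 1.0 / (2.0 * t_d)

limit = topological_limit(T_D)
ratios = {name: f / limit for name, f in simulated.items()}
for name, r in ratios.items():
    print(f"{name}: {r:.2f} x 1/(2 t_d)")
```

All three entries lie above the limit, which is consistent with the bootstrapping explanation given above.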

### 7.4 Conclusion

The following conclusions are obtained through the comparison of master-slave type frequency divider circuits:

- (1) The master-slave circuits are faster than the edge-trigger circuits for CMOS LSI.
- (2) The clocked-inverter type divider with enlarged gate width of clocked MOSFET's is fastest among the master-slave circuits.

Additionally, it was found that

(3) The nMOS part of the optimum clocked-inverter type circuit can be configured as the same topology as the ECL circuit using NAND-AND logic.

The maximum frequency is half of the inverse of the inverter delay time for the clocked-inverter type circuits with enlarged gate width of the clocked MOSFET's. This implies that CMOS frequency dividers are expected to exceed ECL static frequency dividers, because CMOS circuits reported so far can be faster than ECL circuits under the condition of small fan-out.

### References

- [7.1] H. Ichino, N. Ishihara, M. Suzuki and S. Konaka, "18-GHz 1/8 dynamic frequency divider using Si bipolar technologies," *IEEE J. Solid-State Circuits*, vol. 24, no. 6, pp. 1723–1728, Dec. 1989.

- [7.2] P. Weger, L. Treitinger and J. Bieger, "15 GHz static frequency-divider IC in silicon bipolar technology," *Electronics Letters*, vol. 25, no. 8, pp. 513–514, Apr. 1989.
- [7.3] M. Kurisu, Y. Sasayama, M. Ohuchi, A. Sawari, M. Sugiyama, H. Takemura and T. Tashiro, "A Si bipolar 21GHz 320mW static frequency divider," in *ISSCC Tech. Dig.*, 1991, pp. 158–159.
- [7.4] T. Yamazaki, I. Namura, H. Goto, A. Tahara and T. Ito, "A 11.7GHz 1/8-divider using 42GHZ Si high speed bipolar transistor with photoepitaxially grown ultra-thin base," in *IEDM Tech. Dig.*, 1990, pp. 309–312.
- [7.5] S. Nishi, H. Tsuji, H. Fujishiro, M. Shikata and K. Tanaka, "A 36GHz 1/8 frequency divider with GaAs BP-MESFET's," in *IEDM Tech. Dig.*, 1990, pp. 305–308.
- [7.6] Y. Yamauchi, O. Nakajima, K. Nagata, H. Ito and T. Ishibashi, "A 34.8 GHz 1/4 static frequency divider using AlGaAs/GaAs HBTs," in *11th Annual GaAs IC Symp. Tech. Dig.*, 1989, pp. 121–124.
- [7.7] H. Cong, J. Andrews, D. Boulin, S. Fang, S. Hillenius and J. Michejda, "Multigigahertz CMOS dual-modulus prescalar IC," *IEEE J. Solid-State Circuits*, vol. 23, no. 5, pp. 1189–1194, Oct. 1988.
- [7.8] J. Yuan and C. Svensson, "High-speed CMOS circuit technique," *IEEE J. Solid-State Circuits*, vol. 24, no. 1, pp. 62–70, Feb. 1989.
- [7.9] A. Kamgar, S. Hillenius, H. Cong, R. Field, W. Lindenberger, G. Celler and L. Trimble, "Ultra-high speed CMOS circuits in thin SIMOX films," in *IEDM Tech. Dig.*, 1989, pp. 829–832.

- [7.10] Y. Kado, Y. Okazaki, M. Suzuki and T. Kobayashi, "3.2 GHz, 0.2 $\mu m$ gate CMOS 1/8 dynamic frequency divider," *Electronics Letters*, vol. 26, no. 20, pp. 1684–1686, Sep. 1990.
- [7.11] A. Vladimirescu and S. Liu, "The simulation of MOS integrated circuits using SPICE2," Electron Res. Lab., Univ. of Calif., Berkeley, Memo. UCB/ERL M80/7, Feb. 1980.

# Chapter 8

# High-Speed Adder and Counter

#### Abstract

A new high-speed carry generation algorithm, named fast binary carry look ahead (FBCLA), is proposed based on the binary carry look ahead (BCLA) algorithm. The inverted binary tree conventionally needed in BCLA is omitted in FBCLA. The number of stages needed to generate the carry signals for all bits is reduced to about half of that in the conventional BCLA. As a result, the calculation time is also reduced to about half of the conventional one. This algorithm can also be applied to a binary incrementer and decrementer.

# 8.1 Introduction

A high-speed adder is the most important part of a microprocessor, a digital signal processor and so on, whose speed depends strongly on the adder speed. That is why many studies on high-speed adders have been published[8.1–8.3]. The total speed of an adder circuit depends strongly on the calculation time needed to generate the carry signals. Conventionally, a carry-select circuit[8.1] or carry-look-ahead (CLA) circuits[8.4], including the binary-carry-look-ahead (BCLA) circuit[8.2], were used for a high-speed adder. Among these circuits, the BCLA is the most suitable for a multi-bit adder, because its calculation time is proportional to the logarithm of the word length. In this chapter, a new circuit modified from the conventional BCLA adder is proposed, the calculation time of which is approximately half of the conventional one. After the FBCLA algorithm is described, it is also shown that it can be applied to a high-speed incrementer and decrementer.

## 8.2 Algorithm for High-speed Adder Based on BCLA

When $A_i$ and $B_i$ are defined as inputs, $S_i$ and $C_i$ as the sum and carry outputs, and $G_i$ and $P_i$ as the generation and propagation of carry, respectively, the following equations are derived[8.4]:

$$P_i = A_i \oplus B_i, \quad C_i = G_i + P_i \cdot C_{i-1} \tag{8.1}$$

$$G_i = A_i \cdot B_i, \qquad S_i = C_{i-1} \oplus P_i. \tag{8.2}$$

When the operation $\circ$ is defined as

$$(g,p) \circ (g',p') \equiv (g + (p \cdot g'), p \cdot p') \tag{8.3}$$

the operation $\circ$ is associative, that is, its result is independent of the calculation order. The carry signal $C_i$ is obtained as the $G_i$ calculated from the following equation, where $(g_i, p_i)$ denote the bit-wise generation and propagation signals of (8.1) and (8.2):

$$(G_i, P_i) = \begin{cases} (g_1, p_1) & \text{if } i = 1\\ (g_i, p_i) \circ (G_{i-1}, P_{i-1}) = (g_i, p_i) \circ (g_{i-1}, p_{i-1}) \circ \cdots \circ (g_1, p_1) & \text{if } 2 \le i \le n \end{cases} \tag{8.4}$$
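As a concrete check of (8.1) to (8.4), the operation $\circ$ and the serial evaluation of the carry chain can be sketched as follows. This is only an illustrative software model, not the circuit: the function names are chosen here, bits are stored LSB first, and the carry into the least significant bit is assumed to be 0.

```python
from itertools import product


def combine(left, right):
    """Operation of (8.3): (g, p) o (g', p') = (g + p*g', p*p')."""
    g, p = left
    g2, p2 = right
    return (g | (p & g2), p & p2)


def is_associative():
    """Exhaustively check (x o y) o z == x o (y o z) for all 0/1 pairs."""
    pairs = list(product((0, 1), repeat=2))
    return all(
        combine(combine(x, y), z) == combine(x, combine(y, z))
        for x, y, z in product(pairs, repeat=3)
    )


def bit_signals(a, b, n):
    """Bit-wise (g_i, p_i) = (A_i * B_i, A_i xor B_i) for i = 1..n, LSB first."""
    signals = []
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        signals.append((ai & bi, ai ^ bi))
    return signals


def carries_serial(a, b, n):
    """Evaluate (8.4) bit by bit; returns [C_1, ..., C_n] (carry-in assumed 0)."""
    gp = bit_signals(a, b, n)
    acc = gp[0]
    carries = [acc[0]]
    for i in range(1, n):
        acc = combine(gp[i], acc)
        carries.append(acc[0])
    return carries


def add_with_carries(a, b, n):
    """Form S_i = C_{i-1} xor P_i from the carries; returns (sum, carry_out)."""
    gp = bit_signals(a, b, n)
    c = [0] + carries_serial(a, b, n)  # c[0] is the assumed carry-in of 0
    total = 0
    for i in range(n):
        total |= (c[i] ^ gp[i][1]) << i
    return total, c[n]


if __name__ == "__main__":
    print(is_associative())
    print(add_with_carries(0b0111, 0b0001, 4))
```

Because the operation is associative, the same $G_i$ can be obtained with any grouping of the terms, which is what allows tree-structured circuits to replace this serial loop.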

The carry signals were conventionally generated by cascading an inverted-binary-tree circuit and a binary-tree circuit as shown in Fig. 8.1[8.4]. Although this circuit generates the carry signal for the most significant bit (MSB) without calculating the inverted binary tree, the carry generation for intermediate bits is delayed further. The new circuit shown in Fig. 8.2, on the other hand, does not need the inverted binary tree, because the calculation order is modified from that of Fig. 8.1. It is noted that this circuit maintains a maximum fan-out of two, so that the load capacitance does not increase and the calculation time of each module has no disadvantage compared with that in Fig. 8.1.



Fig. 8.1 Block diagram of conventional binary-carry-look-ahead.








Fig. 8.3 Comparison of the number of stages needed to generate carry signals. $\lfloor x \rfloor$ and $\lceil x \rceil$ are the maximum integers less than or equal to $x$ and less than $x + 1$, respectively.



Fig. 8.4 Comparison of the layout of 32-bit adders using (a) the BCLA and (b) the FBCLA algorithm.

The number of stages needed for the carry generation, which is proportional to the calculation time, is compared in Fig. 8.3. As the number of bits increases, the number of stages in the FBCLA approaches half of that in the conventional BCLA. Although the number of modules, which is proportional to the number of transistors, is larger in the FBCLA than in the BCLA, the typical layout area is almost the same, as shown in Fig. 8.4.
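To illustrate how the associativity of $\circ$ lets all carries be produced in a number of stages that grows with the logarithm of the word length, the sketch below uses a generic recursive-doubling prefix scheme. It is an assumption made here for illustration only: it is not a reconstruction of the circuits in Fig. 8.1 or Fig. 8.2, and its stage count is not taken from Fig. 8.3. The function names are also chosen here.

```python
from itertools import product


def combine(left, right):
    """Operation of (8.3): (g, p) o (g', p') = (g + p*g', p*p')."""
    g, p = left
    g2, p2 = right
    return (g | (p & g2), p & p2)


def prefix_serial(gp):
    """Reference result: out[i] = gp[i] o gp[i-1] o ... o gp[0]."""
    out = [gp[0]]
    for x in gp[1:]:
        out.append(combine(x, out[-1]))
    return out


def prefix_recursive_doubling(gp):
    """Compute the same prefixes stage by stage.

    In the stage with distance d, position i is combined with position i - d,
    so the span covered by each position doubles per stage.
    Returns (prefixes, number_of_stages).
    """
    n = len(gp)
    cur = list(gp)
    stages = 0
    d = 1
    while d < n:
        nxt = list(cur)
        for i in range(d, n):
            nxt[i] = combine(cur[i], cur[i - d])
        cur = nxt
        d *= 2
        stages += 1
    return cur, stages


def check_against_serial(max_bits):
    """Exhaustively compare both methods for word lengths 1..max_bits."""
    pairs = list(product((0, 1), repeat=2))
    for n in range(1, max_bits + 1):
        for gp in product(pairs, repeat=n):
            if prefix_recursive_doubling(list(gp))[0] != prefix_serial(list(gp)):
                return False
    return True


if __name__ == "__main__":
    print(check_against_serial(5))
    print(prefix_recursive_doubling([(0, 1)] * 32)[1])
```

In this generic scheme every stage works on all bit positions in parallel, which trades a larger number of modules for fewer stages, the same kind of trade-off described for the FBCLA above.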

As an application of FBCLA, a binary incrementer and decrementer can also be designed. For an incrementer, a bit $b_i$ must be inverted when all the bits less significant than $b_i$ are 1. Namely,

$$\begin{aligned} b_0 &= \overline{b_0} \\ b_i &= b_i \oplus (b_{i-1} \cdot \ldots \cdot b_0) \quad \text{for } i \ge 1. \end{aligned} \tag{8.5}$$

For a decrementer, on the other hand, $b_i$ must be inverted when all the bits less significant than $b_i$ are 0. Namely,

$$\begin{aligned} b_0 &= \overline{b_0} \\ b_i &= b_i \oplus (\overline{b_{i-1}} \cdot \ldots \cdot \overline{b_0}) \quad \text{for } i \ge 1. \end{aligned} \tag{8.6}$$



Fig. 8.5 New circuits of a binary (a) incrementer and (b) decrementer based on FBCLA algorithm.

The operation $\cdot$ is obviously associative, so that (8.5) and (8.6) can be calculated by binary trees as shown in Fig. 8.5. The calculation time is proportional to the logarithm of the word length, which is the same characteristic as that of the FBCLA.
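Equations (8.5) and (8.6) can be checked against ordinary modular arithmetic with a small software model. The sketch below is illustrative only: the function names are chosen here, bits are stored LSB first, and the running AND is evaluated serially for clarity, whereas a circuit would evaluate the same products with a tree because AND is associative.

```python
def to_bits(value, n):
    """Return the n-bit representation of value, LSB first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))


def increment(bits):
    """Apply (8.5): invert b_i when all less significant bits are 1."""
    out = []
    all_ones = 1  # AND over an empty set of lower bits, so b_0 is always inverted
    for b in bits:
        out.append(b ^ all_ones)
        all_ones &= b
    return out


def decrement(bits):
    """Apply (8.6): invert b_i when all less significant bits are 0."""
    out = []
    all_zeros = 1  # AND over an empty set of inverted lower bits
    for b in bits:
        out.append(b ^ all_zeros)
        all_zeros &= 1 - b
    return out


if __name__ == "__main__":
    n = 4
    for v in range(2 ** n):
        assert from_bits(increment(to_bits(v, n))) == (v + 1) % 2 ** n
        assert from_bits(decrement(to_bits(v, n))) == (v - 1) % 2 ** n
    print("incrementer and decrementer agree with modular arithmetic")
```

Both functions wrap around at the word length, which is the behaviour expected of a fixed-width counter.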

### 8.3 Conclusion

A new binary-carry-look-ahead algorithm, FBCLA, was proposed, the calculation time of which is about half of that of the conventional one while the layout area is kept comparable. It was also shown that a binary incrementer and decrementer can be designed by applying the same algorithm as used in the FBCLA.

## References

- [8.1] M. Uya, K. Kaneko, and J. Yasui, "A CMOS floating point multiplier," *IEEE J. Solid-State Circuits*, vol. SC-19, no. 5, pp. 697–702, Oct. 1984.

- [8.2] R. P. Brent and H. T. Kung, "A regular layout for parallel adders," *IEEE Trans. Comput.*, vol. C-31, no. 3, pp. 260–264, Mar. 1982.
- [8.3] H. R. Srinivas and K. K. Parhi, "A fast VLSI adder architecture," *IEEE J. Solid-State Circuits*, vol. SC-27, no. 5, pp. 761–767, May 1992.
- [8.4] N. H. E. Weste and K. Eshraghian, *Principles of CMOS VLSI Design: A System Perspective*. Reading, MA: Addison-Wesley, 1988.

# Chapter 9

# 1 GHz Operation RISC Micro-Computer

#### Abstract

A RISC microprocessor built on an ultra-thin-film SOI substrate was designed and simulated. Instruction and data memories were also embedded in the same chip using the Harvard architecture. The FBCLA algorithm was employed in the carry generation block of the ALU, which determines the execution speed. The gate widths were optimized by a newly proposed method that exploits the negligible wiring capacitance on the buried oxide. The simulation result indicates that the microprocessor on SOI operates at a clock frequency over 1 GHz.

## 9.1 Introduction

Recently the performance of RISC (Reduced Instruction Set Computer) microprocessors has improved dramatically, and they are taking the place of conventional CISC (Complex Instruction Set Computer) microprocessors. As a result, although advances in the density and speed of integrated circuits were conventionally led by memories, they have recently been led by RISC microprocessors as well, because the most advanced process can be introduced into the fabrication of RISC in a short time owing to its simple architecture[9.1–9.4]. Namely, the progress of CPUs, which was conventionally restricted by the design of architecture and layout, has recently come to be restricted by the process technology. For example, in 1990 a 4-bit CPU operating at 1 GHz was first reported using Josephson devices cooled by liquid helium[9.5], but only two years later, in 1992, a 1 GIPS (giga instructions per second) 32-bit CPU was reported using silicon BiCMOS circuits at room temperature; its clock frequency is 250 MHz and it has a four-CPU superscalar architecture[9.6]. However, this high-speed operation was achieved by a parallel architecture, which depends strongly on software such as the compiler and applications, so the effective performance may differ from the peak performance. To improve the performance independently of software, it is important to reduce the time needed to complete one instruction, that is, to raise the clock frequency. It is noted that the clock cycle is determined by the delay time, which is degraded in large-scale integrated circuits built on a conventional bulk silicon substrate because of the large wiring capacitance, so that reducing the size of MOSFETs does not necessarily improve the performance.

On the other hand, because the wiring capacitance on an SOI substrate is less than half of that on a bulk silicon substrate, SOI is suitable for large-scale integrated circuits such as microprocessors. Additionally, the gate optimization is easier when the simple method described in Chapter 5 is applied, because the gate capacitance is dominant among all parasitic capacitances. In this chapter, a RISC microprocessor built on an ultra-thin SOI substrate is described, which achieved 1 GHz operation in simulation; such operation was conventionally achieved only by Josephson devices at liquid helium temperature.

# 9.2 Architecture of SOI RISC CPU

The RISC microprocessor designed in this study, called SRISC (SOI RISC) hereafter, employs a simple architecture because the target is to show the feasibility of SOI MOSFETs for large-scale integrated circuits. Despite this simplicity, a 32-bit architecture, which is the most widely used, was employed so that its performance can be compared with that of other RISC microprocessors. The instruction set of SRISC, shown in Table 9.1, is composed of 16 basic instructions. Calculation can be performed only between a register and immediate data or between two registers, and, as in other RISCs, a branch takes effect after one more instruction has been executed (a delayed branch).

| No. | Instruction | Operands             | Operation              |
|-----|-------------|----------------------|------------------------|
| 1   | SETHI       | const, rd            | set high 16 bits of rd |
| 2   | SRL         | rs, rt or imm, rd    | shift right logical    |
| 3   | SLL         | rs, rt or imm, rd    | shift left logical     |
| 4   | SRA         | rs, rt or imm, rd    | shift right arithmetic |
| 5   | LD          | [address], rd        | load                   |
| 6   | ST          | [address], rd        | store                  |
| 7   | JMPL        | [address], rd        | jump and link          |
| 8   | Bcc         | [address], condition | branch on condition    |
| 9   | OR          | rs, rt or imm, rd    | or                     |
| 10  | AND         | rs, rt or imm, rd    | and                    |
| 11  | XOR         | rs, rt or imm, rd    | exclusive or           |
| 12  | XNOR        | rs, rt or imm, rd    | exclusive nor          |
| 13  | ADD         | rs, rt or imm, rd    | add                    |
| 14  | ADC         | rs, rt or imm, rd    | add with carry         |
| 15  | SUB         | rs, rt or imm, rd    | subtract               |
| 16  | SBB         | rs, rt or imm, rd    | subtract with borrow   |

#### Table 9.1 Operation code table.

A five-stage pipeline architecture was employed, consisting of (1) instruction fetch (IF), (2) instruction decode (DEC), (3) instruction execution (EXE), (4) memory access (MEM) and (5) register write-back (WB), as shown in Fig. 9.1. The circuit block diagram is shown in Fig. 9.2.

Fig. 9.1 Pipeline diagram.

Fig. 9.2 Block diagram of SOI RISC CPU.

The Harvard architecture, which separates the instruction and data memories, was employed in SRISC to relax the restriction on memory access time. Even with the Harvard architecture, however, the access time must still be less than 1 ns to achieve 1 GHz operation, so the memories were embedded in the same chip, fabricated on the same SOI substrate. Both the instruction and data memories are organized as 32 bits $\times$ 128 words, and they can be read or written externally as well as internally by SRISC, not only for the input of instructions and data but also for the measurement of their access time. The carry generation block in the ALU, which determines a large part of the overall speed, employs the FBCLA algorithm described in Chapter 8. The simulation result showed that the addition time is less than 300 ps, which is below the 500 ps required to achieve 1 GHz operation. For simplicity, SRISC can be interrupted only by the reset signal, although regular interrupts may be necessary for multitasking. The gate widths of the MOSFETs were optimized by the simple method described in Chapter 5. The layout of SRISC is shown in Fig. 9.3. The total number of pins is 174, including pins for confirmation and measurement of the performance of each block.
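The timing of the five-stage pipeline (IF, DEC, EXE, MEM, WB) can be illustrated with an idealized model. The sketch below assumes one instruction issued per clock cycle and no stalls, and it ignores the delayed-branch behaviour; these are simplifying assumptions made here, and the function names are hypothetical rather than part of the SRISC design.

```python
STAGES = ["IF", "DEC", "EXE", "MEM", "WB"]


def pipeline_schedule(num_instructions):
    """Return {cycle: {stage: instruction index}} for an ideal pipeline.

    Instruction k enters IF in cycle k and advances one stage per cycle.
    """
    schedule = {}
    for instr in range(num_instructions):
        for offset, stage in enumerate(STAGES):
            schedule.setdefault(instr + offset, {})[stage] = instr
    return schedule


def total_cycles(num_instructions):
    """Cycles until the last instruction leaves WB (no stalls assumed)."""
    if num_instructions <= 0:
        return 0
    return num_instructions + len(STAGES) - 1


def execution_time_ns(num_instructions, clock_hz=1e9):
    """Total time in nanoseconds at the given clock frequency."""
    return total_cycles(num_instructions) / clock_hz * 1e9


if __name__ == "__main__":
    for cycle, busy in sorted(pipeline_schedule(3).items()):
        print(cycle, busy)
    print(total_cycles(100), "cycles for 100 instructions")
```

Under these assumptions the pipeline completes one instruction per cycle once it is full, so each stage, including the addition in EXE and the memory access in MEM, has to fit within the clock period.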

# 9.3 Conclusion

SRISC shown in Fig. 9.3 is currently under fabrication. The simulation results of each block module showed the possibility of 1 GHz clock operation. The feasibility of a large integrated circuit using SOI MOSFETs will be shown after measurement of the speed of SRISC.

# References

- [9.1] H. Nakano, M. Nakajima, Y. Nakakura, T. Yoshida, Y. Goi, Y. Nakai, R. Segawa, T. Kishida, and H. Kadota, "An 80-MFLOPS (peak) 64-b microprocessor for parallel computer," *IEEE J. Solid-State Circuits*, vol. SC-27, no. 3, pp. 365–372, Mar. 1992.
- [9.2] K. Yano, M. Hiraki, S. Shukuri, M. Hanawa, M. Suzuki, S. Morita, A. Kawamata, N. Ohki, T. Nishida, and K. Seki, "3.3-V BiCMOS circuit techniques for 250-MHz RISC arithmetic modules," *IEEE J. Solid-State Circuits*, vol. SC-27, no. 3, pp. 373–381, Mar. 1992.
- [9.3] D. W. Dobberpuhl, R. T. Witek, R. Allmon, R. Anglin, D. Bertucci, S. Britton, L. Chao, R. A. Conrad, D. E. Dever, B. Gieseke, S. M. N. Hassoun, G. W. Hoeppner, K. Kuchler, M. Ladd, B. M. Leary, L. Madden, E. J. McLellan, D. R. Meyer, J. Montanaro, D. A. Priore, V. Rajagopalan, S. Samudrala, and S. Santhanam, "A 200-MHz 64-b Dual-Issue CMOS microprocessor," *IEEE J. Solid-State Circuits*, vol. SC-27, no. 11, pp. 1555–1567, Nov. 1992.
- [9.4] R. I. Bahar, D. Bernstein, L. L. Biro, W. J. Bowhill, J. F. Brown, M. A. Case, R. W. Castelino, E. M. Cooper, M. A. Delaney, D. R. Deverell, J. H. Edmondson, J. J. Ellis, T. C. Fischer, T. F. Fox, M. K. Gowan, P. E. Gronowski, W. V. Herrick, A. K. Jain, J. E. Meyer, D. G. Miner, H. Partovi, V. Peng, R. P. Preston, C. Somanathan, R. L. Stamm, S. C. Thierauf, G. M. Uhler, N. D. Wade, and W. R. Wheeler, "A 100-MHz macropipelined VAX microprocessor," *IEEE J. Solid-State Circuits*, vol. SC-27, no. 11, pp. 1585–1598, Nov. 1992.
- [9.5] H. Nakagawa, I. Kurosawa, M. Aoyagi, S. Kosaka, Y. Hamazaki, Y. Okada, and S. Takada, "Josephson computer ETL-JC1," IEICE Tech. Rep. SCE 89–59, pp. 43–48, 1989 (in Japanese).

- [9.6] International Solid-State Circuit Conference,

# Chapter 10

# Conclusions

Modeling and design methods for ultra-short-channel MOSFETs built on an SOI substrate were presented in this dissertation. The results are summarized as follows:

- 1. It was shown that the pinch-off point is ambiguous in all types of short-channel MOSFETs, including SOI MOSFETs. To describe this characteristic precisely, the derivative of the horizontal electric field was considered in addition to the longitudinal electric field considered in the conventional gradual channel approximation model.
- 2. It was shown that a CMOS ring oscillator can be evaluated by the equivalent linear resistance of a MOSFET and the effective load capacitance of an inverter. It was also shown that the effect of leakage current, as well as resistance and capacitance, should be evaluated for short-channel CMOS circuits. For the evaluation of leakage current, the measurement of an inverter array in addition to a ring oscillator was proposed, which improves the reliability of circuit evaluation.
- 3. In order to optimize the total delay time under the condition of negligible wiring capacitance, as in SOI circuits, the size-dependent delay time of each gate should be equalized. Inverter buffers should be inserted so that the size-dependent delay time becomes $f_I$ times that of an inverter, where $f_I$ is a constant dependent on the process.



Fig. 10.1 The important key words described in this dissertation.

- 4. An MC (Modified Complex-gate) divider was proposed, in which the redundant MOSFETs for static operation were omitted. It operated with only 50 μW at 1 GHz input and 1 V supply voltage. For high-speed operation, a new clocked-inverter type divider with optimized gate widths was proposed. Its maximum operation frequency is theoretically the reciprocal of twice the delay time of an inverter. Namely, it is expected to exceed 10 GHz when the delay time of an inverter is less than 50 ps.
- 5. The FBCLA (Fully Binary Carry Look Ahead) algorithm was proposed by modifying the conventional BCLA algorithm in order to calculate in a more parallel way. The carry generation time of the FBCLA is about half of that of the BCLA. This algorithm can also be applied to a counter.
- 6. A 32-bit RISC microprocessor built on an SOI substrate was evaluated as an application example of a large-scale integrated circuit. Two 4k-bit memories and a two-phase clock generator were embedded in the same chip, so that it operates without external circuits.

The important key words described in this dissertation are summarized in Fig. 10.1. Finally, it is concluded that guidelines for the design of ultra-short-channel CMOS circuits could be established through these studies.

# Acknowledgment

It is a great pleasure to express sincere gratitude to the dissertation supervisor, Professor Kunihiro Asada, for his constant guidance and support throughout the course of this work. These studies could be completed especially owing to his encouragement and help both professionally and privately. I would also like to thank Professor Takuo Sugano for his helpful discussions and suggestions.

I would like to thank Dr. Yasuhisa Omura of NTT LSI laboratory, who fabricated the SIMOX SOI substrates and helped with the measurement of SOI devices and circuits; his fruitful discussions and suggestions are also gratefully acknowledged. I would also like to thank Mr. Tetsushi Sakai and Dr. Katsutoshi Izumi, as well as Dr. Omura, for useful discussions in the co-research project between NTT and the University of Tokyo, and Mr. Yuichi Kado of NTT LSI laboratory for helping with the measurement of frequency dividers.

I would like to thank Mr. Minoru Fujita, Dr. Hisaro Katto and Dr. Katsuhiko Kubota of Hitachi Ltd. for their useful discussions and suggestions over a long period in the co-research project between NTT and the University of Tokyo throughout both the master's and doctoral courses. I would also like to thank Dr. Hiroshi Miki of Hitachi Ltd., who graduated from Sugano laboratory three years ago, for giving useful information about the measurement and characteristics of SOI devices, which allowed me to start the study in the doctoral course smoothly.

I would like to thank all my present and past colleagues in both Asada laboratory and Sugano laboratory, who discussed with me and gave me useful suggestions on both research and private matters. Especially, I would like to thank Mr. Masaki Yamashita and Dr. Mike Lee of Nippon Motorola Ltd. for discussions and suggestions about SOI/SIMOX devices, and Mr. Makoto Ikeda, a graduate student, for helping with measurement and providing useful software. Additionally, Mr. Rimon Ikeno and Mr. Kazuhiko Mogi, as well as Mr. Ikeda, helped with the design of the RISC microprocessor; in particular, Mr. Mogi gave me a useful idea about the gate optimization described in Chapter 5. I am grateful to Mr. Shin-ichi Suzuki for his help in the laboratory. I would like to thank Ms. Makiko Okazaki and Ms. Noriko Yokochi, who are secretaries of Professor Asada, for their help and encouragement.

Finally, I would like to express my gratitude to all my friends, especially in the swimming team, and to my family for private help and encouragement.

# List of Presentations and Publications

### Chapter 2

- M. Fujishima and K. Asada, "Drain-current modeling for deep-submicron MOSFETs," The Japan Society of Applied Physics, p. 610, Sep 1992.
- M. Fujishima and K. Asada, "A Non-pinch-off gradual channel model for ultra-thin SOI MOSFETs," submitted to *IEEE Electron Devices*.

### Chapter 3

- M. Fujishima, K. Asada, and T. Sugano, "Evaluation of dynamic load capacitance of CMOS circuits," *The Japan Society of Applied Physics*, p. 726, Mar 1992.
- M. Fujishima, M. Ikeda, K. Asada, Y. Omura and K. Izumi, "Analytical modeling of dynamic performance of deep sub-micron SOI/SIMOX based on current-delay product," *IEICE Trans.*, vol. E 75, no. 12, pp. 1506–1514, Dec 1992.
- M. Fujishima, K. Asada, and T. Sugano, "Evaluation of delay-time degradation of low-voltage BiCMOS based on a novel analytical delay-time modeling," *IEEE J. Solid-State Circuits*, vol. 26, no. 1, pp. 25–31, Jan 1991.

### Chapter 4

- M. Fujishima and K. Asada, "Proposal of standard characterization method for dynamic circuit performance," *International Conference on Micro Test Structure*, Mar 1993.

### Chapter 5

- M. Fujishima, K. Asada, and T. Sugano, "Analytical gate-width optimization of MOSFETs in tree-structured logic circuits," *1991 Spring National Convention Record, the IEICE*, p. 5-177, Mar 1991.

### Chapter 6

- M. Fujishima, M. Yamashita, M. Ikeda, K. Asada, Y. Omura, K. Izumi, T. Sakai and T. Sugano, "1 GHz 50 μW 1/2 frequency divider fabricated on ultra-thin SIMOX substrate," 1992 Symposium on VLSI circuits, pp. 46–47, Jun 1992.
- M. Fujishima, K. Asada, Y. Omura and K. Izumi, "Low power 1/2 frequency dividers using 0.1 μm CMOS circuits built with ultra-thin SIMOX substrates," *IEEE J. Solid-State Circuits*, vol. SC-28, no. 4, Apr 1993.

### Chapter 7

- M. Fujishima and K. Asada, "On the highest performance CMOS frequency dividers," submitted to *IEEE J. Solid-State Circuits*.

### Chapter 8

- M. Fujishima, K. Asada, and T. Sugano, "A High-Speed Adder and Counter Based on New Binary-Carry-Look-Ahead Algorithm," *1991 Autumn National Convention Record, the IEICE*, p. 5-105, Sep 1991.



