# OPTIMIZING THE INTEGRATION AND ENERGY EFFICIENCY OF THROUGH SILICON VIA-BASED 3D INTERCONNECTS

PANAGIOTIS ASIMAKOPOULOS

A Thesis Submitted for the Degree of Doctor of Philosophy at Newcastle University



School of Electrical, Electronic and Computer Engineering

Newcastle upon Tyne

November 2011

Panagiotis Asimakopoulos: *Optimizing the integration and energy efficiency of through silicon via-based 3D interconnects*, PhD Thesis, © November 2011

The aggressive scaling of CMOS process technology has been driving the rapid growth of the semiconductor industry for more than three decades. In recent years, the performance gains enabled by CMOS scaling have been increasingly challenged by highly parasitic on-chip interconnects, as wire parasitics do not scale at the same pace. Emerging 3D integration technologies based on vertical through-silicon vias (TSVs) promise a solution to the interconnect performance bottleneck, along with reduced fabrication cost and heterogeneous integration.

As TSVs are a relatively recent interconnect technology, innovative test structures are required to evaluate and optimise the process, as well as to extract parameters for the generation of design rules and models. From the circuit designer's perspective, the critical characteristics of a TSV are its parasitic capacitance and its thermomechanical stress distribution. This work proposes new test structures for extracting these characteristics. The structures were fabricated on a 65 nm 3D process and used for the evaluation of that technology.

Furthermore, as TSVs are implemented in large, densely interconnected 3D systems-on-chip (SoCs), the TSV parasitic capacitance may become an important source of energy dissipation. Typical low-power techniques based on voltage scaling can be used, though voltage scaling is itself technically challenging in modern technology nodes. In this work, a novel TSV interconnection scheme is proposed based on reversible computing, which shows frequency-dependent energy dissipation. The scheme is analysed using theoretical modelling, while a demonstrator IC was designed based on the developed theory and fabricated on a 130 nm 3D process.

Some ideas and figures have appeared previously in the following publications:

- D. Perry, J. Cho, S. Domae, P. Asimakopoulos, A. Yakovlev, P. Marchal, G. Van der Plas, and N. Minas. An efficient array structure to characterize the impact of through silicon vias on FET devices. In Proc. IEEE Int. Microelectronic Test Structures (ICMTS) Conf., pages 118–122, 2011. (Best paper award)
- A. Mercha, G. Van der Plas, V. Moroz, I. De Wolf, P. Asimakopoulos, N. Minas, S. Domae, D. Perry, M. Choi, A. Redolfi, C. Okoro, Y. Yang, J. Van Olmen, S. Thangaraju, D. S. Tezcan, P. Soussan, J. H. Cho, A. Yakovlev, P. Marchal, Y. Travaly, E. Beyne, S. Biesemans, and B. Swinnen. Comprehensive analysis of the impact of single and arrays of through silicon vias induced stress on high-k / metal gate CMOS performance. In Proc. IEEE Int. Electron Devices Meeting (IEDM), 2010.
- A. Mercha, A. Redolfi, M. Stucchi, N. Minas, J. Van Olmen, S. Thangaraju, D. Velenis, S. Domae, Y. Yang, G. Katti, R. Labie, C. Okoro, M. Zhao, P. Asimakopoulos, I. De Wolf, T. Chiarella, T. Schram, E. Rohr, A. Van Ammel, A. Jourdain, W. Ruythooren, S. Armini, A. Radisic, H. Philipsen, N. Heylen, M. Kostermans, P. Jaenen, E. Sleeckx, D. Sabuncuoglu Tezcan, I. Debusschere, P. Soussan, D. Perry, G. Van der Plas, J. H. Cho, P. Marchal, Y. Travaly, E. Beyne, S. Biesemans, and B. Swinnen. Impact of thinning and through silicon via proximity on high-k / metal gate first CMOS performance. In Proc. Symp. VLSI Technology (VLSIT), pages 109–110, 2010.
- P. Asimakopoulos, G. Van der Plas, A. Yakovlev, and P. Marchal. Evaluation of energy-recovering interconnects for low-power 3D stacked ICs. In Proc. IEEE Int. Conf. 3D System Integration (3DIC), pages 1–5, 2009.
- P. Asimakopoulos and A. Yakovlev. An Adiabatic Power-Supply Controller For Asynchronous Logic Circuits. 20th UK Asynchronous Forum, pages 1–4, 2008.

In the end, I hope there's a little note somewhere that says I designed a good computer.

- Steve Wozniak [79]

# ACKNOWLEDGEMENTS

First and foremost, I would like to express my gratitude to my supervisor, Prof. Alex Yakovlev, who provided the perfect balance between freedom and guidance, allowing me to gain a well-rounded knowledge of microelectronics while still completing my studies promptly. I'm also grateful to the Engineering and Physical Sciences Research Council (EPSRC) for funding my PhD studies through their Doctoral Training Accounts.

As part of my research project I had the opportunity to spend some time at Imec, Belgium. The experiences and lessons learned during my stay at Imec will undoubtedly prove valuable in my forthcoming professional career. I would like to thank Dan Perry from Qualcomm for sharing his experience and ideas for the design of the characterization structures implemented on the 65 nm Imec process. Special thanks to Shinichi Domae from Panasonic and Dr. Dimitrios Velenis for their assistance with the 65 nm wafer measurements, as well as Marco Facchini for the time he spent trying to measure the energy-recovery circuits. Last but not least, I'd like to thank Geert Van der Plas and Paul Marchal for their advice and guidance throughout my stay at Imec.

Finally, many thanks to my colleagues at Newcastle University for assisting me in diverse ways during my studies. I would like to especially mention Dr. Santosh Shedabale and James Docherty for reviewing and commenting on parts of this text, as well as Dr. Robin Emery for the numerous technical discussions and for helping set up the design tools.

# CONTENTS

| Section | Title |
|---------|-------|
| 1 | Introduction |
| 1.1 | Research goals and contribution |
| 1.2 | Thesis outline |
| 2 | Background |
| 2.1 | 3D integrated circuits |
| 2.1.1 | 3D integration approaches |
| 2.1.2 | TSV interconnects |
| 2.2 | Energy-recovery logic |
| 2.2.1 | Energy-recovery architectures |
| 2.2.2 | Power-clock generators |
| 2.3 | Full-custom design flow |
| 2.4 | Summary |
| 3 | Characterization of TSV capacitance |
| 3.1 | Introduction |
| 3.2 | TSV parasitic capacitance |
| 3.3 | Integrated capacitance measurement techniques |
| 3.4 | Electrical characterization |
| 3.4.1 | Physical implementation |
| 3.4.2 | Measurement setup |
| 3.4.3 | Experimental results |
| 3.5 | Summary and conclusions |
| 4 | TSV proximity impact on MOSFET performance |
| 4.1 | Introduction |
| 4.2 | TSV-induced thermomechanical stress |
| 4.3 | Test structure implementation |
| 4.3.1 | MOSFET arrays |
| 4.3.2 | Digital selection logic |
| 4.4 | Experimental measurements |
| 4.4.1 | Data processing methodology |
| 4.4.2 | Measurement setup |
| 4.4.3 | PMOS (long channel) |
| 4.4.4 | PMOS (short channel) |
| 4.5 | Summary and conclusions |
| 5 | Evaluation of energy-recovery for TSV interconnects |
| 5.1 | Introduction |
| 5.2 | Energy-recovery TSV interconnects |
| 5.3 | Analysis |
| 5.3.1 | Adiabatic driver |
| 5.3.2 | Inductor's parasitic resistance |
| 5.3.3 | Switch M1 |
| 5.3.4 | Timing constraints |
| 5.4 | Simulations |
| 5.4.1 | Comparison to CMOS |
| 5.5 | Summary and conclusions |
| 6 | Energy-recovery 3D-IC demonstrator |
| 6.1 | Introduction |
| 6.2 | Physical implementation |
| 6.2.1 | Adiabatic driver |
| 6.2.2 | Pulse-to-level converter |
| 6.2.3 | Data generator |
| 6.2.4 | Pulse generator |
| 6.2.5 | Integrated inductor |
| 6.2.6 | Top level design |
| 6.2.7 | CMOS I/O circuit |
| 6.3 | Post-layout simulation and comparison |
| 6.4 | Experimental measurements |
| 6.4.1 | Measurement setup |
| 6.4.2 | Results |
| 6.5 | Summary and conclusions |
| 7 | Conclusions |
| 7.1 | Main contributions |
| 7.2 | Future work |
| A | Appendix A: Photos |
| B | Appendix B: Statistics |
|   | Bibliography |

# LIST OF FIGURES

| Figure | Caption |
|--------|---------|
| Figure 1.1 | Scaling trend of logic and interconnect delay [30] |
| Figure 2.1 | Comparison of 2D/3D integration |
| Figure 2.2 | System-in-a-package |
| Figure 2.3 | Package-on-package |
| Figure 2.4 | Imec's 3-D SIP wireless bio-electronic sensor [67] |
| Figure 2.5 | 3D wafer-level-packaging |
| Figure 2.6 | 3D-stacked IC |
| Figure 2.7 | TSV process steps |
| Figure 2.8 | Before thinning/after thinning |
| Figure 2.9 | TSV stacking |
| Figure 2.10 | Conventional AND gate |
| Figure 2.11 | CMOS switching |
| Figure 2.12 | Adiabatic switching |
| Figure 2.13 | Basic 2N-2P adiabatic circuit |
| Figure 2.14 | 2N-2P timing |
| Figure 2.15 | Adiabatic driver |
| Figure 2.16 | Stepwise charging generator |
| Figure 2.17 | 1N power clock generator |
| Figure 2.18 | Design flow |
| Figure 2.19 | Inverter in Virtuoso schematic |
| Figure 2.20 | Inverter in Virtuoso layout |
| Figure 3.1 | Isolated 2D wire capacitance |
| Figure 3.2 | TSV parasitic capacitance |
| Figure 3.3 | Capacitance vs. gate voltage (CV) diagram of a MOS capacitor |
| Figure 3.4 | CBCM pseudo-inverters |
| Figure 3.5 | Current as a function of frequency in CBCM |
| Figure 3.6 | CBCM pseudo-inverter pair in layout |
| Figure 3.7 | PPC1 test pad module (layout) |
| Figure 3.8 | PPC1 test pad module (micrograph) |
| Figure 3.9 | Structure 'A' - Single TSV capacitance (TOP die) |
| Figure 3.10 | Structure 'B' - Single TSV capacitance (BOTTOM die) |
| Figure 3.11 | Structure 'C' - Variable pitch TSV capacitance |
| Figure 3.12 | Structure 'D' - TSV capacitance measurement with interconnection on TOP or BOTTOM die |
| Figure 3.13 | Structure 'R' - Evaluation of TSV capacitance measurement accuracy |
| Figure 3.14 | Structure 'E' - TSV capacitance effect on signal propagation delay (layout) |
| Figure 3.15 | Structure 'E' - TSV capacitance effect on signal propagation delay (schematic) |
| Figure 3.16 | Measurement setup |
| Figure 3.17 | TSV capacitance D2D variability on wafer 'D11' - Probability density function |
| Figure 3.18 | TSV capacitance D2D variability on wafer 'D18' - Probability density function |
| Figure 3.19 | TSV capacitance measurement results on wafer D11 |
| Figure 3.20 | TSV capacitance measurement results on wafer D18 |
| Figure 3.21 | Simulated oxide liner thickness variation |
| Figure 3.22 | TSV capacitance W2W variability - Probability density functions |
| Figure 3.23 | Variable pitch TSV capacitance experimental data from wafer 'D18' - Probability density function |
| Figure 3.24 | LCR experimental data from wafer 'D18' - Probability density function |
| Figure 4.1 | TSV thermomechanical stress simulation [40] |
| Figure 4.2 | TSV thermomechanical stress interaction between two TSVs [40] |
| Figure 4.3 | Estimated mobility shift on PMOS (a) and NMOS (b) transistors in between two TSVs separated by distance d [65] |
| Figure 4.4 | PTP1 test pad module (layout) |
| Figure 4.5 | PTP1 test pad module (micrograph) |
| Figure 4.6 | MOSFET dimensions and rotations |
| Figure 4.7 | 1 TSV MOSFET array configuration |
| Figure 4.8 | 4 diagonal TSVs MOSFET array configuration |
| Figure 4.9 | 4 lateral TSVs MOSFET array configuration |
| Figure 4.10 | 9 TSVs MOSFET array configuration |
| Figure 4.11 | MOSFET array detail |
| Figure 4.12 | Digital selection logic |
| Figure 4.13 | MOSFET array with digital selection logic |
| Figure 4.14 | 4-bit Gate-enable/Array-enable decoder |
| Figure 4.15 | Transmission gate for MOSFET Gate terminals |
| Figure 4.16 | Transmission gates for MOSFET Drain/Source terminals |
| Figure 4.17 | 2-bit Drain-select decoder |
| Figure 4.18 | Processing of measurement results |
| Figure 4.19 | Interpolation of measurement results |
| Figure 4.20 | Measurement setup |
| Figure 4.21 | Normalised on-current for PMOS structure "F0" (D08) |
| Figure 4.22 | Normalised on-current for PMOS structure |
| Figure 4.23 | Normalised on-current for PMOS structure "G0" (D08, Y=14, X=16) |
| Figure 4.24 | Normalised on-current for PMOS structure |
| Figure 4.25 | Normalised on-current for PMOS structure "G0" (D20, Y=14, X=16) |
| Figure 4.26 | Normalised on-current for PMOS structure "G0" (D08, 60°C) | 68 |
| Figure 4.27 | Normalised on-current for PMOS structure "G0" (D08, Y=14, X=16, 60°C) | 69 |
| Figure 4.28 | Normalised on-current for PMOS structure "I0" (D08) | 69 |
| Figure 4.29 | Normalised on-current for PMOS structure "I0" (D08, X=16) | 70 |
| Figure 4.30 | Normalised on-current for PMOS structure "I0" (D08, Y=14) | 70 |
| Figure 4.31 | Normalised on-current for PMOS structure "G90" (D08) | 71 |
| Figure 4.32 | Normalised on-current for PMOS structure "G90" (D08, Y=14, X=16) | 71 |
| Figure 4.33 | Normalised on-current for PMOS structure "B0" (D08) | 72 |
| Figure 4.34 | Normalised on-current for PMOS structure "B0" (D08, Y=14, X=16)
                                                                | 72         |
| Figure 5.1   | TSV energy dissipation per cycle increase with 3D integration density. | 76 |
| Figure 5.2   | Comparison of a typical CMOS TSV driver to the proposed energy-recovery. | 77 |
| Figure 5.3   | Adiabatic driver | 78 |
| Figure 5.4   | A simple pulse-to-level converter (P2LC) | 78 |
| Figure 5.5   | Resonant pulse generator with dual-rail data encoding | 79 |
| Figure 5.6   | Energy recovery scheme for TSV interconnects. | |
| Figure 5.7   | | 83 |
| Figure 5.8   | Test case for the evaluation of the theoretical model | 84 |
| Figure 5.9   | Energy dissipation results (<i>fJ/cycle</i>), theoretical model (1) versus SPICE simulator | 85 |
| Figure 5.10  | Energy dissipation estimation error of the                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
                                                                |            |
|              | theoretical model compared to the SPICE                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
                                                                | 0          |
|              | simulator.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
                                                                | 85         |
| Figure 5.11  | Test case for comparison of energy-recovery                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
                                                                |            |
|              | to CMOS                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
                                                                | 86         |
| Figure 5.12  | Improved P2LC design.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
                                                                | 87         |

| Figure | Caption |
|---|---|
| Figure 5.13 | Simulated energy dissipation of improved P2LC design. |
| Figure 5.14 | Energy dissipation reduction achieved by the energy-recovery circuit compared to CMOS for variable Q factor ($C = 80\,fF$). |
| Figure 5.15 | Energy contribution of energy-recovery circuit components at $200\,MHz$ ($C = 80\,fF$). |
| Figure 5.16 | Energy contribution of energy-recovery circuit components at $800\,MHz$ ($C = 80\,fF$). |
| Figure 5.17 | Energy dissipation reduction achieved by the energy-recovery circuit compared to CMOS with variable TSV capacitance ($C = 80\,fF$, $f = 500\,MHz$). |
| Figure 5.18 | Energy dissipation reduction achieved by the energy-recovery circuit compared to CMOS ($C = 80\,fF$, $f = 200\,MHz$). |
| Figure 5.19 | Energy dissipation reduction achieved by the energy-recovery circuit compared to CMOS for variable technology ($C = 80\,fF$, $f = 500\,MHz$). |
| Figure 6.1 | Demonstrator circuit structure (showing 1-bit input/output (I/O) channel, face-down view). |
| Figure 6.2 | Energy-recovery I/O circuit schematic diagram. |
| Figure 6.3 | Adiabatic driver (layout/schematic view). |
| Figure 6.4 | P2LC (layout/schematic view). |
| Figure 6.5 | Data generator (layout/schematic view). |
| Figure 6.6 | Timing constraints for the energy-recovery demonstrator circuit. |
| Figure 6.7 | Pulse generator (layout/schematic view). |
| Figure 6.8 | Pulse generator input/output signals. |
| Figure 6.9 | Inductor's Q factor simulation in ASITIC / Estimated energy dissipation reduction per cycle over standard CMOS. |
| Figure 6.10 | Planar square spiral inductor designed in ASITIC. |
| Figure 6.11 | Resonant pulse distribution tree. |
| Figure 6.12 | Interdigitated metal-insulator-metal capacitor (fringe capacitor). |
| Figure 6.13 | Energy-recovery I/O circuit in layout view. |
| Figure 6.14 | CMOS I/O circuit schematic diagram. |
| Figure 6.15 | CMOS driver schematic. |
| Figure 6.16 | CMOS I/O circuit in layout view. |
| Figure 6.17 | Post-layout SPICE simulation of the energy-recovery circuit. |
| Figure 6.18 | Energy-recovery I/O circuit micrograph. |

| Figure | Caption |
|---|---|
| Figure 6.19 | CMOS I/O circuit micrograph. |
| Figure 6.20 | Measurement setup. |
| Figure 6.21 | Oscilloscope output for signals S1B and OCMOS. |
| Figure 6.22 | Measured/simulated power dissipation for the CMOS circuit. |
| Figure 6.23 | Measured/simulated energy dissipation for the CMOS circuit. |
| Figure 6.24 | Oscilloscope output for signals S1A and OER. |
| Figure 6.25 | Measured/simulated energy dissipation for the energy-recovery circuit. |
| Figure 6.26 | Energy-recovery I/O circuit with fault hypothesis. |
| Figure 6.27 | Simulation of circuit with fault hypothesis. |
| Figure 6.28 | Measured energy dissipation and simulation of circuit with fault hypothesis. |
| Figure A.1 | 3D65 300mm wafer (on chuck). |
| Figure A.2 | Probe card. |
| Figure A.3 | 3D130 200mm wafer (on chuck). |
| Figure A.4 | Probe head. |

| Table | Caption |
|---|---|
| Table 2.1 | 3D interconnect technologies comparison. |
| Table 2.2 | Technology parameters for Imec 3D130 process. |
| Table 2.3 | Technology parameters for Imec 3D65 process. |
| Table 2.4 | Comparison energy-recovery/conventional CMOS. |
| Table 3.1 | Test structures on the PRC1 module. |
| Table 3.2 | PRC1 test pad instrument connections. |
| Table 3.3 | TSV capacitance D2D variability on wafer 'D11'. |
| Table 3.4 | TSV capacitance D2D variability on wafer 'D18'. |
| Table 3.5 | Variable pitch TSV capacitance experimental data from wafer 'D18'. |
| Table 3.6 | Comparison between CBCM/LCR measurements on wafer 'D18'. |
| Table 4.1 | Coefficients of thermal expansion. |
| Table 4.2 | MOSFET array configurations for test pad modules PTP1/PTP2. |
| Table 4.3 | MOSFET array configurations selected for experimental evaluation. |
| Table 4.4 | PTP1/PTP2 test pad module instrument connections. |
| Table 4.5 | Summary of experimental measurements on long (L) and short (S) channel PMOS devices. |
| Table 5.1 | Energy dissipation results (*fJ/cycle*), theoretical model (T) versus SPICE simulation (S). |
| Table 6.1 | Energy-recovery I/O circuit parameters. |
| Table 6.2 | Calculated resistance tolerances for each wire branch in the resonant clock distribution tree. |
| Table 6.3 | Energy-dissipation theoretical calculations for $Q = 17$, $f = 500\,MHz$. |
| Table 6.4 | Post-layout simulated energy dissipation / Area requirements. |
| Table 6.5 | Energy-recovery circuit instrument connections. |
| Table 6.6 | CMOS circuit instrument connections. |

# ACRONYMS


- 2N-2P 2 nMOS/2 pMOS adiabatic logic
- 3D-IC three-dimensional integrated circuit
- 3D-SIC 3D stacked-IC
- 3D-SIP 3D system-in-package
- 3D three-dimensional
- 3D-WLP 3D wafer-level-packaging
- BCB benzocyclobutene
- BEOL back-end-of-line
- CAL clocked adiabatic logic
- CBCM charge-based capacitance measurement
- CMOS complementary metal-oxide-semiconductor
- CMP chemical-mechanical planarization
- CTE coefficient of thermal expansion
- Cu copper
- CVD chemical vapor deposition
- D2D die-to-die
- D2W die-to-wafer
- DCVSL differential cascode voltage switch logic
- DRC design rule check
- DUT device under test
- ECRL efficient charge recovery logic
- EDA electronic design automation
- FEM finite element method
- FEOL front-end-of-line
- GPIB general purpose interface bus
- IC integrated circuit

- I/O input/output
- KOZ keep-out-zone
- LC inductance-capacitance
- LVS layout versus schematic
- MEMS microelectromechanical systems
- MIM metal-insulator-metal
- MOSFET metal-oxide-semiconductor field-effect transistor
- MOS metal-oxide-semiconductor
- MPU microprocessor unit
- MuGFET multiple gate field-effect transistor
- NMOS n-channel MOSFET
- P2LC pulse-to-level converter
- PAL pass-transistor adiabatic logic
- PCB printed circuit board
- PCG power-clock generator
- PMD pre-metal dielectric
- PMOS p-channel MOSFET
- PVT process, voltage, temperature
- QSERL quasi-static energy recovery logic
- RC resistance-capacitance
- RDL re-distribution layer
- RLC resistance-inductance-capacitance
- RMS root mean square
- SIP system-in-a-package
- Si silicon
- SMU source measurement unit
- SoC system-on-chip
- TaN tantalum nitride
- TSV through-silicon via
- W tungsten

- W2W wafer-to-wafer
- WLP wafer-level-packaging

# INTRODUCTION

Extending integrated circuit (IC) functionality while keeping cost per silicon area approximately constant has historically been the fundamental factor behind the growth of the semiconductor industry. Ever since Moore's observation [51] in 1965, chip functionality has been doubling every 1.5-2 years, advancing from the Intel 4004 with 2,000 transistors to modern generation microprocessor units (MPUs) that integrate over 1 billion transistors [35]. While there are indications that this trend might be slowing down due to the physical limitations of planar metal-oxide-semiconductor field-effect transistors (MOSFETs) [22], innovative solutions such as multiple gate field-effect transistors (MuGFETs) continue to drive the transistor roadmap forward, as was recently demonstrated by Intel on their 22*nm* node MPU [12]. It can be reasonably expected that IC transistor density will continue to increase well into the next decade, reaching 10 or even 100 billion transistors per chip [4].

As transistor gate length, dielectric thickness, and junction depth are scaled, performance in terms of speed and power improves as well. The performance requirements for ICs have been steadily increasing for succeeding technology generations, driven by market demand. Although transistor scaling has traditionally been the main technology focus determining both the cost and the performance of ICs, in recent years the performance gains enabled by scaling have been increasingly challenged by on-chip interconnects.

In an IC, interconnects are the conducting wires connecting devices and functional blocks in the circuit. With smaller transistors and increased functionality in modern processes, interconnects have become progressively more complex, occupying up to 12 metal layers in the chip. As wire length increased, so did the signal delay time and power consumption of interconnects, which are typically determined by the intrinsic resistance-capacitance (RC) parasitics of the wire. It was estimated that a 130*nm* microprocessor consumed over 50% of its dynamic power on interconnects [41], with a projection that in future designs up to 80% of microprocessor power would be consumed by interconnects [3]. Likewise, the signal propagation delay of global interconnects as a portion of the total system delay has been rapidly increasing for succeeding technology nodes (Figure 1.1).
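The dependence of wire delay on length can be illustrated with a minimal Python sketch. It assumes the standard Elmore approximation for a uniform distributed RC line, $t \approx 0.5\,r\,c\,L^2$. The per-unit-length resistance and capacitance values are hypothetical placeholders chosen only for illustration and are not taken from any process discussed in this thesis.

```python
def distributed_rc_delay(r_per_um, c_per_um, length_um):
    """Elmore delay (seconds) of a uniform distributed RC wire: 0.5 * r * c * L^2."""
    return 0.5 * r_per_um * c_per_um * length_um ** 2


# Hypothetical per-unit-length wire parasitics (assumed, not process data).
R_PER_UM = 1.0       # ohm per micrometre
C_PER_UM = 0.2e-15   # farad per micrometre

for length_um in (100, 1_000, 10_000):
    delay = distributed_rc_delay(R_PER_UM, C_PER_UM, length_um)
    print(f"{length_um:>6} um wire: {delay * 1e12:.3f} ps")
```

Under this model the delay grows with the square of the wire length, so a tenfold longer wire is roughly a hundred times slower, which is why long global interconnects account for a growing share of the total system delay.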

Figure 1.1: Scaling trend of logic and interconnect delay [30].

The rapid increase of the interconnect impact on IC performance prompted the ITRS<sup>1</sup> to designate the realisation of novel and innovative interconnect technologies as one of the "Grand Challenges" in 2007 [3]. Since then, several technologies have been investigated to address the interconnect bottleneck, such as copper/low-k interconnects, optical interconnects, and three-dimensional (3D) interconnects. Although copper/low-k interconnects are more compatible with current processes and remain the most attractive option in the near term, 3D interconnects are probably the most realistic option for meeting future technology trends [53].

<sup>1</sup> International Technology Roadmap for Semiconductors

Emerging 3D integration technologies based on vertical through-silicon vias (TSVs) promise a solution to the interconnect performance bottleneck, along with reduced fabrication cost and heterogeneous integration. TSVs are electrical connections passing through the silicon substrate and interconnecting separate dies stacked vertically in a 3D package [69]. While there are several 3D interconnect technologies [11], TSVs have the potential to offer the best all-round 3D integration technology, with advantages that include reduced parasitics and high interconnection density.

#### 1.1 RESEARCH GOALS AND CONTRIBUTION

As TSVs are a relatively recent 3D interconnect technology, there are still challenges to be overcome before they can be widely adopted and commercialised. Fabrication issues in steps such as etching, liner deposition, metallization, and chemical-mechanical planarization (CMP) are in the process of being resolved, or solutions have already been found [55]. Consequently, the next logical step is the implementation of three-dimensional integrated circuits (3D-ICs) with more complex functionalities than simple test chips. However, complex circuit design requires design methodologies and tools that, for 3D-ICs, are still in their infancy [18].

The motivation behind this project was to accelerate the development of such design methodologies and tools for future 3D-IC designs, by addressing key concerns of the TSV technology primarily from the circuit designer's perspective. The research effort was aimed at developing characterization structures for evaluating TSV characteristics affecting signal propagation delay, power consumption, and circuit robustness, as well as proposing design techniques for overcoming limitations. Although TSVs are interconnects, their behaviour resembles that of MOSFET devices more than that of the commonly used back-end-of-line (BEOL) metal wires, while their integration into silicon introduces new challenges for ICs. Consequently, under certain conditions conventional design approaches are sufficient neither for TSV characterization nor for TSV interconnection.

The TSV parasitic capacitance and thermomechanical stress distribution are critical TSV characteristics. The parasitic capacitance affects both delay and power consumption of signals passing through the TSV interconnect, while thermomechanical stress affects MOSFET device mobility in the vicinity of TSVs. Accurately determining these parameters is necessary for generating design rules and models that can be used in TSV-aware design tools. As part of this project, sophisticated test structures were developed and fabricated on a 65*nm* 3D process for extracting the TSV parasitic capacitance and thermomechanical stress distribution.

The TSV parasitic capacitance is typically small; however, it may still become an important source of power consumption as TSVs are gradually implemented in large, densely interconnected 3D system-on-chips (SoCs). Although conventional low-power design techniques based on voltage scaling can be used, this represents a technical challenge in modern technology nodes. In this work, a novel TSV interconnection scheme is proposed, which is based on energy-recovery logic and shows frequency-dependent power consumption. The scheme is analysed using theoretical modelling, while a demonstrator IC was designed based on the developed theory and fabricated on a 130 nm 3D process.

The work presented in this thesis was part of a larger research project undertaken at Imec<sup>2</sup>, with the goal of improving and optimising TSV-based 3D integration technology. The author's contribution within this project was to propose, design and simulate the circuits presented in the following chapters, as well as to perform and analyse the experimental measurements.

#### **1.2 THESIS OUTLINE**

This thesis is organised into seven chapters and two appendices as follows:

<sup>2</sup> Imec is a micro- and nanoelectronics research center located in Leuven, Belgium (http://www.imec.be/)

Chapter 2 (Background) introduces the basic concepts comprising the fields of 3D integration and energy-recovery logic. In addition, the design flow used for implementing the design part of this work is described.

Chapter 3 (Characterization of TSV capacitance) presents a group of test structures for characterizing the TSV parasitic capacitance under various conditions. The test structures are fabricated on a 65 nm 3D process, and the measurement data are analysed using statistical methods.

Chapter 4 (TSV proximity impact on MOSFET performance) presents a test structure for monitoring the impact of TSV-induced thermomechanical stress on MOSFET device performance. The test structure uses proven design techniques for implementing a large number of test-cases and accessing numerous devices with maximum precision. The structure is fabricated and used for characterizing thermomechanical stress on a 65 nm 3D process.

Chapter 5 (Evaluation of energy-recovery for TSV interconnects) investigates the potential of the energy-recovery technique for reducing the energy dissipation of TSV interconnects in 3D-ICs. The total energy dissipation per cycle and optimum device sizing are extracted for the proposed scheme using theoretical modelling, while the configuration is evaluated against conventional static complementary metal-oxide-semiconductor (CMOS) design.

Chapter 6 (Energy-recovery 3D-IC demonstrator) elaborates on the design of a 3D demonstrator circuit based on the energy-recovery scheme, under realistic physical and electrical constraints. Using post-layout simulations, the demonstrator is compared to a CMOS circuit designed with identical specifications. Both circuits are fabricated on a 130 nm 3D technology process and evaluated based on the experimental measurements.

Chapter 7 (Conclusions) summarises the contributions of this thesis as presented in the previous chapters, and discusses areas of future work.

Appendix A includes photos used as reference in various parts of this work.

Appendix B elaborates on the statistical descriptors used in this work for analysing experimental measurement data.

# 2 BACKGROUND

The work presented in this thesis contributes to the fields of 3D integration and energy-recovery logic. This chapter introduces the basic concepts comprising these two fields, not in extensive detail, but enough for the reader to understand the motivation and the choices made in the context of this work. Furthermore, the full-custom design flow that was used to implement the design part of this thesis, as presented in the following chapters, is discussed.

### 2.1 3D INTEGRATED CIRCUITS

CMOS integration technologies are traditionally based on two-dimensional (2D) planar architectures. Each die is individually packaged and placed on a printed circuit board (PCB), which is used to provide mechanical support and interconnection between different dies and electronic components. A 3D-IC is a stack of multiple dies (Figure 2.1), vertically integrated and interconnected, sharing a common package.

There are several motivations for the development of 3D-ICs. Perhaps the most significant advantage of 3D integration is the reduced interconnection delay and energy dissipation. In general, global interconnect delay and energy are linearly related to wire length in an IC. Modelling of simple 3D-IC architectures has demonstrated that 3D integration may achieve up to a 50% reduction in global interconnect length [24], with comparable reductions in delay and energy.

Another advantage of 3D integration is heterogeneous integration. Presently, 2D planar architectures limit designers to a single fabrication technology for both the analog and digital parts of



Figure 2.1: Comparison of 2D/3D Integration.



Figure 2.2: System-in-a-package.

Figure 2.3: Package-on-package.

a circuit. The common practice is to use an inexpensive digital process for the complete design, which is usually inefficient for implementing analog circuits. In contrast, a 3D-IC allows the most suitable process technology to be used for implementing each part of the design. In addition, other components may be integrated in the same circuit, such as microelectromechanical systems (MEMS) [80] and passive devices [66] (resistors, capacitors, inductors), enabling ICs with novel functionalities.

The replacement of long horizontal global interconnects with short vertical ones also allows for reduced die size, which is directly related to the fabrication cost. In addition, the vertical integration of dies in the same package removes the need for the spacing overhead between components that is necessary in PCBs, increasing the area efficiency and reducing the form factor.

#### 2.1.1 3D integration approaches

3D integration may be achieved using several microelectronic technologies, which result in different densities and capabilities. The levels of 3D integration can be reasonably classified into three major technology platforms [11, 4], based on the interconnect wiring hierarchy and the industrial infrastructure:

- 3D system-in-package (3D-SIP): Stacking of individual packages.
- 3D wafer-level-packaging (3D-WLP): Stacking of embedded dies by wafer-level-packaging (WLP).
- 3D stacked-IC (3D-SIC): TSV-based vertical system integration.

The 3D-SIP integration is based on traditional packaging interconnect technologies. Each sub-system is integrated in a system-in-a-package (SIP) fashion (Figure 2.2) using wire-bonds for interconnection, and subsequently each SIP subsystem is assembled vertically to create package-on-package 3D stacks (Figure 2.3).

3D-SIP is currently the most mature technology and has already reached volume production. It provides increased functionality in a smaller form factor and at low cost. However, interconnect density is limited, since both wire bond pads and solder bumps



Figure 2.4: Imec's 3-D SIP wireless bio-electronic sensor [67].



Figure 2.5: 3D wafer-level-packaging.

take up a large area on silicon due to their large pitch. The long bond wire length can also significantly distort signals due to its intrinsic parasitics. Hence, this approach offers limited performance when compared to other 3D interconnect solutions. An example of a 3D-SIP application is the wireless bio-electronic sensor (Figure 2.4) developed at Imec [67].

3D-WLP integration is based on WLP technology, directly stacking individual dies face-to-face using flip-chip micro-bumps for interconnection. The 3D interconnects are processed after IC passivation using WLP techniques and may also be realised using TSVs (Figure 2.5). The TSVs, if present, are much larger than the equivalent interconnections in a 3D-SIC, typically in the range of $20$ to $40\,\mu m$ (compared with $1$ to $5\,\mu m$ in 3D-SIC) [11].

The 3D-SIC technology [69] is similar to a 2D SoC approach, except that circuits exist physically on different layers and vertical interconnects replace the global interconnections of the SoC. 3D-SICs (Figure 2.6) are based on TSV interconnections and have the potential to offer the best all-around 3D integration technology. The advantages of TSV-based 3D-SICs include the short interconnect length (reduced parasitics) and the high interconnection density. However, the additional process steps required for implementing the TSVs increase manufacturing cost.

A comparison of the discussed 3D interconnect technologies is given in Table 2.1.





| 3D interconnect technology | Advantages | Disadvantages |
|----------------------------|------------|---------------|
| 3D system-in-package (3D-SIP) | Simple, existing technology, best heterogeneous integration | Performance (speed, power), low density, manufacturing cost in high quantities |
| 3D wafer-level-packaging (3D-WLP) | Existing technology, good compromise between performance and manufacturing cost | Average performance and density |
| 3D stacked-IC (3D-SIC) | Performance (speed, power), high density, manufacturing cost in high quantities | Manufacturing cost in low quantities, not ready for manufacturing |

Table 2.1: 3D interconnect technologies comparison.



Figure 2.7: TSV process steps.

#### 2.1.2 TSV interconnects

Of particular interest to this PhD thesis is the 3D-SIC approach developed at Imec and specifically the TSVs used for vertical interconnection.

In Imec's 3D-SIC process [69, 73], TSVs are created prior to the BEOL processing. The TSV process steps are illustrated in Figure 2.7. After processing of the CMOS front-end-of-line (FEOL) and pre-metal dielectric (PMD) stack, the TSV hole is plasma-etched with a diameter of $5\,\mu m$ and a minimum pitch of $10\,\mu m$. A thin chemical vapor deposition (CVD) oxide layer (~200 nm) is then deposited for insulating the TSVs from the substrate. The metallization sequence consists of a tantalum nitride (TaN) barrier deposition and filling of the TSV hole with electroplated copper (Cu). Finally, the Cu overburden is removed in a top-side CMP step and a standard 2-metal layer BEOL process is applied to finalise the die.

Before stacking, the wafer is mounted on a temporary carrier and thinned down to a silicon (Si) thickness of ~$25\,\mu m$, leaving the TSV Cu exposed on the back-side of the wafer (Figure 2.8). After dicing, the dies are stacked by Cu-Cu thermocompression in a die-to-die (D2D) or die-to-wafer (D2W) fashion (Figure 2.9) using a tacky polymer as an intermediate glue layer, such as benzocyclobutene (BCB) ($\epsilon_r = 2.7$). Target technology parameters for Imec's 3D-SIC 130 nm and 65 nm processes are summarised in Tables 2.2 and 2.3 respectively.

As TSV is a relatively new 3D interconnect technology, there are several challenges to be overcome before TSV technology can be widely adopted and commercialised. Fabrication issues in steps such as etching, liner deposition, metallization, and CMP are in the process of being resolved, or solutions have already been found [55]. In contrast, design methodologies for TSV-based ICs are still in their infancy [18].



Figure 2.8: Before thinning/after thinning.



Figure 2.9: TSV stacking.

| Parameter                          | Value |
|------------------------------------|-------|
| Minimum device channel length (nm) | 130   |
| Supply voltage (V)                 | 1.2   |
| Routing layers                     | 2     |
| TSV diameter (µm)                  | 5     |
| TSV pitch (µm)                     | 10    |
| TSV length (µm)                    | 22    |
| TSV resistance (mΩ)                | ~20   |
| TSV capacitance (fF)               | ~40   |
| Wafer diameter (mm)                | 200   |

| Parameter                          | Value |
|------------------------------------|-------|
| Minimum device channel length (nm) | 70    |
| Supply voltage (V)                 | 1.0   |
| Routing layers                     | 2     |
| TSV diameter (µm)                  | 5     |
| TSV pitch (µm)                     | 10    |
| TSV length (µm)                    | 40    |
| TSV resistance (mΩ)                | ~40   |
| TSV capacitance (fF)               | ~80   |
| Wafer diameter (mm)                | 300   |

Table 2.2: Technology parameters for Imec 3D130 process.

Table 2.3: Technology parameters for Imec 3D65 process.

Approaching the technology challenges from the designer's perspective, we can emphasise the following issues: design tools, thermomechanical stress, thermal analysis, electrical characteristics, and testing.

#### Design tools

Although it is possible to design 3D-ICs using existing tools by utilising ad hoc techniques, as 3D implementations become more advanced, "native" 3D design and analysis tools will become necessary. A few physical layout tools are currently available for TSV-based design [2]; however, they do not offer automatic partitioning, placement, and routing. Moreover, there are no commercial tools available for evaluating performance (speed, power), reliability, and manufacturability in 3D-IC designs.

#### Thermomechanical stress

TSV fill materials may include copper (Cu), tungsten (W), or polysilicon [20]. The large mismatch in the coefficient of thermal expansion (CTE) between a Cu TSV fill ($17 \cdot 10^{-6}/K$) and the Si surface ($3 \cdot 10^{-6}/K$) induces mechanical stress around the vias. As Cu contracts much faster than Si, it pulls the surface of the surrounding bulk while cooling down, causing tensile stress in the area [25]. Since even low stress values affect carrier mobility in MOSFET devices [71], a performance shift may occur in transistors in proximity to TSVs.
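As a rough first-order illustration of the CTE mismatch above, the unconstrained thermal mismatch strain between Cu and Si can be estimated as follows. The temperature excursion $\Delta T$ is not given in the text; the value of 300 K used here is purely hypothetical, and the actual stress field depends on geometry and process and has to be obtained from simulation or measurement.

$$\varepsilon_{th} = (\alpha_{Cu} - \alpha_{Si})\,\Delta T = (17 - 3) \cdot 10^{-6}/K \cdot 300\,K \approx 4.2 \cdot 10^{-3}$$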

The intensity of TSV-induced stress depends on various conditions, such as TSV geometry, device proximity, and process characteristics. Defining design rules, such as keep-out-zones (KOZs), for digital and analog circuits and generating stress-effect models is an important milestone for implementing TSV-aware circuits with predictable performance. Simulations predict a complex distribution of thermomechanical stress [40], which emphasises the need for sophisticated characterization structures that can monitor stress impact with precision and verify simulation models. In Chapter 4, a characterization structure is presented for evaluating the impact of TSV-induced thermomechanical stress on MOSFET device performance.

#### Thermal analysis

The close proximity of dies in a 3D-IC vertical stack increases the heat density proportionally to the number of stacked dies. As temperature affects transistor parameters, such as leakage current, accurate thermal analysis is required for 3D-ICs to ensure that design specifications are met. Presently, tools for thermal analysis of planar ICs are available; however, they have to be adapted for use with complex 3D systems, so as to reduce run time and improve the accuracy of TSV models [18]. In addition, "thermally-aware" design tools are required, which can partition the design in such a way that components generating a lot of heat are placed on layers close to the heat-sink.

#### Electrical characteristics

The electrical characteristics of TSVs, such as delay, crosstalk, signal integrity (SI), and power integrity (PI), dictate many aspects of a 3D-IC design. Proper evaluation of TSV characteristics, typically by means of simulation, is critical for accurately predicting the performance of a 3D system [57].

Several attempts have been made to generate models of the TSV structure [77, 27], using its intrinsic resistance-inductance-capacitance (RLC) parasitics in an equivalent circuit model. The RLC parasitics depend on the materials and geometry of the TSV structure, while prior work has indicated that these parameters are relatively small, with the capacitance assuming the dominant role in the TSV electrical properties [27]. For verifying the accuracy of the models and calibrating them, experimental measurements are required using compatible characterization structures. Such structures, which were used for extracting the TSV parasitic capacitance of an experimental 3D process, are presented in Chapter 3.
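To give a feel for how geometry sets the capacitance, the sketch below estimates only the oxide-liner contribution using the textbook coaxial-capacitor formula. This is an illustrative assumption and not one of the cited models [77, 27]: the relative permittivity of 3.9 for the oxide, the reading of the 5 µm diameter as the conductor diameter, and the function name are all hypothetical choices. Geometry values are taken from Section 2.1.2 and Tables 2.2 and 2.3.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity in F/m


def tsv_oxide_capacitance(diameter, liner_thickness, length, eps_r=3.9):
    """Capacitance (F) of a cylindrical oxide liner around a TSV conductor.

    Uses the coaxial-capacitor formula. eps_r = 3.9 (SiO2) is an assumed value.
    """
    r_inner = diameter / 2.0
    r_outer = r_inner + liner_thickness
    return 2.0 * math.pi * EPS0 * eps_r * length / math.log(r_outer / r_inner)


# 5 um diameter, ~200 nm liner, TSV lengths of 22 um (3D130) and 40 um (3D65)
c_130 = tsv_oxide_capacitance(5e-6, 200e-9, 22e-6)
c_65 = tsv_oxide_capacitance(5e-6, 200e-9, 40e-6)
print(f"3D130: {c_130 * 1e15:.0f} fF, 3D65: {c_65 * 1e15:.0f} fF")
```

Under these assumptions the estimate comes out somewhat above the target values of Tables 2.2 and 2.3. One plausible reason is that the oxide capacitance is only an upper bound: a depletion region in the surrounding silicon would add a series capacitance that lowers the total.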

#### Testing

Testing ICs for defects is crucial for guaranteeing sufficient manufacturing quality. The unique processing steps of TSVs, such as wafer thinning, alignment, and bonding, introduce new types of defects that should be considered during testing [46]. Furthermore, finding and diagnosing defects in TSV interconnections is considerably more challenging than in 2D interconnects, considering that it is not possible to gain physical access to the vias after the top die has been placed. All these factors emphasise the need for new test structures and methodologies for testing 3D systems.

### 2.2 ENERGY-RECOVERY LOGIC

TSV 3D interconnect technology aims to reduce delay and energy dissipation when compared to the traditionally used planar BEOL interconnections. A key TSV characteristic is its parasitic capacitance, discussed in Chapter 3, which is typically small (Tables 2.2 and 2.3) and influences both delay and energy dissipation proportionally. Although TSV technology offers significant performance improvements over previous solutions, a low-power interconnection scheme may still be desirable under certain conditions due to specific characteristics of TSV technology:

1. The TSV capacitance may become a concern in future large, densely interconnected 3D-SoCs, as it increases linearly with the number of tiers and interconnections.
2. Since interconnects do not follow transistor scaling, transistor gate capacitance will always be smaller than interconnect capacitance. With the gap tending to widen in future technologies, it can be reasonably expected that 3D interconnects will become a performance bottleneck in ICs, just as the BEOL interconnections preceding them.

Furthermore, conventional CMOS low-power design techniques based on voltage scaling represent a technical challenge in modern technology nodes, as process, voltage, temperature (PVT) variations and increased leakage [61] in practice decrease the range over which voltages may be scaled. An interesting low-power design approach is energy-recovery logic, which demonstrates frequency-dependent energy dissipation. Energy-recovery logic is a design technique for minimising the energy dissipation of computing devices well below typical CMOS limits. Its major limitation is the requirement for passive components to generate time-variable signals, which cannot be efficiently integrated into current generation ICs. TSV-based 3D stacking and heterogeneous integration of high-quality passive devices [66] may enable the use of novel design techniques, such as energy-recovery, that



Figure 2.10: Conventional AND gate.

were not practically feasible in the past due to technology limitations. In this PhD thesis, energy-recovery is used for driving TSV-based interconnections in a low-power scheme discussed in Chapters 5 and 6.

Historically, energy-recovery logic has its roots in reversible computing. Reversibility is a concept from thermodynamics describing physical processes that, after they have taken place, can be reversed without loss or dissipation of energy. Reversible processes cause no change in the entropy of the system or its surroundings (they do not generate any waste heat), and therefore achieve maximum energy efficiency. The concept of reversibility in thermodynamics may be extended to logical reversibility, if a computation can be implemented in such a way that it is always possible to determine the previous state of the computation given a description of its current state. In practical terms, logical reversibility requires that all information about the state of the computation, at any stage, is retained and never discarded.

Yet, the typical computing device is not reversible (it is irreversible) and, as Landauer [36] has shown, there is a fundamental minimum amount of energy that must be dissipated in such devices. This minimum energy dissipation is known as the von Neumann-Landauer limit and is at least $kT \ln 2$ per irreversible bit computation, where $k$ is the Boltzmann constant and $T$ the absolute temperature. The irreversible operation of conventional computing can be demonstrated using the AND gate as an example (Figure 2.10). The AND gate has two input lines and one output. When both inputs are at logical 1, the output is 1; the output is 0 for the remaining three input combinations, each of which has a logical 0 on at least one input. Hence, every time the gate's output becomes 0, there is loss of information, since we cannot determine exactly which input state caused the logical 0 at the output.
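The information loss of the AND gate and the magnitude of the limit can be illustrated with a small Python sketch. The temperature of 300 K and the function name are assumptions made for illustration only.

```python
import math
from itertools import product

K_B = 1.380649e-23  # Boltzmann constant in J/K


def landauer_limit(temperature_k):
    """Minimum energy (J) dissipated per irreversible bit operation: k*T*ln(2)."""
    return K_B * temperature_k * math.log(2)


# Group the four possible AND input pairs by the output they produce.
preimages = {0: [], 1: []}
for a, b in product((0, 1), repeat=2):
    preimages[a & b].append((a, b))

print("inputs giving output 0:", preimages[0])
print("inputs giving output 1:", preimages[1])
print(f"kT ln 2 at 300 K (assumed): {landauer_limit(300.0):.3e} J")
```

Three distinct input pairs map to output 0, so the inputs cannot be recovered from that output alone, whereas output 1 has a single preimage. At the assumed 300 K the limit is on the order of $10^{-21}$ J, many orders of magnitude below the femtojoule-scale switching energies of practical CMOS nodes.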

Computations do not necessarily have to be irreversible, as was shown by Bennett [9], and thus it is still possible to break the $kT \ln 2$ limit. A reversible computing device would have to perform each computation twice, the first time in the forward direction, applying the inputs to a logic function and obtaining the result, and the second time backwards, computing the inverse function and returning the system to its initial state. If a computation could be performed in such a way, and implemented on non-dissipative hardware, its energy requirements could be potentially reduced to zero [10].

In practice, even if a circuit is designed to be logically reversible, achieving zero-energy computation with existing electrical circuits is limited, considering that any process involving charge transfer will result in dissipated energy. Nevertheless, it is still possible to reduce the energy dissipation during charge transfer to "asymptotically zero" levels by applying circuit design techniques generally known as adiabatic switching [8, 31].

The operating principle of adiabatic switching can be best explained by comparing it to conventional CMOS switching. In a conventional CMOS circuit (Figure 2.11), when the switch closes (typically the pull-up network) a certain amount of charge  $Q = CV_{DD}$  is pulled out of the positive power supply rail that charges the load capacitance *C* up to  $V_{DD}$ . The energy transferred from the power supply during the charging period (*T*) is [59]:

$$E = QV_{DD} = CV_{DD}^2 \tag{2.1}$$

Of the total transferred energy, only half ($\frac{1}{2}CV_{DD}^2$) is stored on the capacitor *C*, while the other half is dissipated on the switch and interconnect resistance *R*. As can be derived from Equation 2.1, to reduce the energy dissipation in conventional CMOS, either the supply voltage $V_{DD}$ or the load capacitance *C* has to be reduced.
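As a quick numerical illustration of Equation 2.1, the sketch below evaluates the energy drawn per charging event for a single TSV load, using the target capacitances and supply voltages of Tables 2.2 and 2.3. Treating the TSV capacitance as the only load is a simplifying assumption, and the function name is hypothetical.

```python
def cmos_switching_energy(c_load, vdd):
    """Return (energy drawn from the supply, energy dissipated while charging) in J."""
    drawn = c_load * vdd ** 2      # Equation 2.1: E = C * VDD^2
    return drawn, 0.5 * drawn      # half is lost in R, half is stored on C


for name, c_tsv, vdd in (("3D130", 40e-15, 1.2), ("3D65", 80e-15, 1.0)):
    drawn, lost = cmos_switching_energy(c_tsv, vdd)
    print(f"{name}: drawn {drawn * 1e15:.1f} fJ, dissipated on charging {lost * 1e15:.1f} fJ")
```

The half that is stored on the capacitor is not recovered in conventional CMOS either: it is dissipated in the pull-down network when the node is later discharged.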

The adiabatic charging of an equivalent node capacitance *C* is modelled in Figure 2.12. In contrast to the conventional CMOS circuit, the supply voltage in this case is not constant, but time-variable with a slow rise time ($T \gg RC$). If the charging period of the capacitor (or rise time of the voltage supply) is *T*, then it can be proven [82, 60] that the energy dissipated on the resistance *R* is proportional to:

$$E_{Adiabatic} \propto \left(\frac{RC}{T}\right) C V_{DD}^2 \tag{2.2}$$

The distinct advantage of adiabatic switching is that the slow rise time of the supply voltage keeps the voltage difference across the resistance *R* (and thus the current flow through it) low at all times. As the charge transfer is spread more evenly over the entire charging period, the current is lower and the energy dissipated in *R* ($E = I^2 R T$ for a constant current $I$) is greatly reduced. In addition, as can be derived from Equation 2.2, increasing the charging period (*T*) can potentially reduce the energy dissipation to arbitrarily low levels irrespective of the $V_{DD}$ or *C* parameters.
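The form of Equation 2.2 can be motivated with a short sketch, under the simplifying assumption (not taken from the cited proofs [82, 60]) that the charge $Q = CV_{DD}$ is delivered by a constant current over the charging period *T*:

$$I = \frac{Q}{T} = \frac{C V_{DD}}{T}, \qquad E_{Adiabatic} = I^2 R\, T = \left(\frac{C V_{DD}}{T}\right)^2 R\, T = \left(\frac{RC}{T}\right) C V_{DD}^2$$

Doubling *T* therefore halves the dissipated energy, whereas in Equation 2.1 the energy is independent of the charging time.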



Figure 2.11: CMOS switching.

Figure 2.12: Adiabatic switching.

| Conventional CMOS | Energy-recovery CMOS |
|-------------------|----------------------|
| 2 states: True, False | 3 states: True, False, Off |
| Energy of information bit dissipated as heat on the pull-down network | Energy of information bit recovered by the power-supply |
| High speed | Low/medium speed |
| High energy dissipation | Very low energy dissipation |
| Area efficient | Some area overhead |
Table 2.4: Comparison energy-recovery/conventional CMOS.

Combining reversible computations with adiabatic switching techniques allows for circuit implementations with extremely low energy dissipation compared to conventional designs [37, 38]. However, fully reversible circuits in CMOS technology suffer from high area overhead, as the inverse logic function has to be implemented and intermediate logic states must be stored to enable reversibility of the computation [8]. In cases where the overhead of full reversibility is too high, it can be significantly reduced by allowing some information to be occasionally discarded without sacrificing much of the energy benefits. Circuits that partially apply the principles of reversible computing and adiabatic switching to achieve low, but not zero, energy dissipation are called energy-recovery or charge-recovery circuits [31].

A comparison between energy-recovery and conventional CMOS is summarised in Table 2.4.



Figure 2.13: Basic 2N-2P adiabatic circuit.

Figure 2.14: 2N-2P timing.

#### 2.2.1 Energy-recovery architectures

Several energy-recovery circuit architectures have been proposed in the literature over the past years. Examples of logic families are 2 nMOS/2 pMOS adiabatic logic (2N-2P) [33], efficient charge recovery logic (ECRL) [50], pass-transistor adiabatic logic (PAL) [54], clocked adiabatic logic (CAL) [45], quasi-static energy recovery logic (QSERL) [83], and boost logic [62]. These architectures apply the principles of reversible computing and adiabatic switching to a variable degree, while their differences are mainly related to characteristics such as single/dual-rail logic style, number of clock phases, charging/discharging path, and energy efficiency.

A simple configuration is the 2N-2P family of logic circuits (Figure 2.13), which is based on the differential cascode voltage switch logic (DCVSL) circuit [59]. The timing diagram of the 2N-2P circuit is presented in Figure 2.14, using an idealised adiabatic voltage supply. Initially, the voltage supply is in the WAIT phase (LOW), keeping the outputs in the LOW state. Then the inputs are set complementary to each other, and the supply voltage ramps up. As the inputs are evaluated, the output that was enabled by the pull-down network follows the power supply until it reaches $V_{DD}$. At that moment the inputs return to the LOW state, and after a certain period of time in the HOLD "1" phase, the voltage supply ramps down, discharging the output. As can be observed, the average supply voltage during the computation cycle is well below $V_{DD}$, which can be advantageous in modern technology nodes where leakage currents are a concern.

The adiabatic charging/discharging takes place during the ramping up/down of the EVALUATE/RESET phases, while the reverse computation occurs during the RESET phase, when the transferred charge is returned to the power supply. However, the reversibility of the circuit is only partial, since the p-channel



Figure 2.15: Adiabatic driver.

MOSFET (PMOS) transistor that was turned on by the pull-down network will be forced to turn off when the output is approximately a threshold voltage above zero. This will cause some charge to be trapped on the output, which will be dissipated as heat on a subsequent cycle. Using the 2N-2P circuit, arbitrary logic functions may be implemented in the place of the pull-down network.

The energy-efficient characteristics of energy-recovery logic can also be beneficial for driving large load capacitances. A circuit that may be used in such a case is the adiabatic driver/amplifier (Figure 2.15) [8].

The input of the adiabatic driver is dual-rail encoded, as is the output. The operating principle is straightforward. First the inputs are set to valid values and then an adiabatic time-variable signal goes through the pass-gate, charging the load capacitance at the output to its maximum value. The output is now valid and can be used by other stages to perform computations. Subsequently the output slowly discharges, returning the charge to the power-supply. Energy is saved in two ways. Firstly, the slow-rising adiabatic signal forces a low voltage difference across the pass-gate, resulting in low current and thus low energy dissipation. Secondly, the energy that is not dissipated as heat during the transfer is returned to the power-supply, where it can be re-used in the next computation cycle.
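The first saving mechanism can be checked numerically with a minimal sketch that models the pass-gate as a fixed resistance and the adiabatic signal as an ideal linear ramp. Both are simplifying assumptions, and the resistance of 1 kΩ is a hypothetical value; the capacitance and supply voltage follow Table 2.2. The routine integrates the dissipation in the resistance with a simple forward-Euler scheme.

```python
def ramp_charging_loss(r, c, vdd, t_ramp, steps_per_tau=200):
    """Energy (J) dissipated in r while c is charged through it by a 0 -> vdd ramp.

    Forward-Euler integration. The run continues for 10 time constants after
    the ramp ends so that the residual voltage across r has settled.
    """
    tau = r * c
    dt = tau / steps_per_tau
    n_steps = int(round((t_ramp + 10.0 * tau) / dt))
    v_cap = 0.0
    energy = 0.0
    for k in range(n_steps):
        v_supply = vdd * min(k * dt / t_ramp, 1.0)  # linear ramp, then flat
        current = (v_supply - v_cap) / r
        energy += current * current * r * dt        # I^2 * R * dt
        v_cap += current * dt / c
    return energy


R, C, VDD = 1e3, 40e-15, 1.2  # R is an assumed on-resistance
TAU = R * C
for ratio in (0.01, 1, 10, 100):
    loss = ramp_charging_loss(R, C, VDD, ratio * TAU)
    print(f"T = {ratio:>6} RC: loss = {loss / (C * VDD ** 2):.4f} * C*VDD^2")
```

For a ramp much faster than *RC* the loss approaches the conventional $\frac{1}{2}CV_{DD}^2$, while for $T \gg RC$ it approaches $(RC/T)\,CV_{DD}^2$, in line with Equation 2.2.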

Circuits utilising the energy-recovery technique have been successfully implemented in the past, and have demonstrated great potential for driving highly-parasitic global interconnects, such as clock distribution networks [42, 63].

#### 2.2.2 Power-clock generators

Energy-recovery circuits require specialised power supplies that can generate multiple-phase time-variable adiabatic signals and



Figure 2.16: Stepwise charging generator.

recover charge. Since the power supply in an energy-recovery circuit provides both the power and the clock signal, such supplies are referred to as power-clock generators (PCGs) [6]. Power-clock generators fall into two categories: staircase and resonant generators.

Staircase generators use the stepwise charging technique [68]. The basic circuit (Figure 2.16) is composed of N tank capacitors ($C_T$) that are successively discharged in N steps, until the load capacitance ($C_{LOAD}$) is charged to the maximum voltage level. This technique attempts to "emulate" the adiabatic charging/discharging signal, and the linearity of the emulated signal depends on the number of charging steps N. The advantage of staircase generators is that they can be easily integrated on-chip, since they do not require inductors as is the case with resonant generators. However, both speed performance and energy efficiency are relatively low.
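The benefit of stepwise charging can be sketched with an idealised model in which each step charges the load by $V_{DD}/N$ from an ideal constant-voltage source and settles fully before the next step. This assumes tank capacitors much larger than the load, which is an idealisation rather than a property stated in the text; the function name is hypothetical.

```python
def stepwise_charging_loss(c_load, vdd, n_steps):
    """Energy (J) dissipated charging c_load to vdd in n_steps equal voltage steps."""
    step = vdd / n_steps
    # Each fully settled step of height `step` dissipates 0.5 * C * step^2
    # in the switch resistance, independent of the resistance value.
    return sum(0.5 * c_load * step ** 2 for _ in range(n_steps))


for n in (1, 2, 4, 8):
    loss = stepwise_charging_loss(40e-15, 1.2, n)
    print(f"N = {n}: {loss * 1e15:.2f} fJ dissipated while charging")
```

In this idealised model the charging loss scales as $CV_{DD}^2/(2N)$, so a single step reproduces the conventional CMOS loss and each doubling of N halves it. Switch-control overhead, which is ignored here, limits the useful number of steps in practice.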

Resonant generators [44] are based on inductance-capacitance (LC) oscillators. A simple circuit is the 1N single-phase power clock generator of Figure 2.17. If *R* is the resistance of the adiabatic driver in Figure 2.15, then the load capacitance ( $C_{LOAD}$ ) along with a resonant-tank inductor (*L*) form an RLC oscillator resonating at:

$$f = \frac{1}{2\pi\sqrt{LC_{LOAD}}} \tag{2.3}$$
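As a quick numerical illustration of Equation 2.3, the following sketch computes the resonant frequency for a given tank inductor and load, and the inverse problem of sizing the inductor for a target frequency. The component values and function names are hypothetical and only meant to show the orders of magnitude involved.

```python
import math


def resonant_frequency(l_tank, c_load):
    """Resonant frequency in Hz of an LC tank (Equation 2.3)."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_tank * c_load))


def tank_inductance(f_target, c_load):
    """Inductance in H needed to resonate with c_load at f_target."""
    return 1.0 / ((2.0 * math.pi * f_target) ** 2 * c_load)


# Hypothetical example values (not taken from the thesis).
C_LOAD = 10e-12   # 10 pF total load capacitance
L_TANK = 100e-9   # 100 nH tank inductor

f_res = resonant_frequency(L_TANK, C_LOAD)
print(f"Resonant frequency: {f_res / 1e6:.1f} MHz")
print(f"Inductor for that frequency: {tank_inductance(f_res, C_LOAD) * 1e9:.1f} nH")
```

Because the frequency scales with $1/\sqrt{L C_{LOAD}}$, a small load capacitance operated at a modest frequency calls for a large inductor, which is one reason on-chip integration can be difficult.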

As is typical for RLC oscillators, the power-clock is a sinewave, which approximates the idealised waveform of Figure 2.14. Energy is transferred adiabatically during the rising half of the sinewave, while on the falling half the energy is partially recovered and stored in the inductor for later use. RLC oscillators are very energy-efficient power-clock generators and can operate at high speeds. However, inductors may be difficult to implement on-chip.

Figure 2.17: 1N power clock generator.

#### 2.3 FULL-CUSTOM DESIGN FLOW

All circuits presented in this thesis were designed using the full-custom approach, since the target fabrication technology was an experimental 3D process for which neither standard cells nor dedicated 3D design tools were at our disposal.

In the full-custom design methodology, each individual transistor and its associated connectivity in the integrated circuit is manually specified at the layout level by the designer. Full-custom design is considered superior to standard-cell design when the main concerns are circuit performance and area efficiency. Furthermore, full-custom design approaches are essential in cases where specialised circuitry is to be designed, or when standard cell libraries are not available.

The electronic design automation (EDA) environment that was used was based on Cadence<sup>1</sup> applications and tools as integrated in the Design Framework II (DFII version 5.1.41). Cadence DFII consists of tools for design management (library manager), schematic entry (Virtuoso schematic), physical layout (Virtuoso layout), verification (Assura), and simulation (Spectre). These tools are completely customisable, supporting various process technologies through foundry-specific design kits. In our case, the Assura tool was replaced by Mentor Graphics<sup>2</sup> Calibre for the verification step.

<sup>1</sup> Cadence Design Systems, Inc (http://www.cadence.com/)

Our customised design flow is illustrated in Figure 2.18. Initially, the circuit is captured at the transistor level using the schematic editor (Figure 2.19), and the completed design is simulated and modified until it conforms to the design specifications. The same circuit is then re-created in the layout editor (Figure 2.20), where the detailed geometries and positioning of each fabrication mask layer are described. The layout must conform to a series of process design rules, and any violations of these rules can be detected during the design rule check (DRC) step. Subsequently, the layout design is translated to a circuit description (netlist), which is compared to the initial circuit in the schematic editor during the layout versus schematic (LVS) step. If the two designs match, the interconnect and transistor parasitics are estimated from the layout design, and another netlist is created which includes these parasitics. This detailed netlist can be simulated to get a more accurate estimation of the circuit behaviour, including the influence of parasitics.

#### 2.4 SUMMARY

In this chapter, the reader was introduced to the basic concepts and terminology that are relevant in the context of this thesis. The following chapter begins the discussion of the research part of this work, presenting a group of test structures that were used for the electrical characterization of the TSV parasitic capacitance in an experimental 3D process.

<sup>2</sup> Mentor Graphics, Inc (http://www.mentor.com/)







Figure 2.19: Inverter in Virtuoso schematic.

Figure 2.20: Inverter in Virtuoso layout.

## 3.1 INTRODUCTION

In a semiconductor process technology, electrical characterization is crucial for testing device performance and reliability after going through the process fabrication steps. Electrical characterization verifies the electrical properties of devices to specifications, and provides feedback to the wafer fabrication process engineers for improving and optimising the process technology.

In a 3D system, the electrical behaviour of the TSV can be modelled in terms of its resistance-inductance-capacitance (RLC) parasitics, as discussed in Section 2.1.2 (page 12). Parasitic models are typically used for simulating circuit behaviour, and they are critical for accurately predicting the performance of a 3D system during design verification [57].

Of the RLC parasitics associated with a TSV, capacitance has the most significant effect on its electrical properties, such as energy dissipation and delay [27]. Its role is so dominant that, for latency and signal integrity (SI) analysis, the TSV behaviour can be modelled by considering the capacitance alone [21]. Thus, accurate characterization of the TSV capacitance is highly desirable when designing a 3D system.

Typical laboratory capacitance measurement techniques based on LCR<sup>1</sup> meters require well controlled process technologies for realising a high number of devices without defects, which might not be possible at the current maturity level of TSV process technology. In this chapter, an alternative measurement technique is proposed for accurate on-chip characterization of the TSV parasitic capacitance. Test structures based on this technique are fabricated on a 65*nm* 3D process (Table 2.3, page 11), and used for characterizing the TSV parasitic capacitance under various conditions. The experimental measurements are analysed using statistical methods, and the proposed test structures are evaluated.

#### 3.2 TSV PARASITIC CAPACITANCE

<sup>1</sup> An LCR meter is a measurement instrument used to determine the inductance (L), capacitance (C), and resistance (R) of devices based on impedance measurements.

Figure 3.1: Isolated 2D wire capacitance.

A typical isolated 2D back-end-of-line (BEOL) metal wire can be modelled as a conductor over a ground plane (Figure 3.1). Since the conductor is separated from the ground plane by a dielectric material ( $\varepsilon_{ox}$ ), its electrical behaviour closely resembles that of a capacitor. The basic formula for calculating a parallel plate capacitance is:

$$C = \frac{\varepsilon_{ox}}{h} wl \tag{3.1}$$

In practice, calculating the parasitic capacitance of a BEOL metal wire is significantly more complex, since fringing fields increase the total capacitance value [78]. Nevertheless, whether a 2D metal wire capacitance is calculated using Equation 3.1, or more complex formulas [85], its value is only dependent on geometrical characteristics and the dielectric material.
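For orientation, Equation 3.1 can be evaluated directly. In the sketch below, $w$ and $l$ are taken to be the wire width and length and $h$ the dielectric thickness between the wire and the ground plane, which is an assumed reading of the symbols in Figure 3.1. The wire dimensions and the relative permittivity of 3.9 (silicon dioxide) are hypothetical example values, and fringing fields are deliberately ignored.

```python
EPS_0 = 8.8541878128e-12  # vacuum permittivity in F/m


def parallel_plate_capacitance(width, length, height, eps_r=3.9):
    """Parallel-plate estimate of Equation 3.1, in farads.

    width, length: wire dimensions in metres (assumed meaning of w and l).
    height: dielectric thickness to the ground plane in metres (assumed h).
    eps_r: relative permittivity of the dielectric (3.9 assumes SiO2).
    """
    return eps_r * EPS_0 * width * length / height


# Hypothetical wire: 1 um wide, 100 um long, 1 um above the ground plane.
c_wire = parallel_plate_capacitance(1e-6, 100e-6, 1e-6)
print(f"Parallel-plate estimate: {c_wire * 1e15:.2f} fF")
```

Since fringing fields add to the total, this value should be read as a lower-bound estimate for a real BEOL wire.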

In that context, 3D interconnects in the form of TSVs do not behave as simple wires, but as devices. The TSV copper (Cu) is surrounded by silicon (Si), and the two materials are separated by a dielectric oxide liner (Figure 3.2). This structure resembles a metal-oxide-semiconductor (MOS) capacitor and, as has been shown by modelling and measurements [27], exhibits distinct accumulation, depletion and inversion regions (Figure 3.3). The TSV capacitance attains its maximum value ( $C_{ox}$ ) in the accumulation region ( $V_{bias} < V_{FB}$ , where $V_{FB}$ is the flatband voltage) and in the inversion region ( $V_{bias} > V_{th}$ , $f < 1kHz$ ).

The bias voltage and frequency dependency of the TSV capacitance complicates both the modelling and the experimental measurement of its parasitic capacitance. Furthermore, the MOS behaviour of TSVs is not desirable in circuits using 3D interconnections. For that reason, special techniques are employed [28] that shift the TSV C-V characteristic in such a way that, for typical circuit operating voltages  $(0 - V_{DD})$ , the TSV capacitance is constant and at its minimum value (Figure 3.3).

## 3.3 INTEGRATED CAPACITANCE MEASUREMENT TECHNIQUES

Figure 3.2: TSV parasitic capacitance.

Figure 3.3: Capacitance vs. Gate voltage (CV) diagram of a MOS Capacitor.

Extracting the TSV parasitic capacitance requires measurement techniques that are compatible with the particular TSV characteristics described in the previous section. The techniques commonly used for measuring integrated capacitances in a laboratory environment can be classified into two categories:

- Measurements based on external instruments; these can be direct capacitance measurements, LCR meter based [52], or RF S-parameter based measurements [39].
- On-chip capacitance measurements; examples are the reference capacitor technique [32], ring-oscillator based [26], and charge-based [15].

The disadvantage of using external instruments directly connected to the device under test (DUT) is that the measurement setup introduces additional parasitics that have to be subtracted from the measurement result. With standard measurement setups, significant noise is added when attempting to determine capacitances well below 1pF, due to the influence of the instrument parasitics [64]. In cases where the DUT capacitance is small, such as the femtofarad-range TSV capacitance (Table 2.3), a large number of identical devices (typically in the hundreds) can be connected in parallel to increase the total capacitance value [75]. However, this method is only usable with process technologies that are well controlled and allow the realisation of a high number of devices without defects. Moreover, the extracted capacitance of the whole population may not be representative of a single device [56], and as a result individual device measurement accuracy may suffer.

On-chip techniques can achieve better accuracy when the DUT capacitance is small. Although DUT yield is not critical, the process technology should allow the integration of active devices, and their parameters should be known to a reasonable degree. A simple yet high-resolution technique is the charge-based capacitance measurement (CBCM), which can be used to extract capacitances down to the attofarad range [15].

In the CBCM technique, the DUT capacitance ( $C_{DUT}$ ) is charged/discharged at frequency f using a pseudo-inverter (Figure 3.4), while the average charging current ( $I_m$ ) is measured with a laboratory ammeter. Since transistor and wiring parasitic capacitances are in parallel with the DUT capacitance, a second (reference) inverter is used to measure these parasitics ( $I_{ref}$ ), which are then subtracted from the calculated result. The DUT capacitance is calculated as follows:

$$C_{DUT} = \frac{I_m - I_{ref}}{V_{DD} \cdot f} \tag{3.2}$$

Figure 3.4: CBCM pseudo-inverters.

The accuracy of the CBCM technique can be further increased by measuring the charging current over a range of frequencies. Since Equation 3.2 gives $I_{net} = I_m - I_{ref} = C_{DUT} \cdot V_{DD} \cdot f$ , the DUT capacitance can then be calculated from the slope of $I_{net}$ versus frequency, divided by $V_{DD}$ (Figure 3.5). The accuracy is greatly improved, as the pseudo-inverter leakage current is removed from the extracted result (the slope is not affected by constant offsets), and defective DUTs can be detected by inspecting the linearity of the current-frequency plot.
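The slope-based extraction can be sketched as an ordinary least-squares line fit of $I_{net}$ against frequency. The code below is a minimal illustration and not the analysis code used in this work: the function names are hypothetical, and the currents are synthetic values generated from an assumed 70 fF DUT, 20 fF of common parasitics and slightly different leakage in the two pseudo-inverters.

```python
def fit_line(xs, ys):
    """Least-squares fit y = slope * x + intercept; also returns R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def cbcm_capacitance(freqs_hz, i_meas, i_ref, v_dd):
    """Extract C_DUT from the slope of I_net = I_m - I_ref versus frequency."""
    i_net = [im - ir for im, ir in zip(i_meas, i_ref)]
    slope, offset, r_squared = fit_line(freqs_hz, i_net)
    return slope / v_dd, offset, r_squared


# Synthetic example data (assumed values, for illustration only).
V_DD = 1.0
C_TRUE, C_PAR = 70e-15, 20e-15      # DUT and common parasitic capacitance
LEAK_M, LEAK_REF = 2.0e-9, 1.5e-9   # constant leakage currents
freqs = [100e3 * k for k in range(1, 11)]  # 100 kHz to 1 MHz
i_m = [(C_TRUE + C_PAR) * V_DD * f + LEAK_M for f in freqs]
i_ref = [C_PAR * V_DD * f + LEAK_REF for f in freqs]

c_est, offset, r2 = cbcm_capacitance(freqs, i_m, i_ref, V_DD)
print(f"C_DUT = {c_est * 1e15:.2f} fF, offset = {offset * 1e9:.2f} nA, R^2 = {r2:.6f}")
```

The leakage mismatch ends up in the intercept rather than in the slope, and an $R^2$ value noticeably below 1 would flag a DUT whose current does not scale linearly with frequency.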

The CBCM technique has been previously used for measuring BEOL metal interconnect parasitic capacitances [15], and MOSFET capacitances [64]. In this work CBCM is proposed for the on-chip measurement of the TSV parasitic capacitance. CBCM is compatible with the TSV parasitic capacitance on the premise that the voltage and frequency ranges of CBCM are selected such that the TSV is forced into a region where its capacitance is constant. As was shown in the previous section (Figure 3.3) TSV capacitance is constant for typical circuit operating voltages, while the frequency dependency of the capacitance becomes significant only in the multi-*MHz/GHz* range [39]. Therefore, operating the CBCM pseudo-inverters in the  $0 - V_{DD}$  range and below 1MHz should be adequate to avoid the MOS behaviour of the TSV.

#### 3.4 ELECTRICAL CHARACTERIZATION

A group of test structures based on the CBCM technique was implemented for on-chip characterization of the TSV capacitance on Imec's 65*nm* 3D process technology (Table 2.3). As is common in IC design, maximising the utilisation of the available chip area was one of the priorities of the design effort. For that reason, supplementary test structures that went beyond the core objectives of this work were designed and implemented on the same chip.

Figure 3.5: Current as a function of frequency in CBCM.

The complete design consisted of 6 test structures: 5 for extracting the TSV capacitance under various conditions based on CBCM, and 1 additional experiment for direct measurement of the RC constant. All test structures are presented in this section; however, due to equipment and material availability, only the most significant were measured and analysed.

#### 3.4.1 *Physical implementation*

In the CBCM technique, the extraction accuracy of the TSV capacitance is highly dependent on the design of the pseudo-inverters. The 'reference' pseudo-inverter in CBCM is used for measuring both the wire parasitic capacitance and the inverter parasitics, in the form of diffusion and gate-drain overlap capacitances. Therefore, it is critical that the two pseudo-inverters are identical in terms of parasitics, as any mismatch will appear in the extracted measurement result.

Transistor mismatch is mostly influenced by process technology parameters; however, it is possible to minimise process variations, such as in the poly gate length, by applying special design techniques. A major cause of gate length variation is the polysilicon pitch to neighbours, and one way to mitigate that effect is to fill unused space between transistor gates with polysilicon dummies [13]. This technique was widely used in the CBCM pseudo-inverter design presented in this work (Figure 3.6). Concurrently, the inverter transistors were designed with relatively large dimensions for the implementation process technology (P: 10µm/1µm, N: 5µm/1µm), so as to minimise the effect of variation as a percentage of the transistor parasitics.



Figure 3.6: CBCM pseudo-inverter pair in layout.

Imec's 65*nm* 3D process allows for 2-tier vertical stacking of dies (*TOP* and *BOTTOM*), with the TSV passing through the *TOP* die Si substrate as illustrated in Figure 3.2. The I/O for each test structure was accessible through a standard  $2 \times 12$  test pad module on the *TOP* die compatible with our wafer probe test system. The  $2 \times 12$  test pad module (PRC1) along with the complete test structure design is illustrated in Figure 3.7, while a micrograph image of the fabricated chip can be seen in Figure 3.8.

The CBCM pseudo-inverters were implemented on either *TOP* or *BOTTOM* tier, while the TSVs were connected to the inverters from *TOP* or *BOTTOM* and arranged in 4 configurations (1,  $1 \times 2, 2 \times 2, 2 \times 5$ ) with different pitch distances. The 6 test structures available on the PRC1 test pad module are summarised in Table 3.1. A detailed description of the test structures ('A', 'B', 'C', 'D', 'R' and 'E') follows in the paragraphs below.

## Structure 'A': Single TSV capacitance (TOP die)

The purpose of test structure 'A' (Figure 3.9) is to extract the capacitance of a single TSV when it is electrically accessed through the *TOP* die. The structure is composed of 3 pseudo-inverters connected to the TSVs using identical-geometry Metal-2 layer wires, whose capacitance is also measured (Reference). In addition to the single TSV DUT, the structure also contains 10 parallel TSVs arranged in a  $2 \times 5$  array with a  $20\mu m$  pitch. Since the suitability of the CBCM method for extracting the small TSV capacitance was undetermined at design time, the  $2 \times 5$  array was included as a precaution, effectively increasing the DUT capacitance by an order of magnitude.

Figure 3.7: PRC1 test pad module (layout).

Figure 3.8: PRC1 test pad module (micrograph).

| Structure            | A    | A   | A   | B    | B   | B   |
|----------------------|------|-----|-----|------|-----|-----|
| TSVs                 | -    | 1   | 2×5 | -    | 1   | 2×5 |
| Inverter location    | TOP  | TOP | TOP | BTM  | BTM | BTM |
| TSV connection       | TOP  | TOP | TOP | BTM  | BTM | BTM |
| Multi-TSV connection | -    | -   | TOP | -    | -   | BTM |
| Multi-TSV pitch (µm) | -    | -   | 15  | -    | -   | 15  |
| Pad name             | AREF | A1  | A10 | BREF | B1  | B10 |

| Structure            | C   | C   | C   | C   | D    | D    |
|----------------------|-----|-----|-----|-----|------|------|
| TSVs                 | 2×2 | 2×2 | 2×2 | 2×2 | 1×2  | 1×2  |
| Inverter location    | TOP | TOP | TOP | TOP | TOP  | TOP  |
| TSV connection       | TOP | TOP | TOP | TOP | TOP  | TOP  |
| Multi-TSV connection | TOP | TOP | TOP | TOP | TOP  | BTM  |
| Multi-TSV pitch (µm) | 10  | 15  | 20  | 50  | 20   | 20   |
| Pad name             | C10 | C15 | C20 | C50 | DTOP | DBTM |

| Structure            | R   | R   | E    | E   | E   |
|----------------------|-----|-----|------|-----|-----|
| TSVs                 | 1   | 1   | -    | 1×2 | 1×2 |
| Inverter location    | TOP | TOP | TOP  | TOP | TOP |
| TSV connection       | TOP | TOP | -    | TOP | TOP |
| Multi-TSV connection | -   | -   | -    | TOP | BTM |
| Multi-TSV pitch (µm) | -   | -   | -    | 20  | 20  |
| Pad name             | R1  | R2  | EREF | EC  | ERC |

Table 3.1: Test structures on the PRC1 module.

Figure 3.9: Structure 'A' - Single TSV capacitance (TOP die).

## Structure 'B': Single TSV capacitance (BOTTOM die)

The design of structure 'B' is similar to that of structure 'A', described above. The only difference is that the CBCM pseudo-inverters are implemented on the *BOTTOM* die (Figure 3.10), while connectivity to the TSVs is provided using a wire on the Metal-2 layer of the *BOTTOM* die. The purpose of this structure is to examine whether the perceived TSV capacitance varies when it is accessed from the *BOTTOM* side.

#### Structure 'C': Variable pitch TSV capacitance

Figure 3.10: Structure 'B' - Single TSV capacitance (BOTTOM die).

The use of minimum TSV pitch in designs is highly desirable so as to minimise total chip area. The purpose of test structure 'C' (Figure 3.11) is to investigate whether TSV pitch has an effect on the parasitic capacitance. It has been previously reported in the literature that the proximity of a TSV to other TSVs may affect its parasitic capacitance, probably through an interaction with the oxide liner [23]. Test structure 'C' aims to verify the presence of the pitch effect on the 65*nm* 3D process.

The CBCM pseudo-inverters are implemented on the *TOP* die (Figure 3.11), while the TSVs are connected in parallel in groups of 4 with variable pitch distances of  $10\mu m$ ,  $15\mu m$ ,  $20\mu m$ , and  $50\mu m$ . All groups connect to the pseudo-inverters using identical-length wires. Since the wire parasitics are not measured separately, their value is not subtracted from the measured TSV capacitances, and the results are therefore relative.

# *Structure 'D': TSV capacitance with interconnection on TOP or BOTTOM die*

The purpose of structure 'D' (Figure 3.12) is similar to that of structure 'B', which is to examine whether the perceived TSV capacitance changes when it is accessed from the *BOTTOM* die. However, in this structure the CBCM pseudo-inverters are implemented on the *TOP* die, as a precaution in case the front-end-of-line (FEOL) active devices of the *BOTTOM* die did not operate properly after fabrication and stacking.



Figure 3.11: Structure 'C' - Variable pitch TSV capacitance.



Figure 3.12: Structure 'D' - TSV capacitance measurement with interconnection on TOP or BOTTOM die.



Figure 3.13: Structure 'R' - Evaluation of TSV capacitance measurement accuracy.



Figure 3.14: Structure 'E' - TSV capacitance effect on signal propagation delay (layout).

#### *Structure 'R': Evaluation of TSV capacitance measurement accuracy*

Structure 'R' (Figure 3.13) aims to evaluate the accuracy with which CBCM can extract the capacitance of two nearby TSVs. The two pseudo-inverters, wires, and TSVs are identical, thus theoretically they should give identical results. Any variance in the result should be mainly due to transistor variability.

#### Structure 'E': TSV capacitance effect on signal propagation delay

Structure 'E' (Figure 3.14) allows the direct measurement of the delay caused by the TSV on signal propagation. The signal propagation delay is measured under two conditions: when the signal passes through the TSV (which includes the effect of resistance on delay), and when only the TSV capacitance is in the path of the current (Figure 3.15). The contribution of wire/transistor parasitics is also measured, so that it can be subtracted from the final result.



Figure 3.15: Structure 'E' - TSV capacitance effect on signal propagation delay (schematic).



Figure 3.16: Measurement setup.

## 3.4.2 Measurement setup

The presented test structures were fabricated on 300mm wafers (Figure A.1), each containing 472 identical dies. For I/O access, a general purpose on-wafer measurement system was used that allowed direct contact with the wafer and the  $2 \times 12$  test pad module through probecard pins (Figure A.2). All I/O signals were provided by standard laboratory instruments controlled through the general purpose interface bus (GPIB) port of a PC running a customised LabVIEW<sup>2</sup> program. The measurement setup is illustrated in Figure 3.16.

The voltage source of the CBCM pseudo-inverters was set at 1V and the CBCM pulse frequencies were swept in the range 100kHz - 1MHz. For each measured test structure the DC current measurement was stored in non-volatile memory, and subsequently the prober progressed to the next die, measuring all dies on the wafer. The I/O test pads of the PRC1 module (Figure 3.7) and the instrument connections are summarised in Table 3.2.

<sup>2</sup> LabVIEW is a visual programming language from National Instruments for automating measurement equipment in a laboratory setup.

| Test pad          | Direction | Instrument connection                        |
|-------------------|-----------|----------------------------------------------|
| VDD               | Power     | Power supply - 1V                            |
| GND               | Power     | Power supply - 0V                            |
| SET               | Input     | Pulse generator (non-overlapping with RESET) |
| RESET             | Input     | Pulse generator (non-overlapping with SET)   |
| A[REF, 1, 10]     | Input     | SMU (inverter source - 1V supply)            |
| B[REF, 1, 10]     | Input     | SMU (inverter source - 1V supply)            |
| C[10, 15, 20, 50] | Input     | SMU (inverter source - 1V supply)            |
| D[TOP, BTM]       | Input     | SMU (inverter source - 1V supply)            |
| R[1, 2]           | Input     | SMU (inverter source - 1V supply)            |
| E[REF, C, RC]     | Output    | Oscilloscope (high-impedance)                |

Table 3.2: PRC1 test pad instrument connections.

#### 3.4.3 Experimental results

There were two wafers available for measurements, named in this text 'D11' and 'D18', which were selected from the same lot. Each wafer contained 472 identical dies of the *TOP* 3D tier, measured using the measurement setup described in Section 3.4.2. Measurement data were extracted only from test structures 'A' (Figure 3.9) and 'C' (Figure 3.11), and were analysed using standard statistical methods.

When utilising statistical methods to analyse data acquired in a laboratory environment, it is common for some data points to lie far from the other measurements. These data points are termed outliers (Appendix B) and usually result from measurement errors, such as bad probe pin contacts, or from malfunctioning dies on the wafer. Since the following analysis is focused on TSV capacitance variability, these few outliers are excluded from the data set so as to improve analysis accuracy.

| Structure                                | 'A' (Fig. 3.9) |
|------------------------------------------|----------------|
| TSV dimensions $(l/d)$                   | 40µm/5µm       |
| Oxide liner thickness                    | 200nm          |
| Total dies                               | 472            |
| Processed dies (excluding outliers)      | 454            |
| Mean capacitance                         | 70.9fF         |
| Standard deviation ( $\sigma$ )          | 1.9fF          |
| Coefficient of variation $(3\sigma/\mu)$ | 8.0%           |

Table 3.3: TSV capacitance D2D variability on wafer 'D11'.

The statistical descriptors used for analysing the measurement data are elaborated in Appendix B and include calculations of mean, standard deviation, and coefficient of variation. These parameters are used to evaluate the TSV capacitance in terms of die-to-die (D2D), wafer-to-wafer (W2W), and TSV pitch variability. The accuracy of the CBCM results is evaluated by comparing them to standard LCR measurements. Furthermore, the correlation between the extracted TSV capacitance and simulated oxide liner thickness is shown, signifying the predominant role of the oxide liner on the observed TSV capacitance variability.

#### Die to die (D2D) variability

The die-to-die (D2D) variability is evaluated by measuring the capacitance of the single TSV of structure 'A' (Figure 3.9) across all dies on the wafer. The measurements are repeated on both available wafers ('D11' and 'D18').

On wafer 'D11', 18 dies showed a capacitance value several standard deviations away from the calculated mean of the total population. These outliers indicated a measurement error or a malfunctioning die, and they were removed from the statistical analysis. In the remaining 454 dies, the mean capacitance of the experimental data was 70.9fF with a standard deviation ( $\sigma$ ) of 1.9fF (Table 3.3). The standard deviation is an important indicator, allowing us to determine whether the variance in the experimental data is expected, or whether there is a random error affecting the measurements. As can be seen in Figure 3.17, 70.6% of the measured values are within  $\pm \sigma$  of the mean capacitance value, which is close to the roughly 68% expected for normally distributed data. The observed variability is therefore consistent with ordinary process variation, and the measurement results can be considered valid.
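The descriptors reported in Table 3.3 can be reproduced with a few lines of code. The sketch below is illustrative only: the outlier rule (a cut at a multiple of the median absolute deviation) is an assumption standing in for the procedure of Appendix B, the function names are hypothetical, and the data are synthetic values drawn to resemble wafer 'D11', not the measured data.

```python
import random
import statistics


def remove_outliers(values, k=6.0):
    """Drop points further than k robust standard deviations from the median.

    Assumed rule for illustration; the thesis procedure is in Appendix B.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    scale = 1.4826 * mad  # MAD scaled to match sigma for normal data
    if scale == 0:
        return list(values)
    return [v for v in values if abs(v - median) <= k * scale]


def describe(values):
    """Mean, sample standard deviation, 3*sigma/mean, and share within 1 sigma."""
    mean = statistics.mean(values)
    sigma = statistics.stdev(values)
    within = sum(1 for v in values if abs(v - mean) <= sigma) / len(values)
    return {"n": len(values), "mean": mean, "sigma": sigma,
            "cov_3sigma": 3 * sigma / mean, "within_1sigma": within}


# Synthetic stand-in for one wafer: 454 good dies plus 18 gross outliers (fF).
rng = random.Random(0)
good = [rng.gauss(70.9, 1.9) for _ in range(454)]
bad = [0.0] * 9 + [150.0] * 9
stats = describe(remove_outliers(good + bad))
print(stats)
```

Applying the same coefficient-of-variation definition to the reported values gives $3 \times 1.9 / 70.9 \approx 8.0\%$, in agreement with Table 3.3.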

On wafer 'D18', 58 of the 472 total dies were removed as outliers. The mean capacitance of the experimental data was 89.5fF, with a standard deviation ( $\sigma$ ) of 2.2fF (Table 3.4). The probability density function can be seen in Figure 3.18, with 69.1% of the experimental data within  $\pm \sigma$  of the mean capacitance value.

Figure 3.17: TSV capacitance D2D variability on wafer 'D11' - Probability density function.

| Structure                                | 'A' (Fig. 3.9) |
|------------------------------------------|----------------|
| TSV dimensions $(l/d)$                   | 40µm/5µm       |
| Oxide liner thickness                    | 200nm          |
| Total dies                               | 472            |
| Processed dies (excluding outliers)      | 414            |
| Mean capacitance                         | 89.5fF         |
| Standard deviation ( $\sigma$ )          | 2.2fF          |
| Coefficient of variation $(3\sigma/\mu)$ | 7.4%           |

Table 3.4: TSV capacitance D2D variability on wafer 'D18'.

## Correlation between TSV capacitance and oxide liner thickness variation

The TSV capacitance as extracted from the experimental measurements is plotted for wafers 'D11' and 'D18' (Figures 3.19 and 3.20). The capacitance value in the figures is normalised to the calculated mean, while the x and y axes indicate die coordinates on the wafer. We observe that on both wafers the capacitance is near the mean value at the periphery of the wafer and decreases towards the centre, indicating a common cause of variability on both wafers.

Figure 3.18: TSV capacitance D2D variability on wafer 'D18' - Probability density function.

Figure 3.19: TSV capacitance measurement results on wafer D11.

Figure 3.20: TSV capacitance measurement results on wafer D18.

Parameters affecting the TSV capacitance are the geometry of the TSV (length, diameter) and the oxide liner characteristics (dielectric constant, thickness). The variation of the oxide liner thickness across the wafer can be predicted by simulation. Oxide liner simulation results for the specific process and fabrication tools can be seen in Figure 3.21. The liner thickness in the graph is normalised to the mean value, and the x and y axes indicate die coordinates on the wafer. As can be observed, the oxide liner thickness is near the mean value at the periphery of the wafer and increases to a peak value when approaching the centre. Since the TSV capacitance is inversely proportional to the oxide liner thickness, the lower capacitance measured at the wafer centre coincides with the thicker liner there. This strong correlation between the two variations suggests that the oxide liner thickness is the major cause of die-to-die (D2D) variability on wafers 'D11' and 'D18'.
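The inverse dependence on liner thickness can be illustrated with the textbook cylindrical-capacitor expression for the oxide capacitance, $C_{ox} = 2\pi\varepsilon_{ox} l / \ln(r_{out}/r_{in})$. The sketch below is a rough estimate under stated assumptions: the liner is taken to be silicon dioxide (relative permittivity 3.9), the 5µm diameter of Table 3.3 is taken to be the copper diameter, and the function name is hypothetical. It yields the oxide capacitance only, which is the maximum of the C-V curve, so it is not expected to match the measured values, which correspond to the minimum capacitance.

```python
import math

EPS_0 = 8.8541878128e-12  # vacuum permittivity in F/m


def liner_capacitance(length, cu_diameter, liner_thickness, eps_r=3.9):
    """Oxide capacitance of a TSV modelled as a cylindrical capacitor, in farads.

    Assumes cu_diameter is the copper diameter and the liner is SiO2.
    """
    r_in = cu_diameter / 2.0
    r_out = r_in + liner_thickness
    return 2.0 * math.pi * eps_r * EPS_0 * length / math.log(r_out / r_in)


# Dimensions from Table 3.3; the +10% liner case is a hypothetical variation.
c_nominal = liner_capacitance(40e-6, 5e-6, 200e-9)
c_thicker = liner_capacitance(40e-6, 5e-6, 220e-9)
ratio = c_thicker / c_nominal
print(f"C_ox nominal: {c_nominal * 1e15:.1f} fF")
print(f"C_ox with 10% thicker liner: {c_thicker * 1e15:.1f} fF (ratio {ratio:.3f})")
```

Because the liner is thin compared to the TSV radius, a 10% thicker liner lowers the capacitance by roughly 9%, which is the nearly inverse proportionality used in the argument above.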

#### Wafer to wafer (W2W) variability

Probability density functions for both wafers 'D11' and 'D18' are plotted in Figure 3.22. Comparing the two experimental data sets (Tables 3.3 and 3.4), we can see that although the spread has not changed much (the standard deviation is slightly larger in 'D18'), the mean capacitance is much larger in 'D18' (89.5*fF*) than in 'D11' (70.9*fF*). This is a 26.2% increase in capacitance from wafer to wafer. Taking into account that both wafers are from the same lot, this result indicates the need for better control of the process parameters.
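The wafer-to-wafer figure follows directly from the two mean values in Tables 3.3 and 3.4, taking wafer 'D11' as the reference (the symbols $\mu_{D11}$ and $\mu_{D18}$ are introduced here only for this calculation):

$$\frac{\mu_{D18} - \mu_{D11}}{\mu_{D11}} = \frac{89.5fF - 70.9fF}{70.9fF} = \frac{18.6}{70.9} \approx 26.2\%$$

For comparison, the die-to-die spread within each wafer is $3\sigma/\mu \approx 8.0\%$ for 'D11' and $7.4\%$ for 'D18', so the shift between the wafers is several times larger than the variation within either of them.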



Figure 3.21: Simulated oxide liner thickness variation.



Figure 3.22: TSV capacitance W2W variability - Probability density functions.

| Structure                           | 'C' (Fig. 3.11) | 'C' (Fig. 3.11) |
|-------------------------------------|-----------------|-----------------|
| TSV dimensions $(l/d)$              | 40µm/5µm        | 40µm/5µm        |
| Oxide liner thickness               | 200nm           | 200nm           |
| Total dies                          | 472             | 472             |
| Processed dies (excluding outliers) | 424             | 424             |
| Experiment                          | C10             | C50             |
| Number of TSVs                      | 4               | 4               |
| Mean capacitance                    | 407.3fF         | 412.5fF         |
| Standard deviation ( $\sigma$ )     | 8.4fF           | 7.4fF           |
| Population within $\sigma$          | 69.5%           | 65.3%           |

Table 3.5: Variable pitch TSV capacitance experimental data from wafer 'D18'.

## Variable pitch TSV capacitance

The variable pitch TSV capacitance was extracted using structure 'C' (Figure 3.11). TSVs with pitch distances at  $10\mu m$  and  $50\mu m$  were measured (experiments C10 and C50) on wafer 'D18'. From the 472 total dies, 424 were functional and the measurement results are summarised in Table 3.5. The extracted capacitance values represent the 4 parallel-connected TSVs, which also include the parasitics of Metal 2 connectivity wires.

As can be observed in the probability density functions (Figure 3.23), the measurement data for experiments C10 and C50 are very close to each other, and there is an overlap between their standard deviations. Even though there is an observable change in mean capacitance between the two experiments, the overlap of standard deviations does not allow us to conclude whether the capacitance change is due to the difference in pitch or to process variability.

To decide whether the change observed in capacitance is significant, we ran a Kruskal–Wallis (Appendix B) one-way statistical analysis of variance in Minitab<sup>3</sup>. Under the null hypothesis that both data sets come from the same distribution, the calculated p-value is effectively 0, indicating that the two data sets are different and that the pitch does indeed have an effect on the TSV capacitance value. A pitch dependence of the capacitance in the same direction has also been observed previously on Imec's 130*nm* 3D process [23].
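The test itself is simple to reproduce outside Minitab. The following is a minimal sketch in Python (standard library only), not the analysis actually used in this work: the function name is hypothetical, ties receive average ranks without the tie-correction factor, and the two samples are synthetic values drawn from normal distributions using the means and standard deviations of Table 3.5, not the measured data.

```python
import math
import random


def kruskal_wallis_two_groups(a, b):
    """Return (H, p) for a two-group Kruskal-Wallis test (no tie correction)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n = len(pooled)
    # Assign 1-based ranks, giving tied values their average rank.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    rank_sums = [0.0, 0.0]
    for (_, group), r in zip(pooled, ranks):
        rank_sums[group] += r
    sizes = [len(a), len(b)]
    h = 12.0 / (n * (n + 1)) * sum(rs * rs / m for rs, m in zip(rank_sums, sizes))
    h = max(h - 3 * (n + 1), 0.0)
    # With two groups, H is compared against a chi-square distribution with
    # 1 degree of freedom, whose survival function is erfc(sqrt(H / 2)).
    p = math.erfc(math.sqrt(h / 2))
    return h, p


# Synthetic stand-ins for experiments C10 and C50 (NOT the measured data).
random.seed(0)
c10 = [random.gauss(407.3, 8.4) for _ in range(424)]
c50 = [random.gauss(412.5, 7.4) for _ in range(424)]
h_stat, p_value = kruskal_wallis_two_groups(c10, c50)
print(f"H = {h_stat:.1f}, p = {p_value:.3g}")
```

With sample sizes of this order, even a shift of a few femtofarads between the two means yields a p-value that rounds to zero, which is consistent with the value reported by Minitab.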

#### Measurement accuracy of CBCM

The capacitance measurements extracted using the CBCM technique were compared to conventional LCR measurements (Figure 3.24) on wafer 'D18'. Comparing the two measurements, we

<sup>3</sup> Minitab is a program that implements common statistical functions. It is developed by Minitab Inc. (http://www.minitab.com).



Figure 3.23: Variable pitch TSV capacitance experimental data from wafer 'D18' - Probability density function.



Figure 3.24: LCR experimental data from wafer 'D18' - Probability density function.

observe in Table 3.6 that the standard deviations of the two data sets are similar, while mean capacitance is  $\sim 8\%$  smaller in the LCR method. However, the LCR method produced more outliers than CBCM. That was expected as LCR cannot extract the capacitance of a single TSV, but can only measure groups of TSVs connected in parallel (32 in our example), so as to increase the total measured capacitance. This increases the chance of measuring TSVs with defects, which in turn affects the accuracy of the LCR extracted capacitance.

#### 3.5 SUMMARY AND CONCLUSIONS

In this chapter, various test structures based on the CBCM technique were presented for the electrical characterization of the TSV parasitic capacitance. The test structures were implemented on a 65*nm* 3D process technology and measured using a general purpose on-wafer measurement system. The measurement

| Method                                   | LCR      | CBCM     |
|------------------------------------------|----------|----------|
| Total dies                               | 472      | 472      |
| Processed dies (excluding outliers)      | 393      | 414      |
| Mean capacitance                         | 82.0*fF* | 89.5*fF* |
| Standard deviation ( $\sigma$ )          | 2.3*fF*  | 2.2*fF*  |
| Coefficient of variation $(3\sigma/\mu)$ | 8.6%     | 7.4%     |

Table 3.6: Comparison between CBCM/LCR measurements on wafer 'D18'.

results were statistically analysed in terms of TSV capacitance die-to-die (D2D) and die-to-wafer (D2W) variability. The calculated D2D variability was found to be reasonable, while a comparison to simulated data of the oxide liner thickness variation revealed the oxide liner as a major contributor to the observed TSV capacitance variability. However, D2W variability was extensive, and feedback was provided to the fabrication engineering team for further improvement of the process. The results also confirmed the effect of TSV pitch on the parasitic capacitance, as has been previously reported in the literature for a different process [23].

The comparison of CBCM-based measurements to conventional LCR gave similar results, while in addition the CBCM technique was shown to produce fewer outliers (fewer defective samples) in the measurement data. Considering the simplicity of CBCM and that it has acceptable accuracy for extracting the capacitance of single TSVs, it can be a valuable tool for process characterization in the early research efforts of 3D integration.

The ability of CBCM to measure single-ended TSVs could also prove beneficial for testing TSVs prior to die stacking, which is a critical step for keeping 3D-IC yield acceptable [47]. Extracting the TSV parasitic capacitance can provide important information about the TSV fabrication quality, since slight variations in process parameters greatly affect the parasitic capacitance, as was demonstrated in this chapter. Techniques based on RC constant measurement and sense amplification have been proposed in the literature for a similar role [16]; however, they are significantly more complex to implement than CBCM.

In the following chapter another critical TSV characteristic, thermomechanical stress, is evaluated using a test structure designed and implemented on the same 65*nm* 3D process.

## TSV PROXIMITY IMPACT ON MOSFET PERFORMANCE

#### 4.1 INTRODUCTION

The integration of through-silicon via (TSV) 3D interconnects in CMOS technology has raised concerns over the impact on active devices implemented on the same silicon (Si) substrate. The large coefficient of thermal expansion (CTE) mismatch between the TSV filling material and the Si substrate has been shown to induce thermomechanical stress and deform the surface in proximity to the TSV. Since thermomechanical stress can affect active device carrier mobility [71], it is essential that the stress impact on MOSFET performance is kept under control. Monitoring the effect of TSV-induced thermomechanical stress on MOSFET device performance, and generating keep-out-zone (KOZ) guidelines for digital and analog circuits, is an important milestone for designing TSV-aware circuits with predictable performance.

Simulations using finite element method (FEM) modelling [40] predict that TSV-induced thermomechanical stress has a complex distribution that depends on numerous conditions, such as TSV geometry, device type, channel orientation, and proximity. However, currently published experimental measurements on thermomechanical stress investigate just a small number of test-cases, typically limited to a single TSV/channel orientation/axis [53, 70, 81]. The complex distribution of thermomechanical stress creates the need for sophisticated characterization structures that can monitor stress impact with precision and verify simulation models.

In this chapter, a test structure is presented for monitoring the impact of TSV-induced thermomechanical stress on MOSFET device performance. The test structure uses proven design techniques for implementing a large number of test-cases and accessing numerous devices with maximum precision. Furthermore, the test structure is fabricated on an experimental 65*nm* 3D process (Table 2.3) and used for characterizing that process. Some of the presented circuits and results have previously appeared in publications [48, 49, 58].

#### 4.2 TSV-INDUCED THERMOMECHANICAL STRESS

Individual TSV integration technologies use different conductive fill materials, which may include copper (Cu), tungsten (W), or

| Material | CTE at 20 °C $(10^{-6}/K)$ |
|----------|----------------------------|
| Si       | 3                          |
| Cu       | 17                         |
| W        | 4.5                        |

Table 4.1: Coefficients of thermal expansion.



Figure 4.1: TSV thermomechanical stress simulation [40].

polysilicon [20]. The coefficient of thermal expansion (CTE) mismatch (Table 4.1) between the fill material and the silicon (Si) can induce strain on the surface in proximity to the TSV. This deformation is caused by thermomechanical stress along the XY-plane during the cooling-down phase [25], due to the filling material's higher-than-Si contraction rate.
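As a rough first-order illustration (not part of the cited simulations), the unconstrained thermal mismatch strain between a fill material and silicon can be estimated from the CTE values in Table 4.1. The symbols below and the temperature excursion $\Delta T = 300K$ are assumptions chosen only to show relative magnitudes; the actual stress field also depends on TSV geometry, the oxide liner, and the stiffness of the materials.

$$\varepsilon_{mismatch} = (\alpha_{fill} - \alpha_{Si})\,\Delta T$$

$$\varepsilon_{Cu} = (17 - 3) \times 10^{-6} \times 300 = 4.2 \times 10^{-3}, \qquad \varepsilon_{W} = (4.5 - 3) \times 10^{-6} \times 300 = 4.5 \times 10^{-4}$$

Under this assumption a copper fill produces roughly nine times the mismatch strain of a tungsten fill for the same temperature change.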

Simulations using finite element method (FEM) modelling show that the thermomechanical stress exhibits two-fold rotational symmetry, with tensile and compressive stresses concentrated on separate axes [40]. The thermomechanical stress distribution depends on TSV geometrical characteristics, as well as the interaction with other TSVs. Figure 4.1 illustrates the simulated stress distribution of a single TSV, while in Figure 4.2 the interaction between two nearby TSVs is shown. As can be observed, when multiple TSVs are in proximity to each other, stress-free areas can be generated as a result of destructive interaction.

It is well known that stress can affect carrier mobility in MOSFET devices. In fact, strained-silicon technology processes have been in use for years as a means to enhance device performance in several logic families [71]. However, in TSV-based 3D integration the induced strain is not intentional, since it is a characteristic intrinsic to the TSV process technology, and thus the resulting shift in device performance might not always be desirable. Characterization of the TSV impact on device performance and definition of design rules, such as keep-out-zones (KOZs), for digital and analog circuits is an important milestone for implementing TSV-aware circuits with predictable performance.

Simulation results [65] have shown that the mobility shift depends on various conditions, such as TSV geometry, device type,



Figure 4.2: TSV thermomechanical stress interaction between two TSVs [40].

proximity, and channel orientation. In Figure 4.3, the mobility shift of transistors between two TSVs at variable distances is simulated. As can be observed, the effect is positive and more intense on PMOS devices, while for n-channel MOSFET (NMOS) the mobility shift is mostly negative.

The mobility shift due to TSV proximity has also been confirmed by experimental data [53, 70, 81], and in most cases the performance shift was found to be within acceptable limits. However, currently published experimental results investigate a small number of test-cases, typically limited to a single TSV/device orientation/axis. As was shown in Figures 4.1 and 4.2 and discussed in the text, the thermomechanical stress distribution is complex and depends on various factors. In order to evaluate the impact of TSV thermomechanical stress accurately and under different conditions, more sophisticated test structures are required. Such an approach is attempted in the test structure presented in the following sections.

#### 4.3 TEST STRUCTURE IMPLEMENTATION

TSV-induced thermomechanical stress can be compressive in some directions and tensile in others, and its effect on device mobility may vary depending on MOSFET type/orientation/distance, or interaction with other TSVs. To accurately characterize the effect of TSV proximity on MOSFET device performance, a test structure was designed and fabricated on Imec's 65*nm* 3D process which attempted to cover as many test-cases as possible.



Figure 4.3: Estimated mobility shift on PMOS (a) and NMOS (b) transistors in between two TSVs separated by distance d [65].

The test structure consisted of MOSFET arrays superimposed over TSVs, such that MOSFET devices surrounded the TSVs in all directions at variable distances. The purpose of these experiments was to indirectly observe the thermomechanical stress effect on transistor mobility by measuring the saturation current of MOSFET devices at various distances and directions in relation to the TSVs. Several configurations of TSVs and MOSFET arrays were investigated, so as to characterize the impact of single and multiple TSVs, as well as to observe the effect on different device types.

Stacked silicon dies were not fabricated, thus all devices were implemented on the *TOP* die Si substrate as illustrated in Figure 3.2 (page 25). The test structure input/output (I/O) was accessible through a standard  $2 \times 12$  test pad module on the *TOP* die, compatible with our wafer probe test system. In total two test pad modules were designed for this experiment, *PTP*1 and *PTP*2, implementing NMOS-type and PMOS-type arrays respectively. Test pad module *PTP*1 along with the test structure layout is illustrated in Figure 4.4, while a micrograph image of the fabricated chip can be seen in Figure 4.5.





Figure 4.4: PTP1 test pad module (layout).

Figure 4.5: PTP1 test pad module (micrograph).



Figure 4.6: MOSFET dimensions and rotations.

#### 4.3.1 MOSFET arrays

Each MOSFET array was a separate experiment consisting of  $37 \times 37$  transistors, which could be either NMOS-type or PMOS-type, short channel ( $0.7\mu m \times 0.07\mu m$ ) or long channel ( $0.5\mu m \times 0.5\mu m$ ), as well as at 0° or 90° rotation in relation to TSVs (Figure 4.6).

The MOSFET arrays enclosed TSVs in one of five different configurations: 1 TSV (Figure 4.7), 4 TSVs diagonal (Figure 4.8), 4 TSVs lateral (Figure 4.9), 9 TSVs (Figure 4.10), and no-TSVs. In total there were 16 configurations for each MOSFET array type (NMOS/PMOS) as listed in Table 4.2. The location of each MOSFET array in the top-level design is illustrated in Figure 4.4.

The number of available I/O test pads on each module was  $2 \times 12$, which made it impractical to access all 21,904 ( $16 \times 37 \times 37$ ) MOSFETs directly. To overcome that restriction, only a subset of the transistors in each MOSFET array ( $16 \times 16$ ) was wired out and made electrically accessible through digital selection logic. The rest of the transistors in the  $37 \times 37$  MOSFET array were inserted as dummy devices, placed in the gaps to ensure uniform boundary conditions and good matching between transistors (Figure 4.11).

Since thermomechanical stress exhibits two-fold rotational symmetry, as discussed in 4.2, the stress effect is the same for transistors on the same axis whether they are placed east/west of the TSV on the latitudinal axis, or north/south of the TSV on the longitudinal axis. The symmetry of the stress distribution was exploited in the design of the MOSFET arrays, so as to increase the resolution of the experiment despite the limited number of transistors. As can be seen in Figure 4.11 the transistors in the array are placed in an irregular grid with a  $2\mu m$  pitch. Some transistors are placed on odd-number distances ( $1\mu m$ ,  $3\mu m$ ...) in relation to the TSV, while others on even-number distances ( $2\mu m$ ,  $4\mu m$ ...). Nevertheless, rotational symmetry allows the evaluation of the thermomechanical stress at all directions with a  $1\mu m$  resolution.
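The way symmetry raises the effective resolution can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, the choice of which side of the TSV carries the odd and which the even distances is assumed, and the current values are invented placeholders rather than measurements.

```python
def fold_by_symmetry(east, west):
    """Merge samples from opposite sides of the TSV into one profile keyed by distance.

    east, west: dicts mapping distance from the TSV (um) -> normalised current.
    Samples that share a distance are averaged.
    """
    merged = {}
    for side in (east, west):
        for distance_um, value in side.items():
            merged.setdefault(distance_um, []).append(value)
    return {d: sum(v) / len(v) for d, v in sorted(merged.items())}


# Hypothetical placement: odd distances sampled east of the TSV, even distances west.
east = {1: 1.20, 3: 1.08, 5: 1.03, 7: 1.01}
west = {2: 1.13, 4: 1.05, 6: 1.02, 8: 1.00}

profile = fold_by_symmetry(east, west)
print(sorted(profile))  # distances covered after folding both sides together
```

Each side alone is sampled on a $2\mu m$ pitch, but because the stress is assumed identical on both sides of the same axis, the folded profile covers every $1\mu m$ step.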



Figure 4.7: 1 TSV MOSFET array configuration.



Figure 4.8: 4 diagonal TSVs MOSFET array configuration.



Figure 4.9: 4 lateral TSVs MOSFET array configuration.



Figure 4.10: 9 TSVs MOSFET array configuration.

| Configuration | Rotation | FET width (µm) | FET length (µm) | Name |
|---------------|----------|----------------|-----------------|------|
| 0             | 0°       | 0.7            | 0.07            | A0   |
| 1             | 0°       | 0.7            | 0.07            | B0   |
| 4 diagonal    | 0°       | 0.7            | 0.07            | C0   |
| 4 lateral     | 0°       | 0.7            | 0.07            | D0   |
| 9             | 0°       | 0.7            | 0.07            | E0   |
| 0             | 0°       | 0.5            | 0.5             | F0   |
| 1             | 0°       | 0.5            | 0.5             | G0   |
| 4 diagonal    | 0°       | 0.5            | 0.5             | H0   |
| 4 lateral     | 0°       | 0.5            | 0.5             | I0   |
| 9             | 0°       | 0.5            | 0.5             | J0   |
| 0             | 90°      | 0.7            | 0.07            | A90  |
| 1             | 90°      | 0.7            | 0.07            | B90  |
| 9             | 90°      | 0.7            | 0.07            | E90  |
| 0             | 90°      | 0.5            | 0.5             | F90  |
| 1             | 90°      | 0.5            | 0.5             | G90  |
| 9             | 90°      | 0.5            | 0.5             | J90  |

Table 4.2: MOSFET array configurations for test pad modules PTP1/PTP2.



Figure 4.11: MOSFET array detail.

#### 4.3.2 Digital selection logic

Each MOSFET array was composed of 256 active transistors ( $16 \times 16$ ). Test modules *PTP*1 and *PTP*2 implemented 16 array configurations each, with 4,096 devices in total ( $16 \text{ configurations} \times 256 \text{ transistors}$ ), which could be accessed individually through the 24 I/O pads of the 2 × 12 test pad modules. Individual transistor Gate, Drain, and Source terminals were multiplexed to the I/O pads using a combination of decoders and transmission-gates (Figure 4.12). The transmission-gates were used as analog switches for current flow, while the decoders enabled the corresponding transmission-gates for accessing the externally selected transistor for measurement.

The procedure for selecting a single MOSFET device for measurement was as follows:

- Array (0 to 15) was enabled using the 4-bit Array-select decoder. The MOSFET Source terminal (S), common to all transistors in a single array, was electrically connected to I/O pads "SF"/"SS" when enabled.
- MOSFET Gate terminal (G0 to G15) was connected to I/O pad "VG" using the 4-bit Gate-select decoder and corresponding transmission-gates.
- MOSFET Drain terminal (D0 to D15) was connected to I/O pads "DFn"/"DSn" using the 2-bit Drain-select decoder and corresponding transmission-gates. Drain terminals were always enabled in groups of 4 and connected sequentially to I/O pads "DF0"/"DS0" to "DF3"/"DS3", thus the 2-bit Drain-select decoder was adequate for accessing all 16 Drain terminals in the array.



Figure 4.12: Digital selection logic.
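The selection procedure can be summarised as an address computation. The sketch below is illustrative only: the function name, the ordering of the fields inside the 10-bit word, and the assumption that consecutive Drain terminals share a group are all hypothetical, since the text specifies only the decoder widths (4 + 4 + 2 bits) and that drains are enabled in groups of 4.

```python
def selection_word(array, gate, drain):
    """Encode (array, gate, drain) as a 10-bit decoder input and pick the drain pad.

    Assumed (hypothetical) bit layout, MSB first: [array:4][gate:4][drain_group:2].
    Returns (word, drain_pad), where drain_pad n refers to pads DFn/DSn.
    """
    if not (0 <= array < 16 and 0 <= gate < 16 and 0 <= drain < 16):
        raise ValueError("array, gate and drain must each be in 0..15")
    drain_group = drain // 4  # group enabled by the 2-bit Drain-select decoder
    drain_pad = drain % 4     # pad pair within the enabled group (assumed mapping)
    word = (array << 6) | (gate << 2) | drain_group
    return word, drain_pad


word, pad = selection_word(array=5, gate=10, drain=13)
print(format(word, "010b"), f"DF{pad}/DS{pad}")
```

Ten selection bits are sufficient to address any of the 4,096 wired-out transistors, because the remaining two bits of the drain index are resolved by choosing which of the four force/sense pad pairs is measured.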

When a MOSFET is switched on, a voltage drop is expected between the I/O pads and the transistor Drain/Source terminals due to the resistance of the wires. To guarantee accurate voltage delivery, the transistor Drain/Source terminals are "sensed" on the "DSn"/"SS" I/O pads, so as to ensure that the voltage applied on I/O pads "DFn"/"SF" supplies the desired voltage values on the transistor terminals. This procedure, known as four-terminal sensing [29], is typically automated when using laboratory measurement equipment, such as a source measurement unit (SMU).

A single MOSFET array with its digital selection logic can be seen in layout view in Figure 4.13. Transistor Gate terminals are selected using a 4-bit decoder (Figure 4.14), while the externally applied voltage on I/O pad "VG" passes through transmission gates (Figure 4.15). The transmission gate schematic used for the Drain/Source terminals can be seen in Figure 4.16. The Drain terminal transmission gates are enabled in groups of 4 using a 2-bit decoder (Figure 4.17). If a MOSFET array is not selected by the Array-select decoder, its Gate terminals are forced to the *GND* voltage level (for *PTP*1) or to the *VDD* voltage level (for *PTP*2), so that non-selected MOSFETs remain turned off instead of floating.


Figure 4.13: MOSFET array with digital selection logic.



Figure 4.14: 4-bit Gate-enable/Array-enable decoder.



Figure 4.15: Transmission gate for MOSFET Gate terminals.



Figure 4.16: Transmission gates for MOSFET Drain/Source terminals.



Figure 4.17: 2-bit Drain-select decoder.

| # | Configuration              | Type                  | Name | Wafer |
|---|----------------------------|-----------------------|------|-------|
| 1 | No-TSV                     | PMOS (0.5μm × 0.5μm)  | F0   | D08   |
| 2 | 1-TSV / 0° / 25 °C         | PMOS (0.5μm × 0.5μm)  | G0   | D08   |
| 3 | 1-TSV / 0° / 25 °C         | PMOS (0.5μm × 0.5μm)  | G0   | D20   |
| 4 | 1-TSV / 0° / 60 °C         | PMOS (0.5μm × 0.5μm)  | G0   | D08   |
| 5 | 4-lateral TSV / 0° / 25 °C | PMOS (0.5μm × 0.5μm)  | I0   | D08   |
| 6 | 1-TSV / 90° / 25 °C        | PMOS (0.5μm × 0.5μm)  | G90  | D08   |
| 7 | 1-TSV / 0° / 25 °C         | PMOS (0.7μm × 0.07μm) | B0   | D08   |

Table 4.3: MOSFET array configurations selected for experimental evaluation.

#### 4.4 EXPERIMENTAL MEASUREMENTS

Test modules *PTP*1 and *PTP*2 implementing the presented MOSFET arrays were fabricated on 300*mm* wafers (Figure A.1), each containing 472 identical dies. Two wafers were selected for measurements, referred to as "D08" and "D20", both originating from the same lot. The test module used for experimental evaluation was *PTP*2, containing the PMOS-device arrays, which were expected to have the highest performance impact from thermomechanical stress as was discussed in 4.2. The specific MOSFET arrays that were selected for evaluation are listed in Table 4.3.

#### 4.4.1 Data processing methodology

For each MOSFET array evaluated, the drain-to-source current  $(I_{DS})$  of all transistors in the array was measured in the saturation region ( $V_{DS} = 1V$ ,  $V_{GS} = 1V$ ). Subsequently, each saturation current was divided by the median current value of all transistors in the array, generating  $I_{DS}$  values normalised to the median. The median of a data set represents the value towards which the majority of values tend (Appendix B). Since only a small percentage of the 256 transistors in the array are in proximity to the TSVs, the median  $I_{DS}$  value represents the typical transistor saturation current not affected by thermomechanical stress.

Measurement precision is increased with repeated measurements, as random noise is removed from the experiment reducing



Figure 4.18: Processing of measurement results.

the standard deviation of the sampled mean (Appendix B). Consequently, each MOSFET array was measured across 98 identical dies on the wafer and their results were averaged. Since the saturation current shift due to thermomechanical stress is systematic across all dies, averaging the saturation currents only removed the random noise, effectively increasing the measurement precision by a factor of ~ 10 ( $\sqrt{98}$ ). The described procedure for processing the measurement results is illustrated in Figure 4.18.
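The normalisation and averaging steps can be illustrated with a small simulation. This is a minimal sketch and not the LabVIEW processing used in this work: the function names are hypothetical, the 2% random variation and the +20% systematic shift at one transistor position are invented values, and the dies are synthetic.

```python
import random
import statistics


def normalise_to_median(currents):
    """Divide every current in one array by the array's median current."""
    med = statistics.median(currents)
    return [i / med for i in currents]


def average_over_dies(dies):
    """Average the normalised current of each transistor position across dies."""
    return [statistics.mean(position) for position in zip(*dies)]


random.seed(1)
N_DIES, N_FETS = 98, 256
SIGMA = 0.02  # hypothetical 2% random variation per measurement

# Synthetic dies: nominal current 1.0 with a systematic +20% shift at position 0.
dies = []
for _ in range(N_DIES):
    raw = [random.gauss(1.0, SIGMA) for _ in range(N_FETS)]
    raw[0] *= 1.20
    dies.append(normalise_to_median(raw))

averaged = average_over_dies(dies)
noise_single = statistics.stdev(dies[0][1:])    # spread within a single die
noise_averaged = statistics.stdev(averaged[1:])  # spread after averaging 98 dies
print(f"shift at position 0: {averaged[0]:.3f}")
print(f"noise reduction: {noise_single / noise_averaged:.1f}x (sqrt(98) = {98 ** 0.5:.1f})")
```

The systematic shift survives the averaging unchanged, while the random spread of the unaffected transistors shrinks by a factor close to $\sqrt{98}$.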

Each MOSFET array contains 256 transistors, which are not placed in a regular  $16 \times 16$  grid as can be seen in Figure 4.11. This approach was chosen so as to minimise the number of transistors in an array, while allowing for  $1\mu m$  resolution in the evaluation of the thermomechanical stress by taking advantage of its two-fold rotational symmetry as discussed in 4.3.1. However, illustrating measurements in a contour graph requires a regular grid of data. Thus, for illustration purposes the experimental measurements were reconstructed to a  $33 \times 31$  regular grid (1,023 transistors), with the intermediate values estimated using linear interpolation (Figure 4.19).
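The reconstruction step can be sketched for a single row of the grid. The example below is a one-dimensional simplification with a hypothetical function name and invented sample values; the actual reconstruction fills a two-dimensional  $33 \times 31$  grid.

```python
def interpolate_row(samples, length):
    """Fill a regular 1 um row from sparse samples using linear interpolation.

    samples: dict mapping integer grid index (0 <= index < length) -> measured value.
    Positions outside the sampled range are left as None.
    """
    known = sorted(samples)
    row = [None] * length
    if len(known) == 1:
        row[known[0]] = samples[known[0]]
    for left, right in zip(known, known[1:]):
        for x in range(left, right + 1):
            t = (x - left) / (right - left)  # 0 at the left sample, 1 at the right
            row[x] = (1 - t) * samples[left] + t * samples[right]
    return row


# Hypothetical normalised currents measured on a 2 um pitch.
measured = {0: 1.0, 2: 2.0, 4: 1.0}
print(interpolate_row(measured, 5))
```

Measured points are reproduced exactly, and only the gaps between them receive estimated values, so the contour plots never alter the underlying measurements.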

#### 4.4.2 Measurement setup

For I/O access, a general purpose on-wafer measurement system was used that allowed direct contact with the wafer and the  $2 \times 12$  test pad modules through probecard pins (Figure A.2). I/O signal generation and measurement was provided by standard



Figure 4.19: Interpolation of measurement results.

laboratory instruments, controlled through the GPIB port of a PC running a customised LabVIEW program. Four-terminal sensing for MOSFET Drain/Source terminals was automated using a Keithley 2602 dual-channel SMU [1]. The measurement setup is illustrated in Figure 4.20.

The I/O test pads for the *PTP1/PTP2* modules can be seen in Figure 4.4, while the instrument connections used for each test pad are summarised in Table 4.4.



Figure 4.20: Measurement setup.

| Test pad | Direction | Instrument connection                                   |
|----------|-----------|---------------------------------------------------------|
| VDD      | Power     | Power supply - 1V                                       |
| GND      | Power     | Power supply - 0V                                       |
| DEC[0-9] | Input     | Digital Output (GATE/DRAIN/ARRAY Select Decoders Input) |
| DF[0-3]  | Input     | SMU (Drain Force terminal)                              |
| DS[0-3]  | Output    | SMU (Drain Sense terminal)                              |
| SF       | Input     | SMU (Source Force terminal)                             |
| SS       | Output    | SMU (Source Sense terminal)                             |
| VG       | Input     | Digital Output (Gate voltage)                           |

Table 4.4: PTP1/PTP2 test pad module instrument connections.

#### 4.4.3 PMOS (long channel)

#### No-TSV configuration

This MOSFET array does not contain any TSVs, thus transistor mobility is not affected by thermomechanical stress. The array was measured on 98 identical dies of wafer "D08", the saturation currents normalised, and then averaged over all measured dies using the procedure elaborated in 4.4.1. In Figure 4.21 the normalised saturation current for each transistor in the array can be seen after interpolation to a  $33 \times 31$  grid. Each coordinate in the figure represents a  $1\mu m$  distance.

As can be observed, the majority of transistors in the array have almost identical saturation currents, as expected. However, there is a distinctive stripe running from north to south where the current value is reduced by up to  $\sim$  5%. This revealed a flaw in the design of the MOSFET arrays that can be best observed in Figure 4.11. The transistors that are located adjacent to the substrate tap do not experience the same boundary conditions as the rest of the transistors in the array. That difference is enough to affect the physical characteristics of these transistors during fabrication, which in turn determine transistor parameters such as saturation current. This flaw underlines the significance of careful boundary condition design when targeting good transistor matching in a test structure.

#### 1 TSV configuration

In this configuration long-channel MOSFETs ( $0.5\mu m \times 0.5\mu m$ ) are placed in the periphery of a single TSV (Figure 4.7). The transistors in the array were measured over 98 identical dies of wafer "D08"



Figure 4.21: Normalised on-current for PMOS structure "F0" (D08).

and processed according to 4.4.1. The normalised saturation current for each transistor in the array can be seen in Figure 4.22 after interpolation to a  $33 \times 31$  grid. It can be clearly observed that the transistors in proximity to the TSV show a distinctive variation in their saturation current in comparison to the other transistors in the array, which is the direct consequence of TSV-induced thermomechanical stress. These results also demonstrate that the proposed test structure design has enough measurement precision to distinguish thermomechanical stress from background noise and variability. The north to south stripe observed in the previous configuration due to boundary condition mismatch is also present in these measurements.

To observe the TSV effect on nearby transistors more accurately, the normalised saturation current versus distance is plotted for axes X = 16 and Y = 14 (Figure 4.23), which cross the TSV almost at its centre. In the graph only actual measurements are plotted, prior to interpolation. We observe that the saturation current of a transistor at  $1\mu m$  distance from the TSV increases ~ 20% on the latitudinal axis and decreases ~ 15% on the longitudinal, while the effect is almost negligible after  $8\mu m$ .

The same measurement is repeated on wafer "D20". In Figure 4.24 the normalised saturation current for each measured transistor in the array can be seen after interpolation to a  $33 \times 31$  grid. The normalised saturation current versus distance is also plotted for axes X = 16 and Y = 14 (Figure 4.25). In the graph



Figure 4.22: Normalised on-current for PMOS structure "G0" (D08).



Figure 4.23: Normalised on-current for PMOS structure "G0" (D08, Y=14, X=16).



Figure 4.24: Normalised on-current for PMOS structure "G0" (D20).

we observe that the saturation current of a transistor at  $1\mu m$  distance from the TSV increases ~ 30% on the latitudinal axis and decreases ~ 18% on the longitudinal, while the effect is almost negligible after  $8\mu m$ . These results show that thermomechanical stress can be quite different at distances very close to the TSV, even between wafers originating from the same lot.

Finally, the measurement is repeated on wafer "D08" after increasing the temperature to 60 °C. The normalised current contour plot can be seen in Figure 4.26 and measurements for X = 16 and Y = 14 in Figure 4.27. The saturation current of a transistor



Figure 4.25: Normalised on-current for PMOS structure "G0" (D20, Y=14, X=16).



Figure 4.26: Normalised on-current for PMOS structure "G0" (D08, 60°C).

at 1 $\mu$ *m* distance from the TSV increases ~ 16% on the latitudinal axis (-4% in comparison to 25 °C) and decreases ~ 11 on the longitudinal (-4% in comparison to 25 °C) , while the effect is almost negligible after 6 $\mu$ *m* (-2 $\mu$ *m* in relation to 25 °C).

# 4 lateral TSV configuration

In this configuration the thermomechanical stress is evaluated in the lateral direction between 4 TSVs. The array is measured over 98 identical dies of wafer "Do8" at 25 °C. The contour plot of the normalised saturation current can be seen in Figure 4.28 and measurements for X = 16 and Y = 14 in Figures 4.29 and 4.30 respectively. As can be observed, the saturation current of a transistor at 1 $\mu$ m distance from the TSV increases ~ 18% on the latitudinal axis (-2% in relation to 1 TSV) and decreases ~ 12% on the longitudinal (-3% in relation to 1 TSV), while TSV interaction reduces the effect range to 5 $\mu$ m (-3 $\mu$ m in relation to 1 TSV).

## 90° rotation TSV configuration

In this configuration the orientation of the transistor channel in relation to the thermomechanical stress is rotated to 90°. The array is measured over 98 identical dies of wafer "Do8" at 25°C. The contour plot of the normalised saturation current can be seen in Figure 4.31 and measurements for X = 16 and Y = 14 in Figure 4.32. As was expected the stress effect on mobility has



Figure 4.27: Normalised on-current for PMOS structure "G0" (D08, Y=14, X=16, 60°C).



Figure 4.28: Normalised on-current for PMOS structure "I0" (D08).



Figure 4.29: Normalised on-current for PMOS structure "I0" (D08, X=16).



Figure 4.30: Normalised on-current for PMOS structure "I0" (D08, Y=14).



Figure 4.31: Normalised on-current for PMOS structure "G90" (D08).

been reversed between the two axes. The saturation current of a transistor at  $1\mu m$  distance from the TSV decreases  $\sim 13\%$  on the latitudinal axis and increases  $\sim 20\%$  on the longitudinal, while the effect is almost negligible after  $6\mu m$ .

#### 4.4.4 PMOS (short channel)

In this configuration short-channel MOSFETs  $(0.7\mu m \times 0.07\mu m)$  are placed around a single TSV (Figure 4.7). The contour plot of the normalised saturation current can be seen in Figure 4.33 and



Figure 4.32: Normalised on-current for PMOS structure "G90" (D08, Y=14, X=16).



Figure 4.33: Normalised on-current for PMOS structure "B0" (D08).

measurements for X = 16 and Y = 14 in Figure 4.34. At  $1\mu m$  distance from the TSV the saturation current decreases 7% on the longitudinal axis (-8% in relation to long channel), while on the latitudinal axis no stress effect could be detected. That could be attributed either to short-channel effects, or to malfunctioning of the specific transistor due to its proximity to the TSV and the experimental nature of the manufacturing process. On both axes the effect is almost negligible after  $5 - 6\mu m$ .



Figure 4.34: Normalised on-current for PMOS structure "B0" (D08, Y=14, X=16).

| Size | Wafer | Configuration      | Rotation | Effect @ $1\mu m$ lat. | Effect @ $1\mu m$ long. | KOZ       |
|------|-------|--------------------|----------|------------------------|-------------------------|-----------|
| L    | Do8   | 1-TSV (25 °C)      | 0°       | +20%                   | -15%                    | $8\mu m$  |
| L    | D20   | 1-TSV (25 °C)      | 0°       | +30%                   | -18%                    | $8\mu m$  |
| L    | Do8   | 1-TSV (60 °C)      | 0°       | +16%                   | -11%                    | $6\mu m$  |
| L    | Do8   | 4-lat. TSV (25 °C) | 0°       | +18%                   | -12%                    | $5\mu m$  |
| L    | Do8   | 1-TSV (25 °C)      | 90°      | -13%                   | +20%                    | $6\mu m$  |
| S    | Do8   | 1-TSV (25 °C)      | 0°       | $\sim 1\%$             | -7%                     | $8\mu m$  |

Table 4.5: Summary of experimental measurements on long (L) and short (S) channel PMOS devices.

#### 4.5 SUMMARY AND CONCLUSIONS

In this chapter, a test structure was presented for monitoring the impact of TSV-induced thermomechanical stress on MOSFET device performance. The test structure implemented arrays of MOSFET devices superimposed with TSVs in various configurations. Digital selection logic allowed access to the MOSFET devices, which could be used to indirectly monitor thermomechanical stress by measuring the saturation current of transistors.

The structure was implemented in Imec's experimental 65*nm* 3D process and used to characterize the process technology while evaluating the test structure. Experimental measurements were carried out on PMOS devices and the thermomechanical stress was found to be on par with simulation estimations.

In Table 4.5, the measurements are summarised for the latitudinal and longitudinal axes at $1\mu m$ distance from the TSV, along with the recommended keep-out-zone (KOZ). The KOZ is indicative and represents the distance where the thermomechanical stress effect on saturation current is almost negligible (< 1%). In a practical case, KOZs will depend on the tolerated saturation current mismatch between devices, which is a direct consequence of timing and other circuit constraints. Typically, digital circuits can tolerate, and thus function properly with, much more transistor variability than analog circuits.

Comparing the experimental results of this work to other published data in the literature reveals that the thermomechanical stress effect varies dramatically with different process characteristics. Experimental measurements reported by Cho et al. [17] confirm that long-channel devices are more sensitive than short-channel ones, however both the saturation current shift and KOZs are considerably reduced to 2% and $< 2\mu m$, respectively. In addition, the dependency of the stress effect on TSV position was not observed by the authors, in contrast to the measurements presented in this work and predictions by theoretical models. Contradictory results between theory and experiments, and between different experiments, emphasise the need for better understanding of the specific mechanisms involved in the TSV-induced thermomechanical stress.

Evaluation of the measurement results allowed some conclusions to be drawn on the effectiveness of the presented test structure:

1. It is desirable for transistors in the array to be placed on a regular grid, so that processing of the measurement data is less complex.
2. The boundary conditions of transistors have a considerable effect on transistor matching and thus measurement accuracy.
3. Distances of interest are generally $< 8\mu m$, therefore the arrays could be smaller, or denser in proximity to the TSV than further away.
4. The experimental process used to implement the test structure had low yield, which highlighted the need for mechanisms to detect faults in the digital selection logic.

The experience gained from this project can potentially be used for designing improved thermomechanical stress characterization structures for next-generation TSV processes. In future work it would be interesting to perform experimental measurements on NMOS devices, which were not evaluated in this project. Furthermore, the cause of the glitch measured on the short-channel transistor nearest to the TSV should be further investigated.

In the following chapter, the potential of the energy-recovery technique is investigated for reducing the energy dissipation of TSV interconnects in future 3D-SoC designs implementing a large number of TSVs.

# EVALUATION OF ENERGY-RECOVERY FOR TSV INTERCONNECTS

## 5.1 INTRODUCTION

TSV-based 3D-ICs enable low-parasitic direct connections between functional blocks in a SoC, by replacing traditional long horizontal 2D global interconnects with short vertical ones (Section 2.1, page 5). Compared to 2D interconnects, the reduced resistance-capacitance (RC) parasitics of TSVs improve both speed (*RC*), and energy ( $1/2CV_{DD}^2$ ) performance of circuits. However, the TSV parasitic capacitance may still become an important source of energy dissipation in large, densely interconnected 3D-SoCs, since the combined capacitance and thus the energy required to drive TSVs increases linearly with the number of tiers and interconnections (Figure 5.1).

The principal technique for reducing energy dissipation in CMOS has traditionally been based on supply voltage scaling [14], though this represents a technical challenge below the 65*nm* node since PVT variations and physical limits in practice reduce the range over which voltages may be scaled. An alternative low-power design technique is energy-recovery logic [31], which demonstrates frequency-dependent energy dissipation and thus does not suffer from the limitations of voltage scaling. The origins and operating principle of energy-recovery logic were discussed in detail in Section 2.2 (page 13).

In this chapter, the potential of the energy-recovery technique is investigated for reducing the energy dissipation of TSV interconnects in 3D-ICs. The total energy dissipation per cycle and optimum device sizing are extracted for the proposed scheme using theoretical modelling, while the configuration is evaluated against conventional static CMOS. Some of the presented methods and results have appeared previously in Asimakopoulos et al. [7].

#### 5.2 ENERGY-RECOVERY TSV INTERCONNECTS

Figure 5.1: TSV energy dissipation per cycle increase with 3D integration density.

As was discussed in Section 2.2 (page 13), energy-recovery circuits conserve energy by restricting current flow across resistances, and subsequently recovering part of the supplied energy using time-variable power sources. The energy dissipated in an energy-recovery circuit becomes inversely proportional to the charging period (*T*) of the capacitive load ( $C_L$ ) [82, 60]:

$$E_{ER} \propto \left(\frac{RC_L}{T}\right) C_L V_{DD}^2 \tag{5.1}$$

In comparison, a conventional CMOS circuit driving an equivalent capacitive load would dissipate during the same charging period T [59]:

$$E_{CMOS} = \frac{1}{2} C_L V_{DD}^2 \tag{5.2}$$

Combining Equations 5.1 and 5.2, we calculate the energy dissipation relation between the two circuits:

$$\frac{E_{ER}}{E_{CMOS}} \propto \frac{2RC_L}{T} \tag{5.3}$$

From Equation 5.3 we observe that energy-recovery conserves energy compared to conventional CMOS when either the charging period (T) is large, or the resistance-capacitance product ( $RC_L$ ) of the load is small.

In a 3D-IC, TSV parasitics in terms of resistance-capacitance (RC) are inherently low by design as was discussed in Section 2.1 (page 5). In fact, as was demonstrated by Katti et al. [27], resistance is negligible and thus the electrical behaviour of a TSV in a 3D-IC



Figure 5.2: Comparison of a typical CMOS TSV driver to the proposed energy-recovery.

can be modelled as a single capacitance due to its predominant impact on energy/delay. Therefore, and based exclusively on Equation 5.3, energy-recovery has an advantage over conventional static CMOS when used for driving low-parasitic TSV interconnects.

The differences between a typical CMOS driver in a TSV-based 3D-IC, and the energy-recovery scheme proposed in this chapter are illustrated in Figure 5.2. A logic block in *Tier*1 generates a digital output which is regenerated by a driver and received by a second logic block in *Tier*2, after passing through the TSV. In the proposed energy-recovery scheme, as an intermediate step the digital output is converted to an adiabatic signal before passing through the TSV, so as to take advantage of the low-power characteristics of energy-recovery. Compatibility between the receiving digital logic block in *Tier*2 and energy-recovery is retained by restoring the adiabatic signal back to its digital form.

Converting the digital signal to an adiabatic one, as depicted in the simplified diagram of Figure 5.2, requires the use of a specialised driver that can combine (mix) the time-variable signal of the power-clock generator (PCG) with a digital output. Signals generated by a PCG return to zero at the end of a cycle, resembling the operation of a digital clock (Section 2.2.2, page 18). In contrast, digital outputs are typically allowed to have arbitrary values from one cycle to the next, thus a special structure is required that can encode the PCG-generated signal to arbitrary binary values.



Figure 5.3: Adiabatic driver.



Figure 5.4: A simple pulse-to-level converter (P2LC).

An example of such a structure is the adiabatic driver [8, 84] (Figure 5.3). Adiabatic drivers are composed of a pair of transmission gates switching on/off alternately based on the value of a digital input. On each cycle, the time-variable adiabatic signal is redirected either to the driver's inverting or non-inverting output, and as a result the data carried by the adiabatic signal are dual-rail encoded.

The signals at the outputs of the adiabatic driver can be restored to their digital form after passing through the TSV, by converting the dual-rail encoded sinusoidal pulses back to level signals using a pulse-to-level converter (P2LC) [72] (Figure 5.4).

For generation of the time-variable adiabatic signal, a resonant PCG may be used such as the one discussed in Section 2.2.2 (page 18). A resonant PCG generates continuous sinusoidal pulses approximating the idealised (linear) adiabatic signal of Figure 5.2. The adiabatic driver's resistance ( $R_{TG}$ ) along with the capacitive load ( $C_L$ ) assume a crucial role in the generation of the resonant sinusoidal pulse, as their presence in the signal path along with



Figure 5.5: Resonant pulse generator with dual-rail data encoding.



Figure 5.6: Energy-recovery scheme for TSV interconnects.

a resonant-tank inductor form an RLC oscillator [44] (Figure 5.5) resonating at:

$$f = \frac{1}{2\pi\sqrt{L_{ind}C_L}} \tag{5.4}$$

A detailed block diagram of the proposed energy-recovery scheme, including the PCG and dual-rail encoding, is illustrated in Figure 5.6.

## 5.3 ANALYSIS

The energy dissipation relation depicted in Equation 5.1 is based on the assumption that an idealised (linear) adiabatic signal is used for charging the TSV capacitance. Furthermore, energy losses occurring in the additional components used in the realistic energy-recovery circuit illustrated in Figure 5.6 are not considered in Equation 5.1. The purpose of this section is to analyse and model all possible sources of energy dissipation in the proposed energy-recovery scheme.

The bulk of the energy dissipation in the energy-recovery circuit of Figure 5.6 will occur in the adiabatic driver ( $E_{AD}$ ), the inductor's parasitic resistance $R_{ind}$ ( $E_{ind}$ ), transistor M1 ( $E_{M1}$ ) and the P2LC ( $E_{P2L}$ ). Therefore, the total energy dissipated per cycle in the energy-recovery circuit would be:

$$E_{ER} = E_{AD} + E_{ind} + E_{M1} + E_{P2L} \tag{5.5}$$

Parameters $E_{AD}$, $E_{ind}$ and $E_{M1}$ are theoretically derived in the following subsections, while $E_{P2L}$ depends on the specific P2LC architecture used and can be extracted from simulation data, as will be shown later in this chapter.

#### 5.3.1 Adiabatic driver

There are two sources of energy dissipation in the adiabatic driver of Figure 5.3: adiabatic dissipation caused by current flowing through the transmission gate resistance ( $R_{TG}$ ), and CMOS dissipation caused by external stages driving the transmission gate and pull-down NMOS transistor input capacitances.

The sinusoidal pulses generated by the PCG charge/discharge the load capacitance ( $C_L$ ) through the transmission gate resistance ( $R_{TG}$ ). The full charge/discharge cycle of the load capacitance concludes during a period T = 1/f, while the energy dissipated on the transmission gate resistance ( $E_{TG}$ ) during that period is adiabatic and thus proportional to Equation 2.2. Appending a correction factor  $\xi$ , equal to  $\pi^2/8$  for a sine-shaped current [8, 76], and multiplying the equation by 2 to calculate energy dissipation for the full cycle:

$$E_{TG} = 2\xi \left(\frac{R_{TG}C_L}{\frac{1}{2}T}\right) C_L V_{DD}^2 = \frac{\pi^2}{2} \left(R_{TG}C_L f\right) C_L V_{DD}^2 \quad (5.6)$$
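As a check on the constant in Equation 5.6, the arithmetic can be written out explicitly. This is only a restatement of the step above, assuming $\xi = \pi^2/8$ and $T = 1/f$ as stated in the text:

$$2\xi \cdot \frac{R_{TG}C_L}{\frac{1}{2}T} = 2 \cdot \frac{\pi^2}{8} \cdot 2R_{TG}C_L f = \frac{\pi^2}{2} R_{TG}C_L f$$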

The cross-coupled PMOS pair of Figure 5.3 reduces the transmission gate input capacitance by 1/2 when compared to the simple adiabatic driver of Figure 2.15 (page 18). External CMOS stages are not required to drive the PMOS pair, however the PMOS gate capacitance will appear as an additional capacitive load at the output of the driver. Energy will still be dissipated for driving the PMOS pair, but will be adiabatic and thus more energy efficient. Transmission gates are typically sized to have equal NMOS and PMOS parts, as increasing the size of the PMOS only slightly improves the gate resistance while significantly increasing the input capacitance [78]. Thus, assuming both NMOS/PMOS transistors are equally sized to  $W_n$ , the additional capacitive load appearing at the driver's output due to the PMOS pair would be one gate capacitance  $C_n$ . In addition, drain/source diffusion capacitance  $(C_D)$  will be an important portion of the load, since in each cycle  $6C_D$  will be present in the current flow path (4 contributed by the ON and 2 by the OFF transmission gate). Including the TSV capacitance at the output of the adiabatic driver, the combined load capacitance seen by the PCG in any cycle would be:

$$C_L = C_{TSV} + C_n + 6C_D \tag{5.7}$$

The transmission gate resistance ( $R_{TG}$ ) can be related to the gate capacitance by a "device technology factor" ( $\kappa_{TG}$ ) [43], which we can define for calculation convenience:

$$\kappa_{TG} = R_{TG}C_n \Rightarrow R_{TG} = \frac{\kappa_{TG}}{C_n} \tag{5.8}$$

Assuming that the pull-down NMOS transistors of Figure 5.3 are small, CMOS dissipation is primarily the result of driving the NMOS part of the transmission gates, which is  $C_n V_{DD}^2$ . Combining the CMOS dissipation with Equations 5.6, 5.7 and 5.8 gives the total dissipated energy per cycle in the adiabatic driver:

$$E_{AD} = C_n V_{DD}^2 + \frac{\pi^2}{2} \frac{\kappa_{TG}}{C_n} f \left[ C_{TSV} + C_n + 6C_D \right]^2 V_{DD}^2 \tag{5.9}$$

The second term of Equation 5.9 contributes to the energy dissipation in every cycle, while the first term depends on the data switching activity (*D*). We can also further simplify this equation by defining the diffusion capacitance as a fraction of the transistor's input capacitance ( $C_D = bC_n$ ), and assigning the term $\frac{\pi^2}{2}\kappa_{TG}f$ to a variable ( $y = \frac{\pi^2}{2}\kappa_{TG}f$ ). Equation 5.9 then becomes:

$$\begin{aligned}
E_{AD} &= D \cdot C_n V_{DD}^2 + \frac{y}{C_n} \left[ C_{TSV} + (6b+1)C_n \right]^2 V_{DD}^2 \\
&= \left[ C_n \left( \frac{D}{y} + (6b+1)^2 \right) + \frac{1}{C_n} C_{TSV}^2 \right] \cdot y V_{DD}^2 + (12b+2)C_{TSV} \cdot y V_{DD}^2
\end{aligned} \tag{5.10}$$

Since $C_n$ is the free parameter in Equation 5.10, the first two terms vary inversely with each other (one is proportional to $C_n$, the other to $1/C_n$) and thus $E_{AD}$ is minimised when they become equal. Solving $C_n\left(\frac{D}{y} + (6b+1)^2\right) = \frac{1}{C_n}C_{TSV}^2$ for $C_n$ gives the optimum transmission gate input capacitance for a given $C_{TSV}$:

$$C_{n(opt)} = \sqrt{\left[\frac{D}{y} + (6b+1)^2\right]^{-1}} C_{TSV} \tag{5.11}$$
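The same optimum follows from setting the derivative of Equation 5.10 with respect to $C_n$ to zero. The sketch below assumes that $y$, $D$ and $b$ do not depend on $C_n$, and uses the shorthand $A = \frac{D}{y} + (6b+1)^2$, a symbol introduced here purely for illustration:

$$\frac{\partial E_{AD}}{\partial C_n} = \left[ A - \frac{C_{TSV}^2}{C_n^2} \right] y V_{DD}^2 = 0 \;\Rightarrow\; C_{n(opt)} = \frac{C_{TSV}}{\sqrt{A}}, \qquad \frac{\partial^2 E_{AD}}{\partial C_n^2} = \frac{2C_{TSV}^2}{C_n^3}\, y V_{DD}^2 > 0$$

The positive second derivative indicates that the stationary point is a minimum under these assumptions.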

Combining Equations 5.10 and 5.11 gives the energy dissipation of an optimally sized adiabatic driver:

$$E_{AD(min)} = \left[\sqrt{\frac{D}{(6b+1)^2y} + 1} + 1\right] \cdot (12b+2)yC_{TSV}V_{DD}^2 \tag{5.12}$$
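Equations 5.10 to 5.12 can be cross-checked numerically. The following is a minimal sketch, not the tool used in this work: the function names are hypothetical, and the values of `D`, `B`, `Y` and `V_DD` are placeholders chosen only for illustration (80 fF mirrors the TSV capacitance used in the simulations of this chapter).

```python
import math


def adiabatic_driver_energy(c_n, c_tsv, d, b, y, v_dd):
    """Energy per cycle of the adiabatic driver, first line of Equation 5.10."""
    c_load = c_tsv + (6 * b + 1) * c_n  # Equation 5.7 with C_D = b * C_n
    return d * c_n * v_dd**2 + (y / c_n) * c_load**2 * v_dd**2


def optimum_gate_capacitance(c_tsv, d, b, y):
    """Optimum transmission gate input capacitance, Equation 5.11."""
    return c_tsv / math.sqrt(d / y + (6 * b + 1) ** 2)


def minimum_driver_energy(c_tsv, d, b, y, v_dd):
    """Energy of the optimally sized driver, Equation 5.12."""
    root = math.sqrt(d / ((6 * b + 1) ** 2 * y) + 1)
    return (root + 1) * (12 * b + 2) * y * c_tsv * v_dd**2


# Placeholder inputs (assumptions for illustration, not extracted parameters).
C_TSV = 80e-15  # F
D, B, Y, V_DD = 0.5, 0.5, 0.05, 1.2

c_opt = optimum_gate_capacitance(C_TSV, D, B, Y)
e_at_opt = adiabatic_driver_energy(c_opt, C_TSV, D, B, Y, V_DD)
e_closed_form = minimum_driver_energy(C_TSV, D, B, Y, V_DD)

print(f"C_n(opt) = {c_opt * 1e15:.2f} fF")
print(f"E_AD at optimum = {e_at_opt * 1e15:.3f} fJ")
print(f"Equation 5.12   = {e_closed_form * 1e15:.3f} fJ")
```

Evaluating Equation 5.10 at the capacitance given by Equation 5.11 should agree with the closed form of Equation 5.12, and perturbing $C_n$ in either direction should increase the energy.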

## 5.3.2 Inductor's parasitic resistance

The inductor's ( $L_{ind}$ ) parasitic resistance $R_{ind}$ is inversely proportional to the $Q_{ind}$ factor, which typically depends on the inductor's implementation technology. For the purposes of this analysis it can be estimated as:

$$Q_{ind} = \frac{1}{R_{ind}} \sqrt{\frac{L_{ind}}{C_L}} \Rightarrow R_{ind} = \frac{1}{Q_{ind}C_L 2\pi f} \tag{5.13}$$

Energy dissipation in  $R_{ind}$  is adiabatic and can be estimated combining Equations 5.6 and 5.13:

$$E_{ind} = \frac{\pi^2}{2} \left( R_{ind} C_L f \right) C_L V_{DD}^2 = \frac{\pi}{4} \frac{C_L}{Q_{ind}} V_{DD}^2 \tag{5.14}$$
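The intermediate steps behind Equations 5.13 and 5.14 can be spelled out as follows. This is a sketch that assumes the circuit operates exactly at the resonant frequency of Equation 5.4, so that $2\pi f = 1/\sqrt{L_{ind}C_L}$:

$$\sqrt{\frac{L_{ind}}{C_L}} = \frac{1}{2\pi f C_L} \;\Rightarrow\; R_{ind} = \frac{1}{Q_{ind}} \sqrt{\frac{L_{ind}}{C_L}} = \frac{1}{Q_{ind} C_L 2\pi f}$$

$$E_{ind} = \frac{\pi^2}{2} \cdot \frac{C_L f}{Q_{ind} C_L 2\pi f} \cdot C_L V_{DD}^2 = \frac{\pi}{4} \frac{C_L}{Q_{ind}} V_{DD}^2$$

Under this assumption the frequency cancels, so the inductor loss per cycle depends only on $C_L$, $Q_{ind}$ and $V_{DD}$.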

## 5.3.3 Switch M1

Transistor *M*1 is switched on briefly to recover the energy dissipated each cycle in $R_{total} = R_{TG} + R_{ind}$. Its energy dissipation is a trade-off between dissipation in its on-resistance $R_{M1}$ and input capacitance $C_{M1}$.

Since *M*1 is a fairly large transistor, previous ratioed stages will have a significant contribution to the energy consumption as well. For that reason an *m* factor is used to account for the additional losses. $I_{M1(rms)}$ is the root mean square (RMS) current passing through the transistor while turned on and $V_{GM1}$ is the peak gate voltage. A methodology for deriving optimum values for both these parameters is proposed in [44] and the total dissipated energy in *M*1 can be estimated as:

$$E_{M1(min)} = 2I_{M1(rms)}V_{GM1}\sqrt{\frac{m\kappa_{M1}}{f}} \tag{5.15}$$



Figure 5.7: Energy-recovery timing constraints.

#### 5.3.4 Timing Constraints

The gate of transistor *M*1 in Figure 5.5 is controlled by a pulse with period  $T_{CLK}$  and width  $W_{PULSE}$ . For optimum energy efficiency, the digital inputs of the adiabatic driver in Figure 5.3 should switch during the time period defined by  $W_{PULSE}$ , while the input values must be valid for the complete resonant clock cycle as shown in Figure 5.7.

The adiabatic driver's input data latching is controlled by clock signal *CLK*1 (Figure 5.6), and thus its positive edge should take place during $W_{PULSE}$ ( $Delay1 < W_{PULSE}$ ). If these constraints are not met, energy that could otherwise be recovered by the inductor will be dissipated on *M*1 during $W_{PULSE}$. The clock signal at the receiving end, *CLK*2, does not have the same constraint, however its positive edge should occur before the P2LC outputs new data ( $Delay2 < W_{PULSE} + PD_{P2L}$ ).
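The two inequalities above lend themselves to a simple design-time check. The sketch below is illustrative only: the function name is hypothetical, all delays are assumed to be measured from the rising edge of the *M*1 gate pulse and to be non-negative, and the numeric values are placeholders rather than values from the demonstrator.

```python
def timing_constraints_met(delay1, delay2, w_pulse, pd_p2l):
    """Check the clocking constraints of Section 5.3.4.

    All arguments are in seconds and are assumed to be measured from the
    rising edge of the M1 gate pulse (an assumption of this sketch).
    Returns a tuple (clk1_ok, clk2_ok).
    """
    clk1_ok = 0 <= delay1 < w_pulse            # Delay1 < W_PULSE
    clk2_ok = 0 <= delay2 < w_pulse + pd_p2l   # Delay2 < W_PULSE + PD_P2L
    return clk1_ok, clk2_ok


# Placeholder numbers for a 2 ns resonant clock period (500 MHz).
W_PULSE = 0.20e-9
PD_P2L = 0.15e-9

print(timing_constraints_met(0.10e-9, 0.30e-9, W_PULSE, PD_P2L))
print(timing_constraints_met(0.25e-9, 0.40e-9, W_PULSE, PD_P2L))
```

The first call satisfies both constraints; in the second call both clock edges arrive too late.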

## 5.4 SIMULATIONS

To verify and evaluate the proposed theoretical models, a test case was designed for which energy dissipation was calculated using both theory and SPICE simulations. The test case assumed that 1 bit of information per cycle was transferred between 2 tiers of a 3D-IC, in which the TSV capacitance ( $C_{TSV}$ ) was $80fF$. The TSV was driven using the energy-recovery technique proposed in this chapter, implemented with a single adiabatic driver for data encoding, and an inductor ( $L_{ind}$, $R_{ind}$ ) for generation of the



Figure 5.8: Test case for the evaluation of the theoretical model.

| Q  | 250 *MHz* |      |       | 500 *MHz* |      |       | 750 *MHz* |      |       |
|----|-----------|------|-------|-----------|------|-------|-----------|------|-------|
|    | T         | S    | e (%) | T         | S    | e (%) | T         | S    | e (%) |
| 6  | 36.0      | 40.9 | -12.0 | 47.6      | 51.6 | -7.8  | 57.8      | 61.3 | -5.7  |
| 8  | 30.8      | 34.7 | -11.4 | 41.7      | 44.9 | -7.0  | 51.5      | 54.3 | -5.2  |
| 10 | 27.6      | 31.2 | -11.5 | 38.2      | 41.1 | -7.0  | 47.7      | 49.8 | -4.3  |
| 12 | 25.5      | 28.9 | -11.9 | 35.8      | 38.6 | -7.1  | 45.1      | 47.2 | -4.4  |

Table 5.1: Energy dissipation results (fJ/cycle), theoretical model (T) versus SPICE simulation (S).

adiabatic signal and energy recovery. The circuit used for the test case is illustrated in Figure 5.8.

The energy dissipation in the adiabatic driver, inductor, and transistor *M*1 was calculated using Equations 5.12, 5.14, and 5.15, with the *Q* factor and operating frequency acting as free variables. Technology-specific parameters $\kappa_{TG}$, $\kappa_{M1}$, and $C_D$ were extracted using simulation models for Imec's 130*nm* 3D process (Table 2.2, page 11). The same circuit was simulated in the Cadence Virtuoso Spectre simulator under identical conditions. The energy dissipation results for various *Q* factors and operating frequencies are listed in Table 5.1.

The calculated and simulated energy dissipation results are also plotted in Figure 5.9. As can be observed, theoretical estimations are lower than the equivalent results reported by the simulator in all the cases examined. That can be expected, as the calculations performed by the simulator are thorough and consider many more details that are skipped in the theoretical models. Nevertheless, there is good correlation between theory and simulations.

In Figure 5.10 the error in the estimation of the energy dissipation is plotted when using the theoretical model. The *Q* factor does not seem to have a significant effect on the error, however there is a considerable dependence on the operating frequency.
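The trend described above can be re-derived directly from the energies in Table 5.1. The snippet below is a small sketch that assumes the error is defined as $(T - S)/S$; the variable and function names are illustrative. Because the tabulated energies are rounded to one decimal place, the recomputed errors may differ from the tabulated ones in the last digit.

```python
# Energy dissipation in fJ/cycle, transcribed from Table 5.1 as
# {Q: {frequency_MHz: (theory, spice)}}.
TABLE_5_1 = {
    6: {250: (36.0, 40.9), 500: (47.6, 51.6), 750: (57.8, 61.3)},
    8: {250: (30.8, 34.7), 500: (41.7, 44.9), 750: (51.5, 54.3)},
    10: {250: (27.6, 31.2), 500: (38.2, 41.1), 750: (47.7, 49.8)},
    12: {250: (25.5, 28.9), 500: (35.8, 38.6), 750: (45.1, 47.2)},
}


def relative_error_percent(theory, spice):
    """Signed error of the model relative to SPICE (assumed definition)."""
    return 100.0 * (theory - spice) / spice


def mean_error_by_frequency(table):
    """Average the error over all Q factors for each frequency."""
    frequencies = sorted(next(iter(table.values())))
    return {
        f: sum(relative_error_percent(*row[f]) for row in table.values()) / len(table)
        for f in frequencies
    }


def mean_error_by_q(table):
    """Average the error over all frequencies for each Q factor."""
    return {
        q: sum(relative_error_percent(*pair) for pair in row.values()) / len(row)
        for q, row in table.items()
    }


by_frequency = mean_error_by_frequency(TABLE_5_1)
by_q = mean_error_by_q(TABLE_5_1)
spread_frequency = max(by_frequency.values()) - min(by_frequency.values())
spread_q = max(by_q.values()) - min(by_q.values())

for f, err in by_frequency.items():
    print(f"{f} MHz: mean error {err:.1f} %")
for q, err in by_q.items():
    print(f"Q = {q}: mean error {err:.1f} %")
print(f"spread over frequency {spread_frequency:.1f}, spread over Q {spread_q:.1f}")
```

The spread of the mean error across frequencies is several times larger than the spread across *Q* factors, which is consistent with the observation made in the text.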



Figure 5.9: Energy dissipation results (*fJ/cycle*), theoretical model (T) versus SPICE simulator (S).



Figure 5.10: Energy dissipation estimation error of the theoretical model compared to the SPICE simulator.

The parameters $\kappa_{TG}$, $\kappa_{M1}$, and $C_D$ that were extracted for the specific technology were used as constants in the models, although in practice they are frequency dependent. Therefore it could be said that, based on the parameters chosen, the theoretical results are optimised for a specific frequency. Increasing frequency does not necessarily result in reduced error, as seems to be suggested by the graphs in Figure 5.10. In fact, it is expected that the estimation error will show a positive sign for larger frequencies. As was discussed previously in the text, it is a reasonable expectation for the theoretical models to slightly underestimate the energy dissipation.



Figure 5.11: Test case for comparison of energy-recovery to CMOS.

## 5.4.1 Comparison to CMOS

When comparing different circuits it is typically desirable to take a holistic approach evaluating various aspects, such as power, area, speed, cost etc. The purpose of this section is to compare the energy-recovery scheme to a conventional static CMOS circuit, using just the theoretical models presented in this chapter. This limits the comparison to the energy dissipation and speed parameters, while in Chapter 6 (page 92) area is also considered. The test case used for the comparison is illustrated in Figure 5.11. The TSV load capacitance was assumed to be $80fF$ for both circuits.

The energy dissipation of the CMOS circuit in Figure 5.11 is described by Equation 5.2. Appending the data switching activity (*D*):

$$E_{CMOS} = D \cdot \frac{1}{2} C_L V_{DD}^2 \tag{5.16}$$

Equation 5.16 obviously calculates the theoretical minimum energy dissipation, since no parasitics and previous CMOS stages are included. In practice the energy dissipation of the CMOS circuit can be much larger, however Equation 5.16 is considered adequate for the purposes of this comparison.

Energy dissipation of the energy-recovery circuit is calculated using Equation 5.5. The energy contributions of the adiabatic driver ( $E_{AD}$ ), the inductor's parasitic resistance ( $E_{ind}$ ), and transistor M1 ( $E_{M1}$ ) were theoretically derived previously in the text. The P2LC energy was simulated in SPICE for convenience. The simple circuit topology of Figure 5.4 was slightly improved, adding the PMOS transistors seen in Figure 5.12, so as to reduce short-circuit current when the input is in the range



Figure 5.12: Improved P2LC design.



Figure 5.13: Simulated energy dissipation of improved P2LC design.

 $V_{th} \rightarrow (V_{DD} - V_{th})$ . The energy dissipation results extracted using SPICE can be seen in Figure 5.13. As can be observed, the energy dissipation is frequency dependent, something not expected in a CMOS circuit. This behaviour is caused by the slow rising sinusoidal inputs, which for low frequencies, extend the period during which both PMOS and NMOS are on, and thus increase short-circuit current.

The *Q* factor is a critical parameter in the energy efficiency of the energy-recovery scheme, since the entire circuit is basically an RLC oscillator. A wide range of inductors are available as discrete components for use in circuits with *Q* factors depending on the size and material. On-chip integrated inductors have more restricted performance, however their compact size is an advantage. Integrated *Q* factors are typically in the neighbourhood of 10 for *nH* range inductors with ~  $100\mu m$  radius [5]. The *Q* factor



Figure 5.14: Energy dissipation reduction achieved by the energy-recovery circuit compared to CMOS for variable Q factor (C = 80 fF).



Figure 5.15: Energy contribution of energy-recovery circuit components at 200MHz (C = 80fF).

values in the following simulations were deliberately chosen to be in realistic ranges for integrated inductors.

In Figure 5.14, the percentage reduction in energy dissipation achieved by the energy-recovery circuit is plotted when compared to CMOS. As expected, energy efficiency of the energy-recovery circuit improves with high Q factors, and low frequencies.

In Figures 5.15 and 5.16, the energy dissipation contribution of each component in the energy-recovery circuit can be seen at 200*MHz* and 800*MHz* respectively. In both cases, the energy dissipated on the inductor's resistance is reduced with higher *Q* factors. At lower frequencies the P2LC energy dissipation becomes a significant part of the total.

In Figure 5.17, the percentage reduction in energy dissipation is plotted, when the TSV capacitance ( $C_{TSV}$ ) is variable. The energy dissipation in the adiabatic driver (Equation 5.12), inductor (Equation 5.14), and transistor M1 (Equation 5.15) are linearly related to the TSV capacitance. Since the same applies to the CMOS circuit (Equation 5.16), relative energy dissipation of both circuits should be constant with TSV capacitance. The change observed in



Figure 5.16: Energy contribution of energy-recovery circuit components at 800MHz (C = 80 fF).



Figure 5.17: Energy dissipation reduction achieved by the energy-recovery circuit compared to CMOS with variable TSV capacitance (C = 80 fF, f = 500 MHz).

Figure 5.17 is caused by the constant energy dissipation contribution of the P2LC. However, as the load capacitance increases the energy dissipation on the P2LC becomes insignificant in relation to the rest of the circuit components, thus the reduction in energy is expected to asymptotically approach a maximum value.
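The asymptotic behaviour can be made explicit with a short sketch. Assuming, as argued above, that the driver, inductor and *M*1 terms scale linearly with $C_{TSV}$ while the P2LC term is constant, write $E_{ER} = \alpha C_{TSV} + E_{P2L}$ and $E_{CMOS} = \beta C_{TSV}$, where $\alpha$ and $\beta$ are proportionality constants introduced here only for illustration:

$$1 - \frac{E_{ER}}{E_{CMOS}} = 1 - \frac{\alpha}{\beta} - \frac{E_{P2L}}{\beta C_{TSV}} \;\xrightarrow{\;C_{TSV} \to \infty\;}\; 1 - \frac{\alpha}{\beta}$$

The P2LC penalty therefore decays as $1/C_{TSV}$, and the energy reduction saturates at a value set by the remaining components.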

Switching activity (*D*) can also be a significant factor affecting energy performance. Since in the energy-recovery circuit the sinusoidal oscillation may not be halted, all capacitances in the current flow path will charge and discharge on each cycle regardless of data activity. In contrast, static CMOS ideally dissipates energy only when switching, and thus the energy-recovery circuit has a disadvantage at low switching activities. In Figure 5.18, the estimated effect of the switching activity on energy performance is plotted for an operating frequency of 200*MHz*.

Since the technology factors $\kappa_{TG}$ and $\kappa_{M1}$ were extracted for a 130*nm* process, halving their values could also provide an estimation of the circuit's energy performance for a 65*nm* node. The result is plotted in Figure 5.19, and as can be observed, technology scaling has a positive effect on energy dissipation when compared to conventional static CMOS.



Figure 5.18: Energy dissipation reduction achieved by the energy-recovery circuit compared to CMOS (C = 80 fF, f = 200 MHz).



Figure 5.19: Energy dissipation reduction achieved by the energy-recovery circuit compared to CMOS for variable technology (C = 80 fF, f = 500 MHz).

#### 5.5 SUMMARY AND CONCLUSIONS

The work presented in this chapter investigated the potential of the energy-recovery technique, as used in adiabatic logic and resonant clock distribution networks, for reducing the energy dissipation of TSV interconnections in 3D-ICs. The proposed energy-recovery scheme for TSVs was analysed using theoretical modelling, generating equations for energy dissipation and optimum device sizing. The model accuracy was evaluated against SPICE simulations, which showed good correlation with the theoretical estimations on a 130*nm* technology process.

The energy-recovery scheme was compared to conventional static CMOS, with both circuits driving equivalent TSV load capacitances under similar conditions. The analysis revealed the dependence of energy performance on multiple circuit parameters: the *Q* factor, operating frequency, switching activity, and TSV capacitance. The results demonstrated favourable energy performance for the energy-recovery scheme for low operating frequencies and high *Q* factors, switching activities, and TSV capacitances. Furthermore, an estimation was provided on the energy performance of the proposed scheme in an advanced technology node, which predicted further energy improvement.

Encoding the PCG signal to arbitrary digital values using the dual-rail adiabatic driver (Figure 5.3) is only one of the possible solutions. In fact, energy could be saved if instead of dual-rail another encoding method is used, such as 1-out-of-4 [74]. In 1-out-of-4 encoding 4 outputs are used to represent 2-bits, however only one of the outputs is charged at any given cycle. In comparison, dual-rail charges two outputs per cycle for the same number of bits, and thus 1-out-of-4 could potentially be saving up to 50% energy when compared to dual-rail. 1-out-of-4 encoding could prove an interesting extension to the proposed energy-recovery scheme.
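The wire-count argument can be illustrated with a toy model. The sketch below assumes that every output wire presents the same capacitance, so that energy per cycle scales with the number of charged wires; the function names and the bit-to-wire mapping are hypothetical and are not taken from [74].

```python
from itertools import product


def dual_rail_encode(bits):
    """Encode each bit on two wires; exactly one wire per bit is charged."""
    wires = []
    for bit in bits:
        wires.extend((1, 0) if bit else (0, 1))
    return wires


def one_of_four_encode(bits):
    """Encode each pair of bits on four wires; exactly one wire per pair is charged."""
    if len(bits) % 2:
        raise ValueError("1-out-of-4 encoding needs an even number of bits")
    wires = []
    for i in range(0, len(bits), 2):
        symbol = 2 * bits[i] + bits[i + 1]  # value 0..3 of the bit pair
        group = [0, 0, 0, 0]
        group[symbol] = 1
        wires.extend(group)
    return wires


def charged_wires(wires):
    """Number of wires charged in one cycle."""
    return sum(wires)


for pair in product((0, 1), repeat=2):
    dual = dual_rail_encode(pair)
    quad = one_of_four_encode(pair)
    print(pair, "dual-rail:", charged_wires(dual), "1-out-of-4:", charged_wires(quad))
```

For every 2-bit value both encodings use four wires, but dual-rail charges two of them and 1-out-of-4 charges one, which is the origin of the "up to 50%" estimate under the equal-capacitance assumption.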

The theoretical models developed in this work can be used as a quick alternative to SPICE simulations for design-space exploration. This is shown in the following chapter, where the theory established on the energy-recovery TSV scheme is used for designing a low-power 3D demonstrator circuit that was fabricated on a 130*nm* process.

## 6.1 INTRODUCTION

In the previous chapter, a theoretical approach was taken to investigate the potential of the energy-recovery technique for reducing the energy dissipation of TSV interconnects in 3D-ICs. The total energy dissipation per cycle and optimum device sizing were extracted for the proposed scheme using theoretical modelling, while analysis revealed the design parameters affecting energy performance.

Practical design issues, such as wire parasitics, signal distribution, area etc., were not considered, hence in this chapter a 3D demonstrator circuit based on the energy-recovery scheme is designed under realistic physical and electrical constraints. The demonstrator is compared to a CMOS circuit designed with the same specifications using post-layout simulations. Both circuits are fabricated on a 130*nm* 3D technology process and evaluated based on the experimental measurements.

## 6.2 PHYSICAL IMPLEMENTATION

The demonstrator circuit implemented a 500*MHz* 80-bit parallel I/O channel on a 5-tier TSV-based 3D-IC. The TSV interconnections were driven using the energy-recovery scheme, as discussed in Chapter 5.

The target process technology allowed for a low-resistivity backside re-distribution layer (RDL) to be deposited, which was used to implement a high-Q integrated inductor. Stacking of dies was not possible in the target process, so the 5-tier 3D-IC was emulated by series-connecting TSVs in groups of 4, as illustrated in Figure 6.1.

The circuit consisted of a total of 640 TSVs ( $2 \times 4 \times 80bits$ ), of which half were electrically connected to the integrated inductor during a single clock cycle. The TSV capacitance for the fabrication technology process had already been characterized and was estimated at ~ 40 fF (Table 2.2, page 11). Optimal device sizing and parameters for the energy-recovery I/O circuit at 500MHz were calculated using the methodology developed in Chapter 5 and are listed in Table 6.1.
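The aggregate figures of Table 6.1 can be cross-checked with a short Python sketch. It assumes that $C_{load}$ in the table is the 207 fF driver load $C_L$, and that the inductor and the total load form an ideal lumped LC tank with resonance $f = 1/(2\pi\sqrt{LC})$; whether Equation 5.4 has exactly this form is an assumption here.

```python
import math

# Values quoted in the text and in Table 6.1.
N_BITS = 80
TSVS_PER_CHAIN = 4       # series-connected TSVs emulating the 5-tier stack
RAILS = 2                # dual-rail signalling
R_TG = 168.0             # transmission-gate resistance per driver (ohm)
C_L = 207e-15            # load capacitance per driver (F), assumed equal to C_load
L_IND = 6.13e-9          # integrated inductor (H)

n_tsvs = RAILS * TSVS_PER_CHAIN * N_BITS   # total number of TSVs
r_total = R_TG / N_BITS                    # 80 drivers in parallel
c_total = N_BITS * C_L                     # 80 loads in parallel

# Ideal LC resonance frequency (assumed lumped model, losses ignored).
f_res = 1.0 / (2.0 * math.pi * math.sqrt(L_IND * c_total))

print(f"TSVs: {n_tsvs}")
print(f"R_total: {r_total:.2f} ohm")
print(f"C_total: {c_total * 1e12:.2f} pF")
print(f"f_res: {f_res / 1e6:.1f} MHz")
```

Under these assumptions the listed inductance and total capacitance resonate within about 1% of the 500 MHz target.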

In Figure 6.2 the schematic diagram of the energy-recovery circuit is illustrated. The main components comprising the circuit are: the adiabatic drivers, the pulse-to-level converters (P2LCs), the data generator, the pulse generator, transistor M1, and the integrated inductor. All sub-components are discussed in detail in the following paragraphs.

Figure 6.1: Demonstrator circuit structure (showing 1-bit I/O channel, face-down view).

| Adiabatic driver |                    | 80-bit circuit |                                            |
|------------------|--------------------|----------------|--------------------------------------------|
| $W_n$            | 4.61µm (Eq. 5.11)  | $R_{total}$    | $2.1\Omega \left(\frac{R_{TG}}{80}\right)$ |
| $R_{TG}$         | 168Ω (Eq. 5.8)     | $C_{total}$    | $16.5pF$ ($80 \cdot C_{load}$)             |
| $C_L$            | 207*fF* (Eq. 5.7)  | $L_{ind}$      | 6.13*nH* (Eq. 5.4)                         |

Table 6.1: Energy-recovery I/O circuit parameters.

Figure 6.2: Energy-recovery I/O circuit schematic diagram.

Figure 6.3: Adiabatic driver (layout/schematic view).

# 6.2.1 Adiabatic driver

The adiabatic driver was based on the configuration theoretically analysed in 5.3.1 (page 80). The adiabatic driver had 2 inputs for the resonant pulse (*RES\_CLK*) and the digital data (*DATA*) signals, and 2 outputs for the inverted (*OUT\_B*) and non-inverted (*OUT*) signals. The layout and schematic views of the adiabatic driver are illustrated in Figure 6.3.



Figure 6.4: P2LC (layout/schematic view).

#### 6.2.2 Pulse-to-level converter

The pulse-to-level converter (P2LC) was based on the simple circuit of Figure 5.4 (page 78). The design in layout and schematic views is illustrated in Figure 6.4. The circuit had 2 inputs for the inverted ($IN_B$) and non-inverted ($IN$) resonant pulse, and 2 outputs for the inverted ($OUT_B$) and non-inverted ($OUT$) digital signals.

#### 6.2.3 Data generator

A data generator was used for generating the digital data signal required at the adiabatic driver's input (DATA). The circuit, which is illustrated in Figure 6.5, was a simple D-type flip-flop configured as a toggle. The data generator had 1 input, a digital clock signal (CLK) operating at the same frequency as the resonant clock, and the digital data output (Q) with a switching activity of 1.

As discussed in 5.3.4 (page 83), there are certain timing constraints that must be respected for the energy-recovery circuit to operate efficiently. The digital data received at the adiabatic driver's input (*DATA*) should switch during the period defined by  $W_{PULSE}$  (Figure 6.6). In the demonstrator circuit, the distribution network carrying the *DATA* signal to the adiabatic drivers was carefully designed so that its delay impact was less than  $W_{PULSE}$.

# 6.2.4 Pulse generator

The gate of transistor *M*1 in Figure 6.2 is controlled by a pulse having the same frequency as the resonant clock (*RES\_CLK*) and width  $W_{PULSE}$, which is calculated using the models in 5.3.3 (page 82). This digital pulse can be generated off-chip using a highly accurate laboratory signal generator. However, wire parasitics in the path from the external signal source to the gate of on-chip transistor *M*1 affect the rise and fall times of the pulse, resulting in pulse widths which are difficult to control.

Figure 6.5: Data generator (layout/schematic view).

Figure 6.6: Timing constraints for the energy-recovery demonstrator circuit.

Figure 6.7: Pulse generator (layout/schematic view).

Figure 6.8: Pulse generator Input/Output signals.

For that reason the pulse generator was implemented on-chip (Figure 6.7), and could be used to generate pulses with arbitrary widths and frequencies. The pulse generator had 2 sinusoidal inputs (S1A, S2), supplied from off-chip, and generated a digital pulse with the same frequency as S1A/S2. The pulse width was defined by the phase difference between S1A and S2, as illustrated in Figure 6.8.
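The relation between phase difference and pulse width can be sketched in Python as follows. The sketch assumes an idealised generator in which the pulse width equals the time offset between the two sinusoids, $W_{PULSE} = \Delta\phi / (360^{\circ} \cdot f)$; the real circuit of Figure 6.7 will add gate delays and threshold effects that are not modelled. The function names are hypothetical, and the 327 ps width is the value quoted for the post-layout simulation in 6.3.

```python
def pulse_width(phase_deg, freq_hz):
    """Ideal pulse width (s) produced by a phase offset between two sinusoids of equal frequency."""
    return (phase_deg / 360.0) / freq_hz

def phase_for_width(width_s, freq_hz):
    """Phase offset (degrees) needed for a target pulse width under the same ideal assumption."""
    return 360.0 * width_s * freq_hz

FREQ = 500e6            # S1A/S2 frequency (Hz)
TARGET_WIDTH = 327e-12  # PULSE width quoted in the post-layout simulation (s)

phase = phase_for_width(TARGET_WIDTH, FREQ)
print(f"Required S1A/S2 phase difference: {phase:.2f} degrees")
print(f"Resulting width: {pulse_width(phase, FREQ) * 1e12:.0f} ps")
```

At 500 MHz the period is 2 ns, so one degree of phase corresponds to roughly 5.6 ps of pulse width in this idealised picture.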

# 6.2.5 Integrated inductor

The integrated inductor was designed in ASITIC<sup>1</sup> as a planar square spiral with fixed dimensions ($500\mu m \times 500\mu m$) and variable inductance, while aiming for optimum Q factor. The simulated Q factor of the inductor for each frequency was then used in Equation 5.5 to estimate the energy performance of the energy-recovery circuit when compared to standard CMOS (Equation 5.2).

<sup>1</sup> ASITIC is a CAD tool that aids in the optimization and modelling of spiral inductors (http://rfic.eecs.berkeley.edu/~niknejad/asitic.html)

Figure 6.9: Inductor's Q factor simulation in ASITIC / Estimated energy dissipation reduction per cycle over standard CMOS.

As can be observed in Figure 6.9, the maximum performance is not reached at the point where the maximum Q factor is achieved, since energy-recovery is also dependent on the operating frequency. As a result the largest reduction in energy dissipation was 40% attained at 300*MHz*, while the maximum Q factor reported by ASITIC was 20 at 625*MHz*. The high Q factors were a consequence of the large area allocated for the inductor and the availability of the backside re-distribution layer (RDL) on the target process technology for the implementation of the inductor, where thick low-resistance copper paths could be deposited.

For an operating frequency of 500MHz the inductor had a Q factor of ~ 17, which resulted in an estimated energy reduction of 33% in comparison to the CMOS circuit. The inductor had a value of 6.13nH at that frequency and its design in layout view can be seen in Figure 6.10.
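To give a feel for what a Q factor of ~17 means in circuit terms, the following Python sketch converts it to an equivalent series resistance. It assumes the textbook series-RL model, $Q = \omega L / R_s$, which ignores substrate losses and self-resonance; it is not the model used internally by ASITIC, and the function name is hypothetical.

```python
import math

def series_resistance(inductance_h, q_factor, freq_hz):
    """Equivalent series resistance of an inductor, assuming Q = omega * L / R_s."""
    return 2.0 * math.pi * freq_hz * inductance_h / q_factor

L_IND = 6.13e-9   # inductance at 500 MHz (H)
Q = 17            # approximate simulated Q factor at 500 MHz
FREQ = 500e6      # operating frequency (Hz)

r_s = series_resistance(L_IND, Q, FREQ)
print(f"Equivalent series resistance: {r_s:.2f} ohm")
```

Under this assumption the inductor loss resistance is of the same order as the $2.1\Omega$ total driver resistance of Table 6.1, which is consistent with the inductor being a non-negligible entry in the energy budget.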

# 6.2.6 Top level design

The various sub-components of the circuit were interconnected using the Metal1 and Metal2 layers. In the design there were two global signals that assumed a critical role in the performance of the circuit: the digital data signal (*DATA*), which was discussed in 6.2.3, and the resonant clock (*RES\_CLK*).

For the distribution of the data signal, buffers and interconnects were properly sized so that the timing constraints of Figure 6.6 were respected. A sinusoidal signal, such as *RES\_CLK*, does not require regeneration as it carries just one frequency harmonic. However, the path itself has to be properly sized to ensure that parasitic resistances and capacitances do not significantly affect the target performance of the design. As was previously shown by Chueh et al. [19], when the driven load capacitance in a resonant distribution network is much larger than the wire parasitic capacitance, wire resistance is more significant for determining both skew and energy performance.

Figure 6.10: Planar square spiral inductor designed in ASITIC.

The resonant pulse distribution topology used in this design was an asymmetric tree, as shown in Figure 6.11, in which the inductor forces a current at the root branch terminating on all adiabatic drivers in parallel. As such, and considering that the resonant pulse supplies 80 adiabatic drivers with transmission gate resistance of  $R_{TG} = 168\Omega$ , the total load resistance seen at the output node of the inductor is  $\frac{R_{TG}}{80} = \frac{168}{80} = 2.1\Omega$ . However, as the current flows to lower branch levels in the tree the perceived resistance increases to  $\frac{R_{TG}}{16}$ ,  $\frac{R_{TG}}{4}$ , and on the last one to  $R_{TG}$ . Any additional resistance from wires is going to add in series with the adiabatic driver resistance, which in turn will induce an almost linear effect on the energy performance, as can be estimated from Equation 5.12.

Assuming that the total driven resistance should not increase by more than 20% and that there are 4 branch levels in the design, each branch can tolerate a wire resistance equal to  $\frac{20\%}{4} = 5\%$  of the resistance seen at its level. Starting from the inductor output node, the allowed wire resistance was calculated for each branch level, and the resulting tolerances are listed in Table 6.2. These calculated limits were used for optimally sizing the resonant clock distribution tree in the demonstrator circuit.
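The tolerance budget described above can be reproduced with a short Python sketch. The number of drivers fed by one branch at each level (80, 16, 4, 1) is taken from the resistances quoted in the text; the variable names are hypothetical.

```python
R_TG = 168.0      # transmission-gate resistance of one adiabatic driver (ohm)
N_DRIVERS = 80    # drivers supplied by the resonant pulse
BUDGET = 0.20     # maximum allowed increase in total driven resistance

# Drivers fed by a single branch at each level, from the root (1st) to the leaves (4th).
drivers_per_branch = [80, 16, 4, 1]

# The budget is split equally between the branch levels: 20% / 4 = 5%.
share = BUDGET / len(drivers_per_branch)

# Allowed wire resistance per branch: 5% of the resistance seen at that level.
tolerances = [share * R_TG / n for n in drivers_per_branch]

# Total load seen at the inductor output: the driver resistance plus, for each
# level, the branch wire resistance divided by the number of parallel branches.
total = R_TG / N_DRIVERS + sum(
    t / (N_DRIVERS / n) for t, n in zip(tolerances, drivers_per_branch)
)

for level, t in enumerate(tolerances, start=1):
    print(f"Level {level}: {t:.3f} ohm")
print(f"Total load: {total:.2f} ohm ({total / (R_TG / N_DRIVERS) - 1:.0%} increase)")
```

Each level contributes the same $0.105\Omega$ when referred to the inductor output, which is why an equal split of the budget leads to the 20% total increase.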

To ensure a stable voltage supply at the input of the inductor (VIND), a 1pF interdigitated metal-insulator-metal (MIM) decoupling capacitor [78] was designed (Figure 6.12), which was inserted between VIND and ground. The top level design of the energy-recovery I/O circuit in layout view is illustrated in Figure 6.13.

Figure 6.11: Resonant pulse distribution tree.

| Branch level | Wire resistance                                                              |
|--------------|------------------------------------------------------------------------------|
| 1st          | $5\% \cdot \frac{168}{80} = 105m\Omega$                                      |
| 2nd          | $5\% \cdot \frac{168}{16} = 525m\Omega$                                      |
| 3rd          | $5\% \cdot \frac{168}{4} = 2.1\Omega$                                        |
| 4th          | $5\% \cdot 168\Omega = 8.4\Omega$                                            |
| Total load:  | $\frac{168+8.4}{80} + \frac{2.1}{20} + \frac{0.525}{5} + 0.105 = 2.52\Omega$ |

Table 6.2: Calculated resistance tolerances for each wire branch in the resonant clock distribution tree.

Figure 6.12: Interdigitated metal-insulator-metal capacitor (fringe capacitor).

# 6.2.7 CMOS I/O circuit

The CMOS I/O circuit, which was designed for comparison purposes, is illustrated in the schematic diagram of Figure 6.14. The circuit used half the number of TSVs of the energy-recovery circuit and was composed of two basic sub-components: the data generator, which was identical to the one discussed in 6.2.3, and the CMOS drivers (Figure 6.15), which were properly sized to drive the 160 fF TSV capacitance.

The top level layout view of the CMOS I/O circuit can be seen in Figure 6.16.
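As a rough plausibility check of the CMOS load, the following Python sketch estimates the ideal switching energy of the 80-bit channel. It assumes one transition per cycle (switching activity of 1) dissipating $\frac{1}{2}CV^2$, a 1.2 V supply as listed for the CMOS drivers in Table 6.6, and no driver or wire parasitics; whether Equation 5.2 takes exactly this form is an assumption.

```python
C_TSV_CHAIN = 160e-15   # four series-connected ~40 fF TSVs per bit (F)
VDD = 1.2               # CMOS driver supply voltage (V)
N_BITS = 80

# One output transition per cycle, each dissipating (1/2) * C * V^2 (assumed).
e_bit = 0.5 * C_TSV_CHAIN * VDD ** 2
e_total = N_BITS * e_bit

print(f"Energy per bit:   {e_bit * 1e15:.1f} fJ")
print(f"Energy per cycle: {e_total * 1e12:.2f} pJ")
```

Under these assumptions the estimate is about 115 fJ per bit, which agrees with the theoretical CMOS entry reported later in Table 6.4.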

#### 6.3 POST-LAYOUT SIMULATION AND COMPARISON

The theoretically calculated parameters were used for the design of the demonstrator circuit in Imec's 3D130nm process technology. In Table 6.3, the estimated energy dissipation of each circuit subcomponent is listed.



Figure 6.13: Energy-recovery I/O circuit in layout view.



Figure 6.14: CMOS I/O circuit schematic diagram.



Figure 6.15: CMOS driver schematic.



Figure 6.16: CMOS I/O circuit in layout view.

| Component         | Energy dissipation (80 bits / cycle) |
|-------------------|--------------------------------------|
| Inductor          | 1.1*pJ* (Eq. 5.14)                   |
| Switch M1         | 0.92*pJ* [?]                         |
| P2LCs             | 0.98*pJ* (Simulated)                 |
| Adiabatic drivers | 3.15*pJ* (Eq. 5.12)                  |
| Total             | 6.15*pJ*                             |

Table 6.3: Energy-dissipation theoretical calculations for Q = 17, f = 500MHz.

After parasitics extraction, the top-level design was simulated in SPICE. The post-layout simulation results for a selection of output signals are plotted in Figure 6.17. As can be observed, the timing constraints discussed in 6.2.3 are respected, since the adiabatic driver's input signal (*DATA*) switches within the 327*ps* *PULSE* width. It can also be seen that the resonant clock (*RES\_CLK*) does not fully complete its discharge cycle before it is forced to do so by the *PULSE* signal, since the additional capacitive load induced by the wire parasitics reduces the oscillator's natural frequency.

Post-layout simulated energy dissipation, as well as area requirements, for the energy-recovery and CMOS I/O circuits are listed in Table 6.4. The energy dissipation of the energy-recovery implementation is 25% more at the post-layout level in comparison to the theoretical estimations. The energy overhead can be attributed to the effect of the wire parasitics, as discussed in 6.2.6, and the transistor parasitics that are not included in the theoretical equations. Therefore the accuracy of the theoretical models can be considered acceptable for designing circuits based on the proposed energy-recovery scheme.

As was expected, the dual-rail energy-recovery circuit occupies almost double the area of the equivalent CMOS circuit. Even though both implementations have increased energy dissipation in comparison to the theoretical calculations, the CMOS circuit energy increases relatively more. The latter was anticipated since the theoretical estimation of the CMOS energy dissipation was calculated using the simple Equation 5.2, which does not account for any additional driver stages and parasitics. The end result is that the comparative performance favours the energy-recovery implementation, which improves its energy savings from the theoretically calculated 33.3% up to 39.3% in the post-layout simulations.

Figure 6.17: Post-layout SPICE simulation of the energy-recovery circuit.

|                 | Area       | Energy dissipation (Theory)          | Energy dissipation (Post-layout) |
|-----------------|------------|--------------------------------------|----------------------------------|
| CMOS            | $1.44mm^2$ | 9.22*pJ* (115*fJ/bit*) (Eq. 5.2)     | 12.65*pJ* (158*fJ/bit*)          |
| Energy recovery | $2.64mm^2$ | 6.15*pJ* (77*fJ/bit*)                | 7.68*pJ* (96*fJ/bit*)            |
| Difference      | +83%       | -33.3%                               | -39.3%                           |

Table 6.4: Post-layout simulated energy dissipation / Area requirements.
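The percentages quoted in this section follow directly from the values in Tables 6.3 and 6.4, as the following Python sketch shows. Only numbers stated in the tables are used; the helper name `pct_change` is hypothetical.

```python
N_BITS = 80

# Table 6.3: theoretical energy per cycle of each sub-component (pJ).
table_6_3 = {"Inductor": 1.1, "Switch M1": 0.92, "P2LCs": 0.98, "Adiabatic drivers": 3.15}
er_theory = sum(table_6_3.values())

# Table 6.4: remaining entries (pJ and mm^2).
er_post = 7.68
cmos_theory, cmos_post = 9.22, 12.65
area_cmos, area_er = 1.44, 2.64

def pct_change(new, ref):
    """Relative change of `new` with respect to `ref`, in percent."""
    return 100.0 * (new - ref) / ref

print(f"Energy-recovery theory total: {er_theory:.2f} pJ")
print(f"Energy-recovery overhead, post-layout vs theory: {pct_change(er_post, er_theory):+.1f}%")
print(f"CMOS overhead, post-layout vs theory: {pct_change(cmos_post, cmos_theory):+.1f}%")
print(f"Saving vs CMOS (theory): {pct_change(er_theory, cmos_theory):+.1f}%")
print(f"Saving vs CMOS (post-layout): {pct_change(er_post, cmos_post):+.1f}%")
print(f"Area difference: {pct_change(area_er, area_cmos):+.0f}%")
print(f"Post-layout energy per bit: {1e3 * er_post / N_BITS:.0f} fJ (ER), "
      f"{1e3 * cmos_post / N_BITS:.0f} fJ (CMOS)")
```

The CMOS overhead of roughly 37% against roughly 25% for the energy-recovery circuit is what widens the saving from 33.3% to 39.3%.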

#### 6.4 EXPERIMENTAL MEASUREMENTS

The presented energy-recovery and CMOS I/O circuits were fabricated on a 200*mm* wafer (Figure A.3), which contained 47 identical dies. Micrograph images of the energy recovery and CMOS circuits can be seen in Figures 6.18 and 6.19 respectively.

#### 6.4.1 Measurement setup

For I/O access, a manual on-wafer measurement system was used that allowed direct contact with the wafer and the I/O test pads through RF probe heads (Figure A.4). I/O signal generation and measurement was provided by laboratory instruments which were manually controlled. The measurement setup is illustrated in Figure 6.20.

The I/O test pads of the energy-recovery and CMOS circuits, and the instrument connection used for each test pad, are summarised in Tables 6.5 and 6.6.

# 6.4.2 Results

#### CMOS I/O circuit

In Figure 6.21 the input signal S1B and the output signal *OCMOS* are shown as captured by the oscilloscope. Signal S1B, which has a frequency of 500MHz, is used by the data generator (Figure 6.14) to generate a data signal switching at the same frequency that passes through the TSVs and is output on *OCMOS*. Therefore, the measured result on the oscilloscope suggests that the circuit is operating as expected.

Figure 6.18: Energy-recovery I/O circuit micrograph.

Figure 6.19: CMOS I/O circuit micrograph.

Figure 6.20: Measurement setup.

| Test pad | Direction | Instrument connection                                             |
|----------|-----------|-------------------------------------------------------------------|
| VIND     | Power     | Power supply - >0.6V                                              |
| GND      | Power     | Power supply - 0V                                                 |
| S1A      | Input     | Sine-wave generator (0° Phase, 500MHz, non-50Ω-terminated)        |
| S2       | Input     | Sine-wave generator (Variable Phase, 500MHz, non-50Ω-terminated)  |
| EN       | Input     | Digital Output (enable output data switching)                     |
| OER      | Output    | Oscilloscope                                                      |
| VDR      | Power     | Power supply (adiabatic drivers+P2LCs) - 1.2V                     |
| VBUF1    | Power     | Power supply (Buffers) - 1.2V                                     |

Table 6.5: Energy-recovery circuit instrument connections.

| Test pad | Direction | Instrument connection                                      |
|----------|-----------|------------------------------------------------------------|
| VCMOS    | Power     | Power supply (CMOS drivers) - 1.2V                         |
| GND      | Power     | Power supply - 0V                                          |
| VBUF2    | Power     | Power supply (Buffers) - 1.2V                              |
| S1B      | Input     | Sine-wave generator (0° Phase, 500MHz, non-50Ω-terminated) |
| OCMOS    | Output    | Oscilloscope                                               |
| EN       | Input     | Digital Output (enable output data switching)              |

Table 6.6: CMOS circuit instrument connections.

Figure 6.21: Oscilloscope output for signals S1B and OCMOS.

Figure 6.22: Measured/simulated power dissipation for the CMOS circuit.

The proper operation of the circuit is also verified by measuring the power and the energy dissipation per cycle on pad *VCMOS*. The power dissipation of the two dies measured increases approximately linearly with frequency, as expected from simulations (Figure 6.22), while the energy dissipation per cycle remains approximately constant (Figure 6.23). In addition, absolute power and energy dissipation values closely match post-layout simulations.
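The linear power and constant energy trends are two views of the same relation, $P = E_{cycle} \cdot f$, as the following Python sketch illustrates. It assumes a frequency-independent energy per cycle equal to the post-layout CMOS value of Table 6.4 and neglects leakage; the frequency points are arbitrary illustration values, not measurement points.

```python
E_CYCLE = 12.65e-12   # post-layout CMOS energy per cycle from Table 6.4 (J)

def power(freq_hz, energy_per_cycle=E_CYCLE):
    """Dynamic power of a circuit dissipating a fixed energy every cycle."""
    return energy_per_cycle * freq_hz

# Arbitrary illustration frequencies (Hz).
for f_mhz in (100, 200, 300, 400, 500):
    p = power(f_mhz * 1e6)
    print(f"{f_mhz:3d} MHz: P = {p * 1e3:.3f} mW, E = {p / (f_mhz * 1e6) * 1e12:.2f} pJ/cycle")
```

Dividing each measured power value by its frequency therefore yields the energy per cycle plotted in Figure 6.23.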

# Energy-recovery I/O circuit

Input signal S1A and output signal OER can be seen in Figure 6.24 as captured by the oscilloscope. The output at OER shows a data signal switching at 500*MHz* as expected. The slow rising edge of the data signal suggests higher than anticipated parasitics at the *OER* I/O pad, though the functionality of the circuit is not affected.

Figure 6.23: Measured/simulated energy dissipation for the CMOS circuit.

Figure 6.24: Oscilloscope output for signals S1A and OER.

The energy dissipation is calculated by measuring the input current on I/O test pads *VIND* and *VDR*. The results for the two dies measured can be seen in Figure 6.25. The energy dissipation shows an approximately constant relation to frequency, similar to the CMOS circuit, which does not correlate with post-layout simulations. A properly operating energy-recovery circuit would be expected to show minimum energy dissipation at the resonance frequency ( $\sim 500MHz$ ).
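The conversion from supply currents to energy per cycle can be sketched in Python as follows. The sketch assumes that the average DC current drawn from each supply pad is measured and that energy per cycle is total supply power divided by the operating frequency. The current readings below are placeholders chosen only to exercise the function; they are not measured values from this work.

```python
def energy_per_cycle(supplies, freq_hz):
    """Energy per cycle (J) from (voltage_V, average_current_A) pairs measured on the supply pads."""
    total_power = sum(v * i for v, i in supplies)
    return total_power / freq_hz

# Placeholder readings for pads VIND and VDR (hypothetical, for illustration only).
readings = [
    (0.6, 5.0e-3),   # VIND: supply voltage (V), average current (A)
    (1.2, 2.0e-3),   # VDR:  supply voltage (V), average current (A)
]

e_cycle = energy_per_cycle(readings, 500e6)
print(f"Energy per cycle: {e_cycle * 1e12:.2f} pJ")
```

Repeating the calculation at each test frequency gives the energy-versus-frequency curve against which the resonance minimum would be checked.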

Since the energy-recovery circuit shows CMOS-type energy dissipation, it was inferred that the inductor could not resonate properly, and a hypothesis was made that the cause was a larger than anticipated parasitic capacitance on the integrated inductor. To determine whether the hypothesis is valid, a 200*pF* capacitor is inserted in the circuit schematic, connected at the output of the inductor as shown in Figure 6.26. The new circuit is simulated, and as can be seen in Figure 6.27, the output signal at *OER* switches properly even though the resonant clock (*RES\_CLK*) is not oscillating. The simulated energy dissipation of the circuit with the fault hypothesis can be seen in Figure 6.28, and as can be observed it is very similar to the experimental measurements, both in terms of absolute values and frequency relation.

Figure 6.25: Measured/simulated energy dissipation for the energy-recovery circuit.

Figure 6.26: Energy-recovery I/O circuit with fault hypothesis.
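A first-order view of why such a parasitic would prevent resonance at the design frequency is given by the Python sketch below. It assumes an ideal lumped LC tank in which the 200 pF fault capacitance simply adds in parallel to the 16.5 pF total load of Table 6.1; the switching action of *M*1 and all losses are ignored, so the numbers are indicative only.

```python
import math

def resonant_frequency(inductance_h, capacitance_f):
    """Natural frequency of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))

L_IND = 6.13e-9      # integrated inductor (H)
C_TOTAL = 16.5e-12   # designed total load capacitance (F)
C_FAULT = 200e-12    # hypothesised parasitic capacitance (F)

f_nominal = resonant_frequency(L_IND, C_TOTAL)
f_fault = resonant_frequency(L_IND, C_TOTAL + C_FAULT)

print(f"Designed tank:      {f_nominal / 1e6:.0f} MHz")
print(f"With 200 pF fault:  {f_fault / 1e6:.0f} MHz")
```

In this simplified model the natural frequency falls to well below a third of the 500 MHz operating point, so the tank cannot complete a resonant swing within the clock period.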

### 6.5 SUMMARY AND CONCLUSIONS

In this chapter, the design approach of a demonstrator circuit was presented for evaluating the energy-recovery 3D interconnecting scheme. Expanding on the theoretical analysis proposed in Chapter 5, practical design issues were discussed and methods were suggested for overcoming them.

The circuit was designed and implemented on an experimental 130*nm* 3D process technology and post-layout simulation results, including parasitics, were shown and interpreted. The post-layout results revealed that the demonstrator circuit dissipated 25% more energy per cycle than the value estimated by theory; this energy overhead was attributed to the effect of the wire and transistor parasitics, which are not included in the theoretical models. Nevertheless, the energy-recovery I/O circuit dissipated 39% less energy per cycle than the equivalent CMOS circuit implemented on the same process technology.

Figure 6.27: Simulation of circuit with fault hypothesis.

Figure 6.28: Measured energy dissipation and simulation of circuit with fault hypothesis.

Experimental measurements demonstrated that the fabricated CMOS I/O circuit closely matched the energy dissipation estimated by post-layout simulations. However, the energy-recovery I/O circuit did not operate as expected and as a result it was not possible to evaluate the energy-recovery 3D interconnecting scheme on-chip. Using simulation data, the fault was attributed to higher than anticipated integrated inductor parasitics.

Planar on-chip interconnects have become a major concern in modern processes, as their impact on IC performance has been progressively increasing with each technology node. In recent years, <sub>3</sub>D interconnects based on vertical TSVs have been proposed as a possible solution to the interconnect bottleneck. TSVs have the potential to offer the best all around technology for meeting future technology trends, with their advantages including reduced parasitics, and high interconnection density.

As TSV is a relatively recent 3D interconnect technology, there are still challenges to be overcome before TSVs can be widely adopted and commercialised. Issues in the fabrication of TSV devices are in the process of being resolved, or solutions have already been found. For implementing TSV-based 3D-ICs with more complex functionalities than simple test chips, 3D design methodologies and tools are needed, which are still in their infancy. Accurate understanding of the TSV characteristics that influence circuit performance is a major step towards the development of design methodologies and tools for 3D-ICs.

Key TSV characteristics are the TSV parasitic capacitance and the thermomechanical stress distribution. The parasitic capacitance affects delay and power consumption of signals passing through the TSV interconnect, while thermomechanical stress affects MOSFET device mobility in the vicinity of TSVs. For extracting these parameters accurately and under different conditions, sophisticated test structures were presented in this work, which were fabricated and evaluated on an experimental 3D process.

Although the TSV parasitic capacitance is small, it may still become an important source of power consumption as TSVs are implemented in large, densely interconnected 3D-SoCs. Since conventional low-power design techniques based on voltage scaling represent a technical challenge in modern technology nodes, a novel TSV interconnection scheme with frequency-dependent power consumption was proposed in this work based on energy-recovery logic. The proposed scheme was analysed using theoretical modelling, while a demonstrator IC was designed and fabricated on a 130*nm* 3D process.

The contributions of this thesis as presented in the previous chapters are summarised in the following sections, as well as possible directions for future work.

#### 7.1 MAIN CONTRIBUTIONS

Of the parasitics associated with a TSV, capacitance has the most significant effect on its electrical properties, such as energy dissipation and delay, thus accurate characterization of the TSV capacitance is highly desirable when designing a 3D system. In Chapter 3, a group of test structures was presented for the electrical characterization of the TSV parasitic capacitance under various conditions. The test structures were based on the CBCM technique, which has been used in the past for measuring BEOL metal interconnect parasitic capacitances and MOSFET capacitances. The presented test structures were fabricated on a 65*nm* 3D process and the measurement results were statistically analysed in terms of TSV capacitance die-to-die (D2D) and die-to-wafer (D2W) variability. Comparison of the measurement data to simulated oxide liner thickness variation for the specific process revealed the oxide liner as a major contributor to the observed TSV capacitance variability. Measurement results also confirmed the effect of TSV pitch on the parasitic capacitance, as has been previously reported in the literature for a different process. The proposed CBCM-based TSV capacitance measurement technique was shown to be superior to conventional LCR measurements, producing accurate measurement results as well as fewer outliers (fewer defective samples) in the measurement data.

The integration of TSV 3D interconnects in CMOS technology has been shown to induce thermomechanical stress on the silicon (Si) substrate. Thermomechanical stress can have an impact on the carrier mobility of active devices implemented on the same Si substrate, thus monitoring and controlling stress is crucial for designing TSV-aware circuits with predictable performance. Simulations using finite element method (FEM) modelling predict a complex thermomechanical stress distribution, which emphasises the need for sophisticated characterization structures that can monitor stress impact with precision and verify simulation models. In Chapter 4, a test structure was presented for indirectly monitoring the TSV-induced thermomechanical stress distribution by measuring its effect on transistor saturation current. The structure implemented arrays of MOSFET devices superimposed with TSVs in various configurations, and enabled accurate measurement of individual devices by combining digital selection logic with four-terminal sensing. Imec's experimental 65nm 3D process was characterized using the presented test structure, extracting the thermomechanical stress effect on PMOS devices, which was found to be on par with simulation estimations.

In future large, densely interconnected 3D-SoCs, the TSV parasitic capacitance may become an important source of energy dissipation. Energy-recovery logic is an interesting low-power design approach, as it is not limited by voltage scaling, which represents a technical challenge in modern technology nodes. In Chapter 5, the potential of the energy-recovery technique was investigated for reducing the energy dissipation of TSV interconnects in 3D-ICs. The proposed energy-recovery scheme was analysed using theoretical modelling, generating equations for energy dissipation and optimum device sizing. The model accuracy was evaluated against SPICE simulations, which showed good correlation with the theoretical estimations on a 130*nm* technology process. Comparison of the energy-recovery scheme to conventional static CMOS demonstrated favourable energy performance for the energy-recovery scheme for low operating frequencies and high *Q* factors, switching activities, and TSV capacitances.

The theoretical models developed in Chapter 5 were used for designing an energy-recovery 3D demonstrator circuit in Chapter 6, under realistic physical and electrical constraints. The energy-recovery circuit and its CMOS equivalent were fabricated on a 130*nm* 3D process technology and post-layout simulations showed that the energy-recovery circuit dissipated 39% less energy per cycle than the CMOS circuit on the same process technology. Experimental measurements and comparison to simulations confirmed the proper operation of the fabricated CMOS circuit. However, the energy-recovery circuit did not operate as expected and as a result it was not possible to evaluate the proposed energy-recovery scheme on a chip. Using simulation data, the fault was attributed to higher than anticipated integrated inductor parasitics.

#### 7.2 FUTURE WORK

The CBCM-based test structures in Chapter 3 were shown to be at least as accurate as external instruments for characterizing TSV capacitance. Considering that CBCM can evaluate single-ended TSVs, it could prove a valuable tool for testing TSVs prior to die stacking, which is a critical step for keeping 3D-IC yield acceptable. Measuring the TSV parasitic capacitance can provide important information about the quality of the fabricated TSV device, since even slight variations in process parameters will affect its parasitic capacitance, as was demonstrated in Chapter 3. CBCM is simpler and potentially more accurate than other techniques that have been proposed in the literature for the same role, such as RC constant measurement using sense amplification [16].

In Chapter 4, the lessons learned from evaluating the experimental measurement results could potentially be used for designing improved thermomechanical stress characterization structures for next generation TSV processes:

1. The transistors should be in a regular grid in the array for simple processing of the measurement data.
2. The boundary conditions of transistors should be carefully designed as they have a considerable effect on transistor matching and thus measurement accuracy.
3. Distances of interest are generally  $< 8\mu m$, thus the arrays could be smaller and denser in the area in proximity to the TSV.
4. Mechanisms are required to detect faults in the digital selection logic when the implementation process is expected to have low-yield issues.

Furthermore, future work should measure the thermomechanical stress effect on NMOS devices, which were not evaluated in this project. Simulations predict that NMOS devices are much less affected by thermomechanical stress than PMOS devices, thus the approach used in this work might not be sufficiently accurate to extract the stress effect on NMOS mobility.

In Chapter 5, the energy-recovery scheme proposed could use a different encoding technique than dual-rail as a possible improvement. A suggested encoding is 1-out-of-4 [74], since it only charges a single output in any given cycle, which can potentially save up to 50% energy in comparison to dual-rail. Also, the demonstration of the energy-recovery scheme feasibility on a chip is still pending, since the energy-recovery 3D demonstrator discussed in Chapter 6 failed to operate properly. The fault was attributed to higher than anticipated integrated inductor parasitics, thus the same circuit should be implemented in the future with an improved inductor design for verifying the proposed theory.

# APPENDIX A: PHOTOS



Figure A.1: 3D65 300mm wafer (on chuck).



Figure A.2: Probe card.



Figure A.3: 3D130 200mm wafer (on chuck).



Figure A.4: Probe head.

# B

### MEAN $(\mu)$

In a given data set, the mean is the sum of the values divided by the number of values. The mean represents the average value in a population. If  $x_i$  is a value in the data set and n the total number of values, the mean  $\mu$  can be calculated as:

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$

### MEDIAN $(m)$

The median is the middle value of a sorted data set, separating the higher half of the values from the lower half. It is useful for skewed distributions, as it is less sensitive to extreme values than the mean. The median of a normal distribution with mean $\mu$ and variance $\sigma^2$ is the same as the mean.

### STANDARD DEVIATION $(\sigma)$

Standard deviation measures the variability in a data set, showing how much variation there is from the mean value. To calculate the standard deviation of a data set, first the square of the difference of each value from the mean is calculated; the square root of the mean of these squared differences then gives the standard deviation:

$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}$$

For a normal distribution, approximately 68.3% of the values lie within $\pm 1\sigma$ of the mean. Other distributions can have a different spread, but typically a low standard deviation indicates low variability, whereas a high standard deviation indicates increased variability.
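As a minimal sketch, the mean, median and standard deviation described so far can be computed in Python as follows. The sample values are invented for illustration, and dividing by $n$ (rather than $n-1$) follows the description in the text; the sample estimator with $n-1$ is available as `statistics.stdev`.

```python
import math
import statistics


def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


def population_std(values):
    """Square root of the mean squared difference from the mean."""
    mu = mean(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))


samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative data only
mu = mean(samples)
m = statistics.median(samples)
sigma = population_std(samples)
print(mu, m, sigma)
```

For this skewed example the median lies below the mean, which is the situation in which the median is the more representative value.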

### STANDARD DEVIATION OF THE MEAN $(\sigma_{\mu})$

The standard deviation of the sampled mean ($\sigma_{\mu}$) provides information on the precision of the calculated mean of a data set. This is especially useful in experimental measurements, where confidence that the data accurately represent the measured physical quantity increases the more often the measurement is repeated.

Assuming there is a data set *X* with *n* values $x_i$, mean $\mu$ and standard deviation $\sigma$, the standard deviation of the mean is:

$$\sigma_{\mu} = \frac{\sigma}{\sqrt{n}}$$
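As a rough illustration of how repeating a measurement tightens the estimate of the mean, the Python sketch below evaluates the usual relation $\sigma_{\mu} = \sigma/\sqrt{n}$. The readings are invented, and estimating $\sigma$ with the sample estimator ($n-1$ denominator, `statistics.stdev`) is an assumption, since the estimator is not specified here.

```python
import math
import statistics


def std_of_mean_from_sigma(sigma, n):
    """Standard deviation of the mean for n repetitions with spread sigma."""
    return sigma / math.sqrt(n)


def std_of_mean(values):
    """Estimate sigma from the data (n-1 denominator, an assumption)."""
    return std_of_mean_from_sigma(statistics.stdev(values), len(values))


readings = [10.1, 9.9, 10.0, 10.2, 9.8]  # illustrative repeated measurements
sem = std_of_mean(readings)
print(round(sem, 4))

# Quadrupling the number of repetitions halves the uncertainty of the mean.
print(std_of_mean_from_sigma(2.0, 4), std_of_mean_from_sigma(2.0, 16))
```

The second print shows the square-root dependence: four times as many repetitions are needed to halve $\sigma_{\mu}$.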

### COEFFICIENT OF VARIATION

The coefficient of variation is the ratio of the standard deviation  $\sigma$  to the mean  $\mu$  :

$$CV = \frac{\sigma}{\mu}$$

It can be more useful than the standard deviation for comparing variability between populations because, in contrast to the standard deviation, the coefficient of variation is a dimensionless number, often expressed as a percentage.
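The dimensionless property can be seen in a short Python sketch, in which the same hypothetical voltage readings are expressed once in volts and once in millivolts. The data and the function name are illustrative assumptions.

```python
import statistics


def coefficient_of_variation(values):
    """Ratio of the (population) standard deviation to the mean."""
    mu = statistics.mean(values)
    if mu == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    return statistics.pstdev(values) / mu


readings_v = [0.98, 1.00, 1.02]        # volts (illustrative)
readings_mv = [980.0, 1000.0, 1020.0]  # the same readings in millivolts
cv_v = coefficient_of_variation(readings_v)
cv_mv = coefficient_of_variation(readings_mv)
print(f"{100 * cv_v:.2f}%  {100 * cv_mv:.2f}%")
```

The standard deviation differs by a factor of 1000 between the two lists, whereas the coefficient of variation is the same for both.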

### OUTLIER

Outliers are values in a data set that are numerically distant from the rest of the data. Observing, and sometimes removing, the outliers is crucial in experimental measurement, as these values usually indicate a measurement error or other malfunction. Identifying outliers is generally subjective, as there is no single mathematical definition of what constitutes an outlier. Typically, outliers are located by inspecting the probability distribution of a data set.
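Where an automated first pass is wanted before visual inspection, one commonly used heuristic is to flag values lying more than 1.5 interquartile ranges outside the quartiles. The Python sketch below shows this heuristic only as an example; it is not the procedure used in this work, and the threshold `k`, the function name and the data are assumptions.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values more than k interquartile ranges outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]


measurements = [9.8, 9.9, 10.0, 10.0, 10.1, 10.2, 14.7]  # illustrative data
outliers = iqr_outliers(measurements)
print(outliers)
```

Any value flagged in this way would still need to be checked against the distribution and the measurement conditions before being discarded.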

### KRUSKAL-WALLIS [34]

Kruskal–Wallis one-way analysis of variance is a rank-based test that compares two or more data sets to determine whether the samples originate from separate populations. The analysis tests the null hypothesis that all data sets originate from the same population: it calculates the probability of observing differences at least as large as those measured if the null hypothesis were true, and rejects the null hypothesis if this probability is less than a chosen significance level. The Kruskal–Wallis test can be used on data sets that are not normally distributed, and their variances do not have to be equal.
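A minimal Python sketch of the test statistic is given below, assuming the standard form $H = \frac{12}{N(N+1)} \sum_j R_j^2 / n_j - 3(N+1)$, where $R_j$ is the rank sum of group $j$, $n_j$ its size and $N$ the total number of values. Tied values receive average ranks, but no tie correction is applied. The p-value helper is valid only for three groups, where the chi-square approximation with two degrees of freedom reduces to $e^{-H/2}$; this approximation is rough for small samples. The function names and data are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic over a list of groups (no tie correction)."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


def p_value_three_groups(h):
    """Chi-square survival function for 2 degrees of freedom (3 groups only)."""
    return math.exp(-h / 2)


separated = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]    # clearly shifted groups
interleaved = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]  # similar groups
h_sep = kruskal_wallis_h(separated)
h_mix = kruskal_wallis_h(interleaved)
print(h_sep, p_value_three_groups(h_sep))
print(h_mix, p_value_three_groups(h_mix))
```

With a significance level of 0.05, the null hypothesis would be rejected for the separated groups but not for the interleaved ones.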

# BIBLIOGRAPHY

- [1] Model 2602 dual-channel system sourcemeter instrument. URL http://www.keithley.com/products/dcac/currentsource/broadpurpose/?mn=2602.
- [2] Max-3d. URL http://www.micromagic.com/.
- [3] The international technology roadmap for semiconductors, 2007. URL http://public.itrs.net/.
- [4] The international technology roadmap for semiconductors, 2009. URL http://public.itrs.net/.
- [5] Jaime Aguilera, Joaquin de No, Andres Garcia-Alonso, Frank Oehler, Heiko Hein, and Josef Sauerer. A guide for on-chip inductor design in a conventional cmos process for rf applications. In *Applied Microwave & Wireless Magazine*, pages 56–65, October 2001.
- [6] M. Arsalan and M. Shams. Charge-recovery power clock generators for adiabatic logic circuits. In *Proc. 18th Int VLSI Design Conf*, pages 171–174, 2005. doi: 10.1109/ICVD.2005.64.
- [7] P. Asimakopoulos, G. Van der Plas, A. Yakovlev, and P. Marchal. Evaluation of energy-recovering interconnects for low-power 3d stacked ics. In *Proc. IEEE Int. Conf. 3D System Integration 3DIC 2009*, pages 1–5, 2009. doi: 10.1109/3DIC.2009.5306581.
- [8] W. C. Athas, L. J. Svensson, J. G. Koller, N. Tzartzanis, and E. Ying-Chin Chou. Low-power digital systems based on adiabatic-switching principles. 2(4):398–407, 1994. doi: 10.1109/92.335009.
- [9] C. H. Bennett. Logical reversibility of computation. *IBM Journal of Research and Development*, 17(6):525–532, 1973. doi: 10.1147/rd.176.0525.
- [10] Charles H Bennett. The fundamental physical limits of computation. *Scientific American*, 253(1):48–56, 1985. URL http://www.nature.com/doifinder/10.1038/scientificamerican0785-48.
- [11] E. Beyne. 3d system integration technologies. In Proc. Int VLSI Technology, Systems, and Applications Symp, pages 1–9, 2006. doi: 10.1109/VTSA.2006.251113.

- [12] Mark Bohr and Kaizad Mistry. Intel's revolutionary 22 nm transistor technology, 2011. URL http://download.intel.com/newsroom/kits/22nm/pdfs/22nm-Details\_Presentation.pdf.
- [13] Ke Cao, S. Dobre, and Jiang Hu. Standard cell characterization considering lithography induced variations. In *Proc. 43rd ACM/IEEE Design Automation Conf*, pages 801–804, 2006. doi: 10.1109/DAC.2006.229327.
- [14] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen. Low-power cmos digital design. 27(4):473–484, 1992. doi: 10.1109/4.126534.
- [15] J. C. Chen, B. W. McGaughy, D. Sylvester, and Chenming Hu. An on-chip, attofarad interconnect charge-based capacitance measurement (cbcm) technique. In *Proc. Int. Electron Devices Meeting IEDM '96*, pages 69–72, 1996. doi: 10.1109/IEDM.1996.553124.
- [16] Po-Yuan Chen, Cheng-Wen Wu, and Ding-Ming Kwai. On-chip tsv testing for 3d ic before bonding using sense amplification. In *Proc. ATS '09. Asian Test Symp*, pages 450–455, 2009. doi: 10.1109/ATS.2009.42.
- [17] Sungdong Cho, Sinwoo Kang, Kangwook Park, Jaechul Kim, Kiyoung Yun, Kisoon Bae, Woon Seob Lee, Sangwook Ji, Eunji Kim, Jangho Kim, Y. L. Park, and E. S. Jung. Impact of tsv proximity on 45nm cmos devices in wafer level. In *Proc. IEEE Int. Interconnect Technology Conf. and 2011 Materials for Advanced Metallization (IITC/MAM)*, pages 1–3, 2011. doi: 10.1109/IITC.2011.5940326.
- [18] Kyu-Myung Choi. An industrial perspective of 3d ic integration technology from the viewpoint of design technology. In *Proc. 15th Asia and South Pacific Design Automation Conf. (ASP-DAC)*, pages 544–547, 2010. doi: 10.1109/ASPDAC.2010.5419823.
- [19] Juang-Ying Chueh, C. H. Ziesler, and M. C. Papaefthymiou. Empirical evaluation of timing and power in resonant clock distribution. In *Proc. Int. Symp. Circuits and Systems ISCAS '04*, volume 2, 2004. doi: 10.1109/ISCAS.2004.1329255.
- [20] Thuy Dao, D. H. Triyoso, M. Petras, and M. Canonico. Through silicon via stress characterization. In *Proc. IEEE Int. Conf. IC Design and Technology ICICDT '09*, pages 39–41, 2009. doi: 10.1109/ICICDT.2009.5166260.
- [21] M. Grange, R. Weerasekera, D. Pamunuwa, and H. Tenhunen. Exploration of through silicon via interconnect parasitics for 3-dimensional integrated circuits. In *Workshop Notes, Design, Automation and Test in Europe (DATE)*, 2009.

- [22] N. Z. Haron and S. Hamdioui. Why is cmos scaling coming to an end? In *Proc. 3rd Int. Design and Test Workshop IDT 2008*, pages 98–103, 2008. doi: 10.1109/IDT.2008.4802475.
- [23] Imec. 200mm 3d-sic tsv line, 2009. URL http://www.imec.be/ScientificReport/SR2009/HTML/1213307.html.
- [24] J. W. Joyner, R. Venkatesan, P. Zarkesh-Ha, J. A. Davis, and J. D. Meindl. Impact of three-dimensional architectures on interconnects in gigascale integration. 9(6):922–928, 2001. doi: 10.1109/92.974905.
- [25] A. P. Karmarkar, Xiaopeng Xu, and V. Moroz. Performanace and reliability analysis of 3d-integration structures employing through silicon via (tsv). In *Proc. IEEE Int. Reliability Physics Symp*, pages 682–687, 2009. doi: 10.1109/IRPS.2009.5173329.
- [26] G. Katti, A. Mercha, M. Stucchi, Z. Tokei, D. Velenis, J. Van Olmen, C. Huyghebaert, A. Jourdain, M. Rakowski, I. Debusschere, P. Soussan, H. Oprins, W. Dehaene, K. De Meyer, Y. Travaly, E. Beyne, S. Biesemans, and B. Swinnen. Temperature dependent electrical characteristics of through-si-via (tsv) interconnections. In *Proc. Int. Interconnect Technology Conf. (IITC)*, pages 1–3, 2010. doi: 10.1109/IITC.2010.5510311.
- [27] G. Katti, M. Stucchi, K. De Meyer, and W. Dehaene. Electrical modeling and characterization of through silicon via for three-dimensional ics. 57(1):256–262, 2010. doi: 10.1109/TED.2009.2034508.
- [28] G. Katti, M. Stucchi, J. Van Olmen, K. De Meyer, and W. Dehaene. Through-silicon-via capacitance reduction technique to benefit 3-d ic performance. 31(6):549–551, 2010. doi: 10.1109/LED.2010.2046712.
- [29] Walt Kester. Sensor signal conditioning. In Jon S. Wilson, editor, Sensor Technology Handbook, pages 31–136. Newnes, Burlington, 2005. ISBN 978-0-75-067729-5. doi: 10.1016/B978-075067729-5/50044-6. URL http://www.sciencedirect.com/science/article/pii/B9780750677295500446.
- [30] Jonggab Kil, Jie Gu, and C. H. Kim. A high-speed variation-tolerant interconnect technique for sub-threshold circuits using capacitive boosting. 16(4):456–465, 2008. doi: 10.1109/TVLSI.2007.915455.

- [31] Suhwan Kim, C. H. Ziesler, and M. C. Papaefthymiou. Charge-recovery computing on silicon. 54(6):651–659, 2005. doi: 10.1109/TC.2005.91.
- [32] C. Kortekaas. On-chip quasi-static floating-gate capacitance measurement method. In Proc. Int. Conf. Microelectronic Test Structures ICMTS 1990, pages 109–113, 1990. doi: 10.1109/ICMTS.1990.67889.
- [33] A. Kramer, J. S. Denker, B. Flower, and J. Moroney. 2nd order adiabatic computation with 2n-2p and 2n-2n2p logic circuits. In *Proceedings of the 1995 international symposium on Low power design*, ISLPED '95, pages 191–196, New York, NY, USA, 1995. ACM. ISBN 0-89791-744-8. doi: 10.1145/224081.224115. URL http://doi.acm.org/10.1145/224081.224115.
- [34] William H. Kruskal and W. Allen Wallis. Use of ranks in one-criterion variance analysis. *Journal of the American Statistical Association*, 47(260):583–621, 1952. ISSN 01621459. doi: 10.2307/2280779. URL http://dx.doi.org/10.2307/2280779.
- [35] Kelin J. Kuhn. Moore's law past 32nm: Future challenges in device scaling, 2009. URL http://download.intel.com/pressroom/pdf/kkuhn/Kuhn\_IWCE\_invited\_text.pdf.
- [36] R. Landauer. Irreversibility and heat generation in the computing process. *IBM Journal of Research and Development*, 5 (3):183–191, 1961. doi: 10.1147/rd.53.0183.
- [37] Joonho Lim, Dong-Gyu Kim, and Soo-Ik Chae. A 16-bit carry-lookahead adder using reversible energy recovery logic for ultra-low-energy systems. 34(6):898–903, 1999. doi: 10.1109/4.766827.
- [38] Joonho Lim, Dong-Gyu Kim, and Soo-Ik Chae. nmos reversible energy recovery logic for ultra-low-energy applications. 35(6):865–875, 2000. doi: 10.1109/4.845190.
- [39] F. Liu, X. Gu, K. A. Jenkins, E. A. Cartier, Y. Liu, P. Song, and S. J. Koester. Electrical characterization of 3d through-silicon-vias. In *Proc. 60th Electronic Components and Technology Conf. (ECTC)*, pages 1100–1105, 2010. doi: 10.1109/ECTC.2010.5490839.
- [40] K. H. Lu, Xuefeng Zhang, Suk-Kyu Ryu, J. Im, Rui Huang, and P. S. Ho. Thermo-mechanical reliability of 3-d ics containing through silicon vias. In *Proc. 59th Electronic Components and Technology Conf. ECTC 2009*, pages 630–634, 2009. doi: 10.1109/ECTC.2009.5074079.

- [41] Nir Magen, Avinoam Kolodny, Uri Weiser, and Nachum Shamir. Interconnect-power dissipation in a microprocessor. In Proceedings of the 2004 international workshop on System level interconnect prediction, SLIP '04, pages 7–13, New York, NY, USA, 2004. ACM. ISBN 1-58113-818-0. doi: 10.1145/966747.966750. URL http://doi.acm.org/10.1145/966747.966750.
- [42] H. Mahmoodi, V. Tirumalashetty, M. Cooke, and K. Roy. Ultra low-power clocking scheme using energy recovery and clock gating. 17(1):33–44, 2009. doi: 10.1109/TVLSI.2008.2008453.
- [43] D. Maksimovic. A mos gate drive with resonant transitions. In Proc. nd Annual IEEE Power Electronics Specialists Conf. PESC '91 Record, pages 527–532, 1991. doi: 10.1109/PESC.1991.162725.
- [44] D. Maksimovic and V. G. Oklobdzija. Integrated power clock generators for low energy logic. In *Proc. th Annual IEEE Power Electronics Specialists Conf. PESC '95 Record*, volume 1, pages 61–67, 1995. doi: 10.1109/PESC.1995.474793.
- [45] D. Maksimovic, V. G. Oklobdzija, B. Nikolic, and K. W. Current. Clocked cmos adiabatic logic with integrated single-phase power-clock supply. 8(4):460–463, 2000. doi: 10.1109/92.863629.
- [46] E. J. Marinissen. Testing tsv-based three-dimensional stacked ics. In Proc. Design, Automation & Test in Europe Conf. & Exhibition (DATE), pages 1689–1694, 2010.
- [47] E. J. Marinissen and Y. Zorian. Testing 3d chips containing through-silicon vias. In *Proc. Int. Test Conf. ITC 2009*, pages 1–11, 2009. doi: 10.1109/TEST.2009.5355573.
- [48] A. Mercha, A. Redolfi, M. Stucchi, N. Minas, J. Van Olmen, S. Thangaraju, D. Velenis, S. Domae, Y. Yang, G. Katti, R. Labie, C. Okoro, M. Zhao, P. Asimakopoulos, I. De Wolf, T. Chiarella, T. Schram, E. Rohr, A. Van Ammel, A. Jourdain, W. Ruythooren, S. Armini, A. Radisic, H. Philipsen, N. Heylen, M. Kostermans, P. Jaenen, E. Sleeckx, D. Sabuncuoglu Tezcan, I. Debusschere, P. Soussan, D. Perry, G. Van der Plas, J. H. Cho, P. Marchal, Y. Travaly, E. Beyne, S. Biesemans, and B. Swinnen. Impact of thinning and through silicon via proximity on high-k / metal gate first cmos performance. In *Proc. Symp. VLSI Technology (VLSIT)*, pages 109–110, 2010. doi: 10.1109/VLSIT.2010.5556190.
- [49] A. Mercha, G. Van der Plas, V. Moroz, I. De Wolf, P. Asimakopoulos, N. Minas, S. Domae, D. Perry, M. Choi, A. Redolfi, C. Okoro, Y. Yang, J. Van Olmen, S. Thangaraju, D. S. Tezcan, P. Soussan, J. H. Cho, A. Yakovlev, P. Marchal, Y. Travaly, E. Beyne, S. Biesemans, and B. Swinnen. Comprehensive analysis of the impact of single and arrays of through silicon vias induced stress on high-k / metal gate cmos performance. In *Proc. IEEE Int. Electron Devices Meeting (IEDM)*, 2010. doi: 10.1109/IEDM.2010.5703278.

- [50] Yong Moon and Deog-Kyoon Jeong. An efficient charge recovery logic circuit. 31(4):514–522, 1996. doi: 10.1109/4.499727.
- [51] G. E. Moore. Cramming more components onto integrated circuits. *Electronics*, 38(8):114–117, April 1965. doi: 10.1109/JPROC.1998.658762. URL http://dx.doi.org/10.1109/JPROC.1998.658762.
- [52] O. S. Nakagawa, S.-Y. Oh, T. Hsu, and S. Habu. Benchmark methodology of interconnect capacitance simulation using inter-digitated capacitors. In *Proc. Int Microelectronic Test Structures ICMTS 1998 Conf*, pages 235–237, 1998. doi: 10.1109/ICMTS.1998.688103.
- [53] M. W. Newman, S. Muthukumar, M. Schuelein, T. Dambrauskas, P. A. Dunaway, J. M. Jordan, S. Kulkarni, C. D. Linde, T. A. Opheim, R. A. Stingel, W. Worwag, L. A. Topic, and J. M. Swan. Fabrication and electrical characterization of 3d vertical interconnects. In *Proc. 56th Electronic Components and Technology Conf*, 2006. doi: 10.1109/ECTC.2006.1645676.
- [54] V. G. Oklobdzija, D. Maksimovic, and Fengcheng Lin. Pass-transistor adiabatic logic using single power-clock supply. 44(10):842–846, 1997. doi: 10.1109/82.633443.
- [55] J. Van Olmen, C. Huyghebaert, J. Coenen, J. Van Aelst, E. Sleeckx, A. Van Ammel, S. Armini, G. Katti, J. Vaes, W. Dehaene, E. Beyne, and Y. Travaly. Integration challenges of copper through silicon via (tsv) metallization for 3d-stacked ic integration. *Microelectronic Engineering*, 88(5):745–748, 2011. ISSN 0167-9317. doi: 10.1016/j.mee.2010.06.026. URL http://www.sciencedirect.com/science/article/pii/S0167931710002121. The 2010 International Workshop on Materials for Advanced Metallization (MAM 2010).
- [56] J. J. Paulos and D. A. Antoniadis. Measurement of minimum-geometry mos transistor capacitances. 20(1):277–283, 1985. doi: 10.1109/JSSC.1985.1052303.

- [57] Vasilis F. Pavlidis and Eby G. Friedman. Interconnect delay minimization through interlayer via placement in 3-d ics. In *Proceedings of the 15th ACM Great Lakes symposium on VLSI*, GLSVLSI '05, pages 20–25, New York, NY, USA, 2005. ACM. ISBN 1-59593-057-4. doi: 10.1145/1057661.1057669. URL http://doi.acm.org/10.1145/1057661.1057669.
- [58] Dan Perry, Jonghoon Cho, Shinichi Domae, Panagiotis Asimakopoulos, Alex Yakovlev, Pol Marchal, Geert Van der Plas, and Nikolaos Minas. An efficient array structure to characterize the impact of through silicon vias on fet devices. In *Proc. IEEE Int Microelectronic Test Structures (ICMTS) Conf*, pages 118–122, 2011. doi: 10.1109/ICMTS.2011.5976872.
- [59] Jan M. Rabaey, Anantha Chandrakasan, and Borivoje Nikolic. Digital integrated circuits: A design perspective. Prentice Hall, 2nd edition, 2004.
- [60] K. Roy and S. Prasad. Low-power CMOS VLSI circuit design. A Wiley-Interscience publication. Wiley, 2000. ISBN 9780471114888.
- [61] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand. Leakage current mechanisms and leakage reduction techniques in deep-submicrometer cmos circuits. 91(2):305–327, 2003. doi: 10.1109/JPROC.2002.808156.
- [62] V. S. Sathe, M. C. Papaefthymiou, and C. H. Ziesler. Boost logic: a high speed energy recovery circuit family. In *Proc. IEEE Computer Society Annual Symp. VLSI*, pages 22–27, 2005. doi: 10.1109/ISVLSI.2005.22.
- [63] V. S. Sathe, J. C. Kao, and M. C. Papaefthymiou. Resonant-clock latch-based design. 43(4):864–873, 2008. doi: 10.1109/JSSC.2008.917501.
- [64] B. Sell, A. Avellan, and W. H. Krautschneider. Charge-based capacitance measurements (cbcm) on mos devices. 2(1):9–12, 2002. doi: 10.1109/TDMR.2002.1014667.
- [65] C. S. Selvanayagam, Xiaowu Zhang, R. Rajoo, and D. Pinjala. Modelling stress in silicon with tsvs and its effect on mobility. In *Proc. 11th Electronics Packaging Technology Conf. EPTC '09*, pages 612–618, 2009. doi: 10.1109/EPTC.2009.5416477.
- [66] D. Shariff, P. C. Marimuthu, K. Hsiao, L. Asoy, Chia Lai Yee, Aung Kyaw Oo, K. Buchanan, K. Crook, T. Wilby, and S. Burgess. Integration of fine-pitched through-silicon vias and integrated passive devices. In *Proc. IEEE 61st Electronic Components and Technology Conf. (ECTC)*, pages 844–848, 2011. doi: 10.1109/ECTC.2011.5898609.
- [67] S. Stoukatch, C. Winters, E. Beyne, W. De Raedt, and C. Van Hoof. 3d-sip integration for autonomous sensor nodes. In *Proc. 56th Electronic Components and Technology Conf*, 2006. doi: 10.1109/ECTC.2006.1645678.
- [68] L. J. Svensson and J. G. Koller. Driving a capacitive load without dissipating $fCV^2$. In Proc. IEEE Symp. Low Power Electronics Digest of Technical Papers, pages 100–101, 1994. doi: 10.1109/LPE.1994.573220.
- [69] B. Swinnen, W. Ruythooren, P. De Moor, L. Bogaerts, L. Carbonell, K. De Munck, B. Eyckens, S. Stoukatch, D. S. Tezcan, Z. Tokei, J. Vaes, J. Van Aelst, and E. Beyne. 3d integration by cu-cu thermo-compression bonding of extremely thinned bulk-si die containing 10 µm pitch through-si vias. In *Proc. Int. Electron Devices Meeting IEDM '06*, pages 1–4, 2006. doi: 10.1109/IEDM.2006.346786.
- [70] N. Tanaka, M. Kawashita, Y. Yoshimura, T. Uematsu, M. Fujisawa, H. Shimokawa, N. Kinoshita, T. Naito, T. Kikuchi, and T. Akazawa. Characterization of mos transistors after tsv fabrication and 3d-assembly. In *Proc. 2nd Electronics System-Integration Technology Conf. ESTC 2008*, pages 131–134, 2008. doi: 10.1109/ESTC.2008.4684338.
- [71] S. E. Thompson, Guangyu Sun, Youn Sung Choi, and T. Nishida. Uniaxial-process-induced strained-si: extending the cmos roadmap. 53(5):1010–1020, 2006. doi: 10.1109/TED.2006.872088.
- [72] N. Tzartzanis and W. C. Athas. Clock-powered cmos: a hybrid adiabatic logic style for energy-efficient computing. In Proc. 20th Anniversary Conf. Advanced Research in VLSI, pages 137–151, 1999. doi: 10.1109/ARVLSI.1999.756044.
- [73] J. Van Olmen, A. Mercha, G. Katti, C. Huyghebaert, J. Van Aelst, E. Seppala, Zhao Chao, S. Armini, J. Vaes, R. C. Teixeira, M. Van Cauwenberghe, P. Verdonck, K. Verhemeldonck, A. Jourdain, W. Ruythooren, M. de Potter de ten Broeck, A. Opdebeeck, T. Chiarella, B. Parvais, I. Debusschere, T. Y. Hoffmann, B. De Wachter, W. Dehaene, M. Stucchi, M. Rakowski, P. Soussan, R. Cartuyvels, E. Beyne, S. Biesemans, and B. Swinnen. 3d stacked ic demonstration using a through silicon via first approach. In *Proc. IEEE Int. Electron Devices Meeting IEDM 2008*, pages 1–4, 2008. doi: 10.1109/IEDM.2008.4796763.
- [74] Tom Verhoeff. Delay-insensitive codes - an overview. *Distributed Computing*, 3:1–8, 1988. ISSN 0178-2770. doi: 10.1007/BF01788562. URL http://dx.doi.org/10.1007/BF01788562.

- [75] P. Vitanov, T. Dimitrova, and I. Eisele. Direct capacitance measurements of small geometry mos transistors. *Microelectronics Journal*, 22(7-8):77–89, 1991. ISSN 0026-2692. doi: 10.1016/0026-2692(91)90016-G. URL http://www.sciencedirect.com/science/article/B6V44-4829XDJ-2N/2/f5b2c221dcf240c38cd13b7b3432f913.
- [76] B. Voss and M. Glesner. A low power sinusoidal clock. In *Proc. IEEE Int. Symp. Circuits and Systems ISCAS 2001*, volume 4, pages 108–111, 2001. doi: 10.1109/ISCAS.2001.922182.
- [77] R. Weerasekera, M. Grange, D. Pamunuwa, H. Tenhunen, and Li-Rong Zheng. Compact modelling of through-silicon vias (tsvs) in three-dimensional (3-d) integrated circuits. In *Proc. IEEE Int. Conf. 3D System Integration 3DIC 2009*, pages 1–8, 2009. doi: 10.1109/3DIC.2009.5306541.
- [78] N.H.E. Weste and D.F. Harris. CMOS VLSI design: a circuits and systems perspective. Pearson/Addison-Wesley, 2005. ISBN 9780321269775.
- [79] Steve Wozniak. Failure magazine interview (by jason zasky), July 2000. URL http://failuremag.com/index.php/site/print/steve\_wozniak\_interview/.
- [80] Hyung Suk Yang and Muhannad S. Bakir. 3d integration of cmos and mems using mechanically flexible interconnects (mfi) and through silicon vias (tsv). In *Proc. 60th Electronic Components and Technology Conf. (ECTC)*, pages 822–828, 2010. doi: 10.1109/ECTC.2010.5490716.
- [81] Yu Yang, G. Katti, R. Labie, Y. Travaly, B. Verlinden, and I. De Wolf. Electrical evaluation of 130-nm mosfets with tsv proximity in 3d-sic structure. In *Proc. Int. Interconnect Technology Conf. (IITC)*, pages 1–3, 2010. doi: 10.1109/IITC.2010.5510710.
- [82] Yibin Ye and K. Roy. Energy recovery circuits using reversible and partially reversible logic. 43(9):769–778, 1996. doi: 10.1109/81.536746.
- [83] Yibin Ye and K. Roy. Qserl: quasi-static energy recovery logic. 36(2):239–248, 2001. doi: 10.1109/4.902764.
- [84] C. C. Yeh, J. H. Lou, and J. B. Kuo. 1.5 v cmos full-swing energy efficient logic (eel) circuit suitable for low-voltage and low-power vlsi applications. *Electronics Letters*, 33(16):1375–1376, 1997. doi: 10.1049/el:19970915.
- [85] C. P. Yuan and T. N. Trick. A simple formula for the estimation of the capacitance of two-dimensional interconnects in vlsi circuits. 3(12):391–393, 1982. doi: 10.1109/EDL.1982.25610.