Waseda University Doctoral Dissertation

# Research on Low Power Technology by AC Power Supply Circuits

Yimeng ZHANG

Graduate School of Information, Production and Systems, Waseda University

July 2012

## Abstract

In recent years, mobile devices have become widely used in the consumer electronics field, and battery life has become a very important criterion for a device. The power dissipation of a circuit is therefore of greater concern than ever before; in many applications it is even more important than high-speed performance. Many directions are being researched to lower the power dissipation of circuits, including novel system structures such as clock gating, novel algorithms such as parallel calculation, and novel circuit technologies such as sub-threshold logic. This dissertation investigates charge recovery logic: it proposes novel charge recovery logic structures and discusses their application areas. Because the performance of portable devices is increasing dramatically, without a major advance in battery technology the battery life of portable devices will become very poor, or a large battery will be necessary. Several technologies, such as sub-threshold logic, power gating, and clock gating, have been proposed to keep power dissipation from increasing so dramatically. With these technologies circuits can work with very low power dissipation, but as the power is reduced, the operation frequency is also lowered.

Charge recovery logic is another technology for lowering the power dissipation of circuits. Its basic concept is to use the clock as the power supply. As the clock rises and falls, energy is charged into and discharged from the circuit; during this process the energy can be recycled, and it is dissipated only in the parasitic resistance. The principle of charge recovery logic was first studied in 1961, and it was applied to circuit design in the 1990s. Research on charge recovery logic continues today, and many novel structures, such as ECRL, ADCVSL, CPAL, CAMOS, ADCL, 2PDADCL, and QSERL, have been proposed. These published charge recovery logic families achieve low power dissipation in the low operation frequency range, but they no longer work correctly when the operation frequency is higher than 100 MHz. To apply charge recovery logic in a higher frequency range, other types such as boost logic, enhanced boost logic, and sub-threshold boost logic have been proposed. These types can work at the gigahertz level with low power dissipation, but they still have a shortcoming: they require a low-voltage DC power supply, which increases circuit complexity and places extra requirements on the peripheral circuits (e.g. multiple DC-DC converters). This dissertation proposes two novel types of charge recovery logic that can work at a high operation frequency with even lower power dissipation than previous works. Several application fields of charge recovery logic are also discussed. The dissertation contains five chapters, as follows:

Chapter 1 [Introduction] discusses the background of low power technologies and introduces several of them. The principle of charge recovery logic is explained, and several published charge recovery logic families are introduced. The structure of the *LC* oscillator that generates the power clock driving the charge recovery logic is also introduced, and its power dissipation is analyzed.
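As a brief illustration of the principle summarized above, the following first-order textbook estimate (not taken from this dissertation) compares conventional and adiabatic charging of a load capacitance. The symbols $C$ (load capacitance), $R$ (parasitic resistance of the charging path), $V_{DD}$ (voltage swing), $T$ (ramp time of the power clock), and $L$ (resonant inductance) are assumptions introduced here. The adiabatic expression holds only for a linear ramp with $T \gg RC$, and the resonance expression for an ideal, lossless *LC* tank.

$$
\begin{aligned}
E_{\text{conv}} &= \tfrac{1}{2}\, C V_{DD}^{2}, \\
E_{\text{adia}} &\approx \frac{RC}{T}\, C V_{DD}^{2}, \qquad T \gg RC, \\
f_{0} &= \frac{1}{2\pi\sqrt{LC}} .
\end{aligned}
$$

The second line follows from charging with a constant current $I = C V_{DD}/T$, which dissipates $I^{2} R T$ in the resistance. Under these assumptions the dissipation falls as the clock edges become slower, consistent with the statement that energy is lost only in the parasitic resistance.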

Chapter 2 [Pulse Boost Logic and Application on Multiplier] presents a novel structure of charge recovery logic named Pulse Boost Logic (PBL).

PBL uses a two-phase non-overlapping power clock as its power supply. A PBL gate is divided into a logic evaluation part and a boost part. The logic evaluation part consists of dual-rail complementary evaluation blocks; for each rail, the pull-up network (PUN) and the pull-down network (PDN) are both built from NMOS transistors to improve circuit performance in the high frequency range. The boost part is a latch structure that boosts the value produced by the evaluation part. Energy is first charged into the evaluation part and the logic value is evaluated; the boost part is then charged to amplify the evaluated value while the energy is discharged from the evaluation part.

In Chapter 2, the power dissipation of PBL is analyzed theoretically; according to the analytic result, PBL has lower power dissipation than previous types of charge recovery logic. To demonstrate the low power of PBL, a 4-bit multiplier is designed and fabricated in $0.18\,\mu\text{m}$ CMOS technology. A 5-bit counter is designed as a frequency divider so that the operation frequency can be detected, and a data A/D converter is designed to convert the AC-format output of PBL into a normal DC logic signal. In simulation, PBL works at up to 1.8 GHz and dissipates less energy than Enhanced Boost Logic, which has a structure similar to that of PBL. The test chip was measured at an operation frequency of up to 161 MHz, and the power dissipation of the design is $772\,\mu\text{W}$, including the 4-bit pipeline multiplier, the frequency divider, and the data A/D converter.
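As a rough cross-check of the measured figures above, the average energy per clock cycle can be estimated as power divided by frequency. The sketch below assumes that the 772 µW figure corresponds to the 161 MHz operating point, which the text implies but does not state explicitly. The function name `energy_per_cycle` is hypothetical and introduced only for illustration.

```python
def energy_per_cycle(power_w: float, freq_hz: float) -> float:
    """Average energy dissipated per clock cycle, in joules."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return power_w / freq_hz


# Figures quoted in the abstract for the PBL test chip.
# Assumption: the 772 uW measurement was taken at the 161 MHz operating point.
MEASURED_POWER_W = 772e-6
MEASURED_FREQ_HZ = 161e6

pbl_energy_j = energy_per_cycle(MEASURED_POWER_W, MEASURED_FREQ_HZ)
print(f"{pbl_energy_j * 1e12:.2f} pJ per cycle")
```

Under that assumption the result is a little under 5 pJ per cycle. The figure covers the multiplier, the frequency divider, and the data A/D converter together, so it is not directly comparable with a per-gate energy.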

Chapter 3 [Pseudo NMOS Boost Logic and Application on Large Scale Logic Circuits] presents another novel type of charge recovery logic called Pseudo NMOS Boost Logic (pNBL).

The structure of PBL has four complementary blocks in its logic evaluation part, which requires a large number of transistors; pNBL is proposed to solve this problem. pNBL uses a pseudo-NMOS structure in its evaluation part, which removes almost half of the transistors. When pNBL evaluates a logic value, the large PMOS transistors are on, and the PDN on one side of the dual-rail evaluation network pulls the evaluated node down. The boost part amplifies the evaluated voltage and outputs it to other gates. During this process, energy is charged and discharged in the same way as in PBL.

The power dissipation of pNBL is also analyzed, and the analytic result shows that pNBL has lower power dissipation than PBL and Enhanced Boost Logic (EBL). To demonstrate the low power dissipation of pNBL, a processing engine (PE) used in an LDPC decoder system is designed with pNBL and fabricated in a standard $0.18\,\mu\text{m}$ CMOS process. The simulation results show that, when the operation frequency is lower than 1.1 GHz, the PE with pNBL gates achieves lower power dissipation than the PE with conventional static CMOS gates. In the range of several hundred megahertz, where LDPC applications usually operate, the energy dissipation of the PE with pNBL gates is reduced substantially. In simulation, the proposed PE dissipates 3.5 pJ per cycle at 1.1 GHz and 1 pJ per cycle at 403 MHz; the latter is only 36% of the energy dissipated by the PE with static CMOS gates. pNBL also achieves lower energy dissipation than other charge recovery logic families. The test chip was fabricated and measured; the results show that it can work at frequencies up to 609 MHz with an energy dissipation of 2.1 pJ/cycle, including the PE module and the blip power clock generator.
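The figures above can be related through two simple identities: average power equals energy per cycle times frequency, and the static CMOS reference energy follows from the quoted 36% ratio. The sketch below only rearranges numbers stated in the abstract. The function names are hypothetical, and the derived values are estimates limited by the rounding of the quoted figures.

```python
def average_power(energy_per_cycle_j: float, freq_hz: float) -> float:
    """Average power in watts for a given energy per cycle and clock frequency."""
    return energy_per_cycle_j * freq_hz


def reference_energy(energy_j: float, ratio: float) -> float:
    """Energy of a reference design, given that energy_j is `ratio` times it."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return energy_j / ratio


# Simulated pNBL processing engine (values quoted in the abstract).
sim_power_1100mhz_w = average_power(3.5e-12, 1.1e9)
sim_power_403mhz_w = average_power(1.0e-12, 403e6)

# Static CMOS energy per cycle at 403 MHz implied by the 36% ratio.
static_cmos_energy_j = reference_energy(1.0e-12, 0.36)

# Measured test chip: 2.1 pJ/cycle at 609 MHz (PE plus clock generator).
measured_power_w = average_power(2.1e-12, 609e6)

for label, value_w in [
    ("simulated, 1.1 GHz", sim_power_1100mhz_w),
    ("simulated, 403 MHz", sim_power_403mhz_w),
    ("measured, 609 MHz", measured_power_w),
]:
    print(f"{label}: {value_w * 1e3:.3f} mW")
print(f"implied static CMOS energy: {static_cmos_energy_j * 1e12:.2f} pJ")
```

These conversions give powers in the low-milliwatt range and an implied static CMOS energy of a little under 3 pJ per cycle at 403 MHz.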

Chapter 4 [Other Applications of Charge Recovery Logic] discusses two specific applications that are well suited to charge recovery logic.

Because charge recovery logic is driven by a sinusoidal power clock, it can be applied in AC power environments such as wireless power transmission. In this chapter, an on-chip inductive coupling system is built, and the load circuits are designed with Pulse Boost Logic (PBL). Owing to the characteristics of PBL, no voltage rectifier is required, which saves power. The test chip was fabricated and measured; the results indicate that the power transmitted by the system is 22 mW, compared with 2.5 mW in previous work.

A crystal oscillator also generates a sinusoidal clock. In sensor network systems, sensor nodes use a crystal oscillator as the clock generator, and in sleep mode only the real-time counter of a sensor node is working. To drive the real-time counter and other digital circuits, however, a converter is required to convert the sinusoidal clock into a square-wave clock, and this converter consumes relatively large power. By constructing the real-time counter with charge recovery logic instead of conventional static CMOS, the power dissipation in sleep mode can be reduced dramatically. To demonstrate this proposal, 16-bit counters are designed with both Pseudo NMOS Boost Logic (pNBL) and static CMOS. The simulation results show that the real-time counter with charge recovery logic dissipates only 16% of the power dissipated by the counter with static CMOS. When the power dissipated by the clock signal converter is included, the sensor node structure with the charge recovery logic counter reduces power dissipation by 92% compared with the conventional structure.
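The two sleep-mode percentages can be related by a simple power budget. The sketch below is an illustrative model rather than the dissertation's analysis. It assumes that the conventional structure consists of a static CMOS counter plus a sine-to-square clock converter, and that the proposed structure needs only the charge recovery counter. All function and parameter names are hypothetical.

```python
def total_reduction(counter_ratio: float, converter_to_counter: float) -> float:
    """Fractional power saving of the proposed sleep-mode structure.

    counter_ratio: charge recovery counter power / static CMOS counter power.
    converter_to_counter: clock converter power / static CMOS counter power.
    Assumption: the proposed structure needs no clock converter.
    """
    conventional = 1.0 + converter_to_counter  # counter + converter, normalized
    proposed = counter_ratio                   # charge recovery counter only
    return 1.0 - proposed / conventional


def implied_converter_share(counter_ratio: float, reduction: float) -> float:
    """Converter power, relative to the static counter, implied by a total saving."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return counter_ratio / (1.0 - reduction) - 1.0


# Percentages quoted in the abstract (rounded values).
share = implied_converter_share(counter_ratio=0.16, reduction=0.92)
print(f"implied converter/counter power ratio: {share:.2f}")
print(f"check of total saving: {total_reduction(0.16, share):.2f}")
```

Under these assumptions, the rounded figures of 16% and 92% imply a converter that draws roughly as much power as the static CMOS counter itself. This is an inference from the quoted percentages, not a value reported in the source.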

Chapter 5 [Conclusion] summarizes the proposals and draws the conclusions of this dissertation.

## Acknowledgements

First of all, I would like to express my sincere gratitude to my advisor, Professor Tsutomu Yoshihara, who has been not only the supervisor of my research but also a mentor during my life at Waseda University. I will regard him as the most important benefactor in my life.

I would also like to express my appreciation to Professor Toshihiko Yoshimasu, Professor Yasuaki Inoue and Professor Satoshi Goto for their guidance on my research. I would like to thank Dr. Tsukasa Oishi and Dr. Kazutami Arimoto from Renesas Electronics for helping to broaden my horizons in the research.

I also thank Mr. Leona Okamura, Mr. Mengshu Huang, Mr. Chong Zhang, and Mr. Nan Wang for working together with me on the research and other activities. I also express my thanks to all the members of Yoshihara Lab. for their help.

I am grateful to Professor Noriyoshi Yamauchi at Waseda University for the valuable knowledge and experience I gained from him during my master's course.

I would like to thank my best friends Xiao Peng and Muchen Li for their great support during my campus life at Waseda University. I would like to express my deepest appreciation and love to my parents, who have given me selfless love and support throughout my life.

Finally, I express my gratitude to the Waseda University Ambient SoC Global COE Program of MEXT for supporting my research. I also thank Ms. Kozue Ohata and Mr. Yasuyuki Saito for their help.

# Contents

| Section | Title |
|---------|-------|
|         | Abstract |
|         | Acknowledgements |
|         | List of Tables |
|         | List of Figures |
| 1       | Introduction |
| 1.1     | Lower Power Technologies |
| 1.1.1   | Technology and Circuit Design |
| 1.1.2   | Logic and Module Design |
| 1.2     | Charge Recovery Logic |
| 1.2.1   | Principle of Charge Recovery Logic |
| 1.2.2   | ADCL and 2PDADCL |
| 1.2.3   | Boost Logic Family |
| 1.3     | Power Clock Generator |
| 1.3.1   | Architecture of LC Resonant Oscillator |
| 1.3.2   | Power Analysis of LC Resonant Oscillator |
| 2       | Pulse Boost Logic and Application on Multiplier |
| 2.1     | Introduction |
| 2.2     | Pulse Boost Logic |
| 2.2.1   | Structure of PBL |
| 2.2.2   | Operation of PBL |
| 2.2.3   | Analysis of Energy Dissipation |
| 2.3     | Design of 4-bit Multiplier |
| 2.3.1   | Elements of 4-bit Multiplier |
| 2.3.2   | Interface between PBL and Static CMOS |
| 2.3.3   | 4-bit Pipeline Multiplier |
| 2.3.4   | 5-bit Counter as Frequency Divider |
| 2.3.5   | Peripheries of Multiplier |
| 2.4     | Test Chip of Pulse Boost Logic |
| 2.4.1   | Simulation Result |
| 2.4.2   | Test Chip Measurement |
| 2.5     | Conclusion |
| 3       | Pseudo NMOS Boost Logic and Application on Large Scale Logic Circuits |
| 3.1     | Introduction |
| 3.2     | Pseud… |
| 3 | <b>Pse</b><br>3.1<br>3.2                             | udo N<br>Introd<br>Pseud<br>3.2.1                                                                                      | MOS Boost Logic and Application on Large Scale Logic Contraction         uction                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | 51<br>52<br>52                                                                          | 91 |
| 3 | <b>Pse</b><br>3.1<br>3.2                             | udo N<br>Introd<br>Pseud<br>3.2.1<br>3.2.2                                                                             | MOS Boost Logic and Application on Large Scale Logic Control         uction                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | 51<br>52<br>52<br>52<br>54                                                              | 91 |
| 3 | Pse<br>3.1<br>3.2                                    | udo N<br>Introd<br>Pseud<br>3.2.1<br>3.2.2<br>3.2.3                                                                    | MOS Boost Logic and Application on Large Scale Logic Control         uction       uction         o-NMOS Boost Logic       uction         Pseudo-NMOS Circuits       uction         Pseudo-NMOS Boost Logic       uction         Pseudo-NMOS Boost Logic       uction         Pseudo-NMOS Boost Logic       uction         Understand       uction | 51<br>52<br>52<br>54<br>55                                                              | 91 |
| 3 | Pse<br>3.1<br>3.2<br>3.3                             | udo N<br>Introd<br>Pseud<br>3.2.1<br>3.2.2<br>3.2.3<br>Desigr                                                          | MOS Boost Logic and Application on Large Scale Logic Control         uction       uction         o-NMOS Boost Logic       uction         Pseudo-NMOS Circuits       uction         Pseudo-NMOS Boost Logic       uction         In of Processing Engine       uction                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | 51<br>52<br>52<br>54<br>55<br>60                                                        | 91 |
| 3 | Pse<br>3.1<br>3.2<br>3.3                             | udo N<br>Introd<br>Pseud<br>3.2.1<br>3.2.2<br>3.2.3<br>Design<br>3.3.1                                                 | MOS Boost Logic and Application on Large Scale Logic C         uction                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | 51<br>52<br>52<br>54<br>55<br>60<br>60                                                  | 91 |
| 3 | Pse<br>3.1<br>3.2<br>3.3                             | udo N<br>Introd<br>Pseud<br>3.2.1<br>3.2.2<br>3.2.3<br>Design<br>3.3.1<br>3.3.2                                        | MOS Boost Logic and Application on Large Scale Logic C         uction                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | 51<br>52<br>52<br>54<br>55<br>60<br>60<br>61                                            | 91 |
| 3 | Pse<br>3.1<br>3.2<br>3.3<br>3.3                      | udo N<br>Introd<br>Pseud<br>3.2.1<br>3.2.2<br>3.2.3<br>Design<br>3.3.1<br>3.3.2<br>Test C                              | MOS Boost Logic and Application on Large Scale Logic C         uction                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | 51<br>52<br>52<br>54<br>55<br>60<br>60<br>61<br>65                                      | 21 |
| 3 | Pse<br>3.1<br>3.2<br>3.3<br>3.3                      | udo N<br>Introd<br>Pseud<br>3.2.1<br>3.2.2<br>3.2.3<br>Design<br>3.3.1<br>3.3.2<br>Test C<br>3.4.1                     | MOS Boost Logic and Application on Large Scale Logic C         uction         o-NMOS Boost Logic         Pseudo-NMOS Circuits         Pseudo-NMOS Boost Logic         Pseudo-NMOS Boost Logic         Energy Dissipation         in of Processing Engine         Introduction on PE         Design PE with pNBL         Chip of Proposed PE         Simulation and Evaluation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | 51<br>52<br>52<br>54<br>55<br>60<br>60<br>61<br>65<br>65                                | 21 |
| 3 | Pse<br>3.1<br>3.2<br>3.3<br>3.4                      | udo N<br>Introd<br>Pseud<br>3.2.1<br>3.2.2<br>3.2.3<br>Design<br>3.3.1<br>3.3.2<br>Test C<br>3.4.1<br>3.4.2            | MOS Boost Logic and Application on Large Scale Logic C         uction         o-NMOS Boost Logic         Pseudo-NMOS Circuits         Pseudo-NMOS Boost Logic         Pseudo-NMOS Boost Logic         Energy Dissipation         of Processing Engine         Introduction on PE         Design PE with pNBL         Chip of Proposed PE         Simulation and Evaluation         Test Chip Measurement                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | 51<br>52<br>52<br>54<br>55<br>60<br>60<br>61<br>65<br>65<br>69                          | 21 |
| 3 | Pse<br>3.1<br>3.2<br>3.3<br>3.4<br>3.5               | udo N<br>Introd<br>Pseude<br>3.2.1<br>3.2.2<br>3.2.3<br>Design<br>3.3.1<br>3.3.2<br>Test C<br>3.4.1<br>3.4.2<br>Conche | MOS Boost Logic and Application on Large Scale Logic C         uction         o-NMOS Boost Logic         Pseudo-NMOS Circuits         Pseudo-NMOS Boost Logic         Pseudo-NMOS Boost Logic         Energy Dissipation         of Processing Engine         Introduction on PE         Design PE with pNBL         Chip of Proposed PE         Simulation and Evaluation         Test Chip Measurement         usion                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | 51<br>52<br>52<br>54<br>55<br>60<br>60<br>61<br>65<br>65<br>65<br>69<br>72              | 21 |
| 3 | Pse<br>3.1<br>3.2<br>3.3<br>3.4<br>3.4<br>3.5<br>Oth | udo N<br>Introd<br>Pseud<br>3.2.1<br>3.2.2<br>3.2.3<br>Design<br>3.3.1<br>3.3.2<br>Test C<br>3.4.1<br>3.4.2<br>Conclu  | MOS Boost Logic and Application on Large Scale Logic Control         uction         o-NMOS Boost Logic         Pseudo-NMOS Circuits         Pseudo-NMOS Boost Logic         Pseudo-NMOS Boost Logic         Energy Dissipation         in of Processing Engine         Introduction on PE         Design PE with pNBL         Chip of Proposed PE         Simulation and Evaluation         Usion         usion                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | 51<br>52<br>52<br>54<br>55<br>60<br>60<br>61<br>65<br>65<br>65<br>69<br>72<br><b>73</b> | 21 |

|                | 4.1.1   | Introduction                                                                                    | 73  |
|----------------|---------|-------------------------------------------------------------------------------------------------|-----|
|                | 4.1.2   | Wireless Power Transmission System                                                              | 75  |
|                | 4.1.3   | Test Chip and Experiment Results                                                                | 79  |
| 4.2            |         | Real Time Counter in Sensor Network System                                                      | 84  |
|                | 4.2.1   | Introduction                                                                                    | 84  |
|                | 4.2.2   | Power Clock Generator                                                                           | 85  |
|                | 4.2.3   | Real Time Counter                                                                               | 88  |
|                | 4.2.4   | Simulation and Test Chip                                                                        | 91  |
| 4.3            |         | Conclusion                                                                                      | 92  |
| 5              |         | Conclusion                                                                                      | 94  |
| Bibliography   |         |                                                                                                 | 97  |
| Publications   |         |                                                                                                 | 103 |

## List of Tables

| 2.1 | Comparison of 4-bit multiplier with charge recovery logics                                     | 44 |
|-----|------------------------------------------------------------------------------------------------|----|
| 2.2 | Truth table of T flip-flop                                                                     | 44 |
| 2.3 | Performance summary of PBL test chip                                                           | 49 |
| 3.1 | Performance Comparison Table                                                                   | 69 |
| 3.2 | Performance summary of pNBL gate PE                                                            | 71 |
| 4.1 | Comparison with previous work                                                                  | 84 |
| 4.2 | Power dissipation comparison                                                                   | 91 |

## List of Figures

| 1.1  | Consumer Portable Power Consumption Trends [1]                              | 2  |
|------|-----------------------------------------------------------------------------|----|
| 1.2  | Gate length and supply voltage trends [1]                                   | 3  |
| 1.3  | Concept of clock gating                                                     | 5  |
| 1.4  | Concept of power gating                                                     | 6  |
| 1.5  | Concept of power dissipation in static CMOS                                 | 8  |
| 1.6  | Concept of power dissipation in charge recovery logic                       | 8  |
| 1.7  | Equivalent circuit of charge recovery logic                                 | 8  |
| 1.8  | Structure of ADCL                                                           | 11 |
| 1.9  | Simulation waveform of ADCL-NOT                                             | 12 |
| 1.10 | Structure of 2PDADCL                                                        | 13 |
| 1.11 | Simulation waveform of 2PDADCL-NOT                                          | 14 |
| 1.12 | Structure of BL                                                             | 16 |
| 1.13 | Simulation waveform of BL-NOT                                               | 16 |
| 1.14 | Structure of SBL                                                            | 17 |
| 1.15 | Simulation waveform of SBL-NOT                                              | 18 |
| 1.16 | Structure of EBL                                                            | 19 |
| 1.17 | Simulation waveform of EBL-NOT                                              | 20 |
| 1.18 | Architecture of $LC$ Resonant Oscillator and its simplified model           | 22 |
| 1.19 | An $LC$ Resonant Oscillator structure with clock injection locking          | 24 |
| 2.1  | Structure and operation of PBL                                              | 32 |
| 2.2  | PBL inverter cascade connection                                             | 35 |
| 2.3  | Elements of multiplier with PBL structure                                   | 39 |
| 2.4  | Structure of data A/D converter and simulation result                       | 40 |
| 2.5  | 4-bit pipeline multiplier with PBL gates                                    | 42 |
| 2.6  | Clock generator with blip [2] structure                                     | 43 |
| 2.7  | Test chip module and simulation result                                      | 43 |
| 2.8  | 5-bit counter with PBL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                                                        | 44 |
| 2.9  | Microphotograph of test chip                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
                                                        | 47 |
| 2.10 | Measured power dissipation and energy per cycle vs. frequency                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
                                                        | 48 |
| 3.1  | Structure of pseudo-NMOS [3]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
                                                        | 53 |
| 3.2  | Proposed Structure of pNBL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                                                        | 54 |
| 3.3  | pNBL buffer chain                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                        | 56 |
| 3.4  | Block diagram of Processing Engine                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
                                                        | 60 |
| 3.5  | 5-bit comparator with pNBL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                                                        | 62 |
| 3.6  | Interface circuits between pNBL and static CMOS                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
                                                        | 63 |
| 3.7  | Blip power clock generator                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                                                        | 64 |
| 3.8  | Block diagram of PE system                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                                                        | 65 |
| 3.9  | Energy dissipation of pNBL PE in simulation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                        | 66 |
| 3.10 | Energy dissipation comparison between static CMOS and pNBL $\ . \ .$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                        | 68 |
| 3.11 | Microphotograph of test chip                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
                                                        | 70 |
| 3.12 | Energy dissipation comparison between measurement and simulation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
                                                        | 70 |
| 4.1  | Concepts of chip interconnection | 74 |
| 4.2  | Structure of conventional power transmission system | 75 |
| 4.3  | Diagram of symmetric spiral inductor | 76 |
| 4.4  | Structure of H-bridge | 77 |
| 4.5  | Signal transceiver from main-chip to sub-chip | 78 |
| 4.6  | Signal transceiver from sub-chip to main-chip | 78 |
| 4.7  | Transmitted power vs. coupling factor $k$ | 79 |
| 4.8  | Transmitted power vs. operation frequency | 80 |
| 4.9  | Microphotograph of test chip | 81 |
| 4.10 | Measured power vs. distance | 82 |
| 4.11 | Measured power vs. operation frequency | 83 |
| 4.12 | Symbol and equivalent model of crystal | 86 |
| 4.13 | Structure of crystal oscillator | 87 |
| 4.14 | Structure of 4-bit counter with pNBL | 88 |
| 4.15 | Block diagram of 16-bit counter | 89 |
| 4.16 | Sensor node with conventional structure | 90 |
| 4.17 | Sensor node with novel structure | 90 |

## Chapter 1

## Introduction

Over the past 10 years, people have focused on ecology more than ever before. In the electronics field, low power dissipation has become a key criterion for products. In this chapter, several low-power technologies are reviewed, and charge recovery logic is discussed.

### 1.1 Lower Power Technologies

The motivation for developing low-power technologies is not only ecological. As hand-held and embedded computing devices have become indispensable, their performance has grown steadily. However, as the computing capability of these devices increases, their power dissipation increases dramatically, which limits battery life. Fig 1.1 shows the trend of power dissipation of SoCs in mobile applications: as the technology improves, power dissipation also increases dramatically. The power requirement of portable devices, however, does not increase, because it is limited by battery capacity. To extend battery life while maintaining device performance, many methodologies have been developed. These methodologies fall into two main categories: 1) technology and circuit design; 2) logic and module design.



Figure 1.1: Consumer Portable Power Consumption Trends [1]

#### 1.1.1 Technology and Circuit Design

The dynamic power dissipation of CMOS circuits is given by

$$P_{dyn} = \frac{1}{2} V_{dd}^2 C_L f \tag{1.1}$$

where $V_{dd}$ is the supply voltage, $C_L$ is the load capacitance of the circuit, which is closely related to the gate capacitance of the transistors, and $f$ is the operation frequency.
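To make the quadratic dependence on supply voltage in Equation 1.1 concrete, the following minimal sketch evaluates the formula at two operating points. The function name and the numerical values (1.2V, 10pF, 100MHz) are hypothetical and chosen only for illustration; they are not taken from this dissertation.

```python
def dynamic_power(v_dd: float, c_load: float, freq: float) -> float:
    """Dynamic power per Equation 1.1: P_dyn = 1/2 * V_dd^2 * C_L * f (watts)."""
    return 0.5 * v_dd ** 2 * c_load * freq


# Hypothetical operating point (illustrative only): 1.2 V, 10 pF, 100 MHz.
p_nominal = dynamic_power(1.2, 10e-12, 100e6)
# Same load and frequency with the supply voltage halved.
p_scaled = dynamic_power(0.6, 10e-12, 100e6)

print(f"P_dyn at 1.2 V: {p_nominal * 1e6:.1f} uW")
print(f"P_dyn at 0.6 V: {p_scaled * 1e6:.1f} uW")
print(f"ratio: {p_nominal / p_scaled:.1f}")
```

Halving the supply voltage at fixed load and frequency reduces the dynamic power to one quarter, which is why supply scaling is so effective for dynamic power.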

As the process scales down, the supply voltage and the capacitance of transistors decrease. Fig 1.2 shows the trend of gate length and supply voltage over the next two decades. As a result, the dynamic power dissipation of CMOS circuits decreases as the process scales down. On the other hand, as the process scales down, the leakage current increases, which increases the static power of CMOS circuits. Compared with dynamic power, static power is much smaller, so overall, scaling down helps to reduce power dissipation.



Figure 1.2: Gate length and supply voltage trends [1]

However, as the process scales down, integration density increases greatly, which means that many more transistors can be integrated in the same area. As shown in Fig 1.1 and Fig 1.2, although the supply voltage decreases by 40% by 2024, power dissipation increases more than 7 times. Therefore, improving process technology alone is not enough to decrease the power dissipation of SoCs.

How to reduce power dissipation within the same process technology has therefore become a major topic. Technologies such as sub-threshold logic and charge recovery logic have been researched, and many results have been published. This dissertation mainly focuses on charge recovery logic.

#### Sub-threshold Logic

Sub-threshold logic has been researched for many years, but interest in digital sub-threshold circuits was revived only in the late 1990s. Many publications report research results on sub-threshold logic, including basic logic gates [4], a multiplier with a 0.475V power supply [5], and a ring oscillator with an 80mV power supply [6]. In these publications, sub-threshold logic achieves excellent power performance without extra technology requirements; the disadvantage, however, is that with a low supply voltage, the circuits are much slower than conventional circuits.

#### Charge Recovery Logic

Unlike conventional logic and sub-threshold circuits, charge recovery logic uses the clock as its power supply. As the clock rises and falls, energy is charged into and discharged from the circuits; during this procedure, energy can be recycled and is dissipated only in the parasitic resistance. The details of charge recovery logic are explained in Section 1.2.

#### 1.1.2 Logic and Module Design

As predicted by Moore's Law, chips integrate more and more transistors, while the function of a single chip becomes more and more complex. Compared with analog circuits, digital circuits are in charge of most logical functions and consume more power. To reduce the power dissipation of digital circuits, many design methodologies have been proposed and applied. The main methodologies are a) clock gating, b) power gating, and c) lowering the operation frequency.

#### **Clock Gating Technology**

In digital circuits, most functions are realized by Finite State Machines (FSMs), which are sequential circuits. A major source of power consumption in sequential digital designs is the clock tree, which may consume up to 45% of the system power. Therefore, reducing clock tree power can lead to a considerable reduction in overall power consumption, and clock gating strategies target this component of dynamic power.

Clock gating was first proposed by L. Benini et al. in 1994 [7], and is now widely used in low-power digital design. The main idea of clock gating is to divide the FSM into several sub-FSMs and to gate the clock so that only one sub-FSM's clock is driven while the others are sleeping. Fig 1.3 illustrates the concept of clock gating: the FSM is divided into 2 sub-FSMs, and the global clock is gated before being fed to them; the gating is controlled by the Clock-Ctrl-Block, which generates the control signals from the state of each sub-FSM.

Figure 1.3: Concept of clock gating
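As a toy illustration of the idea in Fig 1.3, the following sketch counts how many clock events reach the sub-FSMs with and without gating. The function name, the activity schedule, and the assumption that exactly one sub-FSM is active in each cycle are illustrative and are not taken from [7]; the count also ignores the overhead of the Clock-Ctrl-Block itself.

```python
def count_clock_events(active_schedule, num_sub_fsms, gated):
    """Count clock events delivered to all sub-FSMs over a schedule.

    active_schedule[i] is the index of the sub-FSM that is active in cycle i.
    Without gating, every sub-FSM is clocked in every cycle; with gating,
    only the active sub-FSM receives the clock.
    """
    delivered = 0
    for active in active_schedule:
        for fsm in range(num_sub_fsms):
            if not gated or fsm == active:
                delivered += 1
    return delivered


# Hypothetical schedule for 2 sub-FSMs over 8 cycles.
schedule = [0, 0, 0, 1, 1, 0, 0, 0]
ungated = count_clock_events(schedule, 2, gated=False)
gated = count_clock_events(schedule, 2, gated=True)
print(f"clock events without gating: {ungated}")
print(f"clock events with gating:    {gated}")
```

In this idealized count, the clock activity scales with the number of active sub-FSMs rather than with the total number of sub-FSMs.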

Many publications report the power performance of clock gating technology. Cao Cao and Bengt Oelmann show an improvement of up to 67.0% in power performance [8]. By combining clock gating with energy recovery logic, Hamid Mahmoodi et al. achieved a 99% improvement in power performance in sleep mode [9].

#### **Power Gating**

Power gating is a methodology similar to clock gating. The difference is that clock gating withholds the clock from inactive FSMs, whereas power gating turns off the power supply of inactive logic blocks. Compared with clock gating, power gating can reduce both dynamic power and static power.

Fig. 1.4 illustrates the concept of power gating: a sleep transistor is added between the actual Vdd rail and the component's Vdd; by turning off this sleep transistor, the power of the inactive block can be turned off. Power gating is now widely applied due to its effectiveness. Many publications show that low power dissipation can be achieved with power gating: A. Sathanur et al. showed that 34.9% of power can be saved on average [10]; Kaijian Shi et al. analyzed the challenges of the power gating methodology and presented optimization methods for it [11]; Sin-Yu Chen et al. applied power gating to the design of standard cells so that it can be used in automated circuit design [12], achieving a 52% power reduction with 8% area and 17% delay overhead.

Figure 1.4: Concept of power gating

### 1.2 Charge Recovery Logic

Charge recovery logic is also called energy recovery logic or adiabatic logic. The principle of charge recovery was first researched in 1961 [13], and based on this principle, charge recovery logic was applied to circuit design in the 1990s [14, 15, 16]. Research on charge recovery logic is still ongoing, and many novel structures, such as ECRL, ADCVSL, CPAL, CAMOS, ADCL, 2PDADCL, and QSERL [17, 18, 19, 20, 21, 22, 23], have been proposed. In this section, the principle of charge recovery logic is first introduced, and then several structures of charge recovery logic are studied and analyzed.

#### 1.2.1 Principle of Charge Recovery Logic

The charge recovery technique uses inductors and intermediate power supplies to provide a linear decrease in energy consumption with switching frequency. When the clock is sufficiently slow, charge recovery circuits approach zero energy consumption. Fig. 1.5 shows how power is dissipated in conventional static CMOS. The pull-up network (PUN) and pull-down network (PDN) can be simplified to two switches: when the output is switched from 0 to 1, energy is charged from the power source into the load capacitance, and when the output is switched from 1 to 0, this energy is discharged to ground through the PDN. The energy is therefore dissipated during this procedure. Fig. 1.6 shows how power is saved in charge recovery logic: when the output is switched from 0 to 1, the power clock charges the load capacitance, which holds the value, and when the output is switched from 1 to 0, the energy stored in the load capacitance is returned to the power source as the power clock falls. This is possible because the power source for charge recovery logic is an LC tank, which converts the electrical energy into magnetic energy and reuses it in the next cycle. Under ideal conditions, all the energy can be recycled without being dissipated as heat, which is why charge recovery logic is also called adiabatic logic.

Figure 1.5: Concept of power dissipation in static CMOS

Figure 1.6: Concept of power dissipation in charge recovery logic

Figure 1.7: Equivalent circuit of charge recovery logic

To analyze the power dissipation of charge recovery logic, an equivalent model of a charge recovery circuit is given in Fig. 1.7. The circuit is modeled as a resistor $R_{on}$ in series with a load capacitor $C$, and the power clock generator, which generates a sinusoidal clock, is modeled as the power source $V_{dd}/2$ and the inductor $L$. The inductor acts as an electrical "flywheel" that forces the shuttling of energy between the capacitor and the $V_{dd}/2$ supply. The RLC circuit is described by

$$\frac{d^2 V_C}{dt^2} + 2\alpha \frac{dV_C}{dt} + \omega_0^2 V_C = 0 \tag{1.2}$$

where $V_C$ is the capacitor voltage, $\alpha = R/2L$ with $R$ the series resistance $R_{on}$, and $\omega_0 = 1/\sqrt{LC}$. For the solution to oscillate, the circuit must satisfy $2\sqrt{LC} > RC$. The frequency of oscillation $\omega_d$ is given by

$$\omega_d = \sqrt{\omega_0^2 - \alpha^2} \tag{1.3}$$

where $\omega_d$ is the frequency of operation and is equal to $\pi/T$, that is, $T$ is the duration of a single swing (half an oscillation period). For $V_C(0) = V_{dd}/2$ and $i_C(0) = 0$, the capacitor voltage can be derived as

$$V_C(t) = \frac{1}{2} V_{dd} e^{-\alpha t} \left(\cos\omega_d t + \frac{\alpha}{\omega_d} \sin\omega_d t\right) \tag{1.4}$$

and

$$i_C(t) = \frac{1}{2} V_{dd} C \frac{\omega_0^2}{\omega_d} e^{-\alpha t} \sin\omega_d t \tag{1.5}$$
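The behaviour of Equations 1.3 and 1.4 can be checked numerically. The sketch below computes $\alpha$, $\omega_0$, and $\omega_d$ for a series RLC model and evaluates the capacitor voltage at the start and at the end of one swing. The function names and the component values (10 ohm, 10nH, 1pF, 1V) are hypothetical and chosen only for illustration; $V_C$ is interpreted here as the deviation from the $V_{dd}/2$ supply, which is an assumption based on the initial condition $V_C(0) = V_{dd}/2$.

```python
import math


def damped_params(r, l, c):
    """Return (alpha, omega_0, omega_d) for the series RLC model of Fig. 1.7."""
    alpha = r / (2 * l)
    omega_0 = 1 / math.sqrt(l * c)
    # Oscillation condition stated in the text: 2*sqrt(LC) > RC.
    if 2 * math.sqrt(l * c) <= r * c:
        raise ValueError("overdamped: the circuit does not oscillate")
    omega_d = math.sqrt(omega_0 ** 2 - alpha ** 2)  # Equation 1.3
    return alpha, omega_0, omega_d


def capacitor_voltage(t, v_dd, alpha, omega_d):
    """Capacitor voltage per Equation 1.4."""
    return 0.5 * v_dd * math.exp(-alpha * t) * (
        math.cos(omega_d * t) + alpha / omega_d * math.sin(omega_d * t)
    )


# Hypothetical component values, chosen only for illustration.
R, L, C, V_DD = 10.0, 10e-9, 1e-12, 1.0
alpha, omega_0, omega_d = damped_params(R, L, C)
t_swing = math.pi / omega_d  # duration of one swing (half an oscillation period)
v_start = capacitor_voltage(0.0, V_DD, alpha, omega_d)
v_end = capacitor_voltage(t_swing, V_DD, alpha, omega_d)

print(f"oscillation frequency: {omega_d / (2 * math.pi) / 1e9:.3f} GHz")
print(f"V_C at start of swing: {v_start:.3f} V")
print(f"V_C at end of swing:   {v_end:.3f} V")
```

The magnitude of the voltage at the end of the swing is smaller than at the start by the factor $e^{-\pi\alpha/\omega_d}$, which is the shortfall that the energy analysis accounts for.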

The energy dissipated in a single rail-to-rail swing of the gate consists of two components. The first is the energy $E_s$ dissipated in $R$ during the swing, which results in a final swing amplitude that is lower than $V_{dd}/2$. The swing ends at $t = \pi/\omega_d$, when the amplitude in Equation 1.4 has decayed from $V_{dd}/2$ to $\frac{1}{2} V_{dd} e^{-\pi\alpha/\omega_d}$. Taking the difference of the stored energy, $E_s$ is obtained as

$$E_s = \frac{1}{8} V_{dd}^2 C \left(1 - e^{\frac{-2\pi\alpha}{\omega_d}}\right) \tag{1.6}$$

The second is the energy lost in $R$ while charging $C$ to the rail to compensate for the lower peak voltage due to $E_s$. This can be done by simply connecting the line to the rail, which loses some energy $E_C$ in $R$ during this process:

$$E_C = \frac{1}{8} V_{dd}^2 C \left(1 - e^{\frac{-\pi\alpha}{\omega_d}}\right)^2 \tag{1.7}$$

Adding these two terms and multiplying by two to allow for both directions of swing, the energy loss per cycle is

$$E_{loss} = \frac{1}{2} V_{dd}^2 C \left(1 - e^{\frac{-\pi\alpha}{\omega_d}}\right) \tag{1.8}$$

Comparing Equation 1.8 with the energy dissipated by conventional CMOS in one cycle, which is $V_{dd}^2 C$, the energy saving factor $F_{saving}$ is obtained as

$$F_{saving} = \frac{V_{dd}^2 C}{E_{loss}} = \frac{2}{1 - e^{\frac{-\pi\alpha}{\omega_d}}} \tag{1.9}$$

which is always larger than 2 and becomes much larger as the damping ratio $\alpha/\omega_d$ decreases. This means that the energy dissipation of charge recovery logic is less than that of conventional static CMOS.
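To give a feeling for the magnitude of $F_{saving}$ in Equation 1.9, the following sketch evaluates it for several values of the series resistance. The function name and the component values (10nH, 1pF, and resistances of 1, 10, and 50 ohm) are hypothetical and serve only as an illustration of the trend.

```python
import math


def saving_factor(r, l, c):
    """Energy saving factor F_saving per Equation 1.9 for a series RLC model."""
    alpha = r / (2 * l)
    omega_0_sq = 1 / (l * c)
    if alpha ** 2 >= omega_0_sq:
        raise ValueError("overdamped: Equation 1.9 assumes an oscillating circuit")
    omega_d = math.sqrt(omega_0_sq - alpha ** 2)
    return 2 / (1 - math.exp(-math.pi * alpha / omega_d))


# Hypothetical inductance and capacitance; only the resistance is swept.
L_TANK, C_LOAD = 10e-9, 1e-12
for r in (1.0, 10.0, 50.0):
    print(f"R = {r:5.1f} ohm -> F_saving = {saving_factor(r, L_TANK, C_LOAD):.1f}")
```

The saving factor stays above 2 in every case and grows as the parasitic resistance shrinks, which is consistent with energy being dissipated only in the parasitic resistance.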

#### 1.2.2 ADCL and 2PDADCL

ADCL (Adiabatic Dynamic CMOS Logic) [21] and 2PDADCL (2-phase Drive Adiabatic Dynamic CMOS Logic) [22] were proposed by the research group of Gifu University.

The structure of ADCL is shown in Fig 1.8 for a NOT gate. In the figure, $V_{in}$ is the input signal, $V_{out}$ is the output signal, $D_1$ and $D_2$ are two diode-connected transistors, and $C$ is the load capacitance. The power clock clk drives the circuit and synchronizes the output signal. The operation of ADCL is as follows:

When the input signal $V_{in}$ is low, the load capacitance $C$ is charged to the amplitude of the power clock clk, which makes the output signal $V_{out}$ high. When $V_{in}$ changes from low to high, the PMOS and $D_1$ shut off while the NMOS and $D_2$ turn on, and $C$ is discharged to low voltage. When $V_{in}$ changes to low once again, the PMOS and $D_1$ turn on and $C$ is charged to high voltage again. In this way, $V_{out}$ inverts as $V_{in}$ inverts.

Figure 1.8: Structure of ADCL

Fig 1.9 shows the simulation of the ADCL-NOT gate. According to the simulation, there is a difference between the peak voltages of $V_{out}$ and $V_{in}$: the high voltage of $V_{in}$ and the amplitude of clk are set to 5V, and the low voltage of $V_{in}$ is set to 0V, whereas the simulated high voltage of $V_{out}$ is 4.3V and its low voltage is 0.7V. This is because there is a voltage drop across the two diode-connected transistors $D_1$ and $D_2$. Therefore, if the difference between clk and the voltage across $C$ is large, adiabatic operation will not be established and a large amount of power will be dissipated. Also, according to the simulation, $V_{out}$ is delayed by 0.5 period of the power clock clk per gate in the ADCL circuit, which means that an ADCL-NOT gate has a propagation delay of 0.5 period. The propagation delay of other ADCL gates may be larger; for example, EXOR and EXNOR each have a delay of 1 period.

Figure 1.9: Simulation waveform of ADCL-NOT

2PDADCL has a structure similar to ADCL; the differences are that two-phase power clocks are required to drive the circuits, and that the load capacitance of 2PDADCL is smaller than that of ADCL. Fig 1.10 shows the structure of the 2PDADCL-NOT gate. The PMOS and NMOS transistors are arranged as in static CMOS, while two diode-connected transistors $D_1$ and $D_2$ are connected to the 2-phase power clocks. With these two diodes, the sinusoidal signals clk and $\overline{\text{clk}}$ can be rectified to DC power and drive the circuits. Energy is charged into the circuits through $D_1$, discharged back to the power source through $D_2$, and recycled in the power source. Fig 1.11 shows the simulation of the 2PDADCL-NOT gate. When clk is rising and $\overline{\text{clk}}$ is falling, there is a conducting path in either the PMOS devices or the NMOS devices. The output node may switch from low to high, switch from high to low, or remain unchanged, which resembles a CMOS circuit. Thus, there is no need to restore the node voltage to 0 (or $V_{DD}$) every cycle. When clk is falling and $\overline{\text{clk}}$ is rising, the output node holds its value even though the two power clocks are changing. Circuit nodes therefore do not necessarily charge and discharge every clock cycle, which reduces the node switching activity substantially.

Figure 1.10: Structure of 2PDADCL

The energy dissipation of ADCL and 2PDADCL is described by Equation 1.10 and Equation 1.11, respectively.

$$E_{ADCL} \approx 2C_L (V_{dd} - 2V_D) V_D \tag{1.10}$$

$$E_{2PDADCL} \approx 2C_L (V_{dd} - 2V_D) V_D \tag{1.11}$$



Figure 1.11: Simulation waveform of 2PDADCL-NOT

where $C_L$ is the load capacitance, $V_D$ is the threshold voltage of the diode, and $V_{dd}$ is the peak value of the power clock clk. The two expressions have the same form; however, ADCL requires a large $C_L$ to keep the output voltage stable, while 2PDADCL does not. The load capacitance of a 2PDADCL gate is just the gate capacitance $C_{gs}$ of the next-stage 2PDADCL gate, so Equation 1.11 can be rewritten as

$$E_{2PDADCL} \approx 2C_{gs}(V_{dd} - 2V_D)V_D \tag{1.12}$$

Comparing Equation 1.10 with Equation 1.12, it can be concluded that 2PDADCL saves more energy than ADCL because of its smaller load capacitance.
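The comparison between Equation 1.10 and Equation 1.12 can be sketched numerically as follows. The 5V clock amplitude and the 0.7V diode drop mirror the ADCL simulation described earlier in this section, while the two capacitance values (1pF for the ADCL load and 10fF for the next-stage gate capacitance) and the function name are purely hypothetical.

```python
def diode_logic_energy(c_load, v_dd, v_d):
    """Energy per Equations 1.10 to 1.12: E ~ 2 * C * (V_dd - 2*V_D) * V_D."""
    return 2 * c_load * (v_dd - 2 * v_d) * v_d


V_DD = 5.0      # peak value of the power clock (as in the ADCL simulation)
V_D = 0.7       # diode threshold voltage (as in the ADCL simulation)
C_L = 1e-12     # hypothetical explicit load capacitance of ADCL
C_GS = 10e-15   # hypothetical gate capacitance of the next 2PDADCL stage

e_adcl = diode_logic_energy(C_L, V_DD, V_D)       # Equation 1.10
e_2pdadcl = diode_logic_energy(C_GS, V_DD, V_D)   # Equation 1.12

print(f"E_ADCL    = {e_adcl * 1e12:.2f} pJ")
print(f"E_2PDADCL = {e_2pdadcl * 1e15:.2f} fJ")
print(f"ratio     = {e_adcl / e_2pdadcl:.0f}")
```

Because both expressions are linear in the capacitance, the energy ratio equals the capacitance ratio $C_L / C_{gs}$ under these assumptions.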

#### 1.2.3 Boost Logic Family

The boost logic family includes boost logic [24], enhanced boost logic (EBL) [25], and sub-threshold boost logic (SBL) [26]. These structures were proposed by the research group at the University of Michigan.

In previous research on charge recovery logic, the operation speed of circuits is limited by the characteristics of the circuit structures. For example, ADCL and 2PDADCL use a diode structure, which limits their operation frequency to 1MHz. As required operation frequencies increase, a level of several megahertz is not enough. To solve this problem, boost logic was proposed. With a hybrid structure of conventional switching and charge recovery stages, boost logic achieves significant energy savings over voltage-scaled CMOS and a much higher operation frequency than other charge recovery logic.

The structure of the BL NOT gate is illustrated in Fig 1.12. Boost Logic [24] is a two-phase, dual-rail, partially energy-recovering n-n logic. The operation of a Boost gate can be divided into logical evaluation ("Logic") and boost conversion ("Boost"). The Logic stage has differential outputs out and $\overline{\text{out}}$. In the logical evaluation part, two DC supplies $V'_{dd}$ and $V'_{ss}$ are used. Their values are

$$V'_{dd} = (V_{dd} + V_{th})/2 \tag{1.13}$$

$$V_{ss}' = (V_{dd} - V_{th})/2 \tag{1.14}$$

where  $V_{dd}$  is the amplitude of power clock clk and  $V_{th}$  is the threshold voltage.
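As a quick numerical illustration of Equations 1.13 and 1.14, the sketch below computes the two DC levels and shows that they are separated by exactly one threshold voltage. The values of `v_dd` and `v_th` are illustrative assumptions, not figures taken from the BL design, and the function name is hypothetical.

```python
def boost_logic_supplies(v_dd: float, v_th: float) -> tuple[float, float]:
    """Return (V'_dd, V'_ss) following Equations 1.13 and 1.14."""
    v_dd_prime = (v_dd + v_th) / 2
    v_ss_prime = (v_dd - v_th) / 2
    return v_dd_prime, v_ss_prime


# Assumed example values (not from the text): 1.8 V clock amplitude, 0.45 V threshold.
v_dd, v_th = 1.8, 0.45
v_dd_p, v_ss_p = boost_logic_supplies(v_dd, v_th)

# The Logic stage swing V'_dd - V'_ss equals V_th, centred on V_dd / 2.
print(f"V'dd = {v_dd_p:.3f} V, V'ss = {v_ss_p:.3f} V, swing = {v_dd_p - v_ss_p:.3f} V")
```

The small swing of the Logic stage is what keeps its dissipation proportional to $V_{th}^2$ in the energy expression that follows.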

Logic values are calculated by the evaluation tree, which works similarly to pseudo-NMOS logic. When clk falls and  $\overline{\text{clk}}$  rises, the header and footer transistors (M5 to M8) turn on. When out evaluates high, the header transistor M5 pulls the output node to  $V'_{dd}$ , and the complementary output discharges through the evaluation tree to nearly  $V'_{ss}$ . When clk rises past  $V'_{ss}$ , the header and footer transistors (M5 to M8) turn off. As clk continues to rise past  $V'_{dd}$ , boost conversion turns on. Since



Figure 1.12: Structure of BL



Figure 1.13: Simulation waveform of BL-NOT


Figure 1.14: Structure of SBL

out is at  $V'_{dd}$  and  $\overline{\text{out}}$  is at  $V'_{ss}$ , transistors M1 and M4 turn on. Therefore, out and  $\overline{\text{out}}$  subsequently follow clk and  $\overline{\text{clk}}$ , respectively. As the voltage difference between out and  $\overline{\text{out}}$  increases, M1 and M4 turn on more strongly. In the end, out reaches  $V_{dd}$  and  $\overline{\text{out}}$  reaches GND, and these outputs drive the fanout logic of the next stage. Fig 1.13 shows the simulation result of a BL NOT gate, simulated with a 1GHz power clock.

The energy dissipation of BL is divided into a logic part and a boost part, and the energy in the boost part is recoverable. The energy dissipation of BL is expressed in Equation 1.15.

$$E_{BL} \approx kC_L V_{th}^2 + \frac{\pi^2 R C_L}{2T} C_L (V_{dd} - V_{th}) \tag{1.15}$$

where  $V_{dd}$  is the amplitude of the power clock, T is the clock period, R is the resistance of Boost looking into a power-clock terminal, and  $C_L$  is the total capacitance driven by the gate. The coefficient k, for the energy dissipation in Logic, is greater than 1/2 due to the crowbar current that flows in the output rail that evaluates to  $V'_{ss}$ .

A drawback of boost logic is its structural complexity: it requires three DC supply levels. Sub-threshold boost logic (SBL) was proposed to address this drawback.



Figure 1.15: Simulation waveform of SBL-NOT



Figure 1.16: Structure of EBL

Fig 1.14 shows the structure of SBL. The logic part uses an NMOS pull-up network (PUN) and an NMOS pull-down network (PDN), unlike the pseudo-NMOS-like structure of BL. The DC supply voltage is near the threshold voltage  $V_{th}$ , so when the input signal voltage is higher than  $2V_{th}$ , the NMOS PUN passes the supply level without  $V_{th}$  loss. Because both PUN and PDN are NMOS, their inputs must be complementary, i.e.  $in_{PUN} = \overline{in_{PDN}}$ . The two rails of the logic part must generate complementary outputs (out and  $\overline{out}$ ), so the PUN and PDN in each rail are also complementary. When out is logic 1, the PUN on the out side is on and its PDN is off, while the PUN on the  $\overline{out}$  side is off and its PDN is on. The voltage of out is then near  $V_{th}$  and  $\overline{out}$  is at GND, so on entering the boost stage, M1 and M4 turn on and the outputs are amplified to the power clock amplitude  $V_{dd}$  and GND. Fig 1.15 illustrates the simulation waveform of an SBL NOT gate.

The energy analysis of SBL is similar to that of BL, and the result is given in Equation 1.16.

$$E_{SBL} = \frac{1}{2}C_L V_{cc}^2 + \frac{9K(V_{dd} - V_{cc})^2 \pi^2 C_L^2 R}{16T} + E_{crowbar} \tag{1.16}$$



Figure 1.17: Simulation waveform of EBL-NOT

where  $V_{dd}$  is the amplitude of the power clock,  $V_{cc}$  is the DC supply voltage, T is the clock period, R is the resistance of Boost looking into a power-clock terminal, and  $C_L$  is the total capacitance driven by the gate. Equation 1.16 introduces the component  $E_{crowbar}$ . Because the input signals are AC waveforms, their rising edges are relatively slow, so the crowbar current from  $V_{cc}$  to GND is quite large. In the low operation frequency range,  $E_{crowbar}$  dominates the total energy dissipation.

The SBL structure requires four blocks to evaluate the logic value, which introduces some redundancy. To solve this problem, enhanced boost logic (EBL) was proposed. The structure of EBL is shown in Fig 1.16. The boost stage of EBL is the same as that of SBL, but the NMOS PUNs in both rails of the logic part are replaced by two NMOS transistors (M5 and M6 in Fig 1.16). The gates of M5 and M6 are connected to  $\overline{\text{clk}}$ ; when clk is low and  $\overline{\text{clk}}$  is high, the EBL circuit enters the evaluation stage. When  $\overline{\text{clk}}$  is higher than  $2V_{cc}$ , M5 and M6 are fully on without threshold loss, so both out and  $\overline{\text{out}}$  would be pulled up to  $V_{cc}$ . Meanwhile, the NMOS PDN of one side (e.g.  $\overline{\text{out}}$ ) turns on and pulls that output to GND. As a result, in the evaluation stage, the voltage of out is kept at  $V_{cc}$  and the voltage of  $\overline{\text{out}}$  is pulled down to GND. In the boost stage, the mechanism is the same as in SBL: M1 and M4 are on, and the outputs are amplified to the power clock amplitude  $V_{dd}$  and GND. Fig 1.17 illustrates the simulation waveform of an EBL NOT gate.

The energy dissipation of EBL is shown in Equation 1.17.

$$E_{EBL} = \frac{1}{2}\alpha C_L V_{cc}^2 + \frac{9K(V_{dd} - V_{cc})^2 \pi^2 C_L^2 R}{16T} + E_{crowbar} \tag{1.17}$$

where  $V_{dd}$  is the amplitude of the power clock,  $V_{cc}$  is the DC supply voltage, T is the clock period, R is the resistance of Boost looking into a power-clock terminal, and  $C_L$  is the total capacitance driven by the gate. As with SBL, EBL circuits also suffer from the crowbar problem in the low operation frequency range.

Compared with BL, EBL reduces the number of DC power supplies. In addition, the PDN of EBL has twice the overdrive ability, so a much more complex function can be realized in a single EBL gate.

The three kinds of boost logic share high-speed, low-power characteristics. However, they also share disadvantages: a DC power supply is still required, and because of the crowbar current they can only work in the high frequency range.

## 1.3 Power Clock Generator

As mentioned in the previous section, charge recovery logic requires an LC resonant circuit to supply the power clock so that energy can be recycled and reused by the LC tank. Thus, the conventional ring oscillator and clock tree are no longer applicable to charge recovery logic. To generate a power clock that can drive charge recovery logic, oscillators based on LC resonant circuits are studied.

#### 1.3.1 Architecture of LC Resonant Oscillator



Figure 1.18: Architecture of LC Resonant Oscillator and its simplified model

A typical LC Resonant Oscillator architecture is shown in Fig 1.18a. In this architecture, the on-chip inductor is laid out in spiral form. A pair of NMOS transistors (M1 and M2) acts as a negative differential transconductor, the gain element that compensates for losses and maintains oscillation. The distributed capacitance of the clock loads is simplified to two constant capacitors, though it may vary during circuit operation. Under some extreme conditions, the variation of the distributed capacitance must be considered to avoid degrading jitter and skew performance. The amplitude of the clock is determined by the bias current  $I_{bias}$ . Fig 1.18b illustrates a simplified model of the LC Resonant Oscillator, in which the parasitic resistance R (including the parasitic resistance of the distribution network and of the inductor), the clock load capacitance C, and the on-chip inductor L form an RLC circuit. Energy resonates between the inductor and the clock load capacitance, and power is dissipated in the parasitic resistance.

Clock skew and jitter appear in a resonant clocking system if the network topology and circuit parameters are not designed carefully, and the LC Resonant Oscillator is no exception. Since the clock network is used as the capacitance that sets the frequency, its topology must be well designed to avoid transmission-line effects and to keep skew acceptable. Electromagnetic signals propagate at a speed given by

$$v = \frac{c}{\sqrt{\varepsilon_r}} \tag{1.18}$$

or roughly  $150\mu m/ps$  in the wires of a silicon process. The skew tolerance is usually 10% of the clock period. Thus, for a 1GHz clock (period 1ns, tolerance 100ps) to meet the tolerance, the clock cannot propagate farther than 15mm. Moreover, the actual propagation speed will be slower than that predicted by Equation 1.18 because of loading and branching in the clock network.
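The 15mm figure can be reproduced with a short calculation. The sketch below is only an illustration of Equation 1.18: the relative permittivity `EPS_R = 4.0` is an assumed value (close to that of silicon dioxide) chosen because it yields the roughly $150\mu m/ps$ quoted above, and the function names are hypothetical.

```python
import math

C_UM_PER_PS = 299.792458  # speed of light in vacuum, in micrometres per picosecond


def propagation_speed(eps_r: float) -> float:
    """Signal speed in um/ps from Equation 1.18, v = c / sqrt(eps_r)."""
    return C_UM_PER_PS / math.sqrt(eps_r)


def max_clock_reach_mm(freq_hz: float, eps_r: float, skew_fraction: float = 0.10) -> float:
    """Farthest distance (mm) a clock edge may travel within the skew budget."""
    period_ps = 1e12 / freq_hz
    skew_budget_ps = skew_fraction * period_ps
    return propagation_speed(eps_r) * skew_budget_ps / 1000.0  # um -> mm


EPS_R = 4.0  # assumed relative permittivity, not stated in the text
print(f"speed = {propagation_speed(EPS_R):.1f} um/ps")          # about 149.9 um/ps
print(f"reach at 1 GHz = {max_clock_reach_mm(1e9, EPS_R):.1f} mm")  # about 15.0 mm
```

This is an upper bound under ideal propagation; loading and branching reduce the usable distance.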

The jitter performance is mainly limited by power-supply noise. This noise is highly correlated from period to period; as a result, jitter accumulates through the addition of standard deviations and ultimately contributes the largest portion of jitter variation [27]. Some LC Resonant Oscillator circuits introduce injection locking to improve jitter



Figure 1.19: A LC Resonant Oscillator structure with clock injection locking

performance. Fig 1.19 shows an LC Resonant Oscillator architecture with clock injection locking. Compared with the typical architecture, a pair of inverters is added to supply the injection-locking clock, an external reference that limits jitter accumulation by providing phase filtering. The jitter pole is given in Ref [28] as

$$p_0 = \frac{\ln(1 - S_c)}{t_{inj}} \ \text{(rad/s)} \tag{1.19}$$

where  $t_{inj}$  is the ideal period of clock (period of injection clocking), and  $S_c$  is the injection coupling strength given approximately by

$$S_c = \frac{W_{inj}}{W_{ind} + W_{inj}} \tag{1.20}$$

where  $W_{inj}$  is the drive capability of the injection locking buffer, and  $W_{ind}$  is the drive capability of the oscillator. By placing the pole  $p_0$  below the frequency of the reference clock, the injection clocking element can filter out large cycle-to-cycle jitter.
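Equations 1.19 and 1.20 can be evaluated together to see where the pole lands relative to the reference clock. The following sketch uses made-up drive strengths (a buffer one ninth as strong as the oscillator) and a 1GHz reference; these numbers and the function names are assumptions for illustration only.

```python
import math


def coupling_strength(w_inj: float, w_ind: float) -> float:
    """Injection coupling strength S_c from Equation 1.20."""
    return w_inj / (w_ind + w_inj)


def jitter_pole(s_c: float, t_inj: float) -> float:
    """Jitter pole p_0 in rad/s from Equation 1.19 (negative for 0 < S_c < 1)."""
    return math.log(1.0 - s_c) / t_inj


# Hypothetical values: relative drive strengths 1 and 9, 1 GHz injected reference.
s_c = coupling_strength(w_inj=1.0, w_ind=9.0)
t_inj = 1e-9
p0 = jitter_pole(s_c, t_inj)

reference_rad_s = 2 * math.pi / t_inj
print(f"S_c = {s_c:.2f}, p0 = {p0:.3e} rad/s")  # p0 is about -1.05e8 rad/s
print(f"|p0| below reference: {abs(p0) < reference_rad_s}")
```

A stronger injection buffer (larger `w_inj`) moves the pole to a higher frequency, so less of the accumulated jitter is filtered; a weaker one filters more but locks less firmly.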

#### 1.3.2 Power Analysis of LC Resonant Oscillator

#### Quality Factor

In the power analysis, the quality factor Q of the circuit is a key parameter: the larger Q is, the better the power and frequency performance the LC Resonant Oscillator can achieve. There are several definitions of the quality factor, but all of them are essentially equivalent to

$$\begin{aligned}
Q &= 2\pi \frac{\text{Maximum Energy Stored}}{\text{Energy Dissipated per Cycle}} \\
&= \omega_0 \frac{\text{Maximum Energy Stored}}{\text{Average Power Dissipation}} \\
&= \frac{\omega_0}{\beta}
\end{aligned} \tag{1.21}$$

where  $\omega_0$  is the resonant angular frequency and  $\beta$  is the bandwidth, i.e. the difference between the half-power frequencies above and below  $\omega_0$  [29]. It is useful to relate the quality factors of the individual components to that of the whole circuit. Ref [29] gives the quality factors of a capacitor and an inductor with parasitic resistance; Equations 1.22 and 1.23 give the approximate expressions of  $Q_C$  and  $Q_L$ .

$$Q_C = \omega_0 C R_C \tag{1.22}$$

$$Q_L = \frac{R_L}{\omega_0 L} \tag{1.23}$$

where  $R_C$  is the parasitic resistance of the capacitor expressed as a parallel resistance at resonance, and  $R_L$  is the equivalent resistance of the inductor. The quality factor of the LC tank is expressed as

$$Q_{tank} = \omega_0 C(R_C \parallel R_L) \tag{1.24}$$

Substituting  $R_C$  and  $R_L$  from Equations 1.22 and 1.23, and using  $\omega_0^2 = 1/(LC)$ , the quality factor of the LC tank can be expressed in terms of the quality factors of its components.

$$Q_{tank} = Q_C \parallel Q_L \tag{1.25}$$

Both the capacitor and the inductor quality factors can limit the quality of the LC Resonant Oscillator according to Equation 1.25. However, the inductor can be controlled precisely by the semiconductor process, while the capacitor is the capacitance of the distribution network. Hence, techniques must be applied to minimize the clock network resistance in order to improve the quality factor.
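The step from Equation 1.24 to Equation 1.25 can be checked numerically. The sketch below evaluates $Q_C$, $Q_L$ and $Q_{tank}$ for one set of component values and confirms that the tank quality factor equals the parallel combination of the component quality factors. The component values and function names are arbitrary assumptions, not design data from this work.

```python
import math


def parallel(a: float, b: float) -> float:
    """Parallel combination a || b = a*b / (a + b)."""
    return a * b / (a + b)


def tank_quality(l: float, c: float, r_c: float, r_l: float) -> tuple[float, float, float]:
    """Return (Q_C, Q_L, Q_tank) from Equations 1.22 to 1.24."""
    w0 = 1.0 / math.sqrt(l * c)          # resonant angular frequency
    q_c = w0 * c * r_c                   # Equation 1.22
    q_l = r_l / (w0 * l)                 # Equation 1.23
    q_tank = w0 * c * parallel(r_c, r_l) # Equation 1.24
    return q_c, q_l, q_tank


# Assumed values: 1 nH inductor, 25 pF clock load, 200 ohm and 100 ohm parallel resistances.
q_c, q_l, q_tank = tank_quality(l=1e-9, c=25e-12, r_c=200.0, r_l=100.0)
print(f"Q_C = {q_c:.2f}, Q_L = {q_l:.2f}, Q_tank = {q_tank:.2f}")
print(f"Q_C || Q_L = {parallel(q_c, q_l):.2f}")  # same value as Q_tank (Equation 1.25)
```

Because the combination is a parallel one, $Q_{tank}$ is always below the smaller of $Q_C$ and $Q_L$, which is why the resistance of the clock network matters even when the inductor is good.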

#### Power Dissipation of LC Resonant Oscillator

In an LC Resonant Oscillator, buffers to drive the clock loads are no longer necessary, since the load capacitance is part of the resonant LC tank. Hence, a resonant clocking system contains only a small number of buffers, and the LC Resonant Oscillator can drive the clock load directly. As a result, the power dissipated by buffers, which is a large portion of the total power dissipation, can be reduced. For a nonresonant clocking system with *m* buffer stages, each with stage gain n, and clock load capacitance  $C_L$ , the buffers' load capacitance  $C_B$  can be approximately expressed as

$$C_B = C_L/n + C_L/n^2 + \dots + C_L/n^m \tag{1.26}$$

Usually the number of buffer stages m in a distribution network is large enough that the geometric series converges to  $C_B \approx C_L/(n-1)$ . The dynamic power dissipated by the nonresonant clock system is then

$$\begin{aligned}
P_{ns} &= (C_L + C_B) V_{DD}^2 f \\
&= \frac{n}{n-1} C_L V_{DD}^2 f
\end{aligned} \tag{1.27}$$

where  $V_{DD}$  is the power-supply voltage and f is the clock frequency.
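To see how quickly the series in Equation 1.26 approaches its limit, the sketch below sums it for a few values of m and then evaluates Equation 1.27. The load capacitance, supply voltage, and frequency are assumed example values, and the function names are hypothetical.

```python
import math


def buffer_capacitance(c_load: float, n: float, m: int) -> float:
    """Total buffer load capacitance for m stages of gain n (Equation 1.26)."""
    return sum(c_load / n**k for k in range(1, m + 1))


def nonresonant_power(c_load: float, n: float, v_dd: float, freq: float) -> float:
    """Dynamic power of the buffered clock tree in the large-m limit (Equation 1.27)."""
    return n / (n - 1) * c_load * v_dd**2 * freq


C_LOAD = 100e-12  # assumed 100 pF clock load
N = math.e        # stage gain e, as used later in the comparison with Equation 1.30

limit = C_LOAD / (N - 1)
for m in (2, 4, 8):
    c_b = buffer_capacitance(C_LOAD, N, m)
    print(f"m = {m}: C_B = {c_b * 1e12:.2f} pF ({c_b / limit:.1%} of the limit)")

print(f"P_ns = {nonresonant_power(C_LOAD, N, v_dd=1.8, freq=1e9) * 1e3:.1f} mW")
```

Even a handful of stages brings $C_B$ within a few percent of $C_L/(n-1)$, which supports using the closed form in Equation 1.27.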

For an LC Resonant Oscillator clocking system, ideally no power is dissipated in the resonance itself; only the parasitic resistance, which comes from the resistive elements of the inductor and the distribution capacitance, consumes power. The clock generated by the LC Resonant Oscillator is sinusoidal:

$$v_{clk}(t) = V_A[1 + \sin(\omega_0 t + \varphi)] \tag{1.28}$$

where  $V_A$  is the amplitude of the clock,  $\omega_0$  is the angular frequency of the clock, and  $\varphi$  is the initial phase of the clock.  $V_A$  is determined by the bias current  $I_{bias}$ , while  $\omega_0 = 1/\sqrt{LC}$ . The average power of the *LC* Resonant Oscillator is calculated as

$$\begin{aligned}
P_{DA} &= \frac{1}{T} \int_0^T p(t)\, dt \\
&= \frac{V_A^2}{RT} \int_0^T [1 + \sin(\omega_0 t + \varphi)]^2\, dt \\
&= \frac{3V_A^2}{2R}
\end{aligned} \tag{1.29}$$

This result indicates that the power dissipation is independent of the clock frequency. In the ideal case, the power dissipation does not change even when the frequency is raised to a very high level. In a real circuit, the power may vary with frequency, but only slightly.
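The integral in Equation 1.29 can be confirmed by direct numerical integration over one period. The sketch below uses a midpoint rule with assumed values of $V_A$ and R, and repeats the calculation at two very different frequencies to illustrate the frequency independence noted above.

```python
import math


def average_power(v_a: float, r: float, freq: float, phase: float = 0.0, steps: int = 10_000) -> float:
    """Average of v_clk(t)^2 / R over one clock period, using the midpoint rule."""
    period = 1.0 / freq
    dt = period / steps
    energy = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        v = v_a * (1.0 + math.sin(2.0 * math.pi * freq * t + phase))
        energy += v * v / r * dt
    return energy / period


V_A, R = 0.9, 50.0  # assumed amplitude (V) and parasitic resistance (ohm)
closed_form = 3 * V_A**2 / (2 * R)

for f in (1e6, 1e9):
    print(f"f = {f:.0e} Hz: numeric = {average_power(V_A, R, f):.6f} W, closed form = {closed_form:.6f} W")
```

Both frequencies give the same average power, since the period cancels once the integral is normalized by T.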

By setting the amplitude of clock to  $V_{DD}$ , power dissipation of *LC* Resonant Oscillator and nonresonant clocking system can be compared, and the result is given as [30]

$$\frac{P_{DA}}{P_{ns}} = \frac{3\pi(n-1)}{4Qn} \tag{1.30}$$

In Equation 1.30, if the stage gain of the buffers is n = e, the quality factor Q has to be larger than 1.49 for the power dissipation of the LC Resonant Oscillator to be smaller than that of the nonresonant clocking system. Ref [31] reports that the quality factor of an inductor can reach 20 in a CMOS process. However, according to Equation 1.25, not only  $Q_L$  of the inductor but also  $Q_C$  of the capacitance can limit Q of the whole circuit.

The analysis above shows that the LC Resonant Oscillator has the advantage of low power dissipation and, because of its LC resonant characteristic, is well suited to supplying the power clock for charge recovery logic.
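The break-even quality factor quoted above follows directly from setting the ratio in Equation 1.30 to one. A minimal sketch, with hypothetical function names:

```python
import math


def power_ratio(q: float, n: float) -> float:
    """P_DA / P_ns from Equation 1.30."""
    return 3 * math.pi * (n - 1) / (4 * q * n)


def break_even_q(n: float) -> float:
    """Quality factor at which resonant and nonresonant clocking dissipate equal power."""
    return 3 * math.pi * (n - 1) / (4 * n)


print(f"break-even Q for n = e: {break_even_q(math.e):.2f}")      # 1.49
print(f"power ratio at Q = 20:  {power_ratio(20.0, math.e):.3f}")  # about 0.074
```

The second line is only an upper-bound illustration: it assumes the whole tank reaches the inductor-only Q of 20 reported in Ref [31], which Equation 1.25 shows is optimistic.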

## Chapter 2

# Pulse Boost Logic and Application on Multiplier

## 2.1 Introduction

In recent communication systems, power as well as data is transmitted wirelessly, and inductive coupling is a common method of power transmission. Power transferred through inductors is in AC (sinusoidal) form, whereas conventional static CMOS requires a DC supply voltage, so a rectifier is necessary to convert the AC voltage to DC. As circuits scale up, energy efficiency becomes critical. Charge recovery logic is therefore a good candidate for improving the utilization efficiency of wirelessly transmitted power.

However, the charge recovery logics published in the literature have some crucial shortcomings when applied to wireless power transmission. To achieve sufficient energy savings, charge recovery logics such as ADCL, 2PDADCL, and QSERL must work in a limited frequency range (up to 100MHz) because of their structure: ADCL [21] and 2PDADCL [22] use diode-connected transistors, and QSERL [23] uses PMOS devices in the evaluation tree, both of which reduce the operation frequency. Although boost logics [24, 26, 25] can work at high frequency, their structures require DC power supplies to evaluate logic values.

In this chapter, a circuit structure called Pulse Boost Logic (PBL) [32] is proposed, which is a novel two-phase high-speed charge recovery logic. PBL is powered entirely by AC supplies: it is driven by a 2-phase non-overlapping clock, and its operation frequency can reach the gigahertz level. PBL can be categorized as a member of the boost logic family. Like the other members of the family, its operation is divided into two stages, an evaluation stage and a boost stage; the details are explained in Section 2.2. To demonstrate the performance of PBL, a 4-bit multiplier is designed and fabricated in a  $0.18\mu m$  CMOS process. The simulation results indicate that PBL can work at operation frequencies up to 1.8GHz, while the test chip was measured at 161MHz, where it dissipates  $772\mu W$ .

This chapter is organized as follows. In Section 2.2, the operation of pulse boost logic is discussed, and PBL is compared with previous boost logic. The design of the 4-bit pipeline multiplier is described in Section 2.3, together with simulation results for both function and energy dissipation. The measurement results of the test chip, which demonstrate the energy reduction performance of PBL, are presented in Section 2.4. In Section 2.5 the test chip performance is summarized and conclusions are given.

## 2.2 Pulse Boost Logic

In this section, the structure and operation principle of PBL are discussed first, and then the power dissipation of PBL is compared with that of former boost logic.

#### 2.2.1 Structure of PBL

Pulse boost logic has a dual-rail, two-phase clock driven structure as shown in Fig 2.1a. The structure is divided into two parts: Evaluation network and boost conversion.

The evaluation network has two complementary parts, each consisting of a Pull Up Network (PUN) and a Pull Down Network (PDN), both built from NMOS devices only. The logic value is calculated in the evaluation network, and because both PUN and PDN are NMOS, evaluation is fast. Unlike conventional static CMOS circuits, two groups of inputs (in and  $\overline{\text{in}}$ ) are required to make sure the NMOS PUN and NMOS PDN function correctly.

The boost network consists of 6 transistors, including a latch structure (M1-M4) and 2 pass gates (M5 and M6). Logic values calculated in the evaluation network are transferred through the pass gates to the latch and amplified. A logic '1' transferred from the NMOS PUN through the pass gate to the latch suffers a threshold loss; however, the voltage is boosted to the amplitude of the power supply in the boost network.

PBL is driven by two-phase non-overlapping power clocks as illustrated in Fig 2.1a: clk and  $\overline{\text{clk}}$  are the power clocks that drive the circuit. The amplitude of clk and  $\overline{\text{clk}}$  (denoted  $V_{dd}$ ) should be high enough that the output of PBL can be recognized as logical '1', and  $V_{dd}$  is usually set to the DC supply voltage of conventional static CMOS. The output of a PBL gate is intended to be the input of other PBL gates, and the peak value  $V_{dd}$  is high enough to serve as a driving signal.

Although PBL uses more transistors than conventional static CMOS for the same function, PBL saves area in sequential circuits because it is driven by clocks.

#### 2.2.2 Operation of PBL

Following the clock phases, the operation of PBL can also be divided into two stages: the evaluation stage and the boost stage. Fig 2.1b illustrates simplified waveforms of these two stages.



Figure 2.1: Structure and operation of PBL

In the evaluation stage, clk is in its low half cycle (0 to  $V_{dd}/2$ ) while  $\overline{\text{clk}}$  is in its high half cycle ( $V_{dd}/2$  to  $V_{dd}$ ). As shown in Fig 2.1a, the evaluation network calculates the logic value when  $\overline{\text{clk}}$  is high and clk is low. Because both PUN and PDN are formed by NMOS, the output level in the evaluation stage is about  $V_{dd} - V_{th}$  for logic '1' and near 0V for logic '0'. However, as clk rises and  $\overline{\text{clk}}$  falls, the output level for logic '0' rises up to  $V_{th}$  before  $\overline{\text{clk}}$  falls low enough to shut off the pass gates. Because the voltage range of  $\overline{\text{clk}}$  is much larger than the threshold voltage of the NMOS devices, the overdrive ability is larger than that of Boost Logic and Enhanced Boost Logic. During this stage, since  $\overline{\text{clk}}$  is high, pass gates M5 and M6 are both on, and the voltage values calculated by the evaluation network pass through them. Meanwhile, the voltage generated by the boost network barely affects these values.

In the boost stage, clk is in its high half cycle while  $\overline{\text{clk}}$  is in its low half cycle. Since  $\overline{\text{clk}}$  is low, pass gates M5 and M6 are both off, so the voltage generated by the evaluation network cannot transfer to the boost network. The voltage values calculated in the previous stage are approximately  $V_{dd} - V_{th}$  for logic '1' and  $V_{th}$  for logic '0', and these values are boosted to the extremes of the clk swing ( $V_{dd}$  and 0 for logic '1' and '0', respectively). The output of PBL therefore swings by only  $V_{th}$  during the boost stage, and as a result the crowbar current is small.

Unlike BL or EBL, the evaluation network of PBL is also powered by an AC supply whose voltage range is much larger than  $V_{th}$ , which means the overdrive ability of PBL is larger than that of BL and EBL. The larger a gate's overdrive ability is, the more complex the function a single gate can implement, so PBL can realize the same circuit function with fewer gates. Take EBL as an example: the power supply of its evaluation network is a low DC voltage of approximately  $V_{th}$ . Although the peak value of the input signals is the same  $V_{dd}$ , a power supply with peak value  $V_{dd}$  can obviously drive more transistors than one at  $V_{th}$ .

The operation of a single PBL gate is explained above; however, it is more important that PBL gates can be connected to form a circuit. To make sure that the peak output voltage is passed to adjacent PBL gates during their evaluation stage, the clock phases of adjacent PBL gates must differ by 180°. Since the output reaches its peak value in the boost stage, the output signal of a PBL gate has a latency of half a cycle. To illustrate how a PBL gate drives other PBL gates, an inverter cascade is simulated. Fig 2.2 illustrates the connection and operation of the PBL gates. As shown in Fig 2.2a, the power clocks of the two adjacent gates have a 180° phase difference. out0 and  $\overline{\text{out0}}$  are the output signals of the first PBL inverter and the input signals of the second PBL inverter. Fig 2.2b gives the voltage waveforms of the output signals of both inverters. In the first half cycle, the first inverter is in its evaluation stage, and the input signal is evaluated in the evaluation tree. In the second half cycle, the first inverter is in its boost stage, and the calculated value is boosted to a high level and fed to the second inverter; the second inverter is in its evaluation stage, so the output signal of the first inverter is evaluated in its evaluation network. In the third half cycle, the second inverter is in its boost stage, and the signal calculated in the previous half cycle is boosted and output. After passing through two inverters, the input signal is delayed by one cycle.

#### 2.2.3 Analysis of Energy Dissipation

Following the structure of PBL, the energy dissipation analysis can also be divided into two parts: the energy dissipated by the evaluation network and by the boost network.

Before analyzing the energy dissipation of PBL, some assumptions are made to simplify the analysis. First, the output signals of both the evaluation network and the boost network are assumed to be sinusoidal, since PBL is driven entirely by AC power supplies. For the evaluation network, the voltage swing is simplified to  $V_{dd} - V_{th}$  and the frequency is the same as that of the power clock. For the boost network, the signals have to be analyzed separately for the logic '1' and '0' cases. In the logic '1' case, the output signal swings between  $V_{dd}$  and  $V_{dd} - V_{th}$  at the same frequency as the power clock; in the logic '0' case, the voltage swing of the output signal is  $V_{th}$  and the frequency is twice that of the power clock because the voltage swings in both half cycles. In an AC power supply system, power is dissipated only in resistance; however, the resistance of MOS devices



(b) Voltage waveform

Figure 2.2: PBL inverter cascade connection

is constant only in the linear region. The second assumption is therefore that, for the given power clock, all MOS devices work in the linear region with equivalent resistance R, and that in the boost network all MOS devices, both NMOS and PMOS, are identical, which can be realized by sizing them properly.

With the assumptions above, the energy dissipation of PBL can be derived. For the evaluation network, the only load capacitance is the pass gate capacitance  $C_{pass}$ ; thus the amplitude of the current flowing through the evaluation network is  $\omega C_{pass}(V_{dd} - V_{th})$ , where  $\omega = 2\pi/T$ . For the boost network, assuming the load capacitance is  $C_L$ , the amplitude of the current flowing through the boost network is  $\omega C_L V_{th}$  for logic '1' and  $2\omega C_L V_{th}$  for logic '0'. The energy dissipated in each device in one cycle can be expressed as  $I^2 RT$ , in which I is the effective (RMS) value of the current,  $I = I_{peak}/\sqrt{2}$ . The energy dissipated in the evaluation network in one cycle is expressed as

$$E_{eval} \approx \frac{2\pi^2 R_e C_{pass}}{T} C_{pass} (V_{dd} - V_{th})^2 \tag{2.1}$$

where  $R_e$  is the equivalent resistance of the NMOS devices in the evaluation network, and T is the period of one cycle. Although the boost network has both logic '1' and logic '0' cases, the dual-rail structure of PBL makes the two cases equally probable. Therefore, the energy dissipated in the boost network is expressed as

$$\begin{aligned}
E_{boost} &\approx \frac{1}{2} \left(\frac{1}{\sqrt{2}} \omega C_L V_{th}\right)^2 R_b T + \frac{1}{2} \left(\sqrt{2} \omega C_L V_{th}\right)^2 R_b T \\
&= \frac{5\pi^2 R_b C_L}{T} C_L V_{th}^2
\end{aligned} \tag{2.2}$$

where  $R_b$  is the equivalent resistance of the boost network. Thus, the energy dissipation of the whole PBL structure is

$$\begin{aligned}
E_{PBL} &= E_{eval} + E_{boost} \\
&\approx \frac{2\pi^2 R_e C_{pass}}{T} C_{pass} (V_{dd} - V_{th})^2 + \frac{5\pi^2 R_b C_L}{T} C_L V_{th}^2
\end{aligned} \tag{2.3}$$
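The closed forms in Equations 2.1 and 2.2 can be cross-checked against the $I^2RT$ definition they are derived from. The sketch below does this numerically; all component values are assumptions chosen only to exercise the formulas, and the function names are hypothetical.

```python
import math


def e_eval_from_current(r_e, c_pass, v_dd, v_th, t):
    """Evaluation-network energy per cycle as I_rms^2 * R * T."""
    omega = 2 * math.pi / t
    i_rms = omega * c_pass * (v_dd - v_th) / math.sqrt(2)
    return i_rms**2 * r_e * t


def e_boost_from_current(r_b, c_l, v_th, t):
    """Boost-network energy per cycle, averaging the logic '1' and '0' cases."""
    omega = 2 * math.pi / t
    i_rms_one = omega * c_l * v_th / math.sqrt(2)       # logic '1'
    i_rms_zero = 2 * omega * c_l * v_th / math.sqrt(2)  # logic '0', doubled frequency
    return 0.5 * i_rms_one**2 * r_b * t + 0.5 * i_rms_zero**2 * r_b * t


def e_eval_closed(r_e, c_pass, v_dd, v_th, t):
    return 2 * math.pi**2 * r_e * c_pass**2 * (v_dd - v_th)**2 / t  # Equation 2.1


def e_boost_closed(r_b, c_l, v_th, t):
    return 5 * math.pi**2 * r_b * c_l**2 * v_th**2 / t              # Equation 2.2


# Assumed values: 1 kOhm devices, 5 fF pass gate, 30 fF load, 1.8 V clock, 0.45 V threshold, 1 ns period.
args_eval = (1e3, 5e-15, 1.8, 0.45, 1e-9)
args_boost = (1e3, 30e-15, 0.45, 1e-9)
print(e_eval_from_current(*args_eval), e_eval_closed(*args_eval))
print(e_boost_from_current(*args_boost), e_boost_closed(*args_boost))
```

Each printed pair agrees, which confirms the coefficients 2 and 5 in front of $\pi^2$.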

To simplify this result, some further analysis is necessary. In such circuits, the load capacitance is usually tens of femtofarads when the system operates at gigahertz frequencies, while  $C_{pass}$  is the parasitic capacitance of a single NMOS device, which is only several femtofarads. The equivalent resistance of a single MOS device is on the order of  $k\Omega$ , and the equivalent resistances of the evaluation network and the boost network are comparable. In the  $0.18\mu m$  process,  $V_{dd}$  is about 4 times  $V_{th}$ . With these preconditions, the following assumption is made:

$$R_e C_{pass}^2 (V_{dd} - V_{th})^2 \approx R_b C_L^2 V_{th}^2 \tag{2.4}$$

With this assumption, Equation 2.3 can be simplified as follows.

$$E_{PBL} \approx \frac{7\pi^2 R_b C_L}{T} C_L V_{th}^2 \tag{2.5}$$

Among the members of the boost logic family, EBL has the structure most similar to PBL. Therefore, a similar energy analysis can be applied to EBL in order to compare the two structures. The energy dissipation of EBL consists of two parts: DC energy dissipation and AC energy dissipation. Equation 2.6 gives the energy dissipation derived by the same analysis method.

$$E_{EBL} \approx C_L V_{th}^2 + \frac{\pi^2 R_b C_L}{T} C_L (V_{dd} - V_{th})^2 \tag{2.6}$$

The energy dissipation of PBL and EBL can now be compared, using  $V_{dd} - V_{th} \approx 3V_{th}$ .

$$\begin{aligned}
\frac{E_{PBL}}{E_{EBL}} &\approx \frac{7\pi^2 R_b C_L C_L V_{th}^2}{T C_L V_{th}^2 + \pi^2 R_b C_L C_L (3V_{th})^2} \\
&= \frac{7\pi^2}{T/(R_b C_L) + 9\pi^2}
\end{aligned} \tag{2.7}$$

With the preconditions above, assume the system works at a frequency of 1GHz,  $C_L$  is 100 fF (it is actually smaller than this in most cases), and  $R_b$  is  $1k\Omega$ . The ratio is then about 70%, which means that the energy dissipation of PBL is less than that of EBL. Moreover, as analyzed in Section 2.2, the overdrive ability of PBL is higher and PBL requires no DC power supply.
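The 70% figure can be reproduced by substituting the stated operating point into Equation 2.7. A minimal sketch, with a hypothetical function name:

```python
import math


def pbl_to_ebl_ratio(period: float, r_b: float, c_l: float) -> float:
    """E_PBL / E_EBL from Equation 2.7, which assumes V_dd - V_th = 3 V_th."""
    return 7 * math.pi**2 / (period / (r_b * c_l) + 9 * math.pi**2)


# Operating point from the text: 1 GHz clock, 100 fF load, 1 kOhm equivalent resistance.
ratio = pbl_to_ebl_ratio(period=1e-9, r_b=1e3, c_l=100e-15)
print(f"E_PBL / E_EBL = {ratio:.3f}")  # 0.699

# A smaller load (here 30 fF, an assumed value) increases T / (R_b C_L) and lowers the ratio.
print(f"with 30 fF load: {pbl_to_ebl_ratio(1e-9, 1e3, 30e-15):.3f}")
```

Since the text notes that $C_L$ is usually below 100 fF, the 70% value is a conservative estimate within this model.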

## 2.3 Design of 4-bit Multiplier

#### 2.3.1 Elements of 4-bit multiplier

The 4-bit multiplier uses the  $4 \times 4$  array multiplier structure, so AND gates, half adders and full adders are required; moreover, buffers are necessary to ensure that the clock-driven multiplier functions correctly. Equation 2.8 gives the Boolean formulas of the AND gate, half adder and full adder.

$$\begin{aligned}
\text{AND}: \quad & Y = A \cdot B \\
\text{HAdder}: \quad & Sum = A \oplus B \\
& C_{out} = A \cdot B \\
\text{FAdder}: \quad & Sum = A \oplus B \oplus C_i \\
& C_{out} = A \cdot B + C_i \cdot (A + B)
\end{aligned} \tag{2.8}$$
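The formulas in Equation 2.8 can be checked exhaustively against integer addition. The sketch below is a purely behavioural model (it says nothing about the transistor-level PBL gates); it also returns the complement of each output to mirror the dual-rail convention. Function names are hypothetical.

```python
from itertools import product


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (Sum, C_out) per Equation 2.8."""
    return a ^ b, a & b


def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """Return (Sum, C_out) per Equation 2.8."""
    total = a ^ b ^ c_in
    c_out = (a & b) | (c_in & (a | b))
    return total, c_out


def dual_rail(bit: int) -> tuple[int, int]:
    """Return (value, complement), as produced by the two rails of a dual-rail gate."""
    return bit, bit ^ 1


# Exhaustive check: the adder outputs must encode the arithmetic sum of the inputs.
half_ok = all(2 * c + s == a + b
              for a, b in product((0, 1), repeat=2)
              for s, c in [half_adder(a, b)])
full_ok = all(2 * c + s == a + b + ci
              for a, b, ci in product((0, 1), repeat=3)
              for s, c in [full_adder(a, b, ci)])
print(half_ok, full_ok)  # True True
```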

According to these Boolean formulas, the AND gate, half adder and full adder are designed with the PBL structure. As described in Section 2.2.1, the evaluation network consists of an NMOS PUN and PDN, and since PBL is a dual-rail structure, the values evaluated by the two rails must be complementary. Structures of each PBL gate are illustrated in



(c) FULL adder with PBL

Figure 2.3: Elements of multiplier with PBL structure



Figure 2.4: Structure of data A/D converter and simulation result

Fig 2.3. In combinational circuits PBL may incur area overhead, but in sequential circuits the area of a PBL implementation may be even smaller because it is clock driven.

#### 2.3.2 Interface between PBL and Static CMOS

PBL is intended not only as a substitute for conventional static CMOS but also to be compatible with it. Therefore, an interface between static CMOS and PBL is required to ensure their compatibility.

The input of PBL can be a DC-format signal as long as it follows one timing rule: the setup time of the input signal must be no smaller than half a clock cycle. This rule exists because the evaluation network calculates the logic value within half a clock cycle; with this rule, correct evaluation is ensured.

Output signals of PBL can be recognized as logic '1' and '0' by other PBL gates; however, they can hardly be recognized by static CMOS circuits. Therefore, an interface circuit is required to convert the sinusoidal signal to a DC-format signal. Fig 2.4 shows the structure of the data A/D converter and its simulation result. D and  $\overline{D}$  are the data outputs of the PBL circuit, and they are in their boost stage when clk is in its high half cycle. In the first half cycle of the clock, clk is low, so M6 is off and the pair of inverters (M2&M4, M3&M5) do not operate; meanwhile  $\overline{\text{clk}}$  is high, so M0 and M1 are on and nets D<sub>1</sub> and  $\overline{D_1}$  are precharged high. In the second half cycle, clk is high and M6 is on, while  $\overline{\text{clk}}$  is low and M0 and M1 are off; the inverters therefore operate, and the voltages of nets D<sub>1</sub> and  $\overline{D_1}$  are determined by D and  $\overline{D}$ . These values are held by the latch formed by NAND gates and inverters, and finally the converted logic values Q and  $\overline{Q}$  are output in DC format.

#### 2.3.3 4-bit Pipeline Multiplier

The most popular parallel multiplier is  $n \times n$  array structure [33]. Bit products  $a_i \cdot b_j$  are calculated with AND gates, and formed using adders by columns. Adders including full-adder and half-adder are arranged in a carry-save chain so that the carry-out bits can be fed to the next available adder in the column to the left. This  $n \times n$  structure accepts all input bits simultaneously.

4-bit PBL multiplier adopts pipeline design because each PBL gate has half cycle delay. Fig 2.5 illustrates structure of 4-bit pipeline multiplier: since the longest path in the structure is from  $a_1 \cdot b_0$  to  $p_7$  including 7 PBL gates, 7 stages pipeline is designed; each stage has half cycle delay, so 3.5 cycles latency generates after the multiplication calculation.

#### 2.3.4 5-bit Counter as Frequency Divider

Because the operation frequency is quite high, the power clock can hardly be measured directly. Therefore, a 5-bit counter is designed, which divides the frequency of the power clock by 32 so that it can be easily output and measured.

The counter is constructed with 5 T flip-flops and 3 AND gates. The T flip-flop is triggered at the rising edge of the clock, and in this design the input is connected to fixed



Figure 2.5: 4-bit pipeline multiplier with PBL gates



Figure 2.6: Clock generator with blip [2] structure



Figure 2.7: Test chip module and simulation result

|              | Power supply                             | Input signal frequency | Energy (pJ/cycle) | Delay  |
|--------------|------------------------------------------|------------------------|-------------------|--------|
| 2PDADCL [22] | 1.8V, 2-phase sine wave, 45MHz           | 2.2MHz                 | 12.27             | 67.4ns |
| EBL [25]     | 1.8V, 2-phase sine wave, 1.3GHz; 0.4V DC | 1.3GHz                 | 3.35              | 3.08ns |
| PBL          | 1.8V, 2-phase sine wave, 1.8GHz          | 1.8GHz                 | 3.09              | 2.28ns |
| PBL          | 1.8V, 2-phase sine wave, 1.3GHz          | 1.3GHz                 | 2.65              | 3.08ns |

Table 2.1: Comparison of 4-bit multiplier with charge recovery logics



Figure 2.8: 5-bit counter with PBL

high ( $V_{dd}$ ). The truth table of the T flip-flop is shown in Table 2.2: on the rising edge of the clock, if the input is low the output is held; otherwise the output inverts. Based on this truth table, a T flip-flop is designed with PBL, and the 5-bit counter is constructed from this PBL T flip-flop as shown in Fig 2.8.
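The divide-by-32 behaviour can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the next-state rule is taken from the T flip-flop truth table (Table 2.2), T is tied high, and each stage is assumed to be clocked by the rising edge of the previous stage's output. The three AND gates of the actual counter are not modelled, because their connection is not described here; the names `t_flip_flop_next` and `divide_chain` are hypothetical.

```python
def t_flip_flop_next(t: int, q: int) -> int:
    """Next state of a T flip-flop: hold when T = 0, invert when T = 1."""
    return q ^ t


def divide_chain(num_clock_cycles: int, stages: int = 5) -> list[int]:
    """Return the last stage's output, sampled after each input rising edge.

    Assumption: a simple ripple chain in which every T input is tied high and
    each stage toggles on a rising edge (0 -> 1) of the previous stage.
    """
    q = [0] * stages
    samples = []
    for _ in range(num_clock_cycles):
        edge = True  # stage 0 sees one rising edge of the power clock per cycle
        for i in range(stages):
            if not edge:
                break
            previous = q[i]
            q[i] = t_flip_flop_next(1, q[i])
            edge = previous == 0 and q[i] == 1
        samples.append(q[-1])
    return samples


if __name__ == "__main__":
    out = divide_chain(64)
    # The last stage repeats every 32 input cycles, i.e. it runs at 1/32 of the clock.
    print(out[:32] == out[32:64])
```

In this model the last stage is high for 16 input cycles and low for 16, so its period is 32 clock cycles.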

#### 2.3.5 Peripheries of Multiplier

In addition to the multiplier core, peripheral circuits for several functions are necessary.

Table 2.2: Truth table of T flip-flop

| T | $Q_n$ | $Q_{n+1}$ |
|---|-------|-----------|
| 0 | 0     | 0         |
| 0 | 1     | 1         |
| 1 | 0     | 1         |
| 1 | 1     | 0         |

PBL is intended to work in a high operation frequency range (several hundred megahertz to gigahertz), and a two-phase non-overlapping power clock is required. Because PBL is a charge recovery logic, an LC resonant circuit is necessary to recycle energy. To generate the required power clock, a structure called blip [2] is used. Fig 2.6 shows the blip clock generator, which adopts an LC resonant structure: two inductors and the parasitic capacitance of the circuits form an LC resonant system, and a cross-coupled NMOS pair works as a negative resistance to keep the clock generator oscillating. In the test chip only the NMOS pair is integrated, while the inductors  $L_1$  and  $L_2$  are off-chip because on-chip inductors are area-consuming and difficult to control.

In an inductive-coupling wireless power transmission system, the power receiver is also an inductor, and the received power is in AC format, similar to the clock signals generated by blip. However, in wireless power transmission the sine-wave shape and phase degrade, and impedance mismatch also affects the power transmission efficiency. These problems should be considered in the design of a wireless power transceiver.

## 2.4 Test Chip of Pulse Boost Logic

The 4-bit multiplier is fabricated with  $0.18\mu m$  CMOS process technology, and this section presents the simulation and measurement results.

#### 2.4.1 Simulation Result

The test chip module consists of the 4-bit multiplier, the frequency divider, the data A/D converter and the negative resistance of the clock generator. The layout is implemented in a  $0.18\mu m$ CMOS process; the PMOS and NMOS sizes are set so that their equivalent resistances are equal, as Section 2.2.3 assumes. For more flexibility in the experiment, independent body bias supplies for the well and the substrate are introduced. Fig 2.7a is a block diagram of the test chip.

Simulation with Cadence Spectre is used to verify that the multiplier functions correctly and that energy dissipation is considerably decreased. To compare with the performance of other charge recovery logics, the operation frequency is set to the maximum value at which PBL can work properly. Fig 2.7b shows the functional simulation result of the 4-bit multiplier with PBL gates at an operation frequency of 1.8GHz. In the simulated waveform, the latency from input to output is 2.28ns, of which 1.94ns is caused by the multiplier, as the earlier analysis predicts, and 0.34ns is caused by the data A/D converter. Signal 1/32F is the output of the frequency divider; since the operation frequency is 1.8GHz, the cycle of 1/32F is 17.78ns, which can be used to indicate the circuits' operation frequency. Although the multiplier can process input data whose length equals one clock cycle, the simulation uses an input data length of five clock cycles to make the simulation result clearer.
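The timing figures quoted above follow from the clock period. The short sketch below recomputes them from the 7-stage pipeline, the half-cycle delay per stage and the 5-bit divider; the variable names are illustrative, and the 0.34ns converter delay is simply the value read from the simulation, not something derived here.

```python
CLOCK_HZ = 1.8e9            # operation frequency used in the simulation
PIPELINE_STAGES = 7         # PBL gates on the longest path
CONVERTER_DELAY_S = 0.34e-9  # data A/D converter delay reported in the text

period_s = 1.0 / CLOCK_HZ
# Each pipeline stage contributes half a clock cycle.
multiplier_latency_s = PIPELINE_STAGES * 0.5 * period_s
total_latency_s = multiplier_latency_s + CONVERTER_DELAY_S
# A 5-bit counter divides the clock by 2**5 = 32.
divider_period_s = (2 ** 5) * period_s

print(f"multiplier latency: {multiplier_latency_s * 1e9:.2f} ns")
print(f"total latency:      {total_latency_s * 1e9:.2f} ns")
print(f"1/32F period:       {divider_period_s * 1e9:.2f} ns")
```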

Table 2.1 summarizes the simulated performance of several charge recovery logics; for each structure, the operation frequency is set to the maximum value at which it can work properly. In order to compare the charge recovery logics fairly, the simulation conditions are the same: the library is a  $0.18 \mu m$  process and the amplitude of the AC power supply is 1.8V. The operation frequency of PBL can reach 1.8GHz and that of EBL reaches 1.3GHz, whereas 2PADCL can only reach 45MHz. From the comparison, the energy dissipation of PBL is only 79% of that of EBL when the operation frequency is 1.3GHz, and PBL requires no DC power supply, as analyzed. Although the delay time of PBL may be larger than that of 2PADCL at low frequency, since its delay time depends on the power supply frequency, in the high frequency range PBL achieves much higher performance than 2PADCL.
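The 79% figure can be checked directly from the 1.3GHz rows of Table 2.1:

$$\frac{E_{PBL}}{E_{EBL}} = \frac{2.65\,\mathrm{pJ}}{3.35\,\mathrm{pJ}} \approx 0.79$$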



Figure 2.9: Microphotograph of test chip

#### 2.4.2 Test Chip Measurement

Fig 2.9 shows the microphotograph of the test chip: the area of the 4-bit multiplier is  $224\mu m \times 74\mu m$ , the area of the frequency divider is  $170\mu m \times 7\mu m$ , and the data A/D converter takes  $96\mu m \times 20\mu m$ . Some parts are covered by dummy metal, so the NMOS pair cannot be clearly observed in the micrograph.

Off-chip inductors with an inductance of 50nH are used, and the capacitance of the pads is about 3pF; thus, the resonance frequency of the clock generator is limited. Meanwhile, the bandwidth of the wires also limits the frequency of the input clock. For these reasons, the operation frequency of the test chip is limited to 161MHz. In the measurement, the DC power supply of blip is 0.65V, and the off-chip inductors are fixed to 50nH. The frequency of the test chip can be changed by adjusting the capacitance in the blip structure; the achieved resonant frequencies are 71MHz, 80MHz, 88MHz, 98MHz, 125MHz and 161MHz. Fig 2.10 shows the measured power dissipation and energy per cycle versus clock frequency.
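As a rough illustration of how capacitance sets the frequency, the sketch below applies the ideal resonance relation $f = 1/(2\pi\sqrt{LC})$ to the measured frequencies. It assumes a single lossless LC tank with L = 50nH, which is a simplification: the blip generator uses two inductors, and the effective inductance and capacitance seen by each clock phase are not given here, so the resulting capacitances are indicative only. The function names are hypothetical.

```python
import math

INDUCTANCE_H = 50e-9  # off-chip inductor value from the text
MEASURED_HZ = [71e6, 80e6, 88e6, 98e6, 125e6, 161e6]


def resonant_frequency(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def implied_capacitance(freq_hz: float, inductance_h: float = INDUCTANCE_H) -> float:
    """Capacitance at which an ideal LC tank resonates at freq_hz."""
    omega = 2.0 * math.pi * freq_hz
    return 1.0 / (omega ** 2 * inductance_h)


if __name__ == "__main__":
    for f in MEASURED_HZ:
        c = implied_capacitance(f)
        print(f"{f / 1e6:6.0f} MHz -> {c * 1e12:5.1f} pF (ideal-tank estimate)")
```

Under this idealization, a lower measured frequency corresponds to a larger total tank capacitance.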



Figure 2.10: Measured power dissipation and energy per cycle vs. frequency

As the analysis in Equation 2.5 predicts, energy dissipation increases as the operation frequency increases. As Fig 2.10 shows, when the operation frequency is higher than 98MHz, the increase of energy dissipation becomes slower. This is because the Q factor of the inductor changes with the operation frequency. The specification of the off-chip inductor shows that when the working frequency increases from 80MHz to 100MHz, the Q value increases from about 9 to 10, while when the working frequency increases from 100MHz to 200MHz, the Q value increases from about 10 to 20. Therefore, in the frequency range of 98MHz to 161MHz, the Q factor is much larger, and the energy efficiency is higher than in the lower frequency range.

Compared with the measurement result, the simulated energy dissipation in Table 2.1 is much lower, even though the simulation runs at a much higher frequency. This is because elements such as the NMOS pair, pads and bonding wires are also included in the measurement. To ensure that the blip oscillator works, the NMOS pair is large, and the parasitic resistance and capacitance of the pads are also considerable; these elements dissipate a large part of the power. Chip performance is summarized in Table 2.3.

| Items                                                      | Value                                                                                             |
|------------------------------------------------------------|---------------------------------------------------------------------------------------------------|
| Maximum operation frequency                                | 161MHz                                                                                            |
| Technology                                                 | $0.18\mu m$ CMOS                                                                                  |
| Total number of transistors (including data A/D converter) | NMOS: 1400; PMOS: 300                                                                             |
| Area of test chip                                          | Multiplier: $1.65 \times 10^4 \mu m^2$; frequency divider: $1190\mu m^2$; data A/D: $1920\mu m^2$ |
| Power supply for oscillator                                | DC 0.65V                                                                                          |
| Power supply for data A/D converter                        | DC 1.8V                                                                                           |
| Total power dissipation                                    | $772\mu W$ @ 161MHz                                                                               |
| Energy dissipation per cycle                               | 4.81pJ @ 161MHz                                                                                   |

Table 2.3: Performance summary of PBL test chip
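The area and energy entries of the summary can be cross-checked with simple arithmetic. The sketch below uses the block dimensions quoted for Fig 2.9 and the relation energy per cycle = power / frequency; the variable names are illustrative. The computed energy agrees with the tabulated 4.81pJ to within about half a percent, which is consistent with rounding of the measured power or frequency.

```python
# Block dimensions in micrometres, as quoted for the chip microphotograph.
BLOCK_DIMENSIONS_UM = {
    "multiplier": (224, 74),
    "frequency divider": (170, 7),
    "data A/D converter": (96, 20),
}

POWER_W = 772e-6        # total measured power dissipation
FREQUENCY_HZ = 161e6    # maximum measured operation frequency

# Area of each block is width times height.
area_um2 = {name: w * h for name, (w, h) in BLOCK_DIMENSIONS_UM.items()}
# Energy per cycle is the average power divided by the clock frequency.
energy_per_cycle_pj = POWER_W / FREQUENCY_HZ * 1e12

for name, area in area_um2.items():
    print(f"{name}: {area} um^2")
print(f"energy per cycle: {energy_per_cycle_pj:.2f} pJ")
```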

## 2.5 Conclusion

In this chapter, a novel charge recovery logic structure called Pulse Boost Logic is presented. Unlike other members of the boost logic family, it requires no DC power supply.

To verify the performance of this structure, a 4-bit pipeline multiplier is designed and implemented in a  $0.18\mu$ m CMOS process. To detect the operation frequency, a 5-bit counter is designed for frequency division, and to convert the AC-format output of PBL to a normal DC logic signal, a data A/D converter is also designed. In simulation, PBL can work at up to 1.8GHz and dissipates less energy than Enhanced Boost Logic, which has a structure similar to PBL.

The design is also fabricated, and the test chip is measured at operation frequencies up to 161MHz; the power dissipation of the design is  $772\mu W$ , including the 4-bit pipeline multiplier, frequency divider and data A/D converter.

## Chapter 3

# Pseudo NMOS Boost Logic and Application on Large Scale Logic Circuits

## 3.1 Introduction

In Chapter 2, the structure of PBL is introduced. However, the PBL structure requires four evaluation blocks in the logic part, which consumes a large number of transistors. In this chapter, a novel charge recovery logic called pseudo-NMOS boost logic (pNBL) [34] is proposed. pNBL has the features of boost logic: the operation can be divided into an evaluation stage and a boost stage, and the operation frequency can reach the gigahertz range. Moreover, the drivability of pNBL is higher than that of other boost logics, and unlike BL, SBL, and EBL, no DC power supply is required.

Low-density parity-check (LDPC) coding is a high-code-rate coding scheme that has been widely researched in recent years [35], and LDPC chips are broadly used for error correction in many wireless communication fields. In wireless communication applications, power dissipation is becoming more and more critical because it directly affects battery life. Reducing the power dissipation of LDPC chips is an important topic in recent research, and the main low-power approaches focus on architecture and algorithm [36]. Both approaches increase circuit complexity, and in some conditions a trade-off between speed and power dissipation has to be considered. An alternative method of lowering the power dissipation of LDPC chips is proposed in this research: it focuses not on architecture or algorithm but on novel low-power circuit techniques.

To verify the effect of the proposed pNBL in reducing the power dissipation of an LDPC chip, a critical sequential circuit module called the Processing Engine (PE), which accounts for a large portion of the power dissipation, is constructed. By demonstrating that the power dissipation of the PE can be reduced by using pNBL, it follows that the total power of the LDPC chip can also be reduced.

This chapter is organized as follows. In Section 3.2, the operation of pseudo-NMOS boost logic is discussed, and a comparison between pNBL and the previous boost logics (i.e., PBL [32] and EBL [25]) is described. The structure of the Processing Engine is discussed in Section 3.3. The simulation results and test chip measurement results are presented in Section 3.4, together with a comparison with conventional static CMOS and other charge recovery logics. In Section 3.5, the performance of the proposed pNBL and PE is summarized and conclusions are given.

## 3.2 Pseudo-NMOS Boost Logic

In this section, structure and operation of pseudo-NMOS and pNBL are first discussed, and then the power dissipation of pNBL is analyzed.

#### 3.2.1 Pseudo-NMOS Circuits

Pseudo-NMOS [3] is a CMOS ratioed logic, as shown in Figure 3.1. The pull-down network (PDN) is the same as in static CMOS gates, while the pull-up network (PUN) is replaced with a single PMOS transistor whose gate is grounded. Since the PMOS is always on, once the PDN is shut off, the output is pulled up to  $V_{dd}$ .

Compared with static CMOS circuits, pseudo-NMOS circuits have the advantage of reduced input capacitance and layout area, because a large number of PMOS transistors in the PUN are eliminated. On the other hand, pseudo-NMOS circuits also have a disadvantage due to their structure: the size ratio of the PMOS has to be adjusted carefully to provide sufficient pull-up ability. When the output is logic '0', both the PMOS and the PDN are on, and the resulting current, which does not flow in static CMOS, causes large static power dissipation. This disadvantage limits the application of pseudo-NMOS circuits.

In charge recovery logic, the current is recycled and reused by the LC tank. Therefore, the disadvantage of pseudo-NMOS can be overcome in charge recovery logic. Based on this idea, a novel charge recovery logic called pseudo-NMOS boost logic is proposed.



Figure 3.1: Structure of pseudo-NMOS [3]



Figure 3.2: Proposed Structure of pNBL

#### 3.2.2 Pseudo-NMOS Boost Logic

Pseudo-NMOS boost logic (pNBL), illustrated in Figure 3.2, is an enhancement of PBL. In pNBL, the PUN in each rail is replaced with a single PMOS transistor, and the gates of these PMOS transistors are connected to clk. Compared with PBL, pNBL uses fewer transistors in the evaluation block, and as a result, the input capacitance of pNBL is smaller than that of PBL. Since the input capacitance is the load capacitance of other pNBL gates, the power dissipation caused by load capacitance is lower than in PBL circuits.

As in other members of the boost logic family, the operation of pNBL is divided into an evaluation stage and a boost stage.

In the evaluation stage, clk is in its low half cycle (lower voltage) while  $\overline{\text{clk}}$  is in its high half cycle (higher voltage). When clk is low, PMOS M7 and M8 are on and act as the pull-up network, and the complementary inputs turn one PDN on and the other off. This operation is the same as in pseudo-NMOS circuits. Because PMOS transistors are used to pull up the logic '1' value, there is no threshold voltage loss in the evaluation stage, in contrast to PBL. During this stage, since  $\overline{\text{clk}}$  is high, pass gates M5 and M6 are both on, and the voltage values calculated by the evaluation block are transferred through the pass gates to the boost block. In the evaluation stage, the voltage generated by the boost block can barely affect these values.

In the boost stage, clk is in its high half cycle while  $\overline{\text{clk}}$  is in its low half cycle. Since  $\overline{\text{clk}}$  is low, pass gates M5 and M6 are both off, so the evaluation block cannot affect the boost block. The voltage values generated in the evaluation stage are then latched in the boost block. The voltages on the two sides are different; take the  $V_{out} > \overline{V_{out}}$  case as an example: M1 and M4 are off while M2 and M3 are on, so  $V_{out}$  ramps up by following clk and  $\overline{V_{out}}$  drops by following  $\overline{\text{clk}}$ . As a result, the output voltage is boosted to the full amplitude of the clock voltage ( $V_{dd}$  for logic '1' and 0 for logic '0').

Figure 3.3a illustrates the buffer chain of pNBL, and Figure 3.3b shows the output waveforms of pNBL gates buf1 and buf2. When buf1 is in the boost stage, out0 and  $\overline{\text{out0}}$  are at their high level and are output to buf2, which is in the evaluation stage. As shown in the simulation waveform, the signal is delayed by half a cycle at each gate. For a conventional static CMOS flip-flop in synchronous sequential circuits, the delay per stage is at least one cycle, so the delay of pNBL is reduced by half. This feature can help synchronous sequential circuits reduce delay time.

Compared with PBL, pNBL improves in two ways. First, the complex pull-up networks are eliminated, so more complex functions can be realized with the same number of transistors. Second, the crowbar current in the evaluation block is reduced, so the power dissipation of pNBL is smaller than that of PBL.

#### 3.2.3 Energy Dissipation

The power dissipation analysis method of pNBL is similar to that applied to PBL in Ref [32]. To apply this analysis, some assumptions are required as preconditions. For the AC power supply analysis, signals are simplified to sinusoidal format, and the transistors are all assumed to work in the linear region. With the simplification, energy



Figure 3.3: pNBL buffer chain

dissipation analysis using average current can be applied instead of integrating the instantaneous values of current and voltage. With average current, energy dissipation can be expressed as  $E = I_A^2 RT$ , where  $I_A$  is the average current, R is the equivalent resistance, and T is the period. The energy analysis is divided into two parts: energy dissipated in the evaluation block and in the boost block.

In the evaluation block of pNBL, the signals output to the boost block are different for logic '1' and '0'. In the logic '1' case, when clk is lower than  $V_{th}$  and  $\overline{\text{clk}}$  is higher than  $V_{dd} - V_{th}$ , M7 (or M8) is on and the voltage follows  $\overline{\text{clk}}$ ; when clk is higher than  $V_{th}$  and  $\overline{\text{clk}}$  is lower than  $V_{dd} - V_{th}$ , M7 (or M8) turns off, and the voltage is kept at  $V_{dd} - V_{th}$ . The signal therefore swings between  $V_{dd}$  and  $V_{dd} - V_{th}$  at the same frequency as  $\overline{\text{clk}}$ . In the logic '0' case, the PDN is on, so when clk falls to 0 the voltage follows clk, but when clk climbs higher than  $V_{th}$  and  $\overline{\text{clk}}$  falls lower than  $V_{dd} - V_{th}$ , the PDN turns off and the voltage is kept at  $V_{th}$ . The signal therefore swings between 0 and  $V_{th}$  at the same frequency as clk. Assuming the operation frequency is f, the angular frequency is  $\omega = 2\pi f$ . The load capacitance of the evaluation block is the parasitic capacitance  $C_{pass}$  of pass gates M5 and M6. Thus, the current amplitude for both logic '0' and '1' conditions is  $\omega C_{pass}V_{th}$ . Due to the dual-rail structure of pNBL, the probabilities of logic '0' and '1' are the same. The energy dissipation of the evaluation block in one cycle is expressed as:

$$E_{eval} = \frac{1}{2} \left(\frac{1}{\sqrt{2}} I_0\right)^2 R_e T + \frac{1}{2} \left(\frac{1}{\sqrt{2}} I_1\right)^2 R_e T \approx \frac{2\pi^2 R_e C_{pass}}{T} C_{pass} V_{th}^2 \tag{3.1}$$

where  $R_e$  is equivalent resistance of transistors in the evaluation block, and T is time of one cycle.

In the boost block of pNBL, the analysis is also divided into the logic '0' case and the logic '1' case. In the logic '0' case, because the output follows clk in the evaluation stage and  $\overline{\text{clk}}$ in the boost stage, the output voltage swings between 0 and  $V_{th}$  in both half cycles, and therefore its frequency is twice that of the operation clock. In the logic '1' case, the output signal swings between  $V_{dd}$  and  $V_{dd} - V_{th}$  at the same frequency as the operation clock. Assuming the load capacitance of the pNBL gate is  $C_L$ , the current amplitudes are  $2\omega C_L V_{th}$ and  $\omega C_L V_{th}$  for the logic '0' and '1' cases, respectively. According to the analysis above, the energy dissipation of the boost block is expressed as:

$$E_{boost} \approx \frac{1}{2} \left(\frac{1}{\sqrt{2}} \omega C_L V_{th}\right)^2 R_b T + \frac{1}{2} \left(\sqrt{2} \omega C_L V_{th}\right)^2 R_b T = \frac{5\pi^2 R_b C_L}{T} C_L V_{th}^2 \tag{3.2}$$

where  $R_b$  is equivalent resistance of the boost block. As a result, the energy dissipation of pNBL is

$$E_{pNBL} = E_{eval} + E_{boost} + E_{crowbar} \approx \frac{2\pi^2 R_e C_{pass}}{T} C_{pass} V_{th}^2 + \frac{5\pi^2 R_b C_L}{T} C_L V_{th}^2 + E_{crowbar} \tag{3.3}$$

In the pNBL structure, the equivalent resistances  $R_b$  and  $R_e$  are comparable. The load capacitance  $C_L$  is tens of femtofarads at gigahertz-level operation frequency, while the pass gate capacitance  $C_{pass}$  is only several femtofarads. Under these preconditions, the first term in Equation 3.3 is much smaller than the second term, so Equation 3.3 simplifies to

$$E_{pNBL} \approx \frac{5\pi^2 R_b C_L}{T} C_L V_{th}^2 + E_{crowbar} \tag{3.4}$$
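The closed forms in Equations 3.1 and 3.2 can be checked numerically against the RMS expressions they are derived from. The sketch below is a minimal check, not part of the original analysis: the function names are hypothetical, and the component values in the demo ($R_e = R_b$ = 1kΩ, $C_{pass}$ = 2fF, $C_L$ = 20fF, $V_{th}$ = 0.5V, T = 1ns) are illustrative assumptions chosen only to match the orders of magnitude mentioned in the text.

```python
import math


def sinusoid_loss(amplitude_a: float, resistance_ohm: float, period_s: float) -> float:
    """Energy lost in one period by a sinusoidal current (RMS = amplitude / sqrt(2))."""
    return (amplitude_a / math.sqrt(2.0)) ** 2 * resistance_ohm * period_s


def e_eval(r_e: float, c_pass: float, v_th: float, period_s: float) -> float:
    """Equation 3.1 before simplification; logic '0' and '1' are equally likely."""
    omega = 2.0 * math.pi / period_s
    i0 = i1 = omega * c_pass * v_th
    return 0.5 * sinusoid_loss(i0, r_e, period_s) + 0.5 * sinusoid_loss(i1, r_e, period_s)


def e_boost(r_b: float, c_l: float, v_th: float, period_s: float) -> float:
    """Equation 3.2 before simplification; the logic '0' current has twice the amplitude."""
    omega = 2.0 * math.pi / period_s
    i1 = omega * c_l * v_th
    i0 = 2.0 * omega * c_l * v_th
    return 0.5 * sinusoid_loss(i1, r_b, period_s) + 0.5 * sinusoid_loss(i0, r_b, period_s)


def e_eval_closed(r_e: float, c_pass: float, v_th: float, period_s: float) -> float:
    return 2.0 * math.pi ** 2 * r_e * c_pass ** 2 * v_th ** 2 / period_s


def e_boost_closed(r_b: float, c_l: float, v_th: float, period_s: float) -> float:
    return 5.0 * math.pi ** 2 * r_b * c_l ** 2 * v_th ** 2 / period_s


if __name__ == "__main__":
    # Illustrative values only (assumptions, not taken from the design).
    r, c_pass, c_l, v_th, t = 1e3, 2e-15, 20e-15, 0.5, 1e-9
    ratio = e_eval(r, c_pass, v_th, t) / e_boost(r, c_l, v_th, t)
    print(f"E_eval / E_boost = {ratio:.4f}")
```

With equal resistances the ratio reduces to $\frac{2}{5}(C_{pass}/C_L)^2$, so a tenfold difference in capacitance makes the evaluation term a fraction of a percent of the boost term.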

Equation 3.4 shows that the contribution of the evaluation block is small enough to be negligible, and the main power dissipation occurs in the boost block. The crowbar current occurs when the evaluated value is logic '0'; it flows from  $\overline{\text{clk}}$  to clk. However, in the pseudo-NMOS structure, the pull-up PMOS M7 and M8 have a very large on-resistance to make sure the PDN has larger drivability than the pull-up PMOS. As a result, the crowbar current is very small. Furthermore, the input signals rise and fall with the power clocks clk and  $\overline{\text{clk}}$ ; therefore, even if the input waveform has relatively long rise and fall times, the voltage difference between the input signals and the power clock is approximately constant during the evaluation stage, and this limits the crowbar current. Compared with the energy dissipation of PBL in Equation 3.5, the energy dissipation of pNBL is clearly smaller.

$$E_{PBL} \approx \frac{7\pi^2 R_b C_L}{T} C_L V_{th}^2 + E_{crowbar} \tag{3.5}$$

$$E_{EBL} = 0.045\alpha C_L V_{DD}^2 + \frac{0.45\pi^2 R C_L}{T} C_L V_{DD}^2 + E_{crowbar} \tag{3.6}$$

Equation 3.6 shows the energy dissipation of EBL [25]. The energy dissipation is also divided into three parts: the evaluation block, the boost block and the crowbar current. Since the structures of the boost blocks are the same in pNBL and EBL,  $E_{boost}$  is the same. The energy comparison is therefore made for the evaluation block and for the energy dissipated by crowbar current. For the evaluation block, the evaluation blocks of pNBL only drive pass gates M5 and M6, so the load capacitance  $C_{pass}$  (several femtofarads) is smaller than the load capacitance of the whole pNBL gate  $C_L$  (tens of femtofarads). According to Ref [25],  $V_{th}$  is about 0.3 $V_{DD}$ . Assuming an activity factor  $\alpha$  of 1/2, even when the circuit works at gigahertz frequency, the energy dissipated in pNBL's evaluation block is calculated to be only 1/1000 of that in EBL's evaluation block. For the crowbar current, because the power supply for the evaluation block of EBL is DC and the rising and falling edges of the input waveform are relatively slow, the crowbar current of EBL is large. In pNBL, because the input signals rise and fall with the power clock, the crowbar current caused by slow rising and falling edges is not a problem. As a result, the energy dissipated due to crowbar current in pNBL is also smaller than that in EBL.
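The statement that $E_{boost}$ is the same in both logics can be verified from the equations: with $V_{th} = 0.3V_{DD}$, the boost term of Equation 3.4 becomes $5\pi^2 \cdot 0.09 = 0.45\pi^2$ times $R C_L^2 V_{DD}^2 / T$, which is the boost term of Equation 3.6. The sketch below checks this and expresses the evaluation-block ratio as a function of device values. The function names are hypothetical, and the ratio depends on $R_e$, $C_{pass}$ and $C_L$, whose exact values are not stated in this section, so no specific ratio is claimed here.

```python
import math

VTH_OVER_VDD = 0.3  # V_th / V_DD, as quoted from Ref [25] in the text
ALPHA = 0.5         # activity factor assumed in the text


def boost_coefficients() -> tuple[float, float]:
    """Coefficients of R * C_L**2 * V_DD**2 / T in the pNBL and EBL boost terms."""
    pnbl = 5.0 * math.pi ** 2 * VTH_OVER_VDD ** 2
    ebl = 0.45 * math.pi ** 2
    return pnbl, ebl


def eval_energy_ratio(r_e: float, c_pass: float, c_l: float, freq_hz: float) -> float:
    """pNBL evaluation energy (Eq. 3.1) divided by the first term of Eq. 3.6.

    V_DD**2 appears in both terms and cancels, so it is omitted.
    """
    pnbl = 2.0 * math.pi ** 2 * r_e * c_pass ** 2 * VTH_OVER_VDD ** 2 * freq_hz
    ebl = 0.045 * ALPHA * c_l
    return pnbl / ebl


if __name__ == "__main__":
    pnbl_coeff, ebl_coeff = boost_coefficients()
    print(math.isclose(pnbl_coeff, ebl_coeff))
```

The ratio scales as $R_e C_{pass}^2 f / C_L$, which shows why a small pass-gate capacitance makes the pNBL evaluation energy small relative to EBL.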



Figure 3.4: Block diagram of Processing Engine

## 3.3 Design of Processing Engine

#### 3.3.1 Introduction on PE

The Processing Engine (PE) is a central processing unit in the architecture of an LDPC decoding circuit. An LDPC decoder requires a parity check matrix (PCM) and employs a decoding algorithm called the message passing algorithm (MPA). In MPA, the log likelihood ratio (LLR) is used to combine messages. PEs are used to perform the LLR calculation.

Figure 3.4 shows a simplified block diagram of the PE. The task of the PE is to find the minimum value of the input LLR information, subtract the offset factor, and use this offset value to update the check message. The inputs of the PE are two 5-bit signed numbers (1 sign bit and a 4-bit number), and after the MPA is performed, the outputs of the PE are also two 5-bit signed numbers. The PE module includes circuits such as adders, a FIFO, multiplexers and comparators, which makes it representative of digital circuits.
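The minimum-and-offset step described above can be sketched in a few lines. This is an illustrative model only: the sign-magnitude encoding (bit 4 as sign, bits 3 to 0 as magnitude), the offset value of 1 and the clamping at zero are assumptions, and the sketch does not reproduce the PE's two outputs or its sign handling, which are not specified here. The function names are hypothetical.

```python
def decode_sign_magnitude(word: int) -> tuple[int, int]:
    """Split a 5-bit word into (sign bit, 4-bit magnitude). Encoding is assumed."""
    if not 0 <= word < 32:
        raise ValueError("expected a 5-bit value")
    return (word >> 4) & 1, word & 0xF


def offset_minimum(word_a: int, word_b: int, offset: int = 1) -> int:
    """Minimum of the two input magnitudes minus an offset, clamped at zero."""
    _, magnitude_a = decode_sign_magnitude(word_a)
    _, magnitude_b = decode_sign_magnitude(word_b)
    return max(min(magnitude_a, magnitude_b) - offset, 0)


if __name__ == "__main__":
    # Magnitudes 5 and 3: the minimum is 3, and subtracting the offset gives 2.
    print(offset_minimum(0b00101, 0b10011))
```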

#### 3.3.2 Design PE with pNBL

The design of the PE follows a top-down design methodology. First, the overall structure and specification of the system are fixed; then the system is broken into small modules, and the circuits required in each module are identified. After all circuits are designed to meet the requirements, they are assembled into modules. The modules are simulated to verify that the combination of circuits meets the specification. In the final step, all modules are combined at the system level, and system-level simulation is performed to verify the design.

The PE illustrated in Figure 3.4 requires adder, comparator and absolute value calculator modules. To reduce the latency of the circuits, the complexity of the pNBL gates should be increased so that one pNBL gate realizes as much functionality as possible. The schematic of a 5-bit pNBL comparator is shown in Figure 3.5, which demonstrates the capability of pNBL for high-complexity functions. In the comparator, when input  $a[4:0] \ge b[4:0]$ , the output is logic '1', and otherwise the output is logic '0'. To realize this function, the stack height of the evaluation block is set to 5, and the simulation result indicates that the circuit can work at an operation frequency as high as 2.2GHz with a power clock amplitude of 1.2V.
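The logical function of the comparator can be written as a bit-serial comparison from the most significant bit down. The sketch below is a behavioural model only, not the transistor-level schematic of Figure 3.5; it treats both inputs as unsigned 5-bit values, which is an assumption, and the function name is hypothetical.

```python
def greater_or_equal_5bit(a: int, b: int) -> int:
    """Return 1 if a[4:0] >= b[4:0], else 0, comparing from the MSB down."""
    for bit in range(4, -1, -1):
        a_bit = (a >> bit) & 1
        b_bit = (b >> bit) & 1
        if a_bit != b_bit:
            # The first differing bit decides the comparison.
            return a_bit
    return 1  # all bits equal, so a == b


if __name__ == "__main__":
    print(greater_or_equal_5bit(0b10110, 0b10101))
```

Examining five bit positions in sequence mirrors the need for five levels of decision in a single gate.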

In order to combine these circuits into the system, the layouts of all circuits are designed with the same power rail, which greatly reduces the complexity of layout design. To apply the proposed pNBL gate with standard CMOS technology, the bulk terminals of all NMOS transistors and PMOS transistors are connected to ground and 1.2V, respectively.

Interface circuits between conventional static CMOS and the proposed pNBL are shown in Figure 3.6. With these interfaces, pNBL circuits are compatible with conventional static CMOS. The interface from static CMOS to pNBL is a pNBL buffer, with which input DC signals are converted to sine-wave-like signals, and the setup/hold time of the generated signals also satisfies the requirement of pNBL. The interface from pNBL to static CMOS is the same as that described in Ref [32], which can be



Figure 3.5: 5-bit comparator with pNBL

recognized as a 1-bit A/D converter. This data A/D converter samples data when the output is at its peak value, and therefore the delay it causes is small compared with the cycle time of the power clock.

To generate the power clock, an LC tank is used to recycle energy from the circuits, because an inductor can store energy in the form of magnetic energy, and this energy is reused in the next cycle in the form of electrical power. Since a 2-phase non-overlapping power clock is necessary to drive pNBL, a circuit called blip [2], shown in Figure 3.7, is designed. The W/L ratio of the NMOS pair is set to  $200\mu m/1\mu m$ , and the R in the figure is the DC resistance (DCR) of the on-chip inductor. The power supply is marked as  $V_{CC}$  in Figure 3.7, while in other figures the DC power supply for circuits is marked as  $V_{dd}$ , because the voltage values are different.

The power supply to the clock generator,  $V_{CC}$ , is 0.8V DC, and the amplitude of the generated power clock is  $2 \times 0.8 = 1.6V$ , which is determined by the W/L ratio of the NMOS pair and



Figure 3.6: Interface circuits between pNBL and static CMOS

current of inductor. To verify that the proposed system can work at a high operation frequency with high energy efficiency, a center-tapped on-chip inductor model is used in the simulation. By using an on-chip inductor, the whole system can be integrated on a chip. Although the blip power clock generator has advantages in integration and low power [2], its operating principle also brings disadvantages. The Q factor of the on-chip inductor is low, and therefore the power efficiency is limited. The influence of mutual inductance is another drawback. In a center-tapped inductor, mutual inductance exists and degrades the energy efficiency; in this design, however, the inductor is fully symmetric, so the influence of mutual inductance is the same for both ports of the inductor, and as a result the mutual inductance does not affect the operation of the clock generator much. Moreover, although the power dissipation of the clock generator is lower than that of conventional clock tree circuits, the area penalty of the clock generator is large because of the on-chip inductor.

In a large-scale circuit, the clock distribution network has to be designed carefully, since clock skew and jitter affect the circuit performance. In conventional CMOS circuits, a buffer tree is used as the distribution network, but this method cannot be adopted in pNBL circuits, because when a power clock signal passes through a buffer powered by a DC supply, the energy can no longer be recycled and reused. The problem of clock jitter and skew has been studied in the literature [37, 27]. In this work, scale



Figure 3.7: Blip power clock generator



Figure 3.8: Block diagram of PE system

of the proposed PE circuit is small enough to neglect the effect of clock skew and jitter.

The block diagram of the whole system is shown in Figure 3.8; it includes the power clock generator, the PE with pNBL, and the interface between pNBL and static CMOS.

## 3.4 Test Chip of Proposed PE

#### 3.4.1 Simulation and Evaluation

To demonstrate the improvement in power dissipation, a conventional static CMOS PE with the same architecture is designed using the same  $0.18\mu m$  process technology as the pNBL PE design.

Since pNBL is compatible with static CMOS, the functional verification method is the same as that of conventional PE circuits: a data pattern is input and the output data are verified. The simulation shows that the proposed pNBL PE works correctly at a frequency of 1.5GHz.

Figure 3.9 shows the energy dissipation per cycle at various operation frequencies. Two ways are used to obtain different operation frequencies: changing the on-chip inductor model and changing the capacitance of the *LC* tank. The two curves shown in Figure 3.9 are obtained by varying the on-chip inductor and the capacitor, respectively. For the inductor curve, the capacitor in the *LC* tank is fixed to 20pF, and six inductor models are used: 1nH, 1.5nH, 2nH, 3nH, 5nH and 7nH; larger values are not used because the inductor would be too large for on-chip integration. For the capacitor curve, the inductor is fixed to 3nH, and the capacitor values are 6pF, 10pF, 14pF, 20pF, 35pF and 40pF; as a result, the operation frequencies obtained by changing inductance and by changing capacitance are almost the same. Both curves show that the energy dissipation increases as the operation frequency increases, as indicated by Equation 3.4. In the energy dissipation simulation, the operation frequencies range from 403MHz to 1.1GHz.
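The two sweeps can be compared with the ideal resonance relation $f = 1/(2\pi\sqrt{LC})$. The sketch below is an approximation under stated assumptions: it treats the clock generator as a single lossless LC tank and ignores parasitic capacitance and the details of the center-tapped blip topology, so its frequencies only approximate the simulated 403MHz to 1.1GHz range. The names are illustrative.

```python
import math

INDUCTOR_SWEEP_H = [1e-9, 1.5e-9, 2e-9, 3e-9, 5e-9, 7e-9]        # C fixed at 20 pF
CAPACITOR_SWEEP_F = [6e-12, 10e-12, 14e-12, 20e-12, 35e-12, 40e-12]  # L fixed at 3 nH
FIXED_CAPACITANCE_F = 20e-12
FIXED_INDUCTANCE_H = 3e-9


def resonant_frequency_hz(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


inductor_curve_hz = [resonant_frequency_hz(l, FIXED_CAPACITANCE_F) for l in INDUCTOR_SWEEP_H]
capacitor_curve_hz = [resonant_frequency_hz(FIXED_INDUCTANCE_H, c) for c in CAPACITOR_SWEEP_F]

if __name__ == "__main__":
    for l, f in zip(INDUCTOR_SWEEP_H, inductor_curve_hz):
        print(f"L = {l * 1e9:3.1f} nH, C = 20 pF -> {f / 1e6:7.1f} MHz")
    for c, f in zip(CAPACITOR_SWEEP_F, capacitor_curve_hz):
        print(f"L = 3 nH, C = {c * 1e12:4.1f} pF -> {f / 1e6:7.1f} MHz")
```

Both sweeps share the point L = 3nH, C = 20pF, and in this idealization they cover similar frequency ranges.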



Figure 3.9: Energy dissipation of pNBL PE in simulation

As shown in Figure 3.9, at higher frequencies the energy dissipation obtained with a smaller inductance is smaller than that obtained with a smaller capacitance, while at lower frequencies the energy dissipation with a larger inductance is larger than that with a larger capacitance. This phenomenon arises because the energy storage abilities of the on-chip inductor and capacitor differ. The on-chip inductor is built from the top two metal layers, which are thin, and its diameter is limited, so its Q value is not large (about 10 according to the HFSS simulation), while the Q value of the on-chip capacitor is much larger; more energy is therefore dissipated in the on-chip inductor than in the on-chip capacitor. Consequently, for the same frequency generated by the power clock generator, higher energy efficiency is achieved with a larger capacitor than with a larger inductor. In addition, because the Q values are limited, especially for small inductors and capacitors, energy efficiency is low when the clock generator resonates at high frequency.

In this simulation, an on-chip inductor is used, so the inductance of the blip generator can hardly be changed; therefore, the capacitance of the *LC* tank is changed to achieve various operation frequencies. The on-chip inductor is built with the top two metal layers, which are thin, and its diameter is limited, so the *Q* value of the on-chip inductor is not large (about 10 according to the HFSS simulation). Based on the HFSS simulation result, an on-chip inductor model with an inductance of 3nH was designed. The capacitor values are set to 6pF, 10pF, 14pF, 20pF, 35pF, and 40pF, and the achieved operation frequencies range from 403MHz to 1.1GHz. The simulation result shows that the energy dissipation increases as the operation frequency increases, as indicated by Equation 3.4.

To compare the energy dissipation with that of a conventional static CMOS circuit, a PE with the same architecture is designed in conventional static CMOS. A post-layout simulation is performed to verify how the energy dissipation changes with operation frequency. Figure 3.10 shows the comparison of energy dissipation between the PE with static CMOS and the PE with pNBL gates. As explained previously, the energy dissipation of the pNBL gate increases linearly with the operation frequency. In the static CMOS case, the energy dissipation per cycle is expressed as

$$E_{CMOS} = \frac{1}{2} C_L V_{DD}^2 \tag{3.7}$$

where $C_L$ is the load capacitance and $V_{DD}$ is the power supply voltage. To reduce the energy dissipation, a lower $V_{DD}$ is used in the low operation frequency range, and a higher $V_{DD}$ is required in the high operation frequency range to make sure the circuit works correctly. In Figure 3.10, when the operation frequency is higher than 900MHz, $V_{DD} = 1.8V$ is required, and the energy dissipation hardly changes with the operation frequency.



Figure 3.10: Energy dissipation comparison between static CMOS and pNBL

| Items                                                 | This work                 | EBL [25]                           | PBL [32]                  |
|-------------------------------------------------------|---------------------------|------------------------------------|---------------------------|
| Technology                                            | $0.18\mu m$               | $0.18\mu m$                        | $0.18\mu m$               |
| Power supply                                          | 1.2V sine wave, 2-phase   | 1.2V sine wave, 2-phase; 0.4V DC   | 1.2V sine wave, 2-phase   |
| Simulation frequency range (400MHz to max. frequency) | 400MHz to 1.5GHz          | 400MHz to 1.2GHz                   | 400MHz to 1.5GHz          |
| Energy dissipation @400MHz                            | 1.0pJ/cycle               | 1.4pJ/cycle                        | 1.2pJ/cycle               |

Table 3.1: Performance comparison

When the operation frequency is in the low range (403MHz), the energy dissipation of the pNBL PE is 36% of that of the static CMOS PE. When the operation frequency is higher than 1.1GHz, the energy dissipation of the static CMOS PE and that of the pNBL PE are at the same level. To improve the energy efficiency in the high frequency range, the clock generator should be further improved by using an on-chip inductor with a higher Q.

For LDPC applications, the operation frequency is in the range of several hundred megahertz, so according to Figure 3.10, the pNBL PE achieves lower energy dissipation than the static CMOS PE in this range.

Table 3.1 compares the performance of pNBL with that of EBL and PBL. PEs with the same structure are designed with EBL and PBL for comparison. According to the comparison result, the proposed pNBL achieves lower energy dissipation than EBL and PBL.

#### 3.4.2 Test Chip Measurement

The proposed PE module with pNBL was fabricated in a $0.18\mu m$ CMOS process technology. Figure 3.11 shows the microphotograph of the test chip, including the power clock generator and the PE module. The area of the Processing Engine was $230 \times 161\mu m^2$, and the whole system including the power clock generator occupied $558 \times 240\mu m^2$. The center-tapped on-chip inductor used in the power clock generator was 3nH. Off-chip capacitors of 40pF, 37pF, 35pF, 31pF, 27pF, 25pF, and 20pF were used to adjust the resonance frequency of the power clock generator, and the achieved operation frequencies were 404MHz, 420MHz, 440MHz, 476MHz, 507MHz, 549MHz, and 609MHz, respectively.



Figure 3.11: Microphotograph of test chip



Figure 3.12: Energy dissipation comparison between measurement and simulation

First, the function of the Processing Engine was tested; the result showed that the test chip worked correctly at all achieved frequencies. Then the energy dissipation was measured; the test chip dissipated 1.1pJ/cycle at the low end of the frequency range (404MHz) and 2.1pJ/cycle at the high end (609MHz).

To compare with the measurement result, a simulation in the range between 400MHz and 600MHz was also run. Figure 3.12 shows the measurement result and the simulation result as the operation frequency varies. The trend of the measured result is the same as that of the simulation result, but the measured energy dissipation is slightly larger than the simulated one. This is because elements present in the measurement, such as wire resistance, are not included in the simulation. From the measurement result, the load capacitance $C_L$ and the equivalent resistance $R_e$ can be derived. By estimating $C_L$ to be about 1pF, $R_e$ is calculated to be about $3k\Omega$; both are reasonable values. Table 3.2 summarizes the performance of the test chip.

| Items                                                         | Value                                 |
|---------------------------------------------------------------|---------------------------------------|
| Operation frequency range                                     | 404MHz to 609MHz                      |
| Technology                                                    | $0.18\mu m$ CMOS technology           |
| Total number of transistors (including static CMOS interface) | NMOS: 1083, PMOS: 563                 |
| Power supply for oscillator                                   | DC 0.8V                               |
| Power supply for data A/D converter                           | DC 1.2V                               |
| Energy dissipation per cycle in simulation                    | 1pJ@403MHz, 3.5pJ@1.1GHz              |
| Power dissipation in simulation                               | $403\mu W$@403MHz, 3.85mW@1.1GHz      |
| Energy dissipation per cycle in measurement                   | 1.1pJ@404MHz, 2.1pJ@609MHz            |
| Power dissipation in measurement                              | $444\mu W$@404MHz, 1.28mW@609MHz      |
| Test chip area                                                | $558 \times 240\mu m^2$               |

Table 3.2: Performance summary of pNBL gate PE
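The power figures in Table 3.2 follow from the energy per cycle multiplied by the operation frequency. The short sketch below is only a consistency check of the tabulated values; the function and variable names are illustrative and are not part of the original design flow.

```python
# Cross-check of Table 3.2: average power = energy per cycle x operation frequency.
# The (energy, frequency) pairs are copied from the table; names are illustrative.

def average_power_w(energy_per_cycle_j: float, frequency_hz: float) -> float:
    """Average power in watts for a circuit dissipating a fixed energy per clock cycle."""
    return energy_per_cycle_j * frequency_hz

operating_points = {
    "simulation, 403 MHz": (1.0e-12, 403e6),
    "simulation, 1.1 GHz": (3.5e-12, 1.1e9),
    "measurement, 404 MHz": (1.1e-12, 404e6),
    "measurement, 609 MHz": (2.1e-12, 609e6),
}

powers_w = {
    name: average_power_w(energy, freq)
    for name, (energy, freq) in operating_points.items()
}

for name, power in powers_w.items():
    print(f"{name}: {power * 1e3:.3f} mW")
```

Each computed power agrees with the corresponding table entry to within rounding.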

# 3.5 Conclusion

In this chapter, a charge recovery logic structure called pseudo-NMOS boost logic (pNBL) is proposed. Compared with other charge recovery logic, the proposed pNBL has the advantages of high speed and low power dissipation with fewer transistors.

A Processing Engine used in an LDPC decoding system is designed with this charge recovery logic and implemented in a standard $0.18\mu m$ CMOS process technology. The simulation results show that the proposed PE works correctly at an operation frequency of 1.5GHz, and that when the operation frequency is lower than 1.1GHz, the PE with pNBL gates dissipates less energy than the PE with conventional static CMOS gates. In the range of several hundred megahertz, where LDPC applications usually operate, the energy dissipation of the PE with pNBL gates is greatly reduced. The proposed PE dissipates 3.5pJ per cycle at 1.1GHz and 1pJ per cycle at 403MHz in simulation. The latter is only 36% of that of the PE with static CMOS gates. Compared with other charge recovery logic, pNBL also dissipates less energy.

The test chip was fabricated and measured. The result showed that the test chip works at frequencies up to 609MHz with an energy dissipation of 2.1pJ/cycle, including the PE module and the blip power clock generator.

# Chapter 4

# Other Applications of Charge Recovery Logic

In Chapters 2 and 3, both systems use a blip LC tank as the power clock generator. This clock generator structure has the advantage of high efficiency and can provide a high operation frequency. Meanwhile, charge recovery logic can use other methods as its power supply, so that its application field can be broader. In this chapter, two other applications of charge recovery logic and their power supply methods are discussed.

# 4.1 Wireless Power Transmission

#### 4.1.1 Introduction

Recently, research on System-in-a-Package (SiP) has become an attractive topic. In SiP, multiple chips are integrated in one package; to connect the chips, methods such as wire bonding are used. However, these methods consume a large area. To improve the chip area utilization rate, wireless methods for transmitting data and power are being researched [38, 39, 40]. With these methods, the area consumption is reduced to 1/20 of that of the conventional wire connection method [41]. Figure 4.1 illustrates the concepts of wire bonding and inductive coupling transmission.

For conventional circuits, the power supply has to be in DC format; however, the power transferred by inductive coupling is in AC voltage format. Therefore, a rectifier is required to convert AC power to DC power. Figure 4.2 shows the circuit diagram of the conventional power transmission system, in which the rectifier uses a diode structure. The diode structure dissipates power when it converts the input AC voltage to DC voltage. Although ways to reduce this power dissipation have been studied [39, 42], the rectifier still dissipates a large proportion of the power.

Since the power transferred by inductive coupling is in AC format and can be controlled to have a single peak frequency, the received power is in a sinusoidal format. Pulse boost logic (PBL) [32] is a charge recovery logic driven by sinusoidal voltage, and the power dissipation of PBL is lower than that of conventional CMOS circuits. With these features, PBL is suitable for the wireless power transmission system: because it is driven by sinusoidal voltage, the rectifier is no longer required, and because its power dissipation is low, the energy efficiency can be higher.

Figure 4.1: Concepts of chip interconnection

Figure 4.2: Structure of conventional power transmission system

This section proposes a wireless power transmission system for inter-chip power transmission. In real chip stacking, chips should be stacked face-to-back; in the experiment, however, face-to-face stacking is used to verify the relationship between distance and delivered power. A test chip with the proposed structure is fabricated and measured. The measurement result shows that 22mW of power is transmitted through the coupled inductors, and that the AC driven circuits are driven by the transmitted power with no rectifier required.

#### 4.1.2 Wireless Power Transmission System

#### **Power Transmission Components**

To achieve higher power transmission efficiency, the on-chip inductor is the key component. The higher the Q factor of the inductor is, the higher the power efficiency that can be achieved.



Figure 4.3: Diagram of symmetric spiral inductor

However, the Q factor of an on-chip inductor is limited because the metal thickness and the size of the spiral are small.

In order to raise the Q factor, the structure of the on-chip inductor is designed as follows. First, two metal layers, the top layer and the 4th layer, are used to increase the thickness of the inductor's metal. Second, a symmetric structure is used so that the parasitic capacitance is reduced. The designed on-chip inductor is simulated with HFSS, and the simulation result indicates that the Q factor reaches as high as 10, which is a considerable value for an on-chip inductor. Figure 4.3 illustrates the symmetric spiral on-chip inductor. Since the inductive coupling method is applied in this system, the coupling factor k also affects the power efficiency. Equation 4.1 gives the relationship between the coupling factor and the inductances.

$$k = \frac{M}{\sqrt{L_1 L_2}} \tag{4.1}$$

where M is the mutual inductance of  $L_1$  and  $L_2$ . Once the coupled inductors are fixed, k is only related to the distance between the two coupled inductors: the larger distance is, the smaller k is.
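As a small numerical illustration of Equation 4.1, the sketch below evaluates k for a few mutual inductance values. All numbers are hypothetical and chosen only for illustration; they are not the parameters of the fabricated inductors, and the function name is likewise an assumption.

```python
import math

def coupling_factor(mutual_h: float, l1_h: float, l2_h: float) -> float:
    """Equation 4.1: k = M / sqrt(L1 * L2)."""
    if l1_h <= 0 or l2_h <= 0:
        raise ValueError("inductances must be positive")
    return mutual_h / math.sqrt(l1_h * l2_h)

# Hypothetical values (not taken from the test chip).
L_T = 10e-9  # transmitter inductance in henries
L_R = 10e-9  # receiver inductance in henries

# The mutual inductance M shrinks as the two chips move apart, and so does k.
for mutual in (8e-9, 4e-9, 1e-9):
    k = coupling_factor(mutual, L_T, L_R)
    print(f"M = {mutual * 1e9:.0f} nH -> k = {k:.2f}")
```

With the inductors fixed, only M changes with distance, which is why k can be used as a proxy for the chip separation in the simulations.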

The power transmitter uses the H-bridge structure illustrated in Figure 4.4. By changing the input clock signals, the current in inductor $L_T$ on the transmitter side changes, and a changing magnetic field is generated; this changing field induces an AC current in $L_R$ on the receiver side.

#### Signal Transmission System

Since the proposed system has no rectifier, no DC power is available on the receiver side. The receiver side therefore uses PBL circuits, which require no DC voltage. To transfer signals between the transmitter side and the receiver side, on-chip symmetric inductors are used, and the size of each inductor is $20\mu m \times 20\mu m$.

Figure 4.5 illustrates the signal transceiver circuits and the simulation waveform. The output signals are in AC format, which is suitable for the PBL circuits. In the test chip, only a PBL D flip-flop is implemented to verify whether the wireless power transmission system can drive PBL circuits.

In some cases, signals are transferred from the sub-chip (power receiver side) to the main-chip (power transmitter side). Since signals on the sub-chip side are processed with PBL circuits, they are in AC format. Thus, the transceiver circuit has to be redesigned. Figure 4.6 illustrates the proposed structure and the simulation waveform of this transceiver.



Figure 4.4: Structure of H-bridge



Figure 4.5: Signal transceiver from main-chip to sub-chip



Figure 4.6: Signal transceiver from sub-chip to main-chip



Figure 4.7: Transmitted power vs. coupling factor k

#### 4.1.3 Test Chip and Experiment Results

In this section, simulation results and measurement results are exhibited and compared.

#### Simulation Results

In the simulation circuits, the inductance on both sides of the transceiver is set according to the HFSS simulation result. The relationship between transmitted power and coupling factor is simulated; as described previously, the coupling factor k depends only on the distance between the transmitter chip and the receiver chip, so the simulation indicates the relationship between transmitted power and distance. Figure 4.7 shows the simulation result. According to the result, the transmitted power decreases as the distance between the chips increases, which agrees with the analysis.



Figure 4.8: Transmitted power vs. operation frequency

In this system, the frequency of the transmitted power is the same as the operation frequency, so the relationship between transmitted power and operation frequency is also simulated. The simulation result is shown in Figure 4.8; the power reaches its maximum value of 32mW at a frequency of 160MHz. The function of the PBL circuits is also simulated, and the simulation result indicates that the PBL D flip-flop functions correctly.

#### Measurement of Test Chip

The proposed system is implemented in a standard $0.18\mu m$ CMOS technology. Microphotographs of the test chip, including the transmitter and the receiver, are shown in Figure 4.9. The inductors of size $700\mu m \times 700\mu m$ are for power transmission; those of size $20\mu m \times 20\mu m$ are for signal transmission. The two chips are set up face-to-face and crossed over. In the real packaging process, chips should be set up face up, but in this experiment the relationship between distance and power is measured, which is much easier to do in the face-to-face arrangement.

(a) Main-chip

(b) Sub-chip

Figure 4.9: Microphotograph of test chip

Three distances are set to test the relationship between power and chip distance. Figure 4.10 shows the measurement result, which is quite consistent with the simulation result. When the distance is larger than $800\mu m$, almost no power can be detected on the receiver side, because the inductance of the on-chip spiral inductor is limited by its size and metal thickness.



Figure 4.10: Measured power vs. distance

As the simulation result shows, the transmitted power varies with the operation frequency. Figure 4.11 shows the measured transmitted power at different operation frequencies for a chip distance of $15\mu m$. According to the measurement result, the transmitted power increases with the operation frequency in the low frequency range; when the operation frequency is 135MHz, the transmitted power reaches its maximum value of 22mW; after that, the transmitted power decreases as the operation frequency increases. The operation frequency at which the maximum power is achieved is close to the simulation result, so the most suitable operation frequency for power transmission can be decided. Compared with the simulation results, the measured power is 10mW lower, and the frequency at which the maximum power is achieved is also different. This is because other elements in the test chip, such as the parasitic resistance and capacitance of the pads, are not included in the simulation. These elements dissipate power and affect the frequency performance, so the measurement results differ from the simulation results. Table 4.1 gives the comparison with previously published work; this work achieves larger transmitted power over a longer distance.

Figure 4.11: Measured power vs. operation frequency

 Table 4.1: Comparison with previous work

|                       | This work                  | CICC [39]                  |
|-----------------------|----------------------------|----------------------------|
| Transmitted power     | 22mW                       | 2.5mW                      |
| On-chip inductor size | $700\mu m \times 700\mu m$ | $700\mu m \times 700\mu m$ |
| Distance              | face-to-face $15\mu m$     | face-to-face $< 10\mu m$   |
| CMOS process          | $0.18\mu m$                | $0.35\mu m$                |

Besides power transmission, signal transmission is also measured. In this system, mainly functional correctness is measured. The measurement result shows that the output signal is delayed by one cycle relative to the input signal, which proves that the D flip-flop functions correctly.

# 4.2 Real Time Counter in Sensor Network System

#### 4.2.1 Introduction

In the ubiquitous society, sensor networks help us to monitor the environment, health conditions, and so on. However, in many applications the sensors are in sleep mode most of the time. For example, a sensor that monitors the rate of water flow stays in sleep mode for almost one hour, turns to active mode for several microseconds to monitor the flow rate, and then turns to sleep mode again. In sleep mode, only a real time counter works to measure the sleeping time of the sensor, and therefore the power consumption of the real time counter decides the battery life of the sensor node.

To reduce the power dissipated by the real time counter, charge recovery logic is a good option, because the real time counter is entirely a sequential circuit, and as a result the power clock can drive the circuit as both clock and power supply. The problem is that the power clock signals have to be recycled and reused by the clock generator, while some clock generators, such as the ring oscillator, cannot recycle power. Thus, a power clock generator which has enough drivability and the ability to recycle and reuse power is the crucial factor in using charge recovery logic. The blip power clock generator used in Chapters 2 and 3 can provide a power clock; however, it is not suitable for sensor network applications. The blip power clock generator uses an LC resonant circuit to generate the power clock, which sets the operation frequency of the circuits to $f = 1/(2\pi\sqrt{LC})$. The operation frequency of a sensor node is usually several hundred kilohertz, so to reach this frequency range, L and C would have to be very large, which occupies area and dissipates more power. To satisfy these criteria, a crystal oscillator is discussed.
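To give a sense of scale, the following sketch solves the ideal resonance formula for the tank capacitance at two frequencies. It assumes an ideal LC tank with no parasitics and reuses the 3nH inductor value from Chapter 3 purely as an example; the function and variable names are illustrative.

```python
import math

def required_capacitance_f(frequency_hz: float, inductance_h: float) -> float:
    """Capacitance for an ideal LC tank to resonate at frequency_hz.

    From f = 1 / (2*pi*sqrt(L*C)) it follows that C = 1 / ((2*pi*f)**2 * L).
    """
    return 1.0 / ((2.0 * math.pi * frequency_hz) ** 2 * inductance_h)

L_ON_CHIP = 3e-9  # 3 nH, example value borrowed from the Chapter 3 design

c_400mhz = required_capacitance_f(400e6, L_ON_CHIP)  # blip generator range
c_100khz = required_capacitance_f(100e3, L_ON_CHIP)  # sensor node range

print(f"C for 400 MHz: {c_400mhz * 1e12:.1f} pF")
print(f"C for 100 kHz: {c_100khz * 1e6:.0f} uF")
print(f"ratio: {c_100khz / c_400mhz:.2e}")
```

Under these assumptions the required capacitance grows from tens of picofarads at 400MHz to hundreds of microfarads at 100kHz, since the LC product scales with $1/f^2$. This is the reason an LC tank is impractical at sensor node frequencies.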

In this section, a real time counter with charge recovery logic is proposed, and the structure and characteristics of the crystal oscillator are discussed. Simulation indicates that the power dissipation of the real time counter with charge recovery logic is only 16% of that with static CMOS.

#### 4.2.2 Power Clock Generator

As discussed previously, a crystal oscillator is a good option to satisfy the criteria for charge recovery logic applications. In this section, the structure and characteristics of the crystal oscillator are discussed.

Figure 4.12: Symbol and equivalent model of crystal

A crystal oscillator is an electronic oscillator circuit that uses the mechanical resonance of a vibrating crystal of piezoelectric material to create an electrical signal with a very precise frequency. In circuit schematics, a crystal is usually represented as in Fig 4.12a, and the equivalent model of the crystal is shown in Fig 4.12b. As the figure shows, the equivalent circuit of the crystal is an LCR circuit, so the impedance of the model can be described as follows.

$$Z(s) = \left(\frac{1}{s \cdot C_1} + s \cdot L_1 + R_1\right) \parallel \left(\frac{1}{s \cdot C_0}\right) \tag{4.2}$$

or

$$\begin{aligned}
Z(s) &= \frac{s^2 + s\frac{R_1}{L_1} + \omega_s^2}{(s \cdot C_0)\left[s^2 + s\frac{R_1}{L_1} + \omega_p^2\right]} \\
\Rightarrow \omega_s &= \frac{1}{\sqrt{L_1 C_1}}, \\
\omega_p &= \sqrt{\frac{C_1 + C_0}{L_1 C_1 C_0}} = \omega_s \sqrt{1 + \frac{C_1}{C_0}} \\
&\approx \omega_s \left(1 + \frac{C_1}{2C_0}\right) \quad (C_0 \gg C_1)
\end{aligned} \tag{4.3}$$

where s is the complex frequency ($s = j\omega$), $\omega_s$ is the series resonant angular frequency, and $\omega_p$ is the parallel resonant angular frequency, both in radians per second.
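The expressions for the series and parallel resonances can be evaluated numerically to see how close the two resonances are and how good the first-order approximation is. The motional parameters below are hypothetical values chosen for illustration only; they are not the parameters of the crystal used in this work.

```python
import math

def crystal_resonances_hz(l1_h: float, c1_f: float, c0_f: float):
    """Return (f_s, f_p, f_p_approx) in hertz for the crystal model of Fig 4.12b.

    f_s        : series resonance, omega_s = 1 / sqrt(L1 * C1)
    f_p        : parallel resonance, omega_p = omega_s * sqrt(1 + C1 / C0)
    f_p_approx : first-order approximation, omega_s * (1 + C1 / (2 * C0))
    """
    omega_s = 1.0 / math.sqrt(l1_h * c1_f)
    omega_p = omega_s * math.sqrt(1.0 + c1_f / c0_f)
    omega_p_approx = omega_s * (1.0 + c1_f / (2.0 * c0_f))
    two_pi = 2.0 * math.pi
    return omega_s / two_pi, omega_p / two_pi, omega_p_approx / two_pi

# Hypothetical low-frequency crystal parameters (assumed for illustration).
L1 = 7860.0   # motional inductance in henries
C1 = 3.0e-15  # motional capacitance in farads
C0 = 1.5e-12  # shunt capacitance in farads, so that C0 >> C1

f_s, f_p, f_p_approx = crystal_resonances_hz(L1, C1, C0)
print(f"f_s = {f_s:.1f} Hz")
print(f"f_p = {f_p:.1f} Hz (approximation: {f_p_approx:.1f} Hz)")
print(f"relative spacing (f_p - f_s) / f_s = {(f_p - f_s) / f_s:.2e}")
```

With $C_0 \gg C_1$, the two resonances differ by a fraction of roughly $C_1 / (2C_0)$, which is why the oscillation frequency of a crystal is so tightly constrained.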

Adding capacitance across a crystal causes the parallel resonance to shift downward. This can be used to adjust the frequency at which a crystal oscillates. Crystal manufacturers normally cut and trim their crystals to have a specified resonance frequency with a known 'load' capacitance added to the crystal. The crystal alone is not enough to sustain oscillation; some peripheral circuits are required. Fig 4.13 shows the structure of the crystal oscillator. The capacitance C includes the capacitor used to adjust the oscillation frequency and the equivalent capacitance of the load circuits.



Figure 4.13: Structure of crystal oscillator

To use this crystal oscillator as the power supply for charge recovery logic, the power must be recyclable, so the LCR network must be connected to the charge recovery logic circuits directly. According to Fig 4.13, the output of the crystal oscillator is a sinusoidal signal that is fed directly to the load circuit, which satisfies this requirement. To achieve high power efficiency, the Q factor of the resonant tank should be as high as possible. The Q factor is defined in Equation 4.4; the higher the Q value is, the lower the power dissipated. The Q factor of a crystal is very high (up to tens of thousands), so the energy efficiency can be much higher than that of the LC resonant clock generator.

$$Q = 2\pi \frac{\text{Maximum Energy Stored}}{\text{Energy Dissipated per Cycle}} \tag{4.4}$$

#### 4.2.3 Real Time Counter

The real time counter is used to measure the sleeping time of the sensor node. The main structure of the real time counter is a binary counter with peripheral circuits. Because the counter is required to count for tens of minutes, it requires 28 bits at an operation frequency of 100 kHz ($2^{28}/(60 \times 100k) \approx 44$ min). However, simulating a 28-bit counter takes too much simulation time, so a 16-bit counter is designed and simulated to verify the power dissipation improvement of charge recovery logic with a crystal oscillator. In this design, pNBL type charge recovery logic is used.
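The bit-width arithmetic above can be reproduced with a few lines. The sketch below is only a restatement of that calculation; the function names and the 30-minute example are assumptions made for illustration.

```python
import math

def counting_time_s(bits: int, clock_hz: float) -> float:
    """Time in seconds for an n-bit binary counter to run through all 2**n states."""
    return (2 ** bits) / clock_hz

def bits_needed(duration_s: float, clock_hz: float) -> int:
    """Smallest counter width whose full count covers duration_s at clock_hz."""
    return math.ceil(math.log2(duration_s * clock_hz))

CLOCK_HZ = 100e3  # 100 kHz operation frequency from the text

t28_min = counting_time_s(28, CLOCK_HZ) / 60.0  # full range of the 28-bit counter
t16_s = counting_time_s(16, CLOCK_HZ)           # full range of the simulated 16-bit counter
bits_30min = bits_needed(30 * 60, CLOCK_HZ)     # example: a 30-minute sleep interval

print(f"28-bit counter at 100 kHz: {t28_min:.1f} min")
print(f"16-bit counter at 100 kHz: {t16_s:.3f} s")
print(f"bits needed for 30 min: {bits_30min}")
```

The 16-bit counter wraps in well under a second at 100 kHz, which keeps the simulation short while exercising the same carry structure.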



Figure 4.14: Structure of 4-bit counter with pNBL

A 16-bit counter is formed by four 4-bit counters, and the structure of the 4-bit counter with pNBL is shown in Fig 4.14. The carry out signal Co is required to trigger the next-stage 4-bit counter, so it has to be logic 1 at the same time as the output Q[3:0] = 4'b1111. However, each pNBL gate has a latency of half a clock cycle, so the inputs of the AND gate cannot be Q[3:0]; they must be other signals that compensate for this delay. Once the carry out signal Co is designed in this way, four 4-bit counters are combined into one 16-bit counter, which is shown in Fig 4.15.

Figure 4.15: Block diagram of 16-bit counter
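The cascading of 4-bit stages can be illustrated with a purely behavioral model. The sketch below is an idealized assumption: it models each stage as a synchronous counter with an enable input and a carry out that is high when Q[3:0] = 1111, and it deliberately ignores the half-cycle latency of pNBL gates discussed above. The class names are hypothetical and do not describe the actual gate-level design.

```python
class Counter4:
    """Idealized 4-bit counter stage with enable (no pNBL gate latency modeled)."""

    def __init__(self) -> None:
        self.q = 0

    def carry_out(self, enable: bool) -> bool:
        # Co is high when this stage is enabled and Q[3:0] == 4'b1111.
        return enable and self.q == 0b1111

    def tick(self, enable: bool) -> None:
        if enable:
            self.q = (self.q + 1) & 0xF


class Counter16:
    """Four cascaded 4-bit stages; each stage is enabled by the previous carry out."""

    def __init__(self) -> None:
        self.stages = [Counter4() for _ in range(4)]

    def tick(self) -> None:
        # Evaluate all enables from the current state first (synchronous update).
        enables = []
        enable = True
        for stage in self.stages:
            enables.append(enable)
            enable = stage.carry_out(enable)
        for stage, stage_enable in zip(self.stages, enables):
            stage.tick(stage_enable)

    @property
    def value(self) -> int:
        return sum(stage.q << (4 * i) for i, stage in enumerate(self.stages))


counter = Counter16()
for _ in range(16):
    counter.tick()
print(counter.value)  # the second stage has advanced once after 16 clock ticks
```

In the model, the carry is computed from the current state before any stage updates, which mirrors the requirement that Co be valid in the same cycle in which the lower stage reaches 1111.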

As described in the previous section, the clock signal generated by the crystal oscillator is sinusoidal, so it cannot be used as the clock signal for conventional static CMOS because it would cause a large crowbar current power loss. Therefore, in a conventional sensor node, a component that converts the sinusoidal signal to a square wave signal is necessary. Fig 4.16 illustrates the concept of a sensor node with a conventional real time counter. An inverter is used to convert the sinusoidal signal to a square wave signal, and the converted clock signal is fed to the real time counter and the data processing unit. When the counter reaches a certain count, an enable signal is generated to make the sensor node enter active mode. The sensor monitors the environment, and the data are processed in the data processing unit. This procedure lasts a very short time; then the node turns to sleep mode and the real time counter starts counting again.

Fig 4.17 illustrates the novel structure of the sensor node with a pNBL real time counter. The clock signal output from the crystal oscillator drives the real time counter directly. When the counter reaches a certain value, an enable signal is generated to trigger the AND gate, which converts the sinusoidal clock signal to a square wave clock signal and feeds it to the data processing unit. After the data processing unit finishes processing the data, the node turns to sleep mode and the real time counter starts counting again.

Figure 4.16: Sensor node with conventional structure

Figure 4.17: Sensor node with novel structure

| Operation frequency (Hz) | Power dissipation, CRL (nW) | Power dissipation, static CMOS (nW) | Power dissipation, converter (nW) |
|--------------------------|-----------------------------|-------------------------------------|-----------------------------------|
| 16k                      | 5.07                        | 21.63                               | 156.7                             |
| 32k                      | 8.92                        | 41.68                               | 158                               |
| 50k                      | 12.52                       | 65.11                               | 164.3                             |
| 70k                      | 16.59                       | 89.75                               | 171.4                             |
| 100k                     | 21.63                       | 129.2                               | 173.3                             |

Table 4.2: Power dissipation comparison

Comparing Fig 4.16 and Fig 4.17, two major differences can be found. 1) In the novel structure, the converter works only in active mode, whereas in the conventional structure it is required to work all the time. 2) The novel structure adopts pNBL charge recovery logic to construct the real time counter. With these two points, the power dissipation of the sensor node in sleep mode can be greatly reduced.

#### 4.2.4 Simulation and Test Chip

To compare the power dissipation of the proposed sensor node structure with that of the conventional sensor node structure, both structures were constructed in $0.18\mu m$ CMOS technology, and energy dissipation simulations were performed. Table 4.2 shows the simulated power dissipation at various operation frequencies. The power dissipation consists of that of the real time counter and that of the clock signal converter. According to the simulation result, the power dissipated by the converter in the conventional structure is more than 150 nW, and in the novel structure this part is eliminated. The power dissipated by the real time counter is also reduced dramatically in the novel structure; it is only 16% of that in the conventional structure when the operation frequency is 100 kHz. In total, about 280 nW is saved with the novel structure at 100 kHz, which is 92% of the power dissipation of the conventional structure.
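The percentages quoted above can be recomputed from Table 4.2. The sketch below assumes, as stated in the text, that the sleep-mode power of the conventional structure is the static CMOS counter plus the converter, and that the sleep-mode power of the novel structure is the charge recovery logic (CRL) counter alone; the names are illustrative.

```python
# Table 4.2 values in nW: frequency (Hz) -> (CRL counter, static CMOS counter, converter)
TABLE_4_2 = {
    16e3: (5.07, 21.63, 156.7),
    32e3: (8.92, 41.68, 158.0),
    50e3: (12.52, 65.11, 164.3),
    70e3: (16.59, 89.75, 171.4),
    100e3: (21.63, 129.2, 173.3),
}

def sleep_mode_comparison(crl_nw: float, cmos_nw: float, converter_nw: float):
    """Return (counter ratio, saved power in nW, saved fraction of conventional total)."""
    conventional_total = cmos_nw + converter_nw  # counter plus always-on converter
    saved = conventional_total - crl_nw          # the novel structure needs only the CRL counter
    return crl_nw / cmos_nw, saved, saved / conventional_total

results = {freq: sleep_mode_comparison(*row) for freq, row in TABLE_4_2.items()}

for freq, (ratio, saved, fraction) in results.items():
    print(f"{freq / 1e3:>5.0f} kHz: counter ratio {ratio:.1%}, "
          f"saved {saved:.1f} nW ({fraction:.1%} of conventional)")
```

At 100 kHz the counter ratio, the saved power, and the saved fraction come out close to the 16%, 280 nW, and 92% figures reported in the text.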

#### 4.3 Conclusion

In this chapter, alternative application fields of charge recovery logic are discussed. In Section 4.1, a wireless power transmission system is studied. In System-in-a-Package (SiP) applications, the wire bonding method is widely used to connect chips; however, this method consumes area, which limits the effective chip area. To improve the chip area utilization rate, wireless methods for transmitting data and power are being researched. In conventional wireless power transmission systems based on inductive coupling, a rectifier is required since the transmitted power is in a sinusoidal format, but the rectifier lowers the power efficiency dramatically. Meanwhile, the power supplied to charge recovery logic is in sinusoidal format. Therefore, logic circuits built with charge recovery logic do not need a rectifier to convert the sinusoidal voltage to a stable DC voltage, which means the power efficiency can be improved greatly. To verify this proposal, a test chip was designed, in which two on-chip inductors are used to transmit power to drive a PBL D flip-flop. The measurement results show that the transmitted power reaches 22mW, while previous work transmitted only 2.5mW.

In Section 4.2, the application of charge recovery logic in a sensor node is studied. In some sensor network applications, sensor nodes are in sleep mode most of the time. In sleep mode, only the real time counter is supposed to work, to measure the duration of the sleep mode. The clock generator of the sensor node is a crystal oscillator, which generates a sinusoidal signal that cannot be used as the clock signal for conventional static CMOS. Thus, a converter which converts the sinusoidal signal to a square wave signal is necessary. So, in sleep mode, power is dissipated by two components: 1) the real time counter and 2) the clock signal converter. Power dissipation in sleep mode accounts for a large proportion of the total power dissipation of the sensor node because, in a sensing cycle, sleep mode lasts tens of minutes while active mode lasts only several milliseconds. In order to lengthen the battery life, the power dissipated in sleep mode should be reduced. Charge recovery logic circuits use a sinusoidal clock as both power supply and clock signal, so the clock signal converter is no longer necessary, and they also dissipate less power than conventional static CMOS circuits. Therefore, replacing the static CMOS real time counter with a charge recovery logic counter reduces the power dissipation of both components. To prove this proposal, 16-bit counters with both pNBL type charge recovery logic and conventional static CMOS were designed and simulated. The simulation results show that the real time counter with charge recovery logic dissipates only 16% of the power dissipated by the counter with static CMOS. Including the power dissipated by the clock signal converter, the sensor node structure with the charge recovery logic counter reduces the power dissipation by 92% compared with the conventional structure.

# Chapter 5

### Conclusion

Charge recovery logic is a promising technology for low power applications. In this dissertation, charge recovery logic circuits for low power and high operation frequency are studied. To improve the operation speed of charge recovery logic, two novel charge recovery logic structures are proposed.

The first one is called Pulse Boost Logic (PBL), which consists of two rails of complementary NMOS evaluation networks and a latch that amplifies the output voltage. PBL is powered by two-phase non-overlapping power clocks, and the power clock generator uses an LC resonant circuit. To demonstrate the low power of PBL, a 4-bit multiplier was designed and fabricated in 0.18 $\mu$m CMOS technology. The simulation and measurement results show that PBL can work at a high operation frequency while consuming low power.
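Because the power clock generator is an LC resonant circuit, the power clock frequency is set by the inductance and the capacitance it resonates with. As a rough sketch, the ideal lossless resonance $f = 1/(2\pi\sqrt{LC})$ can be evaluated as below. The component values are hypothetical and are not taken from the fabricated chip, and the formula ignores parasitic resistance and the driver circuit.

```python
import math


def resonant_frequency(inductance_h: float, capacitance_f: float) -> float:
    """Ideal LC resonant frequency in Hz: f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def required_inductance(target_hz: float, capacitance_f: float) -> float:
    """Inductance in henries that resonates with a given load capacitance
    at the target frequency (same ideal formula, solved for L)."""
    return 1.0 / ((2.0 * math.pi * target_hz) ** 2 * capacitance_f)


if __name__ == "__main__":
    # Hypothetical values for illustration only: 10 nH and 10 pF.
    f = resonant_frequency(10e-9, 10e-12)
    print(f"resonant frequency: {f / 1e6:.1f} MHz")
    # Inverse problem: inductor needed for the same frequency and load.
    print(f"required inductance: {required_inductance(f, 10e-12) * 1e9:.2f} nH")
```

In practice the capacitance is dominated by the load the power clock drives, so the second function reflects the more common design question: which inductor to choose for a given load and target frequency.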

The second one is called pseudo-NMOS Boost Logic (pNBL), which reduces the transistor count of PBL and achieves lower power dissipation. To demonstrate that pNBL can achieve low power dissipation and can also be used to construct large-scale circuits, a Processing Engine (PE) used in an LDPC decoding system was designed and fabricated with pNBL in a standard 0.18 $\mu$m CMOS process. A PE is much more complex than a multiplier, so the PE test chip shows that pNBL is suitable for large-scale integrated circuits. The simulation and measurement results of the test chip show that the PE with pNBL works correctly and achieves lower power dissipation than other charge recovery logic.

Since the proposed charge recovery logic circuits are driven by a sinusoidal clock, applications whose clock signals are already sinusoidal are well suited to charge recovery logic. Two applications are studied to demonstrate this. The first application is wireless power transmission by inductive coupling. Power transmitted by inductive coupling arrives as a sinusoidal voltage, which cannot be used as the power supply of conventional static CMOS, so a rectifier is necessary. However, the rectifier greatly reduces the power efficiency. Charge recovery logic is driven directly by a sinusoidal voltage, so the rectifier is no longer required. The simulation and measurement results show that the wireless power transmission method can drive charge recovery logic and that the transmitted power is about 10 times that of the conventional method. The second application is the real time counter in a sensor node. The clock generator of a sensor node is a crystal oscillator, which generates a sinusoidal clock signal. To drive a static CMOS real time counter, a converter that turns this signal into a square wave clock is required, but the converter consumes relatively large power. Constructing the real time counter with charge recovery logic instead of static CMOS eliminates the converter and reduces the power dissipation of the real time counter itself. A simulation was run to verify the proposal, and the result shows that more than 90% of the power can be saved by using charge recovery logic.

All the descriptions, analyses, and experiments in this dissertation demonstrate that charge recovery logic can work with low power dissipation at a high operation frequency. These characteristics indicate that, for low power, high performance circuits, charge recovery logic is a promising substitute for conventional static CMOS circuits.

# Bibliography

- [1] http://www.itrs.net/.
- [2] W.C. Athas, L.J. Svensson, and N. Tzartzanis. A resonant signal driver for two-phase, almost-non-overlapping clocks. *Circuits and Systems, 1996. ISCAS '96.*, *'Connecting the World'., 1996 IEEE International Symposium on*, 4:129–132 vol.4, 12-15 1996.
- [3] John P. Uyemura. Introduction to VLSI Circuits and Systems. John Wiley & Sons, INC., 2002.
- [4] H. Soeleman and K. Roy. Ultra-low power digital subthreshold logic circuits. In Low Power Electronics and Design, 1999. Proceedings. 1999 International Symposium on, pages 94 – 96, 1999.
- [5] B.C. Paul, H. Soeleman, and K. Roy. An 8x8 sub-threshold digital cmos carry save array multiplier. In Solid-State Circuits Conference, 2001. ESSCIRC 2001. Proceedings of the 27th European, pages 377 –380, sept. 2001.
- [6] M. Jamal Deen, M.H. Kazemeini, and S. Naseh. Ultra-low power vcos performance characteristics and modeling (invited). In *Devices, Circuits and Systems, 2002. Proceedings of the Fourth IEEE International Caracas Conference on*, pages C033–1 – C033–8, 2002.

- [7] L. Benini, P. Siegel, and G. De Micheli. Saving power by synthesizing gated clocks for sequential circuits. *Design Test of Computers, IEEE*, 11(4):32 –41, winter 1994.
- [8] C. Cao and B. Oelmann. Mixed synchronous/asynchronous state memory for low power fsm design. In *Digital System Design*, 2004. DSD 2004. Euromicro Symposium on, pages 363 – 370, aug.-3 sept. 2004.
- [9] H. Mahmoodi, V. Tirumalashetty, M. Cooke, and K. Roy. Ultra low-power clocking scheme using energy recovery and clock gating. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, 17(1):33-44, jan. 2009.
- [10] A. Sathanur, L. Benini, A. Macii, E. Macii, and M. Poncino. Multiple power-gating domain (multi-vgnd) architecture for improved leakage power reduction. In Low Power Electronics and Design (ISLPED), 2008 ACM/IEEE International Symposium on, pages 51 –56, aug. 2008.
- [11] K. Shi and D. Howard. Sleep transistor design and implementation simple concepts yet challenges to be optimum. In VLSI Design, Automation and Test, 2006 International Symposium on, pages 1 –4, april 2006.
- [12] Sin-Yu Chen, Rung-Bin Lin, Hui-Hsiang Tung, and Kuen-Wey Lin. Power gating design for standard-cell-like structured asics. In *Design, Automation Test in Europe Conference Exhibition (DATE)*, 2010, pages 514–519, march 2010.
- [13] R. Landauer. Irreversibility and heat generation in the computing process. IBM Journal of Research and Development, 5(3):183 –191, july 1961.
- [14] J.G. Koller and W.C. Athas. Adiabatic switching, low energy computing, and the physics of storing and erasing information. In *Physics and Computation*, 1992. PhysComp '92., Workshop on, pages 267 –270, oct 1992.

- [15] S. Younis and T. Knight. Practical implementation of charge recovering asymptotically zero power cmos. In *Proceedings of the 1993 Symp. on Integrated* Sys., MIT Press, pages 234–250, 1993.
- [16] W.C. Athas, J.G. Koller, and L.J. Svensson. An energy-efficient cmos line driver using adiabatic switching. In VLSI, 1994. Design Automation of High Performance VLSI Systems. GLSV '94, Proceedings., Fourth Great Lakes Symposium on, pages 196 –199, mar 1994.
- [17] Yong Moon and Deog-Kyoon Jeong. Efficient charge recovery logic. VLSI Circuits, 1995. Digest of Technical Papers., 1995 Symposium on, pages 129 – 130, Jun 1995.
- [18] D. Suvakovic and C. Salama. Two phase non-overlapping clock adiabatic differential cascode voltage switch logic (adcvsl). Solid-State Circuits Conference, 2000. Digest of Technical Papers. ISSCC. 2000 IEEE International, pages 364 -365, 2000.
- [19] Hu Jianping, Cen Lizhang, and Liu Xiao. A new type of low-power adiabatic circuit with complementary pass-transistor logic. ASIC, 2003. Proceedings. 5th International Conference on, 2:1235 – 1238 Vol.2, Oct. 2003.
- [20] V.K. De and J.D. Meindl. Complementary adiabatic and fully adiabatic mos logic families for gigascale integration. Solid-State Circuits Conference, 1996. Digest of Technical Papers. 42nd ISSCC., 1996 IEEE International, pages 298 -299, 461, Feb 1996.
- [21] Y. Takahashi, K. Konta, K. Takahashi, M. Yokoyama, K. Shouno, and M. Mizunuma. Carry propagation free adder/subtracter using adiabatic dynamic cmos logic circuit technology. *Fundamentals of Electronics, Communications and Computer Sciences, IEICE Transactions on*, E86-A(6):1437–1444, Jun 2003.

- [22] Y. Takahashi, T. Sekine, and M. Yokoyama. Vlsi implementation of a 4x4-bit multiplier in a two phase drive adiabatic dynamic cmos logic. *Electronics, IEICE Transactions on*, E90-C(10):2002–2006, Oct 2007.
- [23] Yibin Ye and K. Roy. Qserl: quasi-static energy recovery logic. Solid-State Circuits, IEEE Journal of, 36(2):239 –248, Feb 2001.
- [24] V. S. Sathe, J.-Y. Chueh, and M. C. Papaefthymiou. Energy-efficient ghz-class charge-recovery logic. *Solid-State Circuits, IEEE Journal of*, 42(1):38–47, Jan. 2007.
- [25] J.C. Kao, Wei-Hsiang Ma, V.S. Sathe, and M. Papaefthymiou. A charge-recovery 600mhz fir filter with 1.5-cycle latency overhead. *ESSCIRC*, 2009. *ESSCIRC '09. Proceedings of*, pages 160–163, Sept. 2009.
- [26] Wei-Hsiang Ma, Jerry C. Kao, Visvesh S. Sathe, and Marios Papaefthymiou. A 187mhz subthreshold-supply robust fir filter with charge-recovery logic. VLSI Circuits, 2009 Symposium on, pages 202 –203, June 2009.
- [27] B. Mesgarzadeh, M. Hansson, and A. Alvandpour. Jitter characteristic in charge recovery resonant clock distribution. *Solid-State Circuits, IEEE Journal of*, 42(7):1618–1625, july 2007.
- [28] Hiok-Tiaq Ng, R. Farjad-Rad, M.-J.E. Lee, W.J. Dally, T. Greer, J. Poulton, J.H. Edmondson, R. Rathi, and R. Senthinathan. A second-order semidigital clock recovery circuit based on injection locking. *Solid-State Circuits, IEEE Journal of*, 38(12):2101 – 2110, dec. 2003.
- [29] J. W. Nilsson. *Electric Circuits 3rd edition*. New York: Addison-Wesley, 1990.
- [30] A.J. Drake, K.J. Nowka, T.Y. Nguyen, J.L. Burns, and R.B. Brown. Resonant clocking using distributed parasitic capacitance. *Solid-State Circuits, IEEE Journal of*, 39(9):1520 – 1528, sept. 2004.

- [31] To-Po Wang and Huei Wang. High-q micromachined inductors for 10-to-30-ghz rfic applications on low resistivity si-substrate. In *Microwave Conference*, 2006. 36th European, pages 56 –59, sept. 2006.
- [32] Yimeng Zhang, Leona Okamura, Mengshu Huang, and Tsutomu Yoshihara. A novel structure of energy efficiency charge recovery logic. Green Circuits and Systems, The 1st International Conference on, pages 133–136, June 2010.
- [33] John P. Uyemura. Introduction to VLSI Circuits and Systems. John Wiley & Sons, INC., 2002.
- [34] Yimeng Zhang, Mengshu Huang, Nan Wang, Satoshi Goto, and Tsutomu Yoshihara. A 1pj/cycle processing engine in ldpc application with charge recovery logic. Solid-State Circuits Conference, 2011. A-SSCC 2011. IEEE Asian, pages 213 – 216, Nov 2011.
- [35] Lei Chen, Jun Xu, I. Djurdjevic, and S. Lin. Near-shannon-limit quasi-cyclic low-density parity-check codes. *Communications, IEEE Transactions on*, 52(7):1038 – 1042, july 2004.
- [36] A. Darabiha, A. Chan Carusone, and F.R. Kschischang. Power reduction techniques for ldpc decoders. *Solid-State Circuits, IEEE Journal of*, 43(8):1835 -1845, aug. 2008.
- [37] A.J. Drake, K.J. Nowka, T.Y. Nguyen, J.L. Burns, and R.B. Brown. Resonant clocking using distributed parasitic capacitance. *Solid-State Circuits, IEEE Journal of*, 39(9):1520 – 1528, sept. 2004.
- [38] Y. Take, N. Miura, and T. Kuroda. A 30gb/s/link 2.2tb/s/mm2 inductively-coupled injection-locking cdr. In Solid State Circuits Conference (A-SSCC), 2010 IEEE Asian, pages 1 –4, nov. 2010.
- [39] K. Onizuka, H. Kawaguchi, M. Takamiya, T. Kuroda, and T. Sakurai. Chip-to-chip inductive wireless power transmission system for sip applications. In Custom Integrated Circuits Conference, 2006. CICC '06. IEEE, pages 575–578, sept. 2006.

- [40] Yuan Yuxiang, Y. Yoshida, and T. Kuroda. Non-contact 10% efficient 36mw power delivery using on-chip inductor in 0.18-μm cmos. In Solid-State Circuits Conference, 2007. ASSCC '07. IEEE Asian, pages 115 –118, nov. 2007.
- [41] K. Niitsu, Y. Shimazaki, Y. Sugimori, Y. Kohama, K. Kasuga, I. Nonomura, M. Saen, S. Komatsu, K. Osada, N. Irie, T. Hattori, A. Hasegawa, and T. Kuroda. An inductive-coupling link for 3d integration of a 90nm cmos processor and a 65nm cmos sram. In *Solid-State Circuits Conference - Digest of Technical Papers, 2009. ISSCC 2009. IEEE International*, pages 480 –481,481a, feb. 2009.
- [42] M. Ghovanloo and K. Najafi. Fully integrated wideband high-current rectifiers for inductively powered devices. *Solid-State Circuits, IEEE Journal of*, 39(11):1976 - 1984, 2004.

## Publications

#### Journal Papers

- Y. Zhang, M. Huang, N. Wang, S. Goto, and T. Yoshihara, "Energy Efficient Processing Engine in LDPC Application with High-Speed Charge Recovery Logic," Journal of Semiconductor Technology and Science, in press.
- Mengshu Huang, <u>Y. Zhang</u>, and T. Yoshihara, "An Efficient Dual Charge Pump Circuit using Charge Sharing Clock Scheme," Fundamentals of Electronics, Communications and Computer Sciences, IEICE Transactions on, Vol.E95-A,No.2,pp.439-446,Feb. 2012.
- Y. Zhang, L. Okamura, and T. Yoshihara, "An Energy Efficiency 4-bit Multiplier with Two-phase Non-overlap Clock Driven Charge Recovery Logic," Electronics, IEICE Transactions on, vol.E94-C, no.4,pp.605-612, April 2011.

#### International Conference

- J. Zhou, M. Huang, <u>Y. Zhang</u>, H. Zhang, and T. Yoshihara, "A Novel Charge Sharing Charge Pump for Energy Harvesting Application," SoC Design Conference (ISOCC), 2011 International, pp.373-376, 17-18 Nov. 2011, Jeju.
- Y. Zhang, M. Huang, N. Wang, S. Goto, and T. Yoshihara, "A 1pj/cycle Processing Engine in LDPC Application with Charge Recovery Logic," Solid-State Circuits Conference, 2011. A-SSCC 2011. IEEE Asian, pp.213-216, Nov 2011, Jeju.

- Y. Zhang, M. Huang, H. Zhang, and T. Yoshihara, "A Non-Rectifier Wireless Power Transmission System Using On-Chip Inductor," 2011 IEEE 9th International Conference on ASIC, ASICON 2011, pp124-127, Oct 2011, Xiamen.
- M. Huang, <u>Y. Zhang</u>, H. Zhang, and T. Yoshihara, "Double Charge Pump Circuit with Triple Charge Sharing Clock Scheme," 2011 IEEE 9th International Conference on ASIC, ASICON 2011, pp148-152, Oct 2011, Xiamen.
- H. Zhang, <u>Y. Zhang</u>, M. Huang, and T. Yoshihara, "CMOS Low-power Subthreshold Reference Voltage Utilizing Self-biased Body Effect," 2011 IEEE 9th International Conference on ASIC, ASICON 2011, pp552-555, Oct 2011, Xiamen.
- N. Wang, <u>Y. Zhang</u>, and T. Yoshihara, "An Innovative Invert Charge Recovery Logic Structure," Electronic Devices, Systems and Applications (ICEDSA), 2011 International Conference on, pp.25-28, April 2011, Kuala Lumpur.
- H. Zhu, M. Huang, <u>Y. Zhang</u>, and T. Yoshihara, "A 4-Phase Cross-Coupled Charge Pump with Charge Sharing Clock Scheme," Electronic Devices, Systems and Applications (ICEDSA), 2011 International Conference on , pp.73-76 April 2011, Kuala Lumpur.
- Y. Zhang, L. Okamura, N. Wang, and T. Yoshihara, "A 160MHz 4-Bit Pipeline Multiplier Using Charge Recovery Logic Technology," SoC Design Conference (ISOCC), 2010 International, pp.127-130, Nov. 2010, Incheon.
- Y. Zhang, L. Okamura, M. Huang, and T. Yoshihara, "A Novel Structure of Energy Efficiency Charge Recovery Logic," Green Circuits and Systems (ICGCS), 2010 International Conference on, pp.133-136, June 2010, Shanghai.
- Y. Zhang, L. Okamura, M. Huang, and T. Yoshihara, "Power Analysis of Distributed Differential Oscillator," Electronic Devices, Systems and Applications (ICEDSA), 2010 Intl Conf on, pp.179-182, April 2010, Kuala Lumpur.

- Y. Tseng, <u>Y. Zhang</u>, L. Okamura, and T. Yoshihara, "A New 7-transistor SRAM Cell Design with High Read Stability," Electronic Devices, Systems and Applications (ICEDSA), 2010 Intl Conf on, pp.43-47, April 2010, Kuala Lumpur.

#### Patent

No. 2011-244246 (Japan)(Pending)
 Title: Semiconductor Integration Circuits Equipment
 Inventors: Y. Zhang, T. Yoshihara