# A Fully Integrated SRAM-based CMOS Arbitrary Waveform Generator for Analog Signal Processing

A Dissertation Presented to The Academic Faculty

by

**Tae Joong Song** 

In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the School of Electrical and Computer Engineering

Georgia Institute of Technology

August 2010

Copyright 2010 by TAE JOONG SONG

Approved by:

Dr. Jongman Kim, Advisor School of Electrical and Computer Engineering *Georgia Institute of Technology* 

Dr. Emmanouil M Tentzeris School of Electrical and Computer Engineering *Georgia Institute of Technology* 

Dr. Sung Ha Kang School of Mathematics *Georgia Institute of Technology*

Dr. Saibal Mukhopadhyay School of Electrical and Computer Engineering *Georgia Institute of Technology*

Dr. Chang-Ho Lee School of Electrical and Computer Engineering *Georgia Institute of Technology* 

Date Approved: May 28, 2010

## Acknowledgements

First, I would like to express my deepest appreciation to my advisor, Professor Jongman Kim. He has continually inspired in me a spirit of adventure in research. Without his guidance and persistent help, this dissertation would not have been possible.

I would like to thank my reading committee members, Professor Saibal Mukhopadhyay and Professor Chang-Ho Lee, for their valuable support and helpful comments during my research. I also thank Professor Emmanouil M Tentzeris and Professor Sung Ha Kang for their precious time, cooperation, and suggestions.

I am especially grateful to Dr. Kyutae Lim for his support and valuable comments throughout my years of research.

I am greatly indebted to the members of the Microwave Applications Group for their assistance, cooperation, and favors, especially the cognitive radio members: Sang Min Lee, Jongmin Park, Jaehyouk Choi, Kwanwoo Kim, Joonhoi Hur, Sanghyun Woo, Sungho Beck, Stephen Kim, Seungil Yoon, Michael Lee, and Dr. Bong-Guk Yu.

I am deeply grateful to my mother, Myung Woo Han. Throughout my graduate studies, she has been a source of endless love and encouragement. This work could not have been completed without her dedicated support.

Finally, I would like to thank my family, Hyun Ju Kim and Seo-Yeon Song, for their abundant love and support. All my work was made possible by their effort and care.

## **Table of Contents**

- Acknowledgements
- List of Tables
- List of Figures
- List of Abbreviations
- Summary
- CHAPTER I: Introduction
  - 1.1 Technology trends
  - 1.2 Motivation of dissertation
  - 1.3 Organization of dissertation
- CHAPTER II: Arbitrary Waveform Generator (AWG)
  - 2.1 Overview
    - 2.1.1 Multi-Point AWG
    - 2.1.2 Multi-Rate AWG
  - 2.2 Fully-Integrated SRAM-based AWG
    - 2.2.1 Conventional AWG
    - 2.2.2 Address Generator Embedded AWG
    - 2.2.3 Power Analysis of AWG
    - 2.2.4 Measurement Results of MRSS-AWG
  - 2.3 AWG for Analog Matched Filter
    - 2.3.1 Synchronizable MF-AWG Architecture
    - 2.3.2 Measurement Results of MF-AWG
    - 2.3.3 MF-AWG Output Analysis
  - 2.4 Comparison of MRSS-AWG and MF-AWG
- CHAPTER III: Low-Power AWG for Low-Power MRSS
  - 3.1 Overview
  - 3.2 Self-Deactivated Data Transition Bit (DTB) Scheme
  - 3.3 Diode-Connected Low-Swing Signaling Scheme with a Short Current Reduction buffer
  - 3.4 Charge Recycling with Push-Pull Level Converter for Asynchronous Design
    - 3.4.1 Introduction
    - 3.4.2 Asynchronous and Synchronous Charge Recycling
    - 3.4.3 Proposed Push-Pull Level Converter (PPLC)
    - 3.4.4 Asynchronous Charge Recycling Scheme with PPLC (ACR_PPLC)
  - 3.5 Robust Latch-Type Sense Amplifier Using an Adaptive Latch Resistance
    - 3.5.1 Introduction
    - 3.5.2 Conventional Latch Type Sense Amplifier
    - 3.5.3 Proposed Latch Resistance-Controlled SA
    - 3.5.4 Latch_R-Type Sense Amplifier Analysis
      - 3.5.4.1 Output Margin (OM) analysis
      - 3.5.4.2 Mismatch Effects on an SM
      - 3.5.4.3 Latch Resistance Effect on SA speed
      - 3.5.4.4 Latch Resistance Effect on Bit-Cell Array and SA
    - 3.5.5 Test Chip Experiment and Measurement Results
  - 3.6 Fully-gated ground 10T-SRAM bitcell in a 45-nm SOI technology
    - 3.6.1 Introduction
    - 3.6.2 Leakage Current Effect on Bitline
    - 3.6.3 Simulation Results
  - 3.7 Low-Power Techniques of Sub-blocks for LP-MRSS
    - 3.7.1 Overview
    - 3.7.2 Low-Power Analog Correlator
    - 3.7.3 Low-Power Pipeline Analog-to-Digital Converter
    - 3.7.4 Fast-Sweeping Frequency Synthesizer
  - 3.8 Measurement Results of LP-MRSS
- CHAPTER IV: Conclusions and Future works
  - 4.1 Technical contribution and impact of the dissertation
  - 4.2 Scope of future research
- References
- Publications and Patents
- Vita

## List of Tables

| Table | Title |
|-------|-------|
| Table I | Comparison of MP-AWG and MR-AWG |
| Table II | Power Consumption Comparison of Multi-Resolution And Multi-Waveform Sensing Process |
| Table III | AWG modes |
| Table IV | Analysis of address toggle number ($N_{toggle}$) and toggle rate ($f_{toggle}$) |
| Table V | Summary of characteristics of MRSS-AWG and MF-AWG |
| Table VI | Comparison of Waveform Generators |
| Table VII | Power analysis of SRAM of AWG |
| Table VIII | Analysis on data toggling rate of Hann Window |
| Table IX | Measured LP-MRSS Power Comparison |

## **List of Figures**

| Figure | Caption |
|--------|---------|
| Figure 1 | Roadmaps of Gene's law, signal processing demand, and total system power consumption |
| Figure 2 | Power consumption of ARM926 at the peak frequency versus the technology nodes |
| Figure 3 | Block diagram of a multi-resolution spectrum sensing technique for a cognitive radio with an arbitrary waveform generator |
| Figure 4 | Block diagram of a matched filter technique for a pulse compression with an arbitrary waveform generator |
| Figure 5 | Landscape of analog signal processing |
| Figure 6 | Block diagram of a direct digital synthesizer |
| Figure 7 | Block diagram of AWG showing (a) the principle control scheme, (b) long and short window generation of MP-AWG, and (c) long and short window generation of MR-AWG |
| Figure 8 | (a) Conventional DDS-based AWG having an address generator outside an SRAM. (b) Proposed AWG with an address generator-embedded SRAM |
| Figure 9 | Conceptual timing diagram and operation table of multi-resolution and multi-waveform spectrum sensing process |
| Figure 10 | Conceptual power consumption of the conventional and proposed AWG during two-resolution and two-waveform spectrum sensing process of the UHF-bands from 512 MHz to 698 MHz. (a) The conventional DDS-based AWG. (b) The proposed address generator-embedded AWG |
| Figure 11 | Schematic diagram of a conventional SRAM and address generator of AWG |
| Figure 12 | Schematic diagram of an address generator embedded SRAM for low-power AWG |
| Figure 13 | Serial latch of the proposed SRAM for low-power AWG |
| Figure 14 | Timing diagrams of (a) conventional SRAM and (b) proposed SRAM for a low-power AWG in sweep modes |
| Figure 15 | Power consumption of SRAM sub-blocks and an address generator in a conventional and a proposed AWG |
| Figure 16 | Power dissipation breakdown. (a) Conventional AWG at RES=00, and (b) RES=11. (c) Proposed AWG |
| Figure 17 | Schematic diagram of 11-bit R-2R type of DAC in AWG |
| Figure 18 | Frequency response of 11-bit R-2R type of DAC in AWG |
| Figure 19 | Die microphotograph of MRSS-AWG in the MRSS IC |
| Figure 20 | Measurement results of MRSS-AWG showing a multi-resolution $\cos^4$ |
| Figure 21 | Block diagram of MF-AWG |
| Figure 22 | Schematic diagram of SRAM and an address generator of MF-AWG |
| Figure 23 | Synchronizable timing diagram of MF-AWG |
| Figure 24 | Measurement results of (a) chirp signal, and (b) Daubechies wavelet of MF-AWG |
| Figure 25 | Measurement results of arbitrary starting chirp waveform of MF-AWG |
| Figure 26 | Chirp waveform (a) ideally generated in the time domain by MATLAB<sup>TM</sup>, (b) sampled at $f_{clk}$ = 38.4 MHz, and transferred to a frequency domain by $N = 2^{19}$ point FFT, (c) resampled at 100-MS/s after Butterworth 2nd-order LPF with a 10.78 MHz cutoff frequency, and (d) measured at 100-MS/s |
| Figure 27 | Daubechies wavelet (a) ideally generated in the time domain by MATLAB<sup>TM</sup>, (b) sampled at $f_{clk}$ = 38.4 MHz, and transferred to a frequency domain by $N = 2^{19}$ point FFT, (c) resampled at 100-MS/s through Butterworth 2nd-order LPF with a 10.78 MHz cutoff frequency, and (d) measured at 100-MS/s |
| Figure 28 | Layout of SRAM, Latch, and DAC of MF-AWG |
| Figure 29 | Die micrograph of MF-AWG in the MF |
| Figure 30 | Conceptual diagram showing data stored in a RAM of the LP-DWG |
| Figure 31 | A self-deactivated data transition bit (DTB) of the LP-DWG showing (a) schematic, (b) timing diagram, and (c) mismatch sense amplifier |
| Figure 32 | Diode-connected low-swing signaling scheme with an $I_{SHORT}$ reduction (SR) buffer |
| Figure 33 | Asynchronous charge recycling with an inverter receiver (*ACR_IR*) and synchronous *CR* with clocked inverter receiver (*SCR_CIR*) |
| Figure 34 | Timing diagram and power consumption of *ACR_IR*, *SCR_CIR*, and inverter driver with an inverter receiver (*ID_IR*) |
| Figure 35 | *PLC* and *PPLC* showing initial status starting to consume $I_S$ at *LtoH* transition |
| Figure 36 | (a) Transient response, and (b) DC response of *PLC* and *PPLC* |
| Figure 37 | Schematic of *ACR_PPLC* and *EQ* generator |
| Figure 38 | Power comparison versus $T_{EQ}$ and $C_L$ at $V_{DD}$ = 1.8 V, 125 °C, FF |
| Figure 39 | Block diagram of bit-cell array showing an $I_{LEAK}$ effect on a $\Delta V_{BL}$ |
| Figure 40 | Conventional latch-type sense amplifier (a) Schematic. (b) Simulation |
| Figure 41 | Proposed latch-resistance sense amplifier (a) Schematic. (b) Simulation |
| Figure 42 | (a) Conceptual timing of *SO/SOb*, *SO_g/SOb_g*, and *BL/BLb* showing the nodes of *latch_R1/latch_R2* transistors in a regeneration phase. (b) Latch error modeling in *latch_R*-SA |
| Figure 43 | Histogram of *OM*. (a) Conventional, and (b) Proposed SA |
| Figure 44 | Output margin sweep versus mismatch ratio and input voltage margin. (a) Conventional SA. (b) Proposed SA |
| Figure 45 | Output margin sweep versus mismatch ratio and input voltage margin. (a) Conventional SA. (b) Proposed SA |
| Figure 46 | Simulated *SM* and speed versus various *latch_R* with mismatches |
| Figure 47 | Timing diagram showing $\Delta T$ (*WL*, *S_OUT*) |
| Figure 48 | (a) $\Delta T$ (*WL*, *S_OUT*) and $\Delta T$ (*S_EN*, *S_OUT*), (b) Power consumption |
| Figure 49 | SA test chip schematic |
| Figure 50 | (a) SA test chip micrograph. (b) Measured *SM* versus supply voltages and *WL_LEAK* |
| Figure 51 | 6T, 8T-LC, 10T, and 10T-RGND MC |
| Figure 52 | 6T, 8T-LC, 10T, and 10T-RGND MC array showing $I_{OFF}$ effect on $\Delta V_{BL}$ |
| Figure 53 | $V_{BL}$ and $V_{/BL}$ of 6T, 8T-LC, 10T, and 10T-RGND versus NOB |
| Figure 54 | Static leakage power consumption and $\Delta V_{BL}$ of 6T, 8T-LC, 10T, and 10T-RGND versus NOB |
| Figure 55 | Low power MRSS block diagram |
| Figure 56 | Low-power analog correlator |
| Figure 57 | Low-power pipeline analog-to-digital converter |
| Figure 58 | Fast-sweeping frequency synthesizer |
| Figure 59 | Measured results of LP-DWG. (a) Multi-resolution $\cos^4$ window. Power reduction with (b) a DTB, (c) a low-swing signaling scheme, and an SR buffer |
| Figure 60 | Measured spectrum sensing feature. (a) Spectrum analyzer. (b) LP-MRSS |
| Figure 61 | Measured in-band detection range with 100-kHz window |
| Figure 62 | LP-MRSS chip micrograph |

## List of Abbreviations

| Abbreviation | Definition |
|---------|----------------------------------------------------|
| ACR     | asynchronous charge recycling scheme               |
| AWG     | arbitrary waveform generator                       |
| ADC     | analog-to-digital converter                        |
| CMOS    | complementary metal-oxide-semiconductor            |
| CR      | cognitive radio                                    |
| DAC     | digital-to-analog converter                        |
| DDS     | direct digital synthesizer                         |
| DTB     | data transition bit                                |
| FFT     | fast Fourier transform                             |
| IC      | integrated circuit                                 |
| IFT     | inverse fast Fourier transform                     |
| LNA     | low-noise amplifier                                |
| LP-AWG  | low-power arbitrary waveform generator             |
| LP-MRSS | low-power multi-resolution spectrum sensing        |
| LPF     | low-pass filter                                    |
| MF      | matched filter                                     |
| MIMO    | multiple input, multiple output                    |
| MP-AWG  | multi-point arbitrary waveform generator           |
| MR-AWG  | multi-rate arbitrary waveform generator            |
| MRSS    | multi-resolution spectrum sensing                  |
| OFDMA   | orthogonal frequency-division multiple access      |
| OM      | output margin                                      |
| PLL     | phase locked loop                                  |
| PPLC    | push-pull level converter                          |
| PVT     | process, supply voltage, and operating temperature |
| RAM     | random access memory                               |
| SA      | sense amplifier                                    |
| SAW     | surface acoustic wave                              |
| SM      | sense amplifier margin                             |
| SNR     | signal-to-noise ratio                              |
| SOI     | silicon on insulator                               |
| SRAM    | static random access memory                        |
| UHF     | ultra high frequency                               |
| UWB     | ultra wideband                                     |
| VCO     | voltage-controlled oscillator                      |
| VGA     | variable-gain amplifier                            |
| WLAN    | wireless local area network                        |

## Summary

This dissertation focuses on the design and implementation of a fully integrated SRAM-based arbitrary waveform generator (AWG) for analog signal processing applications in CMOS technology. The dissertation consists of two parts: first, a fully integrated AWG for multi-resolution spectrum sensing in cognitive radio applications and for an analog matched filter in a radar application; and second, low-power techniques for the AWG. The fully integrated low-power AWG is implemented and measured in a 0.18-µm CMOS technology. Theoretical analysis is performed, and prospective implementation issues are discussed in comparison with the measurement results. Moreover, low-power SRAM techniques for analog signal processing are addressed: a self-deactivated data-transition bit scheme, a diode-connected low-swing signaling scheme with a short-current reduction buffer, and charge recycling with a push-pull level converter for power reduction in asynchronous designs. In particular, the robust latch-type sense amplifier using an adaptive latch resistance and the fully-gated ground 10T-SRAM bitcell in a 45-nm SOI technology can serve as techniques to overcome the challenges of upcoming deep-submicron technologies.

## **CHAPTER I**

## Introduction

#### 1.1 Technology trends

Signal processing analyzes signals to extract the information they carry as they are transmitted and received. As technology advances, systems are challenged to process a growing volume of signals in a shorter time. As a result, systems have become more complicated and consume increasing amounts of power. Figure 1 shows the roadmaps of Gene's law, signal processing demand, and total system power consumption [1]. Even though the power consumption per million instructions per second (MIPS) is decreasing according to Gene's law, the overall system power consumption stays almost constant because of the increasing signal processing volume. Low-power approaches to signal processing therefore face a challenging situation.



Figure 1. Roadmaps of Gene's law, signal processing demand, and total system power consumption.



Figure 2. Power consumption of ARM926 at the peak frequency versus the technology nodes.

Figure 2 shows the power consumption of the ARM926 at its peak frequency across technology nodes [2]. At the 90-nm node, the power consumption of the ARM926 levels off rather than continuing to decrease. This is caused by the higher demand on processing performance and by the limitations of existing low-power techniques.

From these two results, it can be concluded that new approaches are needed to overcome the limitations of existing power reduction techniques. Analog signal processing is a good candidate for reducing the power consumed by signal processing [1]. It processes the signal in the analog domain without converting it to the digital domain, and can therefore relieve the power burden of an analog-to-digital converter (ADC) and a digital-to-analog converter (DAC).

#### 1.2 Motivation of dissertation

A cognitive radio (CR) has been proposed as a promising way of improving spectrum utilization by adopting dynamic spectrum resource management [3-7]. To protect the primary users, spectrum occupancy should be carefully monitored, which demands spectrum sensing that is both accurate and fast. From a CR system commercialization standpoint, minimizing the hardware complexity as well as the power consumption is also crucial. To meet the demands for speed and accuracy, a multi-resolution spectrum sensing (MRSS) technique was proposed and successfully demonstrated [8], [9].



Figure 3. Block diagram of a multi-resolution spectrum sensing technique for a cognitive radio with an arbitrary waveform generator.

Figure 3 shows a block diagram of MRSS. During spectrum sensing, the flexibility of a detection bandwidth can be controlled by correlating the received analog signal with the window signal generated by a arbitrary waveform generator. For the success of CR systems, various spectrum processing techniques should be implemented for the sensing and the selection of the desired spectrum resource. To maximize the throughput of CR systems, those spectrum processing techniques should be highly flexible and reconfigurable to be adaptive depending on the availability of spectrum resources. Those demands can be achieved by utilizing the flexible and adaptable arbitrary waveform generator (AWG). This dissertation deals with the accomplishment of reducing the power consumption in the AWG.
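The relation between window duration and detection bandwidth described above can be illustrated numerically. The following is a minimal sketch, not the implementation used in this work: it assumes a $\cos^4$ window (the window shape named in the figure captions), a normalized sample rate, and hypothetical helper names `cos4_window` and `half_power_bandwidth`. It estimates the half-power width of the window spectrum for two window lengths.

```python
import numpy as np


def cos4_window(n_samples: int) -> np.ndarray:
    """Return a cos^4 window of n_samples points (illustrative helper)."""
    # Sample positions centered on zero, strictly inside (-0.5, 0.5).
    t = (np.arange(n_samples) + 0.5) / n_samples - 0.5
    return np.cos(np.pi * t) ** 4


def half_power_bandwidth(window: np.ndarray, fs: float, n_fft: int = 1 << 16) -> float:
    """Estimate the two-sided half-power bandwidth of a window spectrum."""
    spectrum = np.abs(np.fft.rfft(window, n_fft))
    spectrum /= spectrum[0]  # normalize to the DC peak of the main lobe
    # First frequency bin where the magnitude drops below 1/sqrt(2).
    first_below = np.nonzero(spectrum < np.sqrt(0.5))[0][0]
    return 2.0 * first_below * fs / n_fft


if __name__ == "__main__":
    fs = 1.0  # normalized sample rate (assumption for illustration)
    for n in (64, 128, 256):
        bw = half_power_bandwidth(cos4_window(n), fs)
        print(f"window length {n:4d} samples -> half-power bandwidth {bw:.5f} x fs")
```

In this sketch, doubling the window length roughly halves the half-power bandwidth, which is the sense in which a longer window gives a finer sensing resolution and a shorter window gives a coarser but faster one.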

Meanwhile, in a radar, the range resolution, the pulse width, and the average transmitted power are subject to a trade-off, since the range resolution is given by

$$\Delta R = c \times \tau / 2, \tag{1}$$

where  $\tau$  is the radar pulse width, and *c* is the speed of light.
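As a brief numerical illustration of (1), the following Python sketch evaluates the range resolution of an uncompressed pulse for a few pulse widths. The pulse widths are assumed values chosen only for illustration; they are not taken from the measurements in this work.

```python
# Illustrative evaluation of Eq. (1): Delta R = c * tau / 2.
# The pulse widths below are assumed example values.

C = 299_792_458.0  # speed of light in m/s


def range_resolution(pulse_width_s: float) -> float:
    """Return the range resolution in metres for an uncompressed pulse."""
    return C * pulse_width_s / 2.0


if __name__ == "__main__":
    for tau in (1e-6, 100e-9, 10e-9):
        print(f"tau = {tau * 1e9:7.1f} ns -> Delta R = {range_resolution(tau):8.2f} m")
```

The sketch makes the trade-off explicit: a shorter pulse gives a finer range resolution, but for a fixed peak power it also lowers the average transmitted power.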

A pulse compression technique allows the high average transmitted power of a long pulse while maintaining the high resolution of a short pulse. There are two well-known conventional approaches to pulse compression: surface acoustic wave (SAW) device-based [10] and fast convolution processing (FCP)-based [11]. There have also been various studies to find alternatives to these conventional approaches. Using a capacitor as a delay or a memory element has been widely investigated [12]. However, this approach could suffer from capacitor mismatch or leakage current. An approach based on the floating-gate MOS technology [13] could also have difficulties with long pulses. There has been a study using a bank of digitally controlled transconductors along with small capacitors [14], but it could suffer from a transconductance mismatch. In addition, even though DSPs are becoming more powerful with time, following Gene's law [15], an FCP is computationally expensive since this operation requires both a fast Fourier transform (FFT) and an inverse fast Fourier transform (IFFT). More generally, digital signal processing blocks such as channel estimators and digital filters consume a significant amount of power [16]. Also, ADCs should have a high sampling rate and a wide dynamic range. Therefore, to utilize the benefit of the pulse compression technique while overcoming those disadvantages, we recently proposed an analog matched filter (MF) for radar pulse compression [17].



Figure 4. Block diagram of a matched filter technique for a pulse compression with an arbitrary waveform generator.

In contrast to a conventional approach, our solution implements a fully integrated analog MF before an ADC stage, so that requirements for an ADC and a DSP such as speed and power can be relaxed. As shown in Figure 4, an analog MF differs from previous approaches since the MF function is implemented in a mixed-signal domain using an analog multiplier, an analog integrator, and an AWG.

This approach makes use of both the flexibility of the digital domain and the simplicity of the analog domain using mixed-signal integration. Since neither FFT nor IFFT is required, the digital computational burden and the ADC requirements are relaxed. Also, the fabrication process is fully CMOS-compatible, so a system-on-a-chip (SOC) integration is possible, unlike a SAW-based MF, and further this could be a viable approach for a very large scale ( $\sim 10^6$  elements) phased array [18]. Moreover, this approach opens the possibility of using a complicated waveform such as a wavelet instead of a traditional chirp signal. Even though the wavelet has proven to be a useful tool for several applications, the computational cost of signal processing using wavelets is heavy, so hardware implementations have concentrated almost entirely on digital circuits [19]. Therefore, our approach opens the way toward waveform diversity. The key building block for this approach is an AWG. The goal is to implement a multi-resolution, flexible, low-power, and compact AWG that can be integrated as part of a SOC. Since the proposed AWG is random access memory (RAM)-based, it can be programmed with an arbitrary waveform beforehand, and the AWG keeps the flexibility of digital processing. Also, by simply changing the addressing and clocking scheme, it is possible to change the resolution during operation.

#### **1.3 Organization of dissertation**



Figure 5. Landscape of analog signal processing.

Figure 5 shows the landscape of analog signal processing that has been achieved in my research lab. In the communication area, cognitive radio spectrum sensing was proposed utilizing the multi-resolution spectrum sensing technique. For the military radar application, radar pulse compression using an analog matched filter has been proposed. For those two applications, the arbitrary waveform generator works as a key function block. Therefore, my research is focused on the arbitrary waveform generator for those two applications.

Chapter II addresses the conceptual AWG. It explains the basic structure of the AWG and the mechanisms to store and retrieve the data to make the waveform: the multi-point and the multi-rate AWG. Then, the fully-integrated SRAM-based AWG for the MRSS of the CR application is explained. The proposed architecture of the AWG is compared to the conventional one, and the power consumption is analyzed according to the input vector. Then, the measurement results of the MRSS-AWG are discussed. The AWG is also optimally implemented for the analog matched filter (MF). The special feature of the MF-AWG is to synchronize the waveform at the detection point of the MF. This functionality is described in the architecture of the MF-AWG. Then, the measurement results of the MF-AWG are described.

Chapter III shows the low-power techniques of the AWG, especially for the low-power MRSS (LP-MRSS). First, the self-deactivated data-transition-bit (DTB) scheme is applied to the LP-AWG. Second, the diode-connected low-swing signaling scheme with a short-current reduction buffer is adopted in the LP-AWG. In addition, a charge-recycling scheme adopting the push-pull level converter is suggested, which can be applied to asynchronous designs. Then, a robust latch-type sense amplifier using an adaptive latch resistance is suggested. This scheme is well suited to deep-submicron technologies, where the leakage current can affect the AWG. Finally, the fully-gated ground 10T-SRAM bitcell is developed in a 45-nm SOI technology. The new SRAM bitcell can be a good candidate for the AWG in upcoming, more challenging technologies. The low-power techniques for the LP-AWG are implemented for the LP-MRSS, which is composed of the low-power analog correlator, the low-power pipeline ADC, and the fast-sweeping frequency synthesizer. In the last section, the measurement results of the LP-MRSS are described and compared to the conventional MRSS IC.

In Chapter IV, the conclusions of the dissertation are presented. Then, the impact of the research and future work are described.

## **CHAPTER II**

## **Arbitrary Waveform Generator (AWG)**

#### 2.1 Overview

AWGs have been widely developed for applications in communication and signal processing [10-16]. These applications are based on the direct digital synthesis (DDS) technique for precise frequency control, shown in Figure 6.



Figure 6. Block diagram of a direct digital synthesizer.

A DDS makes an analog waveform that is stored in a phase-to-amplitude converter (PAC) or a weighted digital-to-analog converter (DAC) block. A phase accumulator (PA) makes the address to access the PAC on each clock cycle. The output frequency of a DDS is expressed as

$$f_{out} = f_{clk} \times FCW / 2^N \tag{2}$$

where  $f_{clk}$  is the clock frequency, and N-bit FCW is the frequency control word in a PA.
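To make (2) concrete, the following Python sketch models an ideal phase accumulator and evaluates the resulting output frequency. The accumulator width (8 bits) and the frequency control word (4) are assumed values chosen only for illustration; the 19.2-MHz clock is the value used later in this chapter.

```python
# Minimal behavioural sketch of a DDS phase accumulator, following Eq. (2).
# n_bits = 8 and fcw = 4 are assumed example values, not design parameters.


def dds_output_frequency(f_clk_hz: float, fcw: int, n_bits: int) -> float:
    """Return f_out = f_clk * FCW / 2^N in hertz."""
    return f_clk_hz * fcw / (1 << n_bits)


def phase_accumulator(fcw: int, n_bits: int, n_cycles: int) -> list:
    """Return the accumulator value seen on each of n_cycles clock cycles."""
    mask = (1 << n_bits) - 1
    phase = 0
    values = []
    for _ in range(n_cycles):
        values.append(phase)
        phase = (phase + fcw) & mask  # wrap modulo 2^N
    return values


if __name__ == "__main__":
    print(dds_output_frequency(19.2e6, fcw=4, n_bits=8))  # 300 kHz
    print(phase_accumulator(fcw=4, n_bits=8, n_cycles=8))
```

With these assumed values the accumulator wraps after $2^N / FCW = 64$ clock cycles, which corresponds to one period of the 300-kHz output.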

The objectives of the preliminary research are

- to build a fully integrated AWG for MRSS functionality;
- to build a fully integrated AWG for MF functionality.

The research for the conceptual algorithm of the AWG is developed in Section 3-1. Then, the AWG is implemented and extended to the specific designs of the MRSS in Section 3-2 and the MF in Section 3-3. In Section 3-4, the comparison of the MRSS-AWG and MF-AWG is summarized.

Figure 7(a) shows a block diagram of the AWG. The role of the AWG is to make an arbitrary window having a flexible duration. The AWG stores the window data in the RAM and controls the duration of the window digitally by changing the read-out methods. The duration control methods can be classified in two ways, the RAM address increment control method and the clock frequency control method. In the AWG, a horizontal resolution,  $N_{hor}$ , is the number of points covering one window basis. Vertical resolution,  $N_{ver}$ , is the width of a RAM and the resolution of a DAC.  $t_{\omega}$  is the window duration, and  $f_{clk}$  is the clock frequency. The relation among  $t_{\omega}$ ,  $f_{clk}$ , and  $N_{hor}$  is

$$t_{\omega} = \frac{(N_{hor} - 1)}{f_{clk}} \tag{3}$$

The RAM depth should be as large as the maximum  $N_{hor}$  of the most precise window basis, if a multi-resolution window is targeted. An LPF reconstructs an analog window signal by removing the harmonics of clock signals. A proper window basis is made by setting the appropriate RAM addressing increments or clock frequency.



Figure 7. Block diagram of AWG showing (a) the principle control scheme, (b) long and short window generation of MP-AWG, and (c) long and short window generation of MR-AWG.
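The way a window is stored in the RAM can be sketched as follows. This is a minimal Python illustration, assuming a Hann window (one of the window types mentioned later in this chapter), a RAM depth of 256 words, and 11-bit codes; the function names are hypothetical and do not correspond to any block in the implemented chip.

```python
# Sketch: quantise a Hann window into RAM words and evaluate Eq. (3).
# n_hor = 256 and n_ver = 11 are assumed example values.
import math


def hann_window_codes(n_hor: int, n_ver: int) -> list:
    """Return n_hor unsigned codes of n_ver bits sampling one Hann window."""
    full_scale = (1 << n_ver) - 1
    codes = []
    for n in range(n_hor):
        w = 0.5 * (1.0 - math.cos(2.0 * math.pi * n / (n_hor - 1)))
        codes.append(round(w * full_scale))
    return codes


def window_duration(n_hor: int, f_clk_hz: float) -> float:
    """Return t_w = (N_hor - 1) / f_clk in seconds, as in Eq. (3)."""
    return (n_hor - 1) / f_clk_hz


if __name__ == "__main__":
    ram = hann_window_codes(n_hor=256, n_ver=11)
    print(ram[:4], max(ram))
    print(window_duration(256, 19.2e6))  # about 13.28 us
```

Here the horizontal resolution sets the number of RAM words and the vertical resolution sets the word width, matching the roles of $N_{hor}$ and $N_{ver}$ described above.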

#### 2.1.1 Multi-Point AWG

The multi-point AWG (MP-AWG) alters the window duration by controlling the RAM address increment, as shown in Figure 7(b). If every row of the RAM is accessed with a period of  $1/f_{clk1}$ , the longest window is generated. If every other row of the RAM is accessed with the same period of  $1/f_{clk1}$ , the MP-AWG generates a short window having half the  $t_{\omega}$  of the longest one.

The advantage of the MP-AWG is that long and short windows can be generated only by switching RAM addressing schemes. In addition, the LPF for the clock harmonics rejection can be fixed to a certain cut-off frequency because the sampling frequency is the same for any window duration. On the contrary, the RAM depth should be defined to be the maximum  $N_{hor}$ , corresponding to the longest window. Therefore, when a short window is used, the RAM would not be optimally utilized. Moreover, there is a burden to implement a flexible address increment controller.
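The MP-AWG read-out can be sketched in a few lines of Python. This is only a behavioural illustration under assumed values (a 256-word RAM holding placeholder data and a 19.2-MHz clock); the helper names are hypothetical.

```python
# Sketch of multi-point read-out: the clock is fixed and the address
# increment selects how many RAM rows form one window.


def mp_awg_read(ram: list, increment: int) -> list:
    """Return the words read when every `increment`-th row is accessed."""
    return ram[::increment]


def window_duration(n_hor: int, f_clk_hz: float) -> float:
    """Return t_w = (N_hor - 1) / f_clk in seconds, as in Eq. (3)."""
    return (n_hor - 1) / f_clk_hz


if __name__ == "__main__":
    ram = list(range(256))  # placeholder data; a real RAM would hold window codes
    f_clk1 = 19.2e6         # assumed clock frequency
    for inc in (1, 2, 4):
        points = mp_awg_read(ram, inc)
        print(inc, len(points), window_duration(len(points), f_clk1))
```

With an increment of two, only 128 of the 256 stored words are used, which illustrates why the RAM is not optimally utilized for short windows.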

#### 2.1.2 Multi-Rate AWG

Figure 7(c) shows the multi-rate AWG (MR-AWG), which generates windows of different durations by controlling  $f_{clk}$ . Every row of the RAM is accessed consecutively with a period of  $1/f_{clk1}$  for the generation of a long window. The MR-AWG generates a short window having half the  $t_{\omega}$  of the long one when  $f_{clk2}$  is set to be twice  $f_{clk1}$ . In the MR-AWG,  $N_{hor}$  is the same for any window duration, so the address increment is fixed. The duration is controlled by the clock frequency,  $f_{clk}$ .

The advantage of the MR-AWG is that the RAM depth can be optimally sized. Because  $N_{hor}$  of the window is the same for various windows, there is less area overhead than with the MP-AWG. In addition, a simple address accessing scheme is applicable.

However,  $1/f_{clk}$  must be tuned according to the duration of the window basis, and an LPF must be tuned to different cut-off frequencies to remove the harmonics of clock frequencies. Table I compares the characteristics of the MP-AWG and the MR-AWG. The proposed AWG adopts both methods, so that windows of different durations can be made by changing either the address increment or the clock frequency appropriately.

| Characteristics                               | MP-AWG    | MR-AWG    |
|-----------------------------------------------|-----------|-----------|
| $N_{hor}$ (number of data points in a window) | Varying   | Fixed     |
| $f_{clk}$ (sampling frequency)                | Fixed     | Varying   |
| RAM depth                                     | Oversized | Optimized |
| RAM address accessing logic                   | Varying   | Fixed     |
| RAM clock logic                               | Fixed     | Varying   |
| Reconstruction filter                         | Fixed     | Varying   |

Table I. Comparison of MP-AWG and MR-AWG
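For comparison with the multi-point scheme, the multi-rate read-out can be sketched as follows in Python. The RAM contents and the two clock frequencies (19.2 MHz and twice that value) are assumptions made only for this illustration.

```python
# Sketch of multi-rate read-out: every RAM row is read, and the window
# duration is set by the clock frequency alone.


def mr_awg_timeline(ram: list, f_clk_hz: float) -> list:
    """Return (time_in_seconds, code) pairs when each row is read at f_clk."""
    period = 1.0 / f_clk_hz
    return [(i * period, code) for i, code in enumerate(ram)]


if __name__ == "__main__":
    ram = list(range(256))   # placeholder window data
    f_clk1 = 19.2e6          # assumed clock for the long window
    f_clk2 = 2 * f_clk1      # doubled clock for the short window
    long_window = mr_awg_timeline(ram, f_clk1)
    short_window = mr_awg_timeline(ram, f_clk2)
    print(long_window[-1][0], short_window[-1][0])
```

Both windows use all stored points, so the RAM depth is fully utilized, while the reconstruction filter has to follow the change in clock frequency.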

#### 2.2 Fully-Integrated SRAM-based AWG

#### 2.2.1 Conventional AWG

Conventionally, an AWG has been based on a direct digital synthesis (DDS) technique for generating a waveform with precise frequency controllability. A DDS generates an analog signal that is stored in a phase-to-amplitude converter (PAC) or directly converted through a weighted DAC [17]. A phase accumulator (PA) generates the addresses to access a PAC on each clock cycle. A fully-integrated DDS provides a single-phase sinusoid utilizing a novel PA [18] and a reduced PAC [19] for low power and high speed. However, the DDS is optimized to generate a high-frequency clock signal or predefined waveforms such as a sinusoid, a ramp, and a normal chirp waveform. Therefore, it is not a good candidate for the MRSS or the MF, both of which require arbitrary waveforms such as a Hann window, a pre-distorted chirp, or a wavelet.

Meanwhile, square/triangular waveform generators that control passive devices or a current source using commercial ICs were investigated [20]-[22]. Then, on-chip pulse-shaping generators were proposed [23], [24] for UWB applications. However, they generate only pulse waveforms, whose spectral characteristics are not good enough for the MRSS and the MF. Moreover, they cannot match the AWG's ability to generate the multiple arbitrary waveforms used in modern radar systems, such as linear frequency modulated (LFM), non-linear frequency modulated (NLFM), and pulsed continuous wave (CW) waveforms. The AWG is required not only to switch between multiple waveforms at will, but also to modify the waveforms with additional analysis blocks [25]. For example, if the matched-filter result shows an undesired spike or attenuation, the AWG can pre-distort or pre-amplify a waveform at a certain frequency to obtain the desired result.



Figure 8. (a) Conventional DDS-based AWG having an address generator outside an SRAM. (b) Proposed AWG with an address generator-embedded SRAM.

Off-chip AWGs were widely developed for applications in communication and signal processing [26]-[33]. They used commercially available integrated circuits (ICs) or field-programmable gate array (FPGA) devices. They could meet the stringent system requirements for processing speed and resolution by utilizing a digital signal processor (DSP) and off-the-shelf ICs. However, their power consumption is much higher than that of a fully-integrated AWG. A radar-waveform generator was developed using commercial ICs [34], but it is not trivial to implement a compact, low-power, and flexible AWG based on off-the-shelf ICs. Moreover, it is not easy to synchronize the internal signals across process, voltage, and temperature (PVT) variations. Therefore, the features of the AWG for the MRSS and MF should be:

1) to store an arbitrary waveform;

2) to change the shape and resolution of the waveform;

3) to be fully integrated with other processing sub-blocks.

Figure 8 shows the conventional DDS-based AWG. An address generator is designed separately from a static random-access memory (SRAM) and makes addresses randomly to retrieve the data from the SRAM. Therefore, the whole SRAM, including the x-decoder blocks (high-order and low-order address decoders), is accessed on every clock cycle, even though these blocks are necessary only in a random-access mode. This architecture is useful for a fine-tuning resolution, but not for low power, since addresses are generated independently of the SRAM. Therefore, in this work, a novel architecture for the low-power AWG is addressed.




Figure 9. Conceptual timing diagram and operation table of multi-resolution and multi-waveform spectrum sensing process.

Figure 9 shows the conceptual timing diagram and operation table of a multi-resolution and multi-waveform spectrum sensing process, which is analyzed to investigate the power-saving efficiency of the proposed AWG. For the spectrum sensing, two waveforms are used at two resolutions. Then, the sensing time in a fine sensing mode with an *A* window is given by

$$t_{sense1} = \left(\frac{f_{end} - f_{start}}{f_{step}} + 1\right) \times \left[N_{AVG}\left(t_{\omega 1} + t_{buf}\right) + t_{sw0}\right] \ \text{(sec)}, \tag{4}$$

where  $(f_{end} - f_{start})$  is the frequency sweep range of a phase-locked loop (PLL),  $f_{step}$  is the frequency step of the PLL,  $N_{AVG}$  is the number of averages at one PLL frequency,  $t_{buf}$  is a margin between consecutive windows,  $t_{\omega 1}$  is the duration of an *A*-type window, and  $t_{sw0}$  is the maximum switching and settling time of the PLL [8], [9]. Then,  $t_{sense2}$ , the sensing time in coarse sensing with an  $A \times 4$  window, is given by replacing  $t_{\omega 1}$  with  $t_{\omega 2}$  in (4). Since the window of fine sensing is longer than that of coarse sensing, and  $t_{\omega 2}$  is assumed to be one-quarter of  $t_{\omega 1}$ ,  $t_{sense2}$  is less than  $t_{sense1}$ . If the bandwidth of a *B* waveform is the same as that of an *A* waveform,  $t_{sense3}$  is the same as  $t_{sense1}$ , and  $t_{sense4}$  is the same as  $t_{sense2}$ . Meanwhile,  $t_{sw1}$  is the time for switching the resolution of the window, which is needed to assign a new resolution-control word (*FCW*) to the AWG. Then,  $t_{sw2}$  is defined as the time required when the window shape changes or is pre-distorted according to the channel environment and/or the sensing signal. If the whole shape of the window changes,  $t_{sw2}$  is the time to access the whole SRAM. If not, it is the time to access a part of the window.

For example, if a 100-kHz window is used with 50 correlating averages to sense the UHF bands from 512 MHz to 698 MHz in steps of 1 MHz, then  $t_{\omega 1}$  is 10 µs,  $t_{buf}$  is 0.15 µs,  $N_{AVG}$  is 50,  $t_{sw0}$  is 30 µs, and  $t_{sense1}$  and  $t_{sense3}$  are 100.5 ms according to (4). Then, if a 400-kHz window is used for coarse sensing,  $t_{sense2}$  and  $t_{sense4}$  can be calculated to be 30.3 ms according to (4), and  $t_{sw1}$  is  $1/f_{clk}$  for switching a window resolution. For writing a new window in the whole SRAM,  $t_{sw2}$  is given by

$$t_{sw2} = N_{word} \times 1/f_{clk}, \tag{5}$$

where  $N_{word}$  is the number of words of an SRAM. Then,  $t_{sw1}$  is 0.052 µs,  $t_{sw2}$  is 13.3 µs at  $f_{clk}$ = 19.2 MHz and  $N_{word}$  = 256. If these calculations are applied to the overall sensing time as shown in Figure 9, the power consumption of the conventional and proposed AWG can be further investigated.
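The arithmetic of this example can be reproduced with a short Python sketch of (4) and (5). The coarse window duration of 2.5 µs follows from the stated assumption that $t_{\omega 2}$ is one-quarter of $t_{\omega 1}$; the function names are hypothetical.

```python
# Worked example of Eqs. (4) and (5) using the values quoted in the text.


def sensing_time(f_start_hz, f_end_hz, f_step_hz, n_avg, t_window_s, t_buf_s, t_sw0_s):
    """Return the total sensing time of Eq. (4) in seconds."""
    n_steps = round((f_end_hz - f_start_hz) / f_step_hz) + 1
    return n_steps * (n_avg * (t_window_s + t_buf_s) + t_sw0_s)


def sram_rewrite_time(n_word, f_clk_hz):
    """Return t_sw2 = N_word / f_clk of Eq. (5) in seconds."""
    return n_word / f_clk_hz


if __name__ == "__main__":
    common = dict(f_start_hz=512e6, f_end_hz=698e6, f_step_hz=1e6,
                  n_avg=50, t_buf_s=0.15e-6, t_sw0_s=30e-6)
    t_fine = sensing_time(t_window_s=10e-6, **common)     # 100-kHz window
    t_coarse = sensing_time(t_window_s=2.5e-6, **common)  # 400-kHz window
    t_sw1 = 1.0 / 19.2e6
    t_sw2 = sram_rewrite_time(256, 19.2e6)
    print(f"t_sense1 = {t_fine * 1e3:.1f} ms, t_sense2 = {t_coarse * 1e3:.1f} ms")
    print(f"t_sw1 = {t_sw1 * 1e6:.3f} us, t_sw2 = {t_sw2 * 1e6:.1f} us")
```

The sweep covers 187 PLL frequencies, and the sketch evaluates to about 100.5 ms for fine sensing and roughly 30.4 ms for coarse sensing, in line with the values quoted above. The switching times $t_{sw1}$ and $t_{sw2}$ are several orders of magnitude shorter than the sensing times.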

Figure 10 shows the conceptual power consumption comparison of the previous DDS-based and the proposed AWG during the two-resolution and two-waveform spectrum sensing process of the UHF-bands.



Figure 10. Conceptual power consumption of the conventional and proposed AWG during two-resolution and two-waveform spectrum sensing process of the UHF-bands from 512 MHz to 698 MHz. (a) The conventional DDS-based AWG. (b) The proposed address generator-embedded AWG.

The numerical notation indicates the power consumption of each block at each sensing period, shown in Figure 9. For the proposed AWG, the power consumption of the high-order address decoder and address generator can be reduced during  $t_{sense1}$ ,  $t_{sense2}$ ,  $t_{sense3}$ , and  $t_{sense4}$ , while the additional power in the control block is consumed.

Table II shows the average power consumption equations of the conventional and proposed AWG. They are derived from the current and time shown in Figure 10.  $\Delta P_{avg}$  is the difference between the conventional and proposed AWG power consumption. To calculate the power consumption more accurately, the architecture and the toggling rate of the SRAM and address generator need to be analyzed in more detail.

| Quantity                | Expression |
|-------------------------|------------|
| $P_{avg}(\text{conv.})$ | $\frac{\{(1) + (3) + (4) + (5) + (6) + (7) + (8)\} \times \{(t_{sense1} + t_{sw1} + t_{sense2}) \times 2 + t_{sw2}\} + \{(2) + (2)^*\} \times (t_{sense1} + t_{sw1} + t_{sense2}) + (2) \times t_{sw2}}{(t_{sense1} + t_{sense2} + t_{sw1}) \times 2 + t_{sw2}}$ |
| $P_{avg}(\text{prop.})$ | $\frac{\{(3) + (4) + (5)^{*} + (6) + (7) + (8)\} \times \{(t_{sense1} + t_{sw1} + t_{sense2}) \times 2 + t_{sw2}\} + \{(1) + (2)\} \times t_{sw2}}{(t_{sense1} + t_{sense2} + t_{sw1}) \times 2 + t_{sw2}}$ |
| $\Delta P_{avg}$        | $\frac{(1) \times \{(t_{sense1} + t_{sw1} + t_{sense2}) \times 2\} + \{(2) + (2)^*\} \times \{(t_{sense1} + t_{sw1} + t_{sense2}) + t_{sw2}\} - \{(5)^* - (5)\} \times \{(t_{sense1} + t_{sw1} + t_{sense2}) \times 2 + t_{sw2}\}}{(t_{sense1} + t_{sense2} + t_{sw1}) \times 2 + t_{sw2}}$ |

Table II. Power Consumption of Multi-Resolution and Multi-Waveform Sensing Process

Figure 11 shows the schematic of the conventional SRAM architecture with an address generator block of a DDS-based AWG. The SRAM consists of a control block, x-decoder blocks, 6-T memory cells (MCs), and y-path blocks configuring four banks of 224x11b. MCs store the digital data of a waveform that has been pre-converted and written externally through I2C. For this operation, the MC is accessed randomly by addresses through many x-decoders and high-capacitance loading lines. For x-decoder blocks, address buffers are divided into several blocks for an optimal design considering the power consumption and area efficiency. The upper six bits among the eight bits of x-addresses are called high-order addresses (*HOaddr*), and the others low-order addresses (*LOaddr*). For the random access of the SRAM, *HOaddr* are buffered to drive high-capacitance loading lines of x-decoder blocks. Then, they are pre-decoded to a *HOaddr* summing signal (*HA\_SUM*). This is decoded through a two-input NAND buffer with the pre-decoded *LOaddr* signals, X < 3:0 >, to make a wordline accessing one row of the MC array. In the conventional DDS-based AWG architecture, addresses are made in the external address generator block, and x-decoder blocks operate repetitively at every clock cycle to access the SRAM sequentially. However, the buffers and pre-decoder blocks for *HOaddr* are needed only for the random access mode of the SRAM and are not necessary for making a sequential waveform.



Figure 11. Schematic diagram of a conventional SRAM and address generator of DDS-based AWG.

### 2.2.2 Address Generator Embedded AWG

Figure 12 shows the address generator embedded SRAM for the low-power operation of the AWG. The motivation for low-power operation is that the addresses to access the SRAM are not random but sequential and repetitive in a sweep mode. Therefore, an x-decoder can be divided into several blocks and placed in a more power-efficient way. In a random access mode, *HOaddr* buffers and pre-decoders operate as in the conventional SRAM. However, in a sweep mode, *HA\_SUM* is generated by the internal serial latch (SL), not by the external address generator and *HOaddr* buffers. For this operation, instead of address generator blocks for both *HOaddr* and *LOaddr*, an address generator is needed only for *LOaddr*. Therefore, the power consumed in operating *HOaddr* buffers and pre-decoders and in driving the high-capacitance loading can be reduced, even though additional power is consumed by the SL block and *X3\_PULSE*. On the other hand, with the embedded SL block, the address generator is limited in making random addresses. Therefore, the proposed architecture does not achieve as much frequency controllability as the conventional DDS-based AWG. Even though the frequency resolution of the address generator embedded architecture is limited, it can be applied to the MRSS and the MF, which require only a limited frequency resolution.



Figure 12. Schematic diagram of an address generator embedded SRAM for low-power AWG.

Figure 13 shows the serial latch (SL) block that is embedded in the x-decoder block to access the SRAM sequentially when making a waveform, for low-power operation. The SL substitutes for the operation of the external address generator, internal buffers, and pre-decoders for *HOaddr*. It consists of a flip-flop, a delay circuit, and an initialization block for the feedback of the flip-flop output. The flip-flop transfers the data of the decoded *HOaddr* from the previous SL to the next SL at the edge of *X3\_PULSE*. The delay circuit makes sufficient delay such that the SL output can be transferred to the next SL input without interfering with the active wordline. Therefore, the delay variation needs to be within the worst-case delay for evaluating the MC data through the MC array and the y-path across PVT variations. In this structure, delay techniques based on replica circuits are adopted in order to secure the sequential SRAM operation. In the SL block, a sweep-enable pulse (*SEQ2*) is generated in a sweep mode at the time when *SEQ\_EN* or the top-wordline is enabled. Therefore, the top-wordline indication block is included in the SRAM control block.
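The sequential access produced by the chain of serial latches can be illustrated with a simplified behavioural model in Python. This is only a sketch under stated assumptions: 64 SL stages (one per decoded *HOaddr* value for 256 wordlines), one advance pulse after each complete *LOaddr* cycle, and a wordline numbering of four wordlines per stage. It does not model the delay circuit, the initialization block, or the actual circuit timing, and the function name is hypothetical.

```python
# Behavioural sketch of a one-hot token moving along a serial-latch chain.
# Assumptions: 64 stages, four wordlines per stage, one advance pulse per
# completed LOaddr cycle. Circuit-level timing is not modelled.

X_PATTERNS = {
    1: (0b0001, 0b0010, 0b0100, 0b1000),  # address increment of one
    2: (0b0001, 0b0100),                  # address increment of two
    4: (0b0001,),                         # address increment of four
}


def sweep_wordlines(n_stages: int, increment: int) -> list:
    """Return the wordline indices activated during one full sweep."""
    token = [1] + [0] * (n_stages - 1)  # one-hot state after initialisation
    wordlines = []
    for _ in range(n_stages):
        stage = token.index(1)  # stage whose decoded HOaddr line is active
        for x in X_PATTERNS[increment]:
            # Wordline is selected by the active stage and the one-hot X pattern.
            wordlines.append(stage * 4 + x.bit_length() - 1)
        # Advance pulse: each latch takes the output of the previous latch.
        token = [token[-1]] + token[:-1]
    return wordlines


if __name__ == "__main__":
    print(sweep_wordlines(64, 1)[:8])
    print(sweep_wordlines(64, 2)[:8])
    print(sweep_wordlines(64, 4)[:8])
```

In this model no high-order address is ever decoded: the position of the token alone selects the group of wordlines, which is the property that allows the *HOaddr* buffers and pre-decoders to remain idle in a sweep mode.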



Figure 13. Serial latch of the proposed SRAM for low-power AWG.

Figure 14(a) shows the conventional SRAM timing diagram of the AWG in sweep modes. *HOaddr* generated by an external address generator decodes *HA\_SUM*, and *LOaddr* makes X < 3:0 > at the rising-edge of the internal clock (*ICLK*). Buffers and the pre-decoder block for *HOaddr* switch at every clock cycle consuming the dynamic current.

Figure 14(b) shows the proposed SRAM timing diagram. Because the SL block generates the *HA\_SUM* signal in place of the *HOaddr* buffers, the SL block operates once per *LOaddr* cycle. Therefore, *HOaddr* buffers are disabled to reduce the switching power.



Figure 14. Timing diagrams of (a) conventional SRAM and (b) proposed SRAM for a low-power AWG in sweep modes.

## 2.2.3 Power Analysis of AWG

Table III shows the AWG modes according to digital control signals. At the falling-edge of the sweep mode selection signal (SEQ), the sweep mode is selected according to resolution selection signals (RES < 1:0 >), making the amount of address increment one, two, or four.

| SEQ | RES<1:0> | Mode                          |
|-----|----------|-------------------------------|
| 0   | XX       | Random access mode, non-sweep |
| 1→0 | 00       | Sweep, addr. increment = $1$  |
| 1→0 | 01       | Sweep, addr. increment = $2$  |
| 1→0 | 10       | Undefined mode                |
| 1→0 | 11       | Sweep, addr. increment = $4$  |

Table III. AWG modes
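The mode selection of Table III can be expressed as a small lookup in Python. This is an illustrative sketch only: the function name is hypothetical, and raising an error for the undefined code is an assumption of the sketch, since the behaviour of the chip in that mode is not specified.

```python
# Sketch of the mode decoding in Table III.
# A falling edge on SEQ selects a sweep mode; RES<1:0> selects the increment.

SWEEP_INCREMENT = {0b00: 1, 0b01: 2, 0b11: 4}


def awg_mode(seq_falling_edge: bool, res: int):
    """Return the address increment in sweep mode, or None for random access."""
    if not seq_falling_edge:
        return None  # SEQ = 0: random access mode, RES is a don't-care
    if res not in SWEEP_INCREMENT:
        # RES<1:0> = 10 is listed as undefined; this sketch rejects it.
        raise ValueError("RES<1:0> = 10 is an undefined mode")
    return SWEEP_INCREMENT[res]


if __name__ == "__main__":
    print(awg_mode(False, 0b10))  # random access
    print(awg_mode(True, 0b00), awg_mode(True, 0b01), awg_mode(True, 0b11))
```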

Table IV shows the *HOaddr* toggle number ( $N_{toggle}$ ) within one window duration ( $t_{\omega}$ ) and the average toggle rate ( $f_{toggle}$ ) per clock cycle ( $t_{CK}$ ). To make the calculation simple, the total number of wordlines is assumed to be 256, while the AWG of the MRSS has 224. *HOaddr* toggles when one cycle of *LOaddr* is complete. For example, *HOaddr* changes at every fourth  $t_{CK}$  with the address increment of one, and at every  $t_{CK}$  with the address increment of four. The *HOaddr* average toggling rate per  $t_{CK}$  ( $f_{toggle\_HO}$ ) can be calculated as

$$f_{toggle\_HO} = \frac{\sum N_{toggle\_HO}}{N_{toggle\_LO}}, \tag{6}$$

where  $f_{toggle\_HO}$  is 0.49, 0.98, and 1.97, when RES < 1:0 > = 00, 01, and 11, respectively.

|                  | A7 | A6 | A5 | A4 | A3 | A2 | X<3:0> (*RES* = 00) | X<3:0> (*RES* = 01) | X<3:0> (*RES* = 11) |
|------------------|----|----|----|----|----|----|---------------------|---------------------|---------------------|
| Address group    | *HOaddr* A<7:2> | | | | | | *LOaddr* A<1:0> | | |
| VALUE            | 0  | 0  | 0  | 0  | 0  | 0  | 0001<br>0010<br>0100<br>1000 | 0001<br>0100 | 0001 |
|                  | 0  | 0  | 0  | 0  | 0  | 1  |        |        |        |
|                  | 0  | 0  | 0  | 0  | 1  | 0  |        |        |        |
|                  | 0  | 0  | 0  | 0  | 1  | 1  |        |        |        |
|                  | 0  | 0  | 0  | 1  | 0  | 0  |        |        |        |
|                  | 0  | 0  | 0  | 1  | 0  | 1  |        |        |        |
|                  | 0  | 0  | 0  | 1  | 1  | 0  |        |        |        |
|                  | 0  | 0  | 0  | 1  | 1  | 1  |        |        |        |
|                  | …  | …  | …  | …  | …  | …  |        |        |        |
|                  | 1  | 1  | 1  | 1  | 1  | 1  |        |        |        |
| $N_{toggle}$     | 2  | 4  | 8  | 16 | 32 | 64 | 64 × 4 | 64 × 2 | 64 × 1 |
| $f_{toggle\_HO}$ | $(2+4+8+16+32+64)/N_{toggle\_LO}$ | | | | | | 0.49 | 0.98 | 1.97 |

Table IV. Analysis of address toggle number ( $N_{toggle}$ ) and toggle rate ( $f_{toggle}$ )
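The toggle rates in Table IV can be cross-checked by counting bit transitions directly. The following Python sketch steps an 8-bit address through one repeated sweep of 256 wordlines (the simplifying assumption used above) and counts how many of the six high-order bits change per clock cycle; the function name is hypothetical.

```python
# Cross-check of Eq. (6): count HOaddr bit toggles over one repeated sweep.
# Assumes 256 wordlines, 2 low-order bits, and wrap-around at the end.


def hoaddr_toggle_rate(increment: int, n_wordlines: int = 256, lo_bits: int = 2) -> float:
    """Return the average number of HOaddr bit toggles per clock cycle."""
    addresses = list(range(0, n_wordlines, increment))
    toggles = 0
    for i, addr in enumerate(addresses):
        nxt = addresses[(i + 1) % len(addresses)]  # wrap to the first address
        changed = (addr ^ nxt) >> lo_bits          # keep only the HOaddr bits
        toggles += bin(changed).count("1")
    return toggles / len(addresses)


if __name__ == "__main__":
    for inc in (1, 2, 4):
        print(inc, round(hoaddr_toggle_rate(inc), 2))
```

In every mode the six high-order bits toggle 126 times per sweep, while the number of clock cycles per sweep is 256, 128, or 64, which gives the rates of 0.49, 0.98, and 1.97.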

This means that *HOaddr* buffers switch an average of 0.49, 0.98, and 1.97 times at every  $t_{CK}$ , respectively, depending on the value of *RES*<*1:0*>. The current consumption of one *HOaddr* buffer in the AWG is about 0.068 mA at a supply voltage 1.8 V and 19.2 MHz of clock frequency by simulation. If  $f_{toggle_HO}$  in different sweep modes is considered, the power ratio of *HOaddr* buffers to the total SRAM block in the AWG can be calculated. For example, with  $f_{toggle_HO}$  of 0.49, the total power consumption of *HOaddr* buffers at  $t_{CK}$  is 0.068 mA × 0.49 = 0.03 mA when an address increment is one. In addition, the power consumption of a 6-bit counter to make *HOaddr* is 0.14 mA. These two power values occupy about 15% of the total SRAM power consumption. When an address increment is four, *HOaddr* buffers consume about 0.068 mA × 1.97 = 0.13 mA at  $t_{CK}$ . When the power of a 6-bit counter block is also considered, it occupies about 25% of the total power consumption of the SRAM and the address generator.
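The current figures quoted above follow from a simple product, sketched below in Python. The constants are the simulated values stated in the text (0.068 mA per buffer switching event rate of one, 0.14 mA for the 6-bit counter); the function name is hypothetical.

```python
# Arithmetic behind the HOaddr buffer current quoted in the text.

I_BUFFER_MA = 0.068   # simulated HOaddr buffer current at 1.8 V, 19.2 MHz
I_COUNTER_MA = 0.14   # simulated current of the 6-bit HOaddr counter


def hoaddr_buffer_current_ma(f_toggle_ho: float) -> float:
    """Return the average HOaddr buffer current in mA for a given toggle rate."""
    return I_BUFFER_MA * f_toggle_ho


if __name__ == "__main__":
    for res, f_toggle in (("00", 0.49), ("01", 0.98), ("11", 1.97)):
        buffers = hoaddr_buffer_current_ma(f_toggle)
        print(f"RES={res}: buffers {buffers:.3f} mA, "
              f"buffers + counter {buffers + I_COUNTER_MA:.3f} mA")
```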

Figure 15 compares the dynamic power consumption of the conventional SRAM block and the proposed one. The *HOaddr* buffers and the 6-bit counter consume 15% to 25% of the total power in the conventional DDS-based architecture, depending on the *HOaddr* toggling ratio. In the proposed architecture, the SL and the x3\_pulse block are added, which consume about 6% of the total power. The Ypath consumes about 20% for precharging, sensing, and writing operations. The control block dissipates about 58% of the total power in generating the internal control signals of the SRAM. Considering the power of these sub-blocks, the proposed architecture reduces the total power consumption by 9%, 12%, and 18% when *RES*<*1:0*> = 00, 01, and 11, respectively.



Figure 15. Power consumption of SRAM sub-blocks and an address generator in a conventional and a proposed AWG.



Figure 16. Power dissipation breakdown. (a) Conventional AWG at RES=00, and (b) RES=11. (c) Proposed AWG.

Figure 16 shows the power dissipation breakdown of the conventional and proposed AWGs, simulated at  $f_{clk}$  = 19.2 MHz and  $V_{DD}$  = 1.8 V. The DAC and LPF, which are identical in the conventional and proposed AWGs, consume about 63% of the AWG power. The SRAM and address generator account for 12.7% and 14.7% of the power of the proposed and conventional AWG, respectively. The latch buffers, which transfer the bundle of digital data between the SRAM and the DAC, consume about 23% of the total power together with the buffers for probing the internal signals of the AWG.

Figure 17 shows the architecture of the R-2R DAC of the AWG. It is built from primitive 1-bit DAC cells configured as an 11-bit DAC.  $V_{HIGH}$  and  $V_{LOW}$  define the swing range of the DAC, which can be controlled to maximize the correlation output of the MRSS and the MF.



Figure 17. Schematic diagram of 11-bit R-2R type of DAC in AWG.
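As a rough illustration of how  $V_{HIGH}$  and  $V_{LOW}$  set the swing range, the sketch below evaluates the textbook ideal transfer function of an N-bit voltage-mode DAC. The ideal relation, the function name, and the example voltages are assumptions made for illustration; they are not taken from the circuit in Figure 17.

```python
N_BITS = 11  # DAC resolution stated in the text


def ideal_dac_output(code, v_high, v_low, n_bits=N_BITS):
    """Ideal output voltage for a digital code between v_low and v_high.

    Assumes a perfectly linear DAC: each LSB step is (v_high - v_low) / 2**n_bits.
    """
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code out of range for the given resolution")
    return v_low + (v_high - v_low) * code / 2 ** n_bits


# Hypothetical reference voltages, chosen only for the example.
V_HIGH, V_LOW = 1.2, 0.6
for code in (0, 1024, 2047):
    print(code, ideal_dac_output(code, V_HIGH, V_LOW))
```

Under this idealization, changing  $V_{HIGH}$  and  $V_{LOW}$  rescales and shifts the whole waveform without altering the stored SRAM codes.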

As the DAC generates a window, the dynamic specification of the DAC influences the MRSS and MF performance. Since thermal noise is mixed with the output of the AWG, the spurious noise of the DAC undergoes reciprocal mixing with the thermal noise. Interferers of the incoming signals are therefore folded in-band by the spurious noise at the output of the DAC, which degrades the signal detection. The DAC shows a spurious-free dynamic range (SFDR) of 45 dBc, as shown in Figure 18. Given that the jammer rejection ability of the signal path is 35 dB [9], the noise increase due to reciprocal mixing can be neglected, and the largest spurious noise of the DAC does not severely degrade the sensitivity of the signal detection.



Figure 18. Frequency response of 11-bit R-2R type of DAC in AWG.

# 2.2.4 Measurement Results of MRSS-AWG

The AWG is fabricated in a 0.18- $\mu$ m CMOS technology for the MRSS functionality and is called the MRSS-AWG. Figure 19 shows the die micrograph of the MRSS-AWG in the MRSS IC. The MRSS-AWG die size is 1.46 × 0.93 mm<sup>2</sup>, which is about 11% of the 4.8 × 2.4 mm<sup>2</sup> MRSS chip.



Figure 19. Die microphotograph of MRSS-AWG in the MRSS IC.

Figure 20 shows the measurement result of a Hann window ( $cos^4$ ) having  $f_{\omega}$  from 25 kHz to 800 kHz. This is successfully measured from the MRSS-AWG by changing the address increment signals, *RES*<1:0>, and  $f_{clk}$  from 4.8 MHz to 38.4 MHz.



Figure 20. Measurement results of MRSS-AWG showing a multi-resolution  $cos^4$ .

#### **2.3 AWG for Analog Matched Filter**

### 2.3.1 Synchronizable MF-AWG Architecture

An important functionality of the AWG for the analog MF is its ability to align a waveform with an arbitrary starting point. The MRSS requires measuring only the power of the incoming signal; thus the waveform starting point is not important. However, the timing information is crucial for the MF, since it must be possible to determine precisely the maximum correlation point on the time axis. Therefore, the AWG in this work incorporates an additional functionality such that the starting point of the waveform can be synchronized with an external input trigger. The AWG for the MF is called the MF-AWG. Figure 21 shows a block diagram of the MF-AWG. To generate the waveforms for the separate I and Q paths, the MF-AWG is composed of a 22-bit SRAM, four 11-bit DACs, and four LPFs. The SRAM is configured as four banks of 256×22b, which store the digitized chirp signal or wavelet signal to be converted to analog signals through the four DACs. The 22-bit SRAM data are read out as two 11-bit words to two latch blocks, which provide the complementary and synchronized outputs for the four DACs.



Figure 21. Block diagram of MF-AWG.

Figure 22 shows a more detailed schematic diagram of the SRAM and the address generator block of the MF-AWG. To generate the waveform continuously from the MF-AWG, the digital data stored in the SRAM are accessed continuously by the addresses produced in the address generator block. Of the 10 address bits, the 2-bit low-order addresses (LOaddrs) are generated by the 2-bit counter, whereas the 6-bit high-order addresses (HOaddrs) for the x-decoders are generated in the embedded serial latch to reduce the power consumption in the address buffer blocks, as in the MRSS-AWG. The 4-bank bit-cell array can store different types of waveforms, which can be triggered and accessed digitally for other pulse-compression purposes. To make the MF-AWG synchronizable, an additional address generation circuit is adopted. For the external synchronizing functionality, the *lat in* signal of the SL block in the x-decoder is not fixed but is selected by the HA SUM signal, which is decoded from the external HOaddr. If HA SUM is high, it blocks *lat in down* from the preceding *lat delay up* and turns on P1 in the lat in selection block. This signal enters the input of the SL block and sets the starting point of the waveform together with the preset X<3:0>.

Figure 23 shows the timing diagram of the synchronizable SRAM and address generator of the MF-AWG. The external 10-bit addresses are used to set the starting point of the MF-AWG. By assigning the desired starting point of the waveform in the normal mode, the waveform is generated from that point in the sweep mode. Once one cycle of the waveform has been generated, an internal reset signal is generated to set the latch data in the x-decoder so that the waveform starts again from the pre-assigned point.






## 2.3.2 Measurement Results of MF-AWG

The duration of the waveform can be controlled externally through the clock frequency or the resolution control bits of the MF-AWG. This waveform flexibility suits digitally controlled multi-resolution pulse-compression technology. Figure 24(a) shows the measurement results of a chirp waveform with  $f_{\omega}$  from 9.2 kHz to 18.7 kHz, obtained with internal clock frequencies from 9.6 MHz to 19.2 MHz and address increments of 1 or 2. Figure 24(b) shows the measurement results of a Daubechies wavelet, which can be used in the MF instead of a chirp waveform in MIMO radar applications because of its orthogonal characteristics.



Figure 24. Measurement results of (a) chirp signal, and (b) Daubechies wavelet of MF-AWG.

Figure 25 shows the measurement results of the arbitrary starting characteristic of the MF-AWG. To identify the exact distance at which the waveform has been reflected, the starting point of the waveform needs to be controlled and adjusted digitally during the correlation. For example, the MF-AWG is measured for the chirp waveform with delays of 0, 1/4, 2/4, and 3/4 of T, where T is the period of the whole waveform. This is achieved by controlling the external trigger signals.



Figure 25. Measurement results of arbitrary starting chirp waveform of MF-AWG.



Figure 26. Chirp waveform (a) ideally generated in the time domain by MATLAB<sup>TM</sup>, (b) sampled at  $f_{clk}$ =38.4 MHz, and transferred to a frequency domain by N=2<sup>19</sup> point FFT, (c) resampled at 100-MS/s after Butterworth 2<sup>nd</sup> order LPF with a 10.78 MHz cutoff frequency, and (d) measured at 100-MS/s.



Figure 26 continued.

To analyze the measurement results of the AWG, a linear chirp signal with a 5.39-MHz bandwidth is measured from the MF-AWG and compared with the signal generated by MATLAB<sup>TM</sup>, as shown in Figure 26. The study shows that if the oversampling frequency is a non-integer multiple of the sampling frequency of the original waveform, the side lobes are affected by spectral leakage. Therefore, to account for the effect of the oversampling frequency on the side lobes and to analyze the measurement results fairly, we sampled the ideal simulation data at 38.4 MHz, transferred them to the frequency domain by an N=2<sup>19</sup> point FFT, and resampled them at 100 MS/s, as shown in Figure 26(a), (b), and (c), respectively. To cut off the noise above 10 MHz, we applied a 2<sup>nd</sup>-order Butterworth lowpass filter with a 10.78-MHz cutoff frequency, as shown in Figure 26(c). The measured chirp window is originally generated in the time domain at a 38.4-MHz clock frequency by the MF-AWG. It is then sampled by an oscilloscope (LeCroy<sup>TM</sup> 7300A) at 100 MS/s for I and Q separately and converted to the frequency domain using an N=2<sup>19</sup> point FFT, as shown in Figure 26(d). The output is rescaled to the dBm level and compared with Figure 26(c) to evaluate the AWG performance. The auto-correlation factors of the simulated output *s*(*t*) and the measured output *m*(*t*) are given by

$$R_{ss} = s * s^*, \tag{7}$$

$$R_{mm} = m * m^*, \tag{8}$$

The maximum cross-correlation factor between *m*(*t*) and *s*(*t*) is given by

$$R_{ms} = \max\left(\left[\frac{m}{\sqrt{R_{mm}}}\right] * \left[\frac{s}{\sqrt{R_{ss}}}\right]^*\right). \tag{9}$$

The auto-correlation factor of the ideal signal generated by MATLAB<sup>TM</sup> is  $5.02 \times 10^5$ , and that of the measured signal is  $8.36 \times 10^4$ . The maximum cross-correlation factor of the two signals, each normalized by the square root of its auto-correlation value, is 0.94. This analysis suggests that the measured output from the MF-AWG correlates with the ideal signal by a factor of 0.94. Therefore, the discrepancy between those signals in the frequency response is about 6%.
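The normalization in (7) to (9) can be sketched in a few lines of plain Python. The sketch assumes real-valued sequences (so the conjugate has no effect) and takes each auto-correlation factor as the zero-lag peak of the auto-correlation, i.e. the signal energy; the function names and the test chirp are hypothetical and unrelated to the measured data.

```python
import math


def correlate_full(a, b):
    """Linear cross-correlation of real sequences: c[k] = sum_n a[n + k] * b[n]."""
    out = []
    for k in range(-(len(b) - 1), len(a)):
        out.append(sum(a[n + k] * b[n]
                       for n in range(len(b)) if 0 <= n + k < len(a)))
    return out


def autocorr_factor(x):
    """Peak of the auto-correlation, as in (7) and (8)."""
    return max(correlate_full(x, x))


def max_cross_corr(m, s):
    """Maximum normalized cross-correlation, as in (9)."""
    r_mm = autocorr_factor(m)
    r_ss = autocorr_factor(s)
    m_n = [v / math.sqrt(r_mm) for v in m]
    s_n = [v / math.sqrt(r_ss) for v in s]
    return max(correlate_full(m_n, s_n))


# Hypothetical "ideal" chirp and a scaled, delayed copy standing in for a measurement.
s = [math.cos(math.pi * 0.002 * n * n) for n in range(64)]
m = [0.0] * 5 + [0.4 * v for v in s]
print(max_cross_corr(m, s))
```

Because of the normalization, a scaled and delayed copy of the reference yields a factor of 1 (up to rounding), so values below 1, such as the 0.94 reported above, reflect differences in waveform shape rather than in amplitude or delay.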

An ideal Daubechies 8<sup>th</sup>-order wavelet is sampled at 38.4 MHz, converted to the frequency domain by an N=2<sup>19</sup> point FFT, and resampled at 100 MS/s, as shown in Figure 27(a), (b), and (c), respectively. The data generated at 38.4 MHz are sampled at 100 MS/s by an oscilloscope and converted to the frequency domain by an N=2<sup>19</sup> point FFT, as shown in Figure 27(d). The auto-correlation factor of the ideal wavelet signal is  $4.36 \times 10^4$ , and that of the measured signal is  $6.56 \times 10^3$ . The maximum cross-correlation factor is 0.988. These results indicate that the measured Daubechies wavelet correlates more closely with its ideal signal than the measured chirp waveform does.



Figure 27. Daubechies wavelet (a) ideally generated in the time domain by MATLAB<sup>TM</sup>, (b) sampled at  $f_{clk}$ =38.4 MHz, and transferred to a frequency domain by N=2<sup>19</sup> point FFT, (c) resampled at 100-MS/s through Butterworth 2<sup>nd</sup> order LPF with a 10.78 MHz cutoff frequency, and (d) measured at 100-MS/s.





Figure 27 continued.

The layout of the MF-AWG, excluding the LPF, is shown in Figure 28. The SRAM and the address generation logic occupy about 80% of the total MF-AWG area, the two latch blocks about 6%, and the four DACs about 14%.



Figure 28. Layout of SRAM, Latch, and DAC of MF-AWG.

Figure 29 shows the die micrograph of the analog MF having the MF-AWG. The MF is fabricated in a 0.18- $\mu$ m CMOS technology. The MF-AWG area is 1.7 × 1.5 mm<sup>2</sup> occupying about 45% of the total area of the MF of 3.13 × 1.81 mm<sup>2</sup>.



Figure 29. Die micrograph of MF-AWG in the MF.

#### 2.4 Comparison of MRSS-AWG and MF-AWG

A fully integrated SRAM-based AWG is developed for analog signal processing. One version is for the MRSS in a CR application (MRSS-AWG), and the other is for the MF with an analog pulse compression function (MF-AWG). The AWG uses a low-power architecture that merges the address generator into the SRAM so that the redundant address buffers do not operate in the sweep mode. For the MRSS, the MRSS-AWG generates the Hann window, and the fabricated chip successfully demonstrates the multi-resolution windows. For the MF, the MF-AWG successfully generates the chirp window and the Daubechies wavelet with the arbitrary starting characteristic.

Table V summarizes the characteristics of the MRSS-AWG and the MF-AWG. The MF-AWG is about twice as large as the MRSS-AWG because it generates the waveforms for both the I and Q paths at the same time, and its power consumption increases in proportion to the physical size. The AWG shows SFDR = -45 dBc at  $f_{clk}$  = 38.4 MHz and  $f_{out}$  = 37 kHz.

| Characteristics         |                                    | MRSS-AWG                                              | MF-AWG                                      |
|-------------------------|------------------------------------|-------------------------------------------------------|---------------------------------------------|
| Applications            |                                    | Multi-resolution spectrum sensing for cognitive radio | Analog matched filter for pulse compression |
| Resolution control bits |                                    | LOaddr[1:0]                                           |                                             |
| RAM                     | Words                              | 896                                                   | 1024                                        |
|                         | Bits per word                      | 11b                                                   | 22b                                         |
|                         | Mux                                | 4                                                     | 4                                           |
| DAC                     |                                    | R-2R type 11-bit                                      |                                             |
| LPF                     |                                    | 6<sup>th</sup> order Chebyshev type                   |                                             |
| Performance             | Area                               | $1.46 \times 0.93 \text{ mm}^2$                       | $1.7 \times 1.5 \text{ mm}^2$               |
|                         | Power (f<sub>CLK</sub> = 19.2 MHz) | 15 mW                                                 | 31 mW                                       |
|                         | SFDR                               | -45 dBc                                               |                                             |

Table V Summary of characteristics of MRSS-AWG and MF-AWG

| Design        | Type                       | Waveform     | Application | SFDR<br>(dBc) | Process<br>(µm) | Maximum<br>f <sub>clk</sub><br>(MHz) | Supply<br>Voltage<br>(V) | Area<br>(mm <sup>2</sup> ) | Power<br>(µW/MHz) |
|---------------|----------------------------|--------------|-------------|---------------|-----------------|--------------------------------------|--------------------------|----------------------------|-----------------------|
| This<br>Paper | On-chip<br>AWG             | Arbitrary    | MRSS/MF     | 45            | 0.18            | 38.4                                 | 1.8                      | 1.35                       | 750                   |
| [19]          | On-chip<br>DDS             | Sinusoidal   | General     | 101           | 0.25            | 201                                  | 2.5                      | 0.036                      | 62                    |
| [24]          | On-chip<br>pulse<br>shaper | UWB<br>pulse | UWB         | 30            | 0.18            | 2500                                 | 1.8                      | 0.32                       | 50                    |
| [34]          | Off-chip<br>AWG            | Arbitrary    | Radar       | 45            | N/A             | 1300                                 | N/A                      | N/A                        | 6 W<br>(DC)           |

Table VI Comparison of waveform generators

Table VI compares the waveform generators according to their applications. Since the MRSS-AWG and MF-AWG are fully integrated in an SOC to generate an arbitrary waveform, they are not easily compared with an on-chip DDS or an off-chip AWG. A DDS generates only sinusoidal signals, and an off-chip AWG consumes a large amount of power, whereas an on-chip AWG generates arbitrary waveforms while consuming more power than a DDS and less than an off-chip AWG.

# **CHAPTER III**

# Low-Power AWG for Low-Power MRSS

## **3.1 Overview**

The objective of the proposed research is to develop a fully integrated low-power SRAM-based arbitrary waveform generator. The low-power AWG (LP-AWG) could be implemented using partial swing techniques of a self-deactivated data transition bit (DTB) structure. The LP-AWG will be applied to the multi-resolution spectrum sensing for a cognitive radio application and the analog matched filter for a radar pulse compression application in a deep sub-micron CMOS technology.

The proposed LP-AWG exploits the repetitive and predictable window stored in the SRAM. A partial accessing scheme using the self-deactivated data transition bit (DTB) structure is proposed. In addition, a short-current reduction buffer is proposed to reduce the short current in the AWG when the diode-connected low-swing signaling scheme is adopted.

## 3.2 Self-Deactivated Data Transition Bit (DTB) Scheme

The capacitance of the internal nodes can be analyzed after the layout of the SRAM in the AWG is extracted. Table VII shows the power analysis of the SRAM based on the capacitance of each internal node, its toggling rate, and its voltage swing range. If the SRAM is configured as four banks of 256×11b arrays, the power in the SRAM is consumed mainly in the BIT lines (31.7%) and the DOUT lines (36.3%). Therefore, the design focuses on those parts to reduce the power consumption of the SRAM.

| Internal<br>Nodes | Cap.   | Toggling<br>Rate   | Total Cap.           | Swing Range                                | Power Consumption<br>(= Total Cap. × Swing<br>Range) |
|-------------------|--------|--------------------|----------------------|--------------------------------------------|------------------------------------------------------|
| BIT line          | 220 fF | $11 \times 4 = 44$ | 220 × 44<br>= 9.6 pF | $\Delta V < VDD$<br>( $\approx 0.2 VDD$ )  | 1.92<br>(31.7%)                                      |
| Wordline          | 130 fF | 1                  | 0.13 pF              | VDD                                        | 0.13<br>(2.1%)                                       |
| HOaddr            | 280 fF | 2 (max)            | 280 × 2<br>= 0.56 pF | VDD                                        | 0.56<br>(9.2%)                                       |
| X3~X0             | 150 fF | 1                  | 0.15 pF              | VDD                                        | 0.15<br>(2.5%)                                       |
| PRE               | 220 fF | 5                  | 220 × 5<br>= 1.1 pF  | VDD                                        | 1.1<br>(18.2%)                                       |
| DOUT              | 200 fF | 11                 | 200 × 11<br>= 2.2 pF | VDD                                        | 2.2<br>(36.3%)                                       |

Table VII Power analysis of SRAM of AWG
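The percentages in Table VII follow from weighting each node's total capacitance by its swing range. The sketch below recomputes them from the table's own "Total Cap." column (which lists the BIT-line value as 9.6 pF) with the swing expressed as a fraction of VDD; the variable and function names are hypothetical.

```python
# (node, total capacitance in pF, swing as a fraction of VDD), from Table VII.
NODES = [
    ("BIT line", 9.6, 0.2),
    ("Wordline", 0.13, 1.0),
    ("HOaddr", 0.56, 1.0),
    ("X3~X0", 0.15, 1.0),
    ("PRE", 1.1, 1.0),
    ("DOUT", 2.2, 1.0),
]


def power_shares(nodes):
    """Relative power per node, in percent, using total cap. x swing as the weight."""
    weights = {name: cap * swing for name, cap, swing in nodes}
    total = sum(weights.values())
    return {name: 100.0 * w / total for name, w in weights.items()}


shares = power_shares(NODES)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

The BIT lines and the DOUT lines together account for roughly two thirds of the weighted total, which is why the design effort concentrates on them.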

For the power analysis considering the window characteristics, the toggling ratio of a Hann window  $(cos^4)$  is analyzed when it is stored in the SRAM as 11-bit digital data (Table VIII). According to the analysis, the proposed DTB structure can reduce the toggling rate by about 34%. Moreover, if the bank-selected partial access scheme is added, the power consumption can be reduced further.

The digital data stored in the SRAM are read out to make a Hann window for correlation with an incoming signal in the MRSS. The whole memory cell (MC) array in the SRAM does not need to be accessed at every clock cycle to make an adequate waveform. A straightforward way to save power is to divide the MC array into several blocks and access the blocks selectively when they are needed.

| Data[10:0]                                                                                                  | No. of Repetition (NoR) |
|-------------------------------------------------------------------------------------------------------------|-------------------------|
| 1000XXXXXXX                                                                                                 | 62                      |
| 1001XXXXXXX                                                                                                 | 10                      |
| 1010XXXXXXX                                                                                                 | 8                       |
| 1011XXXXXXX                                                                                                 | 7                       |
| 1100XXXXXXX                                                                                                 | 7                       |
| 1101XXXXXXX                                                                                                 | 7                       |
| 1110XXXXXXX                                                                                                 | 9                       |
| 1111XXXXXXX                                                                                                 | 36                      |
| 1110XXXXXXX                                                                                                 | 9                       |
| 1101XXXXXXX                                                                                                 | 7                       |
| 1100XXXXXXX                                                                                                 | 7                       |
| 1011XXXXXXX                                                                                                 | 7                       |
| 1010XXXXXXX                                                                                                 | 8                       |
| 1001XXXXXXX                                                                                                 | 10                      |
| 1000XXXXXXX                                                                                                 | 62                      |
| Toggling no. of 4-bit MSB = $15 \times 4$                                                                   |                         |
| Toggling no. of 7-bit LSB = $256 \times 7$                                                                  |                         |
| Toggling no. of 11-bit: $256 \times 11 = 2816 (100\%) \rightarrow 256 \times 7 + 15 \times 4 = 1852 (66\%)$ |                         |

Table VIII Analysis on data toggling rate of Hann Window
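The totals in Table VIII can be checked directly from the repetition counts. The sketch below follows the table's own counting convention (each of the 15 MSB groups costs one 4-bit MSB access, while the 7 LSBs are accessed for every word); the variable names are hypothetical.

```python
# (4-bit MSB pattern, number of consecutive words sharing it), from Table VIII.
MSB_RUNS = [
    ("1000", 62), ("1001", 10), ("1010", 8), ("1011", 7), ("1100", 7),
    ("1101", 7), ("1110", 9), ("1111", 36), ("1110", 9), ("1101", 7),
    ("1100", 7), ("1011", 7), ("1010", 8), ("1001", 10), ("1000", 62),
]

N_MSB, N_LSB = 4, 7

n_words = sum(count for _, count in MSB_RUNS)      # words per window
n_groups = len(MSB_RUNS)                           # MSB groups per window
full_access = n_words * (N_MSB + N_LSB)            # every bit, every word
dtb_access = n_words * N_LSB + n_groups * N_MSB    # MSBs only once per group
ratio = dtb_access / full_access

print(n_words, n_groups, full_access, dtb_access, round(ratio, 2))
```

The repetition counts sum to 256 words, and the access count drops from 2816 to 1852, i.e. to about 66% of the original, which corresponds to the reduction of about 34% quoted in the text.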

Figure 30 shows a conceptual diagram showing data stored in a RAM of the LP-DWG. To have a band-limiting property for an MRSS operation, a DWG is required to generate a window with the frequency response of the low-pass filtering characteristic such as a Hann window [8], [9]. For a Hann window, the most significant bits (MSBs) show a lower transition probability than the least significant bits (LSBs). Because the bit-cell data are known in advance before they are written in a RAM, the unnecessary power consumption can be avoided by adding a DTB in a bit-cell array. Then, MSBs are accessed only when it is necessary to overwrite the output of a RAM using the new MSBs.



Figure 30. Conceptual diagram showing data stored in a RAM of the LP-DWG.

In a normal RAM without the DTB, the number of RAM bit-cell accesses for the duration of one window is

$$N_{bit} = BPW \times N_{word} , \qquad (10)$$

where *BPW* is the number of bits per word, and  $N_{word}$  is the number of words per window. Then, the number of RAM bit-cell accesses in the conventional DTB structure [35], [36] is

$$N_{bit}(\text{conv. DTB}) = (BPW-B_{sa}+\overbrace{1}^{\text{DTB evaluation}}) \times N_{word} + B_{sa} \times N_{MSBword}, \tag{11}$$

where  $B_{sa}$  is the number of bits conditionally accessed by a DTB, and  $N_{MSBword}$  is the number of words at which MSBs are accessed. Not all *BPW* bits have to be accessed at every clock cycle; the MSBs are accessed conditionally, only when the MSBs in the next cycle differ from the previous ones at the RAM output. In the conventional structure,  $N_{bit}$  is increased by the DTB evaluation bit at every clock. In the proposed DTB structure, however, the DTB itself is evaluated only when necessary. Therefore,  $N_{bit}$  is given by

$$N_{bit}(\text{prop. DTB}) = (BPW-B_{sa}) \times N_{word} + (B_{sa}+1) \times N_{MSBword}. \tag{12}$$

Then, a power saving factor  $P_s$  of the proposed DTB over the normal scheme is given by

$$P_{s} = \frac{(BPW-B_{sa}) \times N_{word} + (B_{sa}+1) \times N_{MSBword}}{BPW \times N_{word}}. \tag{13}$$

Figure 31 shows the self-deactivated DTB adopted in the LP-DWG. The RAM is configured as four sub-arrays of 256×11b accessed by two *YMUX* addresses. The window,  $cos^4$ , is stored as 11b data [8], [9], which are divided into 4 MSBs and 7 LSBs. While the 7 LSBs are accessed at every clock, the 4 MSBs are accessed conditionally according to the DTB. To avoid evaluating the DTB at every clock cycle, the DTB is enabled through the second local wordline (*LWL2*) by the DTB itself. The power saving factor of the proposed DTB is 0.66 according to (13), where *BPW* = 11,  $N_{word}$  = 256,  $B_{sa}$  = 4, and  $N_{MSBword}$  = 15. This means that, by limiting the operation of the DTB, the proposed DTB reduces the power consumption in the bit-cell array by about 11% relative to the conventional DTB, whose power saving factor is 0.74.
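Equations (10) to (13) can be evaluated directly with the parameters given above. The sketch below is only a numerical check; the function names are hypothetical, and the ratio for the conventional DTB is formed in the same way as  $P_s$  in (13), using (11) in the numerator.

```python
def n_bit_normal(bpw, n_word):
    """Eq. (10): every bit of every word is accessed."""
    return bpw * n_word


def n_bit_conv_dtb(bpw, b_sa, n_word, n_msb_word):
    """Eq. (11): the DTB is evaluated at every word, MSBs only when they change."""
    return (bpw - b_sa + 1) * n_word + b_sa * n_msb_word


def n_bit_prop_dtb(bpw, b_sa, n_word, n_msb_word):
    """Eq. (12): the DTB is evaluated only together with the MSBs."""
    return (bpw - b_sa) * n_word + (b_sa + 1) * n_msb_word


BPW, B_SA, N_WORD, N_MSB_WORD = 11, 4, 256, 15

normal = n_bit_normal(BPW, N_WORD)
conv = n_bit_conv_dtb(BPW, B_SA, N_WORD, N_MSB_WORD)
prop = n_bit_prop_dtb(BPW, B_SA, N_WORD, N_MSB_WORD)

ps_prop = prop / normal          # Eq. (13)
ps_conv = conv / normal          # same ratio for the conventional DTB
relative_saving = 1 - prop / conv

print(normal, conv, prop)
print(round(ps_prop, 3), round(ps_conv, 3), round(relative_saving, 3))
```

With these parameters the access counts are 2816 without a DTB, 2108 for the conventional DTB, and 1867 for the proposed DTB, giving ratios of about 0.663 and 0.749 and a relative saving of about 11% for the proposed scheme over the conventional one.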

Figure 31(b) shows the timing diagram of the proposed DTB scheme. When DTB=0 and *LWL2*=0, the local SA enable signal (*LSENSE*) is disabled to deactivate the SAs of the MSBs. Because the bit-lines of the DTB bit-cell stay at the precharged level, VDD, unlike in the conventional scheme, a mismatch SA is adopted to sense the DTB bit-cell status. Figure 31(c) shows the mismatch SA, which drives *LSENSE* high or low according to the DTB bit-cell data.








Figure 31. A self-deactivated data transition bit (DTB) of the LP-DWG showing (a) schematic, (b) timing diagram, and (c) mismatch sense amplifier.



Figure 31 continued.

# **3.3 Diode-Connected Low-Swing Signaling Scheme with a Short Current (I<sub>SHORT</sub>) Reduction (SR) Buffer**

Figure 32 shows the diode-connected low-swing signaling scheme with the short current ( $I_{SHORT}$ ) reduction buffer embedded in the latch block of the LP-AWG. This scheme aims to reduce the power consumed by repetitive switching on a long signal bus by using the diode-connected signaling scheme [37]. Techniques for reducing the leakage current in a level shifter using multiple voltages were addressed in [38]. However, they overlooked the  $I_{SHORT}$  issue in the termination buffer, which occurs during the input transition period, especially from the low to the high state. To reduce  $I_{SHORT}$ , a diode is added in the GND path of the buffer. When OUTd is expected to change from GND to VDD, the diode reduces the current in the buffer by raising the X2 level above GND. Once OUTd reaches VDD, OUTd turns on N2 to force the X2 level down to GND. This reduces  $I_{SHORT}$  in the buffer through the effect of GND bouncing at  $T_{EN1}$ . Similarly, when OUTd goes from VDD to GND, X1 falls below VDD and decreases  $I_{SHORT}$  from VDD to GND through VDD bouncing at  $T_{EN2}$ . Although the OUTd evaluation time is relatively long compared with that of a normal inverter buffer, this scheme is very effective in non-performance-critical paths of the system together with the multiple-voltage scheme. In the low-power MRSS (LP-MRSS), the operation of making the waveforms is not the bottleneck of the overall performance. Therefore, these schemes can be applied in the appropriate portions of the LP-MRSS.



Figure 32. Diode-connected low-swing signaling scheme with an  $I_{SHORT}$  reduction (SR) buffer.

# 3.4 Charge Recycling with Push-Pull Level Converter for Asynchronous Design

#### 3.4.1 Introduction

A low-power technique to drive high-capacitive load lines has been one of the challenging issues for low-power integrated circuits (ICs) [39]. Complex ICs are divided into sub-blocks that are optimized for multiple functions, and the non-timing-critical blocks have a local VDD lower than that of the timing-critical blocks [40]. However, the receiver block consumes redundant dynamic power for the extra control signal that is required to reduce the short-circuit power consumption (IS). For asynchronous receivers, level converters (LCs) were proposed for multi-VDD schemes [41]. The LCs can reduce IS as an interface between blocks with different VDD. However, they have not been fully investigated over the whole range of the input swing. In particular, when a high-capacitive load line toggles slowly, IS is still a big challenge. A low-power Schmitt trigger was developed to reduce IS by changing its hysteresis according to the input transition [42]. However, the output of the Schmitt trigger does not stay at full VDD or GND, because either the PMOS or the NMOS diode turns off in the idle state. It is therefore of limited use as a low-power LC because of the DC leakage current in the fan-out circuit of the Schmitt trigger. Meanwhile, charge-recycling (CR) schemes have been utilized to drive high-capacitive load lines [43]. The charge in a load line is stored in a dummy capacitor before being discarded to GND. It is then injected back into the load line from the dummy capacitor before the load line is actively driven from VDD. However, CR requires a synchronous control signal for the receiver to reduce IS during the charge-recycling period, so the extra power consumption of the control signal diminishes the advantage of CR. Moreover, it has been difficult to apply CR to asynchronous logic. In this paper, an enhanced LC is developed to decrease IS in CR. The proposed LC adaptively changes the input logic threshold voltage according to the input transition to decrease IS, so it can be used for asynchronous circuits.

# 3.4.2 Asynchronous and Synchronous Charge Recycling

Figure 33 shows an asynchronous charge recycling scheme with an inverter receiver (ACR\_IR) and a synchronous CR scheme with a clocked inverter receiver (SCR\_CIR). A scheme is called synchronous or asynchronous depending on whether or not the receiver has a control signal. In ACR\_IR, the receiver consumes IS during the charge-recycling time (TEQ), because the receiver input stays at an equalization level (VEQ) between VDD and GND. In SCR\_CIR, CLK is used to disconnect the path from VDD to GND and block IS during TEQ, at the cost of an additional ID for CLK.

Figure 34 shows the timing diagram and power consumption of *ACR\_IR*, *SCR\_CIR*, and an inverter driver with an inverter receiver (*ID\_IR*). *ID\_IR* consumes dynamic power (ID) during the charging time. Meanwhile, ACR\_IR needs TEQ twice, once for transferring charge to D\_storage and once for withdrawing charge from it. Then, D is driven to full VDD after the second TEQ. Figure 34(c) shows that the benefit of the small ID of ACR\_IR is diminished by IS. Figure 34(d) shows the power consumption of SCR\_CIR. Even though SCR\_CIR eliminates IS during TEQ, it consumes additional power for CLK (ICLK). Because ICLK can be as large as the ID of ID\_IR, and it is consumed twice, SCR\_CIR shows larger power consumption than both ID\_IR and ACR\_IR. Therefore, SCR\_CIR is not a good candidate for low-power applications. To reduce ID using ACR, it is necessary to develop a receiver circuit that reduces IS without a control signal.
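The charge transfer described above can be illustrated with a minimal sketch. It assumes ideal switches, complete equalization within each TEQ, and a single storage capacitor standing in for D\_storage; the function name and the capacitor values are hypothetical and are not taken from the design. The sketch tracks only the charge drawn from VDD (the ID component) and ignores IS in the receiver.

```python
def supply_energy_per_cycle(c_load, c_store, vdd, cycles=50):
    """Energy drawn from VDD in one switching cycle once charge recycling has settled.

    Idealized model (assumption): lossless switches and full charge sharing
    between the load line and the storage capacitor during each T_EQ.
    """
    v_store = 0.0  # storage capacitor starts empty
    energy = 0.0
    for _ in range(cycles):
        # Equalization before a rising edge: the load (at GND) withdraws
        # charge from the storage capacitor.
        v_eq = c_store * v_store / (c_load + c_store)
        v_store = v_eq
        # The driver lifts the load from v_eq to VDD; only this charge
        # is supplied by VDD.
        energy = c_load * (vdd - v_eq) * vdd
        # Equalization before a falling edge: the load (at VDD) transfers
        # charge to the storage capacitor; the rest is discarded to GND.
        v_eq = (c_load * vdd + c_store * v_store) / (c_load + c_store)
        v_store = v_eq
    return energy


if __name__ == "__main__":
    c = 1e-12   # hypothetical 1 pF load and storage capacitors
    vdd = 1.8
    recycled = supply_energy_per_cycle(c, c, vdd)
    plain = c * vdd ** 2  # inverter driver without recycling
    print(f"recycled / plain = {recycled / plain:.3f}")
```

Under these assumptions, with equal load and storage capacitances the supply energy per cycle settles at two thirds of that of a plain inverter driver.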





Figure 33. Asynchronous charge recycling with an inverter receiver (*ACR\_IR*) and synchronous *CR* with clocked inverter receiver (*SCR\_CIR*).



Figure 34. Timing diagram and power consumption of *ACR\_IR*, *SCR\_CIR*, and inverter driver with an inverter receiver (*ID\_IR*).

# 3.4.3 Proposed Push-Pull Level Converter (PPLC)

Figure 35 shows the conventional LC [41] and the proposed push-pull LC (PPLC). The conventional LC is composed of an inverter receiver (P2, N2), a PMOS diode (P0), and a PMOS trigger (P1), so it is called a PMOS-driven level converter (PLC). PLC reduces IS only partially during ACR operation. Therefore, to reduce IS during the whole period, a symmetric type of LC such as PPLC is necessary. PPLC adds an NMOS diode (N0) and an NMOS trigger (N1) to the conventional PLC. N0 keeps the Z node higher than GND during the second TEQ. As DQ transitions to VDD, DQ turns on the NMOS N1, connecting the Z node to GND.



Figure 35. *PLC* and *PPLC* showing initial status starting to consume *I*<sub>S</sub> at *LtoH* transition.



Figure 36. (a) Transient response, and (b) DC response of *PLC* and *PPLC*.




Transient & DC Analysis of PLC and PPLC: Figure 36(a) compares the transient responses when D transitions from low to high (LtoH). PLC starts to operate when D reaches 0.52 V, which makes VGS of N2 higher than the threshold voltage (Vth), as denoted in Figure 35. Meanwhile, PPLC starts to operate at 1.46 V, where VGS of N2 is 0.81 V and VDS of N0 is 0.65 V. Figure 36(b) compares the static voltage transfer characteristics (VTC) of PLC and PPLC according to the direction of the input voltage (VD) transition. The inverter gate threshold voltage (VI) is defined as the point where the voltage transfer curve intersects the unity-gain line VZ = VD [44]. Generally, VI of an inverter is designed to be 0.5VDD, and it is fixed regardless of the direction of the input transition. For both PLC and PPLC at the HtoL transition, VI is derived from

$$\frac{\beta_p}{2}\left(V_I - V_{Tp}\right)^2 = \frac{\beta_n}{2}\left(V_Y - V_I - V_{Tn}\right)^2, \tag{14}$$

$$V_I = \frac{V_Y}{2} = V_M, \tag{15}$$

where VTp and VTn are the threshold voltages, and $\beta_p$ and $\beta_n$ are the transconductance parameters of the PMOS and NMOS, respectively, with VTp = VTn = VT and $\beta_p = \beta_n = \beta$ assumed. In this case, because VY, the initial voltage of the Y node, is less than VDD due to the diode P0, VI is less than 0.5VDD. This means that the outputs of PLC and PPLC start to transition only after the input has crossed 0.5VDD, which helps to reduce IS. However, at the LtoH transition, while VI of PLC is given by VM, the same as for an inverter buffer, VI of PPLC is derived from

$$\beta/2^{*}(V_{I}-V_{X}-V_{T})^{2} = \beta/2^{*}(V_{DD}-V_{I}-V_{T})^{2}, \qquad (16)$$

$$V_{I} = \frac{V_{DD}}{2} + \frac{V_{X}}{2} = V_{M^{+}}. \tag{17}$$

Because VI of PPLC is larger than VM, VZ starts to transition only after the input crosses 0.5VDD. Therefore, VM+ of PPLC helps to reduce IS compared to PLC, whose threshold is VM.
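As a quick numerical check of Equations (15) and (17), the following sketch evaluates both thresholds under the same matched-device assumption used in the derivation. The node voltages passed in the example (`v_y`, `v_x`) are illustrative placeholders, not simulated values from Figure 36.

```python
def threshold_htol(v_y):
    """Equation (15): V_I = V_Y / 2 = V_M for matched devices."""
    return v_y / 2.0


def threshold_ltoh_pplc(vdd, v_x):
    """Equation (17): V_I = V_DD / 2 + V_X / 2 = V_M+ for PPLC."""
    return vdd / 2.0 + v_x / 2.0


if __name__ == "__main__":
    vdd = 1.8   # supply voltage used in Figure 38
    v_y = 1.2   # hypothetical Y-node voltage, one diode drop below VDD
    v_x = 0.6   # hypothetical X-node voltage, one diode drop above GND
    print("HtoL threshold V_M  =", threshold_htol(v_y))
    print("LtoH threshold V_M+ =", threshold_ltoh_pplc(vdd, v_x))
```

With any V_Y below VDD and any V_X above GND, the HtoL threshold falls below 0.5VDD and the LtoH threshold rises above it, which is the adaptive behavior the section relies on.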

#### 3.4.4 Asynchronous Charge Recycling Scheme with PPLC (ACR\_PPLC)

The proposed ACR\_PPLC is shown in Figure 37. To minimize the dynamic power consumption of CR, a non-overlapping equalization generator (EQgen) is proposed. The non-overlapping EQgen generates the equalization signal (EQ) that equalizes D and D\_storage before D is driven from VDD, and it guarantees that EQ rises only after EN falls and that EN rises only after EQ falls. Figure 38 shows the power consumption comparison of PLC, IR, and PPLC versus TEQ. As TEQ increases, the power consumption of PLC and IR increases almost linearly due to IS, while IS stays almost constant for PPLC.



Figure 37. Schematic of *ACR\_PPLC* and *EQ* generator.



Figure 38. Power comparison versus  $T_{EQ}$  and  $C_L$  at  $V_{DD}$ =1.8 V, 125 °C, FF.

This means that PPLC can block IS efficiently in the charge-recycling operation. The power consumption of ID\_IR, SCR\_CIR, ACR\_IR, and ACR\_PPLC versus load capacitance (CL) is also shown in Figure 38. When CL is small, ID\_IR consumes the least power among all schemes, since it does not need the additional power that the other schemes spend on their control blocks. However, as CL increases, CR takes effect, and ACR\_PPLC shows the least power consumption at large CL owing to PPLC.

# 3.5 Robust Latch-Type Sense Amplifier Using an Adaptive Latch Resistance

#### 3.5.1 Introduction

A conventional latch-type sense amplifier (SA) has been one of the best candidates for a static random access memory (SRAM) [45], [46] as well as for analog functional blocks [47] because of its time-proven performance. The SA characteristic that the logic state is hard to change once the condition for the latch feedback is settled can be a double-edged sword: in an SRAM, if the bit-line voltage swing ($\Delta V_{BL}$) does not meet the minimum SA margin (*SM*), the read-failure probability increases. As bit-cell leakage current ($I_{LEAK}$) problems and process, voltage, and temperature (PVT) variations are aggravated in ultra deep-submicron (UDSM) technologies [48], the required *SM* degrades SRAM performance. In this work, we propose a robust latch-type SA with an adaptive resistance technique to solve this problem. A resistance (*latch\_R*) is inserted in the latch of the SA and adaptively controlled by the voltage level of the bit-lines to make the SA more immune to mismatches. The leakage current of a bit-cell is important not only for low power but also for reliability, because it affects $\Delta V_{BL}$ by decreasing the precharged voltage level of the bit-line. Figure 39 illustrates the block diagram of a one-column bit-cell array showing the worst-case effect of leakage current and data pattern on $\Delta V_{BL}$. When an active bit-cell stores the opposite data to the other inactive bit-cells, $\Delta V_{BL}$ is reduced by a voltage drop of *BLb* due to the summed $I_{LEAK}$. Moreover, under the worst PVT condition, this drop is large enough to make the SA fail. Therefore, $\Delta V_{BL}$ needs to be investigated carefully so that it stays larger than the *SM* under PVT variations.



Figure 39. Block diagram of bit-cell array showing an  $I_{LEAK}$  effect on a  $\Delta V_{BL}$ .

# 3.5.2 Conventional Latch Type Sense Amplifier

Figure 40(a) shows the conventional latch-type SA. It consists of an input stage (*MP1/MP2*), a latch stage (*MN1/MP3*, *MN2/MP4*), and an SA enable stage (*MP5*). Without any mismatch in the SA, the latch output nodes (*SO* and *SOb*) move in the direction set by even a tiny $\Delta V_{BL}$. However, a sufficiently large $\Delta V_{BL}$ is necessary to overcome unfavorable SA mismatches. Therefore, the SA enable signal (*S\_EN*) must be delayed to acquire a sufficient *SM*, which ultimately degrades the SRAM performance. Figure 40(b) shows the read-failure simulation result of the conventional latch-type SA. The $\Delta V_{BL}$ is set to a value smaller than the *SM*, and unfavorable mismatches are applied to the transistors. As a result, *SO* and *SOb* are latched to the low and high states, respectively, though *BL* is initially lower than *BLb*.



Figure 40. Conventional latch-type sense amplifier (a) Schematic. (b) Simulation.

#### 3.5.3 Proposed Latch Resistance-Controlled SA

Figure 41(a) shows the proposed SA with a latch resistance (*latch\_R*). It is similar to the conventional SA except for the *latch\_Rs*, which are controlled through their gates by *BIT/BITb*. Each *latch\_R* connects the gate of one latch inverter to the output of the other. In this way, it controls the connectivity strength between the latch output and input. When *BLb* falls below the precharge level due to $I_{LEAK}$, the resistance that connects *SO* to *SOb* increases, and the connectivity strength between the two nodes decreases accordingly. This delays the amplification by the latches of the initial voltage difference made by *BL/BLb*. Figure 41(b) shows the simulation result of the proposed *latch\_R*-type SA. *Mis-L<sub>MP2/MP1</sub>*, *mis-L<sub>MN2/MN1</sub>*, and *mis-C<sub>2/1</sub>* are each 15%, the same as for the conventional SA. Moreover, an additional 15% length mismatch is applied to *latch\_R1/latch\_R2* (*mis-L<sub>R1R2</sub>*) for a fair comparison with the conventional SA without *latch\_Rs*. The time response is divided into four regions, compared to three for the conventional one.

1) During T1 (precharge phase), SA is precharged before SA evaluation like the conventional SA. Therefore, *SO/SOb* and *S\_REF* are initially set to voltage levels similar to those of the conventional one. PMOSs with the *latch\_R1/latch\_R2* connect *SO\_g* to *SOb*, and *SOb\_g* to *SO* at this phase.

2) During T2 (sense amplifier enable phase), SA is enabled to activate the internal nodes. The initial voltage difference in *SO/SOb* is decreasing, and it comes to 0 V at the end of T2 due to the reverse SA mismatches. However, *SO\_g/SOb\_g* stay in an initial state with smaller voltage difference. It means that an *SO\_g* higher than an *SOb\_g* can change *SO/SOb* in a forward direction again through two latch inverters.



Figure 41. Proposed latch-resistance sense amplifier. (a) Schematic. (b) Simulation.

3) During T3 (regeneration phase), SA regenerates the latch nodes like the conventional one. Even if *SO* is made higher than *SOb* at the start of T3, *SO* is lower than *SOb* to overcome SA mismatches. *SO\_g/SOb\_g* stay in the forward direction (*SO\_g* > *SOb\_g*). At the end of T3, *SO/SOb* overcome SA mismatches, and start to be amplified through the regeneration of the two inverters. To analyze SA in this phase, it is necessary to investigate the *latch\_R* defined by the latch inverter voltages. A *latch\_R* is given by

$$\frac{1}{latch\_R} = \mu_n C_{OX} \frac{W}{L} \left( V_{gs} - V_{th} \right), \tag{18}$$

where $\mu_n$ is the electron mobility, $C_{OX}$ is the oxide capacitance, and W and L are the transistor width and length. This equation means that *latch\_R* is set by the gate and source voltages. Meanwhile, *latch\_R1* is controlled by *BL* and acts as a resistance connecting *SOb* to *SO\_g*. Therefore, as *BL* is lowered, it helps *SO\_g* in the forward direction to be amplified through a latch inverter, so that the latch output recovers from the reverse mismatches. Figure 42 shows the conceptual timing of *SO/SOb*, *SO\_g/SOb\_g*, and *BL/BLb* showing the nodes of the *latch\_R1/latch\_R2* transistors in the regeneration phase.
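The dependence of *latch\_R* on the controlling bit-line voltage can be sketched numerically. The sketch below reads the voltage term of Equation (18) as the gate overdrive (gate-source voltage minus threshold voltage), which is the usual triode-region approximation; the process constants and device sizes are hypothetical and chosen only for illustration.

```python
def latch_resistance(v_gs, v_th, mu_cox, w_over_l):
    """Approximate on-resistance of a latch_R transistor in the triode region.

    Implements 1 / R = mu_cox * (W / L) * (V_gs - V_th).
    Returns infinity when the device is cut off (V_gs <= V_th).
    """
    overdrive = v_gs - v_th
    if overdrive <= 0.0:
        return float("inf")
    return 1.0 / (mu_cox * w_over_l * overdrive)


if __name__ == "__main__":
    mu_cox = 200e-6   # hypothetical mu_n * C_OX in A/V^2
    w_over_l = 2.0    # hypothetical aspect ratio
    v_th = 0.5        # hypothetical threshold voltage in volts
    for v_gs in (1.8, 1.4, 1.0):
        r = latch_resistance(v_gs, v_th, mu_cox, w_over_l)
        print(f"V_gs = {v_gs:.1f} V -> latch_R = {r:.0f} ohm")
```

The loop shows the trend the text describes: as the controlling voltage drops, the resistance rises and the coupling between the latch output and the opposite gate weakens.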



Figure 42. (a) Conceptual timing of *SO/SOb*, *SO\_g/SOb\_g*, and *BL/BLb* showing the nodes of *latch\_R1/latch\_R2* transistors in a regeneration phase. (b) Latch error modeling of *Latch R-SA*.




4) During T4 (latch phase), SA is latched at the final state. Due to the forward *SO\_g/SOb\_g*, *SO/SOb* are set to the forward state at the end of T4. The output of the *latch\_R*-type SA is also fixed after the latch phase. The *S\_EN* delay affects the system performance directly, and is discussed in Section 3.5.4.

# 3.5.4 Latch\_R-Type Sense Amplifier Analysis

# 3.5.4.1 Output Margin (OM) analysis

 $\Delta V_{OUT}$  in a forward direction is given by

$$\Delta V_{OUT}(forward) = V(SOb) - V(SO), \tag{19}$$

where *BL* is lower than *BLb* without mismatches. Meanwhile,  $\Delta V_{OUT}$  in a reverse direction is given by

$$\Delta V_{OUT}(reverse) = V(SO) - V(SOb), \qquad (20)$$

where the mismatches are biased to make *SOb* lower than *SO* at $\Delta V_{BL} = 0$ mV. If *BLb* is higher than *BL*, the SA outputs *SOb/SO* are supposed to be $V_{DD}/0$. However, the latch output voltages move in the reverse direction due to the reverse mismatches. For the SA to operate correctly, $\Delta V_{OUT}$(*forward*) must be larger than $\Delta V_{OUT}$(*reverse*). When $\Delta V_{OUT}$ reaches the latching voltage at which the SA is latched, $\Delta V_{OUT}$(*reverse*) is larger than $\Delta V_{OUT}$(*forward*) for the conventional SA. Therefore, *SOb/SO* reach 0 V/$V_{DD}$, even if *BL* is lower than *BLb* by $\Delta V_{BL}$. On the contrary, the proposed SA with a 100 mV input voltage margin can overcome 15% mismatches and operate in the right direction. Moreover, for the proposed *latch\_R*-type SA, it is shown that $\Delta V_{OUT}$(*reverse*) decreases as the gate voltage of the *latch\_R* decreases. This indicates that a lower bit-line voltage makes the SA more robust by making $\Delta V_{OUT}$(*forward*) larger than $\Delta V_{OUT}$(*reverse*).

The output voltage margin (*OM*) is defined as the voltage difference of  $\Delta V_{OUT}$ (*forward*) and  $\Delta V_{OUT}$ (*reverse*) at a latching voltage given by

$$OM = \Delta V_{OUT}(forward) - \Delta V_{OUT}(reverse). \tag{21}$$

*OM* must be positive to guarantee the right operation of the SA. If it is negative, $\Delta V_{BL}$ must increase until *OM* becomes positive. Therefore, a larger *OM* means a more robust SA. Figure 43 shows the histogram of *OM* when $\Delta V_{BL}$ is 10 mV and the number of samples is 1000. The mismatch is applied to the two latch inverters, the two input-stage PMOS transistors, and the two *latch\_R* NMOS transistors. The Monte Carlo simulation shows that the proposed SA provides 39% larger *OM* and 30% fewer negative *OM* samples. Figure 44 shows the *OM* versus length mismatch and $\Delta V_{BL}$. The proposed SA shows more positive *OM*. This means that the proposed scheme covers a larger set of mismatch conditions at the same $\Delta V_{BL}$.
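The bookkeeping behind Equation (21) and the histogram statistics can be written as a short sketch. The sample values below are made-up placeholders that only show how *OM* and the share of failing (negative) samples would be computed; they are not the Monte Carlo data of Figure 43.

```python
def output_margin(dv_forward, dv_reverse):
    """Equation (21): OM = dV_OUT(forward) - dV_OUT(reverse)."""
    return dv_forward - dv_reverse


def negative_fraction(margins):
    """Fraction of samples whose OM is negative, i.e. the SA latches wrongly."""
    return sum(1 for om in margins if om < 0) / len(margins)


if __name__ == "__main__":
    # Hypothetical (forward, reverse) output swings in millivolts.
    samples = [(120, 80), (90, 110), (150, 60), (70, 75)]
    margins = [output_margin(f, r) for f, r in samples]
    print("OM per sample (mV):", margins)
    print("negative-OM fraction:", negative_fraction(margins))
```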



Figure 43. Histogram of OM. (a) Conventional, and (b) Proposed SA.



Figure 44. Output margin sweep versus mismatch ratio and input voltage margin. (a) Conventional SA. (b) Proposed SA.



Figure 45. (a) Simulated SM versus mismatch types, and ratios.

## 3.5.4.2 Mismatch Effects on an SM

The mismatch effects on *SM* are shown in Figure 45(a). It is assumed that 15% length and capacitance mismatches in each transistor are mixed together to compare the effects of complex mismatch types. Then, three types of mismatches are applied together, including the mismatch of the *latch\_R* transistors. The simulation shows that the *SM* of the proposed SA is improved from 110 mV to 80 mV, by about 28%, at *mis-L<sub>MP2/MP1</sub>* = *mis-L<sub>MN2/MN1</sub>* = *mis-C<sub>2/1</sub>* = *mis-L<sub>R1R2</sub>* = 15%. Figure 45(b) shows the *SM* comparison as the mismatch ratio changes from 5% to 20%. The simulation shows that the *SM* improvement ranges from 28% to 22%.

## 3.5.4.3 Latch Resistance Effect on SA speed

The *latch\_R* enhances *SM* at the expense of SA speed. The *SM* enhancement and the speed degradation of the proposed SA vary with *latch\_R*, as shown in Figure 46. *SM* is simulated assuming the same mismatches as in Section 3.5.3. Then, the *latch\_R*s are changed by controlling the width and length of the NMOSs. The simulation shows that *SM* is enhanced by 16% to 31%, and the SA speed is degraded by 4% to 12%, as the resistance changes. A small resistance value is more efficient for both *SM* and speed. The *latch\_R* effect on the SA without mismatches at $\Delta V_{BL} = 300$ mV is also investigated. The proposed SA operates more slowly than the conventional one by about 8 ps, or 4% of the 179-ps SA speed. This is caused by the loading effect of the *latch\_R*s.



Figure 46. Simulated *SM* and speed versus various *latch\_R* with mismatches.

## 3.5.4.4 Latch Resistance Effect on Bit-Cell Array and SA

When the speed and the power of the SA are studied, they must be considered in the context of the bit-cell array, because the speed and power degradation in the SA can be trivial compared with that of a bit-cell array. On the contrary, the reduced *SM* of the proposed SA can enhance the speed and power when the evaluation of the bit-cell data is considered. Figure 47 shows the timing diagram of $\Delta T$ (*WL*, *S\_OUT*) for the proposed and the conventional SA. Because the conventional SA requires a larger *SM* than the proposed SA, *S\_EN* of the conventional SA is delayed to obtain the larger *SM* after the bit-cell is activated by *WL*. Moreover, the larger $\Delta V_{BL}$ also affects the power consumption in the bit-cell array. The delay and power effects can differ according to the bit-cell design and the bit-line capacitance.



Figure 47. Timing diagram showing $\Delta T$ (*WL*, *S\_OUT*).

Figure 48(a) shows the evaluation time comparison versus bit-line capacitance. As the number of stacked bit-cells in one column increases, the bit-line capacitance increases. Therefore, the time to develop the *SM* needed for safe SA operation increases proportionally. Because the proposed SA requires a smaller *SM*, its $\Delta T$ (*WL*, *S\_OUT*) is shorter. Figure 48(b) shows power consumption versus bit-line capacitance. Power consumption is related not only to the bit-line capacitance, but also to the multiplexer (*MUX*) structure in the bit-cell array. As the capacitance increases, the power reduction of the proposed SA increases proportionally, even though the proposed SA itself carries a power penalty. From the simulation results, it can be seen that the proposed *latch\_R*-type SA reduces the evaluation time and power when the bit-cell array is taken into account.



Figure 48. (a)  $\Delta T$  (*WL*, *S\_OUT*) and  $\Delta T$  (*S\_EN*, *S\_OUT*), (b) Power consumption.

# 3.5.5 Test Chip Experiment and Measurement Results

Figure 49 shows a block diagram of the SA test chip. In order to compare the *SM* of the conventional and the proposed SA, a control block, an x-decoder, and a memory cell array are shared. The leakage current generator cell (*LEAK\_MC*) is placed on top of the $128 \times 3b$ MC array. To generate enough leakage current to reduce the input voltage margin produced by the bit-cell on-current in the test chip, the wordline (*WL*) of the *LEAK\_MC* is raised to an appropriate voltage through a test pin (*WL\_LEAK*). After writing the opposite data into the MC array, the SRAM reads out the bit-cell data while *WL\_LEAK* is swept. An output *MUX* makes it possible to read out the same bit-cell data selectively through the different SAs. By comparing the *WL\_LEAK* voltage levels at which each SA reads the wrong data, the robustness of the SAs can be investigated.

Figure 50(a) shows the micrograph of the SA test chip. It is fabricated in a 0.18-$\mu$m CMOS technology, and the size of the test chip is 226 × 276 $\mu$m<sup>2</sup>. The measurement result of the *SM* comparison is shown in Figure 50(b). Although data were not gathered over a sufficiently wide set of supply voltages, the proposed *latch\_R*-type SA shows an *SM* that is 6% to 15% better than that of the conventional one at the measured supply voltages.



Figure 49. SA test chip schematic.



Figure 50. (a) SA test chip micrograph. (b) Measured SM versus supply voltages and  $WL\_LEAK$ .

# 3.6 Fully-Gated Ground 10T-SRAM Bitcell in a 45-nm SOI Technology

#### 3.6.1 Introduction

In an ultra deep-submicron (UDSM) technology, it has become more difficult to acquire a large voltage difference on high-capacitance bitlines ($\Delta V_{BL}$) due to the increased subthreshold leakage current ($I_{OFF}$) of a memory cell (MC). $I_{OFF}$ affects $\Delta V_{BL}$ by decreasing the precharged voltage level of a bitline (BL). Therefore, the data-dependent $I_{OFF}$ was investigated for a robust SRAM operation [49]. To mitigate $I_{OFF}$, the single-ended (SE) 8T was proposed instead of the conventional differential-sensing (DS) 6T. By separating the read port from the write port, the cell latch effect on the read bitline could be minimized. However, SE-8T must accept a longer BL evaluation time in addition to large dynamic power consumption to ensure stable SE sense amplifier (SA) operation. The leakage-compromising 8T (8T-LC) was proposed to reduce the data-dependent $I_{OFF}$ issue [50], as shown in Figure 51. The 8T-LC is composed of a 6T cell and $I_{OFF}$-compensation transistors (M5, M6), which generate complementary $I_{OFF}$ on BL and /BL at the same time. Therefore, the $I_{OFF}$ effect can be cancelled out regardless of the data pattern of the inactive MCs. However, due to the compensating $I_{OFF}$, static leakage power increases compared to 6T. Moreover, as the number of MCs per bitline (NOB) increases, both BL and /BL are lowered below the precharged level due to the additional $I_{OFF}$, resulting in increased dynamic power consumption and reduced on-cell current. Several works addressed the advantage of the 10T structure [51], [52], as shown in Figure 51. 10T can achieve the required $\Delta V_{BL}$ faster than SE-8T, because only a small voltage, 50 mV, is necessary for a differential-sensing sense amplifier (DS-SA). However, 10T still suffers from data-dependent $I_{OFF}$ mismatches caused by inactive MCs. To minimize $I_{OFF}$ in 10T, GND of the read port is fully gated by a row decoder (10T-RGND), as shown in Figure 51 [54]. 10T-RGND forces RGND to GND only for an active MC, and to $V_{DD}$ for inactive MCs. 6T, 8T-LC, 10T, and 10T-RGND MCs were designed in a 45-nm SOI technology; their layouts, shown in Figure 51, occupy $1.856 \times 1.237\ \mu\text{m}^2$, $1.856 \times 1.237\ \mu\text{m}^2$, $2.274 \times 1.237\ \mu\text{m}^2$, and $2.274 \times 1.237\ \mu\text{m}^2$, respectively.



Figure 51. 6T, 8T-LC, 10T, and 10T-RGND MC.



Legend: $I_{ON}$, $I_{OFF2}$, $I_{OFF}$, $I_{ON\_weak}$, $I_{OFF\_weak}$, $I_{OFF\_weak/2}$, with $I_{OFF2} = 2 \times I_{OFF}$.

| Rows = N | MC(10) | MC(01) | $\Delta I$ (6T) | $\Delta I$ (8T-LC) | $\Delta I$ (10T) | $\Delta I$ (10T-RGND) |
|----------|---------|---------|-----------------|--------------------|------------------|-----------------------|
| Case I | N-1 | 0 | $I_{ON} + (N-1) \times I_{OFF}$ | $I_{ON\_weak} + (N-1) \times I_{OFF} - (N-1) \times I_{OFF} = I_{ON\_weak}$ | $I_{ON} + (N-1) \times I_{OFF2} - (N-1) \times I_{OFF} = I_{ON} + (N-1) \times I_{OFF}$ | $I_{ON} - (N-1) \times I_{OFF\_weak}$ |
| Case II | (N-1)/2 | (N-1)/2 | $I_{ON} + (N-1)/2 \times I_{OFF} - (N-1)/2 \times I_{OFF} = I_{ON}$ | $I_{ON\_weak} + (N-1) \times I_{OFF} - (N-1) \times I_{OFF} = I_{ON\_weak}$ | $I_{ON} + (N-1) \times I_{OFF} - (N-1) \times I_{OFF} = I_{ON}$ | $I_{ON} - (N-1) \times I_{OFF\_weak}/2 - (N-1) \times I_{OFF\_weak/2}/2 = I_{ON} - (N-1) \times I_{OFF\_weak}/4$ |
| Case III | 0 | N-1 | $I_{ON} - (N-1) \times I_{OFF}$ | $I_{ON\_weak} + (N-1) \times I_{OFF} - (N-1) \times I_{OFF} = I_{ON\_weak}$ | $I_{ON} + (N-1) \times I_{OFF} - (N-1) \times I_{OFF2} = I_{ON} - (N-1) \times I_{OFF}$ | $I_{ON} - (N-1) \times I_{OFF\_weak/2} = I_{ON} - (N-1) \times I_{OFF\_weak}/2$ |

Figure 52. 6T, 8T-LC, 10T, and 10T-RGND MC array showing  $I_{OFF}$  effect on  $\Delta V_{BL}$ .

# 3.6.2 Leakage Current Effect on Bitline

Figure 52 shows the $I_{OFF}$ effect of the 6T, 8T-LC, 10T, and 10T-RGND MC arrays on $\Delta V_{BL}$ according to the data pattern. As the active MC evaluates the stored data through /BL or /RBL by the on-current ($I_{ON}$) of the pass transistors (N4, or N8 and N10), $I_{OFF}$ affects BL and/or /BL by lowering the precharged level. For the 6T and 10T arrays, the effect of $I_{OFF}$ increases in proportion to the number of MC(01) cells, which store the opposite data to the active MC. If the sizes of the pass transistors (N3, N4, N5, and N6) are the same, the difference between $I_{ON}$ and $I_{OFF}$ of 6T under the $I_{OFF}$ worst condition is given by

$$\Delta I (6T) = I_{ON} - (N-1) \times (I_{OFF}), \qquad (22)$$

where N is the number of MCs per bitline, and the numbers of MC(10) and MC(01) are 0, and N-1, respectively.

Meanwhile,  $\Delta I$  (8T-LC) under the I<sub>OFF</sub> worst condition is given by

$$\Delta I (8T-LC) = I_{ON\_weak} + (N-1) \times (I_{OFF}) - (N-1) \times (I_{OFF}) = I_{ON\_weak}, \quad (23)$$

where $I_{ON\_weak}$ is the reduced $I_{ON}$ that results when /BL is lowered below the precharged level by the compensating $I_{OFF}$. 8T-LC compensates the $I_{OFF}$ of BL and /BL regardless of the data pattern. However, $I_{ON}$ is reduced accordingly due to the reduced drain-source voltage ($V_{DS}$) of N4.

Then,  $\Delta I$  (10T) under the I<sub>OFF</sub> worst condition is given by

$$\Delta I(10T) = I_{ON} + (N-1) \times (I_{OFF}) - (N-1) \times (I_{OFF2}) = I_{ON} - (N-1) \times (I_{OFF}), \tag{24}$$

where the sizes of N7, N8, N9, and N10 are twice those of the 6T pass transistors, and $I_{OFF2}$ is assumed to be twice $I_{OFF}$.

Meanwhile, for the proposed 10T-RGND, RGND of the inactive MCs is forced to $V_{DD}$ as the row decoder drives the row-wordline (RWL) to full GND. Since the drain-source voltages ($V_{DS}$) of the pass transistors (N7, N8, N9, and N10) are almost 0 V, $I_{OFF}$ is reduced to $I_{OFF\_weak}$. Therefore, $\Delta I$ (10T-RGND) under the $I_{OFF}$ worst condition is given by

$$\Delta I(10T-RGND) = I_{ON} - (N-1) \times (I_{OFF\_weak}), \qquad (25)$$

where $I_{OFF\_weak}$ is much less than $I_{OFF}$. The $I_{OFF}$ worst condition of 10T-RGND happens when all the inactive MCs store '10' data (Case I in Figure 52), while those of 6T and 10T happen when all the inactive MCs store '01' data (Case III in Figure 52).
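Equations (22) to (25) can be compared side by side with a small sketch. The current values used in the example are hypothetical round numbers in nanoamperes, chosen only to show how the worst-case margin scales with the number of cells per bitline; they are not simulated device currents.

```python
def delta_i_6t(n, i_on, i_off):
    """Equation (22): worst-case margin of the 6T cell."""
    return i_on - (n - 1) * i_off


def delta_i_8t_lc(i_on_weak):
    """Equation (23): the compensating leakage cancels, leaving I_ON_weak."""
    return i_on_weak


def delta_i_10t(n, i_on, i_off):
    """Equation (24), with I_OFF2 assumed to be twice I_OFF."""
    i_off2 = 2 * i_off
    return i_on + (n - 1) * i_off - (n - 1) * i_off2


def delta_i_10t_rgnd(n, i_on, i_off_weak):
    """Equation (25): worst-case margin with the gated read-port ground."""
    return i_on - (n - 1) * i_off_weak


if __name__ == "__main__":
    # Hypothetical currents in nA, for illustration only.
    i_on, i_off, i_off_weak = 50_000, 100, 10
    for n in (128, 512, 1024):
        print(n,
              delta_i_6t(n, i_on, i_off),
              delta_i_10t(n, i_on, i_off),
              delta_i_10t_rgnd(n, i_on, i_off_weak))
```

A negative result means the summed leakage exceeds the on-current, so the bitline difference develops in the wrong direction. With these placeholder values, the 6T and 10T margins turn negative well before N = 1024, while the 10T-RGND margin stays positive.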

#### 3.6.3 Simulation Results

Figure 53 shows $V_{BL}$ and $V_{/BL}$ of 6T, 8T-LC, 10T, and 10T-RGND versus NOB. The MCs are designed as shown in Figure 51, and simulated in a 45-nm SOI technology at $V_{DD} = 1.0$ V, 125 °C, and the fast process corner. $V_{BL}$ and $V_{/BL}$ are measured 200 ps after WL or RWL is enabled. In simulation, $\Delta V_{BL}$ becomes negative from NOB = 230 and 320 for 6T and 10T, respectively. However, 8T-LC and 10T-RGND keep a positive $\Delta V_{BL}$ up to NOB = 1024. Meanwhile, $V_{BL}$ and $V_{/BL}$ of 8T-LC decrease more abruptly with increasing NOB than those of 10T-RGND. These results indicate that the swing range of BL and /BL ($V_{SWING}$) of 8T-LC is larger than that of 10T-RGND, so 8T-LC consumes more dynamic power. $V_{SWING}$ of 6T, 8T-LC, 10T, and 10T-RGND at NOB = 1024 are 286 mV, 503 mV, 385 mV, and 118 mV, respectively. The small $V_{SWING}$ of 10T-RGND can compensate for the extra power consumed in driving the RGND line on every clock cycle.

Figure 54 shows static leakage power consumption and  $\Delta V_{BL}$  of 6T, 8T-LC, 10T, and 10T-RGND versus NOB. While 10T-RGND has more transistors than 8T-LC, the total static leakage current of 10T-RGND is 5% less than 8T-LC, 17% less than 10T, but 22% larger than 6T at NOB = 1024. When the minimum  $\Delta V_{BL}$  to overcome SA offset is 50 mV, NOB of 10T-RGND is larger than 6T, 8T-LC, and 10T at  $\Delta V_{BL}$  = 50 mV by 10x, 40%, and 3.5x, respectively.



Figure 54. Static leakage power consumption and  $\Delta V_{BL}$  of 6T, 8T-LC, 10T, and 10T-RGND versus NOB.

The figure of merit (FOM) of SRAM MC is derived by area, total static leakage current ( $I_{LEAK}$ ),  $V_{SWING}$ , and NOB at  $\Delta V_{BL} = 50$  mV as

$$FOM = \frac{NOB}{Area \times I_{LEAK} \times V_{SWING}}. \tag{26}$$

FOMs are calculated by simulation as 0.06, 0.2, 0.08, and 1.04 for 6T, 8T-LC, 10T, and 10T-RGND, respectively.
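As a sketch of how Equation (26) combines the four quantities, the helper below computes the cell area from the layout dimensions quoted in Section 3.6.1 and evaluates the figure of merit. The units and normalization behind the reported FOM values are not stated in the text, so the leakage and NOB arguments in the example are hypothetical placeholders and the printed number is not meant to reproduce the values above.

```python
def cell_area(width_um, height_um):
    """Layout area of one memory cell in square micrometers."""
    return width_um * height_um


def figure_of_merit(nob, area, i_leak, v_swing):
    """Equation (26): FOM = NOB / (Area * I_LEAK * V_SWING)."""
    return nob / (area * i_leak * v_swing)


if __name__ == "__main__":
    area_6t = cell_area(1.856, 1.237)       # dimensions from Section 3.6.1
    area_rgnd = cell_area(2.274, 1.237)
    print(f"10T-RGND / 6T area ratio = {area_rgnd / area_6t:.3f}")
    # Hypothetical NOB and leakage; V_SWING of 10T-RGND (0.118 V) is from the text.
    print(figure_of_merit(nob=1000, area=area_rgnd, i_leak=1.0, v_swing=0.118))
```

The area ratio shows the cost side of the trade-off: the 10T-RGND layout is about 22.5% larger than the 6T layout, which the FOM weighs against its smaller swing and larger usable NOB.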

## **3.7 Low-Power Techniques of Sub-blocks for LP-MRSS**

#### 3.7.1 Overview

To minimize the power burden of spectrum sensing, it is desirable to reduce the power consumption of the MRSS. Therefore, the sub-blocks of the MRSS are optimized to form a low-power MRSS (LP-MRSS), as shown in Figure 55. The LP-MRSS consists of RF front-ends, a low-power digital window generator (LP-DWG), analog correlators, a fast-sweeping (F/S) frequency synthesizer, and pipeline analog-to-digital converters (ADCs). The RF front-ends, comprising a low-noise amplifier (LNA), mixers, and a phase-locked loop (PLL), downconvert the RF signal to baseband. An analog correlator calculates the signal power of the baseband signal with a window from the LP-DWG. The results are digitized by the pipeline ADCs for post-processing in an FPGA. The LP-DWG is composed of a random access memory (RAM), a digital-to-analog converter (DAC), and a low-pass filter (LPF) to make an arbitrary window with variable duration for the multi-resolution spectrum sensing functionality of the LP-MRSS [55].



Figure 55. Low power MRSS block diagram.

# 3.7.2 Low-Power Analog Correlator



Figure 56. Low-power analog correlator.

Figure 56 shows the analog correlator, one of the LP-MRSS sub-blocks. By correlating the downconverted input signal with the window generated from the DWG, the signal power within the window bandwidth is calculated. The analog correlator is composed of a multiplier and an integrator. The multiplier is optimized to reduce the power consumption by considering the bandwidth requirement, which is less than 10 MHz. Then, an output common-mode feedback circuit and a 6-bit current DAC are added to control the output common-mode voltage and to cancel out the unwanted DC offset.

# 3.7.3 Low-Power Pipeline Analog-to-Digital Converter



Figure 57. Low-power pipeline analog-to-digital converter.

The low-power pipeline ADC shown in Figure 57 is composed of a sample-and-hold (S/H) stage and 8 pipeline stages. All the digital outputs of the pipeline stages are merged by a digital correction block and synchronized with a proper delay to generate the final 9b digital output. Two adjacent stages of the pipeline ADC share one op-amp to reduce the power consumption. Therefore, 4 op-amps are used instead of 8. Moreover, to ensure a wide dynamic range of the pipeline ADC, a novel architecture adopting complementary inputs for the dynamic comparators is employed.

# 3.7.4 Fast-Sweeping Frequency Synthesizer



Figure 58. Fast-sweeping frequency synthesizer.

The overall spectrum sensing time is one of the most important specifications of the LP-MRSS. To reduce the PLL settling time overhead in the spectrum sensing of the previous MRSS [8], [9], the F/S frequency synthesizer consisting of two identical sub-PLLs is proposed as in Figure 58. While one sub-PLL, which settled already in one channel, provides a clock into the LP-MRSS, the other sub-PLL moves to the next channel and starts settling down. Since the settling time of one sub-PLL is less than one-channel sensing time of the LP-MRSS, it can use another clock immediately without additional settling time burden of the frequency synthesizer in the next channel. As a result, the total sensing time throughout the target frequency bands is given by

$$t_{total} = \left(\frac{f_{end} - f_{start}}{f_{step}} + 1\right) \times \left[N_{AVG}\left(t_{\omega} + t_{buf}\right)\right] (\text{sec}),$$
(27)

where  $(f_{end} - f_{start})$ ,  $f_{step}$ ,  $N_{AVG}$ ,  $t_{\omega}$ , and  $t_{buf}$  are the target frequency band range, the frequency hopping step, the number of sensing operations per channel, the window duration, and the margin between consecutive windows, respectively. For example, if a 1-MHz window is used with 50 averages to sense the frequency bands from 512 MHz to 698 MHz, so that  $t_{\omega}$  is 1 µs,  $t_{buf}$  is 0.15 µs, and  $N_{AVG}$  is 50, then the total sensing time is 10.7 ms.
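Equation (27) can be evaluated directly for the example above. The sketch below is only a numerical check; the hopping step is not stated in the example, so `f_step = 1 MHz` (equal to the window bandwidth) is an assumption, and the function and argument names are illustrative.

```python
def total_sensing_time(f_start, f_end, f_step, n_avg, t_w, t_buf):
    """Total sensing time in seconds according to (27)."""
    n_channels = (f_end - f_start) / f_step + 1
    return n_channels * n_avg * (t_w + t_buf)


t_total = total_sensing_time(
    f_start=512e6,   # Hz
    f_end=698e6,     # Hz
    f_step=1e6,      # Hz, assumed hopping step (not given in the text)
    n_avg=50,
    t_w=1e-6,        # s, window duration
    t_buf=0.15e-6,   # s, margin between consecutive windows
)
print(f"{t_total * 1e3:.2f} ms")
```

With the assumed 1-MHz step, this evaluates to roughly 10.75 ms, in line with the approximately 10.7 ms quoted above.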

The reduction of the total sensing time is achieved by using two sub-PLLs, at the cost of higher power consumption than the single-PLL architecture. However, by adopting an integer-N PLL instead of a fractional-N PLL, the F/S frequency synthesizer saves the dynamic power consumption of the sigma-delta modulator and digital circuitry.

Meanwhile, the F/S frequency synthesizer adopts a saturated-type ring voltage-controlled oscillator (VCO) [53] instead of an LC-VCO for its wide tuning range and small area. As a result, the phase noise at a 500-kHz frequency offset is degraded from -115 dBc/Hz, for the PLL in the previous MRSS utilizing an LC-VCO, to -107 dBc/Hz. Generally, the phase noise of the LO signal folds interferers into the in-band through reciprocal mixing with the input RF signal, which degrades the jammer selectivity. However, because the phase noise of the F/S frequency synthesizer is much less than the jammer selectivity of 35 dB at a 500-kHz offset, the sensitivity degradation due to phase noise is negligible.

#### **3.8 Measurement Results of LP-MRSS**

Figure 59(a) shows the multi-resolution *cos*<sup>4</sup> windows generated from the LP-DWG. The output spectrum of the LO signal from the DAC in the LP-DWG shows a spurious-free dynamic range (SFDR) of 45 dB. Any spurious tone at the LO signal makes thermal noise and/or interferers folded into the in-band by the reciprocal mixing, thus creating an error in the signal detection and reducing the signal selectivity of the LP-MRSS. However, an SFDR of 45 dB is larger than the jammer rejection ability of the signal path, and noise increment due to the reciprocal mixing is negligible. Therefore, even the largest spurious tone does not severely degrade the sensitivity and selectivity.

Figure 59(b) shows the power reduction of the proposed DTB, which reduces the power consumption of the LP-DWG by 12% to 15% at different clock frequencies with a 5% area and speed overhead. The power reduction is achieved not only in the bit-cell array, as shown in (4), but also in the sense amplifier array of the MSBs.

Figure 59(c) shows the power reduction of the diode-connected low-swing signaling scheme of the LP-DWG. The SR buffer is more effective at a low supply voltage, where  $I_{short}$  is comparable to the dynamic current driving the SR buffer.



Figure 59. Measured results of LP-DWG. (a) Multi-resolution  $cos^4$  window. Power reduction with (b) a DTB, (c) a low-swing signaling scheme, and an SR buffer.




Figure 60 shows the measured spectrum sensing feature of the LP-MRSS. A -39 dBm OFDMA-modulated signal and a -54 dBm one-tone signal are used as inputs to both a spectrum analyzer and the LP-MRSS. Figure 60(a) is captured from the spectrum analyzer with a resolution bandwidth of 200 kHz and 10 averages. Figure 60(b) is measured by the LP-MRSS with a window frequency of 75 kHz, which corresponds to a detection bandwidth of about 150 kHz. The LP-MRSS shows spectrum sensing functionality comparable to that of the commercial spectrum analyzer.

Figure 61 shows the measured in-band detection result of the LP-MRSS. The CW input signal is detected with a 100-kHz window at a 624-MHz PLL frequency while sweeping the input power. The LP-MRSS can detect the input power from -72.5 dBm to -48.5 dBm successfully within ±1-dB error from the ideal linear response. Below -72.5 dBm of the input power, the detected power is too close to the noise level. Above -48.5 dBm, the correlator block of the LP-MRSS starts to saturate. The detection range can be extended by controlling gains of the LNA, mixer, and VGA blocks.



Figure 60. Measured spectrum sensing feature. (a) Spectrum analyzer. (b) LP-MRSS.






Figure 61. Measured in-band detection range with 100-kHz window.

Table IX shows the power comparison of the LP-MRSS versus the previous MRSS. Since the LP-DWG adopts both low-power architectures shown in Section II, the power consumption is reduced by about 30% compared to the previous DWG. Additionally, the output buffers that were inserted for probing the sub-blocks are optimized for power reduction. Even though the ADC blocks are additionally integrated, the power consumption of the LP-MRSS is 122 mW at a 1.8-V supply voltage and a 19.2-MHz clock frequency, which is 32% less than that of the previous MRSS.

|                                       |                      |                                | Proposed                  | ISSCC'08 [8]              |
|---------------------------------------|----------------------|--------------------------------|---------------------------|---------------------------|
| Technology                            |                      |                                | CMOS 0.18µm               | CMOS 0.18µm               |
| Die Size                              |                      |                                | 4.0 x 2.3 mm <sup>2</sup> | 4.8 x 2.4 mm <sup>2</sup> |
| Power<br>(@f<sub>clk</sub> = 19.2 MHz) | DWG                  | RAM<br>(11Kb)                  | 1.55mW                    | 5.4mW                     |
|                                       |                      | DAC<br>(11b)                   | 4.0mW                     | 4.0mW                     |
|                                       |                      | LPF<br>(6 <sup>th</sup> order) | 9.5mW                     | 9.5mW                     |
|                                       | Analog Correlator    |                                | 8.7mW                     | 14.4mW                    |
|                                       | Pipeline ADC<br>(9b) |                                | 7.9mW                     | N.A.                      |
|                                       | VCO + PLL            |                                | 34.0mW                    | 61.2mW                    |
|                                       | RF + Etc.            |                                | 56.9mW                    | 85.5mW                    |
|                                       | Total                |                                | 122.6mW                   | 180mW                     |

TABLE IX Measured LP-MRSS Power Comparison
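The totals in Table IX can be cross-checked by summing the per-block entries. The snippet below is a simple arithmetic check using only the values listed in the table; the dictionary layout is illustrative, and the "N.A." ADC entry of the previous design is counted as 0 mW because that block was not integrated there.

```python
# (proposed, previous) power in mW, copied from Table IX
power_mw = {
    "RAM (11Kb)": (1.55, 5.4),
    "DAC (11b)": (4.0, 4.0),
    "LPF (6th order)": (9.5, 9.5),
    "Analog correlator": (8.7, 14.4),
    "Pipeline ADC (9b)": (7.9, 0.0),   # N.A. in the previous MRSS
    "VCO + PLL": (34.0, 61.2),
    "RF + etc.": (56.9, 85.5),
}

total_proposed = sum(p for p, _ in power_mw.values())
total_previous = sum(q for _, q in power_mw.values())
saving = 1.0 - total_proposed / total_previous

print(f"proposed: {total_proposed:.2f} mW")
print(f"previous: {total_previous:.2f} mW")
print(f"saving:   {saving * 100:.1f} %")
```

The sums agree with the listed totals of 122.6 mW and 180 mW to within rounding, and their ratio corresponds to the 32% overall saving stated above.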



Figure 62. LP-MRSS chip micrograph.

# **CHAPTER IV**

### **Conclusions and Future Works**

#### 4.1 Technical contribution and impact of the dissertation

In this dissertation, an SRAM-based on-chip arbitrary waveform generator (AWG) is developed for two types of analog signal processing applications: multi-resolution spectrum sensing (MRSS) and a matched filter (MF). The AWG utilizes a power-efficient technique in which the SRAM merges the address generator into the x-decoder. The power consumption of the SRAM and address generator is analyzed according to the toggling rate of the addresses, and the technique is applied to a multi-resolution and multi-waveform spectrum sensing process. The proposed low-power technique reduces the power of the SRAM and address generator by 18% at a 1.8-V supply voltage. The AWGs are fabricated in a 0.18-µm CMOS technology and demonstrate a cos<sup>4</sup> window, a chirp signal, and a Daubechies wavelet with a 45-dBc spurious-free dynamic range (SFDR) and a cross-correlation factor of 0.96 to 0.988 with ideal signals.

Moreover, low-power techniques for the low-power AWG are proposed in Chapter III. A self-deactivated data transition bit (DTB) structure reduces the power consumption of the SRAM by limiting the toggling of the internal bit-cells during the operation of the AWG. It shows about 12% to 15% power reduction compared to the conventional DTB structure. Then, the diode-connected low-swing signaling scheme with a short-current ( $I_{short}$ ) reduction buffer is proposed to reduce the power consumption in the long loading lines between the SRAM and the DAC. It shows about 3% power reduction of the AWG.

The charge-recycling scheme is proposed to be applied in an asynchronous logic utilizing the proposed push-pull level converter. The proposed *ACR\_PPLC* scheme can be used in the low-power digital block where the system timing is not critical, but the interconnect lines between blocks are long. In comparison with the conventional inverter driver with an inverter receiver (*ID\_IR*), the synchronous charge-recycling with a clocked-inverter receiver (*SCR\_CIR*), and the asynchronous charge-recycling with an inverter receiver (*ACR\_IR*), *ACR\_PPLC* reduces power consumption most effectively by limiting short circuit power consumption in the receiver block while achieving charge-recycling in simulation.

Then, this dissertation addresses a robust latch-type sense amplifier using an adaptive latch resistance (*latch\_R*-SA). In simulation, the *latch\_R*-SA shows an improved sense amplifier margin (*SM*) under different mismatches with negligible speed and power penalty. The reduced *SM* of SA can decrease the speed and power of a MC array according to the different bit-line capacitance loading, bit-cell design, and *MUX* structure of a MC array. The test chip, which models bit-cell leakage current in a 0.18-µm CMOS technology, shows an *SM* enhancement of 6% to 12% at different supply voltages. The proposed *latch\_R*-SA can be effectively applied to the double-sensing (DS) bit-cell structure in deep-submicron technologies, where a robust SA is essential because of the large bit-cell leakage current and on-die mismatches.

As  $I_{OFF}$  affects  $\Delta V_{BL}$  more aggressively in UDSM technologies, the MC is crucial for a robust SRAM design. In this dissertation, the fully-gated ground 10T (10T-RGND) is proposed to limit  $I_{OFF}$  dynamically. While 10T-RGND occupies a larger area than 6T due to the four additional transistors, simulation shows that 10T-RGND has less  $V_{SWING}$  and a larger NOB at  $\Delta V_{BL} = 50 \text{ mV}$  than 6T, 8T-LC, and 10T. The FOM of 10T-RGND is larger than those of 6T, 8T-LC, and 10T by 17x, 5.2x, and 13x, respectively, which indicates that 10T-RGND can be among the best candidates for the differential-sensing MCs in UDSM technologies.

Finally, the several low-power techniques are applied to the low-power MRSS (LP-MRSS) with the low-power analog correlator, the low-power pipeline ADC, and the fast-sweeping frequency synthesizer. The LP-MRSS IC is developed as an opportunistic spectrum sensing technique for the cognitive radio application through the whole white space in UHF bands. The LP-MRSS shows a successful spectrum sensing functionality of a 24 dB dynamic range with 32% less power than the previous MRSS [8], [9]. The LP-MRSS is fabricated in a 0.18-µm CMOS technology with 9.2 mm<sup>2</sup> area. The LP-MRSS techniques will be useful for the advance of CR applications in the future.

### 4.2 Scope of future research

In this dissertation, the research has focused on the implementation of the low-power SRAM circuits, which are the main block of AWG. The proposed memory bit-cell, sense-amplifier, and the low-power architecture could be applied to the low-power AWG in UDSM technology. Moreover, the proposed low-power techniques can be fully implemented in the AWG for the analog signal processing applications: the multi-resolution spectrum sensing (MRSS) for the cognitive radio (CR) application and the analog matched-filter (AMF) for the radar application. Those applications have been proposed as a key functional block in the emerging analog processing techniques. Therefore, the proposed low-power techniques could be used as a key technique in the challenging signal processing applications.

# References

- [1] Kyutae Lim and Joy Laskar, "Emerging Opportunities of RF IC/System for Future Cognitive Radio Wireless Communications," in *Radio and Wireless Symp., 2008 IEEE*, 2008, pp. 703-706.
- [2] Mike Muller, "Embedded Processing at the heart of Life and Style," *Solid-State Circuits Conference, 2008. ISSCC 2008. Digest of Technical Papers. IEEE International*, 2008.
- [3] S. Haykin, "Cognitive radio: brain-empowered wireless communications," *Selected Areas in Communications, IEEE Journal on*, vol. 23, pp. 201-220, 2005.
- [4] Shared Spectrum Company, "Dynamic Spectrum Sharing". [Online]. Available: http://www.sharedspectrum.com/inc/content/press/Dynamic\_Spectrum\_Sharing\_IEEE\_1\_25\_2005.ppt.
- [5] J. Mitola, III, "Cognitive radio for flexible mobile multimedia communications," in Mobile Multimedia Communications, 1999. (MoMuC '99) 1999 IEEE International Workshop on, 1999, pp. 3-10.
- [6] Federal Communications Commission, "Notice of Proposed Rulemaking and Order (NPRM 03-322): Facilitating Opportunities for Flexible, efficient, and Reliable Spectrum Use Employing Cognitive Radio Technologies," ET Docket No. 03-108, Dec 2003.
- [7] M. McHenry, "Frequency Agile Spectrum Access Technologies," in *FCC Workshop on Cognitive Radio*, May 2003.
- [8] J. Park, T. Song, J. Hur, S. M. Lee, J. Choi, K. Kim, K. Lim, C.-H. Lee, H. Kim, J. Laskar, "A Fully Integrated UHF Receiver with Multi-Resolution Spectrum Sensing (MRSS) Functionality for IEEE 802.22 Cognitive Radio Applications," in *Solid-State Circuits Conference, 2008. ISSCC 2008. Digest of Technical Papers. IEEE International*, 2008, pp. 526-633.

- [9] J. Park, T. Song, J. Hur, S. M. Lee, J. Choi, K. Kim, K. Lim, C.-H. Lee, H. Kim, J. Laskar, "A Fully Integrated UHF-Band CMOS Receiver with Multi-Resolution Spectrum Sensing (MRSS) Functionality for IEEE 802.22 Cognitive Radio Applications," *IEEE J. Solid-State Circuits*, vol.44, no.1, pp.258-268, Jan. 2009.
- [10] W. Hu, et al., "Arbitrary Waveform Generator Based on Direct Digital Frequency Synthesizer," in *IEEE International Symp. on Electronic Design, Test and Applications*, pp.567-570, Jan. 2008.
- [11] X. Liu, et al., "An Arbitrary Waveform Generator for Switch-Linear Hybrid Flexible Waveform Power Amplifier," in *IEEE Conference on Industrial Electronics and Applications*, pp.2732-2736, May 2007.
- [12] S. Prasad, S. K. Sanyal, "Design of Arbitrary Waveform Generator based on Direct Digital Synthesis Technique using Code Composer Studio Platform," in *International Symp. on Signals, Circuits and Systems,* pp.1-4, July 2007.
- [13] S. Max, "Direct AWG sine wave synthesis with fixed clock frequency," in Instrumentation and Measurement Tech. Conference, pp.230-234, 2000.
- [14] T. L, Y. Qiu, "An approach to the single-chip arbitrary waveform generator (AWG)," in *International Conference on ASIC*, pp.506-509, 2001.
- [15] A. Machida, et al., "Arbitrary waveform generator," in *IEEE Instrumentation and Measurement Technology Conference*, pp.251-257, May 1991.

- [16] D. Caviglia, et al., "Design and Construction of an Arbitrary Waveform Generator," *IEEE Transactions on Instrumentation and Measurement*, vol.32, no.3, pp.398-403, Sept. 1983.
- [17] J. Langlois, D. Al-Khalili, "Low power direct digital frequency synthesizers in 0.18 µm CMOS," in *Proc. IEEE Custom Integrated Circuits Conf.*, Sept. 2003, pp. 283-286.
- [18] Y. S. Kim, S.-M. Kang, "A High Speed Low-Power Accumulator for Direct Digital Frequency Synthesizer," in *IEEE MTT-S Int. Microw. Symp. Dig.*, 2006, pp.502-505.
- [19] D. D. Caro, N. Petra, A. Strollo, "Reducing Lookup-Table Size in Direct Digital Frequency Synthesizers Using Optimized Multipartite Table Method," *IEEE Trans. Circuits Syst. I*, vol.55, no.7, pp.2116- 2127, Aug. 2008.
- [20] Y. Lo, H. Chien, "Switch-Controllable OTRA-Based Square/ Triangular Waveform Generator," *IEEE Trans. Circuits and Syst. II Exp. Briefs*, vol.54, no.12, pp.1110-1114, Dec. 2007.
- [21] W. S. Chung et al., "Triangular/ squarewave generator with independently controllable frequency and amplitude," *IEEE Trans. Instrum. Meas.*, vol. 54, no. 2, pp. 105–109, Feb. 2005.
- [22] C. L. Hou, H. C. Chien, and Y. K. Lo, "Squarewave generators employing OTRAs," *Proc. IEE Circuits Devices Syst.*, vol. 152, pp. 718–722, Dec. 2005.
- [23] J. Ryckaert, "Carrier-based UWB impulse radio: simplicity, flexibility, and pulser implementation in 0.18-micron CMOS," in *IEEE Int. Conf. on UWB*, pp. 432-437, Sept. 2005.

- [24] Y. Zhu, J.D. Zuegel, J.R. Marciante, H. Wu, "Distributed Waveform Generator: A New Circuit Technique for Ultra-Wideband Pulse Generation, Shaping and Modulation," *IEEE J. Solid-State Circuits*, vol.44, no.3, pp.808-823, March 2009.
- [25] C. Lewis, H. Owen, D. Abi, J. Hecker, J. Sulzen, "Multi-waveform radar for ice sheet measurements and classroom demonstration," in *IEEE Int. Geoscience and Remote Sensing Symp.*, (IGARSS 2007), pp.2202-2205, July 2007.
- [26] W. Hu, C. Lee, X. Wang, "Arbitrary Waveform Generator Based on Direct Digital Frequency Synthesizer," in *IEEE Int. Symp. on Electronic Design, Test and Appl.*, pp.567-570, Jan. 2008.
- [27] X. Liu, S. Li, Y. Shi, "An Arbitrary Waveform Generator for Switch-Linear Hybrid Flexible Waveform Power Amplifier," in *IEEE Conf. on Indust. Electronics and Appl.*, pp.2732-2736, May 2007.
- [28] S. Prasad, S. K. Sanyal, "Design of Arbitrary Waveform Generator based on Direct Digital Synthesis Technique using Code Composer Studio Platform," in *Int. Symp. on Signals, Circuits and Systems*, pp.1-4, July 2007.
- [29] S. Max, "Direct AWG sine wave synthesis with fixed clock frequency," in Instrument. and Meas. Tech. Conf., pp.230-234, 2000.
- [30] T. L, Y. Qiu, "An approach to the single-chip arbitrary waveform generator (AWG)," in *Int Conf. on ASIC*, pp.506-509, 2001.
- [31] A. Machida, M. Hasegawa, I. Koga, "Arbitrary waveform generator," in *IEEE Instrum. and Meas. Tech. Conf.*, pp.251-257, May 1991.

- [32] D. Caviglia, A. De Gloria, G. Donzellini, G. Parodi, D. Ponta, "Design and Construction of an Arbitrary Waveform Generator," *IEEE Trans. Instrum. Meas*, vol.32, no.3, pp.398-403, Sept. 1983.
- [33] S. Ricci, et al., "Multichannel FPGA-based arbitrary waveform generator for medical ultrasound," *Electronics Letters*, vol.43, no.24, Nov. 2007.
- [34] E. Chuang, S. Hensley, and K. Wheeler, "A highly capable arbitrary waveform generator for next generation radar systems," in *IEEE Aerospace Conf.*, pp. 7, 2006.
- [35] L. Villa, et al., "Dynamic zero compression for cache energy reduction," in *IEEE/ACM MICRO-33*, pp.214-220, 2000.
- [36] V.G. Moshnyaga, "Reducing energy dissipation of frame memory by adaptive bit-width compression," *IEEE Trans. on Circuits and Systems for Video Technology* 2002, pp. 713-718, 2002.
- [37] M. Ferretti, P.A. Beerel, "Low swing signaling using a dynamic diode-connected driver," *Proc. ESSCIRC*, pp. 369-372, Sept. 2001.
- [38] B. Zhang, et al., "A New Level Shifter with Low Power in Multi-Voltage System," in *ICSICT '06*, pp.1857-1859, 2006.
- [39] J.C.G. Montesdeoca, J.A. Montiel-Nelson, S. Nooshabadi, "MOS Driver-Receiver Pair for Low-Swing Signaling for Low Energy On-Chip Interconnects," *IEEE TVLSI*, vol.17, no.2, pp.311-316, 2009.
- [40] K. Usami, et al., "Automated low-power technique exploiting multiple supply voltages applied to a media processor," *IEEE J. Solid-State Circuits*, vol.33, no.3, pp.463-472, 1998.

- [41] R. Puri, L. Stok, J. Cohn, D. Kung, D. Pan, D. Sylvester, A. Srivastava, S. Kulkarni, "Pushing ASIC performance in a power envelope," *Proc. Design Automation Conference*, pp. 788-793, June 2003.
- [42] S.F. Al-Sarawi, "Low power Schmitt trigger circuit," *Electronics Letters*, vol.38, no.18, pp. 1009-1010, 2002.
- [43] Y. Byung-Do, K. Lee-Sup, "A low-power charge-recycling ROM architecture," *IEEE TVLSI*, vol.11, no.4, pp. 590-600, 2003.
- [44] J.P. Uyemura, "CMOS Logic Circuit Design" (Kluwer, Boston, MA, 1999).
- [45] B. Wicht, T. Nirschl, D. Schmitt-Landsiedel, "Yield and speed optimization of a latch-type voltage sense amplifier," *IEEE J. Solid-State Circuits*, vol.39, no.7, pp. 1148-1158, 2004.
- [46] R. Singh, N. Bhat, "An offset compensation technique for latch type sense amplifiers in high-speed low-power SRAMs," *IEEE TVLSI*, vol.12, no.6, pp. 652-657, 2004.
- [47] S. Lee, Taejoong Song, C. Cho, K. Lim, and J. Laskar, "Enhanced input range dynamic comparator for a pipeline analog-to-digital converter (ADC)," *Electronics Letters*, vol.45, no.14, pp.728-730, 2009.
- [48] A. Natarajan, V. Shankar, A. Maheshwari, W. Burleson, "Sensing design issues in deep submicron CMOS SRAMs," *Proc. VLSI, IEEE Computer Society Annual Symposium on*, pp. 42-45, 2005.
- [49] L. Chang, et al., "Stable SRAM cell design for the 32 nm node and beyond," *IEEE Symp. on VLSI Technology*, pp.128-129, 2005.
- [50] A. Alvandpour, et al., "Bitline leakage equalization for sub-100nm caches," *Proc. 29<sup>th</sup> European Solid-State Circuit Conf.*, pp. 401-404, 2003.

- [51] H. Noguchi, H., et al. "Which is the best dual-port SRAM in 45-nm process technology? — 8T, 10T single end, and 10T differential —," *IEEE Int. Conf. on Integrated Circuit Design and Technology and Tutorial*, pp. 55-58, 2008.
- [52] N. Shibata, et al., "A 0.5-V 25-MHz 1-mW 256-kb MTCMOS/SOI SRAM for solar-power-operated portable personal digital equipment - sure write operation by using step-down negatively overdriven bitline scheme," *IEEE JSSC*, vol.41, no.3, pp. 728-742, 2006.
- [53] J. Choi, et al., "High multiplication factor capacitor multiplier for an on-chip PLL loop filter," *Electronics Letters*, vol.45, no.5, pp.239-240, 2009.
- [54] Taejoong Song, Stephen Kim, Kyutae Lim, and Joy Laskar, "Fully-gated ground 10T-SRAM bitcell in a 45-nm SOI technology," *Electronics Letters*, vol. 46, no. 7, April 2010.
- [55] Taejoong Song, J. Park, S. M. Lee, J. Choi, K. Kim, C.-H. Lee, K. Lim, and J. Laskar, "A 122 mW Low Power Multi-Resolution Spectrum Sensing IC with Self-Deactivated Partial Swing Techniques," *Circuits and Systems II: Express Briefs, IEEE Transactions on*, vol.57, no.3, pp.188-192, March 2010.

# **Publications and Patents**

- [1] Taejoong Song, J. Park, S. M. Lee, J. Choi, K. Kim, C.-H. Lee, K. Lim, and J. Laskar, "A 122 mW Low Power Multi-Resolution Spectrum Sensing IC with Self-Deactivated Partial Swing Techniques," *Circuits and Systems II: Express Briefs, IEEE Transactions on*, vol.57, no.3, pp.188-192, March 2010.
- [2] Taejoong Song, Stephen Kim, Kyutae Lim, and Joy Laskar, "Fully-gated ground 10T-SRAM bitcell in a 45-nm SOI technology," *Electronics Letters*, vol. 46, no. 7, April 2010.
- [3] Taejoong Song, S. M. Lee, J. Park, J. Hur, M. Lee, K. Kim, C.-H. Lee, K. Lim, and Joy Laskar, "A Fully-Integrated SRAM-Based On-Chip Arbitrary Waveform Generator for Analog Signal Processing," submitted to *IEEE Transactions on Instrumentation and Measurement*.
- [4] J. Char, Taejoong Song, C. Cho, M. Ahn, C.-H. Lee, and Joy Laskar, "Low-Power CMOS Antenna-Switch Driver Using Shared-Charge Recycling Charge Pump," submitted to *IEEE Transactions on Microwave Theory and Techniques*.
- [5] **Taejoong Song**, S. M. Lee, J. Park, K. Lim, and J. Laskar, "A Fully-Integrated Arbitrary Waveform Generator for Analog Matched Filter," in *Microwave Conference, 2008. APMC 2008. Asia-Pacific*, 2008.
- [6] Taejoong Song, S. M. Lee, J. Choi, S. Kim, G. Kim, K. Lim, and J. Laskar, "A Robust Latch-Type Sense Amplifier Using Adaptive Latch Resistance," accepted in *Int. Conf. on Integrated Circuit Design & Technology (ICICDT'10)*, June 2010.

- [7] Taejoong Song, K. Lim, G. Kim, I. Son, and J. Laskar, "Bit Cell Leakage-Aware SRAM Sense Amplifier Activation Schemes," in *Int. Conf. on Memory Technology and Design 2007 (ICMTD '07)*, May 2007.
- [8] Sang Min Lee, Taejoong Song, J. Park, C. Cho, S. An, K. Lim, and Joy Laskar, "A CMOS Integrated Analog Pulse Compressor for MIMO Radar Applications," accepted in *IEEE Transactions on Microwave Theory and Techniques*, Feb. 2010.
- [9] J. Park, Taejoong Song, J. Hur, S. M. Lee, J. Choi, K. Kim, K. Lim, C.-H. Lee, H. Kim, and J. Laskar, "A Fully Integrated UHF-band CMOS Receiver with Multi-Resolution Spectrum Sensing (MRSS) Functionality for IEEE 802.22 Cognitive Radio Applications," *Solid-State Circuits, IEEE Journal of*, vol.44, no.1, pp.258-268, Jan. 2009.
- [10] S. M. Lee, Taejoong Song, C. Cho, K. Lim, and Joy Laskar, "Enhanced input range dynamic comparator for a pipeline analog-to-digital converter (ADC)," *Electronics Letters*, vol.45, no.14, pp.728-730, July 2009.
- [11] J. Park, Taejoong Song, J. Hur, S. M. Lee, J. Choi, K. Kim, J. Lee, K. Lim, C.-H. Lee, H. Kim, and J. Laskar, "A Fully Integrated UHF Receiver with Multi-Resolution Spectrum Sensing (MRSS) Functionality for IEEE 802.22 Cognitive Radio Applications," in *Solid-State Circuits Conference, 2008. ISSCC 2008. Digest of Technical Papers. IEEE International*, 2008, pp. 526-633.
- [12] Stephen T. Kim, Taejoong Song, J. Choi, F. Bien, K. Lim, and Joy Laskar, "Semi-Active High-Efficient CMOS Rectifier for Wireless Power Transmission," accepted in *Radio Frequency Integrated Circuits Symposium, 2010. RFIC 2010. IEEE*, June 2010.

- [13] Stephen T. Kim, J. Choi, S. Beck, Taejoong Song, K. Lim, and Joy Laskar, "SubVT Current Mode Matrix Determinant Computation for Analog Signal Processing," accepted in *ISCAS 2010*.
- [14] S. M. Lee, Taejoong Song, J. Park, K. Lim, and J. Laskar, "Analog Pulse Compressor for Radar System," in *EuRAD 2008*, Oct. 2008.
- [15] J. Park, K.-W. Kim, Taejoong Song, S. M. Lee, J. Hur, K. Lim, and J. Laskar, "A Cross-layer Cognitive Radio Testbed for the Evaluation of Spectrum Sensing Receiver and Interference Analysis," in *CrownCom 2008*.
- [16] J. Park, Y. Hur, Taejoong Song, K. Kim, J. Lee, K. Lim, C. H. Lee, H. S. Kim, and J. Laskar, "Implementation Issues of A Wideband Multi-Resolution Spectrum Sensing (MRSS) Technique for Cognitive Radio (CR) Systems," in *Cognitive Radio Oriented Wireless Networks and Communications, 2006. 1st International Conference on,* 2006.
- [17] Taejoong Song, J. Park, Y. Hur, K. Lim, C.-H. Lee, J. Lee, K. Kim, S. Lee, H. Kim, and Joy Laskar, "SYSTEMS, METHODS, AND APPARATUSES FOR DIGITAL WAVELET GENERATORS FOR MULTI-RESOLUTION SPECTRUM SENSING OF COGNITIVE RADIO APPLICATIONS," US 2008/0034171 A1, Feb., 2008.
- [18] J. Park, Taejoong Song, K. Lim, C.-H. Lee, J. Lee, K. Kim, S. Lee, H. Kim, and Joy Laskar, "SYSTEMS, METHODS, AND APPARATUSES FOR A LONG DELAY GENERATION TECHNIQUE FOR SPECTRUM-SENSING OF COGNITIVE RADIOS," US 2008/0024336 A1, Jan. 2008.

### Vita

Taejoong Song was born in Seoul, Korea, in 1971. He received the B.S. and M.S. degrees in electrical engineering from Hanyang University in 1995 and 1997, respectively. From 1997 to 2005, he was with Samsung Electronics Company, Ltd., Yongin, Korea, working on the development of embedded SRAM memory as a senior engineer. In 2005, he joined the Microwave Applications Group (MAG), Georgia Institute of Technology. His research interests include CMOS digital and analog circuit design, especially low-leakage and low-power circuit techniques. To date, he has authored and co-authored 7 journal papers, 10 conference papers, and 2 patents.