### University of Windsor Scholarship at UWindsor

**Electronic Theses and Dissertations** 

Theses, Dissertations, and Major Papers

Fall 2021

### Design and Implementation of Low Power SRAM Using Highly Effective Lever Shifters

Neda Rezaei University of Windsor

Part of the Electrical and Computer Engineering Commons

#### Recommended Citation

Rezaei, Neda, "Design and Implementation of Low Power SRAM Using Highly Effective Lever Shifters" (2021). *Electronic Theses and Dissertations*. 8886. https://scholar.uwindsor.ca/etd/8886

This online database contains the full text of PhD dissertations and Master's theses of University of Windsor students from 1954 forward. These documents are made available for personal study and research purposes only, in accordance with the Canadian Copyright Act and the Creative Commons license CC BY-NC-ND (Attribution, Non-Commercial, No Derivative Works). Under this license, works must always be attributed to the copyright holder (original author), cannot be used for any commercial purposes, and may not be altered. Any other use would require the permission of the copyright holder. Students may inquire about withdrawing their dissertation and/or thesis from this database. For additional inquiries, please contact the repository administrator via email (scholarship@uwindsor.ca) or by telephone at 519-253-3000 ext. 3208.

# Design and Implementation of Low Power SRAM Using Highly Effective Lever Shifters

By

Neda Rezaei

A Dissertation

Submitted to the Faculty of Graduate Studies through the Department of Electrical and Computer Engineering in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy at the University of Windsor

Windsor, Ontario, Canada

2021

©2021 Neda Rezaei

# Design and Implementation of Low Power SRAM Using Highly Effective Lever Shifters

By

Neda Rezaei

APPROVED BY:

M. M. Jamali, External Examiner University of Texas Permian Basin

H. Wu University of Windsor

B. Balasingam University of Windsor

A. Jaekel University of Windsor

M. Mirhassani, Advisor University of Windsor

September 1, 2021

# Declaration of Co-Authorship/Previous Publication

#### I. Co-Authorship

I hereby declare that this thesis incorporates material that is the result of joint research. All chapters of this thesis were completed under the supervision of Dr. Mitra Mirhassani. In all cases, the key ideas, primary contributions, experimental designs, data analysis, interpretation, statistical analysis, graphing of results, and writing were performed by the author. The contribution of my supervisor (the co-author) was primarily through reviewing and commenting on the literature review, mathematical derivations, and system architectures, providing feedback on the refinement of ideas, editing the manuscript, and advising on the selection of peer-reviewed journals for publication.

I am aware of the University of Windsor Senate Policy on Authorship and I certify that I have properly acknowledged the contribution of other researchers to my thesis, and have obtained written permission from the supervisor to include the above material in my thesis.

I certify that, with the above qualification, this thesis, and the research to which it refers, is the product of my own work.

#### II. Previous Publication

This thesis includes two original papers that have been previously published in peer-reviewed journals and conferences, as follows:

| Journal Papers | Status |
|----------------|--------|
| Rezaei, N. and Mirhassani, M., 2021. An Efficient High Speed and Low Power Voltage-Level Shifter. AEU-International Journal of Electronics and Communications, p.153857. | Published |
| Rezaei, N. and Mirhassani, M., 2021. Ultra low-power negative DC voltage generator based on a proposed level shifter and voltage reference. Microelectronics Journal, 113, p.105087. | Published |

I certify that I have obtained written permission from the copyright owners to include the above published materials in my thesis. I certify that the above material describes work completed during my registration as a graduate student at the University of Windsor.

#### III. General

I declare that, to the best of my knowledge, my thesis does not infringe upon anyone's copyright nor violate any proprietary rights, and that any ideas, techniques, quotations, or any other material from the work of other people included in my thesis, published or otherwise, are fully acknowledged in accordance with standard referencing practices. Furthermore, to the extent that I have included copyrighted material that surpasses the bounds of fair dealing within the meaning of the Canada Copyright Act, I certify that I have obtained written permission from the copyright owners to include such materials in my thesis.

I declare that this is a true copy of my thesis, including any final revisions, as approved by my thesis committee and the Graduate Studies office, and that this thesis has not been submitted for a higher degree to any other University or Institution.

#### Abstract

The explosive growth of battery-operated devices has made low-power design a priority in recent years. In high-performance Systems-on-Chip, leakage power consumption has become comparable to the dynamic component, and its relevance increases as technology scales. These trends are even more evident for SRAM memory devices, since they are a dominant source of standby power consumption in low-power application processors. On-die SRAM power consumption is particularly important for increasingly pervasive mobile and handheld applications, where battery life is a key design and technology attribute. In SRAM memory design, the SRAM cells also make up the most significant portion of the total chip. Moreover, the increasing number of transistors in SRAM memories and the increasing leakage current of MOS devices in scaled technologies have turned the SRAM unit into a power-hungry block from both the dynamic and static viewpoints. Although scaling the supply voltage enables low power consumption, the data stability of the SRAM cells becomes a major concern. Thus, the reduction of SRAM leakage power has become a critical research concern.

To address leakage power consumption in high-performance cache memories, researchers have proposed a stream of novel circuit-level and architecture-level techniques, including leakage-current management, cell-array leakage reduction, bitline leakage reduction, and leakage-current compensation techniques. The main goal of this work was to improve cell-array leakage reduction techniques in order to minimize the leakage power of SRAM memory designs in low-power applications.

This study also applies body biasing to reduce leakage current. To adjust the threshold voltage of the NMOS transistors, and consequently their leakage current, a negative DC voltage can be applied to their body terminal, which acts as a second gate. To generate this negative DC voltage, this study proposes a negative voltage reference that includes a trimming circuit and a negative level shifter. These enhancements are applied to a 10 kb SRAM memory operating at 0.3 V in a 65 nm CMOS process.

### Acknowledgments

I would like to take this opportunity to express my deepest gratitude to my research supervisor, Dr. Mitra Mirhassani. I benefited from her advice at many stages in the course of this research project. Her positive outlook and confidence in my research inspired me and gave me confidence during the hard times of my research. I would also like to thank her for her patience when I was confused, and for all her spiritual and financial support during these years of research.

### Table of Contents

| Section | Title |
|---------|-------|
|         | Declaration of Co-Authorship / Previous Publication |
|         | Abstract |
|         | Acknowledgments |
|         | List of Tables |
|         | List of Figures |
| 1       | Introduction and Motivation |
| 1.1     | Introduction |
| 1.2     | Motivation |
| 1.3     | Objectives |
| 2       | Literature Review |
| 2.1     | Review of Previous Works |
| 2.2     | Reducing Bit Line Swing Voltage |
| 2.2.1   | Bit Line Charge Recycling |
| 2.2.2   | Hierarchical Bit Line |
| 2.3     | Dual-Rail SRAM Architecture |
| 2.4     | Drowsy SRAM Cell |
| 2.5     | Conclusion |
| 3       | Proposed Leakage Current Reduction Technique |
| 3.1     | Introduction |
| 3.2     | Body-Bias Technique |
| 3.2.1   | Leakage Current |
| 3.2.2   | Power Consumption Model |
| 3.3     | Bias Generator |
| 3.4     | Level Shifter |
| 3.4.1   | Design Challenges |
| 3.4.2   | Proposed Level Shifter Design |
| 3.4.2.1 | Performance Evaluation |
| 3.4.2.2 | Comparison |
| 3.5     | Basic Concept of Negative Shifting |
| 3.6     | Proposed Negative Level Shifter: Case One |
| 3.6.1   | Temperature Compensation |
| 3.6.2   | Performance Evaluation |
| 3.6.2.1 | Thermal Assessment |
| 3.6.2.2 | Delay |
| 3.6.2.3 | Power Consumption |
| 3.6.3   | Voltage Reference |
| 3.6.3.1 | Proposed Voltage Reference |
| 3.6.3.2 | Effective Temperature Coefficient |
| 3.6.3.3 | Supply Voltage Sensitivity |
| 3.6.3.4 | Power Supply Rejection Ratio (PSRR) |
| 3.6.3.5 | Voltage Reference Evaluation |
| 3.6.4   | Comparison |
| 3.7     | Proposed Negative Level Shifter: Case Two |
| 3.7.1   | Performance Parameters |
| 3.7.2   | Performance Evaluation |
| 3.7.3   | Comparison |
| 3.8     | Conclusion |
| 4       | CM… |
| 4.1     | Introduction |
| 4.2     | Basic Storage Cell |
| 4.3     | Architecture of an SRAM Unit |
| 4.3.1   | Sense Amplifier and Write Driver |
| 4.4     | Read Operation |
| 4.5     | Write Operation |
| 4.6     | Overview of Read Buffer-Foot |
| 4.7     | Building Blocks and Techniques |
| 4.7.1   | Single-ended 8T-SRAM Cell |
| 4.7.2   | Peripheral Write Assist Circuitry |
| 4.7.3   | Read Peripheral |
| 4.7.4   | Reverse-Body Biasing |
| 4.8     | Conclusion |
| 5       | Physical Design |
| 5.1     | Fabrication |
| 5.2     | Symmetry |
| 5.3     | Latchup |
| 5.4     | Guard Ring |
| 5.4.1   | Guard Rings for Internal and External Latchup Phenomena |
| 5.4.2   | Double Guard Ring |
| 5.5     | Effect of Body Biasing on the Physical Design |
| 5.6     | Physical Design of the Proposed Level Shifters |
| 5.6.1   | Proposed Level Shifter in Fig. 3.8 |
| 5.6.2   | Proposed Level Shifter in Fig. 3.17 |
| 5.7     | Conclusion |
| 6       | Application: Physical Unclonable Function |
| 6.1     | Introduction |
| 6.1.1   | Our Contribution |
| 6.2     | SRAM PUF Architecture |
| 6.2.1   | Related Works |
| 6.2.2   | Effect of process variations on the power-up state |
| 6.2.3   | Power consumption |
| 6.3     | Proposed Design |
| 6.3.1   | SRAM design and Operation |
| 6.3.2   | Current Mirror Design |
| 6.3.3   | Negative-DC-Voltage Generator |
| 6.3.4   | Randomness Operation |
| 6.3.4.1 | Phase One |
| 6.3.4.2 | Phase Two |
| 6.4     | Performance Evaluation |
| 6.4.1   | Reliability |
| 6.4.2   | Uniqueness |
| 6.5     | Implementation and Test Requests |
| 6.6     | Comparison study |
| 6.7     | Conclusion |
| 7       | Contribution |
| 7.1     | Summary and Conclusions |
|         | Bibliography |
|         | Vita Auctoris |

### List of Tables

| Table | Caption |
|-------|---------|
| 3.1   | Charge pump performance comparison |
| 3.2   | Level shifters' performance comparison |
| 3.3   | Transistor Sizes |
| 3.4   | Performance summary and comparison with the state of the art |
| 3.5   | Transistors' size ratio of the proposed Negative Level Shifter |
| 3.6   | Performance summary and comparison with the state of the art |
| 3.7   | Comparison of the proposed negative level shifter with state-of-the-art LSs |
| 3.8   | Comparison of the proposed Negative Level Shifter with charge pumps |
| 3.9   | Transistor Sizes |
| 3.10  | Level shifters' performance comparison |
| 4.1   | Performance comparison of two cases |
| 5.1   | Parameters comparison for an 8T SRAM cell with and without guard ring |
| 6.1   | Transistors' size ratio of the proposed current mirror |
| 6.2   | Comparison with state-of-the-art |

### List of Figures

| 1.1  | Comparison between the SRAM, DRAM, and Flash memories                     | 2  |
|------|---------------------------------------------------------------------------|----|
| 1.2  | Technology scaling according to International Technology Roadmap          |    |
|      | for Semiconductors (ITRS)-2016 [14]                                       | 4  |
| 2.1  | Conventional SRAM column with bit-line leakage [18]. $\ldots$ .           | 8  |
| 2.2  | Overall architecture of sense amplifier cell scheme [17]                  | 11 |
| 2.3  | Hierarchical approach [34]                                                | 14 |
| 2.4  | Concept of the <i>HBLSA</i> -SRAM [26]                                    | 15 |
| 2.5  | dual-rail SRAM architecture [37]                                          | 17 |
| 2.6  | Leakage inside a SRAM cell [15]                                           | 18 |
| 2.7  | Drowsy SRAM cell with the supply voltage control mechanism [15].          | 19 |
| 2.8  | 16Kb array, consisting of 256 rows by 64 columns, with 16 divided         |    |
|      | local arrays.                                                             | 20 |
| 2.9  | A local array including 63 columns and 16 rows and the relevant           |    |
|      | attached word drivers (WD) and sense amplifiers (SA)                      | 21 |
| 3.1  | Simplified block diagram of a charge pump (CP)-based energy har-          |    |
|      | vesting power management integrated circuit (PMIC) [50]. $\ldots$         | 24 |
| 3.2  | NMOS $V_{th}$ versus body biasing in $65nm$                               | 26 |
| 3.3  | PMOS $V_{th}$ versus body biasing in $65nm$                               | 27 |
| 3.4  | PMOS normalized leakage versus body biasing in $65nm$                     | 28 |
| 3.5  | PMOS normalized power versus body biasing in $65nm$                       | 29 |
| 3.6  | Conventional DCVS level converter [67]                                    | 32 |
| 3.7  | Conventional DCVS level converter [67]                                    | 32 |
| 3.8  | Proposed level shifter.                                                   | 35 |
| 3.9  | Bulk current of $M_{p5}$ and $M_{p6}$ as VDDL varies between 0.4 and $1V$ |    |
|      | at different PVT corners                                                  | 37 |
| 3.10 | Transient Results of the Proposed Negative Level Shifter                  | 38 |

| 3.11 | Delay versus VDDL variations for five corners at temperatures of                   |    |
|------|------------------------------------------------------------------------------------|----|
|      | $-20^{\circ}C$ , $27^{\circ}C$ and $100^{\circ}C$                                  | 39 |
| 3.12 | Static power versus VDDL variations for five corners at tempera-                   |    |
|      | tures of $-20^{\circ}C$ , $27^{\circ}C$ and $100^{\circ}C$ .                       | 39 |
| 3.13 | Histogram of delay and static power consumption of the proposed                    |    |
|      | LS                                                                                 | 40 |
| 3.14 | Static power versus temperature.                                                   | 41 |
| 3.15 | Delay–energy characteristics comparison                                            | 41 |
| 3.16 | Common source amplifier.                                                           | 43 |
| 3.17 | Proposed Negative Level Shifter. (a) Conceptual model (b) Schematic                |    |
|      | of the proposed negative level shifter                                             | 45 |
| 3.18 | Output voltage variations with time                                                | 46 |
| 3.19 | Thermal behavior of $V_{gs}$ and $V_{th}$ for (a) $M_8$ and (b) $M_9$              | 47 |
| 3.20 | Output voltage with and without temperature compensation                           | 48 |
| 3.21 | $V_{gs}$ and $V_{th}$ thermal behavior of (a) $M_8$ and (b) $M_9$ before and after |    |
|      | compensation                                                                       | 49 |
| 3.22 | (a) $M_9$ drain-source voltage changes in terms of temperature before              |    |
|      | and after compensation (b) Negative level shifter's output-voltage                 |    |
|      | variations in terms of temperature over 5 different process corners                | 49 |
| 3.23 | Histograms of the $V_{out}$                                                        | 50 |
| 3.24 | Transient Results of the Proposed Negative Level Shifter                           | 51 |
| 3.25 | (a) Delay (b) power consumption of the proposed NLS in terms of                    |    |
|      | VDD variations                                                                     | 52 |
| 3.26 | Proposed Voltage Reference circuit.                                                | 53 |
| 3.27 | Voltage reference variations in terms of (a) temperature (b) digital               |    |
|      | words over 5 different process corners                                             | 56 |
| 3.28 | $V_{ref}$ versus VDD                                                               | 56 |
| 3.29 | (a) TC sensitivity (b) PSRR in terms of VDD                                        | 57 |
| 3.30 | Proposed negative level shifter.                                                   | 61 |

| 3.31 | Transient Results of the Proposed Negative Level Shifter              | 62 |
|------|-----------------------------------------------------------------------|----|
| 3.32 | Output voltage variations versus input voltage changes when both      |    |
|      | the VDDL and VDDH varies.                                             | 64 |
| 3.33 | Delay and static power variations with VDDL and VDDH                  | 65 |
| 4.1  | 6T SRAM cell [101]                                                    | 68 |
| 4.2  | (a) The basic SRAM cell consists of two access transistors that       |    |
|      | provide a connection to a static memory element. (b) An SRAM          |    |
|      | array consists of multiple SRAM cells arranged in an array and        |    |
|      | provides multi-bit and multi-address storage [101]                    | 69 |
| 4.3  | Construction of an array based on a plurality of SRAM cells [102]     | 70 |
| 4.4  | A typical SRAM architecture [103]                                     | 71 |
| 4.5  | A typical SRAM architecture [103]                                     | 72 |
| 4.6  | 6T bit-cell during a read operation                                   | 73 |
| 4.7  | SRAM column slice showing pre-charge transistors, a single SRAM       |    |
|      | cell, and read/write circuits [105]                                   | 74 |
| 4.8  | 6T bit-cell during a write operation                                  | 76 |
| 4.9  | 8T bit-cell uses two-port topology to eliminate read $SNM$ and pe-    |    |
|      | ripheral assists, controlling Buffer-Foot and $VVDD$ , to manage bit- |    |
|      | line leakage and write errors [111]                                   | 77 |
| 4.10 | Half-select disturbance in $8T$ SRAM array [112]                      | 78 |
| 4.11 | Single-ended $8T$ SRAM cell                                           | 79 |
| 4.12 | 8 <i>T</i> -cell (a) without (b) with the Read Buffer-Foot            | 80 |
| 4.13 | 8T-SRAM cell with Buffer-Foot during the read operation. (a) se-      |    |
|      | lected column (b) unselected column                                   | 81 |
| 4.14 | A SRAM memory block, including 16 8T-SRAM cells per bitline           |    |
|      | and their connections through two vertical and two horizontal con-    |    |
|      | trolling lines.                                                       | 83 |
| 4.15 | Supply voltage controlling mechanism for array                        | 84 |

| 4.16                                                                                                   | A column of 16 $8T$ -SRAM                                                                                                                                                                                                                                                                                    | 34                                                                                 |
|--------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------|
| 4.17 | Write peripheral assist circuitry. | 85 |
| 4.18 | Proposed supply voltage controlling mechanism for write driver | 86 |
| 4.19 | The write driver symbol | 87 |
| 4.20 | The transient response of the memory block | 87 |
| 4.21 | Partial discharge of the unselected bitline pairs | 88 |
| 4.22 | Read peripheral circuit | 88 |
| 4.23 | Proposed read peripheral circuit | 89 |
| 4.24 | Read peripheral circuit symbol | 89 |
| 4.25 | The transient response of the memory block. | 90 |
| 4.26 | Current in terms of PMOS and NMOS length | 91 |
| 4.27 | Current in terms of NMOS body bias. | 91 |
| 4.28 | Write operation over different NMOS body bias | 92 |
| 4.29 | Read and write operation. | 93 |
| 4.30 | Comparison of output voltage of two proposed negative voltage generators. | 94 |
| 5.1 | CMOS integrated circuits are fabricated on and in a silicon wafer [117]. | 97 |
| 5.2 | Latch formation in CMOS [122]. | 99 |
| 5.3 | Latch formation in CMOS [121]. | 100 |
| 5.4 | | |
| 5.5 | | |
| 5.6 | | |

| Variation of depletion region charge with bulk voltage [128] |
|-------------------------------------------------------------------|
| Single-ended $8T$ SRAM cell layout |
| The write driver layout |
| Read peripheral circuit layout |
| Physical design of the level shifter in Fig. 3.8 |
| Physical design of the proposed level shifter in Fig. 3.17(b) |
| SRAM cell in an array |
| Power consumption of SRAM PUF (single block) with and without negative DC generator |
| General block diagram for SRAM PUF |
| General block diagram for SRAM PUF (single block) |
| Proposed current mirror |
| Negative-DC-voltage generator applied to the NMOSs of the SRAM units |
| Negative-DC-voltage generator physical design |
| General flowchart |
| Monte Carlo analysis |
| Monte Carlo analysis. |

# Chapter 1

# Introduction and Motivation

Growing demand for electronic gadgets has driven innovation in VLSI technology. These devices provide many of the basic amenities of human life and need to be upgraded over time.

VLSI technology is applied widely, including in smart phones [1], GPS systems [2], military operations [3], radars [3], [4], medical devices [5], and so forth. There are various groups of memories that store data either permanently or temporarily, with static or dynamic operation.

This chapter sets the stage for the low-power embedded SRAM design. Section 1.1 briefly introduces SRAM memories. Section 1.2 elaborates on the motivation behind this research and the challenges facing low-power embedded SRAM design. Section 1.3 states the objectives of the dissertation.

### 1.1 Introduction

Dynamic random access memory (DRAM) and static random access memory (SRAM) are two types of memory. The main difference is that DRAM includes capacitors while SRAM does not, though there are also considerations such as different processing, different speeds, and different costs for developers. DRAM also allows for higher densities at a lower cost than SRAM.

However, DRAM did not prove to be a promising device for high performance and high volume production [6].

Alternatively, the conventional six-transistor (6T) SRAM [7] is the main choice for today's applications because it can meet the demands of high-performance microprocessors. Figure 1.1 compares the cost and speed of SRAM with those of DRAM and Flash memories.



FIGURE 1.1: Comparison between the SRAM, DRAM, and Flash memories.

SRAM is a vital building block in almost all digital systems, including microcontrollers, powerful microprocessors, digital signal processing circuits, hard disk and router buffers, LCD screens, printers, sensors, medical devices, and so forth. The key reason for such wide application is that, unlike DRAM, SRAM stores data without any extra processing or periodic refreshing, which leads to faster operation than DRAM. Furthermore, SRAM must be dense and have low leakage power while maintaining the high performance of the traditional 6T scheme [7]. Low-power nanoscale SRAMs are becoming increasingly important to satisfy the low-power requirements of high-end microprocessor units while ensuring the reliability of miniaturized devices. To design such low-power SRAMs, several challenges relevant to the memory characteristics must be addressed: (1) the leakage current must be reduced; (2) the noise margin must be improved by increasing the signal-to-noise (S/N) ratio; and (3) the impact of process, voltage, and temperature (PVT) variations must be reduced.

In CPU design, the leakage power is dominated by the CPU core; however, the large on-die SRAMs in low-power application processors can be the dominant source of standby power dissipation because the rest of the processor has been greatly optimized to reduce power consumption [8, 9]. Consequently, SRAM power consumption has become one of the significant factors in the overall power-management strategy for advanced VLSI system design [10].

The opportunity for a high level of integration to obtain low power and high performance in VLSI systems is increasingly available thanks to Moore's law, as CMOS technology scales well into the nanoscale regime.

Moreover, designers also lower the SRAM cell's power supply in order to reduce the standby leakage power. This also reduces the active power consumption associated with switching the highly capacitive bit lines and word lines in the active mode. However, this is achieved at the cost of stability degradation [11].

Meanwhile, maintaining a sufficient cell-stability margin has become a challenge as technology scaling minimizes transistor and bit-cell sizes to meet low-power constraints [12]. The transistor threshold voltage, gate length, and gate dielectric thickness are the traditional parameters that help designers control transistor leakage, but at the cost of speed and area [13].

Therefore, there is a need to improve the power efficiency of the SRAMs for applications that require low-power consumption.

### 1.2 Motivation

Over the past few years, the extensive growth of battery-operated devices has significantly increased the demand for low-power integrated circuits.

According to the International Technology Roadmap for Semiconductors (ITRS) 2016 [14], technology scaling is projected to reach the 3 nm node in 2021 (see Figure 1.2).



FIGURE 1.2: Technology scaling according to International Technology Roadmap for Semiconductors (ITRS)-2016 [14].

As the technology scales, the transistors' density in the SRAM cells increases significantly, which in turn causes a major challenge in standard CMOS SoCs: high leakage current. As leakage current directly impacts the power consumption, it plays a key role in CMOS circuit design.

One approach is to use a dual-$V_{th}$ process, in which high-performance devices are used only in critical paths and low-leakage (high-$V_{th}$) devices are used everywhere else. This approach, however, has achieved limited success for medium-size embedded SRAM units because of the area overhead of the high-$V_{th}$ transistors and the extra cost of the dual-$V_{th}$ process.

### 1.3 Objectives

This dissertation addresses the power consumption of SRAM memories through leakage current reduction techniques.

First, a negative DC voltage generated by a negative level shifter is proposed to control the leakage current of the NMOS transistors in the SRAM cells. This proposal successfully generates square waves whose operating frequency follows the input frequency (2.5 MHz). It also performs well in terms of static power and propagation delay.

Next, a dual-rail design is applied to the memory to split the logic and SRAM rails, allowing the logic rail to be supplied without being restricted by the Vmin or Vmax requirements of the SRAM bit-cells. The SRAM array is supplied by a dedicated voltage that does not scale with the periphery logic, whereas the SRAM periphery voltage is tied to the digital subsystem.

To reduce the leakage current further, a drowsy configuration is also proposed. Unlike the regular memory configuration, in which only the SRAM cells are put to sleep during standby mode, this proposal also forces the write driver and sense amplifier into sleep mode.

# Chapter 2

# Literature Review

The growing market for high-speed devices demands lower power dissipation for longer battery life and more compact systems. Static random access memory (SRAM) has become an active research area because it is used in high-speed applications such as cache memory and occupies about 90% of silicon area. The main SRAM design challenge is to reduce leakage power consumption without sacrificing speed or memory density. Various techniques have been studied to reduce its leakage power, and this chapter surveys recent developments in SRAM leakage reduction techniques at the circuit and architectural levels.

### 2.1 Review of Previous Works

As processor technology scales below  $0.1\mu m$ , sub-threshold leakage (static) power dissipation dominates digital circuits' total power. Moreover, sub-threshold leakage presents an interesting trade-off between performance and costs.

Performance demands require fast, high-leakage transistors; however, new applications favor power-efficient designs [15]. As a result, there is a clear need to reconcile these two often-conflicting power and performance requirements [16].

As on-chip SRAM memory occupies a large portion of the chip area, the memory power dissipation, both active and standby, is becoming a dominant part of the total power consumption of the chip [10]. Thus, considerable attention has been paid to the low-power, high-performance SRAM memories, which in turn affects the total power and die area in high-performance processors [8].

As systems become more complex, on-chip SRAMs tend to have large bit widths, typically ranging from 16 to 256 bits, though the numbers can be even greater. This has a direct impact on the power consumption [17].

At the architectural level, techniques for reducing power consumption in SRAM memories can generally be categorized into two groups: those that attempt to lower (1) the minimum required bit-line swing voltage and (2) the leakage current of the cells.

The bit/data line leakage current has become increasingly prominent due to decreasing transistors' threshold voltages in the nano-meter technology. The bit/data line leakage current is induced by the leakage current of the transmission transistors attached to the bit lines [18, 19].

Fig. 2.1 demonstrates a single-column circuit for a conventional SRAM [18]. In this scheme, the accessed cell stores logic-one while the other cells store logic-zero, which is the worst-case scenario. The sub-threshold leakage current in each non-accessed cell flows from the bit line, through the access transistor, and finally into the storage node holding logic-zero. Under this circumstance, these leakage currents add up to the overall bit-line leakage current, which increases the total power consumption.

During the read operation, this bit-line leakage current also acts as noise that opposes the cell current and may prevent the sense amplifier from detecting the bit-line differential signal [18]. The increased bit-line leakage also causes slow or incorrect write operation of the SRAM [19].

FIGURE 2.1: Conventional SRAM column with bit-line leakage [18].

Furthermore, the discharge rate of the bit lines contributes to the read access time of the SRAM, which is proportional to a time constant as noted in (2.1) [20].

$$\tau \approx \frac{C_{bitline}}{K'\left(\frac{W}{L}\right)(VDD - V_{th})^2} \cdot \Delta V \tag{2.1}$$

where $\Delta V$ is the discharge voltage amount, $C_{bitline}$ is the total bit line capacitance, $K'$ is the intrinsic transconductance of the word line select transistor ($\mu C_{ox}$), $W/L$ is the width-to-length ratio of the select transistor, VDD is the supply voltage, and $V_{th}$ is the threshold voltage of the select transistor.

As Eq. (2.1) implies, the discharge time constant of the bit lines is proportional to the bit-line capacitance, so it can be lowered by reducing the number of bit cells sharing a given bit line [9]. The hierarchical bit line is an architectural technique that reduces the total capacitance of the bit lines.
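As a rough numerical illustration of Eq. (2.1), the following Python sketch evaluates the time constant and shows how it scales with the bit-line capacitance and the discharge voltage. The function name and all device values are hypothetical, chosen only for illustration; they are not taken from this work.

```python
def bitline_time_constant(c_bitline, k_prime, w_over_l, vdd, v_th, delta_v):
    """Evaluate Eq. (2.1): tau ~ C_bitline * dV / (K' * (W/L) * (VDD - V_th)^2).

    Units: farads, A/V^2, dimensionless, volts, volts, volts -> seconds.
    """
    overdrive = vdd - v_th
    if overdrive <= 0:
        raise ValueError("VDD must exceed V_th for the select transistor to conduct")
    return c_bitline * delta_v / (k_prime * w_over_l * overdrive ** 2)


# Hypothetical operating point (assumed values, for illustration only).
params = dict(c_bitline=200e-15, k_prime=200e-6, w_over_l=2.0,
              vdd=1.0, v_th=0.4, delta_v=0.1)

tau_base = bitline_time_constant(**params)
# Halving the number of cells on the bit line roughly halves its capacitance.
tau_half_c = bitline_time_constant(**{**params, "c_bitline": 100e-15})
# Halving the required bit-line swing halves the discharge voltage.
tau_half_dv = bitline_time_constant(**{**params, "delta_v": 0.05})

print(f"baseline tau       : {tau_base:.3e} s")
print(f"half capacitance   : {tau_half_c / tau_base:.2f} x baseline")
print(f"half voltage swing : {tau_half_dv / tau_base:.2f} x baseline")
```

Both ratios come out at one half because Eq. (2.1) is linear in the bit-line capacitance and in the discharge voltage.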

According to Eq. (2.1), reducing the voltage swing of the bit lines is another way to shorten their discharge time [10, 21]. Although several power-efficient methods [22]-[23] have been introduced in the literature to decrease the bit-line voltage swing, the bit-line charge recycling (CR) technique is one of the most promising ways to reduce the bit-line swing voltage and, in turn, the total power consumption.

However, this minimum required signal swing is restricted by the sense amplifier offset. The higher the offset is, the higher the power consumption would be. Consequently, the maximum obtainable SRAM performance is restricted by sense amplifier input offset voltage because the voltage differential between the bit lines must be larger than the input offset of the worst-case sense-amplifier [24].

Suitable sense timing is another critical consideration for the sense amplifiers during reads in order to achieve a high-speed, low-power SRAM [25]. Bit-line discharging is the most time-consuming procedure during the read operation [25]. Moreover, if the sense amplifier enable (SAE) signal is asserted too early, the sense amplifier cannot correctly amplify the difference between a pair of bit-line voltages, and a read failure occurs. In contrast, if the SAE is asserted late, the cycle time is extended and extra power is consumed. As a result, the optimum timing for the SAE is critical for high-speed, low-power SRAM read operation [25].

The rest of this chapter is organized as follows. Section 2.2 discusses some attempts to lower the bit-line swing and is divided into two sub-sections: bit-line charge recycling and hierarchical bit lines. Sections 2.3 and 2.4 discuss two configurations that reduce the leakage current. Finally, Section 2.5 concludes the chapter.

### 2.2 Reducing Bit Line Swing Voltage

Concerning the SRAM cells' operation, the read power is reduced by limiting the swing voltages of bit lines and data bus to small voltages during the read cycles. However, the SRAM consumes much larger power during write cycles than the read cycles due to the full swing property in the bit lines and data bus during write cycles [17, 26]. Therefore, there were some attempts in the literature to lower the bit-line swing in order to decrease write power consumption.

In low-power embedded SRAMs, the bit lines are referenced to VDD, and during writes they are discharged almost to ground, which makes the write power significantly larger than the read power. Thus, write power can be reduced by decreasing the bit-line swings during writes. Several techniques have been proposed to save write power by reducing the swing voltage of the bit lines [17], [27, 28].

A bit-line reference of VDD/2 [27] was proposed, which enables designers to halve the bit-line swing during writes compared with the conventional technique. This topology reduces the swing voltage to VDD/2 by pre-charging the bit lines to VDD/2.

However, to avoid degrading the operating specifications, some additional measures were applied to the design. For example, to prevent cell instability during reads, a larger cell voltage was applied. Meanwhile, the pull-up PMOS was weakened to prevent write margin degradation, and the cell voltage of the accessed row was also lowered during writes to further reduce the power. The model also employs the two-to-four encoding technique to achieve capacitance matching of the bit lines with the write data bus, which can theoretically reduce the write power of the bit lines by 75% of the full-swing version [27]. Full details regarding the encoding technique are outlined in [29].

However, it is difficult to further reduce the power because of write-error problems in the half-swing (HS) scheme. The HS scheme also has a problem in stable read operation because pre-charging bit lines to VDD/2 in a read cycle increases the possibility of an erroneous flip of cell data [17].

Another small-swing SRAM arrangement utilizing a sense-amplifying cell was presented in [17], which could further save power in write cycles. In this scheme, the charging and discharging power of the bit/data lines is reduced by lowering the bit line swing to VDD/6 and amplifying the voltage swing by a sense-amplifier arrangement in a memory cell [17].

Fig. 2.2 illustrates the circuit diagram of the proposed sense-amplifying cell (SAC) scheme. This structure's principal feature is an additional NMOS (VSS switch) connected to the source of the pull-down NMOS transistors in a memory cell, which permits a small swing of the bit lines in writes. A bit line is pre-charged to $VDD-V_{th}$ by an NMOS load transistor and is pulled down to $VDD-V_{th}-\Delta V_{BL}$ in a write "0" operation, where $\Delta V_{BL}$ is the write swing.



FIGURE 2.2: Overall architecture of sense amplifier cell scheme [17].

The SAC-SRAM scheme, however, cannot lower the write power further because the reduction of the bit-line voltage swing is restricted by possible write failures under process variations. A stable write operation in this scheme requires the amplitude of the bit-line voltage swing to exceed a marginal value in order to invert the state of the internal nodes of the SRAM cell [21].

The authors in [28] also lowered the bit-line voltage swing to reduce the write power. In this technique, the swing voltage was lowered to VDD/10 by floating the source line of the memory cells when a word line is high and then driving the source line to ground.

These low-swing write techniques considerably decrease the write power consumption in the bit lines. However, they require extra logic in each local row decoder and some DC-DC voltage converters to pre-charge/discharge the bit lines. Thus, these techniques incur more area overhead and greater speed degradation. Moreover, the low-swing write techniques reduce the cells' noise margin because the cells act as sense amplifiers for the write operation [26].

To decrease the write power via the bit-line swing reduction, two methods were proposed: One technique involves bit-line charge recycling; another is a hierarchical technique. Both are explained as follows.

#### 2.2.1 Bit Line Charge Recycling

In this technique, the bit line's differential voltage swing is obtained by recycled charge (CR) from its neighbouring bit line capacitance, instead of the power-supply line. Employing such a charge-recycling method to the bit-line considerably decreases write power [21, 30]. When N bit lines recycle their charges, the bit lines' swing voltage and power are reduced to 1/N, and  $1/N^2$ , respectively [31].
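The scaling quoted above can be made concrete with a short Python sketch. It only encodes the stated 1/N and $1/N^2$ relations; the function name and the normalized input values are hypothetical and are not measurements from [31].

```python
def charge_recycling_scaling(n_bitlines, vdd, p_conventional):
    """Return (swing, power) when N bit lines recycle their charge.

    Encodes the relations stated in the text: the swing voltage scales
    as 1/N and the bit-line power scales as 1/N^2.
    """
    if n_bitlines < 1:
        raise ValueError("n_bitlines must be at least 1")
    swing = vdd / n_bitlines
    power = p_conventional / n_bitlines ** 2
    return swing, power


# Normalized, assumed inputs: VDD = 1.0 V, conventional write power = 1.0.
for n in (1, 2, 4, 8):
    swing, power = charge_recycling_scaling(n, vdd=1.0, p_conventional=1.0)
    print(f"N={n}: swing = {swing:.4f} V, relative power = {power:.4f}")
```

With N = 1 the function returns the conventional full-swing case, which is a quick sanity check on the scaling.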

In this technique, the bit-line charge can be recycled repeatedly by rolling down to the lower neighbouring bit-line pair in each cycle until the charge reaches ground [21]. This write scheme can reduce the total power used in conventional SRAM memories by 88% [21]. However, this technique suffers from some drawbacks. For example, the bit-line pairs are pre-charged to VDD during a read operation; however, the bit-line pairs have different voltage levels during the write operation. As a result, a reference voltage generator is required for the write pre-charge voltage, which results in power and delay overheads for the write startup. Also, such column-based power control reduces the static noise margins in the memory cells [30, 31].

Unlike the conventional charge-recycling SRAM [21], another charge-recycling SRAM was proposed that could recycle the charge in bit lines during both read and write operations [31]. In this way, both read and write powers could be decreased. Moreover, it does not have the delay overheads due to the read-to-write mode change of the conventional charge-recycling SRAM. The simulations demonstrate that this CR-SRAM saves 17% and 84% read and write power respectively compared with the conventional SRAM [31].

#### 2.2.2 Hierarchical Bit Line

As mentioned earlier, suppressing the bit-line swing is vital for lowering the power consumption of SRAM. The minimum bit-line swing is, however, limited by the offset voltage of the sense amplifiers. Moreover, in the presence of die variation, the memory cell current varies widely, especially when the operating voltage scales down to the near-threshold region. The signals on most of the bit lines are largely developed before the slowest bit line reaches the minimum swing that can be sensed by the sense amplifier. Consequently, it is difficult to decrease the average bit-line swing in very low voltage operation, which wastes power [32]. At the architectural level, the hierarchical bit-line topology has been introduced in the literature to reduce both the write swing voltage and the capacitance of the bit lines [33, 34]. The bit-line capacitance primarily comprises the pass transistors' drain capacitance and the metal capacitance of the bit line, so both should be decreased in order to reduce it. The bit-line capacitance can be reduced by the hierarchical approach, where the number of transistors connected to the bit line is reduced by combining two or more SRAM cells [34].

Fig. 2.3 demonstrates four SRAM cells that are combined together by the sub-bit line and connected through one pass transistor to the bit lines. As a result, the number of pass transistors attached to the bit-line and the bit line capacitance is reduced by 4.



FIGURE 2.3: Hierarchical approach [34].

The write power of the conventional bit line is expressed as follows [26]

$$P_{CVBL} = f \times C_{CVBL} \times VDD^2 \tag{2.2}$$

where f is the clock frequency,  $C_{CVBL}$  and VDD are the capacitance of the conventional bit line and power supply, respectively. However, the write power of the hierarchical bit line  $P_{HBL}$  is

$$P_{HBL} = f \times (C_{BL} \times V_{BL} + C_{SBL} \times VDD) \times VDD \tag{2.3}$$

where f is the clock frequency,  $C_{BL}$  is the capacitance of the bit line, and  $C_{SBL}$  is the capacitance of the sub-bit line. A comparison of Eq. (2.2) and (2.3) demonstrates that the hierarchical bit line consumes much less power than the conventional bit line during write cycles because both  $C_{BL}$  and  $C_{SBL}$  are less than  $C_{CVBL}$ .
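The comparison between Eq. (2.2) and Eq. (2.3) can be illustrated numerically. The following is a minimal sketch, not taken from [26]: all capacitance, voltage, and frequency values are hypothetical, and $V_{BL}$, which the text does not define, is assumed here to be the reduced swing voltage on the main bit line.

```python
def conventional_write_power(f_hz, c_cvbl, vdd):
    """Eq. (2.2): P_CVBL = f * C_CVBL * VDD^2."""
    return f_hz * c_cvbl * vdd ** 2


def hierarchical_write_power(f_hz, c_bl, v_bl, c_sbl, vdd):
    """Eq. (2.3): P_HBL = f * (C_BL * V_BL + C_SBL * VDD) * VDD."""
    return f_hz * (c_bl * v_bl + c_sbl * vdd) * vdd


# Hypothetical example values (assumptions, not from the thesis).
F_HZ = 100e6       # clock frequency
VDD = 1.2          # supply voltage in volts
C_CVBL = 200e-15   # conventional bit-line capacitance in farads
C_BL = 60e-15      # main bit-line capacitance in the hierarchical scheme
C_SBL = 20e-15     # sub-bit-line capacitance
V_BL = 0.3         # assumed reduced swing on the main bit line, in volts

p_cvbl = conventional_write_power(F_HZ, C_CVBL, VDD)
p_hbl = hierarchical_write_power(F_HZ, C_BL, V_BL, C_SBL, VDD)

print(f"P_CVBL = {p_cvbl * 1e6:.2f} uW")
print(f"P_HBL  = {p_hbl * 1e6:.2f} uW")
print(f"ratio  = {p_hbl / p_cvbl:.3f}")
```

With these assumed numbers the hierarchical scheme dissipates well under a quarter of the conventional write power; the actual saving depends on the real capacitances and swing voltage.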

The authors in [26] explored this architecture further and proposed hierarchical bit lines with local sense amplifiers (HBLSA-SRAM) to reduce the write power dissipation in bit lines without degrading the noise margin. This structure saves write power by lowering the swing voltage of the bit lines and the data bus. Fig. 2.4 depicts the concept of the HBLSA-SRAM. The hierarchical bit line consists of a bit line (BL) and sub-bit lines (sub-BL), each of which includes M memory cells and a local sense amplifier (LSA) [26].



FIGURE 2.4: Concept of the *HBLSA*-SRAM [26].

However, *HBLSA*-SRAM requires a local sense amplifier in each SRAM sub-array, which results in extra area overhead. Furthermore, an SRAM design that uses this method is complex because the sub-array size must be optimized to balance power and memory area [21].

### 2.3 Dual-Rail SRAM Architecture

With the rise of various functionalities inside a single system on chip (SoC), there is a growing demand for power optimization. An advanced SoC comprises numerous dedicated subsystems (SRAM and logic) that are divided into multiple voltage and power domains.

Embedded static random access memory (SRAM) often determines the minimum operating voltage (Vmin) of a subsystem due to its extremely dense layout and high multiplicity [35]. The higher Vmin requirement of 6T and 8T cells compared to logic circuits makes SRAM Vmin a system bottleneck. To address this problem, the SRAM array should be fed by a separate voltage that does not scale with the periphery logic (including control and decoder logic, sense amplifiers, write drivers, etc.) [36].

Dual-rail designs split the logic and SRAM rails and allow the logic rail to be supplied without being restricted by the Vmin or Vmax requirement of the SRAM bit-cells [36]. The SRAM array voltage is supplied by a dedicated voltage that does not scale with the periphery logic, whereas the SRAM periphery voltage is attached to the digital subsystem. Such a dual-supply SRAM enables very low voltage operation without any overhead for managing the SRAM interface [35].

Fig. 2.5 shows the dual-rail SRAM architecture, where the power supply is decoupled for the bit-cell (VDDCE) and periphery (VDDPE). The dual-rail memory architecture facilitates a wider range of operating voltages for the system [37].

Although effective, this technique exhibits various challenges at the system level.

First, the new rail requires distribution of metal resources for robust power delivery. Creation and regulation of this new power line and its routing through package and die increase the system area [38].



FIGURE 2.5: Dual-rail SRAM architecture [37].

Second, signals crossing the power lines should be carefully controlled. In particular, level conversion might be necessary for all input and control signals crossing power lines. This contributes to an area overhead for the macro. If explicit level converters are avoided to conserve area, the logic supply and SRAM supply cannot be controlled completely independently [38].

### 2.4 Drowsy SRAM Cell

A standard SRAM cell has two cross-coupled inverters, which provide two paths for the off-state leakage current (Fig. 2.6). The weak inversion current dominates the off-state leakage current. This relationship can be modelled as

$$I_D = I_{S0}\, e^{(V_{GS} - V_{th})/(nkT/q)} \left(1 - e^{-V_{DS}/(nkT/q)}\right) (1 + \lambda V_{DS}) \tag{2.4}$$

In the context of weak inversion, $\lambda$ is a parameter that models the pseudo-saturation region. The on transistors operate in strong inversion, and their series resistance is small enough to be ignored.

FIGURE 2.6: Leakage inside a SRAM cell [15].

As a result, the overall current will be equal to the total leakage current from the off transistors of the cross-coupled inverters. The SRAM cell's overall leakage can thus be modelled as

$$I_L = \left( \left( I_{sn} + I_{sp} \right) + \left( \lambda I_{sn} + \lambda I_{sp} \right) V_{DD} \right) \times \left( 1 - e^{-V_{DD}/(nkT/q)} \right) \tag{2.5}$$

In this context, $I_{sn}$ is the nMOS off-transistor current factor, while $I_{sp}$ is the pMOS off-transistor current factor. Both are independent of $V_{DS}$ in Eq. (2.4). Based on Eq. (2.5), the leakage current decreases super-linearly as $V_{DD}$ is lowered. As a result, the drowsy mode is able to produce a substantial reduction in leakage power.

However, to maintain state in the drowsy mode, it is critical to apply a minimum voltage. The researchers set the state-preserving (data-retention) voltage to a value 50% higher than the threshold voltage. Because this choice is conservative, they found that the state-preserving supply voltage can be reduced further if needed, even in the presence of process variations.
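The effect of lowering the supply on cell leakage can be sketched numerically. The snippet below is an illustration only: it assumes the form obtained by setting $V_{DS} = V_{DD}$ in Eq. (2.4) for each off transistor with a single $\lambda$, and the current factors, $\lambda$, the slope factor n, and both supply voltages are hypothetical values rather than data from [15].

```python
import math

K_OVER_Q = 8.617333262e-5  # Boltzmann constant over electron charge, in V/K


def cell_leakage(vdd, i_sn, i_sp, lam, n=1.5, temp_k=300.0):
    """Leakage current of one SRAM cell at supply voltage vdd.

    Assumed form:
      ((I_sn + I_sp) + lam * (I_sn + I_sp) * VDD) * (1 - exp(-VDD / (n*k*T/q)))
    """
    n_vt = n * K_OVER_Q * temp_k   # n * kT/q in volts
    base = i_sn + i_sp
    return (base + lam * base * vdd) * (1.0 - math.exp(-vdd / n_vt))


# Hypothetical parameters (assumptions for illustration).
I_SN = 1.0e-9      # nMOS off-transistor current factor, in amperes
I_SP = 0.5e-9      # pMOS off-transistor current factor, in amperes
LAMBDA = 0.5       # pseudo-saturation parameter, in 1/V
VDD_ACTIVE = 1.0   # active-mode supply, in volts
VDD_DROWSY = 0.3   # drowsy-mode supply, in volts

i_active = cell_leakage(VDD_ACTIVE, I_SN, I_SP, LAMBDA)
i_drowsy = cell_leakage(VDD_DROWSY, I_SN, I_SP, LAMBDA)
p_active = i_active * VDD_ACTIVE   # leakage power = current * supply
p_drowsy = i_drowsy * VDD_DROWSY

print(f"active: I = {i_active:.3e} A, P = {p_active:.3e} W")
print(f"drowsy: I = {i_drowsy:.3e} A, P = {p_drowsy:.3e} W")
```

Because leakage power is the product of the leakage current and the supply voltage, the power falls faster than the current when the supply is lowered in this simple model.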

A proposed model of a drowsy SRAM cell is illustrated in Fig. 2.7. This model features a mechanism that controls the supply voltage.

The cell has two operating modes: active and standby. Depending on the mode, the memory cell's supply voltage is controlled by two pMOS transistors, P1 and P2. In active mode, P1 supplies $(V_{DD})$; in standby mode, P2 supplies $(V_{DDLow})$. Complementary supply voltage control signals drive the two pMOS transistors. The SRAM cell cannot be accessed in standby mode, because the bit line has a higher voltage than the core voltage of the storage cell, and this difference could destroy the cell state.



FIGURE 2.7: Drowsy SRAM cell with the supply voltage control mechanism [15].

A combination of voltage scaling and body biasing is a promising method to lower the total power consumption of the memories [39], [40], [41].

In this section, the mechanisms that are required to lower the leakage current of the memory are discussed. These methodologies include dual-rail SRAM architecture, drowsy SRAM cells, and the body biasing technique.

Fig. 2.8 presents the overall architecture of the 16Kb array, consisting of 256 rows by 64 columns of SRAM bit-cells. It is divided into 16 local arrays, each with 16 rows. Each local array (see Fig. 2.9) includes 63 columns and 16 rows and the relevant attached word drivers (WD) and sense amplifiers (SA). As shown, the local array's voltage supply is driven by the sleep line (SL), which provides the required voltage to the columns according to the operational state. The WD and SA are supplied by a separate power line.

The combination of the hierarchical, dual-rail, and drowsy structures could reduce the overall leakage and consequently the power consumption; however, to reduce the leakage current further, the body biasing technique is introduced in the literature. The body biasing technique and the generation of the required voltages are discussed in the next chapter.
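The array organization described above can be checked with a few lines of arithmetic. This is a minimal sketch: the row, column, and local-array counts come from the text, while the mapping of a global row index to a local array (contiguous blocks of rows) is an assumption made only for illustration.

```python
ROWS = 256          # total rows in the array (from the text)
COLS = 64           # total columns in the array (from the text)
LOCAL_ARRAYS = 16   # number of local arrays (from the text)

ROWS_PER_LOCAL = ROWS // LOCAL_ARRAYS  # rows held by each local array


def total_bits():
    """Total number of bit-cells in the array."""
    return ROWS * COLS


def locate_row(row):
    """Map a global row index to (local array index, row inside that array).

    Assumes each local array holds a contiguous block of rows
    (hypothetical mapping, not stated in the thesis).
    """
    if not 0 <= row < ROWS:
        raise ValueError("row index out of range")
    return divmod(row, ROWS_PER_LOCAL)


print(total_bits() // 1024, "Kb")   # 16 Kb
print(ROWS_PER_LOCAL)               # 16
print(locate_row(37))               # (2, 5)
```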



FIGURE 2.8: 16Kb array, consisting of 256 rows by 64 columns, with 16 divided local arrays.

# 2.5 Conclusion

In this chapter, we have surveyed SRAM power reduction techniques, emphasizing bit line swing reduction techniques and leakage current reduction techniques. First, we motivated our discussion with small-swing SRAM arrangements.

We have reviewed the charge recycling method at the architectural level to reduce the bit line swing to 1/N, where N is the number of bit lines sharing their charge.



FIGURE 2.9: A local array including 63 columns and 16 rows and the relevant attached word drivers (WD) and sense amplifiers (SA).

We have also explained how the hierarchical technique could reduce the write power by reducing the bit line capacitance.

From the leakage current reduction point of view, two methods were discussed: the dual-rail SRAM configuration and drowsy SRAM cells.

The main objective of this research is to create an SRAM building block that reduces the static power consumption of the memory by means of leakage current reduction techniques. Moreover, as technology shrinks, the leakage current of the cells increases significantly. Therefore, applying only one of the discussed techniques would not achieve the expected leakage reduction. As a consequence, a combination of these methodologies is proposed to obtain a competitive design.

As reverse body biasing is one of the methods applied to lower the leakage current of the cells, a negative voltage is required to bias the body of the NMOS transistors. Thus, to make the methodology applicable, a negative voltage generator attached to the SRAM building block is also proposed.

# Chapter 3

# Proposed Leakage Current Reduction Technique

# 3.1 Introduction

Increasing demand for low-power applications continues to drive the need for large and high-speed embedded static random-access memory (SRAM) to improve system performance. These applications are also required to reduce dynamic and standby power consumption to satisfy the stringent battery-life requirement.

Furthermore, CMOS scaling beyond 100nm technology requires two criteria: (1) very low threshold voltages to maintain the device switching speeds, and (2) ultra-thin gate oxides to preserve the current drive ability and keep the threshold voltage changes under control when dealing with short-channel effects.

Low threshold voltage results in an exponential rise in the sub-threshold leakage current, while ultra-thin oxide leads to an exponential growth in the gate leakage current. Consequently, although the leakage power in a high-performance CPU is dominated by the CPU core, the large on-die SRAMs in low-power application processors are the dominant source of standby leakage current [42]. The leakage power dissipation is also almost proportional to the chip area. As caches occupy roughly 50% of the chip area in many processors [43], the caches' leakage power is one of the major contributors to power consumption in high-performance microprocessors.

As a consequence, the sub-threshold leakage increases considerably as technology scales. For this reason, lowering the leakage power consumption without compromising speed is important in VLSI design. With the advent of systems-on-chip (SoC), the design of fast and power-efficient SRAM structures has become increasingly crucial.

One of the most widely used approaches to reduce the total power consumption of SRAM memories is dynamic voltage scaling, which dynamically scales the supply voltage and clock frequency [44]. However, this methodology is ineffective when the sub-threshold leakage current is significant due to shrinking of the supply voltage.

Moreover, low voltage operation (below 400mV) has been successfully demonstrated in real silicon measurements [45]. However, operating memory cells at such a low voltage is more challenging because the preserved data degrades considerably at these voltages. In the sub-threshold region, the conventional 6T-SRAM shows poor read stability and weak writability [46]. As read stability and writability have conflicting design requirements, it is extremely difficult to operate the 6T-SRAM in the sub-threshold region [47].

A more promising technique to reduce the power consumption of the memory cells is body biasing. This approach is used here, and the proposed circuit benefits from it.

The body biasing technique can exponentially reduce the leakage current by adjusting the threshold voltage $(V_{th})$ of the transistors [48], [49]. However, there are restrictions on the maximum threshold voltage: if it is increased significantly, the transistors would not be able to stay in the sub-threshold region and the preserved data would be destroyed. As a result, the threshold voltage must be low enough to ensure proper read/write operation and high enough (through reverse body biasing of the transistors) to reduce the leakage current in the hold state.

One of the major challenges in this technique is the required negative voltage for n-type transistors to bias their body in the reverse direction.

Charge pumps are introduced in the literature to produce such a negative voltage. Charge pumps belong to the group of inductor-less DC-DC converters. Although they are commercial building blocks, charge pumps are complicated switched-capacitor circuits, which need a sophisticated switching block to ensure proper operation.

As an example, Fig. 3.1 illustrates the simplified block diagram of a power management integrated circuit (PMIC) [50], which includes the charge pump. The input voltage, $V_{\rm IN}$, is achieved when N stages of charge pumping are completed. This charge pump is equipped with a clock-generation block to drive the required pulses.



FIGURE 3.1: Simplified block diagram of a charge pump (CP)-based energy harvesting power management integrated circuit (PMIC) [50].

Level shifters are another topic of interest in the VLSI area; their main objective is to convert signals between voltage levels where multiple supply voltages are required. The level shifters proposed so far were designed to generate a higher (positive) voltage level than a reference voltage. The major goal of this study, however, is to use level shifters to generate a negative voltage for reverse body biasing of the NMOS transistors.

The rest of this chapter will briefly discuss the body-biasing concept and two current bias generators: charge pump and level shifter.

# 3.2 Body-Bias Technique

Digital design styles that utilize body biasing require a detailed understanding of body-biased circuit behaviour, because body biasing affects the physical MOS transistor parameters and thereby circuit performance and power consumption. In this section, the impact of body biasing on leakage and power consumption is modelled.

The transistor's threshold voltage is defined as the gate voltage at which an inversion layer forms at the interface between the insulating layer (oxide) and the transistor's body. Body biasing influences $V_{th}$ by changing the transistor's body voltage with respect to its source.

Under body bias conditions, the $V_{th}$ of an NMOS transistor is approximated by the Shichman-Hodges model. This model is related to the formation of a conducting inversion layer at the source terminal and is defined as follows [51]:

$$V_{th} = V_{th0} + \gamma \left(\sqrt{V_{SB} + 2\phi_F} - \sqrt{2\phi_F}\right) \tag{3.1}$$

where $V_{th0}$ is the threshold voltage without body bias applied, $\gamma$ is the body effect coefficient, $2\phi_F$ is the surface potential in strong inversion, and $V_{SB}$ is the source-to-body voltage ($V_{SB} > 0$ for RBB, and $V_{SB} < 0$ for FBB). A similar expression holds for the PMOS transistor.
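Eq. (3.1) can be evaluated directly to see how reverse and forward body bias shift the threshold voltage. The following is a minimal sketch: the values of $V_{th0}$, $\gamma$, and $\phi_F$ are hypothetical and are not the 65nm process parameters used in the simulations of this chapter.

```python
import math


def vth_body_bias(v_sb, vth0, gamma, phi_f):
    """Eq. (3.1): Vth = Vth0 + gamma * (sqrt(V_SB + 2*phi_F) - sqrt(2*phi_F)).

    v_sb > 0 corresponds to reverse body bias (RBB),
    v_sb < 0 to forward body bias (FBB).
    """
    arg = v_sb + 2.0 * phi_f
    if arg < 0.0:
        raise ValueError("V_SB + 2*phi_F must be non-negative")
    return vth0 + gamma * (math.sqrt(arg) - math.sqrt(2.0 * phi_f))


# Hypothetical device parameters (assumptions for illustration).
VTH0 = 0.40    # zero-bias threshold voltage, in volts
GAMMA = 0.30   # body effect coefficient, in V^0.5
PHI_F = 0.40   # Fermi potential, in volts (2*phi_F = 0.8 V)

vth_zero = vth_body_bias(0.0, VTH0, GAMMA, PHI_F)
vth_rbb = vth_body_bias(1.2, VTH0, GAMMA, PHI_F)    # reverse body bias
vth_fbb = vth_body_bias(-0.5, VTH0, GAMMA, PHI_F)   # forward body bias

print(f"Vth (no bias) = {vth_zero:.3f} V")
print(f"Vth (RBB)     = {vth_rbb:.3f} V")
print(f"Vth (FBB)     = {vth_fbb:.3f} V")
```

With these assumed parameters the threshold rises under RBB and falls under FBB, which is the qualitative behaviour the model predicts.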

Forward body bias (FBB) reduces the  $V_{th}$ , and contrarily it is increased with reverse body biasing (RBB). These features are shown in Fig. 3.2 and Fig. 3.3 which express  $V_{th}$  as a function of body bias voltage for an NMOS and PMOS transistor in 65nm standard-process. These results were obtained for a W/L =120nm/65nm NMOS under room temperature and 1.2V power supply.

The simulations, conducted in three corners, namely fast-fast (ff), typical-typical (tt), and slow-slow (ss), verify that the value of $V_{th}$ and its sensitivity to body biasing depend strongly on the process variations. For the nominal NMOS device, body biasing from 0.5V (FBB) down to -1.2V (RBB) spans a $V_{th}$ range of about 255mV. This range is somewhat smaller for PMOS devices (237mV).



FIGURE 3.2: NMOS  $V_{th}$  versus body biasing in 65nm.



FIGURE 3.3: PMOS  $V_{th}$  versus body biasing in 65nm.

### 3.2.1 Leakage Current

Body biasing is related to the sub-threshold leakage, gate-induced drain leakage (GIDL), and junction leakage in an exponential manner, as follows [51]:

$$leakage_{norm} = \begin{cases} e^{l_1 V_{BB}} + l_2(e^{l_3 V_{BB}} - 1) & \text{for } V_{BB} \ge 0\\ e^{l_1 V_{BB}} + l_4(e^{l_5 V_{BB}} - 1) & \text{for } V_{BB} < 0 \end{cases} \tag{3.2}$$

where $l_1$, $l_2$, $l_3$, $l_4$, and $l_5$ are polynomial coefficients. These coefficients depend on the process, VDD, $V_{th}$, and temperature, which leads to different coefficients for each type of digital gate. The sample values $l_1 = 2.26$, $l_2 = 0.08$, $l_3 = 10.27$, $l_4 = 0.05$, and $l_5 = -0.93$ were used in [51].

The first term of the expression captures the dependency of the sub-threshold leakage on body biasing. The second term represents either the junction leakage under forward body bias, or the junction leakage and GIDL under reverse body bias.
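Eq. (3.2) can be evaluated with the sample coefficients quoted from [51]. The sketch below is illustrative only: it assumes that $V_{BB} \ge 0$ corresponds to forward body bias and $V_{BB} < 0$ to reverse body bias, and the bias points chosen are arbitrary.

```python
import math

# Sample coefficients quoted in the text from [51].
L1, L2, L3, L4, L5 = 2.26, 0.08, 10.27, 0.05, -0.93


def leakage_norm(v_bb):
    """Normalized leakage of Eq. (3.2) as a function of body bias v_bb (volts)."""
    subthreshold = math.exp(L1 * v_bb)             # first term: sub-threshold leakage
    if v_bb >= 0.0:
        # Assumed forward body bias branch: junction leakage term.
        return subthreshold + L2 * (math.exp(L3 * v_bb) - 1.0)
    # Assumed reverse body bias branch: junction leakage and GIDL term.
    return subthreshold + L4 * (math.exp(L5 * v_bb) - 1.0)


for v_bb in (-1.0, -0.5, 0.0, 0.25, 0.5):   # arbitrary bias points
    print(f"V_BB = {v_bb:+.2f} V -> normalized leakage = {leakage_norm(v_bb):.3f}")
```

At zero bias both branches evaluate to 1, so the expression is normalized to the unbiased leakage; in the negative-bias branch the second term grows as the bias becomes more negative, which limits the benefit of very strong reverse bias in this model.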

Fig. 3.4 represents the leakage current in conjunction with body biasing. This result was obtained for a PMOS device with W/L = 120nm/65nm at room temperature and a 1.2V power supply. The simulations verify that the normalized leakage reduces significantly as the body voltage increases from -1 to 1V. For the nominal PMOS device, body biasing from -1V (FBB) up to 1V (RBB) spans a normalized leakage range of about 1,000.



FIGURE 3.4: PMOS normalized leakage versus body biasing in 65nm.

### 3.2.2 Power Consumption Model

The total power consumption of a digital gate can be modeled by the summation of dynamic and leakage power consumption [51]:

$$P_{gate} = P_{dyn} + P_{leak} = a(xC_{intr} + C_{extr})V_{DD}^2 f_{ck} + I_{leak}V_{DD} \tag{3.3}$$

where a is the switching activity of the gate, which is the average number of transitions $(0 \rightarrow 1 \text{ or } 1 \rightarrow 0)$ a signal makes per unit of time; $f_{ck}$ is the operating frequency $(= 1/T_{ck})$; $C_{intr}$ is the intrinsic capacitance of the gate, which is body bias dependent; and $I_{leak}$ is the leakage current of the gate, which depends on VDD, $V_{th}$, and $V_{BB}$.

Fig. 3.5 demonstrates the power consumption versus body biasing. This result was obtained for a PMOS device with W/L = 120nm/65nm at room temperature and a 1.2V power supply. The simulations verify that the normalized power reduces significantly as the body voltage increases from -1 to 1V. For the nominal PMOS device, body biasing from -1V (FBB) up to 1V (RBB) spans a normalized power range of about 550.
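The two contributions in Eq. (3.3) can be separated in a short calculation. This is a minimal sketch with hypothetical values; the factor x appears in Eq. (3.3) but is not defined in the text, so it is treated here simply as a multiplier on the intrinsic capacitance.

```python
def gate_power(a, x, c_intr, c_extr, vdd, f_ck, i_leak):
    """Eq. (3.3): return (dynamic power, leakage power) of a gate in watts."""
    p_dyn = a * (x * c_intr + c_extr) * vdd ** 2 * f_ck
    p_leak = i_leak * vdd
    return p_dyn, p_leak


# Hypothetical operating point (assumptions for illustration).
A = 0.1           # switching activity
X = 1.0           # multiplier on C_intr (not defined in the text)
C_INTR = 1e-15    # intrinsic capacitance, in farads
C_EXTR = 2e-15    # extrinsic capacitance, in farads
VDD = 1.2         # supply voltage, in volts
F_CK = 100e6      # clock frequency, in hertz
I_LEAK = 1e-9     # leakage current, in amperes

p_dyn, p_leak = gate_power(A, X, C_INTR, C_EXTR, VDD, F_CK, I_LEAK)
p_gate = p_dyn + p_leak

print(f"P_dyn  = {p_dyn:.3e} W")
print(f"P_leak = {p_leak:.3e} W")
print(f"P_gate = {p_gate:.3e} W")
```

Setting the switching activity to zero leaves only the leakage term, which is the standby situation that body biasing targets.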



FIGURE 3.5: PMOS normalized power versus body biasing in 65nm.

# 3.3 Bias Generator

To generate the required negative voltage for reverse body biasing of n-type transistors, there are two main solutions: charge pumps and level shifters.

Charge pumps (CPs) [52]-[59] are power converters that convert the supply voltage to higher or lower constant (DC) levels. Through these building blocks, the charge packets are transferred from the power supply to the output terminal by only switches and capacitors to generate the target voltage.

Charge pumps were conventionally used in nonvolatile memories and SRAMs, in which the design was driven by settling time, or RF antenna switch controllers and LCD drivers [52]. They are also widely used to adapt the voltage levels between two or more functional blocks and to convey the electric energy extracted from surrounding environment towards a stronger buffer.

The first and most critical challenge in charge pump design is power efficiency; charge pumps with low power efficiency restrict the benefit of on-chip power conversion. It is desirable to increase charge pump efficiency both in battery-powered systems and in many applications with common supply voltages, in order to reduce the packaging cost of integrated circuits caused by heat dissipation.

Furthermore, a charge pump usually needs an inverted switching signal to control both clock phases. Therefore, an oscillator is usually attached to the charge pump to provide two out-of-phase signals.

Several attempts on charge pump circuits are presented in the literature [53]-[59]. In many of them, the authors focused on optimization design methodologies. Table 3.1 compares some state-of-the-art designs in terms of clock frequency and maximum output current/power. The observations show that although charge pumps are efficient, they consume current/power on the order of microamperes/microwatts. This amount of current/power consumption is very high for memory applications. Thus, in order to generate the negative DC voltage required to bias the body of the NMOS transistors, this study introduces the level shifter concept, which is discussed in the next section.

| Ref. | Year | Technology (nm) | Area ($mm^2$) | Clock Freq. (MHz) | Maximum $I_{out}$ (A) / Power (W) |
|------|------|-----------------|---------------|-------------------|-----------------------------------|
| [53] | 2021 | 130             | 0.56          | 20-100            | 120mA                             |
| [54] | 2020 | 28              | 0.116         | 1                 | $0.68 \mu A$                      |
| [55] | 2019 | 180             | 0.96          | 1                 | $89\mu W$                         |
| [56] | 2018 | 350             | 0.56          | 1                 | $40\mu A$                         |
| [57] | 2018 | 65              | 0.032         | 15.2              | $1.5 \mu W$                       |
| [58] | 2017 | 65              | 1.32          | 10                | $6.6 \mu W$                       |
| [59] | 2016 | 130             | 0.6           | 0.04              | $15\mu W$                         |

TABLE 3.1: Charge pump performance comparison

# 3.4 Level Shifter

In the past few decades, one of the most efficient methods to achieve a significant reduction in the digital circuits' power consumption is to reduce their supply voltage [60]. As the complexity of systems on chips (SoCs) increases, circuits operating in the sub-threshold region have attracted more attention to be utilized in low-power applications, such as wireless sensor networks, miniature healthcare devices, and environmental monitoring systems. However, the main problem is that lowering the power supply decreases circuit speed [61].

One effective solution to this problem is to apply multiple supply voltages (MSV) to feed different parts of the system. In this methodology, higher voltages are employed for the circuits that require higher speed, and lower voltages for the circuits that tolerate lower speed [62]. Therefore, level shifters (LS) that provide wide voltage conversion ranges to convert signals between different voltage domains were introduced [62].

The main challenge associated with MSV is the minimization of delay and power for level conversion between different voltage levels [63]. Furthermore, the increase in the number of power lines makes this issue particularly critical [64]. Thus, the design of an LS plays a key role in the overall system performance.

Several level shifter structures were recently proposed to enable voltage conversion from the deep sub-threshold region up to the nominal supply voltage level. One group of LS circuits is based on the current-mirror (CM) configuration; these conventional CM-based level shifters convert deep sub-threshold voltages to higher voltage levels. This is possible because the high drain-to-source voltage of the PMOS transistors facilitates the construction of a stable current mirror, which presents an effective on-off current comparison at the output node. However, these structures suffer from high static current consumption, which results in high standby power [65],[66],[67].

Alternatively, a conventional cross-coupled (CC) LS uses differential cascode voltage switch logic (DCVSL) to raise a low voltage level. Two factors have a significant impact on the operating range of CC LSs: the transistors' threshold voltage and their size. However, this operating range is difficult to extend to the sub-threshold region (with respect to the NMOS threshold voltage). Furthermore, for converting a sub-threshold voltage, CC-based LSs require an exponential increase in NMOS transistor size, which is not area efficient [68].



FIGURE 3.6: Conventional DCVS level converter [67].



FIGURE 3.7: Conventional DCVS level converter [67].

To address these problems, several LS circuits have been reported in the literature [69], [70], [71]. The LS proposed in [69] applied the Wilson current mirror scheme. The main advantage of this circuit is fast operation; however, it comes at the expense of large standby power consumption. To lower the static power, some modified Wilson current mirror-based LS configurations were recently presented. However, their static power consumption remains considerable.

The author of [72] proposed a self-controlled current limiter that detects the output error to reduce power. This achieves relatively robust operation; however, the design increases power consumption and instantaneous short-circuit power, mainly at the fast corner and high temperature. The work in [73] removes the input inverter to decrease the falling edge delay. This design can also cut off the static power; however, it is vulnerable to noisy or leaky environments. The falling edge delay is also reduced in [61] by employing a level-shift capacitor that is repeatedly charged to the voltage difference of VDDH and VDDL.

### 3.4.1 Design Challenges

Level shifters (LS) provide the wide voltage-conversion ranges that are needed for blocks operating at different supply voltages to communicate with each other. The propagation delay and the power consumption are two important characteristics of an LS.

As the number of voltage domains in an SoC increases, the number of required LSs increases, and so do the delay and the power consumption associated with these level shifters. Thus, these two parameters play a key role in the overall system performance.

For multiple supply voltage designs, level shifters (LSs) are indispensable and are ubiquitously inserted between different voltage domains, or used directly to drive highly capacitive loads. In view of this, LSs are preferred to operate over a wide dynamic range, including sub-threshold input scenarios. Unfortunately, the conventional LS design based on the Differential Cascode Voltage Switch (DCVS) [74] topology struggles to provide robust up-conversion from sub-threshold to super-threshold levels, due to the significant current contention caused by the limited driving strength of the pull-down devices operated in the sub-threshold region. Generally, when the input signal scales below 500mV, the contention leads to conversion failure.

Table 3.2 compares the performance of the most recently published level shifters. The observations verify that level shifters can produce the required output voltages while keeping the static power consumption on the order of nanowatts, compared with microwatts in charge pump circuits. Another benefit of level shifters over charge pumps is that they occupy less chip area.

| Ref. | Year | Technology (nm) | Area ($\mu m^2$) | Clock Freq. (MHz) | Static Power (W) | Delay (ns)     |
|------|------|-----------------|------------------|-------------------|------------------|----------------|
| [75] | 2020 | 180             | 182.46           | -                 | 1.26n @1.8V      | 22.47 @1.8V    |
| [76] | 2020 | 180             | 81.8             | -                 | 1.33n @0.4V      | 7.6 @0.4-1.8V  |
| [77] | 2020 | 130             | 80.69            | 1                 | 9.87n @0.3V      | 21.98 @0.3V    |
| [78] | 2020 | 65              | 6.9              | 1                 | 2.66n @0.3V      | 26.75 @0.2-1V  |
| [79] | 2018 | 65              | 51.9             | -                 | 1.35m            | 0.45           |
| [73] | 2018 | 65              | 7.45             | -                 | 2.64n @0.3V      | 7.5 @0.3V      |
| [80] | 2017 | 65              | 54.73            | -                 | $165.77\mu$      | 0.35           |
TABLE 3.2: Level shifters' performance comparison

All of these level shifters were proposed to convert between positive voltages; however, the main objective of this study is to utilize this concept for producing negative voltages, which has its own advantages and disadvantages.

The rest of this chapter is organized as follows. Section 3.4.2 describes the proposed level shifter. Section 3.5 then describes the basic concept of negative level shifting. Sections 3.6 and 3.7 discuss the proposed negative level shifters.

### 3.4.2 Proposed Level Shifter Design

The main objective of the level shifters is to extend the amplitude of the positive input voltages, and in this sub-section a novel level shifter is proposed. Fig. 3.8 demonstrates the proposed LS structure, which is composed of an input inverter $(M_{n4} \text{ and } M_{p6})$, a current mirror $(M_{p3} \text{ and } M_{p4})$, a cross-coupled pair $(M_{p1} \text{ and } M_{p2})$, a diode-connected current limiter $(M_{n3})$, and an output inverter $(M_{n5} \text{ and } M_{p5})$.



FIGURE 3.8: Proposed level shifter.

The input inverter transistors, $M_{n4}$ and $M_{p6}$, provide the differential low-voltage signals, and the output inverter ($M_{n5}$ and $M_{p5}$) is designed to ensure adequate output driving strength. Moreover, the combination of the current mirror and cross-coupled configurations creates a cascading effect, which lowers the drain-source voltage drop across the transistors and in turn decreases the leakage current.

In the proposed design, when the signal IN goes from high to low, $M_{n1}$ is turned off, while the differential signal INL provided by the input inverter turns on transistor $M_{n2}$. With $M_{n2}$ on, the voltage at node X hovers above zero, which turns $M_{p5}$ on. Finally, the VDDH voltage level is transferred to the output node. The complementary procedure happens when signal IN goes high, which results in a low output voltage.

In sub-threshold region, an increase in the threshold voltage exponentially decreases the operating current. The low threshold  $(lV_{th})$  device improves the circuit performance in sub-threshold region, but it contributes to the leakage power [84]. Therefore, the  $lV_{th}$  devices are only employed to improve performance.

The key strength of the proposed design is that, rather than utilizing multi-threshold techniques to improve performance in the sub-threshold region as in [68, 71, 77], the body biasing technique is employed. The main advantage is that the threshold voltage can be controlled without using different devices in the design.

In the proposed scheme, the body terminals of all PMOSs, except $M_{p5}$ and $M_{p6}$, are tied to the highest voltage level (VDDH), while the bodies of all NMOSs are grounded. $M_{p6}$ is required to have a low threshold voltage to enhance the performance in the sub-threshold region. In contrast, the pull-up $M_{p5}$ is given a high threshold voltage $(hV_{th})$ to reduce the leakage current. In this context, the threshold voltage of $M_{p5}$ is raised by applying an external 1.8V voltage.

The body voltage of $M_{p6}$ should be less than the minimum VDDL. This body voltage is generated by a voltage divider composed of two diode-connected MOSs, $M_{n7}$ and $M_{n8}$, as a load and an extra NMOS, $M_{n6}$, which divides VDDL down to a lower voltage. The output node, $B_n$, supplies the body voltage of $M_{p6}$. The main advantage is that this body voltage tracks changes in VDDL; hence, the threshold voltage of $M_{p6}$ and the overall performance of the circuit adjust when VDDL changes. Fig. 3.9 illustrates the bulk current of $M_{p5}$ and $M_{p6}$ when VDDL varies between 0.4 and 1V at different PVT corners. As this figure shows, the $M_{p5}$ bulk current saturates at the order of atto-amperes across the corners and can be neglected. The $M_{p6}$ bulk current, however, increases to almost 200pA at the fast-fast corner with VDDL of 0.6V, and rises exponentially to almost 120nA at VDDL of 1V. These results validate that the proposed circuit effectively keeps the leakage current low while VDDL is small.

Along with the mentioned proposal, a novel topological strategy is also employed in this design. The proposed circuit combines the cross-coupled and current-mirror LS structures. Through this combination, the voltage drop across the MOS transistors placed between VDDH and ground is decreased. Therefore, the leakage current and, in turn, the static power consumption are lowered. A current limiter, $M_{n3}$, is also employed to decrease the leakage current further. The optimal transistor sizes are listed in Table 3.3.

FIGURE 3.9: Bulk current of $M_{p5}$ and $M_{p6}$ as VDDL varies between 0.4 and 1V at different PVT corners.

TABLE 3.3: Transistor Sizes

| Transistor          | $W/L(\mu m)$ | Transistor        | $W/L(\mu m)$ |
|---------------------|--------------|-------------------|--------------|
| $M_{n1}$ - $M_{n3}$ | 0.45/0.18    | $M_{p1} - M_{p3}$ | 0.45/0.18    |
| $M_{n4}$            | 0.45/0.18    | $M_{p4}$          | 0.85/0.18    |
| $M_{n5}$            | 0.45/0.18    | $M_{p5}$          | 0.45/0.18    |
| $M_{n6}$ - $M_{n8}$ | 0.45/2       | $M_{p6}$          | 1/0.18       |

### 3.4.2.1 Performance Evaluation

The circuit is implemented using 180nm CMOS technology. Simulations were performed for three process temperature (PT) corners $(-20^{\circ}C, 27^{\circ}C \text{ and } 100^{\circ}C)$, considering an input signal frequency of 1MHz, with VDDH fixed at 1.8V. Three important parameters for the proper operation of LSs are propagation delay, power consumption, and energy per transition, which are discussed in this section.

• Propagation-Delay Assessment

The propagation delay of a logic gate is the time difference between the application of an input transition and the resulting output transition, measured at the 50% points of the input and output transitions. The average of the rise-time and fall-time delays is reported as the propagation delay.
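The 50% measurement described above could be sketched as follows for sampled waveforms. This is an illustrative sketch, not the measurement script used in this work: the helper names, the linear interpolation between samples, and the synthetic waveforms are assumptions, and each waveform is assumed to contain a single transition.

```python
def crossing_time(times, values, level, rising):
    """Time at which a sampled waveform first crosses `level`.

    Linear interpolation is applied between the two samples that bracket
    the crossing. `rising` selects the crossing direction.
    """
    for i in range(1, len(times)):
        v0, v1 = values[i - 1], values[i]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            frac = (level - v0) / (v1 - v0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("waveform never crosses the requested level")


def edge_delay(times, vin, vout, vddl, vddh, input_rising):
    """Delay between the 50% points of one input edge and the output edge.

    The output is assumed to switch in the opposite direction to the input
    (IN falling -> OUT rising to VDDH), as described for this level shifter.
    """
    t_in = crossing_time(times, vin, 0.5 * vddl, input_rising)
    t_out = crossing_time(times, vout, 0.5 * vddh, not input_rising)
    return t_out - t_in


def propagation_delay(delay_rise, delay_fall):
    """Average of the rising-output and falling-output delays."""
    return 0.5 * (delay_rise + delay_fall)


if __name__ == "__main__":
    # Synthetic piecewise-linear waveforms (time in ns), for illustration only.
    t = [0.0, 10.0, 12.0, 30.0, 40.0]
    vin = [0.4, 0.4, 0.0, 0.0, 0.0]    # IN falls from VDDL = 0.4 V
    vout = [0.0, 0.0, 0.0, 1.8, 1.8]   # OUT rises to VDDH = 1.8 V
    print(edge_delay(t, vin, vout, vddl=0.4, vddh=1.8, input_rising=False))
```

Note that the input threshold is taken at half of VDDL and the output threshold at half of VDDH, since the two signals swing between different supply levels.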

Fig. 3.10 shows the signal transition of the proposed negative level shifter circuit, where the circuit can operate under a typical corner and temperature of  $27^{\circ}C$  at 1MHz.



FIGURE 3.10: Transient Results of the Proposed Negative Level Shifter.

Fig. 3.11 shows the delay variation of the proposed LS with VDDL across five corners and three temperature conditions $(-20^{\circ}C, 27^{\circ}C \text{ and } 100^{\circ}C)$. As expected, when VDDL increases, the delay decreases significantly. Furthermore, when the temperature rises from $-20^{\circ}C$ to $100^{\circ}C$, the delay in the typical corner decreases from 15.5ns to 7ns at VDDL=0.4V. The delay at $27^{\circ}C$ for the typical-typical corner also falls below 570ps as VDDL increases from 0.4V to 0.9V. The worst delay was 160ns under the slow-fast corner at $-20^{\circ}C$, whereas the best delay (350ps) occurred under the fast-fast corner at $100^{\circ}C$.

• Power Consumption Assessment



FIGURE 3.11: Delay versus VDDL variations for five corners at temperatures of  $-20^{\circ}C$ ,  $27^{\circ}C$  and  $100^{\circ}C$ .

Fig. 3.12 exhibits the relationship between the power consumption of the proposed LS and VDDL. The results illustrate that the typical static power at $27^{\circ}C$, evaluated for VDDH = 1.8V and VDDL between 0.4 and 0.9V, saturates in the picowatt range (70pW for a 0.4V VDDL). Our proposed LS also achieves a minimum power of 58pW when VDDL is 0.9V under the slow-slow corner at $-20^{\circ}C$, and exhibits a maximum power of 374nW when VDDL is 0.9V under the fast-fast corner at $100^{\circ}C$.



FIGURE 3.12: Static power versus VDDL variations for five corners at temperatures of  $-20^{\circ}C$ ,  $27^{\circ}C$  and  $100^{\circ}C$ .

To investigate the robustness of the proposed level shifter, a 5000-point Monte Carlo simulation (see Fig. 3.13) was performed under typical-typical corner at VDDL=0.4V.

In the case of delay, the highest number of samples occurred at 10ns, as expected. With regard to static power, the largest number of samples occurred at 75pW, which is close to the static power reported in Section 3.4.2.1 (70pW at VDDL=0.4V under the typical corner at $27^{\circ}C$).



FIGURE 3.13: Histogram of delay and static power consumption of the proposed LS.

• Thermal Assessment

Fig. 3.14 shows how the static power changes in different corners as the temperature varies from $-20^{\circ}C$ to $100^{\circ}C$. The thermal variation of the fast-slow and fast-fast corners is significant compared with the three other corners. The power at the fast-slow and fast-fast corners increases to 7nW and 9.66nW, respectively. The static power also increases by 1.4nW in the typical-typical corner. However, the thermal changes at the slow-fast and slow-slow corners are on the order of picowatts: 743pW and 433pW, respectively.



FIGURE 3.14: Static power versus temperature.

### 3.4.2.2 Comparison

Fig. 3.15 compares the delay, energy per transition, and total power of the proposed LS with the results obtained in [64], [82] and [83]. The proposed LS topology has a lower propagation delay at a VDDL of 0.45V than [64], [82] and [83]. The LS in [64] also has a delay of 20ns at 0.4V, which is twice the delay reported in this study under the same condition. This supports the higher speed of our LS.



FIGURE 3.15: Delay-energy characteristics comparison.

In terms of energy per transition, our proposed LS also performs better than the two others at low VDDL. As shown, it consumes 26 fJ at VDDL=0.4V, whereas the circuits proposed in [64] and [82] need energy on the order of hundreds of femtojoules (180 fJ and 200 fJ, respectively). Our proposed LS also performs better than the design in [83], consuming on average almost 3 times less energy.

Table 3.4 compares the proposed circuit with several state-of-the-art ultra-low-voltage level shifters. To extend our comparison analysis, six recent proposals designed with the same technology were considered.

Among the level shifters realized in the 180nm technology, the SCMLS design proposed in [84] exhibits the lowest static power consumption (31.5pW) at VDDL=0.4V. However, this performance is achieved at the cost of a higher energy than ours (77.1fJ/T compared with 26fJ/T), which is almost 3 times our reported energy.

The MCMLS, DDSLS and cross-coupled designs in [84] also consume less static power than our proposal; however, all three require more energy per transition than our LS (42.9fJ, 130.9fJ, and 81.7fJ, respectively, compared with 26fJ). The same holds for the LS in [64]: it consumes less static power but requires higher energy ($\times 6.65$) than the LS proposed in this study.

In terms of propagation delay, the level shifters proposed in [84] and [83] exhibit faster operation; however, this is obtained at the cost of higher energy and higher static power, respectively, than what is proposed in this work.

The level shifter proposed in [85] has $\approx 2.85$ times more static power and 3 times more delay than our proposed design. However, it requires 5.5 times less energy per transition.

|                          | [61]  | [85] | [82]  | [64]  | [72] | [84] MCMLS | [84] SCMLS | [84] DDSLS | [84] Cross-Coupled | [83]  | This Work |
|--------------------------|-------|------|-------|-------|------|------------|------------|------------|--------------------|-------|-----------|
| Year                     | 2017  | 2017 | 2017  | 2017  | 2018 | 2021       | 2021       | 2021       | 2021               | 2021  | 2021      |
| Technology $(nm)$        | 180   | 180  | 180   | 180   | 180  | 180        | 180        | 180        | 180                | 180   | 180       |
| Results                  | chip  | sim  | chip  | chip  | chip | sim        | sim        | sim        | sim                | sim   | sim       |
| min VDDL (V)             | 330m  | 360m | 80m   | 200m  | 250m | -          | -          | -          | -                  | 240m  | 400m      |
| VDDH (V)                 | 1.8   | 1.8  | 1.8   | 1.8   | 1.8  | 1.8        | 1.8        | 1.8        | 1.8                | 1.8   | 1.8       |
| Operating frequency (Hz) | 500K  | 1M   | 10K   | 100K  | 140K | 100M       | 100M       | 100M       | 100M               | 1M    | 1M        |
| $P_{avg} (\mu W)$        | -     | -    | -     | -     | -    | 4.2        | 7.2        | 12.7       | 8.1                | -     | 2.6       |
| Energy (fJ/Transition)   | 61.5  | 4.7  | 240   | 173   |      | 42.9       | 77.1       | 130.9      | 81.7               | 38.39 | 26        |
| $P_{static}(W)$          | 330p  | 200p | 150p  | 55p   | 1.5n | 37.6p      | 31.5p      | 37.6p      | 33.5p              | 98.3p | 70p       |
| Delay $(s)$              | 29n   | 30n  | 21.4n | 31.7n | 180n | 553.4p     | 931.7p     | 575.2p     | 854.2p             | 6.43n | 10n       |
| Area $(\mu m^2)$         | 229.5 | -    | 95.6  | 108.8 |      | -          | -          | -          | -                  | -     | 99.875    |

TABLE 3.4: Performance summary and comparison with the state of the art.
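The ratios quoted in the comparison can be reproduced directly from the entries of Table 3.4. The short sketch below does this for three of the reference designs; the dictionary layout and function name are illustrative choices, and only values transcribed from the table are used.

```python
# Values transcribed from Table 3.4 (energy in fJ/transition,
# static power in pW, delay in ns).
THIS_WORK = {"energy_fj": 26.0, "static_pw": 70.0, "delay_ns": 10.0}

REFERENCES = {
    "[84] SCMLS": {"energy_fj": 77.1, "static_pw": 31.5, "delay_ns": 0.9317},
    "[64]": {"energy_fj": 173.0, "static_pw": 55.0, "delay_ns": 31.7},
    "[85]": {"energy_fj": 4.7, "static_pw": 200.0, "delay_ns": 30.0},
}


def ratios(reference, base=THIS_WORK):
    """Ratio of each reference metric to the corresponding metric of this work."""
    return {key: reference[key] / base[key] for key in base}


if __name__ == "__main__":
    for name, metrics in REFERENCES.items():
        r = ratios(metrics)
        print(
            f"{name}: energy x{r['energy_fj']:.2f}, "
            f"static power x{r['static_pw']:.2f}, delay x{r['delay_ns']:.2f}"
        )
```

A ratio above 1 means the reference design is worse than this work in that metric; a ratio below 1 (for example the energy of [85]) means it is better.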

### 3.5 Basic Concept of Negative Shifting

So far, a level shifter has been proposed. Before moving on to the proposed negative level shifter, the common-source amplifier (see Fig. 3.16) is reviewed. In this circuit, the DC node voltages vary between VDD and -VSS. Consequently, the node voltages vary between 0 and -VSS if the highest applied voltage is grounded. Through this simple methodology, negative DC voltages appear on the nodes, which is the concept behind the negative level shifter proposed in this study.



FIGURE 3.16: Common source amplifier.

Two negative level shifters are proposed in this research; they are discussed below.

### 3.6 Proposed Negative Level Shifter: Case One

A simplified model of the level shifter is shown in Fig. 3.17(a). It is made up of a current source and cascading transistors to ensure high output resistance. In this scheme, the output voltage can be written as follows:

$$V_{out} = V_{ds1} + V_{ds2} + V_{SS} \tag{3.4}$$

As  $V_{SS}$  is a negative voltage, this expression implies that  $V_{out}$  could be shifted down by  $V_{ds1} + V_{ds2}$ . The output voltage could also be expressed as

$$V_{out} = V_{ref} - V_{th1} \tag{3.5}$$

This expression also implies that  $V_{ref}$  can be shifted down by  $V_{th1}$ .

On the other hand, if  $V_{SS}$  is a negative voltage, then  $V_{ref}$  will be shifted towards the negative voltages. The schematic of the proposed Negative Level Shifter (NLS) is shown in Fig. 3.17(b). This circuit benefits from the properties of current scaling, and is able to achieve a low power consumption, as will be explained next.

The configuration shown in Fig. 3.17(b) is composed of a current source, $M_1$-$M_6$, and cascading transistors, $M_8$-$M_9$. Transistors $M_{10}$-$M_{14}$ are employed to compensate for the temperature dependency of the output. The voltage $V_{ref}$, which is the voltage to be shifted, is fed to this circuit through a voltage reference circuit, which is explained in detail in Section 3.6.3. To ensure low-power operation of the circuit, the transistors are biased in the sub-threshold region.



FIGURE 3.17: Proposed Negative Level Shifter. (a) Conceptual model (b) Schematic of the proposed negative level shifter.

In this design, $M_1$-$M_6$ generate the voltage required to keep $M_7$ and $M_9$ in the on-state. $M_1$-$M_6$ are wide-length transistors in order to reduce their leakage current and consequently lower the overall static power of the circuit. The $M_5/M_6$ pair also forms a current mirror to feed the required bias of $M_6$ and $M_9$.

$M_9$ always operates in the on-state because it is biased by the drain terminal of $M_5$, which is at a higher voltage than VDDL. The same holds for $M_8$ because of its positive gate bias.

When zero volts is applied to VDDL, zero volts is expected at the output node; as the absolute value of VDDL increases, the absolute output voltage increases as well. Fig. 3.18 demonstrates the output voltage variations with time. Clearly, when VDDL increases from 0.4V to 1.2V, the absolute output voltage grows by almost 0.7V.

Power and area are two primary measures for evaluating circuit operation. However, it is also vital to ensure that the proposed architecture is robust against temperature and mismatch effects. Therefore, a full analysis of these factors was performed, and the results are presented here.

FIGURE 3.18: Output voltage variations with time.

### 3.6.1 Temperature Compensation

In the proposed configuration, shown in Fig. 3.17, the overdrive voltage of the  $M_8$  and  $M_9$  is the dominating factor in determining how output node varies with changes in temperature.

To demonstrate this effect, simulations over temperature were performed. The results, presented in Fig. 3.19, show the changes in $V_{gs}$ and $V_{th}$ of transistors $M_8$ and $M_9$ with temperature over the range of $0 - 100^{\circ}C$. For transistor $M_8$, $V_{gs}$ and $V_{th}$ change at rates of $\approx 0.35mV/^{\circ}C$ and $\approx 0.77mV/^{\circ}C$, respectively. For transistor $M_9$, these rates are $3.94mV/^{\circ}C$ and $1.5mV/^{\circ}C$, respectively. Such drastic fluctuation in $V_{gs}$ and $V_{th}$, especially in the case of $M_9$, causes output-voltage fluctuations when the temperature changes. Therefore, a compensation scheme is required to decrease the temperature dependency of the output. Fig. 3.17(b) depicts the schematic of the temperature compensation circuitry.



FIGURE 3.19: Thermal behavior of  $V_{gs}$  and  $V_{th}$  for (a)  $M_8$  and (b)  $M_9$ .

The advantage of this circuit arrangement is that it improves the thermal slope of $V_{gs}$ and $V_{th}$. Consequently, this circuit is attached to the output node of the enhanced NLS (see Fig. 3.17(b)).

The output voltage with and without temperature compensation is shown in Fig. 3.20. Moreover, the cell ratios of the circuit, including the temperature-compensation transistors, are listed in Table 3.5.

TABLE 3.5: Transistors' size ratio of the proposed Negative Level Shifter.

| Transistor        | Size ratio      |
|-------------------|-----------------|
| $M_1 - M_4$       | $200nm/15\mu m$ |
| $M_5 - M_6$       | $200nm/10\mu m$ |
| $M_7$             | $200nm/15\mu m$ |
| $M_8 - M_9$       | 200nm/100nm     |
| $M_{10}$          | $200nm/2\mu m$  |
| $M_{11} - M_{14}$ | 200nm/60nm      |


FIGURE 3.20: Output voltage with and without temperature compensation.

### 3.6.2 Performance Evaluation

The circuit is implemented using the TSMC 65nm technology. The thermal behavior, delay, and static power consumption are evaluated in this section.

### 3.6.2.1 Thermal Assessment

As Fig. 3.21 shows, the thermal slopes of $V_{gs}$ and $V_{th}$ for $M_9$ in Fig. 3.19(b) are improved, while no change appears in these two voltages for $M_8$. For $M_9$, the thermal slope after compensation is $\approx 0.81 mV/^{\circ}C$ in $V_{gs}$ and $\approx 0.96 mV/^{\circ}C$ in $V_{th}$, which are improvements of $3.13 mV/^{\circ}C$ and $0.54 mV/^{\circ}C$, respectively.

Furthermore, to explore the sensitivity of $M_9$ to temperature, the $V_{ds}$ of $M_9$ was compared before and after compensation, as seen in Fig. 3.22(a). As can be observed, the thermal slope of $5.439mV/^{\circ}C$ before compensation decreases to $1.193mV/^{\circ}C$ after compensation.
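The improvements quoted for $M_9$ follow from subtracting the compensated slope from the uncompensated one. The sketch below repeats this arithmetic with the slope magnitudes given in the text; the relative reduction in percent is an additional derived figure that is not stated in the source.

```python
# Magnitudes of the thermal slopes of M9 quoted in the text, in mV/degC.
BEFORE = {"vgs": 3.94, "vth": 1.5, "vds": 5.439}
AFTER = {"vgs": 0.81, "vth": 0.96, "vds": 1.193}


def slope_improvement(before, after):
    """Return (absolute reduction in mV/degC, relative reduction in percent)."""
    reduction = before - after
    return reduction, 100.0 * reduction / before


if __name__ == "__main__":
    for name in BEFORE:
        absolute, relative = slope_improvement(BEFORE[name], AFTER[name])
        print(f"{name}: reduced by {absolute:.3f} mV/degC ({relative:.1f}%)")
```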



FIGURE 3.21:  $V_{gs}$  and  $V_{th}$  thermal behavior of (a)  $M_8$  and (b)  $M_9$  before and after compensation.

Moreover, the output voltage versus temperature for different corners is shown in Fig. 3.22(b). The results imply that the negative output voltage varies between -0.682 and -0.671V over the range of $0 - 100^{\circ}C$ for the typical-typical corner. This range extends to between -0.71 and -0.65V in the presence of process variations.



FIGURE 3.22: (a)  $M_9$  drain-source voltage changes in terms of temperature before and after compensation (b) Negative level shifter's output-voltage variations in terms of temperature over 5 different process corners.

A Monte Carlo simulation with 2,000 samples was performed, as shown in Fig. 3.23. The statistical CMOS transistor model is used in the Monte Carlo simulation, and the analysis variable is process variation only. The simulation is a DC analysis in which the temperature is swept between 0 and $100^{\circ}C$, and the effect of such variation on $V_{out}$ for different mu is shown in Fig. 3.23. Clearly, for $mu = 772\mu$ and $mu = 88\mu$, the highest number of samples occurs for $V_{out}$ equal to -0.68V and -0.69V, respectively.



FIGURE 3.23: Histograms of the  $V_{out}$ .

#### 3.6.2.2 Delay

The reported delay is calculated in the simulations by the following definitions:

- Delay in rising time $= t_{r,out} - t_{r,in}$
- Delay in falling time $= t_{f,out} - t_{f,in}$

The average delay is considered as the reported delay.

Fig. 3.24 shows the signal transition of the proposed negative level shifter circuit, where the circuit can operate under a typical corner and temperature of  $27^{\circ}C$  at 250kHz.

To validate that the proposed circuit operates properly across all PVT corners, the delay of the proposed circuit has been simulated at VDDL = 1V. Three corners are considered as the typical, worst, and best cases. The Typical Case Corner (TCC) uses typical NMOS and PMOS devices. For the delay investigation, the corner with fast NMOS and slow PMOS is the Best Case Corner (BCC). Conversely, slow NMOS and slow PMOS are selected as the Worst Case Corner (WCC).

FIGURE 3.24: Transient Results of the Proposed Negative Level Shifter.

Fig. 3.25(a) shows the variation of the delay of the proposed NLS across the mentioned PVT corners. These results demonstrate that the worst-case delay is close to the best-case delay at VDDL=0.3V. However, as VDDL increases to 1V, the worst-case delay becomes 2.35 times higher than the best-case delay, while the circuit still operates properly.

### 3.6.2.3 Power Consumption

Fig. 3.25(b) exhibits the relationship between the power consumption of the proposed NLS and the input voltage. The simulation was performed with fast NMOS and fast PMOS as the Worst Case Corner (WCC), slow NMOS and fast PMOS as the Best Case Corner (BCC), and typical NMOS and PMOS as the Typical Case Corner (TCC).



FIGURE 3.25: (a) Delay (b) power consumption of the proposed NLS in terms of VDD variations.

Our proposed NLS has achieved a minimum power of 0.58nW when the input voltage is 300mV under BCC, and exhibits the maximum power of 68nW when the input voltage is 1V under WCC condition.

### 3.6.3 Voltage Reference

In this section, the realization of the proposed voltage reference is first described in subsection 3.6.3.1. Design considerations in terms of the effective Temperature Coefficient (TC), the supply voltage sensitivity (or Line Sensitivity (LS)), and the Power Supply Rejection Ratio (PSRR) are then discussed.

#### 3.6.3.1 Proposed Voltage Reference

The proposed Voltage Reference is based on the Resistorless Beta Multiplier circuit [86]. The input and output levels range from a subthreshold voltage to the standard power supply.

Fig. 3.26 shows the schematic of the proposed Voltage Reference circuit. In this circuit, $M_1 - M_{10}$ constitute the Resistorless Beta Multiplier, and $M_{15} - M_{17}$ are the cascode stage that was employed to improve the PSRR performance. According to [86], the TC of $I_{ref}$ is a function of $V_{gs,M2}$ and $V_{thn}$. To equalize the thermal slopes of $V_{gs,M2}$ and $V_{thn}$, $M_3$ was added.



FIGURE 3.26: Proposed Voltage Reference circuit.

To generate a constant voltage,  $M_{18} - M_{19}$  are employed as the load.  $M_{p1} - M_{p8}$  function as the trimming circuit at the  $V_{ref}$  node to compensate for the influence of process variations.

As reported by [87], process variations and mismatch between the sizes and $V_{th}$ of the MOS pairs induce variations in the $V_{ref}$ slopes. The mismatch effect can be greatly reduced by proper matching techniques during circuit implementation. To calibrate out the process variations of $V_{ref}$, a digitally controlled, binary-weighted aspect-ratio adjustment (the trimming circuit, see Fig. 3.26) has been applied to the $V_{ref}$ node.

The number of trimming bits is critical. A large number of bits adds circuit complexity and increases the leakage current of the switches [88], whereas a small number of bits results in low precision of $V_{ref}$. In this work, a 5-bit $(D_4-D_0)$ digital trimming structure is employed to cover the process variations and mismatches. The switches can be designed according to the minimum desired precision of $V_{ref}$. As shown in Fig. 3.26, the minimum size (LSB) is W/L while the maximum size (MSB) is 8W/L.

The overall mechanism of the trimming circuit can be summarized as follows: the circuit changes the total leakage current flowing into the $V_{ref}$ node, which produces different $V_{ref}$ levels. The bits $b_0$ to $b_4$ are the logic voltage levels. Turning each bit on or off switches the relevant transistor $(M_{p5}-M_{p8})$ on or off. Therefore, the $M_{n1}$ leakage and, in turn, $V_{ref}$ are set by the code applied to $b_3b_2b_1b_0$.
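The binary-weighted trimming described above can be sketched as a mapping from the code $b_3b_2b_1b_0$ to the set of enabled branches. The sketch assumes four branches weighted W/L, 2W/L, 4W/L and 8W/L (consistent with the stated LSB and MSB sizes) and assumes that a logic 1 enables a branch; neither the switch polarity nor the function names are taken from the source.

```python
# Branch weights for b0..b3 in units of W/L (LSB = W/L, MSB = 8W/L).
BRANCH_WEIGHTS = (1, 2, 4, 8)


def enabled_branches(code):
    """List of branch weights switched on by a trimming code (0..15).

    Assumes a logic 1 on bit i enables the branch of weight 2**i.
    """
    if not 0 <= code < 2 ** len(BRANCH_WEIGHTS):
        raise ValueError("code out of range for the number of trimming bits")
    return [w for i, w in enumerate(BRANCH_WEIGHTS) if (code >> i) & 1]


def total_aspect_ratio(code):
    """Total enabled aspect ratio in units of W/L."""
    return sum(enabled_branches(code))


if __name__ == "__main__":
    for code in (0, 7, 8, 15):
        print(f"{code:04b}: branches {enabled_branches(code)}, "
              f"total {total_aspect_ratio(code)} W/L")
```

Under these assumptions, stepping from code 7 to code 8 switches off all three smaller branches and switches on only the largest one, so the current is handed over between different devices at that step rather than incremented on the same ones.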

### 3.6.3.2 Effective Temperature Coefficient

A temperature coefficient (TC) describes the relative change of a physical quantity associated with a given change in temperature. In chip design, a lower TC means that the associated parameter is less sensitive to temperature variations. The effective TC, given by Eq. (3.6) [87], is the temperature coefficient evaluated from the extreme values $V_{ref,max}$ and $V_{ref,min}$ over the temperature range from $T_{min}$ to $T_{max}$.

$$TC_{EFF} = \frac{V_{ref,max} - V_{ref,min}}{(T_{max} - T_{min})V_{ref}(27^{\circ}C)} \times 10^{6} \tag{3.6}$$
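As a minimal sketch, Eq. (3.6) can be evaluated as follows. The function name and the example numbers are illustrative assumptions and are not simulation data from this work.

```python
def effective_tc_ppm(vref_max, vref_min, t_max, t_min, vref_27c):
    """Effective temperature coefficient in ppm/degC, following Eq. (3.6).

    Voltages are in volts and temperatures in degrees Celsius.
    """
    return (vref_max - vref_min) / ((t_max - t_min) * vref_27c) * 1e6


if __name__ == "__main__":
    # Hypothetical reference that moves by 11 mV over 0..100 degC
    # around a 750 mV nominal value.
    tc = effective_tc_ppm(0.7555, 0.7445, 100.0, 0.0, 0.75)
    print(f"effective TC = {tc:.1f} ppm/degC")
```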

### 3.6.3.3 Supply Voltage Sensitivity

The reference voltage sensitivity with respect to supply voltage variations is commonly evaluated through the line sensitivity (LS) parameter [89]. The LS is defined as

$$LS = \frac{\Delta V_{ref}}{\Delta VDD \times V_{ref,\mu}} \times 100\% \tag{3.7}$$

in which $\Delta VDD$ is the VDD range of operation, $\Delta V_{ref}$ $(V_{ref,max} - V_{ref,min})$ is the absolute difference of the reference voltage in the $\Delta VDD$ range considered, and $V_{ref,\mu}$ is the mean value of the voltage reference over the $\Delta VDD$ range. The LS optimization depends on the minimization of $\Delta V_{ref}$.
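Eq. (3.7) can be applied to a set of $V_{ref}$ samples taken across the supply range, as in the following sketch. The sample values and the function name are hypothetical and serve only to show the calculation.

```python
def line_sensitivity_pct_per_v(vref_samples, vdd_min, vdd_max):
    """Line sensitivity in %/V, following Eq. (3.7).

    vref_samples: reference voltages (V) measured across the VDD range.
    """
    delta_vref = max(vref_samples) - min(vref_samples)
    vref_mean = sum(vref_samples) / len(vref_samples)
    return delta_vref / ((vdd_max - vdd_min) * vref_mean) * 100.0


if __name__ == "__main__":
    # Hypothetical V_ref samples for VDD swept from 0.8 V to 1.2 V.
    samples = [0.749, 0.750, 0.751]
    print(f"LS = {line_sensitivity_pct_per_v(samples, 0.8, 1.2):.3f} %/V")
```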

#### 3.6.3.4 Power Supply Rejection Ratio (PSRR)

PSRR is a measure of how well a circuit rejects noise of various frequencies from a particular device's power supply. The PSRR could be obtained based on the small-signal model [88] and the result is approximately given by

$$PSRR = -20\log_{10}\left(\left|\frac{VDD}{V_{ref}}\right|\right) \tag{3.8}$$
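Eq. (3.8) can be evaluated numerically as sketched below, interpreting VDD and $V_{ref}$ as the small-signal ripple amplitudes on the supply and on the reference node. This interpretation, the function name, and the example amplitudes are assumptions made for illustration.

```python
import math


def psrr_db(vdd_ripple, vref_ripple):
    """PSRR in dB, following Eq. (3.8).

    vdd_ripple and vref_ripple are the small-signal amplitudes (same units)
    on the supply and on the reference node at a given frequency.
    """
    return -20.0 * math.log10(abs(vdd_ripple / vref_ripple))


if __name__ == "__main__":
    # Hypothetical case: supply ripple attenuated by a factor of 200.
    print(f"PSRR = {psrr_db(1.0, 0.005):.2f} dB")
```

With this sign convention, a stronger attenuation of the supply ripple yields a more negative PSRR value.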

#### 3.6.3.5 Voltage Reference Evaluation

The variation of the voltage reference with temperature is illustrated in Fig. 3.27(a). Within the typical-typical corner, the trimmed output voltage at room temperature is around 750mV, with a slight curvature, and has a peak-to-peak variation of 0.16V over the temperature range of $0 - 100^{\circ}C$ at a 1V power supply.

Fig. 3.27(b) illustrates $V_{ref}$ versus the digital word over 5 different process corners. It shows that the trimmed $V_{ref}$ is distributed in a narrow range of almost 20mV in the typical-typical corner. A quick calculation (20mV spread over the 16 codes) gives a mean $V_{ref}$ step of 1.25mV per code, which is approximately proportional to the weight of each bit. The evidence shows that this technique also promises a low leakage current for the whole circuit, which is 8.5nA for a 750mV output voltage.

The discontinuity between code 7 $(b_3b_2b_1b_0 = 0111)$ and code 8 $(b_3b_2b_1b_0 = 1000)$ in Fig. 3.27(b) is caused by the cell ratio of transistor $M_{p4}$ in the trimming circuit. The major difference between these two codes is the $b_3$ bit, which turns from 0 to 1. As $b_3$ is the control line that switches $M_{p8}$ on and off, the explanation lies in the leakage current of this transistor, which is fed by $M_{p4}$. As a result, the cell ratio of $M_{p4}$ accounts for the $V_{ref}$ discontinuity from code 7 to 8.

FIGURE 3.27: Voltage reference variations in terms of (a) temperature (b) digital words over 5 different process corners.

Fig. 3.28 shows the $V_{ref}$ curve versus VDD, from which the line regulation is calculated. In our case, the LS at room temperature is 0.29%/V.



FIGURE 3.28:  $V_{ref}$  versus VDD.

The effective TC is $147ppm/^{\circ}C$ for the 0 to $100^{\circ}C$ temperature range with a 1V supply. The TC sensitivity with respect to VDD is shown in Fig. 3.29(a). Typical simulations present a minimum TC of $129ppm/^{\circ}C$ at a 1.1V supply voltage, while the maximum TC is less than $210ppm/^{\circ}C$ at 0.85V. The obtained results confirm the promising operation of the proposed voltage reference when the temperature varies.



FIGURE 3.29: (a) TC sensitivity (b) PSRR in terms of VDD.

Fig. 3.29(b) presents a plot of PSRR versus noise frequency.

The different *PSRR* behavior across corners stems from the NMOS and PMOS behavior in those corners. Fast and slow corners exhibit carrier mobilities that are higher and lower than nominal, respectively. As the mobility and frequency are directly related, when the mobility (corner) changes, the frequency response changes as well.

Furthermore, PSRR describes the variation of VDD/$V_{ref}$ as the frequency changes. When the circuit is simulated in different corners, the mobility of the MOS transistors changes, which affects their characteristics. As a consequence, $V_{ref}$ changes with frequency and corner. These variations appear at both low and high frequencies.

In this study, the supply rejection is -46dB at 10Hz and 1V supply.

#### 3.6.4 Comparison

As explained, a negative DC voltage generator based on a level shifter and a voltage reference is proposed in this study. The main role of the level shifter is to convert positive values to negative ones. The voltage reference is employed as the input of the level shifter.

This negative DC voltage generator is proposed to address the main disadvantage of existing generators (converters and charge pumps): high power consumption.

In this section, the proposed voltage reference is compared with other works in Table 3.6 in terms of power, TC, LS, and PSRR. Table 3.7 compares the proposed negative level shifter with state-of-the-art level shifters, and Table 3.8 then compares the negative level shifter with other methods (converters and charge pumps) of generating negative DC voltages.

Table 3.6 summarizes the performance comparison among state-of-the-art voltage references. The comparison shows that the power consumption is improved by 87.8% and 87.5% compared with [90] and [91], respectively.

This power reduction occurs while VDD is $3.33 \times$ larger than that applied in [90]. These results verify that our modified structure consumes lower power even under a higher VDD. However, this comes at the cost of a higher LS.
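The percentages and the supply ratio quoted above follow from the power and VDD entries of Table 3.6. The sketch below repeats the arithmetic; the function name is an illustrative choice and only values transcribed from the table are used.

```python
def power_reduction_pct(reference_nw, proposed_nw):
    """Relative power reduction of the proposed design, in percent."""
    return 100.0 * (reference_nw - proposed_nw) / reference_nw


if __name__ == "__main__":
    proposed_nw = 8.5                      # this work, from Table 3.6
    references_nw = {"[90]": 70.0, "[91]": 68.0}
    for name, power in references_nw.items():
        print(f"vs {name}: {power_reduction_pct(power, proposed_nw):.2f}% lower power")
    # Supply ratio between this work (1 V) and [90] (0.3 V).
    print(f"VDD ratio vs [90]: {1.0 / 0.3:.2f}x")
```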

Table. 3.7 compares the proposed NLS with the recently reported ultra-low-voltage LSs. The proposed NLS exhibits  $1.87 \times$  faster delay than what is reported in [73] under the same voltage, 0.3V. Compared with those power-efficient designs implemented in the same technology, [73] and [78], our proposed LS exhibits  $3.41 \times$  and  $3.44 \times$  power reduction. Regards to delay, our proposal could also successfully reduce this parameter,  $1.87 \times$  and  $6.68 \times$  improvement compared with [73] and [78]. Furthermore, the proposed LS could operate at higher frequencies comparing

|                           | [92]                   | [93]           | [94]           | [95]         | [87]        | [90]           | [91]           | [96]           | [97]           | This Work              |
|---------------------------|------------------------|----------------|----------------|--------------|-------------|----------------|----------------|----------------|----------------|------------------------|
| Technology (nm)           | 180                    | 180            | 180            | 180          | 130         | 65             | 65             | 65             | 65             | 65                     |
| Results                   | sim                    | chip           | chip           | chip         | chip        | chip           | chip           | chip           | sim            | sim                    |
| Year                      | 2020                   | 2020           | 2019           | 2019         | 2019        | 2016           | 2020           | 2018           | 2020           | 2021                   |
| VDD (V)                   | 3.3                    | 0.9            | 0.7-2          | 1.2          | 0.3-1.2     | 0.3            | 1.2            | 1.2            | 1.2            | 1                      |
| Power (nW) / Current (nA) | 5,168 nW               | 1.8 nW         | 28 nW          | -            | 4,120 nW    | 70 nW          | 68 nW          | 186,000 nW     | 765 nA         | 8.5 nW                 |
| $V_{ref}$ (mV)            | 118.54                 | 261            | 368            | 500          | 306.8       | 168            | 202            | 500            | 496            | 750                    |
| TC $(ppm/^{\circ}C)$      | 21.5                   | -              | 43.1           | 22           | 81.5        | 142            | 36.6           | -              | 11.42          | 147                    |
| Temp                      | $-60 \sim 120$         | $-40 \sim 130$ | $-40 \sim 128$ | $0 \sim 100$ | $0 \sim 85$ | $-20 \sim 100$ | $-10 \sim 110$ | $-40 \sim 125$ | $-40 \sim 125$ | $0 \sim 100$           |
| LS (%/V)                  | 0.035                  | 0.013          | 0.027          | 0.28         | 13          | 4.8            | 0.181          | 0.86           | 0.725          | 0.29                   |
| PSRR (dB) @ freq. (Hz)    | -68.64@100, -60.37@10K | -73.5@100      | -59@10, -39@1k | -42@N/A      | -29@100     | -              | -              |                | -54@100        | -46@10, -27@1K, -26@1M |

TABLE 3.6: Performance summary and comparison with the state of the art.

with the designs in [73] and [78], 2.5MHz versus 1MHz. However, this power- and delay-efficient design comes at the expense of a larger area.

The proposed NLS also consumes far less power than the designs proposed in [80] and [79], 0.772nW compared with $165.77\mu W$ and 1.35mW, respectively; however, this comes at the cost of slower operation and a larger area.

|                       | [77]        | [75]        | [76]          | [73]        | [80]        | [79]      | [78]          | This Work    |
|-----------------------|-------------|-------------|---------------|-------------|-------------|-----------|---------------|--------------|
| Technology (nm)       | 130         | 180         | 180           | 65          | 65          | 65        | 65            | 65           |
| Results               | meas        | sim         | meas          | meas        | meas        | sim       | meas          | sim          |
| Year                  | 2020        | 2020        | 2020          | 2018        | 2017        | 2018      | 2020          | 2021         |
| VDDL (V)              | 0.031       | 0.1-3       | 0.085-1.8     | 0.1-1.2     | 0.45-1      | 0.5-1.2   | 0.12          | 0.4-1.2      |
| Operating Freq. (MHz) | 1           | 1           | 0.1           | 1           | 500         | 500       | 1             | 2.5          |
| Static Power (W)      | 9.87n @0.3V | 1.26n @1.8V | 1.33n @0.4V   | 2.64n @0.3V | $165.77\mu$ | 1.35m     | 2.66n @0.3V   | 0.772n @0.3V |
| Delay (ns)            | 21.98 @0.3V | 22.47 @1.8V | 7.6 @0.4-1.8V | 7.5 @0.3V   | 0.35        | 0.45      | 26.75 @0.2-1V | 4 @0.3V      |
| Area $(\mu m^2)$      | 80.69       | 182.46      | 81.8          | 7.45        | 54.73       | 51.9      | 6.9           | 87           |

TABLE 3.7: Comparison of the proposed negative level shifter with state-of-the-art LSs.
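The improvement factors quoted for the proposed NLS against [73] and [78] can be re-derived from the entries of Table 3.7. The Python sketch below is only an illustrative check: the dictionary layout and the names `designs`, `this_work` and `improvement` are assumptions made for this example, and the static power of [78] is read as 2.66nW, consistent with Table 3.10.

```python
# Static power (W) and delay (s) copied from Table 3.7
# (measurement conditions as listed in the table).
# Assumption: the [78] static power entry is 2.66 nW.
designs = {
    "[73]": {"power": 2.64e-9, "delay": 7.5e-9},
    "[78]": {"power": 2.66e-9, "delay": 26.75e-9},
}
this_work = {"power": 0.772e-9, "delay": 4e-9}


def improvement(reference, proposed):
    """Return how many times smaller `proposed` is than `reference`."""
    return reference / proposed


# Improvement factor of the proposed NLS over each reference design.
ratios = {
    ref: {
        "power": improvement(vals["power"], this_work["power"]),
        "delay": improvement(vals["delay"], this_work["delay"]),
    }
    for ref, vals in designs.items()
}

for ref, r in ratios.items():
    print(f"{ref}: power x{r['power']:.2f}, delay x{r['delay']:.2f}")
```

The computed factors agree with the quoted $3.41 \times$, $3.44 \times$, $1.87 \times$ and $6.68 \times$ to within rounding.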

Table 3.8 compares the negative level shifter with charge pumps. As reported, charge pumps provide output voltages in the range of volts, whereas a negative DC voltage under 1V can be obtained with the proposed negative level shifter. Furthermore, the leakage current of the proposed negative level shifter is on the order of nanoamps, while charge pumps and converters leak currents on the order of microamps and milliamps, respectively.

Table 3.8 compares the operation of the proposed negative level shifter with state-of-the-art charge pumps. Clearly, charge pumps consume power/current ranging from some hundreds of nanowatts/nanoamps to the order of milliwatts/milliamps, depending on their design and implementation technology. Compared with those implemented in a 65nm process, [100], [57] and [58], the proposed LS generates the negative voltages with far lower power consumption (×886,000, ×22 and ×97, respectively). Furthermore, the results show that the expected negative voltages can be obtained with a smaller chip area, $0.087mm^2$ versus $1.92mm^2$ and $1.32mm^2$ in [100] and [58], respectively.

TABLE 3.8: Comparison of the proposed negative level shifter with charge pumps.

| Ref.      | Year | Technology (nm) | Area $(mm^2)$ | Clock Freq. (MHz) | Maximum $I_{out}$ (A) / Power (W) |
|-----------|------|-----------------|---------------|-------------------|-----------------------------------|
| This Work | 2021 | 65              | 0.087         | **2.5**           | **68nA**                          |
| [53]      | 2021 | 130        | 0.56     | 20-100      | 120mA                 |
| [99]      | 2020 | 110        | 0.073    | 25          | 1.476mW               |
| [54]      | 2020 | 28         | 0.116    | 1           | $0.68 \mu A$          |
| [55]      | 2019 | 180        | 0.96     | 1           | $89\mu W$             |
| [98]      | 2019 | 180        | 0.8      | 0.01        | $10\mu A$ - $1mA$     |
| [100]     | 2019 | 65         | 1.92     | 40          | 60mW                  |
| [56]      | 2018 | 350        | 0.56     | 1           | $40\mu A$             |
| [57]      | 2018 | 65         | 0.032    | 15.2        | $1.5 \mu W$           |
| [58]      | 2017 | 65         | 1.32     | 10          | $6.6 \mu W$           |
| [59]      | 2016 | 130        | 0.6      | 0.04        | $15\mu W$             |

However, the main advantage of charge pumps over this proposal is that most charge pumps do not require a negative supply to operate. The proposed LS requires a negative supply voltage, which is fed by the negative terminal of an external power supply. Consequently, generating the negative voltages from a reference voltage is suggested as future work.

## 3.7 Proposed Negative Level Shifter: Case Two

Fig. 3.30 demonstrates the proposed LS structure, which is composed of an input inverter ($M_{n9}$ and $M_{p5}$), a current mirror ($M_{n1}$ and $M_{n2}$), a cross-coupled pair ($M_{n3}$ and $M_{n4}$), a pair of diode-connected current limiters ($M_{n5}$ and $M_{n6}$), and an output inverter ($M_{n7}$-$M_{n8}$ and $M_{p3}$-$M_{p4}$).

The input inverter transistors, $M_{n9}$ and $M_{p5}$, provide the differential low-voltage signals, and the output inverter ($M_{n7}$-$M_{n8}$ and $M_{p3}$-$M_{p4}$) is designed to ensure adequate output driving strength. Moreover, the combination of the current mirror and the cross-coupled pair creates a cascading effect, which lowers the drain-source voltage drop across the transistors and in turn decreases the leakage current.



FIGURE 3.30: Proposed negative level shifter.

The optimal transistor sizes are listed in Table 3.9.

TABLE 3.9: Transistor Sizes

| Transistor          | W/L (nm/nm) | Transistor          | W/L (nm/nm) |
|---------------------|-------------|---------------------|-------------|
| $M_{n1}$            | 200/60      | $M_{n7}$ - $M_{n9}$ | 200/60      |
| $M_{n2}$            | 200/1000    | $M_{p1}$ - $M_{p2}$ | 200/60      |
| $M_{n3}$ - $M_{n4}$ | 200/60      | $M_{p3}$ - $M_{p4}$ | 200/60      |
| $M_{n5}$ - $M_{n6}$ | 200/1000    | $M_{p5}$            | 200/60      |

#### **3.7.1** Performance Parameters

Two parameters are critical in level shifter design: conversion voltage and propagation delay, which are discussed as follows.

The reported delay is calculated in the simulations by the following definitions:

- Delay in rising time $= t_{r,out} - t_{r,in}$
- Delay in falling time $= t_{f,out} - t_{f,in}$

The reported delay is the average of the rising and falling delays.
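As a minimal sketch of the delay definition above, the following Python function averages the rising and falling delays. The function name, the argument names and the example time stamps are hypothetical; the points on the waveforms at which the transition times are measured are not fixed here and are left to the caller.

```python
def propagation_delay(t_r_in, t_r_out, t_f_in, t_f_out):
    """Average of the rising and falling delays.

    All arguments are transition times in the same unit (e.g. ns):
    t_r_* for the rising transition, t_f_* for the falling one.
    """
    rise_delay = t_r_out - t_r_in  # delay in rising time
    fall_delay = t_f_out - t_f_in  # delay in falling time
    return (rise_delay + fall_delay) / 2.0


# Hypothetical time stamps in ns, for illustration only.
delay_ns = propagation_delay(t_r_in=1.0, t_r_out=3.0, t_f_in=10.0, t_f_out=14.0)
print(delay_ns)
```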

Fig. 3.31 shows the signal transitions of the proposed negative level shifter circuit operating at 250kHz under the typical corner at a temperature of $27^{\circ}C$.



FIGURE 3.31: Transient Results of the Proposed Negative Level Shifter.

In the proposed design, when the signal IN goes from low to high, $M_{p1}$ is turned off, while the complementary signal $\overline{IN}$ provided by the input inverter turns transistor $M_{p2}$ on. With $M_{p2}$ on, the voltage at node X falls below zero. The absolute value of this voltage is less than the absolute value of VDDH, which turns $M_{n7}$ and $M_{n8}$ on. Finally, the VDDH voltage level is transferred to the output node. The same procedure applies when the signal IN goes low, which results in a low output voltage. Consequently, the output voltage level is controlled by VDDH.

Furthermore, the delay of the output signal with respect to the input is mainly affected by VDDL for two reasons: (1) the input inverter is supplied by VDDL, and (2) the on and off states of $M_{p1}$ and $M_{p2}$ are defined by VDDL.

#### 3.7.2 Performance Evaluation

Fig. 3.32 shows how the output voltage follows the input voltage as the VDDL and the VDDH are varied.

As shown, when the VDDL increases from 0.2V to 1V (left figures), the input voltage level increases by the same amount. This increase has no effect on the output voltage level; however, the speed at which the output node charges and discharges is governed by the VDDL. For a VDDL of 0.2V, the delay is significant, but it becomes marginal as the VDDL grows from 0.4V to 1V.

However, when the VDDH increases from 0.4V to 1.2V, the low level of the output varies over the same voltage range, while the high level of the output remains at 0V.

Fig. 3.33 (top figures) shows the delay variation of the proposed LS with the VDDL and VDDH variations across five corners.

As expected, when the VDDL increases, the delay decreases significantly. In the typical-typical corner, the delay falls below 460ps as the VDDL increases from 0.2V to 1V. The worst delay, 50ns, occurs in the slow-slow corner at a VDDL of 0.2V, whereas the best delay, 150ps, occurs in the fast-fast corner at a VDDL of 1V.

Furthermore, when the VDDH increases from 0.6V to 1V, the delay drops from 4ns to 400ps in the typical-typical corner. The worst and best cases occur in the slow-slow and fast-fast corners, where the delay is 25ns at 0.6V and 89ps at 1V, respectively.

FIGURE 3.32: Output voltage variations versus input voltage changes when the VDDL and the VDDH vary.

Fig. 3.33 (bottom figures) exhibits the static power variation of the proposed LS with VDDL and VDDH variations across five corners.

The results show that the typical static power for a VDDL between 0.2V and 1V remains in the nanowatt range (9.6nW at a VDDL of 0.2V). The proposed LS achieves a minimum power of 2.83nW at a VDDL of 0.2V in the slow-slow corner and a maximum power of 546nW at a VDDL of 1V in the fast-fast corner.

Similarly, when the VDDH changes from 0.6V to 1.2V, the static power dissipation of the proposed LS increases on the order of nanowatts according to the corner analysis. The largest variation occurs in the fast-fast corner, where the power increases by 105nW. Conversely, the smallest variation occurs in the slow-slow corner, where the power increases from 490pW to 5.75nW. In the typical-typical corner, the static power varies between 2nW and 18.96nW when the VDDH increases from 0.6V to 1.2V.

FIGURE 3.33: Delay and static power variations with VDDL and VDDH.

#### 3.7.3 Comparison

Table 3.10 compares the proposed NLS with recently reported ultra-low-voltage LSs. The proposed NLS exhibits far lower power consumption than the designs reported in [80] and [79]; however, this comes at the cost of slower operation. It also operates $3 \times$ and $10.4 \times$ faster than [73] and [78], respectively, at the expense of greater power consumption.

## 3.8 Conclusion

Negative DC voltage generation is a bottleneck in analogue circuit design. Charge pumps are among the most popular blocks for generating both positive and negative DC voltages. Although they are commercial blocks, charge pumps face some design challenges in low-power applications, including: (1) the output current, and in turn the output power, is high; and (2) they occupy a large portion of the chip.

| Ref.      | Year | Technology (nm) | Area $(\mu m^2)$ | Clock Freq. (MHz) | Static Power (W) | Delay (ns)    |
|-----------|------|-----------------|------------------|-------------------|------------------|---------------|
| This work | 2021 | 65              | -                | 2.5               | 12n @0.3V        | 2.5 @0.3V     |
| [75]      | 2020 | 180             | 182.46           | -                 | 1.26n @1.8V      | 22.47 @1.8V   |
| [76]      | 2020 | 180             | 81.8             | -                 | 1.33n @0.4V      | 7.6 @0.4-1.8V |
| [77]      | 2020 | 130             | 80.69            | 1                 | 9.87n @0.3V      | 21.98 @0.3V   |
| [78]      | 2020 | 65              | 6.9              | 1                 | 2.66n @0.3V      | 26.75 @0.2-1V |
| [79]      | 2018 | 65              | 51.9             | -                 | 1.35m            | 0.45          |
| [73]      | 2018 | 65              | 7.45             | -                 | 2.64n @0.3V      | 7.5 @0.3V     |
| [80]      | 2017 | 65              | 54.73            | -                 | $165.77\mu$      | 0.35          |

TABLE 3.10: Level shifters' performance comparison.

To address these problems, a level shifter is proposed in this study to generate the negative DC voltage required to reverse-bias the body of the NMOS transistors. The proposed level shifter shifts the input voltage below the ground voltage, producing a negative voltage.

## Chapter 4

# CMOS SRAM Overview

This chapter presents the basics of CMOS SRAM design and operation. It explains the architecture of SRAM cells, the read/write operations, and the required peripheral circuitry.

## 4.1 Introduction

SRAM and DRAM are two examples of volatile memories, which store data only as long as power is applied to the device. All data stored in the memory is lost if the power is removed. The primary advantage of SRAM cells over their DRAM counterparts is their speed, whereas the main feature of DRAM is its storage density. Unlike DRAMs, SRAM cells do not need to be refreshed, which makes them available for reading and writing data 100% of the time [101].

## 4.2 Basic Storage Cell

A number of SRAM cell topologies have been reported in the past decade. Among these topologies, the six-transistor (6T) SRAM cell (see Figure 4.1) has received more attention than 4T SRAM cells [102]. The main reasons for the popularity of the 6T SRAM in low-power SRAM units are: (1) its data stability is independent of the leakage current, and (2) it has a significantly higher tolerance to noise, which is an important benefit especially in scaled technologies where the noise margins are shrinking.



FIGURE 4.1: 6T SRAM cell [101].

SRAM's distinguishing feature is that the memory storage element is a static-logic memory element. Two back-to-back inverters form the main memory cell, while two access transistors are attached to the memory cell to create a path for read and write operations. An example SRAM storage cell is shown in Figure 4.2(a). To provide multi-bit and multi-location storage, the single cell is arranged in an array, as shown in Figure 4.2(b). The word lines (WLs) run horizontally through the array, each activating the cells in a single row, while the bitline pairs run vertically. In this way, the data being read or written is communicated between the cell and the read/write circuits [101, 102].



FIGURE 4.2: (a) The basic SRAM cell consists of two access transistors that provide a connection to a static memory element. (b) An SRAM array consists of multiple SRAM cells arranged in an array and provides multi-bit and multiaddress storage [101].

## 4.3 Architecture of an SRAM Unit

The periphery blocks in an SRAM unit facilitate access to the cells for read or write operations. In practice, the multiple bits that make up a word (word size M) are accessed at the same time during a read or write operation. In regular SRAMs, only one word is accessed at a time. The number of words accommodated in the row specifies the length of the address, N, so the total number of cells in an array is $M \times N$. An SRAM unit consists of several periphery blocks, including decoders, sense amplifiers (SAs) and timing control units, which are explained in the following sub-sections.

An array is an arrangement of cells connected together horizontally and vertically. Figure 4.3 illustrates the construction of an array and the associated bitlines and wordlines. The wordlines connect the cells sitting on the same row, while the cells on the same column share the same bitline pair. In each access cycle, only one wordline is activated in the array. Activating the wordline connects all SRAM cells on that row to their corresponding bitlines, which the cells then discharge. Hence, all the bitlines are discharged as a result of wordline activation [102].



FIGURE 4.3: Construction of an array based on a plurality of SRAM cells [102].

A basic SRAM array consists of $2^{L}$ rows and $N \times 2^{K}$ columns of cells, where L is the number of address bits for the row decoder, K is the number of address bits for the column decoder, and N is the number of bits in a word (Figure 4.4). The row decoder selects one of the $2^{L}$ wordlines based on the row address bits (bits $A_0$ to $A_{L-1}$ in Figure 4.4), while the column decoder uses the K column address bits to select one of the N-bit words from a given row. Most recent microprocessors operate with 64-bit words and are referred to as 64-bit processors. Thus, such systems include an SRAM array with $64 \times 2^{K}$ (or $2^{K+6}$) cell columns in total. To meet a square-shape criterion in the layout design, $2^{K+6}$ should be equal to $2^{L}$, or K + 6 = L. The choice of using the row select bits as the MSBs and the column select bits as the LSBs of the address, or vice versa, is arbitrary. The timing required to activate the sense amplifier, write driver, decoders and other peripherals is controlled by timing circuitry [103, 104].
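The array organization described above can be illustrated with a short Python sketch. The function names `array_shape` and `split_address` are hypothetical, and the sketch assumes that the row bits are the MSBs of the word address, which the text notes is an arbitrary choice.

```python
def array_shape(L, K, N):
    """Rows and columns for L row bits, K column bits and N-bit words."""
    rows = 2 ** L
    columns = N * 2 ** K
    return rows, columns


def split_address(address, L, K):
    """Split an (L + K)-bit word address into (row, column-word) indices.

    Assumption: the row select bits are the MSBs and the column
    select bits are the LSBs of the address.
    """
    if not 0 <= address < 2 ** (L + K):
        raise ValueError("address out of range")
    row = address >> K               # upper L bits drive the row decoder
    column = address & (2 ** K - 1)  # lower K bits drive the column decoder
    return row, column


# 64-bit words with K = 4 column bits: a square array needs L = K + 6 = 10.
rows, columns = array_shape(L=10, K=4, N=64)
print(rows, columns)
print(split_address(0b10110, L=3, K=2))
```

With N = 64 and K = 4, choosing L = K + 6 = 10 gives equal numbers of rows and cell columns, which matches the square-shape condition stated in the text.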



FIGURE 4.4: A typical SRAM architecture [103].

#### 4.3.1 Sense Amplifier and Write Driver

Communication between an SRAM cell in the array and the outside world is conducted through a bitline pair, which has two important characteristics: (1) the bitlines conduct the information in a differential fashion, and (2) they are highly capacitive interconnects, because they are long lines in a low metal layer and are loaded by numerous access transistors. Because of these two characteristics, the voltage swing developed on the bitlines is restricted in order to save power and reduce delay. A sense amplifier and a write driver interface with the bitline pair to guarantee a proper voltage level translation on either side. The sense amplifier is responsible for amplifying the analog differential voltage developed on the bitlines during a read access. The result of this amplification is a full-swing, single-ended digital output. The primary advantage of the SA is that it reduces the size of the SRAM cell, because the drive transistors do not need to fully discharge the bitlines. However, the SA needs to satisfy a few electrical requirements to operate properly in the SRAM unit. First, the minimum differential voltage swing required at the SA input should be smaller than the minimum differential voltage that the SRAM cell develops over the bitlines. Second, the SA should be capable of providing the output within the sense amplification time [102].

The write driver is used to discharge one of the bitlines to zero to ensure a successful write operation in all process and mismatch corners. The write driver circuitry can be implemented in different ways depending on the application. A typical write driver circuit is shown in Figure 4.5 [103].



FIGURE 4.5: A typical write driver circuit [103].

## 4.4 Read Operation

Figure 4.6 demonstrates the cell operation during a read access. In this figure, node Q carries a logic 0 and node $\overline{Q}$ carries a logic 1 before the cell is accessed. Thus, $MP_1$ and $MN_2$ are off while $MN_1$ and $MP_2$ are on, compensating for the leakage current of $MP_1$ and $MN_2$.



FIGURE 4.6: 6T bit-cell during a read operation.

In a conventional design, before the read operation begins, the bitline (BL) and its complement ($\overline{BL}$) are pre-charged to VDD by the pull-up PMOS pre-charge transistors, $M_{E1}$ and $M_{E2}$, controlled by the pre-charge signal (PRE) (see Fig. 4.5).

Then, the pre-charge transistors ($M_{E1}$ and $M_{E2}$, shown in Fig. 4.5) are turned off by driving *PRE* high. Activating a single wordline (*WWL*), i.e., the gate of the access transistors ($MA_1$ and $MA_2$), initiates the read operation by turning on all of the access transistors controlled by this *WWL*.

As the wordline goes high, $MA_1$ operates in the saturation region while $MN_1$ operates in the triode region. Owing to the short-channel effect, the $MA_1$ current has a linear relationship with the node Q voltage. Hence, these transistors act as resistors in this operation. Therefore, $MA_1$ and $MN_1$ form a voltage divider, which results in a $\Delta V$ rise in the node Q voltage. This voltage drives the input of the inverter $MP_2$-$MN_2$. Referring to Figure 4.6, during a read operation, the access transistor connected to the side of the cell storing a logic-1 ($MA_2$) is off, due to a low $V_{gs}$. This off-state access transistor ensures that the voltage of the bitline connected to it remains independent of the cell. However, the access transistor connected to the side of the cell storing a logic-0 ($MA_1$) is on. Furthermore, the NMOS transistor that pulls the logic-0 storage node to ground ($MN_1$) is also on. These two on-state transistors create a conductive path between the pre-charged bitline (BL) and GND. Consequently, the bitline (BL) is discharged to ground by the side of the cell storing a logic-0. In this way, the data stored in the accessed cell is driven onto BL and $\overline{BL}$, which develops a voltage difference between the bitline (BL) and its complement ($\overline{BL}$). At the next stage, a sense amplifier at the periphery of the array senses this voltage difference. At the end of the read cycle, the wordline (WWL) and the pre-charge signal (PRE) are driven low. Fig. 4.7 illustrates an SRAM cell attached to the pre-charge and write driver circuits.



FIGURE 4.7: SRAM column slice showing pre-charge transistors, a single SRAM cell, and read/write circuits [105].

To guarantee a non-destructive read operation, the  $\Delta V$  voltage level could be controlled by the resistive ratio of  $MA_1$  and  $MN_1$ . To assess the data stability during a read operation, cell ratio is defined as:

$$CR = \frac{W_4/L_4}{W_2/L_2} \tag{4.1}$$

where W and L are the width and length of the corresponding MOS transistors, respectively. To achieve a more stable read operation, $\Delta V$ must be kept low by using a higher cell ratio (CR).
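The link between the cell ratio and $\Delta V$ can be made concrete with a crude first-order model. The Python sketch below treats the access and pull-down transistors as ideal resistors whose on-resistance is inversely proportional to W/L, and it assumes that CR is the pull-down W/L divided by the access W/L. Both points are simplifying assumptions made for this illustration, not results from the simulations, and the function names are hypothetical.

```python
def cell_ratio(wl_pull_down, wl_access):
    """CR as the pull-down W/L over the access W/L (assumed mapping)."""
    return wl_pull_down / wl_access


def read_disturb_voltage(vdd, cr):
    """First-order Delta-V at the logic-0 node for a resistive divider.

    Assumes on-resistance R ~ 1 / (W/L) for both devices, so that
    Delta-V = VDD * R_pd / (R_access + R_pd) = VDD / (1 + CR).
    """
    return vdd / (1.0 + cr)


# Hypothetical sweep: a larger CR gives a smaller read disturbance.
for cr in (1.0, 1.5, 2.0, 3.0):
    print(f"CR = {cr:.1f}: Delta-V = {read_disturb_voltage(1.0, cr):.3f} V")
```

Under these assumptions $\Delta V$ falls monotonically as CR increases, which is the qualitative behaviour described in the text; a real cell will deviate from the ideal divider.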

## 4.5 Write Operation

Fig. 4.8 shows the 6T bit-cell during a write operation. The bitlines (BL and $\overline{BL}$) are initially pre-charged to VDD by the pre-charge transistors ($M_{E1}$ and $M_{E2}$), which are then turned off by driving the pre-charge signal (PRE) high (shown in Fig. 4.5). The data value to be written is first driven onto the bitlines by the write driver (see Fig. 4.7), as shown in Fig. 4.8 (a 1 is going to be written, so BL=1 and $\overline{BL}=0$). A single wordline (WWL) is then asserted to start the write operation, turning on all of the access transistors ($MA_1$ and $MA_2$) controlled by that WWL. If the data to be written is opposite to the previously stored data, the potential of the high internal node ($\overline{Q}$ in this case) is lowered through the on-state access transistor ($MA_2$ in this case) and the zero-voltage bitline path ($\overline{BL}$). The amount by which the internal node voltage is lowered depends on the drive strengths of the pull-up PMOS and the pull-down NMOS transistors. Hence, the logic state of the cell is changed, and the wordline becomes inactive after the write operation is completed. The pre-charge signal (PRE) is also driven low to pre-charge BL and $\overline{BL}$ to VDD for the next operation.

The ratio of the drive strengths of the pull-up and access transistors is known as the pull-up ratio (PR).



FIGURE 4.8: 6T bit-cell during a write operation.

$$PR = \frac{W_5/L_5}{W_1/L_1} \tag{4.2}$$

The transistors need to be sized carefully so that the PR is low enough to pull the high internal node voltage below the VTRIP (the voltage required to maintain a proper read/write operation) of the connected inverter.

To obtain a low PR, wide access transistors are required; however, increasing the width of the access transistors lowers the stability of the cell during the read operation by reducing the CR. Consequently, there is a trade-off between data stability in the read operation and acceptable writability of the cell [101, 102, 105].
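The trade-off can be seen numerically with a small Python sketch that sweeps the access transistor width while the other devices are held fixed. All widths and lengths below are hypothetical values chosen for illustration, and the sketch assumes that the access transistor W/L is the denominator of both CR (over the pull-down W/L) and PR (over the pull-up W/L).

```python
def sizing_ratios(w_access, w_pull_down, w_pull_up, length=60.0):
    """Return (CR, PR) for the given widths at a common channel length.

    Assumed definitions: CR = (W/L)_pull-down / (W/L)_access and
    PR = (W/L)_pull-up / (W/L)_access.
    """
    wl_access = w_access / length
    cr = (w_pull_down / length) / wl_access
    pr = (w_pull_up / length) / wl_access
    return cr, pr


# Hypothetical widths in nm; only the access width is swept.
sweep = [(w, *sizing_ratios(w, w_pull_down=400.0, w_pull_up=200.0))
         for w in (150.0, 200.0, 300.0, 400.0)]

for w_access, cr, pr in sweep:
    print(f"W_access = {w_access:.0f} nm: CR = {cr:.2f}, PR = {pr:.2f}")
```

Widening the access transistor lowers PR, which helps writability, but it lowers CR by the same factor, which degrades read stability.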

## 4.6 Overview of Read Buffer-Foot

One of the most significant challenges with 6T SRAM memory cells is how to separate the read and write paths. To address this issue, the 8T SRAM cell was introduced, as shown in Fig. 4.9. This two-port cell topology is composed of a 6T storage cell and an extra 2T read buffer, which isolates the data-retention structure during read accesses.



FIGURE 4.9: 8T bit-cell uses two-port topology to eliminate read SNM and peripheral assists, controlling Buffer-Foot and VVDD, to manage bit-line leakage and write errors [111].

The main advantage of this configuration is that the read SNM limitation is eliminated. Furthermore, this structure can deal with other two prominent limitations (bitline leakage and writability in the presence of variation) using the peripheral assists associated with the Buffer-Foot [111].

The bitline leakage issue in the single-ended 8T cell is comparable to that in the 6T case. In the 8T SRAM, the leakage currents can pull down the RBL regardless of the accessed cell's state. When the ground voltage is directly connected to the source terminal of $M_7$ (in Fig. 4.9) rather than to a buffer-foot, the RBL attached to the selected cell is correctly pulled low by the accessed cell. Since the RWL is a horizontal line, it activates all the cells on the same row, which pushes all unselected cells in that row into the half-selected mode. As a result, the RBL connected to a half-selected cell is discharged, which increases the undesired leakage current [112]. In this case, depending on the Q value, $M_7$ is either off or in the triode region. If $M_7$ operates in the triode region, it conducts a leakage current that increases the total power consumption of the chip.

Alternatively, to address this issue, instead of statically connecting the foot to ground, a foot-driver is employed horizontally in the periphery, as shown in Fig. 4.10. In this configuration, the buffer-foots of all cells of the same word are shorted, and their foot-driver is shared. During the read mode, unlike the conventional scheme, only the foot of the accessed word is driven low while all others remain at VDD [113]. Accordingly, after the RBL is pre-charged, the read-buffers of the un-accessed cells present no voltage drop, while the access devices have a negative voltage drop across their gate and source. Consequently, they impose no sub-$V_{th}$ leakage, and dynamically held data values of 1 on the RBL can be sensed successfully.
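The foot-driver scheme can be summarized with a small behavioural sketch in Python. It is an idealized model written for illustration only: the function names are hypothetical, the voltages are ideal rail values, and the RBL is assumed to sit at VDD right after pre-charge.

```python
def foot_levels(num_rows, accessed_row, vdd):
    """Foot voltage per row: 0 V for the accessed word, VDD otherwise."""
    return [0.0 if row == accessed_row else vdd for row in range(num_rows)]


def read_buffer_drop(rbl_voltage, foot_voltage):
    """Voltage across a read buffer, from the RBL down to its foot."""
    return rbl_voltage - foot_voltage


VDD = 1.0  # hypothetical supply voltage in volts
feet = foot_levels(num_rows=4, accessed_row=2, vdd=VDD)

# Right after pre-charge the RBL is assumed to be at VDD.
drops = [read_buffer_drop(VDD, foot) for foot in feet]
print(feet)
print(drops)
```

In this model, only the accessed row sees a non-zero voltage across its read buffer, so the un-accessed rows have no voltage available to drive leakage from the RBL.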



FIGURE 4.10: Half-select disturbance in 8T SRAM array [112].

## 4.7 Building Blocks and Techniques

In SRAM arrays, a huge number of cells must remain idle for extended periods of time. During this time, the power supply cannot be shut off since the cells must retain their stored data. However, the leakage power must be kept as low as possible during this period to save power. In this chapter, a particular combination of supply voltage and $V_{th}$ is proposed that minimizes the total energy consumption of the entire chip. The methodologies include a dual-rail SRAM configuration, drowsy SRAM cells, and reverse body biasing.

#### 4.7.1 Single-ended 8T-SRAM Cell

The main operation of the 8T-SRAM is explained in detail in Section 4.6, and its schematic view is depicted in Fig. 4.11. In this configuration, node x is connected to the read buffer foot in order to remove or decrease the aggregate leakage currents from the unaccessed cells which share the same bit-lines [111]. This allows the read stack's virtual $V_X$ to be controlled [116].



FIGURE 4.11: Single-ended 8T SRAM cell.

Fig. 4.12 compares the leakage current of the actual 8T-cell with and without the Read Buffer-Foot. As demonstrated in Fig. 4.12(a), if the $M_7$ source terminal is grounded, $M_7$ turns on when the cell storage node (Q) is zero. As a result, the *RBL* discharges from *VDD* to ground, and the power consumption consequently increases. Note that the *RBL* discharge varies with the number of cells and stored zeros per bit-line; the *RBL* leakage can nevertheless be decreased by utilizing the Read Buffer-Foot, as shown in Fig. 4.12(b). In this mechanism, when the read signal is deactivated, the Buffer-Foot goes high to disable the corresponding Read Stack. Thus, regardless of the stored data, $M_7$ is pushed into the cut-off region, which results in an almost stable RBL voltage level.





FIGURE 4.12: 8*T*-cell (a) without (b) with the Read Buffer-Foot.

Fig. 4.13 (a) illustrates the 8T-SRAM cell with Buffer-Foot during the read operation. Within this mode, when the read selection line ( $M_8$  gate terminal) goes high, the Buffer-Foot falls to zero voltage, thereby allowing the regular operation of the single-ended 8T-SRAM cell while leaving the unselected RBLs' voltage almost unchanged.





FIGURE 4.13: 8*T*-SRAM cell with Buffer-Foot during the read operation. (a) selected column (b) unselected column.

Fig. 4.14 demonstrates the overall array architecture for an SRAM memory block, including 16 8*T*-SRAM cells per bitline and their connections through two vertical and two horizontal controlling lines.

For simplicity, the read and write assist circuits are not included. In this arrangement, each cell in a column is linked vertically to the others by two vertical controlling lines: the write bitline pair ($BL/\overline{BL}$) and the read bitline (RBL). The cells are connected in rows by the two horizontal controlling lines: the write wordline (WWL) and the read wordline (RWL). Furthermore, the power line (VVDD) of the array's cells is isolated from the periphery circuits (read and write assist circuits) so that the voltage level of the cells can be controlled according to the operation mode.

The power line of the array itself is designed so that it can apply a high voltage to the cells in read/write mode, which ensures proper operation, and can switch to a lower voltage in standby.

In this mechanism, there are two different supply voltages for each column, and each is controlled by a sleep controlling line (SL) (see Fig. 4.15). When SL=0, the column would be ready for read/write operation. Thus, PMOS turned on and transfer VDD1 to  $VVDD_i$  (cells power line). In contrast, when a column enters to the sleep mode (SL=1), NMOS transfers VDD2 to the  $VVDD_i$ . As a consequence, the cells' power supply would be varied according to the operation mode.
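The per-column supply selection described above can be summarized as a small behavioral model. The following is only a sketch: the switches are treated as ideal (no threshold drop), and the function name and the numeric values of VDD1 and VDD2 are hypothetical placeholders rather than values taken from the design.

```python
def column_supply(sl: int, vdd1: float, vdd2: float) -> float:
    """Return the virtual cell supply VVDD_i of one column (ideal switches).

    SL = 0: the PMOS conducts and passes VDD1 (read/write level).
    SL = 1: the NMOS conducts and passes VDD2 (sleep level).
    """
    if sl not in (0, 1):
        raise ValueError("SL must be 0 or 1")
    return vdd1 if sl == 0 else vdd2


# Hypothetical supply levels, chosen only for illustration.
VDD1, VDD2 = 0.30, 0.20

# One SL bit per column: columns 0 and 3 are active, 1 and 2 are asleep.
sleep_lines = [0, 1, 1, 0]
vvdd = [column_supply(sl, VDD1, VDD2) for sl in sleep_lines]
print(vvdd)
```

Each column is controlled independently, so only the columns being accessed see the higher supply.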

The block's symbol that was generated is shown in Figure 4.16.

#### 4.7.2 Peripheral Write Assist Circuitry

The write assist circuit contains three components: a write driver, a bitline precharge circuit, and transmission gate switches (see Fig. 4.17). The write operation begins by precharging the bitlines. During precharge, $PRE_W$ goes low, turning $M_8$ to $M_{10}$ on. Thus, the bitline pairs (BL and $\overline{BL}$)



FIGURE 4.14: An SRAM memory block, including 16 8*T*-SRAM cells per bitline and their connections through two vertical and two horizontal controlling lines.



FIGURE 4.15: Supply voltage controlling mechanism for array.



FIGURE 4.16: A column of 16 8T-SRAM cells.

charge to $VDD - V_{ds}$. Because these transistors operate in the sub-threshold region, their drain-source voltage is small, on the order of microvolts. As a result, the bitline pairs charge to approximately VDD.

The write driver, in turn, drives the bitlines to the data value to be written, while the transmission gates are devices that effectively pass a logic zero, but not a logic one, from the write driver to the bitlines. Therefore,



FIGURE 4.17: Write peripheral assist circuitry.

during pre-charge, the deactivated COL - W line does not allow the transmission gates to pass data from the write driver to the bitlines.

However, the COL - W line is activated for the write operation, and the transmission gates pass data to the bitlines. An important concern with this write driver approach is that the drivers of unselected columns leak as much current as those of selected columns, which increases the power consumption. To address this issue, this study applies virtual supply rails, VDD - S and VSS - S, to the write driver to decrease the rail-to-rail voltage of the *NOT* gates in sleep mode (unselected columns).

The write driver is required to transfer data from the data line to the bitline pairs only during the write operation. At all other times, the transmission gate switches do not allow the data to be transferred to the bitline pairs, so the write driver consumes power without performing any useful operation. Thus, to reduce wasted power, the write driver is pushed into sleep mode when no write operation is expected.

This proposal can be implemented with a drowsy mechanism (see Fig. 4.18), similar to the cells' power line configuration. In this implementation, during the write operation ($\overline{col - W} = 0$), the PMOS turns on and drives VDD to VDD - Si. In the other states (read and sleep), however, unlike a regular write driver, its power line is grounded through the NMOS. This cuts the power consumption of the circuit in the read and sleep modes and results in a significant power reduction for the whole chip. The write driver symbol is shown in Fig. 4.19.



FIGURE 4.18: Proposed supply voltage controlling mechanism for write driver.

The transient response of the memory block, which includes the designed write assist circuitry, is illustrated in Fig. 4.20. Before a read or write starts, all bitlines are precharged to VDD (300 mV). Then, during a write, one line of the selected bitline pair is discharged, depending on the data to be written to (or read from) the cell.

Whereas one line of the selected bitline pair discharges completely, one line of each unselected bitline pair discharges only partially, as shown in Fig. 4.21.



FIGURE 4.19: The write driver symbol.



FIGURE 4.20: The transient response of the memory block.

During the write operation, the high WWL activates all the access transistors in the same row. Thus, the unselected bitlines find a path to ground and leak to some extent. The amount of leakage depends directly on the rail-to-rail voltage across the 4T-SRAM cell: the higher the rail-to-rail voltage, the higher the bitline leakage current. Although preventing unselected-bitline leakage is important in memory design, this rail-to-rail voltage must not fall below the DRV; otherwise, data would be destroyed. Consequently, there is a trade-off between bitline leakage and data



FIGURE 4.21: Partial discharge of the unselected bitline pairs.

preservation, and some leakage in the unselected bitline pairs is unavoidable.

## 4.7.3 Read Peripheral

The read assist circuit comprises a PMOS and two back-to-back NOT gates (see Fig. 4.22), which respectively precharge the RBL and read out the stored data.



FIGURE 4.22: Read peripheral circuit.

To decrease the leakage current, and consequently the power consumption of the whole memory, the same technique that was applied to the write assist circuit is applied to the read assist circuit. As a result, the power line (VDD - R) of the NOT gates is set by a controlling line $(\overline{col - R})$, as illustrated in Fig. 4.23.




FIGURE 4.23: Proposed Read peripheral circuit.

In this proposal, the sense amplifier (SA) leaks power only during the read operation. During a read $(\overline{col - R} = 0)$, the PMOS drives VDD to VDD - R, which in turn supplies the SA. In write and sleep modes $(\overline{col - R} = 1)$, the NMOS connects the SA's power line to ground, which cuts off the SA. The main advantage of this configuration is that the SA consumes no power in the write and sleep modes. The read assist symbol is shown in Fig. 4.24.
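The power gating of the two peripheral blocks can be summarized per operating mode. The sketch below is a behavioral illustration only: the `Mode` enumeration, the function name, and the dictionary keys are hypothetical, the switches are assumed ideal, and VDD = 0.3 V is taken from the precharge level quoted earlier in this chapter.

```python
from enum import Enum


class Mode(Enum):
    READ = "read"
    WRITE = "write"
    SLEEP = "sleep"


def peripheral_rails(mode: Mode, vdd: float) -> dict:
    """Return the active-low column controls and the gated supply rails.

    The write driver rail (VDD-S) is powered only during a write, and the
    read peripheral rail (VDD-R) only during a read; otherwise the NMOS
    ties the rail to ground.
    """
    col_w_bar = 0 if mode is Mode.WRITE else 1
    col_r_bar = 0 if mode is Mode.READ else 1
    return {
        "col_W_bar": col_w_bar,
        "col_R_bar": col_r_bar,
        "VDD_S": vdd if col_w_bar == 0 else 0.0,
        "VDD_R": vdd if col_r_bar == 0 else 0.0,
    }


for mode in Mode:
    print(mode.value, peripheral_rails(mode, vdd=0.3))
```

In this model at most one of the two rails is powered at any time, and both are grounded in sleep mode, which is the source of the power saving claimed for the periphery.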

Fig. 4.25 depicts the transient response of the memory block, which consists of the designed read and write assist circuitry.



FIGURE 4.24: Read peripheral circuit symbol.



FIGURE 4.25: The transient response of the memory block.

## 4.7.4 Reverse-Body Biasing

In the 65nm process under consideration, the cell leakage is high due to the fast mobility of the electron-hole pairs and the small feature size of the technology. The array therefore needs a mechanism to change the effective $V_{th}$ of its transistors. To achieve this goal, reverse-body biasing can be applied, which increases the transistors' threshold voltage and therefore reduces their sub-threshold leakage. A lower bias value can be used during normal operation to ensure proper operation, and higher values can be used during sleep modes.

However, as technology shrinks, the body effect alone, with reasonable source–body bias values, cannot lower the cell leakage as much as desired. Another method that can be used to reduce the transistors' leakage is to increase their length above the minimum value.

The simulation results in Fig. 4.26 illustrate that when the transistor length increases, the transistor leakage can be reduced if the width is adapted to keep the same driving capability. To achieve the lowest possible leakage current, 500nm/60nm and 450nm/100nm were selected as the cell ratios of the pull-up PMOS and pull-down NMOS in the SRAM cells.


FIGURE 4.26: Current in terms of PMOS and NMOS length.

Based on the obtained cell ratios of the pull-up and pull-down transistors, the leakage current of an SRAM block was studied in conjunction with body biasing of the NMOS transistors. Fig. 4.27 illustrates the total leakage current of an SRAM block, including 16 SRAM cells and the periphery circuits. When the NMOS transistors' reverse body bias is increased from 0 to 1.2V, the leakage current of the entire block is reduced by almost 25nA.



FIGURE 4.27: Current in terms of NMOS body bias.

However, as the reverse-body bias increases further, so does $V_{th}$. The main issue with increasing the reverse-body bias is that the growing $V_{th}$ lowers the number of electron-hole pairs that pass over the barrier. This leads to a significant reduction in both leakage current and speed. As a result, there is a trade-off between leakage current and speed.

Fig. 4.28 confirms this trade-off: as the magnitude of BN increases, data are written more slowly. Moreover, for BN=-1.2V, the write operation is significantly degraded. Thus, lowering the power is constrained by both write speed and leakage power.



FIGURE 4.28: Write operation over different NMOS body bias.

To achieve both a proper write operation and a low leakage current, a voltage in the range of -0.8 to -0.4V can be applied to the body of the NMOS transistors. The lower-magnitude values can be applied for read/write operations, while the higher-magnitude values are a suitable choice for the idle mode of operation.

The read and write operations at BN=-0.6V are demonstrated in Fig. 4.29. When logic 1 is written, $\overline{BL}$ discharges to almost 0V, which in turn discharges $\overline{Q}$. The opposite happens when logic 0 is written. When logic 1 is read, the RBL stays at VDD, but it discharges to almost 0V when logic 0 is read.

The read operation is clearly slower than the write operation, because the Buffer-Foot switches to ground during a read. This in turn pushes the transistors attached to the RBL into the on-state



FIGURE 4.29: Read and write operation.

if the stored data is 0 in the relevant SRAM cell. Consequently, this procedure decreases the speed of read operation.

A power-efficient SRAM-array building block therefore requires a negative voltage in the range of -0.8 to -0.4V. Generating these negative values is a bottleneck in low-power analog IC design. As a result, to complete this proposal, a negative voltage generator is required to feed the body terminals of the NOT-NMOSs.

In this study, two novel negative voltage generators have been proposed (refer to case one (Fig. 3.17) and case two (Fig. 3.30)). Fig. 4.30 compares the negative voltages generated in the two cases. Both proposals successfully provide negative voltages in the range of -0.8 to -0.4V.

Table 4.1 compares the performance of the two cases in terms of static power and delay. Case one consumes less power than case two. However, the operation of the case one circuitry depends on a voltage reference that consumes an extra 68nW of static power (refer to Section 3.7). Case two, on the other hand, operates faster than case one, with a delay of 2.5ns compared with 4ns.



FIGURE 4.30: Comparison of the output voltage of the two proposed negative voltage generators.

Thus, based on the criteria of the target SRAM (power and speed), one of these proposals can be selected.

TABLE 4.1: Performance comparison of two cases

| Case | Technology (nm) | Operating Freq. (MHz) | Static Power @0.3V (nW) | Delay @0.3V (ns) |
|------|-----------------|-----------------------|-------------------------|------------------|
| One  | 65              | 2.5                   | 0.772                   | 4                |
| Two  | 65              | 2.5                   | 12                      | 2.5              |
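The selection between the two generators can be made explicit with a few lines of arithmetic. The sketch below uses only the figures quoted in Table 4.1 and the 68nW reference power mentioned in the text; whether the reference is charged entirely to a single generator or shared across several blocks is not stated here, so the `include_reference` switch is an assumption introduced for illustration, as are the function and key names.

```python
# Figures from Table 4.1; reference_nw is the extra static power of the
# voltage reference that case one depends on.
generators = {
    "case one": {"static_nw": 0.772, "delay_ns": 4.0, "reference_nw": 68.0},
    "case two": {"static_nw": 12.0, "delay_ns": 2.5, "reference_nw": 0.0},
}


def total_static_nw(gen: dict, include_reference: bool) -> float:
    """Static power of a generator, optionally adding its reference."""
    extra = gen["reference_nw"] if include_reference else 0.0
    return gen["static_nw"] + extra


def lowest_power(include_reference: bool) -> str:
    """Name of the generator with the lowest static power."""
    return min(
        generators,
        key=lambda name: total_static_nw(generators[name], include_reference),
    )


def fastest() -> str:
    """Name of the generator with the shortest delay."""
    return min(generators, key=lambda name: generators[name]["delay_ns"])


print("lowest power, generator only:", lowest_power(include_reference=False))
print("lowest power, with reference:", lowest_power(include_reference=True))
print("fastest:", fastest())
```

Under these numbers, case one is the lower-power option only when the reference power is excluded or amortized elsewhere, so the accounting of the reference matters for the choice.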

## 4.8 Conclusion

In SRAM memories, to reduce power consumption, the array power line is designed so that it applies a high voltage to the cells in read/write mode and a low voltage in sleep mode. In this study, the same methodology is applied to the read/write periphery circuits to lower the power consumption further. The novelty of this approach is that, unlike the power line of the array, which requires two separate VDDs, it needs only one VDD.

Furthermore, it was explained that, to reduce the power consumption, the body terminals of the NMOSs located in the NOT portion of the SRAM cells require negative voltages to control $V_{th}$ and, in turn, the power consumption. To generate these voltages, two negative level shifters have been proposed, each of which is suitable for either low-power or high-speed requirements.

# Chapter 5

# **Physical Design**

## 5.1 Fabrication

CMOS integrated circuits are fabricated on thin circular slices of silicon called wafers. Each wafer contains several individual chips, or "die" (see Figure 5.1), which are usually identical for production purposes [117].

The designed ICs can be fabricated through MOSIS on what is called a multiproject wafer. This wafer comprises chip designs of varying sizes. MOSIS combines multiple chips on a wafer in order to split the fabrication cost among several designs and keep the cost low [118].

CMOS ICs are constructed using an extremely complicated process that finally leads to tiny transistors and wires being constructed and connected on a silicon substrate. Layout design is the art of drawing these transistors and wires as they appear in silicon; thus, the layout can be regarded as the circuit's physical representation [119].



FIGURE 5.1: CMOS integrated circuits are fabricated on and in a silicon wafer [117].

During layout, designers need to deal with special design requirements such as layout symmetry, latch-up protection, and noise immunity, which are explained in detail in the following sections.

## 5.2 Symmetry

In measuring the timing characteristics of a circuit, for example, it is desired to meet an absolute performance target.

To match the performance characteristics of CMOS devices, it is desired that two layout designs be implemented identically. The following are examples in which symmetry is routinely used [119]:

• Differential amplifiers need operational matching between halves of the cell layout.

• Data path and memory array circuits need symmetric layout for each row and column.

To ensure that two circuits operate identically, the same layout cell must be used in both cases. Signal symmetry is obtained when two signals have the same length, width, load, and coupling environment.

## 5.3 Latchup

Latchup is a failure mode in CMOS circuits that results in either soft failures or, in extreme cases, a destructive hard failure and permanent loss of the circuit. Device structures become more susceptible to both failure modes as isolation widths shrink, unless measures are taken to improve latchup robustness. Though reducing the CMOS power supply voltage is one of the methods that can address hard latchup failures, it cannot address soft ones [120].

Parasitic p-n-p-n paths inherently exist in *CMOS* chips. In both the development and layout stages, designers must be attentive to the related short-circuit failure, which is known as latchup. If unpredictable conduction through the parasitic p-n-p-n structure is triggered, this failure mechanism can lead to a significant abnormal current. This condition can occur when the voltage or current fluctuates at the input/output (I/O) metal bonding pads (PADs). Damage or reliability problems will occur when the abnormal current exceeds the limit that the metal lines or contacts can sustain or that the parasitic p-n-p-n structure can tolerate [121].

Latchup formation is depicted in Fig. 5.2. $Q_1$ is a double-emitter pnp transistor. Its base terminal is formed by the *NWELL* of the *PMOS*. In this context, the source and drain of the *PMOS* form the two emitters, while the collector is formed by the *NMOS* substrate. These parasitic transistors form a positive feedback loop that is equivalent to an *SCR* [121].



FIGURE 5.2: Latch formation in CMOS [122].

The guard ring is a traditional solution for latchup prevention, as demonstrated in Fig. 5.3. When an external current source acts as a trigger, a substrate current flows either within or outside of the internal circuits. This current flow corresponds to the positive/negative pulse applied at the I/O PAD. In this context, the substrate current is the primary cause of latchup. To absorb or release some of the latchup trigger current while avoiding latchup in the internal circuits, a guard ring can be placed in one of two ways: it can surround either the I/O buffer or the electrostatic discharge (ESD) protection transistors [121].



FIGURE 5.3: Latch formation in CMOS [121].

## 5.4 Guard Ring

Guard structures have been used for many years to decouple parasitic bipolars from each other. They are critical for the integration of digital and radio frequency (RF) applications, where semiconductor devices need to be electrically isolated from adjacent circuitry, noise, and *CMOS* latchup. Furthermore, guard rings provide both electrical and spatial isolation between adjacent circuit elements and decrease the risk of inter-circuit interactions [123]. Where interaction is not desirable, guard rings are placed between the semiconductor devices in a circuit. The guard rings between the *PMOS* and the *NMOS* in an inverter circuit are a good example of preventing *CMOS* latchup between two transistors.

According to [123], there are three key design-related issues with respect to the use of guard ring structures:

• Guard rings require a certain amount of the chip area, which is regarded as wasted area because this space is not used for the chip function. Consequently, circuit designers prefer to minimize this area.

• The guard ring efficiency information within the guard ring model is not included in some semiconductor computer aided design (CAD) methodologies. As a result, they cannot check and verify the existence of the guard ring, and it appears as a shape that is not discernible or recognizable from other shapes.

• Some mixed signal and *RF CAD* methodologies contain substrate rings and guard rings in the primitive device design and do not allow alteration, deletion, or modification. This makes it difficult for the multiple elements to integrate in a dense and compact nature and thus does not efficiently utilize the chip area.

## 5.4.1 Guard Rings for Internal and External Latchup Phenomena

Guard rings isolate the adjacent circuit elements to avoid the interaction between devices and circuits that may undergo CMOS latchup.

Within this context, there are two options: either the minority carriers must be prevented from leaving the region of the circuit, so that they do not influence the surrounding circuitry, or they must be prevented from entering, and thereby unduly influencing, sensitive circuits.

Hence, in the discussion of internal latchup, the objective is to present an electrical isolation between the pnp and the npn structure. In this case, the guard ring minimizes the electrical coupling, thus preventing regenerative feedback between the pnp and the npn [118].

Guard rings can be grouped into two classes, minority and majority carrier guards, which differ in their decoupling action.

Minority carrier guards are used to collect injected minority carriers before they cause a problem; otherwise, the carriers injected into the substrate would be collected by a reverse-biased substrate junction and flow into the well as majority carriers. The resulting voltage drop, if large enough, would turn on the parasitic bipolar in the well [118]. Fig. 5.4 shows the basic concept of the minority carrier guard ring.



FIGURE 5.4: Minority carrier guard in substrate. (a)  $N^+$  diffusion guard. (b) NWELL guard. (c) NWELL in epi - CMOS [118].

The use of an $N^+$ diffusion guard ring between a parasitic emitter and the *NWELL* is shown in Fig. 5.4(a). In this structure, the $N^+$ diffusion guard collects the electrons flowing down to a depth roughly equal to the extent of its depletion region. The remaining electrons that reach the *NWELL* generate an ohmic drop that can forward bias the parasitic $P^+$ emitter. A deeper *NWELL* guard, shown in Fig. 5.4(b), is more efficient; however, it requires more area than the $N^+$ diffusion. On epi - CMOS, the *NWELL* guard ring virtually eliminates electron current flow to the guarded well (see Fig. 5.4(c)). In addition, minority carrier guards are substantially more effective in epi - CMOS than in regular bulk CMOS because the built-in electric field resulting from the substrate out-diffusion profile prevents minority carriers from diffusing under the guard ring [118].

Majority carrier guards, in contrast, are used for I/O circuits. In this type of guard ring, the N-channel I/O devices are surrounded by a grounded $P^+$ diffusion in the *PWELL*, while the P-channel I/O devices are surrounded by an $N^+$ diffusion in the substrate. In the latter case, the $N^+$ guard ring is connected to the power supply [118].

The majority carrier guard ring in the well is illustrated in Fig. 5.5. The formation of the ohmic contact in the well by the $N^+$ diffusion is shown in Fig. 5.5(a). In this type of guard ring, the ohmic contact reduces the well resistance, $R_w$, for all parasitic $P^+$ emitters. Some current, however, does flow laterally under the parasitic emitter.

To steer current away from the parasitic emitter, the $N^+$ diffusion in Fig. 5.5(b) is used, which reduces the ohmic drop by further decreasing the collector resistance of the LNPN. This steering action can be significantly improved in epi - CMOS (Fig. 5.5(c)).

#### 5.4.2 Double Guard Ring

In the double guard-ring structure, the $N^+$ guard ring collects a portion of the injected electrons, while the $P^+$ guard ring collects some of the holes. This allows the electrons and holes that flow to the *NWELL* and *PWELL*, respectively, to be reduced simultaneously (see Fig. 5.6) [124].

Active devices require protection from both latchup and noise interference, which is typically achieved with guard rings. For example, $P^+$ guard rings, which are connected to VSS, can protect active NMOS devices, while $N^+$ guard rings, which are connected to VDD, can protect active PMOS devices.



FIGURE 5.5: Majority carrier guard in well. (a)  $N^+$  diffusion guard to reduce NWELL sheet resistance. (b)  $N^+$  diffusion guard to steer current away from  $V_{PNP}$  emitter. (c)  $N^+$  diffusion guard steering on epi - CMOS [118].



FIGURE 5.6: Layout example of a guard ring to shield circuits from substrate noise, latch-up or other physical phenomena [125].

There are two options with respect to guard rings: a single guard ring, which employs either a $P^+$ or an $N^+$ guard ring, and a double guard ring, which combines the two. The mobility of a device can be significantly influenced by both the diffusion width of the guard ring and the implant type, which can be $P^+$, $N^+$, or a combination of the two [126].

# 5.5 Effect of Body Biasing on the Physical Design

Substrate biasing was originally employed to reduce sub-threshold leakage in standby mode, but its application has been extended in recent years. In memory chips, reverse body biasing has been used extensively to reduce the risk of latchup and data destruction. It has also been used in logic chips to reduce power consumption [127]. However, body biasing affects the transistors' physical implementation. In this section, the body biasing effect and its impact on the physical implementation of memories are discussed.

The body effect of an NMOS transistor is given by the following equation:

$$V_{th} = V_{th0} + \gamma \left[\sqrt{2\phi_b - V_{BS}} - \sqrt{2\phi_b}\right] \tag{5.1}$$

where  $\gamma$  and  $\phi_b$  are

$$\gamma = \frac{t_{ox}}{\varepsilon_{ox}} \sqrt{2\varepsilon_{si}qN_A} \tag{5.2}$$

$$\phi_b = \frac{KT}{q} \ln\left(\frac{N_A}{N_i}\right) \tag{5.3}$$



FIGURE 5.7: Variation of depletion region charge with bulk voltage [128].

in which $V_{BS}$ is the substrate (body-to-source) potential, $V_{th0}$ is $V_{th}$ for $V_{BS} = 0$V, $t_{ox}$ is the gate oxide thickness, $\varepsilon_{ox}$ is the permittivity of the silicon dioxide, $\varepsilon_{si}$ is the permittivity of silicon, $N_A$ is the doping concentration of the substrate, $N_i$ is the carrier concentration in intrinsic silicon, K is Boltzmann's constant, q is the electronic charge, and T is the absolute temperature. The constant $\gamma$ characterizes the body effect and is called the body effect coefficient.
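Equations (5.1) to (5.3) can be evaluated numerically to see how a reverse body bias shifts the threshold voltage. The following is a minimal sketch: the doping level, oxide thickness, intrinsic carrier concentration, and $V_{th0}$ used below are hypothetical textbook-style values chosen for illustration, not parameters of the TSMC 65nm process, and the function names are introduced here.

```python
import math

Q = 1.602176634e-19      # electronic charge (C)
K_B = 1.380649e-23       # Boltzmann's constant (J/K)
EPS0 = 8.8541878128e-12  # vacuum permittivity (F/m)
EPS_OX = 3.9 * EPS0      # permittivity of silicon dioxide
EPS_SI = 11.7 * EPS0     # permittivity of silicon


def fermi_potential(n_a: float, n_i: float, temp_k: float = 300.0) -> float:
    """phi_b from Eq. (5.3); n_a and n_i must use the same units."""
    return (K_B * temp_k / Q) * math.log(n_a / n_i)


def body_effect_coeff(t_ox: float, n_a: float) -> float:
    """gamma from Eq. (5.2), with t_ox in m and n_a in m^-3."""
    return (t_ox / EPS_OX) * math.sqrt(2.0 * EPS_SI * Q * n_a)


def threshold_voltage(v_bs: float, v_th0: float, gamma: float,
                      phi_b: float) -> float:
    """V_th from Eq. (5.1); v_bs is negative for reverse body bias."""
    arg = 2.0 * phi_b - v_bs
    if arg < 0.0:
        raise ValueError("V_BS is too positive for Eq. (5.1) to apply")
    return v_th0 + gamma * (math.sqrt(arg) - math.sqrt(2.0 * phi_b))


# Hypothetical parameters (assumptions, not process data).
N_A = 1e23     # substrate doping, m^-3 (1e17 cm^-3)
N_I = 1.5e16   # intrinsic carrier concentration, m^-3
T_OX = 2e-9    # gate oxide thickness, m
V_TH0 = 0.4    # zero-bias threshold voltage, V

phi_b = fermi_potential(N_A, N_I)
gamma = body_effect_coeff(T_OX, N_A)
for v_bs in (0.0, -0.4, -0.6, -0.8, -1.2):
    v_th = threshold_voltage(v_bs, V_TH0, gamma, phi_b)
    print(f"V_BS = {v_bs:+.1f} V  ->  V_th = {v_th:.4f} V")
```

The loop covers the bias values discussed in Chapter 4, so the trend of a rising $V_{th}$ with a more negative $V_{BS}$ can be checked directly against the formula.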

The substrate connection attracts more holes as $V_B$ becomes more negative. This leaves a larger negative charge behind, which results in a wider depletion region (see Fig. 5.7).

The threshold voltage is a function of the total charge in the depletion region, since the gate charge must mirror $Q_{dep}$ before an inversion layer is formed. Thus, as $V_B$ decreases and $Q_{dep}$ grows, $V_{th}$ also increases [128]. This lowers the electron concentration in the inversion layer, which reduces the current and, in turn, the power consumption.

$$V_{th} = \Phi_{MS} + 2\Phi_F + \frac{Q_{dep}}{C_{ox}} \tag{5.4}$$
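As a worked step, Eq. (5.4) can be connected to Eq. (5.1). The derivation below is a sketch under three stated assumptions: $\Phi_F$ in Eq. (5.4) is the same quantity as $\phi_b$ in Eq. (5.3), $\Phi_{MS}$ does not depend on $V_{BS}$, and $Q_{dep}$ denotes the magnitude of the depletion charge per unit area in the standard depletion approximation.

$$Q_{dep} = \sqrt{2\varepsilon_{si} q N_A \left(2\Phi_F - V_{BS}\right)}, \qquad C_{ox} = \frac{\varepsilon_{ox}}{t_{ox}}$$

Subtracting Eq. (5.4) evaluated at $V_{BS} = 0$ from the same expression at an arbitrary $V_{BS}$ cancels the $\Phi_{MS} + 2\Phi_F$ terms:

$$V_{th} - V_{th0} = \frac{Q_{dep}(V_{BS}) - Q_{dep}(0)}{C_{ox}} = \frac{t_{ox}}{\varepsilon_{ox}}\sqrt{2\varepsilon_{si} q N_A}\left[\sqrt{2\Phi_F - V_{BS}} - \sqrt{2\Phi_F}\right]$$

The prefactor is the body effect coefficient $\gamma$ of Eq. (5.2), so the result reproduces Eq. (5.1) term by term.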

Therefore, the NMOS transistors should be reverse-biased to control the leakage current. PMOS devices can be surrounded by a separate NWELL to obtain an isolated region with a different body contact voltage. For an NMOS, however, the problem cannot be solved so straightforwardly, because NMOSs are placed in the P-substrate and are not surrounded by a well. NMOSs with different body biases must be placed in separate substrate regions, each with its own contact voltage. As the TSMC 65nm process provides only one substrate, an approach is needed to partition the substrate into sub-substrates with different body contact voltages. The guard ring is a structure that makes it possible to divide the P-substrate into regions. Although the main purpose of a guard ring is to prevent noise and latchup, it is also useful for this purpose.

In this technique, the NMOS device is surrounded by a $P^+$ guard ring, which creates a low-ohmic connection between the PWELL and the ground (VSS) net. This $P^+$ guard ring is then surrounded by an $N^+$ guard ring, which connects a surrounding NWELL to the supply (VDD) [112].

Fig. 5.8 illustrates the layout of the SRAM cell. Two *NMOS*s in the SRAM cell are placed in a guard ring so that a negative voltage can be applied to their body, as explained in Section 5.4.2.

TABLE 5.1: Parameters comparison for an 8T SRAM cell with and without guard ring

|                        | Without guard ring | With guard ring (under $-500mV$ body bias of NMOSs) |
|------------------------|--------------------|-----------------------------------------------------|
| Leakage current $(nA)$ | 5.63               | 2.43                                                |

The write driver layout is shown in Fig. 5.9, and the read assist layout is shown in Fig. 5.10. In these designs, wide transistors are implemented as several MOSs in parallel; furthermore, all of the split transistors are laid out symmetrically.



FIGURE 5.8: Single-ended 8T SRAM cell Layout.

## 5.6 Physical Design of the Proposed Level Shifters

#### 5.6.1 Proposed Level Shifter in Fig. 3.8

Fig. 5.11 shows the layout view of the level shifter of Fig. 3.8, implemented in the TSMC 180nm CMOS technology. In this implementation, the bodies of transistors $M_{p1}$-$M_{p4}$ are placed in a common high-voltage N-well tied to VDDH, whereas $M_{p5}$ is placed in a 1.8V N-well. Furthermore, transistor $M_{p6}$ resides in a low-voltage N-well tied to $B_n$. To implement the long-length MOSs in the voltage divider, four $0.5\mu m$-length transistors are connected in series. In this technology, the physical design of the proposed level shifter, including the voltage divider, occupies a silicon area of $99.875\mu m^2$ ($11.75\mu m \times 8.5\mu m$).

#### 5.6.2 Proposed Level Shifter in Fig. 3.17

The physical design view of the proposed level shifter in Fig. 3.17(b) is shown in Fig. 5.12. According to Table 3.5, $M_1$-$M_7$ are long-length transistors. To implement these long lengths, a series-transistor technique has been applied.

## 5.7 Conclusion

In this chapter, the guard ring design and its effect on the leakage current of the transistors have been discussed. Furthermore, the physical designs of the required blocks of the SRAM memory and of the level shifters have been presented.



FIGURE 5.9: The write driver layout.



FIGURE 5.10: Read peripheral circuit layout.



FIGURE 5.11: Physical Design of the level shifter in Fig. 3.8.



FIGURE 5.12: Physical Design of the proposed level shifter in Fig. 3.17(b).

# Chapter 6

# Application: Physical Unclonable Function

Physical unclonable functions are building blocks for integrated circuits whose applications involve authentication and secret key generation. In this chapter, a novel mixed-signal physical unclonable function (PUF) is presented, which is based on SRAM and current mirror circuits. A novel SRAM is designed that is triggered by a negative level shifter. The initial power-on state of the SRAM is used to create randomness and is then cascaded with a current mirror. The model is designed based on the concept of P-cells for semi-automation. Cells were designed using Cadence Virtuoso IC 5.1.04 at 65nm (TSMC foundry) using the layout tools, and all major effects, including antenna, IR, EM, and crosstalk effects, were considered in the custom layout. For higher bit stability, current mirrors were introduced, which are then cascaded with a buffer circuit at the end. The results show that the proposed design is better than the state-of-the-art design in terms of energy consumption, which is estimated to be 0.21 fJ/bit and 0.0409 fJ at 1 GHz. The uniqueness and reliability of the PUF are estimated to be 48% and 98.25%, respectively. This work also incorporates the creation of a novel negative level shifter for general-purpose applications.

## 6.1 Introduction

In today's technology, hardware security is as important as software security for any electronic device that takes part in communication. This is important with regard to both artificial intelligence (AI) and the Internet of Things (IoT). Recently, the physical unclonable function (PUF) has become a prevalent focus of industrial research. A PUF is a security block developed for generating volatile secret keys in cryptographic applications [129]-[131]. This block is also used in hardware primitives that rely on their physical characteristics to perform authentication, identification, counterfeit detection, and volatile cryptographic key generation [129]-[131]. Researchers speculate that PUFs with precise specifications can be a boon for upcoming technology and large-scale manufacturing.

A PUF is described as unclonable because its uniqueness originates from uncontrollable variations in the manufacturing process. PUFs grant a high level of protection in cryptographic applications that require strong volatile key storage. A PUF receives a challenge (input) and generates a unique and reliable response (output) in return. Since this response is unique to the device, it can be used as a device ID or key.

However, the main objective of these designs is to strengthen security for a variety of purposes, such as securing intellectual property, hardware privacy, and authentication. In an ideal PUF, random (irregular) responses are generated for N challenges (inputs). The key point is that the responses of every circuit should be distinctive. The hazard in electronic circuits are of verifiable purposes behind all the sorts of PUFs planned till now [132]. An electronic PUF is composed of physical data, which is very difficult to clone due to unique properties that derive from the random variations inherent to the CMOS fabrication process. An electronic PUF must satisfy the following requirements [133]-[135]:

1- Uniqueness: From a fabrication point of view, ICs are implemented under the same mask and manufacturing process conditions; however, the normal manufacturing variability causes the ICs to be slightly different. PUFs leverage this variability to extract the secret information that is unique to the chip. Thus, two different PUFs with identical designs are unique if they generate unique response signatures for the same set of challenges. In other words, uniqueness measures how well a PUF is differentiated from other PUFs based on its challenge-response pair. Different PUFs must generate different responses to a single challenge in order to separate one from another.

2- Reliability: The PUF must also be reliable and able to reproduce the unique secret key pattern when the operating conditions (temperature, voltage, and aging) vary.

3- Randomness: The PUF must also originate from a source of random physical parameters, such as CMOS technology process variations, making it very difficult to clone. As a result, a PUF is expected to generate zeroes and ones with equal probability. Randomness is a measurement that indicates the balance of zeroes and ones in the PUF responses.
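As a minimal illustration of the randomness requirement, the following Python sketch computes the percentage of ones in a response. The function name `randomness` and the sample bit pattern are hypothetical and are not taken from the proposed design.

```python
def randomness(bits):
    """Return the percentage of ones in a response (50% is the ideal balance)."""
    if not bits:
        raise ValueError("response must contain at least one bit")
    return 100.0 * sum(bits) / len(bits)


# Hypothetical 8-bit response with four ones and four zeroes.
response = [1, 0, 1, 1, 0, 0, 1, 0]
print(randomness(response))  # 50.0
```

A value far from 50% would indicate that the responses are biased toward one logic level.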

Substantial research effort has been put into PUFs, such as the arbiter PUF [136], ring oscillator PUF [137]-[138], DRAM PUF [139]-[140], and SRAM PUF. The arbiter PUF has two major challenges: (1) its routing procedure is uncontrollable, and (2) it is hard to implement in a fully symmetric way. The ring oscillator PUF can also be broken when power-supply variations lock its frequencies, which would allow attackers to find its location inside the chip.

There are different PUFs in the literature that used the retention time characteristics of dynamic storage arrays as a source of randomness [141]-[143]. These PUFs were based on 1-transistor 1-capacitor (1T-1C) DRAM technology, which requires special implementation-process steps and is not available in many digital technology offerings.

In [143], a dedicated voltage regulator was applied to modify the word-line voltage of a silicon-on-insulator (SoI) embedded DRAM array. This regulator controls the array leakage current that causes a target number of bit failures. In [141], a DRAM with large storage capacitances was employed to obtain a very long authentication period of over an hour. A limited write duty-cycle time was proposed in [142] to produce write failures. However, the main constraint of this proposal is the requirement of complex delay monitors to identify the suitable write duty-cycle for the bits to fail. Furthermore, the write duty-cycle is highly dependent on the transistors' threshold voltage, which changes substantially with temperature. Therefore, the robustness of this methodology is low.

SRAM is a critical block in modern FPGAs and systems-on-chip in any technology [144]-[145]. The SRAM PUF is very appealing because SRAM is commonly available in most systems and therefore requires no additional hardware. However, SRAM PUFs face two main challenges, namely sensitivity to noise and aging. The SRAM PUF is sensitive to noise generated by temperature and voltage variations [145]-[146]. This sensitivity is not unique to SRAMs and occurs in a variety of electronic devices due to physical phenomena [147]-[149].

Aging also affects the output reliability of the SRAM PUF [145]. Error-correcting codes (ECC) [145], [148–150] were introduced to repair errors in the SRAM output before it is used as a cryptographic key. However, this method creates a significant amount of overhead [149]. To address this issue, fuzzy extractors were introduced, which can generate information-theoretically secure cryptographic keys even if the helper data leaks information [151, 152]. As an alternative, exhaustive measurements of the SRAM PUF can be used to identify the most reliable cells [145, 146].

This study explores using the popular SRAM PUF for identification and cryptographic key generation. The SRAM PUF offers the convenience of using commonly available and integrated SRAM (instead of including dedicated hardware in the circuit), as well as the capacity to provide large enough outputs for identifier/key generation and storage [R6, R7].

#### 6.1.1 Our Contribution

This work proposes an improved low-power PUF design for ultra-low-power applications. The design has two major stages. The first stage is the creation of SRAM cells in which the NMOS body terminals are driven by a negative DC generator to lower the power consumption. These cells are then connected to each other in rows and columns. In the second stage, a current mirror is proposed that exploits process variations and mismatch to create randomness.

Overall, the main contribution of this research is the development of a PUF targeted at low-power applications. The process variation and mismatch of the PUF are also assessed by Monte Carlo analysis to determine the uniqueness and reliability of the circuit.

## 6.2 SRAM PUF Architecture

SRAM PUF is one of the most widely used methods to generate ID keys [153]-[154]. There are many investigations into SRAM PUF as a simple but effective method of hardware-based identification and key generation. The SRAM-based PUF is appealing because it is commonly available in most systems and thus does not require extra hardware.



FIGURE 6.1: SRAM cell in an array.

#### 6.2.1 Related Works

The uniqueness of SRAM PUFs is also the largest among existing PUFs [155]-[156]. For a standard 6T SRAM (see Fig. 6.1), every memory cell is composed of six transistors that form two cross-coupled inverters ($M_1/M_3$ and $M_2/M_4$) and two access transistors ($M_5$ and $M_6$). The inverters are symmetric, but random variations incurred during manufacturing result in random mismatches.

The mismatch causes each SRAM cell to be either biased or skewed toward '0' or '1' during power-up. CMOS devices have different physical parameters (e.g., doping-levels, transistor oxide thickness, etc.) due to uncontrollable variations within the manufacturing process. When an SRAM is powered-up, these variations affect the power-up state of their associated cells.

#### 6.2.2 Effect of process variations on the power-up state

To generate an SRAM-based PUF, the SRAM array bits are sampled by the system start-up procedure. When an SRAM array is powered up, each memory cell settles to either a logic '0' or '1' based on the local process variations (mismatch) between the transistors in that cell. A challenge is then applied to the SRAM as a memory address, and the response is the set of bit values stored at this address.

However, the SRAM start-up state is extremely unstable and depends on aging and environmental variations [157]-[158]. Moreover, the power-up state of an SRAM array depends on the data it previously held. This retained data requires a long period between two consecutive power-ups to ensure independent start-up behavior. These periods can reach several seconds of enrolment [159]-[160]. Various approaches have been proposed to avoid this problem, such as connecting two SRAM bit cells with complementary data signals and simultaneously enabling their word-lines.

The perfect symmetry of the SRAM cells maximizes noise margins during operation. As a result, each SRAM cell should ideally have a 50% chance of settling to either '0' or '1' at power-up. However, in practice, most MOS transistor pairs will not be perfectly matched because of random manufacturing process variations.

The most significant device parameter that affects the power-up state of an SRAM cell is the transistor threshold voltage ($V_{th}$). Threshold voltage differences in the p-type and n-type transistors can bias a cell either in the same direction or in opposite directions. The net imbalance determines the overall bias towards either '1' or '0' at power-up, and a larger net imbalance in the transistor pairs causes a more skewed SRAM cell. For relatively large net $V_{th}$ imbalances, the power-up state may be stable and always the same across multiple power-up cycles. However, if the net $V_{th}$ difference is small, the power-up values may still be partially random, with a bias towards either '0' or '1' [161].
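The behaviour described in this subsection can be sketched with a toy behavioural model in Python. This is only an illustration under stated assumptions, not the simulation used in this work: the class `SramPufModel`, the Gaussian mismatch and noise distributions, and the millivolt values are all hypothetical. Each cell receives one fixed net $V_{th}$ imbalance (fixed at "manufacture"), fresh noise is added at every power-up, and a challenge is simply a word address.

```python
import random


class SramPufModel:
    """Toy model: one fixed mismatch value per cell, fresh noise per power-up."""

    def __init__(self, num_words, word_bits, sigma_mismatch=30.0, seed=0):
        rng = random.Random(seed)  # the seed stands in for one manufactured chip
        self.word_bits = word_bits
        # Net threshold-voltage imbalance per cell in mV (assumed scale).
        self.mismatch = [rng.gauss(0.0, sigma_mismatch)
                         for _ in range(num_words * word_bits)]
        self.state = []

    def power_up(self, sigma_noise=5.0, noise_seed=None):
        """Settle every cell: '1' if imbalance plus noise is positive, else '0'."""
        rng = random.Random(noise_seed)
        self.state = [1 if m + rng.gauss(0.0, sigma_noise) > 0 else 0
                      for m in self.mismatch]

    def respond(self, address):
        """Challenge = word address, response = bits stored at that address."""
        start = address * self.word_bits
        return self.state[start:start + self.word_bits]


chip = SramPufModel(num_words=8, word_bits=16, seed=1)
chip.power_up(noise_seed=10)
first = chip.respond(3)
chip.power_up(noise_seed=11)
second = chip.respond(3)
# Cells with a small net imbalance are the ones that may flip between power-ups.
print(first, sum(a != b for a, b in zip(first, second)))
```

In this model, cells whose imbalance is large compared with the noise always settle to the same value, while cells with a small imbalance remain partially random, as described above.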

#### 6.2.3 Power consumption

Power is one of the main performance parameters in SRAM-based PUFs. One of the most widely used techniques to reduce the total power consumption of SRAM cells is dynamic voltage scaling, which dynamically scales the supply voltage and clock frequency [162]. However, this methodology is ineffective when the sub-threshold leakage current is significant due to shrinking of the voltage supply.

Another promising technique to reduce the power consumption of the memory cells is the utilization of a body biasing technique, which is able to exponentially reduce the leakage current by adjusting the threshold voltage  $(V_{th})$  of the transistors [163], [164].
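The exponential dependence can be made explicit with the standard first-order textbook expressions for the body effect and the sub-threshold current. They are given here only as an illustration: the symbols $V_{th0}$, $\gamma$, $\phi_F$, $V_{SB}$, $n$, and $V_T$ (the thermal voltage) are not defined elsewhere in this chapter, and the numerical values below are assumed rather than taken from the design.

$$V_{th} = V_{th0} + \gamma\left(\sqrt{2\phi_F + V_{SB}} - \sqrt{2\phi_F}\right), \qquad I_{sub} \propto \exp\left(\frac{V_{GS} - V_{th}}{n V_T}\right)$$

A reverse body bias ($V_{SB} > 0$ for an NMOS, that is, a negative body voltage with a grounded source) raises $V_{th}$ by some $\Delta V_{th}$ and scales the leakage by

$$\frac{I_{sub}'}{I_{sub}} = \exp\left(-\frac{\Delta V_{th}}{n V_T}\right)$$

For example, assuming $n = 1.5$ and $V_T \approx 26$ mV at room temperature, an increase of $\Delta V_{th} = 90$ mV gives $\exp(-90/39) \approx 0.1$, roughly a tenfold reduction in sub-threshold leakage.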

However, there are restrictions on the maximum threshold voltage: if it is increased significantly, the transistors can no longer stay in the sub-threshold region and the preserved data is destroyed. The threshold voltage must therefore be low enough to ensure proper read/write operation, yet high enough (through reverse body biasing of the transistors) to reduce the leakage current in the hold state.

One of the major challenges in this technique is the required negative voltage for n-type transistors to bias their body in the reverse direction.

In this study, a negative DC generator is proposed that significantly reduces the total current, and with it the power consumption and the energy per bit. Fig. 6.2 shows the current of the SRAM unit with and without the negative DC voltage applied to the NMOS bodies of the SRAM cells. With body biasing applied to the NMOSs through the proposed negative DC voltage, the current reduction grows as VDD increases from 0.8 V to 1.2 V.



FIGURE 6.2: Power consumption of SRAM PUF (single block) with and without negative DC generator.

## 6.3 Proposed Design

The proposed design is a combination of analog and digital circuits including SRAM blocks, current mirrors, and an AND gate (see Fig. 6.3). Fig. 6.3 shows N SRAM blocks that are cascaded with current mirrors in parallel. The output of current mirrors is finally applied to an AND gate to generate the final output. The SRAM design and operation, current mirror design, negative-DC-voltage generator, and randomness operation are discussed in the following sub-sections.

Each SRAM block is composed of six SRAM units (see Fig. 6.4), which are equipped with a novel negative level shifter to lower the leakage current and, in turn, the static power consumption. Each SRAM block is then cascaded with a current mirror circuit to generate the second phase of randomness, which is discussed in Section 6.3.4.

#### 6.3.1 SRAM design and Operation

The SRAM array design, emphasizing the column group building block that contains the sense/write circuitry, current mirroring, and the SRAM cell columns, is shown schematically in Fig. 6.4. Here BL, BLB, and WL represent the input challenges, while Q and $\overline{Q}$ represent the output lines, which are the inputs to the current mirrors. Each SRAM block has 5 challenge lines in rows and 18 challenge lines in columns. In total, each SRAM cell has 30 challenge lines and is then cascaded with a current mirror circuit. In this design, the negative DC generator is applied to the NMOS body terminals to lower the current and, in turn, reduce the power consumption.

FIGURE 6.3: General block diagram for SRAM PUF.

FIGURE 6.4: General block diagram for SRAM PUF (Single block).

In this design, Q and $\overline{Q}$ (see Fig. 6.1) are shorted along the common rail of the column to pull the output to the current mirrors. Moreover, the word lines of the SRAM cells in the same row are connected to each other to form a cluster block. Because of the initial randomness of the SRAM cells, each cell in the cluster behaves differently at the same power-up.

#### 6.3.2 Current Mirror Design

Fig. 6.5 shows the developed current mirror circuit. It has four inputs, driven by the SRAM cell outputs: two are configured as pull-down NMOS devices ($M_{n1}$ and $M_{n3}$) and two as pull-up PMOS devices ($M_{p2}$ and $M_{p4}$). The common node of $M_{n1}$ and $M_{p2}$ is connected to an inverter (the $M_{n6}/M_{p6}$ pair) to generate the output voltage. Two current mirrors ($M_{p1}/M_{p3}/M_{p5}$ and $M_{n2}/M_{n4}/M_{n5}$) at the top and bottom of the circuit ensure the required current in each branch. In this design, the randomness depends on the process variations of the circuit.

The transistor size ratios of this design are summarized in Table 6.1.



FIGURE 6.5: Proposed Current mirror.

| Transistor        | W/L ratio $(\mu m/\mu m)$ |
|-------------------|-----------------------|
| $M_{p1} - M_{p2}$ | 1.5/0.18              |
| $M_{p3} - M_{p4}$ | 2/0.18                |
| $M_{p5} - M_{p6}$ | 0.5/0.18              |
| $M_{n1} - M_{n2}$ | 1.5/0.18              |
| $M_{n3} - M_{n4}$ | 2/0.18                |
| $M_{n5} - M_{n6}$ | 0.5/0.18              |

TABLE 6.1: Transistors' size ratio of the proposed current mirror
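To relate the sizes in Table 6.1 to branch currents, the sketch below applies the usual first-order rule that an ideal current mirror copies current in proportion to the W/L ratio of its devices. It is an illustration only: treating $M_{p1}$ as the reference device of the top mirror is an assumption, since the reference branch is not stated in the text, and the function names are hypothetical.

```python
# Width/length in micrometres, copied from Table 6.1.
SIZES_UM = {
    "Mp1": (1.5, 0.18),
    "Mp3": (2.0, 0.18),
    "Mp5": (0.5, 0.18),
}


def aspect_ratio(name):
    """Return W/L for the named transistor."""
    width, length = SIZES_UM[name]
    return width / length


def ideal_copy_ratio(reference, output):
    """First-order mirror gain: output W/L divided by reference W/L."""
    return aspect_ratio(output) / aspect_ratio(reference)


# Assumption: Mp1 acts as the reference device of the top mirror.
for device in ("Mp3", "Mp5"):
    print(device, round(ideal_copy_ratio("Mp1", device), 3))
```

Under this assumption the nominal ratios are about 1.33 and 0.33. In silicon, process variation and mismatch shift the actual branch currents away from these nominal values, which is the effect the design relies on for randomness.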



FIGURE 6.6: Negative-DC-voltage generator applied to the NMOSs' of the SRAM units.

#### 6.3.3 Negative-DC-Voltage Generator

A negative-DC-voltage generator is proposed (see Fig. 6.6) to control the body biasing of the NMOSs in the SRAM units. This circuitry is composed of three sub-circuits: a voltage reference, a trimming circuit, and a negative level shifter.

The proposed voltage reference is based on the Resistorless Beta Multiplier circuit. In this circuit, $M_{n1} - M_{n4}$ and $M_{p1} - M_{p10}$ constitute the Resistorless Beta Multiplier, and $M_{n5} - M_{n7}$ form the cascode stage that is employed to improve the *PSRR* performance. To generate a constant voltage, $M_{n8} - M_{n9}$ are employed as the load.

To calibrate $V_{ref}$ against process variations, a digitally controlled, binary-weighted aspect-ratio adjustment (trimming circuit) is applied to the $V_{ref}$ node.

The negative level shifter is composed of a current source, $M_{n11}$-$M_{n12}$ and $M_{p19}$-$M_{p22}$, and cascading transistors, $M_{13}$-$M_{14}$. Transistors $M_{n15}$-$M_{n19}$ are employed to compensate for the temperature dependency of the output. $V_{ref}$ is supplied to this circuit by the voltage reference circuit.

The negative voltage generated by the negative level shifter is then applied to the bodies of the NMOSs in the SRAM unit. The physical design of the proposed negative-DC-voltage generator is shown in Fig. 6.7. The occupied area is $535.5\mu m^2$ ($31.5\mu m \times 17\mu m$).



FIGURE 6.7: Negative-DC-voltage generator physical design.

#### 6.3.4 Randomness Operation

The randomness of the PUF in this work is generated in two phases, which are discussed below.

#### 6.3.4.1 Phase One

The SRAM design uses its power-up randomness to create confusion in the circuit; the cells are formed into a cluster arranged in rows and columns. The SRAM cell is a generic 6T model equipped with a novel negative-DC generator to reduce power consumption. The tests show that the power consumption drops considerably when the proposed negative-DC generator feeds the bodies of the NMOSs in the SRAM units, which makes the design well suited to low-power applications.

#### 6.3.4.2 Phase Two

The output of the SRAM is fed to the current mirror, which has both PMOS and NMOS switches. The W/L ratios are set such that when a signal is applied to two transistors of the same width and length, a slight mismatch appears at the output; that is, one transistor can be slightly faster than the other. This variation is used to generate the randomness in this phase. Under normal circumstances, the switching rate is high for a normal PMOS and NMOS design.

## 6.4 Performance Evaluation

In this section, we discuss the security parameters of PUF models, namely reliability and uniqueness. The intra- and inter-Hamming distances are also studied. Cadence Virtuoso IC 5.1 with a 65 nm process is used for all analyses.

#### 6.4.1 Reliability

This PUF parameter shows the ability of the device to regenerate the same results even under different conditions. The ideal reliability of a PUF is 100%. Electromigration and temperature are among the effects that can influence the performance of the device.

$$Reliability = \left(1 - \frac{2}{m \times (m-1)} \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \frac{HD(r_i, r_j)}{a}\right) \times 100\%$$
(6.1)

where *m* is the number of response samples being compared (the sums run over all pairs of samples), $HD(r_i, r_j)$ is the Hamming distance between two response samples $r_i$ and $r_j$, and *a* is the number of bits in each response.
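A minimal Python sketch of Eq. 6.1 is given below for illustration. The function names and the three sample responses are hypothetical; the samples stand for repeated read-outs of one PUF under one challenge and are not measured data.

```python
from itertools import combinations


def hamming_distance(r_i, r_j):
    """Number of bit positions in which two equal-length responses differ."""
    if len(r_i) != len(r_j):
        raise ValueError("responses must have the same length")
    return sum(x != y for x, y in zip(r_i, r_j))


def reliability(samples):
    """Eq. 6.1: samples are repeated responses of one PUF to one challenge."""
    m = len(samples)
    if m < 2:
        raise ValueError("at least two samples are required")
    a = len(samples[0])  # number of bits per response
    total = sum(hamming_distance(r_i, r_j) / a
                for r_i, r_j in combinations(samples, 2))
    return (1 - 2 / (m * (m - 1)) * total) * 100


# Hypothetical read-outs: the third sample has one flipped bit.
samples = ["10110010", "10110010", "10110011"]
print(round(reliability(samples), 2))  # 91.67
```

Identical samples give 100%, the ideal value stated above.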

#### 6.4.2 Uniqueness

Uniqueness is the ability of devices with the same design to produce different outputs for the same challenges. Ideally, two devices given the same inputs must produce two different outputs. The ideal uniqueness is 50%, as expressed by the following equation.

$$Uniqueness = \frac{2}{g \times (g-1)} \sum_{i=1}^{g-1} \sum_{j=i+1}^{g} \frac{HD(r_i, r_j)}{a} \times 100\%$$
(6.2)

where g represents the number of PUF instances under study, $HD(r_i, r_j)$ is the Hamming distance between the responses $r_i$ and $r_j$ of two instances, and *a* is the number of bits in each response.
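
Eq. 6.2 can be sketched in Python in the same way. The function names and the three device responses below are hypothetical and serve only to illustrate the arithmetic; each string stands for the response of a different PUF instance to the same challenge.

```python
from itertools import combinations


def inter_hamming_distance(r_i, r_j):
    """Number of bit positions in which two equal-length responses differ."""
    if len(r_i) != len(r_j):
        raise ValueError("responses must have the same length")
    return sum(x != y for x, y in zip(r_i, r_j))


def uniqueness(responses):
    """Eq. 6.2: one response per PUF instance, all to the same challenge."""
    g = len(responses)
    if g < 2:
        raise ValueError("at least two PUF instances are required")
    a = len(responses[0])  # number of bits per response
    total = sum(inter_hamming_distance(r_i, r_j) / a
                for r_i, r_j in combinations(responses, 2))
    return 2 / (g * (g - 1)) * total * 100


# Hypothetical 4-bit responses from three devices.
print(round(uniqueness(["1010", "0110", "1001"]), 2))  # 66.67
```

A value near 50% means that, on average, half of the bits differ between any two instances.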

## 6.5 Implementation and Test Requests

We used Cadence Virtuoso with the default TSMC 65 nm foundry library. The schematic was designed first and then simulated in the analog environment (spectreS) to obtain initial results. Once the results were error-free, the design was imported into the Virtuoso environment for layout. The layout was completed with all major physical verification steps, such as DRC and LVS checks, and extraction with and without capacitance. The layout also focused on reducing EM, IR, and antenna effects.

Fig. 6.8 shows a simple flow chart of the design procedure used in this thesis. After the initial specifications were set, a literature survey was conducted, and finally a floor plan was laid out for the entire design.



FIGURE 6.8: General flowchart.

The initial tests were carried out in the Cadence Virtuoso schematic design tool, and the Analog Design Environment was used to set up the simulations. The Spectre SPICE models, which are based on the TSMC data, were used for transient analysis; if there is no error, we proceed to the next phase of the design, i.e., physical design. In this second phase, once the requirements are met, layout design proceeds. Once the layout is complete, we perform physical verification, which includes DRC and LVS checks. DRC is the Design Rule Check, while LVS is the Layout Versus Schematic check. The DRC uses a custom set of rules provided by the TSMC foundry, and the software checks the finished layout against these rules. Once the DRC is cleared with no errors, we continue with extraction and then conduct LVS. If any errors are found in DRC or LVS, the design is sent back to layout to be fixed. Finally, after extraction, a schematic is formed that includes all of the parasitic elements in the circuit and is used for post-layout simulations.

The layout is then used to run a Monte Carlo (MC) analysis with 10,000 iterations to determine the uniqueness and reliability of the design. The extracted MC values are then exported to Excel to study all parameters.

The Monte Carlo analysis is shown in Fig. 6.9. The results show that the PUF can produce different responses under the same challenges. From these values we extract the PUF parameters, such as reliability and uniqueness.



FIGURE 6.9: Monte Carlo analysis.

The uniqueness of the device is found to be 48%, and its reliability is estimated to be 98.25%.

From the analog statistical models, we extract multiple outputs using Monte Carlo analysis to analyze reliability and uniqueness. These are examined in Excel by separating them into different PUF units. Fig. 6.10 shows the final extraction data for the proposed PUF.



FIGURE 6.10: Monte Carlo analysis.

## 6.6 Comparison study

Table 6.2 compares the overall performance of the proposed SRAM-based PUF with several state-of-the-art PUF designs. The results confirm that the proposed design meets the uniqueness and reliability targets, with 48% and 98.25%, respectively.

TABLE 6.2: Comparison with state-of-the-art

| PUF type | [155] | [130] | [165] | [131] | [166] | [167] | [168] | [169] | Proposed |
|----------|-------|-------|-------|-------|-------|-------|-------|-------|----------|
| Year | 2015 | 2017 | 2019 | 2017 | 2019 | 2020 | 2020 | 2020 | 2021 |
| Process (nm) | 65 | 90 | 40 | 40 | 65 | 65 | 65 | 130 | 65 |
| Topology | Static MS | Analog TV | AC RO | Static MS | Analog | SRAM | SRAM | WE | Analog Mixed |
| Uniqueness (%) | 50.01 | 50.10 | 49.97 | 49.07 | 49.94 | 49.3 | 49.9 | 50.2 | 48 |
| Reliability (%) | 99.9943 | 97 | 98.55 | 99.9951 | - | - | - | - | 98.25 |
| Power $(\mu W)$ | - | 0.180 | - | | - | | | | 0.0409 |
| Energy consumption per bit $(fJ/bit)$ | 15 | 1.81 | - | 1.02 | 124 | 16 | 46 | 128 | 0.21 |
| Standard cell design | NO | NO | - | YES | NO | - | - | - | YES |
| Area per bit $(\mu m^2)$ | 8.18 | - | - | 5.83 | 764 | - | 3001 | - | 535.5 |
| Number of challenges | - | 64 | - | - | - | - | - | - | 64 |

The main feature of this proposal compared with the state of the art is its much lower energy per bit (0.21 fJ/bit, compared with values from about 1 fJ/bit to more than 100 fJ/bit in different technologies). According to Table 6.2, the proposed SRAM PUF lowers the energy per bit by 71× relative to the 65 nm design in [155] and by 609× relative to the 130 nm design in [169]. However, this achievement comes at the cost of greater area. It should be pointed out that the reported area is for the negative-DC generator including one SRAM unit. This negative-DC generator can supply 16 SRAM units at the same time, so it must be replicated for every 16 SRAM cells.
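The ratios quoted above can be reproduced directly from the energy row of Table 6.2, as in the following sketch. The per-unit area share at the end is only an illustrative estimate: it assumes that the reported $535.5\mu m^2$ is dominated by the negative-DC generator and that this area is shared evenly by the 16 SRAM units it supplies.

```python
# Energy per bit in fJ/bit, copied from Table 6.2.
ENERGY_FJ_PER_BIT = {
    "[155]": 15, "[130]": 1.81, "[131]": 1.02, "[166]": 124,
    "[167]": 16, "[168]": 46, "[169]": 128,
}
PROPOSED_FJ_PER_BIT = 0.21


def energy_ratio(reference):
    """How many times lower the proposed energy per bit is than a reference."""
    return ENERGY_FJ_PER_BIT[reference] / PROPOSED_FJ_PER_BIT


for ref in ("[155]", "[169]"):
    print(f"{ref}: {energy_ratio(ref):.1f}x")

# Assumption: generator area shared evenly by the 16 SRAM units it supplies.
GENERATOR_AREA_UM2 = 535.5
UNITS_PER_GENERATOR = 16
print(f"{GENERATOR_AREA_UM2 / UNITS_PER_GENERATOR:.1f} um^2 per SRAM unit")
```

The first two lines of output are 71.4x and 609.5x; the area share under the stated assumption is about 33.5 $\mu m^2$ per SRAM unit.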

## 6.7 Conclusion

This work aimed to create an ultra-low-power PUF for low-power applications such as IoT devices. The design also introduces a methodology for using multiple layers of randomness in a single model, implemented with a standard cell approach. This methodology could be used to design larger PUF models with custom-built P-cell modules stored in a library. The design achieves a significant reduction in energy consumption for an 8-bit PUF unit. The drawback is that the design area is quite large compared with the other models.

### Chapter 7

# Contribution

#### 7.1 Summary and Conclusions

From their use in processors to their applications in LCD screens and printers, SRAM memories have become a fundamental part of our daily lives. The growing application of SRAM memories raises the need to address their power challenges as well. If power is not managed well, these memories can consume a large amount of electricity.

This dissertation has addressed the power consumption of SRAM memories through leakage current reduction techniques.

First, to control the leakage current of the NMOS transistors in the SRAM cells, a negative DC voltage is generated by a negative level shifter. This proposal successfully creates square waves whose operating frequency follows the input frequency (2.5 MHz). It also performs properly in terms of static power and propagation delay.

Next, a dual-rail design is applied to the memory to split the logic and SRAM rails, which allows the logic rail to be supplied without being restricted by the Vmin or Vmax requirement of the SRAM bit-cells. The SRAM array voltage is supplied by a dedicated voltage that does not scale with the periphery logic. However, the SRAM periphery voltage is attached to the digital subsystem.

To reduce the leakage current further, a drowsy configuration is also proposed. Unlike the regular memory configuration, in which only the SRAM cells are put to sleep in standby mode, this proposal also forces the write driver and sense amplifier into sleep mode.

# Bibliography

- [1] Jozwiak, L., 2017. Advanced mobile and wearable systems. Microprocessors and microsystems, 50, pp.202-221.
- [2] Ryu, J.H., Gankhuyag, G. and Chong, K.T., 2016. Navigation system heading and position accuracy improvement through GPS and INS data fusion. Journal of Sensors, 2016.
- [3] ZHANG, X.W., LIN, X.L. and RUAN, C.L., 2005. The Overseas Military VLSI Application and Development Trend. Semiconductor Technology, p.08.
- [4] Chitra, K. and Vennila, C., 2020. A novel patch selection technique in ANN B-Spline Bayesian hyperprior interpolation VLSI architecture using fuzzy logic for high-speed satellite image processing. Journal of Ambient Intelligence and Humanized Computing, pp.1-14.
- [5] Zhao, J., Ghannam, R., Htet, K.O., Liu, Y., Law, M.K., Roy, V.A., Michel, B., Imran, M.A. and Heidari, H., 2020. Self-Powered Implantable Medical Devices: Photovoltaic Energy Harvesting Review. Advanced Healthcare Materials, 9(17), p.2000779.
- [6] Jaiswal, K. and Saxena, S., 2017. A review on FinFET based SRAM design for low power applications. Int J Technol Res Manage.
- [7] Kang, M., Gonugondla, S.K., Patil, A. and Shanbhag, N.R., 2018. A multi-functional in-memory inference processor using a standard 6T SRAM array. IEEE Journal of Solid-State Circuits, 53(2), pp.642-655.
- [8] Zhang, K., Bhattacharya, U., Chen, Z., Hamzaoglu, F., Murray, D., Vallepalli, N., Wang, Y., Zheng, B. and Bohr, M., 2005. SRAM design on 65-nm CMOS technology with dynamic sleep transistor for leakage reduction. IEEE Journal of Solid-State Circuits, 40(4), pp.895-901.

- [9] Wang, Y., Ahn, H.J., Bhattacharya, U., Chen, Z., Coan, T., Hamzaoglu, F., Hafez, W.M., Jan, C.H., Kolar, P., Kulkarni, S.H. and Lin, J.F., 2008. A 1.1 GHz 12μA/Mb-Leakage SRAM Design in 65 nm Ultra-Low-Power CMOS Technology With Integrated Leakage Reduction for Mobile Applications. IEEE Journal of Solid-State Circuits, 43(1), pp.172-179.
- [10] Aly, R.E. and Bayoumi, M.A., 2007. Low-power cache design using 7T SRAM cell. IEEE Transactions on Circuits and Systems II: Express Briefs, 54(4), pp.318-322.
- [11] Amelifard, B., Fallah, F. and Pedram, M., 2008. Leakage minimization of SRAM cells in a Dual-V<sub>t</sub> and Dual-T<sub>ox</sub> Technology. IEEE transactions on very large scale integration (VLSI) systems, 16(7), pp.851-860.
- [12] Zhang, K., Hamzaoglu, F. and Wang, Y., 2007. Low-power SRAMs in nano-scale CMOS technologies. IEEE Transactions on Electron devices, 55(1), pp.145-151.
- [13] Frustaci, F., Corsonello, P., Perri, S. and Cocorullo, G., 2006. Techniques for leakage energy reduction in deep submicrometer cache memories. IEEE transactions on very large scale integration (vlsi) systems, 14(11), pp.1238-1249.
- [14] Gargini, P., 2016. Roadmap Past, Present and Future. keynote presentation at SPCC.
- [15] Kim, N.S., Flautner, K., Blaauw, D. and Mudge, T., 2004. Circuit and microarchitectural techniques for reducing cache leakage power. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 12(2), pp.167-184.
- [16] Itoh, K., Horiguchi, M. and Yamaoka, M., 2007, September. Low-voltage limitations of memory-rich nano-scale CMOS LSIs. In ESSCIRC 2007-33rd European Solid-State Circuits Conference (pp. 68-75). IEEE.
- [17] Kanda, K., Sadaaki, H. and Sakurai, T., 2004. 90% write power-saving SRAM using sense-amplifying memory cell. IEEE journal of solid-state circuits, 39(6), pp.927-933.
- [18] Lai, Y.C. and Huang, S.Y., 2008. X-calibration: A technique for combating excessive bitline leakage current in nanometer SRAM designs. IEEE journal of solid-state circuits, 43(9), pp.1964-1971.
- [19] Agawa, K.I., Hara, H., Takayanagi, T. and Kuroda, T., 2001. A bitline leakage compensation scheme for low-voltage SRAMs. IEEE Journal of Solid-State Circuits, 36(5), pp.726-734.
- [20] Caravella, J.S., 1997. A low voltage SRAM for embedded applications. IEEE Journal of Solid-State Circuits, 32(3), pp.428-432.

- [21] Kim, K., Mahmoodi, H. and Roy, K., 2008. A low-power SRAM using bit-line charge-recycling. IEEE journal of solid-state circuits, 43(2), pp.446-459.
- [22] Minato, O., Masuhara, T., Sasaki, T., Sakai, Y. and Hayashida, T., 1984, February. A 20ns 64K CMOS SRAM. In 1984 IEEE International Solid-State Circuits Conference. Digest of Technical Papers (Vol. 27, pp. 222-223). IEEE.
- [23] Reed, P., Alexander, M., Alvarez, J., Brauer, M., Chao, C.C., Croxton, C., Eisen, L., Le, T., Ngo, T., Nicoletta, C. and Sanchez, H., 1997, February. A 250 MHz 5 W RISC microprocessor with on-chip L2 cache controller. In 1997 IEEE International Solids-State Circuits Conference. Digest of Technical Papers (pp. 412-413). IEEE.
- [24] Lovett, S.J., Gibbs, G.A. and Pancholy, A., 2000. Yield and matching implications for static RAM memory array sense-amplifier design. IEEE Journal of Solid-State Circuits, 35(8), pp.1200-1204.
- [25] Niki, Y., Kawasumi, A., Suzuki, A., Takeyama, Y., Hirabayashi, O., Kushida, K., Tachibana, F., Fujimura, Y. and Yabe, T., 2011. A digitized replica bitline delay technique for random-variation-tolerant timing generation of SRAM sense amplifiers. IEEE Journal of Solid-State Circuits, 46(11), pp.2545-2551.
- [26] Yang, B.D. and Kim, L.S., 2005. A low-power SRAM using hierarchical bit line and local sense amplifiers. IEEE journal of solid-state circuits, 40(6), pp.1366-1376.
- [27] Mai, K.W., Mori, T., Amrutur, B.S., Ho, R., Wilburn, B., Horowitz, M.A., Fukushi, I., Izawa, T. and Mitarai, S., 1998. Low-power SRAM design using half-swing pulse-mode techniques. IEEE Journal of Solid-State Circuits, 33(11), pp.1659-1671.
- [28] Mizuno, H. and Nagano, T., 1996. Driving source-line cell architecture for sub-1-V high-speed low-power applications. IEICE transactions on electronics, 79(7), pp.963-968.
- [29] Kawashima, S., Mori, T., Sasagawa, R., Hamaminato, M., Wakayama, S., Sukegawa, K. and Fukushi, I., 1998. A charge-transfer amplifier and an encoded-bus architecture for low-power SRAM's. IEEE Journal of Solid-State Circuits, 33(5), pp.793-799.
- [30] Sharma, V., Cosemans, S., Ashouei, M., Huisken, J., Catthoor, F. and Dehaene, W., 2011. A 4.4 pJ/access 80 MHz, 128 kbit variability resilient SRAM with multi-sized sense amplifier redundancy. IEEE Journal of Solid-State Circuits, 46(10), pp.2416-2430.
- [31] Yang, B.D., 2010. A low-power SRAM using bit-line charge-recycling for read and write operations. IEEE journal of solid-state circuits, 45(10), pp.2173-2183.

- [32] Moriwaki, S., Kawasumi, A., Suzuki, T., Sakurai, T. and Miyano, S., 2011, September. 0.4 V SRAM with bit line swing suppression charge share hierarchical bit line scheme. In 2011 IEEE Custom Integrated Circuits Conference (CICC) (pp. 1-4). IEEE.
- [33] Hirose, T., Kuriyama, H., Murakami, S., Yuzuriha, K., Mukai, T., Tsutsumi, K., Nishimura, Y., Kohno, Y. and Anami, K., 1990. A 20-ns 4-Mb CMOS SRAM with hierarchical word decoding architecture. IEEE Journal of Solid-State Circuits, 25(5), pp.1068-1074.
- [34] Karandikar, A. and Parhi, K.K., 1998, October. Low power SRAM design using hierarchical divided bit-line approach. In Proceedings International Conference on Computer Design. VLSI in Computers and Processors (Cat. No. 98CB36273) (pp. 82-88). IEEE.
- [35] Chhabra, A. and Vaderiya, Y.D., 2015. Low-energy power-on-reset circuit for dual supply SRAM. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 24(5), pp.2003-2007.
- [36] Chang, J., Liao, H.J., Chih, Y.D., Sinangil, M., Chen, Y.H., Clinton, M. and Lu, S.L.L., 2017, June. Embedded memories for mobile, IoT, automotive and high performance computing. In 2017 Symposium on VLSI Technology (pp. T26-T27). IEEE.
- [37] Cakir, C., Chen, A.W., Chong, Y.K., Thyagarajan, S., McCartney, M., Tan, P., Shi, Y. and Bhargava, M., 2019, June. A 4GHz 16nm SRAM Architecture with Low-Power Features for Heterogeneous Computing Platforms. In 2019 Symposium on VLSI Circuits (pp. C112-C113). IEEE.
- [38] Sinangil, M.E., Poulton, J.W., Fojtik, M.R., Greer, T.H., Tell, S.G., Gotterba, A.J., Wang, J., Golbus, J., Zimmer, B., Dally, W.J. and Gray, C.T., 2015. A 28 nm 2 Mbit 6 T SRAM with highly configurable low-voltage write-ability assist implementation and capacitor-based sense-amplifier input offset compensation. IEEE Journal of Solid-State Circuits, 51(2), pp.557-567.
- [39] Martin, S.M., Flautner, K., Mudge, T. and Blaauw, D., 2002, November. Combined dynamic voltage scaling and adaptive body biasing for lower power microprocessors under dynamic workloads. In Proceedings of the 2002 IEEE/ACM International Conference on Computer-Aided Design (pp. 721-725).
- [40] Kao, J.T., Miyazaki, M. and Chandrakasan, A.R., 2002. A 175-mV multiply-accumulate unit using an adaptive supply voltage and body bias architecture. IEEE Journal of Solid-State Circuits, 37(11), pp.1545-1554.

- [41] Wang, A. and Chandrakasan, A., 2005. A 180-mV subthreshold FFT processor using a minimum energy design methodology. IEEE Journal of Solid-State Circuits, 40(1), pp.310-319.
- [42] Calhoun, B.H. and Chandrakasan, A.P., 2006. Static noise margin variation for subthreshold SRAM in 65-nm CMOS. IEEE Journal of Solid-State Circuits, 41(7), pp.1673-1679.
- [43] Kumar, H., Srivastava, S. and Singh, B., 2021. Low power, high-performance reversible logic enabled CNTFET SRAM cell with improved stability. Materials Today: Proceedings, 42, pp.1617-1623.
- [44] Burd, T.D., Pering, T.A., Stratakos, A.J. and Brodersen, R.W., 2000. A dynamic voltage scaled microprocessor system. IEEE Journal of Solid-State Circuits, 35(11), pp.1571-1580.
- [45] Hwang, M.E., Raychowdhury, A., Kim, K. and Roy, K., 2007, June. A 85mV 40nW process-tolerant subthreshold 8×8 FIR filter in 130nm technology. In 2007 IEEE Symposium on VLSI Circuits (pp. 154-155). IEEE.
- [46] Calhoun, B.H. and Chandrakasan, A.P., 2007. A 256-kb 65-nm sub-threshold SRAM design for ultra-low-voltage operation. IEEE Journal of Solid-State Circuits, 42(3), pp.680-688.
- [47] Chang, I.J., Kim, J.J., Park, S.P. and Roy, K., 2009. A 32 kb 10T sub-threshold SRAM array with bit-interleaving and differential read scheme in 90 nm CMOS. IEEE Journal of Solid-State Circuits, 44(2), pp.650-658.
- [48] Mizuno, H., Ishibashi, K., Shimura, T., Hattori, T., Narita, S., Shiozawa, K., Ikeda, S. and Uchiyama, K., 1999. An 18-μA standby current 1.8-V, 200-MHz microprocessor with self-substrate-biased data-retention mode. IEEE Journal of Solid-State Circuits, 34(11), pp.1492-1500.
- [49] Myers, J., Savanth, A., Howard, D., Gaddh, R., Prabhat, P. and Flynn, D., 2015, February. 8.1 An 80nW retention 11.7 pJ/cycle active subthreshold ARM Cortex-M0+ subsystem in 65nm CMOS for WSN applications. In 2015 IEEE International Solid-State Circuits Conference-(ISSCC) Digest of Technical Papers (pp. 1-3). IEEE.
- [50] Ballo, A., Grasso, A.D. and Palumbo, G., 2019. A review of charge pump topologies for the power management of IoT nodes. Electronics, 8(5), p.480.
- [51] Meijer, R.I.M.P., 2011. Body bias aware digital design: a design strategy for area- and performance-efficient CMOS integrated circuits.

- [52] Palumbo, G. and Pappalardo, D., 2010. Charge pump circuits: An overview on design strategies and topologies. IEEE Circuits and Systems Magazine, 10(1), pp.31-45.
- [53] Kennedy, S., Morrison, D., Delic, D., Yuce, M.R. and Redouté, J.M., 2021. Fully-Integrated Dickson Converters for Single Photon Avalanche Diode Arrays. IEEE Access, 9, pp.10523-10532.
- [54] Ballo, A., Grasso, A.D. and Palumbo, G., 2020. A Subthreshold Cross-Coupled Hybrid Charge Pump for 50-mV Cold-Start. IEEE Access, 8, pp.188959-188969.
- [55] Bose, S., Anand, T. and Johnston, M.L., 2019. Integrated cold start of a boost converter at 57 mV using cross-coupled complementary charge pumps and ultra-low-voltage ring oscillator. IEEE Journal of Solid-State Circuits, 54(10), pp.2867-2878.
- [56] Zhang, J., Zhang, H. and Zhang, R., 2018. A high-efficiency charge pump in BCD process for implantable medical devices. Journal of Semiconductors, 39(10), p.105003.
- [57] Yi, H., Yin, J., Mak, P.I. and Martins, R.P., 2017. A 0.032-mm² 0.15-V three-stage charge-pump scheme using a differential bootstrapped ring-VCO for energy-harvesting applications. IEEE Transactions on Circuits and Systems II: Express Briefs, 65(2), pp.146-150.
- [58] Fuketa, H. and Matsukawa, T., 2016. Fully integrated, 100-mV minimum input voltage converter with gate-boosted charge pump kick-started by LC oscillator for energy harvesting. IEEE Transactions on Circuits and Systems II: Express Briefs, 64(4), pp.392-396.
- [59] Goeppert, J. and Manoli, Y., 2016. Fully integrated startup at 70 mV of boost converters for thermoelectric energy harvesting. IEEE Journal of Solid-State Circuits, 51(7), pp.1716-1726.
- [60] Zhao, W., Alvarez, A.B. and Ha, Y., 2015. A 65-nm 25.1-ns 30.7-fJ robust subthreshold level shifter with wide conversion range. IEEE Transactions on Circuits and Systems II: Express Briefs, 62(7), pp.671-675.
- [61] Maghsoudloo, E., Rezaei, M., Sawan, M. and Gosselin, B., 2016. A high-speed and ultra low-power subthreshold signal level shifter. IEEE Transactions on Circuits and Systems I: Regular Papers, 64(5), pp.1164-1172.
- [62] Zhai, B., Nazhandali, L., Olson, J., Reeves, A., Minuth, M., Helfand, R., Pant, S., Blaauw, D. and Austin, T., 2006, June. A 2.60 pJ/Inst subthreshold sensor processor for optimal energy efficiency. In 2006 Symposium on VLSI Circuits, 2006. Digest of Technical Papers. (pp. 154-155). IEEE.

- [63] Craig, K., Shakhsheer, Y., Arrabi, S., Khanna, S., Lach, J. and Calhoun, B.H., 2013. A 32 b 90 nm processor implementing panoptic DVS achieving energy efficient operation from sub-threshold to high performance. IEEE Journal of Solid-State Circuits, 49(2), pp.545-552.
- [64] Lanuzza, M., Crupi, F., Rao, S., De Rose, R., Strangio, S. and Iannaccone, G., 2016. An ultralow-voltage energy-efficient level shifter. IEEE Transactions on Circuits and Systems II: Express Briefs, 64(1), pp.61-65.
- [65] Vatanjou, A.A., Ytterdal, T. and Aunet, S., 2018. An ultra-low voltage and low-energy level shifter in 28-nm UTBB-FDSOI. IEEE Transactions on Circuits and Systems II: Express Briefs, 66(6), pp.899-903.
- [66] Rana, V. and Sinha, R., 2017. Stress relaxed multiple output high-voltage level shifter. IEEE Transactions on Circuits and Systems II: Express Briefs, 65(2), pp.176-180.
- [67] Lanuzza, M., Corsonello, P. and Perri, S., 2012. Low-power level shifter for multi-supply voltage designs. IEEE Transactions on Circuits and Systems II: Express Briefs, 59(12), pp.922-926.
- [68] Yong, Z., Xiang, X., Chen, C. and Meng, J., 2017. An energy-efficient and wide-range voltage level shifter with dual current mirror. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25(12), pp.3534-3538.
- [69] Lutkemeier, S. and Ruckert, U., 2010. A subthreshold to above-threshold level shifter comprising a Wilson current mirror. IEEE Transactions on Circuits and Systems II: Express Briefs, 57(9), pp.721-724.
- [70] Luo, S.C., Huang, C.J. and Chu, Y.H., 2014. A wide-range level shifter using a modified Wilson current mirror hybrid buffer. IEEE Transactions on Circuits and Systems I: Regular Papers, 61(6), pp.1656-1665.
- [71] Zhou, J., Wang, C., Liu, X., Zhang, X. and Je, M., 2015. An ultra-low voltage level shifter using revised Wilson current mirror for fast and energy-efficient wide-range voltage conversion from sub-threshold to I/O voltage. IEEE Transactions on Circuits and Systems I: Regular Papers, 62(3), pp.697-706.
- [72] Lotfi, R., Saberi, M., Hosseini, S.R., Ahmadi-Mehr, A.R. and Staszewski, R.B., 2018. Energy-efficient wide-range voltage level shifters reaching 4.2 fJ/transition. IEEE Solid-State Circuits Letters, 1(2), pp.34-37.

- [73] Kim, T.T.H., 2018. An area and energy efficient ultra-low voltage level shifter with pass transistor and reduced-swing output buffer in 65-nm CMOS. IEEE Transactions on Circuits and Systems II: Express Briefs, 65(5), pp.607-611.
- [74] Colalongo, L., Richelli, A., Cabinio, P. and Kovacs-Vajna, Z.M., 2017. A Bidirectional Differential Cascode Voltage Switch DC–DC Buck-Boost Converter for Low Voltage Application. Journal of Low Power Electronics, 13(2), pp.255-262.
- [75] Chen, X., Zhou, T., Huang, J., Wang, G. and Li, Y., 2020. A Sub-100mV Ultra-Low Voltage Level-Shifter Using Current Limiting Cross-Coupled Technique for Wide-Range Conversion to I/O Voltage. IEEE Access, 8, pp.145577-145585.
- [76] Fassio, L., Settino, F., Lin, L., De Rose, R., Lanuzza, M., Crupi, F. and Alioto, M., 2020. A Robust, High-Speed and Energy-Efficient Ultralow-Voltage Level Shifter. IEEE Transactions on Circuits and Systems II: Express Briefs.
- [77] Late, E., Ytterdal, T. and Aunet, S., 2020. An energy efficient level shifter capable of logic conversion from sub-15 mV to 1.2 V. IEEE Transactions on Circuits and Systems II: Express Briefs, 67(11), pp.2687-2691.
- [78] Kim, K., Kim, J.Y., Moon, B.M. and Jung, S.O.K., 2020. A 6.9-μm² 3.26-ns 31.25-fJ Robust Level Shifter with Wide Voltage and Frequency Ranges. IEEE Transactions on Circuits and Systems II: Express Briefs.
- [79] Garcia, J.C., Montiel-Nelson, J.A. and Nooshabadi, S., 2018. High performance CMOS level up shifter with full-scale 1.2 V output voltage. Microelectronics Journal, 78, pp.11-15.
- [80] Garcia, J.C., Montiel-Nelson, J.A. and Nooshabadi, S., 2017. High performance single supply CMOS 0.45–1 V input to 1.1 V output level up shifter. Microelectronics Journal, 60, pp.82-86.
- [81] Chang, M.C., Chang, C.S., Chao, C.P., Goto, K.I., Ieong, M., Lu, L.C. and Diaz, C.H., 2007. Transistor- and circuit-design optimization for low-power CMOS. IEEE Transactions on Electron Devices, 55(1), pp.84-95.
- [82] Matsuzuka, R., Hirose, T., Shizuku, Y., Shinonaga, K., Kuroki, N. and Numa, M., 2017. An 80-mv-to-1.8-V conversion-range low-energy level shifter for extremely low-voltage VLSIs. IEEE Transactions on Circuits and Systems I: Regular Papers, 64(8), pp.2026-2035.

- [83] Rajendran, S. and Chakrapani, A., 2021. Fast and Energy-Efficient Level Shifter Using Split-Control Driver for Mixed-Signal Systems. Arabian Journal for Science and Engineering, pp.1-6.
- [84] Rajendran, S. and Chakrapani, A., 2021. Energy-efficient CMOS voltage level shifters with single-V<sub>DD</sub> for multi-core applications. Analog Integrated Circuits and Signal Processing, pp.1-7.
- [85] Hosseini, S.R., Saberi, M. and Lotfi, R., 2017. A high-speed and power-efficient voltage level shifter for dual-supply applications. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25(3), pp.1154-1158.
- [86] Chouhan, S.S. and Halonen, K., 2016. A 0.67-μW 177-ppm/°C All-MOS Current Reference Circuit in a 0.18-μm CMOS Technology. IEEE Transactions on Circuits and Systems II: Express Briefs, 63(8), pp.723-727.
- [87] Toledo, P., Cordova, D., Klimach, H., Bampi, S. and Crovetti, P.S., 2019. A 0.3–1.2 V Schottky-Based CMOS ZTC Voltage Reference. IEEE Transactions on Circuits and Systems II: Express Briefs, 66(10), pp.1663-1667.
- [88] Lin, J., Wang, L., Zhan, C. and Lu, Y., 2019. A 1-nW ultra-low voltage subthreshold CMOS voltage reference with 0.0154%/V line sensitivity. IEEE Transactions on Circuits and Systems II: Express Briefs, 66(10), pp.1653-1657.
- [89] de Oliveira, A.C., Cordova, D., Klimach, H. and Bampi, S., 2017. Picowatt, 0.45–0.6 V self-biased subthreshold CMOS voltage reference. IEEE Transactions on Circuits and Systems I: Regular Papers, 64(12), pp.3036-3046.
- [90] Lu, T.C., Ker, M.D. and Zan, H.W., 2016, April. A 70nW, 0.3 V temperature compensation voltage reference consisting of subthreshold MOSFETs in 65nm CMOS technology. In 2016 International Symposium on VLSI Design, Automation and Test (VLSI-DAT) (pp. 1-4). IEEE.
- [91] Lei, J., Wang, Z. and Wang, X., 2019. A 68-nW novel CMOS sub-bandgap voltage reference circuit. Microelectronics Journal, 89, pp.37-40.
- [92] Thakur, A., Pandey, R. and Rai, S.K., 2020. Low temperature coefficient and low line sensitivity subthreshold curvature-compensated voltage reference. International Journal of Circuit Theory and Applications, 48(11), pp.1900-1921.

- [93] Shao, C.Z., Kuo, S.C. and Liao, Y.T., 2020. A 1.8-nW, −73.5-dB PSRR, 0.2-ms Startup Time, CMOS Voltage Reference With Self-Biased Feedback and Capacitively Coupled Schemes. IEEE Journal of Solid-State Circuits.
- [94] Wang, L. and Zhan, C., 2019. A 0.7-V 28-nW CMOS subthreshold voltage and current reference in one simple circuit. IEEE Transactions on Circuits and Systems I: Regular Papers, 66(9), pp.3457-3466.
- [95] Olivera, F. and Petraglia, A., 2019. Adjustable output CMOS voltage reference design. IEEE Transactions on Circuits and Systems II: Express Briefs, 67(10), pp.1690-1694.
- [96] Souliotis, G., Plessas, F. and Vlassis, S., 2018. A high accuracy voltage reference generator. Microelectronics Journal, 75, pp.61-67.
- [97] Yan, T., Chi Wa, U., Law, M.K. and Lam, C.S., 2020. A −40°C-125°C, 1.08ppm/°C, 918nW bandgap voltage reference with segmented curvature compensation. Microelectronics Journal, 105, p.104897.
- [98] Abdi, A. and Cha, H.K., 2019. A regulated multiple-output high-voltage charge pump IC for implantable neural stimulators. Microelectronics Journal, 92, p.104617.
- [99] Gao, J., Gu, T., Nie, K., Gao, Z. and Xu, J., 2020. A Low-Ripple Charge Pump With Novel Compensator for Transient-Response Improvement in CMOS Image Sensors. IEEE Transactions on Circuits and Systems II: Express Briefs.
- [100] Li, H., Shen, Y., Wang, T. and Liu, J., 2019. A 210fs RMS jitter 187.5 MHz-3GHz fractional-N frequency synthesizer with quantization noise suppression techniques and chopping differential charge pump for SDR applications. Microelectronics Journal, 85, pp.135-143.
- [101] Halupka, D., 2011. Effects of silicon variation on nano-scale solid-state memories. University of Toronto.
- [102] Sharifkhani, M., 2006. Design and analysis of low-power SRAMs.
- [103] Hussain, W. and Jahinuzzaman, S.M., 2012. A read-decoupled gated-ground SRAM architecture for low-power embedded memories. Integration, 45(3), pp.229-236.
- [104] Keerthi, R., 2007. Stability and Static Noise Margin Analysis of Static Random Access Memory (Doctoral dissertation, Wright State University).
- [105] Biswas, A., 2018. Energy-efficient smart embedded memory design for IoT and AI (Doctoral dissertation, Massachusetts Institute of Technology).

- [106] Birla, S., Singh, R.K. and Pattnaik, M., 2011. Static noise margin analysis of various SRAM topologies. International Journal of Engineering and Technology, 3(3), p.304.
- [107] Gopal, M., Prasad, D.S.S. and Raj, B., 2013. 8T SRAM cell design for dynamic and leakage power reduction. International Journal of Computer Applications, 71(9), pp.43-48.
- [108] Fischer, T., Amirante, E., Huber, P., Nirschl, T., Olbrich, A., Ostermayr, M. and Schmitt-Landsiedel, D., 2008. Analysis of read current and write trip voltage variability from a 1-MB SRAM test structure. IEEE Transactions on Semiconductor Manufacturing, 21(4), pp.534-541.
- [109] Grossar, E., Stucchi, M., Maex, K. and Dehaene, W., 2006. Read stability and write-ability analysis of SRAM cells for nanometer technologies. IEEE Journal of Solid-State Circuits, 41(11), pp.2577-2588.
- [110] Samson, M. and Srinivas, M.B., 2008, August. Analyzing N-curve metrics for sub-threshold 65nm CMOS SRAM. In 2008 8th IEEE Conference on Nanotechnology (pp. 25-28). IEEE.
- [111] Verma, N. and Chandrakasan, A.P., 2008. A 256 kb 65 nm 8T subthreshold SRAM employing sense-amplifier redundancy. IEEE Journal of Solid-State Circuits, 43(1), pp.141-149.
- [112] Wen, L., Cheng, X., Zhou, K., Tian, S. and Zeng, X., 2016. Bit-interleaving-enabled 8T SRAM with shared data-aware write and reference-based sense amplifier. IEEE Transactions on Circuits and Systems II: Express Briefs, 63(7), pp.643-647.
- [113] Verma, N. and Chandrakasan, A.P., 2007, February. A 65nm 8T sub-Vt SRAM employing sense-amplifier redundancy. In 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers (pp. 328-606). IEEE.
- [114] Kime, C.R. and Mano, M.M., 2003. Logic and computer design fundamentals. Prentice Hall.
- [115] Dilillo, L., Girard, P., Pravossoudovitch, S., Virazel, A. and Bastian, M., 2007. Analysis and test of resistive-open defects in SRAM pre-charge circuits. Journal of Electronic Testing, 23(5), pp.435-444.
- [116] Wang, D.P., Lin, H.J., Chuang, C.T. and Hwang, W., 2014. Low-power multiport SRAM with cross-point write word-lines, shared write bit-lines, and shared write row-access transistors. IEEE Transactions on Circuits and Systems II: Express Briefs, 61(3), pp.188-192.
- [117] Baker, R.J., 2019. CMOS: circuit design, layout, and simulation. John Wiley & Sons.

- [118] Troutman, R.R., 1986. Latchup in CMOS technology: the problem and its cure.
- [119] Clein, D., 1999. CMOS IC layout: concepts, methodologies, and tools. Elsevier.
- [120] Morris, W., 2003, March. Latchup in CMOS. In 2003 IEEE International Reliability Physics Symposium Proceedings, 2003. 41st Annual. (pp. 76-84). IEEE.
- [121] Tsai, H.W. and Ker, M.D., 2014. Active guard ring to improve latch-up immunity. IEEE Transactions on Electron Devices, 61(12), pp.4145-4152.
- [122] https://vlsiuniverse.blogspot.com/2013/03/latchup-condition-in-cmos-devices.html
- [123] Voldman, S.H., Perez, C.N. and Watson, A., 2006. Guard rings: Structures, design methodology, integration, experimental results, and analysis for RF CMOS and RF mixed signal BiCMOS silicon germanium technology. Journal of Electrostatics, 64(11), pp.730-743.
- [124] Liao, S., Niou, C., Chien, K., Guo, A., Dong, W. and Huang, C., 2003, March. New observance and analysis of various guard-ring structures on latch-up hardness by backside photo emission image. In 2003 IEEE International Reliability Physics Symposium Proceedings, 2003. 41st Annual. (pp. 92-98). IEEE.
- [125] Veendrick, H.J., 2017. Robustness of nanometer CMOS designs: signal integrity, variability and reliability. In Nanometer CMOS ICs (pp. 429-493). Springer, Cham.
- [126] Liu, C.C., Lau, O. and Du, J.Y., 2016. Complete DFM Model for High-Performance Computing SoCs with Guard Ring and Dummy Fill Effect. arXiv preprint arXiv:1701.00460.
- [127] Narendra, S.G. and Chandrakasan, A.P. eds., 2006. Leakage in nanometer CMOS technologies. Springer Science and Business Media.
- [128] Razavi, B., 2002. Design of analog CMOS integrated circuits. Tata McGraw-Hill Education.
- [129] Rahman, M.T., Forte, D., Shi, Q., Contreras, G.K. and Tehranipoor, M., 2014, October. CSST: Preventing Distribution of Unlicensed and Rejected ICs by Untrusted Foundry and Assembly. In International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT).
- [130] Holcomb, D.E., Burleson, W.P. and Fu, K., 2008. Power-up SRAM state as an identifying fingerprint and source of true random numbers. IEEE Transactions on Computers, 58(9), pp.1198-1210.

- [131] Yu, M.D. and Devadas, S., 2010. Secure and robust error correction for physical unclonable functions. IEEE Design and Test of Computers, 27(1), pp.48-65.
- [132] Su, Y., Holleman, J. and Otis, B.P., 2008. A digital 1.6 pJ/bit chip identification circuit using process variations. IEEE Journal of Solid-State Circuits, 43(1), pp.69-77.
- [133] Suh, G.E. and Devadas, S., 2007, June. Physical unclonable functions for device authentication and secret key generation. In 2007 44th ACM/IEEE Design Automation Conference (pp. 9-14). IEEE.
- [134] Alvarez, A., Zhao, W. and Alioto, M., 2015, February. 14.3 15fJ/b static physically unclonable functions for secure chip identification with <2% native bit instability and 140× Inter/Intra PUF hamming distance separation in 65nm. In 2015 IEEE International Solid-State Circuits Conference-(ISSCC) Digest of Technical Papers (pp. 1-3). IEEE.
- [135] Karpinskyy, B., Lee, Y., Choi, Y., Kim, Y., Noh, M. and Lee, S., 2016, January. 8.7 Physically unclonable function for secure key generation with a key error rate of 2E-38 in 45nm smart-card chips. In 2016 IEEE International Solid-State Circuits Conference (ISSCC) (pp. 158-160). IEEE.
- [136] Suh, G.E. and Devadas, S., 2007, June. Physical unclonable functions for device authentication and secret key generation. In 2007 44th ACM/IEEE Design Automation Conference (pp. 9-14). IEEE.
- [137] Delvaux, J. and Verbauwhede, I., 2014. Fault injection modeling attacks on 65 nm arbiter and RO sum PUFs via environmental changes. IEEE Transactions on Circuits and Systems I: Regular Papers, 61(6), pp.1701-1713.
- [138] Cao, Y., Zhang, L., Chang, C.H. and Chen, S., 2015. A low-power hybrid RO PUF with improved thermal stability for lightweight applications. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 34(7), pp.1143-1147.
- [139] Keller, C., Gürkaynak, F., Kaeslin, H. and Felber, N., 2014, June. Dynamic memory-based physically unclonable function for the generation of unique identifiers and true random numbers. In 2014 IEEE International Symposium on Circuits and Systems (ISCAS) (pp. 2740-2743). IEEE.
- [140] Hashemian, M.S., Singh, B., Wolff, F., Weyer, D., Clay, S. and Papachristou, C., 2015, March. A robust authentication methodology using physically unclonable functions in DRAM arrays. In 2015 Design, Automation and Test in Europe Conference and Exhibition (DATE) (pp. 647-652). IEEE.

- [141] Keller, C., Gürkaynak, F., Kaeslin, H. and Felber, N., 2014, June. Dynamic memory-based physically unclonable function for the generation of unique identifiers and true random numbers. In 2014 IEEE International Symposium on Circuits and Systems (ISCAS) (pp. 2740-2743). IEEE.
- [142] Hashemian, M.S., Singh, B., Wolff, F., Weyer, D., Clay, S. and Papachristou, C., 2015, March. A robust authentication methodology using physically unclonable functions in DRAM arrays. In 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE) (pp. 647-652). IEEE.
- [143] Rosenblatt, S., Fainstein, D., Cestero, A., Safran, J., Robson, N., Kirihata, T. and Iyer, S.S., 2013. Field tolerant dynamic intrinsic chip ID using 32 nm high-K/metal gate SOI embedded DRAM. IEEE Journal of Solid-State Circuits, 48(4), pp.940-947.
- [144] Holcomb, D.E., Burleson, W.P. and Fu, K., 2008. Power-up SRAM state as an identifying fingerprint and source of true random numbers. IEEE Transactions on Computers, 58(9), pp.1198-1210.
- [145] Xiao, K., Rahman, M.T., Forte, D., Huang, Y., Su, M. and Tehranipoor, M., 2014, May. Bit selection algorithm suitable for high-volume production of SRAM-PUF. In 2014 IEEE International Symposium on Hardware-Oriented Security and Trust (HOST) (pp. 101-106). IEEE.
- [146] Eiroa, S., Castro, J., Martínez-Rodríguez, M.C., Tena, E., Brox, P. and Baturone, I., 2012, December. Reducing bit flipping problems in SRAM physical unclonable functions for chip identification. In 2012 19th IEEE International Conference on Electronics, Circuits, and Systems (ICECS 2012) (pp. 392-395). IEEE.
- [147] Maes, R. and Verbauwhede, I., 2010. Physically unclonable functions: A study on the state of the art and future research directions. In Towards Hardware-Intrinsic Security (pp. 3-37). Springer, Berlin, Heidelberg.
- [148] Yu, M.D. and Devadas, S., 2010. Secure and robust error correction for physical unclonable functions. IEEE Design and Test of Computers, 27(1), pp.48-65.

- [149] Merli, D., Schuster, D., Stumpf, F. and Sigl, G., 2011, June. Side-channel analysis of PUFs and fuzzy extractors. In International Conference on Trust and Trustworthy Computing (pp. 33-47). Springer, Berlin, Heidelberg.
- [150] Armknecht, F., Maes, R., Sadeghi, A.R., Sunar, B. and Tuyls, P., 2010. Memory leakage-resilient encryption based on physically unclonable functions. In Towards Hardware-Intrinsic Security (pp. 135-164). Springer, Berlin, Heidelberg.
- [151] Dodis, Y., Reyzin, L. and Smith, A., 2004. Fuzzy extractors: How to generate strong keys from biometrics and other noisy data. In EUROCRYPT 2004.
- [152] Maes, R., Van Der Leest, V., Van Der Sluis, E. and Willems, F., 2015, September. Secure key generation from biased PUFs. In International Workshop on Cryptographic Hardware and Embedded Systems (pp. 517-534). Springer, Berlin, Heidelberg.
- [153] Jeloka, S., Yang, K., Orshansky, M., Sylvester, D. and Blaauw, D., 2017, June. A sequence dependent challenge-response PUF using 28nm SRAM 6T bit cell. In 2017 Symposium on VLSI Circuits (pp. C270-C271). IEEE.
- [154] Koeberl, P., Li, J., Maes, R., Rajan, A., Vishik, C. and Wójcik, M., 2011, November. Evaluation of a PUF Device Authentication Scheme on a Discrete 0.13 um SRAM. In International Conference on Trusted Systems (pp. 271-288). Springer, Berlin, Heidelberg.
- [155] Maes, R. and Verbauwhede, I., 2010. Physically unclonable functions: A study on the state of the art and future research directions. In Towards Hardware-Intrinsic Security (pp. 3-37). Springer, Berlin, Heidelberg.
- [156] Eiroa, S., Castro, J., Martínez-Rodríguez, M.C., Tena, E., Brox, P. and Baturone, I., 2012, December. Reducing bit flipping problems in SRAM physical unclonable functions for chip identification. In 2012 19th IEEE International Conference on Electronics, Circuits, and Systems (ICECS 2012) (pp. 392-395). IEEE.
- [157] Maes, R., Tuyls, P. and Verbauwhede, I., 2009, June. A soft decision helper data algorithm for SRAM PUFs. In 2009 IEEE International Symposium on Information Theory (pp. 2101-2105). IEEE.
- [158] Xiao, K., Rahman, M.T., Forte, D., Huang, Y., Su, M. and Tehranipoor, M., 2014, May. Bit selection algorithm suitable for high-volume production of SRAM-PUF. In 2014 IEEE International Symposium on Hardware-Oriented Security and Trust (HOST) (pp. 101-106). IEEE.

- [159] Helfmeier, C., Boit, C., Nedospasov, D. and Seifert, J.P., 2013, June. Cloning physically unclonable functions. In 2013 IEEE International Symposium on Hardware-Oriented Security and Trust (HOST) (pp. 1-6). IEEE.
- [160] Maiti, A., Gunreddy, V. and Schaumont, P., 2013. A systematic method to evaluate and compare the performance of physical unclonable functions. In Embedded systems design with FPGAs (pp. 245-267). Springer, New York, NY.
- [161] Wang, W., Guin, U. and Singh, A., 2020. Aging-Resilient SRAM-based True Random Number Generator for Lightweight Devices. Journal of Electronic Testing, 36, pp.301-311.
- [162] Burd, T.D., Pering, T.A., Stratakos, A.J. and Brodersen, R.W., 2000. A dynamic voltage scaled microprocessor system. IEEE Journal of Solid-State Circuits, 35(11), pp.1571-1580.
- [163] Mizuno, H., Ishibashi, K., Shimura, T., Hattori, T., Narita, S., Shiozawa, K., Ikeda, S. and Uchiyama, K., 1999. An 18-μA standby current 1.8-V, 200-MHz microprocessor with self-substrate-biased data-retention mode. IEEE Journal of Solid-State Circuits, 34(11), pp.1492-1500.
- [164] Myers, J., Savanth, A., Howard, D., Gaddh, R., Prabhat, P. and Flynn, D., 2015, February. 8.1 An 80nW retention 11.7 pJ/cycle active subthreshold ARM Cortex-M0+ subsystem in 65nm CMOS for WSN applications. In 2015 IEEE International Solid-State Circuits Conference-(ISSCC) Digest of Technical Papers (pp. 1-3). IEEE.
- [165] Liu, C.Q., Cao, Y. and Chang, C.H., 2017. ACRO-PUF: A low-power, reliable and aging-resilient current starved inverter-based ring oscillator physical unclonable function. IEEE Transactions on Circuits and Systems I: Regular Papers, 64(12), pp.3138-3149.
- [166] Zhao, X., Gan, P., Zhao, Q., Liang, D., Cao, Y., Pan, X. and Bermak, A., 2019. A 124 fJ/bit cascode current mirror array based PUF with 1.50% native unstable bit ratio. IEEE Transactions on Circuits and Systems I: Regular Papers, 66(9), pp.3494-3503.
- [167] Shifman, Y., Miller, A., Keren, O., Weizman, Y. and Shor, J., 2020. A Method to Utilize Mismatch Size to Produce an Additional Stable Bit in a Tilting SRAM-Based PUF. IEEE Access, 8, pp.219137-219150.
- [168] Shifman, Y., Miller, A., Keren, O., Weizman, Y. and Shor, J., 2020. An SRAM-based PUF with a capacitive digital preselection for a 1E-9 key error probability. IEEE Transactions on Circuits and Systems I: Regular Papers, 67(12), pp.4855-4868.

- [169] Liu, K., Min, Y., Yang, X., Sun, H. and Shinohara, H., 2020. A 373-F² 0.21%-Native-BER EE SRAM Physically Unclonable Function With 2-D Power-Gated Bit Cells and V<sub>SS</sub> Bias-Based Dark-Bit Detection. IEEE Journal of Solid-State Circuits, 55(6), pp.1719-1732.
- [170] Maes, R., Tuyls, P. and Verbauwhede, I., 2009, September. Low-overhead implementation of a soft decision helper data algorithm for SRAM PUFs. In International Workshop on Cryptographic Hardware and Embedded Systems (pp. 332-347). Springer, Berlin, Heidelberg.
- [171] Fujiwara, H., Yabuuchi, M., Nakano, H., Kawai, H., Nii, K. and Arimoto, K., 2011, June. A chip-ID generating circuit for dependable LSI using random address errors on embedded SRAM and on-chip memory BIST. In 2011 Symposium on VLSI Circuits - Digest of Technical Papers (pp. 76-77). IEEE.
- [172] Van Herrewege, A., Schaller, A., Katzenbeisser, S. and Verbauwhede, I., 2013, November. Inherent PUFs and secure PRNGs on commercial off-the-shelf microcontrollers. In Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security (pp. 1333-1336).

# Vita Auctoris

NAME: Neda Rezaei

EDUCATION: Bachelor of Electrical Engineering, Azad University, Iran, 2009.

Master of Electrical Engineering, University of Shahid Chamran, Iran, 2013.

PUBLICATIONS: Rezaei, N. and Mirhassani, M., 2021. Ultra Low-Power Negative DC Voltage Generator Based on a Proposed Level Shifter and Voltage Reference. Microelectronics Journal, p.105087.

Rezaei, N. and Mirhassani, M., 2021. An Efficient High Speed and Low Power Voltage-Level Shifter. AEUE - International Journal of Electronics and Communications.