DOCTORAL THESIS

# A Study on Ultra-Low Power and Large-Scale Design of Digital Circuit for Wireless Communications

Hokkaido University

Mohd Shamian Zainal

July 2010

## Preface

The continuous growth of mobile and portable devices has created a greater push toward low-power circuit design. Various methods and techniques have been developed, for example, the use of concurrent or pipelined architectures with low supply voltage for traditional circuits. Properly designed subthreshold circuits operating in the weak inversion region achieve ultra-low threshold and supply voltages and have been studied for both analog and digital circuits.

Analog circuits have been studied and implemented in many areas such as speech signal and image processing. Digital circuits, on the other hand, have been studied for very low clock frequencies and can be applied in medical devices such as pacemakers and defibrillators. For the low-power idle state, subthreshold voltage conditions have been used in microprocessors for ultra-low voltage operation and low leakage current. The idea of studying subthreshold operation follows much earlier research based on conventional analysis, focusing on, for example, low power, low voltage, low frequency, and applications in small circuit systems.

Recently, as a result of the aggressive scaling of transistor size for high-performance applications, not only does subthreshold leakage current increase exponentially, but gate leakage and reverse-biased source-substrate and drain-substrate junction band-to-band tunneling (BTBT) currents also increase significantly. The tunneling currents are detrimental to the functionality of the devices. The well-known methods of low-power design (such as voltage scaling, switching activity reduction, architectural techniques of pipelining and parallelism, and computer-aided design (CAD) techniques of device sizing, interconnect, and logic optimization) may not be sufficient in many applications, such as portable computing gadgets and medical electronics, where ultra-low power consumption with a medium frequency of operation is the primary requirement. To cope with this, several novel design techniques have been proposed. Energy recovery or adiabatic techniques are promising for reducing power in computation by orders of magnitude. However, they involve the use of high-quality inductors, which makes integration difficult. More recently, the design of digital subthreshold logic has been investigated with transistors operated in the subthreshold region.

The aim of this study is to achieve ultra-low power communication circuits operating at high frequency. To this end, we focus on implementing large-scale subthreshold circuits, which requires a new design approach that uses only the standard CMOS cell library and simplifies the modeling procedure for subthreshold circuits. Conventional design involves subthreshold analysis at the transistor level or cell library preparation under multiple voltage conditions. This procedure has the disadvantage of requiring a long time to estimate circuit performance in the subthreshold region.

We propose scale modeling, which requires only a typical cell library and is suitable for large-scale digital circuits such as wireless communication circuits. In the proposed method, the circuit delay and power dissipation of each CMOS logic cell operating in the subthreshold region are analyzed, and scaling factors are obtained by mapping from typical to subthreshold voltage conditions. This process does not require a special-purpose CMOS library for the subthreshold region. The critical path delay is also obtained from the scaling factors and is used to determine the optimal voltage condition that satisfies the required timing constraints. As practical examples, we have designed the wireless circuits of a channel equalizer, an FIR filter, and an FFT used in an OFDM receiver. The power dissipation of these circuits has been reduced by adjusting the overall voltage conditions to satisfy the timing constraints of the IEEE 802.11a standard.

Continuing from the first research topic, we explore power reduction through dynamic wordlength and voltage scaling for digital signal processing circuits. The choice of wordlength in digital signal processing (DSP) affects system performance, hardware size, and power consumption. A large wordlength yields better performance in digital hardware but increases power consumption. A small wordlength degrades system performance if the dynamic range is insufficient. A fixed wordlength determined at the design level lacks flexibility in changing environments. A dynamic variable wordlength technique can maintain system performance and keep power consumption low by dynamically selecting an optimal wordlength for each environment. This technique has been applied to an OFDM demodulator and to an equalizer. There are two ways to reduce power in variable wordlength circuits. One is to decrease switching activity by stopping unnecessary bit operations. Variable wordlength chooses between small and large wordlength modes. In a small wordlength mode, unused bits can be masked with zero values. Gated clocks are effective in halting switching activity in registers. However, they require clock management in the system. The other is voltage scaling (called the minimum power locus) to normalize the circuit delay across wordlength modes. A small wordlength mode has a timing margin on the critical path when the timing of the large wordlength is adopted. This margin allows the voltage to be decreased until the circuit delay equals that of the large wordlength. Thus, dynamic wordlength and voltage scaling (DWVS) is suitable for power reduction in variable wordlength architectures.

Our second research focus is power modeling for DWVS. Previous work uses transistor-level simulation or actual measurements to analyze the power consumption of variable wordlength circuits. However, faster analysis and estimation at the gate level and function level are required for large-scale circuits. We present a new power modeling approach in which both voltage scaling and switching activities are modeled as DWVS parameters.

## Contents

| Section | Title |
|---------|-------|
| 1 | Introduction |
| 2 | Low Power Design |
| 2.1 | Introduction |
| 2.2 | CMOS Circuits |
| 2.3 | Subthreshold Circuit |
| 2.4 | Power Modeling |
| 2.5 | Voltage Scaling |
| 2.6 | Frequency Scaling |
| 2.7 | Clock Gating and Power Gating |
| 2.8 | Variable Wordlength |
| 2.9 | Cell Library |
| 2.10 | Summary |
| 3 | Wireless Circuit Design |
| 3.1 | OFDM |
| 3.2 | OFDM Transceiver |
| 3.2.1 | FEC Coder |
| 3.2.2 | Mapper |
| 3.2.3 | IFFT |
| 3.2.4 | Guard Interval |
| 3.3 | Low Pass FIR Filter |
| 3.4 | FFT |
| 3.5 | Channel Equalizer |
| 3.6 | RTL Design |
| 3.7 | Summary |
| 4 | Parameter Scaling for Subthreshold Circuits |
| 4.1 | Introduction |
| 4.2 | Proposed Method |
| 4.3 | Evaluation |
| 4.3.1 | Channel Equalizer |
| 4.3.2 | FFT |
| 4.3.3 | FIR Filter |
| 4.4 | Consideration |
| 4.5 | Summary |
| 5 | Dynamic Wordlength and Voltage Scaling for Variable Wordlength Circuits |
| 5.1 | Introduction |
| 5.2 | Dynamic Wordlength and Voltage Scaling |
| 5.3 | Proposed Method |
| 5.4 | Evaluation |
| 5.5 | Summary |
| 6 | Conclusion and Future Works |
| 6.1 | Conclusion |
| 6.2 | Contribution of this Work |
| 6.3 | Future Research |
| | Bibliography |
| | List of Author's Publications |
| | Appendix |


## List of Figures

| Figure | Caption |
|--------|---------|
| 2.1 | CMOS circuit operation |
| 2.2 | Power dissipation based on transition |
| 2.3 | Power dissipation based on independent switching |
| 2.4 | Power dissipation based on short-circuit |
| 3.1 | Frequency responses of the subcarriers in a 4 tone OFDM signal |
| 3.2 | Block diagram of the 802.11a OFDM transceiver |
| 3.3 | Addition of a guard period to an OFDM signal |
| 3.4 | FIR filter architecture |
| 3.5 | Pipeline FFT structure |
| 3.6 | Channel equalizer architecture |
| 4.1 | MOS transistor in subthreshold region |
| 4.2 | Region of operation of digital subthreshold circuit |
| 4.3 | Flowchart for conventional modeling process with different supply voltage |
| 4.4 | Flowchart for conventional modeling process with multiple supply voltage |
| 4.5 | Flowchart for proposed modeling process |
| 4.6 | Flowchart for proposed scaling process |
| 4.7 | Flowchart for modeling process: (a) conventional method and (b) proposed method |
| 4.8 | Maximum delay for each logic cell |
| 4.9 | Maximum critical path delay ($T_{i}$) |
| 4.10 | Power dissipation for each logic cell |
| 4.11 | Scaling factor ($S_k$) for power dissipation |
| 4.12 | Maximum power dissipation ($P_k$) |
| 4.13 | Power dissipation for channel equalizer |
| 4.14 | Critical path delay for channel equalizer |
| 4.15 | Power consumption for pipeline FFT |
| 4.16 | Critical path delay for pipeline FFT |
| 4.17 | Power dissipation for FIR filter |
| 4.18 | Critical path delays for FIR filter |
| 5.1 | Variable wordlength mechanism |
| 5.2 | Environment condition |
| 5.3 | Example of logic units |
| 5.4 | Gated clock |
| 5.5 | Modeling process power reduction |
| 5.6 | Delay for NAND gate |
| 5.7 | Wordlength characteristic with different delay |
| 5.8 | Wordlength characteristic with different power |


## 1 Introduction

The constant evolution of handheld and portable devices has created a greater demand for low-power circuit design. Various methods and techniques have been developed, for example, the use of concurrent or pipelined architectures with low supply voltage for traditional circuits [1]. Properly designed subthreshold circuits operating in the weak inversion region achieve ultra-low threshold and supply voltages and have been studied for both analog and digital circuits [2].

Analog circuits have been studied and implemented in many fields such as speech signal and image processing [3]. Digital circuits, on the other hand, have been studied for very low clock frequencies and can be applied in medical devices such as pacemakers and defibrillators. For the low-power idle state, subthreshold voltage conditions have been used in microprocessors for ultra-low voltage operation and low leakage current. The idea of studying subthreshold operation follows much earlier research based on conventional analysis, focusing on, for example, low power, low voltage, low frequency, and applications in small circuit systems [4], [5].

Because of the rapid scaling of transistor size for high-performance applications, not only does subthreshold leakage current increase exponentially, but gate leakage and reverse-biased source-substrate and drain-substrate junction band-to-band tunneling (BTBT) currents also increase significantly. Tunneling currents are detrimental to the functionality of the devices. The well-known methods of low-power design (such as voltage scaling, switching activity reduction, architectural techniques of pipelining and parallelism, and computer-aided design (CAD) techniques of device sizing, interconnect, and logic optimization) may not be sufficient in many applications, such as portable computing gadgets and medical electronics, where ultra-low power consumption with a medium frequency of operation is the primary requirement [6]. To cope with this, several novel design techniques have been proposed. Energy recovery or adiabatic techniques are promising for reducing power in computation by orders of magnitude. However, they involve the use of high-quality inductors, which makes integration difficult. More recently, the design of digital subthreshold logic has been investigated with transistors operated in the subthreshold region.

The objective of this research is to attain ultra-low power communication circuits operating at high frequency. The word "ultra" refers to a condition far beyond the usual norm, that is, to an extreme. In low-voltage electronic circuits, "low power" refers to situations where electrical power consumption is deliberately kept low. Ultra-low power therefore refers to an electronic device with power consumption in the milliwatt or microwatt range. Large-scale digital circuits with milliwatt or microwatt power consumption can be developed through research on the subthreshold region.

Here, we concentrate on implementing large-scale subthreshold circuits, which requires a new design approach that uses only the standard CMOS cell library and simplifies the modeling procedure for subthreshold circuits. Conventional design involves subthreshold analysis at the transistor level or cell library preparation under multiple voltage conditions [7]-[9]. This procedure has the disadvantage of requiring a long time to estimate circuit performance in the subthreshold region if many voltages are tested.

We introduce scale modeling, which requires only a typical cell library and is suitable for large-scale digital circuits such as wireless communication circuits. In the proposed method, the circuit delay and power dissipation of each CMOS logic cell operating in the subthreshold region are analyzed, and scaling factors are obtained by mapping from typical to subthreshold voltage conditions. This process does not require a special-purpose CMOS library for the subthreshold region. The critical path delay is also obtained from the scaling factors and is used to determine the optimal voltage condition that satisfies the required timing constraints. As practical examples, we have designed the wireless circuits of a channel equalizer, an FIR filter, and an FFT used in an OFDM receiver. The power dissipation of these circuits has been reduced by adjusting the overall voltage conditions to satisfy the timing constraints of the IEEE 802.11a standard.
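The voltage selection step described above can be illustrated with a minimal Python sketch. It is not the tool flow used in this thesis: the function name, the table of delay scaling factors, the 5 ns typical critical path delay, and the 50 ns clock period are all hypothetical values chosen only to show the idea of scaling a typical-library delay and picking the lowest voltage that still meets timing.

```python
def lowest_voltage_meeting_timing(typical_delay_ns, delay_scaling, clock_period_ns):
    """Return the lowest supply voltage whose scaled critical path delay
    fits within the clock period, or None if no voltage qualifies.

    delay_scaling maps a supply voltage (V) to a multiplicative delay
    scaling factor relative to the typical library condition.
    """
    feasible = [
        vdd
        for vdd, factor in delay_scaling.items()
        if typical_delay_ns * factor <= clock_period_ns
    ]
    return min(feasible) if feasible else None


# Hypothetical scaling factors (assumed for illustration, not measured data).
delay_scaling = {1.8: 1.0, 1.2: 2.0, 0.8: 6.0, 0.5: 40.0, 0.3: 400.0}

# Hypothetical design: 5 ns critical path at the typical condition,
# 50 ns clock period.
best_vdd = lowest_voltage_meeting_timing(5.0, delay_scaling, 50.0)
print(best_vdd)
```

With these assumed numbers, the supply can be lowered only as far as the scaled delay stays within the clock period; any lower voltage in the table violates the timing constraint.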

Following our first research topic, we explore power reduction through dynamic wordlength and voltage scaling for digital signal processing circuits. The choice of wordlength in digital signal processing (DSP) affects system performance, hardware size, and power consumption [10], [11]. A large wordlength yields better performance in digital hardware but increases power consumption. A small wordlength degrades system performance if the dynamic range is insufficient. A fixed wordlength determined at the design level lacks flexibility in changing environments. A dynamic variable wordlength technique can maintain system performance and keep power consumption low by dynamically selecting an optimal wordlength for each environment. This technique has been applied to an OFDM demodulator [12], [13] and to an equalizer [14]. There are two ways to reduce power in variable wordlength circuits. One is to decrease switching activity by stopping unnecessary bit operations. Variable wordlength chooses between small and large wordlength modes. In a small wordlength mode, unused bits can be masked with zero values. Gated clocks are effective in halting switching activity in registers [15]. However, they require clock management in the system. The other is voltage scaling (called the minimum power locus in [16]) to normalize the circuit delay across wordlength modes. A small wordlength mode has a timing margin on the critical path when the timing of the large wordlength is adopted. This margin allows the voltage to be decreased until the circuit delay equals that of the large wordlength. Thus, dynamic wordlength and voltage scaling (DWVS) is suitable for power reduction in variable wordlength architectures.
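The first of the two mechanisms, masking unused bits with zeros to cut switching activity, can be sketched in a few lines of Python. This is an illustrative assumption rather than the circuit used in this work: the text does not say which bits are unused, so the sketch assumes the least significant bits are dropped, and the function names and the 8-bit sample values are hypothetical.

```python
def mask_to_wordlength(value, full_bits, active_bits):
    """Zero the (full_bits - active_bits) least significant bits of value.

    Assumption: the small wordlength mode keeps the most significant bits.
    """
    dropped = full_bits - active_bits
    mask = ((1 << full_bits) - 1) & ~((1 << dropped) - 1)
    return value & mask


def toggle_count(samples):
    """Count bit transitions between consecutive samples on a data bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(samples, samples[1:]))


# Hypothetical 8-bit data stream.
samples = [0b10110111, 0b01001000, 0b11110001, 0b00001110]

# Small wordlength mode: keep 4 of 8 bits.
masked = [mask_to_wordlength(s, full_bits=8, active_bits=4) for s in samples]

print(toggle_count(samples), toggle_count(masked))
```

The masked stream toggles fewer bits per sample, which is the source of the dynamic power saving; the remaining timing margin is what voltage scaling then exploits.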

Our second research focus is power modeling for DWVS. The work in [16] uses transistor-level simulation or actual measurements to analyze the power consumption of variable wordlength circuits. However, faster analysis and estimation at the gate level and function level are required for large-scale circuits. We present a new power modeling approach in which both voltage scaling and switching activities are modeled as DWVS parameters.

This work consists of six chapters which are organized as follows.

In Chapter 2, we describe previous work related to low power design such as CMOS circuits, subthreshold circuit, power modeling, voltage scaling, frequency scaling, clock gating, power gating, variable wordlength, and cell library.

In Chapter 3, we explain the complete OFDM system. We also present our laboratory wireless circuits as examples for evaluating power consumption and critical path delay from the subthreshold to the super-threshold region. Three wireless circuits are explained in this chapter: a low pass FIR filter, an FFT, and a channel equalizer. We also explain the RTL script used as an input for voltage scaling.

In Chapter 4, we present parameter scaling for subthreshold circuits. We also explain our proposed model for evaluating large-scale digital circuits. We use the voltage scaling technique for power reduction, which reduces the total power consumption by reducing the supply voltage. We evaluate the power consumption and critical path delay of the three wireless circuits.


In Chapter 5, we report the implementation of dynamic wordlength and voltage scaling in variable wordlength circuits. We also explain the concept of dynamic wordlength and voltage scaling for wireless communication systems and describe the proposed method for power reduction. Next, we evaluate the effects of power dissipation, critical path delay, variable wordlength, and activity factor for a low pass FIR filter.

In Chapter 6, we summarize the contributions of this thesis.

## 2 Low Power Design

#### 2.1 Introduction

Low power has emerged as a principal theme in today's electronics industry. The need for low power has caused a major paradigm shift in which power dissipation has become as important a consideration as performance and area. The importance of power reduction in digital systems is now widely recognized as key to enhancing the value of battery-powered devices such as portable computers and mobile phones. Particularly helpful is the dramatic improvement in energy efficiency obtained by reducing the supply voltage. However, voltage scaling alone is not enough as complexity increases and as applications that rely on a portable energy source demand further energy reduction [17]. In terms of software design, digital signal processing can also reduce power. Power can be reduced through low-power design techniques such as voltage scaling, frequency scaling, and optimization of the CMOS logic cells [18]. Currently, no single technology can solve all the problems associated with reducing power consumption. This motivates us to seek a different method for reducing the power consumption of digital circuits.

The purpose of Chapter 2 is to review previous studies related to low-power design. We first review CMOS circuits and then subthreshold circuits, which are studied for several reasons, including the need to reduce heat in electronic devices. Power modeling is also discussed, covering dynamic power, static power, and leakage power. The voltage scaling method, which reduces power by lowering the voltage that appears directly in the total power equation, is described next, while the frequency scaling method reduces power dissipation by operating at a different frequency. Clock gating is the most common optimization for reducing dynamic power: it prevents signal transitions of invalid data from propagating while holding the current data at a stable value. Changing the variable wordlength from one state to another depending on the channel environment can save power while keeping the system stable. Finally, power reduction based on the cell library is achieved when each cell is built to require little power, so that the total power consumption of a device built from these cells is reduced.
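To make the role of voltage and frequency in the power equation concrete, the following Python sketch uses the standard first-order textbook expression for dynamic power, $P = \alpha C V_{dd}^2 f$. The expression is assumed here for illustration and is not quoted from this chapter; the function name and all numeric values (activity factor, capacitance, voltages) are hypothetical.

```python
import math


def dynamic_power(activity, capacitance_f, vdd, freq_hz):
    """First-order dynamic power in watts: activity * C * Vdd^2 * f.

    activity      -- average switching activity factor (0 to 1)
    capacitance_f -- switched capacitance in farads
    vdd           -- supply voltage in volts
    freq_hz       -- clock frequency in hertz
    """
    return activity * capacitance_f * vdd ** 2 * freq_hz


# Hypothetical operating points at a 20 MHz clock.
p_nominal = dynamic_power(0.1, 1e-12, 1.8, 20e6)
p_half_vdd = dynamic_power(0.1, 1e-12, 0.9, 20e6)
p_half_freq = dynamic_power(0.1, 1e-12, 1.8, 10e6)

# Halving Vdd cuts dynamic power to a quarter; halving f only halves it.
print(p_half_vdd / p_nominal, p_half_freq / p_nominal)
```

Under this model, voltage scaling has a quadratic effect on dynamic power while frequency scaling has only a linear effect, which is why lowering the supply voltage is the more effective lever when timing allows it.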

#### 2.2 CMOS Circuits

Complementary metal oxide semiconductor (CMOS) refers to a particular style of digital circuit design and to the processes used to implement such circuits on integrated circuits (ICs). CMOS dominates modern integrated circuit manufacturing [19]. This is because CMOS circuits dissipate less power than logic families with resistive loads.

CMOS circuits use a combination of p-type and n-type metal-oxide-semiconductor field-effect transistors (MOSFETs) to implement logic gates and other digital circuits found in electronic devices such as computers, telecommunication equipment, and signal processing equipment. Although CMOS logic can be implemented with discrete devices, typical products are CMOS integrated circuits consisting of millions of transistors of both types on a piece of silicon between 0.1 and 4 square centimeters.

CMOS circuits are built so that every PMOS transistor has an input either from the voltage source or from another PMOS transistor. Similarly, every NMOS transistor has an input either from ground or from another NMOS transistor [20]. A PMOS transistor presents low resistance between its source and drain contacts when a low gate voltage is applied and high resistance when a high gate voltage is applied. An NMOS transistor, on the other hand, presents high resistance between its source and drain when a low gate voltage is applied and low resistance when a high gate voltage is applied.

Figure 2.1 shows an input A connected to both a PMOS transistor (top of the diagram) and an NMOS transistor (bottom of the diagram). When the input voltage is low, the NMOS transistor is in a high-resistance state, which blocks the current that could flow from Q to ground. The PMOS transistor is in a low-resistance state, so more current flows from the supply to the output. Because the resistance between the supply voltage and Q is low, the voltage drop between the supply voltage and Q due to a current drawn from Q is small, and the output registers a high voltage.



Figure 2.1 CMOS circuit operation

On the other hand, when the voltage of input A is high, the PMOS transistor is in the off state and blocks current flow from the positive supply to the output, while the NMOS transistor is in the on state and allows the output to drain to ground. Because the resistance between Q and ground is low, the voltage drop that a current drawn into Q places between Q and ground is small. This low drop results in the output registering a low voltage.

In short, the output of the PMOS and NMOS transistor pair changes depending on the input. When the input is low, the output becomes high. When the input is high, the output becomes low. Because of this behavior, the output of the CMOS circuit is the inverse of the input.
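The inverter behavior described above can be captured in a small switch-level model. The Python sketch below is only an idealized illustration: it treats each transistor as a perfect switch, ignores resistance, delay, and leakage, and uses a hypothetical function name.

```python
def cmos_inverter(a):
    """Idealized switch-level model of a CMOS inverter.

    a is the logic value of input A (0 = low, 1 = high).
    Returns the logic value at output Q.
    """
    if a not in (0, 1):
        raise ValueError("input must be 0 or 1")

    pmos_on = (a == 0)  # PMOS conducts when its gate is low
    nmos_on = (a == 1)  # NMOS conducts when its gate is high

    # Exactly one transistor conducts for a valid logic input, so there is
    # never a static path from the supply to ground in this ideal model.
    assert pmos_on != nmos_on

    # PMOS on: Q is pulled up to the supply. NMOS on: Q is pulled to ground.
    return 1 if pmos_on else 0


for a in (0, 1):
    print(a, cmos_inverter(a))
```

Because only one of the two transistors conducts in either stable state, an ideal CMOS gate draws current only while switching, which is the basis of the low static dissipation noted at the start of this section.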

A method for reducing standby leakage current by controlling the voltage of the MOSFET source node was proposed by Bansal et al. [21]. The method allows low-threshold devices to be used for high speed and low power in active and standby modes. It can easily be applied to a conventional ASIC library circuit because no additional process, circuit, or device is required, except where the body contact of some cells must be modified. With reference to [21], an in-depth understanding of transistor behavior can be used to maximize the use of digital circuits in low-voltage operation.

The use of a digital-to-analog converter (DAC) based circuit structure to produce a modulated Gaussian for the low-frequency (3.1 to 5 GHz) and high-frequency (6 to 10.6 GHz) UWB bands is explained in [22]. The reference voltages of the DAC network are obtained with a constant-cycle approach. Hspice simulation is used to determine the power consumption of the DAC circuit. With reference to [22], Hspice is suitable for evaluating digital circuits in 0.18-µm CMOS technology with a supply voltage of 1.8 V or lower.

The work of [23] proposed a low-power, high-reliability CMOS current limit circuit that reduces power dissipation and increases system reliability by implementing a foldback current limit circuit. The circuit was produced in 0.6-µm CMOS technology, and Hspice simulation results demonstrate its feasibility. With reference to [23], the use of a foldback circuit can improve reliability and reduce the power dissipation of a circuit. However, the voltage scaling method is more convenient for reducing power dissipation in digital circuits, and it also improves reliability.

A new analytical modeling approach for assessing the impact of single event transients (SET) in CMOS circuits is explained by Wirth et al. [24]. The model allows the transient amplitude and width (duration) to be evaluated at the logic level without running circuit-level simulations. SET mechanisms are usually investigated by circuit-level simulation of MOS circuits, for example with Hspice. The availability of simple models at the logic gate level is likely to improve the sensitivity analysis of circuits. With reference to [24], circuit simulation can be used to assess the impact of SET in CMOS circuits. However, this method requires extensive preliminary research to assess how SET affects the functionality of CMOS logic circuits.

The authors of [25] studied a low-power CMOS circuit design that adopts a gradually changing power clock. A clocked CMOS gate structure is presented, and the design of combinational clocked circuits is analyzed. Hspice simulations demonstrate the low-power characteristics of clocked CMOS circuits using a trapezoidal power clock. With reference to [25], the method is somewhat similar to the one we use for evaluating power and delay in digital circuits. However, we do not use a trapezoidal power clock; instead, we set the clock frequency using a 20 MHz signal.

Pamklang et al. [26] investigated a CMOS current-controlled output driver circuit suitable for integrated circuits with a low supply voltage. The circuit consists of negative and positive current mirrors and a basic CMOS inverter. The current mirror bias circuit limits the current in the transition region, a technique that is advantageous for low-power operation. Although current mirrors are quite promising for power reduction, they are complex to use when evaluating large digital circuits. With reference to [26], this style of digital circuit topology is capable of operating at low voltage. However, the method is difficult to implement because the topology depends on the purpose of the circuit.

Power reduction in a CMOS motion estimator by reducing the active power is illustrated in [27]. The proposed circuit architecture reduces the supply voltage and the number of logic gates, uses a fast motion estimation algorithm, and reduces the leakage current of the circuits. In [27], power is decreased by reducing the supply voltage. We use the same method to evaluate ultra-low power digital circuits that operate at low voltage.

Another way to reduce power dissipation, using the Domino and Nora circuit styles, was presented in 2002 by Samanta et al. [28]. In this study, the delay and power of the circuit are estimated based on models of the Domino and Nora cells. The results are then compared with static CMOS circuits synthesized by SIS using standard devices. This approach achieves better results in terms of area, delay, and power consumption than existing approaches. With reference to [28], the Domino and Nora circuit styles are highly successful in reducing power dissipation in digital circuits. However, to develop large-scale digital circuits, the Domino and Nora circuit styles must be maintained in every design. Therefore, we consider the best way to evaluate ultra-low power digital circuits to be using the original characteristics of existing transistors together with the method of scaling the input voltage.

#### 2.3 Subthreshold Circuit

Subthreshold circuits, which operate at supply voltages lower than the threshold voltage, are considered for ultra-low power systems. Studies of subthreshold circuits have been presented in terms of scale and application, i.e., low-power microprocessor design, investigation of analog and digital circuits [3], circuit modeling methodology [7], operation for a variety of devices [28], and device modeling and optimization [3]. We explain the circuit modeling methodology [7]-[9] in connection with our work and its problems.

In [29], A. Tajalli et al. presented a power-delay improvement for ultra-low power subthreshold source-coupled logic (SCL) circuits. Their analysis, confirmed by measurements in 0.18-µm CMOS technology, shows an improvement by a factor of up to 2.4 in the power-delay product (PDP). It also shows that the proposed method (a source-follower buffer) can be used to implement very low-power subthreshold SCL logic gates with better power and area efficiency than traditional subthreshold SCL circuits. According to [29], operating in the subthreshold region can help optimize the power-delay product, so subthreshold studies help in developing low-voltage digital circuits.

Leakage current is the main source of power dissipation in submicron digital circuits used at low frequencies, as shown by Casey et al. [30]. This observation motivates a novel active-mode leakage reduction technique for ultra-low power (ULP) low-frequency applications. It is based on a ULP CMOS logic style that achieves a negative self-biasing of $V_{GS}$. The static power of ULP logic gates is thereby reduced several times. For a commercial 0.13-µm technology, the power consumption of ULP gates at low frequency is lower than that of their standard CMOS counterparts because of higher-$V_t$ devices, subthreshold operation, and reverse body biasing. ULP logic appears very stable against process, voltage and temperature variations.

The growing use of battery-powered portable devices that run modern applications has increased the demand for ultra-low power design. Many circuit techniques have been successfully implemented to reduce dynamic power and leakage power. Circuits operating in the subthreshold region use a supply voltage ($V_{dd}$) that is close to or even less than the threshold voltage of the transistors. Operating at such a low $V_{dd}$ results in ultra-low power dissipation but significantly increases the propagation delay of the circuit [31]. The supply voltage of a digital circuit can be reduced gradually, provided that the resulting delay still allows the circuit to operate at the desired frequency.

Supply voltage scaling is one of the easiest ways to reduce power dissipation. Therefore, Chang et al. [32] considered subthreshold logic a promising option for achieving ultra-low power dissipation. However, the circuit propagation delay is very sensitive to process, voltage and temperature (PVT) variations in subthreshold operation, so a conventional design requires a large delay margin to operate successfully. In their work, they explore an asynchronous design approach to overcome the challenges of operating in the subthreshold region. To demonstrate the proposed approach, they built an 8-tap FIR filter in 90-nm CMOS. The work in [32] thus shows how voltage scaling interacts with PVT variation in subthreshold operation. We use the same voltage scaling method in developing large-scale digital circuits, but we do not focus on the effects of process and temperature, because our research uses standard transistor characteristics.

#### 2.4 Power Modeling

The total power consumption $P_{Total}$ is the sum of the dynamic power, static power, and short-circuit power [33]. It is given by

$$P_{Total} = P_{Dynamic} + P_{Static} + P_{Short-Circuit} \tag{2.1}$$

In the subthreshold region, dynamic energy decreases because of the reduced supply voltage. Figure 2.2 shows the power dissipation due to transitions. The subthreshold current becomes the operating current, so the delay increases exponentially as the voltage is scaled down. Since the leakage energy per operation is proportional to the circuit delay, it increases as the supply voltage is reduced. As the supply voltage is lowered into the subthreshold region, a minimum energy point is reached where the reduction in dynamic energy can no longer compensate for the increase in leakage energy. The static power is given by

$$P_{\text{Static}} = I_{\text{leak}} V_{dd} \tag{2.2}$$

The static power $P_{Static}$ is due to the leakage current $I_{leak}$ of the transistors, which includes subthreshold conduction between source and drain and reverse-bias pn-junction leakage between the source/drain regions and the substrate [9]. Figure 2.3 shows the power dissipation based on independent switching. During input signal transitions, both the NMOS and PMOS blocks conduct simultaneously for a short period of time, causing a direct current to flow from the power supply to ground, which results in short-circuit power. The short-circuit power is given by

$$P_{Short-Circuit} = C_{sc} V_{dd}^{2} f \tag{2.3}$$

The short-circuit power dissipation can be written in the same manner as the dynamic power dissipation using an equivalent capacitance concept [33], where $f$ is the switching frequency and $C_{sc}$ is the capacitance component obtained from the mean accumulated charge during the time that the short-circuit current exists. Figure 2.4 shows the power dissipation due to the short-circuit current. A more accurate formula for short-circuit power dissipation was developed later by other researchers [3]. The time delay, expressed with the square-law drain current $I$, is

$$T_{delay} = \frac{C_L \times V_{dd}}{I} = \frac{C_L \times V_{dd}}{\mu C_{ox} (W/L) (V_{dd} - V_t)^2} \tag{2.4}$$

where $C_L$ is the load capacitance, $\mu$ is the carrier mobility, $C_{ox}$ is the gate oxide capacitance per unit area, $W$ and $L$ are the channel width and length, and $V_t$ is the device threshold voltage.
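Equations (2.1) to (2.4) can be evaluated directly, as in the following minimal Python sketch. The function names, the activity factor, and every numeric value are illustrative assumptions rather than data from this work; the dynamic power is assumed to take the usual $C V_{dd}^2 f$ form that Eq. (2.3) mirrors.

```python
# Sketch of the power model in Eqs. (2.1)-(2.4).
# All names and numeric values below are illustrative assumptions.

def dynamic_power(c_load, vdd, freq, activity=1.0):
    # Assumed standard form activity * C * Vdd^2 * f (activity factor is hypothetical).
    return activity * c_load * vdd ** 2 * freq

def static_power(i_leak, vdd):
    # Eq. (2.2): leakage current times supply voltage.
    return i_leak * vdd

def short_circuit_power(c_sc, vdd, freq):
    # Eq. (2.3): equivalent-capacitance form of the short-circuit power.
    return c_sc * vdd ** 2 * freq

def total_power(c_load, c_sc, i_leak, vdd, freq, activity=1.0):
    # Eq. (2.1): sum of the three components.
    return (dynamic_power(c_load, vdd, freq, activity)
            + static_power(i_leak, vdd)
            + short_circuit_power(c_sc, vdd, freq))

def gate_delay(c_load, vdd, vt, mu_cox, w_over_l):
    # Eq. (2.4): C_L * Vdd divided by the square-law drain current.
    return c_load * vdd / (mu_cox * w_over_l * (vdd - vt) ** 2)

# Hypothetical gate: 10 fF load, 1 fF short-circuit capacitance, 1 nA leakage.
p_total = total_power(c_load=10e-15, c_sc=1e-15, i_leak=1e-9, vdd=1.8, freq=100e6)
print(f"P_total = {p_total * 1e6:.4f} uW")
```

With these assumed values the dynamic term dominates; lowering `vdd` in `gate_delay` shows how the delay grows as the supply approaches the threshold voltage.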

In [34], J. L. Rosselló and J. Segura presented an accurate analytical method for evaluating power consumption in CMOS buffers. It is derived from the MM9 physical MOSFET model (Velghe et al., 1994; Foty et al., 1997) and Sakurai's alpha-power law model. The resulting analytical model accounts for the effects of input voltage changes, device size, carrier velocity saturation, input-to-output coupling capacitance, output load, and temperature. Comparisons with HSPICE level 50 simulation results and with previously published models, over a large set of parameters for 0.18-µm and 0.35-µm technologies, showed significant improvement.

In next-generation CMOS technologies, the supply and threshold voltages must continue to scale in order to sustain performance improvement, control switching power dissipation, and maintain reliability. Continued scaling of the supply and threshold voltages poses several technology and circuit design challenges. As the threshold voltage scales, subthreshold leakage power is expected to become a significant part of the total power in future CMOS systems. Therefore, it is important to predict the subthreshold leakage power of such systems. The authors of [35] used a subthreshold leakage power prediction model to account for threshold voltage variation in the subthreshold region.
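The minimum energy point described in this section can be illustrated with a small voltage sweep. The sketch below is a simplified model under stated assumptions, not the model used in [35]: the on-current and leakage current follow an exponential subthreshold law with an assumed slope factor, and the gate count, activity factor, logic depth, and capacitance are hypothetical values chosen only to make the trade-off visible.

```python
import math

# Assumed subthreshold slope factor (n = 1.5) times thermal voltage (~26 mV).
N_VT = 1.5 * 0.026

def energy_per_operation(vdd, c_gate=1e-15, n_gates=1000, activity=0.01,
                         logic_depth=20, vt=0.37, i0=1e-6):
    """Return (dynamic, leakage) energy in joules for one clock cycle.

    Every default value is a hypothetical illustration parameter.
    """
    i_on = i0 * math.exp((vdd - vt) / N_VT)      # on-current with Vgs = Vdd
    i_off = i0 * math.exp(-vt / N_VT)            # leakage current with Vgs = 0
    t_cycle = logic_depth * c_gate * vdd / i_on  # critical-path delay
    e_dyn = activity * n_gates * c_gate * vdd ** 2
    e_leak = n_gates * i_off * vdd * t_cycle     # leakage integrates over the cycle
    return e_dyn, e_leak

def minimum_energy_point(vdd_mv=range(100, 371, 10)):
    """Sweep the supply in millivolts and return the lowest-energy Vdd in volts."""
    best = min(vdd_mv, key=lambda mv: sum(energy_per_operation(mv / 1000)))
    return best / 1000

v_min = minimum_energy_point()
print(f"Minimum energy point near Vdd = {v_min:.2f} V")
```

In this model the dynamic energy falls quadratically with the supply while the leakage energy rises because the cycle time grows exponentially, so the total has an interior minimum below the threshold voltage.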



Figure 2.2 Power dissipation due to transition



Figure 2.3 Power dissipation based on independent switching



Figure 2.4 Power dissipation based on short-circuit

#### 2.5 Voltage Scaling

Scaling the supply voltage ($V_{dd}$) is one of the simplest approaches to reducing the energy dissipation of digital circuits. Since dynamic energy dissipation can be expressed as $CV_{dd}^2$ ($C$: effective capacitance) [36], supply scaling gives quadratic energy savings. To achieve ultra-low energy dissipation, researchers have lowered the supply voltage below the threshold voltage of the transistors ($V_t$) [37]. In [38], A. Wang et al. presented an FFT processor working at 180 mV. The work in [39] describes scaling an FIR filter down to 85 mV using a body-bias technique. These designs were implemented using conventional synchronous design approaches, in which all pipeline registers are triggered by a common clock signal whose cycle is determined by the worst-case propagation delay of the combinational logic blocks. It should be noted that circuit propagation delay is extremely sensitive to process, voltage and temperature (PVT) variations in subthreshold operation, resulting in several challenging design problems for conventional synchronous designs. For example, large delay margins are required to cope with the extreme sensitivity of logic delay, deteriorating battery life significantly. In addition, even small intra-die variations in the clock buffer tree lead to large clock skew. Hence, it is difficult to handle timing issues such as setup and hold time violations.
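As a worked illustration of the quadratic saving stated at the start of this section, consider scaling the supply from a nominal value $V_1$ down to $V_2 = 180$ mV, the operating point reported in [38]. The nominal supply $V_1 = 1.8$ V is an assumption made here for illustration and is not taken from [38]; the effective capacitance $C$ is assumed to be unchanged.

$$\frac{E_2}{E_1} = \frac{C V_2^2}{C V_1^2} = \left(\frac{0.18\ \text{V}}{1.8\ \text{V}}\right)^2 = 0.01$$

Under these assumptions, a tenfold reduction in supply voltage reduces the dynamic energy per operation a hundredfold. Leakage energy and the longer propagation delay are not captured by this estimate.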

Dynamic voltage scaling (DVS), presented in [40], has become a standard approach for reducing power when performance requirements vary. DVS is a power management technique in computer architecture in which the voltage used in a component is increased or decreased depending on circumstances. Increasing the voltage is known as over-volting, and decreasing it is known as under-volting. Under-volting is done in order to conserve power, particularly in laptops and other mobile devices, where energy comes from a battery and is thus limited. Over-volting is done in order to increase computer performance or, in rare cases, to increase reliability. In terms of transistor operation, over-volting corresponds to super-threshold operation and under-volting to subthreshold operation.

Voltage scaling is a promising approach to reducing the power consumption of signal processing circuits. However, aggressive voltage scaling can introduce errors in the output signal, thus degrading the algorithmic performance of the circuit [41]. Two kinds of errors arise from voltage scaling: (a) errors introduced by the increased delay along the logic path and (b) errors caused by memory failures due to process variations.

#### 2.6 Frequency Scaling


Frequency scaling (FS), presented in [42], is one of the most commonly used power reduction techniques in high-performance processors. FS varies the frequency of a microprocessor in real time according to processing needs. Although there are different versions of FS, at its core FS adapts power consumption and performance to the CPU workload. Specifically, existing FS techniques in high-performance processors select an operating point (CPU frequency and voltage) based on processor utilization.
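The selection of an operating point from processor utilization can be sketched as follows. This is a hypothetical policy written for illustration, not the algorithm of [42]: the table of frequency and voltage pairs, the 80% utilization target, and the function name are all assumptions.

```python
# Hypothetical operating points as (frequency in Hz, supply voltage in V).
OPERATING_POINTS = [(200e6, 0.9), (400e6, 1.0), (800e6, 1.2), (1600e6, 1.4)]

def select_operating_point(utilization, current_freq,
                           points=OPERATING_POINTS, target=0.8):
    """Pick the slowest point that keeps the projected utilization at or below target.

    utilization is the busy fraction (0..1) measured at current_freq.
    """
    demand = utilization * current_freq          # cycles per second actually needed
    for freq, volt in sorted(points):
        if demand <= target * freq:
            return freq, volt
    return max(points)                           # saturated: run at the fastest point

def relative_dynamic_power(freq, volt, ref=OPERATING_POINTS[-1]):
    """Dynamic power relative to the fastest point, using P proportional to V^2 * f."""
    ref_freq, ref_volt = ref
    return (volt ** 2 * freq) / (ref_volt ** 2 * ref_freq)

light_load = select_operating_point(utilization=0.3, current_freq=1600e6)
heavy_load = select_operating_point(utilization=0.95, current_freq=1600e6)
print(light_load, relative_dynamic_power(*light_load))
```

A light workload is mapped to a slower, lower-voltage point, where the dynamic power falls with both the frequency and the square of the voltage, while a heavy workload keeps the fastest point.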

In [43], Y. Chen et al. presented a new dynamic voltage and frequency scaling (DVFS) FFT processor for MIMO OFDM applications. With the proposed multimode multipath-delay-feedback (MMDF) architecture, the FFT processor can process one to eight streams of 256-point FFTs or a high-speed 256-point FFT in two processing domains at the minimum clock frequency for DVFS operation. A parallelized radix-2<sup>4</sup> FFT algorithm is also employed to reduce the power consumption and hardware cost of the complex multipliers.

As clock frequency increases and feature size decreases, clock distribution and wire delays present a growing challenge to the designers of singly clocked, globally synchronous systems. An alternative approach is the multiple clock domain (MCD) processor, in which the chip is divided into several clock domains, within which independent frequency scaling can be performed while minimizing inter-domain synchronization costs. In a CMOS processor, the dynamic power consumption is directly proportional to the operating frequency. The design in [44] is divided into various zones that operate at different frequencies, thereby reducing the power consumption. That work proposed a globally asynchronous locally synchronous (GALS) based FFT processor.

In [45], Y. Chen et al. presented a high-throughput and low-complexity fast Fourier transform (FFT) processor for wideband orthogonal frequency division multiplexing communication systems. A new indexed-scaling method is proposed to reduce both the critical path delay and the hardware cost by employing a shorter wordlength. Together with the mixed-radix multipath delay feedback structure, the proposed FFT processor can achieve very high throughput with low hardware cost.

#### 2.7 Clock Gating and Power Gating


Fisher et al. [46] describe the implementation of clock gating in digital circuits. Clock gating is the most common optimization for reducing dynamic power: it prevents signal transitions caused by invalid data from propagating while keeping the current data stable. The technique can also reduce the leakage current behind static power, which increases exponentially with temperature. To save power, clock gating adds logic to a circuit to prune the clock tree, disabling portions of the circuitry so that their flip-flops do not change state. Their switching power consumption then goes to zero, and only leakage currents are incurred.

Clock gating works by taking the enable conditions attached to registers and using them to gate the clocks. A design must therefore contain these enable conditions for clock gating to occur. Clock gating can save significant area as well as power, since it removes large numbers of multiplexers and replaces them with clock gating logic, generally in the form of integrated clock gating (ICG) cells. Note, however, that the clock gating logic changes the clock tree structure, since it sits in the clock tree. Clock gating logic can be added to a design in the following ways:

- i. Coded into the RTL as enable conditions that are automatically translated into clock gating logic by synthesis tools.
- ii. Manually inserted as module-level clock gating by the RTL designers, by instantiating library-specific ICG cells to gate the clocks of specific modules or registers.
- iii. Semi-automatically inserted into the RTL by automated clock gating tools. These typically also offer sequential clock gating optimizations.
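The effect of replacing an enable multiplexer with gated clocking can be sketched with a cycle-level model. The code below is a behavioral illustration in Python, not RTL and not the flow of [46]; the function names, the register width, and the stimulus are assumptions. It counts how many flip-flop clock pins receive an edge in each style.

```python
def run_mux_enable(data, enable, width):
    """Register with a recirculating mux: every flip-flop is clocked every cycle."""
    q, clocked, trace = 0, 0, []
    for d, en in zip(data, enable):
        q = d if en else q       # mux feeds the old value back when disabled
        clocked += width         # all flip-flops still see the clock edge
        trace.append(q)
    return trace, clocked

def run_clock_gated(data, enable, width):
    """Same register behind a gating cell: the edge is passed only when enabled."""
    q, clocked, trace = 0, 0, []
    for d, en in zip(data, enable):
        if en:                   # gated clock reaches the flip-flops
            q = d
            clocked += width
        trace.append(q)
    return trace, clocked

# Hypothetical stimulus: an 8-bit register enabled in 3 of 8 cycles.
data = [5, 7, 7, 2, 9, 9, 9, 1]
enable = [1, 0, 0, 1, 0, 0, 0, 1]
mux_trace, mux_clocked = run_mux_enable(data, enable, width=8)
gated_trace, gated_clocked = run_clock_gated(data, enable, width=8)
print(mux_clocked, gated_clocked)
```

Both styles hold identical register contents, but the gated version delivers clock edges only in the enabled cycles, which is where the dynamic power saving comes from.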

The author of [47] describes power gating techniques that use high-$V_t$ sleep transistors to cut off a circuit block when it is not switching. The sleep transistor sizing is an important design parameter. This technique, also known as multi-threshold CMOS (MTCMOS), reduces standby or leakage power in digital circuits. Power gating affects the design architecture more than clock gating does. It increases time delays, as power-gated modes have to be safely entered and exited. The leakage power that can be saved in such a low-power mode and the energy dissipated in entering and exiting it introduce architectural trade-offs. Blocks can be shut down by either software or hardware: driver software can schedule the power-down operations, hardware timers can be used, or a dedicated power management controller can be employed.
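The trade-off between the leakage saved in a power-gated mode and the energy spent entering and exiting it can be expressed as a break-even idle time. The following sketch uses a simple energy balance; the function names and all numeric values are assumptions for illustration and are not taken from [47].

```python
def break_even_time(p_leak_on, p_leak_gated, e_overhead):
    """Idle time in seconds beyond which power gating saves energy.

    Gating pays off when (p_leak_on - p_leak_gated) * t exceeds e_overhead,
    the energy needed to enter and exit the gated mode.
    """
    saved_power = p_leak_on - p_leak_gated
    if saved_power <= 0:
        return float("inf")      # gating never pays off
    return e_overhead / saved_power

def should_power_gate(idle_time, p_leak_on, p_leak_gated, e_overhead):
    """Decide whether an expected idle interval is long enough to gate the block."""
    return idle_time > break_even_time(p_leak_on, p_leak_gated, e_overhead)

# Hypothetical block: 100 uW leakage when on, 5 uW when gated, 2 nJ transition cost.
t_be = break_even_time(p_leak_on=100e-6, p_leak_gated=5e-6, e_overhead=2e-9)
print(f"Break-even idle time: {t_be * 1e6:.2f} us")
```

A scheduler, hardware timer, or power management controller of the kind described above could use such a threshold to skip gating for idle intervals that are too short to recover the transition energy.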

An externally switched power supply is a very basic form of power gating for achieving long-term leakage power reduction. Internal power gating is suitable for shutting off a block for short intervals of time. The CMOS switches that provide power to the circuitry are controlled by power gating controllers. The outputs of a power-gated block discharge slowly, so the output voltage levels spend more time near the threshold voltage, which can lead to larger short-circuit current.

Power gating uses low-leakage PMOS transistors as header switches to shut off power supplies to parts of a design in standby or sleep mode. NMOS footer switches can also be used as sleep transistors. Inserting the sleep transistors splits the chip's power network into a permanent power network connected to the power supply and a virtual power network that drives the cells and can be turned off.

Clock gating is an effective technique for minimizing dynamic power in sequential circuits. The work in [48] shows that applying clock gating at the gate level not only saves time compared with implementing it in the RTL code but also saves power and can easily be automated in the synthesis process. In [49], R. Bhutada and Y. Manoli presented simulation results for various types of clock gating at different hierarchical levels on a serial peripheral interface (SPI) design. In general, power savings of about 30% and a 36% reduction in toggle rate were observed with different complex clock gating methods relative to a design without clock gating.

Recent research on low-power VLSI design techniques has established various innovations such as clock gating, multi-threshold voltage transistors, multi-supply voltage, dynamic voltage and frequency scaling, and power shut-off. Hemantha et al. [50] describe a multi-threshold CMOS design for low-power digital circuits. Clock tree synthesis (CTS) is the process of distributing the clock signal from the PLL to all the synchronous components within a design. The author of [51] explained clock gating, which is widely used in clock distribution as a method to reduce clock network power dissipation. The clock gating components are part of the clock tree distribution components during the CTS process. A clock root gating algorithm that merges clock gates with different enable functions was implemented in [52]. However, the proposal in [52] does not optimize the initial clock gate placement, so merging the clock gates might not yield the optimum clock gate structure.

#### 2.8 Variable Wordlength

In computing, a word is the natural unit of data used by a particular processor design. A word is a group of bits that the system handles together. The number of bits in a word is an important feature of a computer architecture.

The size of a word is reflected in many aspects of a computer's structure and operation. The majority of the registers in the computer are usually word sized, and the amount of data transferred between the processing part of the computer and the memory system in a single operation is most often a word. The largest possible address size, used to designate a location in memory, is typically a hardware word (i.e., the full-sized natural word of the processor, as opposed to any other definition used on the platform).

Modern computers usually have a word size of 16, 32 or 64 bits, but many other sizes have been used, including 8, 9, 12, 18, 24, 36, 39, 40, 48 and 60 bits. The slab is an example of an early word size. Several of the earliest computers used the decimal base rather than binary, typically having a word size of 10 or 12 decimal digits, and some early computers had no fixed word length at all.

The determination of wordlength in digital signal processing (DSP) systems is important because wordlength affects system performance, hardware size, and power consumption. Analyzing finite-wordlength errors and finding optimum wordlengths using simulation-based trials have been proposed for minimizing hardware cost [53]. However, the minimum wordlengths found by these methods may degrade system performance under worse-than-expected conditions. In practice, extra bits are added to the wordlength to reduce the chances of this happening. This means that there is room to reduce power dissipation further under better conditions. Using a variable wordlength is an effective approach because the wordlength can be optimized to match the conditions.

The variable-wordlength technique can be used in electronic devices such as digital filters, channel equalizers, and microprocessors. Systems used in wireless environments must be able to handle the sharp fluctuations in amplitude caused by multipath fading. In this approach, the system optimizes the wordlength by monitoring its operating performance and changing the wordlength accordingly.
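The idea of matching the wordlength to the operating conditions can be sketched with a fixed-point quantizer and a quality monitor. The example below is an illustrative assumption, not the method of [53]: it uses the signal-to-quantization-noise ratio (SQNR) as the monitored performance measure, a sinusoidal test tone, and a 40 dB target, and it searches for the shortest wordlength that still meets the target.

```python
import math

def quantize(x, wordlength):
    """Round x in [-1, 1) to signed fixed point with `wordlength` bits."""
    scale = 2 ** (wordlength - 1)
    q = max(-scale, min(scale - 1, round(x * scale)))   # saturate at the range limits
    return q / scale

def sqnr_db(signal, wordlength):
    """Signal-to-quantization-noise ratio in dB for the given wordlength."""
    noise = sum((s - quantize(s, wordlength)) ** 2 for s in signal)
    power = sum(s * s for s in signal)
    return math.inf if noise == 0 else 10 * math.log10(power / noise)

def min_wordlength(signal, target_db, candidates=range(4, 17)):
    """Shortest candidate wordlength whose SQNR meets the target."""
    for bits in candidates:
        if sqnr_db(signal, bits) >= target_db:
            return bits
    return candidates[-1]

def make_tone(amplitude, n=512):
    """Hypothetical test signal: a sampled sinusoid of the given amplitude."""
    return [amplitude * math.sin(0.1234 * k) for k in range(n)]

bits_strong = min_wordlength(make_tone(0.9), target_db=40)    # good channel
bits_weak = min_wordlength(make_tone(0.09), target_db=40)     # about 20 dB fade
print(bits_strong, bits_weak)
```

A strong signal meets the target with fewer bits than a faded one, so a design fixed at the worst-case wordlength carries unused precision, and therefore avoidable switching activity, whenever the channel is good.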

#### 2.9 Cell Libraries

Today, the power consumption of digital design is the major issue in the design of integrated circuits for portable devices. Design methodologies at different abstraction levels such as systems, architectures, logic design, basic cells as well as layout, must take into account the power consumption.

The work in [54] presents a high-speed low-power 1-bit full adder cell designed upon an alternative logic structure to derive the SUM and CARRY outputs. HSPICE and Nanosim simulations show that this full adder cell, designed using a 0.35-µm CMOS technology and supplied with 3.3 V, exhibits a delay and power dissipation of around 720 ps and 840 µW, respectively. These features reflect an overall improvement of 30% in the power-delay metric when compared with other recently published realizations regarded as well-featured cells for low-power applications.

The authors in [2], [7], [9], [33], [38], and [48] have developed a scalable full-custom cell library for implementing bit-level systolic array signal processors. The cell library achieves high performance and low power consumption by using dynamic logic circuits with low-threshold-voltage CMOS devices. The cell library is designed to implement signal processing functions such as the finite impulse response (FIR) filter, infinite impulse response (IIR) filter, poly-phase filter bank, fast Fourier transform (FFT), inverse fast Fourier transform (IFFT), and matrix operations. Modeling of the cell library begins with the characteristics of the transistor itself. An example of the transistor connections in the cell library is given below.

M0 3 A1 0 0 NMOS L=1.8e-07 W=1.11e-06

M1 0 A2 3 0 NMOS L=1.8e-07 W=1.11e-06

M2 Z 3 0 0 NMOS L=1.8e-07 W=1.11e-06

M3 7 A1 3 1 PMOS L=1.8e-07 W=1.41e-06

M4 1 A2 7 1 PMOS L=1.8e-07 W=1.41e-06

M5 Z 3 1 1 PMOS L=1.8e-07 W=1.41e-06
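The logic function implemented by these six transistors can be checked with a small switch-level evaluation. The sketch below is an illustration added here, not part of the original cell library flow. It assumes that node 0 is ground and node 1 is the supply rail (suggested by the bulk connections but not stated in the text), and that each device line follows the SPICE order drain, gate, source, bulk, model.

```python
NETLIST = """\
M0 3 A1 0 0 NMOS L=1.8e-07 W=1.11e-06
M1 0 A2 3 0 NMOS L=1.8e-07 W=1.11e-06
M2 Z 3 0 0 NMOS L=1.8e-07 W=1.11e-06
M3 7 A1 3 1 PMOS L=1.8e-07 W=1.41e-06
M4 1 A2 7 1 PMOS L=1.8e-07 W=1.41e-06
M5 Z 3 1 1 PMOS L=1.8e-07 W=1.41e-06
"""

def parse(netlist):
    """Return (drain, gate, source, model) for each MOSFET line."""
    devices = []
    for line in netlist.splitlines():
        if line.strip():
            _name, drain, gate, source, _bulk, model = line.split()[:6]
            devices.append((drain, gate, source, model))
    return devices

def evaluate(devices, inputs, gnd="0", vdd="1"):
    """Propagate logic levels from the rails through conducting transistors."""
    values = dict(inputs, **{gnd: 0, vdd: 1})
    changed = True
    while changed:
        changed = False
        edges = {}
        for drain, gate, source, model in devices:
            level = values.get(gate)             # None means not yet known: switch off
            on = (level == 1) if model == "NMOS" else (level == 0)
            if on:
                edges.setdefault(drain, set()).add(source)
                edges.setdefault(source, set()).add(drain)
        for rail, level in ((gnd, 0), (vdd, 1)):
            stack, seen = [rail], {rail}
            while stack:                         # depth-first search from the rail
                for nxt in edges.get(stack.pop(), ()):
                    if nxt not in seen and nxt not in (gnd, vdd):
                        seen.add(nxt)
                        stack.append(nxt)
                        if nxt not in values:
                            values[nxt] = level
                            changed = True
    return values

devices = parse(NETLIST)
truth_table = {(a1, a2): evaluate(devices, {"A1": a1, "A2": a2})["Z"]
               for a1 in (0, 1) for a2 in (0, 1)}
print(truth_table)
```

Under these assumptions, node 3 is the NOR of A1 and A2 and the final inverter restores the polarity, so the example cell behaves as a two-input OR gate.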


Example parameters used to model the NMOS and PMOS transistors at level 49 are listed below.

| .MODEL NMOS NMOS ( |   |              |         |   |              | LEVEL   | = | 49           |
|--------------------|---|--------------|---------|---|--------------|---------|---|--------------|
| +VERSION           | = | 3.1          | TNOM    | = | 27           | TOX     | = | 4.2E-9       |
| +XJ                | = | 1E-7         | NCH     | = | 2.3549E17    | VTH0    | = | 0.3710619    |
| +K1                | = | 0.5940793    | K2      | = | 2.070131E-3  | K3      | = | 1E-3         |
| +K3B               | = | 2.7158495    | W0      | = | 1E-7         | NLX     | = | 2.005089E-7  |
| +DVT0W             | = | 0            | DVT1W   | = | 0            | DVT2W   | = | 0            |
| +DVT0              | = | 1.4615376    | DVT1    | = | 0.3798134    | DVT2    | = | 0.0692378    |
| +U0                | = | 293.522312   | UA      | = | -6.73646E-10 | UB      | = | 1.164182E-18 |
| +UC                | = | -2.84532E-11 | VSAT    | = | 9.236324E4   | A0      | = | 1.7591856    |
| +AGS               | = | 0.3162202    | B0      | = | -5.950938E-8 | B1      | = | -1E-7        |
| +KETA              | = | 0.0111532    | A1      | = | 3.896574E-4  | A2      | = | 1            |
| +RDSW              | = | 139.0465393  | PRWG    | = | 0.5          | PRWB    | = | -0.2         |
| +WR                | = | 1            | WINT    | = | 0            | LINT    | = | 9.265899E-9  |
| +XL                | = | -2E-8        | XW      | = | -1E-8        | DWG     | = | -1.343579E-9 |
| +DWB               | = | -1.391607E-8 | VOFF    | = | -0.0765575   | NFACTOR | = | 2.4791597    |
| +CIT               | = | 0            | CDSC    | = | 2.4E-4       | CDSCD   | = | 0            |
| +CDSCB             | = | 0            | ETA0    | = | 0            | ETAB    | = | -0.0608407   |
| +DSUB              | = | 1            | PCLM    | = | 0.2853499    | PDIBLC1 | = | 0.116863     |
| +PDIBLC2           | = | 0.01         | PDIBLCB | = | -0.0475298   | DROUT   | = | 0.5922434    |
| +PSCBE1            | = | 8E10         | PSCBE2  | = | 5.248199E-10 | PVAG    | = | 0.089248     |
| +DELTA             | = | 0.01         | RSH     | = | 6.8          | MOBMOD  | = | 1            |
| +PRT               | = | 0            | UTE     | = | -1.5         | KT1     | = | -0.11        |
| +KT1L              | = | 0            | KT2     | = | 0.022        | UA1     | = | 4.31E-9      |
| +UB1               | = | -7.61E-18    | UC1     | = | -5.6E-11     | AT      | = | 3.3E4        |
| +WL                | = | 0            | WLN     | = | 1            | WW      | = | 0            |
| +WWN               | = | 1            | WWL     | = | 0            | LL      | = | 0            |
| +LLN               | = | 1            | LW      | = | 0            | LWN     | = | 1            |
| +LWL               | = | 0            | CAPMOD  | = | 2            | XPART   | = | 0.5          |
| +CGDO              | = | 7.75E-10     | CGSO    | = | 7.75E-10     | CGBO    | = | 1E-12        |
| +CJ                | = | 9.955315E-4  | PB      | = | 0.7345743    | MJ      | = | 0.3629904    |
| +CJSW              | = | 2.586055E-10 | PBSW    | = | 0.6451808    | MJSW    | = | 0.1296914    |
| +CJSWG             | = | 3.3E-10      | PBSWG   | = | 0.6451808    | MJSWG   | = | 0.1296914    |
| +CF                | = | 0            | PVTH0   | = | 1.33957E-3   | PRDSW   | = | -5           |
| +PK2               | = | -1.7189E-4   | WKETA   | = | 0.010864     | LKETA   | = | -0.0102793   |
| +PU0               | = | 37.4749547   | PUA     | = | 1.762367E-10 | PUB     | = | 9.411793E-25 |
| +PVSAT             | = | 2E3          | PETA0   | = | -1E-4        | PKETA   | = | -1.356792E-3 |
| *                  |   |              |         |   |              |         |   |              |

| .MODEL PMOS PMOS ( |   |              |         |   |              | LEVEL   | = | 49           |
|--------------------|---|--------------|---------|---|--------------|---------|---|--------------|
| +VERSION           | = | 3.1          | TNOM    | = | 27           | TOX     | = | 4.2E-9       |
| +XJ                | = | 1E-7         | NCH     | = | 4.1589E17    | VTH0    | = | -0.4220357   |
| +K1                | = | 0.5813738    | K2      | = | 0.0303955    | K3      | = | 0            |
| +K3B               | = | 11.3426872   | W0      | = | 1E-6         | NLX     | = | 9.376034E-8  |
| +DVT0W             | = | 0            | DVT1W   | = | 0            | DVT2W   | = | 0            |
| +DVT0              | = | 0.5131166    | DVT1    | = | 0.2665264    | DVT2    | = | 0.1          |
| +U0                | = | 120.5316596  | UA      | = | 1.645481E-9  | UB      | = | 1E-21        |
| +UC                | = | -1E-10       | VSAT    | = | 2E5          | A0      | = | 1.671928     |
| +AGS               | = | 0.3934127    | B0      | = | 1.830733E-6  | B1      | = | 4.739218E-6  |
| +KETA              | = | 0.0202801    | A1      | = | 0.1976849    | A2      | = | 0.5787213    |
| +RDSW              | = | 265.2609374  | PRWG    | = | 0.5          | PRWB    | = | -0.2145086   |
| +WR                | = | 1            | WINT    | = | 0            | LINT    | = | 2.176517E-8  |
| +XL                | = | -2E-8        | XW      | = | -1E-8        | DWG     | = | -4.223522E-8 |
| +DWB               | = | 7.670464E-9  | VOFF    | = | -0.096172    | NFACTOR | = | 2            |
| +CIT               | = | 0            | CDSC    | = | 2.4E-4       | CDSCD   | = | 0            |
| +CDSCB             | = | 0            | ETA0    | = | 0.023671     | ETAB    | = | -0.3005133   |
| +DSUB              | = | 1.2320494    | PCLM    | = | 2.2844319    | PDIBLC1 | = | 4.836921E-3  |
| +PDIBLC2           | = | 0.0442167    | PDIBLCB | = | -1E-3        | DROUT   | = | 9.991187E-4  |
| +PSCBE1            | = | 1.732893E9   | PSCBE2  | = | 5E-10        | PVAG    | = | 14.9616148   |
| +DELTA             | = | 0.01         | RSH     | = | 7.6          | MOBMOD  | = | 1            |
| +PRT               | = | 0            | UTE     | = | -1.5         | KT1     | = | -0.11        |
| +KT1L              | = | 0            | KT2     | = | 0.022        | UA1     | = | 4.31E-9      |
| +UB1               | = | -7.61E-18    | UC1     | = | -5.6E-11     | AT      | = | 3.3E4        |
| +WL                | = | 0            | WLN     | = | 1            | WW      | = | 0            |
| +WWN               | = | 1            | WWL     | = | 0            | LL      | = | 0            |
| +LLN               | = | 1            | LW      | = | 0            | LWN     | = | 1            |
| +LWL               | = | 0            | CAPMOD  | = | 2            | XPART   | = | 0.5          |
| +CGDO              | = | 6.6E-10      | CGSO    | = | 6.6E-10      | CGBO    | = | 1E-12        |
| +CJ                | = | 1.183858E-3  | PB      | = | 0.8534482    | MJ      | = | 0.4124158    |
| +CJSW              | = | 2.066263E-10 | PBSW    | = | 0.6189346    | MJSW    | = | 0.2893774    |
| +CJSWG             | = | 4.22E-10     | PBSWG   | = | 0.6189346    | MJSWG   | = | 0.2893774    |
| +CF                | = | 0            | PVTH0   | = | 2.308546E-3  | PRDSW   | = | 13.6874174   |
| +PK2               | = | 2.657069E-3  | WKETA   | = | 2.467864E-3  | LKETA   | = | -2.56649E-3  |
| +PU0               | = | -1.846164    | PUA     | = | -8.06063E-11 | PUB     | = | 1E-21        |
| +PVSAT             | = | -50          | PETA0   | = | 1E-4         | PKETA   | = | 2.794471E-3  |
| *                  |   |              |         |   |              |         |   |              |

.OP

.TRAN 1n 10 30n

.PRINT TRAN V(2) V(1) V(3)

.MEASURE AVG\_POW AVG POWER FROM=1n TO=10n

.END

The cell library was evaluated on every logic gate, focusing on low-voltage and high-frequency operation. Every logic cell maintained its output performance while its power was reduced through the voltage scaling method. Table 2.1 explains the parameter names used in the transistor physical model.


| No. | Name    | Description                                          | Unit                             |
|-----|---------|------------------------------------------------------|----------------------------------|
| 1.  | TNOM    | Parameter measurement temperature                    | °C                               |
| 2.  | TOX     | Gate oxide thickness                                 | m                                |
| 3.  | XJ      | Source/drain junction depth                          | m                                |
| 4.  | NCH     | Peak channel doping concentration                    | cm<sup>-3</sup>                  |
| 5.  | VTH0    | Threshold voltage at zero body bias for long-channel devices | V                        |
| 6.  | K1      | Body-effect coefficient                              | m                                |
| 7.  | K2      | Charge-sharing parameter                             | m                                |
| 8.  | K3      | Narrow width coefficient                             | m                                |
| 9.  | K3B     | Narrow width coefficient                             | m                                |
| 10. | W0      | Narrow width coefficient                             | m                                |
| 11. | NLX     | Lateral non-uniform doping coefficient               | m                                |
| 12. | DVT0W   | First coefficient of narrow-width effects            | -                                |
| 13. | DVT1W   | Second coefficient of narrow-width effects           | -                                |
| 14. | DVT2W   | Body-bias coefficient of narrow-width effects        | 1/V                              |
| 15. | DVT0    | Temperature coefficient of VTO                       | V/°C                             |
| 16. | DVT1    | Second coefficient of short-channel effects          | -                                |
| 17. | DVT2    | Body-bias coefficient of short-channel effects       | 1/V                              |
| 18. | U0      | Low-field surface mobility at TNOM                   | cm<sup>2</sup>/V·s               |
| 19. | UA      | First-order mobility reduction coefficient           | m/V                              |
| 20. | UB      | Second-order mobility reduction coefficient          | m<sup>2</sup>/V<sup>2</sup>      |
| 21. | UC      | Body-bias dependence of mobility                     | m/V<sup>2</sup>                  |
| 22. | VSAT    | Carrier saturation velocity at TNOM                  | m/s                              |
| 23. | A0      | Non-uniform depletion width effect coefficient       | -                                |
| 24. | AGS     | Gate-bias dependence of Abulk                        | F/m<sup>2</sup>V                 |
| 25. | B0      | Bulk charge coefficient due to narrow width effect   | m                                |
| 26. | B1      | Bulk charge coefficient due to narrow width effect   | m                                |
| 27. | KETA    | Body-bias coefficient for non-uniform depletion width effect | 1/V                      |
| 28. | A1      | Non-saturation coefficient                           | -                                |
| 29. | A2      | Non-saturation coefficient                           | -                                |
| 30. | RDSW    | Width dependence of drain-source resistance          | Ω·μm                             |
| 31. | PRWG    | Gate-effect coefficient for Rds                      | 1/V                              |
| 32. | PRWB    | Body-effect coefficient for Rds                      | 1/√V                             |
| 33. | WR      | Width offset for parasitic resistance                | -                                |
| 34. | WINT    | Delta W for capacitance model                        | m                                |
| 35. | LINT    | Delta L for capacitance model                        | m                                |
| 36. | XL      | Length variation due to masking and etching          | m                                |
| 37. | XW      | Width variation due to masking and etching           | m                                |
| 38. | DWG     | Gate-bias dependence of channel width                | m/V                              |
| 39. | DWB     | Body-bias dependence of channel width                | m/√V                             |
| 40. | VOFF    | Threshold voltage offset                             | V                                |
| 41. | NFACTOR | Subthreshold swing coefficient                       | -                                |
| 42. | CIT     | Interface trap parameter for subthreshold swing      | F                                |
| 43. | CDSC    | Source/drain and channel coupling capacitance        | F/m<sup>2</sup>                  |
| 44. | CDSCD   | Drain-bias dependence of CDSC                        | F/m<sup>2</sup>V                 |
| 45. | CDSCB   | Body-bias dependence of CDSC                         | F/m<sup>2</sup>V                 |
| 46. | ETA0    | DIBL coefficient in subthreshold region              | -                                |
| 47. | ETAB    | Body-bias dependence of ETA0                         | 1/V                              |
| 48. | DSUB    | DIBL effect in subthreshold region                   | -                                |
| 49. | PCLM    | Channel length modulation coefficient                | -                                |
| 50. | PDIBLC1 | First coefficient of drain-induced barrier lowering  | -                                |
| 51. | PDIBLC2 | Second coefficient of drain-induced barrier lowering | -                                |
| 52. | PDIBLCB | Body-effect coefficient for DIBL                     | 1/V                              |
| 53. | DROUT   | DIBL effect on output resistance coefficient         | -                                |
| 54. | PSCBE1  | First coefficient of substrate current body effect   | V/m                              |
| 55. | PSCBE2  | Second coefficient of substrate current body effect  | m/V                              |
| 56. | PVAG    | Gate dependence of Early voltage                     | -                                |
| 57. | DELTA   | Effective drain voltage smoothing parameter          | V                                |
| 58. | RSH     | Source/drain diffusion sheet resistance              | Ω/sq                             |

Table 2.1 Description of the PMOS and NMOS physical model parameters


## **Bibliography**

- [1] T. Kuroda, "CMOS design challenges to power wall," *Proc. of IEEE International Conference on Microprocessor and Nanotechnology*, pp. 6-7, November 2001.
- [2] H. Soeleman, K. Roy, and B. C. Paul, "Robust subthreshold logic for ultra-low power operation," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 9, no. 1, pp. 90-99, February 2001.
- [3] J. Kim and K. Roy, "Double gate MOSFET subthreshold logic for ultra-low power applications," *Proc. of IEEE International on SOI Conference*, October 2003, pp. 97-98.
- [4] J. F. Ryan, W. Jiajing, and H. Benton, "Analyzing and modeling process balance for subthreshold circuit design," *Proc. of ACM 17th Great Lakes symposium on VLSI*, March 2007, pp. 275-280.
- [5] B. H. Calhoun, A. Wang, and A. Chandrakasan, "Modeling and sizing for minimum energy operation in subthreshold circuits," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 9, pp. 1778-1786, September 2005.
- [6] R. Vaddi, S. Dasgupta, and R. P. Agarwal, "Device and circuit design challenges in the digital subthreshold region for ultra-low power applications," *Proc. of ACM VLSI Design*, January 2009, pp. 1-14.
- [7] B. C. Paul, A. Raychowdhury, and K. Roy, "Device optimization for digital subthreshold logic operation," *IEEE Transactions on Electron Devices*, vol. 52, no. 2, pp. 237-247, February 2005.
- [8] K. Roy, S. Mukhopadhyay, and H. M. Meimand, "Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits," *Proc. of IEEE*, February 2003, pp. 305-327.
- [9] H. Soeleman and K. Roy, "Digital CMOS logic operation in the subthreshold region," *Proc. of ACM Tenth Great Lakes Symposium on VLSI*, March 2000, pp. 107-112.
- [10] S. Yoshizawa and Y. Miyanaga, "Tunable wordlength architecture for a low power wireless OFDM demodulator," *IEICE Trans. on Fundamentals*, vol. E89-A, pp. 2866-2873, October 2006.

- [11] C. Jiang, J. Wan, X. Xu, Y. Li, X. You, and D. Yu, "Dynamic voltage/frequency scaling for power reduction in data centers: enough or not?," *International Colloquium on Computing, Communication, Control, and Management*, August 2009, pp. 428-431.
- [12] W. Sung and K. Kum, "Simulation-based word-length optimization method for fixed-point digital signal processing systems," *IEEE Trans. on Signal Processing*, vol. 43, pp. 3087-3090, December 1995.
- [13] K. Han and B. L. Evans, "Wordlength optimization with complexity and distortion measure and its applications to broadband wireless demodulator design," *Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing*, May 2004, pp. 37-40.
- [14] W. Zhong and Z. Mao, "Design and VLSI architecture of a channel equalizer based on adaptive modulation for IEEE 802.11a WLAN," *Proc. of IEEE Asia Pacific Conference on Circuits and Systems*, December 2006, pp. 1699-1702.
- [15] S. Yoshizawa and Y. Miyanaga, "Use of a variable wordlength technique in an OFDM receiver to reduce energy dissipation," *Proc. of IEEE International Symposium on Circuits and Systems*, May 2007, pp. 3175-3178.
- [16] M. M. Nisar and A. Chatterjee, "Test enabled process tuning for adaptive baseband OFDM processor," *Proc. of 26th IEEE VLSI Test Symposium*, May 2008, pp. 9-16.
- [17] B. H. Calhoun and A. Chandrakasan, "Ultra-dynamic voltage scaling using subthreshold operation and local voltage dithering in 90nm CMOS," *Proc. of IEEE International Solid-State Circuits Conference*, February 2005, pp. 300-599.
- [18] B. Lin, A. Mallik, P. Dinda, G. Memik, and R. Dick, "User and process driven dynamic voltage and frequency scaling," *Proc. of IEEE International Symposium on Performance Analysis of Systems and Software*, December 2009, pp. 11-22.
- [19] P. E. Allen and D. R. Holberg, "CMOS Analog Circuit Design," New York, Oxford University Press, 2002.
- [20] B. Razavi, "Design of Analog CMOS Integrated Circuits," Singapore, McGraw-Hill, 2001, pp. 6-28.
- [21] A. Bansal and K. Roy, "Asymmetric halo CMOSFET to reduce static power dissipation with improved performance," *IEEE Trans. on Electron Devices*, vol. 52, no. 3, pp. 397-405, March 2005.

- [22] S. Choobkar and A. Nabavi, "A low power programmable CMOS circuit for generating modulated pulses for UWB applications," *Proc. of the 2nd International Conference on Wireless Broadband and Ultra Wideband Communications*, August 2007, pp. 5-5.
- [23] C. X. Jie and F. Q. Yuan, "A low power high-reliability CMOS current limit circuit," *Proc. of Asia Pacific Conference on Microwave*, December 2005, pp. 3.
- [24] G. I. Wirth, M. G. Vieira, and F. G. L. Kastensmidt, "Accurate and computer efficient modeling of single event transients in CMOS circuits," *IET Trans. on Circuit, Device and Systems*, vol. 1, no. 2, pp. 137-142, April 2007.
- [25] X. Wu and M. Pedram, "Low power CMOS circuits with clocked power," *Proc. of IEEE Asia Conference on Circuit and Systems*, pp. 513-516, December 2000.
- [26] J. Pamklang, K. Kumwachara, and P. Kongtanasunthorn, "Low power dissipation CMOS output driver circuits," *Proc. of TENCON 2000*, September 2000, pp. 466-469.
- [27] T. Enomoto and T. Ei, "Low power CMOS circuit techniques for motion estimators," *Proc. of the International Symposium on Circuits and Systems*, May 2003, pp. 209-412.
- [28] D. Samanta, N. Sinha, and A. Pal, "Synthesis of high-performance low power dynamic CMOS circuits," *Proc. of Asia and South Pacific International Conference on VLSI Design*, January 2002, pp. 99-104.
- [29] A. Tajalli and M. Alioto, "Improving power delay performance of ultra-low power subthreshold SCL circuits," *IEEE Trans. on Circuits and Systems*, vol. 56, pp. 127-131, February 2009.
- [30] M. C. Casey, O. A. Amusan, S. A. Nation, T. D. Loveless, A. Balasubramanian, B. L. Bhuva, R. A. Reed, D. McMorrow, R. A. Weller, M. L. Alles, L. W. Massengill, J. S. Melinger, and B. Narasimham, "Single-event effects on combinational logic circuits operating at ultra-low power," *IEEE Trans. on Nuclear Science*, vol. 55, pp. 3342-3346, December 2008.
- [31] M. B. Henry, S. B. Griffin, and L. Nazhandali, "Fast simulation framework for subthreshold circuits," *Proc. of IEEE International Symposium on Circuits and Systems*, May 2009, pp. 2549-2552.
- [32] I. J. Chang, S. P. Park, and K. Roy, "Exploring asynchronous design techniques