#### RICE UNIVERSITY

### Non-invasive IC Tomography Using Spatial Correlations

by

#### **Davood Shamsi**

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE

#### **Master of Science**

APPROVED, THESIS COMMITTEE:


Dr. Farinaz Koushanfar, Chair, Assistant Professor, Electrical and Computer Engineering


Dr. Don H. Johnson J.S. Abercrombie Professor Emeritus, Electrical and Computer Engineering

Dr. Richard Baraniuk Victor E. Cameron Professor, Electrical and Computer Engineering

HOUSTON, TEXAS

NOVEMBER 2009

UMI Number: 1485969

All rights reserved




UMI 1485969 Copyright 2010 by ProQuest LLC. All rights reserved. This edition of the work is protected against unauthorized copying under Title 17, United States Code.

ProQuest LLC 789 East Eisenhower Parkway P.O. Box 1346 Ann Arbor, MI 48106-1346

#### ABSTRACT

Non-invasive IC Tomography Using Spatial Correlations

by

#### Davood Shamsi


We introduce a new methodology for post-silicon characterization of gate-level variations in a manufactured integrated circuit (IC). The estimated characteristics are based on power and delay measurements that are affected by process variations. The power (delay) variations are spatially correlated; thus, there exists a basis in which the variations are sparse. This sparse representation suggests using $\ell_1$-regularization (compressive sensing theory). We show how to use compressive sensing theory to improve post-silicon characterization. We also address the problem by adding spatial constraints directly to the traditional $\ell_2$-minimization.

The proposed methodology is fast, inexpensive, non-invasive, and applicable to legacy designs. Non-invasive IC characterization has a range of emerging applications, including post-silicon optimization, IC identification, and variation modeling and simulation. Evaluation results on standard benchmark circuits show that, on average, the proposed methods improve the accuracy of gate-level characteristic estimation by more than a factor of two.

# Contents

- Abstract
- List of Tables
- List of Figures
- 1 Introduction
- 2 Background
  - 2.1 Related work on process variation
    - 2.1.1 Early work
    - 2.1.2 Variation estimation and modeling
    - 2.1.3 Effects of variations on the design
    - 2.1.4 Testing
  - 2.2 Preliminaries
    - 2.2.1 Variation model
    - 2.2.2 Compressive sensing
- 3 Power Tomography
  - 3.1 Preliminaries
    - 3.1.1 Leakage current
    - 3.1.2 Global flow of the power tomography
  - 3.2 Noninvasive tomography
  - 3.3 Fast tomography by compressive sensing
    - 3.3.1 Sparse representation
    - 3.3.2 Regular grid tomography
    - 3.3.3 Irregular grid tomography
  - 3.4 Tomography using spatial constraints (TUSC)
    - 3.4.1 Adding spatial constraints
- 4 Delay Tomography
  - 4.1 Preliminaries
    - 4.1.1 Delay variation model
    - 4.1.2 Sensitizable paths
    - 4.1.3 Global flow of the delay tomography
  - 4.2 Delay estimation by $\ell_2$-norm minimization
  - 4.3 Delay estimation using compressive sensing
    - 4.3.1 Sparse representation of variations
    - 4.3.2 Gates on the regular grids
    - 4.3.3 Gates on the irregular grids
  - 4.4 Determining the regularization coefficient $\lambda$
  - 4.5 Path selection
    - 4.5.1 Sensitizable paths
    - 4.5.2 Basis path set
- 5 Applications
- 6 Evaluation Results
  - 6.1 Simulations setup
  - 6.2 Power tomography results
    - 6.2.1 Measurement matrix evaluation
    - 6.2.2 Tomography results in the power framework
  - 6.3 Delay evaluation results
    - 6.3.1 Measurement matrix and estimation in subspaces
    - 6.3.2 Delay tomography results
- 7 Conclusion

# List of Tables

| Table | Title |
|-------|-------|
| 3.1 | Static power for different input vector combinations |
| 4.1 | Transition propagation rate for different gates |
| 6.1 | Average number of independent power vectors |
| 6.2 | Performance of the $\ell_2$-norm minimization, the $\ell_1$-norm regularization, and TUSC (power) |
| 6.3 | Number of independent paths and independent linear equations |
| 6.4 | Performance of $\ell_2$-norm minimization and $\ell_1$-norm regularization (delay) |

# List of Figures

| Figure | Title |
|--------|-------|
| 2.1 | Design structure used by Doh et al. [23] |
| 2.2 | Spatial correlation study by Doh et al. [23] |
| 2.3 | Measured process variation in a wafer [30] |
| 2.4 | Variation on four test chips |
| 3.1 | Global flow of the power tomography |
| 3.2 | A simple logic circuit |
| 3.3 | Number of independent measurement vectors |
| 3.4 | The power variation and its sparse wavelet transform |
| 3.5 | Sorted wavelet coefficients (power) |
| 3.6 | Gates are not placed on regular grids |
| 3.7 | Irregular wavelet transformation |
| 4.1 | Global flow of the delay tomography |
| 4.2 | A sensitizable path from an input to the output |
| 4.3 | Delay variations and their wavelet transform |
| 4.4 | Sorted wavelet coefficients |
| 4.5 | The variation estimation error for various regularization factors $\lambda$ |
| 4.6 | Optimization curves for various measurement errors |
| 4.7 | An example of a circuit with sensitizable and unsensitizable paths |
| 6.1 | Singular values of the measurement matrix |
| 6.2 | Variation estimation error vs. percent of the power measurement noise |
| 6.3 | Variation estimation error vs. number of power measurements |
| 6.4 | Singular values of the measurement matrices |
| 6.5 | Variation (delay) estimation error vs. measurement error |
| 6.6 | Variation (delay) estimation error vs. the number of measurements |

# Chapter 1


## Introduction

In modern integrated circuit (IC) design, the objectives are to increase operating speed (maximum frequency) and to decrease power consumption. The maximum frequency of an IC is determined by the longest delay encountered in its different parts. Signal delay can be reduced by increasing the transistor density, which in turn requires scaling down the dimensions of the CMOS transistors in the IC. Decreasing power consumption also demands smaller CMOS transistor dimensions. Moore's law predicts that the number of transistors that can be placed inexpensively on an IC doubles approximately every two years. For example, the Intel 80486, introduced in 1989, was manufactured in $0.8\mu$m CMOS technology and had a maximum clock speed of 133 MHz. Today's processors, such as the Intel Core 2, are manufactured in 65 nm or smaller technologies, and their maximum frequency can exceed 3.20 GHz.

The dimensions of a manufactured CMOS transistor are not exactly as designed: measuring the manufactured transistors reveals deviations from the design specifications. This phenomenon is called manufacturing variation. Imperfection in manufacturing tools is the main contributor to systematic process variations. For example, because of limits on the minimum wavelength of the laser used to etch the mask [45], the masks used in manufacturing are not perfectly identical or symmetric. Thus, transistor dimensions depend on the specific mask used in the manufacturing process and on the transistor's location on the mask. Another source of manufacturing variation is the set of uncontrollable physical parameters of the manufacturing process, which causes random variations. Because the physical environment of fabrication cannot be strictly controlled, two ICs manufactured with the same mask do not exhibit the same variations.

Process variations can dramatically affect the properties of manufactured ICs. Statistical static timing analysis (SSTA) and statistical power analysis are two examples of techniques that consider variations in pre-silicon optimization. In SSTA, the goal is to find the longest path delay in the circuit. Because variations are nondeterministic, no single path always exhibits the longest delay. Thus, path delays should be modeled statistically, and the longest delay of the circuit is then determined with a specific confidence level. Orshansky et al. [57] showed that ignoring variations might cause up to 25% error in timing analysis. In statistical power analysis, it has also been shown that under variations the ratio of the standard deviation to the mean of the total current might vary between 0.17 and 0.98 [5]. However, pre-silicon optimizations such as SSTA have a limitation: the statistical characteristics of variations are not precisely known, and they might vary from chip to chip.
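The statistical view of the longest path can be illustrated with a small Monte Carlo sketch. It assumes three hypothetical paths whose delays are independent Gaussians (a real SSTA must also model the spatial correlation between paths); for each simulated chip the circuit delay is the maximum over the paths, and the sketch records how often each path is the critical one. The function name `monte_carlo_circuit_delay` and all numbers are illustrative only.

```python
import numpy as np

def monte_carlo_circuit_delay(nominal, sigma, n_samples=100_000, seed=0):
    """Sample independent Gaussian path delays; return the circuit delay of each
    simulated chip (max over paths) and the fraction of chips where each path is longest."""
    rng = np.random.default_rng(seed)
    nominal = np.asarray(nominal, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Each row is one simulated chip, each column one path delay.
    delays = nominal + sigma * rng.standard_normal((n_samples, nominal.size))
    circuit_delay = delays.max(axis=1)
    critical_fraction = np.bincount(delays.argmax(axis=1), minlength=nominal.size) / n_samples
    return circuit_delay, critical_fraction

# Hypothetical paths (ns): two near-critical paths and one clearly shorter path.
circuit_delay, critical_fraction = monte_carlo_circuit_delay(
    nominal=[1.00, 0.98, 0.80], sigma=[0.05, 0.05, 0.05])
print(f"mean circuit delay: {circuit_delay.mean():.3f} ns")
print(f"99th percentile:    {np.percentile(circuit_delay, 99):.3f} ns")
print("fraction of chips where each path is critical:", critical_fraction)
```

Although path 0 has the largest nominal delay, path 1 is the longest on a substantial fraction of the simulated chips, and the mean circuit delay exceeds every nominal path delay; this is why the longest delay is reported at a confidence level rather than read off a single path.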


A number of post-silicon variation characterization methods have recently been introduced [23, 30, 36, 79]. Friedberg et al. [30] used electrical linewidth metrology (ELM) to measure variations in chip feature dimensions across a wafer. They exhaustively measured the variations of all the transistors. Hargreaves et al. [36] introduced a post-silicon characterization method using ring oscillators. They placed a number of ring oscillators at different locations on an IC and measured the frequency of each one; the oscillator frequencies represent the variations across the IC. These methods are either expensive [30] or design-specific [36].

We propose a fast, non-invasive, and inexpensive method for gate-level post-silicon characterization using power and delay measurements. In the power framework, we first explain how process variations multiply the nominal leakage power of each logic gate by a *scaling factor*, which is the ratio of the gate's actual leakage to its expected value. Then, we show that measuring the total power consumption for each circuit input vector imposes a linear constraint on the scaling factors. Feeding the circuit with different input vectors and measuring the total power for each one leads to a system of linear equations with the scaling factors as unknowns. A common technique for solving such a system is traditional least-squares minimization ($\ell_2$-minimization).
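A minimal sketch of the power-framework system, assuming a hypothetical random table `L` of nominal per-gate leakages (rows are input vectors, columns are gates) in place of cell-library characterization data. Each measured total power is one linear equation in the scaling factors, and `numpy.linalg.lstsq` returns the $\ell_2$ (least-squares) estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n_gates, n_vectors = 20, 60

# Hypothetical leakage table: L[i, j] is the nominal leakage (nA) of gate j
# when the circuit is driven by input vector i (in practice taken from a cell library).
L = rng.uniform(5.0, 50.0, size=(n_vectors, n_gates))

# Hypothetical ground-truth scaling factors (1.0 means nominal leakage); unknown in practice.
s_true = rng.lognormal(mean=0.0, sigma=0.2, size=n_gates)

# One measurement per input vector: total leakage plus 1% multiplicative noise.
p = L @ s_true
p_measured = p * (1.0 + 0.01 * rng.standard_normal(n_vectors))

# Least-squares (l2) estimate of the scaling factors.
s_hat, residuals, rank, _ = np.linalg.lstsq(L, p_measured, rcond=None)
print("rank of the measurement matrix:", rank)
print("mean relative error:", np.mean(np.abs(s_hat - s_true) / s_true))
```

With real leakage tables the rows are far more dependent than in this random stand-in, so `rank` can fall below the number of gates; `lstsq` then returns the minimum-norm solution, which is where additional prior knowledge about the variations becomes useful.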


This estimation approach can be improved by incorporating spatial correlations into the framework. We show that spatial correlations imply the existence of a basis in which the variations in the scaling factors can be represented sparsely. We specifically consider wavelet bases, which can capture spatial correlation efficiently [25,73], and experimentally determine the wavelet basis that yields the sparsest representation of the variations. Given a sparse representation, we use compressive sensing techniques to recover the scaling factors efficiently: we regularize the objective function of the optimization problem with an $\ell_1$-norm term to impose sparsity on the solution.

Post-silicon characterization can also be improved by adding spatial constraints directly to the optimization. Spatial correlation implies that two spatially close gates experience approximately similar variations; statistically, two nearby gates are not expected to follow totally independent variations. Thus, in the underlying optimization, we penalize the differences between the scaling factors of nearby gates. This new formulation results in a better estimation of the gates' scaling factors. The approach is based on our paper at the ISLPED 2008 conference [62].
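One way to realize such a penalty is a quadratic term on the differences between neighboring gates, $\min_s \|Ls - p\|_2^2 + \mu\sum_{(u,v)\ \text{nearby}}(s_u - s_v)^2$, which has a closed-form solution through its normal equations. The quadratic form, the 4-neighbour grid, the weight `mu`, and the smooth test field are all assumptions of this sketch, not necessarily the exact formulation of [62]; `L` is again a hypothetical random leakage table, and there are fewer measurements than gates.

```python
import numpy as np

def neighbor_difference_matrix(coords, radius):
    """One row per pair of gates within `radius`: +1 for gate u and -1 for gate v."""
    n = len(coords)
    rows = []
    for u in range(n):
        for v in range(u + 1, n):
            if np.linalg.norm(coords[u] - coords[v]) <= radius:
                row = np.zeros(n)
                row[u], row[v] = 1.0, -1.0
                rows.append(row)
    return np.array(rows)

def spatially_penalized_lstsq(L, p, D, mu):
    """Solve min_s ||L s - p||^2 + mu * ||D s||^2 through its normal equations."""
    return np.linalg.solve(L.T @ L + mu * D.T @ D, L.T @ p)

rng = np.random.default_rng(3)
side = 8
center = (side - 1) / 2
xs, ys = np.meshgrid(np.arange(side), np.arange(side))
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)  # hypothetical gate grid
# Hypothetical smooth scaling-factor field: a gentle linear tilt across the die.
s_true = 1.0 + 0.05 * (coords[:, 0] - center) / center + 0.05 * (coords[:, 1] - center) / center

n_meas = 40                                              # fewer measurements than gates
L = rng.uniform(5.0, 50.0, size=(n_meas, side * side))   # hypothetical leakage table
p = (L @ s_true) * (1.0 + 0.001 * rng.standard_normal(n_meas))

D = neighbor_difference_matrix(coords, radius=1.0)       # 4-neighbour adjacency
s_l2 = np.linalg.lstsq(L, p, rcond=None)[0]              # minimum-norm l2 estimate
s_sc = spatially_penalized_lstsq(L, p, D, mu=100.0)
print("plain l2 error:           ", np.linalg.norm(s_l2 - s_true))
print("spatially penalized error:", np.linalg.norm(s_sc - s_true))
```

The penalty matrix $D^T D$ is the graph Laplacian of the neighborhood graph; its only null direction is the constant vector, which the leakage table already constrains, so the combined system is invertible even though `L` alone is underdetermined.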

Next, we use path delay analysis to characterize the variations in gate delays [63]. The same approach as in post-silicon leakage characterization is used for gate-level delay variation characterization. However, in contrast to the power variations, the variations in delay are additive, and they are linear functions of the CMOS dimensions. We use HSPICE simulation to find linear relations between transistor variations and delay variations in various gates. The form and construction of the system of linear equations in the delay framework also differ from those in the power framework: one can only measure the signal propagation delay on specific paths that start from a primary input and end at a primary output. Such paths are called sensitizable (testable) paths [64, 65]. We use the testable basis selection method in [65] to find a set of sensitizable basis paths for a circuit. Then, using the linear relationship between transistor dimensions and gate delays, we construct a system of linear equations with the variations as unknowns. Again, we can use traditional $\ell_2$-minimization or $\ell_1$-regularization (compressive sensing) to estimate the gate-level timing characteristics.
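A small sketch of how the delay-framework equations can be assembled, assuming a hypothetical five-gate circuit and unit coefficients (the path delay variation is taken as the plain sum of the gate delay variations on the path, rather than HSPICE-derived sensitivities to transistor dimensions). It also shows why path selection matters: four sensitizable paths yield only three independent equations here.

```python
import numpy as np

# Hypothetical 5-gate circuit: gates 0 and 1 are driven by primary inputs,
# gates 2 and 3 are internal, and gate 4 drives the primary output.
n_gates = 5
sensitizable_paths = [[0, 2, 4], [0, 3, 4], [1, 2, 4], [1, 3, 4]]

# Row p marks the gates on path p, so that, under an additive model,
# (path delay variations) = P @ (gate delay variations).
P = np.zeros((len(sensitizable_paths), n_gates))
for p, gates in enumerate(sensitizable_paths):
    P[p, gates] = 1.0

print("paths:", P.shape[0], "independent equations:", np.linalg.matrix_rank(P))

# Simulated path-delay measurements for hypothetical gate delay variations (ps).
delta_true = np.array([3.0, -1.0, 2.0, 0.5, -2.0])
measured = P @ delta_true
delta_hat = np.linalg.lstsq(P, measured, rcond=None)[0]   # minimum-norm l2 estimate
print("estimated gate delay variations:", np.round(delta_hat, 3))
```

The estimate reproduces every measured path delay exactly, yet it differs from `delta_true` because two directions of gate-delay variation (for example, shifting delay between gates 0 and 1) are invisible to these paths.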


We evaluate the performance of the proposed methods in both the delay and power frameworks on a number of circuits from the MCNC benchmark suite. Results indicate that the $\ell_1$-regularization method can estimate the variations much more accurately than traditional $\ell_2$-minimization. However, the performance of the $\ell_1$-regularization method depends on the circuit topology. For example, in the delay framework, the $\ell_1$-regularization method improves gate-level characteristic estimation by more than 100% on the C499 benchmark circuit, whereas on the b9 benchmark the improvement is only 10%.

A number of applications can benefit from non-invasive post-silicon characterization, including post-silicon optimization, manufacturing process characterization, simulation improvement, and IC identification.

The novel aspects of this thesis are as follows:

- We propose a method for post-silicon gate-level characterization in both the power and delay frameworks that uses only non-invasive measurements. In contrast to variation measurement methods based on ring oscillators, our method works for a general combinational IC.
- For the first time, we represent post-silicon variations in a sparse domain.
  Although spatial correlation in variations has been widely studied [23, 30, 79],
  this is the first time it is used for post-silicon optimization.
  We experimentally determine which wavelet basis results in the sparsest representation.
- We use the theory of compressive sensing to estimate the variations with a small number of measurements. We use the wavelet basis to sparsely represent delay and power variations.
- We analyze the regularization factor in  $\ell_1$ -regularization and introduce a method to estimate the optimal regularization factor.

- We modify the original compressive sensing formulation such that it can be applied to irregular gate placements.
- We add new constraints to the optimization problem that directly impose spatial correlations. With these additional constraints, the variation estimates improve considerably.
- The proposed post-silicon variation characterization method is fast, inexpensive, and non-invasive. It enables a range of new applications. We introduce a number of novel applications for the proposed method.

The thesis is organized as follows. In Chapter 2, we discuss related work and the preliminaries used in the thesis, namely the variation model and compressive sensing theory. Chapters 3 and 4 introduce our variation estimation method in the power and delay frameworks, respectively. Next, we discuss a number of applications of the proposed post-silicon variation characterization method in Chapter 5. The evaluation results are presented in Chapter 6. Finally, we summarize the thesis in Chapter 7.

# Chapter 2

## Background

### 2.1 Related work on process variation

#### 2.1.1 Early work

Manufacturing variations have long been a main source of randomness in the properties of precisely designed ICs. Even though process variations were very small in 20th-century fabrication technology, they could affect precise analog designs, and they were addressed by a number of researchers [27, 41, 55, 66]. Three early works on identifying random variations stand out.

In 1982, Shyu et al. [66] studied the effects of random variations on MOS capacitors. They identified capacitor edge fluctuation and oxide thickness fluctuation as two sources of randomness in MOS capacitors. The variations in these physical properties lead to a random capacitance. They analytically derived the relationship between the capacitance and the random variations, and they numerically showed how variations affect the capacitance. For example, they showed that a Gaussian random fluctuation with variance $0.1\mu m$ in a capacitor with edge length $50\mu m$ causes only a 0.036% difference in capacitance. Their results indicate that, in early MOS capacitors, the effects of random variations were negligible.


Lakshmikumar et al. [41] in 1982 proposed a method to predict the (intra-die) current mismatch of transistors on an integrated circuit. Since only the relative dimensions of transistors matter in analog design, the impact of global (inter-die) variations was not analyzed in this work. The paper had two main goals. First, the sources of variation were determined and a model was fitted to the measurement data; in other words, they tried to predict the systematic part of the variation. Second, they derived an analytical relation between the current mismatch and the transistor dimensions, so that the predicted current mismatch could be translated into dimension variations. Knowing the variation in dimensions helps in designing more precise analog circuits. However, random variations were not considered; thus, the total variation could not be predicted.

In 1995, Eisele et al. [27] used a $10 \times 10$ transistor array to study intra-die variations in manufactured ICs. Their addressing scheme allowed individual transistor selection, meaning that they could characterize each transistor separately. After measuring $V_{GS}$ for all transistors, they fitted a normal distribution to the measured values. They also showed that variations in the gate-source voltage $V_{GS}$ are spatially correlated. They then repeated the procedure for different aspect ratios (W/L) and verified the relation between the transistor dimensions and the standard deviation of the threshold voltage: $\sigma_{V_{th}} \propto (WL_{eff})^{-\frac{1}{2}}$. Thus, as CMOS transistor dimensions decrease, the variance of the fluctuations increases.
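As a worked example of this scaling law, assume that both $W$ and $L_{eff}$ shrink by the same factor $s < 1$ and that the proportionality constant $k$ is unchanged:

$$
\sigma_{V_{th}} = k\,(WL_{eff})^{-\frac{1}{2}}
\quad\Longrightarrow\quad
\sigma'_{V_{th}} = k\,\big((sW)(sL_{eff})\big)^{-\frac{1}{2}} = \frac{k\,(WL_{eff})^{-\frac{1}{2}}}{s} = \frac{\sigma_{V_{th}}}{s}.
$$

For instance, halving both dimensions ($s = 1/2$) doubles $\sigma_{V_{th}}$ and quadruples its variance.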

#### 2.1.2 Variation estimation and modeling

As technology improved and nanoscale CMOS transistors could be fabricated, process variations became a determining factor. To appreciate how variations affect circuit design, one needs a thorough understanding of variations and their statistical properties in ICs. Several researchers have measured and modeled process variations in different CMOS technologies [8, 12, 16, 23, 30, 36, 43, 46, 47, 50, 76, 79].

In 2005, Doh et al. [23] experimentally characterized the spatial correlation in process variations. To do so, they fabricated a $4 \times 5$ module array in 130 nm CMOS technology. As shown in Figure 2.1, each module consisted of 16 patterns of nMOS transistors, 16 patterns of pMOS transistors, and an oscillator. Ring oscillators are standard structures used to characterize the properties of integrated circuits [36]; they consist of a chain of inverters connected in a loop. Doh et al. [23] used a 40-pattern ring oscillator (see Figure 2.1). Using this structure, they characterized the spatial correlation in variations. Figure 2.2 shows the scatter plot of the saturation voltage of nMOS transistors. The saturation voltages of transistors in nearby modules, like M1 and M2, are strongly correlated. The right side of Figure 2.2 shows that the correlation decreases linearly with distance.

Figure 2.1: Design structure used by Doh et al. to characterize spatial correlation in process variations [23]. The left part is the $4 \times 5$ module array that they used in the experiment. Each module includes 16 patterns of nMOS, 16 patterns of pMOS, and an oscillator.

Figure 2.2: Spatial correlation study by Doh et al. [23]. Left: scatter plot of the saturation voltage of nMOS transistors; nearby modules are strongly correlated. Right: spatial correlation decreases as the distance between modules increases.

To characterize variations accurately, Friedberg et al. [30] used electrical linewidth metrology (ELM) to measure transistor feature sizes on a 200 mm wafer. They used Kelvin test structures to extract linewidths from the ELM measurements. Figure 2.3 shows the distribution of variations across a complete wafer in 130 nm technology; patterns of inter-die and intra-die variations can be clearly observed. They measured the dimension variations of all transistors in a number of wafers, introduced a variation model for the transistor dimensions, and proposed a piecewise linear fit to the measurement data. Their experimental results showed that spatial correlation increases beyond a specific distance, but they did not offer an explanation for this behavior. Their method is invasive and expensive in time and equipment, making it very hard to characterize variations in a large number of ICs using ELM.

Figure 2.3: Measured process variation in a wafer [30]. Friedberg et al. used electrical linewidth metrology (ELM) to measure the process variation in all the dies of the wafer. Inter-die and intra-die variations can be clearly observed.

Zhao et al. [79] used a transistor array to study process variation. They used the test chip designed and fabricated by Agarwal et al. [7]. The test structure was specifically designed to determine local variation in transistors. It measured $125\mu m \times 110\mu m$ and consisted of 1000 columns and 96 rows. They used level-sensitive scan design (LSSD) latch banks in the structure to address each transistor uniquely, and they measured the current-voltage characteristics of all transistors. The observed variations were attributed to threshold voltage and gate-length variations. They also proposed a model for the variation of each parameter. The results show that a statistical characterization of variations can reduce the IC power prediction error from 30% to 7%. Their work demonstrated the benefit of variation modeling. However, their analysis relied on a test array circuit, and it cannot be extended to modeling legacy ICs that are not equipped with such sensors.

Liu [47] proposed a new modeling approach that describes systematic variations as an affine function of the device's geometric coordinates. To model random variations, he recommended three spatial correlation functions: exponential, Gaussian, and linear. Using generalized least-squares fitting, he chose a Gaussian model for the measured data. His main focus was on modeling rather than on measuring IC variations.

Hargreaves et al. [36] used ring oscillators spread throughout a test chip to measure variation. The chip design allowed the ring oscillators to be accessed sequentially, so the frequency of each ring oscillator could be measured separately. Figure 2.4 shows inverter delays for four different test ICs. Finally, they modeled the variations as a Gaussian field. Their method differs from Liu's model [47] in the correlation function and the fitting procedure: Hargreaves et al. used a more accurate parameter estimation method with higher complexity.



Figure 2.4: Variation on four test chips measured by Hargreaves et al. [36].
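Ring-oscillator measurements such as these are commonly converted into delays through the standard relation $f = 1/(2Nt_d)$ for a ring of $N$ inverting stages with average stage delay $t_d$. The sketch below assumes a hypothetical 11-stage ring and a made-up $3 \times 3$ grid of measured frequencies; it illustrates the conversion only and is not the procedure of [36]. The function name `stage_delay_from_frequency` is hypothetical.

```python
import numpy as np

def stage_delay_from_frequency(freq_hz, n_stages):
    """Average inverter delay of an n-stage ring oscillator, from f = 1 / (2 * n * t_d)."""
    if n_stages % 2 == 0:
        raise ValueError("a single-ended inverter ring needs an odd number of stages")
    return 1.0 / (2.0 * n_stages * np.asarray(freq_hz, dtype=float))

# Hypothetical frequencies (Hz) of ring oscillators on a 3 x 3 grid of one test chip.
freqs = np.array([[1.02e9, 1.00e9, 0.99e9],
                  [1.01e9, 0.98e9, 0.97e9],
                  [0.99e9, 0.97e9, 0.95e9]])
t_d = stage_delay_from_frequency(freqs, n_stages=11)
# Relative deviation of each location from the chip-wide mean stage delay.
variation = t_d / t_d.mean() - 1.0
print(np.round(t_d * 1e12, 2))       # stage delays in ps
print(np.round(variation * 100, 2))  # deviations in percent
```

Lower frequencies map to longer stage delays, so the resulting map of `variation` is a coarse spatial picture of the delay variation across the chip.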

None of these methods provides a fast and practical approach to variation estimation. They are either invasive (i.e., destructive) and expensive in terms of time and equipment cost, or they rely on adding on-chip oscillators for variation sensing. We introduce a fast, non-invasive, and inexpensive method to estimate the variation that uses only a small number of power or delay measurements to characterize gate-level post-silicon variations.


#### 2.1.3 Effects of variations on the design

Process variations have considerable effects on chip properties [2–6, 9, 19, 21, 26, 33, 44, 51, 57]. For example, they can seriously affect timing [14, 17, 19, 39, 49, 51, 57, 58, 78]. In statistical static timing analysis (SSTA), researchers try to find the signal propagation delays on the critical paths of a circuit. Most proposed solutions focus on finding the statistical distribution of the maximum propagation delay. Orshansky et al. [57] found that, in 180 nm technology, ignoring process variation might cause a 25% timing error. Choi et al. [19] estimated path delays under process variation and proposed a new sizing algorithm that performed up to 19% better than worst-case analysis. Mangassarian et al. [51] found the delay probability density functions (pdfs) of the critical paths and sorted them. Based on the sorted path-delay pdfs, they proposed a statistical timing analysis that is about 30% better than worst-case analysis.

The above methods are pre-silicon approaches that assume a specific variation distribution on the IC. Cline et al. [20] analyzed the impact of variation models on SSTA methods. They used real measurement data to fit models in which the correlation decreases as the distance between two gates increases. Then, they compared the SSTA results under these models with static timing analysis (STA). They showed that the correlation models used in SSTA should follow the specific process variations of the IC; otherwise, the performance of SSTA degrades.

Liu et al. [48] introduced an SSTA method using post-silicon measurements and optimizations. They combined post-silicon measurements with the existing pre-silicon models for the variations. Thus, they constructed a specific model for each die. The proposed method could decrease the standard deviation by 83.5% compared to the traditional post-silicon SSTA techniques.

Process variations affect the performance of pipelined circuits as well. Pipelined circuits consist of a number of sequential stages. To increase the operating frequency, one needs stages with small delays, but the slowest stage is the system's bottleneck. In the presence of variations, the delay of each gate follows some pdf, and it is not possible to determine the slowest stage exactly [21]. Datta et al. [21] showed that considering variations can improve design yield by 9%. Eisele et al. [26] showed that, in 180 nm CMOS technology, variation might cause a 10% reduction in the operating frequency.

The leakage current of an IC also changes with process variation [3, 5, 11, 59, 60]. Agarwal et al. [5] proposed a method to model the distribution of IC leakage current. They showed that, in 50 nm CMOS technology, the coefficient of variation of the total current might vary between 0.17 and 0.98.

#### 2.1.4 Testing

The goal of IC testing is to find the defective gates in a circuit [56, 64, 65]. A test might be a functional test or a delay test. In a functional test, the logical functionality of the gates is tested; a delay test ensures that the delays of all gates satisfy a number of specific constraints.

Finding a set of testable paths is the most important task in testing. Sharma et al. [65] introduced a technique to construct a small basis path set that covers all gates. They proposed automatic test pattern generation (ATPG) techniques to identify the longest testable path through each gate, so that any defect in the circuit could be detected using delay measurements.

Murakami et al. [56] introduced a method to recognize untestable paths. Their method was based on the logical necessity conditions that must be satisfied for a path to be testable. Knowing these necessity conditions, they proposed an algorithm to find the longest testable path through each gate.

Although circuit testing, like our method, relies on delay measurements on a set of testable paths, its goal is not to characterize delay variations. Circuit testing is concerned only with defective gates, whereas the goal of our method is to characterize the delay variations of the gates.

Thus, process variations affect many different properties of a manufactured IC and can no longer be ignored. The previously described methods for variation estimation are expensive and cannot be extended to legacy ICs.

### 2.2 Preliminaries


#### 2.2.1 Variation model

Process variations can generally be described as the sum of systematic variations and random variations. Systematic variations have a deterministic pattern resulting from physical imperfections in the manufacturing process; for example, mask imperfections cause systematic variations across the chip. Because of their deterministic source, systematic variations can potentially be known beforehand [76]. The systematic variation of a specific logic gate u, denoted by $\psi_u^s$, is usually modeled linearly [47],

$$\psi^s_u=a_0+a_1x_u+a_2y_u;$$

where $a_0$, $a_1$, and $a_2$ are the model parameters and $(x_u, y_u)$ is the physical location of the gate on the IC.

Random variations result from arbitrary fluctuations in the manufacturing process. These variations can be decomposed into inter-die variations $\psi^{\text{inter}}$ and intra-die variations $\psi^{\text{intra}}$. Inter-die variations represent the differences among the dies of the same wafer; the inter-die variation is a random variable that takes a single constant value for all gates on a given chip. Intra-die variations represent the differences among the devices on one chip. Thus, the total random variation for gate u is

$$\psi_u^r = \psi^{\text{inter}} + \psi_u^{\text{intra}}.$$

Finally, total variation can be written as

$$\psi_u = \psi_u^s + \psi_u^r$$

$$= a_0 + a_1 x_u + a_2 y_u + \psi^{\text{inter}} + \psi_u^{\text{intra}}$$

$$= F_u^T \beta + \psi_u^{\text{intra}} \qquad (2.1)$$

where $F_u = [1, x_u, y_u]^T$ and $\beta = [a_0 + \psi^{\text{inter}}, a_1, a_2]^T$. Note that $F_u$ contains the location of the gate, and the term $F_u^T \beta$ is constant for a specific gate. The intra-die vector $\psi^{\text{intra}}$ is a Gaussian random vector with zero mean and correlation matrix $\Sigma$ [47],

$$\Sigma_{u,v} = \rho(F_u - F_v).$$

Here, $\rho$ is the correlation function, which takes one of three forms [47]: $\rho(\cdot) = \exp(-a^2 \|\cdot\|)$ (exponential), $\rho(\cdot) = \exp(-a^2 \|\cdot\|^2)$ (Gaussian), or $\rho(\cdot) = \max\{0, 1 - a^2 \|\cdot\|\}$ (linear). Note that these Gaussian random variables describe variations in the dimensions of the gates (or, equivalently, in the gate delays), i.e., $d_u = d_u^0 + \psi_u$, where $d_u^0$ is the nominal dimension of the gate.
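As an illustration of Equation 2.1, the following minimal sketch (Python with NumPy) draws the variations $\psi_u$ of one chip on a grid of gate locations. All numerical values are hypothetical: the grid, the systematic parameters $(a_0, a_1, a_2)$, the inter-die standard deviation, and the correlation parameter $a$. The sketch also assumes that the intra-die term has unit variance ($\rho(0) = 1$), and it factors $\Sigma$ through an eigendecomposition with negative round-off eigenvalues clipped, since the Gaussian correlation matrix is often numerically singular.

```python
import numpy as np

def correlation(dist, a, kind="gaussian"):
    """Correlation functions from [47], evaluated on pairwise gate distances."""
    if kind == "exponential":
        return np.exp(-a**2 * dist)
    if kind == "gaussian":
        return np.exp(-a**2 * dist**2)
    if kind == "linear":
        return np.maximum(0.0, 1.0 - a**2 * dist)
    raise ValueError(f"unknown correlation kind: {kind}")

def sample_variations(xy, a_sys, sigma_inter, a, kind="gaussian", rng=None):
    """Draw psi_u = F_u^T beta + psi_u^intra for every gate (Equation 2.1).

    xy          : (N, 2) array of gate locations (x_u, y_u)
    a_sys       : (a0, a1, a2), hypothetical systematic-model parameters
    sigma_inter : standard deviation of the inter-die term (one draw per chip)
    """
    rng = np.random.default_rng() if rng is None else rng
    n = xy.shape[0]
    F = np.column_stack([np.ones(n), xy])              # row u is F_u^T = [1, x_u, y_u]
    beta = np.array(a_sys, dtype=float)
    beta[0] += sigma_inter * rng.standard_normal()     # a0 + psi^inter, shared by all gates
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)   # ||F_u - F_v||
    sigma = correlation(dist, a, kind)                 # Sigma_{u,v} = rho(F_u - F_v)
    w, V = np.linalg.eigh(sigma)                       # Sigma = V diag(w) V^T
    intra = V @ (np.sqrt(np.clip(w, 0.0, None)) * rng.standard_normal(n))
    return F @ beta + intra

# Example: a 16 x 16 grid of gates with hypothetical parameters.
xs, ys = np.meshgrid(np.arange(16), np.arange(16))
xy = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
psi = sample_variations(xy, a_sys=(0.0, 0.01, -0.01), sigma_inter=0.05, a=0.2,
                        rng=np.random.default_rng(0))
print(psi.shape, psi.std())
```

Setting `kind` to `"exponential"` or `"linear"` switches between the other two correlation forms listed above.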

#### 2.2.2 Compressive sensing

This section reviews the compressive sensing concepts that enable us to reconstruct a sparse vector from a small number of measurements (see [10, 15, 24]). A vector is called *s*-sparse when it has only *s* non-zero elements. Assume X is an *s*-sparse $N \times 1$ vector and that the observation Y is given by

$$Y = UX + e. \tag{2.2}$$

The vector X is the unknown sparse vector, U is a known $K \times N$ measurement matrix, and e is the measurement noise. Note that neither the values of the non-zero components of X nor their positions are known. The vector Y is our observation (measurement), and the goal is to estimate the sparse vector X from Y. To retrieve X, one might choose the vector that minimizes $\|Y - UX\|_2$. Because of the measurement noise and the small number of measurements ($K < N$), this procedure usually leads to a non-sparse estimate. Instead, solving the following optimization problem yields a sparse solution

$$\min ||X||_0 + \alpha ||e||_2 \tag{2.3}$$

such that Y = UX + e,

where  $\alpha$  is a positive constant.

The zero norm, $\|\cdot\|_0$, in Equation 2.3 counts the number of non-zero elements of a vector. This objective function is not convex, so solving the optimization problem in Equation 2.3 is computationally difficult (it is NP-hard in general). Instead, Donoho et al. showed [10, 15, 24] that the following convex optimization problem can be used to recover the sparse vector X.

$$\min \|X\|_1 + \alpha \|e\|_2 \tag{2.4}$$

such that $Y = UX + e$.

They proved that, for a Gaussian measurement matrix, an *s*-sparse vector can, with high probability, be recovered via $\ell_1$-norm optimization if

$$s < C \frac{K}{\log(N/K)};$$

where C is a constant. More generally, for an arbitrary measurement matrix U, recovery is guaranteed when U satisfies the Restricted Isometry Property (RIP) [15].
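The recovery result can be illustrated numerically with a minimal NumPy sketch. It is not the solver used in the cited works: as an assumption, it minimizes the closely related unconstrained (LASSO) form $\frac{1}{2}\|Y - UX\|_2^2 + \tau\|X\|_1$ with the iterative soft-thresholding algorithm (ISTA), and the sizes N, K, s, the noise level, and $\tau$ are illustrative choices. With $K < N$ Gaussian measurements, the $\ell_1$ estimate recovers the sparse vector closely, while the minimum-norm least-squares estimate does not.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink every entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(U, Y, tau, n_iter=5000):
    """Minimize 0.5*||Y - U X||_2^2 + tau*||X||_1 by iterative soft thresholding."""
    step = 1.0 / np.linalg.norm(U, 2) ** 2      # 1/L, L = Lipschitz constant of the gradient
    X = np.zeros(U.shape[1])
    for _ in range(n_iter):
        X = soft_threshold(X - step * (U.T @ (U @ X - Y)), step * tau)
    return X

rng = np.random.default_rng(1)
N, K, s = 256, 80, 8                             # length, number of measurements, sparsity
X_true = np.zeros(N)
X_true[rng.choice(N, size=s, replace=False)] = 3.0 * rng.standard_normal(s)
U = rng.standard_normal((K, N)) / np.sqrt(K)     # Gaussian measurement matrix
Y = U @ X_true + 0.01 * rng.standard_normal(K)   # Equation 2.2

X_l1 = ista(U, Y, tau=0.05)
X_l2 = np.linalg.lstsq(U, Y, rcond=None)[0]      # minimum-norm least squares, for comparison
err_l1 = np.linalg.norm(X_l1 - X_true) / np.linalg.norm(X_true)
err_l2 = np.linalg.norm(X_l2 - X_true) / np.linalg.norm(X_true)
print(f"relative error: l1 = {err_l1:.3f}, l2 = {err_l2:.3f}")
```

ISTA is chosen here only because it fits in a few lines; any convex solver for the same objective would serve.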

Most real-world vectors have an approximately sparse representation. A vector $X_{N\times 1}$ is called approximately *s*-sparse if it has *s* large elements and $N - s$ very small elements. It has also been shown that the optimization problem in Equation 2.4 can recover approximately sparse vectors that lie in a weak $\ell_p$ ball of radius *r* [15], i.e.,

$$|x|_{(i)} \le r\, i^{-1/p}, \quad 1 \le i \le N, \tag{2.5}$$

where $X = (x_1, x_2, \ldots, x_N)$, $|x|_{(i)}$ is the *i*-th largest magnitude among the elements of X, and p is a positive number (smaller values of p force faster decay, i.e., more compressible vectors).
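A minimal sketch (Python with NumPy) of the condition in Equation 2.5: the helper `weak_lp_radius`, a name introduced here for illustration, returns the smallest radius $r$ of a weak $\ell_p$ ball containing a given vector, by sorting the magnitudes in decreasing order and taking the maximum of $|x|_{(i)}\, i^{1/p}$.

```python
import numpy as np

def weak_lp_radius(x, p):
    """Smallest r such that |x|_(i) <= r * i**(-1/p) for all i (Equation 2.5)."""
    mags = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]   # |x|_(1) >= |x|_(2) >= ...
    i = np.arange(1, mags.size + 1)
    return float(np.max(mags * i ** (1.0 / p)))

# Sorted magnitudes decaying like i^(-2) fit in a weak l_{1/2} ball of radius (about) 1.
x = np.array([1.0 / k**2 for k in range(1, 101)])
print(weak_lp_radius(x, p=0.5))
print(weak_lp_radius([3.0, -5.0, 1.0], p=1.0))
```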


## Chapter 3

# **Power Tomography**

In this chapter, we introduce a new fast method for estimating power variations (power tomography). The proposed power tomography is based on our paper at the ISLPED 2008 conference [62].

### 3.1 Preliminaries

#### 3.1.1 Leakage current

Digital circuits are designed so that there is no direct path between the voltage source and ground. Thus, one might expect digital circuits not to consume static power; however, leakage current does occur. There are four sources of leakage current [28]: (1) reverse-biased junctions, (2) gate-induced drain leakage, (3) gate direct-tunneling leakage, and (4) sub-threshold leakage. Computing the exact value of the leakage current involves elaborate expressions. Since the exact leakage model does not affect our basic approach, we use the following model from [60].


$$I_{\text{leak}} = q_1 e^{q_2 L + q_3 L^2}. \tag{3.1}$$

Here, $I_{\text{leak}}$ is the leakage current of a transistor; $q_1$, $q_2$, and $q_3$ are constants determined by the physical characteristics of the transistor; and L is the gate length of the transistor. The constant $q_3$ is small, so that $q_3L^2 \ll q_2L$ [60]. This model suggests an exponential relation between the transistor gate length and the leakage power. Since the gate length is Gaussian (Section 2.2.1) and the quadratic term is negligible, the leakage current is approximately log-normally distributed, and $p_u = \phi_u p_u^0$, where $p_u^0$ and $p_u$ are the nominal and actual power of the gate, respectively, and $\phi_u = e^{\psi_u}$, where $\psi_u$ represents the (suitably scaled) variation in the transistor dimension.

Now consider a combinational circuit C consisting of N logic gates, $P_I$ input pins, and $P_O$ output pins. Depending on its input signals b, each gate $g_u$ consumes a specific power $p_{g_u,b}$. Because of process variations, the power consumption of gate $g_u$ does not equal its nominal power consumption $p_{g_u,b}^0$; rather, it is scaled by $\phi_u$:

$$p_{g_u,b} = p^0_{g_u,b}\, \phi_u.$$

The goal is to estimate the scaling factors $\phi_u$ of the gates wherever this is feasible.
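The log-normal argument above can be checked with a short Monte Carlo sketch in NumPy. The constants $q_1, q_2, q_3$, the nominal gate length, and its standard deviation are hypothetical values chosen so that $q_3 L^2 \ll |q_2 L|$; they are not taken from [60]. The sketch shows that $\psi = \ln(I_{\text{leak}}/I_{\text{leak}}^0)$ is essentially a scaled copy of the Gaussian gate-length deviation, so the scaling factor $\phi = e^{\psi}$ is approximately log-normal, with the standard deviation of $\psi$ close to $|q_2|\sigma_L$.

```python
import numpy as np

# Hypothetical constants for Equation 3.1 (not from [60]); L is measured in micrometers.
q1, q2, q3 = 1e-9, -25.0, 0.5          # chosen so that q3*L^2 << |q2*L| near L0
L0, sigma_L = 0.065, 0.003             # nominal gate length and its standard deviation

rng = np.random.default_rng(2)
L = L0 + sigma_L * rng.standard_normal(100_000)   # Gaussian gate-length variation
I_leak = q1 * np.exp(q2 * L + q3 * L**2)          # Equation 3.1
I_nom = q1 * np.exp(q2 * L0 + q3 * L0**2)         # nominal leakage

phi = I_leak / I_nom                  # scaling factor: I_leak = phi * I_nom
psi = np.log(phi)                     # approximately q2 * (L - L0), i.e., Gaussian
print(f"mean(psi) = {psi.mean():.5f}, std(psi) = {psi.std():.5f}, "
      f"|q2|*sigma_L = {abs(q2) * sigma_L:.5f}")
```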



#### 3.1.2 Global flow of the power tomography

Figure 3.1: Global flow of the power tomography.

Figure 3.1 shows the global flow of our method. A number of random input vectors are applied to the circuit, and the leakage current corresponding to each input vector is measured (Steps 1 and 2). Next, a system of linear equations is formed in which each equation corresponds to one measurement (Step 3). The unknowns are the (normalized) leakage current variations of the gates. The standard way to estimate the IC's leakage tomogram is $\ell_2$-norm optimization (Steps 4a-5a). Our method instead exploits the spatial correlations of the statistical leakage variations and compressive sensing theory to estimate the leakage tomogram efficiently (Steps 4b-5b). Finally, the TUSC method (Steps 4c-5c) enforces the spatial constraints directly in the power variation estimation.



Figure 3.2: A simple logic circuit.

| input vector | NAND-2   | NOR-2    |
|--------------|----------|----------|
| 00           | 0.776 nW | 17.41 nW |
| 01           | 10.39 nW | 4.112 nW |
| 10           | 4.137 nW | 7.581 nW |
| 11           | 15.15 nW | 3.527 nW |

Table 3.1: Static power for different input vector combinations.

### 3.2 Noninvasive tomography

In this section, we detail the full-matrix measurement method for noninvasive gate-level characterization. First, different inputs are applied to the circuit, and the total leakage current of the chip is measured for each input. Then, an optimization problem is solved to estimate the process variations from the power measurements.

Consider the simple logic circuit in Figure 3.2. It has 3 inputs and 2 outputs. The nominal power consumption of each gate for different inputs is shown in Table 3.1, for 65nm CMOS technology. As a result, the circuit has a different power consumption for each input vector. Because of process variations, the nominal power consumption of gate $g_u$ is scaled by $\phi_u$. For example, if inputs 1, 2, and 3 are 0, 1, and 1, respectively, then the total power consumption of the circuit is


$$p_{011} = p_{g_1,01}\phi_1 + p_{g_2,11}\phi_2 + p_{g_3,00}\phi_3 + p_{g_4,00}\phi_4$$
$$= 4.112\phi_1 + 15.15\phi_2 + 0.776\phi_3 + 17.41\phi_4, \qquad (3.2)$$

where $p_{g_i,b_j^i}$ is the power consumption of gate $g_i$ for input $b_j^i$. Note that $b_j^i$, the input of gate $g_i$, is determined by the circuit input vector, denoted by $b_j$. For example, in Figure 3.2, if $b_j = 011$, then $b_j^3 = 00$.
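Equation 3.2 can be reproduced with a few lines of Python. The gate types below (NOR2 for $g_1$ and $g_4$, NAND2 for $g_2$ and $g_3$) are an assumption inferred by matching the coefficients of Equation 3.2 against Table 3.1, since the schematic of Figure 3.2 is not reproduced here; the per-gate input patterns for circuit input 011 are read directly from Equation 3.2, and the scaling factors in the second call are arbitrary.

```python
# Nominal static power in nW (Table 3.1), indexed by gate type and gate input pattern.
NOMINAL_POWER_NW = {
    "NAND2": {"00": 0.776, "01": 10.39, "10": 4.137, "11": 15.15},
    "NOR2":  {"00": 17.41, "01": 4.112, "10": 7.581, "11": 3.527},
}

def total_power(gate_types, gate_inputs, phi):
    """Equation 3.3: sum of each gate's nominal power for its input pattern, scaled by phi_i."""
    return sum(NOMINAL_POWER_NW[g][b] * f for g, b, f in zip(gate_types, gate_inputs, phi))

# Assumed gate types of g1..g4 (inferred from Equation 3.2 and Table 3.1).
gate_types = ["NOR2", "NAND2", "NAND2", "NOR2"]
# Inputs seen by g1..g4 when the circuit input is 011 (read from Equation 3.2).
inputs_011 = ["01", "11", "00", "00"]

p_nominal = total_power(gate_types, inputs_011, [1.0, 1.0, 1.0, 1.0])   # about 37.448 nW
p_varied = total_power(gate_types, inputs_011, [1.1, 0.9, 1.0, 1.2])    # hypothetical phi values
print(p_nominal, p_varied)
```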

In a digital circuit with N gates, the total power consumption $p_{b_j}$ for the binary input vector $b_j$ is

$$p_{b_j} = \sum_{i=1}^{N} p_{g_i, b_j^i}\, \phi_i. \tag{3.3}$$

If there are M input vectors $b_1, \ldots, b_M$, define the measurement matrix A as

$$A = \begin{bmatrix} p_{g_1,b_1^1} & p_{g_2,b_1^2} & \cdots & p_{g_N,b_1^N} \\ p_{g_1,b_2^1} & p_{g_2,b_2^2} & \cdots & p_{g_N,b_2^N} \\ \vdots & \vdots & & \vdots \\ p_{g_1,b_M^1} & p_{g_2,b_M^2} & \cdots & p_{g_N,b_M^N} \end{bmatrix}.$$

Also, let

$$\mathbf{p} = \left[p_{b_1}, p_{b_2}, \dots, p_{b_M}\right]^T, \qquad \mathbf{d} = \left[\phi_1, \phi_2, \dots, \phi_N\right]^T.$$

Then, the gate variations are found by solving the following system of linear equations:

$$\mathbf{p} = A\mathbf{d}.\tag{3.4}$$

Since there are N unknown variables ($\phi_i$, $i = 1, \ldots, N$), N independent measurements are needed to determine the solution of the linear system in Equation 3.4 uniquely. In the presence of power measurement noise, we use the least-squares estimate

$$\min ||A\mathbf{d} - \mathbf{p}||_2^2. \tag{3.5}$$

We call this method the  $\ell_2$ -minimization method.

Note that each input vector $b_j$, through the topology of the circuit, determines one row (power vector) of the measurement matrix A. The rows of the measurement matrix are not necessarily linearly independent; if the rank of A is less than N, it is impossible to find the variations of all gates by the optimization in Equation 3.5.
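The following sketch (Python with NumPy) illustrates the $\ell_2$-minimization of Equation 3.5 and the rank issue just described. The measurement matrix is a random stand-in for the nominal gate powers (an assumption, not derived from a real netlist); `numpy.linalg.lstsq` returns both the least-squares estimate and the rank of A. Forcing two columns to be identical mimics two gates that, because of the circuit topology, see the same nominal power for every input vector: the rank then drops below N, and only the sum of their scaling factors can be identified.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 6, 20                                     # gates and input vectors (hypothetical sizes)
A = rng.uniform(0.5, 20.0, size=(M, N))          # stand-in for nominal powers p_{g_i, b_j^i}
d_true = np.exp(0.1 * rng.standard_normal(N))    # scaling factors phi_i = exp(psi_i)
p = A @ d_true + 0.01 * rng.standard_normal(M)   # noisy leakage measurements, Equation 3.4

d_hat, _, rank, _ = np.linalg.lstsq(A, p, rcond=None)   # Equation 3.5
print("rank(A) =", rank, " max |d_hat - d_true| =", np.max(np.abs(d_hat - d_true)))

# Topology-induced dependence: columns 2 and 3 (0-based) coincide for every input vector,
# so rank(A) < N and only phi_2 + phi_3 is identifiable.
A_bad = A.copy()
A_bad[:, 3] = A_bad[:, 2]
print("rank(A_bad) =", np.linalg.matrix_rank(A_bad))
```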

#### Multi-voltage leakage measurement

Increasing the number of power measurements M may increase the number of independent power vectors (rows of the measurement matrix). However, the circuit topology imposes an upper bound on the number of independent power vectors. As discussed in Section 2.2, the leakage current does not depend linearly on the supply voltage. Hence, measuring static power at different supply voltages yields additional independent power vectors. We use this fact to increase the number of independent power vectors in the measurement matrix.



Figure 3.3: Number of independent measurement vectors for single-voltage measurements and multiple-voltage (3 voltages) measurements. Multiple-voltage measurements increase the number of independent rows in the measurement matrix.

Figure 3.3 shows the number of independent power vectors in the C432 circuit from the ISCAS'85 benchmarks. As in the previous section, this experiment is based on 65nm CMOS technology. The figure plots the number of independent power vectors versus the number of random measurements for two cases: measurements under a single supply voltage and measurements under three supply voltages. For the same total number of measurements, measuring under three supply voltages yields more independent power vectors.



Figure 3.4: Process variation and its sparse wavelet transform for a typical circuit in power framework.

### 3.3 Fast tomography by compressive sensing

As discussed in Section 2.2.2, sparse vectors can be acquired using very few measurements. In this section, we first introduce fast tomography for chips whose gates are located on regular grids. Then, we extend this approach to gates located on irregular grids.

#### 3.3.1 Sparse representation

The spatial correlation of the variations introduces redundancy in the variation values, which suggests that the variations can be sparsely represented in an appropriate basis. In this section, we use a wavelet basis to sparsely represent the process variations. Specifically, we assume $\mathbf{d} = W^{-1}\mathbf{s}$, where W is a wavelet transform matrix and **s** is a sparse vector. Wavelet bases are very efficient at sparsely modeling spatial correlation, as shown in Figure 3.4. The left side of the figure shows the variations of a chip in the spatial domain, and the right side shows the same variations in the wavelet domain. In the wavelet domain, most of the significant coefficients are concentrated in the upper-left corner of the transform, and most of the remaining coefficients are close to zero.

Figure 3.5: Sorted wavelet coefficients for different basis functions in the power framework. The db9 basis produces the sparsest representation.

Figure 3.5 shows the decay rate of the sorted wavelet coefficients of the variations for several wavelet families on typical $32 \times 32$ regular-grid circuits. The figure suggests that, among the bases compared, the Daubechies 9 (db9) wavelet basis is the best at sparsifying the process variations. In the remainder of the thesis, we use the db9 wavelet to model process variation sparsity in the power framework.
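To illustrate why a wavelet basis sparsifies spatially correlated variations, the NumPy sketch below compares how much energy the largest 10% of coefficients hold in the spatial domain and in the wavelet domain. As a simplifying assumption, it substitutes the orthonormal Haar wavelet for db9 (db9 would normally come from a wavelet library), so the example depends only on NumPy; the grid size, the Gaussian correlation parameter $a = 0.15$, and the 10% cutoff are illustrative.

```python
import numpy as np

def haar_1d(x):
    """One level of the orthonormal Haar transform along the last axis (even length)."""
    even, odd = x[..., 0::2], x[..., 1::2]
    return np.concatenate([(even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)], axis=-1)

def haar_2d(img, levels):
    """Multilevel 2-D orthonormal Haar transform of a 2^k x 2^k array."""
    out = np.array(img, dtype=float)
    n = out.shape[0]
    for _ in range(levels):
        block = haar_1d(out[:n, :n])        # transform the rows of the current approximation
        out[:n, :n] = haar_1d(block.T).T    # then its columns
        n //= 2
    return out

def top_energy_fraction(v, frac=0.1):
    """Fraction of the total energy held by the largest `frac` of the coefficients."""
    e = np.sort(np.ravel(v) ** 2)[::-1]
    return e[: int(frac * e.size)].sum() / e.sum()

# Spatially correlated field on a 32 x 32 grid (Gaussian correlation, hypothetical a = 0.15).
rng = np.random.default_rng(4)
n = 32
xs, ys = np.meshgrid(np.arange(n), np.arange(n))
xy = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
dist2 = ((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1)
w, V = np.linalg.eigh(np.exp(-0.15**2 * dist2))
field = (V @ (np.sqrt(np.clip(w, 0.0, None)) * rng.standard_normal(n * n))).reshape(n, n)

coeffs = haar_2d(field, levels=5)
print(f"energy in the largest 10% of values: spatial {top_energy_fraction(field):.3f}, "
      f"Haar {top_energy_fraction(coeffs):.3f}")
```

Because the transform is orthonormal, total energy is preserved; only its distribution over coefficients changes, which is what the decay curves of Figure 3.5 measure.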

#### 3.3.2 Regular grid tomography

First, we assume that the logic gates are located on a regular $T \times R$ grid on the chip. The matrix of process variations on the regular grid is denoted by $H = \{h_{s,t}\}_{s=1\ldots T,\,t=1\ldots R}$, where $h_{s,t}$ is the variation of the gate located at the $(s, t)$-th grid point. We stack all the elements of H into a long column vector **d**. Let W be the transformation matrix of a wavelet in which the variation vector **d** is sparse, and let

$$\mathbf{s} = W\mathbf{d};\tag{3.6}$$

then,  $\mathbf{s}$  is a sparse vector.

Using the wavelet basis to model the spatial correlation of the process variation, Equation 3.4 becomes

$$\mathbf{p} = A\mathbf{d} + e = AW^{-1}\mathbf{s} + e. \tag{3.7}$$

The sparse vector **s** can be recovered using the $\ell_1$ optimization of Equation 2.4:

$$\min \|\mathbf{s}\|_1 + \lambda \|AW^{-1}\mathbf{s} - \mathbf{p}\|_2^2. \tag{3.8}$$

The process variation vector **d** is then recovered as $\mathbf{d} = W^{-1}\mathbf{s}$.
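The regular-grid tomography of Equations 3.6 to 3.8 can be sketched end to end in NumPy under several simplifying assumptions: an 8 × 8 grid; an orthonormal Haar transform in place of db9, so that $W^{-1} = W^T$ and no wavelet library is needed; a Gaussian random stand-in for the measurement matrix A rather than real nominal gate powers; a synthetic variation vector whose wavelet coefficients are nonzero only at a few coarse scales; and ISTA applied to $\frac{1}{2}\|AW^{-1}\mathbf{s} - \mathbf{p}\|_2^2 + \tau\|\mathbf{s}\|_1$, which is equivalent to Equation 3.8 with $\tau = 1/(2\lambda)$. With half as many measurements as gates, the $\ell_1$ estimate recovers **d** closely, while the minimum-norm $\ell_2$ estimate does not.

```python
import numpy as np

def haar_1d(x):
    """One level of the orthonormal Haar transform along the last axis (even length)."""
    even, odd = x[..., 0::2], x[..., 1::2]
    return np.concatenate([(even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)], axis=-1)

def haar_2d(img, levels):
    """Multilevel 2-D orthonormal Haar transform of a 2^k x 2^k array."""
    out = np.array(img, dtype=float)
    n = out.shape[0]
    for _ in range(levels):
        block = haar_1d(out[:n, :n])       # rows
        out[:n, :n] = haar_1d(block.T).T   # columns
        n //= 2
    return out

def haar_matrix(n, levels):
    """Explicit matrix W with W @ d == haar_2d(d.reshape(n, n)).ravel()."""
    return np.column_stack([haar_2d(e.reshape(n, n), levels).ravel() for e in np.eye(n * n)])

def ista(B, y, tau, n_iter=3000):
    """Minimize 0.5*||y - B s||_2^2 + tau*||s||_1 by iterative soft thresholding."""
    step = 1.0 / np.linalg.norm(B, 2) ** 2
    s = np.zeros(B.shape[1])
    for _ in range(n_iter):
        z = s - step * (B.T @ (B @ s - y))
        s = np.sign(z) * np.maximum(np.abs(z) - step * tau, 0.0)
    return s

rng = np.random.default_rng(5)
n, M = 8, 32                                   # 8 x 8 grid (N = 64 gates), M < N measurements
W = haar_matrix(n, levels=3)                   # orthonormal, so W^{-1} = W.T

s_true = np.zeros(n * n)                       # coarse-scale (upper-left) coefficients only
for (r, c), val in {(0, 0): 8.0, (0, 1): 0.5, (1, 0): -0.4, (2, 3): 0.3}.items():
    s_true[r * n + c] = val
d_true = W.T @ s_true                          # scaling factors on the grid, mean 1.0

A = rng.standard_normal((M, n * n)) / np.sqrt(M)   # hypothetical Gaussian stand-in for A
p = A @ d_true + 1e-3 * rng.standard_normal(M)     # Equation 3.7

s_hat = ista(A @ W.T, p, tau=0.01)                  # Equation 3.8 in LASSO form
d_l1 = W.T @ s_hat                                  # d = W^{-1} s
d_l2 = np.linalg.lstsq(A, p, rcond=None)[0]         # plain minimum-norm l2 estimate

err_l1 = np.linalg.norm(d_l1 - d_true) / np.linalg.norm(d_true)
err_l2 = np.linalg.norm(d_l2 - d_true) / np.linalg.norm(d_true)
print(f"relative error: l1 = {err_l1:.4f}, l2 = {err_l2:.4f}")
```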



Figure 3.6: Gates are not placed on regular grids.

#### 3.3.3 Irregular grid tomography

In practice, gates are not placed on a regular layout grid; Figure 3.6 shows an example of an IC in which gates are placed on an irregular grid. To address irregular placement, we cover the IC with a fine regular grid. Then, using Procedure 1, each gate is assigned to a point on the regular grid. In the first step of Procedure 1, all the regular grid points are labeled *unmarked*, meaning that none of them is assigned to a gate. In the second step, for every gate, we find the closest regular grid point that is still *unmarked* and assign the gate to it. Finally, we mark the selected grid point so that no point is assigned to more than one gate.

Next, we assign auxiliary variables to the points of the fine grid that are not assigned to any gate. We also modify the measurement matrix A to be consistent with the fine regular grid, i.e., for each auxiliary variable, we add a zero column to A. Since the coefficients of the auxiliary variables in the measurement matrix are zero, they do not affect the measurement term of the optimization.

#### **PROCEDURE 1**: Mapping from irregular gates to fine regular grids

1. Set all the regular grid points *unmarked*.
2. For all gates $g_i$:
    - a. $p$ = the closest *unmarked* grid point to gate $g_i$
    - b. Assign gate $g_i$ to $p$.
    - c. Mark grid point $p$.
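Procedure 1 and the zero-column padding of A described above can be sketched in a few lines of NumPy. The function names, the random gate placement, and the stand-in matrix A are hypothetical. Because the procedure is greedy, the result depends on the order in which gates are visited; the sketch visits them in index order.

```python
import numpy as np

def map_gates_to_grid(gate_xy, grid_xy):
    """Procedure 1: assign each gate to the closest fine-grid point that is still unmarked.

    gate_xy : (N, 2) gate centers; grid_xy : (G, 2) fine regular grid points with G >= N.
    Returns an (N,) array of distinct grid-point indices.
    """
    marked = np.zeros(len(grid_xy), dtype=bool)        # step 1: all points unmarked
    assignment = np.empty(len(gate_xy), dtype=int)
    for i, g in enumerate(gate_xy):                     # step 2: for all gates g_i
        dist = np.linalg.norm(grid_xy - g, axis=1)
        dist[marked] = np.inf                           # marked points are not eligible
        k = int(np.argmin(dist))                        # (a) closest unmarked point
        assignment[i] = k                               # (b) assign g_i to it
        marked[k] = True                                # (c) mark it
    return assignment

def pad_measurement_matrix(A, assignment, n_grid):
    """Move column i of A to grid index assignment[i]; auxiliary points get zero columns."""
    A_grid = np.zeros((A.shape[0], n_grid))
    A_grid[:, assignment] = A
    return A_grid

rng = np.random.default_rng(6)
gate_xy = rng.uniform(0.0, 8.0, size=(20, 2))          # irregular placement of 20 gates
gx, gy = np.meshgrid(np.arange(8) + 0.5, np.arange(8) + 0.5)
grid_xy = np.column_stack([gx.ravel(), gy.ravel()])    # 8 x 8 fine regular grid
assignment = map_gates_to_grid(gate_xy, grid_xy)
A = rng.uniform(0.5, 20.0, size=(10, 20))              # stand-in measurement matrix
A_grid = pad_measurement_matrix(A, assignment, len(grid_xy))
print(assignment, np.count_nonzero(~A_grid.any(axis=0)), "zero columns")
```

For any vector on the fine grid, `A_grid @ l_grid` equals `A @ l_grid[assignment]`, which is the sense in which the auxiliary variables do not affect the measurements.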

As an alternative way to handle irregular grids, we could use the irregular wavelet transform introduced by Wagner et al. [75], which is based on the regular wavelet transform but adapted to irregular point arrangements. Figure 3.7 shows the sorted wavelet coefficients of both the irregular wavelet transform and our fine-grid wavelet transform for the C880 circuit. The wavelet coefficients of the proposed fine-grid method decay much faster than those of the irregular transform. The main reason is that gate placement is not completely irregular: standard gate sizes are integer multiples of a specific unit, and placement tools introduce irregularity in only one dimension.


### 3.4 Tomography using spatial constraints (TUSC)

In this section, we use the spatial correlation directly to reduce the estimation error of the power variations. In Section 3.2, we used only the power (leakage) measurements in Equation 3.3 to estimate the variations. The sparse representation in Section 3.3 exploits the spatial correlation only implicitly. Here, we reformulate the variation estimation problem so that the spatial correlation appears explicitly in the optimization problem.

#### 3.4.1 Adding spatial constraints

Adding spatial constraints directly to the optimization problem improves the estimation performance. The spatial correlation implies that nearby gates should have similar scaling factors. As the distance between two gates increases, the correlation between their scaling factors decreases, so distant gates may have very different scaling factors. We should therefore penalize solutions in which nearby gates have dissimilar scaling factors.

Consider the optimization problem in Equation 3.5. We add penalty terms to it that favor spatially correlated solutions. Assume $g_u$ and $g_v$ are two logic gates located at $(x_u, y_u)$ and $(x_v, y_v)$, respectively. As in Section 3.2, their scaling factors are denoted by $\phi_u$ and $\phi_v$. We use the following optimization problem to improve the variation estimation:

$$\min \|A\mathbf{d} - \mathbf{p}\|_2^2 + \sum_{(g_u, g_v) \in \mathcal{E}} \gamma(d_{u,v}) (\phi_u - \phi_v)^2, \tag{3.9}$$

where

$$d_{u,v} = \sqrt{(x_u - x_v)^2 + (y_u - y_v)^2},$$

$$\mathcal{E} = \{(g_u, g_v) \mid g_u \text{ and } g_v \text{ are two gates in the circuit}\}, \tag{3.10}$$

and $\gamma(\cdot)$ is a monotonically decreasing function. Thus, when the distance $d_{u,v}$ between two gates is small, $\gamma(d_{u,v})$ is large, which forces $(\phi_u - \phi_v)^2$ to be small. Conversely, when the distance $d_{u,v}$ is large, $\gamma(d_{u,v})$ is small, and $(\phi_u - \phi_v)^2$ does not affect the optimization problem significantly. Hence, the solution of the optimization problem in Equation 3.9 exhibits spatial correlation.

To simplify the penalty, one can eliminate gate pairs that are far from each other. For example, we can replace $\mathcal{E}$ in Equation 3.9 by

$$\mathcal{E}_r = \{(g_u, g_v) \mid g_u \text{ and } g_v \text{ are two gates in the circuit}, \ d_{u,v} < r\}. \tag{3.11}$$
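Because the penalty in Equation 3.9 is quadratic, the TUSC estimate has a closed form. Counting each unordered gate pair once (an assumption about how $\mathcal{E}$ is enumerated), the penalty equals $\mathbf{d}^T L \mathbf{d}$, where L is the weighted graph Laplacian with off-diagonal entries $-\gamma(d_{u,v})$. Setting the gradient $2A^T(A\mathbf{d} - \mathbf{p}) + 2L\mathbf{d}$ to zero gives $(A^TA + L)\mathbf{d} = A^T\mathbf{p}$, which has a unique solution even when rank(A) < N, provided the pair graph is connected and $A\mathbf{1} \neq 0$. The NumPy sketch below solves this normal equation with a hypothetical $\gamma(x) = e^{-x}$, radius $r = 1.5$, a synthetic smooth variation, and a random stand-in for A with fewer measurements than gates.

```python
import numpy as np

def tusc_estimate(A, p, xy, gamma, r=np.inf):
    """Minimize ||A d - p||^2 + sum over pairs (u < v, d_uv < r) of gamma(d_uv)(d_u - d_v)^2."""
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    in_E = (dist < r) & ~np.eye(len(xy), dtype=bool)     # pair set E_r of Equation 3.11
    G = np.where(in_E, gamma(dist), 0.0)                  # symmetric pair weights
    lap = np.diag(G.sum(axis=1)) - G                      # d^T lap d equals the penalty
    return np.linalg.solve(A.T @ A + lap, A.T @ p)        # normal equations of Equation 3.9

rng = np.random.default_rng(7)
n = 6
xs, ys = np.meshgrid(np.arange(n), np.arange(n))
xy = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)        # 36 gates on a grid
d_true = 1.0 + 0.05 * np.sin(xy[:, 0] / 3.0) + 0.05 * np.cos(xy[:, 1] / 3.0)
A = rng.uniform(0.5, 20.0, size=(20, n * n))   # fewer measurements than gates (hypothetical)
p = A @ d_true + 0.01 * rng.standard_normal(20)

d_tusc = tusc_estimate(A, p, xy, gamma=lambda dist: np.exp(-dist), r=1.5)
d_l2 = np.linalg.lstsq(A, p, rcond=None)[0]    # minimum-norm l2 estimate, for comparison
err_tusc = np.linalg.norm(d_tusc - d_true)
err_l2 = np.linalg.norm(d_l2 - d_true)
print(f"error: TUSC = {err_tusc:.4f}, l2 = {err_l2:.4f}")
```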


## Chapter 4

# Delay Tomography

In this chapter, we extend variation estimation to the delay framework. As in the power tomography of Chapter 3, we use only the primary inputs and outputs of the IC to characterize the delay variations. The approach is based on our paper at the ICCAD 2008 conference [63].

### 4.1 Preliminaries

#### 4.1.1 Delay variation model

Transition delay is usually modeled as a linear function of transistor feature-size variations [38, 49, 58]. For example, consider a NAND2 gate where one input is 1 and the other input transits from 0 to 1 at time t = 0. Because of the propagation delay, the output transits from 1 to 0 at time $t = d_r$. When there are variations in transistor feature size, the rising propagation delay $d_r$ varies among the NAND2 gates in the IC, i.e. [49],

$$d_r(\psi_u^{\text{total}}) = d_r^{\text{nominal}} + \xi\, \psi_u^{\text{total}}, \tag{4.1}$$

where $\xi$ is a constant and $d_r^{\text{nominal}}$ is the nominal rising delay of the gate. Note that even if the propagation delay is modeled as a quadratic (or higher-order) function of the variations [29], the same approach applies by introducing new variables for the higher-order terms.

#### 4.1.2 Sensitizable paths

A path in an IC is a sequence of logic gates from an input of the IC to one of its output pins. To measure the propagation delay of a path, one must find an input vector for the IC that guarantees the propagation of a transition along the path. If such an input vector exists, the path is called *sensitizable*; otherwise, it is called *unsensitizable*.

#### 4.1.3 Global flow of the delay tomography

Figure 4.1 shows the global flow of the method. In the first step, the circuit is fed with a number of input vector pairs derived from the set of sensitizable paths; the inputs are found using the path selection procedure introduced in Section 4.5. In Step 2, the propagation delay of every sensitizable path is measured. Based on the measured propagation delays, we construct a system of linear equations (SLE) whose unknowns are the gate variations. Then, we estimate the variations by two methods. The first method is based on traditional $\ell_2$-minimization (Step 4a). The second method exploits the sparsity of the variations in the wavelet domain and uses compressive sensing ($\ell_1$-regularization) to estimate the variations more efficiently (Step 4b).

Figure 4.1: Global flow of the delay tomography.

Figure 4.2: A sensitizable path from an input to the output. Inputs to the circuit are set such that a rising (falling) transition in input a can propagate to the output n.


### 4.2 Delay estimation by $\ell_2$-norm minimization


First, the signal propagation delays of a number of sensitizable paths are measured. Linear equations are then constructed with the scaling factors of the gate delays (defined in Section 4.1.1) as the unknown parameters. Finally, solving these equations, we estimate the scaling factors and, therefore, the gate variations. In Section 4.3, we exploit the spatial correlation of the variations to improve the scaling factor estimates.

An example of path delay analysis is shown in Figure 4.2. Lines a, b, c, and d are the circuit's primary inputs, and line n is the circuit's primary output. We want to sensitize the highlighted path, $P_1$: (a-$g_1$-z-e-$g_3$-f-$g_4$-s-$g_6$-k-$g_7$-n); that is, we need an input vector that guarantees that a transition in input a propagates through the path. Assume a rising transition in a (input a transits from 0 to 1). To allow propagation through gate $g_1$, we need to set b to 0. Then, there would be a falling $(1 \rightarrow 0)$ and a rising $(0 \rightarrow 1)$ transition in lines e and f, respectively. If g is equal to 1 and m is equal to 0, then the rising transition propagates through lines s, k, and n. To guarantee that g is 1 and m is 0, we only need to set the input c = 0.

These input assignments allow the transition in input a to propagate through the path $P_1$: a-$g_1$-z-e-$g_3$-f-$g_4$-s-$g_6$-k-$g_7$-n. Using the delay bounding method introduced in [64], one can measure the total delay of the path, i.e., the time difference between the transitions in line a and line n. We denote the total delay of path $P_1$ for the rising transition by $d_r(P_1)$.

The total path delay is the sum of the delays of its elements. For example, the delay of path $P_1$ can be written as the sum of the delays of line a, gate $g_1$, line z, line e, gate $g_3$, and so on, i.e.,


$$d_r(P_1) = d(a) + d_r(g_1) + d(z) + d(e) + d_f(g_3) + d(f) + d_r(g_4) + d(s) + d_f(g_6) + d(k) + d_r(g_7) + d(n), \tag{4.2}$$

where d(x) is the delay of the line x, and  $d_r(g_i)$  and  $d_f(g_i)$  are the rising and falling delays of gate  $g_i$ , respectively.

Here, for clarity of presentation, we assume that interconnect delays (line delays) are zero; the proposed method can easily be extended to non-zero interconnect delays. Note that variations in the interconnects may have a separate statistical representation. In such scenarios, one may consider compressive sensing methods that address the sum of two distinct distributions in one framework [24]. Assuming zero interconnect delays, Equation 4.2 reduces to

$$d_r(P_1) = d_r(g_1) + d_f(g_3) + d_r(g_4) + d_f(g_6) + d_r(g_7). \tag{4.3}$$

| Gate     | Rising (ps/$\mu$m) | Falling (ps/$\mu$m) |
|----------|--------------------|---------------------|
| Inverter | 86.9               | 40.77               |
| NAND2    | 176.9              | 507.7               |
| NOR2     | 95.4               | 1106.2              |

Table 4.1: Transition propagation rates for different gates. The rising and falling transitions do not have the same delay rates.

In Section 4.1, we illustrated that, because of process variations, the delays of the gates deviate from their nominal values, i.e. [49],

$$d_r(g_i) = d_r^{\text{nominal}}(g_i) + \xi_{r,g_i} l_{g_i}, \qquad (4.4)$$

where $d_r^{\text{nominal}}(g_i)$ is the nominal delay of gate $g_i$ for a rising transition, $l_{g_i}$ is the variation of gate $g_i$, and $\xi_{r,g_i}$ is a constant coefficient. Table 4.1 lists these coefficients for the inverter, NAND2, and NOR2 gates. Similarly, for the falling transition,

$$d_f(g_i) = d_f^{\text{nominal}}(g_i) + \xi_{f,g_i} l_{g_i}. \tag{4.5}$$

Thus, Equation 4.3 becomes

$$\begin{aligned} d_r(P_1) = {} & d_r^{\text{nominal}}(g_1) + \xi_{r,g_1} l_{g_1} \\ & + d_f^{\text{nominal}}(g_3) + \xi_{f,g_3} l_{g_3} \\ & + d_r^{\text{nominal}}(g_4) + \xi_{r,g_4} l_{g_4} \\ & + d_f^{\text{nominal}}(g_6) + \xi_{f,g_6} l_{g_6} \\ & + d_r^{\text{nominal}}(g_7) + \xi_{r,g_7} l_{g_7}, \end{aligned} \tag{4.6}$$

or

$$\xi_{r,g_1} l_{g_1} + \xi_{f,g_3} l_{g_3} + \xi_{r,g_4} l_{g_4} + \xi_{f,g_6} l_{g_6} + \xi_{r,g_7} l_{g_7} = b_{P_1},$$

where

$$b_{P_1} = d_r(P_1) - d_r^{\text{nominal}}(g_1) - d_f^{\text{nominal}}(g_3) - d_r^{\text{nominal}}(g_4) - d_f^{\text{nominal}}(g_6) - d_r^{\text{nominal}}(g_7).$$

Since $d_r(P_1)$ is measured and the nominal delays are known, $b_{P_1}$ is a known constant. Thus, each sensitizable path in the circuit leads to a linear relation among the variations $l_{g_i}$. The falling and rising coefficients ($\xi_{f,g_i}$ and $\xi_{r,g_i}$) are known, and our goal is to estimate the variations $l_{g_i}$.
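The bookkeeping behind Equations 4.4 to 4.6 can be sketched as follows (Python with NumPy). The coefficients $\xi$ come from Table 4.1, but the gate types of $g_1, \ldots, g_7$, the nominal delays, and the variations are hypothetical, and interconnect delays are taken as zero as in Equation 4.3. The helper `path_equation`, a name introduced here for illustration, returns one row of the linear system together with its right-hand side $b_{P_1}$.

```python
import numpy as np

# Delay sensitivities xi in ps per micrometer of variation (Table 4.1).
XI = {
    "INV":   {"r": 86.9,  "f": 40.77},
    "NAND2": {"r": 176.9, "f": 507.7},
    "NOR2":  {"r": 95.4,  "f": 1106.2},
}

def path_equation(path, gate_types, nominal, measured_delay, n_gates):
    """Return (row, b) with row @ l = b for one sensitized path (zero interconnect delay).

    path           : list of (gate index, 'r' or 'f') transitions along the path
    gate_types     : gate type of every gate
    nominal        : nominal[(i, t)] = nominal delay of gate i for transition t, in ps
    measured_delay : measured path delay, e.g. d_r(P_1), in ps
    """
    row = np.zeros(n_gates)
    b = measured_delay
    for i, t in path:
        row[i] += XI[gate_types[i]][t]     # coefficient xi_{t, g_i}
        b -= nominal[(i, t)]               # subtract d_t^nominal(g_i)
    return row, b

# Hypothetical data for path P_1 of Equation 4.3 (0-based gate indices: g1 -> 0, ...).
gate_types = ["NAND2", "INV", "NOR2", "NAND2", "INV", "NOR2", "NAND2"]
P1 = [(0, "r"), (2, "f"), (3, "r"), (5, "f"), (6, "r")]
nominal = {(i, t): 20.0 for i in range(7) for t in "rf"}
l_true = np.random.default_rng(8).normal(0.0, 0.003, size=7)      # variations in micrometers
measured = sum(nominal[(i, t)] + XI[gate_types[i]][t] * l_true[i] for i, t in P1)

row, b = path_equation(P1, gate_types, nominal, measured, n_gates=7)
print(row, b, row @ l_true)    # the last two values agree up to rounding
```

Stacking such rows for all sensitized paths and both transition directions yields the matrix A and vector $\mathbf{b}$ defined below.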

Assume that  $P_1, P_2 \dots P_M$  are M sensitizable paths in a general combinational circuit C with N gates. For each path  $P_j$ , if it is stimulated by a rising transition,

$$\sum_{i=1}^{N} \alpha_{P_j}(i)\, \xi_{\lambda^r(P_j, g_i), g_i}\, l_{g_i} = b_j^r, \tag{4.7}$$

where

$$\alpha_{P_j}(i) = \begin{cases} 1 & \text{if } g_i \text{ belongs to the path } P_j, \\ 0 & \text{otherwise}, \end{cases}$$

$$\lambda^r(P_j, g_i) = \begin{cases} f & \text{if } g_i \text{ has a falling transition when path } P_j \text{ is stimulated by a rising transition}, \\ r & \text{otherwise}. \end{cases}$$


Similarly for a falling transition,

$$\sum_{i=1}^{N} \alpha_{P_j}(i)\, \xi_{\lambda^f(P_j, g_i), g_i}\, l_{g_i} = b_j^f, \tag{4.8}$$

where

$$\lambda^f(P_j, g_i) = \begin{cases} f & \text{if } g_i \text{ has a falling transition when path } P_j \text{ is stimulated by a falling transition}, \\ r & \text{otherwise}. \end{cases}$$

To write Equations 4.7 and 4.8 in compact form, we define the $2M \times N$ matrix A, the measurement vector $\mathbf{b}$, and the variation vector $\mathbf{l}$ as follows:

$$A = \begin{pmatrix} \alpha_{P_1}(1)\xi_{\lambda^r(P_1,g_1),g_1} & \dots & \alpha_{P_1}(N)\xi_{\lambda^r(P_1,g_N),g_N} \\ \alpha_{P_2}(1)\xi_{\lambda^r(P_2,g_1),g_1} & \dots & \alpha_{P_2}(N)\xi_{\lambda^r(P_2,g_N),g_N} \\ \vdots & & \vdots \\ \alpha_{P_M}(1)\xi_{\lambda^r(P_M,g_1),g_1} & \dots & \alpha_{P_M}(N)\xi_{\lambda^r(P_M,g_N),g_N} \\ \alpha_{P_1}(1)\xi_{\lambda^f(P_1,g_1),g_1} & \dots & \alpha_{P_1}(N)\xi_{\lambda^f(P_1,g_N),g_N} \\ \vdots & & \vdots \\ \alpha_{P_M}(1)\xi_{\lambda^f(P_M,g_1),g_1} & \dots & \alpha_{P_M}(N)\xi_{\lambda^f(P_M,g_N),g_N} \end{pmatrix},$$

and

$$\mathbf{b} = (b_1^r, b_2^r, \dots b_M^r, b_1^f, b_2^f, \dots b_M^f)^T,$$

and

$$\mathbf{l} = (l_1, l_2 \dots l_N)^T.$$

This notation leads to the following minimization for finding the variation vector $\mathbf{l}$:

$$\min ||A\mathbf{l} - \mathbf{b}||_2^2. \tag{4.9}$$

We call this method the $\ell_2$-minimization method.

Note that it may not be possible to find the variations of all gates with this method. For example, in Figure 4.2, to find another sensitizable path that includes $g_4$, we would have to fix f = 1 (the non-controlling value), which causes e = 0 and g = 1. Then the transition cannot propagate along line g, so path $P_1$ is the only path that includes gates $g_3$, $g_4$, and $g_6$. As a result, there are at most two equations (rising and falling) that include the variations of gates $g_3$, $g_4$, and $g_6$, and it is impossible to determine the variations of these three gates separately. We refer to such gates as ambiguous gates.
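Ambiguous gates can be detected numerically: a gate's variation is uniquely determined exactly when no vector in the null space of A has a nonzero component at that gate, since adding such a vector to $\mathbf{l}$ leaves $A\mathbf{l}$ unchanged. The NumPy sketch below finds the null space with an SVD. The path set is hypothetical, except that, as in the discussion of Figure 4.2, gates $g_3$, $g_4$, and $g_6$ (0-based indices 2, 3, and 5) appear only in path $P_1$; the matrix entries are random stand-ins for the $\xi$ coefficients.

```python
import numpy as np

def ambiguous_gates(A, tol=1e-9):
    """Indices of gates whose variation is not determined by A l = b."""
    _, sv, Vt = np.linalg.svd(A)                  # full_matrices=True: Vt is N x N
    rank = int(np.sum(sv > tol * sv[0]))
    null_basis = Vt[rank:]                        # rows span the null space of A
    weight = np.linalg.norm(null_basis, axis=0)   # participation of each gate in the null space
    return np.flatnonzero(weight > 1e-6)

# Hypothetical 7-gate circuit; the first path is P_1 = (g1, g3, g4, g6, g7).
paths = [[0, 2, 3, 5, 6], [0, 1, 6], [1, 4, 6], [0, 4], [1, 4]]
rng = np.random.default_rng(9)
rows = []
for path in paths:
    for _ in ("rising", "falling"):               # one equation per transition direction
        row = np.zeros(7)
        row[path] = rng.uniform(40.0, 1100.0, size=len(path))   # stand-ins for xi coefficients
        rows.append(row)
A = np.array(rows)
print("rank(A) =", np.linalg.matrix_rank(A), " ambiguous gates:", ambiguous_gates(A))
```

In this example the two equations from $P_1$ constrain three otherwise unmeasured gates, so a one-dimensional null space remains and exactly those three gates are reported.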

### 4.3 Delay estimation using compressive sensing

Section 4.2 presents a system of linear equations to estimate the variations of the gates. However, the optimization problem in Equation 4.9 does not consider the spatial correlation of the delay variations. Incorporating the spatial correlation in the model significantly improves the results and makes it possible to resolve the ambiguities described in the previous section. In this section, we model the spatial correlation of the timing variations as sparsity in the wavelet domain, which allows us to use compressive sensing theory to estimate the variations more accurately.

Figure 4.3: Left: Spatial correlation in delay variations in a typical IC. Right: Wavelet transform of the variations. Because of the spatial correlation, the variations are sparse in the wavelet domain.

#### 4.3.1 Sparse representation of variations

As explained in Section 3.3.1, because of the spatial correlation, a wavelet basis can sparsely represent the variations; as in power tomography, we use a wavelet basis for this purpose. Note that variations in the power framework follow a log-normal distribution, whereas variations in the delay are approximately normally distributed. Thus, power variations and delay variations might be sparse in different wavelet bases.

Figure 4.4: Sorted wavelet coefficients for different bases. The bior3.5 basis results in the sparsest representation.

Figure 4.3 demonstrates the effectiveness of the wavelet transform in representing spatial variations. The left side of the figure is an image plot of the variations in a typical IC, generated using the Gaussian model in [47]; the spatial correlation is evident. The right side shows the wavelet transform of the left side. Most of the transform coefficients are close to zero; only the top-left part contains a dense set of significant coefficients.

Figure 4.4 presents the decay rate of the wavelet coefficients for a number of different wavelet transforms. A transform appropriate for compressive sensing should have a fast decay rate: the faster the decay, the sparser the signal under the transform, and the fewer measurements are needed to acquire the variation vector. The figure demonstrates that the biorthogonal (3,5) wavelet basis best describes the spatial variations. We use this wavelet basis for delay tomography in the remainder of this thesis.

#### 4.3.2 Gates on the regular grids

When gates are located on a regular grid, the two-dimensional wavelet transform of the variations,  $\mathbf{s}$ , can be expressed as the product of the variation vector,  $\mathbf{l}$ , with the wavelet transform matrix W.

$$\mathbf{s} = W\mathbf{l}.\tag{4.10}$$
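For a grid of side n, one concrete way to build W in Equation 4.10 is as a Kronecker product of 1D transform matrices. This assumes the grid values are stacked into $\mathbf{l}$ row by row. The sketch below does this for an orthonormal Haar basis. That choice is an assumption: the thesis uses the bior3.5 basis, for which $W^{-1}$ is not $W^T$. With an orthonormal W, the matrix $AW^{-1}$ in Equation 4.12 is simply $AW^T$. The function names are hypothetical.

```python
import numpy as np


def haar_matrix(n):
    """Orthonormal multilevel 1D Haar matrix H (n a power of 2); H @ v transforms v."""
    if n == 1:
        return np.array([[1.0]])
    coarse = np.kron(haar_matrix(n // 2), [1.0, 1.0])   # recurse on pairwise sums
    detail = np.kron(np.eye(n // 2), [1.0, -1.0])       # finest-scale differences
    return np.vstack([coarse, detail]) / np.sqrt(2.0)


def wavelet_matrix_2d(n):
    """W such that W @ grid.ravel() equals the separable 2D transform H @ grid @ H.T."""
    h = haar_matrix(n)
    return np.kron(h, h)


n = 8
grid = np.random.default_rng(1).standard_normal((n, n))  # toy variation map
l = grid.ravel()              # variation vector l, row-major stacking of the grid
W = wavelet_matrix_2d(n)
s = W @ l                     # Equation 4.10
l_back = W.T @ s              # W is orthonormal here, so W^{-1} = W^T
```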

As discussed in Section 4.3.1, **s** is assumed sparse because of the spatial correlation in the variations. We enforce the sparsity prior by regularizing Equation 4.9 using the  $\ell_1$  norm of **s**, as described in Section 2.2.2:

$$\min \|\mathbf{s}\|_{1} + \lambda \|A\mathbf{l} - \mathbf{b}\|_{2}^{2} \tag{4.11}$$

or, equivalently,

$$\min \|\mathbf{s}\|_1 + \lambda \|AW^{-1}\mathbf{s} - \mathbf{b}\|_2^2, \tag{4.12}$$

where $\lambda$ is the regularization coefficient. The sparsity of the wavelet coefficients of the variations, $\mathbf{s}$, provides information beyond what the measurements alone supply. We call this approach the $\ell_1$-regularization method.
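Equation 4.12 can be solved with any $\ell_1$ solver; the thesis uses SPGL1 (Section 6.1). As a minimal, self-contained alternative, the sketch below applies the iterative shrinkage-thresholding algorithm (ISTA), which is a different solver from the one used in the thesis. To keep the example small, it assumes a random orthogonal W as a stand-in for the wavelet matrix and a random Gaussian A. The function name, iteration count, noise level, and $\lambda$ value are illustrative choices.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (entrywise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def l1_regularized_estimate(A, W, b, lam, n_iter=5000):
    """ISTA for min_s ||s||_1 + lam * ||A W^{-1} s - b||_2^2 (Equation 4.12).

    Returns (l_hat, s_hat), where l_hat = W^{-1} s_hat.
    """
    W_inv = np.linalg.inv(W)
    B = A @ W_inv
    step = 1.0 / (2.0 * lam * np.linalg.norm(B, 2) ** 2)  # 1 / Lipschitz constant of the smooth term
    s = np.zeros(B.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * lam * B.T @ (B @ s - b)
        s = soft_threshold(s - step * grad, step)       # the l1 term has weight 1
    return W_inv @ s, s


rng = np.random.default_rng(0)
n, m = 64, 32
W, _ = np.linalg.qr(rng.standard_normal((n, n)))        # stand-in orthogonal "wavelet" matrix
s_true = np.zeros(n)
s_true[rng.choice(n, size=4, replace=False)] = [1.0, -1.0, 0.8, -0.6]
l_true = W.T @ s_true                                   # l = W^{-1} s for orthogonal W
A = rng.standard_normal((m, n)) / np.sqrt(m)
b = A @ l_true + 0.001 * rng.standard_normal(m)

l_hat, _ = l1_regularized_estimate(A, W, b, lam=20.0)
l_min_norm = np.linalg.pinv(A) @ b                      # plain least-norm fit, for comparison
```

With fewer measurements than unknowns, the least-norm fit cannot recover the component of $\mathbf{l}$ in the null space of A. The sparsity prior, in contrast, lets the $\ell_1$ estimate recover it in this toy setting.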

#### 4.3.3 Gates on the irregular grids

As discussed in Section 3.3.3, area and logic-gate constraints mean that, in practice, gates are not located on a regular grid. An example of gate placement is shown in Figure 3.6. As in Section 3.3.3, we overcome this problem by using a regular grid dense enough that the center of every gate in the circuit is close to some grid point. We assign the variation of each gate $g_u$ to the grid point closest to the center of the gate; if several grid points are equally close, we select one of them at random. The remaining grid points are assigned to free variables that do not correspond to physical gates and do not affect the measurements.

The remainder of the measurement process is similar to Section 4.3.2. The points on the regular grid are mapped to a column vector $\mathbf{l}$, which is measured by a measurement matrix A as in Equation 4.11. Note that if the *i*-th element of $\mathbf{l}$ is a free variable not assigned to any gate, then the *i*-th column of A is zero. The vector $\mathbf{l}$ is still spatially correlated, and therefore sparse in the wavelet domain, and can be recovered through $\mathbf{s}$ in Equation 4.12. In the recovered $\mathbf{l}$, the free variables are discarded since they do not correspond to physical gates.
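The gate-to-grid assignment and the resulting zero columns of A can be sketched as follows. The gate coordinates, grid spacing, and toy measurement matrix are hypothetical. The sketch also assumes the grid is dense enough that no two gates snap to the same point, and it raises an error otherwise. Ties between equally close grid points are broken at random, as described above.

```python
import numpy as np


def snap_gates_to_grid(centers, grid_size, spacing, rng):
    """Assign each gate center (x, y) to its nearest point on a grid_size x grid_size grid.

    Grid point (i, j) sits at (j * spacing, i * spacing); its index in l is i * grid_size + j.
    """
    ys, xs = np.mgrid[0:grid_size, 0:grid_size]
    grid_pts = np.column_stack([xs.ravel(), ys.ravel()]) * spacing
    assignment = []
    for c in centers:
        d = np.linalg.norm(grid_pts - np.asarray(c, dtype=float), axis=1)
        nearest = np.flatnonzero(np.isclose(d, d.min()))  # all equally close points
        assignment.append(int(rng.choice(nearest)))       # random tie-break
    if len(set(assignment)) != len(assignment):
        raise ValueError("two gates share a grid point; use a denser grid")
    return np.array(assignment)


def expand_measurement_matrix(A_gates, assignment, grid_size):
    """Place gate columns at their grid indices; free grid points get all-zero columns."""
    A = np.zeros((A_gates.shape[0], grid_size * grid_size))
    A[:, assignment] = A_gates
    return A


rng = np.random.default_rng(0)
centers = [(0.4, 0.1), (2.6, 1.9), (1.0, 3.2)]   # hypothetical gate centers
assignment = snap_gates_to_grid(centers, grid_size=4, spacing=1.0, rng=rng)
A_gates = rng.standard_normal((5, 3))             # toy 5-measurement, 3-gate matrix
A = expand_measurement_matrix(A_gates, assignment, grid_size=4)
# After solving for l on the grid, the gate estimates are l_hat[assignment].
```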



Figure 4.5: The variation estimation error for various regularization factors  $\lambda$ .

## 4.4 Determining the regularization coefficient $\lambda$

Consider the $\ell_1$-regularization problem

$$\min \|x\|_1 + \lambda \|Ax - b\|_2^2. \tag{4.13}$$

When $\lambda$ is very small, the term $\lambda \|Ax - b\|_2^2$ is small compared to the $\ell_1$-norm term $\|x\|_1$ and has little effect on the objective function. The $\ell_1$-norm term then dominates the solution of the regularization problem, and the solution tends to be sparse. On the other hand, when $\lambda$ is very large, $\lambda \|Ax - b\|_2^2$ is large compared to $\|x\|_1$, and small changes in $\|Ax - b\|_2^2$ cause large changes in the objective function, so the solution closely fits the measurements. In general, $\lambda$ balances sparsity (the $\ell_1$-norm term) against fitting the measurements (the $\ell_2$-norm term).
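A one-dimensional instance makes this trade-off concrete. With a single unknown and A = 1 (a toy assumption), Equation 4.13 has a closed-form solution: soft thresholding of b at $1/(2\lambda)$. A small $\lambda$ drives the solution to exactly zero, while a large $\lambda$ makes it approach the measurement b.

```python
import math


def scalar_l1_solution(b, lam):
    """Minimizer of |x| + lam * (x - b)**2, i.e. Equation 4.13 with n = 1 and A = 1.

    Setting the subgradient to zero gives soft thresholding of b at 1 / (2 * lam).
    """
    return math.copysign(max(abs(b) - 1.0 / (2.0 * lam), 0.0), b)


for lam in (0.1, 1.0, 100.0):
    # small lam -> 0.0 (sparse); large lam -> close to the measurement b = 1.0
    print(lam, scalar_l1_solution(1.0, lam))
```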

Measurement noise and the sparsity of the vector x are the two major factors that determine $\lambda$. When the measurements are noise-free, i.e., $Ax_r = b$, the regularization coefficient $\lambda$ should in principle be set to infinity. As the measurement noise increases, we should relax the $\ell_2$-norm term or, equivalently, decrease $\lambda$. In addition, sparser vectors call for a smaller $\lambda$: when x is known to be strongly sparse, one should relax the $\ell_2$-norm term (decrease $\lambda$) to obtain a very sparse solution.

Figure 4.5 shows the estimation error for different regularization coefficients $\lambda$. As explained above, the estimation error is high for both very small and very large $\lambda$. There is an optimal regularization coefficient $\lambda_{opt}$ at which the variation estimation error is minimized, and solving Equation 4.13 with $\lambda = \lambda_{opt}$ yields the minimum variation estimation error. However, $\lambda_{opt}$ depends on the measurement matrix, the measurement noise, and the true variations $x_r$, which are unknown; thus, $\lambda_{opt}$ cannot be determined exactly.

Applying the first-order necessary condition to the regularization problem in Equation 4.13 determines the minimum useful value of $\lambda$. Let

$$J(x) = \|x\|_1 + \lambda \|Ax - b\|_2^2.$$

Because $\|x\|_1$ is not differentiable at $x_i = 0$, the first-order necessary condition for an optimal solution states that zero must belong to the subdifferential of $J$ with respect to each $x_i$, $i = 1 \dots n$. Thus,

$$\frac{\partial \|x\|_1}{\partial x_i} = -\lambda \frac{\partial \|Ax - b\|_2^2}{\partial x_i},$$

where the left-hand side is a subgradient of $|x_i|$, so that

$$\lambda \frac{\partial \|Ax - b\|_2^2}{\partial x_i} = \begin{cases} -1 & x_i > 0\\ [-1, 1] & x_i = 0\\ 1 & x_i < 0. \end{cases}$$

Hence,

$$\lambda \left| \frac{\partial \|Ax - b\|_2^2}{\partial x_i} \right| \leq 1, \quad i = 1 \dots n,$$

or, equivalently,

$$\left\|\frac{\partial}{\partial x} \|Ax - b\|_2^2\right\|_{\infty} = 2 \|A^T (Ax - b)\|_{\infty} \leq \frac{1}{\lambda}. \tag{4.14}$$

As mentioned before, for very small regularization coefficients $\lambda$, the optimal point is close to zero. Setting x = 0 in Equation 4.14 therefore determines a threshold value for $\lambda$: x = 0 is an optimal solution if and only if

$$\lambda \le \frac{1}{2\|A^T b\|_{\infty}}.$$

Thus, the largest value of $\lambda$ for which the solution is zero is

$$\lambda_0 = \frac{1}{2\|A^T b\|_\infty}.$$
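The threshold can be checked numerically. The sketch below computes $\lambda_0$ directly from Equation 4.14 at x = 0, keeping the factor 2 contributed by the gradient of $\|Ax - b\|_2^2$. It then runs a basic ISTA solver on a random toy problem slightly below and slightly above $\lambda_0$. ISTA is an illustrative choice here; the thesis does not prescribe a solver for this check.

```python
import numpy as np


def lambda_zero(A, b):
    """Largest lam for which x = 0 minimizes ||x||_1 + lam * ||A x - b||_2^2 (from Eq. 4.14)."""
    return 1.0 / (2.0 * np.max(np.abs(A.T @ b)))


def ista(A, b, lam, n_iter=500):
    """Plain ISTA for Equation 4.13, started from x = 0."""
    step = 1.0 / (2.0 * lam * np.linalg.norm(A, 2) ** 2)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        v = x - step * 2.0 * lam * A.T @ (A @ x - b)    # gradient step on the l2 term
        x = np.sign(v) * np.maximum(np.abs(v) - step, 0.0)  # soft threshold for the l1 term
    return x


rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
b = rng.standard_normal(20)
lam0 = lambda_zero(A, b)
print(np.count_nonzero(ista(A, b, 0.9 * lam0)))   # 0: x = 0 stays optimal below lam0
print(np.count_nonzero(ista(A, b, 1.5 * lam0)))   # non-zero entries appear above lam0
```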

Kim et al. [40] suggest choosing $\lambda$ relative to $\lambda_0$; they use $\lambda_1 = 10\lambda_0$. For the problem shown in Figure 4.5, $\lambda_1 = 10\lambda_0 = 5.56 \times 10^{-4}$, which is far from $\lambda_{opt}$ (shown in Figure 4.5).

Hale et al. [35] use the distribution of the measurement error to choose $\lambda$. Assuming independent, normally distributed measurement noise, they suggest

$$\lambda_2 = \frac{2}{\underline{\sigma}} \sqrt{\frac{N}{\chi^2_{1-\alpha,M}}},$$

where $\underline{\sigma}$ is the minimum eigenvalue of $AA^{T}$ and $\chi^2_{1-\alpha,M}$ is a chi-square critical value with M degrees of freedom. For $\alpha = 0.05$, $\lambda_{2} = 591.13$, which is also far from the optimal regularization factor $\lambda_{opt}$ (Figure 4.5).

To understand the behavior of the best $\lambda$, we study the optimal-point curves of the problem. For each $\lambda \in [\lambda_0, \infty)$, let $x_{\lambda}$ be the solution of the problem in Equation 4.13, and define

$$s(\lambda) = \|x_{\lambda}\|_{1}, \qquad t(\lambda) = \|Ax_{\lambda} - b\|_{2}. \tag{4.15}$$

The pairs $(s(\lambda), t(\lambda))$ define a curve in the *s*-*t* plane. Several of these curves, for different noise levels, are shown in Figure 4.6. The star on each curve marks the point obtained with the optimal regularization factor, $(s(\lambda_{opt}), t(\lambda_{opt}))$; we call these the optimal points. The figure suggests that the optimal points lie approximately on a horizontal line, that is, at approximately the same value of $\|x_{\lambda}\|_1$. Thus, we use the following optimization formulation to estimate the variation:

$$\min \|Ax - b\|_2 \quad \text{subject to} \quad \|x\|_1 \le c, \tag{4.16}$$

where c is a constant. We set $c = \theta E(\|x\|_1)$, where $\theta \in [1.5, 2]$.
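Equation 4.16 is a LASSO problem in its constrained form. The sketch below solves it with projected gradient descent and a standard sort-based Euclidean projection onto the $\ell_1$ ball. This is a swapped-in, illustrative solver, not the SPGL1 package used in the thesis (Section 6.1). For the toy data only, it assumes i.i.d. Gaussian variations with standard deviation $\sigma$, for which $E(\|x\|_1) = n\sigma\sqrt{2/\pi}$, and it uses $\theta = 1.5$.

```python
import numpy as np


def project_l1_ball(v, c):
    """Euclidean projection of v onto {x : ||x||_1 <= c} (sort-based method)."""
    if np.sum(np.abs(v)) <= c:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * k > css - c)[0][-1]
    theta = (css[rho] - c) / (rho + 1.0)           # shrinkage level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def l1_constrained_ls(A, b, c, n_iter=3000):
    """Projected gradient for min ||A x - b||_2 s.t. ||x||_1 <= c (Equation 4.16).

    Minimizing the squared residual has the same minimizers and a simpler gradient.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of 0.5 * ||Ax - b||^2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = project_l1_ball(x - step * A.T @ (A @ x - b), c)
    return x


rng = np.random.default_rng(0)
n, m, sigma = 40, 20, 0.05
x_true = sigma * rng.standard_normal(n)               # toy i.i.d. Gaussian variations
A = rng.standard_normal((m, n))
b = A @ x_true + 0.01 * rng.standard_normal(m)
expected_l1 = n * sigma * np.sqrt(2.0 / np.pi)        # E||x||_1 for i.i.d. N(0, sigma^2)
x_hat = l1_constrained_ls(A, b, c=1.5 * expected_l1)  # theta = 1.5
```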



Figure 4.6: Optimization curves for various measurement errors.

## 4.5 Path selection

The accuracy of variation estimation depends on the paths used to construct the optimization problem. First, the paths must be sensitizable; i.e., it must be possible to measure their delays by externally stimulating the primary inputs of the IC. Second, the paths should be linearly independent: ignoring measurement noise, dependent paths provide only redundant information about the variations.

#### 4.5.1 Sensitizable paths

As mentioned in Section 4.1, it might not be possible to measure the delay of an arbitrary path. Only the delays of sensitizable (testable) paths can be measured by externally stimulating the IC.



Figure 4.7: An example of a circuit with sensitizable and unsensitizable paths.

Figure 4.7 shows examples of a sensitizable path and an unsensitizable path. Consider the following path in the circuit: $P_2$: $a-g_2-g-g_4-k$. To propagate a transition along $P_2$, d should be 1 and h should be 0. Choosing c = 1 and b = 0 satisfies these constraints; thus, $P_2$ is a sensitizable (testable) path. However, path $P_3$: c-f-g<sub>3</sub>-h-g<sub>4</sub>-k is not sensitizable. A transition propagates along this path if and only if g = 0 and e = 0. To satisfy g = 0, we need a = 1 and d = 1, which contradicts e = 0. Thus, $P_3$ is unsensitizable.
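The reasoning above can be mechanized for tiny circuits. The sketch below uses a simplified, single-vector *static* sensitization criterion: a path is accepted if some input assignment sets every side input along it to a non-controlling value (1 for AND/NAND, 0 for OR/NOR). It brute-forces all input vectors, which is only feasible for a handful of inputs. It is not the two-vector ATPG check used in the thesis. The circuit is a hypothetical three-input example, not the circuit of Figure 4.7.

```python
from itertools import product

GATE_FUNCS = {
    "NOT": lambda v: 1 - v[0],
    "AND": lambda v: int(all(v)),
    "NAND": lambda v: 1 - int(all(v)),
    "OR": lambda v: int(any(v)),
    "NOR": lambda v: 1 - int(any(v)),
}
NON_CONTROLLING = {"AND": 1, "NAND": 1, "OR": 0, "NOR": 0}

# Hypothetical toy circuit: each output net maps to (gate type, input nets).
CIRCUIT = {
    "n_a": ("NOT", ["a"]),
    "n1": ("AND", ["a", "b"]),
    "n2": ("AND", ["n_a", "c"]),
    "out": ("AND", ["n1", "n2"]),
}
INPUTS = ["a", "b", "c"]


def evaluate(circuit, assignment):
    """Logic-simulate an acyclic circuit for one input assignment."""
    values = dict(assignment)
    pending = dict(circuit)
    while pending:
        ready = [net for net, (_, ins) in pending.items() if all(i in values for i in ins)]
        if not ready:
            raise ValueError("circuit has a cycle or an undriven net")
        for net in ready:
            kind, ins = pending.pop(net)
            values[net] = GATE_FUNCS[kind]([values[i] for i in ins])
    return values


def side_conditions(circuit, path):
    """(side input, required non-controlling value) pairs along the path."""
    conds = []
    for prev, net in zip(path, path[1:]):
        kind, ins = circuit[net]
        conds += [(s, NON_CONTROLLING[kind]) for s in ins if s != prev]
    return conds


def statically_sensitizable(circuit, inputs, path):
    """True if some input vector satisfies every side-input condition (exhaustive search)."""
    conds = side_conditions(circuit, path)
    for bits in product((0, 1), repeat=len(inputs)):
        values = evaluate(circuit, dict(zip(inputs, bits)))
        if all(values[s] == req for s, req in conds):
            return True
    return False


print(statically_sensitizable(CIRCUIT, INPUTS, ["a", "n_a", "n2", "out"]))  # True
print(statically_sensitizable(CIRCUIT, INPUTS, ["b", "n1", "out"]))         # False
```

The second path fails for the same reason as $P_3$ above. Propagating through n1 requires a = 1, while the side input n2 of the output gate must be 1, which requires a = 0.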

To ensure that a path is sensitizable, we must generate a pair of input vectors for the circuit such that a transition propagates along the path. Creating such input vectors can be complex and time-consuming. Thus, we determine whether a path is testable in two steps: a primary necessity check, followed by automatic test pattern generation (ATPG).

The primary necessity check is based on the partial path sensitization introduced by Murakami et al. [56]. Using the topology and functionality of the circuit, they identify bf-pairs in the circuit. Each bf-pair consists of a b-line (back line) and an f-line (forward line), chosen such that the necessary conditions for transition propagation through the b-line and through the f-line contradict each other. Thus, a testable path cannot contain any bf-pair: if a path contains at least one bf-pair, it is not testable; otherwise, the path is *potentially* testable.

To determine whether a potentially testable path is actually testable, any ATPG tool can be used to generate input vectors that test the path. In the simulations, we used TranGen [77] for test generation; it is a fast SAT-based ATPG algorithm.

#### 4.5.2 Basis path set

In path selection, it is also important to select independent paths. Consider the following four paths in the circuit shown in Figure 4.7:

- $P_4$: c-j-g<sub>1</sub>-e-g<sub>3</sub>-h-g<sub>4</sub>-k
- $P_5$: b-g<sub>1</sub>-e-g<sub>3</sub>-h-g<sub>4</sub>-k
- $P_6$: c-j-g<sub>1</sub>-d-g<sub>2</sub>-g-g<sub>4</sub>-k
- $P_7$: b-g<sub>1</sub>-d-g<sub>2</sub>-g-g<sub>4</sub>-k

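For this circuit, it is easy to verify that

$$d_r(P_4) + d_r(P_7) = d_r(P_5) + d_r(P_6),$$

since both sides traverse exactly the same segments: the c-j-g<sub>1</sub> and b-g<sub>1</sub> prefixes once each, and the g<sub>1</sub>-e-g<sub>3</sub>-h-g<sub>4</sub>-k and g<sub>1</sub>-d-g<sub>2</sub>-g-g<sub>4</sub>-k suffixes once each. Thus, these four paths are not independent: the delays of any three of them determine the delay of the fourth.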

To minimize the number of path delay measurements, we restrict the path set to independent paths. We used the method proposed by Sharma et al. [65] to generate a testable basis path set for the underlying circuit; it builds on the basis generation algorithms introduced in [42] and [18].
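The dependency among $P_4$ to $P_7$, and the idea of keeping only independent paths, can be checked with a path-element incidence matrix. The sketch below assumes an additive delay model in which each path delay is the sum of the delays of the lines and gates it traverses, listed as in the four paths of Figure 4.7. It then selects paths greedily whenever they increase the rank. This greedy rule is only an illustration: it is not the testable-basis algorithm of Sharma et al. [65], and it ignores testability.

```python
import numpy as np

# Paths P4..P7 from Figure 4.7, listed as the lines and gates they traverse.
PATHS = {
    "P4": ["c", "j", "g1", "e", "g3", "h", "g4", "k"],
    "P5": ["b", "g1", "e", "g3", "h", "g4", "k"],
    "P6": ["c", "j", "g1", "d", "g2", "g", "g4", "k"],
    "P7": ["b", "g1", "d", "g2", "g", "g4", "k"],
}


def incidence_matrix(paths):
    """Entry (p, e) is 1 if path p traverses element e, so path delay = row @ element delays."""
    elements = sorted({e for p in paths.values() for e in p})
    col = {e: i for i, e in enumerate(elements)}
    R = np.zeros((len(paths), len(elements)))
    for r, p in enumerate(paths.values()):
        for e in p:
            R[r, col[e]] = 1.0
    return R


def greedy_independent_paths(R):
    """Keep a path only if it raises the rank of the already selected set."""
    chosen = []
    for r in range(R.shape[0]):
        if np.linalg.matrix_rank(R[chosen + [r]]) > len(chosen):
            chosen.append(r)
    return chosen


R = incidence_matrix(PATHS)
print(np.linalg.matrix_rank(R))               # 3: the four paths are dependent
print(np.allclose(R[0] + R[3], R[1] + R[2]))  # True: d(P4) + d(P7) = d(P5) + d(P6)
print([list(PATHS)[i] for i in greedy_independent_paths(R)])  # ['P4', 'P5', 'P6']
```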


# Chapter 5

# Applications

In this chapter, we introduce a number of novel applications of the proposed variation estimation methods. Our methods are fast, inexpensive, and applicable to all combinational circuits. In contrast, previously proposed variation estimation methods are expensive and design specific, so they can hardly be used in the following applications.

1. Improving modeling and simulation: Modeling a random variable is the first step in finding its effects on a system. Modeling of process variation is widely addressed in the literature [8, 12, 16, 23, 30, 36, 43, 46, 47, 50, 76, 79]. However, only a limited number of variation measurements are available for fitting and verifying a specific model. Our method provides a fast way to obtain an accurate estimate of the variations in a specific IC.

The proposed variation estimation method can also be used in variation simulations. Since only a limited number of variation measurements are available, researchers have to use imprecise parametric models of variations in their simulations, and the simulation results might therefore not be accurate enough. Our method provides a fast technique for estimating variations, so researchers can use real variation measurements in their simulations and improve their evaluations. These measurements can also be integrated into power simulation tools to obtain accurate and realistic simulation models.

2. Post-silicon optimization: Traditional VLSI design relies on pre-silicon optimization. The designer considers various design parameters and tunes them to meet the design constraints. Variations are either not considered at all, or only their statistical characteristics are considered.

Static timing analysis (STA) is an example of pre-silicon optimization. The goal of STA is to find the longest delay in a specific circuit, and delay variations are not considered: the delays of the interconnect wires and gates are modeled deterministically, and the longest path is then found using the graph model of the circuit. In contrast, statistical static timing analysis (SSTA) uses the statistical characteristics of the variations to improve the longest-delay estimate in the presence of variations.

Modern fabrication processes with high variability make post-silicon optimization necessary. When there is no variation, or the variations are very small, the designer can predict the behavior of the circuit with little uncertainty. In modern fabrication processes, however, even considering the statistical characteristics of the process variation might not be enough. Optimization after manufacturing (post-silicon) can improve the efficiency of the IC dramatically [34, 52, 70, 71].

Tschanz et al. [71] used bidirectional adaptive body bias to mitigate the effects of intra-die and inter-die variations on circuits. They considered frequency-leakage optimization, in which the designer maximizes the circuit frequency subject to a number of leakage constraints. They vary the body bias to change the threshold voltage of the transistors in the circuit: if variations reduce the operating frequency, the threshold voltage should be decreased, and if variations increase the leakage current, the threshold voltage should be increased. Thus, by increasing or decreasing the body bias, one can adjust each manufactured IC to meet the frequency and leakage constraints. To mitigate inter-die variation, they suggest optimizing the supply voltage based on the variation realization in each IC. Intra-die variations can also be handled using different reference voltages in different parts of the IC.

Pre-silicon optimization (gate sizing) and post-silicon optimization (adaptive body bias) can both be used to reduce the loss of parametric yield. Mani et al. [52] propose a joint optimization method to mitigate the effects of variations on yield and show that their method reduces the leakage current by 5-35%.

In all the mentioned post-silicon optimization methods, an estimation of variations is necessary to optimize each circuit separately. Our method can efficiently provide them such an estimation.

3. Manufacturing process characterization: The proposed variation estimation method can be used to characterize the statistical properties of a specific manufacturing technology. This characterization can be used to optimize designs for that technology, or to modify the manufacturing technology itself in order to decrease variations.
4. IC identification and fingerprinting: Variations result from complicated nanoscale physical interactions and systematic imperfections of the manufacturing tools. Thus, it is practically impossible to clone the variations of an IC; i.e., the variations in each IC are unique and cannot be replicated. This property can be used for IC identification and fingerprinting.

A physical unclonable function (PUF) [31] is a security scheme that uses the variations in a chip as its secret key. Delay-based PUFs use the delay variations in an IC to construct a function whose output depends on the variations. Thus, for the same input, the output of the function differs across chips, and this unique, unclonable function in each IC can serve as its secret key.

5. Identifying hot spots: Different sections of an IC dissipate different amounts of power. Hot spots are the sections that dissipate more power and heat up sooner than other sections of the IC. Process variation also affects the hot spots on the IC. Using the proposed variation estimation method, one can determine the hot spots of a specific IC in the presence of variations, so that they can be specifically controlled or cooled to avoid possible damage.
6. Workload scheduling: The maximum frequency of the various parts of the IC is a function of both the design and the variations. Knowing the variations in an IC helps determine the true power consumption and speed of its different parts. Thus, one can develop software that accounts for process variations and uses all the resources of the IC optimally.

An example of such software is the workload management scheme for cache memories proposed by Meng and Joseph [53]. They show that inter-die and intra-die variations can dramatically affect the leakage current of the ways in the cache; i.e., the ratio of maximum to minimum leakage under variations can be 10 to 100. They then introduce a way-prioritization technique that selects low-leakage ways in cache management, reducing leakage current by approximately 20%. Note that the way-prioritization technique relies on variation estimates; however, the authors do not provide a fast and inexpensive method for obtaining them.


# Chapter 6

## **Evaluation Results**

To verify the accuracy of the proposed methods, we simulated variations in a number of MCNC benchmark circuits and then used $\ell_1$-regularization, $\ell_2$-minimization, and TUSC (see Section 3.4) to estimate the variations. The simulation results show that $\ell_1$-regularization and TUSC improve the estimates substantially.

### 6.1 Simulation setup

• The variation model: As explained in Section 2.2.1, we used a multivariate Gaussian distribution to model the spatial correlation of the variations. The model agrees well with measurement data and is also used by other researchers [22, 32, 47, 69].

• The transistor model: We used the BSIM4 model for 65nm technology in the simulations [13]. The BSIM4 model is designed to accurately model the behavior of transistors in the sub-100nm regime.

• Benchmark circuits: We used a number of MCNC benchmark circuits in our simulations. The MCNC benchmarks were introduced in 1985 on magnetic tapes and have since been updated, modified, and enhanced regularly. They are widely used in the design automation community (see, for example, [37, 54, 74]).

- The $\ell_1$-regularization software: The SPGL1 software package [68] is used for $\ell_1$-regularization. SPGL1 uses an iterative approach to solve the LASSO problem; in each iteration, the radius of the $\ell_1$ ball is increased until convergence. For more details, see [72].
- The quadratically constrained quadratic program (QCQP) solver: We used the SeDuMi (self-dual minimization) software package [61] for $\ell_2$-minimization and for the QCQP in Section 3.4. SeDuMi is maintained by the Advanced Optimization Lab at McMaster University and can solve a variety of optimization problems over symmetric cones.
- The ATPG tool: PathATPG [77] is used to identify testable paths and to generate test input pairs for them. PathATPG is a fast SAT-based ATPG tool.

Estimation in a subspace of the variation space: The measurement matrices in Equations 3.5 and 4.9 are not full rank. Thus, we should not expect to estimate the variations of all gates; the null space of the measurement matrix A, $\mathcal{N}(A) = \{y \in \mathbb{R}^N | Ay = 0\}$, cannot be observed.

Assume $A_K$ is a measurement matrix that includes K measurements (delay or power). For large K (say, K > 10N, where N is the number of gates), the row space of $A_K$ covers almost the entire measurable part of the variation space. Hence, we use the right singular vectors of $A_K$ to define the comparison space: by estimation in an $n_e$-dimensional subspace, we mean estimation along the first $n_e$ right singular vectors of $A_K$.
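A subspace-restricted error of this kind can be sketched as follows. The exact error formula is not spelled out in this section, so the relative metric below, computed along the first $n_e$ right singular vectors of $A_K$, is an assumption, and the rank-deficient $A_K$ is a toy matrix. The example shows that changing an estimate only along the null space of $A_K$ leaves this error unchanged.

```python
import numpy as np


def subspace_error(A_K, x_true, x_hat, n_e):
    """Relative error of x_hat along the first n_e right singular vectors of A_K.

    Assumed metric: ||V^T (x_hat - x_true)|| / ||V^T x_true||, with V holding those vectors.
    """
    _, _, Vt = np.linalg.svd(A_K, full_matrices=False)
    V = Vt[:n_e].T                                   # N x n_e basis of the comparison subspace
    return float(np.linalg.norm(V.T @ (x_hat - x_true)) / np.linalg.norm(V.T @ x_true))


rng = np.random.default_rng(0)
N = 20
A_K = rng.standard_normal((200, 10)) @ rng.standard_normal((10, N))   # rank-10 toy matrix
x_true = rng.standard_normal(N)
null_dir = np.linalg.svd(A_K)[2][-1]                 # a direction in the null space of A_K
# Changing x only along N(A_K) leaves the subspace error at (numerically) zero.
print(subspace_error(A_K, x_true, x_true + 5.0 * null_dir, n_e=10))
```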

- As explained in Section 3.2, we use multi-voltage power measurements to construct the measurement matrix.
- We used the exponential correlogram function to generate the variations (see Section 2.2), and the same function as $\gamma(d_{j,u})$ in Section 3.4.

### 6.2 Power tomography results

In this section, we evaluate the performance of $\ell_2$-norm optimization, $\ell_1$-norm regularization, and TUSC for chip tomography.



Figure 6.1: Singular values of the measurement matrix.

#### 6.2.1 Measurement matrix evaluation

The functionality of the IC imposes dependencies among the logic gate states. Thus, the power vectors for the input vectors (i.e., the rows of the measurement matrix A) are not necessarily independent. In this section, we use the singular value decomposition (SVD) to quantify the dependency among the rows of A.

A matrix with N independent rows has N non-zero singular values. Figure 6.1 shows the sorted singular values for the C499 and C880 circuits, for a measurement matrix with $M = 6 \times N$ measurements, where N is the number of gates. The singular values of each circuit are normalized so that the largest singular value is 1. The figure demonstrates that the singular values decay rapidly: in both circuits, the 20th singular value is less than 0.1 (10% of the largest).
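The normalization of Figure 6.1, the ratio $\sigma_{N/2}/\sigma_1$ reported in Table 6.2, and the count of independent rows can all be computed from the singular values. In floating point, "non-zero" needs a tolerance. The relative threshold below is an assumption, as is the toy rank-deficient matrix with $M = 6N$ rows standing in for a real measurement matrix.

```python
import numpy as np


def normalized_singular_values(A):
    """Singular values of A scaled so that the largest equals 1 (as plotted in Figure 6.1)."""
    s = np.linalg.svd(A, compute_uv=False)
    return s / s[0]


def numerical_rank(A, rel_tol=1e-10):
    """Count singular values above rel_tol * sigma_1, i.e. the numerically non-zero ones."""
    return int(np.sum(normalized_singular_values(A) > rel_tol))


rng = np.random.default_rng(0)
N = 40                                   # hypothetical number of gates
# Toy matrix with 6N rows but only 15 independent directions.
A = rng.standard_normal((6 * N, 15)) @ rng.standard_normal((15, N))
sv = normalized_singular_values(A)
print(numerical_rank(A))                 # 15
print(sv[N // 2 - 1])                    # sigma_{N/2} / sigma_1, tiny because rank < N/2
```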



Figure 6.2: Variations estimation error vs. percent of the power measurement noise.

This decay suggests that the variations of all gates cannot be determined independently, because the measurements carry no information about the null space of the measurement matrix, $\mathcal{N}(A) = \{y \in \mathbb{R}^N | Ay = 0\}$. Thus, we can only estimate the variation in a subspace S orthogonal to $\mathcal{N}(A)$.

#### 6.2.2 Tomography results in the power framework

Figure 6.3: Variation estimation error vs. number of power measurements.

To study the performance of the proposed tomography method, we simulated the process variation on a number of MCNC benchmarks. A total of 12% variation is assumed in the simulations. Based on the data in [16] and [76], 20% of the total variation is inter-die variation, 60% is spatially correlated intra-die variation, and 20% is random uncorrelated variation. To model the leakage current (static power), we used the HSPICE simulator with a 65nm CMOS transistor technology.
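The three-component variation model described above can be sampled as follows. This is a sketch under explicit assumptions. The 12% total variation is treated as a relative standard deviation of 0.12, and the 20/60/20 split is applied to the variance. The correlated part uses the exponential correlogram $\rho(d) = \exp(-d/\phi)$ with a hypothetical correlation length $\phi$. The function name, gate coordinates, and default values are illustrative, not taken from the thesis.

```python
import numpy as np


def simulate_variations(coords, total_sigma=0.12, split=(0.2, 0.6, 0.2),
                        corr_length=1.0, rng=None):
    """Sample relative gate variations as inter-die + correlated intra-die + random parts.

    Assumptions: `split` gives the share of the total *variance*; the correlated part
    uses the exponential correlogram rho(d) = exp(-d / corr_length).
    """
    rng = np.random.default_rng() if rng is None else rng
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    var_inter, var_corr, var_rand = (f * total_sigma ** 2 for f in split)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    chol = np.linalg.cholesky(np.exp(-dist / corr_length) + 1e-10 * np.eye(n))
    inter = np.sqrt(var_inter) * rng.standard_normal()          # shared by all gates on the die
    corr = np.sqrt(var_corr) * (chol @ rng.standard_normal(n))  # spatially correlated part
    rand = np.sqrt(var_rand) * rng.standard_normal(n)           # independent per gate
    return inter + corr + rand


rng = np.random.default_rng(0)
gate_xy = rng.uniform(0.0, 10.0, size=(200, 2))   # hypothetical gate coordinates
delta = simulate_variations(gate_xy, corr_length=2.0, rng=rng)
```

Under these assumptions, every gate has a total standard deviation of 0.12, and nearby gates are more strongly correlated than distant ones because they share the correlated component.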

Figure 6.2 presents the variation estimation error for the C499 and C880 benchmark circuits. The horizontal axis is the power measurement noise and the vertical axis is the variation estimation error. The variation estimation is calculated in an N/3-dimensional subspace, where N is the number of gates. Note that, by construction, the estimation space is orthogonal to the null space of the measurement matrix. For low-noise measurements, $\ell_1$-regularization and TUSC perform very similarly; as the noise level increases, TUSC performs better than $\ell_1$-norm regularization. $\ell_2$-minimization performs much worse than both $\ell_1$-regularization and TUSC; it is omitted from the figure, and Table 6.2 reports this comparison.

The number of measurements also affects the estimation error. Figure 6.3 presents the variation estimation error versus the number of measurements. The horizontal axis is the ratio of the number of measurements to the total number of gates in the circuit, and the variation is estimated in an N/4-dimensional subspace. M is 383 and 317 for C499 and C880, respectively (M denotes the number of measurements in Table 6.2). As the number of measurements increases, the measurements cover most of the identifiable directions. Thus, for a large number of measurements, the sparsity and shape constraints have a similar effect, and the errors of $\ell_1$-regularization and TUSC become nearly the same.

Table 6.1 shows the average number of independent power vectors for single- and triple-voltage measurements. The second column is the number of power vectors (measurements). To find the number of independent vectors in each measurement set, we compute their singular values and count the non-zero ones. The third and fourth columns show the number of independent power vectors for single- and triple-voltage measurements, respectively. The table shows that triple-voltage measurements increase the number of independent power vectors.

Table 6.2 shows tomography results on different benchmark circuits. We used the software package SIS [67] with NAND2, NAND3, NAND4, NOR2, NOR3, NOR4, and inverters to map the circuit to the logic gates. The second column shows the number of gates and the third column reports the number of input Table 6.1: Average number of independent power vectors for single and triple voltage

•

€° €°

٣

| Circuit | Number of measurements | Single-voltage | 3-voltages |
|---------|------------------------|----------------|------------|
| C432    | 185                    | 132.6          | 151.6      |
| C499    | 383                    | 183.4          | 265.0      |
| C880    | 317                    | 217.0          | 250.7      |
| C1355   | 465                    | 184.4          | 251.5      |
| C1908   | 553                    | 192.9          | 260.9      |
| C2670   | 540                    | 322.9          | 350.3      |
| alu2    | 324                    | 167.9          | 198.2      |
| alu4    | 659                    | 312.9          | 351.8      |
| comp    | 127                    | 84.5           | 112.7      |
| cordic  | 79                     | 55.1           | 71.1       |
| Ъ9      | 101                    | 84.2           | 92.7       |
| c8      | 138                    | 112.0          | 127.0      |

measurements.

L

58, S.,

165

÷

| Circuit properties |             |         |       |                                 | 3% noise  |                      |                | 6% noise |                | 9% noise       |        |         |                |       |       |       |       |       |
|--------------------|-------------|---------|-------|---------------------------------|-----------|----------------------|----------------|----------|----------------|----------------|--------|---------|----------------|-------|-------|-------|-------|-------|
| name               | #gates      | #inputs | #meas | $\frac{\sigma_{N/2}}{\sigma_1}$ | subspace  | ℓ <sub>1</sub> -reg. | $\ell_2$ -min. | TUSC     | $\ell_1$ -reg. | $\ell_2$ -min. | TUSC   | ℓ1-reg. | $\ell_2$ -min. | TUSC  |       |       |       |       |
| C432               | 206         | 36      | 185   | 0.0076                          | 61        | 2.82                 | 6.08           | 3.97     | 5.13           | 12.13          | 5.57   | 7.75    | 18.19          | 7.46  |       |       |       |       |
|                    |             |         |       |                                 | 92        | 4.85                 | 10.21          | 7.40     | 8.76           | 20.41          | 9.58   | 12.86   | 30.63          | 12.27 |       |       |       |       |
| C499               | C499 532 41 | 41      | 383   | 383                             | 383       | 0.0009               | 127            | 2.71     | 9.87           | 2.7            | 4.98   | 19.77   | 4.77           | 7.31  | 29.67 | 6.97  |       |       |
|                    |             |         |       |                                 | 191       | 7.83                 | 38.08          | 8.18     | 13.90          | 76.40          | 11.56  | 20.50   | 114            | 15.6  |       |       |       |       |
| C880               | 353         | 60      | 317   | 0.004                           | 105       | 3.20                 | 8.61           | 2.99     | 6.06           | 17.27          | 5.66   | 9.01    | 25.94          | 8.39  |       |       |       |       |
|                    |             |         |       |                                 | 158       | 6.03                 | 16.00          | 5.59     | 11.27          | 32.11          | 10.12  | 16.72   | 25.94          | 8.39  |       |       |       |       |
| C1355              | 517         | 41      | 465   | 0.0008                          | 155       | 4.27                 | 65.19          | 4.27     | 7.61           | 130.7          | 7.32   | 11.10   | 196.3          | 10.42 |       |       |       |       |
|                    |             |         |       | 232                             | 15.82     | 248.3                | 15.33          | 26.51    | 498.2          | 19.11          | 37.65  | 748.3   | 23.72          |       |       |       |       |       |
| C1908              | 615         | 33      | 553   | 553                             | 553       | 553                  | 553            | 0.0002   | 184            | 4.89           | 44.77  | 5.19    | 9.29           | 89.69 | 8.35  | 13.77 | 134.6 | 11.87 |
|                    |             |         |       |                                 | 276       | 14.71                | 113.4          | 13.05    | 22.53          | 227.1          | 16.78  | 30.60   | 340.9          | 21.83 |       |       |       |       |
| C2670 900          | 900         | 233     | 540   | 4e-5                            | 180       | 4.05                 | 5.43           | 3.76     | 7.29           | 10.87          | 6.95   | 10.70   | 16.30          | 10.24 |       |       |       |       |
|                    |             |         |       |                                 | 270       | 8.53                 | 11.52          | 8.37     | 15.17          | 23.04          | 13.75  | 22.25   | 34.56          | 19.56 |       |       |       |       |
| alu2               | 360         | 10      | 324   | 0.0014                          | 108       | 6.35                 | 54.97          | 5.67     | 10.12          | 109.7          | 9.21   | 14.29   | 164.5          | 13.10 |       |       |       |       |
|                    |             |         |       |                                 | 162       | 13.61                | 120.9          | 12.83    | 21.54          | 241.4          | 17.80  | 30.37   | 361.9          | 23.74 |       |       |       |       |
| alu4               | 733         | 14      | 659   | 0.0008                          | 219       | 6.70                 | 64.01          | 5.82     | 11.56          | 127.96         | 10.74  | 16.73   | 191.9          | 15.81 |       |       |       |       |
|                    |             |         |       |                                 | 329       | 13.61                | 129.53         | 11.66    | 21.91          | 258.9          | 19.75  | 31.06   | 388.4          | 28.44 |       |       |       |       |
| comp               | 163         | 32      | 127   | 0.005                           | 42        | 2.73                 | 3.94           | 2.60     | 4.87           | 7.74           | 4.67   | 7.12    | 11.56          | 6.84  |       |       |       |       |
|                    |             |         |       |                                 | 63        | 4.47                 | 6.34           | 4.25     | 7.94           | 12.42          | 7.56   | 11.64   | 11.56          | 6.84  |       |       |       |       |
| cordic             | 102         | 23      | 79    | 0.005                           | 26        | 1.87                 | 3.74           | 3.01     | 3.23           | 7.45           | 3.85   | 4.67    | 11.17          | 4.93  |       |       |       |       |
|                    |             |         |       |                                 | 39        | 3.35                 | 6.54           | 6.97     | 5.84           | 13.01          | 8.02   | 8.46    | 19.51          | 9.48  |       |       |       |       |
| b9                 | 113         | 41      | 101   | 0.014                           | 33        | 2.51                 | 4.02           | 3.66     | 4.68           | 8.02           | 5.01   | 6.90    | 12.02          | 6.66  |       |       |       |       |
|                    |             |         |       |                                 | 50        | 4.00                 | 6.84           | 6.79     | 7.42           | 13.63          | 8.50   | 10.97   | 20.44          | 10.67 |       |       |       |       |
| c8                 | 165         | 28      | 138   | 0.008                           | 46        | 3.50                 | 4.61           | 4.32     | 6.01           | 8.93           | 6.06   | 8.74    | 13.30          | 8.10  |       |       |       |       |
|                    |             |         |       |                                 | 69        | 6.22                 | 8.06           | 8.50     | 10.56          | 15.66          | 10.91  | 15.26   | 23.36          | 13.86 |       |       |       |       |

Table 6.2: Performance of the  $\ell_2$ -norm minimization, the  $\ell_1$ -norm regularization, and TUSC for a number of MCNC benchmark circuits in the power framework.



Figure 6.4: Singular values of the measurement matrices decay very fast.

pins. For each circuit, we measured the total power consumption for a number of input vectors; the number of measurements, M, is reported in the fourth column. The fifth column shows the ratio of the N/2-th singular value of the measurement matrix to the first one (N is the number of gates). The  $\ell_1$ -regularization, the  $\ell_2$ -minimization, and the TUSC methods were evaluated in estimation subspaces of dimension M/3 and M/2, whose sizes are reported in the sixth column. The remaining columns show the results for 3%, 6%, and 9% measurement noise. On average, the  $\ell_1$ -regularization and the TUSC estimate the variations more than two times more accurately than the  $\ell_2$ -minimization.

### 6.3 Delay evaluation results

#### 6.3.1 Measurement matrix and estimation in subspaces

As mentioned in Section 4.2, because of ambiguities (path dependencies), it may not be possible to find the variations of all gates in the circuit. In other words, the measurement matrix, A, is not necessarily full rank. Most often, the measurement matrix is ill-conditioned and its singular values decay rapidly. Figure 6.4 shows the singular values of the measurement matrices for the C880 and C499 circuits, normalized so that the largest singular value equals 1. The singular values decay to 10% of the maximum after about 100 singular values. Note that C432 and C880 have 206 and 353 gates, respectively. The figure also shows the singular values of a random Gaussian matrix: the singular values of the measurement matrices (for C499 and C880) decay much faster than those of the random Gaussian matrix.
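The normalized singular values of Figure 6.4 and the ratio $\sigma_{N/2}/\sigma_1$ reported in the tables can both be computed from a singular value decomposition. The following is a minimal NumPy sketch, not the code used in this work; the matrix `A_toy` is a hypothetical, deliberately low-rank stand-in for a path-based measurement matrix, and `G` is the random Gaussian reference.

```python
import numpy as np

def normalized_singular_values(A):
    """Singular values of A in descending order, scaled so the largest equals 1."""
    s = np.linalg.svd(A, compute_uv=False)
    return s / s[0]

def decay_ratio(A):
    """sigma_{N/2} / sigma_1 for an M x N matrix A (N = number of gates).
    If A has fewer than N/2 singular values, sigma_{N/2} is taken as 0."""
    s = normalized_singular_values(A)
    half = A.shape[1] // 2            # 1-indexed position N/2 -> 0-indexed half - 1
    return float(s[half - 1]) if half <= len(s) else 0.0

# Hypothetical toy matrix: each of the M rows sums two of n_patterns 0/1 "base"
# patterns, so its rank is at most n_patterns by construction.
rng = np.random.default_rng(0)
N, M, n_patterns = 200, 200, 20
base = (rng.random((n_patterns, N)) < 0.1).astype(float)
pick = rng.integers(0, n_patterns, size=(M, 2))
A_toy = base[pick[:, 0]] + base[pick[:, 1]]
G = rng.standard_normal((M, N))       # random Gaussian reference matrix

print("toy matrix      sigma_N/2 / sigma_1 =", decay_ratio(A_toy))
print("Gaussian matrix sigma_N/2 / sigma_1 =", decay_ratio(G))
```

Real measurement matrices are ill-conditioned rather than exactly low rank, so their ratios (e.g., 0.02 to 0.08 in Table 6.4) fall between the two extremes of this toy.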

Hence, it is not possible to recover the variations of all gates, so we measure the estimation error in the basis of singular vectors. The estimation error is smallest along the singular vector corresponding to the largest singular value, next smallest along the second singular vector, and so on. We say the estimation subspace has dimension  $n_e$  when we project the estimation error onto the span of the first  $n_e$  singular vectors.
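A minimal sketch of this subspace error measure follows. Because the variation vector lives in the N-dimensional gate space, the projection uses the first $n_e$ right singular vectors of A. The function name `subspace_error` is hypothetical, and reporting the error as a percentage of the projected true variations is an assumption about the normalization.

```python
import numpy as np

def subspace_error(A, x_true, x_hat, n_e):
    """Estimation error (percent) restricted to the span of the first n_e right
    singular vectors of A. ASSUMPTION: normalized by the projected true vector."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    V_e = Vt[:n_e].T                     # N x n_e orthonormal basis, largest sigma first
    err = V_e.T @ (x_hat - x_true)       # error coordinates inside the subspace
    ref = V_e.T @ x_true                 # true variations inside the subspace
    return 100.0 * np.linalg.norm(err) / np.linalg.norm(ref)

# Hypothetical rank-deficient system: 30 measurements of 50 unknowns.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 50))
x_true = rng.standard_normal(50)
x_hat = np.linalg.pinv(A) @ (A @ x_true)   # noise-free minimum-norm estimate

print("error in 20-dim subspace (%):", subspace_error(A, x_true, x_hat, n_e=20))
print("full-space error (%):", 100 * np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

Even without noise, the full-space error is large because no estimator can see the null space of A, while the error inside the estimation subspace vanishes; this is why errors are reported per subspace.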

#### 6.3.2 Delay tomography results

To evaluate the performance of the proposed methods, we simulate the variation model (Section 4.1.1) on a number of MCNC benchmark circuits. We assume a total random variation of 12%. The correlated intra-die variation is 60% of the total variation [16] [76], the uncorrelated intra-die variation is 20%, and the remaining 20% is allotted to the inter-die variation.
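The 12% total with a 60% / 20% / 20% split can be simulated as the sum of three independent Gaussian components. This sketch assumes that the percentages split the variance, that the correlated intra-die part follows an exponential spatial kernel, and that the correlation length `corr_length` takes a hypothetical value; the model of Section 4.1.1 may differ in these details.

```python
import numpy as np

def simulate_variations(xy, total_sigma=0.12, frac_corr=0.6, frac_uncorr=0.2,
                        corr_length=0.5, rng=None):
    """Draw one die's relative gate variations as inter-die + correlated
    intra-die + uncorrelated intra-die components.

    ASSUMPTIONS: the fractions split the variance, the correlated part uses the
    kernel exp(-d / corr_length), and corr_length is hypothetical (units of xy).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(xy)
    var = total_sigma ** 2
    frac_inter = 1.0 - frac_corr - frac_uncorr           # remaining share: inter-die
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    K = np.exp(-d / corr_length)                          # spatial correlation matrix
    L = np.linalg.cholesky(K + 1e-10 * np.eye(n))         # small jitter for safety
    correlated = np.sqrt(frac_corr * var) * (L @ rng.standard_normal(n))
    uncorrelated = np.sqrt(frac_uncorr * var) * rng.standard_normal(n)
    inter_die = np.sqrt(frac_inter * var) * rng.standard_normal()   # shared by all gates
    return inter_die + correlated + uncorrelated

# Hypothetical placement of 30 gates in a unit square; 4000 simulated dies.
rng = np.random.default_rng(2)
xy = rng.random((30, 2))
dies = np.array([simulate_variations(xy, rng=rng) for _ in range(4000)])
print("empirical total sigma:", dies.std())   # close to the assumed 0.12
```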

As in Section 6.2, we used the SIS software [67] to map the benchmark circuits to NAND2, NAND3, NAND4, NOR2, NOR3, NOR4, and inverter gates. Then, the gates are placed on the IC using Dragon, a placement software package [1]. Since different gates occupy different areas on the IC, the gates lie on an irregular grid.

To calculate the falling and rising coefficients ( $\xi_{f,g_u}$  and  $\xi_{r,g_u}$  in Equation 4.7), we implemented all the gates in a 65nm CMOS technology and used HSPICE simulations to fit the linear model for each gate.
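Fitting a coefficient of a linear delay model amounts to a least-squares line fit over simulated (variation, delay) pairs. The sketch below assumes a hypothetical per-gate form, delay ≈ d_nom + ξ·Δ, fitted separately for rising and falling transitions; Equation 4.7 itself is not reproduced, and the delay values are made up to stand in for HSPICE results.

```python
import numpy as np

def fit_linear_coefficient(delta, delay):
    """Least-squares fit of delay ~= d_nom + xi * delta; returns (xi, d_nom).
    The linear form is an illustrative assumption, not Equation 4.7."""
    xi, d_nom = np.polyfit(np.asarray(delta, float), np.asarray(delay, float), deg=1)
    return float(xi), float(d_nom)

# Hypothetical sweep of one gate: variation values and made-up rise/fall delays (ps)
# standing in for circuit-simulator output.
delta = np.linspace(-0.10, 0.10, 9)
rise_delay = 21.0 - 30.0 * delta
fall_delay = 18.0 - 24.0 * delta

xi_r, d_r = fit_linear_coefficient(delta, rise_delay)
xi_f, d_f = fit_linear_coefficient(delta, fall_delay)
print(f"xi_r = {xi_r:.2f} ps, xi_f = {xi_f:.2f} ps per unit variation")
```

With real simulator data the points are not exactly collinear, and the least-squares slope absorbs the small nonlinearity.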

Figure 6.5 shows the variation estimation error of the  $\ell_2$ -minimization and the  $\ell_1$ -regularization methods. The horizontal axis is the delay measurement noise and the vertical axis is the variation estimation error. The  $\ell_1$ -regularization reduces the error by more than 50% compared to the  $\ell_2$ -minimization. The estimation subspace has dimension 84 for both the C432 and C880 circuits. When the measurement noise is small, the delay measurements alone provide enough information to estimate the variations accurately, and sparsity does not provide significant additional information. Thus, the advantage of the  $\ell_1$ -regularization over the  $\ell_2$ -minimization grows as the measurement noise increases.

Figure 6.5: Variation (delay) estimation error vs. measurement error.

Figure 6.6 illustrates the effect of the number of measurements. The horizontal axis is the number of delay measurements divided by the number of gates. Again, the  $\ell_1$ -regularization performs more than two times better than the  $\ell_2$ -minimization. In this figure, the estimation subspace has dimension 84 for both the C432 and C880 circuits.
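The $\ell_1$-versus-$\ell_2$ comparison can be reproduced in miniature. This sketch rests on several simplifying assumptions: the gates sit on a regular 12 × 12 grid (unlike the irregular placement above), the sparsifying basis is the 2-D DCT, the measurement matrix is a random Gaussian stand-in, and the $\ell_1$ problem is solved with plain iterative soft-thresholding (ISTA) rather than the solvers cited in the bibliography (e.g., SPGL1 [68] or FPC [35]). The weight `lam` is hand-picked.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ v gives the DCT coefficients of v."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0] /= np.sqrt(2.0)
    return C

def ista(Phi, y, lam, n_iter=3000):
    """Iterative soft-thresholding for min_t 0.5*||Phi @ t - y||^2 + lam*||t||_1."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2        # 1 / Lipschitz constant of the gradient
    t = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = t - step * (Phi.T @ (Phi @ t - y))       # gradient step on the data term
        t = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft threshold
    return t

def rel_error(x_hat, x_true):
    return np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)

rng = np.random.default_rng(0)
n, M = 12, 72                         # hypothetical 12 x 12 gate grid, 72 measurements
N = n * n
C = dct_matrix(n)
Psi = np.kron(C.T, C.T)               # columns are 2-D DCT basis maps (row-major order)

theta = np.zeros(N)                   # six nonzero low-frequency coefficients
for c in rng.choice(16, size=6, replace=False):
    theta[(c // 4) * n + c % 4] = rng.standard_normal()
x_true = Psi @ theta                  # smooth, spatially correlated variation map

A = rng.standard_normal((M, N))       # random stand-in for the measurement matrix
y = A @ x_true + 0.01 * rng.standard_normal(M)

x_l2 = np.linalg.pinv(A) @ y                     # minimum-norm l2 solution
x_l1 = Psi @ ista(A @ Psi, y, lam=0.05)          # l1-regularized solution in the DCT basis
print(f"l2 error: {rel_error(x_l2, x_true):.3f}, l1 error: {rel_error(x_l1, x_true):.3f}")
```

The minimum-norm $\ell_2$ estimate can only recover the component of the map in the row space of A, whereas the sparsity prior lets the $\ell_1$ estimate fill in the rest.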

Next, we evaluate the basis path sets for the benchmark circuits. The method introduced in Section 4.5.2 is a heuristic procedure for basis path selection; it does not necessarily yield a set of independent paths that spans the whole space. Table 6.3 shows the number of basis paths in each benchmark circuit.

| Circuit | # gates | # basis paths | # independent paths | # independent linear equations |
|---------|---------|---------------|---------------------|--------------------------------|
| C432    | 206   | 199          | 121           | 153              |
| C499    | 532   | 422          | 271           | 375              |
| C880    | 353   | 351          | 184           | 253              |
| C1355   | 517   | 480          | 233           | 335              |
| C1908   | 615   | 590          | 318           | 414              |
| C2670   | 900   | 979          | 422           | 632              |
| alu2    | 360   | 368          | 183           | 230              |
| alu4    | 733   | 693          | 337           | 449              |
| comp    | 163   | 131          | 84            | 122              |
| cordic  | 102   | 92           | 59            | 77               |
| b9      | 113   | 142          | 75            | 90               |
| c8      | 165   | 201          | 96            | 119              |

Table 6.3: Number of independent paths and independent linear equations. For each path, rising and falling transitions result in different linear equations.



Figure 6.6: Variation (delay) estimation error vs. the number of measurements.

The "# independent paths" column gives the number of linearly independent paths in each basis path set. As mentioned before, two linear equations can be written for each path (rising and falling transitions). The last column gives the number of independent linear equations that each basis path set provides.
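Both counts in Table 6.3 are matrix ranks. The sketch below assumes that every gate is inverting (true for the NAND, NOR, and inverter library used here), so the output transition alternates along a path and each path yields two equations with different coefficients. The circuit, the coefficient values `xi_r` and `xi_f`, and the helper `path_equations` are hypothetical.

```python
import numpy as np

def path_equations(paths, n_gates, xi_r, xi_f):
    """Return the 0/1 path-gate incidence matrix and two linear equations per path.
    ASSUMPTION: all gates are inverting, so output transitions alternate; for a
    rising launch the first gate's output falls and contributes xi_f."""
    incidence = np.zeros((len(paths), n_gates))
    equations = []
    for p, path in enumerate(paths):
        rise_launch = np.zeros(n_gates)
        fall_launch = np.zeros(n_gates)
        for pos, g in enumerate(path):
            incidence[p, g] = 1.0
            if pos % 2 == 0:          # output falls for a rising launch
                rise_launch[g] += xi_f[g]
                fall_launch[g] += xi_r[g]
            else:                     # output rises for a rising launch
                rise_launch[g] += xi_r[g]
                fall_launch[g] += xi_f[g]
        equations += [rise_launch, fall_launch]
    return incidence, np.array(equations)

# Hypothetical 5-gate circuit with four reconvergent paths.
paths = [[0, 2, 4], [1, 2, 4], [0, 3, 4], [1, 3, 4]]
xi_r = np.array([1.00, 1.20, 0.90, 1.10, 1.05])   # made-up rising coefficients
xi_f = np.array([0.80, 1.30, 1.00, 0.70, 0.95])   # made-up falling coefficients
inc, eqs = path_equations(paths, 5, xi_r, xi_f)
print("independent paths:", np.linalg.matrix_rank(inc))       # -> 3
print("independent equations:", np.linalg.matrix_rank(eqs))   # -> 5
```

The four paths satisfy path 1 − path 2 − path 3 + path 4 = 0, so only three are independent; the eight equations still reach rank 5 because rising and falling coefficients differ, which mirrors why the last column of Table 6.3 exceeds the "# independent paths" column.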

Finally, Table 6.4 shows the results of variation estimation on 12 benchmark circuits. After the benchmark name, the first, second, and third columns are the number of gates, the number of inputs, and the number of delay measurements, respectively. The fourth column is the ratio of the N/2-th singular value to the first singular value of the measurement matrix (N is the number of gates); it indicates how fast the singular values decay, i.e., how well conditioned the measurement matrix is. The fifth column is the dimension of the estimation subspace. The remaining columns give the estimation error (in percent) of the  $\ell_1$ -regularization and the  $\ell_2$ -minimization with 3%, 6%, and 9% measurement noise.

| Circuit            | #gates | #inputs | #meas | $\frac{\sigma_{N/2}}{\sigma_1}$ | subspace | $\ell_1$ error (3% noise) | $\ell_2$ error (3% noise) | $\ell_1$ error (6% noise) | $\ell_2$ error (6% noise) | $\ell_1$ error (9% noise) | $\ell_2$ error (9% noise) |
|--------------------|--------|---------|-------|---------------------------------|----------|----------------|----------------|----------------------|----------------|----------------|----------------|
| C432               | 206    | 36      | 199   | 0.035                           | 39       | 6.05           | 7.15           | 10.38                | 13.72          | 14.88          | 20.42          |
|                    |        |         |       |                                 | 66       | 10.13          | 12.29          | 16.18                | 22.47          | 22.8           | 32.93          |
| C499               | 532    | 41      | 422   | 0.022                           | 84       | 7.31           | 13.15          | 10.82                | 25.72          | 15.29          | 38.41          |
|                    |        |         |       |                                 | 140      | 11.10          | 20.47          | 16.12                | 39.0           | 22.69          | 57.94          |
| C880               | 353    | 60      | 421   | 0.036                           | 84       | 4.52           | 8.93           | 8.42                 | 17.81          | 12.41          | 26.71          |
|                    |        |         |       |                                 | 140      | 7.71           | 13.12          | 14.86                | 26.06          | 21.95          | 39.04          |
| C1355              | 517    | 41      | 480   | 0.0211                          | 96       | 5.00           | 8.19           | 9.04                 | 16.39          | 12.61          | 24.58          |
|                    |        |         |       |                                 | 160      | 6.35           | 9.50           | 11.90                | 19.00          | 17.07          | 28.50          |
| C1908              | 615    | 33      | 590   | 0.020                           | 118      | 4.89           | 7.51           | 8.87                 | 14.66          | 13.0           | 21.89          |
|                    |        |         |       |                                 | 196      | 7.9            | 12.54          | 13.92                | 24.30          | 20.32          | 36.20          |
| C2670              | 900    | 233     | 979   | 0.022                           | 194      | 8.68           | 21.76          | 11.34                | 41.48          | 14.99          | 61.47          |
|                    |        |         |       |                                 | 326      | 10.42          | 21.83          | 14.61                | 41.37          | 19.52          | 61.29          |
| alu2               | 360    | 10      | 368   | 0.015                           | 73       | 5.20           | 6.06           | 7.75                 | 9.83           | 10.66          | 13.99          |
|                    |        |         |       |                                 | 122      | 10.22          | 11.59          | 14.53                | 17.98          | 19.43          | 25.11          |
| alu4               | 733    | 14      | 693   | 0.010                           | 138      | 5.94           | 10.06          | 9.84                 | 19.89          | 14.21          | 29.79          |
|                    |        |         |       |                                 | 231      | 10.60          | 16.51          | 15.70                | 32.76          | 21.99          | 49.10          |
| comp               | 163    | 32      | 131   | 0.023                           | 26       | 5.18           | 11.00          | 8.07                 | 21.23          | 11.08          | 31.53          |
|                    |        |         |       |                                 | 43       | 7.60           | 15.92          | 13.52                | 31.16          | 19.38          | 46.45          |
| cordic             | 102    | 23      | 92    | 0.03                            | 18       | 4.43           | 26.72          | 7.11                 | 53.41          | 10.09          | 80.11          |
|                    |        |         |       |                                 | 30       | 9.75           | 62.83          | 14.57                | 125            | 20.27          | 188            |
| b9                 | 113    | 41      | 142   | 0.076                           | 28       | 2.12           | 2.22           | 3.51                 | 3.75           | 5.04           | 5.43           |
|                    |        |         |       |                                 | 47       | 4.27           | 4.94           | 6.43                 | 8.04           | 8.97           | 11.48          |
| c8                 | 165    | 28      | 201   | 0.039                           | 40       | 11.03          | 17.51          | 16.15                | 31.19          | 21.52          | 45.52          |
|                    |        |         |       |                                 | 67       | 25.70          | 41.12          | 33.43                | 74.30          | 43.00          | 109            |

Table 6.4: Performance of the  $\ell_2$ -norm minimization and the  $\ell_1$ -norm regularization for a number of MCNC benchmark circuits.

### Chapter 7

## Conclusion

We proposed a fast and inexpensive method for gate-level variation estimation in the power and the delay frameworks. In the power framework, the total power consumption of the IC is measured for a number of input vectors. Because of the variations, the power consumption of each gate in the circuit is scaled. Using the leakage model of variations, we construct a linear equation for each power measurement, with the scaling factors of the gates as the unknown variables. In the delay framework, the linear equations are constructed by measuring the delays of a sensitizable basis path set. Here, the unknown variables are the variations in gate sizing, which have a linear relationship with the delay.
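The power-framework equations can be sketched on a toy circuit. Everything specific here is hypothetical: a three-gate NAND2 circuit, a made-up leakage table indexed by each gate's input state, and noise-free measurements. Each row of `A` holds the nominal leakage of every gate under one input vector, so the measured total power is `A @ s` for the unknown per-gate scaling factors `s`.

```python
import itertools
import numpy as np

# Hypothetical nominal leakage (nA) of a NAND2 gate for each input state (a, b).
NAND2_LEAK = {(0, 0): 10.0, (0, 1): 20.0, (1, 0): 15.0, (1, 1): 40.0}

def nand(a, b):
    return 1 - (a & b)

def gate_states(vec):
    """Input state of each gate in a hypothetical circuit:
    g0 = NAND(a, b), g1 = NAND(c, d), g2 = NAND(g0, g1)."""
    a, b, c, d = vec
    g0, g1 = nand(a, b), nand(c, d)
    return [(a, b), (c, d), (g0, g1)]

def power_matrix(vectors):
    """Row v holds every gate's nominal leakage under input vector v."""
    return np.array([[NAND2_LEAK[st] for st in gate_states(v)] for v in vectors])

vectors = list(itertools.product([0, 1], repeat=4))   # all 16 input vectors
A = power_matrix(vectors)
s_true = np.array([1.10, 0.95, 1.02])                 # hypothetical scaling factors
y = A @ s_true                                        # noise-free total leakage
s_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print("estimated scaling factors:", s_hat)
```

In a real circuit the number of gates far exceeds the number of linearly independent rows, which is exactly the regime where the spatial-correlation priors matter.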

Next, we estimate the gate-level variations (power or delay) by solving the resulting system of linear equations. The traditional  $\ell_2$ -minimization can be used to estimate the gate-level variations, but since there are not enough linearly independent measurements, it performs poorly. However, it is widely known that variations (power or delay) are spatially correlated; i.e., nearby gates are expected to have similar variations. Because of these spatial correlations, there exists a basis in which the variations can be represented sparsely. The sparse representation suggests using the compressive sensing theory, and we showed how to use it to improve post-silicon characterization. We also modified the traditional  $\ell_2$ -minimization by directly adding spatial constraints, which enforce nearby gates to have similar variations.

The proposed method uses only the external input/output pins of the IC for the estimation. In the power framework, a number of input vectors are first applied to the IC and the power consumption is measured for each input vector. Next, we establish an optimization problem based on the power measurements. Finally, we improve the optimization problem using the spatial correlation of the variations. In the delay framework, we follow the same procedure as in the power framework. However, path delays can be measured only on sensitizable paths; thus, the optimization problem is constructed from the delay measurements on a set of testable basis paths.

Variations affect various properties of the IC, and estimating them enables a number of applications such as post-silicon optimization. The evaluation results verify our method: compared to the traditional  $\ell_2$ -minimization, the  $\ell_1$ -regularization can improve variation estimation by about 80% on average.


# Bibliography

- [1] http://er.cs.ucla.edu/dragon/.
- [2] A. Agarwal, D. Blaauw, and V. Zolotov. Statistical clock skew analysis considering intra-die process variations. In International Conference on Computer-Aided Design, page 914, 2003.
- [3] A. Agarwal, D. Blaauw, and V. Zolotov. Statistical timing analysis for intradie process variations with spatial correlations. In *IEEE/ACM International Conference on Computer-Aided Design*, page 900, 2003.
- [4] A. Agarwal, D. Blaauw, V. Zolotov, S. Sundareswaran, M. Zhao, K. Gala, and R. Panda. Statistical delay computation considering spatial correlations. In *Conference on Asia South Pacific Design Automation*, pages 271–276, 2003.
- [5] A. Agarwal, K. Kang, and K. Roy. Accurate estimation and modeling of total chip leakage considering inter- and intra-die process variations. In International Conference on Computer-Aided Design, pages 736–741, 2005.

- [6] A. Agarwal, V. Zolotov, and D. Blaauw. Statistical clock skew analysis considering intradie-process variations. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 23(8):1231–1242, 2004.
- [7] K. Agarwal, F. Liu, C. McDowell, S. Nassif, K. Nowka, M. Palmer, D. Acharyya, and J. Plusquellic. A test structure for characterizing local device mismatches. In VLSI Circuits, Digest of Technical Papers., pages 67–68, 2006.
- [8] I. Ahsan, N. Zamdmer, O. Glushchenkov, R. Logan, E. Nowak, H. Kimura, J. Zimmerman, G. Berg, J. Herman, E. Maciejewski, A. Chan, A. Azuma, S. Deshpande, B. Dirahoui, G. Freeman, A. Gabor, M. Gribelyuk, S. Huang, M. Kumar, K. Miyamoto, D. Mocuta, and Mahoro. RTA-driven intra-die variations in stage delay, and parametric sensitivities for 65nm technology. In VLSI Technology, Digest of Technical Papers., pages 170–171, 2006.
- [9] M. Ashouei, M. M. Nisar, A. Chatterjee, A. D. Singh, and A. U. Diril. Probabilistic self-adaptation of nanoscale CMOS circuits: Yield maximization under increased intra-die variations. In International Conference on VLSI Design held jointly with 6th International Conference: Embedded Systems, pages 711–716, 2007.
- [10] R. Baraniuk. A lecture on compressive sensing. IEEE Signal Processing Magazine, 24(4):118–121, 2007.

- [11] S. Bhardwaj and S. Vrudhula. A fast and accurate approach for full chip leakage analysis of nano-scale circuits considering intra-die correlations. In International Conference on VLSI Design held jointly with 6th International Conference: Embedded Systems, pages 589–594, 2007.
- [12] S. Bhardwaj, S. Vrudhula, P. Ghanta, and Y. Cao. Modeling of intra-die process variations for accurate analysis and optimization of nano-scale circuits. In *Conference on Design Automation*, pages 791–796, 2006.
- [13] BSIM Research Group. http://www-device.eecs.berkeley.edu/ bsim3/bsim4.html, seen in June, 2008.
- [14] S. M. Burns, M. Ketkar, N. Menezes, K. A. Bowman, J. W. Tschanz, and V. De. Comparative analysis of conventional and statistical design techniques. In *Conference on Design Automation*, pages 238–243, 2007.
- [15] E. Candès. Compressive sampling. In International Congress of Mathematicians, pages 1433–1452, 2006.
- [16] Y. Cao and L. T. Clark. Mapping statistical process variations toward circuit performance variability: an analytical modeling approach. In *Conference on Design Automation*, pages 658–663, 2005.
- [17] H. Chang and S. Sapatnekar. Statistical timing analysis under spatial correlations. IEEE Transactions Computer-Aided Design of Integrated Circuits and Systems, 24(9):1467–1482, 2005.

- [18] K. T. Cheng and H. C. Chen. Delay testing for non-robust untestable circuits. In IEEE International Test Conference on Designing, Testing, and Diagnostics - Join Them, pages 954–961, 1993.
- [19] S. H. Choi, B. C. Paul, and K. Roy. Novel sizing algorithm for yield improvement under process variation in nanometer technology. In *Conference on Design Automation*, pages 454–459, 2004.
- [20] B. Cline, K. Chopra, D. Blaauw, and Y. Cao. Analysis and modeling of CD variation for statistical static timing. In *Conference on Design Automation*, pages 60–66, 2006.
- [21] A. Datta, S. Bhunia, S. Mukhopadhyay, N. Banerjee, and K. Roy. Statistical modeling of pipeline delay and design of pipeline under process variation to enhance yield in sub-100nm technologies. In *Conference on Design, Automation and Test in Europe*, pages 926–931, 2005.
- [22] Q. Ding, R. Luo, H. Wang, H. Yang, and Y. Xie. Modeling the impact of process variation on critical charge distribution. In *International SOC Conference*, pages 243–246, 2006.
- [23] J. Doh, D. Kim, S. Lee, J. Lee, Y. Park, M. Yoo, and J. Kong. A unified statistical model for inter-die and intra-die process variation. In International Conference on Simulation of Semiconductor Processes and Devices, pages 131–134, 2005.

- [24] D. L. Donoho. Compressed sensing. IEEE Transaction on Information Theory, 52(4):1289–1306, 2006.
- [25] D. L. Donoho, M. Vetterli, R. A. DeVore, and I. Daubechies. Data compression and harmonic analysis. *IEEE Transaction on Information Theory*, 44(6):2435–2476, 1998.
- [26] M. Eisele, J. Berthold, D. Schmitt-Landsiedel, and R. Mahnkopf. The impact of intra-die device parameter variations on path delays and on the design for yield of low voltage digital circuits. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 5(4):360–368, 1997.
- [27] M. Eisele, J. Berthold, R. Thewes, E. Wohlrab, D. Schmitt-Landsiedel, and W. Weber. Intra-die device parameter variations and their impact on digital CMOS gates at low supply voltages. In *International Electron Devices Meeting*, pages 67–70, 1995.
- [28] F. Fallah and P. Massoud. Standby and active leakage current control and minimization in CMOS VLSI circuits. *IEICE Trans Electron (Inst Electron Inf Commun Eng)*, E88-C(4):509–519, 2005.
- [29] Z. Feng, P. Li, and Y. Zhan. Fast second-order statistical static timing analysis using parameter dimension reduction. In *Conference on Design Automation*, pages 244–249, 2007.

- [30] P. Friedberg, Y. Cao, J. Cain, R. Wang, J. Rabaey, and C. Spanos. Modeling within-die spatial correlation effects for process-design co-optimization. In International Symposium on Quality of Electronic Design, pages 516–521, 2005.

- [31] B. Gassend, D. Clarke, M. V. Dijk, and S. Devadas. Silicon physical random functions. In ACM Conference on Computer and Communications Security, pages 148–160, 2002.
- [32] P. Ghanta and S. Vrudhula. Analysis of power supply noise in the presence of process variations. In *IEEE Design Test*, pages 256–266, 2007.
- [33] P. Ghanta, S. Vrudhula, S. Bhardwaj, and R. Panda. Stochastic variational analysis of large power grids considering intra-die correlations. In *Conference on Design Automation*, pages 211–216, 2006.
- [34] J. Gregg and T. W. Chen. Post silicon power/performance optimization in the presence of process variations using individual well adaptive body biasing (IWABB). In International Symposium on Quality Electronic Design, pages 453–458, 2004.
- [35] E. T. Hale, W. Yin, and Y. Zhang. A fixed-point continuation method for L1-regularization with application to compressed sensing. *Rice University*, *CAAM Technical Report*, (TR07-07), 2007.

- [36] B. Hargreaves, H. Hult, and S. Reda. Intra-die process variations: How accurately can they be statistically modeled? In *Conference on Asia-pacific Design Automation*, pages 524–530, 2008.
- [37] J. Hlavicka and P. Fiser. A heuristic method of two-level logic synthesis. In World Multiconference on Systemics, Cybernetics and Informatics, pages 524–530, 2001.
- [38] V. Iyengar, J. Xiong, S. Venkatesan, V. Zolotov, D. Lackey, P. Habitz, and C. Visweswariah. Variation-aware performance verification using atspeed structural test and statistical timing. In *International Conference on Computer-Aided Design*, pages 405–412, 2007.
- [39] V. Khandelwal and A. Srivastava. A general framework for accurate statistical timing analysis considering correlations. In *Conference on Design Automation*, pages 89–94, 2005.
- [40] S.-J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky. An interior-point method for large-scale l1-regularized least squares. *IEEE Journal of Selected Topics in Signal Processing*, 1(4):606–617, 2007.
- [41] K. Lakshmikumar, R. A. Hadaway, and M. Copeland. Characterisation and modeling of mismatch in MOS transistors for precision analog design. *IEEE Journal of Solid-State Circuits*, 21(6):1057–1066, 1986.

- [42] J. D. Lesser and J. J. Shedletsky. An experimental delay test generator for LSI logic. *IEEE Transactions on Computers*, 29(3):235–248, 1980.
- [43] X. Li, J. Le, L. T. Pileggi, and A. Strojwas. Projection-based performance modeling for inter/intra-die variations. In International Conference on Computer-Aided Design, pages 721–727, 2005.
- [44] X. Li, P. Li, and L. T. Pileggi. Parameterized interconnect order reduction with explicit-and-implicit multi-parameter moment matching for inter/intradie variations. In International Conference on Computer-Aided Design, pages 806-812, 2005.
- [45] B. Lin. Methods to print optical images at low k1 factors. SPIE, 1264:2–13, 1990.
- [46] B. Liu. Spatial correlation extraction via random field simulation and production chip performance regression. In Conference on Design, Automation and Test in Europe, pages -, 2008.
- [47] F. Liu. A general framework for spatial correlation modeling in VLSI design. In Conference on Design Automation, pages 817–822, 2007.
- [48] Q. Liu and S. Sapatnekar. Confidence scalable post-silicon statistical delay prediction under process variations. In *Conference on Design Automation*, pages 497–502, 2007.

- [49] X. Lu, Z. Li, W. Qiu, D. M. H. Walker, and W. Shi. Longest path selection for delay test under process variation. In Conference on Asia South Pacific Design Automation: Electronic Design and Solution Fair, pages 98–103, 2004.
- [50] J. Luo, S. Sinha, Q. Su, J. Kawa, and C. Chiang. An IC manufacturing yield model considering intra-die variations. In *Conference on Design Automation*, pages 749–754, 2006.
- [51] H. Mangassarian and M. Anis. On statistical timing analysis with inter- and intra-die variations. In Conference on Design, Automation and Test in Europe, pages 132–137, 2005.
- [52] M. Mani, A. Singh, and M. Orshansky. Joint design-time and post-silicon minimization of parametric yield loss using adjustable robust optimization. In International Conference on Computer-Aided Design, pages 19–26, 2006.
- [53] K. Meng and R. Joseph. Process variation aware cache leakage management. In International Symposium on Low Power Electronics and Design, pages 262–267, 2006.
- [54] A. Mishchenko, S. Chatterjee, and R. Brayton. DAG-aware AIG rewriting: a fresh look at combinational logic synthesis. In *Conference on Design Automation*, pages 532–535, 2006.

- [55] T. Mizuno, J. Okumtura, and A. Toriumi. Experimental study of threshold voltage fluctuation due to statistical variation of channel dopant number in MOSFET's. *IEEE Transactions on Electron Devices*, 41(11):2216–2221, 1994.
- [56] A. Murakami, S. Kajihara, T. Sasao, I. Pomeranz, and S. M. Reddy. Selection of potentially testable path delay faults for test generation. In *IEEE International Test Conference*, page 376, 2000.
- [57] M. Orshansky, L. Milor, P. Chen, K. Keutzer, and C. Hu. Impact of systematic spatial intra-chip gate length variability on performance of high-speed digital circuits. In International Conference on Computer-Aided Design, pages 62-67, 2000.
- [58] A. Ramalingam, G. Nam, A. Singh, M. Orshansky, S. Nassif, and D. Pan. An accurate sparse matrix based framework for statistical static timing analysis. In *International Conference on Computer-Aided Design*, pages 231–236, 2006.
- [59] R. Rao, A. Srivastava, D. Blaauw, and D. Sylvester. Statistical estimation of leakage current considering inter- and intra-die process variation. In International Symposium on Low Power Electronics and Design, pages 84–89, 2003.
- [60] R. Rao, A. Srivastava, D. Blaauw, and D. Sylvester. Statistical analysis of subthreshold leakage current for VLSI circuits. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 12(2):131–139, 2004.

- [61] SeDuMi: self-dual minimization. http://sedumi.mcmaster.ca/, seen in June, 2008.
- [62] D. Shamsi, P. Boufounos, and F. Koushanfar. Noninvasive leakage power tomography of integrated circuits by compressive sensing. In International Symposium on Low Power Electronics and Design, pages -, 2008.
- [63] D. Shamsi, P. Boufounos, and F. Koushanfar. Post-silicon timing characterization by compressed sensing. In International Conference on Computer-Aided Design, pages -, 2008.
- [64] M. Sharma and J. Patel. Bounding circuit delay by testing a very small subset of paths. In *IEEE VLSI Test Symposium*, pages 333–341, 2000.
- [65] M. Sharma and J. Patel. Finding a small set of longest testable paths that cover every gate. In *IEEE International Test Conference*, pages 974–982, 2002.
- [66] J.-B. Shyu, G. Temes, and K. Yao. Random errors in mos capacitors. IEEE Journal of Solid-State Circuits, 17(6):1070–1076, 1982.
- [67] SIS: Synthesis of both synchronous and asynchronous sequential circuits. http://embedded.eecs.berkeley.edu/pubs/downloads/sis/index.htm, seen in June, 2008.

- [68] SPGL1: A solver for sparse reconstruction. http://www.cs.ubc.ca/labs/scl/spgl1/, seen in June, 2008.
- [69] A. Srivastava, D. Sylvester, and D. Blaauw. Statistical optimization of leakage power considering process variations using dual-vth and sizing. In *Conference on Design Automation*, pages 773–778, 2004.

- [70] J. L. Tsai, D. Baik, C.-P. Chen, and K. Saluja. A yield improvement methodology using pre- and post-silicon statistical clock scheduling. In *International Conference on Computer-Aided Design*, pages 611–618, 2004.
- [71] J. Tschanz, J. Kao, S. Narendra, R. Nair, D. Antoniadis, A. Chandrakasan, and V. De. Adaptive body bias for reducing impacts of die-to-die and withindie parameter variations on microprocessor frequency and leakage. pages 1396–1402, 2002.
- [72] E. van den Berg and M. P. Friedlander. Probing the pareto frontier for basis pursuit solutions. *To appear in SIAM J. on Scientific Computing*, 2008.
- [73] M. Vetterli. Wavelets, approximation, and compression. IEEE Signal Processing Magazine, 18(5):59–73, 2001.
- [74] P. Vuillod, L. Benini, and G. D. Micheli. Generalized matching from theory to application. In International Conference on Computer-Aided Design, pages 13–20, 1997.

- [75] R. Wagner, R. Baraniuk, S. Du, D. Johnson, and A. Cohen. An architecture for distributed wavelet analysis and processing in sensor networks. In International Conference on Information Processing in Sensor Networks, pages 243–250, 2006.
- [76] J. Xiong, V. Zolotov, and L. He. Robust extraction of spatial correlation. In International Symposium on Physical Design, pages 2–9, 2006.
- [77] K. Yang, K. Cheng, and L. Wang. TranGen: a SAT-based ATPG for path-oriented transition faults. In Conference on Asia South Pacific Design Automation, pages 92–97, 2004.
- [78] Y. Zhan, A. J. Strojwas, X. Li, L. T. Pileggi, D. Newmark, and M. Sharma. Correlation-aware statistical timing analysis with non-gaussian delay distributions. In *Conference on Design Automation*, pages 77–82, 2005.
- [79] W. Zhao, Y. Cao, F. Liu, K. Agarwal, D. Acharyya, S. Nassif, and K. Nowka. Rigorous extraction of process variations for 65nm CMOS design. In *European Solid State Device Research Conference*, pages 89–92, 2007.