Abstract-An adaptive circuit can perform built-in self-detection of timing variations and adjust itself accordingly to avoid timing violations. Compared with the conventional over-design approach, adaptive circuit design is conceptually advantageous in terms of power efficiency. Although this advantage has been demonstrated in numerous previous works, including test chips, adaptive design is far from being widely used in practice. A key reason is the lack of corresponding timing verification support. We develop new timing analysis techniques to fill this void. A main challenge is the high runtime complexity caused by the large number of adaptivity configurations. We propose several pruning and reduction techniques and apply them in conjunction with statistical static timing analysis (SSTA). The proposed method is validated on benchmark circuits including the recent ISPD'13 suite, which has circuits as large as 150K gates. The results show that our method achieves orders-of-magnitude speedup over Monte Carlo simulation with about the same accuracy. It is also several times faster than an exhaustive application of SSTA.
I. INTRODUCTION
An adaptive circuit uses on-chip sensors to detect timing variations and autonomously compensates for them by body biasing [1], supply voltage tuning [2], circuit reconfiguration [3], etc. Unlike the conventional over-design approach, which applies extra power uniformly (or blindly) across the entire circuit (or all fabricated chips) to cushion variations, adaptive circuits apply power differentially to each block according to its individually observed performance, i.e., in a targeted manner. Adaptive circuit design therefore provides variation resilience in a more power-efficient manner than over-design. This is especially true when the adaptivity is fine-grained (in blocks of hundreds or thousands of gates) and the compensation is therefore relatively precise [2], [3].
Even with benefits demonstrated by test chips [1], adaptive circuit design is far from being widely adopted in real products. A main reason is the lack of corresponding timing verification support. The unconventional and sophisticated nature of adaptive design implies a relatively large risk of design errors, and correct circuit functioning is more essential than potential power savings. As such, an adaptivity-aware timing analysis tool is a fundamental prerequisite for wide application of adaptive circuit designs. One may consider using conventional timing analysis on adaptive circuits with some simple tweaks. Such an approach is either very restrictive or inefficient. For example, conventional timing analysis based on Monte Carlo simulation is directly applicable but very time consuming. Another naïve approach is to apply conventional timing analysis to each adaptivity mode. This approach is practical only if the adaptivity is coarse-grained and the number of adaptivity blocks is small.
In this work, we attempt to find a general and practical approach to timing verification for adaptive circuit designs. Since adaptivity is operated according to observed variations, variations need to be taken into consideration in the timing analysis. Fortunately, statistical static timing analysis (SSTA) has been an active research subject [4], [5], and this research has produced a rich body of techniques. Our study starts with an SSTA-based adaptivity enumeration approach. Then, we explore several pruning and reduction techniques in order to decrease the computation cost. To the best of our knowledge, this is the first systematic effort on timing analysis for adaptive circuit design. The proposed techniques are tested on ISCAS'85 and ISPD'13 benchmark circuits, which include cases as large as 150K gates. Our technique is orders of magnitude faster than Monte Carlo simulation and provides very similar accuracy. Compared to SSTA-based exhaustive enumeration, our method yields almost identical results while being several times faster.
II. PROBLEM FORMULATION
Given a combinational logic circuit design $C$ composed of a set of adaptivity blocks $\{b_1, b_2, \ldots, b_n\}$, timing constraints $T$, and certain delay models, a fundamental objective of timing verification is to determine whether $C$ satisfies $T$. When variations are considered, the objective usually becomes finding the probability that $C$ satisfies $T$, i.e., the timing yield [5].
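To make the notion of timing yield concrete, the following minimal sketch computes it for the simplest possible case: a single critical delay assumed to be Gaussian and a single required arrival time. The function names and the one-path model are illustrative assumptions, not part of the proposed method.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def timing_yield(mean_delay: float, sigma_delay: float, t_required: float) -> float:
    """P(delay <= t_required) for delay ~ N(mean_delay, sigma_delay^2).

    Assumption: the circuit is summarized by one Gaussian critical delay.
    """
    if sigma_delay == 0.0:
        # Degenerate case: a deterministic delay either meets timing or not.
        return 1.0 if mean_delay <= t_required else 0.0
    return normal_cdf((t_required - mean_delay) / sigma_delay)
```

For example, `timing_yield(1.00, 0.05, 1.10)` evaluates the standard normal CDF two standard deviations above the mean.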
If $C$ is an adaptive circuit, it includes variation sensors [6], [7], circuit tuning mechanisms [1], [2], and an adaptivity policy. Without loss of generality, we assume the adaptive tuning is offline, i.e., it is performed at power-on or during circuit idle time between normal operations. The input to an adaptivity policy is an integer vector $x = (x_1, x_2, \ldots, x_m)^T$ resulting from $m$ variation sensors. For example, $x_i = 1$ means that sensor $i$ detects a high risk of timing violation. The set of all possible sensor observation vectors is denoted by $X$. If there are $n$ adaptivity blocks in $C$, the output of adaptivity control is a vector $f = (f_1, f_2, \ldots, f_n)^T$, where $f_i$ specifies the adaptivity configuration for block $b_i$. The value of $f_i$ is an element of a set of adaptivity configurations $\{\phi_0, \phi_1, \ldots, \phi_q\}$, where $\phi_0$ means no adaptivity action. For example, $f_j = \phi_0$ ($f_j = \phi_1$) chooses low (high) VDD for block $b_j$. All elements in the same adaptivity block follow the same adaptivity configuration. The set of all possible adaptivity configuration vectors is represented by $F$. The adaptivity policy can then be described by a function $\Pi : X \to F$. Timing verification for an adaptive circuit is to find the probability that $C$ satisfies $T$ for a given adaptivity policy $\Pi$. In practice, an adaptivity policy is typically a simple rule-based scheme [1]-[3] in order to avoid large implementation overhead. Our techniques can handle different adaptivity policies by querying them as black-box routines.
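The idea of a policy as a black-box map from sensor observations to block configurations can be sketched as follows. The specific rule (each sensor drives a fixed list of blocks, and a block takes the strongest level requested by any of its sensors) and the name `make_rule_policy` are hypothetical; the source only states that policies are typically simple and rule-based.

```python
from typing import Callable, List, Sequence, Tuple

SensorVector = Tuple[int, ...]  # x = (x_1, ..., x_m)
ConfigVector = Tuple[int, ...]  # f = (f_1, ..., f_n); entry j stands for phi_j


def make_rule_policy(
    sensor_to_blocks: Sequence[Sequence[int]], n_blocks: int
) -> Callable[[SensorVector], ConfigVector]:
    """Build a hypothetical rule-based policy Pi: X -> F.

    Assumption: sensor level k requests configuration phi_k for every
    block listed under that sensor; 0 means no adaptivity action (phi_0).
    """

    def policy(x: SensorVector) -> ConfigVector:
        f: List[int] = [0] * n_blocks
        for sensor, level in enumerate(x):
            for block in sensor_to_blocks[sensor]:
                f[block] = max(f[block], level)
        return tuple(f)

    return policy
```

An analysis routine only needs to call `policy(x)`; it never inspects the rule itself, which is what allows different policies to be swapped in.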
III. BASIC APPROACHES
When variations are considered, each component delay becomes a random variable. In this work, we assume the random variables follow a Gaussian distribution, which is a reasonable approximation [5]. We consider spatial correlations among variations using Principal Component Analysis (PCA) as in [4].
A naïve approach is Monte Carlo simulation, where each run emulates one chip instance, including its adaptivity actions, and timing analysis is performed for this instance. To obtain a high statistical confidence level, the number of runs is typically very large. Monte Carlo simulation is used as a baseline for accuracy and runtime comparison in this work.
We now discuss an adaptivity scenario enumeration approach. In this approach, we perform SSTA on a circuit for each adaptivity scenario individually and then assemble the results into the overall timing yield. If there are $m$ variation sensors and each sensor has up to $p$ levels of output, there are at most $|X| = p^m$ variation observation scenarios. If there are $n$ adaptivity blocks and each block has up to $q$ configuration options, there are at most $|F| = q^n$ configuration scenarios. Since the policy maps each observation to one configuration, there are at most $\min(|X|, |F|)$ adaptivity scenarios.
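The scenario bounds above can be checked with a few lines of code. This is only an illustrative sketch: `scenario_bounds` and `reachable_configurations` are hypothetical helper names, and the policy is assumed to be a plain Python callable on tuples of sensor levels.

```python
from itertools import product
from typing import Callable, Set, Tuple


def scenario_bounds(m: int, p: int, n: int, q: int) -> Tuple[int, int, int]:
    """Return (|X|, |F|, min(|X|, |F|)) for m sensors with p levels each
    and n blocks with q configuration options each."""
    num_observations = p ** m
    num_configurations = q ** n
    return num_observations, num_configurations, min(num_observations, num_configurations)


def reachable_configurations(
    policy: Callable[[Tuple[int, ...]], Tuple[int, ...]], m: int, p: int
) -> Set[Tuple[int, ...]]:
    """Enumerate every sensor observation and collect the distinct
    configurations the black-box policy actually produces."""
    return {policy(x) for x in product(range(p), repeat=m)}
```

With 3 binary sensors and 10 two-option blocks, `scenario_bounds(3, 2, 10, 2)` gives 8 observations against 1024 configurations, so at most 8 scenarios need SSTA.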
SSTA is first conducted for the circuit without any adaptivity actions. A probability density function (PDF) of timing slack is then obtained for each node of the circuit, including all sensor nodes. From these PDFs, one can estimate the probability $P(x_k)$ of each sensor observation $x_k$, which is also the probability of the corresponding adaptivity scenario.
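As a rough illustration of how observation probabilities could be derived from slack PDFs, the sketch below assumes binary sensors that raise an alarm when the Gaussian slack at their node falls below a threshold. It also assumes the sensors are statistically independent, which is a simplification made only for this example; the analysis in this paper accounts for spatial correlation through PCA. All names are hypothetical.

```python
import math
from itertools import product
from typing import Dict, Sequence, Tuple


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sensor_alarm_probability(slack_mean: float, slack_sigma: float, threshold: float = 0.0) -> float:
    """P(slack < threshold) for slack ~ N(slack_mean, slack_sigma^2)."""
    if slack_sigma == 0.0:
        return 1.0 if slack_mean < threshold else 0.0
    return normal_cdf((threshold - slack_mean) / slack_sigma)


def observation_probabilities(
    sensors: Sequence[Tuple[float, float]], threshold: float = 0.0
) -> Dict[Tuple[int, ...], float]:
    """Map each binary observation vector x to P(x).

    Assumption: sensors are independent, so P(x) is a product of
    per-sensor alarm / no-alarm probabilities.
    """
    alarm = [sensor_alarm_probability(mu, sigma, threshold) for mu, sigma in sensors]
    probs: Dict[Tuple[int, ...], float] = {}
    for x in product((0, 1), repeat=len(sensors)):
        p = 1.0
        for xi, a in zip(x, alarm):
            p *= a if xi == 1 else 1.0 - a
        probs[x] = p
    return probs
```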
By running SSTA on each scenario, we can obtain the yield $Y(f_k)$ for the configuration $f_k = \Pi(x_k)$ corresponding to observation $x_k$. Then, the overall circuit timing yield can be estimated by
$$Y = \sum_{x_k \in X} P(x_k) \, Y(f_k).$$
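A minimal sketch of this assembly step, assuming the overall yield is the probability-weighted sum of per-scenario yields (the law of total probability over mutually exclusive observations). The dictionary-based interfaces and the name `overall_yield` are illustrative assumptions.

```python
from typing import Callable, Dict, Tuple

Vector = Tuple[int, ...]


def overall_yield(
    observation_prob: Dict[Vector, float],
    policy: Callable[[Vector], Vector],
    config_yield: Dict[Vector, float],
) -> float:
    """Sum P(x_k) * Y(f_k) over all observations, with f_k = policy(x_k).

    observation_prob: P(x_k) for every observation x_k
    config_yield:     SSTA yield Y(f) for every configuration the policy can emit
    """
    total = 0.0
    for x, p in observation_prob.items():
        total += p * config_yield[policy(x)]
    return total
```

Because several observations may map to the same configuration, the per-configuration SSTA results can be cached and reused across terms of the sum.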
IV. PRUNING AND REDUCTION TECHNIQUES

A. Adaptivity Configuration Pruning
Suppose circuit $C$ has $n$ adaptivity blocks such that all gates in the same block follow the same adaptivity actions. We use $f(b_i)_j = \phi_j$ to represent adaptivity configuration $j$ for block 
B. Circuit Reduction
Circuits can be reduced from the timing point of view. For example, some serial/parallel timing arcs can be merged [8]. In addition, timing arcs that are never critical, even under variations, can be neglected.
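One common way to merge Gaussian timing arcs, shown here only as a sketch and not necessarily the procedure of [8], is to add means and variances for serial arcs and to use Clark's moment-matching approximation for the maximum of two parallel arcs. Serial arcs are assumed independent in this example; `rho` is the assumed correlation between the two parallel arcs.

```python
import math
from typing import Tuple

Arc = Tuple[float, float]  # (mean delay, standard deviation)


def _pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def merge_serial(a: Arc, b: Arc) -> Arc:
    """Delay of two arcs in series: sum of independent Gaussians."""
    return (a[0] + b[0], math.hypot(a[1], b[1]))


def merge_parallel(a: Arc, b: Arc, rho: float = 0.0) -> Arc:
    """Delay of two parallel arcs: max of two Gaussians, re-approximated
    as a Gaussian by matching the first two moments (Clark's formulas)."""
    (m1, s1), (m2, s2) = a, b
    theta = math.sqrt(max(s1 * s1 + s2 * s2 - 2.0 * rho * s1 * s2, 0.0))
    if theta == 0.0:
        # The difference of the two delays is deterministic.
        return (m1, s1) if m1 >= m2 else (m2, s2)
    alpha = (m1 - m2) / theta
    p1, p2, d = _cdf(alpha), _cdf(-alpha), _pdf(alpha)
    mean = m1 * p1 + m2 * p2 + theta * d
    second = (m1 * m1 + s1 * s1) * p1 + (m2 * m2 + s2 * s2) * p2 + (m1 + m2) * theta * d
    return (mean, math.sqrt(max(second - mean * mean, 0.0)))
```

When one parallel arc is far slower than the other, `merge_parallel` returns essentially the slower arc, which matches the intuition that never-critical arcs can be dropped.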
C. Circuit Partitioning
The timing analysis complexity can be reduced if adaptivity blocks are independent of each other. If a circuit is composed of $n$ parallel and almost independent adaptivity blocks, the adaptivity configuration of a block does not need to be considered in conjunction with those of other blocks. If each block has $q$ configurations, we only need to examine $q \cdot n$ scenarios. By contrast, a full enumeration in the general case needs to check $q^n$ scenarios.
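The saving from partitioning can be illustrated by counting scenarios. The helper below generalizes the two extremes in the text to an arbitrary list of partition sizes; that generalization and the function name are assumptions made for illustration.

```python
from typing import Sequence


def partitioned_scenario_count(q: int, partition_sizes: Sequence[int]) -> int:
    """Number of configurations to examine when each partition is
    enumerated on its own: the sum of q**size over all partitions.

    n singleton partitions give q * n; one partition of n blocks gives q**n.
    """
    return sum(q ** size for size in partition_sizes)
```

For q = 3 and 10 blocks, fully independent blocks need `partitioned_scenario_count(3, [1] * 10)` = 30 scenarios, whereas a single undivided circuit needs `partitioned_scenario_count(3, [10])` = 59049.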
D. Block Merging
Some blocks can be merged into a virtual block in the adaptivity enumeration without affecting accuracy. This is based on the following concepts and observations. 
V. OVERALL TIMING ANALYSIS ALGORITHM
Our timing analysis algorithm starts from the exhaustive adaptivity enumeration described in Section III and is enhanced by the pruning and reduction techniques introduced in Section IV. The pseudocode of our timing analysis is shown in Algorithm 1. At the very beginning, we set the circuit to the no-adaptivity configuration $\phi_0$ and perform SSTA. From the result, we can estimate the probability of each adaptivity configuration. In addition, the SSTA result supports the circuit reduction (step 3). Next, we partition the circuit according to Section IV-C. Steps 5-29 describe how to compute the timing yield for each partition.
For each partition, we first merge blocks according to Section IV-D. A set $R_i$ keeps track of all robust adaptivity configurations for this partition (step 7). Starting from all blocks at $\phi_0$, we enumerate different adaptivity configurations with an increasing number of changing blocks (steps 8, 9, and 10). For example, if there are 4 blocks, we first examine configurations with only 1 block having adaptivity beyond $\phi_0$ while all the other blocks remain at $\phi_0$. There are $C_4^1 = 4$ such choices of the changing block. Next, we examine the $C_4^2 = 6$ combinations of 2 blocks with adaptivity beyond $\phi_0$, and so on. For each combination of changing blocks, we enumerate all adaptivity configurations in a non-descending order of dominance. In steps 13 and 14, we apply a change while keeping the other blocks at $\phi_0$. If the new configuration dominates any configuration in $R_i$ or its probability is less than a threshold $\delta$, we simply skip it (steps 16 and 17). Otherwise, SSTA is performed for this configuration. If this configuration is robust, it is added to $R_i$ (steps 21 and 22). At step 28, the timing yield of a partition is obtained from the yield and the probability of each configuration.
Step 30 returns the final timing yield of the entire circuit.
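The per-partition enumeration loop described above can be sketched as follows. Since the formal definitions are not reproduced here, this sketch makes several loudly labeled assumptions: configuration `a` dominates `b` when it uses an equal or higher configuration index in every block; a configuration is "robust" when its SSTA yield reaches a chosen `robust_yield`; and a skipped dominating configuration is credited with a yield of 1. `ssta_yield` and `config_prob` are hypothetical callbacks standing in for the SSTA engine and the probability estimates.

```python
from itertools import combinations, product
from typing import Callable, List, Tuple

Config = Tuple[int, ...]  # entry i is the index j of phi_j chosen for block i


def dominates(a: Config, b: Config) -> bool:
    """Assumed dominance: a compensates at least as strongly as b everywhere."""
    return all(ai >= bi for ai, bi in zip(a, b))


def analyze_partition(
    n_blocks: int,
    q: int,  # highest configuration index, i.e. options phi_0 .. phi_q
    config_prob: Callable[[Config], float],
    ssta_yield: Callable[[Config], float],
    delta: float = 1e-4,
    robust_yield: float = 0.9999,  # assumed robustness criterion
) -> Tuple[float, int]:
    """Return (estimated partition yield, number of SSTA runs)."""
    base: Config = (0,) * n_blocks
    robust: List[Config] = []
    total_yield = 0.0
    ssta_calls = 0

    def run_ssta(cfg: Config, prob: float) -> None:
        nonlocal total_yield, ssta_calls
        y = ssta_yield(cfg)
        ssta_calls += 1
        if y >= robust_yield:
            robust.append(cfg)
        total_yield += prob * y

    run_ssta(base, config_prob(base))  # SSTA with no adaptivity
    for k in range(1, n_blocks + 1):  # increasing number of changing blocks
        for changed in combinations(range(n_blocks), k):
            # Lexicographic order never visits a configuration before one it dominates.
            for levels in product(range(1, q + 1), repeat=k):
                cfg_list = list(base)
                for block, level in zip(changed, levels):
                    cfg_list[block] = level
                cfg = tuple(cfg_list)
                prob = config_prob(cfg)
                if prob < delta:
                    continue  # negligible probability: skip
                if any(dominates(cfg, r) for r in robust):
                    total_yield += prob  # assumption: treated as yield 1
                    continue
                run_ssta(cfg, prob)
    return total_yield, ssta_calls
```

The returned SSTA count makes the effect of pruning visible: every skipped configuration is one SSTA run that exhaustive enumeration would have performed.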
Input : Circuit C in n blocks B = {b 1 
Algorithm 1: Timing analysis for adaptive circuit design.
VI. EXPERIMENT
We evaluate the effectiveness of our approach by experimental comparisons on public benchmark circuits. Since there is no previous work on timing analysis for adaptive circuits, to the best of our knowledge, we compare the following approaches.
• Monte Carlo simulation. Spatial correlation among variations is considered. We use Monte Carlo simulation as a baseline for evaluating the accuracy and runtime of our approach.
• Exhaustive SSTA. SSTA is performed for every adaptivity configuration (Section III).
• Ours. Adaptivity configurations are enumerated with pruning/reduction, and evaluated by SSTA (Algorithm 1).
The experiments are performed on ISCAS'85 and ISPD'13 [9] benchmark circuits, with the largest case having about 150K gates. We use the Elmore delay model, and gate RC values are obtained from the ISPD'13 benchmarks. The circuits are placed with the Feng Shui placer [10], and the spatial correlation model is extracted from the resulting placement. We consider process variations including gate length variation, whose standard deviation σ is 5% of the nominal value, and gate width variation, with σ being 2.7% of the nominal value. Monte Carlo simulations are run for 10K and 20K iterations for small and large circuits, respectively. The probability threshold δ for neglecting an adaptivity configuration is 0.0001. All of the methods are implemented in C++, and the programs run on a Linux server with four 2.2 GHz AMD Opteron processors and 256 GB of memory.
The experimental results for ISCAS'85 circuits are shown in Tables I and II, with tight and loose timing constraints, respectively. In both cases, the error of our method is about 0.9% compared with Monte Carlo results. Speedups of 32× and 64× are achieved for tight and loose timing constraints, respectively. Greater speedup is obtained for cases with loose constraints, as they have fewer timing-critical paths and allow more pruning and reduction. The speedup is not uniform among different circuits, as the pruning and reduction depend heavily on circuit structure. Compared with the exhaustive SSTA, our approach is several times faster, and the difference between the two results is one order of magnitude smaller than our error with respect to the Monte Carlo method. This is because our method is derived from the exhaustive approach.
Results for ISPD'13 circuits are in Tables III and IV. Again, the two tables are for different timing constraints. The speedup compared to Monte Carlo is hundreds and sometimes thousands of times. The error is greater, but still less than 4% most of the time. Although the exhaustive SSTA approach is usually faster than Monte Carlo, its runtime scales poorly with circuit size, and it cannot finish within three hours for a large case.
VII. CONCLUSIONS
Adaptive circuit design is a promising approach to handling variations with high power efficiency. To the best of our knowledge, this work proposes the first systematic timing verification method to assist adaptive design. Since an adaptive circuit may have many configurations, analyzing all cases can be very time consuming. We propose a set of pruning and reduction techniques. These techniques are validated on benchmark circuits of up to 150K gates. The results show that our method provides orders-of-magnitude speedup compared to Monte Carlo simulation with very small errors.
