Abstract: Negative bias temperature instability (NBTI) in pMOS transistors has become a major reliability concern in present-day digital circuit design. Further, with the recent introduction of Hf-based high-k dielectrics for gate leakage reduction, positive bias temperature instability (PBTI), the dual effect in nMOS transistors, has also reached significant levels. Consequently, designs must build in substantial guardbands to guarantee reliable operation over the lifetime of a chip, and these guardbands incur large area and power overheads. In this paper, we begin by proposing the use of adaptive body bias (ABB) and adaptive supply voltage (ASV) to maintain optimal performance of an aged circuit, and demonstrate their advantages over a guardbanding technique such as synthesis. We then present a hybrid approach, utilizing the merits of both ABB and synthesis, to ensure that the resultant circuit meets the performance constraints over its lifetime and has a minimal area and power overhead, as compared with a nominally designed circuit.
I. INTRODUCTION
NEGATIVE BIAS temperature instability (NBTI) in pMOS devices has become a major reliability issue in sub-130-nm technologies. NBTI manifests itself as a temporal increase in the threshold voltage, $V_{th}$, of a pMOS transistor, thereby causing circuit delays to degrade over time and exceed their specifications. A corresponding dual effect, known as positive bias temperature instability (PBTI) [1]-[3], is seen in nMOS devices when a positive bias stress is applied across the gate oxide. Although the impact of PBTI is lower than that of NBTI [2], it is becoming increasingly important in its own right, particularly with the use of Hf-based high-k gate oxides for leakage reduction [1], [3]. Collectively, NBTI and PBTI are referred to as bias temperature instability (BTI) effects.
Previous approaches to guardbanding a circuit and ensuring optimal performance over its lifetime, such as sizing [4], [5] and synthesis [6], can be classified as "one-time" solutions that solve an optimization problem of the form

$$\text{Minimize} \;\; \alpha \cdot \text{Area} + \beta \cdot \text{Power} \quad \text{s.t.} \quad D(t) \le D_{spec}, \;\; 0 \le t \le t_{life} \qquad (1)$$

where $\alpha$ and $\beta$ are weights associated with the area and the power objectives, respectively, while $D_{spec}$ is the specified target delay that must be met at all times, up to the lifetime of the circuit, $t_{life}$. Under the framework of (1), both the synthesis and sizing optimizations lead to an increase in area and power, as compared with a nominally designed circuit that is built to meet the specification only at birth, and not necessarily over its entire life. The work in [6] argues that synthesis can lead to area and power savings, as compared with sizing optimizations. However, guardbanding (through sizing or synthesis) is performed at design time, and is a one-time fixed amount of padding added into the circuit in the form of gates with a higher drive strength. Inevitably, this results in large positive slacks during the initial stages of operation of the circuit, and therefore in larger-than-necessary area and power overheads, in comparison with a circuit designed to exactly meet the specifications throughout its lifetime.
We also note that while BTI effects cause the transistor threshold voltages to increase, resulting in larger delays, a higher $V_{th}$ also implies lower subthreshold leakage. Therefore, both NBTI and PBTI cause the leakage of the circuit to decrease with time, providing the opportunity to trade off this slack in leakage to restore the lost performance. Adaptive body bias (ABB) [7] provides an attractive solution to explore leakage-performance tradeoffs. Forward body bias (FBB) can be used to speed up a circuit [8] by reducing $V_{th}$, thereby using up the available slack in the leakage budget. Further, the amount of FBB can be determined adaptively, based on the exact temporal degradation of the circuit, and the requisite amount of body bias can be applied to exactly meet the target specifications under all conditions.
The main advantage of a body bias scheme is that the performance can be recovered with a minimal increase in the area overhead as compared with "one-time" approaches such as sizing and synthesis. While [9] demonstrated that ABB could be used to allow the circuit to recover from voltage and temperature variations as well as aging, we believe our work is the first solution to take advantage of the reduction in leakage due to bias temperature instability (BTI). We demonstrate how ABB can be used to maintain the performance of the circuit over its lifetime, by determining the appropriate pMOS and nMOS body bias values (and supply voltages) at all times. We use a lookup table whose entries consist of the optimal body bias and supply voltages, indexed by the cumulative time of BTI stress on the circuit. Accordingly, we first propose an optimization algorithm to compute the entries of the lookup table, such that the delay specifications of the circuit are met throughout its lifetime and the power overhead is minimized. In contrast with the significant area cost for the synthesis-based method, the area overhead using this approach is limited to the lookup tables, body bias generation, and body bias routing networks and associated control circuitry, and is therefore minimal, while the power overhead is similar to that incurred by synthesis. Thus, we show that the adaptive compensation of circuit delay degradation due to aging provides a viable alternative to "one-time" fix techniques such as BTI-aware synthesis.
In the second approach, we propose an alternative hybrid formulation that combines adaptive techniques with synthesis. This iterative method first performs a power-constrained delay minimization through the application of FBB. This optimization recovers some amount of the performance degradation caused by aging by using the power slack that is created as the circuit ages. However, since this power-constrained optimization is not guaranteed to meet the delay specifications, technology mapping is used next to resynthesize the circuit to meet tighter timing specifications at birth. Using a new power specification, the iteration continues through alternate steps of FBB optimization and resynthesis until the timing specification is met. Our simulation results indicate that, by combining the merits of the adaptive and synthesis-based approaches, the resulting circuit meets the performance constraints at all times, with only a minimal expense in area and power.

This paper is organized as follows. Section II outlines the impact of BTI on the delay and leakage of circuits, motivating a scheme for ensuring reliable operation. The effectiveness of FBB in maintaining optimal performance subject to power constraints is explored, and two optimization schemes are outlined, considering some combination of adaptive body bias (ABB), adaptive supply voltage (ASV), and resynthesis. Section III focuses on the control system implementation, while algorithms for circuit optimization are presented in Sections IV and V. Simulation results are presented in Section VI, where we compare the area, delay, and power numbers, as a function of time, for the various approaches, followed by concluding remarks in Section VII.
II. BACKGROUND AND PROBLEM FORMULATION
We begin this section by determining the impact of NBTI on the delay and leakage of digital circuits. We then explore the potential of FBB to achieve power-performance tradeoffs, and accordingly formulate an optimization problem.
A. Impact of BTI on Delay and Leakage
At the transistor level, the reaction-diffusion (R-D) framework [10], [11] has been widely used to determine the long-term impact of NBTI on the threshold voltage degradation of a pMOS device. Accordingly, the $V_{th}$ degradation of a pMOS transistor under dc stress increases asymptotically with time, $t$, as $t^{1/6}$ [12]-[15]. We also use a PBTI model in which the degradation mechanism is similar to NBTI, but the magnitude of degradation is lower. Specifically, in our simulations, the $\Delta V_{th}$ of a pMOS device after $10^8$ seconds (≈3 years) of dc stress is 50 mV, while that of an nMOS device is 30 mV. The corresponding nominal threshold voltages, based on PTM 45-nm model files [16], are 411.8 mV for a pMOS device and 466 mV for an nMOS device. The numbers for maximal pMOS transistor degradation are computed using our multicycle model, which is described in detail in [15], for a 45-nm device with an oxide thickness of 1.2 nm. Our model in [15] computes the number of interface traps in a normalized manner, and is independent of the actual process parameters. Process parameters used in this work are taken from [11], [14], [17], and [18]. Since NBTI affects the $V_{th}$ of pMOS devices, it alters the rising delay of a gate. Similarly, PBTI, which affects nMOS transistors, changes the falling delay of a gate.
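The asymptotic power law can be sketched in a few lines of Python. This sketch assumes the $t^{1/6}$ exponent typical of R-D models with H$_2$ diffusion and calibrates the prefactor to the 50 mV (pMOS) and 30 mV (nMOS) shifts after $10^8$ s quoted above. The function name `delta_vth` and the single-point calibration are illustrative assumptions. This is not the multicycle model of [15].

```python
# Sketch: Delta_Vth(t) = A * t**n with n = 1/6 (assumed R-D exponent),
# calibrated so that Delta_Vth(1e8 s) matches the values quoted in the text.
T_REF = 1e8  # s of dc stress (about 3 years)
DVTH_REF = {"pmos": 0.050, "nmos": 0.030}  # V at T_REF


def delta_vth(t: float, device: str, n: float = 1.0 / 6.0) -> float:
    """Threshold-voltage shift (V) after t seconds of dc stress."""
    if t < 0:
        raise ValueError("stress time must be non-negative")
    return DVTH_REF[device] * (t / T_REF) ** n


if __name__ == "__main__":
    for t in (1e4, 1e6, 1e8):
        print(f"t = {t:.0e} s: pMOS {1e3 * delta_vth(t, 'pmos'):.1f} mV, "
              f"nMOS {1e3 * delta_vth(t, 'nmos'):.1f} mV")
```

The prefactor cancels in the ratio $\Delta V_{th}(t_2)/\Delta V_{th}(t_1) = (t_2/t_1)^{1/6}$. Under this assumption, the relative growth between two time points is therefore the same for nMOS and pMOS devices.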
At the gate level, we derive models for the delay and the leakage as functions of the transistor threshold voltages. We assume the worst-case degradation model [4] for all gates in the circuit, for reasons that will become apparent in Section III. The delay and leakage numbers for the degraded circuit are computed through SPICE simulations at 105 °C, at different times. Since BTI is enhanced with temperature, the library gates are characterized at the maximum operating temperature of the chip, assumed to be 105 °C. The results from these SPICE simulations are curve-fitted to obtain models for the delay and leakage as functions of the transistor threshold voltages. The gate delay is modeled as

$$d = d_0 + \sum_i \frac{\partial d}{\partial V_{th,i}} \, \Delta V_{th,i} \qquad (2)$$

where $d_0$ is the nominal gate delay, and the sensitivity terms $\partial d / \partial V_{th,i}$ for each transistor $i$ in the gate, along the input-output path, are determined through a linear least-squares curve fit. This first-order sensitivity-based model is accurate, with an average error of 1% in comparison with the simulation results, within the ranges of degradation caused by BTI. Similarly, a model for the leakage, $P_{leak}$, is developed as a function of the transistor threshold voltages in (3). Note that the nominal values and sensitivities in (2) and (3) are functions of the supply voltage $V_{dd}$. The leakage model is experimentally verified to have an average error of 5% with respect to the SPICE-simulated values.
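Fitting (2) is an ordinary linear least-squares problem. The Python sketch below uses NumPy's `numpy.linalg.lstsq` to fit the nominal delay and the per-transistor sensitivities. The "SPICE" samples are synthetic, noise-free stand-ins generated from hypothetical coefficients (`d0_true`, `k_true`), so the fit should recover those coefficients.

```python
import numpy as np


def fit_delay_model(dvth: np.ndarray, delay: np.ndarray):
    """Least-squares fit of d = d0 + sum_i k_i * dVth_i, as in (2).

    dvth:  (samples, transistors) matrix of threshold-voltage shifts (V)
    delay: (samples,) vector of simulated gate delays (ps)
    Returns (d0, k), with k the per-transistor sensitivities (ps/V).
    """
    A = np.column_stack([np.ones(len(delay)), dvth])  # columns: 1, dVth_1, dVth_2, ...
    coef, *_ = np.linalg.lstsq(A, delay, rcond=None)
    return coef[0], coef[1:]


# Synthetic stand-in for SPICE data (hypothetical values, noise-free).
rng = np.random.default_rng(0)
d0_true, k_true = 25.0, np.array([120.0, 80.0])  # ps, ps/V (pMOS, nMOS)
dv = np.column_stack([rng.uniform(0.0, 0.05, 40),   # pMOS shifts: 0-50 mV
                      rng.uniform(0.0, 0.03, 40)])  # nMOS shifts: 0-30 mV
delay = d0_true + dv @ k_true

d0_fit, k_fit = fit_delay_model(dv, delay)
rel_err = np.mean(np.abs((d0_fit + dv @ k_fit) - delay) / delay)
```

With real SPICE data, the residual `rel_err` is the quantity that would be compared against the 1% average error reported above.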
At the circuit level, Fig. 1 shows the impact of BTI on the delay and leakage of the LGSYNTH93 benchmark "des" as a function of time. The delay and leakage of the uncompensated circuit at $t = 0$ are shown by flat dotted lines on each plot. The results indicate that the delay degrades by around 14%, whereas the subthreshold leakage decreases by around 50%, after three years of operation. We ignore the contribution of gate leakage current here, since neither BTI nor FBB affects gate leakage. Further, with the use of high-k dielectrics, gate leakage has been reduced by several orders of magnitude, making it negligible in comparison with the subthreshold and junction leakages.
B. Recovery in Performance Using FBB
At first glance, one may imagine that FBB could be used to fully recover any degradation in the pMOS/nMOS transistor threshold voltage by returning it to its original value. This would bring the $I_{on}$ and $I_{off}$ values of the device back to their original levels, thereby completely restoring its performance and leakage characteristics, as depicted in Fig. 2(b). The drain current of a pMOS transistor is plotted in Fig. 2(a) for two distinct cases, on two different scales: when $|V_{gs}| = V_{dd}$ ($I_{on}$) and when $V_{gs} = 0$ ($I_{off}$). Fig. 2(a) plots these currents as a function of the pMOS $|V_{th}|$, showing the reduction in the currents due to aging. Fig. 2(b) shows the increase in the on and off currents with the amount of forward body bias applied, computed when the transistor is maximally aged. Applying an FBB of 0.32 V effectively sets $V_{th}$ to $V_{th0}$, and hence $I_{on} = I_{on0}$ and $I_{off} = I_{off0}$, where $V_{th0}$, $I_{on0}$, and $I_{off0}$ are the nominal values. The changes in junction capacitance and subthreshold slope are assumed to be negligible within the range of FBB voltages considered in this framework, based on the results in [8] and [19], respectively.
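A rough calculation shows why an FBB of a few tenths of a volt is needed. It uses the standard long-channel body-effect expression $|V_{th}| = |V_{th0}| + \gamma\left(\sqrt{2\phi_F - V_{fbb}} - \sqrt{2\phi_F}\right)$. This expression is an assumption here, since the text does not state its body-effect model. Solving for the bias that exactly cancels an aging shift gives $V_{fbb} = 2\phi_F - \left(\sqrt{2\phi_F} - \Delta V_{th}/\gamma\right)^2$. The Python sketch below evaluates this with hypothetical values of `GAMMA` and `TWO_PHI_F`, so it illustrates the magnitude rather than reproducing the 0.32 V quoted above.

```python
import math

GAMMA = 0.2      # body-effect coefficient, V**0.5 (hypothetical)
TWO_PHI_F = 0.8  # 2 * Fermi potential, V (hypothetical)


def vth_reduction(v_fbb: float) -> float:
    """Decrease in |Vth| (V) caused by a forward body bias v_fbb (V)."""
    if not 0.0 <= v_fbb < TWO_PHI_F:
        raise ValueError("FBB must lie in [0, 2*phi_F)")
    return GAMMA * (math.sqrt(TWO_PHI_F) - math.sqrt(TWO_PHI_F - v_fbb))


def fbb_to_cancel(delta_vth: float) -> float:
    """FBB (V) whose |Vth| reduction exactly offsets an aging shift delta_vth."""
    root = math.sqrt(TWO_PHI_F) - delta_vth / GAMMA
    if root <= 0.0:
        raise ValueError("shift too large to cancel with FBB alone")
    return TWO_PHI_F - root ** 2


if __name__ == "__main__":
    v = fbb_to_cancel(0.050)  # 50 mV pMOS shift after ~3 years
    print(f"FBB needed: {v:.3f} V")
```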
However, on closer examination, it is apparent that this is not the case, due to the effect of the substrate junction leakage. The results of applying FBB to a temporally degraded inverter (after three years of constant continuous stress on all the transistors) are shown in Fig. 3.

Fig. 3(a) shows the average delay of the inverter, measured as $(t_r + t_f)/2$, where $t_r$ and $t_f$ are the output rise and fall times, respectively, plotted against the body bias voltage $V_{bs}$. Here, we apply an equal $V_{bs}$ to all devices. The value of the delay at zero body bias represents the delay of the aged circuit. The horizontal dotted line represents the delay specification, and after three years of maximal aging, the circuit clearly violates this requirement. At this point, the application of a $V_{bs}$ of 0.3 V can restore the delay of the inverter to its original value.

Fig. 3(b) plots the corresponding total leakage power, consisting of the sum of the subthreshold leakage and the substrate junction leakage, under maximal degradation of both the nMOS and the pMOS transistors. The leakage computed at $t = 0$, i.e., with $\Delta V_{th} = 0$, is shown by the horizontal dotted line and is chosen as the leakage budget. After three years of aging, the leakage value (shown at zero body bias) falls below this budget. The figure shows that as $V_{bs}$ is applied, the leakage rises and exceeds the budget at around 0.2 V. This sharp increase beyond a certain point is caused by the exponential increase in substrate junction leakage with forward body bias. Fig. 3(c) shows this by plotting the individual components of leakage power, namely the subthreshold and junction leakages of the nMOS and pMOS devices.
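The budget-limited behavior of Fig. 3(b) can be reproduced qualitatively with two textbook leakage terms. The first is a subthreshold current that is exponential in the net $V_{th}$ change. The second is a forward-biased junction (diode) current. The Python sketch below scans FBB on the 50 mV grid used in Section VI and returns the largest bias whose total leakage stays within the $t = 0$ budget. All device parameters (`N_SS`, `BODY_COEF`, `I_SUB0`, `I_J0`) are hypothetical, normalized values chosen only to show the crossover.

```python
import math

VT = 8.617333262e-5 * (105.0 + 273.15)  # kT/q at 105 C, about 32.6 mV

# Hypothetical, normalized device parameters (illustration only).
N_SS = 1.5        # subthreshold slope factor
BODY_COEF = 0.15  # linearized |Vth| reduction per volt of FBB
I_SUB0 = 1.0      # subthreshold leakage of the unaged device at zero bias
I_J0 = 1e-4       # junction saturation current, relative to I_SUB0


def total_leakage(v_fbb: float, delta_vth: float) -> float:
    # Subthreshold term: grows as FBB lowers |Vth|, shrinks as aging raises it.
    i_sub = I_SUB0 * math.exp((BODY_COEF * v_fbb - delta_vth) / (N_SS * VT))
    # Junction term: forward-biased source/drain-to-body diode.
    i_junc = I_J0 * (math.exp(v_fbb / VT) - 1.0)
    return i_sub + i_junc


def max_fbb_within_budget(delta_vth, budget, step=0.05, v_max=0.5):
    """Largest FBB on a `step` grid whose total leakage stays <= budget."""
    best = 0.0
    for k in range(int(round(v_max / step)) + 1):
        v = round(k * step, 6)
        if total_leakage(v, delta_vth) > budget:
            break  # both leakage terms increase monotonically with v
        best = v
    return best


budget = total_leakage(0.0, 0.0)              # leakage at t = 0, no bias
v_best = max_fbb_within_budget(0.05, budget)  # aged device, 50 mV shift
```

With these numbers the aged device starts well below budget. The exponential junction term then pushes the total over budget between 0.25 V and 0.30 V, before the subthreshold term alone has returned to its nominal value.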
From Fig. 3(a) and (b), it can be inferred that a complete recovery from the delay degradation of the circuit could cause the leakage current to exceed its nominal value. Simulation results indicate that our benchmark circuits require FBB on the order of 0.3-0.4 V. This leads to a large increase in power dissipation, which can potentially exceed the available budget.
In other words, using ABB (FBB) alone to fully restore the performance of the circuit results in a substantial power overhead. This is especially true as the circuit approaches the end of its lifetime, where large values of FBB are necessary. The use of ASV in combination with ABB has been demonstrated to be more effective than using ABB alone [20]. Hence, we propose our first method, termed the "adaptive approach," which applies ASV in conjunction with ABB to minimize the total power overhead while meeting the delay constraints throughout the circuit lifetime.
As we will see from the results in Section VI, while the adaptive approach provides area savings in comparison with the synthesis approach, its maximal power dissipation overhead is significant. Although ASV, when used in combination with ABB, tempers the exponential increase in junction leakages with FBB, the corresponding increases in $V_{dd}$ cause the subthreshold leakage to increase exponentially, while the active power increases quadratically. Further, the amount of threshold voltage degradation has a second-order dependence on the supply voltage, with a larger $V_{dd}$ leading to a higher $\Delta V_{th}$ [21]. Our second approach further reduces the power dissipation by combining the merits of the adaptive and synthesis approaches, thereby trading off area with power. In particular, it supplements the use of ABB with synthesis, instead of ASV as in the adaptive approach, yielding improved tradeoffs. We refer to this as the "hybrid approach."
C. Adaptive Approach
Under the adaptive approach, the nMOS body bias voltage $V_{bn}$, the pMOS body bias voltage $V_{bp}$, and the supply voltage $V_{dd}$ are chosen so that the performance constraint is met while the total power dissipation at all times is minimized. An optimization problem may be formulated as follows:

$$\text{Minimize} \;\; P_{act}(V_{bn}, V_{bp}, V_{dd}) + P_{leak}(V_{bn}, V_{bp}, V_{dd}) \quad \text{s.t.} \quad D(t) \le D_{spec}, \;\; 0 \le t \le t_{life} \qquad (4)$$

where $P_{act}$ and $P_{leak}$ are the weighted active and leakage (subthreshold + junction leakage) power values, respectively, while $D_{spec}$ is the timing specification that must be met at all times. Intuitively, a solution to the optimization problem in (4) keeps the circuit delay as close to the specification as possible while remaining below it. This is because any further reduction in delay using ABB/ASV is accompanied by a corresponding increase in active and leakage power dissipation.
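Because the body-bias and supply voltages take discrete values, (4) at a single time point can be solved by exhaustive enumeration. The Python sketch below enumerates $(V_{bn}, V_{bp}, V_{dd})$ on the 50 mV body-bias and 30 mV supply grids mentioned in Section VI. It keeps the lowest-power point that meets the delay constraint. The `delay` and `power` functions are hypothetical stand-ins (an alpha-power-law delay, a $V_{dd}^2$ active term, and subthreshold plus junction leakage), not the characterized models of the paper. The loop illustrates the idea only. It is not the scheme of [27].

```python
import itertools
import math

VDD_NOM = 1.0
VTH_N0, VTH_P0 = 0.466, 0.4118  # nominal |Vth| (V) quoted in Section II-A
ALPHA = 1.3                     # alpha-power-law exponent (hypothetical)
BODY_COEF = 0.15                # |Vth| reduction per volt of FBB (hypothetical)
N_VT = 0.049                    # slope factor * kT/q at 105 C (hypothetical)
VT, I_J0 = 0.0326, 1e-4         # junction-diode parameters (hypothetical)


def delay(vbn, vbp, vdd, dvn, dvp):
    """Normalized average of fall (nMOS) and rise (pMOS) delays."""
    vtn = VTH_N0 + dvn - BODY_COEF * vbn
    vtp = VTH_P0 + dvp - BODY_COEF * vbp
    return 0.5 * (vdd / (vdd - vtn) ** ALPHA + vdd / (vdd - vtp) ** ALPHA)


def power(vbn, vbp, vdd, dvn, dvp):
    """Normalized active power plus subthreshold and junction leakage."""
    total = vdd ** 2
    for vb, vt0, dv in ((vbn, VTH_N0, dvn), (vbp, VTH_P0, dvp)):
        vt = vt0 + dv - BODY_COEF * vb
        total += vdd * math.exp(-vt / N_VT) + I_J0 * (math.exp(vb / VT) - 1.0)
    return total


def solve_abb_asv(dvn, dvp, d_spec, vb_step=0.05, vb_max=0.4,
                  vdd_step=0.03, vdd_steps=3):
    """Lowest-power (vbn, vbp, vdd) grid point with delay <= d_spec, or None."""
    vb_grid = [round(k * vb_step, 6) for k in range(int(round(vb_max / vb_step)) + 1)]
    vdd_grid = [round(VDD_NOM + k * vdd_step, 6) for k in range(vdd_steps + 1)]
    best, best_p = None, math.inf
    for vbn, vbp, vdd in itertools.product(vb_grid, vb_grid, vdd_grid):
        if delay(vbn, vbp, vdd, dvn, dvp) <= d_spec:
            p = power(vbn, vbp, vdd, dvn, dvp)
            if p < best_p:
                best, best_p = (vbn, vbp, vdd), p
    return best


d_spec = delay(0.0, 0.0, VDD_NOM, 0.0, 0.0)   # meet the fresh-circuit delay
choice = solve_abb_asv(0.030, 0.050, d_spec)  # maximal aging: 30 mV n, 50 mV p
```

Replacing `delay` and `power` with characterized models, such as the fits of (2) and (3) evaluated through STA, would leave the enumeration loop unchanged. Its cost grows as the product of the grid sizes: here 9 × 9 × 4 = 324 evaluations.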
D. Hybrid Approach
The hybrid approach uses a combination of adaptive methods and presilicon synthesis to optimize the circuit for aging effects. The use of ASV results in a quadratic increase in active power. In contrast, at reasonable design points, synthesis can provide delay improvements with subquadratic increases in power dissipation. Therefore, the adaptive part of the hybrid approach is restricted to ABB only, at the nominal $V_{dd}$ value. The hybrid approach employs synthesis and ABB in an iterative loop, tightly controlling the power increase in each step. For the ABB assignment step of the loop, the optimization formulation in (4) is recast within a power envelope, as a problem of delay minimization subject to power constraints:

$$\text{Minimize} \;\; D(t) \quad \text{s.t.} \quad P_{leak}(V_{bn}, V_{bp}) \le P_{budget} \qquad (5)$$

where $P_{budget}$ denotes the leakage power budget. This budget is taken to be the peak leakage of the uncompensated circuit, i.e., its leakage at $t = 0$. This effectively bounds the total power dissipation of the circuit to its value at $t = 0$, since the above optimization has a negligible effect on active power dissipation.
The solution to the above optimization problem reduces the delay of the circuit under power constraints, but does not guarantee that the delays will be lower than $D_{spec}$. If they are not, in a second step, the circuit is resynthesized to meet a heuristically chosen delay specification, tighter than $D_{spec}$, at $t = 0$.

The iteration continues until the optimization in (5) can guarantee that the compensated circuit meets $D_{spec}$ over its entire lifetime.
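The alternation between power-constrained FBB and resynthesis can be illustrated with a toy loop in Python. Several assumptions are built in. `resynthesize` is a hypothetical cost model in which tightening the synthesis target raises the $t = 0$ leakage quadratically. `fbb_compensated_delay` is a hypothetical stand-in for (5), in which FBB can recover delay only in proportion to the leakage headroom below the budget. The tightening rule `target *= D_SPEC / d_max` is also an assumption. The 14% delay increase and 50% leakage reduction at end of life follow the "des" figures quoted in Section II-A.

```python
T_LIFE = 1e8             # lifetime, s
D_SPEC = 1.0             # normalized delay specification
P_REF = 1.0              # t = 0 leakage of a circuit synthesized exactly at D_SPEC
AGING_AT_LIFE = 0.14     # uncompensated delay increase at T_LIFE
LEAK_DROP_AT_LIFE = 0.5  # fractional leakage reduction at T_LIFE
FBB_EFFICACY = 0.2       # hypothetical: delay recovered per unit leakage headroom


def resynthesize(target):
    """Hypothetical technology-mapping cost model -> (delay, leakage at t = 0)."""
    return target, P_REF * (D_SPEC / target) ** 2


def fbb_compensated_delay(d0, p0, budget, t):
    """Hypothetical power-constrained FBB step, a stand-in for (5)."""
    f = (t / T_LIFE) ** (1.0 / 6.0)  # BTI progress, 0..1
    aged_delay = d0 * (1.0 + AGING_AT_LIFE * f)
    aged_leak = p0 * (1.0 - LEAK_DROP_AT_LIFE * f)
    headroom = max(0.0, (budget - aged_leak) / budget)
    recovered = d0 * min(AGING_AT_LIFE * f, FBB_EFFICACY * headroom)
    return aged_delay - recovered


def hybrid(times, max_iter=10, rtol=1e-9):
    target, budget = D_SPEC, None
    for it in range(1, max_iter + 1):
        d0, p0 = resynthesize(target)  # iteration 1 is the nominal design
        budget = p0 if budget is None else max(budget, p0)  # raise budget if needed
        d_max = max(fbb_compensated_delay(d0, p0, budget, t) for t in times)
        if d_max <= D_SPEC * (1.0 + rtol):
            return {"iterations": it, "target": target,
                    "leakage_t0": p0, "d_max": d_max}
        target *= D_SPEC / d_max  # assumed tightening rule
    return None


result = hybrid([1e4, 1e5, 1e6, 1e7, 1e8])
```

In this toy model the nominal design overshoots the specification by 4% at end of life, because FBB can only spend the leakage slack. A single tightening of the synthesis target closes the gap, at the cost of a higher $t = 0$ leakage.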
III. CONTROL SYSTEM FOR ADAPTIVE COMPENSATION
In this section, we investigate how an adaptive control system can be implemented to guardband circuits against aging. Prior work in this area can be summarized as follows. A lookup-table-based approach that precomputes and stores the optimal ABB/ASV/frequency values, to compensate for droop and temperature variations, is presented in [9]. An alternative approach [7], [8] uses a replica of the critical path to measure and counter the effects of within-die and die-to-die variations. Techniques for sensor design have been addressed in [22] and [23], which propose high-resolution on-chip sensors for capturing the effects of aging.
However, with increasing levels of intra-die variations, critical-path replica-based test circuits require a large number of critical paths to provide a distribution identical to that of the original circuit, leading to an area overhead. Further, the critical paths in a circuit can change dynamically, based on the relative temporal degradation of the potentially critical paths. Adding every potentially critical path from the original circuit into the critical path replica may cause the test circuit to become extremely large. Apart from a high area overhead, such a large test circuit may incur its own variations, which may differ from those in the original circuit.
Owing to these drawbacks, we propose the use of a lookup-table-based implementation to determine the actual $V_{bn}$, $V_{bp}$, and $V_{dd}$ values that must be applied to the circuit to compensate for aging. The entries in the lookup table are indexed by the total time for which the circuit has been in operation. The operating system can track this time using a software routine, with $t = 0$ representing the beginning of the lifetime of the circuit after burn-in, testing, and binning. The degradation in delays due to accelerated stresses at high temperature during burn-in is accounted for in determining $D_{spec}$, by adding an additional timing guardband. This software control enables the system to determine the total time for which the circuit has been operational. Such software-controlled body biasing has been implemented in [24] and [25]. Similarly, [26] presents an architecture for a hardware-software codesigned solution that dynamically adjusts the circuit frequency and supply voltage, considering aging.
Control systems for ABB that monitor voltage, temperature, frequency, etc., as described in [7], [8], and [20], respond to environmental on-chip variations or manufacturing variations. In this case, by contrast, the lookup table is indexed by a parameter that depends on the cumulative time of operation of the circuit. Since this parameter is highly usage-specific, an operating-system-based software control is ideal: it can track the total time of usage and generate the appropriate signals to access the lookup table. A coarse software "counter" can be implemented to increment the total time of stress. The counter may also be incremented based on the supply voltage and the on-chip temperature, since these parameters determine the extent of interface trap generation. The counter need not have a fine resolution. BTI is a long-term degradation mechanism, so short-term errors in computing the time of stress do not affect the asymptotic delay and body-bias numbers.
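A Python sketch of such a software counter is shown below, together with the lookup of the active table entry using the standard `bisect` module. The class `StressCounter`, the `Entry` record, and especially the weighting of elapsed time are hypothetical. The weighting uses an Arrhenius factor with an assumed activation energy `ea_ev` and a power of $V_{dd}/V_{dd,ref}$ with an assumed exponent `v_exp`. The table is characterized at 105 °C, so under this weighting, operation at the reference corner accumulates stress at exactly the wall-clock rate.

```python
import bisect
import math
from dataclasses import dataclass

K_B = 8.617333262e-5  # Boltzmann constant, eV/K


@dataclass(frozen=True)
class Entry:
    t: float    # cumulative stress time (s) from which this entry applies
    vbn: float  # nMOS body bias (V)
    vbp: float  # pMOS body bias (V)
    vdd: float  # supply voltage (V)


class StressCounter:
    """Hypothetical OS-level counter that selects the active lookup-table entry."""

    def __init__(self, table, ea_ev=0.1, temp_ref_c=105.0, vdd_ref=1.0, v_exp=2.0):
        self.table = sorted(table, key=lambda e: e.t)
        self.times = [e.t for e in self.table]
        self.ea_ev, self.temp_ref_k = ea_ev, temp_ref_c + 273.15
        self.vdd_ref, self.v_exp = vdd_ref, v_exp
        self.stress_s = 0.0

    def tick(self, dt_s, temp_c, vdd):
        """Accumulate dt_s seconds, weighted relative to 105 C and nominal Vdd."""
        temp_k = temp_c + 273.15
        accel = math.exp(-self.ea_ev / K_B * (1.0 / temp_k - 1.0 / self.temp_ref_k))
        accel *= (vdd / self.vdd_ref) ** self.v_exp
        self.stress_s += dt_s * accel

    def current_entry(self):
        """Latest entry whose start time does not exceed the accumulated stress."""
        i = bisect.bisect_right(self.times, self.stress_s) - 1
        return self.table[max(i, 0)]
```

A conservative variant could simply count wall-clock operating time by fixing the weight at 1, which never underestimates stress relative to the 105 °C characterization.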
The lookup table method requires the critical paths and the temporal delay degradation of the circuit to be known beforehand, to determine the entries of the table. It is impossible to determine, a priori, the exact temporal degradation of a circuit. This degradation depends on the stress patterns, which in turn depend on the percentage of time that various circuit nodes spend at logic levels 0 and 1. This percentage depends on the profile of computations executed by the circuit, and cannot be captured accurately by, for example, an average probabilistic analysis. The only guaranteed-pessimistic measure of BTI stress uses the worst-case degradation of the circuit. The approach in [4] presents such a method, considering the impact of NBTI only, and determines the worst-case scenario by assuming maximal dc stress on every pMOS transistor. The idea can be extended to include the maximal impact of PBTI on the nMOS transistors as well, to compute the maximal degradation of the most critical path in the circuit. The worst-case method to estimate the maximal delay degradation after $t$ seconds of aging is computationally efficient and input-vector-independent. It requires a single timing analysis run based on the degraded nMOS and pMOS $V_{th}$ values at time $t$. Because this estimate is guaranteed-pessimistic over all modes of circuit operation, it can be used as the measure of $D(t)$ in (4). The resulting set of $V_{bn}$, $V_{bp}$, and $V_{dd}$ values is then guaranteed to ensure that the circuit meets the delay specification under all operating conditions.

The next sections describe the algorithms for the adaptive and the hybrid approaches to counter the impact of BTI. In Section IV, we first outline an algorithm for the adaptive approach that computes the optimal $(V_{bn}, V_{bp}, V_{dd})$ tuple entries in the lookup table at different times. In Section V, we then investigate how further area-power tradeoffs can be achieved using the hybrid approach, and describe its implementation.
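The worst-case estimate described in this section assumes that every transistor is maximally stressed. It therefore needs only one longest-path traversal, with each gate delay shifted by a first-order model in the spirit of (2). The Python sketch below lumps the sensitivities per device type and uses a hypothetical five-gate netlist with hypothetical delays and sensitivities. With the 50 mV/30 mV shifts from Section II-A, the critical path moves from g1-g4-g5 to g2-g3-g5. This is the kind of critical-path change that complicates replica-based monitors.

```python
from functools import lru_cache

# Hypothetical netlist: gate -> (fan-in gates, d0 [ps], k_p [ps/V], k_n [ps/V])
NETLIST = {
    "g1": ((), 10.0, 100.0, 60.0),
    "g2": ((), 12.0, 80.0, 90.0),
    "g3": (("g1", "g2"), 15.0, 120.0, 50.0),
    "g4": (("g1",), 20.0, 40.0, 40.0),
    "g5": (("g3", "g4"), 8.0, 60.0, 70.0),
}


def worst_case_delay(netlist, dvth_p, dvth_n):
    """Circuit delay (ps) with every pMOS/nMOS at its maximal BTI shift."""
    def gate_delay(g):
        _, d0, k_p, k_n = netlist[g]
        # First-order model, sensitivities lumped per device type.
        return d0 + k_p * dvth_p + k_n * dvth_n

    @lru_cache(maxsize=None)
    def arrival(g):
        fanin = netlist[g][0]
        return gate_delay(g) + max((arrival(f) for f in fanin), default=0.0)

    return max(arrival(g) for g in netlist)
```

For this netlist the fresh delay is 38.0 ps (via g1-g4-g5), and the maximally aged delay is 54.3 ps (via g2-g3-g5).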
IV. OPTIMAL ABB/ASV COMPUTATION FOR THE ADAPTIVE APPROACH
We begin by pictorially illustrating the idea of the adaptive approach. Fig. 4 shows the temporally degraded delay of the original circuit without ABB/ASV, where the delay increases monotonically with time and violates $D_{spec}$ for some $t$. Fig. 4 also shows how ABB/ASV may be applied at a time $t_i$ to ensure that the delay degradation during the interval $[t_i, t_{i+1}]$ does not cause the circuit delay to exceed the specifications. The delay of the circuit immediately after applying ABB/ASV, based on the lookup table values at $t_i$, is denoted as $D(t_i^+)$, and is guaranteed to always be less than $D_{spec}$. Similarly, $D(t_{i+1}^-)$ is the delay of the circuit just before applying ABB/ASV at $t_{i+1}$, and it typically touches $D_{spec}$. Considering the cumulative temporal degradation at $t_i$, the impact of ABB/ASV applied at that time point, and the temporal degradation due to BTI over $[t_i, t_{i+1}]$, we have

$$D(t_i^+) < D(t_{i+1}^-) \le D_{spec} \qquad (6)$$

Fig. 4. Plot of the delay of the original circuit, without adaptation, as a function of time, showing degradation due to BTI effects, and a schematic of our compensation scheme using ABB/ASV at three consecutive compensation time points, showing the delay of the compensated circuit as a function of time.
At every compensation time point $t_i$, the amount of adaptation required depends on the delay degradation up to the next compensation time point $t_{i+1}$, and follows the shape of the curve in Fig. 4. In Fig. 4, if no compensation is applied to the circuit, the delay during the interval $[t_i, t_{i+1}]$ will be above $D_{spec}$. To ensure that the delay meets specifications during this period, we apply a compensation at time $t_i$, whose magnitude is determined by the following result.
Theorem 1: Under small perturbations to the threshold voltage due to aging, let $D(t)$ be the delay of the aged circuit at any time $t$, and assume that $D(t_{i+1}^-) > D_{spec}$ under a specific compensation, just prior to compensation time $t_{i+1}$. To bring $D(t_{i+1}^-)$ under the specification, the value of $D(t_i^+)$ can be adjusted, through compensation, to

$$D'(t_i^+) = D(t_i^+) \cdot \frac{D_{spec}}{D(t_{i+1}^-)} \qquad (7)$$
Proof: For a MOS device, $\Delta V_{th} \propto t^{1/6}$, where the proportionality constant is different for nMOS and pMOS transistors. If we consider the effect of aging from time $t_i$ to $t_{i+1}$, then for a specific transistor type

$$\frac{\Delta V_{th}(t_{i+1})}{\Delta V_{th}(t_i)} = \left(\frac{t_{i+1}}{t_i}\right)^{1/6} \qquad (8)$$

Since the perturbations to $V_{th}$ over this interval are small (by assumption), the delay of each gate can reasonably be assumed to vary linearly with $\Delta V_{th}$, as given by a first-order Taylor series approximation. Therefore, the delay of each gate changes by a multiplicative factor, given by the right-hand side of (8). This implies that the delay of the circuit also changes by the same multiplicative factor. In other words, if the delay at time $t_i$ is changed to $D'(t_i^+)$, then

$$D'(t_{i+1}^-) = D'(t_i^+) \cdot \frac{D(t_{i+1}^-)}{D(t_i^+)} \qquad (9)$$

Since our goal is to set $D'(t_{i+1}^-) = D_{spec}$, the result follows immediately.

Algorithm 1:
7: Using static timing analysis (STA), determine $D(t_{i+1}^-)$, the delay due to BTI just prior to time $t_{i+1}$.
8: Use $D(t_{i+1}^-)$ to determine the target delay at $t_i$, upon the application of ABB/ASV.
9: Set $D(t_i^+)$ to the target delay given by (7).
10: Determine the ABB/ASV values to be applied at time $t_i$ to meet $D(t_i^+)$.
11: Use an enumeration scheme, similar to [27], to solve the optimization problem in (4).

The adaptive strategy is developed at design time using the scheme shown in Algorithm 1, which gives the pseudocode for computing the optimal ABB/ASV values as the circuit ages. The algorithm begins by determining the amount of ABB/ASV that must be applied at the beginning of the lifetime of the circuit (after burn-in, testing, and binning), denoted by $t_0$, to compensate for aging until the first time point $t_1$. This is computed by determining the change in threshold voltage until $t_1$, denoted $\Delta V_{th}(t_1)$, and then performing an STA run to determine $D(t_1^-)$, as shown on lines 6 and 7 of the algorithm. The target delay after applying ABB/ASV is then computed, as shown on line 9, by applying the scaling factor from Theorem 1 to $D_{spec}$. As expected, $D(t_0^+) < D_{spec}$. Line 11 uses an enumeration scheme, based on the method described in [27], to determine the optimal ABB/ASV that must be applied at time $t_0$. Line 14 computes the delay of the circuit just prior to time $t_1$, i.e., $D(t_1^-)$, which does not exceed $D_{spec}$. The method is repeated for successive values of $t_i$ to compute the lookup table entries.
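The table-building loop can be sketched in Python under simplifying assumptions. `sta` is a hypothetical stand-in in which aging multiplies the delay by $1 + 0.14\,(t/t_{life})^{1/6}$, using the 14% end-of-life figure for "des". `SETTINGS` is a hypothetical list of discrete compensation settings with made-up delay scales and power costs. The inner repeat loop for the $V_{dd}$ dependence is omitted. Each step applies the Theorem 1 scaling to obtain the target delay, then picks the lowest-power setting that meets it.

```python
T_LIFE = 1e8  # s
D0 = 1.0      # fresh, uncompensated delay (normalized)
AGING = 0.14  # fractional delay increase at T_LIFE

# Hypothetical compensation settings: (label, delay scale factor, relative power)
SETTINGS = [("none", 1.00, 1.00), ("fbb1", 0.97, 1.04), ("fbb2", 0.94, 1.09),
            ("fbb2+asv", 0.90, 1.20), ("fbb3+asv", 0.86, 1.35)]


def sta(t, scale):
    """Stand-in for STA: delay at time t under a compensation with `scale`."""
    return D0 * scale * (1.0 + AGING * (t / T_LIFE) ** (1.0 / 6.0))


def build_table(times, d_spec):
    """Return [(t_i, setting label, delay just before t_{i+1})]."""
    table, scale = [], SETTINGS[0][1]
    for t_i, t_next in zip(times, times[1:]):
        d_plus, d_minus = sta(t_i, scale), sta(t_next, scale)  # line 7
        target = d_plus * d_spec / d_minus                     # Theorem 1, (7)
        feasible = [s for s in SETTINGS if sta(t_i, s[1]) <= target]
        if not feasible:
            raise RuntimeError(f"no setting meets the target at t = {t_i:g} s")
        label, scale, _ = min(feasible, key=lambda s: s[2])    # min power, (4)
        table.append((t_i, label, sta(t_next, scale)))
    return table


table = build_table([1e3, 1e5, 1e6, 1e7, 1e8], d_spec=D0)
```

Because every delay is scaled by the same time-dependent factor in this stand-in, the ratio in (7) does not depend on the previous setting. This matches the multiplicative assumption in the proof of Theorem 1.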
Note that there is a second-order dependence between the level of degradation and $V_{dd}$ [21]. The value of $V_{dd}$ in the solution at $t_i$ depends on the delay degradation over $[t_i, t_{i+1}]$. That degradation in turn depends on the change in $V_{th}$ during this interval, which is itself a function of the $V_{dd}$ value at time $t_i$. Hence, an iterative approach is employed, as illustrated by the repeat loop in Algorithm 1.

The choice of the compensation time points depends on several factors. Ideally, we would continuously apply the requisite amount of compensation at all times, so as to just meet the performance constraints while minimizing the power overhead. In practice, however, the circuit can only be compensated at a finite number of time points, $t_0, t_1, \ldots, t_{n-1}$. The number of compensation times chosen, $n$ (i.e., the size of the lookup table), and their specific values are limited by the following factors.
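The repeat loop for the $V_{dd}$ dependence can be sketched as a fixed-point iteration in Python. Two hypothetical relations, neither taken from the paper, drive it. First, the $V_{th}$ shift over the interval grows quadratically with the applied supply. Second, the supply needed to offset a shift is $V_{dd,nom} + 2\,\Delta V_{th}$, rounded up to the 30 mV grid of Section VI. The sketch only shows why iteration is needed and how it settles on a discrete grid value.

```python
import math

VDD_NOM, VDD_STEP = 1.0, 0.03  # 30 mV supply increments, as in Section VI


def dvth_over_interval(vdd):
    """Hypothetical: Vth shift over [t_i, t_{i+1}] rises quadratically with Vdd."""
    return 0.05 * (vdd / VDD_NOM) ** 2


def vdd_needed(dvth):
    """Hypothetical: smallest grid supply that offsets a shift dvth."""
    raw = VDD_NOM + 2.0 * dvth
    steps = math.ceil((raw - VDD_NOM) / VDD_STEP - 1e-9)  # round up to the grid
    return round(VDD_NOM + steps * VDD_STEP, 6)


def solve_vdd(max_iter=20):
    """Fixed-point iteration: Vdd -> Vth shift -> required Vdd, until stable."""
    vdd = VDD_NOM
    for it in range(1, max_iter + 1):
        new_vdd = vdd_needed(dvth_over_interval(vdd))
        if new_vdd == vdd:
            return vdd, it
        vdd = new_vdd
    raise RuntimeError("Vdd iteration did not converge")
```

Starting from the nominal supply, the iterates are 1.12 V and then 1.15 V. The third pass confirms 1.15 V, because the extra degradation caused by the higher supply is absorbed within the same grid step.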
• Resolution in Generating the Body-Bias and Supply Voltages: A large number of body bias and supply voltages requires a sophisticated network of voltage generators and dividers, adding to the area and power overheads.
• Minimum Change in Delay Over $[t_i, t_{i+1}]$, Subject to Modeling Errors: The delay model has some inaccuracies. A control system with a large number of compensation points, where the delay changes only marginally between successive times, may therefore produce inaccurate computations due to modeling errors.
• Resolution of Mapping Each Delay to a Unique $(V_{bn}, V_{bp}, V_{dd})$ Tuple: Since there is a fixed discretization in the values of each element of this tuple, each compensation step reduces the delay by a quantum, and finer-grained delay compensation is not possible.

Section VI-C explores the impact of the number of compensation points chosen on the temporal profiles of the delay and power of the circuit.
V. IMPLEMENTATION OF THE HYBRID APPROACH
While the adaptive framework provides considerable savings in area as compared with synthesis, its power overhead relative to the original circuit can still be appreciably large, as will be shown in Section VI. This is because the delay reduction through FBB comes at the expense of an exponential increase in leakage power, as seen in Fig. 3. Likewise, a performance improvement through ASV also causes an exponential increase in leakage power, as well as a quadratic increase in active power.
On the other hand, technology mapping can map the circuit to use gates with different functionalities and/or drive strengths. The use of this technique has empirically been seen to provide significant performance gains with low area and power overheads, for reasonable delay specifications. Hence, a combination of this synthesis technique with ABB has the potential to provide improved results.
Accordingly, we propose a hybrid approach to design reliable circuits. An iterative approach is followed during design, alternating between the ABB assignment and technology mapping phases, to ensure that the final design is reliable and has minimal power and area overheads. The algorithm consists of two distinct phases. The first is the adaptive compensation phase, which involves an optimization formulation subject to power constraints. The second is the resynthesis phase, which uses technology mapping to meet a tighter design specification. Algorithm 2 describes the steps involved in this approach.

Algorithm 2:
5: Compute $\Delta V_{th}(t_{i+1})$ due to BTI at the nominal $V_{dd}$, and determine the aged threshold voltages.
6: Perform STA to determine the delay, $D(t_{i+1}^-)$, due to BTI, just prior to time $t_{i+1}$, and determine the leakage power, $P_{leak}$.
7: Use an enumeration scheme, similar to [27], to solve the optimization formulation in (5), i.e., to determine $(V_{bn}, V_{bp})$ so as to minimize the delay, $D(t_i^+)$, while staying within the leakage budget from line 2.
8: Determine the delay before applying FBB at the next time point, i.e., $D(t_{i+1}^-)$.
9: end for
10: if all delays are $\le D_{spec}$ then
11: The optimization has converged; output the computed FBB values to the lookup table.
12: else
13: Resynthesis Phase
14: Identify the highest delay over the lifetime, $D_{max}$, and use it to reduce the synthesis delay target.
15: Tighten the delay specification for synthesis to ensure that, after aging and subsequent adaptive compensation, $D(t) \le D_{spec}$.
16: Perform technology mapping to resynthesize the circuit under the tighter delay specification, at $t = 0$.
17: If the leakage power of this new circuit at $t = 0$ is greater than the original leakage budget computed in line 2, increase the budget accordingly.
18: Repeat from line 2.
19: end if
The algorithm begins with the adaptive compensation phase, where the ABB optimization formulation from (5) is solved. Lines 3-9 modify the framework of Algorithm 1 to compute the optimal $(V_{bn}, V_{bp})$ values at different time points, instead of the optimal $(V_{bn}, V_{bp}, V_{dd})$ tuple, such that the delay is minimized without violating the leakage power constraints. If the delay of the circuit throughout its lifetime is less than the specification $D_{spec}$, the optimization ends and the optimal $(V_{bn}, V_{bp})$ entries are used to populate the lookup table, as shown in lines 10 and 11.
However, if the delays are higher than the specification, the circuit is technology-mapped to tighter design constraints, as shown in line 16. As a first-order measure, the delay specification used for synthesis is lowered by an amount determined by the maximum delay of the circuit over its lifetime under the adaptive compensation scheme. If the leakage power of the resynthesized circuit exceeds its budget, the nominal value of the leakage power is updated and used as the new leakage budget in (5), as shown in line 17, and adaptive compensation is repeated on this modified circuit. Adaptive compensation (lines 3-9) and technology mapping for a tighter target delay (lines 13-16) are performed iteratively until the circuit delays converge and the timing specifications are met at all times. In our experiments, only a few iterations were necessary before the delay converged.
TABLE I LOOKUP TABLE ENTRIES FOR THE LGSYNTH93 BENCHMARK "DES" USING THE ADAPTIVE AND HYBRID APPROACHES
As we will demonstrate shortly, our experimental results indicate that this approach provides savings in area as compared with the synthesis approach, and dissipates lower power in comparison with the adaptive approach.
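The adaptive compensation phase (lines 3-9 of Algorithm 2) reduces, at each time point, to a small constrained search: minimize delay over candidate FBB values subject to the leakage budget fixed at t = 0. The sketch below solves such a problem by exhaustive enumeration over a 50 mV body-bias grid. The device constants, the alpha-power-law delay model, and the leakage model are hypothetical stand-ins, not the BTI models or the exact formulation (5) used in this paper.

```python
import math

VTH0 = 0.35    # hypothetical nominal |Vth| (V)
K_BODY = 0.2   # hypothetical body-effect coefficient: Vth drop per volt of FBB
N_VT = 0.04    # hypothetical subthreshold slope factor times kT/q (V)


def vth(vbs: float, dvth: float) -> float:
    """Aged threshold voltage; forward body bias (vbs > 0) lowers it."""
    return VTH0 + dvth - K_BODY * vbs


def delay(vbs: float, dvth: float, vdd: float = 1.0) -> float:
    """Alpha-power-law delay in arbitrary units (alpha = 1.3 assumed)."""
    return 100.0 * vdd / (vdd - vth(vbs, dvth)) ** 1.3


def leakage(vbs: float, dvth: float) -> float:
    """Subthreshold leakage plus junction leakage growing exponentially with FBB."""
    return math.exp(-vth(vbs, dvth) / N_VT) + 1e-5 * math.exp(vbs / 0.1)


def min_delay_under_budget(dvth: float, budget: float, vbs_grid):
    """Fixed-power problem by enumeration: minimize delay s.t. leakage <= budget."""
    feasible = [(delay(v, dvth), v) for v in vbs_grid if leakage(v, dvth) <= budget]
    return min(feasible) if feasible else None  # (delay, vbs) or None


vbs_grid = [round(0.05 * i, 6) for i in range(7)]  # 0 to 0.30 V in 50 mV steps
budget = leakage(0.0, 0.0)                          # leakage at t = 0, no FBB
best = min_delay_under_budget(dvth=0.03, budget=budget, vbs_grid=vbs_grid)
print(best)
```

With these assumed numbers, FBB can be raised only to 0.10 V before the leakage budget binds, so the aged delay is reduced but not restored to its t = 0 value; this is the situation in which Algorithm 2 falls through to its resynthesis phase.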
VI. EXPERIMENTAL RESULTS
We now present the results of applying our compensation scheme to circuits in the ISCAS85, LGSYNTH93, and ITC99 benchmark suites, synthesized using a 45-nm library [16]. The body bias voltage is varied in increments of 50 mV, and the supply voltage in increments of 30 mV.
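For the adaptive approach, which minimizes power under a fixed delay constraint, these voltage increments define a discrete search space. The sketch below enumerates (Vdd, Vbs) pairs on a 30 mV supply grid and a 50 mV body-bias grid and keeps the lowest-power pair that meets the delay specification. The grid ranges, the use of a single body bias shared by nMOS and pMOS, and the toy delay and power models are assumptions for illustration only; they are not Algorithm 1 or the library of [16].

```python
import itertools
import math


def voltage_grid(lo: float, hi: float, step: float):
    """Inclusive grid from lo to hi in fixed steps, rounded to suppress float drift."""
    n = int(round((hi - lo) / step))
    return [round(lo + i * step, 6) for i in range(n + 1)]


def min_power_tuple(delay_fn, power_fn, d_spec, vdd_grid, vbs_grid):
    """Enumerate every (Vdd, Vbs) pair; keep the lowest-power one that meets d_spec."""
    best = None
    for vdd, vbs in itertools.product(vdd_grid, vbs_grid):
        if delay_fn(vdd, vbs) > d_spec:
            continue                    # violates the fixed delay constraint
        p = power_fn(vdd, vbs)
        if best is None or p < best[0]:
            best = (p, vdd, vbs)
    return best                         # None if no grid point is feasible


# Hypothetical toy models of an aged circuit (dVth = 30 mV); not from the paper.
def toy_delay(vdd, vbs, dvth=0.03):
    vth = 0.35 + dvth - 0.2 * vbs       # FBB lowers Vth
    return 100.0 * vdd / (vdd - vth) ** 1.3


def toy_power(vdd, vbs, dvth=0.03):
    vth = 0.35 + dvth - 0.2 * vbs
    active = vdd ** 2                   # proportional to C * Vdd^2 * f
    leak = vdd * (math.exp(-vth / 0.04) + 1e-5 * math.exp(vbs / 0.1))
    return active + leak


vdd_grid = voltage_grid(1.00, 1.12, 0.03)  # 30 mV supply steps
vbs_grid = voltage_grid(0.00, 0.30, 0.05)  # 50 mV body-bias steps
d_spec = toy_delay(1.0, 0.0, dvth=0.0)     # nominal, unaged delay
best = min_power_tuple(toy_delay, toy_power, d_spec, vdd_grid, vbs_grid)
print(best)
```

Exhaustive enumeration is practical here because the grids are small (5 × 7 points in this sketch); finer or wider ranges would grow the search multiplicatively.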
A. Results on a Sample Benchmark Circuit
We present detailed experimental results on a representative LGSYNTH93 benchmark, "des," whose delay and leakage variations under BTI, without ABB/ASV compensation, were shown in Fig. 1.
1) Lookup Table Entries:
Table I shows the entries of the lookup table that encodes the compensation scheme, together with the delay, active power, and leakage power for the adaptive and hybrid approaches. The circuit is compensated at different times, as shown in the first column of Table I, up to its lifetime of 10 s. The time entries in the lookup table are chosen such that the increase in delay over each successive time interval is uniform, so that the circuit is compensated uniformly for degradation over its entire lifetime. A large starting value of 10 s is chosen for the adaptive approaches, since the BTI model used in Algorithm 1 to estimate the delay degradation of the circuit is asymptotically accurate. Further discussion of the choice of the number of compensation time points, and its impact on the temporal delay-power curves, is deferred to Section VI-C.
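One way to obtain uniform delay increments between successive lookup table entries is to invert a power-law model of BTI degradation. The sketch assumes that the delay shift grows as A·t^n with n = 1/6 (a common reaction-diffusion exponent, assumed here rather than taken from this paper) and a hypothetical lifetime of 1e8 s; under that model, equal increments require t_k = t_life·(k/N)^(1/n). The function name `uniform_degradation_times` is illustrative.

```python
def uniform_degradation_times(t_life: float, n_entries: int, n: float = 1 / 6):
    """Compensation times t_1..t_N giving equal delay increments per interval.

    Assumes the BTI-induced delay shift follows dD(t) = A * t**n, so equal
    increments require t_k**n = (k / N) * t_life**n.
    """
    if n_entries < 1:
        raise ValueError("need at least one lookup table entry")
    return [t_life * (k / n_entries) ** (1.0 / n) for k in range(1, n_entries + 1)]


T_LIFE = 1e8  # hypothetical lifetime in seconds, for illustration only
times = uniform_degradation_times(T_LIFE, 15)
shifts = [t ** (1 / 6) for t in times]                   # dD(t) / A at each entry
steps = [b - a for a, b in zip([0.0] + shifts, shifts)]  # per-interval increase
print([f"{t:.3g}" for t in times])
```

Because t_k grows as k^6, most entries fall early in the lifetime, where degradation is fastest.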
The remaining columns of Table I show the details of the compensation scheme. Columns 2-7 correspond to the adaptive approach and show, for each compensation time, the ABB/ASV tuple computed by Algorithm 1, the final delay after applying ABB/ASV, and the active and leakage power values. Columns 8-12 show the results for the hybrid approach and display, respectively, the optimal body-bias pair, and the delay, active power, and leakage power at each compensation time. The first four columns of the table (boldfaced, with a gray background) are the actual entries that would be encoded into the lookup table for the adaptive approach, while the first, eighth, and ninth columns are the lookup table entries for the hybrid approach. The column "Delay" denotes the circuit delay at the given compensation time, immediately after applying the ABB/ASV values from the table.
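At run time, the lookup table can be held as a sorted list of entries keyed by compensation time and queried with the circuit's age. In the sketch below, `BiasEntry` and `CompensationTable` are hypothetical names; each entry holds a time and an ABB/ASV tuple (for the hybrid approach, `vdd` would simply stay at its nominal value), and a query returns the most recent entry whose time has been reached. The numeric values are made up and are not the Table I entries.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BiasEntry:
    t_start: float  # compensation time (s) at which this entry is applied
    vdd: float      # supply voltage (V)
    vbs_n: float    # nMOS body bias (V)
    vbs_p: float    # pMOS body bias (V); sign convention assumed


class CompensationTable:
    """Lookup table keyed by circuit age; returns the entry currently in force."""

    def __init__(self, entries: List[BiasEntry]):
        self._entries = sorted(entries, key=lambda e: e.t_start)
        self._times = [e.t_start for e in self._entries]

    def lookup(self, age: float) -> BiasEntry:
        # bisect_right finds the first entry strictly later than `age`;
        # the entry just before it is the one currently applied.
        i = bisect.bisect_right(self._times, age) - 1
        if i < 0:
            raise ValueError("age precedes the first table entry")
        return self._entries[i]


# Made-up values for illustration; these are NOT the Table I entries.
table = CompensationTable([
    BiasEntry(0.0, 1.00, 0.05, 0.05),
    BiasEntry(1e3, 1.03, 0.10, 0.10),
    BiasEntry(1e6, 1.06, 0.15, 0.10),
])
print(table.lookup(5e4))
```

Binary search keeps each query logarithmic in the number of entries, although with 15 entries a linear scan would serve equally well.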
The results indicate that the target delay is met at all time points, up to 10 s, with both approaches. The amount of compensation increases with time, as the circuit degrades due to BTI. With the adaptive approach, which optimizes the power under fixed delay constraints, a combination of ABB and ASV is used to counter the effects of aging on the original design, whose delay and power values are shown in the row labeled "Nominal." The active and leakage power values vary with time, depending on the optimal solution chosen at each time point. As explained in Fig. 4, the circuit is compensated for aging right from the first time period, by applying ABB/ASV at t = 0. Hence, the delay of the circuit at t = 0 in Table I is less than the delay specification. The leakage power decreases over time due to the BTI-induced increase in threshold voltage, but increases with ABB/ASV; in our scheme, it is seen to exceed the nominal leakage. Although the ABB values in Table I do not violate the FBB limits, junction leakage rises exponentially with the FBB voltage, as seen from Fig. 3(c), so there is still a large leakage power overhead.
For the hybrid approach, which uses a combination of ABB and synthesis, the circuit at t = 0 achieves its delay reduction purely through synthesis. The area overhead of synthesis in this case is low: compared with the nominal case, the active power increases by 0.3% and the leakage power by 1.8%. The power values for the hybrid approach are significantly lower than those for the adaptive approach.
2) Comparison of Transient Power and Delay Numbers:
The temporal variation in the delay of "des" is shown in Fig. 5 . The delay of the circuit, as a function of time, is shown for the following:
• the adaptive method from Section II-C, where the delay can be seen to remain close to the delay specification at all times;
• the synthesis-based method from [6], where worst-case BTI-based library gate delays were used during technology mapping to synthesize the circuit: in this case, the delay increases monotonically with time;
• the fixed-power case, corresponding to the solution of the optimization problem in (5), where the delay is minimized through ABB under a power budget set to the power at t = 0: this curve does not satisfy the delay specification;
• the hybrid method from Section II-D, which satisfies the delay specification throughout the circuit lifetime, and essentially corresponds to finding the power specification for which a fixed-power curve meets the delay specification at the end of the circuit lifetime. In this case, the power specification implies that the circuit is mapped to meet a delay specification of 330 ps at t = 0.
All methods were targeted to meet the same delay specification, 355 ps, throughout the circuit lifetime; this value is shown by a horizontal line in Fig. 5 and corresponds to the nominal delay of the original circuit at t = 0. Fig. 6(a) and (b) compare, respectively, the active and leakage power for the three approaches (adaptive, hybrid, and synthesis). The horizontal line marked "Nominal" represents the power dissipation of the original circuit at t = 0. Since the synthesis approach performs technology mapping once, for a tighter delay specification at birth, it yields a larger area than the nominal design, and its active power is constant over the lifetime of the circuit. For the adaptive approach, the supply voltage generally (but not always) increases gradually with time, as shown in Table I, and the active power correspondingly increases almost monotonically, as shown in Fig. 6(a). One exception to the monotonicity of the supply voltage, as seen from Table I, is at 0.17 10 s, where the optimal tuple lowers the supply voltage and applies a larger increase in body bias, relative to the solution at the previous time point, causing a dip in active power at that point, as seen in Fig. 6(a). Fig. 6(a) also indicates that the maximum active power dissipated using the adaptive approach is less than that of the synthesis-based design.
Similarly, Fig. 6(b) compares the leakage of the various approaches over the lifetime of the circuit, with respect to its nominal value. The leakage of the synthesis-based circuit is highest at t = 0 (when there is no BTI) and decreases monotonically with time. In contrast, the adaptive approach recovers performance adaptively, at the expense of increased power, so its leakage power rises beyond the nominal value. Note that the leakage of the adaptive circuit at t = 0 is also greater than the nominal value, since some ABB/ASV is applied to the circuit to guardband against temporal degradation during the first time interval, as shown in Fig. 4. The maximum leakage power of the adaptive approach over the lifetime is almost identical to that of the synthesis method, as seen from Fig. 6(b). The hybrid approach, which combines synthesis and adaptive compensation, improves on both of these methods used separately. As shown in Fig. 6(a) and (b), the active and leakage power at t = 0 increase only minimally (by less than 2%) relative to the original circuit, due to the increase in circuit area during resynthesis. Subsequent adaptive compensation over the lifetime of the circuit is performed under fixed power constraints, which ensures that the power never exceeds its value at t = 0. Hence, the curves for the overall leakage and active power, as functions of time, remain closest to their corresponding budgets.
To illustrate how the hybrid approach works, consider the LGSYNTH93 benchmark "des". The nominal delay of the circuit is 355 ps, and its leakage power at t = 0 is 327 W. We first apply lines 4-9 of Algorithm 2 to "des". Using the fixed-power optimization formulation in (5), the delay of the circuit can be reduced only to 380 ps (from 415 ps) without violating the leakage power budget of 327 W and the active power budget of 641 W. Hence, to meet the final target delay, the circuit is resynthesized with the target delay set to 332 ps. After technology mapping, the resulting circuit has a 1% higher area overhead, 2% higher leakage overhead, and 3% higher leakage power overhead. We reapply the fixed-power optimization algorithm to this modified circuit, and the lifetime delay of the circuit under the new leakage power constraint of 327 × 1.02 W is 354 ps, which meets the desired target. Thus, using a combination of adaptive compensation and resynthesis, the circuit is optimally designed.
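The figures quoted for "des" can be cross-checked with a few lines of arithmetic: the fixed-power solution leaves the delay about 7% above the 355 ps target, the 332 ps resynthesis target is about 6.5% below it, and the relaxed leakage budget is 327 × 1.02. All inputs are the values stated in the text; leakage is kept in the units reported there, and `pct_change` is a hypothetical helper.

```python
def pct_change(new: float, old: float) -> float:
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


nominal_ps, fixed_power_ps, target_ps, final_ps = 355.0, 380.0, 332.0, 354.0
excess = pct_change(fixed_power_ps, nominal_ps)   # delay above spec before resynthesis
tightening = -pct_change(target_ps, nominal_ps)   # how much tighter the new target is
new_budget = 327.0 * 1.02                         # leakage budget after a 2% increase
print(f"{excess:.1f}% {tightening:.1f}% {new_budget:.1f} {final_ps <= nominal_ps}")
# prints: 7.0% 6.5% 333.5 True
```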
B. Area and Power Tradeoffs
In this section, we compare the area and power tradeoffs of the approaches proposed in this paper for the five largest benchmark circuits from the ISCAS85 and LGSYNTH93 suites, as well as some large ITC99 benchmarks. Table II presents these results. It indicates that the synthesis approach has a large average area overhead of 26%. In contrast, the area overhead of the adaptive approach is restricted to the lookup tables and the voltage generators for the additional supply and body-bias voltages, and is therefore significantly smaller. The work in [7] has shown that this overhead is within 2%-3% of the area of the original design, while the work in [9] reports a 5% overhead for the adaptive bias control, lookup table implementation, and PLL for frequency control (which is not required in our control system). Thus, the adaptive approach provides significant area savings compared with synthesis.
During optimization using the hybrid approach, the resynthesis (technology mapping) phase increases the area of the circuit, since the circuit is remapped to tighter specifications. The column "Reduction" in Table II indicates that with the hybrid approach, the target delay at t = 0 during the technology mapping phase is only 5% lower than the nominal delay of the circuit, whereas the target delay at t = 0 for BTI-aware synthesis is 15% lower than the nominal delay. As expected, this small 5% decrease in delay can be obtained with a marginal area penalty (around 2% on average) for most circuits. 5 Hence, this area overhead is extremely small, particularly compared with that of synthesis.
The power numbers shown in the table indicate that while the adaptive and synthesis approaches have large power overheads, the power overhead of the hybrid approach is extremely small, with an average increase in active and leakage power of around 2%-3% over the wide range of benchmarks tested. Thus, by combining the advantages of adaptive compensation and BTI-aware synthesis, we obtain an optimal final design whose area overhead is lower than that of the synthesis-based approach, and whose power overhead is lower than that of the adaptive approach, for the same delay specifications.
C. Optimal Selection of Lookup Table Entries
For the adaptive approach, the size of the lookup table (i.e., the number of entries) can be chosen according to various criteria, as discussed in Section IV. In this section, we investigate the impact of the size of the lookup table on the power and delay of the compensated circuit, using the adaptive approach. Accordingly, we perform simulations where the circuit is compensated at eight time points, instead of the 15 time points used in Table I. The compensation time points correspond to alternate entries from the lookup table in Table I, and the corresponding tuples, found using Algorithm 1, are shown in Table III.
As expected, the results indicate that the delay specification of the circuit is still met at all times, but the optimal tuples, and the corresponding delay and power values, differ from those in Table I. We compare these values by plotting the delay and power as functions of time in Fig. 7. We refer to the adaptive approach with 15 entries in the lookup table as the "Fine-grained" method, and that with eight entries as the "Coarse-grained" method. We also consider an extreme coarse-grained approach, where ABB/ASV is applied only once, at t = 0, to ensure that the circuit meets its delay specification over its lifetime: this can be considered as a lookup table with only one entry, at t = 0, and is referred to as the "One-time" approach. Fig. 7(a) shows the delays for all three of these approaches as a function of time.
By design, all methods meet the delay specification over the circuit lifetime, but as the granularity becomes coarser, the variation in circuit delay over time becomes larger: the incremental delay degradation in each interval is higher, which requires larger changes to the tuple at each compensation time point and leads to larger swings of the delay below the specification. The active and leakage power profiles for the three cases are shown in Fig. 7(b) and (c), respectively. These trends show that the peak power dissipation of the circuit over its lifetime, for both active and leakage power, increases as the granularity becomes coarser. The fine-grained approach used in our work, with 15 compensation time points, therefore satisfies the requirements laid out in Section IV, while keeping the overhead of the circuitry required for its implementation small.
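The granularity argument can be made concrete with a power-law degradation model. Treating each list of times as the end points of the compensation intervals, the sketch computes the delay degradation that accumulates within each interval for 15 uniformly spaced entries, the 8 alternate entries, and a single one-time entry; the largest per-interval increase, which sets how far below the specification the delay must be pushed at each compensation point, grows as the table becomes coarser. The exponent n = 1/6, the 1e8 s lifetime, and the helper `interval_increases` are assumptions for illustration.

```python
def interval_increases(times, n=1 / 6):
    """Delay degradation (in units of A) accumulated over [0, t_1], [t_1, t_2], ...,
    assuming the power-law BTI shift dD(t) = A * t**n."""
    prev, out = 0.0, []
    for t in times:
        out.append(t ** n - prev ** n)
        prev = t
    return out


T_LIFE = 1e8                                            # hypothetical lifetime (s)
fine = [T_LIFE * (k / 15) ** 6 for k in range(1, 16)]   # 15 uniform-increment entries
coarse = fine[::2]                                      # alternate entries: 8 points
one_time = [T_LIFE]                                     # one entry covering the lifetime

for name, ts in [("fine", fine), ("coarse", coarse), ("one-time", one_time)]:
    print(name, len(ts), round(max(interval_increases(ts)), 3))
```

Under this model the worst interval of the coarse table sees twice the degradation of the fine table, and the one-time approach must absorb the entire lifetime degradation up front.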
VII. CONCLUSION
BTI has become an important reliability concern in circuit design. Previous solutions in the presilicon design stage aimed at guaranteeing reliable circuit performance can lead to large area and associated power overheads. An adaptive approach that determines the temporal degradation of the circuit, and compensates for it, through ABB and ASV has been proposed in this work. The results indicate that by combining the adaptive and synthesis approaches, circuits can be efficiently guardbanded over their lifetime, with a minimal overhead in area, and a small increase in power, as compared with a circuit designed only to meet the nominal specifications. Further, techniques such as those in [27] may be used to apply ABB/ASV to simultaneously counter the impact of aging, as well as process and temperature variations.
