Abstract: Recent advances in statistical static timing analysis (SSTA) have achieved great success in computing arrival times under variations by extending the sum and maximum operations to random variables. However, it remains a challenging problem to apply such results to address variability in circuit optimizations. In this paper, we study the statistical retiming problem, where retiming is a powerful sequential transformation that relocates flip-flops (FFs) in a circuit without changing its functionality. We formulate the risk aversion min-period retiming problem under process variations as a conventional two-stage stochastic program with fixed recourse and a risk aversion objective on the clock period. We prove that the proposed problem is an integer convex program, show that a subgradient of the objective function can be derived from the combinational paths with the maximum path delay, and present a heuristic incremental algorithm to solve the proposed problem. Our approach can handle arbitrary gate delay models under process variations by sampling from a black box, and its effectiveness is confirmed by the experimental results. Furthermore, we point out how current state-of-the-art SSTA techniques could be improved for future optimization algorithms when analytical models are available.
I. INTRODUCTION
With the aggressive scaling of VLSI feature sizes, process variations have become a critical issue in VLSI fabrication that designers must face. Because of the increasing variations, chip characteristics such as the clock period and the power consumption fall into wide intervals instead of taking single values or lying within narrow ranges. While statistical analysis approaches enable designers to analyze the variations, statistical optimization algorithms will ultimately equip designers with the tools necessary to control such stochastic effects and thereby improve chip yield and system reliability.
A review of recent advances in statistical timing analysis can be found in [1]. In summary, state-of-the-art SSTA algorithms achieve a good balance between accuracy and efficiency in computing the arrival times and the clock period under variations by extending the sum and maximum operations to random variables. Advancing SSTA to handle other aspects of circuits, and applying SSTA to efficient statistical optimization, remain challenging problems.
Conventional circuit optimizations have been extended to address variability through statistical optimization. For example, numerous approaches have been proposed for statistical gate sizing. In [2], the Lagrangian relaxation based sizing technique was extended by introducing a safety margin into the circuit timing according to path delay variations. In [3], [4], [5], sensitivity-guided iterative improvement heuristics were developed. Convex formulations with provable global optimality were proposed in [6], [7] to optimize the circuit under the worst cases considering variations, and in [8] to optimize the binning yield through conventional two-stage stochastic programs with fixed recourse [9]. Given those successes, it is natural to ask whether such approaches can provide insights for future SSTA research and can be extended to other deterministic optimizations.
Among the many deterministic optimization techniques, retiming [10] is one of the most powerful sequential transformations. Intuitively, one can apply retiming to improve the timing yield of a circuit under process variations, since relocating the FFs could balance the combinational paths. This idea was previously explored by Wang and Zhou [11], in which a heuristic algorithm was proposed by combining SSTA with a deterministic min-period retiming algorithm [12]. However, there is little theoretical guarantee that such a heuristic would result in a good retiming solution. In this paper, we study the statistical retiming problem on a sounder theoretical basis and propose a new heuristic algorithm to optimize the circuit for a better clock period distribution under process variations. Our contributions in this paper include:
1) We formulate the risk aversion min-period retiming problem for statistical retiming optimization. Our formulation is based on conventional two-stage stochastic programs with fixed recourse. A coherent measure of risk called conditional value-at-risk [13] is used as the objective function to be minimized. We prove that the proposed problem is an integer convex program by presenting a continuous convex relaxation of it.
2) We derive an analytical formula for the subgradient of the objective function based on the continuous relaxation. Compared to previous works [4], [8], where the subgradients are computed by perturbing the circuit and statistically evaluating the resulting changes multiple times, our approach is much more efficient. We can use any random gate delay model by sampling from a black box model representing the underlying variation model. Moreover, we show how current SSTA techniques can be improved to further speed up the subgradient computation and thus the statistical retiming optimization.
3) We extend the concept of timing critical paths, which is essential for deterministic retiming optimizations [10], [12], [14], to a statistical sense. We propose a practical simplification of the concept such that it can be efficiently integrated into our algorithm for the risk aversion min-period retiming problem.
4) We propose the Incremental Risk Aversion Retiming algorithm to solve the risk aversion min-period retiming problem heuristically, guided by the subgradient and the statistical timing critical paths.
The rest of this paper is organized as follows. The retiming problems, the two-stage stochastic programs, and the coherent measure of risk are introduced in Section II. We formulate the risk aversion min-period retiming problem in Section III and show that it is an integer convex program in Section IV. We propose our Incremental Risk Aversion Retiming algorithm in Section V. After experimental results are given in Section VI, Section VII concludes the paper.
* The research was conducted at Northwestern University and supported in part by NSF under CNS-0613967.
II. PRELIMINARIES

A. Deterministic Retiming Problems
For retiming, a synchronous sequential circuit is modeled by a directed graph G = (V, E) as in Leiserson and Saxe [10]. The vertices V represent combinational gates and the edges E represent signals between vertices. The gate delays are given as the nonnegative vertex weights d : V → R≥0. The numbers of FFs on the signals are given as the nonnegative edge weights w : E → N.
To guarantee that the circuit functionality is preserved after FF relocation, a retiming is given by a vertex labeling r : V → Z, which represents the number of FFs moved backward over each gate from its fanouts to its fanins. The FF number on the edge (u, v) after retiming is w_r(u, v) = w(u, v) + r(v) − r(u). The retiming r is valid iff the FF number of every edge remains nonnegative,
$$ w_r(u,v) = w(u,v) + r(v) - r(u) \ge 0, \quad \forall (u,v) \in E. \tag{1} $$
For a given valid retiming r, the retimed circuit works under a given clock period φ iff the maximum combinational path delay in the circuit is at most φ. In such a case, the retiming r is called feasible for φ. To compute the maximum path delay, arrival times t : V → R are introduced at the outputs of all the gates. The following constraints should be satisfied.
$$ \begin{aligned} t(v) &\ge d(v), && \forall v \in V, \\ t(v) &\ge t(u) + d(v), && \forall (u,v) \in E \text{ with } w_r(u,v) = 0, \\ t(v) &\le \phi, && \forall v \in V. \end{aligned} \tag{2} $$
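As a small illustration of Eq. (1) and Eq. (2), the following Python sketch computes the retimed edge weights, checks validity, and propagates arrival times over the combinational (zero-FF) edges in topological order to obtain the minimum clock period. The edge-list representation `(u, v, w)`, the function names, and the three-gate ring used below are hypothetical choices for illustration; the sketch assumes every cycle of the retimed circuit carries at least one FF.

```python
from collections import deque


def retimed_weights(edges, r):
    """Return (u, v, w_r) with w_r(u, v) = w(u, v) + r(v) - r(u)."""
    return [(u, v, w + r[v] - r[u]) for (u, v, w) in edges]


def is_valid(edges, r):
    """Eq. (1): a retiming is valid iff every retimed edge weight is >= 0."""
    return all(wr >= 0 for (_, _, wr) in retimed_weights(edges, r))


def min_clock_period(n, edges, d, r):
    """Smallest phi satisfying the arrival-time constraints of Eq. (2)."""
    fanin = [[] for _ in range(n)]    # combinational fanins after retiming
    fanout = [[] for _ in range(n)]
    pending = [0] * n                 # unprocessed combinational fanins
    for u, v, wr in retimed_weights(edges, r):
        if wr == 0:
            fanin[v].append(u)
            fanout[u].append(v)
            pending[v] += 1
    t = [0.0] * n
    ready = deque(v for v in range(n) if pending[v] == 0)
    visited = 0
    while ready:                      # topological order over zero-FF edges
        v = ready.popleft()
        visited += 1
        t[v] = d[v] + max((t[u] for u in fanin[v]), default=0.0)
        for x in fanout[v]:
            pending[x] -= 1
            if pending[x] == 0:
                ready.append(x)
    if visited != n:
        raise ValueError("combinational cycle: some cycle carries no FF")
    return max(t)


# Three-gate ring 0 -> 1 -> 2 -> 0 with two FFs on the edge (2, 0).
edges = [(0, 1, 0), (1, 2, 0), (2, 0, 2)]
d = [1.0, 2.0, 3.0]
print(min_clock_period(3, edges, d, [0, 0, 0]))                              # 6.0
print(is_valid(edges, [0, 0, 1]), min_clock_period(3, edges, d, [0, 0, 1]))  # True 3.0
print(is_valid(edges, [1, 0, 0]))                                            # False
```

Moving one FF backward over gate 2 (r(2) = 1) splits the path 0 → 1 → 2 and halves the clock period in this toy ring. The topological traversal keeps a single timing analysis linear in |V| + |E|.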
Conventionally, the min-period retiming problem asks for the minimum clock period for which a feasible retiming exists, and the min-area retiming problem asks for a feasible retiming for a given clock period that minimizes the total FF area. To solve both problems, it is helpful to investigate the timing critical paths. A timing critical path is a directed path connecting two vertices such that the total number of FFs along the path is the minimum among all the paths with the same endpoints, and the total delay along the path exceeds the desired clock period. For any pair of vertices connected by a timing critical path, a feasible retiming requires at least 1 FF along the path. However, because it is usually expensive and sometimes prohibitive to generate all such critical timing constraints, practically efficient algorithms [12], [14] identify those critical timing constraints from Eq. (2) only when they are required and organize them into proper data structures in order to guide the optimization and to assert optimality with a low storage overhead.
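The timing critical constraints can be illustrated with the classic W/D matrices of Leiserson and Saxe [10]: W(u, v) is the minimum FF count over paths from u to v, and D(u, v) is the maximum delay among those minimum-FF paths. The Python sketch below computes both with a Floyd-Warshall pass over lexicographically ordered pairs (FF count, negated delay) and lists the vertex pairs with D(u, v) > φ, each of which yields the constraint r(u) − r(v) ≤ W(u, v) − 1, i.e., at least 1 FF on the path after retiming. This all-pairs construction is exactly the expensive enumeration that the efficient algorithms [12], [14] avoid; it is meant only for small graphs, and the function names are hypothetical.

```python
def wd_matrices(n, edges, d):
    """Leiserson-Saxe W and D via Floyd-Warshall on pairs (FF count, -delay).

    Assumes every cycle carries at least one FF (no combinational cycles).
    """
    best = [[None] * n for _ in range(n)]
    for u in range(n):
        best[u][u] = (0, 0)
    for u, v, w in edges:
        cand = (w, -d[u])             # tuples compare lexicographically
        if best[u][v] is None or cand < best[u][v]:
            best[u][v] = cand
    for k in range(n):
        for i in range(n):
            if best[i][k] is None:
                continue
            for j in range(n):
                if best[k][j] is None:
                    continue
                cand = (best[i][k][0] + best[k][j][0],
                        best[i][k][1] + best[k][j][1])
                if best[i][j] is None or cand < best[i][j]:
                    best[i][j] = cand
    W = [[None if b is None else b[0] for b in row] for row in best]
    D = [[None if best[u][v] is None else d[v] - best[u][v][1]
          for v in range(n)] for u in range(n)]
    return W, D


def critical_constraints(n, W, D, phi):
    """Triples (u, v, c) meaning r(u) - r(v) <= c whenever D(u, v) > phi."""
    return [(u, v, W[u][v] - 1) for u in range(n) for v in range(n)
            if D[u][v] is not None and D[u][v] > phi]


edges = [(0, 1, 0), (1, 2, 0), (2, 0, 2)]
d = [1, 2, 3]
W, D = wd_matrices(3, edges, d)
cons = critical_constraints(3, W, D, phi=3)
r = [0, 0, 1]
print(all(r[u] - r[v] <= c for u, v, c in cons))  # True
```

The retiming r = (0, 0, 1) satisfies every critical constraint for φ = 3, while the identity retiming violates the one for the pair (0, 2).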
B. Two-Stage Stochastic Program with Fixed Recourse and Coherent Measure of Risk
A decision problem whose outcome depends not only on the decision itself but also on some uncertain parameters that are not available at the time of decision making is usually formulated as a two-stage stochastic program with fixed recourse [9], [15], [16], [17]. In such programs, the uncertain parameters unknown at the time of decision are modeled as random variables. As its name suggests, the program is separated into two stages. In the first stage, a decision is made and incurs an initial cost. In the second stage, the uncertain parameters are realized and a second-stage cost is determined from both the decision and the realized parameters through a known deterministic program, i.e., the fixed recourse. The objective of the program is to make a first-stage decision that minimizes the "total cost" of the two stages; as the outcome is random, such a cost may have many possible interpretations.
Let X be the random variable representing the outcome of the two-stage stochastic program. An interpretation can be formalized by introducing a measure of risk M[X] that maps X to a real number. Usually a random variable mapped to a smaller value is preferred to one mapped to a larger value. For example, a family of the most popular measures involves the mean and the standard deviation σ[X] of the random variable X,
$$ M_\lambda[X] = E[X] + \lambda\,\sigma[X], \quad \lambda \ge 0. $$
On the other hand, if X represents the random clock period of a circuit under process variations, then given a target clock period φ, one can measure X by the timing yield, i.e., Yield_φ[X] ≜ P(X ≤ φ). However, as pointed out by Rockafellar [13], the above measures are not favorable objectives for optimization because they are not coherent. A coherent measure of risk should satisfy the following conditions.
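Definition 1 (Coherent Measure of Risk [13]): A measure of risk M is coherent in the basic sense if,
1) M[C] = C for all constants C;
2) M[(1 − λ)X + λX′] ≤ (1 − λ)M[X] + λM[X′] for all λ ∈ [0, 1];
3) M[X] ≤ M[X′] when X ≤ X′ with probability 1;
4) M[X] ≤ 0 when ‖X_k − X‖₂ → 0 with M[X_k] ≤ 0 for all k;
5) M[λX] = λM[X] for all λ > 0.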
Intuitively, the first condition indicates that if a deterministic value is treated as a random variable taking a single value, the measure should interpret it as that deterministic value; the second condition requires the measure to be convex with respect to the random variables; the third condition ensures that the measure is monotonic, i.e., if one random variable is no smaller than another with probability 1, the measure of the former should be no smaller than that of the latter; the fourth condition guarantees that if a random variable can be approximated by other random variables, it is acceptable whenever all the approximations are acceptable; and the fifth condition implies that the measure scales proportionally with a positive scaling of the random variable.
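A two-point example shows why the mean-plus-standard-deviation measure E[X] + λσ[X] violates the third (monotonicity) condition. The Python sketch below treats each list as equally likely outcomes of a random variable; λ = 2 is an arbitrary illustrative choice. X ≤ X′ holds in every outcome, yet the measure ranks X as riskier than X′.

```python
import statistics


def mean_std_measure(outcomes, lam):
    """E[X] + lam * sigma[X] for equally likely outcomes."""
    return statistics.fmean(outcomes) + lam * statistics.pstdev(outcomes)


x = [0.0, 1.0]        # X is 0 or 1, each with probability 1/2
x_prime = [1.0, 1.0]  # X' is always 1, so X <= X' in every outcome
print(mean_std_measure(x, 2.0), mean_std_measure(x_prime, 2.0))  # 1.5 1.0
```

The variability of X is penalized more heavily than its smaller outcomes are rewarded, which is exactly the behavior that monotonicity rules out.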
Coherent measures of risk do exist. For example, the expectation E[X] of the random variable X is a coherent measure of risk, though it is too weak to capture the risk associated with X. A more interesting coherent measure of risk, as proposed by Rockafellar [13], is the conditional value-at-risk, which measures the risk in a random variable beyond a risk aversion level α. For a risk aversion level α, a measure of risk VaR_α[X] called value-at-risk is first defined as the value satisfying P(X ≤ VaR_α[X]) = α. Intuitively, the value-at-risk measure can be treated as the inverse of the timing yield measure: while the timing yield measure computes the risk aversion level (the timing yield) from a given value (the target clock period), the value-at-risk measure computes the value at a given risk aversion level. The value-at-risk measure is not coherent. The conditional value-at-risk measure, which is coherent, is defined as follows based on value-at-risk (for a continuously distributed X):
$$ \mathrm{CVaR}_\alpha[X] = E\big[\, X \mid X \ge \mathrm{VaR}_\alpha[X] \,\big]. $$
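One can prove that [13],
$$ \mathrm{CVaR}_\alpha[X] = \min_{C \in \mathbb{R}} \Big\{ C + \frac{1}{1-\alpha}\, E\big[(X - C)^+\big] \Big\}, \tag{3} $$
where (x)^+ = max{x, 0}, and the minimum is attained at C = VaR_α[X]. Therefore, minimizing the conditional value-at-risk has the advantage of optimizing both the value-at-risk and the tail beyond the value-at-risk.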
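With N equally likely samples, and assuming (1 − α)N is a positive integer k, the value-at-risk can be estimated by the k-th largest sample and the conditional value-at-risk by the mean of the k largest samples. The Python sketch below (hypothetical helper names) computes both estimates and checks numerically that C + E[(X − C)^+]/(1 − α) is minimized, over the sample values, at the estimated value-at-risk with minimum value equal to the estimated conditional value-at-risk.

```python
import math


def var_cvar(samples, alpha):
    """Empirical VaR and CVaR of equally likely samples (upper tail)."""
    xs = sorted(samples, reverse=True)
    k = round((1 - alpha) * len(xs))
    if k < 1 or not math.isclose(k, (1 - alpha) * len(xs)):
        raise ValueError("sketch assumes (1 - alpha) * N is a positive integer")
    return xs[k - 1], sum(xs[:k]) / k


def tail_objective(samples, alpha, c):
    """C + E[(X - C)^+] / (1 - alpha) for equally likely samples."""
    excess = sum(max(0.0, x - c) for x in samples) / len(samples)
    return c + excess / (1 - alpha)


samples = [float(v) for v in range(1, 11)]   # 1.0, 2.0, ..., 10.0
var, cvar = var_cvar(samples, 0.8)
print(var, cvar)                             # 9.0 9.5
print(tail_objective(samples, 0.8, var))     # approximately 9.5
print(min(tail_objective(samples, 0.8, c) for c in samples) >= cvar - 1e-9)  # True
```

Because the objective is convex and piecewise linear in C, its minimum is attained at a sample value when the samples are equally likely, so scanning the samples suffices for this check.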
III. PROBLEM FORMULATION
Under process variations, the delays of the gates in the circuit are no longer deterministic but random variables. Let Ω be the probability space representing process variations. For a particular variation ω ∈ Ω, assume that the random gate delays are realized as the deterministic nonnegative vertex weights d_ω : V → R≥0. For a valid retiming r, let φ_ω(r) be the minimum clock period of the retimed circuit under the variation ω. Following Section II-B, we formulate the following risk aversion min-period retiming problem as a two-stage stochastic program with fixed recourse.
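Problem 1 (Risk Aversion Min-Period Retiming):
Given a risk aversion level α, find an integer-valued vertex labeling r for the following program:
$$ \begin{aligned} \text{Minimize}\quad & \mathrm{CVaR}_\alpha[\phi_\omega(r)] \\ \text{s.t.}\quad & w(u,v) + r(v) - r(u) \ge 0, \quad \forall (u,v) \in E, \end{aligned} $$
where for every variation ω belonging to the probability space Ω, φ_ω(r) is the minimum objective of the following program,
$$ \begin{aligned} \text{Minimize}\quad & \phi \\ \text{s.t.}\quad & t(v) \ge d_\omega(v), && \forall v \in V, \\ & t(v) \ge t(u) + d_\omega(v), && \forall (u,v) \in E \text{ with } w_r(u,v) = 0, \\ & t(v) \le \phi, && \forall v \in V. \end{aligned} $$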
In the first stage of the risk aversion min-period retiming problem, a valid retiming is chosen with an initial cost of 0. In the second stage, when the random gate delays are realized as d_ω, the minimum clock period is computed through the fixed recourse by solving the second-stage program. The total cost is then evaluated by applying the coherent risk aversion measure, the conditional value-at-risk, to the resulting minimum clock period.
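The two stages can be mimicked by Monte Carlo sampling: fix a valid retiming r (first stage), draw gate delays d_ω, solve the second-stage timing problem for φ_ω(r), and summarize the samples with an empirical conditional value-at-risk. The Python sketch below assumes independent Gaussian gate delays truncated at zero purely as a stand-in for a variation model; the sampler, the function names, and the tail estimator (mean of the ⌈(1 − α)N⌉ largest samples) are illustrative assumptions.

```python
import math
import random


def clock_period(n, edges, d, r):
    """Second stage: longest path over zero-FF edges after retiming r."""
    comb = [(u, v) for (u, v, w) in edges if w + r[v] - r[u] == 0]
    t = list(d)
    for _ in range(n):                  # Bellman-Ford style relaxation
        changed = False
        for u, v in comb:
            if t[u] + d[v] > t[v]:
                t[v] = t[u] + d[v]
                changed = True
        if not changed:
            return max(t)
    raise ValueError("combinational cycle: some cycle carries no FF")


def estimate_cvar_period(n, edges, r, mu, sigma, alpha, num_samples, seed=0):
    """Empirical CVaR_alpha of phi_omega(r) under an assumed Gaussian model."""
    rng = random.Random(seed)
    phis = []
    for _ in range(num_samples):
        # Hypothetical variation model: independent truncated Gaussian delays.
        d = [max(0.0, rng.gauss(m, s)) for m, s in zip(mu, sigma)]
        phis.append(clock_period(n, edges, d, r))
    k = max(1, math.ceil((1 - alpha) * num_samples - 1e-9))
    tail = sorted(phis, reverse=True)[:k]
    return sum(tail) / k, phis


edges = [(0, 1, 0), (1, 2, 0), (2, 0, 2)]
mu, sigma = [1.0, 2.0, 3.0], [0.2, 0.4, 0.6]
cvar, phis = estimate_cvar_period(3, edges, [0, 0, 1], mu, sigma, 0.9, 1000)
print(round(cvar, 2), round(sum(phis) / len(phis), 2))
```

The tail mean is never below the sample mean, which reflects how the conditional value-at-risk penalizes slow realizations of the clock period more than the expectation does.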
IV. A CONVEX RELAXATION
Note that the proposed risk aversion min-period retiming problem is difficult, not only because r must be integer-valued, but also because the second-stage program is not a mathematical program in the usual sense: which edges contribute constraints depends on r through the condition w_r(u, v) = 0. To overcome this difficulty, we propose to relax the program before attempting to solve it.
A. Continuous Relaxation Formulation
Consider an arbitrary simple path p from u to v, i.e., a path without cycles, in the circuit graph G. Let the total number of FFs along the path be w(p), and let the total path delay be d_ω(p) for a particular ω ∈ Ω. Since the FFs on p in the retimed circuit divide p into w_r(p) + 1 combinational segments, each with delay at most φ_ω(r), the minimum clock period for a valid retiming r should satisfy
$$ \phi_\omega(r) \ge \frac{d_\omega(p)}{w_r(p) + 1}, $$
where w_r(p) = w(p) + r(v) − r(u) is the total number of FFs along the path p in the retimed circuit. Moreover, because r is valid, we have w_r(p) ≥ 0. On the other hand, there must exist a combinational path p* in the retimed circuit with the maximum combinational path delay. Such a path p* satisfies w_r(p*) = 0 and d_ω(p*) = φ_ω(r), so the above bound holds with equality. Therefore, we have the following lemma, which transforms the second-stage program into a mathematical program by enumerating paths.
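Lemma 1: For a valid retiming r, the minimum clock period for a particular ω ∈ Ω can be computed as
$$ \phi_\omega(r) = \max_{p} \frac{d_\omega(p)}{w_r(p) + 1}, \tag{4} $$
where p ranges over all simple paths in G.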
Note that although the second-stage program as formulated in Lemma 1 is a mathematical program, its size is exponential in the size of the circuit graph G, while the size of the second-stage program as formulated in Problem 1 is linear. As the mathematical program formulation will only be applied to theoretical analysis, its size is not a concern for practical implementations. Based on Eq. (4), we can relax the requirement that r be integer-valued by extending φ_ω(r) to real-valued r. First, we define a real-valued r to be valid iff w_r(u, v) ≥ 0 holds for every edge (u, v) ∈ E. Then for any simple path p from u to v in G, it remains true that w_r(p) ≥ 0. Therefore, for any valid real-valued r, we can define φ_ω(r) using Eq. (4). In summary, we have the following continuous relaxation of the risk aversion min-period retiming problem.
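For small graphs, the path-enumeration formula can be evaluated directly, which also makes the continuous relaxation concrete: the same expression accepts real-valued labels r. The Python sketch below enumerates simple paths by depth-first search (exponential in general, so only for tiny illustrative graphs) and evaluates max_p d(p)/(w_r(p) + 1), the form assumed here for Eq. (4). The function names and the example ring are hypothetical.

```python
def simple_paths(n, edges):
    """Yield (path, w(p)) for every simple path, including single vertices."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))

    def extend(path, w_sum):
        yield path, w_sum
        for v, w in adj[path[-1]]:
            if v not in path:           # keep the path simple
                yield from extend(path + [v], w_sum + w)

    for s in range(n):
        yield from extend([s], 0)


def period_by_paths(n, edges, d, r):
    """max over simple paths p of d(p) / (w_r(p) + 1), for valid (real) r."""
    best = 0.0
    for path, w_p in simple_paths(n, edges):
        u, v = path[0], path[-1]
        w_r = w_p + r[v] - r[u]         # FFs on p after retiming
        best = max(best, sum(d[x] for x in path) / (w_r + 1))
    return best


edges = [(0, 1, 0), (1, 2, 0), (2, 0, 2)]
d = [1.0, 2.0, 3.0]
print(period_by_paths(3, edges, d, [0, 0, 0]))    # 6.0
print(period_by_paths(3, edges, d, [0, 0, 1]))    # 3.0
print(period_by_paths(3, edges, d, [0, 0, 0.5]))  # 4.0
```

The integer cases agree with the arrival-time computation of Eq. (2) on the same ring, and the value 4.0 at the midpoint r = (0, 0, 0.5) lies below the average 4.5 of the two endpoint values, as convexity requires.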
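Problem 2: Given a risk aversion level α, find a real-valued vertex labeling r for the following program:
$$ \begin{aligned} \text{Minimize}\quad & \mathrm{CVaR}_\alpha[\phi_\omega(r)] \\ \text{s.t.}\quad & w(u,v) + r(v) - r(u) \ge 0, \quad \forall (u,v) \in E, \end{aligned} $$
where for every variation ω belonging to the probability space Ω, φ_ω(r) is defined by Eq. (4).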
B. Convexity of Formulation
A very important property of Problem 2 is that it is a convex program. The proof is as follows.
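Let r and r′ both be valid and real-valued. For a particular ω ∈ Ω, assume that the maximum in Eq. (4) for r is attained by the simple path p_ω from u_ω to v_ω, i.e.,
$$ \phi_\omega(r) = \frac{d_\omega(p_\omega)}{w_r(p_\omega) + 1}. $$
Let the vertex labeling s_ω : V → {−1, 0, 1} be that s_ω(u_ω) = 1, s_ω(v_ω) = −1, and s_ω(x) = 0 for any other x ∈ V, so that w_{r′}(p_ω) − w_r(p_ω) = −Σ_{x∈V} s_ω(x)(r′(x) − r(x)). On the other hand, by Eq. (4) we should have φ_ω(r′) ≥ d_ω(p_ω)/(w_{r′}(p_ω) + 1).
Given that 1/(y + 1) is convex for y > −1, i.e., 1/(y′ + 1) ≥ 1/(y + 1) − (y′ − y)/(y + 1)², we obtain
$$ \phi_\omega(r') \ge \phi_\omega(r) + g_\omega \cdot (r' - r), \qquad g_\omega = \frac{d_\omega(p_\omega)}{(w_r(p_\omega) + 1)^2}\, s_\omega. $$
Thus, let C* = VaR_α[φ_ω(r)] and assume P(φ_ω(r) ≥ C*) = 1 − α, which holds when φ_ω(r) is continuously distributed. Since (y)^+ ≥ 1{A}·y for any event A, for any C ∈ R we have
$$ \begin{aligned} C + \tfrac{1}{1-\alpha} E\big[(\phi_\omega(r') - C)^+\big] &\ge C + \tfrac{1}{1-\alpha} E\big[\mathbf{1}\{\phi_\omega(r) \ge C^*\}\,\big(\phi_\omega(r) + g_\omega \cdot (r' - r) - C\big)\big] \\ &= C^* + \tfrac{1}{1-\alpha} E\big[(\phi_\omega(r) - C^*)^+\big] + g_r \cdot (r' - r), \end{aligned} $$
where g_r is given by Eq. (9) below. Minimizing the left-hand side over C and applying Eq. (3) to both sides yields CVaR_α[φ_ω(r′)] ≥ CVaR_α[φ_ω(r)] + g_r · (r′ − r). Therefore, the following lemma must hold.
Lemma 2: Let r be valid and real-valued with P(φ_ω(r) ≥ VaR_α[φ_ω(r)]) = 1 − α, and let
$$ g_r = \frac{1}{1-\alpha}\, E\Big[\mathbf{1}\{\phi_\omega(r) \ge \mathrm{VaR}_\alpha[\phi_\omega(r)]\}\; g_\omega\Big]. \tag{9} $$
Then g_r is a subgradient of CVaR_α[φ_ω(r)].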
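The per-sample inequality behind Lemma 2 can be checked numerically. Assuming the path-enumeration form φ_ω(r) = max_p d_ω(p)/(w_r(p) + 1), the Python sketch below takes a maximizing path from u_ω to v_ω, forms g_ω = d_ω(p_ω)/(w_r(p_ω) + 1)² · s_ω, and verifies φ_ω(r′) ≥ φ_ω(r) + g_ω · (r′ − r) at random valid real-valued r′ near r. The names, the random perturbation range, and the test graph are illustrative assumptions.

```python
import random


def simple_paths(n, edges):
    """Yield (path, w(p)) for every simple path, including single vertices."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))

    def extend(path, w_sum):
        yield path, w_sum
        for v, w in adj[path[-1]]:
            if v not in path:
                yield from extend(path + [v], w_sum + w)

    for s in range(n):
        yield from extend([s], 0)


def period_and_subgradient(n, edges, d, r):
    """phi(r) = max_p d(p)/(w_r(p)+1) and g = d(p*)/(w_r(p*)+1)^2 * s."""
    best, arg = -1.0, None
    for path, w_p in simple_paths(n, edges):
        u, v = path[0], path[-1]
        w_r = w_p + r[v] - r[u]
        d_p = sum(d[x] for x in path)
        if d_p / (w_r + 1) > best:
            best, arg = d_p / (w_r + 1), (u, v, w_r, d_p)
    u, v, w_r, d_p = arg
    g = [0.0] * n
    g[u] += d_p / (w_r + 1) ** 2        # s(u) = +1
    g[v] -= d_p / (w_r + 1) ** 2        # s(v) = -1 (cancels if u == v)
    return best, g


def is_valid(edges, r):
    return all(w + r[v] - r[u] >= 0 for u, v, w in edges)


edges = [(0, 1, 0), (1, 2, 0), (2, 0, 2)]
d = [1.0, 2.0, 3.0]
r = [0.0, 0.0, 1.0]
phi, g = period_and_subgradient(3, edges, d, r)
rng = random.Random(7)
checked = 0
for _ in range(500):
    r2 = [x + rng.uniform(-0.5, 0.5) for x in r]
    if not is_valid(edges, r2):
        continue                        # stay inside the relaxed domain
    phi2, _ = period_and_subgradient(3, edges, d, r2)
    lin = phi + sum(gi * (b - a) for gi, a, b in zip(g, r, r2))
    assert phi2 >= lin - 1e-12          # subgradient inequality
    checked += 1
print(phi, g, checked > 0)
```

At an integer retiming the maximizing path carries no FF, so the coefficient reduces to φ_ω(r) itself; here g = (3, −3, 0) for the critical path 0 → 1.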
As the set of all the valid real-valued r is convex, we have the following theorem according to Lemma 2.
Theorem 1: Problem 2 is a convex program. The optimal solution to the risk aversion min-period retiming problem is an integer optimal solution to Problem 2.
V. INCREMENTAL ALGORITHM FOR RISK AVERSION MIN-PERIOD RETIMING
We have shown in Section IV that the risk aversion min-period retiming problem is an integer convex program and derived a subgradient of the objective function. Intuitively, such a subgradient can be used to guide iterative heuristic search. In this section, we first show how to compute the subgradient in practice and then present a heuristic algorithm based on the idea of incremental retiming.
A. Computing Subgradient from Black Box Model
According to Lemma 2, the subgradient of the objective function can be computed as in Eq. (9). How to compute such a subgradient in practice depends on how the probability space Ω and the random gate delays are specified. There are two typical models: in the first model, the joint distribution of the gate delays is explicitly given; in the second model, one can only obtain knowledge of the distribution by drawing independent samples from a black box. In this paper, we are interested in the latter black box model because it is independent of the underlying distribution, so our proposed algorithm can handle arbitrary variation models. Moreover, since the subgradient is computed from each sample drawn from the black box model, established deterministic analysis frameworks can be reused. One may be concerned about the efficiency of algorithms relying on the black box model because of the multiple samplings. However, since the subgradient is only used to guide the optimization, absolute accuracy is not necessary and a limited number of samples will suffice for effective optimization. Note that in the case of the former model, where the distribution is explicitly given, it would be helpful if current SSTA techniques could be extended to compute the subgradient according to Eq. (9) efficiently and accurately. Such extensions are out of the scope of this paper and are left as one of the future directions of SSTA.
Fig. 1. The ComputeSubgrad subroutine.
Since the risk aversion min-period retiming problem requires an integer solution of Problem 2, we maintain an integer solution throughout our algorithm. Therefore, subgradients need to be computed only at integer solutions. Let r be a valid retiming. Eq. (9) suggests that the subgradient can be approximated by averaging the corresponding values over individual samples. Suppose that N samples ω_i, i = 1, 2, ..., N, are independently drawn from the black box. We design the ComputeSubgrad subroutine as shown in Fig. 1 to obtain an approximation ĝ_r of g_r by averaging over the samples. In this subroutine, after the samples are drawn, we perform timing analysis on line 3 for each sample to determine the minimum clock period and the arrival times according to Eq. (2). At the same time, we maintain a vertex labeling q(v) to record the source of the critical path to each vertex v. Then, the combinational path with the maximum path delay is identified implicitly on line 4 by its endpoints u_i and v_i. Note that the path delay should be φ_{ω_i} and there is no FF along the path in the retimed circuit. An approximation of VaR_α[φ_ω] is obtained on line 5. Finally, in the loop on line 7, we compute an approximation ĝ_r of the subgradient g_r according to Eq. (9).
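A Python sketch of the ComputeSubgrad procedure described above follows. It is a reconstruction from the prose, not the exact pseudocode of Fig. 1, and several details are assumptions: the sampler is a hypothetical black box, the value-at-risk estimate is taken as the ⌈(1 − α)N⌉-th largest sampled period, and each tail sample contributes φ_{ω_i}/((1 − α)N) at the source u_i and the negative of that at the sink v_i, which is our reading of the sampled form of Eq. (9) at an integer retiming. Arrival times are propagated with a simple relaxation loop that also records the critical-path source q(v).

```python
import math
import random


def timing_with_source(n, edges, d, r):
    """Arrival times t(v) and critical-path sources q(v) for retiming r."""
    comb = [(u, v) for (u, v, w) in edges if w + r[v] - r[u] == 0]
    t, q = list(d), list(range(n))      # each vertex starts its own path
    for _ in range(n):
        changed = False
        for u, v in comb:
            if t[u] + d[v] > t[v]:
                t[v], q[v] = t[u] + d[v], q[u]
                changed = True
        if not changed:
            return t, q
    raise ValueError("combinational cycle: some cycle carries no FF")


def compute_subgrad(n, edges, r, sample_delays, num_samples, alpha):
    """Sample-based approximation of the CVaR subgradient at integer r."""
    phis, ends = [], []
    for _ in range(num_samples):
        d = sample_delays()                       # one black-box sample
        t, q = timing_with_source(n, edges, d, r)
        v = max(range(n), key=t.__getitem__)      # sink of a critical path
        phis.append(t[v])
        ends.append((q[v], v))                    # (source u_i, sink v_i)
    k = max(1, math.ceil((1 - alpha) * num_samples - 1e-9))
    var_est = sorted(phis, reverse=True)[k - 1]   # estimate of VaR_alpha
    scale = 1.0 / ((1 - alpha) * num_samples)
    g = [0.0] * n
    for phi, (u, v) in zip(phis, ends):
        if phi >= var_est:                        # tail samples only
            g[u] += phi * scale
            g[v] -= phi * scale
    return g, var_est


edges = [(0, 1, 0), (1, 2, 0), (2, 0, 2)]
mu = [1.0, 2.0, 3.0]
rng = random.Random(1)


def sampler():
    # Hypothetical black box: independent truncated Gaussian gate delays.
    return [max(0.0, rng.gauss(m, 0.1)) for m in mu]


g_hat, var_est = compute_subgrad(3, edges, [0, 0, 0], sampler, 500, 0.9)
print([round(x, 2) for x in g_hat], round(var_est, 2))
```

Each sample costs one timing analysis, and the subgradient only touches the two endpoints of each tail sample's critical path, which is why the cost is dominated by the per-sample timing analyses.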
Note that many previous statistical optimization works, e.g., [4], [8], employed a different approach to approximate such a subgradient. For each decision variable, these approaches first perturb the variable and then approximate the corresponding subgradient component by the change of the objective function under the perturbation. Such an approach incurs a large runtime overhead for two reasons. First, although decision variables whose perturbation does not change the objective function can be excluded from the computation, the number of decision variables that must still be perturbed grows with the circuit size. Second, evaluating the objective function usually requires expensive SSTA algorithms. In contrast, our analytical formula for the subgradient in Eq. (9) allows us to compute the subgradient relatively efficiently by sampling from a black box model. Moreover, as mentioned before, the efficiency of our approach can be further improved by future SSTA research.
B. Statistical Timing Critical Paths
An intuitive idea for optimization is to iteratively improve a valid retiming r following the subgradient obtained by the ComputeSubgrad subroutine, by solving the following problem for r′.
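$$ \begin{aligned} \text{Minimize}\quad & \mathrm{CVaR}_\alpha[\phi_\omega(r)] + \hat g_r \cdot (r' - r) \\ \text{s.t.}\quad & w(u,v) + r'(v) - r'(u) \ge 0, && \forall (u,v) \in E, \\ & |r'(v) - r(v)| \le 1, && \forall v \in V. \end{aligned} \tag{10} $$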
In this problem, the objective function is a first-order approximation of CVaR_α[φ_ω(r′)]. As this first-order approximation becomes inaccurate when r′ is far away from r, we require the difference between r and r′ to be at most 1. Because the constraints form a system of difference inequalities, this problem can be solved by network-flow techniques, and it is not necessary to round a non-integer solution to an integer one to obtain a valid retiming, since there always exists an integer-valued optimal solution.
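The difference-inequality structure can be made concrete. Writing x(v) = r′(v) − r(v) and adding a reference variable z, validity becomes x(u) − x(v) ≤ w_r(u, v), the step bound becomes x(v) − z ≤ 1 and z − x(v) ≤ 1, and a requirement that a path p from u to v carry at least 1 FF becomes x(u) − x(v) ≤ w_r(p) − 1. The Python sketch below builds these constraints and checks feasibility with Bellman-Ford, which returns an integer solution whenever all bounds are integers. It only illustrates feasibility and integrality: without path constraints x = 0 (r′ = r) is trivially feasible, so the example adds one hypothetical path requirement, and minimizing the linear objective (via the min-cost flow dual) is not shown. All names are hypothetical.

```python
def solve_difference_constraints(num_vars, constraints):
    """Feasible x with x[j] - x[i] <= c for every (i, j, c), or None.

    Bellman-Ford from a virtual source joined to every variable by a
    zero-weight edge; integer bounds give an integer solution.
    """
    dist = [0] * num_vars
    for _ in range(num_vars):
        changed = False
        for i, j, c in constraints:
            if dist[i] + c < dist[j]:
                dist[j] = dist[i] + c
                changed = True
        if not changed:
            return dist
    return None                           # negative cycle: infeasible


def step_constraints(n, edges, r, critical_paths=()):
    """Variables 0..n-1 are x(v) = r'(v) - r(v); variable n is z."""
    cons = []
    for u, v, w in edges:                 # validity of r'
        cons.append((v, u, w + r[v] - r[u]))          # x(u) - x(v) <= w_r(u, v)
    for v in range(n):                    # step bound |x(v)| <= 1
        cons.append((n, v, 1))            # x(v) - z <= 1
        cons.append((v, n, 1))            # z - x(v) <= 1
    for u, v, w_p in critical_paths:      # path u ~> v must keep >= 1 FF
        cons.append((v, u, w_p + r[v] - r[u] - 1))    # x(u) - x(v) <= w_r(p) - 1
    return cons


edges = [(0, 1, 0), (1, 2, 0), (2, 0, 2)]
r = [0, 0, 0]
# Hypothetical requirement: the path 0 -> 1 -> 2 (w(p) = 0) must get an FF.
sol = solve_difference_constraints(4, step_constraints(3, edges, r, [(0, 2, 0)]))
x = [sol[v] - sol[3] for v in range(3)]   # shift so that z = 0
r_new = [a + b for a, b in zip(r, x)]
print(x, r_new)  # [-1, 0, 0] [-1, 0, 0]
```

Here the solver moves one FF forward over gate 0 (r′(0) = −1), which places an FF on the edge (0, 1) while keeping every edge weight nonnegative.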
However, this intuitive idea does not perform well in practice. The reason is that even changing r(v) by 1 for a single vertex v can result in a large change in the minimum clock period. Cutting-plane techniques, as in the statistical gate sizing work [8], can be applied to form a more accurate approximation. However, such techniques no longer guarantee the existence of an integer-valued optimal solution and may require a heuristic to round a non-integer optimal solution. Therefore, they cannot be applied directly to our retiming problem.
We propose to overcome this difficulty by introducing the concept of statistical timing critical paths. These paths can be treated as a natural extension of the deterministic timing critical paths mentioned in Section II-A to the statistical sense. Let r be the current valid retiming. Consider a simple path p in G. (10), i.e., they are a system of difference inequalities. Therefore, the existence of an integer-valued optimal solution is still guaranteed.
C. Incremental Risk Aversion Retiming Algorithm
One difficulty with the constraints in Eq. (11) is that the risk measure must be computed for many simple paths, which could be inefficient in practice. We propose to simplify this computation in our implementation by identifying similar paths through deterministic timing analysis. Let d(v) = E[d_ω(v)] be the nominal delay of each gate v. For a simple path p, let d(p) be the nominal path delay with respect to the nominal gate delays d. For a given valid retiming r, let φ(r) be the nominal minimum clock period, i.e., the minimum clock period with respect to d. Then, we regard a simple path as a statistical timing critical path if d(p) > βφ(r), where β ≥ 1 is a parameter specified by the designer. In summary, given a valid retiming r, we propose to solve the following incremental retiming problem to obtain another valid retiming r′ that improves the conditional value-at-risk measure of risk.
Problem 3:
In Problem 3, since path enumeration is required to construct the constraints, the number of constraints can be quadratic in the number of vertices, i.e., Θ(|V|²). This may impose a huge storage and runtime overhead if Problem 3 is solved directly. However, we can treat Problem 3 as a special min-area retiming problem and apply the recently proposed incremental min-area retiming algorithm iMinArea [14] to solve it. The iMinArea algorithm requires only O(|V|) storage on top of the circuit graph G and is efficient in practice. Let ĝ_r(v) represent the increase in FF area when 1 FF is moved from the fanouts of v to its fanins. It is straightforward to see that the given valid retiming r is feasible for the clock period βφ(r) with respect to the nominal gate delays. Then, Problem 3 actually asks for a set of vertices I such that the retiming r′, obtained by moving 1 FF from the fanouts of each vertex in I to its fanins, is a feasible retiming for the clock period βφ(r) with the minimum FF area. Because only 1 FF is allowed to move, it is not necessary to run the iMinArea algorithm to completion. We adapt the iMinArea algorithm in our IncreRetime subroutine as shown in Fig. 2 to solve Problem 3. The details of the iMinArea algorithm can be found in [14]. The following lemma states the correctness of the IncreRetime subroutine.
Fig. 2. The IncreRetime subroutine. Inputs: G, the circuit graph; d, the nominal gate delays; r, a valid retiming; ĝ_r, the approximation of the subgradient; β, a designer-specified parameter. Output: the optimal solution r′ of Problem 3.
Based on the above discussions, we design the Incremental Risk Aversion Retiming algorithm as shown in Fig. 3 to solve the risk aversion min-period retiming problem. In this algorithm, starting from a given initial valid retiming, we iteratively improve the current solution by first computing a subgradient on line 5 and then moving to the next solution on line 6 by solving Problem 3. The iteration stops when the current retiming cannot be improved, as detected on line 7, or when a maximum number R of iterations has been performed. The retiming solution with the best conditional value-at-risk measure of risk is picked at the end of the algorithm.
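For intuition only, the incremental step as described above (choose a vertex set I, move 1 FF from the fanouts of each vertex in I to its fanins, keep the retiming valid and feasible for βφ(r) under the nominal delays, and minimize the total ĝ_r over I) can be solved by exhaustive search on a tiny graph. The Python sketch below does exactly that; it is exponential in |V| and is not the iMinArea-based IncreRetime subroutine, whose details are in [14]. The function names and the example subgradient values are hypothetical.

```python
def nominal_period(n, edges, d, r):
    """Longest zero-FF path after retiming r (nominal gate delays d)."""
    comb = [(u, v) for (u, v, w) in edges if w + r[v] - r[u] == 0]
    t = list(d)
    for _ in range(n):
        changed = False
        for u, v in comb:
            if t[u] + d[v] > t[v]:
                t[v] = t[u] + d[v]
                changed = True
        if not changed:
            return max(t)
    raise ValueError("combinational cycle: some cycle carries no FF")


def brute_force_step(n, edges, d, r, g_hat, beta):
    """Cheapest I (enumerated as bitmasks) with r' = r + 1_I feasible."""
    limit = beta * nominal_period(n, edges, d, r)
    best_cost, best_r = 0.0, list(r)    # I = empty set keeps r itself
    for mask in range(1, 1 << n):
        r_new = [r[v] + ((mask >> v) & 1) for v in range(n)]
        if any(w + r_new[v] - r_new[u] < 0 for u, v, w in edges):
            continue                    # not a valid retiming
        if nominal_period(n, edges, d, r_new) > limit:
            continue                    # violates the period beta * phi(r)
        cost = sum(g_hat[v] for v in range(n) if (mask >> v) & 1)
        if cost < best_cost:
            best_cost, best_r = cost, r_new
    return best_cost, best_r


edges = [(0, 1, 0), (1, 2, 0), (2, 0, 2)]
d = [1.0, 2.0, 3.0]
g_hat = [3.0, 0.0, -3.0]                # hypothetical subgradient estimate
print(brute_force_step(3, edges, d, [0, 0, 0], g_hat, 1.01))  # (-3.0, [0, 0, 1])
```

The empty set is always feasible because r itself meets βφ(r) for β ≥ 1, so the search never returns a worse objective than staying put.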
VI. EXPERIMENTS
We obtain the code of the deterministic incremental min-period retiming algorithm [12] and build a risk-aware deterministic approach for comparison with our Incremental Risk Aversion Retiming algorithm. In this approach, we first assign each gate a deterministic delay derived from the gate delay distribution and a parameter γ specified by the designer. Then we run Zhou's algorithm [12] for min-period retiming to obtain a solution. For a gate v, the deterministic gate delay is E[d_ω(v)] + γ·σ[d_ω(v)], a weighted sum of the nominal delay and the standard deviation. Note that this deterministic approach is similar to the "Alternative Algorithm" proposed in [11].
We implement our Incremental Risk Aversion Retiming algorithm in C++. All code is compiled by GCC version 3.4 and run on a Linux workstation with dual 927MHz Intel Pentium III processors and 512MB of memory.
We derive our experimental benchmarks from the conventional ISCAS89 sequential circuits. To establish a gate delay model for process variations, we assume a joint Gaussian distribution of the gate delays. The parameters of the distribution are determined as follows. First, we assign each gate a nominal delay proportional to the number of its fanouts and a standard deviation between 20% and 30% of the nominal value. Then, assuming that each gate has a dimension of 1 × 1, we perform a wire-length driven placement of the circuits using the placement tool mPL6 [18]. After placement, the chip area is divided into a 4 × 4 grid. Two gate delays are assumed to be perfectly correlated if they are within the same grid block, i.e., their correlation coefficient is 1. Otherwise, the correlation coefficient of two gate delays is set to be inversely proportional to the distance between the centers of the grid blocks that the two gates belong to. We assume a risk aversion level of α = 0.9. For each benchmark, we first perform three deterministic optimizations with the parameter γ = 0, 1, and 3 and obtain three solutions. Then we run our Incremental Risk Aversion Retiming algorithm with the initial retiming being the solution obtained from the deterministic optimization with γ = 1. Our algorithm is allowed to run for at most R = 50 iterations. The other parameters are N = 500 and β = 1.01. The conditional value-at-risk measure of the clock period for each solution is evaluated by Monte Carlo analysis with 10000 samples to ensure accuracy. The results are reported in Table I. The statistics of the circuits are reported in the columns "|V|" and "|E|". Under the column "Deterministic Approach", we report the conditional value-at-risk measure of the clock period for the original circuit before retiming in the column "init", and those of the three solutions obtained by the deterministic optimizations in the columns "γ = 0", "γ = 1", and "γ = 3".
The column "best" shows the best of the previous 4 columns, which is the best solution that one can get through the deterministic approach. The runtimes of the deterministic approach are all within 1 second and are thus not reported here. The results from our algorithm are reported under the column "Ours". The conditional value-at-risk measure of the clock period is reported in the column "CVaR", and the improvement in percentage over the column "best" is reported in the column "impr.". The number of iterations performed is reported in the column "# R" and the runtime in seconds in the column "t(s)". Note that for most of the benchmarks, computing the subgradient takes more than 90% of the runtime. It can be seen from the table that our algorithm improves the solution quality for almost every benchmark circuit, by up to 8%, within reasonable runtimes.
In addition, we compare the solutions in terms of the timing yield and report the results in Table II . The target clock periods are determined such that the solution obtained by our algorithm will have a timing yield of 90%. This table shows that the timing yield can be effectively improved by optimizing the conditional value-at-risk measure.
VII. CONCLUSIONS
In this paper, we formulated the risk aversion min-period retiming problem to optimize the clock period of a circuit under process variations. The formulation is based on a conventional two-stage stochastic program with fixed recourse and a risk aversion objective. We proved that the proposed problem is an integer convex program. We gave an analytical formula for the subgradient of the objective function and proposed to compute an approximation of the subgradient by sampling from a black box. We presented a heuristic incremental algorithm to solve the proposed problem, and the effectiveness of our proposed approach is confirmed by the experimental results.
