This paper presents a new concept of potential slack to measure so-called potential performance of digital circuits. Potential means how much improvement could be made in the future in terms of timing, area and power dissipation. Predicting potential performance helps the circuit designers make good design decisions at a specific level of abstraction. We describe two algorithms for potential slack:
Introduction
With the growing demand for small size, high speed and low cost of today's digital systems, the circuit design engineers have been working towards three major design objectives: circuit area, timing and power dissipation. These objectives measure the quality, and hence can be referred to as the performance, of digital circuits. Typically, one can use specific techniques either to design the fast circuits within a given area and/or power budget, or to minimize their area and/or power dissipation under certain delay constraints. In general, because of the very nature of the problem, faster circuits suffer from higher area/power penalty. A design process consists of many intermediate design stages, and the true (or final) performance of the circuits is unknown until the final stage where they are fabricated on the silicon chips. At gate level, for example, when the minimum-area technology mapping is performed on a Boolean network to obtain the gate-level implementation, the final area cost is still not available, for it can be further improved using such techniques as gate sizing and voltage scaling.
There are two types of performance: immediate and potential. The former, as its name implies, can be obtained immediately by existing estimation techniques, while the latter represents how much the performance could still be improved through future optimization steps. Designers at high levels of abstraction have no idea about "potential" performance without going through subsequent stages. Rather than looking at immediate performance, on which most prior research focuses, this work investigates circuits' potential performance, which is of unique importance during early stages of design. Indeed, the true performance of circuits is determined by the combination of their immediate performance and potential performance.
Intuitively, for timing-constrained design problems, potential performance is strongly related to the non-critical parts of circuits, as the delay penalty caused by area/power improvement on these parts will not necessarily degrade the circuit timing. Typically, the non-critical parts can be characterized by the timing slack (or slack, for brevity) of their constituent logic modules, where the slack of each module is the difference between its required time and its arrival time. Experience shows that circuit implementations with the same delay, area and/or power can have totally different slacks, which can contribute to potential performance improvement. Unfortunately, traditional optimization approaches do not deal with slack directly. This motivates us to take a closer look at slack issues and to devise an effective way of measuring circuits' potential performance based on slack. In typical optimization tradeoff curves of delay versus area (or power), a logic module with smaller area/power has longer delay. As a result, the module's slack provides an upper bound on its delay increase without violating the timing constraints, and hence represents the "potential" capability of obtaining area/power reduction. It should be noted, however, that this potential capability of the whole circuit cannot be characterized simply by summing all module slacks in the circuit, as will be seen later in the paper.
The main contribution of this work is to provide an effective way of predicting circuits' potential performance without having to go through the physical optimization stages at lower levels. Combined with the immediate performance estimation techniques on which most prior work focused, potential performance prediction enables efficient and effective design space exploration, which is highly desirable at higher levels. Fig. 1 compares the traditional design flow with our new design flow based on potential performance prediction. While this paper uses gate-level applications for discussion, the idea of potential performance prediction can be extended readily to other levels of abstraction (e.g., behavioral level) as long as the underlying problems are described by directed acyclic graphs (see the next section).
A well-known technique for slack management is the Zero-Slack Algorithm (ZSA) [1]. The basic idea is to identify the modules with minimum positive slack in a circuit, and then assign delay budgets to them such that their slacks become zero. The goal is to generate performance constraints for layout design. Improved versions of the ZSA for placement applications have also been proposed, e.g., in [2, 3]. The ZSA has been regarded as the most effective approach to managing slack, budgeting delay and generating performance constraints. However, this is not true in general. In particular, the results of the ZSA are not accurate enough for potential performance estimation, as will be shown later.
Intuitively, the slack which can be used potentially for optimizing a circuit is called potential slack (its exact definition will be given in the next section). In this paper, we show that potential slack is an effective metric of a circuit's potential performance. We first explore several ways to estimate potential slack. An optimal, polynomial-time algorithm is then presented to obtain the potential slack. To speed up the process, we also propose a greedy algorithm which provides fast prediction of potential slack. Applications to gate sizing and placement problems show that potential slack measures the potential performance of circuits very well.
The rest of the paper is organized as follows. In Section 2, we present some definitions and background. Section 3 describes the algorithms for estimating potential slack of given circuits. Section 4 discusses the fast algorithm for potential slack prediction, and Section 5 shows applications to gate sizing and placement problems. Finally, we report experimental results in Section 6, and conclude the paper in Section 7. 
Preliminaries
Consider a combinational circuit which consists of a set of modules V = {v_1, v_2, …, v_n} and a set of nets E = {e_1, e_2, …, e_l} (at gate level, modules correspond to gates). Naturally, we associate the circuit with a Directed Acyclic Graph (DAG), where a node denotes a module, and there is an edge from node v_i to node v_j if the output of v_i is an input of v_j. Also, each node v ∈ V is associated with a delay d(v), which stands for the delay of node v itself plus the delay of the interconnections it drives (note that this node delay differs from the traditional definition, where the interconnect delay was not captured in d(v)).
For convenience, suppose that the arrival times for all primary inputs are zero and that the required times for all primary outputs are the specified timing requirements. A well-known procedure computes the arrival time a(v) and required time r(v) for each node v ∈ V recursively.

Since s(v_i) is an upper bound of the incremental delay for node v_i, any possible effective slack is no more than the total slack. Typically, the potential slack of a circuit is much less than its total slack. This is because the slack distribution varies with different slack assignments, depending on the topological structure of the circuit (further discussion will be given in the next section). This suggests the following lemma.
Lemma 2: For any safe circuit, the total slack is an upper bound of potential slack, i.e., PS ≤ |S(V)|.
In Fig. 2, for instance, PS = 10, while the total slack |S(V)| = 20. For a given circuit, while the calculation of total slack is straightforward, finding the potential slack is a non-trivial task. Total slack can be used only as a rough estimate of potential slack, since not all of the total slack can be utilized for optimization. Potential slack represents the real delay budget one can obtain while keeping the circuit safe. A slack assignment is optimal if it leads to the potential slack.
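The arrival time, required time, slack and total slack used in this section can be sketched in a few lines. The sketch below assumes the standard static-timing recursions (arrival time is the node delay plus the latest fanin arrival; required time is the earliest fanout requirement minus that fanout's delay), which may differ in form from the recursion referred to above. The function name `timing` and the four-node graph are hypothetical, chosen only so that every node has slack 5 and the total slack is 20.

```python
def timing(fanouts, delay, t_req):
    """Return (arrival, required, slack) for a DAG.

    fanouts: dict node -> list of fanout nodes, keys listed in topological order.
    delay:   dict node -> node delay d(v).
    t_req:   required time at every primary output (assumption: one common value).
    """
    nodes = list(fanouts)
    fanins = {v: [] for v in nodes}
    for u in nodes:
        for w in fanouts[u]:
            fanins[w].append(u)

    arrival, required = {}, {}
    for v in nodes:  # forward pass; primary inputs arrive at time 0
        arrival[v] = delay[v] + max((arrival[u] for u in fanins[v]), default=0)
    for v in reversed(nodes):  # backward pass; primary outputs must meet t_req
        required[v] = min((required[w] - delay[w] for w in fanouts[v]), default=t_req)
    slack = {v: required[v] - arrival[v] for v in nodes}
    return arrival, required, slack


# Hypothetical example graph (not taken from the paper's figures).
FANOUTS = {"v1": ["v3"], "v2": ["v3"], "v3": ["v4"], "v4": []}
DELAY = {v: 1 for v in FANOUTS}

arrival, required, slack = timing(FANOUTS, DELAY, t_req=8)
total_slack = sum(slack.values())  # |S(V)|, an upper bound on the potential slack
print(slack, total_slack)
```

The total slack here only bounds the potential slack from above; which part of it is usable depends on how the delay increments interact along shared paths.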
Potential Slack Estimation
From a practical point of view, area/power optimization is done within some delay constraints. Therefore, we are only interested in effective slack assignments. This section first briefly reviews the well-known zero-slack algorithm (ZSA), and then describes a new algorithm based on maximum independent sets (MISA).
ZSA
As mentioned before, the ZSA starts with the nodes of minimum positive slack and locally performs slack assignment on these nodes such that their slacks become zero. This process repeats until the slack of all nodes is zero. More specifically, at each iteration, the ZSA first identifies a path on which all nodes have minimum slack (denoted by s_min), and then assigns each node an additional delay which, on average, is s_min/N_min, where N_min is the number of nodes on the path. In Fig. 2, for example, a path of three nodes, {…, v_4}, is first found, and each of the three nodes is assigned an additional delay of 5/3. The slack of all nodes except v_2 becomes zero, while the slack of v_2 is updated to 5/3. After an additional delay of 5/3 is assigned to v_2, the slack of every node is zero, and the algorithm terminates with an effective slack of 3 · 5/3 + 5/3 = 20/3 (recall that the PS of Fig. 2 is 10). This example shows that, while the ZSA is simple and easy to implement, its result can be far from the optimal slack assignment.
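The arithmetic of this example can be replayed numerically. The sketch below does not implement the ZSA's path selection; it only applies the delay increments quoted above to a hypothetical four-node graph (v1 → v3, v2 → v3, v3 → v4, unit delays, required time 8) that was chosen to be consistent with the quoted numbers and is not taken from Fig. 2. The helper names `slacks` and `assign` are illustrative.

```python
from fractions import Fraction

# Hypothetical 4-node DAG (an assumption, not the actual Fig. 2), keys in topological order.
FANOUTS = {"v1": ["v3"], "v2": ["v3"], "v3": ["v4"], "v4": []}
T_REQ = 8  # assumed required time at the primary output


def slacks(delay):
    """Slack r(v) - a(v) of every node for the given node delays."""
    nodes = list(FANOUTS)
    arrival, required = {}, {}
    for v in nodes:  # forward pass: arrival times
        fanins = [u for u in nodes if v in FANOUTS[u]]
        arrival[v] = delay[v] + max((arrival[u] for u in fanins), default=0)
    for v in reversed(nodes):  # backward pass: required times
        required[v] = min((required[w] - delay[w] for w in FANOUTS[v]), default=T_REQ)
    return {v: required[v] - arrival[v] for v in nodes}


def assign(increments):
    """Add delay increments to unit delays; return (effective slack, resulting slacks)."""
    delay = {v: 1 + increments.get(v, 0) for v in FANOUTS}
    new_slacks = slacks(delay)
    if min(new_slacks.values()) < 0:
        raise ValueError("assignment violates the timing constraint")
    return sum(increments.values()), new_slacks


step = Fraction(5, 3)
first_pass = assign({"v1": step, "v3": step, "v4": step})            # three-node path
zsa_like = assign({"v1": step, "v2": step, "v3": step, "v4": step})  # plus v2
independent = assign({"v1": 5, "v2": 5})                             # two independent nodes
print(first_pass[1]["v2"], zsa_like[0], independent[0])
```

On this assumed graph, spreading the slack evenly along the path yields an effective slack of 20/3, whereas spending it on the two mutually independent nodes yields 10, which is the kind of gap the example in the text describes.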
Our Algorithm: MISA
The ZSA generates effective slack assignment in a greedy way, ignoring the effect of nodes' slack on each other (i.e., the possible effect of a node delay increase on the slack of its transitive fanins/fanouts).
To take this effect into account, let us analyze the slack relation by looking at two cases shown in Fig. 3 .
Figure 3: Effect of node delay on the slack of its fanin (a) and fanout (b)
Suppose that in Fig. 3(a), node v has a fanin node v_in, and an additional delay ∆d(v) is assigned to v. As a result, the slack of node v is decreased by ∆d(v). Meanwhile, the slack of v_in will be reduced (affected) if Inequality (2) holds.
Similarly, suppose that in Fig. 3(b), node v has a fanout node v_out, and an additional delay ∆d(v) is assigned to v. While the slack of node v is decreased by ∆d(v), the slack of v_out will be reduced (affected) only if Inequality (3) holds.
Node v_in (or v_out) is said to be slack-sensitive to node v if Inequality (2) (or (3)) holds, meaning that the delay increase of node v reduces (affects) the slack of v_in (or v_out).
Here ∆d(v) can be viewed as a delay benefit which contributes to the effective slack of the circuit. In contrast, slack reduction of node v's fanins/fanouts represents a slack penalty which prevents further benefit which otherwise could be obtained. Intuitively, a good slack assignment should maximize the benefit, and minimize the penalty.
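One way to write the two slack-sensitivity conditions is sketched below. It is a reconstruction under the assumption that arrival and required times follow the standard recursions stated in the comment, so it may differ in form from Inequalities (2) and (3); FI(v) and FO(v) denote the fanin and fanout sets of v and are notation introduced here.

```latex
% Assumed recursions (standard static timing analysis, not quoted from the source):
%   a(v) = d(v) + \max_{u \in FI(v)} a(u), \qquad
%   r(v) = \min_{w \in FO(v)} \bigl( r(w) - d(w) \bigr)
\begin{align*}
\text{fanin } v_{in} \text{ is affected if:}\quad
  & \Delta d(v) > r(v) - d(v) - r(v_{in}) \\
\text{fanout } v_{out} \text{ is affected if:}\quad
  & \Delta d(v) > a(v_{out}) - d(v_{out}) - a(v)
\end{align*}
```

Both right-hand sides are non-negative by the definitions of r(v_in) and a(v_out). Under this form, a larger r(v) or a smaller a(v) raises the threshold that ∆d(v) must exceed before a neighbour loses slack.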
In order to reduce the slack penalty for any given ∆d(v), it is desirable to choose a node v with a large value of r(v) and/or a small value of a(v), such that the number of its fanins/fanouts which are slack-sensitive to v is minimized (see Inequalities (2) and (3)).
update node slacks and δ in G; } end }

Fig. 4 shows a 7-node subcircuit example where the initial delay distribution is assumed to be one unit, i.e., D(V) = 1. Here we briefly describe how the ZSA generates the slack assignment for this particular example (readers are referred to Section 3.1 above). By inspection, the ZSA first assigns an incremental delay of 1.5 to each of nodes v_2 and v_3, and then assigns the same incremental delay to both …

While it is just a particular case, this example demonstrates that for a large number of fanouts, the MISA can provide a significant improvement over the ZSA in terms of effective slack. As will be seen later in our experiments (Section 6), the MISA produces an average improvement of about 30% over the ZSA for real circuits.

First we prove that the MISA leads to an optimal slack assignment if N = 1. Since G is a transitive graph, the set of nodes with slack s_1 can be expressed as the union of subsets of nodes such that each subset is a clique which contributes at most s_1 to the effective slack. If we use N_c to denote the number of such cliques, the maximum effective slack (i.e., potential slack) is s_1 ⋅ N_c. Since the cardinality of the maximum independent set in the MISA is exactly N_c, it provides an effective slack of s_1 ⋅ N_c, and hence the resulting slack assignment is optimal.
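The base case above relies on a maximum independent set in a transitive graph having the same size as a minimum cover by cliques. The sketch below illustrates why this quantity is computable in polynomial time. It is not the MISA itself: it assumes, for illustration only, that two nodes conflict exactly when one can reach the other in the DAG, and it computes the maximum number of pairwise unreachable nodes as n minus a maximum bipartite matching on the reachability relation (Dilworth's theorem). The function name and the example graph are hypothetical.

```python
def max_independent_set_size(fanouts):
    """Maximum number of pairwise unreachable nodes in a DAG (fanouts: node -> list)."""
    nodes = list(fanouts)

    def reachable(src):
        # All nodes reachable from src by one or more edges (iterative DFS).
        seen, stack = set(), list(fanouts[src])
        while stack:
            w = stack.pop()
            if w not in seen:
                seen.add(w)
                stack.extend(fanouts[w])
        return seen

    closure = {u: reachable(u) for u in nodes}  # transitive closure
    match_of = {}  # right-side node -> left-side node currently matched to it

    def augment(u, visited):
        # Kuhn's augmenting-path step for bipartite matching.
        for w in closure[u]:
            if w in visited:
                continue
            visited.add(w)
            if w not in match_of or augment(match_of[w], visited):
                match_of[w] = u
                return True
        return False

    matching = sum(augment(u, set()) for u in nodes)
    # Minimum number of chains covering all nodes = n - maximum matching,
    # which equals the size of a maximum set of pairwise unreachable nodes.
    return len(nodes) - matching


# Hypothetical graph (an assumption): v1 -> v3, v2 -> v3, v3 -> v4.
n_c = max_independent_set_size({"v1": ["v3"], "v2": ["v3"], "v3": ["v4"], "v4": []})
print(n_c)
```

With a uniform slack s_1 = 5 on this assumed graph, s_1 ⋅ N_c evaluates to 5 ⋅ 2 = 10.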
Assume that the MISA is optimal when N = k − 1, where k ≥ 2. Let V_{k-1} be the subset of nodes whose slack is no more than s_{k-1}, and V_k be the subset of nodes with slack s_k. We construct an induced graph G_k … We use V_sk to denote the set of nodes to which at least one node in V_k is slack-sensitive, and let V_sk' = V \ V_sk. Suppose an optimal slack assignment of G is represented by
We have
is not an optimal slack assignment. This contradicts the inductive hypothesis). On the other hand, since nodes (with slack s_m) of V_k in G may be slack-sensitive to nodes (with slack less than s_m) in V_sk \ V_k, any slack assignment ∆ for G leading to
will be either not effective or resulting in
Combining (4) and (5), we conclude that
By finding the maximum effective slacks in G_{k-1} and G_k, respectively, one can obtain an optimal slack assignment for G, which leads to the maximum effective slack:
Based on the above discussions, we have the following theorem (see [5] and [10] 
Fast Prediction of Potential Slack
While exact estimation of potential slack is of theoretical interest, its computation cost is a major concern. In practice, fast prediction of potential slack is highly desirable for design space exploration. Also, from an application point of view, there is no need to find the exact potential slack, since potential performance is not strictly linear in potential slack. Indeed, the underlying optimization problems are discrete in nature, and not all potential slack can be used for performance improvement. To illustrate the typical relationship between potential slack and potential performance, Figure 6 plots area reduction versus potential slack for a set of circuits (refer to Section 6 for the detailed data) obtained by gate sizing. Area reduction represents a circuit's potential performance, and each point in the figure corresponds to a specific implementation of the circuits. While the exact relation between area reduction and potential slack is unknown (it depends on the specific circuits and optimization methods), there is a strong correlation between them, as shown in the figure.

In this section, we present a greedy algorithm for the fast prediction of potential slack. Again, consider a DAG with n nodes. The MISA suggests starting with maximum slack and looking for a maximum independent set of nodes in the transitive slack-equalization graph G_t, whose construction is computationally expensive. Considering the fact that all immediate fanins (or fanouts) of a node v are always independent of each other in terms of slack sensitivity unless there are reconvergent directed edges (i.e., edges going from one fanin or fanout to another), a fast way of estimating PS is to assign an additional delay (which equals a specific slack) to either node v or all of its fanins (or fanouts) recursively, depending on their slacks. By comparing these slacks, the nodes with the largest sum of slacks are selected as the best candidates for delay assignment around node v. This largest sum of slacks is called the local PS of node v. The extra step is to check whether any reconvergent edge is involved. When there is a reconvergent edge between two nodes, we can ignore the node with less slack. Reconvergent-edge checking is efficient, since the number of fanins or fanouts of any node is very limited in real circuits. When the local PSs are available, we select the maximum one for delay assignment, and then delete all the related fanins and fanouts from the graph recursively. This process of reconvergent-edge checking, candidate determination, delay assignment and fanin/fanout deletion continues until the graph becomes empty. It is straightforward to see that the time complexity of this greedy algorithm is no more than O(n log n).
The detailed pseudo-code of the algorithm is described as follows.

For the example considered, the greedy algorithm yields a final solution with PS = 10 + 3 + 3 = 16, compared to the optimal result with PS = 18 generated by the MISA (as shown in Section 3.2). As we will see later in the experiments (Section 6), the greedy algorithm can be tens or hundreds of times faster than the MISA, while keeping reasonable relative accuracy.
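To make the idea of a fast, possibly sub-optimal estimate concrete, the sketch below implements a much simpler greedy than the one described in this section: it repeatedly gives the node with the largest remaining slack all of that slack as extra delay and recomputes the timing. It does not perform the local-PS comparison or the reconvergent-edge check, and it recomputes all slacks in every round, so it does not reach the O(n log n) bound. The result is the effective slack of a safe assignment and therefore a lower bound on the potential slack. The timing recursions, function names and example graph are assumptions.

```python
def slacks(fanouts, delay, t_req):
    """Slack of every node; fanouts keys must be in topological order."""
    nodes = list(fanouts)
    arrival, required = {}, {}
    for v in nodes:  # forward pass: arrival times
        fanins = [u for u in nodes if v in fanouts[u]]
        arrival[v] = delay[v] + max((arrival[u] for u in fanins), default=0)
    for v in reversed(nodes):  # backward pass: required times
        required[v] = min((required[w] - delay[w] for w in fanouts[v]), default=t_req)
    return {v: required[v] - arrival[v] for v in nodes}


def greedy_effective_slack(fanouts, delay, t_req):
    """Simplified greedy (not the paper's algorithm): spend the largest slack first."""
    delay = dict(delay)  # work on a copy
    total = 0
    while True:
        s = slacks(fanouts, delay, t_req)
        v = max(s, key=s.get)  # node with the largest remaining slack
        if s[v] <= 0:
            return total       # every slack is used up
        delay[v] += s[v]       # the slack of v drops to zero, so the loop terminates
        total += s[v]


# Hypothetical graph (an assumption): v1 -> v3, v2 -> v3, v3 -> v4, unit delays.
FANOUTS = {"v1": ["v3"], "v2": ["v3"], "v3": ["v4"], "v4": []}
estimate = greedy_effective_slack(FANOUTS, {v: 1 for v in FANOUTS}, t_req=8)
print(estimate)
```

The outcome of such a greedy depends on tie-breaking: on this graph, choosing v3 first would consume the slack of every node at once and yield only 5, which illustrates why a greedy estimate can fall below the potential slack.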
Greedy Algorithm

Applications
Application to Gate Sizing
Potential slack can be used for a class of delay-constrained area/power optimization problems [5, 6, 7]. As an application, a gate sizing strategy is presented here for area optimization. For a single gate/node, the delay depends upon its own size and the sizes of its fanouts. When the size w_i of gate v_i is reduced to w_i' < w_i, the area benefit is w_i − w_i'. The gate delay can be expressed in terms of these sizes, the intrinsic delay τ_i, and a technology-dependent constant k_i. Reducing the gate size therefore incurs a delay penalty. For specific circuits, high potential slack promises significant area reduction during gate sizing.
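The tradeoff between area benefit and delay penalty can be illustrated with a commonly used first-order sizing model. The model below, d_i = τ_i + k_i · (sum of fanout sizes) / w_i, is an assumption made for this sketch and is not necessarily the delay expression used in the paper; the function names and numbers are illustrative only.

```python
def gate_delay(tau, k, size, fanout_sizes):
    """Assumed model: intrinsic delay plus load-dependent delay, load ~ sum of fanout sizes."""
    return tau + k * sum(fanout_sizes) / size


def downsizing_tradeoff(tau, k, size, new_size, fanout_sizes):
    """Return (area benefit, delay penalty) of shrinking a gate from size to new_size."""
    area_benefit = size - new_size
    delay_penalty = (gate_delay(tau, k, new_size, fanout_sizes)
                     - gate_delay(tau, k, size, fanout_sizes))
    return area_benefit, delay_penalty


def fits_in_slack(delay_penalty, slack):
    """A downsizing step is admissible only if its delay penalty fits in the budget."""
    return delay_penalty <= slack


# Illustrative numbers only.
benefit, penalty = downsizing_tradeoff(tau=1, k=2, size=4, new_size=2, fanout_sizes=[2, 2])
print(benefit, penalty, fits_in_slack(penalty, slack=3))
```

Under this assumed model the penalty grows with 1/w_i' − 1/w_i, so each unit of area saved on an already small gate costs more delay, which is one reason not all slack translates into area reduction.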
When the potential slack (PS) of a circuit implementation is available, one can predict the potential area/power reduction without having to go through actual low-level area/power optimization (e.g., gate sizing, voltage scaling or buffer insertion). Of course, the exact improvement depends strongly on many factors, such as the specific circuits, the optimization methods used (or what is to be optimized), how aggressive the optimization is, and the target technology library (if applicable). It is the designers' responsibility to capture the approximate correlation under their specific design environment, which is beyond the scope of this paper. While it is still unclear, in general, how the potential area/power savings are determined by the PS, the plots in Fig. 6 suggest that, for the gate sizing application in particular, a piecewise linear function is a reasonable approximation model. In this sense, fast area/power estimation for specific implementations can improve on previous work on gate-level area/power estimation, where this potential benefit is ignored.
Application to Placement
Yet another application of potential slack is timing-driven placement [8]. Given the delay constraints of a gate-level circuit implementation, the potential slack gives a basic idea of the freedom/flexibility of all signal nets of the circuit during placement. Intuitively, high potential slack allows a large number of signal nets to be routed easily without violating the timing constraints. We first generate several implementations of a specific circuit and then perform timing-driven placement to obtain a more accurate circuit delay. As will be seen from our experiments, high potential slack leads to significant timing performance improvement.
Experiments and Discussions
We have implemented the MISA and ZSA on top of the SIS package [4]. We tested each algorithm on different gate-level implementations of benchmark circuits. These circuits were first decomposed into two-input gates and inverters, and then implemented using different mappers (with a standard cell …

We performed a first set of experiments to show the longest path delay (LPD), total slack and potential slack (PS) for three different implementations and to compare them. We tried a complete benchmark suite from MCNC'91 and selected, as representatives, ten circuits whose sizes range from 25 to 1500 gates (five circuits with tens of gates, four circuits with hundreds of gates, and one circuit with more than a thousand gates). The results are summarized in Table 1, where the total slack is obtained using the maximum LPD among the three implementations as the timing constraint, and the potential slack is generated by the MISA (note that the LPD is estimated using the unit-fanout model [4], and I_2 does not always correspond to the minimum LPD). From this table we see that for all circuits, potential slack is much less than total slack, and more total slack does not always result in more potential slack. For specific circuits, the best implementation in terms of either LPD or total slack does not necessarily correspond to the best implementation in terms of potential slack. For instance, circuits C432 and dalu have their minimum LPD and maximum total slack under I_2, while their potential slacks are highest under I_1.

* This is the number of gates in the circuit under I_1.

To look at how potential slack contributes to area optimization, we conducted gate sizing experiments based on the above three technology-mapped networks. Experimental results of the MISA followed by gate sizing are presented in Table 2. Because of the discrete nature of the underlying target library, not all potential slack can be used for area reduction during gate sizing. Based on Table 2, we plot in Fig. 7 the best implementations predicted by LPD, by total slack and by potential slack against the best implementation in terms of area reduction. We see that for all tested circuits, the best implementation predicted by potential slack leads to maximum area reduction. It is interesting to look at the percentage of "correct" predictions by the different metrics in Fig. 7. While the LPD and total slack give, on average, a 20% and a 40% chance of a correct prediction, respectively, the potential slack provides 100% correct prediction. This is important since we can predict and/or maximize the possible area/power reduction by dealing with potential slack without having to go through lower-level optimization. Typically, potential slack estimation is dozens of times faster than gate sizing in our experiments. The slack left unused by gate sizing can be utilized in the subsequent physical design phase. In this sense, the proposed MISA attempts to maximize area reduction and/or the flexibility of placement and routing.

Figure 7: The best implementation predicted by different metrics (the potential slack achieves 100% correct prediction).

A comparison of the potential slack given by the MISA and the effective slack estimated by the ZSA is shown in Fig. 8, where the curve is drawn based on the average value over the three implementations. It can be seen that the MISA produces about 29% (on average) more effective slack than the ZSA. This indicates that the ZSA is less accurate in estimating potential slack.

Interconnection parameters were taken from [9] for delay estimation based on 0.35 µm technology. We tried a number of timing constraints for each circuit, and chose the minimum value T_spec^m from them …
* PS: potential slack (from Table 1)
** P_T: total timing penalty

To test the performance of the proposed greedy algorithm, we used the same benchmarks as with the MISA. We also ran the ZSA on these circuits. Table 4 compares them in terms of the estimated potential slack and the CPU time taken. We see that while, for a specific implementation, the potential slack estimated by the greedy algorithm (column "GREEDY") can be much lower than that estimated by the MISA (column "MISA"), the relative accuracy across different implementations shows a good match between them (except for circuits decod and C432). For instance, both I_1 and I_2 for circuit 9symml have much higher slack than I_3, leading to larger area reduction with I_1 and I_2 (see Table 2). Based on Table 4, we plot, in Fig. 9, the best implementations (in terms of area reduction) predicted by the MISA, GREEDY and ZSA algorithms. On average, the ZSA and GREEDY give 70% and 80% correct prediction, respectively, compared to 100% correct prediction with the MISA, as shown in Fig. 9. While the ZSA and GREEDY always underestimate the potential slack, they sometimes still identify the best implementation correctly. It should be mentioned that the greedy algorithm is much faster than the MISA, as shown in the last column of Table 4, where the CPU time is measured on a SUN SPARC Ultra 10 with a 440 MHz clock and 128 MB of RAM.
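The "correct prediction" percentages quoted in this section can be computed as sketched below: a metric predicts correctly for a circuit when the implementation it ranks best is also the one with the largest area reduction. The function name and all numbers are made-up placeholders for illustration; they are not taken from Tables 1 to 4.

```python
def correct_prediction_rate(metric, area_reduction, best=max):
    """Fraction of circuits whose best implementation by `metric` has maximum area reduction.

    metric, area_reduction: dict circuit -> dict implementation -> value.
    best: max for metrics where larger is better (slack), min for LPD.
    """
    hits = 0
    for circuit, by_impl in metric.items():
        predicted = best(by_impl, key=by_impl.get)
        actual = max(area_reduction[circuit], key=area_reduction[circuit].get)
        hits += predicted == actual
    return hits / len(metric)


# Placeholder values (hypothetical, for illustration only).
potential_slack = {"c1": {"I1": 30, "I2": 20, "I3": 10}, "c2": {"I1": 5, "I2": 9, "I3": 7}}
lpd = {"c1": {"I1": 14, "I2": 11, "I3": 15}, "c2": {"I1": 9, "I2": 8, "I3": 10}}
area = {"c1": {"I1": 12, "I2": 8, "I3": 3}, "c2": {"I1": 1, "I2": 4, "I3": 2}}

ps_rate = correct_prediction_rate(potential_slack, area)
lpd_rate = correct_prediction_rate(lpd, area, best=min)
print(ps_rate, lpd_rate)
```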
Conclusion
We have introduced the new notion of potential slack to measure the potential performance of digital circuits. We have proposed an optimal maximum-independent-set based algorithm for finding the potential slack of given circuits, and a fast greedy algorithm for estimating the potential slack. It has been shown that potential slack captures information that traditional metrics cannot account for. For area optimization in the gate sizing application, potential slack provides 100% correct prediction, compared to a 20%-40% chance of correct prediction by other metrics. For timing performance in the placement application, potential slack can be used as an important metric. Also, we have shown that our technique is a significant improvement over the traditional zero-slack algorithm in predicting potential performance.
To the best of the authors' knowledge, this is the first work to discuss the potential performance of circuits and its prediction for the new design flow. We believe that further work is needed, including applications to other optimization problems and logic synthesis for potential slack maximization.
