Abstract-With continued scaling of technology into nanometer regimes, the impact of coupling induced delay variations is significant. While several coupling-aware static timers have been proposed, the results are often pessimistic with many false failures. We present an integrated iterative timing filtering and logic filtering based approach to reduce pessimism. We use a realistic coupling model based on arrival times and slews and show that non-iterative pessimism reduction algorithms proposed in previous research may give potentially nonconservative timing results. On a functional block from an industrial 65nm microprocessor, our algorithm produced a maximum pessimism reduction of 11.18% of cycle time over converged timing filtering analysis that does not consider logic constraints.
I. INTRODUCTION
With continuous scaling of technology, the aspect ratio of on-chip interconnect wires has continued to increase. As a result, the coupling capacitance between adjacent wires on the same metal layer is a dominant component of the total wire capacitance [14]. In addition, modern process technologies have multiple metal layers, and a detailed extraction algorithm extracts layer-to-layer capacitance as coupling capacitance rather than virtual ground capacitance. In Figure 1, we show the dominance of the coupling capacitance in the total capacitance of nets extracted from a functional block of an industrial high-performance microprocessor in 65nm technology. The x-axis shows the ratio of coupling capacitance to ground capacitance and the y-axis shows the percentage of nets that have a ratio less than or equal to the value on the x-axis.
Switching activity on the coupled wires induces interference on the wire of interest. Following convention, we refer to the wire of interest as victim and the coupling wires as aggressors. In particular, the switching of the aggressors around the same time as the victim transition causes changes in the delay of the victim signal. When the aggressors switch in the same direction as the victim net, the delay decreases; when the aggressors switch in the opposite direction, the delay increases.
Coupling-aware static timers have been introduced in the last few years to account for this effect. Such timers have to balance the need for being conservative with the need for being realistic and not showing too many false failures. Since victim delay is adversely impacted only when the aggressors switch in the temporal vicinity of the victim, timing windows are used on both victim and aggressor nets, and coupling is considered only when the windows overlap. While this reduces pessimism significantly, additional pessimism can be removed if one considers the logic interactions among the signals. For instance, even if the windows of the aggressors and the victim overlap, it may be logically impossible for all aggressors to fall when the victim rises. Thus, combining logic filtering with timing filtering is essential for coupling-aware static timing tools to reduce the number of false failures and enhance designer productivity. This paper presents a comprehensive framework for performing logic and timing filtering in an integrated manner. This framework is implemented in an industrial static timer and is shown to reduce pessimism. Our contributions include the following:
1) An integrated iterative timing filtering and logic filtering approach: We show that iterations are required, since once logic filtering is done, some aggressors get filtered and thus change the timing of the victim, necessitating further analysis.
2) A sensitivity-based method for selecting the subset of aggressors on which to apply logic analysis: Doing logic analysis on the entire set of aggressors is often not runtime efficient.
3) Insights into algorithm behavior (Section III) when the underlying coupling model is aware of victim/aggressor alignment and slews: We use a charge sharing model with a focus on fast evaluation and good fidelity to drive the algorithm in the right direction.
* This work is supported by a grant from Intel Corporation.
We next briefly review the existing literature relevant to this work. Since delay and crosstalk form an inherently chicken-and-egg problem (victim delay depends on the amount of charge injected by the aggressors, which in turn depends on the victim arrival times and slews), iterative methods are needed to perform coupling analysis [20], [8], [6], [3]. Zhou [20] established the theoretical foundation for the iterative analysis. [8] presented an iterative static timing analysis algorithm that starts from initial switching windows (best-case crosstalk delays as opposed to the worst-case crosstalk delays considered by [6]) and iteratively updates the timing information until convergence. Though [8] used a novel coupling model in timing analysis based on arrival and transition times, they did not explore the model dynamics which, as we show, turn out to be a key factor in coupling-aware static timing analysis using logic constraints. To exploit logic feasibility conditions, two recent works proposed pessimism reduction techniques [19], [4]. We show in this paper that under an alignment- and slew-aware coupling model, pruning by combining timing and logic is no longer a non-iterative step, contrary to [4], [19]. In fact, ignoring the interaction between timing and logic conditions can potentially lead to non-conservative timing results, as shown in Section IV-A.
The rest of the paper is organized as follows. We present our coupling model in Section II. Application of our coupling model in timing filtering is described in Section III. Logic filtering and our algorithm LogicTimer are presented in Section IV. Detailed results are presented in Section V. Finally, we conclude our paper in Section VI.
II. COUPLING MODEL
Our coupling model is based on a charge sharing model as presented in [12], [10]. We are aware that such a model is an abstraction of more complex simulation-based models [13], [7], but the algorithmic contributions presented in this paper are orthogonal to model choice as long as the model is slew and alignment based. Our goal is to use a fast evaluation model that has high fidelity in order to drive the algorithm in the right direction, as the model is evaluated many times in our algorithm. This is analogous to using the Elmore delay model in physical design. Additionally, the chosen model allows us to derive an analytical understanding of algorithm behavior in Section III.
The circuit model for Miller coupling factor (m) computation is shown in Figure 2. m is computed based on the victim and aggressor arrival times and respective slews. We use the following notations:
Victim Arrival Time = atv
Aggressor Arrival Time = ata
Victim Transition Time = ttv
Aggressor Transition Time = tta
Note the difference of our notations from prevalent conventions. For the purpose of defining and evaluating the model, we consider at as the 0% arrival time rather than the 50% point. tt represents the time taken for the signal to transition between 0% and 100% of V_dd. This is computed by linearly extending a 20-80% slew from static timing analysis. The analytical derivations for the charge sharing model are omitted; interested readers are referred to [12], [10] for details. We define the voltage percentage over which charge matching is applied as β. Based on different values of β, different bounds on the coupling factor can be obtained. In this work, we do charge matching over 0-100% voltage of the victim with aggressor (β = 1). Thus, the bounds for our coupling model can vary between 0 and 2, but there are other similar coupling models such as [12] where the bounds can be much wider than 0 and 2. The algorithmic details reported in this paper are independent of the chosen value of β. Should one choose to be more conservative, choosing β as 0.5 provides a practical upper bound of 3 on the Miller coupling factor [5], [12], and thus our model can provide an upper bound similar to the Elmore delay model. As described in Section IV, using a model with high fidelity we can identify important aggressors. A more complex model can be employed for final signoff timing with these important aggressors.
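The conversion from static-timing quantities to the model's at and tt can be sketched as follows. This is a minimal illustration assuming an ideal linear ramp; the function names are hypothetical, and deriving the 0% arrival time from a 50% arrival time is our assumption about how the timer's data would be adapted, not a step stated in the text.

```python
def full_swing_transition(slew_20_80: float) -> float:
    """Linearly extend a 20-80% slew to a 0-100% transition time (tt).

    A linear ramp covers 60% of the swing during the 20-80% slew,
    so the full swing takes slew / 0.6.
    """
    return slew_20_80 / 0.6


def zero_percent_arrival(at_50: float, slew_20_80: float) -> float:
    """0% arrival time (at) of a linear ramp whose 50% crossing is at_50.

    Assumption: the ramp is linear, so the 0% point precedes the 50%
    point by half of the full-swing transition time.
    """
    return at_50 - 0.5 * full_swing_transition(slew_20_80)
```

For example, a 60 ps 20-80% slew corresponds to a full-swing transition time of about 100 ps, and a 50% arrival at 500 ps then maps to a 0% arrival at about 450 ps.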
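Details of our coupling model are presented in Figure 3 and Figure 4. Figure 3 presents the situation when victim and aggressor are switching in the same direction. We call this function ψ_SS. Figure 4 presents the situation when victim and aggressor are switching in opposite directions. We call this function ψ_OS.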
III. COUPLING MODEL APPLICATION TO TIMING FILTERING
We apply our coupling model to crosstalk-aware static timing analysis. Each net in the circuit has a driver port as source and several receiver ports as sink. During static analysis, timing events are propagated using a breadth-first-search beginning from input ports. Static timing is performed in both min and max mode to identify lower and upper bounds on arrival times and slews for each net. These bounds define arrival time and slew windows for each net. Both rise and fall windows exist on each net due to rise and fall timing events generated during static timing. The min and max modes are evaluated concurrently to enable timing window analysis.
For the two static timing modes, we must compute both min and max coupling induced delay push-out. Since delay monotonically increases with output load, we compute max (min) delay push-out using the max (min) coupling factor m. Max m results from opposite switching victim and aggressor transitions and min m results from like switching victim and aggressor transitions as shown by the coupling model in Figure 4 and Figure 3 .
Slew selection for coupling induced push-out computation follows from the coupling model. Consider the equations in Figure 4, where ttv is always found in the numerator and tta is always found in the denominator, both with positive sign. Thus, to maximize m we choose the maximum ttv and the minimum tta from the respective slew windows. A similar method is used to choose slews to minimize m. During static analysis of victim v, four coupling factors are computed: m for the maximum rise event, m for the maximum fall event, m for the minimum rise event and m for the minimum fall event. Note that we get different values of the four coupling factors when we analyze net a as a victim and v becomes its aggressor.
In the rest of the paper, for each victim net we focus our attention on the max arrival time calculation (rise, fall). For each max event calculation we assume the aggressor can transition anytime between its earliest and latest arrival time. Min arrival time computation follows by symmetry. We begin our crosstalk-aware static timing analysis with the bounds of 0 and 2 on m. Timing windows are then iteratively refined. These iterations decrease the maximum bound and increase the minimum bound on m. Thus our iterations shrink the delay windows and reduce the pessimism. We refer to such an analysis as timing filtering.
We discuss the behavior of our coupling model under timing filtering next. Refer to the situation discussed in Section II. We are computing the maximum m due to the maximum rise event on v and the fall window on aggressor a. We are given (atv, ttv) for victim v and (at_a^min, tta), (at_a^max, tta) for the aggressor a. α, m ∈ R are defined in Section II.
Fig. 5. m as a function of α for opposite switching
We map the given aggressor window (at_a^min, at_a^max) to a window W = [w_l, w_h] on α. The maximum coupling factor is then given by
$m = \max_{w} \psi_{OS}(w)$ subject to $w_l \le w \le w_h$ (1)
A visual representation of the maximum coupling factor computation is shown in Figure 5. w_l and w_h are the bounds on α. As evident from the figure, if tta < ttv, m is 2.0. Consider iterations i and j. The respective W for these iterations are W_i and W_j. As described earlier in this section, subsequent iterations strive to reduce pessimism. We define relation (Pessimism Reduction) over the set of W as follows
Consider m_i and m_j as the respective coupling factors at iterations i and j for the same aggressor net. The following theorem gives interesting insight into the proposed coupling model.
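Applying Equation 1, m in this situation will be given by ψ_OS(w_l).
We represent m = f(α, ttv, tta). Using a first order Taylor expansion around m_i we approximate m_{i+1} as
$m_{i+1} \approx m_i + \frac{\partial f}{\partial \alpha}\,\Delta\alpha + \frac{\partial f}{\partial tt_v}\,\Delta tt_v + \frac{\partial f}{\partial tt_a}\,\Delta tt_a$
where the partial derivatives are evaluated at iteration i and Δ denotes the change of the respective quantity between iterations i and i + 1.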
Computing the partial derivatives of ψ_OS(w_l), at iteration i we can approximate m_{i+1} as
Corollary 1 gives more insights into our model. Arrival time difference turns out to be the primary effect that changes m and thus helps in pessimism reduction. Slew differences are second order effects, but they invalidate the monotonicity of m. Monotonic change of timing windows was essential for the proof of convergence of iterative techniques as presented in [20]. But in a realistic coupling model such as the one we present in Section II, there are situations in which monotonicity of m over the iterations does not hold, which may cause problems in convergence. The following observation presents insights into timing convergence in the presence of a realistic coupling model. Consider iterations i, i + 1 and i + 2.
Observation 1:
We represent change in m over iterations as Δm. 
IV. LOGIC FILTERING
Timing filtering as presented in Section III assumes state transitions on victims and aggressors to generate maximum m. Logic filtering describes a process by which certain aggressor and victim state transition combinations are eliminated from consideration because such conditions are logically impossible. This can be explained with the example shown in Figure 6. Let us assume that the logic driving the two aggressors A1 and A2 is as shown in Figure 6. Both latches are active high and driven by the clock C. Let N(t) denote the value of signal N in the circuit at time t. N(t) is commonly referred to as the current state for signal N and N(t+1) is the next state for signal N. The next-state functions for the aggressors A1 and A2 for the above circuit can be represented by the following equations.
$A1(t+1) = (C(t+1) = 0) * A1(t) + (C(t+1) = 1) * \overline{M(t+1)}$ (3)
$A2(t+1) = (C(t+1) = 0) * A2(t) + (C(t+1) = 1) * M(t+1) * N(t+1)$ (4)
In Equations 3-4, * denotes logical AND, + denotes logical OR and an overbar denotes logical NOT. Note that the above equations assume a 0-delay model and do not take glitches into consideration.
To find out if A1 = R and A2 = R is logically feasible, we check if the Boolean condition (A1(t) = 0) * (A2(t) = 0) * (A1(t + 1) = 1) * (A2(t + 1) = 1) based on the above equations is satisfiable (i.e., has a solution). It can be easily verified that this condition is not satisfiable for this example. This indicates that the condition A1 = R and A2 = R is logically impossible.
Fig. 7. Logic conditions on a victim cluster
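The feasibility check for this two-latch example can be reproduced by exhaustive enumeration. The sketch below is only an illustration: it assumes the next-state functions A1(t+1) = NOT M(t+1) and A2(t+1) = M(t+1) AND N(t+1) when the clock is high, as in Equations 3-4, and it enumerates all assignments instead of calling a SAT or ATPG engine. All function names are hypothetical.

```python
from itertools import product


def next_states(c, m, n, a1, a2):
    """Zero-delay next-state functions for the two active-high latches.

    c, m, n are the values of C, M, N at time t+1; a1, a2 are the
    current states A1(t), A2(t).
    """
    a1_next = (not c and a1) or (c and not m)
    a2_next = (not c and a2) or (c and m and n)
    return bool(a1_next), bool(a2_next)


def transition(now, nxt):
    """Classify a state change as Rise (R), Fall (F) or Stable (S)."""
    if now == nxt:
        return "S"
    return "R" if nxt else "F"


def feasible_patterns():
    """Enumerate every input/state assignment and collect the
    (A1, A2) transition pairs that can actually occur."""
    feasible = set()
    for c, m, n, a1, a2 in product([False, True], repeat=5):
        a1_next, a2_next = next_states(c, m, n, a1, a2)
        feasible.add((transition(a1, a1_next), transition(a2, a2_next)))
    return feasible


patterns = feasible_patterns()
rr_is_feasible = ("R", "R") in patterns
```

Under these assumptions the pair (R, R) never appears, because A1 rising needs M(t+1) = 0 while A2 rising needs M(t+1) = 1. Exhaustive enumeration is practical only for tiny examples; a SAT or ATPG engine replaces it on real circuits.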
In our current work we derive similar equations for aggressor and victim state transition combinations by analyzing the logic and use search engines like Satisfiability (SAT) [16] and Automatic Test Pattern Generation (ATPG) [18] to find out if certain combinations are impossible. Detailed discussion on formulating SAT clauses from circuit structures can be found in [15] . ATPG based techniques are used by [2] for pessimism reduction but unlike our work they did not consider the interaction between timing and logic filtering.
Consider a victim cluster < V, A1, A2, A3 >. We define a logic pattern as a combination of Rise (R), Fall (F) or Stable (S) conditions on A1, A2, A3, where the possible victim logic conditions are Rise or Fall.
A collection of such patterns, as shown in Figure 7, forms a Logic Table for the victim cluster. In Figure 8 we show the infeasible pattern counts for all victim nets in a functional block. For each victim, we check logic patterns for the top 3 aggressors when the victim is rising (R). The pattern RFFF is logically infeasible for 7481 out of 21681 possible victim nets. Thus, assuming worst-case logic conditions is overly pessimistic.
Given a logic table for a victim cluster, each pattern has a coupling induced delay push-out associated with it. We call this delay push-out the pattern rank. The logic filtering problem statement is as follows: Find the pattern that produces the worst rank.
A. Logic Filtering Preliminaries
Computation of coupling induced delay push-out for logic patterns extends the m computation as presented in Section III. Suppose we are computing the maximum m for a victim rise event. Static timing would assume aggressors fall, but in reality the aggressors can fall, rise, or remain stable. Therefore we must compute additional coupling factors for these conditions.
We give an example of rank generation for pattern RFSS from Figure 7. We generate m_A1, m_A2 and m_A3 using the extended formulations described above. The circuit model of this victim cluster is shown in Figure 9. m1, m2 and m3 are respectively m_A1, m_A2 and m_A3.
The rank E_d for the pattern under consideration is computed using Elmore delay [9]. Due to its high fidelity, E_d provides us with a fast and accurate estimate for the ranks.
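Rank generation and worst-pattern selection can be sketched as follows. This is a simplified illustration, not the paper's formulation: it assumes a single lumped driver resistance, so that the Elmore delay reduces to r_drv * (c_gnd + sum of m_j * Cc_j), and it takes the per-condition coupling factors as given inputs. The names `pattern_rank`, `worst_feasible_pattern` and `m_table` are hypothetical, and patterns here list only the aggressor conditions for a rising victim.

```python
from itertools import product


def pattern_rank(pattern, m_table, r_drv, c_gnd, c_cpl):
    """Lumped Elmore delay for one aggressor pattern.

    pattern : string such as "FSS", one condition per aggressor
    m_table : per-aggressor dict mapping "F"/"S"/"R" to a coupling factor
    c_cpl   : per-aggressor coupling capacitance
    """
    load = c_gnd
    for j, condition in enumerate(pattern):
        load += m_table[j][condition] * c_cpl[j]
    return r_drv * load


def worst_feasible_pattern(infeasible, m_table, r_drv, c_gnd, c_cpl):
    """Return (rank, pattern) of the worst logically feasible pattern."""
    all_patterns = ("".join(p) for p in product("FSR", repeat=len(c_cpl)))
    ranked = [
        (pattern_rank(p, m_table, r_drv, c_gnd, c_cpl), p)
        for p in all_patterns
        if p not in infeasible
    ]
    return max(ranked)


# Illustrative numbers only: every aggressor uses the model bounds
# (2 for opposite switching, 1 for stable, 0 for same-direction switching).
m_table = [{"F": 2.0, "S": 1.0, "R": 0.0}] * 3
worst = worst_feasible_pattern({"FFF"}, m_table, r_drv=1.0, c_gnd=1.0,
                               c_cpl=[3.0, 2.0, 1.0])
```

With the all-fall pattern marked infeasible, the worst rank drops from 13.0 to 12.0 and is produced by the pattern in which the weakest aggressor stays stable.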
We show that logic filtering turns out to be a chicken-and-egg problem. In this iteration the worst ranked pattern is still RSFS, but the ranks for patterns RFRF and RSFS become 9.5 and 10.0 respectively. The worst rank of 9.75 obtained in the previous iteration is non-conservative, and fix-point iterations as suggested by Zhou [20] are required to get conservative bounds on worst ranks. Iterations are also necessary since the pattern that generates the worst rank can change due to m, as explained earlier. Thus logic filtering with a realistic coupling model is an iterative approach, as opposed to a single-step process as suggested by previous work [4], [19].
B. LogicTimer
We present our algorithm in Figure 10 . Our algorithm has the flexibility to run in timing filtering or timing and logic filtering modes.
if (TIMING AND LOGIC FILTERING)
    Compute m_i for selected aggressors (Section IV-A)
    R ← Identify worst ranked pattern using m_i
The algorithm is similar to the Gauss-Jacobi [11] iteration technique. It initializes by assuming worst and best bounds on the aggressors as provided by our coupling model in Section II. Based on m, timing events on each node of the timing graph for iteration 0 are generated. The subscript on Events refers to the iteration number. We continue our algorithm in TIMING FILTERING mode until we are close to convergence on all events. The converged timing windows are used to identify important aggressors for the logic filtering algorithm. The aggressors that do not align with the victim are not considered. Details of important aggressor identification are presented in Section IV-C. We generate logic tables for the selected aggressors and then change the mode of our algorithm to TIMING AND LOGIC FILTERING. At each iteration we compute coupling factors and use them to identify the worst possible pattern for each victim. We use the worst pattern to determine new m's. Finally, using these updated m's, new events are generated on each node of the timing graph in topological order. Once the coupling factors are generated, the timing analysis approach is similar to the algorithm proposed by [3].
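The Jacobi-style structure of the timing filtering iterations can be illustrated with a toy example. This sketch does not implement LogicTimer or the charge sharing model: it assumes a step coupling model (m bounds of 0 and 2 when victim and aggressor windows overlap, m = 1 otherwise), one aggressor per net, and a lumped driver model. The `Net` fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Net:
    in_early: float   # earliest arrival at the driver input
    in_late: float    # latest arrival at the driver input
    r_drv: float      # lumped driver resistance
    c_gnd: float      # grounded capacitance
    c_cpl: float      # coupling capacitance to the single aggressor
    aggressor: str    # name of the aggressor net


def window(net, m_min, m_max):
    """Arrival window of a net for given bounds on the coupling factor."""
    early = net.in_early + net.r_drv * (net.c_gnd + m_min * net.c_cpl)
    late = net.in_late + net.r_drv * (net.c_gnd + m_max * net.c_cpl)
    return (early, late)


def overlaps(w1, w2):
    return w1[0] <= w2[1] and w2[0] <= w1[1]


def timing_filter(nets, max_iters=20):
    """Jacobi-style refinement: every update in an iteration uses only
    the windows computed in the previous iteration."""
    # Iteration 0: pessimistic bounds of 0 and 2 on m for every net.
    windows = {name: window(n, 0.0, 2.0) for name, n in nets.items()}
    for iteration in range(1, max_iters + 1):
        updated = {}
        for name, n in nets.items():
            if overlaps(windows[name], windows[n.aggressor]):
                updated[name] = window(n, 0.0, 2.0)
            else:
                updated[name] = window(n, 1.0, 1.0)  # quiet-aggressor factor
        if updated == windows:
            return windows, iteration
        windows = updated
    return windows, max_iters


nets = {
    "b": Net(0.0, 0.0, 1.0, 1.0, 1.0, "a"),
    "a": Net(3.5, 3.5, 1.0, 1.0, 1.0, "b"),
    "v": Net(5.0, 5.0, 1.0, 1.0, 1.0, "a"),
}
windows, iterations = timing_filter(nets)
```

In this example the window of net a shrinks in the first iteration, which removes its overlap with v only in the second iteration, so the latest arrival of v improves from 8.0 to 7.0 after more than one pass. With the step model the windows shrink monotonically; the alignment- and slew-aware model discussed in Section III does not guarantee this.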
Convergence of the algorithm comes from Observation 1. For complexity analysis, suppose the number of nodes in a circuit is N and I_TF iterations are needed to converge using timing filtering. We do not consider the time for aggressor selection and logic table generation since these are performed once, although in any logic filtering algorithm, logic table generation has the highest runtime overhead. Suppose we take I_LF iterations to converge on logic tables. Let the number of victim nets in the circuit be E. Generation of m in timing and logic filtering modes can be done in O(E) time. The overhead in pattern ranking is constant time in terms of the number of patterns for each victim cluster. Once the m's are generated, the timing events can be updated in O(N). Hence each iteration of our algorithm takes O(N + E). The total complexity of our algorithm is given by O((I_LF + I_TF)(N + E)).
C. Aggressor Selection based on sensitivities
We present a metric S to select important aggressors after the timing filtering iterations converge. Consider a victim cluster as described in Section IV. A victim net can be represented as an RC tree [1].
We then predict the gradients of Equation 6 as follows ∂g
Rearranging Equation 6 we can obtain the % change in delay due to the respective change in m_j. On the basis of the above discussion we present our sensitivity based metric for aggressor j as follows
m_j is required to compute the sensitivity S_j. m_j results from the underlying coupling model (see Section II) in the associated timing analysis flow. While doing max analysis in logic filtering we need to evaluate ψ_SS as shown in Equation 5. The upper bound of m in Figure 3 is 1.0. Therefore choosing m_j as 1.0 predicts the gradient value in the right direction. Though we derived our metric using the Elmore formulation, it is straightforward to use our metric in conjunction with the RICE [17] engine to compute accurate delays and thus trade off runtime to generate S_j.
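The idea behind a delay-sensitivity metric can be illustrated on a simple RC chain. The sketch below is not the paper's metric S: it assumes a chain rather than a general RC tree, uses the standard Elmore delay, and computes the gradient of the sink delay with respect to each coupling factor m_j, which for a chain equals the upstream resistance at the coupling node times Cc_j. All names are hypothetical.

```python
def elmore_delay(r, c_gnd, couplings, m):
    """Elmore delay to the last node of an RC chain.

    r[k]      : resistance of the segment feeding node k
    c_gnd[k]  : grounded capacitance at node k
    couplings : list of (node_index, Cc), one entry per aggressor
    m         : coupling factor for each aggressor
    """
    cap = list(c_gnd)
    for (node, cc), m_j in zip(couplings, m):
        cap[node] += m_j * cc  # Miller-scaled coupling cap seen as ground cap
    delay, r_upstream = 0.0, 0.0
    for k in range(len(r)):
        r_upstream += r[k]
        delay += r_upstream * cap[k]
    return delay


def delay_gradient(r, couplings, j):
    """d(delay)/d(m_j): upstream resistance at the coupling node times Cc_j."""
    node, cc = couplings[j]
    return sum(r[: node + 1]) * cc


# Illustrative numbers only: the larger coupling cap sits near the driver.
r = [1.0, 1.0, 1.0]
c_gnd = [1.0, 1.0, 1.0]
couplings = [(0, 4.0), (2, 2.0)]
m = [1.0, 1.0]

base = elmore_delay(r, c_gnd, couplings, m)
gradients = [delay_gradient(r, couplings, j) for j in range(len(couplings))]
relative = [m[j] * gradients[j] / base for j in range(len(couplings))]
```

Here a ranking by m × Cc would prefer the first aggressor (4.0 versus 2.0), whereas the delay gradient prefers the second one (6.0 versus 4.0) because it couples farther from the driver. This is one plausible reason a sensitivity-based ranking can differ from m × Cc.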
V. RESULTS
Our experiments are performed on a functional block from a 65nm industrial microprocessor. Logic table generation requires selection of important aggressors. We run LogicTimer in TIMING FILTERING mode till convergence and then select important aggressors. We compute the slacks on end nodes (flip-flop and latch data pins) and normalize the slacks with respect to the cycle time of the microprocessor.
We present our results on the setup slacks corresponding to maximum edges of timing windows. Hold slacks or minimum edge analysis is analogous. We compare normalized slacks of the first iteration (with coupling factors of 0 and 2) with slacks of the iteration when LogicTimer converges on all nets in TIMING FILTERING mode (Iteration 8). Maximum pessimism reduction relative to the cycle time of the microprocessor is 22.69% and median pessimism reduction is 3.72%. Note that most of the nodes have converged after just 3 iterations.
We selected the top N aggressors for logic table generation, where aggressors are ranked on the basis of m × Cc. Due to the complexity of SAT, logic table generation with N = 9 aggressors for all victims was not feasible (O(E × 3^9)). Hence we divided the aggressors into three groups of three aggressors each (O(E × 3 × 3^3)). We compute the worst rank of each group independently and superimpose the results to come up with the final rank. If logic table generation runtime is improved, larger tables can be generated and logic filtering results will also improve.
Results of running our algorithm in TIMING AND LOGIC FILTERING mode are presented in Figure 11. We compare normalized slacks with the slacks of iteration 8 (converged timing filtering results). We obtain a maximum pessimism reduction of 11.18% relative to the cycle time of the microprocessor. Median pessimism reduction is 0.44%. We also identified circuit nodes where arrival times are not monotonically decreasing over the iterations. The theoretical reason for such behavior comes from the behavior of our coupling model as explained by Theorem 1, where m is not monotonically decreasing. We show the variation of m over iterations on one such node for a few aggressors in Figure 12. We also found nodes which show oscillations in m, as mentioned in Observation 3. Such nodes were few (around 1.17%), and in such cases the timing analysis algorithm should take conservative decisions to deal with the oscillations.
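The effect of grouping on logic table size can be checked with simple arithmetic. The sketch below assumes three conditions (R, F, S) per aggressor and nine aggressors split into three groups of three; the additive combination of per-group worst push-outs is our reading of the superimposition step, not a formula given in the text.

```python
def table_size(num_aggressors, conditions=3):
    """Number of logic patterns for one victim with a fixed victim transition."""
    return conditions ** num_aggressors


def grouped_table_size(num_aggressors, group_size, conditions=3):
    """Total patterns when aggressors are split into equal-sized groups."""
    groups = num_aggressors // group_size
    return groups * table_size(group_size, conditions)


def superimposed_pushout(group_worst_pushouts):
    """Assumed superimposition: add the worst push-out of each group.

    Logic constraints that span groups are ignored, so the sum is at
    least as large as the true worst case over all nine aggressors.
    """
    return sum(group_worst_pushouts)


full = table_size(9)                # 3^9 patterns per victim
grouped = grouped_table_size(9, 3)  # 3 groups x 3^3 patterns per victim
```

Grouping reduces the per-victim pattern count from 19683 to 81, at the cost of missing logic constraints that involve aggressors from different groups.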
Results of running our algorithm on the basis of the aggressor selection metric presented in Section IV-C are shown in Figure 13. Using m × Cc, 3999 nodes reduced pessimism by 0.50%, while using our metric S, about 5362 nodes reduced a similar amount of pessimism. This trend continues, as our metric was able to reduce 0.60% of pessimism in 5362 nodes as compared to 2937 nodes using the m × Cc metric. In fact, as evident from Figure 13, the node distribution using S is more skewed toward increased values of pessimism reduction as compared to the metric m × Cc.
VI. CONCLUSIONS
In this paper we present a novel algorithm to do coupling-aware static timing analysis with logic constraints to remove pessimism. We also present the dynamics of a complex coupling model under timing analysis and its effect on logic filtering. Timing analysis with logic constraints must be iterative to give conservative timing data. On a functional block of a 65nm industrial microprocessor, our algorithm showed a maximum pessimism reduction of 11.18% on top of Timing Filtering Analysis. Our future work includes extending such a coupling model to statistical timing analysis and optimization.
