Abstract: Static timing analysis is instrumental in efficiently verifying a design's temporal behavior to ensure correct functionality at the required frequency. This paper addresses static timing analysis in the presence of cross talk for circuits containing level-sensitive latches, typical in high-performance designs. The paper focuses on two problems. First, coupling in a sequential circuit can occur because of the proximity of a victim's switching input to any periodic occurrence of the aggressor's input switching window. This paper shows that only three consecutive periodic occurrences of the aggressor's input switching window must be considered. Second, an arrival time in a sequential circuit is typically computed relative to a specific clock phase. The paper proposes a new phase shift operator to align the aggressor's three relevant switching windows with the victim's input signals. This paper solves the static analysis problem for level-clocked circuits iteratively in polynomial time, and it shows an upper bound on the number of iterations equal to the number of capacitors in the circuit. The contributions of this paper hold for any discrete overlapping coupling model. The experimental results demonstrate that eliminating false coupling allows finding a smaller clock period at which a circuit will run.
I. INTRODUCTION
Shrinking process geometries have imposed new challenges in both design and verification. One particular problem is the capacitive coupling among two or more signals in the circuit. Coupling exists due to the proximity of a wire to others that are either in the same layer (lateral coupling), or in different layers (inter-layer coupling). Coupling creates undesired noise and delay in the circuit. This phenomenon is commonly referred to as cross talk.
Noise on a signal refers to creating voltage deviation from the nominal supply and ground rails when the signals should otherwise have been stable at a high or low value as dictated by the logic and delay of the circuit [21] . Noise greater than the allowed noise margins causes malfunctions.
Delay variation due to capacitive coupling refers to either speeding or slowing the point in time where a switching net reaches its receiving threshold, thus causing receiving gates in the immediate fanouts to switch sooner or later than expected. The delay variation depends on the relative arrival times of the victim net and the aggressor net(s) that capacitively couple to the victim. If the victim is switching in the same direction as the aggressor(s), then we have assistive coupling, and the victim switches sooner than anticipated. Delay improvements could potentially cause race-through or double-clocking conditions, and thus circuit failure. With opposing coupling, the victim net switches later due to the opposing transition on the aggressor net(s). Delay degradation causes performance failure; the circuit will not run at the desired frequency. Static timing analysis techniques, which verify a design's temporal behavior to ensure correct functionality at the required frequency, thus must consider the effects of cross talk.
Several static timing analysis techniques that consider cross talk have been proposed for combinational circuits. Some are based on iterative techniques [3] , [18] ; some are based on the propagation of events [5] ; others are based on more complex mathematical formulations [10] . The choice of what constitutes coupling (any overlap of the inputs' switching windows vs. more detailed coupling conditions) affects the complexity of the algorithms. Consideration of the functional correlation of the victim and the aggressors allows further accuracy in analysis [25] , [2] , [4] . The worst case victim delay can be obtained by driver modeling using reduced order modeling and worst case alignment of the aggressors relative to the victim [7] , [9] , [22] . This paper addresses cross talk analysis for circuits with level-sensitive latches. Level-clocked circuits are certainly dominant in high-performance designs because they can operate at faster clock rates than edge-triggered circuits [8] . This is because, unlike edge-triggered registers, latches allow borrowing time across their boundaries. Researchers have efficiently solved the problem of verifying a clock schedule [23] , [14] , [11] . However, naively assuming worst case cross talk while running these algorithms yields pessimistic clock periods.
A clock schedule specifies the clock period, and the relative timing and duration of each of the phases in the schedule. Given a circuit and a clock schedule, we solve the problem of clock schedule verification in the presence of cross talk. That is, we answer the question, does the circuit run at the specified clock period given the phase waveforms imposed by the clock schedule?
The difficulty of the clock schedule verification problem is twofold. First, due to the periodic nature of signals in a sequential circuit, coupling can occur because of the proximity of a victim's switching input to any periodic occurrence of the aggressor's input switching window. More than one occurrence of the aggressor waveform thus must be compared against that of the victim. Second, the arrival times in a level-clocked circuit are typically computed relative to a specific clock phase. The arrival times must therefore be translated to a common reference point before the switching windows can be meaningfully compared.
This paper addresses both of these problems. We show that only three consecutive switching windows of the aggressor's input must be compared with the victim's input switching window. To determine overlap in switching windows at the inputs of the victim and aggressor, we propose a phase shift operator that can translate values from the aggressor's to the victim's time zones. The paper solves the clock schedule verification problem in the presence of cross talk iteratively in polynomial time. Furthermore, it shows an upper bound on the number of iterations equal to the number of capacitors in the circuit.
Several discrete and continuous coupling models are possible for representing the change in delay due to coupling. We choose to use the dynamically bounded delay model [10] , an abstract delay model that allows a gate's delay to be assigned one of many values depending on related operating conditions. While more accurate continuous models are possible, e.g. [6] , the chosen model is a generalization of discrete coupling models, such as ones that assume a 0X, 1X, or 2X increase in delay, e.g. [18] . While suffering from inaccuracies compared to continuous models, discrete models require less computational complexity. Furthermore, they have proved helpful in understanding the complex problem of static timing analysis in the presence of cross talk. Their use in this paper allowed us to achieve an understanding and develop a solution to the coupling problem in level-clocked circuits. The framework and solution proposed here can be easily extended to utilize other discrete coupling models.
The paper is organized as follows. Section II reviews recent advances in timing analysis for combinational circuits in the presence of cross talk and for level-clocked circuits. Section III introduces the clock schedule model, the gate-level delay model, and the circuit model. An example is presented in Section IV. Timing equations to model correct circuit operation and coupling conditions are respectively derived in Section V and in Section VI. Then, in Section VII, we present a polynomial algorithm to verify the timing of a level-clocked circuit when given a clock schedule. We conclude with experimental results.
II. RELATED WORK

A. Timing Analysis in the Presence of Cross Talk
Timing analysis techniques for non-cyclic combinational circuits are based on traversing an acyclic graph in a time linear in the number of vertices and edges [13] . In the presence of cross talk, however, such techniques cannot be directly applied because one net can couple to another anywhere in the circuit. Mutual dependencies among the signals are created, effectively creating cycles in the underlying timing graph. Iterative techniques have been proposed to solve this problem. An initial solution is first assumed. New solutions are then iteratively computed from previous ones, until the solution converges.
Several researchers have proposed such iterative solutions. Pileggi's group at CMU models a gate driving an RC load as a linear time-varying voltage source in series with a resistance [9] . Their static timing analysis, TACO [3] , begins by maximizing the switching windows for each signal: the earliest arrival times are set to zero, and the latest arrival times are set to infinity. Static timing analysis is then run, computing all arrival times in the circuit assuming worst case alignment of the aggressors. Analyzing the output of this run, some aggressors are found to be non-aligned with the victims. The arrival times for the victims are updated and propagated using a static timing analysis run. The process repeats to tighten the windows until the windows stop shrinking. Sapatnekar also proposes an iterative approach [18] . Whenever the switching windows of wires overlap, the delays are updated. Zhou, Shenoy and Nicholls establish a theoretical foundation for iterative techniques for timing analysis with cross talk [26] . They show that different initial solutions lead to different convergent solutions. They also show that the optimal fixpoint (tightest) solution is obtained by starting from the best case solution that assumes no coupling.
B. Verifying Clock Schedules
The biggest challenge in formalizing the verification of clock schedules for level-clocked circuits was creating a general clock schedule model to reflect borrowing across latch boundaries. Among first generation timing analysis tools, such as TA [1] , TV [12] , Crystal [15] , and LEADOUT [24] , only the last correctly verified borrowing across latch boundaries. Second generation timing analysis tools, developed in the early nineties, are based on formalizing the timing constraints and developing efficient algorithms to solve them. Sakallah, Mudge, and Olukotun developed the SMO model [16] , which was widely adopted within the timing verification and optimization community. Ishii, Leiserson, and Papaefthymiou also provide a general framework for the timing verification of 2-phase level-clocked circuits [11] . Schedule verification algorithms were based on one of two approaches. The Sakallah et al. approach [17] and the Szymanski and Shenoy approach [23] advocate computing arrival times using iterative approaches based on successive relaxation of arrival and departure times. Szymanski and Shenoy show that clock schedules can be verified using a simple polynomial time algorithm modeled after the Bellman-Ford shortest path algorithm [23] . Lockyear's approach [14] and Ishii et al.'s approach [11] , however, are based on determining the amount of time in which a computation must complete. This approach also results in efficient polynomial algorithms for verifying schedules.
III. PRELIMINARIES

A. Clock Schedule Model
Our clock schedule model is based on the SMO formulation [16] . An n-phase clock schedule is an ordered collection of n periodic signals, φ 1 , ..., φ n , having a common period π. Because phases are periodic, a local time zone of width π is associated with each phase. Each phase φ i is characterized by two parameters e i and w i . Parameter e i represents the absolute time when φ i begins (relative to an arbitrary global time reference). Parameter w i is the length of time that φ i is active (latch is open). To translate one measurement a from the local time zone of φ i into the next local time zone of φ j , we subtract from a a phase shift operator E i j , defined as:
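The defining equation for E i j is not reproduced in this copy. As a minimal sketch, the form below is the one commonly quoted for SMO-style schedules: E i j = e j - e i when i < j, and π + e j - e i otherwise. Treat that form as an assumption rather than as the authors' exact definition. The names `phase_shift` and `to_next_zone`, the 1-based indexing, and the requirement that the phases are listed in increasing order of e i are illustrative choices.

```python
def phase_shift(e, pi, i, j):
    """Assumed SMO-style shift E_ij from phase i's time zone to the next zone of phase j.

    e  : list of phase reference times e_1..e_n, assumed sorted in increasing order
    pi : common clock period
    i, j : 1-based phase indices
    """
    n = len(e)
    if not (1 <= i <= n and 1 <= j <= n):
        raise ValueError("phase index out of range")
    delta = e[j - 1] - e[i - 1]
    # A later phase in the same cycle is e_j - e_i away; otherwise wrap by one period.
    return delta if i < j else pi + delta


def to_next_zone(a, e, pi, i, j):
    """Translate a time a measured in phase i's zone into the next zone of phase j."""
    return a - phase_shift(e, pi, i, j)


# Illustrative symmetric two-phase schedule with period 10.
e = [0.0, 5.0]
pi = 10.0
print(phase_shift(e, pi, 1, 2), phase_shift(e, pi, 2, 1), phase_shift(e, pi, 1, 1))
```

Under this assumed form, shifting from a phase to itself costs a full period, which matches the idea that the "next" occurrence of the same phase is one cycle later.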
This clocking scheme is demonstrated in Figure 1 . We assume that the design intention and thus the clock schedule specify that a signal departing from a latch k must be captured by the next latching edge (which occurs after the latching edge of k) of the following latch l. The earliest arrival time at the output of a latch k clocked by φ i is π 
B. The Delay Model
The dynamically bounded gate delay model [10] , illustrated in Figure 
then our model is essentially the commonly used fixed, or min-max, delay model. Delays associated with cross coupling are modeled as follows: Assume that the output of a node, v, capacitively couples to the output of a node a. With opposing coupling, v's maximum delay is increased by a value ∆ v a . With assistive coupling, v's minimum delay is decreased by a value δ v a . A predicate indicates when this increase or decrease must hold. To handle additive coupling or more detailed conditions, such as 0X, 1X, or 2X factors, predicates can conditionally specify when these delays will be used.
C. Circuit Model
A circuit is modeled as a directed graph G
Each vertex in V represents either a primary input, primary output, 
IV. EXAMPLE
To understand how false coupling can produce pessimistic clock schedules, consider the example circuit in Figure 3. Signal F must wait until the opening edge of the φ 2 latch before the value is propagated. The smallest possible clock period is forced to be at least 10, to accommodate the critical path, whose worst case delay is 15, from the input of the block generating signal A to D. Using the schedule in Figure 3 (c), for example, will not work since the period is 9. Other schedules with a period of 10, such as ones with non-symmetrical phases, will work.
The switching windows of signals C and G overlap; thus, coupling between D and H will cause additional delays for both signals. The switching windows of A and E are, however, far apart. Thus, B switches without interference from F. Noise might be possible on node F, but it will certainly not affect its arrival times. The coupling between B and F is thus false. The delay of the critical path from the block generating A through the block generating D is 14 instead of 15. The schedule in Figure 3(c) can therefore be used to clock this circuit, and it has a smaller period than the one in Figure 3(b) . Timing analysis that eliminates false coupling therefore allows a faster schedule. In this example, the comparison of the overlapping switching windows of the victims and the aggressors was done in absolute time. However, arrival times are computed relative to a specific latch's time zone, and we must translate the time zone of the aggressor to that of the victim (or vice versa) in order to compare them correctly.
V. TIMING EQUATIONS
The earliest and latest arriving signals at the inputs of the victim and aggressor must be analyzed to determine if the switching windows overlap. The latest arrival time at a combinational node v: The departure time from a node v, without capacitive coupling on its output, can be specified as follows:
To compute D v , the departure time at v, we augment the latest arriving input to v by an amount ∆ v , the maximum propagation delay through v. For a node v with capacitive coupling on its output through one or more aggressor in C v , the maximum departure time is:
This constraint ensures that the propagation delay of v is augmented by an amount ∆ v a when a node v (the victim) experiences capacitive coupling through an aggressor a. Worst case opposing coupling between v and a is assumed because we are not considering the functional/logical behavior of the circuit. Variable γ v a is binary indicating if the conditions for capacitive coupling hold. A description of conditions that cause coupling is provided in Section VI. We similarly specify constraints for minimum arrival and departure times. For a combinational node v:
For a latch v:
The earliest departure time for a node v can be specified as follows assuming worst case assistive coupling between a victim node v and an aggressor a.
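The coupled departure-time constraints described in this section can be sketched as follows. The equations themselves are not reproduced in this copy, so the sketch rests on stated assumptions: the contributions of several aggressors are summed (additive coupling), each aggressor is given as a pair of the binary γ v a and its delay change, and the function names are hypothetical. The numeric values are illustrative only.

```python
def latest_departure(A_v, Delta_v, couplings):
    """Latest departure D_v = A_v + Delta_v plus every active opposing-coupling penalty.

    couplings: iterable of (gamma, Delta_va) pairs, one per aggressor in C_v.
    Summing over aggressors is an assumption of this sketch.
    """
    return A_v + Delta_v + sum(d for gamma, d in couplings if gamma)


def earliest_departure(a_v, delta_v, couplings):
    """Earliest departure d_v = a_v + delta_v minus every active assistive-coupling speedup.

    couplings: iterable of (gamma, delta_va) pairs, one per aggressor in C_v.
    """
    return a_v + delta_v - sum(d for gamma, d in couplings if gamma)


# One aggressor whose coupling holds (gamma = 1) and one whose coupling is false.
print(latest_departure(3.0, 2.0, [(1, 1.0), (0, 0.5)]))    # 6.0
print(earliest_departure(1.0, 1.5, [(1, 0.5), (0, 0.25)]))  # 2.0
```

Setting a γ to zero removes that aggressor's term, which is how eliminating false coupling tightens both ends of the switching window.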
VI. COUPLING CONDITIONS

Due to the periodicity of signals in a sequential circuit, coupling can occur due to the overlap, or close proximity by an amount of τ, of the switching window at the input of the victim and any periodic switching window at the aggressor's input.
Consider the situation depicted in Figure 4 , where the aggressor and the victim have the same phase, p(v) = p(a), resulting in aligned time zones. When considering the maximum possible victim range and the need to account for τ, it is apparent that the victim's input switching window can overlap with one, two, or all three of the possible switching windows of the aggressor's input: the previous, the current, and the following windows.
To determine if coupling exists, we must compare the victim's input switching window with each of the three occurrences of the aggressor's input switching window. When p(v) = p(a), determining the overlap between the inputs to the victim and the current aggressor is essentially the same as for combinational circuits, namely:
The comparisons with the previous and following occurrences can also be determined by noting that the previous occurrence of the aggressor can be computed by subtracting π from the range, resulting in To translate a value local to the aggressor's time zone to the victim's time zone and to have that value appear as a current occurrence, we define a new phase shift operator, E¤ i j , as follows:
This operator differs from the SMO phase shift operator. Consider again the coupling scenario in Figure 5 . We examine the use of the new phase shift operator which is illustrated in Figure 5 
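The three-occurrence comparison described in this section can be sketched as below. This is a minimal illustration and not the paper's exact inequalities, which are not reproduced in this copy. Windows are (earliest, latest) arrival pairs at the node inputs, `tau` is the proximity margin τ, and `shift` is a placeholder for whatever translation brings the aggressor's window into the victim's time zone as a current occurrence (zero when both share a phase). All function and parameter names are hypothetical.

```python
def windows_overlap(lo1, hi1, lo2, hi2, tau=0.0):
    """True if two closed windows intersect or lie within tau of each other."""
    return lo1 <= hi2 + tau and lo2 <= hi1 + tau


def couples(victim, aggressor, pi, tau=0.0, shift=0.0):
    """Check the victim window against the previous, current, and following
    periodic occurrences of the aggressor window.

    victim, aggressor : (earliest, latest) input arrival times
    pi                : clock period
    shift             : assumed translation into the victim's time zone
    """
    a_lo, a_hi = aggressor[0] - shift, aggressor[1] - shift
    return any(
        windows_overlap(victim[0], victim[1], a_lo + k * pi, a_hi + k * pi, tau)
        for k in (-1, 0, 1)  # previous, current, following occurrence
    )


# The current occurrence (9, 12) misses the victim, but the previous one (-1, 2) hits it.
print(couples((1.0, 3.0), (9.0, 12.0), pi=10.0))            # True
print(couples((1.0, 3.0), (5.0, 7.0), pi=10.0))             # False
print(couples((1.0, 3.0), (5.0, 7.0), pi=10.0, tau=2.0))    # True
```

The first call shows why a comparison against the current occurrence alone would miss real coupling in a sequential circuit.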
VII. ALGORITHM
Our algorithm for verifying that a circuit runs correctly for a given clock schedule is iterative. Initially, all coupling is assumed not to hold: all γ v a variables are set to zero. During each iteration, the steps below are performed. This algorithm is run until no new γ variables are assigned.
1) Compute the latch-to-latch, PI to latch, and latch-to-PO minimum and maximum delays as outlined in [19] . The run time is dominated by O
, where
is the number of latches in the circuit. Because the Szymanski/Shenoy algorithm in the next step utilizes latch-to-latch delays, the computation in this step is needed to ensure the efficiency of the latter algorithm. During each iteration, the latch-to-latch delays are recomputed because new γ variables are assigned, and the computed delays will be different.
2) Using the delays computed in step 1, run the Szymanski/Shenoy algorithm [23] to compute the arrival and departure times at the latches, PIs, and POs. Because the next step requires the arrival times at the inputs to victims and aggressors, a post-processing step, linear in the number of circuit nodes and edges, produces these values.
3) Compare the switching windows as outlined in the previous section, and set the appropriate binary γ variables. The run time is linear in the number of nodes, assuming a small number of aggressors is associated with each victim.
Our algorithm is guaranteed to converge. Once a new γ is assigned, the victim's window is simply stretched (the A v becomes larger, and the a v becomes smaller). Such a change in the victim's window can only cause other windows to either remain the same or further stretch. The algorithm is guaranteed to converge in C iterations, where C is the number of coupling capacitors in the circuit, because, in the worst case, only one γ variable is assigned true in each iteration. Furthermore, once γ is assigned true, it does not change. Once C iterations are completed, no switching windows change. The argument of continually shrinking or expanding switching windows was used to prove convergence for timing analysis for combinational circuits [3] , [18] . Sapatnekar noted that C iterations are needed for convergence [18] .
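The iteration described in this section can be sketched as a monotone fixpoint loop. The sketch does not implement steps 1 and 2: the hypothetical callback `analyze(gamma)` stands in for the latch-to-latch delay computation and the Szymanski/Shenoy run, and returns an input switching window per node. The default `overlap` test ignores periodic occurrences and τ for brevity. The toy circuit, node names, and numbers are invented for illustration.

```python
def overlap(w1, w2):
    """True if the closed intervals w1 and w2 intersect."""
    return w1[0] <= w2[1] and w2[0] <= w1[1]


def iterate_coupling(pairs, analyze, couples=overlap):
    """Start with no coupling, then repeatedly re-analyze and switch on every
    gamma whose windows now overlap, until no new gamma is assigned.

    pairs   : list of (victim, aggressor) node names, one per coupling capacitor
    analyze : callback mapping the gamma dict to {node: (earliest, latest)}
    """
    gamma = {pair: False for pair in pairs}
    iterations = 0
    while True:
        iterations += 1
        windows = analyze(gamma)                      # stands in for steps 1 and 2
        newly_set = [(v, a) for (v, a) in pairs       # step 3: compare windows
                     if not gamma[(v, a)] and couples(windows[v], windows[a])]
        if not newly_set:
            return gamma, windows, iterations
        for pair in newly_set:
            gamma[pair] = True                        # a gamma never reverts to False


def toy_analyze(gamma):
    """Toy timing model: n3 lies downstream of n1, so its window stretches
    once the n1/n2 coupling is switched on."""
    stretch = 0.5 if gamma[("n1", "n2")] else 0.0
    return {"n1": (0.0, 2.0), "n2": (1.0, 3.0),
            "n3": (4.0 - stretch, 5.0 + stretch), "n4": (5.2, 6.0)}


pairs = [("n1", "n2"), ("n3", "n4")]
gamma, windows, iterations = iterate_coupling(pairs, toy_analyze)
print(gamma, iterations)
```

In the toy run the second coupling only becomes active after the first one stretches an upstream window, so the loop needs two assigning passes plus one confirming pass. Because each pass either assigns at least one γ or stops, the number of assigning passes cannot exceed the number of pairs.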
VIII. EXPERIMENTAL RESULTS
Our experiments evaluate the effectiveness of our algorithm in verifying clock schedules in the presence of cross talk. Our benchmarks are based on a subset of the edge-triggered MCNC FSM circuits that we convert to circuits with level-clocked latches. SIS was first used to perform logic optimization and mapping [20] . We then converted registers to back-to-back phi1/phi2 latches and used sskew, Lockyear's retiming tool [14] , to determine an equal, two-phase retiming and an initial clock schedule. The combinational nodes in the circuit were initialized with a random maximum delay between 0.5 and 2.5; the minimum delay was then initialized with a random value that is at most 0.5 less than the maximum delay. We then added random capacitors equal in number to 10% of the total circuit nodes. Each capacitor was assigned a random delay between 0.0 and 1.0. The circuits used are summarized in Table VIII. We augmented the circuits with three larger ones: c1k, c2k, and c4k. These circuits were obtained by stitching together the mapped sand benchmark, and then generating delays and capacitors randomly and converting the registers to latches.
We ran sskew to determine the worst and best clock schedules. Table II lists the maximum period, which assumes worst case capacitive coupling, and the normalized minimum period, which assumes no coupling, in columns 2 and 3 respectively. To find the best clock period with our algorithm, we search the space starting with the minimum clock period, incrementing this period by 10% of the maximum clock period until we find a period at which the circuit runs. Because the solution space may not be convex, we avoided doing a binary search, as is possible when trying to determine the minimum clock period when no coupling is considered (e.g. Lockyear's approach [14] ). The final period is reported in column 4, while column 5 lists the reduction achieved with respect to the maximum possible reduction (i.e. the difference between the maximum and minimum clock periods). The final column lists the total run time.
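The linear search over clock periods described above can be sketched as follows. The predicate `runs_at` is a hypothetical stand-in for one full run of the verification algorithm at a given period. Capping the last candidate at the maximum period and returning `None` on failure are assumptions of this sketch, not details stated in the text.

```python
def find_period(min_period, max_period, runs_at):
    """Scan upward from the no-coupling minimum period in steps of 10% of the
    worst-case maximum period, returning the first period that verifies.

    A linear scan is used instead of binary search because the set of
    feasible periods may not be convex.
    """
    step = 0.10 * max_period
    k = 0
    while True:
        # Multiply rather than accumulate to limit floating-point drift.
        period = min(min_period + k * step, max_period)
        if runs_at(period):
            return period
        if period >= max_period:
            return None  # assumed failure signal; not specified in the text
        k += 1


# Hypothetical circuit that first verifies at a period of 9.
print(find_period(8.0, 10.0, lambda p: p >= 9.0))  # 9.0
```

The number of candidate periods is bounded by roughly ten times the normalized gap between the minimum and maximum periods, so the cost of the search is a small multiple of one verification run.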
From our results in Table II , we see that only one circuit operated at the maximum clock period. This circuit has a combinational delay from a primary input to a primary output that sets the clock period. For the others, the circuit ran at a smaller clock period than the maximum one. Some circuits were able to run at the indicated minimum clock period. The number of calculations to reach the minimum clock period was one for all circuits except for circuits dk16, ex2 and ex6, for which the minimum period was obtained during the second calculation, and for train4 where the minimum clock period was obtained during the 11th calculation. This fast convergence is due to the fact that more than one capacitor was effective in contributing to the delay very early in the algorithm. Table III lists the number of capacitors that affect delays in the circuit.
The total run times are shown in Table II . The 4K circuit, c4k, ran in less than 6 minutes. All analyses were performed on a Sun Enterprise-250. Run times were collected using the gethrtime system call, which measures user time. This is almost the same as CPU time, considering that timing analysis was the only active process running on the machine. Table IV lists the run times associated with each phase of the algorithm as outlined in the steps in Section VII.
Our results show that, for the set of examined benchmarks, it is indeed possible to find a faster clock schedule using more accurate and less pessimistic timing analysis. The implementation seems reasonably fast for the examples presented. The run time, however, may become prohibitive for larger circuits. From the run times in Table II , one can see that the run time grows approximately by a factor of 6 as the circuit size is doubled. Due to the unavailability of realistic, larger public domain benchmarks, it is not possible to further assess the implementation.
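As a rough reading of the observed growth, and assuming the run time follows a power law in the circuit size n (an assumption made here for illustration, not a measured fit), a factor of 6 per doubling corresponds to an exponent of about 2.6:

```latex
T(n) \approx c\,n^{k}, \qquad
\frac{T(2n)}{T(n)} = 2^{k} \approx 6
\;\Longrightarrow\;
k = \log_{2} 6 \approx 2.6 .
```

Under that assumption the empirical growth is polynomial, between quadratic and cubic in the circuit size.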
In light of comments by Szymanski and Shenoy [23] , we make the following two observations. First, the SMO equations [17] may have more than one solution when the circuit is running at the optimal clock period. The slightest physical perturbation may cause the circuit to switch from one solution to another. Szymanski and Shenoy advise against operating a circuit at such an optimal clock period. Cross talk could potentially cause timing violations and thus errors while switching from one operating point to another. It is important to thus characterize how far from the optimal clock period a circuit must operate to avoid errors while switching from one operating point to another. Second, the Szymanski and Shenoy algorithm depicts a simulation of the circuit operation during the first V L cycles once power is turned on [23] [...] ensure proper operation. Additional cross talk analysis that considers coupling along the reset path might be needed to eliminate cross talk errors during reset.

Benchmark circuits (of the column headers, only Gates survives in this copy)
Circuit    -      -     -     -     Gates
train4     20     2     1     4     13
bbsse      109    7     7     8     87
ex2        114    2     2     10    100
ex6        119    5     8     17    89
cse        175    7     7     8     153
kirkman    200    12    6     8     174
dk16       205    2     3     11    189
sand       542    11    9     86    436
c1k        1075   11    9     183   872
c2k        2141   11    9     377   1744
c4k        4273   11    9     765   3488
IX. CONCLUSIONS
This is the first paper that addresses cross talk analysis for circuits with level-sensitive latches. The main contributions of this paper are (a) showing the overlapping conditions necessary to detect changes in delays due to coupling, (b) deriving a new phase shift operator to conveniently translate the aggressor's periodic occurrences to the victim's local time zone, and (c) a polynomial algorithm to solve timing verification for level-sensitive circuits in the presence of cross talk. These contributions are not specific to the dynamically bounded gate delay model; they hold for any discrete overlapping coupling model. Our experiments demonstrate that eliminating false coupling results in a tighter clock schedule.
Circuit    Total Caps   Contributing Caps
train4     2            2
bbsse      10           9
ex2        11           10
ex6        11           11
cse        17           16
kirkman    20           15
dk16       20           17
sand       54           45
c1k        107          63
c2k        214          107
c4k        427          195

X. ACKNOWLEDGMENTS
