ABSTRACT
A method that characterizes the timing of Intellectual Property (IP) blocks while taking into account IP functionality is presented. IP blocks are assumed to have multiple modes of operation specified by the user. For each mode, our method calculates IO path delays and timing constraints to generate a timing model. The method thus captures the mode-dependent variation in IP delays, which, according to our experiments, can be as high as 90%. The special manner in which delay calculation is performed guarantees that IP delays are never underestimated. The resulting timing models are also compacted through a process whose accuracy is controlled by the user.
Keywords
Timing analysis, false path, functional (mode) dependency, IP characterization.
INTRODUCTION
Predesigned and precharacterized Intellectual Property (IP) blocks offer great potential for reducing design time and cost for System-On-Chip designs. However, the successful use of IP blocks hinges on the ability to accurately characterize their timing and functionality. The timing characterization issue is addressed in this paper.
Characterization of large IP blocks is predominantly done using static timing analysis, which calculates a worst-case structural (or topological) delay while ignoring functionality. Static timing analysis makes no attempt to detect false paths, which are signal paths that are never sensitized (activated) in actual operation. The benefits of using functional information to improve the accuracy of static timing analysis methods have long been recognized [3, 7].
Recent characterization methods have used functional analysis to improve timing accuracy [&lo] . Functional timing analysis captures the fact that the delays in a circuit are strongly linked to the way the circuit functions. Functionality refers to the logical value computed for each circuit node given an input vector. Thus the delays and timing constraints are influenced not only by the circuit's structure but by its function as well.
DAC 99, New Orleans, Louisiana. ©1999 ACM 1-58113-092-9/99/0006..$5.00
karem@eecs.umich.edu

Two widely used methods for functional timing analysis are symbolic evaluation via binary decision diagrams (BDDs) [10] and Boolean search that systematically performs an implicit enumeration of the input space [1, 9]. Both methods assume that the delays of a circuit depend on the values of all of its inputs, and both aim to find an input vector that sensitizes the true longest path. However, their complexity is potentially exponential in circuit size, which severely limits their applicability.
A more practical approach is to assume that a circuit's delays depend on only a subset of its inputs. This is typical of datapath circuits, where a small number of control inputs determine the delays between a large number of data inputs and the outputs. Methods that successfully trade accuracy for computation time using this approach have been reported in the literature [6, 11].
To characterize the timing of an IP block, this paper assumes that the delays through the block are mainly determined by a relatively small number of control inputs. Each combination of these control inputs corresponds to a mode of IP operation for which a timing model is generated. Timing models consist of delay tables for all input-output pin pairs, where each table contains min/max delays for user-specified input slew and output load configurations. Although the number of all possible control input combinations can be high, not all of them are useful in practice. It is assumed that IP block characterization will need to be performed only on a reasonably small number of combinations which are predetermined. The set of all the timing models and the modes associated with them characterize the timing of the IP block.
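To make the structure concrete, here is a minimal Python sketch of the timing-model layout just described: one model per mode, one delay table per input-output pin pair, and a min/max delay per (input slew, output load) configuration. The class names, field names, units (ns, pF), and numbers are our own hypothetical choices, not taken from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical data layout; class and field names are illustrative only.
# Units assumed: delays and slews in ns, loads in pF.


@dataclass
class DelayTable:
    """Min/max delay for each user-specified (input slew, output load) configuration."""
    entries: dict = field(default_factory=dict)  # (slew, load) -> (min_delay, max_delay)

    def add_entry(self, slew, load, min_delay, max_delay):
        if min_delay > max_delay:
            raise ValueError("min delay must not exceed max delay")
        self.entries[(slew, load)] = (min_delay, max_delay)

    def max_delay(self, slew, load):
        return self.entries[(slew, load)][1]


@dataclass
class TimingModel:
    """Timing model for one IP mode: a delay table per (input pin, output pin) pair."""
    mode: dict                                  # control input -> 0/1, e.g. {"C": 1}
    tables: dict = field(default_factory=dict)  # (in_pin, out_pin) -> DelayTable


# The block's characterization is the set of per-mode models (hypothetical numbers).
table = DelayTable()
table.add_entry(0.1, 0.01, 0.6, 0.8)
model_c1 = TimingModel(mode={"C": 1}, tables={("A", "Z"): table})
characterization = [model_c1]
```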
Our method can be regarded as one based on systematic case analysis, a capability offered by some static timing analyzers whereby the user sets some inputs to constant values and then performs the analysis [8]. Unlike previous approaches, however, our method computes not only the delays between data inputs and outputs but also the delays between control inputs and outputs. Because of the special way in which delays are calculated, our method does not suffer from the delay underestimation problem that is encountered when delays are determined using static sensitization [2, 7]. An optional post-processing step coalesces conditional delays to produce a compact timing characterization.
The next section describes the proposed characterization method. Section 4 discusses the table representation of timing models and their compression, and Section 5 presents the application of the method to the ISCAS-85 benchmarks along with the experimental results.
IP BLOCK TIMING CHARACTERIZATION
The timing model of an IP block is constructed as follows. The control inputs and their meaningful combinations are provided by the user. Each combination (IP mode) is applied to the inputs and propagated as far as possible. For each mode, certain paths are blocked by controlling values that have propagated to their side inputs. For example, if one input of an AND gate is 0, the paths through its other inputs are blocked. The path delays between each data input and each output pin are then calculated by performing topological analysis on the unblocked portion of the circuit. This amounts to using static sensitization for delay calculation from data inputs. A path delay calculated for a particular combination or mode of the control inputs is valid only for that combination. In this paper we use the notation (Cond, d) to denote that a path delay of d is valid only when Cond is true. An IO path can have several different delays, each valid under a different condition. For instance, {(!C, 5), (C, 7)} means that the delay of this IO path is 5 when C=0, and 7 when C=1. Unconditional delays are a special case and are denoted by (-, d).
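The per-mode procedure above (propagate the mode's constants, treat constant nodes as blocked, then find longest paths through what remains) can be sketched as follows. This is an illustration under stated assumptions: a hypothetical gate-level netlist format, only AND/OR/NOT/BUF gates, unit gate delays, and simple forward implication for constant propagation. The five-gate circuit is made up and is not the paper's Fig. 1.

```python
# Case-analysis sketch on a hypothetical 5-gate circuit (NOT the paper's Fig. 1).
# Netlist format (our assumption): gate -> (type, fanins), listed in topological order.
NETLIST = {
    "g1": ("AND", ["A", "C"]),
    "g2": ("NOT", ["C"]),
    "g3": ("AND", ["B", "g2"]),
    "g4": ("BUF", ["g3"]),
    "Z":  ("OR",  ["g1", "g4"]),
}
CONTROLLING = {"AND": 0, "OR": 1}  # side-input value that alone fixes the output


def propagate(netlist, mode):
    """Propagate the mode's constant control values forward as far as possible."""
    val = dict(mode)  # node -> 0/1 for nodes that are constant in this mode
    for gate, (kind, fanins) in netlist.items():
        known = [val[i] for i in fanins if i in val]
        if kind in ("NOT", "BUF"):
            if known:
                val[gate] = 1 - known[0] if kind == "NOT" else known[0]
        elif CONTROLLING[kind] in known:
            val[gate] = CONTROLLING[kind]      # controlling value blocks the other inputs
        elif len(known) == len(fanins):
            val[gate] = 1 - CONTROLLING[kind]  # all inputs constant and non-controlling
    return val


def longest_unblocked(netlist, val, src, dst, gate_delay=1):
    """Longest src->dst delay through non-constant nodes only (static sensitization)."""
    arrival = {src: 0}
    for gate, (_, fanins) in netlist.items():
        if gate in val:
            continue  # a constant node blocks every path through it
        reach = [arrival[i] for i in fanins if i in arrival]
        if reach:
            arrival[gate] = max(reach) + gate_delay
    return arrival.get(dst)


def characterize(netlist, modes, data_inputs, outputs):
    """Collect conditional delays {(in, out): [(Cond, d), ...]} over all modes."""
    model = {}
    for mode in modes:
        val = propagate(netlist, mode)
        cond = "&&".join(k if v else "!" + k for k, v in mode.items())
        for x in data_inputs:
            for z in outputs:
                d = longest_unblocked(netlist, val, x, z)
                if d is not None:
                    model.setdefault((x, z), []).append((cond, d))
    return model


model = characterize(NETLIST, [{"C": 1}, {"C": 0}], ["A", "B"], ["Z"])
print(model)  # {('A', 'Z'): [('C', 2)], ('B', 'Z'): [('!C', 3)]}
```

With C=1, g2 becomes 0 and forces g3 and g4 to 0, so only A reaches Z; with C=0, g1 is forced to 0, so only B does. A data input that is blocked in a mode simply gets no entry for that condition.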
The method is illustrated with the example circuit of Fig. 1 . Assume that C is a control input, and A and B are data inputs. Further assume that all gates have unit delays and that only the maximum delays are of interest. The following path delays are calculated by successively setting C to 1 and 0.
Path    C=1    C=0
A→Z     3      none
B→Z     3      2
Using our (Cond, d) notation, the path delays can be expressed as:
A→Z: {(C, 3)}
B→Z: {(C, 3), (!C, 2)}
Note the highlighted path of length 4 in Fig. 1. Because the maximum delay for A→Z is 3, as shown above, this path would appear to be false. This is, however, not true, because static sensitization has no notion of time and implicitly assumes that control signals have had ample time to block the paths from data inputs. In reality, the data signal (A) and the control signal (C) propagate together, which makes it possible for the data signal to propagate even when the control signal has a controlling value. In the above example, two simultaneous events on inputs A and C, both with a final value of 0, do propagate to output Z along the highlighted path. To account for this underestimation of delays from data inputs, our method calculates topological delays for paths originating at control inputs. This ensures that the outputs will always have correct delays. For the example circuit, the delay for path C→Z is therefore

C→Z: {(-, 4)}

Putting all the IO path delays together, the characterization method generates the timing model for the IP block. Since the method uses static sensitization and topological analysis, both of which are independent of input arrival times, the resulting timing models are valid under all possible arrival time conditions of the IP inputs. The following is a summary of the characterization method. A formal proof showing that the above method does not underestimate circuit delays is given in the appendix.
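The remedy described above, giving control inputs topological (unblocked) delays rather than sensitized ones, can be sketched on a small hypothetical circuit. The netlist format and circuit are our own assumptions, not Fig. 1, and the data-input entries are written out by hand as what per-mode static sensitization gives for this circuit with unit gate delays.

```python
# Hypothetical circuit: gate -> (type, fanins), listed in topological order.
NETLIST = {
    "g1": ("AND", ["A", "C"]),
    "g2": ("NOT", ["C"]),
    "g3": ("AND", ["B", "g2"]),
    "g4": ("BUF", ["g3"]),
    "Z":  ("OR",  ["g1", "g4"]),
}


def topological_delay(netlist, src, dst, gate_delay=1):
    """Longest structural src->dst path; logic values are ignored, so nothing is blocked."""
    arrival = {src: 0}
    for gate, (_, fanins) in netlist.items():
        reach = [arrival[i] for i in fanins if i in arrival]
        if reach:
            arrival[gate] = max(reach) + gate_delay
    return arrival.get(dst)


def add_control_delays(model, netlist, control_inputs, outputs):
    """Add unconditional (-, d) entries for every control-input -> output pair."""
    for c in control_inputs:
        for z in outputs:
            d = topological_delay(netlist, c, z)
            if d is not None:
                model[(c, z)] = [("-", d)]
    return model


# Data-input entries: per-mode static sensitization results for this circuit
# (C=1 and C=0, unit gate delays), written out by hand.
model = {("A", "Z"): [("C", 2)], ("B", "Z"): [("!C", 3)]}
add_control_delays(model, NETLIST, ["C"], ["Z"])
print(model[("C", "Z")])  # [('-', 4)]

# The control path C->g2->g3->g4->Z is longer than every data-input delay,
# so omitting it would let the model underestimate the delay to Z.
worst = max(d for entries in model.values() for _, d in entries)
print(worst)  # 4
```

Because `topological_delay` ignores logic values, the control-input entry is unconditional, which is the (-, d) form used for C→Z above.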
Delays and timing constraints can also depend on the value of internal registers/latches in sequential circuits. In this case, the outputs of such registers can be set to constant values along with the control inputs.
TABLE REPRESENTATION AND REDUCTION
So far it has been assumed that each path is characterized by a single delay. In reality, the delay of a gate depends on the slope of its input transitions (input slew) and the capacitive loading at its output. This dependence is represented by a delay table indexed by input slew and output load. For example, an entry of 8ns at a slew of 0.1ns and a load of 0.01pF in the table for a path P means that the delay of P is 8ns when the input to P has a slew of 0.1ns and the output of P has a load of 0.01pF. The other entries similarly correspond to the other combinations of slew and load.
In the conditional delay notation of Section 3, tables replace scalars. For example,
I→O: {(C1&&C2, T1), (!C1&&C2, T2)}
where T1 and T2 are delay tables. After merging, the conditional delay for I→O becomes:
Notice that the conditions Cl&&C2 and !Cl&&C2 are combined into one by taking their logical OR. The resulting condition can be simplified as C2. Our current implementation performs only simple reductions such as this one. More complex reductions require the ability to manipulate logical expressions, and will be considered in the future.
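The simple reduction described above is the adjacency rule (X && Y) || (!X && Y) = Y. A minimal sketch, assuming conditions are stored as product terms (sets of literals); the helper names are hypothetical, and general Boolean minimization is deliberately left out, as in the text.

```python
# Conditions as product terms (cubes): frozenset of (variable, polarity) pairs.
# Only the adjacency rule (X && Y) || (!X && Y) = Y is handled.


def parse_cube(text):
    """'C1&&!C2' -> frozenset({('C1', True), ('C2', False)})."""
    lits = set()
    for lit in text.split("&&"):
        lit = lit.strip()
        lits.add((lit[1:], False) if lit.startswith("!") else (lit, True))
    return frozenset(lits)


def format_cube(cube):
    """Inverse of parse_cube; an empty cube is the unconditional case '-'."""
    if not cube:
        return "-"
    return "&&".join(v if pol else "!" + v for v, pol in sorted(cube))


def merge_cubes(a, b):
    """Return one cube equal to (a || b) if a, b differ only in one variable's polarity, else None."""
    diff = a ^ b
    if len(diff) == 2:
        (v1, p1), (v2, p2) = diff
        if v1 == v2 and p1 != p2:
            return a - diff
    return None


print(format_cube(merge_cubes(parse_cube("C1&&C2"), parse_cube("!C1&&C2"))))  # C2
print(format_cube(merge_cubes(parse_cube("C"), parse_cube("!C"))))            # -
print(merge_cubes(parse_cube("C1"), parse_cube("C2")))                        # None
```

When the two conditions are complementary single literals, the result is the empty cube, i.e. an unconditional (-, T) entry.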
EXPERIMENTAL RESULTS
The characterization method is applied to the widely used ISCAS-85 benchmarks and to an industrial IP block. The high-level models of the ISCAS-85 benchmarks described in [4] are used to determine the control inputs and their combinations for the experiments presented in this section. All the circuits were mapped to the transistor level using a sample library of gates. A detailed analysis of the c3540 benchmark is given next; the results for the other benchmarks are then summarized.
Analysis of c3540
The high-level model of c3540 is shown in Fig. 2. It is an 8-bit ALU whose largest block is ALU-Core (M4 ... :0] is found for each mode.

Figure 2. High-level model of the ISCAS-85 benchmark c3540 (highlighted path is false due to control input dependencies).
Some modes yield identical tables; only 9 tables were found to be distinct. Table 1 shows the delay tables, the condition, and a representative mode for each delay table. To compare the table delays, the average of the four delays in each table was computed. The average delay for each mode is shown in Fig. 3a. Notice that the delay for 7@J' is greater than any other, an indication of a false path. In fact, a topological timing analysis performed on this circuit reported the path highlighted in Fig. 2 as the longest one between A[0] and Z[7]. This path, however, is false for the following reason. When the F-BCD bus goes through the final mux M9 during a BCD operation, the M3 mux always selects its first input, the B bus. The highlighted path is therefore impossible to activate due to control input dependencies.
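Grouping modes that yield identical tables and summarizing each table by the mean of its four entries is straightforward. The sketch below assumes 2x2 tables flattened into tuples; the mode names and numbers are made up and are not the values of Table 1.

```python
from statistics import mean

# Hypothetical 2-slew x 2-load tables (ns), flattened to 4-tuples; not data from the paper.
mode_tables = {
    "m0": (10.0, 11.0, 12.0, 13.0),
    "m1": (10.0, 11.0, 12.0, 13.0),
    "m2": (20.0, 21.0, 22.0, 23.0),
}


def distinct_tables(mode_tables):
    """Group modes with identical tables: {table: [modes sharing it]}."""
    groups = {}
    for mode, table in mode_tables.items():
        groups.setdefault(table, []).append(mode)
    return groups


groups = distinct_tables(mode_tables)
for table, modes in groups.items():
    # One table per group; its summary delay is the mean of the 4 entries.
    print(modes, mean(table))
# ['m0', 'm1'] 11.5
# ['m2'] 21.5
```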
As revealed by Fig. 3a, the path delay ranges from 26.7ns to 49.6ns, a variation of nearly 50%. The other modes produce delays between these two extremes. The large variation in delay between circuit modes is therefore captured by this method.
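Applying the variation measure used in the summary of results (the largest delay difference between modes, normalized to the maximum delay over all modes) to these two extremes gives about 46%. Here d_m denotes the path delay in mode m, which is our notation:

```latex
\text{variation} = \frac{\max_m d_m - \min_m d_m}{\max_m d_m}
  = \frac{49.6\,\text{ns} - 26.7\,\text{ns}}{49.6\,\text{ns}}
  = \frac{22.9}{49.6} \approx 0.46
```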
It is possible to reduce the number of tables using the table merging method of Section 4. By appropriately setting TOL, different amounts of reduction are achieved. First, TOL is set to 1ns, which is about 2% of the maximum delay. Table merging then reduces the number of tables to 5. Next, TOL is increased to 5ns (about 10% of the maximum delay), and the number of tables is further reduced to 3. This last step is shown in Fig. 3b. As the tables are merged subject to the user-specified tolerance, their conditions are ORed, so the new conditions are (C3 || C4 || C5 || C6 || C7) for the combined set #3, #4, #5, #6, #7, and (C8 || C9) for the combined set #8, #9. An even higher value of TOL can result in even fewer tables.
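The surviving text does not spell out the exact merge criterion, so the following sketch rests on explicit assumptions: tables share one slew/load grid, two tables join the same group only if all corresponding entries differ by at most TOL, and the merged table takes the entrywise maximum so that no delay is underestimated. Conditions of merged tables are ORed, as described above. The function names and data are hypothetical.

```python
# Greedy table merging under a tolerance TOL. ASSUMPTIONS (not from the paper):
# flattened tables on one slew/load grid; a table joins a group only if it is within
# TOL of every member; the merged table keeps the entrywise maximum.


def within_tol(t1, t2, tol):
    return all(abs(a - b) <= tol for a, b in zip(t1, t2))


def merge_tables(entries, tol):
    """entries: [(condition, table)]. Returns [(ORed condition, merged table)]."""
    groups = []
    for cond, table in entries:
        for g in groups:
            if all(within_tol(table, t, tol) for t in g["tables"]):
                g["conds"].append(cond)
                g["tables"].append(table)
                break
        else:  # no compatible group: start a new one
            groups.append({"conds": [cond], "tables": [table]})
    return [(" || ".join(g["conds"]), tuple(map(max, zip(*g["tables"])))) for g in groups]


# Hypothetical entries (ns); a larger TOL yields fewer tables.
entries = [("C1", (30.0, 31.0)), ("C2", (30.5, 31.2)), ("C3", (45.0, 46.0))]
print(merge_tables(entries, tol=1.0))        # [('C1 || C2', (30.5, 31.2)), ('C3', (45.0, 46.0))]
print(len(merge_tables(entries, tol=20.0)))  # 1
```

Requiring a new table to be within TOL of every member of a group, not just one, avoids long chains of small differences adding up to more than TOL.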
A majority of critical paths in this circuit have a delay profile similar to that of A[0]→Z[7]. The delay variation for the top 5% critical paths was found to be 27% on average. These results show that the use of functional information provides more accurate delays and helps eliminate false paths.
Summary of Results
From the high-level models of the ISCAS-85 benchmarks, a set of modes was determined for each benchmark. An attempt was made to choose the most representative modes, but not all possible modes were covered. Table 2 shows the number of modes for each circuit along with its function, transistor count, number of input and output pins, and number of control and data pins. The circuits were characterized for two slew values and two output load values. The aim of the experiments was to determine the variation in path delays. The delay variation for a path is the maximum difference in its delay between any two modes. To compare the results for all the circuits, the variation found for each path was normalized to the maximum value of the path delay over all modes. Table 3 shows the maximum and average variation for all paths and for the top 5% critical paths of each benchmark. Since the c6288 benchmark has no control inputs, it was not included in the experiments. The maximum variation for the remaining benchmarks is 91%, found in c1908. The average variation is generally low because of many short paths that have little or no variation. When only the top 5% critical paths are considered, the average variation is much higher, as shown in the table. All the ALUs and c1908 have an average variation greater than 25% for their top 5% paths. The three benchmarks c499, c1355 and c7552 show little variation because they have very few control inputs: 1 in c499 and c1355, and 3 in the largest benchmark, c7552.
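The per-path variation defined above, and the all-paths versus top-5% summaries reported in Table 3, can be computed as follows. How the top 5% critical paths are ranked is our assumption (by each path's maximum delay over all modes, keeping at least one path); the path names and delays are hypothetical.

```python
import math
from statistics import mean


def path_variation(delays):
    """Largest delay difference between any two modes, normalized to the maximum delay."""
    return (max(delays) - min(delays)) / max(delays)


def variation_summary(paths, top_fraction=0.05):
    """paths: {path: [delay in each mode]}. Summaries over all paths and the top critical ones."""
    var = {p: path_variation(d) for p, d in paths.items()}
    # Assumption: rank criticality by each path's maximum delay over all modes.
    ranked = sorted(paths, key=lambda p: max(paths[p]), reverse=True)
    top = ranked[:max(1, math.ceil(top_fraction * len(ranked)))]
    return {
        "max_all": max(var.values()),
        "avg_all": mean(var.values()),
        "max_top": max(var[p] for p in top),
        "avg_top": mean(var[p] for p in top),
    }


# Hypothetical per-mode delays (ns) for three IO paths.
paths = {"p1": [20.0, 40.0], "p2": [5.0, 5.0], "p3": [10.0, 8.0]}
summary = variation_summary(paths)
print(summary["max_all"], summary["max_top"])  # 0.5 0.5
```

The short path p2 has no variation at all, which pulls the all-paths average down while the top-path figures are unaffected, the same effect described for the benchmarks.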
Another experiment was done on an industrial IP block which is a 16-bit ALU with 10527 transistors, 101 latches and 23 control inputs. This IP block was characterized using 10 modes. The maximum delay variation is found to be 83%, while the average over all paths is 13%. For the top 5% critical paths, the maximum variation is 75%, while the average is 60%. These results clearly illustrate that large delay variations do exist among different modes of practical circuits.
The run time required to process a circuit depends on the circuit size, the number of IO paths, and the number of modes. For the ISCAS-85 circuits, the run time ranged from 11s for c432 to 246s for c5315 on a SUN Ultra 10 with 256 MB of memory. The run time for the industrial IP block was 1753s.
The timing models for the benchmarks were compressed using the table merging method. Using a tolerance of 5% of the maximum circuit delay, the reduction in the size of the timing models was found to range from 69% to 91% for the ISCAS-85 benchmarks. The reduction for the industrial IP block was 61% with a 5% tolerance.
CONCLUSIONS
A method that creates accurate timing models for IP blocks by making use of functionality is presented. Delays and timing constraints are calculated for each mode of IP operation, and are combined in a unified timing model. The characterization method guarantees that delays are never underestimated. A post-processing step is used to reduce the size of IP timing models generated by this method. Experiments show that delays can vary considerably among IP modes. If a static timing (topological) analysis tool is used, delays can be significantly overestimated.
