Abstract. In this paper, we investigate the challenges of timing model generation for designs that operate in multiple functional modes and at multiple timing corners, with the goal of reducing the overall complexity of timing verification while preserving the key intent of IP protection. We also propose a method for concurrently generating a single model that addresses the timing verification requirements of a set of functional constraint modes belonging to the same corner, that is, the same process, voltage and temperature specification. Finally, we compare the proposed technique with the standard timing model generation technique and outline its advantages in three metrics: accuracy, performance and compaction of the timing models.
INTRODUCTION
With advancements in very deep submicron technology, technology nodes have shrunk consistently and the challenges in timing analysis have grown. Static timing analysis (STA) can efficiently estimate the timing of VLSI circuits at a given process, voltage and temperature, and can also drive significant optimizations of the circuit to meet the desired timing. This, however, is not sufficient in modern analysis, where manufacturing variations have a significant effect on the yield of the circuit, to the extent of causing chip failure [2] [3] [4].
To overcome these challenges, statistical timing analysis was touted as a solution more than a decade ago [5] [6] [7] . Statistical timing analysis (SSTA) models the process variations as a set of random variables and the cell/signal delays/slews as probability density functions. The circuit delay can then be obtained by the path-based or block-based propagation algorithms [8] [9] [10] [11] [12] [13] . By capturing the impact of process variations as delay distributions, the SSTA approach has the advantage that it can predict/prevent more silicon failures and thus improve the yield rate.
While SSTA looks promising for resolving some of the challenges arising from process variations, it comes with its own set of challenges and problems [6] [7]. First, the delay functions in real circuits can be asymmetric or uncertain and thus cannot be modeled as the Gaussian distributions used in most SSTA algorithms. In addition, even if they can, the correlations among within-die features may not be easily obtained, making it difficult for common users to adopt this methodology. Moreover, the computational complexity of SSTA is generally much higher than that of conventional STA, which lowers its applicability to modern VLSI designs.
Multi-mode multi-corner (MMMC) STA, on the other hand, is another variation-aware timing analysis technique that can serve as a compromise between SSTA and conventional STA [7] [8] [9]. Instead of modeling the distributions of the process variations, it focuses only on the extreme cases (corners) of the process parameters. By computing the delay deviations at multiple process corners, MMMC STA can bound the worst-case and best-case delays and thus acts as a conservative but efficient variation-aware timing analysis approach. In addition, this parameter-based method can be extended to handle non-process variations such as voltage and temperature (i.e. different modes). However, as the number of variation parameters increases, the number of corners/modes grows exponentially. A naïve approach that computes the timing of each corner/mode separately is therefore impractical for real designs.
Timing model generation for a single corner is a well-known method; however, even with the rise of MMMC STA, the method of model generation remains the same [14]. Combinations of functional modes and delay corners easily number more than a hundred in the modern STA world. One model for each combination of process, voltage and temperature for a given mode essentially means that users need to generate many combinations of models, which increases generation time and complicates the use and management of these models.
mode based timing arcs of models where M <= N by reducing the timing graph for combinational and sequential circuits and have a condensed graph that represents the paths that need to be mapped into the timing model. Once done, for each constraint mode, we perform the delay characterization as described in [14] and map those arc delays into one of the designated functional modes. This process is repeated for all the functional modes using multiple threads, where each thread acts on one of the functional modes. Finally, we write out the timing model using the mode_definition and mode_value semantics of the open source Liberty format [15].
The model described above can be consumed by STA tools by activating one or more of the mode_definitions specified in the output model through the standard set_mode user interface provided by modern STA tools [16].
MODEL GENERATION INPUTS
For concurrent model generation, we need an additional input that defines the various functional modes, including the related clocks and the related timing constraints that comprise each functional mode.
TIMING GRAPH SETUP
We extend the definition of the timing graph G provided in [14]. This extended timing graph is a three-tuple G = (P, DFM, CFM), where P is a set of pins, DFM is a set of delay arcs where each arc contains a delay vector, and CFM is a set of check arcs where each arc contains a timing check vector. These vectors hold the delay or timing check value for that arc, indexed by functional mode. The definition of transitions over the delay and check arcs remains exactly the same as described in [14].
Combinational and sequential arc generation logic also remains the same, except for latch-based designs. In latch-based designs, the sequential and combinational merges are extended beyond the latch if it is transparent. A transparent latch is treated as a combinational device (much like a buffer), so the serial merge operations are done through the data-to-output arc of the latch, and the final forward/backward merge operations are applied only until we hit a non-borrowing latch or a flop downstream of this latch.
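The three-tuple above can be pictured with a minimal Python sketch. The class names, field names and the two mode names are hypothetical and chosen only for illustration; they are not taken from [14] or from any tool. Each arc stores one value per functional mode, which is the only departure from a single-mode graph.

```python
from dataclasses import dataclass, field

# Hypothetical functional mode names, used only for this illustration.
MODES = ["Mode1", "Mode2"]

@dataclass
class Arc:
    src: str
    dst: str
    # One entry per functional mode; None means "not yet characterized".
    values: dict = field(default_factory=lambda: {m: None for m in MODES})

@dataclass
class TimingGraph:
    pins: set = field(default_factory=set)          # P
    delay_arcs: list = field(default_factory=list)  # DFM
    check_arcs: list = field(default_factory=list)  # CFM

    def add_delay_arc(self, src, dst):
        self.pins.update((src, dst))
        arc = Arc(src, dst)
        self.delay_arcs.append(arc)
        return arc

    def add_check_arc(self, ref_pin, data_pin):
        self.pins.update((ref_pin, data_pin))
        arc = Arc(ref_pin, data_pin)
        self.check_arcs.append(arc)
        return arc

g = TimingGraph()
d = g.add_delay_arc("in", "out")
d.values["Mode1"] = 0.10   # illustrative delay values, not measurements
d.values["Mode2"] = 0.11
c = g.add_check_arc("clk", "in")
c.values["Mode1"] = 0.03   # illustrative setup check value
```

A dictionary keyed by mode name is used here instead of a positional vector so that a mode missing on an arc is explicit; a real implementation may well prefer a dense array.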
TIMING GRAPH REDUCTION
Once "G" is available, we perform a graph traversal to identify the logic to be preserved. The following algorithm is used for marking the logic of interest.
Traversal 1: Forward traversal for timing check marking. For each data boundary port/terminal, do a forward traversal until a sequential device is reached. If the sequential device is a latch, find the output pins of that latch and repeat the forward traversal from them.
Traversal 2: Forward traversal for sequential paths. For each input clock boundary port, do a forward traversal over the graph to the clock pin of a sequential device, on to the output pins of that device, and all the way to the output boundary data port.
Traversal 3: Forward traversal for combinational paths. For each input data boundary port, do a forward traversal over the graph all the way to the output boundary data port.
Once the tracing is done, we mark the edges and nodes of the graph.
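The three traversals can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the graph is given as plain adjacency dictionaries, and the names `comb`, `latch_dq`, `clk_to_q` and `mark_logic` are hypothetical. An edge is marked when it lies on some path from a traversal's start pins to its end pins.

```python
def reachable(adj, starts):
    """Return every pin reachable from `starts` by following `adj`."""
    seen, stack = set(starts), list(starts)
    while stack:
        for nxt in adj.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def reverse(adj):
    """Build the adjacency map with every edge direction flipped."""
    rev = {}
    for src, dsts in adj.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev

def merge(*adjs):
    """Combine several adjacency maps into one."""
    out = {}
    for adj in adjs:
        for src, dsts in adj.items():
            out.setdefault(src, []).extend(dsts)
    return out

def edges_between(adj, sources, sinks):
    """Edges that lie on some path from a source pin to a sink pin."""
    fwd = reachable(adj, sources)
    bwd = reachable(reverse(adj), sinks)
    return {(u, v) for u, vs in adj.items() for v in vs if u in fwd and v in bwd}

def mark_logic(comb, latch_dq, clk_to_q, data_in, clk_in, data_out, seq_d):
    # Traversal 1: data ports to sequential data pins, continuing through latches.
    t1 = edges_between(merge(comb, latch_dq), data_in, seq_d)
    # Traversal 2: clock ports through clock-to-output arcs to output ports.
    # Simplification: a purely combinational clock-to-output path is also marked.
    t2 = edges_between(merge(comb, clk_to_q), clk_in, data_out)
    # Traversal 3: purely combinational input-to-output paths.
    t3 = edges_between(comb, data_in, data_out)
    return t1 | t2 | t3

# Tiny made-up netlist: one buffer, one flop, one unrelated net.
comb = {"in": ["u1/A"], "u1/A": ["u1/Z"], "u1/Z": ["ff/D", "out2"],
        "clk": ["ff/CK"], "ff/Q": ["out1"], "x": ["dead"]}
marked = mark_logic(comb, latch_dq={}, clk_to_q={"ff/CK": ["ff/Q"]},
                    data_in=["in"], clk_in=["clk"],
                    data_out=["out1", "out2"], seq_d=["ff/D"])
```

In this example the edge from `x` to `dead` is left unmarked because it lies on no boundary-to-boundary or boundary-to-register path, so it would be dropped from the condensed graph.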
CONCURRENT CHARACTERIZATION
For each of the traversals defined in the timing graph reduction technique, we repeat the traversal for characterization.
For marked edges, we take a vector of slews at the load points and set up the stage for characterization. A stage is defined as a driver cell and its driven interconnect. At the stage output, i.e. at the output of the interconnect, we now have a new set of output slews and delays that can act as (a)
As shown above, we have two functional modes defined for analysis, "Mode1" and "Mode2". For each of these modes, the input slew vectors (and similarly the output loads driven by the stage) may be completely overlapping, partially overlapping or distinct. Delays are the delays of the edges for the stage. We construct the stage for delay calculation and then pass these input vectors on for delay calculation.
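One way to exploit overlapping slew vectors is to evaluate each distinct input slew only once per stage and share the result across modes. The sketch below assumes this caching strategy for illustration; `stage_delay` is a hypothetical linear stand-in for a real delay calculator, and all numbers are made up.

```python
def stage_delay(input_slew, load):
    """Hypothetical stand-in for a real stage delay calculator."""
    return 0.05 + 0.5 * input_slew + 2.0 * load

def characterize_stage(mode_slews, load, delay_fn=stage_delay):
    """Return per-mode delay vectors and the number of delay evaluations."""
    cache = {}   # input slew -> delay, shared by all modes
    result = {}
    for mode, slews in mode_slews.items():
        row = []
        for slew in slews:
            if slew not in cache:
                cache[slew] = delay_fn(slew, load)
            row.append(cache[slew])
        result[mode] = row
    return result, len(cache)

# Partially overlapping input slew vectors for the two modes (made-up values).
mode_slews = {"Mode1": [0.01, 0.05, 0.10], "Mode2": [0.05, 0.10, 0.20]}
delays, evaluations = characterize_stage(mode_slews, load=0.02)
```

Here six table entries are filled with four delay evaluations, because two slew points are common to both modes. With completely distinct vectors the cache gives no saving, and with completely overlapping vectors the second mode costs no additional evaluations.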
ON DISK MODEL GENERATION
After characterization is completed, the next step is to create a Liberty library model that can be written to disk for subsequent use. To achieve that, we generate a model in the same way as [14]; however, the data stored on the modeled arcs is different. Since [14] only targets generation of a model for a single functional mode, that model offers limited usage. Instead of creating one timing arc between a pair of ports, we create multiple timing arcs between the pair of ports. Each arc represents the characterized delays and slews for one functional mode. The functional modes are distinguished on the edges by tagging the arcs with the mode_definition and mode statements of the Liberty library format [15].
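The tagging scheme can be illustrated with a short Python sketch that prints a simplified Liberty fragment. The function name `liberty_arcs`, the cell, pin and mode-group names, and the scalar delay values are all hypothetical; a real model would carry lookup tables for rise and fall delays and transitions rather than a single scalar per arc.

```python
def liberty_arcs(cell, mode_group, related_pin, pin, mode_delays):
    """Emit one timing group per functional mode between a pair of ports."""
    lines = [f"cell ({cell}) {{",
             f"  mode_definition ({mode_group}) {{"]
    for mode in mode_delays:
        lines.append(f"    mode_value ({mode}) {{ }}")
    lines.append("  }")
    lines.append(f"  pin ({pin}) {{")
    lines.append("    direction : output;")
    for mode, delay in mode_delays.items():
        lines += [
            "    timing () {",
            f'      related_pin : "{related_pin}";',
            # The mode attribute ties this arc to one functional mode.
            f'      mode ({mode_group}, "{mode}");',
            f'      cell_rise (scalar) {{ values ("{delay:.3f}"); }}',
            "    }",
        ]
    lines += ["  }", "}"]
    return "\n".join(lines)

# Made-up delays; Mode2 is shown 10% slower purely for illustration.
text = liberty_arcs("block_etm", "func_mode", "in", "out",
                    {"Mode1": 0.100, "Mode2": 0.110})
print(text)
```

The output contains two timing groups between `in` and `out`, one per mode, which is the multiple-arcs-per-port-pair structure described above.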
MODEL USAGE FOR CHIP LEVEL ANALYSIS
Once written, the on-disk model can be stitched in at the top level and bound to the netlist just like any other library cell. By default, static timing analysis tools analyze all the characterized arcs inside the generated model; however, using set_mode from the open source SDC format [17], one can activate only a subset of arcs to analyze the desired functional modes.
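The effect of mode activation on the set of analyzed arcs can be sketched as a simple filter. This illustrates the selection behavior described above and is not how any particular STA tool implements set_mode; the function `active_arcs` and the arc records are hypothetical.

```python
def active_arcs(arcs, active_modes=None):
    """Select the arcs to analyze.

    With no mode activated, every characterized arc is analyzed.
    Otherwise only arcs tagged with an active mode, or untagged arcs, remain.
    """
    if active_modes is None:
        return list(arcs)
    active = set(active_modes)
    return [a for a in arcs if a["mode"] is None or a["mode"] in active]

# Made-up arc records between the same pair of ports.
arcs = [
    {"from": "in", "to": "out", "mode": "Mode1", "delay": 0.100},
    {"from": "in", "to": "out", "mode": "Mode2", "delay": 0.110},
    {"from": "clk", "to": "out", "mode": None, "delay": 0.200},
]
all_arcs = active_arcs(arcs)
mode2_arcs = active_arcs(arcs, ["Mode2"])
```

Treating untagged arcs as always active is an assumption made for this sketch.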
EXPERIMENTAL RESULTS
The proposed algorithm was implemented in C++ and TCL. Experiments were carried out on a 3 GHz machine with 12 processors and 64 GB of RAM; the Linux application, however, was single threaded.
We evaluated our approach on the ISCAS85 benchmark circuits in terms of accuracy, on-disk model size and the time taken to characterize the models. These circuits were modified to create functional modes by applying timing derates so that the computed delays, arrival times and slacks would differ between modes. One functional mode was the original ISCAS definition and the second was made 10% slower by applying a derate. Results are illustrated in the three tables below. In the table, the first column lists the circuits used. The second column shows the normalized path delay over all the paths characterized inside the extracted timing model (ETM) for both functional modes. The third column shows the corresponding normalized delays seen through the original circuit. The fourth column shows the deviation introduced by the characterization. The last column indicates the total time taken for characterization during ETM generation.
As can be seen, the results for the generated ETM are generally on the pessimistic side, with deviations ranging between 10% and 20%. As path lengths increase, this deviation is expected to shrink, yielding better accuracy.
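For concreteness, the derate-based second mode and the deviation metric can be written as below. The exact deviation formula is not stated here, so the relative form used in this sketch is an assumption, with a positive value meaning the ETM is pessimistic. The function names and all delay values are made-up placeholders, not measurements.

```python
def derate(delay, factor=1.10):
    """Scale a delay by a timing derate; 1.10 makes the mode 10% slower."""
    return delay * factor

def deviation_pct(etm_delay, reference_delay):
    """Assumed metric: relative deviation of the ETM from the original circuit.

    A positive result means the ETM reports a longer (pessimistic) delay.
    """
    return 100.0 * (etm_delay - reference_delay) / reference_delay

# Placeholder numbers chosen only to exercise the arithmetic.
mode1_reference = 2.0
mode2_reference = derate(mode1_reference)   # original delay made 10% slower
example_deviation = deviation_pct(2.3, mode1_reference)
```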
Model generation times were measured to understand whether generating the models would take a linear 2X of the time. Although we could not complete the exercise of comparison with individual runs, the runtimes were slightly between considering the fact that slew stabilization happening over the paths for the two functional modes would reduce the delay calculation effort.
