Power of timing optimization of digital circuits by Αντωνιάδης Τσατσάιας, Νικόλαος
 
 
UNIVERSITY OF THESSALY
SCHOOL OF ENGINEERING
DEPARTMENT OF COMPUTER,
TELECOMMUNICATIONS AND NETWORK ENGINEERING

"Speed and Power Consumption Optimization
of Digital Circuits"

"Power of Timing Optimization of Digital Circuits"

Diploma Thesis

Αντωνιάδης Τσατσάιας Νικόλαος

Supervisors:
Σταμούλης Γεώργιος
Ευμορφόπουλος Νέστωρ

Volos, September 2016
Institutional Repository - Library & Information Centre - University of Thessaly
09/12/2017 06:34:51 EET - 137.108.70.7
 
Diploma thesis submitted for the Diploma in Computer, Telecommunications and
Network Engineering of the University of Thessaly, within the Undergraduate
Studies Programme of the Department of Computer, Telecommunications and
Network Engineering of the University of Thessaly.
 
 
 
 
 
 
 
 
 
 
 
 
 
Acknowledgements
With this diploma thesis I would like to warmly thank my supervisors for their
cooperation and the trust they showed in me, as well as my friends and
colleagues in the E5 laboratory, and in particular the PhD student
Γαρυφάλλου Δημήτριος, for their guidance and substantive suggestions. Finally,
a big thank you to my family for the invaluable help and support they gave me
throughout my studies.
 
 
Contents

Chapter 1
1.1 Static Timing Analysis Method
1.2 Purpose and Definitions
1.3 Method Analysis
1.4 Timing Propagation
1.5 Interconnect Modeling
1.6 Circuit Element Modeling
Chapter 2
2.1 Logical Effort
2.1.1 Introduction
2.1.2 Delay in a Logic Gate
2.1.3 Multistage Logic Networks
2.2 Unified Logical Effort
2.2.1 Introduction
2.2.2 Delay Model of Logic Gates with Wires
2.2.3 Delay Minimization using Unified Logical Effort
2.2.4 ULE Optimization in Paths with Branches
2.2.5 Conclusion
Chapter 3
3.1 Input Files
3.2 Input standard parasitic exchange format (.spef)
3.3 Input Liberty (.lib)
3.4 Output files (.v.scf)
Chapter 4
4.1 OpenTimer: Timing Analysis Tool
4.1.1 Introduction
4.1.2 Purpose
4.1.3 Find critical paths with positive slacks
4.1.4 Minimum scale factor file parser
4.1.5 Setting the unit inverter
4.1.6 Unit inverter's values
4.1.7 Logical Effort values extraction
4.2 Conclusion
 
 
 
 
 
 
 
 
 
 
 
  
Chapter 1 
 
1.1 Static Timing Analysis Method 
 
Static timing analysis (STA) is a method of computing the expected timing of a digital circuit 
without requiring a simulation of the full circuit. 
High-performance integrated circuits have traditionally been characterized by the clock frequency at 
which they operate. Gauging the ability of a circuit to operate at the specified speed requires an ability to 
measure, during the design process, its delay at numerous steps. Moreover, delay calculation must be 
incorporated into the inner loop of timing optimizers at various phases of design, such as logic synthesis, 
layout (placement and routing), and in in-place optimizations performed late in the design cycle. While 
such timing measurements can theoretically be performed using a rigorous circuit simulation, such an 
approach is liable to be too slow to be practical. Static timing analysis plays a vital role in facilitating the 
fast and reasonably accurate measurement of circuit timing. The speedup comes from the use of 
simplified timing models and by mostly ignoring logical interactions in circuits. It has become a mainstay 
of design over the last few decades. 
One of the earliest descriptions of a static timing approach was based on the Program Evaluation and 
Review Technique (PERT), in 1966[1]. More modern versions and algorithms appeared in the early 1980s. 
 
1.2 Purpose and Definitions 
 
In a synchronous digital system, data is supposed to move in lockstep, advancing one stage on each tick of 
the clock signal. This is enforced by synchronizing elements such as flip-flops or latches, which copy their 
input to their output when instructed to do so by the clock. Only two kinds of timing errors are possible in 
such a system: 
• A setup time violation, when a signal arrives too late, and misses the time when it should advance; 
• A hold time violation, when an input signal changes too soon after the clock's active transition. 
The time when a signal arrives can vary for many reasons: the input data may vary, the circuit may 
perform different operations, the temperature and voltage may change, and there are manufacturing 
differences in the exact construction of each part. The main goal of static timing analysis is to verify that 
despite these possible variations, all signals will arrive neither too early nor too late, and hence proper 
circuit operation can be assured. 
Since STA is capable of verifying every path, it can detect other problems like glitches, slow paths 
and clock skew. 
 
• The critical path is defined as the path between an input and an output with the maximum delay. 
Once the circuit timing has been computed by one of the techniques below, the critical path can 
easily be found by using a traceback method. 
• The arrival time of a signal is the time elapsed for a signal to arrive at a certain point. The reference, 
or time 0.0, is often taken as the arrival time of a clock signal. To calculate the arrival time, the delay 
of every component in the path must be calculated. Arrival times, and indeed almost all 
times in timing analysis, are normally kept as a pair of values: the earliest possible time at which a 
signal can change, and the latest. 
• Another useful concept is the required time. This is the latest time at which a signal can arrive without 
making the clock cycle longer than desired. The computation of the required time proceeds as 
follows: at each primary output, the required times for rise/fall are set according to the specifications 
provided to the circuit. Next, a backward topological traversal is carried out, processing each gate 
when the required times at all of its fanouts are known. 
• The slack associated with each connection is the difference between the required time and the arrival 
time. A positive slack s at some node implies that the arrival time at that node may be increased by s 
without affecting the overall delay of the circuit. Conversely, negative slack implies that a path is too 
slow, and the path must be sped up (or the reference signal delayed) if the whole circuit is to work at the 
desired speed [2]. 
 
1.3 Method Analysis 
 
Timing analysis computes the time that signals take to propagate through a circuit from its primary inputs (PIs) to 
its primary outputs (POs) through various circuit elements and interconnect. A signal arriving at an input of an 
element becomes available at its output(s) at some later time; each element therefore introduces a delay 
during signal propagation. Furthermore, signal transitions are characterized by their input slew 
and their output slew, where slew is defined as the amount of time required for a signal to transition from high to low 
or from low to high. 
 
 
 
 
 
 
Figure 1 - Slews and delays in a circuit element. 
In the figure, the delay across the circuit element from input A to output Y is designated by $d_{A \to Y}$, the input 
slew at A by $si_A$, and the output slew at Y by $so_Y$. Here, both the delay and the output slew are functions of 
the input slew. 
 
1.4 Timing Propagation 
 
Starting from the primary input(s), we quantify the instant that a signal reaches an input or output of a 
circuit element as the arrival time (at). Similarly, starting from the primary output(s), we quantify the limits 
imposed for each arrival time to ensure proper circuit operation as the required arrival time (rat). Given an 
arrival time and a required arrival time, we define the slack at a circuit node as a measurement of how well 
timing constraints are met. That is, a positive slack means the required time is satisfied, and a negative slack 
means the required time is in violation. 
To account for multiple sources of within-chip variation, such as manufacturing variations, temperature 
fluctuation, voltage drops, and electromigration, timing analysis is typically done using an early/late split, 
where each circuit node has an early (lower) bound and a late (upper) bound on its time. By convention, if the 
early or late mode is not explicitly stated, both modes need to be considered. For example, a generic 
output slew $so$ that is a function of input slew $si$ implies that the early mode $so^{early}$ is a function of the early 
mode $si^{early}$, and the late mode $so^{late}$ is a function of the late mode $si^{late}$. 
Actual arrival time. Starting from the primary inputs, arrival times (at) are computed by adding delays 
across a path, and taking the minimum (in early mode) or maximum (in late mode) of such 
accumulated times at a convergence point. That is, in early mode, we are concerned with computing the 
earliest time instant that a signal transition can reach any given circuit node. For example, let $at_A^{early}$ and 
$at_B^{early}$ be the early arrival times at pins A and B in Figure 1 (right). Then the early mode arrival time at the 
output pin Y will be 

$$at_Y^{early} = \min\left(at_A^{early} + d_{A \to Y}^{early},\; at_B^{early} + d_{B \to Y}^{early}\right)$$

Conversely, in late mode, we are concerned with computing the latest time instant that a signal 
transition can reach any given circuit node. Following the same example in Figure 1 (right), the late mode 
arrival time at Y will be 

$$at_Y^{late} = \max\left(at_A^{late} + d_{A \to Y}^{late},\; at_B^{late} + d_{B \to Y}^{late}\right)$$
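The min/max rule at a convergence point can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code of any particular timer: the arc list `ARCS`, the pin names, the delay values and the function `propagate_arrival` are all hypothetical.

```python
# Hypothetical timing arcs: (from_pin, to_pin, early_delay, late_delay).
# They must be listed so that every arc into a pin precedes the arcs leaving it.
ARCS = [
    ("A", "Y", 2.0, 3.0),
    ("B", "Y", 1.0, 4.0),
    ("Y", "Z", 1.0, 1.5),
]

def propagate_arrival(arcs, pi_at):
    """Forward pass: pi_at maps each primary input to an (early, late) arrival time."""
    at = dict(pi_at)
    for src, dst, d_early, d_late in arcs:
        early = at[src][0] + d_early
        late = at[src][1] + d_late
        if dst in at:
            # Convergence point: keep the earliest early time and the latest late time.
            early = min(early, at[dst][0])
            late = max(late, at[dst][1])
        at[dst] = (early, late)
    return at

at = propagate_arrival(ARCS, {"A": (0.0, 0.5), "B": (0.0, 0.0)})
print(at["Y"], at["Z"])
```

With these assumed numbers, pin Y takes its early time from the B arc and its late time from the B arc as well, while Z simply adds the delay of the single arc that feeds it.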
 
Required arrival time. Starting from the primary outputs, required arrival times (rat) are computed by 
subtracting the delays across a path, and taking the maximum (in early mode) or minimum (in late 
mode) of such accumulated times at a convergence point. That is, in early mode, we are concerned with 
computing the earliest time instant that a signal transition must reach any circuit node. For example, in 
Figure 2 (left), the early mode required arrival time at the input pin Z will be 

$$rat_Z^{early} = \max\left(rat_{T1}^{early} - d_{Z \to T1}^{early},\; rat_{T2}^{early} - d_{Z \to T2}^{early}\right)$$

Conversely, in late mode, we are concerned with computing the latest time instant that a signal transition 
must reach any given circuit node. Following the same example in Figure 2 (left), the late mode required 
arrival time at the input pin Z will be 

$$rat_Z^{late} = \min\left(rat_{T1}^{late} - d_{Z \to T1}^{late},\; rat_{T2}^{late} - d_{Z \to T2}^{late}\right)$$
 
Slacks. For proper circuit operation, the following conditions must hold: 

$$at^{early} \ge rat^{early}$$
$$at^{late} \le rat^{late}$$

To quantify how well timing constraints are met at each circuit node, slacks (slack) can be computed from 
the values of at and rat, so that slacks are positive when the required times are met, and negative 
otherwise: 

$$slack^{early} = at^{early} - rat^{early}$$
$$slack^{late} = rat^{late} - at^{late}$$
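The backward pass and the slack definitions can be sketched in the same style. The net below mirrors the shape of the Z, T1, T2 example, but every number, the arc list and the helper names `propagate_required` and `slacks` are assumptions made for illustration.

```python
# Hypothetical timing arcs: (from_pin, to_pin, early_delay, late_delay),
# listed in topological order and traversed in reverse below.
ARCS = [
    ("Z", "T1", 1.0, 2.0),
    ("Z", "T2", 2.0, 3.0),
]

def propagate_required(arcs, po_rat):
    """Backward pass: po_rat maps each primary output to an (early, late) rat."""
    rat = dict(po_rat)
    for src, dst, d_early, d_late in reversed(arcs):
        early = rat[dst][0] - d_early
        late = rat[dst][1] - d_late
        if src in rat:
            # Convergence point: latest early bound, earliest late bound.
            early = max(early, rat[src][0])
            late = min(late, rat[src][1])
        rat[src] = (early, late)
    return rat

def slacks(at, rat):
    """Return (early slack, late slack) per pin; positive means the constraint is met."""
    return {p: (at[p][0] - rat[p][0], rat[p][1] - at[p][1]) for p in at if p in rat}

rat = propagate_required(ARCS, {"T1": (3.0, 10.0), "T2": (5.0, 12.0)})
slack = slacks({"Z": (3.5, 9.0)}, rat)
print(rat["Z"], slack["Z"])
```

In this assumed case the early constraint at Z is met with 0.5 time units to spare, while the late slack is negative, which flags a path that is too slow.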
Slew propagation. As circuit element delays and interconnect delays are a function of the input slew (si), the 
subsequent output slew (so) must be propagated. In this contest, we will assume worst-slew propagation, 
where we propagate the smallest (largest) slew in early (late) mode. Following the example in Figure 1 (right), 
the early mode and late mode output slews at output pin Y are, respectively: 

$$so_Y^{early} = \min\left(so_{AY}^{early}(si_A^{early}),\; so_{BY}^{early}(si_B^{early})\right)$$
$$so_Y^{late} = \max\left(so_{AY}^{late}(si_A^{late}),\; so_{BY}^{late}(si_B^{late})\right)$$
Transitions. For each timing arc, delay and output slew values propagate only for transitions that exist. 
For example, suppose there are two timing arcs in series, where the first timing arc propagates rise-to-rise (R→R) 
and fall-to-fall (F→F), and the second timing arc propagates only fall-to-rise (F→R). After timing analysis, the only 
valid output transition at the second timing arc is rise (R). The delay through both timing arcs is the 
sum of the delay of the F→F transition in the first arc and the delay of the F→R transition in 
the second arc. Note that the R→R delay of the first arc is not used, and the fall arrival time 
for the second arc is undefined. For this contest, an undefined early (late) arrival time is set to 987654 (-987654), 
and an undefined early (late) required arrival time is set to -987654 (987654). 
 
 
 
 
 
 
Figure 2: Generic interconnect (left), its timing model (center) and RC network (right). 
 
 
1.5 Interconnect Modeling 
 
The basic instance of interconnect (wire) is a net, which is assumed to have an input pin (Port) and one or 
more output pins (Taps), as illustrated in Figure 2 (left). Parasitic RC trees only contain grounded capacitors 
and floating resistors (we will not include the discussion of coupling capacitors or grounded resistors). 
Delay. The computation of port-to-tap delays can be accurately performed through electrical simulation. 
However, for the sake of simplicity (and speed), we will assume the simpler Elmore delay model [3], 
where the delay is approximated by the negative of the first moment of the impulse response. To 
compute the delay of RC tree networks, we summarize the topological method [4]. 
In an RC network, consider any two nodes e and k. Let $C_k$ be the lumped capacitance at node k, and let 
$R_{k \to e}$ be the total resistance of the common portion of the paths from Port to e and from Port to k. For 
example, in Figure 2 (right), the resistance between nodes 1 and T2 ($R_{1 \to T2}$) is $R_A$, as that is the only 
common resistor between the paths Z to 1 and Z to T2. The Elmore delay at node e is 

$$d_e = \sum_{k \in N} R_{k \to e} C_k$$

where N is the set of all nodes in the RC network. For the example net illustrated in Figure 2 (right), the delay at 
node T2 (tap) is (visiting in order nodes 1, T1, 3, 2, T2): 

$$d_{T2} = R_A C_1 + R_A C_3 + R_A C_4 + (R_A + R_B) C_2 + (R_A + R_B + R_E) C_5 = R_A (C_1 + C_3 + C_4) + (R_A + R_B) C_2 + (R_A + R_B + R_E) C_5$$
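The definition can be checked numerically with a direct, unoptimized sketch in Python. The tree below is an assumption: the path Z, 1, 2, T2 through resistors RA, RB, RE follows the worked expression, but the side branch from node 1 through node 3 to T1, the resistor names RC and RD, and all component values are invented for illustration.

```python
# Hypothetical RC tree: node -> (parent, resistance to parent, node capacitance).
TREE = {
    "n1": ("Z", 1.0, 0.5),    # RA, C1
    "n2": ("n1", 2.0, 0.4),   # RB, C2
    "T2": ("n2", 1.5, 0.3),   # RE, C5
    "n3": ("n1", 1.0, 0.2),   # RC (assumed), C4
    "T1": ("n3", 0.5, 0.1),   # RD (assumed), C3
}

def path_to_port(tree, node):
    """Nodes whose upstream resistor lies on the path from the port to `node`."""
    edges = []
    while node in tree:
        edges.append(node)
        node = tree[node][0]
    return edges

def elmore_delay(tree, target):
    """Sum over all nodes k of (common path resistance with target) * C_k."""
    target_edges = set(path_to_port(tree, target))
    delay = 0.0
    for k, (_, _, cap) in tree.items():
        common = sum(tree[e][1] for e in path_to_port(tree, k) if e in target_edges)
        delay += common * cap
    return delay

d_T2 = elmore_delay(TREE, "T2")
print(d_T2)
```

For these assumed values the result agrees with RA(C1 + C3 + C4) + (RA + RB)C2 + (RA + RB + RE)C5 = 0.8 + 1.2 + 1.35. The double loop is quadratic in the number of nodes; it is written for clarity, not speed.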
 
Output slew. The value of the output slew (so) on any given tap node T can be approximated by a two-step 
process. First, compute the output slew of the impulse response on T, which was observed to be well 
approximated by 

$$\hat{so}_T \approx \sqrt{2\beta_T - d_T^2}$$

where $\beta_T$ is the second moment of the impulse response at node T, and $d_T$ is the corresponding Elmore delay 
defined above. Second, compute the slew of the response to the input ramp by the expression 

$$so_T \approx \sqrt{si^2 + \hat{so}_T^2}$$

where si is the input slew. 
 
 
 
 
Figure 3: RC tree 
Modified RC network for output slew calculation 
 
The value of $\beta_T$ can be computed through the efficient path-tracing algorithm for moment 
computation proposed in [5], which is a generalization of the algorithm proposed in [1]. To 
calculate $\beta_T$, first replace all capacitance values $C_k$ in the RC network by $C_k d_k$, where $d_k$ is the 
Elmore delay at node k. Second, follow the same procedure as before to find $\beta_T$: 

$$\beta_T = \sum_{k \in N} R_{k \to T} C_k d_k$$

At node T2 we have: 

$$\beta_{T2} = R_A (C_1 d_1 + C_3 d_3 + C_4 d_4) + (R_A + R_B) C_2 d_2 + (R_A + R_B + R_E) C_5 d_5$$
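A linear-time variant of this computation is sketched below: one bottom-up pass accumulates the downstream weight of every node and one top-down pass adds resistance times downstream weight along each edge. Running it with weights $C_k$ gives the Elmore delays, and running it again with weights $C_k d_k$ gives the second moments. This is a generic two-pass tree traversal and is not claimed to be the exact algorithm of [5]; the tree shape, the values and the name `weighted_moment` are assumptions.

```python
import math

# Hypothetical RC tree: node -> (parent, resistance to parent, node capacitance).
TREE = {
    "n1": ("Z", 1.0, 0.5),
    "n2": ("n1", 2.0, 0.4),
    "T2": ("n2", 1.5, 0.3),
    "n3": ("n1", 1.0, 0.2),
    "T1": ("n3", 0.5, 0.1),
}

def weighted_moment(tree, root, weight):
    """Return m[e] = sum over k of R_{k->e} * weight[k] for every node e."""
    children = {}
    for node, (parent, _, _) in tree.items():
        children.setdefault(parent, []).append(node)
    # Preorder traversal: every node appears after its parent.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))
    # Bottom-up pass: total weight at and below each node.
    load = {node: weight.get(node, 0.0) for node in order}
    for node in reversed(order):
        if node != root:
            load[tree[node][0]] += load[node]
    # Top-down pass: accumulate R * downstream weight along each edge.
    moment = {root: 0.0}
    for node in order[1:]:
        parent, res, _ = tree[node]
        moment[node] = moment[parent] + res * load[node]
    return moment

delay = weighted_moment(TREE, "Z", {n: c for n, (_, _, c) in TREE.items()})
beta = weighted_moment(TREE, "Z", {n: c * delay[n] for n, (_, _, c) in TREE.items()})
impulse_slew = {n: math.sqrt(2.0 * beta[n] - delay[n] ** 2) for n in TREE}
print(delay["T2"], beta["T2"], impulse_slew["T2"])
```

Only the impulse-response slew is evaluated here; combining it with the input slew to obtain the ramp response is a separate step.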
 
1.6 Circuit Element Modeling 
For delay and output slew calculations between two pins, the information is given in the .lib file as 
two-dimensional tables. To find the corresponding timing information, interpolation or extrapolation will be 
necessary. 
If the table contains a single value, i.e., a 1x1 table (Table 1, left), no interpolation is necessary. That is, 
regardless of input x and y, the corresponding value is constant. If the table is one-dimensional, i.e., a 1xn 
table or an mx1 table (Table 1, center), then the value depends only on the non-scalar dimension. For 
example, consider the 1x4 table in Table 1. If $y < y_1$, then the corresponding output z value is the linear 
extrapolation from $z_1$ and $z_2$. If $y_2 \le y \le y_3$, then z is the linear interpolation between $z_2$ and $z_3$. 
If $y_4 < y$, then z is the linear extrapolation from $z_3$ and $z_4$. 

Table 1: Illustration of different tables: scalar, one-dimensional, and two-dimensional. 
 
 
 
 
 
 
 
If the table is two-dimensional, perform linear interpolation on the x values first, then perform linear 
interpolation on the y values. For example, consider the 3x4 table in Table 1. If $x_2 < x < x_3$ and $y_2 < y < y_3$, 
then (i) determine $z_{first}$ by linear interpolation between $z_{22}$ and $z_{32}$, (ii) determine $z_{second}$ by linear interpolation between 
$z_{23}$ and $z_{33}$, and then (iii) determine z by linear interpolation between $z_{first}$ and $z_{second}$. 
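The lookup rules for scalar, one-dimensional and two-dimensional tables can be sketched as follows. The helper names, the `table[i][j]` layout (value at `xs[i]`, `ys[j]`) and the sample data are assumptions for illustration; a real .lib parser would supply the axes and values.

```python
from bisect import bisect_right

def _segment(axis, v):
    """Index i of the segment axis[i]..axis[i+1] used for v (end segments extrapolate)."""
    i = bisect_right(axis, v) - 1
    return max(0, min(i, len(axis) - 2))

def lookup_1d(axis, values, v):
    """Linear interpolation or extrapolation along one axis; a single entry is constant."""
    if len(axis) == 1:
        return values[0]
    i = _segment(axis, v)
    slope = (values[i + 1] - values[i]) / (axis[i + 1] - axis[i])
    return values[i] + slope * (v - axis[i])

def lookup_2d(xs, ys, table, x, y):
    """Interpolate along x first (one value per y column), then along y."""
    column = [lookup_1d(xs, [row[j] for row in table], x) for j in range(len(ys))]
    return lookup_1d(ys, column, y)

# Hypothetical 3x4 table whose entries happen to follow z = x + y / 10.
xs = [1.0, 2.0, 3.0]
ys = [10.0, 20.0, 30.0, 40.0]
table = [[x + y / 10.0 for y in ys] for x in xs]
z = lookup_2d(xs, ys, table, 2.5, 25.0)
print(z)
```

Interpolating along x for every y column does slightly more work than the two interpolations described in the text, but the final step along y only uses the two bracketing columns, so the result is the same and the 1x1, 1xn and mx1 cases fall out of the same code.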
Combinational elements. For a given combinational cell, e.g., an OR gate, let the delay d and output slew so for 
an input/output pin pair (see Figure 4) be calculated by non-linear delay model interpolation/extrapolation. 
These delay and output slew tables are stored in the .lib file, and are referenced by the input slew (x) and driving 
load (y). $C_L$ denotes the equivalent downstream capacitance seen from the output pin of the cell. Several 
sophisticated models have been proposed for computing $C_L$. For simplicity, the application of such models is 
considered to be out of the scope of the present contest, and a simple model is adopted: $C_L$ is assumed to be 
the sum of all the capacitances in the parasitic RC tree, including the cell pin capacitances at the taps of the 
interconnect network. 
 
 
 
 
 
Figure 4: Combinational OR gate (left), its timing model (center) and capacitances (right). 
 
 
Sequential elements. Sequential circuits consist of combinational blocks interleaved by registers, usually 
implemented with flip-flops (FFs). Typically, sequential circuits are composed of several stages, where a register 
captures data from the outputs of a combinational block from a previous stage, and injects it into the inputs 
of the combinational block in the next stage. Register operation is synchronized by clock signals generated by 
one or multiple clock sources. Clock signals that reach distinct flip-flops, e.g., sinks in the clock tree, are 
delayed from the clock source by a clock latency l. 
A (D) flip-flop is a storage element that captures a given logic value at its input data pin D when a given clock 
edge is detected at its clock pin CK, and subsequently presents the captured value and its complement at the 
output pins Q and $\overline{Q}$. The flip-flop also enables asynchronous preset (set) and clear (reset) of the output pins 
through the respective S and R input pins. (The complement, preset and clear signals are stated here for 
completeness. For the purposes of the contest, their behavior will be ignored.) 

Figure 5: Generic D flip-flop and its timing model (left), and two FFs in series and their timing models (right). 

Setup and hold constraints. Proper operation of a flip-flop requires the logic value of the input data pin to be 
stable for a specific period of time before the capturing clock edge. This period of time is designated by the 
setup time $t_{setup}$. Additionally, the logic value of the input data pin must also be stable for a specific period of 
time after the capturing clock edge. This period of time is designated by the hold time $t_{hold}$. The flip-flop 
timing models are depicted in Figure 5 (left). The test times are given in the .lib file as two-dimensional tables, 
and are referenced by the clock-side input slew (x) and the data-side input slew (y). 
Signal propagation. Consider the standard signal transition between two flip-flops as illustrated in Figure 5 
(right). Assuming that the clock edge is generated at the source at time 0, it will reach the injecting 
(launching) flip-flop FF1 at time $l_i$, making the data available at the input of the combinational block $d_{CK \to Q}$ 
time later. If the propagation delay in the combinational block is $d_{comb}$, then the data will be available at the 
input of the capturing flip-flop FF2 at time $l_i + d_{CK \to Q} + d_{comb}$. Let the clock period be a constant T. Then 
the next clock edge will reach FF2 at time $T + l_o$. For correct operation, the data must be available at the 
input pin D of FF2 $t_{setup}$ time before the next clock edge. Therefore, at the data input pin D of FF2, we have the 
following: 
 
$$at_D^{late} = l_i^{late} + d_{CK \to Q} + d_{comb}^{late}$$
$$rat_{setup} = rat_D^{late} = T + l_o^{early} - t_{setup}$$
 
A similar condition can be derived for ensuring that the hold time is respected. The data input pin D of FF2 
must remain stable for at least thold  time after the clock edge reaches the corresponding CK pin. Therefore, at 
the data input pin D of FF2, we have the following: 
 
$$at_D^{early} = l_i^{early} + d_{CK \to Q} + d_{comb}^{early}$$
$$rat_{hold} = rat_D^{early} = l_o^{late} + t_{hold}$$
 
Note that when computing the required arrival times above, the value $l_o$ is specific to the example of 
Figure 5 (right). In the general case, $l_o$ should be replaced with $at_C$. The previous arrival times and required arrival 
times induce setup and hold slacks, which can be computed from the slack equations of Section 1.4. For the clock pins of the 
flip-flop, the required arrival time is derived from the test slack. For early mode, the slack at the clock pin is 
the setup or late test slack, and for late mode, the slack at the clock pin is the hold or early test slack. From 
the corresponding test slack and arrival time, the clock required arrival time can be derived, and 
appropriately propagated. 
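The setup and hold tests for the two-flip-flop example can be sketched numerically. The function name, the (early, late) tuple convention and every value below are assumptions. The hold bound is taken as the late clock arrival at FF2 plus $t_{hold}$, following the stated requirement that the data stays stable for $t_{hold}$ after the capturing edge.

```python
def setup_hold_slacks(l_i, l_o, d_ck_q, d_comb, period, t_setup, t_hold):
    """Return setup and hold slacks at pin D of the capturing flip-flop.

    l_i, l_o, d_ck_q and d_comb are (early, late) pairs; the rest are scalars.
    """
    at_early = l_i[0] + d_ck_q[0] + d_comb[0]
    at_late = l_i[1] + d_ck_q[1] + d_comb[1]
    rat_setup = period + l_o[0] - t_setup  # latest time the data may arrive
    rat_hold = l_o[1] + t_hold             # earliest time the data may change
    return {"setup": rat_setup - at_late, "hold": at_early - rat_hold}

# Hypothetical numbers, all in the same time unit.
result = setup_hold_slacks(
    l_i=(1.0, 1.5), l_o=(1.0, 1.5),
    d_ck_q=(0.5, 0.5), d_comb=(2.0, 6.0),
    period=10.0, t_setup=0.5, t_hold=0.25,
)
print(result)
```

Both slacks are positive for these values, so neither test fails. Shortening the clock period reduces only the setup slack, which reflects the fact that hold violations cannot be fixed by slowing the clock.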
 
  
Chapter 2 
2.1 Logical Effort 
2.1.1 Introduction 
Timing modeling and optimization are two of the primary issues in high-complexity circuit design. The 
method of Logical Effort (LE) [6], a term coined by I. Sutherland and B. Sproull in 1991, is a straightforward 
technique for fast evaluation and optimization of delay in logic paths (see Figure 6). The technique has since 
been adopted as a basis for numerous CAD tools, owing to its simplicity. 
 
 
 
 
 
Figure 6 - Logical effort optimization for gates without wires is based on equal stage efforts, 
g1h1=g2h2 etc. 
 
2.1.2 Delay in a Logic Gate 
The LE method is founded on a simple model of delay [4] through a single MOS logic gate. The model 
describes delays caused by the capacitive load that the logic gate drives and by the topology of the logic 
gate. Clearly, as the load increases, the delay increases, but delay also depends on the logic function of the 
gate. Inverters, the simplest logic gates, drive loads best and are often used as amplifiers to drive large 
capacitances. Logic gates that compute other functions require more transistors, some of which are 
connected in series, making them poorer than inverters at driving current. Thus a NAND gate has more delay 
than an inverter with similar transistor sizes that drives the same load. The method of logical effort 
quantifies these effects to simplify delay analysis for individual logic gates and multistage logic networks. 
As a first step, delay is expressed in terms of a basic delay unit τ, which is the delay of an inverter driving an 
identical inverter with no parasitic capacitance. The unitless number obtained by dividing a delay by τ is known as the 
normalized delay. The absolute delay is then simply the product of the normalized delay of the 
gate d and τ: 

$$d_{abs} = d \cdot \tau$$

The delay incurred by a logic gate comprises two components: a fixed part called the parasitic delay p 
and a part that is proportional to the load on the gate's output, called the effort delay or stage effort f. The 
total delay, measured in units of τ, is the sum of the effort and parasitic delays: 

$$d = f + p$$
The effort delay depends on the load and on properties of the logic gate driving the load. We introduce two 
related terms for these effects: the logical effort g captures properties of the logic gate, while the electrical 
effort h characterizes the load. The effort delay of the logic gate is the product of these two factors: 
 
$$f = g \cdot h$$

The logical effort g captures the effect of the logic gate's topology on its ability to produce 
output current. It is independent of the size of the transistors in the circuit. The electrical 
effort h describes how the electrical environment of the logic gate affects performance and 
how the size of the transistors in the gate determines its load-driving capability. The electrical 
effort is defined by 

$$h = \frac{C_{out}}{C_{in}}$$

where $C_{out}$ is the capacitance that loads the output of the logic gate and $C_{in}$ is the capacitance presented 
by the input terminal of the logic gate. Electrical effort is also called fanout by many CMOS designers. 
Combining the last two equations, we obtain the basic equation that models the delay through a single 
logic gate, in units of τ: 

$$d = g \cdot h + p$$
 
This equation shows that logical effort g and electrical effort h both contribute to delay in the same way. 
This formulation separates τ, g, h, and p, the four contributions to delay. The process parameter τ 
represents the speed of the basic transistors. The parasitic delay p expresses the intrinsic delay of the gate 
due to its own internal capacitance, which is largely independent of the size of the transistors in the logic 
gate. The electrical effort h combines the effects of external load, which establishes $C_{out}$, with the sizes 
of the transistors in the logic gate, which establish $C_{in}$. The logical effort g expresses the effects of circuit 
topology on the delay, free of considerations of loading or transistor size. Logical effort is useful because it 
depends only on circuit topology. 
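As a small worked illustration of the single-gate delay model, the sketch below evaluates d = g h + p with exact fractions. The helper name `gate_delay` and the capacitance values are hypothetical. The figures g = 1, p = 1 for an inverter and g = 4/3, p = 2 for a 2-input NAND are commonly quoted textbook values for γ = 2 and are used here as assumptions, not as values taken from this work.

```python
from fractions import Fraction

def gate_delay(g, c_out, c_in, p):
    """Normalized delay d = g * h + p, with electrical effort h = c_out / c_in."""
    h = Fraction(c_out) / Fraction(c_in)
    return g * h + p

# Assumed textbook values for gamma = 2 (not taken from this text).
INVERTER = {"g": Fraction(1), "p": 1}
NAND2 = {"g": Fraction(4, 3), "p": 2}

# Each gate drives a load four times its own input capacitance (h = 4).
d_inv = gate_delay(INVERTER["g"], 4, 1, INVERTER["p"])
d_nand = gate_delay(NAND2["g"], 4, 1, NAND2["p"])
print(d_inv, d_nand)
```

Multiplying a normalized delay by τ gives the absolute delay, so the same gate scales directly with the process parameter while g, h and p stay unchanged.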
 
Table 2 - Logical effort for inputs of static CMOS gates, assuming γ=2. γ is the ratio of an inverter's pull-up 
transistor width to pull-down transistor width. 
 
Logical effort values for a few CMOS logic gates are shown in Table 2. Logical effort is defined so that an 
inverter has a logical effort of 1. An inverter driving an exact copy of itself experiences an electrical effort 
of 1. Therefore, an inverter driving an exact copy of itself will have an effort delay of 1, according to the 
relation f = g × h. 
The logical effort of a logic gate tells how much worse it is at producing output current than an inverter, 
given that each of its inputs may present only the same input capacitance as the inverter. 
Reduced output current means slower operation, and thus the logical effort number for a logic gate tells 
how much more slowly it will drive a load than would an inverter. Equivalently, logical effort is how much 
more input capacitance a gate must present in order to deliver the same output current as an inverter. 

Note: In a typical 600-nm process τ is about 50 ps. For a 250-nm process, τ is about 20 ps. In modern 45-nm processes τ is 
approximately 4 to 5 ps. 
 
It is interesting but not surprising to note from Table 2 that more complex logic functions have larger 
logical effort. Moreover, the logical effort of most logic gates grows with the number of inputs to the gate. 
Larger or more complex logic gates will thus exhibit greater delay. These properties make it worthwhile to 
contrast different choices of logical structure. 
 
2.1.3 Multistage Logic Networks 
 
The method of logical effort reveals the best number of stages in a multistage network and how to obtain 
the least overall delay by balancing the delay among the stages. The notions of logical and electrical effort 
generalize easily from individual gates to multistage paths. 
 
The logical effort along a path compounds by multiplying the logical efforts of all the logic gates along the 
path. We use the uppercase symbol G to denote the path logical effort, so that it is distinguished from g, 
the logical effort of a single gate in the path. The subscript 𝑖 indexes the logic stages along the path. 

$$G = \prod_i g_i$$

The electrical effort along a path through a network is simply the ratio of the capacitance that loads the 
last logic gate in the path to the input capacitance of the first gate in the path. We use an uppercase 
symbol H to indicate the electrical effort along a path.

$$H = \frac{C_{out}}{C_{in}}$$
 
In this case, 𝐶𝑖𝑛 and 𝐶𝑜𝑢𝑡 refer to the input and output capacitances of the path as a whole, as may be 
inferred from context. We need to introduce a new kind of effort, named branching effort, to account for 
fanout within a network. So far we have treated fanout as a form of electrical effort: when a logic gate 
drives several loads, we sum their capacitances, to obtain an electrical effort. Treating fanout as a form of 
electrical effort is easy when the fanout occurs at the final output of a network. This method is less 
suitable when the fanout occurs within a logic network because we know that the electrical effort for the 
network depends only on the ratio of its output capacitance to its input capacitance. When fanout occurs 
within a logic network, some of the available drive current is directed along the path we are analyzing, and 
some is directed off that path. We define the branching effort b at the output of a logic gate to be 

$$b = \frac{C_{on\text{-}path} + C_{off\text{-}path}}{C_{on\text{-}path}} = \frac{C_{total}}{C_{useful}}$$
 
where 𝐶𝑜𝑛−𝑝𝑎𝑡ℎ is the load capacitance along the path we are analyzing and 𝐶𝑜𝑓𝑓−𝑝𝑎𝑡ℎ is the 
capacitance of connections that lead off the path. Note that if the path does not branch, the branching 
effort is one. The branching effort along an entire path B is the product of the branching effort at each of 
the stages along the path. 
 
$$B = \prod_i b_i$$

Armed with definitions of logical, electrical, and branching effort along a path, we can define the path 
effort 𝐹. Again, we use an uppercase symbol to distinguish the path effort from the stage effort 𝑓 
associated with a single logic stage. The equation that defines path effort is reminiscent of the third 
equation, which defines the effort for a single logic gate: 
𝐹 = 𝐺 × 𝐵 × 𝐻 
Note that the path branching and electrical efforts are related to the electrical effort of each stage: 

$$B \times H = \frac{C_{out}}{C_{in}} \prod_i b_i = \prod_i h_i$$

Although it is not a direct measure of delay along the path, the path effort holds the key to minimizing the 
delay. Observe that the path effort depends only on the circuit topology and loading and not upon the 
sizes of the transistors used in logic gates embedded within the network. Moreover, the effort is 
unchanged if inverters are added to or removed from the path, because the logical effort of an inverter is 
one. The path effort is related to the minimum achievable delay along the path, and permits us to 
calculate that delay easily. Only a little more work yields the best number of stages and the proper 
transistor sizes to realize the minimum delay.  
The path delay 𝐷 is the sum of the delays of each of the stages of logic in the path. As in the expression for 
delay in a single stage, we shall distinguish the path effort delay 𝐷𝐹 and the path parasitic delay 𝑃:
 
𝐷 =  ∑ 𝑑𝑖 = 𝐷𝐹 +  𝑃 
 
 
 
The path effort delay is simply: 
𝐷𝐹 = ∑ 𝑔𝑖 × ℎ𝑖 
and the path parasitic delay is: 
𝑃 = ∑ 𝑝𝑖  
Optimizing the design of an N-stage logic network proceeds from a very simple result: The path delay is 
least when each stage in the path bears the same stage effort. This minimum delay is achieved when the 
stage effort is: 

$$\hat{f} = g_i \times h_i = F^{1/N}$$

We use a hat over a symbol to indicate an expression that achieves minimum delay.  
Combining these equations, we obtain the principal result of the method of logical effort, which is an 
expression for the minimum delay achievable along a path: 
 
$$\hat{D} = N \times F^{1/N} + P$$

To equalize the effort borne by each stage on a path, and therefore achieve the minimum delay along the 
path, we must choose appropriate transistor sizes for each stage of logic along the path. Equation 15 
shows that each logic stage should be designed with electrical effort 

$$\hat{h}_i = \frac{F^{1/N}}{g_i}$$
 
From this relationship, we can determine the transistor sizes of gates along a path. Start at the end of the 
path and work backward, applying the capacitance transformation: 
 
$$C_{in_i} = \frac{g_i \times C_{out_i}}{\hat{f}}$$
 
This determines the input capacitance of each gate, which can then be distributed appropriately among 
the transistors connected to the input. 
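
The sizing procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `size_path` and `min_delay` are hypothetical, capacitances are in arbitrary consistent units, and the off-path load at each stage is folded in through the branching effort, so that the total load of stage i is b_i times its on-path load.

```python
from math import prod

def size_path(g, b, c_in, c_out):
    """Size an N-stage path with the logical effort method.

    g, b  : per-stage logical and branching efforts (lists of length N)
    c_in  : input capacitance of the first gate
    c_out : load capacitance on the last gate
    Returns (optimum stage effort, list of per-stage input capacitances).
    """
    n = len(g)
    G = prod(g)                 # path logical effort
    B = prod(b)                 # path branching effort
    H = c_out / c_in            # path electrical effort
    F = G * B * H               # path effort
    f_hat = F ** (1.0 / n)      # equal effort borne by every stage

    # Work backward from the load: C_in_i = g_i * C_out_i / f_hat,
    # where C_out_i = b_i * (on-path load of stage i).
    caps = [0.0] * n
    load = c_out
    for i in reversed(range(n)):
        caps[i] = g[i] * b[i] * load / f_hat
        load = caps[i]
    return f_hat, caps

def min_delay(g, b, p, c_in, c_out):
    """Minimum path delay D_hat = N * F^(1/N) + P, in units of tau."""
    f_hat, _ = size_path(g, b, c_in, c_out)
    return len(g) * f_hat + sum(p)

# Three 2-input NAND gates (g = 4/3 for gamma = 2) driving a load equal to
# the path input capacitance; p = 2 per gate is an assumed parasitic delay.
f_hat, caps = size_path([4 / 3] * 3, [1, 1, 1], c_in=1.0, c_out=1.0)
```

As a consistency check, the input capacitance computed for the first stage should come out equal to the given `c_in`; a mismatch indicates an error in the efforts supplied.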
2.2 Unified Logical Effort 
 
2.2.1 Introduction 
 
The LE method benefits from an uncomplicated and intuitive delay model and closed-form optimization 
conditions. The optimization rule of logical effort, however, only addresses logic gates and does not 
consider on-chip wires. As VLSI circuits continue to scale, the contribution of wires to the delay increases 
and cannot be neglected. This characteristic occurs not only with respect to long wires connecting 
separate modules but also to the interconnect within logic modules where the delays introduced by the 
wires connecting closely coupled gates approach and can exceed the gate delays. The useful LE rule that 
the path delay is minimum when the effort of each stage is equal breaks down, because interconnect has 
fixed capacitances which do not correlate with the characteristics of the gates (see Figure 7). This behavior 
is described by the authors of the LE method as “one of the most dissatisfying limitations of logical effort”. 
 
Figure 7 – In the case of gates with wires, the rule of equal effort breaks down because of fixed 
wire parameters.  
 
 
2.2.2 Delay Model of Logic Gates with Wires 
 
The logical effort model is modified to include the interconnect delay [7]. This change is achieved by 
extending the gate logical effort delay by the wire delay, establishing a Unified Logical Effort (ULE) model. 
Thanks to the Elmore delay model, the delay of a circuit comprising logic gates and wires (see Figure 8) can
be easily calculated.
 
 
Figure 8 - Cascaded logic gates with resistive-capacitive interconnect. 
The total combined delay expression is: 
 
 
𝐷𝑖 = 𝑅𝑖 × (𝐶𝑝𝑖 + 𝐶𝑤𝑖 + 𝐶𝑖+1) + 𝑅𝑤𝑖 × (0.5 × 𝐶𝑤𝑖 + 𝐶𝑖+1)  
 
where 𝑅𝑖 is the effective output resistance of the gate 𝑖 , 𝐶𝑝𝑖 is the parasitic output capacitance of gate 𝑖 , 
𝐶𝑤𝑖  and 𝑅𝑤𝑖  are, respectively, the wire capacitance and resistance of segment 𝑖 , and 𝐶𝑖+1 is the input 
capacitance of gate 𝑖+1 .  
This expression can be rewritten in terms of the delay of a minimum-sized inverter, 𝜏 = 𝑅0 × 𝐶0, where
𝑅0 and 𝐶0 are the output resistance and input capacitance of a minimum-sized inverter:
 
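$$D_i = \tau \times d_i = \tau \times \left[ \frac{R_i}{R_0} \times \frac{C_{w_i} + C_{i+1} + C_{p_i}}{C_0} + \frac{R_{w_i}}{R_0 \times C_0} \times \left( 0.5 \times C_{w_i} + C_{i+1} \right) \right]$$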
 
The delay 𝑑𝑖 normalized with respect to a minimum sized inverter delay 𝜏 is defined by: 
 
$$d_i = g_i \times \left( h_i + \frac{C_{w_i}}{C_i} \right) + \frac{R_{w_i} \times \left( 0.5 \times C_{w_i} + C_{i+1} \right)}{\tau} + p_i$$

where
 
𝑔𝑖=(𝑅𝑖×𝐶𝑖)/(𝑅0× 𝐶0)  is the logical effort ,  
ℎ𝑖=𝐶𝑖+1/𝐶𝑖    is the electrical effort,  
𝑝𝑖=(𝑅𝑖×𝐶𝑝𝑖)/(𝑅0× 𝐶0)  is the parasitic delay. 
 
The capacitive interconnect effort ℎ𝑤 and the resistive interconnect effort 𝑝𝑤 are, respectively:   
 
$$h_{w_i} = \frac{C_{w_i}}{C_i}$$

$$p_{w_i} = \frac{R_{w_i} \times \left( 0.5 \times C_{w_i} + C_{i+1} \right)}{\tau}$$
 
 
 
The wire influences the electrical effort of the logic gate with ℎ𝑤 and contributes more delay to the total 
delay with 𝑝𝑤. The final expression of the ULE delay of a single logic gate considering the interconnect is: 
 
𝑑=𝑔×(ℎ+ℎ𝑤)+(𝑝+𝑝𝑤) 
 
For an N stage logic path with interconnect the ULE delay is the sum of each delay of the single stage: 

$$d = \sum_{i=1}^{N} \left[ g_i \times (h_i + h_{w_i}) + (p_i + p_{w_i}) \right]$$
 
 
Note that in the case of short wires, the resistance 𝑅𝑤 of the wire may be neglected, eliminating 𝑝𝑤 and 
leaving only the capacitive interconnect effort ℎ𝑤 in the expression. When the wire impedance along the 
logic path is negligible, the extended delay expression reduces to the standard LE delay equation. 
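
As a minimal sketch of the ULE delay expressions above (not the implementation used in this thesis), the normalized stage delay can be computed directly from the gate and wire parameters. The function and argument names are hypothetical; `tau` must be expressed in the same units as the product of `r_w` and the capacitances.

```python
def ule_stage_delay(g, p, c_i, c_next, c_w, r_w, tau):
    """Normalized ULE delay d = g*(h + h_w) + (p + p_w) of one stage.

    g, p     : logical effort and parasitic delay of the gate
    c_i      : input capacitance of the gate
    c_next   : input capacitance of the next gate
    c_w, r_w : capacitance and resistance of the wire segment after the gate
    tau      : delay of a minimum-sized inverter (R0 * C0)
    """
    h = c_next / c_i                        # electrical effort
    h_w = c_w / c_i                         # capacitive interconnect effort
    p_w = r_w * (0.5 * c_w + c_next) / tau  # resistive interconnect effort
    return g * (h + h_w) + (p + p_w)

def ule_path_delay(stages, tau):
    """Sum of the stage delays; each stage is a dict of the arguments above."""
    return sum(ule_stage_delay(tau=tau, **s) for s in stages)
```

With `c_w = 0` and `r_w = 0` the function returns `g * h + p`, which is the standard logical effort delay, in line with the remark about negligible wire impedance.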
 
2.2.3 Delay Minimization using Unified Logical Effort 
 
As a first step in the path delay optimization process, consider a two-stage portion of a logic path with
wires (as shown in Figure 8). The condition for optimal gate sizing is determined by equating the
derivative of the delay with respect to the gate size to zero. As proven, the resulting optimum condition
is:
 
 
(𝑅𝑖 + 𝑅𝑤𝑖) × 𝐶𝑖+1 = 𝑅𝑖+1 × (𝐶𝑖+2 + 𝐶𝑤𝑖+1) 
 
 
This means that the optimum size of gate 𝑖+1 is achieved when the delay component (𝑅𝑖 + 𝑅𝑤𝑖) × 𝐶𝑖+1
due to the gate capacitance is equal to the delay component 𝑅𝑖+1 × (𝐶𝑖+2 + 𝐶𝑤𝑖+1) due to the effective
resistance of the gate. A schematic model describing the related delay components is shown in Figure 9.
 
 
After solving the differential equations that occur in the optimization problem, we get the expression for
the optimum input capacitance of each gate based on the ULE model: 
 
 
 
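$$C_{i_{opt}} = \sqrt{ \frac{g_i}{g_{i-1} + \frac{R_{w_{i-1}} \times C_{i-1}}{R_0 \times C_0}} \times C_{i-1} \times \left( C_{i+1} + C_{w_i} \right) }$$

$$= \sqrt{C_{i-1} \times C_{i+1}} \times \sqrt{1 + \frac{C_{w_i}}{C_{i+1}}} \times \sqrt{ \frac{g_i}{g_{i-1} + \frac{R_{w_{i-1}} \times C_{i-1}}{R_0 \times C_0}} }$$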
 
 
The first part of the resulting expression is similar to the condition described by the LE model for a path of 
identical gates. The second component expresses the influence of the interconnect capacitance. The last 
component is related to the resistance of the wire and the difference among the individual logical efforts 
(types of logic gates) along the path. This expression illustrates the quadratic relationship between the 
sizes of the neighboring gates. The gate size based on ULE can be determined by solving a set of N 
polynomial expressions for the N gates along the path. 
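
The step from the optimum condition to the closed-form capacitance can be made explicit. The sketch below shifts the index of the optimum condition by one stage and assumes only the definition of logical effort given earlier, from which $R_i = g_i R_0 C_0 / C_i$:

```latex
\begin{align*}
(R_{i-1} + R_{w_{i-1}}) \, C_i
  &= R_i \, (C_{i+1} + C_{w_i}) \\
\left( \frac{g_{i-1} R_0 C_0}{C_{i-1}} + R_{w_{i-1}} \right) C_i
  &= \frac{g_i R_0 C_0}{C_i} \, (C_{i+1} + C_{w_i}) \\
C_i^2
  &= \frac{g_i \, C_{i-1} \, (C_{i+1} + C_{w_i})}
          {g_{i-1} + \dfrac{R_{w_{i-1}} C_{i-1}}{R_0 C_0}}
\end{align*}
```

Taking the square root of the last line gives the expression for the optimum input capacitance. Because $C_i^2$ depends on both $C_{i-1}$ and $C_{i+1}$, the sizes of neighboring gates are coupled, which is why the N expressions have to be solved together.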
 
 
 
 
Figure 9 : Delay components in characterizing ULE for long wires. 
 
Later in this thesis we will show how this expression can be further extended in order to include fixed side 
branches and multiple fan-outs. In order to simplify the solution, a relaxation method has been used. The 
technique is based on an iterative calculation along the path while applying the optimum conditions. Each 
capacitance along the path is iteratively replaced by the capacitance determined from applying the 
optimum expression of the capacitance to two neighboring logic gates. 
 
 
 
 
 
2.2.4 ULE Optimization in Paths with Branches 
 
As we mentioned earlier, the expression of the optimum input capacitance of each gate based on the 
ULE model can be further extended to address the general design case where the logic path may 
include branches or gates with multiple fanout. For instance, consider the circuit shown in Figure 10.
The circuit shows the general structure containing a side branch with RC interconnect and/or a fanout 
load with arbitrary capacitance where 𝑅𝑏 and 𝐶𝑏 are the resistance and capacitance of branch wires, 
respectively, and 𝐶𝑓 is the fanout load capacitance.  
 
The ULE expression of the total delay of stages 𝑖 and 𝑖 + 1 containing branches and fanout can be 
written as: 
 
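$$\begin{aligned}
d ={} & g_i \times \left[ h_i + h_{w_i} + \frac{C_{b1_i} + C_{f1_i}}{C_i} + \frac{C_{b2_i} + C_{f2_i}}{C_i} \right]
      + \frac{R_{w_i}}{\tau} \times \left[ 0.5 \times C_{w_i} + h_i \times C_i + C_{b2_i} + C_{f2_i} \right] \\
    & + g_{i+1} \times \left[ \frac{C_{w_{i+1}} + C_{i+2} + C_{b1_{i+1}} + C_{f1_{i+1}} + C_{b2_{i+1}} + C_{f2_{i+1}}}{h_i \times C_i} \right]
      + \frac{R_{w_{i+1}}}{\tau} \times \left[ 0.5 \times C_{w_{i+1}} + C_{i+2} + C_{b2_{i+1}} + C_{f2_{i+1}} \right]
\end{aligned}$$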
 
   
where 𝜏 = 𝑅0 × 𝐶0 is the minimum inverter delay. Following the same procedure as in the case with no 
branches and fan-outs, we equate the derivative of the delay with respect to the gate size to zero, and the 
optimum expression for the input capacitance of each gate can be written as: 
 
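$$C_i = \sqrt{ \frac{g_i \times C_{i-1} \times \left( C_{w_i} + C_{i+1} + C_{b1_i} + C_{f1_i} + C_{b2_i} + C_{f2_i} \right)}{g_{i-1} + \frac{R_{w_{i-1}} \times C_{i-1}}{\tau}} }$$

$$= \sqrt{C_{i-1} \times C_{i+1}} \times \sqrt{1 + \frac{C_{w_i}}{C_{i+1}} + \frac{C_{b1_i} + C_{f1_i} + C_{b2_i} + C_{f2_i}}{C_{i+1}}} \times \sqrt{ \frac{g_i}{g_{i-1} + \frac{R_{w_{i-1}} \times C_{i-1}}{\tau}} }$$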
 
 
This ULE optimum expression can be generalized for any combination of side branch wires and fanout 
gates by determining the total effective capacitance of the fanout branches for each stage of the path: 
 
$$C_{BF} = \sum_{j=1}^{n} C_{b_j} + \sum_{k=1}^{m} C_{f_k}$$
 
 
where 𝑛 and 𝑚 are the number of branch wires and fanout gates in a path, respectively. Taking into 
consideration the last equation, the general ULE optimum expression for the input capacitance is 
determined:
 
 
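$$C_i = \sqrt{C_{i-1} \times C_{i+1}} \times \sqrt{1 + \frac{C_{w_i}}{C_{i+1}} + \frac{C_{BF_i}}{C_{i+1}}} \times \sqrt{ \frac{g_i}{g_{i-1} + \frac{R_{w_{i-1}} \times C_{i-1}}{\tau}} }$$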
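
The general optimum expression translates directly into a small helper. This is a sketch only; the function name `ule_optimum_cap` and its argument names are hypothetical, and the branch and fanout capacitances are passed as plain sequences that are summed into the effective capacitance of the fanout branches.

```python
from math import sqrt

def ule_optimum_cap(g_prev, g_i, c_prev, c_next, c_w, r_w_prev, tau,
                    branch_caps=(), fanout_caps=()):
    """Optimum input capacitance C_i of gate i under the ULE model.

    g_prev, g_i : logical efforts of gates i-1 and i
    c_prev      : input capacitance of gate i-1
    c_next      : input capacitance of gate i+1
    c_w         : capacitance of the wire driven by gate i
    r_w_prev    : resistance of the wire driven by gate i-1
    tau         : delay of a minimum-sized inverter (R0 * C0)
    """
    # Total effective capacitance of side branches and fanout loads.
    c_bf = sum(branch_caps) + sum(fanout_caps)
    denom = g_prev + r_w_prev * c_prev / tau
    return sqrt(g_i * c_prev * (c_next + c_w + c_bf) / denom)
```

For identical gates with no wires and no branches the result reduces to the geometric mean of the neighboring capacitances, as in the standard logical effort method.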
 
 
 
 
Figure 10 : A logic path segment including RC interconnect and two branches. 
 
 
 
In the case of a more complex parasitic tree (see Figure 11), the resistance of a wire, between two 
adjacent cells, is defined as the sum of all the resistances in the path between the adjacent cells, 
 
𝑅𝑤𝑖 = ∑ 𝑅𝑖→𝑖+1 
 
 
 
 
Figure 11 - 𝑅𝑤𝑖 = 𝑅1 + 𝑅2 + 𝑅3. 
 
 
 
In order to simplify the solution, a relaxation method is proposed in [8]. The technique is based on an 
iterative calculation along the path while applying the optimum conditions. Each capacitance along the 
path is iteratively replaced by the capacitance determined from applying the optimum expressions to two 
neighboring logic gates. The technique consists of the following steps: 
 
 
 
a) (Initialization) Set the gate capacitances along the path to arbitrary values (only the first and last values 
are given).  
b) (Iteration) Replace each capacitance by the value determined from applying the optimum expressions 
on two neighboring logic gates  
c) (Stop check) If any of the new values differs by more than a given precision from the previous value,
repeat step b.
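
Steps a) to c) can be sketched as follows for a path without side branches. This is an illustrative sketch rather than the code of the resizing tool: the name `relax_sizes`, the choice of the first capacitance as the arbitrary initial value, the in-place update from the last stage toward the first, and the relative-change stop criterion are all assumptions.

```python
from math import sqrt

def relax_sizes(g, c_first, c_load, c_w, r_w, tau, tol=1e-6, max_iter=1000):
    """Iteratively size the gates of an N-stage path with the ULE optimum.

    g[i], c_w[i], r_w[i] : logical effort of gate i and the capacitance and
                           resistance of the wire that follows it
    c_first              : fixed input capacitance of gate 0
    c_load               : fixed load capacitance after the last wire
    Returns the list of N gate input capacitances.
    """
    n = len(g)
    # a) Initialization: arbitrary values, only first and last are given.
    caps = [c_first] * n + [c_load]
    for _ in range(max_iter):
        worst = 0.0
        # b) Iteration: update each free capacitance from its two neighbors.
        for i in range(n - 1, 0, -1):
            denom = g[i - 1] + r_w[i - 1] * caps[i - 1] / tau
            new = sqrt(g[i] * caps[i - 1] * (caps[i + 1] + c_w[i]) / denom)
            worst = max(worst, abs(new - caps[i]) / caps[i])
            caps[i] = new
        # c) Stop check: finish once no value changed by more than tol.
        if worst <= tol:
            break
    return caps[:n]

# Three inverters, no wires, load 64 times the input capacitance.
sizes = relax_sizes([1, 1, 1], 1.0, 64.0, [0, 0, 0], [0, 0, 0], tau=1.0, tol=1e-9)
```

For this wire-free example the iteration approaches the geometric progression 1, 4, 16 expected from the standard logical effort method.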
 
 
 
The application of the algorithm generally produces the optimal size, converging to 5% accuracy after 
three iterations. The gates in the last few stages of the path are the first to converge, since the accuracy 
increases while propagating along the path from the leaf to the root of the path. Consequently, fewer 
calculations are performed in each successive iteration. 
 
 
2.2.5 Conclusion 
 
Delay minimization in logic paths with wires is an important issue in the high complexity IC design process. 
The interconnect is a dominant factor in performance-driven circuits and must be explicitly considered 
throughout the design process. The characteristics of the wires are not correlated with those of the gates, 
thereby not permitting the use of the standard logical effort model. In fact, gate sizing in the presence of 
interconnect does not correspond to equal effort of all of the stages along a path. The ULE method is 
proposed for delay evaluation and minimization of logic paths with general gates and RC wires. The ULE 
method provides conditions to achieve minimum delay. Optimal gate sizing in logic paths with wires is 
achieved when the delay component due to the gate capacitance is equal to the delay component due to 
the effective resistance of the gate. The ULE method converges to the standard Logical Effort when wire 
resistance and capacitance are negligible. Gate sizing determined by the proposed ULE method makes ULE 
suitable for both manual calculations and integration into existing EDA tools. 
 
The following chapter introduces the input files that the resizing tools need in order to carry out the
static timing analysis and resizing methods.
Chapter 3 
 
3.1 Input Files 
 
The Verilog file specifies the top level hierarchy of the design. For this thesis, we will be using a small set 
of keywords of the Verilog language. Our Verilog parser supports the set of keywords found within the
simple.v file (reproduced below for clarity). It also supports comments that start with ‘//’. The expected 
syntax is: 
 
 
module <circuit name> ( 
<input 1>, 
..., 
<input n>, 
<output 1>, 
... 
<output m> ); 
input <input 1>; 
... 
input <input n>; 
output <output 1>; 
... 
output <output m>; 
 
// begin wire definitions 
wire <wire 1> ; 
//  end  wire definitions 
// begin cell definitions 
<cell type> <cell instance name> ( .<pin name> (<net name>) );
//  end  cell definitions 
 
endmodule 
 
 
 
The Verilog file is expected to start with a module declaration, which defines the interface of the
module named <circuit name>. The input and output pins are explicitly declared; the internal
wires are optionally declared with the keyword wire. For each cell definition, every <cell type> (.<pin
name>) should be a cell type (pin) specified in the library file, and every <cell instance name> and <net
name> should be found in the design specification. Each field is considered a string. The following
example is from c17.v; its corresponding implementation is shown in Figure 12.
 
 
 
01. module c17 (
02.         N1, N2, N3, N6, N7,
03.         N22, N23
04.         );
05.
06.  // Start PIs
07.  input N1, N2, N3, N6, N7;
08.
09.  // Start POs
10.  output N22, N23;
11.
12.  // Start wires
13.  wire N0, N4, N5, N8, N9, N12, N10, N11, N16, N19;
14.
15.  // Start cells
16.  INV_X2 I_5 ( .A(N12), .ZN(N23) );
17.  AND2_X2 NAND2_6 ( .A1(N16), .A2(N19), .ZN(N12) );
18.  INV_X2 I_4 ( .A(N9), .ZN(N22) );
19.  AND2_X2 NAND2_5 ( .A1(N10), .A2(N16), .ZN(N9) );
20.  INV_X2 I_3 ( .A(N8), .ZN(N19) );
21.  AND2_X2 NAND2_4 ( .A1(N11), .A2(N7), .ZN(N8) );
22.  INV_X2 I_2 ( .A(N5), .ZN(N16) );
23.  AND2_X2 NAND2_3 ( .A1(N2), .A2(N11), .ZN(N5) );
24.  INV_X2 I_1 ( .A(N4), .ZN(N11) );
25.  AND2_X2 NAND2_2 ( .A1(N3), .A2(N6), .ZN(N4) );
26.  INV_X2 I_0 ( .A(N0), .ZN(N10) );
27.  AND2_X2 NAND2_1 ( .A1(N1), .A2(N3), .ZN(N0) );
28.
29. endmodule
 
 
 
 
 
Lines 01 and 29 define the start and end of the specified design with the keywords module and
endmodule. Lines 01-04 specify the input and output connection names of the module (note that the
direction is not specified here). Line 07 specifies the primary inputs (PIs) of the module with the keyword
input. These names must match the ones stated with module (lines 01-04). Line 10 specifies the primary
outputs (POs) of the module with the keyword output. These names must also match the ones stated with
module (lines 01-04). Line 13 specifies the connections or nets within the module with the keyword
wire. These connections may include the external PIs and POs as well as the internal connections between
gates (explained further after lines 16-27). Lines 16-27 specify the cells used in the design, as well as how
the cells are connected. For example, on line 16, an INV_X2-type cell instance I_5 is specified; its A pin
is fed by the internal net N12, and its ZN pin feeds the primary output N23. On line 27, the primary input
N1 feeds the A1 pin of the AND2_X2-type cell instance NAND2_1. Line 29 terminates the module definition.
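
To illustrate how a cell definition line of this form can be read, the following Python sketch extracts the cell type, the instance name and the pin-to-net connections with regular expressions. It is not the parser used in this thesis; the name `parse_cell` is hypothetical, the input is assumed to be one raw line of the .v file (without the listing numbers shown above), and only named pin connections on a single line are handled.

```python
import re

# '<cell type> <instance name> ( ...pin list... );'
CELL_RE = re.compile(r"^\s*(\w+)\s+(\w+)\s*\((.*)\)\s*;\s*$")
# '.<pin name>(<net name>)', the net may be empty for unconnected pins
PIN_RE = re.compile(r"\.(\w+)\s*\(\s*(\w*)\s*\)")

KEYWORDS = {"module", "input", "output", "wire"}

def parse_cell(line):
    """Return (cell_type, instance, {pin: net}) or None if not a cell line."""
    line = line.split("//", 1)[0]      # drop a trailing comment
    m = CELL_RE.match(line)
    if m is None:
        return None
    cell_type, instance, body = m.groups()
    if cell_type in KEYWORDS:          # e.g. a one-line module declaration
        return None
    return cell_type, instance, dict(PIN_RE.findall(body))

cell = parse_cell("AND2_X2 NAND2_1 ( .A1(N1), .A2(N3), .ZN(N0) );")
# cell == ("AND2_X2", "NAND2_1", {"A1": "N1", "A2": "N3", "ZN": "N0"})
```

A full parser would also have to check each cell type and pin name against the library file, as required above.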
 
 
 
Figure 12 - Implementation of c17.v.   
 
 
01. module dff_d(clk, q, d);  
02.  input clk, d;  
03.  output q;  
04.  wire clk, d;  
05.  wire q;  
06.  DFF_X1 q_reg(.CK (clk), .D (d), .Q (q), .QN ());  
07. endmodule  
08.  
09. module dff_d_4(clk, q, d);  
10.  input clk, d;  
11.  output q;  
12.  wire clk, d;  
13.  wire q;
14.  DFF_X1 q_reg(.CK (clk), .D (d), .Q (q), .QN ());
15. endmodule  
16. 
17. module dff_d_3(clk, q, d);  
18.  input clk, d;  
19. output q;  
20.  wire clk, d;  
21.  wire q;  
22.  DFF_X1 q_reg(.CK(clk), .D (d), .Q (q), .QN ());  
23. endmodule  
24.  
25. module s27(CK, G0, G1, G17, G2, G3);  
26.  input CK, G0, G1, G2, G3;  
27.  output G17;  
28.  
29.  wire CK, G0, G1, G2, G3;  
30.  wire G17;  
31.  wire G5, G6, G7, G10, G11, G13, n_0, n_1;  
32.  wire n_2;  
33.  
34.  dff_d DFF_0(.d (G10), .clk (CK), .q (G5));  
35.  dff_d_4 DFF_1(CK, G6, G11);  
36.  dff_d_3 DFF_2(CK, G7, G13);  
37.  
38.  INV_X32 p1579A(.A (G11), .ZN (G17));  
39.  NOR2_X1 p5988A(.A1 (G11), .A2 (n_0), .ZN (G10));  
40.  NOR2_X1 p2151D(.A1 (n_2), .A2 (G5), .ZN (G11));  
41.  AOI22_X1 p2104A(.A1 (n_1), .A2 (G3), .B1 (n_0), .B2 (G6), .ZN (n_2));  
42.  NOR2_X1 p6096A(.A1 (n_1), .A2 (G2), .ZN (G13));  
43.  NOR2_X1 p2096A(.A1 (G1), .A2 (G7), .ZN (n_1));  
44.  INV_X1 Fp2096A(.A (G0), .ZN (n_0));  
45. endmodule
Line 34 instantiates the module dff_d with its arguments passed in explicit (named) format, whereas in
line 35 the module dff_d_4 is instantiated in implicit (positional) format.
The keyword assign can also be handled, along with the constants 1’b0 and 1’b1, which can be
used as wires.
assign <wire_name_a> = <wire_name_b>;
Designs containing buses only in the top-level module can also be partially handled (bus
operations are not supported).
 
 
 
3.2 Input Standard Parasitic Exchange Format (.spef) 
 
This file contains the parasitics of a set of nets as a resistive-capacitive (RC) network. If a (e.g.
gate-to-gate) connection does not have parasitics, then that connection has 0 delay and the output slew
is equivalent to the input slew. Our SPEF parser supports the format specified in simple.spef (see
Appendix A) (portions reproduced for clarity). It also supports comments beginning with ‘//’. The
format is:
// begin header
*SPEF <string>
*DESIGN <string>
*DATE <string>
*VENDOR <string>
*PROGRAM <string>
*VERSION <string>
*DESIGN_FLOW <string>
*DIVIDER <string>
*DELIMITER <string>
*BUS_DELIMITER <string>
*T_UNIT <int> <string>
*C_UNIT <int> <string>
*R_UNIT <int> <string>
*L_UNIT <int> <string>
// end header
 
 
// begin nets 
// … 
// end nets 
 
 
The header describes the general set of units for the file. In this thesis, the DELIMITER field will be
set to ‘:’, the C_UNIT field will be set to one picofarad (1 PF), and the R_UNIT field will be set to one
ohm (1 OHM). All other fields in the header will not be used. An example header is shown below.
 
1. *SPEF "IEEE 1481-1998" 
2. *DESIGN "c17" 
3. *DATE "Thu Sep 25 17:47:29 2014" 
4. *VENDOR "Cadence Design Systems, Inc." 
5. *PROGRAM "Encounter" 
6. *VERSION "13.13-s017_1" 
7. *DESIGN_FLOW "PIN_CAP NONE" "NAME_SCOPE LOCAL" 
8. *DIVIDER / 
9. *DELIMITER : 
10. *BUS_DELIMITER [] 
11. *T_UNIT 1 NS 
12. *C_UNIT 1 PF 
13. *R_UNIT 1 OHM 
14. *L_UNIT 1 HENRY 
 
 
Line 01 specifies the version of the SPEF standard. Line 02 specifies the design name. Line 03 specifies the
date at which the file was generated. Line 04 specifies the vendor of the tool that generated the file. Line 05
specifies the tool used to generate the file. Line 06 specifies the version of that tool. Line 07 specifies the
design flow attributes of the file. Line 08 specifies the hierarchy divider character. Line 09
specifies the pin delimiter character. Line 10 specifies the bus delimiter characters. Line 11 specifies
the time units for the design. Line 12 specifies the capacitance units for the design. Line 13
specifies the resistance units for the design. Line 14 specifies the inductance units for the design.
To reduce file size, SPEF optionally allows long names to be mapped to shorter numbers preceded
by a *. This mapping is defined in the name map section. For example:
// MMMC spef file for corner 'typ'

*NAME_MAP
*1 N1
*2 N2
*3 N3
*4 N6
*5 N7
*6 N22
*7 N23
*8 N0
*9 N4
*10 N5
*11 N8
*12 N9
*13 N12
*14 N10
*15 N11
*16 N16
*17 N19
*18 I_5
*19 NAND2_6
*20 I_4
*21 NAND2_5
*22 I_3
*23 NAND2_4
*24 I_2
*25 NAND2_3
*26 I_1
*27 NAND2_2
*28 I_0
*29 NAND2_1
Later in the file, N1 can be referred to by its name or by *1. Name mapping in SPEF is not 
required. Also, mapped and non-mapped names can appear in the same file. Typically, short 
names such as a pin named A will not be mapped as mapping would not reduce file size. One can 
write a script that will map the numbers back into names. This will make SPEF easier to read, but  
greatly increase file size. 
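
The script mentioned above could look like the following Python sketch. It is an illustration only: the function names `parse_name_map` and `resolve` are hypothetical, the input is assumed to be the lines of the name map section, and ‘:’ is assumed to be the pin delimiter, as set in the header used in this thesis.

```python
def parse_name_map(lines):
    """Return a dict such as {'*1': 'N1', ...} from the name map section."""
    name_map = {}
    for raw in lines:
        line = raw.split("//", 1)[0].strip()   # drop comments
        if not line or line == "*NAME_MAP":
            continue
        parts = line.split()
        is_pair = (len(parts) == 2 and parts[0].startswith("*")
                   and parts[0][1:].isdigit())
        if not is_pair:
            break          # first line that is not '*<index> <name>'
        name_map[parts[0]] = parts[1]
    return name_map

def resolve(token, name_map):
    """Expand '*23:A1' to 'NAND2_4:A1'; unmapped names are left untouched."""
    head, sep, tail = token.partition(":")
    return name_map.get(head, head) + sep + tail

name_map = parse_name_map(["*NAME_MAP", "*15 N11", "*23 NAND2_4"])
# resolve("*23:A1", name_map) gives "NAND2_4:A1"
```

Since mapped and non-mapped names may appear in the same file, `resolve` falls back to the token itself when no mapping exists.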
After the name map section, each net’s parasitics will be defined by the following format:  
 
*D_NET <net name> <total net capacitance>
*CONN
<pin type> <pin name> <pin direction>
// more pin definitions
*CAP
<integer label> <pin or node name> <pin or node capacitance>
// more capacitor definitions
*RES
<integer label> <pin or node name> <pin or node name> <pin or node resistance>
// more resistor definitions
*END
 
 
 
Each net’s definition begins with the keyword *D_NET followed by its name and the sum of all
the capacitors of the net. The <net name> will be unique for each net. The <total net 
capacitance> will be a decimal value, and is the sum of all the capacitors defined in the *CAP 
section. The *CONN keyword describes the set of pins attached to the net. The <pin type> field 
will either be of type port (*P), which is a primary input or output pin, or internal (*I), which is 
an internal pin in the design. In this section, only design pins will be referenced – no 
intermediate SPEF-specific node will be listed. The <pin name> field will be either a primary 
input, a primary output, have the syntax <cell name>:<cell pin name>, e.g., NAND2_1:A1, or have 
the syntax <net name>:<int>, e.g., N1:1. The <pin direction> field refers to the pin directional 
type (not the net), and will be either input (I) or output (O).  
The *CAP keyword describes the set of grounded capacitors that are in the net. Namely, each 
capacitor will be connected to a specified node and GND. For each capacitor, the <integer label>  
is a unique integer that identifies the capacitor for this net. The <pin or node name> is a string, 
and can be a primary input, primary output, a design pin with the syntax <cell name>:<cell pin 
name>, or an intermediate SPEF-specific node with the syntax <net name>:<integer>. The <pin  
or node capacitance> will be a decimal value specifying the capacitance 
attached to the node. The actual capacitance will be this value multiplied by 
the C_UNIT value specified in the header. For example, if C_UNIT is 1 PF and 
<pin or node capacitance> is 1.2, the capacitance is 1.2 pF. 
The *RES keyword describes the set of resistors in the net. Each resistor 
connects two pins or nodes (whose format is identical to the *CAP field), 
and similarly has a unique <integer label>. The <pin or node resistance> is a 
decimal value; the actual resistance value is this field multiplied by the 
R_UNIT value specified in the header. For example, if R_UNIT is 1 OHM and 
<pin or node resistance> is 3.4, then the resistance is 3.4 Ω. The *END 
keyword indicates the end of the net parasitics. An example net definition 
is shown below: 
 
01. *D_NET *15 0.000332396 
02. *CONN
03. *I *23:A1 I *C 4 3 *L 0.00166 *D AND2_X2
04. *I *26:ZN O *C 4 3 *L 0 *D INV_X2
05. *I *25:A2 I *C 4 6 *L 0.00173 *D AND2_X2
06. *CAP
07. 1 *15:0 0.000117155
08. 2 *15:1 0.000134821
09. 3 *15:2 1.83593e-05
10. 4 *15:3 3.06835e-05
11. 5 *23:A1 9.17966e-06
12. 6 *26:ZN 9.17966e-06
13. 7 *15:6 1.30172e-05
14. *RES
15. 1 *15:6 *25:A2 4 
16. 2 *15:3 *15:6 1 
17. 3 *15:2 *26:ZN 1.03143 
18. 4 *15:2 *23:A1 1.03143 
19. 5 *15:1 *15:3 1.35714 
20. 6 *15:0 *15:2 4 
21. 7 *15:0 *15:1 9 
22. *END 
 
 
 
Let *R_UNIT and *C_UNIT be the same values as in the header above, i.e., *R_UNIT is 1 OHM 
and *C_UNIT is 1 PF. Line 01 defines the net *15 (N11 before name mapping) with a total 
lumped capacitance of 0.000332396 pF. 
Lines 02-05 define the connectivity of net *15. Line 03 specifies that the internal design pin 
*23:A1 is an input. Line 04 specifies that the internal design pin *26:ZN is an output. Line 05 
specifies that the internal design pin *25:A2 is an input. 
Lines 06-13 define the set of capacitors of net *15. Line 07 specifies capacitor 1 between the 
SPEF-specific intermediate node *15:0 and GND with a value of 0.000117155 pF. Line 08 
specifies capacitor 2 between the intermediate node *15:1 and GND with a value of 
0.000134821 pF. Line 09 specifies capacitor 3 between the intermediate node *15:2 and GND 
with a value of 1.83593e-05 pF. Line 10 specifies capacitor 4 between the intermediate node 
*15:3 and GND with a value of 3.06835e-05 pF. Line 11 specifies capacitor 5 between the pin 
*23:A1 and GND with a value of 9.17966e-06 pF. Line 12 specifies capacitor 6 between the pin 
*26:ZN and GND with a value of 9.17966e-06 pF. Line 13 specifies capacitor 7 between the 
intermediate node *15:6 and GND with a value of 1.30172e-05 pF. 
Lines 14-21 define the set of resistors of net *15. Line 15 specifies resistor 1 between the 
nodes *15:6 and *25:A2 with a value of 4 Ω. Line 16 specifies resistor 2 between the nodes 
*15:3 and *15:6 with a value of 1 Ω. Line 17 specifies resistor 3 between the nodes *15:2 and 
*26:ZN with a value of 1.03143 Ω. Line 18 specifies resistor 4 between the nodes *15:2 and 
*23:A1 with a value of 1.03143 Ω. Line 19 specifies resistor 5 between the nodes *15:1 and 
*15:3 with a value of 1.35714 Ω. Line 20 specifies resistor 6 between the nodes *15:0 and 
*15:2 with a value of 4 Ω. Line 21 specifies resistor 7 between the nodes *15:0 and *15:1 with 
a value of 9 Ω. Line 22 ends the net definition. Figure 13 illustrates the parasitics described 
above for net *15. 
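As a quick consistency check, the seven grounded capacitors of net *15 should add up to the total lumped capacitance given on line 01. The short Python sketch below performs that sum; the dictionary layout and the function name total_capacitance are only illustrative choices, while the values are the ones quoted in the description above.

```python
# Grounded capacitors of net *15 in pF, as quoted for lines 07-13 of the listing.
caps_pf = {
    "*15:0": 0.000117155,
    "*15:1": 0.000134821,
    "*15:2": 1.83593e-05,
    "*15:3": 3.06835e-05,
    "*23:A1": 9.17966e-06,
    "*26:ZN": 9.17966e-06,
    "*15:6": 1.30172e-05,
}


def total_capacitance(caps):
    """Return the sum of all grounded capacitors of a net (same unit as the inputs)."""
    return sum(caps.values())


total_pf = total_capacitance(caps_pf)
print(f"total lumped capacitance: {total_pf:.6e} pF")
```

The sum agrees with the 0.000332396 pF of line 01 up to the rounding of the individual values.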
 
 
 
Figure 13 - Parasitics of net *15 (N11). The R (C) labels refer to resistors (capacitors).  
 
3.3 Input Liberty (.lib) 
 
This file contains the set of all cells (gates) that are available to the design. Every cell instance 
found in the .v file has a corresponding cell type defined in this file. Gate-level delay and 
output slew calculations use the timing information of the appropriate cell type. For this 
thesis, we use the NanGate 45nm Open Cell Library and the Open Source Liberty parser. The 
parser supports the full set of logical (.lib) constructs, including Composite Current Source 
(CCS) modeling and noise, and performs syntax and common semantic checks. 
The relevant portions of the .lib file are explained below. The library consists of (i) a header, (ii) a 
set of lookup-table definitions, and (iii) a set of cell definitions, where a cell is either a 
combinational element (e.g., NAND2) or a sequential element (e.g., flip-flop DFF). While there 
are many keywords available, this thesis uses only the following set. For readability, each 
syntax set is discussed in a separate subsection below. 
HEADER. The header sets the general information about the library, and is defined in the 
NanGate 45nm Open Cell Library with the following format:  
01. /* Documentation Attributes */  
02. date     : "Thu 10 Feb 2011, 18:11:20";  
03. revision    : "revision 1.0";  
04. comment     : "Copyright (c) 2004-2011 Nangate Inc. All Rights Reserved.";  
05.  
06. /* General Attributes */  
07. technology     (cmos);  
08. delay_model    : table_lookup;  
09. in_place_swap_mode   : match_footprint;  
10. library_features    (report_delay_calculation,report_power_calculation);  
11.  
12. /* Units Attributes */  
13. time_unit     : "1ns";  
14. leakage_power_unit   : "1nW";  
15. voltage_unit    : "1V";  
16. current_unit    : "1mA";  
17. pulling_resistance_unit   : "1kohm";  
18. capacitive_load_unit   (1,ff);  
19.  
20. /* Operation Conditions */  
21. nom_process    : 1.00;  
22. nom_temperature    : 25.00; 
23. nom_voltage    : 1.10;  
24.  
25. voltage_map (VDD,1.10);  
26. voltage_map (VSS,0.00);  
27.  
28. define(process_corner, operating_conditions, string);  
29. operating_conditions (typical) {  
30. process_corner    : "TypTyp";  
31. process     : 1.00;  
32. voltage     : 1.10;  
33. temperature    : 25.00;  
34. tree_type     : balanced_tree;  
35. }  
36. default_operating_conditions  : typical; 
 
Line 08 specifies the delay model used. Lines 13-18 specify the units in which the values in the 
.lib file are expressed. Lines 21-23 specify the nominal process, temperature, and voltage at 
which the library is characterized. Lines 29-35 specify a set of operating conditions for the 
“typical” profile. Line 36 sets the default operating conditions of the library. All other lines are 
ignored. 
LOOKUP TABLES. Most cell libraries include table models to specify the delays and timing 
checks for the various timing arcs of a cell. The table models are referred to as NLDM (Non-Linear 
Delay Model) and are used for delay, output slew, or other timing checks. The table models 
capture the delay through the cell for various combinations of input transition time at the cell 
input pin and total output capacitance at the cell output. The lookup table templates are defined 
as follows: 
lu_table_template (<table label>) {  
variable_1 : <variable name> ; 
index_1 (<string of data points for variable_1>); 
variable_2 : <variable name> ; 
index_2 (<string of data points for variable_2>);  
... 
} 
The <table label> and <variable name> fields are strings and may or may not be enclosed in 
double quotes ("). The string of data points is a set of integer or double values giving the 
index values of the table. The variable and index definition lines can appear in any order, e.g., all variable 
definitions can come before all index definitions. Each <table label> can be referenced in the cell 
definitions. An example table template looks like: 
1. lu_table_template (delay_template_3x3) { 
2. variable_1 : input_net_transition; 
3. variable_2 : total_output_net_capacitance;  
4. index_1 ("1000,1001,1002"); 
5. index_2 ("1000,1001,1002"); 
6. } 
Lines 01 and 06 define the table template with label “delay_template_3x3”. Line 02 specifies that 
variable_1 is the input transition time. Line 03 specifies that variable_2 is the output capacitance. 
The table values are specified like a nested loop, with the first index, index_1 (line 04), being the outer 
(or least varying) variable and the second index, index_2 (line 05), being the inner (or most varying) 
variable, and so on. There are three entries for each variable, so the template corresponds to a 3-by-3 
table. In most cases, the entries of the table are also laid out like a table, so that the first index 
(index_1) can be treated as the row index and the second index (index_2) as the column index. 
The index values (for example 1000) are dummy placeholders 
which are overridden by the actual index values in the cell_fall and cell_rise delay tables. An 
alternative is to specify the index values in the template 
definition and to omit them from the cell_rise and cell_fall tables. Such a template would look 
like this: 
1. lu_table_template(delay_template_3x3) { 
2. variable_1 : input_net_transition; 
3. variable_2 : total_output_net_capacitance;  
4. index_1 ("0.1, 0.3, 0.7"); 
5. index_2 ("0.16, 0.35, 1.43"); 
6. } 
Based on the delay tables of an inverter, for example, an input fall transition time of 0.3 ns and an 
output load of 0.16 pF correspond to an inverter rise delay of 0.1018 ns. Since a falling transition 
at the input makes the inverter output rise, the table lookup for the rise delay uses the falling 
transition at the inverter input. This form of representing delays in a table as a function of two 
variables, transition time and capacitance, is called the non-linear delay model (NLDM), since 
non-linear variations of delay with input transition time and load capacitance are expressed in 
such tables. The table models can also be 3-dimensional; an example is a flip-flop with 
complementary outputs, Q and QN. The NLDM models are used not only for the delay but also 
for the transition time at the output of a cell, which is likewise characterized by the input transition 
time and the output load. Thus, there are separate two-dimensional tables for computing the output 
rise and fall transition times of a cell.  
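A two-dimensional NLDM lookup can be sketched in a few lines of Python. The function below is a minimal illustration that uses bilinear interpolation between the four surrounding table entries (and linear extrapolation outside the table); the function names and the table contents are assumptions made for the example. Only the entry 0.1018 at (0.3 ns, 0.16 pF) comes from the text; the other values are illustrative.

```python
from bisect import bisect_right


def _segment(axis, x):
    """Index i such that axis[i] and axis[i + 1] bracket x (end segments extrapolate)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def nldm_lookup(index_1, index_2, values, slew, load):
    """Bilinear interpolation in a table with rows = index_1 and columns = index_2."""
    i = _segment(index_1, slew)
    j = _segment(index_2, load)
    tx = (slew - index_1[i]) / (index_1[i + 1] - index_1[i])
    ty = (load - index_2[j]) / (index_2[j + 1] - index_2[j])
    row_lo = values[i][j] * (1 - ty) + values[i][j + 1] * ty
    row_hi = values[i + 1][j] * (1 - ty) + values[i + 1][j + 1] * ty
    return row_lo * (1 - tx) + row_hi * tx


# Index values of the second template above; table entries are illustrative (ns).
index_1 = [0.1, 0.3, 0.7]      # input_net_transition
index_2 = [0.16, 0.35, 1.43]   # total_output_net_capacitance
cell_rise = [
    [0.0513, 0.1537, 0.5280],
    [0.1018, 0.2327, 0.6476],
    [0.1334, 0.2973, 0.7252],
]

on_grid = nldm_lookup(index_1, index_2, cell_rise, slew=0.3, load=0.16)
between = nldm_lookup(index_1, index_2, cell_rise, slew=0.2, load=0.255)
print(on_grid, between)
```

A query that falls exactly on a grid point returns the stored entry, while a query between grid points blends the four neighbouring entries.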
CELL DEFINITIONS. A cell specifies a gate that can be used as part of a design, e.g., the 
combinational gate NAND2 or the flip-flop DFF. Its relevant syntax in the .lib format is:  
 
cell (<cell type>) { 
  pin (<pin name>) { 
    direction : <direction> ; 
    capacitance : <double> ; 
    max_capacitance : <double> ; 
    min_capacitance : <double> ; 
    timing() { 
      related_pin : <pin name> ; 
      /* combinational or sequential definitions */ 
    } 
    /* other timing() definitions */ 
  } 
  /* other pin definitions */ 
} 
 
In a cell, multiple pins can be defined, e.g., a standard NAND2 has three pins: two inputs and 
one output. For each pin, the direction field indicates the type of pin: (i) input, (ii) output, or (iii) 
internal. The capacitance, max_capacitance, and min_capacitance fields specify the pin 
capacitance and the maximum and minimum expected pin loads, respectively. A timing() 
definition creates a timing arc (directed pin-to-pin) inside a cell. The specific syntax differs 
between combinational and sequential connections (discussed below). 
Combinational timing arcs. Combinational arcs propagate delay and output slew from a source 
pin to a sink pin. They are found in common combinational logic gates, e.g., NAND2, or as the 
clock-trigger segment in flip-flops. The timing() syntax of such a propagating segment is:  
timing() { 
related_pin : <pin name> ; 
timing_sense : <timing sense> ; 
timing_type : <timing type> ; 
cell_<transition> (<table label>) {  
<table instance> /* omitted for space */ 
} 
<transition>_transition (<table label>) { 
<table instance> /* omitted for space */ 
} 
/* other cell transition table definitions */ 
} 
The related_pin is the source of the segment, and the pin (from the enclosing pin definition) is the 
sink of the segment. The timing_sense field specifies the transition mode: (i) positive_unate, where 
the source and sink transitions are the same (e.g., rise-to-rise), (ii) negative_unate, where the source 
and sink transitions are opposite (e.g., rise-to-fall), and (iii) non_unate, where the source 
transition has no relation to the sink transition. The timing_type field specifies whether the arc is 
combinational, in which case the unateness is either positive_unate or negative_unate, or 
<timing type edge>_edge, in which case the unateness is non_unate and <timing type edge> is 
either rising or falling and refers to the source. The cell_<transition> table refers to delay; the 
<transition>_transition table refers to output slew. In both tables, <transition> refers to the 
sink of the arc and is either rise or fall. Note that for positive_unate and negative_unate arcs, 
the direction of the source-to-sink transition is implicitly defined by the unateness and the 
<transition> of the table. For instance, if the arc is negative_unate and there 
is a table with a fall transition, the arc describes a rise-to-fall transition. In the non_unate 
case, both <timing type> and the <transition> of the table must be used, where the former 
describes the source edge and the latter describes the sink edge. For example, if <timing type> 
is rising_edge and there is a table with a fall transition, the arc describes a rise-to-fall 
transition. The <table label> is a string that is either (i) the label of a previously declared 
lookup-table template or (ii) the keyword scalar, indicating that the stored value is a single 
element (i.e., a 1x1 table). A sample gate is shown below: 
 
1. cell(OR2_X2) { 
2.  pin ("o") { 
3.  direction : output ; 
4.  capacitance : 2.00 ; 
5.  timing() { 
6.  related_pin : "a"; 
7.  timing_sense : positive_unate; 
8.  timing_type : combinational; 
9.  cell_fall (scalar) { 
10.  values ("40.00"); 
11.  } 
12.  fall_transition (delay_slew_load_6x1) { 
13.  index_1 ("1.050, 2.000, 5.000, 5.500, 9.000, 20.00");  
14.  index_2 ("1.0000"); 
15.  values ( \ 
16.  "1.050000", \ 
17.  "2.000000", \ 
18.  "5.000000", \ 
19.  "5.500000", \ 
20.  "9.000000", \ 
21.  "20.000000" \ 
22.  ); 
23.  } 
24.  } 
25.     } 
26. } 
 
Lines 01-26 define the cell OR2_X2. Lines 02-25 define the pin o inside cell OR2_X2. Line 03 
specifies that o is an output pin. Line 04 specifies that the pin capacitance (for both 
rise and fall) is 2 fF. Lines 05-24 specify a timing arc between source pin a (line 06) and sink pin o. 
Line 07 specifies that this timing arc is positive_unate, i.e., it propagates the incoming 
transition unchanged to the output (rise-to-rise and fall-to-fall). Lines 09-11 specify that the 
arc contains a fall transition at the output with a fixed (scalar) delay value of 40 ps. Because of the 
cell_fall definition and the positive_unate sense, this arc is implicitly a fall-to-fall transition. 
Lines 12-23 specify the output slew table using the lookup-table template delay_slew_load_6x1, 
with lines 13-22 matching the corresponding table syntax.  
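The rule that fixes the source transition of an arc from its unateness and the sink transition of the table can be written down compactly. The helper below is a small Python sketch of that rule; the function name source_transition and its argument layout are hypothetical and only the Liberty keywords (positive_unate, negative_unate, non_unate, rising_edge, falling_edge) are taken from the description above.

```python
FLIP = {"rise": "fall", "fall": "rise"}


def source_transition(sink_transition, timing_sense, timing_type="combinational"):
    """Return the source (related_pin) transition implied by a timing table.

    sink_transition is the <transition> of a cell_<transition> or
    <transition>_transition table, i.e. "rise" or "fall".
    """
    if sink_transition not in FLIP:
        raise ValueError(f"unknown transition: {sink_transition!r}")
    if timing_sense == "positive_unate":
        return sink_transition            # rise-to-rise or fall-to-fall
    if timing_sense == "negative_unate":
        return FLIP[sink_transition]      # rise-to-fall or fall-to-rise
    if timing_sense == "non_unate":
        # The source edge is given explicitly by the timing type.
        if timing_type == "rising_edge":
            return "rise"
        if timing_type == "falling_edge":
            return "fall"
        raise ValueError("non_unate arcs need a rising_edge or falling_edge timing type")
    raise ValueError(f"unknown timing sense: {timing_sense!r}")


# The cell_fall table of the OR2_X2 arc a -> o is positive_unate: a fall-to-fall arc.
print(source_transition("fall", "positive_unate"))
```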
 
 
3.4 Output Files (.v .scf) 
 
 
The produced files comprise a Verilog file, as described in a previous section, containing the 
new cell names after the resizing has taken place, and a file containing the scale factors of the 
new cells. The output Verilog file is flattened, which means that if the input Verilog files 
contained a hierarchy of modules, the output file contains only the top module, which 
includes all the instantiated cells and nets of the hierarchical modules.  
The .scf file defines the scale of the new cells relative to the cell sizes of the original 
design. Its format is:  
<instance_name_1> <scale_factor_1> 
<instance_name_2>  <scale_factor_2>  
… 
<instance_name_n>  <scale_factor_n> 
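A file in this format can be read with a few lines of code. The following Python sketch is only an illustration of the format, not the parser used in the tool; the function name parse_scf and the instance names in the sample are hypothetical.

```python
def parse_scf(text):
    """Parse '<instance_name> <scale_factor>' lines into a dict of floats."""
    scale = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected '<instance> <scale>', got {line!r}")
        name, factor = fields
        scale[name] = float(factor)
    return scale


# Hypothetical sample content; real instance names come from the Verilog netlist.
sample = """\
u1 1.0
u2  2.5

u3 0.75
"""
scale_factors = parse_scf(sample)
print(scale_factors)
```

Whitespace between the two fields is treated as a separator of arbitrary width, which tolerates the single and double spaces that appear in the format description.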
Chapter 4 
4.1 OpenTimer : Timing Analysis Tool 
 
4.1.1 Introduction 
 
OpenTimer is a high-performance academic timing analysis tool developed by Tsung-Wei 
Huang and Prof. Martin D. F. Wong at the University of Illinois at Urbana-Champaign (UIUC), 
IL, USA. Evolving from its previous generation, "UI-Timer", OpenTimer works on industry 
formats (.v, .spef, .lib, .sdc, .lef, .def) and supports important features such as block-based 
analysis, path-based analysis, common path pessimism removal (CPPR), incremental timing, 
and multi-threading. Its data structures and algorithms allow it to analyze large-scale designs 
efficiently and accurately. To facilitate integration between 
timing and other electronic design automation (EDA) applications such as timing-driven 
placement and routing, OpenTimer provides a user-friendly application programming interface 
(API) for interactive analysis. Most importantly, OpenTimer is open-source [9].  
Experimental results on the industry benchmarks released for the TAU 2015 timing analysis 
contest demonstrate OpenTimer's performance, especially its 
order-of-magnitude speedup over existing timers. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 Figure 14 : Program flowchart of OpenTimer. 
 
 
 
 
 
 
 
 
In the deep-submicron era, timing-driven operations are imperative for the success of 
optimization flows. Optimization transforms change the design and can therefore 
significantly affect timing information. The timer must reflect such changes and 
update timing information incrementally and accurately in order to ensure slack integrity as 
well as reasonable turnaround time and performance. 
However, such a process is highly complex, especially when path-based 
analysis is enabled. A high-quality incremental timer capable of path-based analysis is 
therefore a clear advantage in speeding up timing closure.  
 
 
Figure 15. Performance improvement of incremental timing to full timing 
 
 
The significance of incremental timing is demonstrated in Figure 15. The 
runtime improvement keeps growing as the number of optimization transforms increases. 
One obvious reason is that once the critical paths in a design have been reported, the 
optimization tool optimizes the logic (e.g., by gate sizing or buffer insertion) so as to 
overcome the timing violations. Such a change can, in principle, affect the majority of the circuit, 
whereas in practice, depending on the trace of the critical paths, the timing update may 
involve only a small portion of it. Since an optimization tool can perform millions of logic 
transformations, it is important that the timing profile is kept up to date in an incremental 
fashion. Otherwise, optimization tools cannot support fast turnaround for timing-specific 
improvement, which dramatically degrades productivity.  
 
Three key features of OpenTimer are:  
• Parallel framework. OpenTimer applies a pipeline task scheduler as the central engine. 
Critical tasks such as timing propagation and endpoint slack calculation are scheduled into 
the pipeline so as to overlap their runtimes.  
• Incremental capability. OpenTimer precisely and minimally captures the features that are 
key to incremental timing. With lazy evaluation, it keeps computation to the 
minimum necessary.  
• Path-based analysis. OpenTimer represents paths implicitly using an efficient and compact 
data structure, yielding a significant saving in both search space and search time for CPPR.  
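The lazy-evaluation idea behind the incremental capability can be illustrated with a toy example. The Python sketch below is not OpenTimer's implementation; it is a minimal model, under the assumption of a single arrival time per node and max-propagation over a DAG, in which a delay change only invalidates the cached values in its fanout cone, and nothing is recomputed until a value is queried. The class and method names are hypothetical.

```python
from collections import defaultdict


class LazyTimer:
    """Toy arrival-time propagation on a DAG with lazy, incremental updates."""

    def __init__(self):
        self.fanin = defaultdict(list)    # node -> [[driver, delay], ...]
        self.fanout = defaultdict(list)   # node -> [sink, ...]
        self.arrival = {}                 # cached arrival times
        self.evaluations = 0              # number of node (re)computations

    def add_arc(self, src, dst, delay):
        self.fanin[dst].append([src, delay])
        self.fanout[src].append(dst)
        self._invalidate(dst)

    def set_delay(self, src, dst, delay):
        for arc in self.fanin[dst]:
            if arc[0] == src:
                arc[1] = delay
        self._invalidate(dst)

    def _invalidate(self, node):
        # Drop cached values in the fanout cone only; nothing is recomputed here.
        # A node is cached only if its whole fanin cone is cached, so the walk
        # can stop at the first uncached node.
        stack = [node]
        while stack:
            n = stack.pop()
            if n in self.arrival:
                del self.arrival[n]
                stack.extend(self.fanout[n])

    def arrival_time(self, node):
        # Recompute on demand; primary inputs (no fanin) arrive at time 0.
        if node not in self.arrival:
            self.evaluations += 1
            self.arrival[node] = max(
                (self.arrival_time(s) + d for s, d in self.fanin[node]),
                default=0.0,
            )
        return self.arrival[node]


timer = LazyTimer()
timer.add_arc("a", "b", 1.0)
timer.add_arc("b", "c", 2.0)
timer.add_arc("d", "e", 3.0)
first = (timer.arrival_time("c"), timer.arrival_time("e"))
full_cost = timer.evaluations

timer.set_delay("a", "b", 4.0)   # e.g. a gate on the path a -> b -> c was resized
second = (timer.arrival_time("c"), timer.arrival_time("e"))
incremental_cost = timer.evaluations - full_cost
print(first, second, full_cost, incremental_cost)
```

After the delay change, only the two nodes downstream of the modified arc are re-evaluated; the unrelated path d -> e is served from the cache.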
 
 
Figure 16. Parallel forward timing propagation using pipeline 
 
The effectiveness and efficiency of OpenTimer have been evaluated on a set of industry 
benchmarks released for the TAU 2015 CAD contest. Compared to the top performers in that 
contest, OpenTimer shows a clear advantage in nearly all aspects. The 
source code of OpenTimer has been released to the public domain to promote further 
research [10]. 
 
 
 
4.1.2 Purpose 
The purpose of this thesis is to extend the OpenTimer timing analysis tool so that it reports 
critical paths with positive slack, on which we can then apply the Unified Logical Effort (ULE) 
resizing method. To this end, it is necessary to parse the minimum scale factor file 
(min_scf.scf) for every cell, to set a unit inverter and calculate its values, and then to 
extract the Logical Effort parameters for every cell of our .lib file. 
 
4.1.3 Find critical paths with positive slacks 
 
First we declare the (Path*) object critical_path: 
 
Then we iterate over the endpoint vector in order to collect the nodes of the path. We trace 
backward, checking the unateness of the current node at each step in order to select the 
correct predecessor, until we reach a primary input: 
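A minimal sketch of such a backward trace is given here in Python rather than in OpenTimer's C++. It assumes a hypothetical data layout in which, for every (pin, transition) pair, the timer has recorded the worst fanin pin together with the timing sense of the arc; none of these names correspond to OpenTimer's actual data structures.

```python
FLIP = {"rise": "fall", "fall": "rise"}


def backtrace(worst_fanin, endpoint, transition):
    """Trace a path from an endpoint back to a primary input.

    worst_fanin maps (pin, transition) to (previous_pin, timing_sense);
    a (pin, transition) pair without an entry is treated as a primary input.
    Returns the path ordered from primary input to endpoint.
    """
    pin, tr = endpoint, transition
    path = [(pin, tr)]
    while (pin, tr) in worst_fanin:
        prev_pin, sense = worst_fanin[(pin, tr)]
        if sense == "negative_unate":
            tr = FLIP[tr]          # an inverting arc swaps rise and fall
        elif sense != "positive_unate":
            raise ValueError(f"unsupported timing sense: {sense!r}")
        pin = prev_pin
        path.append((pin, tr))
    path.reverse()
    return path


# Hypothetical example: primary input "in" drives inverter u1, which drives "out".
worst_fanin = {
    ("out", "rise"): ("u1:ZN", "positive_unate"),    # net arc
    ("u1:ZN", "rise"): ("u1:A", "negative_unate"),   # inverter cell arc
    ("u1:A", "fall"): ("in", "positive_unate"),      # net arc
}
critical_path = backtrace(worst_fanin, "out", "rise")
print(critical_path)
```

The unateness check is what selects the correct predecessor: a rising output of the inverter is traced back to a falling input.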
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4.1.4  Minimum scale factor file parser 
 
           
   
 
 
 
 
 
 
 
4.1.5 Setting the unit inverter 
 
In order to proceed, we have to set our unit inverter, which is the inverter “INV_X1”, so 
that its values can be calculated:  
 
 
 
 
 
The unit inverter is set according to the definition in the .conf file: 
 
 
 
 
 
 
 
 
 
 
 
 
 
4.1.6 Unit inverter’s values 
 
The next step is to calculate the inverter’s parasitic delay (rise/fall), its logical effort (rise/fall), 
and the .C0 and .tau values. This function needs the timing lookup tables (rise/fall) in order to 
perform the interpolation or extrapolation. 
 
 
 
 
 
 
 
 
 
 
 
 
In order to calculate the parasitic delay values (rise/fall) and the logical effort values (rise/fall), 
we call the NLDM_to_LDM_conv function, which converts the Non-Linear Delay Model to a 
Linear Delay Model. Here a stands for the parasitic delay and b for the logical effort delay 
(ps). 
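One common way to reduce an NLDM table to a linear model is to fit a straight line, delay = a + b * load, through the delays of one table row (i.e., at a fixed input transition time). The Python sketch below shows such a least-squares fit. It is an assumption about how a conversion of this kind can be done, not a reproduction of NLDM_to_LDM_conv; the function name fit_linear_delay and the sample numbers are hypothetical.

```python
def fit_linear_delay(loads, delays):
    """Least-squares fit of delay = a + b * load; returns (a, b).

    a is the load-independent part (parasitic delay) and b the slope
    (delay per unit of load capacitance).
    """
    n = len(loads)
    if n < 2 or n != len(delays):
        raise ValueError("need at least two matching (load, delay) samples")
    mean_x = sum(loads) / n
    mean_y = sum(delays) / n
    sxx = sum((x - mean_x) ** 2 for x in loads)
    if sxx == 0:
        raise ValueError("loads must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(loads, delays))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Illustrative samples: loads in fF, delays in ps, lying exactly on 5 + 3 * load.
loads = [1.0, 2.0, 4.0, 8.0]
delays = [8.0, 11.0, 17.0, 29.0]
a, b = fit_linear_delay(loads, delays)
print(f"a = {a:.3f} ps, b = {b:.3f} ps/fF")
```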
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4.1.7  Logical Effort values extraction 
 
For this function we iterate over every cell and call LExtraction: 
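For orientation, the textbook logical-effort normalization [6] can be sketched as follows. With the delay written as d = tau * (p + g * C_load / C_in) and a linear fit d = a + b * C_load, the unit inverter (g = 1) gives tau = b_inv * C_in_inv, and for any other cell g = b * C_in / tau and p = a / tau. The Python function below applies these formulas; it follows the textbook definitions and may differ from the normalization used inside LExtraction. All names and numbers are hypothetical.

```python
def logical_effort_params(a_cell, b_cell, cin_cell, b_inv, cin_inv):
    """Return (g, p, tau) of a cell relative to a unit inverter.

    a_cell, b_cell: intercept and slope of the cell's linear delay model
    cin_cell:       input capacitance of the cell pin
    b_inv, cin_inv: slope and input capacitance of the unit inverter
    """
    if b_inv <= 0 or cin_inv <= 0:
        raise ValueError("unit inverter slope and capacitance must be positive")
    tau = b_inv * cin_inv             # delay unit set by the inverter (g = 1)
    g = (b_cell * cin_cell) / tau     # logical effort
    p = a_cell / tau                  # parasitic delay in units of tau
    return g, p, tau


# Illustrative numbers only (ps, fF); they are not taken from any library.
g, p, tau = logical_effort_params(a_cell=6.0, b_cell=4.0, cin_cell=3.0,
                                  b_inv=2.0, cin_inv=1.5)
print(g, p, tau)
```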
           
   
 
 
 
4.2 Conclusion 
 
We have compared the values produced by our extended OpenTimer with those 
produced by CCSOpt, a continuous gate-level resizing tool that produces valid and 
credible values for parasitic delay and logical effort. 
For example, both tools produce the following values for the input pins of the gate 
NOR4_Y20: 
 
G_fall : 21.2632   (logical effort) 
G_rise: 21.2632 
P_fall: 0.997727  (parasitic delay) 
P_rise: 0.997727   
 
The comparison holds for all the cells of our .lib file. We therefore conclude that all the 
tools and parameters necessary to implement the resizing method for 
critical paths with positive slack are in place. 
 
 
 
 
 
  
Bibliography 
 
[1] Kirkpatrick, TI & Clark, NR (1966). "PERT as an aid to logic design". IBM Journal of 
Research and Development. 
[2] McWilliams, T.M. (1980). "Verification of timing constraints on large digital 
systems" (PDF). Design Automation, 1980. 17th Conference on. IEEE. 
[3] C. V. Kashyap, C. J. Alpert, F. Liu and A. Devgan, “Closed-form Expressions for Extending 
Step Delay and Slew Metrics to Ramp Inputs for RC Trees”, IEEE Transactions on 
Computer-aided Design of Integrated Circuits and Systems, 23(4)(2004), pp. 509-516. 
[4]  P. Penfield Jr. and J. Rubinstein, “Signal Delay in RC Tree Networks”, Proc. Design 
Automation Conference, 1981, pp. 613-617. 
[5] C. L. Ratzlaff and L. T. Pillage, “RICE: Rapid Interconnect Circuit Evaluation Using 
AWE”, IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems, 
13(6)(1994), pp. 763-776. 
[6] I. E. Sutherland and R. F. Sproull, "Logical Effort: Designing for Speed on the Back of an 
Envelope," in IEEE Advanced Research in VLSI, 1991.  
[7] S. S. Sapatnekar, B. V. Rao, P. M. Vaidya and S. M. Kang, "An Exact Solution to the 
Transistor Sizing Problem for CMOS Circuits Using Convex Optimization," in IEEE Transactions 
on Computer-Aided Design of Integrated Circuits and Systems, 1993.  
[8] A. Morgenshtein, E. Friedman, R. Ginosar and A. Kolodny, "Unified Logical Effort - A 
Method for Delay Evaluation and Minimization in Logic Paths with RC Interconnect," in IEEE 
Transactions on Very Large Scale Integration (VLSI) Systems.  
[9] Tsung-Wei Huang and Martin D. F. Wong, “OpenTimer: An Open-Source High-Performance 
Timing Analysis Tool”.  
[10] Tsung-Wei Huang and Martin D. F. Wong, “Special Session Paper: Incremental Timing 
and CPPR Analysis”, Department of Electrical and Computer Engineering, University of 
Illinois at Urbana-Champaign, IL, USA. 
  
 
 
 
