Delay insertion method in clock skew scheduling by Taskin, Baris
 
 
 
 
 
 
 
 
 
 
 
 
 
 
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 25, NO. 4, APRIL 2006 651
Delay Insertion Method in Clock Skew Scheduling
Baris Taskin, Member, IEEE, and Ivan S. Kourtev, Member, IEEE
Abstract—This paper describes a delay insertion method that
improves the efficiency of clock skew scheduling. It is shown
that reconvergent paths limit the improvement of circuit per-
formance achievable through clock skew scheduling. A delay
insertion method is proposed such that the optimal clock period
achievable through clock skew scheduling is improved by miti-
gating the limitations caused by reconvergent paths. Experimental
results demonstrate that reconvergent paths are limiting for 34%
(41% for level sensitive) of the selected suite of ISCAS’89 bench-
mark circuits. Through the application of clock skew scheduling
with delay insertion, an average improvement of 10% shorter
clock periods (9% for level sensitive) is observed for ISCAS’89
benchmark circuits compared to the results of conventional clock
skew scheduling.
Index Terms—Delay padding, digital synchronous very large
scale integration (VLSI) circuit timing, nonzero clock skew
scheduling, reconvergent paths.
I. INTRODUCTION
One of the principal design objectives of synchronous digital VLSI (very large scale integration) circuits is
operation at the highest clock frequency with improved tolerance to secondary-order effects and process parameter
variations. As technology advances, designers develop techniques to overcome manufacturing imperfections or to
utilize these imperfections to the designers' advantage (e.g., [1]–[3]). Clock skew scheduling [4]–[9] is one such
technique where the clock signal delays are intentionally manipulated in order to improve the circuit operation.
In this paper, minimum clock periods and the correspond-
ing optimal clock schedules [6] of edge-triggered and level-
sensitive circuits are analyzed after clock skew scheduling. It is
known that the minimum clock period of a synchronous circuit
achievable through clock skew scheduling is limited by the
uncertainties of the data-propagation times on local data paths
[4] and the total data-propagation times on data-path cycles
[10]. It is shown for the first time in this paper that reconvergent
local data paths introduce an additional theoretical limit on
the minimum clock period of a synchronous circuit achievable
through clock skew scheduling. This limitation caused by reconvergent
paths is theoretically derived and a novel delay
insertion method is defined in order to mitigate this limitation.
In the rest of this paper, it is assumed that reconvergent paths are
the dominant limiting factor on the minimum clock period of a
synchronous circuit achievable through clock skew scheduling
over other limiting factors of delay uncertainty and data-path
cycles. This assumption does not invalidate the generality of
the analysis; it is adopted in order to simplify the presentation
of the described limitation.
Manuscript received October 16, 2004; revised June 23, 2005 and September 11, 2005.
This work was supported in part by Center Research Development Fund (CRDF) under
Award 04.23202.31234 from The University of Pittsburgh, by the Gigascale Silicon
Research Center (GSRC), and by MultiGiG Inc., Scotts Valley, CA. This paper was
recommended by Associate Editor L. Scheffer.
B. Taskin was with the Department of Electrical and Computer Engineering,
University of Pittsburgh, Pittsburgh, PA 15261 USA. He is now with the Department
of Electrical and Computer Engineering, Drexel University, Philadelphia,
PA 19104 USA (e-mail: taskin@coe.drexel.edu).
I. S. Kourtev is with the Department of Electrical and Computer Engineering,
University of Pittsburgh, Pittsburgh, PA 15261 USA (e-mail: ivan@engr.pitt.edu).
Digital Object Identifier 10.1109/TCAD.2006.870072
The limitation on the minimum clock period caused by
reconvergent paths is derived for both edge-triggered and level-
sensitive circuit implementations. It is shown that through
systematic delay insertion, the limitation on the minimum
clock period achievable through clock skew scheduling can be
mitigated. For a scalable and fully-automated application, the
proposed delay insertion method is implemented as a linear
programming (LP) problem. A topological analysis of a circuit
(to identify the reconvergent paths) is not necessary in the
proposed LP problem, as the problem constraints are generated
for the individual local data paths. The LP problem models the
traditional clock-period minimization problem of synchronous
circuits through clock skew scheduling simultaneous with the
calculation of the optimal delay values to be inserted into each
local data path. In the formulation, the local data paths are
modeled abstractly at a higher hierarchy level, without specific
physical information at the gate level.
Note that a delay insertion method targeting the improvement
of the operating frequency of synchronous circuits has previ-
ously been offered in [11]. This method and its variants are used
in digital circuit design, where the circuits are designed to meet
the register setup-time [6] requirements (also called long-path
constraints), and the hold-time [6] requirements (also called
short-path constraints) are satisfied post analysis by inserting
appropriate delays. The delay insertion method proposed in this
paper has certain similarities with the method offered in [11],
but is fundamentally different in the following aspects.
1) The study in [11] is proposed for systems where the clock
skews are precomputed (post-clock-tree synthesis). This
study is proposed for systems where the optimal clock
skews need to be computed (pre-clock-tree synthesis).
2) The study in [11] is offered to mitigate short-path-
constraint violations on local data paths. This study is
proposed to mitigate both short- and long-path-constraint
violations that can occur with clock skew scheduling.
3) The study in [11] is offered to fix the timing violations
on each local data path, where the timing of one local
data path is independent from its neighboring local data
paths (due to precomputed clock skew). In this paper,
clock skew scheduling is considered, which leads to
the interdependence of the timing of adjacent local data
paths (due to the computation of optimal clock skew).
Consequently, the topological orientation of local data
paths (such as reconvergent paths) is of importance to
this study. On a larger scale, the study in [11] is a delay
insertion method at the combinational block level, while
this study is a delay insertion method at the sequential
block level of design hierarchy.
Another delay insertion method targeting circuits with
nonzero clock skew scheduling is presented in [12]. The method
in [12] is intended to mitigate the limitations caused by the
delay uncertainties of local data paths. In [12], the delay
uncertainties of local data paths are not modeled generically
using the conventional min–max delay timing models. Instead,
a simple delay timing model without uncertainties is used.
Using this model, a specific type of circuit topology is shown
to demonstrate the limitations of the uncertainties of local
data paths. The delay insertion method in [12] is proposed to
decrease the delay uncertainties of local data paths (applicable
only to local data paths conforming to the topology described in
[12]) such that the limitations caused by these uncertainties are
mitigated. The study presented in this paper is different from
the study presented in [12] in the following aspects.
1) The study in [12] does not address the limitations caused
by reconvergent paths on the improvements achievable
through clock skew scheduling. This study is the first to
demonstrate such limitations.
2) The delay insertion method presented in [12] does
not—implicitly or explicitly—mitigate the limitations
caused by reconvergent paths. This study proposes such a
delay insertion method.
3) A simple delay timing model is used in the study pre-
sented in [12], where register-to-register timing paths
and delay elements are modeled without uncertainties.
Conventional min–max delay timing models are adopted
in this study. Delay timing models are definitive to the
definitions and applications of delay insertion methods.
The rest of the paper is organized as follows. In Section II,
edge-triggered and level-sensitive circuits are briefly discussed,
and the timing constraints and variables used in the problem
formulation are introduced. In Section III, the analysis of the
optimal clock skew scheduling results generated by the clock
skew scheduling algorithms is presented. Also, in Section III,
the proposed delay insertion method is introduced. In
Section IV, the experimental results for the application of the
proposed method on benchmark circuits are presented. Finally,
conclusions are offered in Section V.
II. BACKGROUND AND TERMINOLOGY
Synchronous circuits are built of local data paths. A local
data path [6] is formed by two sequentially adjacent registers
and a combinational logic block between them. The timing
analysis of a synchronous circuit is performed by checking
the relevant timing constraints for each local data path. A
sample local data path $R_i \to R_f$ between sequentially adjacent
registers $R_i$ and $R_f$ is shown in Fig. 1. The minimum and
maximum data-propagation delays on the local data path are
denoted by $D^{if}_{Pm}$ and $D^{if}_{PM}$, respectively. The pair
$[D^{if}_{Pm}, D^{if}_{PM}]$ denotes the delay interval for the path
$R_i \to R_f$, where the delay uncertainty of the path is defined by
$U^{if} = D^{if}_{PM} - D^{if}_{Pm}$.
The timing information of circuit components is defined and
represented similarly with delay pairs and uncertainties.
Fig. 1. Local data path in synchronous circuit.
Fig. 2. Circuit graph of synchronous circuit. Registers are represented by
vertices labeled v1 to v4. Local data paths are represented by edges.
In order to provide easy manipulation by computer-aided-
design (CAD) algorithms, synchronous circuits are represented
by a circuit graph [6]. In a circuit graph, the registers and
the local data paths of the circuit are mapped to the vertices
and edges of the graph, respectively. A sample circuit graph is
illustrated in Fig. 2.
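The representation described above can be sketched in a few lines of Python. This is only an illustration: the names `LocalDataPath`, `d_min`, `d_max` and `circuit_graph` are assumptions introduced here, and the delay values are hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalDataPath:
    """Edge R_i -> R_f annotated with its delay interval [d_min, d_max]."""
    initial: str   # initial register R_i
    final: str     # final register R_f
    d_min: float   # minimum data-propagation delay D_Pm
    d_max: float   # maximum data-propagation delay D_PM

    @property
    def uncertainty(self) -> float:
        # Delay uncertainty U = D_PM - D_Pm
        return self.d_max - self.d_min


def circuit_graph(paths):
    """Map each register (vertex) to the local data paths (edges) leaving it."""
    graph = {}
    for p in paths:
        graph.setdefault(p.initial, []).append(p)
        graph.setdefault(p.final, [])
    return graph


# Hypothetical delays, chosen only for illustration.
paths = [LocalDataPath("R1", "R2", 2.0, 3.5),
         LocalDataPath("R2", "R3", 1.0, 1.5)]
graph = circuit_graph(paths)
```

Storing the interval on the edge keeps the uncertainty a derived quantity, which mirrors how the text defines it from the min and max delays.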
In the rest of this section, the timing constraints of a synchro-
nous system are summarized and the selected circuit model is
described. Also, the clock skew scheduling algorithms used for
the delay insertion method are reviewed. Note that the timing
constraints and notations are in the tradition of [6] and [9].
For edge-triggered circuits, the positive edge of the clock
signal is considered the active (triggering or latching) edge. A
level-sensitive circuit, on the other hand, is considered active
during the positive level of the clock signal, where the low-to-
high and high-to-low transitions are called the leading and trail-
ing edges of the clock signal, respectively. The parameter CLW
denotes the constant ON time of the clock signal. Clock skew is
defined as Tskew(i, f) = ti − tf , where ti and tf represent the
clock signal delays to the initial (Ri) and final (Rf ) registers
of a local data path.
The earliest and latest arrival times of the data signal to the
register Rf are denoted by af and Af , respectively. Similarly,
the earliest and latest departure times at Rf are denoted by
df and Df , respectively. The arrival and departure times of a
data signal at a register are defined with respect to the cycle
of the clock signal that synchronizes that register [9]. The
internal delays of a register are denoted by the parameters
H , S, DDQ, and DCQ, which stand for the hold time, the
setup time, the data-to-output delay, and the clock-to-output
delay, respectively.
TABLE I. CLOCK-SKEW-SCHEDULING METHOD FOR EDGE-TRIGGERED CIRCUITS
TABLE II. CLOCK-SKEW-SCHEDULING METHOD FOR LEVEL-SENSITIVE CIRCUITS
Two clock skew scheduling methods—one for edge-
triggered and one for level-sensitive circuits—are examined.
The selected clock skew scheduling method for edge-triggered
circuits is shown in Table I. This simple and effective linear
programming (LP) problem for the clock skew scheduling of
edge-triggered circuits is introduced in [4]. In the problem
constraints, the permissible ranges [6] for the clock skew values
on each local data path are implied. The objective of the
algorithm is to minimize the clock period T of a circuit, given
the set of permissible range constraints.
The clock skew scheduling method for level-sensitive cir-
cuits is presented in Table II. This method, introduced in [9],
is an LP problem formulation for the clock skew schedul-
ing problem of level-sensitive circuits. The time-borrowing
property [9], [13] of level-sensitive latches is reflected in the
timing constraints and the problem solution permits circuit
performance improvements due to both clock skew scheduling
and time borrowing [9]. In [9], the LP problem constraints
are categorized as latching, synchronization, propagation, and
validity constraints, which govern the proper operation of a
level-sensitive circuit. In particular, latching constraints bound
the arrival time of the data signal Xf (see Fig. 1) to be latched
into the final latch Rf of a local data path. Synchronization
constraints define the departure time of the data signal Qi from
the initial latch of a local data path. Propagation constraints
define the arrival time of the data signal Xf at the final latch Rf
of a local data path. Validity constraints ensure the consistency
in the definitions of the problem variables. These four sets
of constraints are generated for each local data path in the
circuit, and the resulting system of constraints is solved with
the objective of clock-period minimization.
The clock skew scheduling methods presented in Tables I
and II are used as frameworks of formulation for the proposed
delay insertion method. The modifications performed to model
the delay insertion method are discussed in Section III-E.
III. DELAY-INSERTION METHOD
When clock skew scheduling is applied to a synchronous
circuit, a set of optimal values that satisfy the objective function
(clock-period minimization is considered in this work) are
assigned to the delays of the clock signal to each register.
Certain data paths become critical timing paths because of
the distribution of these optimal clock delays. In this sec-
tion, the consequences of criticality to the short and long-
path constraints of a reconvergent path are analyzed. It is
demonstrated that when the short- and long-path constraints of
a reconvergent path are critical, the minimum clock period can
be improved via delay insertion. Note that the criticality of the
constraints of a reconvergent path adheres to the preliminary
assumption that the limitation caused by this reconvergent-
path system is dominant over other limitations. For circuits
where limitations caused by other factors are dominant, im-
provement through delay insertion is not possible. In experi-
mentation, such circuits are reported as cases where the delay
insertion method is inapplicable (e.g., delay insertion is not
beneficial).
A reconvergent data-path system is defined by two or more
series of local data paths (reconvergent paths) with a common
source register and a common sink register. The source and
sink registers are called the divergent register Rd and the
convergent register $R_c$, respectively. Let $p_{d\{i_1 \ldots i_n\}c}$ define a
reconvergent path starting from register $R_d$, continuing through
the intermediate registers $\{R_{i_1}, \ldots, R_{i_n}\}$ and ending at
register $R_c$. The number of intermediate registers $r_{d\{i_1 \ldots i_n\}c} = n$
is a nonnegative integer ($n \in \mathbb{Z}^+ \cup \{0\}$) and the
path is acyclic [$\forall i_n, i_m$: $R_d \neq R_{i_n}$, $R_{i_n} \neq R_{i_m}$, $R_{i_n} \neq R_c$,
and $R_d \neq R_c$]. In Fig. 2, for instance, there are two reconvergent
paths $p_{123}$ and $p_{13}$ between $v_1$ and $v_3$, where the numbers
of intermediate registers are $r_{123} = 1$ and $r_{13} = 0$, respectively.
The path delay $PD^{d\{i_1 \ldots i_n\}c}$ of a reconvergent path $p_{d\{i_1 \ldots i_n\}c}$ is
defined as the total data-propagation time between the divergent
and convergent registers $R_d$ and $R_c$, respectively, over the
intermediate registers $\{R_{i_1}, \ldots, R_{i_n}\}$. A path delay is not the
summation of all the data-propagation times on the path; the path
delay depends on the clock period as well. The minimum and
maximum path delays of the reconvergent data path
are given by $PD^{d\{i_1 \ldots i_n\}c}_m$ and $PD^{d\{i_1 \ldots i_n\}c}_M$, respectively. In
order to represent the longest and shortest total data-propagation
times between the divergent and convergent registers (path
delays), system delay is defined. The system delay $SD^{dc}$ of a
reconvergent data-path system between divergent and convergent
registers $R_d$ and $R_c$ must consider all the path delays between
registers $R_d$ and $R_c$. The maximum system delay $SD^{dc}_M$ is
defined by the largest of the maximum path delays between $R_d$
and $R_c$. Similarly, the minimum system delay $SD^{dc}_m$ is defined
by the smallest of the minimum path delays between $R_d$ and
$R_c$. If there are $k$ reconvergent paths between $R_d$
and $R_c$, labeled $p_A, p_B, \ldots, p_K$, the minimum and maximum
system delays are represented by
\[ SD^{dc}_m = \min\left(PD^{p_A}_m, PD^{p_B}_m, \ldots, PD^{p_K}_m\right) \tag{1} \]
\[ SD^{dc}_M = \max\left(PD^{p_A}_M, PD^{p_B}_M, \ldots, PD^{p_K}_M\right). \tag{2} \]
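To make the definition of reconvergent paths concrete, the sketch below enumerates the acyclic register sequences between a divergent and a convergent vertex of a circuit graph. The helper name `simple_paths` is an assumption, and so is part of the edge set: only the edges v1→v2, v2→v3 and v1→v3 follow from the text, while the edge v3→v4 is hypothetical and merely attaches the fourth vertex of Fig. 2.

```python
def simple_paths(graph, source, sink, prefix=None):
    """Return every acyclic register sequence from source to sink."""
    prefix = (prefix or []) + [source]
    if source == sink:
        return [prefix]
    found = []
    for nxt in graph.get(source, []):
        if nxt not in prefix:  # acyclic: never revisit a register
            found.extend(simple_paths(graph, nxt, sink, prefix))
    return found


# Adjacency list: register -> sequentially adjacent successor registers.
# The edge v3 -> v4 is a hypothetical placeholder.
graph = {"v1": ["v2", "v3"], "v2": ["v3"], "v3": ["v4"], "v4": []}

paths = simple_paths(graph, "v1", "v3")
# Number of intermediate registers r on each path (endpoints excluded).
intermediate_counts = [len(p) - 2 for p in paths]
is_reconvergent_system = len(paths) >= 2
```

For this graph the two sequences found correspond to p123 and p13 of the text, with one and zero intermediate registers. Exhaustive enumeration grows quickly with circuit size, which is one reason a formulation that avoids topological analysis is attractive.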
Fig. 3. Simple reconvergent data-path system. Registers $R_1$ and $R_2$ are divergent
and convergent registers, respectively. Two reconvergent paths $p_{12_a}$ and
$p_{12_b}$ form reconvergent data-path system. Note that system delay of reconvergent
data-path system is given by interval
$[SD^{12}_m, SD^{12}_M] = [PD^{12_b}_m, PD^{12_a}_M] = [0.6, 1.2]$.
A. Simple Numerical Example of Reconvergence
A simple reconvergent data-path system formed by two
reconvergent local data paths sharing the divergent and conver-
gent registers R1 and R2, respectively, is shown in Fig. 3. Note
that as a special case, subscripts a and b are used to identify the
two reconvergent local data paths p12a and p12b , respectively.
For this simple reconvergent data-path system, the path delay
of each reconvergent path is the data-propagation delay of the
respective local data paths, ($PD^{12_a}_m = D^{12_a}_{Pm} = 1.0$, $PD^{12_a}_M = D^{12_a}_{PM} = 1.2$)
and ($PD^{12_b}_m = D^{12_b}_{Pm} = 0.6$, $PD^{12_b}_M = D^{12_b}_{PM} = 0.7$).
The minimum and maximum system delays are driven by
the reconvergent data paths $p_{12_b}$ and $p_{12_a}$, respectively
\[ SD^{12}_m = \min\left(PD^{12_a}_m, PD^{12_b}_m\right) = PD^{12_b}_m = 0.6 \tag{3} \]
\[ SD^{12}_M = \max\left(PD^{12_a}_M, PD^{12_b}_M\right) = PD^{12_a}_M = 1.2. \tag{4} \]
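Equations (1)–(4) amount to a min over the lower ends and a max over the upper ends of the path delay intervals. A minimal sketch, assuming each path delay is already known as a pair (the function name `system_delay` and the dictionary keys are illustrative):

```python
def system_delay(path_delays):
    """Map {path: (PD_m, PD_M)} to (SD_m, SD_M) following (1) and (2)."""
    sd_min = min(pd_m for pd_m, _ in path_delays.values())
    sd_max = max(pd_M for _, pd_M in path_delays.values())
    return sd_min, sd_max


# Path delay intervals of the two reconvergent paths in Fig. 3.
fig3 = {"p12a": (1.0, 1.2), "p12b": (0.6, 0.7)}
sd_min, sd_max = system_delay(fig3)
```

For paths with intermediate registers the pairs would first have to be derived from the clock period, since the path delay is not a plain sum of local delays; the sketch covers only the two-register case of Fig. 3.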
Two circuits with the topology presented in Fig. 3 are analyzed
in Sections III-B and C—the edge-triggered circuit SFF and the
level-sensitive circuit SL, respectively.
B. Reconvergence in Edge-Triggered Circuit
For edge-triggered circuits, the data signals depart the registers
a clock-to-output delay $D_{CQ}$ after the latching edge of the
clock signal. Consequently, in $S_{FF}$, the signal $Q_1$ (see Fig. 1)
departs $R_1$ a time $D_{CQ}$ after the positive
clock edge and propagates along the reconvergent paths. In
order to satisfy the short-path constraints, the arrival of data
signals $X_{2a}$ and $X_{2b}$ at $R_2$ must occur $H_2$ later than the positive
edge of the previous clock cycle at $R_2$. Similarly, in order to
satisfy the long-path constraints, the arrivals must occur $S_2$
earlier than the positive edge of the current clock cycle at $R_2$ [4]
\[ H_2 \le a_2 \le A_2 \le T - S_2. \tag{5} \]
Next, suppose clock skew scheduling for clock-period min-
imization is applied to an arbitrary edge-triggered circuit that
involves a reconvergent data-path system. After clock skew
scheduling, if at least one of the reconvergent paths becomes
a critical timing path, the earliest and latest arrival times of
the data signal at the critical convergent node are at marginal
values. Recall from Section II that the earliest and latest arrival
times of the data signal at a register ($a$ and $A$, respectively)
are defined in the frame of reference of the clock signal that
synchronizes that register. After clock skew scheduling, the
clock signal delays change, causing the frames of reference of
the earliest and latest data arrival times to change, which leads
to the following representation of the marginal arrival times $a_2$
and $A_2$ for $S_{FF}$:
\[ H_2 = a_2 \le A_2 = T_{\min} - S_2. \tag{6} \]
Fig. 4. Timing diagram of simple edge-triggered reconvergent data-path
system in Fig. 3 after clock skew scheduling. C1 and C2 are clock signals
synchronizing registers R1 and R2, respectively.
The constraints in (6) are illustrated in Fig. 4.
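Also illustrated in Fig. 4 is the separation between $A_2 + S_2$
and $a_2 - H_2$ defining the minimum clock period after clock
skew scheduling
\[ T_{\min} = A_2 + S_2 - (a_2 - H_2). \tag{7} \]
The data arrival times at $R_2$ satisfy the propagation constraints
[6], [9]
\[ \begin{aligned}
a_2 &= \min\left(d_1 + D^{12_a}_{Pm} - T_{\min},\; d_1 + D^{12_b}_{Pm} - T_{\min}\right) \\
    &= d_1 + \min\left(D^{12_a}_{Pm}, D^{12_b}_{Pm}\right) - T_{\min}
\end{aligned} \tag{8} \]
\[ \begin{aligned}
A_2 &= \max\left(D_1 + D^{12_a}_{PM} - T_{\min},\; D_1 + D^{12_b}_{PM} - T_{\min}\right) \\
    &= D_1 + \max\left(D^{12_a}_{PM}, D^{12_b}_{PM}\right) - T_{\min}.
\end{aligned} \tag{9} \]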
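Substituting the propagation constraints (8) and (9) into (7) yields
\[ \begin{aligned}
T_{\min} ={}& D_1 + \max\left(D^{12_a}_{PM}, D^{12_b}_{PM}\right) - T_{\min} + S_2 \\
           & - \left[d_1 + \min\left(D^{12_a}_{Pm}, D^{12_b}_{Pm}\right) - T_{\min} - H_2\right].
\end{aligned} \tag{10} \]
Considering $d_1 = D_1$, the $T_{\min}$ terms on the right-hand side cancel and (10) is simplified to
\[ T_{\min} = \max\left(PD^{12_a}_M, PD^{12_b}_M\right) - \min\left(PD^{12_a}_m, PD^{12_b}_m\right) + S_2 + H_2. \tag{11} \]
Following from (1) and (2), (11) is identical to
\[ T_{\min} = SD^{12}_M - SD^{12}_m + S_2 + H_2. \tag{12} \]
Substituting the numerical values and assuming zero internal
register delays $S = H = 0$, the minimum clock period $T_{\min}$ of
$S_{FF}$ after clock skew scheduling is computed as $T_{\min} = 1.2 - 0.6 = 0.6$.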
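The value $T_{\min} = 0.6$ can be cross-checked numerically. The sketch below does not reproduce the LP of Table I. It assumes the usual permissible-range form of the edge-triggered constraints with zero clock-to-output delay (hold: $t_f - t_i \le D_{Pm} - H$; setup: $t_i - t_f \le T - S - D_{PM}$) and, instead of an LP solver, swaps in a binary search on $T$ whose feasibility test is a Bellman-Ford negative-cycle check on these difference constraints. All function names are assumptions.

```python
def feasible(num_regs, paths, period, setup=0.0, hold=0.0):
    """Return True if clock delays t exist that meet all constraints.

    Each path (i, f, d_min, d_max) contributes two difference constraints:
      hold : t_f - t_i <= d_min - hold
      setup: t_i - t_f <= period - setup - d_max
    Such a system is feasible iff its constraint graph has no negative
    cycle, which Bellman-Ford detects.
    """
    edges = []
    for i, f, d_min, d_max in paths:
        edges.append((i, f, d_min - hold))            # t_f <= t_i + w
        edges.append((f, i, period - setup - d_max))  # t_i <= t_f + w
    dist = [0.0] * num_regs  # as if a virtual source reached every register
    for _ in range(num_regs):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v] - 1e-12:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return True
    return False  # still relaxing after num_regs rounds: negative cycle


def min_clock_period(num_regs, paths, setup=0.0, hold=0.0, tol=1e-9):
    """Binary search for the smallest feasible clock period."""
    lo = 0.0
    # Upper bound assumed feasible (true for zero skew when hold <= d_min).
    hi = max(p[3] for p in paths) + setup + 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if feasible(num_regs, paths, mid, setup, hold):
            hi = mid
        else:
            lo = mid
    return hi


# Fig. 3: registers R1 (index 0) and R2 (index 1), two parallel paths.
fig3_paths = [(0, 1, 1.0, 1.2), (0, 1, 0.6, 0.7)]
t_min = min_clock_period(2, fig3_paths)
```

The binding cycle combines the hold constraint of the short path with the setup constraint of the long path, which is exactly the difference between maximum and minimum system delay in (12).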
Fig. 5. Simple reconvergent system in Fig. 3 after delay insertion. Black
circle represents a delay element of [0.1,0.2], which is inserted on reconvergent
path p12b .
Consider (12), showing the dependence of $T_{\min}$ on the
algebraic difference between the maximum system delay and
the minimum system delay between $R_d$ and $R_c$ (summed with
the internal register delays $S_c$ and $H_c$). The delay insertion
method is proposed to modify these maximum and minimum
system delays between $R_d$ and $R_c$. The modification, when
applicable, decreases the algebraic difference in (12). In $S_{FF}$,
for instance, the minimum system delay between $R_d$ and $R_c$ is
determined by $PD^{12_b}_m$ of path $p_{12_b}$. By inserting a delay element
of [0.1, 0.2] time units on $p_{12_b}$, the minimum and maximum path
delays of this path are changed to $D^{12_b}_{Pm} = 0.7$ and $D^{12_b}_{PM} = 0.9$,
respectively. The minimum system delay between $R_d$ and $R_c$
is still determined by $PD^{12_b}_m$ of path $p_{12_b}$, which is now 0.7
instead of the original 0.6 time units. Both before and after
delay insertion, the maximum system delay between $R_d$ and
$R_c$ is determined by $PD^{12_a}_M$ of path $p_{12_a}$, which is a constant
1.2 time units. Therefore, the algebraic difference between the
maximum and minimum system delays between $R_d$ and $R_c$
is improved from $(1.2 - 0.6 = 0.6)$ to $(1.2 - 0.7 = 0.5)$ time
units. This delay insertion procedure for the circuit shown in
Fig. 3 is illustrated in Fig. 5.
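The arithmetic of this example can be reproduced with simple interval addition. The following is a sketch under the assumptions of the example (two-register system, $S = H = 0$); the helper names `insert_delay` and `min_period` are illustrative.

```python
def insert_delay(interval, delay):
    """Add a delay element [delay_min, delay_max] to a path delay interval."""
    return (interval[0] + delay[0], interval[1] + delay[1])


def min_period(intervals, setup=0.0, hold=0.0):
    """Equation (12): T_min = SD_M - SD_m + S + H."""
    sd_min = min(lo for lo, _ in intervals)
    sd_max = max(hi for _, hi in intervals)
    return sd_max - sd_min + setup + hold


p12a, p12b = (1.0, 1.2), (0.6, 0.7)
before = min_period([p12a, p12b])              # about 0.6 time units

p12b_padded = insert_delay(p12b, (0.1, 0.2))   # about [0.7, 0.9]
after = min_period([p12a, p12b_padded])        # about 0.5 time units
```

Note that the inserted element widens the interval of the padded path from 0.1 to 0.2 time units, because its own uncertainty of 0.1 is added to that of the path.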
Note that for $S_{FF}$, inserting a delay element with a value in
the range [0.4, 0.5] on $p_{12_b}$ gives the minimum possible algebraic
difference in (12), leading to the minimum clock period
achievable through delay insertion, $T^*_{\min}$. For $S_{FF}$, $T^*_{\min}$
evaluates to $T^*_{\min} = 1.2 - 1.0 = 0.2$. It is shown that this minimum
clock period achievable through delay insertion depends on the
maximum of the algebraic differences between the maximum
and minimum path delays of each reconvergent path (after
delay insertion).
Proposition: Let there be $k$ number of reconvergent paths
between $R_d$ and $R_c$, labeled $p_A, p_B, \ldots, p_K$. The minimum
possible algebraic difference between the maximum and minimum
path delays of each reconvergent path between $R_d$ and
$R_c$ after delay insertion is the minimum clock period $T^*_{\min}$
achievable through delay insertion.
Let the minimum and maximum system delays define a real
number interval $\Lambda$, such that
\[ \Lambda = \left[SD^{dc}_m, SD^{dc}_M\right]. \tag{13} \]
By definition, the minimum possible algebraic difference between
the maximum and minimum path delays of each reconvergent
path after delay insertion is the minimum length of
interval $\Lambda$ (after delay insertion).
In order to compute the minimum length $|\Lambda|$ of interval $\Lambda$
achievable through delay insertion, the difference $[\max(\Lambda) - \min(\Lambda)]$
is computed. Recalling (1) and (2), the following
is derived:
\[ \min(\Lambda) = SD^{dc}_m = \min\left(PD^{p_A}_m, PD^{p_B}_m, \ldots, PD^{p_K}_m\right) \tag{14} \]
\[ \max(\Lambda) = SD^{dc}_M = \max\left(PD^{p_A}_M, PD^{p_B}_M, \ldots, PD^{p_K}_M\right). \tag{15} \]
Let the real number delay intervals formed by the minimum
and maximum delay values of the paths $p_A, p_B, \ldots, p_K$ be
represented by $A, B, \ldots, K$, respectively. In other words, a delay
interval $L$, associated with the path $p_L \in \{p_A, p_B, \ldots, p_K\}$,
is formed by $L = [PD^{p_L}_m, PD^{p_L}_M]$. One of the following
possibilities defining the expression $[|\Lambda| = \max(\Lambda) - \min(\Lambda)]$
must hold.
P1) A delay interval $M \in \{A, \ldots, K\}$ determines both the
minimum $\min(\Lambda)$ and maximum $\max(\Lambda)$ values of the
interval $\Lambda$. Then, $\Lambda = M$ and
$|\Lambda| = |M| = \max(\Lambda) - \min(\Lambda) = \max(M) - \min(M)$.
P2) Otherwise, two nonidentical delay intervals determine
the minimum and maximum values of the interval $\Lambda$.
Then, $\forall L \in \{A, \ldots, K\}$:
$|\Lambda| = \max(\Lambda) - \min(\Lambda) > \max(L) - \min(L)$.
For systems satisfying P1), the minimum length for Λ is already
given by |Λ| = |M |. The minimum interval length—thus the
minimum clock period—cannot be changed by delay insertion.
For systems satisfying P2), the delay insertion method is ef-
fective in modifying one or more of the delay intervals in Λ
in order to promote one of the delay intervals to become the
interval $M$. In other words, systems satisfying P2) are converted
to systems satisfying P1) through delay insertion. Delay
insertion is performed into the logic network; thus, the system
delays and the interval Λ are modified with delay insertion.
Note that both the minimum and maximum system delays can
be modified with delay insertion. Therefore, it is not possible to
predetermine which reconvergent path will be the determining
path for the interval Λ after delay insertion.
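The distinction between the two possibilities can be tested mechanically: a system satisfies P1) when a single delay interval supplies both ends of Λ. A minimal sketch follows; the function name `classify` and the returned labels are assumptions, and exact floating-point comparison is used only because the sample values are literals.

```python
def classify(intervals):
    """Return ("P1", name) if one delay interval spans Lambda, else ("P2", None)."""
    lam_min = min(lo for lo, _ in intervals.values())
    lam_max = max(hi for _, hi in intervals.values())
    for name, (lo, hi) in intervals.items():
        if lo == lam_min and hi == lam_max:
            return "P1", name
    return "P2", None


# Fig. 3 before delay insertion: different paths set min and max of Lambda.
original = {"p12a": (1.0, 1.2), "p12b": (0.6, 0.7)}
# After shifting p12b by a zero-uncertainty delay of 0.4 time units.
padded = {"p12a": (1.0, 1.2), "p12b": (1.0, 1.1)}

case_before = classify(original)
case_after = classify(padded)
```

In the padded system the interval of p12a alone determines Λ, so further insertion cannot shorten it.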
In case (i) of Fig. 6, a sample system satisfying P1) is
illustrated, where the delay interval D (associated with path
pD) determines the minimum length for Λ. No modification is
necessary for such systems, as the minimum possible length
for Λ is already observed. In cases (ii) and (iii) of Fig. 6, the
application of the delay insertion method on a sample system
satisfying P2) is illustrated. Note that in case (ii), the minimum
value in the $\Lambda$ interval is determined identically by delay
intervals C and D [$\min(\Lambda) = PD^{p_C}_m = PD^{p_D}_m$], while the maximum
value is determined by delay interval B [$\max(\Lambda) = PD^{p_B}_M$].
Delay insertion on a reconvergent path is similar to adding an
offset to the interval, while preserving the interval length. If the
optimal values of delay elements are inserted on each path, the
minimum possible |Λ| is achieved by asserting that the biggest
delay interval M ∈ {A, . . . ,K} becomes the interval Λ. In the
modification of the sample system shown in case (iii) of Fig. 6,
the delay interval B is promoted to become this biggest delay
interval M such that both min(Λ) and max(Λ) are determined
by delay interval B (i.e., delay interval B becomes Λ).
There are two important points to note here. First, the solution
set of the inserted delay values is not unique. For instance,
the delay inserted on the path defining delay interval C in case
(iii) of Fig. 6 can have any value between 5 and 11 time units
(|C| = 3) to satisfy the computed minimum interval. Also,
the delay values inserted on all paths can simultaneously be
increased by any identical amount (e.g., two time units) to
generate an alternative solution. This nonunique solution-set
property provides flexibility against any inherent uncertainty or
unavailability of exact values of the delay elements.
Fig. 6. Two reconvergent data-path systems satisfying P1) and P2), respectively. (i) System satisfying P1). Minimum |Λ| interval is readily determined by
delay interval D (Λ = D). (ii) System satisfying P2). Maximum and minimum values of interval Λ are determined by delay intervals B and C/D, respectively.
(iii) System in (ii) is modified through delay insertion to satisfy P1). Minimum Λ interval is now determined only by delay interval B. Delay insertion is illustrated
with one-sided arrows. (iv) System in (ii) is modified through delay insertion as in (iii), except with delay elements with uncertainties. Uncertainties accrued over
delay intervals are represented by dotted lines appended to delay interval boxes.
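The range of admissible delay values can be computed directly for zero-uncertainty delay elements. The sketch below fixes one particular solution family as an assumption: the target window has the length of the largest delay interval and ends at the current maximum system delay. Shifting every delay by the same amount gives the alternative solutions mentioned above. The function name `delay_ranges` is illustrative.

```python
def delay_ranges(intervals):
    """For each path, the (lowest, highest) zero-uncertainty delay that
    places its interval inside a window whose length equals the largest
    interval and whose upper end is the current maximum system delay."""
    longest = max(hi - lo for lo, hi in intervals.values())
    window_hi = max(hi for _, hi in intervals.values())
    window_lo = window_hi - longest
    return {
        # Delays cannot be negative, hence the clamp at zero.
        name: (max(0.0, window_lo - lo), window_hi - hi)
        for name, (lo, hi) in intervals.items()
    }


# Path delay intervals of Fig. 3.
ranges = delay_ranges({"p12a": (1.0, 1.2), "p12b": (0.6, 0.7)})
```

For the system of Fig. 3 this yields a range of roughly 0.4 to 0.5 time units for p12b and no delay for p12a, in line with the range quoted earlier for $S_{FF}$. A wider range for a path means more freedom to absorb the granularity of available delay elements.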
The second point is that, after delay insertion, the interval
lengths are preserved only if the inserted delay elements have
zero delay uncertainty. In demonstrating case (iii) of Fig. 6,
delay values with zero uncertainties are considered in order
to simplify the presentation of the delay insertion method. In
reality, delay elements have delay uncertainties just like any
other circuit component. These delay uncertainties of the delay
elements are accrued over the associated delay intervals. Let
the delay uncertainty of the delay element inserted on path
L be represented by UL. The application of delay insertion
to the sample system presented in case (ii) of Fig. 6, where
the delay uncertainties of the delay elements are accounted
for, is presented in case (iv) of Fig. 6. Note that, due to the
differences in the accrued delay uncertainties for each delay
interval, the interval determining the minimum possible length
for interval Λ can be different compared to the ideal case
presented in case (iii). Incidentally, for cases (iii) and (iv) of
Fig. 6, the delay intervals determining the minimum possible
length for Λ are B and A, respectively. Also, note that, in a
worst case scenario, the accrued delay intervals can end up
being larger compared to the minimum length for Λ presented
in case (ii). In such a case, the delay insertion method is deemed
inapplicable.
Reflecting the proposition on a general reconvergent circuit,
there are two possibilities in computing the minimum algebraic
difference in (12).
P1∗) The minimum and maximum system delays of the
reconvergent data-path system between Rd and Rc are
determined by the same reconvergent path.
P2∗) The minimum and maximum system delays of the
reconvergent data-path system between Rd and Rc are
determined by two or more nonidentical reconvergent
paths.
For systems satisfying P1∗), the minimum algebraic differ-
ence is already achieved. For systems satisfying P2∗), delay
insertion is effective. By inserting delays on one or more of
the reconvergent paths, the path with the largest difference
between its maximum and minimum path delays after delay
insertion becomes the determinant path for the minimum clock
period $T^*_{\min}$ achievable through delay insertion. Therefore, the
minimum clock period of SFF with clock skew scheduling and
delay insertion is
$$T^*_{\min} = \max_{\forall \alpha \in \{a,b\}} \left( PD^{12\alpha}_M - PD^{12\alpha}_m + U^{12\alpha} \right) + S_2 + H_2. \qquad (16)$$
Fig. 7. Timing diagram of simple level-sensitive reconvergent data-path
system in Fig. 3 after clock skew scheduling.
Assuming zero delay uncertainty and substituting the numerical
values, the minimum clock period $T^*_{\min}$ of SFF after clock
skew scheduling with delay insertion is $T^*_{\min} = 1.2 - 1.0 = 0.2$.
The improvement achieved through delay insertion over
circuits with clock skew scheduling alone is computed with the
formula $100 \times [(T_{\min} - T^*_{\min})/T_{\min}]$. Substituting the values, the
improvement is $100 \times [(0.6 - 0.2)/0.6] = 66.7\%$.
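The arithmetic above can be reproduced with a short sketch. The path delay intervals below are hypothetical values chosen only so that they match the quoted results ($T_{\min} = 0.6$ and $T^*_{\min} = 0.2$); they are not taken from the sample circuit. The function names are likewise illustrative. Exact fractions are used to avoid floating-point rounding in the comparison.

```python
from fractions import Fraction


def system_period(paths, setup=0, hold=0):
    """Clock skew scheduling only: largest maximum minus smallest minimum path delay."""
    lows = [lo for lo, hi in paths.values()]
    highs = [hi for lo, hi in paths.values()]
    return max(highs) - min(lows) + setup + hold


def period_with_insertion(paths, uncertainty=None, setup=0, hold=0):
    """Form of (16): widest single-path interval, widened by the uncertainty
    of the delay element inserted on that path (zero if none is given)."""
    uncertainty = uncertainty or {}
    widths = [hi - lo + uncertainty.get(name, 0) for name, (lo, hi) in paths.items()]
    return max(widths) + setup + hold


def improvement_percent(t_old, t_new):
    """Improvement formula 100 * (T_old - T_new) / T_old."""
    return 100 * (t_old - t_new) / t_old


# Hypothetical (min, max) delays of the two reconvergent paths a and b.
paths = {
    "a": (Fraction("1.0"), Fraction("1.2")),
    "b": (Fraction("0.6"), Fraction("0.8")),
}

t_min = system_period(paths)            # 3/5, i.e. 0.6
t_star = period_with_insertion(paths)   # 1/5, i.e. 0.2
print(float(t_min), float(t_star), round(float(improvement_percent(t_min, t_star)), 1))

# A delay element with a large uncertainty can make insertion pointless:
# with U = 0.5 on path b the bound becomes 0.7, worse than 0.6.
print(float(period_with_insertion(paths, {"b": Fraction(1, 2)})))
```

The last call illustrates the worst case described earlier, where the accrued uncertainty makes the resulting interval larger than the one obtained without delay insertion, so the method is deemed inapplicable.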
The computation of the amount of delays to be inserted on
each path is integrated into the clock skew scheduling algo-
rithm. For simplicity, continuous delay models are considered
here. The revised clock skew scheduling algorithm and initial
insight for a general analysis using discrete delay models are
presented in Sections III-E and F.
C. Reconvergence in Level-Sensitive Circuit
For level-sensitive circuits, results similar to those for
edge-triggered circuits are obtained despite the significant differences in
circuit operation. For level-sensitive circuits, the positive level
of the clock signal is the active phase for each (positive-level
sensitive) register and the departure times of the data signals
from the registers depend on the arrival times of the clock and
data signals.
Consequently, in SL, the signal Q1 can depart from R1 any
time during the positive level of the clock signal. In order to
satisfy the short-path constraints, the arrival of data signals
X2a and X2b must occur H2 later than the trailing edge of the
previous clock cycle at R2. In order to satisfy the long-path
constraints, the arrivals must occur S2 earlier than the trailing
edge of the current clock cycle at R2. The timing constraints
are similar to the constraints for the edge-triggered circuit
$$H_2 \le a_2 \le A_2 \le T_{\min} - S_2. \qquad (17)$$
When clock skew scheduling is applied to SL, the earliest and
latest arrival times at R2—when critical—satisfy
$$H_2 = a_2 \le A_2 = T_{\min} - S_2 \qquad (18)$$
as illustrated in Fig. 7.
Using the same derivation as (7) and (10) and assuming
$D_{CQ} = D_{DQ}$ and $D_1 = d_1$ for practical reasons,
$$T_{\min} = \max\left(PD^{12a}_M, PD^{12b}_M\right) - \min\left(PD^{12a}_m, PD^{12b}_m\right) + S_2 + H_2. \qquad (19)$$
Fig. 8. Generalized reconvergent data-path system.
An alternative way to represent (19) is
$$T_{\min} = SD^{12}_M - SD^{12}_m + S_2 + H_2. \qquad (20)$$
Substituting the numerical values and assuming zero internal
register delays, the minimum clock period Tmin of SL after
clock skew scheduling is Tmin = 0.6.
Similar to edge-triggered circuits, the delay insertion method
can be used on level-sensitive circuits in order to improve the
minimum clock period. The minimum clock period of SL with
clock skew scheduling and delay insertion is given by
$$T^*_{\min} = \max_{\forall \alpha \in \{a,b\}} \left( PD^{12\alpha}_M - PD^{12\alpha}_m + U^{12\alpha} \right) + S_2 + H_2. \qquad (21)$$
The minimum clock period $T^*_{\min}$ of SL after clock skew
scheduling with delay insertion is computed as $T^*_{\min} = 1.2 - 1.0 = 0.2$,
leading to an improvement of 66.7% over conventional
clock skew scheduling. The revised clock skew scheduling
algorithm with delay insertion for level-sensitive circuits is
presented in Section III-E.
Note that the earliest and latest departure times d1 and D1,
respectively, from a register R1 can be nonidentical in a level-
sensitive circuit. Fig. 7 illustrates one such case, where (21)
does not hold true. However, the minimum clock period re-
mains directly proportional to the algebraic difference between
the maximum and minimum path delays between R1 and R2.
The delay insertion method is fully applicable to level-sensitive
circuits, as the referred algebraic difference can ultimately be
modified by delay insertion, leading to improvements in the
minimum clock period.
D. General Reconvergent Data-Path Systems
The generalized case for a reconvergent data-path system
is presented in Fig. 8. The edge-triggered and level-sensitive
circuits are analyzed on the same circuit graph. Let there
be $k$ reconvergent paths between $R_d$ and $R_c$,
labeled $p_A, p_B, \ldots, p_K$. The generalized system contains
$r^{d\{i_1 \ldots i_m\}c} = m$ and $r^{d\{j_1 \ldots j_n\}c} = n$ intermediate registers on
two of its reconvergent paths $p_I$ and $p_J$, respectively ($p_I, p_J \in
\{p_A, p_B, \ldots, p_K\}$). Assume that the minimum and maximum
Fig. 9. Timing diagram of generalized edge-triggered reconvergent data-path system with m = 3 and n = 2.
Fig. 10. Timing diagram of generalized level-sensitive reconvergent data-path system with m = 3 and n = 2.
system delays between $R_d$ and $R_c$ are determined by paths
$p^{d\{j_1 \ldots j_n\}c} = p_J$ and $p^{d\{i_1 \ldots i_m\}c} = p_I$, respectively. Note that,
if $m \neq n$, the number of clock cycles for data propagation
along the paths is different. After clock skew scheduling is
applied, the earliest ($a^c_{\mathrm{global}}$) and latest ($A^c_{\mathrm{global}}$) data arrival
times at the convergent node with respect to the global zero
time reference are
$$a^c_{\mathrm{global}} = t_c + n\,T_{\min} + H_c \qquad (22)$$
$$A^c_{\mathrm{global}} = t_c + (m+1)\,T_{\min} - S_c. \qquad (23)$$
Following from (7), the minimum clock period after clock skew
scheduling is bounded by
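$$|m-n+1|\,T_{\min} = A^c_{\mathrm{global}} + S_c - \left( a^c_{\mathrm{global}} - H_c \right) \qquad (24)$$
which leads to
$$T_{\min} = \frac{PD^{d\{i_1 \ldots i_m\}c}_M - PD^{d\{j_1 \ldots j_n\}c}_m + S_c + H_c}{|m-n+1|} = \frac{SD^{dc}_M - SD^{dc}_m + S_c + H_c}{|m-n+1|}. \qquad (25)$$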
The identical lower bounds of the minimum clock period stated
in (25) for both the edge-triggered and level-sensitive circuits
are demonstrated in Figs. 9 and 10, respectively.
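The factor multiplying the clock period in (24) can be checked by substituting (22) and (23) directly. The following is only a worked restatement of those two equations; the paper writes the resulting factor with an absolute value, $|m-n+1|$.

```latex
\begin{align*}
A^{c}_{\mathrm{global}} + S_c &= t_c + (m+1)\,T_{\min}, \\
a^{c}_{\mathrm{global}} - H_c &= t_c + n\,T_{\min}, \\
A^{c}_{\mathrm{global}} + S_c - \left(a^{c}_{\mathrm{global}} - H_c\right)
  &= (m - n + 1)\,T_{\min}.
\end{align*}
```

The clock delay $t_c$ of the convergent register cancels, so the bound depends only on the path delays and on the difference in the number of intermediate registers. For $m = n = 0$ the divisor in (25) is 1 and the expression takes the same form as (20).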
Similar to the simple reconvergence case analyzed in
Section III-A, if the minimum and maximum path delays are
determined by the same reconvergent path, the delay insertion
method is inapplicable. If these delays are determined by differ-
ent reconvergent paths, the delay insertion method is effective,
permitting an improved minimum clock period
$$T^*_{\min} = \max_{\forall p_R, p_S \in \{p_A, \ldots, p_K\}} \left( \frac{PD^{p_R}_M - PD^{p_S}_m + U^{p_R} - U^{p_S}}{|m-n+1|} \right) + \frac{S_c + H_c}{|m-n+1|}. \qquad (26)$$
Although the minimum path delay is not the total of the data-
propagation delays of the local data paths on the reconvergent
path, the path delay can ultimately be modified by inserting de-
lays on the local data paths. The amount of delays is computed
during the application of clock skew scheduling.
E. Formulation and Analysis
In Sections III-B and C, the limitation on the minimum clock
period achievable through clock skew scheduling caused by
a reconvergent system is shown for the sample flip-flop and
latch-based circuits SF and SL, respectively. In Section III-D,
the limitation is calculated for a general representation of a
reconvergent system. Note that the limitation caused by a recon-
vergent system defines the minimum clock period of a circuit
only when this limitation is dominant over other limiting
factors—the delay uncertainties of the data-propagation times
[4] and the total data-propagation times on the data-path cycles
[10]. In this paper, the limiting factors other
than the reconvergent paths are not analyzed in detail, as such
limitations are well known.
A valid approach to computing the limitation caused by
reconvergent paths in a synchronous circuit is to identify the
reconvergent systems on a circuit graph and evaluate (26). As a
more practical approach, two general LP problems are defined
in order to model the delay insertion method for level-sensitive
and edge-triggered synchronous circuits. These LP problems
not only model and solve the clock-period minimization prob-
lems, but also compute the optimal delay values to be inserted
on each local data path in order to achieve the minimum
possible clock period.
The clock skew scheduling algorithms presented in [4] and
[9] (revised in Tables I and II, respectively) for edge-triggered
and level-sensitive circuits, respectively, are modified in order
to model the delay insertion method. As reported in [9], both
clock skew scheduling methods are highly amenable to accom-
modating additional design constraints.
TABLE III
CLOCK-SKEW-SCHEDULING METHOD FOR EDGE-TRIGGERED CIRCUITS MODIFIED WITH DELAY-INSERTION METHOD
TABLE IV
CLOCK-SKEW-SCHEDULING METHOD FOR LEVEL-SENSITIVE CIRCUITS MODIFIED WITH DELAY-INSERTION METHOD
The clock skew scheduling algorithms modified for the de-
lay insertion method, utilizing continuous delay models with
uncertainties, are presented in Tables III and IV. The amount
of delay to be inserted is modeled by the minimum-amount
and maximum-amount variables $I^{if}_m$ and $I^{if}_M$, respectively. The
uncertainty $U^{if}$ of the delay element to be placed on local
data path $R_i \to R_f$ is $U^{if} = I^{if}_M - I^{if}_m$. The delay variables are
modeled on each local data path; however, pruning of the paths
such that only the propagation constraints of the reconvergent
paths are modified is also possible. For the former case, the
clock skew scheduling algorithm simply returns zero for the
delay values on the nonreconvergent paths.
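To make the role of the insertion variables concrete, the following sketch solves a small edge-triggered instance. It is not a transcription of Tables III and IV: the constraints are assumed to be the standard setup and hold inequalities with the padding variables $I^{if}_m$ and $I^{if}_M$ added to the path delays, and the circuit data are hypothetical. Instead of an LP solver, the sketch bisects on the clock period and tests feasibility with Bellman-Ford, which is possible because, under these assumed constraints, every inequality at a fixed period is a difference of two unknowns once $t_i + I^{if}_m$ and $t_i + I^{if}_M$ are treated as unknowns of their own.

```python
def feasible(n_regs, paths, period, setup=0.0, hold=0.0, allow_insertion=True):
    """Return True if clock delays (and padding) exist for this clock period.

    Every constraint has the form x[v] - x[u] <= w, stored as edge (u, v, w).
    The system is solvable iff the graph has no negative cycle (Bellman-Ford).
    """
    edges = []
    n = n_regs
    for i, f, d_min, d_max in paths:
        lo, hi = n, n + 1  # extra unknowns: lo = t_i + I_m, hi = t_i + I_M
        n += 2
        edges.append((f, hi, period - setup - d_max))  # t_i + I_M + d_max + S <= t_f + T
        edges.append((lo, f, d_min - hold))            # t_i + I_m + d_min >= t_f + H
        edges.append((lo, i, 0.0))                     # I_m >= 0
        edges.append((hi, lo, 0.0))                    # I_M >= I_m
        if not allow_insertion:
            edges.append((i, hi, 0.0))                 # I_M <= 0, so no padding at all
    dist = [0.0] * n  # as if a virtual source reached every node at cost 0
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v] - 1e-12:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return True
    return False  # still relaxing after n passes: negative cycle


def min_period(n_regs, paths, allow_insertion=True, setup=0.0, hold=0.0):
    """Bisect on the clock period; feasibility is monotone in the period."""
    low, high = 0.0, max(p[3] for p in paths) + setup + hold
    for _ in range(60):
        mid = (low + high) / 2
        if feasible(n_regs, paths, mid, setup, hold, allow_insertion):
            high = mid
        else:
            low = mid
    return high


# Hypothetical system: two reconvergent paths from register 0 to register 1,
# each given as (initial register, final register, min delay, max delay).
paths = [(0, 1, 1.0, 1.2), (0, 1, 0.6, 0.8)]
t_css = min_period(2, paths, allow_insertion=False)  # clock skew scheduling only
t_di = min_period(2, paths, allow_insertion=True)    # with delay insertion
print(round(t_css, 6), round(t_di, 6))
```

With these hypothetical delays the two results are approximately 0.6 and 0.2. Because the padding uncertainty is left unbounded, the solution is free to choose $I_m = I_M$, that is, a delay element with zero uncertainty.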
The worst case analysis shows that the simplex method and
its variants (LP solution methods) may require an exponential
number of steps to reach an optimal solution [14]. However,
extensive practical experience has confirmed that, in most cases,
the number of iterations to reach an optimal solution is
polynomial [14]. The exact computational complexity cannot be
determined since the internal presolver, matrix-sparsity checker
and large-scale optimizer [14], [15] routines employed within
industrial-strength LP solvers are proprietary and unknown.
The solution times observed for ISCAS’89 benchmark circuits
are short (see Tables V and VI) and are representative of small- and
medium-sized circuits. The numbers of constraints and variables for
both LP formulations grow linearly with the number
of registers and paths in a circuit [16]. The solution times
for large-scale circuits can be estimated very roughly by
matching these problems (in size and complexity, to
the extent possible) with LP problem benchmarks, such as the netlib
[17] benchmarks.
F. Practical Concerns in Modeling and Application of Delay
Insertion Method
In the problem formulation, continuous delay models have
been used. Practically, however, delay elements are available
only in discrete values. There are two possible approaches
to solving the discrete-valued delay insertion problem. The
naive approach is to solve the clock skew scheduling problem
assuming continuous delays and then approximate the optimal
values with the given set of discrete components. Although
likely to produce reasonable results for simple cases, such linear
approximations to integer problems do not always guarantee
optimality [18]. As a more robust and generally valid
approach, the problem can be formulated as an integer linear
programming (ILP) problem. As expected, run times for ILP problems are
typically longer than those for LP problems.
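The difference between the two approaches can be seen on a small example. The sketch below is only illustrative: the three path delay intervals and the two-element delay library are hypothetical, setup and hold times are taken as zero, and a brute-force enumeration stands in for an ILP solver (it grows exponentially with the number of paths and is usable only for toy cases). Integer time units are used so that the comparisons are exact.

```python
from itertools import product


def period(paths, pads):
    """Largest maximum minus smallest minimum path delay after padding (S = H = 0)."""
    lows = [lo + pad for (lo, hi), pad in zip(paths, pads)]
    highs = [hi + pad for (lo, hi), pad in zip(paths, pads)]
    return max(highs) - min(lows)


def continuous_pads(paths):
    """Zero-uncertainty continuous padding: align every minimum delay with the largest one."""
    target = max(lo for lo, hi in paths)
    return [target - lo for lo, hi in paths]


def round_to_library(pads, library):
    """Naive approach: replace each continuous value by the nearest available element."""
    return [min(library, key=lambda d: abs(d - pad)) for pad in pads]


def best_discrete(paths, library):
    """Exhaustive search over all discrete assignments (stand-in for an ILP solver)."""
    return min(product(library, repeat=len(paths)), key=lambda pads: period(paths, pads))


paths = [(100, 120), (75, 95), (85, 105)]  # hypothetical (min, max) delays, integer units
library = [0, 40]                          # hypothetical discrete delay elements

ideal = continuous_pads(paths)              # [0, 25, 15]
rounded = round_to_library(ideal, library)  # [0, 40, 0]
exhaustive = best_discrete(paths, library)

print(period(paths, ideal), period(paths, rounded), period(paths, exhaustive))  # 20 50 45
```

In this instance, rounding the continuous solution yields a period of 50, which is worse than the 45 found by the exhaustive search (and worse than inserting no delay at all), illustrating why rounding a continuous optimum does not guarantee optimality for the discrete problem.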
Modeling and solving the problem with a continuous delay
model serves best to demonstrate the two main purposes of this
paper: identifying the limitations caused by reconvergent paths
and demonstrating how to mitigate these limitations through the
delay insertion method. By adopting continuous delay models,
the theoretical limitations of reconvergent paths and the level
of improvement through mitigation of these limitations are
analyzed independent of any cell library. For practical imple-
mentation, ILP-based solution approaches discussed above, or
similar methods, are more viable.
Another practical concern for the delay insertion method
is the area overhead, which is addressed by the area-aware
delay insertion method proposed in [11].
In order to reduce the total area increase due to inserted
delays, a delay buffer-tree structure is proposed. In the buffer-
tree structure, shared delay elements are used to pad multiple
TABLE V
ISCAS’89 BENCHMARK CIRCUITS RESULTS SHOWING NUMBER OF REGISTERS $r$ AND PATHS $p$, CLOCK PERIOD $T_{FF}$ FOR ZERO-SKEW CIRCUIT WITH FLIP–FLOPS, $T^{CSS}_{FF}$ FOR NONZERO-SKEW CIRCUIT WITH FLIP–FLOPS, AND $T^{DICSS}_{FF}$ FOR NONZERO-SKEW CIRCUIT WITH FLIP–FLOPS USING DELAY INSERTION. ALSO LISTED ARE CALCULATION TIMES $t^{CSS}_{FF}$ AND $t^{DICSS}_{FF}$ OF $T^{CSS}_{FF}$ AND $T^{DICSS}_{FF}$, RESPECTIVELY, AND PERCENTAGE CLOCK-PERIOD IMPROVEMENTS $I^{CSS}_{FF}$, $I^{DICSS}_{FF}$, AND $I^{DI}_{FF}$ FOR IMPROVEMENTS FROM $T_{FF}$ TO $T^{CSS}_{FF}$, $T_{FF}$ TO $T^{DICSS}_{FF}$, AND $T^{CSS}_{FF}$ TO $T^{DICSS}_{FF}$, RESPECTIVELY
fan-outs of a register. Note that the delay buffer-tree construc-
tion is a post-timing analysis process and is not integrated into
the clock skew scheduling algorithms.
As briefly mentioned in Section I, the local data paths
are modeled abstractly at a higher hierarchy level than gate-
level hierarchy. Such simplification is preferred in order to
improve the demonstration of the theoretical limitations of
reconvergent paths and the mitigation of these limitations by the
delay insertion method. In practical implementation, the loca-
tion of the delay elements to be inserted into the logic must be
identified at a lower level of abstraction—most suitably at the
gate level of hierarchy. The modeling of local data paths at a
higher abstraction level as suggested in this paper might lead
to an ambiguous assignment of delays to reconvergent paths. In
an extreme case, it is plausible that three or more reconvergent
paths might share all of the logic paths that make up the re-
convergent system. For the simplest case of three reconvergent
local data paths, any two reconvergent paths might differ by one
logic path only, and all logic paths might be covered by the three
reconvergent paths. For such a reconvergent system, including
delay elements anywhere on a reconvergent path (on any logic
path) would affect the path delay of more than one reconvergent
path. Thus, the optimal delay insertion values computed by
the presented LP problem must be postprocessed for practical
implementation.
The described concerns in the practical implementation of
the delay insertion method are not considered in the experi-
mentation stage of the work presented in this paper. Simplicity
is preserved in the models used in formulation in order to
improve the presentation of the limitation of the reconvergent
paths and the mitigation of this limitation by the delay insertion
method. Designers, however, must be wary of these practical
requirements.
IV. EXPERIMENTAL RESULTS
For experimentation, the clock skew scheduling algorithms
with the delay insertion method proposed for edge-triggered
and level-sensitive circuits (Tables III and IV) are applied
to ISCAS’89 benchmark circuits. Continuous delay models
have been used in the experimentation. The experimental setup
in [9] is replicated for the proposed timing analysis. The
timing information for each circuit component is generated
with a predetermined algorithm, in which the number of fan-outs
from a component, the size, and the type of the component
all affect the computed delay. Min–max
TABLE VI
ISCAS’89 BENCHMARK CIRCUITS RESULTS SHOWING NUMBER OF REGISTERS $r$ AND PATHS $p$, CLOCK PERIOD $T_L$ FOR ZERO-SKEW CIRCUIT WITH LATCHES, $T^{CSS}_L$ FOR NONZERO-SKEW CIRCUIT WITH LATCHES, AND $T^{DICSS}_L$ FOR NONZERO-SKEW CIRCUIT WITH LATCHES USING DELAY INSERTION, RESPECTIVELY. ALSO LISTED ARE CALCULATION TIMES $t^{CSS}_L$ AND $t^{DICSS}_L$ OF $T^{CSS}_L$ AND $T^{DICSS}_L$, RESPECTIVELY, AND PERCENTAGE CLOCK-PERIOD IMPROVEMENTS $I_L$, $I^{CSS}_L$, $I^{DICSS}_L$, AND $I^{DI}_L$ FOR IMPROVEMENTS FROM $T_{FF}$ TO $T_L$, $T_{FF}$ TO $T^{CSS}_L$, $T_{FF}$ TO $T^{DICSS}_L$, AND $T^{CSS}_L$ TO $T^{DICSS}_L$, RESPECTIVELY
timing models are used for each gate. The level-sensitive cir-
cuits are obtained by replacing the edge-triggered flip-flops of
the ISCAS’89 benchmark circuits with positive-level-sensitive
latches. A single-phase clock signal with a 50% duty cycle is
selected for synchronization. The internal register delays are
assumed to be zero (S = H = DCQ = DDQ = 0). The results
computed on a 440-MHz Sun Ultra-10 workstation with the
industrial LP solver CPLEX (version 7.5) [15] are presented
in Tables V and VI.
The clock skew scheduling algorithms used in experimen-
tation are formulated for clock-period minimization. There-
fore, through the application of clock skew scheduling with
delay insertion, improvements in the minimum clock period
are achieved. These improvements, computed with the
formula $I(\%) = 100 \times [(T_{\mathrm{old}} - T_{\mathrm{new}})/T_{\mathrm{old}}]$, are reported in
Tables V and VI. Zero clock skew, edge-triggered circuits are
selected as the basis for comparison due to their simplicity
and popularity in digital circuit design. Both for edge-triggered
and level-sensitive circuits, the improvements achieved through
conventional clock skew scheduling ($I^{CSS}_{FF}$ and $I^{CSS}_L$,
respectively) and through clock skew scheduling with delay insertion
($I^{DICSS}_{FF}$ and $I^{DICSS}_L$, respectively) are computed. Also shown in
Tables V and VI are the comparisons of the results of conventional
clock skew scheduling methods with the results of clock
skew scheduling methods with delay insertion. These
comparisons ($I^{DI}_{FF}$ and $I^{DI}_L$ for edge-triggered and level-sensitive
circuits, respectively) demonstrate the effectiveness of the delay
insertion method in further improving the performance of clock
skew scheduling.
For ISCAS’89 benchmark circuits, delay insertion leads
to 10% and 9% improvements on average over the results
of conventional clock skew scheduling algorithms (Tables I
and II, respectively) for edge-triggered and level-sensitive
circuits, respectively. For better visualization, the perfor-
mance improvement for each benchmark circuit is presented
in Fig. 11.
The delay insertion method is inapplicable (i.e., ineffec-
tive) to some circuits due to the three reasons discussed in
Section III. The first reason is that the minimum clock period
of the circuit can be determined by a limitation other than
reconvergent paths, which cannot be mitigated by the delay
insertion method. The second reason is that possibility P1∗)
might hold for some circuits. The third reason is that, due to
the uncertainty of the delay elements inserted into the logic, the
delay insertion might be ineffective in improving the minimum
clock period. In the LP formulations presented in Tables III
and IV, the uncertainties of the delay elements are modeled
without lower (and upper) bounds (delay elements can have
Fig. 11. Percentage improvements ($I^{DI}_{FF}$ and $I^{DI}_L$ in Tables V and VI,
respectively) in minimum clock period via delay insertion for edge-triggered and
level-sensitive ISCAS’89 benchmark circuits with clock skew scheduling. Two
data points shown per benchmark circuit from left to right are improvements
observed for edge-triggered and level-sensitive circuits, respectively.
zero uncertainty with $I_m = I_M$). Thus, the third reason for
inapplicability is not observed in the experimentation. Among
the selected ISCAS’89 circuits, the delay insertion method for
edge-triggered circuits is applicable to 41% (12 circuits) of the
total 29 circuits. By excluding the circuits for which no
improvements are observed (for which the method is inapplicable
due to the first and second reasons), the average improvement of the
delay insertion method for edge-triggered circuits is 25.8% over
the conventional clock skew scheduling algorithm of [4]. The
delay insertion method on level-sensitive circuits is applicable
to 34% (10 circuits) of the total 29 circuits. By excluding the
circuits for which no improvements are observed, the average
improvement of the delay insertion method for level-sensitive
circuits is 27.4% over the clock skew scheduling
algorithm of [9].
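The applicability percentages and the two kinds of averages quoted in this section can be related with a few lines of arithmetic. The sketch below uses only the figures stated in the text; it assumes that circuits without improvement contribute 0% to the average over all 29 circuits, and the variable names are illustrative.

```python
TOTAL_CIRCUITS = 29
applicable = {"edge-triggered": 12, "level-sensitive": 10}
average_when_applicable = {"edge-triggered": 25.8, "level-sensitive": 27.4}

share = {}
overall = {}
for kind, count in applicable.items():
    # Fraction of benchmark circuits on which delay insertion helps.
    share[kind] = 100 * count / TOTAL_CIRCUITS
    # Average over all circuits, counting the non-improved circuits as 0%.
    overall[kind] = average_when_applicable[kind] * count / TOTAL_CIRCUITS
    print(f"{kind}: applicable to {share[kind]:.1f}% of circuits, "
          f"average over all circuits about {overall[kind]:.1f}%")
```

The shares round to the reported 41% and 34%, and the averages over all circuits come out close to the reported 10% and 9%. Any small gap would be attributable to rounding in the published figures, which is an assumption here since the per-circuit values are not reproduced in the text.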
The experimental results in Fig. 11 show that reconvergent
paths are the dominant limiting factor on the minimum clock
period after clock skew scheduling in a significant fraction of
the circuits (41% and 34% as observed on the ISCAS’89 circuits).
The delay insertion method is effective in mitigating these
limitations, as shown by the average improvements of 10% and 9% in the
minimum clock period. The proposed clock skew scheduling
method with delay insertion takes approximately twice as much
time as the conventional application of clock skew scheduling;
however, the method is highly practical for ISCAS’89 bench-
mark circuits with total run times below a few minutes on
common computing resources.
The improvements in minimum clock period achieved
through conventional clock skew scheduling ($I^{CSS}_{FF}$ and $I^{CSS}_L$),
and through clock skew scheduling with delay insertion
($I^{DICSS}_{FF}$ and $I^{DICSS}_L$) for edge-triggered and level-sensitive
circuits are visualized for each benchmark circuit in Figs. 12
and 13, respectively.
The average total improvement of nonzero clock skew,
edge-triggered circuits with delay insertion with respect to the
zero clock skew, edge-triggered circuits is 34%. The average
total improvement of nonzero clock skew, level-sensitive circuits
with delay insertion with respect to the zero clock skew, edge-
triggered circuits is also 34%. Note that the total improvements
Fig. 12. Percentage improvements ($I^{CSS}_{FF}$ and $I^{DICSS}_{FF}$ in Table V,
respectively) in minimum clock period via clock skew scheduling and delay insertion
for edge-triggered ISCAS’89 benchmark circuits. Two data points shown per
benchmark circuit from left to right are improvements observed for conven-
tional clock skew scheduling and clock skew scheduling with delay insertion,
respectively.
Fig. 13. Percentage improvements ($I^{DI}_L$ and $I^{DICSS}_L$ in Table VI,
respectively) in minimum clock period via clock skew scheduling and delay insertion
for level-sensitive ISCAS’89 benchmark circuits. Two data points shown per
benchmark circuit from left to right are improvements observed for conven-
tional clock skew scheduling and clock skew scheduling with delay insertion,
respectively.
are due to the simultaneous effects of the applications of delay
insertion, clock skew scheduling, and consideration of time
borrowing (for level-sensitive circuits) in the timing analysis.
The improvement with delay insertion is equal to or better
than the improvement with clock skew scheduling alone, as
delay insertion is applicable when it can be used to mitigate
the limitation of the reconvergent paths.
V. CONCLUSION
In this paper, the limitations of reconvergent paths on the
improvements achievable through clock skew scheduling are
shown for the first time. These limitations are mitigated with
a novel delay insertion method. The delay insertion method is
formulated as an LP problem, proposing a highly automated,
versatile, and efficient implementation.
In experimentation, clock skew scheduling with the delay
insertion method is demonstrated to improve the minimum
clock period by up to 90% over traditional zero clock skew,
edge-triggered circuits. On average, the improvements are at
34% both for edge-triggered and level-sensitive circuits. The
improvements over conventional clock skew scheduling meth-
ods for edge-triggered [4] and level-sensitive [9] circuits are at
10% and 9%, respectively. In experimentation on the ISCAS’89
suite of benchmark circuits, reasonable run times are reported
for the proposed delay insertion method. The practicality of
delay insertion is to be concluded by the designers considering
the projected increases in the total power consumption and
circuit area, and the transformation in circuit placement due to
delay insertion.
ACKNOWLEDGMENT
The authors would like to thank the anonymous reviewers,
Associate Editor L. Scheffer, and International Symposium on
Physical Design (ISPD)’05 participants for their constructive
comments.
REFERENCES
[1] J. Wood, “Electronic circuitry,” U.S. Patent 6 556 089, Apr. 29, 2003.
[2] ——, “Electronic circuitry,” U.S. Patent 6 816 020, Nov. 9, 2004.
[3] J. Wood, T. Edwards, and S. Lipa, “Rotary traveling-wave oscillator
arrays: A new clock technology,” IEEE J. Solid-State Circuits, vol. 36,
no. 11, pp. 1654–1665, Nov. 2001.
[4] J. P. Fishburn, “Clock skew optimization,” IEEE Trans. Comput., vol. 39,
no. 7, pp. 945–951, Jul. 1990.
[5] S. Held, B. Korte, J. Massberg, M. Ringe, and J. Vygen, “Clock schedul-
ing and clocktree construction for high performance ASICs,” in Proc.
IEEE/ACM Int. Conf. Computer-Aided Design, San Jose, CA, Nov. 2003,
pp. 232–239.
[6] I. S. Kourtev and E. G. Friedman, Timing Optimization Through Clock
Skew Scheduling. Norwell, MA: Kluwer, 2000.
[7] K. Ravindran, A. Kuehlmann, and E. Sentovich, “Multi-domain clock
skew scheduling,” in Proc. Int. Conf. Computer-Aided Design, San Jose,
CA, Nov. 2003, pp. 801–808.
[8] B. Taskin and I. S. Kourtev, “Linear timing analysis of SOC synchronous
circuits with level-sensitive latches,” in Proc. IEEE ASIC/SOC Conf.,
Rochester, NY, Sep. 2002, pp. 358–362.
[9] ——, “Performance optimization of single-phase level-sensitive circuits
using time borrowing and clock skew scheduling,” in Proc. ACM/IEEE
Int. Workshop Timing Specification and Synthesis Digital Systems,
Monterey, CA, Dec. 2002, pp. 111–118.
[10] M. C. Papaefthymiou and K. H. Randall, “Edge-triggering vs. two-phase
level-clocking,” in Proc. Symp. Research Integrated Systems, Seattle, WA,
1993, pp. 201–218.
[11] N. Shenoy, R. K. Brayton, and A. L. Sangiovanni-Vincentelli, “Minimum
padding to satisfy short path constraints,” in Proc. IEEE/ACM Int. Conf.
Computer-Aided Design, Santa Clara, CA, Nov. 1993, pp. 156–161.
[12] S.-H. Huang and Y.-T. Nieh, “Clock period minimization of non-
zero clock skew circuits,” in Proc. Int. Conf. Computer-Aided Design,
San Jose, CA, Nov. 2003, pp. 809–812.
[13] M. R. Dagenais and N. C. Rumin, “On the calculation of optimal
clocking parameters in synchronous circuits with level-sensitive latches,”
IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 8, no. 3,
pp. 268–278, Mar. 1989.
[14] S.-C. Fang and S. Puthenpura, Linear Optimization and Extensions:
Theory and Algorithms. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[15] ILOG CPLEX 7.1 User’s Manual, ILOG, Gentilly, France, 2001.
[16] B. Taskin and I. S. Kourtev, “Linearization of the timing analysis and
optimization of level-sensitive digital synchronous circuits,” IEEE Trans.
Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 1, pp. 12–27, Jan. 2004.
[17] Netlib, a Collection of Mathematical Software, Papers, Databases: Linear
Programming Library. [Online]. Available: http://www.netlib.org/lp
[18] W. L. Winston, Operations Research Application and Algorithms, 2nd ed.
Boston, MA: PWS-Kent, 1991.
Baris Taskin (S’01–M’05) received the B.S. degree
in electrical and electronics engineering from Middle
East Technical University, Ankara, Turkey, in 2000,
and the M.S. and Ph.D. degrees in electrical engi-
neering from the University of Pittsburgh, Pittsburgh,
PA, in 2003 and 2005, respectively. He also received
the minor program diploma in operations research
from Middle East Technical University, in 2000, and
the certificate in system-on-chip (SOC) design from
Pittsburgh Digital Greenhouse, Pittsburgh, PA (in
cooperation with the University of Pittsburgh, the
Pennsylvania State University, and Carnegie Mellon University), in 2003.
He was with Multigig Inc., Scotts Valley, CA, from 2003 to 2004. He
has been an Assistant Professor in the Electrical and Computer Engineering
Department, Drexel University, Philadelphia, PA, since September 2005. His
research interests include very large scale integration (VLSI) computer-aided
design (CAD), circuit timing, and design, analysis, and optimization of high-
performance integrated circuits.
Ivan S. Kourtev (S’98–M’99) was born in Sofia,
Bulgaria, in 1968. He received the B.S. degree in
computer systems engineering from the Technical
University in Sofia, Bulgaria, in 1994, and the M.S.
and Ph.D. degrees in electrical engineering from the
University of Rochester, Rochester, NY, in 1995 and
1999, respectively.
He started his appointment as an Assistant Profes-
sor with the Department of Electrical Engineering
at the University of Pittsburgh, Pittsburgh, PA, in
August 1999. He has authored multiple publications
in the area of timing optimization and circuit design, as well as a book
on clock skew scheduling algorithms. His research interests include method-
ologies and computer-aided design (CAD) tools for digital very large scale
integration (VLSI) design, computer architecture, and software technology.
His professional experiences include Xerox Corporation in Webster, NY; IBM
Microelectronics in East Fishkill, NY; and Ultima Interconnect Technology in
Sunnyvale, CA.
Dr. Kourtev has served as an Associate Editor for the IEEE TRANSACTIONS
ON CIRCUITS AND SYSTEMS II: ANALOG AND DIGITAL SIGNAL
PROCESSING and for the IEEE TRANSACTIONS ON VERY LARGE SCALE
INTEGRATION (VLSI) SYSTEMS. He is currently serving as a member of the
editorial board of the Journal of Circuits, Systems, and Computers.
