Network flow-based simultaneous retiming and slack budgeting for low
  power design by Yu, Bei et al.
Network Flow-based Simultaneous Retiming and
Slack Budgeting for Low Power Design
Bei Yu∗, Sheqin Dong∗, Yuchun Ma∗, Tao Lin∗, Song Chen† and Satoshi Goto†
∗Department of Computer Science & Technology, Tsinghua University, Beijing, China
†Graduate School of IPS, Waseda University, Kitakyushu, Japan
Email: {b-yu07@mails, dongsq@mail}.tsinghua.edu.cn
arXiv:1402.2460v1 [cs.AR] 11 Feb 2014
Abstract—Low power design has become one of the most
significant requirements as CMOS technology has entered the
nanometer era. Timing budgeting is therefore often performed
to slow down as many components as possible, so that timing
slack can be traded for reduced power consumption while the
performance of the whole design is maintained. Retiming is a
procedure that relocates flip-flops (FFs) across logic gates to
achieve a faster clock. In this paper we show that the
simultaneous retiming and slack budgeting problem can be
formulated as a convex cost dual network flow problem. Both
theoretical analysis and experimental results show the efficiency
of our approach, which not only reduces power consumption
but also runs faster than previous work.
1. INTRODUCTION
Timing-constrained design and low power design have become
significant requirements as CMOS technology has entered the
nanometer era. On the one hand, more and more devices tend to
be packed into a small silicon area while the clock frequency is
pushed ever higher. As an effective timing optimization scheme,
retiming relocates flip-flops (FFs) across logic gates to achieve
a shorter clock period. On the other hand, to cope with the
tremendous growth in design complexity, timing budgeting is
performed to relax the timing constraints of as many components
as possible without violating the system's timing constraint.
Both retiming and timing budgeting can therefore greatly
influence the timing distribution of a design.
Since Leiserson and Saxe proposed the idea of retiming in
1983 [1], it has become one of the most powerful sequential
optimization techniques. In [2], the min-area retiming problem
was solved by a min-cost network flow algorithm. Recent
publications [3] and [4] proposed a very efficient min-period
retiming algorithm obtained by algorithm derivation. [5] and
[6] presented efficient incremental algorithms for min-period
retiming under setup and hold constraints and for min-area
retiming under a given clock period, respectively.
For timing-constrained gate-level synthesis, timing slack
is an effective measure of a circuit's potential for performance
improvement. Components with relaxed timing constraints
can be further optimized to improve the system's area, power
dissipation, or other design quality metrics. The slack budgeting
problem has been well studied. Some previous slack budgeting
approaches are suboptimal heuristics, such as the Zero-Slack
Algorithm (ZSA) [7]. [8][9] formulated the slack budgeting
problem as a Maximum-Independent-Set (MIS) problem on a
sensitive transitive closure graph. In [10] and [11], the authors
proposed combinatorial methods based on a network flow
approach to handle the slack budgeting problem.
[Figure 1: two copies, (a) and (b), of a circuit with gates a, b, c, d, e, each gate labeled with "delay, slack"; T = 9.]
Fig. 1: Relocate FFs to increase potential slack without violating
timing constraint. (a) No potential slack in this circuit.
(b) Moving the FF from edge de to edge cd, the potential slack
can be increased from 0 to 3.
The budgeting problem can be extended to describe real-world
applications exactly, such as gate resizing and multiple Vdd and
multiple Vth assignment [12][13][14]. Since the number of
logically equivalent cells in a library is limited, it is reasonable
to restrict slack to a limited set of values in real designs. In [15],
Qiu et al. showed that power reduction is not proportional
to the slack amount and proposed a piecewise linear model
to approximate the relationship between slack and power
reduction. In this paper, we adopt the same model and consider
the discrete slack budgeting problem. Note that our method can
easily be adapted to the continuous slack budgeting problem.
Nearly all existing slack budgeting algorithms are either
designed for combinational circuits or limited to fixed FF
locations. At early design stages, however, there is flexibility to
schedule the pipeline or timing distribution so as to obtain more
timing slack. Fig. 1(a) shows a circuit whose period is minimized,
with the delay and slack labeled beside each gate. There is no
potential slack in this circuit. However, if retiming is performed
together with slack budgeting, i.e., the FF is moved from edge
de to edge cd, the potential slack can be increased from 0 to
3 while the period remains minimized.
A simultaneous retiming and slack budgeting algorithm for
dual-Vdd programmable FPGA power reduction was proposed
in [16]. In [17], Lin et al. proved that the slack budgeting problem
can be viewed as a convex retiming problem. However, they
failed to formulate retiming and slack budgeting simultaneously.
In [18], the authors proposed a simultaneous slack budgeting
and incremental retiming algorithm to maximize the potential
slack by retiming for synchronous sequential circuits. They
proposed a reasonable algorithm flow; however, their solution
quality suffers in two respects. First, there is no guarantee that
the algorithm reaches an optimal solution, because the iterative
strategy is easily trapped in a local optimum. Second, the slack
budgeting problem was translated into a Maximal Independent
Set (MIS) problem, which is NP-hard.
[19] showed that for an Integer Linear Programming (ILP)
with separable convex objective functions and special form of
constraints, it can be viewed as convex cost dual network flow
problem and solved in polynomial time. This model has been
adopted in various works, such as buffer insertion [20], multi-
voltage supply [21][22], clock skew scheduling [23] and slack
budgeting [17].
In this paper we first formulate the retiming and slack budgeting
problem as an Integer Linear Programming (ILP) problem.
Since ILP is NP-hard, we then show how to transform this
problem into a convex cost dual network flow problem with only
a small loss of optimality. Experimental results show that,
compared with previous work, our algorithm not only reduces
power consumption and increases the total slack budget, but
also runs much faster.
The remainder of this paper is organized as follows. Section
2 defines the simultaneous slack budget and retiming problem.
Section 3 presents our algorithm flow. Section 4 reports our
experimental results. Finally, Section 5 concludes this paper.
2. PROBLEM FORMULATION
As in [1], we model a synchronous sequential circuit
as a directed graph G(V, E, d, w), where each vertex i ∈ V
represents a combinational gate and each edge (i, j) ∈ E
represents a signal passing from gate i to gate j. Non-negative
gate delays are given as vertex weights d : V → R. The
non-negative integer edge weight w : E → Z represents the
number of FFs on the signal path. The maximum clock period
is given as T.
For each vertex i, three non-negative labels $a_i$, $\gamma_i$ and $s_i$
represent the latest arrival time, the required time, and the slack
of vertex i. $a_i$ and $\gamma_i$ are calculated as follows:
\[
a_i = \begin{cases}
d_i & \text{if } w(k,i) > 0 \text{ or } i \in PI \\
\max_{j \in FI(i)} (a_j + d_j) & \text{otherwise}
\end{cases} \tag{1}
\]
\[
\gamma_i = \begin{cases}
T & \text{if } w(k,i) > 0 \text{ or } i \in PO \\
\min_{j \in FO(i)} (\gamma_j - d_j) & \text{otherwise}
\end{cases} \tag{2}
\]
where PI is the set of all primary inputs and PO is the set of all
primary outputs. FI(i) and FO(i) denote the gates driving gate i
and the gates driven by gate i, respectively. The slack $s_i$ is then
calculated by
\[
s_i = \gamma_i - a_i \tag{3}
\]
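As an illustration of how arrival time, required time and slack interact, the following minimal Python sketch propagates timing over the edges that carry no FF. It is not the authors' implementation. The function name `timing`, the three-gate example, and the conventions used (an edge with w > 0 cuts the combinational path, the arrival time includes the gate's own delay, and a gate with no FF-free fanout gets required time T) are assumptions made for illustration; they follow common static timing practice rather than the exact recurrences printed in (1) and (2). `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

def timing(delays, edges, T):
    """Return (arrival, required, slack) dictionaries.

    delays: {gate: delay}; edges: {(i, j): number of FFs on edge (i, j)}.
    Only edges without FFs propagate timing (assumed convention).
    """
    fanin = {v: [] for v in delays}
    fanout = {v: [] for v in delays}
    for (i, j), w in edges.items():
        if w == 0:
            fanin[j].append(i)
            fanout[i].append(j)
    # Topological order of the FF-free subgraph: drivers come first.
    order = list(TopologicalSorter(fanin).static_order())
    arrival, required = {}, {}
    for v in order:
        arrival[v] = delays[v] + max((arrival[u] for u in fanin[v]), default=0)
    for v in reversed(order):
        required[v] = min((required[u] - delays[u] for u in fanout[v]), default=T)
    slack = {v: required[v] - arrival[v] for v in delays}
    return arrival, required, slack

# Hypothetical chain a -> b -> c with one FF on edge (b, c) and T = 9.
arrival, required, slack = timing(
    {"a": 3, "b": 3, "c": 3},
    {("a", "b"): 0, ("b", "c"): 1},
    T=9,
)
```

In this toy chain the FF on edge (b, c) starts a new combinational path at c, which is why c ends up with more slack than a and b.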
A retiming of a circuit G is an integer-valued vertex labeling
r, where $r_i$ represents how many FFs are moved from the
outgoing edges of vertex i to its incoming edges. Thus the number
of FFs on edge (i, j) after retiming with label r is
\[
w_{i,j} + r_j - r_i
\]
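The retimed edge weight can be checked with a few lines of Python. This is a minimal sketch: the function name `retime` and the two-edge fragment (only the edges cd and de named in the caption of Fig. 1, with the rest of the circuit omitted) are assumptions for illustration.

```python
def retime(weights, r):
    """Return the FF count on every edge after applying retiming labels r.

    weights: {(i, j): FFs on edge (i, j)}; r: {vertex: label}, default 0.
    """
    retimed = {
        (i, j): w + r.get(j, 0) - r.get(i, 0)
        for (i, j), w in weights.items()
    }
    if any(w < 0 for w in retimed.values()):
        raise ValueError("illegal retiming: an edge would carry a negative number of FFs")
    return retimed

# Moving one FF from the outgoing edge (d, e) to the incoming edge (c, d)
# of gate d corresponds to r_d = 1.
weights = {("c", "d"): 0, ("d", "e"): 1}
moved = retime(weights, {"d": 1})
```

The legality check mirrors the constraint that every retimed edge weight must stay non-negative.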
Definition 1: Power Slack Curve - Each gate i is given
k discrete slack levels, and the power-slack tradeoff is
represented by $\{(s_i^1, P(s_i^1)), \cdots, (s_i^k, P(s_i^k))\}$. In the Power
Slack Curve, each point is connected to its neighboring
point(s) by a linear segment.
Based on the relationship between power reduction and
slack provided by [15], we assume that the Power Slack Curve
is a convex decreasing function.
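A Power Slack Curve of this kind can be represented as a piecewise linear interpolant over the discrete levels. The sketch below is illustrative only: `make_curve` is a hypothetical helper, the slack levels {0, 10, 20, 33} are the ones used in Section 4, and the power values are invented for the example (they are not taken from [15]).

```python
from bisect import bisect_right

def make_curve(points):
    """Build P(s) from [(slack, power), ...]; needs at least two points."""
    points = sorted(points)
    slacks = [s for s, _ in points]
    powers = [p for _, p in points]
    slopes = [
        (powers[q + 1] - powers[q]) / (slacks[q + 1] - slacks[q])
        for q in range(len(points) - 1)
    ]
    # Convex decreasing: slopes are non-positive and non-decreasing.
    if any(m > 0 for m in slopes) or any(a > b for a, b in zip(slopes, slopes[1:])):
        raise ValueError("curve must be convex and decreasing")

    def power(s):
        if not slacks[0] <= s <= slacks[-1]:
            raise ValueError("slack outside the defined levels")
        q = min(bisect_right(slacks, s) - 1, len(slopes) - 1)
        return powers[q] + slopes[q] * (s - slacks[q])

    return power

# Hypothetical power values for the four slack levels.
power = make_curve([(0, 100), (10, 70), (20, 50), (33, 40)])
```

Because the slopes flatten as slack grows, each additional unit of slack saves less power, which is the non-proportionality noted in [15].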
Definition 2: Simultaneous Slack Budget and Retiming
Problem - Given a directed graph G = (V, E, d, w) representing
a synchronous sequential circuit and a period constraint T,
find an FF relocation, represented by r, such that the
power consumption obtained by slack budgeting is minimized
under the period constraint.
According to the above definitions and notation, the
simultaneous slack budget and retiming problem can be
formulated as the following mathematical program.
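\[
\begin{aligned}
\min \quad & \sum_{i \in V} P(s_i) && && \text{(I)} \\
\text{s.t.} \quad & (1)-(3) \\
& r_j - r_i \ge -w_{i,j} && \forall (i,j) \in E \\
& s_i \in \{s_i^1, \cdots, s_i^k\} && \forall i \in V \\
& a_i \le T && \forall i \in V
\end{aligned}
\]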
3. METHODOLOGY
3.1. MILP Formulation
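The MILP formulation for retiming synchronous circuits was
originally presented in [1] to minimize the clock period. The clock
period satisfies Φ(G) ≤ T if and only if there exists an assignment
of a real value $a_i$ and an integer value $r_i$ to each vertex i ∈ V
such that the following conditions are satisfied:
\[
\begin{aligned}
a_i &\ge d_i + s_i && \forall i \in V && (4) \\
a_i &\le T && \forall i \in V && (5) \\
r_i - r_j &\le w_{ij} && \forall (i,j) \in E && (6) \\
a_j &\ge a_i + d_j + s_j && \text{if } r_i - r_j = w_{ij} && (7)
\end{aligned}
\]
Let $R_i = r_i + a_i/T$, so that $a_i = T \cdot R_i - T \cdot r_i$. The
problem can then be formulated as (II).
\[
\begin{aligned}
\min \quad & \sum_{i \in V} P(\bar{s}_i) && && \text{(II)} \\
\text{s.t.} \quad & \bar{R}_i - \bar{r}_i \ge \bar{s}_i && \forall i \in V && \text{(IIa)} \\
& \bar{R}_i - \bar{r}_i \le T && \forall i \in V && \text{(IIb)} \\
& \bar{r}_j - \bar{r}_i \ge -T \cdot w_{ij} && \forall (i,j) \in E && \text{(IIc)} \\
& 0 \le \bar{R}_i, \bar{r}_i \le \bar{N}_{ff} && \forall i \in V && \text{(IId)} \\
& \bar{s}_i \in \{\bar{s}_i^1, \cdots, \bar{s}_i^k\} && \forall i \in V && \text{(IIe)} \\
& 0 \le \bar{s}_i \le T && \forall i \in V && \text{(IIf)} \\
& \bar{R}_j - \bar{R}_i \ge t_{ij} && \forall (i,j) \in E && \text{(IIg)} \\
& t_{ij} \ge \bar{s}_j - T \cdot w_{ij} && \forall (i,j) \in E && \text{(IIh)}
\end{aligned}
\]
where $\bar{N}_{ff} = N_{ff} \cdot T$, $\bar{s}_i = d_i + s_i$, $\bar{r}_i = r_i \cdot T$ and $\bar{R}_i = R_i \cdot T$.
For each gate i, $\bar{s}_i^j = s_i^j + d_i$ $(j = 1, \cdots, k)$.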
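Conditions (4)-(7) can be checked mechanically for a candidate assignment. The following Python sketch is a hedged illustration, not part of the original formulation: the function name `satisfies_period` and the example values are assumptions, and condition (7) is read here as charging the delay and slack of the sink gate j, which is the reading consistent with constraints (IIg) and (IIh).

```python
def satisfies_period(T, d, w, a, r, s):
    """Return True if (a, r, s) meets conditions (4)-(7) for period T.

    d, a, r, s: {gate: value}; w: {(i, j): FFs on edge (i, j)}.
    """
    for i in d:
        if a[i] < d[i] + s[i] or a[i] > T:          # conditions (4) and (5)
            return False
    for (i, j), w_ij in w.items():
        if r[i] - r[j] > w_ij:                       # condition (6)
            return False
        # Condition (7) applies only when the retimed edge carries no FF.
        if r[i] - r[j] == w_ij and a[j] < a[i] + d[j] + s[j]:
            return False
    return True

# Hypothetical chain a -> b -> c with one FF on edge (b, c) and T = 9.
d = {"a": 3, "b": 3, "c": 3}
w = {("a", "b"): 0, ("b", "c"): 1}
r = {"a": 0, "b": 0, "c": 0}
s = {"a": 0, "b": 3, "c": 6}
a = {"a": 3, "b": 9, "c": 9}
feasible = satisfies_period(9, d, w, a, r, s)
```

Raising the slack of gate b by one unit in this example would push its arrival time past T, so the same assignment would be rejected.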
This problem can be solved by a general-purpose ILP solver.
However, ILP is computationally one of the most difficult
combinatorial optimization problems, and the runtime can be
unacceptable even for small problem sizes. In the following
subsections, we explain how to transform this problem into a
convex cost dual network flow problem.
3.2. Formulation Simplification
Constraint (IIh) makes problem (II) too complex to solve
with a network flow-based algorithm. We therefore first consider
a simpler formulation (III), which removes constraint (IIh).
To compensate for the loss of accuracy, we add a penalty function
$P(t_{ij})$ to the objective function.
\[
\begin{aligned}
\min \quad & \sum_{i \in V} P(\bar{s}_i) + \sum_{(i,j) \in E} P(t_{ij}) && && \text{(III)} \\
\text{s.t.} \quad & \text{(IIa)}-\text{(IIg)} \\
& t_{ij} \ge -c \cdot w_{ij} && \forall (i,j) \in E
\end{aligned}
\]
where $P(t_{ij}) = P(\bar{s}_j)/k$ and k is a coefficient. Here we set
$k = \sum_i (1 - w_{ij})$.¹
Given a solution $\bar{s}_i$ $(i = 1, \ldots, m)$ and $t_{ij}$ $(\forall (i,j) \in E)$ of
problem (III), we propose a heuristic method to generate a
solution of problem (II).
\[
t_{ij} \ge \bar{s}_j - c \cdot w_{ij} \;\Rightarrow\; \bar{s}_j = \min_{i \in FI(j)} (t_{ij} + c \cdot w_{ij}) \tag{8}
\]
We denote the $\bar{s}_j$ obtained from (8) as $\bar{s}_j(\Omega)$ and the $\bar{s}_j$
obtained from problem (III) as $\bar{s}_j(\Theta)$. Then $\bar{s}_j$ in problem (II)
is obtained as follows:
\[
\bar{s}_j = \min[\bar{s}_j(\Omega), \bar{s}_j(\Theta)]
= \min\Big[\min_{i \in FI(j)} (t_{ij} + c \cdot w_{ij}),\; \bar{s}_j(\Theta)\Big] \tag{9}
\]
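The recovery step in (8) and (9) amounts to taking, for each gate j, the minimum of the value returned by problem (III) and the tightest bound implied by its incoming edges. A minimal Python sketch follows; the function name `recover_budget`, the data layout and the numbers are assumptions for illustration.

```python
def recover_budget(s_theta, t, w, c):
    """Apply (9): combine the (III) solution with the bounds from (8).

    s_theta: {gate j: s_bar_j from (III)}
    t, w:    {(i, j): value} for every edge (i, j)
    c:       the constant multiplying w_ij in (8)
    """
    s_bar = {}
    for j, s_j in s_theta.items():
        # s_bar_j(Omega): one candidate per incoming edge (i, j).
        omega = [t[(i, k)] + c * w[(i, k)] for (i, k) in t if k == j]
        s_bar[j] = min(omega + [s_j])
    return s_bar

# Hypothetical two-gate example with a single edge (a, b).
s_theta = {"a": 5, "b": 8}
t = {("a", "b"): 6}
unregistered = recover_budget(s_theta, t, {("a", "b"): 0}, c=9)
registered = recover_budget(s_theta, t, {("a", "b"): 1}, c=9)
```

With an FF on the edge, the bound from (8) is relaxed by c and the value from (III) is kept; without an FF, the edge bound is the binding one.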
We have now established the connection between the solutions
of problem (II) and problem (III): once the solution of (III) has
been calculated, the solution of (II) can be derived from it. In
the following subsections, we show that problem (III) can be
transformed into a convex cost dual network flow problem.
3.3. Remove Redundant Constraint
In this subsection we prove that constraint $\bar{R}_i - \bar{r}_i \le T$ can
be removed from problem (III) without loss of optimality.
Let $s_i^*$ denote the value of $\bar{s}_i$ for which $P(\bar{s}_i)$ is minimum.
If there are multiple values for which $P(\bar{s}_i)$ is minimum,
the smallest one is chosen. We define the function
$Q(\bar{s}_i)$ in the following manner:
\[
Q(\bar{s}_i) = \begin{cases}
P(s_i^*) & \text{if } \bar{s}_i \le s_i^* \\
P(\bar{s}_i) & \text{if } \bar{s}_i > s_i^*
\end{cases} \tag{10}
\]
¹ We suppose that for each (i, j) ∈ E, wij is a 0-1 variable.
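Now consider the following problem (III′), which replaces
(IIa) and (IIb) by $\bar{R}_i - \bar{r}_i = \bar{s}_i$:
\[
\begin{aligned}
\min \quad & \sum_{i \in V} Q(\bar{s}_i) + \sum_{(i,j) \in E} P(t_{ij}) && && \text{(III′)} \\
\text{s.t.} \quad & \text{(IIc)}-\text{(IIg)} \\
& \bar{R}_i - \bar{r}_i = \bar{s}_i && \forall i \in V \\
& t_{ij} \ge -T \cdot w_{ij} && \forall (i,j) \in E
\end{aligned}
\]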
Theorem 1: For every optimal solution $(\bar{R}, \bar{r}, \bar{s})$ of problem
(III), there is an optimal solution $(\bar{R}, \bar{r}, \hat{s})$ of problem (III′),
and the converse also holds.
Proof: Consider an optimal solution $(\bar{R}, \bar{r}, \bar{s})$ of (III). We
show how to construct an optimal solution $(\bar{R}, \bar{r}, \hat{s})$ of (III′)
with the same cost. There are two cases to consider:
Case 1: $\bar{R}_i - \bar{r}_i \ge s_i^*$. It follows from (IIa) and the
convexity of $P(\bar{s}_i)$ that $\bar{s}_i = s_i^*$. In this case, we set
$\hat{s}_i = \bar{R}_i - \bar{r}_i$. It follows from (10) that $P(\bar{s}_i) = Q(\hat{s}_i)$.
Case 2: $\bar{R}_i - \bar{r}_i < s_i^*$. Similarly to Case 1, we get
$\bar{s}_i = \bar{R}_i - \bar{r}_i$. In this case, we set $\hat{s}_i = \bar{R}_i - \bar{r}_i$. It follows
from (10) that $P(\bar{s}_i) = Q(\hat{s}_i)$.
Similarly, it can be shown that if $(\hat{R}, \hat{r}, \hat{s})$ is an optimal
solution of (III′), then the solution $(\hat{R}, \hat{r}, \bar{s})$ constructed
in the following manner is an optimal solution of (III):
$\bar{s}_i = \max\{s_i^*, \hat{s}_i\}$.
Theorem 2: The constraint $\bar{R}_i - \bar{r}_i \le T$ in problem (III)
can be removed.
Proof: By Theorem 1, each constraint in (IIa) can be
transformed into an equality constraint, i.e., $\bar{R}_i - \bar{r}_i = \bar{s}_i$.
Constraint (IIf) gives $0 \le \bar{s}_i \le T$, hence $\bar{R}_i - \bar{r}_i \le T$ holds
automatically. So we can remove constraint $\bar{R}_i - \bar{r}_i \le T$.
3.4. Transformation to Primal Network Flow Problem
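To further simplify problem (III), we transform G(V, E)
into $\bar{G}(\bar{V}, \bar{E})$ in such a way that each vertex i ∈ V is split
into two vertices $\bar{r}_i$ and $\bar{R}_i$, so that constraints (IIa), (IIg) and
(IIc) become edges of $\bar{E}$. Here
$\bar{V} = \{\bar{r}_1, \bar{R}_1, \ldots, \bar{r}_m, \bar{R}_m\}$ and $\bar{E} = \bar{E}_1 \cup \bar{E}_2 \cup \bar{E}_3$, where
$\bar{E}_1$ contains the edges $(\bar{r}_i, \bar{R}_i)$, $\bar{E}_2$ contains the edges $(\bar{R}_i, \bar{R}_j)$,
and $\bar{E}_3$ contains the edges $(\bar{r}_i, \bar{r}_j)$. Fig. 2(a) illustrates a simple
DAG G representing a synchronous sequential circuit, and the
transformed DAG $\bar{G}$ of G is illustrated in Fig. 2(b).
The problem formulation can now be simplified as follows:
\[
\begin{aligned}
\min \quad & \sum_{(i,j) \in \bar{E}} P(s_{ij}) && && \text{(IV)} \\
\text{s.t.} \quad & \mu_j - \mu_i \ge s_{ij} && \forall (i,j) \in \bar{E} && \text{(IVa)} \\
& 0 \le \mu_i \le \bar{N}_{ff} && \forall i \in \bar{V} && \text{(IVb)} \\
& l_{ij} \le s_{ij} \le u_{ij} && \forall (i,j) \in \bar{E} && \text{(IVc)}
\end{aligned}
\]
where $s_{ij}$ represents the slack assigned to the edge from node i to
node j. For each edge e(i, j) ∈ E1, if $i = \bar{r}_p$ and $j = \bar{R}_p$, then
$s_{ij} = \bar{s}_p$, $l_{ij} = \bar{s}_p^1$ and $u_{ij} = \bar{s}_p^k$. For each edge e(i, j) ∈ E2,
$s_{ij} = \bar{s}_j - T \cdot w_{ij}$, so $l_{ij} = \bar{s}_j^1 - T \cdot w_{ij}$ and $u_{ij} = \bar{s}_j^k - T \cdot w_{ij}$.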
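[Figure 2: (a) a graph with vertices 1, 2, 3, 4; (b) the transformed graph with vertices r1, R1, r2, R2, r3, R3, r4, R4 and edge sets E1, E2, E3.]
Fig. 2: (a) The DAG G representing a synchronous sequential
circuit. (b) The transformed DAG G¯ of G.
[Figure 3: three plots of power Pij against slack. (a) A piecewise linear curve through (sij1, pij1), (sij2, pij2), (sij3, pij3) between lij and uij. (b) A curve on the range from lij = -c·wij to uij = Nff. (c) A curve on the range from lij = 0 to uij = Nff.]
Fig. 3: The Power-Slack Curve of (a) an edge (i, j) ∈ E1 ∪ E2, here we assume wij = 0; (b) an edge (i, j) ∈ E3; (c) an edge
(i, j) ∈ E4.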
For each edge e(i, j) ∈ E3, lij = −T · wij and uij = N¯ff .
An example Power-Slack Curve of an edge in E1 ∪ E2 and
that of an edge in E3 are illustrated in Fig. (3a) and Fig. (3b),
respectively.
We then further eliminate constraints (IVb) and (IVc). First,
$P(s_{ij})$ can be modified to eliminate the bounds on $s_{ij}$
as follows.
\[
\bar{P}(s_{ij}) = \begin{cases}
P(u_{ij}) + M (s_{ij} - u_{ij}) & s_{ij} > u_{ij} \\
P(s_{ij}) & l_{ij} \le s_{ij} \le u_{ij} \\
P(l_{ij}) - M (s_{ij} - l_{ij}) & s_{ij} < l_{ij}
\end{cases} \tag{11}
\]
where M is a sufficiently large number such that $\bar{P}(s_{ij})$ is
still a convex function.
Similarly, the bounds on $\mu_i$ can be eliminated by adding
to the objective a convex cost function $B(\mu_i)$ defined as follows.
\[
B(\mu_i) = \begin{cases}
M \cdot (\mu_i - \bar{N}_{ff}) & \text{if } \mu_i > \bar{N}_{ff} \\
0 & \text{if } 0 \le \mu_i \le \bar{N}_{ff} \\
-M \cdot \mu_i & \text{if } \mu_i < 0
\end{cases} \tag{12}
\]
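After the above simplifications, problem (IV) can be
transformed into problem (V):
\[
\begin{aligned}
\min \quad & \sum_{(i,j) \in \bar{E}} \bar{P}(s_{ij}) + \sum_{i \in \bar{V}} B(\mu_i) && && \text{(V)} \\
\text{s.t.} \quad & \mu_j - \mu_i \ge s_{ij} && \forall (i,j) \in \bar{E}
\end{aligned}
\]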
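The two penalty extensions that turn the bounded problem (IV) into the unbounded problem (V) can be written down directly. The following Python sketch is illustrative: `extend_power` and `bound_penalty` are hypothetical helper names, and the linear curve, the bounds and the value of M are invented for the example.

```python
def extend_power(P, l, u, M):
    """Return P_bar as in (11): P inside [l, u], slope +/-M outside."""
    def P_bar(s):
        if s > u:
            return P(u) + M * (s - u)
        if s < l:
            return P(l) - M * (s - l)
        return P(s)
    return P_bar

def bound_penalty(mu, n_ff, M):
    """Return B(mu) as in (12): zero inside [0, n_ff], slope +/-M outside."""
    if mu > n_ff:
        return M * (mu - n_ff)
    if mu < 0:
        return -M * mu
    return 0

# Hypothetical decreasing power curve on the slack range [0, 10].
P_bar = extend_power(lambda s: 100 - 2 * s, l=0, u=10, M=1000)
inside, above, below = P_bar(5), P_bar(12), P_bar(-1)
```

As long as M exceeds the steepest slope of P, the extended function stays convex, so any violation of the original bounds is penalized heavily enough to be avoided at the optimum.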
3.5. Problem Transformation by Lagrangian Relaxation
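Using Lagrangian relaxation to eliminate the constraint in
problem (V), we obtain the Lagrangian subproblem:
\[
L(\vec{x}) = \sum_{e(i,j) \in \bar{E}} \bar{P}(s_{ij}) + \sum_{i \in \bar{V}} B_i(\mu_i)
- \sum_{e(i,j) \in \bar{E}} (\mu_j - \mu_i - s_{ij}) x_{ij} \tag{13}
\]
It is easy to show that
\[
\sum_{e(i,j) \in \bar{E}} (\mu_i - \mu_j) x_{ij} = \sum_{i \in \bar{V}} x_{0i} \cdot \mu_i \tag{14}
\]
where
\[
x_{0i} = \sum_{j: e(i,j) \in \bar{E}} x_{ij} - \sum_{j: e(j,i) \in \bar{E}} x_{ji}, \quad \forall i \in \bar{V} \tag{15}
\]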
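Identity (14) is a regrouping of the edge sum by node: each multiplier on an edge contributes positively at the tail node and negatively at the head node. The short Python check below evaluates both sides on a small example; the graph, the multipliers and the potentials are hypothetical values chosen for illustration.

```python
def node_imbalance(x, nodes):
    """Return x_0i as in (15): outgoing minus incoming multipliers per node."""
    x0 = {i: 0 for i in nodes}
    for (i, j), value in x.items():
        x0[i] += value
        x0[j] -= value
    return x0

# Hypothetical multipliers x_ij and node potentials mu_i.
x = {("a", "b"): 2, ("b", "c"): 5, ("a", "c"): 1}
mu = {"a": 4, "b": 7, "c": 1}

lhs = sum((mu[i] - mu[j]) * value for (i, j), value in x.items())
x0 = node_imbalance(x, mu)
rhs = sum(x0[i] * mu[i] for i in mu)
```

Both sides evaluate to the same number, which is what allows the potentials to be separated from the slack terms in the Lagrangian.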
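The Lagrangian subproblem (13) can be restated as follows:
\[
L(\vec{x}) = \min \sum_{e(i,j) \in \bar{E}} [\bar{P}(s_{ij}) + x_{ij} s_{ij}] + \sum_{i \in \bar{V}} [B_i(\mu_i) + x_{0i} \mu_i] \tag{16}
\]
A start node v0 is added to $\bar{V}$ and connected to all other
nodes in $\bar{V}$. We set $s_{0i} = \mu_i$, $l_{0i} = 0$ and $u_{0i} = \bar{N}_{ff}$, so
$V = \{v_0\} \cup \bar{V}$. The new edges are denoted E4, and $E = \bar{E} \cup E_4$.
The Power-Slack curve of an edge (i, j) ∈ E4 is illustrated in
Fig. 3(c). $L(\vec{x})$ can then be transformed into formulation (17).
\[
\begin{aligned}
L(\vec{x}) = \min \quad & \sum_{e(i,j) \in E} [P_{ij}(s_{ij}) + x_{ij} s_{ij}] && && (17) \\
\text{s.t.} \quad & \sum_{j: e(i,j) \in E} x_{ij} - \sum_{j: e(j,i) \in E} x_{ji} = 0 && \forall i \in V \\
& x_{ij} \ge 0 && \forall (i,j) \in E_1 \cup E_2 \cup E_3
\end{aligned}
\]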
3.6. Convex Cost-scaling Approach
We define the function $H_{ij}(x_{ij})$ for each e(i, j) ∈ E as follows:
\[
H_{ij}(x_{ij}) = \min_{s_{ij}} \{P_{ij}(s_{ij}) + x_{ij} s_{ij}\} \tag{18}
\]
For each e(i, j) ∈ E1, $H_{ij}(x_{ij})$ is a piecewise linear concave
function of $x_{ij}$ and can be described in the following manner [19]:
\[
H_{ij}(x_{ij}) = \begin{cases}
P_{ij}(s_{ij}^k) + s_{ij}^k x_{ij} & 0 \le x_{ij} \le b_{ij}(k) \\
\cdots \\
P_{ij}(s_{ij}^q) + s_{ij}^q x_{ij} & b_{ij}(q+1) \le x_{ij} \le b_{ij}(q) \\
\cdots \\
P_{ij}(s_{ij}^1) + s_{ij}^1 x_{ij} & b_{ij}(2) \le x_{ij}
\end{cases}
\]
where $b_{ij}(q) = \dfrac{P_{ij}(s_{ij}^{q-1}) - P_{ij}(s_{ij}^q)}{s_{ij}^q - s_{ij}^{q-1}}$.
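The piecewise description of $H_{ij}$ can be cross-checked against the defining minimization (18). The Python sketch below is a hedged illustration: the function names are hypothetical, the slack levels are those of Section 4, the power values are invented, and the last range is taken to start at $b_{ij}(2)$, as implied by the capacities used for the expanded network.

```python
def breakpoints(levels, powers):
    """Return {q: b(q)} for q = 2..k using 1-based level indices."""
    k = len(levels)
    return {
        q: (powers[q - 2] - powers[q - 1]) / (levels[q - 1] - levels[q - 2])
        for q in range(2, k + 1)
    }

def H_piecewise(x, levels, powers):
    """Evaluate H(x) by locating x between consecutive breakpoints."""
    b = breakpoints(levels, powers)
    q = len(levels)
    # Move to smaller slack levels while x lies beyond breakpoint b(q).
    while q > 1 and x > b[q]:
        q -= 1
    return powers[q - 1] + levels[q - 1] * x

def H_direct(x, levels, powers):
    """Evaluate H(x) = min over levels of P(s) + x * s, as in (18)."""
    return min(p + s * x for s, p in zip(levels, powers))

# Slack levels from Section 4 with hypothetical power values.
levels, powers = [0, 10, 20, 33], [100, 70, 50, 40]
b = breakpoints(levels, powers)
```

For a convex decreasing curve the breakpoints decrease with q, so the largest slack level is optimal for small multipliers and the smallest one for large multipliers.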
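For each e(i, j) ∈ E2, similarly to E1,
\[
H_{ij}(x_{ij}) = \begin{cases}
P_{ij}(t_{ij}^k) + t_{ij}^k x_{ij} & 0 \le x_{ij} \le b_{ij}(k) \\
\cdots \\
P_{ij}(t_{ij}^q) + t_{ij}^q x_{ij} & b_{ij}(q+1) \le x_{ij} \le b_{ij}(q) \\
\cdots \\
P_{ij}(t_{ij}^1) + t_{ij}^1 x_{ij} & b_{ij}(2) \le x_{ij}
\end{cases}
\]
where $b_{ij}(q) = \dfrac{P_{ij}(t_{ij}^{q-1}) - P_{ij}(t_{ij}^q)}{t_{ij}^q - t_{ij}^{q-1}}$, and $t_{ij}^q = s_{ij}^q - T \cdot w_{ij}$.
For each e(i, j) ∈ E3, because $P_{ij}(s_{ij}) = 0$,
\[
H_{ij}(x_{ij}) = \min_{s_{ij}} (s_{ij} x_{ij}) = -T \cdot w_{ij} \cdot x_{ij}, \quad x_{ij} \ge 0
\]
For each e(i, j) ∈ E4, the variable $x_{ij}$ is not a Lagrangian
multiplier, and it is bounded by $-M \le x_{ij} \le M$.
\[
H_{ij}(x_{ij}) = \begin{cases}
0 & 0 \le x_{ij} \le M \\
\bar{N}_{ff} \cdot x_{ij} & -M \le x_{ij} \le 0
\end{cases}
\]
Note that these functions $H_{ij}(x_{ij})$ are all concave. We
define $C_{ij}(x_{ij}) = -H_{ij}(x_{ij})$, so that $C_{ij}(x_{ij})$ is a piecewise
linear convex function. We can then state problem (VI) as follows:
\[
\begin{aligned}
L(\vec{x}) = \min \quad & \sum_{e(i,j) \in E} C_{ij}(x_{ij}) && && \text{(VI)} \\
\text{s.t.} \quad & \sum_{j: e(i,j) \in E} x_{ij} - \sum_{j: e(j,i) \in E} x_{ji} = 0 && \forall i \in V \\
& 0 \le x_{ij} \le M && \forall (i,j) \in E_1 \cup E_2 \cup E_3 \\
& -M \le x_{ij} \le M && \forall (i,j) \in E_4
\end{aligned}
\]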
TABLE I: Characteristics of Test Cases
Case Name    # Gates  # Edges  Max Outputs  Max Inputs  Tmin
s27.test     11       19       4            2           20
s208.1.test  105      182      19           4           28
s298.test    120      250      13           6           24
s382.test    159      312      21           6           44
s386.test    160      354      36           7           64
s344.test    161      280      12           11          46
s349.test    162      284      12           11          46
s444.test    182      358      22           6           46
s526.test    194      451      13           6           42
s526n.test   195      451      13           6           42
s510.test    212      431      28           7           42
s420.1.test  219      384      31           4           50
s832.test    288      788      107          19          98
s820.test    290      776      106          19          92
s641.test    380      563      35           24          238
s713.test    394      614      35           23          262
s838.1.test  447      788      55           4           80
s1238.test   509      1055     192          14          110
s1488.test   654      1406     56           19          166

To transform the problem into a minimum cost flow problem,
we construct an expanded network G′ = (V′, E′). There
are four kinds of edges to consider:
• e(i, j) in E1: we introduce k edges in G′. Their costs
are $-s_{ij}^k, -s_{ij}^{k-1}, \cdots, -s_{ij}^1$; their upper capacities are
$b_{ij}(k), b_{ij}(k-1) - b_{ij}(k), b_{ij}(k-2) - b_{ij}(k-1), \ldots, M - b_{ij}(2)$,
where M is a very large coefficient; their lower capacities
are all 0.
• e(i, j) in E2: we introduce k edges in G′. Their costs
are $-t_{ij}^k, -t_{ij}^{k-1}, \cdots, -t_{ij}^1$; their upper capacities are
$b_{ij}(k), b_{ij}(k-1) - b_{ij}(k), b_{ij}(k-2) - b_{ij}(k-1), \ldots, M - b_{ij}(2)$,
where M is a very large coefficient; their lower capacities
are all 0.
• e(i, j) in E3: the cost, lower capacity and upper capacity are
(c · wij, 0, M).
• e(i, j) in E4: two edges are introduced in G′, one with
cost, lower capacity and upper capacity (N¯ff, −M, 0),
the other with (0, 0, M).
Using the cost-scaling algorithm [24], we can solve the
minimum cost flow problem in G′. For the resulting optimal flow
x∗, we construct the residual network G(x∗) and solve a shortest
path problem to determine the shortest path distance d(i) from
node s to every other node. Setting µ(i) = d(i) and
sij = µ(i) − µ(j) for each e(i, j) ∈ E1 ∪ E2, we finally obtain
a solution of problem (III).
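The expansion of an E1 edge into k parallel arcs can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: `expand_edge` and `routing_cost` are hypothetical names, the power values and M = 1e6 are invented, and the check at the end only confirms that filling the arcs in order of increasing cost reproduces $C_{ij}(x) - C_{ij}(0)$, which is what lets a linear minimum cost flow solver handle the convex cost.

```python
def expand_edge(levels, powers, M):
    """Return [(cost, lower, upper)] arcs for one edge with k slack levels."""
    k = len(levels)
    # b[q] for q = 2..k in the 1-based notation of Section 3.6.
    b = {
        q: (powers[q - 2] - powers[q - 1]) / (levels[q - 1] - levels[q - 2])
        for q in range(2, k + 1)
    }
    # Segment boundaries 0, b(k), b(k-1), ..., b(2), M.
    bounds = [0.0] + [b[q] for q in range(k, 1, -1)] + [M]
    return [
        (-levels[q - 1], 0.0, bounds[idx + 1] - bounds[idx])
        for idx, q in enumerate(range(k, 0, -1))
    ]

def routing_cost(arcs, x):
    """Cost of sending x units, filling arcs in the listed (cheapest first) order."""
    cost, remaining = 0.0, x
    for arc_cost, _, capacity in arcs:
        flow = min(remaining, capacity)
        cost += arc_cost * flow
        remaining -= flow
    return cost

# Slack levels from Section 4 with hypothetical power values.
levels, powers = [0, 10, 20, 33], [100, 70, 50, 40]
arcs = expand_edge(levels, powers, M=1e6)
```

Because the arc costs increase from one arc to the next, an optimal flow never uses a more expensive arc before a cheaper parallel one is saturated.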
4. EXPERIMENTAL RESULTS
We implemented our algorithm in C++ and ran it on a Linux
machine with eight 3.0 GHz CPUs and 6 GB of memory. Nineteen
cases from the ISCAS89 benchmarks were tested; the name,
number of gates, number of signal paths, maximum numbers of
gate outputs and inputs, and minimum period of each case are
given in Table I. We used four discrete slack levels for each gate,
{0, 10, 20, 33}. The energy consumption of the gates at each
slack level was obtained from the model in [15].
In the experiments, a min-period retiming algorithm [4] is
first employed to generate the minimum clock period T, which
is listed in the second column of Table II. Liu et al.'s algorithm
[18] was implemented for comparison. Note that the algorithm
in [18] cannot directly solve the discrete slack budgeting
problem, because if a sensitive transitive closure graph is used,
the timing constraints might be violated after slack budgeting
[8]. We therefore use a transitive closure graph instead of a
sensitive transitive closure graph here. To evaluate the accuracy
of our algorithm, the ILP formulation, which yields the optimal
solution, was also implemented using the open source ILP
solver CBC [25].
Table II compares the optimal ILP, the algorithm in [18] and
our algorithm. The column Power Consumption gives the actual
power consumption of each circuit; a smaller value means that
more power has been saved. Compared with the optimal
solution, our algorithm increases power consumption by 29% on
average, while [18] increases it by 52%. The column Total Slacks
gives the sum of the slacks of all gates. Compared with the
optimal solution, our algorithm loses 16% of the slack, while [18]
loses 31%. Note that power consumption is not proportional to
the slack amount: for benchmark s27.test, [18] and the optimal
ILP obtain equal slack amounts, but their power consumption
differs. The column Runtime compares the run time of each
algorithm. The results show that although the ILP yields the
optimal solution, its runtime is sometimes unacceptable (more
than 1000 s on four benchmarks). Compared with [18], our
algorithm not only generates better results on average, but is
also nearly 500× faster.

TABLE II: Comparisons with Optimal ILP and Previous Work [18]
                  Power Consumption           Total Slacks            Runtime (s)
Benchmark    T    ILP      [18]     ours      ILP     [18]    ours    ILP     [18]    ours
s27.test     20   800      824      850       40      40      30      0.02    0.0     0.0
s208.1.test  28   3542     9118     4772      1770    290     1988    0.39    0.44    0.06
s298.test    24   6498     8888     8010      1330    660     1240    0.78    0.69    0.07
s382.test    44   6456     9038     9958      3011    2071    1895    >1000   10.56   0.12
s386.test    64   8836     12870    9564      2484    807     2324    4.58    1.03    0.1
s344.test    46   9876     11848    9894      1855    1064    1760    0.82    2.53    0.09
s349.test    46   9938     12472    9894      1852    912     1780    0.79    4.49    0.11
s444.test    46   8938     14032    11884     2962    1025    1939    >1000   12.04   0.12
s526.test    42   7602     14106    11498     3626    1307    2356    42.57   1.67    0.17
s526n.test   42   7752     11734    11548     3616    2089    2366    30.32   4.72    0.17
s510.test    42   13976    17492    14846     2237    937     2040    >1000   1.62    0.17
s420.1.test  50   4574     17920    9224      5906    1050    4466    1.29    16.91   0.14
s832.test    98   13652    14518    16274     5175    4525    4171    71.96   151.26  0.24
s820.test    92   13552    17694    16448     5261    3493    4103    68.98   13.18   0.25
s641.test    238  13334    20408    14424     7925    6067    7604    2.24    92.97   0.26
s713.test    262  13018    21228    14322     8522    6363    8112    2.27    121.1   0.27
s838.1.test  80   6004     18898    17556     14048   9016    9912    1.48    256.9   0.4
s1238.test   110  6096     10444    8208      16764   14635   15792   0.23    448.6   0.34
s1488.test   166  21292    23799    27836     15313   14791   13024   >1000   670.7   0.53
Avg          -    9249.3   14070    11947.9   5457.7  3744.3  4573.8  -       95.3    0.19
Diff         -    1        +52%     +29%      1       -31%    -16%    -       1       0.002
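The relative figures quoted for Table II follow from its Avg row. The short Python check below recomputes them; the dictionary layout and the key names (`ilp`, `ref18`, `ours`) are assumptions for illustration, while the numbers are copied from the Avg row.

```python
# Averages from the Avg row of Table II.
avg = {
    "power": {"ilp": 9249.3, "ref18": 14070.0, "ours": 11947.9},
    "slack": {"ilp": 5457.7, "ref18": 3744.3, "ours": 4573.8},
    "runtime": {"ref18": 95.3, "ours": 0.19},
}

def relative_to_ilp(metric):
    """Relative change of [18] and of our algorithm against the optimal ILP."""
    base = avg[metric]["ilp"]
    return {name: avg[metric][name] / base - 1 for name in ("ref18", "ours")}

power_change = relative_to_ilp("power")   # positive: more power than optimal
slack_change = relative_to_ilp("slack")   # negative: less slack than optimal
speedup = avg["runtime"]["ref18"] / avg["runtime"]["ours"]
```

The ILP runtime is left out of the comparison because four of its entries are only known as lower bounds (>1000 s).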
5. CONCLUSION
In this paper we have shown that the retiming and slack
budgeting problem can be solved simultaneously by formulating
it as a convex cost dual network flow problem. Both theoretical
analysis and experimental results show the efficiency of our
approach, which not only reduces power consumption but also
runs faster than previous work.
REFERENCES
[1] C. E. Leiserson and J. B. Saxe, “Retiming synchronous circuitry,”
Algorithmica, vol. 6, pp. 5–35, 1991.
[2] N. Maheshwari and S. Sapatnekar, “Efficient retiming of large circuits,”
IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
vol. 6, pp. 74–83, 1998.
[3] H. Zhou, “Deriving a new efficient algorithm for min-period retiming,”
in ACM/IEEE Asia and South Pacific Design Automation Conference
(ASPDAC), 2005, pp. 990–993.
[4] ——, “A new efficient retiming algorithm derived by formal manipula-
tion,” ACM Trans. Des. Autom. Electron. Syst., vol. 13, no. 1, pp. 1–19,
2008.
[5] C. Lin and H. Zhou, “An efficient retiming algorithm under setup and
hold constraints,” in ACM/IEEE Design Automation Conference (DAC),
2006, pp. 945–950.
[6] J. Wang and H. Zhou, “An efficient incremental algorithm for min-area
retiming,” in ACM/IEEE Design Automation Conference (DAC), 2008,
pp. 528–533.
[7] R. Nair, C. L. Berman, P. S. Hauge, and E. J. Yoffa, “Generation of
performance constraints for layout,” IEEE transactions on computer-
aided design of integrated circuits and systems, vol. 8, pp. 860–874,
1989.
[8] D.-S. Chen and M. Sarrafzadeh, “An exact algorithm for low power
library-specific gate re-sizing,” in ACM/IEEE Design Automation Con-
ference (DAC), 1996, pp. 783–788.
[9] C. Chen, X. Yang, and M. Sarrafzadeh, “Predicting potential per-
formance for digital circuits,” IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems, 2002.
[10] S. Ghiasi, E. Bozorgzadeh, S. Choudhuri, and M. Sarrafzadeh, “A unified
theory of timing budget management,” in ACM/IEEE International
Conference on Computer Aided Design (ICCAD), 2004, pp. 653–659.
[11] ——, “A unified theory of timing budget management,” IEEE Trans-
actions on Computer-Aided Design of Integrated Circuits and Systems,
pp. 2364–2375, 2006.
[12] D. Nguyen, A. Davare, M. Orshansky, D. Chinnery, B. Thompson,
and K. Keutzer, “Minimization of dynamic and static power through
joint assignment of threshold voltages and sizing optimization,” in
IEEE International Symposium on Low Power Electronics and Design
(ISLPED), 2003, pp. 158–163.
[13] A. Srivastava, D. Sylvester, and D. Blaauw, “Power minimization
using simultaneous gate sizing dual-vdd and dual-vth assignment,” in
ACM/IEEE Design Automation Conference (DAC), 2004, pp. 783–787.
[14] S. Kulkarni, A. Srivastava, and D. Sylvester, “A new algorithm for
improved vdd assignment in low power dual vdd systems,” in IEEE In-
ternational Symposium on Low Power Electronics and Design (ISLPED),
2004, pp. 200–205.
[15] X. Qiu, Y. Ma, X. He, and X. Hong, “Iposa: A novel slack distribution
algorithm for interconnect power optimization,” in International Sym-
posium on Quality of Electronic Design (ISQED), 2008, pp. 873–876.
[16] Y. Hu, Y. Lin, L. He, and T. Tuan, “Simultaneous time slack budgeting
and retiming for dual-vdd fpga power reduction,” in ACM/IEEE Design
Automation Conference (DAC), 2006, pp. 478–483.
[17] C. Lin, A. Xie, and H. Zhou, “Design closure driven delay relaxation
based on convex cost network flow,” in the conference on Design,
Automation and Test in Europe (DATE), 2007, pp. 63–68.
[18] S. Liu, Y. Ma, X. Hong, and Y. Wang, “Simultaneous slack budgeting
and retiming for synchronous circuits optimization,” in ACM/IEEE Asia
and South Pacific Design Automation Conference (ASPDAC), 2010.
[19] R. K. Ahuja, D. S. Hochbaum, and J. B. Orlin, “Solving the convex
cost integer dual network flow problem,” Manage. Sci., vol. 49, no. 7,
pp. 950–964, 2003.
[20] R. Chen and H. Zhou, “Efficient algorithms for buffer insertion in
general circuits based on network flow,” in ACM/IEEE International
Conference on Computer Aided Design (ICCAD), 2005, pp. 322–326.
[21] Q. Ma and F. Young, “Network flow-based power optimization under
timing constraints in msv-driven floorplanning,” in ACM/IEEE Interna-
tional Conference on Computer Aided Design (ICCAD), 2008, pp. 1–8.
[22] B. Yu, S. Dong, S. Goto, and S. Chen, “Voltage-island driven floorplan-
ning considering level-shifter positions,” in ACM Great Lakes Sympo-
sium on VLSI (GLSVLSI), 2009, pp. 51–56.
[23] C. Lin and H. Zhou, “Clock skew scheduling with delay padding for
prescribed skew domains,” in ACM/IEEE Asia and South Pacific Design
Automation Conference (ASPDAC), 2007, pp. 541–546.
[24] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, Network Flows: Theory,
Algorithms, and Applications. Prentice Hall/Pearson, 2005.
[25] [Online]. Available: http://www.coin-or.org/projects/Cbc.xml
