Minimization of circuit registers: Retiming revisited  by Gaujal, Bruno & Mairesse, Jean
Discrete Applied Mathematics 156 (2008) 3498–3505
Note
Minimization of circuit registers: Retiming revisited
Bruno Gaujal a,∗, Jean Mairesse b
a INRIA, LIG (CNRS, UJF, INPG), 51 Av. J. Kuntzmann, 38330 Montbonnot, France
b CNRS, LIAFA (Université Paris 7), Case 7014, 2 place Jussieu, 75251 Paris Cedex 05, France
Article info
Article history:
Received 8 December 2005
Received in revised form 2 February 2008
Accepted 9 March 2008
Available online 28 April 2008
Keywords:
Retiming
Digital circuits
Register minimization
Periodic graphs
Cuts and flows
Abstract
We address the following problem: given a synchronous digital circuit, is it possible to
construct a new circuit computing the same function as the original one but using a minimal
number of registers? We show that the minimal number of registers is the size of the
minimal cut on a bi-infinite graph, namely the unfolding of the dependencies in the digital
circuit. Furthermore, the construction of such a cut and the corresponding circuit can be
done in polynomial time, using a max-flow min-cut result of Orlin for one-periodic bi-infinite
graphs. Finally, we show the relation between this construction and the retiming
technique introduced by Leiserson and Saxe.
© 2008 Elsevier B.V. All rights reserved.
1. Introduction
Digital circuits can be seen as a rather accurate model of computer hardware. Their design and optimization are a
long-standing challenge from several viewpoints. For instance, the problems of layout compaction [6], verification [7], placement
and partitioning [6] have been intensively studied in the VLSI literature; see for instance [10].
From an abstract point of view, synchronous digital circuits are often seen as finite graphs constituted by functional gates,
wires and registers. At each clock tick, functional elements transform input data into output data which are transmitted on
the wires to the next nodes. A register is a storage facility, or a memory cell, of finite size.
Several optimizations are possible at that level. One may want to accelerate the clock frequency [9] or reduce the number
of nodes in the circuit. In this paper we show how to minimize the number of registers. This problem has already been
considered by Leiserson and Saxe in a seminal paper [9], where they show how to retime (this will be defined later) the
circuit in order to reduce the number of registers. They also provide an algorithm computing the best possible retiming.
One question remains: is it possible to do better? In other words, can one modify a circuit so that the number of registers is
smaller than what optimal retiming does, while keeping the original functional behavior?
The answer is yes and no. For many circuits retiming is indeed optimal. In those where it is not, the gain in the number
of registers comes at the expense of additional functional nodes.
More precisely, the contributions of the present paper are the following:
(i) We provide a characterization of the minimal number of registers as the size of a minimal cut in a graph. This implies
that retiming is not always optimal.
(ii) We provide a polynomial algorithm to construct a circuit with the minimal number of registers.
(iii) We characterize the circuits in which retiming is indeed optimal.
A preliminary paper on that subject [5] considered a particular class of circuits, namely recycled ones. Here, general
circuits are considered and the proof is different and based on a result of Orlin [11] on one-periodic bi-infinite graphs, which
is a “max-flow min-cut” theorem. Such graphs have mainly been used in scheduling applications [12], Uniform Recurrence
Equations or PRAM program loops [1,4,8], where the weights on the arcs represent time and where the flow is the crucial
quantity. Here instead, we view weights as memory resources and the cut becomes the central notion.
∗ Corresponding author.
E-mail addresses: bruno.gaujal@imag.fr (B. Gaujal), mairesse@liafa.jussieu.fr (J. Mairesse).
doi:10.1016/j.dam.2008.03.030
Fig. 1. An illustration to Theorem 2.1.
The paper is organized as follows. Section 2 contains basic definitions and notations and formulates the above-mentioned
“max-flow min-cut” theorem of Orlin. Section 3 discusses how the result can be used in the design of digital circuits to
minimize the number of registers.
2. One-periodic bi-infinite graphs and Orlin’s theorem
The sets of non-negative and non-positive integers are denoted by Z+ and Z−, respectively.
By a bi-infinite graph we mean an infinite directed graph D = (V × Z, A) with node set V × Z and arc set A ⊂ (V × Z) × (V × Z),
assuming throughout that V is a finite set and that D is locally finite, i.e., each node is incident with a finite number of arcs.
For u ∈ V and i ∈ Z, the u-th line in D is formed by the nodes (u, j) for all j ∈ Z, and the i-th column by the nodes (w, i)
for all w ∈ V. For brevity, for a node v = (u, i) and an integer k, the node (u, i + k) is denoted by v + k, and similarly for a set
of nodes. A set S of nodes is called consecutive (in each line) if (u, i) ∈ S and (u, i + k) ∈ S with k > 0 imply (u, i + h) ∈ S for
each h = 0, . . . , k.
A path in D is an alternating sequence P of (not necessarily distinct) nodes v_i (i ∈ I) and arcs (v_i, v_{i+1}) (when i + 1 ∈ I).
Here I is either {0, . . . , n} for n ∈ Z+ (yielding a finite path) or I = Z+ or I = Z− (an infinite path) or I = Z (a bi-infinite path). We also use
the notation · · · → v_i → v_{i+1} → · · · for P and, depending on the context, may consider a path as a subgraph of D. For two nodes
u and v, we write u ∗→ v if there exists a (finite) path from u to v.
Two additional conditions on the bi-infinite graphs we deal with are imposed. A bi-infinite graph D is said to be one-
periodic if
(OP) for any two nodes v and v′ of D, (v, v′) ∈ A if and only if (v + 1, v′ + 1) ∈ A,
and causal if
(C) for any infinite path P indexed by Z+, the set P ∩ (V × Z−) is finite.
Note that properties (OP) and (C) imply that D is acyclic. The result presented below is analogous to the classical Menger
theorem for usual finite graphs. A flow F is a set of pairwise (node-)disjoint bi-infinite paths. It is one-periodic if for each
path P in F, the path P + 1 belongs to F as well. A cut is a set of nodes that intersects every bi-infinite path. Clearly the
size of a flow cannot exceed the size of a cut. Also if D is one-periodic and causal, then the set V × {0, . . . , k} forms a cut,
where k := max{|i − j| | ((u, i), (w, j)) ∈ A} (k is finite since D is locally finite and one-periodic). Therefore, the maximum
cardinality of a flow is finite.
Theorem 2.1 (Orlin [11]). Let D be a bi-infinite graph satisfying (OP) and (C). The maximum cardinality of a flow is equal to
the minimum cardinality of a cut. Moreover, the maximum is attained by a one-periodic flow and the minimum is attained by a
consecutive cut.
Theorem 2.1 can be seen as a special case of Theorem 4 in [11]. Getting the above statement requires one transformation.
Each node in D is replaced by a triple (node-arc-node) on the same column. Now, Theorem 4 in [11] with upper and lower
capacities on all the arcs set to 1 and 0 respectively, is exactly Theorem 2.1.
An example illustrating a maximum one-periodic flow and a minimum consecutive cut is drawn in Fig. 1.
A representation of D in a compact form is obtained by “folding” D into a finite arc-weighted digraph R = (V, E, ∆)
with possible multiple arcs, as follows. Let us say that arcs ((u, i), (w, j)) and ((u′, i′), (w′, j′)) of D are similar if u = u′,
w = w′, and j − i = j′ − i′. Since D is locally finite and one-periodic, the number of classes under this similarity relation is
finite, and each class, with a representative ((u, i), (w, j)), generates one arc e in R with tail t(e) := u, head h(e) := w, and
weight ∆(e) := j − i. We call R the folded graph associated with D (it is called a dynamic graph by Orlin). In Fig. 2, we have
represented the folded graph associated with the one-periodic bi-infinite graph of Fig. 1.
Fig. 2. A folded graph.
Fig. 3. A consecutive cut and the corresponding splinters.
In Section 3, we use the reverse construction. Given a finite directed (multi)graph R = (V, E, ∆) with ∆ ∈ Z^E, its unfolding
is the one-periodic bi-infinite graph D = (V × Z, A) in which ((u, i), (w, j)) ∈ A if and only if there is an arc e ∈ E with
t(e) = u, h(e) = w and j − i = ∆(e).
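The unfolding is infinite, but any finite window of columns can be enumerated directly from the folded graph. The following is a minimal sketch under stated assumptions: the arc-list representation, the function name `unfold_window` and the two-node example graph are illustrative choices, not taken from the paper.

```python
from typing import Hashable, List, Set, Tuple

Arc = Tuple[Hashable, Hashable, int]   # (tail u, head w, weight Delta(e))
Node = Tuple[Hashable, int]            # (line u, column i)


def unfold_window(arcs: List[Arc], lo: int, hi: int) -> Set[Tuple[Node, Node]]:
    """Return the arcs ((u, i), (w, j)) of the unfolding with lo <= i, j <= hi."""
    window = set()
    for u, w, delta in arcs:
        for i in range(lo, hi + 1):
            j = i + delta              # defining relation: j - i = Delta(e)
            if lo <= j <= hi:
                window.add(((u, i), (w, j)))
    return window


# Hypothetical folded graph: a 2-cycle with weights 1 and 0.
example_arcs = [("a", "b", 1), ("b", "a", 0)]
window = unfold_window(example_arcs, 0, 2)
```

Shifting every arc of the window by one column yields arcs of the unfolding again, which is the one-periodicity property (OP) restricted to the window.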
We say that R is causal if D is causal. Clearly, R is causal if and only if ∆(C) > 0 for each cycle C of R, where ∆(C) denotes the sum of the weights of the arcs of C.
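The cycle criterion can be tested without enumerating cycles. The sketch below is one possible approach and is not described in the paper: it rescales each integer weight ∆(e) to n∆(e) − 1, where n is the number of nodes, so that a simple cycle has negative rescaled weight exactly when its original weight is at most 0, and then runs a Bellman-Ford negative-cycle test. The function name `is_causal` and the input format are assumptions.

```python
from typing import Hashable, List, Tuple

Arc = Tuple[Hashable, Hashable, int]   # (tail, head, integer weight)


def is_causal(nodes: List[Hashable], arcs: List[Arc]) -> bool:
    """True iff every cycle of the folded graph has positive total weight."""
    n = len(nodes)
    if n == 0:
        return True
    # A simple cycle of weight W and length L <= n gets rescaled weight n*W - L,
    # which is negative exactly when W <= 0 (weights are integers).
    scaled = [(u, w, n * d - 1) for u, w, d in arcs]
    dist = {v: 0 for v in nodes}       # as if a virtual source reached every node at cost 0
    for _ in range(n):
        changed = False
        for u, w, cost in scaled:
            if dist[u] + cost < dist[w]:
                dist[w] = dist[u] + cost
                changed = True
        if not changed:
            return True                # distances settled: no negative rescaled cycle
    return False                       # still relaxing after n rounds: a cycle of weight <= 0 exists


causal_example = is_causal(["a", "b"], [("a", "b", 1), ("b", "a", 0)])
non_causal_example = is_causal(["a", "b"], [("a", "b", 1), ("b", "a", -1)])
```

Note that an acyclic graph is causal under this criterion even if some arcs carry negative weights.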
Splinters. Let D be causal and one-periodic. For a subset S ⊆ V × Z, define
succ(S) = {v ∈ (V × Z) \ S | ∃u ∈ S, (u, v) ∈ A},
pred(S) = {v ∈ (V × Z) \ S | ∃u ∈ S, (v, u) ∈ A},
succ∗(S) = {v ∈ (V × Z) \ S | ∃u ∈ S, u ∗→ v},
pred∗(S) = {v ∈ (V × Z) \ S | ∃u ∈ S, v ∗→ u}.
We assume that each node of D is contained in an infinite path (indexed by Z+ or Z−). Then, given a consecutive cut C,
one can partition the nodes into three sets in a natural way: the set P(C) of nodes “before” C, the set S(C) of nodes “after” C,
and C itself. Note that the sets pred∗(C), C and succ∗(C) are pairwise disjoint because of causality but they need not cover
the whole graph. We extend pred∗(C) and succ∗(C) into the desired sets P(C) and S(C), respectively, by examining D line by
line and acting as follows. Three cases are possible.
1. No bi-infinite path goes through line u but some infinite path indexed by Z+ does. Then there exists a path from u to C
and succ∗(C) ∩ u is empty. On u, we assign the nodes of pred∗(C) to P(C), and the other nodes to S(C).
2. No bi-infinite path goes through line u but some infinite path indexed by Z− does. On u, we assign the nodes of succ∗(C)
to S(C), and the other nodes to P(C).
3. There is a bi-infinite path intersecting u. Then succ∗(C), C, and pred∗(C) do partition line u. We make S(C) and P(C)
coincide with succ∗(C) and with pred∗(C) on u, respectively.
It is easy to check that the sets P(C) and S(C) are consecutive, using that the graph D is one-periodic. Also one can see
that pred(S(C)) = C and succ(P(C)) = C.
The sets P(C) and S(C) are called, respectively, the negative and positive splinters associated with C (see [3]). An example
is provided in Fig. 3. This partition is used in the next section.
3. Register minimization in digital circuits
In this section, the name graph stands for a finite directed multigraph with integer arc-weights, called delays.
A digital circuit is made of gates computing data according to boolean logical functions, wires connecting the gates and
memory registers on the wires which are storing the data between two computation cycles. With a digital circuit, we associate
the graph R = (V, E, ∆), whose nodes, arcs, and delays correspond respectively to the gates, wires, and number of registers
on the wires of the digital circuit. Since the number of registers has to be non-negative, a specificity of the graph associated
with a digital circuit is that it has only non-negative delays. Also, for physical reasons, any cycle in the digital circuit should
contain at least one register. The associated graph is therefore causal. So for our purpose, a digital circuit is simply a causal
graph with non-negative delays. The unfolding of the graph R can be viewed as the graph of the dependencies between the
computations performed by the digital circuit (the node (i, n) corresponds to the n-th computation at gate i). Our goal is to
minimize the number of registers used in a digital circuit in a sense to be made precise below (problem Min-Register). A
more thorough discussion of the relation between digital circuits and graphs is proposed in [9,5].
3.1. Computations in digital circuits
Let R = (V, E, ∆) be a digital circuit. Let Q be a finite set (corresponding to all the different values that one register can
store) and let F be the set of functions from Q^k to Q for all k ∈ Z+.
A specialization σ of the digital circuit consists in mapping one function of F to each gate of R:
σ : V → F
u ↦ Fu.
The function Fu attached to gate u must have as many arguments as u has input arcs. In particular, if u is a node with no
predecessor, then Fu is a constant (since Fu is a function from Q^0 to Q).
On a wire e with ∆(e) > 0, we denote the registers by (e, 1), . . . , (e, ∆(e)), where the ranking is performed according to
the physical order of the registers on the wire. Let M = {(e, n) : e ∈ E, 1 ≤ n ≤ ∆(e)} be the set of all the registers. An initial
condition I assigns an initial value to each register of the digital circuit:
I : M → Q
m ↦ I(m).
The computation of (R, σ, I) is the sequence (x(u, n))_{u∈V, n∈Z+}, where x(u, n) ∈ Q is the n-th value computed at gate u if
the values stored initially in the registers have been set by I and if the functions computed at each gate are those given by
σ. More formally, we have
$$x(u, n) = F_u\bigl[x(i(e_1), n - \Delta(e_1)), \ldots, x(i(e_k), n - \Delta(e_k))\bigr], \qquad n \in \mathbb{Z}_+,\ u \in V, \tag{1}$$
where e1, . . . , ek are the arcs with terminal node u (listed according to some total order on E), i(e) denotes the initial node of the arc e, and where, if n − ∆(ej)
< 0, x(i(ej), n − ∆(ej)) = I((ej, ∆(ej) − n)).
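Recurrence (1) can be evaluated directly by memoized recursion. The sketch below is an illustration under stated assumptions: arcs are identified by their position in a list (this plays the role of the total order on E), the initial condition is a dictionary keyed by (arc index, register rank), and the names `make_computation`, `sigma` and `init` are hypothetical. It assumes a causal graph with non-negative delays, so that the recursion terminates.

```python
from functools import lru_cache
from typing import Callable, Dict, Hashable, List, Tuple

Arc = Tuple[Hashable, Hashable, int]   # (initial node i(e), terminal node, delay Delta(e))


def make_computation(arcs: List[Arc],
                     sigma: Dict[Hashable, Callable],
                     init: Dict[Tuple[int, int], object]):
    """Return a function x(u, n) evaluating recurrence (1)."""
    inputs: Dict[Hashable, List[int]] = {}
    for idx, (_, head, _) in enumerate(arcs):
        inputs.setdefault(head, []).append(idx)    # input arcs of each gate, in list order

    @lru_cache(maxsize=None)
    def x(u, n):
        args = []
        for idx in inputs.get(u, []):
            src, _, delay = arcs[idx]
            if n - delay < 0:
                args.append(init[(idx, delay - n)])  # initial value I((e, Delta(e) - n))
            else:
                args.append(x(src, n - delay))
        return sigma[u](*args)                       # a gate without inputs is a constant

    return x


# Hypothetical circuit: one gate feeding itself through one register (a counter modulo 4).
counter_arcs = [("g", "g", 1)]
counter = make_computation(counter_arcs, {"g": lambda v: (v + 1) % 4}, {(0, 1): 0})
values = [counter("g", n) for n in range(4)]
```

Plain recursion is used for brevity; for long computations an iterative evaluation over n would avoid deep call stacks.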
The computational power of a digital circuit R is defined as follows. For an arbitrary finite set Q, the sequence
(x(u, n))_{u∈V, n∈Z+} of elements of Q is computable by R if there exist σ and I such that the sequence is computed by (R, σ, I).
We say that a digital circuit R2 (nodes V2) has a larger computational power than a digital circuit R1 (nodes V1) if for an
arbitrary finite set Q and for each specialization σ1 of R1, there exists a specialization σ2 of R2 and an injective mapping
θ : V1 × Z+ → V2 × Z+, (2)
such that for any initial condition I1 for R1 there exists an initial condition I2 for R2 such that the computation
(x(u, n))_{u∈V1, n∈Z+} of (R1, σ1, I1) and the computation (y(u, n))_{u∈V2, n∈Z+} of (R2, σ2, I2) satisfy x(u, n) = y ◦ θ(u, n) for all (u, n) ∈ V1 × Z+. Roughly speaking, this means that everything that can be computed by R1 can also be retrieved from a
computation carried out over R2. Obviously, this retrieval is efficient only if θ is simple enough.
Remark. In relation to the above point, observe that the computational power of a digital circuit only depends on its topology,
namely V, E, and ∆. It is independent of Q, σ, or I.
3.2. Duplication and forward splitting
As mentioned before, the graphs corresponding to digital circuits only have non-negative delays. However, it is essential
for the following sections to work on causal graphs that may have some negative delays.
For a causal graph R = (V, E, ∆), possibly with some negative delays, we define the total number of delays of R as
follows:
$$\Delta_A(R) = \sum_{e \in E} \Delta(e). \tag{3}$$
Another quantity of interest is the following one:
$$\Delta_B(R) = \sum_{u \in V} \max_{e \in E,\ i(e) = u} \Delta(e). \tag{4}$$
In words, ∆B(R) only sums the maximum delays on the output arcs of all nodes.
The definition of ∆A(R) and ∆B(R) is illustrated in the upcoming Fig. 5. By causality, we have ∆A(R) ≥ 0 and
∆B(R) ≥ 0. Furthermore, as soon as R contains at least one cycle, we have ∆A(R) > 0 and ∆B(R) > 0. Since
∆A(R) = ∑_{u∈V} ∑_{e∈E, i(e)=u} ∆(e), we have ∆B(R) ≤ ∆A(R).
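The two quantities (3) and (4) are straightforward to compute from an arc list. The sketch below uses hypothetical names (`delta_a`, `delta_b`) and a hypothetical three-node graph; it also assumes that a node without output arcs contributes nothing to ∆B, a convention the text does not spell out.

```python
from typing import Dict, Hashable, List, Tuple

Arc = Tuple[Hashable, Hashable, int]   # (initial node i(e), terminal node, delay Delta(e))


def delta_a(arcs: List[Arc]) -> int:
    """Total delay, Eq. (3): sum of Delta(e) over all arcs."""
    return sum(delay for _, _, delay in arcs)


def delta_b(arcs: List[Arc]) -> int:
    """Eq. (4): sum over nodes of the largest delay on an output arc.

    Assumption: nodes with no output arc contribute 0.
    """
    largest: Dict[Hashable, int] = {}
    for u, _, delay in arcs:
        largest[u] = max(largest.get(u, delay), delay)
    return sum(largest.values())


# Hypothetical graph: node "a" has two output arcs with delays 2 and 1.
sample = [("a", "b", 2), ("a", "c", 1), ("b", "a", 0), ("c", "a", 1)]
total, shared = delta_a(sample), delta_b(sample)
```

On this example only the larger of the two delays leaving "a" is counted in ∆B, which is why ∆B is smaller than ∆A here.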
Introducing the quantity ∆B is relevant because of the following result.
Fig. 4. The duplication transformation.
Fig. 5. The forward splitting transformation.
Proposition 3.1 ([5]). There exists a causal graph denoted ϕ(R) (set of nodes ϕ(V)), with non-negative delays, such that:
(i) ∆A(ϕ(R)) = ∆B(R);
(ii) ϕ(R) has a larger computational power than R. Furthermore, the map θ in (2) has the following form:
θ : V × Z → ϕ(V) × Z, (v, n) ↦ (α(v), n + cv), (5)
where α : V → ϕ(V) is an injection and cv, v ∈ V, are integer constants, which do not depend on the specialization and initial
condition of R.
Sketch of the proof. In [5, Propositions 6.5 and 7.3], this result is proved for recycled graphs. It is straightforward to see
that the proof extends to the general case. The transformation ϕ can be decomposed into two elementary operations on R.
The first one is a duplication of some nodes and the second one a forward splitting of arcs.
The duplication can be described by the following algorithm.
• While the graph contains an arc (v, w) with a negative delay −d (d > 0):
1. duplicate the input node v into two nodes v and v′;
2. remove arc (v, w) and create a new arc (v′, w) with delay 0;
3. for each input arc (u, v) of v (with delay h), create a new arc (u, v′) with delay h − d.
An example of the local duplication transformation of one node is given in Fig. 4.
Note that after one duplication, new cycles may be created in the graph. However, the total delay on each new cycle is
the same as on the corresponding original cycle. Since the original graph is causal, all its cycles have positive delays and this
is also the case in the new graph. By a precise analysis of the circulation of negative delays, one can show that the algorithm
must stop after a finite number of duplications. Finally, note that the algorithm removes all negative delays and does not
change the value of ∆B.
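A single pass of the duplication loop can be sketched as follows. This is only an illustration of steps 1 to 3 for one negative arc, not the full construction of [5]: the function names and the naming of the copy (appending a prime to the node name) are assumptions, and the termination argument mentioned above is not reproduced.

```python
from typing import List, Optional, Set, Tuple

Arc = Tuple[str, str, int]   # (tail, head, delay)


def fresh_copy_name(v: str, used: Set[str]) -> str:
    """Return a node name not yet in use (hypothetical scheme: append primes)."""
    name = v + "'"
    while name in used:
        name += "'"
    return name


def duplicate_once(arcs: List[Arc]) -> Optional[List[Arc]]:
    """Apply one duplication step, or return None if no arc has a negative delay."""
    for k, (v, w, delay) in enumerate(arcs):
        if delay < 0:
            d = -delay
            rest = arcs[:k] + arcs[k + 1:]                     # step 2: remove arc (v, w)
            used = {name for a in arcs for name in a[:2]}
            v_copy = fresh_copy_name(v, used)                  # step 1: duplicate v
            new_arcs = [(v_copy, w, 0)]                        # step 2: arc (v', w) with delay 0
            # step 3: each input arc (u, v) with delay h yields (u, v') with delay h - d
            new_arcs += [(u, v_copy, h - d) for (u, head, h) in rest if head == v]
            return rest + new_arcs
    return None


# Hypothetical example: u -> v with delay 2, v -> w with delay -1.
example = [("u", "v", 2), ("v", "w", -1)]
after = duplicate_once(example)
```

In this example the path from u to w through the copy carries delay 1 + 0, the same as the original path delay 2 − 1.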
As for forward splitting, it was introduced in [9] under the name of “register sharing”. Forward splitting operates on
graphs with non-negative delays.
The forward splitting can be described by the following algorithm.
• For each node v in the graph:
1. replace all output arcs (v, w_1), . . . , (v, w_k), with sorted delays d_1 ≥ · · · ≥ d_k ≥ 0, with a path of length d_1 over new
nodes v_0 (= v) → v_1 → · · · → v_{d_1}, where all arcs have delay 1;
2. for each j = 1, . . . , k, create an arc (v_{d_j}, w_j) with delay 0.
An example of the local forward splitting transformation on one node is given in Fig. 5. Note that forward splitting
transforms R into a new graph R′ such that ∆A(R′) = ∆B(R).
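The forward splitting steps can be sketched in a few lines. The representation is an assumption made for illustration: the new node v_s on the chain of v is encoded as the pair (v, s), which presumes that original node names are not themselves such pairs, and `forward_split` is a hypothetical name.

```python
from typing import Dict, Hashable, List, Tuple

Arc = Tuple[Hashable, Hashable, int]   # (tail, head, delay)


def forward_split(arcs: List[Arc]) -> List[Arc]:
    """Share registers among the output arcs of every node (non-negative delays only)."""
    outputs: Dict[Hashable, List[Tuple[Hashable, int]]] = {}
    for v, w, delay in arcs:
        if delay < 0:
            raise ValueError("forward splitting expects non-negative delays")
        outputs.setdefault(v, []).append((w, delay))

    result: List[Arc] = []
    for v, outs in outputs.items():
        longest = max(delay for _, delay in outs)                      # d_1
        chain = [v] + [(v, step) for step in range(1, longest + 1)]    # v_0 = v, v_1, ..., v_{d_1}
        for a, b in zip(chain, chain[1:]):
            result.append((a, b, 1))                                   # unit-delay path
        for w, delay in outs:
            result.append((chain[delay], w, 0))                        # tap the chain after d_j registers
    return result


# Hypothetical graph: node "a" has output delays 2 and 1, so Delta_B = 2 + 0 + 1 = 3.
sample = [("a", "b", 2), ("a", "c", 1), ("b", "a", 0), ("c", "a", 1)]
split = forward_split(sample)
registers_after = sum(delay for _, _, delay in split)
```

On this example the total delay drops from 4 to 3, matching the sum of the maximum output delays of the original graph.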
Fig. 6. Retimed graph and its unfolding.
3.3. Solution to the Min-Register problem
We want to solve the Min-Register problem which is defined as follows:
Given a digital circuit R, find another digital circuit with at least the same computational power and with as few registers as
possible. This number will be denoted by Min-Reg(R).
Let R = (V, E, ∆) be a digital circuit and let us recall the classical notion of retiming [9]. A retiming is a function r : V → Z.
It specifies a new graph Rr and a new unfolded graph Dr as follows:
• Rr = (V, E, ∆r) with, for e ∈ E, ∆r(e) = ∆(e) + r(i(e)) − r(t(e)), where i(e) and t(e) denote the initial and terminal nodes of e;
• Dr = (V × Z, Ar) is the unfolding of Rr; that is, ((i, n), (j, m)) ∈ Ar ⇐⇒ ((i, n + r(i)), (j, m + r(j))) ∈ A.
In the example of Fig. 6, the new graphs Rr and Dr correspond to the retiming r defined by r(1) = 1, r(2) = 1 and r(3) = 0.
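Applying a retiming to a folded graph is a one-line transformation. In the sketch below the retiming values are those quoted for Fig. 6, but the three-node ring is a hypothetical graph, since the figure itself is not reproduced here; `retime` is an illustrative name.

```python
from typing import Dict, Hashable, List, Tuple

Arc = Tuple[Hashable, Hashable, int]   # (initial node i(e), terminal node t(e), delay)


def retime(arcs: List[Arc], r: Dict[Hashable, int]) -> List[Arc]:
    """Return the arcs of the retimed graph: delay + r(initial) - r(terminal)."""
    return [(u, w, delay + r[u] - r[w]) for u, w, delay in arcs]


# Hypothetical ring 1 -> 2 -> 3 -> 1 with delays 0, 1, 1.
ring = [(1, 2, 0), (2, 3, 1), (3, 1, 1)]
retimed = retime(ring, {1: 1, 2: 1, 3: 0})
```

The retiming terms telescope around any cycle, so the total delay of the ring is unchanged; only its distribution over the arcs moves.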
Using Proposition 3.1, for any retiming r, the graph ϕ(Rr) has non-negative delays and a larger computational power
than R. In particular, this implies that
$$\text{Min-Reg}(R) \le \min_{r} \Delta_B(R_r) = \min_{r} \Delta_A(\varphi(R_r)),$$
where the minimum is taken over all possible retimings. The next theorem states that there is in fact equality.
Theorem 3.2. Let R be a digital circuit and let D be its unfolding. Then the minimum number of registers Min-Reg(R) is equal
to χ(D), the minimum cardinality of a cut of D. Let C be a consecutive cut of D of minimum cardinality and let S be the
corresponding positive splinter. For i ∈ V, let ni be such that (i, ni − 1) ∉ S, (i, ni) ∈ S. Let r be the retiming defined by r(i) = ni.
Then the digital circuit ϕ(Rr) is a solution to the Min-Register problem.
Proof. We first prove that the number of registers of ϕ(Rr) is χ(D). Let fr be the map from V × Z into itself defined by
fr((u, n)) = (u, n − r(u)). Obviously, fr(S) is a positive splinter of Dr. Furthermore, by definition of r, we have fr(S) = {(u, n) : u ∈
V, n ∈ Z+}. Now, the size of the cut C = pred(S) in D is the same as the size of the cut pred(fr(S)) in Dr. Let us consider a
node u ∈ V. Let m = max_{e∈E, i(e)=u} ∆r(e) and let v be such that there exists e ∈ E with i(e) = u, t(e) = v, ∆r(e) = m. There
is an arc in Dr from (u, −m) to (v, 0) and no arc from a node (u, k), k < −m, to a node (w, ℓ), ℓ ≥ 0. Hence, by definition,
pred(fr(S)) contains exactly the nodes (u, −m), . . . , (u, −1) on line u. The same argument repeated on each line shows that
∆B(Rr) = |pred(S)| = |C| = χ(D). As recalled above, the graph ϕ(Rr) has non-negative delays, a larger computational
power than R and satisfies ∆A(ϕ(Rr)) = ∆B(Rr) = χ(D).
In the second part of the proof we show that there exist no digital circuits with at least the same computational power as
R and with strictly fewer registers than χ(D). Let R′ = (V′, E′, ∆′) be a digital circuit with at least the same computational
power as R and let D′ be the unfolded graph associated with R′.
According to Theorem 2.1, there exists in D a one-periodic flow of cardinality |C| which defines a bijective mapping from
the nodes of C to the ones of C + L, for any non-negative integer L. Let us choose a specialization σ : u ↦ Fu in the following
way. Consider u ∈ V and let e1, . . . , ek be all the arcs in R with terminal node u (listed according to some total order on
E). Let el be the only arc which corresponds to a set of arcs in D belonging to the flow. Then we define Fu : Q^k → Q by
Fu(x1, . . . , xk) = xl. By composing the functions Fu, we get a mapping from the nodes of C to the ones of C + L of the form
F : Q^{|C|} → Q^{|C|} which is a permutation of the coordinates. In particular F is bijective.
For instance, consider Fig. 1. Rank the nodes of the cut C in the order: (2, k) < (3, k) < (4, k − 1) < (4, k). For L = 1,
the corresponding function is F(x1, x2, x3, x4) = (x1, x3, x4, x2). For L = 2, the function is F(x1, x2, x3, x4) = (x1, x4, x2, x3). For
L = 3, the function F is the identity.
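The three permutations quoted above are consistent with one another: composing the map for L = 1 with itself gives the map for L = 2, and a third application gives the identity. A short check, with the hypothetical name `shift_once` standing for the L = 1 map:

```python
from typing import Tuple


def shift_once(x: Tuple) -> Tuple:
    """The coordinate permutation given in the text for L = 1."""
    x1, x2, x3, x4 = x
    return (x1, x3, x4, x2)


start = ("x1", "x2", "x3", "x4")
after_two = shift_once(shift_once(start))     # should match the map given for L = 2
after_three = shift_once(after_two)           # should be the identity, as stated for L = 3
```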
Observe that when we let the initial condition I vary over all the possible values in Q^{∆A(R)}, the values of the
computation of (R, σ, I) in the cut C, namely (x(u, n))_{(u,n)∈C}, cover all the values in Q^{|C|}.
Since R′ has a larger computational power than R, there exists a specialization σ′ of R′ and an injective mapping
θ : V × Z → V′ × Z such that each sequence (x(u, n))_{u∈V, n∈Z+} computed by (R, σ, I) for some initial condition I is related to
a sequence (y(u, n))_{u∈V′, n∈Z+} computed by (R′, σ′, I′) for an adequate initial condition I′, by x(u, n) = y ◦ θ(u, n).
Let C′ be a minimum consecutive cut of D′. Let S′ and P′ be the positive and negative splinters associated with C′.
Let θ(C) be the image by θ of the cut C in D′. Since θ is injective, we can assume by translating S′ and by choosing L large
enough, that θ(C) ⊂ P′ and θ(C + L) ⊂ S′. This means that θ(C) is on the “left” of C′ and θ(C + L) is on the “right” of C′. Let U′
be the subset of V′ consisting of the nodes with no predecessor. By definition, given a specialization σ′ of R′, we have
∀u ∈ U′, ∃cu ∈ Q, ∀I′, ∀n ∈ Z+, y(u, n) = cu,
where (y(u, n))_{u∈U′, n∈Z+} are the values computed by (R′, σ′, I′). We consider all the paths of D′ ending in θ(C + L). Any such
path intersects the cut C′ or is finite and starts in a node of the type (u, n), u ∈ U′. Hence for any node (w, n) ∈ θ(C + L), we have
y(w, n) = G([cu]_{u∈U′}; [y(v, k)]_{(v,k)∈C′}) for some function G depending only on σ′. We conclude that for a fixed specialization
of R′, the variables (y(u, n))_{(u,n)∈θ(C+L)} can take at most |Q|^{|C′|} different values when the initial condition I′ varies. Since F is
bijective from Q^{|C|} into itself, it follows that |C| ≤ |C′|.
Proposition 3.3. Let R be a digital circuit with n functional elements and m arcs. The circuit ϕ(Rr) can be constructed in
O(n(m+ n log n)) units of time.
Proof. The construction can be decomposed into several steps.
1. Computing the “max-flow min-cut” can be reduced to a maximum-weight perfect matching problem in a bipartite
(undirected) graph and its dual one, which can be done with time complexity O(n(m+ n log n)); see, e.g., [2].
2. The retiming and forward splitting operations are local and take O(m) units of time.
3. The duplication operation is also local. The number of duplications is bounded by ∑_{e∈E} ∆(e) = O(nm).
Theorem 3.2 deserves several comments:
1. Given a digital circuit R with unfolding D, the quantity χ(D) can be seen as the intrinsic quantity of memory needed to
carry out all the computations which could be wired by R.
2. In the degenerate case where R is acyclic, the result Min-Reg(R) = χ(D) = 0 clearly holds. Computing the relevant
retiming is easy in this case.
3. The digital circuit R (nodes V) can be replaced by the digital circuit ϕ(Rr) (nodes ϕ(V)) without loss of computational
power. It is however necessary to ask if the mapping θ : V × Z+ → ϕ(V) × Z+, defined in (2), is simple enough. Actually,
it follows from (5) that the mapping θ is elementary.
4. To each minimal cut of the unfolding of R corresponds an associated retiming r of R. If there exists such a retiming r for
which the retimed graph Rr contains only non-negative delays, then duplication is useless and only forward splitting is
necessary to get a circuit with a minimal number of registers. In that case, the optimal retiming technique given in [9]
coincides with our construction. However, if every such retimed graph Rr contains negative delays, duplication is always necessary
in our construction and the optimal retiming technique given in [9] provides a circuit with strictly more registers than
ϕ(Rr).
3.4. An illustrating example
In this subsection, we go through a small example to show how the construction works.
Let us consider the digital circuit displayed in Fig. 7(a). Functional gates are represented by nodes with funny shapes
and registers by boxes. The initial number of registers is 5. Here, retiming alone does not help in reducing the number of
registers.
Following our algorithm, the construction of the associated unfolded graph is given in Fig. 7(b). The size of the maximal
flow is 4. This means that one can find an equivalent circuit with 4 registers. The shape of the corresponding splinter gives
the retiming to be applied. Duplicating node 1 finishes the construction of the circuit, displayed in Fig. 7(c). The new circuit
is equivalent to the original circuit, with the sequence of values computed in node 3 being shifted by one.
Acknowledgements
The authors wish to thank Alexander V. Karzanov who provided them with a simple proof of Theorem 2.1 and an
anonymous referee who pointed out the work of Orlin on dynamic graphs.
Fig. 7. A circuit and its optimized version. (a) A digital circuit with 5 registers. (b) The associated unfolding, its maximal flow and the
corresponding positive splinter. (c) The new circuit has 4 registers and node 1 has been duplicated.
References
[1] V. Adlakha, V. Kulkarni, A classified bibliography of research on stochastic PERT networks: 1966–1987, INFOR 27 (3) (1989) 272–296.
[2] R. Ahuja, T. Magnanti, J.B. Orlin, Network Flows, Prentice Hall Inc., 1993.
[3] J.-C. Bermond, A. Bonnecaze, T. Kodate, S. Perennes, P. Solé, Symmetric flows and broadcasting in hypercubes, in: Symposium en mémoire de F. Jaegger,
Grenoble, France, 1998.
[4] A. Darte, Y. Robert, F. Vivien, Scheduling and Automatic Parallelization, Birkhäuser, 2000.
[5] B. Gaujal, A. Jean-Marie, J. Mairesse, Computations of uniform recurrence equations using minimal memory size, SIAM J. Comput. 30 (5) (2000)
1701–1738.
[6] S.H. Gerez, Algorithms for VLSI Design Automation, Wiley, 1999.
[7] G.D. Hachtel, F. Somenzi, Logic Synthesis and Verification Algorithms, Kluwer Academic Publishers, 2000.
[8] R. Karp, R. Miller, S. Winograd, The organization of computations for uniform recurrence equations, J. Assoc. Comput. Mach. 14 (3) (1967) 563–590.
[9] C. Leiserson, J. Saxe, Retiming synchronous circuitry, Algorithmica 6 (1991) 5–35.
[10] G. de Micheli, Synthesis and Optimization of Digital Circuits, McGraw-Hill, 1994.
[11] J.B. Orlin, Maximum-throughput dynamic network flows, Math. Program. 27 (2) (1983) 214–231.
[12] J.B. Orlin, Minimum convex cost dynamic network flows, Math. Oper. Res. 9 (2) (1984) 190–207.
