Abstract-Transformations using retiming and resynthesis operations are the most important and practical (if not the only) techniques used in optimizing synchronous hardware systems. Although these transformations have been studied extensively for over a decade, questions about their optimization capability and verification complexity are not answered fully. Resolving these questions may be crucial in developing more effective synthesis and verification algorithms. This paper settles the above two open problems. The optimization potential is resolved through a constructive algorithm which determines if two given finite state machines (FSMs) are transformable to each other via retiming and resynthesis operations. Verifying the equivalence of two FSMs under such transformations, when the history of iterative transformation is unknown, is proved to be polynomial-space-complete and hence just as hard as general equivalence checking, contrary to a common belief. As a result, we advocate a conservative design methodology for the optimization of synchronous hardware systems to ameliorate verifiability. Our analysis reveals some properties about initializing FSMs transformed under retiming and resynthesis. On the positive side, a lag-independent bound is established on the length increase of initialization sequences for FSMs under retiming. It allows a simpler incremental construction of initialization sequences compared to prior approaches. On the negative side, we show that there is no analogous transformation-independent bound when resynthesis and retiming are iterated. Nonetheless, an algorithm computing the exact length increase is presented.
Retiming and Resynthesis: A Complexity Perspective
Jie-Hong R. Jiang, Member, IEEE, and Robert K. Brayton, Fellow, IEEE
Index Terms-Computational complexity, equivalence verification, finite state machine (FSM), initialization sequence, resynthesis, retiming.
I. INTRODUCTION
RETIMING [9], [10] is an elementary yet effective technique in optimizing synchronous hardware systems. By simply repositioning registers, it is capable of rescheduling computation tasks in an optimal way subject to some design criteria. As both an advantage and a disadvantage, retiming preserves the circuit structure of the system under consideration. It is an advantage in that it supports incremental engineering change with good predictability and a disadvantage in that the optimization capability is somewhat limited. Therefore, resynthesis [1], [13], [14] was proposed to be combined with retiming, allowing modification of circuit structures. This combination of retiming and resynthesis certainly extends the optimization power of retiming, but to what extent remains an open problem, even though some notable progress has been made since [13], e.g., [18], [19], and [25]. Fully resolving this problem is crucial in understanding the complexity of verifying the equivalence of systems transformed by retiming and resynthesis and in constructing correct initialization sequences. In fact, despite its effectiveness, the transformation of retiming and resynthesis is not widely used in hardware synthesis flows due to the verification hindrance and the initialization problem. Progress in these areas could enhance the practicality and application of retiming and resynthesis, and advance the development of more effective synthesis and verification algorithms.
This paper tackles three main problems regarding retiming and resynthesis.
1) Optimization power: What is the transformation power of retiming and resynthesis? How can we tell if two synchronous systems are transformable to each other with retiming and resynthesis operations?
2) Verification complexity: What is the computational complexity of verifying if two synchronous systems are equivalent under retiming and resynthesis?
3) Initialization: How does the transformation of retiming and resynthesis affect the initialization of a synchronous system? How can we correct initialization sequences?
Our main results include the following.
1) Characterize constructively the transformation power of retiming and resynthesis (Section III).
2) Prove the polynomial space (PSPACE)-completeness of verifying the equivalence of systems transformed by retiming and resynthesis operations when the transformation history is lost (Section IV).
3) Demonstrate the effects of retiming and resynthesis on the initialization sequences of synchronous systems, and present an algorithm correcting initialization sequences (Section V).
This paper is organized as follows. After Section II introduces some preliminaries and notation, our main results are presented in Sections III-V. In Section VI, a closer comparison with prior work is detailed. Section VII concludes this paper and outlines some future research directions. 
II. PRELIMINARIES
A. Synchronous Hardware Systems
Based on [9], a syntactical definition of synchronous hardware systems can be formulated as follows. A hardware system is abstracted as a directed graph, called a communication graph, G = (V, E) with typed vertices V and weighted edges E. Every vertex v ∈ V represents either the environment or a functional element. The vertex representing the environment is the host, which is of type undefined; a vertex is of type f if the functional element it represents is of function f (which can be a multiple-output function consisting of f_1, f_2, ...). Every edge e_w = (u, v)_w ∈ E with a nonnegative integer-valued weight w corresponds to the interconnection from vertex u to vertex v interleaved by w state-holding elements (or registers). From the viewpoint of hardware systems, any component in a communication graph disconnected from the host is redundant. Hence, in the sequel, we assume that a communication graph is a single connected component. A hardware system is synchronous if, in its corresponding communication graph, every cycle contains at least one positive-weighted edge. This paper is concerned with synchronous hardware systems whose registers are all triggered by the same clock ticks. Moreover, according to the initialization mechanism, a register can be reset either explicitly or implicitly. For registers with explicit reset, their initial values are determined by some reset circuitry when the system is powered up. In contrast, for registers with implicit reset, their initial values can be arbitrary, but can be brought to an identified set of states (i.e., the set of initial states) by applying some input sequences, the so-called initialization (or reset) sequences [17]. It turns out that explicit-reset registers can be replaced with implicit-reset ones plus some reset circuitry [14], [21].
Doing so admits a more systematic treatment of retiming synchronous hardware systems because retiming explicit-reset registers needs special attention to maintain equivalent initial states. Without loss of generality, this paper assumes that all registers have implicit reset. In addition, we are concerned with initializable systems, that is, there exist input sequences which bring the systems from any state to some set of designated initial states.
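As an illustration of the synchrony condition above, the following minimal Python sketch checks whether every cycle of a communication graph contains a positive-weighted edge, which is equivalent to the subgraph of zero-weight edges being acyclic. The edge-list encoding `(u, v, w)` and the function name `is_synchronous` are assumptions made for this example only and do not come from the paper.

```python
from collections import defaultdict

def is_synchronous(edges):
    """edges: iterable of (u, v, w), w a nonnegative register count.

    Returns True iff every cycle contains at least one positive-weight
    edge, i.e., the zero-weight subgraph has no cycle.
    """
    zero_succ = defaultdict(list)
    vertices = set()
    for u, v, w in edges:
        if w < 0:
            raise ValueError("edge weights must be nonnegative")
        vertices.update((u, v))
        if w == 0:
            zero_succ[u].append(v)

    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in vertices}

    def has_zero_cycle(start):
        # Iterative depth-first search; a GRAY successor closes a cycle.
        color[start] = GRAY
        stack = [(start, iter(zero_succ[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if color[nxt] == GRAY:
                    return True
                if color[nxt] == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(zero_succ[nxt])))
                    break
            else:
                color[node] = BLACK
                stack.pop()
        return False

    return not any(color[v] == WHITE and has_zero_cycle(v) for v in vertices)
```

A purely combinational loop, such as two vertices connected to each other by zero-weight edges, is rejected, whereas the same loop with one register on it is accepted.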
The semantical interpretation of synchronous hardware systems can be modeled as finite state machines (FSMs). An FSM M is a tuple (Q, I, Σ, Ω, δ, λ), where Q is a finite set of states, I ⊆ Q is the set of initial states, Σ and Ω are the input and output alphabets, respectively, and δ : Σ × Q → Q (respectively, λ : Σ × Q → Ω) is the transition function (respectively, output function). Let V_S, V_I, and V_O be the sets of variables that encode the states, input alphabet, and output alphabet, respectively. Then, Q, Σ, and Ω are the sets of valuations of V_S, V_I, and V_O, respectively. As a convention, for a (current-)state variable s, its primed version s′ denotes the corresponding next-state variable.
To construct an FSM from a communication graph G = (V, E), for the sake of convenience we build another communication graph G′ = (V′, E′) from G as follows. Initially, let V′ = V and E′ = {e_w ∈ E | w = 0, 1}. For each (u_1, u_2)_w ∈ E with w ≥ 2, we introduce w − 1 new vertices of type identity mapping to V′, say {v_1, ..., v_{w−1}}, and add w new edges {(u_1, v_1)_1, (v_1, v_2)_1, ..., (v_{w−1}, u_2)_1} to E′.
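The edge-splitting construction can be sketched as follows. The sketch assumes that each edge of weight w ≥ 2 is replaced by a chain of w unit-weight edges through the w − 1 new identity vertices; the dictionary and tuple encodings, the function name `split_multi_register_edges`, and the generated vertex names `id_<edge>_<k>` are hypothetical choices made for this example.

```python
def split_multi_register_edges(vertices, edges):
    """Build G' from G so that every edge carries at most one register.

    vertices: dict mapping vertex name -> type (e.g., "f" or None for host)
    edges:    list of (u, v, w) with nonnegative integer weight w
    """
    new_vertices = dict(vertices)
    new_edges = []
    for idx, (u, v, w) in enumerate(edges):
        if w <= 1:
            # Edges of weight 0 or 1 are copied unchanged.
            new_edges.append((u, v, w))
            continue
        # w - 1 fresh identity vertices form a chain between u and v.
        chain = [u] + [f"id_{idx}_{k}" for k in range(1, w)] + [v]
        for name in chain[1:-1]:
            new_vertices[name] = "identity"
        # w consecutive unit-weight edges; the register count is preserved.
        new_edges.extend((a, b, 1) for a, b in zip(chain, chain[1:]))
    return new_vertices, new_edges
```

After the split, every register sits on its own edge, so one current-state variable and one next-state variable can be attached to each unit-weight edge.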
With the so-constructed G′, we can associate a current-state variable and a next-state variable to each (u, v)_1 ∈ E′ to denote the output and input of the register on this edge, respectively. Let the transitive fanin cone rooted at a nonhost vertex v ∈ V′, denoted as TFI(v), be the set of nonhost vertices u ∈ V′ such that either u = v or there exists
The transition function of a state variable s is the overall function defined by the vertices in TFI(t) for t the fanin vertex of the register associated with state variable s. Similarly, an output function is the overall function defined by the vertices in TFI(t) for t the fanin vertex of the corresponding output variable. Since any circuit implementing an FSM can be abstracted as a communication graph, a communication graph can be seen as a realization of an FSM.
The behavior of an FSM can be described in another graphical representation, the so-called state diagram [8] or state transition graph (STG). The STG Γ = (N, A) of an FSM (Q, I, Σ, Ω, δ, λ) has nodes N representing states Q and labeled arcs A representing transitions specified by δ and λ. A detailed construction can be found, e.g., in [8] .
We define a strong form of state equivalence which will govern the study of the transformation power of retiming.
Definition 1: Given an FSM M = (Q, I, Σ, Ω, δ, λ), two states q_1, q_2 ∈ Q are immediately equivalent, denoted as q_1 ≅ q_2, if δ(σ, q_1) = δ(σ, q_2) and λ(σ, q_1) = λ(σ, q_2) for every σ ∈ Σ.
Notice that ≅ is reflexive, symmetric, and transitive, and thus is an equivalence relation. Also, note that the immediate equivalence differs from the standard state equivalence [8], which says that two states of an FSM are equivalent if starting from either of the two states the FSM is indistinguishable in its input-output behavior. Fig. 1.
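To make the relation concrete, the following Python sketch partitions the states of a small FSM into classes of immediately equivalent states. It assumes the reading of Definition 1 in which two states are immediately equivalent when, under every input, they produce the same output and move to the same next state. The table encoding `delta[(sigma, q)]`, `lam[(sigma, q)]` and the function name are assumptions of this example.

```python
from collections import defaultdict

def immediate_equivalence_classes(states, inputs, delta, lam):
    """Group states by their one-step signature.

    delta[(sigma, q)] is the next state and lam[(sigma, q)] the output.
    Two states land in the same class iff they agree on both tables
    for every input symbol (assumed reading of Definition 1).
    """
    classes = defaultdict(list)
    for q in states:
        signature = tuple((delta[(s, q)], lam[(s, q)]) for s in inputs)
        classes[signature].append(q)
    return sorted(sorted(c) for c in classes.values())

# Hypothetical three-state example: A and B agree on every input.
states, inputs = ["A", "B", "C"], [0, 1]
delta = {(0, "A"): "C", (1, "A"): "A",
         (0, "B"): "C", (1, "B"): "A",
         (0, "C"): "B", (1, "C"): "C"}
lam = {(0, "A"): 0, (1, "A"): 1,
       (0, "B"): 0, (1, "B"): 1,
       (0, "C"): 0, (1, "C"): 0}
print(immediate_equivalence_classes(states, inputs, delta, lam))
```

Unlike the standard state equivalence, the check inspects a single transition step only, so no fixed-point iteration is needed to decide whether two given states are immediately equivalent.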
The introduced three representations, communication graphs, FSMs, and STGs, are used throughout this paper to represent synchronous hardware systems. Although these representations are interchangeable, their succinctness in representing sequential systems may differ and affect the measures in complexity analysis. To represent synchronous hardware systems with FSMs, the input size is measured mainly by the length of the formulas of transition and output functions. For the communication graph representation, the input size is measured by the length of representing typed vertices and weighted edges. Since the translation between an FSM and a communication graph is often linear, these two representations of synchronous hardware systems are of similar succinctness. On the other hand, STGs are graphs whose sizes are measured by the number of vertices (states) and edges (transitions). Translating an FSM or a communication graph into an STG suffers the so-called state explosion problem since the number of states is exponential in the number of state variables. Therefore, STGs are not efficient in representing synchronous hardware systems. However, they provide a friendly data structure to conceptualize the transformation power of retiming and resynthesis. In the sequel, complexity analysis may be conducted over different representations. It is important to notice the exponential gap between the STG representation and the other two representations.
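The exponential gap can be observed on a toy example. The sketch below enumerates the STG arcs of an FSM given by next-state and output functions over binary variables; the binary encoding, the function names, and the n-bit shift register used as the example circuit are all assumptions chosen for illustration.

```python
from itertools import product

def enumerate_stg(n_state_vars, n_input_vars, next_state, output):
    """Return the labeled arcs (q, x, y, q_next) of the STG.

    States and inputs are valuations (0/1 tuples) of the state and
    input variables, so the loop visits 2**n_state_vars states.
    """
    arcs = []
    for q in product((0, 1), repeat=n_state_vars):
        for x in product((0, 1), repeat=n_input_vars):
            arcs.append((q, x, output(x, q), next_state(x, q)))
    return arcs

def shift_register(n):
    """Functional description whose size is linear in n."""
    def next_state(x, q):
        return (x[0],) + q[:-1]   # shift the input bit in
    def output(x, q):
        return (q[-1],)           # the last bit is the output
    return next_state, output

for n in (2, 4, 8):
    ns, out = shift_register(n)
    arcs = enumerate_stg(n, 1, ns, out)
    print(n, "state variables:", 2 ** n, "states,", len(arcs), "arcs")
```

The functional description grows linearly with the number of registers, while the enumerated graph doubles in size with each added state variable.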
B. Retiming
A retiming operation over a synchronous hardware system consists of a series of atomic moves of registers across functional elements in either a forward or backward direction. The relocation of registers is crucial in exploring optimal synchronous hardware systems with respect to various design criteria, such as area, performance, power, etc. As this is not our focus, the exposition of retiming from the optimization perspective is omitted in this paper. Interested readers are referred to [10]. Formally speaking, retiming can be described with a retime function [9] over a communication graph as follows.
Definition 3: Given a communication graph G = (V, E), a retime function ρ : V → Z maps each vertex to an integer, called the lag of the vertex, such that w + ρ(v) − ρ(u) ≥ 0 for every edge (u, v)_w ∈ E. A retime function ρ is normalized if ρ(host) = 0.
Given a communication graph G = (V, E), any retime function ρ over G uniquely determines a "legally" retimed communication graph G† = (V, E†), in which each edge (u, v)_w ∈ E is replaced by the edge (u, v)_{w†} ∈ E† with w† = w + ρ(v) − ρ(u). By symmetry, the retime function −ρ reverses the retiming from G† to G. Fig. 2 shows the retime functions of a vertex v in some communication graph corresponding to atomic backward and forward moves of registers.
Retime functions can be naturally classified by calibrating their equivalences as follows.
Definition 4: Given a communication graph G, two retime functions ρ 1 and ρ 2 are equivalent if they result in the same retimed communication graph.
Proposition 1: Given a retime function ρ 1 with respect to a communication graph, let ρ 2 = ρ 1 − c for some constant c ∈ Z. Then, ρ 1 and ρ 2 are equivalent.
Hence, any retime function can be normalized. This equivalence relation, which will be useful in the study of the increase of initialization sequences due to retiming, induces a partition over retime functions. Equivalent retime functions (with respect to some communication graph) form an equivalence class.
Proposition 2: Given a communication graph G, any equivalence class of retime functions is of infinite size; any equivalence class of normalized retime functions is of size either one or infinity (only when G contains components disconnected from the host). Furthermore, any equivalence class of retime functions has a normalized member.
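The following sketch applies a retime function to a communication graph and illustrates Proposition 1. It assumes the convention of [9] that an edge (u, v) of weight w receives the new weight w + ρ(v) − ρ(u), and that a normalized retime function assigns lag 0 to the host; the edge-list encoding and the names `retime` and `normalize` are hypothetical.

```python
def retime(edges, rho):
    """Return the retimed edge list, or raise if rho is not legal.

    edges: list of (u, v, w); rho: dict mapping vertex -> integer lag.
    Assumed convention: new weight = w + rho[v] - rho[u].
    """
    retimed = []
    for u, v, w in edges:
        w_new = w + rho[v] - rho[u]
        if w_new < 0:
            raise ValueError(f"negative weight on edge ({u}, {v})")
        retimed.append((u, v, w_new))
    return retimed

def normalize(rho, host="host"):
    """Shift all lags by a constant so that the host has lag 0."""
    c = rho[host]
    return {v: lag - c for v, lag in rho.items()}

# Atomic backward move across v: one register moves from the
# output edge of v to its input edge.
edges = [("host", "v", 0), ("v", "host", 1)]
rho = {"host": 0, "v": 1}
print(retime(edges, rho))

# Proposition 1: subtracting a constant from every lag changes nothing.
shifted = {v: lag - 5 for v, lag in rho.items()}
print(retime(edges, shifted) == retime(edges, rho))
```

Because only lag differences enter the new weights, every retime function on a graph connected to the host has exactly one normalized representative, obtained here by `normalize`.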
C. Resynthesis
A resynthesis operation over a function f rewrites the syntactical formula representation of f while maintaining its semantical functionality. Clearly, the set of all possible rewrites is infinite (but countable, namely, with the same cardinality as the set N of natural numbers). When a resynthesis operation is performed upon a synchronous hardware system, we shall mean that the transition and output functions of the corresponding FSM are modified in representations but preserved in functionalities. This modification in representations will be reflected in the communication graph of the system. Again, such rewrites are usually subject to some optimization criteria. Since this is not our focus, the optimization aspects of resynthesis operations are omitted. See, e.g., [1] for further treatment.
The effects of retiming and resynthesis on a communication graph G = (V, E) are important for our later development and worth emphasis. Retiming only alters the weights (i.e., numbers of registers on edges) of edges E, whereas the vertices and their connections of G are not affected by retiming. Resynthesis, on the other hand, can change both the vertices and their connections. However, since it needs to preserve the functionalities of transition and output functions, it can only modify a purely combinational block (i.e., a set of vertices along with the zero-weight edges connecting them). Therefore, edges E⁺ ⊆ E with positive weights remain intact throughout resynthesis while vertices V and edges E \ E⁺ can be completely changed. The optimization capabilities of retiming and resynthesis are complementary.
III. OPTIMIZATION CAPABILITY
The transformation power of retiming and resynthesis can be understood best with STGs defined by FSMs. We investigate how retiming and resynthesis operations can alter STGs.
A. Optimization Power of Retiming
Given a communication graph G = (V, E), we study how the atomic forward and backward moves of retiming affect the STG of the FSM M that G realizes.
To study the effect of an atomic backward move, consider a normalized retime function ρ with ρ(v) = 1 for some vertex v ∈ V as shown in Fig. 3 , and ρ(u) = 0 for all u ∈ V \ {v}. (Because a retiming operation can be decomposed as a series of atomic moves, analyzing ρ defined above suffices to demonstrate the effect.) Let V S = {s 1 , . . . , s n } be the state variables of M. Then, according to the atomic backward move of retiming, V S can be partitioned into two disjoint subsets: V S = {s 1 , . . . , s i }, those changed by retiming, and
Moreover, since f is a total function, every state of M† has a corresponding state in M related by R. It corresponds to the fact that backward moves of retiming cannot increase the length of initialization sequences, the subject to be discussed in Section V. On the other hand, since f may not be a surjective (or an onto) mapping in general, there may be some state q of M outside the range of f, that is, no states can transition to q. In this case, q can be seen as being annihilated after retiming. To summarize, we give the following. [19], where the phenomena of state creation and annihilation were omitted.)
Note that two immediately equivalent states, say q 1 and q 2 , of an FSM may become not immediately equivalent when their common successor state splits into multiple states due to backward retiming. In this case, q 1 and q 2 may transition to different successors and become not immediately equivalent. In contrast, two nonimmediately equivalent states of an FSM may possibly become immediately equivalent when their successor states are merged due to forward retiming. Therefore, retiming may not preserve the state relation of immediate equivalence. That is, this equivalence relation is not an invariant under retiming. However, the relation of standard state equivalence [11] is an invariant even under retiming and resynthesis to be discussed in Section III-B. Also, notice that, in a single atomic forward move of retiming, transitions among the newly created states are prohibited. In contrast, when a sequence of atomic forward moves m 1 , . . . , m n are performed, the newly created states at move m i can possibly have predecessor states created in later moves m i+1 , . . . , m n . Therefore, all the newly created states not merged with original existing states by immediate equivalence are dangling. However, to be shown in Section V-A, the transition paths among these dangling states cannot be arbitrarily long.
Since a retiming operation consists of a series of commutative atomic moves, Lemmas 1 and 2 set the fundamental rules of all possible changes of STGs by retiming. Observe that a retiming operation is always associated with some structure (i.e., a communication graph). For a fixed structure, a retiming operation has limited optimization power because the configurations of register positions are finite and confined to the structure. That is, there may not exist a series of atomic moves of retiming (over a communication graph) which meet arbitrary targeted changes on an STG with respect to the manipulations on immediately equivalent states. In fact, the converses of Lemmas 1 and 2 are not true (that is, there may not exist atomic moves of retiming achieving some designated state splitting, merging, creation, and/or annihilation) since one can design a communication graph in a way that the register positions are fixed and thus immediately equivalent states cannot be manipulated as desired. Fig. 4 shows an example where the register position cannot be changed. Unlike a retiming operation, a resynthesis operation provides the capability of modifying the vertices and connections of a communication graph.
B. Optimization Power of Retiming and Resynthesis
A resynthesis operation itself cannot contribute any changes to the STG of an FSM. However, when combined with retiming, it becomes a handy tool. In essence, the combination of retiming and resynthesis validates the converse of Lemmas 1 and 2 as will be shown in Theorem 1. Moreover, it determines the transitions of newly created states due to forward retiming moves, and thus has decisive effects on initialization sequences as will be discussed in Section V-B. On the other hand, we shall mention an important property about retiming and resynthesis operations.
Lemma 3: Given an FSM, the newly created states (not existing in the original STG) due to atomic moves of retiming remain dangling throughout iterative retiming and resynthesis operations if not merged with the original existing states due to immediate equivalence.
Proof: We prove the lemma by induction on the structure of STGs modified by retiming. Notice that resynthesis is not capable of modifying an STG but is useful in increasing retiming configurations.
In the base case, there are no newly created states initially. Thus, no newly created states can become nondangling. In the inductive case, assume that, before and at the kth iteration of retiming (and resynthesis), no newly created dangling states become nondangling if not merged with the original existing states. Suppose the (k + 1)th iteration is performed. Four cases induced by retiming need to be analyzed: state annihilation, creation, merge, and split. However, no dangling states can become nondangling due to state annihilation and creation. We only need to focus on state merge and split. For state merge, merging two dangling immediately equivalent states yields no nondangling state because the predecessor states of the new merged state are all dangling. In other words, a state derived from merging two immediately equivalent states is nondangling only if at least one of its original two states is nondangling. However, in the inductive hypothesis, we assume that no newly created dangling states become nondangling before and at the kth iteration. The nondangling states must exist in the original STG. Consequently, no dangling states can become nondangling without merging with the original existing states. For state split, splitting a state q into multiple immediately equivalent states q 1 and q 2 redistributes any incoming edge to q to either q 1 or q 2 . As a consequence, if q is dangling, then q 1 and q 2 must be dangling as well because all predecessor states of q (and thus of q 1 and q 2 ) are dangling. That is, no dangling states can become nondangling due to state split. Therefore, the newly created states due to retiming remain dangling throughout iterative retiming and resynthesis operations if not merged with the original existing states.
Remark 1: As an orthogonal issue to our discussion on how retiming and resynthesis can alter the STG of an FSM, the transformation of retiming and resynthesis was shown [14] to have the capability of exploiting various state encodings (or assignments) of an FSM.
Notice that the induced state space of the dangling states originating from atomic moves of retiming is immaterial in our study of the optimization capability of retiming and resynthesis because an FSM after initialization never reaches such dangling states. An exact characterization of the optimization power of retiming and resynthesis is given as follows.
Theorem 1: Ignoring the (unreachable) dangling states created due to retiming, two FSMs are transformable to each other through retiming and resynthesis if, and only if, their STGs are transformable to each other by a sequence of splitting a state into multiple immediately equivalent states and of merging multiple immediately equivalent states into a single state.
Proof: (=⇒) Since resynthesis does not change the transition functions of an FSM, the proof is immediate from Lemmas 1 and 2.
(⇐=) Given a target sequence of merging and splitting of immediately equivalent states, it can be accomplished by a sequence of retiming and resynthesis. Essentially, each merging (respectively splitting) of states can be achieved with a resynthesis operation followed by a forward (respectively backward) retiming operation. To see why, let Σ and Q be the input alphabet and state set of M, respectively. Without loss of generality, assume that q_1, q_2 ∈ Q are immediately equivalent states to be merged. (Merging more than two states can be done similarly.) As illustrated in Fig. 5, a resynthesis operation can rewrite the original transition functions δ : Σ × Q → Q as a composition of two parts, δ(σ, q) = ∆_2(σ, ∆_1(q)), where ∆_1 : Q → Q \ {q_2} and ∆_2 : Σ × (Q \ {q_2}) → Q.
In addition, ∆_1(q_2) = q_1, and ∆_1(q) = q for q ≠ q_2. Retiming registers forward to the positions in between ∆_1 and ∆_2 results in a new state transition function ∆_1(∆_2(σ, q)), as shown in Fig. 5(c). The new transition function in effect merges immediately equivalent states q_1 and q_2. Notice that the retiming operation is always possible because the output functions can be rewritten to depend on Q \ {q_2} without affecting the global behavior of M.
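The merging step of the proof can be replayed on explicit transition tables. The sketch below assumes that ∆_1 maps q_2 to q_1 and every other state to itself, and that ∆_2 is δ restricted to Q \ {q_2}; it checks the decomposition δ(σ, q) = ∆_2(σ, ∆_1(q)) and then forms the transition function ∆_1(∆_2(σ, q)) obtained once the registers sit between the two parts. The table encoding and the function name `merge_by_retiming` are hypothetical, and output functions are left out for brevity.

```python
def merge_by_retiming(states, inputs, delta, q1, q2):
    """Merge q2 into q1 via the two-part decomposition of delta.

    delta[(sigma, q)] is the next state. Only the transition part of
    immediate equivalence is checked here; outputs are omitted.
    """
    if any(delta[(s, q1)] != delta[(s, q2)] for s in inputs):
        raise ValueError("q1 and q2 do not share all successors")

    # Delta_1: Q -> Q \ {q2}, collapsing q2 onto q1 (assumed form).
    d1 = {q: (q1 if q == q2 else q) for q in states}
    kept = [q for q in states if q != q2]
    # Delta_2: delta restricted to the remaining states (assumed form).
    d2 = {(s, q): delta[(s, q)] for s in inputs for q in kept}

    # Resynthesis must preserve the function: delta = Delta_2(s, Delta_1(q)).
    assert all(delta[(s, q)] == d2[(s, d1[q])] for s in inputs for q in states)

    # Forward retiming: the state is now the output of Delta_1.
    new_delta = {(s, q): d1[d2[(s, q)]] for s in inputs for q in kept}
    return kept, new_delta

# Hypothetical example in which A and B have identical successors.
states, inputs = ["A", "B", "C"], [0, 1]
delta = {(0, "A"): "C", (1, "A"): "A",
         (0, "B"): "C", (1, "B"): "A",
         (0, "C"): "B", (1, "C"): "C"}
kept, new_delta = merge_by_retiming(states, inputs, delta, "A", "B")
print(kept, new_delta)
```

In the example, the transition from C under input 0, which originally led to B, is redirected to A, so B disappears from the retimed machine.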
On the other hand, assume q ∈ Q is the state to be split into multiple immediately equivalent states Q†, with Q† ∩ Q = ∅. As illustrated in Fig. 6, a resynthesis operation can again rewrite the original transition functions δ as a composition of two parts, δ = ∆_4 ∘ ∆_3, where ∆_3 : Σ × Q → (Q \ {q}) ∪ Q† and ∆_4 : (Q \ {q}) ∪ Q† → Q maps every state of Q† to q and every other state to itself.
Retiming registers to the positions in between ∆_3 and ∆_4 results in a new state transition function ∆_3(σ, ∆_4(q)) as shown in Fig. 6(c). The new transition function in effect splits q to Q†. Notice that the retiming is always possible because the output functions, originally depending on Q, can be rewritten (by resynthesis) as functions depending on (Q† ∪ Q) \ {q}. Consequently, any sequence of merging and splitting of immediately equivalent states is achievable using retiming and resynthesis operations.
A result similar to Theorem 1 appeared in [19], where, however, the optimization power of retiming and resynthesis was overstated, as will be detailed in Section VI. Notice that the statement of Theorem 1 is not constructive in the sense that no procedure is given to determine if two FSMs are transformable to each other under retiming and resynthesis. This weakness motivates us to study a constructive alternative.
From Theorem 1, one can show that retiming and resynthesis cannot alter the sequential (input-output) behavior of an FSM in the induced state subspace consisting of nondangling states.
Corollary 1: Given two FSMs M = (Q, I, Σ, Ω, δ, λ) and 
Since the so constructed R satisfies the two criteria along the state merging and splitting transformations, the corollary follows.
Since the relation R of Corollary 1 is a strict subset of the general state equivalence relation [11] , the input-output behavior of an FSM in the nondangling state subspace is not affected under retiming and resynthesis.
Remark 2: Peripheral retiming [14] generalizes standard retiming in that edges with negative weights are temporarily allowed. One might ask if this generalization increases the optimization power of retiming and resynthesis. The answer to this question is negative as we argue below.
Peripheral retiming and resynthesis work as follows. A peripheral retiming operation is performed on a communication graph G = (V, E) such that edges with negative weights are allowed to exist temporarily. A resynthesis operation is then performed on the peripherally retimed communication graph, yielding a new communication graph G† = (V†, E†). To ensure that edges of negative weights will be recovered to possess nonnegative weights later, the resynthesis operation needs to preserve these edges in the modified communication graph. Another retiming operation on G†, yielding G‡ = (V†, E‡), must ensure that all edges in E‡ are of nonnegative weights. If the last step fails, the entire transformation is illegal. We are concerned with legal transformation only. Observe that the edges with nonzero weights in E† survive throughout the above operations (as discussed at the end of Section II). That is, these edges exist in both E and E‡ as well, except for some weight changes due to the retiming operations before and after resynthesis. Valuations on state variables of G (respectively G‡) induce valuations on the variables of these edges in G (respectively G‡). Let Q and Q‡ be the state sets of G and G‡, respectively. State pairs (q ∈ Q, q‡ ∈ Q‡) yielding the same valuations on these edges form a state relation of immediate equivalence, similar to the arguments for Lemma 1. Even iterating peripheral retiming and resynthesis cannot provide more transformation power than that specified in Theorem 1. Hence, when combined with resynthesis, peripheral retiming does not provide more transformation power than standard retiming.
It is noteworthy that, although in theory peripheral retiming combined with resynthesis does not increase the transformation power of standard retiming combined with resynthesis, it is useful in practice for design optimization.
C. Retiming-Resynthesis Equivalence and Canonical Representation
Given an FSM, the transformation of retiming and resynthesis operations can rewrite it into a class of equivalent FSMs (constrained by Theorem 1). We ask if there exists a computable canonical representative in each such class, and answer this question affirmatively by presenting a procedure constructing it. Rather than arguing directly over FSMs in terms of transition and output functions, we simplify our exposition by arguing over STGs.
Because retiming and resynthesis operations are reversible, we have the following.
Proposition 3: Given STGs G, G_1, and G_2, suppose that G_1 and G_2 are derivable from G using retiming and resynthesis operations. Then, G_1 and G_2 are transformable to each other under retiming and resynthesis.
We say that two FSMs (STGs) are equivalent under retiming and resynthesis if they are transformable to each other under retiming and resynthesis. Thus, any such equivalence class is complete in the sense that any member in the class is transformable to any other member. To derive a canonical representative of any equivalence class, consider the algorithm outlined in Fig. 7. Similar to the general state minimization algorithm [8], the idea is to seek a representative minimized with respect to the immediate equivalence of states. However, unlike the least-fixed-point computation of the general state minimization, the computation in Fig. 7 looks for a greatest fixed point.⁶ Given an STG, the algorithm first removes all the dangling states, and then iteratively merges immediately equivalent states until no more states can be merged.
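The two phases described above can be sketched as follows. This is not the procedure of Fig. 7 itself but a minimal rendering under stated assumptions: a state is taken to be dangling if it has no predecessor or if all of its predecessors are dangling, immediate equivalence means identical successor and output under every input, and the handling of initial states is omitted. The table encoding and the name `construct_quotient_graph` are hypothetical, and state names are assumed to be comparable so that a deterministic representative can be chosen.

```python
from collections import defaultdict

def construct_quotient_graph(states, inputs, delta, lam):
    """Remove dangling states, then merge immediately equivalent states.

    delta[(sigma, q)] is the next state, lam[(sigma, q)] the output.
    Returns (remaining states, transition table, output table).
    """
    # Phase 1: repeatedly drop states without a surviving predecessor
    # (assumed inductive definition of dangling states).
    alive = set(states)
    while True:
        with_pred = {delta[(s, q)] for s in inputs for q in alive}
        dangling = alive - with_pred
        if not dangling:
            break
        alive -= dangling

    d = {(s, q): delta[(s, q)] for s in inputs for q in alive}
    out = {(s, q): lam[(s, q)] for s in inputs for q in alive}

    # Phase 2: merge classes of immediately equivalent states until
    # no two remaining states share a one-step signature.
    current = set(alive)
    while True:
        groups = defaultdict(list)
        for q in current:
            sig = tuple((d[(s, q)], out[(s, q)]) for s in inputs)
            groups[sig].append(q)
        merges = {q: min(g) for g in groups.values() if len(g) > 1 for q in g}
        if not merges:
            break
        current = {merges.get(q, q) for q in current}
        # Redirect arcs that pointed at a merged-away state.
        d = {(s, q): merges.get(d[(s, q)], d[(s, q)])
             for s in inputs for q in current}
        out = {(s, q): out[(s, q)] for s in inputs for q in current}
    return current, d, out
```

Merging one class can make further states immediately equivalent, because their successors may coincide only after the merge; this is why the second phase iterates rather than stopping after a single pass.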
Theorem 2: Given an STG G, Algorithm ConstructQuotientGraph produces a canonical state-minimized solution, which is equivalent to G under retiming and resynthesis.
Proof: It is clear that the algorithm always terminates for finite STGs.
Recall our assumption that FSMs are of implicit reset. Since dangling states do not affect the normal operation of an FSM (but affect its initialization), the algorithm can safely remove the state space induced by the dangling states and consider only the remaining state space. (See also Proposition 5.)
For the sake of contradiction, assume the algorithm produces two different (nonisomorphic) quotient graphs G 1/ and G 2/ for two given STGs G 1 and G 2 , respectively, which are equivalent under retiming and resynthesis. Because the algorithm merges only immediately equivalent states, G 1/ and G 2/ must also be equivalent under retiming and resynthesis (but not isomorphic by assumption). Since G 1/ and G 2/ are not isomorphic, there does not exist a bijection (a one-to-one and onto mapping) between states of G 1/ and states of G 2/ such that the bijection preserves immediate equivalence. Two cases need to be considered. First, there exists an onto but not one-to-one mapping from one graph to the other which preserves immediate equivalence. In this case, G 1/ and G 2/ are not both maximally reduced, which contradicts the assumption that no two states in a quotient graph are immediately equivalent. Second, there exists no mapping preserving immediate equivalence. However, from Proposition 3, we know that G 1/ is transformable to G 1 , then to G 2 , and finally to G 2/ . Hence, a mapping that preserves immediate equivalence must exist between G 1/ and G 2/ . Again a contradiction arises. The theorem follows.
6 In the fixed-point computation of the general state minimization, there is initially only one equivalence class, i.e., the universal state set. In the following iterative computation, the state space is refined monotonically, and thus the number of equivalence classes increases monotonically. It can be seen as a least fixed-point computation in the sense that it is analogous to the least fixed-point computation of reachability analysis, where the reached state set increases monotonically. However, unlike the general state minimization, the computation of Fig. 7 looks for a greatest fixed point in the following sense. Initially, every equivalence class is a singleton set, consisting of one state. Thus, the number of equivalence classes initially equals the number of states. In the iterative computation, equivalence classes are merged with respect to immediate equivalence, and the number of equivalence classes decreases monotonically.
For a naïve implementation based on explicit graph enumeration, Algorithm ConstructQuotientGraph can be done in time complexity O(kn^2), where k is the size of the input alphabet and n is the number of states. This complexity can be obtained from the following analysis.
Step 1 of 
. Therefore, the overall time complexity for Algorithm ConstructQuotientGraph is O(kn^2). Notice that the complexity is exponential when the input is given in FSM, rather than STG, representation. (We distinguish between an FSM, a tuple (Q, I, Σ, Ω, δ, λ), and its STG. For an FSM, its behavior is described with transition and output functions rather than in graph representation. The size (or complexity measure) of an FSM is in terms of the size of binary encodings of its transition and output functions. On the other hand, the size of an STG is in terms of its number of nodes, i.e., states, and edges, i.e., transitions.) For an implicit symbolic implementation, the complexity depends heavily on the internal symbolic representations. If Step 3 in Fig. 7 computes and merges all immediately equivalent states at once in a breadth-first-search manner, then the algorithm converges in a minimum number of iterations.
From the proof of Theorem 2, an algorithm outlined in Fig. 8 can check if two STGs are transformable to each other under retiming and resynthesis.
Theorem 3: Given two STGs, Algorithm VerifyEquivalenceUnderRetiming&Resynthesis verifies if they are equivalent under retiming and resynthesis.
Proof: A direct consequence of Theorem 2. Notice that the algorithm of Fig. 8 can be modified to construct retiming and resynthesis steps translating one FSM M 1 to the other M 2 . For instance, G 1 of M 1 can be first reduced to the quotient graph, from which we can reverse the reduction procedure of G 2 of M 2 and thus bring G 1 to G 2 . As shown in the proof of Theorem 1, retiming and resynthesis operations can be derived from these state manipulations.
Observe that the FSMs considered here are deterministic and have known initial states. Hence, the complexity of the algorithm in Fig. 8 is the same as that in Fig. 7, since the graph isomorphism check for such STGs takes O(kn) time and is not the dominating factor. With the presented algorithm, checking the equivalence under retiming and resynthesis is not easier than general equivalence checking. In the following section, we investigate its intrinsic complexity.
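To illustrate why isomorphism checking is cheap for deterministic STGs with known initial states, the following Python sketch matches two graphs by a lockstep traversal from their initial states. It is a minimal sketch rather than the procedure of Fig. 8. The function name `isomorphic_from_initial` and the encoding (each state maps to a tuple of `(next_state, output)` pairs, one per input symbol in a fixed order) are assumptions made for the example, and the sketch further assumes that every state of both graphs is reachable from the respective initial state.

```python
from collections import deque


def isomorphic_from_initial(g1, init1, g2, init2):
    """Sketch: lockstep traversal of two deterministic STGs.

    Each edge of g1 is inspected once, so the running time is proportional
    to k * n for k input symbols and n states.
    """
    match = {init1: init2}      # partial bijection from g1 states to g2 states
    used = {init2}              # g2 states already matched
    queue = deque([init1])
    while queue:
        s = queue.popleft()
        t = match[s]
        if len(g1[s]) != len(g2[t]):
            return False
        for (n1, o1), (n2, o2) in zip(g1[s], g2[t]):
            if o1 != o2:
                return False    # outputs disagree under the same input
            if n1 in match:
                if match[n1] != n2:
                    return False
            elif n2 in used:
                return False    # the mapping would not be one-to-one
            else:
                match[n1] = n2
                used.add(n2)
                queue.append(n1)
    # Assumption: all states are reachable, so the match must cover both graphs.
    return len(match) == len(g1) == len(g2)


# Hypothetical two-state STGs over a single input symbol.
g_a = {0: ((1, "a"),), 1: ((0, "b"),)}
g_b = {"p": (("q", "a"),), "q": (("p", "b"),)}
print(isomorphic_from_initial(g_a, 0, g_b, "p"))
```

Because the machines are deterministic, the image of each newly visited state is forced by the transition just followed, so no search over candidate mappings is needed.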
As an example, Fig. 9 shows three STGs (a), (b), and (c). Their equivalence under retiming and resynthesis can be checked by Theorem 3. It can be verified that STGs (a) and (b) are transformable to each other under retiming and resynthesis, but they are not transformable to STG (c).
IV. VERIFICATION COMPLEXITY
We show some complexity results of verifying if two FSMs are equivalent under retiming and resynthesis.
A. Verification With Unknown Transformation History
We investigate the complexity of verifying the equivalence of two FSMs with unknown history of (iterative) retiming and resynthesis operations.
Theorem 4: Determining if two FSMs are equivalent under iterative retiming and resynthesis with unknown transformation history is PSPACE-complete.
Proof: Certainly Algorithm VerifyEquivalenceUnderRetiming&Resynthesis can be performed in PSPACE (even with inputs in FSM representations).
On the other hand, we need to reduce a PSPACE-complete problem to our problem at hand. The following problem is chosen.
Given a total function f : {1, . . . , n} → {1, . . . , n}, is there a k such that composing f k times gives f^k(1) = n? In other words, the problem asks if n is "reachable" from 1 through f . It was shown [7] to be deterministic 7 LOGSPACE-complete in the unary representation and, thus, PSPACE-complete in the binary representation [16] . We show that the problem in the unary (respectively binary) representation is logspace (respectively polynomial-time) reducible to our problem with inputs in STG (respectively FSM) representations. We further establish that the answer to the PSPACE-complete problem is positive if and only if the answer to the corresponding equivalence verification problem (to be constructed) is negative. Since the complexity class of nondeterministic space is closed under complementation [4] , the theorem follows.
Fig. 9. STGs in (a) and (b) are equivalent under retiming and resynthesis transformation. Since states q 0 and q 1 in (a) are immediately equivalent, they can be merged and thus the STG can be simplified to that in (b). On the other hand, although the STG in (c) is equivalent to the previous two in terms of input-output behaviors, it is not equivalent to them under retiming and resynthesis transformation.
To complete the proof, we elaborate the reduction. Given a function f as stated earlier, we construct two total functions f 1 , f 2 : {0, 1, . . . , n} → {0, 1, . . . , n} as follows. Let f 1 have the same mapping as f over {1, . . . , n − 1} and have f 1 (0) = 1 and f 1 (n) = 1. Also let f 2 have the same mapping as f over {1, . . . , n − 1} with f 2 (0) = 1 but f 2 (n) = 0. Clearly the constructions of f 1 and f 2 can be done in LOGSPACE. Treating {0, 1, . . . , n} as the state set, functions f 1 and f 2 specify the transitions of two STGs (with an empty input alphabet), say G 1 and G 2 , respectively, as shown in Fig. 10 . In addition, let all the states of G 1 and G 2 have the same output observation. That is, the output functions of the FSMs of G 1 and G 2 do not distinguish states. Under this setting, observe that any state of G 1 (similarly G 2 ) has exactly one next state. Thus, every state is either in a single cycle or on a single path leading to a cycle. Observe also that two states of G 1 (similarly G 2 ) are immediately equivalent if and only if they have the same next state. An important consequence of these two observations is that any dangling state (not in a cycle) can eventually be merged, due to immediate equivalence, with some nondangling state (in a cycle) which has the same next state. By Theorem 1, this merging process can be achieved with retiming and resynthesis over the FSMs defined by G 1 and G 2 .
To see the relationship between reachability and the equivalence under retiming and resynthesis, consider the case where n is reachable from 1 through f . States 1 and n of G 1 must be in a cycle excluding state 0; states 1 and n of G 2 must be in a cycle including state 0. Hence, the state-minimized (with respect to immediate equivalence) graphs of G 1 and G 2 are not isomorphic. That is, G 1 and G 2 are not equivalent under retiming and resynthesis. On the other hand, consider the case where n is unreachable from 1 through f . Then, state n of G 1 and state n of G 2 are dangling. From the mentioned observations, merging dangling states with nondangling states in G 1 and in G 2 yields two isomorphic graphs. The isomorphism can be established by a mapping π from the set of nondangling states of G 1 to that of G 2 , and vice versa, with π(i) = i. That is, G 1 and G 2 are equivalent under retiming and resynthesis. Therefore, n is reachable from 1 through f if, and only if, G 1 and G 2 are not equivalent under retiming and resynthesis. Notice that, unlike the discussion of optimization capability, here we should not ignore the effects of retiming and resynthesis over the unreachable state space.
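The reduction can be exercised on small instances with the following Python sketch. It is offered only as an illustration under stated assumptions. It relies on the observations above that every state has exactly one next state and that all dangling states merge into cycle states, so the state-minimized graph is a disjoint union of cycles. The sketch therefore compares the multisets of cycle lengths as a stand-in for the isomorphism check and ignores initial states. All function names are hypothetical.

```python
from collections import Counter


def build_reduction(f, n):
    """Build f1 and f2 over {0, ..., n} from f over {1, ..., n} (f given as a dict)."""
    f1 = {i: f[i] for i in range(1, n)}
    f1[0] = 1
    f1[n] = 1
    f2 = dict(f1)
    f2[n] = 0
    return f1, f2


def cycle_states(nxt):
    """Strip predecessor-free states until none remain; the survivors lie on cycles."""
    alive = set(nxt)
    indeg = Counter(nxt.values())
    frontier = [s for s in alive if indeg[s] == 0]
    while frontier:
        s = frontier.pop()
        alive.discard(s)
        t = nxt[s]
        indeg[t] -= 1
        if indeg[t] == 0:
            frontier.append(t)
    return alive


def cycle_lengths(nxt):
    """Sorted list of cycle lengths of a functional graph."""
    seen, lengths = set(), []
    for s in cycle_states(nxt):
        if s in seen:
            continue
        length, t = 0, s
        while t not in seen:
            seen.add(t)
            t = nxt[t]
            length += 1
        lengths.append(length)
    return sorted(lengths)


def reachable(f, n):
    """Is n reachable from 1 by iterating f?"""
    s, seen = 1, set()
    while s not in seen:
        if s == n:
            return True
        seen.add(s)
        s = f[s]
    return False


# n = 3 is reachable from 1, so the two minimized graphs differ.
f = {1: 2, 2: 3, 3: 1}
g1, g2 = build_reduction(f, 3)
print(reachable(f, 3), cycle_lengths(g1), cycle_lengths(g2))
```

For this instance, the cycle through states 1 and 3 excludes state 0 in the first graph and includes it in the second, so the cycle lengths are 3 and 4, respectively. When n is unreachable from 1, states 0 and n lie off every cycle in both graphs, and the two lists coincide.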
B. Verification With Known Transformation History
By Theorem 4, verifying if two FSMs are equivalent under retiming and resynthesis without knowing the transformation history is as hard as the general equivalence checking problem. Thus, we advocate a conservative design methodology for optimizing synchronous hardware systems in order to improve verifiability.
An easy approach to circumvent the PSPACE-completeness is to record the history of retiming and resynthesis operations as verification checkpoints, or alternatively to perform equivalence checking after every retiming or resynthesis operation. The reduction in complexity results from the following well-known facts.
Proposition 4: Given two synchronous hardware systems, verifying if they are transformable to each other with retiming is of the same complexity as checking graph isomorphism (for communication graphs without edge weights), which is within NP ∩ coNP; verifying if they are transformable to each other with resynthesis is of the same complexity as combinational equivalence checking, which is coNP-complete.
Therefore, if transformation history is completely known, the verification complexity reduces to coNP-complete.
V. INITIALIZATION SEQUENCES
To discuss initialization sequences, we rely on the following proposition of Pixley [17] .
Proposition 5: [17] The initial states of an initializable FSM cannot be dangling. Moreover, any nondangling state of an initializable FSM can be used as an initial state by suitably modifying initialization sequences.
By Corollary 1, the behavior of an FSM in nondangling states cannot be altered by retiming and resynthesis. Also, by Lemma 3, newly created states by retiming (and resynthesis) not immediately equivalent to any nondangling states remain dangling throughout iterative retiming and resynthesis operations. Adding dangling states does not affect initializability because prefixing an original initialization sequence with a long enough input sequence can drive an FSM to some nondangling state, which is a legitimate initial state by Proposition 5. (Note that any dangling state will eventually reach some nondangling state after a long enough input sequence is applied, regardless of the input patterns.) As a result, we have the following.
Corollary 2: The initializability of an FSM is an invariant under retiming and resynthesis.
Hence, we shall assume that the given FSM M is initializable. Furthermore, we assume that its initialization sequence is given as a black box. That is, we have no knowledge on how M is initialized. Under these assumptions, we study how the initialization sequence is affected when M is retimed (and resynthesized). As shown earlier, the creation and annihilation of dangling states are immaterial to the optimization capability of retiming and resynthesis. However, they play a decisive role in affecting initialization sequences. In essence, the longest transition path among dangling states determines by how much the initialization sequences should be lengthened. That is, prefixing the original initialization sequence of M with an arbitrary input sequence of length no less than max_v −ρ(v) results in a valid initialization sequence for M † . Thus, max_v −ρ(v) (nonnegative for normalized ρ) 9 gives an upper bound on the increase of initialization sequences under retiming. This bound was further tightened in [2] , [22] by taking the maximum of −ρ(v) only over the vertices v of functional elements whose functions define nonsurjective mappings. Unfortunately, this strengthening still does not produce an exact bound. Moreover, by Proposition 1, a normalized retime function among its equivalent retime functions may not be the one that gives the tightest bound. A derivation of exact bounds will be discussed in Section V-B.
8 A state q of FSM M is equivalent to a state q † of FSM M † if M starting from q, and M † starting from q † have the same input-output behavior.
9 Recall that a normalized retime function ρ satisfies ρ(host) = 0.
2) Lag-Independent Bounds: Given a synchronous hardware system, a natural question is whether there exists some bound which holds universally for all possible retiming operations. Even though the bound may be looser than lag-dependent bounds, it frees the construction of new initialization sequences from knowing what retime functions have been applied. Indeed, such a bound does exist as exemplified in the following. Thus, max_v r(v), which is intrinsic to a communication graph and is independent of retiming operations, yields a lag-independent bound.
When initialization delay is not a concern for a synchronous system, one can even relax the above lag-independent bound by saying that the total number of registers of the system is another lag-independent bound. As an example, suppose a system has one million registers and its retimed version runs at 1-GHz clock frequency. Then, the increase in initialization delay due to retiming is at most a thousandth of a second (10^6 clock cycles at 1 ns per cycle).
B. Initialization Affected by Retiming and Resynthesis
Thus far, we have focused on initialization issues arising when a system is retimed only. Here, we extend our study to issues arising when a system is iteratively retimed and resynthesized.
A difficulty emerges from directly applying Lemma 4 to bound the increase of initialization sequences under iterative retiming and resynthesis. Interleaving retiming with resynthesis makes the union bound Σ_i u_i the only available bound from Lemma 4, where u_i denotes the lag-dependent bound for the ith retiming operation. Essentially, inaccuracies accumulate along with the summation of the union bound. Thus, the bound derived this way can be far beyond what is necessary. In the light of lag-independent bounds discussed earlier, one might hope that there may exist some constant which upper bounds the increase of initialization sequences due to any iterative retiming and resynthesis operations. (Notice that, when no resynthesis operation is performed, the transformation of a series of retiming operations can be achieved by a single retiming operation. Thus, a lag-independent bound exists for iterative retiming operations.) Unfortunately, such a transformation-independent bound does not exist as shown in Theorem 5.
Fig. 11. Given an FSM in (a), it can be resynthesized to the one in (b) and then backwardly retimed to the one in (c).
Lemma 5: Any dangling state of an FSM (with implicit reset) is removable through iterative retiming and resynthesis operations.
Proof: By Proposition 5, the initial states of an FSM M with implicit reset must be nondangling. Removing dangling states cannot affect the behavior of M. Essentially, states without predecessor states can be eliminated with a resynthesis operation followed by a retiming operation. To see why this is the case, let Σ be the input alphabet, Q be the set of states of M, and Q † ⊆ Q be the subset of states with predecessors. As illustrated in Fig. 11 , a resynthesis operation can rewrite the original transition functions δ : Σ × Q → Q as a composition of three parts
Theorem 5: Given a synchronous hardware system and an arbitrary constant c, there always exist retiming and resynthesis operations on the system such that the length increase of the initialization sequence exceeds c.
Proof: Any dangling state of an FSM can be removed by iterative retiming and resynthesis by Lemma 5. On the other hand, since the transformation of retiming and resynthesis is reversible, a path over dangling states can be made arbitrarily long through iterative retiming and resynthesis operations. Therefore, the theorem follows.
Since the mentioned union bound is inaccurate and requires knowing the applied retime functions, it motivates us to investigate the computation of exact 10 length increase of initialization sequences without knowing the history of retiming and resynthesis operations. The length increase can be derived by computing the length, say n, of the longest transition paths among the dangling states because applying an arbitrary 11 input sequence of length greater than n drives the system to a nondangling state. The length n can be obtained using a symbolic computation. By breadth-first search, one can iteratively remove states without predecessor states until a greatest fixed point is reached. The number of the performed iterations is exactly n.
10 The exactness is true under the assumption that the initialization sequence of the original FSM is given as a black box. If the initialization mechanism is explored, more accurate analysis may be achieved.
11 Although exploiting some particular input sequence may shorten the length increase, it complicates the computation.
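The iterative removal can be sketched explicitly as follows. The paper performs this computation symbolically, whereas this minimal Python sketch enumerates states directly. The function name `dangling_depth` and the encoding (each state maps to an iterable of its next states over all inputs) are assumptions made for the example. The sketch counts only the rounds that actually remove a state, so whether its result should be offset by one to match the convention for n used in the text is left open.

```python
def dangling_depth(stg):
    """Sketch: count the rounds needed to strip all dangling states.

    stg maps each state to an iterable of its next states. In every round,
    all states without a predecessor among the surviving states are removed.
    """
    alive = set(stg)
    rounds = 0
    while True:
        has_pred = {nxt for s in alive for nxt in stg[s]}
        layer = alive - has_pred      # states with no surviving predecessor
        if not layer:
            return rounds             # fixed point: only nondangling states remain
        alive -= layer
        rounds += 1


# Hypothetical STG: a chain 3 -> 2 -> 1 leading into a self-loop on state 0.
chain = {3: [2], 2: [1], 1: [0], 0: [0]}
print(dangling_depth(chain))
```

For the chain above, three rounds remove states 3, 2, and 1 in turn, and applying any three inputs from state 3 drives the machine to the nondangling state 0.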
VI. RELATED WORK

A. Optimization Capability
The work closest to this paper on the optimization power of retiming and resynthesis is [19] , where the optimization power was unfortunately overstated, contrary to the claimed exactness. The mistake resulted from the claim that any two-way switch (redirecting a transition to another immediately equivalent next state) operation is achievable using two-way merge (merging two immediately equivalent states into a single state) and two-way split (splitting a state into two immediately equivalent states) operations; see [19] for detailed illustrations. Fig. 12 shows a counterexample illustrating a two-way switch operation that is not achievable with two-way merge and split operations. The overstated optimization power results from the overlooked fact that, under any input assignment, the next states of immediately equivalent states split from a current state must be the same. In fact, only two-way merge and split operations are essential. Aside from this minor error, no constructive algorithm was known to determine if two given FSMs are equivalent under retiming and resynthesis. In addition, the creation and annihilation of dangling states, which we show to be crucial in initializing synchronous hardware systems, were not discussed.
B. Verification Complexity
Ranjan in [18] examined a few verification complexities for cases under one retiming operation and up to two resynthesis operations with unknown transformation history. The complexity for the case under an arbitrary number of iterative retiming and resynthesis operations was left open, and was conjectured in [25] to be easier than the general equivalence checking problem. We disprove the conjecture.
C. Initialization Sequences
For systems with explicit reset, the effect of retiming on initial states was studied in [3] , [21] , and [24] . In the explicit reset case, incorporating resynthesis with retiming does not contribute additional difficulty. Note that, for systems with explicit-reset registers, forward moves of retiming are preferable to backward moves in maintaining equivalent initial states, contrary to the case for systems with implicit-reset registers. To prevent backward moves, Even et al. in [3] proposed an algorithm to find a retime function such that the maximum lag among all vertices is minimized. Interestingly enough, their algorithm can be easily modified to obtain minimum lag-dependent bounds on the increase of initialization sequences (by avoiding forward retiming instead of backward retiming). As mentioned earlier, explicit reset can be seen as a special case of implicit reset when reset circuitry is explicitly represented in the communication graph. Hence, the study of the implicit reset case is more general, and is subtler when considering resynthesis in addition to retiming.
Pixley in [17] studied the initialization of synchronous hardware systems with implicit reset in a general context. Leiserson and Saxe studied the effect of retiming on initialization sequences in [9] , where a lag-dependent bound was obtained and was later improved by [2] and [22] . We show a lag-independent bound instead. In recent work [15] , a different approach was taken to tackle the initialization issue raised by retiming. Rather than increasing initialization sequence lengths, a retimed circuit was further modified to preserve its original initialization sequence. This modification might need to pay area/performance penalties and could nullify the gains of retiming operations. In addition, the modification requires expensive computation involving existential quantification, which limits the scalability of the approach to large systems. In comparison, prefixing the original initialization sequence with an arbitrary input sequence of a certain length provides a much simpler solution (without modifying the system) to the initialization problem.
On the other hand, we extend our study to the unexplored case of iterative retiming and resynthesis, and show the unboundability of the increase of initialization sequences. Finally, our exact analysis on the increase of initialization sequences is applicable to the case of iterative retiming and resynthesis and improves the bound of [2] and [22] .
VII. CONCLUSION AND OPEN PROBLEMS
This paper demonstrated some transformation invariants under retiming and resynthesis. Three main results about retiming and resynthesis were established. First, an algorithm was presented to construct a canonical representative of an equivalence class of FSMs transformed under retiming and resynthesis. It was extended to determine if two FSMs are transformable to each other under retiming and resynthesis. Second, a PSPACE-complete complexity was proved for the above problem when the transformation history of retiming and resynthesis is unknown. Hence, to reduce complexity (from PSPACE-complete to coNP-complete), it is indispensable to maintain transformation history, or to check intermediate equivalence after every retiming or resynthesis operation. Third, the effects of retiming and resynthesis on initialization sequences were studied. A lag-independent bound was shown on the length increase of initialization sequences of FSMs under retiming; in contrast, unboundability was shown on the case under retiming and resynthesis. In addition, an exact analysis on the length increase was presented. We believe our results may reveal some directions enhancing the practicality of retiming and resynthesis for the optimization of synchronous hardware systems.
For future work, it is important to investigate more efficient computation, with reasonable accuracy, for the length increase of initialization sequences for FSMs transformed under retiming and resynthesis. On the other hand, it may seem that our lag-independent bound can be used to improve retiming algorithms by pruning out spurious linear constraints, similar to [12] . Moreover, as the result of [3] can be modified to obtain a retime function targeting area optimization with minimum increase of initialization sequences as discussed in Section VI, it would be useful to study retiming under other objectives while avoiding increasing initialization sequences.
While verifying the equivalence of two sequential circuits transformed by an unbounded number of retiming and resynthesis iterations was shown to be PSPACE-complete, it is open when the number is bounded by some constant. In particular, it is not known if the complexities parametric upon this constant follow the polynomial-time hierarchy [23] .
