Recent work has identified the notion of safe replacement for sequential 
Introduction
The problem of design replacement for gate-level synchronous, sequential circuits is to replace a given design with another (optimized in some respect) while making only minimal and reasonable assumptions about the interaction of the design with its environment. The sequential nature of the design comes from the use of memory elements (such as latches and flip-flops). In synchronous designs all memory elements are triggered by a common clock signal. In some designs, all memory elements can be initialized by asserting an input that is hardwired to a specific port of each memory element.
In theory, designs with universal reset lines are easy to deal with because one design can safely replace another functionally if both designs have the same reset state and any sequence of identical inputs to the two designs in their respective reset states elicits identical sequences of outputs. However, the price for the ubiquitous reset line is the cost of routing wires to each memory element and the circuitry within each memory element to accomplish the initialization. Therefore, many real, industrial circuits contain at least some memory elements with no reset lines. This, however, raises a harder question about what it means for one sequential design without hardware reset to safely replace another. Furthermore, initializing such a design is a consequence of the interaction of the design with its environment.
Reflection [1] and experience show that, in the absence of knowledge about the intended use of a design and information about its environment, a necessary condition for safe replacement is that the State Transition Graphs (STGs) of a design and its replacement have equivalent Terminal Strongly Connected Components (TSCCs). However, since it is generally conceded that one cannot control the state in which a design is powered up, some condition concerning the way a design and its replacement are initialized is required. In [1], it was suggested that a necessary (but not sufficient) condition is that, in addition to having identical TSCCs, a design and its replacement have at least one synchronizing sequence in common. More recently, a "safe replacement" condition was presented [2] which implies that every synchronizing sequence for the original design also synchronizes the replacement design (this is equivalent to the redundant condition for fault detection in [3]). It was argued that this notion of safe replacement was the weakest possible without knowledge or assumptions about the design's environment. Based upon the notion of "safe replacement," a sequential syn- (see Section 4 of this paper). Many well-crafted designs have ephemeral states because binary encoding often leaves some states of the implementation without a corresponding state in the specified design. These states should be ephemeral. Also, one-hot encodings of machines with n states result in $2^n - n$ states which may also be ephemeral. In addition, a forward retiming move across a fanout junction will also create ephemeral states [6].
However, all retiming moves preserve the outer-envelope. The notion presented in the current work relies on the assumption that any realistic design will be used only after some cycles have elapsed after power-up; these cycles are necessary for the voltages and current to settle immediately after power-up. This initialization slack, often at least a few thousand clock cycles (usually milliseconds), is known in advance and is part of the design specification. Sequential optimization can use this initialization slack, say N cycles, in the following two ways. First, the design can be partitioned into non-overlapping, manageably sized pieces.
Then each piece can be optimized using the N initialization slack cycles. Proposition 4.3 will show that this strategy will produce an entire design that is an N-cycle delay-safe replacement. To summarize, our optimization techniques for delay replacements allow us to achieve significant logic optimizations while allowing the designer to specify the flexibility in the power-up delay, N.
Terminology and Background
We now make precise the notion of a finite state machine and our model for sequential hardware. Since distinct netlists of gates can compute the same combinational function, distinct hardware designs can correspond to the same DFSM. Given a DFSM, we use the term "design" to denote any gate-level implementation of the DFSM.
Safe Replacements
The safe replacement condition requires that if the original design is replaced with the new design, there is no way the environment can detect the replacement by looking at the input-output behavior of the design. Unfortunately, the requirement for one design to be a safe replacement of another was shown to be very strong. Not much flexibility exists in choosing replacements that are safe; in particular, this means that little scope exists for significant design optimization. This motivated our current work, in which substantial flexibility is extracted from a design based on the number of clock cycles between power-up and the time the design is actually used.
Definition 3 Given a design D , the

Delay Replacements
Dn-1 --D". Using the terminology in [5] , this number n is the
We now present our condition for delay replacement. As we noted in Section 1, as part of the system specification, an initialization slack of n clock cycles is available. We can use the flexibility afforded by this slack by requiring a design to be an acceptable replacement if it is allowed to clock n extra cycles with arbitrary inputs before it is used:
is strictly stronger than ours, and thus allows less flexibility for replacement. The following two properties of delay replacements follow (we refer the reader to [13] for the proofs of all the results that are presented in this paper):
Thus, for example, a 2-delay replacement followed by a 3-delay replacement on the same design results in an n-delay replacement for any $n \ge 5$.
As an example of delay replacement, consider designs P and R in Figures 1 and 3. It can be seen that $R^1 \preceq P$; however, $R \not\preceq P$ (the state $100 \in R$ produces output sequence $0 \cdot 0 \cdot 0$ on input sequence $0 \cdot 1 \cdot 0$; this input-output behavior cannot be seen from any state in P).
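The check behind an example of this kind can be sketched on explicit state tables. The sketch below is an illustration under stated assumptions, not the procedure used in this paper: it assumes that $C^n$ consists of the states of C reachable after n clock cycles from an arbitrary power-up state, and that C is safe with respect to D when every output sequence C can produce from a start state is also producible from some state of D. The machines `C` and `D` and all function names are hypothetical and are not the designs of Figures 1 and 3.

```python
from collections import deque

def delayed_states(states, inputs, delta, n):
    """States a machine can occupy after n clock cycles from any power-up state."""
    current = set(states)
    for _ in range(n):
        current = {delta[(s, a)][0] for s in current for a in inputs}
    return current

def is_safe_from(start_states, inputs, delta_c, d_states, delta_d):
    """Assumed safe-replacement test: every output sequence produced from a
    start state of C must be producible from at least one state of D."""
    frontier = deque((c, frozenset(d_states)) for c in start_states)
    seen = set(frontier)
    while frontier:
        c, candidates = frontier.popleft()
        for a in inputs:
            c_next, out = delta_c[(c, a)]
            # Keep only the D states that produce the same output on input a.
            survivors = frozenset(
                delta_d[(d, a)][0] for d in candidates if delta_d[(d, a)][1] == out
            )
            if not survivors:
                return False  # no state of D explains this behavior
            node = (c_next, survivors)
            if node not in seen:
                seen.add(node)
                frontier.append(node)
    return True

def is_n_delay_replacement(c_states, delta_c, d_states, delta_d, inputs, n):
    """Let C clock n cycles with arbitrary inputs, then apply the safe test."""
    settled = delayed_states(c_states, inputs, delta_c, n)
    return is_safe_from(settled, inputs, delta_c, d_states, delta_d)

INPUTS = (0, 1)
# Hypothetical D: a single state that echoes its input.
D_STATES = {"p"}
DELTA_D = {("p", a): ("p", a) for a in INPUTS}
# Hypothetical C: r0 outputs 0 once and moves to r1, which behaves like p.
C_STATES = {"r0", "r1"}
DELTA_C = {("r0", a): ("r1", 0) for a in INPUTS}
DELTA_C.update({("r1", a): ("r1", a) for a in INPUTS})

print(is_n_delay_replacement(C_STATES, DELTA_C, D_STATES, DELTA_D, INPUTS, 0))
print(is_n_delay_replacement(C_STATES, DELTA_C, D_STATES, DELTA_D, INPUTS, 1))
```

In this toy case C is not a 0-delay (that is, safe) replacement, because state r0 outputs 0 on input 1, which D never does, but it becomes acceptable once one cycle of slack is allowed.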
Compositionality of Delay Replacements
For the optimization of any design, we can select arbitrary sub-pieces of the design and perform delay replacements on these. In this subsection, we examine the effect of making a delay replacement on the larger design. We will show that making n-delay replacements on non-overlapping sub-pieces of a design will produce an entire design which is n-delay safe. On the other hand, if two consecutive n-delay optimizations are made on sub-pieces which are overlapping, we get an entire design which is 2n-delay safe (notice that, as a special case of this, if the two sub-pieces are identical, we already know from Proposition 4.2 that the entire design is 2n-delay safe).
Informally, two given hardware designs can be "wired" together by driving some of the inputs of one by outputs of the other and vice versa. The remaining inputs and outputs will be primary inputs and primary outputs of the composed design. Given designs A and B, we will use A @ B to denote their composition. A formal definition of design composition is given in [14]. We obtain the following result which shows that composing two delay replacements adds up the delays.
This means that delay replacements can be made in different parts of the design, and the resulting overall design is as safe as the weakest individual replacement (the replacement with the great
Delay-Preserving Replacements
This subsection discusses another interesting replacement notion relating power-up delays to replacements. However, the reader can choose to ignore this section without sacrificing any understanding of this paper.
We had originally formulated the following conditions for allowing a replacement in the presence of the power-up slack
As examples of delay-preserving replacements, consider designs P and R in Figures 1 and 3. We already know that R is a 1-delay replacement of P (i.e. $R^1 \preceq P$). However, R is not a 1-delay-preserving replacement of P; instead R is a 2-delay-preserving replacement of P.
It is easily seen that every n-delay-preserving replacement is also an n-delay replacement (of Definition 4). Delay-preserving replacement is a useful notion because successive n-delay-preserving replacements on the same design still result in an n-delay-preserving replacement (unlike n-delay replacements, which add up, as shown in Proposition 4.2). However, we found an example which illustrates that delay-preserving replacements do not have the compositionality properties that would allow us to make substitutions on sub-pieces of designs. Thus, the iterative optimization strategy that we proposed in the previous subsection will not work for them.
The reader is referred to [13] for the above-mentioned example and other properties of delay-preserving replacements. That paper also relates delay-preserving replacements to delay replacements (called delay-accumulating replacements in [13]) and includes an interesting result which shows that in the limit, as n approaches $\infty$, these two notions converge.
Resynthesis for Delay Replaceability
Typically, the synthesis process has two stages: first, the set of all possible implementations is characterized using the flexibility given by the replacement condition. Following this, one implementation is chosen according to some optimality criteria. In this section we describe how to extract the flexibility for delay-preserving replaceability, and then describe how it is used to minimize the area of the combinational logic associated with a hardware design.
Let the original design
delayed designs form an "onion ring" structure [5]. Clearly any state reachable from itself under some input sequence belongs to each $D^i$ and to $D^n$. We will not alter the behavior of any such state (a "stable" state). All the flexibility for resynthesis comes from the set of "transient" states, i.e. $D \setminus D^n$.
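The onion-ring structure can be illustrated on an explicit state transition table. This is a minimal sketch, assuming that $D^i$ is the set of states reachable after i clock cycles from an arbitrary power-up state; the function `onion_rings` and the three-state machine are hypothetical, and the experiments in this paper manipulate these sets with BDDs rather than by enumeration.

```python
def onion_rings(states, inputs, delta, n):
    """Return [D^0, D^1, ..., D^n] as explicit state sets.

    delta maps (state, input) to (next_state, output) and must be total.
    """
    rings = [set(states)]
    for _ in range(n):
        # One more clock cycle: take the image of the previous ring.
        rings.append({delta[(s, a)][0] for s in rings[-1] for a in inputs})
    return rings

INPUTS = (0, 1)
STATES = {0, 1, 2}
# Hypothetical machine: state 0 can only be occupied at power-up.
DELTA = {
    (0, 0): (1, 0), (0, 1): (1, 0),
    (1, 0): (1, 0), (1, 1): (2, 1),
    (2, 0): (1, 0), (2, 1): (2, 1),
}

rings = onion_rings(STATES, INPUTS, DELTA, 2)
stable = rings[-1]            # D^n: behavior must be preserved
transient = STATES - stable   # D \ D^n: source of resynthesis flexibility
print(rings, stable, transient)
```

Each ring is contained in the previous one, so the transient set shrinks to exactly the states that cannot be re-entered once the machine has been clocked n times.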
Recall that an n-delay replacement allows the new design to have some flexibility over the first n clock cycles after power-up, and after these clock cycles it is indistinguishable from the
As we said before, often it is known in advance that the design will be allowed to settle for at least a fixed number of clock cycles before it is used. We use this slack n for resynthesis: the higher n is, the greater the flexibility allowed for resynthesis; however, the environment of the design cannot rely on the input/output behavior of the design for the first n cycles. For an n-delay replacement, we will express the flexibility to obtain a new design C such that $C^n$ is exactly the same as $D^n$.
Note that this is a conservative strategy, since only $C^n \preceq D$ is needed.
A sequential gate-level design can be viewed as a connection between a purely combinational part and memory elements. The combinational part can be represented by a directed acyclic graph, individual vertices of which compute combinational functions. The primary inputs and outputs to the design are denoted by $\vec{i}$ and $\vec{o}$, respectively (Figure 4). The outputs of the combinational part which feed the memory elements are denoted by $\vec{x}$, and the lines which feed the output of the memory elements to the combinational part are denoted by $\vec{y}$. To do resynthesis for delay replaceability, we obtain a Boolean relation $Q(\vec{i}, \vec{y}, \vec{o}, \vec{x})$ which describes the flexibility for replacement. We will then use this relation to do multi-level resynthesis on our design. First, we informally describe the flexibility which will be specified later using the Boolean relation Q. We note here that techniques for using a Boolean relation to do multi-level synthesis [15] require the relation to be such that the starting design satisfies the relation. The Boolean relation is such that the behavior of the states in $D^n$ (the "stable" states) is preserved. For $0 \le i < n$, on any input, the relation allows a state in $D^i \setminus D^{i+1}$ to transition to any state in $D^{i+1}$. Clearly, the original design satisfies this flexibility relation. It can also be seen that for any design C that satisfies the relation it is true that $C^n = D^n$, i.e. after the first n clock cycles every reachable state in C is equivalent to one in $D^n$. Also note that since we do not care about the outputs during the first n clock cycles, the outputs of the states in $D \setminus D^n$ can be arbitrarily chosen.
Formally, the Boolean relation Q
The intuition for the above relation Q is that, given n, we choose to preserve the behavior of all states in the set $D^n$, i.e. states in this set are forced to have the same output and next-state functions as in the original design D. For states outside of $D^n$, if the state lies in $D^i \setminus D^{i+1}$, we allow the next state of such a state to be any state in $D^{i+1}$; we do not care about the output from this state. All this ensures that n cycles after power-up, the new design would be in a state in $D^n$ and thus the new design would be an n-delay replacement for the original design (in fact, since $C^n = D^n$, it is also an n-delay-preserving replacement).
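The intuition above can be sketched by enumerating, for each state and input, the set of (next state, output) pairs that the relation permits. This is only an explicit-state illustration: the paper represents Q as a BDD over the circuit's signals, whereas the function `flexibility_relation`, the table encoding, and the example machine here are hypothetical stand-ins.

```python
def flexibility_relation(states, inputs, outputs, delta, n):
    """Map each (state, input) to the allowed set of (next_state, output) pairs.

    delta maps (state, input) to (next_state, output) and must be total.
    """
    # rings[i] plays the role of D^i; the rings are nested.
    rings = [set(states)]
    for _ in range(n):
        rings.append({delta[(s, a)][0] for s in rings[-1] for a in inputs})
    stable = rings[n]

    allowed = {}
    for s in states:
        for a in inputs:
            if s in stable:
                # Stable state: next state and output are fixed as in D.
                allowed[(s, a)] = {delta[(s, a)]}
            else:
                # Transient state in D^i \ D^(i+1): any next state in
                # D^(i+1) is acceptable, and the output is a don't care.
                i = max(j for j in range(n) if s in rings[j])
                allowed[(s, a)] = {(t, o) for t in rings[i + 1] for o in outputs}
    return allowed

INPUTS = (0, 1)
OUTPUTS = (0, 1)
STATES = {0, 1, 2}
# Hypothetical chain machine: 0 -> 1 -> 2, and 2 loops on itself.
DELTA = {
    (0, 0): (1, 0), (0, 1): (1, 1),
    (1, 0): (2, 0), (1, 1): (2, 0),
    (2, 0): (2, 1), (2, 1): (2, 0),
}

allowed = flexibility_relation(STATES, INPUTS, OUTPUTS, DELTA, 2)
# The starting design must itself satisfy the relation.
print(all(DELTA[key] in allowed[key] for key in DELTA))
```

With n = 2, state 2 keeps its original behavior, state 1 may produce either output but must still move to state 2, and state 0 may move to state 1 or state 2 with either output.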
Notice that for any integer $m \ge k$, the delayed design $D^k$ is the same as $D^m$, i.e. the set of stable states. Thus, the flexibility described by the relation Q for m-delay replacement is the same as that for k-delay replacement. If we compute the number k for a design in advance, we know that we will not get any additional flexibility by allowing a slack greater than k.
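Computing k in advance amounts to iterating the one-cycle image until it stops shrinking. The following sketch assumes k is the smallest number of cycles after which the set of occupiable states no longer changes; `stabilization_depth` and the example machine are hypothetical names, and a symbolic (BDD-based) image computation would replace the set comprehension on real circuits.

```python
def stabilization_depth(states, inputs, delta):
    """Return (k, stable_states), with k the smallest i such that D^i == D^(i+1).

    delta maps (state, input) to (next_state, output) and must be total.
    The loop terminates because each ring is a subset of the previous one.
    """
    current = set(states)
    k = 0
    while True:
        image = {delta[(s, a)][0] for s in current for a in inputs}
        if image == current:
            return k, current
        current, k = image, k + 1

INPUTS = (0, 1)
STATES = {0, 1, 2}
# Hypothetical chain machine: 0 -> 1 -> 2, and 2 loops on itself.
DELTA = {
    (0, 0): (1, 0), (0, 1): (1, 0),
    (1, 0): (2, 0), (1, 1): (2, 0),
    (2, 0): (2, 1), (2, 1): (2, 1),
}

k, stable_states = stabilization_depth(STATES, INPUTS, DELTA)
print(k, stable_states)
```

For this machine the rings are {0, 1, 2}, {1, 2}, {2}, so a slack larger than 2 cycles would add no further flexibility.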
Once we have the BDD for the Boolean relation $Q(\vec{i}, \vec{y}, \vec{o}, \vec{x})$, we can use standard BDD-based multi-level synthesis techniques to propagate this flexibility to individual nodes in the network and then minimize the nodes [4, 15].
Experiments
In this section we report experimental results using the algorithm described in Section 5 on ISCAS89 sequential circuits. We used BDDs to manipulate the Boolean relations and sets. We focused on the area reduction for n-delay replacements. We report results for various values of n.
The experimental results, obtained using a DECstation 5900, are shown in Table 1. The starting circuits were obtained from ISCAS89 benchmark circuits with the SIS commands (sweep; eliminate -1) applied [16]. These eliminate single-input and constant nodes and collapse nodes which do not fan out to more than one node. First we show the optimizations obtained by just doing the standard multi-level combinational optimization [17] using observability don't cares (ODCs) propagated to the individual nodes in the network. We also show the optimizations obtained by the safe replacement resynthesis method described in [4]. Then, we show the results of the method presented in this paper for n-delay replacements for $n = 1, 2, 5, \infty$. The table shows that for many examples, significant additional optimizations are obtained by allowing power-up delay. Even for n = 1, we see good results for some examples, e.g. s386, s1488, s1494. Also, in most cases the CPU times for n-delay replacements are within an order of magnitude of the CPU time for combinational resynthesis. The much larger CPU times for pure safe replaceability can be attributed to the rigid conditions for safe replacement, which require placing constraints on the outputs from the "transient" states as well as correlating the next states of "transient" states with the inputs. Both these constraints are avoided by the resynthesis method presented in Section 5. This leads to a much smaller BDD to express the relation Q, and hence, faster multi-level resynthesis.
We suspect that synchronous recurrence equations [12] also lead to delay replacements. However, the experimental results presented there indicate that using synchronous recurrence equations is not very effective; our optimization method (using the SIS commands sweep; eliminate -1 followed by the optimization procedure described in this paper) produces smaller circuits using less CPU time than those reported in [12]. (Note that the CPU times indicated in Table 1 do not include the time used by the SIS preprocessing commands; however, for all the examples, that time is much less than the times reported for the sequential optimizations. Even so, the CPU times reported in [12] are greater than double the times reported in Table 1.)
We performed an experiment on one of the benchmark circuits with a large number of onion rings (s526) to explore the tradeoffs between flexibility and the power-up delay allowed. The results are in Table 2. The table shows that by allowing more delay n we do get additional flexibility. Also, the CPU times increase with higher values of n, partly because the time taken to compute the Boolean relation Q goes up with higher n.
Table 2: Power-up delay/flexibility tradeoff for s526; reduction is in number of literals
In the experiments above, the initial nodes of the circuits have been collapsed minimally. We see that safe replaceability gives very marginal improvements over pure combinational reductions, and at a much larger CPU time cost. For this reason, it was argued in [4] that we might get better use of the safe replaceability notion by using larger node sizes in the circuits. They increased the node sizes in these benchmark circuits by using the SIS commands eliminate 10 or collapse. For the sake of completeness, we ran our algorithm on these starting points also. For the results, the reader is referred to [13].
This experiment showed that once again, using n-delay replacements instead of safe replacements allows us greater flexibility for resynthesis and gives much better optimizations. Also, the CPU times are once again much better than those for safe replacements.
Conclusions
We presented the notion of delay replacement, which allows an additional degree of flexibility over safe replacement by letting the design settle down for a certain number of clock cycles after power-up. The number can be controlled by the designer and input to our synthesis tool, and then we can use all the available flexibility while guaranteeing the degree of safeness the designer specified. We have suggested how this notion can be used in an iterative synthesis methodology where successive design replacements are made. We have shown experimental results to illustrate that we obtain significant optimizations at affordable CPU costs by using our notion of delay replacement.
