Abstract
Synchronous designs are often described and implemented hierarchically as a network of interacting finite state machines. When the individual components are synthesized and implemented separately, it is desirable to take into account the degrees of freedom that arise from the interactions with the other components and from the specification. Specifically, we consider the problem of finding the complete set of implementations that can be correctly substituted for a component in a system without changing the behavior of the total system. In this paper, a synchronous trace set model is proposed to tackle this problem. The unambiguous way in which this model describes synchronous behavior provides us with a general method to model interacting finite state machines. We show that a single synchronous trace set can be used to characterize all possible substitutions in a correct substitution problem. We give a necessary notion of progress that permits us to choose a correct one from this set, or to show that no solution exists. We provide results of an implicit implementation of our approach applied to benchmarks ranging from small to very large state machines. These results show the feasibility of the proposed methods.
Introduction
Many synchronous designs are naturally specified and implemented hierarchically as a network of interacting finite state machines (FSMs). Hierarchical representations are especially important when the design is compiled from a high-level language. Also, implementation constraints may force the physical implementations to be separated. When optimizing each individual component separately, it is desirable to take into account the degrees of freedom that arise from the interactions with the other components and from the specification. Earlier works on hierarchical optimization of finite state machines were based on the exploitation of don't care sequences [5, 7, 8], which have been shown to be inadequate in capturing all degrees of freedom available for hierarchical optimization [12]. In Figure 1, the hierarchical optimization problem is depicted.
In this figure, $M_1$ is the machine to be replaced by an optimized one, while $M_2$ is a partial implementation representing other components in the system. $M$ is the overall specification that we wish to satisfy. In this general setting, both the specification $M$ and the partial implementation $M_2$ may be non-deterministic. For the specification, the non-determinism means that we have a choice in the behaviors that we can implement, but only one choice needs to be implemented. The non-determinism in the specification can arise either because some input sequences are left unspecified, or because for some input sequences there are multiple choices of output sequences. Informally, an implementation can be used for the specification $M$ if "there exists" a completely specified state machine that can be chosen from $M$ which is equivalent to it. For the partial implementation, the non-determinism means that a specific implementation has not yet been chosen; a specific implementation will be chosen later according to some cost criteria. This non-determinism can arise when the design is compiled from a high-level language. However, in optimizing $M_1$, an alternative implementation $M_1'$ can only be used if it leads to a valid implementation of $M$ "for all" choices of $M_2'$ that can be chosen from $M_2$. This means that the correctness of $M_1'$ should be independent of the eventual implementation of $M_2$.
Another issue is the choice of signals to use for implementing the replacement component in the system. Referring again to Figure 1, the external input signals $I$ and the internal signals $X$ may be used in the implementation of $M_1$. Even the output signals $O$ may be used as long as no combinational cycle is formed. In general, all signals defined in the network topology may be used in the implementation of a replacement. The choice of signals defines the actual interconnection of the replacement machine with the rest of the system, which also determines the final network topology. In some applications, we may be required to express $M_1$ in terms of only a restricted set of (local) signals. In other scenarios, making use of some internal signals may lead to more efficient implementations. Therefore, it is desirable to incorporate this kind of flexibility into the solution framework.
* Sponsored by the EC under HCM contract ERBCHBGCT920056
In [4] and [12], a restricted version of the correct substitution problem was investigated where the partial implementations are completely specified and deterministic, though the specification may be non-deterministic. For this problem, they have shown that non-deterministic finite state machines are sufficiently expressive to capture the set of all possible substitutions. They proposed structural computations at the state machine level for computing the non-deterministic finite state machines. When the generalized correct substitution problem is considered, in which the partial implementations can be non-deterministic, some subtle problems arise. Although it may be possible to extend the state machine approach to tackle these subtleties, it is more desirable to develop a complete theory with a precisely defined model in which they become clear. Also, for the generalized correct substitution problem, a general notion of "conformance" is lacking, as well as general notions of "hiding" and "projection" of signals in the network. The latter are essential in exploring different solutions that involve the use of different signals in the network topology.
To address these issues, we present a synchronous trace set model and a corresponding set of operations on it. In contrast to a structural level approach, this model is based on languages.
To characterize correctness, a notion of simulation is also introduced. As the behaviors of the network and its components are described in terms of synchronous trace sets, the problem of generalized correct substitution is also expressed using these trace sets and the notion of simulation. This model allows for uniform handling of non-determinism, in both the overall specification and the partial implementation, and also deals with network topology issues. Without loss of generality, it is also possible to postpone the choice of network topology to the last step in deriving the solution. The theory provides insights and improves the ability to analyze similar problems.
The outline of this paper is as follows. In Section 2, the model of synchronous trace sets and the main operations on them are described. In Section 3, notions of conformance and simulation are defined and explained in terms of these trace sets. In Section 4, we provide the general synchronous trace set theorems for the correct substitution problem in terms of simulation, expressed in a global and a local signal space, which allows for optimizations involving hiding of different signals. We have written a fully implicit implementation of our theory in C using implicit enumeration methods [3, 10] and have applied it to a number of benchmarks ranging in size from tens of states to over $10^8$ states in the largest example tested. The experimental results are reported and discussed in Section 5. Finally, we draw some conclusions.
Synchronous Trace Sets
In this section, the synchronous trace set model is presented. We first state some basic definitions.
Preliminaries
A synchronous trace set is a tuple $T = (I, O, S, F)$, where $I$ is a finite set of input signals and $O$ is a finite set of output signals ($I \cap O = \emptyset$). Let $A = \mathbb{B}^{I \cup O}$ be the alphabet. Then $S \subseteq A^*$ is the set of success traces and $F \subseteq A^*$ is the set of failure traces. The set $P$ of possible traces is defined to be $P = S \cup F$ and the set $X = A^* - P$ forms the impossible traces.
We require that for all synchronous trace sets, the three sets $S$, $F$ and $X$ form a partition of $A^*$. Normally, the behavior of a circuit is specified as a set of sequences of specified input/output combinations. For deterministic systems, exactly one output is specified for every input. Other outputs may not be generated for this input sequence by an implementation of the circuit. For a non-deterministic specification, there can be more than one output specified, from which a deterministic implementation must choose one. In trace sets, the traces with outputs that may not be generated are captured by the set of impossible traces $X$.
For incompletely specified systems, the unspecified inputs are assumed not to occur and are therefore called failure traces. However, an implementation is allowed to specify some output for such an input, e.g. for optimization purposes or to be completely specified. For trace sets, these sequences are captured by the set of failure traces $F$. The previously stated interpretation provides us with a direct way to transform a synchronous trace set $(I, O, S, F)$ into a finite automaton representing that trace set.
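The partition of $A^*$ into success, failure and impossible traces can be made concrete on a bounded slice of $A^*$. The following is a minimal sketch, not the representation used in the paper: it enumerates all traces up to a fixed length for a hypothetical one-input, one-output buffer whose behavior for input 1 in the first cycle is left unspecified. The names `all_traces`, `classify` and the example circuit are assumptions made for illustration.

```python
from itertools import product

def all_traces(letters, max_len):
    """All strings over `letters` of length 0..max_len (a finite slice of A*)."""
    return {t for n in range(max_len + 1) for t in product(letters, repeat=n)}

# One input signal a and one output signal y; a letter is the pair (a, y).
LETTERS = list(product((0, 1), repeat=2))
UNIVERSE = all_traces(LETTERS, 3)

def classify(trace):
    """Hypothetical buffer: y must equal a, but a = 1 in the first cycle is unspecified."""
    if trace and trace[0][0] == 1:
        return "F"   # unspecified input: failure trace
    if all(a == y for a, y in trace):
        return "S"   # specified input/output behavior: success trace
    return "X"       # output that may not be generated: impossible trace

S = {t for t in UNIVERSE if classify(t) == "S"}
F = {t for t in UNIVERSE if classify(t) == "F"}
P = S | F                # possible traces
X = UNIVERSE - P         # impossible traces

# S, F and X partition the (bounded) universe of traces.
assert S.isdisjoint(F) and S.isdisjoint(X) and F.isdisjoint(X)
assert S | F | X == UNIVERSE
print(len(S), len(F), len(X))
```

Bounding the trace length keeps the sets finite; an actual implementation would represent $S$ and $F$ by automata rather than by explicit enumeration.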
Operations on Synchronous Trace Sets
The essential operation on circuits is the construction of a network of circuits by interconnecting them through common wires. This composition operator determines the common behavior of the two trace sets. To allow this, the languages have to be defined over a common alphabet. Let $T = (I, O, S, F)$ and $T' = (I', O', S', F')$ with $O \cap O' = \emptyset$ be two trace sets defined over the same alphabet, $A = \mathbb{B}^{I \cup O} = \mathbb{B}^{I' \cup O'}$. Formally, the parallel composition of $T$ and $T'$, denoted $T \parallel T'$, is defined as the trace set $T'' = ((I \cup I') - (O \cup O'),\ O \cup O',\ S'', F'')$ with $S'' = S \cap S'$, $F'' = (F \cap P') \cup (P \cap F')$ and $X'' = X \cup X'$. For trace sets $T$ and $T'$ that are not defined over the same alphabet, we first introduce an auxiliary projection function.
Let $\phi(D)(x): \mathbb{B}^{I \cup O} \to \mathbb{B}^{(I \cup O) - D}$ be the function that maps a string $x$, defined over the alphabet $A = \mathbb{B}^{I \cup O}$, to a string $x'$ defined over $A' = \mathbb{B}^{(I \cup O) - D}$ such that any occurrence of a symbol in $D$ is removed. $\phi(D)$ on a string $x$ is extended in the natural way to sets of strings. The composition of trace sets over different alphabets is then the normal parallel composition, but one in which the trace sets are first expanded to a common alphabet.
The hide operator is used to define a trace set in only a partial alphabet. Formally, let $D$ be a set of signals to hide. Normally, hiding is only applied to a set of output signals. The projection onto a set of signals is the dual of the hide operator.
As a result of applying hide, the trace set may have success and failure languages that overlap. The common interpretation of this situation is that a system is supposed to fail whenever it may possibly fail. The resulting trace set can be made non-overlapping by removing the overlapping failure traces from $S$. Note that when two arbitrary FSMs (Mealy machines) are connected, the resulting circuit might contain an oscillating combinational cycle. Such a cycle shows up in our general synchronous trace set model as an X-trace. However, when in the next state, due to a new input, the oscillating cycle no longer exists, this trace will be a success trace again. As a consequence, the success set of the trace set representing the total machine will not be prefix-closed.
Conformance, Progress and Simulation
In this section we introduce a refinement relation to define when a trace set is a correct implementation of a specification trace set. This relation is of importance for the correct substitution problem, which we will discuss in the next section. The refinement relation is called simulation and consists of two properties: conformance and progress. The first notion, conformance, is taken from asynchronous trace theory [6]. However, this property is not sufficient to guarantee that a trace set is a correct implementation, as we will show. It only captures the fact that the implementation does not contain "bad" behaviors. It does not state that enough "good" behaviors are covered. This is guaranteed by the notion of progress. Together, they result in necessary and sufficient conditions for a correct implementation.
Conformance and Mirror
Given a synchronous trace set $T_S$ of a circuit specification, and its implementation represented by trace set $T_I$, $T_I$ is said to be a "safe substitution" when it does not show more, i.e. bad, behavior than the behavior given by the specification. This safe substitution is also called conformance and can be formally defined in terms of languages as:
Definition 3.1 Let $T_S = (I, O, S, F)$ and $T_I = (I', O', S', F')$ be two synchronous trace sets. $T_I$ conforms to $T_S$, denoted $T_I \le T_S$, iff $I = I'$, $O = O'$, $P' \subseteq P$ and $F' \subseteq F$ (or equivalently: $X \subseteq X'$ and $F' \subseteq F$).
Conformance is a necessary condition for a circuit to correctly implement a specification. It states that $T_I$ must at least be specified for all input traces of $T_S$, i.e. $X' \supseteq X$. In addition, it states that for all specified input traces of $T_S$, $T_I$ does not include other output traces than $T_S$ specifies, i.e. $F \supseteq F'$. The above definition directly provides a procedure for checking conformance based on language containment. Another way to check for conformance is by means of the mirror trace set. A synchronous trace set $T = (I, O, S, F)$ is called failure-free if its failure set $F$ is empty. Then, the mirror of a synchronous trace set $T = (I, O, S, F)$, denoted by mirror($T$), defines a unique maximum environment $T^M$ such that $T$ composed with that environment $T^M$ yields a synchronous trace set that is failure-free. For a synchronous trace set $T = (I, O, S, F)$, mirror($T$) is defined as $(O, I, S, X)$. The conformance of an implementation with respect to a specification can now equivalently be expressed using the mirror operation: $T_I \le T_S$ iff $T_I \parallel \mathrm{mirror}(T_S)$ is failure-free.
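To illustrate how parallel composition, mirror and conformance fit together, the following sketch evaluates them on explicitly enumerated traces of bounded length. It is an assumption-laden toy model, not the implicit BDD-based implementation reported later: the class `TraceSet`, the helper `from_predicate`, and the three example circuits are hypothetical. It checks conformance both by language containment and by testing whether the composition with the mirror of the specification is failure-free.

```python
from itertools import product
from typing import NamedTuple

MAX_LEN = 3
LETTERS = list(product((0, 1), repeat=2))            # letter = (a, y)
UNIVERSE = frozenset(t for n in range(MAX_LEN + 1)
                     for t in product(LETTERS, repeat=n))

class TraceSet(NamedTuple):
    inputs: frozenset
    outputs: frozenset
    S: frozenset   # success traces
    F: frozenset   # failure traces

    @property
    def P(self):   # possible traces
        return self.S | self.F

    @property
    def X(self):   # impossible traces, relative to the bounded universe
        return UNIVERSE - self.P

def compose(t, u):
    """Parallel composition of two trace sets over the same alphabet."""
    assert not (t.outputs & u.outputs), "outputs must be disjoint"
    outs = t.outputs | u.outputs
    ins = (t.inputs | u.inputs) - outs
    return TraceSet(ins, outs, t.S & u.S, (t.F & u.P) | (t.P & u.F))

def mirror(t):
    """Swap inputs and outputs; failures become the impossible traces of t."""
    return TraceSet(t.outputs, t.inputs, t.S, t.X)

def conforms(impl, spec):
    """Conformance by language containment: P' within P and F' within F."""
    return (impl.inputs == spec.inputs and impl.outputs == spec.outputs
            and impl.P <= spec.P and impl.F <= spec.F)

def from_predicate(ok):
    """Failure-free trace set whose success traces satisfy `ok` in every cycle."""
    S = frozenset(t for t in UNIVERSE if all(ok(a, y) for a, y in t))
    return TraceSet(frozenset("a"), frozenset("y"), S, frozenset())

spec = from_predicate(lambda a, y: a == 0 or y == 1)  # y is free when a = 0
impl = from_predicate(lambda a, y: y == a)            # chooses y = a
bad = from_predicate(lambda a, y: y == 0)             # wrong output when a = 1

for cand in (impl, bad):
    by_containment = conforms(cand, spec)
    by_mirror = not compose(cand, mirror(spec)).F     # failure-free composition
    print(by_containment, by_mirror)
```

In this toy example both checks agree for each candidate, which is what the mirror characterization of conformance predicts.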
Progress and Simulation
In the general case, conformance is a necessary but not sufficient condition for an arbitrary trace set $T_I$ to actually implement a trace set $T_S$. It guarantees, for example, that no erroneous outputs are generated by $T_I$, but it does not demand that $T_I$ generates successful outputs at all. In Figure 3, a trace set is depicted that conforms to its specification but is not a correct implementation. In one of its states, $T_I$ does not generate an output on input 1, whereas an output is specified in $T_S$ for this input.
Figure 3. A specification trace set $T_S$ and a conforming but incorrect implementation $T_I$.
In a network of circuits, the behavior displayed above can be caused, for example, by cycles, as mentioned before. Since the correct substitution problem discussed later deals with networks, it is important to be aware of this situation in order to avoid it. To remedy this, we define a notion of progress of a trace set $T_I$ with respect to a trace set $T_S$. Intuitively, progress expresses the fact that if a trace in the success set of $T_I$ can be successfully continued on some input in $T_S$, there must also be a successful continuation in $T_I$ on the same input. This property is not guaranteed by conformance alone. Conformance actually allows all these extended traces to become impossible, resulting in incorrect implementations. In general, every $T_I$ with $S'$ a subset of prefixes of success traces $S$ conforms to $T_S$ by adding the difference traces $S - S'$ to $X$ to form $X'$. Nevertheless, such a conforming $S'$ cannot be considered to be a correct implementation. Formally, progress is defined as follows.
Definition 3.2 Let $T_S = (I, O, S, F)$ and $T_I = (I', O', S', F')$ be two synchronous trace sets with $I = I'$, $O = O'$. Let $i \in \mathbb{B}^I$ and $o, o' \in \mathbb{B}^O$. Then $T_I$ is said to make progress w.r.t. $T_S$ iff
$\forall x \in (S' \cup \{\varepsilon\}),\ \forall i:\ (\exists o: x \cdot io \in S) \Rightarrow (\exists o': x \cdot io' \in S')$
Note that for the special case that $T_S$ is a representation of a completely specified and deterministic FSM, progress reduces to $S = S'$. Now, we can define the necessary and sufficient condition for a trace set to correctly implement a specification, which is expressed by the term simulation: $T_I$ simulates $T_S$ iff $T_I$ conforms to $T_S$ and $T_I$ makes progress w.r.t. $T_S$.
When we say that $T_I$ simulates $T_S$, we actually imply several properties. First, $T_I$ may not contain more unspecified input transitions than $T_S$, and in addition no outputs may be generated by $T_I$ that were not specified in $T_S$. These two properties are expressed by the conformance constraint. Furthermore, $T_I$ must generate sufficiently many outputs when $T_S$ does. This is expressed by the progress constraint.
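The difference between conformance and simulation can be checked mechanically on a small example. The sketch below is an illustration under stated assumptions rather than the paper's algorithm: traces are enumerated explicitly up to a bounded length, the progress condition is only tested for traces that can still be extended within that bound, and the circuits `same` and `stalled` are hypothetical. The stalled candidate conforms to the specification but offers no successful continuation on input 1.

```python
from itertools import product

MAX_LEN = 3
BITS = (0, 1)
LETTERS = list(product(BITS, BITS))                  # letter = (input a, output y)
UNIVERSE = {t for n in range(MAX_LEN + 1) for t in product(LETTERS, repeat=n)}

def success_set(ok):
    """Traces whose every letter satisfies ok(a, y); the failure set stays empty."""
    return {t for t in UNIVERSE if all(ok(a, y) for a, y in t)}

def conforms(s_impl, f_impl, s_spec, f_spec):
    """P' within P and F' within F (equal signal sets are assumed)."""
    return (s_impl | f_impl) <= (s_spec | f_spec) and f_impl <= f_spec

def makes_progress(s_impl, s_spec):
    """For every x in S' or empty, and every input i: a continuation in S needs one in S'."""
    for x in s_impl | {()}:
        if len(x) == MAX_LEN:
            continue                                 # extension leaves the bounded universe
        for i in BITS:
            spec_ok = any(x + ((i, o),) in s_spec for o in BITS)
            impl_ok = any(x + ((i, o),) in s_impl for o in BITS)
            if spec_ok and not impl_ok:
                return False
    return True

def simulates(s_impl, f_impl, s_spec, f_spec):
    return (conforms(s_impl, f_impl, s_spec, f_spec)
            and makes_progress(s_impl, s_spec))

NO_FAILURES = set()
spec = success_set(lambda a, y: y == a)              # completely specified buffer
same = success_set(lambda a, y: y == a)
stalled = success_set(lambda a, y: a == 0 and y == 0)  # nothing possible on a = 1

for name, impl in (("same", same), ("stalled", stalled)):
    print(name,
          conforms(impl, NO_FAILURES, spec, NO_FAILURES),
          simulates(impl, NO_FAILURES, spec, NO_FAILURES))
```

Since the specification here is deterministic and completely specified, the only simulating candidate is the one with $S' = S$, in line with the remark following Definition 3.2.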
Correct Substitution Problem
In the previous sections, we defined a general model to describe hierarchical FSM networks and operations to manipulate them using trace sets, as well as correctness criteria for implementing them. A known and important problem in the context of interacting state machines is the correct substitution problem. In this section, the theoretical framework for this problem is stated in terms of synchronous trace sets. This will provide a general solution to this problem.
Problem Situation
The correct substitution problem addresses the issue of replacing a part of a network of interacting machines without changing the overall behavior. As an example, consider Figure 1. The correct substitution problem asks for a replacement $M_1'$ for subnetwork $M_1$ such that $M_1'$ connected with subnetwork $M_2$ simulates the specification $M$. Due to the interaction of the replaceable FSM with its surrounding network, of which the behavior is (partially) specified, and the don't care sequences induced by it, there exists in general more than one solution to the correct substitution problem. Besides this cause for the existence of possibly more than one correct substitution, the choice of network topology also affects the solution space. A network topology defines the actual interconnection of the substituted machine with the rest of the system. In Figure 1, the replaceable machine $M_1$ is shown to have the signals $I$ and the internal network signals $X$ as input signals.
The signals $X$ are (a subset of) the outputs of $M_2$. It can, however, be the case that a solution for $M_1$ that also uses some internal signals of $M_2$ results in a more efficient $M_1$. Those signals reflect behavior already implemented by another machine but which may be used in the implementation of $M_1$. Another situation is that, because of network constraints, $M_1$ can depend only on the inputs $X$ and not on the external inputs $I$. To represent the several possible solutions to the correct substitution problem, we will exactly define the maximal behavior which the solution must simulate. It will be proven that this maximal behavior can be represented by a single synchronous trace set, out of which all solutions can easily be extracted.
In this paper, by the use of synchronous trace sets, or equivalently languages, to describe the behavior of hierarchical systems, a fully general model is provided, as it allows for the underlying FSMs to be non-deterministic and incompletely specified. In addition, it can be made independent of the specific network topology of (local) input and (local) output signals. This allows solutions ranging over an arbitrary topology to be described. At a later stage, a network topology can then be selected which yields the most efficient solution for the replaceable subnetwork. Therefore, we formulate the solution of the correct substitution problem in two ways. One describes the network to be substituted depending on the global space of all possible inputs, while the other describes it in terms of a particular topology.
In the following, let the specification of the overall network be given by the synchronous trace set $T = (I, O, S, F)$, and the subnetwork that is to be replaced be the synchronous trace set $T_1 = (I_1, O_1, S_1, F_1)$. The remaining specified subnetwork is represented by the single synchronous trace set $T_2 = (I_2, O_2, S_2, F_2)$. In case the specified subnetwork consists of more than one component, $T_2$ can be computed by collapsing all the components through the application of the parallel composition operator on the trace sets of the components. Note that $O_1 \cap O_2 = \emptyset$ holds.
Derivation of $T_{\max}$
Now we will formalize the correct substitution problem and derive formally the maximal trace set for it.
Problem Let $T_2$ and $T$ be synchronous trace sets. $T$ represents the specification and $T_2$ represents a partial implementation. The correct substitution problem is defined to be the problem of finding a synchronous trace set $T_1$ such that $T_1 \parallel T_2$ simulates $T$.
To provide a solution to this problem we make use of the following intermediate results. Let $T_{\max} = \mathrm{mirror}(T_2 \parallel \mathrm{mirror}(T))$ be the maximal trace set. Theorem 4.1 shows that all $T_1 \le T_{\max}$ composed with $T_2$ conform to $T$. Moreover, the subsequent corollary shows that $T_1$ is a conforming substitution independent of the implementation of $T_2$. This shows that $T_{\max}$ is also maximal w.r.t. the conformance ordering $\le$.
Theorem 4.1 $T_1 \parallel T_2 \le T$ is the case if and only if $T_1 \le \mathrm{mirror}(T_2 \parallel \mathrm{mirror}(T))$.
Corollary 4.2 If $T_1 \le \mathrm{mirror}(T_2 \parallel \mathrm{mirror}(T))$, then $\forall T_2' \le T_2: T_1 \parallel T_2' \le T$.
Theorem 4.1 defines the trace set $T_1$ in terms of the global space of all possible (input) signals. However, in reality, a component in a network of synchronous circuits may only be defined in terms of a subset of all possible input signals. This can be required for reasons of optimization, to reduce the number of dependencies, or simply because in the actual implementation of the network, $T_1$ does not have all possible input signals at its disposal. Then, $T_1$ in Theorem 4.1 should be made independent of certain inputs, if possible. Equivalently, this can be expressed as the hiding of output signals on the mirror of $T_1$. Now, we can express the conformance constraint to find the maximal trace set in terms of a local space for $T_1$.
... project($A_1$)($T_2 \parallel \mathrm{mirror}(T)$)).
After projection, the success and failure languages may, in general, overlap. Therefore, $T_{\max}$ must be made non-overlapping first. As $T_{\max}$ can in general not be made independent of all input signals, it is evident that signals may not be arbitrarily hidden. For those network topologies, an implementable solution to the correct substitution problem does not exist. It will be shown later how this can be checked on $T_{\max}$ using the notion of simulation. Note that a conforming $T_{\max}$, or one that has sufficiently many traces that make progress, may fail to exist because of the erroneous (non-deterministic) behavior of subnetwork $T_2$. The theorems previously stated provide a way to optimize a subnetwork for the correct substitution problem:
1. First, compute the maximal behavior $T_{\max} = \mathrm{mirror}(T_2 \parallel \mathrm{mirror}(T))$.
2. Apply state minimization on the underlying automaton of the maximal trace set $T_{\max}$.
3. Hide appropriate input signals in order to optimize the subnetwork.
4. Choose an implementation of $T_{\max}$ that actually makes progress w.r.t. $T$.
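The first step of this procedure, together with the conformance statement of Theorem 4.1, can be tried out on a toy network. The sketch below is an illustration under explicit assumptions and not the implicit algorithm used in the experiments: traces are enumerated up to a small bound, all trace sets are given over the global alphabet of three hypothetical signals (external input `a`, internal signal `x`, output `y`), and $T_2$ is assumed to be an inverter from `x` to `y`. It computes $T_{\max}$ and compares conformance of two candidates against $T_{\max}$ with conformance of the composed network against $T$.

```python
from itertools import product
from typing import NamedTuple

MAX_LEN = 2
LETTERS = list(product((0, 1), repeat=3))            # letter = (a, x, y)
UNIVERSE = frozenset(t for n in range(MAX_LEN + 1)
                     for t in product(LETTERS, repeat=n))

class TraceSet(NamedTuple):
    inputs: frozenset
    outputs: frozenset
    S: frozenset   # success traces
    F: frozenset   # failure traces

def possible(t):
    return t.S | t.F

def impossible(t):
    return UNIVERSE - possible(t)

def compose(t, u):
    assert not (t.outputs & u.outputs), "outputs must be disjoint"
    outs = t.outputs | u.outputs
    ins = (t.inputs | u.inputs) - outs
    failures = (t.F & possible(u)) | (possible(t) & u.F)
    return TraceSet(ins, outs, t.S & u.S, failures)

def mirror(t):
    return TraceSet(t.outputs, t.inputs, t.S, impossible(t))

def conforms(impl, spec):
    return (impl.inputs == spec.inputs and impl.outputs == spec.outputs
            and possible(impl) <= possible(spec) and impl.F <= spec.F)

def failure_free(inputs, outputs, ok):
    """Trace set with empty F whose success traces satisfy `ok` in every cycle."""
    S = frozenset(t for t in UNIVERSE if all(ok(*letter) for letter in t))
    return TraceSet(frozenset(inputs), frozenset(outputs), S, frozenset())

spec = failure_free("a", "xy", lambda a, x, y: y == a)        # T: y follows a
t2 = failure_free("ax", "y", lambda a, x, y: y == 1 - x)      # T2: inverter on x
t_max = mirror(compose(t2, mirror(spec)))                     # step 1

candidates = {
    "x = not a": failure_free("ay", "x", lambda a, x, y: x == 1 - a),
    "x = a": failure_free("ay", "x", lambda a, x, y: x == a),
}
for name, t1 in candidates.items():
    local_check = conforms(t1, t_max)                # T1 <= T_max
    global_check = conforms(compose(t1, t2), spec)   # T1 || T2 <= T
    print(name, local_check, global_check)
```

For both candidates the two checks coincide in this example. The failure set of the computed `t_max` consists of the traces that $T_2$ can never produce, which is where the freedom for choosing a replacement comes from.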
Implementation Details
Now we will discuss how the path described above to compute the maximal solution to the correct substitution problem can be performed in detail. In Section 2 it was shown that finite state machines can be modeled by synchronous trace sets, which in turn are deterministic finite automata. The parallel composition operator on trace sets is identical to the structural (synchronous) product of the two automata representing the individual success trace sets. Hiding and projection are also defined as usual operations on finite automata. Making trace languages non-overlapping is equal to determinizing the automaton, in which all (super)states that contain a failure state are mapped to a failure state in the final automaton. The mirror operation on synchronous trace sets is the identity operation on the finite automaton model, except that the failure and the impossible traces are swapped.
In extracting a correct FSM from $T_{\max}$, note that when conformance is considered as the only condition for a correct solution, then every solution must have a success language that forms an arbitrary subset of $S_{\max} \cup F_{\max}$. These solutions then exactly correspond to all structural sub-automata of an arbitrary unfolding of the automaton formed by $S_{\max} \cup F_{\max}$. To satisfy the necessary progress condition, traces that will lead to a non-progress condition may not be included in the acceptance language of these automata. The set of traces that cause non-progress conditions can be determined while constructing $T_{\max}$ and subsequently be subtracted from the success set of $T_{\max}$.
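The rule for making a trace set non-overlapping after hiding can be illustrated directly on sets of traces, without going through an automaton. The following sketch is a simplified stand-in for the determinization described above, not the implementation itself: the function names and the three-signal example are hypothetical, and letters are assumed to be tuples of signal values in a fixed order.

```python
def hide_letter(letter, keep):
    """Drop the signal positions whose flag in `keep` is False."""
    return tuple(value for value, kept in zip(letter, keep) if kept)

def hide(traces, keep):
    """Project every trace onto the signal positions flagged in `keep`."""
    return {tuple(hide_letter(letter, keep) for letter in trace)
            for trace in traces}

def hide_trace_set(success, failure, keep):
    """Hide signals, then resolve overlap: a trace that may fail counts as a failure."""
    s_hidden = hide(success, keep)
    f_hidden = hide(failure, keep)
    return s_hidden - f_hidden, f_hidden

# Letters are (a, x, y); the internal signal x is hidden.
S = {(), ((0, 0, 0),), ((1, 1, 1),)}
F = {((0, 1, 0),)}
S_h, F_h = hide_trace_set(S, F, keep=(True, False, True))
print(sorted(S_h), sorted(F_h))
```

Here the success trace with `(a, x, y) = (0, 0, 0)` and the failure trace with `(0, 1, 0)` become indistinguishable once `x` is hidden, so the projected trace is kept only in the failure set.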
Experimental Results
We have written a fully implicit implementation of our approach in C to derive the maximal simulating trace set for the correct substitution problem. The implementation uses BDD-based implicit enumeration methods [1, 3, 10]. In this section, we present some results of applying this implementation to a number of benchmarks from the SIS sequential benchmark set [9]. The examples used range in size from tens of states to over $10^8$ states in the largest example tested. The results are tabulated.
The first set of examples originated from the ISCAS sequential benchmark set for sequential testing and optimization. The second set of examples are accumulator circuits made from conventional adders. The last three sets of examples are derived from the "minmax" benchmark circuit [9], which computes the minimum, last, and maximum from a sequence of integers. The last set of benchmarks, labeled "minmax", are precisely this circuit but with different bit-widths. The "max" circuits in the third set only compute the maximum from a sequence, and the "mod-minmax" circuits in the fourth set compute both the minimum and maximum, but do not record the last.
The experiment was performed as follows: we took each benchmark circuit and regarded it as the "specification". In the left columns, the number of input and output variables, the number of state variables, and the size of the state space are shown for each example. From each specification, we extracted a sub-machine by randomly selecting half the state variables. The sub-machine is the state machine induced by these state variables. The statistics of the sub-machines are shown in the middle columns. We then used the approach presented in this paper to derive the maximum simulating trace set. This maximum trace set contains all state machines that, when composed with the sub-machine, simulate the specification.
The operations on the languages are performed using implicit methods [3, 10], and the CPU times include the reachability analysis of the automaton $T_{\max} = \mathrm{mirror}(T_2 \parallel \mathrm{mirror}(T))$, where $T$ is the specification trace set and $T_2$ is the sub-machine trace set. The size of the BDD that represents the reachable set of accepting states of the automaton is indicated in the column labeled "BDD size". All CPU times were measured on an HP model 715/50 workstation.
Conclusions
We have presented a general language framework that is able to model hierarchical synchronous circuits. It allows problems relating to interacting FSMs to be expressed in a mathematically sound way. Specifically, we modeled the generalized correct substitution problem and stated necessary and sufficient conditions for finding an implementable solution to this problem. The problem and solutions are formulated uniformly for deterministic and non-deterministic behaviors, and for incompletely specified machines. In addition, the framework is independent of the network topology, thus allowing more efficient solutions to be found which are defined for different interconnection schemes. Results of applying an implementation of the model to benchmarks show that computations based on the presented theory are practically feasible.
