Designing Self-Timed Devices Using the Finite Automaton Model
Asynchronous systems offer many advantages in terms of performance and power. Designing them, however, is essentially an art (see the "Designers as artists" box), and the quality of the final circuit implementation depends greatly on the designer's skill. Our research, therefore, defines a procedure that reliably accomplishes the routine work associated with designing one class of asynchronous systems, self-timed devices. We do not intend to take the designer out of the design process; rather, our procedure frees a designer to review more possibilities.
Our procedure synthesizes a self-timed device with external inputs from a finite Mealy automaton specification. We chose to use that specification and a two-register structure with master-slave flip-flops for two reasons:
Representing behavior specifications in finite automata language is widespread and supported by many CAD systems. It also has a good theoretical and practical basis. 
University of Aim
The authors suggest a procedure for designing a self-timed device defined by the finite automaton model. This procedure proves useful when designing these devices from available synchronous behavior specifications. They illustrate the effectiveness of their procedure by applying it to the design of a stack memory and a counter with constant acknowledgment delay.
As we will show, the self-timed realization of a Mealy automaton with a two-register structure has a simple and evident solution. For the given examples, its complexity is close to that of the synchronous realizations.
One reason preventing wide use of the conventional asynchronous approach is the necessity of antirace coding, which causes major complications in an implementation. In this sense, self-timed realizations take the middle position between synchronous and asynchronous ones. The inputs and outputs of self-timed circuits usually have a two-phase behavior (code-spacer-code, ...)¹ and a four-phase interface with the environment (request code-acknowledgment-spacer-request, ...). These characteristics of the inputs and outputs, in fact, organize synchronous behavior and therefore allow specification in synchronous finite automata language. However, the problem of obtaining correct circuit behavior (eliminating races, for example) stays outside the specification. A proper transition from the specification to the realization structure guarantees gate-delay-insensitive behavior and provides correct circuit behavior.
Such an approach is not a methodological novelty. This article illustrates its particular application to a situation that often arises when designing self-timed devices from an available synchronous-prototype behavior specification. For a summary of the research supporting this approach, see the Background box.
Specification problems
We must first derive the device behavior specification. This process is informal and can produce varying results. Moreover, a specification of minimal complexity does not guarantee minimal implementation complexity. Changing the input specification, however, improves the quality of the circuit solution more than perfecting formal synthesis procedures does. The environment's free choice of the next input can be handled in two ways: by introducing a vertex of free choice into the model and modernizing the formal method, or by "unwinding" the specification (linearization).
Our procedure, then, aims to obtain a correct change diagram specification from a finite automaton model. Although self-timed devices are asynchronous automata, we do not use asynchronous automaton models to define them, because those models solve problems such as antirace coding of internal states and hazard-free implementation of logic functions. The main design problem for a self-timed device is fixing the moments at which transient processes in its circuit complete. The circuit must certainly be free of races and hazards, too, but fighting them is an attendant problem. Global methods handle races and hazards, and they often lead to better circuit solutions than the special methods developed within the theory of asynchronous automata.
We use the classic finite Mealy automaton model for designing a self-timed device. The transition graph shown in Figure 1 presents such an automaton. The pair X/Y marks each arc leading from one state to another: the automaton passes from state S_i to state S_j under the influence of input X, producing output Y.
Let us consider only those automata in which each internal state is reachable from any other state, that is, those with connected transition graphs. This restriction is not excessive because, first, connected automata are of the greatest interest and, second, any unconnected graph can always be converted to a connected one by using dummy input and output characters.
Our procedure for designing a self-timed device consists of the following steps:
1. choice of a standard self-timed realization for a finite automaton
2. reduction of the automaton transition graph to a simple cycle graph
3. construction of a change diagram specification of the device
4. application of formal methods to the change diagram to obtain Muller's circuit for output and memory element excitation signals
Background
The definition of a self-timed automaton is not formal; it emphasizes only that a self-timed automaton must correctly perform the automaton conversion under any ratio of delays of its elements. This requirement is feasible when using special coding systems and restrictions on the characteristics of delays in elements and wires. The following Muller's hypothesis of delay characteristics conforms well to practice: 1) delays can be both inertial and pure; 2) delays in elements and in pieces of wire from an element output up to a fork can be of any finite value; and 3) wires after a fork have a skew in delay values of not more than the minimum delay of an element. In general, designers use self-synchronized code systems to code the input, output, and internal states of an automaton.
Our earlier work proved the possibility of designing a self-timed implementation of an arbitrary finite automaton consisting of a combinational circuit and memory elements. We also developed methods to synthesize self-timed automata from electric-potential elements. To do this, we use methods and procedures for designing self-timed realizations of Boolean function systems, memory elements, and the circuits that signal the completion of transition processes in those elements.
To create formal methods of self-timed circuit synthesis, we need formal models. These models must reflect possible concurrency of work and asynchronous interaction between different parts of the device. In this case, unfortunately, a finite automaton model representing a sequential machine is useless.
Known formal methods of self-timed device synthesis use dynamic models for specifying parallel asynchronous processes in circuits. These models include Muller's diagrams,¹ signal graphs, and change diagrams.² Synthesis methods based on such models work well for designing autonomous devices. When synthesizing devices with external inputs and outputs, a specification simulating the environment's behavior must complement the general device behavior specification. This insertion usually requires "specification linearization,"² a multiple unwinding of the general specification that significantly complicates both the specification procedure and synthesis. Also, some research derives the general specification of the "device-environment" system using vertices of free choice³ rather than linearization.
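The restriction to connected transition graphs (every internal state reachable from every other) is easy to check mechanically before the graph is unwound. The following Python sketch is ours, not part of the procedure: it assumes a transition graph stored as a dictionary mapping each state to a list of hypothetical (input, output, next state) arcs, and it tests reachability from every state with a breadth-first search.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical representation: state -> list of (input, output, next_state) arcs.
Graph = Dict[str, List[Tuple[str, str, str]]]


def reachable(graph: Graph, start: str) -> Set[str]:
    """States reachable from `start` along transition arcs (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for _inp, _out, nxt in graph.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_connected_automaton(graph: Graph) -> bool:
    """True if every internal state is reachable from every other state."""
    states = set(graph)
    for arcs in graph.values():
        states.update(nxt for _inp, _out, nxt in arcs)
    return all(reachable(graph, s) == states for s in states)


# Hypothetical examples: a two-state toggle and a graph with a trap state.
toggle = {"S0": [("x", "y1", "S1")], "S1": [("x", "y0", "S0")]}
trap = {"S0": [("x", "y", "S1")], "S1": [("x", "y", "S1")]}
print(is_connected_automaton(toggle), is_connected_automaton(trap))  # True False
```

A graph that fails this check would first have to be completed with dummy input and output characters, as described above.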
Standard automaton realization.
Step 1 defines the automaton structural scheme; the rules governing its interaction with the environment; the coding of input, output, and internal-state characters; and the memory element structure. Only the logic functions of the allotted signals remain undefined. Now we can define the partial order of signal changes for the signal sets representing the automaton's inputs and outputs as well as the memory element signals. However, the environment's nondeterministic behavior (it chooses the next input set in an unknown way) is a problem.
Two studies propose canonical approaches that allow us to simplify the synthesis procedure. Using them, we obtain standard realization circuits. The memory element determines the type of standard realization. Standard realizations use irredundant coding of automaton internal states, unlike asynchronous automata, which use antirace coding. A standard realization contains a combinational circuit, a parallel register, and, perhaps, an indicator of transient process completion moments. D flip-flops with two-rail inputs or T flip-flops form the register. In this article, we use a memory element of two master-slave RS flip-flops.
Special design methods, as well as proper information coding, make circuit behavior independent of element timing parameters. Such code systems are self-synchronized. For self-synchronized codes, the appearance of set B at the output indicates the completion of a transition from set A to set B. Self-synchronized codes are a universal and unique tool for fighting functional hazards.
Asynchronous automata theory usually uses neighbor or quasi-neighbor coding methods. Such coding systems can be treated as single-phase self-synchronized codes that are overly redundant. They allow direct transition from one character to another.
Self-synchronized codes in which every working character alternates with an empty one (spacer) are more convenient. Note that using such sequences removes the restriction, peculiar to asynchronous automata, that no character can follow itself. A working input character initiates the active phase of the device's work; the spacer initiates the passive phase. The self-synchronized codes of such sequences are two-phase.
The most commonly used two-phase self-synchronized codes are equal-weight codes and codes having an identifier (Berger's codes). These are the basis of all other self-synchronized code systems. We prefer equal-weight codes, which consist of sets with a fixed number of ones. An example is two-rail code, which represents each information bit by two signals.
In standard realizations, we usually use two-phase self-synchronized code systems for input and output states. That system defines the automaton-environment interaction. The automaton replies to a working set with a working set and to a spacer with a spacer; the environment replies to a spacer with a working set and to a working set with a spacer. It is also important to choose the rules governing the interaction between different parts of the automaton structural diagram. We discuss this problem later.
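To make the two-phase code concrete, here is a small Python sketch of two-rail coding. The rail assignment (bit 1 as (1, 0), bit 0 as (0, 1), spacer as (0, 0)) and all function names are our assumptions. The last function checks the self-synchronization property for the spacer-to-working-set phase: whatever order the rails rise in, no intermediate state is recognized as a working set, so the appearance of a complete working set signals that the transition has finished.

```python
import itertools
from typing import List, Tuple

Pair = Tuple[int, int]  # (true rail, false rail) of one information bit


def encode(bits: List[int]) -> List[Pair]:
    """Working set: bit 1 -> (1, 0), bit 0 -> (0, 1) (assumed rail convention)."""
    return [(1, 0) if b else (0, 1) for b in bits]


def spacer(n: int) -> List[Pair]:
    """Spacer: both rails of every bit at 0."""
    return [(0, 0)] * n


def is_working(word: List[Pair]) -> bool:
    """A complete working set has exactly one rail at 1 in every pair."""
    return all(t + f == 1 for t, f in word)


def is_spacer(word: List[Pair]) -> bool:
    return all(t == 0 and f == 0 for t, f in word)


def weight(word: List[Pair]) -> int:
    """Number of ones; every n-bit working set has weight n (equal-weight code)."""
    return sum(t + f for t, f in word)


def completion_is_unambiguous(bits: List[int]) -> bool:
    """For every order in which rails rise from the spacer to encode(bits),
    check that only the final state is recognized as a working set."""
    rails = [(i, r) for i, pair in enumerate(encode(bits)) for r in (0, 1) if pair[r]]
    for order in itertools.permutations(rails):
        state = [[0, 0] for _ in bits]
        for step, (i, r) in enumerate(order):
            state[i][r] = 1
            done = is_working([(t, f) for t, f in state])
            if done != (step == len(order) - 1):
                return False
    return True


print(encode([1, 0, 1]), completion_is_unambiguous([1, 0, 1]))
```

The reverse phase, from a working set back to the spacer, behaves symmetrically: only the all-zero state satisfies `is_spacer`, so a receiver can detect completion of both phases without a clock.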
Reducing the automaton graph. In step 2, we unwind the automaton transition graph into a simple cycle to change the free choice of the next input character to a deterministic choice. The unwound graph defines a sequence of input characters that causes all the possible transitions in the automaton; it is a loop that must pass every arc at least once. This guarantees the definition of all possible transitions. In such an unwinding, some states can occur repeatedly.
We can reduce a connected, oriented graph to a simple cycle in more than one way. We may wish to find its optimal unwinding; that is, the one containing a minimum number of vertices. The following is a possible algorithm for doing so:
1. compile the set of possible simple cycles from the automaton graph
2. find all possible subsets of this set that cover all the arcs of the graph
3. choose the subset with the minimum number of cycle vertices
Any coverage of the graph is adequate for our purposes because all coverages define the same automaton; the optimal coverage simply reduces the designer's subsequent work.
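The three-step algorithm can be sketched by brute force in Python. This is a minimal sketch for small graphs only: it assumes at most one arc per ordered pair of states (arc labels are ignored), enumerates elementary cycles by depth-first search, and measures a cover by the total number of cycle vertices. The four-state graph in the example is hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

Arc = Tuple[str, str]


def simple_cycles(succ: Dict[str, List[str]]) -> List[List[str]]:
    """Elementary cycles, each listed once and starting at its smallest vertex.
    Exponential in general; intended for small automaton graphs."""
    vertices = sorted(set(succ) | {w for ws in succ.values() for w in ws})
    rank = {v: k for k, v in enumerate(vertices)}
    cycles: List[List[str]] = []

    def dfs(start: str, v: str, path: List[str]) -> None:
        for w in succ.get(v, []):
            if w == start:
                cycles.append(path[:])  # closed a cycle back to start
            elif w not in path and rank[w] > rank[start]:
                path.append(w)
                dfs(start, w, path)
                path.pop()

    for s in vertices:
        dfs(s, s, [s])
    return cycles


def cycle_arcs(cycle: List[str]) -> Set[Arc]:
    return {(cycle[k], cycle[(k + 1) % len(cycle)]) for k in range(len(cycle))}


def min_cycle_cover(succ: Dict[str, List[str]]) -> List[List[str]]:
    """Subset of simple cycles covering every arc with the fewest cycle vertices."""
    all_arcs = {(v, w) for v, ws in succ.items() for w in ws}
    cycles = simple_cycles(succ)
    best: Optional[Tuple[int, List[List[str]]]] = None
    for k in range(1, len(cycles) + 1):
        for subset in combinations(cycles, k):
            covered = set().union(*(cycle_arcs(c) for c in subset))
            cost = sum(len(c) for c in subset)
            if covered == all_arcs and (best is None or cost < best[0]):
                best = (cost, list(subset))
    return best[1] if best else []


# Hypothetical four-state graph: each state linked both ways to two neighbors.
ring = {"WE": ["WO", "RE"], "WO": ["RO", "WE"], "RO": ["RE", "WO"], "RE": ["WE", "RO"]}
print(min_cycle_cover(ring))
# [['RE', 'WE', 'WO', 'RO'], ['RE', 'RO', 'WO', 'WE']]
```

Because subsets are examined in order of increasing size and only a strictly cheaper cover replaces the current one, ties favor fewer cycles: here the two four-state cycles (8 vertices) win over the four two-state cycles, which also total 8.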
A designer next reduces the automaton graph to a simple cycle graph by … of the device on the allotted signal set. The standard realization type determines the way we construct it. Two flip-flops with heteropolar control represent every variable coding an internal state of the automaton. To construct the standard realization of an automaton, we must code input, output, and internal-state characters. Let two-phase, equal-weight, self-synchronizing codes represent input and output characters. This establishes the rules of interaction between the automaton and the environment. We use binary code to represent the internal states.
After coding, we must define the rules of interaction between different parts of the automaton structural scheme. After that we can derive the change diagram specification. Let us accept the following rules:
1. Every transition executes in two phases. The first phase starts with a set of self-synchronizing code at the automaton input and finishes when a set of code appears at the output. The second phase starts with a spacer at the input and finishes when a spacer appears at the output.
2. If the transition does not cause a change of state, then the input signals are the immediate cause of output signal changes. Figure 3a shows such a transition.
3. We break an automaton transition from one state to another (Figure 3c) into two successive subtransitions. Figure 3d shows the subtransitions.
We thus construct the signal change orders for every transition of the automaton graph unwound to a simple cycle. After defining the initial marking, we obtain the signal graph specification for the device.
We emphasize that the specifications obtained from different unwindings of an automaton graph, when processed by the synthesis procedure, must lead to the same result. We can strictly prove this statement.
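The construction of signal change orders along the unwound cycle can be illustrated with a deliberately sequential Python sketch. It is not the authors' construction and does not reproduce Figure 3: it assumes that within each phase the input change comes first, a register change follows only when the state changes (register 1 in the first phase, register 2 in the second, as in the two-register memory described for the stack example), and the output change comes last. The event names are hypothetical. The result is one cyclic event sequence; the initial marking puts the token on the arc that closes the cycle.

```python
from typing import List, Tuple

# One transition of the unwound cycle: (state, input, output, next_state).
Step = Tuple[str, str, str, str]


def signal_order(cycle: List[Step]) -> List[str]:
    """Concatenate per-transition event orders into one cyclic sequence.

    Assumed order inside each transition (a simplification, not Figure 3):
      phase 1: input+ -> [register 1 write] -> output+
      phase 2: input- -> [register 2 copy]  -> output-
    Register events are emitted only when the state changes."""
    for k, (_s, _x, _y, nxt) in enumerate(cycle):
        if cycle[(k + 1) % len(cycle)][0] != nxt:
            raise ValueError(f"step {k} does not lead into the next step")
    events: List[str] = []
    for state, x, y, nxt in cycle:
        changes = nxt != state
        events.append(f"{x}+")
        if changes:
            events.append(f"R1={nxt}")
        events.append(f"{y}+")
        events.append(f"{x}-")
        if changes:
            events.append(f"R2={nxt}")
        events.append(f"{y}-")
    return events


def initial_marking(events: List[str]) -> Tuple[str, str]:
    """Token on the arc from the last event back to the first one."""
    return events[-1], events[0]


# Hypothetical unwound cycle: a self-loop on S0, then S0 -> S1 -> S0.
cycle = [("S0", "a", "p", "S0"), ("S0", "b", "q", "S1"), ("S1", "a", "r", "S0")]
events = signal_order(cycle)
print(events[:6])               # ['a+', 'p+', 'a-', 'p-', 'b+', 'R1=S1']
print(initial_marking(events))  # ('r-', 'a+')
```

Input character `a` occurs twice in this unwinding, so `a+` appears as two distinct event occurrences; a real signal graph would also express the concurrency that this linear order hides.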
Self-timed stack memory
As an example, we first applied our procedure to the design of a self-timed stack memory. Several approaches exist for designing this kind of memory; we divide them into two basic classes: register structures and memory-based stacks. Studies of self-timed stack design usually consider register structures. We will consider the second approach.
In a usual CMOS static-memory array with two-rail representation of the data path, sufficiently simple tools indicate read-operation completion. The main problem is indicating write-operation completion. We solved this problem⁴ by breaking the write process into two phases: reading information and rewriting it. Memory detects rewrite completion by checking whether the code being written coincides with the code in the data path of the read. The details of self-timed memory organization and the design of the control circuits are outside this article's scope.
Figure 4 presents the stack structure. It contains a self-timed memory array, a stack pointer, and a control unit. Signals R and W initiate read and write operations. They first enter the self-timed memory array and the control unit to determine the stack pointer work mode. Ack is the signal that acknowledges operation completion to the environment. Adr is the set of address signals coming to the memory array from the stack pointer. Adr initiates the memory work, while R and W choose the mode. C is the set of control signals coming to the stack pointer from the control unit.
The signal graph in Figure 5 describes the rules governing stack block interactions. We indicate active and passive signal states by + and -. Extra signal 0 unites W and R, making them indistinguishable.
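Signal graphs of this kind are usually interpreted with a token game: an event (a signal going + or -) may fire when every arc entering it carries a token, and firing moves tokens from its input arcs to its output arcs. The Python sketch below assumes that marked-graph interpretation; the four-event request/acknowledge cycle in the example is a hypothetical minimal handshake, not the graph of Figure 5.

```python
from collections import Counter
from typing import List, Tuple

Arc = Tuple[str, str]  # (cause event, effect event)


def enabled(arcs: List[Arc], marking: Counter, event: str) -> bool:
    """An event may fire when every arc entering it carries a token."""
    return all(marking[a] > 0 for a in arcs if a[1] == event)


def fire(arcs: List[Arc], marking: Counter, event: str) -> Counter:
    """Consume a token from each input arc, put one on each output arc."""
    new = Counter(marking)
    for a in arcs:
        if a[1] == event:
            new[a] -= 1
        if a[0] == event:
            new[a] += 1
    return new


def run(arcs: List[Arc], marking: Counter, steps: int) -> List[str]:
    """Fire up to `steps` events, always picking the first enabled one by name."""
    events = sorted({e for a in arcs for e in a})
    trace: List[str] = []
    for _ in range(steps):
        ready = [e for e in events if enabled(arcs, marking, e)]
        if not ready:
            break  # deadlock: no event is enabled
        marking = fire(arcs, marking, ready[0])
        trace.append(ready[0])
    return trace


# Hypothetical four-phase handshake: R+ -> Ack+ -> R- -> Ack- -> R+ ...
handshake = [("R+", "Ack+"), ("Ack+", "R-"), ("R-", "Ack-"), ("Ack-", "R+")]
initial = Counter({("Ack-", "R+"): 1})  # the token lets R+ fire first
print(run(handshake, initial, 6))
```

When several events are enabled at once (concurrency), this sketch simply picks one by name; a circuit would let them change in any order.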
Our goal is to design the control unit. To do this, we must consider the stack pointer circuit, which produces memory block addresses.
Stack pointer. The stack pointer logic circuit should be simple because its complexity grows linearly with increasing stack size. We achieve this simplicity by increasing the number of external control signals and, hence, complicating the control circuit.
Using this description, we can easily represent the control unit by a Mealy automaton with the transition graph of Figure 7. The automaton states correspond to the following situations:
WE: occurrence of a write at an even address
WO: occurrence of a write at an odd address
RE: occurrence of a read at an even address
RO: occurrence of a read at an odd address
To code four states, two coding variables, S1 and S2, suffice. Let variable S1 take values from set {e, o} and variable S2 take values from set {r, w}. Then, clearly, the coding of the automaton states is neighbor coding, since only one variable value changes in every transition. Such coding simplifies the automaton realization.
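The neighbor property can be checked arc by arc: every transition must change exactly one coding variable. In the Python sketch below, the state codes follow the text (the first variable from {e, o}, the second from {r, w}), but the arc list is our assumption: a ring in which each state connects in both directions to the two states that differ from it in one variable. The actual arcs are those of Figure 7.

```python
from typing import Dict, List, Tuple

# State codes: (first variable from {e, o}, second variable from {r, w}).
CODE: Dict[str, Tuple[str, str]] = {
    "WE": ("e", "w"),
    "WO": ("o", "w"),
    "RE": ("e", "r"),
    "RO": ("o", "r"),
}

# Hypothetical arc set (an assumption, not read from Figure 7).
RING = [("WE", "WO"), ("WO", "RO"), ("RO", "RE"), ("RE", "WE")]
ARCS = RING + [(b, a) for a, b in RING]


def changed_variables(code: Dict[str, Tuple[str, str]], a: str, b: str) -> int:
    """How many coding variables differ between states a and b."""
    return sum(x != y for x, y in zip(code[a], code[b]))


def is_neighbor_coding(code: Dict[str, Tuple[str, str]],
                       arcs: List[Tuple[str, str]]) -> bool:
    """True if every transition changes exactly one coding variable."""
    return all(changed_variables(code, a, b) == 1 for a, b in arcs)


print(is_neighbor_coding(CODE, ARCS))                   # True
print(is_neighbor_coding(CODE, ARCS + [("WE", "RO")]))  # False: both variables change
```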
We set the output signals, shown at the arcs of the graph in Figure 7, so that every variable that codes internal states breaks its set into two nonoverlapping subsets. Choosing such output signals should simplify the logic functions.
For synchronous automaton realization, the universal antirace method uses a two-register memory that divides the registers' work in time. We find a similar approach useful for self-timed realization. For example, a working set of the self-synchronizing code appearing at the inputs triggers a write to the first register. A spacer at the input causes information to move from the first register to the second. In some cases, the second register can contain fewer simple flip-flops than the first; the information must then be compressed when it moves. We use this case in our example. Such automaton memory organization, together with coding input and output characters using two-phase self-synchronizing codes, provides monotonic representations for all signal logic functions.
In our case, the inputs and outputs are already coded, and we can easily check that these codes are two-phase and self-timed. We have thus designed the standard realization of the automaton defined by the transition graph of Figure 7.
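The two-register organization can be simulated at the behavioral level. The Python sketch below is a hypothetical model, not a circuit: a working input set makes the combinational part read register 2 and write the next state into register 1, and the following spacer copies register 1 into register 2. Because the logic never reads the register being written, a half-updated state is never visible to it. The parity automaton in the example, the `None` spacer marker, and the class and method names are our assumptions; register compression is not modeled.

```python
from typing import Dict, Optional, Tuple

SPACER = None  # marker for the empty (spacer) set on inputs and outputs


class TwoRegisterMealy:
    """Behavioral sketch of a Mealy automaton with a two-register memory."""

    def __init__(self, delta: Dict[Tuple[str, str], Tuple[str, str]], initial: str):
        self.delta = delta    # (state, input) -> (next_state, output)
        self.reg1 = initial   # written in the working phase
        self.reg2 = initial   # read by the combinational logic
        self.expect_working = True

    def apply(self, x: Optional[str]) -> Optional[str]:
        if x is SPACER:
            if self.expect_working:
                raise ValueError("a spacer must follow a working set")
            self.reg2 = self.reg1  # spacer: move the state into register 2
            self.expect_working = True
            return SPACER          # the automaton answers a spacer with a spacer
        if not self.expect_working:
            raise ValueError("a working set must follow a spacer")
        nxt, y = self.delta[(self.reg2, x)]  # logic sees only register 2
        self.reg1 = nxt                      # working set: write register 1
        self.expect_working = False
        return y


# Hypothetical parity automaton: state E/O = even/odd number of 1s seen so far.
parity = {
    ("E", "0"): ("E", "e"), ("E", "1"): ("O", "o"),
    ("O", "0"): ("O", "o"), ("O", "1"): ("E", "e"),
}
m = TwoRegisterMealy(parity, "E")
print([m.apply(x) for x in ["1", SPACER, "1", SPACER, "0", SPACER]])
# ['o', None, 'e', None, 'e', None]
```

The strict alternation checks mirror the automaton-environment rules: a working set is always answered by a working set and a spacer by a spacer.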
This graph reduces to a simple cycle graph rather easily. Two simple cycles, obtained by passing all the graph vertices clockwise and counterclockwise, cover all the graph arcs. We construct an unwinding of the graph from these two cycles.
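Two cycles that share a state can be spliced into one closed walk, which serves as the unwinding. The Python sketch below inserts each further cycle into the walk at the first shared state. It assumes a hypothetical four-state ring in which each state connects in both directions to its two neighbors (our assumption, not read from Figure 7), so the clockwise and counterclockwise cycles together cover all eight arcs.

```python
from typing import List, Set, Tuple


def splice_cycles(cycles: List[List[str]]) -> List[str]:
    """Join simple cycles into one closed walk (first vertex == last vertex)."""
    walk = cycles[0] + [cycles[0][0]]
    pending = [list(c) for c in cycles[1:]]
    while pending:
        for c in pending:
            shared = [k for k, v in enumerate(walk[:-1]) if v in c]
            if shared:
                i = shared[0]
                j = c.index(walk[i])
                # Go around c starting and ending at walk[i], then resume the walk.
                walk = walk[:i] + c[j:] + c[:j] + walk[i:]
                pending.remove(c)
                break
        else:
            raise ValueError("a remaining cycle shares no state with the walk")
    return walk


def walk_arcs(walk: List[str]) -> Set[Tuple[str, str]]:
    return set(zip(walk, walk[1:]))


clockwise = ["WE", "WO", "RO", "RE"]
counterclockwise = ["WE", "RE", "RO", "WO"]
unwinding = splice_cycles([clockwise, counterclockwise])
print(unwinding)
# ['WE', 'RE', 'RO', 'WO', 'WE', 'WO', 'RO', 'RE', 'WE']
```

State WE occurs three times in the walk, an instance of states meeting repeatedly in an unwinding.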
We derive the linearized specification, represented by the signal graph in Figure 8 , by unwinding the graph of the automaton standard realization. Using this specification and formal synthesis methods, we obtain the following Muller's circuit:
Considering these derivations as logic functions of AND-NOR gates, we can draw the logic circuit of the control automaton.
Modulo-2^n counter
Arranging counter stages in a pipeline guarantees that the response time is independent of the number of stages. Such a counter contains two interconnected asynchronous pipelines: counting signals propagate in one, while carry and overflow signals propagate in the other, in the opposite direction.
Let us construct such a counter using these principles and our automaton approach. In a modulo-2^n counter, n is the number of stages. Let the ith counter stage be the black box shown in the figure. Free inputs of the most significant stage meet the following boundary conditions: input P connects to output A, and constant zero feeds input Q. A finite automaton with the graph shown in Figure 10a represents a counter stage. The automaton has internal states S0 and S1, which correspond to keeping the information bit values 0 and 1. Signal A_i = 1, the analog of a clock signal for the automaton, initiates every transition; therefore, this signal does not appear in the transition graph. The automaton passes from state S0 to state S1 on any condition; the corresponding transition arc is marked with condition 1, and the output signals are Q = 1 and A = 1. The automaton allows the transition from S1 to S0 only when it receives input signal …
The inputs and outputs of the automaton are already coded. We can easily check that, for all transitions coming from any internal state, the codes of the inputs and outputs are self-timed and two-phase. Thus, we have constructed a standard automaton realization.
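Functionally, n modulo-2 stages chained through their carries count modulo 2^n. The Python sketch below checks only that arithmetic: each hypothetical stage toggles on a counting pulse and emits a carry on a 1-to-0 change, and the top stage's carry is the overflow. It deliberately ignores the handshake signals of Figure 10a and the two opposing pipelines, so it says nothing about the constant acknowledgment delay; here carries simply ripple within one call.

```python
from typing import List, Tuple


def stage_step(bit: int, pulse: int) -> Tuple[int, int]:
    """One modulo-2 stage: toggle on a pulse; emit a carry on a 1 -> 0 change."""
    if not pulse:
        return bit, 0
    return 1 - bit, bit  # the carry equals the old bit value


def counter_step(bits: List[int]) -> Tuple[List[int], int]:
    """Send one counting pulse into stage 0 and pass carries upward.
    Returns the new stage bits (least significant first) and the overflow."""
    carry = 1
    new_bits = []
    for bit in bits:
        bit, carry = stage_step(bit, carry)
        new_bits.append(bit)
    return new_bits, carry


def value(bits: List[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


def count(n: int, pulses: int) -> List[Tuple[int, int]]:
    """Apply `pulses` counting pulses to an n-stage counter starting at 0;
    record (counter value, overflow) after each pulse."""
    bits = [0] * n
    history = []
    for _ in range(pulses):
        bits, overflow = counter_step(bits)
        history.append((value(bits), overflow))
    return history


print(count(3, 8))
```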
Let us unwind the automaton graph to a simple cycle (Figure 10b). Using this unwinding and the accepted standard automaton realization, we derive the signal graph specification shown in Figure 11.
From this specification, formal methods produce the following Muller's circuit:
Note that in this circuit the functions of signals Q, P, and A are self-dependent; that is, they must be implemented as flip-flops made from a gate and an inverter. Signals Q and E in function A come from the gate outputs of the corresponding flip-flops.
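As an illustration of a self-dependent function (our example, not one of the counter's functions), consider the Muller C-element, whose next output c depends on its own current value: c' = ab + c(a + b). The Python sketch below evaluates it the way the text describes a flip-flop built from a gate and an inverter: an AND-NOR gate computes the complement of ab + ca + cb, and an inverter feeds the result back into the gate. The gate model and function names are assumptions.

```python
from itertools import product
from typing import Sequence, Tuple


def and_nor(terms: Sequence[Tuple[int, ...]]) -> int:
    """AND-NOR gate: 0 if any AND term is all ones, else 1."""
    return 0 if any(all(t) for t in terms) else 1


def c_element(a: int, b: int, c: int) -> int:
    """Settle the loop AND-NOR gate -> inverter -> gate for inputs a, b,
    starting from the current output c; returns the settled output.
    For this function the loop settles within two passes."""
    while True:
        g = and_nor([(a, b), (c, a), (c, b)])  # g = NOT(ab + ca + cb)
        new_c = 1 - g                          # the inverter closes the loop
        if new_c == c:
            return c
        c = new_c


for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, "->", c_element(a, b, c))
```

The output follows the inputs when they agree and holds its previous value when they differ, which is exactly why the function cannot be realized by a single feedback-free gate.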
We can improve this solution by inserting changes into the signal graph specification. For example, the modified specification derived by removing two arrows, shown as dashed lines in …
Transitioning the specification within a basic structure to a change diagram is a formal step. It allows us to obtain a self-timed implementation using well-known formal methods. The facilities used for formal synthesis determine solution quality.
It is important to introduce the behavior of the basic-structure gates into the change diagram. Starting our design procedure at the gate level increases the quality of the solution. This article's limited length does not allow us to describe the details of this process.
