Abstract-Q-modules are internally-clocked modules that can be used to satisfy delay-insensitive specifications. The allowed changes of inputs to, and outputs from, a delay-insensitive module are specified by partial orderings of these signals in such a way that the set of possible behaviors remains unchanged with arbitrary values of delay inserted in series with each input and output path. A two-phase single-wire clock and a single-wire clock acknowledge are used for sequence control and accommodate any value of flip-flop hold time. The clock distribution within a Q-module is also delay insensitive and the modules will operate correctly with any value of delay inserted in series with the clock distribution. Metastable flip-flop operation due to input signal changes will not cause failures but will only extend the clock cycle time. Correct sequence operation is ensured in exchange for an occasional clock cycle extension. The only delay constraint that must be satisfied in assembling a Q-module from its predesigned components is a one-sided requirement that a particular clock phase be longer than the longest delay through the combinational logic of the module. Prototypes of components to implement Q-modules have been designed, and a design aid program, QSYN, to place instances of these components, personalize a PLA, and generate a MAGIC or CIF file for a CMOS realization, including the delay circuitry, is being developed. Testability is one of the advantages of Q-modules over clock-free delay-insensitive modules; circuitry is included in the cells for testing the logic and interconnections.
Index Terms-Asynchronous sequential circuit, delay insensitive, logic design, metastable.
I. INTRODUCTION
DELAY-INSENSITIVE circuits [16] or modules simplify design at the system level because correct timing and control of operations can be achieved without explicit consideration by the system designer of circuit and interconnection delays, or of clock generation and distribution between modules. Only the sequences of operations to be performed need be considered, not the times for performing each operation; operations are initiated after their predecessors are completed, not after some fixed and independently determined time. Macromodules [4] demonstrated a methodology and set of building blocks or modules that were used for design of delay-insensitive systems without the need to consider propagation delays or introduce clocking disciplines. The operations were specified by sequence or order, not by time, making the substitution of new modules with faster operation simple. The concerns for performance and correctness could be separated and dealt with individually. This form of specification facilitated exploitation of concurrency since the initiation of operations was dependent only on the completion of preceding operations. The simple design description sufficient for system construction from macromodules made experimenting with architectures and algorithms easy. The complete specification of the CHASM system for analyzing neuron models using Markov processes, in terms of macromodules [4], required only specification of macromodule location in a cellular frame, data paths, and a control flow chart. Ease of modification of macromodular systems was illustrated by MMS-1 through MMS-4, a series of systems for displaying models of molecules that were designed and constructed in four configurations to experiment with different architectures and implementations for the same task. MMS-X was a following system in which collections of macromodules were used for experiments in display vector generation, and later replaced with a single macromodule designed specifically for that task, with no need to modify other parts of the system to accommodate the increased speed of operation available.
Manuscript received July 19, 1986; revised December 18, 1986. This work was supported by the Biotechnology Resources Program of the National Institutes of Health under Grant l-R24-RR01379.
The authors are with the Institute for Biomedical Computing, Washington University, St. Louis, MO 63110.
IEEE Log Number 8718425.
Specification and synthesis methods that suppress details of clocking and timing also provide significant advantages for VLSI circuit design. Dealing with module connection and interaction in the sequence domain simplifies the designer's task over dealing explicitly with time and clocks. Individual operations can proceed when their required input values are available. The necessary condition is that the inputs be available, not that a given time has elapsed. I-nets [10] and trace structures [12] have been used for specifying delay-insensitive operation. These essentially equivalent models specify the allowed sequences of signals at the interface between a module and its environment, specifying what the module may do in response to any sequence of module input signals allowed by the specification. Both the module and its environment are specified together. It is impossible to specify a module by itself since its operation depends on the allowed input sequences and because there must be restrictions on the allowed inputs. For example, an input that is supposed to produce a corresponding output from a module cannot be allowed to make two changes without an intervening output from the module. If there is no intervening module output, it is impossible, in a delay-insensitive system, to ensure that a module will recognize two consecutive input signals.
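The restriction on consecutive input changes can be illustrated with a small trace check. The following is only a sketch under stated assumptions: the function name, the representation of a trace as a list of signal names, and the `req`/`ack` signal names are hypothetical and are not part of the I-net or trace-structure formalisms.

```python
def first_violation(trace, acknowledged_by):
    """Return the first input that changes twice with no intervening output.

    trace: signal names in the order their transitions occur (hypothetical format).
    acknowledged_by: maps each input name to the output that acknowledges it.
    Returns None when the trace respects the restriction.
    """
    pending = set()  # inputs that have changed and still await their output
    for signal in trace:
        if signal in acknowledged_by:          # an input transition
            if signal in pending:
                return signal                  # second change before the output
            pending.add(signal)
        else:                                  # an output transition
            for inp, out in acknowledged_by.items():
                if out == signal:
                    pending.discard(inp)
    return None


# Hypothetical signal names, used only for illustration.
ok_trace = ["req", "ack", "req", "ack"]
bad_trace = ["req", "req", "ack"]
print(first_violation(ok_trace, {"req": "ack"}))
print(first_violation(bad_trace, {"req": "ack"}))
```

A trace that passes this check is one in which the environment waits for the module before changing the same input again, which is the property the paragraph above requires of any allowed input sequence.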
II. Q-MODULES
Q-modules are internally-clocked modules that can be used to satisfy delay-insensitive specifications. The allowed changes of inputs to, and outputs from, a module are specified by partial orderings of these signals in such a way that the set of possible behaviors remains unchanged with arbitrary values of delay inserted in series with each input and output path. A two-phase single-wire clock and clock acknowledge are used for sequence control and accommodate any value of flip-flop hold time. The clock distribution within the module is also delay-insensitive and the modules will operate correctly with any value of delay inserted in series with the clock distribution. Metastable operation [1], [2] of flip-flops will not cause errors in Q-modules, only occasional increases in individual clock periods. Q-modules allow design of delay-insensitive modules as part of a VLSI circuit with a small design effort by use of specific implementation methods and predefined cells and interconnections.
The Q-module (Fig. 1) differs from a conventional implementation of a finite-state machine as a clocked sequential circuit in several important ways: 1) correct sequence operation is guaranteed regardless of when the inputs change with respect to the clock, 2) the module will operate correctly even if the input Q-flops exhibit metastable behavior, 3) the clock generation and Q-flop circuitry adapt to the delay in clock distribution by using an acknowledge signal from each flip-flop to indicate when it is ready for the next clock event, 4) there is only a single delay constraint that must be satisfied in the clock generation and distribution, in contrast to at least four delay constraints required for conventional sequential circuits implemented with two-wire two-phase clocking [8, ch. 7], 5) the module operation is not specified in terms of states but in terms of allowed sequences of interface signals, and 6) the clock period is not fixed; if extra time is needed for a Q-flop to resolve to one of its two stable states, the next clock event is delayed until the Q-flop has resolved.
A. Finite-State Machine Specification
A finite-state machine (FSM) can be specified by the quintuple (I, O, S, δ, λ), where I, O, and S are finite, nonempty sets of inputs, outputs, and states, respectively, δ is the state transition function mapping the Cartesian product I × S into S, and λ is the output function mapping I × S into O. On each clock cycle, a new state is calculated by δ. This mathematical specification of an FSM does not account for some of the restrictions imposed by a physical implementation, such as the nonzero time required for a signal to change from one state to another, and the propagation time for signals through logic and wires, which make it impossible to ensure simultaneous signal changes at multiple locations. In addition, delay-insensitive requirements for communication add a requirement that is not necessary for all FSM's: if a module output changes state, it must make a single monotonic transition from the old to the new value, and if it does not change state, it must remain stable without any transients. The function of the sequencing or timing in a physical implementation is to control the storage elements to accomplish the FSM operation described above under the constraints of nonzero signal transition times and propagation delays. To accomplish this, we place the following sufficient, but more restrictive than necessary, requirements on the operation of Q-modules.
1) Initialization: The Q-module is placed in a known state (S0) that is independent of the present state; for example, all 0's are stored in the state and input registers and the combinational logic calculates the next state (S1) and applies it to the state register inputs. The initialization includes sufficient delay to ensure that the outputs from the Q-flops propagate through the combinational logic and produce stable values at the Q-flop inputs as prescribed by the next state function δ.
2) Clock cycle: The sequence of steps described in a)-c) is executed repeatedly, once for each cycle of internal clock operation. a) Store state: All input values to the Q-flops are stored in the Q-flops while the Q-flop output values remain unchanged. b) Update outputs: The Q-flop outputs are set to the values stored in step a), and remain at those values until the subsequent clock cycle. c) Logic delay: A delay elapses that is sufficient to ensure that the next state, as specified by δ and calculated by the combinational logic, propagates to the Q-flop data inputs.
Discussion: Conditions a), b), and c) ensure that the state calculated during each clock cycle and stored in the state register on the succeeding clock cycle is the specified function of the present state and the module inputs. If module input i_j changes at about the same time that the inputs are stored in the Q-flops, the stored value of i_j is not predictable, but will correspond to either the new input value or the old one. If several module inputs change at about the same time as the data are stored, the stored values will correspond to some combination of the old and new input values. This stored value of the inputs may not correspond to any set of values that was actually present at any instant at the module inputs, since variations on the internal paths of the module may reorder the signal changes within the module. If several module inputs can change without an intervening output from the module, then any combination of values of those concurrently changing levels must be acceptable. If succeeding states depend on the order in which module inputs change, the delay-insensitive specification requires that there be a module output acknowledging the earlier input signal(s), before the later input signal(s) can be changed by the module environment. Thus, the module need not strictly preserve the input sequence for all inputs, only those which are separated by module output signals.
3) Transient-free outputs: The delay-insensitive transmission of signals between modules requires that the module outputs be free of transients due to races and hazards. It might be impossible to distinguish between a transient at a module output and two successive level changes that are part of the specified behavior. Each module output must make a single monotonic transition, if specified, or remain stable.
4) Testability:
The fabrication correctness of Q-modules, and systems constructed from them, should be determinable to the desired degree of confidence in a practical way by direct testing or by statistical calculations based on observable properties of the fabrication process. Test procedures that require excessive testing time do not meet this criterion.
The above requirements are sufficient, but not necessary.
We have adopted them as defining the requirements for Q-modules, but for other implementations of sequential circuits, they might be relaxed. For example, condition 2a) can be weakened; it is possible to allow the outputs of some Q-flops to change before all of the values are stored, if the delay from the Q-flop outputs back to their inputs is long enough to ensure that storage will take place before the inputs change. Ensuring that this delay will be sufficient is difficult since it depends on the performance of several parts of the module and on minimum delay values as well as maximums. The need to control or even know the minimum value of delays is avoided by the conditions prescribed above.
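Requirements 1) and 2a)-2c) can be sketched as a purely behavioral model. This is an illustration under stated assumptions, not the circuit: the class name, its attributes, and the example next-state function (a modulo-4 counter) are hypothetical, and the logic delay is represented simply by evaluating the next-state function after the outputs have been updated.

```python
class QModuleModel:
    """Behavioral sketch of the initialization and clock-cycle requirements."""

    def __init__(self, delta, initial_state, initial_inputs):
        self.delta = delta                    # next-state function (inputs, state) -> state
        # 1) Initialization: known state and inputs, next state already settled.
        self.state_out = initial_state        # outputs of the state Q-flops
        self.input_out = initial_inputs       # outputs of the input Q-flops
        self.next_state = delta(initial_inputs, initial_state)

    def cycle(self, module_inputs):
        # 2a) Store state: capture the Q-flop inputs; outputs are not touched yet.
        stored_state = self.next_state
        stored_inputs = module_inputs
        # 2b) Update outputs: outputs take the stored values and then hold.
        self.state_out = stored_state
        self.input_out = stored_inputs
        # 2c) Logic delay: the next state settles from the now-stable outputs.
        self.next_state = self.delta(self.input_out, self.state_out)
        return self.state_out


# Hypothetical example: count modulo 4 whenever the sampled input is 1.
model = QModuleModel(lambda i, s: (s + i) % 4, initial_state=0, initial_inputs=0)
states = [model.cycle(x) for x in [1, 1, 0, 0]]
print(states)  # [0, 1, 2, 2]
```

Because the outputs are updated only after every value has been stored, the next state in this sketch is always computed from one consistent pair of stored inputs and present state, which is the point of separating steps a) and b).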
B. Internal Sequencing and Timing
The method used to satisfy the requirements outlined above is shown schematically in Fig. 2. Sequencing is controlled by a two-phase single-wire clock signal C that is distributed to all Q-flops, and an acknowledge signal A, the rendezvous of the individual acknowledge signals Ak from each storage element. The requirements described above are implemented as follows.
1) Initialization (Requirement 1):
Initialization can be implemented by setting the states of the Q-flops directly or controlling the combinational logic outputs and the module inputs during one clock cycle. A long time must be allowed for the initialization signal to be distributed to the modules since their operation will not be delay insensitive with respect to this signal. Since initialization should be required only after power is applied or as part of testing or other nonperformance critical operations, long delay times may be substituted for delay-insensitive operation. If initialization is required for error recovery or some other function, and the time required for this initialization is critical to performance, then more sophisticated initialization methods may be required.
2) Store State (Requirement 2a):
The store-state operation is controlled by the clock-store event, shown as a clock transition labeled CS (clock store) in Fig. 2, which causes all Q-flops to store the value present at their input when the CS event occurs. When the kth Q-flop has completed its task of storing the input value, and its input may be changed without affecting the stored value, an acknowledge transition from the high to low state, the ASk event, is generated. When all of the ASk events have been generated as signaled by the AS event from the rendezvous element, the next step (2b) can proceed.
3) Update Outputs (Requirement 2b): The update outputs operation is controlled by a clock output event labeled CO (clock output), which causes all Q-flops to update their outputs to equal the value stored in response to the preceding CS event. When the kth storage element has updated its output, an acknowledge transition from the low to high state, the AOk event, is generated to signal that its output is updated. When all of the AOk events have been generated as signaled by the AO event, all of the Q-flops have updated their outputs and the next step (2c) can proceed.
In a delay-insensitive system, any input that the specification allows to change may change at any time, including the time at which it is being stored in the input register. Thus, a Q-flop may enter a metastable state [1], [2] and remain there an arbitrarily long time before resolving to one of the two stable states. If the metastable state is reached, the Q-flop must either delay its AS event until resolution, or delay its AO event until a valid output is present. If the latter, the AS signal might be generated before the stored value was resolved, but either stored value is acceptable in this case and the value will not be used until after the AO signal, by which time the value would be guaranteed to be resolved. In the following discussion and implementation, we have chosen to delay the AS event until after resolution has occurred since this is simpler and the average loss in performance is small.
4) Logic Delay (Requirement 2c):
The logic delay is used to allow the updated Q-flop outputs to propagate through the combinational logic that calculates the next state values, and for these new values to propagate to the Q-flop inputs. This delay elapses between the completion of the Q-flop updating signaled by the AO event (rendezvous of all of the AOk events) and the generation of the CS event on that next cycle, and must be larger than the longest delay from any Q-flop output to any Q-flop input. The module inputs are stored in Q-flops controlled by the same clocking protocol as the state register. These Q-flops guarantee that the I inputs to the combinational logic do not change while the next state is being calculated so that the next state is specified by a valid combination of the inputs and present state. Thus, races and hazards in the combinational logic cannot produce an invalid next state due to changes in the I inputs while that next state is being determined. These Q-flops have the effect of placing a delay of as much as one clock cycle in series with the inputs, but since they are delay-insensitive inputs, it does not change the possible behavior of the module.
5) Transient-Free Output (Requirement 3):
The Q-module design must ensure that the module outputs change only in response to input changes, and that only a single monotonic transition or no change is generated. The effects of races and hazards cannot be allowed to cause changes in the module outputs that are not part of the module-environment specification, even for very short intervals. This is accomplished by using Q-flop outputs directly as module outputs to eliminate the effects of races and hazards in the combinational logic, and by careful electrical design of the Q-flops. Making λ the identity function mapping I × S to O and using the stored valSection V-C.
6) Testability (Requirement 4):
Several means exist to enhance the testability of Q-modules including the incorporation of circuitry specifically for testing, such as level-sensitive scan-design (LSSD) registers [18] that enhance the observability and controllability of nodes within the modules. Additional test facilities that are specific to the metastable behavior of Q-flops are discussed in later sections.
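The ordering of the CS, AS, CO, and AO events described in items 2)-4) above can be sketched as a simple timing calculation for one clock cycle. This is an illustration only: the function name, the use of per-Q-flop time values, and the numbers in the example are assumptions, and wire delays are folded into the per-Q-flop times.

```python
def run_cycle(resolve_times, update_times, logic_delay):
    """Return (event, time) pairs for one clock cycle starting with CS at t = 0.

    resolve_times[k]: time after CS until Q-flop k issues ASk; this includes
                      any time spent resolving a metastable state.
    update_times[k]:  time after CO until Q-flop k issues AOk.
    logic_delay:      delay inserted between AO and the next CS; it must be at
                      least the longest Q-flop-output to Q-flop-input path.
    """
    t_cs = 0.0
    t_as = t_cs + max(resolve_times)   # AS is the rendezvous of all ASk events
    t_co = t_as                        # CO may follow AS immediately
    t_ao = t_co + max(update_times)    # AO is the rendezvous of all AOk events
    t_next_cs = t_ao + logic_delay     # the single one-sided delay constraint
    return [("CS", t_cs), ("AS", t_as), ("CO", t_co), ("AO", t_ao), ("CS", t_next_cs)]


# Hypothetical times in arbitrary units; the second Q-flop resolves slowly.
normal = run_cycle([1.0, 1.0], [1.0, 1.0], logic_delay=10.0)
slow = run_cycle([1.0, 5.0], [1.0, 1.0], logic_delay=10.0)
print(normal[-1], slow[-1])
```

In this sketch a slowly resolving Q-flop changes only the time of the next CS event, never the order of events, which mirrors the claim that metastability lengthens a clock cycle without causing a sequence error.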
C. Previous Work
Conventional clocked circuits must have a finite probability of synchronization failure [1] when independently clocked systems or subsystems communicate, although this probability of failure can be made as small as desired by introducing sufficient delay. Many asynchronous circuits, including Q-modules, avoid synchronization failure by waiting until storage elements have resolved before using the stored values. Thus, they introduce only as much additional delay as needed for storage elements to resolve, rather than a fixed time as in conventional clocked circuits.
Classical asynchronous circuit design as described by Unger [17] generally places restrictions on allowed input changes, such as the requirement for fundamental-mode operation that only one input can change at a time, and succeeding input changes must be delayed until the circuit is "stable." Synchronizer and arbiter circuits cannot, in general, satisfy these restrictions. Speed-independent circuits developed by Muller [9], [13], [17] provide output signals that indicate when succeeding inputs may change, but these circuits are difficult to design and no satisfactory procedures for synthesis of Muller circuits exist. Synthesis methods developed by Molnar [10] for clock-free asynchronous circuits utilize a unified specification of the circuit and its environment, and provide methods to identify and control races and hazards, even for circuits that allow concurrent input changes. These circuits are most suitable for simple functions with few inputs and outputs because circuit and design complexity typically increase dramatically as the number of inputs and outputs increases. Both fundamental-mode circuits and Molnar circuits can be difficult to test, particularly when redundant logic terms are required to cover static hazards.
Pausable clocks for synthesis of asynchronous circuits have been described in previous work [3], [7], [8], [11], [15]. They delay clock operation until resolution of flip-flop metastable operation, thus avoiding synchronization failure. No provision was made in these previous works to ensure correct sequence operation by use of storage flip-flops that acknowledge when they are ready to accept new inputs as well as acknowledging when the storage operation is complete and stable. Thus, they have more timing conditions on clock generation and distribution to ensure correct sequence operation than Q-modules require. The entire clock generation and distribution sequence in Q-modules utilizes a delay-insensitive protocol. Q-modules also provide a means to design testable delay-insensitive circuits that are free of synchronization failures and free of the need to analyze or control races and hazards. The increase in complexity over conventional synchronous circuits is restricted to the storage elements and clock generation and distribution circuitry.
III. Q-ELEMENTS
We have defined several basic types of elements or building blocks from which Q-modules can be assembled. A distinction should be made between the functional elements that are used to explain Q-module operation and the physical elements that are assembled to construct Q-modules. First we will describe a set of functional elements that could be used to construct Q-modules while realizing that the physical elements may have a different grouping of functions that is at least partially dictated by connections and physical proximity. Then we will describe one set of such physical elements and some of the implementation tradeoffs.
Four functional elements from which Q-modules are composed are: 1) combinational logic, conveniently implemented as a PLA, that determines the next state, and thus the output values, as a function of the inputs and present state, 2) a Q-clock that generates the clock events, CO and CS, signaling that the Q-flops are to perform their operations, that receives acknowledge events, AO and AS, from the rendezvous indicating that the preceding Q-flop operation is complete, and that provides the delay necessary to account for the propagation time of the combinational logic, 3) a rendezvous used to produce the clock input AO and AS signals after all of the Q-flops have produced their respective AOk and ASk signals, and 4) Q-flops which capture and store input values and state information and deliver them as outputs in response to distinct clock events, and that indicate when they have completed these operations by means of acknowledge events, AOk and ASk.
Fig. 1 shows a simple Q-module containing each of the functional elements described above, and their interconnection. The following sections describe the operation of each of the four elements in more detail.
A. Q-Combinational Logic
The combinational logic can be implemented as a PLA for simplicity of automatic generation, or by other techniques. There are no restrictions on hazards or races since the module outputs are taken from Q-flops that are clocked only after the combinational logic outputs have settled to their steady state. The only combinational logic requirement, other than generating the correct output values, is that there must be a known maximum delay following input changes until the combinational logic outputs are stable at their new values.
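As a rough illustration of a PLA used as the next-state logic, the sketch below evaluates an AND plane followed by an OR plane. The data layout (dictionaries for product terms, lists of product indices for outputs) and the exclusive-OR example are assumptions chosen for brevity; they do not describe the personality format produced by QSYN.

```python
def pla_eval(product_terms, output_terms, inputs):
    """Evaluate a two-level AND/OR array on a list of input bits.

    product_terms: list of dicts {input_index: required_bit} (the AND plane).
    output_terms:  list of lists of product-term indices (the OR plane).
    """
    # AND plane: a product term is true when every listed input matches.
    products = [
        all(inputs[i] == bit for i, bit in term.items())
        for term in product_terms
    ]
    # OR plane: an output is 1 when any of its product terms is true.
    return [int(any(products[p] for p in terms)) for terms in output_terms]


# Hypothetical personality: one output equal to the exclusive-OR of two inputs.
and_plane = [{0: 1, 1: 0}, {0: 0, 1: 1}]
or_plane = [[0, 1]]
print(pla_eval(and_plane, or_plane, [1, 0]))  # [1]
print(pla_eval(and_plane, or_plane, [1, 1]))  # [0]
```

The sketch models only steady-state values. Transient behavior while the planes settle is deliberately ignored, since the Q-flops sample the outputs only after the known maximum delay has elapsed.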
B. Q-Clock
The Q-clock generates the initial CS event, and succeeding CO and CS events in response to the AS and AO events, respectively. CO can be generated immediately after receiving the AS event from the rendezvous since all of the Q-flops have captured their input value by that time and the Q-flop outputs can safely be changed without affecting the stored values. A delay is necessary between reception of the AO event and generation of the CS event that is at least as long as the longest delay from any Q-flop output, through the combinational logic, and back to any Q-flop input. This delay assures that the data stored in response to the CS signal are those calculated for the next state of the module. This is the only delay condition that is required to ensure correct sequence operation of the module. It has a one-sided bound so any margin desired can be used in selecting this delay, with a consequent performance penalty. The clocking scheme ensures that all delays associated with CS, CO, AS, and AO distribution do not affect correct operation of modules. A basic Q-clock is just an asymmetric delay with minimum delay time for the AS input to CO output and a delay from the AO input to CS output at least equal to the delay of the combinational logic and its interconnections. In addition to the requirements for a basic Q-clock, there are a number of additional capabilities desirable for testing of a Q-module. These are discussed in Section IV on testing.
Fig. 3. Q-flop.
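Because the delay bound is one-sided, selecting the AO-to-CS delay reduces to taking the longest logic path and adding whatever margin is desired. The helper below is a minimal sketch of that choice; the function name, the fractional margin parameter, and the path delays in the example are hypothetical.

```python
def cs_delay(path_delays, margin=0.25):
    """Choose the AO-to-CS delay from Q-flop-output to Q-flop-input path delays.

    path_delays: delays of every path through the combinational logic.
    margin:      fractional safety margin; any non-negative value is safe
                 because only a lower bound must be met.
    """
    if margin < 0:
        raise ValueError("margin must be non-negative: the bound is one-sided")
    return max(path_delays) * (1.0 + margin)


# Hypothetical path delays in nanoseconds.
delay = cs_delay([12.0, 20.0, 8.0], margin=0.25)
print(delay)  # 25.0
```

A larger margin costs only clock period in this model. No upper limit has to be checked, in contrast to schemes that also constrain minimum delays.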
C. Q-Rendezvous
The Q-rendezvous has the A outputs from each Q-flop as inputs, and generates the signal to the clock logic indicating when the ASk signal or the AOk signal has been generated by all Q-flops. The Q-rendezvous can be implemented by a Muller C-element [9]. The C-element cannot be delay-insensitive internally, but the delay-sensitive part of the C-element can be controlled using short and local connections which are carefully designed once, and replicated as required.
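The rendezvous behavior of a Muller C-element can be sketched behaviorally: the output changes only when all inputs agree and otherwise holds its previous value. The class and method names below are hypothetical, and the model ignores the internal delays that the text notes must be controlled in the physical cell.

```python
class CElement:
    """Behavioral Muller C-element with any number of inputs."""

    def __init__(self, initial=True):
        self.out = initial               # A is high at the start of a clock cycle

    def update(self, inputs):
        if all(inputs):
            self.out = True              # every Ak is high: AO event
        elif not any(inputs):
            self.out = False             # every Ak is low: AS event
        # Otherwise the inputs disagree and the previous output is held.
        return self.out


rendezvous = CElement(initial=True)
print(rendezvous.update([False, True]))   # True: one Q-flop has not yet stored
print(rendezvous.update([False, False]))  # False: all ASk seen, AS is signaled
print(rendezvous.update([True, False]))   # False: waiting for the last AOk
print(rendezvous.update([True, True]))    # True: all AOk seen, AO is signaled
```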
D. Q-Flops
Q-flops are used both to store the value of input signals to a Q-module and to store the state of the Q-module feedback signals from the combinational logic outputs. The requirements for these two applications are not identical although a single Q-flop can satisfy the requirements for both of them. The input Q-flops must be able to accept inputs that change at any time with respect to the clock and must therefore be able to operate correctly in the presence of metastability [1], [2], [8]. Both input and feedback Q-flops must have outputs that make clean monotonic transitions, since their outputs can be used as module outputs.
Some electrical design and analysis beyond that required for normal logic circuits is necessary in the design of input Q-flops to ensure reliable operation in the presence of metastable operation. In order to satisfy the requirement that the Q-flop must not generate the AOk signal until its output is stable at the new value, the Q-flop must have some means to determine when the possible metastable operation of the latch within the Q-flop has resolved. The Q-flop shown in Fig. 3 operates according to the requirements developed in Section II-A. It has two major parts, a Q-flop resolver that accepts the input signal and deals gracefully with data input signals that change at any time with respect to the clock, and a Q-flop output that holds the stored value while the Q-flop resolver is preparing to sample a new value, and which provides the Q-flop output.
2) Q-Flop Resolver: The Q-flop resolver is shown in Fig. 4. Conventional latches that are in one stable state or the other, except for a time about the clock or data transitions, require a delay after the clock event, and before using the RH-L and RL-L output(s), to allow those outputs to reach the high state if the latch enters the metastable region of operation. A novel characteristic of this Q-flop resolver circuit is that it requires no delay to allow RH-L and RL-L to reach the high state after the CS clock event (C goes from H to L) that causes the input data value to be stored. The latch is held in the unresolved state with RH-L and RL-L high, prior to the CS clock event, so no time is required for them to reach the high state to indicate the latch is unresolved; they are already high. When the C input goes low, MOSFET QC turns off. The data input D provides a bias to the latch to control which state it resolves to. This bias must be strong enough to direct the latch to the state specified by a stable D input when C goes low, but weak enough that it cannot change the state of the latch once RH-L or RL-L has gone low. When the data value is resolved, either RH-L or RL-L goes low, indicating which value (H or L) is stored. This places a two-sided bound on the strength of the bias applied by the data input that must be satisfied over the range of manufacturing tolerances and operating conditions encountered.
The MOSFET sizes must satisfy several criteria. The ratio of Q3 to Q2 must make the metastable voltage for the Q-flop resolver (the voltage at which the unloaded inverter output voltage is equal to the input voltage) about 2 V (assuming V_DD = 5 V) so that QC can be turned on with V_DD applied to its gate. Because of body effect (which is accentuated in a p-well process) the effective threshold voltage on QC is increased over that of an N-channel FET with a grounded source. The Appendix contains a detailed analysis of the Q-flop resolver electrical operation.
3) Q-Flop Output: The Q-flop output can be implemented with logic gates as shown in Fig. 5. It stores the data value while the Q-flop resolver is preparing to sample the next input value (the interval while C is high). Q-module design ensures that the inputs to the Q-flop output cannot cause metastable behavior. The Q-flop output design follows the Washington University clock-free delay-insensitive circuit synthesis methodology [10]. A complete clock cycle starts with C and A both high as shown in the waveforms of Fig. 2. The clock cycle starts with the CS event, a high to low transition of the C signal, which causes the P output of the Q-flop output, and thus the clock input to the Q-flop resolver, to go low. MOSFET QC is now off and the latch within the Q-flop resolver then settles to one of its two stable states, the particular state determined by the bias provided by the D input. Either the RH-L or RL-L output then goes low, indicating to which state the latch has settled. The low RH-L or RL-L output produces an ASk event through AND gate P1. During the operations described above, the present state is held undisturbed in the Q-flop output as required by the specification.
Some time after the AS signal from each Q-flop has been generated, the C signal will make a transition to the high state, signaling the CO event. This causes the Q output to change to agree with the stored value of the Q-flop resolver, if it was different, and causes the P output to go high when the Q output is equal to the Q-flop resolver value. This turns on QC and causes RH-L and RL-L to both be high, and this causes the A output to go high, producing an AOk event. The AOk event cannot occur until after the Q output is set equal to the Q-flop resolver value so it correctly follows the updating of the Q-flop output.
This circuitry satisfies the constraints put on the Q-flops in Section II-B: 1) the ASk signal is not generated until the input value is sampled and the latch is resolved to one of its stable states, 2) the AOk signal is not generated until the output value has been updated and is present at the Q-flop output, and 3) the Q-flop is not sensitive to changes in its input between the ASk event and the following AOk event.
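The event ordering described above can be illustrated with a small behavioral sketch. This is not the circuit of Figs. 4 and 5; it is a hypothetical Python model (the class name `QFlop` and its attribute names are assumptions made for illustration) that only captures the order of the CS, ASk, CO, and AOk events. Resolution time is not modeled, and an input caught in transition is represented by `None`.

```python
import random


class QFlop:
    """Behavioral sketch of one Q-flop (resolver plus output stage)."""

    def __init__(self, q=False):
        self.q = q            # value held at the Q-flop output
        self.stored = None    # value latched in the resolver; None = unresolved
        self.as_k = False     # True after the ASk event of the current cycle
        self.ao_k = True      # True after the AOk event (a cycle starts with A high)

    def cs(self, d, rng=random):
        """CS event: sample input d. d=None models an input caught mid-transition."""
        assert self.ao_k, "CS must follow the previous AOk"
        self.ao_k = False
        # A changing input may resolve either way; the choice is arbitrary,
        # but it is always one definite value before ASk is raised.
        self.stored = rng.choice([False, True]) if d is None else bool(d)
        self.as_k = True      # constraint 1: ASk only after resolution

    def co(self):
        """CO event: copy the resolved value to the output, then acknowledge."""
        assert self.as_k, "CO must follow ASk"
        self.q = self.stored  # constraint 2: output updated before AOk
        self.stored = None    # resolver returns to the unresolved state
        self.as_k = False
        self.ao_k = True
```

Because the input is read only inside `cs`, later changes of the input cannot affect the stored value, which mirrors constraint 3 in this simplified setting.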
IV. TESTING
Two main objectives of testing are to verify design correctness and to verify manufacturing correctness. The former need be done only once and may even be avoided by adequate design methodology, but the latter is required for every copy of the circuit that is fabricated. Verification of manufacturing correctness can be divided into two parts: the first verifies the correct static operation of the circuit, including the presence and interconnection of the components, and the second verifies that propagation delay values satisfy the constraints for correct operation of the circuit. Since Q-modules are to be used for a wide variety of applications, a general testing method that is independent of the particular function of a module is desirable.
A tradeoff exists between the thoroughness of testing, test time, and the area and complexity of on-chip circuitry that is used to facilitate testing. Our approach is to provide all of the testability that is feasible in initial designs. We expect to reduce some of the testability provisions in later circuits as experience identifies the most practical compromises.
A. Test Requirements
A proposed procedure for testing Q-modules for manufacturing defects or the effects of extreme process variations follows.
1) Verify that the combinational logic outputs are the specified function of the inputs.
2) Verify the connections between Q-elements.
3) Verify that the delay in the clock circuit is longer than the longest delay through the combinational logic.
4) Verify that the rendezvous circuit requires the ASk and AOk signals from each Q-flop before providing the AS and AO signals, respectively, to the clock circuit.
5) Verify that the Q-flop resolver RH and RL outputs correctly indicate the unresolved and resolved conditions.
6) Verify that each Q-flop can be set to both resolved states.
A safety margin is desirable for some of the above tests, such as the clock delay and data bias, to ensure that the operation of the circuit will be correct in the presence of minor variations in component values and parameters due to temperature, power supply tolerance, repetition rate, and aging.
1) Combinational Logic and Interconnections:
Level-sensitive scan design (LSSD) techniques [18] provide a way to control and observe internal points in an integrated circuit with a small number of connections. These techniques can greatly simplify testing by partitioning a circuit into several small portions that can be tested independently. By incorporating each Q-flop used for input or state storage in the Q-module into a scan-path register, the operation of the Q-flops (excepting the Q-resolver), the combinational logic, and the connections between the combinational logic and the Q-flops can be tested directly. Connections between Q-modules can also be tested, since the state and input flip-flops can be controlled to set any possible output pattern from each module, and the correct reception of those patterns can be verified at the input Q-flops of other modules receiving the signals.
2) Propagation Delay: Propagation delay testing verifies that the delay from the AO event to the CS event is greater than the longest propagation delay from any Q-flop output, through the combinational logic, to any Q-flop input. Since it is the relative value of these delays, not their absolute value, that is important, the values need not be measured. Instead, it is sufficient, and easier, to verify that the delays satisfy the required inequality. This can be done by using the scan-path register connection to update the Q-flop outputs, waiting for a time equal to the clock delay, storing the Q-flop inputs, and comparing these stored values to the expected values. The simplest and most meaningful execution of these steps will use as much of the normal functioning of the Q-module as possible so that the parts are tested in the way they are used in normal operation. The scan-path registers are used to introduce the test values, but the Q-flops operate with their normal CS and CO clock signals.
If this test is passed successfully, there is still no assurance that the module would not fail because of small changes in delay values due to temperature, power supply variation, or aging. To protect against these effects, we can include circuitry to reduce the delay slightly during testing. This ensures a safety margin in the functioning circuit. This safety margin will also protect against slight changes in propagation delay that depend on particular data values, in case it is impractical to test propagation delays for all combinations of data values. The reduction of clock delay during testing can be verified by observing the clock period.
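The one-sided inequality and the effect of shortening the clock delay in test mode can be written out as a short check. The sketch below is illustrative only: the function name, the time units, and the idea of expressing the test-mode reduction as a fraction are assumptions, not details taken from the Q-module design.

```python
def delay_test_passes(clock_delay, path_delays, test_reduction=0.0):
    """Return True if the (possibly shortened) clock delay exceeds every logic path delay.

    clock_delay    -- delay from the AO event to the CS event (any time unit)
    path_delays    -- delays from Q-flop outputs, through the logic, to Q-flop inputs
    test_reduction -- hypothetical fraction by which the clock delay is shortened in test mode
    """
    if not 0.0 <= test_reduction < 1.0:
        raise ValueError("test_reduction must be in [0, 1)")
    effective = clock_delay * (1.0 - test_reduction)
    # Only the inequality matters; no absolute delay value is needed.
    return effective > max(path_delays)
```

A module that passes with `test_reduction=0.2`, for example, retains roughly that much margin when it later runs with the full clock delay.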
3) Rendezvous Circuit: The rendezvous circuit is particularly troublesome since it has many inputs, which may all change at about the same time and in the same direction. Special measures are needed to detect defects such as shorts between inputs. The method we are planning is a test mode in which the resolution detector outputs of the Q-flops are controlled by data values shifted through the scan-path register connection of the Q-flops. This will allow the Ak Q-flop outputs to be controlled individually so that the correct operation of the rendezvous circuit can be verified.
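The value of controlling the acknowledge inputs individually can be seen in a small sketch. Here the rendezvous is modeled, as an assumption, as a multi-input element whose output rises only when all inputs are high and falls only when all are low; the class and function names are hypothetical. The check raises the inputs one at a time and confirms that the output waits for the last one, for every choice of last input.

```python
class Rendezvous:
    """Assumed model: output follows the inputs only when they all agree."""

    def __init__(self, n):
        self.inputs = [False] * n
        self.out = False

    def set_input(self, k, value):
        self.inputs[k] = bool(value)
        if all(self.inputs):
            self.out = True
        elif not any(self.inputs):
            self.out = False
        return self.out


def waits_for_every_input(make_rendezvous, n):
    """True if the output rises only after all n inputs rise, for each choice of last input."""
    for last in range(n):
        r = make_rendezvous(n)
        for k in range(n):
            if k != last and r.set_input(k, True):
                return False   # fired early: input `last` was not required
        if not r.set_input(last, True):
            return False       # never fired
    return True
```

A defect such as a short between two inputs would make the output rise before the held-back input is driven, so the check returns False.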
4) Q-Resolver:
The resolution detector is difficult to test. It must indicate unresolved when QC (Fig. 4) is on, have a single transition on RL or RH after QC turns off, and not be affected by changes in the data input after resolution has been indicated. An inverter whose high-output level can be controlled by an external supply connection (common to all such inverters), as shown in Fig. 6, provides a means to test these critical Q-flop characteristics for margins. The output of this inverter drives the clock input to the Q-flop shown in Fig. 4(b), and by making VH slightly less than VDD, MOSFET QC, the weak data bias MOSFETs (Q4A and Q4B), and the resolution detectors (Q1A, Q1B, Q5A, and Q5B) can be tested. A margin for setting and resetting the resolver latch by the weak data bias MOSFETs can be tested by operating with a reduced H level on the clock input. QC will be turned on less strongly in this case than with the inverter powered from VDD during normal operation and, if it is too weak, or if the data bias MOSFETs are too weak, will not allow the state of the latch to be changed. The resolution detectors can be tested by gating their outputs to part of the scan-path register for several values of the clock H level, and ensuring that the appropriate output changes from unresolved to resolved as the H level on the clock input is reduced, and back to unresolved as the clock H level is increased.
V. EXTENSIONS AND ALTERNATIVES
There are many alternative implementations and extensions of Q-modules that can provide advantages of increased functionality or performance. A few of them are briefly discussed in this section.
A. Unbounded Transition Times
If the inputs to a Q-module have unbounded transition times, the module might fail by recognizing an input that is slowly changing from L to H as H on one clock event and as L on the next clock event due to thermal noise in the circuit or due to capacitive, inductive, or resistive coupling between signals. The transition times would usually be short enough that this would not represent a problem. If the input transition times were very long, hysteresis could be added to the Q-flop resolver, controlled by the value stored on the previous cycle, so that once an input signal was recognized by a Q-flop resolver as having changed state, the maximum possible noise (with some probability for thermal noise) could not change the value to be stored by the Q-flop resolver.
B. Bundled Data
Data bundling is a technique for transmission of data values in a delay-insensitive system by data signals and an associated "data-announce signal" that indicates the presence of a valid data value [14]. Data bundling can save a significant number of components where relative propagation delays can be controlled. The requirements are 1) that the data-announce signal is generated after data are stable at the source, 2) that the data-announce signal propagates no faster than the data signals, and 3) that the destination will function correctly if conditions 1) and 2) are met. Each of the modules can be specified and designed independently, but there is an output sequence constraint on the data source, a relative delay constraint on the interconnections, and an input sequence constraint on the destination.
The output sequence constraint on the source can be satisfied by changing the data on one CO event and generating the data-announce signal on the succeeding CO event. The input sequence constraint at the destination can be satisfied by using the data values only on CO events that follow the CO event on which the data-announce signal is recognized. If the combinational logic outputs are independent of the data values until after the data-announce signal is received, a simpler Q-flop that does not require metastable-state detection may be used for the data inputs; the detection would be required only for the data-announce signal. Satisfaction of delay constraints on the interconnections depends on the routing and implementation employed. The relative delays between data and the data-announce signal can be influenced by deliberately increasing the capacitance (and thus the delay) of the data-announce signal by using a wider conductor than that used for the data values, assuming delay is dominated by interconnection capacitance and not resistance. A less conservative design would sample the data input and data-announce signal with the same CS signal and generate the source data output and data-announce signal on the same CO signal. This leaves no margin, however, and may fail if the delay for a data signal is only slightly longer than that for the data-announce signal.
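The difference in margin between the two source schemes can be made concrete with a small arrival-time comparison. The numbers and names below are hypothetical, and the sketch covers only the source and interconnection constraints, not the sequence constraint at the destination.

```python
def data_arrives_first(t_data_sent, t_announce_sent, d_data, d_announce):
    """True if data reach the destination no later than the data-announce signal."""
    return t_data_sent + d_data <= t_announce_sent + d_announce


period = 10.0                   # hypothetical time between successive CO events at the source
d_data, d_announce = 3.2, 3.0   # data wire assumed slightly slower than the announce wire

# Conservative scheme: data change on one CO event, announce on the succeeding one.
conservative = data_arrives_first(0.0, period, d_data, d_announce)

# Less conservative scheme: data and announce generated on the same CO event.
same_event = data_arrives_first(0.0, 0.0, d_data, d_announce)
```

With these assumed delays the conservative scheme still delivers the data first, while the same-event scheme does not, which is the failure described above for a data signal only slightly slower than the data-announce signal.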
C. Specialized Q-Flops
Some of the requirements on the Q-flops can be relaxed in particular cases. The Q-flops in the input register need not have transient-free outputs if their outputs are filtered by a later Q-flop. The internal state Q-flops need not deal with metastability. Thus, it is possible to use a more specialized and simpler design for each case.
D. Alternate Q-Flop Circuit Designs
An alternate circuit for the Q-flop resolver flip-flop is shown in Fig. 7. This uses a different method to force the flip-flop to the unresolved state, and has advantages and disadvantages with respect to the circuit in Fig. 4. Because the transistors used to force the unresolved state have their source grounded, problems with body effect and threshold voltage shift are avoided. A disadvantage is that the forced voltages are not as close to the inverter voltage as in the first design; thus a greater range of common mode voltage is encountered, requiring better matching of components.
Logic gates can also be used to force the unresolved condition without the use of a "weak bias" for the data input, as illustrated in Fig. 8. This circuit was designed using the Washington University synthesis technique [10]. This is a more difficult case to analyze for metastable operation since there are more gates in the feedback path than in the previous two cases. We have not completed the analysis required to ensure that there are no problems in reliably detecting the metastable condition, such as those caused by oscillatory behavior that might cause the detectors to indicate resolution before the circuit was actually resolved.
E. Relaxation of Output Source Restrictions
Some outputs may be taken from combinational logic instead of the Q-flop outputs if the combinational logic outputs are free from races and hazards. Examples are outputs for which there is no combinational logic and the Q-flop outputs are used directly, and other outputs where the particular combinational logic function has no races or hazards. This may give faster performance since outputs are provided as soon as they are generated by the combinational logic without waiting for the CO event.
F. Q-Register
The Q-module functional diagram of Fig. 1 shows a distinct rendezvous element that combines the acknowledge signals from all of the Q-flops. In an actual circuit, it may be more practical to distribute this function among the Q-flops to make the design more modular, reduce wiring, and eliminate the centralized rendezvous. Fig. 9 shows a possible implementation. The Q-flop geometric layout has been designed so that the rendezvous connections are automatically made when Q-flops are abutted. 
G. Weakly Phase-Locked Clock
Some applications might have a performance improvement if a particular phase relationship could be maintained between the clocks in connected Q-modules. A voltage-controlled delay might be used in the clock circuitry to allow some control over the relative clock phases in the Q-modules. Of course, a period of metastability that increased a clock period would disturb the phase relationship; it would eventually be reestablished if no further disturbances occurred. The effect of disturbing the phase relationship would be a temporary loss in performance, not incorrect operation.
The voltage variable delay in the clock circuitry would require a larger margin than a fixed delay if the delay could be reduced by the weak phase locking, so a tradeoff would result, longer expected clock period in exchange for a clock phase relationship that reduced the number of clock cycles required. The need for a larger margin might be obviated by allowing the coupled signal only to increase the period, not to decrease it.
VI. CONCLUSION
Q-modules provide a quick and simple means to implement testable modules with delay-insensitive terminal specifications. They avoid synchronization failures and the requirement to analyze or avoid races and hazards, while limiting the increase in hardware cost, compared to synchronous designs, to the stage registers. A single delay element is required, with a one-sided bound that its value be greater than the maximum delay of the combinational logic. A few predesigned cells capture the Q-element requirements and are well suited for composition by a design-aid program that is under development.
APPENDIX
A. Q-Flop-Resolver Analysis
Analyzing Q-flop-resolver operation with a very simple model that incorporates the essential characteristics provides an understanding of Q-flop-resolver operation and design tradeoffs that would be difficult to obtain with a more complete but more complicated model. Refinements can be incorporated later to verify the correct operation of a specific Q-flop resolver after the simplified model has been used to make design choices. The following section develops such a simple model for the circuit shown in Fig. 4.
1) Simplified Model for Q-Flop Resolver: The function of MOSFET QC in the Q-flop resolver can be viewed as controlling the number of states or solutions to the operating point equations as described by bifurcation theory [5]. ... (with VD ranging from +5 to -5 V) provide a weak bias of the operating point which is not strong enough to overcome the amplifier output resistance Ro when the amplifier is saturated. Although this is a greatly simplified model that does not match CMOS characteristics precisely, it has the same general characteristics and operation.
Fig. 12 shows the range of operating points as VC and VD are varied, and also shows lines representing the thresholds for resolution detector outputs RH and RL. Required relations between the latch operation and these outputs can be deduced from the plot. The steady-state design task becomes one of ensuring that the unshaded intervals marked r1, r2, r3, r4, r5, and r6 exist for the two steady-state values of VC. r1 and r2 ensure that the flip-flop indicates unresolved when VC is high, r3 and r4 ensure that the flip-flop indicates the resolved state when it is resolved, and r5 and r6 ensure that the resolved state is not indicated when the flip-flop is not resolved.
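The idea that the clock input controls the number of operating points can be illustrated with a toy calculation. The sketch below is not the model of Fig. 10; it assumes a normalized saturating amplifier with positive feedback, where `loop_gain` stands in for the effect of VC (low gain when QC is on, high gain when QC is off) and `bias` stands in for the weak influence of VD. All names and numbers are hypothetical.

```python
def stable_states(loop_gain, bias):
    """Stable solutions x of x = clip(loop_gain * x + bias, -1, 1) (normalized units)."""
    states = []
    if loop_gain + bias >= 1.0:        # saturated high is self-consistent
        states.append(1.0)
    if -loop_gain + bias <= -1.0:      # saturated low is self-consistent
        states.append(-1.0)
    if loop_gain < 1.0:
        x = bias / (1.0 - loop_gain)   # interior solution, stable only for gain < 1
        if -1.0 < x < 1.0:
            states.append(x)
    return states


unresolved = stable_states(loop_gain=0.5, bias=0.05)  # QC on: one state near zero
resolved = stable_states(loop_gain=2.0, bias=0.05)    # QC off: two saturated states
```

In this toy setting a small bias shifts the single low-gain operating point slightly to one side but cannot remove either saturated state once the gain is high, which is the qualitative behavior attributed to the weak data bias.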
The discussion above applies to the static operation of the Q-flop-resolver model for steady-state values of VC, VD, and VX, but not to its dynamic operation. We can, however, use the analysis of static operation given above to predict certain aspects of the dynamic behavior of the Q-flop resolver. For example, to demonstrate that the Q-flop-resolver model operates correctly with very slow transitions on its C and D inputs, we can use a quasi-static analysis. This quasi-static analysis can then be extended to predict limits on the dynamic behavior of the Q-flop resolver model.
Since the circuit operation is symmetric with respect to VD, we will consider only one polarity of VD for each of the following cases. The quasi-static operation is shown in Fig. 13 for the following conditions:
1) VD is stable at +5 V while VC changes from +5 to 0 V. As VC decreases, VX will increase monotonically following trajectory D-C1 [Fig. 13(a)].
2) VD is stable at +5 V, and VX is positive, while the VC ...
3) VC is stable while VD changes. VX will increase or decrease monotonically following trajectory C-... [Fig. 13(b)].
If VD and VC change at about the same time, then the following possibilities must be considered:
1) VD changes from -5 to +5 V with VX initially positive, while VC is changing from 0 to +5 V. This is not possible for a system with a delay-insensitive specification. VD must have previously changed from +5 to -5 V and is now changing back to +5 V, but the stored value is +5 V. Since the stored value is still +5 V, no module output could have changed in response to the last change of VD from +5 to -5 V because that value of VD has not been stored yet. This violates any delay-insensitive specification for the Q-module containing the Q-flop resolver.
2) VD changes from +5 to -5 V, with VX initially positive, while VC is changing from 0 to +5 V. In this case, the behavior of VX may not be monotonic; it may initially decrease and then increase as VD changes. However, it must be monotonically decreasing as long as VX is positive, and the threshold for changes in the RH output will be crossed exactly once. A set of possible trajectories of VX versus VC is shown in Fig. 13(c). If the intersection of the threshold for RL with the left edge of the output characteristic (point WW) is to the right of point XX, then the Q-flop must operate correctly, since VX must follow a trajectory similar to one of those shown. If WW is to the left of XX, then dynamic operation must be considered as discussed below.
3) VC changes from +5 to 0 V while VD changes from +5 to -5 V as shown in Fig. 13(d). As in case 2), the behavior of VX may not be monotonic, but in this case trajectories of the form YY are possible and the RH output may go from resolved to unresolved and back to resolved, an unacceptable behavior. This can be prevented by putting limits on the transition time of VC as described below.
2) Dynamic Operation: We can use a capacitor from node X to ground in Fig. 10 to incorporate a single-pole transient response in our simple model for the Q-flop resolver and allow consideration of dynamic behavior. An important point to notice is that even for dynamic operation, Fig. 13 gives information about possible modes of operation. The sign of the derivative of VX outside the shaded regions of Fig. 12 is given by the plus and minus signs. For any value of VC and VX outside the shaded regions, the direction in which VX will change and the state it will reach in the steady state are known. Within the shaded regions, the change in VX with time is not known and is a function of VD.
Correct operation for the conditions in case 3) above requires limits on the dynamic operation. If VC changes quickly enough with respect to changes in VD, then the behavior shown by trajectory YY in Fig. 13(d) is not possible, since by the time VX reaches the RH threshold, VC will be sufficiently low that VX will only be able to increase and it will remain above the RH threshold. VX will have to increase monotonically as it crosses the RH threshold. Thus, correct operation can be obtained by making the transition of VC from +5 to 0 V short enough. From Fig. 13(d), we can see that if the time for VC to reach less than 2 V is less than the time for VX to reach the RH threshold, the RH threshold will be crossed exactly once. Trajectory c in Fig. 14 shows a representative case.
Dynamic operation for case 2) above must also be considered if the shape of the characteristics is such that the requirements for correct operation under quasi-static operation are not met. In this case, the operation will be correct if VC reaches a voltage close to +5 V before VX decreases to the RL threshold. Trajectory b in Fig. 14 shows a representative case.
There is no need for dynamic analysis of the other cases since they cannot fail, regardless of the transition times.
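The single-crossing requirement in cases 2) and 3) lends itself to a simple check on sampled waveforms, for example values exported from a circuit simulation. The helper below is a hypothetical sketch; the function name, the sample values, and the threshold are assumptions chosen only to show the idea.

```python
def count_crossings(samples, threshold):
    """Count how many times a sampled waveform crosses the given threshold."""
    above = [v > threshold for v in samples]
    return sum(1 for a, b in zip(above, above[1:]) if a != b)


# Hypothetical VX samples: a monotonic rise versus a YY-like trajectory.
monotonic = [0.0, 1.0, 2.0, 3.0]
yy_like = [0.0, 2.0, 1.0, 3.0]

ok = count_crossings(monotonic, threshold=1.5) == 1
bad = count_crossings(yy_like, threshold=1.5) == 1
```

A trajectory that crosses the detector threshold more than once corresponds to an output that indicates resolved, then unresolved, then resolved again, which is the behavior the transition-time limit is meant to exclude.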
3) MOSFET Model of Q-Flop Resolver: The MOSFET model of the Q-flop resolver shown in Fig. 4 has the same general behavior as the simple model we have been discussing, but there are a few additional factors to consider. They do not change the general discussion given above but affect the operating margins. Fig. 15 shows the results of SPICE simulations to determine the operating points for a specific set of MOSFET parameters.
a) Component Tolerances: Inclusion of worst case component tolerances does not affect the general form of the plots in Figs. 12-14 but increases the shaded areas and makes the operating margins smaller. In addition, with tolerances on the components, the data bias input must be strong enough to ensure that the flip-flop can be set to either state. This requires Q4A and Q4B to be strong enough to overcome component tolerances and be able to force VX to +5 or -5 V as VC changes from +5 to 0 V.
b) Common Mode Flip-Flop Voltage: All of the above analysis has been performed considering only the difference output voltage, VX = VA - VB, while ignoring the common mode voltage, VCM = (VA + VB)/2. There is some influence of the common mode output voltage, however, on the differential circuit behavior. If the pullup and pulldown transistors were balanced and the resolution detectors required no current, the common mode voltage would not affect the operation. However, the output detectors do require some current, and the pullup and pulldown transistors are not balanced because it is desired to make the inverter threshold voltage less than VDD/2 so that the bias on QC will be larger and turn QC on more strongly. A three-dimensional plot can be considered corresponding to Fig. 12 with the additional axis representing the common mode voltage VCM. The regions of operation will be a weak function of VCM, and a two-dimensional plot can be made by taking the extreme values over a range of VCM and projecting them onto a two-dimensional plot. The result will again be similar to Fig. 12 but with the shaded regions expanded and the operating regions further decreased. The previous analysis can be applied to this plot to ensure correct operation over a range of VCM.
c) Coupling Capacitance: We have assumed that coupling capacitance between circuit nodes is small and that the dynamic behavior is due to the changes in FET conductances; if too much coupling capacitance exists, the circuit will fail for any design since coupling from the data input to RH or RL could cause transient changes in them. Thus, the geometric layout should attempt to minimize coupling to the latch, and the completed layout should be checked to ensure that coupling capacitances are sufficiently small. A reasonable criterion to use would be that the current coupled to nodes A and B or to the RH and RL nodes would not be sufficient to change the direction of voltage change while the thresholds for the RH and RL detectors are being crossed. This will set a lower limit on the rise and fall times of VC, in addition to the lower limit discussed above, but if the coupling capacitance is kept small by good geometric layout, this restriction on rise and fall times should be easy to meet.
