Introduction

Scope of this Report
This report is intended to identify prospects for research on the testing and testability characteristics of asynchronous circuits and to propose a research project on this topic. It is a revised and extended version of the internal report HPCA-ECS-95/03, "Progress Report on Research on the Testing of Asynchronous Circuits", 19th December 1995.
In order to provide background for the discussion of recent work in the area, the history, some basic concepts and some significant design methods are discussed. This discussion is intended only to provide sufficient context for the discussion of testability by indicating the nature and scope of current design methods. It is not intended to be either an exhaustive survey or a comprehensive analysis of the strengths and weaknesses of such methods.
History
For many years the vast majority of digital electronic circuits have been designed to be synchronous. The reasons for the predominance of synchronous design are straightforward:
the synchronization of state changes in a circuit to a global clock minimizes or eliminates the need to check for hazards;
there has until recently been little motivation to use the more difficult asynchronous design approaches.
Research interest in asynchronous design methods has peaked periodically but since the mid 1980's there has been a sustained renewal in activity. This has been motivated by trends in the implementation technologies which have cast doubt upon the traditional view that synchronous systems require less effort to implement than asynchronous systems. To some extent it has been enabled by the greater speed of modern CAD systems and their ability to run complex asynchronous synthesis algorithms in reasonable time.
Properties of Asynchronous Circuits
Circuit Structure
Asynchronous circuits are designed to operate without global clock signals, generally under the assumption that the processing of signals (both data and control) within the circuit may be subject to arbitrary delays. They are typically designed as compositions of subcircuits (also referred to as elements or modules) that perform well defined processing tasks and interconnect with other subcircuits to form the complete circuit. The reasons for this decomposition of the circuit design into modules are the well understood necessity of managing design complexity and the less well understood benefit of localizing communication paths [51].
The processing within each subcircuit is subject to unpredictable delays. Each subcircuit must therefore provide to adjacent subcircuits some indication that it has completed processing and that the adjacent subcircuits may begin processing whatever data it has produced. Conceptually at least there will exist a handshaking protocol between subcircuits which they use to indicate the validity of data being passed from one to another and to sequence their respective processing tasks. Figure 1 shows a simple representation of an asynchronous circuit composed of three subcircuits connected in sequence. It does not represent many of the attributes of asynchronous circuits including the potential for parallel processing. It shows a clear separation between data and control signals which is a common but not universal practice.
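[Figure 1: An asynchronous circuit composed of three subcircuits connected in sequence. Labels: Data in, Data out, Start, Completed.]

Two effects of the necessarily well defined handshaking protocols used between subcircuits are that asynchronous subcircuits are typically more self-contained than synchronous subcircuits and have better defined interfaces.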
The Advantages of Asynchronous Circuits
Several advantages of asynchronous design have been claimed (in many papers, but summarized concisely in [22]) including:
Higher throughput
Asynchronous subcircuits provide some form of completion signal when they have finished processing. In the common case in which processing times depend on the data being processed this means that the subcircuit will indicate that the next stage of processing can commence as soon as it has finished processing each item. In a synchronous circuit each subcircuit is assumed to always take the same amount of time to process the data and that will be the time required for processing the worst case datum under worst case conditions (of temperature and process parameters for example). An isolated asynchronous subcircuit will therefore achieve average case processing times whereas a synchronous subcircuit will only achieve worst case processing times. In practice the potential speed-up may not be realized because:
in a pipeline of subcircuits the operand throughput can be no better than the average case throughput of the slowest stage (although this will be faster than the worst case throughput of the slowest stage);
asynchronous control logic introduces handshaking overheads which reduce the throughput of each stage.
Removal of clock skew problems
One of the greatest difficulties encountered by designers of large scale synchronous circuits is the distribution of global clock signals within acceptable skew limits to all parts of the circuit. Large amounts of circuit area and power are used by clock drivers and distribution networks. Asynchronous circuits, which by definition have no global clock signal, do not have this problem. Communication is typically localized and many asynchronous synthesis methods produce circuits that are tolerant of arbitrary amounts of control signal skew.
Lower power consumption
CMOS circuits, representing by far the greatest proportion of integrated circuit implementations [14], consume power at a rate which is approximately proportional to the number of node transitions per second. With synchronous circuits there is a transition at each clocked node twice in each clock cycle, whether or not there is any computational activity in the vicinity of the node. Asynchronous circuits on the other hand tend only to cause transitions in sections of the circuit in which there is computational activity. They tend therefore to have fewer transitions per computation and consume less power. This is especially so when the input data rate is low compared to current typical clock speeds [10].
Modular composition
As noted above most asynchronous design methods encourage the creation of circuits as a set of discrete subcircuits with well defined interfaces and handshaking protocols. This is because the methods require a strict definition of communication protocols between adjacent modules. From the point of view of managing design complexity by abstraction this is a very desirable approach. While the same approach can be applied to synchronous circuits, global clocking strategies tend to break down the barriers between subcircuits, giving rise to ill-defined or timing-dependent interfaces between subcircuits. A related advantage of asynchronous circuits is that because the circuit is designed to adapt to variable processing speeds of subcircuits, any subcircuit can be replaced by a higher speed design and the circuit will continue to work, possibly with higher throughput, without any changes to surrounding subcircuits. In synchronous systems higher performance can only be achieved by increasing the clock speed and this will only be possible if the slowest subcircuit is redesigned for higher speed. Significant increases in clock speed may require the redesign of many subcircuits.
Adaptation to circuit and physical properties
An asynchronous circuit will complete its computation at the speed defined by the circuit configuration and the electrical and physical properties of the circuit. If any of these properties change (for example, if the temperature rises) the circuit will continue to function correctly whereas if delays vary significantly in a synchronous circuit the circuit might fail.
Interface to external events
Digital processors are often required to interface to external asynchronous events.
In synchronous processors this generates the need for synchronizers and arbiters, which are subject to metastability [12] which may lead to failure of the processor. The same problem exists for asynchronous processors but they are by definition designed to process asynchronous signals and in particular will not fail due to the unbounded delays which can be introduced by metastability.
Lower noise
The fact that asynchronous circuits are free from periodic transitions on large numbers of circuit nodes means that they do not have the large, regular peaks in current on the power supply rails that are characteristic of synchronous circuits. This means that there is less electrical noise in asynchronous circuits which can therefore be designed more aggressively with lower noise margins and higher performance; the spectra of electromagnetic radiation from the circuits do not have large peaks at a single frequency and its harmonics. Radiated energy is spread more uniformly over a range of frequencies.
Easier testability
Some asynchronous design styles produce circuits which are naturally self testing for certain classes of faults (especially stuck-at faults (SAFs)). This issue will be addressed in more detail in section 4 of this report.

There is increasing skepticism about the potential speed advantages of asynchronous circuits and it has been suggested that the clock routing problem of synchronous circuits is no worse than the increased complexity of asynchronous design methods. However lower power consumption, lower noise and the modular composition style are seen as distinct advantages of asynchronous methods.
Principles of Asynchronous Circuits
Early Methods
Formal methods for the design of asynchronous circuits have been in existence since the early work of D.A. Huffman in the mid 1950's [54]. Huffman modelled asynchronous circuits as a combinational logic block with a set of feedback wires (Figure 2) and used a flow table method to map the transition of the circuit through its stable and unstable states. The circuit is constrained to operate in the fundamental mode: inputs may not change until all outputs and feedback variables have stabilized after the preceding input change. The assignment of state variables in the Huffman model must be done so as to eliminate critical race conditions. Operating speed constraints imposed by the fundamental mode assumption and exponential dependence on the number of state variables make the model impractical for many purposes.
A more general model is that of Muller [43] in which an asynchronous circuit is modelled as an arbitrary connection of combinational gates, each of which has an unbounded, but finite, delay at its output. This means that the output signal at each gate is subject to an arbitrary delay which has no specified upper bound, although it is guaranteed that the output will appear eventually. Muller explicitly models all signal transitions. He defined a speed independent circuit as one which exhibits one final behaviour from an initial state, irrespective of the size of gate delays. He also defined semi-modular circuits, as a subclass of speed independent circuits, as those in which any transition, once enabled, is never disabled before it occurs. It happens that semi-modularity is an easier property to check for and design for than speed independence. Most modern methods are based more on the Muller model than the Huffman model although many of them generalize the model still further to allow delays in interconnecting wires. The necessity to guarantee complete freedom from hazards tends to make the synthesis procedures for Muller circuits (now generally called speed independent circuits) complex and compute intensive. Note that each of the subcircuits represented in Figure 1 could be designed as either Huffman or Muller circuits or using some other model.
Timing Models
A key distinguishing characteristic of modern asynchronous design methods is the model of signal propagation timing on which they are based.
Self-timed circuits
The broad concept of a self-timed circuit in the context of VLSI design was introduced by Seitz in 1980 [51]. Self-timed circuits are circuits composed of self-timed elements. A self-timed element is a subcircuit which performs some processing on input data and provides some form of completion signal which indicates that the processing has been completed. This signal can be used to initiate handshaking to transfer data to or from adjacent self-timed elements. Seitz's motivation for composing systems in this way is to localize communication in order to minimize the clock-skew problem.
Within self-timed elements and between adjacent elements interconnection delays are assumed to be negligible. Regions of the circuit which are sufficiently close together that interconnection delays are negligible compared to gate delays are called equipotential regions. It should be noted that self-timed elements may be internally synchronous but communicate using asynchronous protocols based on the completion signal.
There are various methods for producing the completion signal. The two most common are:
Encoded data
The completion signal is derived from the processed data. The popular method of dual rail encoding uses two wires to represent each bit of data. The two states in which the wires have complementary logic levels represent a bit value of 0 or 1, whereas the states in which the two wires have the same value represent an idle (also known as space) or invalid state for the bit (see Table 1). Input bits are initially set to the idle state. Processing commences when all of the input bits are set to either 0 or 1, at which time all output bits are set to idle. Completion of processing is indicated when the two wires for each output bit are set to complementary levels. There is an extensive body of theory on the assignment of codes to represent data and the detection of valid and invalid states [56]. Such encoding provides a robust method of producing completion signals but requires approximately twice as much logic as the single wire encoded logic.
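The dual rail scheme described above can be sketched in a few lines. This is a minimal illustration only: Table 1 is not reproduced in this text, so the particular assignment of wire pairs to values used below is an assumption, as are all of the function names.

```python
# Assumed code assignment (Table 1 is not available here):
# (0, 0) = idle/space, (0, 1) = logic 0, (1, 0) = logic 1, (1, 1) = invalid.
IDLE = (0, 0)

def encode_bit(bit):
    """Return the assumed two-wire pair for a data bit."""
    return (1, 0) if bit else (0, 1)

def decode_bit(pair):
    """Return 0 or 1 for a valid pair, or None if the pair is idle or invalid."""
    if pair == (1, 0):
        return 1
    if pair == (0, 1):
        return 0
    return None

def is_complete(word):
    """Completion: the two wires of every bit hold complementary levels."""
    return all(w0 != w1 for w0, w1 in word)

word = [IDLE] * 3                 # all output bits start idle
word[0] = encode_bit(1)
word[1] = encode_bit(0)
print(is_complete(word))          # False: the third bit is still idle
word[2] = encode_bit(1)
print(is_complete(word), [decode_bit(p) for p in word])  # True [1, 0, 1]
```

The completion test needs no separate control wire, which is the robustness noted above; the cost is the second wire (and its logic) for every bit.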
Model delays
The processing delay through each element can be modelled with a single delay network (or model delay) such that the delay through the network is the same as the delay through the data processing logic. In other words the delay from the Start control signal to the Completion signal is customized so that it is no less than the delay through the data path (see Figure 3). This is much more efficient in terms of circuit area but requires some design effort to match gate delays. This approach assumes that the propagation delay for data from one element to the next is the same for each data line (the bundled data convention) and that the completion signal which indicates the availability of data to the next stage is not received at the next stage until after all data lines are stable. In practice this means increasing the model delay sufficiently to ensure that the completion signal arrives at the succeeding element after the data.
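The sizing rule for a model delay can be illustrated with a short calculation. The sketch below is only an illustration: the function name `model_delay`, its parameters and the nanosecond figures are assumptions introduced here, not values from any design discussed in this report.

```python
def model_delay(data_path_delays, data_wire_lag, margin=0.0):
    """Smallest Start-to-Completion delay that keeps Completion behind the data.

    data_path_delays: worst-case logic delay of each data bit (hypothetical, ns)
    data_wire_lag:    worst-case amount by which a data wire may lag the
                      completion wire on the way to the next element (ns)
    margin:           extra safety allowance chosen by the designer (ns)
    """
    return max(data_path_delays) + data_wire_lag + margin

# Hypothetical figures: three data bits with different worst-case delays.
delay = model_delay([3.0, 4.5, 4.0], data_wire_lag=0.5)
print(delay)  # 5.0
```

The model delay is governed by the slowest data bit, so an element using this scheme runs at its worst-case speed rather than at a data-dependent speed.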
Speed Independent circuits
Speed Independent (SI) circuits are circuits whose behaviour is independent of delays in gates used in the circuit, as described by Muller [43, 39]. Circuits are invariably modelled as an interconnection of logic gates in which each gate consists of an ideal logic function followed by an unbounded delay. Interconnection delays between gates are assumed to be zero.
Delay Insensitive circuits
Delay Insensitive (DI) circuits are those whose behaviour is independent of both gate delays and interconnection (wire) delays. This is considered to be perhaps the purest theoretical model for asynchronous circuits. As gate delays decrease and interconnection delays become more significant it is widely held that the DI model is more appropriate than the SI model for VLSI systems.
On the existence of DI circuits.
Unfortunately the range of circuits which can conform to the strict DI model is very limited. Martin [35] argues that the only closed circuits that can be DI are those composed entirely from Muller C elements and inverters. The range of DI circuits can be extended by adopting the isochronic fork assumption which states approximately that at forks in wires the delay in propagating signals from the fork to their respective gate inputs is the same for all signals. Such circuits are called Quasi-Delay Insensitive (QDI).
Seger [50] adopts a different approach to show that three very restrictive conditions apply to DI circuits and that therefore the range of DI circuits is very restricted. This result applies to non-closed circuits operating in fundamental mode.
Leung and Li [33] identify 6 restrictive properties of DI circuits and show that DI specifications that possess one of these properties can only be constructed from gates that all possess that property. In other words, if a specification lacks any of the properties, it can only be realized with a set of gates such that, for each property it lacks, at least one gate also lacks that property. They go on to show that arbitrary DI specifications can be realized using a small set of gates, including some unconventional ones. They explain how their analysis of DI behaviours differs from that of Martin and Seger but their explanation of the different results is unconvincing. There may be more subtle properties of their models of DI behaviour which they have not addressed in their comparison.
Delay Models
The modelling of SI and DI circuits requires the use of a model which describes the nature of delays. It turns out that the properties of the circuits depend critically on the assumed nature of the delays. Two models are in common use:
Pure delays are devices with one input and one output. Each input transition is replicated at the output after some period of time although transitions may not be re-ordered as a consequence of the delay. This model is unrealistic in the sense that practical gates will not transmit a pulse caused by two transitions very close together whereas the model guarantees that every transition will be at the output irrespective of the proximity of successive pulses. Indeed pulses may be arbitrarily lengthened or shortened by a pure delay.
Inertial delays are often used in preference to pure delays to model the fact that practical circuits will not respond to two transitions which are very close together. The inertial delay model is one in which input transitions are replicated at the output after some period of time unless two transitions occur at the input within some defined period, in which case neither transition is transmitted. A common form of the implementation of the inertial delay model is the one in which the transmission delay for transitions is the same as the threshold for cancellation. In other words when a transition appears at the input that transition will appear at the output after the transmission delay unless a second transition occurs at the input beforehand. The inertial delay model is difficult to incorporate into many SI and DI circuit models. A related but more tractable model is the Pure Chaos Delay Model in which transitions are queued within the delay element and adjacent queued transitions may annihilate one another at random [4].
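The difference between the two delay models can be seen by applying each to the same list of input transition times. The sketch below is a simplified illustration under stated assumptions: the delay `d` is a single fixed value (the pure delay described above may also stretch or shrink pulses, which is not modelled), the inertial delay uses the common form in which the transmission delay equals the cancellation threshold, and the function names are hypothetical.

```python
def pure_delay(times, d):
    """Every input transition appears at the output, in order, d later."""
    return [t + d for t in times]

def inertial_delay(times, d):
    """A transition is output d later unless another follows within d,
    in which case neither transition of the pair is transmitted."""
    out = []
    i = 0
    while i < len(times):
        if i + 1 < len(times) and times[i + 1] - times[i] < d:
            i += 2          # pulse narrower than d: both transitions cancel
        else:
            out.append(times[i] + d)
            i += 1
    return out

# Input transitions at t = 0, 10, 11, 20; the 10..11 pulse is narrower than d.
times = [0, 10, 11, 20]
print(pure_delay(times, 3))      # [3, 13, 14, 23]
print(inertial_delay(times, 3))  # [3, 23]
```

The pure delay passes the one-unit pulse unchanged, whereas the inertial delay suppresses it and transmits only the two well separated transitions.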
Handshaking Protocols
Self-timed modules must produce completion signals to indicate the availability of data to adjacent modules. It is necessary not only for the module producing data to indicate that it has data available, but also for the receiving module to indicate to the producing module when it has received the data, so that the producing module can remove or change the data (at the end of its next computation). So two signals, one in each direction, are required for the self-timed transfer of data from one module to another. The completion signal from the producing module to the receiving module is conventionally called Request and the return signal from the receiving module is called Acknowledge (Figure 4).
It should be noted that if the data is encoded as described in section 2.2.1 then the completion/request signal is implied by the transition of the data lines from idle to valid states, and there will be no separate Request connection.
At a minimum, one transition is required on each wire to indicate the availability of and reception of data. This is known as a two phase or Non-Return to Zero (NRZ) handshake protocol (Figure 5(a)). Alternatively each data transfer can be accompanied by two transitions on each of the Request and Acknowledge wires, so that each wire returns to the initial state at the end of each data transfer. This is a four phase or Return to Zero (RZ) handshake protocol (Figure 5(b)). With the four phase protocol it is possible to vary the times at which data becomes valid and invalid with respect to the two handshake signals. These variations are known as early and late protocols and each may offer some speed advantage in different circumstances. It would appear at first that the two phase protocol offers greater data throughput but in some circumstances at least the control circuits required are significantly more complicated than for a four phase protocol so the four phase protocol works faster [15].

[Figure 4: Request/acknowledge handshaking]
[Figure 5: Handshake protocols, (a) two phase and (b) four phase. Signals: Data, Request, Acknowledge. Annotations: data is valid; data has been received and can be changed; one cycle.]
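The two handshake protocols differ only in the number of transitions per transfer, which a small event generator makes concrete. This is an illustrative sketch: the function names and the representation of events as (wire, level) pairs are assumptions, and data validity timing (the early and late variants) is not modelled.

```python
def two_phase(n):
    """Two phase (NRZ): one transition on each wire per data transfer."""
    req = ack = 0
    events = []
    for _ in range(n):
        req ^= 1                       # any transition on Request signals data
        events.append(("req", req))
        ack ^= 1                       # any transition on Acknowledge confirms it
        events.append(("ack", ack))
    return events

def four_phase(n):
    """Four phase (RZ): both wires return to 0 after every data transfer."""
    events = []
    for _ in range(n):
        events += [("req", 1), ("ack", 1), ("req", 0), ("ack", 0)]
    return events

print(two_phase(2))         # [('req', 1), ('ack', 1), ('req', 0), ('ack', 0)]
print(len(two_phase(2)), len(four_phase(2)))   # 4 8
```

Two transfers cost four transitions under the two phase protocol and eight under the four phase protocol. The two phase receiver must respond to both rising and falling edges, which is one source of the more complicated control circuits mentioned above.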
Some Recent Asynchronous Design Methods
The most recent resurgence of interest in asynchronous design commenced in the early 1980's. Since that time many asynchronous design methods with varying amounts of underlying formal theory have been developed. The most significant developments (in so far as that can be judged at the present time) are discussed below. In each case an outline of the design method is presented followed by a brief discussion of its significance.
Micropipelines
Outline
In Ivan Sutherland's Turing Award lecture [52] he introduced the micropipeline structure, an asynchronous elastic FIFO structure with the ability to transform the data between FIFO stages. Important features of this design style are:
It is strongly oriented toward (but not constrained to) linear pipelined processing of data.
Data lines and control signals, which are used to co-ordinate the transfer of data between pipeline stages, are clearly separated.
Transfer of data between stages is effected by a two phase (non-return to zero) request/acknowledge handshake on two control wires between adjacent stages.
The transformation logic at each stage is self-timed, with the completion signal being used in the handshake with the following stage. Sutherland's examples use model delays and the bundled data convention but dual-rail encoded data can be used.

Figure 6 is a diagram of a three stage micropipeline using the bundled data convention.

Significance
The micropipeline is a simple and elegant model for parallel asynchronous processing of data in strictly sequential order. Sutherland's work has formed the basis for several data path designs. The most significant application of the technique is the AMULET processor, a micropipelined implementation of the ARM architecture developed at Manchester University [20]. The first implementation of this design was fully functional. It operated approximately 30% slower, occupied 100% more area and consumed 15% less power than the synchronous ARM6 processor which it emulated. The second implementation is to include several improvements to the design approach, including optimized data latches and four phase control circuits [15], and is expected to be 4 times faster and consume significantly less power than the first.
Methods for Speed Independent Circuits
Signal Transition Graphs
Outline
Signal Transition Graphs (STGs) [13] were devised by Tam-Anh Chu as a graphical specification mechanism for speed-independent asynchronous circuits. In his thesis Chu describes the syntax of STGs and presents algorithms for processing the STG into supposedly hazard-free SI logic implementations of the specification.
STGs are interpreted Petri Nets (connected graphs comprising places and transitions), in which transitions represent changes in circuit signals from 0 to 1 or 1 to 0 and places represent conditions in which signal changes represented by transitions leaving the place may occur (after some delay). Hereafter the term transition will be used to refer to both Petri net transitions and signal changes, as for present purposes they are equivalent.
In order for a transition to be enabled each place that it leaves must contain a token and when the transition occurs (fires) a token is transferred into all places to which it is incident. Thus a partial ordering of the occurrence of transitions is enforced by the construction of the graph. STGs are able to specify sequential, concurrent and/or alternate transition firing sequences.
The distinction between the enabling (by all input places of a transition becoming marked) and firing (thus marking output places) of a transition represents the arbitrary delay between the input variables of a gate being set so as to change the output of the gate and the output of the gate changing. Figure 7(a) shows a Petri net. The horizontal lines represent transitions, the circles represent places and the dotted circles are marked places. STGs are usually represented in a simpler form than the Petri net on which they are based. Places are not shown unless there are several transitions following the place and transitions are represented by just the name of the signal transition which they represent. The initial marking of the STG is shown by placing circles on the arcs connecting transitions, where the initially marked places would be if they were shown. Figure 7(b) shows an STG based on the Petri net in Figure 7(a).
The circuit designer specifies the behaviour of the circuit by constructing an STG subject to the syntactic constraints of liveness, safeness and complete state encoding (CSC), and the underlying Petri net must be free choice. A free choice Petri net is one in which any place from which there is more than one exiting transition is the only place from which each of those transitions leaves. (In a general Petri net each place can be followed by several transitions and each transition can leave several places.) The purpose of this restriction is to ensure that at any marking of the STG in which two alternate transitions are enabled a completely free and non-deterministic choice can be made between those transitions. A live STG is one which is fully connected, in which every cycle of transitions contains only alternating positive and negative transitions of a single variable, and in which each transition may fire infinitely often. A safe STG is one which is initially marked such that there is no transition firing sequence which will allow any place to contain more than one token. The safety property is proposed in order to guarantee that no transition is able to fire twice in immediate succession. The CSC property in essence guarantees that the labelling of transitions is consistent with the allocation of outputs as state variables, and that no two distinct states of the resulting circuit have the same state vector, unless they can be distinguished by different input transitions.
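The free choice condition is purely structural and can be checked directly from the arcs of the net. The following is a minimal sketch of such a check, assuming the net is given as a hypothetical mapping `pre` from each transition to the set of its input places; it is not taken from Chu's algorithms.

```python
def is_free_choice(pre):
    """pre maps each transition name to the set of its input places."""
    # Invert the mapping: for each place, the transitions that it feeds.
    successors = {}
    for t, places in pre.items():
        for p in places:
            successors.setdefault(p, set()).add(t)
    for p, ts in successors.items():
        if len(ts) > 1:
            # p offers a choice, so it must be the sole input place
            # of every transition that follows it.
            if any(pre[t] != {p} for t in ts):
                return False
    return True

# Place p chooses freely between t1 and t2.
print(is_free_choice({"t1": {"p"}, "t2": {"p"}}))        # True
# t2 also needs place q, so the choice at p is not free.
print(is_free_choice({"t1": {"p"}, "t2": {"p", "q"}}))   # False
```

In the second net the token in p may be available to t1 while t2 is still waiting on q, so the outcome of the choice depends on timing rather than being free.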
The STG in Figure 7(b) describes the behaviour of a Muller C element [43] with inputs a and b and output z. a+ represents a positive (0-1) transition of signal a and a- is a negative (1-0) transition. The STG specification shows that from the initial condition either a+ or b+ may occur: they are concurrent transitions which may occur in either order. After both a+ and b+ have fired z+ is enabled and after z+ has fired both a- and b- are enabled. The specification in this case assumes that the inputs a and b provided by the gate's environment are constrained so that after a transition on one input there will not be another transition on that input until after an output transition. The principle of constrained (or defined) behaviour of the environment in the synthesis of asynchronous circuits is important and if an environment violates its defined behaviour this may cause the circuit to be non-SI.
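The enabling and firing rules can be exercised on the C element specification with a small token game. Figure 7 is not reproduced in this text, so the arc list below is an assumption reconstructed from the prose (with the falling half of the cycle assumed to mirror the rising half); each arc stands for an implicit place, and the names `ARCS`, `INITIAL`, `enabled` and `fire` are hypothetical.

```python
# Assumed STG arcs for the C element; each arc carries one implicit place.
ARCS = [("a+", "z+"), ("b+", "z+"), ("z+", "a-"), ("z+", "b-"),
        ("a-", "z-"), ("b-", "z-"), ("z-", "a+"), ("z-", "b+")]
# Assumed initial marking: tokens on the arcs leading into a+ and b+.
INITIAL = {("z-", "a+"), ("z-", "b+")}

def enabled(marking):
    """Transitions whose every incoming arc holds a token."""
    transitions = {dst for _, dst in ARCS}
    return sorted(t for t in transitions
                  if all(arc in marking for arc in ARCS if arc[1] == t))

def fire(marking, t):
    """Remove tokens from the arcs into t and mark the arcs out of t."""
    if t not in enabled(marking):
        raise ValueError(f"{t} is not enabled")
    consumed = {arc for arc in ARCS if arc[1] == t}
    produced = {arc for arc in ARCS if arc[0] == t}
    return (marking - consumed) | produced

m = INITIAL
print(enabled(m))        # ['a+', 'b+']  concurrent, either may fire first
m = fire(m, "b+")
print(enabled(m))        # ['a+']        z+ still waits for a+
m = fire(m, "a+")
print(enabled(m))        # ['z+']
m = fire(m, "z+")
print(enabled(m))        # ['a-', 'b-']
```

Representing the marking as a set of arcs relies on the net being safe, so that no place ever needs to hold more than one token.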
Chu's method transforms the STG automatically into a State Graph, in which each node represents a valid state (combination of output variable values) of the circuit. From this, Karnaugh maps for the output variables can be constructed and combinational logic subcircuits devised to implement them. It should be noted that because of the ability of STGs to represent concurrent signal transitions (multiple places following a transition) the State Graph can contain exponentially more nodes than the STG. Chu's method provides for contraction of the STG into several STGs with fewer signals than the original so that the synthesis process for each contracted STG can be computationally much simpler.
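The growth of the State Graph with concurrency can be demonstrated by enumerating the reachable markings of a small net. This sketch is not Chu's algorithm: it is a plain reachability search, it assumes a safe net so that a marking can be held as a set of places, and the helper `concurrent_net` builds a hypothetical net of n mutually independent transitions.

```python
def reachable_markings(pre, post, initial):
    """Enumerate every marking reachable by firing enabled transitions."""
    start = frozenset(initial)
    seen = {start}
    frontier = [start]
    while frontier:
        m = frontier.pop()
        for t in pre:
            if pre[t] <= m:                      # all input places are marked
                nxt = (m - pre[t]) | post[t]     # move the tokens
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return seen

def concurrent_net(n):
    """n independent transitions, each moving a token from in_i to out_i."""
    pre = {f"t{i}": frozenset({f"in{i}"}) for i in range(n)}
    post = {f"t{i}": frozenset({f"out{i}"}) for i in range(n)}
    initial = {f"in{i}" for i in range(n)}
    return pre, post, initial

for n in (2, 4, 8):
    print(n, len(reachable_markings(*concurrent_net(n))))
# 2 4
# 4 16
# 8 256
```

A specification with n transitions yields 2^n markings here because every subset of the concurrent transitions may have fired, which is the exponential growth that motivates contracting the STG before synthesis.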
In processing the STGs Chu's algorithms include steps to ensure persistency as a means of guaranteeing hazard-free implementation. Informally, persistency in an STG means that for any transition which results in the enabling of a second transition, the complementary transition of the first signal may not occur until after the transition of the second signal has occurred. Figure 8 shows a fragment of an STG illustrating a non-persistent transition, b+. After a+ fires, enabling both c+ and b+, a- may fire (removing the enabling condition for b+) before b+ fires. This may result in a valid transition on b, no transition or a runt pulse, depending upon the implementation technology.
Significance
Chu's work is important because it has formed the foundation for later work on STG based synthesis methods. As described below Chu's method is based on some assumptions which are more restrictive than necessary so his method is not applied in its original form. However his development of a compact and intuitive graphical specification mechanism for asynchronous circuit behaviour and his approach to circuit synthesis from this form of specification are fundamental contributions.
Refined STG Models
The basic STG based synthesis method developed by Chu has been subject to further development by several research groups. Moon [40] and Lavagno [30] considered Chu's conditions for hazard-freedom and concluded that persistency is neither necessary nor sufficient for hazard-free implementation of an STG specification.
The insufficiency of persistency arises from the fact that the boolean function specified by the Karnaugh maps derived from State Graphs may be subject to a hazard-prone implementation, notwithstanding persistency. In effect Chu had assumed that any implementation of the logic functions would not be subject to hazards.
The lack of necessity of persistency can be illustrated by re-considering Figure 8. If a+ fires, enabling b+, the transition c+ must fire before a- can. It is possible that c+ has the effect of enabling b+, thus allowing a- to occur without disabling b+. Such a situation may occur, for example, when a and c are inputs to an OR gate of which b is the output. A more appropriate expression of the condition that a transition should not be disabled before it fires is semi-modularity which, informally, states that once a transition is enabled only the firing of that transition can disable it (see section 2.1). Chu was too conservative in assuming that a transition which enables another is the only one which can keep it enabled.
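The OR gate case can be traced step by step. The sketch below assumes the ordering a+, c+, a- described for Figure 8 and treats b+ as enabled whenever the gate's inputs call for a 1 while b is still 0; the function name and the trace format are illustrative assumptions.

```python
def b_plus_enabled(a, c, b):
    """b+ is excited when the OR of the inputs is 1 but b is still 0."""
    return (a or c) == 1 and b == 0

state = {"a": 0, "c": 0, "b": 0}
trace = []
# b is slow: it does not fire during this input sequence.
for signal, value in [("a", 1), ("c", 1), ("a", 0)]:
    state[signal] = value
    label = f"{signal}{'+' if value else '-'}"
    trace.append((label, b_plus_enabled(state["a"], state["c"], state["b"])))

print(trace)  # [('a+', True), ('c+', True), ('a-', True)]
```

Although a- occurs before b+ has fired, so that b+ is non-persistent in Chu's sense, b+ remains enabled throughout because c holds the OR gate's output condition. The transition is never disabled, which is what semi-modularity requires.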
Yakovlev [57] criticizes more of Chu's restrictions on STG structure. He demonstrates that reasonable circuit behaviours can be usefully described by STGs which are not safe, are not free choice nets and have non-binary state encoding.

A significant problem with the STG based design approach is the potentially very large size of the State Graph which can be generated from an STG exhibiting high levels of concurrency. Vanbekbergen et al. [55] have proposed a method for ensuring that STG specifications satisfy the Complete (or the stricter Unique) State Coding condition without resorting to the state graph. Extra arcs (constraints) may be added to the STG to do this, reducing the potential concurrency of the specification. While the computational complexity of their algorithms is a significant improvement on Chu's state graph based methods for enforcing USC, there are restrictions on the types of STGs to which they can be applied.
Kondratyev and Taubin [29] present a method for verifying speed independence directly from the STG, avoiding Chu's State Graph based checks for persistency.
Meng [38] introduced a higher level guarded command specification for asynchronous circuits which is translated automatically into an STG. She described a method for automatically adding the weakest constraints to the STG which would ensure semi-modularity. The STG is then automatically translated into the corresponding state graph from which boolean expressions for the output signals are derived. These expressions are realized using circuits consisting of C elements or SR flip-flops as state holding elements, with combinational logic producing the inputs to the state holding elements. Meng, like Chu, ignored the problem of ensuring that the combinational logic is hazard-free.
Lavagno [30, 31] has developed an STG-based bounded delay synthesis method which produces an implementation in the form of two level combinational logic to generate set and reset signals for a storage element. This two level implementation is analyzed to identify any hazards. It is then transformed into a multi-level AND/OR gate implementation in such a way that its hazard characteristics are preserved. Finally, gate delays are analyzed and extra delays are introduced into the circuit to eliminate all hazards.
Beerel [6] has developed a synthesis method based on a state graph specification (which can be derived from an STG or other form of behavioural specification). His method produces circuits comprising hazard-free networks of AND/OR and C gates based on "excitation regions" for signals derived from the state graph. The resulting circuit can be relatively inefficient and Beerel has developed some optimization techniques, some of which are guaranteed to preserve the hazard-free characteristics of the circuit and some of which are not. For the latter cases Beerel has developed an automatic verification algorithm which is used to check whether the circuit continues to be SI after the optimizations have been done. The algorithm is pessimistic in the sense that it is theoretically possible for it to report that a hazard-free circuit is not hazard-free, but this is evidently not a problem in practice.
STG-based Synthesis Methods
Myers [44] has a synthesis method for bounded delay circuits which is a variant of SI STG methods. He uses an STG annotated with estimates of circuit delays to find minimum and maximum timing constraints between pairs of transitions. Some of these constraints then prove to be redundant. A reduced state graph is constructed and expressions are derived for CMOS pull-up and pull-down networks to implement the state variables. Circuit delays can then be checked against original estimates and the circuit resynthesized if necessary. Myers' algorithm is heuristic in part but he has demonstrated the correct implementation of several specifications in a form which is smaller and faster than the equivalent SI implementation.
Significance
Various research groups have addressed deficiencies in Chu's original method, proposing a number of solutions to the problems of: excessively conservative constraints on the specification; implementation of hazard-free circuits from boolean equations derived from State Graphs; achieving consistent state assignment; and exponential complexity of the State Graph processing algorithms.
It is beyond the scope of this report to analyze in detail each of the variations to Chu's method. However it can be observed that no single method has efficiently solved all of the problems or is recognized as being superior to all others. The hazard-free logic implementations tend to either be complex or have poor testability characteristics (or both). Tabrizi [53] points out that many of the supposedly hazard-free synthesis methods ignore the problems caused by inverters at the inputs of AND/OR gate implementations. The only automated design examples which have been reported are small (of the order of hundreds of variables).
3.3 Methods for Delay Insensitive Circuits
3.3.1 Trace Theory Outline
Ebergen [18] has proposed another method for the specification of Quasi-DI circuits based on trace theory. The behaviour of the circuit is specified as a sequence of signal transitions (Ebergen does not specify the polarity of the transition) composed into complex sequences using operators for repetition, alternation and concurrency.
A trace specification of the behaviour of the Muller C element whose STG is shown in Figure 7 is
pref *[(a? ‖ b?) ; c!]
Here the question mark suffix for a and b indicates that they are input signals, while the exclamation mark indicates that c is an output. The weave operator (‖) indicates that the transitions on a and b may be concurrent, but that both must occur before a transition on c. This sequence may be repeated indefinitely. The pref operator is a formality indicating that all prefixes of the above sequence are valid for the circuit as well.
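The C element behaviour just described (a and b in either order, then c, repeated, with every prefix also valid) can be sketched as a small trace checker. The function name and the string encoding of events are assumptions chosen for illustration; this is not Ebergen's decomposition method.

```python
def is_valid_prefix(trace):
    """Check a list of events against the C element behaviour described above.

    Each cycle must contain one a? and one b? in either order, followed by c!.
    Any prefix of such a sequence is accepted, mirroring the pref operator.
    """
    seen = set()  # inputs that have already switched in the current cycle
    for event in trace:
        if event in ("a?", "b?"):
            if event in seen:
                return False  # the same input switched twice before c!
            seen.add(event)
        elif event == "c!":
            if seen != {"a?", "b?"}:
                return False  # output switched before both inputs had
            seen = set()      # start the next repetition
        else:
            return False      # unknown symbol
    return True


print(is_valid_prefix(["b?", "a?", "c!", "a?"]))  # inputs may arrive in either order
print(is_valid_prefix(["a?", "c!"]))              # c! before b? is rejected
```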
Ebergen defines grammars comprising these operators and signal names and shows that if a trace specification conforms to certain grammar classes then the circuit produced will be DI (that is, it will behave according to the specification irrespective of gate and wire delays, except for the isochronic fork assumption). He presents methods for decomposing the specification into equivalent elementary trace sequences and presents standard gates for implementing those elementary sequences in DI or Quasi-DI form.
There has been no reported application of Ebergen's method to the design of practical asynchronous circuits. The reason for this may be that the specification mechanism is cumbersome and non-intuitive, particularly because it requires explicit and complicated specifications of parallel behaviour.
Trace specifications have been used by various researchers as a theoretical tool for the analysis of asynchronous circuits [11]. Dill has used trace theory to produce an automatic verification tool for SI and DI circuits which has been used by other researchers to verify their own work and to discover flaws in the work of others [21].
Communicating Hardware Processes Outline
Martin [36] has developed and applied an asynchronous design methodology which produces Quasi-Delay Insensitive circuits from a relatively high level textual description based on Hoare's Communicating Sequential Processes [24]. The language features a guarded command style of specification and explicit communication primitives between concurrent processes. To illustrate the style of specification, a common construction is the following:
Where the G_i are boolean expressions called guards and the S_i are statements consisting of sequences of actions. The construction specifies the infinite repetition of the process: select some i for which G_i is true and execute S_i (i.e., non-deterministic repetition).
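The repeated guarded selection can be mimicked in ordinary software as a rough illustration. The sketch below is not Martin's compilation procedure: the function `repeat_guarded`, the step bound, and the decision to stop when no guard holds are all assumptions added so that the example terminates.

```python
import random


def repeat_guarded(commands, state, max_steps=1000):
    """Repeatedly select any command whose guard is true and execute its statement.

    `commands` is a list of (guard, statement) pairs of callables acting on `state`.
    Assumption: the loop is bounded and stops when no guard holds, so the sketch ends.
    """
    for _ in range(max_steps):
        ready = [statement for guard, statement in commands if guard(state)]
        if not ready:
            break
        random.choice(ready)(state)  # non-deterministic choice among true guards
    return state


def subtract_y(s):
    s["x"] -= s["y"]


def subtract_x(s):
    s["y"] -= s["x"]


# Two guarded commands that together compute a greatest common divisor.
gcd_commands = [
    (lambda s: s["x"] > s["y"], subtract_y),
    (lambda s: s["y"] > s["x"], subtract_x),
]

result = repeat_guarded(gcd_commands, {"x": 12, "y": 18})
print(result)
```

Because the two guards here are mutually exclusive, the random choice has no effect on the result; with overlapping guards the selection would be genuinely non-deterministic.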
Martin's method involves a number of compilation steps which perform rule-based decomposition of the high level process specification into simpler concurrent communicating processes. Communication operations are transformed into signal handshaking sequences and the result is a set of production rules specifying signal firing sequences and preconditions for these to occur. Subsets of production rules are translated into standard circuit implementations. For reasons of efficiency Martin has chosen that the basic circuit elements are constructed as CMOS switch networks. This method has been used to produce what is accepted as the world's first asynchronous microprocessor [37]. The 1.6 µm version of the processor ran at 18 MIPS but it may be significant that there were some unexplained errors with certain combinations of external memory delays.
As noted earlier Martin has argued that the class of truly Delay Insensitive circuits is very limited [35]. In essence his argument is based on the Acknowledgement Theorem which states that in a DI circuit each non-final transition at the input of a gate must have a successor (which in this context means that there must be a transition at the output of the gate before the transition can be disabled). AND gates and OR gates can be easily shown to violate this condition and in fact inverters and (Muller) C-elements are the only gates which do not. A purely DI circuit can be constructed only from these two gates.
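The difference between AND/OR gates and the C-element can be illustrated with a deliberately simplified reading of acknowledgement: an input transition counts as acknowledged if the gate output changes at some point before that same input switches again. This is an assumption made for the sketch and is weaker than Martin's formal statement; the function names and the fixed input ordering are likewise hypothetical.

```python
def unacknowledged(gate, order, cycles=2):
    """List input transitions that are overwritten before any output change.

    `gate(a, b, previous_output)` returns the new output; `order` is the sequence
    of input names toggled in each cycle. Simplified model, see the lead-in.
    """
    inputs = {"a": 0, "b": 0}
    out = 0
    pending = {}   # input name -> label of its not yet acknowledged transition
    missed = []
    for _ in range(cycles):
        for name in order:
            if name in pending:
                missed.append(pending[name])  # input switches again, never acknowledged
            inputs[name] ^= 1
            pending[name] = name + ("+" if inputs[name] else "-")
            new_out = gate(inputs["a"], inputs["b"], out)
            if new_out != out:
                out = new_out
                pending.clear()               # an output change acknowledges pending inputs
    return missed


def and_gate(a, b, prev):
    return a & b


def or_gate(a, b, prev):
    return a | b


def c_element(a, b, prev):
    return a if a == b else prev  # output follows the inputs only when they agree


order = ["a", "b", "a", "b"]  # a+, b+, a-, b- in every cycle
print(unacknowledged(and_gate, order))   # the falling edge on b goes unnoticed
print(unacknowledged(or_gate, order))    # the rising edge on b goes unnoticed
print(unacknowledged(c_element, order))  # every transition is followed by an output change
```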
If we allow the existence of isochronic forks then any signal that is used as an input to an AND or OR gate and might otherwise have a transition which is not acknowledged can be forked to another gate which will acknowledge that transition. Provided we accept acknowledgement by the second gate as acknowledgement of the input to the first gate (which is valid if the delays to both gates are the same) then the transitions at both gates are acknowledged and the circuit is Quasi-DI.
Significance
Martin's work is highly respected within the asynchronous research community. The method he has developed appears to be comprehensive and robust but most publications from the group have concerned the theoretical aspect of the design method and there have been few reported fabrications. Martin's approach has attracted some criticism for having a clumsy specification language, for being constrained to CMOS gate implementations and for use of the isochronic fork assumption [9]. This method, like most of the SI methods, suffers from hazard problems due to inverters at the inputs to complex gate structures [53].
Handshaking Circuits Outline
Van Berkel and others have based a synthesis method for DI circuits on the programming language Tangram, a language not unlike C in its syntax and range of control structures. Tangram circuit specifications are compiled into handshake circuits comprising elementary circuit components which are interconnected to allow communication and synchronization. The circuit components implement sequencing, repetition, parallelism and alternation. A gate level library of component implementations is used to translate the handshake circuits into hardware. Figure 9 shows the composition of handshake circuits which would be used to implement the execution specification S_0 ; (S_1 ‖ S_2) where, as with other methods, ‖ indicates concurrent execution and ; indicates sequential execution. The circles are handshake components which in many cases can be implemented directly in hardware using cell libraries. The S_i are executable statements which will in general also be decomposed into handshake components. The open and closed circles on the inputs and outputs of handshake components indicate different styles of communication handshaking.
Significance
The handshaking circuit approach has been used to design two asynchronous chips (44,000 and 111,000 transistors in 1 µm and 0.8 µm CMOS respectively) for a commercial low power consumer product [48]. There are few published results from which to draw any inferences but it appears that the approach incurs a heavy area penalty, although the power savings are very significant. The theory underlying this approach bears some similarities to Martin's although the implementation is in the form of standard cells rather than customized CMOS gates.
Synthesis of Timed Asynchronous Circuits
STG-based Methods Outline
Lavagno [31] and Myers [44] have both developed STG-based methods for synthesis of timed or bounded delay asynchronous circuits (i.e., circuits in which the delays through gates are known to lie within defined bounds, and which are guaranteed to function correctly provided the gate delays lie anywhere within those bounds). Although not speed independent, these methods were described in section 3.2.3 because they are derived from STG-based SI techniques.
Significance
Myers' method has the advantage over Lavagno's that it uses delay estimates in the specification of the circuit and this enables the elimination of redundant sequencing constraints early in the synthesis procedure, while Lavagno does not use the delay information until after extraction of the expressions for the state variables. The disadvantage of Myers' method is that it involves a number of heuristics and is therefore not provably correct.
The rationale driving the development of timed synthesis techniques is that by using the extra knowledge of realistic gate delay bounds more efficient and faster circuits can be implemented, as conservative transition sequences are no longer needed. Myers has demonstrated some speed improvement over SI circuits, while Beerel [6] has demonstrated that his SI method produced a range of circuits that are on average 25% faster and 5% smaller than Lavagno's. The reason for this is that Lavagno's method requires the addition of delays to eliminate hazards.
Event Controlled Systems Outline
A group at the University of Adelaide has taken an approach to the design of asynchronous circuits which is intended to provide a circuit designer with an intuitive mode of expression of circuit behaviour, the temporal specification (TS), and uses timed two phase control circuits for speed and efficiency.
Morton et al. contend that the current formal methods, including STGs, Communicating Hardware Processes, Trace Theory and Handshaking Circuits, provide a non-intuitive mode of expression for circuit designers. They require either a too abstract specification, in which the designer must explicitly specify parallel behaviour in the system, or a too low level specification, in which the designer must enumerate each signal transition. The more formal methods tend to produce large and inefficient circuits. The TS system allows the designer to specify sequences of significant events in the circuit, including data availability and completion signal generation. The syntax bears some similarity to guarded commands.
The design approach, called Event Controlled Systems (ECS), encourages the specification of the behaviour of each module, or Event Controlled Unit (ECU), in terms of TSs. For example the TS
@z > @a:@b
specifies that the event @z (i.e., a transition of the signal z) must follow events @a and @b. This is the Muller C element, or last gate as it is known in ECS terminology. Another example:
z > @a U @b
This specifies that the signal z will be set true after @a until after @b. Given a TS in the form of a sequence of statements involving standard operators such as last and until, ECUs are synthesized from libraries of standard gates which implement the operators and the ECU modules are composed into a system in a fashion very familiar to VLSI designers.
Significant features of the ECS methodology are the rigid separation of control and data variables in the circuit and the modelling of each in the temporal domain.
ECS has been used to design an asynchronous Fast Fourier Transform processor [41] and an asynchronous microprocessor is being implemented [42], although no results from fabrication have been reported yet.
The Temporal Specification is only a convenience and has no supporting theoretical framework. Some tools for automatic synthesis have been developed but the tool set is far from complete. The value of the ECS/TS approach appears to be its design philosophy. The proponents claim shorter design times and predict better speed/area ratios than achieved through more formal methods but these claims have yet to be substantiated through fabrication.
Summary
There is a range of asynchronous design methods which have been developed over the past decade. Two distinguishing characteristics of these methods are the strength of the underlying theory and the extent to which the effectiveness of the method has been proven through fabrication. A dichotomy is evident among asynchronous research groups between those principally concerned with highly formal techniques using abstract models of circuit behaviour and those concerned with practical implementation techniques. Few (perhaps only Martin's CHP group and the Handshaking Circuit group) cross the boundary, although there are signs of recognition of a need for a mixed approach.
It is clear that there is no universally accepted asynchronous design method, nor is there likely to be while the dichotomy persists. While the theoretical methods are confined to relatively simple problems and models of circuit behaviour, and while the more practical methods lack a firm theoretical basis for proving the correctness of the circuits, the prospects of a design methodology supported by robust CAD tools, including provision for testability design and analysis, are poor. There is ongoing work on the development of most of these design approaches, including some work on the analysis of testability properties.
Test Methods
Introduction
It is recognized that if asynchronous design methods are to be a realistic alternative to synchronous design then such methods must be able to produce VLSI circuits which are at least as readily testable as synchronous circuits.
Test methods for synchronous circuits have been studied for many years and there are some methods which provide adequate testability, most notably Scan Path and quiescent supply current (IDDQ) methods. However many of these methods rely upon the synchronous nature of the circuits. Asynchronous circuits with their distributed control logic and less structured use of storage elements are less suitable for partial scan techniques. IDDQ testing can only be used in asynchronous circuits if the circuit can be halted in a useful range of distinct states. For this reason several of the research groups that have been active in the development of asynchronous theory and design methods have also addressed the issue of the testability of asynchronous circuits, including analysis of their self-testing characteristics and built-in test techniques. An encouraging result emerging from this work is that asynchronous circuits possess certain characteristics that make them self testing for certain fault classes. Hulgaard [25] presents a review of recent work in this area. The most significant areas of research are discussed below.
Delay Fault Testing
While asynchronous circuits are ideally delay-insensitive and hence immune to variations in signal propagation delays, in practice this is not achievable and most of the current methods make some assumptions about delays in order to synthesize circuits which behave correctly. In order to test such circuits it may be necessary to verify signal propagation delays.
Devadas and Keutzer [17, 16] present an extensive set of results concerning the ability to test combinational logic for delays. A circuit is Path Delay Fault Testable (PDFT) if each path within the implementation of the logic function can be tested to measure the propagation delay of signals. For each path this requires the application of two test vectors. The first will propagate a known logic level to the output at the end of the path. The second will propagate a transition along the path resulting in a transition at the output. The delay is verified by measuring the time between application of the second vector and the output transition.
A circuit is Robust Path Delay Fault Testable (RPDFT) if the path delay tests cannot be invalidated by the presence of hazards. Devadas and Keutzer show that a combinational logic circuit is RPDFT if it is a prime irredundant implementation of a function. Unfortunately from the point of view of testing asynchronous circuits, redundancy is often added in order to eliminate hazards.
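The two-vector procedure can be sketched on a toy example. Everything concrete below is an assumption for illustration: the function f = (a AND b) OR c, the tested path from a through the AND and OR gates, the gate delay figures, and the use of their sum in place of a real timing measurement.

```python
def circuit(a, b, c):
    """Hypothetical example function f = (a AND b) OR c; the tested path is a -> AND -> OR -> f."""
    return (a & b) | c


def two_vector_test(v1, v2, gate_delays, max_delay):
    """Return (launched, passed) for a two-vector test of one path.

    v1 sets a known level at the output; v2 must then cause an output transition.
    `gate_delays` lists assumed delays of the gates on the path, and their sum
    stands in for the time measured between applying v2 and the output changing.
    """
    launched = circuit(*v1) != circuit(*v2)
    measured = sum(gate_delays)
    return launched, launched and measured <= max_delay


# Side inputs b=1 and c=0 are non-controlling, so the change on a reaches the output.
print(two_vector_test((0, 1, 0), (1, 1, 0), [2.0, 3.0], max_delay=6.0))
# With c=1 the OR gate masks the path and no transition is launched.
print(two_vector_test((0, 1, 1), (1, 1, 1), [2.0, 3.0], max_delay=6.0))
```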
There is an increasing amount of work on delay fault testability being reported in the literature.
Micropipelines
The approach taken to testing circuits based largely on micropipelines is significantly different to that used for less structured asynchronous circuits.
Khoche and Brunvand [28] describe a technique for testing micropipelines by modifying the C-elements in the control logic and the pipeline latches to accommodate a scan path approach. With scan in and scan out paths at each stage of the pipeline the data path logic between stages can be tested using well established combinational logic test procedures. The control logic is shown to be readily tested for SAFs.
In the case that a micropipeline employs the bundled data convention it is necessary to test for faults which can cause changes in the propagation delays of the data and control paths. This requires the application of PDFT techniques and in particular the application of two test vectors in immediate succession to the input of the circuit. In the context of scan path based testing this means that scan latches must be modified to hold two values with the ability to synchronously apply the second value to the circuit under test.
The AMULET group [47, 46] has analyzed the fault coverage in micropipelines through pseudo random pattern testing applied via scan paths. Their approach allows the response of combinational logic within the micropipeline to be observed as the test patterns are shifted through the scan path, significantly reducing the scan test time for a given fault coverage.
The testing of micropipeline data paths appears to be a solved problem, although there is no doubt scope for development of more efficient techniques. The techniques for testing micropipeline control logic have so far been ad hoc (as the design of micropipeline control logic tends to be) and there is scope for development of more systematic and efficient techniques.
Delay Insensitive Circuits
Strictly delay insensitive circuits are known to be self-checking for all stuck-at faults which don't cause the circuit to enter an invalid state. This is a direct consequence of Martin's Acknowledgement Theorem; in a DI circuit, unless a signal is redundant, a SAF will prevent a transition on this signal, preventing acknowledgement of the signal and hence any further computation. The fault will be observable at some output in the form of an expected transition at that output being delayed indefinitely.
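The halting behaviour can be seen in a minimal sketch of a two-wire handshake loop, in which a sender drives req to the complement of ack and a receiver copies req onto ack. The model, the function name and the way the fault is injected are all assumptions made for illustration; they do not correspond to any particular circuit in the cited work.

```python
def run_handshake(steps, ack_stuck_at=None):
    """Simulate a req/ack loop and return (transitions, halted).

    Sender rule: req becomes NOT ack. Receiver rule: ack becomes req.
    If `ack_stuck_at` is 0 or 1 the ack wire is held at that value (assumed fault model).
    """
    req = 0
    ack = 0 if ack_stuck_at is None else ack_stuck_at
    transitions = 0
    for _ in range(steps):
        new_req = 1 - ack
        if new_req != req:
            req = new_req
            transitions += 1
            continue
        new_ack = req if ack_stuck_at is None else ack_stuck_at
        if new_ack != ack:
            ack = new_ack
            transitions += 1
            continue
        return transitions, True   # no transition is enabled: the circuit has halted
    return transitions, False      # still switching when the step budget ran out


print(run_handshake(20))                  # fault-free loop keeps cycling
print(run_handshake(20, ack_stuck_at=0))  # one transition on req, then silence
print(run_handshake(20, ack_stuck_at=1))  # halts immediately
```

In the faulty runs the expected transition on req never arrives, which an observer can detect with a simple timeout.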
Given that control circuits tend to operate in relatively small cycles, a SAF in a control circuit that causes the circuit to halt is likely to be observed quickly, although not necessarily before erroneous control sequences have been produced. It is difficult to generalize about SAFs in the data path but, possibly with the assistance of scan path techniques, it is usually easy to partition the data path into blocks of combinational logic and to apply well established techniques. Dual rail encoded data paths will cause control circuit errors in the presence of SAFs.
Given that the only really practical class of DI circuits is that of circuits including isochronic forks, the simple result above does not hold in practice. Faults on "unacknowledged" branches of isochronic forks may not be detected. This considerably weakens the above result and leads to a requirement to consider the characteristics of Quasi-DI circuits in more detail.
Hazewindus [23] has shown that in the class of Quasi-DI circuits produced by Martin's synthesis method all stuck-at faults that inhibit transitions will cause the circuit to halt, based on the principle described above. Faults that cause premature firing of transitions are testable if the premature firing is stable and if it propagates to an observable output. Control and observability points can be added to ensure this.
Roncken [48] has included high level statements in the Tangram language to automatically include scan paths. In [49] she describes modifications to some of the handshaking circuit components to enhance control circuit testability. The handshaking components responsible for creating parallel and repetitive behaviour in the circuit are modified so that in test mode repetitive procedures are executed once while synchronized parallel procedures are unsynchronized. These changes are necessary to preclude the possibility of deadlocks and livelocks (infinite loops) in faulty circuits.
Speed Independent Circuits
Design Method Specific Results
Several researchers have adopted the approach to testing asynchronous circuits that, provided the circuits can be subdivided into combinational logic blocks separated by scannable storage elements, it is sufficient to be able to test the combinational logic for stuck-at and (possibly) path delay faults. While testability properties of combinational logic are well understood from work conducted under the synchronous design paradigm, with asynchronous circuits we need combinational logic which is both fully testable and hazard-free.
The STG-based synthesis method of Lavagno [30] requires a hazard-free implementation of two level logic functions and may include fixed delay elements. To test such circuits it is necessary to ensure that the delay paths within the hazard-free combinational logic can be verified. In [27] Keutzer, Lavagno and Sangiovanni-Vincentelli show that transformations can be performed on two level hazard-free logic implementations to produce a multi-level implementation which is both hazard-free and Robust Path Delay Fault Testable. With the inclusion of suitable scan paths the combinational logic can therefore be delay fault tested and the operation of the asynchronous circuit verified.
In [32] Lavagno et al. show that by phase splitting (allowing literals and their complements to be independently controlled at the input to combinational logic functions) any combinational logic can be converted into a positive unate, hence prime irredundant, hence RPDFT circuit without requiring any extra logic. This approach does necessitate extra complexity in the scan path logic. Nowick et al. [45] describe a method for synthesizing hazard-free multi-level asynchronous circuits from two-level circuits which are fully SAF testable. The method sometimes requires the addition of extra inputs because it is not always possible to eliminate redundancy.
These results imply that combinational logic blocks for realizing state variables in asynchronous circuits can be implemented in such a way as to be hazard-free and delay path testable. If all storage elements in an asynchronous circuit are designed for scan path testing then the circuit can be fully tested for delay and stuck-at faults. While this is an important result, not all asynchronous circuits fit this model without trivializing the combinational logic connecting storage elements. In other words the interconnection of storage elements may be so unstructured as to make the testing of interconnecting combinational logic through scan paths impractical.
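The effect of phase splitting on unateness can be illustrated on a two-input exclusive-OR. The sketch below only checks the positive unate property by exhaustive enumeration; it does not implement the synthesis procedure of the cited work, and the names `xor_split`, `a_n` and `b_n` are hypothetical.

```python
from itertools import product


def is_positive_unate(func, n):
    """True if raising any single input from 0 to 1 never lowers the output."""
    for bits in product((0, 1), repeat=n):
        for i in range(n):
            if bits[i] == 0:
                raised = bits[:i] + (1,) + bits[i + 1:]
                if func(*bits) > func(*raised):
                    return False
    return True


def xor(a, b):
    """Ordinary XOR: each input appears in both true and complemented form."""
    return (a & (1 - b)) | ((1 - a) & b)


def xor_split(a, a_n, b, b_n):
    """The same expression with the complement literals a_n, b_n treated as separate inputs."""
    return (a & b_n) | (a_n & b)


print(is_positive_unate(xor, 2))        # binate in both inputs
print(is_positive_unate(xor_split, 4))  # only uncomplemented literals remain
```

The price of the transformation is that the test logic must be able to drive a literal and its complement independently, which is where the extra scan path complexity arises.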
General Properties of SI Circuits
In a series of papers [5, 8, 4] Beerel and Meng analyze the properties of semi-modular and speed independent circuits and produce a number of results about the testability of circuits with assumed structures:
1. For circuits composed of combinational logic blocks that contain a Muller C element in each feedback loop [5]:
(a) live semi-modular circuits must be irredundant, speed independent and hazard-free and vice versa;
(b) such circuits are self-diagnostic with respect to all single and multiple SAFs. This is a consequence of the fact that these circuits halt in the presence of a SAF.
2. For strongly connected circuits composed of only AND, OR and C gates [4]:
(a) live deterministic speed independent circuits are semi-modular and hence, by application of the results of Varshavsky [56], such circuits are totally self checking with respect to single and multiple gate output SAFs which are non-exitory (those which do not cause the circuit to leave the set of valid states);
(b) such circuits are totally self checking with respect to single and multiple gate output exitory SAFs provided that no signal has more than one transition in any cycle of states.
3. Under a limited range of conditions (expressed in terms of the effect of faults on state sequences) timed circuits (that is, circuits in which gate delays are known within upper and lower bounds) are single SAF testable [8].
The three papers concerned have a similar theoretical basis, with [4] having the most refined formalism (and best defined results). In each case the authors report a tool which automatically checks for testability by establishing whether circuits are semi-modular or not. The tool, which is intended to provide a feedback loop in the design process (see section 3.2.3), has been used to check circuits of around 1000 states and is reported to give results consistent with Dill's verifier.
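A semi-modularity check of the kind such a tool performs can be sketched on an explicit state graph, using the informal definition given earlier: once a transition is enabled, only its own firing may disable it. The graph encoding and function name below are assumptions for illustration and say nothing about how Beerel and Meng's tool is implemented; the sketch also treats every transition alike, whereas a real check would normally exempt choices made by the environment on input signals.

```python
def is_semi_modular(graph):
    """Check that firing one enabled transition never disables another.

    `graph` maps each state to a dict {transition_label: next_state}.
    """
    for state, moves in graph.items():
        for fired, next_state in moves.items():
            still_enabled = graph.get(next_state, {})
            for other in moves:
                if other != fired and other not in still_enabled:
                    return False  # `other` was disabled by firing `fired`
    return True


# Two concurrent transitions: either order leads to the same final state.
concurrent = {
    "s0": {"a+": "s1", "b+": "s2"},
    "s1": {"b+": "s3"},
    "s2": {"a+": "s3"},
    "s3": {},
}

# A conflict: firing a+ removes b+ and vice versa.
conflict = {
    "s0": {"a+": "s1", "b+": "s2"},
    "s1": {},
    "s2": {},
}

print(is_semi_modular(concurrent))
print(is_semi_modular(conflict))
```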
Beerel subsequently showed by counter-example [7] that results 1(b) and 2(b) are incorrect. In fact it can be stated with certainty only that semi-modular circuits (subject to Beerel's assumption about the construction of the circuit) are totally self checking with respect to non-exitory faults. He speculates that if the behaviour and structure of the circuit satisfy a set of three conditions (together called the Strong Cycle Condition) then it will be self checking with respect to exitory faults.
Beerel's results are consistent with those of Hazewindus and are important in that they establish the tendency for SI/semi-modular/QDI circuits to be self checking with respect to SAFs. They are limited in scope however. Beerel's assumptions about the construction of the circuit are reasonable for many of the design methods described earlier but are certainly not universally valid. The greater limitation of these results is that they only apply to non-exitory faults.
Al-Assadi et al. [2] have shown that in the presence of non-stuck-at faults semi-modular circuits may cycle within a set of valid states. This is a straightforward extension of their work on the characterization of faults other than SAFs [3].
Proposed Research on Asynchronous Circuit Testability
5.1 The State of Asynchronous Research
Research on asynchronous circuits is still very active. There is a handful of large and experienced research groups that are looking at various aspects of asynchronous design and several of these have considered testing characteristics and strategies to some extent. The design of asynchronous circuits is a complex process involving many tradeoffs and while some methods have been applied successfully (Martin's asynchronous processor, the AMULET and Handshaking Circuit projects) there is still a lot of basic research on design methodologies, including new design methods and optimization of existing methods. There is an increasing amount of attention being paid to tradeoffs with respect to speed, area, power, complexity and degree of formalism. Timed (bounded delay) methods are receiving some attention because of their ability to produce faster and more compact circuits but the synthesis algorithms are typically either very complex or heuristic.
There is some diversity in the circuit models being used as the basis for the formal methods. While most papers claim to address SI or DI circuits it is common to find subtle variations on the models used. There appears to be a widespread (but not explicitly stated) assumption that Speed Independent and Quasi-Delay Insensitive circuits are not significantly different. There has been a little work on the physical relevance of these models [9] but there is clearly a need for more work in this area and it is an important issue for testability analysis. There does not appear to have been a serious study into delay models and their relationship to what occurs in practical circuits.
The issues of appropriate design specification languages and the computational complexity of existing algorithms must still be addressed. An often expressed complaint in the literature is that CAD tools for asynchronous systems are very under-developed compared to synchronous tools. There is clearly a requirement for efficient design verification, test generation and design for testability (DFT) tools.
The State of Research on Testing Asynchronous Systems
The testing of complex circuits is a di cult problem and has been recognized as such for many years. There are no universally satisfactory methods for testing synchronous circuits and it would be foolish to imagine that any such methods will emerge for asynchronous circuits of comparable complexity. Nevertheless if asynchronous design is to be a viable alternative to synchronous design it is necessary that test techniques be developed that are at least as successful as current techniques for synchronous circuits. In this context success might be interpreted in terms of the following parameters: level of coverage of physical faults; complexity of test set generation process; test application times; area and delay penalties. Little of the work to date has addressed these criteria.
There have been limited (although significant) results on the testability properties of asynchronous circuits as well as reports of successful practical applications of BIST techniques. The latter depend heavily on scan techniques.
Test techniques for micropipeline based designs appear to be well understood although there is scope for including a more systematic approach to the testing of micropipeline control logic. This will become more important if, as might be anticipated, there is a trend to more complex control and sequencing of micropipelines.
The results regarding the self-checking properties of asynchronous circuits are encouraging, but quite limited in scope. The relationship of this property to circuit topology is not well understood and there is scope for further investigation to determine whether SI circuits are self-checking with respect to some classes of non-exitory or non-inhibiting faults.
A limitation of the existing work on SI circuits is that the size of the SI circuits which have been synthesized is relatively small. A practical VLSI asynchronous circuit is certainly going to be composed of a number of smaller SI modules. There appears to have been no work on the testability of composed SI circuits and the techniques which can be applied at a system-wide level, other than the partitioning of circuits using scan paths.
Proposed Research Project
Given the requirements outlined above and the nature of work in this area to date, two strategies can be identified which would advance the state of the art:
1. devise a comprehensive test approach for asynchronous circuits that have been designed using one of the synthesis methods discussed earlier;
2. investigate general strategies for testing a broad class of asynchronous circuits, making use of the known properties that such circuits have in common.
There are currently too many design methods for one to be singled out for further testability investigations with any confidence that the method will endure. For this reason, and because the research groups that have developed the design methods are best placed to research design for testability for those methods, the first line of investigation is not preferred.
When considering the general class of asynchronous circuits, the results that are available so far are restricted to small asynchronous (more specifically semi-modular) circuits and the stuck-at fault model. In seeking to apply these results in the development of a practical test strategy for realistic circuits several problems that have not yet been addressed are evident:
[56] extend this theorem to other forms of interconnection at a single point. None of these results are sufficiently general to be applicable to the multiple interconnections using various protocols that might be required when composing a single circuit. One approach to deriving a more useful result is to devise a set of general conditions under which arbitrary interconnections of semi-modular circuits can be guaranteed to be semi-modular.
2. Can the results of Beerel be extended to cover some subset of exitory faults and if not, can circuits be designed to minimize the probability of such faults? It would appear that the restriction to non-exitory faults is too tight to make the self-checking property very useful in practice, in the absence of any information about how exitory faults can arise. A better understanding of the relationship between the occurrence of exitory faults (a classification based on circuit state) and circuit structure is required.
3. Is the stuck-at fault model appropriate for asynchronous circuits?
The SAF model is one which has been successfully used for testing combinational and synchronous sequential logic for many years. This is in spite of the fact that most physical faults do not manifest themselves as SAFs [3]. One reason for the successful application of the model is that test sets developed from the SAF models tend to be large and therefore will detect a large number of non-SAFs as well. In asynchronous circuits, however, a non-SAF fault may change the topology and semi-modularity of the circuit and invalidate any test procedures (especially self-checking) that depend on that property. Lu have demonstrated the inapplicability to non-SAFs. Nevertheless it is possible that there is a wider range of faults for which asynchronous circuits exhibit useful self-checking properties.
4. If a circuit is not totally self-checking with respect to some meaningful fault model then it will be necessary to use at least some external test techniques such as are used for synchronous circuits, including external application of test vectors, scan-path and IDDQ test approaches [1]. The application of chip- and system-level test techniques, such as IDDQ and, to a lesser extent, scan path, requires that it be possible to halt the circuit in an arbitrary state. This is relatively straightforward in synchronous circuits but much less so in an asynchronous circuit, in which the circuit moves autonomously from one state to another. Without the addition of extra hardware it may be possible to halt the circuit in only a very small selection of its range of possible states. While the use of scan-path techniques for testing asynchronous data paths and limited control paths has been reported, the application of IDDQ to asynchronous circuits appears to be completely open for investigation.
I propose to address each of these four issues in detail. The proposed methodology is discussed briefly below.
Composition of Semi-modular Circuits
This is expected to be a relatively straightforward application of one of the existing models of semi-modular circuits to determine whether, and under what conditions, general interconnections of semi-modular circuits are semi-modular.
It appears to be assumed that in practice a circuit designer will interconnect semi-modular circuits so as to form another semi-modular circuit. This study will produce a formal statement about conditions that must be satisfied in such cases and a set of guidelines for the interconnection of semi-modular circuits. An algorithm for verifying the semi-modularity of such interconnected circuits is a likely result. This could be coded to produce a tool to be used for verification in a hierarchical design process.
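As an illustration of what such a verification tool might do, the following is a minimal sketch and not the algorithm proposed here. It assumes a Muller-style definition (a circuit is semi-modular if, in every reachable state, a signal that is excited stays excited until it fires) and a closed circuit in which the environment is modelled as ordinary gates. All names (`check_semi_modular`, `C_RING`, `AND_RING`) are hypothetical and chosen only for the example.

```python
def c_element(x, y, prev):
    """Muller C-element: output follows the inputs when they agree, else holds."""
    return x if x == y else prev

def excited(circuit, state):
    """Signals whose gate function disagrees with their current value."""
    return {sig for sig, fn in circuit.items() if fn(state) != state[sig]}

def fire(circuit, state, sig):
    """Return the successor state in which only `sig` has switched."""
    nxt = dict(state)
    nxt[sig] = circuit[sig](state)
    return nxt

def check_semi_modular(circuit, initial):
    """Explore all reachable states (assumed definition: no excited signal
    may be disabled by the firing of a different signal).
    Returns (True, None) or (False, (state, fired, disabled))."""
    key = lambda st: tuple(sorted(st.items()))
    seen = {key(initial)}
    stack = [dict(initial)]
    while stack:
        state = stack.pop()
        exc = excited(circuit, state)
        for a in exc:
            nxt = fire(circuit, state, a)
            still = excited(circuit, nxt)
            for b in exc - {a}:
                if b not in still:
                    return False, (state, a, b)
            if key(nxt) not in seen:
                seen.add(key(nxt))
                stack.append(nxt)
    return True, None

# Hypothetical closed circuit: a C-element driving two inverters that feed it.
C_RING = {
    "a": lambda s: 1 - s["c"],
    "b": lambda s: 1 - s["c"],
    "c": lambda s: c_element(s["a"], s["b"], s["c"]),
}
# Same topology with the C-element replaced by an AND gate.
AND_RING = {
    "a": lambda s: 1 - s["c"],
    "b": lambda s: 1 - s["c"],
    "c": lambda s: s["a"] & s["b"],
}
INITIAL = {"a": 1, "b": 1, "c": 0}

if __name__ == "__main__":
    print(check_semi_modular(C_RING, INITIAL)[0])
    print(check_semi_modular(AND_RING, INITIAL)[0])
```

The explicit state enumeration used here grows exponentially with the number of signals, so it is only practical for small modules; a hierarchical tool of the kind envisaged above would presumably rely on conditions checked at module interfaces rather than on the flattened state space.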
Semi-modular Circuits and Non-exitory Faults
In his early papers on the topic [5, 4] Beerel argued that, subject to certain circuit construction assumptions, the self-checking property of asynchronous circuits would extend to exitory faults. This argument was subsequently refuted by counter-example but in his thesis Beerel suggests (but does not elaborate upon) a set of criteria related to circuit topology (the Strong Cycle Condition) such that circuits satisfying these criteria are totally self-checking.
It is proposed to develop a model of circuit topology that may be used in conjunction with circuit state-based models (such as those of Muller [43] or Brzozowski and Seger [11]) to facilitate reasoning about the occurrence and location of faults and their effect on circuit state (it is in this process that Beerel made mistakes, arguably through the lack of a suitably descriptive formal model). The combined models will be used to investigate whether, and under what conditions, semi-modular circuits are totally self-checking for exitory faults.
This investigation would be purely theoretical in its early stages, although verification of results using computer simulation is likely to follow. There is a range of tools in the public domain for analyzing and simulating asynchronous circuits, so all resources required for this investigation are at hand.
There is a risk that this investigation will prove to be intractable or that any results obtained will be applicable to only a small class of circuits. If the investigation is going to be intractable it should become evident early on. On the other hand the development of suitable models for reasoning about fault effects will be significant, whether or not the self-checking property of semi-modular circuits proves to be significantly extensible.
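The self-checking behaviour discussed in this section can be illustrated with a small simulation. The sketch below is illustrative only: it assumes that "self-checking" means the faulty circuit halts (no signal remains excited), uses a hypothetical three-signal closed circuit, and models a stuck-at fault simply by pinning one signal to a constant. The function and variable names are assumptions made for the example.

```python
def c_element(x, y, prev):
    """Muller C-element: output follows the inputs when they agree, else holds."""
    return x if x == y else prev

# Hypothetical closed circuit: a C-element driving two inverters that feed it.
CIRCUIT = {
    "a": lambda s: 1 - s["c"],
    "b": lambda s: 1 - s["c"],
    "c": lambda s: c_element(s["a"], s["b"], s["c"]),
}
INITIAL = {"a": 1, "b": 1, "c": 0}

def transitions_until_halt(circuit, initial, stuck=None, max_steps=50):
    """Fire one excited signal at a time (alphabetical choice, for repeatability).
    `stuck` maps a signal name to the constant value it is stuck at.
    Returns the number of transitions made before the circuit halts,
    or None if it is still running after `max_steps` transitions."""
    stuck = stuck or {}
    state = dict(initial)
    state.update(stuck)  # a stuck signal holds its faulty value from the start
    for step in range(max_steps):
        exc = sorted(
            sig for sig, fn in circuit.items()
            if sig not in stuck and fn(state) != state[sig]
        )
        if not exc:
            return step  # no signal is excited: the circuit has halted
        state[exc[0]] = circuit[exc[0]](state)
    return None

if __name__ == "__main__":
    print("fault-free:", transitions_until_halt(CIRCUIT, INITIAL))
    print("a stuck-at-1:", transitions_until_halt(CIRCUIT, INITIAL, {"a": 1}))
    print("a stuck-at-0:", transitions_until_halt(CIRCUIT, INITIAL, {"a": 0}))
```

In this example the fault-free circuit oscillates indefinitely, while either stuck-at fault on `a` leaves the C-element waiting for a transition that never arrives, so the circuit stops. A single fixed firing order is used here; a fuller study would need to consider every possible interleaving of transitions.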
Semi-modular Circuits and Non-stuck-at Faults
Failure modes in CMOS circuits are well documented and their effects on the types of gates used in asynchronous circuits have been considered to some extent. The difficulty here is that bridging faults in particular can change circuit topology.
The models described in section 5.3.2 will be suitable for determining the effect on any self-checking properties of any faults that alter the topology of the circuit. It is expected that it will be possible to identify a subset of non-SAFs with respect to which semi-modular circuits are self-checking. A second possibility is that guidelines for designing for testability against such faults can be developed.
This will also be a theoretical investigation supported by simulations.
There is a higher risk that this study will return a negative result, but the application to this problem of models such as those described above would be a significant new approach. On the other hand a strong positive result would be very significant given the widespread use of and dissatisfaction with the SAF model.
Applicability of System-wide Test Techniques
It is proposed to investigate whether it is possible to apply a design for testability approach that allows more control over state transitions without unduly affecting the speed of or area consumed by the circuit.
A state-based model of circuit behaviour will be used to devise a technique for halting circuits in a much wider range of states than allowed by the control of a limited number of inputs. This will involve the addition of extra hardware including at least one input.
This will be a circuit design and simulation exercise designed to:
1. classify the IDDQ characteristics of typical gates;
2. add extra inputs and control logic to allow the circuit to be simulated so as to trigger significant IDDQ behaviour;
3. determine test sets required to exercise such circuits.
This will be a less theoretical study than the previous three. A result in the form of quantification of the hardware penalty in using IDDQ for asynchronous circuits will be forthcoming. Design for testability guidelines will be inherent in the result.
Timetable
It is intended to conduct the four studies described above according to the following approximate timetable. 
Activity
