This paper introduces Lazy Transition Systems (LzTSs). The notion of laziness explicitly distinguishes between the enabling and the firing of an event in a transition system.
Introduction
In recent years, there has been significant progress in developing methods and tools for asynchronous circuit synthesis [19, 15, 17, 23, 5]. The two chief directions in this work have followed the two traditionally competing synthesis approaches, one based on Huffman's state machine model [20], the other deriving from Muller's concept of a speed-independent circuit [13]. The former, also known as fundamental mode circuit design, makes strong assumptions about the delay of the environment compared to that of the circuit. It requires that the environment be slow enough in applying the new input values so as to allow the circuit to stabilize after responding to the previous input. The most well-known method associated with this approach is the one called Burst-Mode (BM) circuit design, developed in [17, 24]. The second approach, on the contrary, makes no assumptions about the delays in the environment, permitting it to switch some of the inputs in response to changes in some of the circuit's outputs, without waiting for their complete stabilization. This way of action is often called input-output (IO) mode. To define the behavior of the circuit and the environment interacting in the IO mode, one normally uses an event-based description rather than a state-oriented one like in the BM approach. The recently
•~IS \vorkhm been fundedby~P~AGD-WGNr.
214949,CIC~~C95-0419, EPSRC gmts GW24038 md G~70175, Sptin-UK Accion= Integmdm Progmmme 1998/99,rordhlU~(projwt "WI kctitmmra").
ICCAD98, San Jose, CA, USA. © 1998 ACM.
developed design methods and software based on Signal Transition Graphs (STGs) [9, 5] exemplify this approach, and produce speed-independent circuits, whose behavior is invariant to delays in gates but may be sensitive to wire delays.
Although the second approach looks more flexible on the surface than the BM one, and promises higher performance and modularity, in reality it does not always turn out as efficient as expected, in terms of both speed and area. This is especially true with the advent of deep-submicron technologies, which radically change the ratio between gate and wire delays. While being conservative with respect to the former, it is overly optimistic with respect to the latter. Even though the notion of (extended) isochronic forks [10, 21] can help in guiding the technology mapping of speed-independent circuits towards safer solutions, it does not resolve the fundamental problem of time dependence for wires. On the other hand, in order to guarantee correct action regardless of the delays of both circuit gates and the environment, the synthesis process often caters for potential concurrency which will not exist in reality. This often results in excessively redundant implementations, which lose to their possible BM counterparts both in speed and area.
In order to battle the problems characteristic of both of the above mentioned "extremes", a method called timed circuits has been developed in [16, 15]. The main idea of this method is to retain the flexibility of an event-based IO approach but make the circuit's implementation more realistic and therefore more efficient. The awareness of time, required for asynchronous controllers to be on a par with synchronous ones in speed [6], is achieved by associating explicit timing information with the actions performed by the environment and by the circuit, and utilizing it throughout the design procedure to optimize the final logic. The major effect of timing assumptions applied to circuit design is the following. Such constraints can reduce the state space effectively reachable by the circuit. Hence, firstly, they can 'eliminate' some undesirable states, e.g. where input events might disable some output signal transitions, or 'resolve' state coding problems, e.g. the presence of semantically different states with identical codes. Secondly, they can help optimize logic by exploiting the additional "don't care" space. Thirdly, the timing information may assist in allowing some relatively slow actions to be started earlier than they would normally be allowed in the speed-independent case; their actual firing with respect to other events will remain unchanged. Finally, such timing information can be made global enough to cover the BM designs; indeed, it can be shown (cf. Section 4) that a BM design is just a special case of a timed circuit with simultaneity constraints, wherein all outputs are assumed to change at once before any new input transition arrives. While the work of Myers et al. [16, 15] appears to exploit the first two of the above-mentioned factors, it has not been able to provide adequate formal support for the latter two issues.
In this paper, we target all these potential gains by developing a behavioral model (Section 4) of a timed circuit, called Lazy Transition System. The two crucial novel elements of this model are:
• the concept of relative timing constraints, where the exact timing information is abstracted away. Instead, we use only difference (event a fires earlier than event b) or simultaneity (events a and b fire at the same time with respect to event c) assumptions. Such conditions can either be provided by the designer (which is the case at present) or produced by a hypothetical timing analysis tool (see footnote 1).
• the notion of laziness that explicitly distinguishes between the enabling and the firing of an event in a transition system. This allows us not only to exploit delays in reducing concurrency to simplify designs on the basis of a priori timing conditions, but also to increase concurrency using the (backward) expansion of the set of enabling states. In the latter case, we also expect the designer to be able to trade off between speed and area increase.
The paper presents necessary conditions (Section 4) to synthesize circuits with a correct behavior under given timing assumptions and develops an algorithm (Section 5), implemented within the synthesis tool petrify.
The preliminary experiments (Section 6) show significant area and performance improvements due to exploiting the extra "don't care" space implicitly provided by the laziness of the events.
Basic notions
In this section we present basic definitions that will be used in the paper. For brevity, we assume the reader to be familiar with Petri nets, a formalism used to specify concurrent systems. We refer to [14] for a general tutorial on Petri nets. An STG has an associated SG in which each reachable marking corresponds to a state and each transition between a pair of markings to an arc labeled with the same event of the transition.
State
Although STGs with bounded reachability space and SGs have the same descriptive power, STGs can usually express the same behavior more succinctly. In this paper, STGs help to illustrate timing assumptions in a more intuitive way.
Properties for implementability
Further to consistency, the following two properties are necessary for an SG to be implemented by a speed-independent circuit [8].
The first property is speed-independence. In Figure 1.b, ER(x-) = {101, 111} and QR(x-) = {001, 011, 010}. The symbol 0* (1*) indicates that a rising (falling) transition of the corresponding signal is enabled in that state. The implementation of an SG as a logic circuit is done through the definition of the next-state function for each output signal and binary vector. It is defined as follows:
The next-state function f_a is correctly defined when the SG has the CSC property, i.e. when there is no pair of states (s, s') such that v(s) = v(s') and s ∈ ER(a+) ∪ QR(a+) and s' ∈ ER(a-) ∪ QR(a-). Note that f_a is an incompletely defined function with a don't care (DC) set corresponding to those binary vectors without any associated state in the SG.
In the SG of Figure
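The CSC condition and the three-valued next-state function can be sketched in a few lines of Python. This is a minimal illustration, not the tool's implementation: the function names are hypothetical, states are identified by their binary codes, and the sets ER(x+) and QR(x+) used in the example are assumed for illustration (only ER(x-) and QR(x-) are quoted from the text).

```python
def next_state_sets(er_plus, qr_plus, er_minus, qr_minus, code=lambda s: s):
    """Return (on_set, off_set) of binary vectors for one signal.

    Raises ValueError when two states with the same code require
    different next values, i.e. when the CSC property is violated.
    """
    on_set = {code(s) for s in er_plus | qr_plus}
    off_set = {code(s) for s in er_minus | qr_minus}
    conflict = on_set & off_set
    if conflict:
        raise ValueError(f"CSC conflict on codes {sorted(conflict)}")
    return on_set, off_set


def next_state(on_set, off_set, vector):
    """Evaluate the function: 1, 0, or None for a don't care vector."""
    if vector in on_set:
        return 1
    if vector in off_set:
        return 0
    return None


# ER(x-) and QR(x-) as quoted for Figure 1.b; ER(x+), QR(x+) are assumed.
on_x, off_x = next_state_sets(
    er_plus={"000"}, qr_plus={"100", "110"},
    er_minus={"101", "111"}, qr_minus={"001", "011", "010"},
)
```

Any vector that appears in neither set evaluates to `None`, which is how the DC set enters logic minimization.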
Logic synthesis
From the next-state functions, a speed-independent circuit can be derived by implementing the boolean equation of each output signal as an atomic complex gate [13], as shown in Figure 1.d.
In general, the boolean equations may be too complex to be implemented as an atomic gate in a specific technology. Methods for logic decomposition and technology mapping that overcome this limitation have been proposed recently (e.g. [2, 4]). In this paper we do not address the problem of technology mapping. However, the optimization methods we propose can be easily combined with existing methods for logic decomposition that can be targeted to technology mapping into given gate libraries.
1 We believe that the level of research in this area, although being quite significant recently [3, 7, 18], is still insufficient to warrant practical use, especially in providing adequate relative timing for realistic circuit designs.
Monotonic covers
The following definition is related to hazards in the behavior of asynchronous circuits. It will be used later in the paper. Given two sets of states S1 and S2 of an SG such that S2 ⊆ S1, and a transition s → s', we will say that S1 is a monotonic cover of S2 iff
In the SG of Figure 1.b, the set {101, 110, 111} is a monotonic cover of ER(x-).
However, the set {100, 101, 111} is not, since the transition 100 → 110 violates the conditions for monotonicity.
Motivating example
This section gives an intuitive picture of the optimization based on timing assumptions. It is illustrated by an implementation of the xyz specification shown in Figure 1.a. A starting point for optimizations is given by a speed-independent implementation of the xyz STG (see Figure 1.d).
Speed-independence gives a rather conservative view on gate delays: they are finite but arbitrary. However, when the gates of a circuit are adjacent on a chip (which is most likely for the modular style of implementation), one can expect their delays to be related. This relationship might be expressed by matching the time for a signal propagation through different stages of logic. For example, one can assume that a signal propagates through a single gate faster than through k gates, where k is a technology and/or implementation dependent parameter (see footnote 2).
Let us assume that in a circuit for the xyz example two gate delays are always greater than a delay of a single gate. Under this assumption, even though the transitions y+ and x- are potentially concurrent in the STG, in an implementation y+ would always occur before x-. This timing assumption can be expressed in the STG by a special "timing arc" going from y+ to x- [22] (denoted in Figure 2.a by a dashed line). Timing restricts the possible behaviors of the implementation; in particular, state 001 becomes unreachable because it can be entered only when x- fires earlier than y+. At unreachable states the logic functions of output signals can be defined arbitrarily. Therefore the use of timing assumptions increases the "don't care" space for circuit gates, which gives extra room for optimization.
For the xyz example, putting 001 in the don't care set of z simplifies its function from z = x + ȳz to a buffer z = x (see Figure 2.c,d).
To get more aggressive optimization let us consider the concurrent transitions z+ and y+ more closely. These transitions are triggered by the same event x+ and, due to the timing assumption 2 * gate_min > gate_max, no gate can fire until both outputs y and z are set to 1. Therefore for all other signals of the circuit the difference in firing times of y+ and z+ is negligible. The latter means that for the rest of the circuit transitions y+ and z+ are simultaneous and indistinguishable, and they can replace each other in causal relations with the other events.
2 The latter can be formalized in terms of a delay range for gates. If the delay range is [gate_min, gate_max] then the assumption can be posed as k * gate_min > gate_max.
In the xyz example x- is the only transition that can "hear" z+ or y+. The dashed hyper-arc from z+, y+ to x- in Figure 3.a graphically represents the simultaneity of y+ and z+ with respect to x-. Formally it means that for an enabling of x- we can choose any of the following conditions: 1) z+, 2) y+, 3) z+ ∨ y+. This gives a set of states in which x- can be potentially enabled, i.e. the so-called potentially enabling region of x- (PEnR(x-)), which is shadowed in Figure 3.b.
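The effect of the extra don't care can be checked by brute force. The sketch below assumes that the speed-independent function for z reads z = x + ȳz (bit order xyz) and compares it with the buffer z = x on all eight binary vectors; the helper names are illustrative only.

```python
from itertools import product


def z_speed_independent(x, y, z):
    # Assumed speed-independent next-state function: z = x + (not y) * z
    return x or ((not y) and z)


def z_timed(x, y, z):
    # Timed implementation: a plain buffer, z = x
    return x


# Collect the codes (bit order x, y, z) on which the two functions disagree.
differing = [
    "".join(str(int(bit)) for bit in bits)
    for bits in product([False, True], repeat=3)
    if bool(z_speed_independent(*bits)) != bool(z_timed(*bits))
]
```

Under this assumption the two functions disagree on a single vector, 001, so the buffer is a valid implementation exactly when 001 can be treated as a don't care.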
It is important to note that:
1) Even though x- might be enabled in any state of PEnR(x-), its firing (due to timing assumptions) can occur only when reaching state 111. This behavior will be called a lazy one, because after its enabling a signal is not eager to fire immediately but waits until certain states are entered.
2) A potentially enabling region gives an upper bound for the set of states in which a signal might be enabled. For a "real" enabling in an implementation we can choose a subset of the potentially enabling region. Playing with different sets of "real" enablings within a PEnR gives new opportunities for the optimization of circuits. Manipulations of signal enablings can be formalized by [...] and therefore x- should be enabled in 111. Enabling of x- in the other two states 101 and 110 can be chosen arbitrarily, i.e. these states can be put in the don't care set of the function for x (see Figure 3.c). During minimization the function for x (which becomes simply an inversion) is defined to be 0 in state 110 and 1 in 101, i.e. minimization puts 110 into the set of enabled states of x-, while 101 is put into the set of states in which x is stable. Back-annotating this result to the level of event interaction gives the STG in Figure 3.e, in which x- is triggered by y+ instead of the causal relation z+ → x- in the original STG. This change of causal dependencies is valid under the assumption that y+ and z+ are simultaneous with respect to x-.
The timed circuit in Figure 3.d is much simpler than the speed-independent one in Figure 1.d. Nevertheless, if the timing assumption "delay of y+ is less than the sum of delays of z+ and x-" is satisfied, then the optimized circuit is a correct implementation of the original specification.
We can now identify two potential sources of gain in optimization based on timing assumptions:
1) Unreachability of some states due to timing (timed unreachable states).
2) Simultaneity of transitions, which gives freedom in choosing enabling regions for signals (lazy behavior).
In both cases the don't care space for the functions of circuit signals increases, which finally leads to simpler implementations.
The idea to use don't cares coming from the timed unreachable states is due to [16, 15] and was successfully exploited in the ATACS tool for the design of timed circuits. To our knowledge the observation about the additional don't cares coming from lazy behavior appears for the first time and is the main theoretical contribution of the paper. This concept is developed in more detail in the next section.
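The freedom described in the two points above can be made concrete by enumerating every admissible "real" enabling region, i.e. every set that contains the firing region and is contained in the PEnR. The following is a small sketch with a hypothetical helper name; the region values for x- are the ones quoted in the text (firing only in 111, potential enabling in 101, 110 and 111).

```python
from itertools import combinations


def candidate_enabling_regions(firing_region, potential_region):
    """Yield every EnR with FR <= EnR <= PEnR, smallest first."""
    fr = frozenset(firing_region)
    penr = frozenset(potential_region)
    if not fr <= penr:
        raise ValueError("the firing region must lie inside the PEnR")
    optional = sorted(penr - fr)  # states whose enabling is a free choice
    for size in range(len(optional) + 1):
        for extra in combinations(optional, size):
            yield fr | frozenset(extra)


regions = list(candidate_enabling_regions({"111"}, {"101", "110", "111"}))
```

For x- this gives four candidates. The choice made by minimization in the text, enabling x- in 110 and 111, is one of them.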
Lazy systems
This section introduces the basics for defining lazy systems, which were informally introduced in Section 3. The main distinctive feature of a lazy system is that it considers a non-zero delay between the enabling of a transition and its firing. Due to this, the set of states in which a transition is enabled might be larger than the set of states in which the transition fires; recall that for speed-independent systems (cf. Section 2) these two sets always coincide since every transition can have an arbitrary delay. The difference between the notions of firing and enabling regions comes from the observation of a non-zero delay in firing a lazy transition. The need to introduce a potentially enabling region together with an enabling region was illustrated by the optimization loop on the example of the timed implementation of the xyz STG. A potentially enabling region gives an upper bound for the set of states in which a transition can be enabled. The freedom in choosing the enabling region within the PEnR gives additional possibilities for logic optimization. Note that at the specification level it is sufficient to consider firing and potentially enabling regions.
Lazy State Graphs
It is easy to see the following correspondence between the introduced regions: FR(a*) ⊆ EnR(a*) ⊆ PEnR(a*).
Examples of potentially enabling and firing regions are illustrated by Figure 3. [...] and firing (FR(a*)) regions are defined, and 2. at least one transition is lazy (see footnote 3).
The correctness properties of SGs can be easily transferred onto lazy state graphs. An LzSG will be called consistent, deterministic and commutative if the underlying SG has these properties. For the persistency property, the distinction between the firing and enabling regions requires generalizing its definition for LzSGs. Persistency captures the absence of hazards in an implementation derived from an LzSG; therefore we will formulate it in terms of enabling regions rather than PEnRs.
Definition 4.3 (persistency) A signal transition a* is persistent in an LzSG if two conditions are satisfied:
• in EnR(a*) ⊆ PEnR(a*) no disabling of a* is possible, i.e. for every s ∈ EnR(a*) and every transition from s to s1 labeled b* with a ≠ b, a* is enabled in s1.
• no transition from the firing region to the corresponding enabling region is possible, i.e. for any b* there is no transition from s1 to s2 labeled b* such that s1 ∈ FR(a*) and s2 ∈ EnR(a*) - FR(a*).
The following property reveals the distinctive features of firing and enabling regions of persistent transitions.
Property 4.1 For a persistent transition a* in an LzSG, every EnR(a*) and FR(a*) can be exited only by the firing of a*.
The proof is trivial. For FR(a*) it follows directly from Condition 2 of Definition 4.3, while for EnR(a*) exiting it by any signal different from a means the disabling of a*, which contradicts the persistency requirement.
Property 4.1 bridges the conditions for hazard-free implementation of an LzSG with the similar ones for implementing an SG. It can be shown that, if the timing assumptions for an initial specification are satisfied, then any LzSG in which transitions of output signals are persistent has a hazard-free implementation with complex gates. The implementation issues for an LzSG will be discussed in detail in Section 5. Before that we should formalize timing assumptions and determine what kind of assumptions are really needed.
3 As we are targeting the optimization of signals that are synthesized by a circuit, we will not consider lazy behaviors of input signals.
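The two conditions of the persistency definition translate directly into a check over the arcs of a state graph. The sketch below is illustrative only: the arc representation, the function name and the small xyz fragment are assumptions, and "a* is enabled in s1" is read as "s1 belongs to EnR(a*)".

```python
def is_persistent(arcs, event, enabling_region, firing_region):
    """Check both persistency conditions for `event`.

    arcs: iterable of (source_state, event_label, target_state) triples.
    """
    lazy_part = enabling_region - firing_region  # EnR(a*) - FR(a*)
    for source, label, target in arcs:
        # Condition 1: another event fired inside EnR must not disable `event`.
        if source in enabling_region and label != event and target not in enabling_region:
            return False
        # Condition 2: no arc may lead from FR back into EnR - FR.
        if source in firing_region and target in lazy_part:
            return False
    return True


# Assumed fragment of the xyz state graph (codes in xyz order).
xyz_arcs = [("110", "z+", "111"), ("101", "y+", "111"), ("111", "x-", "011")]
persistent = is_persistent(xyz_arcs, "x-", {"110", "111"}, {"111"})
```

With EnR(x-) = {110, 111} and FR(x-) = {111}, the fragment satisfies both conditions; adding an arc that leaves EnR(x-) by another event, or one that returns from 111 to 110, makes the check fail.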
Timing assumptions
Timing assumptions could be defined in a form stating that one event happens before or after another. However, this form is ambiguous for cyclic specifications because their transitions can be instantiated many times and different instances may have different orderings. More rigor in defining ordering relations can be achieved at the unfolding level [12], i.e. when an original net is unfolded into an equivalent acyclic description. The theory of timed unfoldings is however restricted to simple structural classes of STGs, and the timing analysis algorithms are computationally expensive [3, 7]. We will therefore rely on a more conservative approximation of timing assumptions in LzSGs.
Difference constraints.
A difference constraint b* < a*, involving two potentially concurrent events a* and b*, assumes that, due to certain timing characteristics, b* fires earlier than a*. Formally, it can be defined through the maximum separation Sep_max(b*, a*) between events b* and a* [11]. The maximum separation gives an upper bound on the time difference between firings of b* and a*. If Sep_max(b*, a*) < 0 then b* always fires earlier than a*. In an SG this assumption can be represented by the concurrency reduction of a* with respect to b*.
Concurrency reduction can be performed in two steps:
1. Remove all arcs labeled a* leaving a state s such that s is backward reachable from PEnR(a*) ∩ PEnR(b*) (this delays a* until b* fires).
2. Remove unreachable states (due to delaying a* against b*).
Let us illustrate the application of a difference constraint [...]+ < d+ on the example of the STG in Figure 4 [...]. Timing assumptions based on difference constraints are the main source for the elimination of timed unreachable states [16, 15], but they cannot fully express the lazy behavior of signals.
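The two steps of concurrency reduction can be sketched on an explicit state graph. The code below makes a simplifying assumption that is not in the text: step 1 removes an arc of the late event only from states where the early event is itself enabled, rather than computing the backward-reachable set. The state graph fragment for the xyz example is also an assumed reconstruction.

```python
from collections import deque


def enabled_events(arcs, state):
    """Events labeling the arcs that leave `state`."""
    return {label for source, label, _ in arcs if source == state}


def reduce_concurrency(arcs, initial, late, early):
    """Delay `late` until `early` has fired, then prune unreachable states."""
    # Step 1 (simplified): drop `late` arcs from states where `early` is enabled.
    kept = {
        (s, e, t) for (s, e, t) in arcs
        if not (e == late and early in enabled_events(arcs, s))
    }
    # Step 2: breadth-first search for the states that remain reachable.
    reachable, queue = {initial}, deque([initial])
    while queue:
        state = queue.popleft()
        for source, _, target in kept:
            if source == state and target not in reachable:
                reachable.add(target)
                queue.append(target)
    return {(s, e, t) for (s, e, t) in kept if s in reachable}, reachable


# Assumed fragment of the xyz state graph (codes in xyz order).
xyz_graph = {
    ("000", "x+", "100"), ("100", "y+", "110"), ("100", "z+", "101"),
    ("110", "z+", "111"), ("101", "y+", "111"), ("101", "x-", "001"),
    ("001", "y+", "011"), ("111", "x-", "011"),
}
timed_arcs, timed_states = reduce_concurrency(xyz_graph, "000", late="x-", early="y+")
```

Applying the constraint y+ < x- removes the x- arc from 101, after which state 001 is no longer reachable, in line with the motivating example.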
Simultaneity constraints.
Exploiting simultaneity in transition firings is a key factor in the burst-mode design methodology [17]. Here, the environment is considered to be slow and therefore the skew of delays for output signals is negligible, i.e. output transitions are simultaneous from the point of view of the environment (fundamental mode assumption). The weak point of the fundamental mode is that it must be applied to a circuit as a whole, which essentially relies on an even distribution of propagation delays within the circuit. To lift this restriction we consider simultaneity assumptions more locally, and hence introduce a local fundamental mode with respect to particular groups of transitions. The simultaneity assumption is a relative notion, which is defined on a set of transitions T with respect to a reference transition a*. Let us consider the simultaneity assumption between transitions b+ and c+ with respect to a- in the LzSG from Figure 4.c. This assumption influences the LzSG in two ways: 1) State 0100, which is entered when a- fires before c+, becomes unreachable. (Indeed, from Sep_max(c+, b+) < d(a-) (coming from the simultaneity assumption) and Sep_max(b+, a-) ≤ -d(a-) (coming from the causality between b+ and a-) follows the difference constraint Sep_max(c+, a-) < 0.) 2) A potentially enabling region for a- is expanded in state 1010 (see Figure 4.d).
The above implies that optimization based on simultaneity assumptions goes beyond the possibilities given by difference constraints only.
Early enabling. The simultaneity assumptions exploit "laziness" between concurrent transitions. This idea can be generalized for ordered transitions as well. Suppose that transition a* triggers the firing of b* and we assume that in the implementation a* is "faster" than b* (
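The separation argument in item 1 can be written out as a short chain of inequalities. This is a sketch under two assumptions: t(e) denotes the firing time of event e, and the causality bound between b+ and a- is taken to be Sep_max(b+, a-) ≤ -d(a-), where d(a-) is the delay of a-.

```latex
\begin{align*}
t(c^+) - t(a^-) &= \bigl(t(c^+) - t(b^+)\bigr) + \bigl(t(b^+) - t(a^-)\bigr) \\
                &\le Sep_{max}(c^+, b^+) + Sep_{max}(b^+, a^-) \\
                &<  d(a^-) - d(a^-) = 0 ,
\end{align*}
```

so Sep_max(c+, a-) < 0, i.e. c+ always fires before a-, which is the difference constraint that makes state 0100 unreachable.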

Implementation
The method presented in the previous sections has been implemented in the tool petrify, that can synthesize asynchronous circuits from STG specifications.
The timing assumptions on the behavior of the circuit and the environment are specified by the designer. Two types of assumptions are accepted:
• t(a) < t(b), indicating that event a will always occur before event b, even if they are potentially concurrent according to the original STG.
• t(a) = t(b) wrt c, indicating that the occurrence of a and b can be considered simultaneous with regard to event c.
In the example of Figure 3, the following assumptions have been specified for optimization:
t(y+) < t(x-) and t(y+) = t(z+) wrt x-
The following procedure is executed to do logic synthesis of each output signal x:
1. The set of unreachable states, PEnR(x+) and PEnR(x-) are calculated according to the timing assumptions. DC is defined as the set of binary vectors not corresponding to any reachable states.
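The two kinds of assumptions can be represented as small records. The sketch below parses the notation used in this section into such records; it is an illustration of the notation only and is not claimed to be the input format of petrify. All class and function names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Difference:
    early: str  # event assumed to fire first
    late: str   # event assumed to fire second


@dataclass(frozen=True)
class Simultaneity:
    first: str
    second: str
    reference: str  # event with respect to which the two are simultaneous


_EVENT = r"t\(\s*([A-Za-z_]\w*[+-])\s*\)"
_DIFF = re.compile(rf"^\s*{_EVENT}\s*<\s*{_EVENT}\s*$")
_SIM = re.compile(rf"^\s*{_EVENT}\s*=\s*{_EVENT}\s+wrt\s+([A-Za-z_]\w*[+-])\s*$")


def parse_assumption(text):
    """Turn 't(a+) < t(b-)' or 't(a+) = t(b+) wrt c-' into a record."""
    match = _DIFF.match(text)
    if match:
        return Difference(*match.groups())
    match = _SIM.match(text)
    if match:
        return Simultaneity(*match.groups())
    raise ValueError(f"unrecognized timing assumption: {text!r}")


example = [parse_assumption("t(y+) < t(x-)"),
           parse_assumption("t(y+) = t(z+) wrt x-")]
```

Keeping the reference event as a separate field reflects that simultaneity is a relative notion: the same pair of events may be simultaneous with respect to one observer and not another.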
2. Those states that may introduce CSC conflicts by a possible change of the next-state function are removed from their corresponding PEnR.
5 DC must be calculated only once for all signals.
[...] C(x). Otherwise, remove from PEnR(x+) and PEnR(x-) those states that violate the previous monotonicity conditions for C(x) and C(x̄). This transformation is illustrated in Figure 5. Thus, the new PEnRs will be equal to or smaller than the previous ones.
Go to step 3
In the worst case, the loop 3-7 will converge towards a configuration with PEnR(x+) = FR(x+), PEnR(x-) = FR(x-), ON(x) = FR(x+) ∪ QR(x+) and C(x) = ON(x) ∪ d for some d ⊆ DC. Note that the largest timing optimization is achieved when C(x) completely covers PEnR(x+) and does not intersect PEnR(x-).
In practice, most covers C(x) are monotonic after the first boolean minimization and no iteration is required. Only in some rare cases are more than two iterations executed.
Petrify includes a boolean minimizer that delivers several covers with similar cost. Among them, a cost function selects those that are monotonic and have the smallest literal count.
In the future we foresee providing more freedom to the designer to seek the best trade-off between area and performance. This can be implemented by enabling the designer to tune some parameters of the cost function.
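The DC set of step 1 is simply the complement of the reachable codes in the space of all binary vectors. A minimal sketch follows; the function name is hypothetical and the reachable codes are those of an assumed timed xyz state graph in which 001 has been made unreachable.

```python
from itertools import product


def dont_care_set(reachable_codes, signal_count):
    """Binary vectors of the given width that match no reachable state."""
    all_vectors = {"".join(bits) for bits in product("01", repeat=signal_count)}
    return all_vectors - set(reachable_codes)


# Assumed reachable codes of the timed xyz example (bit order xyz).
dc = dont_care_set({"000", "100", "110", "101", "111", "011"}, 3)
```

Because the set depends only on reachability and not on the signal being synthesized, it can be computed once and shared by all output signals.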
Experimental results
In this section we report on the experimental setup, including a discussion on how to derive timing assumptions from knowledge about the environment and information about the circuit implementation, and we show preliminary experimental results.
6.1 Design flow and assumption derivation
The timing-based optimizations described in this paper best fit into a design flow that satisfies three requirements, in order of importance:
1. some information is known about the delay of the environment in which the circuit will operate (or, alternatively, large portions of the overall asynchronous control are synthesized and analyzed for timing properties simultaneously),
2. good control is possible over the delay of gates and wires within the circuit portion on which timing-based optimization is performed,
3. a good asynchronous timing analysis tool is available.
The first requirement is necessary in order to apply optimization in the style of Myers [15], as extended in this paper to use don't cares instead of pre-specified values. The second requirement is necessary in order to take maximum advantage of the capabilities of lazy timing optimization. Consider, for example, the decision to enable a slow signal early, in order to speed it up. In that case, changing the logic due to the addition of laziness to the SG may have the unwanted effect of firing this signal too early. Without transistor sizing or delay padding, there is little hope of closing the optimization loop in a clean and easy way, since it is very difficult to determine a priori which optimizations are safe and preserve the timing assumptions on which they are based. On the other hand, with transistor sizing or delay padding one can restore the correct ordering of transitions and ensure the validity of almost any early enabling due to separation assumptions between outputs. The third requirement is, unfortunately, still far from realizable. Although good progress in this direction has been made [3, 11, 18, 1], we are still far from having an automated tool that can handle realistic circuits in a reasonable time in the presence of input non-determinism. Hence for now this step is left to the designer's intuition and ability.
For this paper, we have assumed that:
1. All inputs to the circuit are slower than any single gate inside the circuit. This is generally a realistic assumption even if the "apparent" behavior of those inputs is just that of a buffer or inverter, since this generally "hides" the control of some other asynchronous pipeline stage, which behaves like a simple handshake but actually has large delays in comparison with those of the gates composing the circuit that is being designed.
2. No control over gate delays is possible. We actually used a fairly small standard cell library, in order to test our approach in a sort of worst case.
3. Performance analysis, as well as part of timing analysis, is done by logic simulation. We synthesized both the circuit and the environment, and artificially slowed down the environment implementation by delay padding. Moreover, we limited ourselves to circuits without input non-determinism (since non-determinism is not synthesizable with standard speed-independent techniques), or chose one specific operational cycle of circuits with input non-determinism.
The results of simulation were used both to derive internal timing assumptions, for the purpose of early enabling, and to check that those assumptions were satisfied after lazy resynthesis. We manually verified that the result of simulation was consistent with the STG specification. This is by no means a suggested design flow choice; it is just a temporary solution.
6.2 Experimental results
Table 1 shows the results of the application of our timing-based optimization procedure to a well-known set of asynchronous benchmark circuits. The experiment was organized as follows.
• We implemented all the circuits by using basic gates from a small library (1 not, 4 and/nand/or/nor, 4 and-or-invert, 2 SR flip-flop and 1 C-element) based on ES2 1 µm technology.
• We ran a logic simulation of the circuit twice, once with 1 ns delay on every input signal, and once with 2 ns on every input signal. We identified the duration of a cycle in the simulation, and used it as a measure of circuit performance (in fact the simulation always converged to the critical cycle in two iterations). We used the difference between the two runs in order to isolate the contribution to the critical cycle due to the circuit from that due to the environment. The result of this first speed-independent synthesis run is presented in columns 2 and 3, by showing area (factored form literals) and critical cycle contribution due to the circuit (in picoseconds; the delay of an inverter is about 200 ps in this technology).
q We added separation assumptions stating that input signals are slower than any output or internal (state) signal, and implemented all the circuits again.
• We ran the simulation again, with 1 ns delay on all inputs, checking that the timing assumptions were satisfied. The result of this second timed synthesis and simulation run (again, factoring out the contribution to the period due to the inputs) is presented in columns 4 and 5, both in absolute terms and as a percentage.
• We added further separation assumptions between outputs, based on relative delays of gates in the implementation. The simulation done in the previous step was used in order to derive firing times of internal and external signals, and manual analysis was used in order to determine the exact timed causal relations. We implemented the circuits again. In some cases, no improvement could be obtained while still satisfying the assumptions. Otherwise, the improvement with respect to timed synthesis was due both to a larger don't care space and to early enabling.
• We ran the simulation, checking the satisfaction of the assumptions. The result of this third lazy synthesis step is presented in columns 6 and 7.
From these preliminary experiments we can conclude that lazy optimization is a very promising technique for aggressive timing optimization of asynchronous control circuits, because
• it allows one to effectively achieve the same objective of increasing throughput as pipelining in synchronous circuits, but
• avoids (or limits) the penalty due to pipeline latches in terms of both area and performance (latency and ultimate throughput limitation due to latch internal delays).
Moreover, the technique is applicable even without sophisticated transistor sizing techniques, which would make it even more effective, and without automated timing analysis tools, which would make it easier and safer (see footnote 6).
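One way to separate the circuit's share of the critical cycle from the environment's share, using the two simulation runs described above, is sketched below. It assumes a linear model that the text does not state explicitly: the measured cycle equals the circuit contribution plus the input delay multiplied by the number of input transitions on the critical cycle. The function name and the sample cycle times are hypothetical.

```python
def circuit_contribution(cycle_a_ps, cycle_b_ps,
                         input_delay_a_ps=1000, input_delay_b_ps=2000):
    """Split a cycle time into circuit and environment parts.

    Assumes cycle = circuit + inputs_on_cycle * input_delay (linear model).
    Returns (circuit_ps, inputs_on_cycle).
    """
    # Slope of the cycle time with respect to the input delay.
    inputs_on_cycle = (cycle_b_ps - cycle_a_ps) / (input_delay_b_ps - input_delay_a_ps)
    # Extrapolate to zero input delay to isolate the circuit's share.
    circuit_ps = cycle_a_ps - inputs_on_cycle * input_delay_a_ps
    return circuit_ps, inputs_on_cycle


# Hypothetical measurements: 5400 ps with 1 ns inputs, 8400 ps with 2 ns inputs.
circuit_ps, inputs_on_cycle = circuit_contribution(5400, 8400)
```

With these sample numbers the model attributes three input transitions to the critical cycle and 2400 ps to the circuit itself. The model only holds if the same cycle is critical in both runs.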
Conclusions
We have proposed Lazy Transition Systems, a theoretical model for timed circuit synthesis, where the notions of enabling and firing are distinguished for a signal switching event. In this new framework, we have also presented necessary conditions for the synthesis of circuits with correct behavior under given timing assumptions.
We can now summarize the main results of this paper by putting our method into the overall taxonomy of issues involved in timed circuit synthesis:
• Both types of relative timing assumptions, difference (one-sided) and simultaneity (two-sided) constraints, are used.
• The objects on which timing information can be defined are either individual transition delays (they are good for locally related events; timing analysis is simple) or firing times (more global, relate sequences of events).
• The way timing determines the don't care space is either due to unreachability (aimed at area; higher speed is achieved as the logic is simpler) or due to laziness, i.e. enabling region expansion (targeted at both area and performance).
• The method currently solves a "direct" problem: given an STG model with timing assumptions, obtain an optimized circuit (it would be possible to consider the "inverse" one: given an STG model, obtain an optimized circuit with timing constraints).
• Timing analysis is at present assumed to be the designer's responsibility (which is cheap and fast, local dependencies, approximate). In the future an automatic tool (still expensive, global dependencies, exact) can be used.
Preliminary experimental results confirm that significant area and speed improvements can be achieved by exploiting the extra don't care space due to the laziness of timed events. This approach helps to bridge two critical gaps existing in the synthesis of control circuits today. The first gap is between the two main approaches for automated asynchronous controller synthesis, those based on fundamental (global timing constraints) and input-output modes. It also tackles the traditionally irreconcilable gap between asynchronous and synchronous circuit synthesis [6]. Namely, the proposed lazy optimization technique is in many ways complementary to the techniques used for synchronous circuits for the same objective (higher throughput). Our approach thus identifies ways in which synthesis of asynchronous circuits can achieve greater practicality and wider acceptance due to its more active dealing with time information. To this end, we feel an urgent need for more research in the area of mechanizing the feedback between timing optimization and timing analysis.
6 In this experiment we considered only a single delay number for each gate when verifying the timing assumptions by simulation, instead of considering the safer min-max delay intervals allowed by the above mentioned separation analysis techniques.
Table 1: Experimental results of lazy optimization
