Abstract: We describe a technique to estimate the energy consumed by speed-independent asynchronous (clockless) control circuits. Because speed-independent circuits are hazard-free under all possible combinations of gate delays, we prove that an accurate estimate of their energy consumption is independent of relative component gate delays and can be determined by simulating only a small number of input patterns proportional to the size of the circuit's Signal Transition Graph (STG) specification. Specifically, we calculate the average energy per external signal transition consumed by a circuit. This can be used to compare the energy consumption of two different circuit implementations of the same specification, to calculate average energy for a given high-level operation, and to provide average circuit power when combined with delay information.
Introduction
Because asynchronous circuits do not have a global clock, energy consumption per clock cycle, a common means of quantifying energy dissipation in synchronous circuits, cannot be used. Instead, it has been suggested to quantify energy consumption in asynchronous circuits using energy per operation (cycle of activity) [6]. This approach has been applied to asynchronous circuits that use a two-phase signaling protocol for request/acknowledge handshaking. The control circuits addressed in [6], however, are assumed to be in a small set of pre-defined macro gates whose energy consumption per output transition is pre-calculated using SPICE. The energy per output transition for all control circuits is then incorporated into the overall estimate for energy and power consumption [6].
This paper, on the other hand, considers the problem of measuring energy consumption of a more general class of control circuits to support energy and power estimation of other types of asynchronous circuits. Specifically, we consider control circuits that are netlists of gates rather than pre-defined macro gates. These circuits are speed-independent, i.e., they are hazard-free under any set of component gate delays. Moreover, the circuits can exhibit input choice, in which the environment can non-deterministically decide which external inputs to change, driving the circuit into different modes of operation, such as a read/write mode in a memory interface controller.
Since different internal signals may change in different modes, each mode may have a different energy consumption. To obtain a unique figure of merit, it is necessary to combine the energy consumption in each of the different modes of operation into a single quantity. To do this, we assume the availability of input-choice statistics, which describe the relative probabilities of different environmental choices. In practice, these input-choice statistics may be given by the user or estimated through behavioral simulation.
Our procedure assumes the control circuit is specified using a Signal Transition Graph (STG). Starting with the STG, we derive a Sequential Signal Transition Graph (SSTG) that defines a small set of input sequences that are simulated to obtain the switching activity of circuit nodes.
The SSTG also defines a Markov chain which under reasonable circuit assumptions is irreducible, recurrent, and periodic (see section 4). Using Markov chain analysis, we obtain a small system of linear equations whose solution, when combined with the simulation results of the SSTG, yields the average energy per external signal transition.
The organization of the paper is as follows. Section 2 gives a more complete introduction to calculating energy in asynchronous circuits. Section 3 describes the classes of circuits and specifications considered in this paper. Section 4 presents a proof that the order of concurrent transitions does not affect the energy consumption of the circuit. Section 5 describes our SSTG. Section 6 describes how the SSTG defines a Markov chain which leads to a system of linear equations whose solution is used to calculate the average energy. Section 7 describes how this work relates to previous work on the energy consumption of synchronous circuits. Section 8 describes some preliminary results and conclusions.
Calculating energy consumption
The lack of a global clock in an asynchronous circuit and the concurrent operation of the circuit and its environment obscure any obvious time frame over which to measure energy consumption. If the circuit repeatedly performed the same operation, as asynchronous datapaths do, then energy per operation would be a useful measure of energy consumption [6].
Control circuits, however, do not perform a well-defined single operation repeatedly, and hence energy per operation is not suitable for quantifying energy use. Kudva et al. [6] suggest measuring the average energy per output transition using SPICE simulation on the final layout, which is practical when the circuits are single pre-defined gates. Yet, for other asynchronous design styles the control circuits are synthesized automatically or semi-automatically using basic gates (e.g., [2, 9]). SPICE simulation is impractical for these circuits since energy consumption estimates are desired before their physical design. Moreover, since energy might be expended in response to input signal changes prior to any output signal change, we estimate a slightly different figure of merit: average energy per external (input and output) signal transition.
To illustrate the issues involved in calculating energy per external signal transition, let us consider the speed-independent control circuit and its STG specification depicted in Figure 1. This control circuit, called sbuf-send-ctl, has 5 output signals, 3 input signals, and 9 internal signals. The arrows in the STG specification describe sequencing requirements between transitions of input and output signals. For example, the arrow between rejpkt+/1 and y1+/1 means that after the environment raises rejpkt, the circuit should raise y1. The two arrows into rejpkt- from idlebar+ and latchaddr+/1, respectively, mean that once both idlebar+ and latchaddr+/1 occur, the environment can lower rejpkt. In addition, the STG specifies environmental choice using arrows that emanate from circles, called places. For example, in the

Third, we associate a probability with each trace $T \in \mathcal{T}_k$, denoted $\Pr_{STG}(T)$, subject to the constraint that $\sum_{T \in \mathcal{T}_k} \Pr_{STG}(T) = 1$ for all $k$. This probability depends on gate delays and the probability of various environmental choices. For example, consider the two traces $T_1$ and $T_2$ in $\mathcal{T}_5$. Since they are the only two traces of length 5, the sum of their probabilities should equal 1. Their exact probabilities, however, depend on the relative delays of the sub-circuits that control them. For example, since the sub-circuit for latchaddr has 3 levels of logic and the sub-circuit for idlebar has only 2 levels, idlebar+ will most likely fire first. Since idlebar+ fires first in $T_1$, a delay analysis might generate a probability of 0.8 for $T_1$ and 0.2 for $T_2$.
Finally, we can present our definition for average energy per external signal transition, denoted $E$, as follows:
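One way to write this definition, consistent with the notation above ($\mathcal{T}_k$ for the set of traces of length $k$, $\Pr_{STG}(T)$ for the probability of trace $T$, and $En(T)$ for the energy consumed by $T$), is sketched below. The limit over $k$ and the $1/k$ normalization are assumptions made here to express a per-transition average; they are not quoted from the original equation.

```latex
E \;=\; \lim_{k \to \infty} \frac{1}{k} \sum_{T \in \mathcal{T}_k} \Pr\nolimits_{STG}(T)\, En(T)
```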
The focus of the paper is to provide an efficient technique to calculate the above equation for a large class of speed-independent control circuits. To do this, we prove that for the class of speed-independent circuits we consider, any pair of traces $T$ and $T'$ that differ only in the order of concurrent transitions consume the same energy, i.e., $En(T) = En(T')$. For example, the traces $T_1$ and $T_2$ discussed in the last section differ only in the order of concurrent transitions. Hence, both traces consume the same amount of energy.
As a result of this property, an equivalence relation that identifies traces that consume equal energy, differing only by the ordering of concurrent transitions, can be defined. This relation partitions the traces $\mathcal{T}$ into a set of equivalence classes $\Theta$. Each class $\theta \in \Theta$ contains all traces in which the same input choices are made by the environment. We define the probability of an equivalence class, $\Pr_{STG}(\theta)$, to be the sum of the probabilities of all traces inside the class. The energy consumed by an equivalence class, $En(\theta)$, is equal to the energy consumed by any trace $T$ in $\theta$. Average energy can then be computed based on the probability of each equivalence class $\Pr_{STG}(\theta)$ as follows:
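The regrouping argument behind this equation can be sketched as follows. The symbols $\Theta_k$ (the equivalence classes of traces of length $k$) and $\theta$ (a single class) are labels chosen here for illustration, and the limit with $1/k$ normalization is an assumption about the form of the average rather than a quotation of the original equation. The second step uses the fact that every trace in a class consumes the same energy $En(\theta)$.

```latex
\begin{align*}
\sum_{T \in \mathcal{T}_k} \Pr\nolimits_{STG}(T)\, En(T)
  &= \sum_{\theta \in \Theta_k} \sum_{T \in \theta} \Pr\nolimits_{STG}(T)\, En(T) \\
  &= \sum_{\theta \in \Theta_k} En(\theta) \sum_{T \in \theta} \Pr\nolimits_{STG}(T) \\
  &= \sum_{\theta \in \Theta_k} \Pr\nolimits_{STG}(\theta)\, En(\theta), \\
E &= \lim_{k \to \infty} \frac{1}{k} \sum_{\theta \in \Theta_k} \Pr\nolimits_{STG}(\theta)\, En(\theta).
\end{align*}
```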
This result, Equation (3), is the key justification of our power estimation procedure. The equation refers to the probability of equivalence classes rather than that of individual traces. In the presence of concurrent transitions, the probability of individual traces can depend on relative delays inside the circuit and the environment. In contrast, the probability of an equivalence class depends only on the statistics of the various input choices made by the environment. Hence, this result enables us to estimate energy consumption without performing a complicated delay analysis on the circuit or its environment.
Moreover, this equation enables us to calculate average energy by simulating the circuit using a derived sequential signal transition graph (SSTG) that defines a subset of the traces of the STG and does not model any concurrency. We will show that, because of the lack of concurrency, the SSTG defines a Markov chain. Using Markov chain analysis, the proportion of times each STG transition fires in an average trace of the SSTG can be calculated. This result leads to a small system of linear equations that can be used to calculate $E$.
Specifications and implementations
In this paper we restrict ourselves to free-choice STG specifications and their speed-independent implementations.
Free-choice STG
An STG [3] is a specification formalism for asynchronous sequential circuits. It is an interpreted Petri net, and as such, can explicitly capture causality, concurrency, and choice.
A Petri net is a triple $N = \langle P, R, F \rangle$, where $P$ is a set of places, $R$ is a set of transitions, and $F \subseteq (P \times R) \cup (R \times P)$ is the flow relation.

The conventional graphical representation of an STG is slightly different from conventional Petri net representations. STGs are represented using a directed graph, where transitions are identified by their name, places are denoted by circles, and directed edges represent elements of the flow relation. Places with only one predecessor and one successor are omitted. Directed edges whose successor is a transition represent sequencing constraints, either on the circuit to be synthesized (if their successor is an output transition), or on the environment (if their successor is an input transition). They specify which set of transitions causes each transition.
A token marking of a Petri net is a non-negative integer labeling of its places. A transition is enabled (i.e., the corresponding event can happen in the circuit) whenever all its predecessor places are marked with at least one token.
An enabled transition may fire. This means that the corresponding external signal changes value in the circuit. When it fires, a token is removed from every predecessor place, and a token is added to every successor place.
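The enabling and firing rules above can be sketched in a few lines of Python. This is a minimal illustration only: the net below (places `p0`, `p1` and transitions `a+`, `a-`) is a hypothetical two-transition loop, not part of the sbuf-send-ctl specification, and the function names are chosen here for the example.

```python
from collections import Counter

def enabled(marking, pre, t):
    """A transition is enabled when every predecessor place holds a token."""
    return all(marking[p] >= 1 for p in pre[t])

def fire(marking, pre, post, t):
    """Remove one token from each predecessor place, add one to each successor."""
    if not enabled(marking, pre, t):
        raise ValueError(f"transition {t} is not enabled")
    new = Counter(marking)
    for p in pre[t]:
        new[p] -= 1
    for p in post[t]:
        new[p] += 1
    return +new  # unary plus drops places whose token count fell to zero

# Hypothetical net: a+ moves the token from p0 to p1, a- moves it back.
pre = {"a+": ["p0"], "a-": ["p1"]}
post = {"a+": ["p1"], "a-": ["p0"]}
m0 = Counter({"p0": 1})

m1 = fire(m0, pre, post, "a+")
print(enabled(m0, pre, "a-"), dict(m1))
```

Markings are stored as token counts per place, so the same functions work for nets that are not safe, although the paper restricts itself to safe nets.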
If a place marked with only one token has more than one enabled successor transition, then only one of them, chosen non-deterministically, may fire. The other transitions are disabled by its firing. Hence, such successor transitions are referred to as choice transitions.

if its is 1-bounded. A marked net is 1-bounded (safe) if its initial marking is 1-bounded (safe). Two transitions are said to be concurrent iff there exists a marking, reachable from the initial marking, in which both transitions are enabled.
An STG is live iff 1) the underlying free-choice net is live and safe, 2) no two transitions of the same signal are concurrent, and 3) whenever two up-transitions of a signal are fired, a down-transition of the same signal is always fired in between (and vice versa). In this paper, we restrict ourselves to live STGs.
We annotate choice transitions with choice probabilities that designate the probability of the environment firing one of multiple mutually exclusive input transitions. In our example, we annotated both choice transitions acksend+/1 and rejpkt+/2 with a probability of 0.5.
The reachability graph of a Petri net is a directed graph where each node corresponds to a marking and an edge joins a pair of markings $M'$ and $M''$ if there exists a transition $t$ whose firing from $M'$ produces $M''$ (the transition labels the edge).
The state graph (SG) [3] of an STG is the reachability graph of the underlying net where each node (henceforth called a state) is labeled with a vector of signal values. For a given state $s$, the value of signal $u$ is given by $s(u)$. This node labeling must be consistent with the SG edge labeling; in other words, for each edge $s \to s'$ and each signal $u$: if the edge is labeled $u+$, then $s(u)$ must be 0 and $s'(u)$ must be 1; if the edge is labeled $u-$, then $s(u)$ must be 1 and $s'(u)$ must be 0; and otherwise $s(u)$ must equal $s'(u)$. The labeling can always be done, as proven in [3], if the STG is live. The procedure to find an SG from an STG is called token flow and is given in [3]. A portion of the SG along with the corresponding portion of the STG is depicted in Figure 2.
Notice that the SG can be exponentially larger than its STG. The size difference depends on the amount of concurrency expressed in the STG. In the STG, concurrency is expressed implicitly with independent parallel paths of transitions, while in an SG each different interleaving is explicitly represented by a different path through the SG. The two different interleavings $T_1$ and $T_2$ that were discussed earlier correspond to the two paths through the portion of the SG depicted in Figure 2.
Our procedure for estimating power does not explore the various interleavings and hence avoids the exponential complexity associated with the state graph.
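The exponential gap between a net and its reachability graph can be seen on a small example. The sketch below builds a hypothetical fork/join net with `n` concurrent transitions (the names `fork`, `join`, `t0`, `t1`, ... are made up for illustration) and counts its reachable markings by exhaustive search. Because the net is safe, a marking is represented simply as the set of marked places.

```python
def reachable_markings(pre, post, initial):
    """Exhaustively explore the markings of a safe net (marking = set of marked places)."""
    seen = {initial}
    frontier = [initial]
    while frontier:
        m = frontier.pop()
        for t in pre:
            if pre[t] <= m:  # every predecessor place is marked
                nxt = frozenset((m - pre[t]) | post[t])
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return seen

def fork_join(n):
    """Hypothetical net: 'fork' enables n concurrent transitions, 'join' waits for all."""
    pre = {"fork": frozenset({"start"}),
           "join": frozenset(f"b{i}" for i in range(n))}
    post = {"fork": frozenset(f"a{i}" for i in range(n)),
            "join": frozenset({"start"})}
    for i in range(n):
        pre[f"t{i}"] = frozenset({f"a{i}"})
        post[f"t{i}"] = frozenset({f"b{i}"})
    return pre, post

for n in (1, 2, 4, 8):
    pre, post = fork_join(n)
    states = reachable_markings(pre, post, frozenset({"start"}))
    print(n + 2, "transitions ->", len(states), "markings")
```

The net grows linearly with `n` (it has `n + 2` transitions), while the number of reachable markings is `1 + 2**n`, since each concurrent branch is independently either pending or done.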
Speed-independent circuits
We model a circuit with a five-tuple $G = \langle I, O, N, E, F \rangle$. The sets $I$, $O$, and $N$ are the input, output, and internal signals of the circuit, respectively. They are collectively called the circuit signals and denoted $A_{Impl}$. $E \subseteq A_{Impl} \times A_{Impl}$ is a set of directed edges that defines the connectivity of the circuit signals. An edge $e = (u, v)$ means that signal $u$ is a fanin of signal $v$. $F$ is a labeling function that labels each internal and output signal $u$ with a binary function $f_u$ that describes the function of the gate that drives this signal. Two-output gates such as mutual exclusion elements cannot be modeled in this framework. Consequently, circuits that contain such elements are not considered.
A signal is enabled if its current value does not equal the value of its function given the current value of its fanin. For example, if the current value of $u$ is 0 and it is driven by an AND gate whose inputs are all 1, then $u$ is enabled.
When the signal changes value, the signal fires. A signal is disabled if, before it fires, one or more of its fanins change such that it is no longer enabled. In practice, the disabling of a signal can manifest itself as a voltage spike or runt pulse. For a broad class of circuits using a conservative delay model, there always exists a set of delays such that this spike can propagate to a primary output [1]. Therefore, such a disabling is generally considered a hazard and should be avoided. For the sequel, a circuit is speed-independent iff the circuit is guaranteed to be void of hazards under all possible gate delays (assuming negligible wire delay) and the circuit satisfies its specification [1].
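The notions of an enabled signal and of disabling can be made concrete with a small sketch. The data layout (a dictionary mapping each gate output to its fanin names and Boolean function) and the function names are assumptions made for this illustration; the single AND gate mirrors the example in the text.

```python
def is_enabled(signal, state, gates):
    """A gate output is enabled when its function disagrees with its current value."""
    fanins, func = gates[signal]
    return func(*(state[f] for f in fanins)) != state[signal]

def fire(signal, state, gates):
    """Toggle one signal; report gate outputs that were enabled before and are not after."""
    before = {s for s in gates if s != signal and is_enabled(s, state, gates)}
    new_state = dict(state)
    new_state[signal] = 1 - state[signal]
    disabled = {s for s in before if not is_enabled(s, new_state, gates)}
    return new_state, disabled

# u is driven by a two-input AND gate; both inputs are 1 while u is still 0.
gates = {"u": (("a", "b"), lambda a, b: a & b)}
state = {"a": 1, "b": 1, "u": 0}

print(is_enabled("u", state, gates))   # u is enabled and may fire
_, hazards = fire("b", state, gates)   # input b falls before u has fired
print(hazards)                         # u was disabled: a potential hazard
```

A speed-independent circuit operating within its specification never produces a non-empty `disabled` set, whatever order the enabled signals fire in.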
Concurrency and energy consumption
This section formalizes our claim that energy consumption can be calculated without analyzing all possible interleavings. First, we prove that two traces that differ only in the order of two concurrent transitions consume the same energy. Then, we extend the result to all members of an equivalence class of interleavings.

To extend this result to any set of traces that differ only in the interleavings of concurrent transitions, we define a relation $C \subseteq \mathcal{T} \times \mathcal{T}$ such that $(T, T') \in C$ iff there exists a sequence (including the zero-length sequence) of legal swaps that transforms $T$ into $T'$. Then, any two traces related by $C$ consume the same amount of energy. Moreover, it is easy to prove that $C$ is an equivalence relation. Therefore, it partitions a trace set $\mathcal{T}$ into a set of equivalence classes $\Theta$. We let $\Theta_k$ represent the equivalence classes of the set of traces of length $k$.
The equation for average energy can be recast in terms of these equivalence classes rather than individual traces to obtain Equation 3.

Figure 3: The algorithm to find the SSTG from a given STG.

The advantage of this alternative representation is that the probability of each equivalence class depends only on input-choice statistics. Hence, a delay analysis of the circuit to determine the relative probabilities of different interleavings caused by concurrent output signals is unnecessary.
The sequential signal transition graph
The energy of an equivalence class, $En(\theta)$, is measured by simulating a characteristic trace $T_\theta$ of $\theta$. This characteristic trace is defined by imposing an order on concurrent STG transitions which is the same for all equivalence classes. Such an ordering ensures that equivalence classes that contain different subtraces corresponding to the same part of the STG will have the same choice of interleavings. We specify this ordering by taking the original STG and deriving a new STG, called a sequential signal transition graph (SSTG), in which all concurrency is serialized.
Algorithm 5.1, presented in Figure 3, finds an SSTG for a given STG. The resulting SSTG is comparable in size with the initial STG. When run on the STG given in Figure 1, the algorithm derives the SSTG illustrated in Figure 4.
Note that each trace $T_\theta$ of the SSTG is a characteristic trace of an equivalence class $\theta$ in the original STG. In fact, the purpose of the SSTG is that the probability of the equivalence class $\theta$, which, recall, is defined as the sum of $\Pr_{STG}(T)$ over all traces $T$ in $\theta$, is equal to the probability of $T_\theta$ in the SSTG, i.e.,

$\Pr_{STG}(\theta) = \Pr_{SSTG}(T_\theta)$ (5)

Using this fact, the average energy of the circuit can be computed as follows:

where $\mathcal{T}_{SSTG}$ is the set of all traces in the SSTG.
Calculating average energy
Figure 4: The SSTG for the circuit sbuf-send-ctl.

To calculate the average energy using the SSTG, we define the long-term proportion of an SSTG transition $t$, denoted $\pi_t$, as follows:
Then, the equation for average energy can be recast as follows:
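A form of these two equations that agrees with the surrounding text is sketched below. The symbol $\pi_t$ for the long-term proportion follows common Markov chain notation and is an assumption here, as is the exact phrasing of the limit; the sum over the set $R$ of STG transitions weighted by $En(t)$ follows directly from the description of the terms.

```latex
\pi_t \;=\; \lim_{k \to \infty}
  \frac{\mathbb{E}\bigl[\text{number of occurrences of } t \text{ in an SSTG trace of length } k\bigr]}{k},
\qquad
E \;=\; \sum_{t \in R} \pi_t \, En(t)
```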
where $R$ is the set of STG transitions and $En(t)$ is the energy consumed simulating transition $t$. Note that each SSTG transition $t$ has a unique corresponding SG state $s$ in which $t$ is enabled and from which simulation begins. (This state can be obtained through a reachability analysis of the SSTG.) We begin simulation of the transition assuming all internal circuit signals have settled in this state. Then, we fire the external transition and simulate the circuit until all internal signals have once again settled, recording which circuit signals change. During simulation, a zero-delay model can be used for circuits that have no internal feedback, since internal signals can change at most once per external signal transition. Many automatically synthesized circuits have this property, e.g., [2]. When internal feedback can exist, on the other hand, internal signals may change multiple times per external signal transition, requiring the use of a more realistic delay model.

Developing this equation from the SSTG is important because Markov chain theory can now be used to obtain the long-term proportions of STG transitions. To demonstrate this, we first review Markov chains, then explain why the original STG does not generally define a Markov chain, and finally demonstrate that, under the assumption of uncorrelated choices, the traces of the SSTG define a Markov chain.
Consider a stochastic process $\{X_n;\ n = 0, 1, 2, \ldots\}$. If $X_n = i$, then we say the process is in state $i$ at time $n$. If the conditional distribution of any future state $X_{n+1}$ is independent of the past states and depends only on the present state, then the process is a Markov chain.
First, consider the stochastic process $\{T_n;\ n = 0, 1, 2, \ldots\}$ that models all possible traces in the original STG. In each "time step", $T_n$ can take on a finite set of values, the STG transitions. If $T_n = t$, then the $n$th state of the process is the STG transition $t$. The traces do not define a Markov chain because concurrency is allowed. Consider, for example, two concurrent transitions $t$ and $t'$. The probability of a future state $T_{n+1}$ being transition $t'$ given that $T_n = t$ depends on whether or not $t'$ was visited earlier (such as in state $T_{n-1}$), i.e., the conditional distribution depends not only on the current state of the process, but also on past states. This violates the Markov model.

Now, consider the traces in the SSTG, which exhibits no concurrency. The only decisions made during a trace are at choice places. If there is no correlation between consecutive input choices, the conditional distribution of the future state $T_{n+1}$ depends only on the present state. The transition probability between the states corresponding to STG transitions $t$ and $t'$, denoted $P_{tt'}$, is as follows:
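The per-transition simulation step described above (start from a settled state, fire one external transition, then let the gates settle while recording which signals change) can be sketched with a zero-delay model. The netlist format, the function names, and the two-gate example circuit are assumptions made for this illustration; only an input transition is shown.

```python
from collections import Counter

def settle(state, gates, max_steps=1000):
    """Fire enabled gates one at a time until none is enabled; count toggles per signal."""
    state = dict(state)
    toggles = Counter()
    for _ in range(max_steps):
        enabled = [s for s, (fanins, func) in gates.items()
                   if func(*(state[f] for f in fanins)) != state[s]]
        if not enabled:
            return state, toggles
        s = enabled[0]  # any order gives the same toggle counts for a hazard-free circuit
        state[s] = 1 - state[s]
        toggles[s] += 1
    raise RuntimeError("circuit did not settle")

def simulate_input_transition(state, gates, signal):
    """Toggle one external input of a settled state and settle the circuit again."""
    fired = dict(state)
    fired[signal] = 1 - fired[signal]
    return settle(fired, gates)

# Hypothetical netlist: n1 = NOT a, y = n1 AND b.
gates = {
    "n1": (("a",), lambda a: 1 - a),
    "y": (("n1", "b"), lambda n1, b: n1 & b),
}
start = {"a": 1, "b": 1, "n1": 0, "y": 0}  # a settled state

final, toggles = simulate_input_transition(start, gates, "a")
print(final, dict(toggles))
```

The recorded toggle counts are what an energy model would then weight by each signal's load capacitance to obtain $En(t)$.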
zero if there is no edge from $t$ to $t'$; one if the only edge into $t'$ is from $t$; and the choice probability of $t'$, if $t'$ is a choice transition.
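The rule above translates directly into a transition matrix. The sketch below uses a hypothetical four-transition SSTG (`a+` followed by a choice between `b+` and `c+`, both leading to `d+`, which returns to `a+`); the names and the 0.5 probabilities are illustrative. As a simplifying assumption, every non-choice successor is given probability one, which covers the stated case and also the point where the two branches of a choice merge.

```python
def transition_matrix(edges, choice_prob):
    """Build P[t][t'] from SSTG edges and the probabilities of choice transitions."""
    states = sorted({t for edge in edges for t in edge})
    P = {s: {d: 0.0 for d in states} for s in states}  # zero where there is no edge
    for src, dst in edges:
        # choice transitions carry their annotated probability; all others get one
        P[src][dst] = choice_prob.get(dst, 1.0)
    return P

# Hypothetical SSTG with one free choice after a+.
edges = [("a+", "b+"), ("a+", "c+"), ("b+", "d+"), ("c+", "d+"), ("d+", "a+")]
choice_prob = {"b+": 0.5, "c+": 0.5}

P = transition_matrix(edges, choice_prob)
for src, row in P.items():
    print(src, row)
```

Each row sums to one because the probabilities of the choice transitions leaving a choice place sum to one.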
Hence, the stochastic process is a finite-state Markov chain. In fact, in Section 7, we will argue that for speed-independent circuits, correlations among input choices do not affect the average energy consumption of the circuit.
Before we can present our final result, we must demonstrate that our Markov chain satisfies a few properties. A state $t$ in a Markov chain is recurrent if, starting in state $t$, the process will eventually reenter state $t$. Since the STG is live, we are guaranteed that after firing $t$ there exists a trace in the original STG that fires $t$ again. Since we do not allow zero choice probabilities, we are guaranteed that the probability of this trace is non-zero. Hence, our Markov chain is recurrent.
A Markov chain is irreducible if every state can be reached from every other state. Because the original STG is live, from every marking there exists a sequence of transitions which enables every transition. Since this property extends to our SSTG, our Markov chain is irreducible.
State $t$ in a Markov chain is periodic with period $d$ if the probability of being in state $t$ after $n$ transitions starting in state $t$ is 0 whenever $n$ is not divisible by $d$, and $d$ is the largest integer with this property [11]. In other words, starting in state $t$, it may be possible to enter state $t$ only at times 2, 4, 6, 8, etc., in which case state $t$ has period 2. State $t$ is called aperiodic if $d$ equals 1. Since an STG transition $t$ cannot fire twice without firing a complete cycle of the transitions in the STG, our Markov chain is periodic.
Because our finite-state Markov chain is irreducible, recurrent, and periodic, it can be shown that the $\pi_t$ are the unique non-negative solutions to the following equations [11]:

Figure 4, we illustrate this collapsing of equiprobable transitions in Figure 5.
The new system of linear equations based on this partition of sequences of equiprobable transitions is as follows:
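In standard Markov chain theory the long-term proportions satisfy the balance equations $\pi_{t'} = \sum_t \pi_t P_{tt'}$ together with $\sum_t \pi_t = 1$; taking these as the equations meant above is an assumption. Because the chain is periodic, repeatedly multiplying by $P$ does not converge, so the sketch below solves the linear system directly with exact rational arithmetic. The four-state chain is hypothetical (state `a` chooses `b` or `c` with probability 1/2, both lead to `d`, which returns to `a`).

```python
from fractions import Fraction

def long_term_proportions(P):
    """Solve pi = pi P with sum(pi) = 1 by Gauss-Jordan elimination over fractions."""
    states = sorted(P)
    n = len(states)
    # Row j is the balance equation for state j: sum_i pi_i * P[i][j] - pi_j = 0.
    A = [[Fraction(P[states[i]].get(states[j], 0)) - (1 if i == j else 0)
          for i in range(n)] for j in range(n)]
    b = [Fraction(0)] * n
    # One balance equation is redundant; replace it with the normalization.
    A[-1] = [Fraction(1)] * n
    b[-1] = Fraction(1)
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        inv = 1 / A[col][col]
        A[col] = [x * inv for x in A[col]]
        b[col] *= inv
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return dict(zip(states, b))

half = Fraction(1, 2)
P = {"a": {"b": half, "c": half}, "b": {"d": 1}, "c": {"d": 1}, "d": {"a": 1}}
pi = long_term_proportions(P)
print(pi)
```

For this chain the solution is 1/3 for `a` and `d` and 1/6 for `b` and `c`: every cycle has three transitions and each branch is taken in half of the cycles.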
$\sum_{m} (\text{\# of transitions in } L_m)\, \pi_{L_m} = 1.$ (12)

In our example, this transformation reduces the number of needed equations from 24 to 4. Average energy can then be calculated in terms of these transition lists as follows:
where $En(L_m)$, the energy consumed in the transition list $L_m$, is simply the sum of the energy consumed in its composite transitions. Hence, the exact value of $E$ can be obtained by simulating each STG transition once!

Figure 5: The collapsed SSTG for the circuit sbuf-send-ctl.
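The final computation can be sketched as a weighted sum over transition lists, taking the average energy to be $\sum_m \pi_{L_m} En(L_m)$, which is what the description of $En(L_m)$ implies. The three lists, their proportions, and their energies below are hypothetical numbers chosen for the example; they are not results from the paper.

```python
from fractions import Fraction

def average_energy(lists):
    """lists maps a list name to (number of transitions, proportion, energy of the list)."""
    # Normalization check: sum over lists of (# of transitions) * proportion must be 1.
    total = sum(count * pi for count, pi, _ in lists.values())
    if total != 1:
        raise ValueError(f"proportions are not normalized: {total}")
    return sum(pi * energy for _, pi, energy in lists.values())

# Hypothetical collapsed SSTG: a shared two-transition list and two one-transition branches.
lists = {
    "L0": (2, Fraction(1, 3), 6),  # energy in pJ for the whole list
    "L1": (1, Fraction(1, 6), 3),
    "L2": (1, Fraction(1, 6), 9),
}

E = average_energy(lists)
print(E, "pJ per external signal transition")
```

With these numbers the normalization holds (2/3 + 1/6 + 1/6 = 1) and the average is 2 + 1/2 + 3/2 = 4 pJ per external signal transition.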
Relationship to related work
Because the performance of a synchronous circuit is fixed by the period of the global clock, the average circuit power can be obtained from the energy consumed per clock cycle. As a result, the switching activity (the expected number of times a gate switches in one clock period) of circuit signals determines the energy and power consumption estimates. The goal of current research is to obtain the most accurate estimate of switching activity while maintaining low complexity.
Many synchronous energy estimation techniques take into account correlations among different circuit signals. Correlations among signals can be temporal (on the same signal at different times) [5], spatial (between different signals at the same time), or spatio-temporal (between different signals at different times) [8, 12, 10].
In our circuits, the only correlations that are not taken into consideration by our Markov model are correlations between different input choices. Correlations among the input choices affect the probability of each trace. More specifically, they affect the probability of sequences of transition lists. They do not affect, however, the expected number of occurrences of a particular transition list in an average trace. Since the energy consumed in a circuit depends only on the expected number of occurrences of each transition list, we can conclude that correlations do not affect average energy.
Results and Conclusions
We have applied our procedure to four circuits whose STG specifications are modified versions of those used in a large industrial circuit [4] and report the results in Table 1. Since these specifications are taken from a benchmark set [7], their use in a large system is unknown. Hence, we had to make some assumptions about their use. First, we assumed that all mutually exclusive choice transitions are equiprobable. Since these circuits have not been laid out, we also had to make some assumptions about the load capacitances of the gates in the circuit. Second, we assume each fanout of a signal contributes 25 fF to the output load of the gate. The circuit netlist provides the fanout of internal signals, while each primary output is assumed to have a total of 4 fanouts. Third, we assume a 5 volt power supply is used.
The column labeled $|R|$ shows the number of signal transitions in the original STG, the column labeled $|A_{Impl}|$ shows the number of circuit signals, and a further column shows the number of different transition lists obtained by serializing and then collapsing the original STG. The last column, $E$, represents our final estimate for average energy per external signal transition.
The main purpose of the technique proposed in this paper is to facilitate an energy consumption comparison between two implementations of the same STG. Because the complexity of the approach is very low, this technique may guide logic optimizations (such as those described in [1]) to facilitate synthesis for low power. In addition, when combined with the average number of transitions per high-level operation,
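The load and supply assumptions above can be turned into an energy figure per signal transition. The sketch below assumes the common CMOS estimate of ½·C·V² per output transition, which is not stated in the text, and uses the 25 fF per fanout and 5 V values given above as defaults. The function name is made up for the example.

```python
def switching_energy(fanouts, c_per_fanout=25e-15, vdd=5.0):
    """Energy in joules for one transition of each signal, given each signal's fanout.

    Assumes 0.5 * C * V^2 per transition (an assumption, not taken from the text),
    with C = fanout * c_per_fanout.
    """
    return sum(0.5 * n * c_per_fanout * vdd ** 2 for n in fanouts)

# One primary output, assumed to drive 4 fanouts (100 fF in total).
primary_output = switching_energy([4])
print(primary_output * 1e12, "pJ")

# An internal signal with fanout 2 switching together with that primary output.
print(switching_energy([2, 4]) * 1e12, "pJ")
```

Under these assumptions a single primary-output transition costs about 1.25 pJ, and each additional fanout adds about 0.31 pJ per transition.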
