Stochastic cycle period analysis in timed circuits by Myers, Chris J. & Mercer, Eric G.
STOCHASTIC CYCLE PERIOD ANALYSIS IN TIMED CIRCUITS
Eric G. Mercer and Chris J. Myers
Electrical Engineering Department
University of Utah
Salt Lake City, UT 84112
ABSTRACT
This paper presents a technique to estimate the stochastic cycle
period (SCP), a performance metric for timed asynchronous circuits.
This technique uses timed stochastic Petri nets (TSPN), which support
choice and arbitrary delay distributions. The SCP is the delay of the
average path in a TSPN when represented as a sum of weighted place
delays. A place delay is the expected value of its associated
distribution, and its weight denotes its importance in the average
path of the TSPN. The approach analyzes finite execution traces of the
TSPN to derive an expression for the weight values in the SCP. The
weights can be analyzed with basic statistics to within an arbitrary
error bound. This paper demonstrates the use of the SCP to
aggressively optimize timed asynchronous circuits for improved
average-case performance by reducing transistor counts, reordering
input pins at gates, and skewing transistor sizes to favor important
transitions. Each optimization effort is directed to improve the
average-case delay in the circuit at the possible expense of the
worst-case delay.
1. INTRODUCTION
Asynchronous design styles emerged before many synchronous techniques,
but fell by the wayside because of their perceived difficulty of
implementation. Components in an asynchronous circuit operate as fast
as they can and notify other components when they have completed
their work. In this type of circuit, the traditional cost metric must
be redefined. In synchronous circuits, one must optimize for the
worst-case behavior, since the worst case sets the clock frequency
regardless of how often it actually occurs. When designing an
asynchronous circuit, the designer must instead optimize for the
average case.
This research is supported by a grant from Intel Corporation, NSF
CAREER award MIP-9625014, and SRC contract 97-DJ-487.

Circuits designed for the average case using asynchronous techniques
have a potential for higher performance. This potential was harnessed
as early as 1955 in the design of an asynchronous adder [1]. Through
the use of dual-rail encoding and completion detection, this simple
ripple-carry adder achieves an average-case delay of $O(\log n)$,
where $n$ is the number of bits in the addition. Traditional
worst-case design techniques bound the performance at $O(n)$. A more
recent example of exploiting average-case performance is seen in [2].
This asynchronous IA32 instruction decoder is aggressively optimized
for the most common instruction lengths. The result is a threefold
improvement in throughput at half the latency, dissipating only half
the power, and requiring about the same area as an existing
synchronous IA32 decoder running at 400 MHz.
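As a rough illustration of why completion detection rewards the average case, the following Python sketch estimates the average longest carry chain of a ripple-carry addition on random operands and leaves the comparison against the worst case of n bits to the reader. The function names, the uniformly random operand model, and the chain-length convention (the generating bit plus every bit the carry ripples through) are assumptions made here for illustration; they are not taken from [1].

```python
import random


def longest_carry_chain(a: int, b: int, n: int) -> int:
    """Longest run of bit positions covered by a single rippling carry."""
    longest = current = carry = 0
    for i in range(n):
        x = (a >> i) & 1
        y = (b >> i) & 1
        generate = x & y      # this bit creates a carry on its own
        propagate = x ^ y     # this bit passes an incoming carry along
        if generate:
            current = 1       # a new chain starts here
        elif propagate and carry:
            current += 1      # the incoming carry ripples through
        else:
            current = 0       # the chain (if any) ends
        carry = generate | (propagate & carry)
        longest = max(longest, current)
    return longest


def average_chain(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean longest carry chain for n-bit operands."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += longest_carry_chain(rng.getrandbits(n), rng.getrandbits(n), n)
    return total / trials
```

Calling `average_chain(n)` for increasing word widths and comparing the result with `n` gives a feel for how far the typical delay sits below the worst-case bound.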
In order to exploit average-case performance, it is necessary to have
an appropriate systematic method to analyze and optimize circuits and
specifications throughout the design process. This paper develops the
stochastic cycle period (SCP) as one such method. The SCP is a
performance analysis metric for timed circuits [3], and it is
integrated into the CAD tool ATACS. The SCP analysis technique
operates on timed stochastic Petri nets (TSPN), which are capable of
modeling choice and arbitrary delay distributions. Given a timed
asynchronous circuit modeled by a TSPN, the SCP is a sum of weighted
place delays that describes the average path in the TSPN. Each place
delay in the sum is the expected value of the delay distribution on
that place, and its weight shows its importance in the average delay
of the TSPN. An expression for the weights is derived by analyzing
finite unrollings of the TSPN. The expression can be evaluated using
basic statistics. The value of the SCP using the place delays
represents the average delay of a cycle in the TSPN.
Many methods for asynchronous performance analysis exist, and this
work strives to improve and extend these techniques. The SCP builds on
the cycle period in [4] by incorporating bounded delays with arbitrary
distributions and choice constructs in the specification. In [5], a
technique is presented that calculates the average time separations of
events (TSE) by analyzing finite unrollings of stochastic timed Petri
nets. The analysis of the weights in the SCP uses the finite unrolling
methods of [5]. However, the SCP uses the max diff algorithm from [3]
to derive the TSEs rather than a longest path analysis. The max diff
algorithm removes the need to maximize over all longest paths found
from each initially marked rule in the cut set of the TSPN and reduces
the need to minimize the initially marked places in the cut set.
Furthermore, the work in [5] only presents a single value that denotes
the average time separation between two events. Although [5] does
calculate the average-case delay of a cycle in the system, it does not
present any information about the paths in the system that constitute
that delay. This approach augments the analysis in [5] to extract the
constraining path from the TSPN. This path is used to derive the
weight values in the SCP.
Markovian analysis has also been applied to the performance analysis
of asynchronous circuits [6, 7]. SCP analysis avoids the computational
complexity of Markovian analysis and reduces the number of statistical
metrics to a manageable set. Moreover, the methods in [6, 7] either
present a metric for every edge or state in the reachable system, or
they reduce the information down to a single value. The SCP presents a
greatly reduced set of metrics, on the order of the number of places
in the TSPN, which denote the relative importance of different paths
in the asynchronous circuit.
This paper is organized as follows. Section 2 presents the system
model, describes the SCP, and briefly discusses how it is derived.
Section 3 gives an example of the SCP analysis for a simple enhanced
latch controller and discusses how it can be used to analyze and
optimize circuit performance. Finally, Section 4 reports runtimes for
the SCP on various designs and proposes areas of future work.
2. THE STOCHASTIC CYCLE PERIOD
The SCP uses a TSPN representation of timed circuits. Since delay in a
timed circuit is a complex function of process variation and
environment, delays for places in the TSPN are modeled as bounded
stochastic distributions. In practice, the TSPN behaves much like a
1-safe marked Petri net or Signal Transition Graph (STG), except that
transitions cannot fire immediately when they become enabled in a
marking. Rather, when a token enters a place, it undergoes a delay
that is prescribed by the stochastic distribution of that place. Once
the token has completed its prescribed delay, it becomes available to
waiting transitions. If enough tokens are available to a waiting
transition to enable it to fire, it fires instantaneously. A trigger
is defined as the last place to become available to a transition
causing it to fire [8]. The TSPN model forces interleaving semantics
and only allows a single token to become available at a time.
The SCP, $\Lambda$, is defined as a sum of weighted delays given as
$\Lambda = \sum_{(u,v) \in T \times T} w_{uv} d_{uv}$, where $T$ is
the set of all possible transitions allowed in the TSPN. The place
delay, $d_{uv}$, is the expected value of the delay for the
distribution at the place between transitions $u$ and $v$. This delay
can be related to the circuit by setting it to be the average measured
delay between a transition of $u$ and a transition of $v$ on the gate
implementing $v$ when the gate is triggered by $u$ (i.e., $u$ is the
last signal transition needed for $v$ to transition). The $w_{uv}$
value is a multiplier that determines the importance of $d_{uv}$ in an
average cycle of the circuit. A $w_{uv}$ value at or near 1 implies
that the associated delay is often contributing to the average delay
of the circuit. A $w_{uv}$ at or near 0 implies that the associated
delay rarely, if ever, contributes to the average delay of the
circuit. This means that the transition happens in parallel with some
other slower transition (or transitions).
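To make the weighted sum concrete, the following Python sketch evaluates it for a small set of places. Everything in it is an assumption for illustration: the transition names, the delay bounds, the weights, and the use of uniform distributions (whose expected value is the midpoint of the bounds) are hypothetical and do not come from any circuit in this paper.

```python
from typing import Dict, Tuple

Arc = Tuple[str, str]  # (u, v): the place between transitions u and v


def expected_uniform(lower: float, upper: float) -> float:
    """Expected value of a delay uniformly distributed on [lower, upper]."""
    return (lower + upper) / 2.0


def stochastic_cycle_period(weights: Dict[Arc, float],
                            delays: Dict[Arc, float]) -> float:
    """Sum of weight times expected place delay over all weighted places."""
    return sum(w * delays[arc] for arc, w in weights.items())


# Hypothetical delay bounds for three places.
bounds = {
    ("a+", "b+"): (8.0, 12.0),
    ("b+", "a-"): (4.0, 6.0),
    ("c+", "a-"): (1.0, 3.0),
}
delays = {arc: expected_uniform(lo, hi) for arc, (lo, hi) in bounds.items()}

# Hypothetical weights: the place ("c+", "a-") rarely constrains the cycle.
weights = {("a+", "b+"): 1.0, ("b+", "a-"): 0.9, ("c+", "a-"): 0.1}

scp = stochastic_cycle_period(weights, delays)
```

Here a place with a weight near 0 adds almost nothing to the sum even if its delay is large, which is the sense in which low-weight places are off the average path.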
Following notation in [5], the expression for the $w_{uv}$ values
develops as follows: let a timed execution ($\pi$) of a TSPN be an
acyclic event graph or an unfolding of the TSPN with all choice
resolved and places assigned delay values. An event $s^{(k)}$ is
defined as the $k$th instance of the transition $s$ in the timed
execution $\pi$. Formally, $\pi = (T_\pi, F_\pi, \tau)$, where $T_\pi$
is the set of all events, $F_\pi \subseteq T_\pi \times T_\pi$ is the
flow relation, and $\tau : T_\pi \to \mathbb{R}$ is a mapping function
where $\tau(s^{(k)})$ returns the time of the $k$th firing of $s$.
To generate a timed execution, it is first assumed that all initially
marked places in the TSPN receive tokens at time zero. Each token is
then assigned a random delay value sampled from its associated place
distribution. The system clock is then iteratively advanced to the
earliest time at which a token becomes available to waiting
transitions. If a token that becomes available enables a waiting
transition to fire, then the transition is fired, tokens are moved to
new places, and they are assigned random delays from their respective
place distributions.
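The procedure above can be sketched in Python as a small event-driven loop. This is a minimal sketch under stated assumptions, not the ATACS implementation: it handles only nets without choice, draws every delay from a uniform distribution between two bounds, and the function name `simulate` and the example net are hypothetical.

```python
import heapq
import random
from typing import Dict, List, Tuple

Place = Tuple[str, str]  # the place between transitions (u, v)


def simulate(places: Dict[Place, Tuple[float, float]],
             initial: List[Place],
             max_firings: int,
             seed: int = 0) -> List[Tuple[float, str, Place]]:
    """Return firings as (time, transition, trigger place), in firing order."""
    rng = random.Random(seed)
    inputs: Dict[str, List[Place]] = {}
    outputs: Dict[str, List[Place]] = {}
    for (u, v) in places:
        outputs.setdefault(u, []).append((u, v))
        inputs.setdefault(v, []).append((u, v))

    pending = []       # heap of (time the token becomes available, place)
    available = set()  # places whose token has finished its delay
    for p in initial:  # initially marked places receive tokens at time zero
        lo, hi = places[p]
        heapq.heappush(pending, (rng.uniform(lo, hi), p))

    firings = []
    while pending and len(firings) < max_firings:
        now, place = heapq.heappop(pending)  # advance clock to earliest token
        available.add(place)
        v = place[1]
        if all(p in available for p in inputs[v]):
            # 'place' was the last input to become available: the trigger.
            firings.append((now, v, place))
            for p in inputs[v]:
                available.discard(p)
            for p in outputs.get(v, []):
                lo, hi = places[p]
                heapq.heappush(pending, (now + rng.uniform(lo, hi), p))
    return firings


# Hypothetical fork-join ring: a forks to b and c, which join at d.
example_net = {
    ("d", "a"): (1.0, 2.0),
    ("a", "b"): (8.0, 12.0),
    ("a", "c"): (1.0, 3.0),
    ("b", "d"): (2.0, 4.0),
    ("c", "d"): (2.0, 4.0),
}
example_run = simulate(example_net, [("d", "a")], max_firings=40)
```

In this example the branch through `b` is always slower than the branch through `c`, so every firing of `d` records `("b", "d")` as its trigger.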
For a given pair of events $(s^{(k)}, t^{(k+\varepsilon)})$ in a timed
execution, their time separation is defined as
$\gamma^{(k)}(s, t, \varepsilon) = \tau(t^{(k+\varepsilon)}) - \tau(s^{(k)})$.
The critical path function, $P$, returns the sequence of events that
determine the value of $\gamma^{(k)}(s, t, \varepsilon)$. Formally,
$P$ is defined as














































This states that any pair of adjacent events
$(t_i^{(l)}, t_{i+1}^{(l+m)})$ in the sequence must be in the flow
relation as required by (1), and $(t_i^{(l)}, t_{i+1}^{(l+m)})$ must
satisfy the backtrace requirement of (2). (2) states that the time
separation of $t_i^{(l)}$ and the end of the sequence
$t_n^{(k+\varepsilon)}$ must be equal
















The $m$ term accounts for adjacent events on a new cycle boundary,
since it is possible to have $t_i$ in cycle $j$ and $t_{i+1}$ in cycle
$j+1$. Let $I : T_\pi \times T_\pi \times P \to \{0, 1\}$ be an
indicator function that is 1





























if the events $s^{(i)}$ and $t^{(l)}$ are found in the critical path
and form a pair in the flow relation $F_\pi$. Given a pair of
transitions $(u, v)$, an event trace $\pi$, and a critical path
$P^{(k)}(s, t, \varepsilon)$, the weight value $w_{uv}^{(k)}$


















which states that the weight of a given transition pair is equal to
the sum of $1/\varepsilon$ over every adjacent occurrence of events
$u^{(i)}$ and $v^{(l)}$ in the critical path
$P^{(k)}(s, t, \varepsilon)$. Let $\Pi$ represent the set of all
possible timed executions of a given TSPN. If $\pi$ is randomly
sampled from $\Pi$, then $w_{uv}^{(k)}$ is a random variable, and
$\{ w_{uv}^{(k)} : k = 1, 2, 3, \ldots \}$ is a random process.













which is defined as $w_{uv}$.
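The idea of reading weights off a critical path can be sketched in Python as follows. This is a simplified illustration and not the formal definition: it assumes that each event in a timed execution records the event that triggered it, that the backtrace reduces to following those trigger links from the end event back to the start event, and that each adjacent pair on the path contributes one over the number of cycles spanned. The event list and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# One event: (transition name, firing time, index of its trigger event or None)
Event = Tuple[str, float, Optional[int]]


def critical_path(events: List[Event], start: int, end: int) -> List[int]:
    """Follow trigger links backward from 'end' until 'start' is reached."""
    path = [end]
    while path[-1] != start:
        trigger = events[path[-1]][2]
        if trigger is None:
            raise ValueError("backtrace did not reach the start event")
        path.append(trigger)
    path.reverse()
    return path


def sample_weights(events: List[Event], start: int, end: int,
                   cycles: int) -> Dict[Tuple[str, str], float]:
    """Count each (u, v) pair on the critical path and divide by the cycles."""
    path = critical_path(events, start, end)
    counts = Counter((events[i][0], events[j][0])
                     for i, j in zip(path, path[1:]))
    return {pair: n / cycles for pair, n in counts.items()}


# Hypothetical two-cycle execution of a fork-join ring a -> {b, c} -> d -> a.
# In the first cycle b triggers d; in the second cycle c triggers d.
trace: List[Event] = [
    ("a", 1.0, None),
    ("c", 3.0, 0),
    ("b", 11.0, 0),
    ("d", 14.0, 2),
    ("a", 15.5, 3),
    ("b", 19.0, 4),
    ("c", 22.0, 4),
    ("d", 25.0, 6),
    ("a", 26.0, 7),
]
separation = trace[8][1] - trace[0][1]  # time between first and last firing of a
w = sample_weights(trace, start=0, end=8, cycles=2)
```

In this trace the pair `("d", "a")` lies on the path in both cycles, while each of the two branches lies on it in only one, so the branch pairs receive half the weight.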
3. EXAMPLE
Figure 1: The STG and circuit implementation of an enhanced latch
controller courtesy of [9].

Figure 1 shows the signal transition graph (STG) for the enhanced
latch controller from [9] and the circuit synthesized by ATACS. The
STG is translated to the TSPN model by making the transition delays
uniformly distributed across bounds that are set to be ➌❩➃✖❣➎➍ of the
delay values from SPICE shown in [9]. Before generating the circuit,
the stochastic cycle period for the enhanced latch controller is
compared with that of the simple latch controller from [9]. This is
done by using the expected delays of places in the TSPN as the
$d_{uv}$ delays in the SCP. This simple analysis shows the enhanced
latch controller to have an average-case performance that is
❳✖❚ ➄✎➏ times faster than the simple latch controller. This num-
ber correlates well to the ❳✖❚ ➐✉➑ times speedup shown in the
SPICE results from [9]. The slight difference is attributed to the
fact that the SPICE simulation results depend on a single input trace
with fixed event times, and that SPICE cannot consider process and
environment variations. It produces a fixed delay that is determined
by the inputs and the model used in the simulation.
The weights on the arcs of the STG in Figure 1 denote the $w_{uv}$
values from the SCP. The larger the weight, the greater the amount of
delay the arc contributes to the average-case performance of the
circuit. Therefore, if a trigger-dependent delay has the value
$d_{uv}$, then the amount of time that delay contributes to the
average cycle of the circuit is $w_{uv} \cdot d_{uv}$, where $w_{uv}$
is the weight multiplier from the SCP. Using these weights, further
optimizations can be applied to the circuit. According to the
weights, the delay for the
transition ➓+ is largely controlled by the trigger ➔+, and both ➔+
and ➓+ make significant contributions to the cycle period. Therefore,
➔+ should be near the output of the gate implementing ➓+ to optimize
its performance. For the falling transition of ➓−, ➔− controls the
delay, and thus ➔− should be near the output of the gate. The
transition ➣+ is triggered by Rin+ and ➓−, but ➓− is not directly






by Rin− and Lt+, but Rin− has negligible weight and is thus not a
contributor to the cycle period. Therefore, Lt+ should be moved near
the output of the gate for ➣−.
A more aggressive optimization of this circuit involves tightening
timing bounds to restrict triggers from the actual implementation. The
weights from the SCP can be used to identify trigger signals that do
not contribute to the cycle period. In this example, it is extremely
unlikely that Aout+ triggers ➓−. If the bounded delay for Aout+ is
tightened by 0.5%, the signal Aout+ is no longer needed in the
implementation of ➓−. Similarly, a tightening of about 0.5% removes
Rin− from the gate for ➣−. This shows how the SCP can be used to
restrict triggers that have low weights. Although this type of
optimization may seem aggressive, it is important to note that the
delay assumptions at the beginning of a design are typically very
conservative. As the design matures, the timing assumptions are
brought closer to their actual delay ranges. Moreover, at this point
the circuit designer has a better understanding of the amount of slack
found in the system. The SCP is designed to better utilize this slack.
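One way to turn the weights into a list of triggers worth examining is sketched below in Python. The threshold, the transition names, and the weight values are all hypothetical; the sketch only ranks candidates and does not check whether a tightened timing bound actually makes a trigger redundant, which still has to be verified on the timed circuit.

```python
from typing import Dict, List, Tuple

Arc = Tuple[str, str]  # (trigger transition, triggered transition)


def restriction_candidates(weights: Dict[Arc, float],
                           threshold: float = 0.01) -> List[Arc]:
    """Arcs whose weight is below the threshold, lowest weight first."""
    low = [arc for arc, w in weights.items() if w < threshold]
    return sorted(low, key=lambda arc: weights[arc])


# Hypothetical weights for the triggers of two gates.
example_weights = {
    ("p+", "q-"): 0.001,
    ("y-", "q-"): 0.98,
    ("r-", "s-"): 0.002,
    ("m+", "s-"): 0.97,
}
candidates = restriction_candidates(example_weights)
```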
Another possible optimization is transistor sizing. Looking at the
STG, transitions that make significant contributions to the delay of
the circuit can be readily identified. With this information, it is
possible to size the transistors in the gate implementations to favor
transitions that fall in the critical cycle. Consider the signal Lt.
The rising transition of Lt is on the critical path with a weight
value of 0.99, and the falling transition falls off the critical path
with a weight value of 0.15. With this information, it is possible to
skew the gate implementing Lt to favor the rising transition, since
that is the critical edge. This can easily be accomplished by
increasing the widths of the transistors involved in the rising
transition of Lt, with the transistor near the power rail having the
largest width.
4. RESULTS AND CONCLUSIONS
Early results for the SCP are promising and indicate that the
analysis applies to larger systems. To show this, we analyze a number
of enhanced latch controllers connected in series on a 550 MHz Pentium
III processor with ➄✎↔◗➐ MB of memory. For a 4-stage enhanced latch,
ATACS finds 2416 states in 222 seconds. The value of the SCP converges
to ❳✖❳↕➑↕➙✷❚ ➙➛➌❥❳✎❳✖❚ ➙❻❳ at a 95% confidence interval with a 1.0%
relative error in 132 seconds. This correlates with the computed TSE,
reported at the same confidence interval, of ❳✖❳✣➜➎➑✉❚ ➄❙➌③↔✷❚✌❳.
ATACS is unable to find the state space for a 5-stage enhanced latch
controller. However, the SCP is found in 150 seconds. This shows how
the SCP scales to larger systems.
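A stopping rule of the kind quoted above (a 95% confidence interval with a 1.0% relative error) can be sketched in Python as follows. The paper does not say which statistical procedure ATACS uses, so the normal-approximation interval, the running-variance update, and the toy cycle-time sampler are assumptions made for illustration only.

```python
import math
import random
from typing import Callable, Tuple

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def estimate_mean(sample: Callable[[], float],
                  rel_error: float = 0.01,
                  min_samples: int = 30,
                  max_samples: int = 100_000) -> Tuple[float, float, int]:
    """Sample until the 95% half-width is within rel_error of the mean.

    Returns (mean, half_width, number of samples). Assumes max_samples >= 2.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    while n < max_samples:
        x = sample()
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= min_samples:
            half_width = Z_95 * math.sqrt(m2 / (n - 1) / n)
            if half_width <= rel_error * abs(mean):
                return mean, half_width, n
    return mean, Z_95 * math.sqrt(m2 / (n - 1) / n), n


# Hypothetical cycle time: three uniformly distributed delays in series.
_rng = random.Random(1)


def toy_cycle_time() -> float:
    return _rng.uniform(8, 12) + _rng.uniform(2, 4) + _rng.uniform(1, 2)


mean, half_width, used = estimate_mean(toy_cycle_time)
```

Tightening `rel_error` increases the number of samples roughly with the inverse square of the requested error, which is the usual cost of asking for a narrower interval.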
In [2], an asynchronous instruction length decoder is presented that
makes extensive use of timed circuits. One of the principal timed
circuits is the tag unit shown in Figure 2. Tag units are arranged in
a matrix, and each column of the matrix is connected to a length
decoder. When a column is the first byte of an instruction, its tag
unit is alerted. The tag unit looks at the length of the instruction
and then forwards the tag to the column of the first byte of the next
instruction. This example illustrates our method applied to
specifications containing choice constructs, since the system must
choose both an instruction length and an incoming tag. For this
example, the SCP is determined in 35 seconds and reflects the
frequency of instruction lengths as described by the specification.
This paper has presented the SCP as a performance metric for timed
systems. It presented an expression for directly calculating the
$w_{uv}$ values in the SCP that does not require state space
exploration. The new method is based on finite unrollings of the TSPN
model and is faster than the previous simulation methods in [8]. This
paper has also presented methods of optimizing circuits for
average-case performance using the SCP: pin ordering, transistor count
reduction through trigger signal restriction, and unbalanced sizing to
favor important transitions in the SCP.












Figure 2: Intel RAPPID tag unit.

Future work includes transistor sizing and pin ordering, as well as
aiding the designer in identifying triggers to restrict from gate
implementations. We also plan to develop methods for finding good
bounded delays to use in the TSPN model and a method of sizing
transistors to meet the specified delays.
Acknowledgements
We are indebted to Peter A. Beerel of the University of Southern
California for helping us to better define the SCP, as well as for his
insight on its derivation. In addition, we would like to thank the
following people of the University of Utah who contributed to this
manuscript: Eric Peskin, Kip Killpack, and Robert Thacker.
5. REFERENCES
[1] Bruce Gilchrist, J. H. Pomerene, and S. Y. Wong. Fast carry logic for
digital computers. IRE Transactions on Electronic Computers,
EC-4(4):133–136, December 1955.
[2] Shai Rotem, Ken Stevens, Ran Ginosar, Peter Beerel, Chris Myers,
Kenneth Yun, Rakefet Kol, Charles Dike, Marly Roncken, and Boris
Agapiev. RAPPID: An asynchronous instruction length decoder. In
Proceedings of International Symposium on Advanced Research in
Asynchronous Circuits and Systems, pages 60–70, April 1999.
[3] Chris J. Myers. Computer-aided synthesis and verification of
gate-level timed circuits. PhD thesis, Stanford University, October 1995.
[4] Steven M. Burns. Performance analysis and optimization of
asynchronous circuits. PhD thesis, California Institute of Technology,
1991.
[5] Aiguo Xie, Sangyun Kim, and Peter A. Beerel. Bounding average
time separations of events in stochastic timed Petri nets with choice.
In Proceedings of International Symposium on Advanced Research in
Asynchronous Circuits and Systems, pages 94–107, April 1999.
[6] Aiguo Xie and Peter A. Beerel. Symbolic techniques for performance
analysis of timed systems based on average time separation of events.
In Proceedings of International Symposium on Advanced Research in
Asynchronous Circuits and Systems, pages 64–75. IEEE Computer Society
Press, April 1997.
[7] P. Kudva, G. Gopalakrishnan, and E. Brunvand. Performance analysis
and optimization for asynchronous circuits. In Proceedings of
International Conf. Computer Design (ICCD). IEEE Computer Society Press,
October 1994.
[8] Eric G. Mercer. Stochastic cycle period analysis in timed circuits.
Master’s thesis, University of Utah, 1999.
[9] S. B. Furber and J. Liu. Dynamic logic in four-phase micropipelines.
In Proceedings of International Symposium on Advanced Research in
Asynchronous Circuits and Systems. IEEE Computer Society Press,
March 1996.
