Performance Modeling of Asynchronous Pipelines -- An Overview  by Ren, Hongguang et al.
Procedia Engineering 29 (2012) 3788 – 3793
1877-7058 © 2011 Published by Elsevier Ltd.
doi:10.1016/j.proeng.2012.01.572
Available online at www.sciencedirect.com
2012 International Workshop on Information and Electronics Engineering (IWIEE) 
Performance Modeling of Asynchronous Pipelines -- An Overview
Hongguang Ren*, Zhiying Wang, Wei Shi 
School of Computer, National University of Defense Technology, Changsha 410073, China 
Abstract 
A quick estimate of the performance of an asynchronous pipeline is important during the design process. 
This paper provides an overview of performance modeling techniques for asynchronous pipelines. The 
techniques are classified into groups according to the pipeline structures for which they are suitable, and 
the techniques are compared with one another. Finally, we discuss the outlook for performance modeling of 
asynchronous pipelines. 
© 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of Harbin University 
of Science and Technology. 
Keywords: Asynchronous pipelines; Performance modeling; Overview; Petri nets 
1. Introduction 
According to the International Technology Roadmap for Semiconductors (ITRS) [1], more than 49% of 
the global signals in a chip will use asynchronous circuits. There are two reasons for this: 1) synchronous 
circuits are facing limits as semiconductor feature sizes shrink, e.g. the effort needed to solve the clock 
skew problem is becoming hardly affordable; 2) asynchronous circuits offer several potential advantages 
over their synchronous counterparts [2], especially the elimination of the clock skew problem and better 
power efficiency. Furthermore, asynchronous circuits have the potential to achieve average-case 
performance rather than the worst-case performance of synchronous circuits. 
* Corresponding author. E-mail address: ren@nudt.edu.cn 
Open access under CC BY-NC-ND license.
Measuring the performance of an asynchronous pipeline by simulation is direct; however, 
simulation-based methods have two major disadvantages: 1) the results are input specific, and it is very 
difficult to prepare a test vector that reflects every aspect of the design's performance; 2) simulation is 
time-consuming. Performance modeling techniques can overcome these limitations. This paper provides 
an overview of the modeling techniques used to estimate the performance of asynchronous pipelines. 
We divide the asynchronous pipelines into several types. For different types of asynchronous 
pipelines, different modeling techniques are used. We also give a comparison of different modeling 
techniques. 
2. Preliminaries 
In a synchronous pipeline, a global clock controls the propagation of data through the data path, 
which usually consists of combinational logic and registers. In an asynchronous pipeline there is no 
global clock. Communication between adjacent stages is carried out over local handshake channels, 
which contain both data and control. Unlike in a synchronous pipeline, each stage of an asynchronous 
pipeline may have its own independent delay, and this has no effect on the correctness of the circuit. 
Compared with the worst-case performance of synchronous pipelines, asynchronous pipelines have the 
potential to achieve average-case performance. Furthermore, asynchronous pipelines consume 
dynamic power only on demand, whereas the global clock in a synchronous pipeline never stops ticking, 
which may waste power when the pipeline is idle. 
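The worst-case versus average-case contrast can be illustrated with a small numerical sketch. It is an idealization and not a model from the surveyed papers: it assumes a single stage whose delay depends on the data, ignores handshake overhead, and uses made-up delay values; the function name `per_item_times` is hypothetical.

```python
def per_item_times(delays):
    """Return (worst_case, average_case) time per data item for one stage.

    Assumption: a synchronous stage must be clocked at the largest delay it
    can ever exhibit, while an idealized asynchronous stage spends only the
    actual delay on each item (handshake overhead is ignored).
    """
    if not delays:
        raise ValueError("at least one delay sample is required")
    worst_case = max(delays)
    average_case = sum(delays) / len(delays)
    return worst_case, average_case


# Illustrative delay samples (arbitrary time units), one per data item.
samples = [2.0, 2.0, 3.0, 9.0]
worst, average = per_item_times(samples)
print("synchronous time per item :", worst)
print("asynchronous time per item:", average)
```

With these samples the rare slow item dictates the synchronous clock period, whereas the idealized asynchronous stage only pays for it when it actually occurs.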
Various structures of asynchronous pipelines can be formed using different handshake components. 
Although many other handshake components, e.g. components in Balsa [3], can be used to implement 
asynchronous pipelines, this paper focuses on a set of basic components used in [4] that is sufficient to 
implement data-driven asynchronous pipelines. 
3. Performance Modeling of Different Asynchronous Pipelines 
3.1. Performance Modeling of Linear Acyclic Asynchronous Pipelines 
A linear acyclic asynchronous pipeline is an asynchronous pipeline that contains no branches and no 
cycles when viewed as a graph of components and channels. It is the simplest asynchronous pipeline, and 
also the most common structure used for high-performance asynchronous pipelines. A survey of these 
high-performance asynchronous pipelines can be found in [5]. 
Using only the Operator and Latch components, we can construct a typical linear data-driven 
asynchronous pipeline as shown in Fig. 1. Each stage consists of an Operator component and a Latch 
component. For a linear data-driven asynchronous pipeline of $L$ stages as shown in Fig. 1, unfolded 
dependency graphs [6] can be used to model the handshaking signals of the interfaces between adjacent 
stages. Under a variable-delay model, where the delay of each stage may vary between a lower and an 
upper bound, the throughput of a linear data-driven asynchronous pipeline can be obtained analytically. 
For each stage, we assume that the delay between the receipt of the last input and the production of 
either of the two outputs is bounded from below by $\delta$ and from above by $\Delta$. The response 
times of the environments, i.e. the delay of the input (output) environment in issuing a request 
(acknowledge) after it receives an acknowledge (request), are assumed to be fixed at $\delta_E$, where 
$\delta_E \ge \delta$.
[Figure: four stages, Stage0 to Stage3; each stage contains an Operator (O0 to O3) and a Latch (L0 to L3), with request/acknowledge handshake signals labeled r0/a0 through r4/a4.]
Fig. 1. A linear asynchronous pipeline 
The response time of a pipeline, defined as the delay between a request and the succeeding 
acknowledgement at the first stage, can be calculated using the formulas obtained in [6]. It has been 
shown that the worst-case response time $W_R$ of the linear asynchronous pipeline is bounded from above 
by $\delta_E + (L + 1)(\Delta - \delta)$, while the average-case response time $A_R$ is bounded from 
above by $\delta_E + \Delta - \delta$, which is independent of the pipeline length $L$. 
Suppose a 4-phase handshake protocol is used; then, taking $A_R$ at its upper bound, the throughput of 
a linear asynchronous pipeline can be modeled as 
$tpt_{4Phase} = 1/(2(A_R + \delta_E)) = 1/(4\delta_E + 2\Delta - 2\delta)$.
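The bounds and the 4-phase throughput expression above are simple enough to evaluate directly. The following is a minimal sketch; the function names and the numeric delay values are illustrative assumptions rather than anything taken from [6], and `delta_lo`, `delta_hi` and `delta_e` stand for $\delta$, $\Delta$ and $\delta_E$.

```python
def response_time_bounds(delta_e, delta_lo, delta_hi, stages):
    """Return the upper bounds (W_R, A_R) on the worst-case and average-case
    response time of a linear pipeline with `stages` stages."""
    if not (0 <= delta_lo <= delta_hi) or delta_e < delta_lo or stages < 1:
        raise ValueError("expected 0 <= delta <= Delta, delta_E >= delta, L >= 1")
    spread = delta_hi - delta_lo                   # Delta - delta
    worst = delta_e + (stages + 1) * spread        # grows with pipeline length L
    average = delta_e + spread                     # independent of L
    return worst, average


def four_phase_throughput(delta_e, delta_lo, delta_hi):
    """Throughput model 1 / (2 * (A_R + delta_E)) with A_R at its upper bound."""
    # The average-case bound does not depend on L, so any valid L works here.
    _, average = response_time_bounds(delta_e, delta_lo, delta_hi, stages=1)
    return 1.0 / (2.0 * (average + delta_e))


# Illustrative values in arbitrary time units (assumed, not measured).
for length in (4, 16):
    print(length, response_time_bounds(2.0, 1.0, 1.5, length))
print(four_phase_throughput(2.0, 1.0, 1.5))
```

Only the worst-case bound changes when the pipeline is lengthened from 4 to 16 stages, which matches the observation that the average-case bound does not depend on $L$.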
3.2. Performance Modeling of Asynchronous Pipeline Rings 
Equations and graphical results for asynchronous pipeline rings are given in [7]. Consider an 
asynchronous pipeline ring containing $N$ stages and holding $K$ tokens, in which a computation round 
needs $G$ stages to finish. The total latency of the ring, denoted by $\lambda$, is the delay between the 
introduction of a new data token into the ring and the removal of the corresponding processed token after 
the number of iterations necessary for the token to have passed through $G$ function evaluation stages in 
all. The overall throughput of the pipeline ring is then given by $T = K/\lambda$.
[Figure: throughput $T$ plotted against the number of tokens $K$. The legible labels are the curves $T = K/(G L_f)$, $T = N/(G P)$ and $T = (N/S - K)/(G L_r)$, and a region marked "Bubble Limited".]
Fig. 2. Throughput vs. $K$ (tokens) with $N$ (stages) constant [7] 
Fig. 3. Throughputs of two parallel composed pipelines 
Fig. 2 shows the throughput $T$ versus the number of tokens $K$ for a fixed number of stages $N$, where 
$L_f$ is the forward latency of a stage, $L_r$ is the reverse latency of a stage, $P$ is the local minimum 
cycle time, which includes the delays of all the transitions necessary for a stage to evaluate, reset and 
become enabled again for the evaluation of the next token, and $S$ is the number of stages required to 
contain a token. The main drawback of Williams' method [7] is that the pipeline rings to be analyzed must 
have fixed-delay stages. In [8], the authors give a more general analysis method for pipeline rings with 
arbitrary stochastic delays, which is also based on dependency graphs. 
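The throughput curve of Fig. 2 can be sketched as the minimum of three limits. This is a reading of the labels in Fig. 2 and not a formula quoted from [7]: it assumes the three branches are $K/(G L_f)$ (few tokens), $N/(G P)$ (local cycle time) and $(N/S - K)/(G L_r)$ (few bubbles), and the function name, regime names and numeric values are hypothetical.

```python
def ring_throughput(k, n, g, l_f, l_r, p, s):
    """Return (throughput, limiting_regime) for a fixed-delay pipeline ring.

    k: tokens in the ring, n: stages, g: stages per computation round,
    l_f / l_r: forward / reverse stage latency, p: local minimum cycle time,
    s: stages required to contain one token.
    Assumption: the curve is the pointwise minimum of the three branches
    read off Fig. 2.
    """
    if not 0 < k < n / s:
        raise ValueError("token count must satisfy 0 < K < N/S")
    limits = [
        (k / (g * l_f), "data limited"),             # too few tokens
        (n / (g * p), "cycle-time limited"),         # every stage is busy
        ((n / s - k) / (g * l_r), "bubble limited"), # too few empty slots
    ]
    return min(limits)  # compares throughput first, regime name on ties


# Illustrative ring: 12 stages, 24 stage evaluations per computation.
for tokens in (1, 2, 5):
    print(tokens, ring_throughput(tokens, n=12, g=24, l_f=1.0, l_r=2.0, p=8.0, s=2))
```

Sweeping the token count moves the ring from the data-limited branch through the flat cycle-time-limited region into the bubble-limited branch, which is the canopy shape that Fig. 2 describes.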
3.3. Performance Model of Nonlinear Pipelines without Choices 
A nonlinear asynchronous pipeline without choices (i.e., without conditional computation constructs) 
may be built from the Fork, Join, Operator and Latch components. Without the Steer and Merge 
components, there is no non-determinism when the asynchronous pipeline runs. 
Three typical techniques can be used to model this type of asynchronous pipeline: the unfolded process 
graph [9-10], the marked graph [11] and the canopy graph [12]. When either of the first two models is 
used, algorithms that find the steady state of the model must be employed. The cycle period of the 
asynchronous pipeline can be calculated by measuring the time separation of events (TSE) [9], which 
directly reflects the performance of the asynchronous pipeline. In addition, theory based on Markov 
chains can be used to analyze the executions of the models [13-15]. Since the performance of an 
asynchronous pipeline is highly dependent on the slack and the number of tokens, the pipeline may 
exhibit dynamic behavior [12]. The canopy graph is a useful tool for modeling this dynamic performance. 
Using different rules, canopy graphs can be composed to form a new canopy graph that models the 
performance of a composition of asynchronous pipelines. For example, the canopy graph of the parallel 
composition of two pipelines shown in Fig. 3 is the intersection of the original curves, so the resulting 
peak performance is no greater than that of either pipeline. 
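The parallel composition rule can be sketched numerically. The sketch assumes that each canopy graph is sampled as a list of throughput values on the same horizontal grid, and that "intersection of the curves" means the region lying under both curves, i.e. their pointwise minimum; the sample values and the function name are made up for illustration and are not taken from [12].

```python
def parallel_composition(canopy_a, canopy_b):
    """Compose two sampled canopy graphs in parallel.

    Assumption: both lists hold throughput values at the same sample points,
    and the composed canopy is the pointwise minimum of the two curves.
    """
    if len(canopy_a) != len(canopy_b):
        raise ValueError("both canopy graphs must use the same sample points")
    return [min(a, b) for a, b in zip(canopy_a, canopy_b)]


# Two illustrative canopies whose peaks sit at different positions.
pipeline_a = [0.0, 0.2, 0.4, 0.3, 0.2, 0.1, 0.0]
pipeline_b = [0.0, 0.1, 0.2, 0.3, 0.4, 0.2, 0.0]
composed = parallel_composition(pipeline_a, pipeline_b)
print(composed)
print(max(composed), "<=", min(max(pipeline_a), max(pipeline_b)))
```

Because the two peaks do not coincide, the composed peak is strictly lower than the peak of either pipeline in this example, which is the effect the text describes.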
3.4. Performance Model of Nonlinear Pipelines with Choices 
All the basic components can be used when constructing nonlinear asynchronous pipelines with 
choices. Since the performance of the pipeline is highly dependent on the choices made during 
execution, some stochastic analysis has to be used. The Event-Rule system [16] and the timed Petri net 
with stochastic behaviors [17-18] are two typical tools used to model the performance of nonlinear 
asynchronous pipelines with choices. Both modeling methods reduce to a linear programming problem, 
which can be very time-consuming to solve when the design is large. Next, we briefly introduce the 
Petri net [19] based method. 
The Petri net model of the whole asynchronous pipeline can be formed by composing the Petri net 
models of the components in the pipeline. Delay information can be attached to the transitions or places 
of the Petri net, forming a timed Petri net. By running a timed execution of the model, the performance 
of the asynchronous pipeline can be obtained. The biggest challenge for both the Petri net based and the 
Event-Rule based methods is that the computational complexity can be very high when the design is 
complicated and contains many components. It has been proved that the minimal cycle time problem 
(MCTP) [20] for P-invariant Petri nets is NP-complete. 
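A timed execution can be sketched for the simplest case. The sketch below is restricted to a timed marked graph, a choice-free subclass of Petri nets in which every place has exactly one input and one output transition, with fixed delays on transitions; it does not cover the stochastic, choice-bearing nets of [17-18]. The function name, the data layout and the example net are assumptions made for illustration.

```python
def simulate_cycle_time(delays, places, firings=200):
    """Estimate the steady-state cycle time of a timed marked graph.

    delays: dict mapping each transition name to its firing delay.
    places: list of (source, target, initial_tokens) triples.
    The k-th firing of a transition starts once every input place holds a
    token: an initial token (available at time 0) or the token produced by
    the (k - initial_tokens)-th firing of the source transition.
    """
    inputs = {t: [] for t in delays}
    for src, dst, tokens in places:
        inputs[dst].append((src, tokens))

    start = {}  # (transition, k) -> start time of its k-th firing
    for k in range(firings):
        pending = set(delays)
        while pending:
            progressed = False
            for t in sorted(pending):
                # Token-free input places need the source's k-th firing first.
                if all(m > 0 or (u, k) in start for u, m in inputs[t]):
                    arrivals = [start[(u, k - m)] + delays[u]
                                for u, m in inputs[t] if k - m >= 0]
                    start[(t, k)] = max([0.0] + arrivals)
                    pending.remove(t)
                    progressed = True
            if not progressed:
                raise ValueError("a token-free cycle exists: the net deadlocks")

    observed = sorted(delays)[0]
    half = firings // 2  # skip the start-up transient
    return (start[(observed, firings - 1)] - start[(observed, half)]) / (firings - 1 - half)


# Illustrative three-stage ring holding a single token (assumed delays).
ring_delays = {"a": 2.0, "b": 3.0, "c": 1.0}
ring_places = [("a", "b", 0), ("b", "c", 0), ("c", "a", 1)]
print(simulate_cycle_time(ring_delays, ring_places))
```

For a strongly connected timed marked graph the long-run cycle time equals the largest ratio of total delay to token count over all cycles, so the single-token ring above settles at the sum of its three delays. Once choices are added the model is no longer a marked graph and this simple recurrence does not apply.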
Gill et al. [21] further developed the canopy graph based model to cover choices in asynchronous 
pipelines. Its runtime is much lower than that of Petri net based methods. A limitation is that the method 
is only applicable to asynchronous pipelines with hierarchical topologies, e.g. pipelines in Balsa. 
Table 1. Comparison of different performance modeling techniques for asynchronous pipelines 
Modeling tool      Analysis method    Suitable pipeline structures   Runtime 
Dependency graph   Analytical         Linear, rings                  Quick 
Process graph      Unfolding          Non-linear without choices     Medium 
Marked graph       Timing execution   Non-linear without choices     Medium 
Canopy graph       Analytical         Non-linear                     Quick 
Timed Petri nets   Unfolding          Non-linear with choices        Slow 
Event-Rule         Timing execution   Non-linear with choices        Slow 
4. Conclusions 
We introduce different models for different pipelines because each model suits some pipeline 
structures better than others. Table 1 gives a qualitative comparison of the modeling methods, which 
may offer useful hints for choosing a proper method for a specific pipeline. As asynchronous design 
becomes more and more important, it is desirable to develop more powerful CAD tools for asynchronous 
pipelines, and performance estimation plays an important role in this process. Currently, the modeling 
methods are either quick but limited or general but time-consuming. The timed Petri net is a general 
modeling tool. It is suitable for designs whose model state space is not large. However, as designs 
become larger and larger, the timed Petri net based methods are reduced to simulation-based methods. 
Acknowledgements 
We gratefully acknowledge the support of the National Natural Science Foundation of China under 
Grants No. 60873015 and No. 61070037. 
References
[1] The International Technology Roadmap for Semiconductors, 2009 edition, Design chapter. Semiconductor Industry Association, 2009 
[2] Van Berkel C, Josephs M, Nowick S. Applications of asynchronous circuits. Proceedings of the IEEE, 1999, 87, 223-233 
[3] Edwards D, Bardsley A. Balsa: An Asynchronous Hardware Synthesis Language. The Computer Journal, 2002, 45, 12-18 
[4] Bardsley A, Tarazona L, Edwards D. Teak: A Token-Flow Implementation for the Balsa Language. In: Proc. of ACSD, 2009, 23-31 
[5] Nowick SM, Singh M. High-Performance Asynchronous Pipelines: An Overview. IEEE Design and Test of Computers, 2011, 28, 8-22 
[6] Ebergen J, Berks R. Response-time properties of linear asynchronous pipelines. Proceedings of the IEEE, 1999, 87, 308-318 
[7] Williams T. Analyzing and improving the latency and throughput performance of self-timed pipelines and rings. In: Proc. of ISCAS, 1992, 2, 665-668 
[8] Greenstreet MR, Steiglitz K. Bubbles can make self-timed pipelines fast. The Journal of VLSI Signal Processing, 1990, 2, 139-148 
[9] Hulgaard H, Burns S, Amon T, Borriello G. An algorithm for exact bounds on the time separation of events in concurrent systems. IEEE Trans. on Computers, 1995, 44, 1306-1317 
[10] Chakraborty S, Yun K, Dill D. Timing analysis of asynchronous systems using time separation of events. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 1999, 18, 1061-1076 
[11] McGee P, Nowick S. An efficient algorithm for time separation of events in concurrent systems. In: Proc. of ICCAD, 2007, 180-187 
[12] Lines AM. Pipelined Asynchronous Circuits. Master's thesis, Caltech University, California, USA, 1998 
[13] Kudva P, Gopalakrishnan G, Brunvand E, Akella V. Performance analysis and optimization of asynchronous circuits. In: Proc. of ICCD, 1994, 221-224 
[14] Nowick S, Coffman E, McGee P. Efficient performance analysis of asynchronous systems based on periodicity. In: Proc. of CODES+ISSS, 2005, 225-230 
[15] Xie A, Beerel P. Symbolic techniques for performance analysis of timed systems based on average time separation of events. In: Proc. of ASYNC, 1997, 64-75 
[16] Burns SM. Performance Analysis and Optimization of Asynchronous Circuits. PhD thesis, California Institute of Technology, Pasadena, California, USA, 1991 
[17] Mercer E, Myers C. Stochastic cycle period analysis in timed circuits. In: Proc. of ISCAS, 2000, 2, 172-175 
[18] Xie A, Beerel PA. Performance Analysis of Asynchronous Circuits and Systems using Stochastic Timed Petri Nets. Hardware Design and Petri Nets, 1999, 239-268 
[19] Murata T. Petri nets: Properties, analysis and applications. Proceedings of the IEEE, 1989, 77, 541-580 
[20] Ramamoorthy C, Ho G. Performance Evaluation of Asynchronous Concurrent Systems Using Petri Nets. IEEE Trans. on Software Engineering, 1980, SE-6, 440-449 
[21] Gill G, Gupta V, Singh M. Performance estimation and slack matching for pipelined asynchronous architectures with choice. In: Proc. of ICCAD, 2008, 449-456 
