Statistical Modeling of Pipeline Delay and Design of Pipeline under
  Process Variation to Enhance Yield in sub-100nm Technologies by Datta, Animesh et al.
Statistical Modeling of Pipeline Delay and Design of Pipeline under Process 
Variation to Enhance Yield in sub-100nm Technologies*
Animesh Datta, Swarup Bhunia, Saibal Mukhopadhyay, Nilanjan Banerjee, and Kaushik Roy 
Dept. of ECE, Purdue University, West Lafayette, IN, 47907, USA 
<adatta, bhunias, sm, nbanerje, kaushik> @ecn.purdue.edu 
* The work is sponsored in part by Marco Gigascale Systems Research
Center (GSRC), Intel Corp. and Semiconductor Research Corp. (SRC).
Abstract
Operating frequency of a pipelined circuit is determined by the 
delay of the slowest pipeline stage. However, under statistical 
delay variation in sub-100nm technology regime, the slowest 
stage is not readily identifiable and the  estimation of the pipeline 
yield with respect to a target delay is a challenging problem. We 
have proposed analytical models to estimate yield for a pipelined 
design based on delay distributions of individual pipe stages. 
Using the proposed models, we have shown that change in logic 
depth and imbalance between the stage delays can improve the 
yield of a pipeline. A statistical methodology has been developed 
to optimally design a pipeline circuit for enhancing yield. 
Optimization results show that proper imbalance among the
stage delays in a pipeline improves design yield by 9% for the
same area and performance (or reduces area by about 8.4%
under a yield constraint) compared with a balanced design.
1. Introduction
Increasing inter-die and intra-die variations in the process 
parameters, such as channel length, width, threshold voltage 
etc., result in large variation in the delay of logic circuits [1]. 
Consequently, estimating circuit performance and designing 
high-performance circuits with high yield (probability that the 
design will meet certain delay target) under parameter 
variations have emerged as serious design challenges in sub-
100nm regime [1, 2, 5]. Statistical analysis of delay and 
techniques to enhance yield in combinational circuits have 
been proposed [2, 3]. In the high-performance design, the 
throughput is primarily improved by pipelining the data and 
control paths [4]. In a synchronous pipelined circuit, the 
throughput is limited by the slowest pipe segment (segment 
with maximum delay) [4]. Under parameter variations, as the 
delays of all the stages vary considerably, the slowest stage is 
not readily identifiable. The variation in the stage delays thus
results in variation in the overall pipeline delay (which
determines the clock frequency and throughput). Thus, a
statistical delay model is necessary to predict the delay 
distribution of the pipeline. 
Traditionally, the pipeline clock frequency has been enhanced 
by: a) increasing the number of pipeline stages, which, in 
essence, reduces the logic depth and hence, the delay of each 
stage [4]; and b) balancing the delay of the pipe stages, so that
the maximum of the stage delays is minimized [4]. However, it
has been shown that if intra-die parameter variation is 
considered, reducing the logic depth increases the variability 
(defined as the ratio of standard deviation and mean) [5]. The 
effect of balancing the stage delays on the overall delay under 
parameter variation also needs to be analyzed. Thus, 
traditional deterministic approaches for maximizing the 
pipeline throughput need to be reinvestigated to understand 
their effect on pipeline yield under parameter variations.   
In this paper, we have developed statistical delay models and 
proposed a statistical design flow for enhancing the yield of a 
pipelined design considering inter-die and intra-die variations. 
In particular, in this paper, we have developed: 
• Analytical models to estimate the mean and the standard
  deviation of the overall pipeline delay from the individual
  stage delay distributions.
• Analytical models to predict yield of a pipelined design.
• Analysis of the effect of logic depth and imbalance in the
  stage delays and correlation among the different stage
  delays on the yield of a pipelined design.
• A statistical design methodology to minimize the area (or
  power) of a pipelined design under a target yield and
  performance constraint.
Our analysis shows that, under large intra-die process
variations, reducing the logic depth decreases the yield, which
is in accordance with [5]. However, if inter-die process
variation is dominant, the traditional approach of reducing the
logic depth results in better yield. It has also been shown that
balancing the delays of different pipeline stages does not
necessarily maximize the yield. Under parameter variations,
proper imbalance among the stage delays results in yield
improvement. We have come up with a simple heuristic
(based on the area vs. delay trend of a circuit) to
systematically incorporate proper imbalance among pipeline
stages, so that yield can be improved with minimum area
penalty. The heuristic is used to improve the yield of an example
4-stage pipelined design by 9% over a balanced design (with
stages independently optimized for equal stage delay). Based
on the above observation, we have designed a fast gate-level
sizing algorithm for the complete pipeline to minimize total
area under a yield constraint. This optimization step reduces
the area by 8.4% (while ensuring an 80% target yield)
compared to a balanced design.
Figure 1: A simple 5-stage pipeline (IF, ID, EX, MEM, WB, separated by
latches) and its throughput for (a) static delay model (stage delays
5ns, 5ns, 3ns, 6ns, 4ns): throughput = 1 job / max(stage delay)
= 1 job / 6ns; (b) statistical delay model: throughput
= 1 job / max(SD1, SD2, SD3, SD4, SD5)
Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE’05) 
1530-1591/05 $ 20.00 IEEE 
2. Yield estimation of a pipeline design under 
parameter variation 
The delay of a pipeline is determined by the slowest pipe stage 
(Fig. 1(a)) [4]. The variation in process parameters results in a 
significant variation in the delay of each pipe stage and hence, 
the overall pipeline delay [1, 5].  
2.1. Variation in pipeline delay and yield
The delay of a pipeline stage (SD) consists of the clock-to-Q 
delay of the latch (TC-Q), propagation delay through the 
combinational logic (Tcomb) and the setup time (Tsetup) [4]. Both 
the inter-die and the intra-die distribution in process 
parameters result in the variation of the delay of the pipe 
stages. Inter-die distribution shifts the delays of all stages in
the same direction (i.e. either all increase or all decrease).
Due to intra-die distribution, the delay of different stages may 
shift in different directions. The random component of the 
intra-die distribution (e.g. Vth variation due to Random Dopant 
Fluctuation [6]) makes the delays of the different stages in a 
pipeline completely independent of each other (uncorrelated 
stage delays). On the other hand, the systematic variation in 
the parameters (e.g. spatially correlated W, L, Tox variations) 
makes the delays of the different stages correlated [1]. 
Moreover, the stages can also be electrically correlated if the 
delay of the combinational logic of one stage affects the delay 
of the subsequent stage [6]. 
Let us consider a pipeline consisting of N stages (Fig. 1). If 
SDi denotes the delay of i-th stage, then the overall pipeline 
delay (TP) is the maximum of N individual stage delays and is 
given by [4]:  
$$T_P = \max_{i=1,\dots,N}(SD_i) = \max_{i=1,\dots,N}\left(T^{i}_{C\text{-}Q} + T^{i}_{comb} + T^{i}_{setup}\right) \qquad (1)$$
The stage delays can be represented as correlated Gaussian
random variables (SDi ~ N(µi, σi)) with mean µi and standard
deviation σi [1, 3]. If we neglect the spatial and the electrical
correlation, the stage delays can be considered as independent
random variables. The overall pipeline delay TP given in (1)
will also be a random variable with the mean µT and the
standard deviation σT.
The probability (PD) that the pipelined design will meet a 
specific delay requirement (say TTARGET) is given by:  
$$P_D = Yield = \Pr\{T_P \le T_{TARGET}\} = \Pr\left\{\max_{i=1,\dots,N} SD_i \le T_{TARGET}\right\} \qquad (2)$$
We define this probability as a measure of the yield of the
pipelined design. In this section we will discuss methods to
analytically estimate the overall delay distribution of the
pipeline (i.e. µT, σT) and the yield (i.e. PD).
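As a point of reference for the analytical models that follow, equations (1) and (2) can be evaluated by brute-force sampling. The sketch below is a minimal illustration and not the SPICE-based Monte-Carlo flow used in this paper: it assumes independent Gaussian stage delays, and the function name `monte_carlo_yield` and all numerical values are hypothetical.

```python
import random

def monte_carlo_yield(means, sigmas, t_target, samples=20000, seed=0):
    """Estimate Pr{max_i SD_i <= t_target} for independent Gaussian stage delays."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(samples):
        # Equation (1): the pipeline delay is the maximum of the stage delays.
        t_p = max(rng.gauss(m, s) for m, s in zip(means, sigmas))
        # Equation (2): a sample counts toward yield if it meets the target.
        if t_p <= t_target:
            passed += 1
    return passed / samples

# Hypothetical 5-stage pipeline (delays in ps); values are illustrative only.
means = [190.0, 195.0, 185.0, 198.0, 192.0]
sigmas = [4.0, 4.0, 4.0, 4.0, 4.0]
estimated_yield = monte_carlo_yield(means, sigmas, t_target=205.0)
```

Sampling of this kind is simple but slow to converge, which is one motivation for the closed-form estimates developed in the rest of this section.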
2.2. Estimation of distribution of pipeline delay
The mean (µT) and the standard deviation (σT) of the pipeline
delay (TP) depend on the mean (µi) and the standard deviation
(σi) of each stage. A lower bound on µT is given by (using
Jensen's inequality [7, 8]):
$$\mu_T = E[T_P] = E\left[\max_{i=1,\dots,N} SD_i\right] \ge \max_{i=1,\dots,N}\{E[SD_i]\} \qquad (3)$$
The above equation states that the mean of the overall pipeline
delay is never smaller than the maximum of the mean delays of
all the stages. To estimate µT and σT more closely, we
approximate TP as [8]:
$$\begin{aligned}
T_P &= \max(SD_1, SD_2, \dots, SD_{n-1}, SD_n) \\
    &= \max(SD_1, SD_2, \dots, \max(SD_{n-1}, SD_n)) \\
    &\approx \max(SD_1, SD_2, \dots, \max(SD_{n-2}, N_{n-1,n}))
\end{aligned} \qquad (4)$$
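where Nn-1,n represents the normal approximation to max(SDn-1, SDn).
The mean (µn-1,n) and the standard deviation (σn-1,n) of
Nn-1,n can be approximated as [8]:
$$\begin{aligned}
\text{Mean of } N_{n-1,n}:\quad & \mu_{n-1,n} = m_1 \\
\text{Std. deviation of } N_{n-1,n}:\quad & \sigma_{n-1,n} = \sqrt{m_2 - m_1^2} \\
\text{where}\quad & m_1 = \mu_{n-1}\,\Phi(\alpha) + \mu_n\,\Phi(-\alpha) + a\,\varphi(\alpha) \\
& m_2 = (\mu_{n-1}^2 + \sigma_{n-1}^2)\,\Phi(\alpha) + (\mu_n^2 + \sigma_n^2)\,\Phi(-\alpha) + (\mu_{n-1} + \mu_n)\,a\,\varphi(\alpha) \\
& a^2 = \sigma_{n-1}^2 + \sigma_n^2 - 2\,\sigma_{n-1}\,\sigma_n\,\rho_{n-1,n}; \quad \alpha = (\mu_{n-1} - \mu_n)/a
\end{aligned} \qquad (5)$$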
where Φ represents the Cumulative Distribution Function
(CDF) and φ represents the Probability Density Function
(PDF) (φ(α) = (2π)^(-1/2) exp(-α^2/2)) of a standard normal (µ = 0
and σ = 1) Gaussian variable [8]. The correlation coefficient
between SDn-1 and SDn is given by ρn-1,n. The correlation
coefficient between Nn-1,n and SDn-2 (say ρ(SDn-2, Nn-1,n)) is
given by [8]:
$$\rho_{SD_{n-2},\,N_{n-1,n}} = \frac{\sigma_{n-1}\,\rho_{n-2,n-1}\,\Phi(\alpha) + \sigma_n\,\rho_{n-2,n}\,\Phi(-\alpha)}{\sigma_{n-1,n}} \qquad (6)$$
USDn-2, Nn-1,n) is used to estimate the mean and the standard 
deviation of max(SDn-2, Nn,n-1). The above process is repeated 
by taking two variables at a time and finally PT and VT are 
estimated.  
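The pairwise procedure of equations (4) to (6) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `clark_max` and `pipeline_delay_moments` are hypothetical, the stage correlations are supplied as a full matrix `corr`, and the degenerate case a = 0 (which the equations leave undefined) is handled by returning the variable with the larger mean.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal (mean 0, sigma 1)

def clark_max(mu1, s1, mu2, s2, rho):
    """Moments of the Gaussian fitted to max(X1, X2), following equation (5).

    Returns (mean, sigma, w1, w2) with w1 = Phi(alpha) and w2 = Phi(-alpha),
    which are reused by the correlation update of equation (6).
    """
    a = sqrt(max(s1 * s1 + s2 * s2 - 2.0 * rho * s1 * s2, 0.0))
    if a == 0.0:
        # Assumption: X1 - X2 is constant, so the max is the larger-mean variable.
        return (mu1, s1, 1.0, 0.0) if mu1 >= mu2 else (mu2, s2, 0.0, 1.0)
    alpha = (mu1 - mu2) / a
    w1, w2, pdf = _STD.cdf(alpha), _STD.cdf(-alpha), _STD.pdf(alpha)
    m1 = mu1 * w1 + mu2 * w2 + a * pdf
    m2 = (mu1**2 + s1**2) * w1 + (mu2**2 + s2**2) * w2 + (mu1 + mu2) * a * pdf
    return m1, sqrt(max(m2 - m1 * m1, 0.0)), w1, w2

def pipeline_delay_moments(means, sigmas, corr):
    """Estimate (mu_T, sigma_T) by folding the stages two at a time.

    corr[i][j] is the correlation coefficient between stage delays i and j.
    """
    n = len(means)
    mu, sg = means[-1], sigmas[-1]
    rho = [corr[k][n - 1] for k in range(n)]  # stage k vs. the running max
    for j in range(n - 2, -1, -1):
        new_mu, new_sg, w1, w2 = clark_max(means[j], sigmas[j], mu, sg, rho[j])
        if new_sg > 0.0:
            # Equation (6): correlation of every stage with the new running max,
            # clamped to [-1, 1] to absorb rounding error.
            rho = [
                max(-1.0, min(1.0, (sigmas[j] * corr[k][j] * w1 + sg * rho[k] * w2) / new_sg))
                for k in range(n)
            ]
        mu, sg = new_mu, new_sg
    return mu, sg

# Hypothetical 4-stage example with independent stages (identity correlation).
independent = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
mu_t, sigma_t = pipeline_delay_moments(
    [200.0, 198.0, 195.0, 190.0], [3.0, 3.0, 3.0, 3.0], independent)
```

For two independent standard normal variables the fitted mean is 1/sqrt(π), which is a convenient sanity check for any implementation of equation (5).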
2.3. Estimation of yield    
Using (2), the probability of meeting the delay target (PD), i.e.
the yield, can be estimated as [7]:
$$P_D = \Pr\left\{\max_{i=1,\dots,N} SD_i \le T_{TARGET}\right\} = \Pr\left\{\bigcap_{i=1,\dots,N} (SD_i \le T_{TARGET})\right\} \qquad (7)$$
Figure 2: Delay distribution of a 12-stage inverter chain pipeline (stage logic depth = 10) under process variation with (a) only
random Intra-die variation, (b) only Inter-die variation, (c) Inter- and Intra-die variation with both random and systematic
components. [Each panel plots the probability frequency distribution against delay (ps) for the Monte-Carlo and analytical results.]
The exact solution of (7) is possible by assuming the stage 
delays (SDi) to be independent Gaussian random variables, as 
shown below [7]:
$$P_D = \Pr\left\{\bigcap_{i=1,\dots,N} (SD_i \le T_{TARGET})\right\} = \prod_{i=1}^{N} \Phi\left(\frac{T_{TARGET} - \mu_i}{\sigma_i}\right) \qquad (8)$$
If the variables are correlated, such a simplification is not
possible. To estimate PD considering correlated SDi, we
approximate the overall pipeline delay (TP) as a Gaussian
random variable (with µT and σT estimated using (5)). Using
this assumption PD is given by [7]:
$$P_D = \Pr\{T_P \le T_{TARGET}\} = \Phi\left(\frac{T_{TARGET} - \mu_T}{\sigma_T}\right) \qquad (9)$$
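Equations (8) and (9) translate directly into code. The following is a minimal sketch; the function names `yield_independent` and `yield_gaussian` are hypothetical, and the example input uses the rounded analytical values for the 5 × 8 pipeline from Table-I.

```python
from math import prod
from statistics import NormalDist

_STD = NormalDist()  # standard normal (mean 0, sigma 1)

def yield_independent(means, sigmas, t_target):
    """Equation (8): product of per-stage pass probabilities (independent stages)."""
    return prod(_STD.cdf((t_target - m) / s) for m, s in zip(means, sigmas))

def yield_gaussian(mu_t, sigma_t, t_target):
    """Equation (9): yield when T_P is approximated as Gaussian with (mu_T, sigma_T)."""
    return _STD.cdf((t_target - mu_t) / sigma_t)

# Rounded analytical values for the 5 x 8 pipeline in Table-I (delays in ps).
table_row_yield = yield_gaussian(198.0, 2.72, 200.0)
```

With these rounded inputs the result is roughly 0.77, in line with the analytical yield reported for that configuration in Table-I; the small difference comes from the rounding of µT and σT.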
2.4. Model verification
We have estimated the delay distributions in different pipeline
structures (number of pipe stages × logic depth in a stage). The
models have been first verified by Monte-Carlo simulation of 
inverter chain pipelines with the transmission gate Master-
slave flip-flops in 70nm BPTM [9]. SPICE Monte-Carlo 
simulation is first used to determine the mean and the standard 
deviation of the delay of each stage. The simulated µi and σi
values for each stage are then fed into the proposed model to 
determine the distribution of the pipeline delay. It can be 
observed that, the delay distribution predicted by the proposed 
model closely matches the SPICE simulation result of a 5x8 
pipeline for: (a) only random intra-die variations (i.e. stage 
delays are considered as independent) (Fig. 2(a)) (b) only 
inter-die variations (i.e. stage delays are perfectly correlated) 
(Fig. 2(b)) and (c) both inter and intra-die distributions with 
spatial correlation (i.e. stage delays are partially correlated) 
(Fig. 2(c)). Close matches have also been observed for several
other pipeline configurations (Table-I). The major source of 
error in the proposed modeling method is the assumption that 
the maximum of two Gaussian variables is also a Gaussian one 
(i.e. Nn-1,n is Gaussian) [7, 8]. This assumption is valid only for 
two uncorrelated variables [7, 8]. Hence, the errors in the 
estimation of the mean and the standard deviation are expected 
to increase with the increase in the number of pipeline stages 
(Fig. 3(a)) and the correlation between the stage delays (Fig. 
3(b)) [7, 8]. It can be observed that the increase in error in the 
standard deviation is more significant in both cases. However, 
in all cases the error in the standard deviation and the mean is 
less than 3% and 0.2%, respectively. The modeling error also 
depends on the ordering of the variables SDi in (2). It has been 
shown that the error is the minimum if the variables are 
ordered in increasing (or decreasing) sequence of their means 
[7]. We have used this ordering in our estimation to minimize 
the modeling error.  
Table-I  Modeling and simulation results of delay
distribution and yield for different pipeline configurations

                                   Monte-Carlo              Analytical Model
Pipeline            Target Delay   µT     σT     Yield      µT     σT     Yield
Config.             (ps)           (ps)   (ps)   (%)        (ps)   (ps)   (%)
8 × 5               160            155    2.82   96         154    2.68   98.62
5 × 8               200            198    3.27   78         198    2.72   77.72
5 × *               215            210    3.67   92         210    3.42   93.00
5 × 8 inter         370            200    29.2   88         199    28.9   86.69
5 × 8 inter + intra 240            201    28.6   90         199    28.04  91.83

* denotes variable logic depths.
2.5. Estimation of the design space
Using the proposed models, we can estimate the design space 
for the mean and standard deviation of different stages of a 
pipeline that will satisfy a yield constraint. Assuming the 
overall delay of the pipeline to be Gaussian and using (3), the 
upper bound of the mean of each stage delay (µi) is given by:
Figure 4: Range of permissible mean and standard deviation
for each stage to meet a target yield. [Plot of sigma space against
mu space showing the relaxed upper bound, the equality bounds for
n1 and n2, the minimum mu and minimum sigma bounds, and the
realizable region between the realizable lower and upper bounds.]
Figure 3: Trend in modeling error with (a) number of stages
and (b) correlation coefficient. [Panel (a) plots % error in mean and
% error in standard deviation against the number of stages; panel (b)
plots % error in modeling of the mean (mu) and sigma against the
correlation coefficient.]
Figure 5: Variability of (a) stage delay with logic depth, (b) pipeline design delay with the number of stages, (c) pipeline
design delay with the simultaneous change of logic depths and number of stages (number of stages × stage logic depth = 120).
[Panel (a): normalized σ/µ ratio vs. logic depth in a stage for only random intra-die variation, intra + inter-die variation
(σVthInter = 20mV and 40mV), and only inter-die variation (σVthInter = 40mV). Panel (b): normalized σ/µ ratio vs. number of
stages for correlation coefficients 0.0, 0.2 and 0.5. Panel (c): σ/µ ratio vs. number of stages in the pipeline for
σVthInter = 0mV, 20mV and 40mV.]
$$\mu_i \le \mu_T \le T_{TARGET} - \sigma_T\,\Phi^{-1}(P_D) \qquad (10)$$
However, this bound does not provide any estimate of the
design space of σi of each stage. A relaxed upper bound on the
µi and σi for the i-th stage can be obtained (Fig. 4) by assuming
that any stage j ≠ i meets the yield requirement with probability
1 (since the maximum value of Φ(x) = 1), as shown below:
$$\mu_i + \sigma_i\,\Phi^{-1}(P_D) \le T_{TARGET} \qquad (11)$$
Equation (11) states that, if the mean and/or the standard
deviation of any stage falls outside this bound, no pipelined
design with that stage can ever meet the target delay and yield.
A more stringent bound is obtained by assuming uncorrelated
and equal stage delays and is given by:
$$\left[\Phi\left(\frac{T_{TARGET} - \mu_i}{\sigma_i}\right)\right]^{N_S} \ge P_D \;\Rightarrow\; \mu_i + \sigma_i\,\Phi^{-1}\left(P_D^{1/N_S}\right) \le T_{TARGET} \qquad (12)$$
where NS is the number of pipeline stages. Depending on NS,
it gives the set of values for µi and σi that can meet the target
yield (Fig. 4 shows 2 such equality bounds for NS = n1, n2
with n1 < n2). Note that there is a minimum bound on the µi
and σi which depends on the minimum allowable logic depth
and process specification (Fig. 4). Moreover, the µi and σi of a
combinational circuit are related parameters and the relation
determines the realizable design space for the µi and σi. For
example, if we model each stage delay to be a chain of NL
inverters then a simple relation between µi and σi is given by:
$$\mu_i = N_L\,\mu_{min} \ \text{and}\ \sigma_i = \sqrt{N_L}\,\sigma_{min} \;\Rightarrow\; \mu_i/\mu_{min} = \sigma_i^2/\sigma_{min}^2 \qquad (13)$$
where µmin and σmin are the mean and the standard deviation
of a minimum-sized inverter, respectively. Similarly, for a
maximum-sized inverter having parameters µmax and σmax, there
will be another bound on the realizable µi and σi (Fig. 4, curve
marked Realizable upper bound). Hence, for equal critical
stage delays, a realizable design region bounded by the set of
curves given by equations (12) and (13) is obtained (Fig. 4).
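The bounds (11) and (12) can be evaluated numerically to trace the curves of Fig. 4. The sketch below is illustrative only: the function name `max_stage_mean` and the numbers in the usage lines are hypothetical, and it simply rearranges the two inequalities for the largest admissible µi at a given σi.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal (mean 0, sigma 1)

def max_stage_mean(sigma_i, t_target, p_d, n_stages=1):
    """Largest mu_i that satisfies the yield bound for a given sigma_i.

    n_stages=1 gives the relaxed bound of equation (11); n_stages=N_S gives
    the more stringent bound of equation (12) for N_S equal, uncorrelated stages.
    """
    per_stage_yield = p_d ** (1.0 / n_stages)
    return t_target - sigma_i * _STD.inv_cdf(per_stage_yield)

# Hypothetical numbers: 200 ps target, 80% yield, sigma_i = 5 ps.
relaxed = max_stage_mean(5.0, 200.0, 0.80)
four_stage = max_stage_mean(5.0, 200.0, 0.80, n_stages=4)
```

Sweeping `sigma_i` over the range of interest yields one equality bound per value of NS; the bound tightens as the number of stages grows, since each stage must then meet a higher individual yield.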
3. Observations on statistical pipeline design 
Using the analytical models presented in section 2, we have 
analyzed the effect of logic depth and the unbalancing of the 
stage delays on the pipeline yield.  
3.1. Trade-off between number of pipeline stages and 
logic depth 
The delay of a pipelined design depends on the logic depth of 
the pipe stages. A reduction in the logic depth, which 
increases the number of stages in a pipeline, improves the 
operating frequency [4]. In this section we analyze the effect 
of logic depth and number of stages on the variability of a 
pipeline delay. A design with a lower variability has a higher 
probability of meeting a target delay (i.e. better yield) [5].
If only random intra-die variation is considered, the delays of
the different gates in a combinational logic behave as
independent variables. Under this condition, increasing logic
depth reduces the variability (σ/µ) due to the cancellation effect
(i.e. delays of some gates increase while others decrease,
resulting in a lower overall change) (Fig. 5(a)) [5]. However,
correlation in the delays of the different gates (due to inter-die
variation and spatial correlation) reduces the cancellation
effect. Hence, the variability becomes a weaker function of the
logic depth (Fig. 5(a)). On the other hand, increasing the
number of elements in the max function reduces its variability
(Fig. 5(b), evaluated using an inverter chain pipeline with
constant logic depth in each stage) [7]. It can be further
observed that as the stage delays become more and more
correlated, the sensitivity of the variability to the number of
stages reduces (Fig. 5(b)).
In order to understand the effect of logic depth (NL), number
of stages (NS) and relative strength of inter-die and intra-die
variations, we have estimated the variability of an inverter
chain pipeline (in BPTM 70nm technology node) of 120 levels
with different configurations (i.e. with different NL and NS
such that NL × NS = 120). In each case the delay distribution
predicted by the analytical models closely matches the Monte-
Carlo simulation results from SPICE. With only intra-die
variation, the effect of logic depth prevails over the effect of
the max function. Consequently, increasing the number of
stages (i.e. reducing the logic depth of each stage) increases the
variability (Fig. 5(c)). On the other hand, as the strength of
inter-die variation increases (i.e. stage delays become more
correlated) the σ/µ ratio of each stage becomes a weaker
function of its logic depth. Hence, the impact of the max function
prevails and the variability of the overall pipeline delay reduces
with an increase in the number of stages (Fig. 5(c)).
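The cancellation effect for a single stage can be illustrated with a simple additive model. This is an assumption made for illustration and not the SPICE setup used for Fig. 5: each of the NL gates is taken to contribute an independent random component (`sigma_random`) plus a fully shared component (`sigma_shared`) that stands in for inter-die variation. The function name and values are hypothetical.

```python
from math import sqrt

def stage_variability(n_l, mu_gate, sigma_random, sigma_shared=0.0):
    """sigma/mu of a chain of n_l gates under a simple additive delay model.

    Assumption: independent components add in variance (n_l * sigma_random^2),
    while the shared component adds linearly ((n_l * sigma_shared)^2).
    """
    mean = n_l * mu_gate
    variance = n_l * sigma_random ** 2 + (n_l * sigma_shared) ** 2
    return sqrt(variance) / mean

# Hypothetical gate: 10 ps mean delay, 1 ps sigma.
random_only = [stage_variability(n, 10.0, 1.0) for n in (4, 16, 64)]
shared_only = [stage_variability(n, 10.0, 0.0, 1.0) for n in (4, 16, 64)]
```

In this model the purely random case falls as 1/sqrt(NL), while the fully shared case does not depend on NL at all, which mirrors the qualitative trend described for Fig. 5(a).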
3.2. Perfectly balanced vs. unbalanced pipeline design 
Traditionally, the pipeline stages are designed for equal delay 
to maximize the throughput [4]. However, incorporating 
imbalance among the pipeline stage delays can have a positive 
impact on the yield under process variation. This is because
a balanced pipeline has more critical
paths than an unbalanced design, which adversely affects the
yield [5]. Imbalance can be incorporated in a balanced
pipeline by using transistor sizing and/or logic re-structuring.  
However, the impact of imbalance on the overall pipeline 
delay needs to be estimated.  
We have performed experiments with a 3-stage pipeline 
structure to understand the effects of imbalance on the pipeline 
delay. For example, Fig. 6 shows a three-stage pipelined 
ALU-Decoder circuit. The combinational logic of each stage 
is first optimized for minimum area (using [3]) for a specific 
target pipeline delay (179ps) and yield (80%). First, we kept 
the target delay and the target yield constant for all the three 
stages (i.e. neglecting any correlation, the yield target for each 
stage was kept at (0.80)^(1/3) = 0.9283 using (12)). In the next
step, we have introduced imbalance among the three stages 
(by transistor sizing) in such a way that the total area remains 
constant. The overall delay distributions and the yield of the 
balanced and unbalanced pipeline are shown in Fig. 7. 
Figure 6: A 3-stage pipeline with different stage delays. [Stages:
ALU PART-I, DECODER, ALU PART-II, each with logic depth = 4, shown
for CASE-A and CASE-B; the stages are labeled Downsized, Upsized and
Downsized, respectively.]
Resizing the transistors reduces the mean but increases 
variance in pipeline delay (Fig. 7(a)). To understand the 
reason behind this yield improvement, let us consider the area 
vs. delay curves for each stage (Fig. 8). They are initially 
designed for equal delays indicated by line L1 in Fig. 8. This 
results in a yield of Y0 for each stage (pipeline yield = Y0^3). The
total area for this design is the sum of the stage areas 
(A1+A2+A3). Now, we introduce imbalance by reducing the 
area of stage 1 and 3 (by dA1 and dA3) increasing their delays 
to line L2. This reduces yields of stages 1 and 3, to Y1, Y3
(both less than Y0). However, this extra area (dA1+dA3) can 
be added to the stage 2, thereby reducing its delay to line L3. 
This improves the yield of stage 2 (i.e. Y2 > Y0). If, in this
particular case, Y1 × Y2 × Y3 > Y0^3, the overall pipeline yield
improves. This trend has been observed in the 3-stage pipeline
circuits as shown in Fig. 7(b). However, introducing excessive
imbalance in the stage delays can give diminishing returns,
because the pipeline performance then becomes governed by the µ
of the slowest stage (Fig. 7(b), worst case unbalancing results).
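The yield condition above is simple arithmetic and can be checked directly. In the sketch below the balanced per-stage yield follows from the 80% pipeline target used in this section, while the unbalanced stage yields (0.92, 0.99, 0.92) are hypothetical values chosen only to illustrate the inequality; they are not measured results.

```python
def pipeline_yield(stage_yields):
    """Pipeline yield for uncorrelated stages: product of the stage yields."""
    result = 1.0
    for y in stage_yields:
        result *= y
    return result

# Balanced design: each of the 3 stages targets Y0 = 0.80^(1/3).
y0 = 0.80 ** (1.0 / 3.0)
balanced = pipeline_yield([y0, y0, y0])

# Hypothetical unbalanced design: stages 1 and 3 lose a little yield
# (Y1, Y3 < Y0) while stage 2 gains (Y2 > Y0).
unbalanced = pipeline_yield([0.92, 0.99, 0.92])
improves = unbalanced > balanced
```

Whether a real design can reach such a combination at constant area depends on the area vs. delay curves of the individual stages, which is the subject of the heuristic that follows.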
Hence, it is necessary to appropriately apply imbalance among
the stages. Using the above observations, we have developed
a simple heuristic to provide imbalance among the stages as
shown below:

    Calculate rate of change of area with delay for each stage: Ri = ∂Ai/∂Di
    For each stage
        if Ri > 1
            reduction in large area
            results in small increase in delay.
        else
            increase in small area
            results in large improvement in delay.
        endif
    endfor                                                        (14)
In order to improve the yield of a design with minimum
impact on area, it is most effective to reduce the delays of
the stages with Ri < 1 (small area penalty). Conversely,
to reduce area (or power) with minimum penalty on
yield, it is most effective to reduce the area of the stages
with Ri > 1 (small delay increase).
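Heuristic (14) can be sketched as a simple classification of stages by their area vs. delay slope. This is a hedged illustration: the function names are hypothetical, Ri is estimated by a finite difference between two sizings of the same stage, its magnitude is used (an assumption, since area falls as delay rises), and the normalized area and delay numbers are invented for the example.

```python
def area_delay_rate(area_a, delay_a, area_b, delay_b):
    """Finite-difference estimate of R_i = |dA_i / dD_i| between two sizings."""
    return abs((area_b - area_a) / (delay_b - delay_a))

def plan_imbalance(rates):
    """Split stages according to heuristic (14).

    rates maps a stage name to its (normalized) R_i.
    """
    # R_i > 1: a large area reduction costs only a small delay increase.
    downsize = [stage for stage, r in rates.items() if r > 1.0]
    # Otherwise: a small area increase buys a large delay improvement.
    upsize = [stage for stage, r in rates.items() if r <= 1.0]
    return downsize, upsize

# Hypothetical normalized (area, delay) points for a 3-stage pipeline.
rates = {
    "stage1": area_delay_rate(1.00, 1.00, 0.80, 1.05),
    "stage2": area_delay_rate(1.00, 1.00, 1.04, 0.90),
    "stage3": area_delay_rate(1.00, 1.00, 0.85, 1.05),
}
downsize, upsize = plan_imbalance(rates)
```

In this example stages 1 and 3 are candidates for downsizing and stage 2 for upsizing, which matches the pattern shown for the ALU-Decoder pipeline in Fig. 6 and Fig. 8.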
4. Pipeline design flow under statistical delay 
distribution
Using the models developed in section 2 and based on the 
observations in section 3, we propose a statistical design 
method for the complete pipelined circuit. The proposed 
method is targeted to optimize area (hence, power) while 
ensuring the yield. In a conventional pipeline design flow, 
individual stages are designed and optimized independently of 
others before they are glued together to form a pipeline.  
Under statistical delay variation, a global optimization of the 
complete pipeline is necessary for two reasons:  
• Yield target (Y) for the complete pipeline may not be met
  if some of the stages fail to meet their yield target (Y0). In
  that case, the delay of the stages meeting yield target Y0
  can be further improved to compensate for the stages that
  failed to meet the yield.
• Although the yield constraint is satisfied for each stage, the
  integrated pipelined circuit has opportunity for
  optimization with respect to the area/power (section 3.2).
4.1. Global optimization of the pipeline for target yield
The optimization problem for minimizing area of a pipeline
under a given yield constraint can be formulated as below:

Minimize    $\sum_{i=1}^{N} Area(SD_i)$    /* SDi stands for delay of the i-th stage */
Subject to  $\Phi\left(\frac{T_{TARGET} - \mu_T}{\sigma_T}\right) \ge Y$    /* Y is target yield */
            $L_i \le x_i \le U_i,\ i = 1,\dots,n$    /* xi is the size factor for a logic gate.
                                                        Li and Ui are the minimum and maximum size factors */
To solve the above problem, we use the gate-level sizing
algorithm for minimizing total area under a yield constraint
given in [3]. It is an iterative low-complexity algorithm based 
on Lagrangian Relaxation (LR) [3]. We have developed a 
global design optimization algorithm to solve the above 
problem efficiently (Fig. 9). Application of the proposed 
algorithm directly to the complete pipeline, where all the 
stages are sized simultaneously, is computationally very 
expensive. This is because of the fact that, a pipelined circuit 
can be very large and will take inordinately large run time and 
storage to converge [3]. The algorithm employs the principle 
of divide-and-conquer where we size one stage at a time in 
such a way that the target yield for the complete pipeline is 
satisfied while the total area is minimized. It should be noted 
that, for a pipelined design with a target yield Y with respect to 
a delay of TTARGET, the yield constraint to each stage is now set 
to Y (for TTARGET). Moreover, statistical delay analysis is 
performed over the complete pipeline, although the sizing is 
done for only one stage. It helps to make the algorithm 
computationally efficient, since we avoid application of the 
sizing routine on all the stages simultaneously.
Figure 7: Effect of unbalancing on (a) pipeline delay, (b) yield.
[Panel (a): number of occurrences vs. delay (ps) for the balanced and
unbalanced designs, marking the target delay and the reduction in
mean delay. Panel (b): achieved yield with the same area vs. target
yield (70, 75, 80) for the balanced, unbalanced (best) and
unbalanced (worst) designs.]
Figure 8: Area vs. delay curves of different logic stages of the
3-stage ALU-Decoder pipeline design. [Area vs. normalized delay for
stages 1, 2 and 3, with delay lines L1, L2 and L3 and area changes
-dA1, -dA3 and dA2.]
It is worth noting that the improvement in the total area of the
pipeline strongly depends on the order in which the stages
are chosen for sizing. In our algorithm, the ordering is based
on the position of each stage on its area vs. delay curve, as
explained in section 3.2 (14). This minimizes the total design
area to meet the target pipeline yield during global 
optimization, since stages with lower Ri are sized before stages 
with higher Ri.
For each stage, we apply the algorithm in [3], which starts
by assigning an initial delay constraint A0 (pipeline delay) to
the stage and iteratively finds optimal sizes with respect to it.
At the end of each iteration, statistical timing analysis on the
complete pipeline is performed. Depending on the new µ and σ
of the pipeline delay, the pipeline target delay is modified to
A'0 (steps 4-7 in Fig. 9). Since logic gates of only one stage
are being sized at a time, we perform full timing analysis on 
one stage only to estimate its mean and the standard deviation. 
The proposed analytical models (section 2.2) are then used to 
produce the resulting pipeline delay distribution. This 
incremental timing analysis further improves the 
computational efficiency of the algorithm.  
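The one-stage-at-a-time structure of the flow can be sketched with a toy model. This is not the LR-based sizing of [3] and not the authors' implementation: it assumes a single size factor per stage, a hypothetical delay model (mean = work / size, sigma = 3% of the mean), independent stages so that the pipeline yield is a product as in (8), and a bisection search in place of the real sizing routine. All names and numbers are illustrative.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal (mean 0, sigma 1)

def stage_yield(size, work, t_target):
    """Toy stage model (assumption): mean delay = work / size, sigma = 3% of mean."""
    mu = work / size
    return _STD.cdf((t_target - mu) / (0.03 * mu))

def size_pipeline(works, order, t_target, y_target, lo=0.5, hi=4.0, steps=40):
    """Shrink one stage at a time while the pipeline-level yield stays >= y_target."""
    sizes = {name: hi for name in works}  # start from the largest (fastest) sizing
    yields = {name: stage_yield(hi, works[name], t_target) for name in works}
    for name in order:
        # Incremental analysis: the other stages' yields are reused, not recomputed.
        others = 1.0
        for other, y in yields.items():
            if other != name:
                others *= y
        a, b = lo, hi  # b always holds a size that meets the pipeline yield
        for _ in range(steps):
            mid = 0.5 * (a + b)
            if others * stage_yield(mid, works[name], t_target) >= y_target:
                b = mid
            else:
                a = mid
        sizes[name] = b
        yields[name] = stage_yield(b, works[name], t_target)
    return sizes, yields

# Hypothetical 3-stage pipeline; 'order' would come from the R_i ranking.
works = {"s1": 180.0, "s2": 150.0, "s3": 120.0}
sizes, yields = size_pipeline(works, ["s1", "s2", "s3"], t_target=100.0, y_target=0.80)
final_yield = 1.0
for y in yields.values():
    final_yield *= y
```

Because the first stage in the order absorbs most of the available yield slack in this toy model, the result depends strongly on the ordering, which is consistent with the remark above about the importance of the order in which stages are chosen.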
The global optimization algorithm proposed here is 
significantly faster and takes much less storage compared to 
the case where all the stages are sized simultaneously. The LR 
based sizing algorithm proposed in [3] has a computational
complexity of O(n^2), where n is the number of logic gates to
size. For m pipeline stages each having n gates, the
simultaneous sizing approach runs with a complexity of
O(m^2 n^2) (with space complexity of O(mn)). The proposed
algorithm improves the complexity to O(mn^2) (with space
complexity of O(n)).
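The difference between the two complexity expressions can be made concrete with a small calculation. The function name and the example sizes are hypothetical, and the counts are only proportional operation counts with constant factors ignored.

```python
def sizing_cost(m, n):
    """Operation counts (up to a constant) for m stages of n gates each."""
    simultaneous = (m * n) ** 2   # O(m^2 n^2): all m*n gates sized together
    stagewise = m * n ** 2        # O(m n^2): m separate runs over n gates
    return simultaneous, stagewise

# Hypothetical pipeline: 4 stages of 1000 gates each.
simultaneous, stagewise = sizing_cost(4, 1000)
speedup = simultaneous // stagewise
```

The ratio of the two counts equals m, so the saving grows linearly with the number of pipeline stages.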
The algorithm is applied on an example 4-stage pipelined 
circuit (stages are designed with ISCAS85 benchmark circuits) 
and the results were compared with the case where the logic 
stages were individually optimized using method in [3]. We 
observed that, preferentially incorporating imbalance among 
the stages using the proposed algorithm ensures target yield 
(Table-II) or minimizes area for a target yield (Table-III). The 
highlighted rows in the tables show the stages chosen by the 
algorithm for yield improvement with very small area 
increase. Similarly, the plain rows (Table-II and III) show 
stages chosen by the algorithm for area saving with small 
penalty in yield. Overall, a 9% improvement in yield can be
obtained with a small area penalty of 2% (Table-II). On the other
hand, about 8.4% area reduction can be obtained for the
same yield (Table-III).
5. Conclusions
We have investigated pipeline delay distribution under inter- 
and intra-die parameter variations. Analytical models for 
estimating yield of a pipelined circuit are presented. We have 
observed the impact of logic depth (or number of pipe stages) 
and imbalance among the stage delays on the variability of the 
pipeline delay. An efficient sizing algorithm for pipeline to 
minimize area under yield constraint is presented. Our 
investigations show that a statistical design of a complete 
pipeline (not only the individual stages) is effective to 
improve yield in presence of parameter variations.  
REFERENCES
[1] K. A. Bowman et al., “Impact of Die-to-Die and Within-Die 
Parameter Fluctuations on the Maximum Clock Frequency 
Distribution for Gigascale Integration”, JSSC 2002, pp. 183-190. 
[2] E. Jacobs et al., “Gate Sizing Using a Statistical Delay Model”, 
DATE 2000, pp. 283-290.
[3] S. Choi, et al., “Novel Sizing Algorithm for Yield Improvement 
under Process Variation in Nanometer Technology”, DAC 2004, pp.  
454-459. 
[4] J. L. Hennessy et al., “Computer Architecture: A Quantitative 
Approach”, Morgan Kaufmann, 3-rd edition, May 2002. 
[5] S. Borkar et al., “Parameter Variations and Impact on Circuits and 
Microarchitecture”, DAC 2003, pp. 338-342. 
[6] H. Mahmoodi, et al., “Estimation of Delay Variations Due to 
Random-dopant Fluctuations in Nano-Scaled CMOS circuits”, CICC
2004, pp. 17-20. 
[7] A. M. Ross, “Useful Bounds on the Expected Maximum of 
Correlated Normal Variables”, Aug 2003. 
[8] C. E. Clark, “The Greatest of a Finite Set of Random Variables”, 
Operations Research 9(2), Mar-Apr, 1961, pp. 145-162. 
[9] BPTM: http://www-device.eecs.berkeley.edu/~ptm/
Figure 9: Algorithm for the global optimization of pipeline for
ensuring yield with minimum area overhead. [Flowchart steps:]
Input: Complete pipelined design with individual stages optimized,
       Yield (Y) w.r.t. delay A0, Models for process variation
1.a Compute area vs. delay plot for each stage
1.b Perform statistical timing analysis for each stage
2.  Order individual stages w.r.t. their position in area vs. delay plot
    (greedy heuristic-based ordering of stages)
3.  For each stage Si in this order
    (sizing of one stage at a time, considering global yield)
4.  Set delay constraint A0
5.  Calculate optimal sizes for the target delay
6.a Perform statistical timing analysis for Si
6.b Compute (µ, σ) for complete pipeline, using model in section III
7.  Calculate A'0 (new target delay) from A0 and delay variation
8.  Is optimal sizing reached? (YES / NO)
9.  Are all stages processed? (YES / NO)
Output: Optimized pipeline design meeting yield
Table-II  Ensuring YTARGET (80%) with small area penalty

              Individually Optimized     Proposed Algorithm
Stage Logic   Area (%)    Yield (%)      Area (%)    Yield (%)
c3540         47.4        86.3           47.3        86
c2670         25.7        95             27.4        99.1
c1980         20.4        95             20.7        95.6
c432          6.5         95             6.6         99.2
Pipeline:     100         73.9           102         80.5

Table-III  Area reduction for a target yield (80%)

              Individually Optimized     Proposed Algorithm
Stage Logic   Area (%)    Yield (%)      Area (%)    Yield (%)
c3540         50          94             45          90.5
c2670         23.2        95             21.2        99.1
c1980         20.3        95             19.1        90.5
c432          6.5         94.5           6.3         99.2
Pipeline:     100         80.3           91.6        80.5
