Transistor sizing analysis of regular fabrics by Marranghello, Felipe S. et al.
Transistor Sizing Analysis of Regular Fabrics 
 
Felipe S. Marranghello, Vinicius Dal Bem, André I. Reis, Renato P. Ribas 
UFRGS, Porto Alegre, Brazil 
rpribas@inf.ufrgs.br 
 
Francesc Moll 
UPC, Barcelona, Spain  
 
Abstract 
 
This paper presents an extensive transistor sizing analysis for regular transistor fabrics. Several evaluation methods 
are used, namely DC simulations, ring oscillators and single-gate open chain structures. Different design 
aspects are addressed, taking into account stacked transistors, cells with increased drive strengths and circuit critical paths. 
Performance degradation of regular fabrics in comparison to standard cells is naturally expected, but it is 
important to evaluate the magnitude of this impact. The results were obtained for predictive PTM45 CMOS 
parameters, and the conclusions can be easily extrapolated to other technology nodes and fabrication processes. 
 
 
1 Introduction 
Systematic process variations have become a major issue 
for integrated circuits manufacturing due to the small 
dimensions used in modern technologies, which are 
smaller than the wavelength used in photolithography [1]. 
These variations result in discrepancies between the 
designed layout and the manufactured product leading to 
unpredictable behavior [2]. 
Resolution Enhancement Techniques (RET), such as 
phase shift mask and optical proximity correction, can be 
used to improve the layout quality for lithography 
processing [3-4]. However, these techniques are too 
expensive to be used for huge VLSI design with many 
distinguished layout patterns. Thus, reducing the number 
of allowed patterns is desirable. 
Several techniques to improve lithography quality using 
regular layout have been studied and proposed in the 
literature [5]-[7]. One methodology for the utilization of 
dummy features to improve regularity is presented in [5]. 
The work of Smayling et al. [6] shows an approach to 
reduce variability on gates. None of these proposals is 
completely regular. A fully regular layout technique is the 
Via-Configurable Transistor Array (VCTA) fabric [7]. In 
this work, a regular transistor fabric (RTF) similar to 
VCTA is investigated. 
Regular transistor fabrics are here understood as a matrix 
of identically sized transistors forming a regular structure. 
The main purpose of this regular fabric is to minimize 
systematic process variations. However, due to the 
identical size of all transistors, it can lead to a significant 
penalty on circuit aspects like timing and area. 
Transistor sizing has a major influence on circuit 
performance. Since each logic gate (cell) cannot be sized 
individually when the RTF approach is targeted, this task 
becomes even more critical than when addressing the 
conventional Standard Cell methodology. If the 
transistor width in the RTF pattern is too small, the 
circuit tends to present poor timing performance. On the 
other hand, if the transistor width is too large, the power 
dissipation becomes a drawback, while the timing 
improvement can be limited by the increased gate 
capacitances. 
This paper presents an extensive electrical analysis of 
the RTF pattern. The focus is the impact of transistor 
sizing on signal delay propagation and power consumption. 
Area overhead is not addressed herein, and performance 
degradation due to metal wiring parasitics is considered 
a minor effect and is neglected. 
The paper is organized as follows. Section 2 describes the 
different ways to perform the cell sizing evaluation. 
Section 3 presents the single gate sizing with special 
attention to transistor stacks in NAND and NOR gates. 
Section 4 evaluates the design of cells with increased 
drive strengths. Section 5 analyses the sizing impact on 
circuit critical paths. Section 6 presents a general analysis 
for RTF sizing, and Section 7 outlines the conclusions. 
2 Transistor Sizing Evaluation 
It is often desirable to make PMOS transistors larger than 
NMOS transistors due to the smaller mobility of holes, 
when compared to electrons. This relation is expressed in 
design by the PN ratio, i.e., PMOS channel width (Wp) 
divided by NMOS channel width (Wn). There are several 
different ways to determine the PN ratio. Usually, this 
value is defined for the static CMOS inverter to meet a 
certain criterion, and then it is taken as the reference for 
other gate sizing definitions in cell libraries. In [8], the authors 
exploit the use of specific PN ratios for each different cell 
network in order to optimize the circuit performance. 
The cell timing characteristics depend on several factors. 
Cell delay and power dissipation are influenced by 
elements such as parasitic capacitances and resistances 
associated with the physical layout, the input slew rate and 
the output load. It is desirable to minimize the effects of such 
parasitic elements. A usual layout optimization technique 
is to compact the transistor drain/source areas as much as 
possible, in order to diminish the source/drain resistance 
and capacitance to improve cell performance. Notice that 
cell characterization should be done considering different 
input slopes and output loads that are expected to appear 
in designs. This way, characterization data is available so 
that delays can be considered according to the input 
slopes and output loads imposed by the circuit context. 
There are several ways to define and evaluate the 
transistor sizing of logic gates. Some strategies are DC 
analysis, the simulation of ring oscillator structures and 
the simulation of single-gate open chain structures. Each 
one of these methods evaluates the cell in a different way. 
An environment for automatic transistor sizing evaluation 
has been developed. This platform consists of several 
HSPICE files for electrical simulation, Perl language 
scripts for data extraction and table generation, and 
Gnuplot files for plotting data. This section discusses how 
DC analysis and simulations of ring oscillators and open 
chain structures can be used to evaluate transistor sizing. 
2.1 DC Analysis 
DC simulation, illustrated in Fig. 1, allows the extraction 
of the noise margins (NM), the logical threshold of the 
gate (Vth) and the DC peak current (IPeak). The high (low) 
noise margin, NMh (NMl), is defined in this work as the 
difference between 0.9*Vdd (0.1*Vdd) and the input voltage 
that produces this value at the output. The logical 
threshold is defined as the input voltage at which the 
output voltage equals the input voltage. The peak current 
is the maximum crowbar current flowing from the power 
supply (Vdd) to ground (Gnd). 
 
Figure 1: DC analysis: characteristic transfer curve. 
DC analysis does not take into account the capacitance 
effects present in transient behavior. In the case of cells 
with stacked transistors (for instance, NAND and NOR 
gates), different behaviors may be observed depending on 
the stack position of the switching transistor (input 
variable) under analysis. 
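
The definitions above can be applied directly to a sampled DC sweep. The following is a minimal sketch, assuming the sweep is available as three lists (input voltage, output voltage, supply current); the function names and the synthetic piecewise-linear curve are illustrative and do not come from the simulation environment of this work. It returns Vth, IPeak and the two input voltages that enter the noise margin definitions.

```python
def crossing(x, y, level):
    """Return the x at which the sampled curve y(x) first crosses `level`.

    Linear interpolation is used between the two samples that bracket the level.
    """
    for i in range(len(x) - 1):
        d0 = y[i] - level
        d1 = y[i + 1] - level
        if d0 == 0.0:
            return x[i]
        if d0 * d1 < 0.0:
            return x[i] + (x[i + 1] - x[i]) * d0 / (d0 - d1)
    if y[-1] == level:
        return x[-1]
    raise ValueError("curve never reaches the requested level")


def dc_metrics(vin, vout, i_supply, vdd):
    """Extract Vth, IPeak and the inputs giving 0.9*Vdd and 0.1*Vdd at the output."""
    diff = [vo - vi for vi, vo in zip(vin, vout)]
    return {
        "Vth": crossing(vin, diff, 0.0),                    # input where Vout == Vin
        "IPeak": max(i_supply),                             # maximum crowbar current
        "Vin_out_0.9Vdd": crossing(vin, vout, 0.9 * vdd),   # used for NMh
        "Vin_out_0.1Vdd": crossing(vin, vout, 0.1 * vdd),   # used for NMl
    }


# Synthetic, piecewise-linear inverter curve (illustrative only, not PTM data).
VDD = 1.1
vin = [k * 0.01 for k in range(111)]
vout = [min(VDD, max(0.0, 0.55 - 10.0 * (v - 0.5))) for v in vin]
i_supply = [max(0.0, 1.0 - abs(v - 0.5) / 0.1) for v in vin]  # arbitrary units
metrics = dc_metrics(vin, vout, i_supply, VDD)
```

For a stacked gate, the same extraction would be repeated once per switching input, since each stack position gives a different transfer curve.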
Fig. 2 shows that Vth does not depend linearly on the 
PN ratio. In fact, a large variation of the PN ratio 
leads to only a small change in Vth. These results were 
obtained considering the 45nm PTM process parameters [9]. 
The variation of the noise margins with the PN ratio is 
not shown because optimizing Vth equalizes the noise 
margins. In this case, the suggested optimal PN ratio is 
approximately four, because it equalizes the noise 
margins. Moreover, since IPeak is more sensitive to PN 
ratio variations than Vth, it is probably better to 
prioritize IPeak over Vth, while keeping a lower bound on 
the latter. 
 
 
Figure 2: Vth and IPeak variations with PN ratio. 
2.2 Ring Oscillator Analysis 
A ring oscillator is a closed chain of cells that can be seen 
as an odd number of inverters. This way, the value on 
each node changes periodically. The time between two 
subsequent distinct transitions (high-to-low and low-to-
high) on any node in the structure is called the oscillating 
period. Optionally, additional ‘floating’ gates can be 
connected to nodes for evaluating logic gates with 
increased output load (fanout). In Fig. 3 the ring oscillator 
structure is depicted. 
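
As an illustration of how the oscillating period can be extracted from a transient simulation, the sketch below measures the time between consecutive rising crossings of Vdd/2 on one node. The Vdd/2 level, the function names and the synthetic sine waveform are assumptions made for this example; they are not taken from the measurement scripts of this work.

```python
import math


def rising_crossings(t, v, level):
    """Times at which v(t) crosses `level` upward (linear interpolation)."""
    times = []
    for i in range(len(t) - 1):
        if v[i] < level <= v[i + 1]:
            frac = (level - v[i]) / (v[i + 1] - v[i])
            times.append(t[i] + frac * (t[i + 1] - t[i]))
    return times


def oscillation_period(t, v, vdd):
    """Average time between consecutive rising edges through Vdd/2."""
    edges = rising_crossings(t, v, 0.5 * vdd)
    if len(edges) < 2:
        raise ValueError("need at least two rising edges")
    return (edges[-1] - edges[0]) / (len(edges) - 1)


# Synthetic node voltage with a 100 ps period, sampled every 1 ps (illustrative).
VDD = 1.1
t = [float(k) for k in range(500)]
v = [0.5 * VDD * (1.0 + math.sin(2.0 * math.pi * tk / 100.0 + 0.3)) for tk in t]
period = oscillation_period(t, v, VDD)
```

With real waveforms, the first cycles after start-up would normally be discarded before averaging.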
Increasing the transistor width(s) causes two opposite 
effects. The current capability of the cell increases, 
making the signal propagate faster. On the other hand, 
the input capacitance of the cell also increases, which 
raises the output load of the previous stage and makes 
the oscillating signal propagate slower. There is an 
important trade-off between these two effects. 
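
This trade-off can be illustrated with a textbook first-order model, which is not the simulation setup used in this work. Assume an inverter driving an identical inverter, with Wn fixed, a PMOS that is r times more resistive than an NMOS of the same width, and a load made only of gate capacitance. The average stage delay is then proportional to (1 + r/PN)(1 + PN), which is minimized at PN = sqrt(r). The value of r below is an assumed parameter, not a PTM figure.

```python
import math


def stage_delay(pn_ratio, r):
    """Relative average stage delay of an inverter driving an identical inverter.

    r is the PMOS/NMOS resistance ratio at equal width (an assumed parameter).
    """
    drive_resistance = 1.0 + r / pn_ratio   # pull-down plus pull-up, relative
    load_capacitance = 1.0 + pn_ratio       # NMOS plus PMOS gate width, relative
    return drive_resistance * load_capacitance


def best_pn_ratio(r, lo=0.5, hi=6.0, steps=5501):
    """Sweep the PN ratio on a uniform grid and return the fastest one."""
    ratios = [lo + k * (hi - lo) / (steps - 1) for k in range(steps)]
    return min(ratios, key=lambda pn: stage_delay(pn, r))


R_ASSUMED = 3.0                     # hypothetical resistance ratio
best = best_pn_ratio(R_ASSUMED)     # close to sqrt(3)
```

In this model the rise and fall delays are equal only at PN = r, so the fastest ratio is always smaller than the ratio that equalizes the delays. This agrees qualitatively with the inverter results reported later in this section, although the model ignores diffusion and wiring capacitances and is not fitted to the PTM data.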
Cell sizing can target different characteristics, such 
as the minimum oscillating period, the equalization of 
delay propagations (high-to-low and low-to-high) and the 
equalization of transition times (rising and falling). Fig. 4 shows the 
ring oscillator period, delay ratio and transition time ratio 
for different PN values. The delay ratio means the ratio 
between the high-to-low delay and the low-to-high delay, 
while the transition time ratio represents the ratio between 
the output falling transition time and the output rising 
transition time. For the PN ratio variation (X-axis in Fig. 
4), the NMOS width is constant while the PMOS width 
varies. 
If the cell under evaluation, instantiated in the ring 
structure, has stacked transistors, as in NAND and NOR 
gates, the simulation result depends on the stack 
position of the transistor driven by the switching 
input. 
No single PN ratio optimizes all the criteria mentioned 
above. For the static CMOS inverter, considering the 45nm 
PTM process parameters [9], the minimum oscillating period 
was achieved with a PN ratio equal to 1.8, while delay 
equalization leads to an optimal ratio equal to 3 and 
transition time equalization requires a ratio of 
approximately 4. This demonstrates that the final PN 
ratio must be chosen carefully, according to the 
characteristics that matter most for each specific cell.  
 
 
Figure 3: Ring oscillator structure. 
 
 
Figure 4: Ring oscillator analysis (inverters): period, 
delay and transition ratios in terms of PN ratio (X-axis). 
2.3 Single-Gate Open Chain Analysis 
A single gate is simulated for different input slopes and 
output loads. The input slope can be either an ideal ramp 
(directly connected to the input voltage source) or the 
output signal of another cell (inverter is usually adopted 
as the driver cell). If a driver cell is used, then the 
precision of the slew rate variation is lost, but the effect of 
the input capacitance can be better observed. The output 
load can be an explicit capacitor or a load gate. Fig. 5 
illustrates this structure. The DUT (Device Under Test) 
block represents the cell under evaluation. This is a 
very common technique used by cell library designers to 
characterize individual logic gates for delay and power 
modeling, which is applied afterwards in the circuit 
design flow. 
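
Characterization data of this kind is commonly stored as a table indexed by input slope and output load, and intermediate conditions are obtained by interpolation. The sketch below shows a bilinear lookup under that assumption; the grid points, the units and the delay values are hypothetical and serve only to make the example runnable.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Index i such that axis[i] <= x <= axis[i + 1], clamped to the table range."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup_delay(slews, loads, table, slew, load):
    """Bilinear interpolation in a delay table indexed as table[slew_idx][load_idx]."""
    i = _bracket(slews, slew)
    j = _bracket(loads, load)
    u = (slew - slews[i]) / (slews[i + 1] - slews[i])
    w = (load - loads[j]) / (loads[j + 1] - loads[j])
    return ((1 - u) * (1 - w) * table[i][j] + (1 - u) * w * table[i][j + 1]
            + u * (1 - w) * table[i + 1][j] + u * w * table[i + 1][j + 1])


# Hypothetical characterization grid: slopes in ps, loads in fF, delays in ps.
SLEWS = [10.0, 50.0, 100.0]
LOADS = [1.0, 2.0, 4.0]
TABLE = [[5.0 + 0.1 * s + 3.0 * c for c in LOADS] for s in SLEWS]
delay = lookup_delay(SLEWS, LOADS, TABLE, slew=30.0, load=3.0)
```

Points outside the grid are extrapolated linearly from the nearest cell of the table, which is usually less reliable than interpolation.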
This analysis is quite useful to verify a pre-defined cell 
sizing under different stimulus conditions. As can 
be observed in Fig. 6, the increase in delay values 
(high-to-low and low-to-high) and in output transition times 
(rising and falling) can be made almost equivalent for a 
specific choice of cell sizing. In other words, the ratio 
between fall and rise delay is nearly constant. 
 
 
Figure 5: Single-gate open chain structure. 
 
 
(a) 
 
(b) 
Figure 6: CMOS inverter performance in relation to (X-
axis): (a) input slope variation; (b) output load variation. 
3 RTF Sizing Templates 
RTF is a regular layout style. Unlike usual standard cell 
design, it is not possible to size each cell 
individually. All transistors of the same type (PMOS or 
NMOS) have the same channel length and width. Another 
difference is that all transistor gates (polysilicon stripes) 
are equally spaced, which means that source/drain area 
compaction cannot be performed. 
This section discusses the impact of RTF patterns with 
different transistors widths on individual cells 
performance. The electrical simulations were carried out 
taking into account the 45nm PTM process parameters 
[9], where the minimum transistor channel length and 
width are 50 nm and 90 nm, respectively. The power 
supply voltage was 1.1 V and the operating temperature 
was 25 °C. The logic gates addressed in this 
analysis were the inverter and NAND and NOR gates with 2 
to 4 inputs. 
3.1 Standard Cell Approach as Reference 
Traditional standard cell design uses several layout 
compaction techniques to enhance performance. As an 
example, the source/drain areas that have no contact to 
a metal wire can be made smaller (compacted). This layout 
style leads to patterns that cannot be well processed 
during lithography in the most advanced technology 
nodes [1]. Indeed, Design for Manufacturability (DFM) 
rules have been developed to restrict the layout design 
patterns and improve the lithography quality [2]. The only 
basic DFM rule that is followed in this work is that all 
polysilicon stripes (transistor gate) are 1D and equally 
spaced. As a consequence, it is not possible to optimize 
the drain/source regions.  
In this section, the optimal cell sizing under the 
restriction mentioned above is determined, to be used 
as the reference for the RTF sizing analysis. 
The first cell taken into account was the inverter. The 
minimum ring oscillator period (or the maximum 
oscillation frequency) was adopted as the metric for such 
definition. NMOS width (Wn) was arbitrarily kept 
constant at the minimum allowed value, while the PMOS 
width (Wp) of all stages was increased until the maximum 
oscillation frequency was achieved. The result was a 
PMOS width equal to 1.6 times the minimum width value 
(i.e., the NMOS transistor size).  
In the case of NAND and NOR gates, the sized inverter 
was used as reference. The goal was to achieve a 
performance similar to the inverter for both high-to-low 
and low-to-high delay propagations, represented by Td_hl 
and Td_lh, respectively. These delays were measured 
using the single-gate open chain structure for different 
input slopes and output loads.  
To prevent the cells from having a huge gate capacitance, 
the input capacitance (Cin) was limited to four times that 
of the reference inverter. This is a hard constraint in 
this task. A second goal is to have all the cells with 
delays similar to the inverter. On the other hand, 
equalizing the rise and fall delays of a cell is not 
necessary. The delays are considered similar if they are 
within a 10% margin of the inverter delay. This is a soft 
constraint: if the targeted delay cannot be achieved, the 
margin may be increased. Table 1 shows the results, which 
are normalized to the inverter characteristics Wn, Td_hl 
and Cin. Certainly, other values could be used and still 
fit the established criteria, but it is impractical to 
consider all of them, and the differences are often not 
significant. 
 
Table 1: Normalized transistor sizing, delay propagation 
(high-to-low and low-to-high) and input capacitances. 
Cell Wn Wp Td_hl Td_lh Cin 
INV 1.0 1.6 1.0 1.72 1.00 
NAND2 1.9 1.8 1.0 1.80 1.42 
NAND3 3.0 2.0 1.07 1.84 1.92 
NAND4 4.3 2.2 1.10 1.95 2.50 
NOR2 1.4 4.0 1.08 1.75 2.08 
NOR3 1.8 6.2 1.20 1.98 3.08 
NOR4 2.2 8.0 1.28 2.10 3.92 
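
The two constraints can be checked against the values of Table 1 with a few lines of code. The sketch below is only illustrative: it assumes that each delay is compared with the inverter delay of the same transition (Td_hl with Td_hl, Td_lh with Td_lh), which is one possible reading of the similarity criterion, and it lists the cells for which the 10% soft margin had to be relaxed.

```python
# Table 1 values per cell: (Td_hl, Td_lh, Cin), normalized as in the table.
TABLE1 = {
    "INV":   (1.00, 1.72, 1.00),
    "NAND2": (1.00, 1.80, 1.42),
    "NAND3": (1.07, 1.84, 1.92),
    "NAND4": (1.10, 1.95, 2.50),
    "NOR2":  (1.08, 1.75, 2.08),
    "NOR3":  (1.20, 1.98, 3.08),
    "NOR4":  (1.28, 2.10, 3.92),
}
CIN_LIMIT = 4.0        # hard constraint: at most four times the inverter Cin
DELAY_MARGIN = 0.10    # soft constraint: within 10% of the inverter delay
EPS = 1e-9             # tolerance for floating-point comparison at the boundary


def check(cell):
    """Evaluate the hard (Cin) and soft (delay) constraints for one cell."""
    td_hl, td_lh, cin = TABLE1[cell]
    ref_hl, ref_lh, _ = TABLE1["INV"]
    worst = max(abs(td_hl / ref_hl - 1.0), abs(td_lh / ref_lh - 1.0))
    return {
        "hard_ok": cin <= CIN_LIMIT,
        "soft_ok": worst <= DELAY_MARGIN + EPS,
        "worst_deviation": worst,
    }


report = {cell: check(cell) for cell in TABLE1}
relaxed = [cell for cell, r in report.items() if not r["soft_ok"]]
```

Under this reading, every cell satisfies the Cin limit, while the cells with the tallest stacks exceed the 10% delay margin.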
 
Even though there is no PMOS stack in the NAND2 gate, 
its PMOS transistors are larger than the one in the 
inverter because the internal cell capacitances 
(drain/source areas) are increased by the larger NMOS 
transistors in the network. 
As the stack height increases, it becomes harder to attain 
delay equalization. Furthermore, PMOS transistors are 
less sensitive to width increase than NMOS transistors. 
Hence, for a given stack height, stacked PMOS transistors 
must grow proportionally more, with respect to the 
reference PMOS of the inverter, than stacked NMOS devices. 
As a result, the NOR4 is the most difficult cell to size 
optimally.  
3.2 RTF Sizing for Basic Gates Design 
As mentioned before, in the RTF approach all transistors 
of the same type (PMOS or NMOS) must have the same 
channel width. Thus, device sizing is expected to have a 
great impact on cell performance when compared to 
the standard cell approach. Small widths tend to lead to 
poor timing performance of cells like NAND4 and NOR4. 
On the other hand, large widths can oversize the 
smaller cells, increasing significantly (and unnecessarily) 
the power consumption due to the larger parasitic 
capacitances of the input transistor gates and of the 
drain/source regions. 
In order to evaluate the impact of the transistor sizing 
definition in the RTF pattern, each cell has been simulated 
with the seven transistor width pairs from 
Table 1, corresponding to the optimal sizing of each gate 
evaluated (INV, NAND 2-4, NOR 2-4). Each pair is 
called here an RTF configuration: RTF1 uses the Wn and 
Wp of the inverter in Table 1, RTF2 the widths of 
NAND2, RTF3 the widths of NAND3 and so on, until RTF7, 
which uses the widths of NOR4.  
Since the results strongly depend on the chosen transistor 
sizing values, three additional Wn and Wp pairs were also 
considered in this analysis. They are: 
• the NMOS width of NAND4 and the PMOS width of 
NOR4, in Table 1, referred to as the ‘RTF_WC’ 
configuration; 
• the average width values from Table 1, for each kind 
of transistor, referred to as the ‘RTF_Avg1’ configuration; 
• the average width values from Table 1, but excluding 
the worst-case NAND4 and NOR4 gates, referred to as 
the ‘RTF_Avg2’ configuration. 
The values obtained for the RTF_WC, RTF_Avg1 and 
RTF_Avg2 configurations are shown in Table 2. Widths are 
normalized to the inverter Wn of Table 1. Cin is the input 
capacitance of an inverter built with those widths, 
normalized to the minimum one. The RTF_WC configuration 
is expected to present the smallest delay propagation but 
the highest power consumption, since it is considered to 
be an upper bound for the transistor widths in this analysis. 
RTF_Avg1 and RTF_Avg2 configurations may present a 
trade-off that compensates for the use of small 
transistors in some cells by oversizing transistors in 
other ones. RTF_Avg2 does not consider NAND4 and 
NOR4 because these are less likely to appear in circuit 
designs. 
 
Table 2:  Definition of additional RTF templates (Wn and 
Wp) by considering some specific data from Table 1. 
 Wn Wp Cin 
RTF_WC 4.3 8.0 4.73 
RTF_Avg1 2.3 3.7 2.31 
RTF_Avg2 1.8 3.1 1.88 
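
The three additional templates can be derived from Table 1 as sketched below. The Cin estimate assumes that the input capacitance is proportional to Wn + Wp, an assumption that is consistent with the Cin columns of Table 1 and Table 2 but is not stated explicitly in the text. The helper names are illustrative.

```python
# Normalized (Wn, Wp) pairs from Table 1.
TABLE1_WIDTHS = {
    "INV": (1.0, 1.6), "NAND2": (1.9, 1.8), "NAND3": (3.0, 2.0),
    "NAND4": (4.3, 2.2), "NOR2": (1.4, 4.0), "NOR3": (1.8, 6.2),
    "NOR4": (2.2, 8.0),
}


def mean_widths(cells):
    """Unweighted average of Wn and of Wp over the given cells."""
    wn = sum(TABLE1_WIDTHS[c][0] for c in cells) / len(cells)
    wp = sum(TABLE1_WIDTHS[c][1] for c in cells) / len(cells)
    return wn, wp


def inverter_cin(wn, wp):
    """Inverter input capacitance relative to the reference inverter.

    Assumes Cin is proportional to the total gate width Wn + Wp.
    """
    ref_wn, ref_wp = TABLE1_WIDTHS["INV"]
    return (wn + wp) / (ref_wn + ref_wp)


rtf_wc = (TABLE1_WIDTHS["NAND4"][0], TABLE1_WIDTHS["NOR4"][1])
rtf_avg1 = mean_widths(list(TABLE1_WIDTHS))
rtf_avg2 = mean_widths([c for c in TABLE1_WIDTHS if c not in ("NAND4", "NOR4")])
```

With the rounded values of Table 1, the plain average of Wn over the seven cells is about 2.23, slightly below the 2.3 listed for RTF_Avg1 in Table 2; the other entries of Table 2 are reproduced after rounding.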
 
DC simulation was used to measure the maximum 
crowbar current. This current can be useful as a metric for 
power consumption, in particular for the short-circuit 
power component. Table 3 shows the IPeak values. 
Moreover, the ring oscillator period was measured for 
seven oscillators, each one built with one of the 
targeted cells (INV, NAND 2-4, NOR 2-4). All the RTF 
configurations defined before were simulated. Table 4 
shows these results. The values are normalized to the 
period obtained using the configuration of Table 1 for 
each specific cell under test. 
As expected, the use of larger transistors does not 
guarantee a better delay propagation in the ring 
structure (i.e., a smaller oscillating period), due to the 
increase in the input capacitance of each stage. 
For instance, when the RTF3 or RTF4 template is 
used to implement a NAND2 gate, the ring oscillator 
period is longer than with RTF2. Even 
when the period is reduced, the power 
consumption can increase significantly. For example, 
for the inverter gate, the RTF7 template makes 
the period 13% smaller, but the input capacitance is 
almost four times higher. For the NOR4, the 
best period is achieved with the sizing found in 
Section 3.1. RTF_Avg1 and RTF_Avg2 have similar 
timing results, but RTF_Avg2 presents approximately 
20% less capacitance. 
 
Table 3: Normalized DC crowbar current (IPeak) for 
different logic gates in several RTF sizing configurations. 
 INV NAND2 NAND3 NAND4 NOR2 NOR3 NOR4 
RTF1 1.00 0.50 0.31 0.31 0.21 0.53 0.43 
RTF2 2.00 1.00 0.62 0.43 1.39 1.06 0.86 
RTF3 3.23 1.61 1.00 0.69 2.23 1.71 1.38 
RTF4 4.68 2.33 1.45 1.00 3.24 2.47 2.00 
RTF5 1.45 0.72 0.45 0.31 1.00 0.76 0.62 
RTF6 1.89 0.94 0.59 0.41 1.31 1.00 0.81 
RTF7 2.34 1.17 0.72 0.50 1.62 1.24 1.00 
WC 4.68 2.33 1.45 1.00 3.24 2.47 2.00 
AVG1 2.45 1.22 0.76 0.52 1.69 1.29 1.05 
AVG2 1.89 0.94 0.59 0.41 1.31 1.00 0.81 
 
Table 4: Normalized ring oscillator period for different 
logic gates designed in several RTF sizing configurations. 
 INV NAND2 NAND3 NAND4 NOR2 NOR3 NOR4 
RTF1 1.00 1.07 1.18 1.20 1.13 1.22 1.40 
RTF2 0.97 1.00 1.03 1.04 1.15 1.30 1.50 
RTF3 1.02 1.01 1.00 1.01 1.27 1.43 1.74 
RTF4 1.08 1.05 1.01 1.00 1.40 1.63 1.99 
RTF5 0.99 1.19 1.33 1.47 1.00 1.03 1.07 
RTF6 1.01 1.26 1.41 1.60 0.99 1.00 1.02 
RTF7 0.87 1.27 1.43 1.63 0.99 1.01 1.00 
WC 0.87 1.02 1.08 1.18 0.96 1.22 1.07 
AVG1 0.90 1.01 1.07 1.15 0.99 1.38 1.16 
AVG2 0.92 1.03 1.11 1.19 1.00 1.65 1.16 
 
Table 5: Normalized ‘delay-power’ product in single-gate 
open chain test structure. 
 INV NAND2 NAND3 NAND4 NOR2 NOR3 NOR4 
RTF1 1.00 1.09 1.17 1.39 1.20 1.85 2.29 
RTF2 0.98 1.00 0.97 0.93 1.45 1.83 2.26 
RTF3 1.07 1.06 1.00 0.92 1.58 2.00 2.50 
RTF4 1.20 1.19 1.10 1.00 1.78 2.27 2.85 
RTF5 0.97 1.14 1.25 1.34 1.00 0.99 1.02 
RTF6 1.20 1.43 1.56 1.66 1.11 1.00 0.95 
RTF7 1.43 1.71 1.83 1.95 1.27 1.10 1.00 
WC 1.54 1.64 1.66 1.64 1.37 1.26 1.23 
AVG1 0.95 1.03 1.04 1.04 1.06 1.12 1.21 
AVG2 0.90 0.99 1.02 1.03 1.05 1.14 1.26 
 
The average delay propagation and power dissipation 
were measured for each cell using each one of the 
configurations defined before. The ‘delay-power’ product 
is shown in Table 5. A ‘delay-power’ value smaller than 
one indicates that, for this metric, the optimal size is not 
the one previously defined. As an example, NAND3 using 
RTF2 has a product equal to 0.97, but the delay loss is 11%, 
and this timing degradation makes RTF3 a better choice 
according to the criteria adopted.  
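
Table 5 can also be summarized per configuration. The sketch below ranks the configurations by the unweighted mean and by the worst case of the ‘delay-power’ product over the seven gates. Both summary criteria are assumptions introduced for this illustration; in a real design the gates would be weighted by how often they are used.

```python
GATES = ["INV", "NAND2", "NAND3", "NAND4", "NOR2", "NOR3", "NOR4"]

# Normalized 'delay-power' products from Table 5, one row per configuration.
TABLE5 = {
    "RTF1":     [1.00, 1.09, 1.17, 1.39, 1.20, 1.85, 2.29],
    "RTF2":     [0.98, 1.00, 0.97, 0.93, 1.45, 1.83, 2.26],
    "RTF3":     [1.07, 1.06, 1.00, 0.92, 1.58, 2.00, 2.50],
    "RTF4":     [1.20, 1.19, 1.10, 1.00, 1.78, 2.27, 2.85],
    "RTF5":     [0.97, 1.14, 1.25, 1.34, 1.00, 0.99, 1.02],
    "RTF6":     [1.20, 1.43, 1.56, 1.66, 1.11, 1.00, 0.95],
    "RTF7":     [1.43, 1.71, 1.83, 1.95, 1.27, 1.10, 1.00],
    "RTF_WC":   [1.54, 1.64, 1.66, 1.64, 1.37, 1.26, 1.23],
    "RTF_Avg1": [0.95, 1.03, 1.04, 1.04, 1.06, 1.12, 1.21],
    "RTF_Avg2": [0.90, 0.99, 1.02, 1.03, 1.05, 1.14, 1.26],
}


def mean(values):
    return sum(values) / len(values)


# Configuration with the smallest average product over all gates.
best_on_average = min(TABLE5, key=lambda cfg: mean(TABLE5[cfg]))
# Configuration whose worst gate is the least penalized.
best_worst_case = min(TABLE5, key=lambda cfg: max(TABLE5[cfg]))
```

With these criteria the two averaged templates come out ahead of the single-gate templates, each on one of the two summaries.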
4 Drive Strength Analysis 
In traditional standard cell design, a certain cell with 
increased drive strength can be built by multiplying all its 
transistor widths by the same factor. To keep the 
input capacitance from growing too much, logic gates with 
multiple stages can be used. The advantage of using a 
multiple stage cell over a single stage one is that 
frequently only the final stage has to be sized to fit the 
drive strength targeted. On the other hand, more logic 
depth levels (stages) added to the cell topology may lead 
to higher delay. Thus, there is an optimal point when the 
utilization of multiple stages cells becomes a better 
option. This trade-off depends on the cell profile 
(transistor arrangement) and on the sizing of the stages. 
Two approaches to design a multiple stage cell are: (1) 
adding a buffer to the cell output, and (2) dividing the cell 
logic through the stages by applying the De Morgan 
theorem. Fig. 7 illustrates these different implementations 
for a NAND4 gate. 
 
 
(a)            (b) 
Figure 7: Different implementations of multiple stage 
NAND4: (a) buffered output, (b) decomposed version. 
 
When using RTF, folding is always necessary to 
implement logic gates with larger drive strengths. For this 
reason, the strategy of adding a buffer to the cell output is 
usually adopted. A known drawback of the pre-defined 
regular fabric approach is that not all desirable drive 
strengths can be attained. For instance, if an inverter with 
one transistor in each plane has drive strength one, then a 
drive strength two inverter is designed by using two 
transistors in each plane, while a drive strength of 1.5 
would require one and a half inverters, which is 
impractical. This section discusses the differences between 
the two drive strength increasing strategies, 
considering both the standard cell and RTF 
approaches. 
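
The granularity restriction can be stated in a few lines. The sketch below assumes that the only way to raise the drive strength in the fabric is to connect whole transistors in parallel (folding), so that a fractional target must be rounded up; the function names are illustrative.

```python
import math


def parallel_transistors(target_drive):
    """Number of unit transistors per plane needed to reach at least the target."""
    if target_drive <= 0:
        raise ValueError("drive strength must be positive")
    return math.ceil(target_drive)


def oversizing(target_drive):
    """Ratio between the realized and the requested drive strength."""
    return parallel_transistors(target_drive) / target_drive


needed = parallel_transistors(1.5)   # two transistors per plane
excess = oversizing(1.5)             # realized drive is 4/3 of the request
```

The excess is largest for small fractional targets and vanishes for integer ones.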
Table 6 and Table 7 show the delay propagations and 
the ‘delay-power’ metric, respectively, for the three 
different implementations of a NAND4 gate, considering 
the standard cell approach, that is, exploiting the design 
flexibility to size different gates and stages for optimal 
performance. 
If only the delay propagation is taken into account, a 
single stage cell is usually the best option when compared 
to the multiple stage approaches. However, if power 
dissipation is also considered, then the multiple stage cell 
can become a better choice. The higher the drive strength, 
the more advantageous the multiple stage cells become. It is 
important to notice that this work uses a simple fanout 
of four rule for sizing the stages of the multiple stage cells. 
The results may be more advantageous for the 
multiple stage versions when more elaborate 
sizing strategies are exploited. 
 
Table 6: NAND4 design, in three strategies and different 
drive strength: normalized delay propagation. 
Drive Strength Single Buffer Decomposed 
2 1.00 1.54 1.43 
3 1.04 1.62 1.48 
4 1.09 1.56 1.55 
5 1.14 1.60 1.57 
6 1.19 1.64 1.60 
7 1.24 1.68 1.63 
8 1.28 1.65 1.71 
 
Table 7: NAND4 design, in three strategies and different 
drive strength: normalized ‘delay-power’ metric. 
Drive Strength Single Buffer Decomposed 
2 1.00 1.51 1.34 
3 1.56 2.05 1.80 
4 2.16 2.52 2.61 
5 2.81 3.04 3.09 
6 3.52 3.58 3.59 
7 4.27 4.14 4.10 
8 5.07 4.66 5.04 
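
The crossover between single stage and multiple stage implementations can be read from Table 6 and Table 7 by selecting, for each drive strength, the strategy with the smallest value. The sketch below does this selection; the helper name is illustrative.

```python
STRATEGIES = ("Single", "Buffer", "Decomposed")

# Rows of Table 6 (normalized delay) and Table 7 (normalized 'delay-power').
TABLE6_DELAY = {
    2: (1.00, 1.54, 1.43), 3: (1.04, 1.62, 1.48), 4: (1.09, 1.56, 1.55),
    5: (1.14, 1.60, 1.57), 6: (1.19, 1.64, 1.60), 7: (1.24, 1.68, 1.63),
    8: (1.28, 1.65, 1.71),
}
TABLE7_DELAY_POWER = {
    2: (1.00, 1.51, 1.34), 3: (1.56, 2.05, 1.80), 4: (2.16, 2.52, 2.61),
    5: (2.81, 3.04, 3.09), 6: (3.52, 3.58, 3.59), 7: (4.27, 4.14, 4.10),
    8: (5.07, 4.66, 5.04),
}


def best_strategy(table):
    """Map each drive strength to the strategy with the smallest table value."""
    return {drive: STRATEGIES[row.index(min(row))] for drive, row in table.items()}


best_delay = best_strategy(TABLE6_DELAY)
best_delay_power = best_strategy(TABLE7_DELAY_POWER)
```

For delay alone the single stage cell is selected at every drive strength, whereas for the ‘delay-power’ metric a multiple stage version is selected at drive strengths 7 and 8.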
 
In the case of the RTF approach, only the design strategy 
using a buffered output to increase the NAND4 gate drive 
strength was considered. Table 8 and Table 9 show the 
results for the delay propagations and the ‘delay-power’ 
metric, respectively. The results are shown only for four 
configurations, where the discrepancies are more evident. 
Table 8 is normalized using Table 6 and Table 9 is 
normalized using Table 7: for each drive strength, the 
normalization value is the best result obtained with the 
standard cell implementations.  
RTF1 presented the best results for ‘delay-power’ metric 
because as the drive strength is increased, the number of 
inverters is also increased. Therefore, the performance 
loss due to the slow NAND4 gate is compensated. RTF1 
balances an excessive delay with small power 
consumption. 
The utilization of RTF_Avg2 template yields a better 
delay performance, but there is a significant cost on 
power dissipation. RTF4 has similar delay to RTF_Avg2 
but with increased power consumption. RTF_WC 
template is as fast as RTF1 even though the transistors are 
much larger. 
In all cases evaluated there is significant performance 
degradation on delay propagation when compared to the 
implementation using traditional standard cells. 
 
Table 8: NAND4 with different drive strengths designed 
in RTF patterns: delay propagation. 
Drive Strength RTF1 RTF4 RTF_WC RTF_Avg2 
2 1.59 1.40 1.52 1.39 
3 1.57 1.43 1.53 1.42 
4 1.52 1.39 1.51 1.38 
5 1.49 1.37 1.48 1.36 
6 1.47 1.35 1.45 1.34 
7 1.45 1.33 1.43 1.32 
8 1.42 1.32 1.42 1.31 
 
Table 9: NAND4 with different drive strengths designed 
in RTF patterns: ‘delay-power’ metric. 
Drive Strength RTF1 RTF4 RTF_WC RTF_Avg2 
2 1.31 1.91 3.44 1.64 
3 1.18 1.71 2.95 1.46 
4 1.13 1.68 2.95 1.43 
5 1.06 1.56 2.69 1.33 
6 1.01 1.48 2.51 1.25 
7 1.00 1.46 2.45 1.23 
8 1.05 1.56 2.65 1.32 
5 Critical Path Analysis 
So far, the evaluations were only performed on single 
logic gates. To have a better idea about the RTF impact 
on circuit performance, six benchmark circuits were 
randomly created. 
Benchmark circuits 1 and 2 (named CKT1 and CKT2, 
respectively) use only one instance of each cell. The 
other circuits use a random number of instances, though 
there is a preference for cells with one (INV) or two 
(NAND2 and NOR2) inputs. Table 10 shows results 
normalized to the delay obtained when the cells are 
individually sized using the values of Table 1. 
As previously mentioned, RTF1 has all the gates with 
unitary capacitance. This leads to both small power 
consumption and high delay propagation. One can also 
observe again that larger transistors do not guarantee a 
smaller delay. 
RTF1 may be a good choice if delay is not the main concern. 
RTF2, RTF3 and RTF4 have higher capacitance than RTF1 
and, in most cases, higher delay. RTF5 is faster than RTF1, 
but it has higher capacitance. RTF_WC presented the best delay 
results but with very high capacitance. RTF_Avg1 and 
RTF_Avg2 show delay around 5% higher than 
RTF_WC, but with almost half the capacitance. RTF6 and 
RTF7 have delays very similar to RTF5 
but with higher capacitance. 
RTF_Avg1 and RTF_Avg2 offer a good trade-off. 
RTF_Avg2 presents better results because its 
PMOS and NMOS sizes were chosen knowing that 
some cells appear more often than others. Some 
configurations showed no advantage. 
It is also interesting to notice that, for some circuits, 
some configurations reach a delay propagation close to 
the one obtained using standard cells.  
 
Table 10: Delay propagation in critical paths extracted 
from benchmark circuits, for different RTF sizings. 
 CKT1 CKT2 CKT3 CKT4 CKT5 CKT6 
StdCell 1.00 1.00 1.00 1.00 1.00 1.00 
RTF1 1.83 1.72 1.29 1.35 1.28 1.16 
RTF2 1.85 1.89 1.27 1.38 1.30 1.14 
RTF3 2.05 2.12 1.37 1.52 1.42 1.23 
RTF4 2.29 2.36 1.48 1.67 1.56 1.34 
RTF5 1.70 1.57 1.20 1.24 1.19 1.11 
RTF6 1.72 1.56 1.22 1.25 1.21 1.13 
RTF7 1.72 1.57 1.22 1.25 1.21 1.13 
WC 1.55 1.54 1.10 1.17 1.10 1.00 
AVG1 1.61 1.60 1.14 1.21 1.14 1.03 
AVG2 1.63 1.60 1.15 1.22 1.15 1.05 
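
The comparison between the averaged templates and the widest one can be quantified from Table 10 and Table 2. The sketch below assumes that the row with the largest transistors in Table 10 corresponds to the RTF_WC configuration; it computes the mean delay penalty over the six circuits and the inverter input capacitance ratio.

```python
# Critical path delays from Table 10 (CKT1..CKT6), normalized to standard cells.
TABLE10 = {
    "RTF_WC":   [1.55, 1.54, 1.10, 1.17, 1.10, 1.00],  # assumed to be the widest row
    "RTF_Avg1": [1.61, 1.60, 1.14, 1.21, 1.14, 1.03],
    "RTF_Avg2": [1.63, 1.60, 1.15, 1.22, 1.15, 1.05],
}
# Inverter input capacitance of each template, from Table 2.
TABLE2_CIN = {"RTF_WC": 4.73, "RTF_Avg1": 2.31, "RTF_Avg2": 1.88}


def mean_delay_penalty(config, reference="RTF_WC"):
    """Average relative delay increase of `config` with respect to `reference`."""
    ratios = [d / r for d, r in zip(TABLE10[config], TABLE10[reference])]
    return sum(ratios) / len(ratios) - 1.0


def cin_ratio(config, reference="RTF_WC"):
    """Input capacitance of `config` relative to `reference`."""
    return TABLE2_CIN[config] / TABLE2_CIN[reference]


penalty_avg1 = mean_delay_penalty("RTF_Avg1")
penalty_avg2 = mean_delay_penalty("RTF_Avg2")
```

Both penalties stay below 5%, while the capacitance ratios are about 0.49 and 0.40, respectively.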
6 General Analysis 
Transistor sizing for regular fabrics is an even harder task 
than for traditional standard cells, because it is not 
possible to size the cells individually. 
Previous designs, if available, may be used to estimate 
cell utilization. The most frequently used gates are 
expected to have a significant impact on design 
performance. This way, the transistor widths can be 
chosen considering which cells are 
expected to have a major impact. These cells can be 
initially sized using the structures presented in Section 2, 
as would be done for traditional standard cells. After 
this, a trade-off between the cell dimensions must be 
found.  
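
One simple way to express such a trade-off is a usage-weighted average of the individually optimized widths. This is only a sketch of the idea, not a procedure evaluated in this work; the usage counts below are hypothetical, and the widths are the normalized values of Table 1.

```python
# Normalized (Wn, Wp) pairs from Table 1.
TABLE1_WIDTHS = {
    "INV": (1.0, 1.6), "NAND2": (1.9, 1.8), "NAND3": (3.0, 2.0),
    "NAND4": (4.3, 2.2), "NOR2": (1.4, 4.0), "NOR3": (1.8, 6.2),
    "NOR4": (2.2, 8.0),
}


def weighted_widths(usage):
    """Average Wn and Wp weighted by the number of instances of each cell."""
    total = sum(usage.values())
    if total <= 0:
        raise ValueError("usage counts must sum to a positive number")
    wn = sum(n * TABLE1_WIDTHS[cell][0] for cell, n in usage.items()) / total
    wp = sum(n * TABLE1_WIDTHS[cell][1] for cell, n in usage.items()) / total
    return wn, wp


# Hypothetical cell counts taken from an earlier design (illustrative only).
HYPOTHETICAL_USAGE = {"INV": 40, "NAND2": 30, "NOR2": 20, "NAND3": 6, "NOR3": 4}
wn, wp = weighted_widths(HYPOTHETICAL_USAGE)
```

With equal counts for the five cells other than NAND4 and NOR4, the function returns the RTF_Avg2 widths before rounding.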
Extracting critical paths from circuits and sizing the 
related gates to meet a given constraint is also a good 
option. With this approach, it may not be necessary to 
size the cells individually. It must be noticed, however, 
that timing and power dissipation characteristics similar 
to those of standard cells are unlikely to be achieved.  
7 Conclusions 
This paper presented an extensive electrical analysis of 
the RTF regular fabric. The restriction on allowed layout 
patterns leads to better lithography yield, but a 
degradation in performance is expected. Adequate 
transistor sizing plays an important role in minimizing the 
gap between RTF and traditional standard cells with DFM 
rules. For this reason, several possible options for 
transistor widths were investigated as RTF sizing 
configurations. Their impact on delay propagation and 
power dissipation was measured and compared with the 
results from the standard cells, taken as the reference. 
The design of cells with higher drive capability was 
also evaluated. It is shown that there is a practical limit 
for increasing the transistor widths. Beyond that point, there is 
only an additional increase in power consumption 
without significant gain in delay performance. As an 
example, there is no significant gain in using RTF_Avg1 
instead of the narrower RTF_Avg2. Also, little gain in 
critical path delay is obtained if the transistors are wider 
than RTF_Avg1, even if RTF_WC is used. The analysis 
presented in this work can be easily extrapolated to other 
technology nodes and fabrication processes. 
8 References 
[1] S. K. Springer, S. Lee, N. Lu, E. J. Nowak, J.-O. 
Plouchart, J. S. Wattsa, R. O. Williams, and N. 
Zamdmer, “Modeling of variation in submicrometer 
CMOS ULSI technologies,” IEEE Trans. Electron 
Devices, vol. 53, pp. 2168–2006, Sep. 2006. 
[2] B. H. Calhoun, Y. Cao, X. Li, K. Mai, L. T. 
Pileggi, R. A. Rutenbar, and K. L. Shepard, “Digital 
circuit design challenges and opportunities in the 
era of nanoscale CMOS,” Proceedings of the IEEE, vol. 96, 
no. 2, pp. 343–365, Feb. 2008. 
[3] M. Lavin, F. L. Heng, and G. Northrop, “Backend 
CAD flows for restrictive design rules,” in Proc. 
ACM/IEEE Int. Conf. Computer-Aided Design, Nov. 
2004, pp. 739–746. 
[4] J. Wang, A. K. Wong, and E. Y. Lam, “Performance 
optimization for gridded layout standard cells,” in 
Proc. SPIE 24th Annu. BACUS Symp. Photomask 
Technol., W. Staud and J. T. Weed, Eds., 2004, vol. 
5567, pp. 107–118. 
[5] P. G. Drennan et al., “Implications of proximity 
effects for analog design,” CICC, pp. 169–176, 2006. 
[6] M. Smayling et al., “Low k1 logic design using 
gridded design rules,” vol. 6925, no. 1. SPIE, 2008. 
[7] M. Pons, F. Moll, A. Rubio, J. Abella, X. Vera, A. 
Gonzales, “VCTA: A Via-Configurable Transistor 
Array regular fabric”, VLSI-SoC 2010. 
[8] F. Beeftink, P. Kudva, D. S. Kung, R. Puri, L. Stock , 
“Combinatorial Cell Design for CMOS libraries”, 
Integration, the VLSI Journal, Vol. 29, Issue 1, pp. 67-
93, March 2000  
[9] Y. Cao, T. Sato, D. Sylvester, M. Orshansky, C. Hu, 
"New paradigm of predictive MOSFET and 
interconnect modeling for early circuit design," pp. 
201-204, CICC, 2000. 
 
 
 
 
