Via-configurable transistors array: a regular design technique to improve ICs yield by Pons, Marc et al.
Via-Configurable Transistors Array: a Regular
Design Technique to Improve ICs Yield
Marc Pons, Francesc Moll and Antonio Rubio
Universitat Politècnica de Catalunya
Dept. of Electronic Engineering
{mpons,moll,rubio}@eel.upc.edu
Jaume Abella, Xavier Vera and Antonio González
Intel Barcelona Research Center
Intel Labs - UPC
{jaumex.abella,xavier.vera,antonio.gonzalez}@intel.com
Abstract—Process variations are a major bottleneck for digital
CMOS integrated circuits manufacturability and yield. That is
why regular techniques with different degrees of regularity are
emerging as possible solutions. Our proposal is a new regular
layout design technique called Via-Configurable Transistors Ar-
ray (VCTA) that pushes to the limit circuit layout regularity
for devices and interconnects in order to maximize regularity
benefits. For a given technology node, VCTA is predicted to perform
worse than Standard Cell designs, but it will allow a future
technology to be used at an earlier time. Our objective is to
optimize VCTA so that it is comparable to a Standard Cell design
in an older technology. Delay and energy consumption simulations
of the first, unoptimized version of our VCTA are presented for a
Full Adder circuit in the 90 nm technology node, together with an
extrapolation to Carry-Ripple Adders from 4 bits to 64 bits.
Index Terms—CMOS, DSM, Digital ICs, Regular Designs,
Yield, DFM.
I. INTRODUCTION
Current silicon CMOS technologies suffer from large device
and interconnect parameter variations and the trend is expected
to worsen for future technologies [1]. In general, for complex
digital circuits, typical process variations considered for Deep
Sub-Micron technologies (DSM) are 10% to 30% across
wafers and 5% to 20% across dies [2]. These process variations
pose many challenges for circuit design due to their effects
on performance, power and yield [3]–[9]. Therefore, process
variations are expected to increase the costs of Standard
Cell designs to unaffordable limits. Mask costs also increase
with every technology node as mask complexity grows, from
102 thousand dollars for a 350 nm design to almost 1 million
dollars for a 90 nm design [10].
Manufacturing process variations decrease the predictability
of circuit delay and power dissipation, thus increasing design
time because the resulting circuits are difficult to verify
and test. This circuit unpredictability has led to the concept
of frequency binning, which is very costly in terms of
performance. In fact, the benefits expected for future technology
generations may diminish significantly [11], [12]. For instance, considering
only transistor variations in a wafer, about 30% variation in
chip frequency and 20x variation in chip leakage have been ob-
served [1]. Other predictions show that process variations for
gates and wires result in a maximum 40% circuit performance
variation and a 55% circuit power dissipation variation [13]. In
the field of microprocessor functional units, some results have
been published for different adder implementations showing
variations on both power and performance around 20% [14].
Other authors assume a 10-15% delay variation for single gates
[15].
Process variations also reduce yield. That results in more
time and investment required to increase yield to production
levels, and therefore, in an increase of the time-to-market
[9], [16], [17]. As we enter the DSM era, layout printability
challenges in sub-wavelength lithography are becoming a
major issue for Design For Manufacturability (DFM) [18].
In fact the Resolution Enhancement Techniques (RETs) like
Phase Shift Mask (PSM) and Optical Proximity Correction
(OPC) that deal with systematic sources of variability are
computationally difficult for huge integrated circuits with
arbitrary layout patterns and this results in functional yield loss
and parametric failures [19], [20]. As the process technologies
scale to small feature sizes, chip yields are expected to drop
from over 90% for 350 nm to around 50% or less for 90 nm
[21]. Furthermore, this trend will continue for the next technology
nodes. For instance, a 33% yield has been reported for a 64-K
direct-mapped L1 cache in the 45 nm technology node [22].
Due to sub-wavelength lithography effects, fabrication yields
for the Standard Cell approach can be unacceptably low even
for layouts that comply with the nominal design rules. Design
Rule Checking (DRC) alone is no longer sufficient for DFM.
There is huge potential to mitigate process variations from
the DFM point of view [19]. In particular, regular layout
designs have been shown to be highly beneficial in reducing the
impact of process variations and increasing yield, because
regularity reduces the variations introduced by the manufacturing
process. Regularity-based techniques like Via-Programmable
Gate Arrays, Field-Programmable Gate Arrays and
Structured ASICs are emerging as a possible solution for
manufacturers [23]–[29]. They usually offer worse area and
performance than the Standard Cell approach but, according
to their degree of regularity, they improve manufacturing yield
and performance predictability, while also reducing the
time-to-market and the high mask Non-Recurring Engineering
costs (NREs).
Our proposal is to reduce process variations in silicon and
metal as much as possible at manufacturing time pushing
to the limit circuit layout regularity for devices and also
for interconnects. In this way, we try to maximize yield
and to reduce mask costs that are becoming prohibitive for
manufacturers [10]. We also plan to minimize the design time
developing a regular fabric based on a single configurable
basic cell including transistors and interconnects. We will refer
to this kind of basic cells as Via-Configurable Transistors
Array (VCTA). In this way, we avoid the time required for
optimization of different customized basic cells.
The structure of the paper is as follows. In section II we
describe our VCTA basic cell proposal. In section III,
electrical simulations of a Full Adder in the 90 nm technology
node are presented for the proposed structure and compared
to the Standard Cell approach in terms of energy and
delay. The behaviors of 4-bit to 64-bit Carry-Ripple Adders
are also extrapolated from these simulations, and the impact of
process variations is studied by means of Monte Carlo
simulations. In section IV we discuss the impact of our proposal
in terms of yield and time-to-market. Finally, in section
V conclusions and future work are provided.
II. VCTA BASIC CELL DESCRIPTION
Our VCTA proposal is a very fine-grain device regular
structure, similar to a Sea-of-Transistors [30]–[32]. In order
to ensure interconnect regularity and to reduce routability
problems due to prefabricated contacts or vias, we propose
a Via-Configurable structure where all contacts and vias can
be configured depending on the function synthesized. All the
routing channels and the MOS devices are implemented but
only connected depending on the needs. Thus, the via-insertion
algorithm is important. In fact, contacts and vias are the only
source of layout irregularity in our VCTA basic cell.
The Transistors Array is composed of two blocks aligned
vertically: a block of eight serial PMOS transistors and another
block of eight serial NMOS transistors (Figure 1). Note that
the eight transistors in each block share the same oxide diffusion,
which reduces their area, and that, even though our VCTA has serial
transistors, parallel connections can be implemented by setting
up vias properly. In order to force maximum transistor layout
regularity, all transistors have the same dimensions: the
minimum channel length of 100 nm and a width of 200 nm,
which ensures maximum transistor compaction when
sharing the same oxide diffusion. In order to reduce process
variations, two of the eight transistors in each block, the ones
at the upper and lower extremes, are used as dummy transistors.
In this way we avoid variations between drains/sources that lie
between two polysilicon gates and drains/sources that have a
gate on one side only. Therefore, only 12 of the 16 transistors in
the basic cell can be used for implementing functions. The
choice of 6 PMOS and 6 NMOS transistors in the
basic cell is related to the possibility of implementing 2
logic branches with a maximum length of 3 serial transistors,
a limit fixed in order to avoid body effect and excessive
serial resistance issues.
Fig. 1. Transistors Array structure
The Via-Configurable interconnections use three metal levels,
from M1 to M3, forming a routing grid: M1 wires are
vertical, M2 wires are horizontal and M3 ones are vertical
again (Figure 2). The three metal layers are used for inter-cell
connectivity while the M2 layer is also devoted to intra-cell
connections.
Fig. 2. Via-Configurable metal grid structure
Regarding the power supply network, wires are reserved
in each metal layer of the basic cell for VDD and GND.
These wires are shared across neighbor cells. In this way,
we can reduce the area when implementing a full circuit
with multiple basic cells. The same design criterion is used
for polarization contacts. To allow this sharing, the basic cells
are placed with vertical symmetry in one direction and with
horizontal symmetry in the other. The resulting placement of
our basic cells is depicted in Figure 3.
Fig. 3. Placement and power supply network of basic cells (6 cells in the
picture)
The configuration of contacts and vias to implement the
desired functions is not yet performed automatically.
However, it would be worthwhile to develop a CAD tool
that implements the via-insertion algorithm. This via-insertion
algorithm should take as inputs:
1) The circuit schematic including transistors branches and
sizing
2) The number of metal layers for interconnections
3) The number of transistors of the VCTA basic cell
4) The number of possible parallel transistors connections
available in the VCTA cell (this number does not include
the power supply connections that are always available)
5) The number of inputs/outputs supported by the VCTA
cell and, therefore, the number of connections that can
be wired between contiguous basic cells
In this first work, the VCTA basic cell has 6 NMOS and 6
PMOS transistors, 4 possible transistor gate inputs, 3 parallel
connections and no more than 7 inputs/outputs per basic cell.
These constraints have to be considered when deciding the
final routing. They also indicate the number of basic cells
needed to implement the desired function, and therefore the
area required, at a very early design stage.
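The early cell-count estimate described above can be sketched as follows. This is a minimal illustration that only uses the per-cell transistor counts given in the text; it ignores the limits on gate inputs, parallel connections and inputs/outputs, so it yields a lower bound and not the final cell count. The function names and the 28-transistor Full Adder (14 PMOS and 14 NMOS) are assumptions made for the example, not figures taken from the paper.

```python
import math

# Usable transistors per VCTA basic cell, as stated in the text.
USABLE_PMOS_PER_CELL = 6
USABLE_NMOS_PER_CELL = 6


def min_cells_needed(n_pmos: int, n_nmos: int) -> int:
    """Lower bound on the number of basic cells, from transistor counts only."""
    return max(
        1,
        math.ceil(n_pmos / USABLE_PMOS_PER_CELL),
        math.ceil(n_nmos / USABLE_NMOS_PER_CELL),
    )


def unused_fraction(n_pmos: int, n_nmos: int, cells: int) -> float:
    """Fraction of usable transistors left unconnected in `cells` basic cells."""
    total = cells * (USABLE_PMOS_PER_CELL + USABLE_NMOS_PER_CELL)
    return (total - n_pmos - n_nmos) / total


# Hypothetical 28-transistor Full Adder (assumption, not stated in the paper).
fa_cells = min_cells_needed(14, 14)
fa_unused = unused_fraction(14, 14, fa_cells)
print(fa_cells, round(100 * fa_unused, 1))
```

Under this assumed transistor count the bound gives three basic cells, which matches the three-cell Full Adder layout of Section III; a real via-insertion tool would also have to check the routing constraints before accepting that number.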
III. SIMULATIONS AND RESULTS
A. Full Adder with no process variations
Complete electrical simulations of the extracted layout of a
Full Adder in the 90 nm technology node have been performed
using the HSPICE simulator. The layout of the Full Adder (FA),
which is composed of three VCTA basic cells, is depicted in
Figure 4. On one hand, the FA designed with our VCTA
regular design technique has been evaluated in terms of delay,
energy and area. On the other hand, the same simulations have
been performed for the FA synthesized with a standard cell
library. Figure 5 shows the schematic used for the simulations,
where {a_i, b_i, ci_i} stand for the initial values of the inputs of
the FA and {a_f, b_f, ci_f} stand for their final values.
Fig. 4. Full Adder layout capture using our VCTA proposal (3 basic cells)
Fig. 5. Schematic used for the simulations of the Full Adder
The delay has been measured from the input transition to the
associated output transition, taking the crossing at 50%
of the power supply VDD. Energy has also been measured
for each input combination over a 1 ns period by
integrating the current drawn from the power supply source. The
area has been measured directly from the layout. Worst-Case
measurement results are shown in Table I. The regular
design implies an increase in area, Worst-Case Delay
and Worst-Case Energy when compared to the Standard Cell
approach.
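The two measurements described above can be sketched in a few lines of post-processing. This is only an illustration of the definitions (50% VDD crossing and supply-current integration), not the HSPICE measurement setup used by the authors. The function names, the sampled-waveform representation and the supply voltage used in the usage lines are assumptions.

```python
def crossing_time(times, values, level):
    """First time at which a sampled waveform crosses `level` (linear interpolation)."""
    for i in range(len(times) - 1):
        v0, v1 = values[i], values[i + 1]
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            return times[i] + (level - v0) * (times[i + 1] - times[i]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(times, v_in, v_out, vdd):
    """Delay between the 50% VDD crossings of the input and output waveforms."""
    half = 0.5 * vdd
    return crossing_time(times, v_out, half) - crossing_time(times, v_in, half)


def supply_energy(times, supply_current, vdd):
    """Energy drawn from the supply: VDD times the trapezoidal integral of the current."""
    charge = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        charge += 0.5 * (supply_current[i] + supply_current[i + 1]) * dt
    return vdd * charge


# Hypothetical waveforms: a constant 20 uA supply current over a 1 ns window
# at an assumed 1.0 V supply (neither value comes from the paper).
t = [0.0, 0.5e-9, 1.0e-9]
energy_j = supply_energy(t, [20e-6, 20e-6, 20e-6], vdd=1.0)
delay_s = propagation_delay(t, [0.0, 1.0, 1.0], [0.0, 0.0, 1.0], vdd=1.0)
```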
TABLE I. FULL ADDER SIMULATION RESULTS

| Design | WCDelay (ps) | WCEnergy (fJ) | Area (µm²) |
|---|---|---|---|
| FA STD CELL | 156.90 | 19.36 | 28.24 |
| FA VCTA | 577.00 | 30.08 | 52.18 |
| Ratio | 3.68x | 1.55x | 1.85x |
B. Full Adder considering process variations
In order to evaluate our VCTA proposal under process vari-
ations, the same electrical simulations have been performed
considering random local process variations on PMOS and
NMOS channel length and width, oxide thickness, threshold
voltage and channel doping concentration. Systematic process
variations have been neglected due to the small size of the
circuit.
Our VCTA proposal is expected to reduce the magnitude of
process variations to some extent because of its layout
regularity. That is why 4 variability scenarios have been
simulated:
1) Considering 100% of the Gaussian distribution 3-sigma
deviation percentages for the MOS parameter variations
2) Considering 75% of the technology variations (25%
reduction)
3) Considering 50% of the variations (50% reduction)
4) Considering 25% of the variations (75% reduction)
For each simulation scenario, a Monte Carlo analysis has
been performed using 1082 points (to ensure a 95% confidence
level and a confidence interval of width 5% [33]) for each
of the 64 possible input transitions. Tables II, III, IV and V
show the complete results for the FA design using the Standard
Cell approach (only for 100% technology variations) and our
regular VCTA approach. Detailed Worst-Case delays
are presented for all possible transitions from each of the
inputs to all the outputs of the FA, together with the Worst-Case
energy. For each measurement, the mean µ, the standard deviation σ
and the ratio 3σ/µ are also presented. As expected, higher process
variations increase the variability of delay and energy. Note also
that the maximum variability of our regular VCTA proposal is
lower than that of the Standard Cell design, and that this
difference is larger for energy consumption.
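The 3σ/µ figure reported in Tables II to V can be computed from a set of Monte Carlo samples as sketched below. This is an illustrative helper, not the authors' script; the function names are assumptions, and the use of the sample standard deviation (rather than the population one) is also an assumption, since the paper does not say which was used.

```python
import statistics


def variability_from_moments(sigma: float, mu: float) -> float:
    """Return 3*sigma/mu expressed as a percentage."""
    return 100.0 * 3.0 * sigma / mu


def variability_percent(samples) -> float:
    """3*sigma/mu (%) of a list of Monte Carlo delay or energy samples."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)  # sample standard deviation (assumption)
    return variability_from_moments(sigma, mu)


# Check against the "A to CO STD" row of Table II (sigma = 5.49 ps, mu = 93.93 ps).
a_to_co_std = variability_from_moments(5.49, 93.93)
print(round(a_to_co_std, 2))
```

The value obtained from the rounded table entries differs from the published 17.52% only in the second decimal, which is consistent with the tables having been computed from unrounded σ and µ.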
C. Carry-Ripple Adder behavior extrapolation
In order to evaluate our VCTA proposal in a more complex
functional unit, such as a binary adder, we use the detailed
results presented above to extrapolate the behavior of a
simple Carry-Ripple adder (CR) composed of FAs.
Full simulations are part of our future work, although only minor
differences are expected with respect to the results of the
extrapolation. Note that, in terms of area, the ratio between
the Standard Cell design and our VCTA proposal will remain
the same as for a single FA due to the fact that the area of the
CR adder of N bits is just N times the area of a FA in each
case. Therefore, the area ratio will remain about 1.85x.

TABLE II. FULL ADDER SIMULATION RESULTS UNDER 100% TECHNOLOGY VARIATIONS

| FA Delay | WCD (ps) | σ (ps) | µ (ps) | 3σ/µ (%) |
|---|---|---|---|---|
| A to CO STD | 110.10 | 5.49 | 93.93 | 17.52 |
| A to Z STD | 171.50 | 5.50 | 153.90 | 10.72 |
| B to CO STD | 114.40 | 4.53 | 95.65 | 14.22 |
| B to Z STD | 173.20 | 4.79 | 155.35 | 9.24 |
| CI to CO STD | 112.40 | 4.17 | 97.39 | 12.84 |
| CI to Z STD | 172.50 | 6.70 | 150.11 | 13.38 |
| A to CO VCTA | 368.00 | 14.68 | 310.9 | 14.17 |
| A to Z VCTA | 618.10 | 21.23 | 545.42 | 11.68 |
| B to CO VCTA | 371.00 | 14.93 | 313.61 | 14.28 |
| B to Z VCTA | 627.80 | 21.09 | 548.53 | 11.54 |
| CI to CO VCTA | 343.60 | 13.81 | 298.33 | 13.89 |
| CI to Z VCTA | 661.90 | 25.90 | 576.90 | 13.47 |

| FA Energy | WCE (fJ) | σ (fJ) | µ (fJ) | 3σ/µ (%) |
|---|---|---|---|---|
| STD | 21.13 | 0.55 | 19.23 | 8.63 |
| VCTA | 31.73 | 0.38 | 30.22 | 3.76 |

TABLE III. FULL ADDER SIMULATION RESULTS UNDER 75% TECHNOLOGY VARIATIONS

| FA Delay | WCD (ps) | σ (ps) | µ (ps) | 3σ/µ (%) |
|---|---|---|---|---|
| A to CO VCTA | 353.70 | 11.06 | 311.08 | 10.67 |
| A to Z VCTA | 600.30 | 15.74 | 545.42 | 8.66 |
| B to CO VCTA | 355.40 | 11.10 | 313.69 | 10.61 |
| B to Z VCTA | 603.70 | 15.83 | 548.64 | 8.66 |
| CI to CO VCTA | 331.00 | 10.40 | 298.74 | 10.45 |
| CI to Z VCTA | 638.30 | 19.55 | 576.93 | 10.16 |

| FA Energy | WCE (fJ) | σ (fJ) | µ (fJ) | 3σ/µ (%) |
|---|---|---|---|---|
| VCTA | 31.21 | 0.28 | 30.19 | 2.83 |

TABLE IV. FULL ADDER SIMULATION RESULTS UNDER 50% TECHNOLOGY VARIATIONS

| FA Delay | WCD (ps) | σ (ps) | µ (ps) | 3σ/µ (%) |
|---|---|---|---|---|
| A to CO VCTA | 358.20 | 12.85 | 312.43 | 12.34 |
| A to Z VCTA | 610.50 | 16.79 | 543.47 | 9.27 |
| B to CO VCTA | 358.00 | 13.15 | 315.07 | 12.52 |
| B to Z VCTA | 603.60 | 18.14 | 550.35 | 9.89 |
| CI to CO VCTA | 337.20 | 11.88 | 299.18 | 11.92 |
| CI to Z VCTA | 668.20 | 22.84 | 578.76 | 11.84 |

| FA Energy | WCE (fJ) | σ (fJ) | µ (fJ) | 3σ/µ (%) |
|---|---|---|---|---|
| VCTA | 31.24 | 0.30 | 30.19 | 3.03 |

TABLE V. FULL ADDER SIMULATION RESULTS UNDER 25% TECHNOLOGY VARIATIONS

| FA Delay | WCD (ps) | σ (ps) | µ (ps) | 3σ/µ (%) |
|---|---|---|---|---|
| A to CO VCTA | 325.40 | 3.97 | 312.06 | 3.82 |
| A to Z VCTA | 566.80 | 6.26 | 545.38 | 3.45 |
| B to CO VCTA | 330.10 | 4.18 | 314.01 | 4.00 |
| B to Z VCTA | 566.80 | 5.94 | 548.85 | 3.25 |
| CI to CO VCTA | 310.00 | 3.58 | 299.24 | 3.59 |
| CI to Z VCTA | 603.10 | 7.22 | 577.41 | 3.75 |

| FA Energy | WCE (fJ) | σ (fJ) | µ (fJ) | 3σ/µ (%) |
|---|---|---|---|---|
| VCTA | 30.53 | 0.12 | 30.16 | 1.19 |
Energy and delay results for CR adders from 4 to 64 bits
are presented in Tables VI, VII, VIII and IX. On one hand,
the energy results have been calculated considering that
every FA consumes its Worst-Case energy. On the other
hand, the Worst-Case delays have been calculated considering
that the carry-in of the CR adder propagates to the last FA of
the sum chain. Figure 6 shows the critical path that has been
considered. Finally, note that in both cases the FA delay and
energy distributions are considered Gaussian and independent,
which allows the mean and sigma of the N-bit CR adder,
{µCR, σCR}, to be calculated as shown in equations (1)
and (2):

$$\mu_{CR} = \sum_{i=1}^{N} \mu_{FA_i} \tag{1}$$

$$\sigma_{CR} = \sqrt{\sum_{i=1}^{N} \sigma_{FA_i}^{2}} \tag{2}$$
Fig. 6. Carry-Ripple Worst-Case delay calculation
Variability in terms of 3σ/µ increases with
increasing process variations but decreases with
the number of bits considered. The absolute deviation does
increase, but only by a factor of the square root of the
number of bits N, while the mean increases by a factor of N.
This occurs because of the CR structure, whose longest path
grows linearly with the number of bits, so that independent
random variations partially average out along such a long path.
Analyzing more complex adders with shorter critical paths is
part of our future work.
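Equations (1) and (2) can be sketched in code as follows, assuming independent Gaussian stage distributions as stated in the text. The function name is an assumption. The example uses the VCTA energy figures of Table II (µ = 30.22 fJ, σ = 0.38 fJ per FA), because the energy extrapolation treats every FA identically and can therefore be checked against Table VI from the published per-stage values alone.

```python
import math


def extrapolate_chain(stage_means, stage_sigmas):
    """Mean, sigma and 3*sigma/mu (%) of a sum of independent Gaussian stages."""
    mu_cr = sum(stage_means)                                   # equation (1)
    sigma_cr = math.sqrt(sum(s * s for s in stage_sigmas))     # equation (2)
    return mu_cr, sigma_cr, 100.0 * 3.0 * sigma_cr / mu_cr


# VCTA Full Adder energy under 100% technology variations (Table II).
MU_FA_FJ, SIGMA_FA_FJ = 30.22, 0.38

for n_bits in (4, 8, 16, 32, 64):
    mu, sigma, ratio = extrapolate_chain([MU_FA_FJ] * n_bits, [SIGMA_FA_FJ] * n_bits)
    print(n_bits, round(mu, 2), round(sigma, 2), round(ratio, 2))

mu4, sigma4, ratio4 = extrapolate_chain([MU_FA_FJ] * 4, [SIGMA_FA_FJ] * 4)
mu64, sigma64, ratio64 = extrapolate_chain([MU_FA_FJ] * 64, [SIGMA_FA_FJ] * 64)
```

With identical stages the sigma grows as the square root of N while the mean grows as N, so 3σ/µ shrinks by a factor of 4 between the 4-bit and 64-bit adders.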
Worst-Case delay and energy ratios for the 32-bit and 64-bit
CR adders under the full technology variations are
shown in Table X. The energy ratio is almost
the same as for a single FA. However, the delay ratio is
smaller because it tends to the ratio between the CI to CO
delays, which is the most repeated delay in the critical path. The
Worst-Case delay of a FA, from CI to Z, occurs only in the
last FA of the CR adder.
TABLE VI. CARRY-RIPPLE ADDER SIMULATION RESULTS UNDER 100% TECHNOLOGY VARIATIONS

| CR Delay | WCD (ps) | σ (ps) | µ (ps) | 3σ/µ (%) |
|---|---|---|---|---|
| 4 bits STD | 511.70 | 10.01 | 440.55 | 6.81 |
| 8 bits STD | 961.30 | 13.02 | 830.12 | 4.71 |
| 16 bits STD | 1860.50 | 17.57 | 1609.27 | 3.27 |
| 32 bits STD | 3658.90 | 24.22 | 3167.58 | 2.29 |
| 64 bits STD | 7255.70 | 33.80 | 6284.19 | 1.61 |
| 4 bits VCTA | 1720.10 | 35.70 | 1487.17 | 7.20 |
| 8 bits VCTA | 3094.50 | 45.14 | 2680.49 | 5.05 |
| 16 bits VCTA | 5843.30 | 59.69 | 5067.13 | 3.53 |
| 32 bits VCTA | 11340.90 | 81.32 | 9840.41 | 2.48 |
| 64 bits VCTA | 22336.10 | 112.76 | 19386.97 | 1.74 |

| CR Energy | WCE (fJ) | σ (fJ) | µ (fJ) | 3σ/µ (%) |
|---|---|---|---|---|
| 4 bits STD | 84.52 | 1.11 | 76.92 | 4.32 |
| 8 bits STD | 169.04 | 1.56 | 153.85 | 3.05 |
| 16 bits STD | 338.08 | 2.21 | 307.70 | 2.16 |
| 32 bits STD | 676.16 | 3.13 | 615.39 | 1.53 |
| 64 bits STD | 1352.32 | 4.43 | 1230.78 | 1.08 |
| 4 bits VCTA | 126.92 | 0.76 | 120.87 | 1.88 |
| 8 bits VCTA | 253.84 | 1.07 | 241.74 | 1.33 |
| 16 bits VCTA | 507.68 | 1.51 | 483.47 | 0.94 |
| 32 bits VCTA | 1015.36 | 2.14 | 966.94 | 0.66 |
| 64 bits VCTA | 2030.72 | 3.03 | 1933.89 | 0.47 |
TABLE VII. CARRY-RIPPLE ADDER SIMULATION RESULTS UNDER 75% TECHNOLOGY VARIATIONS

| CR Delay | WCD (ps) | σ (ps) | µ (ps) | 3σ/µ (%) |
|---|---|---|---|---|
| 4 bits VCTA | 1655.70 | 26.86 | 1488.10 | 5.42 |
| 8 bits VCTA | 2979.70 | 33.98 | 2683.06 | 3.80 |
| 16 bits VCTA | 5627.70 | 44.95 | 5072.98 | 2.66 |
| 32 bits VCTA | 10923.70 | 61.25 | 9852.82 | 1.86 |
| 64 bits VCTA | 21515.70 | 84.93 | 19412.50 | 1.31 |

| CR Energy | WCE (fJ) | σ (fJ) | µ (fJ) | 3σ/µ (%) |
|---|---|---|---|---|
| 4 bits VCTA | 124.84 | 0.57 | 120.75 | 1.42 |
| 8 bits VCTA | 249.68 | 0.81 | 241.50 | 1.00 |
| 16 bits VCTA | 499.36 | 1.14 | 483.01 | 0.71 |
| 32 bits VCTA | 998.72 | 1.61 | 966.02 | 0.50 |
| 64 bits VCTA | 1997.44 | 2.28 | 1932.03 | 0.35 |
TABLE VIII. CARRY-RIPPLE ADDER SIMULATION RESULTS UNDER 50% TECHNOLOGY VARIATIONS

| CR Delay | WCD (ps) | σ (ps) | µ (ps) | 3σ/µ (%) |
|---|---|---|---|---|
| 4 bits VCTA | 1700.80 | 31.13 | 1489.55 | 6.27 |
| 8 bits VCTA | 3049.60 | 39.17 | 2686.27 | 4.37 |
| 16 bits VCTA | 5747.20 | 51.61 | 5079.71 | 3.05 |
| 32 bits VCTA | 11142.40 | 70.16 | 9866.59 | 2.13 |
| 64 bits VCTA | 21932.80 | 97.17 | 19440.35 | 1.50 |

| CR Energy | WCE (fJ) | σ (fJ) | µ (fJ) | 3σ/µ (%) |
|---|---|---|---|---|
| 4 bits VCTA | 124.96 | 0.61 | 120.78 | 1.52 |
| 8 bits VCTA | 249.92 | 0.86 | 241.56 | 1.07 |
| 16 bits VCTA | 499.84 | 1.22 | 483.12 | 0.76 |
| 32 bits VCTA | 999.68 | 1.73 | 966.24 | 0.54 |
| 64 bits VCTA | 1999.36 | 2.44 | 1932.48 | 0.38 |
TABLE IX. CARRY-RIPPLE ADDER SIMULATION RESULTS UNDER 25% TECHNOLOGY VARIATIONS

| CR Delay | WCD (ps) | σ (ps) | µ (ps) | 3σ/µ (%) |
|---|---|---|---|---|
| 4 bits VCTA | 1553.20 | 9.76 | 1489.90 | 1.97 |
| 8 bits VCTA | 2793.20 | 12.11 | 2686.86 | 1.35 |
| 16 bits VCTA | 5273.20 | 15.79 | 5080.78 | 0.93 |
| 32 bits VCTA | 10233.20 | 21.32 | 9868.62 | 0.65 |
| 64 bits VCTA | 20153.20 | 29.41 | 19444.30 | 0.45 |

| CR Energy | WCE (fJ) | σ (fJ) | µ (fJ) | 3σ/µ (%) |
|---|---|---|---|---|
| 4 bits VCTA | 122.12 | 0.24 | 120.64 | 0.60 |
| 8 bits VCTA | 244.24 | 0.34 | 241.29 | 0.42 |
| 16 bits VCTA | 488.48 | 0.48 | 482.58 | 0.30 |
| 32 bits VCTA | 976.96 | 0.68 | 965.15 | 0.21 |
| 64 bits VCTA | 1953.92 | 0.96 | 1930.30 | 0.15 |
TABLE X. CARRY-RIPPLE ADDER EXTRAPOLATION RESULTS FOR 100% TECHNOLOGY VARIATION

| VCTA / Standard Cell ratio | WCDelay | WCEnergy |
|---|---|---|
| Ratio 32 bits CR | 3.10x | 1.50x |
| Ratio 64 bits CR | 3.08x | 1.50x |
| Ratio single FA | 3.82x | 1.50x |
| Ratio CI to CO FA | 3.06x | – |
IV. YIELD AND TIME-TO-MARKET IMPROVEMENT
A. Standard Cell approach vs VCTA proposal
Every two years a new technology node starts with the
fabrication of the first small circuits or transistors [34]. Huge
investments are then required to reach commercial chip yields. For
DSM technologies using the Standard Cell approach, the initial
yield is around 15-20%, and it can take three years to reach
the maximum chip yield of around 50-60%
[7], [22], [35].
Regarding the initial yield increase of our VCTA proposal,
we have to examine the different factors causing yield loss.
Figure 7 shows the detailed components for different
technologies. Defect-density related problems are
caused by actual errors in the silicon, such as when a
contaminating particle is introduced during fabrication. Most
lithography-based failures occur when there are defects
on the masks used to pattern the silicon. Parametric yield loss,
in contrast, occurs because the manufactured chip does not
meet a design parameter, such as frequency or power dissipation.
In order of importance, the first factor is parametric yield
loss (25% for 90 nm). Once the design has been
optimized, we expect the yield loss of our VCTA designs to
be lower than that of Standard Cells because of the reduction
in manufacturing variability. The second factor is systematic
lithography-based failures (15% of yield loss for 90 nm). For
this factor, our VCTA regular designs are expected to perform
much better than Standard Cells. In fact, as we mentioned
in Section I, by forcing layout regularity in both devices and
interconnects, our structure will reduce the systematic yield losses
associated with lithography tools and RETs. The third factor is
random defect-density related problems (10% of yield loss for
90 nm). In this case, due to the area overhead of regularity, it is
possible that our VCTA designs perform worse than Standard
Cells. However, because our layout patterns are simpler than
the Standard Cell ones, there will be fewer critical areas that
can be affected by random defects. In any case, this
is the least important contributor to yield loss. Under the
hypotheses presented above, we have assumed that, adding the
three factors of yield loss, VCTA designs will have a
slight advantage over Standard Cell designs. That is why
we have considered a 5% initial yield improvement in order
to illustrate our proposal with an example.
Fig. 7. Yield factors for different process technologies. [21]
We also analyze the yield improvement rate over time. The
two yield evolutions differ in speed because VCTA is based on
the repetition of a single via-configurable basic cell that is
able to implement all kinds
of combinational circuits (e.g., functional units), and in which
the contacts and vias of the cell, which includes transistors and
interconnects, are configured depending on the function to be
synthesized. The yield improvement over time is accelerated
because only a single layout cell with a reduced number of
layout patterns has to be optimized instead of a whole Standard
Cell library. For example, in a Standard Cell library consisting
of 1000 Standard Cells there are approximately 2 million
possible configurations to arrange a pair of Standard Cells.
This large number of possible arrangements makes RETs
computationally difficult and therefore time consuming [27].
That is why we have considered that the yield improvement rate
for VCTA is increased by a factor of 1.5 over a year.
Based on these assumptions about the initial yield
level and the yield improvement rate, Figure 8 shows the predicted
yield evolution for the 90 nm, 65 nm and 45 nm technology
nodes compared to the expected yield evolution of our VCTA
regular layout design technique. It is very
likely that VCTA regular designs provide high yield after
one year of development for a given technology node (e.g.,
65 nm) when Standard Cells provide acceptable yield for the
previous technology node (e.g., 90 nm), which appeared
two years earlier but has been under development for three years.
Although this is just a rough evaluation of yield, it illustrates
the advantages of VCTA with respect to Standard Cells. We
are planning to quantify the yield results in the future using
the available CAD tools.
Fig. 8. Yield predictions for Standard Cells and VCTA approaches. We have
considered that the VCTA initial yield is increased by 5% and that the yield
improvement rate is increased by a factor of 1.5 over a year
B. Objectives and future work
The final objective is to achieve similar performance for
VCTA in a given technology node and for Standard Cells in
the previous technology node, so that both
may reach the market at the same time while VCTA reduces
the investment required to achieve commercial yield levels. We
plan to optimize our VCTA proposal in order to reach a
2x factor in Worst-Case delay and also in Worst-Case
energy, which would amount to losing one technology generation.
Going back to the simulation results for the 90 nm technology
node presented in Table I, our first priority is to reduce the
delay overhead. Improving the delay may have an impact on
energy and area.
In fact, there is significant room for delay improvement
because all transistors in our cell have the same size and,
moreover, there are unused transistors (e.g., 22% of the
transistors for a FA) that can be used to emulate wider transistors.
Similarly, different configurations of the vias and the metal
layers can be chosen to implement the required function, so
we must devote some effort to choosing the configuration
that minimizes interconnect parasitics. One possibility consists
of using M1, M3 and M5 (leaving M2 and M4 for shielding
purposes) instead of M1, M2 and M3, thereby increasing
the distance between layers and decreasing
capacitances. Another possibility is to add a fourth VCTA
basic cell to the FA design in order to have more transistors
available for transistor sizing. In this way, the area ratio
will be 2.4x but the performance could be improved. Finally,
the choice of the transistor width is also important, because
the precision of the transistor sizing depends on it. With our
present choice of 200 nm for the width, connecting transistors in
parallel can only emulate wider transistors of 400 nm,
600 nm, etc., that is, widths that are multiples of the basic
transistor width, and this is not always optimal.
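The sizing granularity described above can be illustrated with a short sketch. It picks the number of parallel 200 nm transistors whose combined width is closest to a target width and reports the resulting error. The function names and the target widths are hypothetical; only the 200 nm base width comes from the text.

```python
import math

BASE_WIDTH_NM = 200  # width of every transistor in the VCTA basic cell


def parallel_count(target_width_nm: float) -> int:
    """Number of parallel base transistors closest to the target width (at least 1)."""
    return max(1, math.floor(target_width_nm / BASE_WIDTH_NM + 0.5))


def emulated_width_nm(target_width_nm: float) -> int:
    """Width actually obtained, always a multiple of the base width."""
    return parallel_count(target_width_nm) * BASE_WIDTH_NM


def sizing_error_nm(target_width_nm: float) -> float:
    """Difference between the emulated width and the requested one."""
    return emulated_width_nm(target_width_nm) - target_width_nm


# Hypothetical target widths from an optimized Standard Cell style sizing.
for target in (100, 450, 520):
    print(target, parallel_count(target), emulated_width_nm(target), sizing_error_nm(target))
```

A smaller base width would reduce the sizing error but would need more parallel transistors, and therefore more of the spare devices in each cell, for the same emulated width.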
An additional advantage of VCTA with respect to Standard
Cells is that the proposed structure will be capable of
minimizing the impact of the remaining process variations on
circuit delay and power once the product has been shipped.
Unlike classical Standard Cell designs, the chosen
Via-Configurable Transistors Array topology provides some
degrees of freedom. For instance, the spare transistors in place
may also be used to mitigate the delay uncertainty of critical paths.
Similarly, different ways to configure the vias and contacts of
regular designs are possible, and hence some flexibility is
available to connect devices in such a way that variations are
further mitigated.
V. CONCLUSION
Our VCTA layout regularity technique may drastically
reduce the time-to-market, and therefore the investment required
to reach commercial yields, by increasing the initial yield level
and its improvement rate over time. However, compared to the
Standard Cell approach for the same technology node, there is
a decrease in circuit efficiency due to regularity. Results for the
unoptimized VCTA are not yet good enough, so further effort
is required to reduce overheads. Placing and routing Standard
Cells provides highly efficient designs and good yield for a
given technology node after a long time-to-market, whereas
regular designs may provide less efficient designs but very high
yield after a short time-to-market. It is therefore very likely
that regular designs provide high yield for a given technology
node (e.g., 65 nm) when Standard Cells provide acceptable
yield for the previous technology node (e.g., 90 nm). Thus,
even if regular designs are less efficient than Standard Cell
ones for the same technology node, regular designs reduce
time-to-market in such a way that at any time they may provide
similar performance, higher yield and much lower design costs
than Standard Cells.
ACKNOWLEDGMENTS
This research work has been supported by Intel Corporation,
Feder Funds and the Spanish Ministry of Education and
Science under grants TIN2004-03702 and TIN2007-61763.
REFERENCES
[1] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De.
Parameter variations and impact on circuits and microarchitecture. In
Proceedings of Design Automation Conference, pages 338–342, 2003.
[2] M. Nourani and A. Radhakrishnan. Testing on-die process variation in
nanometer VLSI. IEEE Design & Test of Computers, 23(6):438–451,
2006.
[3] Z. Lin, C.J. Spanos, L.S. Milor, and Y.T. Lin. Circuit sensitivity to inter-
connect variation. IEEE Transactions on Semiconductor Manufacturing,
11(4):557–568, 1998.
[4] D.S. Sylvester, O.S. Nakagawa, and C. Hu. Modeling the impact of
back-end process variation on circuit performance. In Int. Symp. on VLSI
Technology, Systems and Applications, pages 58–61, 1999.
[5] S.R. Nassif. Modeling and analysis of manufacturing variations. In
IEEE Conference on Custom Integrated Circuits, pages 223–228, 2001.
[6] A. Teene, B. Davis, R. Castagnetti, J. Brown, and S. Ramesh. Impact of
interconnect process variations on memory performance and design. In
Sixth International Symposium on Quality of Electronic Design, ISQED,
pages 694–699, 2005.
[7] S. Ozdemir, D. Sinha, G. Memik, J. Adams, and Hai Zhou. Yield-aware
cache architectures. In 39th Annual IEEE/ACM International Symposium
on Microarchitecture, MICRO, pages 15–25, 2006.
[8] International Technology Roadmap for Semiconductors 2005,
http://www.itrs.net/Links/2005ITRS/Home2005.htm, 2005.
[9] Farid N. Najm, Noel Menezes, and Imad A. Ferzli. A yield model
for integrated circuits and its application to statistical timing analysis.
IEEE Transactions on Computer-Aided Design of Integrated Circuits
and Systems, 26(3):574–591, 2007.
[10] J.D. Sawicki. DFM: magic bullet or marketing hype? In Lars W. Lieb-
mann, editor, Proceedings of the SPIE Design and Process Integration
for Microelectronic Manufacturing II, volume 5379, pages 1–9, May
2004.
[11] K.A. Bowman, S.G. Duvall, and J.D. Meindl. Impact of die-to-die
and within-die parameter fluctuations on the maximum clock frequency
distribution for gigascale integration. IEEE Journal of Solid-State
Circuits, 37(2):183–190, 2002.
[12] K.A. Bowman, S.B. Samaan, and N.Z. Hakim. Maximum clock
frequency distribution model with practical VLSI design considerations.
In International Conference on Integrated Circuit Design and Technology,
ICICDT, pages 183–191, 2004.
[13] International Technology Roadmap for Semiconductors 2006 Update,
http://www.itrs.net/Links/2006Update/2006UpdateFinal.htm, 2006.
[14] K. Bernstein, D. J. Frank, A. E. Gattiker, W. Haensch, B. L. Ji, S. R.
Nassif, E. J. Nowak, D. J. Pearson, and N. J. Rohrer. High-performance
CMOS variability in the 65-nm regime and beyond. IBM Journal of
Research and Development, 50:433–449, 2006.
[15] A. Agarwal, V. Zolotov, and D.T. Blaauw. Statistical clock skew
analysis considering intradie-process variations. IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, 23(8):1231–
1242, 2004.
[16] P. Gupta and A.B. Kahng. Manufacturing-aware physical design. In
International Conference on Computer Aided Design, ICCAD, pages
681–687, 2003.
[17] Animesh Datta, S. Bhunia, Jung Hwan Choi, S. Mukhopadhyay, and
K. Roy. Speed binning aware design methodology to improve profit
under parameter variations. In Asia and South Pacific Conference on
Design Automation, 6 pp., 2006.
[18] L. Capodieci, P. Gupta, A.B. Kahng, D. Sylvester, and J. Yang. Toward
a methodology for manufacturability-driven design rule exploration. In
Proceedings of 41st Design Automation Conference, pages 311–316,
2004.
[19] L. Pileggi, H. Schmit, A.J. Strojwas, P. Gopalakrishnan, V. Kheterpal,
A. Koorapaty, C. Patel, V. Rovner, and K.Y. Tong. Exploring regular
fabrics to optimize the performance-cost trade-off. In Proceedings of
Design Automation Conference, pages 782–787, 2003.
[20] Lei He, Andrew B. Kahng, King Ho Tam, and Jinjun Xiong. Simul-
taneous buffer insertion and wire sizing considering systematic CMP
variation and random Leff variation. IEEE Transactions on Computer-
Aided Design of Integrated Circuits and Systems, 26(5):845–857, 2007.
[21] Handel H. Jones. A delayed 90-nm surprise. Electronics Design Chain
Magazine, 2004.
[22] Amit Agarwal, Bipul C. Paul, Hamid Mahmoodi, Animesh Datta, and
Kaushik Roy. A process-tolerant cache architecture for improved yield
in nanoscale technologies. IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, 13(1):27–38, January 2005.
[23] Abbas El-Gamal, Ivo Bolsens, Andy Broom, Christopher Hamlin,
Philippe Magarshack, Zvi Or-Bach, and Larry Pileggi. Fast, cheap and
under control: the next implementation fabric. In Proceedings of the 40th
conference on Design Automation, DAC, pages 354–355, New York, NY,
USA, 2003. ACM Press.
[24] Deepak D. Sherlekar. Design considerations for regular fabrics. In
Proceedings of International Symposium on Physical Design, ISPD,
pages 97–102, New York, NY, USA, 2004. ACM Press.
[25] B. Zahiri. Structured ASICs: opportunities and challenges. In Pro-
ceedings of 21st International Conference on Computer Design, pages
404–409, 2003.
[26] V. Kheterpal, V. Rovner, T. G. Hersan, D. Motiani, Y. Takegawa, A. J.
Strojwas, and L. Pileggi. Design methodology for IC manufacturability
based on regular logic-bricks. In Proceedings of the 42nd annual
conference on Design Automation, DAC, pages 353–358, New York,
NY, USA, 2005. ACM Press.
[27] T. Jhaveri, L. Pileggi, V. Rovner, and A. J. Strojwas. Maximization
of layout printability/manufacturability by extreme layout regularity. In
Proceedings of SPIE, 2006.
[28] C. Menezes, C. Meinhardt, R. Reis, and R. Tavares. Design of regular
layouts to improve predictability. In Proceedings of the 6th International
Caribbean Conference on Devices, Circuits and Systems, pages 67–72,
2006.
[29] Y. Ran and M. Marek-Sadowska. Designing via-configurable logic
blocks for regular fabric. IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, 14(1):1–14, 2006.
[30] A.D. Lopez and H.-F.S. Law. A dense gate matrix layout method for
MOS VLSI. IEEE Transactions on Electron Devices, 27(8):1671–1675,
1980.
[31] C. Piguet, J. Zahnd, A. Stauffer, and M. Bertarionne. A metal-oriented
layout structure for CMOS logic. IEEE Journal of Solid-State Circuits,
19(3):425–436, 1984.
[32] H.J.M. Veendrick, D.A.J.M. van den Elshout, D.W. Harberts, and
T. Brand. An efficient and flexible architecture for high-density gate
arrays. IEEE Journal of Solid-State Circuits, 25(5):1153–1157, 1990.
[33] David S. Moore and George P. McCabe. Introduction to the Practice of
Statistics. Freeman & Co, 1989.
[34] Sunil R. Shenoy and Akhilesh Daniel. Intel Architecture and Silicon
Cadence: The Catalyst for Industry Innovation. Technology@Intel
Magazine, pages 1–7, October 2006.
[35] J. A. Torres and C. N. Berglund. Integrated circuit DFM framework for
deep sub-wavelength processes. In Lars W. Liebmann, editor, Design
and Process Integration for Microelectronic Manufacturing III, volume
5756, pages 39–50. SPIE, 2005.
