Optimal 2-D cell layout with integrated transistor folding by Avaneendra Gupta & John P Hayes
1.  ABSTRACT
Folding, a key requirement in high-performance cell layout, means breaking a large transistor into smaller, equal-sized transistors (legs) that are connected in parallel and placed contiguously with diffusion sharing. We present a novel technique, FCLIP, that integrates folding into the generation of optimal layouts of CMOS cells in the two-dimensional (2-D) style. FCLIP is based on integer linear programming (ILP) and precisely formulates cell width minimization as a 0-1 optimization problem. Folding is incorporated into the 0-1 ILP model by variables that represent the degrees of freedom that folding introduces into cell layout. FCLIP yields optimal results for three reasons: (1) it implicitly explores all possible transistor placements; (2) it considers all diffusion sharing possibilities among folded transistors; and (3) when paired P and N transistors have unequal numbers of legs, it considers all their relative positions. FCLIP is shown to be practical for relatively large circuits with up to 30 transistors. We then extend FCLIP to accommodate and-stack clustering, which most practical designs require because of its benefits for circuit performance. This extension reduces run times dramatically, making FCLIP viable for much larger circuits. It also demonstrates the versatility of FCLIP’s ILP-based approach in easily accommodating additional design constraints.
2.  INTRODUCTION
Cell layout synthesis falls in the category of constrained
optimization whose goal is to find a solution that optimizes
some cost function under a set of constraints. The cost
function can be the cell area, its delay, or a combination of
these. The constraints include bounds on width or height,
aspect ratio, number of diffusion rows, or the maximum size
of transistors. Since cell layout optimization is NP-hard [3],
any exact algorithm can, in the worst case, have an
exponential run time. Therefore, most prior techniques for
cell synthesis have avoided optimal algorithms in favor of
faster, but less exact heuristic methods.
Maziasz and Hayes [13] have shown that for one-dimensional (1-D) cell layout, exact algorithms can be computationally feasible and can generate significantly better solutions than heuristic methods. Recently, the authors successfully employed an exact optimization method, integer linear programming (ILP), in the CLIP technique to generate 2-D layouts of minimum width and height [7]. The layout problem is formulated as a 0-1 ILP problem, which is then solved using an off-the-shelf integer solver. CLIP provides optimum layouts for practical-sized cells. However, it assumes equal transistor sizes. Often, transistors must be individually sized to meet a circuit’s performance goals. This leads to unequal transistor sizes that can waste cell area. To meet the performance goals and minimize area, large transistors must be folded, that is, split into smaller transistors of uniform size. In this paper, we present the first exact technique that integrates transistor folding into 2-D cell synthesis. This technique, called FCLIP (Folding in Cell Layout via Integer Programming), extends CLIP to generate optimal 2-D cell layouts with folding.
FCLIP minimizes cell area in the following stages. First, transistors are folded based on user-specified limits on the maximum size of the P and N transistors. The input circuit is preprocessed to generate P/N pairs and to identify and-stacks, that is, groups of transistors connected in series. And-stack clustering is not only necessary in practical designs, but also reduces the complexity of the problem and, in turn, FCLIP’s run times. Then an ILP model is formulated and solved to determine a 2-D layout of minimum width Wmin; this model maximizes diffusion sharing among folded transistors and minimizes vertical inter-row connections. A second ILP model is then constructed to generate a layout that has width Wmin and minimum height, measured by the number of horizontal routing tracks. This paper discusses only 2-D cell width minimization with folding; however, FCLIP can be extended to minimize cell height as well.
FCLIP yields optimal results with folding for two reasons: (1) it implicitly explores all diffusion sharing possibilities among folded transistors; and (2) when paired P/N transistors have unequal numbers of legs, it considers all their relative positions. Not only does FCLIP support 2-D layout, it is also superior to prior folding techniques proposed for 1-D layout [9, 11, 14], which consider folding only after a transistor placement has been determined and, as we show later, can produce suboptimal layouts. FCLIP’s optimal method is particularly targeted towards the layout of standard-cells and datapath bit-cells for high-volume, high-performance microprocessor designs. These cells, which typically have 20-40 transistors, are used multiple times and must have their layouts optimized as much as possible.

✝ This research was sponsored by a grant from Intel Corporation.

Avaneendra Gupta
Cadence Design Systems, Inc.
555 River Oaks Parkway, #1A1

Dept. of EECS, University of Michigan
1301 Beal Avenue
Ann Arbor, MI 48109
1-734-763-0386
jhayes@eecs.umich.edu
We begin by describing transistor folding and its advantages in Section 3. Section 4 reviews relevant aspects of the CLIP method [7] for minimizing 2-D cell width. Section 5 extends CLIP to FCLIP by integrating folding into the ILP model formulation. Finally, Section 6 demonstrates the versatility of FCLIP’s ILP-based approach by incorporating and-stack clustering, a requirement in most practical designs.
3.  TRANSISTOR FOLDING
The assumed 2-D cell layout style is illustrated in Fig. 1 and generalizes the well-studied 1-D style [4, 6, 9, 14]. 2-D cells contain multiple rows of P and N diffusions called P/N rows. The P and N transistors of a P/N row are grouped into P/N pairs using standard techniques [6]. In dual CMOS circuits, the P and N transistors of each P/N pair share a common gate net. The basic assumptions of our 2-D style are summarized in Table 1 [7]; they are typical of those used in the layout of standard-cells and datapath bit-cells in current microprocessor designs.
In practical cell designs, the size of each transistor is determined individually to meet the circuit’s performance goals, such as rise and fall delays [15]. The size of a transistor, illustrated in the cross-sectional view of Fig. 2a, is defined as the width of its P or N channel. Since transistors are placed horizontally, their sizes affect both cell width and height. Hence, non-uniformity in transistor size often leads to wasted cell area. Transistor folding is the process of splitting a transistor into two or more smaller, equal-sized transistors called legs. The legs of a folded transistor are connected in parallel and are typically placed contiguously with diffusion sharing in the cell layout.
Figure 2 illustrates folding into an odd or even number of legs. The nets at the ends of the folded transistor depend on the parity of the number of legs. When a transistor with source and drain nets (S, D) has an odd number of legs, the nets at its ends remain S and D, one at each end, irrespective of its orientation (Fig. 2b). However, with an even number of legs (Fig. 2c), the end nets depend on the transistor’s orientation: (S, S) if the transistor is placed unflipped, and (D, D) if flipped. The orientation affects diffusion sharing with adjacent transistors; hence, both orientations of a folded transistor should be considered for possible abutment.
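The parity rule can be captured in a few lines. The following minimal Python sketch assumes that the legs of a folded transistor alternate source and drain diffusions, and models the unflipped/flipped orientation as whether the leftmost diffusion carries the source or the drain. The function name `folded_end_nets` is hypothetical.

```python
def folded_end_nets(source: str, drain: str, legs: int, flipped: bool) -> tuple:
    """Return the (left, right) end nets of a transistor folded into `legs` legs.

    Assumption: legs alternate source/drain diffusions, so a folded transistor
    has legs + 1 diffusion regions, and "flipped" means the leftmost diffusion
    carries the drain instead of the source.
    """
    if legs < 1:
        raise ValueError("a transistor has at least one leg")
    first, other = (drain, source) if flipped else (source, drain)
    # Diffusion k (0..legs) carries `first` when k is even and `other` when odd,
    # so the rightmost diffusion (k = legs) depends only on the parity of legs.
    right = first if legs % 2 == 0 else other
    return first, right


print(folded_end_nets("S", "D", 3, False))  # odd:  ('S', 'D')
print(folded_end_nets("S", "D", 3, True))   # odd:  ('D', 'S')
print(folded_end_nets("S", "D", 2, False))  # even: ('S', 'S')
print(folded_end_nets("S", "D", 2, True))   # even: ('D', 'D')
```

An odd-legged transistor always exposes both nets, so flipping only swaps sides; an even-legged one exposes a single net on both sides, which is why its orientation matters for abutment.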
Figure 3 demonstrates the impact of folding on diffusion sharing and cell width. A minimum-width 1-D placement without folding for the NMOS circuit of Fig. 3a is shown in Fig. 3b; it corresponds to the chain cover [6] {abcde, f}. If transistor c is folded into two legs, a diffusion gap is introduced in the chain abcde between c and d (Fig. 3c). However, if a different cover {abfde, c} is chosen, folding c does not introduce any new gaps. On the other hand, folding f into two legs transforms the placement of Fig. 3b into that of Fig. 3d, from which a placement without gaps can be derived (Fig. 3e). Hence, folding can often be exploited to increase diffusion sharing and eliminate diffusion gaps.
Table 1: Basic assumptions of the 2-D cell layout style
1. The circuit to be laid out is a static dual CMOS circuit of fixed structure (no transistor reordering is allowed).
2. Alternate P/N rows are flipped to allow the power rails to be shared among adjacent diffusion rows.
3. P and N transistors of a pair are vertically aligned so that their terminals on common nets can be connected using vertical wires.
4. Intra-cell routing is restricted to polysilicon and Metal1, each of which can be used in the vertical or horizontal direction.
5. When a net spans multiple P/N rows, connections are made to join terminals that are on transistors placed closest together.
6. Terminals that are on transistors placed in adjacent diffusion rows are connected using routes in the channel between the rows. Hence, such connections do not affect the width of the cell.
7. Diffusion areas do not allow Metal1 wires to be routed over them. All routes that span different rows are routed along the sides of the cell and contribute to the widths of the rows they pass through.

Fig. 2: (a) A transistor and its folding into (b) an odd (three) and (c) an even (two) number of legs

Fig. 3: (a) Schematic (NMOS) circuit; (b) layout without folding; (c) layout with c folded into 2 legs; (d) layout with f folded into 2 legs; (e) layout derived from (d) with no gaps

Finally, if the transistors of a P/N pair have different numbers of legs, the diffusion sharing possibilities increase further. Consider the P/N pair in Fig. 4, whose transistors are folded into 3 and 5 legs, respectively. Since the P transistor has fewer legs, it can be placed in three possible ways relative to the N transistor: (a) left-justified, (b) centered, and (c) right-justified. Each placement affects the nets at the ends of the P/N pair. Moreover, since no net appears on one side of the P diffusion, there is no net to be matched (shared) with the pair placed on that side. The centered position is the most flexible since it relaxes diffusion sharing on both sides of the P diffusion. Thus, when the transistors of a P/N pair have unequal numbers of legs, their relative placement can increase the possibility of diffusion sharing with other pairs.
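This extra freedom can be quantified with a small sketch. Assuming the P transistor has no more legs than the N transistor and that the legs of both sit on a common column pitch, the hypothetical helper below reports, for each relative position, how many leg columns to the left and right of the P diffusion are empty. A nonzero count means that side of the P diffusion carries no net and needs no matching. The sketch also assumes exact centering, which requires an even leg difference.

```python
def p_diffusion_slack(p_legs: int, n_legs: int) -> dict:
    """Empty leg columns (left, right) of the P diffusion for each relative position.

    Assumptions (illustrative, not from the paper): p_legs <= n_legs, legs of both
    transistors sit on the same column pitch, and "centered" requires an even
    difference so that both sides get the same number of empty columns.
    """
    if not 1 <= p_legs <= n_legs:
        raise ValueError("sketch assumes 1 <= p_legs <= n_legs")
    slack = n_legs - p_legs
    if slack == 0:
        return {"centered": (0, 0)}  # equal leg counts: the transistors are aligned
    positions = {"left-justified": (0, slack), "right-justified": (slack, 0)}
    if slack % 2 == 0:
        positions["centered"] = (slack // 2, slack // 2)
    return positions


# The 3-leg P / 5-leg N pair of Fig. 4: every placement frees at least one side,
# and only the centered one frees both.
print(p_diffusion_slack(3, 5))
```

The same reasoning applies, with the roles swapped, when the N transistor has fewer legs.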
The above effects of folding on diffusion sharing and cell width are further magnified when we consider pair-wise diffusion sharing and allow an arbitrary number of legs for each transistor. In addition, folding affects the cell height, since it reduces the height of each leg of the folded transistor and affects the routing within the cell. For 1-D layout, folding has been shown to significantly reduce cell area and to improve control of the cell’s aspect ratio [6]. In summary, folding affects both the cell width and height for the following reasons: (a) it decreases the transistor’s height, which affects the cell height; (b) it increases the number of transistors, which affects the cell width and intra-cell routing; (c) the orientation of an even-legged transistor affects its diffusion sharing with other transistors; and (d) when the P/N transistors of a pair have unequal numbers of legs, their relative placement (centered, left-justified, or right-justified) also affects diffusion sharing.
Thus, folding should be considered during the process of transistor placement, and not afterwards, as is usually the case [9, 11, 14]. Four different folding problems can be defined:
1. Static placement and folding: Given a pre-specified 2-D transistor placement and limits on transistor size (folding limits), fold transistors in place and determine their orientation to preserve the placement and minimize area.
2. Static placement with dynamic folding: Given a 2-D transistor placement, determine the number of legs and orientation for each transistor so that cell area is minimized.
3. Dynamic placement with static folding: Given folding limits, fold the transistors and determine their 2-D placement and orientation to minimize the cell area.
4. Dynamic placement and folding: For each transistor, determine the number of its legs, their 2-D placement, and their orientation, so that the overall area is minimized.
Problem 1 is the simplest since its only goal is to find an
orientation for each transistor that maximizes diffusion
sharing and minimizes cell height. Figure 5 illustrates how
flipping transistors and changing the relative positions of the
transistors of P/N pairs can affect diffusion sharing. The
layout of Fig. 5b is obtained by folding the transistors of
Fig. 5a. Both layouts have one diffusion gap. However, by
flipping P1, making N2 right-justified with respect to P2, and
flipping N2, we obtain a layout with no gaps (Fig. 5c).
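For a single 1-D row, Problem 1 can be illustrated by exhaustive search. The sketch below assumes a fixed left-to-right order of already-folded transistors, each given as (source net, drain net, legs). It reuses the parity rule of Fig. 2 (with "flipped" meaning the leftmost diffusion carries the drain) and counts a gap wherever the facing end nets of neighbors differ. It ignores P/N pairing and relative positions and enumerates all 2^n orientation choices, so it suits only tiny examples; it is an illustration, not the paper’s method.

```python
from itertools import product


def end_nets(src, drn, legs, flipped):
    """(left, right) end nets of a folded transistor; see the parity rule of Fig. 2."""
    first, other = (drn, src) if flipped else (src, drn)
    return first, (first if legs % 2 == 0 else other)


def min_gap_orientations(row):
    """row: list of (source, drain, legs) in a fixed left-to-right order.

    Returns (fewest diffusion gaps, orientation tuple achieving it), trying every
    combination of flipped/unflipped orientations.
    """
    best = None
    for flips in product((False, True), repeat=len(row)):
        ends = [end_nets(s, d, k, f) for (s, d, k), f in zip(row, flips)]
        # A gap is needed whenever the facing diffusions carry different nets.
        gaps = sum(left[1] != right[0] for left, right in zip(ends, ends[1:]))
        if best is None or gaps < best[0]:
            best = (gaps, flips)
    return best


# A two-leg middle transistor has net "a" on both ends, so flipping its
# neighbors lets all three transistors abut.
row = [("a", "x", 1), ("a", "b", 2), ("y", "a", 1)]
print(min_gap_orientations(row))  # (0, (True, False, True))
```

With every transistor unflipped this row needs two gaps; flipping the outer transistors removes both.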
Problem 2 also pre-specifies a transistor placement, but determines the number of legs for each transistor during the solution phase. The dynamic-programming technique of Her and Wong [8] solves this problem exactly for 1-D layouts and heuristically for 2-D layouts. Other heuristic techniques for this problem are included in the GENAC [14], LiB [10], and THEDA.P [11] 1-D cell synthesis tools. However, since problem 2 fixes the transistor ordering before folding, it can restrict the number of folds (to an odd number of legs, to prevent chain splitting) and produce suboptimal abutments. Figure 6a shows a layout with two chains placed with one diffusion gap. When N2 has an odd number of legs, no new gap is introduced (Fig. 6b). However, folding N2 into an even number of legs (Fig. 6c) requires a new gap between N2 and N3. In this case, if transistor ordering is not fixed, the positions of N3 and N4 can be swapped, allowing N2 to share diffusion with N3. This yields the layout of Fig. 6d, which still requires only one gap.
The third problem, dynamic placement with static folding, is a generalization of the first. It fixes the number of legs for each transistor, and then determines their positions and orientations to minimize cell area. For 1-D layout, this problem has been addressed by Gupta et al. in the XPRESS [6] tool and by Malavasi and Pandini [12].
Finally, problem 4 is the most general in that it allows both
placement and folding to be dynamic. Hence, any technique
for this problem must simultaneously select the amount of
folding for each transistor and determine its best placement
and orientation.
Fig. 4: Placement of pairs with unequal legs and the P transistors (a) left-justified, (b) centered, (c) right-justified.

Fig. 5: Illustration of folding with a pre-specified placement: Layout (a) before folding, (b) after folding with transistors in …

Fig. 6: Effect of even and odd folding on diffusion sharing: (a) original layout with 2 chains; (b) layout after folding N2 into three legs; (c) layout after folding N2 into two legs; (d) layout after reordering N3 and N4

In this paper, we address the third problem: we assume static folding and aim at dynamically determining a 2-D placement that minimizes cell width. This problem is the most widely applicable, since practical designs typically specify folding limits for P and N transistors. Moreover, as discussed earlier, determining the 2-D placement after folding has several area advantages over static placement.
4.  WIDTH MINIMIZATION
If Wr is the width of the r-th P/N row, the 2-D cell-width minimization problem can be stated as follows [7]: Minimize the cell width Wcell by placing the P/N pairs in a given number of rows such that the maximum width among all rows is minimized. That is, minimize

$$W_{cell} = \max\{\, W_r : r = 1, 2, \ldots \,\} \qquad (1)$$
As discussed in [7], Wcell for a 2-D layout depends on several factors: diffusion sharing, inter-row connections that run vertically between P/N rows and so add to their width, and the diffusion type (P or N) placed at the bottom of the cell. The width Wr of row r can be expressed as follows, where tr, cr, and vr are the numbers of pairs, chains, and inter-row wires, respectively, in row r:

$$W_r = t_r + c_r - 1 + v_r \qquad (2)$$
Hence, the following layout characteristics need to be determined: the row, position, and orientation of each pair; the inter-pair diffusion sharing; and the diffusion type, P or N, to be placed at the cell bottom.
We now review the CLIP technique of [7] for minimizing 2-D cell width. CLIP uses integer linear programming (ILP) as its core optimization engine. The layout parameters are modeled as 0-1 variables; the width minimization objective is converted to a linear cost function; and linear constraints over the 0-1 variables are used to ensure a valid 2-D layout.
Inputs and outputs. Table 2 lists the input and derived parameters for CLIP. To represent the position of each pair in the 2-D plane, we introduce placeholders called slots in each row, in which pairs will be placed. Slots are numbered in increasing order from the left. Figure 7 illustrates row and slot numbering. We also define the following sets of integers: slots = {1, 2, ..., maxSlots}, rows = {1, 2, ..., numRows}, and orients = {1, 2, 3, 4}, representing the four possible orientations for each P/N pair. The Boolean array share represents diffusion sharing between two P/N pairs. The element share[pi, oi, pj, oj] is set to 1 if pairs pi and pj can be placed adjacent to each other in orientations oi and oj, respectively, with diffusion abutment.
The basic decision variables for each pair are represented by the 0-1 arrays X and Xor, where X[p, s, r] = 1 implies that pair p is placed in slot s of row r, and Xor[p, o] = 1 implies that p is placed in orientation o. To model diffusion sharing, we introduce the array nogap, where nogap[s, r] = 1 if adjacent slots s and s + 1 of row r have no gap between them. Then the width Wr of each P/N row can be expressed as follows:

$$
\begin{aligned}
W_r &= \#\text{pairs in row } r + \#\text{gaps in row } r + \#\text{vertical wires} \\
    &= \#\text{pairs in row } r + (\#\text{pairs in row } r - 1 - \#\text{abutments}) + v_r \\
    &= 2 \sum_{p \in \mathit{pairs}} \sum_{s \in \mathit{slots}} X[p, s, r] - \sum_{s} \mathit{nogap}[s, r] + v_r - 1
\end{aligned}
\qquad (3)
$$
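As a quick arithmetic check of Eq. (3), the following sketch computes a row’s width from its left-justified slot contents and compares it with Eq. (2). The predicate `can_abut` stands in for the share array evaluated at the chosen orientations; it and the helper names are hypothetical.

```python
def width_eq3(row_pairs, can_abut, v_r=0):
    """Eq. (3): 2 * (#pairs) - (#abutting adjacent slots) + v_r - 1.

    row_pairs: pairs in slot order 1..t (slot occupancy keeps them contiguous).
    can_abut(p, q): True if p shares diffusion with q placed immediately right.
    """
    if not row_pairs:
        raise ValueError("constraint (6) requires at least one pair per row")
    t = len(row_pairs)
    nogap = sum(1 for p, q in zip(row_pairs, row_pairs[1:]) if can_abut(p, q))
    return 2 * t - nogap + v_r - 1


def width_eq2(t_r, c_r, v_r=0):
    """Eq. (2): pairs + chains - 1 + inter-row wires."""
    return t_r + c_r - 1 + v_r


# Four pairs forming two chains, (p1 p2) and (p3 p4), plus one vertical wire.
abut = {("p1", "p2"), ("p3", "p4")}
row = ["p1", "p2", "p3", "p4"]
print(width_eq3(row, lambda p, q: (p, q) in abut, v_r=1), width_eq2(4, 2, v_r=1))  # 6 6
```

The two forms agree because every abutment merges two chains, so the chain count is the number of pairs minus the number of abutments.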
Constraints. We now describe the constraints of CLIP that enforce a valid 2-D layout.
1. Pair inclusion: Each pair must be placed in exactly one slot with one orientation.

$$\sum_{s \in \mathit{slots}} \sum_{r \in \mathit{rows}} X[p, s, r] = 1 \quad \forall\, p \in \mathit{pairs} \qquad (4)$$

$$\sum_{o \in \mathit{orients}} \mathit{Xor}[p, o] = 1 \quad \forall\, p \in \mathit{pairs} \qquad (5)$$

2. Slot occupancy: We force the first slot in each P/N row to be filled with exactly one pair, and slots to be filled in left-justified order; that is, in each row r, slot s must be occupied before slot s + 1.

$$\sum_{p \in \mathit{pairs}} X[p, 1, r] = 1 \quad \forall\, r \in \mathit{rows} \qquad (6)$$

$$\sum_{p \in \mathit{pairs}} X[p, s - 1, r] \ge \sum_{p \in \mathit{pairs}} X[p, s, r] \quad \forall\, r \in \mathit{rows},\ s \in \mathit{slots},\ s \ge 2 \qquad (7)$$
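To make constraints (4)-(7) concrete, here is a hypothetical checker that tests a candidate 0-1 assignment against them. It represents the variables set to 1 as Python collections, an assumption for illustration only; an ILP solver handles the variables directly. Note that (6) and (7) together also imply that no slot holds more than one pair.

```python
from collections import Counter

ORIENTS = {1, 2, 3, 4}


def satisfies_4_to_7(X, Xor, pairs, rows, max_slots):
    """X: set of (pair, slot, row) with X[p, s, r] = 1.
    Xor: dict pair -> list of orientations o with Xor[p, o] = 1."""
    if any(not (1 <= s <= max_slots and r in rows) for _, s, r in X):
        return False
    placed = Counter(p for p, _, _ in X)
    if any(placed[p] != 1 for p in pairs):                      # (4)
        return False
    if any(len(Xor.get(p, [])) != 1 or Xor[p][0] not in ORIENTS
           for p in pairs):                                      # (5)
        return False
    occupancy = Counter((s, r) for _, s, r in X)
    if any(occupancy[(1, r)] != 1 for r in rows):               # (6)
        return False
    return all(occupancy[(s - 1, r)] >= occupancy[(s, r)]       # (7)
               for r in rows for s in range(2, max_slots + 1))


pairs, rows = ["a", "b", "c"], [1, 2]
orient = {"a": [1], "b": [3], "c": [2]}
good = {("a", 1, 1), ("b", 2, 1), ("c", 1, 2)}
bad = {("a", 1, 1), ("b", 1, 1), ("c", 2, 2)}  # two pairs in one slot, empty slot 1
print(satisfies_4_to_7(good, orient, pairs, rows, max_slots=2))  # True
print(satisfies_4_to_7(bad, orient, pairs, rows, max_slots=2))   # False
```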
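3. Diffusion sharing: The variable nogap[s, r] can be defined by the following logic equation: it is 1 if, for some pairs pi and pj and orientations oi and oj with share[pi, oi, pj, oj] = 1, pair pi is placed in slot s of row r in orientation oi and pair pj is placed in slot s + 1 of row r in orientation oj.

$$
\begin{aligned}
\mathit{nogap}[s, r] &= \bigvee_{\substack{p_i, p_j \in \mathit{pairs},\ o_i, o_j \in \mathit{orients}\\ \mathit{share}[p_i, o_i, p_j, o_j] = 1}} \big( X[p_i, s, r] \wedge \mathit{Xor}[p_i, o_i] \wedge X[p_j, s+1, r] \wedge \mathit{Xor}[p_j, o_j] \big) \\
&= \bigvee_{p_i \in \mathit{pairs}} \Big( X[p_i, s, r] \wedge \bigvee_{p_j \in \mathit{pairs}} \big( X[p_j, s+1, r] \wedge \mathit{merged}[p_i, p_j] \big) \Big)
\end{aligned}
\qquad (8)
$$

Here merged[pi, pj] is 1 if pair pi can share diffusion with pair pj placed to its immediate right.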
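$$
\begin{aligned}
\mathit{merged}[p_i, p_j] &= \bigvee_{\substack{o_i, o_j \in \mathit{orients}\\ \mathit{share}[p_i, o_i, p_j, o_j] = 1}} \big( \mathit{Xor}[p_i, o_i] \wedge \mathit{Xor}[p_j, o_j] \big) \\
&= \Big( \mathit{Xor}[p_i, 1] \wedge \bigvee_{o_j:\, \mathit{share}[p_i, 1, p_j, o_j] = 1} \mathit{Xor}[p_j, o_j] \Big)
 \vee \Big( \mathit{Xor}[p_i, 2] \wedge \bigvee_{o_j:\, \mathit{share}[p_i, 2, p_j, o_j] = 1} \mathit{Xor}[p_j, o_j] \Big) \\
&\quad \vee \Big( \mathit{Xor}[p_i, 3] \wedge \bigvee_{o_j:\, \mathit{share}[p_i, 3, p_j, o_j] = 1} \mathit{Xor}[p_j, o_j] \Big)
 \vee \Big( \mathit{Xor}[p_i, 4] \wedge \bigvee_{o_j:\, \mathit{share}[p_i, 4, p_j, o_j] = 1} \mathit{Xor}[p_j, o_j] \Big)
\end{aligned}
\qquad (9)
$$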
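Finally, we ensure that for any given P/N pair, at most one other pair can be placed on its immediate left or right side with diffusion abutment.

$$\sum_{p_j \in \mathit{pairs}} \mathit{merged}[p_i, p_j] \le 1 \quad \forall\, p_i \in \mathit{pairs} \qquad (10)$$

$$\sum_{p_i \in \mathit{pairs}} \mathit{merged}[p_i, p_j] \le 1 \quad \forall\, p_j \in \mathit{pairs} \qquad (11)$$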
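For a fixed placement, the logic of (8)-(11) collapses to table lookups, because (5) leaves each pair with a single orientation. The sketch below evaluates nogap for one row and checks (10)-(11) on a set of merged pairs. The data layout (a slot-ordered list and a set of allowed share tuples) is assumed for illustration.

```python
from collections import Counter


def row_nogap(row_pairs, orient, share):
    """nogap for slots s = 1..t-1 of one row, per Eq. (8)-(9).

    row_pairs: pairs in slot order; orient: pair -> its orientation;
    share: set of (pi, oi, pj, oj) tuples with share[pi, oi, pj, oj] = 1.
    """
    return [int((pi, orient[pi], pj, orient[pj]) in share)
            for pi, pj in zip(row_pairs, row_pairs[1:])]


def satisfies_10_11(merged):
    """merged: set of (pi, pj) with merged[pi, pj] = 1."""
    right = Counter(pi for pi, _ in merged)  # (10): one right neighbor at most
    left = Counter(pj for _, pj in merged)   # (11): one left neighbor at most
    return all(n <= 1 for n in right.values()) and all(n <= 1 for n in left.values())


share = {("a", 1, "b", 2), ("b", 2, "c", 1)}
print(row_nogap(["a", "b", "c"], {"a": 1, "b": 2, "c": 1}, share))  # [1, 1]
print(row_nogap(["a", "b", "c"], {"a": 1, "b": 2, "c": 3}, share))  # [1, 0]
print(satisfies_10_11({("a", "b"), ("a", "c")}))                      # False
```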




Table 2: Input (1–4) and derived (5–7) parameters for CLIP
1. …, numRows, maxSlots: The number of pairs, P/N diffusion rows, and slots, respectively
2. pairs, rows, slots, nets: The set of pairs, rows, slots, and nets
3. PpairNets, NpairNets: PpairNets[p] = {gate, source, and drain nets of the P transistor of pair p} (NpairNets is similarly defined)
4. nDiffAtBottomOfCell: A decision variable that is 1 (0) if the N …
5. Psrc, Pgate, Pdrn: Psrc[p, n] = 1 if pair p has net n on the source diffusion of its P transistor (Pgate[p, n] and Pdrn[p, n] are similarly defined for gate / drain terminals)
6. Nsrc, Ngate, Ndrn: Nsrc[p, n] = 1 if pair p has net n on the source diffusion of its N transistor (Ngate[p, n] and Ndrn[p, n] are similarly defined for gate / drain terminals)
7. share[pairs, orients, pairs, orients]: share[pi, oi, pj, oj] = 1 if pair pi in orientation oi can share diffusion with pair pj in orientation oj
4. Inter-row connectivity: These constraints determine the nets that must be routed from one P/N row to another. We introduce the 0-1 variables V, where V[r, n] = 1 if net n is routed vertically along row r. To represent V, we introduce four simpler, auxiliary 0-1 variables A[r, n], Y[r, n], T[r, n], and B[r, n], which take the value 1 if net n appears above row r, below row r, in the top diffusion of row r, and in the bottom diffusion of row r, respectively. By enumerating all sixteen assignments of 0-1 values to the four variables A, Y, T, and B, the following reduced sum-of-products expression is obtained for V (the indices [r, n] are omitted):

$$V = (A \wedge Y \wedge \neg T) \vee (A \wedge B \wedge \neg T) \vee (Y \wedge \neg B \wedge T) \qquad (12)$$

Constraints (8), (9), and (12) are nonlinear since they involve the logical AND and OR operators, so they must be converted to linear inequalities to be included in an ILP model. As shown in [7], these constraints can be linearized without introducing any new variables.
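One standard way to linearize a sum-of-products such as Eq. (12), assuming V only adds to the width being minimized so that lower bounds suffice, is one inequality per product term: V ≥ (sum of the term's literals) − (number of literals − 1), where a complemented literal such as ¬T is written as 1 − T. This need not be the exact formulation of [7]. The sketch below checks it exhaustively over all sixteen (A, Y, T, B) assignments.

```python
from itertools import product


def v_logic(a, y, t, b):
    """Eq. (12) evaluated directly on 0-1 inputs."""
    return int((a and y and not t) or (a and b and not t) or (y and not b and t))


def v_lower_bound(a, y, t, b):
    """Smallest 0-1 value of V satisfying one linear inequality per product term."""
    return max(0,
               a + y + (1 - t) - 2,   # V >= A + Y + (1 - T) - 2
               a + b + (1 - t) - 2,   # V >= A + B + (1 - T) - 2
               y + (1 - b) + t - 2)   # V >= Y + (1 - B) + T - 2


mismatches = [bits for bits in product((0, 1), repeat=4)
              if v_logic(*bits) != v_lower_bound(*bits)]
print(mismatches)  # [] : the inequalities reproduce Eq. (12) exactly
```

Each inequality forces V to 1 only when all literals of its term are 1, so the smallest feasible V is exactly the OR of the terms.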
5.  WIDTH MINIMIZATION WITH FOLDING
We now describe how the above width minimization model can be extended to incorporate transistor folding in FCLIP.
Inputs and outputs. Given folding limits on maximum transistor size, we first compute the number of legs for each transistor. Let legs be an integer array, where legs[p] contains the larger of the numbers of legs of the P and N transistors of P/N pair p.
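A minimal sketch of this preprocessing step, assuming that a folding limit is the maximum allowed leg size and that sizes are integers in a common unit; the helper names are hypothetical. A transistor of size w with limit m then needs the smallest number of equal legs, ceil(w / m), that keeps each leg within the limit.

```python
def leg_count(size: int, limit: int) -> int:
    """Fewest equal legs so that each leg's size (size / legs) is <= limit."""
    if size <= 0 or limit <= 0:
        raise ValueError("size and limit must be positive")
    return -(-size // limit)  # integer ceiling division


def pair_legs(p_size: int, n_size: int, p_limit: int, n_limit: int) -> int:
    """legs[p]: the larger leg count of the pair's P and N transistors."""
    return max(leg_count(p_size, p_limit), leg_count(n_size, n_limit))


# P/N folding limits 10/4: a size-24 P transistor needs 3 legs and a size-18
# N transistor needs 5, so legs[p] = 5 (the 3-versus-5 case of Fig. 4).
print(leg_count(24, 10), leg_count(18, 4), pair_legs(24, 18, 10, 4))  # 3 5 5
```

Integer ceiling division avoids floating-point rounding when a size is an exact multiple of the limit.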
As discussed in Section 3, transistor folding introduces a new degree of freedom to placement: the relative positions (centered, left-justified, or right-justified) of the transistors of each P/N pair. Hence, we change the array share of CLIP to a 6-dimensional 0-1 array, where share[pi, oi, xi, pj, oj, xj] is 1 if pairs pi and pj can share diffusions in orientations oi and oj, and in relative positions xi and xj, respectively. Here xi and xj take the values 1, 2, or 3 for centered, left-justified, and right-justified, respectively. The value of share is again determined from the circuit’s netlist. Fig. 8 illustrates the various diffusion sharing possibilities between two pairs P1 and P2, where the P transistor of P1 has 3 legs while its N transistor has 5 legs. Observe that in the left-justified and centered positions, pair P1 has no P-diffusion net on its right side. Therefore, since the P-diffusion net to the left of P2 can be arbitrary, the P transistors of both pairs can be placed either unflipped or flipped.
To model the three different relative positions of each P/N pair, we change the Xor variables of the CLIP model to Xor[pairs, orients, positions], where Xor[p, o, x] = 1 when pair p is placed in orientation o and in relative position x. If p’s transistors have the same leg count, we set Σ Xor[p, o, 2] = Σ Xor[p, o, 3] = 0 (over all o ∈ orients) for the left-justified (x = 2) and right-justified (x = 3) positions. If the leg counts differ by 1, then we set Σ Xor[p, o, 1] = 0 (over all o ∈ orients) for the centered position x = 1.
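The width Wr of each row r, defined by Eq. (3), is modified in FCLIP to account for the leg count legs[p] of each pair p:

$$
\begin{aligned}
W_r &= \#\text{legs in row } r + (\#\text{pairs in row } r - 1 - \#\text{abutments}) + v_r \\
    &= \sum_{p \in \mathit{pairs}} (\mathit{legs}[p] + 1)\, \mathit{Xrow}[p, r] - \sum_{s} \mathit{nogap}[s, r] + v_r - 1
\end{aligned}
\qquad (13)
$$

where Xrow[p, r] = Σ_s X[p, s, r] is 1 if pair p is placed in row r. With legs[p] = 1 for every pair, (13) reduces to (3).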
Constraints. First, we introduce a new constraint to ensure that each P/N pair p is placed in exactly one orientation and relative position:

$$\sum_{o \in \mathit{orients}} \sum_{x \in \{1, 2, 3\}} \mathit{Xor}[p, o, x] = 1 \quad \forall\, p \in \mathit{pairs} \qquad (14)$$
The pair inclusion, slot occupancy, and inter-row connectivity constraints of CLIP remain the same in FCLIP. The diffusion sharing constraint (8) is modified to handle the relative positions of P/N pairs. The variable nogap[s, r], which must be 1 if there is no gap between slots s and s + 1 in row r, is now defined as
$$
\begin{aligned}
\mathit{nogap}[s, r] &= \bigvee_{\substack{p_i, p_j \in \mathit{pairs},\ o_i, o_j \in \mathit{orients},\ x_i, x_j \in \{1,2,3\}\\ \mathit{share}[p_i, o_i, x_i, p_j, o_j, x_j] = 1}} \big( X[p_i, s, r] \wedge \mathit{Xor}[p_i, o_i, x_i] \wedge X[p_j, s+1, r] \wedge \mathit{Xor}[p_j, o_j, x_j] \big) \\
&= \bigvee_{p_i \in \mathit{pairs}} \Big( X[p_i, s, r] \wedge \bigvee_{p_j \in \mathit{pairs}} \big( X[p_j, s+1, r] \wedge \mathit{merged}[p_i, p_j] \big) \Big)
\end{aligned}
\qquad (15)
$$
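Here, as in CLIP, merged[pi, pj] = 1 if pairs pi and pj are in orientations and relative positions that allow pi to share diffusion with pair pj to its immediate right. Hence,

$$
\begin{aligned}
\mathit{merged}[p_i, p_j] &= \bigvee_{\substack{o_i, o_j \in \mathit{orients},\ x_i, x_j \in \{1,2,3\}\\ \mathit{share}[p_i, o_i, x_i, p_j, o_j, x_j] = 1}} \big( \mathit{Xor}[p_i, o_i, x_i] \wedge \mathit{Xor}[p_j, o_j, x_j] \big) \\
&= \Big( \mathit{Xor}[p_i, 1, 1] \wedge \bigvee_{(o_j, x_j):\, \mathit{share}[p_i, 1, 1, p_j, o_j, x_j] = 1} \mathit{Xor}[p_j, o_j, x_j] \Big) \\
&\quad \vee \Big( \mathit{Xor}[p_i, 1, 2] \wedge \bigvee_{(o_j, x_j):\, \mathit{share}[p_i, 1, 2, p_j, o_j, x_j] = 1} \mathit{Xor}[p_j, o_j, x_j] \Big) \\
&\quad \vee \Big( \mathit{Xor}[p_i, 1, 3] \wedge \bigvee_{(o_j, x_j):\, \mathit{share}[p_i, 1, 3, p_j, o_j, x_j] = 1} \mathit{Xor}[p_j, o_j, x_j] \Big) \\
&\quad \vee \Big( \mathit{Xor}[p_i, 2, 1] \wedge \bigvee_{(o_j, x_j):\, \mathit{share}[p_i, 2, 1, p_j, o_j, x_j] = 1} \mathit{Xor}[p_j, o_j, x_j] \Big) \\
&\quad \vee \cdots \\
&\quad \vee \Big( \mathit{Xor}[p_i, 4, 3] \wedge \bigvee_{(o_j, x_j):\, \mathit{share}[p_i, 4, 3, p_j, o_j, x_j] = 1} \mathit{Xor}[p_j, o_j, x_j] \Big)
\end{aligned}
\qquad (16)
$$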
Fig. 7: Rows, slots, and diffusion gaps in a 2-D cell layout (nogap[s, r] marks the boundary between slots s and s + 1 of row r)

Equation (16) is a sum-of-products expression with product terms of the form Xor[pi, oi, xi] ∧ (Xor[pj, 1, 1] ∨ Xor[pj, 1, 2] ∨ Xor[pj, 1, 3] ∨ Xor[pj, 2, 1] ∨ ...), where the inner disjunction runs over the orientations and relative positions of pj allowed by share, for every pair pi and pj of P/N pairs and for every orientation oi and relative position xi of pi. Now, merged[pi, pj] appears as a positive term in expression (15) for nogap[s, r] and hence as a negative term in the cost function (13) being minimized. It is therefore sufficient to have constraints that set merged[pi, pj] to 0 when pi and pj cannot share their adjacent diffusions; otherwise the minimization will set merged[pi, pj] to 1 where this is permitted and reduces the cost when the model is solved. Since, for any pair pj, exactly one of the variables Xor[pj, oj, xj] is 1 in a given solution (by (14)), each product term of (16) is equivalent to the linear inequality below:

$$
2\, \mathit{merged}[p_i, p_j] \le \mathit{Xor}[p_i, o_i, x_i] + \sum_{o_j \in \mathit{orients}} \sum_{x_j \in \{1,2,3\}} \mathit{share}[p_i, o_i, x_i, p_j, o_j, x_j]\, \mathit{Xor}[p_j, o_j, x_j] + 2 \sum_{\substack{o_k \in \mathit{orients},\ x_k \in \{1,2,3\}\\ (o_k, x_k) \neq (o_i, x_i)}} \mathit{Xor}[p_i, o_k, x_k] \qquad (17)
$$

In words, for every orientation oi and relative position xi of pi: if Xor[pi, oi, xi] = 1, then merged[pi, pj] can be 1 only if pair pj is placed in some orientation and relative position in which it can share diffusions with pi. However, if Xor[pi, oi, xi] = 0, then pi is in some other orientation or relative position, the last term of the inequality equals 2, and merged[pi, pj] is left unconstrained by this instance of (17).
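The claim that each product term of (16) is captured by inequality (17) can be checked exhaustively for one pair (pi, pj). The sketch below reads the last sum of (17) as ranging over every (orientation, position) combination of pi other than (oi, xi); that reading is what makes the bound inactive when pi uses a different combination. It draws an arbitrary share relation with a fixed seed (hypothetical data), places each pair in a single combination as (14) requires, and confirms that the largest merged[pi, pj] permitted by all instances of (17) equals the share entry for the chosen combinations.

```python
import random
from itertools import product

ORIENTS, POSITIONS = (1, 2, 3, 4), (1, 2, 3)
STATES = list(product(ORIENTS, POSITIONS))  # all (o, x) combinations


def max_merged(state_i, state_j, share):
    """Largest 0-1 merged[pi, pj] satisfying (17) for every (oi, xi) of pi.

    state_i, state_j: the single (o, x) with Xor = 1 for pi and pj.
    share: set of ((oi, xi), (oj, xj)) combinations allowing abutment.
    """
    xor_i = {s: int(s == state_i) for s in STATES}
    xor_j = {s: int(s == state_j) for s in STATES}

    def rhs(si):
        shared = sum(xor_j[sj] for sj in STATES if (si, sj) in share)
        others = sum(xor_i[sk] for sk in STATES if sk != si)
        return xor_i[si] + shared + 2 * others

    # merged = 1 is feasible only if 2 * 1 <= rhs for every instance of (17).
    return int(all(2 <= rhs(si) for si in STATES))


rng = random.Random(0)
share = {(si, sj) for si in STATES for sj in STATES if rng.random() < 0.3}
ok = all(max_merged(si, sj, share) == int((si, sj) in share)
         for si in STATES for sj in STATES)
print(ok)  # True
```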
Experimental results. We now present the results of experiments that apply FCLIP to five representative CMOS circuits taken from various sources. All our ILP models were specified in AMPL (A Mathematical Programming Language) [5], a high-level language that allows the models to be described in parameterized form, that is, independently of the input data used for a specific instance. We evaluated several general-purpose ILP solver programs, but found the specialized 0-1 solver OPBDP [1] to be best suited to our application. All FCLIP run times presented here were obtained with OPBDP.
Table 3 gives the minimum-width 2-D layouts obtained with FCLIP for the test circuits. The layouts in one, two, and three P/N rows were generated by enforcing different folding limits on the P and N transistors. OPBDP’s -h103 heuristic for selecting the branching variable was used for layout in a single row; for two or more rows, OPBDP’s h101 heuristic gave the shortest run times.
For circuits as large as the full adder with 28 transistors, FCLIP run times are in seconds in most cases. Compared to CLIP, the only additional variables that folding introduces in FCLIP are due to the relative positions (centered, left-justified, or right-justified) of the P and N transistors of pairs. These new variables depend on the difference in the numbers of legs of the P and N transistors of a pair. If this difference is either zero or more than one, then only the centered position must be considered, so no additional variables are required. However, for a difference in leg count of one, both the left- and right-justified positions must be considered, which requires the variables Xor[p, o, 2] and Xor[p, o, 3] and increases the total number of variables for that pair by four (one for every orientation o ∈ orients). As is evident from Table 3, the run times with transistor folding are comparable to those without folding in most cases. For a few of the largest circuits, the run times with folding are more than an order of magnitude greater than those without folding, mainly due to the additional Xor variables.
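The variable count described above can be tallied directly. This sketch follows the stated rule (only the centered position when the P and N leg counts differ by zero or by more than one; left- and right-justified when they differ by exactly one) and counts the Xor variables that are not fixed to 0, four orientations per allowed position. The input format is an assumption.

```python
CENTERED, LEFT, RIGHT = 1, 2, 3
NUM_ORIENTS = 4


def allowed_positions(p_legs: int, n_legs: int) -> tuple:
    """Relative positions x kept for a pair under the rule stated in the text."""
    return (LEFT, RIGHT) if abs(p_legs - n_legs) == 1 else (CENTERED,)


def free_xor_vars(leg_counts) -> int:
    """Number of Xor[p, o, x] variables not fixed to 0.

    leg_counts: iterable of (P legs, N legs), one entry per P/N pair.
    """
    return sum(NUM_ORIENTS * len(allowed_positions(p, n)) for p, n in leg_counts)


# Three pairs; only the (2, 3) pair differs by one leg, adding four variables
# over the 3 * 4 = 12 orientation variables of CLIP.
print(free_xor_vars([(3, 5), (2, 3), (1, 1)]))  # 16
```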
FCLIP considers each pair separately. However, in most practical designs, transistors that are connected in series in the circuit, called and-stacks, must be placed contiguously in the layout for area and performance enhancement. The next section extends FCLIP to incorporate and-stack clustering, and shows that this can dramatically reduce run times and extend FCLIP to larger circuits.
6.  AND-STACK CLUSTERING
An and-stack [7] of size n is a group of n ≥ 2 transistors connected in series. For example, the N transistors Ni, Ni+1, ..., Nj in Fig. 9a form an and-stack. Since the nets that connect two series-connected transistors, called internal nets, do not connect to any other terminal, they do not require diffusion-to-metal contacts, called straps, when these transistors are placed contiguously via diffusion sharing. The absence of straps allows the transistors to be placed closer together. Not only does this save cell area, but the resulting smaller diffusion area has better electrical properties as well. Hence, most practical designs place and-stacks contiguously.
Fig. 8: Relative positions of transistors of a P/N pair P1 and their orientations (1, 2, 3, or 4) that allow for diffusion sharing with another pair P2 placed to its immediate right

Table 3: Minimum width layouts and run times obtained by FCLIP for various P and N folding limits (P/N folding limits: None, 10/4, 8/3, 5/2; "…" marks circuit labels lost in extraction)
Circuit | P/N rows | Wcell: None, 10/4, 8/3, 5/2 | OPBDP time (secs)(a): None, 10/4, 8/3, 5/2
… | 1 | 10, 12, 21, 21 | 0.4, 10, 1, 5
… | 2 | 5, 6, 10, 11 | 1, 2, 543, 5
… | 3 | 4, 4, 8, 8 | 1, 1, 2, 3
… | 1 | 11, 17, 20, 26 | 0.1, 95, 0.3, 1
… | 2 | 8, 11, 12, 15 | 1, 14, 12, 9
… | 3 | 5, 7, 8, 10 | 7, 41, 68, 96
… | 1 | 14, 17, 24, 38 | 50, 7, 137, 2,261
… | 2 | 7, 9, 12, 19 | 77, 40, 92, 73
… | 3 | 5, 6, 8, 12 | 31, 13, 41, 36
… | 1 | 14, 17, 25, 42 | 21, 3, 1, 49
… | 2 | 7, 10, 13, 22 | 6, 35, 172, 13
… | 3 | 5, 7, 9, 15 | 27, 61, 137, 120
Full adder [9] (28, 17) | 1 | 16, 17, *, * | 12, 2, *, *
Full adder [9] (28, 17) | 2 | 8, 10, 17, 18 | 6, 20, 1,335, 2,122
Full adder [9] (28, 17) | 3 | 6, 7, 12, 12 | 90, 290, 1,857, 3,137
(a) A * indicates that OPBDP did not terminate after one hour.

We first extend the ILP model (without folding) of Section 4 to implement and-stacks of arbitrary size in the circuit. For each and-stack, we introduce constraints on the relative placement of its constituent P/N pairs and on the diffusion sharing between them. We then show how incorporating and-stacks into the transistor folding model FCLIP is just a special case of the model without folding.
Inputs and outputs. Let stacks be the set of and-stacks in the circuit, and let numStacks be the total number of stacks. Let stkSize[stk] specify the size of stack stk in terms of the number of its P/N pairs. Let stkPairs[stk, i], i = 1, ..., stkSize[stk], contain the list of pairs of stack stk, ordered by their connectivity in the stack. Since internal nets will be connected via diffusion sharing, they cannot contribute to cell width or height and hence are dropped from the model.
We introduce new variables to model the placement and orientation of each stack. Let XrowStk[stacks, rows] be binary variables where XrowStk[stk, r] = 1 if stack stk is placed in row r. Each stack comprising the ordered pairs ($p_i, p_{i+1}, \ldots, p_j$) can be placed in two orientations: unflipped ($p_i, p_{i+1}, \ldots, p_j$) or flipped ($p_j, p_{j-1}, \ldots, p_i$) (Fig. 9). We introduce the binary variables stkDir[stacks], where stkDir[stk] = 0 if stack stk is placed unflipped, and 1 otherwise.
Constraints. We must ensure that each and-stack is placed
in exactly one row, and that its pairs are placed in
contiguous slots with diffusion sharing depending on its
orientation—unflipped or flipped.
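1. Stack placement: Each stack must be placed in exactly one row of the 2-D layout.
$$\sum_{r \in \mathit{rows}} \mathit{XrowStk}[\mathit{stk}, r] = 1 \qquad \forall\, \mathit{stk} \in \mathit{stacks}$$
In addition, all pairs of a stack must be placed in the same row as the stack itself.
$$\mathit{stkSize}[\mathit{stk}] \times \mathit{XrowStk}[\mathit{stk}, r] = \sum_{i=1}^{\mathit{stkSize}[\mathit{stk}]} \mathit{Xrow}[\mathit{stkPairs}[\mathit{stk}, i], r] \qquad \forall\, \mathit{stk} \in \mathit{stacks},\ r \in \mathit{rows}$$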
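Because every Xrow term is 0 or 1, the second constraint allows only two outcomes for each row r: either all stkSize[stk] pairs of the stack lie in r (when XrowStk[stk, r] = 1) or none of them do. As a minimal sketch, assuming a candidate 0-1 solution is stored in Python dictionaries (a hypothetical encoding, not the authors' formulation), both stack placement constraints can be checked directly; the function name stack_placement_ok is illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of a candidate 0-1 solution (not the authors' model):
#   xrow_stk[(stk, r)] = XrowStk[stk, r]
#   xrow[(pair, r)]    = Xrow[pair, r]  (row variable of each P/N pair)
# Missing keys are read as 0.


def stack_placement_ok(
    stk_pairs: Dict[str, List[str]],
    rows: List[int],
    xrow_stk: Dict[Tuple[str, int], int],
    xrow: Dict[Tuple[str, int], int],
) -> bool:
    """Check both stack placement constraints for every stack."""
    for stk, pairs in stk_pairs.items():
        # Each stack lies in exactly one row.
        if sum(xrow_stk.get((stk, r), 0) for r in rows) != 1:
            return False
        # stkSize[stk] * XrowStk[stk, r] == number of the stack's pairs in row r.
        size = len(pairs)
        for r in rows:
            pairs_in_row = sum(xrow.get((p, r), 0) for p in pairs)
            if size * xrow_stk.get((stk, r), 0) != pairs_in_row:
                return False
    return True


stacks = {"s1": ["p1", "p2", "p3"]}
same_row = {("p1", 1): 1, ("p2", 1): 1, ("p3", 1): 1}
split = {("p1", 1): 1, ("p2", 1): 1, ("p3", 2): 1}
print(stack_placement_ok(stacks, [1, 2], {("s1", 1): 1}, same_row))  # True
print(stack_placement_ok(stacks, [1, 2], {("s1", 1): 1}, split))     # False
```

The split placement fails because row 1 holds only two of the stack's three pairs while XrowStk[s1, 1] = 1.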
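2. Stack pair placement: The adjacent pairs in stack stk must be placed in contiguous slots whose order depends on the value of stkDir[stk]. For example, if stkDir[stk] = 0 (Fig. 9b), then the difference in the slot values of pairs $p_{i+1}$ and $p_i$ is 1; if stkDir[stk] = 1 (Fig. 9c), then the difference is −1.
$$\sum_{s \in \mathit{slots},\, r \in \mathit{rows}} s \times X[\mathit{stkPairs}[\mathit{stk}, i+1], s, r] \;-\; \sum_{s \in \mathit{slots},\, r \in \mathit{rows}} s \times X[\mathit{stkPairs}[\mathit{stk}, i], s, r] = 1 - 2 \times \mathit{stkDir}[\mathit{stk}] \qquad \forall\, \mathit{stk} \in \mathit{stacks},\ i \in \{1, \ldots, \mathit{stkSize}[\mathit{stk}] - 1\}$$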
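The weighted sum Σ s × X[p, s, r] recovers the slot of pair p, which works because each pair occupies exactly one (slot, row) position. The sketch below, again assuming a hypothetical dictionary encoding of X with illustrative function names, computes that slot and checks that consecutive pairs differ by 1 − 2 × stkDir[stk], i.e. +1 when unflipped and −1 when flipped.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: x[(pair, s, r)] = X[pair, s, r], the base model's
# slot/row assignment variable; missing keys are read as 0.
XVars = Dict[Tuple[str, int, int], int]


def slot_of(x: XVars, pair: str) -> int:
    """Sum over s, r of s * X[pair, s, r]: the slot index of a placed pair."""
    return sum(s * v for (p, s, _r), v in x.items() if p == pair)


def stack_pairs_ok(pairs: List[str], stk_dir: int, x: XVars) -> bool:
    """slot(p_{i+1}) - slot(p_i) must equal 1 - 2 * stkDir for every i."""
    step = 1 - 2 * stk_dir  # +1 when unflipped, -1 when flipped
    return all(slot_of(x, pairs[i + 1]) - slot_of(x, pairs[i]) == step
               for i in range(len(pairs) - 1))


pairs = ["p1", "p2", "p3"]
unflipped = {("p1", 2, 1): 1, ("p2", 3, 1): 1, ("p3", 4, 1): 1}
flipped = {("p1", 4, 1): 1, ("p2", 3, 1): 1, ("p3", 2, 1): 1}
print(stack_pairs_ok(pairs, 0, unflipped), stack_pairs_ok(pairs, 1, flipped))  # True True
print(stack_pairs_ok(pairs, 1, unflipped))  # False
```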
3. Stack diffusion sharing: Adjacent pairs of each stack stk must share their diffusions. Again, the order of diffusion sharing depends on the value of stkDir[stk]. While $\mathit{merged}[p_i, p_{i+1}] = \mathit{merged}[p_{i+1}, p_{i+2}] = \cdots = \mathit{merged}[p_{j-1}, p_j] = 1$ in the unflipped orientation, the flipped orientation must have $\mathit{merged}[p_j, p_{j-1}] = \mathit{merged}[p_{j-1}, p_{j-2}] = \cdots = \mathit{merged}[p_{i+1}, p_i] = 1$. This is modeled by the following constraints, one for each orientation, which imply that the number of consecutive pairs merged in stk equals stkSize[stk] − 1.
$$\sum_{i=1}^{\mathit{stkSize}[\mathit{stk}]-1} \mathit{merged}[\mathit{stkPairs}[\mathit{stk}, i], \mathit{stkPairs}[\mathit{stk}, i+1]] = (\mathit{stkSize}[\mathit{stk}] - 1) \times (1 - \mathit{stkDir}[\mathit{stk}])$$
$$\sum_{i=2}^{\mathit{stkSize}[\mathit{stk}]} \mathit{merged}[\mathit{stkPairs}[\mathit{stk}, i], \mathit{stkPairs}[\mathit{stk}, i-1]] = (\mathit{stkSize}[\mathit{stk}] - 1) \times \mathit{stkDir}[\mathit{stk}]$$
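Each of these sums has stkSize[stk] − 1 binary terms, so a right-hand side of stkSize[stk] − 1 forces every term to 1, while a right-hand side of 0 forces every term to 0. The following sketch checks both constraints for one stack, assuming a hypothetical encoding in which merged[(left, right)] = 1 means pair left sits immediately to the left of pair right and shares its diffusion; the names are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: merged[(left, right)] = 1 when pair `left` sits
# immediately left of pair `right` and shares its diffusion; missing keys are 0.
Merged = Dict[Tuple[str, str], int]


def stack_sharing_ok(pairs: List[str], stk_dir: int, merged: Merged) -> bool:
    """Check the two stack diffusion sharing constraints for one stack."""
    n = len(pairs)  # stkSize[stk]
    # Unflipped direction: merged[p_i, p_{i+1}] summed over i = 1..n-1.
    forward = sum(merged.get((pairs[i], pairs[i + 1]), 0) for i in range(n - 1))
    # Flipped direction: merged[p_i, p_{i-1}] summed over i = 2..n.
    backward = sum(merged.get((pairs[i], pairs[i - 1]), 0) for i in range(1, n))
    return forward == (n - 1) * (1 - stk_dir) and backward == (n - 1) * stk_dir


pairs = ["p1", "p2", "p3"]
print(stack_sharing_ok(pairs, 0, {("p1", "p2"): 1, ("p2", "p3"): 1}))  # True
print(stack_sharing_ok(pairs, 1, {("p3", "p2"): 1, ("p2", "p1"): 1}))  # True
print(stack_sharing_ok(pairs, 0, {("p1", "p2"): 1}))                   # False
```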
Constraints (10) and (11) on $\mathit{merged}[p_i, p_j]$, which permit a given pair to be merged with at most one pair on each of its left and right sides, implicitly ensure that all pairs of an and-stack are merged in the same direction.

In order to incorporate folding into the foregoing model, we note that when transistors are folded, it is not always possible to lay out and-stacks contiguously with diffusion sharing. Figure 10a shows an and-stack with three transistors and its layout. Now, let each transistor be folded into two legs. The resulting circuit and one possible layout are shown in Fig. 10b. Any layout for the folded and-stack requires at least one diffusion gap, since net 2 appears on the right side of transistor b and cannot be merged with either the source or drain net of transistor c. Therefore, for transistor folding, the stack placement and stack pair placement constraints remain the same, while the stack diffusion sharing constraints are eliminated.
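The claim that the folded stack of Fig. 10b needs at least one diffusion gap can be confirmed by exhaustive search. This sketch assumes the stack runs net 1, a, net 2, b, net 3, c, net 4 (labels chosen to match the text, since the figure itself is not reproduced here), keeps the non-interlaced leg order a-a-b-b-c-c fixed, and tries every source/drain orientation of the six legs. It is an illustrative brute-force check, not part of FCLIP.

```python
from itertools import product
from typing import Dict, List, Tuple

# Assumed net labels (hypothetical, chosen to match the text): the and-stack is
# net 1 -a- net 2 -b- net 3 -c- net 4.
TERMINALS: Dict[str, Tuple[int, int]] = {"a": (1, 2), "b": (2, 3), "c": (3, 4)}


def count_gaps(legs: List[Tuple[int, int]]) -> int:
    """Number of neighbouring legs whose facing diffusions carry different nets."""
    return sum(1 for left, right in zip(legs, legs[1:]) if left[1] != right[0])


def min_gaps(leg_order: List[str]) -> int:
    """Fewest diffusion gaps over every source/drain orientation of the legs."""
    best = None
    for flips in product((False, True), repeat=len(leg_order)):
        # A leg exposes (left net, right net); flipping it swaps the two.
        legs = [TERMINALS[t][::-1] if f else TERMINALS[t]
                for t, f in zip(leg_order, flips)]
        gaps = count_gaps(legs)
        best = gaps if best is None else min(best, gaps)
    return best


print(min_gaps(["a", "a", "b", "b", "c", "c"]))  # prints 1
```

Two abutting legs of b expose the same net on both outer ends, so the net b offers to c is the net it offers to a; only net 2 is common to a and b, and c has no terminal on net 2.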
In some designs, however, when transistors of and-stacks are folded, the desired placement is as shown in Fig. 10c. Here, the legs of the folded transistors are interlaced, that is, the order of transistor legs is a-b-c-c-b-a (instead of a-a-b-b-c-c). This placement is possible because the circuit of Fig. 10b is electrically equivalent to that of Fig. 10c, in which the internal nets 2 and 3 do not require explicit connections.
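The interlaced order a-b-c-c-b-a generalizes naturally: sweep the stack forward, then backward, once per pair of legs. The hypothetical helper below builds such a sequence for a series stack whose transistors each have an even number of legs (the only case this sketch handles) and counts the diffusion gaps, again using assumed net labels 1 to 4 for the stack of Fig. 10.

```python
from typing import List, Tuple


def interlaced_legs(nets: List[int], legs_per_transistor: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: interlaced leg sequence for a series and-stack whose
    k-th transistor connects nets[k] and nets[k + 1]. Each pass sweeps the stack
    forward (a-b-c) and then backward (c-b-a), giving every transistor two legs."""
    if legs_per_transistor % 2 != 0:
        raise ValueError("this sketch only handles an even number of legs")
    forward = [(nets[k], nets[k + 1]) for k in range(len(nets) - 1)]
    backward = [(right, left) for (left, right) in reversed(forward)]
    return (forward + backward) * (legs_per_transistor // 2)


def count_gaps(legs: List[Tuple[int, int]]) -> int:
    """Number of neighbouring legs whose facing diffusions carry different nets."""
    return sum(1 for left, right in zip(legs, legs[1:]) if left[1] != right[0])


legs = interlaced_legs([1, 2, 3, 4], 2)  # assumed nets for transistors a, b, c
print(legs)              # [(1, 2), (2, 3), (3, 4), (4, 3), (3, 2), (2, 1)]
print(count_gaps(legs))  # 0
```

Each forward sweep and each backward sweep is a complete series chain between the outer nets, which is why the repeated internal nets need no explicit connection.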
Interlaced layouts, like Fig. 10c, have several area and performance advantages over non-interlaced layouts:
• Interlacing allows transistor legs to be placed without any diffusion gaps, which reduces the cell area.
• Since internal nets do not have to be connected, their diffusion terminals do not need diffusion-to-metal contacts. This allows the legs to be placed closer together, reducing diffusion area and routing complexity and, in turn, reducing overall area and enhancing performance.
Fig. 9: (a) Stack $p_i, p_{i+1}, \ldots, p_j$, and its placement in (b) unflipped and (c) flipped orientations.

Fig. 10: Effect of folding on the layout of and-stacks: (a) before folding; (b) after folding each transistor into two legs; (c) with interlaced legs.

To incorporate leg-interlacing into FCLIP, all three sets of constraints (stack placement, stack pair placement, and stack diffusion sharing) are necessary. Only the values of the share variables must be recomputed so as to accurately reflect diffusion sharing between interlaced transistors.
This clearly demonstrates the versatility of our ILP-based
optimization technique in handling new design
requirements such as folding and leg-interlacing. Other
performance-oriented features such as minimizing the total
net length, reducing the diffusion area on critical nodes, or
maximizing the diffusion area on PWR/GND nodes can also
be easily accommodated by suitably modifying the cost
function and adding new constraints. In addition, the 2-D
width minimization problem solved by FCLIP can be
extended to address both width and height minimization
based on the technique of CLIP described in [7].
Experimental Results. Table 4 compares FCLIP run times without and with and-stacking. As these results show, run times drop significantly with and-stacking. On average, and-stacking reduces run times by one or two orders of magnitude. For example, for the largest circuit with 32 transistors, run times in most cases are only a few seconds. The cell widths with and-stacking deviate from those without stacking by only 5% on average for the circuits tested. Thus, besides being a common performance-oriented requirement in practical designs, and-stacking is very effective in extending FCLIP to larger circuits.
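The size of the run-time reduction can be checked against Table 4 with a little arithmetic. The sketch below transcribes the (without, with) pairs in reading order, assuming the unshaded and shaded columns alternate as the caption describes, and takes the geometric mean of the speedups over the cells where both runs finished. Cells marked * have no finite ratio and are excluded; since every * falls on the side without clustering, this average understates rather than overstates the benefit.

```python
import math
from typing import List, Optional, Tuple

# Run times (secs) transcribed from Table 4 as (without, with) and-stack
# clustering, row by row; None stands for "*" (OPBDP did not finish in an hour).
Pair = Tuple[Optional[float], Optional[float]]
TABLE4: List[Pair] = [
    (21, 1), (3, 2), (1, 0.4), (49, 29),
    (6, 4), (35, 2), (172, 0.2), (13, 5),
    (27, 0.1), (61, 2), (137, 0.4), (120, 0.6),
    (12, 4), (2, 0.5), (None, 871), (None, 2300),
    (6, 1), (20, 37), (1335, 107), (2122, 74),
    (90, 1), (290, 1), (1857, 25), (3137, 32),
    (48, 2), (None, 280), (None, 1735), (6, 1),
    (49, 1), (None, 16), (None, 57), (1643, 52),
    (None, 1), (None, 5), (None, 29), (None, 65),
]

# Keep only cells where both runs finished, so that a ratio exists.
finished = [(wo, w) for wo, w in TABLE4 if wo is not None and w is not None]
only_clustered = sum(1 for wo, w in TABLE4 if wo is None and w is not None)
ratios = [wo / w for wo, w in finished]
# The geometric mean is the natural average for multiplicative speedups.
geo_mean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
print(f"{len(ratios)} finished pairs, geometric-mean speedup {geo_mean:.1f}x")
print(f"{only_clustered} cells finished only with clustering")
```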
7.  CONCLUSIONS
FCLIP is the first algorithmic technique that integrates transistor folding into optimal 2-D cell layout generation. Unlike most prior folding methods, FCLIP folds transistors before synthesis and is not restricted to 1-D layouts. Via its ILP-based algorithm, FCLIP is able to implicitly explore all possible 2-D transistor placements and diffusion sharing possibilities, and so generates a 2-D layout that has the minimum cell width under the modeling assumptions. Its efficient ILP model formulation and off-the-shelf ILP solver make FCLIP practical for relatively large circuits with up to 30 transistors. FCLIP can also be extended along the lines discussed in [7] to minimize cell height as well.
The ILP-based approach of FCLIP is quite versatile, in that
new design requirements such as and-stack clustering and
leg-interlacing can be easily and efficiently accommodated.
In addition, and-stack clustering, while heuristic in nature,
is shown to significantly reduce run times, and still yield
near-optimal layouts for larger cells.
FCLIP’s optimal or near-optimal approach is particularly
targeted towards the layout of standard-cells or datapath bit-
cells in the design of high-volume, high-performance chips
such as microprocessors. These cells, which typically have
20-40 transistors, have three primary layout requirements:
(a) they must be optimized as much as possible since they
are used multiple times; (b) they must strictly adhere to
folding limits set by the technology; and (c) they must not
exceed a specified cell width, which, in datapaths, is dictated
by the datapath pitch and may therefore require a 2-D
layout. FCLIP addresses all these requirements and is an
attractive technique for such application domains.
8.  REFERENCES
[1] P. Barth, Logic Based 0-1 Constraint Programming, Kluwer, Boston, 1995.
[2] Cadence Design Systems, Inc., Virtuoso Layout Synthesizer Tutorial and Reference, 1992-94.
[3] S. Chakravarty, X. He, and S. S. Ravi, “Minimum area layout of series-parallel transistor networks is NP-hard,” IEEE Trans. on CAD, vol. 10, no. 6, pp. 770-782, June 1991.
[4] C. C. Chen and S. L. Chow, “The Layout Synthesizer: An Automatic Netlist-to-Layout System,” Proc. 26th Design Automation Conf., pp. 232-238, June 1989.
[5] R. Fourer, D. M. Gay, and B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, Duxbury Press/Wadsworth Publishing, Belmont, CA, 1993.
[6] A. Gupta, S-C. The, and J. P. Hayes, “XPRESS: A Cell Layout Generator with Integrated Transistor Folding,” Proc. European Design & Test Conf., pp. 393-400, March 1996.
[7] A. Gupta and J. P. Hayes, “CLIP: An Optimizing Layout Generator for Two-Dimensional CMOS Cells,” Proc. 34th Design Automation Conf., pp. 452-455, June 1997.
[8] T. W. Her and D. F. Wong, “Cell Area Minimization by Transistor Folding,” Proc. European Design Automation Conf., pp. 172-177, 1993.
[9] D. D. Hill, “Sc2: A Hybrid Automatic Layout System,” Proc. Int’l Conf. on CAD, pp. 172-174, Nov. 1985.
[10] Y-C Hsieh, et al., “LiB: A CMOS Cell Compiler,” IEEE Trans. on CAD, vol. 10, pp. 994-1005, Aug. 1991.
[11] C-Y Hwang, et al., “An Efficient Layout Style for Two-metal CMOS Leaf Cells and its Automatic Synthesis,” IEEE Trans. on CAD, vol. 12, pp. 410-424, March 1993.
[12] E. Malavasi and D. Pandini, “Optimum CMOS Stack Generation with Analog Constraints,” IEEE Trans. on CAD, vol. 14, pp. 107-122, Jan. 1995.
[13] R. L. Maziasz and J. P. Hayes, Layout Minimization of CMOS Cells, Kluwer, Boston, 1992.
[14] C. L. Ong, J. T. Li, and C. Y. Lo, “GENAC: An Automatic Cell Synthesis Tool,” Proc. 26th Design Automation Conf., pp. 239-244, June 1989.
[15] J-M. Shyu, A. Sangiovanni-Vincentelli, J. P. Fishburn, and A. E. Dunlop, “Optimization-based Transistor Sizing,” IEEE J. of Solid-State Circuits, vol. 23, pp. 400-409, Apr. 1988.
[16] K. Tani, et al., “Two-Dimensional Layout Synthesis for Large-Scale CMOS Circuits,” Proc. Int’l Conf. on CAD, pp. 490-493, Nov. 1991.
OPBDP run times (secs)(a), by P/N folding limit
Rows   None         10/4         8/3            5/2
       w/o   w/     w/o   w/     w/o    w/      w/o    w/

1      21    1      3     2      1      0.4     49     29
2      6     4      35    2      172    0.2     13     5
3      27    0.1    61    2      137    0.4     120    0.6

Full adder [9]
1      12    4      2     0.5    *      871     *      2,300
2      6     1      20    37     1,335  107     2,122  74
3      90    1      290   1      1,857  25      3,137  32

1      48    2      *     280    *      1,735   6      1
2      49    1      *     16     *      57      1,643  52
3      *     1      *     5      *      29      *      65

(a) A * indicates that OPBDP did not terminate after one hour.
Table 4: FCLIP run times without (w/o) and with (w/) and-stack clustering
