The Y-Architecture for On-Chip Interconnect by Hongyu Chen et al.
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. XX, NO. Y, MONTH 2004 1
The Y-Architecture for On-Chip Interconnect:
Analysis and Methodology
Hongyu Chen, Student Member, IEEE, Chung-Kuan Cheng, Fellow, IEEE, Andrew B. Kahng, Member, IEEE, Ion
Măndoiu, Qinke Wang, Student Member, IEEE, and Bo Yao, Student Member, IEEE
Abstract—The Y-architecture for on-chip interconnect is based
on pervasive use of 0-, 120-, and 240-degree oriented semi-
global and global wiring. Its use of three uniform directions
exploits on-chip routing resources more efﬁciently than tradi-
tional Manhattan wiring architecture. This paper gives in-depth
analysis of deployment issues associated with the Y-architecture.
Our contributions are as follows: (1) We analyze communication
capability (throughput of meshes) for different interconnect
architectures using a multi-commodity ﬂow approach and a
Rentian communication model. Throughput of the Y-architecture
is largely improved compared to the Manhattan architecture,
and is close to the throughput of the X-architecture. (2) We
improve existing estimates for the wirelength reduction of various
interconnect architectures by taking into account the effect of
routing-geometry-aware placement. (3) We propose a symmetri-
cal Y clock tree structure with better total wire length compared
to both H and X clock tree structures, and better path length
compared to the H tree. (4) We discuss power distribution under
the Y-architecture, and give analytical and SPICE simulation
results showing that the power network in Y-architecture can
achieve 8.5% less IR drop than an equally-resourced power
network in Manhattan architecture. (5) We propose the use of
via tunnels and banks of via tunnels as a technique for improving
routability for Manhattan and Y-architectures.
Index Terms—VLSI, interconnect architectures, Y-architecture
I. INTRODUCTION
THE Y-architecture refers to the use of 0-, 120-, and
240-degree oriented wires for on-chip interconnect,
along with supporting methodologies including hexagonal die
shapes, hexagonal power and clock distribution, etc. This name
was first used in [8] in the same spirit as the “X architecture”
for pervasive use of 45- and 135-degree angles [32].
Compared to the traditional Manhattan (M-) architecture,
the Y-architecture offers many potential advantages, such
as substantially reduced wirelength and power consumption,
and increased communication bandwidth for a wide range
of demand topologies. Combined with the M-architecture,
the Y-architecture can be applied to the upper two layers
to improve global interconnects, such as clock and power
distribution networks. Moreover, unlike the X-architecture, the
Y-architecture supports a regular routing grid and novel means
of avoiding via blockage effects.
Work partially supported by Cadence Design Systems, Inc., the California
MICRO program, the MARCO Gigascale Silicon Research Center, NSF MIP-9987678
and the Semiconductor Research Corporation. A preliminary version
of this work has appeared in Proc. SLIP 2003 and Proc. ICCAD 2003.
H. Chen, Q. Wang, and B. Yao are with the Department of Computer
Science and Engineering, University of California at San Diego, La Jolla, CA
92093-0114. E-mail: {hchen,qiwang,byao}@cs.ucsd.edu.
C.-K. Cheng and A.B. Kahng are with the Departments of Computer
Science and Engineering, and of Electrical and Computer Engineering,
University of California at San Diego, La Jolla, CA 92093-0114. E-mail:
{kuan,abk}@cs.ucsd.edu.
I. Măndoiu is with the Department of Computer Science and Engineering,
University of Connecticut, Storrs, CT 06269. E-mail: ion@engr.uconn.edu.
His work was partially performed while he was with the Department of
Electrical and Computer Engineering, University of California at San Diego.
Two previous series of works examine the potential use of
Y-architecture for integrated circuits: a series of LSI Logic
patents by Rostoker et al. [24] [25] [26], and a series of works
by Cheng and coauthors [7] [8]. Together, these works set
out a number of ideas for device architecture, ﬂoorplanning,
and place-and-route. However, a number of technical gaps still
exist, ranging from clock and power distribution methodology
to wireability and throughput analysis. In this work, we
provide a more complete, technically in-depth analysis of key
deployment and methodology issues associated with the Y-architecture.
Our main contributions are as follows:
• We give a more realistic throughput analysis using a
communication model based on Rent’s rule. Our results
show that the Y-architecture provides a throughput improvement
of about 20% over the M-architecture for
a square chip, very close to the throughput of the X-architecture.
• We improve existing estimates for the wirelength reduction
of various interconnect architectures by taking into
account the effect of routing-geometry-aware placement.
Our estimate is based on a simulated annealing placer,
driven by wirelength in different routing geometries. We
also discuss and analyze a “virtuous cycle” effect: reduction
of overall wirelength results in decreased routing
area, which in turn leads to further wirelength reduction.
• We discuss clock and power distribution under the Y-architecture.
For clock distribution we propose a symmetrical
Y clock tree structure with better total wire length
compared to both H and X clock tree structures, and better
path length compared to the H tree. For power distribution
we give analytical and SPICE simulation results showing
that a mesh power network in Y-architecture can achieve
8.6% less IR drop than an equally-resourced mesh power
network in M-architecture.
• To fully utilize the uniform routing grid available in M-
and Y-architectures, and to deal with future increases in
via demand due to repeaters [27], we propose the use of
via tunnels and banks of via tunnels to improve routability
in these architectures. Such techniques are not obvious
with the X-architecture.
• We discuss lithography and manufacturing infrastructure
needs, particularly in mask write, related to possible
adoption of the Y-architecture.
The remainder of the paper is organized as follows. Section
II presents throughput analysis for square-shaped chips. Sec-
tion III discusses wirelength reduction with hexagonal routing.
Sections IV and V examine clock and power distribution, and
Section VI discusses routability issues. The paper concludes
in Section VII. Discussion about the “virtuous cycle” wire-
length reduction effect, manufacturing issues and a supporting
approximation of IR-drop are given in the Appendices.
II. COMMUNICATION THROUGHPUT IN MESHES
A multi-commodity ﬂow (MCF) approach was developed
by Chen and coauthors [8] [9] to evaluate communication
efﬁciency of different interconnect architectures. Communi-
cation resources are decomposed into a 2D array of slots. A
uniform communication requirement is assumed, i.e., every
pair of nodes communicates with equal demand and all communications
occur at the same time. The throughput, defined as
the maximum amount of communication flow simultaneously
achievable between every pair of nodes, is computed by a
provably good multicommodity flow (MCF) algorithm [13]
and is used to measure communication capabilities of different
interconnect architectures.
A. Rentian Communication Demand
The uniform pairwise communication used in [8] is simple
and general. However, it is not very realistic, since in a
well-designed layout the probability of communication decreases
with increasing distance between nodes. Stroobandt
and Campenhout [28] derive from Rent’s rule an expression
for occupation probability, i.e., the probability that a given pair
of points will be connected by a wire in an optimal physical
placement of the circuit. For a hierarchical placement of a
circuit with Rent exponent $p$ in a two-dimensional Manhattan
grid, the occupation probability of a pair of points with
Manhattan distance $D$ between them can be approximated by
$C D^{2p-4}$, where $C$ is a normalization constant.$^1$ When only 2-pin
nets are considered, the occupation probability indicates the
probability of communication between pairs of nodes. In the
following, to ensure a fair comparison of the communication
throughput capabilities of different interconnect architectures,
we assume a Rentian communication demand, i.e., we set the
communication demand between any two unit-area slots to
be proportional to $D^{2p-4}$, where $D$ is the Euclidean distance
between them. A widely quoted survey of Bakoglu [3]
indicates that the Rent exponent at the chip and module level
of high-speed computers is approximately 0.63.
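As a minimal sketch of the Rentian demand model described above, the following Python fragment assigns an unnormalized demand $D^{2p-4}$ to every pair of slot centers. The function names, the square-mesh helper, and the choice to leave the demand unnormalized are illustrative assumptions, not part of the original experimental setup.

```python
import math

def rentian_demand(slots, p=0.63):
    """Unnormalized Rentian demand D**(2p - 4) for every unordered slot pair.

    slots: list of (x, y) slot centers (hypothetical input format).
    p:     Rent exponent; 0.63 is the value quoted from Bakoglu's survey.
    """
    exponent = 2.0 * p - 4.0
    demand = {}
    for i in range(len(slots)):
        for j in range(i + 1, len(slots)):
            d = math.dist(slots[i], slots[j])  # Euclidean distance between slots
            demand[(i, j)] = d ** exponent
    return demand

def square_mesh(n):
    """Centers of an n x n mesh of unit-area square slots (assumed layout)."""
    return [(float(x), float(y)) for y in range(n) for x in range(n)]

mesh = square_mesh(3)
demand = rentian_demand(mesh)
total_demand = sum(demand.values())
```

Adjacent slots (distance 1) receive demand 1, while the demand between slots two units apart is already reduced by a factor of $2^{2p-4}$, which is how the model encodes the locality of communication in a well-designed layout.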
B. Communication Throughput
Definition 1: The throughput is defined to be the maximum
fraction of communication demand simultaneously satisfied
between every pair of nodes in $n \times n$ square meshes.
$^1$ $C$ depends on the routing architectures and the underlying distance metric.
Fig. 1. $7 \times 7$ meshes with different interconnect architectures: (a) Y-architecture, (b) M-architecture, (c) X-architecture.
We compute the throughput using the MCF algorithm. The
throughput is tightly correlated to routability, and describes
communication capabilities of different interconnect architectures.
Figure 1 illustrates three $7 \times 7$ meshes using different
interconnect architectures. For Y-architecture, the shape of
each slot is hexagonal, and the enclosing box of the slots is
close to square. Although Y-architecture meshes are different
from M- and X-architecture meshes, this does not significantly
affect the communication demand. For the $17 \times 17$ Y-mesh,
total communication demand is only 1.8% different from that
for other architectures.
In the experiments, total routing area is set to be the same
for all meshes. We normalize the computed throughput so
that it is independent of the dimension of meshes and total
communication demand.$^2$ Table I lists the results for $n \times n$
meshes with $n$ ranging between 9 and 17.$^3$ Compared to
the M-architecture, the Y-architecture provides an average
throughput improvement of 19.8% for these meshes, which
is comparable to the 21.9% improvement achieved by the
X-architecture. For a $17 \times 17$ mesh, Y-architecture provides
a throughput improvement of 20.6% while X-architecture
achieves an improvement of 22.1%.
For Manhattan architecture and Y-architecture, equally distributed
edge capacities produce maximum throughput on $n$
by $n$ meshes. For X-architecture, we show the optimal ratio of
the routing area of diagonal routing edges to that of Manhattan
edges in the last column. That ratio approaches 1.25 when $n$
increases.
Figure 2 shows bottlenecks of communication flows for 10
by 10 meshes using different interconnect architectures. The
fully saturated edges are highlighted with bold lines. For
Y-architecture, there is only one central horizontal cut line,
instead of horizontal and vertical cut lines for Manhattan
architecture. For X-architecture, there are two types of cut
lines: horizontal and vertical cut lines, and diagonal cut lines.
The throughput of meshes using X-architecture depends on
both types of cut lines.
$^2$ For example, the computed throughput on an $n \times n$ mesh using Y-architecture
is normalized by $\frac{TD_M}{TD_Y} D_c/n$, where $TD_M$ and $TD_Y$ are total
demand for M- and Y-architectures, respectively, and $D_c$ is the communication
demand crossing the horizontal middle cut line on the Manhattan mesh.
$^3$ The experiments stop at size 17 mainly because of CPU limits.
However, the current improvement results roughly show convergence and are
close to theoretical improvement bounds.
TABLE I
NORMALIZED THROUGHPUT (AND IMPROVEMENT VS. M-ARCHITECTURE
IN PERCENTAGE) IN SQUARE MESHES WITH RENTIAN DEMAND. LAST
COLUMN GIVES THE OPTIMAL RATIO OF DIAGONAL TO AXIS-PARALLEL
ROUTING EDGES.
n    #Mesh   M-arch.   Y-architecture       X-architecture
     Nodes   Thrpt     Thrpt   Impr. (%)    Thrpt   Impr. (%)   RA Ratio
9    81      1.989     2.354   18.30        2.412   21.25       1.18
10   100     1.989     2.366   18.92        2.419   21.59       1.20
11   121     1.987     2.374   19.47        2.420   21.78       1.23
12   144     1.986     2.382   19.94        2.423   22.00       1.25
13   169     1.991     2.386   19.84        2.425   21.76       1.25
14   196     1.990     2.392   20.19        2.429   22.02       1.25
15   225     1.988     2.395   20.47        2.429   22.14       1.25
16   256     1.992     2.400   20.44        2.430   21.98       1.25
17   289     1.992     2.402   20.58        2.433   22.11       1.25
Fig. 2. Congestion patterns of 10 by 10 mesh: (a) X-architecture, (b) Y-architecture.
Summing up the capacities of the edges passing across the
cut lines, we can derive a throughput upper bound for $n$ by $n$
meshes with different interconnect architectures.
For Manhattan architecture, there are $n$ edges crossing each
cut line. The total edge capacity is $n$. For Y-architecture, there
are $2n-1$ edges passing across each cut line and each edge
has capacity 0.6205, so the total edge capacity crossing the cut
line is $1.241n - 0.6205$. When $n$ approaches infinity, an $n$
by $n$ mesh using Y-architecture can have 24.1% more flow
crossing the cut line. Thus, Y-architecture can achieve at most
24.1% throughput improvement over Manhattan architecture
on a square mesh.
For X-architecture, there are $2(n-1)$ diagonal edges and $n$
Manhattan edges crossing each of the horizontal and vertical
cut lines. To achieve maximum throughput, the ratio of routing
area for diagonal edges to that for Manhattan edges is 1.25.
Under this ratio, the edge capacities are 0.444 and 0.393
for Manhattan edges and diagonal edges respectively. The
total flow crossing the cut line is at most
$1.230n - 0.393$. When $n$ approaches infinity, the throughput
improvement bound is 23.0%.
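The edge capacities quoted above can be reproduced, to about three decimal places, by a short calculation. The sketch below is one possible reading rather than the authors' stated derivation: it assumes unit-area slots, a total wiring area per slot equal to that of two unit-length Manhattan edges of capacity 1, and capacity equal to wiring area divided by edge length. All function names are hypothetical.

```python
import math

def y_edge_capacity():
    """Capacity of one Y-architecture edge under the equal-area assumption."""
    # A hexagonal slot of unit area has center spacing d with (sqrt(3)/2) * d**2 = 1.
    d = math.sqrt(2.0 / math.sqrt(3.0))
    # Three edges of length d per slot share the area of two unit Manhattan edges.
    return 2.0 / (3.0 * d)

def x_edge_capacities(area_ratio=1.25):
    """(Manhattan, diagonal) edge capacities in the X-architecture.

    area_ratio: routing area of diagonal edges divided by that of Manhattan edges.
    """
    manhattan_area = 2.0 / (1.0 + area_ratio)
    diagonal_area = 2.0 - manhattan_area
    cap_manhattan = manhattan_area / 2.0                    # two unit-length edges per slot
    cap_diagonal = diagonal_area / (2.0 * math.sqrt(2.0))   # two edges of length sqrt(2)
    return cap_manhattan, cap_diagonal

# Asymptotic capacity crossing the middle cut line, per unit of mesh width n.
cap_y = y_edge_capacity()
cap_m, cap_d = x_edge_capacities()
y_gain = 2.0 * cap_y - 1.0            # about 2 edges per column cross the cut in the Y-mesh
x_gain = cap_m + 2.0 * cap_d - 1.0    # 1 Manhattan and about 2 diagonal edges per column
print(f"Y edge capacity {cap_y:.4f}, asymptotic gain {100 * y_gain:.1f}%")
print(f"X edge capacities {cap_m:.3f}, {cap_d:.3f}, asymptotic gain {100 * x_gain:.1f}%")
```

Under these assumptions the per-column slopes come out near 1.241 for the Y-architecture and 1.230 for the X-architecture, matching the 24.1% and 23.0% bounds in the text.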
A rectangular chip has communication bottlenecks on two
(horizontal and vertical) middle cut lines. The physical dimen-
sion of the middle part of the chip restricts the communication
ﬂow and thus prevents us from achieving larger throughput.
For M- and Y-architectures, convex-shaped chips (diamond
chip for M-architecture and hexagonal chip for Y-architecture)
produce better throughput by allowing more wires to cross the
original middle cut lines [8].$^4$ Note that the use of octagonal
chips for the X-architecture is undesirable, since the wafer
cannot be tiled by octagons without waste.
III. WIRELENGTH REDUCTION
Because of its restrictions on routing directions, the M-
architecture entails signiﬁcant added wirelength beyond the
Euclidean optimum. In the Y-architecture, routing is allowed
along three uniform orientations, and total wirelength is ex-
pected to be reduced. An accurate cost-beneﬁt analysis of
Non-Manhattan routing is impossible without good estimation
of the expected wirelength reduction when switching from
Manhattan to Non-Manhattan routing. However, the literature
contains only simplistic (and seemingly conflicting) estimates.
• For nets with 10 or more pins, experiments with both
exact [21] and heuristic Steiner algorithms [16] [17] [22]
suggest an average improvement between Manhattan and
octilinear Steiner trees of approximately 10% when the
nets are randomly generated, and even smaller improvements
when nets are extracted from real VLSI designs.
• For 2-pin nets, the octilinear over Manhattan improvement
is estimated to be 17.17% in [24] and
14.6% in [31]. The two estimates assume different probability
distributions over 2-pin nets: in [24] one pin is assumed to
be chosen uniformly at random from a Euclidean circle
centered at the other pin, while in [31] the Euclidean
circle is replaced by a Manhattan circle.
• For full commercial designs placed and routed with
octilinear-aware tools, [15] and [31] report wirelength
improvements of 20% or more.
The previous estimates do not adequately address the effect
of routing-geometry-aware placement on the overall wire-
length improvement. Previous studies of the routing demand
using different traditional placers [19] show that Manhattan
placers tend to align circuit elements either vertically or
horizontally, leaving few opportunities to exploit additional
routing directions. A Y-aware or X-aware placer factors in
hexagonal or octilinear wiring during placement, and results
in better placements of nets when such wiring is used to route
the nets. Therefore, total wirelength can be greatly reduced.
To estimate the wirelength improvement achieved by Y-
aware or X-aware placement and routing versus Manhattan
placement and routing, we have built a simpliﬁed placer which
uses simulated annealing driven by hexagonal or octilinear
wirelength estimation. The input of the placer is a simpliﬁed
netlist extracted from MCNC instances, in which a list of cells
is speciﬁed for each net. After a random initial placement of
cells, two cells are randomly selected, and we decide whether
to swap these two cells based on the current annealing tem-
perature and the new SMT cost with hexagonal or octilinear
routing, which is computed using an exact SMT algorithm,
GeoSteiner [21]. The initial temperature of the simulated
annealing algorithm is speciﬁed so that it is far larger than
the standard deviation of total wirelength distribution [14].
For each temperature, the number of swaps is on the order of
100 times the number of cells [20]. The new temperature is
generated by multiplying the current temperature by $\alpha = 0.95$,
which is a relatively large $\alpha$ for simulated annealing [20].
$^4$ Note that it is not necessary to use a regular hexagon for the Y-architecture:
either horizontal or vertical symmetry suffices.
TABLE II
AVERAGE WIRELENGTH IMPROVEMENTS FOR NON-MANHATTAN
PLACEMENT AND ROUTING VS. MANHATTAN PLACEMENT AND ROUTING
(%).
Instance    #nets   Y-Arch   X-Arch   Euclidean
C2          601     4.81     8.92     11.04
BALU        658     7.13     9.29     11.07
PRIMARY1    695     7.32     10.31    13.03
C5          1438    8.34     11.48    12.73
TABLE III
TOTAL WIRELENGTH OF INSTANCE “C2” WITH DIFFERENT
COMBINATIONS OF PLACEMENT AND ROUTING.
Routing-geometry     Routing geometry
driven placement     Rect    Hex      Oct      Euclidean
Rectilinear          1805    1841.8   1719.2   1683.9
Hexagonal            1896    1743.0   1665.9   1621.5
Octilinear           1908    1799.6   1644.0   1617.4
Euclidean            1865    1772.1   1646.9   1605.7
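The annealing loop described above can be sketched as follows. This is a simplified illustration, not the placer used in the experiments: it replaces the exact GeoSteiner SMT cost with a minimum spanning tree under the hexagonal metric, and the slot coordinates, netlist, starting temperature, and all function names are assumptions made for the example.

```python
import math
import random

SQRT3 = math.sqrt(3.0)

def hex_distance(p, q):
    """Shortest path length between p and q using 0/60/120-degree segments."""
    dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
    if dy <= SQRT3 * dx:
        return dx + dy / SQRT3
    return 2.0 * dy / SQRT3

def net_length(points, dist):
    """Minimum spanning tree length (Prim); a stand-in for an exact SMT."""
    best = {i: dist(points[0], points[i]) for i in range(1, len(points))}
    total = 0.0
    while best:
        i = min(best, key=best.get)
        total += best.pop(i)
        for j in best:
            best[j] = min(best[j], dist(points[i], points[j]))
    return total

def total_wirelength(placement, nets, dist):
    """placement[c] is the slot of cell c; each net is a list of cell indices."""
    return sum(net_length([placement[c] for c in net], dist) for net in nets)

def anneal(slots, nets, dist, t0, alpha=0.95, swaps_per_cell=100,
           t_min=1e-3, seed=0):
    rng = random.Random(seed)
    n_cells = len(slots)
    placement = list(slots)
    rng.shuffle(placement)                      # random initial placement
    cost = total_wirelength(placement, nets, dist)
    best, best_cost = list(placement), cost
    t = t0
    while t > t_min:
        for _ in range(swaps_per_cell * n_cells):
            a, b = rng.sample(range(n_cells), 2)
            placement[a], placement[b] = placement[b], placement[a]
            new_cost = total_wirelength(placement, nets, dist)
            delta = new_cost - cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost                 # accept the swap
                if cost < best_cost:
                    best, best_cost = list(placement), cost
            else:                               # reject: undo the swap
                placement[a], placement[b] = placement[b], placement[a]
        t *= alpha                              # geometric cooling
    return best, best_cost

# Tiny hypothetical instance: 9 cells on a 3 x 3 array of slots, 5 nets.
slots = [(float(x), float(y)) for y in range(3) for x in range(3)]
nets = [[0, 8], [1, 7, 4], [2, 6], [3, 5, 0], [4, 8, 2]]
best, best_cost = anneal(slots, nets, hex_distance, t0=5.0,
                         swaps_per_cell=5, t_min=0.05)
```

The demo lowers `swaps_per_cell` and raises `t_min` only to keep the run short; the defaults follow the schedule quoted in the text. Swapping in a different `dist` function (rectilinear, octilinear, or Euclidean) gives the other routing-geometry-driven variants.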
For each instance and each routing geometry (rectilinear,
hexagonal, octilinear and Euclidean), we run the placer 5
times, and get the best wirelength with routing-geometry-
aware placement and routing. The wirelength improvements
achieved by Non-Manhattan placement and routing are sum-
marized in Table II. In Tables III–VI we give total wirelengths
obtained with different combinations of placement and routing
for each of the four testcases.
According to the results, the Y-architecture achieves a
wirelength improvement of up to about 8.3% over the M-architecture.
The X-architecture reduces total wirelength by up to about 11.4%
over the M-architecture, and produces about 3.3% wirelength
reduction over the Y-architecture, at the cost of one more routing
direction.
We note that in the above experiments the placer uses
a ﬁxed area die. However, reduction of overall wirelength
results in decreased routing area, which in turn leads to further
wirelength reduction, creating a “virtuous cycle” effect. An
analysis of this effect is given in Appendix I.
TABLE IV
TOTAL WIRELENGTH OF INSTANCE “BALU” WITH DIFFERENT
COMBINATIONS OF PLACEMENT AND ROUTING.
Routing-geometry     Routing geometry
driven placement     Rect    Hex      Oct      Euclidean
Rectilinear          1820    1856.0   1728.4   1694.7
Hexagonal            1878    1748     1660.8   1623.6
Octilinear           1886    1785.6   1650.9   1621.3
Euclidean            1898    1769.8   1654.6   1616.5
TABLE V
TOTAL WIRELENGTH OF INSTANCE “PRIMARY1” WITH DIFFERENT
COMBINATIONS OF PLACEMENT AND ROUTING.
Routing-geometry     Routing geometry
driven placement     Rect    Hex      Oct      Euclidean
Rectilinear          2058    2080.8   1942.5   1903.7
Hexagonal            2126    1958.3   1882.7   1833.9
Octilinear           2136    2004.8   1862.0   1828.0
Euclidean            2124    1976.8   1854.1   1805.5
TABLE VI
TOTAL WIRELENGTH OF INSTANCE “C5” WITH DIFFERENT
COMBINATIONS OF PLACEMENT AND ROUTING.
Routing-geometry     Routing geometry
driven placement     Rect    Hex       Oct       Euclidean
Rectilinear          3557    3625      3340      3272
Hexagonal            3619    3334.99   3182.48   3107.45
Octilinear           3569    3390      3149      3097
Euclidean            3628    3397      3183      3104
IV. Y CLOCK TREE
Clock distribution networks synchronize the flow of data
signals among synchronous data paths. The design of these
networks can dramatically affect system-wide performance
and reliability. The “H” clock tree [4] is widely used in the
IC industry. In the H-tree, clock terminals are arranged in a
symmetric fashion, and are connected by a planar hierarchy of
symmetric “H” structures. When octilinear routing is allowed,
the “H” structure can be replaced with an “X” structure, so
that source-sink path (i.e., insertion) delay and total wirelength
are decreased. However, signiﬁcant undesirable overlapping
(superposition) will occur between parallel interconnect wires
in the X-tree.
With three uniform routing directions, a Y clock tree can be
built as depicted in Figure 3(a), essentially giving a “distorted
X-tree” with reduced wirelength and no superposed parallel
wires. Let the distance between two adjacent clock terminals
be 1. Path length from the clock source to a clock terminal, as
well as total wirelength, are compared with H-tree and X-tree
in Table VII. In the table, $n$ is the level of clock trees. Path
length and total wirelength of $n$-level clock trees can be easily
derived. E.g., source-to-sink path length of the H clock tree is
derived according to the following formula.
$$\mathrm{PathLength}_H(n) = \mathrm{PathLength}_H(n-1) + 2^{n-1} \tag{1}$$
Fig. 3. Y Clock Tree. (a) Y clock tree. (b) A one-level Y clock tree, showing clock terminals $s_1$, $s_2$, $s_3$, $s_4$, clock source $o$, axes $x$ and $y$, and segment lengths $a$ and $b$.
TABLE VII
PATH LENGTH AND TOTAL WIRELENGTH OF H-TREE, X-TREE AND
Y-TREE.
          Path Length                                      Total Wirelength
H-tree    $2^n - 1$                                        $\frac{3}{2}\, 2^n (2^n - 1)$
X-tree    $\frac{\sqrt{2}}{2} (2^n - 1)$                   $\sqrt{2}\, 2^n (2^n - 1)$
Y-tree    $\frac{1}{2}(1 + \frac{\sqrt{3}}{3})(2^n - 1)$   $\frac{1+\sqrt{3}}{2}\, 2^n (2^n - 1)$
The Y clock tree has a path length of $0.7887(2^n - 1)$, 21.1%
less than the H-tree. Its total wirelength is $1.366 \cdot 2^n(2^n - 1)$,
8.9% less than H-tree, and 3.4% less than X-tree. Actually,
the one-level Y-tree shown in Figure 3(b) is the optimal
Euclidean Steiner Minimum Tree to connect four adjacent
clock terminals $s_1, s_2, s_3, s_4$ and the clock source $o$. Thus the Y
clock tree provides minimal total wirelength among all clock
trees with similar symmetric structure. A further advantage
of the Y clock tree is that there is no overlapping of parallel
interconnect wires. It can be shown:
Theorem 1: Let the distance between two adjacent clock
terminals be $D$. The minimum distance between two parallel
interconnect wires is $\frac{\sqrt{3}-1}{4} D$.
Proof: Suppose there is a coordinate system with a 0°
x-axis, a 60° y-axis and the origin $(0,0)$ at the center of
the main Y-tree structure (see Figure 3(b)). Then in a one-level
Y-tree, the two interconnect wires that are parallel to
the y-axis in the figure have x-coordinates of $\pm a$. In a two-level
Y-tree, the lowest-level y-axis-parallel interconnect wires
have x-coordinates of $\pm a \pm 2a$ and $\pm a \pm 2(a+b)$. Generally,
in an $n$-level Y-tree, x-coordinates of the lowest-level y-axis-parallel
interconnect wires are
$\pm a \pm (2a \text{ or } 2(a+b)) \pm \cdots \pm (2^{n-1}a \text{ or } 2^{n-1}(a+b))$.
Since $a = \frac{D}{2}(1 - \frac{\sqrt{3}}{3})$ and $(a+b) = \frac{D}{2}(1 + \frac{\sqrt{3}}{3})$,
these coordinates can be written as
$(\pm 2^0 \pm 2^1 \pm \cdots \pm 2^{n-1})\frac{1}{2}D + (\pm 2^0 \pm 2^1 \pm \cdots \pm 2^{n-1})\frac{\sqrt{3}}{6}D$.
These values cannot be zero
because the values of $\pm 2^0 \pm 2^1 \pm \cdots \pm 2^{n-1}$ must not be
zero, and the minimum absolute value among them is
$a = \frac{D}{2}(1 - \frac{\sqrt{3}}{3})$. Thus the minimal distance between two parallel
interconnect wires in the Y clock tree is
$\frac{\sqrt{3}}{2} a = \frac{\sqrt{3}-1}{4} D$.
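The closed forms in Table VII and the constants quoted in this section can be checked numerically. The following sketch uses hypothetical function names and simply evaluates the formulas as stated, with terminal spacing normalized to 1.

```python
import math

def tree_metrics(n):
    """(path length, total wirelength) of n-level H-, X- and Y-trees (Table VII)."""
    levels = 2 ** n - 1
    size = 2 ** n * levels
    return {
        "H": (levels, 1.5 * size),
        "X": (math.sqrt(2) / 2 * levels, math.sqrt(2) * size),
        "Y": (0.5 * (1 + math.sqrt(3) / 3) * levels, (1 + math.sqrt(3)) / 2 * size),
    }

def h_path_length(n):
    """Recurrence (1): PathLength_H(n) = PathLength_H(n - 1) + 2**(n - 1)."""
    return 0 if n == 0 else h_path_length(n - 1) + 2 ** (n - 1)

def y_tree_segments(d=1.0):
    """Segment lengths a and b of the one-level Y-tree used in the proof."""
    a = d / 2 * (1 - math.sqrt(3) / 3)
    b = d * math.sqrt(3) / 3          # follows from a + b = d/2 * (1 + sqrt(3)/3)
    return a, b

m = tree_metrics(4)
path_saving_vs_h = 1 - m["Y"][0] / m["H"][0]   # Y path length vs. H-tree
wire_saving_vs_h = 1 - m["Y"][1] / m["H"][1]   # Y total wirelength vs. H-tree
wire_saving_vs_x = 1 - m["Y"][1] / m["X"][1]   # Y total wirelength vs. X-tree

a, b = y_tree_segments()
min_spacing = math.sqrt(3) / 2 * a             # Theorem 1 with D = 1
print(f"{100 * path_saving_vs_h:.1f}% {100 * wire_saving_vs_h:.1f}% "
      f"{100 * wire_saving_vs_x:.1f}% spacing {min_spacing:.4f}")
```

Because every formula scales with the same factor $2^n - 1$ or $2^n(2^n - 1)$, the relative savings do not depend on the number of levels chosen for the check.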
V. Y POWER DISTRIBUTION
Excessive voltage drop in the power grid can slow device
switching speed and reduce noise margin. Robust power
distribution within available area resource is critical to chip
performance and reliability. Hierarchical mesh structures are
widely used for power distribution in high performance chips
because of their robustness [5]. In this section, we show that
power distribution in the Y-architecture is not only natural, but
achieves less IR drop than equally-resourced mesh distribution
in the M-architecture.
Our comparison is based on the following model of the
power distribution network.
• The power distribution network is constructed by a hierarchy
of mesh structures connected by vias at crossing
points of wires. Each mesh has equal wire spacing and
wire width. Ignoring the resistance of vias,$^5$ we assume
perfect contact at each crossing point.
• On top of metal layers, there are arrays of C4 power pads
evenly distributed on the surface of the power mesh.
• Under the bottom-level mesh, there are devices connected
to the wires of the bottom-level mesh. The devices are
modeled as uniform current sinks and placed at crossing
points of the bottom-level mesh.
Fig. 4. Power distribution networks and representative areas for M- and
Y-architectures. (a) Two-level power mesh, showing C4 pads and the bottom-level mesh. (b) Representative areas, showing C4 pads, the top-level mesh and the bottom-level mesh.
In state-of-the-art designs, there is a fairly large number (>100)
of power pads evenly distributed on the surface of the top-
level power mesh [34]. It is reasonable to assume that the
whole power mesh is an inﬁnite resistive grid constructed by
replicating the area surrounded by adjacent power pads. Figure
4 illustrates two-level power meshes and the representative
areas in the M- and Y-architectures. Our analysis and circuit
simulations consider only the worst-case IR-drop on the rep-
resentative area. This method is also used in [12].
A. IR-Drop on Single-Level Power Mesh
Static IR-drop on a hierarchical power mesh depends largely
on the top-level mesh since usually the top-level mesh is wider
and coarser and most current ﬂows along the top-level mesh.
Here we analyze and compare the worst-case static IR-drop
on a single-level power mesh in the M- and Y-architectures.
1) IR-Drop on Single-Level Power Mesh in the Y-Architecture:
A single-level power mesh in the Y-architecture
is abstracted as an infinite triangular resistive lattice with edge
resistance $R_Y$.$^6$ We examine IR-drop in the triangular area
with $N_Y$ rows surrounded by three adjacent power pads.$^7$ In
this case, the worst-case IR-drop appears at the center of
this representative area. Each power pad supplies a current
$I_Y = N_Y^2 i$ to the power mesh, where $i$ is the current drain at
each intersection on the mesh.
$^5$ In practice, high current density on vias often causes reliability problems.
In the Y-architecture, assuming same wire width, the area of intersection
(overlap) between two adjacent-layer wires is larger than in the M-architecture.
Hence, we can place a bigger via between adjacent layers or place more vias in
the via array between adjacent layers to reduce resistance and current density
for vias. Let $A_Y$, $A_X$, and $A_M$ represent this area for Y-, X- and M-architectures,
respectively. We have $A_Y = 1.1547 A_M$ and $A_X = 1.414 A_M$.
$^6$ Note that for a uniform mesh with fixed total routing area, the edge
resistance is independent of the number of metal lines on the mesh. When the
number of lines increases, wire pitch and wire width decrease with the same
ratio, and the edge resistance remains the same.
Assume there is a coordinate system with the origin at
the center of the power mesh, and 0-degree and 120-degree
lines used as m-axis and n-axis, respectively. We analyze
the voltage drop between the node $(0,0)$ and the power pad
at $(\frac{N_Y}{3}, -\frac{N_Y}{3})$ by considering currents from power pads and
evenly distributed current sinks separately.
IR-drop caused by currents from power pads. Suppose
that a current $I_Y$ enters the lattice at the node $(m_s, n_s)$ and
leaves at infinity. The voltage drop for any node on the lattice
is analyzed in [2]. The voltage drop between $(m_s, n_s)$ and
$(m, n)$, denoted as $V_{(m_s,n_s)}(m,n)$, is given by the integral
$$\frac{I_Y R_Y}{2\pi} \int_0^{\pi/2} \frac{1 - e^{-|(m-m_s)-(n-n_s)|x} \cos\big(((m-m_s)+(n-n_s))y\big)}{\sinh x \cos y}\, dy, \tag{2}$$
where $2\cosh x \cos y + \cos 2y = 3$. When $|(m-m_s)-(n-n_s)|$ is
large, the voltage drop $V_{(m_s,n_s)}(m,n)$ can be approximated as
$$\frac{I_Y R_Y}{4\sqrt{3}\pi} \left[\ln\big((m-m_s)^2 + (n-n_s)^2 - (m-m_s)(n-n_s)\big) + c_1\right], \tag{3}$$
where $c_1 = 3.6393$ is a constant.$^8$
Let $V_{(m_s,n_s)}$ denote the voltage drop between $(0,0)$ and
the power pad at $(\frac{N_Y}{3}, -\frac{N_Y}{3})$ caused by the current source at
$(m_s, n_s)$. According to the above approximation, we have
• when $(m_s, n_s) = (\frac{N_Y}{3}, -\frac{N_Y}{3})$,
$V_{(m_s,n_s)} \approx \frac{I_Y R_Y}{4\sqrt{3}\pi}(2\ln N_Y - \ln 3 + c_1)$;
• when $(m_s, n_s) \neq (\frac{N_Y}{3}, -\frac{N_Y}{3})$,
$V_{(m_s,n_s)} = V_{(m_s,n_s)}(0,0) - V_{(m_s,n_s)}(\frac{N_Y}{3}, -\frac{N_Y}{3}) \approx \frac{I_Y R_Y}{2\sqrt{3}\pi} \ln\frac{D_0}{D_s}$,
where $D_s$ is the Euclidean distance between $(m_s, n_s)$ and
$(\frac{N_Y}{3}, -\frac{N_Y}{3})$, and $D_0$ is the Euclidean distance between
$(m_s, n_s)$ and $(0,0)$.
The constant $c_2 = \sum_{(m_s,n_s) \neq (\frac{N_Y}{3}, -\frac{N_Y}{3})} \ln\frac{D_0}{D_s}$ can be computed
by a simple algorithm, which calculates the summation
for all the current sources within a circle around the origin.
As the radius of the circle increases, the summation
converges to a value of $c_2 = -1.173679$.
Therefore, if only currents from power pads are considered, the
voltage drop between $(0,0)$ and the power pad at $(\frac{N_Y}{3}, -\frac{N_Y}{3})$
is
$$V_{source} = \sum_{(m_s,n_s)} V_{(m_s,n_s)} = \frac{I_Y R_Y}{2\sqrt{3}\pi}(\ln N_Y + C_Y), \tag{4}$$
where $C_Y = c_1/2 - \ln 3/2 + c_2 = 0.09666$.
IR-drop caused by evenly distributed current sinks. Next,
we consider the voltage drop caused by current sinks at the
intersections of the power mesh. If the voltage between $(0,0)$
and $(m,n)$ is denoted by $V_{sink}(m,n)$, by a combination of
Ohm’s and Kirchhoff’s Laws we have
$$V_{sink}(m-1,n) + V_{sink}(m+1,n) + V_{sink}(m,n+1) + V_{sink}(m,n-1) + V_{sink}(m-1,n-1) + V_{sink}(m+1,n+1) - 6V_{sink}(m,n) = iR_Y. \tag{5}$$
$^7$ E.g., for the top-level Y-architecture mesh shown in Figure 4(b), $N_Y$ is
equal to 3.
$^8$ See Appendix II for details of this approximation.
TABLE VIII
SIMULATION RESULTS FOR WORST-CASE IR-DROP ON THE SINGLE-LEVEL
POWER MESH IN THE Y-ARCHITECTURE, COMPARED TO ESTIMATED
VALUES (mV).
N_Y   IR-Drop   Estimated IR-Drop   Error
3     166.67    165.39              1.28
6     229.17    229.08              0.09
9     266.36    266.34              0.02
12    292.78    292.77              0.01
15    313.28    313.27              0.01
18    330.03    330.03              0.00
21    344.20    344.19              0.00
If the resistive lattice is regarded as a discrete approximation
to a continuous resistive medium, we will obtain a potential
function proportional to $D^2$, where $D$ is the Euclidean distance
from the origin. Therefore, we assume the following representation
for the voltage between $(0,0)$ and $(m,n)$:
$$V_{sink}(m,n) = k\,(m^2 + n^2 - mn), \tag{6}$$
where $k$ is a constant. Equation (5) then yields
$$V_{sink}(m,n) = \frac{iR_Y}{6}(m^2 + n^2 - mn). \tag{7}$$
When only current sinks are considered, the voltage drop
between $(0,0)$ and the power pad at $(\frac{N_Y}{3}, -\frac{N_Y}{3})$ is
$$V_{sink} = V_{sink}\left(\frac{N_Y}{3}, -\frac{N_Y}{3}\right) = \frac{I_Y R_Y}{18}. \tag{8}$$
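The step from Equation (5) to Equations (7) and (8) can be checked with exact rational arithmetic. The sketch below uses hypothetical function names and integer units for $i$ and $R_Y$. It also assumes that the power pad sits at lattice coordinates $(N_Y/3, -N_Y/3)$, which is the reading under which $m^2 + n^2 - mn = N_Y^2/3$ and Equation (8) follows from Equation (7).

```python
from fractions import Fraction

def v_sink(m, n, i=1, r_y=1):
    """Equation (7): V_sink(m, n) = i * R_Y / 6 * (m^2 + n^2 - m*n)."""
    return Fraction(i * r_y, 6) * (m * m + n * n - m * n)

def lattice_residual(m, n, i=1, r_y=1):
    """Left-hand side of Equation (5), evaluated with the ansatz of Equation (7)."""
    neighbors = [(m - 1, n), (m + 1, n), (m, n + 1),
                 (m, n - 1), (m - 1, n - 1), (m + 1, n + 1)]
    return sum(v_sink(a, b, i, r_y) for a, b in neighbors) - 6 * v_sink(m, n, i, r_y)

# The residual should equal i * R_Y (= 1 here) at every lattice node.
residuals = {lattice_residual(m, n) for m in range(-4, 5) for n in range(-4, 5)}

def pad_drop(n_y, i=1, r_y=1):
    """Sink-induced drop at the assumed pad position (N_Y/3, -N_Y/3); N_Y divisible by 3."""
    return v_sink(n_y // 3, -(n_y // 3), i, r_y)

# Compare with I_Y * R_Y / 18, where I_Y = N_Y**2 * i.
checks = [(pad_drop(n_y), Fraction(n_y * n_y, 18)) for n_y in (3, 6, 9, 12)]
```

The residual is the same at every node because the second differences of a quadratic form are constant, which is why a single constant $k = iR_Y/6$ satisfies Equation (5) everywhere.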
Verification of Worst-Case IR-Drop. From the above
analysis, we obtain the voltage drop at the center:
$$V_Y = V_{source} + V_{sink} \approx \frac{I_Y R_Y}{18} + \frac{I_Y R_Y}{2\sqrt{3}\pi}(\ln N_Y + C_Y), \tag{9}$$
where $C_Y = 0.09666$.
To verify the above formula for worst-case IR-drop on the
single-level power mesh, we use HSpice to simulate various
power meshes with different values of $N_Y$. Since the problem
is linear in nature, in our experiments the resistance of each
wire segment $R_Y$ is simply set to be 1 kΩ, and the total current
drain in the area $I_Y$ is set to be 1 mA. We list simulation results
for $N_Y$ from 3 to 21 in Table VIII, and compare them with the
estimated values from the formula. The results show that the
formula is accurate, with error less than 1%.
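The estimated column of Table VIII can be regenerated directly from Equation (9). In the sketch below, the function name and the dictionary of simulated values (copied from Table VIII) are illustrative. The default arguments are the experiment's settings of $R_Y$ = 1 kΩ and $I_Y$ = 1 mA.

```python
import math

C_Y = 0.09666  # constant from Equation (4)

def worst_case_drop_y(n_y, i_y=1e-3, r_y=1e3):
    """Equation (9): worst-case IR-drop (volts) on a single-level Y-mesh."""
    ir = i_y * r_y
    return ir / 18 + ir / (2 * math.sqrt(3) * math.pi) * (math.log(n_y) + C_Y)

# HSpice results from Table VIII, in millivolts.
SIMULATED_MV = {3: 166.67, 6: 229.17, 9: 266.36, 12: 292.78,
                15: 313.28, 18: 330.03, 21: 344.20}

estimates_mv = {n_y: 1e3 * worst_case_drop_y(n_y) for n_y in SIMULATED_MV}
for n_y, simulated in SIMULATED_MV.items():
    estimate = estimates_mv[n_y]
    rel_err = abs(simulated - estimate) / simulated
    print(f"N_Y={n_y:2d}  simulated={simulated:7.2f}  "
          f"estimated={estimate:7.2f}  error={100 * rel_err:.2f}%")
```

The largest discrepancy occurs at $N_Y = 3$, where the logarithmic approximation of Equation (3) is least accurate, and it stays below 1% of the simulated value.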
2) Comparing IR-Drop on Single-Level Power Mesh: For
a single-level power mesh in the M-architecture, worst-case
IR-drop is analyzed and verified in [11]. Suppose the power
mesh has edge resistance $R_M$, number of rows within the
representative area $N_M$ and current supplied by each power
pad $I_M$. Then the worst-case IR-drop on the single-level Manhattan
(M-) mesh is:
$$V_M \approx \frac{I_M R_M}{8} + \frac{I_M R_M}{2\pi}(\ln N_M + C_M), \tag{10}$$
where $C_M = -0.1324$.
TABLE IX
IR-DROP IMPROVEMENTS IN SINGLE-LEVEL Y-MESH VS. M-MESH.
N_M   Estimated IR-Drop (mV)   IR-Drop Impr. (%)
      in M-mesh                with Y-mesh
2     214.25                   10.78
3     278.78                   8.28
4     324.56                   7.11
5     360.08                   6.41
6     389.09                   5.93
7     413.63                   5.58
8     434.88                   5.31
9     453.63                   5.09
To fairly compare the Y-mesh and M-mesh, we constrain
the two meshes to have the same wire material and thickness,
cover the same area (same total current drain) with the same
wiring resource, and have the same number of crossing points
and power pads. Therefore, we have $R_Y = \sqrt{3} R_M$, $I_Y = I_M$,
and $N_Y = N_M$. According to Equations (9) and (10), worst-case
IR-drop on the single-level Y-mesh is less than that on
the M-mesh by
$$\Delta V = V_M - V_Y = c\,I_M R_M, \tag{11}$$
where $c = 0.02309$. Table IX lists IR-drop improvements with
the Y-mesh for different values of $N_M$. The number of wire lines
between two adjacent power pads on the top-level power
mesh is usually small [11]. When $N_M = 4$, static IR-drop
improvement of the Y-mesh over M-mesh is 7.1%.
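The entries of Table IX can be regenerated from Equation (10) together with the constant $c$ of Equation (11), taken here as given. This is a minimal sketch with function names of our choosing; the normalization $I_M R_M$ = 1V (1mA through 1kΩ edges) is an assumption that happens to reproduce the tabulated millivolt values.

```python
import math

C_M = -0.1324      # constant from Equation (10)
C_DELTA = 0.02309  # constant c from Equation (11), taken as given


def m_mesh_drop(n_m, i_m=1e-3, r_m=1e3):
    """Worst-case IR-drop in volts on a single-level M-mesh, per Equation (10)."""
    scale = i_m * r_m
    return scale / 8.0 + scale / (2.0 * math.pi) * (math.log(n_m) + C_M)


def y_improvement_percent(n_m, i_m=1e-3, r_m=1e3):
    """Relative IR-drop reduction of the Y-mesh: 100 * c * I_M * R_M / V_M."""
    return 100.0 * C_DELTA * i_m * r_m / m_mesh_drop(n_m, i_m, r_m)


if __name__ == "__main__":
    for n_m in range(2, 10):
        drop_mv = 1e3 * m_mesh_drop(n_m)
        print(f"N_M={n_m}  V_M={drop_mv:.2f} mV  "
              f"improvement={y_improvement_percent(n_m):.2f} %")
```

Since $\Delta V$ is independent of $N_M$ while $V_M$ grows logarithmically, the relative improvement shrinks as the mesh becomes finer.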
B. IR-Drop on Hierarchical Power Mesh
In practice, power is distributed through a hierarchy of six
or more metal layers. In this section, we simulate hierarchical
power networks for the X-, Y- and M-architectures using
HSpice, explore different conﬁgurations of power networks,
and compare the best solutions. We assume an equal sum of
routing resources (i.e., total routing area) for X-, Y- and M-
architecture power distribution across layers M6, M5 and M4.
In our experiment below, we set the total wiring area of M6,
M5 and M4 to be 52% of the total representative area. The
representative area for the Manhattan and X mesh is set to
be a 1.2mm by 1.2mm square. To achieve the same power
pad density, the representative area for the Y power grid is an
equilateral triangle with edge length 1.289mm. Further details
of our comparison are as follows.
• Layer thickness and resistivity parameters of a 6-layer
process are taken from TSMC 0.13µm copper process
information [30]. Layer thicknesses are 0.33µm for M1,
0.36µm for M2-5, and 1.02µm for M6.
• M1-M3 power distribution is native to library cells and
blocks, requiring a common interface (0-degree) at M4.
Power routing in M1-3 has the same pitch in both the Y
and Manhattan solutions: M1 has pitch of 8µm and wire
width of 2µm, M2 has pitch of 60µm and wire width of
4µm, and M3 has pitch of 60µm and wire width of 4µm.
M4 pitch is fixed at 75µm to enable matchup with M1-3
macros and an apples-to-apples comparison.
• Allowed values of wiring separations (= pitches) on M5
and M6, denoted by S5 and S6, are {600µm, 300µm,
150µm, 75µm}. Allowed percentages of total wiring area
used on M4 and M5, denoted as P4 and P5, are {10%,
20%, 30%, 40%, ..., 80%}.
• There is a via at each intersection of wires on adjacent
metal layers. The vias are regarded as perfect electrical
contacts.
• 1V voltage sources are placed at the corners of
representative areas. Each current sink on M1 (between two
adjacent vias) is $5.21 \times 10^{-7}$A.
TABLE X
STATIC IR-DROP ON BEST POWER NETWORKS IN X-, Y-, AND
MANHATTAN ARCHITECTURES
Arch.  M6 pitch (µm)  M6 area (%)  M5 pitch (µm)  M5 area (%)  M4 pitch (µm)  IR-drop (mV)
M      300            70           75             20           75             38.5
Y      600            70           150            20           75             35.2
X      300            40           300            40           150            34.8
All combinations of wire pitch and wire width of M4, M5,
and M6 are exhaustively searched. Table X shows the best
configurations in the M-, Y- and X-architectures and the corresponding
static IR-drop. In the best M-architecture configuration, M6
has wire pitch of 300µm and uses 70% of the power routing
resource; M5 has wire pitch of 75µm and uses 20% of
the resource. The IR-drop produced by this configuration is
38.5mV. In the best Y-architecture configuration, M6 has pitch
600µm and uses 70% of the power routing resource, while
M5 has pitch 150µm and again uses 20% of wiring area.
The IR-drop is 35.2mV, which is 8.6% smaller than that of
the best M-architecture solution. In the best X-architecture
configuration, M6 and M5 both have 300µm wiring pitch and
each uses 40% of wiring area, while M4 has pitch 150µm. The
IR-drop is 34.8mV, which is 1.2% smaller than that of the best
Y-architecture configuration. Ongoing research seeks a more
general and formal comparison.
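The triangle edge length used for the Y power grid in this experiment can be recovered from the equal-pad-density requirement. The sketch below assumes that pads sit only at the corners of the representative areas, so that each square contributes one pad (four corners, each shared by four squares) and each equilateral triangle contributes half a pad (three corners, each shared by six triangles). The function name is ours.

```python
import math


def equal_density_triangle_edge(square_edge_mm):
    """Edge of an equilateral triangle cell matching a square cell's pad density.

    Assumes one pad per square cell and half a pad per triangle cell,
    i.e. pads located only at cell corners.
    """
    area_per_pad = square_edge_mm ** 2      # one pad per square
    triangle_area = area_per_pad / 2.0      # two triangles per pad
    # Area of an equilateral triangle with edge a is (sqrt(3) / 4) * a^2.
    return math.sqrt(4.0 * triangle_area / math.sqrt(3.0))


if __name__ == "__main__":
    print(f"{equal_density_triangle_edge(1.2):.3f} mm")
```

For the 1.2mm square this gives an edge close to the 1.289mm quoted in the text.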
VI. ROUTABILITY IN THE Y-ARCHITECTURE
A. Uniform Routing Grid
A nice property of the Y-architecture is that there is a
natural, uniform routing grid. Figure 5(a)(b) illustrates the
routing grid in the M- and Y-architectures, wherein each
routing layer has exactly the same wiring pitch. Figure 5(c)
shows the X-architecture grid, where identical layer pitches
imply that wire intersection points are not coincident. It is
therefore difﬁcult to ﬁnd a natural, resource-efﬁcient, uniform
wiring grid in the X-architecture.
A uniform routing grid is expected to benefit large VLSI
designs for two main reasons. (1) It enables continued use
of today's dominant gridded routing algorithms. (2) The
uniform routing grid can permit integral coordinates (even if
absolute positions have irrational coordinates!), significantly
simplifying detailed routing and design rule checking algorithms.
Fig. 5. Routing grids in M-, Y- and X-architectures (panels (a), (b) and (c), respectively).
Fig. 6. Via tunnel and bank of via tunnels in the M-architecture: (a) a via tunnel, with terminals a, b, c and d; (b) a bank of via tunnels (overhead k+2 tracks, #vias = kL).
B. Via Tunnels and Via Tunnel Banks
Another advantage of the uniform global routing grid is
that we can utilize via tunnels and via tunnel banks to avoid
the fragmentation of routing resources caused by vias; this
improves overall chip routability. In multi-layer routing, wire
tracks are blocked on the layers that a via passes through.
Traditional routing schemes scatter vias all over the chip,
and this fragmentation of routing resources may cause serious
wireability problems; this is called the “via blockage effect”. As
we approach the 65nm technology node, this effect becomes
more serious, since buffering of global wires introduces many
via chains that go all the way from the top-level metal
down to the gate layer. We believe that the proposed use of
via tunnels and via tunnel banks will reduce the via blockage
effect and thus improve routability and wiring density.
Figure 6(a) shows an example of a via tunnel in the
Manhattan architecture. There are two routing layers shown
in the ﬁgure: the upper layer is for horizontal routing and
the lower layer is for vertical routing. Terminals a and b
are connected by detouring the horizontal wires around the
via using the space on the vertical layer. Because the detour
happens on the lower layer, it will not affect the wire between
terminals c and d on the upper layer.
By aligning a number of via tunnels in the vertical direction, we
obtain a bank of via tunnels, which is shown in Figure 6(b).
Suppose each via tunnel has k vias arranged in a horizontal
line (in Figure 6(b), k = 3), and we align L via tunnels into a
bank. In the resulting bank, all the horizontal tracks are free
to route, and only k+2 vertical tracks are blocked. Note that
there are a total of kL vias in the bank; without the bank of
via tunnels, up to kL tracks could be blocked on each layer
that the vias pass through. The use of via tunnel banks can
thus significantly reduce the “via blockage effect”.
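The track counts in the preceding paragraph can be compared numerically. The sketch below simply encodes the two quantities stated there, k+2 blocked vertical tracks for a bank versus up to kL blocked tracks per layer for scattered vias; the function names are ours and no routing model beyond those two expressions is implied.

```python
def m_bank_blocked_tracks(k):
    """Vertical tracks blocked by a Manhattan bank with k vias per tunnel."""
    return k + 2


def scattered_blocked_tracks(k, num_tunnels):
    """Upper bound on tracks blocked per layer when the k*L vias are scattered."""
    return k * num_tunnels


if __name__ == "__main__":
    k = 3  # as in Figure 6(b)
    for num_tunnels in (1, 4, 16):
        print(f"L={num_tunnels:2d}  vias={k * num_tunnels:2d}  "
              f"bank={m_bank_blocked_tracks(k)}  "
              f"scattered<= {scattered_blocked_tracks(k, num_tunnels)}")
```

The bank overhead does not depend on L, so its advantage grows with the number of tunnels aligned in the bank.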
We have designed similar via tunnel and bank of via tunnels
for the Y-architecture.
• Figure 7(a) shows the bird's-eye view of a via tunnel
design in the Y-architecture. In this example, we have
three layers. From top to bottom, the routing direction
is 60-degree, 120-degree and 0-degree in each layer,
respectively. The circle in the center represents a through
via. The space in the middle layer is used to detour wires
around the via. We can achieve blockage-free routing on
the top and bottom layers, and have four tracks blocked
on the middle layer.
• Similar to the construction of banks of via tunnels in the
M-architecture, we align the via tunnels together to obtain
a bank of via tunnels in the Y-architecture. Figure 7(b)
illustrates how two via tunnels shown in Figure 7(a) are
aligned along the 120-degree direction.
• In order to reduce the average track overhead, each via
tunnel can have more than one via in a line. Figure 8
illustrates the construction of via tunnels with k (k =
2, 3, 4) vias aligned in a line. The figures show detour
routing patterns on the middle layer for k = 2, 3, 4,
respectively. Figure 9 is an example of three via tunnels
with k = 2 aligned along the 120-degree direction. From
these examples, we can see that for a via tunnel with k vias
aligned in a line, the track overhead on the middle layer
is 2k+1.
• Figure 10 depicts a bank of via tunnels in the
Y-architecture. Suppose the bottom m layers are used to
perform intra-cell routing, and the top n−m layers are
used for distributing signals to the banks. Assume each
via tunnel has k vias in a line, and there are L via tunnels
in the bank. All the kL vias introduce only 2k+1 tracks
of routing blockage on the 120-degree routing layers.
Fig. 7. Via tunnels and bank of via tunnels in Y-architecture: (a) a via tunnel in the Y-architecture, blocking 4 tracks on the 120-degree layer; (b) two tunnels aligned together.
Fig. 8. Via tunnels with vias aligned in a line in Y-architecture: (a) k = 2, overhead = 5; (b) k = 3, overhead = 7; (c) k = 4, overhead = 9.
Fig. 9. Three tunnels with k = 2 aligned together (overhead = 5 tracks).
Fig. 10. Bank of via tunnels in Y-architecture (overhead = 2k+1 tracks, #vias = kL).
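To illustrate how the overhead is amortized in a Y-architecture bank, the sketch below encodes the 2k+1 track overhead stated above and divides it by the kL vias in the bank. The function names are ours. The formula is applied only for k ≥ 2, matching the examples of Figure 8, since the single-via tunnel of Figure 7(a) is described as blocking four tracks.

```python
from fractions import Fraction


def y_tunnel_overhead(k):
    """Middle-layer tracks blocked by a Y-architecture tunnel with k vias in a line."""
    if k < 2:
        raise ValueError("2k+1 is stated in the text for k = 2, 3, 4 (Figure 8)")
    return 2 * k + 1


def y_bank_overhead_per_via(k, num_tunnels):
    """Blocked tracks per via for a bank of L tunnels, each with k vias."""
    return Fraction(y_tunnel_overhead(k), k * num_tunnels)


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(f"k={k}  overhead={y_tunnel_overhead(k)}  "
              f"per-via (L=3)={y_bank_overhead_per_via(k, 3)}")
```

Exact fractions are used so that the per-via overhead can be compared across bank sizes without rounding.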
VII. CONCLUSIONS
In this paper, we have examined key issues concerning
the potential use of Y-architecture for semiconductor ICs,
including throughput analysis, estimates of wirelength savings,
clock and power distribution methodology, wireability, and
manufacturing. We have not discussed such issues as graph-
ics engine changes, computational-geometric data structures,
number and coordinate systems, calibration of parasitic ex-
traction (especially capacitance extraction) models, etc. Such
“mundane” issues are part of the necessary groundwork for
the eventual deployment of the Y-architecture, and the subject
of ongoing work in our group, but are beyond the scope of
the present paper.
Compared to the X-architecture, the Y-architecture supports
a regular routing grid, which is important for simplifying
the manufacturing process as well as routing and design rule
checking algorithms. In addition, via tunnels and via tunnel
banks can be used to avoid via blockage effects; such
techniques are not obvious with the X-architecture. Moreover, a Y
clock tree has a better total wire length compared to the X clock
tree structure, without overlapping of parallel interconnect
wires. Finally, the Y-architecture provides a throughput and
wirelength improvement close to the X-architecture with one
less routing direction; convex-shaped chips can produce
further improvement for the Y-architecture without wafer waste.
Further research directions include: (1) theoretical anal-
ysis and high-impact designs or codes to demonstrate Y-
architecture advantages; (2) more accurate estimations of ex-
pected wirelength improvement which formalizes interactions
between nets; and (3) interfaces to current library cells and
new Y-speciﬁc library cells. Many parts of a commercially
successful Y-architecture methodology remain open. The Y-
architecture also has applications beyond the die, e.g., it may
be valuable on laminates used for multi-die integration, and
on the buildup layers (e.g., BBUL [33]) that will replace
traditional packages.
REFERENCES
[1] F. Abboud, S. Babin, V. Charkarian, et al., “Design Considerations for an
Electron-Beam Pattern Generator for the 130-nm Generation of Masks”,
SPIE Symp. Photomask and X-Ray Mask Technology VI, SPIE Vol. 3748,
1999, pp. 385-399.
[2] D. Atkinson and F. J. van Steenwijk, “Inﬁnite Resistive Lattices”, Am.
J. Phys. 67 (1999), pp. 486-492.
[3] H. B. Bakoglu, Circuits, Interconnections, and Packaging for VLSI,
Addison-Wesley, 1990.
[4] H. B. Bakoglu, J. T. Walker and J. D. Meindl, “A Symmetric Clock
Distribution Tree and Optimized High-Speed Interconnections for Re-
duced Clock Skew in ULSI and WSI Circuits”, Proc. IEEE Int. Conf.
Computer Design, Oct. 1986, pp. 118-122.
[5] S. Boyd, L. Vandenberghe and A. El Gamal, “Design of Robust Global
Power and Ground Networks”, Proc. ACM/SIGDA Int. Symp. Physical
Design, 2001, pp. 283-288.
[6] P. Buck, Dupont Photomasks, personal communication, Nov. 2002.
[7] H. Chen, B. Yao, F. Zhou and C. K. Cheng, “Physical Planning of
On-Chip Interconnect Architectures”, Proc. IEEE Int. Conf. Computer
Design, Sep. 2002, pp. 30-35.
[8] H. Chen, B. Yao, F. Zhou and C. K. Cheng, “The Y-Architecture: Yet
Another On-Chip Interconnect Solution”, Proc. Asia and South Paciﬁc
Design Automation Conf., 2003, pp. 840-846.
[9] H. Chen, C.-K. Cheng, A. B. Kahng, I. Măndoiu and Q. Wang,
“Estimation of Wirelength Reduction for λ-Geometry vs. Manhattan
Placement and Routing”, Proc. ACM/IEEE Workshop on System Level
Interconnect Prediction, 2003, pp. 71-76.
[10] H. Chen, C.-K. Cheng, A. B. Kahng, I. Măndoiu, Q. Wang and
B. Yao, “The Y-Architecture for On-Chip Interconnect: Analysis and
Methodology”, Proc. Int. Conf. Computer Aided Design, 2003, to appear.
[11] H. Chen, C.-K. Cheng, A. B. Kahng, Q. Wang and B. Yao, “Optimal
Sizing Analyses for Mesh-Based Power Plans”, unpublished manuscript,
2003.
[12] A. Dharchoudhury and R. Panda, “Design and Analysis of Power
Distribution Networks in PowerPC Microprocessors”, Proc. Design
Automation Conf., 1998, pp. 738-743.
[13] N. Garg and J. Könemann, “Faster and Simpler Algorithms for
Multicommodity Flow and other Fractional Packing Problems”, Proc. 39th
Annual Symp. Foundations of Computer Science, 1998, pp. 300-309.
[14] T. Hildebrandt, “An Annotated Placement Bibliography”, ACM SIGDA
Newsletter, Dec. 1985, pp. 12-21.
[15] M. Igarashi, T. Mitsuhashi, A. Lee, et al., “A Diagonal-Interconnect
Architecture and Its Application to RISC Core Design”, Proc. Int. Solid-
State Circuits Conf., 2002, pp. 166-167.
[16] A. B. Kahng, I. I. Măndoiu and A. Z. Zelikovsky, “Highly Scalable
Algorithms for Rectilinear and Octilinear Steiner Trees”, Proc. Asia and
South Paciﬁc Design Automation Conf., 2003, pp. 827-833.
[17] C.-K. Koh and P. H. Madden, “Manhattan or Non-Manhattan? A
Study of Alternative VLSI Routing Architectures”, Proc. Great Lakes
Symposium on VLSI, 2000, pp. 47-52.
[18] M. Lemke, J. Gramss, H. J. Doering, et al., “Advanced Writing
Strategies for High-End Mask Making”, Proc. SPIE, Vol. 3996, 2000,
pp. 166-172.
[19] P. H. Madden, “Congestion Reduction in Traditional and New Routing
Architectures”, to appear.
[20] Carl Sechen, “Placement and Global Routing of Integrated Circuits Us-
ing Simulated Annealing”, Ph.D. Dissertation, U. California, Berkeley,
1987, Chapter 2.
[21] B. K. Nielsen, P. Winter and M. Zachariasen, “An Exact Algorithm for
the Uniformly-Oriented Steiner Tree Problem”, Proc. 10th European
Symp. Algorithms, Springer LNCS Vol. 2461, 2002, pp. 760-772.
[22] M. Hayase and S. Meki, “An Algorithm for Steiner Trees in
λ-Geometry”, IPSJ Journal 38(4) (1997).
Fig. 11. Circles with radius R for each routing geometry (rectilinear,
hexagonal, octilinear and Euclidean).
[23] C. Progler, Photronics Inc., personal communication, Nov. 2002.
[24] M. D. Rostoker et al., “Hexagonal Architecture”, U.S. Patent, No.
US6407434B1, June 2002.
[25] M. D. Rostoker et al., “CAD for Hexagonal Architecture”, U.S. Patent,
No. US5822214, Oct. 1998.
[26] R. Scepanovic et al., “Microelectronic Integrated Circuit Structure
and Method Using Three Directional Interconnect Routing Based on
Hexagonal Geometry”, U.S. Patent, No. US5578840, Nov. 1996.
[27] P. Saxena, N. Menezes, P. Cocchini and D. A. Kirkpatrick, “The Scaling
Challenge: Can Correct-by-Construction Design Help?”, Proc. Intl.
Symp. Physical Design, 2003, pp. 51-58.
[28] D. Stroobandt and J. V. Campenhout, “Accurate Interconnection Length
Estimations for Predictions Early in the Design Cycle”, VLSI Design,
Special Issue on Physical Design in Deep Submicron 10(1) (1999), pp.
1-20.
[29] S. Teig and J. L. Ganley, “Method and Apparatus for Considering Diag-
onal Wiring in Placement”, Int. Patent Application, No. WO 02/47165
A2, June 2002.
[30] TSMC 0.13µm Design Rules. http://www.tsmc.com.
[31] S. Teig, “The X Architecture”, Proc. ACM/IEEE Workshop on System
Level Interconnect Prediction, 2002, pp. 33-37.
[32] http://www.xinitiative.org.
[33] Intel Research Webpage on Packaging.
http://www.intel.com/research/silicon/packaging.htm.
[34] The ITRS Assembly and Packaging roadmap.
http://public.itrs.net.
[35] WaterJet-Guided Laser In Wafer Cutting – Synova SA.
http://www.gemcity.com/downloads/synova01.pdf.
APPENDIX I
ANALYSIS OF THE “VIRTUOUS CYCLE” WIRELENGTH
REDUCTION EFFECT
The simulated annealing placer in Section III places cells
within a chip that has ﬁxed area. However, reduction of overall
wirelength results in decreased routing area, which in turn
leads to further wirelength reduction, creating a “virtuous
cycle” effect.
Consider a cluster of two-pin nets which are connected to
one pin A. All other pins are uniformly located in a circle by
a routing-geometry-aware placer. Circles for different routing
geometries are shown in Figure 11. Based on the “virtuous
cycle” effect, the circle will have an area proportional to
the total routing area. For Manhattan placement and routing,
suppose the pins are placed in a rectilinear circle with radius
$R$. We have the area of the rectilinear circle, $A = 2R^2$, and the
total routing area,
$$A_{routing} = 4\int_0^R \left(x \cdot \frac{x\,dx}{A} N\right) = \frac{4}{3}\frac{R^3}{A} N = \frac{2}{3} R N,$$
where $N$ is the number of two-pin nets and $\frac{x\,dx}{A} N$ is the number
of pins located between unit circles with radii $x$ and $x+dx$. Let
$A \approx A_{routing}$. We have $R \approx N/3$ and $A_{routing} \approx \frac{2}{9} N^2$. Similar
analysis can be done for other routing geometries, with the
results summarized as follows:
Rectilinear: $A_{routing} \approx \frac{2}{9} N^2$.
Hexagonal: $A_{routing} \approx \frac{8\sqrt{3}}{81} N^2$, 23.0% less compared to
Manhattan placement and routing.
Octilinear: $A_{routing} \approx \frac{\sqrt{2}}{9} N^2$, 29.3% less compared to
Manhattan placement and routing.
Euclidean: $A_{routing} \approx \frac{4}{9\pi} N^2$, 36.3% less compared to
Manhattan placement and routing.
This simple analysis shows that the wirelength reduction
caused by the “virtuous cycle” effect is signiﬁcant, and can
partly explain the large wirelength reductions reported in [15]
and [31].
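The four coefficients above follow one pattern, which the sketch below makes explicit. It relies on a generalization that is ours rather than stated in the text: if the unit "circle" of a geometry has area $A = \alpha R^2$, repeating the Manhattan derivation gives $A_{routing} = \frac{2}{3} R N$ and hence $A_{routing} \approx \frac{4}{9\alpha} N^2$. The values of $\alpha$ are the areas of a square, regular hexagon, regular octagon and disk with circumradius 1; the names are hypothetical.

```python
import math

# Area of the radius-1 "circle" in each routing geometry (alpha in A = alpha * R^2).
UNIT_BALL_AREA = {
    "rectilinear": 2.0,                        # square (diamond) with circumradius 1
    "hexagonal": 3.0 * math.sqrt(3.0) / 2.0,   # regular hexagon
    "octilinear": 2.0 * math.sqrt(2.0),        # regular octagon
    "euclidean": math.pi,                      # disk
}


def routing_area_coefficient(alpha):
    """Coefficient of N^2 in A_routing, assuming A = alpha * R^2 and A ~ A_routing."""
    return 4.0 / (9.0 * alpha)


def reduction_vs_manhattan(geometry):
    """Percentage reduction of A_routing relative to the rectilinear case."""
    base = routing_area_coefficient(UNIT_BALL_AREA["rectilinear"])
    coeff = routing_area_coefficient(UNIT_BALL_AREA[geometry])
    return 100.0 * (1.0 - coeff / base)


if __name__ == "__main__":
    for name, alpha in UNIT_BALL_AREA.items():
        print(f"{name:12s} coefficient={routing_area_coefficient(alpha):.6f}  "
              f"reduction={reduction_vs_manhattan(name):.1f} %")
```

Under this generalization the reduction depends only on the ratio of unit-circle areas, $1 - 2/\alpha$.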
APPENDIX II
APPROXIMATION OF EQUATION (2)
Suppose a current I enters a uniform inﬁnite triangular
resistive lattice with edge resistance R at the origin and leaves
at inﬁnity. The voltage drop for any node on the lattice
is analyzed in [2]. The ﬁnal result for the voltage drop is
expressed as an integral representation. The voltage between
(0;0) and (m;n), V(m;n), is:
$$V(m,n) = \frac{IR}{2\pi} \int_0^{\pi/2} \frac{1 - e^{-|m-n|x}\cos((m+n)y)}{\sinh x \cos y}\, dy, \tag{12}$$
where $2\cosh x \cos y + \cos 2y = 3$. When $|m-n|$ is large, the
exponential term in the above expression becomes negligible
except when $x$ is very small. When $x$ is very small, we have:
• $\cosh x \approx 1 + x^2/2$,
• $\sinh x \approx x$,
• $\cos y = [(8 + (\cosh x)^2)^{1/2} - \cosh x]/2 \approx 1 - x^2/6$, and
• $y \approx x/\sqrt{3}$.
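The above expression can be rewritten as the sum of three
integrals: $V_{m,n}/IR = I_1 + I_2 + I_3$, where
$$\begin{aligned}
I_1 &= \frac{1}{2\pi} \int_0^{\pi/2} \frac{1 - e^{-|m-n|\sqrt{3}y}\cos((m+n)y)}{\sqrt{3}\,y}\, dy, \\
I_2 &= \frac{1}{2\pi} \int_0^{\pi/2} \left(\frac{1}{\sinh x \cos y} - \frac{1}{\sqrt{3}\,y}\right) dy, \\
I_3 &= \frac{1}{2\pi} \left[\int_0^{\pi/2} \frac{e^{-|m-n|\sqrt{3}y}\cos((m+n)y)}{\sqrt{3}\,y}\, dy - \int_0^{\pi/2} \frac{e^{-|m-n|x}\cos((m+n)y)}{\sinh x \cos y}\, dy\right].
\end{aligned} \tag{13}$$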
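The first integral can be expressed in terms of the exponential
integral $\mathrm{Ein}(z)$,
$$\mathrm{Ein}(z) = \int_0^z \frac{1 - e^{-t}}{t}\, dt = \int_0^{\pi/2} \frac{1 - e^{-2yz/\pi}}{y}\, dy, \tag{14}$$
so that
$$I_1 = \frac{1}{2\sqrt{3}\pi}\, \mathrm{Re}\left\{ \mathrm{Ein}\!\left( \frac{\pi}{2}\left[\,|m-n|\sqrt{3} - i(m+n)\right] \right) \right\}. \tag{15}$$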
For large values of its argument, $\mathrm{Ein}(z) \approx \ln z + c_1$, where
$c_1 = 0.57721$. So we have
$$I_1 \approx \frac{1}{4\sqrt{3}\pi}\left[\ln(m^2 + n^2 - mn) + 2(\ln\pi + c_1)\right]. \tag{16}$$
The second integral can be integrated numerically. Let $I_2 = \frac{1}{2\sqrt{3}\pi} c_2$,
where $c_2 = \int_0^{\pi/2} \left(\sqrt{3}/(\sinh x \cos y) - 1/y\right) dy = 0.09772$.
The exponentials in the third integral are negligible, except
for small values of $x$ and $y$, and for those values, $\sinh x \cos y \approx x \approx \sqrt{3}y$,
so the third integral can be neglected.
Fig. 12. Toshiba machine triangle shots: shot division using only rectangle shots (angle parts are divided into a number of small rectangle shots) versus shot division using both rectangle and triangle shots.
Finally, we have
$$V_{m,n} \approx \frac{IR}{4\sqrt{3}\pi}\left[\ln(m^2 + n^2 - mn) + c\right], \tag{17}$$
where $c = 2(\ln\pi + c_1 + c_2) = 3.6393$.
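The constants $c_2$ and $c$ can be checked with a short numerical integration. The sketch below is not the authors' procedure: it uses a plain midpoint rule, takes $c_1$ to be Euler's constant, and replaces $\sinh x \cos y$ by the closed form $\sin y \sqrt{3 + \sin^2 y}$, which we derived from the constraint $2\cosh x \cos y + \cos 2y = 3$ (so that $\cosh x = 2/\cos y - \cos y$). All function names are ours.

```python
import math

EULER_GAMMA = 0.5772156649015329  # c_1 in the text (0.57721)


def c2_integrand(y):
    """sqrt(3) / (sinh x cos y) - 1 / y, with sinh x cos y = sin y * sqrt(3 + sin^2 y)."""
    s = math.sin(y)
    return math.sqrt(3.0) / (s * math.sqrt(3.0 + s * s)) - 1.0 / y


def midpoint_integral(func, a, b, steps=20000):
    """Composite midpoint rule; avoids evaluating func at the endpoints."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


def asymptotic_constants():
    """Return (c2, c) with c = 2 * (ln(pi) + c1 + c2), as in Equation (17)."""
    c2 = midpoint_integral(c2_integrand, 0.0, math.pi / 2.0)
    c = 2.0 * (math.log(math.pi) + EULER_GAMMA + c2)
    return c2, c


if __name__ == "__main__":
    c2, c = asymptotic_constants()
    print(f"c2 = {c2:.5f}, c = {c:.4f}")
```

The midpoint rule is chosen because the integrand is a difference of two terms that each diverge at $y = 0$, although the difference itself stays bounded.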
APPENDIX III
MANUFACTURING AND OTHER ISSUES
As is well-known from the example of the X Initiative [32],
any new back end of the line (BEOL) architecture requires
engagement throughout the mask and process infrastructure.
According to our discussions with domain experts [6] [23],
the Y-architecture presents a number of generic challenges to
manufacturing; there are no show-stoppers, but engineering
efforts will be required across several domains. Space limits
preclude detailed discussion here, but we sketch several main
points.
With respect to mask making, Vector Shaped Beam (VSB)
ebeam lithography tools [1] create “shots” of varying shape
and size by imaging the overlap of two apertures, typically
both square. This allows a range of rectangular shots to be
created and exposed on the mask. Existing Toshiba ebeam
lithography systems can produce 45-degree patterns at high
speed through the combination of one rectangular aperture
and one with 45- and 135-degree edges [32]. The new JEOL
JBX3030 tool [18] also has apertures to produce 45- and
135-degree edges. These new tools mitigate the write time
implications of angled data since they provide an alternative to
approximating an angled line with a series of small rectangles;
Figure 12 illustrates mask fracturing using both rectangle and
triangle shots versus mask fracturing using only rectangle
shots [32]. With successful experiences with 45-degree edges
in mind, it is possible that 60- and 120-degree edges can
be printed assuming future availability of 30- and 60-degree
angles in apertures.
Current support for angular edges is really focused on
small edge segments rather than long lines. To produce long
lines efficiently, it is necessary to have a pair of rectangular
apertures that still produce rectangular shots, but rotated
to the desired angle. On the other hand, if the Y-architecture
is applied only to the upper, lower-resolution metal layers,
as we have proposed, the write time issue could be solved
if the masks could be made with optical (laser) lithography
(e.g., ETEC Alta writers), where throughput is independent of
angular edges.
Of course, algorithms for optical proximity correction
(OPC) and mask data preparation (layout data fracturing for
vector shaped e-beam writers) will need to be updated for Y-
architecture geometries. We believe that all forms of rule- and
model-based OPC can be adapted and calibrated to handle Y-
architecture geometries. Current fracturing algorithms already
handle (45-degree) ‘slant’ edges when partitioning layout into
trapezoids; introduction of 30- and 60-degree slant edges
should not pose a signiﬁcant challenge.
The potential of non-rectangular die also presents challenges
to package I/O design and dicing. Current side-to-side die
sawing cannot cut hexagonal dies due to the silicon lattice
structure. New technologies, such as waterjet-guided laser
[35], are emerging to confront the challenges.
There are other challenges related to inspection, exposure,
repair, metrology and pattern compensation. Ultimately, the
deployment of the Y-architecture will depend on careful en-
gineering, and provable cost reductions vis-a-vis achievable
design quality with pervasive 60- and 120-degree wiring.