This paper presents new single-layer, i.e., planar-embeddable, clock tree constructions with exact zero skew under either the linear or the Elmore delay model. Our method, called Planar-DME, consists of two parts. The first algorithm, called Linear-Planar-DME, guarantees an optimal planar zero-skew clock tree (ZST) under the linear delay model. The second algorithm, called Elmore-Planar-DME, uses the Linear-Planar-DME connection topology in constructing a low-cost ZST according to the Elmore delay model. While a planar ZST under the linear delay model is easily converted to a planar ZST under the Elmore model by elongating tree edges in bottom-up order, our key idea is to avoid unneeded wire elongation by iterating the DME construction of the ZST and the bottom-up modification of the resulting non-planar routing. Costs of our planar ZST solutions are comparable to those of the best previous non-planar ZST solutions, and substantially improve over previous planar clock routing methods.
the clock tree is rooted at the source, any edge between a parent node p and its child v may be identified with the child node, i.e., we denote this edge as $e_v$. In our discussion, the distance between two points p and q is the Manhattan distance $d(p, q)$, and the distance between two sets of points P and Q, $d(P, Q)$, is $\min\{d(p, q) \mid p \in P \text{ and } q \in Q\}$. The cost of the edge $e_v$ is simply its wirelength, denoted $|e_v|$; this is always at least as large as the Manhattan distance between the endpoints of the edge, i.e., $|e_v| \ge d(l(p), l(v))$. The cost of T(S) is the total wirelength of the edges in T(S).
For a given clock tree T(S), let $t(s_0, s_i)$ denote the signal propagation time on the unique path from source $s_0$ to sink $s_i$. The skew of T(S) is the maximum value of $|t(s_0, s_i) - t(s_0, s_j)|$ over all sink pairs $s_i, s_j \in S$. If the skew of T(S) is zero then T(S) is a zero-skew tree (ZST). In what follows, we address the Zero Skew Clock Routing Problem: Given a set S of sink locations, construct a ZST T(S) with minimum cost.
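As a small illustration of the cost and skew definitions, the following Python sketch evaluates a clock tree under the linear (pathlength) delay model. The dictionary-based tree encoding and the function names are assumptions made for this example, and every edge is assumed to be routed with wirelength equal to the Manhattan distance between its endpoints (no elongation).

```python
def manhattan(p, q):
    """Manhattan distance d(p, q) between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def cost_and_skew(loc, children, root, sinks):
    """Return (total wirelength, skew) under the linear delay model.

    loc maps node -> (x, y); children maps node -> list of child nodes.
    Both encodings are hypothetical, chosen only for this sketch.
    """
    cost = 0
    delay = {root: 0}
    stack = [root]
    while stack:
        p = stack.pop()
        for v in children.get(p, []):
            length = manhattan(loc[p], loc[v])  # |e_v| with no snaking
            cost += length
            delay[v] = delay[p] + length
            stack.append(v)
    sink_delays = [delay[s] for s in sinks]
    return cost, max(sink_delays) - min(sink_delays)


# Two sinks with the source midway between them: a zero-skew tree.
loc = {"s0": (2, 0), "s1": (0, 0), "s2": (4, 0)}
children = {"s0": ["s1", "s2"]}
cost, skew = cost_and_skew(loc, children, "s0", ["s1", "s2"])

# Moving the source off the midpoint keeps the cost but introduces skew.
loc_off = {"s0": (1, 0), "s1": (0, 0), "s2": (4, 0)}
cost_off, skew_off = cost_and_skew(loc_off, children, "s0", ["s1", "s2"])
```

The second call shows why a zero-skew constraint is restrictive: both embeddings have the same wirelength, but only the first balances the two source-sink pathlengths.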
Minimum-Cost Zero-Skew Trees
In the IC CAD literature, minimum-cost, exact zero-skew clock trees for cell-based designs have been constructed by applying geometric optimizations over the set of sink locations. The associated formulations are perhaps best motivated by the "monolithic" single-buffer clocking approach [2, 10]. From the circuits/systems perspective, workers such as Friedman [14] have considered these geometric methods as "subroutines" in addressing further concerns such as use of existing distributed buffers, non-zero clocking skew, driver and buffer sizing, etc. Thus, the zero-skew clock routing problem, along with its planar variant, remains a fundamental building block in any clock distribution methodology.
The first clock tree construction for cell-based layouts with arbitrary sink locations was that of Jackson et al. [16]; their MMM algorithm does recursive top-down partitioning of the set of sinks into two equal-sized subsets, always connecting the centroid of a set to the centroids of its subsets. Kahng et al. [8, 18] constructed clock tree topologies using a bottom-up matching approach. Their "KCR" algorithm obtains zero pathlength skew in practice (i.e., zero skew under the linear delay model) but has no theoretical guarantee. The work of Tsay [23] was the first to guarantee exact zero skew for any input; this was accomplished with respect to the Elmore delay model. In the same spirit as [18], Tsay recursively combines pairs of zero-skew trees at "tapping points" to yield larger zero-skew trees. To maintain exact zero skew, wires are elongated via "snaking" as necessary.
The above methods all concentrate on generation of the clock tree topology: the topology is then embedded in the plane more or less arbitrarily as it is generated. The Deferred-Merge Embedding (DME) method, which was discovered independently by three groups [3, 4, 11], is a linear-time algorithm which optimally embeds any given topology in the Manhattan plane, i.e., with exact zero skew and minimum total wirelength. Because the properties of the DME construction are central to our present work, we now provide a review of DME following the development in [3].
The DME Algorithm

Given sink set S and topology G, DME embeds internal nodes of G via: (i) a bottom-up phase that constructs a tree of merging segments which represent loci of possible placements of internal nodes in the ZST T; and (ii) a top-down embedding phase that determines exact locations for the internal nodes in G (see Figure 3, reproduced from [3]).
In the bottom-up phase (Figure 3a), each node $v \in G$ is associated with a merging segment which represents a set of possible placements of v in a minimum-cost ZST. The merging segment of a node depends on the merging segments of its two children, hence the bottom-up processing order. More precisely, let a and b be the children of node v in G, and let $TS_a$ and $TS_b$ denote the subtrees of merging segments rooted at a and b, respectively. We seek placements of v which allow $TS_a$ and $TS_b$ to be merged with minimum added wire while preserving zero skew. This means that we want to minimize $|e_a| + |e_b|$ in T, while balancing delays from l(v) to all leaves in the subtree rooted at v. The values of $|e_a|$ and $|e_b|$ which achieve this are unique; they are computed and stored for use in the top-down embedding phase of DME.
To formally describe this construction, the following terminology is useful. A Manhattan arc is defined to be a line segment, possibly of zero length, with slope +1 or -1; in other words, a Manhattan arc is a line segment tilted at 45 degrees from the wiring directions. The collection of points within a fixed distance of a Manhattan arc is called a tilted rectangular region, or TRR, whose boundary is composed of Manhattan arcs (Figure 1, reproduced from [3]). The Manhattan arc at the center of the TRR is called its core. Finally, the radius of a TRR is the distance between its core and its boundary. Note that a Manhattan arc is itself a TRR with radius 0.
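One convenient way to compute with Manhattan arcs and TRRs is to rotate coordinates by 45 degrees. The sketch below is an illustration under that assumption, not a procedure taken from the paper: with $u = x + y$ and $w = x - y$, Manhattan distance becomes the larger of the two coordinate differences, a Manhattan arc becomes an axis-parallel segment, and a TRR becomes an axis-aligned rectangle, so intersecting two TRRs reduces to intersecting two rectangles. All function names are hypothetical.

```python
def to_rot(p):
    """Map (x, y) to rotated coordinates (u, w) = (x + y, x - y)."""
    x, y = p
    return (x + y, x - y)


def from_rot(q):
    """Inverse of to_rot."""
    u, w = q
    return ((u + w) / 2, (u - w) / 2)


def trr(core_a, core_b, radius):
    """TRR with core the Manhattan arc core_a-core_b (possibly a single
    point) and the given radius, as (u_lo, u_hi, w_lo, w_hi)."""
    (ua, wa), (ub, wb) = to_rot(core_a), to_rot(core_b)
    return (min(ua, ub) - radius, max(ua, ub) + radius,
            min(wa, wb) - radius, max(wa, wb) + radius)


def intersect(r1, r2):
    """Intersection of two TRRs in rotated coordinates, or None if empty."""
    u_lo, u_hi = max(r1[0], r2[0]), min(r1[1], r2[1])
    w_lo, w_hi = max(r1[2], r2[2]), min(r1[3], r2[3])
    if u_lo > u_hi or w_lo > w_hi:
        return None
    return (u_lo, u_hi, w_lo, w_hi)


# Two point cores at Manhattan distance 6, each grown by radius 3.
a, b = (0, 0), (4, 2)
seg = intersect(trr(a, a, 3), trr(b, b, 3))
endpoints = (from_rot((seg[0], seg[2])), from_rot((seg[1], seg[3])))
```

In this example the intersection is degenerate in the u direction, so it maps back to a segment of slope -1 from (1, 2) to (3, 0): every point on it is at Manhattan distance exactly 3 from both cores.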
A formal recursive definition of ms(v), the merging segment of node $v \in G$, is as follows. If v is a sink $s_i$, then $ms(v) = \{s_i\}$ (note that this single point is a Manhattan arc). If v is an internal node, then ms(v) is the set of all placements l(v) which merge $TS_a$ and $TS_b$ with minimum wire cost, i.e., all points within distance $|e_a|$ of ms(a) and within distance $|e_b|$ of ms(b). If ms(a) and ms(b) are both Manhattan arcs, then $ms(v) = trr_a \cap trr_b$ is obtained by intersecting two TRRs, $trr_a$ with core ms(a) and radius $|e_a|$, and $trr_b$ with core ms(b) and radius $|e_b|$ (see Figure 2, also reproduced from [3]). Boese and Kahng [3] show that if ms(a) and ms(b) are both Manhattan arcs, then ms(v) is also a Manhattan arc. Since the merging segment $ms(s_i)$ for each sink $s_i$ is a single point and thus a Manhattan arc, by induction all merging segments are Manhattan arcs.

Given the tree of merging segments corresponding to G, the top-down phase (Figure 3b) chooses exact embeddings of internal nodes in the ZST as follows. For node v in topology G, (i) if v is the root node, then DME selects any point in ms(v) to be l(v);^1 or (ii) if v is an internal node other than the root, DME chooses l(v) to be any point in ms(v) that is at distance $|e_v|$ or less from the placement of v's parent p (the merging segment ms(p) was constructed such that $d(ms(v), ms(p)) \le |e_v|$, so there must exist some l(v) satisfying this condition). In case (ii), l(v) can be any point in the intersection of ms(v) and the square TRR $trr_p$ which has radius $|e_v|$ and core $\{l(p)\}$.

Note that DME requires an input topology. Several works have thus proposed topology constructions that yield low-cost routing solutions when DME is applied. Below, we compare against the non-planar KCR+DME [3] and Greedy-DME [12] methods.

1 If a fixed clock source location clk has been specified, DME chooses $l(s_0) \in ms(s_0)$ with minimum distance from clk and connects a wire directly from clk to $l(s_0)$.

Figure 3: The DME algorithm. The procedure Calculate Edge Lengths in (a) finds the values $|e_a|$ and $|e_b|$ such that $|e_a| + |e_b|$ is minimized and zero skew is achieved; this calculation depends on the delay model.
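For the linear delay model, the edge-length calculation at a merge can be sketched as follows. This is the standard pathlength-balancing computation, written here as an assumption rather than quoted from the paper: `t_a` and `t_b` are the (already balanced) pathlength delays below children a and b, and `d` is the distance $d(ms(a), ms(b))$. The function name is hypothetical.

```python
def linear_edge_lengths(t_a, t_b, d):
    """Return (|e_a|, |e_b|) minimizing |e_a| + |e_b| subject to
    t_a + |e_a| == t_b + |e_b| and |e_a| + |e_b| >= d."""
    if abs(t_a - t_b) <= d:
        # The balance point lies between the two merging segments.
        e_a = (d + t_b - t_a) / 2
        return e_a, d - e_a
    if t_a > t_b:
        # Subtree a is too slow: merge on ms(a) and elongate the wire to b.
        return 0.0, t_a - t_b
    # Symmetric case: merge on ms(b) and elongate the wire to a.
    return t_b - t_a, 0.0


balanced = linear_edge_lengths(0, 0, 6)    # two sinks at distance 6
skewed = linear_edge_lengths(1, 3, 6)      # unequal subtree delays
snaked = linear_edge_lengths(10, 0, 4)     # elongation is unavoidable
```

In the third case the total wirelength (10) exceeds the distance between the merging segments (4), which is the situation in which a wire must be elongated by "snaking".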
Planar-Embeddable Trees
None of the clock tree solutions given by the above "exact zero skew" algorithms is easily embedded in the layout plane: it is often impossible to perform the actual layout without introducing many vias. This difficulty was first noted by Zhu and Dai [25], who stated compelling reasons to seek a single-layer, or planar-embeddable, clock routing solution.
The clock routing layer may be prescribed, or we may prefer the layer with smallest RC delay.
Routing on fewer distinct layers (i.e., having fewer distinct electrical parameters) makes the layout more independent of process variation. Uniform electrical parameters also simplify buffering optimizations.
Single-layer routing eliminates the delay and attenuation of the clock signal through vias, thus improving both performance and signal integrity.
Given these observations, the Planar Zero-Skew Clock Routing problem is of interest, i.e., given sink set S, find a planar-embeddable ZST T(S) with minimum cost. "Planar-embeddable" intuitively means that the tree "can be drawn in the plane without edges crossing", but this concept is not easily characterized in the Manhattan plane. Existing work [25] implicitly relies on Euclidean planar-embeddability being sufficient for Manhattan planar-embeddability (a line segment in the Euclidean plane can be approximated to any desired accuracy by a monotone staircase in the Manhattan plane). Thus, we define two edges as crossing each other when the corresponding open line segments in the Euclidean plane properly intersect (share exactly one point). This definition allows optimal planar clock routing solutions where the embeddings of edges are superposed. Figure 4 shows this phenomenon: four sinks that are collinear will have an optimal "planar" clock tree whose edges pass over each other. Since this overlapping can be made planar with minimum increase in wirelength, we accept such a degenerate solution as planar. This is also the convention of [25].

[...] and the points p and v pertain to the later discussion about the partitioning rules for Linear-Planar-DME.)
The clock routing method of Zhu and Dai [25] was the first to guarantee a planar ZST; the solution has minimum possible source-sink pathlength, and runs in between $\Omega(n \log n)$ and $O(n^2)$ time.
However, the method basically creates an "X" clock tree, where an "H" would considerably reduce the tree cost. Khan et al. [19] have observed this deficiency, and have proposed a guaranteed-planar heuristic which reduces the tree cost by applying the top-down horizontal/vertical partitioning approach of [16] for a user-specified number of levels, then applying the Zhu-Dai X-tree construction within each of the resulting regions. When the user-specified number of levels is zero, the method reduces to that of Zhu-Dai. The authors of [19] claim that their algorithm guarantees minimum source-sink pathlength delay, and report approximately 10% wirelength reduction over [25]. Both [19, 25] rely completely on the linear delay model to achieve their results.
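The crossing definition used here (two open Euclidean segments sharing exactly one point) can be tested with the usual orientation predicate. The following Python sketch is one way to do so; the function names are assumptions, and exact results are only guaranteed for integer or otherwise exactly representable coordinates.

```python
def orient(a, b, c):
    """Twice the signed area of triangle abc: > 0 if c lies to the left
    of the directed line a -> b, < 0 if to the right, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def crosses(p1, p2, q1, q2):
    """True iff the open segments p1p2 and q1q2 share exactly one point.

    Shared endpoints, T-junctions, and collinear overlaps (superposed
    edges) are not counted as crossings.
    """
    d1 = orient(q1, q2, p1)
    d2 = orient(q1, q2, p2)
    d3 = orient(p1, p2, q1)
    d4 = orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


x_shape = crosses((0, 0), (2, 2), (0, 2), (2, 0))      # interiors cross
overlap = crosses((0, 0), (2, 0), (1, 0), (3, 0))      # superposed edges
corner = crosses((0, 0), (2, 0), (2, 0), (2, 2))       # shared endpoint
t_junction = crosses((0, 0), (2, 0), (1, 0), (1, 2))   # endpoint on interior
```

Only the first pair counts as a crossing; in particular, the collinear overlap is accepted, matching the convention that superposed edges are treated as planar.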
Organization of Paper
The remainder of this paper is organized as follows. Section 2 shows that under the linear delay model, the two passes (bottom-up and top-down) of the DME algorithm can be replaced by a single top-down pass which yields exactly the same (optimal) solution. From this "Single-Pass DME" method, we develop a version called Linear-Planar-DME which guarantees a planar, optimal ZST solution under the linear delay model. Because Single-Pass DME cannot be applied to the Elmore delay model, Section 3 presents the Elmore-Planar-DME heuristic, which can transform the Linear-Planar-DME solution to a planar Elmore-ZST with little cost increase. Section 4 discusses Linear-Planar-DME variants which can produce good input topologies for Elmore-Planar-DME, and Section 5 gives experimental results and comparisons with previous work. We conclude in Section 6 by listing several extensions and directions for future work.
2 The Linear-Planar-DME Algorithm

Describing our new planar clock routing algorithm requires a little more terminology. For any sink subset $S' \subseteq S$, we define the diameter of $S'$ as $\mathrm{diam}(S') = \max\{d(s_i, s_j) \mid s_i, s_j \in S'\}$. The radius of $S'$ is then $\mathrm{radius}(S') = \mathrm{diam}(S')/2$. A Manhattan disk is a TRR with a core consisting of a single point; we use $MD(s_i, r)$ to denote the Manhattan disk with core $\{s_i\}$ and radius $r \ge 0$. In other words, a Manhattan disk is the "diamond-shaped" set of all points within a prescribed radius of a central point. For any sink set $S' \subseteq S$ with $\mathrm{radius}(S') = r'$, we define $\mathrm{center}(S') = \bigcap_{s_i \in S'} MD(s_i, r')$, which is so named because the distance from $\mathrm{center}(S')$ to any sink in $S'$ is at most $r'$. We use $c(S')$ to denote the midpoint of $\mathrm{center}(S')$.
Finally, we use two terms that are defined in the Euclidean plane: (i) $P_{S'}$ denotes any Euclidean convex polygon containing $S'$, and (ii) convex-hull($S'$) is the $P_{S'}$ with minimum interior area. We say that a point p lies inside $P_{S'}$ if p is on the boundary of $P_{S'}$ or is strictly interior to $P_{S'}$. These terms will be used in proving that the Linear-Planar-DME algorithm defined below yields a planar solution: wires embedded along the boundary between two disjoint (Euclidean) convex polygons cannot intersect subsequent wires embedded internally to these polygons.
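The diameter, radius, center, and midpoint of a sink set can be computed in linear time. The sketch below assumes the 45-degree rotation $u = x + y$, $w = x - y$ (under which Manhattan disks become axis-aligned squares); this rotation and the function name are illustrative choices, not notation from the paper.

```python
def center(sinks):
    """Return (diam, radius, arc, midpoint) for a list of (x, y) sinks.

    arc is the pair of endpoints of center(S'), the intersection of the
    Manhattan disks of radius diam/2 around every sink; midpoint is c(S').
    """
    us = [x + y for x, y in sinks]
    ws = [x - y for x, y in sinks]
    # In rotated coordinates Manhattan distance is max(|du|, |dw|), so the
    # diameter is the larger of the two coordinate ranges.
    diam = max(max(us) - min(us), max(ws) - min(ws))
    r = diam / 2
    # Intersecting the squares [u - r, u + r] x [w - r, w + r] over all sinks.
    u_lo, u_hi = max(us) - r, min(us) + r
    w_lo, w_hi = max(ws) - r, min(ws) + r

    def unrot(u, w):
        return ((u + w) / 2, (u - w) / 2)

    arc = (unrot(u_lo, w_lo), unrot(u_hi, w_hi))
    mid = unrot((u_lo + u_hi) / 2, (w_lo + w_hi) / 2)
    return diam, r, arc, mid


diam, r, arc, mid = center([(0, 0), (4, 2), (1, 3)])
```

Because the diameter equals the larger coordinate range, the intersection is always degenerate in at least one rotated direction, which is why center(S') is a Manhattan arc. For the three sinks above it is the arc from (1, 2) to (2, 1), with midpoint (1.5, 1.5).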
Single-Pass DME
Our first theoretical result is that under the linear delay model, a single top-down phase can yield the same output as the original two-phase DME algorithm. We prove that the tree of merging segments constructed in the bottom-up phase can be generated during the top-down phase. This result follows from properties of the minimum-pathlength zero-skew subtree over any sink set $S'$, notably that the root of the subtree over $S'$ must be located at $\mathrm{center}(S')$. More precisely, $\mathrm{center}(S')$ is equal to ms(v), where v is the root of the tree of merging segments constructed by DME over $S'$, and ms(v) is hence independent of tree topology. This leads to what we call the Single-Pass DME algorithm.
The following Facts and Theorem are crucial to the development of the Single-Pass DME and then the Linear-Planar-DME algorithms. Two useful facts are due to [3]. Fact 1 is a straightforward extension of Theorem 2 in [3],^2 and Fact 2 is proved in the analysis of the same Theorem 2.
Fact 1: For any sink set S and topology G, let $S_v$ be the set of sinks in the subtree rooted at v in the DME solution. Let $t_{LD}(u)$ be the linear delay (i.e., pathlength) from point $u \in ms(v)$ to each sink in $S_v$. Then $t_{LD}(u) = \mathrm{radius}(S_v)$.

Theorem 1: Given a set of n sinks $S \subset \mathbb{R}^2$ and connection topology G, we can produce the same output ZST T(S) that the DME algorithm will produce under the linear delay model, using only a single top-down phase with time complexity between $\Omega(n \log n)$ and $O(n^2)$.
Proof: We determine the merging segments and incident edge lengths for all nodes in top-down order as follows. Let v be a node in G with parent p (if v is not the root). As in the statement of Fact 1, let $S_v$ and $S_p$ be the sets of sinks in the subtrees rooted at nodes v and p in the DME solution. For any sink subset $S_v \subseteq S$, the value of $r' = \mathrm{radius}(S')$ can be found in [...] By Fact 1, the length of the edge incident to node v in G, $|e_v|$, is equal to $t_{LD}(p) - t_{LD}(v) = \mathrm{radius}(S_p) - \mathrm{radius}(S_v)$. By Fact 3, we can compute $|e_v|$ for node v in time $\Theta(|S_v|)$ since we already have $t_{LD}(p) = \mathrm{radius}(S_p)$. Thus, we can compute ms(v) and $|e_v|$ in $\Theta(|S_v|)$ time; we now have all the information that would have been determined in the bottom-up phase of DME, and the single top-down phase is sufficient. Finally, let $L_i$ denote the set of nodes at level i of the ZST, and let l be the height of the ZST. We have $\sum_{v \in L_i} |S_v| \le n$, and $\log n \le l \le n$. Thus, the overall time complexity is $\sum_{i=1}^{l} \sum_{v \in L_i} |S_v| \le ln = \Theta(ln)$, which is between $\Omega(n \log n)$ and $O(n^2)$.

2 Theorem 2 in [3] states that for any sink set S and topology G, the DME algorithm will find a ZST with source-sink pathlength delay $t_{LD}(s_0) = \mathrm{diam}(S)/2$.
Because Single-Pass DME outputs the same optimum ZST as the original DME with respect to the given connection topology, established properties of the output tree (minimum source-sink pathlength, and minimum total tree cost) are maintained. The worst-case and best-case time bounds are the same as those for the method of [25].
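The relation $|e_v| = \mathrm{radius}(S_p) - \mathrm{radius}(S_v)$ from the proof of Theorem 1 can be exercised directly. The Python sketch below walks a given topology top-down and records the edge length of every non-root node; the nested-list topology encoding (a sink is an `(x, y)` tuple, an internal node is a two-element list) and the function names are assumptions made for illustration.

```python
def radius(sinks):
    """radius(S') = diam(S') / 2, using rotated coordinates u, w."""
    us = [x + y for x, y in sinks]
    ws = [x - y for x, y in sinks]
    return max(max(us) - min(us), max(ws) - min(ws)) / 2


def leaves(node):
    """Sinks in the subtree rooted at node."""
    if isinstance(node, tuple):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def edge_lengths(node, parent_radius, out):
    """Append (S_v, |e_v|) for node and all of its descendants."""
    sinks = leaves(node)
    r = radius(sinks)
    out.append((sinks, parent_radius - r))  # |e_v| = radius(S_p) - radius(S_v)
    if isinstance(node, list):
        for child in node:
            edge_lengths(child, r, out)
    return out


# Hypothetical topology over three sinks: ((s1, s2), s3).
topology = [[(0, 0), (4, 2)], (1, 3)]
root_radius = radius(leaves(topology))
result = []
for child in topology:
    edge_lengths(child, root_radius, result)
total_wirelength = sum(length for _, length in result)
```

Recomputing the sink list at every node costs time proportional to $|S_v|$ per node, which mirrors the per-level bound used in the proof. In this example every root-to-sink pathlength equals radius(S) = 3, as Fact 1 requires.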
Linear-Planar-DME
The impact of Theorem 1 may not be immediately apparent, since DME can already accomplish the same construction in linear time. However, the proof showed that as soon as Single-Pass DME has been given a partitioning of $S_v$ into $S_a$ and $S_b$, it can immediately find the ms(a) and ms(b) that are compatible with an optimal ZST having this "top part" of the clock topology. Thus, Single-Pass DME allows the connection topology to be determined dynamically in a top-down fashion, yet still finds a minimum-pathlength, minimum-cost embedding of whatever topology is eventually determined. If Single-Pass DME chooses and embeds the connection topology carefully, then a planar routing can be achieved.
The Linear-Planar-DME algorithm is essentially a version of Single-Pass DME wherein the connection topology is determined based on the existing routing, such that future routing cannot interfere with this existing routing. We use the (Euclidean) convex polygon concept to guide the top-down partitioning of both the routing area and the set of sinks. Given $S' \subseteq S$ and a convex polygon $P_{S'}$ containing $S'$, we recursively divide $P_{S'}$ into two smaller convex polygons such that routing inside each smaller polygon cannot interfere with routing inside the other polygon or on the boundary between the polygons.
Embedding and Partitioning Rules
The Linear-Planar-DME algorithm is derived from Single-Pass DME by adopting the following rules for embedding the internal nodes of the ZST, and for top-down bipartitioning of the sinks in each subtree.
Embedding Rules: In each recursive call, Linear-Planar-DME accepts a subset of sinks $S' \subseteq S$, some convex polygon $P_{S'}$ containing $S'$, and some point p inside $P_{S'}$ which is to connect to a point v on $ms(v) = \mathrm{center}(S')$. The point p is the embedding of the parent of the root of the subtree over $S'$; this point has already been determined earlier in the top-down pass.^3 The existing routing is outside $P_{S'}$, hence if we can select a feasible DME embedding point v inside $P_{S'}$, the routing from p to v will not interfere with the routing external to $P_{S'}$. Thus, the resulting routing will be planar and compatible with the DME solution. The embedding rules in Figure 5 ensure that such an embedding point will be selected in $O(1)$ [...] accordingly. A total of $\Theta(|S'|)$ time is needed to partition the sinks in set $S'$.
The overall Linear-Planar-DME algorithm is given in Figure 6. Steps 4 and 6 in Linear-Planar-DME-Sub are the key difference between Linear-Planar-DME and generic Single-Pass DME. Single-Pass DME would more or less arbitrarily choose a feasible embedding point at Step 4, and partition the sinks in the subtree according to the given connection topology at Step 6. In contrast, Linear-Planar-DME chooses both the embedding and the partition of the sinks (thus dynamically determining the topology) so that planarity is maintained. If the clock source location is not specified, then Linear-Planar-DME will set the root of the clock tree to be the clock location. Figure 7 illustrates how the planar clock routing is achieved by Linear-Planar-DME. For any given sink set, applying the partitioning and embedding rules takes the same (linear) time that is required to compute merging segments and edge lengths, hence Linear-Planar-DME has the same time complexity as Single-Pass DME.
Correctness of Linear-Planar-DME
We now show that Linear-Planar-DME yields a planar-embeddable, i.e., single-layer optimal ZST.
The following Fact 4 states that for any sink set $S' \subseteq S$, the midpoint of $\mathrm{center}(S')$ must lie inside convex-hull($S'$). Note that $\mathrm{center}(S')$ does not necessarily lie entirely in convex-hull($S'$); consider the case of $S'$ containing two points along a diagonal line.

Algorithm Linear-Planar-DME (S, clk)
Input: Set of sinks S; clock location clk in $P_S$.
Output: Planar ZST T(S) with root $s_0$; cost(T).

[...] s can coincide at A). Furthermore, it is easy to see that $c(S')$ is the center of gravity of R. These facts imply that $c(S')$ lies inside convex-hull($\{x, y, s, t\}$). Since $\{x, y, s, t\} \subseteq S'$, we conclude that $c(S')$ lies inside convex-hull($S'$).
We now prove that the embedding and partitioning rules have the following properties. Our discussion again refers to Figure 8.

Theorem 2: Given a subset $S' \subseteq S$ with $|S'| \ge 2$, a convex polygon $P_{S'}$ containing $S'$, and a point p inside $P_{S'}$, (i) the embedding rules will embed v inside $P_{S'}$ such that the embedding is compatible with the DME solution and (ii) the partitioning rules will divide $P_{S'}$ into two smaller convex polygons $P_S$ [...]

Finally, inductive application of Theorem 2 easily yields:
Theorem 3: The zero-skew clock routing tree constructed by Linear-Planar-DME is planar.

Proof: Initially, there is a convex polygon $P_S$ (e.g., a rectangle) which contains the set of sinks S and a clock location clk; the clock location clk is the embedding point of the parent node p of v, where v is the root of the ZST T(S) in any DME solution.^4 The embedding rules guarantee that we can find embedding point l(v) within $P_S$ so that the routing from l(p) to l(v) lies within $P_S$.
The partitioning rules guarantee that we can partition P into two smaller convex polygons $P_{S_1}$ and $P_{S_2}$ that respectively contain non-empty sink subsets $S_1$ and $S_2$, such that the routing from l(p) to l(v) is on the boundary between $P_{S_1}$ and $P_{S_2}$. Node v will become the parent node for the ZSTs $T(S_1)$ and $T(S_2)$, and is contained in both $P_{S_1}$ and $P_{S_2}$. Inductively, all future routing over $S_1$ and $S_2$ will be confined within $P_{S_1}$ and $P_{S_2}$, respectively. We conclude that no routing crosses any other.
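The geometric core of such a partition, separating sinks by a straight line through l(p) and l(v), can be sketched with a cross-product side test. This is only an illustration under stated assumptions: it assumes $l(p) \ne l(v)$, it reports sinks lying exactly on the line separately instead of assigning them, and it does not enforce that both subsets are non-empty. The actual partitioning rules handle these cases; the function names here are hypothetical.

```python
def side(p, v, s):
    """> 0 if s is left of the directed line p -> v, < 0 if right, 0 if on it."""
    return (v[0] - p[0]) * (s[1] - p[1]) - (v[1] - p[1]) * (s[0] - p[0])


def split_sinks(p, v, sinks):
    """Split sinks by the line through p and v (assumes p != v).

    Returns (left, right, on_line); how on_line sinks are assigned is
    left to the caller, since that choice is not modeled in this sketch.
    """
    left, right, on_line = [], [], []
    for s in sinks:
        d = side(p, v, s)
        if d > 0:
            left.append(s)
        elif d < 0:
            right.append(s)
        else:
            on_line.append(s)
    return left, right, on_line


left, right, on_line = split_sinks((0, 0), (2, 2), [(0, 2), (2, 0), (3, 3)])
```

Because a straight line cuts a convex polygon into two convex pieces, sinks on opposite sides of the line end up in disjoint convex regions, which is the property the inductive planarity argument relies on.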
3 The Elmore-Planar-DME Algorithm

Several works on clock tree design use the Elmore delay model, which is more accurate than linear delay [4, 5, 6, 12]. In this section, we sketch a simple method which is the first to achieve a single-layer Elmore-ZST, i.e., a ZST under the Elmore delay model. Note that under the Elmore delay model, the DME algorithm is no longer optimal: it does not necessarily return a minimum-cost ZST for given S and G [3, 5].^5 Also, the merging segment for the root of the subtree over $S' \subseteq S$ in the DME solution is no longer independent of the subtree connection topology. Hence, the bottom-up DME phase cannot be eliminated, i.e., Single-Pass DME cannot be applied to the Elmore delay model. To construct a low-cost planar-embedded Elmore-ZST, we propose the following Elmore-Planar-DME heuristic. Two important issues are: (i) choice of the topology G, and (ii) embedding to achieve zero Elmore delay skew.
First, any connection topology can be trivially embedded with exact zero skew onto a single routing layer; however, re-embedding the topology of a non-planar ZST (e.g., from [12]) onto a single layer can drastically increase the tree cost. The partial correspondence between linear delay and Elmore delay (at least in some technology regimes) suggests that the (optimum) Linear-Planar-DME solution can be re-embedded to have zero Elmore delay skew with very little increase in tree cost. Thus, Linear-Planar-DME is a natural choice for generating the connection topology within our approach.^6

Second, given a Linear-Planar-DME solution, it is simple to obtain a planar Elmore-ZST by elongating tree edges in a bottom-up fashion to balance differences in sink delays (e.g., by the "snaking" method of [23]). In the experimental comparisons of Section 5 below, we call such an approach "Naive-Elmore-Planar-DME". We find that unneeded elongation of tree edges can be saved by iterating both the application of DME to the given topology and the bottom-up modification of any resulting non-planar routing, based on a "principle of least commitment". Planarity is enforced in bottom-up order, with planar-embedded subtrees being retained so that they remain planar, and routing at higher levels being modified. Whenever any non-planar routing at some level of the ZST is changed, the merging tree for the ZST above this level is rebuilt, and top-down DME embedding is applied to the new merging tree. The complementary processes of merging tree reconstruction and top-down embedding are iterated until the entire ZST is planar.

5 Let $T_v$ denote the subtree rooted at node v in a zero-skew routing. Let $C(v)$ and $t_{ED}(v)$ respectively denote the total capacitance of $T_v$ and the Elmore delay from v to each sink in $T_v$. Assume that loading capacitance $C(s_i)$ is given for each sink $s_i$. Finally, let r and c be the per-unit wire resistance and capacitance, and let $l_1$ and $l_2$ be the lengths of the edges from v to $v_1$ and $v_2$, respectively. Then, $C(v)$ and $t_{ED}(v)$ for an internal node v with children $v_1$ and $v_2$ are calculated as follows [13, 21, 22]: $C(v) = C(v_1) + C(v_2) + c\,(l_1 + l_2)$, $t_{ED}(v) = t_{ED}(v_1) + r\,l_1\left(\frac{c\,l_1}{2} + C(v_1)\right) = t_{ED}(v_2) + r\,l_2\left(\frac{c\,l_2}{2} + C(v_2)\right)$. Typically, we assume $t_{ED}(s_i) = 0$, i.e., there is no "internal delay" at a sink node.

6 Interestingly, we find that relaxing the planar-embeddable constraint in variations of Linear-Planar-DME leads to improved planar Elmore-ZST solutions (see Section 4 below). This is possible because the method we use to achieve exact zero Elmore delay skew does not depend on an initial planar-embedded solution.
Again, we emphasize that the DME algorithm cannot guarantee optimal tree cost under the Elmore model. Thus, our approach only heuristically minimizes the cost of the output planar Elmore-ZST.
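The Elmore recurrences for an internal node with two children, $C(v) = C(v_1) + C(v_2) + c\,(l_1 + l_2)$ and $t_{ED}(v) = t_{ED}(v_i) + r\,l_i\,(c\,l_i/2 + C(v_i))$, determine the edge lengths at a merge once the merging distance is fixed. The sketch below solves the balance condition in closed form under the assumption that $l_1 + l_2 = L$, where `L` is the distance between the two merging segments, and falls back to elongating a single wire when the balance point lies outside $[0, L]$. The function name and this case split are illustrative assumptions, not code from the paper.

```python
import math


def elmore_merge(t1, C1, t2, C2, L, r, c):
    """Return (l1, l2, C, t): edge lengths, capacitance, and Elmore delay
    at the merged node, so that both children see equal delay."""
    # Balance condition with l2 = L - l1 is linear in l1 after expansion.
    l1 = (t2 - t1 + r * L * (C2 + c * L / 2)) / (r * (C1 + C2 + c * L))
    if 0 <= l1 <= L:
        l2 = L - l1
    elif l1 < 0:
        # Subtree 1 is too slow: tap at child 1 and elongate the wire to 2.
        l1 = 0.0
        l2 = (math.sqrt((r * C2) ** 2 + 2 * r * c * (t1 - t2)) - r * C2) / (r * c)
    else:
        # Subtree 2 is too slow: tap at child 2 and elongate the wire to 1.
        l2 = 0.0
        l1 = (math.sqrt((r * C1) ** 2 + 2 * r * c * (t2 - t1)) - r * C1) / (r * c)
    C = C1 + C2 + c * (l1 + l2)
    t = t1 + r * l1 * (c * l1 / 2 + C1)
    return l1, l2, C, t


def branch_delay(t_child, C_child, length, r, c):
    """Elmore delay through one edge plus the delay below the child."""
    return t_child + r * length * (c * length / 2 + C_child)


sym = elmore_merge(0.0, 1.0, 0.0, 1.0, 10.0, 1.0, 1.0)
long_wire = elmore_merge(100.0, 1.0, 0.0, 1.0, 2.0, 1.0, 1.0)
```

In the second call the delay difference cannot be absorbed within the merging distance of 2, so the wire to the second child is elongated beyond it, which is the kind of elongation the iterative approach tries to keep small.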
High-Level Description
Our method marks each point $v \in T$ as either planar or non-planar. An edge in T is a planar edge if both its endpoints are marked as planar. A path $s \leadsto t$ is a sequence of line segments from s to t; a planar path is a path that does not cross any planar edge of T. We use $\mathrm{cost}(s \leadsto t)$ and $\mathrm{hops}(s \leadsto t)$ to respectively denote the pathlength of a path and the number of segments in the path. Finally, the bounding box bbox(s, t) denotes the smallest rectangle containing points s and t.
The Elmore-Planar-DME algorithm is described in Figure 9. For simplicity, the template assumes that no clock source location has been prescribed. Accommodating a fixed clock source is straightforward, as seen from Figure 6. Initially, a ZST T is obtained by applying the original DME algorithm (using the Elmore delay model) to the given topology G and sink set S. Then, every sink is marked planar and all other nodes are marked non-planar. As long as the ZST T has a non-planar node, Elmore-Planar-DME iterates at Steps 6 and 7. Note that Step 6 constructs the merging tree TS only for non-planar nodes in the upper part of the ZST; Step 7 calls Find Exact Placements(TS) in Figure 3 to embed the shrinking set of non-planar nodes.
Because non-planar nodes are made planar in bottom-up order, Procedure Rebuild-Tree-of-Segments identifies the lowest non-planar nodes in the tree, i.e., the node set A at level L of the tree. Nodes in A have planar children and will be made planar in the current iteration.
Even though there may be other non-planar nodes whose children are all planar, their processing is deferred since subtrees at lower levels of the ZST tend to contain shorter tree edges, and it is easier for longer edges to detour around shorter edges than vice-versa. This same intuition suggests processing the nodes of A in order of increasing merging cost.
To make the discussion more concrete, for each non-planar node $v \in A$, let v have DME embedding point w and children s and t. If edges sw and tw do not cross any existing planar edges of T (i.e., edges in E), then v is marked planar (Step 6), and edges sw and tw are added into the set of planar edges E (Step 7). Otherwise, the non-planar routing at node v will be modified at Steps 9-12 as described below. The merging segment ms(v) will be either reduced to v's current embedding point if v is marked planar, or re-calculated if the non-planar routing at v is modified (see the discussion of subroutine Partial-Route below). Because the structure of the merging tree above the current level L will be changed, Step 13 constructs the tree of merging segments for the remaining non-planar nodes.

Figure 9: The Elmore-Planar-DME Algorithm. Algorithm Elmore-Planar-DME (G, S). Input: Topology G; set of sinks S. Output: Planar ZST T having topology G.
Modification of Non-Planar Routing
Now we consider the case where sw or tw crosses a planar edge. Recall that the DME embedding point w is the point on ms(v) which is closest to the embedding point of v's sibling (so that the merging cost for node v and v's parent can be minimized). Our heuristic (Steps 9-12 of Rebuild-Tree-of-Segments) is to find a planar merging path $s \leadsto t$ such that $s \leadsto t$ is as short as possible and as near point w as possible. Specifically, we first use Procedure Find-Merging-Path to seek a planar detouring path $s \leadsto t$ with low merging cost at both v and p (e.g., see Figure 10). If the $s \leadsto t$ path has minimum possible pathlength (= d(s, t)), then Procedure Improve-Path is applied to further reduce the merging cost at v's parent by modifying the $s \leadsto t$ path so that it passes closer to the DME embedding point w without increasing its pathlength (e.g., see Figure 12). Otherwise, Procedure Partial-Route is used to bring s and t one hop closer together.
Details of the Subroutines
Details of Procedure Find-Merging-Path are given in Figure 11. We use the term detour point to denote an embedding point of a planar node which serves as an intermediate point in the $s \leadsto t$ path. Note that finding a shortest path over all detour points may not minimize the merging cost at p, and that slightly greater merging cost at v may result in much lower merging cost at p. Figure 10 shows an example in which path $P_1$ is slightly longer than path $P_2$, but is a better choice since it passes much closer to the DME embedding point w. To balance between efficiency and solution quality, Find-Merging-Path gradually increases the set of possible detour points, in the hope that a feasible path will be found early (i.e., when the problem size is small).
Let $T_v$ denote the subtree of a ZST T rooted at point $v \in T$. Also recall that edge $e_v$ denotes the edge connecting v and v's parent. Experimental results below use $V_1 = \{u \mid u \in T_v$, where edge $e_v$ intersects sw or tw$\}$ and $V_2 = \{u \mid u \in T_v$, where edge $e_v$ intersects sw, tw, or st$\}$. For the example in Figure 10, Find-Merging-Path will use $V_1 = \{a, b, c\}$ and $V_2 = \{a, b, c, d, e, f\}$. These choices of $V_1$ and $V_2$ allow planar paths near w to be selected first. If Find-Merging-Path fails to discover a feasible path using $V_1$ and $V_2$, the procedure considers a succession of larger point sets $V_i$, $i \ge 3$; in our experiments, these are simply increasing dilations^7 of the bounding box bbox(s, t).
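A shortest path from s to t through a given set of detour points, avoiding crossings with planar edges, can be sketched with Dijkstra's algorithm over an implicit visibility graph whose edges are tested only when a node is expanded. The sketch assumes that a hop between two points is admissible when the straight Euclidean segment between them crosses no planar edge, and that a hop costs its Manhattan length; the function names and data layout are hypothetical.

```python
import heapq


def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def orient(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def crosses(p1, p2, q1, q2):
    """True iff the open segments p1p2 and q1q2 properly intersect."""
    return (orient(q1, q2, p1) * orient(q1, q2, p2) < 0
            and orient(p1, p2, q1) * orient(p1, p2, q2) < 0)


def shortest_planar_path(s, t, detour_points, planar_edges):
    """Return (cost, path) of a shortest planar path, or None if none exists."""
    nodes = [s, t] + [p for p in detour_points if p not in (s, t)]
    dist, prev, done = {s: 0}, {}, set()
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == t:
            break
        for v in nodes:
            if v in done:
                continue
            # Visibility is checked lazily, only for the node being expanded.
            if any(crosses(u, v, a, b) for a, b in planar_edges):
                continue
            nd = d + manhattan(u, v)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if t not in dist:
        return None
    path = [t]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return dist[t], path[::-1]


wall = [((2, -1), (2, 1))]                      # one planar edge blocking s-t
blocked = shortest_planar_path((0, 0), (4, 0), [(2, -1), (2, 1)], wall)
free = shortest_planar_path((0, 0), (4, 0), [], [])
```

With the blocking edge present, the path detours around one endpoint of the edge and costs 6 instead of the minimum possible d(s, t) = 4; restricting `detour_points` to a small candidate set, as Find-Merging-Path does with $V_1$ and $V_2$, keeps each such search small.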
Procedure Improve-Path in Figure 13 is applied only when the merging path s ; t obtained by Procedure Find-Merging-Path has minimum length equal to d(s; t). The procedure tries to modify s ; t without increasing its length so that it passes closer to v's DME embedding point w. The procedure rst selects a set of candidate embedding points on ms(v). Then, each selected point u in increasing order of d(u; w) is checked to see whether the shortest planar path s ; u ; t has cost = d(s; t). The shortest planar path s ; u ; t is obtained by calling Find-ShortestPlanar-Path twice, i.e, by nding s ; u and u ; t. Note that to nd a minimum-cost path, say, s ; u, we need only consider detour points inside bbox(s; u). The procedure terminates when the rst s ; u ; t path with cost = d(s; t) is found. In addition to the intersection of ms(v) and s ; t (shown as point u 0 in Figure 12 ), there are two types of candidate embedding points on ms(v): (i) the intersection of ms(v) with any vertical or horizontal line through any detour point inside bbox(s; t) (see Figure 12a) , and (ii) the intersection of ms(v) with any planar edge (see Figure 12b) . Again, the key property of u is that it is the point on ms(v) closest to the DME embedding point w, such that the merging path through u still has minimum cost equal to d(s; t). Procedure Partial-Route in Figure 14 uses a \principle of least commitment" whereby the distance between the two children of node v is shortened by one hop at each iteration. Suppose that the current non-planar node v has children s and t, and that we have a planar path s ; t = fs; s 0 ; ; u; ; t 0 ; tg, with u being the point where zero skew is achieved. Without loss of generality, assume that 0 < d(s; s 0 ) d(t; t 0 ). Then, Partial-Route implements only the partial path ss 0 , with s 0 replacing s as a child of v and ss 0 being added to planar edge set E. 
In this way, $v$'s children are "pulled closer" toward the delay balance point $u$ so that $v$ can be better re-embedded by DME in the next iteration. This avoidance of "commitment" also allows Partial-Route to minimize the harmful effects of a suboptimal result from Find-Merging-Path or Improve-Path.$^8$ Notice that since one of $v$'s children is relocated, the merging segments for $v$ and $v$'s ancestors have to be recalculated.
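The single-hop commitment made by Partial-Route can be sketched as follows. This is only an illustration under stated assumptions: the planar path is given as a list of points from $s$ to $t$ with at least one interior point, both end hops have positive length, and the function name `partial_route_step` and its return convention are hypothetical. Re-computation of merging segments for $v$ and its ancestors is not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Edge = Tuple[Point, Point]


def manhattan(p: Point, q: Point) -> float:
    """Manhattan distance d(p, q)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def partial_route_step(path: List[Point]) -> Tuple[Edge, Point, Point]:
    """Commit only the shorter end hop of a planar path [s, s', ..., t', t].

    Returns (committed_edge, new_child_s, new_child_t). The committed edge would
    be added to the planar edge set E, and the far endpoint of that edge replaces
    the corresponding child of v.
    """
    if len(path) < 3:
        raise ValueError("path needs at least one interior point")
    s, s_next = path[0], path[1]
    t_prev, t = path[-2], path[-1]
    if manhattan(s, s_next) <= manhattan(t, t_prev):
        # The hop at the s end is the shorter one: s' replaces s.
        return (s, s_next), s_next, t
    # Symmetric case: t' replaces t.
    return (t, t_prev), s, t_prev
```

Committing one hop at a time leaves the rest of the path open, so the next DME re-embedding of $v$ can still move the balance point.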
Finally, note that both Find-Merging-Path and Improve-Path invoke the procedure Find-Shortest-Planar-Path, which determines a shortest path between two points $s$ and $t$ in the presence of obstacles (the obstacles are the planar edges in subtrees of the ZST $T$). Since the detour points must be located at the endpoints of planar edges, a general approach based on visibility graphs (e.g., [1, 24]) can be used. Our current implementation uses Dijkstra's algorithm in the visibility graph, with edge weights computed on the fly; this does not cause excessive runtimes (see Section 5) since the number of possible detour points is small in most procedure calls.
4 Linear-Planar-DME Variants
As noted above, Elmore-Planar-DME does not actually require a planar-embedded ZST as input.
Thus, while we use the topology $G$ obtained from the Linear-Planar-DME solution, the Linear-Planar-DME construction itself can actually be modified.
We have considered modifications to the partitioning rules of Section 2.2 which change the splitting line to a splitting path of two or more line segments. In other words, rather than draw a straight line through points $p$ and $v$, we draw a line segment $pv$ and a ray $\tilde{v}$ emanating from $v$ to separate the polygons $P_{S_1}$ and $P_{S_2}$. Since we no longer have a straight splitting line, one of the new smaller regions may be non-convex, and more case analysis is required to maintain guaranteed planarity of the output ZST. From a theoretical perspective, such Linear-Planar-DME variants are unappealing: we lose the guaranteed planarity, and the worst-case time complexity increases. However, all ZSTs obtained in our experiments remain planar, with non-convex polygons becoming further divided into smaller convex polygons within the succeeding two or three levels. Furthermore, such variants can achieve average wirelength reductions of up to 10.9% versus the results for the original Linear-Planar-DME algorithm, which we reported in [17]. We now briefly describe two possible Linear-Planar-DME variants.
Using a Splitting Path
Consider a subset of sinks $S' \subseteq S$ that is being partitioned, with $|S'| \ge 2$. Recall that the splitting path consists of line segment $pv$ and $\tilde{v}$, a ray emanating from $v$. The line segment $pv$ has been determined, but there are $|S'| - 1$ different choices of $\tilde{v}$. To consider all possible choices of $\tilde{v}$, our Linear-Planar-DME-2 variant sorts the sinks of $S'$ in clockwise order around point $v$; each pair of consecutive sinks determines a splitting path which partitions $S'$ into $S'_1$ and $S'_2$. To choose among the possible splitting paths, Linear-Planar-DME-2 uses a heuristic estimate of the cost of ... The latter estimate considers load balance when bipartitioning the sinks, and yields slightly better results (it is also the estimate used in the experiments reported below). More useful cost functions for sink partitioning are no doubt possible.
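The enumeration of the $|S'| - 1$ candidate splitting paths can be sketched as below. This is a minimal illustration, assuming that angles are measured clockwise around $v$ starting from the direction of segment $pv$ (from $v$ toward $p$), that no sink coincides with $v$, and that no sink lies exactly on the ray from $v$ through $p$. The function names are hypothetical, and the cost estimate used to pick among the bipartitions is not modeled.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def clockwise_angle(v: Point, ref: Point, q: Point) -> float:
    """Clockwise angle in [0, 2*pi) from ray v->ref to ray v->q."""
    a_ref = math.atan2(ref[1] - v[1], ref[0] - v[0])
    a_q = math.atan2(q[1] - v[1], q[0] - v[0])
    return (a_ref - a_q) % (2 * math.pi)


def candidate_bipartitions(p: Point, v: Point,
                           sinks: Sequence[Point]) -> List[Tuple[List[Point], List[Point]]]:
    """Sort sinks clockwise around v and split between each consecutive pair.

    Each split corresponds to one choice of the ray emanating from v, so a set
    of n sinks yields n - 1 bipartitions (S'_1, S'_2).
    """
    ordered = sorted(sinks, key=lambda q: clockwise_angle(v, p, q))
    return [(ordered[:k], ordered[k:]) for k in range(1, len(ordered))]
```

The sort is the dominant cost of this enumeration; each bipartition is then a prefix and suffix of the sorted order.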
In the Manhattan plane, computing the radii of all pairs of sink subsets (corresponding to bipartitions of $S'$) can be accomplished in $O(|S'|)$ time. Thus, the sorting operation dominates the time complexity, and the overall Linear-Planar-DME-2 complexity is $O(l \cdot n \lg n)$, where $l$ is the number of levels in the output ZST and $n = |S|$. In practice, $l$ is very close to $\lg n$, as we report below.
Using a Splitting Path and Lookahead
Our Linear-Planar-DME-3 variant is similar to Linear-Planar-DME-2, but chooses splitting paths more carefully based on lookahead. After determining a set of candidate bipartitions of $S'$, we estimate the cost of each by actually constructing the ZST that will be output by Linear-Planar-DME-2. To maintain practical runtimes, the number of candidate bipartitions considered is limited to a small constant ($\le 16$ in the experiments reported below). Given this constraint, our Linear-Planar-DME-3 implementation has a worst-case runtime of $O(l^2 n \lg n)$.
Finally, if the clock source is not specified, then the line segment $pv$ of the first splitting line can be chosen arbitrarily, since $p$'s location is not given. As we determine the possible choices of $v$, we sort the sinks clockwise around $v$; each pair of consecutive sinks determines a possible choice of $pv$. Thus, there are $|S|$ possible cases for $p$'s location. Again, to maintain practical runtime, we test only 16 equally spaced cases for $p$'s location. Experimentally, very limited improvements result from trying more than 16 cases.
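Limiting the search to 16 equally spaced cases can be sketched as a simple subsampling of the clockwise-sorted candidates. The sketch assumes that "equally spaced" means equally spaced by index in the sorted order, which is one plausible reading rather than a confirmed detail; the function name `equally_spaced` is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def equally_spaced(items: Sequence[T], k: int = 16) -> List[T]:
    """Pick at most k items at (approximately) equal index spacing.

    Assumption: `items` is already in clockwise order, so equal index spacing
    approximates an even spread of candidate cases around v.
    """
    n = len(items)
    if n <= k:
        return list(items)
    return [items[(i * n) // k] for i in range(k)]
```

With 32 candidates and `k = 16`, for example, every second candidate is evaluated by the lookahead.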
Experimental Results
We implemented the Linear-Planar-DME and Elmore-Planar-DME algorithms on Sun SPARC-10 workstations in the C/Unix environment. The same seven examples as in [5, 12, 25] were studied. Benchmarks Primary1 and Primary2 both have the same loading capacitance of 0.5 pF for all sinks, and also have per-unit wire resistance and wire capacitance of 16.6 mΩ and 0.027 fF, respectively. The x-coordinates and y-coordinates of Primary1 sink locations range from 120 to 5520 units and from 0 to 5790 units, respectively; those of Primary2 range from 20 to 9840 units and from 0 to 10250 units, respectively. Details of the circuit parameters for benchmarks r1-r5 can be found in [23].
Table 1 shows that our Elmore-Planar-DME implementation is relatively efficient, with runtimes dominated by the generation of a good topology in the call to Linear-Planar-DME-3. Note that the Primary2 and r2 test cases have about the same number of sinks, but Primary2 leads to relatively higher runtimes. This is because Primary2 has a more uneven distribution of sink locations, which leads to more detouring. The last column of Table 1 shows that our output ZSTs have very balanced structures, with average tree height $l$ very close to $\lg n$. Thus, the observed time complexity of Linear-Planar-DME-3 is $O(n \lg^3 n)$.
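The step from the worst-case bound for Linear-Planar-DME-3 to the observed complexity can be made explicit. As a sketch, assume that the tree height satisfies $l \le c \lg n$ for some constant $c$ (an assumption suggested by the height ratios reported in Table 1, not a proven bound):

```latex
\[
O\!\left(l^{2}\, n \lg n\right)
\;\subseteq\; O\!\left((c \lg n)^{2}\, n \lg n\right)
\;=\; O\!\left(c^{2}\, n \lg^{3} n\right)
\;=\; O\!\left(n \lg^{3} n\right).
\]
```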
Table 1: Sun SPARC-10 CPU time (seconds) for our Planar-DME implementation. Columns: benchmark (#pins), e.g., prim1 (269); Lin-Pln-DME-3; Elm-Pln-DME; ZST height ($l / \lg n$). Note that the topology generation via Linear-Planar-DME-3 requires much more time than the embedding by Elmore-Planar-DME. In the last column, we also show the ZST height as a multiple of the minimum possible tree height, $\lg n$.
Table 2 compares our new algorithms with two leading non-planar ZST algorithms in the literature, Greedy-DME [12] and KCR+DME [5, 18], as well as the previous planar routing method of Zhu and Dai [25]. Greedy-DME corresponds to the CL+I6 method of Edahiro [12], and can yield an unbalanced topology. KCR+DME uses a matching approach to achieve a balanced topology [5]. Our new planar ZST solutions are competitive with the best known non-planar ZST solutions of Greedy-DME (having on average 9.8% greater wiring cost), and are superior to KCR+DME solutions in all cases. Elmore-Planar-DME also uses 22.5% less wire than the (linear delay based) method of [25]. It is interesting to note that the cost of our Elmore-Planar-DME solutions is only slightly increased from the cost of the starting Linear-Planar-DME ZSTs. We believe this implies that better solutions can be obtained as we continue to improve Linear-Planar-DME. Finally, Figure 15 shows ZSTs for the Primary1 benchmark constructed by Greedy-DME, Linear-Planar-DME, Elmore-Planar-DME, and the method of Zhu and Dai.
Table 2: Comparison of Elmore-Planar-DME with other algorithms in terms of total wirelength, using the same benchmarks studied in [5, 12, 25]. Columns: benchmark; Greedy-DME (CL+I6 [12]); Lin-Pln-DME-3; Elm-Pln-DME; KCR+DME [5]; Naive Elm-Pln-DME; Zhu-Dai. No prescribed clock source location was assumed. Ave Cost indicates the average percentage increase in wirelength versus the results of CL+I6 [12]. Note that all wirelengths have been divided by 1000 units.
Future Work
We have considered several improvements to our current work.
First, the output of our Planar-DME approach may be viewed as a planar routing sketch for a ZST. Currently, we do not take routing capacity, cross-talk constraints, etc. into consideration (recall the example of Figure 12b). We hope to use computational geometry techniques such as those of Dai et al. [9] to enhance our current approach.
Second, although Elmore-Planar-DME has reasonable runtime in practice, various heuristic speedups are possible. For example, obstacles (planar edges) are actually connected as subtrees, and each subtree can be replaced by its convex hull to reduce the complexity of the path-finding instance. Also, the number of candidate embedding points tested by Improve-Path can be greatly reduced.
Third, Linear-Planar-DME itself can be improved to yield better connection topologies for input to Elmore-Planar-DME, through the use of more sophisticated partitioning rules (using splitting paths with more than two segments; clustering sinks before partitioning) and embedding rules (e.g., embed the root of the zero-skew subtree over $S'$ at a more appropriate place than $center(S')$).
Finally, we are pursuing methods which construct single-layer clock routing trees with bounded, rather than exactly zero, skew; such constructions are useful in the engineering of general clock distribution solutions, where skew and other attributes are controlled by a mix of topology generation, embedding, wiresizing and buffer optimization [7, 14, 15, 20].
