3.6 ERT, SERT and SERT-C results for 5-terminal nets.
3.7 ERT, SERT and SERT-C results for 9-terminal nets.
3.8 Near-optimality of ERT delay and tree cost.
3.9 Near-optimality of SERT-C delay and tree cost.
3.10 Performance comparisons for the DWSERT algorithm.
Chapter 4
4.1 Average clock tree cost for the various heuristics.
4.2 Average clock tree cost for the various heuristics (continued).
4.3 Average pathlength skew for the various heuristics.
4.4 Average pathlength skew for the various heuristics (continued).
4.5 Min, ave, and max tree cost for MMM and GR+E+H.
4.6 Min, ave, and max pathlength skew for MMM and GR+E+H.
4.7 Average tree costs and skews of KMB and CLOCK2 trees.
4.8 Delay and capacitance at each internal node.
4.9 Effect of DME on KCR and BB using linear delay.

This book discusses problems of "optimal interconnection" and describes efficient algorithms for several basic formulations. Our domain of application is the computer-aided design (CAD) of very large-scale integrated (VLSI) circuits, wherein interconnection design is now one of the most actively studied areas. However, much of what we develop can be applied to other domains ranging from urban planning to the design of communication networks. Because most formulations that we study are intractable, the term "optimal" in some sense is a misnomer: rather, our focus is on the reasoned and principled development of good heuristics.
This book is an outgrowth of the 1992 Ph.D. dissertation of Gabriel Robins [203] at the UCLA Computer Science Department. As such, it retains a highly personal perspective: it gives a retrospective of our own research, and it is colored by our research interests and our background in discrete algorithms and optimization. Our treatment also attempts to convey a sense of history: how our field has co-evolved with an emerging "science of VLSI design". With recent years having seen VLSI designs become increasingly performance-dominated, and thus interconnect-dominated, VLSI interconnections are indeed a rich domain for this historical view. In particular, our research on interconnection design has spanned the field's rapid transition from purely geometric formulations to more "physically-motivated" formulations.
Although we do not attempt an encyclopedic treatment, we do describe key relevant works, and the discussion is largely self-contained. We envision that this book will be useful as a reference for researchers and CAD algorithm developers, or as reading for a seminar on VLSI CAD, heuristic algorithms, or geometric optimization. Our own codes, which are cited throughout the book, are freely available to interested parties; see our contact information below.
THE DOMAIN OF DISCOURSE: ROUTING IN VLSI PHYSICAL DESIGN
Let us first outline the context for our particular subfield of VLSI CAD, namely, the global routing phase of physical design. For more complete reviews of VLSI design, and physical design in particular, the reader is referred to [168, 182, 194, 216].
The goal of VLSI CAD is to transform a high-level system description into a set of mask geometries for fabrication. This is typically accomplished by the following sequence of stages (see Figure 1.1).
Design Specification: Starting from a real-world requirement (e.g., "secure communication"), a high-level system description (e.g., the "DES" data encryption standard) is developed which includes such parameters as architecture, performance, area, power, cost and technology.
Functional Design: The design is transformed into a behavioral specification which captures the system I/O behavior using mathematical equations, timing diagrams, instruction sets and other devices.
Logic Design: The functional design is represented in logical form, typically via Boolean expressions which may be subsequently optimized to reduce the complexity of the system description.
Structural Design: The logic design is represented as a circuit using components from an available library of modules (e.g., NAND and NOR gates, standard cells, or building-block macros); this may also involve technology mapping steps.
Physical Design: The structural design is transformed into the mask geometry for fabrication while adhering to underlying design rules for the chosen technology.
The last stage in this process, physical design, contains our area of interest.
Physical design consists of two major steps. First, the placement step maps functional units (modules) onto portions of a layout region, e.g., the surface of a chip. Second, the routing step interconnects specified sets of terminals, i.e., the signal nets of the design, by wiring within routing regions that lie between or over the functional units. A signal net consists of a module output terminal together with the various module input terminals to which the output signal must be delivered.

[Figure: The VLSI design process.]
Within the field of physical design, prevailing objectives have evolved over the years in response to advances in VLSI technology. When system operating frequencies were dominated by device switching speeds, placement and routing optimizations centered on reduction of total routing area. Subsequent advances in fabrication technology have increased packing densities, allowing more and faster devices to be placed on larger ICs. Leading-edge fabrication technology now goes well into submicron feature sizes, and circuit speeds are approaching gigahertz frequencies. The reduced feature size implies more resistive interconnects, and increased system complexity implies larger layout regions. Thus, minimization of interconnection delay has become the major concern in physical design.
In light of this trend, performance-driven physical design has seen much research activity within the past five years. Early works focused on performance-driven placement, with the standard objective being the close placement of modules belonging to timing-critical paths. However, performance-driven placement algorithms will achieve their intended effect only when the associated routing algorithms can realize the full potential of a high-quality placement. Thus, the emphasis in routing objectives has shifted from area minimization to delay minimization, and more recently to the control of interconnect delay (e.g., by limiting skews or delays at particular terminals). This range of routing objectives (area, delay, skew and beyond) defines the scope of this book.
Once an objective has been established, the actual routing of a given signal net can be decomposed into global and detailed routing. The global routing phase is a higher-level process during which the routing topologies of signal nets are defined over the available routing regions. Then, the detailed routing phase produces the actual geometries which realize the required connectivity on the fabricated chip. Our work applies to the global routing phase of physical design.
electrical connectivity of the signal nets. With standard-cell or gate-array design methodologies, which have many small functional modules, global routing may be viewed as taking place in Manhattan geometry, i.e., distances between terminals are given by rectilinear distance. In other words, these design methodologies possess sufficiently high porosity that the routing problem can be formulated in the geometric plane. On the other hand, building-block design methodologies involve larger functional blocks or macro cells. Since these are often treated as obstacles, the routing problem is formulated with respect to a weighted routing graph that represents the available routing area. A standard model is the channel intersection graph (CIG), where each edge represents a channel (i.e., the empty rectangular space between adjacent modules) and each vertex corresponds to the intersection of two orthogonal channels [193] (see Figure 1.2). The edge weights of the CIG can be used to model channel width or congestion.

A "true" global router processes multiple signal nets simultaneously using such techniques as simulated annealing, multicommodity flow or mathematical programming. However, many existing codes are sequential, or "net-at-a-time", in that they establish a heuristic ordering of nets for routing and use rip-up-and-retry techniques when the routing fails. There are also even more fine-grain methods which route individual two-terminal subnets of signal nets. With either type of global router, the key operation is to compute a good routing topology over a single signal net: hence, this book deals exclusively with methods that route a single net at a time.
As with previous routing constructions that have formed the basis of new global routers (e.g., "Steiner min-max trees"), each method that we develop can be transparently integrated into existing global routing approaches. In the mathematical programming approach, finding a routing solution for a given net generates a new entering basis column within a primal-dual iteration. In the sequential approach, routing solutions are found for the highest-priority nets first, leaving lower-priority nets to encounter more congestion and blockage. After each net is routed, the routing region costs (e.g., CIG edge weights) can be updated before the next net is processed.
We conclude this section with a review of basic conventions and terminology used throughout the book. We define a terminal to be a given location in the layout region. A signal net $S = \{s_0, s_1, s_2, \ldots, s_n\}$ is a set of $n + 1$ terminals, with one terminal $s_0 \in S$ a designated source and the remaining terminals sinks.
A routing solution is a set of wires that connects, i.e., spans, the terminals of a net so that a signal generated at the source will be propagated to all the sinks.
The rectilinear wiring technology implies an underlying "Manhattan" geometry, where the distance between points $a$ and $b$ is $d(a, b) = |a_x - b_x| + |a_y - b_y|$, i.e., the sum of the differences in their x- and y-coordinates. A segment is an uninterrupted horizontal or vertical wire, and any connection between two terminals will consist of one or more wire segments. VLSI and printed circuit board technologies admit multiple routing layers, where a preferred-direction routing methodology is used to facilitate design, manufacturability and reliability. In other words, the available wiring layers are partitioned, with horizontal wire segments preferentially routed on certain layers, and vertical wire segments routed on the other layers. A connection between two wire segments from different layers is called a via.
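As a small illustration of the distance definition above, the Manhattan metric can be written directly in Python. The function name `manhattan` and the representation of points as coordinate tuples are choices made here for illustration only.

```python
def manhattan(a, b):
    """Rectilinear distance d(a, b) = |a_x - b_x| + |a_y - b_y|."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


# Two hypothetical terminal locations.
print(manhattan((0, 0), (3, 4)))  # 7
```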
Sometimes it is convenient to embed $S$ in an underlying routing graph $G = (V, E)$, consisting of a set of vertices $V$ and a set of edges $E \subseteq V \times V$. Thus, the set of terminals is some $S \subseteq V$. A subgraph of $G$ is a graph $G' = (V', E')$ with $V' \subseteq V$ and $E' \subseteq E$, and $E' \subseteq V' \times V'$. A routing solution is a subgraph of $G$ that spans $S$. A path between two vertices $x, y \in V$ is a sequence of $k$ edges of the form $(x, v_{i_1}), (v_{i_1}, v_{i_2}), \ldots, (v_{i_k}, y)$, where $(v_{i_m}, v_{i_{m+1}}) \in E$ for all $1 \le m \le k - 1$. A graph is connected if there exists a path between each pair of vertices. A graph is a tree if it is connected but the removal of any edge will disconnect it. Since a tree topology uses the fewest edges of any spanning graph over the signal net, i.e., $|S| - 1 = n$ edges, routing formulations typically seek a tree topology.
A weighted graph has a non-negative real weight assigned to each of its edges. The cost of a weighted graph is the sum of its edge weights. A shortest path in $G$ between two vertices $x, y \in V$, denoted by $\mathrm{minpath}_G(x, y)$, is a minimum-cost path connecting $x$ and $y$. In a tree $T$, $\mathrm{minpath}_T(x, y)$ is simply the unique path between $x$ and $y$. For a weighted graph $G$ we use $\mathrm{dist}_G(x, y)$ to denote the cost of $\mathrm{minpath}_G(x, y)$. The distance from the source to a given sink $s_i$ in a tree is denoted as $l_i = \mathrm{dist}_T(s_0, s_i)$.
Because a signal net is inherently oriented from its source to its sinks, we use the special notation $R_i$ to denote the cost of the shortest $s_0$-$s_i$ path in $G$, i.e., $R_i = \mathrm{dist}_G(s_0, s_i)$. We use $R$ to denote the maximum $R_i$ value over all sinks $s_i$, and say that $R$ is the radius of the signal net. The radius of a routing tree $T$ is $r(T) = \max_{1 \le i \le n} l_i$. Additional terminology will be developed throughout the following chapters, as needed. The reader is referred to, e.g., [67] or [92] for a more rigorous development of basic graph-theoretic concepts.
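The notation above can be made concrete on a toy net. The following sketch assumes the geometric setting, so that each shortest source-sink distance is simply the Manhattan distance from the source, and it represents a routing tree as a hypothetical list of index pairs over the terminals. The function and variable names are illustrative and are not taken from our codes.

```python
from collections import defaultdict


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def tree_pathlengths(terminals, edges, source=0):
    """Return l[i], the source-to-node pathlength in the tree given by `edges`."""
    adj = defaultdict(list)
    for u, v in edges:
        w = manhattan(terminals[u], terminals[v])
        adj[u].append((v, w))
        adj[v].append((u, w))
    l = {source: 0}
    stack = [source]
    while stack:  # depth-first walk away from the source
        u = stack.pop()
        for v, w in adj[u]:
            if v not in l:
                l[v] = l[u] + w
                stack.append(v)
    return l


terminals = [(0, 0), (4, 0), (0, 3)]  # terminals[0] is the source s_0
edges = [(0, 1), (1, 2)]              # hypothetical routing tree T

l = tree_pathlengths(terminals, edges)
net_radius = max(manhattan(terminals[0], t) for t in terminals[1:])   # R
tree_radius = max(l[i] for i in range(1, len(terminals)))             # r(T)
tree_cost = sum(manhattan(terminals[u], terminals[v]) for u, v in edges)

print(net_radius, tree_radius, tree_cost)  # 4 11 11
```

In this example the tree reaches the second sink only by way of the first, so the tree radius (11) is far larger than the net radius (4) even though both sinks are close to the source.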
As noted at the outset, most problems encountered in VLSI CAD, including all of the interconnection formulations that we address, are intractable. While we resort to heuristic solutions, a basic precept in our work is to prove that our proposed heuristics perform well. For example, we often strive to show that the heuristic solution cost in the worst case or average case is no more than a constant factor from optimal. Since the practical relevance of a heuristic may hinge on issues beyond asymptotic time and space complexity, we also augment our performance bounds with empirical simulations using standard test cases from the literature, e.g., those maintained by ACM SIGDA (currently available by anonymous ftp to mcnc.org).
OVERVIEW OF THE BOOK
Beyond its sketch of our application domain of VLSI routing, the present chapter also surveys the main results contained in this book. Chapters 2, 3 and 4 are respectively entitled Area, Delay, and Skew. These form the core of the book, and address three fundamental routing objectives: (i) minimization of total wirelength, (ii) minimization of signal delay, and (iii) minimization of skew among signal arrival times. Chapter 5 provides new frameworks for the simultaneous optimization of multiple competing objectives; one such framework allows various unifications of the techniques developed in the preceding three chapters. The following subsections summarize the key developments of each chapter.
Minimum Area: The Steiner Minimal Tree Problem
VLSI design rules dictate a minimum separation between wires, and therefore the area occupied by the routing on a chip is roughly proportional to the total wirelength of the routing. Added wirelength generally increases signal delay and power consumption due to increased resistance and capacitance. Other system cost measures, e.g., those based on fabrication cost, yield and reliability, also increase with chip area. Thus, a fundamental objective is to minimize the total wirelength required to connect a prescribed set of points in the plane, i.e., the terminals of a given signal net. The subject of Chapter 2 is the Steiner minimal tree (SMT) problem, which for a given net $S$ asks for a set $S'$ of Steiner points such that the total edgelength of the minimum spanning tree (MST) over $S \cup S'$ is minimized. The main insight is that the points of $S'$ will serve as internal nodes of the tree (intermediate "junction points") which reduce the interconnection cost. Without introducing such points, the minimum-cost solution would simply be a minimum spanning tree over $S$.
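To see how a single Steiner point can reduce interconnection cost, consider four terminals arranged as a cross. The sketch below uses a plain quadratic-time Prim construction to compare the MST cost over the terminals alone with the MST cost after one junction point is added; the pointset and function names are hypothetical and chosen only for illustration.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mst_cost(points):
    """Cost of a rectilinear minimum spanning tree (Prim's algorithm, O(n^2))."""
    if not points:
        return 0
    # best[i] = cheapest connection from point i to the tree grown so far
    best = {i: manhattan(points[0], points[i]) for i in range(1, len(points))}
    total = 0
    while best:
        j = min(best, key=best.get)
        total += best.pop(j)
        for i in best:
            best[i] = min(best[i], manhattan(points[j], points[i]))
    return total


S = [(0, 1), (2, 1), (1, 0), (1, 2)]   # four terminals forming a cross
print(mst_cost(S))                     # 6: every pair of terminals is 2 apart
print(mst_cost(S + [(1, 1)]))          # 4: the center acts as a Steiner point
```

Here the spanning tree over the terminals alone costs 6, while adding the center point as a junction lowers the cost to 4.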
The SMT problem is well-studied in combinatorial optimization and network design; see the monographs [138] and [139]. The geometry of VLSI, which usually allows only vertical and horizontal wiring directions, has motivated studies of the rectilinear version of the problem, typically for the wirelength estimation and global routing phases of layout design. With only a few highly constrained exceptions, existing variants of the SMT problem are NP-complete. Most SMT heuristics in the literature have analogies to classic minimum spanning tree constructions; this is in part due to the MST being a constant-factor approximation to the SMT, with performance ratio [...] SMT heuristics, and shows that such methods cannot have performance ratio better than that of the simple MST approximation.
The focus of Chapter 2 lies in developing the Iterated 1-Steiner (I1S) heuristic, which iteratively finds optimal Steiner points that are added directly into the set $S$. The I1S construction thus avoids traditional analogies to minimum spanning tree solutions, and in practice achieves good performance even on inputs that are pathological for previous heuristics. For random 8-point planar instances, I1S solution costs are optimal for 90% of all instances, and average within 0.25% of optimal overall. The I1S approach also applies to graph instances and higher-dimensional geometric instances. The chapter describes a straightforward, efficient implementation of I1S, along with such enhancements as a parallel implementation that achieves near-linear speedup. Similarities between I1S and the recent method of Zelikovsky are also discussed.
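The iterative idea can be sketched in a few lines. The version below is a simplified illustration rather than the implementation described in Chapter 2: it assumes that candidate Steiner points may be restricted to the grid induced by the coordinates of the current points, it recomputes the MST from scratch for every candidate, and it omits any efficiency or pruning refinements. All function names are hypothetical.

```python
from itertools import product


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mst_cost(points):
    """Cost of a rectilinear minimum spanning tree (Prim's algorithm, O(n^2))."""
    if not points:
        return 0
    best = {i: manhattan(points[0], points[i]) for i in range(1, len(points))}
    total = 0
    while best:
        j = min(best, key=best.get)
        total += best.pop(j)
        for i in best:
            best[i] = min(best[i], manhattan(points[j], points[i]))
    return total


def iterated_1_steiner(terminals):
    """Greedily add the single best Steiner point until no point helps.

    Returns (list of added Steiner points, final tree cost).
    """
    points = list(terminals)
    steiner = []
    while True:
        # Assumed candidate set: grid points induced by current coordinates.
        xs = {p[0] for p in points}
        ys = {p[1] for p in points}
        candidates = sorted(c for c in product(xs, ys) if c not in points)
        base = mst_cost(points)
        best_gain, best_point = 0, None
        for c in candidates:
            gain = base - mst_cost(points + [c])
            if gain > best_gain:
                best_gain, best_point = gain, c
        if best_point is None:          # no candidate reduces the MST cost
            return steiner, base
        points.append(best_point)
        steiner.append(best_point)


print(iterated_1_steiner([(0, 1), (2, 1), (1, 0), (1, 2)]))  # ([(1, 1)], 4)
```

Because each accepted point strictly lowers the tree cost and the candidate grid is finite, the loop terminates; the repeated MST recomputation is the obvious cost that a practical implementation would avoid.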
Finally, Chapter 2 develops the result that any pointset in the Manhattan plane has an MST with maximum degree 4, and that in three-dimensional Manhattan space the maximum MST degree is 14 (the best previous bounds were 6 and 26, respectively); this improves I1S runtimes and is also of independent theoretical interest. The chapter concludes with a discussion of the Steiner problem in graphs.
Minimum Delay: Toward Optimal-Delay Routing Trees
Chapter 3 considers minimization of signal delay, which is synonymous with "performance-driven" system design. As VLSI technology scales to smaller feature sizes and larger layout areas, signal delays become interconnect-dominated, i.e., signal delay through interconnects increasingly dominates delay through devices. In leading-edge technologies, minimum-delay wiring topologies can differ substantially from minimum-area SMT wiring topologies.
The signal delay objective takes us from the unoriented pointset of the Steiner minimal tree problem to an oriented collection of terminals in the layout plane. Such a collection of terminals, which we call a signal net, has one identified source terminal; the remaining terminals are sinks. Typically, the source terminal is the output of a gate, and the sinks are the fanins for that output signal at inputs of other gates.
The discussion of Chapter 3 centers on four issues which have guided recent progress in minimum-delay routing heuristics. First, there is the issue of technology-dependence in the routing construction, e.g., a simple analysis of Elmore delay in distributed RC trees shows that routing objectives should be dependent on parameters of the prevailing interconnect technology. We thus give a taxonomy of methods based on their tunability to specific technology parameters and signal net criticalities, and demonstrate the advantages of such tunable methods as the "Elmore routing tree" approach and the Prim-Dijkstra tradeoff.
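The dependence on technology parameters can be seen in a small calculation. The sketch below assumes one common form of the Elmore delay for an RC tree: each wire contributes its resistance times the sum of half its own capacitance and all downstream capacitance, and the driver contributes its resistance times the total tree capacitance. The parameter names (`r_d` for driver resistance, `r` and `c` for unit wire resistance and capacitance, `load` for sink capacitances) and all numerical values are hypothetical.

```python
def elmore_delays(parent, length, load, r_d, r, c):
    """Elmore delay at every node of a routing tree rooted at node 0.

    parent[v] = u for each non-root node v; length[v] is the wirelength of
    edge (parent[v], v); load[v] is the sink capacitance at v (default 0).
    """
    nodes = [0] + list(parent)
    children = {v: [] for v in nodes}
    for v, u in parent.items():
        children[u].append(v)

    cap = {}  # cap[v] = total capacitance of the subtree rooted at v

    def subtree_cap(v):
        cap[v] = load.get(v, 0.0) + sum(
            c * length[w] + subtree_cap(w) for w in children[v]
        )
        return cap[v]

    subtree_cap(0)

    delay = {0: r_d * cap[0]}  # driver resistance sees the whole tree
    stack = [0]
    while stack:
        u = stack.pop()
        for v in children[u]:
            wire_r, wire_c = r * length[v], c * length[v]
            delay[v] = delay[u] + wire_r * (wire_c / 2 + cap[v])
            stack.append(v)
    return delay


# A single wire of length 10 driving one sink (hypothetical values).
d = elmore_delays({1: 0}, {1: 10.0}, {1: 1.0}, r_d=2.0, r=0.1, c=0.2)
print(d[1])
```

With a large driver resistance the first term dominates and total tree capacitance (hence tree cost) matters most; with a small driver resistance the path-dependent terms dominate, which is one way to read the technology-dependence noted above.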
Second, the chapter compares "actual delay" versus geometric routing objectives. To a first-order approximation, signal delay from the source to a given sink is proportional to the source-sink pathlength in the routing tree. This linear delay approximation suggests minimizing the maximum source-sink pathlength in the routing tree (i.e., a geometric "minimum-radius" criterion). On the other hand, reducing the total cost of the routing tree will reduce its lumped capacitance (i.e., a geometric "minimum-cost" criterion). We review how early works employed geometric criteria to achieve tractability in both the design and the analysis of routing heuristics. Of particular interest is a "bounded-radius, bounded-cost" (BRBC) approach which seeks a minimum-cost routing tree subject to a given bound on tree radius; we describe an algorithm which simultaneously minimizes both tree cost and tree radius to within constant factors of optimal. The BRBC approach and its analysis generalize to Steiner routing and to routing in arbitrary weighted graphs that capture the variation of routing costs over the layout region. The chapter gives details of recent methods, notably the "Elmore routing tree" variants which obtain reduced signal delays by optimizing higher-order delay estimates directly.
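The tension between the minimum-cost and minimum-radius criteria shows up even on a three-terminal net. The sketch below is not the BRBC algorithm; it assumes a simple blend of the Prim and Dijkstra constructions in which the tree is grown from the source and each new terminal $v$ is attached through the tree node $u$ minimizing $c \cdot l_u + d(u, v)$, where the parameter $c \in [0, 1]$ and the function name are hypothetical choices made for illustration.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def blended_tree(terminals, c):
    """Grow a spanning tree from terminals[0]; return (tree cost, tree radius).

    c = 0 behaves like Prim's MST construction; c = 1 behaves like
    Dijkstra's shortest-path tree construction.
    """
    l = {0: 0}                                  # pathlength from the source
    parent = {}
    outside = set(range(1, len(terminals)))
    while outside:
        u, v = min(
            ((u, v) for u in l for v in outside),
            key=lambda e: (
                c * l[e[0]] + manhattan(terminals[e[0]], terminals[e[1]]),
                e,                              # deterministic tie-breaking
            ),
        )
        l[v] = l[u] + manhattan(terminals[u], terminals[v])
        parent[v] = u
        outside.remove(v)
    cost = sum(manhattan(terminals[v], terminals[u]) for v, u in parent.items())
    return cost, max(l.values())


net = [(0, 0), (2, 2), (0, 3)]   # net[0] is the source
print(blended_tree(net, 0))      # (6, 6): cheapest tree, longest path
print(blended_tree(net, 1))      # (7, 4): costlier tree, radius equal to R
```

For this net the minimum-cost tree has radius 6, while spending one extra unit of wire brings the radius down to 4, the radius of the net itself.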
Third, we discuss minimization of sink-dependent delay, as opposed to net-dependent delay. Here, the key observation is that timing-driven placement and routing are typically iterated with static timing estimation, so that critical-path information is available during the routing tree construction. With this in mind, the traditional objective of minimizing maximum sink delay is "net-dependent" in that it ignores available path-dependent information. An approach which optimizes delay to identified critical sinks, such as that given in 1993 by Boese, Kahng and Robins [34], seems better matched to modern design methodologies. More recent work of Boese et al. provides an interesting addendum to the earlier SMT discussion: it generalizes Hanan's theorem to Elmore delay-optimal Steiner trees and gives a new "peeling" decomposition for optimal Steiner trees.
Finally, Chapter 3 addresses the issue of demonstrable quality for minimum-delay routing heuristics. Analogous to the empirical studies of the I1S SMT heuristic in Chapter 2, we present empirical studies showing near-optimality of a construction for minimum Elmore delay at prescribed critical sinks. The chapter concludes with a review of two other recent advances in performance-driven interconnect design; these involve wiresizing and non-tree routing techniques. An Appendix provides the basic theory behind several efficient delay estimates, and also discusses measures of accuracy and fidelity for the linear, Elmore, and two-pole delay approximations.
Minimum Skew: The Zero-Skew Clock Routing Problem
In a high-performance VLSI design, circuit speed is limited not only by the signal propagation within and between circuit elements, but also by the skew between signal arrival times. The form of skew most often studied is clock skew, i.e., the difference between longest and shortest arrival times of a clock signal at synchronizing elements of the circuit. Clock skew minimization, and in particular the "zero-skew clock routing" problem, has become a central issue in the design of leading-edge systems. However, it should be noted that skew control for arbitrary signal nets is also of increasing importance, as are related problems of prescribed-skew or bounded-skew routing.
Chapter 4 discusses clock tree construction to minimize skew and wirelength as a combination of two processes: topology generation, and geometric embedding of the topology. We present methods which accomplish each of these processes using either the linear or Elmore delay model to guide the construction. Our discussion focuses on so-called "exact zero skew" clock routing constructions.
The first part of Chapter 4 uses the linear delay model to motivate a pathlength-balanced tree problem formulation, which seeks a minimum-cost tree with all source-sink pathlengths of equal length. We describe a simple approach, based on iterative geometric matching, for generating a clock tree topology while simultaneously embedding it in the layout region.
The second part of the chapter describes the Deferred-Merge Embedding (DME) algorithm, which embeds any prescribed connection topology (i.e., a binary tree with the clock sinks at the leaves), so as to create a clock tree with zero skew while minimizing total wirelength. The algorithm runs in linear time, and always yields exact zero skew trees with respect to a given monotone delay model such as linear or Elmore delay. The DME method achieves substantial cost reductions over earlier constructions, and can be combined with previous methods that concentrate on generation of the clock tree topology.
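The balancing condition behind a zero-skew merge is easy to state under the linear delay model, where delay equals pathlength. The sketch below shows only that one step, not the DME algorithm itself: given two subtrees whose roots are a distance `d` apart and whose internal source-to-sink delays are `t_a` and `t_b`, it splits the connecting wire so that both sides see equal delay. It assumes that when one subtree is already slower than the other by more than `d`, the wire to the faster subtree is lengthened to compensate; the function name and arguments are hypothetical.

```python
def balance_split(d, t_a, t_b):
    """Return wirelengths (e_a, e_b) with e_a + t_a == e_b + t_b.

    Normally e_a + e_b == d. If the delay difference exceeds d, one wire has
    length 0 and the other is elongated beyond d (an assumption of this sketch).
    """
    e_a = (d + t_b - t_a) / 2
    if e_a < 0:            # subtree a is too slow: merge at a, detour to b
        return 0.0, t_a - t_b
    if e_a > d:            # subtree b is too slow: merge at b, detour to a
        return t_b - t_a, 0.0
    return e_a, d - e_a


e_a, e_b = balance_split(10, 0, 4)
print(e_a, e_b)                      # 7.0 3.0
print((e_a + 0) - (e_b + 4))         # 0.0, i.e., zero pathlength skew
```

Applying such a step at every internal node of a prescribed binary topology yields equal source-sink pathlengths at all leaves; choosing where the merge points lie so that total wirelength is minimized is the part that the DME algorithm addresses.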
Finally, the third part of the chapter unifies the topology generation and geometric embedding of exact zero-skew clock trees. Under the linear delay model, the two phases of the DME algorithm (bottom-up identification of loci for zero-skew "balance points", followed by top-down selection of these balance points within a minimum-delay zero-skew embedding) can be replaced by a single top-down phase. Where DME would nominally require a prescribed topology as input, this top-down construction allows the clock tree topology to be determined dynamically and flexibly while being optimally embedded at the same time. A natural outgrowth is a DME-like algorithm for single-layer, exact zero-skew clock routing; such a construction is increasingly sought to minimize signal attenuation through vias, simplify buffering optimizations, and maximize process-variation independence.
Chapter 4 also describes extensions of these clock routing methods to "min-max" delay constraints and bounded-skew routing for general signal nets. The chapter concludes by noting additional issues and problem formulations, including optimal buffering hierarchies for minimum phase delay, and multiple-level clock trees for multi-chip module packaging.
Multiple Objectives
The last chapter of the book, Chapter 5, discusses frameworks and techniques which enable the simultaneous optimization of multiple competing objectives. Section 5.1 notes that beyond the nominal total wirelength, the grid-based structure of VLSI routing resources provides additional information for determining the impact of a given routing solution on layout area. The discussion explores a new minimum density objective for spanning and Steiner tree constructions, which seeks to balance the use of horizontal and vertical routing resources. We describe two heuristic constructions for low-density spanning trees whose outputs are within small constants of optimal with respect to both tree cost and density. The proof techniques suggest a constructive lower bound scheme which affords tighter estimates of solution quality for a given problem instance. Of particular interest is that the minimum density objective can be transparently combined with, e.g., minimum radius or minimum skew without affecting asymptotic solution quality with respect to these competing objectives.
While previous chapters each focus on a fundamental routing criterion (i.e., area, delay or skew), many secondary objectives may exist, including congestion avoidance, jog minimization, reliability, etc. Section 5.2 develops a general framework of multi-weighted graphs, in which multiple competing objectives can be simultaneously optimized. This is accomplished by assigning to each edge a vector of weights, corresponding to the various optimization criteria; graph searches are then guided by the weighted average of the edge weights according to designer-specified tradeoff parameters. This framework is applicable to graph-based routing regimes, such as building-block design and field-programmable gate array layout.
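A minimal sketch of this idea follows, assuming a directed routing graph stored as an adjacency dictionary in which each edge carries a tuple of weights (here, hypothetically, wirelength and congestion). Each edge is collapsed to a scalar by a weighted average under the tradeoff parameters, and an ordinary Dijkstra search is run on the result. The graph, the weight values and the function name are illustrative only.

```python
import heapq


def multiweight_shortest_path(graph, source, target, tradeoff):
    """Dijkstra search where each edge cost is the tradeoff-weighted average
    of its weight vector. graph[u] is a list of (v, weight_vector) pairs.
    Returns (cost, path), or (inf, []) if target is unreachable.
    """
    total = sum(tradeoff)

    def scalar(weights):
        return sum(t * w for t, w in zip(tradeoff, weights)) / total

    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            path = [u]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, weights in graph.get(u, []):
            nd = d + scalar(weights)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return float("inf"), []


# Edge weights are (wirelength, congestion); values are hypothetical.
graph = {
    "A": [("B", (1, 9)), ("C", (3, 2))],
    "B": [("D", (1, 9))],
    "C": [("D", (3, 2))],
}
print(multiweight_shortest_path(graph, "A", "D", (1, 0)))  # (2.0, ['A', 'B', 'D'])
print(multiweight_shortest_path(graph, "A", "D", (0, 1)))  # (4.0, ['A', 'C', 'D'])
```

Changing only the tradeoff parameters switches the chosen route from the short but congested path through B to the longer, less congested path through C.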
Finally, we describe optimization within the framework of a continuously-weighted layout region, which can be induced by the simultaneous consideration of multiple criteria (e.g., reliability, thermal density, and routing congestion). Within this framework, we consider a problem which has applications ranging from circuit board routing to vehicle navigation, namely, finding a minimum-cost prescribed-width path connecting a given source and destination [131]. Previous path routing approaches such as Dijkstra's algorithm implicitly assume that the path is of zero width, but this assumption is usually not realistic (e.g., consider routing a wide bus, or traces on a circuit board). Section 5.3 develops a network-flow based approach to prescribed-width routing in a continuously weighted region. Interestingly, the extension to higher dimensions can solve a discrete version of Plateau's problem, which seeks a minimum-area surface that spans a given closed curve [130].
ACKNOWLEDGMENTS
This book is the product of the research, suggestions, and technical assistance of many individuals. We first thank the students who have been so dedicated to the research that forms the basis of this book. In alphabetical order, they are: Mike Alexander, Charles J. Alpert, Kenneth D. Boese, Dennis Jen-Hsin Huang, Berni A. McCoy, Chung-Wen Albert Tsao and Tongtong Zhang. Any list of specific debts must begin with Ken Boese, who developed much of the core material in Chapters 3 and 4, including the characterization of delay-optimal routing trees and the results concerning the DME clock routing algorithm. The precise exposition in these sections is a product of Ken's efforts. Berni McCoy dedicated well over a year to investigations of accuracy and fidelity of delay estimates, near-optimality of the ERT construction, dynamic wiresizing and non-tree routing; these results appear throughout Chapter 3. Mike Alexander developed the graph generalization of I1S in Chapter 2, as well as the multi-
