Search CORE

103 research outputs found

Recommended from our members

Interconnect optimizations for nanometer VLSI design

Author: Zhang Yilin, 1986-
Publication venue
Publication date: 19/09/2014
Field of study

textAs the semiconductor technology scales into deeper sub-micron domain, billions of transistors can be used on a single system-on-chip (SOC) makes interconnection optimization more important roughly for two reasons. First, congestion, power, timing in routing and buffering requirements make inter- connection optimization more and more challenging. Second, gate delay get- ting shorter while the RC delay gets longer due to scaling. Study of interconnection construction and optimization algorithms in real industry flows and designs ends up with interesting findings. One used to be overlooked but very important and practical problem is how to utilize over- the-block routing resources intelligently. Routing over large IP blocks needs special attention as there is almost no way to insert buffers inside hard IP blocks, which can lead to unsolvable slew/timing violations. In current design flows we have seen, the routing resources over the IP blocks were either dealt as routing blockages leading to a significant waste, or simply treated in the same way as outside-the-block routing resources, which would violate the slew constraints and thus fail buffering. To handle that, this work proposes a novel buffering-aware over-the- block rectilinear Steiner minimum tree (BOB-RSMT) algorithm which helps reclaim the “wasted” over-the-block routing resources while meeting user-specified slew constraints. Proposed algorithm incrementally and efficiently migrates initial tree structures with buffering-awareness to meet slew constraints while minimizing wire-length. Moreover, due to the fact that timing optimization is important for the VLSI design, in this work, timing-driven over-the-block rectilinear Steiner tree (TOB-RST) is also studied to optimize critical paths. This proposed TOB-RST algorithm can be used in routing or post-routing stage to provide high-quality topologies to help close timing. Then a follow-up problem emerges: how to accomplish the whole routing with over-the-block routing resources used properly. Utilizing over-the- block routing resources could dramatically improve the routing solution, yet require special attention, since the slew, affected by different RC on different metal layers, must be constrained by buffering and is easily violated. Moreover, even of all nets are slew-legalized, the routing solution could still suffer from heavy congestion problem. A new global router, BOB-Router, is to solve the over-the-block global routing problem through minimizing overflows, wire-length and via count simultaneously without violating slew constraints. Based on my completed works, BOB-RSMT and BOB-Router tremendously improve the overall routing and buffering quality. Experimental results show that proposed over-the-block rectilinear Steiner tree construction and routing completely satisfies the slew constraints and significantly outperforms the obstacle-avoiding rectilinear Steiner tree construction and routing in terms of wire-length, via count and overflows.Electrical and Computer Engineerin

Texas ScholarWorks

Algorithms for the scaling toward nanometer VLSI physical synthesis

Author: Sze Chin Ngai
Publication venue: Texas A&M University
Publication date: 25/04/2007
Field of study

Along the history of Very Large Scale Integration (VLSI), we have successfully scaled down the size of transistors, scaled up the speed of integrated circuits (IC) and the number of transistors in a chip - these are just a few examples of our achievement in VLSI scaling. It is projected to enter the nanometer (timing estimation and buffer planning for global routing and other early stages such as floorplanning. A novel path based buffer insertion scheme is also included, which can overcome the weakness of the net based approaches. Part-2 Circuit clustering techniques with the application in Field-Programmable Gate Array (FPGA) technology mapping The problem of timing driven n-way circuit partitioning with application to FPGA technology mapping is studied and a hierarchical clustering approach is presented for the latest multi-level FPGA architectures. Moreover, a more general delay model is included in order to accurately characterize the delay behavior of the clusters and circuit elements

Texas A&M Repository

Throughput-driven floorplanning with wire pipelining

Author: Casu Mario Roberto
Macchiarulo Luca
Publication venue: IEEE
Publication date: 01/01/2005
Field of study

The size of future high-performance SoC is such that the time-of-flight of wires connecting distant pins in the layout can be much higher than the clock period. In order to keep the frequency as high as possible, the wires may be pipelined. However, the insertion of flip-flops may alter the throughput of the system due to the presence of loops in the logic netlist. In this paper, we address the problem of floorplanning a large design where long interconnects are pipelined by inserting the throughput in the cost function of a tool based on simulated annealing. The results obtained on a series of benchmarks are then validated using a simple router that breaks long interconnects by suitably placing flip-flops along the wires

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Fast Repeater Tree Construction

Author: Bartoschek Christoph
Publication venue: Universitäts- und Landesbibliothek Bonn
Publication date
Field of study

Repeaters are used during physical design of chips to improve the electrical and timing properties of interconnections. They are added along Steiner trees that connect root gates to sinks, creating repeater trees. Their construction became a crucial part of chip design. We present a new algorithm to solve the repeater tree construction problem. We first present an extensive version of the Repeater Tree Problem. Our problem formulation encapsulates most of the constraints that have been studied so far. We also consider several aspects for the first time, for example, slew dependent required arrival times at repeater tree sinks. The employed technology, the properties of available repeaters and metal wires, the shape of the chip, the temperature, the voltages, and many other factors highly influence the results of repeater tree construction. To take all this into account, we extensively preprocess the environment to extract parameters for our algorithms. We first present an algorithm for Steiner tree creation and prove that our algorithm is able to create timing-efficient as well as cost-efficient trees. Our algorithm is based on a delay model that accurately describes the timing that one can achieve after repeater insertion upfront. Next, we deal with the problem of adding repeaters to a given Steiner tree. The predominantly used algorithms to solve this problem use dynamic programming. However, they have several drawbacks. Firstly, potential repeater positions along the Steiner tree have to be chosen upfront. Secondly, the algorithms strictly follow the given Steiner tree and miss optimization opportunities. Finally, dynamic programming causes high running times. We present our new buffer insertion algorithm, Fast Buffering, that overcomes these limitations. It is able to produce results with similar quality to a dynamic programming approach but a much better running time. In addition, we also present improvements to the dynamic programming approach that allows us to push the quality at the expense of a high running time. We have implemented our algorithms as part of the BonnTools physical design optimization suite developed at the Research Institute for Discrete Mathematics in cooperation with IBM. Our implementation deals with all tedious details of a grown real-world chip optimization environment. We have created extensive experimental results on challenging real-world test cases provided by our cooperation partner. Our algorithm can solve about 5.7 million instances per hour

bonndoc – Der Publikationsserver der Universität Bonn

Timing-Constrained Global Routing with Buffered Steiner Trees

Author: Rotter Daniel
Publication venue: Universitäts- und Landesbibliothek Bonn
Publication date
Field of study

This dissertation deals with the combination of two key problems that arise in the physical design of computer chips: global routing and buffering. The task of buffering is the insertion of buffers and inverters into the chip's netlist to speed-up signal delays and to improve electrical properties of the chip. Insertion of buffers and inverters goes alongside with construction of Steiner trees that connect logical sources with possibly many logical sinks and have buffers and inverters as parts of these connections. Classical global routing focuses on packing Steiner trees within the limited routing space. Buffering and global routing have been solved separately in the past. In this thesis we overcome the limitations of the classical approaches by considering the buffering problem as a global, multi-objective problem. We study its theoretical aspects and propose algorithms which we implement in the tool BonnRouteBuffer for timing-constrained global routing with buffered Steiner trees. At its core, we propose a new theoretically founded framework to model timing constraints inherently within global routing. As most important sub-task we have to compute a buffered Steiner tree for a single net minimizing the sum of prices for delays, routing congestion, placement congestion, power consumption, and net length. For this sub-task we present a fully polynomial time approximation scheme to compute an almost-cheapest Steiner tree with a given routing topology and prove that an exact algorithm cannot exist unless P=NP. For topology computation we present a bicriteria approximation algorithm that bounds both the geometric length and the worst slack of the topology. To improve the practical results we present many heuristic modifications, speed-up- and post-optimization techniques for buffered Steiner trees. We conduct experiments on challenging real-world test cases provided by our cooperation partner IBM to demonstrate the quality of our tool. Our new algorithm could produce better solutions with respect to both timing and routability. After post-processing with gate sizing and Vt-assignment, we can even reduce the power consumption on most instances. Overall, our results show that our tool BonnRouteBuffer for timing-constrained global routing is superior to industrial state-of-the-art tools

bonndoc – Der Publikationsserver der Universität Bonn

Shortest Paths and Steiner Trees in VLSI Routing

Author: Peyer Sven
Publication venue: Universitäts- und Landesbibliothek Bonn
Publication date
Field of study

Routing is one of the major steps in very-large-scale integration (VLSI) design. Its task is to find disjoint wire connections between sets of points on a chip, subject to numerous constraints. This problem is solved in a two-stage approach, which consists of so-called global and detailed routing steps. For each set of metal components to be connected, global routing reduces the search space by computing corridors in which detailed routing sequentially determines the desired connections as shortest paths. In this thesis, we present new theoretical results on Steiner trees and shortest paths, the two main mathematical concepts in routing. In the practical part, we give computational results of BonnRoute, a VLSI routing tool developed at the Research Institute for Discrete Mathematics at the University of Bonn. Interconnect signal delays are becoming increasingly important in modern chip designs. Therefore, the length of paths or direct delay measures should be taken into account when constructing rectilinear Steiner trees. We consider the problem of finding a rectilinear Steiner minimum tree (RSMT) that --- as a secondary objective --- minimizes a signal delay related objective. Given a source we derive some structural properties of RSMTs for which the weighted sum of path lengths from the source to the other terminals is minimized. Also, we present an exact algorithm for constructing RSMTs with weighted sum of path lengths as secondary objective, and a heuristic for various secondary objectives. Computational results for industrial designs are presented. We further consider the problem of finding a shortest rectilinear Steiner tree in the plane in the presence of rectilinear obstacles. The Steiner tree is allowed to run over obstacles; however, if it intersects an obstacle, then no connected component of the induced subtree must be longer than a given fixed length. This kind of length restriction is motivated by its application in VLSI routing where a large Steiner tree requires the insertion of repeaters which must not be placed on top of obstacles. We show that there are optimal length-restricted Steiner trees with a special structure. In particular, we prove that a certain graph (called augmented Hanan grid) always contains an optimal solution. Based on this structural result, we give an approximation scheme for the special case that all obstacles are of rectangular shape or are represented by at most a constant number of edges. Turning to the shortest paths problem, we present a new generic framework for Dijkstra's algorithm for finding shortest paths in digraphs with non-negative integral edge lengths. Instead of labeling individual vertices, we label subgraphs which partition the given graph. Much better running times can be achieved if the number of involved subgraphs is small compared to the order of the original graph and the shortest path problems restricted to these subgraphs is computationally easy. As an application we consider the VLSI routing problem, where we need to find millions of shortest paths in partial grid graphs with billions of vertices. Here, the algorithm can be applied twice, once in a coarse abstraction (where the labeled subgraphs are rectangles), and once in a detailed model (where the labeled subgraphs are intervals). Using the result of the first algorithm to speed up the second one via goal-oriented techniques leads to considerably reduced running time. We illustrate this with the routing program BonnRoute on leading-edge industrial chips. Finally, we present computational results of BonnRoute obtained on real-world VLSI chips. BonnRoute fulfills all requirements of modern VLSI routing and has been used by IBM and its customers over many years to produce more than one thousand different chips. To demonstrate the strength of BonnRoute as a state-of-the-art industrial routing tool, we show that it performs excellently on all traditional quality measures such as wire length and number of vias, but also on further criteria of equal importance in the every-day work of the designer

bonndoc – Der Publikationsserver der Universität Bonn

Timing Closure in Chip Design

Author: Held Stephan
Publication venue: Universitäts- und Landesbibliothek Bonn
Publication date
Field of study

Achieving timing closure is a major challenge to the physical design of a computer chip. Its task is to find a physical realization fulfilling the speed specifications. In this thesis, we propose new algorithms for the key tasks of performance optimization, namely repeater tree construction; circuit sizing; clock skew scheduling; threshold voltage optimization and plane assignment. Furthermore, a new program flow for timing closure is developed that integrates these algorithms with placement and clocktree construction. For repeater tree construction a new algorithm for computing topologies, which are later filled with repeaters, is presented. To this end, we propose a new delay model for topologies that not only accounts for the path lengths, as existing approaches do, but also for the number of bifurcations on a path, which introduce extra capacitance and thereby delay. In the extreme cases of pure power optimization and pure delay optimization the optimum topologies regarding our delay model are minimum Steiner trees and alphabetic code trees with the shortest possible path lengths. We presented a new, extremely fast algorithm that scales seamlessly between the two opposite objectives. For special cases, we prove the optimality of our algorithm. The efficiency and effectiveness in practice is demonstrated by comprehensive experimental results. The task of circuit sizing is to assign millions of small elementary logic circuits to elements from a discrete set of logically equivalent, predefined physical layouts such that power consumption is minimized and all signal paths are sufficiently fast. In this thesis we develop a fast heuristic approach for global circuit sizing, followed by a local search into a local optimum. Our algorithms use, in contrast to existing approaches, the available discrete layout choices and accurate delay models with slew propagation. The global approach iteratively assigns slew targets to all source pins of the chip and chooses a discrete layout of minimum size preserving the slew targets. In comprehensive experiments on real instances, we demonstrate that the worst path delay is within 7% of its lower bound on average after a few iterations. The subsequent local search reduces this gap to 2% on average. Combining global and local sizing we are able to size more than 5.7 million circuits within 3 hours. For the clock skew scheduling problem we develop the first algorithm with a strongly polynomial running time for the cycle time minimization in the presence of different cycle times and multi-cycle paths. In practice, an iterative local search method is much more efficient. We prove that this iterative method maximizes the worst slack, even when restricting the feasible schedule to certain time intervals. Furthermore, we enhance the iterative local approach to determine a lexicographically optimum slack distribution. The clock skew scheduling problem is then generalized to allow for simultaneous data path optimization. In fact, this is a time-cost tradeoff problem. We developed the first combinatorial algorithm for computing time-cost tradeoff curves in graphs that may contain cycles. Starting from the lowest-cost solution, the algorithm iteratively computes a descent direction by a minimum cost flow computation. The maximum feasible step length is then determined by a minimum ratio cycle computation. This approach can be used in chip design for several optimization tasks, e.g. threshold voltage optimization or plane assignment. Finally, the optimization routines are combined into a timing closure flow. Here, the global placement is alternated with global performance optimization. Netweights are used to penalize the length of critical nets during placement. After the global phase, the performance is improved further by applying more comprehensive optimization routines on the most critical paths. In the end, the clock schedule is optimized and clocktrees are inserted. Computational results of the design flow are obtained on real-world computer chips

bonndoc – Der Publikationsserver der Universität Bonn

High-performance Global Routing for Trillion-gate Systems-on-Chips.

Author: Hu Jin
Publication venue
Publication date: 01/01/2013
Field of study

Due to aggressive transistor scaling, modern-day CMOS circuits have continually increased in both complexity and productivity. Modern semiconductor designs have narrower and more resistive wires, thereby shifting the performance bottleneck to interconnect delay. These trends considerably impact timing closure and call for improvements in high-performance physical design tools to keep pace with the current state of IC innovation. As leading-edge designs may incorporate tens of millions of gates, algorithm and software scalability are crucial to achieving reasonable turnaround time. Moreover, with decreasing device sizes, optimizing traditional objectives is no longer sufficient. Our research focuses on (i) expanding the capabilities of standalone global routing, (ii) extending global routing for use in different design applications, and (iii) integrating routing within broader physical design optimizations and flows, e.g., congestion-driven placement. Our first global router relies on integer-linear programming (ILP), and can solve fairly large problem instances to optimality. Our second iterative global router relies on Lagrangian relaxation, where we relax the routing violation constraints to allowing routing overflow at a penalty. In both approaches, our desire is to give the router the maximum degree of freedom within a specified context. Empirically, both routers produce competitive results within a reasonable amount of runtime. To improve routability, we explore the incorporation of routing with placement, where the router estimates congestion and feeds this information to the placer. In turn, the emphasis on runtime is heightened, as the router will be invoked multiple times. Empirically, our placement-and-route framework significantly improves the final solution’s routability than performing the steps sequentially. To further enhance routability-driven placement, we (i) leverage incrementality to generate fast and accurate congestion maps, and (ii) develop several techniques to relieve cell-based and layout-based congestion. To broaden the scope of routing, we integrate a global router in a chip-design flow that addresses the buffer explosion problem.PHDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/98025/1/jinhu_1.pd

Deep Blue Documents at the University of Michigan

Recommended from our members

Nanometer VLSI placement and optimization for multi-objective design closure

Author: Luo Tao, Ph. D.
Publication venue
Publication date: 01/12/2007
Field of study

In a VLSI physical synthesis flow, placement directly defines the interconnection, which affects many other design objectives, such as timing, power consumption, congestion, and thermal issues. With the scaling of technology, the relative interconnect delay increases dramatically. As a result, placement has become a bottleneck in deep sub-micron physical synthesis. In this dissertation, I propose several optimization algorithms from global placement, placement migration, timing driven placements, to incremental power optimizations for multi-objective VLSI design closure. The first work is DPlace, a new global placement algorithm that scales well to the modern large-scale circuit placement problems. DPlace simulates the natural diffusion process to spread cells smoothly over the placement region, and uses both analytical and discrete techniques to improve the wire length. However, global placement is never sufficient for multi-objective design closure, a variety of design objectives have to be improved incrementally, such as timing, routing congestion, signal integrity, and heat distribution. Placement migration is a critical step to address the cell overlaps appearing during incremental optimizations. To achieve high placement stability, I propose a computational geometry based placement migration flow to cope with placement changes, and a new stability metric to measure the “similarity” between two placements accurately. Our placement migration algorithm has clear advantage over conventional legalization algorithms such that the neighborhood characteristics of the original placement are preserved. For timing closure in high performance designs, I present a linear programming based incremental timing driven placement to improve the timing on critical paths directly. I further present an efficient timing driven placement algorithm (Pyramids). Two formulations of Pyramids are proposed, which are suitable for different optimization stages in a physical synthesis flow. Both approaches find the optimal location for timing of a cell in constant time, through computational geometry based approaches. For fast convergence of design closure, placement should be integrated with other optimization techniques. I propose to combine placement, gate sizing and Vt swapping techniques to reduce the total power consumption, especially the leakage power, which is becoming increasingly critical for nanometer VLSI design closure.Electrical and Computer Engineerin

Texas ScholarWorks