    High performance algorithms for large scale placement problem

    Placement is one of the most important problems in electronic design automation (EDA). An inferior placement solution not only degrades the chip’s performance but may also make the chip unmanufacturable by producing excessive wirelength that exceeds the available routing resources. Although placement has been investigated extensively for several decades, it remains very challenging, mainly because design scale has grown by orders of magnitude and the trend shows no sign of stopping. Modern chips commonly integrate millions of gates and require more than ten metal routing layers. In addition, new manufacturing techniques impose new requirements, so multiple objectives must be optimized simultaneously during placement. Our research provides high-performance algorithms for the placement problem. We propose (i) a high-performance global placement core engine, POLAR; (ii) an efficient routability-driven placer, POLAR 2.0, which extends POLAR to deal with routing congestion; (iii) an ultrafast global placer, POLAR 3.0, which exploits parallelism in POLAR and can make full use of multi-core systems; and (iv) several efficient triple patterning lithography (TPL) aware detailed placement algorithms.

    An integrated placement and routing approach

    As feature sizes continue to scale down, interconnects become the major contributor to signal delay. Since interconnects are mainly determined by placement and routing, these two stages play key roles in achieving high performance. Historically, they have been divided into two separate stages to make the problem tractable. As a result, routing information is not available during placement, and net models such as half-perimeter wirelength (HPWL) are employed to approximate the routing and simplify the placement problem. However, a placement that is good in terms of these objectives may not be routable at all, because different objectives are optimized in the placement and routing stages. This inconsistency makes the results of the two-step optimization method far from optimal. To achieve a high-quality placement solution and ensure that the subsequent routing succeeds, we propose an integrated placement and routing approach. In this approach, we integrate placement and routing into the same framework so that the objective optimized in placement is the same as that in routing. Since both placement and routing are very hard problems (NP-hard), very efficient algorithms are needed so that integrating them does not lead to intractable complexity. In this dissertation, we first develop a highly efficient placer, FastPlace 3.0, for the large-scale mixed-size placement problem. Then an efficient and effective detailed placer, FastDP, is proposed to improve the global placement by moving standard cells. For high-degree nets, we propose a novel performance-driven topology design algorithm that generates good topologies to meet very strict timing requirements. In the routing phase, we develop two global routers, FastRoute and FastRoute 2.0. Compared to traditional global routers, they generate better solutions and are two orders of magnitude faster.
Finally, based on these efficient and high-quality placement and routing algorithms, we propose a new flow that closely integrates placement and routing. In this flow, global routing is applied extensively to obtain interconnect information and direct the placement process. In this way, we obtain very good placement solutions with guaranteed routability.
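The HPWL net model mentioned in this abstract can be illustrated with a short sketch. It is not code from the dissertation: the function names, the cell names, and the coordinates are hypothetical, and pins are assumed to sit at cell centers.

```python
def hpwl(pins):
    """Half-perimeter wirelength of one net: width plus height of the
    bounding box that encloses all of its pin locations."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets, positions):
    """Sum of HPWL over all nets; each net is a list of cell names."""
    return sum(hpwl([positions[cell] for cell in net]) for net in nets)


# Hypothetical toy placement: three cells and three nets.
positions = {"a": (0, 0), "b": (3, 4), "c": (1, 1)}
nets = [["a", "b"], ["a", "b", "c"], ["b", "c"]]
total = total_hpwl(nets, positions)
```

HPWL depends only on the bounding box of each net, so it carries no information about routing capacity or congestion. That is why a placement with low HPWL can still be unroutable, which is the gap the integrated flow is meant to close.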

    An efficient analytical placement algorithm using cell shifting, iterative local refinement and a hybrid net model

    In this thesis, we present FastPlace, a fast, iterative, flat placement algorithm for large-scale standard cell designs in the fixed-die context. FastPlace is based on the quadratic placement approach, which formulates wirelength minimization as a convex quadratic program that can be solved analytically by efficient techniques. However, the quadratic approach generally suffers from some drawbacks. First, the resulting placement has a lot of overlap among cells. Second, the resulting total wirelength may be long, as the quadratic wirelength objective is only an indirect measure of the total linear wirelength. Third, existing net models tend to create many non-zero entries in the connectivity matrix when modeling the netlist, which slows down the quadratic program solver. These problems are handled as follows: (1) A Cell Shifting technique is proposed to generate an evenly distributed global placement from the quadratic program solution. This technique is very efficient and produces a high-quality global placement with even cell distribution. (2) An Iterative Local Refinement technique is proposed to reduce the wirelength according to the half-perimeter bounding rectangle measure. This technique is very effective as it makes use of the wirelength and cell distribution information provided by a coarse global placement. (3) A Hybrid Net Model is proposed, which is a combination of the traditional clique and star models. This net model significantly reduces the number of non-zero entries in the connectivity matrix, resulting in a significant speed-up of the solver compared to the traditional clique model. Experimental results show that the run-time of FastPlace is of the order O(n^1.412), where n is the circuit size given by the number of pins.
Also, the current implementation, when tested on 18 standard cell benchmark circuits, is on average 11.0 and 82.7 times faster than the existing academic placers Capo and Dragon, respectively.
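To see why mixing the clique and star models reduces the number of non-zero matrix entries, the following sketch counts the two-pin connections each model creates. It is an illustration only. The abstract does not state where the hybrid model switches from clique to star, so the threshold `star_from=4` is an assumption, as are the function names and the net sizes.

```python
def clique_connections(pin_count):
    """A clique connects every pair of pins: k * (k - 1) / 2 connections."""
    return pin_count * (pin_count - 1) // 2


def star_connections(pin_count):
    """A star connects every pin to one added star node: k connections."""
    return pin_count


def total_connections(net_sizes, star_from=None):
    """Count two-pin connections over all nets.

    Nets with at least `star_from` pins use the star model; all other
    nets use the clique model. With star_from=None every net is a clique.
    """
    total = 0
    for k in net_sizes:
        if star_from is not None and k >= star_from:
            total += star_connections(k)
        else:
            total += clique_connections(k)
    return total


# Hypothetical netlist described only by the pin count of each net.
net_sizes = [2, 3, 10, 50]
pure_clique = total_connections(net_sizes)
hybrid = total_connections(net_sizes, star_from=4)
```

Each connection contributes off-diagonal entries to the connectivity matrix, so the quadratic growth of the clique model on high-degree nets dominates the count. The star model grows linearly, at the cost of one extra variable per net.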

    High-performance Global Routing for Trillion-gate Systems-on-Chips.

    Due to aggressive transistor scaling, modern CMOS circuits have continually increased in both complexity and productivity. Modern semiconductor designs have narrower and more resistive wires, thereby shifting the performance bottleneck to interconnect delay. These trends considerably impact timing closure and call for improvements in high-performance physical design tools to keep pace with the current state of IC innovation. As leading-edge designs may incorporate tens of millions of gates, algorithm and software scalability are crucial to achieving reasonable turnaround time. Moreover, with decreasing device sizes, optimizing traditional objectives is no longer sufficient. Our research focuses on (i) expanding the capabilities of standalone global routing, (ii) extending global routing for use in different design applications, and (iii) integrating routing within broader physical design optimizations and flows, e.g., congestion-driven placement. Our first global router relies on integer linear programming (ILP) and can solve fairly large problem instances to optimality. Our second, iterative global router relies on Lagrangian relaxation, where we relax the routing violation constraints to allow routing overflow at a penalty. In both approaches, our aim is to give the router the maximum degree of freedom within a specified context. Empirically, both routers produce competitive results within a reasonable amount of runtime. To improve routability, we explore the incorporation of routing with placement, where the router estimates congestion and feeds this information to the placer. In turn, the emphasis on runtime is heightened, as the router will be invoked multiple times. Empirically, our placement-and-route framework significantly improves the final solution’s routability compared with performing the steps sequentially.
To further enhance routability-driven placement, we (i) leverage incrementality to generate fast and accurate congestion maps, and (ii) develop several techniques to relieve cell-based and layout-based congestion. To broaden the scope of routing, we integrate a global router in a chip-design flow that addresses the buffer explosion problem.
PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/98025/1/jinhu_1.pd
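The Lagrangian relaxation idea in the abstract above, allowing routing overflow at a penalty, can be sketched generically. This is a textbook subgradient update, not the router described in the thesis. The function names, the edge identifiers, the step size, and all numbers are assumptions for illustration.

```python
def update_multipliers(multipliers, demand, capacity, step):
    """One projected subgradient step on the relaxed capacity constraints.

    The constraint demand[e] <= capacity[e] is relaxed with a multiplier
    lambda_e >= 0. Overflowed edges get a larger penalty, and edges with
    slack have their penalty reduced, but never below zero.
    """
    return {
        e: max(0.0, multipliers[e] + step * (demand[e] - capacity[e]))
        for e in multipliers
    }


def edge_cost(base_cost, multiplier):
    """Cost a net pays for using one track on an edge in the relaxed problem."""
    return base_cost + multiplier


# Hypothetical routing edges: e1 is overflowed, e2 has spare capacity.
multipliers = {"e1": 0.0, "e2": 1.0}
demand = {"e1": 5, "e2": 2}
capacity = {"e1": 3, "e2": 4}
updated = update_multipliers(multipliers, demand, capacity, step=0.5)
```

In an iterative router built on this idea, nets would be rerouted with the updated edge costs and the multipliers adjusted again, repeating until overflow is removed or an iteration limit is reached.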

    Handling the complexity of routing problem in modern VLSI design

    In VLSI physical design, the routing task consists of using over-the-cell metal wires to connect the pins and ports of circuit gates and blocks. Routing is an important design step because the quality of the routing solution has a great impact on design metrics such as circuit timing, power consumption, chip reliability, and manufacturability. As VLSI design enters the nanometer era, routing success (the routability issue) has emerged as one of the most critical problems in back-end design. On the one hand, design complexity is increasing dramatically as more and more modules are integrated into the chip. Much higher chip density leads to higher routing demand and a greater risk of routing failure. On the other hand, with decreasing feature sizes, more complex design rules are imposed to ensure manufacturability. These design rules are hard to satisfy; they create additional barriers to achieving routing closure (i.e., generating a DRC-free routing solution) and thus affect the chip's time-to-market (TTM) plan. In a typical VLSI physical design flow, the behavior and performance of routing are affected by three consecutive phases: placement, global routing, and detailed routing. Traditional CAD tools handle each of the three phases independently, and the global picture of the routability issue is neglected. Unlike conventional approaches, which propose tools and algorithms for one particular design phase, this thesis investigates the routability issue across all three phases and proposes a series of systematic solutions to build a more generic flow and improve quality of results (QoR). For the placement phase, we introduce a mixed-size placement refinement tool for alleviating congestion after placement. The tool shifts and relocates modules based on a global routing estimation.
For the global routing phase, a very fast and effective global router is developed. Its performance surpasses that of many peer works, as verified by the ISPD 2008 global routing contest results. In the detailed routing phase, a tool is proposed that performs detailed routing using regular routing patterns, based on a correct-by-construction methodology, to improve routability as well as satisfy most design rules. Finally, a tool that integrates global routing and detailed routing is developed to remedy the inconsistency between the two. To verify the proposed algorithms, three sets of test cases derived from the ISPD98 and ISPD05/06 placement benchmark suites are proposed. The results indicate that our methods form an integrated and systematic flow for routability improvement that is better than conventional methods.

    Timing-Driven Macro Placement

    Placement is an important step in the process of finding physical layouts for electronic computer chips. The basic task during placement is to arrange the building blocks of the chip, the circuits, disjointly within a given chip area. Furthermore, the positions should result in short circuit interconnections that can be routed easily and that ensure all signals arrive in time. This dissertation mostly focuses on macros, the largest circuits on a chip. To optimize timing characteristics during macro placement, we propose a new optimistic timing model based on geometric distance constraints. This model can be computed and evaluated efficiently to predict timing traits accurately in practice. Packing rectangles disjointly remains strongly NP-hard under slack maximization in our timing model. Despite this, we develop an exact, linear-time algorithm for special cases. The proposed timing model is incorporated into BonnMacro, the macro placement component of the BonnTools physical design optimization suite developed at the Research Institute for Discrete Mathematics. Using efficient formulations as mixed-integer programs, we can legalize macros locally while optimizing timing. This results in the first timing-aware macro placement tool. In addition, we provide multiple enhancements for the partitioning-based standard circuit placement algorithm BonnPlace. We find a model of partitioning as a minimum-cost flow problem that is provably as small as possible, which lets us avoid running-time-intensive instances. Moreover, we propose the new global placement flow Self-Stabilizing BonnPlace. This approach combines BonnPlace with a force-directed placement framework. It provides the flexibility to optimize the two involved objectives, routability and timing, directly during placement. The performance of our placement tools is confirmed on a large variety of academic benchmarks as well as real-world designs provided by our industrial partner IBM.
We reduce the running time of partitioning significantly and demonstrate that Self-Stabilizing BonnPlace finds easily routable placements for challenging designs, even when simultaneously optimizing timing objectives. BonnMacro and Self-Stabilizing BonnPlace can be combined into the first timing-driven mixed-size placement flow. This combination often finds placements with competitive timing traits and even outperforms solutions that have been determined manually by experienced designers.

    High-Performance Placement and Routing for the Nanometer Scale.

    Modern semiconductor manufacturing facilitates single-chip electronic systems that only five years ago required ten to twenty chips. Naturally, design complexity has grown within this period. In contrast to this growth, it is becoming common in the industry to limit design team size, which places a heavier burden on design automation tools. Our work identifies new objectives, constraints and concerns in the physical design of systems-on-chip, and develops new computational techniques to address them. In addition to faster and more relevant design optimizations, we demonstrate that traditional design flows based on "separation of concerns" produce unnecessarily suboptimal layouts. We develop new integrated optimizations that streamline traditional chains of loosely linked design tools. In particular, we bridge the gap between mixed-size placement and routing by updating the objective of global and detailed placement to a more accurate estimate of routed wirelength. To this we add sophisticated whitespace allocation, and the combination provides increased routability, faster routing, shorter routed wirelength, and the best via counts of published techniques. To further improve post-routing design metrics, we present new global routing techniques based on Discrete Lagrange Multipliers (DLM), which produce the best routed wirelength results on recent benchmarks. Our work culminates in the integration of our routing techniques within an incremental placement flow to improve detailed routing solutions, shrink die sizes and reduce total chip cost. Our techniques not only improve the quality and cost of designs but also simplify design automation software implementation in many cases. Ultimately, we reduce the time needed for design closure through improved tool fidelity and the use of our incremental techniques for placement and routing.
Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/64639/1/royj_1.pd

    Flow-based Partitioning and Fast Global Placement in Chip Design

    VLSI placement is one of the major steps in the chip design process and an interesting subject of research in industry and academia. Recent chips consist of several million circuits connected by millions of nets. The classical placement objective of finding positions for circuits while minimizing the netlength among them remains central to optimizing chip performance. Increasing instance sizes and tight timing and routability constraints pose a real challenge to design flows and designers, and these constraints often cannot be addressed properly without considering them explicitly within placement. Many complex design methodologies follow an iterative approach that uses placement several times. Thus, placement runtime has a severe impact on the turnaround time in chip development. The major contributions of this thesis deal with global placement, a common relaxation of the placement problem, which computes rough positions of the circuits that minimize the total length of the wires needed to interconnect them. Based on the idea of quadratic netlength minimization followed by partitioning, as in BonnPlace [BrennerStruzynaVygen:2008], we present several new algorithms, generalized data structures and a completely new implementation of this top-down placement scheme. We introduce and formalize the concept of movebounds, which are position constraints on subsets of cells. Movebounds, which can be regarded as mandatory or soft constraints, provide a mechanism to explicitly incorporate into placement the movement constraints that result from issues of timing, power and routability. With inclusive movebounds, such restrictions can be assigned to groups of circuits without any influence on other placeable objects. The other constraints, namely exclusive movebounds, are of particular interest for semi-hierarchical approaches, as they can be used to obtain a flat view of the design and prevent cells from being placed into hierarchy units.
Both provide a toolbox to the designer and allow the control of particular circuit sets without netlist manipulations. We also present a top-down partitioning scheme and extend the legalization algorithm of [BrennerVygen:2004] to be able to deal with millions of cells and dozens of movebounds efficiently. The presented algorithm can handle different types of overlapping movebounds, even in legalization, and produces significantly better results than a modern industrial tool. We present a novel partitioning algorithm for global placement. Unlike previous iterative and recursive approaches, the new method provides a global view of the problem using a novel MinCostFlow model with extremely fast and highly parallelizable local realization steps. The new flow-based partitioning can address density targets much more accurately and lowers the risk of density violations. The presented MinCostFlow model does not depend on the number of cells, making it highly interesting for large and huge designs. Moreover, the embedded flow structure responds to the chip's floorplan much better than the classical global partitioning approach. Another significant advantage of this algorithm is the fact that it can be applied to any initial placement and guarantees a feasible (fractional) solution (if one exists), improving the tool's reliability, even with movebounds and starting from placements with significant density violations. Using this method we can extend the congestion-driven placement to a combined movement, density adjustment, and cell size inflation approach. This method is able to handle movebounds and guarantees to resolve density overloads properly. Flow-based partitioning creates the opportunity of applying local, density unaware, optimization steps within global placement and allows it to break the strict recursive structure of levels and save runtime. The extended flexibility and runtime improvement are not the only advantages. 
The proposed flow realization, which is a combination of local quadratic programs and local partitioning, not only yields a runtime improvement but also seems to merge connectivity information into partitioning much better than the old recursive partitioning approach. The new flow-based partitioning also helps to significantly improve the results of our placement in terms of netlength. We provide fast data structures for hierarchically clustered netlists and extend the Clique and Star net models so that they can be applied efficiently within clustered netlists. We show how shared-memory parallelization can be used to speed up various routines in placement without loss of repeatability. In addition, we address the clustering problem of finding circuit groups that should be placed in the vicinity of each other. To provide global information for a fast bottom-up clustering, we propose to incorporate connectivity information using random walks. To this end, we show how hitting times can be retrieved efficiently from large netlist hypergraphs. With the proposed model, parallel computation on sparse, shared-memory matrices can be used to compute hitting times to several targets simultaneously. Combined with a bottom-up clustering, even our preliminary approach significantly outperforms the popular BestChoice algorithm [Nam et al. 2005]. We conclude this thesis by providing several experimental results on a large testbed of real-world chips and benchmarks that demonstrate the performance of our tool. Without movebounds, our tool performs as well as a state-of-the-art force-directed placer but is more than 5x faster. We achieve the same speedup over the old BonnPlace but produce significantly better results, on average by more than 8%. With movebounds, our placements are more than 30% shorter compared to the force-directed placer, and our tool is 9x-20x faster.
Our tool also produces the best results on the latest ISPD 2006 placement benchmarks.
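The random-walk hitting times used for clustering in this abstract can be illustrated on a small ordinary graph. This sketch is a simplification. The thesis works on netlist hypergraphs with parallel sparse-matrix computation, whereas the code below uses a plain adjacency list and Gauss-Seidel iteration. The function name, the tolerance, and the example graph are assumptions.

```python
def hitting_times(adjacency, target, tol=1e-12, max_iter=100_000):
    """Expected number of steps for a simple random walk to first reach
    `target`, computed for every start vertex.

    Solves h(target) = 0 and h(v) = 1 + average of h over neighbours of v
    by repeated sweeps, assuming the graph is connected.
    """
    h = {v: 0.0 for v in adjacency}
    for _ in range(max_iter):
        largest_change = 0.0
        for v, neighbours in adjacency.items():
            if v == target:
                continue
            new_value = 1.0 + sum(h[u] for u in neighbours) / len(neighbours)
            largest_change = max(largest_change, abs(new_value - h[v]))
            h[v] = new_value
        if largest_change < tol:
            break
    return h


# Hypothetical path graph 0 - 1 - 2 with vertex 2 as the target.
path = {0: [1], 1: [0, 2], 2: [1]}
times = hitting_times(path, target=2)
```

On this path the walk needs 4 expected steps from vertex 0 and 3 from vertex 1. Vertices with small hitting times to each other are strongly connected in a global sense, which is the kind of information a bottom-up clustering can use.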