886 research outputs found

    The predictor-adaptor paradigm : automation of custom layout by flexible design

    Get PDF

    A complete design path for the layout of flexible macros

    Get PDF
    XIV+172hlm.;24c

    Some Applications of the Weighted Combinatorial Laplacian

    Get PDF
    The weighted combinatorial Laplacian of a graph is a symmetric matrix which is the discrete analogue of the Laplacian operator. In this thesis, we will study a new application of this matrix to matching theory yielding a new characterization of factor-criticality in graphs and matroids. Other applications are from the area of the physical design of very large scale integrated circuits. The placement of the gates includes the minimization of a quadratic form given by a weighted Laplacian. A method based on the dual constrained subgradient method is proposed to solve the simultaneous placement and gate-sizing problem. A crucial step of this method is the projection to the flow space of an associated graph, which can be performed by minimizing a quadratic form given by the unweighted combinatorial Laplacian.Andwendungen der gewichteten kombinatorischen Laplace-Matrix Die gewichtete kombinatorische Laplace-Matrix ist das diskrete Analogon des Laplace-Operators. In dieser Arbeit stellen wir eine neuartige Charakterisierung von Faktor-KritikalitĂ€t von Graphen und Matroiden mit Hilfe dieser Matrix vor. Wir untersuchen andere Anwendungen im Bereich des Entwurfs von höchstintegrierten Schaltkreisen. Die Platzierung basiert auf der Minimierung einer quadratischen Form, die durch eine gewichtete kombinatorische Laplace-Matrix gegeben ist. Wir prĂ€sentieren einen Algorithmus fĂŒr das allgemeine simultane Platzierungs- und GattergrĂ¶ĂŸen-Optimierungsproblem, der auf der dualen Subgradientenmethode basiert. Ein wichtiger Bestandteil dieses Verfahrens ist eine Projektion auf den Flussraum eines assoziierten Graphen, die als die Minimierung einer durch die Laplace-Matrix gegebenen quadratischen Form aufgefasst werden kann

    Algorithms for the scaling toward nanometer VLSI physical synthesis

    Get PDF
    Along the history of Very Large Scale Integration (VLSI), we have successfully scaled down the size of transistors, scaled up the speed of integrated circuits (IC) and the number of transistors in a chip - these are just a few examples of our achievement in VLSI scaling. It is projected to enter the nanometer (timing estimation and buffer planning for global routing and other early stages such as floorplanning. A novel path based buffer insertion scheme is also included, which can overcome the weakness of the net based approaches. Part-2 Circuit clustering techniques with the application in Field-Programmable Gate Array (FPGA) technology mapping The problem of timing driven n-way circuit partitioning with application to FPGA technology mapping is studied and a hierarchical clustering approach is presented for the latest multi-level FPGA architectures. Moreover, a more general delay model is included in order to accurately characterize the delay behavior of the clusters and circuit elements

    Timing Closure in Chip Design

    Get PDF
    Achieving timing closure is a major challenge to the physical design of a computer chip. Its task is to find a physical realization fulfilling the speed specifications. In this thesis, we propose new algorithms for the key tasks of performance optimization, namely repeater tree construction; circuit sizing; clock skew scheduling; threshold voltage optimization and plane assignment. Furthermore, a new program flow for timing closure is developed that integrates these algorithms with placement and clocktree construction. For repeater tree construction a new algorithm for computing topologies, which are later filled with repeaters, is presented. To this end, we propose a new delay model for topologies that not only accounts for the path lengths, as existing approaches do, but also for the number of bifurcations on a path, which introduce extra capacitance and thereby delay. In the extreme cases of pure power optimization and pure delay optimization the optimum topologies regarding our delay model are minimum Steiner trees and alphabetic code trees with the shortest possible path lengths. We presented a new, extremely fast algorithm that scales seamlessly between the two opposite objectives. For special cases, we prove the optimality of our algorithm. The efficiency and effectiveness in practice is demonstrated by comprehensive experimental results. The task of circuit sizing is to assign millions of small elementary logic circuits to elements from a discrete set of logically equivalent, predefined physical layouts such that power consumption is minimized and all signal paths are sufficiently fast. In this thesis we develop a fast heuristic approach for global circuit sizing, followed by a local search into a local optimum. Our algorithms use, in contrast to existing approaches, the available discrete layout choices and accurate delay models with slew propagation. The global approach iteratively assigns slew targets to all source pins of the chip and chooses a discrete layout of minimum size preserving the slew targets. In comprehensive experiments on real instances, we demonstrate that the worst path delay is within 7% of its lower bound on average after a few iterations. The subsequent local search reduces this gap to 2% on average. Combining global and local sizing we are able to size more than 5.7 million circuits within 3 hours. For the clock skew scheduling problem we develop the first algorithm with a strongly polynomial running time for the cycle time minimization in the presence of different cycle times and multi-cycle paths. In practice, an iterative local search method is much more efficient. We prove that this iterative method maximizes the worst slack, even when restricting the feasible schedule to certain time intervals. Furthermore, we enhance the iterative local approach to determine a lexicographically optimum slack distribution. The clock skew scheduling problem is then generalized to allow for simultaneous data path optimization. In fact, this is a time-cost tradeoff problem. We developed the first combinatorial algorithm for computing time-cost tradeoff curves in graphs that may contain cycles. Starting from the lowest-cost solution, the algorithm iteratively computes a descent direction by a minimum cost flow computation. The maximum feasible step length is then determined by a minimum ratio cycle computation. This approach can be used in chip design for several optimization tasks, e.g. threshold voltage optimization or plane assignment. Finally, the optimization routines are combined into a timing closure flow. Here, the global placement is alternated with global performance optimization. Netweights are used to penalize the length of critical nets during placement. After the global phase, the performance is improved further by applying more comprehensive optimization routines on the most critical paths. In the end, the clock schedule is optimized and clocktrees are inserted. Computational results of the design flow are obtained on real-world computer chips

    Flow-based Partitioning and Fast Global Placement in Chip Design

    Get PDF
    VLSI placement is one of the major steps in the chip design process and an interesting subject of research in industry and academia. Recent chips consist of several millions of circuits connected by millions of nets. The classical placement objective of finding positions for circuits and minimizing netlength among them is an ongoing issue in optimization of chip performance. The increasing instance sizes, the tightness of timing and routability constraints impose a real challenge to the design flows and the designers, which often cannot be addressed properly without considering them explicitly within the placement. Many of the complex design methodologies follow an iterative approach, using placement several times in this process. Thus, placement runtime has a severe impact on the turnaround time in chip development. The major contributios of this thesis deal with the global placement, a common relaxation of the placement problem, which computes rough positions of the circuits minimizing the total length of wires to interconnect the. Based on the idea of subsequent quadratic netlength minimization and partitioning, as in BonnPlace [BrennerStruzynaVygen:2008], we present several new algorithms, generalized data structures and a completely new implementation of this top-down placement scheme. We introduce and formalize the concept of movebounds which are position constraints on subsets of cells. Movebounds, which can be regarded as mandatory or soft constraints, provide a mechanism to explicitly incorporate movement constraints to the placement which result from issues of timing, power and routability. With inclusive movebounds, such restrictions can be assigned to groups of circuits without any influence to other placeable objects. The other constraints, namely the exclusive movebounds, are of particular interest for semi-hierarchical approaches, as they can be used to obtain a flat view of the design and prevent cells from being placed into hierarchy units. Both provide a toolbox to the designer and allow the control of particular circuit sets without netlist manipulations. We also present a top-down partitioning scheme and extend the legalization algorithm of [BrennerVygen:2004] to be able to deal with millions of cells and dozens of movebounds efficiently. The presented algorithm can handle different types of overlapping movebounds, even in legalization, and produces significantly better results than a modern industrial tool. We present a novel partitioning algorithm for global placement. Unlike previous iterative and recursive approaches, the new method provides a global view of the problem using a novel MinCostFlow model with extremely fast and highly parallelizable local realization steps. The new flow-based partitioning can address density targets much more accurately and lowers the risk of density violations. The presented MinCostFlow model does not depend on the number of cells, making it highly interesting for large and huge designs. Moreover, the embedded flow structure responds to the chip's floorplan much better than the classical global partitioning approach. Another significant advantage of this algorithm is the fact that it can be applied to any initial placement and guarantees a feasible (fractional) solution (if one exists), improving the tool's reliability, even with movebounds and starting from placements with significant density violations. Using this method we can extend the congestion-driven placement to a combined movement, density adjustment, and cell size inflation approach. This method is able to handle movebounds and guarantees to resolve density overloads properly. Flow-based partitioning creates the opportunity of applying local, density unaware, optimization steps within global placement and allows it to break the strict recursive structure of levels and save runtime. The extended flexibility and runtime improvement are not the only advantages. The proposed flow realization, which is a combination of local quadratic programs and local partitioning, does not only yield a runtime improvement, but also seems to merge connectivity information to partitioning in a much better way than the old recursive partitioning approach. The new flow-based partitioning helps to significantly improve the results of our placement also in terms of netlength. We provide fast data structures for hierarchically clustered netlists and extend the net models Clique and Star to be applied within the clustered netlists efficiently. We show how shared-memory parallelization can be used for speeding up various routines in placement, without the loss of repeatability. In addition, we commit ourselves to the clustering problem, finding circuit groups which should be placed in the vicinity of each other. In order to provide global information for a fast bottom-up clustering, we propose to incorporate connectivity information using random walks. To this end, we show how the hitting times can be efficiently retrieved from large netlist hypergraphs. Due to the proposed model, parallel computation on sparse, shared-memory matrices can be used for computing hitting times to several targets simultaneously. Combined with a bottom-up clustering, even our preliminary approach significantly outperforms the popular BestChoice} algorithm [Nam et al. 2005]. We conclude this thesis by providing several experimental results on a large testbed of real-world chips and benchmarks demonstrating the performance of our tool. Without movebounds, our tool performs as good as a state-of-the-art force directed placer, but is more than 5x faster. We achieve the same speedup over the old BonnPlace, but produce significantly better results, on average more than 8%. With movebounds, our placements are more than 30% shorter compairing to the force-directed placer and our tool is 9x-20x faster. Our tool also produces the best results on the latest ISPD 2006 placement benchmarks
    • 

    corecore