1,056 research outputs found
Timing Closure in Chip Design
Achieving timing closure is a major challenge to the physical design of a computer chip. Its task is to find a physical realization fulfilling the speed specifications. In this thesis, we propose new algorithms for the key tasks of performance optimization, namely repeater tree construction; circuit sizing; clock skew scheduling; threshold voltage optimization and plane assignment. Furthermore, a new program flow for timing closure is developed that integrates these algorithms with placement and clocktree construction. For repeater tree construction a new algorithm for computing topologies, which are later filled with repeaters, is presented. To this end, we propose a new delay model for topologies that not only accounts for the path lengths, as existing approaches do, but also for the number of bifurcations on a path, which introduce extra capacitance and thereby delay. In the extreme cases of pure power optimization and pure delay optimization the optimum topologies regarding our delay model are minimum Steiner trees and alphabetic code trees with the shortest possible path lengths. We presented a new, extremely fast algorithm that scales seamlessly between the two opposite objectives. For special cases, we prove the optimality of our algorithm. The efficiency and effectiveness in practice is demonstrated by comprehensive experimental results. The task of circuit sizing is to assign millions of small elementary logic circuits to elements from a discrete set of logically equivalent, predefined physical layouts such that power consumption is minimized and all signal paths are sufficiently fast. In this thesis we develop a fast heuristic approach for global circuit sizing, followed by a local search into a local optimum. Our algorithms use, in contrast to existing approaches, the available discrete layout choices and accurate delay models with slew propagation. The global approach iteratively assigns slew targets to all source pins of the chip and chooses a discrete layout of minimum size preserving the slew targets. In comprehensive experiments on real instances, we demonstrate that the worst path delay is within 7% of its lower bound on average after a few iterations. The subsequent local search reduces this gap to 2% on average. Combining global and local sizing we are able to size more than 5.7 million circuits within 3 hours. For the clock skew scheduling problem we develop the first algorithm with a strongly polynomial running time for the cycle time minimization in the presence of different cycle times and multi-cycle paths. In practice, an iterative local search method is much more efficient. We prove that this iterative method maximizes the worst slack, even when restricting the feasible schedule to certain time intervals. Furthermore, we enhance the iterative local approach to determine a lexicographically optimum slack distribution. The clock skew scheduling problem is then generalized to allow for simultaneous data path optimization. In fact, this is a time-cost tradeoff problem. We developed the first combinatorial algorithm for computing time-cost tradeoff curves in graphs that may contain cycles. Starting from the lowest-cost solution, the algorithm iteratively computes a descent direction by a minimum cost flow computation. The maximum feasible step length is then determined by a minimum ratio cycle computation. This approach can be used in chip design for several optimization tasks, e.g. threshold voltage optimization or plane assignment. Finally, the optimization routines are combined into a timing closure flow. Here, the global placement is alternated with global performance optimization. Netweights are used to penalize the length of critical nets during placement. After the global phase, the performance is improved further by applying more comprehensive optimization routines on the most critical paths. In the end, the clock schedule is optimized and clocktrees are inserted. Computational results of the design flow are obtained on real-world computer chips
Recommended from our members
Cross-Layer Pathfinding for Off-Chip Interconnects
Off-chip interconnects for integrated circuits (ICs) today induce a diverse design space, spanning many different applications that require transmission of data at various bandwidths, latencies and link lengths. Off-chip interconnect design solutions are also variously sensitive to system performance, power and cost metrics, while also having a strong impact on these metrics. The costs associated with off-chip interconnects include die area, package (PKG) and printed circuit board (PCB) area, technology and bill of materials (BOM). Choices made regarding off-chip interconnects are fundamental to product definition, architecture, design implementation and technology enablement. Given their cross-layer impact, it is imperative that a cross-layer approach be employed to architect and analyze off-chip interconnects up front, so that a top-down design flow can comprehend the cross-layer impacts and correctly assess the system performance, power and cost tradeoffs for off-chip interconnects. Chip architects are not exposed to all the tradeoffs at the physical and circuit implementation or technology layers, and often lack the tools to accurately assess off-chip interconnects. Furthermore, the collaterals needed for a detailed analysis are often lacking when the chip is architected; these include circuit design and layout, PKG and PCB layout, and physical floorplan and implementation. To address the need for a framework that enables architects to assess the system-level impact of off-chip interconnects, this thesis presents power-area-timing (PAT) models for off-chip interconnects, optimization and planning tools with the appropriate abstraction using these PAT models, and die/PKG/PCB co-design methods that help expose the off-chip interconnect cross-layer metrics to the die/PKG/PCB design flows. Together, these models, tools and methods enable cross-layer optimization that allows for a top-down definition and exploration of the design space and helps converge on the correct off-chip interconnect implementation and technology choice. The tools presented cover off-chip memory interfaces for mobile and server products, silicon photonic interfaces, 2.5D silicon interposers and 3D through-silicon vias (TSVs). The goal of the cross-layer framework is to assess the key metrics of the interconnect (such as timing, latency, active/idle/sleep power, and area/cost) at an appropriate level of abstraction by being able to do this across layers of the design flow. In additional to signal interconnect, this thesis also explores the need for such cross-layer pathfinding for power distribution networks (PDN), where the system-on-chip (SoC) floorplan and pinmap must be optimized before the collateral layouts for PDN analysis are ready. Altogether, the developed cross-layer pathfinding methodology for off-chip interconnects enables more rapid and thorough exploration of a vast design space of off-chip parallel and serial links, inter-die and inter-chiplet links and silicon photonics. Such exploration will pave the way for off-chip interconnect technology enablement that is optimized for system needs. The basis of the framework can be extended to cover other interconnect technology as well, since it fundamentally relates to system-level metrics that are common to all off-chip interconnects
The IPS fidelity scale as a guideline to implement Supported Employment
info:eu-repo/semantics/publishe
Recommended from our members
Improved Physical Design for Manufacturing Awareness and Advanced VLSI
Increasing challenges arise with each new semiconductor technology node, especially in advanced nodes, where the industry tries to extract every ounce of benefit as it approaches the limits of physics, through manufacturing-aware design technology co-optimization and design-based equivalent scaling. The increasing complexity of design and process technologies, and ever-more complex design rules, also become hurdles for academic researchers, separating academic researchers from the most up-to-date technical issues.This thesis presents innovative methodologies and optimizations to address the above challenges. There are three directions in this thesis: (i) manufacturing-aware design technology co-optimization; (ii) advanced node design-based equivalent scaling; and (iii) an open source academic detailed routing flow.To realize manufacturing-aware design technology co-optimization, this thesis presents two works: (i) a multi-row detailed placement optimization for neighbor diffusion effect mitigation between neighboring standard cells; and (ii) a post-routing optimization to generate 2D block mask layout for dummy segment removal in self-aligned multiple patterning.To achieve advanced node design-based equivalent scaling, this thesis presents two improved physical design methodologies: (i) a post-placement flop tray generation approach for clock power reduction; and (ii) a detailed placement approach to exploit inter-row M1 routing for congestion and wirelength reduction.To address the increasing gap between academia and industry, this thesis presents two works toward an open source academic detailed routing flow: (i) a complete, robust, scalable and design ruleaware dynamic programming-based pin access analysis framework; and (ii) TritonRoute – the open source detailed router that is capable of delivering DRC-clean detailed routing solutions in advanced nodes.This thesis concludes with a summary of its contributions and open directions for future research
Efficient quadratic placement for FPGAs.
Field Programmable Gate Arrays (FPGAs) are widely used in industry because they can implement any digital circuit on site simply by specifying programmable logic and their interconnections. However, this rapid prototyping advantage may be adversely affected because of the long compile time, which is dominated by placement and routing. This issue is of great importance, especially as the logic capacities of FPGAs continue to grow. This thesis focuses on the placement phase of FPGA Computer Aided Design (CAD) flow and presents a fast, high quality, wirelength-driven placement algorithm for FPGAs that is based on the quadratic placement approach. In this thesis, multiple iterations of equation solving process together with a linear wirelength reduction technique are introduced. The proposed algorithm efficiently handles the main problems with the quadratic placement algorithm and produces a fast and high quality placement. Experimental results, using twenty benchmark circuits, show that this algorithm can achieve comparable total wirelength and, on average, 5X faster run time when compared to an existing, state-of-the-art placement tool. This thesis also shows that the proposed algorithm delivers promising preliminary results in minimizing the critical path delay while maintaining high placement quality.Dept. of Electrical and Computer Engineering. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .X86. Source: Masters Abstracts International, Volume: 44-04, page: 1946. Thesis (M.A.Sc.)--University of Windsor (Canada), 2005
On the Use of Directed Moves for Placement in VLSI CAD
Search-based placement methods have long been used for placing integrated circuits targeting the field programmable gate array (FPGA) and standard cell design styles. Such methods offer the potential for high-quality solutions but often come at the cost of long run-times compared to alternative methods.
This dissertation examines strategies for enhancing local search heuristics---and in particular, simulated annealing---through the application of directed moves. These moves help to guide a search-based optimizer by focusing efforts on states which are most likely to yield productive improvement, effectively pruning the size of the search space.
The engineering theory and implementation details of directed moves are discussed in the context of both field programmable gate array and standard cell designs. This work explores the ways in which such moves can be used to improve the quality of FPGA placements, improve the robustness of floorplan repair and legalization methods for mixed-size standard cell designs, and enhance the quality of detailed placement for standard cell circuits. The analysis presented herein confirms the validity and efficacy of directed moves, and supports the use of such heuristics within various optimization frameworks
FPGA Architecture Optimization Using Geometric Programming
Volume 4 No 13 of the periodical Progression. Published November, February, May and August by The Radiant Healing Centre. SPCL PER BT 732 P76 V.1,1932-V.5,193
Early analysis of VLSI systems with packaging considerations
There is an explosive growth in the size of the VLSI (Very Large Scale Integration) systems today. Microelectronic system designers are packing millions of transistors in a single IC chip. Packaging techniques like Multi-chip module (MCM) and flip-chip bonding offer faster interconnects and IC\u27s capable of accommodating a larger number of inputs and outputs. The complexity of today\u27s designs and the availability of advanced packaging techniques call for an early analysis of the system based on estimation of system parameters to select from a wide choice of circuit partitioning, architecture alternatives and packaging options which give the best cost/performance.
A procedure for the early analysis of VLSI systems under packaging considerations has been developed and implemented in this dissertation work. The early analysis tool was used to evaluate the inter-relationship between partitioning and packaging and to determine the best system design considering cost, size and delays. The functional unit level description of a 750,000-transistor MicroSparc processor was studied using an exhaustive search technique. The early analysis performed on the MicroSparc design suggested that the three chip multi-chip design using flip-chip IC\u27s interconnected on a MCM-D substrate is the most cost effective. An early bond pitch analysis performed using the tool concluded that a 250-micron bond pitch is the best choice for the multi-chip MicroSparc designs. The tool was also used to perform an early cache analysis which showed that the use of separate memory and logic processes made it feasible to design the MicroSparc design with larger cache sizes than the use of a combined logic and memory process. The designs based on the separate processes gave equivalent or better performance than the design candidates with smaller cache sizes. Future extensions of the procedure are also outlined here
Algorithms for Circuit Sizing in VLSI Design
One of the key problems in the physical design of computer chips, also known as integrated circuits, consists of choosing a physical layout for the logic gates and memory circuits (registers) on the chip. The layouts have a high influence on the power consumption and area of the chip and the delay of signal paths. A discrete set of predefined layouts for each logic function and register type with different physical properties is given by a library. One of the most influential characteristics of a circuit defined by the layout is its size. In this thesis we present new algorithms for the problem of choosing sizes for the circuits and its continuous relaxation, and evaluate these in theory and practice. A popular approach is based on Lagrangian relaxation and projected subgradient methods. We show that seemingly heuristic modifications that have been proposed for this approach can be theoretically justified by applying the well-known multiplicative weights algorithm. Subsequently, we propose a new model for the sizing problem as a min-max resource sharing problem. In our context, power consumption and signal delays are represented by resources that are distributed to customers. Under certain assumptions we obtain a polynomial time approximation for the continuous relaxation of the sizing problem that improves over the Lagrangian relaxation based approach. The new resource sharing algorithm has been implemented as part of the BonnTools software package which is developed at the Research Institute for Discrete Mathematics at the University of Bonn in cooperation with IBM. Our experiments on the ISPD 2013 benchmarks and state-of-the-art microprocessor designs provided by IBM illustrate that the new algorithm exhibits more stable convergence behavior compared to a Lagrangian relaxation based algorithm. Additionally, better timing and reduced power consumption was achieved on almost all instances. A subproblem of the new algorithm consists of finding sizes minimizing a weighted sum of power consumption and signal delays. We describe a method that approximates the continuous relaxation of this problem in polynomial time under certain assumptions. For the discrete problem we provide a fully polynomial approximation scheme under certain assumptions on the topology of the chip. Finally, we present a new algorithm for timing-driven optimization of registers. Their sizes and locations on a chip are usually determined during the clock network design phase, and remain mostly unchanged afterwards although the timing criticalities on which they were based can change. Our algorithm permutes register positions and sizes within so-called clusters without impairing the clock network such that it can be applied late in a design flow. Under mild assumptions, our algorithm finds an optimal solution which maximizes the worst cluster slack. It is implemented as part of the BonnTools and improves timing of registers on state-of-the-art microprocessor designs by up to 7.8% of design cycle time. </div
- …