1,144 research outputs found

    An efficient and optimal algorithm for simultaneous buffer and wire sizing

    Full text link

    Fast and exact simultaneous gate and wire sizing by Lagrangian relaxation

    Full text link

    Variability-Aware VLSI Design Automation For Nanoscale Technologies

    Get PDF
    As technology scaling enters the nanometer regime, design of large scale ICs gets more challenging due to shrinking feature sizes and increasing design complexity. Aggressive scaling causes significant degradation in reliability, increased susceptibility to fabrication and environmental randomness and increased dynamic and leakage power dissipation. In this work, we investigate these scaling issues in large scale integrated systems. This dissertation proposes to develop variability-aware design methodologies by proposing design analysis, design-time optimization, post-silicon tunability and runtime-adaptivity based optimization techniques for handling variability. We discuss our research in the area of variability-aware analysis, specifically focusing on the problem of statistical timing analysis. The first technique presents the concept of error budgeting that achieves significant runtime speedups during statistical timing analysis. The second work presents a general framework for non-linear non-Gaussian statistical timing analysis considering correlations. Further, we present our work on design-time optimization schemes that are applicable during physical synthesis. Firstly, we present a buffer insertion technique that considers wire-length uncertainty and proposes algorithms to perform probabilistic buffer insertion. Secondly, we present a stochastic optimization framework based on Monte-Carlo technique considering fabrication variability. This optimization framework can be applied to problems that can be modeled as linear programs without without imposing any assumptions on the nature of the variability. Subsequently, we present our work on post-silicon tunability based design optimization. This work presents a design management framework that can be used to balance the effort spent on pre-silicon (through gate sizing) and post-silicon optimization (through tunable clock-tree buffers) while maximizing the yield gains. Lastly, we present our work on variability-aware runtime optimization techniques. We look at the problem of runtime supply voltage scaling for dynamic power optimization, and propose a framework to consider the impact of variability on the reliability of such designs. We propose a probabilistic design synthesis technique where reliability of the design is a primary optimization metric

    Algorithms for Circuit Sizing in VLSI Design

    Get PDF
    One of the key problems in the physical design of computer chips, also known as integrated circuits, consists of choosing a  physical layout  for the logic gates and memory circuits (registers) on the chip. The layouts have a high influence on the power consumption and area of the chip and the delay of signal paths.  A discrete set of predefined layouts  for each logic function and register type with different physical properties is given by a library. One of the most influential characteristics of a circuit defined by the layout is its size. In this thesis we present new algorithms for the problem of choosing sizes for the circuits and its continuous relaxation,  and  evaluate these in theory and practice. A popular approach is based on Lagrangian relaxation and projected subgradient methods. We show that seemingly heuristic modifications that have been proposed for this approach can be theoretically justified by applying the well-known multiplicative weights algorithm. Subsequently, we propose a new model for the sizing problem as a min-max resource sharing problem. In our context, power consumption and signal delays are represented by resources that are distributed to customers. Under certain assumptions we obtain a polynomial time approximation for the continuous relaxation of the sizing problem that improves over the Lagrangian relaxation based approach. The new resource sharing algorithm has been implemented as part of the BonnTools software package which is developed at the Research Institute for Discrete Mathematics at the University of Bonn in cooperation with IBM. Our experiments on the ISPD 2013 benchmarks and state-of-the-art microprocessor designs provided by IBM illustrate that the new algorithm exhibits more stable convergence behavior compared to a Lagrangian relaxation based algorithm. Additionally, better timing and reduced power consumption was achieved on almost all instances. A subproblem of the new algorithm consists of finding sizes minimizing a weighted sum of power consumption and signal delays. We describe a method that approximates the continuous relaxation of this problem in polynomial time under certain assumptions. For the discrete problem we provide a fully polynomial approximation scheme under certain assumptions on the topology of the chip. Finally, we present a new algorithm for timing-driven optimization of registers. Their sizes and locations on a chip are usually determined during the clock network design phase, and remain mostly unchanged afterwards although the timing criticalities on which they were based can change. Our algorithm permutes register positions and sizes within so-called  clusters  without impairing the clock network such that it can be applied late in a design flow. Under mild assumptions, our algorithm finds an optimal solution which maximizes the worst cluster slack. It is implemented as part of the BonnTools and improves timing of registers on state-of-the-art microprocessor designs by up to 7.8% of design cycle time. </div

    Algorithms for the scaling toward nanometer VLSI physical synthesis

    Get PDF
    Along the history of Very Large Scale Integration (VLSI), we have successfully scaled down the size of transistors, scaled up the speed of integrated circuits (IC) and the number of transistors in a chip - these are just a few examples of our achievement in VLSI scaling. It is projected to enter the nanometer (timing estimation and buffer planning for global routing and other early stages such as floorplanning. A novel path based buffer insertion scheme is also included, which can overcome the weakness of the net based approaches. Part-2 Circuit clustering techniques with the application in Field-Programmable Gate Array (FPGA) technology mapping The problem of timing driven n-way circuit partitioning with application to FPGA technology mapping is studied and a hierarchical clustering approach is presented for the latest multi-level FPGA architectures. Moreover, a more general delay model is included in order to accurately characterize the delay behavior of the clusters and circuit elements

    Timing Closure in Chip Design

    Get PDF
    Achieving timing closure is a major challenge to the physical design of a computer chip. Its task is to find a physical realization fulfilling the speed specifications. In this thesis, we propose new algorithms for the key tasks of performance optimization, namely repeater tree construction; circuit sizing; clock skew scheduling; threshold voltage optimization and plane assignment. Furthermore, a new program flow for timing closure is developed that integrates these algorithms with placement and clocktree construction. For repeater tree construction a new algorithm for computing topologies, which are later filled with repeaters, is presented. To this end, we propose a new delay model for topologies that not only accounts for the path lengths, as existing approaches do, but also for the number of bifurcations on a path, which introduce extra capacitance and thereby delay. In the extreme cases of pure power optimization and pure delay optimization the optimum topologies regarding our delay model are minimum Steiner trees and alphabetic code trees with the shortest possible path lengths. We presented a new, extremely fast algorithm that scales seamlessly between the two opposite objectives. For special cases, we prove the optimality of our algorithm. The efficiency and effectiveness in practice is demonstrated by comprehensive experimental results. The task of circuit sizing is to assign millions of small elementary logic circuits to elements from a discrete set of logically equivalent, predefined physical layouts such that power consumption is minimized and all signal paths are sufficiently fast. In this thesis we develop a fast heuristic approach for global circuit sizing, followed by a local search into a local optimum. Our algorithms use, in contrast to existing approaches, the available discrete layout choices and accurate delay models with slew propagation. The global approach iteratively assigns slew targets to all source pins of the chip and chooses a discrete layout of minimum size preserving the slew targets. In comprehensive experiments on real instances, we demonstrate that the worst path delay is within 7% of its lower bound on average after a few iterations. The subsequent local search reduces this gap to 2% on average. Combining global and local sizing we are able to size more than 5.7 million circuits within 3 hours. For the clock skew scheduling problem we develop the first algorithm with a strongly polynomial running time for the cycle time minimization in the presence of different cycle times and multi-cycle paths. In practice, an iterative local search method is much more efficient. We prove that this iterative method maximizes the worst slack, even when restricting the feasible schedule to certain time intervals. Furthermore, we enhance the iterative local approach to determine a lexicographically optimum slack distribution. The clock skew scheduling problem is then generalized to allow for simultaneous data path optimization. In fact, this is a time-cost tradeoff problem. We developed the first combinatorial algorithm for computing time-cost tradeoff curves in graphs that may contain cycles. Starting from the lowest-cost solution, the algorithm iteratively computes a descent direction by a minimum cost flow computation. The maximum feasible step length is then determined by a minimum ratio cycle computation. This approach can be used in chip design for several optimization tasks, e.g. threshold voltage optimization or plane assignment. Finally, the optimization routines are combined into a timing closure flow. Here, the global placement is alternated with global performance optimization. Netweights are used to penalize the length of critical nets during placement. After the global phase, the performance is improved further by applying more comprehensive optimization routines on the most critical paths. In the end, the clock schedule is optimized and clocktrees are inserted. Computational results of the design flow are obtained on real-world computer chips

    Interconnect tree optimization algorithm in nanometer very large scale integration designs

    Get PDF
    This thesis proposes a graph-based maze routing and buffer insertion algorithm for nanometer Very Large Scale Integration (VLSI) layout designs. The algorithm is called Hybrid Routing Tree and Buffer insertion with Look-Ahead (HRTB-LA). In recent VLSI designs, interconnect delay becomes a dominant factor compared to gate delay. The well-known technique to minimize the interconnect delay is by inserting buffers along the interconnect wires. In conventional buffer insertion algorithms, the buffers are inserted on the fixed routing paths. However, in a modern design, there are macro blocks that prohibit any buffer insertion in their respective area. Most of the conventional buffer insertion algorithms do not consider these obstacles. In the presence of buffer obstacles, post routing algorithm may produce poor solution. On the other hand, simultaneous routing and buffer insertion algorithm offers a better solution, but it was proven to be NP-complete. Besides timing performance, power dissipation of the inserted buffers is another metric that needs to be optimized. Research has shown that power dissipation overhead due to buffer insertions is significantly high. In other words, interconnect delay and power dissipation move in opposite directions. Although many methodologies to optimize timing performance with power constraint have been proposed, no algorithm is based on grid graph technique. Hence, the main contribution of this thesis is an efficient algorithm using a hybrid approach for multi-constraint optimization in multi-terminal nets. The algorithm uses dynamic programming to compute the interconnect delay and power dissipation of the inserted buffers incrementally, while an effective runtime is achieved with the aid of novel look-ahead and graph pruning schemes. Experimental results prove that HRTB-LA is able to handle multi-constraint optimizations and produces up to 47% better solution compared to a post routing buffer insertion algorithm in comparable runtime
    corecore