583 research outputs found

    Performance and power optimization in VLSI physical design

    Get PDF
    As VLSI technology enters the nanoscale regime, a great amount of efforts have been made to reduce interconnect delay. Among them, buffer insertion stands out as an effective technique for timing optimization. A dramatic rise in on-chip buffer density has been witnessed. For example, in two recent IBM ASIC designs, 25% gates are buffers. In this thesis, three buffer insertion algorithms are presented for the procedure of performance and power optimization. The second chapter focuses on improving circuit performance under inductance effect. The new algorithm works under the dynamic programming framework and runs in provably linear time for multiple buffer types due to two novel techniques: restrictive cost bucketing and efficient delay update. The experimental results demonstrate that our linear time algorithm consistently outperforms all known RLC buffering algorithms in terms of both solution quality and runtime. That is, the new algorithm uses fewer buffers, runs in shorter time and the buffered tree has better timing. The third chapter presents a method to guarantee a high fidelity signal transmission in global bus. It proposes a new redundant via insertion technique to reduce via variation and signal distortion in twisted differential line. In addition, a new buffer insertion technique is proposed to synchronize the transmitted signals, thus further improving the effectiveness of the twisted differential line. Experimental results demonstrate a 6GHz signal can be transmitted with high fidelity using the new approaches. In contrast, only a 100MHz signal can be reliably transmitted using a single-end bus with power/ground shielding. Compared to conventional twisted differential line structure, our new techniques can reduce the magnitude of noise by 45% as witnessed in our simulation. The fourth chapter proposes a buffer insertion and gate sizing algorithm for million plus gates. The algorithm takes a combinational circuit as input instead of individual nets and greatly reduces the buffer and gate cost of the entire circuit. The algorithm has two main features: 1) A circuit partition technique based on the criticality of the primary inputs, which provides the scalability for the algorithm, and 2) A linear programming formulation of non-linear delay versus cost tradeoff, which formulates the simultaneous buffer insertion and gate sizing into linear programming problem. Experimental results on ISCAS85 circuits show that even without the circuit partition technique, the new algorithm achieves 17X speedup compared with path based algorithm. In the meantime, the new algorithm saves 16.0% buffer cost, 4.9% gate cost, 5.8% total cost and results in less circuit delay

    Concurrent optimization strategies for high-performance VLSI circuits

    Get PDF
    In the next generation of VLSI circuits, concurrent optimizations will be essential to achieve the performance challenges. In this dissertation, we present techniques for combining traditional timing optimization techniques to achieve a superior performance;The method of buffer insertion is used in timing optimization to either increase the driving power of a path in a circuit, or to isolate large capacitive loads that lie on noncritical or less critical paths. The procedure of transistor sizing selects the sizes of transistors within a circuit to achieve a given timing specification. Traditional design techniques perform these two optimizations as independent steps during synthesis, even though they are intimately linked and performing them in alternating steps is liable to lead to suboptimal solutions. The first part of this thesis presents a new approach for unifying transistor sizing with buffer insertion. Our algorithm achieve from 5% to 49% area reduction compared with the results of a standard transistor sizing algorithm;The next part of the thesis deals with the problem of collapsing gates for technology mapping. Two new techniques are proposed. The first method, the odd-level transistor replacement (OTR) method, performs technology mapping without the restriction of a fixed library size, and maps a circuit to a virtual library of complex static CMOS gates. The second technique, the Static CMOS/PTL method, uses a mix of static CMOS and pass transistor logic (PTL) to realize the circuit, using the relation between PTL and binary decision diagrams. The methods are very efficient and can handle all ISCAS\u2785 benchmark circuits in minutes. On average, it was found that the OTR method gave 40%, and the Static/PTL gave 50% delay reductions over SIS, with substantial area savings;Finally, we extend the technology mapping work to interleave it with placement in a single optimization. Conventional methods that perform these steps separately will not be adequate for next-generation circuits. Our approach presents an integrated solution to this problem, and shows an average of 28.19%, and a maximum of 78.42% improvement in the delay over a method that performs the two optimizations in separate steps

    Physical design algorithms for asynchronous circuits

    Get PDF
    Asynchronous designs have been demonstrated to be able to achieve both higher performance and lower power compared with their synchronous counterparts. It provides a very promising solution to the emerging challenges in advanced technology. However, due to the lack of proper EDA tool support, the design cycle for asynchronous circuits is much longer compared with the one for synchronous circuits. Thus, even with many advantages, asynchronous circuits are still not the mainstream in the industry. In this thesis, we provides several algorithms to resolve the emerging issues for the physical design of asynchronous circuits. Our proposed algorithms optimize asynchronous circuits using placement, gate sizing, repeater insertion and pipeline buffer insertion techniques. An incremental maximum cycle ratio algorithm is also proposed to speed up the timing analysis of asynchronous circuits

    Variability-Aware VLSI Design Automation For Nanoscale Technologies

    Get PDF
    As technology scaling enters the nanometer regime, design of large scale ICs gets more challenging due to shrinking feature sizes and increasing design complexity. Aggressive scaling causes significant degradation in reliability, increased susceptibility to fabrication and environmental randomness and increased dynamic and leakage power dissipation. In this work, we investigate these scaling issues in large scale integrated systems. This dissertation proposes to develop variability-aware design methodologies by proposing design analysis, design-time optimization, post-silicon tunability and runtime-adaptivity based optimization techniques for handling variability. We discuss our research in the area of variability-aware analysis, specifically focusing on the problem of statistical timing analysis. The first technique presents the concept of error budgeting that achieves significant runtime speedups during statistical timing analysis. The second work presents a general framework for non-linear non-Gaussian statistical timing analysis considering correlations. Further, we present our work on design-time optimization schemes that are applicable during physical synthesis. Firstly, we present a buffer insertion technique that considers wire-length uncertainty and proposes algorithms to perform probabilistic buffer insertion. Secondly, we present a stochastic optimization framework based on Monte-Carlo technique considering fabrication variability. This optimization framework can be applied to problems that can be modeled as linear programs without without imposing any assumptions on the nature of the variability. Subsequently, we present our work on post-silicon tunability based design optimization. This work presents a design management framework that can be used to balance the effort spent on pre-silicon (through gate sizing) and post-silicon optimization (through tunable clock-tree buffers) while maximizing the yield gains. Lastly, we present our work on variability-aware runtime optimization techniques. We look at the problem of runtime supply voltage scaling for dynamic power optimization, and propose a framework to consider the impact of variability on the reliability of such designs. We propose a probabilistic design synthesis technique where reliability of the design is a primary optimization metric

    An efficient and optimal algorithm for simultaneous buffer and wire sizing

    Full text link

    Timing-driven hierarchical global routing with wire-sizing and buffer-insertion for VLSI with multi-routing-layer

    Full text link

    Interconnect delay modeling under exponential input

    Get PDF
    Interconnect has become the dominating factor in determining the performance of VLSI deep submicron designs. With the rapid shrinking of feature size and development in the process technologies, it has been observed that the resistance per unit length of the interconnect continues to increase, capacitance per unit length remains roughly constant, and transistor or gate delay continues to decrease. This had led to the increasing dominance of interconnect delay over logic delay, and this trend is expected to continue. With this being the main bottleneck in realizing high speed circuits, complete understanding of the interconnect delay and thereby efficient and accurate delay circulation has assumed a greater significance in physical design, optimization and fast verification. In this thesis, a interconnect delay model under exponential input is presented. Because of its simple closed form expression, fast computation speed, and fidelity with respect to simulation, Elmore delay model remains popular. More accurate delay computation methods are typically central processing unit intensive and/or difficult to implement. To bridge this gap between accuracy and efficiency/simplicity, a new RC delay metric for interconnects which is as efficient as the Elmore metric, but more accurate, is proposed. However, there is no interconnect delay model considering exponential input waveform existing in the literature. The proposed Exponential Delay Metric uses exponential waveform as input and captures resistive shielding effects by modeling the downstream by a [pi]-model. An application of the delay model to perform interconnect optimization using wire sizing is also presented. Experimental results show that the proposed delay model is significantly more accurate than the existing interconnect delay models

    Interconnect performance estimation models for design planning

    Full text link

    Timing Closure in Chip Design

    Get PDF
    Achieving timing closure is a major challenge to the physical design of a computer chip. Its task is to find a physical realization fulfilling the speed specifications. In this thesis, we propose new algorithms for the key tasks of performance optimization, namely repeater tree construction; circuit sizing; clock skew scheduling; threshold voltage optimization and plane assignment. Furthermore, a new program flow for timing closure is developed that integrates these algorithms with placement and clocktree construction. For repeater tree construction a new algorithm for computing topologies, which are later filled with repeaters, is presented. To this end, we propose a new delay model for topologies that not only accounts for the path lengths, as existing approaches do, but also for the number of bifurcations on a path, which introduce extra capacitance and thereby delay. In the extreme cases of pure power optimization and pure delay optimization the optimum topologies regarding our delay model are minimum Steiner trees and alphabetic code trees with the shortest possible path lengths. We presented a new, extremely fast algorithm that scales seamlessly between the two opposite objectives. For special cases, we prove the optimality of our algorithm. The efficiency and effectiveness in practice is demonstrated by comprehensive experimental results. The task of circuit sizing is to assign millions of small elementary logic circuits to elements from a discrete set of logically equivalent, predefined physical layouts such that power consumption is minimized and all signal paths are sufficiently fast. In this thesis we develop a fast heuristic approach for global circuit sizing, followed by a local search into a local optimum. Our algorithms use, in contrast to existing approaches, the available discrete layout choices and accurate delay models with slew propagation. The global approach iteratively assigns slew targets to all source pins of the chip and chooses a discrete layout of minimum size preserving the slew targets. In comprehensive experiments on real instances, we demonstrate that the worst path delay is within 7% of its lower bound on average after a few iterations. The subsequent local search reduces this gap to 2% on average. Combining global and local sizing we are able to size more than 5.7 million circuits within 3 hours. For the clock skew scheduling problem we develop the first algorithm with a strongly polynomial running time for the cycle time minimization in the presence of different cycle times and multi-cycle paths. In practice, an iterative local search method is much more efficient. We prove that this iterative method maximizes the worst slack, even when restricting the feasible schedule to certain time intervals. Furthermore, we enhance the iterative local approach to determine a lexicographically optimum slack distribution. The clock skew scheduling problem is then generalized to allow for simultaneous data path optimization. In fact, this is a time-cost tradeoff problem. We developed the first combinatorial algorithm for computing time-cost tradeoff curves in graphs that may contain cycles. Starting from the lowest-cost solution, the algorithm iteratively computes a descent direction by a minimum cost flow computation. The maximum feasible step length is then determined by a minimum ratio cycle computation. This approach can be used in chip design for several optimization tasks, e.g. threshold voltage optimization or plane assignment. Finally, the optimization routines are combined into a timing closure flow. Here, the global placement is alternated with global performance optimization. Netweights are used to penalize the length of critical nets during placement. After the global phase, the performance is improved further by applying more comprehensive optimization routines on the most critical paths. In the end, the clock schedule is optimized and clocktrees are inserted. Computational results of the design flow are obtained on real-world computer chips

    Performance and power optimization in VLSI physical design

    Get PDF
    As VLSI technology enters the nanoscale regime, a great amount of efforts have been made to reduce interconnect delay. Among them, buffer insertion stands out as an effective technique for timing optimization. A dramatic rise in on-chip buffer density has been witnessed. For example, in two recent IBM ASIC designs, 25% gates are buffers. In this thesis, three buffer insertion algorithms are presented for the procedure of performance and power optimization. The second chapter focuses on improving circuit performance under inductance effect. The new algorithm works under the dynamic programming framework and runs in provably linear time for multiple buffer types due to two novel techniques: restrictive cost bucketing and efficient delay update. The experimental results demonstrate that our linear time algorithm consistently outperforms all known RLC buffering algorithms in terms of both solution quality and runtime. That is, the new algorithm uses fewer buffers, runs in shorter time and the buffered tree has better timing. The third chapter presents a method to guarantee a high fidelity signal transmission in global bus. It proposes a new redundant via insertion technique to reduce via variation and signal distortion in twisted differential line. In addition, a new buffer insertion technique is proposed to synchronize the transmitted signals, thus further improving the effectiveness of the twisted differential line. Experimental results demonstrate a 6GHz signal can be transmitted with high fidelity using the new approaches. In contrast, only a 100MHz signal can be reliably transmitted using a single-end bus with power/ground shielding. Compared to conventional twisted differential line structure, our new techniques can reduce the magnitude of noise by 45% as witnessed in our simulation. The fourth chapter proposes a buffer insertion and gate sizing algorithm for million plus gates. The algorithm takes a combinational circuit as input instead of individual nets and greatly reduces the buffer and gate cost of the entire circuit. The algorithm has two main features: 1) A circuit partition technique based on the criticality of the primary inputs, which provides the scalability for the algorithm, and 2) A linear programming formulation of non-linear delay versus cost tradeoff, which formulates the simultaneous buffer insertion and gate sizing into linear programming problem. Experimental results on ISCAS85 circuits show that even without the circuit partition technique, the new algorithm achieves 17X speedup compared with path based algorithm. In the meantime, the new algorithm saves 16.0% buffer cost, 4.9% gate cost, 5.8% total cost and results in less circuit delay
    • …
    corecore