7 research outputs found

    Interconnect delay estimation models

    Get PDF
    With the continuous scaling down of very large scale integrated (VLSI) technologies and increased die size, the transistors are much smaller, and hence much faster. On the other hand, interconnects are narrower. So they are more resistive and slower in transmitting signals. This trend has led the interconnect delay to become a significant factor in determining the performance of VLSI designs. As the die size becomes larger, global interconnect length becomes longer. Thus, global interconnect delay is beginning to dominate a larger portion of the overall system performance. In order to take the impact of interconnect delay into account, it is very important to have computationally inexpensive and accurate interconnect delay models. The primary contribution of this thesis is to present two new interconnect delay models, called the Fitted Elmore Delay (FED) and the Improved Effective Capacitance Metric (IECM). The FED model is a simple, efficient and reasonably accurate interconnect performance estimation model. This model uses a curve fitting technique to approximate the accurate Hspice delay data. The functional form used in the curve fitting is derived based on the Elmore Delay (ED) model. Thus, our model has all the advantages of the Elmore Delay model. It has a closed form expression as simple as the ED model and is extremely efficient to compute. More importantly, it is significantly more accurate than the ED model. In fact, because of its striking similarity to the ED model, optimization of the delay with respect to the design parameters can be easily done. When applied to interconnect optimization techniques (i.e., wire sizing), the FED model is three to four times more accurate than the Elmore Delay model. On the other hand, like the ED model, the FED has the limitation of ignoring the resistive shielding problem. This problem occurs when the gate no longer sees the total net capacitance due to the high interconnect resistance. The Improved Effective Capacitance Metric (IECM) overcomes the resistive shielding problem. We adopt the methodology of computing the first three Taylor series coefficients of the driving-point admittance in the circuit. The IECM uses these Taylor coefficients to derive a closed form solution for the effective capacitance that captures the resistive shielding characteristics. The IECM can be implemented with similar simplicity as the Elmore Delay model. We have tested the IECM on a single-load circuit and multiple tree topologies. Experiments show that our model is significantly more accurate than other existing interconnect delay models in capturing the resistive shielding characteristics

    Bounded-skew clock and steiner routing under elmore delay

    No full text
    Abstract: We study the minimum-cost bounded-skew routing tree problem under the Elmore delay model. We present two approaches to construct bounded-skew routing trees: (i) the Boundary Merging and Embedding (BME) method which utilizes merging points that are restricted to theboundariesof merging regions, and (ii) the Interior Merging and Embedding (IME) algorithm which employs a sampling strategy and dynamic programming to consider merging points that are interior to, rather than on the boundary of, the merging regions. Our new algorithms allow accurate control of Elmore delay skew, and show the utility of merging points inside merging regions.

    Analysis and optimization of VLSI Clock Distribution Networks for skew variability reduction

    Get PDF
    As VLSI technology moves into the Ultra-Deep Sub-Micron (UDSM) era, manufacturing variations, power supply noise and temperature variations greatly affect the performance and yield of VLSI circuits. Clock Distribution Network (CDN), which is one of the biggest and most important nets in any synchronous VLSI chip, is especially sensitive to these variations. To address this problem variability-aware analysis and optimization techniques for VLSI circuits are needed. In the first part of this thesis an analytical bound for the unwanted skew due to interconnect variation is established. Experimental results show that this bound is safer, tighter and computationally faster than existing approaches. This bound could be used in variation-aware clock tree synthesis.The second part of the thesis deals with optimizing a given clock tree to minimize the unwanted skew variations. Non-tree CDNs have been recognized as a promising approach to overcome the variation problem. We propose a novel non-tree CDN obtained by adding cross links in an existing clock tree. We analyze the effect of the link insertion on clock skew variability and propose link insertion schemes. The non-tree CDNs so obtained are shown to be highly tolerant to skew variability with very little increase in total wire-length. This can be used in applications such as ASIC design where a significant increase in the total wire-length is unacceptable

    Case Studies on Clock Gating and Local Routign for VLSI Clock Mesh

    Get PDF
    The clock is the important synchronizing element in all synchronous digital systems. The difference in the clock arrival time between sink points is called the clock skew. This uncertainty in arrival times will limit operating frequency and might cause functional errors. Various clock routing techniques can be broadly categorized into 'balanced tree' and 'fixed mesh' methods. The skew and delay using the balanced tree method is higher compared to the fixed mesh method. Although fixed mesh inherently uses more wire length, the redundancy created by loops in a mesh structure reduces undesired delay variations. The fixed mesh method uses a single mesh over the entire chip but it is hard to introduce clock gating in a single clock mesh. This thesis deals with the introduction of 'reconfigurability' by using control structures like transmission gates between sub-clock meshes, thus enabling clock gating in clock mesh. By using the optimum value of size for PMOS and NMOS of transmission gate (SZF) and optimum number of transmission gates between sub-clock meshes (NTG) for 4x4 reconfigurable mesh, the average of the maximum skew for all benchmarks is reduced by 18.12 percent compared to clock mesh structure when no transmission gates are used between the sub-clock meshes (reconfigurable mesh with NTG =0). Further, the research deals with a ‘modified zero skew method' to connect synchronous flip-flops or sink points in the circuit to the clock grids of clock mesh. The wire length reduction algorithms can be applied to reduce the wire length used for a local clock distribution network. The modified version of ‘zero skew method’ of local clock routing which is based on Elmore delay balancing aims at minimizing wire length for the given bounded skew of CDN using clock mesh and H-tree. The results of ‘modified zero skew method' (HC_MZSK) show average local wire length reduction of 17.75 percent for all ISPD benchmarks compared to direct connection method. The maximum skew is small for HC_MZSK in most of the test cases compared to other methods of connections like direct connections and modified AHHK. Thus, HC_MZSK for local routing reduces the wire length and maximum skew

    Clock routing for high performance microprocessor designs.

    Get PDF
    Tian, Haitong.Chinese abstract is on unnumbered page.Thesis (M.Phil.)--Chinese University of Hong Kong, 2011.Includes bibliographical references (p. 65-74).Abstracts in English and Chinese.Abstract --- p.iAcknowledgement --- p.iiiChapter 1 --- Introduction --- p.1Chapter 1.1 --- Motivations --- p.1Chapter 1.2 --- Our Contributions --- p.2Chapter 1.3 --- Organization of the Thesis --- p.3Chapter 2 --- Background Study --- p.4Chapter 2.1 --- Traditional Clock Routing Problem --- p.4Chapter 2.2 --- Tree-Based Clock Routing Algorithms --- p.5Chapter 2.2.1 --- Clock Routing Using H-tree --- p.5Chapter 2.2.2 --- Method of Means and Medians(MMM) --- p.6Chapter 2.2.3 --- Geometric Matching Algorithm (GMA) --- p.8Chapter 2.2.4 --- Exact Zero-Skew Algorithm --- p.9Chapter 2.2.5 --- Deferred Merge Embedding (DME) --- p.10Chapter 2.2.6 --- Boundary Merging and Embedding (BME) Algorithm --- p.14Chapter 2.2.7 --- Planar Clock Routing Algorithm --- p.17Chapter 2.2.8 --- Useful-skew Tree Algorithm --- p.18Chapter 2.3 --- Non-Tree Clock Distribution Networks --- p.19Chapter 2.3.1 --- Grid (Mesh) Structure --- p.20Chapter 2.3.2 --- Spine Structure --- p.20Chapter 2.3.3 --- Hybrid Structure --- p.21Chapter 2.4 --- Post-grid Clock Routing Problem --- p.22Chapter 2.5 --- Limitations of the Previous Work --- p.24Chapter 3 --- Post-Grid Clock Routing Problem --- p.26Chapter 3.1 --- Introduction --- p.26Chapter 3.2 --- Problem Definition --- p.27Chapter 3.3 --- Our Approach --- p.30Chapter 3.3.1 --- Delay-driven Path Expansion Algorithm --- p.31Chapter 3.3.2 --- Pre-processing to Connect Critical ports --- p.34Chapter 3.3.3 --- Post-processing to Reduce Capacitance --- p.36Chapter 3.4 --- Experimental Results --- p.39Chapter 3.4.1 --- Experiment Setup --- p.39Chapter 3.4.2 --- Validations of the Delay and Slew Estimation --- p.39Chapter 3.4.3 --- Comparisons with the Tree Grow (TG) Approach --- p.41Chapter 3.4.4 --- Lowest Achievable Delays --- p.42Chapter 3.4.5 --- Simulation Results --- p.42Chapter 4 --- Non-tree Based Post-Grid Clock Routing Problem --- p.44Chapter 4.1 --- Introduction --- p.44Chapter 4.2 --- Handling Ports with Large Load Capacitances --- p.46Chapter 4.2.1 --- Problem Ports Identification --- p.47Chapter 4.2.2 --- Non-Tree Construction --- p.47Chapter 4.2.3 --- Wire Link Selection --- p.48Chapter 4.3 --- Path Expansion in Non-tree Algorithm --- p.51Chapter 4.4 --- Limitations of the Non-tree Algorithm --- p.51Chapter 4.5 --- Experimental Results --- p.51Chapter 4.5.1 --- Experiment Setup --- p.51Chapter 4.5.2 --- Validations of the Delay and Slew Estimation --- p.52Chapter 4.5.3 --- Lowest Achievable Delays --- p.53Chapter 4.5.4 --- Results on New Benchmarks --- p.53Chapter 4.5.5 --- Simulation Results --- p.55Chapter 5 --- Efficient Partitioning-based Extension --- p.57Chapter 5.1 --- Introduction --- p.57Chapter 5.2 --- Partition-based Extension --- p.58Chapter 5.3 --- Experimental Results --- p.61Chapter 5.3.1 --- Experiment Setup --- p.61Chapter 5.3.2 --- Running Time Improvement with Partitioning Technique --- p.61Chapter 6 --- Conclusion --- p.63Bibliography --- p.6

    Synthesis Methodologies for Robust and Reconfigurable Clock Networks

    Get PDF
    In today\u27s aggressively scaled technology nodes, billions of transistors are packaged into a single integrated circuit. Electronic Design Automation (EDA) tools are needed to automatically assemble the transistors into a functioning system. One of the most important design steps in the physical synthesis is the design of the clock network. The clock network delivers a synchronizing clock signal to each sequential element. The clock signal is required to be delivered meeting timing constraints under variations and in multiple operating modes. Synthesizing such clock networks is becoming increasingly difficult with the complex power management methodologies and severe manufacturing variations. Clock network synthesis is an important problem because it has a direct impact on the functional correctness, the maximum operating frequency, and the overall power consumption of each synchronous integrated circuit. In this dissertation, we proposed synthesis methodologies for robust and reconfigurable clock networks. We have made three contributions to this topic. First, we have proposed a clock network optimization framework that can achieve better timing quality than previous frameworks. Our proposed framework improves timing quality by reducing the propagation delay on critical paths in a clock network using buffer sizing and layer assignment. Second, we have proposed a clock tree synthesis methodology that integrates the clock tree synthesis with the clock tree optimization. The methodology improves timing quality by avoiding to synthesize clock trees with topologies that are sensitive to variations. Third, we have proposed a clock network that can reconfigure the topology based on the active mode of operation. Lastly, we conclude the dissertation with future research directions
    corecore