238 research outputs found

    DLWUC: Distance and Load Weight Updated Clustering-Based Clock Distribution for SOC Architecture

    Get PDF
    High-clock skew variations and degradation of driving ability of buffers lead to an additional power dissipation in Clock Distribution Network (CDN) that increases the dimensionality of buffers and coordination among flip-flops. The manual threshold level to predict the Region of Interest (ROI) is not applicable in clustering process due to the complexities of excessive wire length and critical delay. This paper proposes the Distance and Load Weight Updated Clustering (DLWUC) to determine the suitable position of logical components. Initially, the DLWUC utilizes the Hybrid Weighted Distance (HWD) to estimate the distance and construct the distance matrix. The weight value extracted from the sorted distance matrix facilitates the projection of buffers. The updated weight value serves as the base for clustering with labeled outputs. The placement of buffer at the suitable place from load weight updated clustering provides the necessary trade-off between clock provision and load balance. The DLWUC discussed in this paper reduces the size of buffers, skew, power and latency compared to the existing topologies

    Throughput-driven floorplanning with wire pipelining

    Get PDF
    The size of future high-performance SoC is such that the time-of-flight of wires connecting distant pins in the layout can be much higher than the clock period. In order to keep the frequency as high as possible, the wires may be pipelined. However, the insertion of flip-flops may alter the throughput of the system due to the presence of loops in the logic netlist. In this paper, we address the problem of floorplanning a large design where long interconnects are pipelined by inserting the throughput in the cost function of a tool based on simulated annealing. The results obtained on a series of benchmarks are then validated using a simple router that breaks long interconnects by suitably placing flip-flops along the wires

    Design methodologies for variation-aware integrated circuits

    Get PDF
    The scaling of VLSI technology has spurred a rapid growth in the semiconductor industry. With the CMOS device dimension scaling to and beyond 90nm technology, it is possible to achieve higher performance and to pack more complex functionalities on a single chip. However, the scaling trend has introduced drastic variation of process and design parameters, leading to severe variability of chip performance in nanometer regime. Also, the manufacturing community projects CMOS will scale for three to four more generations. Since the uncertainties due to variations are expected to increase in each generation, it will significantly impact the performance of design and consequently the yield. Another challenging issue in the nanometer IC design is the high power consumption due to the greater packing density, higher frequency of operation and excessive leakage power. Moreover, the circuits are usually over-designed to compensate for uncertainties due to variations. The over-designed circuits not only make timing closure difficult but also cause excessive power consumption. For portable electronics, excessive power consumption may reduce battery life; for non-portable systems it may impose great difficulties in cooling and packaging. The objective of my research has been to develop design methodologies to address variations and power dissipation for reliable circuit operation. The proposed work has been divided into three parts: the first part addresses the issues related with power/ground noise induced by clock distribution network and proposes techniques to reduce power/ground noise considering the effects of process variations. The second part proposes an elastic pipeline scheme for random circuits with feedback loops. The proposed scheme provides a low-power solution that has the same variation tolerance as the conventional approaches. The third section deals with discrete buffer and wire sizing for link-based non-tree clock network, which is an energy efficient structure for skew tolerance to variations. For the power/ground noise problem, our approach could reduce the peak current and the delay variations by 50% and 51% respectively. Compared to conventional approach, the elastic timing scheme reduces power dissipation by 20% − 27%. The sizing method achieves clock skew reduction of 45% with a small increase in power dissipation

    Timing Closure in Chip Design

    Get PDF
    Achieving timing closure is a major challenge to the physical design of a computer chip. Its task is to find a physical realization fulfilling the speed specifications. In this thesis, we propose new algorithms for the key tasks of performance optimization, namely repeater tree construction; circuit sizing; clock skew scheduling; threshold voltage optimization and plane assignment. Furthermore, a new program flow for timing closure is developed that integrates these algorithms with placement and clocktree construction. For repeater tree construction a new algorithm for computing topologies, which are later filled with repeaters, is presented. To this end, we propose a new delay model for topologies that not only accounts for the path lengths, as existing approaches do, but also for the number of bifurcations on a path, which introduce extra capacitance and thereby delay. In the extreme cases of pure power optimization and pure delay optimization the optimum topologies regarding our delay model are minimum Steiner trees and alphabetic code trees with the shortest possible path lengths. We presented a new, extremely fast algorithm that scales seamlessly between the two opposite objectives. For special cases, we prove the optimality of our algorithm. The efficiency and effectiveness in practice is demonstrated by comprehensive experimental results. The task of circuit sizing is to assign millions of small elementary logic circuits to elements from a discrete set of logically equivalent, predefined physical layouts such that power consumption is minimized and all signal paths are sufficiently fast. In this thesis we develop a fast heuristic approach for global circuit sizing, followed by a local search into a local optimum. Our algorithms use, in contrast to existing approaches, the available discrete layout choices and accurate delay models with slew propagation. The global approach iteratively assigns slew targets to all source pins of the chip and chooses a discrete layout of minimum size preserving the slew targets. In comprehensive experiments on real instances, we demonstrate that the worst path delay is within 7% of its lower bound on average after a few iterations. The subsequent local search reduces this gap to 2% on average. Combining global and local sizing we are able to size more than 5.7 million circuits within 3 hours. For the clock skew scheduling problem we develop the first algorithm with a strongly polynomial running time for the cycle time minimization in the presence of different cycle times and multi-cycle paths. In practice, an iterative local search method is much more efficient. We prove that this iterative method maximizes the worst slack, even when restricting the feasible schedule to certain time intervals. Furthermore, we enhance the iterative local approach to determine a lexicographically optimum slack distribution. The clock skew scheduling problem is then generalized to allow for simultaneous data path optimization. In fact, this is a time-cost tradeoff problem. We developed the first combinatorial algorithm for computing time-cost tradeoff curves in graphs that may contain cycles. Starting from the lowest-cost solution, the algorithm iteratively computes a descent direction by a minimum cost flow computation. The maximum feasible step length is then determined by a minimum ratio cycle computation. This approach can be used in chip design for several optimization tasks, e.g. threshold voltage optimization or plane assignment. Finally, the optimization routines are combined into a timing closure flow. Here, the global placement is alternated with global performance optimization. Netweights are used to penalize the length of critical nets during placement. After the global phase, the performance is improved further by applying more comprehensive optimization routines on the most critical paths. In the end, the clock schedule is optimized and clocktrees are inserted. Computational results of the design flow are obtained on real-world computer chips

    Clock tree synthesis for prescribed skew specifications

    Get PDF
    In ultra-deep submicron VLSI designs, clock network layout plays an increasingly important role in determining circuit performance including timing, power consumption, cost, power supply noise and tolerance to process variations. It is required that a clock layout algorithm can achieve any prescribed skews with the minimum wire length and acceptable slew rate. Traditional zero-skew clock routing methods are not adequate to address this demand, since they tend to yield excessive wire length for prescribed skew targets. The interactions among skew targets, sink location proximities and capacitive load balance are analyzed. Based on this analysis, a maximum delay-target ordering merging scheme is suggested to minimize wire and buffer area, which results in lesser cost, power consumption and vulnerability to process variations. During the clock routing, buffers are inserted simultaneously to facilitate a proper slew rate level and reduce wire snaking. The proposed algorithm is simple and fast for practical applications. Experimental results on benchmark circuits show that the algorithm can reduce the total wire and buffer capacitance by 60% over an extension of the existing zero-skew routing method

    High-performance and Low-power Clock Network Synthesis in the Presence of Variation.

    Full text link
    Semiconductor technology scaling requires continuous evolution of all aspects of physical design of integrated circuits. Among the major design steps, clock-network synthesis has been greatly affected by technology scaling, rendering existing methodologies inadequate. Clock routing was previously sufficient for smaller ICs, but design difficulty and structural complexity have greatly increased as interconnect delay and clock frequency increased in the 1990s. Since a clock network directly influences IC performance and often consumes a substantial portion of total power, both academia and industry developed synthesis methodologies to achieve low skew, low power and robustness from PVT variations. Nevertheless, clock network synthesis under tight constraints is currently the least automated step in physical design and requires significant manual intervention, undermining turn-around-time. The need for multi-objective optimization over a large parameter space and the increasing impact of process variation make clock network synthesis particularly challenging. Our work identifies new objectives, constraints and concerns in the clock-network synthesis for systems-on-chips and microprocessors. To address them, we generate novel clock-network structures and propose changes in traditional physical-design flows. We develop new modeling techniques and algorithms for clock power optimization subject to tight skew constraints in the presence of process variations. In particular, we offer SPICE-accurate optimizations of clock networks, coordinated to reduce nominal skew below 5 ps, satisfy slew constraints and trade-off skew, insertion delay and power, while tolerating variations. To broaden the scope of clock-network-synthesis optimizations, we propose new techniques and a methodology to reduce dynamic power consumption by 6.8%-11.6% for large IC designs with macro blocks by integrating clock network synthesis within global placement. We also present a novel non-tree topology that is 2.3x more power-efficient than mesh structures. We fuse several clock trees to create large-scale redundancy in a clock network to bridge the gap between tree-like and mesh-like topologies. Integrated optimization techniques for high-quality clock networks described in this dissertation strong empirical results in experiments with recent industry-released benchmarks in the presence of process variation. Our software implementations were recognized with the first-place awards at the ISPD 2009 and ISPD 2010 Clock-Network Synthesis Contests organized by IBM Research and Intel Research.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/89711/1/ejdjsy_1.pd

    Advanced Timing and Synchronization Methodologies for Digital VLSI Integrated Circuits

    Get PDF
    This dissertation addresses timing and synchronization methodologies that are critical to the design, analysis and optimization of high-performance, integrated digital VLSI systems. As process sizes shrink and design complexities increase, achieving timing closure for digital VLSI circuits becomes a significant bottleneck in the integrated circuit design flow. Circuit designers are motivated to investigate and employ alternative methods to satisfy the timing and physical design performance targets. Such novel methods for the timing and synchronization of complex circuitry are developed in this dissertation and analyzed for performance and applicability.Mainstream integrated circuit design flow is normally tuned for zero clock skew, edge-triggered circuit design. Non-zero clock skew or multi-phase clock synchronization is seldom used because the lack of design automation tools increases the length and cost of the design cycle. For similar reasons, level-sensitive registers have not become an industry standard despite their superior size, speed and power consumption characteristics compared to conventional edge-triggered flip-flops.In this dissertation, novel design and analysis techniques that fully automate the design and analysis of non-zero clock skew circuits are presented. Clock skew scheduling of both edge-triggered and level-sensitive circuits are investigated in order to exploit maximum circuit performances. The effects of multi-phase clocking on non-zero clock skew, level-sensitive circuits are investigated leading to advanced synchronization methodologies. Improvements in the scalability of the computational timing analysis process with clock skew scheduling are explored through partitioning and parallelization.The integration of the proposed design and analysis methods to the physical design flow of integrated circuits synchronized with a next-generation clocking technology-resonant rotary clocking technology-is also presented. Based on the design and analysis methods presented in this dissertation, a computer-aided design tool for the design of rotary clock synchronized integrated circuits is developed

    VLSI design of high-speed adders for digital signal processing applications.

    Get PDF

    Clock routing for high performance microprocessor designs.

    Get PDF
    Tian, Haitong.Chinese abstract is on unnumbered page.Thesis (M.Phil.)--Chinese University of Hong Kong, 2011.Includes bibliographical references (p. 65-74).Abstracts in English and Chinese.Abstract --- p.iAcknowledgement --- p.iiiChapter 1 --- Introduction --- p.1Chapter 1.1 --- Motivations --- p.1Chapter 1.2 --- Our Contributions --- p.2Chapter 1.3 --- Organization of the Thesis --- p.3Chapter 2 --- Background Study --- p.4Chapter 2.1 --- Traditional Clock Routing Problem --- p.4Chapter 2.2 --- Tree-Based Clock Routing Algorithms --- p.5Chapter 2.2.1 --- Clock Routing Using H-tree --- p.5Chapter 2.2.2 --- Method of Means and Medians(MMM) --- p.6Chapter 2.2.3 --- Geometric Matching Algorithm (GMA) --- p.8Chapter 2.2.4 --- Exact Zero-Skew Algorithm --- p.9Chapter 2.2.5 --- Deferred Merge Embedding (DME) --- p.10Chapter 2.2.6 --- Boundary Merging and Embedding (BME) Algorithm --- p.14Chapter 2.2.7 --- Planar Clock Routing Algorithm --- p.17Chapter 2.2.8 --- Useful-skew Tree Algorithm --- p.18Chapter 2.3 --- Non-Tree Clock Distribution Networks --- p.19Chapter 2.3.1 --- Grid (Mesh) Structure --- p.20Chapter 2.3.2 --- Spine Structure --- p.20Chapter 2.3.3 --- Hybrid Structure --- p.21Chapter 2.4 --- Post-grid Clock Routing Problem --- p.22Chapter 2.5 --- Limitations of the Previous Work --- p.24Chapter 3 --- Post-Grid Clock Routing Problem --- p.26Chapter 3.1 --- Introduction --- p.26Chapter 3.2 --- Problem Definition --- p.27Chapter 3.3 --- Our Approach --- p.30Chapter 3.3.1 --- Delay-driven Path Expansion Algorithm --- p.31Chapter 3.3.2 --- Pre-processing to Connect Critical ports --- p.34Chapter 3.3.3 --- Post-processing to Reduce Capacitance --- p.36Chapter 3.4 --- Experimental Results --- p.39Chapter 3.4.1 --- Experiment Setup --- p.39Chapter 3.4.2 --- Validations of the Delay and Slew Estimation --- p.39Chapter 3.4.3 --- Comparisons with the Tree Grow (TG) Approach --- p.41Chapter 3.4.4 --- Lowest Achievable Delays --- p.42Chapter 3.4.5 --- Simulation Results --- p.42Chapter 4 --- Non-tree Based Post-Grid Clock Routing Problem --- p.44Chapter 4.1 --- Introduction --- p.44Chapter 4.2 --- Handling Ports with Large Load Capacitances --- p.46Chapter 4.2.1 --- Problem Ports Identification --- p.47Chapter 4.2.2 --- Non-Tree Construction --- p.47Chapter 4.2.3 --- Wire Link Selection --- p.48Chapter 4.3 --- Path Expansion in Non-tree Algorithm --- p.51Chapter 4.4 --- Limitations of the Non-tree Algorithm --- p.51Chapter 4.5 --- Experimental Results --- p.51Chapter 4.5.1 --- Experiment Setup --- p.51Chapter 4.5.2 --- Validations of the Delay and Slew Estimation --- p.52Chapter 4.5.3 --- Lowest Achievable Delays --- p.53Chapter 4.5.4 --- Results on New Benchmarks --- p.53Chapter 4.5.5 --- Simulation Results --- p.55Chapter 5 --- Efficient Partitioning-based Extension --- p.57Chapter 5.1 --- Introduction --- p.57Chapter 5.2 --- Partition-based Extension --- p.58Chapter 5.3 --- Experimental Results --- p.61Chapter 5.3.1 --- Experiment Setup --- p.61Chapter 5.3.2 --- Running Time Improvement with Partitioning Technique --- p.61Chapter 6 --- Conclusion --- p.63Bibliography --- p.6

    Design and automation of voltage-scaled clock networks

    Get PDF
    In this dissertation, a vital step of VLSI physical design flow, synthesis of clock distribution networks, is investigated. Clock network synthesis (CNS) involves large and complex optimization problems to achieve high performance and low power demands of current integrated circuits (ICs). Ineffectiveness of existing methodologies to provide high performance at lower voltage nodes is the main driver for this dissertation research. A design and automation flow for voltage-scaled clock networks is proposed to satisfy tight timing constraints at high frequency (for high performance) and low voltage (for low power) operation. One implementation of voltage-scaled clock networks is low (voltage) swing clocking, which is a known technique, yet its applicability remains limited to designs with low performance demands. In this dissertation, novel methodologies are introduced to i) apply low swing clocking to legacy designs as a power saving methodology, ii) develop a complete CNS flow for low swing clocking of high performance ICs. These methodologies include slew-driven approaches that are better suited to future transistor and interconnect technologies. Second implementation of voltage-scaled clock networks is multi-voltage clocking, which is another known technique, yet its applicability remains limited to clock tree topology. In this dissertation, multi-voltage clocking with a clock mesh topology is investigated in order to address a missing aspect in the current IC design flows. Practical considerations of the current IC design flows are also investigated in this dissertation to expand the applicability of the proposed CNS flow. A novel methodology is introduced to facilitate clock gating within low swing clocking. The applicability of low swing clocking to FinFET technology, which is currently the industry norm, is shown to be effective.Ph.D., Electrical Engineering -- Drexel University, 201
    corecore