Improved algorithms for link-based non-tree clock networks for skew variability reduction
In nanometer VLSI technology, variation effects such as manufacturing variation, power supply noise and temperature become very significant. As one of the most vital nets in any synchronous VLSI chip, the Clock Distribution Network (CDN) is especially sensitive to these variations. The recently proposed link-based non-tree clock network [1] addresses this problem by constructing a non-tree that is significantly more tolerant to variations than a clock tree. Although the two algorithms proposed in [1] are effective in reducing skew variability, they have a few drawbacks, including high complexity, lengthy links and uneven link distribution across the clock network. In this paper, we propose two new algorithms that overcome these disadvantages. The effectiveness of the proposed algorithms has been validated using HSPICE-based Monte Carlo simulations. Experimental results show that the new algorithms achieve the same or better skew reduction with an average wire length increase of 5%, compared to the 15% wire length increase of the existing algorithms in [1]. Moreover, the new algorithms scale extremely well to big clock networks: the bigger the clock network, the lower the overall link cost (less than 2% for the biggest benchmark we have).
Case Studies on Clock Gating and Local Routing for VLSI Clock Mesh
The clock is the important synchronizing element in all synchronous digital systems. The difference in the clock arrival time between sink points is called the clock skew. This uncertainty in arrival times will limit operating frequency and might cause functional errors.
Clock routing techniques can be broadly categorized into 'balanced tree' and 'fixed mesh' methods. The skew and delay of the balanced tree method are higher than those of the fixed mesh method. Although a fixed mesh inherently uses more wire length, the redundancy created by loops in a mesh structure reduces undesired delay variations. The fixed mesh method uses a single mesh over the entire chip, but it is hard to introduce clock gating in a single clock mesh. This thesis introduces 'reconfigurability' by using control structures such as transmission gates between sub-clock meshes, thus enabling clock gating in a clock mesh. By using the optimum size for the PMOS and NMOS of the transmission gate (SZF) and the optimum number of transmission gates between sub-clock meshes (NTG) for a 4x4 reconfigurable mesh, the average of the maximum skew over all benchmarks is reduced by 18.12 percent compared to a clock mesh structure with no transmission gates between the sub-clock meshes (reconfigurable mesh with NTG = 0).
Further, the research deals with a 'modified zero skew method' to connect synchronous flip-flops or sink points in the circuit to the clock grids of the clock mesh. Wire length reduction algorithms can be applied to reduce the wire length used for a local clock distribution network. The modified version of the 'zero skew method' of local clock routing, which is based on Elmore delay balancing, aims at minimizing wire length for the given bounded skew of a CDN using clock mesh and H-tree. The results of the 'modified zero skew method' (HC_MZSK) show an average local wire length reduction of 17.75 percent for all ISPD benchmarks compared to the direct connection method. The maximum skew is small for HC_MZSK in most of the test cases compared to other connection methods such as direct connections and modified AHHK. Thus, HC_MZSK for local routing reduces both the wire length and the maximum skew.
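The two quantities this abstract relies on, Elmore delay and clock skew, can be illustrated with a small sketch. This is not the thesis's HC_MZSK method. It only computes Elmore delays on a hypothetical RC tree and takes the skew as the spread of sink arrival times. The function names, the node numbering and the choice to lump each wire's capacitance at its downstream node are assumptions made for the example, and the values are in arbitrary units.

```python
from collections import defaultdict


def elmore_delays(parent, res, cap, root):
    """Elmore delay from the root to every node of an RC tree.

    parent[v] is the parent of node v (None for the root), res[v] is the
    resistance of the wire from parent[v] to v, and cap[v] is the
    capacitance lumped at v (an assumed, simplified wire model).
    """
    children = defaultdict(list)
    for node, par in parent.items():
        if par is not None:
            children[par].append(node)

    # Pre-order traversal: every node appears after its parent.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children[node])

    # Accumulate downstream capacitance bottom-up (reverse pre-order).
    downstream = dict(cap)
    for node in reversed(order):
        for child in children[node]:
            downstream[node] += downstream[child]

    # Each wire adds R(wire) * (capacitance below the wire) to the delay.
    delay = {root: 0.0}
    for node in order:
        if node != root:
            delay[node] = delay[parent[node]] + res[node] * downstream[node]
    return delay


def skew(delay, sinks):
    """Clock skew: largest minus smallest arrival time over the sinks."""
    arrivals = [delay[s] for s in sinks]
    return max(arrivals) - min(arrivals)


# Hypothetical tree: 0 (root) -> 1 -> sinks 2 and 3.
parent = {0: None, 1: 0, 2: 1, 3: 1}
res = {1: 1.0, 2: 2.0, 3: 4.0}
cap = {0: 0.0, 1: 1.0, 2: 3.0, 3: 1.0}

delays = elmore_delays(parent, res, cap, root=0)
tree_skew = skew(delays, sinks=[2, 3])
print(delays[2], delays[3], tree_skew)
```

In this toy tree the two sinks see different delays, so the skew is non-zero. A zero-skew style method would adjust wire lengths until the sink delays match.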
Reducing Clock Skew Variability via Cross Links
Increasingly significant variational effects present a great challenge for delivering desired clock skew reliably. Non-tree clock networks have been recognized as a promising approach to overcome the variation problem. Existing non-tree clock routing methods are restricted to a few simple or regular structures, and often consume an excessive amount of wire. In this paper, we propose constructing a low cost non-tree clock network by inserting cross links in a given clock tree. The effect of the link insertion on clock skew variability is analyzed. Based on the analysis, two link insertion schemes are proposed. These methods can quickly convert a clock tree to a non-tree with significantly lower skew variability and a very small amount of extra wire. Further, they can be applied easily to the recently popular non-zero skew routing. Experimental results on benchmark circuits show that this approach can achieve significant skew variability reduction with less than a 2% increase in wirelength.
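The idea that a cross link makes skew less sensitive to variation can be illustrated with a toy calculation. The sketch below is not either of the paper's link insertion schemes. It assumes a first-moment (Elmore-style) delay model for a general RC network, in which the delay vector d solves G d = c, where G is the nodal conductance matrix and c holds the node capacitances. The helper names, the three-node network and all element values are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def first_moment_delays(n, source_res, resistors, caps):
    """First-moment delays of an RC network driven by an ideal source.

    source_res: list of (node, r) resistors from the source to a node.
    resistors:  list of (a, b, r) resistors between two nodes.
    caps:       capacitance to ground at each node.
    Works for trees and for networks with loops (cross links).
    """
    G = [[0.0] * n for _ in range(n)]
    for a, r in source_res:
        G[a][a] += 1.0 / r
    for a, b, r in resistors:
        g = 1.0 / r
        G[a][a] += g
        G[b][b] += g
        G[a][b] -= g
        G[b][a] -= g
    return solve(G, caps)


def sink_skew(delays, sinks):
    arrivals = [delays[s] for s in sinks]
    return max(arrivals) - min(arrivals)


# Hypothetical network: source -> node 0 -> sinks 1 and 2.
driver = [(0, 1.0)]
tree = [(0, 1, 1.0), (0, 2, 1.0)]
linked = tree + [(1, 2, 2.0)]          # cross link between the two sinks

nominal_caps = [0.0, 2.0, 2.0]         # balanced loads: zero nominal skew
varied_caps = [0.0, 1.0, 3.0]          # assumed load variation at the sinks

skew_tree = sink_skew(first_moment_delays(3, driver, tree, varied_caps), [1, 2])
skew_link = sink_skew(first_moment_delays(3, driver, linked, varied_caps), [1, 2])
skew_nominal = sink_skew(first_moment_delays(3, driver, linked, nominal_caps), [1, 2])
print(skew_tree, skew_link, skew_nominal)
```

Under these assumed values the link leaves the nominal skew at zero, while the skew caused by the load variation drops from 2 to 1 (up to floating-point rounding). A stronger (lower-resistance) link would reduce it further, at the cost of more wire.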
Design methodologies for variation-aware integrated circuits
The scaling of VLSI technology has spurred a rapid growth in the semiconductor
industry. With the CMOS device dimension scaling to and beyond 90nm technology,
it is possible to achieve higher performance and to pack more complex functionalities
on a single chip. However, the scaling trend has introduced drastic variation of
process and design parameters, leading to severe variability of chip performance in
nanometer regime. Also, the manufacturing community projects CMOS will scale for
three to four more generations. Since the uncertainties due to variations are expected
to increase in each generation, it will significantly impact the performance of design
and consequently the yield.
Another challenging issue in the nanometer IC design is the high power consumption
due to the greater packing density, higher frequency of operation and excessive
leakage power. Moreover, the circuits are usually over-designed to compensate for
uncertainties due to variations. The over-designed circuits not only make timing closure
difficult but also cause excessive power consumption. For portable electronics,
excessive power consumption may reduce battery life; for non-portable systems it
may impose great difficulties in cooling and packaging.
The objective of my research has been to develop design methodologies to address
variations and power dissipation for reliable circuit operation. The proposed work
has been divided into three parts: the first part addresses the issues related with
power/ground noise induced by clock distribution network and proposes techniques to reduce power/ground noise considering the effects of process variations. The second
part proposes an elastic pipeline scheme for random circuits with feedback loops. The
proposed scheme provides a low-power solution that has the same variation tolerance
as the conventional approaches. The third section deals with discrete buffer and wire
sizing for link-based non-tree clock network, which is an energy efficient structure for
skew tolerance to variations.
For the power/ground noise problem, our approach could reduce the peak current
and the delay variations by 50% and 51% respectively. Compared to the conventional
approach, the elastic timing scheme reduces power dissipation by 20% to 27%. The
sizing method achieves clock skew reduction of 45% with a small increase in power
dissipation.
Reducing clock skew variability via cross links
Increasingly significant variational effects present a great challenge for delivering desired clock skew reliably. Non-tree clock networks have been recognized as a promising approach to overcome the variation problem. Existing non-tree clock routing methods are restricted to a few simple or regular structures, and often consume an excessive amount of wirelength. In this paper, we propose constructing a low cost non-tree clock network by inserting cross links in a given clock tree. The effects of the link insertion on clock skew variability are analyzed. Based on the analysis, we propose two link insertion schemes that can quickly convert a clock tree to a non-tree with significantly lower skew variability and very limited wirelength increase. In these schemes, the complicated non-tree delay computation is circumvented. Further, they can be applied easily to the recently popular non-zero skew routing. The effectiveness of the proposed techniques has been validated through SPICE-based Monte Carlo simulations.
Layout optimization in ultra deep submicron VLSI design
As fabrication technology keeps advancing, many deep submicron (DSM) effects have become
increasingly evident and can no longer be ignored in Very Large Scale Integration
(VLSI) design. In this dissertation, we study several deep submicron problems (e.g., coupling
capacitance, antenna effect and delay variation) and propose optimization techniques
to mitigate these DSM effects in the place-and-route stage of VLSI physical design.
The place-and-route stage of physical design can be further divided into several steps:
(1) Placement, (2) Global routing, (3) Layer assignment, (4) Track assignment, and (5) Detailed
routing. Among them, layer/track assignment assigns major trunks of wire segments
to specific layers/tracks in order to guide the underlying detailed router. In this dissertation,
we have proposed techniques to handle coupling capacitance at the layer/track assignment
stage, antenna effect at the layer assignment, and delay variation at the ECO (Engineering
Change Order) placement stage, respectively. More specifically, at layer assignment, we
have proposed an improved probabilistic model to quickly estimate the amount of coupling
capacitance for timing optimization. Antenna effects are also handled at layer assignment
through a linear-time tree partitioning algorithm. At the track assignment stage, timing is
further optimized using a graph based technique. In addition, we have proposed a novel
gate splitting methodology to reduce delay variation in the ECO placement considering
spatial correlations. Experimental results on benchmark circuits showed the effectiveness
of our approaches.
Fast interconnect optimization
As Very Large Scale Integration (VLSI) circuit technology continues to scale
and operating frequencies increase, delay optimization techniques for interconnect
are increasingly important for achieving timing closure of high performance designs.
For the gigahertz microprocessor and multi-million gate ASIC designs it is crucial to
have fast algorithms in the design automation tools for many classical problems in
the field to shorten time to market of the VLSI chip. This research presents algorithmic
techniques and constructive models for two such problems: (1) Fast buffer
insertion for delay optimization, (2) Wire sizing for delay optimization and variation
minimization on non-tree networks.
For the buffer insertion problem, this dissertation proposes several innovative
speedup techniques for different problem formulations and realistic requirements.
For the basic buffer insertion problem, an O(n log^2 n) optimal algorithm that runs
much faster than the previous classical van Ginneken's O(n^2) algorithm is proposed,
where n is the number of buffer positions. For modern design libraries that contain
hundreds of buffers, this research also proposes an optimal algorithm in O(bn^2) time
for b buffer types, a significant improvement over the previous O(b^2 n^2) algorithm
by Lillis, Cheng and Lin. For nets with small numbers of sinks and large numbers
of buffer positions, a simple O(mn) optimal algorithm is proposed, where m is the
number of sinks. For the buffer insertion with minimum cost problem, the problem is
first proved to be NP-complete. Then several optimal and approximation techniques
are proposed to further speed up the buffer insertion algorithm with resource control
for big industrial designs.
For the wire sizing problem, we propose a systematic method to size the wires of
general non-tree RC networks. The new method can be used for delay optimization
and variation reduction.
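As background for the buffer insertion results, the following is a minimal sketch of the classical van Ginneken-style dynamic program that the abstract uses as its baseline. It is not the dissertation's faster algorithm. It is restricted to a two-pin net with one buffer type, uses an Elmore delay model with half of each segment's capacitance charged through that segment, and re-sorts the candidate list at every position for simplicity. The names `Buffer`, `prune` and `best_slack_two_pin` and all numeric values are assumptions made for the example.

```python
from typing import List, NamedTuple, Tuple


class Buffer(NamedTuple):
    cin: float    # input capacitance
    rout: float   # output (driving) resistance
    delay: float  # intrinsic delay


def prune(cands: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep only non-dominated (capacitance, required time) candidates.

    A candidate is dominated if another one has no more capacitance and
    no smaller required arrival time.
    """
    kept, best_q = [], float("-inf")
    for c, q in sorted(cands, key=lambda cq: (cq[0], -cq[1])):
        if q > best_q:
            kept.append((c, q))
            best_q = q
    return kept


def best_slack_two_pin(segments, sink_cap, sink_rat, driver_res, buf):
    """Best required arrival time at the driver of a two-pin net.

    segments lists (resistance, capacitance) of each wire piece, ordered
    from the sink toward the driver. One candidate buffer position sits
    at the upstream end of every segment.
    """
    cands = [(sink_cap, sink_rat)]
    for r, c in segments:
        # Propagate every candidate across the wire segment.
        cands = [(C + c, Q - r * (c / 2 + C)) for C, Q in cands]
        # Option: insert a buffer here. All buffered candidates share the
        # same capacitance, so only the one with the best Q can survive.
        buffered = max(Q - buf.delay - buf.rout * C for C, Q in cands)
        cands = prune(cands + [(buf.cin, buffered)])
    return max(Q - driver_res * C for C, Q in cands)


# Hypothetical net: four identical segments, arbitrary units.
segments = [(2.0, 2.0)] * 4
buf = Buffer(cin=1.0, rout=1.0, delay=1.0)
never_used = Buffer(cin=1.0, rout=1.0, delay=1e9)  # effectively disables buffering

with_buffers = best_slack_two_pin(segments, 1.0, 0.0, 1.0, buf)
without_buffers = best_slack_two_pin(segments, 1.0, 0.0, 1.0, never_used)
print(with_buffers, without_buffers)
```

With a required arrival time of 0 at the sink, the returned value is the negative of the best achievable delay. In this example buffering improves it from -49 to -31. The pruning step is what keeps the candidate list small, and faster algorithms for this problem are largely about performing the propagate, insert and prune steps more efficiently.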
High-performance and Low-power Clock Network Synthesis in the Presence of Variation.
Semiconductor technology scaling requires continuous evolution of all aspects of physical
design of integrated circuits. Among the major design steps, clock-network synthesis
has been greatly affected by technology scaling, rendering existing methodologies inadequate.
Clock routing was previously sufficient for smaller ICs, but design difficulty and
structural complexity have greatly increased as interconnect delay and clock frequency increased
in the 1990s. Since a clock network directly influences IC performance and often
consumes a substantial portion of total power, both academia and industry developed synthesis
methodologies to achieve low skew, low power and robustness from PVT variations.
Nevertheless, clock network synthesis under tight constraints is currently the least automated
step in physical design and requires significant manual intervention, undermining
turn-around-time. The need for multi-objective optimization over a large parameter space
and the increasing impact of process variation make clock network synthesis particularly
challenging.
Our work identifies new objectives, constraints and concerns in the clock-network synthesis
for systems-on-chips and microprocessors. To address them, we generate novel
clock-network structures and propose changes in traditional physical-design flows. We
develop new modeling techniques and algorithms for clock power optimization subject
to tight skew constraints in the presence of process variations. In particular, we offer
SPICE-accurate optimizations of clock networks, coordinated to reduce nominal skew below
5 ps, satisfy slew constraints and trade off skew, insertion delay and power, while
tolerating variations. To broaden the scope of clock-network-synthesis optimizations, we
propose new techniques and a methodology to reduce dynamic power consumption by
6.8%-11.6% for large IC designs with macro blocks by integrating clock network synthesis
within global placement. We also present a novel non-tree topology that is 2.3x more
power-efficient than mesh structures. We fuse several clock trees to create large-scale redundancy
in a clock network to bridge the gap between tree-like and mesh-like topologies.
Integrated optimization techniques for high-quality clock networks described in this dissertation
yield strong empirical results in experiments with recent industry-released benchmarks
in the presence of process variation. Our software implementations were recognized with
the first-place awards at the ISPD 2009 and ISPD 2010 Clock-Network Synthesis Contests
organized by IBM Research and Intel Research.
Ph.D., Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies.
http://deepblue.lib.umich.edu/bitstream/2027.42/89711/1/ejdjsy_1.pd
Cross link insertion for variation driven clock network construction.
Clock skew caused by variation is one of the most important problems in clock network synthesis today. Even if a clock network is designed to have zero skew, variations such as those in capacitive load and power supply will cause differences in the arrival time of a clock signal. Non-tree clock networks are considered an effective way to address the skew variation problem. Due to its inherent redundancy, a clock mesh is very tolerant to variation. However, it consumes much more power than a clock tree. A link-based non-tree clock network is an economical way to reduce clock skew caused by variation. Instead of using a dense mesh, only a number of links are inserted into a tree, so the power increase is small. Several existing works focus on the effect of cross links as well as the construction of such cross link structures. However, it is still not very clear where cross links should be inserted to achieve the most clock skew reduction with small wire resources. In this thesis, we propose a new method that uses linear programming to solve this problem. In our approach, clock skew in a non-tree clock network is computed using load redistribution and non-tree decomposition. The delay information obtained is then used to select the node pairs for cross link insertion. Our methodology tries to insert cross links where skew can be reduced most effectively. Our method also considers the tradeoff between cross link length and skew reduction effect. We compare our results with the most similar work on this problem [1] and a recent work [4] which inserts links between internal nodes of a tree. Experiments show that our method can reduce skew under variation effectively.
We achieve 28% clock skew reduction with only 40% link resources.
Qian, Fuqiang. Thesis (M.Phil.), Chinese University of Hong Kong, 2012. Includes bibliographical references (leaves 51-55).
Contents:
Abstract
Acknowledgement
Chapter 1: Introduction
1.1 Clock Distribution Network
1.2 Our Contributions
1.3 Organization of the Thesis
Chapter 2: Literature Review
2.1 Exact Zero Skew
2.2 DME Algorithm
2.3 Combinatorial Algorithms for Fast Clock Mesh Optimization
2.4 MeshWorks: An Efficient Framework for Planning, Synthesis and Optimization of Clock Mesh Networks
2.5 Reducing Clock Skew Variability via Cross Links
2.6 Statistical Based Link Insertion for Robust Clock Network Design
2.7 Variation Tolerant Buffered Clock Network Synthesis with Cross Links
2.8 Cross Link Insertion for Improving Tolerance to Variations in Clock Network Synthesis
Chapter 3: Clock Network Construction with Cross Links
3.1 Signal Delay and Clock Skew in Non-tree Clock Network
3.1.1 Computing Delay in Non-tree Network
3.1.2 Effect of a Cross Link on Clock Skew
3.2 Link Insertion for Non-tree Clock Network
3.2.1 Motivation of Computing Delay for Link Insertion
3.2.2 Overall Flow for Cross Link Insertion
3.2.3 Linear Program for Selecting Node Pairs
3.2.4 Reducing the Number of Optimizations
3.2.5 Experimental Results
Chapter 4: Buffered Clock Network with Cross Links
4.1 Link Insertion in Buffered Clock Network
4.1.1 Delay Calculation in Buffered Clock Network
4.1.2 Linear Program Formulation for Buffered Clock Network
4.2 Experimental Results and Comparison
4.3 Possible Extensions
4.3.1 Link Insertion at Internal Nodes
4.3.2 Modeling Clock Buffer Delay Variation
Chapter 5: Conclusion
Bibliography