Design Methodology for Mesh-Based Clock Networks
Thesis (Ph.D.), Seoul National University Graduate School: Department of Electrical and Computer Engineering, February 2015. Taewhan Kim.
The clock distribution network in a synchronous digital circuit delivers a clock signal to every storage element (i.e., clock sink) in the circuit. However, continued technology scaling increases PVT (process-voltage-temperature) variation. The resulting increase in clock skew variation is highly likely to cause performance degradation or system failure at run time. To mitigate clock skew variation, many researchers have recently taken a strong interest in the clock mesh network. The clock mesh structure is excellent at tolerating timing variation, but it consumes significantly more power because it uses extensive mesh wire and buffer resources. Thus, optimizing the resources required in clock mesh synthesis while maintaining variation tolerance is crucially important. Three major tasks most strongly affect the cost of the resulting clock mesh: (1) mesh segment allocation, (2) mesh buffer allocation and sizing, and (3) binding of clock sinks to mesh segments. Previous clock mesh optimization approaches solve the three tasks sequentially, one at a time. This keeps run-time complexity manageable at the expense of result quality. However, the three tasks are tightly interrelated, so optimizing all three simultaneously is essential to synthesizing an economical clock mesh network, provided the run time permits it. In this dissertation, we propose an approach that tackles the problem in an integrated fashion. It combines the three tasks into an iterative framework of incremental updates and solves them simultaneously to find a globally optimal allocation of mesh resources, while taking the clock skew tolerance constraints into account.
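The sketch below is a minimal, hypothetical illustration of an iterative framework of incremental updates under a skew constraint. It is not the dissertation's algorithm, and it rests on loudly simplified assumptions:

- a toy RC model with unit wire conductance and unit node capacitance;
- driver conductance proportional to an integer `size`;
- delays taken as the first moment tau = G^{-1} C 1;
- no sink binding.

In each iteration, two kinds of moves compete in one ranking: deleting a mesh segment and downsizing a driver by one step. The best move is accepted only if the skew bound still holds. All function names are made up for this example.

```python
import numpy as np

def grid_mesh(rows, cols, g=1.0):
    """Segments (a, b, conductance) of a rows x cols mesh; node id = r * cols + c."""
    segs = []
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            if c + 1 < cols:
                segs.append((node, node + 1, g))
            if r + 1 < rows:
                segs.append((node, node + cols, g))
    return segs

def all_driven(n, segments, driver_nodes):
    """True if every node is wired (directly or through segments) to some driver."""
    adj = {i: [] for i in range(n)}
    for a, b, _ in segments:
        adj[a].append(b)
        adj[b].append(a)
    seen, stack = set(driver_nodes), list(driver_nodes)
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def skew(n, segments, sizes, caps, sinks, g_unit=1.0):
    """Skew of the first-moment delays tau = G^{-1} C 1 in a toy RC mesh."""
    G = np.zeros((n, n))
    for a, b, g in segments:
        G[a, a] += g
        G[b, b] += g
        G[a, b] -= g
        G[b, a] -= g
    for node, size in sizes.items():
        G[node, node] += size * g_unit  # hypothetical: driver conductance ~ size
    tau = np.linalg.solve(G, caps)
    return tau[sinks].max() - tau[sinks].min()

def optimize(n, segments, sizes, caps, sinks, skew_bound, wire_cost=1.0):
    """Greedy loop; cost = wire_cost * #segments + total driver size."""
    segments, sizes = list(segments), dict(sizes)
    while True:
        # Segment deletions and driver downsizings are ranked together.
        moves = [(wire_cost, segments[:i] + segments[i + 1:], sizes)
                 for i in range(len(segments))]
        moves += [(1.0, segments, {**sizes, k: s - 1})
                  for k, s in sizes.items() if s > 1]
        best = None
        for gain, segs, szs in moves:
            if not all_driven(n, segs, szs):
                continue  # an undriven node would make G singular
            sk = skew(n, segs, szs, caps, sinks)
            # Prefer the largest cost reduction, then the smallest skew.
            if sk <= skew_bound and (best is None or (gain, -sk) > best[0]):
                best = ((gain, -sk), segs, szs)
        if best is None:
            return segments, sizes
        _, segments, sizes = best

# Usage: 3x3 mesh with two corner drivers; allow the skew to grow by 50%.
n = 9
segs0, sizes0 = grid_mesh(3, 3), {0: 4, 8: 4}
caps, sinks = np.ones(n), np.arange(n)
bound = 1.5 * skew(n, segs0, sizes0, caps, sinks)
segs, sizes = optimize(n, segs0, sizes0, caps, sinks, bound)
print(len(segs), "segments kept; driver sizes", sizes)
```

Every skew check here re-solves the full system. Avoiding that repeated cost is the motivation for the speed-up techniques mentioned next.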
The core parts of this dissertation are a precise analysis of the relations among the resource optimization tasks and a mechanism for integrating those tasks effectively and efficiently. In particular, to address the run-time problem, we propose a set of speed-up techniques tailored to fast clock skew estimation during mesh resource optimization: RC circuit modeling that eliminates redundant matrix multiplications, a sliding window scheme, and fast estimation of buffer sizing effects. We also introduce early decision policies. In summary, this dissertation presents an efficient design methodology for clock mesh synthesis that integrates the three tasks and reduces run-time complexity.
Abstract
Contents
List of Figures
List of Tables
1 Introduction
1.1 Introduction
1.2 Contributions of This Dissertation
2 Background
2.1 Clock Distribution Network
2.2 Clock Network Topologies
2.3 Design Metrics of Clock Network
2.4 The Effects of Variations on Clock Skew
3 Clock Mesh Synthesis Flow
3.1 Elements of Clock Mesh
3.2 Conventional Clock Mesh Synthesis Overview
3.3 Initial Grid Generation
3.4 Mesh Buffer Placement and Sizing
3.5 Clock Mesh Optimization
4 Integrated Resource Allocation and Binding in Clock Mesh Synthesis
4.1 Introduction
4.2 Observations
4.3 Framework of Clock Mesh Optimization
4.3.1 Incremental Resource Updates
4.3.2 Constraints for Variation Tolerance
4.3.3 Early Decision Policies
4.3.4 Time Complexity Analysis
4.4 Fast Clock Skew Estimation Techniques
4.4.1 Partially Reusing Matrix Multiplication for Incremental Updates
4.4.2 Adopting Sliding Window Scheme
4.4.3 Adjusting Delay Caused by Buffer Resizing
4.5 Experimental Results
4.5.1 Experimental Environments
4.5.2 Resource Requirement and Variation Tolerance Comparison
4.5.3 Comparison with Clock Mesh Optimization using Worst Case Timing Analysis of Commercial Tool
4.5.4 Analysis of the Effect of Proposed Techniques
4.5.5 Run Time Analysis
4.5.6 Accuracy and Run Time of Fast Clock Skew Estimation
4.5.7 Electromigration Analysis
4.5.8 Run-time Analysis in Multi-thread Computing Environment
4.6 Summary
5 Conclusion
Abstract in Korean
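The abstract above credits part of the speed-up to reusing RC-circuit computations instead of repeating matrix work after every incremental update. The thesis's exact scheme is not given here. One standard way to obtain that kind of reuse is the Sherman-Morrison formula, shown below only as an illustration.

Deleting a mesh segment of conductance g between nodes a and b changes the nodal conductance matrix G by the rank-1 term -g u u^T, where u = e_a - e_b. As a result, G^{-1} can be updated in O(n^2) instead of being re-inverted in O(n^3). The helper names and the first-moment delay model tau = G^{-1} C 1 are assumptions of this sketch.

```python
import numpy as np

def conductance_matrix(n, segments, drivers):
    """Nodal conductance matrix of a resistive mesh; drivers tie nodes to the source."""
    G = np.zeros((n, n))
    for a, b, g in segments:
        G[a, a] += g
        G[b, b] += g
        G[a, b] -= g
        G[b, a] -= g
    for node, g in drivers:
        G[node, node] += g
    return G

def remove_segment(G_inv, a, b, g):
    """Sherman-Morrison update of G^{-1} after deleting segment (a, b) of conductance g.

    Deleting the segment changes G by -g * u u^T with u = e_a - e_b, so
    (G - g u u^T)^{-1} = G^{-1} + g (G^{-1} u)(G^{-1} u)^T / (1 - g u^T G^{-1} u).
    """
    u = np.zeros(G_inv.shape[0])
    u[a], u[b] = 1.0, -1.0
    w = G_inv @ u  # G^{-1} u; G^{-1} is symmetric, so u^T G^{-1} = w^T
    denom = 1.0 - g * (u @ w)
    if abs(denom) < 1e-12:
        raise ValueError("deleting this segment leaves part of the mesh undriven")
    return G_inv + g * np.outer(w, w) / denom

def delays(G_inv, caps):
    """First-moment (Elmore-style) delays of the RC mesh: tau = G^{-1} C 1."""
    return G_inv @ caps

# Usage: a 4-node ring 0-1-3-2-0 with a single driver at node 0.
segs = [(0, 1, 1.0), (1, 3, 1.0), (3, 2, 1.0), (2, 0, 1.0)]
drivers = [(0, 2.0)]
caps = np.ones(4)
G_inv = np.linalg.inv(conductance_matrix(4, segs, drivers))
G_inv_new = remove_segment(G_inv, 1, 3, 1.0)  # O(n^2) update, no fresh inverse
tau = delays(G_inv_new, caps)
print("skew after deleting (1, 3):", tau.max() - tau.min())
```

A greedy optimizer can keep one G^{-1} and update it after each accepted deletion, so only the candidate evaluation touches the full matrix.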
Variation and power issues in VLSI clock networks
The clock distribution network (CDN) is an important component of any synchronous logic circuit. Its function is to deliver the clock signal to the clock
sinks. Clock skew is defined as the difference in the arrival times of the clock signal at
the clock sinks. Higher uncertainty in skew (due to PVT variations) degrades circuit
performance by reducing the maximum allowable logic delay between any two sequential
elements. Aggressive frequency scaling has also led to high power consumption, especially in the CDN. This dissertation addresses variation and power issues in the design of
current and potential future CDNs. The research detailed in this work presents algorithmic techniques for the following problems: (1) variation tolerance in useful-skew
design, (2) link insertion for buffered clock nets, (3) methodology and algorithms for
rotary clocking, and (4) clock mesh optimization for the skew-power trade-off.
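The performance argument can be made concrete with the standard setup-time constraint between a launching and a capturing flip-flop. This is the textbook form, not taken from the dissertation, and the parameter names are illustrative. Any uncertainty in skew comes directly out of the time budget left for logic.

```python
def max_logic_delay(period, t_launch, t_capture, t_clk_to_q, t_setup, skew_uncertainty=0.0):
    """Largest combinational delay between two flip-flops that still meets setup.

    Setup constraint: t_launch + t_clk_to_q + d_logic + t_setup <= t_capture + period.
    With nominal skew s = t_launch - t_capture, d_logic <= period - s - t_clk_to_q - t_setup.
    A skew uncertainty of +/- u must be budgeted in the worst direction, costing u.
    """
    skew = t_launch - t_capture
    return period - skew - t_clk_to_q - t_setup - skew_uncertainty

# Usage (all times in ps): a 1 GHz clock with zero nominal skew.
print(max_logic_delay(1000, 0, 0, 50, 30))                         # 920.0
print(max_logic_delay(1000, 0, 0, 50, 30, skew_uncertainty=40.0))  # 880.0
```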
For clock trees, this dissertation presents techniques to integrate the different
aspects of clock tree synthesis (skew scheduling, abstract topology, and layout embedding) into a single framework that targets tolerance to variations. This research addresses the issues
involved in inserting cross-links in a buffered clock tree and proposes design criteria
to avoid the risk of short-circuit current. Rotary clocking is a promising new clocking
scheme that consists of unterminated rings formed by differential transmission lines.
Rotary clocking achieves reductions in both power dissipation and clock skew. This dissertation
addresses the issues in adapting current CAD methodology to rotary clocks and details an alternative methodology with corresponding algorithmic techniques. The clock
mesh is a popular form of CDN used in high-performance systems. The problem
of simultaneous sizing and placement of mesh buffers in a clock mesh is addressed.
The algorithms presented remove edges from the clock mesh to trade off skew
tolerance for lower power.
For clock trees as well as link insertion, our experiments indicate a significant reduction in variation-induced clock skew. For the clock mesh, experimental results indicate
an average 18.5% reduction in power with a 1.3% delay penalty. In summary, this dissertation details methodologies and algorithms that address two critical issues in current and potential future CDNs: variation
and power dissipation.
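Removing mesh edges saves power because a clock net switches its entire capacitance every cycle. The standard dynamic-power relation P = C Vdd^2 f therefore applies with an activity factor of 1. The numbers in this sketch are illustrative only, not results from the dissertation. Lumping sink and buffer input capacitance into a single term is a simplifying assumption.

```python
def clock_power_w(wire_len_um, cap_per_um_ff, other_cap_ff, vdd_v, freq_hz):
    """Dynamic power of a clock net: P = C_total * Vdd^2 * f (activity factor 1).

    The clock charges and discharges its whole capacitance once per cycle.
    other_cap_ff lumps sink and buffer input capacitances (an assumption).
    """
    c_total_f = (wire_len_um * cap_per_um_ff + other_cap_ff) * 1e-15
    return c_total_f * vdd_v ** 2 * freq_hz

# Usage with illustrative numbers: removing 20% of the mesh wire.
full = clock_power_w(10_000, 0.2, 1_000, 1.0, 1e9)
pruned = clock_power_w(8_000, 0.2, 1_000, 1.0, 1e9)
print(f"{full * 1e3:.2f} mW -> {pruned * 1e3:.2f} mW")
```

The skew side of the trade-off is not captured here. Fewer edges mean fewer redundant paths, which is why edge removal must be checked against a skew-tolerance bound.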
An Extensible Timing Infrastructure for Adaptive Large-scale Applications
Real-time access to accurate and reliable timing information is necessary to
profile scientific applications, and crucial as simulations become increasingly
complex, adaptive, and large-scale. The Cactus Framework provides flexible and
extensible capabilities for timing information through a well-designed
infrastructure and timing API. Applications built with Cactus automatically
gain access to built-in timers, such as gettimeofday and getrusage,
system-specific hardware clocks, and high-level interfaces such as PAPI. We
describe the Cactus timer interface, its motivation, and its implementation. We
then demonstrate how this timing information can be used by an example
scientific application to profile itself, and to dynamically adapt itself to a
changing environment at run time.
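Cactus's own timer API is not reproduced here. As an illustration of the underlying idea, the hypothetical Python sketch below puts several clock sources behind one named-timer interface. It uses the standard library's `time.perf_counter` for wall-clock time and `time.process_time` for process CPU time. The class and method names are invented for this example.

```python
import time
from contextlib import contextmanager

class TimerRegistry:
    """Named timers that accumulate wall-clock and CPU time (hypothetical API)."""

    def __init__(self):
        self.totals = {}  # name -> [wall seconds, CPU seconds, number of calls]

    @contextmanager
    def timer(self, name):
        wall0, cpu0 = time.perf_counter(), time.process_time()
        try:
            yield
        finally:
            entry = self.totals.setdefault(name, [0.0, 0.0, 0])
            entry[0] += time.perf_counter() - wall0
            entry[1] += time.process_time() - cpu0
            entry[2] += 1

    def report(self):
        return {name: {"wall_s": w, "cpu_s": c, "calls": k}
                for name, (w, c, k) in self.totals.items()}

# Usage: time two calls of a "step"; an application could use the per-step
# cost to adapt itself at run time (e.g., change how often it writes output).
timers = TimerRegistry()
for _ in range(2):
    with timers.timer("step"):
        time.sleep(0.01)  # stands in for one simulation step
stats = timers.report()["step"]
per_step = stats["wall_s"] / stats["calls"]
print(f"{stats['calls']} calls, {per_step * 1e3:.1f} ms per step")
```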
Modeling of thermally induced skew variations in clock distribution network
The clock distribution network is sensitive to large thermal gradients on the die, because the performance of both clock buffers and interconnects is affected by temperature. A robust clock network design relies on accurate analysis of clock skew under temperature variations. In this work, we address the problem of modeling thermally induced clock skew in nanometer CMOS technologies. The complex thermal behavior of both buffers and interconnects is taken into account. In addition, our characterization of the temperature effect on buffers and interconnects gives designers valuable insight into the potential impact of thermal variations on clock networks. The use of industry-standard data formats in the interface allows our tool to be easily integrated into existing design flows.
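The sketch below gives a first-order picture of how a thermal gradient turns into skew on the interconnect side only. It is not the paper's model, which also covers buffers. It assumes the linear resistance model R(T) = R_ref (1 + TCR (T - T_ref)) and the Elmore delay of a distributed RC wire driving a lumped load. The default TCR of 0.0039 per kelvin is roughly the textbook value for bulk copper; real on-chip interconnect values differ.

```python
def wire_delay_s(r_ref_ohm, c_wire_f, c_load_f, temp_c, t_ref_c=25.0, tcr_per_k=0.0039):
    """Elmore delay of a distributed RC wire driving a lumped load.

    Resistance follows the first-order model R(T) = R_ref * (1 + tcr * (T - T_ref));
    the wire delay is R(T) * (C_wire / 2 + C_load). Buffer effects are ignored.
    """
    r = r_ref_ohm * (1.0 + tcr_per_k * (temp_c - t_ref_c))
    return r * (c_wire_f / 2.0 + c_load_f)

# Usage: two identical branches, one in a 45 C region and one in an 85 C hot spot.
cool = wire_delay_s(100.0, 100e-15, 50e-15, temp_c=45.0)
hot = wire_delay_s(100.0, 100e-15, 50e-15, temp_c=85.0)
print(f"thermally induced skew: {(hot - cool) * 1e12:.2f} ps")
```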
Task-based adaptive multiresolution for time-space multi-scale reaction-diffusion systems on multi-core architectures
A new solver featuring time-space adaptation and error control has been
recently introduced to tackle the numerical solution of stiff
reaction-diffusion systems. Based on operator splitting, finite volume adaptive
multiresolution and high order time integrators with specific stability
properties for each operator, this strategy yields high computational
efficiency for large multidimensional computations on standard architectures
such as powerful workstations. However, the data structure of the original
implementation, based on trees of pointers, provides limited opportunities for
efficiency enhancements, while posing serious challenges in terms of parallel
programming and load balancing. The present contribution proposes a new
implementation of the whole set of numerical methods including Radau5 and
ROCK4, relying on a fully different data structure together with the use of a
specific library, TBB, for shared-memory, task-based parallelism with
work-stealing. The performance of our implementation is assessed on a series of
test cases of increasing difficulty in two and three dimensions on multi-core
and many-core architectures, demonstrating high scalability.
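The core idea of operator splitting for a reaction-diffusion equation u_t = D u_xx + f(u) can be shown with Strang splitting: a half step of reaction, a full step of diffusion, then another half step of reaction. In the work described above, Radau5 handles reaction and ROCK4 handles diffusion on an adaptive multiresolution grid. This sketch substitutes much simpler pieces:

- an exact reaction sub-step supplied by the caller;
- explicit, conservative finite-volume diffusion with sub-stepping for stability;
- a uniform periodic 1-D grid with no adaptation and no parallelism.

```python
import numpy as np

def diffuse(u, d, dx, dt):
    """Explicit finite-volume diffusion on a periodic 1-D grid, sub-stepped for stability."""
    dt_max = 0.4 * dx * dx / d  # below the explicit limit dx^2 / (2 d)
    n_sub = max(1, int(np.ceil(dt / dt_max)))
    h = dt / n_sub
    for _ in range(n_sub):
        flux = d * (np.roll(u, -1) - u) / dx       # flux through each cell's right face
        u = u + h / dx * (flux - np.roll(flux, 1))  # conservative update: mass preserved
    return u

def strang_step(u, dt, reaction, d, dx):
    """One Strang splitting step: half reaction, full diffusion, half reaction."""
    u = reaction(u, dt / 2)
    u = diffuse(u, d, dx, dt)
    return reaction(u, dt / 2)

# Usage: linear decay f(u) = -k u has the exact sub-step solution u * exp(-k t).
k = 2.0

def decay(u, t):
    return u * np.exp(-k * t)

x = np.linspace(0.0, 1.0, 50, endpoint=False)
u = 1.0 + 0.5 * np.sin(2 * np.pi * x)
for _ in range(10):
    u = strang_step(u, 0.01, decay, d=0.1, dx=1.0 / 50)
print(u.mean())  # the mean decays like exp(-k t); diffusion only smooths the sine
```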
Parallel VLSI Circuit Analysis and Optimization
The prevalence of multi-core processors in recent years has introduced new
opportunities and challenges to Electronic Design Automation (EDA) research and
development. In this dissertation, a few parallel Very Large Scale Integration (VLSI)
circuit analysis and optimization methods which utilize the multi-core computing
platform to tackle some of the most difficult contemporary Computer-Aided Design
(CAD) problems are presented. The first CAD application addressed
in this dissertation is the analysis and optimization of mesh-based clock distribution networks.
Mesh-based clock distribution networks (also known as clock meshes) are used in
high-performance microprocessor designs as a reliable way of distributing clock signals
to the entire chip. The second CAD application addressed in this dissertation
is the Simulation Program with Integrated Circuit Emphasis (SPICE) like circuit
simulation. SPICE simulation is often regarded as the bottleneck of the design flow.
Recently, parallel circuit simulation has attracted a lot of attention.
The first part of the dissertation discusses circuit analysis techniques. First, a
combination of a clock-network-specific model order reduction algorithm and a port sliding
scheme is presented to tackle the challenges of analyzing large clock meshes with
many clock drivers. Our techniques run much faster than standard
SPICE simulation and existing model order reduction techniques. They also provide
a basis for clock mesh optimization. Then, a hierarchical multi-algorithm parallel
circuit simulation (HMAPS) framework is presented as a novel technique for parallel circuit simulation. The inter-algorithm parallelism approach in HMAPS is completely
different from existing intra-algorithm parallel circuit simulation techniques and
achieves superlinear speedup in practice. The second part of the dissertation addresses
parallel circuit optimization. A modified asynchronous parallel pattern search
(APPS)-based method that uses the efficient clock mesh simulation techniques for
the clock driver size optimization problem is presented. Our modified APPS method
runs much faster than a continuous optimization method and effectively reduces
clock skew for all test circuits. The third part of the dissertation describes parallel
performance modeling and optimization of the HMAPS framework. The performance
models and run-time optimization scheme further improve the speed of HMAPS.
The dynamically adapted HMAPS provides a complete solution for parallel circuit
simulation.
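Pattern search is derivative-free, which suits objectives that are only available through simulation, such as clock skew as a function of driver sizes. The sketch below is a plain serial compass search: poll plus and minus a step along each coordinate, move on the first improvement, and halve the step when no poll point improves. It is not the modified APPS method. APPS would evaluate the poll points asynchronously in parallel. The quadratic objective is a stand-in for a simulated skew function and is purely hypothetical.

```python
def compass_search(f, x0, step=1.0, tol=1e-6, max_iter=10000):
    """Serial compass (pattern) search minimizing f from x0; returns (x, f(x))."""
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                y = list(x)
                y[i] += sign * step  # poll point along coordinate i
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
                    break
            if improved:
                break
        if not improved:
            step /= 2.0  # no poll point improved: refine the pattern
    return x, fx

# Usage: a stand-in objective whose minimum is at (3, -1).
def objective(v):
    return (v[0] - 3.0) ** 2 + (v[1] + 1.0) ** 2

best_x, best_f = compass_search(objective, [0.0, 0.0])
print(best_x, best_f)
```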
A Parallel Mesh-Adaptive Framework for Hyperbolic Conservation Laws
We report on the development of a computational framework for the parallel,
mesh-adaptive solution of systems of hyperbolic conservation laws like the
time-dependent Euler equations in compressible gas dynamics or
Magneto-Hydrodynamics (MHD) and similar models in plasma physics. Local mesh
refinement is realized by the recursive bisection of grid blocks along each
spatial dimension. The implemented numerical schemes include standard
finite differences as well as shock-capturing central schemes, both combined
with Runge-Kutta-type integrators. Parallel execution is achieved
through a configurable hybrid of POSIX multithreading and MPI distribution
with dynamic load balancing. One-, two-, and three-dimensional test computations
for the Euler equations have been carried out and show good parallel scaling
behavior. The Racoon framework is currently used to study the formation of
singularities in plasmas and fluids.Comment: late submissio
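The recursive bisection of grid blocks along each spatial dimension can be sketched as follows. Each refined block splits into 2^d children, giving a quadtree in 2-D and an octree in 3-D. This is a generic illustration, not Racoon's implementation. The refinement criterion below, "the block contains a point of interest", is a hypothetical stand-in for an error or shock indicator.

```python
import itertools

def refine(lo, hi, needs_refinement, level=0, max_level=3):
    """Recursively bisect block [lo, hi] along every dimension; return the leaf blocks."""
    if level >= max_level or not needs_refinement(lo, hi):
        return [(lo, hi)]
    mid = tuple((a + b) / 2 for a, b in zip(lo, hi))
    leaves = []
    # Each child takes the lower or upper half in every dimension: 2^d children.
    for choice in itertools.product((0, 1), repeat=len(lo)):
        c_lo = tuple(l if s == 0 else m for l, m, s in zip(lo, mid, choice))
        c_hi = tuple(m if s == 0 else h for m, h, s in zip(mid, hi, choice))
        leaves += refine(c_lo, c_hi, needs_refinement, level + 1, max_level)
    return leaves

# Usage: refine toward a hypothetical feature at (0.3, 0.3) in the unit square.
target = (0.3, 0.3)

def near_feature(lo, hi):
    return all(l <= p <= h for l, p, h in zip(lo, target, hi))

leaves = refine((0.0, 0.0), (1.0, 1.0), near_feature, max_level=3)
print(len(leaves), "leaf blocks")  # 1 + 3 * 3 = 10
```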