Synthesis of Clock Trees with Useful Skew based on Sparse-Graph Algorithms
Computer-aided design (CAD) for very large scale integration (VLSI) involve
TimeTrader: Exploiting Latency Tail to Save Datacenter Energy for On-line Data-Intensive Applications
Datacenters running on-line, data-intensive applications (OLDIs) consume
significant amounts of energy. However, reducing their energy is challenging
due to their tight response time requirements. A key aspect of OLDIs is that
each user query goes to all or many of the nodes in the cluster, so that the
overall time budget is dictated by the tail of the replies' latency
distribution; replies see latency variations both in the network and compute.
Previous work proposes to achieve load-proportional energy by slowing down the
computation at lower datacenter loads based directly on response times (i.e.,
at lower loads, the proposal exploits the average slack in the time budget
provisioned for the peak load). In contrast, we propose TimeTrader to reduce
energy by exploiting the latency slack in the sub-critical replies which
arrive before the deadline (e.g., 80% of replies are 3-4x faster than the
tail). This slack is present at all loads and subsumes the previous work's
load-related slack. While the previous work shifts the leaves' response time
distribution to consume the slack at lower loads, TimeTrader reshapes the
distribution at all loads by slowing down individual sub-critical nodes without
increasing missed deadlines. TimeTrader exploits slack in both the network and
compute budgets. Further, TimeTrader leverages Earliest Deadline First
scheduling to largely decouple critical requests from the queuing delays of
sub-critical requests, which can then be slowed down without hurting critical
requests. A combination of real-system measurements and at-scale simulations
shows that without adding to missed deadlines, TimeTrader saves 15-19% and
41-49% energy at 90% and 30% loading, respectively, in a datacenter with 512
nodes, whereas previous work saves 0% and 31-37%.
Comment: 13 pages
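The Earliest Deadline First idea mentioned in the abstract above can be illustrated with a small priority queue that always serves the request whose deadline is nearest. This is only a minimal sketch of generic EDF ordering, not TimeTrader's implementation; the class name `EdfQueue`, the helper `drain`, the request labels, and the deadline values are all hypothetical.

```python
import heapq
import itertools


class EdfQueue:
    """Minimal Earliest Deadline First queue (illustrative, not from the paper)."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so that requests with equal deadlines leave in FIFO order.
        self._counter = itertools.count()

    def push(self, deadline, request):
        heapq.heappush(self._heap, (deadline, next(self._counter), request))

    def pop(self):
        deadline, _, request = heapq.heappop(self._heap)
        return deadline, request

    def __len__(self):
        return len(self._heap)


def drain(queue):
    """Serve every queued request and return the labels in service order."""
    order = []
    while len(queue):
        order.append(queue.pop()[1])
    return order


# Hypothetical deadlines in milliseconds: the critical request has the least slack.
q = EdfQueue()
q.push(30.0, "sub-critical-a")
q.push(10.0, "critical")
q.push(30.0, "sub-critical-b")
served = drain(q)
print(served)  # ['critical', 'sub-critical-a', 'sub-critical-b']
```

Because the critical request is served first regardless of arrival order, it does not wait behind sub-critical requests, which is the decoupling the abstract describes.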
Cooperative Synchronization in Wireless Networks
Synchronization is a key functionality in wireless networks, enabling a wide
variety of services. We consider a Bayesian inference framework whereby network
nodes can achieve phase and skew synchronization in a fully distributed way. In
particular, under the assumption of Gaussian measurement noise, we derive two
message passing methods (belief propagation and mean field), analyze their
convergence behavior, and perform a qualitative and quantitative comparison
with a number of competing algorithms. We also show that both methods can be
applied in networks with and without master nodes. Our performance results are
complemented by, and compared with, the relevant Bayesian Cramér-Rao bounds.
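Under the Gaussian measurement noise assumed in the abstract above, a basic building block of message passing is the fusion of several independent Gaussian messages about the same variable: precisions add, and the mean is precision-weighted. The sketch below shows only that generic step, not the paper's belief propagation or mean field algorithms; the function name `fuse_gaussian_messages` and the numbers are hypothetical.

```python
def fuse_gaussian_messages(messages):
    """Combine independent Gaussian messages, each given as (mean, variance).

    The product of Gaussian densities is Gaussian: its precision is the sum
    of the input precisions and its mean is the precision-weighted average.
    """
    precision = sum(1.0 / var for _, var in messages)
    mean = sum(m / var for m, var in messages) / precision
    return mean, 1.0 / precision


# Two neighbours report clock-offset estimates of 1.0 and 3.0 with unit variance.
belief_mean, belief_var = fuse_gaussian_messages([(1.0, 1.0), (3.0, 1.0)])
print(belief_mean, belief_var)  # 2.0 0.5
```

The fused variance is smaller than either input variance, which is why exchanging estimates with more neighbours tightens a node's belief.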
Timing Analysis and Optimization in Logic and Physical Synthesis
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2020. 8. 김태환.
Timing analysis is one of the necessary steps in the development of a semiconductor circuit. In addition, it is increasingly important in advanced process technologies due to various factors, including the increase of process-voltage-temperature variation. This dissertation addresses three problems related to timing analysis and optimization in logic and physical synthesis. Firstly, most static timing analyses today are based on conventional fixed flip-flop timing models, in which every flip-flop is assumed to have a fixed clock-to-Q delay. However, setup and hold skews affect the clock-to-Q delay in reality. In this dissertation, I propose a mathematical formulation to solve the problem and apply it to the clock skew scheduling problems as well as to the analysis of a given circuit, with a scalable speedup technique. Secondly, near-threshold computing is one of the promising concepts for energy-efficient operation of VLSI systems, but wide performance variation and nonlinearity to process variations block its proliferation. To cope with this, I propose a holistic hardware performance monitoring methodology for accurate timing prediction in a near-threshold voltage regime and advanced process technology. Lastly, an asynchronous circuit is one of the alternatives to the conventional synchronous style, and asynchronous pipeline circuits are especially attractive because of their small design effort. This dissertation addresses the synthesis problem of lightening two-phase bundled-data asynchronous pipeline controllers, in which delay buffers are essential for guaranteeing the correct handshaking operation but incur a considerable area increase.
1 INTRODUCTION
1.1 Flexible Flip-Flop Timing Model
1.2 Hardware Performance Monitoring Methodology
1.3 Asynchronous Pipeline Controller
1.4 Contributions of this Dissertation
2 ANALYSIS AND OPTIMIZATION CONSIDERING FLEXIBLE FLIP-FLOP TIMING MODEL
2.1 Preliminaries
2.1.1 Terminologies
2.1.2 Timing Analysis
2.1.3 Clock-to-Q Delay Surface Modeling
2.2 Clock-to-Q Delay Interval Analysis
2.2.1 Derivation
2.2.2 Additional Constraints
2.2.3 Analysis: Finding Minimum Clock Period
2.2.4 Optimization: Clock Skew Scheduling
2.2.5 Scalable Speedup Technique
2.3 Experimental Results
2.3.1 Application to Minimum Clock Period Finding
2.3.2 Application to Clock Skew Scheduling
2.3.3 Efficacy of Scalable Speedup Technique
2.4 Summary
3 HARDWARE PERFORMANCE MONITORING METHODOLOGY AT NTC AND ADVANCED TECHNOLOGY NODE
3.1 Overall Flow of Proposed HPM Methodology
3.2 Prerequisites to HPM Methodology
3.2.1 BEOL Process Variation Modeling
3.2.2 Surrogate Model Preparation
3.3 HPM Methodology: Design Phase
3.3.1 HPM2PV Model Construction
3.3.2 Optimization of Monitoring Circuits Configuration
3.3.3 PV2CPT Model Construction
3.4 HPM Methodology: Post-Silicon Phase
3.4.1 Transfer Learning in Silicon Characterization Step
3.4.2 Procedures in Volume Production Phase
3.5 Experimental Results
3.5.1 Experimental Setup
3.5.2 Exploration of Monitoring Circuits Configuration
3.5.3 Effectiveness of Monitoring Circuits Optimization
3.5.4 Considering BEOL PVs and Uncertainty Learning
3.5.5 Comparison among Different Prediction Flows
3.5.6 Effectiveness of Prediction Model Calibration
3.6 Summary
4 LIGHTENING ASYNCHRONOUS PIPELINE CONTROLLER
4.1 Preliminaries and State-of-the-Art Work
4.1.1 Bundled-data vs. Dual-rail Asynchronous Circuits
4.1.2 Two-phase vs. Four-phase Bundled-data Protocol
4.1.3 Conventional State-of-the-Art Pipeline Controller Template
4.2 Delay Path Sharing for Lightening Pipeline Controller Template
4.2.1 Synthesizing Sharable Delay Paths
4.2.2 Validating Logical Correctness for Sharable Delay Paths
4.2.3 Reformulating Timing Constraints of Controller Template
4.2.4 Minimally Allocating Delay Buffers
4.3 In-depth Pipeline Controller Template Synthesis with Delay Path Reusing
4.3.1 Synthesizing Delay Path Units
4.3.2 Validating Logical Correctness of Delay Path Units
4.3.3 Updating Timing Constraints for Delay Path Units
4.3.4 In-depth Synthesis Flow Utilizing Delay Path Units
4.4 Experimental Results
4.4.1 Environment Setup
4.4.2 Piecewise Linear Modeling of Delay Path Unit Area
4.4.3 Comparison of Power, Performance, and Area
4.5 Summary
5 CONCLUSION
5.1 Chapter 2
5.2 Chapter 3
5.3 Chapter 4
Abstract (In Korean)
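For orientation on the clock skew scheduling problem named in the abstract and in Section 2.2.4 of the outline above, the sketch below checks whether a clock period admits feasible skews under the conventional fixed flip-flop timing model, which is the baseline the dissertation argues is too rigid, not its flexible model. It treats setup and hold constraints as difference constraints and runs Bellman-Ford relaxation. The function name `feasible_skews`, the path tuple layout, and the delay values are assumptions made for illustration.

```python
def feasible_skews(num_ffs, paths, period, setup=0.0, hold=0.0):
    """Return one feasible skew per flip-flop, or None if the period is infeasible.

    paths: list of (src, dst, d_min, d_max) combinational delays between
    flip-flops, under a fixed (skew-independent) flip-flop timing model.
    """
    edges = []
    for i, j, d_min, d_max in paths:
        # Setup: s_i + d_max + setup <= s_j + period
        #   ->   s_i - s_j <= period - d_max - setup   (edge j -> i)
        edges.append((j, i, period - d_max - setup))
        # Hold:  s_i + d_min >= s_j + hold
        #   ->   s_j - s_i <= d_min - hold             (edge i -> j)
        edges.append((i, j, d_min - hold))

    # Starting every node at 0 is equivalent to a virtual source with
    # zero-weight edges to all flip-flops.
    dist = [0.0] * num_ffs
    for _ in range(num_ffs):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v] - 1e-12:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return dist  # converged: dist satisfies every constraint
    return None  # still relaxing after num_ffs passes: negative cycle


# Hypothetical two-flip-flop loop: 0 -> 1 is slow, 1 -> 0 is fast.
paths = [(0, 1, 2.0, 6.0), (1, 0, 1.0, 2.0)]
print(feasible_skews(2, paths, period=4.0))  # [-2.0, 0.0]
print(feasible_skews(2, paths, period=3.9))  # None
```

With zero skew this loop would need a period of 6.0 to cover the slow path; shifting the clock of flip-flop 0 earlier by 2.0 lets the same circuit run at 4.0. A minimum-period search could wrap this check in a binary search over `period`.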