3,100 research outputs found

    Synthesis of Clock Trees with Useful Skew based on Sparse-Graph Algorithms

    Computer-aided design (CAD) for very large scale integration (VLSI) involve

    TimeTrader: Exploiting Latency Tail to Save Datacenter Energy for On-line Data-Intensive Applications

    Datacenters running on-line, data-intensive applications (OLDIs) consume significant amounts of energy. However, reducing their energy is challenging due to their tight response time requirements. A key aspect of OLDIs is that each user query goes to all or many of the nodes in the cluster, so that the overall time budget is dictated by the tail of the replies' latency distribution; replies see latency variations both in the network and compute. Previous work proposes to achieve load-proportional energy by slowing down the computation at lower datacenter loads based directly on response times (i.e., at lower loads, the proposal exploits the average slack in the time budget provisioned for the peak load). In contrast, we propose TimeTrader to reduce energy by exploiting the latency slack in the sub-critical replies which arrive before the deadline (e.g., 80% of replies are 3-4x faster than the tail). This slack is present at all loads and subsumes the previous work's load-related slack. While the previous work shifts the leaves' response time distribution to consume the slack at lower loads, TimeTrader reshapes the distribution at all loads by slowing down individual sub-critical nodes without increasing missed deadlines. TimeTrader exploits slack in both the network and compute budgets. Further, TimeTrader leverages Earliest Deadline First scheduling to largely decouple critical requests from the queuing delays of sub-critical requests which can then be slowed down without hurting critical requests. A combination of real-system measurements and at-scale simulations shows that without adding to missed deadlines, TimeTrader saves 15-19% and 41-49% energy at 90% and 30% loading, respectively, in a datacenter with 512 nodes, whereas previous work saves 0% and 31-37%.
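The Earliest Deadline First idea mentioned in this abstract can be illustrated with a minimal sketch. This is not TimeTrader's implementation: the `Request` class, the `edf_order` and `slack` helpers, and all numbers are hypothetical, chosen only to show how serving by deadline keeps a critical request from waiting behind sub-critical ones.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    # Only the deadline takes part in ordering (hypothetical model).
    deadline: float
    name: str = field(compare=False)


def edf_order(requests):
    """Return request names in Earliest Deadline First order."""
    heap = list(requests)
    heapq.heapify(heap)  # min-heap keyed on deadline
    order = []
    while heap:
        order.append(heapq.heappop(heap).name)
    return order


def slack(deadline, now, service_time):
    """Time left over if the request starts now and runs for service_time."""
    return deadline - (now + service_time)


# Illustrative values only (arbitrary time units).
queue = [
    Request(30.0, "sub-critical"),
    Request(5.0, "critical"),
    Request(12.0, "mid"),
]
print(edf_order(queue))
print(slack(deadline=30.0, now=0.0, service_time=8.0))
```

A request with large positive slack is a candidate for slower, lower-energy execution; one with little slack is served first and at full speed.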

    Cooperative Synchronization in Wireless Networks

    Synchronization is a key functionality in wireless networks, enabling a wide variety of services. We consider a Bayesian inference framework whereby network nodes can achieve phase and skew synchronization in a fully distributed way. In particular, under the assumption of Gaussian measurement noise, we derive two message passing methods (belief propagation and mean field), analyze their convergence behavior, and perform a qualitative and quantitative comparison with a number of competing algorithms. We also show that both methods can be applied in networks with and without master nodes. Our performance results are complemented by, and compared with, the relevant Bayesian Cramér-Rao bounds.

    Timing Analysis and Optimization in Logic and Physical Synthesis

    Doctoral dissertation -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2020. Kim Taewhan (김태환).

    Timing analysis is one of the necessary steps in the development of a semiconductor circuit. It is increasingly important in advanced process technologies due to various factors, including the increase of process-voltage-temperature variation. This dissertation addresses three problems related to timing analysis and optimization in logic and physical synthesis. Firstly, most static timing analyses today are based on conventional fixed flip-flop timing models, in which every flip-flop is assumed to have a fixed clock-to-Q delay. In reality, however, setup and hold skews affect the clock-to-Q delay. In this dissertation, I propose a mathematical formulation to solve the problem and apply it to the clock skew scheduling problem as well as to the analysis of a given circuit, with a scalable speedup technique. Secondly, near-threshold computing is one of the promising concepts for energy-efficient operation of VLSI systems, but wide performance variation and nonlinearity with respect to process variations hinder its adoption. To cope with this, I propose a holistic hardware performance monitoring methodology for accurate timing prediction in a near-threshold voltage regime and advanced process technology. Lastly, an asynchronous circuit is one of the alternatives to the conventional synchronous style, and asynchronous pipeline circuits are especially attractive because of their small design effort. This dissertation addresses the synthesis problem of lightening two-phase bundled-data asynchronous pipeline controllers, in which delay buffers are essential for guaranteeing correct handshaking operation but incur a considerable area increase.

    Contents:
    1 INTRODUCTION
      1.1 Flexible Flip-Flop Timing Model
      1.2 Hardware Performance Monitoring Methodology
      1.3 Asynchronous Pipeline Controller
      1.4 Contributions of this Dissertation
    2 ANALYSIS AND OPTIMIZATION CONSIDERING FLEXIBLE FLIP-FLOP TIMING MODEL
      2.1 Preliminaries
        2.1.1 Terminologies
        2.1.2 Timing Analysis
        2.1.3 Clock-to-Q Delay Surface Modeling
      2.2 Clock-to-Q Delay Interval Analysis
        2.2.1 Derivation
        2.2.2 Additional Constraints
        2.2.3 Analysis: Finding Minimum Clock Period
        2.2.4 Optimization: Clock Skew Scheduling
        2.2.5 Scalable Speedup Technique
      2.3 Experimental Results
        2.3.1 Application to Minimum Clock Period Finding
        2.3.2 Application to Clock Skew Scheduling
        2.3.3 Efficacy of Scalable Speedup Technique
      2.4 Summary
    3 HARDWARE PERFORMANCE MONITORING METHODOLOGY AT NTC AND ADVANCED TECHNOLOGY NODE
      3.1 Overall Flow of Proposed HPM Methodology
      3.2 Prerequisites to HPM Methodology
        3.2.1 BEOL Process Variation Modeling
        3.2.2 Surrogate Model Preparation
      3.3 HPM Methodology: Design Phase
        3.3.1 HPM2PV Model Construction
        3.3.2 Optimization of Monitoring Circuits Configuration
        3.3.3 PV2CPT Model Construction
      3.4 HPM Methodology: Post-Silicon Phase
        3.4.1 Transfer Learning in Silicon Characterization Step
        3.4.2 Procedures in Volume Production Phase
      3.5 Experimental Results
        3.5.1 Experimental Setup
        3.5.2 Exploration of Monitoring Circuits Configuration
        3.5.3 Effectiveness of Monitoring Circuits Optimization
        3.5.4 Considering BEOL PVs and Uncertainty Learning
        3.5.5 Comparison among Different Prediction Flows
        3.5.6 Effectiveness of Prediction Model Calibration
      3.6 Summary
    4 LIGHTENING ASYNCHRONOUS PIPELINE CONTROLLER
      4.1 Preliminaries and State-of-the-Art Work
        4.1.1 Bundled-data vs. Dual-rail Asynchronous Circuits
        4.1.2 Two-phase vs. Four-phase Bundled-data Protocol
        4.1.3 Conventional State-of-the-Art Pipeline Controller Template
      4.2 Delay Path Sharing for Lightening Pipeline Controller Template
        4.2.1 Synthesizing Sharable Delay Paths
        4.2.2 Validating Logical Correctness for Sharable Delay Paths
        4.2.3 Reformulating Timing Constraints of Controller Template
        4.2.4 Minimally Allocating Delay Buffers
      4.3 In-depth Pipeline Controller Template Synthesis with Delay Path Reusing
        4.3.1 Synthesizing Delay Path Units
        4.3.2 Validating Logical Correctness of Delay Path Units
        4.3.3 Updating Timing Constraints for Delay Path Units
        4.3.4 In-depth Synthesis Flow Utilizing Delay Path Units
      4.4 Experimental Results
        4.4.1 Environment Setup
        4.4.2 Piecewise Linear Modeling of Delay Path Unit Area
        4.4.3 Comparison of Power, Performance, and Area
      4.5 Summary
    5 CONCLUSION
      5.1 Chapter 2
      5.2 Chapter 3
      5.3 Chapter 4
    Abstract (In Korean)
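As a rough illustration of the conventional fixed clock-to-Q model that this abstract contrasts itself with, the sketch below computes the minimum clock period from the textbook setup constraint T >= t_cq + d_max + t_setup + s_src - s_dst, where s is the clock arrival time at each flip-flop. It is not the dissertation's formulation: hold constraints are ignored, and the function name, circuit, and delay values are all hypothetical.

```python
def min_clock_period(paths, skew, t_cq, t_setup):
    """Smallest period satisfying every setup constraint.

    paths: iterable of (src, dst, d_max) combinational paths between flip-flops
    skew:  clock arrival time at each flip-flop
    t_cq, t_setup: fixed clock-to-Q and setup times (conventional model)
    """
    return max(
        t_cq + d_max + t_setup + skew[src] - skew[dst]
        for src, dst, d_max in paths
    )


# Hypothetical two-flip-flop loop with unbalanced path delays.
paths = [("A", "B", 4.0), ("B", "A", 2.0)]

zero_skew = {"A": 0.0, "B": 0.0}
useful_skew = {"A": 0.0, "B": 1.0}  # delay B's clock to borrow time for A -> B

print(min_clock_period(paths, zero_skew, t_cq=0.5, t_setup=0.5))    # 5.0
print(min_clock_period(paths, useful_skew, t_cq=0.5, t_setup=0.5))  # 4.0
```

Delaying the clock at B gives the long A-to-B path more time and takes the same amount from the short B-to-A path, which is the basic trade that clock skew scheduling exploits.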