
    Cost-Effective Clock and Power Gating Design Methodology

    Ph.D. thesis, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, February 2020. Advisor: Taewhan Kim.
Low-power design is of great importance in modern systems-on-chip (SoCs). This dissertation studies low-power design methodologies for reducing dynamic and static power consumption. Specifically, we present two novel techniques for cost-effective low-power design. First, we propose a novel clock gating method for reducing dynamic power consumption.
Flip-flop input data toggling based clock gating is one of the most commonly used clock gating methods, but it has a critical and inherent limitation: the gating logic grows sharply as more flip-flops are involved in gating. In this dissertation, we propose a new clock gating method to overcome this limitation. Specifically, (1) we analyze the gating-logic resources required by input data toggling based clock gating, show that these resources are used ineffectively, and propose a new clock gating technique, called flip-flop state driven clock gating, which completely eliminates the XOR gates that are essential, but expensive, for detecting the input toggling of flip-flops; (2) we provide the supporting logic circuitry for the proposed XOR-free clock gating and confirm through a comprehensive timing analysis that it can be applied safely; (3) based on the flip-flops' state profile, we propose a clock gating methodology that seamlessly combines our flip-flop state driven clock gating with toggling based clock gating. Experiments with benchmark circuits confirm that our clock gating method is very effective in reducing power, capturing power-saving opportunities that toggling based clock gating misses, while meeting all timing constraints. Second, for reducing static power consumption, we resolve two critical limitations of the conventional approaches to allocating state retention storage for power gated circuits: (1) the long wakeup delay caused by the indiscriminate use of multi-bit retention flip-flops (MBRFFs) and (2) the inability to optimize retention flip-flops for flip-flops with a mux-feedback loop.
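To make the limitation of toggling based clock gating concrete, the conventional scheme can be sketched in a few lines. This is a minimal illustration of generic data-driven clock gating, not the dissertation's XOR-free circuit: it assumes one clock gate shared by a group of flip-flops, with the clock enabled whenever any flip-flop's input D differs from its output Q. The function names and the two-input gate-count model (which ignores the clock-gating cell itself) are our own assumptions; they only show that every additional flip-flop in a gated group costs one more XOR and one more OR gate.

```python
from typing import Dict, Sequence

def toggle_enable(d: Sequence[int], q: Sequence[int]) -> int:
    """Clock enable for a group of flip-flops sharing one clock gate.

    The clock must pass iff at least one flip-flop would change state,
    i.e. enable = OR_i (D_i XOR Q_i).
    """
    if len(d) != len(q):
        raise ValueError("D and Q vectors must have the same width")
    enable = 0
    for d_i, q_i in zip(d, q):
        enable |= d_i ^ q_i  # one XOR per flip-flop, merged by an OR tree
    return enable

def gating_logic_cost(n: int) -> Dict[str, int]:
    """Two-input gate count of the enable logic for an n-flip-flop group
    (clock-gating cell excluded): n XORs plus an OR tree of n - 1 gates."""
    return {"xor": n, "or": max(n - 1, 0)}

print(toggle_enable([0, 1, 1, 0], [0, 1, 0, 0]))  # 1: bit 2 toggles, clock enabled
print(toggle_enable([1, 0, 1, 0], [1, 0, 1, 0]))  # 0: nothing toggles, clock gated off
for n in (4, 16, 64):
    print(n, gating_logic_cost(n))
```

The XOR per flip-flop is exactly the component that the flip-flop state driven scheme is said to eliminate.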
It should be noted that the conventional approaches have regarded the long wakeup delay as an inevitable consequence of maximizing the reduction in total state retention storage, and have treated flip-flops with a mux-feedback loop (called self-loop flip-flops) as non-optimizable components. In practice, however, self-loop flip-flops synthesized from hardware description language (HDL) code are far from rare and thus cannot be neglected. More precisely, to solve (1), we show that using MBRFFs of at most two bits, which constrains the wakeup delay to no more than two clock cycles, is enough to maintain a high reduction in total retention storage; to solve (2), we devise a 2-phase retention control mechanism for a pair of flip-flops, one of which has a self-loop, by which a single retention bit suffices to restore the states of both flip-flops, and we propose an independent set based algorithm for maximally extracting non-conflicting pairs from circuits.
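The abstract names, but does not detail, the independent set based pair extraction. The following is a minimal sketch under our own assumptions: each candidate is a (self-loop flip-flop, partner flip-flop) pair that could share one retention bit, and two candidates conflict if they share a flip-flop or appear in a hypothetical `extra_conflicts` set that stands in for whatever conflict conditions the dissertation actually uses. Because maximum independent set is NP-hard, the sketch substitutes a common greedy minimum-degree heuristic; it is not the dissertation's algorithm.

```python
from itertools import combinations
from typing import AbstractSet, Dict, FrozenSet, List, Set, Tuple

# Hypothetical candidate: (self-loop flip-flop, partner flip-flop) sharing one retention bit.
Pair = Tuple[str, str]

def build_conflict_graph(
    pairs: List[Pair],
    extra_conflicts: AbstractSet[FrozenSet[int]] = frozenset(),
) -> Dict[int, Set[int]]:
    """Vertices are candidate pairs; an edge joins two pairs that cannot coexist.
    Assumed conflicts: sharing a flip-flop, or listed in extra_conflicts."""
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(pairs))}
    for i, j in combinations(range(len(pairs)), 2):
        shares_ff = bool(set(pairs[i]) & set(pairs[j]))
        if shares_ff or frozenset((i, j)) in extra_conflicts:
            graph[i].add(j)
            graph[j].add(i)
    return graph

def greedy_independent_set(graph: Dict[int, Set[int]]) -> List[int]:
    """Greedy minimum-degree heuristic for a large independent set
    (maximum independent set is NP-hard, so this is not guaranteed optimal)."""
    remaining = {v: set(nbrs) for v, nbrs in graph.items()}
    chosen: List[int] = []
    while remaining:
        # Pick the vertex that blocks the fewest other candidates (ties: lowest index).
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        chosen.append(v)
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for nbrs in remaining.values():
            nbrs -= removed  # in-place update of each surviving neighbor set
    return sorted(chosen)

pairs = [("s1", "a"), ("s1", "b"), ("s2", "b"), ("s3", "c")]
selected = greedy_independent_set(build_conflict_graph(pairs))
print(selected)  # [0, 2, 3]
print([pairs[i] for i in selected])  # [('s1', 'a'), ('s2', 'b'), ('s3', 'c')]
```

Under the sketch's assumption that each selected pair needs one retention bit instead of two, the six paired flip-flops in this toy example need three retention bits.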
Experiments with benchmark circuits show that our proposed method is very effective in reducing state retention storage and power consumption compared with the best existing MBRFF allocation, while strictly limiting the wakeup delay to two clock cycles.
1 INTRODUCTION
1.1 Clock Gating
1.2 Power Gating and State Retention
1.3 Multi-bit Retention Registers
1.4 Contributions of This Dissertation
2 FLIP-FLOP STATE DRIVEN CLOCK GATING: CONCEPT, DESIGN, AND METHODOLOGY
2.1 Motivations
2.1.1 Toggling based Clock Gating
2.1.2 Area and Power by Clock Gating
2.2 The Proposed Clock Gating
2.2.1 Concept of Flip-flop State Driven Clock Gating
2.2.2 Design of Gating Logic Circuitry
2.2.3 Integrated Clock Gating Methodology
2.2.4 Cost Formulation
2.3 Experiments
2.3.1 Experimental Setup
2.3.2 Experimental Results
3 ALGORITHM AND DESIGN OPTIMIZATION OF ALLOCATING MULTI-BIT RETENTION FLIP-FLOPS FOR POWER GATED CIRCUITS
3.1 Motivations
3.1.1 Flip-flops with Mux-feedback Loop
3.1.2 Impact of Wakeup Delay
3.2 The Proposed Allocation Algorithm
3.3 Design of Multi-Bit Retention Flip-Flop and Multi-Bit Extension
3.3.1 Multi-Bit Retention Flip-Flop
3.3.2 Multi-Bit Flip-Flop Extension
3.4 Experiments
3.4.1 Experimental Setup
3.4.2 Experimental Results
4 CONCLUSIONS
4.1 Flip-flop State Driven Clock Gating: Concept, Design, and Methodology
4.2 Algorithm and Design Optimization of Allocating Multi-bit Retention Flip-flops for Power Gated Circuits
Abstract (In Korean)

    Clock Tree and Flip-flop Co-optimization for Reducing Power Consumption and Power/Ground Noise of Integrated Circuits and Systems

    Ph.D. thesis, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2017. Advisor: Taewhan Kim.
In very-large-scale integration (VLSI) circuits, all flip-flops that store data are activated synchronously by clock signals delivered through clock networks. Because clock signals switch at a very high frequency, the dynamic power consumed on clock networks accounts for a considerable portion of the circuits' total power consumption, and the largest share of clock network power comes from the flip-flops and the buffers that drive them at the clock network boundary. Furthermore, activating all flip-flops simultaneously, as synchronous circuits require, induces high peak power/ground noise (i.e., voltage drop) at the clock boundary. In this regard, this thesis addresses two new problems: reducing clock power consumption at the clock network boundary, and reducing peak current at the clock network boundary. Unlike prior works, which optimized flip-flops and clock buffers separately, our approach co-optimizes flip-flops and clock buffers. Specifically, we propose four types of hardware component, each of which implements a set of flip-flops and their driving buffer as a single unit. The key idea behind the four types of clock boundary component is that one of the inverters in the driving buffer and one of the inverters in each flip-flop can be combined and removed without changing the functionality of the flip-flops. Consequently, we have more freedom to select (i.e., allocate) clock boundary components that reduce power consumption or peak current under timing constraints. We have implemented our clock boundary optimization under a bounded clock skew constraint and tested it with the ISCAS 89 benchmark circuits.
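The abstract does not spell out the allocation algorithm (the table of contents calls it BoundaryMin and mentions an independence assumption and a window width analysis). Below is a minimal sketch of one plausible formulation, entirely under our own assumptions and not BoundaryMin itself: each clock boundary group picks one of four component types, each type has a hypothetical (power, clock arrival delay) characterization, groups are independent, and the bounded skew constraint means all chosen arrival delays must fit in a window of width `skew_bound`.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical characterization of one boundary component type:
# (clock power in uW, clock arrival delay at the flip-flops in ps).
Option = Tuple[float, float]

def allocate(groups: List[Dict[str, Option]], skew_bound: float
             ) -> Optional[Tuple[float, List[str]]]:
    """Choose one type per group minimizing total power such that all chosen
    arrival delays fit in a window of width skew_bound. Returns None if infeasible."""
    # Any feasible allocation's window can be slid to start at its earliest chosen
    # delay, so trying every candidate delay as the window start is exhaustive.
    starts = sorted({delay for g in groups for (_, delay) in g.values()})
    best: Optional[Tuple[float, List[str]]] = None
    for start in starts:
        end = start + skew_bound
        total, choice = 0.0, []
        for g in groups:
            feasible = [(p, name) for name, (p, delay) in g.items() if start <= delay <= end]
            if not feasible:
                break  # this window leaves some group without a legal type
            p, name = min(feasible)  # groups are independent: take the cheapest legal type
            total += p
            choice.append(name)
        else:
            if best is None or total < best[0]:
                best = (total, choice)
    return best

groups = [
    {"T1": (10.0, 50.0), "T2": (8.0, 62.0), "T3": (7.0, 75.0), "T4": (9.0, 55.0)},
    {"T1": (12.0, 48.0), "T2": (9.0, 60.0), "T3": (8.0, 80.0), "T4": (11.0, 52.0)},
]
print(allocate(groups, 15.0))  # (15.0, ['T3', 'T3'])
print(allocate(groups, 4.0))   # (17.0, ['T2', 'T2']): a tighter skew bound costs power
print(allocate(groups, 0.0))   # None
```

The sliding-window enumeration is exact only for this simplified model; a real flow would also have to account for slew, wire delay, and interactions between groups.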
The experimental results confirm that our approach reduces clock power consumption by 7.9% to 10.2% and power/ground noise by 27.7% to 30.9% on average.
Chapter 1 Introduction
1.1 Clock Signal
1.2 Metrics of Clock Design
1.3 Clock Network Topologies
1.4 Multibit Flip-flop
1.5 Simultaneous Switching Noise
1.6 Contributions of This Dissertation
Chapter 2 Clock Tree and Flip-flop Co-optimization for Reducing Power Consumption
2.1 Introduction
2.2 Types of Boundary Optimization
2.3 Analysis of Four Types of Flip-flop
2.3.1 Internal Power Comparison
2.3.2 Characterization of Power Consumption
2.4 Problem Formulation
2.5 The Proposed Algorithm
2.5.1 Independence Assumption
2.5.2 BoundaryMin Algorithm
2.6 Experimental Results
2.6.1 Experimental Setup
2.6.2 Clock Tree Boundary Optimization Results
2.6.3 Capacitance Analysis on Flip-flops
2.6.4 Slew and Skew Analysis
2.6.5 Window Width Analysis
2.7 Conclusions
Chapter 3 Clock Tree and Flip-flop Co-optimization for Reducing Power/Ground Noise
3.1 Introduction
3.2 Current Characteristic of Four Types of Flip-flop
3.3 Motivational Example
3.4 Problem Formulation
3.5 Proposed Algorithm
3.5.1 An Overview
3.5.2 Superposition of Current Flows
3.5.3 Formulation to Instance of MOSP Problem
3.5.4 Selecting Target Power Grid Points
3.5.5 Consideration of Reducing Power Consumption
3.6 Experimental Results
3.7 Summary
Chapter 4 Conclusion
4.1 Clock Buffer and Flip-flop Co-optimization for Reducing Power Consumption
4.2 Clock Buffer and Flip-flop Co-optimization for Reducing Power/Ground Noise
Abstract (In Korean)
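The peak-current problem in this second dissertation rests on superposing the clock-edge current drawn by each boundary component (the table of contents lists a "Superposition of Current Flows" step before an MOSP formulation). The sketch below illustrates only the superposition idea under our own assumptions: each component type is given a hypothetical sampled current waveform, the current at a shared power grid point is the sample-wise sum, and types are chosen by brute force to minimize the peak. It does not implement the dissertation's MOSP-based algorithm, and its exhaustive search is exponential in the number of groups.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-type clock-edge current waveform: samples in mA at uniform time steps.
Waveform = Sequence[float]

def superpose(waves: List[Waveform]) -> List[float]:
    """Total current seen by a shared power grid point: the sample-wise sum of waveforms."""
    n = max(len(w) for w in waves)
    total = [0.0] * n
    for w in waves:
        for t, i in enumerate(w):
            total[t] += i
    return total

def min_peak_allocation(groups: List[Dict[str, Waveform]]) -> Tuple[float, Tuple[str, ...]]:
    """Exhaustively pick one component type per group to minimize the superposed peak.
    Exponential in len(groups); only for tiny illustrative instances."""
    if not groups:
        raise ValueError("need at least one group")
    best = None
    for choice in product(*(sorted(g) for g in groups)):
        peak = max(superpose([g[c] for g, c in zip(groups, choice)]))
        if best is None or peak < best[0]:
            best = (peak, choice)
    return best

# Two identical groups; type "A" draws its current pulse early, "B" one step later.
groups = [
    {"A": [0.0, 4.0, 1.0, 0.0], "B": [0.0, 1.0, 4.0, 0.0]},
    {"A": [0.0, 4.0, 1.0, 0.0], "B": [0.0, 1.0, 4.0, 0.0]},
]
print(max(superpose([groups[0]["A"], groups[1]["A"]])))  # 8.0: all groups switch together
print(min_peak_allocation(groups))  # (5.0, ('A', 'B')): mixing types spreads the peak
```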