964 research outputs found

    Cost-Effective Clock and Power Gating Design Methodology

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2020. Taewhan Kim.
    Low-power design is of great importance in modern systems-on-chip (SoCs). This dissertation studies low-power design methodologies for reducing dynamic and static power consumption. Specifically, it presents two novel techniques for cost-effective low-power design.
    First, we propose a novel clock gating method for reducing dynamic power consumption. Clock gating based on the toggling of flip-flop input data is one of the most commonly used clock gating methods, but it has a critical and inherent limitation: the gating logic grows sharply as more flip-flops are involved in gating. To overcome this limitation, (1) we analyze the resources consumed by the gating logic in input data toggling based clock gating, observe that they are used inefficiently, and propose a new technique called flip-flop state driven clock gating, which completely eliminates the essential but expensive XOR gates used to detect input toggling of flip-flops; (2) we provide the supporting logic circuitry for the proposed XOR-free clock gating and confirm its safe applicability through a comprehensive timing analysis; (3) we propose, based on the flip-flops' state profile, a clock gating methodology that seamlessly combines flip-flop state based clock gating with toggling based clock gating. Experiments with benchmark circuits confirm that our clock gating method is very effective in reducing power, capturing power-saving opportunities that toggling based clock gating misses, while meeting all timing constraints.
    Second, to reduce static power consumption, we address two critical limitations of conventional approaches to allocating state retention storage in power gated circuits: (1) the long wakeup delay caused by the indiscriminate use of multi-bit retention flip-flops (MBRFFs) and (2) the inability to optimize retention flip-flops for flip-flops with a mux-feedback loop. Conventional approaches have regarded the long wakeup delay as an inevitable consequence of maximizing the reduction in total state retention storage, and have treated flip-flops with a mux-feedback loop (called self-loop flip-flops) as non-optimizable. In practice, however, the self-loop flip-flops synthesized from hardware description language (HDL) code are far from few and cannot be neglected. To solve (1), we show that using MBRFFs of at most two bits, which limits the wakeup delay to no more than two clock cycles, is enough to retain a high reduction in total retention storage. To solve (2), we devise a 2-phase retention control mechanism for a pair of flip-flops, one of which has a self-loop, so that a single retention bit suffices to restore the state of both flip-flops, and we propose an independent set based algorithm for extracting the maximum number of non-conflicting pairs from a circuit.
    Experiments with benchmark circuits show that the proposed method is very effective in reducing state retention storage and power consumption compared with the best existing MBRFF allocation, while the wakeup delay is strictly limited to two clock cycles.
    Contents: 1 Introduction (1.1 Clock Gating; 1.2 Power Gating and State Retention; 1.3 Multi-bit Retention Registers; 1.4 Contributions of This Dissertation). 2 Flip-flop State Driven Clock Gating: Concept, Design, and Methodology (2.1 Motivations; 2.1.1 Toggling based Clock Gating; 2.1.2 Area and Power by Clock Gating; 2.2 The Proposed Clock Gating; 2.2.1 Concept of Flip-flop State Driven Clock Gating; 2.2.2 Design of Gating Logic Circuitry; 2.2.3 Integrated Clock Gating Methodology; 2.2.4 Cost Formulation; 2.3 Experiments; 2.3.1 Experimental Setup; 2.3.2 Experimental Results). 3 Algorithm and Design Optimization of Allocating Multi-bit Retention Flip-flops for Power Gated Circuits (3.1 Motivations; 3.1.1 Flip-flops with Mux-feedback Loop; 3.1.2 Impact of Wakeup Delay; 3.2 The Proposed Allocation Algorithm; 3.3 Design of Multi-Bit Retention Flip-Flop and Multi-Bit Extension; 3.3.1 Multi-Bit Retention Flip-Flop; 3.3.2 Multi-Bit Flip-Flop Extension; 3.4 Experiments; 3.4.1 Experimental Setup; 3.4.2 Experimental Results). 4 Conclusions (4.1 Flip-flop State Driven Clock Gating: Concept, Design, and Methodology; 4.2 Algorithm and Design Optimization of Allocating Multi-bit Retention Flip-flops for Power Gated Circuits). Abstract (In Korean).
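    The pair-extraction step mentioned in this abstract can be pictured with a small sketch. The dissertation's actual algorithm is not given in the abstract, so what follows is only a generic greedy independent-set heuristic, under the assumption that two candidate flip-flop pairs conflict when they share a flip-flop. The names `conflict_graph`, `greedy_independent_set`, and the flip-flop labels are hypothetical.

```python
from itertools import combinations

def conflict_graph(pairs):
    """Build an adjacency map: two candidate pairs conflict if they share a flip-flop.

    The conflict rule is an assumption made for this illustration.
    """
    adj = {p: set() for p in pairs}
    for a, b in combinations(pairs, 2):
        if set(a) & set(b):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def greedy_independent_set(adj):
    """Greedy heuristic: repeatedly pick the vertex with the fewest remaining conflicts.

    This yields a maximal (not necessarily maximum) independent set.
    """
    remaining = {v: set(n) for v, n in adj.items()}
    chosen = []
    while remaining:
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        chosen.append(v)
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for nbrs in remaining.values():
            nbrs -= removed
    return chosen

# Hypothetical candidate pairs of flip-flops that could share one retention bit.
candidate_pairs = [("ff1", "ff2"), ("ff2", "ff3"), ("ff3", "ff4"), ("ff5", "ff6")]
selected = greedy_independent_set(conflict_graph(candidate_pairs))
```

    On this toy input the heuristic keeps three pairs and drops ("ff2", "ff3"), which conflicts with two others. An exact maximum independent set would need a different solver on larger graphs.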

    클럭 κ²Œμ΄νŒ… 및 ν”Œλ¦½ ν”Œλ‘­ λ™μ‹œ μ΅œμ ν™”λ₯Ό μœ„ν•œ 섀계 및 μ•Œκ³ λ¦¬μ¦˜

    Thesis (M.S.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2019. Taewhan Kim.
    In this thesis, we introduce dynamic power optimization techniques applicable to design stages ranging from standard cell design to placement. This work first investigates how data-driven (i.e., toggling based) clock gating can be closely integrated with the synthesis of flip-flops, a problem that prior clock gating work has not addressed. Our key observation is that some internal part of a flip-flop cell can be reused to generate its clock gating enable signal. Based on this, we propose a newly optimized flip-flop wiring structure, called eXOR-FF, in which internal logic is reused every clock cycle to decide whether the flip-flop is to be activated or deactivated through clock gating, thereby saving area (and thus leakage as well as dynamic power) on every pair of a flip-flop and its toggling detection logic.
    We then propose a comprehensive methodology for placement/timing-aware clock gating exploration that offers two unique strengths: it is best suited to maximally exploiting the benefit of eXOR-FFs, and it performs precise analyses of the decomposition of power consumption and of timing impact, translating them into cost functions in the core engine of clock gating exploration. Experiments with benchmark circuits from ISCAS89, ITC89, ITC99 and IWLS 2005 show that the proposed method reduces total power by 5.6% and total cell area by 5.3% compared with the previous data-driven clock gating method in [1].
    Contents: Abstract; Contents; List of Tables; List of Figures. 1 Introduction (1.1 Power Consumption in CMOS Digital Design; 1.2 Low Power Design Methodologies; 1.3 Contribution of This Thesis). 2 Preliminary and Motivations (2.1 Background; 2.2 Observation on Area and Power Saving; 2.3 Observation on Timing Impact). 3 Redesign of Flip-flops Specialized for Clock Gating (3.1 Observation on Area Impact). 4 Placement-aware Clock Gating Methodology Utilizing eXOR-FF Cells (4.1 Overall Design Flow; 4.2 Cost Formulation for Conventional Clock Gating; 4.3 Cost Formulation for Our Clock Gating using eXOR-FFs). 5 Experiments (5.1 Experimental Setup; 5.2 Experimental Results; 5.3 Comparing with Industry Algorithm). 6 Conclusion. Abstract (In Korean).
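    The data-driven (toggling based) clock gating that this abstract builds on can be illustrated with a small behavioral model. The sketch below is not the eXOR-FF design. It only assumes the conventional scheme in which each flip-flop's input D is XORed with its output Q, and the group's clock is enabled when any XOR output is 1. The function names and the input trace are hypothetical.

```python
from typing import List, Tuple

def gating_enable(d_bits: List[int], q_bits: List[int]) -> int:
    """Return 1 if any flip-flop in the group would change state this cycle.

    One XOR per flip-flop compares D with Q; the results are ORed together.
    """
    return int(any(d ^ q for d, q in zip(d_bits, q_bits)))

def simulate(d_trace: List[List[int]], q_init: List[int]) -> Tuple[List[int], int]:
    """Apply one D vector per cycle and clock the group only when enabled.

    Returns the final state and the number of cycles in which the clock fired.
    """
    q = list(q_init)
    clocked = 0
    for d in d_trace:
        if gating_enable(d, q):
            q = list(d)      # clock pulse reaches the flip-flops
            clocked += 1     # otherwise the clock is gated and Q holds
    return q, clocked

# Hypothetical 2-bit group observed over five cycles.
trace = [[0, 0], [0, 0], [1, 0], [1, 0], [1, 1]]
final_state, clocked_cycles = simulate(trace, [0, 0])
```

    In this trace the group is clocked in two of five cycles, and the final state equals what an ungated register would hold. The model also shows why the XOR count grows with group size: every added flip-flop brings its own comparison.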

    Desynchronization: Synthesis of asynchronous circuits from synchronous specifications

    Asynchronous implementation techniques, which measure logic delays at run time and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst-case delays at design time, and constrain the clock cycle accordingly. De-synchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity, without requiring special design skills or tools. In this paper, we first of all study different protocols for de-synchronization and formally prove their correctness, using techniques originally developed for distributed deployment of synchronous language specifications. We also provide a taxonomy of existing protocols for asynchronous latch controllers, covering in particular the four-phase handshake protocols devised in the literature for micro-pipelines. We then propose a new controller which exhibits provably maximal concurrency, and analyze the performance of desynchronized circuits with respect to the original synchronous optimized implementation. We finally prove the feasibility and effectiveness of our approach, by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architectur

    Physical Design and Clock Tree Synthesis Methods For A 8-Bit Processor

    Nowadays many processors with a wide variety of features are available from different manufacturers. A processor whose architecture resembles that of current processors, lacking only memory components such as RAM and ROM, has been designed here in Verilog. Architecturally, the processor contains a program counter, instruction register, ALU, ALU latch, general-purpose registers, control state module, flag registers, and a core module that contains all of these modules. A test module is also designed for testing the processor. Once the design is functionally correct, the processor is synthesized in a 180nm technology. Synthesis is performed with data path optimization, such as selecting suitable adders and multipliers for timing optimization in the data path while ALU operations are performed. How to handle the worst negative slack (WNS), how to include clock gating cells, and how to define cost and path groups during synthesis are covered. Synthesis yields the netlist and the synthesized constraint file needed for physical design. Physical design comprises steps such as floor-planning, partitioning, placement, placement legalization, clock tree synthesis, and routing. Static timing analysis is performed at every stage to check that the design meets timing. Each step of physical design is discussed with particular attention to the concepts behind it. Among these steps, clock tree synthesis is improved by creating a symmetrical clock tree and maintaining more common clock paths. A dedicated algorithm has been devised for creating a symmetrical clock tree, thereby lowering the power consumption of the clock tree

    Power Minimisation Techniques for Testing Low Power VLSI Circuits (PhD Dissertation)

    Testing low power very large scale integrated (VLSI) circuits has recently become an area of concern due to yield and reliability problems. This dissertation focuses on minimising power dissipation during test application at the logic level and register-transfer level (RTL) of abstraction of the VLSI design flow. The first part of this dissertation addresses power minimisation techniques in scan sequential circuits at the logic level of abstraction. A new best primary input change (BPIC) technique based on a novel test application strategy has been proposed. The technique increases the correlation between successive states during shifting in test vectors and shifting out test responses by changing the primary inputs such that the smallest number of transitions is achieved. The new technique is test set dependent and it is applicable to small to medium sized full and partial scan sequential circuits. Since the proposed test application strategy depends only on controlling primary input change time, power is minimised with no penalty in test area, performance, test efficiency, test application time or volume of test data. Furthermore, it is shown that partial scan provides not only the commonly known benefits, such as less test area overhead and test application time, but also less power dissipation during test application when compared to full scan. To achieve power savings in large scan sequential circuits, a new test set independent multiple scan chain-based technique, which employs a new design for test (DFT) architecture and a novel test application strategy, is presented. The technique has been validated using benchmark examples, and it has been shown that power is minimised with low computational time, low overhead in test area and volume of test data, and with no penalty in test application time, test efficiency, or performance. 
The second part of this dissertation addresses power minimisation techniques for testing low power VLSI circuits using built-in self-test (BIST) at RTL. First, it is important to overcome the shortcomings associated with traditional BIST methodologies. It is shown how a new BIST methodology for RTL data paths using a novel concept called test compatibility classes (TCC) overcomes high test application time, BIST area overhead, performance degradation, volume of test data, fault-escape probability, and complexity of the testable design space exploration. Second, power minimisation in BIST RTL data paths is achieved by analysing the effect of test synthesis and test scheduling on power dissipation during test application and by employing new power conscious test synthesis and test scheduling algorithms. Third, the new BIST methodology has been validated using benchmark examples. Further, it is shown that when the proposed power conscious test synthesis and test scheduling are combined with the novel test compatibility classes, simultaneous reduction in test application time and power dissipation is achieved with low overhead in computational time

    Address generator synthesis


    Synthesis of Clock Gating Based on Accurate and Learning-Based Power Analysis

    Thesis (M.S.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2023. Taewhan Kim.
    In this thesis, we introduce two techniques for applying clock gating efficiently at the synthesis stage. First, we propose a new clock gating methodology based on a precise power saving analysis to overcome the ineffectiveness of conventional logic structure based clock gating. Two new features exploited in our clock gating are (i) the multiplexer selection signal probability, i.e., the probability that a flip-flop with a multiplexer feedback loop receives a new input, and (ii) the joint probability of selection signals, i.e., the probability that two flip-flops with different multiplexer selection signals both receive new inputs in the same clock cycle. Clock gating is applied only where it yields a power gain, and different clock gating groups are merged, with the aim of reducing total dynamic power. In summary, our method reduces total power consumption by 2.46% on average (up to 5.00%) over the conventional clock gating method.
    In the second work, we address a new problem: transforming the long toggling/untoggling sequences that record flip-flops' cycle-accurate activity into short embedding vectors, so that flip-flop grouping for clock gating becomes practically feasible in terms of the memory usage and run time needed to check activity similarity among flip-flops. To this end, we propose machine learning based generation of embedding vectors that are accurate enough to predict the original flip-flop toggling sequences. Specifically, we develop a neural network model that combines an LSTM (long short-term memory) based AE (autoencoder) with an SDAE (stacked denoising autoencoder) to take into account the time-series (i.e., clock cycle) similarity among toggling sequences, which is essential for deciding which flip-flops should be grouped together for clock gating. The denoising autoencoder compresses toggling sequences of 5000 clock cycles into 10 dimensions, and the result is fed to the LSTM autoencoder to produce a low-dimensional embedding vector that represents the whole sequence.
    Building on (1) our LSTM based embedding vector generation model, we propose two additional ML models for clock gating: (2) a joint state probability predictor (JSP) model for generating the 0-state probability of two embedding vectors, and (3) a joint feature predictor (JFP) model for generating a new embedding vector that combines two embedding vectors. Experiments confirm that our proposed LSTM combined with AutoEnc improves the toggling sequence prediction accuracy up to 0.88, whereas an LSTM based AE model reaches an accuracy of 0.72, thereby enabling our ML based clock gating framework to save more dynamic power than the state-of-the-art commercial clock gating tool, which relies on the flip-flops' toggling probability for grouping flip-flops. Experiments with benchmark circuits in IWLS show that our method reduces dynamic power by 14.0% on average over conventional toggling-driven clock gating.
    Contents: 1 Selective Clock Gating Based on Comprehensive Power Saving Analysis (1.1 Introduction; 1.2 Preliminary and Motivation; 1.3 Selective Clock Gating; 1.3.1 Concept of Selective Clock Gating; 1.3.2 Joint probability of selection signals; 1.4 Experimental Results; 1.4.1 Experimental Setup; 1.4.2 Experimental Result; 1.5 Conclusion). 2 Machine Learning Based Flip-Flop Grouping for Toggling Driven Clock Gating (2.1 Introduction; 2.2 Preliminaries and Prior Works; 2.2.1 Preliminary and Motivation; 2.2.2 Prior Works; 2.3 Machine Learning Based Clock Gating Framework; 2.3.1 Primary Model: Embedding Vector Generation; 2.3.2 Secondary Models: Joint State Probability and Joint Feature Prediction; 2.3.3 Distance Analysis Between Embedding Vectors; 2.3.4 Power Analysis Model; 2.3.5 Overall Flow of Flip-flop Grouping; 2.4 Experimental Results; 2.4.1 Comparison of Dynamic Power Saving; 2.4.2 Performance of Auto-encoder Reconstruction Model; 2.5 Conclusion). Abstract (In Korean).
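    The role of the joint selection-signal probability in this abstract can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that merging two clock gating groups means the shared clock fires whenever either multiplexer selection signal is 1, so the merged enable probability is P(a) + P(b) - P(a and b). The thesis's actual power model is not given in the abstract, and the traces and function names are hypothetical.

```python
from typing import Sequence

def signal_probability(trace: Sequence[int]) -> float:
    """Fraction of cycles in which a selection signal is 1."""
    return sum(trace) / len(trace)

def joint_probability(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of cycles in which both selection signals are 1."""
    return sum(x & y for x, y in zip(a, b)) / len(a)

def merged_enable_probability(a: Sequence[int], b: Sequence[int]) -> float:
    """Probability that a shared clock gate must pass the clock (a OR b)."""
    return signal_probability(a) + signal_probability(b) - joint_probability(a, b)

# Hypothetical per-cycle selection signals of two flip-flops over eight cycles.
sel_a = [1, 0, 0, 1, 0, 0, 0, 0]
sel_b = [1, 0, 1, 0, 0, 0, 0, 0]
p_merged = merged_enable_probability(sel_a, sel_b)
```

    Here each signal alone is active in 25% of cycles, while the merged gate is active in 37.5%. The more the two signals overlap, the closer the merged activity stays to that of a single group, which is why the joint probability matters when deciding whether to merge.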

    Physical design of USB1.1

    In earlier days, interfacing peripheral devices to a host computer was problematic. There were many different kinds of ports, such as the serial port, parallel port, and PS/2, and they were restrictive in many situations, for example lacking hot-pluggability and automatic configuration. There were very few ways to connect peripheral devices to a host computer. The Universal Serial Bus (USB) was introduced mainly to provide additional benefits over these earlier interfaces. USB is designed to allow many peripherals to be connected through a single standardized interface. It provides an expandable, fast, cost-effective, hot-pluggable, plug-and-play serial hardware interface that makes life easier for computer users by allowing them to plug different devices into a USB port and have them configured automatically. This thesis briefly describes the USB v1.1 architecture and generates a gate-level netlist from RTL code under different constraints such as timing, area, and power. Applying these design constraints improved performance by 30%. The design was then implemented physically using the SoC Encounter EDI System, including chip size estimation, power analysis, and routing of the clock signal to all flip-flops in the design. To reduce clock switching power, a register clustering algorithm (DBSCAN) was implemented. The implementation uses the TSMC 180nm technology library
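    The abstract names DBSCAN as the register clustering algorithm. How density-based clustering groups nearby registers can be shown with a minimal, self-contained version of the algorithm. The register coordinates, the `eps` radius, and the `min_pts` threshold below are assumptions chosen for illustration, not values from the thesis.

```python
import math
from typing import List, Optional, Tuple

NOISE = -1  # label for registers that belong to no cluster

def region_query(points: List[Tuple[float, float]], idx: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[idx] (including itself)."""
    px, py = points[idx]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]

def dbscan(points: List[Tuple[float, float]], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN: returns one cluster label per point, NOISE for outliers."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE            # may later become a border point
            continue
        labels[i] = cluster              # i is a core point: start a new cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster += 1
    return [NOISE if label is None else label for label in labels]

# Hypothetical placed register locations (x, y) in arbitrary units.
registers = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11), (50, 50)]
cluster_labels = dbscan(registers, eps=1.5, min_pts=3)
```

    With these inputs the two tight groups of three registers form two clusters and the isolated register at (50, 50) is marked as noise. In a clock tree context, each cluster could then share a local clock buffer, though how the thesis uses the clusters is not stated in the abstract.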

    CAD Tools for Synthesis of Sleep Convention Logic

    This dissertation proposes an automated flow for the Sleep Convention Logic (SCL) asynchronous design style. The proposed flow synthesizes synchronous RTL into an SCL netlist. The flow utilizes commercial design tools, while supplementing missing functionality using custom tools. A method for determining the performance bottleneck in an SCL design is proposed. A constraint-driven method to increase the performance of linear SCL pipelines is proposed. Several enhancements to SCL are proposed, including techniques to reduce the number of registers and total sleep capacitance in an SCL design

    Performance Comparison of Static CMOS and Domino Logic Style in VLSI Design: A Review

    Of late, there has been a steep rise in the usage of handheld gadgets and high speed applications. VLSI designers often choose the static CMOS logic style for low power applications. This logic style provides low power dissipation and is free from noise and signal integrity issues. However, designs based on this logic style are often slow and cannot be used in high performance circuits. On the other hand, designs based on the Domino logic style yield high performance and occupy less area. Yet, they have more power dissipation compared to their static CMOS counterparts. In practice, designers judiciously mix more than one logic style during circuit synthesis to obtain the advantages of each. Carefully designing a mixed static-Domino CMOS circuit can tap the advantages of both static and Domino logic styles while overcoming their respective shortcomings