Search CORE

22,583 research outputs found

Introducing KeyRing self‐timed microarchitecture and timing‐driven design flow

Author: Fiorentino Mickael
Savaria Yvon
Thibeault Claude
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/11/2021
Field of study

ABSTRACT: A self-timed microarchitecture called KeyRing is presented, and a method for implementing KeyRing circuits compatible with a timing-driven electronic design automation (EDA) flow is discussed. The KeyRing microarchitecture is derived from the AnARM, a low-power self-timed ARM processor based on ad hoc design principles. First, the unorthodox design style and circuit structures are revisited. A theoretical model that can support the design of generic circuits and the elaboration of EDA methods is then presented. Also addressed are the compatibility issues between KeyRing circuits and timing-driven EDA flows. The proposed method leverages relative timing constraints to translate the timing relations in a KeyRing circuit into a set of timing constraints that enable timing-driven synthesis and static timing analysis. Finally, two 32-bit RISC-V processors are presented; called KeyV and based on KeyRing microarchitectures, they are synthesized in a 65 nm technology using the proposed EDA flow. Postsynthesis results demonstrate the effectiveness of the design methodology and allow comparisons with a synchronous alternative called SynV. Performance and power consumption evaluations show that KeyV has a power efficiency that lies between SynV with clock-gating and SynV without clock-gating

Directory of Open Access Journals

PolyPublie

클럭 게이팅 및 플립 플롭 동시 최적화를 위한 설계 및 알고리즘

Author: 양기용
Publication venue: 서울대학교 대학원
Publication date: 01/02/2019
Field of study

학위논문 (석사)-- 서울대학교 대학원 : 공과대학 전기·정보공학부, 2019. 2. 김태환.본 논문에서는 표준 셀에서부터 배치 단계에 이르는 다양한 설계단에에서 칩의 동적 전력을 최적화 기법을 소개한다. 이 연구는 우선 데이터 구동형 (즉, 토글링 기반) 클럭 게이팅이 종래 클럭 게이팅 기법들에서 결코 다루어지지 않았던 플립 플 롭의 합성과 밀접하게 통합될 수 있는 방법을 연구한다. 우리의 관측의 핵심은 플립 플롭 셀의 일부 내부 부품이 클럭 게이팅 인에이블 신호를 생성 하기 위해 재사용 될 수 있다는 것이다. 이를 바탕으로 eXOR-FF 라고 불리는 새롭게 최적화된 플립 플롭 배선 구조를 제안합니다. 이 구조에서는 매 클럭 주기마다 내부 로직을 재사용 하여 클럭 게이팅을 통해 플립 플롭을 활성화할지 또는 비활성화할지 결정합니다. 모든 쌍의 플립 플롭 및 토글릴 감지 로직에서의 영역을 절약함에 따라서 누설 및 동적 전력의 절전 효과를 달성합니다. 그런 다음, 두 가지고유한 장점을 제공하는 배치/타이밍 인식 클럭 게이팅 탐색에 대한 포괄적인 방법론을 제안합니다. 해당 방 법론은 eXOR-FF 의 이점을 극대화하고, 전력 소비 및 타이밍 영향의 분해에 대한 정밀 분석을 수행하고 틀럭 게이팅 참색의 핵심 엔진을 비용기능으로 변환하는데 가장 적합합니다. ISCAS89, ITC89, ITC99 및 IWLS 2005의 벤치 마크 회로를 사용 한 실험을 통해 제안 된 방법이 이전의 데이터 구동 클록 게이팅 방식과 비교하여 총 전력을 5.6 % 및 면적으로 5.3 % 줄일 수 있음을 보여 주었다.In this paper, we introduce dynamic power optimization techniques applicable for various design stage from standard cell to placement stage. This work firstly investi�gates the problem of how designing data-driven (i.e., toggling based) clock gating can be closely integrated with the synthesis of flip-flops, which has never been addressed in the prior clock gating works. Our key observation is that some internal part of a flip-flop cell can be reused to generate its clock gating enable signal. Based on this, we propose a newly optimized flip-flop wiring structure, called eXOR-FF, in which an internal logic can be reused for every clock cycle to decide if the flip-flop is to be activated or inactivated through clock gating, thereby achieving area saving (thus, leakage as well as dynamic power saving) on every pair of flip-flop and its toggling detection logic. Then, we propose a comprehensive methodology of placement/timing�aware clock gating exploration that provides two unique strengths: best suited for max�imally exploiting the benefit of eXOR-FFs and precise analyses on the decomposition of power consumptions and timing impact, and translating them into cost functions in core engine of clock gating exploration. Through experiments with benchmark circuits in ISCAS89, ITC89, ITC99 and IWLS 2005, it is shown that our proposed method is able to reduce the total power by 5.6% and total cell area by 5.3% compared with the previous data-driven clock gating method in [1].Abstract Contents List of Tables List of Figures 1 Introduction 1.1 Power Consumption in CMOS Digital Design 1.2 Low Power Design Methodologies 1.3 Contribution of This Thesis 2 Preliminary and Motivations 6 2.1 Background 2.2 Observation on Area and Power Saving 2.3 Observation on Timing Impact 3 Redesign of Flip-flops Specialized for Clock Gating 3.1 Observation on Area Impact 4 Placement-aware Clock Gating Methodology Utilizing eXOR-FF Cells 4.1 Overall Design Flow 4.2 Cost Formulation for Conventional Clock Gating 4.3 Cost Formulation for Our Clock Gating using eXOR-FFs 5 Experiments 5.1 Experimental Setup 5.2 Experimental Results 5.3 Comparing with Industry Algorithm 6 Conclusion Abstract (In Korean)Maste

SNU Open Repository and Archive

An FPGA Architecture and CAD Flow Supporting Dynamically Controlled Power Gating

Author: Bsoul AAM
Luk W
Tsoi KH
Wilton SJE
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/12/2014
Field of study

© 2015 IEEE.Leakage power is an important component of the total power consumption in field-programmable gate arrays (FPGAs) built using 90-nm and smaller technology nodes. Power gating was shown to be effective at reducing the leakage power. Previous techniques focus on turning OFF unused FPGA resources at configuration time; the benefit of this approach depends on resource utilization. In this paper, we present an FPGA architecture that enables dynamically controlled power gating, in which FPGA resources can be selectively powered down at run-time. This could lead to significant overall energy savings for applications having modules with long idle times. We also present a CAD flow that can be used to map applications to the proposed architecture. We study the area and power tradeoffs by varying the different FPGA architecture parameters and power gating granularity. The proposed CAD flow is used to map a set of benchmark circuits that have multiple power-gated modules to the proposed architecture. Power savings of up to 83% are achievable for these circuits. Finally, we study a control system of a robot that is used in endoscopy. Using the proposed architecture combined with clock gating results in up to 19% energy savings in this application

Spiral - Imperial College Digital Repository

비용 효율적인 클럭 및 파워 게이팅 설계 방법론

Author: 현경환
Publication venue: 서울대학교 대학원
Publication date: 01/02/2020
Field of study

학위논문(박사)--서울대학교 대학원 :공과대학 전기·정보공학부,2020. 2. 김태환.저전력 설계는 최신 시스템-온-칩 (SoCs) 설계에서 매우 중요한 요소 중의 하나이다. 본 논문에서는 동적 및 정적 전력 소비를 감소시키기 위한 저전력 설계 방법론에 대해 논한다. 구체적으로 비용 효율적인 저전력 설계를 위하여 두 가지 새로운 기술을 제안한다. 우선 본 논문에서는 동적 전력 소비를 줄일 수 있는 새로운 클럭 게이팅 방법을 제안한다. 기존 플립-플랍 입력 데이터 토글 기반 클럭 게이팅은 가장 널리 사용되는 클럭 게이팅 기법 중의 하나이다. 하지만 이 방법은 더 많은 플립-플랍에 대해 적용할수록 클럭 게이팅에 필요한 부가 회로가 급격히 증가한다는 근본적인 한계를 지니고 있다. 이러한 한계를 극복하기 위하여 본 논문에서는 다음과 같이 새로운 클럭 게이팅 방법을 제안한다. 첫 번째로 기존 입력 데이터 토글 기반 클럭 게이팅 방법에 필요한 회로 자원을 분석하여 해당 방법의 비효율성을 보이고, 기존 방법에서 사용되는 입력 데이터 토글 검출에 필수적이지만 고비용의 XOR 게이트를 완벽히 제거한 플립-플랍 상태 기반 클럭 게이팅'이라는 새로운 클럭 게이팅 방법을 제안한다. 두 번째로 제안된 XOR 게이트가 필요 없는 클럭 게이팅 방법을 위한 부가 회로를 제시하며, 다양한 타이밍 분석을 통하여 해당 회로가 안정적으로 적용될 수 있음을 보인다. 세 번째로 회로의 플립-플랍 상태 프로파일에 기반하여, 제안된 클럭 게이팅 기법을 기존 클럭 게이팅 기법과 완벽하게 통합할 수 있는 클럭 게이팅 방법론을 제안한다. 여러 벤치마크 회로에 대한 실험 결과는 기존 입력 데이터 토글 기반 클럭 게이팅 방법이 전력 소비 절감 기회를 놓치는 반면 본 논문에서 제안된 방법은 모든 타이밍 제약 조건을 만족하면서 전력 소비 감소에 매우 효과적임을 보여준다. 다음으로 정적 전력 소비를 줄이기 위한 방안으로, 본 논문에서는 기존 파워 게이트 회로의 상태 보존용 저장 공간 할당 방법들이 지니고 있는 두 가지 중요한 한계들을 해결할 수 있는 방법을 제안한다. 중요한 한계들이란 첫 번째로 다중-비트 상태 보존 플립-플랍의 무분별한 사용으로 인한 긴 웨이크업 지연 시간이며, 두 번째로 멀티플렉서 되먹임 루프가 있는 상태 보존 플립-플랍의 최적화 불가능성이다. 기존 방법들에서는 상태 보존을 위한 저장 공간을 최소화하기 위해 긴 웨이크업 지연 시간이 필수적이었다. 그리고 되먹임 루프가 있는 플립-플랍은 최적화할 수 없는 대상으로 다루어졌다. 그러나 일반적으로 하드웨어 기술 언어(HDL)로부터 생성되는 되먹임 루프를 지닌 플립-플랍은 무시할 수 있을 정도로 적은 양이 아니다. 첫 번째 한계를 해결하기 위한 방법으로 본 논문에서는 최대 2 비트의 다중-비트 상태 보존 플립-플랍을 사용하여 웨이크업 지연 시간을 두 클럭 사이클로 제한하면서도 상태 보존을 위한 저장 공간을 효율적으로 절약할 수 있음을 보인다. 그리고 두 번째 한계를 극복하기 위해서 되먹임 루프를 지닌 플립-플랍이 포함된 두 플립-플랍 쌍의 상태를 복원할 수 있는 2단 상태 보존 제어 방안을 제안한다. 또한 주어진 회로에서 충돌없이 동시에 존재할 수 있는 플립-플랍 쌍을 최대로 추출하기 위해 독립 집합 문제(independent set problem)기반의 연산법도 제안한다. 벤치마크 회로에 대한 실험 결과는 본 논문에서 제안된 방법이 웨이크업 지연 시간을 두 클럭 사이클로 제한하면서도 상태 보존에 필요한 저장 공간과 파워를 감소시키는데 매우 효과적임을 보여준다.Low power design is of great importance in modern system-on-chips (SoCs). This dissertation studies on low power design methodologies for saving dynamic and static power consumption. Precisely, we unveil two novel techniques of cost effective low power design. Firstly, we propose a novel clock gating method for reducing the dynamic power consumption. Flip-flop's input data toggling based clock gating is one of the most commonly used clock gating methods, in which one critical and inherent limitation is the sharp increase of gating logic as more flip-flops are involved in gating. In this dissertation, we propose a new clock gating method to overcome this limitation. Specifically, (1) we analyze the resources of gating logic in the input data toggling based clock gating, from which an ineffectiveness in resource utilization is observed and we propose a new clock gating technique called flip-flop state driven clock gating which completely eliminates the essential and expensive component of XOR gates for detecting input toggling of flip-flops; (2) we provide the supporting logic circuitry of our proposed XOR-free clock gating, confirming its safe applicability through a comprehensive timing analysis; (3) we propose, based on the flip-flops' state profile, a clock gating methodology that seamlessly combines our flip-flop state based clock gating with the toggling based clock gating. Through experiments with benchmark circuits, it is confirmed that our clock gating method is very effective in reducing power, which otherwise the toggling based clock gating shall miss the power saving opportunity, while meeting all timing constraints. Secondly, for reducing the static power consumption, we solve two critical limitations of the conventional approaches to the allocation of state retention storage for power gated circuits. Those are (1) the long wakeup delay caused by the senseless use of multi-bit retention flip-flops (MBRFFs) and (2) the inability to optimize retention flip-flops for the flip-flops with mux-feedback loop. It should be noted that the conventional approaches have regarded the long wakeup delay as an inevitable consequence of maximizing the reduction of total storage size for state retention while they have treated the flip-flops with mux-feedback loop (called self-loop flip-flop) as nonoptimizable component, but practically, the self-loop flip-flops synthesized from hardware description language (HDL) code are not far from a small amount and thus, can in no way be negligible. More precisely, for solving (1), we show that the use of MBRFFs with up to two bits, consequently, constraining the wakeup delay to no more than two clock cycles, is enough to maintain the high reduction of total retention storage and for solving (2), we devise a 2-phase retention control mechanism for a pair of flip-flops, one of which has self-loop, by which just a single retention bit can be used to restore state of the two flip-flops, and propose an independent set based algorithm for maximally extracting the non-conflict pairs from circuits. Through experiments with benchmark circuits, it is shown that our proposed method is very effective against reducing the state retention storage and the power consumption compared with the existing best MBRFF allocation while the wakeup delay is strictly limited to two clock cycles.1 INTRODUCTION 1 1.1 Clock Gating 1 1.2 Power Gating and State Retention 3 1.3 Multi-bit Retention Registers 4 1.4 Contributions of This Dissertation 6 2 FLIP-FLOP STATE DRIVEN CLOCK GATING: CONCEPT, DESIGN, AND METHODOLOGY 9 2.1 Motivations 9 2.1.1 Toggling based Clock Gating 9 2.1.2 Area and Power by Clock Gating 10 2.2 The Proposed Clock Gating 13 2.2.1 Concept of Flip-flop State Driven Clock Gating 13 2.2.2 Design of Gating Logic Circuitry 17 2.2.3 Integrated Clock Gating Methodology 22 2.2.4 Cost Formulation 23 2.3 Experiments 25 2.3.1 Experimental Setup 25 2.3.2 Experimental Results 26 3 ALGORITHM AND DESIGN OPTIMIZATION OF ALLOCATING MULTI-BIT RETENTION FLIP-FLOPS FOR POWER GATED CIRCUITS 32 3.1 Motivations 32 3.1.1 Flip-flops with Mux-feedback Loop 32 3.1.2 Impact of Wakeup Delay 37 3.2 The Proposed Allocation Algorithm 39 3.3 Design of Multi-Bit Retention Flip-Flop and Multi-Bit Extension 48 3.3.1 Multi-Bit Retention Flip-Flop 48 3.3.2 Multi-Bit Flip-Flop Extension 52 3.4 Experiments 54 3.4.1 Experimental Setup 54 3.4.2 Experimental Results 57 4 CONCLUSIONS 65 4.1 Flip-flop State Driven Clock Gating: Concept, Design, and Methodology 65 4.2 Algorithm and Design Optimization of Allocating Multi-bit Retention Flip-flops for Power Gated Circuits 66 Abstract (In Korean) 71Docto

SNU Open Repository and Archive

Vector processing-aware advanced clock-gating techniques for low-power fused multiply-add

Author: Cristal Kestelman Adrián
Palomar Pérez Óscar
Ratkovic Ivan
Stanic Milan
Unsal Osman Sabri
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

The need for power efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need a retailoring for the mobile market that they are entering now. Floating-point (FP) fused multiply-add (FMA), being a functional unit with high power consumption, deserves special attention. Although clock gating is a well-known method to reduce switching power in synchronous designs, there are unexplored opportunities for its application to vector processors, especially when considering active operating mode. In this research, we comprehensively identify, propose, and evaluate the most suitable clock-gating techniques for vector FMA units (VFUs). These techniques ensure power savings without jeopardizing the timing. We evaluate the proposed techniques using both synthetic and “real-world” application-based benchmarking. Using vector masking and vector multilane-aware clock gating, we report power reductions of up to 52%, assuming active VFU operating at the peak performance. Among other findings, we observe that vector instruction-based clock-gating techniques achieve power savings for all vector FP instructions. Finally, when evaluating all techniques together, using “real-world” benchmarking, the power reductions are up to 80%. Additionally, in accordance with processor design trends, we perform this research in a fully parameterizable and automated fashion.The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA 321253 and is supported in part by the European Union (FEDER funds) under contract TTIN2015-65316-P. The work of I. Ratkovic was supported by a FPU research grant from the Spanish MECD.Peer ReviewedPostprint (author's final draft

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

Author: Gujarathi Hemal S
McDonald-Maier Klaus D
Qadri Muhammad Yasir
Publication venue: 'Academy Publisher'
Publication date: 01/01/2009
Field of study

The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

University of Essex Research Repository

CiteSeerX

Crossref

Power-Gating: More Than Leakage Savings

Author: Calimera Andrea
Macii Enrico
Poncino Massimo
Publication venue: ACM/IEEE
Publication date: 01/01/2010
Field of study

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

An ARM Cortex-M0 for Energy Harvesting Systems: A Novel Application of UPF with Synopsys’ Galaxy Platform

Author: Mistry Jatin N.
Myers James
Publication venue
Publication date: 26/03/2012
Field of study

Southampton (e-Prints Soton)

RTL2RTL Formal Equivalence: Boosting the Design Confidence

Author: Bindumadhava S S
Gupta Aarti
Kumar M V Achutha Kiran
Publication venue: 'Open Publishing Association'
Publication date: 15/07/2014
Field of study

Increasing design complexity driven by feature and performance requirements and the Time to Market (TTM) constraints force a faster design and validation closure. This in turn enforces novel ways of identifying and debugging behavioral inconsistencies early in the design cycle. Addition of incremental features and timing fixes may alter the legacy design behavior and would inadvertently result in undesirable bugs. The most common method of verifying the correctness of the changed design is to run a dynamic regression test suite before and after the intended changes and compare the results, a method which is not exhaustive. Modern Formal Verification (FV) techniques involving new methods of proving Sequential Hardware Equivalence enabled a new set of solutions for the given problem, with complete coverage guarantee. Formal Equivalence can be applied for proving functional integrity after design changes resulting from a wide variety of reasons, ranging from simple pipeline optimizations to complex logic redistributions. We present here our experience of successfully applying the RTL to RTL (RTL2RTL) Formal Verification across a wide spectrum of problems on a Graphics design. The RTL2RTL FV enabled checking the design sanity in a very short time, thus enabling faster and safer design churn. The techniques presented in this paper are applicable to any complex hardware design.Comment: In Proceedings FSFMA 2014, arXiv:1407.195

arXiv.org e-Print Archive

Directory of Open Access Journals