4 research outputs found
Design and Algorithm for Clock Gating and Flip-Flop Co-optimization
Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2019. 2. Taewhan Kim.
In this paper, we introduce dynamic power optimization techniques applicable to
various design stages, from the standard cell to the placement stage. This work first investigates how the design of data-driven (i.e., toggling-based) clock gating can
be closely integrated with the synthesis of flip-flops, which has not been addressed
in prior clock gating works. Our key observation is that some internal part of a
flip-flop cell can be reused to generate its clock gating enable signal. Based on this,
we propose a newly optimized flip-flop wiring structure, called eXOR-FF, in which
internal logic is reused in every clock cycle to decide whether the flip-flop is to
be activated or deactivated through clock gating, thereby saving area (and thus
leakage as well as dynamic power) on every pair of a flip-flop and its toggling
detection logic. Then, we propose a comprehensive methodology of placement/timing-aware clock gating exploration that provides two unique strengths: it is best suited for maximally exploiting the benefit of eXOR-FFs, and it performs precise analyses of the decomposition
of power consumption and timing impact and translates them into cost functions in
the core engine of clock gating exploration.
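As a rough illustration of the data-driven (toggling-based) clock gating that the abstract builds on, the sketch below models the enable signal of one flip-flop as `D xor Q` and the enable of a gated group as the OR of its members' enables. It is a behavioral toy model only, not the eXOR-FF circuit; the function names and the example traces are hypothetical.

```python
from typing import List, Sequence


def toggle_enables(d_trace: Sequence[int], q_init: int = 0) -> List[int]:
    """Per-cycle enable of one flip-flop: 1 when input D differs from state Q."""
    q = q_init
    enables = []
    for d in d_trace:
        en = d ^ q          # toggling detection (the XOR in data-driven gating)
        enables.append(en)
        if en:
            q = d           # the clock pulse is delivered, so the state updates
    return enables


def group_enable(traces: Sequence[Sequence[int]], q_init: int = 0) -> List[int]:
    """Enable of a gated group: OR of the member flip-flops' enables per cycle."""
    per_ff = [toggle_enables(t, q_init) for t in traces]
    return [int(any(cycle)) for cycle in zip(*per_ff)]


def gated_fraction(traces: Sequence[Sequence[int]]) -> float:
    """Fraction of cycles in which the group's clock can be suppressed."""
    enable = group_enable(traces)
    return 1.0 - sum(enable) / len(enable)


if __name__ == "__main__":
    # Two hypothetical flip-flop input traces over six clock cycles.
    example = [[0, 0, 1, 1, 1, 0], [0, 0, 0, 1, 1, 1]]
    print(group_enable(example), gated_fraction(example))
```

A group is clocked whenever any member needs to update, which is why the choice of which flip-flops share a gate matters for the achievable saving.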
Through experiments with benchmark circuits in ISCAS89, ITC89, ITC99 and
IWLS 2005, it is shown that our proposed method is able to reduce the total power by
5.6% and total cell area by 5.3% compared with the previous data-driven clock gating
method in [1].
Abstract
Contents
List of Tables
List of Figures
1 Introduction
1.1 Power Consumption in CMOS Digital Design
1.2 Low Power Design Methodologies
1.3 Contribution of This Thesis
2 Preliminary and Motivations
2.1 Background
2.2 Observation on Area and Power Saving
2.3 Observation on Timing Impact
3 Redesign of Flip-flops Specialized for Clock Gating
3.1 Observation on Area Impact
4 Placement-aware Clock Gating Methodology Utilizing eXOR-FF Cells
4.1 Overall Design Flow
4.2 Cost Formulation for Conventional Clock Gating
4.3 Cost Formulation for Our Clock Gating using eXOR-FFs
5 Experiments
5.1 Experimental Setup
5.2 Experimental Results
5.3 Comparing with Industry Algorithm
6 Conclusion
Abstract (In Korean)
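Synthesis of Clock Gating Based on Precise and Learning-Based Power Analysis
Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2023. 2. Taewhan Kim.
In this paper, we introduce two techniques to efficiently apply clock gating in the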
synthesis stage.
First, we propose a new clock gating methodology based on a precise power saving
analysis to overcome the ineffectiveness of conventional logic-structure-based clock
gating. Two new features exploited in our proposed clock gating are (i) the multiplexer
selection signal probability, i.e., the probability that a flip-flop with a multiplexer feedback loop receives a
new input, and (ii) the joint probability of selection signals, i.e., the probability that two flip-flops with
different multiplexer selection signals both receive new inputs in the same clock cycle.
In summary, our method reduces the total power consumption by 2.46% on average
(up to 5.00%) over the conventional clock gating method.
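To make the two probabilities concrete, the sketch below estimates them from simulated selection-signal traces and plugs them into a deliberately simple saving estimate. The power model (`clk_power_per_ff`, `gating_overhead`) and all function names are assumptions made for illustration; the thesis's actual cost formulation is not given in this abstract.

```python
from typing import Sequence


def probability(signal: Sequence[int]) -> float:
    """Fraction of cycles in which a selection signal is 1 (new input loaded)."""
    return sum(signal) / len(signal)


def joint_probability(sel_a: Sequence[int], sel_b: Sequence[int]) -> float:
    """Fraction of cycles in which both selection signals are 1 together."""
    return sum(a & b for a, b in zip(sel_a, sel_b)) / len(sel_a)


def merged_enable_probability(sel_a: Sequence[int], sel_b: Sequence[int]) -> float:
    """P(a or b) by inclusion-exclusion: enable rate if both groups share one gate."""
    return probability(sel_a) + probability(sel_b) - joint_probability(sel_a, sel_b)


def net_saving(p_enable: float, n_ffs: int,
               clk_power_per_ff: float, gating_overhead: float) -> float:
    """Hypothetical model: clock power saved while gated, minus gating cell cost."""
    return n_ffs * clk_power_per_ff * (1.0 - p_enable) - gating_overhead


if __name__ == "__main__":
    sel_a = [1, 0, 0, 0, 1, 0, 0, 0]   # hypothetical selection-signal traces
    sel_b = [1, 0, 1, 0, 0, 0, 0, 0]
    p = merged_enable_probability(sel_a, sel_b)
    # Gate only if the estimated saving is positive (a "selective" decision).
    print(p, net_saving(p, n_ffs=4, clk_power_per_ff=1.0, gating_overhead=1.5) > 0)
```

Under this toy model, a high joint probability means two groups can share one gating cell with little loss of gated cycles, while a low one makes the merged enable fire more often.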
In the second work, we address a new problem of transforming the long toggling/untoggling sequences of flip-flops' cycle-accurate activities into short embedding vectors, so that flip-flop grouping for clock gating is practically feasible in
terms of the memory usage and run time for checking activity similarity among flip-flops. To this end, we propose a machine learning based generation of embedding
vectors which are accurate enough to predict the original flip-flop toggling sequences.
Precisely, we develop a neural network model that combines an LSTM (long short-term memory)
based AE (autoencoder) with an SDAE (stacked denoising autoencoder)
to take into account the time-series (i.e., clock cycle) similarity among the toggling sequences, which is essential to determine which flip-flops should be grouped
together for clock gating. Building on (1) our LSTM based embedding vector generation model, we propose two additional ML models for clock gating: (2) a joint state
probability predictor (JSP) model for generating the 0-state probability of two embedding
vectors, and (3) a joint feature predictor (JFP) model for generating a new embedding
vector that combines two embedding vectors. Through experiments, it is confirmed
that our proposed LSTM combined with AutoEnc improves the toggling sequence prediction accuracy up to 0.88 while an LSTM based AE model
achieves 0.72, thereby enabling our ML based clock gating framework to
save more dynamic power than the state-of-the-art commercial clock gating tool, which relies on the flip-flops' toggling probability for grouping
flip-flops. Through experiments with benchmark circuits in IWLS, it is shown that our
method is able to reduce the dynamic power by 14.0% on average over that by the
conventional toggling-driven clock gating.
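For orientation, the quantities that the embedding-based models approximate can be computed directly on raw sequences, which is exactly what becomes costly in memory and run time for long traces and many flip-flops. The sketch below is a brute-force baseline under the assumption that 0 denotes an untoggled cycle; the function names are hypothetical and no neural network is involved.

```python
from typing import Sequence


def joint_zero_probability(seq_a: Sequence[int], seq_b: Sequence[int]) -> float:
    """Fraction of cycles in which both flip-flops are in the 0 (untoggled) state."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must cover the same clock cycles")
    both_zero = sum(1 for a, b in zip(seq_a, seq_b) if a == 0 and b == 0)
    return both_zero / len(seq_a)


def activity_similarity(seq_a: Sequence[int], seq_b: Sequence[int]) -> float:
    """Fraction of cycles in which the two flip-flops show the same activity."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must cover the same clock cycles")
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b) / len(seq_a)


if __name__ == "__main__":
    ff1 = [0, 0, 1, 0, 1, 0]   # hypothetical toggling sequences
    ff2 = [0, 1, 1, 0, 0, 0]
    print(joint_zero_probability(ff1, ff2), activity_similarity(ff1, ff2))
```

Both functions touch every cycle of every compared pair, so their cost grows with sequence length times the number of pairs; replacing the raw sequences with short vectors is what makes pairwise comparison tractable.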
Abstract (In Korean, translated)
This thesis introduces two techniques for efficiently applying clock gating at the synthesis stage.
First, to overcome the inefficiency of conventional logic-structure-based clock gating, we propose a new clock gating methodology based on a precise power-saving analysis. The two new features used in the proposed method are (i) the multiplexer selection signal probability of a flip-flop with a feedback loop and (ii) the joint probability of the multiplexer selection signals of two flip-flops that have different multiplexer selection signals. We aim to reduce the total dynamic power by applying clock gating only where it yields a power gain and by merging different clock gating groups. Experiments confirm that the total power consumption is reduced by 2.46% on average (up to 5.00%) compared with the conventional clock gating method.
Second, we address the problem of transforming the long toggling/untoggling sequences that represent the per-clock-cycle state of flip-flops into short embedding vectors. Applying this to flip-flop grouping for toggling-based clock gating makes checking the state similarity among flip-flops practically feasible in terms of memory usage and run time. To this end, we propose a machine learning based generation of low-dimensional embedding vectors that are accurate enough to predict the original flip-flop toggling sequences. To take into account the time-series similarity among toggling sequences, we develop a neural network model that uses a denoising autoencoder to compress 5000-clock-cycle toggling sequences into 10 dimensions and feeds the result into a long short-term memory (LSTM) autoencoder to generate a low-dimensional embedding vector representing the whole sequence. We also propose two additional neural network models for clock gating: (1) a joint probability prediction model for generating the 0-state probability of two embedding vectors, and (2) a joint feature prediction model that predicts a new embedding vector combining two embedding vectors. Experiments with IWLS benchmark circuits confirm that combining the LSTM-based autoencoder with the denoising autoencoder reconstructs the input data more accurately than using the denoising autoencoder alone. They also confirm that our method reduces the dynamic power by 14.0% on average compared with conventional toggling-based clock gating.
1 Selective Clock Gating Based on Comprehensive Power Saving Analysis
1.1 Introduction
1.2 Preliminary and Motivation
1.3 Selective Clock Gating
1.3.1 Concept of Selective Clock Gating
1.3.2 Joint probability of selection signals
1.4 Experimental Results
1.4.1 Experimental Setup
1.4.2 Experimental Result
1.5 Conclusion
2 Machine Learning Based Flip-Flop Grouping for Toggling Driven Clock Gating
2.1 Introduction
2.2 Preliminaries and Prior Works
2.2.1 Preliminary and Motivation
2.2.2 Prior Works
2.3 Machine Learning Based Clock Gating Framework
2.3.1 Primary Model: Embedding Vector Generation
2.3.2 Secondary Models: Joint State Probability and Joint Feature Prediction
2.3.3 Distance Analysis Between Embedding Vectors
2.3.4 Power Analysis Model
2.3.5 Overall Flow of Flip-flop Grouping
2.4 Experimental Results
2.4.1 Comparison of Dynamic Power Saving
2.4.2 Performance of Auto-encoder Reconstruction Model
2.5 Conclusion
Abstract (In Korean)
Lottery Aware Sparsity Hunting: Enabling Federated Learning on Resource-Limited Edge
Edge devices can benefit remarkably from federated learning due to their
distributed nature; however, their limited resources and computing power pose
limitations in deployment. A possible solution to this problem is to utilize
off-the-shelf sparse learning algorithms at the clients to meet their resource
budget. However, such naive deployment in the clients causes significant
accuracy degradation, especially for highly resource-constrained clients. In
particular, our investigations reveal that the lack of consensus in the
sparsity masks among the clients may potentially slow down the convergence of
the global model and cause a substantial accuracy drop. With these
observations, we present federated lottery aware sparsity hunting
(FLASH), a unified sparse learning framework for training a sparse sub-model
that maintains the performance under ultra-low parameter density while yielding
proportional communication benefits. Moreover, given that different clients may
have different resource budgets, we present hetero-FLASH where clients
can take different density budgets based on their device resource limitations
instead of supporting only one target parameter density. Experimental analysis
on diverse models and datasets shows the superiority of FLASH in closing the
gap with an unpruned baseline while yielding up to
improved accuracy with fewer communication,
compared to existing alternatives, at similar hyperparameter settings. Code is
available at https://github.com/SaraBabakN/flash_fl.
Comment: Accepted in TMLR, https://openreview.net/forum?id=iHyhdpsny
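The "lack of consensus in the sparsity masks" mentioned in the abstract can be pictured with a small measurement: give each client a top-k magnitude mask over the same parameter vector and compare the masks by Jaccard overlap. This is only an illustrative diagnostic under the assumption of magnitude-based pruning; it is not the FLASH algorithm, and the function names and weights are hypothetical.

```python
from typing import Sequence, Set


def topk_mask(weights: Sequence[float], density: float) -> Set[int]:
    """Indices of the largest-magnitude weights kept at the given density."""
    k = max(1, round(density * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return set(order[:k])


def jaccard(mask_a: Set[int], mask_b: Set[int]) -> float:
    """Overlap of two masks: 1.0 means identical, 0.0 means disjoint."""
    return len(mask_a & mask_b) / len(mask_a | mask_b)


if __name__ == "__main__":
    # Hypothetical local weights of two clients for the same four parameters.
    client_1 = [0.9, -0.1, 0.5, 0.05]
    client_2 = [0.2, -0.8, 0.6, 0.01]
    m1 = topk_mask(client_1, density=0.5)
    m2 = topk_mask(client_2, density=0.5)
    print(m1, m2, jaccard(m1, m2))
```

A low overlap means the clients are effectively training different sub-networks, which is the situation the abstract associates with slower convergence of the global model.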
Cost-Effective Clock and Power Gating Design Methodology
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2020. 2. Taewhan Kim.
Low-power design is of great importance in modern systems-on-chip (SoCs). This dissertation studies low-power design methodologies for saving dynamic and static power consumption. Specifically, we present two novel techniques for cost-effective low-power design.
Firstly, we propose a novel clock gating method for reducing the dynamic power consumption. Clock gating based on the toggling of flip-flop input data is one of the most commonly used clock gating methods, but it has one critical and inherent limitation: the gating logic grows sharply as more flip-flops are involved in gating. In this dissertation, we propose a new clock gating method to overcome this limitation. Specifically, (1) we analyze the resources of the gating logic in input data toggling based clock gating, observe an ineffectiveness in resource utilization, and propose a new clock gating technique called flip-flop state driven clock gating, which completely eliminates the XOR gates that are essential but expensive for detecting input toggling of flip-flops; (2) we provide the supporting logic circuitry of our proposed XOR-free clock gating and confirm its safe applicability through a comprehensive timing analysis; (3) we propose, based on the flip-flops' state profile, a clock gating methodology that seamlessly combines our flip-flop state based clock gating with toggling based clock gating. Through experiments with benchmark circuits, it is confirmed that our clock gating method is very effective in reducing power while meeting all timing constraints, capturing power saving opportunities that toggling based clock gating would miss.
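The trade-off behind the limitation of toggling based gating can be sketched numerically: each added flip-flop brings its own XOR detector, while the shared enable fires more often. The sketch below assumes one XOR per flip-flop, a tree of 2-input OR gates, one integrated clock gating (ICG) cell per group, and statistically independent toggling; all of these, and the function names, are illustrative assumptions rather than the dissertation's model.

```python
import math
from typing import Dict, Sequence


def toggling_gating_logic(n_ffs: int) -> Dict[str, int]:
    """Assumed gate count of toggling based gating logic for one group."""
    return {"xor": n_ffs, "or2": max(n_ffs - 1, 0), "icg": 1}


def group_enable_probability(toggle_probs: Sequence[float]) -> float:
    """P(at least one member toggles), assuming independent flip-flops."""
    return 1.0 - math.prod(1.0 - p for p in toggle_probs)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        cells = toggling_gating_logic(n)
        p_en = group_enable_probability([0.1] * n)  # hypothetical 10% toggle rate
        print(n, cells, round(p_en, 3))
```

Under these assumptions, growing a group adds detection logic while shrinking the fraction of cycles that can actually be gated, which is the kind of inefficiency that motivates removing the XOR detectors.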
Secondly, for reducing the static power consumption, we solve two critical limitations of the conventional approaches to the allocation of state retention storage for power gated circuits: (1) the long wakeup delay caused by the indiscriminate use of multi-bit retention flip-flops (MBRFFs) and (2) the inability to optimize retention flip-flops for flip-flops with a mux-feedback loop. Conventional approaches have regarded the long wakeup delay as an inevitable consequence of maximizing the reduction of total storage size for state retention, and they have treated flip-flops with a mux-feedback loop (called self-loop flip-flops) as non-optimizable components; in practice, however, the self-loop flip-flops synthesized from hardware description language (HDL) code are far from few and thus cannot be neglected. More precisely, to solve (1), we show that using MBRFFs with up to two bits, which constrains the wakeup delay to no more than two clock cycles, is enough to maintain a high reduction of total retention storage. To solve (2), we devise a 2-phase retention control mechanism for a pair of flip-flops, one of which has a self-loop, by which just a single retention bit can be used to restore the state of the two flip-flops, and we propose an independent set based algorithm for maximally extracting the non-conflicting pairs from circuits. Through experiments with benchmark circuits, it is shown that our proposed method is very effective in reducing the state retention storage and the power consumption compared with the existing best MBRFF allocation, while the wakeup delay is strictly limited to two clock cycles.
1 INTRODUCTION
1.1 Clock Gating
1.2 Power Gating and State Retention
1.3 Multi-bit Retention Registers
1.4 Contributions of This Dissertation
2 FLIP-FLOP STATE DRIVEN CLOCK GATING: CONCEPT, DESIGN, AND METHODOLOGY
2.1 Motivations
2.1.1 Toggling based Clock Gating
2.1.2 Area and Power by Clock Gating
2.2 The Proposed Clock Gating
2.2.1 Concept of Flip-flop State Driven Clock Gating
2.2.2 Design of Gating Logic Circuitry
2.2.3 Integrated Clock Gating Methodology
2.2.4 Cost Formulation
2.3 Experiments
2.3.1 Experimental Setup
2.3.2 Experimental Results
3 ALGORITHM AND DESIGN OPTIMIZATION OF ALLOCATING MULTI-BIT RETENTION FLIP-FLOPS FOR POWER GATED CIRCUITS
3.1 Motivations
3.1.1 Flip-flops with Mux-feedback Loop
3.1.2 Impact of Wakeup Delay
3.2 The Proposed Allocation Algorithm
3.3 Design of Multi-Bit Retention Flip-Flop and Multi-Bit Extension
3.3.1 Multi-Bit Retention Flip-Flop
3.3.2 Multi-Bit Flip-Flop Extension
3.4 Experiments
3.4.1 Experimental Setup
3.4.2 Experimental Results
4 CONCLUSIONS
4.1 Flip-flop State Driven Clock Gating: Concept, Design, and Methodology
4.2 Algorithm and Design Optimization of Allocating Multi-bit Retention Flip-flops for Power Gated Circuits
Abstract (In Korean)