
1,026 research outputs found

Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

Author: Gujarathi Hemal S
McDonald-Maier Klaus D
Qadri Muhammad Yasir
Publication venue: 'Academy Publisher'
Publication date: 01/01/2009

Technological evolution has significantly increased the number of transistors for a given die area and raised switching speeds from a few MHz to the GHz range. This combination of shrinking feature size and rising performance demands lower supply voltages and effective management of power dissipation in chips with millions of transistors. It has triggered a substantial amount of research into power reduction techniques for almost every aspect of the chip, particularly the processor cores it contains. This paper presents an overview of techniques for achieving power efficiency mainly at the processor core level, but it also visits related domains such as buses and memories. Various processor parameters and features, such as supply voltage, clock frequency, cache, and pipelining, can be optimized to reduce the power consumption of the processor, and this paper discusses various ways of doing so. Emerging power-efficient processor architectures are also reviewed, and research activities are discussed that should help the reader identify how these factors contribute to power consumption. Some of these concepts are already established, whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

University of Essex Research Repository

CiteSeerX

Crossref

Design and Algorithm for Simultaneous Optimization of Clock Gating and Flip-Flops (클럭 게이팅 및 플립 플롭 동시 최적화를 위한 설계 및 알고리즘)

Author: 양기용
Publication venue: Seoul National University Graduate School (서울대학교 대학원)
Publication date: 01/02/2019

Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2019. 김태환.

In this paper, we introduce dynamic power optimization techniques applicable to various design stages, from the standard cell to the placement stage. This work first investigates how data-driven (i.e., toggling-based) clock gating can be closely integrated with the synthesis of flip-flops, a problem that has never been addressed in prior clock gating work. Our key observation is that some internal parts of a flip-flop cell can be reused to generate its clock gating enable signal. Based on this, we propose a newly optimized flip-flop wiring structure, called eXOR-FF, in which internal logic is reused in every clock cycle to decide whether the flip-flop is to be activated or deactivated through clock gating, thereby saving area (and thus leakage as well as dynamic power) on every pair of a flip-flop and its toggling detection logic. We then propose a comprehensive methodology for placement/timing-aware clock gating exploration that provides two unique strengths: it is best suited to maximally exploiting the benefit of eXOR-FFs, and it performs precise analyses of the decomposition of power consumption and timing impact and translates them into cost functions in the core engine of clock gating exploration. Experiments with benchmark circuits from ISCAS89, ITC89, ITC99, and IWLS 2005 show that the proposed method reduces total power by 5.6% and total cell area by 5.3% compared with the previous data-driven clock gating method in [1].

Contents: Abstract; Contents; List of Tables; List of Figures; 1 Introduction; 1.1 Power Consumption in CMOS Digital Design; 1.2 Low Power Design Methodologies; 1.3 Contribution of This Thesis; 2 Preliminary and Motivations; 2.1 Background; 2.2 Observation on Area and Power Saving; 2.3 Observation on Timing Impact; 3 Redesign of Flip-flops Specialized for Clock Gating; 3.1 Observation on Area Impact; 4 Placement-aware Clock Gating Methodology Utilizing eXOR-FF Cells; 4.1 Overall Design Flow; 4.2 Cost Formulation for Conventional Clock Gating; 4.3 Cost Formulation for Our Clock Gating using eXOR-FFs; 5 Experiments; 5.1 Experimental Setup; 5.2 Experimental Results; 5.3 Comparing with Industry Algorithm; 6 Conclusion; Abstract (In Korean)
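The data-driven (toggling-based) clock gating mentioned in the abstract above can be illustrated with a small behavioral model. This is only a sketch of the general idea, not the eXOR-FF cell or the thesis's exploration algorithm: the function `simulate`, the sample `stream`, and the single-bit flip-flop model are all assumptions made for illustration. The enable signal is the XOR of the incoming data bit and the stored bit, so the flip-flop receives a clock pulse only in cycles where its value would change.

```python
def simulate(data_in, gated):
    """Behavioral model of one D flip-flop (hypothetical, for illustration).

    Returns (trace of stored values, number of clock pulses delivered).
    With gated=True the clock reaches the flip-flop only when the XOR of
    the input bit and the stored bit is 1, i.e. when the value would toggle.
    """
    q = 0          # stored bit, assumed to reset to 0
    pulses = 0     # clock pulses that actually reach the flip-flop
    trace = []
    for d in data_in:
        enable = (d ^ q) if gated else 1  # XOR-based toggle detection
        if enable:
            pulses += 1
            q = d                         # flip-flop captures the input
        trace.append(q)
    return trace, pulses


stream = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1]
ungated_trace, ungated_pulses = simulate(stream, gated=False)
gated_trace, gated_pulses = simulate(stream, gated=True)

print("ungated pulses:", ungated_pulses)
print("gated pulses:  ", gated_pulses)
```

Both runs store the same sequence of values; the gated run simply delivers fewer clock pulses, which is where the dynamic power saving comes from. The model ignores the area and timing cost of the XOR and gating cells, which is exactly the overhead the thesis sets out to reduce.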

SNU Open Repository and Archive

ASIC implemented MicroBlaze-based Coprocessor for Data Stream Management Systems

Author: Balasubramanian Linknath Surya
Publication venue
Publication date: 01/01/2020

Indiana University-Purdue University Indianapolis (IUPUI)

The drastic increase in Internet usage demands real-time data processing with higher efficiency than ever before. The Symbiote Coprocessor Unit (SCU), developed by Dr. Pranav Vaidya, is a hardware accelerator that has the potential to provide a data processing speedup of up to 150x compared with traditional data stream processors. However, the SCU implementation is very complex, fixed, and uses an outdated host interface, which limits future improvement. Mr. Tareq S. Alqaisi, an MSECE graduate from IUPUI, worked on curbing these limitations. In his architecture, he used a Xilinx MicroBlaze microcontroller to reduce the complexity of the SCU, along with a few other modifications. The objective of this study is to make the SCU suitable for mass production while reducing its power consumption and delay. To accomplish this, the execution unit of the SCU has been implemented as an application-specific integrated circuit, and modules such as ACG/OCG, a sequential comparator, and a D-word multiplier/divider are integrated into the design. Furthermore, techniques such as operand isolation, buffer insertion, cell swapping, and cell resizing are also integrated into the system. As a result, the new design attains 67.9435 µW of dynamic power, compared with 74.0012 µW before power optimization, along with a small increase in static power, and a clock period of 39.47 ns, as opposed to 52.26 ns before timing optimization.
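To put the reported figures in relative terms, the short calculation below converts them into percentage reductions and clock frequencies. Only the four input numbers come from the abstract; the helper name `relative_reduction` and the derived quantities are illustrative, and the static power change is left out because the abstract gives no value for it.

```python
def relative_reduction(before, after):
    """Fractional reduction from `before` to `after` (0.10 means 10% lower)."""
    return (before - after) / before


# Figures quoted in the abstract above.
dynamic_power_before_uw = 74.0012
dynamic_power_after_uw = 67.9435
clock_period_before_ns = 52.26
clock_period_after_ns = 39.47

dynamic_power_saving = relative_reduction(dynamic_power_before_uw, dynamic_power_after_uw)
clock_period_saving = relative_reduction(clock_period_before_ns, clock_period_after_ns)

# A period in nanoseconds converts to a frequency in MHz as 1000 / period.
f_before_mhz = 1e3 / clock_period_before_ns
f_after_mhz = 1e3 / clock_period_after_ns

print(f"dynamic power reduction: {dynamic_power_saving:.1%}")
print(f"clock period reduction:  {clock_period_saving:.1%}")
print(f"clock frequency: {f_before_mhz:.2f} MHz -> {f_after_mhz:.2f} MHz")
```

Under these figures the dynamic power drops by roughly 8% and the clock period by roughly 24%, which corresponds to a maximum clock frequency rising from about 19 MHz to about 25 MHz.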

IUPUIScholarWorks

Purdue E-Pubs

Performance Comparison of Static CMOS and Domino Logic Style in VLSI Design: A Review

Author: Dr. S. M. Ramesh, Dr. M. Jagdissh Chandra Prasad, Dr. T. V. P. Sundararajan, Dr. P. Senthilkumar
Publication venue: Auricle Global Society of Education and Research
Publication date: 31/08/2019

Of late, there has been a steep rise in the usage of handheld gadgets and high-speed applications. VLSI designers often choose the static CMOS logic style for low-power applications. This logic style provides low power dissipation and is free from signal noise and integrity issues. However, designs based on this logic style are often slow and cannot be used in high-performance circuits. On the other hand, designs based on the Domino logic style yield high performance and occupy less area, yet they dissipate more power than their static CMOS counterparts. In practice, designers judiciously mix more than one logic style during circuit synthesis to obtain the advantages of each. A carefully designed mixed static-Domino CMOS circuit can tap the advantages of both the static and Domino logic styles while overcoming their individual shortcomings.

International Journal on Future Revolution in Computer Science & Communication Engineering

The impact of design techniques in the reduction of power consumption of SoCs Multimedia

Author: Yang Yun Ju, 1980-
Publication venue: [s.n.]
Publication date: 19/08/2018

Advisor: Guido Costa Souza de Araújo
Dissertation (Master's) - Universidade Estadual de Campinas, Instituto de Computação

Abstract: The semiconductor industry has always faced strong demands to solve the problem of heat dissipation and reduce power consumption in electronic devices. This trend has intensified in recent years with the environmental sustainability movement. Correctly designing a low-power electronic system is a problem with multiple levels of complexity and requires systematic approaches in its construction. Moreover, the adoption of any power reduction technique is always tied to specific goals and has some impact on the project. Although designers know these impacts well in qualitative terms, the quantitative details are still unknown or kept within companies' know-how. In this work, based on experimental results from an industrial SoC platform, we try to quantify the impacts of using low-power techniques. We relate the power reduction factor of each technique to its impact in terms of area, performance, and implementation and verification effort. In the absence of such data, which relate engineering effort to power consumption goals, uncertainties and delays in the project schedule are frequent. We hope that such guidelines can help project architects select the appropriate techniques to reduce power consumption within the limits of the project budget and schedule.

Master's degree (Mestrado) in Computer Science.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio da Producao Cientifica e Intelectual da Unicamp

Cross-Layer Automated Hardware Design for Accuracy-Configurable Approximate Computing

Author: Alan Tanfer
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 11/10/2021

Approximate Computing trades off computation accuracy against performance or energy efficiency. It is a design paradigm that arose in the last decade as an answer to diminishing returns from Dennard's scaling and a shift in the prominent workloads. A range of modern workloads, categorized mainly as recognition, mining, and synthesis, features an inherent tolerance to approximations. Their characteristics, such as redundancies in their input data and robust-to-noise algorithms, allow them to produce outputs of acceptable quality, despite an approximation in some of their computations. Approximate Computing leverages the application tolerance by relaxing the exactness in computation towards primary design goals of increasing performance or improving energy efficiency. Existing techniques span across the abstraction layers of computer systems where cross-layer techniques are shown to offer a larger design space and yield higher savings. Currently, the majority of the existing work aims at meeting a single accuracy. The extent of approximation tolerance, however, significantly varies with a change in input characteristics and applications. In this dissertation, methods and implementations are presented for cross-layer and automated design of accuracy-configurable Approximate Computing to maximally exploit the performance and energy benefits. In particular, this dissertation addresses the following challenges and introduces novel contributions: A main Approximate Computing category in hardware is to scale either voltage or frequency beyond the safe limits for power or performance benefits, respectively. The rationale is that timing errors would be gradual and for an initial range tolerable. This scaling enables a fine-grain accuracy-configurability by varying the timing error occurrence. However, conventional synthesis tools aim at meeting a single delay for all paths within the circuit. 
Subsequently, with voltage or frequency scaling, either all paths succeed, or a large number of paths fail simultaneously, with a steep increase in error rate and magnitude. This dissertation presents an automated method for minimizing path delays by individually constraining the primary outputs of combinational circuits. As a result, it reduces the number of failing paths and makes the timing errors significantly more gradual, and also rarer and smaller on average. Additionally, it reveals that delays can be significantly reduced towards the least significant bit (LSB) and allows operating at a higher frequency when small operands are computed. Precision scaling, i.e., reducing the representation of data and its accuracy is widely used in multiple abstraction layers in Approximate Computing. Reducing data precision also reduces the transistor toggles, and therefore the dynamic power consumption. Application and architecture level precision scaling results in using only LSBs of the circuit. Arithmetic circuits often have less complexity and logic depth in LSBs compared to most significant bits (MSB). To take advantage of this circuit property, a delay-altering synthesis methodology is proposed. The method finds energy-optimal delay values under configurable precision usage and assigns them to primary outputs used for different precisions. Thereby, it enables dynamic frequency-precision scalable circuits for energy efficiency. Within the hardware architecture, it is possible to instantiate multiple units with the same functionality with different fixed approximation levels, where each block benefits from having fewer transistors and also synthesis relaxations. These blocks can be selected dynamically and thus allow to configure the accuracy during runtime. 
Instantiating such approximate blocks can be a lower dynamic power but higher area and leakage cost alternative to the current state-of-the-art gating mechanisms which switch off a group of paths in the circuit to reduce the toggling activity. Jointly, instantiating multiple blocks and gating mechanisms produce a large design space of accuracy-configurable hardware, where energy-optimal solutions require a cross-layer search in architecture and circuit levels. To that end, an approximate hardware synthesis methodology is proposed with joint optimizations in architecture and circuit for dynamic accuracy scaling, and thereby it enables energy vs. area trade-offs
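The claim in the abstract above that reducing data precision also reduces transistor toggles can be made concrete with a toy model. The sketch below is not the dissertation's synthesis methodology: it simply counts bit flips on a register that holds only the `width` least significant bits of each operand, using the flip count as a crude stand-in for dynamic switching activity. The function name `count_toggles` and the operand values are hypothetical.

```python
def count_toggles(values, width):
    """Count bit flips on a `width`-bit register fed with `values`.

    Only the `width` least significant bits of each value are kept, which
    mimics precision scaling that uses only the LSBs of a datapath. The
    register is assumed to start at 0.
    """
    mask = (1 << width) - 1
    toggles = 0
    prev = 0
    for v in values:
        cur = v & mask
        toggles += bin(prev ^ cur).count("1")  # Hamming distance = flipped bits
        prev = cur
    return toggles


# Hypothetical 10-bit operand stream.
operands = [0x3A7, 0x1C2, 0x2F9, 0x055, 0x3FF, 0x000]

full_precision_toggles = count_toggles(operands, width=10)
reduced_precision_toggles = count_toggles(operands, width=4)

print("10-bit toggles:", full_precision_toggles)
print(" 4-bit toggles:", reduced_precision_toggles)
```

Because the bits kept at the lower precision are a subset of those kept at full precision, the reduced-precision count can never exceed the full-precision one. Real dynamic power also depends on capacitance, glitching, and the logic downstream of the register, none of which this model captures.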

KITopen

Cross-Layer Optimization for Power-Efficient and Robust Digital Circuits and Systems

Author: Huang Yanxiang
Publication venue
Publication date: 15/09/2017

With the increasing demand for digital services, performance and power efficiency have become vital requirements for digital circuits and systems. However, the enabling CMOS technology scaling has been facing significant challenges from device uncertainties, such as process, voltage, and temperature variations. To ensure system reliability, worst-case corner assumptions are usually made at each design level. However, the over-pessimistic worst-case margin leads to unnecessary power waste and a performance loss as high as 2.2x. Since optimizations are traditionally confined to each specific level, those safety margins can hardly be properly exploited. To tackle this challenge, this Ph.D. thesis advises performing cross-layer optimization for digital signal processing circuits and systems to achieve a global balance of power consumption and output quality. To conclude, the traditional over-pessimistic worst-case approach leads to huge power waste. In contrast, the adaptive voltage scaling approach saves power (25% for the CORDIC application) by providing a just-needed supply voltage. The power saving is maximized (46% for CORDIC) when a more aggressive voltage over-scaling scheme is applied. The sparsely occurring circuit errors produced by aggressive voltage over-scaling are mitigated by higher-level error-resilient designs. For functions like FFT and CORDIC, smart error mitigation schemes were proposed to enhance reliability (against soft errors and timing errors, respectively). Applications like Massive MIMO systems are robust against lower-level errors, thanks to their intrinsically redundant antennas. This property makes it possible to embrace digital hardware that trades quality for power savings. Comment: 190 pages

arXiv.org e-Print Archive

Lirias


Using Functional Independence Conditions to Optimize the Performance of Latency-Insensitive Systems

Author: Carloni Luca
Li Cheng-Hong
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2007

In latency-insensitive design, shell modules are used to encapsulate system components (pearls) in order to interface them with the given latency-insensitive protocol and dynamically control their operations. In particular, a shell stalls a pearl whenever new valid data are not available on its input channels. We study how functional independence conditions (FICs) can be applied to the performance optimization of a latency-insensitive system by avoiding unnecessary stalling of its pearls. We present a novel circuit design of a generic shell template that can exploit FICs. We also provide an automatic procedure for the logic synthesis of a shell instance that is based only on the particular local characteristics of its corresponding pearl and does not require any input from the designers. We conclude by reporting on a set of experimental results that illustrate the benefits and overhead of the proposed technique.
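The difference between a conventional shell and one that exploits a functional independence condition can be sketched with a two-input multiplexer as the pearl. This is a simplified illustration under stated assumptions, not the shell template from the paper: `None` stands for a missing (void) token, the function names are hypothetical, and the model ignores queueing, back-pressure, and the later discarding of unused tokens that a real latency-insensitive protocol must handle.

```python
def mux_pearl(sel, a, b):
    """The encapsulated component: output depends only on the selected input."""
    return a if sel == 0 else b


def strict_shell_can_fire(sel, a, b):
    """Conventional shell: stall unless every input channel holds valid data."""
    return sel is not None and a is not None and b is not None


def fic_shell_can_fire(sel, a, b):
    """FIC-aware shell: once `sel` is known, the output is functionally
    independent of the unselected input, so only the selected one is needed."""
    if sel is None:
        return False
    selected = a if sel == 0 else b
    return selected is not None


def count_firings(can_fire, cycles):
    """Count the cycles in which the shell lets the pearl compute."""
    fired = 0
    for sel, a, b in cycles:
        if can_fire(sel, a, b):
            mux_pearl(sel, a, b)
            fired += 1
    return fired


# Hypothetical per-cycle channel contents: (sel, a, b); None = no valid data.
cycles = [(0, 5, None), (1, None, 7), (0, 1, 2), (None, 3, 4), (1, 9, None)]

strict_firings = count_firings(strict_shell_can_fire, cycles)
fic_firings = count_firings(fic_shell_can_fire, cycles)
print("strict shell firings:", strict_firings)
print("FIC shell firings:   ", fic_firings)
```

In this trace the FIC-aware shell fires in every cycle where the selected input is present, while the strict shell fires only when all three channels are valid, so the pearl is stalled less often.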

Columbia University Academic Commons