397 research outputs found

    Design for testability of a latch-based design

    Get PDF
    Abstract. The purpose of this thesis was to decrease the area of digital logic in a power management integrated circuit (PMIC), by replacing selected flip-flops with latches. The thesis consists of a theory part, that provides background theory for the thesis, and a practical part, that presents a latch register design and design for testability (DFT) method for achieving an acceptable level of manufacturing fault coverage for it. The total area was decreased by replacing flip-flops of read-write and one-time programmable registers with latches. One set of negative level active primary latches were shared with all the positive level active latch registers in the same register bank. Clock gating was used to select which latch register the write data was loaded to from the primary latches. The latches were made transparent during the shift operation of partial scan testing. The observability of the latch register clock gating logic was improved by leaving the first bit of each latch register as a flip-flop. The controllability was improved by inserting control points. The latch register design, developed in this thesis, resulted in a total area decrease of 5% and a register bank area decrease of 15% compared to a flip-flop-based reference design. The latch register design manages to maintain the same stuck-at fault coverage as the reference design.Salpaperäisen piirin testattavuuden suunnittelu. Tiivistelmä. Tämän opinnäytetyön tarkoituksena oli pienentää digitaalisen logiikan pinta-alaa integroidussa tehonhallintapiirissä, korvaamalla valitut kiikut salpapiireillä. Opinnäytetyö koostuu teoriaosasta, joka antaa taustatietoa opinnäytetyölle, ja käytännön osuudesta, jossa esitellään salparekisteripiiri ja testattavuussuunnittelun menetelmä, jolla saavutettiin riittävän hyvä virhekattavuus salparekisteripiirille. Kokonaispinta-alaa pienennettiin korvaamalla luku-kirjoitusrekistereiden ja kerran ohjelmoitavien rekistereiden kiikut salpapiireillä. Yhdet negatiivisella tasolla aktiiviset isäntä-salpapiirit jaettiin kaikkien samassa rekisteripankissa olevien positiivisella tasolla aktiivisten salparekistereiden kanssa. Kellon portittamisella valittiin mihin salparekisteriin kirjoitusdata ladattiin yhteisistä isäntä-salpapireistä. Osittaisessa testipolkuihin perustuvassa testauksessa salpapiirit tehtiin läpinäkyviksi siirtooperaation aikana. Salparekisterin kellon portituslogiikan havaittavuutta parannettiin jättämällä jokaisen salparekisterin ensimmäinen bitti kiikuksi. Ohjattavuutta parannettiin lisäämällä ohjauspisteitä. Salparekisteripiiri, joka suunniteltiin tässä diplomityössä, pienensi kokonaispinta-alaa 5 % ja rekisteripankin pinta-alaa 15 % verrattuna kiikkuperäiseen vertailupiiriin. Salparekisteripiiri onnistuu pitämään saman juuttumisvikamallin virhekattavuuden kuin vertailupiiri

    Review Paper on Power Efficient Hybrid D- Flip Flop

    Get PDF
    ABSTRACT: In nanometer CMOS technologies leakage powerhas become a serious concern and is a very important issue in hardware and software VLSI design. The leakage powerincreases as technology is scaled down. Low power flip-flops play a vital role for the design of low-power digital systems. In thispaper several different flip flop topologies are analyzed andpower efficient flip flop method is proposed. This paper presentssurvey on low power hybrid dual dynamic flip flop (DDFF) andembedded logic module (DDFF-ELM) based on DDFF. Thissurvey concludes that ultra low leakage CMOS structure calledas sleepy stack inverter pair method which is efficient in leakagepower reduction for design of low power hybrid flip flop thusproviding circuit designer with new choice to handle leakagepower problem

    Effect of clock gating in conditional pulse enhancement flip-flop for low power applications

    Get PDF
    Flip-Flops (FFs) play a fundamental role in digital designs. A clock system consumes above 25% of total system power. The use of pulse-triggered flip-flops (P-FFs) in digital design provides better performance than conventional flip-flop designs. This paper presents the design of a new power-efficient implicit pulse-triggered flip-flop suitable for low power applications. This flip-flop architecture is embedded with two key features. Firstly, the enhancement in width and height of triggering pulses during specific conditions gives a solution for the longest discharging path problem in existing P-FFs. Secondly, the clock gating concept reduces unwanted switching activities at sleep/idle mode of operation and thereby reducing dynamic power consumption. The post-layout simulation results in cadence software based on CMOS 90-nm technology shows that the proposed design features less power dissipation and better power delay performance (PDP) when compared with conventional P-FFs. Its maximum power saving against conventional designs is up to 30.65%

    비용 효율적인 클럭 및 파워 게이팅 설계 방법론

    Get PDF
    학위논문(박사)--서울대학교 대학원 :공과대학 전기·정보공학부,2020. 2. 김태환.저전력 설계는 최신 시스템-온-칩 (SoCs) 설계에서 매우 중요한 요소 중의 하나이다. 본 논문에서는 동적 및 정적 전력 소비를 감소시키기 위한 저전력 설계 방법론에 대해 논한다. 구체적으로 비용 효율적인 저전력 설계를 위하여 두 가지 새로운 기술을 제안한다. 우선 본 논문에서는 동적 전력 소비를 줄일 수 있는 새로운 클럭 게이팅 방법을 제안한다. 기존 플립-플랍 입력 데이터 토글 기반 클럭 게이팅은 가장 널리 사용되는 클럭 게이팅 기법 중의 하나이다. 하지만 이 방법은 더 많은 플립-플랍에 대해 적용할수록 클럭 게이팅에 필요한 부가 회로가 급격히 증가한다는 근본적인 한계를 지니고 있다. 이러한 한계를 극복하기 위하여 본 논문에서는 다음과 같이 새로운 클럭 게이팅 방법을 제안한다. 첫 번째로 기존 입력 데이터 토글 기반 클럭 게이팅 방법에 필요한 회로 자원을 분석하여 해당 방법의 비효율성을 보이고, 기존 방법에서 사용되는 입력 데이터 토글 검출에 필수적이지만 고비용의 XOR 게이트를 완벽히 제거한 플립-플랍 상태 기반 클럭 게이팅'이라는 새로운 클럭 게이팅 방법을 제안한다. 두 번째로 제안된 XOR 게이트가 필요 없는 클럭 게이팅 방법을 위한 부가 회로를 제시하며, 다양한 타이밍 분석을 통하여 해당 회로가 안정적으로 적용될 수 있음을 보인다. 세 번째로 회로의 플립-플랍 상태 프로파일에 기반하여, 제안된 클럭 게이팅 기법을 기존 클럭 게이팅 기법과 완벽하게 통합할 수 있는 클럭 게이팅 방법론을 제안한다. 여러 벤치마크 회로에 대한 실험 결과는 기존 입력 데이터 토글 기반 클럭 게이팅 방법이 전력 소비 절감 기회를 놓치는 반면 본 논문에서 제안된 방법은 모든 타이밍 제약 조건을 만족하면서 전력 소비 감소에 매우 효과적임을 보여준다. 다음으로 정적 전력 소비를 줄이기 위한 방안으로, 본 논문에서는 기존 파워 게이트 회로의 상태 보존용 저장 공간 할당 방법들이 지니고 있는 두 가지 중요한 한계들을 해결할 수 있는 방법을 제안한다. 중요한 한계들이란 첫 번째로 다중-비트 상태 보존 플립-플랍의 무분별한 사용으로 인한 긴 웨이크업 지연 시간이며, 두 번째로 멀티플렉서 되먹임 루프가 있는 상태 보존 플립-플랍의 최적화 불가능성이다. 기존 방법들에서는 상태 보존을 위한 저장 공간을 최소화하기 위해 긴 웨이크업 지연 시간이 필수적이었다. 그리고 되먹임 루프가 있는 플립-플랍은 최적화할 수 없는 대상으로 다루어졌다. 그러나 일반적으로 하드웨어 기술 언어(HDL)로부터 생성되는 되먹임 루프를 지닌 플립-플랍은 무시할 수 있을 정도로 적은 양이 아니다. 첫 번째 한계를 해결하기 위한 방법으로 본 논문에서는 최대 2 비트의 다중-비트 상태 보존 플립-플랍을 사용하여 웨이크업 지연 시간을 두 클럭 사이클로 제한하면서도 상태 보존을 위한 저장 공간을 효율적으로 절약할 수 있음을 보인다. 그리고 두 번째 한계를 극복하기 위해서 되먹임 루프를 지닌 플립-플랍이 포함된 두 플립-플랍 쌍의 상태를 복원할 수 있는 2단 상태 보존 제어 방안을 제안한다. 또한 주어진 회로에서 충돌없이 동시에 존재할 수 있는 플립-플랍 쌍을 최대로 추출하기 위해 독립 집합 문제(independent set problem)기반의 연산법도 제안한다. 벤치마크 회로에 대한 실험 결과는 본 논문에서 제안된 방법이 웨이크업 지연 시간을 두 클럭 사이클로 제한하면서도 상태 보존에 필요한 저장 공간과 파워를 감소시키는데 매우 효과적임을 보여준다.Low power design is of great importance in modern system-on-chips (SoCs). This dissertation studies on low power design methodologies for saving dynamic and static power consumption. Precisely, we unveil two novel techniques of cost effective low power design. Firstly, we propose a novel clock gating method for reducing the dynamic power consumption. Flip-flop's input data toggling based clock gating is one of the most commonly used clock gating methods, in which one critical and inherent limitation is the sharp increase of gating logic as more flip-flops are involved in gating. In this dissertation, we propose a new clock gating method to overcome this limitation. Specifically, (1) we analyze the resources of gating logic in the input data toggling based clock gating, from which an ineffectiveness in resource utilization is observed and we propose a new clock gating technique called flip-flop state driven clock gating which completely eliminates the essential and expensive component of XOR gates for detecting input toggling of flip-flops; (2) we provide the supporting logic circuitry of our proposed XOR-free clock gating, confirming its safe applicability through a comprehensive timing analysis; (3) we propose, based on the flip-flops' state profile, a clock gating methodology that seamlessly combines our flip-flop state based clock gating with the toggling based clock gating. Through experiments with benchmark circuits, it is confirmed that our clock gating method is very effective in reducing power, which otherwise the toggling based clock gating shall miss the power saving opportunity, while meeting all timing constraints. Secondly, for reducing the static power consumption, we solve two critical limitations of the conventional approaches to the allocation of state retention storage for power gated circuits. Those are (1) the long wakeup delay caused by the senseless use of multi-bit retention flip-flops (MBRFFs) and (2) the inability to optimize retention flip-flops for the flip-flops with mux-feedback loop. It should be noted that the conventional approaches have regarded the long wakeup delay as an inevitable consequence of maximizing the reduction of total storage size for state retention while they have treated the flip-flops with mux-feedback loop (called self-loop flip-flop) as nonoptimizable component, but practically, the self-loop flip-flops synthesized from hardware description language (HDL) code are not far from a small amount and thus, can in no way be negligible. More precisely, for solving (1), we show that the use of MBRFFs with up to two bits, consequently, constraining the wakeup delay to no more than two clock cycles, is enough to maintain the high reduction of total retention storage and for solving (2), we devise a 2-phase retention control mechanism for a pair of flip-flops, one of which has self-loop, by which just a single retention bit can be used to restore state of the two flip-flops, and propose an independent set based algorithm for maximally extracting the non-conflict pairs from circuits. Through experiments with benchmark circuits, it is shown that our proposed method is very effective against reducing the state retention storage and the power consumption compared with the existing best MBRFF allocation while the wakeup delay is strictly limited to two clock cycles.1 INTRODUCTION 1 1.1 Clock Gating 1 1.2 Power Gating and State Retention 3 1.3 Multi-bit Retention Registers 4 1.4 Contributions of This Dissertation 6 2 FLIP-FLOP STATE DRIVEN CLOCK GATING: CONCEPT, DESIGN, AND METHODOLOGY 9 2.1 Motivations 9 2.1.1 Toggling based Clock Gating 9 2.1.2 Area and Power by Clock Gating 10 2.2 The Proposed Clock Gating 13 2.2.1 Concept of Flip-flop State Driven Clock Gating 13 2.2.2 Design of Gating Logic Circuitry 17 2.2.3 Integrated Clock Gating Methodology 22 2.2.4 Cost Formulation 23 2.3 Experiments 25 2.3.1 Experimental Setup 25 2.3.2 Experimental Results 26 3 ALGORITHM AND DESIGN OPTIMIZATION OF ALLOCATING MULTI-BIT RETENTION FLIP-FLOPS FOR POWER GATED CIRCUITS 32 3.1 Motivations 32 3.1.1 Flip-flops with Mux-feedback Loop 32 3.1.2 Impact of Wakeup Delay 37 3.2 The Proposed Allocation Algorithm 39 3.3 Design of Multi-Bit Retention Flip-Flop and Multi-Bit Extension 48 3.3.1 Multi-Bit Retention Flip-Flop 48 3.3.2 Multi-Bit Flip-Flop Extension 52 3.4 Experiments 54 3.4.1 Experimental Setup 54 3.4.2 Experimental Results 57 4 CONCLUSIONS 65 4.1 Flip-flop State Driven Clock Gating: Concept, Design, and Methodology 65 4.2 Algorithm and Design Optimization of Allocating Multi-bit Retention Flip-flops for Power Gated Circuits 66 Abstract (In Korean) 71Docto

    Development of an image converter of radical design

    Get PDF
    A long term investigation of thin film sensors, monolithic photo-field effect transistors, and epitaxially diffused phototransistors and photodiodes to meet requirements to produce acceptable all solid state, electronically scanned imaging system, led to the production of an advanced engineering model camera which employs a 200,000 element phototransistor array (organized in a matrix of 400 rows by 500 columns) to secure resolution comparable to commercial television. The full investigation is described for the period July 1962 through July 1972, and covers the following broad topics in detail: (1) sensor monoliths; (2) fabrication technology; (3) functional theory; (4) system methodology; and (5) deployment profile. A summary of the work and conclusions are given, along with extensive schematic diagrams of the final solid state imaging system product

    Power Reductions with Energy Recovery Using Resonant Topologies

    Get PDF
    The problem of power densities in system-on-chips (SoCs) and processors has become more exacerbated recently, resulting in high cooling costs and reliability issues. One of the largest components of power consumption is the low skew clock distribution network (CDN), driving large load capacitance. This can consume as much as 70% of the total dynamic power that is lost as heat, needing elaborate sensing and cooling mechanisms. To mitigate this, resonant clocking has been utilized in several applications over the past decade. An improved energy recovering reconfigurable generalized series resonance (GSR) solution with all the critical support circuitry is developed in this work. This LC resonant clock driver is shown to save about 50% driver power (\u3e40% overall), on a 22nm process node and has 50% less skew than a non-resonant driver at 2GHz. It can operate down to 0.2GHz to support other energy savings techniques like dynamic voltage and frequency scaling (DVFS). As an example, GSR can be configured for the simpler pulse series resonance (PSR) operation to enable further power saving for double data rate (DDR) applications, by using de-skewing latches instead of flip-flop banks. A PSR based subsystem for 40% savings in clocking power with 40% driver active area reduction xii is demonstrated. This new resonant driver generates tracking pulses at each transition of clock for dual edge operation across DVFS. PSR clocking is designed to drive explicit-pulsed latches with negative setup time. Simulations using 45nm IBM/PTM device and interconnect technology models, clocking 1024 flip-flops show the reductions, compared to non-resonant clocking. DVFS range from 2GHz/1.3V to 200MHz/0.5V is obtained. The PSR frequency is set \u3e3× the clock rate, needing only 1/10th the inductance of prior-art LC resonance schemes. The skew reductions are achieved without needing to increase the interconnect widths owing to negative set-up times. Applications in data circuits are shown as well with a 90nm example. Parallel resonant and split-driver non-resonant configurations as well are derived from GSR. Tradeoffs in timing performance versus power, based on theoretical analysis, are compared for the first time and verified. This enables synthesis of an optimal topology for a given application from the GSR

    Clock Tree Power Optimization of Three Dimensional VLSI System with Network

    Get PDF
    Abstract:The proposed method is based on minimum-cost maximum-flow formulation to globally determine the tree topology, which maintains load balance and considers the wirelength between pulse generators and pulsed latches. Experimental results indicate that the proposed migration approach can improve the power consumption by 12% and 13% with 7% and 70% skew improvements on average compared with the most recent paper on the industrial circuits and ISPD-2010 benchmarks, respectively. Minimizing the size of a clock tree is known as an effective approach to reduce power dissipation in modern circuit designs. However, most existing power-aware clock-tree minimization algorithms optimize power on the basis of flip-flops alone, which may result in limited power savings. To achieve a power and timing tradeoff, this paper investigates the pulsed-latch utilization in a clock tree for further power savings. This is the first paper to propose a migration approach to efficiently construct a clock tree with both pulsed-latches and flip-flops

    Power efficient resilient microarchitectures for PVT variability mitigation

    Get PDF
    Nowadays, the high power density and the process, voltage, and temperature variations became the most critical issues that limit the performance of the digital integrated circuits because of the continuous scaling of the fabrication technology. Dynamic voltage and frequency scaling technique is used to reduce the power consumption while different time relaxation techniques and error recovery microarchitectures are used to tolerate the process, voltage, and temperature variations. These techniques reduce the throughput by scaling down the frequency or flushing and restarting the errant pipeline. This thesis presents a novel resilient microarchitecture which is called ERSUT-based resilient microarchitecture to tolerate the induced delays generated by the voltage scaling or the process, voltage, and temperature variations. The resilient microarchitecture detects and recovers the induced errors without flushing the pipeline and without scaling down the operating frequency. An ERSUT-based resilient 16 × 16 bit MAC unit, implemented using Global Foundries 65 nm technology and ARM standard cells library, is introduced as a case study with 18.26% area overhead and up to 1.5x speedup. At the typical conditions, the maximum frequency of the conventional MAC unit is about 375 MHz while the resilient MAC unit operates correctly at a frequency up to 565 MHz. In case of variations, the resilient MAC unit tolerates induced delays up to 50% of the clock period while keeping its throughput equal to the conventional MAC unit’s maximum throughput. At 375 MHz, the resilient MAC unit is able to scale down the supply voltage from 1.2 V to 1.0 V saving about 29% of the power consumed by the conventional MAC unit. A double-edge-triggered microarchitecture is also introduced to reduce the power consumption extremely by reducing the frequency of the clock tree to the half while preserving the same maximum throughput. This microarchitecture is applied to different ISCAS’89 benchmark circuits in addition to the 16x16 bit MAC unit and the average power reduction of all these circuits is 63.58% while the average area overhead is 31.02%. All these circuits are designed using Global Foundries 65nm technology and ARM standard cells library. Towards the full automation of the ERSUT-based resilient microarchitecture, an ERSUT-based algorithm is introduced in C++ to accelerate the design process of the ERSUT-based microarchitecture. The developed algorithm reduces the design-time efforts dramatically and allows the ERSUT-based microarchitecture to be adopted by larger industrial designs. Depending on the ERSUT-based algorithm, a validation study about applying the ERSUT-based microarchitecture on the MAC unit and different ISCAS’89 benchmark circuits with different complexity weights is introduced. This study shows that 72% of these circuits tolerates more than 14% of their clock periods and 54.5% of these circuits tolerates more than 20% while 27% of these circuits tolerates more than 30%. Consequently, the validation study proves that the ERSUT-based resilient microarchitecture is a valid applicable solution for different circuits with different complexity weights

    A Modified Implementation of Tristate Inverter Based Static Master-Slave Flip-Flop with Improved Power-Delay-Area Product

    Get PDF
    The paper introduces novel architectures for implementation of fully static master-slave flip-flops for low power, high performance, and high density. Based on the proposed structure, traditional C2MOS latch (tristate inverter/clocked inverter) based flip-flop is implemented with fewer transistors. The modified C2MOS based flip-flop designs mC2MOSff1 and mC2MOSff2 are realized using only sixteen transistors each while the number of clocked transistors is also reduced in case of mC2MOSff1. Postlayout simulations indicate that mC2MOSff1 flip-flop shows 12.4% improvement in PDAP (power-delay-area product) when compared with transmission gate flip-flop (TGFF) at 16X capacitive load which is considered to be the best design alternative among the conventional master-slave flip-flops. To validate the correct behaviour of the proposed design, an eight bit asynchronous counter is designed to layout level. LVS and parasitic extraction were carried out on Calibre, whereas layouts were implemented using IC station (Mentor Graphics). HSPICE simulations were used to characterize the transient response of the flip-flop designs in a 180 nm/1.8 V CMOS technology. Simulations were also performed at 130 nm, 90 nm, and 65 nm to reveal the scalability of both the designs at modern process nodes
    corecore