201 research outputs found

    Desynchronization: Synthesis of asynchronous circuits from synchronous specifications

    Get PDF
    Asynchronous implementation techniques, which measure logic delays at run time and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst-case delays at design time, and constrain the clock cycle accordingly. De-synchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity, without requiring special design skills or tools. In this paper, we first of all study different protocols for de-synchronization and formally prove their correctness, using techniques originally developed for distributed deployment of synchronous language specifications. We also provide a taxonomy of existing protocols for asynchronous latch controllers, covering in particular the four-phase handshake protocols devised in the literature for micro-pipelines. We then propose a new controller which exhibits provably maximal concurrency, and analyze the performance of desynchronized circuits with respect to the original synchronous optimized implementation. We finally prove the feasibility and effectiveness of our approach, by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architectur

    Scan Chain Grouping for Mitigating IR-Drop-Induced Test Data Corruption

    Get PDF
    Loading and unloading test patterns during scan testing causes many scan flip-flops to trigger simultaneously. This instantaneous switching activity during shift in turn may cause excessive IR-drop that can disrupt the states of some scan flip-flops and corrupt test stimuli or responses. A common design technique to even out these instantaneous power surges is to design multiple scan chains and shift only a group of the scan chains at a same time. This paper introduces a novel algorithm to optimally group scan chains so as to minimize the probability of test data corruption caused by excessive instantaneous IR-drop on scan flip-flops. The experiments show optimal results on all large ITC\u2799 benchmark circuits.2017 IEEE 26th Asian Test Symposium (ATS), 27-30 November 2017, Taipei, Taiwa

    A Novel Methodology for Error-Resilient Circuits in Near-Threshold Computing

    Get PDF
    Department of Electrical EngineeringThe main goal of designing VLSI system is high performance with low energy consumption. Actually, to realize the human-related techniques, such as internet of things (IoTs) and wearable devices, efficient power management techniques are required. Near threshold computing (NTC) is one of the most well-known techniques which is proposed for the trade-off between energy consumption and performance improvement. With this technique, the solution would be selected by the lowest energy with highest performance. However, NTC suffers a significant performance degradation, which is prone to timing errors. However, main goal of Integrated Circuit (IC) design is making the circuit to always operate correctly though worst-case condition. But, in order to make the circuit always work correctly, considerable area and power overheads may occur. As an alternative, better-than-worst-case (BTWC) design paradigm has been proposed. One of the main design of BTWC design includes error-resilient circuits which detect and correct timing errors, though they cause area and power overheads. In this thesis, we propose various design methodologies which provide an optimal implementation of error-resilient circuits. Slack-based, sensitivity-based methodology and modified Quine-McCluskey (Q-M) algorithm have been exploited to earn the minimum set of error-resilient circuits without any loss of detection ability. From sensitivity-based methodology, benchmark results show that the optimal designs reduces up to 46% monitoring area without compromising error detection ability of the initial error-resilient design. From the Quine-McCluskey (Q-M) algorithm, benchmark results show that optimal design reduces up to 72% of flip-flops which are required to be changed to error-resilient circuits without compromising an error detection ability. In addition, more power and area reduction can be possible when reasonable underestimation of error detection ability is accepted. Monte-Carlo analysis validates that our proposed method is tolerant to process variation.ope

    PROBABILITY-DRIVEN MULTI-BIT FLIP-FLOP INTEGRATION WITH CLOCK GATING

    Get PDF
    Data-driven clock gated (DDCG) and multi bit flip-flops (MBFFs) are two low-power design techniques that are usually treated separately. Combining these techniques into a single grouping algorithm and design flow enables further power savings. We study MBFF multiplicity and its synergy with FF data-to-clock toggling probabilities. A probabilistic model is implemented to maximize the expected energy savings by grouping FFs in increasing order of their data-to-clock toggling probabilities. We present a front-end design flow, guided by physical layout considerations for a 65-nm 32-bit MIPS and a 28-nm industrial network processor. It is shown to achieve the power savings of 23% and 17%, respectively, compared with designs with ordinary FFs. About half of the savings was due to integrating the DDCG into the MBFFs. The proposed architecture of this paper analysis the logic size, area and power consumption using Tanner tool

    DLWUC: Distance and Load Weight Updated Clustering-Based Clock Distribution for SOC Architecture

    Get PDF
    High-clock skew variations and degradation of driving ability of buffers lead to an additional power dissipation in Clock Distribution Network (CDN) that increases the dimensionality of buffers and coordination among flip-flops. The manual threshold level to predict the Region of Interest (ROI) is not applicable in clustering process due to the complexities of excessive wire length and critical delay. This paper proposes the Distance and Load Weight Updated Clustering (DLWUC) to determine the suitable position of logical components. Initially, the DLWUC utilizes the Hybrid Weighted Distance (HWD) to estimate the distance and construct the distance matrix. The weight value extracted from the sorted distance matrix facilitates the projection of buffers. The updated weight value serves as the base for clustering with labeled outputs. The placement of buffer at the suitable place from load weight updated clustering provides the necessary trade-off between clock provision and load balance. The DLWUC discussed in this paper reduces the size of buffers, skew, power and latency compared to the existing topologies

    Physical Design and Clock Tree Synthesis Methods For A 8-Bit Processor

    Get PDF
    Now days a number of processors are available with a lot kind of feature from different industries. A processor with similar kind of architecture of the current processors only missing the memory stuffs like the RAM and ROM has been designed here with the help of Verilog style of coding. This processor contains architecturally the program counter, instruction register, ALU, ALU latch, General Purpose Registers, control state module, flag registers and the core module containing all the modules. And a test module is designed for testing the processor. After the design of the processor with successful functionality, the processor is synthesized with 180nm technology. The synthesis is performed with the data path optimization like the selection of proper adders and multipliers for timing optimization in the data path while the ALU operations are performed. During synthesis how to take care of the worst negative slack (WNS), how to include the clock gating cells, how to define the cost and path groups etc. have been covered. After the proper synthesis we get the proper net list and the synthesized constraint file for carrying out the physical design. In physical design the steps like floor-planning, partitioning, placement, legalization of the placement, clock tree synthesis, and routing etc. have been performed. At all the stages the static timing analysis is performed for the timing meet of the design for better performance in terms of timing or frequency. Each steps of physical design are discussed with special effort towards the concepts behind the step. Out of all the steps of physical design the clock tree synthesis is performed with some improvement in the performance of the clock tree by creating a symmetrical clock tree and maintaining more common clock paths. A special algorithm has been framed for creating a symmetrical clock tree and thereby making the power consumption of the clock tree low

    Optimization of state assignment in a finite state machine: evaluation of a simulated annealing approach

    Get PDF
    In this research, the application of the Simulated Annealing algorithm to solve the state assignment problem in finite state machines is investigated. The state assignment is a classic NP-Complete problem in digital systems design and impacts directly on both area and power costs as well as on the design time. The solutions found in the literature uses population-based methods that consume additional computer resources. The Simulated Annealing algorithm has been chosen because it does not use populations while seeking a solution. Therefore, the objective of this research is to evaluate the impact on the quality of the solution when using the Simulated Annealing approach. The proposed solution is evaluated using the LGSynth89 benchmark and compared with other approaches in the state-of-the-art. The experimental simulations point out an average loss in solution quality of 11%, while an average processing performance of 86%. The results indicate that it is possible to have few quality losses with a significant increase in processing performance

    ๋กœ์ง ๋ฐ ํ”ผ์ง€์ปฌ ํ•ฉ์„ฑ์—์„œ์˜ ํƒ€์ด๋ฐ ๋ถ„์„๊ณผ ์ตœ์ ํ™”

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2020. 8. ๊น€ํƒœํ™˜.Timing analysis is one of the necessary steps in the development of a semiconductor circuit. In addition, it is increasingly important in the advanced process technologies due to various factors, including the increase of processโ€“voltageโ€“temperature variation. This dissertation addresses three problems related to timing analysis and optimization in logic and physical synthesis. Firstly, most static timing analysis today are based on conventional fixed flip-flop timing models, in which every flip-flop is assumed to have a fixed clock-to-Q delay. However, setup and hold skews affect the clock-to-Q delay in reality. In this dissertation, I propose a mathematical formulation to solve the problem and apply it to the clock skew scheduling problems as well as to the analysis of a given circuit, with a scalable speedup technique. Secondly, near-threshold computing is one of the promising concepts for energy-efficient operation of VLSI systems, but wide performance variation and nonlinearity to process variations block the proliferation. To cope with this, I propose a holistic hardware performance monitoring methodology for accurate timing prediction in a near-threshold voltage regime and advanced process technology. Lastly, an asynchronous circuit is one of the alternatives to the conventional synchronous style, and asynchronous pipeline circuit especially attractive because of its small design effort. This dissertation addresses the synthesis problem of lightening two-phase bundled-data asynchronous pipeline controllers, in which delay buffers are essential for guaranteeing the correct handshaking operation but incurs considerable area increase.ํƒ€์ด๋ฐ ๋ถ„์„์€ ๋ฐ˜๋„์ฒด ํšŒ๋กœ ๊ฐœ๋ฐœ ํ•„์ˆ˜ ๊ณผ์ • ์ค‘ ํ•˜๋‚˜๋กœ, ์ตœ์‹  ๊ณต์ •์ผ์ˆ˜๋ก ๊ณต์ •-์ „์••-์˜จ๋„ ๋ณ€์ด ์ฆ๊ฐ€๋ฅผ ํฌํ•จํ•œ ๋‹ค์–‘ํ•œ ์š”์ธ์œผ๋กœ ํ•˜์—ฌ๊ธˆ ๊ทธ ์ค‘์š”์„ฑ์ด ์ปค์ง€๊ณ  ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋กœ์ง ๋ฐ ํ”ผ์ง€์ปฌ ํ•ฉ์„ฑ๊ณผ ๊ด€๋ จํ•˜์—ฌ ์„ธ ๊ฐ€์ง€ ํƒ€์ด๋ฐ ๋ถ„์„ ๋ฐ ์ตœ์ ํ™” ๋ฌธ์ œ์— ๋Œ€ํ•ด ๋‹ค๋ฃฌ๋‹ค. ์ฒซ์งธ๋กœ, ์˜ค๋Š˜๋‚  ๋Œ€๋ถ€๋ถ„์˜ ์ •์  ํƒ€์ด๋ฐ ๋ถ„์„์€ ๋ชจ๋“  ํ”Œ๋ฆฝ-ํ”Œ๋กญ์˜ ํด๋Ÿญ-์ถœ๋ ฅ ๋”œ๋ ˆ์ด๊ฐ€ ๊ณ ์ •๋œ ๊ฐ’์ด๋ผ๋Š” ๊ฐ€์ •์„ ๋ฐ”ํƒ•์œผ๋กœ ์ด๋ฃจ์–ด์กŒ๋‹ค. ํ•˜์ง€๋งŒ ์‹ค์ œ ํด๋Ÿญ-์ถœ๋ ฅ ๋”œ๋ ˆ์ด๋Š” ํ•ด๋‹น ํ”Œ๋ฆฝ-ํ”Œ๋กญ์˜ ์…‹์—… ๋ฐ ํ™€๋“œ ์Šคํ์— ์˜ํ–ฅ์„ ๋ฐ›๋Š”๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ด๋Ÿฌํ•œ ํŠน์„ฑ์„ ์ˆ˜ํ•™์ ์œผ๋กœ ์ •๋ฆฌํ•˜์˜€์œผ๋ฉฐ, ์ด๋ฅผ ํ™•์žฅ ๊ฐ€๋Šฅํ•œ ์†๋„ ํ–ฅ์ƒ ๊ธฐ๋ฒ•๊ณผ ๋”๋ถˆ์–ด ์ฃผ์–ด์ง„ ํšŒ๋กœ์˜ ํƒ€์ด๋ฐ ๋ถ„์„ ๋ฐ ํด๋Ÿญ ์Šคํ ์Šค์ผ€์ฅด๋ง ๋ฌธ์ œ์— ์ ์šฉํ•˜์˜€๋‹ค. ๋‘˜์งธ๋กœ, ์œ ์‚ฌ ๋ฌธํ„ฑ ์—ฐ์‚ฐ์€ ์ดˆ๊ณ ์ง‘์  ํšŒ๋กœ ๋™์ž‘์˜ ์—๋„ˆ์ง€ ํšจ์œจ์„ ๋Œ์–ด ์˜ฌ๋ฆด ์ˆ˜ ์žˆ๋‹ค๋Š” ์ ์—์„œ ๊ฐ๊ด‘๋ฐ›์ง€๋งŒ, ํฐ ํญ์˜ ์„ฑ๋Šฅ ๋ณ€์ด ๋ฐ ๋น„์„ ํ˜•์„ฑ ๋•Œ๋ฌธ์— ๋„๋ฆฌ ํ™œ์šฉ๋˜๊ณ  ์žˆ์ง€ ์•Š๋‹ค. ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ์œ ์‚ฌ ๋ฌธํ„ฑ ์ „์•• ์˜์—ญ ๋ฐ ์ตœ์‹  ๊ณต์ • ๋…ธ๋“œ์—์„œ ๋ณด๋‹ค ์ •ํ™•ํ•œ ํƒ€์ด๋ฐ ์˜ˆ์ธก์„ ์œ„ํ•œ ํ•˜๋“œ์›จ์–ด ์„ฑ๋Šฅ ๋ชจ๋‹ˆํ„ฐ๋ง ๋ฐฉ๋ฒ•๋ก  ์ „๋ฐ˜์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ, ๋น„๋™๊ธฐ ํšŒ๋กœ๋Š” ๊ธฐ์กด ๋™๊ธฐ ํšŒ๋กœ์˜ ๋Œ€์•ˆ ์ค‘ ํ•˜๋‚˜๋กœ, ๊ทธ ์ค‘์—์„œ๋„ ๋น„๋™๊ธฐ ํŒŒ์ดํ”„๋ผ์ธ ํšŒ๋กœ๋Š” ๋น„๊ต์  ์ ์€ ์„ค๊ณ„ ๋…ธ๋ ฅ๋งŒ์œผ๋กœ๋„ ๊ตฌํ˜„ ๊ฐ€๋Šฅํ•˜๋‹ค๋Š” ์žฅ์ ์ด ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” 2์œ„์ƒ ๋ฌถ์Œ ๋ฐ์ดํ„ฐ ํ”„๋กœํ† ์ฝœ ๊ธฐ๋ฐ˜ ๋น„๋™๊ธฐ ํŒŒ์ดํ”„๋ผ์ธ ์ปจํŠธ๋กค๋Ÿฌ ์ƒ์—์„œ, ์ •ํ™•ํ•œ ํ•ธ๋“œ์…ฐ์ดํ‚น ํ†ต์‹ ์„ ์œ„ํ•ด ์‚ฝ์ž…๋œ ๋”œ๋ ˆ์ด ๋ฒ„ํผ์— ์˜ํ•œ ๋ฉด์  ์ฆ๊ฐ€๋ฅผ ์™„ํ™”ํ•  ์ˆ˜ ์žˆ๋Š” ํ•ฉ์„ฑ ๊ธฐ๋ฒ•์„ ์ œ์‹œํ•˜์˜€๋‹ค.1 INTRODUCTION 1 1.1 Flexible Flip-Flop Timing Model 1 1.2 Hardware Performance Monitoring Methodology 4 1.3 Asynchronous Pipeline Controller 10 1.4 Contributions of this Dissertation 15 2 ANALYSIS AND OPTIMIZATION CONSIDERING FLEXIBLE FLIP-FLOP TIMING MODEL 17 2.1 Preliminaries 17 2.1.1 Terminologies 17 2.1.2 Timing Analysis 20 2.1.3 Clock-to-Q Delay Surface Modeling 21 2.2 Clock-to-Q Delay Interval Analysis 22 2.2.1 Derivation 23 2.2.2 Additional Constraints 26 2.2.3 Analysis: Finding Minimum Clock Period 28 2.2.4 Optimization: Clock Skew Scheduling 30 2.2.5 Scalable Speedup Technique 33 2.3 Experimental Results 37 2.3.1 Application to Minimum Clock Period Finding 37 2.3.2 Application to Clock Skew Scheduling 39 2.3.3 Efficacy of Scalable Speedup Technique 43 2.4 Summary 44 3 HARDWARE PERFORMANCE MONITORING METHODOLOGY AT NTC AND ADVANCED TECHNOLOGY NODE 45 3.1 Overall Flow of Proposed HPM Methodology 45 3.2 Prerequisites to HPM Methodology 47 3.2.1 BEOL Process Variation Modeling 47 3.2.2 Surrogate Model Preparation 49 3.3 HPM Methodology: Design Phase 52 3.3.1 HPM2PV Model Construction 52 3.3.2 Optimization of Monitoring Circuits Configuration 54 3.3.3 PV2CPT Model Construction 58 3.4 HPM Methodology: Post-Silicon Phase 60 3.4.1 Transfer Learning in Silicon Characterization Step 60 3.4.2 Procedures in Volume Production Phase 61 3.5 Experimental Results 62 3.5.1 Experimental Setup 62 3.5.2 Exploration of Monitoring Circuits Configuration 64 3.5.3 Effectiveness of Monitoring Circuits Optimization 66 3.5.4 Considering BEOL PVs and Uncertainty Learning 68 3.5.5 Comparison among Different Prediction Flows 69 3.5.6 Effectiveness of Prediction Model Calibration 71 3.6 Summary 73 4 LIGHTENING ASYNCHRONOUS PIPELINE CONTROLLER 75 4.1 Preliminaries and State-of-the-Art Work 75 4.1.1 Bundled-data vs. Dual-rail Asynchronous Circuits 75 4.1.2 Two-phase vs. Four-phase Bundled-data Protocol 76 4.1.3 Conventional State-of-the-Art Pipeline Controller Template 77 4.2 Delay Path Sharing for Lightening Pipeline Controller Template 78 4.2.1 Synthesizing Sharable Delay Paths 78 4.2.2 Validating Logical Correctness for Sharable Delay Paths 80 4.2.3 Reformulating Timing Constraints of Controller Template 81 4.2.4 Minimally Allocating Delay Buffers 87 4.3 In-depth Pipeline Controller Template Synthesis with Delay Path Reusing 88 4.3.1 Synthesizing Delay Path Units 88 4.3.2 Validating Logical Correctness of Delay Path Units 89 4.3.3 Updating Timing Constraints for Delay Path Units 91 4.3.4 In-depth Synthesis Flow Utilizing Delay Path Units 95 4.4 Experimental Results 99 4.4.1 Environment Setup 99 4.4.2 Piecewise Linear Modeling of Delay Path Unit Area 99 4.4.3 Comparison of Power, Performance, and Area 102 4.5 Summary 107 5 CONCLUSION 109 5.1 Chapter 2 109 5.2 Chapter 3 110 5.3 Chapter 4 110 Abstract (In Korean) 127Docto
    • โ€ฆ
    corecore