
201 research outputs found

Desynchronization: Synthesis of asynchronous circuits from synchronous specifications

Author: Alex Kondratyev
Christos Sotiriou
Jordi Cortadella
Lavagno Luciano
Publication date: 01/01/2006

Asynchronous implementation techniques, which measure logic delays at run time and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst-case delays at design time and constrain the clock cycle accordingly. De-synchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity without requiring special design skills or tools. In this paper, we first study different protocols for de-synchronization and formally prove their correctness, using techniques originally developed for distributed deployment of synchronous language specifications. We also provide a taxonomy of existing protocols for asynchronous latch controllers, covering in particular the four-phase handshake protocols devised in the literature for micro-pipelines. We then propose a new controller which exhibits provably maximal concurrency, and analyze the performance of desynchronized circuits with respect to the original synchronous optimized implementation. We finally prove the feasibility and effectiveness of our approach by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architecture.

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Scan Chain Grouping for Mitigating IR-Drop-Induced Test Data Corruption

Author: Holst Stefan
Kajihara Seiji
Miyase Kohei
Qianz Jun
Wen Xiaoqing
Zhang Yucong
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 25/01/2018

Loading and unloading test patterns during scan testing causes many scan flip-flops to trigger simultaneously. This instantaneous switching activity during shift may in turn cause excessive IR-drop that can disrupt the states of some scan flip-flops and corrupt test stimuli or responses. A common design technique to even out these instantaneous power surges is to design multiple scan chains and shift only a group of the scan chains at the same time. This paper introduces a novel algorithm to optimally group scan chains so as to minimize the probability of test data corruption caused by excessive instantaneous IR-drop on scan flip-flops. The experiments show optimal results on all large ITC'99 benchmark circuits.

2017 IEEE 26th Asian Test Symposium (ATS), 27-30 November 2017, Taipei, Taiwan

Kyutacar : Kyushu Institute of Technology Academic Repository

A Novel Methodology for Error-Resilient Circuits in Near-Threshold Computing

Author: Lee Jaemin
Publication venue: Graduate School of UNIST
Publication date: 01/08/2017

Department of Electrical Engineering

The main goal of VLSI system design is high performance with low energy consumption. Human-centered technologies such as the Internet of Things (IoT) and wearable devices require efficient power management techniques. Near-threshold computing (NTC) is one of the best-known techniques proposed for the trade-off between energy consumption and performance. With this technique, the solution with the lowest energy and the highest performance is selected. However, NTC suffers significant performance degradation and is prone to timing errors. The conventional goal of integrated circuit (IC) design is to make the circuit always operate correctly, even under worst-case conditions, but guaranteeing this may incur considerable area and power overheads. As an alternative, the better-than-worst-case (BTWC) design paradigm has been proposed. One of the main BTWC design styles uses error-resilient circuits, which detect and correct timing errors, although they too cause area and power overheads. In this thesis, we propose design methodologies that provide an optimal implementation of error-resilient circuits. A slack-based methodology, a sensitivity-based methodology, and a modified Quine-McCluskey (Q-M) algorithm are used to obtain the minimum set of error-resilient circuits without any loss of detection ability. For the sensitivity-based methodology, benchmark results show that the optimal design reduces monitoring area by up to 46% without compromising the error detection ability of the initial error-resilient design. For the Quine-McCluskey (Q-M) algorithm, benchmark results show that the optimal design reduces the number of flip-flops that must be changed to error-resilient circuits by up to 72% without compromising error detection ability.
In addition, further power and area reduction is possible when a reasonable underestimation of error detection ability is accepted. Monte Carlo analysis validates that the proposed method is tolerant to process variation.

ScholarWorks@UNIST

PROBABILITY-DRIVEN MULTI-BIT FLIP-FLOP INTEGRATION WITH CLOCK GATING

Author: Bharti Supriya
Sasikiran S
Publication venue: International Journal of Innovative Technology and Research
Publication date: 14/11/2018

Data-driven clock gating (DDCG) and multi-bit flip-flops (MBFFs) are two low-power design techniques that are usually treated separately. Combining these techniques into a single grouping algorithm and design flow enables further power savings. We study MBFF multiplicity and its synergy with FF data-to-clock toggling probabilities. A probabilistic model is implemented to maximize the expected energy savings by grouping FFs in increasing order of their data-to-clock toggling probabilities. We present a front-end design flow, guided by physical layout considerations, for a 65-nm 32-bit MIPS and a 28-nm industrial network processor. It is shown to achieve power savings of 23% and 17%, respectively, compared with designs with ordinary FFs. About half of the savings was due to integrating the DDCG into the MBFFs. The architecture proposed in this paper is analyzed for logic size, area and power consumption using the Tanner tool.
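
The grouping step mentioned in the abstract (ordering FFs by data-to-clock toggling probability before packing them into MBFFs) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed group size `k`, and the independence assumption used to estimate how often a group's shared gated clock must fire are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def group_flip_flops(toggle_probs: Sequence[float], k: int) -> List[List[int]]:
    """Pack flip-flop indices into groups of at most k bits.

    Flip-flops are taken in increasing order of toggling probability,
    so FFs that rarely toggle end up sharing the same gated clock.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    order = sorted(range(len(toggle_probs)), key=lambda i: toggle_probs[i])
    return [order[i:i + k] for i in range(0, len(order), k)]


def expected_clock_enable_rate(group: Sequence[int],
                               toggle_probs: Sequence[float]) -> float:
    """Probability that at least one FF in the group toggles in a cycle.

    Assumption (hypothetical): toggling events are independent, and the
    shared clock is enabled whenever any FF in the group toggles.
    """
    p_none = 1.0
    for i in group:
        p_none *= 1.0 - toggle_probs[i]
    return 1.0 - p_none


if __name__ == "__main__":
    probs = [0.9, 0.1, 0.8, 0.05]  # made-up toggling probabilities
    for g in group_flip_flops(probs, k=2):
        print(g, expected_clock_enable_rate(g, probs))
```

Under these assumptions, sorting first keeps low-activity FFs together, so their shared clock is enabled rarely; mixing a high-activity FF into such a group would force the clock on for all of its members.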

International Journal of Innovative Technology and Research (IJITR)

DLWUC: Distance and Load Weight Updated Clustering-Based Clock Distribution for SOC Architecture

Author: A. Sridevi
V. Lakshmiprabha
Publication venue: 'Mechanical Engineering Faculty in Slavonski Brod'
Publication date: 01/01/2018

High clock-skew variations and degradation of the driving ability of buffers lead to additional power dissipation in the Clock Distribution Network (CDN), which increases the dimensionality of buffers and the coordination among flip-flops. A manual threshold level to predict the Region of Interest (ROI) is not applicable in the clustering process due to the complexities of excessive wire length and critical delay. This paper proposes Distance and Load Weight Updated Clustering (DLWUC) to determine the suitable position of logical components. Initially, DLWUC uses the Hybrid Weighted Distance (HWD) to estimate the distance and construct the distance matrix. The weight value extracted from the sorted distance matrix facilitates the projection of buffers. The updated weight value serves as the base for clustering with labeled outputs. Placing each buffer at the suitable position obtained from load weight updated clustering provides the necessary trade-off between clock provision and load balance. The DLWUC discussed in this paper reduces the size of buffers, skew, power and latency compared to the existing topologies.

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Physical Design and Clock Tree Synthesis Methods For A 8-Bit Processor

Author: Daxiniray Debaprasad
Publication date: 01/01/2016

Nowadays many processors with a wide range of features are available from different vendors. A processor with an architecture similar to current processors, lacking only memory elements such as RAM and ROM, has been designed here in Verilog. Architecturally, this processor contains the program counter, instruction register, ALU, ALU latch, general-purpose registers, control state module, flag registers, and a core module containing all of these modules. A test module is also designed for testing the processor. After the functionality of the design is verified, the processor is synthesized in a 180 nm technology. Synthesis is performed with data path optimization, such as the selection of suitable adders and multipliers to optimize the timing of the data path while ALU operations are performed. The synthesis discussion covers how to handle the worst negative slack (WNS), how to include clock gating cells, how to define cost and path groups, and so on. Synthesis produces the netlist and the synthesized constraint file needed to carry out the physical design. In physical design, steps such as floor-planning, partitioning, placement, legalization of the placement, clock tree synthesis, and routing have been performed. At every stage, static timing analysis is performed to check that the design meets timing, for better performance in terms of timing or frequency. Each step of physical design is discussed with particular attention to the concepts behind it. Among these steps, clock tree synthesis is performed with some improvement in the performance of the clock tree, achieved by creating a symmetrical clock tree and maintaining more common clock paths. A special algorithm has been devised for creating a symmetrical clock tree and thereby lowering the power consumption of the clock tree.

ethesis@nitr

Optimization of state assignment in a finite state machine: evaluation of a simulated annealing approach

Author: da Silva Almeida Tiago
da Silva Ribeiro Reinaldo
Lima de Carvalho Rafael
Publication venue: 'Universidade Federal do Tocantins'
Publication date: 01/12/2021

In this research, the application of the Simulated Annealing algorithm to solve the state assignment problem in finite state machines is investigated. State assignment is a classic NP-complete problem in digital systems design and directly affects area and power costs as well as design time. The solutions found in the literature use population-based methods that consume additional computing resources. The Simulated Annealing algorithm has been chosen because it does not use populations while seeking a solution. Therefore, the objective of this research is to evaluate the impact on the quality of the solution when using the Simulated Annealing approach. The proposed solution is evaluated using the LGSynth89 benchmark and compared with other state-of-the-art approaches. The experimental simulations show an average loss in solution quality of 11%, alongside an average gain in processing performance of 86%. The results indicate that it is possible to have small quality losses with a significant increase in processing performance.
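
To make the approach concrete, the following is a minimal sketch of simulated annealing applied to state assignment. It is not the method evaluated in the paper: the cost function (total Hamming distance between the codes of states joined by a transition), the swap move, the geometric cooling schedule, and every name and parameter value are assumptions chosen for illustration.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Transition = Tuple[str, str]


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def cost(assignment: Dict[str, int], transitions: Sequence[Transition]) -> int:
    """Hypothetical cost: total bit flips over all state transitions."""
    return sum(hamming(assignment[s], assignment[t]) for s, t in transitions)


def anneal(states: List[str], transitions: Sequence[Transition], n_bits: int,
           t_start: float = 5.0, t_end: float = 0.01, alpha: float = 0.95,
           moves_per_temp: int = 50, seed: int = 0) -> Tuple[Dict[str, int], int]:
    """Search for a low-cost binary encoding of the states."""
    if len(states) > 2 ** n_bits:
        raise ValueError("not enough bits to encode all states")
    rng = random.Random(seed)
    # codes[k] is the code of states[k]; the tail holds unused codes.
    codes = list(range(2 ** n_bits))
    rng.shuffle(codes)

    def to_assignment(perm: List[int]) -> Dict[str, int]:
        return {s: perm[k] for k, s in enumerate(states)}

    cur_cost = cost(to_assignment(codes), transitions)
    best, best_cost = codes[:], cur_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            i = rng.randrange(len(states))  # a used code
            j = rng.randrange(len(codes))   # any code, used or unused
            if i == j:
                continue
            codes[i], codes[j] = codes[j], codes[i]
            new_cost = cost(to_assignment(codes), transitions)
            delta = new_cost - cur_cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / t) (Metropolis criterion).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cur_cost = new_cost
                if cur_cost < best_cost:
                    best, best_cost = codes[:], cur_cost
            else:
                codes[i], codes[j] = codes[j], codes[i]  # undo the swap
        t *= alpha  # geometric cooling
    return to_assignment(best), best_cost


if __name__ == "__main__":
    ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
    print(anneal(["A", "B", "C", "D"], ring, n_bits=2))
```

The sketch keeps a single candidate solution at a time, which reflects the property the abstract highlights: no population has to be stored or evaluated.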

Periódicos UFT (Universidade Federal do Tocantins)

Timing Analysis and Optimization in Logic and Physical Synthesis

Author: 허정우
Publication venue: Graduate School, Seoul National University
Publication date: 01/08/2020

Doctoral dissertation -- Graduate School, Seoul National University: Department of Electrical and Computer Engineering, College of Engineering, August 2020. 김태환.

Timing analysis is one of the necessary steps in the development of a semiconductor circuit. It is increasingly important in advanced process technologies due to various factors, including the increase of process-voltage-temperature variation. This dissertation addresses three problems related to timing analysis and optimization in logic and physical synthesis. Firstly, most static timing analyses today are based on conventional fixed flip-flop timing models, in which every flip-flop is assumed to have a fixed clock-to-Q delay. In reality, however, setup and hold skews affect the clock-to-Q delay. In this dissertation, I propose a mathematical formulation to solve the problem and apply it to clock skew scheduling problems as well as to the analysis of a given circuit, with a scalable speedup technique. Secondly, near-threshold computing is one of the promising concepts for energy-efficient operation of VLSI systems, but wide performance variation and nonlinearity with respect to process variations hinder its adoption. To cope with this, I propose a holistic hardware performance monitoring methodology for accurate timing prediction in a near-threshold voltage regime and advanced process technology. Lastly, an asynchronous circuit is one of the alternatives to the conventional synchronous style, and asynchronous pipeline circuits are especially attractive because of their small design effort. This dissertation addresses the synthesis problem of lightening two-phase bundled-data asynchronous pipeline controllers, in which delay buffers are essential for guaranteeing correct handshaking operation but incur a considerable area increase.

Contents:
1 INTRODUCTION
  1.1 Flexible Flip-Flop Timing Model
  1.2 Hardware Performance Monitoring Methodology
  1.3 Asynchronous Pipeline Controller
  1.4 Contributions of this Dissertation
2 ANALYSIS AND OPTIMIZATION CONSIDERING FLEXIBLE FLIP-FLOP TIMING MODEL
  2.1 Preliminaries
    2.1.1 Terminologies
    2.1.2 Timing Analysis
    2.1.3 Clock-to-Q Delay Surface Modeling
  2.2 Clock-to-Q Delay Interval Analysis
    2.2.1 Derivation
    2.2.2 Additional Constraints
    2.2.3 Analysis: Finding Minimum Clock Period
    2.2.4 Optimization: Clock Skew Scheduling
    2.2.5 Scalable Speedup Technique
  2.3 Experimental Results
    2.3.1 Application to Minimum Clock Period Finding
    2.3.2 Application to Clock Skew Scheduling
    2.3.3 Efficacy of Scalable Speedup Technique
  2.4 Summary
3 HARDWARE PERFORMANCE MONITORING METHODOLOGY AT NTC AND ADVANCED TECHNOLOGY NODE
  3.1 Overall Flow of Proposed HPM Methodology
  3.2 Prerequisites to HPM Methodology
    3.2.1 BEOL Process Variation Modeling
    3.2.2 Surrogate Model Preparation
  3.3 HPM Methodology: Design Phase
    3.3.1 HPM2PV Model Construction
    3.3.2 Optimization of Monitoring Circuits Configuration
    3.3.3 PV2CPT Model Construction
  3.4 HPM Methodology: Post-Silicon Phase
    3.4.1 Transfer Learning in Silicon Characterization Step
    3.4.2 Procedures in Volume Production Phase
  3.5 Experimental Results
    3.5.1 Experimental Setup
    3.5.2 Exploration of Monitoring Circuits Configuration
    3.5.3 Effectiveness of Monitoring Circuits Optimization
    3.5.4 Considering BEOL PVs and Uncertainty Learning
    3.5.5 Comparison among Different Prediction Flows
    3.5.6 Effectiveness of Prediction Model Calibration
  3.6 Summary
4 LIGHTENING ASYNCHRONOUS PIPELINE CONTROLLER
  4.1 Preliminaries and State-of-the-Art Work
    4.1.1 Bundled-data vs. Dual-rail Asynchronous Circuits
    4.1.2 Two-phase vs. Four-phase Bundled-data Protocol
    4.1.3 Conventional State-of-the-Art Pipeline Controller Template
  4.2 Delay Path Sharing for Lightening Pipeline Controller Template
    4.2.1 Synthesizing Sharable Delay Paths
    4.2.2 Validating Logical Correctness for Sharable Delay Paths
    4.2.3 Reformulating Timing Constraints of Controller Template
    4.2.4 Minimally Allocating Delay Buffers
  4.3 In-depth Pipeline Controller Template Synthesis with Delay Path Reusing
    4.3.1 Synthesizing Delay Path Units
    4.3.2 Validating Logical Correctness of Delay Path Units
    4.3.3 Updating Timing Constraints for Delay Path Units
    4.3.4 In-depth Synthesis Flow Utilizing Delay Path Units
  4.4 Experimental Results
    4.4.1 Environment Setup
    4.4.2 Piecewise Linear Modeling of Delay Path Unit Area
    4.4.3 Comparison of Power, Performance, and Area
  4.5 Summary
5 CONCLUSION
  5.1 Chapter 2
  5.2 Chapter 3
  5.3 Chapter 4
Abstract (In Korean)

SNU Open Repository and Archive

Physical Design Implementation and Engineering Change Order Flow

Author: Afonso Ferreira Pinto Gomes Moreira
Publication date: 06/09/2017

Repositório Aberto da Universidade do Porto