Search CORE

213 research outputs found

Analysis and development of digital circuits optimization algorithms in terms of speed and power dissipation

Author: Καλονάκης Χρήστος Ν.
Ντιούδης Δήμος Π.
Publication venue
Publication date: 01/01/2015
Field of study

University of Thessaly Institutional Repository

ASIC implemented MicroBlaze-based Coprocessor for Data Stream Management Systems

Author: Balasubramanian Linknath Surya
Publication venue
Publication date: 01/01/2020
Field of study

Indiana University-Purdue University Indianapolis (IUPUI)The drastic increase in Internet usage demands the need for processing data in real time with higher efficiency than ever before. Symbiote Coprocessor Unit (SCU), developed by Dr. Pranav Vaidya, is a hardware accelerator which has potential of providing data processing speedup of up to 150x compared with traditional data stream processors. However, SCU implementation is very complex, fixed, and uses an outdated host interface, which limits future improvement. Mr. Tareq S. Alqaisi, an MSECE graduate from IUPUI worked on curbing these limitations. In his architecture, he used a Xilinx MicroBlaze microcontroller to reduce the complexity of SCU along with few other modifications. The objective of this study is to make SCU suitable for mass production while reducing its power consumption and delay. To accomplish this, the execution unit of SCU has been implemented in application specific integrated circuit and modules such as ACG/OCG, sequential comparator, and D-word multiplier/divider are integrated into the design. Furthermore, techniques such as operand isolation, buffer insertion, cell swapping, and cell resizing are also integrated into the system. As a result, the new design attains 67.9435 µW of dynamic power as compared to 74.0012 µW before power optimization along with a small increase in static power, 39.47 ns of clock period as opposed to 52.26 ns before time optimization

IUPUIScholarWorks

Purdue E-Pubs

Crosstalk-driven interconnect optimization by simultaneous gate and wire sizing

Author: I.H.-R. Jiang
Jing-Yang Jou
Yao-Wen Chang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

초미세 회로 설계를 위한 인터커넥트의 타이밍 분석 및 디자인 룰 위반 예측

Author: 한창호
Publication venue: 서울대학교 대학원
Publication date: 01/02/2021
Field of study

학위논문 (박사) -- 서울대학교 대학원 : 공과대학 전기·컴퓨터공학부, 2021. 2. 김태환.타이밍 분석 및 디자인 룰 위반 제거는 반도체 칩 제조를 위한 마스크 제작 전에 완료되어야 할 필수 과정이다. 그러나 트랜지스터와 인터커넥트의 변이가 증가하고 있고 디자인 룰 역시 복잡해지고 있기 때문에 타이밍 분석 및 디자인 룰 위반 제거는 초미세 회로에서 더 어려워지고 있다. 본 논문에서는 초미세 설계를 위한 두가지 문제인 타이밍 분석과 디자인 룰 위반에 대해 다룬다. 첫번째로 공정 코너에서 타이밍 분석은 실리콘으로 제작된 회로의 성능을 정확히 예측하지 못한다. 그 이유는 공정 코너에서 가장 느린 타이밍 경로가 모든 공정 조건에서도 가장 느린 것은 아니기 때문이다. 게다가 칩 내의 임계 경로에서 인터커넥트에 의한 지연 시간이 전체 지연 시간에서의 영향이 증가하고 있고, 10나노 이하 공정에서는 20%를 초과하고 있다. 즉, 실리콘으로 제작된 회로의 성능을 정확히 예측하기 위해서는 대표 회로가 트랜지스터의 변이 뿐만아니라 인터커넥트의 변이도 반영해야한다. 인터커넥트를 구성하는 금속이 10층 이상 사용되고 있고, 각 층을 구성하는 금속의 저항과 캐패시턴스와 비아 저항이 모두 회로 지연 시간에 영향을 주기 때문에 대표 회로를 찾는 문제는 차원이 매우 높은 영역에서 최적의 해를 찾는 방법이 필요하다. 이를 위해 인터커넥트를 제작하는 공정(백 엔드 오브 라인)의 변이를 반영한 대표 회로를 생성하는 방법을 제안하였다. 공정 변이가 없을때 가장 느린 타이밍 경로에 사용된 게이트와 라우팅 패턴을 변경하면서 점진적으로 탐색하는 방법이다. 구체적으로, 본 논문에서 제안하는 합성 프레임워크는 다음의 새로운 기술들을 통합하였다: (1) 라우팅을 구성하는 여러 금속 층과 비아를 추출하고 탐색 시간 감소를 위해 유사한 구성들을 같은 범주로 분류하였다. (2) 빠르고 정확한 타이밍 분석을 위하여 여러 금속 층과 비아들의 변이를 수식화하였다. (3) 확장성을 고려하여 일반적인 링 오실레이터로 대표회로를 탐색하였다. 두번째로 디자인 룰의 복잡도가 증가하고 있고, 이로 인해 표준 셀들의 인터커넥트를 통한 연결을 진행하는 동안 디자인 룰 위반이 증가하고 있다. 게다가 표준 셀의 크기가 계속 작아지면서 셀들의 연결은 점점 어려워지고 있다. 기존에는 회로 내 모든 표준 셀을 연결하는데 필요한 트랙 수, 가능한 트랙 수, 이들 간의 차이를 이용하여 연결 가능성을 판단하고, 디자인 룰 위반이 발생하지 않도록 셀 배치를 최적화하였다. 그러나 기존 방법은 최신 공정에서는 정확하지 않기 때문에 더 많은 정보를 이용한 회로내 모든 표준 셀 사이의 연결 가능성을 예측하는 방법이 필요하다. 본 논문에서는 기계 학습을 통해 디자인 룰 위반이 발생하는 영역 및 개수를 예측하고 이를 줄이기 위해 표준 셀의 배치를 바꾸는 방법을 제안하였다. 디자인 룰 위반 영역은 이진 분류로 예측하였고 표준 셀의 배치는 디자인 룰 위반 개수를 최소화하는 방향으로 최적화를 수행하였다. 제안하는 프레임워크는 다음의 세가지 기술로 구성되었다: (1) 회로 레이아웃을 여러 개의 정사각형 격자로 나누고 각 격자에서 라우팅을 예측할 수 있는 요소들을 추출한다. (2) 각 격자에서 디자인 룰 위반이 있는지 여부를 판단하는 이진 분류를 수행한다. (3) 메타휴리스틱 최적화 또는 베이지안 최적화를 이용하여 전체 디자인 룰 위반 개수가 감소하도록 각 격자에 있는 표준 셀을 움직인다.Timing analysis and clearing design rule violations are the essential steps for taping out a chip. However, they keep getting harder in deep sub-micron circuits because the variations of transistors and interconnects have been increasing and design rules have become more complex. This dissertation addresses two problems on timing analysis and design rule violations for synthesizing deep sub-micron circuits. Firstly, timing analysis in process corners can not capture post-Si performance accurately because the slowest path in the process corner is not always the slowest one in the post-Si instances. In addition, the proportion of interconnect delay in the critical path on a chip is increasing and becomes over 20% in sub-10nm technologies, which means in order to capture post-Si performance accurately, the representative critical path circuit should reflect not only FEOL (front-end-of-line) but also BEOL (backend-of-line) variations. Since the number of BEOL metal layers exceeds ten and the layers have variation on resistance and capacitance intermixed with resistance variation on vias between them, a very high dimensional design space exploration is necessary to synthesize a representative critical path circuit which is able to provide an accurate performance prediction. To cope with this, I propose a BEOL-aware methodology of synthesizing a representative critical path circuit, which is able to incrementally explore, starting from an initial path circuit on the post-Si target circuit, routing patterns (i.e., BEOL reconfiguring) as well as gate resizing on the path circuit. Precisely, the synthesis framework of critical path circuit integrates a set of novel techniques: (1) extracting and classifying BEOL configurations for lightening design space complexity, (2) formulating BEOL random variables for fast and accurate timing analysis, and (3) exploring alternative (ring oscillator) circuit structures for extending the applicability of this work. Secondly, the complexity of design rules has been increasing and results in more design rule violations during routing. In addition, the size of standard cell keeps decreasing and it makes routing harder. In the conventional P&R flow, the routability of pre-routed layout is predicted by routing congestion obtained from global routing, and then placement is optimized not to cause design rule violations. But it turned out to be inaccurate in advanced technology nodes so that it is necessary to predict routability with more features. I propose a methodology of predicting the hotspots of design rule violations (DRVs) using machine learning with placement related features and the conventional routing congestion, and perturbating placed cells to reduce the number of DRVs. Precisely, the hotspots are predicted by a pre-trained binary classification model and placement perturbation is performed by global optimization methods to minimize the number of DRVs predicted by a pre-trained regression model. To do this, the framework is composed of three techniques: (1) dividing the circuit layout into multiple rectangular grids and extracting features such as pin density, cell density, global routing results (demand, capacity and overflow), and more in the placement phase, (2) predicting if each grid has DRVs using a binary classification model, and (3) perturbating the placed standard cells in the hotspots to minimize the number of DRVs predicted by a regression model.1 Introduction 1 1.1 Representative Critical Path Circuit . . . . . . . . . . . . . . . . . . . 1 1.2 Prediction of Design Rule Violations and Placement Perturbation . . . 5 1.3 Contributions of This Dissertation . . . . . . . . . . . . . . . . . . . 7 2 Methodology for Synthesizing Representative Critical Path Circuits reflecting BEOL Timing Variation 9 2.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.2 Definitions and Overall Flow . . . . . . . . . . . . . . . . . . . . . . 12 2.3 Techniques for BEOL-Aware RCP Generation . . . . . . . . . . . . . 17 2.3.1 Clustering BEOL Configurations . . . . . . . . . . . . . . . . 17 2.3.2 Formulating Statistical BEOL Random Variables . . . . . . . 18 2.3.3 Delay Modeling . . . . . . . . . . . . . . . . . . . . . . . . 22 2.3.4 Exploring Ring Oscillator Circuit Structures . . . . . . . . . . 24 2.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.5 Further Study on Variations . . . . . . . . . . . . . . . . . . . . . . . 37 3 Methodology for Reducing Routing Failures through Enhanced Prediction on Design Rule Violations in Placement 39 3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.2 Overall Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.3 Techniques for Reducing Routing Failures . . . . . . . . . . . . . . . 43 3.3.1 Binary Classification . . . . . . . . . . . . . . . . . . . . . . 43 3.3.2 Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3.3.3 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 46 3.3.4 Placement Perturbation . . . . . . . . . . . . . . . . . . . . . 47 3.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.4.1 Experiments Setup . . . . . . . . . . . . . . . . . . . . . . . 51 3.4.2 Hotspot Prediction . . . . . . . . . . . . . . . . . . . . . . . 51 3.4.3 Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 3.4.4 Placement Perturbation . . . . . . . . . . . . . . . . . . . . . 57 4 Conclusions 61 4.1 Synthesis of Representative Critical Path Circuits reflecting BEOL Timing Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 4.2 Reduction of Routing Failures through Enhanced Prediction on Design Rule Violations in Placement . . . . . . . . . . . . . . . . . . . . . . 62 Abstract (In Korean) 69Docto

SNU Open Repository and Archive

Lagrangian relaxation-based multi-threaded discrete gate sizer

Author: Sharma Ankur
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2018
Field of study

In integrated circuit design gate sizing is one of the key optimization techniques which is repeatedly invoked to trade-off delays for area and/or power of the gates during logic design and physical design stages. With increasing design sizes of a million gates and larger, discrete gate sizes and non-convex delay models the gate sizing algorithms that were designed for continuous sizes and convex delay models are slow and timing inaccurate. Of the several published discrete gate sizing algorithms, recent works have shown that Lagrangian relaxation based gate sizers have produced designs with the lowest power on average with high timing accuracy. But they are also very slow due to a large number of expensive timing updates spread across hundreds of iterations of solving the Lagrangian sub-problem. In this thesis we present a Lagrangian relaxation based multi-threaded discrete gate sizer for fast timing and power reduction by swapping the gate sizes and the threshold voltages. We developed two parallelization enabling techniques to reduce the runtime of Lagrangian sub-problem solver, namely, mutual exclusion edge (MEE) assignment and directed acyclic graph (DAG) based netlist traversal. MEEs are dummy edges assigned to reduce computational dependencies among gates sharing one or more common fan-ins. DAG based netlist traversal facilitates simultaneous resizing of gates belonging to different topological levels. We designed a Lagrange multiplier update framework that enables rapid convergence of the timing recovery and power recovery algorithms. To reduce the runtime of timing updates, we proposed a simple and fast-to-compute effective capacitance model and several mechanisms to calibrate the timing models to improve their accuracy. Compared to the state-of-the-art gate sizer, our proposed gate sizer is on average 15x faster and the optimized designs have only 1.7\% higher power. In digital synchronous designs simultaneous gate sizing and clock skew scheduling provides significantly more power saving. We extend the gate sizer to simultaneously schedule the clock skew. It can achieve an average of 18.8\% more reduction in power with only 20\% increase in the runtime

Digital Repository @ Iowa State University (ISU)

Discrete Gate Sizing Methodologies for Delay, Area and Power Optimization

Author: Xie Jiani
Publication venue: SURFACE at Syracuse University
Publication date: 01/12/2014
Field of study

The modeling of an individual gate and the optimization of circuit performance has long been a critical issue in the VLSI industry. In this work, we first study of the gate sizing problem for today\u27s industrial designs, and explore the contributions and limitations of all the existing approaches, which mainly suffer from producing only continuous solutions, using outdated timing models or experiencing performance inefficiency. In this dissertation, we present our new discrete gate sizing technique which optimizes different aspects of circuit performance, including delay, area and power consumption. And our method is fast and efficient as it applies the local search instead of global exhaustive search during gate size selection process, which greatly reduces the search space and improves the computation complexity. In addition to that, it is also flexible with different timing models, and it is able to deal with the constraints of input/output slew and output load capacitance, under which very few previous research works were reported. We then propose a new timing model, which is derived from the classic Elmore delay model, but takes the features of modern timing models from standard cell library. With our new timing model, we are able to formulate the combinatorial discrete sizing problem as a simplified mathematical expression and apply it to existing Lagrangian relaxation method, which is shown to converge to optimal solution. We demonstrate that the classic Elmore delay model based gate sizing approaches can still be valid. Therefore, our work might provide a new look into the numerous Elmore delay model based research works in various areas (such as placement, routing, layout, buffer insertion, timing analysis, etc.)

Syracuse University Research Facility and Collaborative Environment

메쉬 기반의 클락 네트워크 설계 방법론

Author: 강민석
Publication venue: 서울대학교 대학원
Publication date: 01/02/2015
Field of study

학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2015. 2. 김태환.The clock distribution network in a synchronous digital circuit delivers a clock signal to every storage element i.e., clock sink in the circuit. However, since the continued technology scaling increases PVT (process-voltage-temperature) variation, the increase of clock skew variation is highly likely to cause performance degradation or system failure at run time. Recently, to mitigate the clock skew variation, many researchers have taken a profound interest in the clock mesh network. However, though the structure of clock mesh network is excellent in tolerating timing variation, it demands significantly high power consumption due to the use of excessive mesh wire and buffer resources. Thus, optimizing the resources required in the mesh clock synthesis while maintaining the variation tolerance is crucially important. The three major tasks that greatly affect the cost of resulting clock mesh are (1) mesh segment allocation, (2) mesh buffer allocation and sizing, and (3) clock sink binding to mesh segments. Previous clock mesh optimization approaches solve the three tasks sequentially, one by one at a time, to manage the run time complexity of the tasks at the expense of losing the quality of results. However, since the three tasks are tightly inter-related, simultaneously optimizing all three tasks is essential, if the run time is ever permitted, to synthesize an economical clock mesh network. In this dissertation, we propose an approach which is able to tackle the problem in an integrated fashion by combining the three tasks into an iterative framework of incremental updates and solving them simultaneously to find a globally optimal allocation of mesh resources while taking into account the clock skew tolerance constraints. The core parts of this dissertation are a precise analysis on the relation among the resource optimization tasks and an establishment of mechanism for effective and efficient integration of the tasks. In particular, to handle the run time problem, we propose a set of speed-up techniques i.e., modeling RC circuit for eliminating redundant matrix multiplications, exploiting sliding window scheme, and fast buffer sizing effect estimation, which are fitted into our context of fast clock skew estimation in mesh resource optimization as well as an invention of early decision policies. In summary, this dissertation presents the efficient design methodology for clock mesh synthesis with consideration on integration of three tasks and reduction of runtime complexity.Abstract i Contents iii List of Figures vi List of Tables x 1 Introduction 1 1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Contributions of This Dissertation . . . . . . . . . . . . . . . . . . . 3 2 Background 5 2.1 Clock Distribution Network . . . . . . . . . . . . . . . . . . . . . . . 5 2.2 Clock Network Topologies . . . . . . . . . . . . . . . . . . . . . . . 6 2.3 Design Metrics of Clock Network . . . . . . . . . . . . . . . . . . . 7 2.4 The Effects of Variations on Clock Skew . . . . . . . . . . . . . . . . 9 3 Clock Mesh Synthesis Flow 12 3.1 Elements of Clock Mesh . . . . . . . . . . . . . . . . . . . . . . . . 12 3.2 Conventional Clock Mesh Synthesis Overview . . . . . . . . . . . . . 13 3.3 Initial Grid Generation . . . . . . . . . . . . . . . . . . . . . . . . . 13 3.4 Mesh Buffer Placement and Sizing . . . . . . . . . . . . . . . . . . . 14 3.5 Clock Mesh Optimization . . . . . . . . . . . . . . . . . . . . . . . . 17 4 Integrated Resource Allocation and Binding in Clock Mesh Synthesis 19 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 4.2 Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 4.3 Framework of Clock Mesh Optimization . . . . . . . . . . . . . . . . 26 4.3.1 Incremental Resource Updates . . . . . . . . . . . . . . . . . 29 4.3.2 Constraints for Variation Tolerance . . . . . . . . . . . . . . 34 4.3.3 Early Decision Policies . . . . . . . . . . . . . . . . . . . . . 38 4.3.4 Time Complexity Analysis . . . . . . . . . . . . . . . . . . . 39 4.4 Fast Clock Skew Estimation Techniques . . . . . . . . . . . . . . . . 40 4.4.1 Partially Reusing Matrix Multiplication for Incremental Updates 41 4.4.2 Adopting Sliding Window Scheme . . . . . . . . . . . . . . . 43 4.4.3 Adjusting Delay Caused by Buffer Resizing . . . . . . . . . . 44 4.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . 46 4.5.1 Experimental Environments . . . . . . . . . . . . . . . . . . 46 4.5.2 Resource Requirement and Variation Tolerance Comparison . 48 4.5.3 Comparison with Clock Mesh Optimization using Worst Case Timing Analysis of Commercial Tool . . . . . . . . . . . . . 56 4.5.4 Analysis of the Effect of Proposed Techniques . . . . . . . . 58 4.5.5 Run Time Analysis . . . . . . . . . . . . . . . . . . . . . . . 61 4.5.6 Accuracy and Run Time of Fast Clock Skew Estimation . . . 63 4.5.7 Electromigration Analysis . . . . . . . . . . . . . . . . . . . 68 4.5.8 Run-time Analysis in Multi-thread Computing Environment . 70 4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 5 Conclusion 74 Abstract in Korean 84Docto

SNU Open Repository and Archive

Fuzzy simulated evolution algorithm for VLSI cell placement

Author: Ali H.
Sait Sadiq M.
Youssef H.
Publication venue: PERGAMON-ELSEVIER SCIENCE LTD, THE BOULEVARD, LANGFORD LANE, KIDLINGTON, OXFORD OX5 1GB, ENGLAND
Publication date: 01/02/2003
Field of study

Placement is a major step encountered during the design of very large scale integrated circuits. It is a generalization of the quadratic assignment problem with numerous constraints, several objectives, and a very noisy solution space. Besides the NP-hard nature of this problem, many circuit parameters such as area, interconnect delays, wire requirements, etc. can only be imprecisely estimated before completing the remaining design automation steps and committing the circuit to silicon. Further, the best placement is usually one that combines several desirable physical characteristics. There has not been a consensus on how to accommodate all these (conflicting) requirements in the search for near optimal feasible solutions. In this paper, we present a fuzzy simulated evolution (FSE) algorithm to tackle this problem. Identification of near optimal solutions is achieved through a novel goal-directed fuzzy search approach. This approach can be followed by other iterative (meta-) heuristics to find desirable solutions to optimization problems with noisy search space and possibly more than one objective. This approach is dominance preserving, i.e. if a solution A dominates another solution B with respect to all objective criteria, then A will surely have a higher membership in the fuzzy set of good solutions than solution B. Further, the approach scales well with larger problem instances and/or a larger number of objective criteria. Also, the operators of all stages of simulated evolution have been implemented using fuzzy logic to exploit the nature of fuzzy information of the problem domain. Experiments with benchmark tests demonstrate a noticeable improvement in solution quality. (C) 2002 Published by Elsevier Science Ltd

KFUPM ePrints

Overcoming the challenges in very deep submicron for area reduction, power reduction and faster design closure

Author: Koyyalamudi Rakesh
Publication venue
Publication date: 01/01/2007
Field of study

The project is aimed at understanding the existing very deep sub-micron (VDSM) implementation of a digital design, analyzing it from the point of view of power, area and timing and to come up with solutions and strategies to optimize the implementation in terms of power, area and timing. The effort involved, to understand the constraints, reasons and the requirements resulting in the existing implementation of the design. Further, various experiments were carried out to improve the design in various aspects like power, area and timing. The tradeoffs required and the benefits of each of the experiments were contrasted and analyzed. The optimum solutions and strategies which balance the requirements were tried out and published at the end of the report

ethesis@nitr

Simulated evolution for timing and low power VLSI standard cell placement

Author: Khan Junaid A.
Sait Sadiq M.
Publication venue
Publication date
Field of study

Abstract This paper presents a Fuzzy Simulated Evolution algorithm for VLSI standard cell placement with the objective of minimizing power, delay and area. For this hard multiobjective combinatorial optimization problem, no known exact and efficient algorithms exist that guarantee finding a solution of specific or desirable quality. Approximation iterative heuristics such as Simulated Evolution are best suited to perform an intelligent search of the solution space. Due to the imprecise nature of design information at the placement stage the various objectives and constraints are expressed in the fuzzy domain. The search is made to evolve toward a vector of fuzzy goals. Variants of the algorithm which include adaptive bias and biasless simulated evolution are proposed and experimental results are presented. Comparison with genetic algorithm is discussed. r 2003 Elsevier Ltd. All rights reserved