58 research outputs found

Analytical Layer Planning for Nanometer VLSI Designs

Author: Chang Chi-Yu

In this thesis, we propose an intermediate step between the placement and routing stages of physical design. The algorithm generates layer guidance for post-placement optimization techniques, especially buffer insertion. This issue has become critical in modern VLSI chip design because of timing, congestion, and increasingly non-uniform parasitics among metal layers. In addition, as a step before routing, the layer planning algorithm accounts for routability by minimizing the overlap area between different nets. Layer directive information, a crucial concern in industrial design, is also considered. The core problem is formulated as a nonlinear programming problem consisting of an objective function and constraints, and is solved with the conjugate gradient method. The algorithm is implemented in C++ under Linux and tested on the ISPD2008 Global Routing Contest Benchmarks. The experimental results, presented at the end of the thesis, confirm the effectiveness of our approach, especially with respect to routability.

Texas A&M Repository

Comparing energy and latency of asynchronous and synchronous NoCs for embedded SoCs

Author: Gebhardt Daniel
Stevens Kenneth
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010

Journal Article. Power consumption of on-chip interconnects is a primary concern for many embedded system-on-chip (SoC) applications. In this paper, we compare the energy and performance characteristics of asynchronous (clockless) and synchronous network-on-chip implementations, optimized for a number of SoC designs. We adapted the COSI-2.0 framework with ORION 2.0 router and wire models for synchronous network generation. Our own tool, ANetGen, specifies the asynchronous network by determining the topology with simulated annealing and the router locations with force-directed placement. It uses energy and delay models from our 65 nm bundled-data router design. SystemC simulations varied traffic burstiness using the self-similar b-model. Results show that the asynchronous network provided lower median and maximum message latency, especially under bursty traffic, and used far less router energy, with a slight overhead for the inter-router wires.

The University of Utah: J. Willard Marriott Digital Library
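
The Gebhardt and Stevens abstract above mentions varying traffic burstiness with the self-similar b-model. As a rough illustration only, the sketch below generates a b-model trace by repeatedly splitting a traffic volume into two halves that receive fractions b and 1 - b. The function name `b_model`, its parameters, and the use of Python are assumptions made for this example; the paper's SystemC traffic generator is not described in the abstract and may differ.

```python
import random


def b_model(total, levels, b=0.7, rng=None):
    """Spread `total` units of traffic over 2**levels time slots (hypothetical helper).

    At every level each interval is halved; one half gets fraction `b` of the
    interval's volume and the other gets `1 - b`. Which half receives the
    larger share is chosen at random. b = 0.5 gives uniform traffic, and
    values closer to 1.0 give burstier traffic.
    """
    rng = rng or random.Random()
    volumes = [float(total)]
    for _ in range(levels):
        next_volumes = []
        for volume in volumes:
            first, second = volume * b, volume * (1.0 - b)
            if rng.random() < 0.5:
                first, second = second, first
            next_volumes.extend((first, second))
        volumes = next_volumes
    return volumes


if __name__ == "__main__":
    trace = b_model(total=1000, levels=4, b=0.8, rng=random.Random(0))
    print(len(trace), round(sum(trace), 6))
```

The total volume is preserved at every split, so only the distribution over time changes as b moves away from 0.5.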

Advances in parallel programming for electronic design automation

Author: Lin Chun-Xun
Publication date: 01/08/2020

The continued miniaturization of the technology node increases not only chip capacity but also circuit design complexity. How does one efficiently design a chip with millions or billions of transistors? This has become a challenging problem in the integrated circuit (IC) design industry, especially for the developers of electronic design automation (EDA) tools. One promising direction for boosting the performance of EDA tools is parallel computing. In this dissertation, we explore different parallel computing approaches, from CPU to GPU to distributed computing, for EDA applications. Multi-core processors are now prevalent in everything from mobile devices to laptops to desktops, and it is natural for software developers to use the available cores to maximize the performance of their applications. Therefore, in this dissertation we first focus on multi-threaded programming. We begin by reviewing a C++ parallel programming library called Cpp-Taskflow. Cpp-Taskflow is designed to facilitate programming parallel applications, and has been successfully applied to an EDA timing analysis tool. We demonstrate Cpp-Taskflow’s programming model and interface, software architecture, and execution flow. We then improve Cpp-Taskflow in several respects. First, we enhance Cpp-Taskflow’s usability by restructuring the software architecture. Second, we introduce task graph composition to support composability and modularity, which makes it easier for users to construct large and complex parallel patterns. Third, we add a new task type to Cpp-Taskflow that lets users control the graph execution flow. This feature gives the graph model the ability to describe complex control flow. Aside from the above enhancements, we have designed a new scheduler that adaptively manages threads based on the available parallelism. The new scheduler uses a simple and effective strategy that both prevents resources from being underutilized and mitigates resource over-subscription. We have evaluated the new scheduler on both micro-benchmarks and a very-large-scale integration (VLSI) application, and the results show that the new scheduler achieves good performance and is very energy-efficient. Next, we study the applicability of heterogeneous computing, specifically the graphics processing unit (GPU), to EDA. We demonstrate how to use a GPU to accelerate VLSI placement, and we show that a GPU can bring a substantial performance gain to VLSI placement. Finally, as design sizes keep increasing, distributed computing offers a more scalable solution. We introduce a distributed power grid analysis framework built on top of DtCraft. This framework allows users to flexibly partition the design and automatically deploy the computations across several machines. In addition, we propose a job scheduler that efficiently uses cluster resources to improve the framework’s performance.

Illinois Digital Environment for Access to Learning and Scholarship Repository

On The Engineering of a Stable Force-Directed Placer

Author: Vorwerk Kristofer
Publication venue: 'University of Waterloo'
Publication date: 01/01/2004

Analytic and force-directed placement methods that simultaneously minimize wire length and spread cells are receiving renewed attention from both academia and industry. However, these methods are by no means trivial to implement; to date, published works have failed to provide sufficient engineering details to replicate results. This dissertation addresses the implementation of a generic force-directed placer called FDP. Specifically, this thesis provides (1) a description of efficient force computation for spreading cells, (2) an illustration of numerical instability in this method and a means to avoid the instability, (3) metrics for measuring cell distribution throughout the placement area, and (4) a complementary technique that aids in minimizing wire length. FDP is compared to Kraftwerk and other leading academic tools including Capo, Dragon, and mPG for both standard cell and mixed-size circuits. Wire lengths produced by FDP are found to be, on average, up to 9% and 3% better than Kraftwerk and Capo, respectively. All told, this thesis confirms the validity and applicability of the approach, and clarifies the intricacies surrounding the implementation of a force-directed global placer.

University of Waterloo's Institutional Repository
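
The FDP abstract above compares placers by wire length. A common estimate in the placement literature is half-perimeter wire length (HPWL), which sums the width and height of each net's bounding box. The sketch below is an illustration under that assumption; the abstract does not say which wire-length measure the thesis uses, and the function name `hpwl` and the data layout are hypothetical.

```python
def hpwl(nets, positions):
    """Half-perimeter wire length of a placement (illustrative helper).

    nets:      iterable of nets, each a list of cell names
    positions: dict mapping cell name -> (x, y) coordinate
    """
    total = 0.0
    for net in nets:
        xs = [positions[cell][0] for cell in net]
        ys = [positions[cell][1] for cell in net]
        # Width plus height of the smallest box enclosing every pin on the net.
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


if __name__ == "__main__":
    positions = {"a": (0, 0), "b": (3, 4), "c": (1, 1)}
    nets = [["a", "b"], ["a", "b", "c"], ["c"]]
    print(hpwl(nets, positions))
```

A single-cell net contributes zero, and a cell placed inside an existing bounding box does not change that net's HPWL.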

Custom Cell Placement Automation for Asynchronous VLSI

Author: Yang Yihang
Publication venue: EliScholar – A Digital Platform for Scholarly Publishing at Yale
Publication date: 01/04/2022

Asynchronous very-large-scale integration (VLSI) circuits have demonstrated many advantages over their synchronous counterparts, including low power consumption, elastic pipelining, and robustness against manufacturing and temperature variations. However, the lack of dedicated electronic design automation (EDA) tools, especially physical layout automation tools, largely limits the adoption of asynchronous circuits. Existing commercial placement tools are optimized for synchronous circuits, and require a standard cell library provided by semiconductor foundries to complete the physical design. The physical layouts of cells in this library have the same height to simplify the placement problem and the power distribution network. Although the standard cell methodology also works for asynchronous designs, the resulting performance is inferior to that of counterparts designed using the full-custom design methodology. To tackle this challenge, we propose a gridded cell layout methodology for asynchronous circuits, in which the cell height and cell width can be any integer multiple of two grid values. The gridded cell approach combines the shape regularity of standard cells with the size flexibility of full-custom layouts. Therefore, this approach can achieve a better space utilization ratio and lower wire length for asynchronous designs. Experiments have shown that the gridded cell placement approach reduces area without impacting routability. We have also used this placer to tape out a chip in a 65 nm process technology, demonstrating that our placer generates design-rule-clean results.

Yale University

On the Use of Directed Moves for Placement in VLSI CAD

Author: Vorwerk Kristofer
Publication venue: 'University of Waterloo'
Publication date: 01/01/2009

Search-based placement methods have long been used for placing integrated circuits targeting the field programmable gate array (FPGA) and standard cell design styles. Such methods offer the potential for high-quality solutions but often come at the cost of long run-times compared to alternative methods. This dissertation examines strategies for enhancing local search heuristics, in particular simulated annealing, through the application of directed moves. These moves help to guide a search-based optimizer by focusing effort on states that are most likely to yield productive improvement, effectively pruning the search space. The engineering theory and implementation details of directed moves are discussed in the context of both FPGA and standard cell designs. This work explores the ways in which such moves can be used to improve the quality of FPGA placements, improve the robustness of floorplan repair and legalization methods for mixed-size standard cell designs, and enhance the quality of detailed placement for standard cell circuits. The analysis presented herein confirms the validity and efficacy of directed moves, and supports the use of such heuristics within various optimization frameworks.

University of Waterloo's Institutional Repository
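
The abstract above describes guiding simulated annealing with directed moves but does not spell out the moves themselves. As one plausible illustration, and not the thesis's actual method, the sketch below proposes moving a cell to the median position of the cells it shares nets with, and pairs that proposal with the standard Metropolis acceptance rule. The names `median_target` and `metropolis_accept` are hypothetical.

```python
import math
import random
from statistics import median


def median_target(cell, nets, positions):
    """Suggest a location for `cell`: the median x and y of its net neighbours."""
    neighbours = [c for net in nets if cell in net for c in net if c != cell]
    if not neighbours:
        return positions[cell]  # unconnected cell: no preferred direction
    xs = [positions[c][0] for c in neighbours]
    ys = [positions[c][1] for c in neighbours]
    return (median(xs), median(ys))


def metropolis_accept(delta_cost, temperature, rng):
    """Standard annealing rule: always accept improvements, and accept a
    worsening move with probability exp(-delta_cost / temperature).
    `temperature` must be positive."""
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)


if __name__ == "__main__":
    positions = {"a": (0, 0), "b": (4, 2), "c": (6, 8), "d": (10, 10)}
    nets = [["a", "b"], ["a", "c"], ["a", "d"]]
    print(median_target("a", nets, positions))
```

In an annealer, a directed proposal like this would replace some fraction of the purely random moves, with the acceptance test left unchanged.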

Nanometer VLSI placement and optimization for multi-objective design closure

Author: Luo Tao, Ph. D.
Publication date: 01/12/2007

In a VLSI physical synthesis flow, placement directly defines the interconnect, which affects many other design objectives, such as timing, power consumption, congestion, and thermal issues. With technology scaling, the relative interconnect delay increases dramatically. As a result, placement has become a bottleneck in deep sub-micron physical synthesis. In this dissertation, I propose several optimization algorithms for multi-objective VLSI design closure, ranging from global placement and placement migration to timing-driven placement and incremental power optimization. The first work is DPlace, a new global placement algorithm that scales well to modern large-scale circuit placement problems. DPlace simulates the natural diffusion process to spread cells smoothly over the placement region, and uses both analytical and discrete techniques to improve the wire length. However, global placement alone is never sufficient for multi-objective design closure: a variety of design objectives, such as timing, routing congestion, signal integrity, and heat distribution, have to be improved incrementally. Placement migration is a critical step for resolving the cell overlaps that appear during incremental optimization. To achieve high placement stability, I propose a computational-geometry-based placement migration flow to cope with placement changes, and a new stability metric to measure the “similarity” between two placements accurately. Our placement migration algorithm has a clear advantage over conventional legalization algorithms in that the neighborhood characteristics of the original placement are preserved. For timing closure in high-performance designs, I present a linear-programming-based incremental timing-driven placement that improves the timing on critical paths directly. I further present an efficient timing-driven placement algorithm (Pyramids). Two formulations of Pyramids are proposed, which are suitable for different optimization stages in a physical synthesis flow. Both approaches find the timing-optimal location of a cell in constant time, using computational-geometry-based techniques. For fast convergence of design closure, placement should be integrated with other optimization techniques. I propose combining placement, gate sizing, and Vt swapping to reduce the total power consumption, especially the leakage power, which is becoming increasingly critical for nanometer VLSI design closure.

Electrical and Computer Engineerin

Texas ScholarWorks

Simultaneous block and I/O buffer floorplanning for flip-chip design

Author: Chih-Yang Peng
Jyh-Herng Wang
Wen-Chang Chao
Yao-Wen Chang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2006

The flip-chip package gives the highest chip density of any packaging method to support pad-limited ASIC design. One of the most important characteristics of flip-chip designs is that the input/output buffers can be placed anywhere inside a chip. In this paper, we first introduce the floorplanning problem for flip-chip design and formulate it as assigning the positions of input/output buffers and first-stage/last-stage blocks so that the path length between blocks and bump balls and the delay skew of the paths are simultaneously minimized. We then present a hierarchical method to solve the problem. We first cluster each block with its corresponding buffers to reduce the problem size. Then, we iterate between two alternating, interacting steps: global optimization and partitioning. The global optimization step places blocks by simulated annealing using the B*-tree representation to minimize a given cost function. The partitioning step dissects the chip into two subregions, and the blocks are divided into two groups that are placed in the respective subregions. The two steps repeat until each subregion contains at most a given number of blocks, defined by the ratio of the total block area to the chip area. Finally, we refine the floorplan by perturbing blocks within a subregion as well as across subregions. Compared with the B*-tree based floorplanner alone, our method is more efficient and obtains significantly better results, with an average cost of only 51.8% of that obtained by using the B*-tree alone, based on a set of real industrial flip-chip designs provided by leading companies.

CiteSeerX

Crossref