8 research outputs found
A Metaheuristic Method for Fast Multi-Deck Legalization
Department of Electrical EngineeringIn the field of circuit design, decreasing the transistor size is getting harder and harder. Hence, improving the circuit performance also becoming difficult. For the better circuit performance, various technologies are being tired and multi-deck standard cell technology is one of them. The standard cell methodology is a fundamental structure of EDA (Electric Design Automation). Using the standard cell library, EDA tools can easily design, and optimize the physical design of chips.
In order to conventional standard cell, multi-deck standard cell occupies multiple rows on the chip. This multiple occupation increases complexity of the circuit physical design for EDA tools. Thus, legalization problem has become more challenging for the multi-deck standard cells. Recently, various multi-deck legalization methods are proposed because the conventional single-deck legalization method is not effective for multi-deck legalization. A state-of-the-arts legalization method is based on quadratic programming with the linear complementary problem(LCP). However, these previous researches can only cover the double-deck case because of runtime burden.
In this thesis, we propose the fast and enhanced the multi-deck standard cell legalization algorithm which can handle higher than double-deck standard cell cases. The proposed legalization method achieves the most fastest runtime result for the dominant number of benchmarks on ICCAD Contest 2017 [1] compared with Top 3 results.ope
Towards Collaborative Intelligence: Routability Estimation based on Decentralized Private Data
Applying machine learning (ML) in design flow is a popular trend in EDA with
various applications from design quality predictions to optimizations. Despite
its promise, which has been demonstrated in both academic researches and
industrial tools, its effectiveness largely hinges on the availability of a
large amount of high-quality training data. In reality, EDA developers have
very limited access to the latest design data, which is owned by design
companies and mostly confidential. Although one can commission ML model
training to a design company, the data of a single company might be still
inadequate or biased, especially for small companies. Such data availability
problem is becoming the limiting constraint on future growth of ML for chip
design. In this work, we propose an Federated-Learning based approach for
well-studied ML applications in EDA. Our approach allows an ML model to be
collaboratively trained with data from multiple clients but without explicit
access to the data for respecting their data privacy. To further strengthen the
results, we co-design a customized ML model FLNet and its personalization under
the decentralized training scenario. Experiments on a comprehensive dataset
show that collaborative training improves accuracy by 11% compared with
individual local models, and our customized model FLNet significantly
outperforms the best of previous routability estimators in this collaborative
training flow.Comment: 6 pages, 2 figures, 5 tables, accepted by DAC'2
Legalização incremental de células multi-row usando árvores espaciais
TCC(graduação) - Universidade Federal de Santa Catarina. Centro Tecnológico. Ciências da Computação.A crescente complexidade de circuitos integrados requer fluxos de desenvolvimento cada vez mais automatizados para manter um tempo reduzido do início do desenvolvimento até a saída para o mercado (time to market). Neste contexto, o fluxo standard cell é responsável pelo projeto físico dos circuitos, sendo a etapa de legalização um de seus passos fundamentais. A legalização é responsável por alinhar as células com as linhas e colunas do circuito, além de remover suas sobreposições enquanto tenta minimizar o deslocamento das mesmas. Esta etapa pode ser aplicada diversas vezes durante o fluxo de projeto, principalmente ao se usar técnicas de otimização incremental, onde a legalização pode ser aplicada após cada iteração da otimização ou após cada transformação no posicionamento. O trabalho tem como objetivo propor e analisar uma técnica de legalização incremental compatível com células multi-row, já que técnicas assim são raras na literatura
Recommended from our members
Improved Physical Design for Manufacturing Awareness and Advanced VLSI
Increasing challenges arise with each new semiconductor technology node, especially in advanced nodes, where the industry tries to extract every ounce of benefit as it approaches the limits of physics, through manufacturing-aware design technology co-optimization and design-based equivalent scaling. The increasing complexity of design and process technologies, and ever-more complex design rules, also become hurdles for academic researchers, separating academic researchers from the most up-to-date technical issues.This thesis presents innovative methodologies and optimizations to address the above challenges. There are three directions in this thesis: (i) manufacturing-aware design technology co-optimization; (ii) advanced node design-based equivalent scaling; and (iii) an open source academic detailed routing flow.To realize manufacturing-aware design technology co-optimization, this thesis presents two works: (i) a multi-row detailed placement optimization for neighbor diffusion effect mitigation between neighboring standard cells; and (ii) a post-routing optimization to generate 2D block mask layout for dummy segment removal in self-aligned multiple patterning.To achieve advanced node design-based equivalent scaling, this thesis presents two improved physical design methodologies: (i) a post-placement flop tray generation approach for clock power reduction; and (ii) a detailed placement approach to exploit inter-row M1 routing for congestion and wirelength reduction.To address the increasing gap between academia and industry, this thesis presents two works toward an open source academic detailed routing flow: (i) a complete, robust, scalable and design ruleaware dynamic programming-based pin access analysis framework; and (ii) TritonRoute – the open source detailed router that is capable of delivering DRC-clean detailed routing solutions in advanced nodes.This thesis concludes with a summary of its contributions and open directions for future research
Advances in parallel programming for electronic design automation
The continued miniaturization of the technology node increases not only the chip capacity but also the circuit design complexity. How does one efficiently design a chip with millions or billions transistors? This has become a challenging problem in the integrated circuit (IC) design industry, especially for the developers of electronic design automation (EDA) tools. To boost the performance of EDA tools, one promising direction is via parallel computing. In this dissertation, we explore different parallel computing approaches, from CPU to GPU to distributed computing, for EDA applications.
Nowadays multi-core processors are prevalent from mobile devices to laptops to desktop, and it is natural for software developers to utilize the available cores to maximize the performance of their applications. Therefore, in this dissertation we first focus on multi-threaded programming. We begin by reviewing a C++ parallel programming library called Cpp-Taskflow. Cpp-Taskflow is designed to facilitate programming parallel applications, and has been successfully applied to an EDA timing analysis tool. We will demonstrate Cpp-Taskflow’s programming model and interface, software architecture and execution flow. Then, we improve Cpp-Taskflow in several aspects. First, we enhance Cpp-Taskflow’s usability through restructuring the software architecture. Second, we introduce task graph composition to support composability and modularity, which makes it easier for users to construct large and complex parallel patterns. Third, we add a new task type in Cpp-Taskflow to let users control the graph execution flow. This feature empowers the graph model with the ability to describe complex control flow. Aside from the above enhancements, we have designed a new scheduler to adaptively manage the threads based on available parallelism. The new scheduler uses a simple and effective strategy which can not only prevent resource from being underutilized, but also mitigate resource over-subscription. We have evaluated the new scheduler on both micro-benchmarks and a very-large-scale integration (VLSI) application, and the results show that the new scheduler can achieve good performance and is very energy-efficient.
Next we study the applicability of heterogeneous computing, specifically the graphics processing unit (GPU), to EDA. We demonstrate how to use GPU to accelerate VLSI placement, and we show that GPU can bring substantial performance gain to VLSI placement. Finally, as the design size keeps increasing, a more scalable solution will be distributed computing. We introduce a distributed power grid analysis framework built on top of DtCraft. This framework allows users to flexibly partition the design and automatically deploy the computations across several machines. In addition, we propose a job scheduler that can efficiently utilize cluster resource to improve the framework’s performance
Distributed timing analysis
As design complexities continue to grow larger, the need to efficiently analyze circuit timing with billions of transistors across multiple modes and corners is quickly becoming the major bottleneck to the overall chip design closure process. To alleviate the long runtimes, recent trends are driving the need of distributed timing analysis (DTA) in electronic design automation (EDA) tools. However, DTA has received little research attention so far and remains a critical problem. In this thesis, we introduce several methods to approach DTA problems. We present a near-optimal algorithm to speed up the path-based timing analysis in Chapter 1. Path-based timing analysis is a key step in the overall timing flow to reduce unwanted pessimism, for example, common path pessimism removal (CPPR). In Chapter 2, we introduce a MapReduce-based distributed Path-based timing analysis framework that can scale up to hundreds of machines. In Chapter 3, we introduce our standalone timer, OpenTimer, an open-source high-performance timing analysis tool for very large scale integration (VLSI) systems. OpenTimer efficiently supports (1) both block-based and path-based timing propagations, (2) CPPR, and (3) incremental timing. OpenTimer works on industry formats (e.g., .v, .spef, .lib, .sdc) and is designed to be parallel and portable. To further facilitate integration between timing and timing-driven optimizations, OpenTimer provides user-friendly application programming interface (API) for inactive analysis. Experimental results on industry benchmarks re- leased from TAU 2015 timing analysis contest have demonstrated remarkable results achieved by OpenTimer, especially in its order-of-magnitude speedup over existing timers.
In Chapter 4 we present a DTA framework built on top of our standalone timer OpenTimer. We investigated into existing cluster computing frameworks from big data community and demonstrated DTA is a difficult fit here in terms of computation patterns and performance concern. Our specialized DTA framework supports (1) general design partitions (logical, physical, hierarchical, etc.) stored in a distributed file system, (2) non-blocking IO with event-driven programming for effective communication and computation overlap, and (3) an efficient messaging interface between application and network layers. The effectiveness and scalability of our framework has been evaluated on large hierarchical industry designs over a cluster with hundreds of machines.
In Chapter 5, we present our system DtCraft, a distributed execution engine for compute-intensive applications. Motivated by our DTA framework, DtCraft introduces a high-level programming model that lets users without detailed experience of distributed computing utilize the cluster resources. The major goal is to simplify the coding efforts on building distributed applications based on our system. In contrast to existing data-parallel cluster computing frameworks, DtCraft targets on high-performance or compute- intensive applications including simulations, modeling, and most EDA applications. Users describe a program in terms of a sequential stream graph associated with computation units and data streams. The DtCraft runtime transparently deals with the concurrency controls including work distribution, process communication, and fault tolerance. We have evaluated DtCraft on both micro-benchmarks and large-scale simulation and optimization problems, and showed the promising performance from single multi-core machines to clusters of computers