13 research outputs found

    High performance algorithms for large scale placement problem

    Get PDF
    Placement is one of the most important problems in electronic design automation (EDA). An inferior placement solution will not only affect the chip’s performance but might also make it nonmanufacturable by producing excessive wirelength, which is beyond available routing resources. Although placement has been extensively investigated for several decades, it is still a very challenging problem mainly due to that design scale has been dramatically increased by order of magnitudes and the increasing trend seems unstoppable. In modern design, chips commonly integrate millions of gates that require over tens of metal routing layers. Besides, new manufacturing techniques bring out new requests leading to that multi-objectives should be optimized simultaneously during placement. Our research provides high performance algorithms for placement problem. We propose (i) a high performance global placement core engine POLAR; (ii) an efficient routability-driven placer POLAR 2.0, which is an extension of POLAR to deal with routing congestion; (iii) an ultrafast global placer POLAR 3.0, which explore parallelism on POLAR and can make full use of multi-core system; (iv) some efficient triple patterning lithography (TPL) aware detailed placement algorithms

    High-performance Global Routing for Trillion-gate Systems-on-Chips.

    Full text link
    Due to aggressive transistor scaling, modern-day CMOS circuits have continually increased in both complexity and productivity. Modern semiconductor designs have narrower and more resistive wires, thereby shifting the performance bottleneck to interconnect delay. These trends considerably impact timing closure and call for improvements in high-performance physical design tools to keep pace with the current state of IC innovation. As leading-edge designs may incorporate tens of millions of gates, algorithm and software scalability are crucial to achieving reasonable turnaround time. Moreover, with decreasing device sizes, optimizing traditional objectives is no longer sufficient. Our research focuses on (i) expanding the capabilities of standalone global routing, (ii) extending global routing for use in different design applications, and (iii) integrating routing within broader physical design optimizations and flows, e.g., congestion-driven placement. Our first global router relies on integer-linear programming (ILP), and can solve fairly large problem instances to optimality. Our second iterative global router relies on Lagrangian relaxation, where we relax the routing violation constraints to allowing routing overflow at a penalty. In both approaches, our desire is to give the router the maximum degree of freedom within a specified context. Empirically, both routers produce competitive results within a reasonable amount of runtime. To improve routability, we explore the incorporation of routing with placement, where the router estimates congestion and feeds this information to the placer. In turn, the emphasis on runtime is heightened, as the router will be invoked multiple times. Empirically, our placement-and-route framework significantly improves the final solution’s routability than performing the steps sequentially. To further enhance routability-driven placement, we (i) leverage incrementality to generate fast and accurate congestion maps, and (ii) develop several techniques to relieve cell-based and layout-based congestion. To broaden the scope of routing, we integrate a global router in a chip-design flow that addresses the buffer explosion problem.PHDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/98025/1/jinhu_1.pd

    Algorithmic techniques for physical design : macro placement and under-the-cell routing

    Get PDF
    With the increase of chip component density and new manufacturability constraints imposed by modern technology nodes, the role of algorithms for electronic design automation is key to the successful implementation of integrated circuits. Two of the critical steps in the physical design flows are macro placement and ensuring all design rules are honored after timing closure. This thesis proposes contributions to help in these stages, easing time-consuming manual steps and helping physical design engineers to obtain better layouts in reduced turnaround time. The first contribution is under-the-cell routing, a proposal to systematically connect standard cell components via lateral pins in the lower metal layers. The aim is to reduce congestion in the upper metal layers caused by extra metal and vias, decreasing the number of design rule violations. To allow cells to connect by abutment, a standard cell library is enriched with instances containing lateral pins in a pre-selected sharing track. Algorithms are proposed to maximize the numbers of connections via lateral connection by mapping placed cell instances to layouts with lateral pins, and proposing local placement modifications to increase the opportunities for such connections. Experimental results show a significant decrease in the number of pins, vias, and in number of design rule violations, with negligible impact on wirelength and timing. The second contribution, done in collaboration with eSilicon (a leading ASIC design company), is the creation of HiDaP, a macro placement tool for modern industrial designs. The proposed approach follows a multilevel scheme to floorplan hierarchical blocks, composed of macros and standard cells. By exploiting RTL information available in the netlist, the dataflow affinity between these blocks is modeled and minimized to find a macro placement with good wirelength and timing properties. The approach is further extended to allow additional engineer input, such as preferred macro locations, and also spectral and force methods to guide the floorplanning search. Experimental results show that the layouts generated by HiDaP outperforms those obtained by a state-of-the-art EDA physical design software, with similar wirelength and better timing when compared to manually designed tape-out ready macro placements. Layouts obtained by HiDaP have successfully been brought to near timing closure with one to two rounds of small modifications by physical design engineers. HiDaP has been fully integrated in the design flows of the company and its development remains an ongoing effort.A causa de l'increment de la densitat de components en els xip i les noves restriccions de disseny imposades pels últims nodes de fabricació, el rol de l'algorísmia en l'automatització del disseny electrònic ha esdevingut clau per poder implementar circuits integrats. Dos dels passos crucials en el procés de disseny físic és el placement de macros i assegurar la correcció de les regles de disseny un cop les restriccions de timing del circuit són satisfetes. Aquesta tesi proposa contribucions per ajudar en aquests dos reptes, facilitant laboriosos passos manuals en el procés i ajudant als enginyers de disseny físic a obtenir millors resultats en menys temps. La primera contribució és el routing "under-the-cell", una proposta per connectar cel·les estàndard usant pins laterals en les capes de metall inferior de manera sistemàtica. L'objectiu és reduir la congestió en les capes de metall superior causades per l'ús de metall i vies, i així disminuir el nombre de violacions de regles de disseny. Per permetre la connexió lateral de cel·les, estenem una llibreria de cel·les estàndard amb dissenys que incorporen connexions laterals. També proposem modificacions locals al placement per permetre explotar aquest tipus de connexions més sovint. Els resultats experimentals mostren una reducció significativa en el nombre de pins, vies i nombre de violacions de regles de disseny, amb un impacte negligible en wirelength i timing. La segona contribució, desenvolupada en col·laboració amb eSilicon (una empresa capdavantera en disseny ASIC), és el desenvolupament de HiDaP, una eina de macro placement per a dissenys industrials actuals. La proposta segueix un procés multinivell per fer el floorplan de blocks jeràrquics, formats per macros i cel·les estàndard. Mitjançant la informació RTL disponible en la netlist, l'afinitat de dataflow entre els mòduls es modela i minimitza per trobar macro placements amb bones propietats de wirelength i timing. La proposta també incorpora la possibilitat de rebre input addicional de l'enginyer, com ara suggeriments de les posicions de les macros. Finalment, també usa mètodes espectrals i de forçes per guiar la cerca de floorplans. Els resultats experimentals mostren que els dissenys generats amb HiDaP són millors que els obtinguts per eines comercials capdavanteres de EDA. Els resultats també mostren que els dissenys presentats poden obtenir un wirelength similar i millor timing que macro placements obtinguts manualment, usats per fabricació. Alguns dissenys obtinguts per HiDaP s'han dut fins a timing-closure en una o dues rondes de modificacions incrementals per part d'enginyers de disseny físic. L'eina s'ha integrat en el procés de disseny de eSilicon i el seu desenvolupament continua més enllà de les aportacions a aquesta tesi

    Algorithmic techniques for physical design : macro placement and under-the-cell routing

    Get PDF
    With the increase of chip component density and new manufacturability constraints imposed by modern technology nodes, the role of algorithms for electronic design automation is key to the successful implementation of integrated circuits. Two of the critical steps in the physical design flows are macro placement and ensuring all design rules are honored after timing closure. This thesis proposes contributions to help in these stages, easing time-consuming manual steps and helping physical design engineers to obtain better layouts in reduced turnaround time. The first contribution is under-the-cell routing, a proposal to systematically connect standard cell components via lateral pins in the lower metal layers. The aim is to reduce congestion in the upper metal layers caused by extra metal and vias, decreasing the number of design rule violations. To allow cells to connect by abutment, a standard cell library is enriched with instances containing lateral pins in a pre-selected sharing track. Algorithms are proposed to maximize the numbers of connections via lateral connection by mapping placed cell instances to layouts with lateral pins, and proposing local placement modifications to increase the opportunities for such connections. Experimental results show a significant decrease in the number of pins, vias, and in number of design rule violations, with negligible impact on wirelength and timing. The second contribution, done in collaboration with eSilicon (a leading ASIC design company), is the creation of HiDaP, a macro placement tool for modern industrial designs. The proposed approach follows a multilevel scheme to floorplan hierarchical blocks, composed of macros and standard cells. By exploiting RTL information available in the netlist, the dataflow affinity between these blocks is modeled and minimized to find a macro placement with good wirelength and timing properties. The approach is further extended to allow additional engineer input, such as preferred macro locations, and also spectral and force methods to guide the floorplanning search. Experimental results show that the layouts generated by HiDaP outperforms those obtained by a state-of-the-art EDA physical design software, with similar wirelength and better timing when compared to manually designed tape-out ready macro placements. Layouts obtained by HiDaP have successfully been brought to near timing closure with one to two rounds of small modifications by physical design engineers. HiDaP has been fully integrated in the design flows of the company and its development remains an ongoing effort.A causa de l'increment de la densitat de components en els xip i les noves restriccions de disseny imposades pels últims nodes de fabricació, el rol de l'algorísmia en l'automatització del disseny electrònic ha esdevingut clau per poder implementar circuits integrats. Dos dels passos crucials en el procés de disseny físic és el placement de macros i assegurar la correcció de les regles de disseny un cop les restriccions de timing del circuit són satisfetes. Aquesta tesi proposa contribucions per ajudar en aquests dos reptes, facilitant laboriosos passos manuals en el procés i ajudant als enginyers de disseny físic a obtenir millors resultats en menys temps. La primera contribució és el routing "under-the-cell", una proposta per connectar cel·les estàndard usant pins laterals en les capes de metall inferior de manera sistemàtica. L'objectiu és reduir la congestió en les capes de metall superior causades per l'ús de metall i vies, i així disminuir el nombre de violacions de regles de disseny. Per permetre la connexió lateral de cel·les, estenem una llibreria de cel·les estàndard amb dissenys que incorporen connexions laterals. També proposem modificacions locals al placement per permetre explotar aquest tipus de connexions més sovint. Els resultats experimentals mostren una reducció significativa en el nombre de pins, vies i nombre de violacions de regles de disseny, amb un impacte negligible en wirelength i timing. La segona contribució, desenvolupada en col·laboració amb eSilicon (una empresa capdavantera en disseny ASIC), és el desenvolupament de HiDaP, una eina de macro placement per a dissenys industrials actuals. La proposta segueix un procés multinivell per fer el floorplan de blocks jeràrquics, formats per macros i cel·les estàndard. Mitjançant la informació RTL disponible en la netlist, l'afinitat de dataflow entre els mòduls es modela i minimitza per trobar macro placements amb bones propietats de wirelength i timing. La proposta també incorpora la possibilitat de rebre input addicional de l'enginyer, com ara suggeriments de les posicions de les macros. Finalment, també usa mètodes espectrals i de forçes per guiar la cerca de floorplans. Els resultats experimentals mostren que els dissenys generats amb HiDaP són millors que els obtinguts per eines comercials capdavanteres de EDA. Els resultats també mostren que els dissenys presentats poden obtenir un wirelength similar i millor timing que macro placements obtinguts manualment, usats per fabricació. Alguns dissenys obtinguts per HiDaP s'han dut fins a timing-closure en una o dues rondes de modificacions incrementals per part d'enginyers de disseny físic. L'eina s'ha integrat en el procés de disseny de eSilicon i el seu desenvolupament continua més enllà de les aportacions a aquesta tesi.Postprint (published version

    Graphics Processing Unit-Based Computer-Aided Design Algorithms for Electronic Design Automation

    Get PDF
    The electronic design automation (EDA) tools are a specific set of software that play important roles in modern integrated circuit (IC) design. These software automate the design processes of IC with various stages. Among these stages, two important EDA design tools are the focus of this research: floorplanning and global routing. Specifically, the goal of this study is to parallelize these two tools such that their execution time can be significantly shortened on modern multi-core and graphics processing unit (GPU) architectures. The GPU hardware is a massively parallel architecture, enabling thousands of independent threads to execute concurrently. Although a small set of EDA tools can benefit from using GPU to accelerate their speed, most algorithms in this field are designed with the single-core paradigm in mind. The floorplanning and global routing algorithms are among the latter, and difficult to render any speedup on the GPU due to their inherent sequential nature. This work parallelizes the floorplanning and global routing algorithm through a novel approach and results in significant speedups for both tools implemented on the GPU hardware. Specifically, with a complete overhaul of solution space and design space exploration, a GPU-based floorplanning algorithm is able to render 4-166X speedup, while achieving similar or improved solutions compared with the sequential algorithm. The GPU-based global routing algorithm is shown to achieve significant speedup against existing state-of-the-art routers, while delivering competitive solution quality. Importantly, this parallel model for global routing renders a stable solution that is independent from the level of parallelism. In summary, this research has shown that through a design paradigm overhaul, sequential algorithms can also benefit from the massively parallel architecture. The findings of this study have a positive impact on the efficiency and design quality of modern EDA design flow

    Towards Machine Learning-Based FPGA Backend Flow: Challenges and Opportunities

    Get PDF
    Field-Programmable Gate Array (FPGA) is at the core of System on Chip (SoC) design across various Industry 5.0 digital systems—healthcare devices, farming equipment, autonomous vehicles and aerospace gear to name a few. Given that pre-silicon verification using Computer Aided Design (CAD) accounts for about 70% of the time and money spent on the design of modern digital systems, this paper summarizes the machine learning (ML)-oriented efforts in different FPGA CAD design steps. With the recent breakthrough of machine learning, FPGA CAD tasks—high-level synthesis (HLS), logic synthesis, placement and routing—are seeing a renewed interest in their respective decision-making steps. We focus on machine learning-based CAD tasks to suggest some pertinent research areas requiring more focus in CAD design. The development of open-source benchmarks optimized for an end-to-end machine learning experience, intra-FPGA optimization, domain-specific accelerators, lack of explainability and federated learning are the issues reviewed to identify important research spots requiring significant focus. The potential of the new cloud-based architectures to understand the application of the right ML algorithms in FPGA CAD decision-making steps is discussed, together with visualizing the scenario of incorporating more intelligence in the cloud platform, with the help of relatively newer technologies such as CAD as Adaptive OpenPlatform Service (CAOS). Altogether, this research explores several research opportunities linked with modern FPGA CAD flow design, which will serve as a single point of reference for modern FPGA CAD flow design

    Improving Coarsening Schemes for Hypergraph Partitioning by Exploiting Community Structure

    Get PDF
    We present an improved coarsening process for multilevel hypergraph partitioning that incorporates global information about the community structure. Community detection is performed via modularity maximization on a bipartite graph representation. The approach is made suitable for different classes of hypergraphs by defining weights for the graph edges that express structural properties of the hypergraph. We integrate our approach into a leading multilevel hypergraph partitioner with strong local search algorithms and perform extensive experiments on a large benchmark set of hypergraphs stemming from application areas such as VLSI design, SAT solving, and scientific computing. Our results indicate that respecting community structure during coarsening not only significantly improves the solutions found by the initial partitioning algorithm, but also consistently improves overall solution quality

    Design methodology and productivity improvement in high speed VLSI circuits

    Get PDF
    2017 Spring.Includes bibliographical references.To view the abstract, please see the full text of the document

    Rapid SoC Design: On Architectures, Methodologies and Frameworks

    Full text link
    Modern applications like machine learning, autonomous vehicles, and 5G networking require an order of magnitude boost in processing capability. For several decades, chip designers have relied on Moore’s Law - the doubling of transistor count every two years to deliver improved performance, higher energy efficiency, and an increase in transistor density. With the end of Dennard’s scaling and a slowdown in Moore’s Law, system architects have developed several techniques to deliver on the traditional performance and power improvements we have come to expect. More recently, chip designers have turned towards heterogeneous systems comprised of more specialized processing units to buttress the traditional processing units. These specialized units improve the overall performance, power, and area (PPA) metrics across a wide variety of workloads and applications. While the GPU serves as a classical example, accelerators for machine learning, approximate computing, graph processing, and database applications have become commonplace. This has led to an exponential growth in the variety (and count) of these compute units found in modern embedded and high-performance computing platforms. The various techniques adopted to combat the slowing of Moore’s Law directly translates to an increase in complexity for modern system-on-chips (SoCs). This increase in complexity in turn leads to an increase in design effort and validation time for hardware and the accompanying software stacks. This is further aggravated by fabrication challenges (photo-lithography, tooling, and yield) faced at advanced technology nodes (below 28nm). The inherent complexity in modern SoCs translates into increased costs and time-to-market delays. This holds true across the spectrum, from mobile/handheld processors to high-performance data-center appliances. This dissertation presents several techniques to address the challenges of rapidly birthing complex SoCs. The first part of this dissertation focuses on foundations and architectures that aid in rapid SoC design. It presents a variety of architectural techniques that were developed and leveraged to rapidly construct complex SoCs at advanced process nodes. The next part of the dissertation focuses on the gap between a completed design model (in RTL form) and its physical manifestation (a GDS file that will be sent to the foundry for fabrication). It presents methodologies and a workflow for rapidly walking a design through to completion at arbitrary technology nodes. It also presents progress on creating tools and a flow that is entirely dependent on open-source tools. The last part presents a framework that not only speeds up the integration of a hardware accelerator into an SoC ecosystem, but emphasizes software adoption and usability.PHDElectrical and Computer EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/168119/1/ajayi_1.pd

    High-Quality Hypergraph Partitioning

    Get PDF
    This dissertation focuses on computing high-quality solutions for the NP-hard balanced hypergraph partitioning problem: Given a hypergraph and an integer kk, partition its vertex set into kk disjoint blocks of bounded size, while minimizing an objective function over the hyperedges. Here, we consider the two most commonly used objectives: the cut-net metric and the connectivity metric. Since the problem is computationally intractable, heuristics are used in practice - the most prominent being the three-phase multi-level paradigm: During coarsening, the hypergraph is successively contracted to obtain a hierarchy of smaller instances. After applying an initial partitioning algorithm to the smallest hypergraph, contraction is undone and, at each level, refinement algorithms try to improve the current solution. With this work, we give a brief overview of the field and present several algorithmic improvements to the multi-level paradigm. Instead of using a logarithmic number of levels like traditional algorithms, we present two coarsening algorithms that create a hierarchy of (nearly) nn levels, where nn is the number of vertices. This makes consecutive levels as similar as possible and provides many opportunities for refinement algorithms to improve the partition. This approach is made feasible in practice by tailoring all algorithms and data structures to the nn-level paradigm, and developing lazy-evaluation techniques, caching mechanisms and early stopping criteria to speed up the partitioning process. Furthermore, we propose a sparsification algorithm based on locality-sensitive hashing that improves the running time for hypergraphs with large hyperedges, and show that incorporating global information about the community structure into the coarsening process improves quality. Moreover, we present a portfolio-based initial partitioning approach, and propose three refinement algorithms. Two are based on the Fiduccia-Mattheyses (FM) heuristic, but perform a highly localized search at each level. While one is designed for two-way partitioning, the other is the first FM-style algorithm that can be efficiently employed in the multi-level setting to directly improve kk-way partitions. The third algorithm uses max-flow computations on pairs of blocks to refine kk-way partitions. Finally, we present the first memetic multi-level hypergraph partitioning algorithm for an extensive exploration of the global solution space. All contributions are made available through our open-source framework KaHyPar. In a comprehensive experimental study, we compare KaHyPar with hMETIS, PaToH, Mondriaan, Zoltan-AlgD, and HYPE on a wide range of hypergraphs from several application areas. Our results indicate that KaHyPar, already without the memetic component, computes better solutions than all competing algorithms for both the cut-net and the connectivity metric, while being faster than Zoltan-AlgD and equally fast as hMETIS. Moreover, KaHyPar compares favorably with the current best graph partitioning system KaFFPa - both in terms of solution quality and running time
    corecore