
    Balanced Coarsening for Multilevel Hypergraph Partitioning via Wasserstein Discrepancy

    We propose a balanced coarsening scheme for multilevel hypergraph partitioning, together with an initial partitioning algorithm that improves the quality of k-way hypergraph partitioning. By assigning vertex weights through the LPT algorithm, we generate a prior hypergraph under a relaxed balance constraint. With this prior hypergraph, we define a Wasserstein discrepancy to coordinate the optimal transport of the coarsening process; the optimal transport matrix is solved with the Sinkhorn algorithm. Our coarsening scheme fully takes the minimization of the connectivity metric (the objective function) into account. For the initial partitioning stage, we define a normalized cut function induced by the Fiedler vector, which we prove to be concave, and design a three-point algorithm to find the best cut under the balance constraint.
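    The abstract names a standard computational ingredient: the Sinkhorn algorithm for solving the entropy-regularized optimal transport problem. As a point of reference, here is a minimal sketch of the generic Sinkhorn iteration in Python/NumPy; the function name, the regularization strength, and the fixed iteration count are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def sinkhorn(cost, row_marginals, col_marginals, reg=0.1, n_iters=200):
    """Entropy-regularized optimal transport via Sinkhorn iteration.

    cost: (m, n) cost matrix between source and target items
    row_marginals: (m,) source mass (e.g., normalized vertex weights)
    col_marginals: (n,) target mass
    Returns a transport matrix whose row and column sums approximately
    match the given marginals.
    """
    K = np.exp(-cost / reg)             # Gibbs kernel of the cost matrix
    u = np.ones(len(row_marginals))
    v = np.ones(len(col_marginals))
    for _ in range(n_iters):            # alternately rescale rows and columns
        u = row_marginals / (K @ v)
        v = col_marginals / (K.T @ u)
    return u[:, None] * K * v[None, :]  # diag(u) @ K @ diag(v)

# Toy usage: transport mass between 3 sources and 2 targets.
cost = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
T = sinkhorn(cost, np.array([0.3, 0.3, 0.4]), np.array([0.5, 0.5]))
print(T.sum(axis=1), T.sum(axis=0))     # ~ [0.3 0.3 0.4], ~ [0.5 0.5]
```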

    Impact of gate-level clustering on automated system partitioning of 3D-ICs

    When partitioning gate-level netlists using graphs, it is beneficial to cluster gates to reduce the order of the graph and preserve characteristics of the circuit that the partitioning might otherwise degrade. Gate clustering is even more important for netlist partitioning targeting 3D system integration. In this paper, we argue that the choice of clustering method for 3D-IC partitioning is not trivial and deserves careful consideration. To support this claim, we implemented three clustering methods and applied each prior to partitioning two synthetic designs representing the two extremes of the medium/long interconnect diversity spectrum. The automatically partitioned netlists are then placed and routed in 3D to compare the impact of the clustering methods on several metrics. Our experiments show that the impact of the clustering method indeed depends on the design considered and that a circuit-blind, universal partitioning method is inadequate: choosing the right method yields wire-length savings of up to 31%, total-power savings of up to 22%, and effective-frequency gains of up to 15% compared to the other methods. Furthermore, we highlight that 3D-ICs open new opportunities to design systems with denser interconnect, drastically reducing the design utilization of circuits that would not be considered viable in 2D.

    Multilevel Hypergraph Partitioning with Vertex Weights Revisited


    High-Quality Hypergraph Partitioning

    This dissertation focuses on computing high-quality solutions for the NP-hard balanced hypergraph partitioning problem: Given a hypergraph and an integer k, partition its vertex set into k disjoint blocks of bounded size, while minimizing an objective function over the hyperedges. Here, we consider the two most commonly used objectives: the cut-net metric and the connectivity metric. Since the problem is computationally intractable, heuristics are used in practice - the most prominent being the three-phase multi-level paradigm: During coarsening, the hypergraph is successively contracted to obtain a hierarchy of smaller instances. After applying an initial partitioning algorithm to the smallest hypergraph, contraction is undone and, at each level, refinement algorithms try to improve the current solution. With this work, we give a brief overview of the field and present several algorithmic improvements to the multi-level paradigm. Instead of using a logarithmic number of levels like traditional algorithms, we present two coarsening algorithms that create a hierarchy of (nearly) n levels, where n is the number of vertices. This makes consecutive levels as similar as possible and provides many opportunities for refinement algorithms to improve the partition. This approach is made feasible in practice by tailoring all algorithms and data structures to the n-level paradigm, and developing lazy-evaluation techniques, caching mechanisms and early stopping criteria to speed up the partitioning process. Furthermore, we propose a sparsification algorithm based on locality-sensitive hashing that improves the running time for hypergraphs with large hyperedges, and show that incorporating global information about the community structure into the coarsening process improves quality. Moreover, we present a portfolio-based initial partitioning approach, and propose three refinement algorithms. Two are based on the Fiduccia-Mattheyses (FM) heuristic, but perform a highly localized search at each level. While one is designed for two-way partitioning, the other is the first FM-style algorithm that can be efficiently employed in the multi-level setting to directly improve k-way partitions. The third algorithm uses max-flow computations on pairs of blocks to refine k-way partitions. Finally, we present the first memetic multi-level hypergraph partitioning algorithm for an extensive exploration of the global solution space. All contributions are made available through our open-source framework KaHyPar. In a comprehensive experimental study, we compare KaHyPar with hMETIS, PaToH, Mondriaan, Zoltan-AlgD, and HYPE on a wide range of hypergraphs from several application areas. Our results indicate that KaHyPar, already without the memetic component, computes better solutions than all competing algorithms for both the cut-net and the connectivity metric, while being faster than Zoltan-AlgD and equally fast as hMETIS. Moreover, KaHyPar compares favorably with the current best graph partitioning system KaFFPa - both in terms of solution quality and running time.
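    For concreteness, the two objectives named above can be evaluated as follows for a fixed k-way partition. This is a minimal illustrative sketch (not KaHyPar code), assuming a hypergraph given as a list of nets (each a list of vertex ids) and an array mapping each vertex to its block; hyperedge weights are omitted for brevity.

```python
def cut_net(nets, block_of):
    # Cut-net metric: number of nets spanning more than one block.
    return sum(1 for net in nets if len({block_of[v] for v in net}) > 1)

def connectivity(nets, block_of):
    # Connectivity metric: sum over nets of (lambda - 1), where lambda
    # is the number of distinct blocks the net touches.
    return sum(len({block_of[v] for v in net}) - 1 for net in nets)

# Toy instance: 6 vertices, 2 blocks, 3 nets.
nets = [[0, 1, 2], [2, 3], [3, 4, 5]]
block_of = [0, 0, 0, 1, 1, 1]
print(cut_net(nets, block_of))       # 1 (only net [2, 3] is cut)
print(connectivity(nets, block_of))  # 1 (the cut net touches 2 blocks)
```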

    Hardware Acceleration of Electronic Design Automation Algorithms

    With the advances in very large scale integration (VLSI) technology, hardware is going parallel. Software, which was traditionally designed to execute on single-core microprocessors, now faces the tough challenge of taking advantage of this parallelism, made available by the scaling of hardware. The work presented in this dissertation studies the acceleration of electronic design automation (EDA) software on several hardware platforms such as custom integrated circuits (ICs), field programmable gate arrays (FPGAs) and graphics processors. This dissertation concentrates on a subset of EDA algorithms which are heavily used in the VLSI design flow and have varying degrees of inherent parallelism. In particular, Boolean satisfiability, Monte Carlo based statistical static timing analysis, circuit simulation, fault simulation and fault table generation are explored. The architectural and performance tradeoffs of implementing the above applications on these alternative platforms (in comparison to their implementation on a single-core microprocessor) are studied. In addition, this dissertation presents an automated approach to accelerate uniprocessor code using a graphics processing unit (GPU). The key idea is to partition the software application into kernels in an automated fashion, such that multiple instances of these kernels, when executed in parallel on the GPU, can maximally benefit from the GPU's hardware resources. The work presented in this dissertation demonstrates that several EDA algorithms can be successfully rearchitected to maximally harness their performance on alternative platforms such as custom designed ICs, FPGAs and graphics processors, achieving speedups of up to 800X. The approaches in this dissertation collectively aim to enable the computer aided design (CAD) community to accelerate EDA algorithms on arbitrary hardware platforms.
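    To illustrate the Monte Carlo SSTA workload mentioned above, the sketch below shows what a single sample computes: gate delays are drawn from independent Gaussians and arrival times are propagated in topological order, with the circuit delay being the maximum arrival time. Because samples are independent, the outer loop is exactly the kind of embarrassingly parallel work a GPU implementation distributes. The Gaussian delay model and all names here are illustrative assumptions, not the dissertation's code.

```python
import numpy as np

def mc_ssta(gates, fanin, mean_delay, sigma, n_samples=1000, seed=0):
    """Monte Carlo statistical static timing analysis (sketch).

    gates: gate ids in topological order (drivers before sinks)
    fanin: dict mapping each gate to its predecessor gates
    mean_delay, sigma: dicts of per-gate delay mean / std-dev
    Returns an array of sampled circuit delays.
    """
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    for s in range(n_samples):
        arrival = {}
        for g in gates:
            d = rng.normal(mean_delay[g], sigma[g])  # sample this gate's delay
            arrival[g] = d + max((arrival[p] for p in fanin[g]), default=0.0)
        samples[s] = max(arrival.values())           # critical-path delay
    return samples

# Toy chain a -> b -> c with 1.0 ns nominal gate delay, 0.1 ns sigma.
gates = ["a", "b", "c"]
fanin = {"a": [], "b": ["a"], "c": ["b"]}
delays = mc_ssta(gates, fanin, {g: 1.0 for g in gates}, {g: 0.1 for g in gates})
print(delays.mean(), np.percentile(delays, 99))  # mean and 99th-percentile delay
```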

    Broadening the Scope of Multi-Objective Optimizations in Physical Synthesis of Integrated Circuits.

    In modern VLSI design, physical synthesis tools are primarily responsible for satisfying chip-performance constraints by invoking a broad range of circuit optimizations, such as buffer insertion, logic restructuring, gate sizing and relocation. This process is known as timing closure. Our research seeks more powerful and efficient optimizations to improve the state of the art in modern chip design. In particular, we integrate timing-driven relocation, retiming, logic cloning, buffer insertion and gate sizing in novel ways to create powerful circuit transformations that help satisfy setup-time constraints. State-of-the-art physical synthesis optimizations are typically applied at two scales: i) global algorithms that affect the entire netlist and ii) local transformations that focus on a handful of gates or interconnections. The scale of modern chip designs dictates that only near-linear-time optimization algorithms can be applied at the global scope, which is typically limited to wirelength-driven placement and legalization. Localized transformations can rely on more time-consuming optimizations with accurate delay models. Few techniques bridge the gap between fully-global and localized optimizations. This dissertation broadens the scope of physical synthesis optimization to include accurate transformations operating between the global and local scales. In particular, we integrate groups of related transformations to break circular dependencies and increase the number of circuit elements that can be jointly optimized to escape local minima. Integrated transformations in this dissertation are developed by identifying and removing obstacles to successful optimizations. Integration is achieved by mapping multiple operations to rigorous mathematical optimization problems that can be solved simultaneously. We achieve computational scalability in our techniques by leveraging analytical delay models and focusing optimization efforts on carefully selected regions of the chip. In this regard, we make extensive use of a linear interconnect-delay model that accounts for the impact of subsequent repeater insertion. Our integrated transformations are evaluated on high-performance circuits with over 100,000 gates. The integrated optimization techniques described in this dissertation ensure a graceful timing-closure process and impact nearly every aspect of a typical physical synthesis flow. They have been validated in EDA tools used at IBM for physical synthesis of high-performance CPU and ASIC designs, where they significantly improved chip performance.
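    The linear interconnect-delay model mentioned above reflects a standard VLSI result: the delay of an unbuffered distributed RC wire (Elmore model) grows quadratically with length, while a wire that will later receive optimally spaced repeaters has delay roughly linear in length, which is what makes a linear estimate sound when repeater insertion follows. The sketch below contrasts the two; the per-unit resistance, capacitance and delay constants are illustrative placeholders, not values from the dissertation.

```python
def elmore_wire_delay(length_um, r_per_um=0.1, c_per_um=0.2):
    # Unbuffered distributed RC wire: delay ~ 0.5 * R_total * C_total,
    # i.e., quadratic in wire length.
    return 0.5 * (r_per_um * length_um) * (c_per_um * length_um)

def buffered_wire_delay(length_um, delay_per_um=0.05):
    # Linearized model assuming optimal repeater insertion later in the
    # flow; the per-unit constant would be calibrated to the technology.
    return delay_per_um * length_um

for length in (100, 1000, 5000):  # lengths in microns
    print(length, elmore_wire_delay(length), buffered_wire_delay(length))
```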

    Machine Learning based Hypergraph Pruning for Partitioning


    Label Propagation for Hypergraph Partitioning
