14 research outputs found

    Further improve circuit partitioning using GBAW logic perturbation techniques

    Timing Aware Partitioning for Multi-FPGA based Logic Simulation using Top-down Selective Flattening

    To accelerate logic simulation, it is highly beneficial to simulate the circuit design on FPGA hardware. However, limited hardware resources on FPGAs prevent large designs from being implemented on a single FPGA. Hence there is a need to partition the design and simulate it on a multi-FPGA platform. In contrast to existing FPGA-based post-synthesis partitioning approaches, which first completely flatten the circuit and then possibly perform bottom-up clustering, we perform a selective top-down flattening and thereby avoid the potential netlist blowup. This also allows us to preserve the design hierarchy to guide the partitioning and to make subsequent debugging easier. Our approach analyzes the hierarchical design and selectively flattens instances using two metrics based on slack. The resulting partially flattened netlist is converted to a hypergraph, partitioned using a public domain partitioner (hMetis), and converted back into a set of FPGA netlists, one for each FPGA of the FPGA-based accelerated logic simulation platform. We compare our approach with a partitioning approach that operates on a completely flattened netlist. Static timing analysis was performed for both approaches, and over 15 examples from the OpenCores project, our approach yields a 52% logic simulation speedup and about 0.74x runtime for the entire flow, compared to the completely flat approach. The entire tool chain of our approach is automated in an end-to-end flow from hierarchy extraction, selective flattening, partitioning, and netlist reconstruction. Compared to an existing method which also performs slack-based partitioning of a hierarchical netlist, we obtain a 35% simulation speedup.
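    The netlist-to-hypergraph step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a netlist given as a mapping from net names to the instances each net connects, and writes the plain unweighted hMETIS input format (a header line with the hyperedge and vertex counts, then one line of 1-based vertex ids per hyperedge). The function name `netlist_to_hmetis`, the example nets, and the choice to skip single-pin nets are illustrative assumptions, not details taken from the paper.

```python
def netlist_to_hmetis(nets):
    """Convert {net_name: [instance, ...]} into unweighted hMETIS text.

    Returns the file content and the instance -> vertex id mapping.
    Assumption: nets touching fewer than two distinct instances are skipped,
    since they can never be cut.
    """
    vertex_id = {}
    lines = []
    for pins in nets.values():
        cells = list(dict.fromkeys(pins))  # drop duplicate pins, keep order
        if len(cells) < 2:
            continue
        for cell in cells:
            # hMETIS vertex ids are 1-based
            vertex_id.setdefault(cell, len(vertex_id) + 1)
        lines.append(" ".join(str(vertex_id[cell]) for cell in cells))
    header = f"{len(lines)} {len(vertex_id)}"
    return "\n".join([header] + lines) + "\n", vertex_id


# Hypothetical three-net example
example_nets = {
    "clk": ["u1", "u2"],
    "data": ["u2", "u3", "u4"],
    "unconnected": ["u4"],
}
hgr_text, ids = netlist_to_hmetis(example_nets)
print(hgr_text)
```

    Instances that appear only on skipped nets receive no vertex id in this sketch, so a real flow would need to assign them to a partition separately.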

    Using ant colony optimization for routing in microprocessors

    Power consumption is an important constraint on VLSI systems. With advances in technology, it is now possible to pack a wide range of functionality into VLSI devices, so it is important to find ways to use this functionality while keeping power consumption low. This work focuses on curbing power consumption at the design stage, specifically on minimizing active power consumption by minimizing the load capacitance of the chip. The capacitance of wires and vias can be minimized using Ant Colony Optimization (ACO) algorithms. ACO provides a multi-agent framework for combinatorial optimization problems and is therefore used to handle the multiple constraints of minimizing wire-length and vias, with the goal of minimizing capacitance and hence power consumption. The ACO developed here achieves an 8% reduction in wire-length and a 7% reduction in vias, thereby providing a 7% reduction in total capacitance compared to other state-of-the-art routers.

    VLSI Circuit Partitioning by Cluster-Removal using Iterative Improvement Techniques

    Move-based iterative improvement partitioning methods such as the Fiduccia-Mattheyses (FM) algorithm [3] and Krishnamurthy's Look-Ahead (LA) algorithm [4] are widely used in VLSI CAD applications largely due to their time efficiency and ease of implementation. This class of algorithms is of the "local improvement" type. They generate relatively high quality results for small and medium size circuits. However, as VLSI circuits become larger, these algorithms are not so effective on them as direct partitioning tools. We propose new iterative-improvement methods that select cells to move with a view to moving clusters that straddle the two subsets of a partition into one of the subsets. The new algorithms significantly improve partition quality while preserving the advantage of time efficiency. Experimental results on 25 medium to large size ACM/SIGDA benchmark circuits show up to 70% improvement over FM in cutsize, with an average of per-circuit percent improvements of about 25%, and a t..
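    For readers unfamiliar with move-based iterative improvement, the following is a minimal sketch of one FM-style pass on a two-way partition. It is a plain baseline pass, not the cluster-removal method this abstract proposes, and it recomputes gains naively instead of using the bucket structure that gives FM its time efficiency. The function names, the `max_imbalance` parameter, and the example circuit are illustrative assumptions.

```python
def cut_size(nets, side):
    """Number of nets with cells on both sides."""
    return sum(1 for net in nets if len({side[c] for c in net}) > 1)


def gain(cell, nets, nets_of, side):
    """Reduction in cut size if `cell` moves to the other side."""
    g = 0
    for n in nets_of[cell]:
        same = sum(1 for c in nets[n] if side[c] == side[cell])
        other = len(nets[n]) - same
        if same == 1 and other > 0:
            g += 1  # cell is alone on its side: moving it uncuts the net
        elif other == 0 and len(nets[n]) > 1:
            g -= 1  # net lies entirely on the cell's side: moving it cuts the net
    return g


def fm_pass(nets, side, max_imbalance=2):
    """One pass: move each cell at most once, keep the best state seen."""
    side = dict(side)
    nets_of = {c: [] for c in side}
    for i, net in enumerate(nets):
        for c in net:
            nets_of[c].append(i)
    size = [sum(1 for s in side.values() if s == b) for b in (0, 1)]
    locked = set()
    cur_cut = cut_size(nets, side)
    best_cut, best_side = cur_cut, dict(side)
    while len(locked) < len(side):
        candidates = []
        for c in side:
            if c in locked:
                continue
            new = size[:]
            new[side[c]] -= 1
            new[1 - side[c]] += 1
            if abs(new[0] - new[1]) <= max_imbalance:  # balance constraint
                candidates.append((gain(c, nets, nets_of, side), c))
        if not candidates:
            break
        g, c = max(candidates, key=lambda t: t[0])  # highest gain, may be negative
        size[side[c]] -= 1
        side[c] = 1 - side[c]
        size[side[c]] += 1
        locked.add(c)
        cur_cut -= g
        if cur_cut < best_cut:
            best_cut, best_side = cur_cut, dict(side)
    return best_side


# Hypothetical circuit: two triangles joined by one net, badly partitioned
nets = [["a", "b"], ["b", "c"], ["a", "c"],
        ["d", "e"], ["e", "f"], ["d", "f"], ["c", "d"]]
initial = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}
refined = fm_pass(nets, initial)
print(cut_size(nets, initial), cut_size(nets, refined))
```

    Accepting negative-gain moves and rolling back to the best prefix is what lets the pass climb out of some local minima; the abstract's point is that on large circuits this purely local view is still not enough.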

    Delay driven multi-way circuit partitioning.

    Wong Sze Hon. Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 88-91). Abstracts in English and Chinese.
    Chapter 1: Introduction
        1.1 Preliminaries
        1.2 Motivations
        1.3 Contributions
        1.4 Organization of the Thesis
    Chapter 2: VLSI Physical Design Automation
        2.1 Preliminaries
        2.2 VLSI Design Cycle [1]
            2.2.1 System Specification
            2.2.2 Architectural Design
            2.2.3 Functional Design
            2.2.4 Logic Design
            2.2.5 Circuit Design
            2.2.6 Physical Design
            2.2.7 Fabrication
            2.2.8 Packaging and Testing
        2.3 Physical Design Cycle [1]
            2.3.1 Partitioning
            2.3.2 Floorplanning and Placement
            2.3.3 Routing
            2.3.4 Compaction
            2.3.5 Extraction and Verification
        2.4 Chapter Summary
    Chapter 3: Recent Approaches on Circuit Partitioning
        3.1 Preliminaries
        3.2 Circuit Representation
        3.3 Delay Modelling
        3.4 Partitioning Objectives
            3.4.1 Interconnections between Partitions
            3.4.2 Delay Minimization
            3.4.3 Area and Number of Partitions
        3.5 Partitioning Algorithms
            3.5.1 Cut-size Driven Partitioning Algorithm
            3.5.2 Delay Driven Partitioning Algorithm
            3.5.3 Acyclic Circuit Partitioning Algorithm
    Chapter 4: Clustering Based Acyclic Multi-way Partitioning
        4.1 Preliminaries
        4.2 Previous Works on Clustering Based Partitioning
            4.2.1 Multilevel Circuit Partitioning [2]
            4.2.2 Cluster-Oriented Iterative-Improvement Partitioner [3]
            4.2.3 Section Summary
        4.3 Problem Formulation
        4.4 Clustering Based Acyclic Multi-Way Partitioning
        4.5 Modified Fan-out Free Cone Decomposition
        4.6 Clustering Phase
        4.7 Partitioning Phase
        4.8 The Acyclic Constraint
        4.9 Experimental Results
        4.10 Chapter Summary
    Chapter 5: Network Flow Based Multi-way Partitioning
        5.1 Preliminaries
        5.2 Notations and Definitions
        5.3 Net Modelling
        5.4 Previous Works on Network Flow Based Partitioning
            5.4.1 Network Flow Based Min-Cut Balanced Partitioning [4]
            5.4.2 Network Flow Based Circuit Partitioning for Time-multiplexed FPGAs [5]
        5.5 Proposed Net Modelling
        5.6 Partitioning Properties Based on the Proposed Net Modelling
        5.7 Partitioning Step
        5.8 Constrained FM Post Processing Step
        5.9 Experiment Results
    Chapter 6: Conclusion
    Bibliography

    Quality Hypergraph Partitioning via Max-Flow-Min-Cut Computations

    This thesis presents a framework based on max-flow min-cut computations for improving a balanced k-way partition of a hypergraph. Currently, variants of the FM algorithm [17] are used as the local search algorithm in all modern multilevel hypergraph partitioners. Such move-based heuristics have the drawback that they incorporate only local information about the problem structure into their computations. When many vertex moves have the same effect on solution quality, the result often depends on random decisions made by the algorithm itself [15, 31, 36]. Flow-based approaches are not move-based and find a globally minimum cut separating two vertices s and t of a graph [18]. Our framework is inspired by the work of Sanders and Schulz [44], who successfully integrated a flow-based heuristic into their multilevel graph partitioner. We generalize many of their ideas so that they are applicable in the multilevel hypergraph partitioning context. We develop several techniques for shrinking the current hypergraph flow network, which reduce the resulting problem size by a factor of 2 compared to the current representation [33]. In addition, we show how a flow problem can be configured on a subhypergraph so that a max-flow min-cut computation achieves better quality than the modeling of Sanders and Schulz. Finally, we integrated our work as a refinement strategy into the n-level hypergraph partitioner KaHyPar [25]. We tested our framework on 3216 different instances. Compared with 5 different systems, our new configuration achieves the best results on 73% of the instances. Compared to the current variant of KaHyPar, solution quality improves by 2.5%, while the running time is slower by only a factor of 2. Nevertheless, our algorithm has a running time comparable to hMetis and achieves better results on 84% of the instances.
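    As background for the flow-based idea, here is a minimal sketch of how a minimum s-t hyperedge cut can be computed with a max-flow computation. It uses the textbook expansion in which each hyperedge becomes a pair of nodes joined by a capacity-1 arc, with infinite-capacity arcs to and from its pins, and a simple breadth-first augmenting-path search. This is not the reduced flow network or the KaHyPar integration described in the abstract; the function name, the unit hyperedge weights, and the assumption that vertices are not tuples are all illustrative.

```python
from collections import defaultdict, deque

INF = float("inf")


def min_hyperedge_cut(hyperedges, s, t):
    """Return (cut value, vertices on the source side) for unit-weight hyperedges.

    Assumes s != t and that vertex names are not tuples (tuples name the
    auxiliary hyperedge nodes).
    """
    cap = defaultdict(lambda: defaultdict(int))  # residual capacities

    def add_arc(u, v, c):
        cap[u][v] += c
        cap[v][u] += 0  # make sure the reverse arc exists

    for i, edge in enumerate(hyperedges):
        e_in, e_out = ("in", i), ("out", i)
        add_arc(e_in, e_out, 1)  # cutting this arc means cutting hyperedge i
        for v in edge:
            add_arc(v, e_in, INF)
            add_arc(e_out, v, INF)

    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual network
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            source_side = {v for v in parent if not isinstance(v, tuple)}
            return flow, source_side
        # Find the bottleneck along the path, then push flow through it
        bottleneck, v = INF, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical example: two hyperedge-disjoint routes from 0 to 3
value, side = min_hyperedge_cut([[0, 1], [0, 2], [1, 3], [2, 3]], s=0, t=3)
print(value, side)
```

    Unlike a move-based pass, the result here is a globally minimum cut between s and t, which is the property the abstract contrasts with FM-style local search.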

    Hardware-software codesign in a high-level synthesis environment

    Interfacing hardware-oriented high-level synthesis to software development is a computationally hard problem for which no general solution exists. Under special conditions, the hardware-software codesign (system-level synthesis) problem may be analyzed with traditional tools and efficient heuristics. This dissertation introduces a new alternative to the currently used heuristic methods. The new approach combines the results of top-down hardware development with existing basic hardware units (bottom-up libraries) and compiler generation tools. The optimization goal is to maximize operating frequency or minimize cost with reasonable tradeoffs in other properties. The dissertation research provides a unified approach to hardware-software codesign. The improvements over previously existing design methodologies are presented in the framework of an academic CAD environment (PIPE). This CAD environment implements a sufficient subset of functions of commercial microelectronics CAD packages. The results may be generalized for other general-purpose algorithms or environments. Reference benchmarks are used to validate the new approach. Most of the well-known benchmarks are based on discrete-time numerical simulations, digital filtering applications, and cryptography (an emerging field in benchmarking). As there is a need for high-performance applications, an additional requirement for this dissertation is to investigate pipelined hardware-software systems' performance and design methods. The results demonstrate that the quality of existing heuristics does not change in the enhanced, hardware-software environment.

    Logic perturbation based circuit partitioning and optimum FPGA switch-box designs.

    Cheung Chak Chung. Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 101-114). Abstracts in English and Chinese.
    Abstract
    Acknowledgments
    Vita
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1: Introduction
        1.1 Motivation
        1.2 Aims and Contribution
        1.3 Thesis Overview
    Chapter 2: VLSI Design Cycle
        2.1 Logic Synthesis
            2.1.1 Logic Minimization
            2.1.2 Technology Mapping
            2.1.3 Testability
        2.2 Physical Design Synthesis
            2.2.1 Partitioning
            2.2.2 Floorplanning & Placement
            2.2.3 Routing
            2.2.4 Compaction, Extraction & Verification
            2.2.5 Physical Design of FPGAs
    Chapter 3: Alternative Wiring
        3.1 Introduction
        3.2 Notation and Definitions
        3.3 Application of Rewiring
            3.3.1 Logic Optimization
            3.3.2 Timing Optimization
            3.3.3 Circuit Partitioning and Routing
        3.4 Logic Optimization Analysis
            3.4.1 Global Flow Optimization
            3.4.2 OBDD Representation
            3.4.3 Automatic Test Pattern Generation (ATPG)
            3.4.4 Graph Based Alternative Wiring (GBAW)
        3.5 Augmented GBAW
        3.6 Logic Optimization by using GBAW
        3.7 Conclusions
    Chapter 4: Multi-way Partitioning using Rewiring Techniques
        4.1 Introduction
        4.2 Circuit Partitioning Algorithm Analysis
            4.2.1 The Kernighan-Lin (KL) Algorithm
            4.2.2 The Fiduccia-Mattheyses (FM) Algorithm
            4.2.3 Geometric Representation Algorithm
            4.2.4 The Multi-level Partitioning Algorithm
            4.2.5 Hypergraph METIS - hMETIS
        4.3 The GBAW Partitioning Algorithm
        4.4 Experimental Results
        4.5 Conclusions
    Chapter 5: Optimum FPGA Switch-Box Designs - HUSB
        5.1 Introduction
        5.2 Background and Definitions
            5.2.1 Routing Architectures
            5.2.2 Global Routing
            5.2.3 Detailed Routing
        5.3 FPGA Router Comparison
            5.3.1 CGE
            5.3.2 SEGA
            5.3.3 TRACER
            5.3.4 VPR
        5.4 Switch Box Design
            5.4.1 Disjoint type switch box (XC4000-type)
            5.4.2 Anti-symmetric switch box
            5.4.3 Universal Switch box
            5.4.4 Switch box Analysis
        5.5 Terminology
        5.6 Hyper-universal (4, W)-design analysis
            5.6.1 H3 is an optimum (4, 3)-design
            5.6.2 H4 is an optimum (4, 4)-design
            5.6.3 Hi is a hyper-universal (4, i)-design for i = 5, 6, 7
        5.7 Experimental Results
        5.8 Conclusions
    Chapter 6: Conclusions
        6.1 Thesis Summary
        6.2 Future work
            6.2.1 Alternative Wiring
            6.2.2 Partitioning Quality
            6.2.3 Routing Devices Studies
    Bibliography
    Appendix A: 5xpl - Berkeley Logic Interchange Format (BLIF)
    Appendix B: Proof of some 2-local patterns
    Appendix C: Illustrations of FM algorithm
    Appendix D: HUSB Structures
    Appendix E: Primitive minimal 4-way global routing Structures

    High-Quality Hypergraph Partitioning

    This dissertation focuses on computing high-quality solutions for the NP-hard balanced hypergraph partitioning problem: Given a hypergraph and an integer k, partition its vertex set into k disjoint blocks of bounded size, while minimizing an objective function over the hyperedges. Here, we consider the two most commonly used objectives: the cut-net metric and the connectivity metric. Since the problem is computationally intractable, heuristics are used in practice, the most prominent being the three-phase multi-level paradigm: During coarsening, the hypergraph is successively contracted to obtain a hierarchy of smaller instances. After applying an initial partitioning algorithm to the smallest hypergraph, contraction is undone and, at each level, refinement algorithms try to improve the current solution. With this work, we give a brief overview of the field and present several algorithmic improvements to the multi-level paradigm. Instead of using a logarithmic number of levels like traditional algorithms, we present two coarsening algorithms that create a hierarchy of (nearly) n levels, where n is the number of vertices. This makes consecutive levels as similar as possible and provides many opportunities for refinement algorithms to improve the partition. This approach is made feasible in practice by tailoring all algorithms and data structures to the n-level paradigm, and developing lazy-evaluation techniques, caching mechanisms and early stopping criteria to speed up the partitioning process. Furthermore, we propose a sparsification algorithm based on locality-sensitive hashing that improves the running time for hypergraphs with large hyperedges, and show that incorporating global information about the community structure into the coarsening process improves quality. Moreover, we present a portfolio-based initial partitioning approach, and propose three refinement algorithms. Two are based on the Fiduccia-Mattheyses (FM) heuristic, but perform a highly localized search at each level. While one is designed for two-way partitioning, the other is the first FM-style algorithm that can be efficiently employed in the multi-level setting to directly improve k-way partitions. The third algorithm uses max-flow computations on pairs of blocks to refine k-way partitions. Finally, we present the first memetic multi-level hypergraph partitioning algorithm for an extensive exploration of the global solution space. All contributions are made available through our open-source framework KaHyPar. In a comprehensive experimental study, we compare KaHyPar with hMETIS, PaToH, Mondriaan, Zoltan-AlgD, and HYPE on a wide range of hypergraphs from several application areas. Our results indicate that KaHyPar, already without the memetic component, computes better solutions than all competing algorithms for both the cut-net and the connectivity metric, while being faster than Zoltan-AlgD and equally fast as hMETIS. Moreover, KaHyPar compares favorably with the current best graph partitioning system KaFFPa, both in terms of solution quality and running time.
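    The two objectives named in this abstract can be made concrete with a short sketch. The cut-net metric sums the weights of hyperedges that span more than one block, while the connectivity metric sums (λ − 1) times the weight of each hyperedge, where λ is the number of blocks the hyperedge touches. The function name `partition_metrics` and the example data are illustrative assumptions and do not reflect KaHyPar's API.

```python
def partition_metrics(hyperedges, block_of, weights=None):
    """Return (cut_net, connectivity) for a k-way partition.

    hyperedges: list of vertex lists
    block_of:   mapping vertex -> block id
    weights:    optional list of hyperedge weights (defaults to 1 each)
    """
    cut_net = 0
    connectivity = 0
    for i, edge in enumerate(hyperedges):
        w = 1 if weights is None else weights[i]
        lam = len({block_of[v] for v in edge})  # number of blocks touched
        if lam > 1:
            cut_net += w
        connectivity += w * (lam - 1)
    return cut_net, connectivity


# Hypothetical 3-way partition of six vertices
hyperedges = [[0, 1, 2], [2, 3], [0, 3, 5]]
block_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2}
print(partition_metrics(hyperedges, block_of))  # (2, 3)
```

    The two metrics coincide for k = 2 and diverge only when a hyperedge spans three or more blocks, as the last hyperedge does here.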