71 research outputs found

Impact of Multi-level Clustering on Performance Driven Global Placement

Author: Balakrishnan Karthik
Ekpanyapong Mongkol
Lim Sung Kyu
Nanda Vidit
Publication venue: Georgia Institute of Technology
Publication date: 01/01/2003

Delay and wirelength minimization continue to be important objectives in the design of high-performance computing systems. For large-scale circuits, the clustering process becomes essential for reducing the problem size. However, to the best of our knowledge, there is no study about the impact of multi-level clustering on performance-driven global placement. In this paper, five clustering algorithms including the quasi-optimal retiming delay driven PRIME and the cutsize-driven ESC have been considered for their impact on state-of-the-art mincut based global placement. Results show that minimizing cutsize or wirelength during clustering typically results in significant performance improvements

Scholarly Materials And Research @ Georgia Tech
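
The abstract above contrasts clustering that minimizes cutsize with clustering that minimizes wirelength. As a rough illustration of how those two metrics are commonly defined, the sketch below computes both on a toy netlist. The netlist, the cluster assignment, the coordinates, and the function names `cutsize` and `hpwl` are hypothetical and are not taken from the paper; half-perimeter wirelength (HPWL) is used here as one common wirelength estimate, which is an assumption about what the authors measure.

```python
from typing import Dict, List, Tuple

Net = List[str]  # a net is the list of cells it connects


def cutsize(nets: List[Net], cluster_of: Dict[str, int]) -> int:
    """Count the nets whose cells fall into more than one cluster."""
    return sum(1 for net in nets if len({cluster_of[c] for c in net}) > 1)


def hpwl(nets: List[Net], pos: Dict[str, Tuple[float, float]]) -> float:
    """Sum, over all nets, the half-perimeter of the net's bounding box."""
    total = 0.0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


# Hypothetical toy data: four cells, three nets, two clusters.
nets = [["a", "b"], ["b", "c", "d"], ["a", "d"]]
cluster_of = {"a": 0, "b": 0, "c": 1, "d": 1}
pos = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (3.0, 2.0), "d": (4.0, 0.0)}

print(cutsize(nets, cluster_of))  # nets crossing the cluster boundary
print(hpwl(nets, pos))            # total half-perimeter wirelength
```

A clustering algorithm driven by either metric would search for the `cluster_of` assignment (or placement `pos`) that lowers the corresponding value.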

Retiming-based timing analysis with an application to mincut-based global placement

Author: J. Cong
Sung Kyu Lim
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

Clustering for the optimisation of asynchronous controllers

Author: Casanova Bachs Jonàs
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2008

The miniaturisation of integrated circuits is bringing new problems in terms of power consumption, speed, and variability tolerance. Current synchronous designs are struggling to cope with these problems, and in consequence new optimisations and paradigms are being studied. This thesis studies optimisations such as clock skew for synchronous circuits, and asynchronous circuits as an alternative paradigm. The performance analysis of both cases is equivalent, and graph-theoretic algorithms for cycles have been implemented to calculate the optimum speed. Asynchronous controllers are essential for a good asynchronous design. To create a connectivity structure of controllers it is necessary to group the memory elements (registers) of the circuit into clusters. Clustering registers affects power consumption, performance, area, and variability tolerance. Producing a good clustering is hard because of the high number of registers and the trade-offs involved in optimising all these characteristics. An initial problem in clustering for controllers is deciding how many controllers are wanted. A design with one cluster has the same problems as a synchronous design: high power consumption and too much sensitivity to variability in temperature, voltage, manufacturing errors, etc. On the other hand, having as many controllers as registers produces too much area overhead because of all the new logic and wires that need to be added. It is important that clusters be as sparsely connected as possible, so that the controllers stay simple and the impact on area is minimised. We know from benchmarks and industrial designs that the register graph is highly connected and the controller graph is almost complete. A variation of Min-Cut can provide a solution that optimises this property. The clustering will also have an impact on performance. Grouping registers implies a loss of freedom, and optimisations like clock skew or the asynchronous circuit will be handicapped by this loss in reaching the maximum speed. From the placement point of view, the registers within a cluster need to be close together to minimise the clock tree. The ideal solution is a partition of the space; the worst is to have the registers spread around. The contributions of this thesis are two clustering algorithms: a local search solution that minimises the number of connections, and a k-means implementation that combines the minimisation of the clock trees with the maximisation of performance, using parameters to balance the two. These algorithms have been implemented in the Elastix EDA tool and executed on ISCAS benchmarks and SUN Microsystems OpenSparc processo

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC
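
The thesis abstract above mentions a k-means implementation that groups registers so that each cluster is spatially compact. The sketch below is a plain Lloyd's k-means over register coordinates, offered only as a minimal illustration of that idea. It is not the Elastix implementation: the performance term and the balancing parameters described in the abstract are not modelled, and the function name `kmeans`, the first-`k`-points initialisation, and the sample coordinates are all assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means; returns one cluster index per point."""
    centers = list(points[:k])  # naive deterministic initialisation (assumption)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda j: (p[0] - centers[j][0]) ** 2 + (p[1] - centers[j][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centre where it is
                centers[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels


# Hypothetical register placements: two well-separated groups.
registers = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 9.0), (9.0, 10.0)]
labels = kmeans(registers, k=2)
print(labels)
```

Extending the distance in the assignment step with a weighted timing penalty would be one way to trade clock-tree size against performance, but how the thesis actually combines the two is not stated in the abstract.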

Broadening the Scope of Multi-Objective Optimizations in Physical Synthesis of Integrated Circuits.

Author: Papa David Anthony
Publication date: 01/01/2010

In modern VLSI design, physical synthesis tools are primarily responsible for satisfying chip-performance constraints by invoking a broad range of circuit optimizations, such as buffer insertion, logic restructuring, gate sizing and relocation. This process is known as timing closure. Our research seeks more powerful and efficient optimizations to improve the state of the art in modern chip design. In particular, we integrate timing-driven relocation, retiming, logic cloning, buffer insertion and gate sizing in novel ways to create powerful circuit transformations that help satisfy setup-time constraints. State-of-the-art physical synthesis optimizations are typically applied at two scales: i) global algorithms that affect the entire netlist and ii) local transformations that focus on a handful of gates or interconnections. The scale of modern chip designs dictates that only near-linear-time optimization algorithms can be applied at the global scope — typically limited to wirelength-driven placement and legalization. Localized transformations can rely on more time-consuming optimizations with accurate delay models. Few techniques bridge the gap between fully-global and localized optimizations. This dissertation broadens the scope of physical synthesis optimization to include accurate transformations operating between the global and local scales. In particular, we integrate groups of related transformations to break circular dependencies and increase the number of circuit elements that can be jointly optimized to escape local minima. Integrated transformations in this dissertation are developed by identifying and removing obstacles to successful optimizations. Integration is achieved through mapping multiple operations to rigorous mathematical optimization problems that can be solved simultaneously. We achieve computational scalability in our techniques by leveraging analytical delay models and focusing optimization efforts on carefully selected regions of the chip. 
In this regard, we make extensive use of a linear interconnect-delay model that accounts for the impact of subsequent repeater insertion. Our integrated transformations are evaluated on high-performance circuits with over 100,000 gates. Integrated optimization techniques described in this dissertation ensure a graceful timing-closure process and impact nearly every aspect of a typical physical synthesis flow. They have been validated in EDA tools used at IBM for physical synthesis of high-performance CPU and ASIC designs, where they significantly improved chip performance.

Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies

http://deepblue.lib.umich.edu/bitstream/2027.42/78744/1/iamyou_1.pd

CiteSeerX

Deep Blue Documents at the University of Michigan

Advanced Timing and Synchronization Methodologies for Digital VLSI Integrated Circuits

Author: Taskin Baris
Publication date: 01/01/2005

This dissertation addresses timing and synchronization methodologies that are critical to the design, analysis and optimization of high-performance, integrated digital VLSI systems. As process sizes shrink and design complexities increase, achieving timing closure for digital VLSI circuits becomes a significant bottleneck in the integrated circuit design flow. Circuit designers are motivated to investigate and employ alternative methods to satisfy the timing and physical design performance targets. Such novel methods for the timing and synchronization of complex circuitry are developed in this dissertation and analyzed for performance and applicability. Mainstream integrated circuit design flow is normally tuned for zero clock skew, edge-triggered circuit design. Non-zero clock skew or multi-phase clock synchronization is seldom used because the lack of design automation tools increases the length and cost of the design cycle. For similar reasons, level-sensitive registers have not become an industry standard despite their superior size, speed and power consumption characteristics compared to conventional edge-triggered flip-flops. In this dissertation, novel design and analysis techniques that fully automate the design and analysis of non-zero clock skew circuits are presented. Clock skew scheduling of both edge-triggered and level-sensitive circuits is investigated in order to exploit maximum circuit performance. The effects of multi-phase clocking on non-zero clock skew, level-sensitive circuits are investigated, leading to advanced synchronization methodologies. Improvements in the scalability of the computational timing analysis process with clock skew scheduling are explored through partitioning and parallelization. The integration of the proposed design and analysis methods into the physical design flow of integrated circuits synchronized with a next-generation clocking technology (resonant rotary clocking technology) is also presented. Based on the design and analysis methods presented in this dissertation, a computer-aided design tool for the design of rotary clock synchronized integrated circuits is developed

CiteSeerX

D-Scholarship@Pitt

Simultaneous timing driven clustering and placement for FPGAs

Author: Gang Chen
Jason Cong
Publication date: 11/04/2020

Abstract. Traditional placement algorithms for FPGAs are normally carried out on a fixed clustering solution of a circuit. The impact of clustering on wirelength and delay of the placement solutions is not well quantified. In this paper, we present an algorithm named SCPlace that performs simultaneous clustering and placement to minimize both the total wirelength and longest path delay. We also incorporate a recently proposed path counting-based net weighting schem

CiteSeerX

A novel framework for multilevel full-chip gridless routing

Author: Shyh-chang Lin
Tai-chen Chen
Yao-wen Chang
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2006

Abstract. Due to its great flexibility, gridless routing is desirable for nanometer circuit designs that use variable wire widths and spacings. Nevertheless, it is much more difficult than grid-based routing because of its larger solution space. In this paper, we present a novel “V-shaped” multilevel framework (called VMF) for full-chip gridless routing. Unlike the traditional “Λ-shaped” multilevel framework (inaccurately called the “V-cycle” framework in the literature), our VMF works in the V-shaped manner: top-down uncoarsening followed by bottom-up coarsening. Based on the novel framework, we develop a multilevel full-chip gridless router (called VMGR) for large-scale circuit designs. The top-down uncoarsening stage of VMGR starts from the coarsest regions and then proceeds down to the finest ones level by level; at each level, it performs global pattern routing and detailed routing for local nets and then estimates the routing resources for the next level. Then, the bottom-up coarsening stage performs global maze routing and detailed routing to reroute failed connections and refine the solution level by level from the finest level to the coarsest one. We employ a dynamic congestion map to guide the global routing at all stages and propose a new cost function for congestion control. Experimental results show that VMGR achieves the best routability among all published gridless routers on a set of commonly used MCNC benchmarks. In addition, VMGR obtains significantly less wirelength, smaller critical path delay, and smaller average net delay than previous works. In particular, VMF is general and thus can readily be applied to other problems. I

CiteSeerX

Crossref

Incremental physical design

Author: J Cong
M Sarrafzadeh
Publication date: 02/04/2020

CiteSeerX

MARS-a multilevel full-chip gridless routing system

Author: J. Cong
Jie Fang
Min Xie
Yan Zhang
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

Architecture Independent Timing Speculation Techniques in VLSI Circuits.

Author: Fojtik Matthew R.

Conventional digital circuits must ensure correct operation throughout a wide range of operating conditions including process, voltage, and temperature variation. These conditions have an effect on circuit delays, and safety margins must be put in place which come at a power and performance cost. The Razor system proposed eliminating these timing margins by running a circuit with occasional timing errors and correcting the errors when they occur. Several Razor-style designs have been proposed; however, prior to this work, Razor could not be applied blindly or automatically to designs, as the various error correction schemes modified the architecture of the target design. Because of the architectural invasiveness and design complexities of these techniques, no published Razor-style system had been applied to a complete existing commercial processor. Additionally, in all prior Razor-style systems, there is a fundamental tradeoff between speculation window and short path, or minimum delay, constraints, limiting the technique’s effectiveness. This thesis introduces the concept of Razor using two-phase latch-based timing. By identifying and utilizing time borrowing as an error correction mechanism, it allows Razor to be applied without the need to reload data or replay instructions. This allows Razor to be blindly and automatically applied to existing designs without detailed knowledge of internal architecture. Additionally, latch-based Razor allows for large speculation windows, up to 100% of nominal circuit delay, because it breaks the connection between minimum delay constraints and speculation window. By demonstrating how to transform conventional flip-flop-based designs, including those which make use of clock gating, to two-phase latch-based timing, Razor can be automatically added to a large set of existing digital designs. Two forms of latch-based Razor are proposed. First, Bubble Razor involves rippling stall cycles throughout a circuit in response to timing errors and is applied to the ARM Cortex-M3 processor, the first ever application of a Razor technique to a complete, existing processor design. Additional work applies Bubble Razor to the ARM Cortex-R4 processor. The second latch-based Razor technique, Voltage Razor, uses voltage boosting to correct for timing errors.

Ph.D., Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies

http://deepblue.lib.umich.edu/bitstream/2027.42/102461/1/mfojtik_1.pd

Deep Blue Documents at the University of Michigan