11 research outputs found

    Early Wire Characterization for Predictable Network-on-Chip Global Interconnects

    Get PDF
    This work envisions a common design methodology, applicable for every interconnect level and based on early wire characterization, to provide a faster convergence to a feasible and robust design. We claim that such a novel design methodology is vital for upcoming nanometer technologies, where increased variations in both device characteristics and interconnect parameters introduce tedious design closure problems. The proposed methodology has been successfully applied to the wire synthesis of a Network-on-Chip interconnect to: (i) achieve a given delay and noise goals, and (ii) attain a more power-efficient design with respect to existing techniques

    Performance-driven register insertion in placement

    Full text link

    JETTY: Filtering snoops for reduced energy consumption in SMP servers

    Get PDF
    We propose methods for reducing the energy consumed by snoop requests in snoopy bus-based symmetric multiprocessor (SMP) systems. Observing that a large fraction of snoops do not find copies in many of the other caches, we introduce JETTY, a small, cache-like structure. A JETTY is introduced in- between the bus and the L2 backside of each processor. There it filters the vast majority of snoops that would not find a locally cached copy. Energy is reduced as accesses to the much more energy demanding L2 tag arrays are decreased. No changes in the existing coherence protocol are required and no performance loss is experienced. We evaluate our method on a 4-way SMP server using a set of shared-memory applications. We demonstrate that a very small JETTY filters 74% (average) of all snoop-induced tag accesses that would miss. This results in an average energy reduction of 29% (range: 12% to 40%) measured as a fraction of the energy required by all L2 accesses (both tag and data arrays)

    Interconnect performance estimation models for design planning

    Full text link

    Modeling and Analysis of Noise and Interconnects for On-Chip Communication Link Design

    Get PDF
    This thesis considers modeling and analysis of noise and interconnects in onchip communication. Besides transistor count and speed, the capabilities of a modern design are often limited by on-chip communication links. These links typically consist of multiple interconnects that run parallel to each other for long distances between functional or memory blocks. Due to the scaling of technology, the interconnects have considerable electrical parasitics that affect their performance, power dissipation and signal integrity. Furthermore, because of electromagnetic coupling, the interconnects in the link need to be considered as an interacting group instead of as isolated signal paths. There is a need for accurate and computationally effective models in the early stages of the chip design process to assess or optimize issues affecting these interconnects. For this purpose, a set of analytical models is developed for on-chip data links in this thesis. First, a model is proposed for modeling crosstalk and intersymbol interference. The model takes into account the effects of inductance, initial states and bit sequences. Intersymbol interference is shown to affect crosstalk voltage and propagation delay depending on bus throughput and the amount of inductance. Next, a model is proposed for the switching current of a coupled bus. The model is combined with an existing model to evaluate power supply noise. The model is then applied to reduce both functional crosstalk and power supply noise caused by a bus as a trade-off with time. The proposed reduction method is shown to be effective in reducing long-range crosstalk noise. The effects of process variation on encoded signaling are then modeled. In encoded signaling, the input signals to a bus are encoded using additional signaling circuitry. The proposed model includes variation in both the signaling circuitry and in the wires to calculate the total delay variation of a bus. The model is applied to study level-encoded dual-rail and 1-of-4 signaling. In addition to regular voltage-mode and encoded voltage-mode signaling, current-mode signaling is a promising technique for global communication. A model for energy dissipation in RLC current-mode signaling is proposed in the thesis. The energy is derived separately for the driver, wire and receiver termination.Siirretty Doriast

    Retiming with wire delay and post-retiming register placement.

    Get PDF
    Tong Ka Yau Dennis.Thesis (M.Phil.)--Chinese University of Hong Kong, 2004.Includes bibliographical references (leaves 77-81).Abstracts in English and Chinese.Chapter 1 --- Introduction --- p.1Chapter 1.1 --- Motivations --- p.1Chapter 1.2 --- Progress on the Problem --- p.2Chapter 1.3 --- Our Contributions --- p.3Chapter 1.4 --- Thesis Organization --- p.4Chapter 2 --- Background on Retiming --- p.5Chapter 2.1 --- Introduction --- p.5Chapter 2.2 --- Preliminaries --- p.7Chapter 2.3 --- Retiming Problem --- p.9Chapter 3 --- Literature Review on Retiming --- p.10Chapter 3.1 --- Introduction --- p.10Chapter 3.2 --- The First Retiming Paper --- p.11Chapter 3.2.1 --- """Retiming Synchronous Circuitry""" --- p.11Chapter 3.3 --- Important Extensions of the Basic Retiming Algorithm --- p.14Chapter 3.3.1 --- """A Fresh Look at Retiming via Clock Skew Optimization""" --- p.14Chapter 3.3.2 --- """An Improved Algorithm for Minimum-Area Retiming""" --- p.16Chapter 3.3.3 --- """Efficient Implementation of Retiming""" --- p.17Chapter 3.4 --- Retiming in Physical Design Stages --- p.19Chapter 3.4.1 --- """Physical Planning with Retiming""" --- p.19Chapter 3.4.2 --- """Simultaneous Circuit Partitioning/Clustering with Re- timing for Performance Optimization" --- p.20Chapter 3.4.3 --- """Performance Driven Multi-level and Multiway Parti- tioning with Retiming" --- p.22Chapter 3.5 --- Retiming with More Sophisticated Timing Models --- p.23Chapter 3.5.1 --- """Retiming with Non-zero Clock Skew, Variable Register, and Interconnect Delay""" --- p.23Chapter 3.5.2 --- """Placement Driven Retiming with a Coupled Edge Tim- ing Model""" --- p.24Chapter 3.6 --- Post-Retiming Register Placement --- p.26Chapter 3.6.1 --- """Layout Driven Retiming Using the Coupled Edge Tim- ing Model""" --- p.26Chapter 3.6.2 --- """Integrating Logic Retiming and Register Placement""" --- p.27Chapter 4 --- Retiming with Gate and Wire Delay [2] --- p.29Chapter 4.1 --- Introduction --- p.29Chapter 4.2 --- Problem Formulation --- p.30Chapter 4.3 --- Optimal Approach [2] --- p.31Chapter 4.3.1 --- Original Mathematical Framework for Retiming --- p.31Chapter 4.3.2 --- A Modified Optimal Approach --- p.33Chapter 4.4 --- Near-Optimal Fast Approach [2] --- p.37Chapter 4.4.1 --- Considering Wire Delay Only --- p.38Chapter 4.4.2 --- Considering Both Gate and Wire Delay --- p.42Chapter 4.4.3 --- Computational Complexity --- p.43Chapter 4.4.4 --- Experimental Results --- p.44Chapter 4.5 --- Lin's Optimal Approach [23] --- p.47Chapter 4.5.1 --- Theoretical Results --- p.47Chapter 4.5.2 --- Algorithm Description --- p.51Chapter 4.5.3 --- Computational Complexity --- p.52Chapter 4.5.4 --- Experimental Results --- p.52Chapter 4.6 --- Summary --- p.54Chapter 5 --- Register Insertion in Placement [36] --- p.55Chapter 5.1 --- Introduction --- p.55Chapter 5.2 --- Problem Formulation --- p.57Chapter 5.3 --- Placement of Registers After Retiming --- p.60Chapter 5.3.1 --- Topology Finding --- p.60Chapter 5.3.2 --- Register Placement --- p.69Chapter 5.4 --- Experimental Results --- p.71Chapter 5.5 --- Summary --- p.74Chapter 6 --- Conclusion --- p.75Bibliography --- p.7

    Shortest Paths and Steiner Trees in VLSI Routing

    Get PDF
    Routing is one of the major steps in very-large-scale integration (VLSI) design. Its task is to find disjoint wire connections between sets of points on a chip, subject to numerous constraints. This problem is solved in a two-stage approach, which consists of so-called global and detailed routing steps. For each set of metal components to be connected, global routing reduces the search space by computing corridors in which detailed routing sequentially determines the desired connections as shortest paths. In this thesis, we present new theoretical results on Steiner trees and shortest paths, the two main mathematical concepts in routing. In the practical part, we give computational results of BonnRoute, a VLSI routing tool developed at the Research Institute for Discrete Mathematics at the University of Bonn. Interconnect signal delays are becoming increasingly important in modern chip designs. Therefore, the length of paths or direct delay measures should be taken into account when constructing rectilinear Steiner trees. We consider the problem of finding a rectilinear Steiner minimum tree (RSMT) that --- as a secondary objective --- minimizes a signal delay related objective. Given a source we derive some structural properties of RSMTs for which the weighted sum of path lengths from the source to the other terminals is minimized. Also, we present an exact algorithm for constructing RSMTs with weighted sum of path lengths as secondary objective, and a heuristic for various secondary objectives. Computational results for industrial designs are presented. We further consider the problem of finding a shortest rectilinear Steiner tree in the plane in the presence of rectilinear obstacles. The Steiner tree is allowed to run over obstacles; however, if it intersects an obstacle, then no connected component of the induced subtree must be longer than a given fixed length. This kind of length restriction is motivated by its application in VLSI routing where a large Steiner tree requires the insertion of repeaters which must not be placed on top of obstacles. We show that there are optimal length-restricted Steiner trees with a special structure. In particular, we prove that a certain graph (called augmented Hanan grid) always contains an optimal solution. Based on this structural result, we give an approximation scheme for the special case that all obstacles are of rectangular shape or are represented by at most a constant number of edges. Turning to the shortest paths problem, we present a new generic framework for Dijkstra's algorithm for finding shortest paths in digraphs with non-negative integral edge lengths. Instead of labeling individual vertices, we label subgraphs which partition the given graph. Much better running times can be achieved if the number of involved subgraphs is small compared to the order of the original graph and the shortest path problems restricted to these subgraphs is computationally easy. As an application we consider the VLSI routing problem, where we need to find millions of shortest paths in partial grid graphs with billions of vertices. Here, the algorithm can be applied twice, once in a coarse abstraction (where the labeled subgraphs are rectangles), and once in a detailed model (where the labeled subgraphs are intervals). Using the result of the first algorithm to speed up the second one via goal-oriented techniques leads to considerably reduced running time. We illustrate this with the routing program BonnRoute on leading-edge industrial chips. Finally, we present computational results of BonnRoute obtained on real-world VLSI chips. BonnRoute fulfills all requirements of modern VLSI routing and has been used by IBM and its customers over many years to produce more than one thousand different chips. To demonstrate the strength of BonnRoute as a state-of-the-art industrial routing tool, we show that it performs excellently on all traditional quality measures such as wire length and number of vias, but also on further criteria of equal importance in the every-day work of the designer

    Efficient approaches in interconnect-driven floorplanning.

    Get PDF
    Lai Tsz Wai.Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.Includes bibliographical references (leaves 123-129).Abstracts in English and Chinese.Chapter 1 --- Introduction --- p.1Chapter 1.1 --- VLSI Design Cycle --- p.2Chapter 1.2 --- Physical Design Cycle --- p.4Chapter 1.3 --- Floorplanning --- p.7Chapter 1.3.1 --- Types of Floorplan and Floorplan Representations --- p.11Chapter 1.3.2 --- Interconnect-driven Floorplanning --- p.13Chapter 1.4 --- Motivations and Contributions --- p.17Chapter 1.5 --- Organization of this Thesis --- p.18Chapter 2 --- Literature Review on Floorplan Representation --- p.20Chapter 2.1 --- Slicing Floorplan Representation --- p.20Chapter 2.1.1 --- Normalized Polish Expression --- p.20Chapter 2.2 --- Non-slicing Floorplan Representations --- p.21Chapter 2.2.1 --- Sequence Pair (SP) --- p.21Chapter 2.2.2 --- Bounded-sliceline Grid (BSG) --- p.23Chapter 2.2.3 --- O-tree --- p.25Chapter 2.2.4 --- B*-tree --- p.26Chapter 2.3 --- Mosaic Floorplan Representations --- p.28Chapter 2.3.1 --- Corner Block List (CBL) --- p.28Chapter 2.3.2 --- Twin Binary Trees (TBT) --- p.31Chapter 2.3.3 --- Twin Binary Sequences (TBS) --- p.32Chapter 2.4 --- Summary --- p.34Chapter 3 --- Literature Review on Interconnect Optimization in Floorplan- ning --- p.37Chapter 3.1 --- Wirelength Estimation --- p.37Chapter 3.2 --- Congestion Optimization --- p.38Chapter 3.2.1 --- Integrated Floorplanning and Interconnect Planning --- p.41Chapter 3.2.2 --- Multi-layer Global Wiring Planning (GWP) --- p.43Chapter 3.2.3 --- Estimating Routing Congestion using Probabilistic Anal- ysis --- p.44Chapter 3.2.4 --- Congestion Minimization During Placement --- p.46Chapter 3.2.5 --- Modelling and Minimization of Routing Congestion --- p.48Chapter 3.3 --- Buffer Planning --- p.49Chapter 3.3.1 --- Buffer Clustering with Feasible Region --- p.51Chapter 3.3.2 --- Routability-driven Repeater Clustering Algorithm with Iterative Deletion --- p.55Chapter 3.3.3 --- Planning Buffer Locations by Network Flow --- p.58Chapter 3.3.4 --- Buffer Planning using Integer Multicommodity Flow --- p.60Chapter 3.3.5 --- Buffer Planning Problem using Tile Graph --- p.60Chapter 3.3.6 --- Probabilistic Analysis for Buffer Block Planning --- p.62Chapter 3.3.7 --- Fast Buffer Planning and Congestion Optimization --- p.63Chapter 3.4 --- Summary --- p.66Chapter 4 --- Congestion Evaluation: Wire Density Model --- p.68Chapter 4.1 --- Introduction --- p.68Chapter 4.2 --- Overview of Our Floorplanner --- p.70Chapter 4.3 --- Wire Density Model --- p.71Chapter 4.3.1 --- Computation of Ni --- p.72Chapter 4.3.2 --- Computation of Pi --- p.74Chapter 4.3.3 --- Usage of Mirror TBT --- p.76Chapter 4.4 --- Implementation --- p.76Chapter 4.4.1 --- Efficient Calculation of Ni --- p.76Chapter 4.4.2 --- Solving the LCA Problem Efficiently --- p.81Chapter 4.4.3 --- Cost Function --- p.81Chapter 4.4.4 --- Complexity --- p.81Chapter 4.5 --- Experimental Results --- p.82Chapter 4.6 --- Conclusion --- p.83Chapter 5 --- Buffer Planning: Simple Buffer Planning Method --- p.85Chapter 5.1 --- Introduction --- p.85Chapter 5.2 --- Variable Interval Buffer Insertion Constraint --- p.87Chapter 5.3 --- Overview of Our Floorplanner --- p.88Chapter 5.4 --- Buffer Planning --- p.89Chapter 5.4.1 --- Feasible Grids --- p.89Chapter 5.4.2 --- Table Look-up Approach --- p.89Chapter 5.5 --- Implementation --- p.91Chapter 5.5.1 --- Building the Look-up Tables --- p.91Chapter 5.5.2 --- An Example of Look-up Table Construction --- p.94Chapter 5.5.3 --- A Faster Approach for Building the Look-up Tables --- p.101Chapter 5.5.4 --- An Example of the Faster Look-up Table Construction --- p.105Chapter 5.5.5 --- I/O Pin Locations --- p.106Chapter 5.5.6 --- Cost Function --- p.110Chapter 5.5.7 --- Complexity --- p.111Chapter 5.6 --- Experimental Results --- p.112Chapter 5.6.1 --- Selected Value for A --- p.112Chapter 5.6.2 --- Performance of Our Floorplanner --- p.113Chapter 5.7 --- Conclusion --- p.116Chapter 6 --- Conclusion --- p.118Chapter A --- An Efficient Algorithm for the Least Common Ancestor Prob- lem --- p.120Bibliography --- p.12

    Methodology and Ecosystem for the Design of a Complex Network ASIC

    Full text link
    Performance of HPC systems has risen steadily. While the 10 Petaflop/s barrier has been breached in the year 2011 the next large step into the exascale era is expected sometime between the years 2018 and 2020. The EXTOLL project will be an integral part in this venture. Originally designed as a research project on FPGA basis it will make the transition to an ASIC to improve its already excelling performance even further. This transition poses many challenges that will be presented in this thesis. Nowadays, it is not enough to look only at single components in a system. EXTOLL is part of complex ecosystem which must be optimized overall since everything is tightly interwoven and disregarding some aspects can cause the whole system either to work with limited performance or even to fail. This thesis examines four different aspects in the design hierarchy and proposes efficient solutions or improvements for each of them. At first it takes a look at the design implementation and the differences between FPGA and ASIC design. It introduces a methodology to equip all on-chip memory with ECC logic automatically without the user’s input and in a transparent way so that the underlying code that uses the memory does not have to be changed. In the next step the floorplanning process is analyzed and an iterative solution is worked out based on physical and logical constraints of the EXTOLL design. Besides, a work flow for collaborative design is presented that allows multiple users to work on the design concurrently. The third part concentrates on the high-speed signal path from the chip to the connector and how it is affected by technological limitations. All constraints are analyzed and a package layout for the EXTOLL chip is proposed that is seen as the optimal solution. The last part develops a cost model for wafer and package level test and raises technological concerns that will affect the testing methodology. In order to run testing internally it proposes the development of a stand-alone test platform that is able to test packaged EXTOLL chips in every aspect
    corecore