956 research outputs found

    Concurrent optimization strategies for high-performance VLSI circuits

    In the next generation of VLSI circuits, concurrent optimizations will be essential to meet the performance challenges. In this dissertation, we present techniques for combining traditional timing optimization techniques to achieve superior performance.

    The method of buffer insertion is used in timing optimization either to increase the driving power of a path in a circuit, or to isolate large capacitive loads that lie on noncritical or less critical paths. The procedure of transistor sizing selects the sizes of transistors within a circuit to achieve a given timing specification. Traditional design techniques perform these two optimizations as independent steps during synthesis, even though they are intimately linked, and performing them in alternating steps is liable to lead to suboptimal solutions. The first part of this thesis presents a new approach for unifying transistor sizing with buffer insertion. Our algorithm achieves from 5% to 49% area reduction compared with the results of a standard transistor sizing algorithm.

    The next part of the thesis deals with the problem of collapsing gates for technology mapping. Two new techniques are proposed. The first, the odd-level transistor replacement (OTR) method, performs technology mapping without the restriction of a fixed library size and maps a circuit to a virtual library of complex static CMOS gates. The second, the Static CMOS/PTL method, uses a mix of static CMOS and pass transistor logic (PTL) to realize the circuit, exploiting the relation between PTL and binary decision diagrams. The methods are very efficient and can handle all ISCAS'85 benchmark circuits in minutes. On average, the OTR method gave 40%, and the Static/PTL method 50%, delay reduction over SIS, with substantial area savings.

    Finally, we extend the technology mapping work to interleave it with placement in a single optimization. Conventional methods that perform these steps separately will not be adequate for next-generation circuits. Our approach presents an integrated solution to this problem and shows an average of 28.19%, and a maximum of 78.42%, improvement in delay over a method that performs the two optimizations in separate steps.
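    The isolation effect that motivates buffer insertion can be seen with a toy Elmore-delay calculation. The sketch below is illustrative only: it is not the unified sizing/buffering algorithm of the dissertation, and every resistance, capacitance, and buffer parameter in it is a made-up value.

```python
# A minimal sketch, assuming a simple Elmore (lumped RC) delay model, of why a
# buffer in front of a large non-critical load speeds up the critical sink:
# the driver no longer has to charge that load directly. All numbers below are
# hypothetical.

R_DRV    = 1.0e3    # driver output resistance (ohm), assumed
C_CRIT   = 20e-15   # capacitance on the critical branch (F), assumed
C_SIDE   = 200e-15  # large capacitive load on a non-critical branch (F), assumed
R_BUF    = 1.5e3    # buffer output resistance (ohm), assumed
C_BUF_IN = 5e-15    # buffer input capacitance (F), assumed
T_BUF    = 30e-12   # buffer intrinsic delay (s), assumed

def critical_delay_without_buffer():
    # The driver sees both branches directly, so the critical sink pays for C_SIDE too.
    return R_DRV * (C_CRIT + C_SIDE)

def delays_with_buffer():
    # The buffer isolates the side load; the driver now sees only C_CRIT plus
    # the buffer's input capacitance. The side branch absorbs the extra delay.
    critical = R_DRV * (C_CRIT + C_BUF_IN)
    side = critical + T_BUF + R_BUF * C_SIDE
    return critical, side

print("critical sink, no buffer  : %.0f ps" % (critical_delay_without_buffer() * 1e12))
crit, side = delays_with_buffer()
print("critical sink, with buffer: %.0f ps (isolated side branch: %.0f ps)"
      % (crit * 1e12, side * 1e12))
```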

    Gate-level timing analysis and waveform evaluation

    Static timing analysis (STA) is an integral part of modern VLSI chip design. Table-lookup-based methods are widely used in industry today because of their fast runtime and mature algorithms. Conventional STA algorithms based on table-lookup methods were developed under many assumptions; however, most of those assumptions, such as that input and output signals can be accurately modeled as ramp waveforms, no longer satisfy the increasing accuracy demands of new technologies. In this dissertation, we discuss several crucial issues that conventional STA has not taken into consideration, propose new methods to handle them, and show that the new methods produce accurate results.

    In logic circuits, gates may have multiple inputs, and signals can arrive at these inputs at different times and with different waveforms. Different arrival times and waveforms can cause very different responses. However, multiple-input transition effects are overlooked by current STA tools, and using a conventional single-input transition model when a multiple-input transition happens can cause significant estimation errors in timing analysis. Previous works on this issue focus on developing complicated gate models to simulate the behavior of logic gates. These methods have high computational cost, require significant changes to the prevailing STA tools, and are thus not feasible in practice. This dissertation proposes a simplified gate model that uses transistor connection structures to capture the behavior of multiple-input transitions and requires no change to current STA tools.

    Another issue with table-lookup-based methods is that the load of each gate in technology libraries is modeled as a single lumped capacitor. In a real circuit, however, a gate connects to its subsequent gates via metal wires. As the feature size of integrated circuits scales down, the interconnect can no longer be treated as a simple capacitor, since the resistive shielding effect largely affects the equivalent capacitance seen from the gate. Because interconnects come in numerous structures, tabulating timing data for every interconnection structure is not feasible. In this dissertation, using the concept of equivalent admittance, we reduce an arbitrary interconnection structure to an equivalent π-model RC circuit. Many previous works map the π-model to an effective capacitor, which makes table-lookup-based methods usable again. However, a single capacitor cannot be equivalent to a π-model circuit and thus results in significant inaccuracy in waveform evaluation. To obtain an accurate waveform at the gate output, a piecewise waveform evaluation method is proposed in this dissertation, in which each part of the piecewise waveform is evaluated according to the gate characteristics and load structure.

    Another contribution of this dissertation is a proposed equivalent waveform search method. Signal waveforms can be very complicated in real circuits because of noise, race hazards, and similar effects. Conventional STA uses only one attribute (i.e., transition time) to describe the waveform shape, which can cause significant estimation errors. Our approach is to develop heuristic search functions that find equivalent ramps to approximate input waveforms. The transition time of the final ramp can be completely different from that of the original waveform, but it yields higher accuracy in output arrival time and transition time.

    All of the methods in this dissertation require no changes to the prevailing STA tools and have been verified across different process technologies.
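    To make the load-reduction step concrete, the sketch below shows the classical three-moment (O'Brien/Savarino-style) construction of a π-model from a load's driving-point admittance. It is not necessarily the equivalent-admittance procedure of this dissertation, and the example moments are invented.

```python
# A minimal sketch, assuming the classical three-moment matching (O'Brien /
# Savarino style) construction, of how an RC load described by its driving-
# point admittance Y(s) ~ y1*s + y2*s^2 + y3*s^3 can be reduced to a pi-model
# (C_near -- R -- C_far) instead of a single effective capacitor. This is not
# necessarily the equivalent-admittance procedure of the dissertation, and the
# example moments below are invented.

def pi_model_from_moments(y1, y2, y3):
    """Return (c_near, r, c_far) matching the first three admittance moments."""
    c_far = y2 * y2 / y3           # far-end capacitance
    r = -(y3 * y3) / (y2 ** 3)     # series resistance
    c_near = y1 - c_far            # near-end capacitance
    return c_near, r, c_far

# Hypothetical moments of some RC tree's driving-point admittance:
y1 = 150e-15    # first moment = total capacitance (F)
y2 = -2.0e-24   # second moment
y3 = 4.0e-35    # third moment

c_near, r, c_far = pi_model_from_moments(y1, y2, y3)
print("pi-model load: C_near = %.1f fF, R = %.1f ohm, C_far = %.1f fF"
      % (c_near * 1e15, r, c_far * 1e15))
```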

    Modeling and Analysis of Noise and Interconnects for On-Chip Communication Link Design

    This thesis considers modeling and analysis of noise and interconnects in on-chip communication. Besides transistor count and speed, the capabilities of a modern design are often limited by on-chip communication links. These links typically consist of multiple interconnects that run parallel to each other for long distances between functional or memory blocks. Due to technology scaling, the interconnects have considerable electrical parasitics that affect their performance, power dissipation and signal integrity. Furthermore, because of electromagnetic coupling, the interconnects in a link need to be considered as an interacting group instead of as isolated signal paths. Accurate and computationally effective models are needed in the early stages of the chip design process to assess or optimize issues affecting these interconnects. For this purpose, a set of analytical models is developed for on-chip data links in this thesis. First, a model is proposed for crosstalk and intersymbol interference. The model takes into account the effects of inductance, initial states and bit sequences. Intersymbol interference is shown to affect crosstalk voltage and propagation delay depending on bus throughput and the amount of inductance. Next, a model is proposed for the switching current of a coupled bus. The model is combined with an existing model to evaluate power supply noise, and is then applied to reduce both functional crosstalk and power supply noise caused by a bus as a trade-off with time. The proposed reduction method is shown to be effective in reducing long-range crosstalk noise. The effects of process variation on encoded signaling are then modeled. In encoded signaling, the input signals to a bus are encoded using additional signaling circuitry. The proposed model includes variation in both the signaling circuitry and the wires to calculate the total delay variation of a bus, and is applied to study level-encoded dual-rail and 1-of-4 signaling. In addition to regular voltage-mode and encoded voltage-mode signaling, current-mode signaling is a promising technique for global communication. A model for energy dissipation in RLC current-mode signaling is proposed in the thesis, with the energy derived separately for the driver, wire and receiver termination.
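    As a back-of-the-envelope illustration of the coupling effects these models target, the sketch below simulates capacitive crosstalk onto a quiet victim line using a single lumped RC node per line. It is far simpler than the analytical models of the thesis (no inductance, no distributed effects, no bit sequences), and all element values are invented.

```python
# A minimal lumped-element sketch, assuming one RC node per line plus a
# coupling capacitor, of capacitive crosstalk from a switching aggressor onto
# a quiet victim. Inductance, distributed effects and bit sequences handled by
# the thesis models are ignored, and all element values are invented.

RD  = 500.0     # driver resistance per line (ohm), assumed
CG  = 100e-15   # line-to-ground capacitance (F), assumed
CC  = 80e-15    # coupling capacitance between the lines (F), assumed
VDD = 1.0       # supply voltage (V)
DT  = 0.1e-12   # integration time step (s)
T_END = 200e-12 # simulated time (s)

# Nodal capacitance matrix [[CG+CC, -CC], [-CC, CG+CC]] and its inverse.
a, b = CG + CC, -CC
det = a * a - b * b
inv = [[a / det, -b / det], [-b / det, a / det]]

va = vb = 0.0          # node voltages: aggressor, victim
peak_xtalk = 0.0
t = 0.0
while t < T_END:
    ia = (VDD - va) / RD          # aggressor driver switches low -> high at t = 0
    ib = (0.0 - vb) / RD          # victim driver holds its line at 0 V
    va += (inv[0][0] * ia + inv[0][1] * ib) * DT   # forward-Euler step: dV = C^-1 * I * dt
    vb += (inv[1][0] * ia + inv[1][1] * ib) * DT
    peak_xtalk = max(peak_xtalk, vb)
    t += DT

print("peak crosstalk on the quiet victim: %.3f V (%.1f%% of VDD)"
      % (peak_xtalk, 100.0 * peak_xtalk / VDD))
```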

    Design and modelling of variability tolerant on-chip communication structures for future high performance system on chip designs

    The incessant technology scaling has enabled the integration of functionally complex System-on-Chip (SoC) designs with a large number of heterogeneous systems on a single chip. The processing elements on these chips are integrated through on-chip communication structures, which provide the infrastructure necessary for the exchange of data and control signals while meeting stringent physical and design constraints. The use of vast amounts of on-chip communication will be central to future designs, where variability is an inherent characteristic. For this reason, in this thesis we investigate the performance and variability tolerance of typical on-chip communication structures. Understanding the relationship between variability and communication is paramount for designers seeking to devise new methods and techniques for performance- and power-efficient communication circuits in the face of the challenges presented by deep sub-micron (DSM) technologies.

    The initial part of this work investigates the impact of device variability due to Random Dopant Fluctuations (RDF) on the timing characteristics of basic communication elements. The characterization data so obtained can be used to estimate the performance and failure probability of simple links through the methodology proposed in this work. For the Statistical Static Timing Analysis (SSTA) of larger circuits, a method for accurate estimation of the probability density functions of different circuit parameters is proposed, and its significance for pipelined circuits is highlighted.

    Power and area are among the most important design metrics for any integrated circuit (IC) design. This thesis emphasises the consideration of communication reliability while optimizing for power and area. A methodology is proposed for the simultaneous optimization of performance, area, power and delay variability for a repeater-inserted interconnect. Similarly, for multi-bit parallel links, bandwidth-driven optimizations have been performed. Power- and area-efficient semi-serial links, which are less vulnerable to delay variations than the corresponding fully parallel links, are introduced.

    Furthermore, due to technology scaling, the coupling noise between link lines has become an important issue. With ever-decreasing supply voltages and the corresponding reduction in noise margins, severe challenges arise in performing timing verification in the presence of variability. For this reason, an accurate model for crosstalk noise in an interconnection as a function of time and skew is introduced in this work. This model can be used to identify the skew condition that gives maximum delay noise, and also for efficient design verification.
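    The sketch below illustrates, in the simplest possible terms, how per-repeater variation accumulates into link-level delay variability and a timing-failure probability. It is a plain Monte Carlo experiment with a first-order linear delay sensitivity, not the characterization or SSTA methodology of the thesis, and every number in it (stage count, nominal delay, sensitivity, sigma, timing target) is assumed.

```python
# A minimal Monte Carlo sketch, assuming a first-order linear delay sensitivity
# to RDF-induced threshold-voltage shifts, of how per-repeater variation
# accumulates into link delay variability and a timing-failure probability.
# This is not the characterization or SSTA methodology of the thesis; every
# number below (stage count, delays, sensitivity, sigma, target) is assumed.

import random
import statistics

N_STAGES  = 10        # repeater stages in the link
T_NOM     = 25e-12    # nominal delay per stage (s)
K_VTH     = 0.4e-12   # delay sensitivity (s per mV of Vth shift)
SIGMA_VTH = 20.0      # RDF-induced Vth sigma per device (mV)
T_TARGET  = 270e-12   # timing target for the whole link (s)
N_SAMPLES = 100_000

def link_delay():
    # Independent Vth shifts per stage; stage delays add along the line.
    return sum(T_NOM + K_VTH * random.gauss(0.0, SIGMA_VTH) for _ in range(N_STAGES))

samples = [link_delay() for _ in range(N_SAMPLES)]
mean = statistics.mean(samples)
sigma = statistics.stdev(samples)
p_fail = sum(d > T_TARGET for d in samples) / N_SAMPLES

print("mean link delay : %.1f ps" % (mean * 1e12))
print("delay sigma     : %.1f ps" % (sigma * 1e12))
print("P(delay > %.0f ps) = %.4f" % (T_TARGET * 1e12, p_fail))
```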

    Delay Extraction Based Equivalent Elmore Model For RLC On-Chip Interconnects

    As VLSI feature sizes shrink and operating frequencies rise, signal integrity analysis of on-chip interconnects has become a real challenge for circuit designers. Computer-aided design (CAD) tools are therefore needed to simulate signal propagation in on-chip interconnects, which has been an active area of research. Although SPICE models exist that can accurately predict the signal degradation of interconnects, they are computationally expensive. As a result, more efficient analytic models are required to capture the response at the outputs of high-speed VLSI circuits. This thesis contributes to the development of efficient, closed-form models for signal integrity analysis of on-chip interconnects. The proposed model uses a delay extraction algorithm to improve the accuracy of two-pole Elmore-based models used in the analysis of on-chip distributed RLC interconnects. In the proposed scheme, the time-of-flight signal delay is extracted without increasing the number of poles or affecting the stability of the transfer function. The algorithm is applied to both unit step and ramp inputs. From the delay rational approximation of the transfer function, analytic fitted expressions are obtained for the 50% delay and rise time under a unit step input. The proposed algorithm is tested on point-to-point interconnections and tree-structured networks. Numerical examples show improved 50% delay and rise time estimates compared to traditional Elmore-based two-pole models.
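    For reference, the sketch below computes the plain Elmore delay of a small RC tree and the classic 0.69*T_Elmore estimate of the 50% step-response delay, the kind of first-order baseline that two-pole and delay-extraction models improve on. It does not include inductance or the time-of-flight extraction described above, and the tree topology and RC values are invented.

```python
# A minimal sketch of the first-order baseline that two-pole / delay-extraction
# models improve upon: the Elmore delay of an RC tree, with the classic
# 0.69 * T_Elmore single-pole estimate of the 50% step-response delay.
# Inductance and time-of-flight extraction are not modeled, and the tree
# topology and RC values are invented.

# node name -> (parent, branch resistance to parent in ohm, node capacitance in F)
TREE = {
    "root":  (None,   0.0,   0.0),     # driving point
    "n1":    ("root", 100.0, 20e-15),
    "sinkA": ("n1",   150.0, 30e-15),
    "sinkB": ("n1",   120.0, 50e-15),
}

def path_edges(node):
    """Edges (child_node, branch_R) on the path from the root down to `node`."""
    edges = []
    while node is not None:
        parent, r, _ = TREE[node]
        if parent is not None:
            edges.append((node, r))
        node = parent
    return edges

def elmore_delay(sink):
    """T_D(sink) = sum over nodes k of C_k * (resistance shared by root->sink and root->k)."""
    sink_edges = dict(path_edges(sink))
    total = 0.0
    for k, (_, _, ck) in TREE.items():
        shared_r = sum(r for child, r in path_edges(k) if child in sink_edges)
        total += ck * shared_r
    return total

for sink in ("sinkA", "sinkB"):
    td = elmore_delay(sink)
    print("%s: Elmore delay = %.1f ps, ~50%% step delay = %.1f ps"
          % (sink, td * 1e12, 0.69 * td * 1e12))
```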

    Exploration and Design of High Performance Variation Tolerant On-Chip Interconnects


    Throughput-Centric Wave-Pipelined Interconnect Circuits for Gigascale Integration

    The central thesis of this research is that VLSI interconnect design strategies should shift from using global wires that can support only a single binary transition during the latency of the line to global wires that can sustain multiple bits traveling simultaneously along the length of the line. It is shown in this thesis that such throughput-centric multibit transmission can be achieved by wave-pipelining the interconnects using repeaters. A holistic analysis of wave-pipelined interconnect circuits, along with the full-custom optimization of these circuits, is performed in this research. With the help of models and methodologies developed in this thesis, design rules for repeater insertion are crafted to simultaneously optimize the performance, power, and area of VLSI global interconnect networks through a simultaneous application of voltage scaling and wire sizing. A qualitative analysis of latency, throughput, signal integrity, power dissipation, and area compares the results of the design optimizations in this work to those of conventional global interconnect circuits. The objective of this thesis is to study the circuit- and system-level opportunities of voltage scaling, wire sizing, and repeater insertion in wave-pipelined global interconnect networks implemented in deep submicron technologies.
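    As context for the repeater design rules mentioned above, the sketch below evaluates the classic delay-optimal (latency-centric) repeater count and size formulas that throughput-centric schemes are typically compared against. These are the well-known closed-form expressions, not the design rules developed in this thesis, and the driver and wire parameters are invented.

```python
# A minimal sketch of the classic delay-optimal (latency-centric) repeater
# insertion formulas, given here only as the conventional baseline that
# throughput-centric wave-pipelined design rules are typically compared
# against; the design rules developed in this thesis are not reproduced.
# The driver and wire parameters below are invented.

import math

R0 = 2.0e3    # output resistance of a minimum-size repeater (ohm), assumed
C0 = 2.0e-15  # input capacitance of a minimum-size repeater (F), assumed
RW = 4.0e3    # total resistance of the global wire (ohm), assumed
CW = 600e-15  # total capacitance of the global wire (F), assumed

k_opt = math.sqrt(0.4 * RW * CW / (0.7 * R0 * C0))  # optimal number of repeater stages
h_opt = math.sqrt(R0 * CW / (RW * C0))              # optimal repeater size (x minimum)
t_opt = 2.5 * math.sqrt(R0 * C0 * RW * CW)          # classic ~50% delay of the repeated line

print("optimal repeater count : %.1f stages" % k_opt)
print("optimal repeater size  : %.1fx minimum" % h_opt)
print("optimal line delay     : %.0f ps" % (t_opt * 1e12))
```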

    Power Management for Deep Submicron Microprocessors

    As VLSI technology scales, the enhanced performance of smaller transistors comes at the expense of increased power consumption. In addition to the dynamic power consumed by the circuits, there is a tremendous increase in leakage power consumption, further exacerbated by increasing operating temperatures. The total power consumption of modern processors is distributed between the processor core, memory and interconnects. In this research, two novel power management techniques are presented, targeting the functional units and the global interconnects.

    First, since most leakage control schemes for processor functional units are based on circuit-level techniques, such schemes inherently lack information about the operational profile of higher-level components of the system. This is a barrier to the pivotal task of predicting standby time; without this prediction, it is extremely difficult to assess the value of any leakage control scheme. Consequently, a methodology that can predict standby time is highly beneficial in bridging the gap between the information available at the application level and the circuit implementations. In this work, a novel Dynamic Sleep Signal Generator (DSSG) is presented. It uses usage traces extracted from cycle-accurate simulations of benchmark programs to predict the long standby periods associated with the various functional units. The DSSG bases its decisions on the current and previous standby states of the functional units to accurately predict the length of the next standby period. It presents an alternative to Static Sleep Signal Generation (SSSG), which relies on static counters that trigger the sleep signal when a functional unit has idled for a prespecified number of cycles. The DSSG is evaluated on a modified RISC superscalar processor implemented in SimpleScalar, the most widely accepted open-source vehicle for architectural analysis, and the results are further verified with SMTSIM, a simultaneous multithreading simulator. The results show an increase of up to 146% in leakage savings using the DSSG versus the SSSG, with 60-80% accuracy in predicting long standby periods.

    Second, chip designers, in their effort to achieve timing closure, have focused on achieving the lowest possible interconnect delay through buffer insertion and routing techniques. This approach, though, taxes the power budget of modern ICs, especially those intended for wireless applications. Moreover, die sizes are constantly increasing in order to provide more functionality. This trend leads to an increase in the average global interconnect length, which in turn requires more buffers to achieve timing closure. Unconstrained buffering is bound to adversely affect overall chip performance if power consumption is counted as a major performance metric; in fact, the number of global interconnect buffers is expected to reach hundreds of thousands to achieve appropriate timing closure. To mitigate the impact of the power consumed by interconnect buffers, a power-efficient multi-pin routing technique is proposed in this research. The problem is formulated on a graph representation of the routing possibilities, including buffer insertion, and identifies the least-power path between the interconnect source and its set of sinks. The technique is tested on the ISPD and IBM benchmarks to verify its accuracy, complexity, and solution quality. The results indicate average power savings as high as 32% for the 130-nm technology, with no impact on the maximum chip frequency.
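    The graph formulation described above can be pictured with a small least-power path search. The sketch below runs Dijkstra's algorithm over a toy routing graph whose edge weights stand in for wire-plus-buffer power cost; it is not the multi-pin routing technique of this research, and the graph, node names, and costs are all invented.

```python
# A minimal sketch, using Dijkstra's algorithm over an invented routing graph,
# of the least-power path idea described above: edges represent routing
# segments (optionally with a buffer), weighted by their power cost. This is
# not the multi-pin routing technique proposed in this research; node names,
# topology and costs are all hypothetical.

import heapq

# node -> list of (neighbor, power_cost); cost could combine wire and buffer power
GRAPH = {
    "src":   [("a", 3.0), ("b", 5.0)],
    "a":     [("b", 1.0), ("sink1", 6.0)],
    "b":     [("sink1", 2.0), ("sink2", 4.0)],
    "sink1": [],
    "sink2": [],
}

def least_power_costs(source):
    """Minimum power cost from `source` to every reachable node (Dijkstra)."""
    cost = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        c, node = heapq.heappop(heap)
        if c > cost.get(node, float("inf")):
            continue                     # stale heap entry
        for nxt, w in GRAPH[node]:
            if c + w < cost.get(nxt, float("inf")):
                cost[nxt] = c + w
                heapq.heappush(heap, (c + w, nxt))
    return cost

costs = least_power_costs("src")
for sink in ("sink1", "sink2"):
    print("least-power cost src -> %s: %.1f" % (sink, costs[sink]))
```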

    Performance and power optimization in VLSI physical design

    As VLSI technology enters the nanoscale regime, a great deal of effort has been made to reduce interconnect delay. Among these efforts, buffer insertion stands out as an effective technique for timing optimization, and a dramatic rise in on-chip buffer density has been witnessed; for example, in two recent IBM ASIC designs, 25% of the gates are buffers. In this thesis, three buffer insertion algorithms are presented for performance and power optimization.

    The second chapter focuses on improving circuit performance under inductance effects. The new algorithm works within the dynamic programming framework and runs in provably linear time for multiple buffer types, thanks to two novel techniques: restrictive cost bucketing and efficient delay update. The experimental results demonstrate that our linear-time algorithm consistently outperforms all known RLC buffering algorithms in both solution quality and runtime: the new algorithm uses fewer buffers, runs in less time, and produces buffered trees with better timing.

    The third chapter presents a method to guarantee high-fidelity signal transmission on a global bus. It proposes a new redundant via insertion technique to reduce via variation and signal distortion in a twisted differential line. In addition, a new buffer insertion technique is proposed to synchronize the transmitted signals, further improving the effectiveness of the twisted differential line. Experimental results demonstrate that a 6 GHz signal can be transmitted with high fidelity using the new approaches, whereas only a 100 MHz signal can be reliably transmitted using a single-ended bus with power/ground shielding. Compared to the conventional twisted differential line structure, our new techniques reduce the noise magnitude by 45% in our simulations.

    The fourth chapter proposes a buffer insertion and gate sizing algorithm for circuits with a million-plus gates. The algorithm takes a combinational circuit as input instead of individual nets and greatly reduces the buffer and gate cost of the entire circuit. The algorithm has two main features: 1) a circuit partitioning technique based on the criticality of the primary inputs, which provides scalability, and 2) a linear programming formulation of the non-linear delay-versus-cost tradeoff, which casts simultaneous buffer insertion and gate sizing as a linear programming problem. Experimental results on ISCAS85 circuits show that, even without the circuit partitioning technique, the new algorithm achieves a 17X speedup compared with a path-based algorithm while saving 16.0% in buffer cost, 4.9% in gate cost, and 5.8% in total cost, and results in less circuit delay.
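    The dynamic programming framework referred to above builds on the classic van Ginneken style candidate propagation, sketched below for a single two-pin net with evenly spaced candidate buffer locations. The linear-time bucketing and delay-update techniques of this thesis are not reproduced, and all RC and buffer parameters are invented.

```python
# A minimal sketch, assuming the classic van Ginneken style dynamic program
# that buffering algorithms in this family build on: candidates are
# (load_capacitance, required_arrival_time) pairs propagated from the sink
# toward the driver, with dominated candidates pruned. The linear-time cost
# bucketing and delay-update techniques of the thesis are not reproduced, and
# all RC and buffer parameters are invented.

R_SEG, C_SEG = 50.0, 10e-15                 # per-segment wire R (ohm) and C (F), assumed
R_BUF, C_BUF, T_BUF = 1000.0, 4e-15, 20e-12 # buffer parameters, assumed
R_DRIVER = 200.0                            # driver output resistance (ohm), assumed

def add_wire(cands):
    """Propagate candidates across one wire segment using an Elmore pi model."""
    return [(c + C_SEG, q - R_SEG * (C_SEG / 2 + c)) for c, q in cands]

def add_buffer_option(cands):
    """At a candidate location, keep both the unbuffered and the buffered choice."""
    out = list(cands)
    for c, q in cands:
        out.append((C_BUF, q - T_BUF - R_BUF * c))   # buffer drives downstream load c
    return prune(out)

def prune(cands):
    """Drop candidates dominated by another with smaller load and larger slack."""
    kept, best_q = [], float("-inf")
    for c, q in sorted(cands, key=lambda cq: (cq[0], -cq[1])):
        if q > best_q:
            kept.append((c, q))
            best_q = q
    return kept

# One sink (50 fF load, 500 ps required time), 8 wire segments, with a
# candidate buffer location between each pair of segments.
cands = [(50e-15, 500e-12)]
for seg in range(8):
    cands = add_wire(cands)
    if seg < 7:
        cands = add_buffer_option(cands)

best_slack = max(q - R_DRIVER * c for c, q in cands)   # account for the driver's RC delay
print("best required arrival time at the driver input: %.1f ps" % (best_slack * 1e12))
```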

    Interconnect Design with Large Transistor Constraints for Multi-chip Modules and Large Die Soi/Sos
