1,101 research outputs found

    Variant X-Tree Clock Distribution Network and Its Performance Evaluations

    Get PDF

    Performance and power optimization in VLSI physical design

    Get PDF
    As VLSI technology enters the nanoscale regime, a great amount of efforts have been made to reduce interconnect delay. Among them, buffer insertion stands out as an effective technique for timing optimization. A dramatic rise in on-chip buffer density has been witnessed. For example, in two recent IBM ASIC designs, 25% gates are buffers. In this thesis, three buffer insertion algorithms are presented for the procedure of performance and power optimization. The second chapter focuses on improving circuit performance under inductance effect. The new algorithm works under the dynamic programming framework and runs in provably linear time for multiple buffer types due to two novel techniques: restrictive cost bucketing and efficient delay update. The experimental results demonstrate that our linear time algorithm consistently outperforms all known RLC buffering algorithms in terms of both solution quality and runtime. That is, the new algorithm uses fewer buffers, runs in shorter time and the buffered tree has better timing. The third chapter presents a method to guarantee a high fidelity signal transmission in global bus. It proposes a new redundant via insertion technique to reduce via variation and signal distortion in twisted differential line. In addition, a new buffer insertion technique is proposed to synchronize the transmitted signals, thus further improving the effectiveness of the twisted differential line. Experimental results demonstrate a 6GHz signal can be transmitted with high fidelity using the new approaches. In contrast, only a 100MHz signal can be reliably transmitted using a single-end bus with power/ground shielding. Compared to conventional twisted differential line structure, our new techniques can reduce the magnitude of noise by 45% as witnessed in our simulation. The fourth chapter proposes a buffer insertion and gate sizing algorithm for million plus gates. The algorithm takes a combinational circuit as input instead of individual nets and greatly reduces the buffer and gate cost of the entire circuit. The algorithm has two main features: 1) A circuit partition technique based on the criticality of the primary inputs, which provides the scalability for the algorithm, and 2) A linear programming formulation of non-linear delay versus cost tradeoff, which formulates the simultaneous buffer insertion and gate sizing into linear programming problem. Experimental results on ISCAS85 circuits show that even without the circuit partition technique, the new algorithm achieves 17X speedup compared with path based algorithm. In the meantime, the new algorithm saves 16.0% buffer cost, 4.9% gate cost, 5.8% total cost and results in less circuit delay

    Delay Extraction Based Equivalent Elmore Model For RLC On-Chip Interconnects

    Get PDF
    As feature sizes for VLSI technology is shrinking, associated with higher operating frequency, signal integrity analysis of on-chip interconnects has become a real challenge for circuit designers. For this purpose, computer-aided-design (CAD) tools are necessary to simulate signal propagation of on-chip interconnects which has been an active area for research. Although SPICE models exist which can accurately predict signal degradation of interconnects, they are computationally expensive. As a result, more effective and analytic models for interconnects are required to capture the response at the output of high speed VLSI circuits. This thesis contributes to the development of efficient and closed form solution models for signal integrity analysis of on-chip interconnects. The proposed model uses a delay extraction algorithm to improve the accuracy of two-pole Elmore based models used in the analysis of on-chip distributed RLC interconnects. In the proposed scheme, the time of fight signal delay is extracted without increasing the number of poles or affecting the stability of the transfer function. This algorithm is used for both unit step and ramp inputs. From the delay rational approximation of the transfer function, analytic fitted expressions are obtained for the 50% delay and rise time for unit step input. The proposed algorithm is tested on point to point interconnections and tree structure networks. Numerical examples illustrate improved 50% delay and rise time estimates when compared to traditional Elmore based two-pole models

    Power Reductions with Energy Recovery Using Resonant Topologies

    Get PDF
    The problem of power densities in system-on-chips (SoCs) and processors has become more exacerbated recently, resulting in high cooling costs and reliability issues. One of the largest components of power consumption is the low skew clock distribution network (CDN), driving large load capacitance. This can consume as much as 70% of the total dynamic power that is lost as heat, needing elaborate sensing and cooling mechanisms. To mitigate this, resonant clocking has been utilized in several applications over the past decade. An improved energy recovering reconfigurable generalized series resonance (GSR) solution with all the critical support circuitry is developed in this work. This LC resonant clock driver is shown to save about 50% driver power (\u3e40% overall), on a 22nm process node and has 50% less skew than a non-resonant driver at 2GHz. It can operate down to 0.2GHz to support other energy savings techniques like dynamic voltage and frequency scaling (DVFS). As an example, GSR can be configured for the simpler pulse series resonance (PSR) operation to enable further power saving for double data rate (DDR) applications, by using de-skewing latches instead of flip-flop banks. A PSR based subsystem for 40% savings in clocking power with 40% driver active area reduction xii is demonstrated. This new resonant driver generates tracking pulses at each transition of clock for dual edge operation across DVFS. PSR clocking is designed to drive explicit-pulsed latches with negative setup time. Simulations using 45nm IBM/PTM device and interconnect technology models, clocking 1024 flip-flops show the reductions, compared to non-resonant clocking. DVFS range from 2GHz/1.3V to 200MHz/0.5V is obtained. The PSR frequency is set \u3e3× the clock rate, needing only 1/10th the inductance of prior-art LC resonance schemes. The skew reductions are achieved without needing to increase the interconnect widths owing to negative set-up times. Applications in data circuits are shown as well with a 90nm example. Parallel resonant and split-driver non-resonant configurations as well are derived from GSR. Tradeoffs in timing performance versus power, based on theoretical analysis, are compared for the first time and verified. This enables synthesis of an optimal topology for a given application from the GSR

    Energy-efficient acceleration of MPEG-4 compression tools

    Get PDF
    We propose novel hardware accelerator architectures for the most computationally demanding algorithms of the MPEG-4 video compression standard-motion estimation, binary motion estimation (for shape coding), and the forward/inverse discrete cosine transforms (incorporating shape adaptive modes). These accelerators have been designed using general low-energy design philosophies at the algorithmic/architectural abstraction levels. The themes of these philosophies are avoiding waste and trading area/performance for power and energy gains. Each core has been synthesised targeting TSMC 0.09 μm TCBN90LP technology, and the experimental results presented in this paper show that the proposed cores improve upon the prior art

    Design, Extraction, and Optimization Tool Flows and Methodologies for Homogeneous and Heterogeneous Multi-Chip 2.5D Systems

    Get PDF
    Chip and packaging industries are making significant progress in 2.5D design as a result of increasing popularity of their application. In advanced high-density 2.5D packages, package redistribution layers become similar to chip Back-End-of-Line routing layers, and the gap between them scales down with pin density improvement. Chiplet-package interactions become significant and severely affect system performance and reliability. Moreover, 2.5D integration offers opportunities to apply novel design techniques. The traditional die-by-die design approach neither carefully considers these interactions nor fully exploits the cross-boundary design opportunities. This thesis presents chiplet-package cross-boundary design, extraction, analysis, and optimization tool flows and methodologies for high-density 2.5D packaging technologies. A holistic flow is presented that can capture all parasitics from chiplets and the package and improve system performance through iterative optimizations. Several design techniques are demonstrated for agile development and quick turn-around time. To validate the flow in silicon, a chip was taped out and studied in TSMC 65nm technology. As the holistic flow cannot handle heterogeneous technologies, in-context flows are presented. Three different flavors of the in-context flow are presented, which offer trade-offs between scalability and accuracy in heterogeneous 2.5D system designs. Inductance is an inseparable part of a package design. A holistic flow is presented that takes package inductance into account in timing analysis and optimization steps. Custom CAD tools are developed to make these flows compatible with the industry standard tools and the foundry model. To prove the effectiveness of the flows several design cases of an ARM Cortex-M0 are implemented for comparitive study

    Modeling and Analysis of Noise and Interconnects for On-Chip Communication Link Design

    Get PDF
    This thesis considers modeling and analysis of noise and interconnects in onchip communication. Besides transistor count and speed, the capabilities of a modern design are often limited by on-chip communication links. These links typically consist of multiple interconnects that run parallel to each other for long distances between functional or memory blocks. Due to the scaling of technology, the interconnects have considerable electrical parasitics that affect their performance, power dissipation and signal integrity. Furthermore, because of electromagnetic coupling, the interconnects in the link need to be considered as an interacting group instead of as isolated signal paths. There is a need for accurate and computationally effective models in the early stages of the chip design process to assess or optimize issues affecting these interconnects. For this purpose, a set of analytical models is developed for on-chip data links in this thesis. First, a model is proposed for modeling crosstalk and intersymbol interference. The model takes into account the effects of inductance, initial states and bit sequences. Intersymbol interference is shown to affect crosstalk voltage and propagation delay depending on bus throughput and the amount of inductance. Next, a model is proposed for the switching current of a coupled bus. The model is combined with an existing model to evaluate power supply noise. The model is then applied to reduce both functional crosstalk and power supply noise caused by a bus as a trade-off with time. The proposed reduction method is shown to be effective in reducing long-range crosstalk noise. The effects of process variation on encoded signaling are then modeled. In encoded signaling, the input signals to a bus are encoded using additional signaling circuitry. The proposed model includes variation in both the signaling circuitry and in the wires to calculate the total delay variation of a bus. The model is applied to study level-encoded dual-rail and 1-of-4 signaling. In addition to regular voltage-mode and encoded voltage-mode signaling, current-mode signaling is a promising technique for global communication. A model for energy dissipation in RLC current-mode signaling is proposed in the thesis. The energy is derived separately for the driver, wire and receiver termination.Siirretty Doriast

    Wave computing with passive memristive networks

    Get PDF
    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Since CMOS technology approaches its physical limits, the spotlight of computing technologies and architectures shifts to unconventional computing approaches. In this area, novel computing systems, inspired by natural and mostly nonelectronic approaches, provide also new ways of performing a wide range of computations, from simple logic gates to solving computationally hard problems. Reaction-diffusion processes constitute an information processing method, occurs in nature and are capable of massive parallel and low-power computing, such as chemical computing through Belousov-Zhabotinsky reaction. In this paper, inspired by these chemical processes and based on the wave-propagation information processing taking place in the reaction-diffusion media, the novel characteristics of the nanoelectronic element memristor are utilized to design innovative circuits of electronic excitable medium to perform both classical (Boolean) calculations and to model neuromorphic computations in the same Memristor-RLC (M-RLC) reconfigurable network.Peer ReviewedPostprint (author's final draft

    Interconnect delay modeling under exponential input

    Get PDF
    Interconnect has become the dominating factor in determining the performance of VLSI deep submicron designs. With the rapid shrinking of feature size and development in the process technologies, it has been observed that the resistance per unit length of the interconnect continues to increase, capacitance per unit length remains roughly constant, and transistor or gate delay continues to decrease. This had led to the increasing dominance of interconnect delay over logic delay, and this trend is expected to continue. With this being the main bottleneck in realizing high speed circuits, complete understanding of the interconnect delay and thereby efficient and accurate delay circulation has assumed a greater significance in physical design, optimization and fast verification. In this thesis, a interconnect delay model under exponential input is presented. Because of its simple closed form expression, fast computation speed, and fidelity with respect to simulation, Elmore delay model remains popular. More accurate delay computation methods are typically central processing unit intensive and/or difficult to implement. To bridge this gap between accuracy and efficiency/simplicity, a new RC delay metric for interconnects which is as efficient as the Elmore metric, but more accurate, is proposed. However, there is no interconnect delay model considering exponential input waveform existing in the literature. The proposed Exponential Delay Metric uses exponential waveform as input and captures resistive shielding effects by modeling the downstream by a [pi]-model. An application of the delay model to perform interconnect optimization using wire sizing is also presented. Experimental results show that the proposed delay model is significantly more accurate than the existing interconnect delay models
    corecore