438 research outputs found

    공정변이를 고려한 3차원 집적 회로 설계 및 패키징 기법

    Get PDF
    학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2014. 2. 김태환.As CMOS scaling down, The control of variation in chip performance (i.e. speed and power) becomes highly important to improve the chip yield. The increased variation of chip performance demands additional design efforts such as the increase of guard-band or longer design turnaround time (TAT), which cause degradation of both chip performance and economic profit. Meanwhile, through-silicon via (TSV) based 3D technology has been regarded as the promising solution for long interconnect wire and huge die size problem. Since a 3D IC is manufactured by stacking multiple dies which are fabricated in different wafers, integration of the dies that have far different process characteristic can enlarge the difference of device performance on different dies within a single chip. In this dissertation, we analyze the effect of on-package (within-chip) variation on 3D IC and presents effective methods to mitigate the onpackage variation. First, a parametric yield improvement method is presented to resolve the mismatches of dies having different process characteristic. Comprehensive 3D integration algorithms considering post-silicon tuning technique is developed for the multi-layered 3D IC. Then, we show that a careful clock edge embedding in 3D clock tree can greatly reduce the impact of on-package variation on 3D clock skew and propose a two-step solution for the problem of on-package variation-aware layer embedding in 3D clock tree synthesis. In summary, this dissertation presents effective 3D integration method and 3D clock tree synthesis algorithm for process-variation tolerant 3D IC designs.Abstract i Contents ii List of Figures iv List of Tables vii 1 Introduction 1 1.1 Process Variation in 3D ICs . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Contributions of This Dissertation . . . . . . . . . . . . . . . . . . . 6 2 Post-silicon Tuning Aware Die/WaferMatching Algorithms for Enhancing Parametric Yield of 3D IC Design 7 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.3 The Die-to-Die Matching Problem and Proposed Algorithm Considering Body Biasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.3.1 Motivation and Problem Definition . . . . . . . . . . . . . . 13 2.3.2 The Proposed Die-to-Die Matching Algorithm . . . . . . . . 15 2.4 TheWafer-to-Wafer Matching Problem and Proposed Algorithm Considering Body Biasing . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.4.1 Problem Definition and The Proposed Wafer-to-Wafer Matching Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3 Edge Layer Embedding Algorithm for Mitigating On-Package Variation in 3D Clock Tree Synthesis 32 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.2 Problem Definitions and Motivation . . . . . . . . . . . . . . . . . . 35 3.3 The Proposed Algorithm for On-Package Variation Aware Edge Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.3.1 Algorithm for Maximizing Layer Sharing of Edges . . . . . . 39 3.3.2 Refinement: Partial Edge Embedding on Layers . . . . . . . . 47 3.3.3 Clock Tree Routing and Buffer Insertion . . . . . . . . . . . . 49 3.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 4 Conclusion 64 4.1 Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 4.2 Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 Abstract in Korean 72Docto

    Ring-Based Resonant Standing Wave Oscillators for 3D Clocking Applications

    Get PDF
    Ring-based resonant standing wave oscillators have been shown to be a useful clocking tech-nique that can distribute and generate a high frequency, low skew, low power, and stable clock signal. By using through-silicon-vias, this type of standing wave oscillator can be used to gener-ate the clocking scheme for 3D integrated circuits. In this thesis, we propose the use of such 3D standing wave oscillators and show how independent 3D oscillators in different stacks can syn-chronize through the use of a redistribution layer stub. Inter-chip clock synchronization is then accomplished without the need for a PLL. In addition, we propose the first 3D ring-based resonant standing wave oscillator bootstrap and reset circuit to initialize and stop oscillation. Using a 3D ring-based resonant standing wave oscillator, we propose a ring-based data fabric for 3D stacked DRAM and compare the results with existing approaches such as High Bandwidth Memory (HBM) or Wide I/O memory. We show that our Memory Architecture using a Ring-based Scheme (MARS) can provide the increases in speed necessary to overcome current memory bottlenecks, and can scale effectively as future 3D stacks become larger. Our MARS can trade off power, throughput, and latency to match different application requirements. By using a narrow bus, and connecting it to all channels, the MARS8 can provide an alternative memory configuration with ∼ 6.9× lower power consumption than HBM, and ∼ 2.7× faster speeds than Wide I/O. Using multiple ring topologies in the same stack, the channel count can double from 8 to 16, and then to 32. This is possible since MARS uses about 4× fewer TSVs per channel than HBM or Wide I/O. This provides speeds up to ∼ 4.2× faster than traditional HBM. This scalable architecture allows higher throughput and faster system performance for next-generation DRAM. The MARS topology proposed in this thesis can be used in a variety of computing systems, from lightweight IoT to large-scale data centers

    Exploring Adaptive Implementation of On-Chip Networks

    Get PDF
    As technology geometries have shrunk to the deep submicron regime, the communication delay and power consumption of global interconnections in high performance Multi- Processor Systems-on-Chip (MPSoCs) are becoming a major bottleneck. The Network-on- Chip (NoC) architecture paradigm, based on a modular packet-switched mechanism, can address many of the on-chip communication issues such as performance limitations of long interconnects and integration of large number of Processing Elements (PEs) on a chip. The choice of routing protocol and NoC structure can have a significant impact on performance and power consumption in on-chip networks. In addition, building a high performance, area and energy efficient on-chip network for multicore architectures requires a novel on-chip router allowing a larger network to be integrated on a single die with reduced power consumption. On top of that, network interfaces are employed to decouple computation resources from communication resources, to provide the synchronization between them, and to achieve backward compatibility with existing IP cores. Three adaptive routing algorithms are presented as a part of this thesis. The first presented routing protocol is a congestion-aware adaptive routing algorithm for 2D mesh NoCs which does not support multicast (one-to-many) traffic while the other two protocols are adaptive routing models supporting both unicast (one-to-one) and multicast traffic. A streamlined on-chip router architecture is also presented for avoiding congested areas in 2D mesh NoCs via employing efficient input and output selection. The output selection utilizes an adaptive routing algorithm based on the congestion condition of neighboring routers while the input selection allows packets to be serviced from each input port according to its congestion level. Moreover, in order to increase memory parallelism and bring compatibility with existing IP cores in network-based multiprocessor architectures, adaptive network interface architectures are presented to use multiple SDRAMs which can be accessed simultaneously. In addition, a smart memory controller is integrated in the adaptive network interface to improve the memory utilization and reduce both memory and network latencies. Three Dimensional Integrated Circuits (3D ICs) have been emerging as a viable candidate to achieve better performance and package density as compared to traditional 2D ICs. In addition, combining the benefits of 3D IC and NoC schemes provides a significant performance gain for 3D architectures. In recent years, inter-layer communication across multiple stacked layers (vertical channel) has attracted a lot of interest. In this thesis, a novel adaptive pipeline bus structure is proposed for inter-layer communication to improve the performance by reducing the delay and complexity of traditional bus arbitration. In addition, two mesh-based topologies for 3D architectures are also introduced to mitigate the inter-layer footprint and power dissipation on each layer with a small performance penalty.Siirretty Doriast

    Exploration and Design of Power-Efficient Networked Many-Core Systems

    Get PDF
    Multiprocessing is a promising solution to meet the requirements of near future applications. To get full benefit from parallel processing, a manycore system needs efficient, on-chip communication architecture. Networkon- Chip (NoC) is a general purpose communication concept that offers highthroughput, reduced power consumption, and keeps complexity in check by a regular composition of basic building blocks. This thesis presents power efficient communication approaches for networked many-core systems. We address a range of issues being important for designing power-efficient manycore systems at two different levels: the network-level and the router-level. From the network-level point of view, exploiting state-of-the-art concepts such as Globally Asynchronous Locally Synchronous (GALS), Voltage/ Frequency Island (VFI), and 3D Networks-on-Chip approaches may be a solution to the excessive power consumption demanded by today’s and future many-core systems. To this end, a low-cost 3D NoC architecture, based on high-speed GALS-based vertical channels, is proposed to mitigate high peak temperatures, power densities, and area footprints of vertical interconnects in 3D ICs. To further exploit the beneficial feature of a negligible inter-layer distance of 3D ICs, we propose a novel hybridization scheme for inter-layer communication. In addition, an efficient adaptive routing algorithm is presented which enables congestion-aware and reliable communication for the hybridized NoC architecture. An integrated monitoring and management platform on top of this architecture is also developed in order to implement more scalable power optimization techniques. From the router-level perspective, four design styles for implementing power-efficient reconfigurable interfaces in VFI-based NoC systems are proposed. To enhance the utilization of virtual channel buffers and to manage their power consumption, a partial virtual channel sharing method for NoC routers is devised and implemented. Extensive experiments with synthetic and real benchmarks show significant power savings and mitigated hotspots with similar performance compared to latest NoC architectures. The thesis concludes that careful codesigned elements from different network levels enable considerable power savings for many-core systems.Siirretty Doriast

    Voltage stacking for near/sub-threshold operation

    Get PDF

    Development of Trigger and Control Systems for CMS

    Get PDF
    During the year of 2007, the Large Hadron Collider (LHC) and its four main detectors will begin operation with a view to answering the most pressing questions in particle physics. However before one can analyse the data produced to find the rare phenomena being looked for, both the detector and readout electronics must be thoroughly tested to ensure that the system will operate in a consistent way. The Compact Muon Solenoid (CMS) is one of the two general-purpose detectors at CERN. The tracking component of the design produces more data than any previous detector used in particle physics, with approximately ten million detector channels. The data from the detector is processed by the tracker Front End Driver (FED). The large data volume necessitated the development of a buffering and throttling system to prevent buffer overflow both on and off the detector. A critical component of this system is the APV emulator (APVe), which vetoes trigger decisions based on buffer status in the tracker. The commissioning of these components, along with a large part of the Timing, Trigger and Control (TTC) system is discussed, including the various modifications that were made to improve the robustness of the full system. Another key piece of the CMS electronics is the calorimeter trigger system, responsible for identifying âinteresting' physical events in a background of well-understood phenomena using calorimetric information. Calorimeter information is processed to identify various trigger objects by the Global Calorimeter Trigger (GCT). The first component of this system is the Source card, which has been developed to transfer data from the Regional Calorimeter Trigger (RCT) to the Leaf card, the processing engine of the GCT. The use of modern programmable logic with high speed optical links is discussed, emphasising its use for data concentration and the benefit it confers to the processing algorithms. Looking forward to Super-LHC, a possible addition to the CMS Level-1 trigger system is discussed, incorporating information from a new pixel detector with an alternative stacked geometry that allows the possibility of on-detector data rate reduction by means of a transverse momentum cut. A toy Monte Carlo was developed to study detector performance. Issues with high-speed reconstruction and the complications of on-detector data rate reduction are also discussed

    Power Reductions with Energy Recovery Using Resonant Topologies

    Get PDF
    The problem of power densities in system-on-chips (SoCs) and processors has become more exacerbated recently, resulting in high cooling costs and reliability issues. One of the largest components of power consumption is the low skew clock distribution network (CDN), driving large load capacitance. This can consume as much as 70% of the total dynamic power that is lost as heat, needing elaborate sensing and cooling mechanisms. To mitigate this, resonant clocking has been utilized in several applications over the past decade. An improved energy recovering reconfigurable generalized series resonance (GSR) solution with all the critical support circuitry is developed in this work. This LC resonant clock driver is shown to save about 50% driver power (\u3e40% overall), on a 22nm process node and has 50% less skew than a non-resonant driver at 2GHz. It can operate down to 0.2GHz to support other energy savings techniques like dynamic voltage and frequency scaling (DVFS). As an example, GSR can be configured for the simpler pulse series resonance (PSR) operation to enable further power saving for double data rate (DDR) applications, by using de-skewing latches instead of flip-flop banks. A PSR based subsystem for 40% savings in clocking power with 40% driver active area reduction xii is demonstrated. This new resonant driver generates tracking pulses at each transition of clock for dual edge operation across DVFS. PSR clocking is designed to drive explicit-pulsed latches with negative setup time. Simulations using 45nm IBM/PTM device and interconnect technology models, clocking 1024 flip-flops show the reductions, compared to non-resonant clocking. DVFS range from 2GHz/1.3V to 200MHz/0.5V is obtained. The PSR frequency is set \u3e3× the clock rate, needing only 1/10th the inductance of prior-art LC resonance schemes. The skew reductions are achieved without needing to increase the interconnect widths owing to negative set-up times. Applications in data circuits are shown as well with a 90nm example. Parallel resonant and split-driver non-resonant configurations as well are derived from GSR. Tradeoffs in timing performance versus power, based on theoretical analysis, are compared for the first time and verified. This enables synthesis of an optimal topology for a given application from the GSR

    Physical Design Methodologies for Low Power and Reliable 3D ICs

    Get PDF
    As the semiconductor industry struggles to maintain its momentum down the path following the Moore's Law, three dimensional integrated circuit (3D IC) technology has emerged as a promising solution to achieve higher integration density, better performance, and lower power consumption. However, despite its significant improvement in electrical performance, 3D IC presents several serious physical design challenges. In this dissertation, we investigate physical design methodologies for 3D ICs with primary focus on two areas: low power 3D clock tree design, and reliability degradation modeling and management. Clock trees are essential parts for digital system which dissipate a large amount of power due to high capacitive loads. The majority of existing 3D clock tree designs focus on minimizing the total wire length, which produces sub-optimal results for power optimization. In this dissertation, we formulate a 3D clock tree design flow which directly optimizes for clock power. Besides, we also investigate the design methodology for clock gating a 3D clock tree, which uses shutdown gates to selectively turn off unnecessary clock activities. Different from the common assumption in 2D ICs that shutdown gates are cheap thus can be applied at every clock node, shutdown gates in 3D ICs introduce additional control TSVs, which compete with clock TSVs for placement resources. We explore the design methodologies to produce the optimal allocation and placement for clock and control TSVs so that the clock power is minimized. We show that the proposed synthesis flow saves significant clock power while accounting for available TSV placement area. Vertical integration also brings new reliability challenges including TSV's electromigration (EM) and several other reliability loss mechanisms caused by TSV-induced stress. These reliability loss models involve complex inter-dependencies between electrical and thermal conditions, which have not been investigated in the past. In this dissertation we set up an electrical/thermal/reliability co-simulation framework to capture the transient of reliability loss in 3D ICs. We further derive and validate an analytical reliability objective function that can be integrated into the 3D placement design flow. The reliability aware placement scheme enables co-design and co-optimization of both the electrical and reliability property, thus improves both the circuit's performance and its lifetime. Our electrical/reliability co-design scheme avoids unnecessary design cycles or application of ad-hoc fixes that lead to sub-optimal performance. Vertical integration also enables stacking DRAM on top of CPU, providing high bandwidth and short latency. However, non-uniform voltage fluctuation and local thermal hotspot in CPU layers are coupled into DRAM layers, causing a non-uniform bit-cell leakage (thereby bit flip) distribution. We propose a performance-power-resilience simulation framework to capture DRAM soft error in 3D multi-core CPU systems. In addition, a dynamic resilience management (DRM) scheme is investigated, which adaptively tunes CPU's operating points to adjust DRAM's voltage noise and thermal condition during runtime. The DRM uses dynamic frequency scaling to achieve a resilience borrow-in strategy, which effectively enhances DRAM's resilience without sacrificing performance. The proposed physical design methodologies should act as important building blocks for 3D ICs and push 3D ICs toward mainstream acceptance in the near future
    corecore