99 research outputs found

    Heterogeneous 2.5D integration on through silicon interposer

    Get PDF
    © 2015 AIP Publishing LLC. Driven by the need to reduce the power consumption of mobile devices, and servers/data centers, and yet continue to deliver improved performance and experience by the end consumer of digital data, the semiconductor industry is looking for new technologies for manufacturing integrated circuits (ICs). In this quest, power consumed in transferring data over copper interconnects is a sizeable portion that needs to be addressed now and continuing over the next few decades. 2.5D Through-Si-Interposer (TSI) is a strong candidate to deliver improved performance while consuming lower power than in previous generations of servers/data centers and mobile devices. These low-power/high-performance advantages are realized through achievement of high interconnect densities on the TSI (higher than ever seen on Printed Circuit Boards (PCBs) or organic substrates), and enabling heterogeneous integration on the TSI platform where individual ICs are assembled at close proximity

    Reliable Design of Three-Dimensional Integrated Circuits

    Get PDF

    Architecting a One-to-many Traffic-Aware and Secure Millimeter-Wave Wireless Network-in-Package Interconnect for Multichip Systems

    Get PDF
    With the aggressive scaling of device geometries, the yield of complex Multi Core Single Chip(MCSC) systems with many cores will decrease due to the higher probability of manufacturing defects especially, in dies with a large area. Disintegration of large System-on-Chips(SoCs) into smaller chips called chiplets has shown to improve the yield and cost of complex systems. Therefore, platform-based computing modules such as embedded systems and micro-servers have already adopted Multi Core Multi Chip (MCMC) architectures overMCSC architectures. Due to the scaling of memory intensive parallel applications in such systems, data is more likely to be shared among various cores residing in different chips resulting in a significant increase in chip-to-chip traffic, especially one-to-many traffic. This one-to-many traffic is originated mainly to maintain cache-coherence between many cores residing in multiple chips. Besides, one-to-many traffics are also exploited by many parallel programming models, system-level synchronization mechanisms, and control signals. How-ever, state-of-the-art Network-on-Chip (NoC)-based wired interconnection architectures do not provide enough support as they handle such one-to-many traffic as multiple unicast trafficusing a multi-hop MCMC communication fabric. As a result, even a small portion of such one-to-many traffic can significantly reduce system performance as traditional NoC-basedinterconnect cannot mask the high latency and energy consumption caused by chip-to-chipwired I/Os. Moreover, with the increase in memory intensive applications and scaling of MCMC systems, traditional NoC-based wired interconnects fail to provide a scalable inter-connection solution required to support the increased cache-coherence and synchronization generated one-to-many traffic in future MCMC-based High-Performance Computing (HPC) nodes. Therefore, these computation and memory intensive MCMC systems need an energy-efficient, low latency, and scalable one-to-many (broadcast/multicast) traffic-aware interconnection infrastructure to ensure high-performance. Research in recent years has shown that Wireless Network-in-Package (WiNiP) architectures with CMOS compatible Millimeter-Wave (mm-wave) transceivers can provide a scalable, low latency, and energy-efficient interconnect solution for on and off-chip communication. In this dissertation, a one-to-many traffic-aware WiNiP interconnection architecture with a starvation-free hybrid Medium Access Control (MAC), an asymmetric topology, and a novel flow control has been proposed. The different components of the proposed architecture are individually one-to-many traffic-aware and as a system, they collaborate with each other to provide required support for one-to-many traffic communication in a MCMC environment. It has been shown that such interconnection architecture can reduce energy consumption and average packet latency by 46.96% and 47.08% respectively for MCMC systems. Despite providing performance enhancements, wireless channel, being an unguided medium, is vulnerable to various security attacks such as jamming induced Denial-of-Service (DoS), eavesdropping, and spoofing. Further, to minimize the time-to-market and design costs, modern SoCs often use Third Party IPs (3PIPs) from untrusted organizations. An adversary either at the foundry or at the 3PIP design house can introduce a malicious circuitry, to jeopardize an SoC. Such malicious circuitry is known as a Hardware Trojan (HT). An HTplanted in the WiNiP from a vulnerable design or manufacturing process can compromise a Wireless Interface (WI) to enable illegitimate transmission through the infected WI resulting in a potential DoS attack for other WIs in the MCMC system. Moreover, HTs can be used for various other malicious purposes, including battery exhaustion, functionality subversion, and information leakage. This information when leaked to a malicious external attackercan reveals important information regarding the application suites running on the system, thereby compromising the user profile. To address persistent jamming-based DoS attack in WiNiP, in this dissertation, a secure WiNiP interconnection architecture for MCMC systems has been proposed that re-uses the one-to-many traffic-aware MAC and existing Design for Testability (DFT) hardware along with Machine Learning (ML) approach. Furthermore, a novel Simulated Annealing (SA)-based routing obfuscation mechanism was also proposed toprotect against an HT-assisted novel traffic analysis attack. Simulation results show that,the ML classifiers can achieve an accuracy of 99.87% for DoS attack detection while SA-basedrouting obfuscation could reduce application detection accuracy to only 15% for HT-assistedtraffic analysis attack and hence, secure the WiNiP fabric from age-old and emerging attacks

    Cross-layer design of thermally-aware 2.5D systems

    Full text link
    Over the past decade, CMOS technology scaling has slowed down. To sustain the historic performance improvement predicted by Moore's Law, in the mid-2000s the computing industry moved to using manycore systems and exploiting parallelism. The on-chip power densities of manycore systems, however, continued to increase after the breakdown of Dennard's Scaling. This leads to the `dark silicon' problem, whereby not all cores can operate at the highest frequency or can be turned on simultaneously due to thermal constraints. As a result, we have not been able to take full advantage of the parallelism in manycore systems. One of the 'More than Moore' approaches that is being explored to address this problem is integration of diverse functional components onto a substrate using 2.5D integration technology. 2.5D integration provides opportunities to exploit chiplet placement flexibility to address the dark silicon problem and mitigate the thermal stress of today's high-performance systems. These opportunities can be leveraged to improve the overall performance of the manycore heterogeneous computing systems. Broadly, this thesis aims at designing thermally-aware 2.5D systems. More specifically, to address the dark silicon problem of manycore systems, we first propose a single-layer thermally-aware chiplet organization methodology for homogeneous 2.5D systems. The key idea is to strategically insert spacing between the chiplets of a 2.5D manycore system to lower the operating temperature, and thus reclaim dark silicon by allowing more active cores and/or higher operating frequency under a temperature threshold. We investigate manufacturing cost and thermal behavior of 2.5D systems, then formulate and solve an optimization problem that jointly maximizes performance and minimizes manufacturing cost. We then enhance our methodology by incorporating a cross-layer co-optimization approach. We jointly maximize performance and minimize manufacturing cost and operating temperature across logical, physical, and circuit layers. We propose a novel gas-station link design that enables pipelining in passive interposers. We then extend our thermally-aware optimization methodology for network routing and chiplet placement of heterogeneous 2.5D systems, which consist of central processing unit (CPU) chiplets, graphics processing unit (GPU) chiplets, accelerator chiplets, and/or memory stacks. We jointly minimize the total wirelength and the system temperature. Our enhanced methodology increases the thermal design power budget and thereby improves thermal-constraint performance of the system

    Thermal performance enhancement of packaging substrates with integrated vapor chamber

    Get PDF
    The first part of this research investigates the effects of copper structures, such as copper through-package-vias (TPVs), and copper traces in redistribution layer (RDL), on the thermal performance of glass interposers through numerical and experimental approaches. Numerical parametric study on 2.5D interposers shows that as more copper structures are incorporated in glass interposers, the performance of silicon and glass interposers becomes closer, showing 31% difference in thermal resistance, compared to 53% difference without any copper structures in both interposers. In the second part of this study, a thermal model of glass interposer mounted on the vapor chamber integrated PCB is developed using multi-scale modeling scheme. The comparison of thermal performance between silicon and glass interposers shows that integration of vapor chamber with PCB makes thermal performance of both interposers almost identical, overcoming the limitation posed by low thermal conductivity of glass. The third part of this thesis focuses on design, fabrication, and performance measurement of PCB integrated with vapor chamber. Copper micropillar wick structure is fabricated on PCB with electroplating process, and its wettability is enhanced by silica nanoparticle coating. Design of the wick for the vapor chamber is determined based on the capillary performance and permeability test results. Fabricated device with ultra-thin thickness (~800 µm) shows higher thermal performance than copper plated PCB with the same thickness. Finally, 3D computational fluid dynamics/heat transfer model of the vapor chamber is developed, and modeling result is compared with test result.Ph.D

    Investigation into yield and reliability enhancement of TSV-based three-dimensional integration circuits

    No full text
    Three dimensional integrated circuits (3D ICs) have been acknowledged as a promising technology to overcome the interconnect delay bottleneck brought by continuous CMOS scaling. Recent research shows that through-silicon-vias (TSVs), which act as vertical links between layers, pose yield and reliability challenges for 3D design. This thesis presents three original contributions.The first contribution presents a grouping-based technique to improve the yield of 3D ICs under manufacturing TSV defects, where regular and redundant TSVs are partitioned into groups. In each group, signals can select good TSVs using rerouting multiplexers avoiding defective TSVs. Grouping ratio (regular to redundant TSVs in one group) has an impact on yield and hardware overhead. Mathematical probabilistic models are presented for yield analysis under the influence of independent and clustering defect distributions. Simulation results using MATLAB show that for a given number of TSVs and TSV failure rate, careful selection of grouping ratio results in achieving 100% yield at minimal hardware cost (number of multiplexers and redundant TSVs) in comparison to a design that does not exploit TSV grouping ratios. The second contribution presents an efficient online fault tolerance technique based on redundant TSVs, to detect TSV manufacturing defects and address thermal-induced reliability issue. The proposed technique accounts for both fault detection and recovery in the presence of three TSV defects: voids, delamination between TSV and landing pad, and TSV short-to-substrate. Simulations using HSPICE and ModelSim are carried out to validate fault detection and recovery. Results show that regular and redundant TSVs can be divided into groups to minimise area overhead without affecting the fault tolerance capability of the technique. Synthesis results using 130-nm design library show that 100% repair capability can be achieved with low area overhead (4% for the best case). The last contribution proposes a technique with joint consideration of temperature mitigation and fault tolerance without introducing additional redundant TSVs. This is achieved by reusing spare TSVs that are frequently deployed for improving yield and reliability in 3D ICs. The proposed technique consists of two steps: TSV determination step, which is for achieving optimal partition between regular and spare TSVs into groups; The second step is TSV placement, where temperature mitigation is targeted while optimizing total wirelength and routing difference. Simulation results show that using the proposed technique, 100% repair capability is achieved across all (five) benchmarks with an average temperature reduction of 75.2? (34.1%) (best case is 99.8? (58.5%)), while increasing wirelength by a small amount

    Signaling in 3-D integrated circuits, benefits and challenges

    Get PDF
    Three-dimensional (3-D) or vertical integration is a design and packaging paradigm that can mitigate many of the increasing challenges related to the design of modern integrated systems. 3-D circuits have recently been at the spotlight, since these circuits provide a potent approach to enhance the performance and integrate diverse functions within amulti-plane stack. Clock networks consume a great portion of the power dissipated in a circuit. Therefore, designing a low-power clock network in synchronous circuits is an important task. This requirement is stricter for 3-D circuits due to the increased power densities. Synchronization issues can be more challenging for 3-D circuits since a clock path can spread across several planes with different physical and electrical characteristics. Consequently, designing low power clock networks for 3-D circuits is an important issue. Resonant clock networks are considered efficient low-power alternatives to conventional clock distribution schemes. These networks utilize additional inductive circuits to reduce power while delivering a full swing clock signal to the sink nodes. In this research, a design method to apply resonant clocking to synthesized clock trees is proposed. Manufacturing processes for 3-D circuits include some additional steps as compared to standard CMOS processes which makes 3-D circuits more susceptible to manufacturing defects and lowers the overall yield of the bonded 3-D stack. Testing is another complicated task for 3-D ICs, where pre-bond test is a prerequisite. Pre-bond testability, in turn, presents new challenges to 3-D clock network design primarily due to the incomplete clock distribution networks prior to the bonding of the planes. A design methodology of resonant 3-D clock networks that support wireless pre-bond testing is introduced. To efficiently address this issue, inductive links are exploited to wirelessly transmit the clock signal to the disjoint resonant clock networks. The inductors comprising the LC tanks are used as the receiver circuit for the links, essentially eliminating the need for additional circuits and/or interconnect resources during pre-bond test. Recent FPGAs are quite complex circuits which provide reconfigurablity at the cost of lower performance and higher power consumption as compared to ASIC circuits. Exploiting a large number of programmable switches, routing structures are mainly responsible for performance degradation in FPAGs. Employing 3-D technology can providemore efficient switches which drastically improve the performance and reduce the power consumption of the FPGA. RRAM switches are one of the most promising candidates to improve the FPGA routing architecture thanks to their low on-resistance and non-volatility. Along with the configurable switches, buffers are the other important element of the FPGAs routing structure. Different characteristics of RRAM switches change the properties of signal paths in RRAM-based FPGAs. The on resistance of RRAMswitches is considerably lower than CMOS pass gate switches which results in lower RC delay for RRAM-based routing paths. This different nature in critical path and signal delay in turn affect the need for intermediate buffers. Thus the buffer allocation should be reconsidered. In the last part of this research, the effect of intermediate buffers on signal propagation delay is studied and a modified buffer allocation scheme for RRAM-based FPGA routing path is proposed

    Modeling, design, materials, processes and reliability of multi-layer redistribution wiring layers on glass substrates for next generation of high-performance computing applications

    Get PDF
    There is a growing demand for high performance computing with miniaturization in many electronic systems such as servers for cloud computing, accurate weather prediction, smart mobile and wearable devices and autonomous driving cars. The development of 2.5D silicon interposers in 2010 for heterogeneous integration of graphical processing unit (GPU) to high bandwidth memory (HBM) dies addressed this demand to a certain extent. The back-end-of-line (BEOL) RDL processes in silicon interposers have reached the peak with data rate per signal trace due to the high resistance and capacitance of BEOL RDL, limiting the system bandwidth for 2.5D silicon interposers. The cost of such interposers is also high for large body size substrates (> 1200 sq. mm) due to the fabrication on wafer-based platforms and hence, such interposers have been primarily been used today for cost-insensitive applications like cloud computing. To address these limitations, panel-based organic package substrates with a vast range of body sizes (500-5000 sq.mm) have been under development. These low-cost, high performance panel-based substrates will bring down the cost of high-performance computing systems as well as introduce 2.5D interposers for consumer applications like mobile computing. However, such panel-based substrates have not been able to scale multi-layer polymer RDL below 5 µm RDL which is the primary requirement for 2.5D interposer substrates. The objectives of this research are to address the scaling limitations of multi-layer polymer RDL down to 2 µm and below. This research is focused on addressing these limitations by: (A) Modeling for layer-to-layer registration to predict the fundamental limit of capture pad required for laminate and glass core substrates (B) Design of multi-layer polymer RDL for 5X bandwidth and 3X lower latency than silicon BEOL RDL (C) Design and demonstration of novel materials and processes for scaling polymer RDL well below 2 µm using low-cost panel-based tools and processes (D) Reliability analysis of 2 µm multi-layer polymer RDL and identifying future needs for novel polymer dielectrics for scaling polymer RDL to sub 1-micron features.Ph.D
    • …
    corecore