68 research outputs found

    Yield and Cost Analysis of 3D Stacked ICs

    3D stacking is an emerging technology promising many benefits, such as low latency between stacked dies, reduced power consumption, high-bandwidth communication, improved form factor and package volume density, heterogeneous integration, and low-cost manufacturing. However, it requires the modification of existing methods and/or the introduction of new ones with respect to design, manufacturing, and testing in order to facilitate production. This thesis addresses three challenges: one related to manufacturing (yield improvement) and two related to testing (cost modeling and interconnect testing).

    Yield Improvement - We propose two yield improvement schemes applicable to 3D Stacked ICs (3D-SICs) with similar die sizes (such as memories and FPGAs): wafer matching and layer redundancy. Wafer matching is based on algorithms that select wafers with identical or similar fault maps for stacking in order to boost the compound yield. Our algorithms outperform previously proposed schemes in terms of yield and, more importantly, reduce memory and time complexity significantly. Redundancy in 3D memories, on the other hand, makes use not only of conventional spare rows and columns, but also of the third dimension to access either spare dies (layer redundancy) or spare cells (inter-layer redundancy). Layer redundancy proved to be effective from a yield point of view, but may seriously affect die area and cost. Inter-layer redundancy realizes even higher yield improvements; however, it requires through-silicon vias (TSVs) to scale down by an order of magnitude for area-efficient implementations.

    Cost Modeling - Selecting an appropriate and efficient test flow for 2.5D/3D-SICs is crucial for overall cost optimization. In addition, diverse products and applications require different quality levels and hence different test flows; these flows may require different design-for-test (DfT) features, which need to be incorporated in the various dies at an early design stage. Therefore, an appropriate cost model that evaluates test flows and their associated DfT, while taking into account yields and die production costs, is of great importance. We developed such a cost modeling tool for 2.5D/3D stacked ICs, referred to as 3D-COSTAR. It considers all costs involved in the whole production chain, including design, manufacturing, test, packaging, and logistics, e.g., those related to shipping wafers between a foundry and a test house. 3D-COSTAR provides the estimated overall cost for a 2.5D/3D-SIC and its cost breakdown for a given set of input parameters, such as test flow, die yield, and stack yield. Its value is demonstrated by analyzing the trade-offs of several complex test optimization problems: (a) the impact of the test coverage of the pre-bond silicon interposer test, (b) the impact of pre-bond testing of active dies using either dedicated probe pads or micro-bumps, (c) the impact of mid-bond testing and logistics, and (d) the impact of different test flows on test escapes.

    Interconnect Testing - A potential application of 3D-SICs is the stacking of memory on logic. However, testing the TSV interconnects between such dies is challenging, as the memory and the logic die typically come from different manufacturers. Currently proposed solutions fail to address dynamic and time-critical faults. In addition, memory vendors have historically been reluctant to add DfT structures such as IEEE 1149.1 for interconnect testing to their memory devices. We propose a new Memory-Based Interconnect Test (MBIT) approach for 3D stacked memories. Our test patterns are applied using read and write instructions to the memory and are validated by a case study in which a 3D memory is assumed to be stacked on a MIPS64 processor. The main benefits of the MBIT approach include zero area overhead, detection of both static and dynamic faults, at-speed testing, flexibility, extremely short test times, and interconnect fault diagnosis.
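
    For the wafer-matching scheme described under yield improvement above, the sketch below gives a rough feel for the idea on randomly generated fault maps: two lots are stacked either blindly or by greedily pairing wafers whose good-die maps overlap most. The die count, lot size, die yield, and the greedy heuristic are illustrative assumptions only and do not reproduce the thesis algorithms or their complexity results.

        # Minimal sketch of wafer matching for wafer-to-wafer stacking (illustrative only).
        # A wafer is modeled as a tuple of booleans: True = good die at that position.
        import random

        random.seed(1)
        DIES_PER_WAFER = 50
        WAFERS_PER_LOT = 25
        DIE_YIELD = 0.9

        def make_lot():
            return [tuple(random.random() < DIE_YIELD for _ in range(DIES_PER_WAFER))
                    for _ in range(WAFERS_PER_LOT)]

        def good_stacks(w1, w2):
            # A stacked position yields a good 3D-SIC only if both dies are good.
            return sum(a and b for a, b in zip(w1, w2))

        def blind_stacking(lot1, lot2):
            return sum(good_stacks(a, b) for a, b in zip(lot1, lot2))

        def greedy_matching(lot1, lot2):
            # For each wafer of lot1, pick the remaining lot2 wafer with the most
            # overlapping good dies (a simple stand-in for fault-map matching).
            remaining = list(lot2)
            total = 0
            for w in lot1:
                best = max(remaining, key=lambda v: good_stacks(w, v))
                remaining.remove(best)
                total += good_stacks(w, best)
            return total

        lot1, lot2 = make_lot(), make_lot()
        print("blind stacking:", blind_stacking(lot1, lot2), "good stacks")
        print("wafer matching:", greedy_matching(lot1, lot2), "good stacks")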

    A Hardware Accelerator for the OpenFOAM Sparse Matrix-Vector Product

    One of the key kernels in scientific applications is the Sparse Matrix-Vector Multiplication (SMVM). Profiling OpenFOAM, a sophisticated Computational Fluid Dynamics tool, showed the SMVM to be its most computationally intensive kernel. A traditional way to solve such computationally intensive problems in scientific applications is to employ supercomputing power. This approach, however, delivers performance at a high hardware cost. Another approach to high-performance scientific computing is based on reconfigurable hardware, which has recently become more popular due to increasing on-chip memory and bandwidth and to abundant, relatively cheap hardware resources. The SGI Reconfigurable Application Specific Computing (RASC) library combines both approaches, as it couples traditional supercomputer nodes with reconfigurable hardware. It supports the execution of computationally intensive kernels on Custom Computing Units (CCUs) in Field Programmable Gate Arrays (FPGAs). This thesis presents the architectural design and implementation of the SMVM product for the OpenFOAM toolbox on an FPGA-enabled supercomputer. The SMVM is targeted to be a CCU within the RASC machine. The proposed CCU comprises multiple Processing Elements (PEs) for IEEE-754 compliant double-precision floating-point data. Accurate equations are developed that describe the relation between the number of PEs and the available bandwidth. With two PEs and an input bandwidth of 4.8 GB/s, the hardware unit can outperform execution in pure software. Simulations suggest speedups between 2.7 and 7.3 for the SMVM kernel with four PEs; the performance increase at the kernel level is nearly linear in the number of available PEs. The SMVM kernel has been synthesized and verified for the Virtex-4 LX200 FPGA, and a hardware counter is integrated in the design to obtain accurate performance results per CCU. Although the synthesis tool reports higher frequencies, the design has been routed and executed on the Altix 450 machine at 100 MHz. Based on our experimental results, we can safely conclude that the proposed approach, using FPGAs as accelerators, has the potential to speed up the SMVM kernel compared to traditional supercomputing approaches.
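
    As a software reference point for the accelerated kernel, a minimal sparse matrix-vector product in compressed sparse row (CSR) form is shown below. The CSR layout is an assumption chosen for illustration; OpenFOAM uses its own LDU addressing, and the exact format streamed to the PEs in the RASC design is not reproduced here. The inner multiply-accumulate per non-zero element is the operation that the PEs parallelize in hardware.

        # Minimal CSR sparse matrix-vector product y = A*x (software reference only;
        # OpenFOAM's own LDU addressing and the CCU's data streaming are not modeled).
        def spmv_csr(values, col_idx, row_ptr, x):
            y = [0.0] * (len(row_ptr) - 1)
            for row in range(len(y)):
                acc = 0.0
                for k in range(row_ptr[row], row_ptr[row + 1]):
                    acc += values[k] * x[col_idx[k]]   # one multiply-accumulate per non-zero
                y[row] = acc
            return y

        # 3x3 example matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]]
        values  = [4.0, 1.0, 3.0, 2.0, 5.0]
        col_idx = [0, 2, 1, 0, 2]
        row_ptr = [0, 2, 3, 5]
        print(spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))   # [5.0, 3.0, 7.0]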

    Yield Improvement for 3D Wafer-to-Wafer Stacked Memories

    Recent enhancements in process development enable the fabrication of three-dimensional stacked ICs (3D-SICs), such as memories based on Wafer-to-Wafer (W2W) stacking. One of the major challenges facing W2W stacking is the low compound yield. This paper investigates compound yield improvement for W2W stacked memories using layer redundancy and compares it to wafer matching. First, an analytical model is provided to prove the added value of layer redundancy. Second, the impact of such a scheme on the manufacturing cost is evaluated. Third, these two parts are integrated to analyze the trade-off between yield improvement and its associated cost; the realized yield improvement is also compared to the yield gain obtained when using wafer matching. The simulation results show that for larger stack sizes, layer redundancy realizes a significant yield improvement compared to wafer matching, even at a lower cost. For example, for a stack of six layers and a die yield of 85%, a relative yield improvement of 118.79% is obtained with two redundant layers, whereas it is only 14.03% with wafer matching. The additional cost due to redundancy pays off: the cost of producing a good 3D stacked memory chip is reduced by 37.68% when using layer redundancy and by only 12.48% when using wafer matching. Moreover, the results show that the benefits of layer redundancy become extremely significant for lower die yields. Finally, layer redundancy and wafer matching are integrated to obtain further cost reductions.
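
    For intuition, a strongly simplified compound-yield model is sketched below: a stack with r redundant layers is treated as usable if at least the required number of layers is fault-free (a plain k-out-of-n approximation). The paper's own analytical and cost models are more detailed, so the numbers produced here need not match the reported figures.

        # Simplified compound-yield model for W2W stacking with layer redundancy.
        # A stack of n+r identical layers is assumed usable if at least n layers are
        # fault-free; this k-out-of-n approximation ignores repair and cost details.
        from math import comb

        def compound_yield(n, die_yield, r=0):
            layers = n + r
            return sum(comb(layers, k) * die_yield**k * (1 - die_yield)**(layers - k)
                       for k in range(n, layers + 1))

        base = compound_yield(6, 0.85)         # six layers, no redundancy
        red  = compound_yield(6, 0.85, r=2)    # six required layers plus two spares
        print(f"no redundancy : {base:.3f}")
        print(f"2 spare layers: {red:.3f} ({100 * (red / base - 1):.0f}% relative improvement)")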

    Structured Test Development Approach for Computation-in-Memory Architectures

    Testing Computation-in-Memory (CIM) designs based on emerging non-volatile memory technologies, such as resistive RAM (RRAM), is fundamentally different from testing traditional memories. Such designs allow not only for data storage (i.e., the memory configuration) but also for the execution of logical and arithmetic operations (i.e., the computing configuration). Therefore, not only are significant design changes needed in the memory array and/or the peripheral circuits, but new fault models and test approaches are needed as well. Moreover, RRAM-based CIM makes use of non-linear non-volatile devices, which makes defect modeling with the traditional linear resistor inappropriate for such device defects. Hence, even the way of doing defect modeling has to change. This paper discusses a structured test development approach for RRAM-based CIM, highlights the test challenges, and explains how testing CIM dies differs from the traditional way of testing logic and memory. Methods for defect modeling, fault modeling, and test development are discussed. The paper demonstrates that unique faults can occur in the CIM die while in the computing configuration and that these faults cannot be detected by testing the CIM die in the memory configuration alone. Moreover, it shows that testing the CIM die in the computing configuration reduces the overall test time while improving the outgoing product quality. Finally, the paper presents an outlook on the future of structured CIM test development.
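
    A toy illustration of that last point is sketched below, with invented numbers: a weak RRAM cell can still pass a normal single-cell read, yet corrupt a bit-wise AND that senses the combined current of two rows, so the fault only surfaces when the die is exercised in its computing configuration.

        # Toy illustration (invented, normalized numbers) of a fault that is hidden in
        # the memory configuration but visible in the computing configuration.
        GOOD_LRS = 1.00   # conductance of a healthy low-resistance (logic-1) cell
        WEAK_LRS = 0.40   # defective cell: conductance degraded but not zero
        READ_TH  = 0.35   # single-cell read threshold (memory configuration)
        AND_TH   = 1.50   # two-row AND threshold (computing configuration)

        def read_cell(g):           # memory configuration: one cell sensed at a time
            return 1 if g > READ_TH else 0

        def and_rows(g_a, g_b):     # computing configuration: two cells sensed together
            return 1 if g_a + g_b > AND_TH else 0

        print("memory-config read  :", read_cell(WEAK_LRS))           # 1 -> fault stays hidden
        print("computing-config AND:", and_rows(WEAK_LRS, GOOD_LRS))  # 0 instead of 1 -> detected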

    Using Hopfield Networks to Correct Instruction Faults

    Fault injection attacks pose an important threat to security-sensitive applications, such as secure communication and storage. By injecting faults into instructions, an attacker can cause information leakage or denial of service. Hence, it is important to secure the sensitive parts not only by detecting faults in the executed instructions but also by correcting them. In this work, we propose a hardware detection and correction module based on Hopfield networks. Our module is connected to the instruction buffer and validates all fetched instructions. When faults are detected, the faulty instructions are replaced by corrected ones. Experimental results on a small RISC-V processor and two RSA implementations show that we achieve near-perfect detection and around 70% correction accuracy with 9% area overhead. This correction rate is enough to secure some implementations against all considered attacks.
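
    The sketch below shows the underlying associative-recall idea in software form: a Hopfield network trained on a few valid bit patterns pulls a corrupted pattern back to the nearest stored one. The pattern width, the number of stored patterns, and the synchronous update rule are illustrative assumptions; the paper's hardware module, its instruction encoding, and its network sizing are not reproduced.

        # Toy Hopfield-style recall of a corrupted bit pattern (software sketch only).
        import numpy as np

        def train(patterns):
            # Hebbian weights from bipolar (+1/-1) patterns, zero diagonal.
            w = sum(np.outer(p, p) for p in patterns).astype(float)
            np.fill_diagonal(w, 0.0)
            return w / len(patterns)

        def recall(w, x, steps=10):
            x = x.copy()
            for _ in range(steps):                    # synchronous sign updates
                x = np.where(w @ x >= 0, 1, -1)
            return x

        rng = np.random.default_rng(0)
        stored = rng.choice([-1, 1], size=(3, 64))    # three "valid" 64-bit patterns
        w = train(stored)

        faulty = stored[1].copy()
        faulty[[5, 40]] *= -1                         # inject two bit-flip faults
        corrected = recall(w, faulty)
        print("recovered stored pattern:", bool(np.array_equal(corrected, stored[1])))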

    Online Fault Detection and Diagnosis in RRAM

    Resistive Random Access Memory (RRAM, or ReRAM) is a promising memory technology to replace Flash because of its low power consumption, high storage density, and simple integration into existing IC production processes. This has motivated many companies to invest in this technology. However, RRAM manufacturing introduces new failure mechanisms and faults that cause functional errors. Not all of these faults can be detected by state-of-the-art test and diagnosis solutions, which leads to slower product development and lower-quality products. This paper introduces a design-for-test (DFT) scheme based on a parallel multi-reference read (PMRR) circuit that can detect all RRAM array faults. The PMRR circuit replaces the standard sense amplifier and compares the cell's state to multiple references during one read operation. Thus, it can serve as a DFT scheme and a normal read circuit at the same time, which speeds up production testing and enables the online detection of faults. Furthermore, the circuit is extendable so that more references can be compared, which is required for efficient diagnosis. Finally, the references can be adjusted to maximize the production yield. The circuit outperforms state-of-the-art solutions because it can detect all RRAM faults during diagnosis, production testing, and in-field operation while minimizing yield loss.
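
    The behavioral idea behind a multi-reference read can be sketched as follows, with invented, normalized reference values: instead of comparing the sensed cell value against a single reference, it is classified against several references in one read, so intermediate (faulty) states become distinguishable from healthy logic levels. The actual PMRR circuit performs this comparison in parallel with sense amplifiers, not sequentially in software.

        # Behavioral sketch of a multi-reference read (invented reference values).
        REFERENCES = [0.2, 0.5, 0.8]   # normalized thresholds separating state regions

        def classify(read_value):
            # Regions 0 and 3 are healthy logic levels; regions 1 and 2 flag suspicious
            # intermediate states that a single-reference read would silently map to 0 or 1.
            region = sum(read_value > ref for ref in REFERENCES)
            labels = {0: "logic 0", 1: "weak 0 (suspect)", 2: "weak 1 (suspect)", 3: "logic 1"}
            return labels[region]

        for v in (0.10, 0.35, 0.65, 0.90):
            print(f"read value {v:.2f} -> {classify(v)}")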

    Smart Redundancy Schemes for ANNs Against Fault Attacks

    Artificial neural networks (ANNs) are used to accomplish a variety of tasks, including safety-critical ones. Hence, it is important to protect them against faults that can influence their decisions during operation. In this paper, we propose smart and low-cost redundancy schemes that protect the most vulnerable ANN parts against fault attacks. Experimental results show that the two proposed smart schemes perform similarly to dual modular redundancy (DMR) at a much lower cost, generally improve on the state of the art, and reach protection levels in the range of 93% to 99%.
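
    As a generic illustration of selective redundancy (and explicitly not the paper's scheme), the sketch below recomputes only those outputs of a layer that a simple vulnerability proxy, here the largest accumulated weight magnitude, marks as critical, instead of duplicating the whole layer as full DMR would.

        # Generic selective-redundancy sketch for one ANN layer (illustrative only;
        # the paper's vulnerability analysis and protection schemes are not reproduced).
        import numpy as np

        def protected_layer(w, x, protect_fraction=0.2):
            y = w @ x                                           # primary computation
            k = max(1, int(protect_fraction * w.shape[0]))
            critical = np.argsort(np.abs(w).sum(axis=1))[-k:]   # rows deemed most vulnerable
            y_check = w[critical] @ x                           # redundant recomputation
            mismatch = ~np.isclose(y[critical], y_check)
            return y, critical[mismatch]                        # outputs + flagged neuron indices

        rng = np.random.default_rng(0)
        w, x = rng.normal(size=(16, 8)), rng.normal(size=8)
        _, flagged = protected_layer(w, x)
        print("flagged neurons:", flagged)                      # empty when no fault is injected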

    Device-Aware Test for Emerging Memories: Enabling Your Test Program for DPPB Level

    This paper introduces a new test approach for emerging memory technologies such as MRAM, RRAM, and PCM: device-aware test (DAT). The DAT approach uses accurate models of device defects to derive realistic fault models, which are then used to develop high-quality, optimized test solutions. This is demonstrated by applying DAT to pinhole defects in STT-MRAMs and forming defects in RRAMs.

    System-level sub-20 nm planar and FinFET CMOS delay modelling for supply and threshold voltage scaling under process variation

    Standard low-power design utilizes a variety of approaches for supply- and threshold-voltage control to reduce dynamic and idle power. At a very early stage of the design cycle, the Vdd and Vth values are estimated based on the power budget and then used to scale the delay and estimate the design performance. Furthermore, process variation in sub-20 nm feature technologies has a substantial impact on speed and power, so the impact of such variation on the scaled delay also has to be considered in the performance estimation. In this paper, we propose a system-level model to estimate this delay, taking voltage scaling under within-die process variation into consideration for both planar and FinFET CMOS transistors in the sub-20 nm regime. The model is simple, has acceptable accuracy, and is particularly useful for architecture-level simulations at an early stage of low-power design space exploration. The proposed model estimates the delay over different supply and threshold voltage ranges, using a modified alpha-power equation for the delay of the critical path of a computational logic core. The targeted technology nodes are 14 nm, 10 nm, and 7 nm for FinFETs, and 22 nm and 16 nm for planar CMOS. Within-die process variation is assumed to be lumped into the threshold voltage and the transistor channel length and width to simplify its impact on delay. For the given technology nodes, the average percentage error of the proposed delay equation compared to HSPICE is between 0.5% and 14%.
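
    The classical alpha-power law that underlies such models relates the critical-path delay to the supply and threshold voltages roughly as delay ~ Vdd / (Vdd - Vth)^alpha. The sketch below evaluates that proportionality with illustrative constants; the paper's modified equation and its per-node calibration against HSPICE are not reproduced.

        # Generic alpha-power-law delay scaling (illustrative constants only; the
        # paper's modified equation and per-node calibration are not reproduced).
        def alpha_power_delay(vdd, vth, alpha=1.3, k=1.0):
            # Critical-path delay taken proportional to Vdd / (Vdd - Vth)^alpha.
            return k * vdd / (vdd - vth) ** alpha

        nominal = alpha_power_delay(0.80, 0.35)
        for vdd in (0.80, 0.70, 0.60):
            d = alpha_power_delay(vdd, 0.35)
            print(f"Vdd = {vdd:.2f} V -> relative delay {d / nominal:.2f}x")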

    Testing Computation-in-Memory Architectures Based on Emerging Memories

    Today's computing architectures and device technologies are incapable of meeting the increasingly stringent demands on energy and performance posed by evolving applications. Therefore, alternative novel post-CMOS computing architectures are being explored. One of these is a Computation-in-Memory (CIM) architecture based on memristive devices; it integrates the processing units and the storage in the same physical location (i.e., the memory based on memristive devices). Due to the advanced manufacturing processes, the use of new materials, and the dual functionality, testing such chips requires specific schemes and therefore special attention. This paper describes the need for testing CIM architectures, proposes a systematic test approach, and shows the strong dependency of the test solutions on the nature of the architecture. All of this is demonstrated using a design targeted at computation-in-memory bit-wise logical operations.
    • …