
    Variable-width datapath for on-chip network static power reduction


    Two-Layer Error Control Codes Combining Rectangular and Hamming Product Codes for Cache Error

    We propose a novel two-layer error control code that combines the error detection capability of rectangular codes with the error correction capability of Hamming product codes in an efficient way, in order to increase cache error resilience in many-core systems while maintaining low power, area and latency overhead. Exploiting the low latency and overhead of rectangular codes and the high error control capability of Hamming product codes, the two-layer error control code employs simple rectangular codes on each cache line to detect cache errors, and loads the extra Hamming product code check bits only when an error is detected, thus enabling reliable large-scale cache operation. Analysis and experiments are conducted to evaluate the cache fault-tolerance capability of several existing solutions and of the proposed approach. The results show that the proposed approach significantly increases Mean-Error-To-Failure (METF) and Mean-Time-To-Failure (MTTF) by up to 2.8×, reduces storage overhead by over 57%, and increases instructions per cycle (IPC) by up to 7% compared to complex four-way 4EC5ED; compared to simple eight-way single-error-correcting double-error-detecting (SECDED) codes, it increases METF and MTTF by up to 133×, reduces storage overhead by over 11%, and achieves similar IPC. The cost of the proposed approach is no more than 4% external memory access overhead.
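
    A minimal sketch of the two-layer idea described above, under my own simplifying assumptions (not the authors' implementation): a cache line is viewed as a rows x cols bit matrix, a cheap rectangular (row/column) parity layer stored with the line detects errors on every access, and only on detection is a hypothetical second-layer correction path (standing in for the Hamming product code check bits) invoked. The `fetch_layer2` callback is a placeholder.

```python
def rect_parities(bits, rows, cols):
    """Row and column parities of a cache line laid out as a rows x cols bit matrix."""
    row_par = [0] * rows
    col_par = [0] * cols
    for r in range(rows):
        for c in range(cols):
            b = bits[r * cols + c]
            row_par[r] ^= b
            col_par[c] ^= b
    return row_par, col_par

def detect(bits, rows, cols, stored_row_par, stored_col_par):
    """Layer 1: lightweight detection performed on every cache access."""
    row_par, col_par = rect_parities(bits, rows, cols)
    return row_par != stored_row_par or col_par != stored_col_par

def access_line(bits, rows, cols, stored_row_par, stored_col_par, fetch_layer2):
    """Layer 2 (the stronger product-code correction) is consulted only when layer 1 fires."""
    if detect(bits, rows, cols, stored_row_par, stored_col_par):
        return fetch_layer2(bits)   # hypothetical correction using the extra check bits
    return bits
```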

    Infrastructures and Algorithms for Testable and Dependable Systems-on-a-Chip

    Every new node of semiconductor technologies provides further miniaturization and higher performance, increasing the number of advanced functions that electronic products can offer. Silicon area is now so cheap that industries can integrate in a single chip, usually referred to as a System-on-Chip (SoC), all the components and functions that historically were placed on a hardware board. Although adding such advanced functionality can benefit users, the manufacturing process is becoming finer and denser, making chips more susceptible to defects. Today’s very deep-submicron semiconductor technologies (0.13 micron and below) have reached susceptibility levels that put conventional semiconductor manufacturing at an impasse. Being able to rapidly develop, manufacture, test, diagnose and verify such complex new chips and products is crucial for the continued success of our economy at large. This trend is expected to continue at least for the next ten years, making possible the design and production of 100-million-transistor chips. To speed up the research, the National Technology Roadmap for Semiconductors identified in 1997 a number of major hurdles to be overcome. Some of these hurdles are related to test and dependability. Test is one of the most critical tasks in the semiconductor production process, where Integrated Circuits (ICs) are tested several times, from wafer probing to the end of production test. Test is not only necessary to assure fault-free devices, but it also plays a key role in analyzing defects in the manufacturing process. This last point has high relevance, since increasing time-to-market pressure on semiconductor fabrication often forces foundries to start volume production on a given semiconductor technology node before reaching the defect densities, and hence yield levels, traditionally obtained at that stage. The feedback derived from test is the only way to analyze and isolate many of the defects in today’s processes and to increase the process’s yield. With the increasing need for high-quality electronic products, at each new physical assembly level, such as board and system assembly, test is used for debugging, diagnosing and repairing the sub-assemblies in their new environment. Similarly, increasing reliability, availability and serviceability requirements lead the users of high-end products to perform periodic tests in the field throughout the full life cycle. To allow advancements in each of the above scaling trends, fundamental changes are expected to emerge in different Integrated Circuit (IC) realization disciplines such as IC design, packaging and silicon process. These changes have a direct impact on test methods, tools and equipment. Conventional test equipment and methodologies will be inadequate to assure high quality levels. On-chip specialized blocks dedicated to test, usually referred to as Infrastructure IP (Intellectual Property), need to be developed and included in the new complex designs to assure that new chips will be adequately tested, diagnosed, measured, debugged and even, sometimes, repaired. In this thesis, some of the scaling trends in designing new complex SoCs are analyzed one at a time, observing their implications on test and identifying the key hurdles/challenges to be addressed. The goal of the remainder of the thesis is the presentation of possible solutions. It is not sufficient to address just one of the challenges; all must be met at the same time to fulfill the market requirements.

    Test and Diagnosis of Integrated Circuits

    The ever-increasing growth of the semiconductor market results in an increasing complexity of digital circuits. Smaller, faster, cheaper and lower power consumption are the main challenges in the semiconductor industry. The reduction of transistor size and the latest packaging technologies (i.e., System-on-a-Chip, System-in-Package, Through-Silicon-Via 3D Integrated Circuits) allow the semiconductor industry to meet these challenges. Although producing such advanced circuits can benefit users, the manufacturing process is becoming finer and denser, making chips more prone to defects. The work presented in the HDR manuscript addresses the challenges of test and diagnosis of integrated circuits. It covers:
    - Power-aware test;
    - Test of low-power devices;
    - Fault diagnosis of digital circuits.

    Comprehensive Evaluation of Supply Voltage Underscaling in FPGA on-Chip Memories

    In this work, we evaluate aggressive undervolting, i.e., voltage scaling below the nominal level, to reduce the energy consumption of Field Programmable Gate Arrays (FPGAs). Usually, voltage guardbands are added by chip vendors to ensure correct operation under worst-case process and environmental scenarios. Through experimenting on several FPGA architectures, we measure this voltage guardband to be on average 39% of the nominal level, which in turn delivers more than an order of magnitude of power savings. However, further undervolting below the voltage guardband may cause reliability issues as a result of increased circuit delay, i.e., faults start to appear. We extensively characterize the behavior of these faults in terms of rate, location, type, as well as sensitivity to environmental temperature, with a focus on on-chip memories, or Block RAMs (BRAMs). Finally, we evaluate a typical FPGA-based Neural Network (NN) accelerator under low-voltage BRAM operation. As a consequence, the substantial NN energy savings come at the cost of NN accuracy loss. To attain the power savings without NN accuracy loss, we propose a novel technique that relies on the deterministic behavior of undervolting faults and can limit the accuracy loss to 0.1% without any timing-slack overhead.
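
    The abstract notes that undervolting faults in BRAMs are deterministic. One plausible way to exploit that property, sketched below purely as an assumption and not necessarily the authors' exact method, is to profile a per-BRAM fault map once at the reduced voltage and then place NN weights only in fault-free word addresses. The `read_word`/`write_word` callbacks are hypothetical accessors for the memory under test.

```python
def profile_fault_map(read_word, write_word, num_words, pattern=0xFFFF):
    """Write a known pattern at low voltage and record which addresses read back wrong."""
    faulty = set()
    for addr in range(num_words):
        write_word(addr, pattern)
    for addr in range(num_words):
        if read_word(addr) != pattern:
            faulty.add(addr)
    return faulty

def place_weights(weights, num_words, faulty):
    """Map NN weights only onto addresses that profiled as fault-free."""
    usable = [a for a in range(num_words) if a not in faulty]
    if len(usable) < len(weights):
        raise ValueError("not enough fault-free words at this voltage")
    return dict(zip(usable, weights))   # address -> weight
```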

    Cross-layer Soft Error Analysis and Mitigation at Nanoscale Technologies

    This thesis addresses the challenge of soft error modeling and mitigation at nanoscale technology nodes and pushes the state of the art forward by proposing novel modeling, analysis and mitigation techniques. The proposed soft error sensitivity analysis platform accurately models both error generation and propagation, starting from technology-dependent device-level simulations all the way up to workload-dependent application-level analysis.
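
    A generic illustration of the cross-layer idea, assumed here as a simple fault-injection workflow rather than the thesis platform itself: a raw device-level upset rate is combined with the fraction of injected bit flips that actually corrupt a workload's output (an application-level masking or derating factor). The toy workload below is hypothetical.

```python
import random

def run_workload(state_bits):
    """Hypothetical application: the output depends only on the first 8 state bits."""
    return sum(state_bits[:8]) % 2   # flips in bits 8.. are architecturally masked

def derating_factor(num_bits=32, trials=10000, seed=0):
    """Fraction of single-bit upsets that change the workload output."""
    rng = random.Random(seed)
    golden = run_workload([0] * num_bits)
    failures = 0
    for _ in range(trials):
        state = [0] * num_bits
        state[rng.randrange(num_bits)] ^= 1          # inject one single-event upset
        if run_workload(state) != golden:
            failures += 1
    return failures / trials

# Effective application-level failure rate ~= device-level raw SER * derating_factor()
```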

    Performance analysis of fault-tolerant nanoelectronic memories

    Performance growth in microelectronics, as described by Moore’s law, is steadily approaching its limits. Nanoscale technologies are increasingly being explored as a practical solution to sustaining and possibly surpassing current performance trends of microelectronics. This work presents an in-depth analysis of the impact on performance of incorporating reliability schemes into the architecture of a crossbar molecular switch nanomemory and demultiplexer. Nanoelectronics is currently in its early stages, so fabrication and design methodologies are still being studied and developed. The building blocks of nanotechnology are fabricated using bottom-up processes, which leave them highly susceptible to defects. Hence, it is very important that defect- and fault-tolerant schemes be incorporated into the design of nanotechnology-related devices. In this dissertation, we focus on the study of a novel and promising class of computer chip memories called crossbar molecular switch memories and their demultiplexer addressing units. A major part of this work was the design of a defect and fault tolerance scheme we call the Multi-Switch Junction (MSJ) scheme. The MSJ scheme takes advantage of the regular array geometry of the crossbar nanomemory to create multiple switches in the crossbar fabric for the storage of a single bit. Implementing defect- and fault-tolerant schemes comes at a performance cost to the crossbar nanomemory; the challenge becomes achieving a balance between device reliability and performance. We have studied the reliability-induced performance penalties as they relate to the time (delay) it takes to access a bit and the amount of power dissipated by the process. MSJ was also compared to the banking and error correction coding fault-tolerant schemes, and studies were conducted to ascertain the potential benefits of integrating our MSJ scheme with the banking scheme. A trade-off analysis between access time delay, power dissipation and reliability is outlined and presented in this work. Results show that the MSJ scheme increases the reliability of the crossbar nanomemory and demultiplexer. Simulation results also indicate that MSJ works very well for smaller nanomemory array sizes, with reliabilities of 100% for molecular switch failure rates in the 10% or less range.
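
    A back-of-the-envelope sketch of the redundancy argument behind MSJ, under my own simplifying assumption (not the dissertation's exact model) that a stored bit remains readable as long as at least one of its k redundant crossbar junctions is functional and junctions fail independently.

```python
def bit_reliability(p_junction_fail, k):
    """Probability that a bit stored on k redundant junctions is still readable."""
    return 1.0 - p_junction_fail ** k

def array_reliability(p_junction_fail, k, num_bits):
    """Probability that every bit of the array is readable (independent failures)."""
    return bit_reliability(p_junction_fail, k) ** num_bits

if __name__ == "__main__":
    # Example: 10% junction failure rate, 1 Kbit array, varying redundancy.
    for k in (1, 2, 4):
        print(f"k={k}: array reliability = {array_reliability(0.10, k, 1024):.4f}")
```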

    A Real-Time Error Detection (RTD) architecture and its use for reliability and post-silicon validation for F/F based memory arrays

    This work proposes in-situ Real-Time Error Detection (RTD): embedding hardware in a memory array to detect a fault in the array when it occurs, rather than when it is read. RTD breaks the serialization between data access and error detection and can thus speed up the access time of arrays that use in-line error correction. The approach can also reduce the time needed to root-cause array-related bugs during post-silicon validation and product testing. The paper introduces a two-dimensional error-correction scheme based on RTD and also presents a proactive error-correction method that combines RTD with demand scrubbing. The work describes how to build RTD into a flip-flop-based memory array to track the column parity in real time. A comparison of the proposed two-dimensional ECC scheme with single-error-correction double-error-detection shows that the RTD design has comparable error detection and correction strength and that, depending on the array dimensions and configuration, RTD reduces access time by 4% to 26% at an area and power overhead (negative values are reductions) of -7% to 33% and -42% to 86%, respectively.
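
    A simplified reading of the two-dimensional parity idea mentioned in the abstract, sketched below as an assumption rather than the paper's actual circuit: each row keeps a row-parity bit and a separate register tracks column parities, so a single flipped bit is located at the intersection of the failing row and failing column and can be corrected in place.

```python
def parity(bits):
    """XOR of a list of bits."""
    p = 0
    for b in bits:
        p ^= b
    return p

def locate_and_correct(array, row_par, col_par):
    """array: list of rows (lists of bits); row_par/col_par: stored reference parities."""
    bad_rows = [r for r, row in enumerate(array) if parity(row) != row_par[r]]
    num_cols = len(array[0])
    bad_cols = [c for c in range(num_cols)
                if parity([row[c] for row in array]) != col_par[c]]
    if len(bad_rows) == 1 and len(bad_cols) == 1:     # single-bit error
        r, c = bad_rows[0], bad_cols[0]
        array[r][c] ^= 1                              # correct in place
        return (r, c)
    return None                                       # no error, or multi-bit pattern
```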

    Study of Radiation-Tolerant SRAM Design

    Static Random Access Memories (SRAMs) are important storage components and are widely used in digital systems. Meanwhile, with the continuous development and progress of aerospace technologies, SRAMs are increasingly used in electronic systems for spacecraft and satellites. Energetic particles in space environments can cause single event upsets, normally referred to as soft errors, in the memories, which can lead to the failure of systems. Nowadays, electronics at ground level also experience this kind of upset, mainly due to cosmic neutrons and alpha particles from packaging materials, and the resulting failure rate can be 10 to 100 times higher than the rate of errors from hardware failures. Therefore, it is important to study single event effects in SRAMs and to develop cost-effective techniques to mitigate these errors. The objectives of this thesis are to evaluate the current mitigation techniques for single event effects in SRAMs and to develop a radiation-tolerant SRAM based on the developed techniques. Various radiation sources and the mechanisms of their respective effects in Complementary Metal-Oxide-Semiconductor (CMOS) devices are reviewed first in the thesis. The radiation effects in SRAMs, specifically single event effects, are studied, and various mitigation techniques are evaluated. Error-correcting codes (ECC) are studied in the thesis since they can detect and correct single-bit errors in the cell array, and they are an effective method with low overhead in terms of area, speed, and power. Hamming codes are selected and implemented in the design of the SRAM to protect the cells from single event upsets. The simulation results show that they can correct single-bit errors in the cell arrays with low area and speed overhead. Another important and vulnerable part of SRAMs in radiation environments is the sense amplifier. It may not generate the correct output during a read operation if it is hit by an energetic particle. A novel fault-tolerant sense amplifier is introduced and validated with simulations. The results show that the performance of the new design can be more than ten times better than that of the reference design. When combining the SRAM cell arrays protected with ECC and the radiation-tolerant sense amplifiers, the SRAM can achieve high reliability with low speed and area overhead.
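
    To illustrate the kind of single-error-correcting Hamming code the thesis applies to the cell array, here is a minimal Hamming(7,4) encoder/decoder; the code word size and bit layout used in the actual SRAM design may differ.

```python
def hamming74_encode(d):
    """d: list of 4 data bits -> 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct up to one flipped bit in the 7-bit codeword and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3    # 1-based position of the erroneous bit (0 = no error)
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```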