181 research outputs found

    Improving Error Correction Codes for Multiple-Cell Upsets in Space Applications

    Full text link
    © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

    Currently, the number of faults suffered by SRAM memory systems has increased due to aggressive CMOS integration density. Thus, the probability of occurrence of single-cell upsets (SCUs) or multiple-cell upsets (MCUs) rises. One of the main causes of MCUs in space applications is cosmic radiation. A common solution is the use of error correction codes (ECCs). Nevertheless, when used in space applications, ECCs must achieve a good balance between error coverage and redundancy, and their encoding/decoding circuits must be efficient in terms of area, power, and delay. Different codes have been proposed to tolerate MCUs. For instance, Matrix codes use Hamming codes and parity checks in a bi-dimensional layout to correct and detect some patterns of MCUs. The recently presented column-line-code (CLC) has been designed to tolerate MCUs in space applications. CLC is a modified Matrix code, based on extended Hamming codes and parity checks. Nevertheless, a common property of these codes is the high redundancy they introduce. In this paper, we present a series of new low-redundancy ECCs able to correct MCUs with reduced area, power, and delay overheads. These new codes also maintain, or even improve, memory error coverage with respect to Matrix and CLC codes.

    This work was supported by the Spanish Government under the research project TIN2016-81075-R.

    Gracia-Morán, J.; Saiz-Adalid, L.; Gil Tomás, D. A.; Gil, P. (2018). Improving Error Correction Codes for Multiple-Cell Upsets in Space Applications. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 26(10), 2132-2142. https://doi.org/10.1109/TVLSI.2018.2837220
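
    As a rough illustration of the bi-dimensional layout described above, the following sketch (in Python, using an illustrative Hamming(7,4) row code; a toy, not the paper's exact construction) arranges 16 data bits as a 4x4 matrix, protects each row with a Hamming code, and adds one vertical parity bit per column:

        # Toy Matrix-code-style layout: Hamming(7,4) per row + column parities.

        def hamming74_encode(d):
            """d: list of 4 data bits -> 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
            d1, d2, d3, d4 = d
            p1 = d1 ^ d2 ^ d4
            p2 = d1 ^ d3 ^ d4
            p3 = d2 ^ d3 ^ d4
            return [p1, p2, d1, p3, d2, d3, d4]

        def hamming74_correct(c):
            """Fix at most one flipped bit in a 7-bit codeword; return the 4 data bits."""
            s = 0
            for pos in range(1, 8):        # syndrome = XOR of 1-based positions holding a 1
                if c[pos - 1]:
                    s ^= pos
            if s:                          # a nonzero syndrome points at the flipped bit
                c[s - 1] ^= 1
            return [c[2], c[4], c[5], c[6]]

        def matrix_encode(bits16):
            """16 data bits -> (4 row codewords, 4 column parity bits)."""
            rows = [bits16[i:i + 4] for i in range(0, 16, 4)]
            horiz = [hamming74_encode(r) for r in rows]
            vert = [rows[0][j] ^ rows[1][j] ^ rows[2][j] ^ rows[3][j] for j in range(4)]
            return horiz, vert             # 28 + 4 = 32 stored bits for 16 data bits

    Each row can correct one upset on its own; the column parities are what let a Matrix-style code detect (and, in the full constructions, help correct) multi-cell patterns that place two upsets in the same row.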

    Weight Spectrum of Quasi-Perfect Binary Codes with Distance 4

    Full text link
    We consider the weight spectrum of a class of quasi-perfect binary linear codes with code distance 4. For example, the extended Hamming code and the Panchenko code are known members of this class. It is also known that in many cases the Panchenko code has the minimal number of weight-4 codewords. We give exact recursive formulas for the weight spectrum of quasi-perfect codes and their dual codes. As an example application of the weight spectrum, we derive a lower estimate of the conditional probability of correcting erasure patterns of high weight (equal to or greater than the code distance).

    Comment: 5 pages, 11 references, 2 tables; some explanations and details are added.
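
    As a concrete instance of such a spectrum, the brute-force check below (Python; one standard generator matrix for the extended [8,4] Hamming code, not the paper's recursive formulas) enumerates all 16 codewords and recovers the weight distribution:

        # Brute-force weight spectrum of the extended [8,4] Hamming code.
        from itertools import product
        from collections import Counter

        G = [  # one standard generator matrix for the extended [8,4] Hamming code
            [1, 0, 0, 0, 0, 1, 1, 1],
            [0, 1, 0, 0, 1, 0, 1, 1],
            [0, 0, 1, 0, 1, 1, 0, 1],
            [0, 0, 0, 1, 1, 1, 1, 0],
        ]

        spectrum = Counter()
        for msg in product([0, 1], repeat=4):
            cw = [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G)]
            spectrum[sum(cw)] += 1         # tally the Hamming weight of each codeword

        print(sorted(spectrum.items()))    # [(0, 1), (4, 14), (8, 1)]

    The zero word, 14 codewords of weight 4, and the all-ones word: counts of exactly this kind are what the paper's recursive formulas produce for the whole code family.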

    Error control coding for semiconductor memories

    Get PDF
    All modern computers have memories built from VLSI RAM chips. Individually, these devices are highly reliable, and any single chip may perform for decades before failing. However, when many chips are combined in a single memory, the time until at least one of them fails can decrease to a few hours. The presence of failed chips causes errors when binary data are stored in and read out from the memory, and as a consequence the reliability of computer memories degrades. These errors are classified into hard errors and soft errors, also termed permanent and temporary errors respectively. In some situations errors show up as random errors, in which both 1-to-0 and 0-to-1 errors occur randomly in a memory word. In other situations the most likely errors are unidirectional errors, in which 1-to-0 errors or 0-to-1 errors may occur, but not both in one particular memory word.

    To achieve a high-speed and highly reliable computer, we need large-capacity memory. Unfortunately, with the high density of semiconductor cells in memory, the error rate increases dramatically. In particular, VLSI RAMs suffer from soft errors caused by alpha-particle radiation, so the reliability of a computer could become unacceptable without error-reducing schemes. In practice, several schemes to reduce the effects of memory errors are commonly used, but most of them are valid only for hard errors. As an efficient and economical method, error control coding can overcome both hard and soft errors, and it is therefore becoming a widely used scheme in the computer industry today.

    This thesis discusses error control coding for semiconductor memories and consists of six chapters. Chapter one is an introduction to error detecting and correcting coding for computer memories. First, semiconductor memories and their problems are discussed; then some schemes for error reduction in computer memories are given, and the advantages of using error control coding over other schemes are presented. In chapter two, after a brief review of memory organizations, memory cells, their physical construction, and the principle of storing data are described. We then analyze the mechanisms of the various errors occurring in semiconductor memories so that different coding schemes can be selected for different errors. Chapter three is devoted to fundamental coding theory; background on encoding and decoding algorithms is presented. In chapter four, random error control codes are discussed: error detecting codes, single error correcting/double error detecting codes, and multiple error correcting codes are analyzed. Using examples, the decoding implementations for parity codes, Hamming codes, modified Hamming codes, and majority logic codes are demonstrated. This chapter also shows that by combining error control coding with other schemes, the reliability of the memory can be improved by many orders of magnitude. For unidirectional errors, we introduce unordered codes in chapter five. Two types of unordered codes are discussed, systematic and nonsystematic, both of which are very powerful for unidirectional error detection. As an example of an optimal nonsystematic unordered code, an efficient balanced code is analyzed; as an example of systematic unordered codes, Berger codes are analyzed. Considering that in practice random errors may still occur in unidirectional-error memories, some recently developed t-random error correcting/all unidirectional error detecting codes are introduced. Illustrative examples are included to facilitate the explanation. Chapter six concludes the thesis.

    The whole thesis is oriented to the applications of error control coding for semiconductor memories, and most of the codes discussed are widely used in practice. Throughout the thesis we attempt to provide a review of coding in computer memories and to emphasize the advantages of coding. It is clear that with the requirement of higher-speed and higher-capacity semiconductor memories, error control coding will play an even more important role in the future.
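
    As a pocket illustration of the systematic unordered codes discussed in chapter five, the sketch below (Python; an illustrative toy, not code from the thesis) implements a Berger code, whose check symbol is the count of 0s among the information bits, so that any purely unidirectional error leaves the count and the check inconsistent:

        # Minimal Berger code: check symbol = number of 0s in the information bits.
        from math import ceil, log2

        def berger_encode(info):
            """info: list of bits -> codeword = info bits followed by the check bits."""
            r = ceil(log2(len(info) + 1))             # width of the check symbol
            zeros = info.count(0)
            check = [(zeros >> i) & 1 for i in reversed(range(r))]
            return info + check

        def berger_check(word, k):
            """True if the stored zero-count matches the information bits."""
            info, check = word[:k], word[k:]
            value = 0
            for b in check:
                value = (value << 1) | b
            return info.count(0) == value

        w = berger_encode([1, 0, 1, 1, 0, 0, 1, 0])   # 4 zeros -> check = 0100
        assert berger_check(w, 8)
        w[0] = w[2] = 0                               # a 1-to-0 unidirectional burst
        assert not berger_check(w, 8)                 # detected: zero count grew to 6

    A 1-to-0 burst can only increase the zero count of the information bits or decrease the stored check value, never both in compensating directions, which is why Berger codes detect all unidirectional errors.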

    Self-Healing Cellular Automata to Correct Soft Errors in Defective Embedded Program Memories

    Get PDF
    Static Random Access Memory (SRAM) cells in ultra-low power Integrated Circuits (ICs) based on nanoscale Complementary Metal Oxide Semiconductor (CMOS) devices are likely to be the most vulnerable to large-scale soft errors. Conventional error correction circuits may not be able to handle the distributed nature of such errors and are susceptible to soft errors themselves. In this thesis, a distributed error correction circuit called Self-Healing Cellular Automata (SHCA) that can repair itself is presented. A possible way to deploy the SHCA in a system of SRAM-based embedded program memories (ePM) for one type of chip multiprocessor is also discussed. The SHCA is compared with conventional error correction approaches, and its strengths and limitations are analyzed.

    Design and Analysis of an Adjacent Multi-bit Error Correcting Code for Nanoscale SRAMs

    Get PDF
    Increasing static random access memory (SRAM) bitcell density is a major driving force for semiconductor technology scaling. The industry-standard 2x reduction in SRAM bitcell area per technology node has led to a proliferation of memory-intensive applications, as greater memory system capacity can be realized per unit area. Coupled with this increasing capacity is an increasing SRAM system-level soft error rate (SER). Soft errors, caused by galactic radiation and radioactive chip packaging material, corrupt a bitcell's data-state and are a potential cause of catastrophic system failures. Further, reductions in device geometries, design rules, and sensitive node capacitances increase the probability of multiple adjacent bitcells being upset per particle strike to over 30% of the total SER below the 45 nm process node. Traditionally, these upsets have been addressed using a simple error correction code (ECC) combined with word interleaving. With continued scaling, however, errors beyond this setup begin to emerge. Although more powerful ECCs exist, they come at an increased overhead in terms of area and latency. Additionally, interleaving adds complexity to the system and may not always be feasible for the given architecture. In this thesis, a new class of ECC targeted toward adjacent multi-bit upsets (MBUs) is proposed and analyzed. These codes present a tradeoff between the currently popular single error correcting-double error detecting (SEC-DED) ECCs used in SRAMs (which are unable to correct MBUs) and the more robust multi-bit ECC schemes used for MBU reliability. The proposed codes are evaluated and compared against other ECCs using a custom test suite and multi-bit error channel model developed in Matlab, as well as Verilog hardware description language (HDL) implementations synthesized using Synopsys Design Compiler and a commercial 65 nm bulk CMOS standard cell library. Simulation results show that for the same check-bit overhead as a conventional 64 data-bit SEC-DED code, the proposed scheme provides a corrected-SER approximately equal to that of the Bose-Chaudhuri-Hocquenghem (BCH) double error correcting (DEC) code, and a 4.38x improvement over the SEC-DED code in the same error channel. For 3 additional check-bits (still 3 fewer than the BCH DEC code), a triple adjacent error correcting version of the proposed code provides a 2.35x improvement in corrected-SER over the BCH DEC code, for 90.9% less ECC circuit area and 17.4% less error correction delay. For further verification, a 0.4-1.0 V 75 kb single-cycle SRAM macro protected with a programmable, up-to-3-adjacent-bit-correcting version of the proposed ECC has been fabricated in a commercial 28 nm bulk CMOS process. The SRAM macro has undergone neutron irradiation testing at the TRIUMF Neutron Irradiation Facility in Vancouver, Canada. Measurement results show a 189x improvement in SER over an unprotected memory with no ECC enabled, and a 5x improvement over a traditional single-error-correction (SEC) code at 0.5 V using 1-way interleaving for the same number of check-bits. This is comparable with the 4.38x improvement observed in simulation. Measurement results confirm an average active energy of 0.015 fJ/bit at 0.4 V and an average 80 mV reduction in VDDMIN across eight packaged chips when the ECC is enabled. Both the SRAM macro and the ECC circuit were designed for dynamic voltage and frequency scaling for both nominal and low-voltage applications, using a full-custom circuit design flow.
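
    To make the interleaving tradeoff mentioned above concrete, the small Python sketch below (an illustration, not code from the thesis) shows how 4-way bit interleaving converts a 4-cell adjacent upset into one correctable error per logical word:

        # 4-way bit interleaving: adjacent physical upsets -> single errors per word.

        def interleave(words):
            """words: 4 equal-length bit lists -> one bit-interleaved physical row."""
            return [w[i] for i in range(len(words[0])) for w in words]

        def deinterleave(row, n_words=4):
            return [row[i::n_words] for i in range(n_words)]

        words = [[i % 2] * 8 for i in range(4)]
        row = interleave(words)
        for pos in range(8, 12):           # a burst upsetting 4 adjacent cells
            row[pos] ^= 1
        recovered = deinterleave(row)
        for orig, got in zip(words, recovered):
            # each logical word differs in exactly one bit: SEC-DED territory
            assert sum(a != b for a, b in zip(orig, got)) == 1

    When interleaving is not feasible for a given architecture, adjacent-MBU-correcting codes like those proposed in the thesis take over this role inside a single, non-interleaved word.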

    Reducing the Overhead of BCH Codes: New Double Error Correction Codes

    Full text link
    The Bose-Chaudhuri-Hocquenghem (BCH) codes are a well-known class of powerful cyclic error correction codes. BCH codes can correct multiple errors with minimal redundancy. Primitive BCH codes only exist for some word lengths, which do not frequently match those employed in digital systems. This paper focuses on double error correction (DEC) codes for word lengths that are powers of two (8, 16, 32, and 64 bits), which are commonly used in memories. We also focus on hardware implementations of the encoder and decoder circuits for very fast operation. This work proposes new low redundancy and reduced overhead (LRRO) DEC codes with the same redundancy as the equivalent BCH DEC codes, but whose encoder and decoder circuits present a lower overhead in terms of propagation delay, silicon area usage, and power consumption. To design the new codes, we used a methodology that searches for parity check matrices based on error patterns. We implemented and synthesized the codes and compared their results with those obtained for the BCH codes. Our implementation of the decoder circuits achieved reductions of between 2.8% and 8.7% in propagation delay, between 1.3% and 3.0% in silicon area, and between 15.7% and 26.9% in power consumption. We therefore propose LRRO codes as an alternative for protecting information against multiple errors.

    This research was supported in part by the Spanish Government, project TIN2016-81075-R, by Primeros Proyectos de Investigacion (PAID-06-18), Vicerrectorado de Investigacion, Innovacion y Transferencia de la Universitat Politecnica de Valencia (UPV), project 20190032, and by the Institute of Information and Communication Technologies (ITACA).

    Saiz-Adalid, L.; Gracia-Morán, J.; Gil Tomás, D. A.; Baraza Calvo, J. C.; Gil, P. (2020). Reducing the Overhead of BCH Codes: New Double Error Correction Codes. Electronics, 9(11), 1-14. https://doi.org/10.3390/electronics9111897
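
    The matrix-search methodology can be pictured with a small feasibility check: a candidate parity check matrix corrects a chosen set of error patterns exactly when every pattern's syndrome is nonzero and no two patterns share a syndrome. The Python sketch below (an illustrative check in that spirit, not the paper's actual search code) runs it on the classic [7,4] Hamming matrix:

        # A parity check matrix H corrects a set of error patterns iff all their
        # syndromes are nonzero and pairwise distinct.
        from itertools import combinations

        def syndrome(h_cols, pattern):
            """h_cols: columns of H as r-bit ints; pattern: flipped bit positions."""
            s = 0
            for pos in pattern:
                s ^= h_cols[pos]
            return s

        def corrects(h_cols, patterns):
            seen = set()
            for p in patterns:
                s = syndrome(h_cols, p)
                if s == 0 or s in seen:    # undetectable, or aliases another pattern
                    return False
                seen.add(s)
            return True

        H = [1, 2, 3, 4, 5, 6, 7]                      # [7,4] Hamming columns
        singles = [(i,) for i in range(7)]
        doubles = list(combinations(range(7), 2))
        print(corrects(H, singles))                    # True: all single errors corrected
        print(corrects(H, singles + doubles))          # False: 3 check bits are too few

    A search in this style enumerates candidate column sets for a given redundancy and keeps the matrices that pass the check for the target error patterns.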

    Fault-tolerant computer study

    Get PDF
    A set of building block circuits is described which can be used with commercially available microprocessors and memories to implement fault-tolerant distributed computer systems. Each building block circuit is intended for VLSI implementation as a single chip. Several building blocks and associated processor and memory chips form a self-checking computer module with self-contained input/output and interfaces to redundant communication buses. Fault tolerance is achieved by connecting self-checking computer modules into a redundant network in which backup buses and computer modules are provided to circumvent failures. The requirements and design methodology which led to the definition of the building block circuits are discussed.

    Memory built-in self-repair and correction for improving yield: a review

    Get PDF
    Nanometer memories are highly prone to defects due to their dense structure, making memory built-in self-repair a must-have feature to improve yield. Today's systems-on-chip contain memories occupying as much as 90% of the chip area. Shrinking technology uses stricter design rules for memories, making them more prone to manufacturing defects. Further, using 3D-stacked memories makes the system vulnerable to newer defects such as those coming from through-silicon vias (TSVs) and microbumps. The increased memory size also results in an increase in soft errors during system operation. Multiple memory repair techniques based on redundancy and correction codes have been presented to recover from such defects and prevent system failures. This paper reviews recently published memory repair methodologies, including various built-in self-repair (BISR) architectures, repair analysis algorithms, in-system repair, and soft repair handling using error correcting codes (ECC). It provides a classification of these techniques based on method and usage. Finally, it reviews the evaluation methods used to determine the effectiveness of the repair algorithms. The paper aims to present a survey of these methodologies and to prepare a platform for developing repair methods for upcoming generations of memories.

    Resilience of an embedded architecture using hardware redundancy

    Get PDF
    In the last decade, the dominance of the general computing systems market has been overtaken by embedded systems, with billions of units manufactured every year. Embedded systems appear in contexts where continuous operation is of utmost importance and where failure can be profound. Nowadays, radiation poses a serious threat to the reliable operation of safety-critical systems. Fault avoidance techniques, such as radiation hardening, have been commonly used in space applications. However, these components are expensive, lag behind commercial components with regard to performance, and do not provide 100% fault elimination. Without fault tolerance mechanisms, many of these faults can become errors at the application or system level, which in turn can result in catastrophic failures. In this work we study the concepts of fault tolerance and dependability and extend them, providing our own definition of resilience. We analyse the physics of radiation-induced faults, the damage mechanisms of particles, and the process that leads to computing failures. We provide extensive taxonomies of (1) existing fault tolerance techniques and (2) the effects of radiation on state-of-the-art electronics, analysing and comparing their characteristics. We propose a detailed fault model and provide a classification of the different types of faults at various levels. We introduce a fault tolerance algorithm and define the system states and actions necessary to implement it. We introduce novel hardware and system software techniques that provide a more efficient combination of reliability, performance, and power consumption than existing techniques. We propose a new system element called the syndrome, which is the core of a resilient architecture whose software and hardware can adapt to reliable and unreliable environments. We implement a software simulator and disassembler and introduce a testing framework in combination with ERA's assembler and commercial hardware simulators.