15 research outputs found

    MCU Tolerance in SRAMs through Low Redundancy Triple Adjacent Error Correction

    Full text link
    (c) 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.[EN] Static random access memories (SRAMs) are key in electronic systems. They are used not only as standalone devices, but also embedded in application specific integrated circuits. One key challenge for memories is their susceptibility to radiation-induced soft errors that change the value of memory cells. Error correction codes (ECCs) are commonly used to ensure correct data despite soft errors effects in semiconductor memories. Single error correction/double error detection (SEC-DED) codes have been traditionally the preferred choice for data protection in SRAMs. During the last decade, the percentage of errors that affect more than one memory cell has increased substantially, mainly due to multiple cell upsets (MCUs) caused by radiation. The bits affected by these errors are physically close. To mitigate their effects, ECCs that correct single errors and double adjacent errors have been proposed. These codes, known as single error correction/double adjacent error correction (SEC-DAEC), require the same number of parity bits as traditional SEC-DED codes and a moderate increase in the decoder complexity. However, MCUs are not limited to double adjacent errors, because they affect more bits as technology scales. In this brief, new codes that can correct triple adjacent errors and 3-bit burst errors are presented. They have been implemented using a 45-nm library and compared with previous proposals, showing that our codes have better error protection with a moderate overhead and low redundancy.This work was supported in part by the Universitat Politecnica de Valencia, Valencia, Spain, through the DesTT Research Project under Grant SP20120806; in part by the Spanish Ministry of Science and Education under Project AYA-2009-13300-C03; in part by the Arenes Research Project under Grant TIN2012-38308-C02-01; and in part by the Research Project entitled Manufacturable and Dependable Multicore Architectures at Nanoscale within the framework of COST ICT Action under Grant 1103.Saiz-Adalid, L.; Reviriego, P.; Gil, P.; Pontarelli, S.; Maestro, JA. (2015). MCU Tolerance in SRAMs through Low Redundancy Triple Adjacent Error Correction. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 23(10):2332-2336. https://doi.org/10.1109/TVLSI.2014.2357476S23322336231

    Modifying Hamming code and using the replication method to protect memory against triple soft errors

    Get PDF
    As technology scaling increases computer memory’s bit-cell density and reduces the voltage of semiconductors, the number of soft errors due to radiation induced single event upsets (SEU) and multi-bit upsets (MBU) also increases. To address this, error-correcting codes (ECC) can be used to detect and correct soft errors, while x-modular-redundancy improves fault tolerance. This paper presents a technique that provides high error-correction performance, high speed, and low complexity. The proposed technique ensures that only correct values get passed to the system output or are processed in spite of the presence of up to three-bit errors. The Hamming code is modified in order to provide a high probability of MBU detection. In addition, the paper describes the new technique and associated analysis scheme for its implementation. The new technique has been simulated, evaluated, and compared to error correction codes with similar decoding complexity to better understand the overheads required, the gained capabilities to protect data against three-bit errors, and to reduce the misdetection probability and false-detection probability of four-bit errors

    Protecting Memories against Soft Errors: The Case for Customizable Error Correction Codes

    Get PDF
    As technology scales, radiation induced soft errors create more complex error patterns in memories with a single particle corrupting several bits. This poses a challenge to the Error Correction Codes (ECCs) traditionally used to protect memories that can correct only single bit errors. During the last decade, a number of codes have been developed to correct the emerging error patterns, focusing initially on double adjacent errors and later on three bit burst errors. However, as the memory cells get smaller and smaller, the error patterns created by radiation will continue to change and thus new codes will be needed. In addition, the memory layout and the technology used may also make some patterns more likely than others. For example, in some memories, there maybe elements that separate blocks of bits in a word, making errors that affect two blocks less likely. Finally, for a given memory, depending on the data stored, some error patterns may be more critical than others. For example, if numbers are stored in the memory, in most cases, errors on the more significant bits have a larger impact. Therefore, for a given memory and application, to achieve optimal protection, we would like to have a code that corrects a given set of patterns. This is not possible today as there is a limited number of code choices available in terms of correctable error patterns and word lengths. However, most of the codes used to protect memories are linear block codes that have a regular structure and which design can be automated. In this paper, we propose the automation of error correction code design for memory protection. To that end, we introduce a software tool that given a word length and the error patterns that need to be corrected, produces a linear block code described by its parity check matrix and also the bit placement. The benefits of this automated design approach are illustrated with several case studies. Finally, the tool is made available so that designers can easily produce custom error correction codes for their specific needs.Jiaqiang Li and Liyi Xiao would like to acknowledge the support of the Fundamental Research Funds for the Central Universities (Grant No. HIT.KISTP.201404), Harbin science and innovation research special fund (2015RAXXJ003), and Special found for development of Shenzhen strategic emerging industries (JCYJ20150625142543456). Pedro Reviriego would like to acknowledge the support of the TEXEO project TEC2016-80339-R funded by the Spanish Ministry of Economy and Competitivity and of the Madrid Community research project TAPIR-CM Grant No. P2018/TCS-4496

    Novel fault tolerant Multi-Bit Upset (MBU) Error-Detection and Correction (EDAC) architecture

    Get PDF
    Desde el punto de vista de seguridad, la certificación aeronáutica de aplicaciones críticas de vuelo requiere diferentes técnicas que son usadas para prevenir fallos en los equipos electrónicos. Los fallos de tipo hardware debido a la radiación solar que existe a las alturas standard de vuelo, como SEU (Single Event Upset) y MCU (Multiple Bit Upset), provocan un cambio de estado de los bits que soportan la información almacenada en memoria. Estos fallos se producen, por ejemplo, en la memoria de configuración de una FPGA, que es donde se definen todas las funcionalidades. Las técnicas de protección requieren normalmente de redundancias que incrementan el coste, número de componentes, tamaño de la memoria y peso. En la fase de desarrollo de aplicaciones críticas de vuelo, generalmente se utilizan una serie de estándares o recomendaciones de diseño como ABD100, RTCA DO-160, IEC62395, etc, y diferentes técnicas de protección para evitar fallos del tipo SEU o MCU. Estas técnicas están basadas en procesos tecnológicos específicos como memorias robustas, codificaciones para detección y corrección de errores (EDAC), redundancias software, redundancia modular triple (TMR) o soluciones a nivel sistema. Esta tesis está enfocada a minimizar e incluso suprimir los efectos de los SEUs y MCUs que particularmente ocurren en la electrónica de avión como consecuencia de la exposición a radiación de partículas no cargadas (como son los neutrones) que se encuentra potenciada a las típicas alturas de vuelo. La criticidad en vuelo que tienen determinados sistemas obligan a que dichos sistemas sean tolerantes a fallos, es decir, que garanticen un correcto funcionamiento aún cuando se produzca un fallo en ellos. Es por ello que soluciones como las presentadas en esta tesis tienen interés en el sector industrial. La Tesis incluye una descripción inicial de la física de la radiación incidente sobre aeronaves, y el análisis de sus efectos en los componentes electrónicos aeronaúticos basados en semiconductor, que desembocan en la generación de SEUs y MCUs. Este análisis permite dimensionar adecuadamente y optimizar los procedimientos de corrección que se propongan posteriormente. La Tesis propone un sistema de corrección de fallos SEUs y MCUs que permita cumplir la condición de Sistema Tolerante a Fallos, a la vez que minimiza los niveles de redundancia y de complejidad de los códigos de corrección. El nivel de redundancia es minimizado con la introducción del concepto propuesto HSB (Hardwired Seed Bits), en la que se reduce la información esencial a unos pocos bits semilla, neutros frente a radiación. Los códigos de corrección requeridos se reducen a la corrección de un único error, gracias al uso del concepto de Distancia Virtual entre Bits, a partir del cual será posible corregir múltiples errores simultáneos (MCUs) a partir de códigos simples de corrección. Un ejemplo de aplicación de la Tesis es la implementación de una Protección Tolerante a Fallos sobre la memoria SRAM de una FPGA. Esto significa que queda protegida no sólo la información contenida en la memoria sino que también queda auto-protegida la función de protección misma almacenada en la propia SRAM. De esta forma, el sistema es capaz de auto-regenerarse ante un SEU o incluso un MCU, independientemente de la zona de la SRAM sobre la que impacte la radiación. Adicionalmente, esto se consigue con códigos simples tales como corrección por bit de paridad y Hamming, minimizando la dedicación de recursos de computación hacia tareas de supervisión del sistema.For airborne safety critical applications certification, different techniques are implemented to prevent failures in electronic equipments. The HW failures at flying heights of aircrafts related to solar radiation such as SEU (Single-Event-Upset) and MCU (Multiple Bit Upset), causes bits alterations that corrupt the information at memories. These HW failures cause errors, for example, in the Configuration-Code of an FPGA that defines the functionalities. The protection techniques require classically redundant functionalities that increases the cost, components, memory space and weight. During the development phase for airborne safety critical applications, different aerospace standards are generally recommended as ABD100, RTCA-DO160, IEC62395, etc, and different techniques are classically used to avoid failures such as SEU or MCU. These techniques are based on specific technology processes, Hardened memories, error detection and correction codes (EDAC), SW redundancy, Triple Modular Redundancy (TMR) or System level solutions. This Thesis is focussed to minimize, and even to remove, the effects of SEUs and MCUs, that particularly occurs in the airborne electronics as a consequence of its exposition to solar radiation of non-charged particles (for example the neutrons). These non-charged particles are even powered at flying altitudes due to aircraft volume. The safety categorization of different equipments/functionalities requires a design based on fault-tolerant approach that means, the system will continue its normal operation even if a failure occurs. The solution proposed in this Thesis is relevant for the industrial sector because of its Fault-tolerant capability. Thesis includes an initial description for the physics of the solar radiation that affects into aircrafts, and also the analyses of their effects into the airborne electronics based on semiconductor components that create the SEUs and MCUs. This detailed analysis allows the correct sizing and also the optimization of the procedures used to correct the errors. This Thesis proposes a system that corrects the SEUs and MCUs allowing the fulfilment of the Fault-Tolerant requirement, reducing the redundancy resources and also the complexity of the correction codes. The redundancy resources are minimized thanks to the introduction of the concept of HSB (Hardwired Seed Bits), in which the essential information is reduced to a few seed bits, neutral to radiation. The correction codes required are reduced to the correction of one error thanks to the use of the concept of interleaving distance between adjacent bits, this allows the simultaneous multiple error correction with simple single error correcting codes. An example of the application of this Thesis is the implementation of the Fault-tolerant architecture of an SRAM-based FPGA. That means that the information saved in the memory is protected but also the correction functionality is auto protected as well, also saved into SRAM memory. In this way, the system is able to self-regenerate the information lost in case of SEUs or MCUs. This is independent of the SRAM area affected by the radiation. Furthermore, this performance is achieved by means simple error correcting codes, as parity bits or Hamming, that minimize the use of computational resources to this supervision tasks for system.Programa Oficial de Doctorado en Ingeniería Eléctrica, Electrónica y AutomáticaPresidente: Luis Alfonso Entrena Arrontes.- Secretario: Pedro Reviriego Vasallo.- Vocal: Mª Luisa López Vallej

    Multiple Cell Upsets Correction for OLS Codes

    Get PDF
    ABSTRACT: An Error Correction code with Parity check matrix is implemented which is other type of the One Step Majority Logic Decodable (OS-MLD) called as Orthogonal Latin Squares (OLS) codes. It is a concurrent error detection technique for OLS codes encoders and syndrome computation because of the fact that when ECCs are used, the encoder and decoder circuits can also suffer errors.These OLS codes are used to correct the memories and caches. This can be achieved due to their modularity such that the error correction capabilities can be easily adapted to the error rate or to the mode of the operation.OLS codes typically require more parity bits than other codes to correct the same number of errors. However, due to their modularity and the simple low delay decoding implementation these are widely used in Error Correction. All the errors that affect a single circuit node are detected by the parity prediction scheme. The area and latency values are monitored

    Design and Analysis of an Adjacent Multi-bit Error Correcting Code for Nanoscale SRAMs

    Get PDF
    Increasing static random access memory (SRAM) bitcell density is a major driving force for semiconductor technology scaling. The industry standard 2x reduction in SRAM bitcell area per technology node has lead to a proliferation in memory intensive applications as greater memory system capacity can be realized per unit area. Coupled with this increasing capacity is an increasing SRAM system-level soft error rate (SER). Soft errors, caused by galactic radiation and radioactive chip packaging material corrupt a bitcell’s data-state and are a potential cause of catastrophic system failures. Further, reductions in device geometries, design rules, and sensitive node capacitances increase the probability of multiple adjacent bitcells being upset per particle strike to over 30% of the total SER below the 45 nm process node. Traditionally, these upsets have been addressed using a simple error correction code (ECC) combined with word interleaving. With continued scaling however, errors beyond this setup begin to emerge. Although more powerful ECCs exist, they come at an increased overhead in terms of area and latency. Additionally, interleaving adds complexity to the system and may not always be feasible for the given architecture. In this thesis, a new class of ECC targeted toward adjacent multi-bit upsets (MBU) is proposed and analyzed. These codes present a tradeoff between the currently popular single error correcting-double error detecting (SEC-DED) ECCs used in SRAMs (that are unable to correct MBUs), and the more robust multi-bit ECC schemes used for MBU reliability. The proposed codes are evaluated and compared against other ECCs using a custom test suite and multi-bit error channel model developed in Matlab as well as Verilog hardware description language (HDL) implementations synthesized using Synopsys Design Compiler and a commercial 65 nm bulk CMOS standard cell library. Simulation results show that for the same check-bit overhead as a conventional 64 data-bit SEC-DED code, the proposed scheme provides a corrected-SER approximately equal to the Bose-Chaudhuri- Hocquenghem (BCH) double error correcting (DEC) code, and a 4.38x improvement over the SEC-DED code in the same error channel. While, for 3 additional check-bits (still 3 less than the BCH DEC code), a triple adjacent error correcting version of the proposed code provides a 2.35x improvement in corrected-SER over the BCH DEC code for 90.9% less ECC circuit area and 17.4% less error correction delay. For further verification, a 0.4-1.0 V 75 kb single-cycle SRAM macro protected with a programmable, up-to-3-adjacent-bit-correcting version of the proposed ECC has been fab- ricated in a commercial 28 nm bulk CMOS process. The SRAM macro has undergone neu- tron irradiation testing at the TRIUMF Neutron Irradiation Facility in Vancouver, Canada. Measurements results show a 189x improvement in SER over an unprotected memory with no ECC enabled and a 5x improvement over a traditional single-error-correction (SEC) code at 0.5 V using 1-way interleaving for the same number of check-bits. This is compa- rable with the 4.38x improvement observed in simulation. Measurement results confirm an average active energy of 0.015 fJ/bit at 0.4 V, and average 80 mV reduction in VDDMIN across eight packaged chips by enabling the ECC. Both the SRAM macro and ECC circuit were designed for dynamic voltage and frequency scaling for both nominal and low voltage applications using a full-custom circuit design flow

    2D Parity Product Code for TSV online fault correction and detection

    Get PDF
    Through-Silicon-Via (TSV) is one of the most promising technologies to realize 3D Integrated Circuits (3D-ICs).  However, the reliability issues due to the low yield rates and the sensitivity to thermal hotspots and stress issues are preventing TSV-based 3D-ICs from being widely and efficiently used. To enhance the reliability of TSV connections, using error correction code to detect and correct faults automatically has been demonstrated as a viable solution.This paper presents a 2D Parity Product Code (2D-PPC) for TSV fault-tolerance with the ability to correct one fault and detect, at least, two faults.  In an implementation of 64-bit data and 81-bit codeword, 2D-PPC can detect over 71 faults, on average. Its encoder and decoder decrease the overall latency by 38.33% when compared to the Single Error Correction Double Error Detection code.  In addition to the high detection rates, the encoder can detect 100% of its gate failures, and the decoder can detect and correct around 40% of its individual gate failures. The squared 2D-PPC could be extended using orthogonal Latin square to support extra bit correction

    Двухмодульные взвешенные коды с суммированием в кольце вычетов по модулю M=4

    Get PDF
    The paper describes research results of features of error detection in data vectors by sum codes. The task is relevant in this setting, first of all, for the use of sum codes in the implementation of the checkable discrete systems and the technical means for the diagnosis of their components. Methods for sum codes constructing are described. A brief overview in the field of methods for sum codes constructing is provided. The article highlights codes for which the values of all data bits are taken into account once by the operations of summing their values or the values of the weight coefficients of the bits during the formation of the check vector. The paper also highlights codes that are formed when the data vectors are initially divided into subsets, in particular, into two subsets. An extension of the sum code class obtained by isolating two independent parts in the data vectors, as well as weighting the bits of the data vectors at the stage of code construction, is proposed. The paper provides a generalized algorithm for two-module weighted codes construction, and describes their features obtained by weighing with non-ones weight coefficients for one of data bits in each of the subvectors, according to which the total weight is calculated. Particular attention is paid to the two-module weight-based sum code, for which the total weight of the data vector in the residue ring modulo M = 4 is determined. It is shown that the purpose of the inequality between the bits of the data vector in some cases gives improvements in the error detection characteristics compared to the well-known two-module codes. Some modifications of the proposed two-module weighted codes are described. A method for calculating the total number of undetectable errors in the two-module sum codes in the residue ring modulo M = 4 with one weighted bit in each of the subsets is proposed. Detailed characteristics of error detection by the considered codes both by the multiplicities of undetectable errors and by their types (unidirectional, symmetrical and asymmetrical errors) are given. The proposed codes are compared with known codes. A method for the synthesis of two-module sum encoders on a standard element base of the single signals adders is proposed. The classification of two-module sum codes is presented.Представлены результаты исследования особенностей обнаружения ошибок в информационных векторах кодами с суммированием. В такой постановке задача актуальна, прежде всего, для использования кодов с суммированием при реализации контролепригодных дискретных систем и технических средств диагностирования их компонентов. Приводится краткий обзор работ в области построения кодов с суммированием и описание способов их построения. Выделены коды, для которых при формировании контрольного вектора единожды учитываются значения всех информационных разрядов путем операций суммирования их значений или значений весовых коэффициентов разрядов, а также коды, которые формируются при первоначальном разбиении информационных векторов на подмножества, в частности на два подмножества. Предложено расширение класса кодов с суммированием, получаемых за счет выделения двух независимых частей в контрольных векторах, а также взвешивания разрядов информационных векторов на этапе построения кода. Приведен обобщенный алгоритм построения двухмодульных взвешенных кодов, а также описаны особенности некоторых из кодов, полученных при взвешивании неединичными весовыми коэффициентами по одному информационному разряду в каждом из подвекторов, по которым осуществляется подсчет суммарного веса. Особое внимание уделено двухмодульным взвешенным кодам с суммированием, для которых определяется суммарный вес информационного вектора в кольце вычетов по модулю M=4. Показано, что установление неравноправия между разрядами информационного вектора в некоторых случаях дает улучшение в характеристиках обнаружения ошибок по сравнению с известными двухмодульными кодами. Описываются некоторые модификации предложенных двухмодульных взвешенных кодов. Предложен способ подсчета общего числа необнаруживаемых ошибок в двухмодульных кодах с суммированием в кольце вычетов по модулю M=4 с одним взвешенным разрядом в каждом из подмножеств. Приведены подробные характеристики обнаружения ошибок рассматриваемыми кодами как по кратностям необнаруживаемых ошибок, так и по их видам (монотонные, симметричные и асимметричные ошибки). Проведено сравнение с известными кодами. Предложен способ синтеза кодеров двухмодульных кодов с суммированием на стандартной элементной базе сумматоров единичных сигналов. Дана классификация двухмодульных кодов с суммированием
    corecore