38 research outputs found

    MCU Tolerance in SRAMs through Low Redundancy Triple Adjacent Error Correction

    (c) 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
    Static random access memories (SRAMs) are key in electronic systems. They are used not only as standalone devices, but also embedded in application-specific integrated circuits. One key challenge for memories is their susceptibility to radiation-induced soft errors that change the value of memory cells. Error correction codes (ECCs) are commonly used to ensure correct data despite soft error effects in semiconductor memories. Single error correction/double error detection (SEC-DED) codes have traditionally been the preferred choice for data protection in SRAMs. During the last decade, the percentage of errors that affect more than one memory cell has increased substantially, mainly due to multiple cell upsets (MCUs) caused by radiation. The bits affected by these errors are physically close. To mitigate their effects, ECCs that correct single errors and double adjacent errors have been proposed. These codes, known as single error correction/double adjacent error correction (SEC-DAEC), require the same number of parity bits as traditional SEC-DED codes and a moderate increase in decoder complexity. However, MCUs are not limited to double adjacent errors, because they affect more bits as technology scales. In this brief, new codes that can correct triple adjacent errors and 3-bit burst errors are presented. They have been implemented using a 45-nm library and compared with previous proposals, showing that our codes have better error protection with a moderate overhead and low redundancy.
    This work was supported in part by the Universitat Politecnica de Valencia, Valencia, Spain, through the DesTT Research Project under Grant SP20120806; in part by the Spanish Ministry of Science and Education under Project AYA-2009-13300-C03; in part by the Arenes Research Project under Grant TIN2012-38308-C02-01; and in part by the Research Project entitled Manufacturable and Dependable Multicore Architectures at Nanoscale within the framework of COST ICT Action under Grant 1103.
    Saiz-Adalid, L.; Reviriego, P.; Gil, P.; Pontarelli, S.; Maestro, JA. (2015). MCU Tolerance in SRAMs through Low Redundancy Triple Adjacent Error Correction. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 23(10):2332-2336. https://doi.org/10.1109/TVLSI.2014.2357476

    Novel fault tolerant Multi-Bit Upset (MBU) Error-Detection and Correction (EDAC) architecture

    For the certification of safety-critical airborne applications, several techniques are used to prevent failures in electronic equipment. Hardware faults caused by solar radiation at standard flight altitudes, such as single-event upsets (SEUs) and multiple-cell upsets (MCUs), flip the bits that hold the information stored in memory. These faults occur, for example, in the configuration memory of an FPGA, which defines all of its functionality. Protection techniques normally rely on redundancy, which increases cost, component count, memory size and weight. During the development of safety-critical airborne applications, design standards and recommendations such as ABD100, RTCA DO-160 and IEC62395 are generally followed, together with protection techniques against SEUs and MCUs. These techniques are based on specific technology processes such as hardened memories, error detection and correction (EDAC) codes, software redundancy, triple modular redundancy (TMR) or system-level solutions.
    This thesis aims to minimize, and even suppress, the effects of the SEUs and MCUs that occur in avionics as a consequence of exposure to radiation from uncharged particles (such as neutrons), which is intensified at typical flight altitudes. The flight criticality of certain systems requires them to be fault tolerant, that is, to keep operating correctly even when a fault occurs. This is why solutions such as those presented in this thesis are of interest to industry.
    The thesis opens with a description of the physics of the radiation incident on aircraft and an analysis of its effects on semiconductor-based avionic components, which lead to SEUs and MCUs. This analysis makes it possible to size and optimize the correction procedures proposed afterwards. The thesis proposes an SEU and MCU correction system that meets the fault-tolerance requirement while minimizing both the redundancy and the complexity of the correction codes. Redundancy is minimized by introducing the concept of Hardwired Seed Bits (HSB), in which the essential information is reduced to a few seed bits that are neutral to radiation. The correction codes need only correct a single error, thanks to the concept of virtual distance between bits (an interleaving distance between adjacent bits), which allows simultaneous multiple errors (MCUs) to be corrected with simple single-error-correcting codes.
    An example application is the implementation of fault-tolerant protection for the SRAM of an FPGA. Not only is the information held in the memory protected, but so is the protection function itself, which is stored in the same SRAM. The system can therefore regenerate itself after an SEU or even an MCU, regardless of which area of the SRAM the radiation hits. This is achieved with simple codes such as parity bits and Hamming codes, which minimizes the computing resources devoted to system supervision.
    Official Doctoral Programme in Electrical, Electronic and Automation Engineering. Committee chair: Luis Alfonso Entrena Arrontes. Secretary: Pedro Reviriego Vasallo. Member: Mª Luisa López Vallej
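    The idea of keeping physically adjacent bits in different codewords, so that a single-error-correcting code is enough to survive an MCU, can be illustrated with classical bit interleaving. The sketch below is only an illustration under stated assumptions: it is not the thesis's HSB or virtual-distance scheme, and the function names and the memory layout (cell i belongs to word i mod num_words) are hypothetical.

```python
def interleave_layout(num_words: int, word_len: int):
    """Assumed layout: physical cell i stores bit (i // num_words) of word (i % num_words)."""
    return [(cell % num_words, cell // num_words)
            for cell in range(num_words * word_len)]


def words_hit_by_burst(start: int, burst_len: int, num_words: int, word_len: int):
    """Return {word: number of flipped bits} for a burst of adjacent physical cells."""
    layout = interleave_layout(num_words, word_len)
    hits = {}
    for cell in range(start, start + burst_len):
        word, _bit = layout[cell]
        hits[word] = hits.get(word, 0) + 1
    return hits


# A 3-cell burst with interleaving distance 4: each word loses at most one bit,
# so a single-error-correcting code per word can repair the whole burst.
hits = words_hit_by_burst(start=5, burst_len=3, num_words=4, word_len=8)
print(hits)
```

    With this layout, a burst is fully correctable by per-word single error correction only while its length does not exceed the interleaving distance (here 4); a 5-cell burst would place two errors in one word.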

    A ERROR-CORRECTION ROUTINE FOR DETECTION OF SIGNIFICANCE

    Recently, the number of errors affecting several memory cells has increased considerably. The proposed parallel SEC-DAEC decoder has been implemented in a hardware description language (HDL) and mapped to a TSMC 65-nm technology library using Synopsys Design Compiler. The standard SEC and SEC-DAEC decoders have also been implemented to show the advantages of the new decoder. The price paid for the low decoding time is that, in general, the codes are not optimal in terms of memory overhead and need more parity check bits. The increase in multi-cell errors is due to the scaling of memory cells and is forecast to grow further. A key observation is that the cells affected by an MCU are physically close. Interleaving, however, has a cost because it complicates the memory design. Research on multibit ECCs has focused on reducing the decoding latency because, in many cases, the standard decoders are serial and need several clock cycles. The main contribution of the brief is to enable fast and efficient parallel correction of single and double-adjacent errors. Existing SEC-DAEC decoders work like SEC decoders, but they must also check the syndrome values that correspond to double-adjacent errors, which roughly doubles the number of comparisons. The proposed SEC-DAEC decoder requires less circuit area than both the traditional SEC-DAEC decoder and an SEC decoder

    Exploration and Analysis of Combinations of Hamming Codes in 32-bit Memories

    Reducing the threshold voltage of electronic devices increases their sensitivity to electromagnetic radiation dramatically, increasing the probability of changing the memory cells' content. Designers mitigate failures using techniques such as Error Correction Codes (ECCs) to maintain information integrity. Although there are several studies of ECC usage in memories for space applications, there is still no consensus on the type of ECC to choose or on its organization in memory. This work analyzes several configurations of Hamming codes applied to 32-bit memories intended for space applications. It proposes the use of three types of Hamming codes, Ham(31,26), Ham(15,11), and Ham(7,4), as well as combinations of these codes. We employed 36 error patterns, ranging from one to four bit-flips, to analyze these codes. The experimental results show that the Ham(31,26) configuration, which contains five bits of redundancy, obtained the highest single error correction rate, almost 97%, with double, triple, and quadruple error correction rates of 78.7%, 63.4%, and 31.4%, respectively. By contrast, an ECC configuration comprising four Ham(7,4) codes, which uses twelve bits of redundancy, corrects only 87.5% of single errors
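    As a point of reference for the smallest code named above, the following is a minimal sketch of a textbook Ham(7,4) encoder and syndrome decoder. It assumes the classical bit ordering with parity bits at positions 1, 2 and 4; the function names are hypothetical, and the memory organization and error patterns evaluated in the paper are not reproduced here.

```python
def hamming74_encode(data):
    """Encode four data bits [d1, d2, d3, d4] as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code):
    """Return (data bits, syndrome); a nonzero syndrome is the 1-based error position."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the bit the syndrome points at
    return [c[2], c[4], c[5], c[6]], syndrome


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[5] ^= 1  # inject a single bit-flip
recovered, syndrome = hamming74_decode(stored)
print(recovered == word, syndrome)
```

    A single bit-flip is always repaired, but two flips in the same codeword produce a syndrome that points at a third position, so the decoder miscorrects. This is one reason multi-bit error patterns are evaluated separately from single errors.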

    EDAC software implementation to protect small satellites memory

    Radiation is a well-known problem for satellites in space. It can have various negative effects on electronic components, which can cause errors and failures. Mitigating these effects is therefore especially important for the success of space missions. One of the techniques used to increase the reliability of memory chips and reduce transient errors and permanent faults is Error Detection and Correction (EDAC). EDAC codes use redundancy to detect and correct errors. This final project consists of the implementation of a software EDAC algorithm to protect the main memory of a microcontroller. The implementation requirements and the issues of software EDAC are described, and the test results are discussed

    Correction Masking: a technique to implement efficient SET tolerant error correction decoders

    Single Event Transients (SETs) can be a major concern for combinational circuits. Their importance grows as technology scales because a small charge can create a large disturbance on a circuit node. One example of circuits that can suffer from SETs is the decoders of the Error Correction Codes (ECCs) that are used to protect memories from errors. This paper presents Correction Masking (CM), a technique to implement SET-tolerant syndrome decoders. The proposed technique is presented and evaluated in terms of both protection effectiveness and circuit overhead. The results show that it can provide effective protection while significantly reducing circuit area and power compared to Triple Modular Redundancy (TMR) protection. An interesting result is that Correction Masking also reduces the delay, as it adds less logic in the critical path than TMR. Finally, the proposed technique can be used for any syndrome decoder. This means that it is applicable to many of the ECCs used to protect memories, such as Single Error Correction (SEC), Single Error Correction Double Error Detection (SEC-DED), Single Error Correction Double Adjacent Error Correction (SEC-DAEC), and 3-bit burst codes.
    The work of Pedro Reviriego was supported in part by the Spanish Agencia Estatal de Investigación (AEI) 10.13039/501100011033 through the ACHILLES Project under Grant PID2019-104207RB-I00 and the Go2Edge Network under Grant RED2018-102585-T, and in part by the Madrid Community Research Project TAPIR-CM under Grant P2018/TCS-4496.
    Publicad
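    For context, the TMR baseline that the paper compares against can be sketched as a bitwise majority vote over three copies of a decoder output. This is a generic illustration of TMR only, not of Correction Masking, whose construction is not described in the abstract; the function name is hypothetical.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant outputs: each bit takes the value held by at least two copies."""
    return (a & b) | (a & c) | (b & c)


# One copy is disturbed by a transient; the vote masks it.
print(bin(tmr_vote(0b1011, 0b1011, 0b0011)))
```

    The voter masks a disturbance in any single copy, at the cost of triplicating the protected logic and adding the voter to the critical path.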

    Reducing the Overhead of BCH Codes: New Double Error Correction Codes

    The Bose-Chaudhuri-Hocquenghem (BCH) codes are a well-known class of powerful cyclic error correction codes. BCH codes can correct multiple errors with minimal redundancy. Primitive BCH codes only exist for some word lengths, which do not frequently match those employed in digital systems. This paper focuses on double error correction (DEC) codes for word lengths that are powers of two (8, 16, 32, and 64), which are commonly used in memories. We also focus on hardware implementations of the encoder and decoder circuits for very fast operation. This work proposes new low redundancy and reduced overhead (LRRO) DEC codes, with the same redundancy as the equivalent BCH DEC codes, but whose encoder and decoder circuits present a lower overhead (in terms of propagation delay, silicon area usage and power consumption). We used a methodology to search parity check matrices, based on error patterns, in order to design the new codes. We implemented and synthesized them, and compared their results with those obtained for the BCH codes. Our implementation of the decoder circuits achieved reductions between 2.8% and 8.7% in the propagation delay, between 1.3% and 3.0% in the silicon area, and between 15.7% and 26.9% in the power consumption. Therefore, we propose LRRO codes as an alternative for protecting information against multiple errors.
    This research was supported in part by the Spanish Government, project TIN2016-81075-R, by Primeros Proyectos de Investigacion (PAID-06-18), Vicerrectorado de Investigacion, Innovacion y Transferencia de la Universitat Politecnica de Valencia (UPV), project 20190032, and by the Institute of Information and Communication Technologies (ITACA).
    Saiz-Adalid, L.; Gracia-Morán, J.; Gil Tomás, DA.; Baraza Calvo, JC.; Gil, P. (2020). Reducing the Overhead of BCH Codes: New Double Error Correction Codes. Electronics. 9(11):1-14. https://doi.org/10.3390/electronics9111897

    Ultrafast Codes for Multiple Adjacent Error Correction and Double Error Detection

    (c) 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
    Reliable computer systems employ error control codes (ECCs) to protect information from errors. For example, memories are frequently protected using single error correction-double error detection (SEC-DED) codes. ECCs are traditionally designed to minimize the number of redundant bits, as they are added to each word in the whole memory. Nevertheless, using an ECC introduces encoding and decoding latencies, silicon area usage and power consumption. In other computer units, these parameters should be optimized, and redundancy would be less important. For example, protecting registers against errors remains a major concern for deep sub-micron systems due to technology scaling. In this case, an important requirement for register protection is to keep encoding and decoding latencies as short as possible. Ultrafast error control codes achieve very low delays, independently of the word length, at the cost of increased redundancy. This paper summarizes previous work on Ultrafast codes (SEC and SEC-DED), and proposes new codes combining double error detection and adjacent error correction. We have implemented, synthesized and compared different Ultrafast codes with other state-of-the-art fast codes. The results show the validity of the approach, achieving low latencies and a good balance with silicon area and power consumption.
    This work was supported in part by the Spanish Government under Project TIN2016-81075-R, and in part by the Primeros Proyectos de Investigacion, Vicerrectorado de Investigacion, Innovacion y Transferencia de la Universitat Politecnica de Valencia (UPV), Valencia, Spain, under Project PAID-06-18 20190032.
    Saiz-Adalid, L.; Gracia-Morán, J.; Gil Tomás, DA.; Baraza Calvo, JC.; Gil, P. (2019). Ultrafast Codes for Multiple Adjacent Error Correction and Double Error Detection. IEEE Access. 7:151131-151143. https://doi.org/10.1109/ACCESS.2019.2947315

    Soft-Error Resilience Framework For Reliable and Energy-Efficient CMOS Logic and Spintronic Memory Architectures

    The revolution in chip manufacturing processes spanning five decades has proliferated high-performance and energy-efficient nano-electronic devices across all aspects of daily life. In recent years, CMOS technology scaling has realized billions of transistors within large-scale VLSI chips to elevate performance. However, these advancements have also continually augmented the impact of Single-Event Transient (SET) and Single-Event Upset (SEU) occurrences, which precipitate a range of Soft-Error (SE) dependability issues. Consequently, soft-error mitigation techniques have become essential to improve systems' reliability. Herein, we first propose optimized soft-error resilience designs to improve the robustness of sub-micron computing systems. The proposed approaches were developed to deliver energy efficiency and tolerate double/multiple errors simultaneously while incurring acceptable speed degradation compared to prior work. Secondly, the impact of Process Variation (PV) in the Near-Threshold Voltage (NTV) region on redundancy-based SE-mitigation approaches for High-Performance Computing (HPC) systems was investigated to highlight the approach that can realize favorable attributes, such as reduced critical datapath delay variation and low speed degradation. Finally, spin-based devices have recently been widely used to design Non-Volatile (NV) elements such as NV latches and flip-flops, which can be leveraged in normally-off computing architectures for Internet-of-Things (IoT) and energy-harvesting-powered applications. Thus, in the last portion of this dissertation, we design and evaluate, for soft-error resilience, NV-latching circuits that can achieve attractive features such as low energy consumption, high computing performance, and superior soft-error tolerance, i.e., the ability to tolerate Multiple Node Upsets (MNUs) concurrently, to potentially become a mainstream solution for aerospace and avionic nanoelectronics. Together, these objectives cooperate to increase the energy efficiency and soft-error mitigation resiliency of larger-scale emerging NV latching circuits within iso-energy constraints. In summary, addressing these reliability concerns is paramount to the successful deployment of future reliable and energy-efficient CMOS logic and spintronic memory architectures with deeply-scaled devices operating at low voltages

    Experimental Study on Terrestrial-Radiation-Induced Single-Event Upsets in Planar and FinFET SRAMs

    T. Kato et al., "Muon-Induced Single-Event Upsets in 20-nm SRAMs: Comparative Characterization With Neutrons and Alpha Particles," in IEEE Transactions on Nuclear Science, vol. 68, no. 7, pp. 1436-1444, July 2021, doi: 10.1109/TNS.2021.3082559