1,711 research outputs found

    DeSyRe: on-Demand System Reliability

    No full text
    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints

    On Fault Tolerance Methods for Networks-on-Chip

    Get PDF
    Technology scaling has proceeded into dimensions in which the reliability of manufactured devices is becoming endangered. The reliability decrease is a consequence of physical limitations, relative increase of variations, and decreasing noise margins, among others. A promising solution for bringing the reliability of circuits back to a desired level is the use of design methods which introduce tolerance against possible faults in an integrated circuit. This thesis studies and presents fault tolerance methods for network-onchip (NoC) which is a design paradigm targeted for very large systems-onchip. In a NoC resources, such as processors and memories, are connected to a communication network; comparable to the Internet. Fault tolerance in such a system can be achieved at many abstraction levels. The thesis studies the origin of faults in modern technologies and explains the classification to transient, intermittent and permanent faults. A survey of fault tolerance methods is presented to demonstrate the diversity of available methods. Networks-on-chip are approached by exploring their main design choices: the selection of a topology, routing protocol, and flow control method. Fault tolerance methods for NoCs are studied at different layers of the OSI reference model. The data link layer provides a reliable communication link over a physical channel. Error control coding is an efficient fault tolerance method especially against transient faults at this abstraction level. Error control coding methods suitable for on-chip communication are studied and their implementations presented. Error control coding loses its effectiveness in the presence of intermittent and permanent faults. Therefore, other solutions against them are presented. The introduction of spare wires and split transmissions are shown to provide good tolerance against intermittent and permanent errors and their combination to error control coding is illustrated. At the network layer positioned above the data link layer, fault tolerance can be achieved with the design of fault tolerant network topologies and routing algorithms. Both of these approaches are presented in the thesis together with realizations in the both categories. The thesis concludes that an optimal fault tolerance solution contains carefully co-designed elements from different abstraction levelsSiirretty Doriast

    New Fault Detection, Mitigation and Injection Strategies for Current and Forthcoming Challenges of HW Embedded Designs

    Full text link
    Tesis por compendio[EN] Relevance of electronics towards safety of common devices has only been growing, as an ever growing stake of the functionality is assigned to them. But of course, this comes along the constant need for higher performances to fulfill such functionality requirements, while keeping power and budget low. In this scenario, industry is struggling to provide a technology which meets all the performance, power and price specifications, at the cost of an increased vulnerability to several types of known faults or the appearance of new ones. To provide a solution for the new and growing faults in the systems, designers have been using traditional techniques from safety-critical applications, which offer in general suboptimal results. In fact, modern embedded architectures offer the possibility of optimizing the dependability properties by enabling the interaction of hardware, firmware and software levels in the process. However, that point is not yet successfully achieved. Advances in every level towards that direction are much needed if flexible, robust, resilient and cost effective fault tolerance is desired. The work presented here focuses on the hardware level, with the background consideration of a potential integration into a holistic approach. The efforts in this thesis have focused several issues: (i) to introduce additional fault models as required for adequate representativity of physical effects blooming in modern manufacturing technologies, (ii) to provide tools and methods to efficiently inject both the proposed models and classical ones, (iii) to analyze the optimum method for assessing the robustness of the systems by using extensive fault injection and later correlation with higher level layers in an effort to cut development time and cost, (iv) to provide new detection methodologies to cope with challenges modeled by proposed fault models, (v) to propose mitigation strategies focused towards tackling such new threat scenarios and (vi) to devise an automated methodology for the deployment of many fault tolerance mechanisms in a systematic robust way. The outcomes of the thesis constitute a suite of tools and methods to help the designer of critical systems in his task to develop robust, validated, and on-time designs tailored to his application.[ES] La relevancia que la electrónica adquiere en la seguridad de los productos ha crecido inexorablemente, puesto que cada vez ésta copa una mayor influencia en la funcionalidad de los mismos. Pero, por supuesto, este hecho viene acompañado de una necesidad constante de mayores prestaciones para cumplir con los requerimientos funcionales, al tiempo que se mantienen los costes y el consumo en unos niveles reducidos. En este escenario, la industria está realizando esfuerzos para proveer una tecnología que cumpla con todas las especificaciones de potencia, consumo y precio, a costa de un incremento en la vulnerabilidad a múltiples tipos de fallos conocidos o la introducción de nuevos. Para ofrecer una solución a los fallos nuevos y crecientes en los sistemas, los diseñadores han recurrido a técnicas tradicionalmente asociadas a sistemas críticos para la seguridad, que ofrecen en general resultados sub-óptimos. De hecho, las arquitecturas empotradas modernas ofrecen la posibilidad de optimizar las propiedades de confiabilidad al habilitar la interacción de los niveles de hardware, firmware y software en el proceso. No obstante, ese punto no está resulto todavía. Se necesitan avances en todos los niveles en la mencionada dirección para poder alcanzar los objetivos de una tolerancia a fallos flexible, robusta, resiliente y a bajo coste. El trabajo presentado aquí se centra en el nivel de hardware, con la consideración de fondo de una potencial integración en una estrategia holística. Los esfuerzos de esta tesis se han centrado en los siguientes aspectos: (i) la introducción de modelos de fallo adicionales requeridos para la representación adecuada de efectos físicos surgentes en las tecnologías de manufactura actuales, (ii) la provisión de herramientas y métodos para la inyección eficiente de los modelos propuestos y de los clásicos, (iii) el análisis del método óptimo para estudiar la robustez de sistemas mediante el uso de inyección de fallos extensiva, y la posterior correlación con capas de más alto nivel en un esfuerzo por recortar el tiempo y coste de desarrollo, (iv) la provisión de nuevos métodos de detección para cubrir los retos planteados por los modelos de fallo propuestos, (v) la propuesta de estrategias de mitigación enfocadas hacia el tratamiento de dichos escenarios de amenaza y (vi) la introducción de una metodología automatizada de despliegue de diversos mecanismos de tolerancia a fallos de forma robusta y sistemática. Los resultados de la presente tesis constituyen un conjunto de herramientas y métodos para ayudar al diseñador de sistemas críticos en su tarea de desarrollo de diseños robustos, validados y en tiempo adaptados a su aplicación.[CA] La rellevància que l'electrònica adquireix en la seguretat dels productes ha crescut inexorablement, puix cada volta més aquesta abasta una major influència en la funcionalitat dels mateixos. Però, per descomptat, aquest fet ve acompanyat d'un constant necessitat de majors prestacions per acomplir els requeriments funcionals, mentre es mantenen els costos i consums en uns nivells reduïts. Donat aquest escenari, la indústria està fent esforços per proveir una tecnologia que complisca amb totes les especificacions de potència, consum i preu, tot a costa d'un increment en la vulnerabilitat a diversos tipus de fallades conegudes, i a la introducció de nous tipus. Per oferir una solució a les noves i creixents fallades als sistemes, els dissenyadors han recorregut a tècniques tradicionalment associades a sistemes crítics per a la seguretat, que en general oferixen resultats sub-òptims. De fet, les arquitectures empotrades modernes oferixen la possibilitat d'optimitzar les propietats de confiabilitat en habilitar la interacció dels nivells de hardware, firmware i software en el procés. Tot i això eixe punt no està resolt encara. Es necessiten avanços a tots els nivells en l'esmentada direcció per poder assolir els objectius d'una tolerància a fallades flexible, robusta, resilient i a baix cost. El treball ací presentat se centra en el nivell de hardware, amb la consideració de fons d'una potencial integració en una estratègia holística. Els esforços d'esta tesi s'han centrat en els següents aspectes: (i) la introducció de models de fallada addicionals requerits per a la representació adequada d'efectes físics que apareixen en les tecnologies de fabricació actuals, (ii) la provisió de ferramentes i mètodes per a la injecció eficient del models proposats i dels clàssics, (iii) l'anàlisi del mètode òptim per estudiar la robustesa de sistemes mitjançant l'ús d'injecció de fallades extensiva, i la posterior correlació amb capes de més alt nivell en un esforç per retallar el temps i cost de desenvolupament, (iv) la provisió de nous mètodes de detecció per cobrir els reptes plantejats pels models de fallades proposats, (v) la proposta d'estratègies de mitigació enfocades cap al tractament dels esmentats escenaris d'amenaça i (vi) la introducció d'una metodologia automatitzada de desplegament de diversos mecanismes de tolerància a fallades de forma robusta i sistemàtica. Els resultats de la present tesi constitueixen un conjunt de ferramentes i mètodes per ajudar el dissenyador de sistemes crítics en la seua tasca de desenvolupament de dissenys robustos, validats i a temps adaptats a la seua aplicació.Espinosa García, J. (2016). New Fault Detection, Mitigation and Injection Strategies for Current and Forthcoming Challenges of HW Embedded Designs [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/73146TESISCompendi

    Synthesis of Fault-Tolerant Embedded Systems

    Get PDF
    This work addresses the issue of design optimization for faulttolerant hard real-time systems. In particular, our focus is on the handling of transient faults using both checkpointing with rollback recovery and active replication. Fault tolerant schedules are generated based on a conditional process graph representation. The formulated system synthesis approaches decide the assignment of fault-tolerance policies to processes, the optimal placement of checkpoints and the mapping of processes to processors, such that multiple transient faults are tolerated, transparency requirements are considered, and the timing constraints of the application are satisfied. 1

    Climate change, blackouts and society: dress rehearsals for the future

    Get PDF
    Blackouts serve as a reminder of how dependent mankind has become on electricity and the appliances it powers. Whatever the cause of a blackout, there are patterns in the consequences that take place as a result. These include not only measurable economic losses but also social consequences that are sometimes immeasurable. This paper reviews almost 50 different significant power-outage events that have occurred in 26 countries, mostly over the last decade. The unpredictable nature of blackouts limits the collection of field data. Hence the data in this paper are collected from reputable media coverage of the events. The patterns that are analysed include economic loss, food, health, crime, social unrest, transport and inequalities While many blackouts are caused by systems failures, there is a growing trend of failures due to inadequate energy; whether due to depletion of resources such as oil and coal or due to the vagaries of the climate in the supply of renewable energy. As we enter the period of peak oil and climate change the security of energy supply for electricity generation is under threat. Understanding the nature of blackouts is more than just a record of past systems failures; blackouts are dress rehearsals for the future

    Proposal of an Adaptive Fault Tolerance Mechanism to Tolerate Intermittent Faults in RAM

    Full text link
    [EN] Due to transistor shrinking, intermittent faults are a major concern in current digital systems. This work presents an adaptive fault tolerance mechanism based on error correction codes (ECC), able to modify its behavior when the error conditions change without increasing the redundancy. As a case example, we have designed a mechanism that can detect intermittent faults and swap from an initial generic ECC to a specific ECC capable of tolerating one intermittent fault. We have inserted the mechanism in the memory system of a 32-bit RISC processor and validated it by using VHDL simulation-based fault injection. We have used two (39, 32) codes: a single error correction-double error detection (SEC-DED) and a code developed by our research group, called EPB3932, capable of correcting single errors and double and triple adjacent errors that include a bit previously tagged as error-prone. The results of injecting transient, intermittent, and combinations of intermittent and transient faults show that the proposed mechanism works properly. As an example, the percentage of failures and latent errors is 0% when injecting a triple adjacent fault after an intermittent stuck-at fault. We have synthesized the adaptive fault tolerance mechanism proposed in two types of FPGAs: non-reconfigurable and partially reconfigurable. In both cases, the overhead introduced is affordable in terms of hardware, time and power consumption.This research was supported in part by the Spanish Government, project TIN2016-81,075-R, and by Primeros Proyectos de Investigacion (PAID-06-18), Vicerrectorado de Investigacion, Innovacion y Transferencia de la Universitat Politecnica de Valencia (UPV), project 20190032.Baraza Calvo, JC.; Gracia-Morán, J.; Saiz-Adalid, L.; Gil Tomás, DA.; Gil, P. (2020). Proposal of an Adaptive Fault Tolerance Mechanism to Tolerate Intermittent Faults in RAM. Electronics. 9(12):1-30. https://doi.org/10.3390/electronics9122074S130912International Technology Roadmap for Semiconductors (ITRS)http://www.itrs2.net/2013-itrs.htmlJeng, S.-L., Lu, J.-C., & Wang, K. (2007). A Review of Reliability Research on Nanotechnology. IEEE Transactions on Reliability, 56(3), 401-410. doi:10.1109/tr.2007.903188Ibe, E., Taniguchi, H., Yahagi, Y., Shimbo, K., & Toba, T. (2010). Impact of Scaling on Neutron-Induced Soft Error in SRAMs From a 250 nm to a 22 nm Design Rule. IEEE Transactions on Electron Devices, 57(7), 1527-1538. doi:10.1109/ted.2010.2047907Boussif, A., Ghazel, M., & Basilio, J. C. (2020). Intermittent fault diagnosability of discrete event systems: an overview of automaton-based approaches. Discrete Event Dynamic Systems, 31(1), 59-102. doi:10.1007/s10626-020-00324-yConstantinescu, C. (2003). Trends and challenges in VLSI circuit reliability. IEEE Micro, 23(4), 14-19. doi:10.1109/mm.2003.1225959Bondavalli, A., Chiaradonna, S., Di Giandomenico, F., & Grandoni, F. (2000). Threshold-based mechanisms to discriminate transient from intermittent faults. IEEE Transactions on Computers, 49(3), 230-245. doi:10.1109/12.841127Contant, O., Lafortune, S., & Teneketzis, D. (2004). Diagnosis of Intermittent Faults. Discrete Event Dynamic Systems, 14(2), 171-202. doi:10.1023/b:disc.0000018570.20941.d2Sorensen, B. A., Kelly, G., Sajecki, A., & Sorensen, P. W. (s. f.). An analyzer for detecting intermittent faults in electronic devices. Proceedings of AUTOTESTCON ’94. doi:10.1109/autest.1994.381590Gracia-Moran, J., Gil-Tomas, D., Saiz-Adalid, L. J., Baraza, J. C., & Gil-Vicente, P. J. (2010). Experimental validation of a fault tolerant microcomputer system against intermittent faults. 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN). doi:10.1109/dsn.2010.5544288Fujiwara, E. (2005). Code Design for Dependable Systems. doi:10.1002/0471792748Hamming, R. W. (1950). Error Detecting and Error Correcting Codes. Bell System Technical Journal, 29(2), 147-160. doi:10.1002/j.1538-7305.1950.tb00463.xSaiz-Adalid, L.-J., Gil-Vicente, P.-J., Ruiz-García, J.-C., Gil-Tomás, D., Baraza, J.-C., & Gracia-Morán, J. (2013). Flexible Unequal Error Control Codes with Selectable Error Detection and Correction Levels. Computer Safety, Reliability, and Security, 178-189. doi:10.1007/978-3-642-40793-2_17Frei, R., McWilliam, R., Derrick, B., Purvis, A., Tiwari, A., & Di Marzo Serugendo, G. (2013). Self-healing and self-repairing technologies. The International Journal of Advanced Manufacturing Technology, 69(5-8), 1033-1061. doi:10.1007/s00170-013-5070-2Maiz, J., Hareland, S., Zhang, K., & Armstrong, P. (s. f.). Characterization of multi-bit soft error events in advanced SRAMs. IEEE International Electron Devices Meeting 2003. doi:10.1109/iedm.2003.1269335Schroeder, B., Pinheiro, E., & Weber, W.-D. (2011). DRAM errors in the wild. Communications of the ACM, 54(2), 100-107. doi:10.1145/1897816.1897844BanaiyanMofrad, A., Ebrahimi, M., Oboril, F., Tahoori, M. B., & Dutt, N. (2015). Protecting caches against multi-bit errors using embedded erasure coding. 2015 20th IEEE European Test Symposium (ETS). doi:10.1109/ets.2015.7138735Kim, J., Sullivan, M., Lym, S., & Erez, M. (2016). All-Inclusive ECC: Thorough End-to-End Protection for Reliable Computer Memory. 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). doi:10.1109/isca.2016.60Hwang, A. A., Stefanovici, I. A., & Schroeder, B. (2012). Cosmic rays don’t strike twice. ACM SIGPLAN Notices, 47(4), 111-122. doi:10.1145/2248487.2150989Gil-Tomás, D., Gracia-Morán, J., Baraza-Calvo, J.-C., Saiz-Adalid, L.-J., & Gil-Vicente, P.-J. (2012). Studying the effects of intermittent faults on a microcontroller. Microelectronics Reliability, 52(11), 2837-2846. doi:10.1016/j.microrel.2012.06.004Plasma CPU Modelhttps://opencores.org/projects/plasmaArlat, J., Aguera, M., Amat, L., Crouzet, Y., Fabre, J.-C., Laprie, J.-C., … Powell, D. (1990). Fault injection for dependability validation: a methodology and some applications. IEEE Transactions on Software Engineering, 16(2), 166-182. doi:10.1109/32.44380Gil-Tomas, D., Gracia-Moran, J., Baraza-Calvo, J.-C., Saiz-Adalid, L.-J., & Gil-Vicente, P.-J. (2012). Analyzing the Impact of Intermittent Faults on Microprocessors Applying Fault Injection. IEEE Design & Test of Computers, 29(6), 66-73. doi:10.1109/mdt.2011.2179514Rashid, L., Pattabiraman, K., & Gopalakrishnan, S. (2010). Modeling the Propagation of Intermittent Hardware Faults in Programs. 2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing. doi:10.1109/prdc.2010.52Amiri, M., Siddiqui, F. M., Kelly, C., Woods, R., Rafferty, K., & Bardak, B. (2016). FPGA-Based Soft-Core Processors for Image Processing Applications. Journal of Signal Processing Systems, 87(1), 139-156. doi:10.1007/s11265-016-1185-7Hailesellasie, M., Hasan, S. R., & Mohamed, O. A. (2019). MulMapper: Towards an Automated FPGA-Based CNN Processor Generator Based on a Dynamic Design Space Exploration. 2019 IEEE International Symposium on Circuits and Systems (ISCAS). doi:10.1109/iscas.2019.8702589Mittal, S. (2018). A survey of FPGA-based accelerators for convolutional neural networks. Neural Computing and Applications, 32(4), 1109-1139. doi:10.1007/s00521-018-3761-1Intel Completes Acquisition of Alterahttps://newsroom.intel.com/news-releases/intel-completes-acquisition-of-altera/#gs.mi6ujuAMD to Acquire Xilinx, Creating the Industry’s High Performance Computing Leaderhttps://www.amd.com/en/press-releases/2020-10-27-amd-to-acquire-xilinx-creating-the-industry-s-high-performance-computingKim, K. H., & Lawrence, T. F. (s. f.). Adaptive fault tolerance: issues and approaches. [1990] Proceedings. Second IEEE Workshop on Future Trends of Distributed Computing Systems. doi:10.1109/ftdcs.1990.138292Gonzalez, O., Shrikumar, H., Stankovic, J. A., & Ramamritham, K. (s. f.). Adaptive fault tolerance and graceful degradation under dynamic hard real-time scheduling. Proceedings Real-Time Systems Symposium. doi:10.1109/real.1997.641271Jacobs, A., George, A. D., & Cieslewski, G. (2009). Reconfigurable fault tolerance: A framework for environmentally adaptive fault mitigation in space. 2009 International Conference on Field Programmable Logic and Applications. doi:10.1109/fpl.2009.5272313Shin, D., Park, J., Park, J., Paul, S., & Bhunia, S. (2017). Adaptive ECC for Tailored Protection of Nanoscale Memory. IEEE Design & Test, 34(6), 84-93. doi:10.1109/mdat.2016.2615844Silva, F., Muniz, A., Silveira, J., & Marcon, C. (2020). CLC-A: An Adaptive Implementation of the Column Line Code (CLC) ECC. 2020 33rd Symposium on Integrated Circuits and Systems Design (SBCCI). doi:10.1109/sbcci50935.2020.9189901Mukherjee, S. S., Emer, J., Fossum, T., & Reinhardt, S. K. (s. f.). Cache scrubbing in microprocessors: myth or necessity? 10th IEEE Pacific Rim International Symposium on Dependable Computing, 2004. Proceedings. doi:10.1109/prdc.2004.1276550Saleh, A. M., Serrano, J. J., & Patel, J. H. (1990). Reliability of scrubbing recovery-techniques for memory systems. IEEE Transactions on Reliability, 39(1), 114-122. doi:10.1109/24.52622X9SRA User’s Manual (Rev. 1.1)https://www.manualshelf.com/manual/supermicro/x9sra/user-s-manual-1-1.htmlChishti, Z., Alameldeen, A. R., Wilkerson, C., Wu, W., & Lu, S.-L. (2009). Improving cache lifetime reliability at ultra-low voltages. Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture - Micro-42. doi:10.1145/1669112.1669126Datta, R., & Touba, N. A. (2011). Designing a fast and adaptive error correction scheme for increasing the lifetime of phase change memories. 29th VLSI Test Symposium. doi:10.1109/vts.2011.5783773Kim, J., Lim, J., Cho, W., Shin, K.-S., Kim, H., & Lee, H.-J. (2016). Adaptive Memory Controller for High-performance Multi-channel Memory. JSTS:Journal of Semiconductor Technology and Science, 16(6), 808-816. doi:10.5573/jsts.2016.16.6.808Yuan, L., Liu, H., Jia, P., & Yang, Y. (2015). Reliability-Based ECC System for Adaptive Protection of NAND Flash Memories. 2015 Fifth International Conference on Communication Systems and Network Technologies. doi:10.1109/csnt.2015.23Zhou, Y., Wu, F., Lu, Z., He, X., Huang, P., & Xie, C. (2019). SCORE. ACM Transactions on Architecture and Code Optimization, 15(4), 1-25. doi:10.1145/3291052Lu, S.-K., Li, H.-P., & Miyase, K. (2018). Adaptive ECC Techniques for Reliability and Yield Enhancement of Phase Change Memory. 2018 IEEE 24th International Symposium on On-Line Testing And Robust System Design (IOLTS). doi:10.1109/iolts.2018.8474118Chen, J., Andjelkovic, M., Simevski, A., Li, Y., Skoncej, P., & Krstic, M. (2019). Design of SRAM-Based Low-Cost SEU Monitor for Self-Adaptive Multiprocessing Systems. 2019 22nd Euromicro Conference on Digital System Design (DSD). doi:10.1109/dsd.2019.00080Wang, X., Jiang, L., & Chakrabarty, K. (2020). LSTM-based Analysis of Temporally- and Spatially-Correlated Signatures for Intermittent Fault Detection. 2020 IEEE 38th VLSI Test Symposium (VTS). doi:10.1109/vts48691.2020.9107600Ebrahimi, H., & G. Kerkhoff, H. (2018). Intermittent Resistance Fault Detection at Board Level. 2018 IEEE 21st International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS). doi:10.1109/ddecs.2018.00031Ebrahimi, H., & Kerkhoff, H. G. (2020). A New Monitor Insertion Algorithm for Intermittent Fault Detection. 2020 IEEE European Test Symposium (ETS). doi:10.1109/ets48528.2020.9131563Hsiao, M. Y. (1970). A Class of Optimal Minimum Odd-weight-column SEC-DED Codes. IBM Journal of Research and Development, 14(4), 395-401. doi:10.1147/rd.144.0395Benso, A., & Prinetto, P. (Eds.). (2004). Fault Injection Techniques and Tools for Embedded Systems Reliability Evaluation. Frontiers in Electronic Testing. doi:10.1007/b105828Gracia, J., Saiz, L. J., Baraza, J. C., Gil, D., & Gil, P. J. (2008). Analysis of the influence of intermittent faults in a microcontroller. 2008 11th IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems. doi:10.1109/ddecs.2008.4538761ZC702 Evaluation Board for the Zynq-7000 XC7Z020 SoChttps://www.xilinx.com/support/documentation/boards_and_kits/zc702_zvik/ug850-zc702-eval-bd.pd

    Toward Biologically-Inspired Self-Healing, Resilient Architectures for Digital Instrumentation and Control Systems and Embedded Devices

    Get PDF
    Digital Instrumentation and Control (I&C) systems in safety-related applications of next generation industrial automation systems require high levels of resilience against different fault classes. One of the more essential concepts for achieving this goal is the notion of resilient and survivable digital I&C systems. In recent years, self-healing concepts based on biological physiology have received attention for the design of robust digital systems. However, many of these approaches have not been architected from the outset with safety in mind, nor have they been targeted for the automation community where a significant need exists. This dissertation presents a new self-healing digital I&C architecture called BioSymPLe, inspired from the way nature responds, defends and heals: the stem cells in the immune system of living organisms, the life cycle of the living cell, and the pathway from Deoxyribonucleic acid (DNA) to protein. The BioSymPLe architecture is integrating biological concepts, fault tolerance techniques, and operational schematics for the international standard IEC 61131-3 to facilitate adoption in the automation industry. BioSymPLe is organized into three hierarchical levels: the local function migration layer from the top side, the critical service layer in the middle, and the global function migration layer from the bottom side. The local layer is used to monitor the correct execution of functions at the cellular level and to activate healing mechanisms at the critical service level. The critical layer is allocating a group of functional B cells which represent the building block that executes the intended functionality of critical application based on the expression for DNA genetic codes stored inside each cell. The global layer uses a concept of embryonic stem cells by differentiating these type of cells to repair the faulty T cells and supervising all repair mechanisms. Finally, two industrial applications have been mapped on the proposed architecture, which are capable of tolerating a significant number of faults (transient, permanent, and hardware common cause failures CCFs) that can stem from environmental disturbances and we believe the nexus of its concepts can positively impact the next generation of critical systems in the automation industry
    corecore