21 research outputs found

    Proposal of an Adaptive Fault Tolerance Mechanism to Tolerate Intermittent Faults in RAM

    Full text link
    [EN] Due to transistor shrinking, intermittent faults are a major concern in current digital systems. This work presents an adaptive fault tolerance mechanism based on error correction codes (ECC), able to modify its behavior when the error conditions change without increasing the redundancy. As a case example, we have designed a mechanism that can detect intermittent faults and swap from an initial generic ECC to a specific ECC capable of tolerating one intermittent fault. We have inserted the mechanism in the memory system of a 32-bit RISC processor and validated it by using VHDL simulation-based fault injection. We have used two (39, 32) codes: a single error correction-double error detection (SEC-DED) and a code developed by our research group, called EPB3932, capable of correcting single errors and double and triple adjacent errors that include a bit previously tagged as error-prone. The results of injecting transient, intermittent, and combinations of intermittent and transient faults show that the proposed mechanism works properly. As an example, the percentage of failures and latent errors is 0% when injecting a triple adjacent fault after an intermittent stuck-at fault. We have synthesized the adaptive fault tolerance mechanism proposed in two types of FPGAs: non-reconfigurable and partially reconfigurable. In both cases, the overhead introduced is affordable in terms of hardware, time and power consumption.This research was supported in part by the Spanish Government, project TIN2016-81,075-R, and by Primeros Proyectos de Investigacion (PAID-06-18), Vicerrectorado de Investigacion, Innovacion y Transferencia de la Universitat Politecnica de Valencia (UPV), project 20190032.Baraza Calvo, JC.; Gracia-Morán, J.; Saiz-Adalid, L.; Gil Tomás, DA.; Gil, P. (2020). Proposal of an Adaptive Fault Tolerance Mechanism to Tolerate Intermittent Faults in RAM. Electronics. 9(12):1-30. https://doi.org/10.3390/electronics9122074S130912International Technology Roadmap for Semiconductors (ITRS)http://www.itrs2.net/2013-itrs.htmlJeng, S.-L., Lu, J.-C., & Wang, K. (2007). A Review of Reliability Research on Nanotechnology. IEEE Transactions on Reliability, 56(3), 401-410. doi:10.1109/tr.2007.903188Ibe, E., Taniguchi, H., Yahagi, Y., Shimbo, K., & Toba, T. (2010). Impact of Scaling on Neutron-Induced Soft Error in SRAMs From a 250 nm to a 22 nm Design Rule. IEEE Transactions on Electron Devices, 57(7), 1527-1538. doi:10.1109/ted.2010.2047907Boussif, A., Ghazel, M., & Basilio, J. C. (2020). Intermittent fault diagnosability of discrete event systems: an overview of automaton-based approaches. Discrete Event Dynamic Systems, 31(1), 59-102. doi:10.1007/s10626-020-00324-yConstantinescu, C. (2003). Trends and challenges in VLSI circuit reliability. IEEE Micro, 23(4), 14-19. doi:10.1109/mm.2003.1225959Bondavalli, A., Chiaradonna, S., Di Giandomenico, F., & Grandoni, F. (2000). Threshold-based mechanisms to discriminate transient from intermittent faults. IEEE Transactions on Computers, 49(3), 230-245. doi:10.1109/12.841127Contant, O., Lafortune, S., & Teneketzis, D. (2004). Diagnosis of Intermittent Faults. Discrete Event Dynamic Systems, 14(2), 171-202. doi:10.1023/b:disc.0000018570.20941.d2Sorensen, B. A., Kelly, G., Sajecki, A., & Sorensen, P. W. (s. f.). An analyzer for detecting intermittent faults in electronic devices. Proceedings of AUTOTESTCON ’94. doi:10.1109/autest.1994.381590Gracia-Moran, J., Gil-Tomas, D., Saiz-Adalid, L. J., Baraza, J. C., & Gil-Vicente, P. J. (2010). Experimental validation of a fault tolerant microcomputer system against intermittent faults. 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN). doi:10.1109/dsn.2010.5544288Fujiwara, E. (2005). Code Design for Dependable Systems. doi:10.1002/0471792748Hamming, R. W. (1950). Error Detecting and Error Correcting Codes. Bell System Technical Journal, 29(2), 147-160. doi:10.1002/j.1538-7305.1950.tb00463.xSaiz-Adalid, L.-J., Gil-Vicente, P.-J., Ruiz-García, J.-C., Gil-Tomás, D., Baraza, J.-C., & Gracia-Morán, J. (2013). Flexible Unequal Error Control Codes with Selectable Error Detection and Correction Levels. Computer Safety, Reliability, and Security, 178-189. doi:10.1007/978-3-642-40793-2_17Frei, R., McWilliam, R., Derrick, B., Purvis, A., Tiwari, A., & Di Marzo Serugendo, G. (2013). Self-healing and self-repairing technologies. The International Journal of Advanced Manufacturing Technology, 69(5-8), 1033-1061. doi:10.1007/s00170-013-5070-2Maiz, J., Hareland, S., Zhang, K., & Armstrong, P. (s. f.). Characterization of multi-bit soft error events in advanced SRAMs. IEEE International Electron Devices Meeting 2003. doi:10.1109/iedm.2003.1269335Schroeder, B., Pinheiro, E., & Weber, W.-D. (2011). DRAM errors in the wild. Communications of the ACM, 54(2), 100-107. doi:10.1145/1897816.1897844BanaiyanMofrad, A., Ebrahimi, M., Oboril, F., Tahoori, M. B., & Dutt, N. (2015). Protecting caches against multi-bit errors using embedded erasure coding. 2015 20th IEEE European Test Symposium (ETS). doi:10.1109/ets.2015.7138735Kim, J., Sullivan, M., Lym, S., & Erez, M. (2016). All-Inclusive ECC: Thorough End-to-End Protection for Reliable Computer Memory. 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). doi:10.1109/isca.2016.60Hwang, A. A., Stefanovici, I. A., & Schroeder, B. (2012). Cosmic rays don’t strike twice. ACM SIGPLAN Notices, 47(4), 111-122. doi:10.1145/2248487.2150989Gil-Tomás, D., Gracia-Morán, J., Baraza-Calvo, J.-C., Saiz-Adalid, L.-J., & Gil-Vicente, P.-J. (2012). Studying the effects of intermittent faults on a microcontroller. Microelectronics Reliability, 52(11), 2837-2846. doi:10.1016/j.microrel.2012.06.004Plasma CPU Modelhttps://opencores.org/projects/plasmaArlat, J., Aguera, M., Amat, L., Crouzet, Y., Fabre, J.-C., Laprie, J.-C., … Powell, D. (1990). Fault injection for dependability validation: a methodology and some applications. IEEE Transactions on Software Engineering, 16(2), 166-182. doi:10.1109/32.44380Gil-Tomas, D., Gracia-Moran, J., Baraza-Calvo, J.-C., Saiz-Adalid, L.-J., & Gil-Vicente, P.-J. (2012). Analyzing the Impact of Intermittent Faults on Microprocessors Applying Fault Injection. IEEE Design & Test of Computers, 29(6), 66-73. doi:10.1109/mdt.2011.2179514Rashid, L., Pattabiraman, K., & Gopalakrishnan, S. (2010). Modeling the Propagation of Intermittent Hardware Faults in Programs. 2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing. doi:10.1109/prdc.2010.52Amiri, M., Siddiqui, F. M., Kelly, C., Woods, R., Rafferty, K., & Bardak, B. (2016). FPGA-Based Soft-Core Processors for Image Processing Applications. Journal of Signal Processing Systems, 87(1), 139-156. doi:10.1007/s11265-016-1185-7Hailesellasie, M., Hasan, S. R., & Mohamed, O. A. (2019). MulMapper: Towards an Automated FPGA-Based CNN Processor Generator Based on a Dynamic Design Space Exploration. 2019 IEEE International Symposium on Circuits and Systems (ISCAS). doi:10.1109/iscas.2019.8702589Mittal, S. (2018). A survey of FPGA-based accelerators for convolutional neural networks. Neural Computing and Applications, 32(4), 1109-1139. doi:10.1007/s00521-018-3761-1Intel Completes Acquisition of Alterahttps://newsroom.intel.com/news-releases/intel-completes-acquisition-of-altera/#gs.mi6ujuAMD to Acquire Xilinx, Creating the Industry’s High Performance Computing Leaderhttps://www.amd.com/en/press-releases/2020-10-27-amd-to-acquire-xilinx-creating-the-industry-s-high-performance-computingKim, K. H., & Lawrence, T. F. (s. f.). Adaptive fault tolerance: issues and approaches. [1990] Proceedings. Second IEEE Workshop on Future Trends of Distributed Computing Systems. doi:10.1109/ftdcs.1990.138292Gonzalez, O., Shrikumar, H., Stankovic, J. A., & Ramamritham, K. (s. f.). Adaptive fault tolerance and graceful degradation under dynamic hard real-time scheduling. Proceedings Real-Time Systems Symposium. doi:10.1109/real.1997.641271Jacobs, A., George, A. D., & Cieslewski, G. (2009). Reconfigurable fault tolerance: A framework for environmentally adaptive fault mitigation in space. 2009 International Conference on Field Programmable Logic and Applications. doi:10.1109/fpl.2009.5272313Shin, D., Park, J., Park, J., Paul, S., & Bhunia, S. (2017). Adaptive ECC for Tailored Protection of Nanoscale Memory. IEEE Design & Test, 34(6), 84-93. doi:10.1109/mdat.2016.2615844Silva, F., Muniz, A., Silveira, J., & Marcon, C. (2020). CLC-A: An Adaptive Implementation of the Column Line Code (CLC) ECC. 2020 33rd Symposium on Integrated Circuits and Systems Design (SBCCI). doi:10.1109/sbcci50935.2020.9189901Mukherjee, S. S., Emer, J., Fossum, T., & Reinhardt, S. K. (s. f.). Cache scrubbing in microprocessors: myth or necessity? 10th IEEE Pacific Rim International Symposium on Dependable Computing, 2004. Proceedings. doi:10.1109/prdc.2004.1276550Saleh, A. M., Serrano, J. J., & Patel, J. H. (1990). Reliability of scrubbing recovery-techniques for memory systems. IEEE Transactions on Reliability, 39(1), 114-122. doi:10.1109/24.52622X9SRA User’s Manual (Rev. 1.1)https://www.manualshelf.com/manual/supermicro/x9sra/user-s-manual-1-1.htmlChishti, Z., Alameldeen, A. R., Wilkerson, C., Wu, W., & Lu, S.-L. (2009). Improving cache lifetime reliability at ultra-low voltages. Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture - Micro-42. doi:10.1145/1669112.1669126Datta, R., & Touba, N. A. (2011). Designing a fast and adaptive error correction scheme for increasing the lifetime of phase change memories. 29th VLSI Test Symposium. doi:10.1109/vts.2011.5783773Kim, J., Lim, J., Cho, W., Shin, K.-S., Kim, H., & Lee, H.-J. (2016). Adaptive Memory Controller for High-performance Multi-channel Memory. JSTS:Journal of Semiconductor Technology and Science, 16(6), 808-816. doi:10.5573/jsts.2016.16.6.808Yuan, L., Liu, H., Jia, P., & Yang, Y. (2015). Reliability-Based ECC System for Adaptive Protection of NAND Flash Memories. 2015 Fifth International Conference on Communication Systems and Network Technologies. doi:10.1109/csnt.2015.23Zhou, Y., Wu, F., Lu, Z., He, X., Huang, P., & Xie, C. (2019). SCORE. ACM Transactions on Architecture and Code Optimization, 15(4), 1-25. doi:10.1145/3291052Lu, S.-K., Li, H.-P., & Miyase, K. (2018). Adaptive ECC Techniques for Reliability and Yield Enhancement of Phase Change Memory. 2018 IEEE 24th International Symposium on On-Line Testing And Robust System Design (IOLTS). doi:10.1109/iolts.2018.8474118Chen, J., Andjelkovic, M., Simevski, A., Li, Y., Skoncej, P., & Krstic, M. (2019). Design of SRAM-Based Low-Cost SEU Monitor for Self-Adaptive Multiprocessing Systems. 2019 22nd Euromicro Conference on Digital System Design (DSD). doi:10.1109/dsd.2019.00080Wang, X., Jiang, L., & Chakrabarty, K. (2020). LSTM-based Analysis of Temporally- and Spatially-Correlated Signatures for Intermittent Fault Detection. 2020 IEEE 38th VLSI Test Symposium (VTS). doi:10.1109/vts48691.2020.9107600Ebrahimi, H., & G. Kerkhoff, H. (2018). Intermittent Resistance Fault Detection at Board Level. 2018 IEEE 21st International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS). doi:10.1109/ddecs.2018.00031Ebrahimi, H., & Kerkhoff, H. G. (2020). A New Monitor Insertion Algorithm for Intermittent Fault Detection. 2020 IEEE European Test Symposium (ETS). doi:10.1109/ets48528.2020.9131563Hsiao, M. Y. (1970). A Class of Optimal Minimum Odd-weight-column SEC-DED Codes. IBM Journal of Research and Development, 14(4), 395-401. doi:10.1147/rd.144.0395Benso, A., & Prinetto, P. (Eds.). (2004). Fault Injection Techniques and Tools for Embedded Systems Reliability Evaluation. Frontiers in Electronic Testing. doi:10.1007/b105828Gracia, J., Saiz, L. J., Baraza, J. C., Gil, D., & Gil, P. J. (2008). Analysis of the influence of intermittent faults in a microcontroller. 2008 11th IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems. doi:10.1109/ddecs.2008.4538761ZC702 Evaluation Board for the Zynq-7000 XC7Z020 SoChttps://www.xilinx.com/support/documentation/boards_and_kits/zc702_zvik/ug850-zc702-eval-bd.pd

    Explanation of the Model Checker Verification Results

    Get PDF
    Immer wenn neue Anforderungen an ein System gestellt werden, müssen die Korrektheit und Konsistenz der Systemspezifikation überprüft werden, was in der Praxis in der Regel manuell erfolgt. Eine mögliche Option, um die Nachteile dieser manuellen Analyse zu überwinden, ist das sogenannte Contract-Based Design. Dieser Entwurfsansatz kann den Verifikationsprozess zur Überprüfung, ob die Anforderungen auf oberster Ebene konsistent verfeinert wurden, automatisieren. Die Verifikation kann somit iterativ durchgeführt werden, um die Korrektheit und Konsistenz des Systems angesichts jeglicher Änderung der Spezifikationen sicherzustellen. Allerdings ist es aufgrund der mangelnden Benutzerfreundlichkeit und der Schwierigkeiten bei der Interpretation von Verifizierungsergebnissen immer noch eine Herausforderung, formale Ansätze in der Industrie einzusetzen. Stellt beispielsweise der Model Checker bei der Verifikation eine Inkonsistenz fest, generiert er ein Gegenbeispiel (Counterexample) und weist gleichzeitig darauf hin, dass die gegebenen Eingabespezifikationen inkonsistent sind. Hier besteht die gewaltige Herausforderung darin, das generierte Gegenbeispiel zu verstehen, das oft sehr lang, kryptisch und komplex ist. Darüber hinaus liegt es in der Verantwortung der Ingenieurin bzw. des Ingenieurs, die inkonsistente Spezifikation in einer potenziell großen Menge von Spezifikationen zu identifizieren. Diese Arbeit schlägt einen Ansatz zur Erklärung von Gegenbeispielen (Counterexample Explanation Approach) vor, der die Verwendung von formalen Methoden vereinfacht und fördert, indem benutzerfreundliche Erklärungen der Verifikationsergebnisse der Ingenieurin bzw. dem Ingenieur präsentiert werden. Der Ansatz zur Erklärung von Gegenbeispielen wird mittels zweier Methoden evaluiert: (1) Evaluation anhand verschiedener Anwendungsbeispiele und (2) eine Benutzerstudie in Form eines One-Group Pretest-Posttest Experiments.Whenever new requirements are introduced for a system, the correctness and consistency of the system specification must be verified, which is often done manually in industrial settings. One viable option to traverse disadvantages of this manual analysis is to employ the contract-based design, which can automate the verification process to determine whether the refinements of top-level requirements are consistent. Thus, verification can be performed iteratively to ensure the system’s correctness and consistency in the face of any change in specifications. Having said that, it is still challenging to deploy formal approaches in industries due to their lack of usability and their difficulties in interpreting verification results. For instance, if the model checker identifies inconsistency during the verification, it generates a counterexample while also indicating that the given input specifications are inconsistent. Here, the formidable challenge is to comprehend the generated counterexample, which is often lengthy, cryptic, and complex. Furthermore, it is the engineer’s responsibility to identify the inconsistent specification among a potentially huge set of specifications. This PhD thesis proposes a counterexample explanation approach for formal methods that simplifies and encourages their use by presenting user-friendly explanations of the verification results. The proposed counterexample explanation approach identifies and explains relevant information from the verification result in what seems like a natural language statement. The counterexample explanation approach extracts relevant information by identifying inconsistent specifications from among the set of specifications, as well as erroneous states and variables from the counterexample. The counterexample explanation approach is evaluated using two methods: (1) evaluation with different application examples, and (2) a user-study known as one-group pretest and posttest experiment

    A User Study for Evaluation of Formal Verification Results and their Explanation at Bosch

    Full text link
    Context: Ensuring safety for any sophisticated system is getting more complex due to the rising number of features and functionalities. This calls for formal methods to entrust confidence in such systems. Nevertheless, using formal methods in industry is demanding because of their lack of usability and the difficulty of understanding verification results. Objective: We evaluate the acceptance of formal methods by Bosch automotive engineers, particularly whether the difficulty of understanding verification results can be reduced. Method: We perform two different exploratory studies. First, we conduct a user survey to explore challenges in identifying inconsistent specifications and using formal methods by Bosch automotive engineers. Second, we perform a one-group pretest-posttest experiment to collect impressions from Bosch engineers familiar with formal methods to evaluate whether understanding verification results is simplified by our counterexample explanation approach. Results: The results from the user survey indicate that identifying refinement inconsistencies, understanding formal notations, and interpreting verification results are challenging. Nevertheless, engineers are still interested in using formal methods in real-world development processes because it could reduce the manual effort for verification. Additionally, they also believe formal methods could make the system safer. Furthermore, the one-group pretest-posttest experiment results indicate that engineers are more comfortable understanding the counterexample explanation than the raw model checker output. Limitations: The main limitation of this study is the generalizability beyond the target group of Bosch automotive engineers.Comment: This manuscript is under review with the Empirical Software Engineering journa

    Agent programming in the cognitive era

    Get PDF
    It is claimed that, in the nascent ‘Cognitive Era’, intelligent systems will be trained using machine learning techniques rather than programmed by software developers. A contrary point of view argues that machine learning has limitations, and, taken in isolation, cannot form the basis of autonomous systems capable of intelligent behaviour in complex environments. In this paper, we explore the contributions that agent-oriented programming can make to the development of future intelligent systems. We briefly review the state of the art in agent programming, focussing particularly on BDI-based agent programming languages, and discuss previous work on integrating AI techniques (including machine learning) in agent-oriented programming. We argue that the unique strengths of BDI agent languages provide an ideal framework for integrating the wide range of AI capabilities necessary for progress towards the next-generation of intelligent systems. We identify a range of possible approaches to integrating AI into a BDI agent architecture. Some of these approaches, e.g., ‘AI as a service’, exploit immediate synergies between rapidly maturing AI techniques and agent programming, while others, e.g., ‘AI embedded into agents’ raise more fundamental research questions, and we sketch a programme of research directed towards identifying the most appropriate ways of integrating AI capabilities into agent programs

    Resilience-Building Technologies: State of Knowledge -- ReSIST NoE Deliverable D12

    Get PDF
    This document is the first product of work package WP2, "Resilience-building and -scaling technologies", in the programme of jointly executed research (JER) of the ReSIST Network of Excellenc

    Certifications of Critical Systems – The CECRIS Experience

    Get PDF
    In recent years, a considerable amount of effort has been devoted, both in industry and academia, to the development, validation and verification of critical systems, i.e. those systems whose malfunctions or failures reach a critical level both in terms of risks to human life as well as having a large economic impact.Certifications of Critical Systems – The CECRIS Experience documents the main insights on Cost Effective Verification and Validation processes that were gained during work in the European Research Project CECRIS (acronym for Certification of Critical Systems). The objective of the research was to tackle the challenges of certification by focusing on those aspects that turn out to be more difficult/important for current and future critical systems industry: the effective use of methodologies, processes and tools.The CECRIS project took a step forward in the growing field of development, verification and validation and certification of critical systems. It focused on the more difficult/important aspects of critical system development, verification and validation and certification process. Starting from both the scientific and industrial state of the art methodologies for system development and the impact of their usage on the verification and validation and certification of critical systems, the project aimed at developing strategies and techniques supported by automatic or semi-automatic tools and methods for these activities, setting guidelines to support engineers during the planning of the verification and validation phases

    Certifications of Critical Systems – The CECRIS Experience

    Get PDF
    In recent years, a considerable amount of effort has been devoted, both in industry and academia, to the development, validation and verification of critical systems, i.e. those systems whose malfunctions or failures reach a critical level both in terms of risks to human life as well as having a large economic impact.Certifications of Critical Systems – The CECRIS Experience documents the main insights on Cost Effective Verification and Validation processes that were gained during work in the European Research Project CECRIS (acronym for Certification of Critical Systems). The objective of the research was to tackle the challenges of certification by focusing on those aspects that turn out to be more difficult/important for current and future critical systems industry: the effective use of methodologies, processes and tools.The CECRIS project took a step forward in the growing field of development, verification and validation and certification of critical systems. It focused on the more difficult/important aspects of critical system development, verification and validation and certification process. Starting from both the scientific and industrial state of the art methodologies for system development and the impact of their usage on the verification and validation and certification of critical systems, the project aimed at developing strategies and techniques supported by automatic or semi-automatic tools and methods for these activities, setting guidelines to support engineers during the planning of the verification and validation phases

    Tехнічні засоби діагностування та контролю бортових систем інформаційного обміну на літаку

    Get PDF
    Робота публікується згідно наказу ректора від 27.05.2021 р. №311/од "Про розміщення кваліфікаційних робіт вищої освіти в репозиторії НАУ". Керівник дипломної роботи: доцент кафедри авіоніки, Слободян Олександр ПетровичТехнічний прогрес в авіаційній та будь-якій іншій галузі тісно пов'язаний з автоматизацією технологічних процесів. Сьогодні Автоматизація технологічних процесів використовується для підвищення характеристик надійності, довговічності, екологічності, ресурсозбереження і, найголовніше, економічності і простоти експлуатації. Завдяки швидкому розвитку комп'ютерних технологій і мікропроцесорів у нас є можливість використовувати більш досконалі і складні методи моніторингу та управління системами авіаційної промисловості і будь-якими іншими. Мікропроцесорні та електронні обчислювальні пристрої, з'єднані обчислювальними і керуючими мережами з використанням загальних баз даних, мають стандарти, що дозволяють модифікувати і інтегрувати нові пристрої, що, в свою чергу, дозволяє інтегрувати і вдосконалювати виробничі процеси і управляти ними. Проектування системи розподіленої інтегрованої модульної авіоніки (DIMA) з використанням розподіленої інтегрованої технології, змішаного планування критичних завдань, резервний планування в режимі реального часу і механізму зв'язку, який запускається за часом, значно підвищує надійність, безпеку і продуктивність інтегрованої електронної системи в режимі реального часу. DIMA являє собою тенденцію розвитку майбутніх систем авіоніки. У цій статті вивчаються і обговорюються архітектурні характеристики DIMA. Потім він детально вивчає та аналізує розвиток ключових технологій в системі DIMA. Нарешті, в ньому розглядається тенденція розвитку технології DIMA
    corecore