15 research outputs found

    Survey of Soft Error Mitigation Techniques Applied to LEON3 Soft Processors on SRAM-Based FPGAs

    Soft-core processors implemented in SRAM-based FPGAs are an attractive option for applications deployed in radiation environments because of their flexibility, relatively low application development costs, and reconfigurability, which lets them adapt to evolving mission needs. Despite these advantages, soft-core processors are seldom used in critical applications because they are more sensitive to radiation than their hard-core counterparts. For instance, the logic and signal routing circuitry of a soft-core processor, as well as its user memory, are susceptible to radiation-induced faults. Soft-core processors must therefore be appropriately hardened against ionizing radiation to become a feasible design choice for harsh environments and to reap all their benefits. This survey discusses various techniques, as reported in the state-of-the-art literature, to protect the configuration and user memories of a LEON3 soft processor, one of the most widely used soft-core processors in radiation environments, with the objective of facilitating the choice of the right fault-mitigation solution for any given soft-core processor.

    SATTA: a Self-Adaptive Temperature-based TDF awareness methodology for dynamically reconfigurable FPGAs

    Dependability issues due to non-functional properties are emerging as a major cause of faults in modern digital systems. Effective countermeasures are needed to properly manage their critical timing effects. This paper presents a methodology to avoid transition delay faults (TDFs) in FPGA-based systems with low area overhead. The approach exploits temperature information and aging characteristics to minimize the cost in terms of performance degradation and power consumption. The architecture of a hardware manager able to avoid delay faults is presented and analyzed in depth, together with its integration into the standard implementation design flow.

    Técnicas de inyección de fallos basadas en FPGAs para la evaluación de la tolerancia a fallos de tipo SEU en circuitos digitales [FPGA-based fault injection techniques for evaluating SEU fault tolerance in digital circuits]

    This PhD thesis presents new transient fault injection techniques for evaluating the behaviour of complex modern digital circuits in the presence of transient faults in memory elements, i.e., SEU (Single Event Upset) faults. The proposed techniques address SEU tolerance evaluation for the different components of current digital systems, which tend to integrate several types of circuits on a single chip (SoC, System on Chip). The fault injection environment of the proposed solutions is based on emulation with FPGAs, and the injection tasks are performed from the emulation hardware platform. Implementing the injection system in hardware minimises the communication required between the hardware and a host computer, which is the main bottleneck in the speed of the injection process.
First, a transient fault emulation technique for FPGA devices is presented, aimed at evaluating a circuit whose description is available in a hardware description language such as VHDL. Second, a fault injection technique aimed at evaluating fault tolerance in microprocessors is proposed. This technique is applied to a final prototype or a commercial component and uses the debugging infrastructure integrated in the circuit (OCD, On-Chip Debugger) to access the microprocessor's internal resources (memory and registers). When the circuit description is available, the circuit is implemented in the FPGA together with the injection system, so communication with the host PC is avoided during the fault injection campaign. This fault injection technique has been called Autonomous Emulation. Implementing the complete injection system in a single device (the FPGA) provides better controllability and observability of the circuit under test than other solutions. Several optimisations of the injection process, based on this greater accessibility, are proposed to improve the efficiency and speed of the injection and observation tasks. Three implementations of the Autonomous Emulation system are proposed and developed: Time-Multiplexed, State-Scan and Mask-Scan. Each one provides a different trade-off between area overhead and injection speed-up. The Time-Multiplexed technique includes more optimisations than the other techniques. It therefore obtains the highest speed-up in the evaluation process, but it requires more area than the other implementations. The State-Scan and Mask-Scan techniques are simplified versions of the Time-Multiplexed implementation and use fewer hardware resources to perform the fault emulation. Furthermore, memory models have been developed in order to apply the Time-Multiplexed technique to digital circuits with embedded memories.
These models are based on controlling (to insert faults) and observing (to detect errors and their effects) the memory contents by means of the control signals, the data bus and the address bus, instead of accessing every memory word, which is a slow task, especially for large memories. Fault injection in embedded memories is a problem of great interest because such memories are increasingly common in current digital designs. Moreover, no efficient solution for fault emulation in memories had previously been proposed in the literature. This contribution of the thesis allows fast fault injection in embedded memories, solving the problem of their limited accessibility. Different implementations have also been proposed for the memory models, according to the trade-off between performance and hardware resource requirements; they are named the Basic model and the ECAM model. The Basic model requires fewer hardware resources, whilst the ECAM model provides a greater capacity for fault analysis. The experiments carried out in this thesis consist of fault injection campaigns on benchmark circuits as well as on real industrial circuits. The experimental results show that Autonomous Emulation speeds up the injection process with respect to other existing solutions, making it possible to inject millions of faults in a few seconds. With the Time-Multiplexed technique, the injection speed increases by around two orders of magnitude with respect to other emulation-based solutions, which are in turn about four orders of magnitude faster than simulation-based techniques. This notable improvement in injection speed allows the fault tolerance of large circuits, such as current ones, to be evaluated. In modern circuits the number of possible SEU faults is very high, and obtaining a significant measurement of fault tolerance requires injecting a large set of faults in a reasonable time.
The feasibility of the proposed memory models has also been verified experimentally by performing fault injection campaigns on an industrial microprocessor, injecting faults in flip-flops as well as in memory. On the other hand, the fault injection technique proposed in this thesis for evaluating microprocessors uses the OCD to insert the faults and to observe their effects in the circuit. Enhanced debugging capabilities and integrated infrastructures are available in current microprocessors because of the increasing use of SoCs and embedded systems, where, without an OCD, debugging would be infeasible or very costly. The OCD provides a mechanism to access the microprocessor's internal resources, so it can be used to inject faults and to observe the circuit behaviour. The proposed fault injection system controls the OCD by means of its JTAG interface, which is the most common interface for accessing modern microprocessors. As in the Autonomous Emulation system, all the injection tasks are performed in hardware, in an FPGA that is connected to the microprocessor under test through the JTAG interface. This solution is applicable to any microprocessor with an OCD and a JTAG interface, both of which are common features nowadays. Experiments on commercial microprocessors (ARM and PowerPC) show that this technique provides an efficient solution for injecting faults in microprocessor devices, is applicable to a wide range of processors, and offers a trade-off between injection speed and cost. In summary, a fast, accurate and low-cost solution for evaluating the SEU fault tolerance of modern digital circuits has been proposed. It allows fault injection in large circuits with embedded memories and microprocessors.
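The basic loop behind an SEU fault injection campaign, as summarised in the abstract above, can be sketched in software. This is only an illustration under stated assumptions, not the thesis's Autonomous Emulation hardware: the toy circuit (`step`), the register width, and the outcome labels `failure`, `latent` and `silent` are all hypothetical choices made here. Each fault is a single bit flip in one flip-flop at one clock cycle, and its effect is judged against a fault-free (golden) run.

```python
from itertools import product

WIDTH = 8                    # flip-flops in the toy state register (assumption)
MASK = (1 << WIDTH) - 1

def step(state, stimulus):
    """Toy circuit under test: accumulator with synchronous clear on stimulus 0."""
    return 0 if stimulus == 0 else (state + stimulus) & MASK

def run(stimuli, fault=None):
    """Run the circuit; `fault` is an optional (cycle, bit) SEU to inject.

    Returns the observed outputs (the top state bit each cycle) and the final state.
    """
    state, outputs = 0, []
    for cycle, stimulus in enumerate(stimuli):
        if fault is not None and fault[0] == cycle:
            state ^= 1 << fault[1]          # SEU: flip one flip-flop
        state = step(state, stimulus)
        outputs.append(state >> (WIDTH - 1))
    return outputs, state

def classify(stimuli, fault):
    """Compare a faulty run against the golden run."""
    golden_out, golden_state = run(stimuli)
    out, state = run(stimuli, fault)
    if out != golden_out:
        return "failure"                    # error reached the outputs
    if state != golden_state:
        return "latent"                     # error remains only in the state
    return "silent"                         # fault effect disappeared

def campaign(stimuli):
    """Inject every single-bit fault at every cycle and count the outcomes."""
    counts = {"failure": 0, "latent": 0, "silent": 0}
    for fault in product(range(len(stimuli)), range(WIDTH)):
        counts[classify(stimuli, fault)] += 1
    return counts
```

For example, `campaign([1, 0, 2])` evaluates 3 x 8 = 24 faults. The cost of such an exhaustive sweep grows with the number of flip-flops times the number of cycles, which is why the abstract stresses injection speed.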

    Real-time fault injection using enhanced on-chip debug infrastructures

    The rapid increase in the use of microprocessor-based systems in critical areas, where failures imply risks to human lives, to the environment or to expensive equipment, has significantly increased the need for dependable systems that are able to detect, tolerate and eventually correct faults. The verification and validation of such systems is frequently performed via fault injection, using various forms and techniques. However, as electronic devices get smaller and more complex, controllability and observability issues, and sometimes real-time constraints, make it harder to apply most conventional fault injection techniques. This paper proposes a fault injection environment and a scalable methodology to assist the execution of real-time fault injection campaigns, providing enhanced performance and capabilities. The proposed solutions are based on the use of common and customized on-chip debug (OCD) mechanisms, present in many modern electronic devices, with the main objective of enabling the insertion of faults in microprocessor memory elements with minimum delay and intrusiveness. Different configurations were implemented, from basic Components Off-The-Shelf (COTS) microprocessors equipped with real-time OCD infrastructures to improved solutions based on modified interfaces and dedicated OCD circuitry that enhance fault injection capabilities and performance. All methodologies and configurations were evaluated and compared with respect to performance gain and silicon overhead.

    A novel FPGA-based evolvable hardware system based on multiple processing arrays

    In this paper, an architecture based on a scalable and flexible set of evolvable processing arrays is presented. FPGA-native Dynamic Partial Reconfiguration (DPR) is used for evolution, which is done intrinsically, letting the system adapt autonomously to variable run-time conditions, including the presence of transient and permanent faults. The architecture supports different modes of operation, namely independent, parallel, cascaded and bypass mode. These modes of operation can be used during evolution or during normal operation. The evolvability of the architecture is combined with fault-tolerance techniques to give the platform self-healing features, making it suitable for applications that require both high adaptability and reliability. Experimental results show that such a system may benefit from accelerated evolution times, increased performance and improved dependability, mainly through increased tolerance of transient and permanent faults, as well as some fault identification capability. The evolvable hardware array shown is tailored for window-based image processing applications.

    Dynamic Partial Reconfiguration for Dependable Systems

    Moore’s law has served as a goal and motivation for consumer electronics manufacturers over the last decades. The increase in processing power of consumer electronics devices has been achieved mainly through cost reduction and technology shrinking. However, reducing physical geometries affects the dependability of electronic devices, making them more sensitive to soft errors such as Single Event Transients (SETs) or Single Event Upsets (SEUs) and to hard (permanent) faults, e.g. due to aging effects. Accordingly, safety-critical systems often rely on old technology nodes, even if they introduce longer design times with respect to consumer electronics. In fact, functional safety requirements are increasingly pushing industry to develop innovative methodologies for designing highly dependable systems with the required diagnostic coverage. On the other hand, the adoption of commercial off-the-shelf (COTS) devices has begun to be considered for safety-related systems because of real-time requirements, the need to implement computationally hungry algorithms, and lower design costs. In this field the FPGA market share has constantly increased, thanks to the flexibility and low non-recurring engineering costs of these devices, which make them suitable for a set of safety-critical applications with low production volumes. The work presented in this thesis addresses new dependability issues in modern reconfigurable systems, exploiting their special features, namely Dynamic Partial Reconfiguration, to take proper countermeasures with low impact on performance.

    FPGA ARCHITECTURE AND VERIFICATION OF BUILT IN SELF-TEST (BIST) FOR 32-BIT ADDER/SUBTRACTER USING DE0-NANO FPGA AND ANALOG DISCOVERY 2 HARDWARE

    The integrated circuit (IC) is an integral part of everyday modern technology, and it is very attractive to hardware and software design engineers because of its versatility, integration, power consumption, cost, and reduced board area. ICs are available in various types, such as Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), System on Chip (SoC) architectures, Digital Signal Processors (DSPs), microcontrollers (μC), and many more. With technology demand focused on faster, lower-power and more efficient ICs, design engineers face tremendous challenges in developing and testing integrated circuits that guarantee functionality, high fault coverage, and reliability, as transistor technology shrinks to the point where manufacturing defects affect yield and thereby increase the cost of the part. The competitive IC market is pressuring IC manufacturers to develop and market ICs with a relatively quick turnaround, which in turn requires design and verification engineers to develop an integrated self-test structure that ensures a fault-free, high-quality product is delivered to the market. Verification and testing take up 70-80% of IC design in order to ensure high quality and reliability for the end user. To test complex and sophisticated IC designs, verification engineers must produce laborious and costly test fixtures, which affect the cost of the part in a competitive market. To avoid passing yield and test-time costs on to the end user, and to keep up with the competitive market, many IC design engineers are moving away from complex external test fixtures and are focusing on integrating Built-in Self-Test (BIST) or Design for Test (DFT) techniques into ICs, which reduces time to market while still guaranteeing high coverage for the product.
The BIST approach, the architecture, and the application of the IC must all be understood before the IC is developed. This paper elaborates on the architecture of the FPGA, followed by several BIST techniques and their application to FPGAs, SoCs, and analog-to-digital (ADC) or digital-to-analog (DAC) converters integrated on an IC. The paper concludes with the verification of BIST for a 32-bit adder/subtracter designed in Quartus II software, using the Analog Discovery 2 module as the stimulus source and the DE0-NANO FPGA board for verification.
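As an illustration of the BIST idea for a 32-bit adder/subtracter, the following software model pairs a pseudo-random pattern generator with a signature compactor. It is a minimal sketch under assumptions, not the paper's Quartus II design: the LFSR tap mask `0x80200003`, the rotate-and-XOR compactor, the 32-pattern test length, and the injected stuck-at-0 fault are all choices made here for the example.

```python
MASK32 = 0xFFFFFFFF
TAPS = 0x80200003            # Galois LFSR tap mask (assumed polynomial choice)

def lfsr32(state):
    """Advance a 32-bit Galois LFSR by one step; the seed must be non-zero."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAPS
    return state

def rotl32(x):
    """Rotate a 32-bit word left by one bit."""
    return ((x << 1) | (x >> 31)) & MASK32

def adder_subtracter(a, b, sub):
    """Reference (fault-free) unit: a - b when sub is set, else a + b, mod 2**32."""
    return (a - b if sub else a + b) & MASK32

def stuck_at_zero_bit0(a, b, sub):
    """Hypothetical faulty unit: result bit 0 is stuck at 0."""
    return adder_subtracter(a, b, sub) & ~1 & MASK32

def signature(dut, n_patterns=32, seed=1):
    """Apply pseudo-random operands to `dut` and compact all results into one word."""
    state, sig = seed, 0
    for i in range(n_patterns):
        state = lfsr32(state)
        a = state
        state = lfsr32(state)
        b = state
        sub = i & 1                          # alternate add and subtract
        sig = rotl32(sig) ^ dut(a, b, sub)   # simple rotate-and-XOR compaction
    return sig

def bist_pass(dut):
    """The self-test passes when the signature matches the golden one."""
    return signature(dut) == signature(adder_subtracter)
```

Here `bist_pass(adder_subtracter)` holds by construction, while `bist_pass(stuck_at_zero_bit0)` is false for the default seed and pattern count. In hardware, the golden signature would normally be precomputed and stored rather than recomputed from a reference model.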

    Characterization of Interconnection Delays in FPGAS Due to Single Event Upsets and Mitigation

    The unrelenting demand for electronic components with ever-diminishing feature sizes has given rise to new challenges over the years. Among them, more advanced memory and microprocessor semiconductors, which exhibit a substantial susceptibility to cosmic radiation phenomena, are being used in avionic systems. One of the main consequences of cosmic rays, first observed primarily in orbiting satellites, is the single-event effect (SEE). Atmospheric radiation raises several concerns regarding the safety and reliability of avionics equipment, particularly for systems that involve field programmable gate arrays (FPGAs). SRAM-based FPGAs, an attractive solution for implementing complex systems in the aeronautic sector, are very susceptible to SEEs, in particular Single Event Upsets (SEUs). An SEU is the change of state of a bistable element (i.e., a bit-flip) due to the effect of an energetic ion, proton, or neutron. This effect is non-destructive and may be corrected by rewriting the affected part of the SRAM. The sensitivity of SRAM-based FPGAs to a physical effect such as potential delay changes (DCs) has not been addressed thus far in the literature. DCs induced by SEUs in the routing configuration memory can affect the functionality of logic circuits by disturbing the race condition on critical paths. The objective of this thesis is to characterize DCs in SRAM-based FPGAs due to transient ionizing radiation.
The DCs observed experimentally are presented, and circuit-level modeling of those DCs is proposed. The circuits involved in delay propagation are reverse-engineered by precisely modeling the internal blocks of the FPGA and running simulations. The results show the root cause of the DCs and are in good agreement with the experimental delay measurements. The proposed circuit-level models are, to the best of our knowledge, the first work on modeling combinational delays in FPGAs. In addition, the design of a delay monitor circuit for DC detection is investigated in the second part of this thesis. This monitor made it possible to show experimentally cumulative DCs on interconnects in the FPGA. A mitigation technique for DCs is then proposed that avoids triple modular redundancy (TMR) and minimizes system downtime. A method is also proposed to decrease the clock frequency after a DC is detected without interrupting the process.

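    Nueva metodología para el endurecimiento óptimo de sistemas digitales con distribución de la funcionalidad, trabajando en entornos sometidos a la radiación ionizante [New methodology for the optimal hardening of digital systems with distributed functionality operating in environments subject to ionizing radiation]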

    Debido al extenso uso de los sistemas digitales distribuidos en las diferentes áreas tecnológicas sometidas a entornos agresivos, como por ejemplo la automoción o el espacio, actualmente es de vital importancia la aplicación de técnicas de endurecimiento ante los efectos de radiación ionizante. Esta tesis se inició con el proyecto de investigación RENASER+ “TEC2010-22095-C03-03”, financiado por el Ministerio Español de Ciencia y Tecnología. El objetivo principal del subproyecto liderado por la Universidad Carlos III de Madrid, consistía en la evaluación de la sensibilidad de los sistemas digitales a los efectos de la radiación ionizante. Para ello, se planificaron una serie de tareas destinadas a establecer un protocolo y desarrollar un sistema de pruebas válido para realizar campañas de irradiación en los aceleradores instalados en el Centro Nacional de Aceleradores de la Universidad de Sevilla (CNA). Con el fin de alcanzar los objetivos de este proyecto, en los trabajos de la tesis doctoral, se estudiaron y aplicaron técnicas de tolerancia a fallos para sistemas distribuidos, y se comprobó la sensibilidad del sistema con el método basado en la inyección de los fallos por emulación. En concreto, utilizando la herramienta de Emulación Autónoma desarrollada en el grupo DMA de la Universidad Carlos III. A lo largo del desarrollo de la tesis doctoral, en concordancia con las tareas y objetivos del proyecto RENASER+, se han introducido mejoras en el sistema de inyección de fallos transitorios por emulación, para permitir la inyección de fallos múltiples en los elementos de memoria y para permitir el análisis detallado del efecto acumulado de los fallos detectados mediante técnicas de mitigación de fallos, y técnicas de rastreo del fallo y su propagación. 
Con el objetivo de generar un método integral de análisis, validación y calificación de los sistemas distribuidos robustos, además de inyectar fallos por emulación se indujeron fallos reales en dichos sistemas, mediante el uso del acelerador de partículas tipo ciclotrón del CNA de Sevilla. La evaluación de la sensibilidad del sistema distribuido prototipado requirió una cualificación mediante varias campañas de irradiación, en esta tesis se ha propuesto un método de planificación y realización de una campaña que permite la monitorización continúa del efecto de los fallos y la generación automática de informes finales del proceso completo. En la realización de la tesis doctoral se planteó un objetivo más ambicioso, que consistía en la generación de un método de comprobación continua, durante el tiempo de funcionamiento normal de los sistemas distribuidos, de la presencia de fallos que pudieran causar un mal funcionamiento, e incluso averías graves. Así, se amplió el método de calificación de la robustez de sistemas distribuidos a un método de test on-line de dichos sistemas. Durante el estudio de la robustez de los sistemas distribuidos, principalmente para aplicaciones terrestres, se identificó y estudió el problema de los fallos debidos a envejecimiento de los componentes electrónicos digitales. Por lo tanto, el método propuesto es también adecuado para la realización del test on-line durante el funcionamiento normal del sistema distribuido, para la detección de los fallos procedentes de la radiación ionizante (fallos transitorios) y los fallos debidos al envejecimiento de los dispositivos (fallos permanentes). Los sistemas digitales distribuidos, aparte de los nodos que los componen, tienen un protocolo de comunicación que puede fallar igual que el mismo nodo. 
En esta tesis doctoral se han estudiado varios protocolos de comunicación utilizados en sistemas distribuidos de aplicaciones aeroespaciales y de automoción; así mismo, se han comparado los protocolos CAN y LIN, y se ha propuesto, diseñado y prototipado un sistema distribuido básico, para la comprobación de la metodología de calificación y test on-line y de las técnicas de endurecimiento propuestas. Como aportación original de esta tesis doctoral, se propone una nueva metodología para comprobar y asegurar el endurecimiento efectivo de un sistema digital distribuido, que incluya cualquier bus de comunicación. Otra contribución original de esta tesis doctoral, consiste en una técnica que permite la comparación de los datos obtenidos tras la irradiación de una forma directa y transparente al sistema de test, basada en un bloque de detección de minoría para rastreo de la propagación de los fallos. Se han analizado distintas tecnologías de FPGA basadas en memoria Flash, con el propósito de una mejor caracterización de los dispositivos a iradiar. Para el sistema final se ha optado por la utilización de una FPGA Igloo de Microsemi®, debido a que es una tecnología más robusta y basada en la tecnología Flash. Se ha desarrollado un software de control para el sistema distribuido, el cual se ejecuta en un microprocesador contenido en uno de los nodos del sistema y el que envía los resultados obtenidos por medio de un bus SPI a un PC. El software automatiza el proceso de recolección de los datos, proporcionando el resultado de cuál de los nodos ha fallado, si se ha recuperado, si ha fallado el sistema de comunicación, etc. Estos resultados permiten validar experimentalmente el método propuesto para los sistemas distribuidos digitales. Además de cumplir con los objetivos del proyecto, se han resuelto dos de los problemas típicos de este tipo de sistemas. 
These are the synchronous reset of the system and the monitoring of the continuous maintenance of the system's real time. The blocks proposed and used in this comprehensive method for qualifying distributed systems are suitable for on-line testing of distributed systems, in order to detect permanent faults (due to device aging) and transient faults (due to ionizing radiation). Finally, the latest contributions of this thesis research, the detailed analysis of the packaging material of the device to be irradiated and its behavior when protons pass through it, were obtained thanks to the help of the Department of Materials Science and Engineering and Chemical Engineering of the Carlos III University of Madrid and of the National Centre of Accelerators of the University of Seville. Most of the partial and global results of this doctoral thesis have been published in International Conferences (International On-Line Testing Symposium, Latin American Test Workshop, Digital Circuit and Integrated Systems) and in an international journal (IEEE Transactions on Nuclear Science). Nowadays, due to the spread of distributed digital systems across different technological areas, working in aggressive environments such as automotive and aerospace applications, hardening techniques against ionizing radiation effects are crucial. This PhD work started with the research project RENASER+ “TEC2010-22095-C03-03”, supported by the Spanish Ministry of Economy and Competitiveness. The main objective of the sub-project, managed by the Carlos III University of Madrid, was the assessment of the sensitivity of digital systems to ionizing radiation effects. 
For this purpose, a number of tasks were planned, aimed at setting a protocol and developing a test system able to run irradiation campaigns in the accelerators of the National Centre of Accelerators (CNA), University of Seville. Furthermore, in order to fulfill the project objectives, the work developed in the PhD was devoted to the study and application of fault tolerance techniques for distributed digital systems, and to the prior analysis of the systems' sensitivity with emulation-based fault injection campaigns. This last task used the Autonomous Emulation Tool developed in the DMA research group of Carlos III University of Madrid. During this PhD, in agreement with the tasks and objectives of the RENASER+ project, improvements have been proposed, developed and included in the emulation-based transient fault injection tool, in order to enable the injection of multiple faults in memory elements, the detailed tracking of the accumulated effect of faults injected and detected by mitigation techniques, and the scanning of fault propagation within the distributed system. With the main purpose of generating a comprehensive method for the analysis, assessment and qualification of robust distributed digital systems, apart from injecting faults via emulation, real faults have been injected using a particle accelerator, the CNA-US cyclotron (proton beam). The assessment of the prototyped distributed digital systems required three proton beam irradiation campaigns; in the PhD a planning method was proposed for this type of campaign on distributed systems, which allows the continuous monitoring of fault effects and automatic final report generation. 
As a more ambitious objective, in the PhD the method was extended to an on-line test, for continuous checking during normal operation of distributed digital systems, to detect faults which can cause wrong behavior or even a serious failure. Therefore, the qualification method for the assessment of the robustness of distributed digital systems was extended to an on-line test method for these systems. During the robustness study of distributed digital systems another problem was identified and analyzed: the permanent faults appearing in this type of system due to the aging of digital devices. Indeed, the proposed method is suitable for on-line testing during normal operation of distributed digital systems, for the detection of faults due to ionizing radiation (transient faults) and of faults due to device aging (permanent faults). Distributed digital systems, apart from the nodes composing them, usually include a communication protocol that can also suffer from transient or permanent faults. In this PhD, different communication protocols used in distributed digital systems for aerospace and automotive applications have been studied; also, the CAN and LIN protocols have been compared, and a basic distributed system has been proposed, designed and prototyped for validating the qualification methodology and the on-line test method, as well as the proposed hardening techniques. As an original contribution of this PhD, a comprehensive methodology for assessing and assuring the effective hardening of distributed digital systems, including a communication protocol, is proposed. Another original contribution of this PhD is a technique that allows the analysis of the data obtained from any irradiation campaign (transparently and directly, whatever the test system used), based on a minority checker block for fault tracking. Furthermore, in this PhD work, different Flash-based FPGA technologies have been analyzed, with the purpose of better characterizing the analyzed devices. 
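The idea of a minority checker for fault tracking can be illustrated with a minimal software sketch. It is not the block designed in the thesis, whose internals the abstract does not describe; it assumes three redundant copies of a word (triple modular redundancy), and the name `tmr_vote` and the returned tuple layout are hypothetical.

```python
from typing import Tuple


def tmr_vote(a: int, b: int, c: int) -> Tuple[int, Tuple[bool, bool, bool]]:
    """Bitwise 2-of-3 vote over three redundant words (illustrative only).

    Returns the voted word and one flag per replica that is True when
    that replica differs from the voted word, i.e. it is in the minority
    for at least one bit.
    """
    # Each output bit takes the value held by at least two replicas.
    majority = (a & b) | (a & c) | (b & c)
    # A replica that disagrees with the voted word carries a fault.
    minority = (a != majority, b != majority, c != majority)
    return majority, minority


if __name__ == "__main__":
    # Replica c has one flipped bit; the vote masks it and flags c.
    voted, flags = tmr_vote(0b1010, 0b1010, 0b1000)
    print(bin(voted), flags)
```

Logging the flags over time, rather than only masking the error, is what would let a test system see where a fault first appeared and how it spread between replicas or nodes.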
For the final system implemented, Igloo FPGAs from Microsemi® have been selected, because it is a more robust technology. Specific control software has been developed for the distributed system; it runs on a microprocessor included in one node of the system and sends the results obtained through a PCI bus to a personal computer. This software automates the data collection process, reporting which node is failing, its recovery state, and any communication fault that occurs. These tools allow the experimental validation of the proposed method for distributed digital systems. Apart from fulfilling the project objectives, two typical problems of this type of system have been solved. They correspond to the synchronous initialization of the system and continuous real-time maintenance. The blocks proposed and used in this comprehensive method for the assessment of the sensitivity of distributed digital systems are also adequate for on-line testing of these systems, with the purpose of detecting permanent faults (due to device aging) and transient faults (due to ionizing radiation). Finally, the latest research in this PhD work corresponds to the detailed analysis of the packaging of the devices to be irradiated, as well as its behavior when protons pass through it. This work was done thanks to the help of the Department of Materials Science and Engineering and Chemical Engineering of Carlos III University of Madrid, and the National Centre of Accelerators of the University of Seville. 
The majority of the partial and global results of this PhD have been published in International Conferences (International On-Line Testing Symposium, Latin American Test Workshop, Digital Circuit and Integrated Systems) and in the IEEE Transactions on Nuclear Science. This thesis began with the research project RENASER+ “TEC2010-22095-C03-03”, funded by the Spanish Ministry of Science and Technology. Official Doctoral Program in Electrical, Electronic and Automation Engineering. Chair: Raoul Velazco. Secretary: Emilio Olías Ruiz. Member: Miguel Ángel Aguirre Echanov