47 research outputs found

    Novel fault tolerant Multi-Bit Upset (MBU) Error-Detection and Correction (EDAC) architecture

    Get PDF
    Desde el punto de vista de seguridad, la certificación aeronáutica de aplicaciones críticas de vuelo requiere diferentes técnicas que son usadas para prevenir fallos en los equipos electrónicos. Los fallos de tipo hardware debido a la radiación solar que existe a las alturas standard de vuelo, como SEU (Single Event Upset) y MCU (Multiple Bit Upset), provocan un cambio de estado de los bits que soportan la información almacenada en memoria. Estos fallos se producen, por ejemplo, en la memoria de configuración de una FPGA, que es donde se definen todas las funcionalidades. Las técnicas de protección requieren normalmente de redundancias que incrementan el coste, número de componentes, tamaño de la memoria y peso. En la fase de desarrollo de aplicaciones críticas de vuelo, generalmente se utilizan una serie de estándares o recomendaciones de diseño como ABD100, RTCA DO-160, IEC62395, etc, y diferentes técnicas de protección para evitar fallos del tipo SEU o MCU. Estas técnicas están basadas en procesos tecnológicos específicos como memorias robustas, codificaciones para detección y corrección de errores (EDAC), redundancias software, redundancia modular triple (TMR) o soluciones a nivel sistema. Esta tesis está enfocada a minimizar e incluso suprimir los efectos de los SEUs y MCUs que particularmente ocurren en la electrónica de avión como consecuencia de la exposición a radiación de partículas no cargadas (como son los neutrones) que se encuentra potenciada a las típicas alturas de vuelo. La criticidad en vuelo que tienen determinados sistemas obligan a que dichos sistemas sean tolerantes a fallos, es decir, que garanticen un correcto funcionamiento aún cuando se produzca un fallo en ellos. Es por ello que soluciones como las presentadas en esta tesis tienen interés en el sector industrial. La Tesis incluye una descripción inicial de la física de la radiación incidente sobre aeronaves, y el análisis de sus efectos en los componentes electrónicos aeronaúticos basados en semiconductor, que desembocan en la generación de SEUs y MCUs. Este análisis permite dimensionar adecuadamente y optimizar los procedimientos de corrección que se propongan posteriormente. La Tesis propone un sistema de corrección de fallos SEUs y MCUs que permita cumplir la condición de Sistema Tolerante a Fallos, a la vez que minimiza los niveles de redundancia y de complejidad de los códigos de corrección. El nivel de redundancia es minimizado con la introducción del concepto propuesto HSB (Hardwired Seed Bits), en la que se reduce la información esencial a unos pocos bits semilla, neutros frente a radiación. Los códigos de corrección requeridos se reducen a la corrección de un único error, gracias al uso del concepto de Distancia Virtual entre Bits, a partir del cual será posible corregir múltiples errores simultáneos (MCUs) a partir de códigos simples de corrección. Un ejemplo de aplicación de la Tesis es la implementación de una Protección Tolerante a Fallos sobre la memoria SRAM de una FPGA. Esto significa que queda protegida no sólo la información contenida en la memoria sino que también queda auto-protegida la función de protección misma almacenada en la propia SRAM. De esta forma, el sistema es capaz de auto-regenerarse ante un SEU o incluso un MCU, independientemente de la zona de la SRAM sobre la que impacte la radiación. Adicionalmente, esto se consigue con códigos simples tales como corrección por bit de paridad y Hamming, minimizando la dedicación de recursos de computación hacia tareas de supervisión del sistema.For airborne safety critical applications certification, different techniques are implemented to prevent failures in electronic equipments. The HW failures at flying heights of aircrafts related to solar radiation such as SEU (Single-Event-Upset) and MCU (Multiple Bit Upset), causes bits alterations that corrupt the information at memories. These HW failures cause errors, for example, in the Configuration-Code of an FPGA that defines the functionalities. The protection techniques require classically redundant functionalities that increases the cost, components, memory space and weight. During the development phase for airborne safety critical applications, different aerospace standards are generally recommended as ABD100, RTCA-DO160, IEC62395, etc, and different techniques are classically used to avoid failures such as SEU or MCU. These techniques are based on specific technology processes, Hardened memories, error detection and correction codes (EDAC), SW redundancy, Triple Modular Redundancy (TMR) or System level solutions. This Thesis is focussed to minimize, and even to remove, the effects of SEUs and MCUs, that particularly occurs in the airborne electronics as a consequence of its exposition to solar radiation of non-charged particles (for example the neutrons). These non-charged particles are even powered at flying altitudes due to aircraft volume. The safety categorization of different equipments/functionalities requires a design based on fault-tolerant approach that means, the system will continue its normal operation even if a failure occurs. The solution proposed in this Thesis is relevant for the industrial sector because of its Fault-tolerant capability. Thesis includes an initial description for the physics of the solar radiation that affects into aircrafts, and also the analyses of their effects into the airborne electronics based on semiconductor components that create the SEUs and MCUs. This detailed analysis allows the correct sizing and also the optimization of the procedures used to correct the errors. This Thesis proposes a system that corrects the SEUs and MCUs allowing the fulfilment of the Fault-Tolerant requirement, reducing the redundancy resources and also the complexity of the correction codes. The redundancy resources are minimized thanks to the introduction of the concept of HSB (Hardwired Seed Bits), in which the essential information is reduced to a few seed bits, neutral to radiation. The correction codes required are reduced to the correction of one error thanks to the use of the concept of interleaving distance between adjacent bits, this allows the simultaneous multiple error correction with simple single error correcting codes. An example of the application of this Thesis is the implementation of the Fault-tolerant architecture of an SRAM-based FPGA. That means that the information saved in the memory is protected but also the correction functionality is auto protected as well, also saved into SRAM memory. In this way, the system is able to self-regenerate the information lost in case of SEUs or MCUs. This is independent of the SRAM area affected by the radiation. Furthermore, this performance is achieved by means simple error correcting codes, as parity bits or Hamming, that minimize the use of computational resources to this supervision tasks for system.Programa Oficial de Doctorado en Ingeniería Eléctrica, Electrónica y AutomáticaPresidente: Luis Alfonso Entrena Arrontes.- Secretario: Pedro Reviriego Vasallo.- Vocal: Mª Luisa López Vallej

    Characterization of Interconnection Delays in FPGAS Due to Single Event Upsets and Mitigation

    Get PDF
    RÉSUMÉ L’utilisation incessante de composants électroniques à géométrie toujours plus faible a engendré de nouveaux défis au fil des ans. Par exemple, des semi-conducteurs à mémoire et à microprocesseur plus avancés sont utilisés dans les systèmes avioniques qui présentent une susceptibilité importante aux phénomènes de rayonnement cosmique. L'une des principales implications des rayons cosmiques, observée principalement dans les satellites en orbite, est l'effet d'événements singuliers (SEE). Le rayonnement atmosphérique suscite plusieurs préoccupations concernant la sécurité et la fiabilité de l'équipement avionique, en particulier pour les systèmes qui impliquent des réseaux de portes programmables (FPGA). Les FPGA à base de cellules de mémoire statique (SRAM) présentent une solution attrayante pour mettre en oeuvre des systèmes complexes dans le domaine de l’avionique. Les expériences de rayonnement réalisées sur les FPGA ont dévoilé la vulnérabilité de ces dispositifs contre un type particulier de SEE, à savoir, les événements singuliers de changement d’état (SEU). Un SEU est considérée comme le changement de l'état d'un élément bistable (c'est-à-dire, un bit-flip) dû à l'effet d'un ion, d'un proton ou d’un neutron énergétique. Cet effet est non destructif et peut être corrigé en réécrivant la partie de la SRAM affectée. Les changements de délai (DC) potentiels dus aux SEU affectant la mémoire de configuration de routage ont été récemment confirmés. Un des objectifs de cette thèse consiste à caractériser plus précisément les DC dans les FPGA causés par les SEU. Les DC observés expérimentalement sont présentés et la modélisation au niveau circuit de ces DC est proposée. Les circuits impliqués dans la propagation du délai sont validés en effectuant une modélisation précise des blocs internes à l'intérieur du FPGA et en exécutant des simulations. Les résultats montrent l’origine des DC qui sont en accord avec les mesures expérimentales de délais. Les modèles proposés au niveau circuit sont, aux meilleures de notre connaissance, le premier travail qui confirme et explique les délais combinatoires dans les FPGA. La conception d'un circuit moniteur de délai pour la détection des DC a été faite dans la deuxième partie de cette thèse. Ce moniteur permet de détecter un changement de délai sur les sections critiques du circuit et de prévenir les pannes de synchronisation engendrées par les SEU sans utiliser la redondance modulaire triple (TMR).----------ABSTRACT The unrelenting demand for electronic components with ever diminishing feature size have emerged new challenges over the years. Among them, more advanced memory and microprocessor semiconductors are being used in avionic systems that exhibit a substantial susceptibility to cosmic radiation phenomena. One of the main implications of cosmic rays, which was primarily observed in orbiting satellites, is single-event effect (SEE). Atmospheric radiation causes several concerns regarding the safety and reliability of avionics equipment, particularly for systems that involve field programmable gate arrays (FPGA). SRAM-based FPGAs, as an attractive solution to implement systems in aeronautic sector, are very susceptible to SEEs in particular Single Event Upset (SEU). An SEU is considered as the change of the state of a bistable element (i.e., bit-flip) due to the effect of an energetic ion or proton. This effect is non-destructive and may be fixed by rewriting the affected part. Sensitivity evaluation of SRAM-based FPGAs to a physical impact such as potential delay changes (DC) has not been addressed thus far in the literature. DCs induced by SEU can affect the functionality of the logic circuits by disturbing the race condition on critical paths. The objective of this thesis is toward the characterization of DCs in SRAM-based FPGAs due to transient ionizing radiation. The DCs observed experimentally are presented and the circuit-level modeling of those DCs is proposed. Circuits involved in delay propagation are reverse-engineered by performing precise modeling of internal blocks inside the FPGA and executing simulations. The results show the root cause of DCs that are in good agreement with experimental delay measurements. The proposed circuit level models are, to the best of our knowledge, the first work on modeling of combinational delays in FPGAs.In addition, the design of a delay monitor circuit for DC detection is investigated in the second part of this thesis. This monitor allowed to show experimentally cumulative DCs on interconnects in FPGA. To this end, by avoiding the use of triple modular redundancy (TMR), a mitigation technique for DCs is proposed and the system downtime is minimized. A method is also proposed to decrease the clock frequency after DC detection without interrupting the process

    Exploration and Analysis of Combinations of Hamming Codes in 32-bit Memories

    Full text link
    Reducing the threshold voltage of electronic devices increases their sensitivity to electromagnetic radiation dramatically, increasing the probability of changing the memory cells' content. Designers mitigate failures using techniques such as Error Correction Codes (ECCs) to maintain information integrity. Although there are several studies of ECC usage in spatial application memories, there is still no consensus in choosing the type of ECC as well as its organization in memory. This work analyzes some configurations of the Hamming codes applied to 32-bit memories in order to use these memories in spatial applications. This work proposes the use of three types of Hamming codes: Ham(31,26), Ham(15,11), and Ham(7,4), as well as combinations of these codes. We employed 36 error patterns, ranging from one to four bit-flips, to analyze these codes. The experimental results show that the Ham(31,26) configuration, containing five bits of redundancy, obtained the highest rate of simple error correction, almost 97\%, with double, triple, and quadruple error correction rates being 78.7\%, 63.4\%, and 31.4\%, respectively. While an ECC configuration encompassed four Ham(7.4), which uses twelve bits of redundancy, only fixes 87.5\% of simple errors

    Total ionizing dose and single event upset testing of flash based field programmable gate arrays

    Get PDF
    The effectiveness of implementing field programmable gate arrays (FPGAs) in communication, military, space and high radiation environment applications, coupled with the increased accessibility of private individuals and researchers to launch satellites, has led to an increased interest in commercial off the shelf components. The metal oxide semiconductor (MOS) structures of FPGAs however, are sensitive to radiation effects which can lead to decreased reliability of the device. In order to successfully implement a FPGA based system in a radiation environment, such as on-board a satellite, the single event upset (SEU) and total ionizing dose (TID) characteristics of the device must first be established. This research experimentally determines a research procedure which could accurately determine the SEU cross sections and TID characteristics of various mitigation techniques as well as control circuits implemented in a ProASIC3 A3P1000 FPGA. To gain an understanding of the SEU effects of the implemented circuits, the test FPGA was irradiated by a 66MeV proton beam at the iTemba LABS facility. Through means of irradiation, the SEU cross section of various communication, motor control and mitigation schemes circuits, induced by high energy proton strikes was investigated. The implementation of a full global triple modular redundancy (TMR) and a combination of TMR and a AND-OR multiplexer filter was found to most effectively mitigate SEUs in comparison to the other techniques. When comparing the communication and motor control circuits, the high frequency I2C and SPI circuits experienced a higher number of upsets when compared to a low frequency servo motor control circuit. To gain a better understanding of the absorbed dose effects, experimental TID testing was conducted by irradiating the test FPGA with a cobalt-60 (Co-60) source. An accumulated absorbed dose resulted in the fluctuation of the device supply current and operating voltages as well as resulted in output errors. The TMR and TMR filtering combination mitigation techniques again were found to be the most effective methods of mitigation

    Real-time trace decoding and monitoring for safety and security in embedded systems

    Get PDF
    Integrated circuits and systems can be found almost everywhere in today’s world. As their use increases, they need to be made safer and more perfor mant to meet current demands in processing power. FPGA integrated SoCs can provide the ideal trade-off between performance, adaptability, and energy usage. One of today’s vital challenges lies in updating existing fault tolerance techniques for these new systems while utilizing all available processing capa bilities, such as multi-core and heterogeneous processing units. Control-flow monitoring is one of the primary mechanisms described for error detection at the software architectural level for the highest grade of hazard level clas sifications (e.g., ASIL D) described in industry safety standards ISO-26262. Control-flow errors are also known to compose the majority of detected errors for ICs and embedded systems in safety-critical and risk-susceptible environ ments [5]. Software-based monitoring methods remain the most popular [6–8]. However, recent studies show that the overheads they impose make actual reliability gains negligible [9, 10]. This work proposes and demonstrates a new control flow checking method implemented in FPGA for multi-core embedded systems called control-flow trace checker (CFTC). CFTC uses existing trace and debug subsystems of modern processors to rebuild their execution states. It can iden tify any errors in real-time by comparing executed states to a set of permitted state transitions determined statically. This novel implementation weighs hardware resource trade-offs to target mul tiple independent tasks in multi-core embedded applications, as well as single core systems. The proposed system is entirely implemented in hardware and isolated from all monitored software components, requiring 2.4% of the target FPGA platform resources to protect an execution unit in its entirety. There fore, it avoids undesired overheads and maintains deterministic error detection latencies, which guarantees reliability improvements without impairing the target software system. Finally, CFTC is evaluated under different software i Resumo fault-injection scenarios, achieving detection rates of 100% of all control-flow errors to wrong destinations and 98% of all injected faults to program binaries. All detection times are further analyzed and precisely described by a model based on the monitor’s resources and speed and the software application’s control-flow structure and binary characteristics.Circuitos integrados estão presentes em quase todos sistemas complexos do mundo moderno. Conforme sua frequência de uso aumenta, eles precisam se tornar mais seguros e performantes para conseguir atender as novas demandas em potência de processamento. Sistemas em Chip integrados com FPGAs conseguem prover o balanço perfeito entre desempenho, adaptabilidade, e uso de energia. Um dos maiores desafios agora é a necessidade de atualizar técnicas de tolerância à falhas para estes novos sistemas, aproveitando os novos avanços em capacidade de processamento. Monitoramento de fluxo de controle é um dos principais mecanismos para a detecção de erros em nível de software para sistemas classificados como de alto risco (e.g. ASIL D), descrito em padrões de segurança como o ISO-26262. Estes erros são conhecidos por compor a maioria dos erros detectados em sistemas integrados [5]. Embora métodos de monitoramento baseados em software continuem sendo os mais populares [6–8], estudos recentes mostram que seus custos adicionais, em termos de performance e área, diminuem consideravelmente seus ganhos reais em confiabilidade [9, 10]. Propomos aqui um novo método de monitora mento de fluxo de controle implementado em FPGA para sistemas embarcados multi-core. Este método usa subsistemas de trace e execução de código para reconstruir o estado atual do processador, identificando erros através de com parações entre diferentes estados de execução da CPU. Propomos uma implementação que considera trade-offs no uso de recuros de sistema para monitorar múltiplas tarefas independetes. Nossa abordagem suporta o monitoramento de sistemas simples e também de sistemas multi-core multitarefa. Por fim, nossa técnica é totalmente implementada em hardware, evitando o uso de unidades de processamento de software que possa adicionar custos indesejáveis à aplicação em perda de confiabilidade. Propomos, assim, um mecanismo de verificação de fluxo de controle, escalável e extensível, para proteção de sistemas embarcados críticos e multi-core

    Real-Time Trace Decoding and Monitoring for Safety and Security in Embedded Systems

    Get PDF
    Integrated circuits and systems can be found almost everywhere in today’s world. As their use increases, they need to be made safer and more perfor mant to meet current demands in processing power. FPGA integrated SoCs can provide the ideal trade-off between performance, adaptability, and energy usage. One of today’s vital challenges lies in updating existing fault tolerance techniques for these new systems while utilizing all available processing capa bilities, such as multi-core and heterogeneous processing units. Control-flow monitoring is one of the primary mechanisms described for error detection at the software architectural level for the highest grade of hazard level clas sifications (e.g., ASIL D) described in industry safety standards ISO-26262. Control-flow errors are also known to compose the majority of detected errors for ICs and embedded systems in safety-critical and risk-susceptible environ ments [5]. Software-based monitoring methods remain the most popular [6–8]. However, recent studies show that the overheads they impose make actual reliability gains negligible [9, 10]. This work proposes and demonstrates a new control flow checking method implemented in FPGA for multi-core embedded systems called control-flow trace checker (CFTC). CFTC uses existing trace and debug subsystems of modern processors to rebuild their execution states. It can iden tify any errors in real-time by comparing executed states to a set of permitted state transitions determined statically. This novel implementation weighs hardware resource trade-offs to target mul tiple independent tasks in multi-core embedded applications, as well as single core systems. The proposed system is entirely implemented in hardware and isolated from all monitored software components, requiring 2.4% of the target FPGA platform resources to protect an execution unit in its entirety. There fore, it avoids undesired overheads and maintains deterministic error detection latencies, which guarantees reliability improvements without impairing the target software system. Finally, CFTC is evaluated under different software i Resumo fault-injection scenarios, achieving detection rates of 100% of all control-flow errors to wrong destinations and 98% of all injected faults to program binaries. All detection times are further analyzed and precisely described by a model based on the monitor’s resources and speed and the software application’s control-flow structure and binary characteristics.Circuitos integrados estão presentes em quase todos sistemas complexos do mundo moderno. Conforme sua frequência de uso aumenta, eles precisam se tornar mais seguros e performantes para conseguir atender as novas demandas em potência de processamento. Sistemas em Chip integrados com FPGAs conseguem prover o balanço perfeito entre desempenho, adaptabilidade, e uso de energia. Um dos maiores desafios agora é a necessidade de atualizar técnicas de tolerância à falhas para estes novos sistemas, aproveitando os novos avanços em capacidade de processamento. Monitoramento de fluxo de controle é um dos principais mecanismos para a detecção de erros em nível de software para sistemas classificados como de alto risco (e.g. ASIL D), descrito em padrões de segurança como o ISO-26262. Estes erros são conhecidos por compor a maioria dos erros detectados em sistemas integrados [5]. Embora métodos de monitoramento baseados em software continuem sendo os mais populares [6–8], estudos recentes mostram que seus custos adicionais, em termos de performance e área, diminuem consideravelmente seus ganhos reais em confiabilidade [9, 10]. Propomos aqui um novo método de monitora mento de fluxo de controle implementado em FPGA para sistemas embarcados multi-core. Este método usa subsistemas de trace e execução de código para reconstruir o estado atual do processador, identificando erros através de com parações entre diferentes estados de execução da CPU. Propomos uma implementação que considera trade-offs no uso de recuros de sistema para monitorar múltiplas tarefas independetes. Nossa abordagem suporta o monitoramento de sistemas simples e também de sistemas multi-core multitarefa. Por fim, nossa técnica é totalmente implementada em hardware, evitando o uso de unidades de processamento de software que possa adicionar custos indesejáveis à aplicação em perda de confiabilidade. Propomos, assim, um mecanismo de verificação de fluxo de controle, escalável e extensível, para proteção de sistemas embarcados críticos e multi-core

    Development of a Remotely-Piloted Vehicle Platform to Support Implementation, Verification, and Validation of Pilot Control Systems

    Get PDF
    This thesis presents the development of a research test bed and the use of a set of metrics for evaluating handling qualities with pilot in the loop configuration. The main objective of this study is to provide software and hardware tools to support performance evaluation of control systems designed to compensate for Pilot Induced Oscillations (PIOs). A remotely-piloted vehicle presented in this thesis consists of an RC aircraft modified to be flown from a ground station cockpit. The unmanned aerial system has a high-speed on-board processing system capable of simulating different conditions during flight such as injecting actuator failures and adding delays. In this study, the analysis of pilot handling qualities based on a set of evaluation metrics, is also included. The metrics are based on time-domain Neal-Smith criterion and are used to provide numerical data which categorizes the control system in one of the levels on the Cooper-Harper Rating scale. Two different control configurations were implemented and analyzed in this study: stick-to-servo and non-linear dynamic inversion control laws. Piloted-simulation results are presented on the Neal-Smith flying qualities plane at different flight conditions

    Mixed-Criticality Systems on Commercial-Off-the-Shelf Multi-Processor Systems-on-Chip

    Get PDF
    Avionics and space industries are struggling with the adoption of technologies like multi-processor system-on-chips (MPSoCs) due to strict safety requirements. This thesis propose a new reference architecture for MPSoC-based mixed-criticality systems (MCS) - i.e., systems integrating applications with different level of criticality - which are a common use case for aforementioned industries. This thesis proposes a system architecture capable of granting partitioning - which is, for short, the property of fault containment. It is based on the detection of spatial and temporal interference, and has been named the online detection of interference (ODIn) architecture. Spatial partitioning requires that an application is not able to corrupt resources used by a different application. In the architecture proposed in this thesis, spatial partitioning is implemented using type-1 hypervisors, which allow definition of resource partitions. An application running in a partition can only access resources granted to that partition, therefore it cannot corrupt resources used by applications running in other partitions. Temporal partitioning requires that an application is not able to unexpectedly change the execution time of other applications. In the proposed architecture, temporal partitioning has been solved using a bounded interference approach, composed of an offline analysis phase and an online safety net. The offline phase is based on a statistical profiling of a metric sensitive to temporal interference’s, performed in nominal conditions, which allows definition of a set of three thresholds: 1. the detection threshold TD; 2. the warning threshold TW ; 3. the α threshold. Two rules of detection are defined using such thresholds: Alarm rule When the value of the metric is above TD. Warning rule When the value of the metric is in the warning region [TW ;TD] for more than α consecutive times. ODIn’s online safety-net exploits performance counters, available in many MPSoC architectures; such counters are configured at bootstrap to monitor the selected metric(s), and to raise an interrupt request (IRQ) in case the metric value goes above TD, implementing the alarm rule. The warning rule is implemented in a software detection module, which reads the value of performance counters when the monitored task yields control to the scheduler and reset them if there is no detection. ODIn also uses two additional detection mechanisms: 1. a control flow check technique, based on compile-time defined block signatures, is implemented through a set of watchdog processors, each monitoring one partition. 2. a timeout is implemented through a system watchdog timer (SWDT), which is able to send an external signal when the timeout is violated. The recovery actions implemented in ODIn are: • graceful degradation, to react to IRQs of WDPs monitoring non-critical applications or to warning rule violations; it temporarily stops non-critical applications to grant resources to the critical application; • hard recovery, to react to the SWDT, to the WDP of the critical application, or to alarm rule violations; it causes a switch to a hot stand-by spare computer. Experimental validation of ODIn was performed on two hardware platforms: the ZedBoard - dual-core - and the Inventami board - quad-core. A space benchmark and an avionic benchmark were implemented on both platforms, composed by different modules as showed in Table 1 Each version of the final application was evaluated through fault injection (FI) campaigns, performed using a specifically designed FI system. There were three types of FI campaigns: 1. HW FI, to emulate single event effects; 2. SW FI, to emulate bugs in non-critical applications; 3. artificial bug FI, to emulate a bug in non-critical applications introducing unexpected interference on the critical application. Experimental results show that ODIn is resilient to all considered types of faul

    High-performance hardware accelerators for image processing in space applications

    Get PDF
    Mars is a hard place to reach. While there have been many notable success stories in getting probes to the Red Planet, the historical record is full of bad news. The success rate for actually landing on the Martian surface is even worse, roughly 30%. This low success rate must be mainly credited to the Mars environment characteristics. In the Mars atmosphere strong winds frequently breath. This phenomena usually modifies the lander descending trajectory diverging it from the target one. Moreover, the Mars surface is not the best place where performing a safe land. It is pitched by many and close craters and huge stones, and characterized by huge mountains and hills (e.g., Olympus Mons is 648 km in diameter and 27 km tall). For these reasons a mission failure due to a landing in huge craters, on big stones or on part of the surface characterized by a high slope is highly probable. In the last years, all space agencies have increased their research efforts in order to enhance the success rate of Mars missions. In particular, the two hottest research topics are: the active debris removal and the guided landing on Mars. The former aims at finding new methods to remove space debris exploiting unmanned spacecrafts. These must be able to autonomously: detect a debris, analyses it, in order to extract its characteristics in terms of weight, speed and dimension, and, eventually, rendezvous with it. In order to perform these tasks, the spacecraft must have high vision capabilities. In other words, it must be able to take pictures and process them with very complex image processing algorithms in order to detect, track and analyse the debris. The latter aims at increasing the landing point precision (i.e., landing ellipse) on Mars. Future space-missions will increasingly adopt Video Based Navigation systems to assist the entry, descent and landing (EDL) phase of space modules (e.g., spacecrafts), enhancing the precision of automatic EDL navigation systems. For instance, recent space exploration missions, e.g., Spirity, Oppurtunity, and Curiosity, made use of an EDL procedure aiming at following a fixed and precomputed descending trajectory to reach a precise landing point. This approach guarantees a maximum landing point precision of 20 km. By comparing this data with the Mars environment characteristics, it is possible to understand how the mission failure probability still remains really high. A very challenging problem is to design an autonomous-guided EDL system able to even more reduce the landing ellipse, guaranteeing to avoid the landing in dangerous area of Mars surface (e.g., huge craters or big stones) that could lead to the mission failure. The autonomous behaviour of the system is mandatory since a manual driven approach is not feasible due to the distance between Earth and Mars. Since this distance varies from 56 to 100 million of km approximately due to the orbit eccentricity, even if a signal transmission at the light speed could be possible, in the best case the transmission time would be around 31 minutes, exceeding so the overall duration of the EDL phase. In both applications, algorithms must guarantee self-adaptability to the environmental conditions. Since the Mars (and in general the space) harsh conditions are difficult to be predicted at design time, these algorithms must be able to automatically tune the internal parameters depending on the current conditions. Moreover, real-time performances are another key factor. Since a software implementation of these computational intensive tasks cannot reach the required performances, these algorithms must be accelerated via hardware. For this reasons, this thesis presents my research work done on advanced image processing algorithms for space applications and the associated hardware accelerators. My research activity has been focused on both the algorithm and their hardware implementations. Concerning the first aspect, I mainly focused my research effort to integrate self-adaptability features in the existing algorithms. While concerning the second, I studied and validated a methodology to efficiently develop, verify and validate hardware components aimed at accelerating video-based applications. This approach allowed me to develop and test high performance hardware accelerators that strongly overcome the performances of the actual state-of-the-art implementations. The thesis is organized in four main chapters. Chapter 2 starts with a brief introduction about the story of digital image processing. The main content of this chapter is the description of space missions in which digital image processing has a key role. A major effort has been spent on the missions in which my research activity has a substantial impact. In particular, for these missions, this chapter deeply analizes and evaluates the state-of-the-art approaches and algorithms. Chapter 3 analyzes and compares the two technologies used to implement high performances hardware accelerators, i.e., Application Specific Integrated Circuits (ASICs) and Field Programmable Gate Arrays (FPGAs). Thanks to this information the reader may understand the main reasons behind the decision of space agencies to exploit FPGAs instead of ASICs for high-performance hardware accelerators in space missions, even if FPGAs are more sensible to Single Event Upsets (i.e., transient error induced on hardware component by alpha particles and solar radiation in space). Moreover, this chapter deeply describes the three available space-grade FPGA technologies (i.e., One-time Programmable, Flash-based, and SRAM-based), and the main fault-mitigation techniques against SEUs that are mandatory for employing space-grade FPGAs in actual missions. Chapter 4 describes one of the main contribution of my research work: a library of high-performance hardware accelerators for image processing in space applications. The basic idea behind this library is to offer to designers a set of validated hardware components able to strongly speed up the basic image processing operations commonly used in an image processing chain. In other words, these components can be directly used as elementary building blocks to easily create a complex image processing system, without wasting time in the debug and validation phase. This library groups the proposed hardware accelerators in IP-core families. The components contained in a same family share the same provided functionality and input/output interface. This harmonization in the I/O interface enables to substitute, inside a complex image processing system, components of the same family without requiring modifications to the system communication infrastructure. In addition to the analysis of the internal architecture of the proposed components, another important aspect of this chapter is the methodology used to develop, verify and validate the proposed high performance image processing hardware accelerators. This methodology involves the usage of different programming and hardware description languages in order to support the designer from the algorithm modelling up to the hardware implementation and validation. Chapter 5 presents the proposed complex image processing systems. In particular, it exploits a set of actual case studies, associated with the most recent space agency needs, to show how the hardware accelerator components can be assembled to build a complex image processing system. In addition to the hardware accelerators contained in the library, the described complex system embeds innovative ad-hoc hardware components and software routines able to provide high performance and self-adaptable image processing functionalities. To prove the benefits of the proposed methodology, each case study is concluded with a comparison with the current state-of-the-art implementations, highlighting the benefits in terms of performances and self-adaptability to the environmental conditions

    Reliable Software for Unreliable Hardware - A Cross-Layer Approach

    Get PDF
    A novel cross-layer reliability analysis, modeling, and optimization approach is proposed in this thesis that leverages multiple layers in the system design abstraction (i.e. hardware, compiler, system software, and application program) to exploit the available reliability enhancing potential at each system layer and to exchange this information across multiple system layers
    corecore