6 research outputs found

    Concepção e validação de arquitetura robusta baseada em soft processors para uso em computadores de bordo de satélites artificiais

    Get PDF
    Dissertação (mestrado) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia Elétrica, Florianópolis, 2013.A flexibilidade introduzida pela utilização de FPGAs (Field Programmable Gate Array) SRAM comerciais em aplicações embarcadas, faz com que esta tecnologia se torne uma alternativa atraente para aplicações militares e espaciais. No presente trabalho, foi desenvolvido um Computador de Bordo utilizando soft processor embarcado em um FPGA do tipo SRAM. O Computador de Bordo é baseado em requisitos funcionais especificados pelo Instituto Nacional de Pesquisas Espaciais (INPE) para o Computador de Bordo a ser utilizado em suas futuras missões. Módulos de software e hardware foram implementados visando executar as principais funcionalidades de um Computador de Bordo. No entanto, os avanços oriundos de tecnologias nanométricas trazem uma maior vulnerabilidade dos componentes eletrônicos a efeitos de radiação. Em aplicações críticas é importante que técnicas de tolerância a falhas sejam utilizadas para aumentar o grau de confiabilidade das aplicações. Com o intuito de mitigar falhas causadas pela radiação a qual computadores de bordo são expostos no espaço, uma técnica de tolerância a falhas não intrusiva foi desenvolvida. A técnica proposta visa aplicar mecanismos de detecção de falhas utilizando um monitor de barramento para comparar os dados de saída de um soft processor principal com seu módulo redundante. Caso os dados sejam diferentes, um sinal de erro é gerado, iniciando a estratégia de tolerância a falhas. A técnica proposta se mostrou eficiente quando comparada a técnicas do estado da arte como a Redundância Tripla (Triple Modular Redundancy, TMR) e Tolerância a Falhas em Hardware Implementadas em Software (Software Implemented Hardware Fault Tolerance, SIHFT) para identificação de falhas simples em tempo de execução com menor ocupação de área e sem alterar o desempenho da aplicação.Abstract : The flexibility introduced by Commercial Off The Shelf (COTS) SRAM based FPGAs in on-board system designs make them an attractive option for military and aerospace applications. However, the advances towards the nanometer technology come together with a higher vulnerability of integrated circuits to radiation perturbations. In mission critical applications it is important to improve the reliability of applications by using fault-tolerance techniques. In this work, the concept of an On-Board Computer (OBC) system aiming a soft-processor embedded on a SRAM based FPGA is proposed. The OBC comply with functional requirements of the Brazilian Institute of Space Research (INPE) for the OBC that will be employed in future missions. Modules of software and hardware were implemented in order to execute the main capabilities of the OBC. In order to mitigate the faults caused by radiation on the space environment, a non-intrusive fault tolerance technique has been developed. The proposed technique targets soft processors (e.g. LEON3), and its detection mechanism uses a Bus Monitor to compare output data of a main soft-processor with its redundant module. In case of a mismatch, an error signal is activated, triggering the proposed fault tolerance strategy. This approach shows to be more efficient than the state-of-the-art Triple Modular Redundancy (TMR) and Software Implemented Hardware Fault Tolerance (SIHFT) approaches in order to detect and to correct faults on the fly with low area overhead and with no major performance penalties

    Analysis and Test of the Effects of Single Event Upsets Affecting the Configuration Memory of SRAM-based FPGAs

    Get PDF
    SRAM-based FPGAs are increasingly relevant in a growing number of safety-critical application fields, ranging from automotive to aerospace. These application fields are characterized by a harsh radiation environment that can cause the occurrence of Single Event Upsets (SEUs) in digital devices. These faults have particularly adverse effects on SRAM-based FPGA systems because not only can they temporarily affect the behaviour of the system by changing the contents of flip-flops or memories, but they can also permanently change the functionality implemented by the system itself, by changing the content of the configuration memory. Designing safety-critical applications requires accurate methodologies to evaluate the system’s sensitivity to SEUs as early as possible during the design process. Moreover it is necessary to detect the occurrence of SEUs during the system life-time. To this purpose test patterns should be generated during the design process, and then applied to the inputs of the system during its operation. In this thesis we propose a set of software tools that could be used by designers of SRAM-based FPGA safety-critical applications to assess the sensitivity to SEUs of the system and to generate test patterns for in-service testing. The main feature of these tools is that they implement a model of SEUs affecting the configuration bits controlling the logic and routing resources of an FPGA device that has been demonstrated to be much more accurate than the classical stuck-at and open/short models, that are commonly used in the analysis of faults in digital devices. By keeping this accurate fault model into account, the proposed tools are more accurate than similar academic and commercial tools today available for the analysis of faults in digital circuits, that do not take into account the features of the FPGA technology.. In particular three tools have been designed and developed: (i) ASSESS: Accurate Simulator of SEuS affecting the configuration memory of SRAM-based FPGAs, a simulator of SEUs affecting the configuration memory of an SRAM-based FPGA system for the early assessment of the sensitivity to SEUs; (ii) UA2TPG: Untestability Analyzer and Automatic Test Pattern Generator for SEUs Affecting the Configuration Memory of SRAM-based FPGAs, a static analysis tool for the identification of the untestable SEUs and for the automatic generation of test patterns for in-service testing of the 100% of the testable SEUs; and (iii) GABES: Genetic Algorithm Based Environment for SEU Testing in SRAM-FPGAs, a Genetic Algorithm-based Environment for the generation of an optimized set of test patterns for in-service testing of SEUs. The proposed tools have been applied to some circuits from the ITC’99 benchmark. The results obtained from these experiments have been compared with results obtained by similar experiments in which we considered the stuck-at fault model, instead of the more accurate model for SEUs. From the comparison of these experiments we have been able to verify that the proposed software tools are actually more accurate than similar tools today available. In particular the comparison between results obtained using ASSESS with those obtained by fault injection has shown that the proposed fault simulator has an average error of 0:1% and a maximum error of 0:5%, while using a stuck-at fault simulator the average error with respect of the fault injection experiment has been 15:1% with a maximum error of 56:2%. Similarly the comparison between the results obtained using UA2TPG for the accurate SEU model, with the results obtained for stuck-at faults has shown an average difference of untestability of 7:9% with a maximum of 37:4%. Finally the comparison between fault coverages obtained by test patterns generated for the accurate model of SEUs and the fault coverages obtained by test pattern designed for stuck-at faults, shows that the former detect the 100% of the testable faults, while the latter reach an average fault coverage of 78:9%, with a minimum of 54% and a maximum of 93:16%

    Cross layer reliability estimation for digital systems

    Get PDF
    Forthcoming manufacturing technologies hold the promise to increase multifuctional computing systems performance and functionality thanks to a remarkable growth of the device integration density. Despite the benefits introduced by this technology improvements, reliability is becoming a key challenge for the semiconductor industry. With transistor size reaching the atomic dimensions, vulnerability to unavoidable fluctuations in the manufacturing process and environmental stress rise dramatically. Failing to meet a reliability requirement may add excessive re-design cost to recover and may have severe consequences on the success of a product. %Worst-case design with large margins to guarantee reliable operation has been employed for long time. However, it is reaching a limit that makes it economically unsustainable due to its performance, area, and power cost. One of the open challenges for future technologies is building ``dependable'' systems on top of unreliable components, which will degrade and even fail during normal lifetime of the chip. Conventional design techniques are highly inefficient. They expend significant amount of energy to tolerate the device unpredictability by adding safety margins to a circuit's operating voltage, clock frequency or charge stored per bit. Unfortunately, the additional cost introduced to compensate unreliability are rapidly becoming unacceptable in today's environment where power consumption is often the limiting factor for integrated circuit performance, and energy efficiency is a top concern. Attention should be payed to tailor techniques to improve the reliability of a system on the basis of its requirements, ending up with cost-effective solutions favoring the success of the product on the market. Cross-layer reliability is one of the most promising approaches to achieve this goal. Cross-layer reliability techniques take into account the interactions between the layers composing a complex system (i.e., technology, hardware and software layers) to implement efficient cross-layer fault mitigation mechanisms. Fault tolerance mechanism are carefully implemented at different layers starting from the technology up to the software layer to carefully optimize the system by exploiting the inner capability of each layer to mask lower level faults. For this purpose, cross-layer reliability design techniques need to be complemented with cross-layer reliability evaluation tools, able to precisely assess the reliability level of a selected design early in the design cycle. Accurate and early reliability estimates would enable the exploration of the system design space and the optimization of multiple constraints such as performance, power consumption, cost and reliability. This Ph.D. thesis is devoted to the development of new methodologies and tools to evaluate and optimize the reliability of complex digital systems during the early design stages. More specifically, techniques addressing hardware accelerators (i.e., FPGAs and GPUs), microprocessors and full systems are discussed. All developed methodologies are presented in conjunction with their application to real-world use cases belonging to different computational domains

    Plaeser - plataforma de emulação de soft errors visando a análise experimental de técnicas de tolerância a falhas: uma prototipação rápida utilizando FPGAs

    Get PDF
    Dissertação (mestrado) - Universidade Federal de Santa Catarina, Centro Tecnológico. Programa de Pós-Graduação em Engenharia ElétricaO constante avanço na fabricação de circuitos integrados com a miniaturização da tecnologia, o aumento da frequência de operação e a diminuição da tensão de alimentação fazem deles cada vez mais sensíveis à radiação. A preocupação com a sensibilidade de circuitos integrados não é mais restrita a projetos de aplicações espaciais onde o ambiente é mais hostil quanto à radiação. Circuitos fabricados com tecnologias em escala nanométrica são potencialmente sensíveis a partículas que se encontram na atmosfera terrestre e até no nível do mar. A importância da tolerância a falhas em semicondutores existe desde quando anomalias foram observadas no comportamento de dispositivos operando no espaço. A larga presença de circuitos integrados em diversas áreas do nosso cotidiano faz com que técnicas de tolerância a falhas ganhem importância também para aplicações terrestres. Desse modo, formas eficientes de avaliação dessas técnicas de tolerância a falhas são essenciais para lidar com essa demanda. É importante que essa avaliação possa ser realizada em etapas iniciais do projeto de circuitos integrados tolerantes à radiação de forma a reduzir o custo com locação de instalações que utilizam equipamentos de radiação induzida para verificação. Nesse contexto, o trabalho de dissertação apresenta um estudo sobre diferentes técnicas de injeção de falhas. Além do estudo, foi desenvolvida uma plataforma de emulação de soft errors (PLAESER) visando a análise experimental de técnicas de tolerância a falhas. A plataforma PLAESER provê suporte ao fluxo proposto para avaliação de técnicas de tolerância a falhas em fase inicial do projeto de circuitos robustos através da prototipação rápida em FPGAs. Os resultados obtidos com os casos de teste utilizados procuram mostrar o emprego do fluxo proposto para análise de técnicas de tolerância a falhas.The continuous improvements in the integrated circuits manufacture process considering the miniaturization of technology, increase of clock frequencies and limitation of power supply, make them more susceptible to radiation. The concern with circuit sensitivity is no longer restricted to space applications, in harsh environment. Integrated circuits manufactured with nanometric technologies are potentially sensitive to particles present in the atmosphere and also at the sea level. Fault tolerance strategies applied to semiconductors have been around since upsets were first experienced in space applications. The large usage of integrated circuits in several areas of everyday life makes fault tolerance techniques important also for terrestrial applications. Therefore, efficient hardness evaluation solutions are essential to deal with this demand. Such evaluation is important and should be performed earlier in hardened integrated circuit designs in order to reduce costs with rental of radiation facilities. In this context, this work presents a evaluation of different fault injection techniques. Moreover, a soft error emulation platform (PLAESER) has been developed in order to analyze fault tolerance techniques experimentally. PLEASER gives support to the flow proposed to evaluate fault tolerance techniques earlier in hardened circuit designs through rapid prototyping. The results obtained with the selected test cases show the employment of the proposed flow to analyze fault tolerance techniques

    Analyse und Erweiterung eines fehler-toleranten NoC für SRAM-basierte FPGAs in Weltraumapplikationen

    Get PDF
    Data Processing Units for scientific space mission need to process ever higher volumes of data and perform ever complex calculations. But the performance of available space-qualified general purpose processors is just in the lower three digit megahertz range, which is already insufficient for some applications. As an alternative, suitable processing steps can be implemented in hardware on a space-qualified SRAM-based FPGA. However, suitable devices are susceptible against space radiation. At the Institute for Communication and Network Engineering a fault-tolerant, network-based communication architecture was developed, which enables the construction of processing chains on the basis of different processing modules within suitable SRAM-based FPGAs and allows the exchange of single processing modules during runtime, too. The communication architecture and its protocol shall isolate non SEU mitigated or just partial SEU mitigated modules affected by radiation-induced faults to prohibit the propagation of errors within the remaining System-on-Chip. In the context of an ESA study, this communication architecture was extended with further components and implemented in a representative hardware platform. Based on the acquired experiences during the study, this work analyses the actual fault-tolerance characteristics as well as weak points of this initial implementation. At appropriate locations, the communication architecture was extended with mechanisms for fault-detection and fault-differentiation as well as with a hardware-based monitoring solution. Both, the former measures and the extension of the employed hardware-platform with selective fault-injection capabilities for the emulation of radiation-induced faults within critical areas of a non SEU mitigated processing module, are used to evaluate the effects of radiation-induced faults within the communication architecture. By means of the gathered results, further measures to increase fast detection and isolation of faulty nodes are developed, selectively implemented and verified. In particular, the ability of the communication architecture to isolate network nodes without SEU mitigation could be significantly improved.Instrumentenrechner für wissenschaftliche Weltraummissionen müssen ein immer höheres Datenvolumen verarbeiten und immer komplexere Berechnungen ausführen. Die Performanz von verfügbaren qualifizierten Universalprozessoren liegt aber lediglich im unteren dreistelligen Megahertz-Bereich, was für einige Anwendungen bereits nicht mehr ausreicht. Als Alternative bietet sich die Implementierung von entsprechend geeigneten Datenverarbeitungsschritten in Hardware auf einem qualifizierten SRAM-basierten FPGA an. Geeignete Bausteine sind jedoch empfindlich gegenüber der Strahlungsumgebung im Weltraum. Am Institut für Datentechnik und Kommunikationsnetze wurde eine fehlertolerante netzwerk-basierte Kommunikationsarchitektur entwickelt, die innerhalb eines geeigneten SRAM-basierten FPGAs Datenverarbeitungsmodule miteinander nach Bedarf zu Verarbeitungsketten verbindet, sowie den Austausch von einzelnen Modulen im Betrieb ermöglicht. Nicht oder nur partiell SEU mitigierte Module sollen bei strahlungsbedingten Fehlern im Modul durch das Protokoll und die Fehlererkennungsmechanismen der Kommunikationsarchitektur isoliert werden, um ein Ausbreiten des Fehlers im restlichen System-on-Chip zu verhindern. Im Kontext einer ESA Studie wurde diese Kommunikationsarchitektur um Komponenten erweitert und auf einer repräsentativen Hardwareplattform umgesetzt. Basierend auf den gesammelten Erfahrungen aus der Studie, wird in dieser Arbeit eine Analyse der tatsächlichen Fehlertoleranz-Eigenschaften sowie der Schwachstellen dieser ursprünglichen Implementierung durchgeführt. Die Kommunikationsarchitektur wurde an geeigneten Stellen um Fehlerdetektierungs- und Fehlerunterscheidungsmöglichkeiten erweitert, sowie um eine hardwarebasierte Überwachung ergänzt. Sowohl diese Maßnahmen, als auch die Erweiterung der Hardwareplattform um gezielte Fehlerinjektions-Möglichkeiten zum Emulieren von strahlungsinduzierten Fehlern in kritischen Komponenten eines nicht SEU mitigierten Prozessierungsmoduls werden genutzt, um die tatsächlichen auftretenden Effekte in der Kommunikationsarchitektur zu evaluieren. Anhand der Ergebnisse werden weitere Verbesserungsmaßnahmen speziell zur schnellen Detektierung und Isolation von fehlerhaften Knoten erarbeitet, selektiv implementiert und verifiziert. Insbesondere die Fähigkeit, fehlerhafte, nicht SEU mitigierte Netzwerkknoten innerhalb der Kommunikationsarchitektur zu isolieren, konnte dabei deutlich verbessert werden
    corecore