603 research outputs found
Advanced information processing system: Local system services
The Advanced Information Processing System (AIPS) is a multi-computer architecture composed of hardware and software building blocks that can be configured to meet a broad range of application requirements. The hardware building blocks are fault-tolerant, general-purpose computers, fault-and damage-tolerant networks (both computer and input/output), and interfaces between the networks and the computers. The software building blocks are the major software functions: local system services, input/output, system services, inter-computer system services, and the system manager. The foundation of the local system services is an operating system with the functions required for a traditional real-time multi-tasking computer, such as task scheduling, inter-task communication, memory management, interrupt handling, and time maintenance. Resting on this foundation are the redundancy management functions necessary in a redundant computer and the status reporting functions required for an operator interface. The functional requirements, functional design and detailed specifications for all the local system services are documented
Digital avionics design and reliability analyzer
The description and specifications for a digital avionics design and reliability analyzer are given. Its basic function is to provide for the simulation and emulation of the various fault-tolerant digital avionic computer designs that are developed. It has been established that hardware emulation at the gate-level will be utilized. The primary benefit of emulation to reliability analysis is the fact that it provides the capability to model a system at a very detailed level. Emulation allows the direct insertion of faults into the system, rather than waiting for actual hardware failures to occur. This allows for controlled and accelerated testing of system reaction to hardware failures. There is a trade study which leads to the decision to specify a two-machine system, including an emulation computer connected to a general-purpose computer. There is also an evaluation of potential computers to serve as the emulation computer
Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 2: Army fault tolerant architecture design and analysis
Described here is the Army Fault Tolerant Architecture (AFTA) hardware architecture and components and the operating system. The architectural and operational theory of the AFTA Fault Tolerant Data Bus is discussed. The test and maintenance strategy developed for use in fielded AFTA installations is presented. An approach to be used in reducing the probability of AFTA failure due to common mode faults is described. Analytical models for AFTA performance, reliability, availability, life cycle cost, weight, power, and volume are developed. An approach is presented for using VHSIC Hardware Description Language (VHDL) to describe and design AFTA's developmental hardware. A plan is described for verifying and validating key AFTA concepts during the Dem/Val phase. Analytical models and partial mission requirements are used to generate AFTA configurations for the TF/TA/NOE and Ground Vehicle missions
Real-Time Trace Decoding and Monitoring for Safety and Security in Embedded Systems
Integrated circuits and systems can be found almost everywhere in today’s world. As their use increases, they need to be made safer and more perfor mant to meet current demands in processing power. FPGA integrated SoCs can provide the ideal trade-off between performance, adaptability, and energy usage. One of today’s vital challenges lies in updating existing fault tolerance techniques for these new systems while utilizing all available processing capa bilities, such as multi-core and heterogeneous processing units. Control-flow monitoring is one of the primary mechanisms described for error detection at the software architectural level for the highest grade of hazard level clas sifications (e.g., ASIL D) described in industry safety standards ISO-26262. Control-flow errors are also known to compose the majority of detected errors for ICs and embedded systems in safety-critical and risk-susceptible environ ments [5]. Software-based monitoring methods remain the most popular [6–8]. However, recent studies show that the overheads they impose make actual reliability gains negligible [9, 10]. This work proposes and demonstrates a new control flow checking method implemented in FPGA for multi-core embedded systems called control-flow trace checker (CFTC). CFTC uses existing trace and debug subsystems of modern processors to rebuild their execution states. It can iden tify any errors in real-time by comparing executed states to a set of permitted state transitions determined statically. This novel implementation weighs hardware resource trade-offs to target mul tiple independent tasks in multi-core embedded applications, as well as single core systems. The proposed system is entirely implemented in hardware and isolated from all monitored software components, requiring 2.4% of the target FPGA platform resources to protect an execution unit in its entirety. There fore, it avoids undesired overheads and maintains deterministic error detection latencies, which guarantees reliability improvements without impairing the target software system. Finally, CFTC is evaluated under different software i Resumo fault-injection scenarios, achieving detection rates of 100% of all control-flow errors to wrong destinations and 98% of all injected faults to program binaries. All detection times are further analyzed and precisely described by a model based on the monitor’s resources and speed and the software application’s control-flow structure and binary characteristics.Circuitos integrados estão presentes em quase todos sistemas complexos do mundo moderno. Conforme sua frequência de uso aumenta, eles precisam se tornar mais seguros e performantes para conseguir atender as novas demandas em potência de processamento. Sistemas em Chip integrados com FPGAs conseguem prover o balanço perfeito entre desempenho, adaptabilidade, e uso de energia. Um dos maiores desafios agora é a necessidade de atualizar técnicas de tolerância à falhas para estes novos sistemas, aproveitando os novos avanços em capacidade de processamento. Monitoramento de fluxo de controle é um dos principais mecanismos para a detecção de erros em nÃvel de software para sistemas classificados como de alto risco (e.g. ASIL D), descrito em padrões de segurança como o ISO-26262. Estes erros são conhecidos por compor a maioria dos erros detectados em sistemas integrados [5]. Embora métodos de monitoramento baseados em software continuem sendo os mais populares [6–8], estudos recentes mostram que seus custos adicionais, em termos de performance e área, diminuem consideravelmente seus ganhos reais em confiabilidade [9, 10]. Propomos aqui um novo método de monitora mento de fluxo de controle implementado em FPGA para sistemas embarcados multi-core. Este método usa subsistemas de trace e execução de código para reconstruir o estado atual do processador, identificando erros através de com parações entre diferentes estados de execução da CPU. Propomos uma implementação que considera trade-offs no uso de recuros de sistema para monitorar múltiplas tarefas independetes. Nossa abordagem suporta o monitoramento de sistemas simples e também de sistemas multi-core multitarefa. Por fim, nossa técnica é totalmente implementada em hardware, evitando o uso de unidades de processamento de software que possa adicionar custos indesejáveis à aplicação em perda de confiabilidade. Propomos, assim, um mecanismo de verificação de fluxo de controle, escalável e extensÃvel, para proteção de sistemas embarcados crÃticos e multi-core
Real-time trace decoding and monitoring for safety and security in embedded systems
Integrated circuits and systems can be found almost everywhere in today’s world. As their use increases, they need to be made safer and more perfor mant to meet current demands in processing power. FPGA integrated SoCs can provide the ideal trade-off between performance, adaptability, and energy usage. One of today’s vital challenges lies in updating existing fault tolerance techniques for these new systems while utilizing all available processing capa bilities, such as multi-core and heterogeneous processing units. Control-flow monitoring is one of the primary mechanisms described for error detection at the software architectural level for the highest grade of hazard level clas sifications (e.g., ASIL D) described in industry safety standards ISO-26262. Control-flow errors are also known to compose the majority of detected errors for ICs and embedded systems in safety-critical and risk-susceptible environ ments [5]. Software-based monitoring methods remain the most popular [6–8]. However, recent studies show that the overheads they impose make actual reliability gains negligible [9, 10]. This work proposes and demonstrates a new control flow checking method implemented in FPGA for multi-core embedded systems called control-flow trace checker (CFTC). CFTC uses existing trace and debug subsystems of modern processors to rebuild their execution states. It can iden tify any errors in real-time by comparing executed states to a set of permitted state transitions determined statically. This novel implementation weighs hardware resource trade-offs to target mul tiple independent tasks in multi-core embedded applications, as well as single core systems. The proposed system is entirely implemented in hardware and isolated from all monitored software components, requiring 2.4% of the target FPGA platform resources to protect an execution unit in its entirety. There fore, it avoids undesired overheads and maintains deterministic error detection latencies, which guarantees reliability improvements without impairing the target software system. Finally, CFTC is evaluated under different software i Resumo fault-injection scenarios, achieving detection rates of 100% of all control-flow errors to wrong destinations and 98% of all injected faults to program binaries. All detection times are further analyzed and precisely described by a model based on the monitor’s resources and speed and the software application’s control-flow structure and binary characteristics.Circuitos integrados estão presentes em quase todos sistemas complexos do mundo moderno. Conforme sua frequência de uso aumenta, eles precisam se tornar mais seguros e performantes para conseguir atender as novas demandas em potência de processamento. Sistemas em Chip integrados com FPGAs conseguem prover o balanço perfeito entre desempenho, adaptabilidade, e uso de energia. Um dos maiores desafios agora é a necessidade de atualizar técnicas de tolerância à falhas para estes novos sistemas, aproveitando os novos avanços em capacidade de processamento. Monitoramento de fluxo de controle é um dos principais mecanismos para a detecção de erros em nÃvel de software para sistemas classificados como de alto risco (e.g. ASIL D), descrito em padrões de segurança como o ISO-26262. Estes erros são conhecidos por compor a maioria dos erros detectados em sistemas integrados [5]. Embora métodos de monitoramento baseados em software continuem sendo os mais populares [6–8], estudos recentes mostram que seus custos adicionais, em termos de performance e área, diminuem consideravelmente seus ganhos reais em confiabilidade [9, 10]. Propomos aqui um novo método de monitora mento de fluxo de controle implementado em FPGA para sistemas embarcados multi-core. Este método usa subsistemas de trace e execução de código para reconstruir o estado atual do processador, identificando erros através de com parações entre diferentes estados de execução da CPU. Propomos uma implementação que considera trade-offs no uso de recuros de sistema para monitorar múltiplas tarefas independetes. Nossa abordagem suporta o monitoramento de sistemas simples e também de sistemas multi-core multitarefa. Por fim, nossa técnica é totalmente implementada em hardware, evitando o uso de unidades de processamento de software que possa adicionar custos indesejáveis à aplicação em perda de confiabilidade. Propomos, assim, um mecanismo de verificação de fluxo de controle, escalável e extensÃvel, para proteção de sistemas embarcados crÃticos e multi-core
Advanced information processing system: Fault injection study and results
The objective of the AIPS program is to achieve a validated fault tolerant distributed computer system. The goals of the AIPS fault injection study were: (1) to present the fault injection study components addressing the AIPS validation objective; (2) to obtain feedback for fault removal from the design implementation; (3) to obtain statistical data regarding fault detection, isolation, and reconfiguration responses; and (4) to obtain data regarding the effects of faults on system performance. The parameters are described that must be varied to create a comprehensive set of fault injection tests, the subset of test cases selected, the test case measurements, and the test case execution. Both pin level hardware faults using a hardware fault injector and software injected memory mutations were used to test the system. An overview is provided of the hardware fault injector and the associated software used to carry out the experiments. Detailed specifications are given of fault and test results for the I/O Network and the AIPS Fault Tolerant Processor, respectively. The results are summarized and conclusions are given
Development and evaluation of a fault-tolerant multiprocessor (FTMP) computer. Volume 4: FTMP executive summary
The FTMP architecture is a high reliability computer concept modeled after a homogeneous multiprocessor architecture. Elements of the FTMP are operated in tight synchronism with one another and hardware fault-detection and fault-masking is provided which is transparent to the software. Operating system design and user software design is thus greatly simplified. Performance of the FTMP is also comparable to that of a simplex equivalent due to the efficiency of fault handling hardware. The FTMP project constructed an engineering module of the FTMP, programmed the machine and extensively tested the architecture through fault injection and other stress testing. This testing confirmed the soundness of the FTMP concepts
Innovative Techniques for Testing and Diagnosing SoCs
We rely upon the continued functioning of many electronic devices for our everyday welfare,
usually embedding integrated circuits that are becoming even cheaper and smaller
with improved features. Nowadays, microelectronics can integrate a working computer
with CPU, memories, and even GPUs on a single die, namely System-On-Chip (SoC).
SoCs are also employed on automotive safety-critical applications, but need to be tested
thoroughly to comply with reliability standards, in particular the ISO26262 functional
safety for road vehicles.
The goal of this PhD. thesis is to improve SoC reliability by proposing innovative
techniques for testing and diagnosing its internal modules: CPUs, memories, peripherals,
and GPUs. The proposed approaches in the sequence appearing in this thesis are described
as follows:
1. Embedded Memory Diagnosis: Memories are dense and complex circuits which
are susceptible to design and manufacturing errors. Hence, it is important to understand
the fault occurrence in the memory array. In practice, the logical and physical
array representation differs due to an optimized design which adds enhancements to
the device, namely scrambling. This part proposes an accurate memory diagnosis
by showing the efforts of a software tool able to analyze test results, unscramble
the memory array, map failing syndromes to cell locations, elaborate cumulative
analysis, and elaborate a final fault model hypothesis. Several SRAM memory failing
syndromes were analyzed as case studies gathered on an industrial automotive
32-bit SoC developed by STMicroelectronics. The tool displayed defects virtually,
and results were confirmed by real photos taken from a microscope.
2. Functional Test Pattern Generation: The key for a successful test is the pattern applied
to the device. They can be structural or functional; the former usually benefits
from embedded test modules targeting manufacturing errors and is only effective
before shipping the component to the client. The latter, on the other hand, can be
applied during mission minimally impacting on performance but is penalized due
to high generation time. However, functional test patterns may benefit for having
different goals in functional mission mode. Part III of this PhD thesis proposes
three different functional test pattern generation methods for CPU cores embedded
in SoCs, targeting different test purposes, described as follows:
a. Functional Stress Patterns: Are suitable for optimizing functional stress during
I
Operational-life Tests and Burn-in Screening for an optimal device reliability
characterization
b. Functional Power Hungry Patterns: Are suitable for determining functional
peak power for strictly limiting the power of structural patterns during manufacturing
tests, thus reducing premature device over-kill while delivering high test
coverage
c. Software-Based Self-Test Patterns: Combines the potentiality of structural patterns
with functional ones, allowing its execution periodically during mission.
In addition, an external hardware communicating with a devised SBST was proposed.
It helps increasing in 3% the fault coverage by testing critical Hardly
Functionally Testable Faults not covered by conventional SBST patterns.
An automatic functional test pattern generation exploiting an evolutionary algorithm
maximizing metrics related to stress, power, and fault coverage was employed
in the above-mentioned approaches to quickly generate the desired patterns. The
approaches were evaluated on two industrial cases developed by STMicroelectronics;
8051-based and a 32-bit Power Architecture SoCs. Results show that generation
time was reduced upto 75% in comparison to older methodologies while
increasing significantly the desired metrics.
3. Fault Injection in GPGPU: Fault injection mechanisms in semiconductor devices
are suitable for generating structural patterns, testing and activating mitigation techniques,
and validating robust hardware and software applications. GPGPUs are
known for fast parallel computation used in high performance computing and advanced
driver assistance where reliability is the key point. Moreover, GPGPU manufacturers
do not provide design description code due to content secrecy. Therefore,
commercial fault injectors using the GPGPU model is unfeasible, making radiation
tests the only resource available, but are costly. In the last part of this thesis, we
propose a software implemented fault injector able to inject bit-flip in memory elements
of a real GPGPU. It exploits a software debugger tool and combines the
C-CUDA grammar to wisely determine fault spots and apply bit-flip operations in
program variables. The goal is to validate robust parallel algorithms by studying
fault propagation or activating redundancy mechanisms they possibly embed. The
effectiveness of the tool was evaluated on two robust applications: redundant parallel
matrix multiplication and floating point Fast Fourier Transform
Design of an integrated airframe/propulsion control system architecture
The design of an integrated airframe/propulsion control system architecture is described. The design is based on a prevalidation methodology that uses both reliability and performance. A detailed account is given for the testing associated with a subset of the architecture and concludes with general observations of applying the methodology to the architecture
- …