472 research outputs found

    Application-Based Analysis of Register File Criticality for Reliability Assessment in Embedded Microprocessors

    Get PDF
    There is an increasing concern to reduce the cost and overheads during the development of reliable systems. Selective protection of most critical parts of the systems represents a viable solution to obtain a high level of reliability at a fraction of the cost. In particular to design a selective fault mitigation strategy for processor-based systems, it is mandatory to identify and prioritize the most vulnerable registers in the register file as best candidates to be protected (hardened). This paper presents an application-based metric to estimate the criticality of each register from the microprocessor register file in microprocessor-based systems. The proposed metric relies on the combination of three different criteria based on common features of executed applications. The applicability and accuracy of our proposal have been evaluated in a set of applications running in different microprocessors. Results show a significant improvement in accuracy compared to previous approaches and regardless of the underlying architecture.This work was funded in part by the Spanish Ministry of Education, Culture and Sports with the project “Developing hybrid fault tolerance techniques for embedded microprocessors” (PHB2012-0158-PC)

    Selective SWIFT-R. A Flexible Software-Based Technique for Soft Error Mitigation in Low-Cost Embedded Systems

    Get PDF
    Commercial off-the-shelf microprocessors are the core of low-cost embedded systems due to their programmability and cost-effectiveness. Recent advances in electronic technologies have allowed remarkable improvements in their performance. However, they have also made microprocessors more susceptible to transient faults induced by radiation. These non-destructive events (soft errors), may cause a microprocessor to produce a wrong computation result or lose control of a system with catastrophic consequences. Therefore, soft error mitigation has become a compulsory requirement for an increasing number of applications, which operate from the space to the ground level. In this context, this paper uses the concept of selective hardening, which is aimed to design reduced-overhead and flexible mitigation techniques. Following this concept, a novel flexible version of the software-based fault recovery technique known as SWIFT-R is proposed. Our approach makes possible to select different registers subsets from the microprocessor register file to be protected on software. Thus, design space is enriched with a wide spectrum of new partially protected versions, which offer more flexibility to designers. This permits to find the best trade-offs between performance, code size, and fault coverage. Three case studies have been developed to show the applicability and flexibility of the proposal.This work was funded by the Ministry of Science and Innovation in Spain with the project ‘RENASER+: Integral Analysis of Digital Circuits and Systems for Aerospace Applications’ (TEC2010-22095-C03-01)

    Network-on-Chip -based Multi-Processor System-on-Chip: Towards Mixed-Criticality System Certification

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    Soft Error Effects on Arm Microprocessors: Early Estimations versus Chip Measurements

    Get PDF
    Extensive research efforts are being carried out to evaluate and improve the reliability of computing devices either through beam experiments or simulation-based fault injection. Unfortunately, it is still largely unclear to which extend fault injection can provide an accurate error rate estimation at early stages and if beam experiments can be used to identify the weakest resources in a device. The importance and challenges associated with a timely, but yet realistic reliability evaluation grow with the increase of complexity in both the hardware domain, with the integration of different types of cores in an SoC (System-on-Chip), and the software domain, with the OS (operating system) required to take full advantage of the available resources. In this paper, we combine and analyze data gathered with extensive beam experiments (on the final physical CPU hardware) and microarchitectural fault injections (on early microarchitectural CPU models). We target a standalone Arm Cortex-A5 CPU and an Arm Cortex-A9 CPU integrated into an SoC and evaluate their reliability in bare-metal and Linux-based configurations. Combining experimental data that covers more than 18 million years of device time with the result of more than 176,000 injections we find that both the SoC integration and the presence of the OS increase the system DUEs (Detected Unrecoverable Errors) rate (for different reasons) but do not significantly impact the SDCs (Silent Data Corruptions) rate which is solely attributed to the CPU core. Our reliability analysis demonstrates that even considering SoC integration and OS inclusion, early, pre-silicon microarchitecture-level fault injection delivers accurate SDC rates estimations and lower bounds for the DUE rates

    Empirical Mathematical Model of Microprocessor Sensitivity and Early Prediction to Proton and Neutron Radiation-Induced Soft Errors

    Get PDF
    A mathematical model is described to predict microprocessor fault tolerance under radiation. The model is empirically trained by combining data from simulated fault-injection campaigns and radiation experiments, both with protons (at the National Center of Accelerators (CNA) facilities, Seville, Spain) and neutrons [at the Los Alamos Neutron Science Center (LANSCE) Weapons Neutron Research Facility at Los Alamos, USA]. The sensitivity to soft errors of different blocks of commercial processors is identified to estimate the reliability of a set of programs that had previously been optimized, hardened, or both. The results showed a standard error under 0.1, in the case of the Advanced RISC Machines (ARM) processor, and 0.12, in the case of the MSP430 microcontroller.This work was supported in part by Spanish MINECO under Project ESP-2015-68245-C4-3-P and Project ESP-2015-68245-C4-4-P

    Toward Fault-Tolerant Applications on Reconfigurable Systems-on-Chip

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    Approximate Computing Strategies for Low-Overhead Fault Tolerance in Safety-Critical Applications

    Get PDF
    This work studies the reliability of embedded systems with approximate computing on software and hardware designs. It presents approximate computing methods and proposes approximate fault tolerance techniques applied to programmable hardware and embedded software to provide reliability at low computational costs. The objective of this thesis is the development of fault tolerance techniques based on approximate computing and proving that approximate computing can be applied to most safety-critical systems. It starts with an experimental analysis of the reliability of embedded systems used at safety-critical projects. Results show that the reliability of single-core systems, and types of errors they are sensitive to, differ from multicore processing systems. The usage of an operating system and two different parallel programming APIs are also evaluated. Fault injection experiment results show that embedded Linux has a critical impact on the system’s reliability and the types of errors to which it is most sensitive. Traditional fault tolerance techniques and parallel variants of them are evaluated for their fault-masking capability on multicore systems. The work shows that parallel fault tolerance can indeed not only improve execution time but also fault-masking. Lastly, an approximate parallel fault tolerance technique is proposed, where the system abandons faulty execution tasks. This first approximate computing approach to fault tolerance in parallel processing systems was able to improve the reliability and the fault-masking capability of the techniques, significantly reducing errors that would cause system crashes. Inspired by the conflict between the improvements provided by approximate computing and the safety-critical systems requirements, this work presents an analysis of the applicability of approximate computing techniques on critical systems. The proposed techniques are tested under simulation, emulation, and laser fault injection experiments. Results show that approximate computing algorithms do have a particular behavior, different from traditional algorithms. The approximation techniques presented and proposed in this work are also used to develop fault tolerance techniques. Results show that those new approximate fault tolerance techniques are less costly than traditional ones and able to achieve almost the same level of error masking.Este trabalho estuda a confiabilidade de sistemas embarcados com computação aproximada em software e projetos de hardware. Ele apresenta métodos de computação aproximada e técnicas aproximadas para tolerância a falhas em hardware programável e software embarcado que provêem alta confiabilidade a baixos custos computacionais. O objetivo desta tese é o desenvolvimento de técnicas de tolerância a falhas baseadas em computação aproximada e provar que este paradigma pode ser usado em sistemas críticos. O texto começa com uma análise da confiabilidade de sistemas embarcados usados em sistemas de tolerância crítica. Os resultados mostram que a resiliência de sistemas singlecore, e os tipos de erros aos quais eles são mais sensíveis, é diferente dos multi-core. O uso de sistemas operacionais também é analisado, assim como duas APIs de programação paralela. Experimentos de injeção de falhas mostram que o uso de Linux embarcado tem um forte impacto na confiabilidade do sistema. Técnicas tradicionais de tolerância a falhas e variações paralelas das mesmas são avaliadas. O trabalho mostra que técnicas de tolerância a falhas paralelas podem de fato melhorar não apenas o tempo de execução da aplicação, mas também seu mascaramento de erros. Por fim, uma técnica de tolerância a falhas paralela aproximada é proposta, onde o sistema abandona instâncias de execuções que apresentam falhas. Esta primeira experiência com computação aproximada foi capaz de melhorar a confiabilidade das técnicas previamente apresentadas, reduzindo significativamente a ocorrência de erros que provocam um crash total do sistema. Inspirado pelo conflito entre as melhorias trazidas pela computação aproximada e os requisitos dos sistemas críticos, este trabalho apresenta uma análise da aplicabilidade de computação aproximada nestes sistemas. As técnicas propostas são testadas sob experimentos de injeção de falhas por simulação, emulação e laser. Os resultados destes experimentos mostram que algoritmos aproximados possuem um comportamento particular que lhes é inerente, diferente dos tradicionais. As técnicas de aproximação apresentadas e propostas no trabalho são também utilizadas para o desenvolvimento de técnicas de tolerância a falhas aproximadas. Estas novas técnicas possuem um custo menor que as tradicionais e são capazes de atingir o mesmo nível de mascaramento de erros

    Integration and validation of embedded flight software on space-qualified multicore architectures

    Get PDF
    In the recent decades, the importance of software on space missions has notably increased, reflecting the need to integrate advanced on-board functionalities. With multicore processors being lately introduced to host critical high-performance applications, the complexity to validate software has significantly raised with respect to single core architectures. While there has been a big step forward in avionics after the publication of the CAST-32A paper, the ECSS-E-ST-40C software engineering standard used by the European Space Agency (ESA) is still not providing validation support for multicore processors. Hence, it is expected that standardising guidelines to develop software on such platforms will become a recurring topic in the industry to match the demands of future space exploration missions

    High-Level Analysis of the Impact of Soft-Faults in Cyberphysical Systems

    Get PDF
    As digital systems grow in complexity and are used in a broader variety of safety-critical applications, there is an ever-increasing demand for assessing the dependability and safety of such systems, especially when subjected to hazardous environments. As a result, it is important to identify and correct any functional abnormalities and component faults as early as possible in order to minimize performance degradation and to avoid potential perilous situations. Existing techniques often lack the capacity to perform a comprehensive and exhaustive analysis on complex redundant architectures, leading to less than optimal risk evaluation. Hence, an early analysis of dependability of such safety-critical applications enables designers to develop systems that meets high dependability requirements. Existing techniques in the field often lack the capacity to perform full system analyses due to state-explosion limitations (such as transistor and gate-level analyses), or due to the time and monetary costs attached to them (such as simulation, emulation, and physical testing). In this work we develop a system-level methodology to model and analyze the effects of Single Event Upsets (SEUs) in cyberphysical system designs. The proposed methodology investigates the impacts of SEUs in the entire system model (fault tree level), including SEU propagation paths, logical masking of errors, vulnerability to specific events, and critical nodes. The methodology also provides insights on a system's weaknesses, such as the impact of each component to the system's vulnerability, as well as hidden sources of failure, such as latent faults. Moreover, the proposed methodology is able to identify and categorize the system's components in order of criticality, and to evaluate different approaches to the mitigation of such criticality (in the form of different configurations of TMR) in order to obtain the most efficient mitigation solution available. The proposed methodology is also able to model and analyze system components individually (system component level), in order to more accurately estimate the component's vulnerability to SEUs. In this case, a more refined analysis of the component is conducted, which enables us to identify the source of the component's criticality. Thereafter, a second mitigation mechanic (internal to the component) takes place, in order to evaluate the gains and costs of applying different configurations of TMR to the component internally. Finally, our approach will draw a comparison between the results obtained at both levels of analysis in order to evaluate the most efficient way of improving the targeted system design
    corecore