
    Designing Fault-Injection Experiments for the Reliability of Embedded Systems

    This paper considers the long-standing problem of conducting fault-injection experiments to establish the ultra-reliability of embedded systems. There have been extensive efforts in fault injection, and this paper offers a partial summary of them, but these previous efforts have focused on realism and efficiency. Fault injection has been used to examine diagnostics and to test algorithms, but the literature does not contain any framework that says how to conduct fault-injection experiments to establish ultra-reliability. A solution to this problem integrates field data, arguments from design, and fault injection into a seamless whole. The solution in this paper is to derive a model reduction theorem for a class of semi-Markov models suitable for describing ultra-reliable embedded systems. The derivation shows that a tight upper bound on the probability of system failure can be obtained using only the means of system-recovery times, thus reducing the experimental effort to estimating a reasonable number of easily observed parameters. The paper includes an example of a system subject to both permanent and transient faults. There is a discussion of integrating fault injection with field data and arguments from design.

    Functional Error Correction for Robust Neural Networks

    When neural networks (NeuralNets) are implemented in hardware, their weights need to be stored in memory devices. As noise accumulates in the stored weights, the NeuralNet’s performance will degrade. This paper studies how to use error correcting codes (ECCs) to protect the weights. Unlike classic error correction in data storage, the objective is to optimize the NeuralNet’s performance after error correction rather than to minimize the uncorrectable bit error rate in the protected bits. That is, by seeing the NeuralNet as a function of its input, the error correction scheme is function-oriented. A main challenge is that a deep NeuralNet often has millions to hundreds of millions of weights, causing a large redundancy overhead for ECCs, and the relationship between the weights and the NeuralNet’s performance can be highly complex. To address the challenge, we propose a Selective Protection (SP) scheme, which chooses only a subset of important bits for ECC protection. To find such bits and achieve an optimized tradeoff between the ECC’s redundancy and the NeuralNet’s performance, we present an algorithm based on deep reinforcement learning. Experimental results verify that, compared to the natural baseline scheme, the proposed algorithm achieves substantially better performance for the functional error correction task.
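
    The idea of protecting only a subset of bits can be illustrated with a minimal sketch. It is not the paper's deep-reinforcement-learning algorithm: it assumes 8-bit integer weights, independent bit flips, and an idealized ECC that restores every protected bit perfectly, and it uses the simple heuristic of protecting the most significant bits. The names `BITS`, `inject_bit_flips`, and `protected_msbs` are hypothetical and chosen for illustration only.

```python
import random

BITS = 8  # assumed width of each stored weight (illustrative, not from the paper)


def inject_bit_flips(weights, flip_prob, protected_msbs=0, rng=None):
    """Flip each unprotected bit independently with probability flip_prob.

    The top `protected_msbs` bits of every weight are never flipped, which
    stands in for an idealized ECC that corrects all errors in those bits.
    """
    if rng is None:
        rng = random.Random(0)
    unprotected_bits = BITS - protected_msbs
    noisy = []
    for w in weights:
        mask = 0
        # Only the low-order (unprotected) bit positions can be corrupted.
        for bit in range(unprotected_bits):
            if rng.random() < flip_prob:
                mask |= 1 << bit
        noisy.append(w ^ mask)
    return noisy


def max_abs_error(original, noisy):
    """Largest absolute deviation between stored and corrupted weights."""
    return max(abs(a - b) for a, b in zip(original, noisy))


if __name__ == "__main__":
    source = random.Random(1)
    weights = [source.randrange(2 ** BITS) for _ in range(1000)]
    for k in (0, 2, 4):
        noisy = inject_bit_flips(weights, 0.05, protected_msbs=k,
                                 rng=random.Random(2))
        print(k, "protected MSBs, max error:", max_abs_error(weights, noisy))
```

    With the top k bits protected, the error in any single weight is bounded by 2^(8-k) - 1, so a small amount of redundancy spent on high-order bits caps the worst-case deviation. The paper's scheme instead learns which bits matter for the network's output, which this heuristic does not attempt.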