    On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation

    Machine Learning (ML) is making a strong resurgence in tune with the massive generation of unstructured data, which in turn requires massive computational resources. Due to the inherently compute- and power-intensive structure of Neural Networks (NNs), hardware accelerators emerge as a promising solution. However, with technology nodes scaling below 10nm, hardware accelerators become more susceptible to faults, which in turn can impact NN accuracy. In this paper, we study the resilience aspects of Register-Transfer Level (RTL) models of NN accelerators, in particular fault characterization and mitigation. Following a High-Level Synthesis (HLS) approach, we first characterize the vulnerability of the various components of an RTL NN. We observe that the severity of faults depends on both i) application-level specifications, i.e., NN data (inputs, weights, or intermediate values), NN layers, and NN activation functions, and ii) architectural-level specifications, i.e., the data representation model and the parallelism degree of the underlying accelerator. Second, motivated by the characterization results, we present a low-overhead fault mitigation technique that can efficiently correct bit flips, performing 47.3% better than state-of-the-art methods.
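    The bit-flip fault model mentioned in the abstract can be made concrete with a short sketch. Below is a minimal NumPy illustration of flipping a single random bit in a fixed-point weight array; the Q8.8 word format, the function name, and the uniform choice of victim bit are assumptions made here for illustration, not details taken from the paper, which works at RTL rather than in software.

    ```python
    import numpy as np

    def inject_bit_flip(weights, word_bits=16, frac_bits=8, rng=None):
        # Quantize to two's-complement fixed point, flip one random bit,
        # then dequantize. Hypothetical helper emulating an RTL register fault.
        rng = rng or np.random.default_rng()
        scale = 1 << frac_bits
        lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
        q = np.clip(np.rint(weights * scale), lo, hi).astype(np.int64)

        flat = q.ravel()                         # view: writes through to q
        idx = int(rng.integers(flat.size))       # victim weight register
        bit = int(rng.integers(word_bits))       # victim bit position

        word = int(flat[idx]) & ((1 << word_bits) - 1)   # raw register contents
        word ^= 1 << bit                                  # the injected fault
        if word >= 1 << (word_bits - 1):                  # reinterpret as signed
            word -= 1 << word_bits
        flat[idx] = word
        return q.astype(np.float64) / scale               # faulty weight array

    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.5, size=(4, 4))
    w_faulty = inject_bit_flip(w, rng=rng)
    ```

    Note that flipping a high-order (sign or integer) bit perturbs the affected weight far more than flipping a low-order fraction bit, which is one reason the data representation model matters for fault severity.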

    A Matlab Tool for Analyzing and Improving Fault Tolerance of Artificial Neural Networks

    FTSET is a software tool that deals with the fault tolerance of Artificial Neural Networks. The tool can evaluate the fault tolerance of a previously trained Artificial Neural Network given its input ranges, weights, and architecture. FTSET can also improve fault tolerance by splitting the connections of the network that contribute most to the output. This technique improves fault tolerance without changing the network's output. The paper concludes with two examples that show the application of FTSET to different Artificial Neural Networks and the fault-tolerance improvement obtained.
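    The splitting idea can be illustrated with a small sketch. The code below splits the largest-magnitude weights of a dense layer into k parallel sub-connections of weight w/k: the weighted sum, and hence the output, is unchanged, while a single faulty sub-connection now perturbs only 1/k of the original contribution. The magnitude-based importance criterion, the function names, and the layer shape are assumptions for illustration; FTSET's actual selection rule may differ.

    ```python
    import numpy as np

    def split_critical_weights(W, k=3, top_frac=0.1):
        # Return a (out, in, k) tensor where the top_frac largest-magnitude
        # connections are split into k sub-connections summing to the original
        # weight; all other connections keep one sub-connection.
        out_dim, in_dim = W.shape
        sub = np.zeros((out_dim, in_dim, k))
        sub[:, :, 0] = W                          # default: single sub-connection
        n_split = max(1, int(top_frac * W.size))
        for f in np.argsort(np.abs(W), axis=None)[-n_split:]:
            i, j = np.unravel_index(f, W.shape)
            sub[i, j, :] = W[i, j] / k            # split w into k equal parts
        return sub

    def forward(sub, x):
        # Dense layer over split connections: y = sum over sub-weights * input.
        return np.tensordot(sub, x, axes=([1], [0])).sum(axis=-1)

    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 5))
    x = rng.normal(size=5)
    sub = split_critical_weights(W)
    assert np.allclose(forward(sub, x), W @ x)    # output unchanged by splitting
    ```

    The final assertion is the key property the abstract describes: the split preserves the network's output exactly while diluting the impact of any single faulty connection.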

    Fault Tolerance of Self Organizing Maps

    As the quest for performance confronts resource constraints, major breakthroughs in computing efficiency are expected to benefit from unconventional approaches and new models of computation such as brain-inspired computing. Beyond energy, the growing number of defects in physical substrates is becoming another major constraint that affects the design of computing devices and systems. Neural computing principles remain elusive, yet they are considered a promising paradigm for achieving fault-tolerant computation. Since fault tolerance translates into scalable and reliable computing systems, hardware design itself and the potential use of faulty circuits have further motivated the investigation of neural networks, which are potentially capable of absorbing some degree of vulnerability thanks to their natural properties. In this paper, the fault tolerance properties of Self-Organizing Maps (SOMs) are investigated. To assess the intrinsic fault tolerance, and considering a general fully parallel digital implementation of SOMs, we use the bit-flip fault model to inject faults into the registers holding the SOM weights. The distortion measure is used to evaluate performance on synthetic datasets under different fault ratios. Additionally, we evaluate three passive techniques intended to enhance the fault tolerance of SOMs during training/learning under different scenarios.
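    As a rough sketch of this evaluation loop, the code below injects a given ratio of bit flips into a fixed-point SOM codebook and computes a distortion measure, taken here as the mean squared distance from each sample to its best-matching unit. The Q8.8 word format, the function names, and this exact distortion formula are assumptions for illustration; the paper's register layout and measure may differ.

    ```python
    import numpy as np

    def distortion(codebook, data):
        # Mean squared distance from each sample to its best-matching unit.
        d = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        return d.min(axis=1).mean()

    def flip_codebook_bits(codebook, fault_ratio, word_bits=16, frac_bits=8,
                           rng=None):
        # Flip a fault_ratio fraction of all bits in the weight registers.
        rng = rng or np.random.default_rng()
        scale = 1 << frac_bits
        mask = (1 << word_bits) - 1
        q = np.rint(codebook * scale).astype(np.int64) & mask  # raw registers

        flat = q.ravel()                                       # view into q
        n_bits = flat.size * word_bits
        victims = rng.choice(n_bits, size=int(fault_ratio * n_bits),
                             replace=False)
        for v in victims:
            flat[v // word_bits] ^= 1 << (v % word_bits)       # inject fault

        signed = np.where(flat >= 1 << (word_bits - 1),
                          flat - (1 << word_bits), flat)       # back to signed
        return signed.reshape(codebook.shape).astype(np.float64) / scale

    rng = np.random.default_rng(1)
    data = rng.random((500, 2))                 # synthetic 2-D dataset
    codebook = rng.random((8 * 8, 2))           # 8x8 map, flattened
    faulty = flip_codebook_bits(codebook, fault_ratio=0.01, rng=rng)
    print(distortion(codebook, data), distortion(faulty, data))
    ```

    Sweeping fault_ratio and comparing the two distortion values mirrors the kind of experiment the abstract describes for quantifying the intrinsic fault tolerance of the map.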