
    Reclaiming Fault Resilience and Energy Efficiency With Enhanced Performance in Low Power Architectures

    Rapid development of the AI domain has revolutionized the computing industry through the introduction of state-of-the-art AI architectures. This growth is also accompanied by a massive increase in power consumption. Near-Threshold Computing (NTC) has emerged as a viable solution, offering significant savings in power consumption and paving the way for an energy-efficient design paradigm. However, these benefits are accompanied by a deterioration in performance due to severe process variation and slower transistor switching at Near-Threshold operation. These problems severely restrict the usage of Near-Threshold operation in commercial applications. In this work, a novel AI architecture, the Tensor Processing Unit, operating at NTC is thoroughly investigated to tackle the issues hindering system performance. Research problems are demonstrated in a scientific manner, and unique opportunities are explored to propose novel design methodologies.

    Impact of Structural Faults on Neural Network Performance

    Deep Learning (DL), a subset of Artificial Intelligence (AI), is growing rapidly, with possible applications in different domains such as speech recognition and computer vision. The Deep Neural Network (DNN), the backbone of DL algorithms, is a directed graph containing multiple layers with a different number of neurons residing in each layer. The use of these networks has increased in the last few years due to the availability of large data sets and huge computational power. As the size of DNNs has grown over the years, researchers have developed specialized hardware accelerators to reduce inference compute time. An example of such a domain-specific architecture designed for Neural Network acceleration is the Tensor Processing Unit (TPU), which outperforms GPUs in the inference stage of DNN execution. The heart of this inference engine is a Matrix Multiplication unit based on a systolic array architecture. The TPU's systolic array is a grid-like structure made of individual processing elements that can be extended along rows and columns. Due to external environmental factors or internal semiconductor scaling, these systems are often prone to faults, which lead to improper calculations and thereby to inaccurate decisions by the DNN. Although a lot of work has been done in the past on computing-array implementations and their reliability concerns, their fault-tolerance behavior for DNN applications is not very well understood. It is not even clear what the impact of various faults on accuracy would be. In this work, we first study possible mapping strategies to implement convolution and dense layer weights on the TPU systolic array. Next, we consider various fault scenarios that may occur in the array. We divide these fault scenarios into low and high row and column fault modes with respect to the multiplication unit (Fig. 1(a) pictorially represents column faults). Next, we study the impact of these fault models on the overall accuracy of DNN performance on a faulty TPU unit. The goal is to study the resiliency and overcome the limitations of earlier work. The previous work, which used pruning of weights (removing weights or connections in the DNN) plus retraining to mask faults on the array, was very effective in masking random faults. However, it failed in the case of column faults, as clearly shown in Fig. 1(b). We also propose techniques to mitigate or bypass the row and column faults. Our mapping strategy follows physical_x(i) = i%N and physical_y(j) = j%N, where (i,j) represents the index of the dense (FC) weight matrix and (physical_x(i), physical_y(j)) indicates the actual physical location on the array of size N; a minimal sketch of this policy is given below. The convolution filters are linearized with respect to every channel so as to convert them into a proper weight matrix, and are then mapped according to the previously mentioned policy. It was shown that DNNs can tolerate a certain number of faults in the array while retaining the original accuracy (low row faults). The accuracy of the network decreases even with a single column fault if that column is in use. The results show that, for the same number of row and column faults, the latter has the greater impact on network accuracy because pruning an input neuron has far less effect than pruning an output neuron. We experimented with three different networks and found the influence of these different faults to be the same.
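A minimal sketch of the mapping policy described above, assuming an N x N systolic array and NumPy weight tensors (the function names and the filter layout are illustrative, not the thesis code):

    import numpy as np

    def map_dense_weights(W, N):
        # Map a dense (FC) weight matrix onto an N x N systolic array using
        # physical_x(i) = i % N and physical_y(j) = j % N.
        placement = {}  # (physical_x, physical_y) -> [((i, j), weight), ...]
        rows, cols = W.shape
        for i in range(rows):
            for j in range(cols):
                placement.setdefault((i % N, j % N), []).append(((i, j), W[i, j]))
        return placement

    def linearize_conv_filters(K):
        # Linearize convolution filters with respect to every channel so that each
        # output filter becomes one row of a proper weight matrix, which can then
        # be mapped with map_dense_weights.
        out_channels = K.shape[0]          # K: (out_channels, in_channels, kh, kw)
        return K.reshape(out_channels, -1)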
These faults can be mitigated using techniques like Matrix Transpose and Array Reduction, which do not require retraining of the weights. For low row faults, the original mapping policy can be retained so that weights are mapped at their exact locations, which does not affect accuracy. Low column faults can be converted into low row faults by transposing the matrix. In the case of high row (column) faults, the entire row (column) has to be avoided to completely bypass the faulty locations. Static mapping of weights along with retraining the network on the array can be effective in the case of random faults. Adapting to change in the case of structured faults can reduce the burden of retraining, which happens outside the TPU.
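The retraining-free mitigations can be sketched as follows; this is an illustration under our own assumptions (a known fault map and an arbitrary low/high cutoff), not the exact procedure evaluated in the work:

    def mitigate_structural_faults(W, faulty_rows, faulty_cols, N, low=2):
        # Choose a retraining-free mitigation for an N x N systolic array;
        # 'low' is an illustrative cutoff between low and high fault counts.
        if not faulty_cols and len(faulty_rows) <= low:
            # Low row faults: keep the original mapping, accuracy is barely affected.
            return W, "original mapping"
        if not faulty_rows and len(faulty_cols) <= low:
            # Matrix Transpose: low column faults become low row faults.
            return W.T, "matrix transpose"
        # High row/column faults: Array Reduction bypasses the faulty lines entirely
        # and maps only onto the remaining fault-free rows and columns.
        usable_rows = [r for r in range(N) if r not in faulty_rows]
        usable_cols = [c for c in range(N) if c not in faulty_cols]
        return W, "array reduction to a %dx%d sub-array" % (len(usable_rows), len(usable_cols))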

    Review of Fault Mitigation Approaches for Deep Neural Networks for Computer Vision in Autonomous Driving

    The aim of this work is to identify and present the challenges and risks related to the employment of DNNs in Computer Vision for Autonomous Driving. Nowadays, one of the major technological challenges is choosing the right technology among the abundance that is available on the market. Specifically, this thesis collects a synopsis of the state-of-the-art architectures, techniques, and methodologies adopted for building fault-tolerant hardware and ensuring robustness in DNN-based Computer Vision applications for Autonomous Driving.

    RescueSNN: Enabling Reliable Executions on Spiking Neural Network Accelerators under Permanent Faults

    To maximize the performance and energy efficiency of Spiking Neural Network (SNN) processing on resource-constrained embedded systems, specialized hardware accelerators/chips are employed. However, these SNN chips may suffer from permanent faults which can affect the functionality of weight memory and neuron behavior, thereby causing potentially significant accuracy degradation and system malfunctioning. Such permanent faults may come from manufacturing defects during the fabrication process, and/or from device/transistor damage (e.g., due to wear-out) during run-time operation. However, the impact of permanent faults in SNN chips and the respective mitigation techniques have not been thoroughly investigated yet. Toward this, we propose RescueSNN, a novel methodology to mitigate permanent faults in the compute engine of SNN chips without requiring additional retraining, thereby significantly cutting down the design time and retraining costs, while maintaining the throughput and quality. The key ideas of our RescueSNN methodology are (1) analyzing the characteristics of SNNs under permanent faults; (2) leveraging this analysis to improve SNN fault tolerance through effective fault-aware mapping (FAM); and (3) devising lightweight hardware enhancements to support FAM. Our FAM technique leverages the fault map of the SNN compute engine for (i) minimizing weight corruption when mapping weight bits on faulty memory cells, and (ii) selectively employing faulty neurons that do not cause significant accuracy degradation, to maintain accuracy and throughput while considering the SNN operations and processing dataflow. The experimental results show that our RescueSNN improves accuracy by up to 80% while keeping the throughput reduction below 25% at high fault rates (e.g., 0.5 of the potential fault locations), as compared to running SNNs on the faulty chip without mitigation. Comment: Accepted for publication at Frontiers in Neuroscience, Section Neuromorphic Engineering.
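As a rough illustration of the weight-placement half of fault-aware mapping, one could search for a bit offset that pushes known stuck memory cells onto the least significant bit positions of each stored weight word; this is a hedged sketch under our own assumptions (an 8-bit word and a per-word fault map), not the actual RescueSNN FAM implementation:

    def least_damaging_offset(faulty_bit_positions, word_width=8):
        # Choose a circular bit offset for storing a weight word so that the known
        # faulty (stuck) cells coincide with the least significant logical bits,
        # minimizing the numerical corruption of the stored weight.
        best_offset, best_cost = 0, float("inf")
        for offset in range(word_width):
            cost = sum(2 ** ((pos - offset) % word_width) for pos in faulty_bit_positions)
            if cost < best_cost:
                best_offset, best_cost = offset, cost
        return best_offset

For example, least_damaging_offset([7]) returns 7, rotating the word so the stuck most-significant cell only ever holds the weight's least significant bit.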

    On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation

    Machine Learning (ML) is making a strong resurgence in tune with the massive generation of unstructured data, which in turn requires massive computational resources. Due to the inherently compute- and power-intensive structure of Neural Networks (NNs), hardware accelerators emerge as a promising solution. However, with technology nodes scaling below 10nm, hardware accelerators become more susceptible to faults, which in turn can impact NN accuracy. In this paper, we study the resilience aspects of Register-Transfer Level (RTL) models of NN accelerators, in particular fault characterization and mitigation. By following a High-Level Synthesis (HLS) approach, we first characterize the vulnerability of various components of the RTL NN. We observed that the severity of faults depends on both i) application-level specifications, i.e., NN data (inputs, weights, or intermediate values) and NN layers, and ii) architectural-level specifications, i.e., the data representation model and the parallelism degree of the underlying accelerator. Second, motivated by the characterization results, we present a low-overhead fault mitigation technique that can efficiently correct bit flips, 47.3% better than state-of-the-art methods. We thank Pradip Bose, Alper Buyuktosunoglu, and Augusto Vega from IBM Watson for their contribution to this work. The research leading to these results has received funding from the European Union's Horizon 2020 Programme under the LEGaTO Project (www.legato-project.eu), grant agreement nº 780681.
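The vulnerability characterization can be approximated outside an HLS flow with a simple bit-flip injection experiment on quantized weights; the sketch below is our own illustration (8-bit weights assumed), not the paper's RTL methodology:

    import numpy as np

    def inject_bit_flip(weights_int8, flat_index, bit):
        # Flip one bit in the raw two's-complement encoding of an int8 weight
        # tensor; re-running inference with the corrupted copy and comparing
        # accuracy against the fault-free baseline measures that bit's severity.
        corrupted = weights_int8.copy()
        raw = corrupted.view(np.uint8)     # reinterpret the same bytes as unsigned
        raw.flat[flat_index] ^= np.uint8(1 << bit)
        return corrupted

Sweeping flat_index over a layer's weights and bit over the word width yields a per-layer, per-bit-position sensitivity profile.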