
    What does fault tolerant Deep Learning need from MPI?

    Deep Learning (DL) algorithms have become the de facto Machine Learning (ML) algorithms for large-scale data analysis. DL algorithms are computationally expensive: even distributed DL implementations that use MPI require days of training (model learning) time on commonly studied datasets. Long-running DL applications become susceptible to faults, requiring the development of a fault tolerant system infrastructure in addition to fault tolerant DL algorithms. This raises an important question: What is needed from MPI for designing fault tolerant DL implementations? In this paper, we address this problem for permanent faults. We motivate the need for a fault tolerant MPI specification by an in-depth consideration of recent innovations in DL algorithms and their properties, which drive the need for specific fault tolerance features. We present an in-depth discussion on the suitability of different parallelism types (model, data and hybrid); the need (or lack thereof) for checkpointing of any critical data structures; and, most importantly, a consideration of several fault tolerance proposals in MPI (user-level fault mitigation (ULFM), Reinit) and their applicability to fault tolerant DL implementations. We leverage a distributed memory implementation of Caffe, currently available under the Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches by extending MaTEx-Caffe to use a ULFM-based implementation. Our evaluation using the ImageNet dataset and the AlexNet and GoogLeNet neural network topologies demonstrates the effectiveness of the proposed fault tolerant DL implementation using OpenMPI-based ULFM.

    On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation

    Machine Learning (ML) is making a strong resurgence in tune with the massive generation of unstructured data, which in turn requires massive computational resources. Due to the inherently compute- and power-intensive structure of Neural Networks (NNs), hardware accelerators emerge as a promising solution. However, with technology node scaling below 10nm, hardware accelerators become more susceptible to faults, which in turn can impact the NN accuracy. In this paper, we study the resilience aspects of the Register-Transfer Level (RTL) model of NN accelerators, in particular fault characterization and mitigation. Following a High-Level Synthesis (HLS) approach, we first characterize the vulnerability of various components of the RTL NN. We observe that the severity of faults depends on both i) application-level specifications, i.e., NN data (inputs, weights, or intermediate), NN layers, and NN activation functions, and ii) architectural-level specifications, i.e., the data representation model and the parallelism degree of the underlying accelerator. Second, motivated by the characterization results, we present a low-overhead fault mitigation technique that can efficiently correct bit flips, performing 47.3% better than state-of-the-art methods. Comment: 8 pages, 6 figures.

    Simulation, Application, and Resilience of an Organic Neuromorphic Architecture, Made with Organic Bistable Devices and Organic Field Effect Transistors

    This thesis presents work on simulating a type of organic neuromorphic architecture, modeled after the Artificial Neural Network and termed a Synthetic Neural Network, or SNN. The first major contribution of this thesis is the development of a single-transistor-single-organic-bistable-device-per-input circuit that approximates the behavior of an artificial neuron. The efficacy of this design is validated by comparing the behavior of a single synthetic neuron to that of an artificial neuron, as well as by two examples involving a network of synthetic neurons. The analysis uses the electrical characteristics of polymer electronic elements, namely the Organic Bistable Device and the Organic Field Effect Transistor, created in the laboratory at the University of Denver. Polymer electronics is a new branch of electronics based on conductive and semi-conductive polymers. These new elements hold a great advantage over inorganic electronics in the form of physical flexibility and low cost of fabrication. However, the variability between individual devices is also much greater. Therefore, the second major contribution of this thesis is the analysis of the resilience of neural networks subjected to physical damage and other manufacturing faults.

    SNIFF: Reverse Engineering of Neural Networks with Fault Attacks

    Neural networks have been shown to be vulnerable to fault injection attacks. These attacks change the physical behavior of the device during the computation, resulting in a change of the value that is currently being computed. They can be realized by various fault injection techniques, ranging from clock/voltage glitching to the application of lasers to rowhammer. In this paper we explore the possibility of reverse engineering neural networks with the use of fault attacks. SNIFF stands for sign bit flip fault, which enables the reverse engineering by changing the sign of intermediate values. We develop the first exact extraction method on deep-layer feature extractor networks that provably allows the recovery of the model parameters. Our experiments with the Keras library show that the precision error for the parameter recovery for the tested networks is less than $10^{-13}$ with the use of 64-bit floats, which improves the current state of the art by 6 orders of magnitude. Additionally, we discuss the protection techniques against fault injection attacks that can be applied to enhance the fault resistance.
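A sign bit flip on a 64-bit float can be illustrated with a short Python sketch. This is only a model of the fault itself, not the extraction method of the paper; the helper names `flip_sign_bit` and `relu` and the toy neuron values are assumptions made for illustration.

```python
import struct


def flip_sign_bit(x: float) -> float:
    """Return x with bit 63 (the IEEE 754 sign bit) inverted."""
    # Reinterpret the 64-bit float as an unsigned 64-bit integer.
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    # XOR with a mask that has only the top bit set, then convert back.
    (flipped,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << 63)))
    return flipped


def relu(x: float) -> float:
    """Rectified linear unit, used here as an example activation."""
    return max(0.0, x)


# Toy neuron (hypothetical values): one weight, one input.
weight, value = 0.75, 2.0
pre_activation = weight * value

clean = relu(pre_activation)                   # fault-free output
faulty = relu(flip_sign_bit(pre_activation))   # output after the sign flip
print(clean, faulty)
```

The fault leaves the magnitude of the intermediate value untouched and changes only its sign, so an activation such as ReLU responds very differently to the clean and faulty values.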

    Fault Tolerance of Self Organizing Maps

    As the quest for performance confronts resource constraints, major breakthroughs in computing efficiency are expected to benefit from unconventional approaches and new models of computation such as brain-inspired computing. Beyond energy, the growing number of defects in physical substrates is becoming another major constraint that affects the design of computing devices and systems. Neural computing principles remain elusive, yet they are considered the source of a promising paradigm to achieve fault-tolerant computation. Since the quest for fault tolerance can be translated into scalable and reliable computing systems, hardware design itself and the potential use of faulty circuits have further motivated the investigation of neural networks, which are potentially capable of absorbing some degree of vulnerability based on their natural properties. In this paper, the fault tolerance properties of Self Organizing Maps (SOMs) are investigated. To assess the intrinsic fault tolerance, and considering a general fully parallel digital implementation of SOM, we use the bit-flip fault model to inject faults in registers holding SOM weights. The distortion measure is used to evaluate performance on synthetic datasets and under different fault ratios. Additionally, we evaluate three passive techniques intended to enhance the fault tolerance of SOM during training/learning under different scenarios.
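The bit-flip fault model mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the setup of the paper: weights are assumed to be stored as 16-bit unsigned registers (`WORD_BITS`), the fault ratio is assumed to be an independent per-bit flip probability, and the function names are hypothetical.

```python
import random

WORD_BITS = 16  # assumed register width; the abstract does not specify one


def inject_bit_flips(registers, fault_ratio, rng):
    """Flip each bit of each register independently with probability fault_ratio."""
    faulty = []
    for word in registers:
        for bit in range(WORD_BITS):
            if rng.random() < fault_ratio:
                word ^= 1 << bit  # invert this single bit
        faulty.append(word)
    return faulty


def count_flipped_bits(original, faulty):
    """Number of bit positions that differ between the two register lists."""
    return sum(bin(a ^ b).count("1") for a, b in zip(original, faulty))


# Hypothetical weight registers of a tiny map.
weights = [0x0000, 0x00FF, 0x1234, 0xFFFF]
rng = random.Random(0)  # seeded so that a run is repeatable

corrupted = inject_bit_flips(weights, fault_ratio=0.05, rng=rng)
print(count_flipped_bits(weights, corrupted), "bits flipped")
```

Sweeping `fault_ratio` and recomputing a quality measure on the corrupted weights is one way to reproduce the kind of experiment the abstract describes, with the distortion measure playing that role in the paper.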

    Evaluation of Different Fault Diagnosis Methods and Their Applications in Vehicle Systems

    A high level of automation in vehicles is accompanied by a variety of sensors and actuators, whose malfunctions must be handled with caution because they might cause serious driving safety hazards. Therefore, a robust and highly accurate fault detection and diagnosis system to monitor the operational states of vehicle systems is an indispensable prerequisite. In the area of fault diagnosis, numerous techniques have been studied, and each one has pros and cons. Selecting the best approach based on the requirements or usage scenarios will save much needless work. In this article, the authors examine some of the most common fault diagnosis methods for their applicability to automated vehicle systems: the traditional binary logic method, the fuzzy logic method, the fuzzy neural method, and two neural network methods (the feedforward neural network and the convolutional neural network). For each approach, the diagnosis algorithms for vehicle systems were modeled differently. The analysis of the detection capabilities and the suitable application scenarios of each fault diagnosis approach for vehicle systems, as well as recommendations for selecting different methods for various diagnosis needs, are also provided. In the future, this can serve as an effective guide for selecting a suitable fault diagnosis approach based on the application scenarios for vehicle systems.