1,132 research outputs found
What does fault tolerant Deep Learning need from MPI?
Deep Learning (DL) algorithms have become the de facto Machine Learning (ML)
algorithm for large scale data analysis. DL algorithms are computationally
expensive - even distributed DL implementations which use MPI require days of
training (model learning) time on commonly studied datasets. Long running DL
applications become susceptible to faults - requiring development of a fault
tolerant system infrastructure, in addition to fault tolerant DL algorithms.
This raises an important question: What is needed from MPI for designing
fault tolerant DL implementations? In this paper, we address this problem for
permanent faults. We motivate the need for a fault tolerant MPI specification
by an in-depth consideration of recent innovations in DL algorithms and their
properties, which drive the need for specific fault tolerance features. We
present an in-depth discussion on the suitability of different parallelism
types (model, data and hybrid); a need (or lack thereof) for check-pointing of
any critical data structures; and most importantly, consideration for several
fault tolerance proposals (user-level fault mitigation (ULFM), Reinit) in MPI
and their applicability to fault tolerant DL implementations. We leverage a
distributed memory implementation of Caffe, currently available under the
Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches
by extending MaTEx-Caffe to use a ULFM-based implementation. Our evaluation
using the ImageNet dataset and the AlexNet and GoogLeNet neural network topologies
demonstrates the effectiveness of the proposed fault tolerant DL implementation
using OpenMPI-based ULFM.
On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation
Machine Learning (ML) is making a strong resurgence in tune with the massive
generation of unstructured data which in turn requires massive computational
resources. Due to the inherently compute- and power-intensive structure of
Neural Networks (NNs), hardware accelerators emerge as a promising solution.
However, with technology node scaling below 10nm, hardware accelerators become
more susceptible to faults, which in turn can impact the NN accuracy. In this
paper, we study the resilience aspects of Register-Transfer Level (RTL) model
of NN accelerators, in particular, fault characterization and mitigation. By
following a High-Level Synthesis (HLS) approach, first, we characterize the
vulnerability of various components of RTL NN. We observed that the severity of
faults depends on both i) application-level specifications, i.e., NN data
(inputs, weights, or intermediate), NN layers, and NN activation functions, and
ii) architectural-level specifications, i.e., data representation model and the
parallelism degree of the underlying accelerator. Second, motivated by
characterization results, we present a low-overhead fault mitigation technique
that can efficiently correct bit flips, performing 47.3% better than
state-of-the-art methods.
Comment: 8 pages, 6 figures
Simulation, Application, and Resilience of an Organic Neuromorphic Architecture, Made with Organic Bistable Devices and Organic Field Effect Transistors
This thesis presents simulation work on a type of organic neuromorphic architecture, modeled after the Artificial Neural Network and termed the Synthetic Neural Network, or SNN. The first major contribution of this thesis is the development of a single-transistor-single-organic-bistable-device-per-input circuit that approximates the behavior of an artificial neuron. The efficacy of this design is validated by comparing the behavior of a single synthetic neuron to that of an artificial neuron, as well as through two examples involving a network of synthetic neurons. The analysis utilizes the electrical characteristics of polymer electronic elements, namely the Organic Bistable Device and the Organic Field Effect Transistor, created in the laboratory at the University of Denver. Polymer electronics is a new branch of electronics based on conductive and semi-conductive polymers. These new elements hold a great advantage over inorganic electronics in the form of physical flexibility and low cost of fabrication. However, the variability between individual devices is also much greater. Therefore, the second major contribution of this thesis is the analysis of the resilience of neural networks subjected to physical damage and other manufacturing faults.
SNIFF: Reverse Engineering of Neural Networks with Fault Attacks
Neural networks have been shown to be vulnerable to fault injection
attacks. These attacks change the physical behavior of the device during the
computation, resulting in a change of value that is currently being computed.
They can be realized by various fault injection techniques, ranging from
clock/voltage glitching to application of lasers to rowhammer. In this paper we
explore the possibility of reverse engineering neural networks using
fault attacks. SNIFF stands for sign bit flip fault, which enables the reverse
engineering by changing the sign of intermediate values. We develop the first
exact extraction method on deep-layer feature extractor networks that provably
allows the recovery of the model parameters. Our experiments with the Keras library
show that the precision error for the parameter recovery for the tested
networks is less than with the usage of 64-bit floats, which
improves the current state of the art by 6 orders of magnitude. Additionally,
we discuss the protection techniques against fault injection attacks that can
be applied to enhance the fault resistance.
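The sign bit flip fault named in the abstract above can be illustrated with a minimal sketch. This is not the SNIFF extraction method itself; it only shows what flipping the sign bit of a 64-bit float does to an intermediate value. The helper `flip_bit`, the ReLU activation, and the example value are assumptions chosen for illustration.

```python
import struct

SIGN_BIT = 63  # position of the sign bit in an IEEE 754 binary64 value


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its 64-bit float encoding flipped."""
    if not 0 <= bit < 64:
        raise ValueError("bit must be in the range 0..63")
    # Reinterpret the float as an unsigned 64-bit integer.
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    # Flip the requested bit and reinterpret the result as a float.
    (faulty,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return faulty


def relu(x: float) -> float:
    """Hypothetical activation function used only for this illustration."""
    return max(0.0, x)


if __name__ == "__main__":
    pre_activation = -0.75  # hypothetical intermediate value
    print("fault-free output:", relu(pre_activation))
    print("faulty output:    ", relu(flip_bit(pre_activation, SIGN_BIT)))
```

In this sketch a negative pre-activation that ReLU would normally suppress passes through unchanged in magnitude once its sign is flipped, which is the kind of observable change a sign-flip fault produces.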
Fault Tolerance of Self Organizing Maps
As the quest for performance confronts resource constraints, major breakthroughs in computing efficiency are expected to benefit from unconventional approaches and new models of computation such as brain-inspired computing. Beyond energy, the growing number of defects in physical substrates is becoming another major constraint that affects the design of computing devices and systems. Neural computing principles remain elusive, yet they are considered the source of a promising paradigm to achieve fault-tolerant computation. Since the quest for fault tolerance can be translated into scalable and reliable computing systems, hardware design itself and the potential use of faulty circuits have further motivated the investigation of neural networks, which are potentially capable of absorbing some degrees of vulnerability based on their natural properties. In this paper, the fault tolerance properties of Self Organizing Maps (SOMs) are investigated. To assess the intrinsic fault tolerance, and considering a general fully parallel digital implementation of SOM, we use the bit-flip fault model to inject faults in registers holding SOM weights. The distortion measure is used to evaluate performance on synthetic datasets and under different fault ratios. Additionally, we evaluate three passive techniques intended to enhance fault tolerance of SOM during training/learning under different scenarios.
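The bit-flip fault model described in the abstract above can be sketched as follows. This is a simplified illustration, not the authors' experimental setup: the 16-bit fixed-point register format, the 4x4 grid of weights, the uniform synthetic samples, and the use of plain quantization error (mean squared distance to the best-matching unit) in place of the full distortion measure are all assumptions.

```python
import random

WORD_BITS = 16  # assumed register width
FRAC_BITS = 12  # assumed number of fractional bits (fixed point)


def to_fixed(x):
    """Encode a real value as a two's-complement fixed-point register word."""
    return int(round(x * (1 << FRAC_BITS))) & ((1 << WORD_BITS) - 1)


def from_fixed(word):
    """Decode a two's-complement fixed-point register word to a float."""
    if word >= 1 << (WORD_BITS - 1):
        word -= 1 << WORD_BITS
    return word / (1 << FRAC_BITS)


def inject_bit_flips(registers, fault_ratio, rng):
    """Flip every register bit independently with probability fault_ratio."""
    faulty = []
    for word in registers:
        for bit in range(WORD_BITS):
            if rng.random() < fault_ratio:
                word ^= 1 << bit
        faulty.append(word)
    return faulty


def quantization_error(samples, weights):
    """Mean squared distance from each sample to its best-matching unit."""
    total = 0.0
    for x in samples:
        total += min(
            sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in weights
        )
    return total / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 4x4 map whose weights cover the unit square evenly.
    weights = [(i / 3, j / 3) for i in range(4) for j in range(4)]
    samples = [(rng.random(), rng.random()) for _ in range(200)]
    registers = [to_fixed(c) for w in weights for c in w]
    for ratio in (0.0, 0.01, 0.1):
        values = [from_fixed(r) for r in inject_bit_flips(registers, ratio, rng)]
        faulty_weights = list(zip(values[0::2], values[1::2]))
        print(ratio, quantization_error(samples, faulty_weights))
```

Sweeping `fault_ratio` and comparing the resulting error against the fault-free value gives a rough picture of how much degradation the map absorbs at each fault level.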
Evaluation of Different Fault Diagnosis Methods and Their Applications in Vehicle Systems
A high level of automation in vehicles is accompanied by a variety of sensors and actuators, whose malfunctions must be handled with caution because they might cause serious driving safety hazards. Therefore, a robust and highly accurate fault detection and diagnosis system to monitor the operational states of vehicle systems is an indispensable prerequisite. In the area of fault diagnosis, numerous techniques have been studied, and each one has pros and cons. Selecting the best approach based on the requirements or usage scenarios will save much needless work. In this article, the authors examine some of the most common fault diagnosis methods for their applicability to automated vehicle systems: the traditional binary logic method, the fuzzy logic method, the fuzzy neural method, and two neural network methods (the feedforward neural network and the convolutional neural network). For each approach, the diagnosis algorithms for vehicle systems were modeled differently. The analysis of the detection capabilities and the suitable application scenarios of each fault diagnosis approach for vehicle systems, as well as recommendations for selecting different methods for various diagnosis needs, are also provided. In the future, this can serve as an effective guide for the selection of a suitable fault diagnosis approach based on the application scenarios for vehicle systems.