What does fault tolerant Deep Learning need from MPI?
Deep Learning (DL) algorithms have become the de facto Machine Learning (ML)
algorithm for large scale data analysis. DL algorithms are computationally
expensive - even distributed DL implementations which use MPI require days of
training (model learning) time on commonly studied datasets. Long running DL
applications become susceptible to faults - requiring development of a fault
tolerant system infrastructure, in addition to fault tolerant DL algorithms.
This raises an important question: What is needed from MPI for designing
fault tolerant DL implementations? In this paper, we address this problem for
permanent faults. We motivate the need for a fault tolerant MPI specification
by an in-depth consideration of recent innovations in DL algorithms and their
properties, which drive the need for specific fault tolerance features. We
present an in-depth discussion on the suitability of different parallelism
types (model, data and hybrid); a need (or lack thereof) for check-pointing of
any critical data structures; and most importantly, consideration for several
fault tolerance proposals (user-level fault mitigation (ULFM), Reinit) in MPI
and their applicability to fault tolerant DL implementations. We leverage a
distributed memory implementation of Caffe, currently available under the
Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches
by extending MaTEx-Caffe to use a ULFM-based implementation. Our evaluation
using the ImageNet dataset and the AlexNet and GoogLeNet neural network topologies
demonstrates the effectiveness of the proposed fault tolerant DL implementation
using OpenMPI-based ULFM.
Artificial neural networks in geospatial analysis
Artificial neural networks are computational models widely used in geospatial analysis for data classification, change detection, clustering, function approximation, and forecasting or prediction. There are many types of neural networks based on learning paradigm and network architectures. Their use is expected to grow with increasing availability of massive data from remote sensing and mobile platforms.
Quantum gate learning in engineered qubit networks: Toffoli gate with always-on interactions
We put forward a strategy to encode a quantum operation into the unmodulated
dynamics of a quantum network without the need of external control pulses,
measurements or active feedback. Our optimization scheme, inspired by
supervised machine learning, consists in engineering the pairwise couplings
between the network qubits so that the target quantum operation is encoded in
the natural reduced dynamics of a network section. The efficacy of the proposed
scheme is demonstrated by the finding of uncontrolled four-qubit networks that
implement either the Toffoli gate, the Fredkin gate, or remote logic
operations. The proposed Toffoli gate is stable against imperfections, has
high fidelity for fault tolerant quantum computation, and is fast, being based
on the non-equilibrium dynamics. Comment: 8 pages, 3 figures
A Study of Deep Learning Robustness Against Computation Failures
For many types of integrated circuits, accepting larger failure rates in
computations can be used to improve energy efficiency. We study the performance
of faulty implementations of certain deep neural networks based on pessimistic
and optimistic models of the effect of hardware faults. After identifying the
impact of hyperparameters such as the number of layers on robustness, we study
the ability of the network to compensate for computational failures through an
increase of the network size. We show that some networks can achieve equivalent
performance under faulty implementations, and quantify the required increase in
computational complexity.
Investigation of Air Transportation Technology at Princeton University, 1989-1990
The Air Transportation Technology Program at Princeton University proceeded along six avenues during the past year: microburst hazards to aircraft; machine-intelligent, fault tolerant flight control; computer aided heuristics for piloted flight; stochastic robustness for flight control systems; neural networks for flight control; and computer aided control system design. These topics are briefly discussed, and an annotated bibliography of publications that appeared between January 1989 and June 1990 is given.
Analysis of fault-tolerant neurocontrol architectures
The fault-tolerance of analog parallel distributed implementations of a multivariable aircraft neurocontroller is analyzed by simulating weight and neuron failures in a simplified scheme of analog processing based on the functional architecture of the ETANN chip (Electrically Trainable Artificial Neural Network). The neural information processing is found to be only partially distributed throughout the set of weights of the neurocontroller synthesized with the backpropagation algorithm. Although the degree of distribution of the neural processing, and consequently the fault-tolerance of the neurocontroller, could be enhanced using Locally Distributed Weight and Neuron Approaches, a satisfactory level of fault-tolerance could only be obtained by retraining the degraded VLSI neurocontroller. The possibility of maintaining neurocontrol performance and stability in the presence of single weight or neuron failures was demonstrated through an automated retraining procedure of the neurocontroller based on a pre-programmed choice and sequence of the training parameters.
Neural networks for aircraft control
Current research in Artificial Neural Networks indicates that networks offer some potential advantages in adaptation and fault tolerance. This research is directed at determining the possible applicability of neural networks to aircraft control. The first application will be to aircraft trim. Neural network node characteristics, network topology and operation, neural network learning and example histories using neighboring optimal control with a neural net are discussed.