What does fault tolerant Deep Learning need from MPI?
Deep Learning (DL) algorithms have become the de facto Machine Learning (ML)
algorithm for large scale data analysis. DL algorithms are computationally
expensive - even distributed DL implementations which use MPI require days of
training (model learning) time on commonly studied datasets. Long running DL
applications become susceptible to faults - requiring development of a fault
tolerant system infrastructure, in addition to fault tolerant DL algorithms.
This raises an important question: What is needed from MPI for designing
fault tolerant DL implementations? In this paper, we address this problem for
permanent faults. We motivate the need for a fault tolerant MPI specification
by an in-depth consideration of recent innovations in DL algorithms and their
properties, which drive the need for specific fault tolerance features. We
present an in-depth discussion on the suitability of different parallelism
types (model, data and hybrid); a need (or lack thereof) for check-pointing of
any critical data structures; and most importantly, consideration for several
fault tolerance proposals (user-level fault mitigation (ULFM), Reinit) in MPI
and their applicability to fault tolerant DL implementations. We leverage a
distributed memory implementation of Caffe, currently available under the
Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches
by extending MaTEx-Caffe to use a ULFM-based implementation. Our evaluation
using the ImageNet dataset and the AlexNet and GoogLeNet neural network topologies
demonstrates the effectiveness of the proposed fault tolerant DL implementation
using OpenMPI-based ULFM.
Efficient fault-tolerant quantum computing
Fault tolerant quantum computing methods which work with efficient quantum
error correcting codes are discussed. Several new techniques are introduced to
restrict accumulation of errors before or during the recovery. Classes of
eligible quantum codes are obtained, and good candidates exhibited. This
permits a new analysis of the permissible error rates and minimum overheads for
robust quantum computing. It is found that, under the standard noise model of
ubiquitous stochastic, uncorrelated errors, a quantum computer need be only an
order of magnitude larger than the logical machine contained within it in order
to be reliable. For example, a scale-up by a factor of 22, with gate error rate
of order 10^-5, is sufficient to permit large quantum algorithms such as
factorization of thousand-digit numbers.
Comment: 21 pages plus 5 figures. Replaced with figures in new format to avoid
problem
Braid Matrices and Quantum Gates for Ising Anyons Topological Quantum Computation
We study various aspects of the topological quantum computation scheme based
on the non-Abelian anyons corresponding to fractional quantum Hall effect
states at filling fraction 5/2 using the Temperley-Lieb recoupling theory.
Unitary braiding matrices are obtained by a normalization of the degenerate
ground states of a system of anyons, which is equivalent to a modification of
the definition of the 3-vertices in the Temperley-Lieb recoupling theory as
proposed by Kauffman and Lomonaco. With the braid matrices available, we
discuss the problems of encoding qubit states and constructing quantum
gates from the elementary braiding operation matrices for the Ising anyon
model. In the encoding scheme where 2 qubits are represented by 8 Ising anyons,
we give an alternative proof of the no-entanglement theorem given by Bravyi and
compare it to the case of the Fibonacci anyon model. In the encoding scheme where
2 qubits are represented by 6 Ising anyons, we construct a set of quantum gates
which is equivalent to the construction of Georgiev.
Comment: 25 pages, 13 figures