Stochastic Deep Networks
Machine learning is increasingly targeting areas where input data cannot be
accurately described by a single vector, but can be modeled instead using the
more flexible concept of random vectors, namely probability measures or more
simply point clouds of varying cardinality. Using deep architectures on
measures poses, however, many challenging issues. Indeed, deep architectures
are originally designed to handle fixed-length vectors, or, using recursive
mechanisms, ordered sequences thereof. In sharp contrast, measures describe a
varying number of weighted observations with no particular order. We propose in
this work a deep framework designed to handle crucial aspects of measures,
namely permutation invariances, variations in weights and cardinality.
Architectures derived from this pipeline can (i) map measures to measures -
using the concept of push-forward operators; (ii) bridge the gap between
measures and Euclidean spaces - through integration steps. This makes it possible to
design discriminative networks (to classify or reduce the dimensionality of
input measures), generative architectures (to synthesize measures) and
recurrent pipelines (to predict measure dynamics). We provide a theoretical
analysis of these building blocks, review our architectures' approximation
abilities and robustness to perturbations, and test them on various
discriminative and generative tasks.
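As a rough illustration of the building blocks above (not the authors' implementation), the sketch below maps a weighted point cloud, i.e., a discrete measure, to a fixed-size Euclidean vector via a per-point feature map followed by an integration step; the feature map, its dimensions, and the function names are assumptions.

```python
import numpy as np

def integration_block(points, weights, W, b):
    """Map a weighted point cloud (a discrete measure) to a fixed-size vector.

    points  : (n, d) observations; n may differ between inputs
    weights : (n,)  non-negative weights summing to 1
    W, b    : parameters of an elementwise feature map applied to every point

    The weighted sum is invariant to permutations of the observations and is
    well defined for any cardinality n and any weighting.
    """
    features = np.maximum(points @ W + b, 0.0)   # per-point (push-forward style) features
    return weights @ features                    # integration against the measure

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 8)), np.zeros(8)
cloud_a, w_a = rng.normal(size=(5, 3)), np.full(5, 1 / 5)      # 5 points
cloud_b, w_b = rng.normal(size=(12, 3)), np.full(12, 1 / 12)   # 12 points
print(integration_block(cloud_a, w_a, W, b).shape)   # (8,) regardless of cardinality
print(integration_block(cloud_b, w_b, W, b).shape)   # (8,)
```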
Learning for Single-Shot Confidence Calibration in Deep Neural Networks through Stochastic Inferences
We propose a generic framework to calibrate accuracy and confidence of a
prediction in deep neural networks through stochastic inferences. We interpret
stochastic regularization using a Bayesian model, and analyze the relation
between predictive uncertainty of networks and variance of the prediction
scores obtained by stochastic inferences for a single example. Our empirical
study shows that the accuracy and the score of a prediction are highly
correlated with the variance of multiple stochastic inferences given by
stochastic depth or dropout. Motivated by this observation, we design a novel
variance-weighted confidence-integrated loss function that is composed of two
cross-entropy loss terms with respect to the ground truth and the uniform
distribution, which are balanced by the variance of the stochastic prediction
scores. The proposed
loss function enables us to learn deep neural networks that predict confidence
calibrated scores using a single inference. Our algorithm achieves outstanding
confidence calibration performance and improves classification accuracy when
combined with two popular stochastic regularization techniques---stochastic
depth and dropout---in multiple models and datasets; it significantly alleviates
the overconfidence issue in deep neural networks by training networks
to achieve prediction accuracy proportional to the confidence of the prediction.
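A minimal sketch of such a variance-weighted loss follows, assuming the per-example weight is the (normalized) variance of the true-class score across stochastic forward passes; the exact weighting and normalization used by the authors may differ.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def variance_weighted_loss(stochastic_logits, labels):
    """Sketch of a variance-weighted confidence-integrated loss.

    stochastic_logits : (T, B, C) logits from T stochastic forward passes
                        (e.g. with dropout or stochastic depth kept active)
    labels            : (B,) integer class labels
    """
    probs = softmax(stochastic_logits)                    # (T, B, C)
    mean_p = probs.mean(axis=0)                           # averaged prediction scores
    batch = np.arange(labels.size)
    # Per-example uncertainty: variance of the true-class score across passes,
    # rescaled to [0, 1] (this normalization is an assumption).
    alpha = np.clip(probs[:, batch, labels].var(axis=0) / 0.25, 0.0, 1.0)
    ce_true = -np.log(mean_p[batch, labels] + 1e-12)      # cross-entropy w.r.t. ground truth
    ce_uniform = -np.log(mean_p + 1e-12).mean(axis=-1)    # cross-entropy w.r.t. uniform distribution
    # Confident (low-variance) examples lean on the ground-truth term; uncertain
    # (high-variance) examples are pulled toward the uniform distribution.
    return np.mean((1.0 - alpha) * ce_true + alpha * ce_uniform)

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 4, 10))        # 8 stochastic passes, batch of 4, 10 classes
labels = rng.integers(0, 10, size=4)
print(variance_weighted_loss(logits, labels))
```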
Overcoming Challenges in Fixed Point Training of Deep Convolutional Networks
It is known that training deep neural networks, in particular, deep
convolutional networks, with aggressively reduced numerical precision is
challenging. The stochastic gradient descent algorithm becomes unstable in the
presence of noisy gradient updates resulting from arithmetic with limited
numeric precision. One of the well-accepted solutions facilitating the training
of low-precision fixed-point networks is stochastic rounding. However, to the
best of our knowledge, the source of the instability in training neural
networks with noisy gradient updates has not been well investigated. This work
is an attempt to draw a theoretical connection between low numerical precision
and training algorithm stability. In doing so, we also propose and verify
through experiments methods that improve the training performance
of deep convolutional networks in fixed point.
Comment: ICML 2016 Workshop on On-Device Intelligence
When Does Stochastic Gradient Algorithm Work Well?
In this paper, we consider a general stochastic optimization problem which is
often at the core of supervised learning, such as deep learning and linear
classification. We consider a standard stochastic gradient descent (SGD) method
with a fixed, large step size and propose a novel assumption on the objective
function, under which this method has the improved convergence rates (to a
neighborhood of the optimal solutions). We then empirically demonstrate that
these assumptions hold for logistic regression and standard deep neural
networks on classical data sets. Thus our analysis helps to explain when
efficient behavior can be expected from the SGD method in training
classification models and deep neural networks.
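The setting studied here, plain mini-batch SGD with a fixed, relatively large step size on a logistic-regression objective, can be sketched as below; the synthetic data, batch size, and step size are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic logistic-regression problem (illustrative, not from the paper).
n, d = 2000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

def minibatch_grad(w, idx):
    """Stochastic gradient of the average logistic loss on mini-batch `idx`."""
    p = 1.0 / (1.0 + np.exp(-X[idx] @ w))
    return X[idx].T @ (p - y[idx]) / idx.size

w = np.zeros(d)
step = 0.5                                   # fixed, relatively large step size
for t in range(2000):
    idx = rng.integers(0, n, size=32)
    w -= step * minibatch_grad(w, idx)

# With a constant step size the iterates settle into a neighborhood of the
# optimum rather than converging exactly, which is the regime the paper analyzes.
loss = np.mean(np.log1p(np.exp(-(2.0 * y - 1.0) * (X @ w))))
print("final training loss:", loss)
```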
Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations
In our work, we bridge deep neural network design with numerical differential
equations. We show that many effective networks, such as ResNet, PolyNet,
FractalNet and RevNet, can be interpreted as different numerical
discretizations of differential equations. This finding brings us a brand new
perspective on the design of effective deep architectures. We can take
advantage of the rich knowledge in numerical analysis to guide us in designing
new and potentially more effective deep networks. As an example, we propose a
linear multi-step architecture (LM-architecture) which is inspired by the
linear multi-step method solving ordinary differential equations. The
LM-architecture is an effective structure that can be used on any ResNet-like
network. In particular, we demonstrate that LM-ResNet and LM-ResNeXt (i.e., the
networks obtained by applying the LM-architecture on ResNet and ResNeXt
respectively) can achieve noticeably higher accuracy than ResNet and ResNeXt on
both CIFAR and ImageNet with comparable numbers of trainable parameters.
Moreover, on both CIFAR and ImageNet, LM-ResNet/LM-ResNeXt can significantly
compress the original networks while maintaining similar
performance. This can be explained mathematically using the concept of the modified
equation from numerical analysis. Last but not least, we also establish a
connection between stochastic control and noise injection in the training
process, which helps to improve the generalization of the networks. Furthermore, by
relating the stochastic training strategy to a stochastic dynamical system, we can
easily apply stochastic training to networks with the LM-architecture. As
an example, we introduce stochastic depth to LM-ResNet and achieve a significant
improvement over the original LM-ResNet on CIFAR-10.
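A minimal sketch of a two-step (linear multi-step) residual update in the spirit of the LM-architecture is given below; the exact placement of the learnable mixing coefficient and the form of the residual branch are assumptions.

```python
import numpy as np

def residual_branch(x, W1, W2):
    """Stand-in for the usual ResNet residual branch f(x) (conv/BN/ReLU in practice)."""
    return np.tanh(x @ W1) @ W2

def lm_step(x_prev, x_curr, k, W1, W2):
    """One linear multi-step (two-step) update in the spirit of the LM-architecture:

        x_{n+1} = (1 - k) * x_n + k * x_{n-1} + f(x_n)

    Two previous states are combined, mirroring a linear multi-step ODE solver;
    k plays the role of a learnable scalar mixing coefficient.
    """
    return (1.0 - k) * x_curr + k * x_prev + residual_branch(x_curr, W1, W2)

rng = np.random.default_rng(0)
d = 16
W1, W2 = 0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d))
x_prev = x_curr = rng.normal(size=(4, d))          # a batch of 4 feature vectors
for _ in range(6):                                 # unroll a few LM blocks
    x_prev, x_curr = x_curr, lm_step(x_prev, x_curr, 0.3, W1, W2)
print(x_curr.shape)
```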
Progressive Stochastic Binarization of Deep Networks
A plethora of recent research has focused on improving the memory footprint
and inference speed of deep networks by reducing the complexity of (i)
numerical representations (for example, by deterministic or stochastic
quantization) and (ii) arithmetic operations (for example, by binarization of
weights).
We propose a stochastic binarization scheme for deep networks that allows for
efficient inference on hardware by restricting itself to additions of small
integers and fixed shifts. Unlike previous approaches, the underlying
randomized approximation is progressive, thus permitting an adaptive control of
the accuracy of each operation at run-time. In a low-precision setting, we
match the accuracy of previous binarized approaches. Our representation is
unbiased - it approaches continuous computation with increasing sample size. In
a high-precision regime, the computational costs are competitive with previous
quantization schemes. Progressive stochastic binarization also permits
localized, dynamic accuracy control within a single network, thereby providing
a new tool for adaptively focusing computational attention.
We evaluate our method on networks of various architectures, already
pretrained on ImageNet. With representational costs comparable to previous
schemes, we obtain accuracies close to the original floating-point
implementation. This includes pruned networks, except for the known special case of
certain types of separable convolutions. By focusing computational attention
using progressive sampling, we reduce inference costs on ImageNet further by a
factor of up to 33% (before network pruning).
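A rough sketch of unbiased, progressive stochastic binarization: each weight is replaced by an average of stochastic ±scale samples whose expectation equals the weight, so accuracy improves with the number of samples. The choice of scale and the sampling scheme below are assumptions, not the authors' exact method.

```python
import numpy as np

def progressive_binarize(w, num_samples, rng=np.random.default_rng(0)):
    """Unbiased stochastic binarization of a weight tensor (sketch).

    Each sample maps w to scale * {-1, +1} with the +1 probability chosen so
    that the expectation equals w.  Averaging `num_samples` independent samples
    gives a progressively more accurate, still unbiased approximation; at
    inference the samples reduce to small-integer additions plus one scaling.
    """
    scale = np.abs(w).max()                        # could be snapped to a power of two (a shift)
    p_plus = 0.5 * (1.0 + w / scale)               # P(sample = +scale), so E[sample] = w
    samples = np.where(rng.random((num_samples,) + w.shape) < p_plus, 1.0, -1.0)
    return scale * samples.mean(axis=0)

rng = np.random.default_rng(1)
w = 0.1 * rng.normal(size=(64, 64))
for k in (1, 4, 16, 64):                           # accuracy improves progressively with samples
    err = np.abs(progressive_binarize(w, k) - w).mean()
    print(k, float(err))
```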
Steps Toward Deep Kernel Methods from Infinite Neural Networks
Contemporary deep neural networks exhibit impressive results on practical
problems. These networks generalize well although their inherent capacity may
extend significantly beyond the number of training examples. We analyze this
behavior in the context of deep, infinite neural networks. We show that deep
infinite layers are naturally aligned with Gaussian processes and kernel
methods, and devise stochastic kernels that encode the information of these
networks. We show that stability results apply despite the size, offering an
explanation for their empirical success.
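For intuition, the standard kernel recursion induced by an infinitely wide ReLU layer with Gaussian weights (the arc-cosine construction) is sketched below; it illustrates how deep infinite layers align with kernels, but it is not necessarily the specific stochastic kernels devised in this paper.

```python
import numpy as np

def infinite_relu_layer(K):
    """One layer of the kernel recursion induced by an infinitely wide ReLU
    layer with i.i.d. Gaussian weights (the arc-cosine construction).

    K is the Gram matrix produced by the previous layer; the update returns the
    Gram matrix after one more infinite ReLU layer (variance-2 weight scaling,
    so the diagonal is preserved).
    """
    diag = np.sqrt(np.diag(K))
    norm = np.outer(diag, diag)
    cos_theta = np.clip(K / norm, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    return norm * (np.sin(theta) + (np.pi - theta) * cos_theta) / np.pi

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))
K = X @ X.T                                       # base (linear) kernel
for _ in range(3):                                # compose three infinite ReLU layers
    K = infinite_relu_layer(K)
print(K.shape)                                    # (5, 5) deep-kernel Gram matrix
```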
Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks
We study the problem of training deep neural networks with Rectified Linear
Unit (ReLU) activation function using gradient descent and stochastic gradient
descent. In particular, we study the binary classification problem and show
that for a broad family of loss functions, with proper random weight
initialization, both gradient descent and stochastic gradient descent can find
the global minima of the training loss for an over-parameterized deep ReLU
network, under mild assumption on the training data. The key idea of our proof
is that Gaussian random initialization followed by (stochastic) gradient
descent produces a sequence of iterates that stay inside a small perturbation
region centering around the initial weights, in which the empirical loss
function of deep ReLU networks enjoys nice local curvature properties that
ensure the global convergence of (stochastic) gradient descent. Our theoretical
results shed light on understanding the optimization for deep learning, and
pave the way for studying the optimization dynamics of training modern deep
neural networks.
Comment: 54 pages. This version relaxes the assumptions on the loss functions
and data distribution, and improves the dependency on the problem-specific
parameters in the main theorem.
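A small two-layer illustration of the phenomenon behind the analysis (the paper itself treats deep networks): with Gaussian initialization and an over-parameterized hidden layer, SGD fits the data while the weights move only a short relative distance from their initial values. Widths, step size, and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 200, 10, 4096                           # m: large hidden width (over-parameterization)
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.sign(rng.normal(size=n))                   # binary labels in {-1, +1}

W0 = rng.normal(size=(d, m))                      # Gaussian random initialization
a = np.sign(rng.normal(size=m)) / np.sqrt(m)      # fixed output layer (common simplification)
W = W0.copy()

def forward(W, Xb):
    return np.maximum(Xb @ W, 0.0) @ a            # two-layer ReLU network

step = 1.0
for t in range(500):                              # SGD on the logistic loss
    idx = rng.integers(0, n, size=32)
    z = forward(W, X[idx])
    g_out = -y[idx] / (1.0 + np.exp(y[idx] * z)) / idx.size   # d(loss)/dz
    active = (X[idx] @ W) > 0.0                   # ReLU gates
    W -= step * (X[idx].T @ (active * np.outer(g_out, a)))

# The iterates stay close to the initialization, the regime in which the
# local curvature argument applies.
print("train accuracy:", np.mean(np.sign(forward(W, X)) == y))
print("relative distance from init:", np.linalg.norm(W - W0) / np.linalg.norm(W0))
```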
StochasticNet: Forming Deep Neural Networks via Stochastic Connectivity
Deep neural networks are a branch of machine learning that has seen a meteoric
rise in popularity due to their powerful ability to represent and model
high-level abstractions in highly complex data. One area in deep neural
networks that is ripe for exploration is neural connectivity formation. A
pivotal study on the brain tissue of rats found that synaptic formation for
specific functional connectivity in neocortical neural microcircuits can be
surprisingly well modeled and predicted as a random formation. Motivated by
this intriguing finding, we introduce the concept of StochasticNet, where deep
neural networks are formed via stochastic connectivity between neurons. As a
result, any type of deep neural network can be formed as a StochasticNet by
allowing the neuron connectivity to be stochastic. Stochastic synaptic
formations, in a deep neural network architecture, can allow for efficient
utilization of neurons for performing specific tasks. To evaluate the
feasibility of such a deep neural network architecture, we train a
StochasticNet using four different image datasets (CIFAR-10, MNIST, SVHN, and
STL-10). Experimental results show that a StochasticNet, using less than half
the number of neural connections as a conventional deep neural network,
achieves comparable accuracy and reduces overfitting on the CIFAR-10, MNIST, and
SVHN datasets. Interestingly, a StochasticNet with less than half the number of
neural connections achieved a higher accuracy (relative improvement in test
error rate of ~6% compared to ConvNet) on the STL-10 dataset than a
conventional deep neural network. Finally, StochasticNets have faster
operational speeds while achieving better or similar accuracy.
Comment: 8 pages
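A toy sketch of stochastic connectivity formation: each potential connection of a layer is realized independently with a fixed probability and the resulting mask is kept for the life of the network. The layer type, probability, and initialization are assumptions for illustration.

```python
import numpy as np

class StochasticDenseLayer:
    """Dense layer whose connectivity is formed stochastically (sketch).

    Each potential synapse is realized independently with probability
    `connect_prob`; the resulting binary mask is fixed after formation, so the
    layer permanently uses only a fraction of the possible connections.
    """
    def __init__(self, n_in, n_out, connect_prob=0.4, rng=np.random.default_rng(0)):
        self.mask = (rng.random((n_in, n_out)) < connect_prob).astype(float)
        self.W = rng.normal(size=(n_in, n_out)) * np.sqrt(2.0 / n_in)
        self.b = np.zeros(n_out)

    def __call__(self, x):
        # Only the stochastically formed connections contribute to the output.
        return np.maximum(x @ (self.W * self.mask) + self.b, 0.0)

layer = StochasticDenseLayer(256, 128, connect_prob=0.4)   # < half of the possible connections
x = np.random.default_rng(1).normal(size=(8, 256))
print(layer(x).shape, layer.mask.mean())                   # roughly 40% of synapses realized
```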
An All-Memristor Deep Spiking Neural Computing System: A Step Towards Realizing the Low Power, Stochastic Brain
Deep 'Analog Artificial Neural Networks' (ANNs) perform complex
classification tasks with remarkably high accuracy. However, they rely on
a humongous amount of power to perform the calculations, overshadowing the
accuracy benefits. The biological brain, on the other hand, is significantly more
powerful than such networks and consumes orders of magnitude less power,
indicating a conceptual mismatch. Given that biological neurons communicate
using energy-efficient trains of spikes and that their behavior is
non-deterministic, incorporating these effects in deep artificial neural networks
may take us a few steps towards a more realistic neuron. In this work, we propose how the
inherent stochasticity of nano-scale resistive devices can be harnessed to
emulate the functionality of a spiking neuron that can be incorporated in deep
stochastic Spiking Neural Networks (SNN). At the algorithmic level, we propose
how the training can be modified to convert an ANN to an SNN while supporting
the stochastic activation function offered by these devices. We devise circuit
architectures to incorporate stochastic memristive neurons along with
memristive crossbars which perform the functionality of the synaptic weights.
We tested the proposed All-Memristor deep stochastic SNN for image
classification and observed only about 1% degradation in accuracy relative to
the ANN baseline after incorporating the circuit- and device-related
non-idealities. We found that the network is robust to certain variations and
consumes ~6.4x less energy than its CMOS counterpart.
Comment: In IEEE Transactions on Emerging Topics in Computational Intelligence
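As a rough illustration of a stochastic spiking neuron of the kind described above, the sketch below fires with a probability determined by its input, mimicking probabilistic device switching; the sigmoid switching probability and the rate-coding readout are assumptions.

```python
import numpy as np

def stochastic_spiking_neuron(current, n_steps=100, rng=np.random.default_rng(0)):
    """Sketch of a stochastic spiking neuron.

    At each time step the neuron fires with a probability given by a sigmoid of
    its input current, mimicking the probabilistic switching of a nano-scale
    resistive (memristive) device.  Averaged over a window of steps, the spike
    rate approximates the smooth activation used when converting an ANN to a
    stochastic SNN.
    """
    p_spike = 1.0 / (1.0 + np.exp(-current))               # device switching probability
    spikes = rng.random((n_steps,) + np.shape(current)) < p_spike
    return spikes.mean(axis=0)                             # firing rate over the window

currents = np.linspace(-4.0, 4.0, 9)
print(np.round(stochastic_spiking_neuron(currents), 2))    # rate tracks sigmoid(current)
```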