6 research outputs found
An Information-Theoretic View for Deep Learning
Deep learning has transformed computer vision, natural language processing,
and speech recognition\cite{badrinarayanan2017segnet, dong2016image,
ren2017faster, ji20133d}. However, two critical questions remain open: (1)
why do deep neural networks generalize better than shallow networks; and (2)
does a deeper network always lead to better performance?
Specifically, letting $L$ be the number of convolutional and pooling layers in
a deep neural network, and $n$ be the size of the training sample, we derive an
upper bound on the expected generalization error for this network, i.e.,
\begin{eqnarray*}
\mathbb{E}[R(W)-R_S(W)] \leq
\exp{\left(-\frac{L}{2}\log{\frac{1}{\eta}}\right)}\sqrt{\frac{2\sigma^2}{n}I(S,W)}
\end{eqnarray*} where $\sigma > 0$ is a constant depending on the loss
function, $0 < \eta < 1$ is a constant depending on the information loss for each
convolutional or pooling layer, and $I(S,W)$ is the mutual information between
the training sample $S$ and the output hypothesis $W$. This upper bound shows
that as the number of convolutional and pooling layers increases in the
network, the expected generalization error will decrease exponentially to zero.
Layers with strict information loss, such as the convolutional layers, reduce
the generalization error for the whole network; this answers the first
question. However, zero expected generalization error does not
imply a small test error $\mathbb{E}[R(W)]$. This is because $\mathbb{E}[R_S(W)]$
is large when the information needed for fitting the data is lost
as the number of layers increases. This suggests that the claim `the deeper the
better' is conditioned on a small training error $\mathbb{E}[R_S(W)]$.
Finally, we show that deep learning satisfies a weak notion of stability and
that the sample complexity of deep neural networks decreases as $L$ increases.
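To see the exponential decay concretely, the bound above can be evaluated numerically. The sketch below is purely illustrative: the values chosen for $\eta$, $\sigma^2$, and $I(S,W)$ are hypothetical placeholders, not quantities measured on any network.

```python
import math

# Illustrative evaluation of the bound
#   E[R(W) - R_S(W)] <= exp(-(L/2) * log(1/eta)) * sqrt(2 * sigma^2 / n * I(S, W)).
# eta, sigma2, and mi are hypothetical placeholder values, not measured ones.
def generalization_bound(L, n, eta=0.8, sigma2=1.0, mi=5.0):
    contraction = math.exp(-(L / 2) * math.log(1 / eta))  # equals eta**(L/2)
    return contraction * math.sqrt(2 * sigma2 / n * mi)

# The bound shrinks exponentially as convolutional/pooling layers are added.
bounds = [generalization_bound(L, n=10_000) for L in (1, 5, 10, 20)]
```

Since $0 < \eta < 1$, the contraction factor $\eta^{L/2}$ drives the bound to zero as $L$ grows, which is exactly the behaviour the abstract describes.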
Theoretical Analysis of Adversarial Learning: A Minimax Approach
Here we propose a general theoretical method for analyzing the risk bound in
the presence of adversaries. Specifically, we try to fit the adversarial
learning problem into the minimax framework. We first show that the original
adversarial learning problem can be reduced to a minimax statistical learning
problem by introducing a transport map between distributions. Then, we prove a
new risk bound for this minimax problem in terms of covering numbers under a
weak version of Lipschitz condition. Our method can be applied to multi-class
classification problems and commonly used loss functions such as the hinge and
ramp losses. As some illustrative examples, we derive the adversarial risk
bounds for SVMs, deep neural networks, and PCA, and our bounds have two
data-dependent terms, which can be optimized for achieving adversarial
robustness.
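The reduction sketched above can be written schematically; the notation here is generic and illustrative, not necessarily the paper's exact formulation. The learner minimizes the worst-case risk over all distributions within a transport cost $\epsilon$ of the data distribution $P$:
\begin{eqnarray*}
\inf_{f \in \mathcal{F}} \; \sup_{P' :\, W_c(P', P) \leq \epsilon} \; \mathbb{E}_{z \sim P'}\left[\ell(f, z)\right],
\end{eqnarray*}
where $W_c$ denotes the transport cost induced by the adversary's perturbation budget and $\ell$ is a loss such as the hinge or ramp loss.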
Autonomous Deep Quality Monitoring in Streaming Environments
The common practice of quality monitoring in industry relies on manual
inspection, which is well known to be slow, error-prone, and operator-dependent. This
issue raises strong demand for automated real-time quality monitoring developed
from data-driven approaches, thereby alleviating operator dependence and
adapting to various process uncertainties. Nonetheless, current approaches do
not take into account the streaming nature of sensory information and rely
heavily on hand-crafted features, making them application-specific. This paper
proposes NADINE++, an online quality monitoring methodology built on a recently
developed deep learning algorithm for data streams, Neural Networks with
Dynamically Evolved Capacity (NADINE). It features the
integration of 1-D and 2-D convolutional layers to extract natural features of
time-series and visual data streams captured from sensors and cameras of the
injection molding machines from our own project. Real-time experiments have
been conducted in which the online quality monitoring task is simulated on the fly
under the prequential test-then-train protocol, the prominent data stream
evaluation scheme. Comparison with the state-of-the-art techniques clearly
exhibits the advantage of NADINE++ with 4.68\% improvement on average for the
quality monitoring task in streaming environments. To support the reproducible
research initiative, codes, results of NADINE++ along with supplementary
materials and injection molding dataset are made available in
\url{https://github.com/ContinualAL/NADINE-IJCNN2021}. Comment: This paper has been accepted for publication in IJCNN 2021.
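The prequential test-then-train protocol mentioned above is simple to state: each arriving sample is first used to evaluate the current model, then used to update it. The sketch below illustrates the loop with a hypothetical toy learner and stream, not NADINE++ itself.

```python
from collections import Counter

class MajorityClassModel:
    """Toy incremental learner: predicts the most frequent label seen so far.
    A hypothetical stand-in for a real streaming model such as NADINE++."""
    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def fit(self, x, y):
        self.counts[y] += 1

def prequential_accuracy(model, stream):
    """Running accuracy under test-then-train: predict first, then update."""
    correct = 0
    for t, (x, y) in enumerate(stream, start=1):
        if model.predict(x) == y:   # test on the yet-unseen sample first ...
            correct += 1
        model.fit(x, y)             # ... then train on that same sample
    return correct / t

# Hypothetical toy stream standing in for sensor/camera readings with labels.
stream = [(0, "ok"), (1, "ok"), (2, "ok"), (3, "defect"), (4, "ok")]
acc = prequential_accuracy(MajorityClassModel(), stream)
```

Because every prediction is made before the model sees the sample, the running accuracy reflects genuine online performance rather than memorization.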
A Probabilistic Representation of DNNs: Bridging Mutual Information and Generalization
Recently, Mutual Information (MI) has attracted attention in bounding the
generalization error of Deep Neural Networks (DNNs). However, it is intractable
to accurately estimate the MI in DNNs, thus most previous works have to relax
the MI bound, which in turn weakens the information theoretic explanation for
generalization. To address this limitation, this paper introduces a
probabilistic representation of DNNs for accurately estimating the MI.
Leveraging the proposed MI estimator, we validate the information theoretic
explanation for generalization, and derive a tighter generalization bound than
the state-of-the-art relaxations. Comment: To appear in the ICML 2021 Workshop on Theoretic Foundation,
Criticism, and Application Trend of Explainable AI.
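The paper's probabilistic representation is not reproduced here, but as a generic baseline it may help to recall the plug-in mutual information estimator for discrete variables, computed from empirical joint and marginal frequencies. This sketch is illustrative and is not the estimator proposed in the paper.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in MI estimate (in nats) for paired discrete samples:
    I(X;Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) )."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x,y) / (p(x) p(y)) rewritten with counts to avoid extra divisions
        mi += p_xy * math.log(p_xy * n * n / (px[x] * py[y]))
    return mi

# Identical fair-coin variables: MI equals the entropy, i.e. log 2 nats.
xs = [0, 1] * 50
mi_dependent = mutual_information(xs, xs)
mi_independent = mutual_information(xs, [0] * 100)  # constant Y carries no info
```

Such plug-in estimates are exactly the kind of histogram-based approximation that becomes intractable in high-dimensional DNN representations, which is the limitation the paper targets.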
Think Global, Act Local: Relating DNN generalisation and node-level SNR
The reasons behind good DNN generalisation remain an open question. In this
paper we explore the problem by looking at the Signal-to-Noise Ratio of nodes
in the network. Starting from information theory principles, it is possible to
derive an expression for the SNR of a DNN node output. Using this expression we
construct figures-of-merit that quantify how well the weights of a node
optimise SNR (or, equivalently, information rate). Applying these
figures-of-merit, we give examples indicating that weight sets that promote
good SNR performance also exhibit good generalisation. In addition, we are able
to identify the qualities of weight sets that exhibit good SNR behaviour and
hence promote good generalisation. This leads to a discussion of how these
results relate to network training and regularisation. Finally, we identify
some ways that these observations can be used in training design. Comment: 15 pages, 5 figures; for associated colab files see
http://github.com/pnorridge/think-global-act-local/setting
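As a rough illustration of the node-level figure-of-merit idea, one can compare a linear node's output variance when driven by a signal component versus a noise component. This generic sketch is not the paper's exact SNR expression, and the weights and inputs below are hypothetical.

```python
import statistics

# Hedged sketch: empirical SNR of a linear node, defined here (illustratively)
# as Var(w . signal) / Var(w . noise) over sample sets of each component.
def node_snr(weights, signal_samples, noise_samples):
    def project(samples):
        return [sum(w * x for w, x in zip(weights, s)) for s in samples]
    return (statistics.pvariance(project(signal_samples))
            / statistics.pvariance(project(noise_samples)))

# Signal varies along input dimension 0, noise along dimension 1 (hypothetical).
signal = [(-1.0, 0.0), (1.0, 0.0)]
noise = [(0.0, -1.0), (0.0, 1.0)]
snr_aligned = node_snr([1.0, 0.1], signal, noise)     # weights track the signal
snr_misaligned = node_snr([0.1, 1.0], signal, noise)  # weights track the noise
```

A weight set aligned with the signal direction yields a far higher SNR than one aligned with the noise, which is the qualitative behaviour the figures-of-merit are meant to quantify.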
Generalization Bounds for Convolutional Neural Networks
Convolutional neural networks (CNNs) have achieved breakthrough performances
in a wide range of applications including image classification, semantic
segmentation, and object detection. Previous research on characterizing the
generalization ability of neural networks mostly focuses on fully connected
neural networks (FNNs), regarding CNNs as a special case of FNNs without taking
into account the special structure of convolutional layers. In this work, we
propose a tighter generalization bound for CNNs by exploiting the sparse and
permutation structure of its weight matrices. As the generalization bound
relies on the spectral norm of weight matrices, we further study spectral norms
of three commonly used convolution operations including standard convolution,
depthwise convolution, and pointwise convolution. Theoretical and experimental
results both demonstrate that our bounds for CNNs are tighter than existing
bounds.
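Since the bound depends on spectral norms of weight matrices, it may help to recall how such a norm is computed. For instance, a pointwise (1x1) convolution acts channel-wise as an ordinary matrix, so its spectral norm is that matrix's largest singular value. Below is a minimal pure-Python power-iteration sketch; it is generic, not the paper's convolution-specific analysis.

```python
import math

# Hedged sketch: largest singular value of a (nonzero) matrix, obtained by
# power iteration on A^T A. A pointwise (1x1) convolution reduces to this case.
def spectral_norm(matrix, iters=100):
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0] * cols
    for _ in range(iters):
        # u = A v
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        # w = A^T u, then normalize to get the next iterate of v
        w = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # At convergence v is the top right singular vector, so ||A v|| = sigma_max.
    u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in u))
```

For a diagonal matrix the spectral norm is simply the largest diagonal entry in absolute value, which gives an easy sanity check on the iteration.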