A Capacity Scaling Law for Artificial Neural Networks
We derive two critical numbers that predict the behavior of perceptron networks. First, we derive the calculation of what we call the lossless memory (LM) dimension. The LM dimension is a generalization of the Vapnik--Chervonenkis (VC) dimension that avoids structured data and therefore provides an upper bound for perfectly fitting almost any training data. Second, we derive what we call the MacKay (MK) dimension. This limit indicates a 50% chance of not being able to train a given function. Our derivations are performed by embedding a neural network into Shannon's communication model, which allows us to interpret the two points as capacities measured in bits. We present a proof and validate our upper bounds with repeatable experiments using different network configurations, diverse implementations, varying activation functions, and several learning algorithms. The bottom line is that the two capacity points scale strictly linearly with the number of weights. Among other practical applications, our result allows us to compare and benchmark different neural network implementations independently of a concrete learning task. Our results provide insight into the capabilities and limits of neural networks and generate valuable know-how for experimental design decisions.
Comment: 13 pages, 4 figures, 2 listings of source code
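
Since the abstract's central claim is that both capacity points scale strictly linearly with the number of weights, a back-of-the-envelope estimate only requires counting a network's trainable parameters. The Python sketch below illustrates this reading; the helper names and the assumption of roughly one bit of capacity per weight are ours, not values taken from the paper.

def perceptron_weight_count(layer_sizes):
    """Count trainable parameters (weights plus biases) of a fully connected
    feed-forward network with the given layer widths."""
    return sum((fan_in + 1) * fan_out  # the +1 accounts for the bias input
               for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]))

def capacity_estimate_bits(layer_sizes, bits_per_weight=1.0):
    """Hypothetical capacity estimate that is linear in the number of weights;
    bits_per_weight is an assumed proportionality constant, not a value from the paper."""
    return bits_per_weight * perceptron_weight_count(layer_sizes)

# Example: a 2-16-1 network has (2+1)*16 + (16+1)*1 = 65 parameters.
print(perceptron_weight_count([2, 16, 1]))   # 65
print(capacity_estimate_bits([2, 16, 1]))    # 65.0 under the 1-bit-per-weight assumption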
From Tinkering to Engineering: Measurements in Tensorflow Playground
In this article, we present an extension of the Tensorflow Playground, called Tensorflow Meter (TFMeter for short). TFMeter is an interactive neural network architecting tool that allows the visual creation of different neural network architectures. In addition to the features of its ancestor, the Playground, our tool displays information-theoretic measurements while constructing, training, and testing the network. Each change to the architecture is reflected in at least one of the measurements, providing better engineering intuition of what different architectures are able to learn. The measurements are derived from various places in the literature. In this demo, we describe our web application, which is available online at http://tfmeter.icsi.berkeley.edu/, and argue that in the same way that the original Playground is meant to build an intuition about neural networks, our extension educates users on the available measurements, which we hope will ultimately improve experimental design and reproducibility in the field.
Comment: 3 pages, 3 figures, ICPR 202
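
As a concrete, if simplified, example of the kind of information-theoretic measurement such a tool can display, the sketch below computes the Shannon entropy of a labeling in bits. This is our own illustration of the idea and is not taken from TFMeter's source, so the measurements actually shown by the tool may differ.

import math
from collections import Counter

def label_entropy_bits(labels):
    """Shannon entropy of the empirical label distribution, in bits per label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A balanced binary labeling carries one bit of information per label.
print(label_entropy_bits([0, 1, 0, 1, 0, 1, 0, 1]))  # 1.0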
Principle of Conservation of Computational Complexity
In this manuscript, we derive the principle of conservation of computational
complexity. We measure computational complexity as the number of binary
computations (decisions) required to solve a problem. Every problem then
defines a unique solution space measurable in bits. For an exact result,
decisions in the solution space can neither be predicted nor discarded, only
transferred between input and algorithm. We demonstrate and explain this
principle using the example of the propositional logic satisfiability problem
(SAT). It inevitably follows that P ≠ NP. We
also provide an alternative explanation for the undecidability of the halting
problem based on the principle.
Comment: This version of the article improves on the previous versions by
generalizing to a general principle. This way, the very technical reduction
of the halting problem to SAT_syntax is unnecessary. The authors would like
to thank their peers for feedback and arxiv.org for enabling it. Feedback is
always encouraged.
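
Read literally, the abstract's bit accounting can be illustrated with a small worked example: a SAT instance over n Boolean variables spans 2^n candidate assignments, i.e., n binary decisions, and the principle asserts that these decisions can only be split between the problem instance and the solving algorithm. The LaTeX notation below is our paraphrase of that accounting, not a formula quoted from the manuscript.

% Our paraphrase of the bit accounting sketched in the abstract.
\[
  |S| = 2^{n}, \qquad \log_2 |S| = n \ \text{bits},
\]
\[
  \underbrace{b_{\text{input}}}_{\text{decisions encoded in the instance}}
  \;+\;
  \underbrace{b_{\text{algorithm}}}_{\text{decisions made at run time}}
  \;\ge\; n .
\]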
One Bit Matters: Understanding Adversarial Examples as the Abuse of Redundancy
Despite the great success achieved in machine learning (ML), adversarial
examples have caused concerns with regard to its trustworthiness: a small
perturbation of an input results in an arbitrary failure of an otherwise
seemingly well-trained ML model. While studies are being conducted to discover
the intrinsic properties of adversarial examples, such as their transferability
and universality, there is insufficient theoretical analysis to help understand
the phenomenon in a way that can influence the design process of ML
experiments. In this paper, we deduce an information-theoretic model which
explains adversarial attacks as the abuse of feature redundancies in ML
algorithms. We prove that feature redundancy is a necessary condition for the
existence of adversarial examples. Our model helps to answer some major
questions raised in many anecdotal studies on adversarial examples. Our theory
is backed up by empirical measurements of the information content of benign and
adversarial examples on both image and text datasets. Our measurements show
that typical adversarial examples introduce just enough redundancy to overflow
the decision making of an ML model trained on corresponding benign examples. We
conclude with actionable recommendations to improve the robustness of machine
learners against adversarial examples.
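
The abstract reports empirical measurements of the information content of benign and adversarial examples but does not spell out the estimator in this summary. As a purely illustrative stand-in, the sketch below uses compressed size as a crude proxy for information content and compares a benign input with a perturbed one; the choice of zlib and the byte-string toy data are our assumptions, not the paper's methodology.

import zlib

def compressed_bits(data: bytes) -> int:
    """Crude proxy for information content: size of the zlib-compressed input, in bits."""
    return 8 * len(zlib.compress(data, 9))

def information_gap_bits(benign: bytes, adversarial: bytes) -> int:
    """Difference in the compression-based proxy between an adversarial input
    and its benign counterpart."""
    return compressed_bits(adversarial) - compressed_bits(benign)

# Toy example with byte strings standing in for serialized model inputs.
benign = b"the quick brown fox jumps over the lazy dog " * 8
adversarial = bytes(b ^ 1 if i % 7 == 0 else b for i, b in enumerate(benign))
print(information_gap_bits(benign, adversarial))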
A Practical Approach to Sizing Neural Networks
Memorization is worst-case generalization. Based on MacKay's information-theoretic model of supervised machine learning, this article discusses how to
practically estimate the maximum size of a neural network given a training data
set. First, we present four easily applicable rules to analytically determine
the capacity of neural network architectures. This allows the comparison of the
efficiency of different network architectures independently of a task. Second,
we introduce and experimentally validate a heuristic method to estimate the
neural network capacity requirement for a given dataset and labeling. This
allows an estimate of the required size of a neural network for a given
problem. We conclude the article with a discussion on the consequences of
sizing the network wrongly, which include both increased computation effort for training and reduced generalization capability.
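
The abstract does not reproduce the heuristic itself, so the sketch below only illustrates the general idea of a capacity requirement for a dataset and labeling: as a deliberately crude assumption of ours, each label is counted at its maximum information content of log2(number of classes) bits, and the total is compared against a network's parameter count (see the weight-counting sketch after the capacity scaling law abstract above).

import math

def dataset_capacity_requirement_bits(num_samples, num_classes):
    """Assumed upper-bound heuristic (ours, not the article's): memorizing the
    labeling needs at most num_samples * log2(num_classes) bits."""
    return num_samples * math.log2(num_classes)

def network_parameter_count(layer_sizes):
    """Trainable parameters (weights plus biases) of a fully connected network."""
    return sum((fan_in + 1) * fan_out
               for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# Example: 1000 binary-labeled samples need at most ~1000 bits, while a 2-16-1
# network has only 65 parameters, far too few under a one-bit-per-parameter assumption.
print(dataset_capacity_requirement_bits(1000, 2))   # 1000.0
print(network_parameter_count([2, 16, 1]))          # 65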
Information Scaling Law of Deep Neural Networks
With the rapid development of Deep Neural Networks (DNNs), various network models with strong computing power and impressive expressive power have been proposed. However, there is no comprehensive informational interpretation of DNNs from the perspective of information theory. Because of their nonlinear activation functions and the varying numbers of layers and neural units, the structure of DNNs is nonlinear and complex. Using Convolutional Arithmetic Circuits (ConvACs) as a representative class of DNNs, complex networks can be converted into a mathematical formulation, so that rigorous mathematical theory, and in particular information theory, can be used to analyze them. In this paper, we propose a novel information scaling law scheme that interprets the network's inner organization in terms of information theory. First, we give an informational interpretation of the activation function. Second, we prove that information entropy increases as information is transmitted through a ConvAC. Finally, we propose the information scaling law of ConvACs under a reasonable assumption.
Comment: 7 pages, 5 figures
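
The paper's analysis is analytical and specific to ConvACs, but the quantity it tracks, the entropy of the representation as information flows through the network, can at least be estimated empirically. The sketch below is our own toy illustration using a histogram entropy estimate of activations before and after a simple affine-plus-ReLU stage; it does not reproduce the paper's proof or its scaling law.

import numpy as np

def histogram_entropy_bits(x, bins=64):
    """Discretized (histogram) entropy estimate of a sample of activations, in bits."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
inputs = rng.standard_normal(10_000)            # stand-in input activations
hidden = np.maximum(0.8 * inputs + 0.1, 0.0)    # toy affine + ReLU stage

print(histogram_entropy_bits(inputs))
print(histogram_entropy_bits(hidden))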
Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior
We describe an approach to understanding the peculiar and counterintuitive
generalization properties of deep neural networks. The approach involves going
beyond worst-case theoretical capacity control frameworks that have been
popular in machine learning in recent years to revisit old ideas in the
statistical mechanics of neural networks. Within this approach, we present a
prototypical Very Simple Deep Learning (VSDL) model, whose behavior is
controlled by two control parameters, one describing an effective amount of
data, or load, on the network (that decreases when noise is added to the
input), and one with an effective temperature interpretation (that increases
when algorithms are early stopped). Using this model, we describe how a very
simple application of ideas from the statistical mechanics theory of
generalization provides a strong qualitative description of recently-observed
empirical results regarding the inability of deep neural networks not to
overfit training data, discontinuous learning and sharp transitions in the
generalization properties of learning algorithms, etc.
Comment: 31 pages; added brief discussion of recent papers that use/extend these ideas