7 research outputs found

    A Capacity Scaling Law for Artificial Neural Networks

    We derive the calculation of two critical numbers that predict the behavior of perceptron networks. First, we derive what we call the lossless memory (LM) dimension. The LM dimension is a generalization of the Vapnik--Chervonenkis (VC) dimension that avoids structured data and therefore provides an upper bound for perfectly fitting almost any training data. Second, we derive what we call the MacKay (MK) dimension. This limit indicates a 50% chance of not being able to train a given function. Our derivations are performed by embedding a neural network into Shannon's communication model, which allows us to interpret the two points as capacities measured in bits. We present a proof and repeatable practical experiments that validate our upper bounds using different network configurations, diverse implementations, varying activation functions, and several learning algorithms. The bottom line is that the two capacity points scale strictly linearly with the number of weights. Among other practical applications, our result allows different neural network implementations to be compared and benchmarked independently of a concrete learning task. Our results provide insight into the capabilities and limits of neural networks and generate valuable know-how for experimental design decisions. Comment: 13 pages, 4 figures, 2 listings of source code
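
    The linear-scaling claim lends itself to a quick back-of-the-envelope calculation. The following Python sketch counts the parameters of a small fully connected network and turns them into rough capacity estimates; the constants ALPHA_LM and ALPHA_MK are placeholders chosen for illustration, not the values derived in the paper.

        # Hedged sketch of the claimed linear scaling: capacity points grow with
        # the number of weights. ALPHA_LM and ALPHA_MK are assumed placeholder
        # constants, not the paper's derived values.

        def count_parameters(layer_sizes):
            """Weights plus biases of a fully connected feed-forward network."""
            return sum((fan_in + 1) * fan_out
                       for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]))

        ALPHA_LM = 1.0  # assumed bits of lossless-memory capacity per parameter
        ALPHA_MK = 2.0  # assumed bits at the MacKay (50%) point per parameter

        layers = [4, 8, 3]  # example: 4 inputs, one hidden layer of 8, 3 outputs
        n_params = count_parameters(layers)
        print(f"parameters: {n_params}")
        print(f"estimated LM capacity: {ALPHA_LM * n_params:.0f} bits")
        print(f"estimated MK capacity: {ALPHA_MK * n_params:.0f} bits")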

    From Tinkering to Engineering: Measurements in Tensorflow Playground

    In this article, we present an extension of the Tensorflow Playground, called Tensorflow Meter (TFMeter for short). TFMeter is an interactive neural network architecting tool that allows the visual creation of different neural network architectures. In addition to what its ancestor, the Playground, offers, our tool shows information-theoretic measurements while constructing, training, and testing the network. As a result, each change is reflected in at least one of the measurements, providing a better engineering intuition of what different architectures are able to learn. The measurements are derived from various places in the literature. In this demo, we describe our web application, which is available online at http://tfmeter.icsi.berkeley.edu/, and argue that, in the same way that the original Playground is meant to build an intuition about neural networks, our extension educates users on the available measurements, which we hope will ultimately improve experimental design and reproducibility in the field. Comment: 3 pages, 3 figures, ICPR 202

    Principle of Conservation of Computational Complexity

    In this manuscript, we derive the principle of conservation of computational complexity. We measure computational complexity as the number of binary computations (decisions) required to solve a problem. Every problem then defines a unique solution space measurable in bits. For an exact result, decisions in the solution space can neither be predicted nor discarded, only transferred between input and algorithm. We demonstrate and explain this principle using the example of the propositional logic satisfiability problem (SAT). It inevitably follows that SAT ∉ P ⇒ P ≠ NP. We also provide an alternative explanation for the undecidability of the halting problem based on the principle. Comment: This version of the article improves on the previous versions by generalizing to a general principle. This way, the very technical reduction of the halting problem to SAT_syntax is unnecessary. The authors would like to thank their peers for feedback and arxiv.org for enabling it. Feedback is always encouraged
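
    As a toy illustration of the bit-counting view (not the paper's formal argument), an exhaustive SAT solver over n variables walks a solution space of 2^n candidate assignments, i.e. n bits, and each candidate is fixed by n binary decisions. The sketch below counts those decisions for a tiny instance; the clause encoding is an assumption made for the example.

        # Toy illustration of measuring a solution space in bits: for n Boolean
        # variables there are 2**n candidate assignments (n bits), and each
        # candidate is pinned down by n binary decisions.

        from itertools import product

        def brute_force_sat(n_vars, clauses):
            """clauses: list of clauses, each a list of signed 1-based literals."""
            decisions = 0
            for assignment in product([False, True], repeat=n_vars):
                decisions += n_vars  # n binary decisions fix one candidate
                if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
                       for clause in clauses):
                    return True, decisions
            return False, decisions

        satisfiable, decisions = brute_force_sat(3, [[1, -2], [2, 3], [-1, -3]])
        print(satisfiable, decisions)  # worst case is bounded by n * 2**n decisions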

    One Bit Matters: Understanding Adversarial Examples as the Abuse of Redundancy

    Despite the great success achieved in machine learning (ML), adversarial examples have raised concerns with regard to its trustworthiness: a small perturbation of an input results in an arbitrary failure of an otherwise seemingly well-trained ML model. While studies are being conducted to discover the intrinsic properties of adversarial examples, such as their transferability and universality, there is insufficient theoretical analysis to help understand the phenomenon in a way that can influence the design process of ML experiments. In this paper, we deduce an information-theoretic model which explains adversarial attacks as the abuse of feature redundancies in ML algorithms. We prove that feature redundancy is a necessary condition for the existence of adversarial examples. Our model helps to explain some major questions raised in many anecdotal studies on adversarial examples. Our theory is backed up by empirical measurements of the information content of benign and adversarial examples on both image and text datasets. Our measurements show that typical adversarial examples introduce just enough redundancy to overflow the decision making of an ML model trained on corresponding benign examples. We conclude with actionable recommendations to improve the robustness of machine learners against adversarial examples.
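
    A crude way to probe the information-content argument empirically (under simplifying assumptions, and not the measurement procedure used in the paper) is to use compressed size as a proxy for information content and compare a benign input with a slightly perturbed copy of it:

        # Hedged sketch: compressed size as a crude proxy for information content.
        # The input and the perturbation are toy data, not the paper's datasets.

        import zlib
        import numpy as np

        def proxy_information_bits(arr):
            """Approximate information content as the zlib-compressed size in bits."""
            return 8 * len(zlib.compress(arr.tobytes(), 9))

        rng = np.random.default_rng(0)
        benign = np.linspace(0, 255, 28 * 28).astype(np.uint8)   # smooth, redundant signal
        noise = rng.integers(-8, 9, size=benign.shape)           # small perturbation
        perturbed = np.clip(benign.astype(int) + noise, 0, 255).astype(np.uint8)

        print("benign   :", proxy_information_bits(benign), "bits")
        print("perturbed:", proxy_information_bits(perturbed), "bits")
        # The perturbed copy typically compresses worse, i.e. the perturbation adds
        # extra bits on top of what the benign sample needed.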

    A Practical Approach to Sizing Neural Networks

    Memorization is worst-case generalization. Based on MacKay's information-theoretic model of supervised machine learning, this article discusses how to practically estimate the maximum size of a neural network given a training data set. First, we present four easily applicable rules to analytically determine the capacity of neural network architectures. This allows the efficiency of different network architectures to be compared independently of a task. Second, we introduce and experimentally validate a heuristic method to estimate the neural network capacity requirement for a given dataset and labeling. This allows an estimate of the required size of a neural network for a given problem. We conclude the article with a discussion of the consequences of sizing the network wrongly, which include both increased computational effort for training and reduced generalization capability.
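
    For the dataset side of such an estimate, a minimal heuristic sketch (not the article's validated method or its four rules) is to upper-bound the memorization requirement of a labeling by the number of samples times the Shannon entropy of the labels, and compare that figure against the parameter count of a candidate network:

        # Hedged heuristic sketch, not the article's estimator: worst-case
        # (memorization) capacity requirement of a labeling, in bits.

        import math
        from collections import Counter

        def label_entropy_bits(labels):
            """Shannon entropy of the label distribution, in bits per label."""
            counts = Counter(labels)
            n = len(labels)
            return -sum((c / n) * math.log2(c / n) for c in counts.values())

        def memorization_requirement_bits(labels):
            return len(labels) * label_entropy_bits(labels)

        labels = [0, 1, 1, 0, 1, 0, 0, 1] * 100   # toy, balanced binary labeling
        print(f"~{memorization_requirement_bits(labels):.0f} bits to memorize "
              f"{len(labels)} labels")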

    Information Scaling Law of Deep Neural Networks

    With the rapid development of Deep Neural Networks (DNNs), various network models that show strong computing power and impressive expressive power have been proposed. However, there is no comprehensive informational interpretation of DNNs from the perspective of information theory. Due to the nonlinear functions and the uncertain number of layers and neural units used in DNNs, the network structure exhibits nonlinearity and complexity. Using a typical class of DNNs, Convolutional Arithmetic Circuits (ConvACs), complex DNNs can be converted into a mathematical formulation. Thus, we can use rigorous mathematical theory, especially information theory, to analyse these complicated DNNs. In this paper, we propose a novel information scaling law scheme that interprets a network's inner organization through information theory. First, we show the informational interpretation of the activation function. Second, we prove that the information entropy increases when information is transmitted through the ConvACs. Finally, we propose the information scaling law of ConvACs by making a reasonable assumption. Comment: 7 pages, 5 figures
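
    The entropy claim can be probed empirically, although only crudely: the sketch below estimates the histogram entropy of activations after successive randomly initialized dense layers. This is an assumption-laden stand-in (ReLU layers instead of ConvAC structure, a histogram estimator instead of the paper's analysis), meant only to show how such a measurement could be set up.

        # Crude empirical probe, not the paper's ConvAC derivation: histogram
        # entropy of activations after successive randomly initialized layers.

        import numpy as np

        def histogram_entropy_bits(x, bins=64):
            hist, _ = np.histogram(x, bins=bins)
            p = hist / hist.sum()
            p = p[p > 0]
            return float(-(p * np.log2(p)).sum())

        rng = np.random.default_rng(1)
        x = rng.normal(size=(1000, 32))   # toy input batch
        print(f"input  : ~{histogram_entropy_bits(x):.2f} bits (histogram estimate)")
        for layer in range(3):
            w = rng.normal(scale=1 / np.sqrt(x.shape[1]), size=(x.shape[1], 32))
            x = np.maximum(x @ w, 0)      # ReLU stands in for the nonlinearity
            print(f"layer {layer + 1}: ~{histogram_entropy_bits(x):.2f} bits (histogram estimate)")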

    Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior

    We describe an approach to understanding the peculiar and counterintuitive generalization properties of deep neural networks. The approach involves going beyond the worst-case theoretical capacity control frameworks that have been popular in machine learning in recent years to revisit old ideas in the statistical mechanics of neural networks. Within this approach, we present a prototypical Very Simple Deep Learning (VSDL) model, whose behavior is governed by two control parameters: one describing an effective amount of data, or load, on the network (which decreases when noise is added to the input), and one with an effective temperature interpretation (which increases when algorithms are early stopped). Using this model, we describe how a very simple application of ideas from the statistical mechanics theory of generalization provides a strong qualitative description of recently observed empirical results regarding the inability of deep neural networks to avoid overfitting training data, discontinuous learning and sharp transitions in the generalization properties of learning algorithms, etc. Comment: 31 pages; added brief discussion of recent papers that use/extend these ideas