2,755 research outputs found

    Why and When Can Deep -- but Not Shallow -- Networks Avoid the Curse of Dimensionality: a Review

    The paper characterizes classes of functions for which deep learning can be exponentially better than shallow learning. Deep convolutional networks are a special case of these conditions, though weight sharing is not the main reason for their exponential advantage.
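
    As a purely illustrative aid (my own sketch, not code from the paper), the snippet below builds the kind of hierarchically compositional target these results concern: an 8-variable function assembled as a binary tree of bivariate constituents. A deep network whose graph matches the tree only ever has to approximate two-variable pieces, which is where the claimed exponential advantage over shallow networks comes from.

        # Illustration only: a hierarchically compositional target in n = 8 variables,
        # built as a binary tree of 2-variable constituent functions.
        import numpy as np

        def h(a, b):
            # One bivariate constituent; any smooth two-variable function works here.
            return np.tanh(a + 2.0 * b)

        def compositional_target(x):
            # x has shape (n_samples, 8); compose bivariate nodes level by level.
            l1 = [h(x[:, 0], x[:, 1]), h(x[:, 2], x[:, 3]),
                  h(x[:, 4], x[:, 5]), h(x[:, 6], x[:, 7])]
            l2 = [h(l1[0], l1[1]), h(l1[2], l1[3])]
            return h(l2[0], l2[1])

        # A deep network mirroring this tree approximates seven bivariate functions;
        # a shallow network must approximate the full 8-variable map at once.
        x = np.random.rand(5, 8)
        print(compositional_target(x))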

    Approximation Error Bounds via Rademacher's Complexity

    Approximation properties of some connectionistic models, commonly used to construct approximation schemes for optimization problems with multivariable functions as admissible solutions, are investigated. Such models are made up of linear combinations of computational units with adjustable parameters. The relationship between model complexity (the number of computational units) and approximation error is investigated using tools from Statistical Learning Theory, such as Talagrand's inequality, the fat-shattering dimension, and Rademacher's complexity. For some families of multivariable functions, estimates of the approximation accuracy of models with certain computational units are derived in terms of the Rademacher complexities of the families. The estimates improve previously available ones, which were expressed in terms of the VC dimension and derived by exploiting union-bound techniques. The results are applied to approximation schemes with certain radial basis functions as computational units, for which it is shown that the estimates do not exhibit the curse of dimensionality with respect to the number of variables.
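
    As a concrete illustration of the main tool named above (my own sketch, not code from the paper), the following estimates the empirical Rademacher complexity of a class of l1-bounded linear combinations of Gaussian radial-basis-function units by Monte Carlo over random sign vectors; for an l1-ball of coefficients the supremum over the class is attained at a single unit, so each sign draw reduces to a maximum over the dictionary.

        # Minimal sketch: Monte-Carlo estimate of the empirical Rademacher complexity of
        #     F_B = { x -> sum_j c_j * phi_j(x) : ||c||_1 <= B },
        # where the phi_j are Gaussian RBF units with fixed centers.
        import numpy as np

        rng = np.random.default_rng(0)

        def rbf_features(x, centers, width=1.0):
            # phi_j(x_i) = exp(-||x_i - c_j||^2 / (2 * width^2)); shape (m, n_units)
            d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2.0 * width ** 2))

        def empirical_rademacher(x, centers, B=1.0, n_draws=2000):
            phi = rbf_features(x, centers)            # (m, n_units)
            m = x.shape[0]
            total = 0.0
            for _ in range(n_draws):
                sigma = rng.choice([-1.0, 1.0], size=m)
                # sup over the l1-ball: B * max_j |(1/m) * sum_i sigma_i phi_j(x_i)|
                total += B * np.abs(sigma @ phi / m).max()
            return total / n_draws

        x = rng.uniform(-1, 1, size=(200, 5))         # m = 200 points in 5 variables
        centers = rng.uniform(-1, 1, size=(50, 5))    # 50 fixed RBF centers
        print(empirical_rademacher(x, centers))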

    Theory I: Why and When Can Deep Networks Avoid the Curse of Dimensionality?

    [formerly titled "Why and When Can Deep – but Not Shallow – Networks Avoid the Curse of Dimensionality: a Review"] The paper reviews and extends an emerging body of theoretical results on deep learning, including the conditions under which it can be exponentially better than shallow learning. A class of deep convolutional networks represents an important special case of these conditions, though weight sharing is not the main reason for their exponential advantage. Implications of a few key theorems are discussed, together with new results, open problems and conjectures. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
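
    For orientation, the display below gives a schematic form of the shallow-versus-deep comparison this line of work makes, paraphrased from memory rather than quoted from the paper: here n is the number of input variables, m the smoothness of the target (and of the bivariate constituents in the compositional case), epsilon the desired uniform accuracy, and N the number of network units; the exact function classes, norms and constants are in the paper.

        % Schematic only (a paraphrase, not a quoted theorem):
        \begin{align*}
          \text{shallow networks:} \quad & N = O\!\left(\varepsilon^{-n/m}\right),\\
          \text{deep networks matching the compositional structure:} \quad
            & N = O\!\left((n-1)\,\varepsilon^{-2/m}\right),
        \end{align*}
        % so only the hierarchically matched deep architecture avoids the
        % exponential dependence on the input dimension n.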

    Signal Perceptron: On the Identifiability of Boolean Function Spaces and Beyond

    In a seminal book, Minsky and Papert define the perceptron as a limited implementation of what they called “parallel machines.” They showed that some binary Boolean functions, including XOR, are not definable in a single-layer perceptron due to its limited capacity to learn only linearly separable functions. In this work, we propose a new, more powerful implementation of such parallel machines. This new mathematical tool is defined using analytic sinusoids, instead of linear combinations, to form an analytic signal representation of the function that we want to learn. We show that this reformulated parallel mechanism can learn, with a single layer, any non-linear k-ary Boolean function. Finally, to provide an example of its practical applications, we show that it outperforms the single-hidden-layer multilayer perceptron in both Boolean function learning and image classification tasks, while also being faster and requiring fewer parameters.
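
    The following minimal sketch is my own illustration of the underlying idea, not the authors' exact construction: over binary inputs, cosine units cos(pi * w.x) with frequency vectors w in {0,1}^k are exactly the parity characters, so a single layer of such units with a linear readout can represent any k-ary Boolean function, XOR included, which a single linear-threshold unit cannot.

        # Sketch of a sinusoidal single-layer learner over binary inputs.
        import itertools
        import numpy as np

        def cosine_features(X, k):
            # One cosine unit per frequency vector w in {0,1}^k.
            W = np.array(list(itertools.product([0, 1], repeat=k)))   # (2**k, k)
            return np.cos(np.pi * X @ W.T)                            # (n, 2**k)

        k = 2
        X = np.array(list(itertools.product([0, 1], repeat=k)), dtype=float)
        y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0).astype(float)    # XOR truth table

        Phi = cosine_features(X, k)
        coeffs, *_ = np.linalg.lstsq(Phi, y, rcond=None)              # single linear readout
        print(np.round(Phi @ coeffs, 6))  # reproduces 0, 1, 1, 0 up to float rounding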

    On the Relationship Between Generalization Error, Hypothesis Complexity, and Sample Complexity for Radial Basis Functions

    In this paper, we bound the generalization error of a class of Radial Basis Function networks, for certain well-defined function learning tasks, in terms of the number of parameters and the number of examples. We show that the total generalization error is partly due to the insufficient representational capacity of the network (because of its finite size) and partly due to insufficient information about the target function (because of the finite number of samples). We make several observations about generalization error that are valid irrespective of the approximation scheme. Our result also sheds light on ways to choose an appropriate network architecture for a particular problem.
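
    As a toy numerical complement (my own illustration, not the paper's analysis), the sketch below fits Gaussian RBF networks of increasing size to a small noisy sample and reports training and test error, making the two contributions to the bound visible: representational (approximation) error when the network is too small, and estimation error from the limited sample when it is large.

        # Illustration only: approximation vs. estimation error for Gaussian RBF networks.
        import numpy as np

        rng = np.random.default_rng(1)
        target = lambda x: np.sin(3 * x) + 0.5 * np.cos(7 * x)

        def fit_rbf(x_train, y_train, n_centers, width=0.3, ridge=1e-6):
            centers = np.linspace(-1, 1, n_centers)
            phi = lambda x: np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))
            A = phi(x_train)
            w = np.linalg.solve(A.T @ A + ridge * np.eye(n_centers), A.T @ y_train)
            return lambda x: phi(x) @ w

        n_train, noise = 50, 0.1
        x_tr = rng.uniform(-1, 1, n_train)
        y_tr = target(x_tr) + noise * rng.standard_normal(n_train)
        x_te = np.linspace(-1, 1, 1000)

        for k in (3, 10, 40):
            model = fit_rbf(x_tr, y_tr, k)
            train_mse = np.mean((model(x_tr) - y_tr) ** 2)
            test_mse = np.mean((model(x_te) - target(x_te)) ** 2)
            print(f"centers={k:3d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
        # With too few centers both errors are large (the network cannot represent the
        # target); adding centers drives the training error down, and the remaining
        # train/test gap is the estimation error coming from the finite noisy sample.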