181 research outputs found

    Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality

    Full text link
    Adversarial training is a popular method to give neural nets robustness against adversarial perturbations. In practice, adversarial training leads to low robust training loss. However, a rigorous explanation for why this happens under natural conditions is still missing. Recently, a convergence theory for standard (non-adversarial) supervised training was developed by various groups for very overparametrized nets. It is unclear how to extend these results to adversarial training because of the min-max objective. Recently, a first step in this direction was made by Gao et al. using tools from online learning, but they require the width of the net to be exponential in the input dimension d, and an unnatural activation function. Our work proves convergence to low robust training loss for polynomial width instead of exponential, under natural assumptions and with the ReLU activation. A key element of our proof is showing that ReLU networks near initialization can approximate the step function, which may be of independent interest.
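    The min-max structure that makes standard convergence theory hard to transfer can be made concrete with a small sketch: an inner maximization searches for a worst-case perturbation inside a norm ball, and an outer minimization runs gradient descent on the resulting robust loss. The sketch below is a generic PGD-style adversarial training loop on a one-hidden-layer ReLU net, not the construction analyzed in the paper; the width, perturbation budget, and step sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data with labels in {-1, +1}.
d, n, width = 10, 200, 64                      # illustrative sizes (assumptions)
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d))

# One-hidden-layer ReLU net: f(x) = a^T relu(W x); only W is trained here.
W = rng.normal(size=(width, d)) / np.sqrt(d)
a = rng.normal(size=width) / np.sqrt(width)

def f(Xb, W, a):
    return np.maximum(Xb @ W.T, 0.0) @ a

def robust_loss(Xb, yb, W, a):
    # logistic loss on the margin y * f(x)
    return np.mean(np.log1p(np.exp(-yb * f(Xb, W, a))))

def pgd_attack(Xb, yb, W, a, eps=0.1, alpha=0.02, steps=10):
    """Inner maximization: ascend the loss over an L-infinity ball of radius eps."""
    Xadv = Xb.copy()
    for _ in range(steps):
        margin = yb * f(Xadv, W, a)
        s = -yb / (1.0 + np.exp(margin))             # d(loss)/d(f), per example
        relu_mask = (Xadv @ W.T > 0).astype(float)   # ReLU derivative
        grad_x = (s[:, None] * (relu_mask * a)) @ W  # chain rule back to the inputs
        Xadv += alpha * np.sign(grad_x)              # signed ascent step
        Xadv = Xb + np.clip(Xadv - Xb, -eps, eps)    # project back into the ball
    return Xadv

# Outer minimization: gradient descent on W against the adversarial examples.
lr = 0.5
for step in range(200):
    Xadv = pgd_attack(X, y, W, a)
    margin = y * f(Xadv, W, a)
    s = -y / (1.0 + np.exp(margin))
    relu_mask = (Xadv @ W.T > 0).astype(float)
    grad_W = ((s[:, None] * relu_mask * a).T @ Xadv) / len(y)
    W -= lr * grad_W
    if step % 50 == 0:
        print(f"step {step:3d}  robust loss {robust_loss(Xadv, y, W, a):.4f}")
```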

    Guarantees on learning depth-2 neural networks under a data-poisoning attack

    Full text link
    In recent times, many state-of-the-art machine learning models have been shown to be fragile to adversarial attacks. In this work we attempt to build our theoretical understanding of adversarially robust learning with neural nets. We demonstrate a specific class of finite-size neural networks and a non-gradient stochastic algorithm that tries to recover the weights of the net generating the realizable true labels, in the presence of an oracle applying a bounded amount of malicious additive distortion to the labels. We prove (nearly optimal) trade-offs among the magnitude of the adversarial attack, the accuracy, and the confidence achieved by the proposed algorithm.
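    A minimal sketch of the corruption model described here, assuming a one-hidden-layer ReLU teacher and a per-label attack budget theta: the learner observes realizable labels to which an oracle has added an arbitrary but bounded distortion. This illustrates only the data model, not the paper's non-gradient recovery algorithm, and all sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Realizable setting: labels come from a fixed depth-2 (one-hidden-layer) ReLU net.
d, k, n = 20, 5, 1000                        # input dim, hidden width, samples (assumptions)
W_true = rng.normal(size=(k, d))
a_true = rng.normal(size=k)

def net(Xb, W, a):
    return np.maximum(Xb @ W.T, 0.0) @ a

X = rng.normal(size=(n, d))
y_clean = net(X, W_true, a_true)

# Data-poisoning oracle: an arbitrary but bounded additive distortion
# |delta_i| <= theta is applied to each label before the learner sees it.
theta = 0.5                                  # attack budget (assumption)
delta = theta * np.sign(rng.normal(size=n))  # worst-case-style +/- theta corruption
y_poisoned = y_clean + delta

# The learner only observes (X, y_poisoned); any weight-recovery guarantee must
# degrade gracefully with theta, which is the trade-off the paper quantifies.
print("max label distortion:", np.max(np.abs(y_poisoned - y_clean)))
```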

    Frivolous Units: Wider Networks Are Not Really That Wide

    Full text link
    A remarkable characteristic of overparameterized deep neural networks (DNNs) is that their accuracy does not degrade when the network's width is increased. Recent evidence suggests that developing compressible representations is key for adjusting the complexity of large networks to the learning task at hand. However, these compressible representations are poorly understood. A promising strand of research, inspired by biology, is understanding representations at the unit level, as it offers a more granular and intuitive interpretation of the neural mechanisms. In order to better understand what facilitates increases in width without decreases in accuracy, we ask: Are there mechanisms at the unit level by which networks control their effective complexity as their width is increased? If so, how do these depend on the architecture, dataset, and training parameters? We identify two distinct types of "frivolous" units that proliferate when the network's width is increased: prunable units, which can be dropped out of the network without significant change to the output, and redundant units, whose activities can be expressed as a linear combination of others. These units imply complexity constraints, as the function the network represents could be expressed by a network without them. We also identify how the development of these units can be influenced by architecture and a number of training factors. Together, these results help to explain why the accuracy of DNNs does not degrade when width is increased and highlight the importance of frivolous units toward understanding implicit regularization in DNNs.
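    The two unit types can be made concrete with a small diagnostic sketch over a layer's activation matrix: prunable units are those whose output-weighted activity is negligible, and redundant units are those whose activations are well reconstructed as a linear combination of the other units. The thresholds and the norm-based pruning criterion below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(2)

# Activation matrix of one hidden layer over a probe set: rows are inputs, columns are units.
# Fabricated here with one near-dead column and one linearly dependent column.
n, width = 500, 32
A = rng.normal(size=(n, width))
A[:, 3] *= 1e-3                             # a near-dead (prunable) unit
A[:, 7] = 0.5 * A[:, 1] - 2.0 * A[:, 4]     # a redundant unit (linear combo of others)

def prunable_units(A, out_weights, tol=1e-2):
    """Units whose weighted activation is tiny, so zeroing them barely changes the output
    (a simple proxy criterion, assumed for illustration)."""
    contrib = np.linalg.norm(A * out_weights, axis=0) / np.sqrt(len(A))
    return np.where(contrib < tol)[0]

def redundant_units(A, tol=1e-3):
    """Units whose activations are well reconstructed as a linear combination of the others."""
    redundant = []
    for j in range(A.shape[1]):
        others = np.delete(A, j, axis=1)
        coef, *_ = np.linalg.lstsq(others, A[:, j], rcond=None)
        resid = A[:, j] - others @ coef
        if np.mean(resid**2) < tol * (np.mean(A[:, j]**2) + 1e-12):
            redundant.append(j)
    return redundant

out_weights = rng.normal(size=width)        # weights connecting this layer to the output
print("prunable units :", prunable_units(A, out_weights))
print("redundant units:", redundant_units(A))
```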

    Non-negative Least Squares via Overparametrization

    Full text link
    In many applications, solutions of numerical problems are required to be non-negative, e.g., when retrieving pixel intensity values or physical densities of a substance. In this context, non-negative least squares (NNLS) is a ubiquitous tool, e.g., when seeking sparse solutions of high-dimensional statistical problems. Despite vast efforts since the seminal work of Lawson and Hanson in the '70s, the non-negativity assumption is still an obstacle for the theoretical analysis and scalability of many off-the-shelf solvers. In the different context of deep neural networks, we have recently started to see that the training of overparametrized models via gradient descent leads to surprising generalization properties and the retrieval of regularized solutions. In this paper, we prove that, by using an overparametrized formulation, NNLS solutions can reliably be approximated via vanilla gradient flow. We furthermore establish stability of the method against negative perturbations of the ground truth. Our simulations confirm that this allows the use of vanilla gradient descent as a novel and scalable numerical solver for NNLS. From a conceptual point of view, our work proposes a novel approach to trading side constraints in optimization problems against the complexity of the optimization landscape, which does not build upon the concept of Lagrange multipliers.
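    One common way to make the non-negativity constraint implicit, in the spirit of the overparametrized formulation described here, is the Hadamard substitution x = u ⊙ u: the constrained NNLS problem becomes an unconstrained problem in u that vanilla gradient descent can handle. The sketch below uses this substitution; the initialization scale, step size, and problem sizes are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(3)

# Non-negative least squares: minimize ||A x - b||^2 subject to x >= 0.
m, n = 80, 40
A = rng.normal(size=(m, n)) / np.sqrt(m)           # scaling keeps the step size simple
x_true = np.maximum(rng.normal(size=n), 0.0)       # non-negative ground truth
b = A @ x_true

# Overparametrized reformulation: substitute x = u * u (entrywise), which is
# non-negative by construction, and run plain gradient descent on the
# unconstrained objective ||A (u * u) - b||^2 over u.
u = np.full(n, 1e-3)                               # small initialization (assumption)
lr = 2e-3
for _ in range(30000):
    x = u * u
    grad_x = 2.0 * A.T @ (A @ x - b)               # gradient w.r.t. x
    u -= lr * 2.0 * u * grad_x                     # chain rule through x = u * u

x_hat = u * u
print("residual ||A x - b||:", np.linalg.norm(A @ x_hat - b))
print("entries non-negative:", bool(np.all(x_hat >= 0)))
print("distance to x_true  :", np.linalg.norm(x_hat - x_true))
```

    The small initialization is a deliberate choice: in this kind of reparametrization it acts as an implicit regularizer, which is consistent with the abstract's point that overparametrized training retrieves regularized solutions.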
    • …