Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality
Adversarial training is a popular method for making neural nets robust
against adversarial perturbations. In practice, adversarial training leads to
low robust training loss. However, a rigorous explanation for why this happens
under natural conditions is still missing. Recently, a convergence theory for
standard (non-adversarial) supervised training was developed by various groups
for {\em very overparametrized} nets. It is unclear how to extend these results
to adversarial training because of the min-max objective. Recently, a first
step in this direction was taken by Gao et al. using tools from online
learning, but they require the width of the net to be \emph{exponential} in
the input dimension and use an unnatural activation function. Our work proves
convergence to low robust training loss for \emph{polynomial} width instead of
exponential, under natural assumptions and with the ReLU activation. A key
element of our proof is showing that ReLU networks near initialization can
approximate the step function, which may be of independent interest.
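For reference, the min-max objective that blocks a direct transfer of the
standard convergence theory is the usual robust training loss; in generic
notation (not specific to either paper's assumptions),
\[
  \min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \; \max_{\|\delta_i\| \le \varepsilon} \; \ell\bigl(f_\theta(x_i + \delta_i),\, y_i\bigr),
\]
where \(f_\theta\) is the network, \(\ell\) the per-example loss, and
\(\varepsilon\) the perturbation budget. The inner maximization over
\(\delta_i\) is what standard (non-adversarial) analyses do not have to handle.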
Guarantees on learning depth-2 neural networks under a data-poisoning attack
Many state-of-the-art machine learning models have recently been shown to be
fragile to adversarial attacks. In this work we attempt to build our
theoretical understanding of adversarially robust learning with neural nets. We
exhibit a specific class of finite-size neural networks and a non-gradient
stochastic algorithm that attempts to recover the weights of the net
generating the realizable true labels, in the presence of an oracle applying a
bounded amount of malicious additive distortion to those labels. We prove (nearly
optimal) trade-offs among the magnitude of the adversarial attack, the accuracy
and the confidence achieved by the proposed algorithm.
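To make the setting concrete, here is a minimal sketch of the data-poisoning
model described above: realizable labels from a depth-2 ReLU net, additively
distorted by an adversary within a fixed budget. The dimensions, the budget
theta, and the worst-case sign-flip distortion are illustrative choices, not
the paper's; the recovery algorithm itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 10, 5, 1000   # input dim, hidden width, sample count (illustrative)
theta = 0.1             # bound on the adversary's additive label distortion

# Ground-truth depth-2 ReLU net: f(x) = a^T ReLU(W x)
W = rng.standard_normal((k, d))
a = rng.standard_normal(k)

X = rng.standard_normal((n, d))
clean = np.maximum(X @ W.T, 0.0) @ a   # realizable true labels

# Oracle applies bounded malicious additive distortion (here: a sign flip)
xi = theta * np.sign(clean)
y = clean + xi                          # observed labels, |y - clean| <= theta
```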
Frivolous Units: Wider Networks Are Not Really That Wide
A remarkable characteristic of overparameterized deep neural networks (DNNs)
is that their accuracy does not degrade when the network's width is increased.
Recent evidence suggests that developing compressible representations is key
for adjusting the complexity of large networks to the learning task at hand.
However, these compressible representations are poorly understood. A promising
strand of research, inspired by biology, is understanding representations at
the unit level, as this offers a more granular and intuitive interpretation of
the underlying neural mechanisms. In order to better understand what facilitates increases in
width without decreases in accuracy, we ask: Are there mechanisms at the unit
level by which networks control their effective complexity as their width is
increased? If so, how do these depend on the architecture, dataset, and
training parameters? We identify two distinct types of "frivolous" units that
proliferate when the network's width is increased: prunable units, which can be
dropped from the network without significant change to the output, and
redundant units, whose activities can be expressed as a linear combination of
other units' activities. These units imply complexity constraints, as the
function the network represents could be expressed by a network without them. We also identify how
the development of these units can be influenced by architecture and a number
of training factors. Together, these results help to explain why the accuracy
of DNNs does not degrade when width is increased, and they highlight the
importance of frivolous units for understanding implicit regularization in DNNs.
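As a concrete illustration of the two definitions, the sketch below flags a
unit as prunable if zeroing its activations barely changes the network's
output, and as redundant if its activations are well approximated by a linear
combination of the other units' activations. The function name, the ablation
interface, and the tolerance are hypothetical, not the paper's procedure.

```python
import numpy as np

def find_frivolous_units(acts, outputs_fn, tol=1e-3):
    """Flag prunable and redundant units in one layer.

    acts: (n_samples, n_units) activation matrix for the layer.
    outputs_fn: maps an (optionally ablated) activation matrix to the
        network outputs computed from that layer onward.
    """
    base = outputs_fn(acts)
    prunable, redundant = [], []
    for j in range(acts.shape[1]):
        # Prunable: zeroing the unit leaves the output almost unchanged.
        ablated = acts.copy()
        ablated[:, j] = 0.0
        if np.abs(outputs_fn(ablated) - base).mean() < tol:
            prunable.append(j)
        # Redundant: unit is (nearly) a linear combination of the others.
        others = np.delete(acts, j, axis=1)
        coef, *_ = np.linalg.lstsq(others, acts[:, j], rcond=None)
        if np.abs(others @ coef - acts[:, j]).mean() < tol:
            redundant.append(j)
    return prunable, redundant
```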
Non-negative Least Squares via Overparametrization
In many applications, solutions of numerical problems are required to be
non-negative, e.g., when retrieving pixel intensity values or physical
densities of a substance. In this context, non-negative least squares (NNLS) is
a ubiquitous tool, e.g., when seeking sparse solutions of high-dimensional
statistical problems. Despite vast efforts since the seminal work of Lawson and
Hanson in the '70s, the non-negativity assumption is still an obstacle for the
theoretical analysis and the scalability of many off-the-shelf solvers. In the
different context of deep neural networks, it has recently been observed that
training overparametrized models via gradient descent leads to surprising
generalization properties and the retrieval of regularized solutions. In this
paper, we prove that, by using an overparametrized formulation, NNLS solutions
can reliably be approximated via vanilla gradient flow. We furthermore
establish stability of the method against negative perturbations of the
ground truth. Our simulations confirm that this allows the use of vanilla
gradient descent as a novel and scalable numerical solver for NNLS. From a
conceptual point of view, our work proposes a novel approach to trading
side constraints in optimization problems against the complexity of the
optimization landscape, one that does not build upon the concept of Lagrange
multipliers.
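The abstract does not spell out the reformulation, but a common
overparametrized substitution for NNLS replaces the constraint \(x \ge 0\) by
writing \(x = u \odot u\) elementwise, so that vanilla gradient descent on
\(u\) is unconstrained while \(x\) stays non-negative by construction. A
minimal sketch under that assumption (step size, initialization, and iteration
count are illustrative):

```python
import numpy as np

def nnls_overparam(A, b, lr=1e-3, steps=50_000, alpha=1e-3):
    """Vanilla gradient descent on u, with x = u * u enforcing x >= 0."""
    u = np.full(A.shape[1], alpha)    # small positive initialization
    for _ in range(steps):
        x = u * u
        grad_x = A.T @ (A @ x - b)    # gradient of 0.5 * ||A x - b||^2 in x
        u -= lr * (2.0 * u * grad_x)  # chain rule through x = u * u
    return u * u

# Usage: recover a non-negative vector from noiseless measurements.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 10))
x_true = np.maximum(rng.standard_normal(10), 0.0)
x_hat = nnls_overparam(A, A @ x_true)
```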