Generalization Error in Deep Learning
Deep learning models have lately shown great performance in various fields
such as computer vision, speech recognition, speech translation, and natural
language processing. However, despite their state-of-the-art performance, the
source of their generalization ability remains generally unclear.
Thus, an important question is what makes deep neural networks able to
generalize well from the training set to new data. In this article, we provide
an overview of the existing theory and bounds for the characterization of the
generalization error of deep neural networks, combining both classical and more
recent theoretical and empirical results.
Robustness and Regularization of Support Vector Machines
We consider regularized support vector machines (SVMs) and show that they are
precisely equivalent to a new robust optimization formulation. We show that
this equivalence of robust optimization and regularization has implications for
both algorithms and analysis. In terms of algorithms, the equivalence suggests
more general SVM-like algorithms for classification that explicitly build in
protection against noise, and at the same time control overfitting. On the analysis
front, the equivalence of robustness and regularization provides a robust
optimization interpretation for the success of regularized SVMs. We use
this new robustness interpretation of SVMs to give a new proof of consistency
of (kernelized) SVMs, thus establishing robustness as the reason regularized
SVMs generalize well.
Robust Large-Margin Learning in Hyperbolic Space
Recently, there has been a surge of interest in representation learning in
hyperbolic spaces, driven by their ability to represent hierarchical data with
significantly fewer dimensions than standard Euclidean spaces. However, the
viability and benefits of hyperbolic spaces for downstream machine learning
tasks have received less attention. In this paper, we present, to our
knowledge, the first theoretical guarantees for learning a classifier in
hyperbolic rather than Euclidean space. Specifically, we consider the problem
of learning a large-margin classifier for data possessing a hierarchical
structure. Our first contribution is a hyperbolic perceptron algorithm, which
provably converges to a separating hyperplane. We then provide an algorithm to
efficiently learn a large-margin hyperplane, relying on the careful injection
of adversarial examples. Finally, we prove that for hierarchical data that
embeds well into hyperbolic space, the low embedding dimension ensures superior
guarantees when learning the classifier directly in hyperbolic space.
Comment: Accepted to NeurIPS 202