Hardness of Learning Neural Networks with Natural Weights
Neural networks are nowadays highly successful despite strong hardness
results. The existing hardness results focus on the network architecture, and
assume that the network's weights are arbitrary. A natural approach to settle
the discrepancy is to assume that the network's weights are "well-behaved" and
possess some generic properties that may allow efficient learning. This approach
is supported by the intuition that the weights in real-world networks are not
arbitrary, but exhibit some "random-like" properties with respect to some
"natural" distributions. We prove negative results in this regard, and show
that for depth-2 networks, and many "natural" weight distributions such as the normal and the uniform distribution, most networks are hard to learn.
Namely, there is no efficient learning algorithm that is provably successful for most weights and every input distribution. This implies that there is no generic property that holds with high probability in such random networks and allows efficient learning.
$\ell_1$-regularized Neural Networks are Improperly Learnable in Polynomial Time
We study the improper learning of multi-layer neural networks. Suppose that
the neural network to be learned has $k$ hidden layers and that the $\ell_1$-norm of the incoming weights of any neuron is bounded by $L$. We present a kernel-based method, such that with probability at least $1-\delta$, it learns a predictor whose generalization error is at most $\epsilon$ worse than that of the neural network. The sample complexity and the time complexity of the presented method are polynomial in the input dimension and in $1/\epsilon$, $\log(1/\delta)$, and $F(k,L)$, where $F(k,L)$ is a function depending on $(k,L)$ and on the activation function, independent of the number of neurons.
The algorithm applies to both sigmoid-like activation functions and ReLU-like
activation functions. It implies that any sufficiently sparse neural network is
learnable in polynomial time.
Comment: 16 pages
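As a rough illustration of the kernel-based, improper-learning recipe (fit a kernel predictor rather than the network itself), here is a minimal Python sketch. The RBF kernel, the regularization constant, and the plain ridge-regression solver are stand-ins chosen for brevity, not the specific kernel construction from the paper.

```python
import numpy as np

def kernel_ridge_predictor(X_train, y_train, lam=1e-2, gamma=0.5):
    """Improper-learning sketch: rather than fitting the neural network
    directly, fit a kernel predictor on the same data. The Gaussian (RBF)
    kernel is a stand-in, not the specific kernel constructed in the paper."""
    def K(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * sq)
    alpha = np.linalg.solve(K(X_train, X_train) + lam * np.eye(len(X_train)), y_train)
    return lambda X: K(X, X_train) @ alpha

# toy usage: imitate a small "teacher" network with the kernel predictor
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
W_teacher = rng.normal(size=(4, 8))
teacher = lambda Z: np.tanh(Z @ W_teacher).sum(axis=1)
y = teacher(X)
predict = kernel_ridge_predictor(X, y)
X_test = rng.normal(size=(100, 4))
print("test MSE vs. teacher:", np.mean((predict(X_test) - teacher(X_test)) ** 2))
```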
Angular Visual Hardness
Recent convolutional neural networks (CNNs) have led to impressive performance but often suffer from poor calibration. They tend to be overconfident, with the model confidence not always reflecting the underlying true ambiguity and hardness. In this paper, we propose angular visual hardness (AVH), a score given by the normalized angular distance between the sample feature embedding and the target classifier to measure sample hardness. We validate this score with an in-depth and extensive scientific study, and observe that CNN models with the highest accuracy also have the best AVH scores. This agrees with an earlier finding that state-of-the-art models improve on the classification of harder examples. We observe that the training dynamics of AVH are vastly different from those of the training loss. Specifically, AVH quickly reaches a plateau for all samples even though the training loss keeps improving. This suggests the need for designing better loss functions that can target harder examples more effectively. We also find that AVH has a statistically significant correlation with human visual hardness. Finally, we demonstrate the benefit of AVH to a variety of applications such as self-training for domain adaptation and domain generalization.
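A minimal sketch of how an AVH-style score can be computed from a feature embedding and the classifier's weight matrix; the normalization over all classes is one plausible reading of "normalized angular distance", and the choice of feature layer is left open.

```python
import numpy as np

def angular_visual_hardness(feature, class_weights, target):
    """AVH-style score: angular distance from the feature embedding to the
    target class weight, normalized by the angular distances to all classes.
    `feature`: (d,) embedding; `class_weights`: (num_classes, d); `target`: int.
    (Sketch only; the paper's exact normalization/feature layer may differ.)"""
    f = feature / np.linalg.norm(feature)
    W = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = np.clip(W @ f, -1.0, 1.0)           # cosine similarity to every class
    angles = np.arccos(cos)                   # angular distances in [0, pi]
    return angles[target] / angles.sum()      # smaller -> easier sample

# toy usage with random numbers
rng = np.random.default_rng(0)
feat, W = rng.normal(size=64), rng.normal(size=(10, 64))
print(angular_visual_hardness(feat, W, target=3))
```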
Complexity of Training ReLU Neural Network
In this paper, we explore some basic questions on the complexity of training
Neural networks with ReLU activation function. We show that it is NP-hard to
train a two-hidden-layer feedforward ReLU neural network. If the dimension d of
the data is fixed then we show that there exists a polynomial time algorithm
for the same training problem. We also show that if sufficient over-parameterization is provided in the first hidden layer of a ReLU neural network, then there is a polynomial-time algorithm which finds weights such that the output of the over-parameterized ReLU neural network matches the given data.
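The role of over-parameterization can be made concrete with a simple, well-known construction that is not necessarily the paper's algorithm: with roughly one hidden ReLU unit per training point, a two-layer network can match the training data exactly by interpolating along a generic projection direction. The sketch below is that construction only.

```python
import numpy as np

def fit_exactly_with_wide_relu(X, y, seed=0):
    """Illustrative construction (not the paper's algorithm): with about one
    hidden ReLU unit per training point, a two-layer network can match the data
    exactly. Project onto a generic direction, sort, and choose output weights
    so the resulting piecewise-linear function interpolates the targets."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    u = rng.normal(size=d)                  # generic direction: distinct projections a.s.
    t = X @ u
    order = np.argsort(t)
    t_sorted, y_sorted = t[order], y[order]

    slopes = np.diff(y_sorted) / np.diff(t_sorted)       # slope on each interval
    coef = np.diff(np.concatenate(([0.0], slopes)))      # change of slope at each knot
    W1 = np.tile(u, (n - 1, 1))                          # hidden weights (all equal u)
    b1 = -t_sorted[:-1]                                  # hidden biases (knot positions)
    w2, b2 = coef, y_sorted[0]                           # output layer

    def net(Z):
        return np.maximum(Z @ W1.T + b1, 0.0) @ w2 + b2
    return net

# toy usage: exact fit of random data
rng = np.random.default_rng(1)
X, y = rng.normal(size=(30, 5)), rng.normal(size=30)
net = fit_exactly_with_wide_relu(X, y)
print("max |error|:", np.abs(net(X) - y).max())          # should be ~0 up to round-off
```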
Mixing Complexity and its Applications to Neural Networks
We suggest analyzing neural networks through the prism of space constraints.
We observe that most training algorithms applied in practice use bounded
memory, which enables us to use a new notion introduced in the study of
space-time tradeoffs that we call mixing complexity. This notion was devised in
order to measure the (in)ability to learn using a bounded-memory algorithm. In
this paper we describe how we use mixing complexity to obtain new results on
what can and cannot be learned using neural networks.
Learning Halfspaces and Neural Networks with Random Initialization
We study non-convex empirical risk minimization for learning halfspaces and
neural networks. For loss functions that are $L$-Lipschitz continuous, we present algorithms to learn halfspaces and multi-layer neural networks that achieve arbitrarily small excess risk $\epsilon > 0$. The time complexity is polynomial in the input dimension $d$ and the sample size $n$, but exponential in the quantity $(L/\epsilon)^{O(1)}$. These algorithms run multiple rounds of random initialization followed by arbitrary optimization steps. We further show that if the data is separable by some neural network with constant margin $\gamma > 0$, then there is a polynomial-time algorithm for learning a neural network that separates the training data with margin $\Omega(\gamma)$. As a consequence, the algorithm achieves arbitrary generalization error $\epsilon > 0$ with $\mathrm{poly}(d, 1/\epsilon)$ sample and time complexity. We establish the same learnability result when the labels are randomly flipped with probability $\eta < 1/2$.
Comment: 31 pages
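The "multiple rounds of random initialization followed by optimization" recipe can be sketched as below for halfspaces; the logistic surrogate loss, plain gradient descent, and the number of restarts are placeholder choices of this sketch, not the paper's.

```python
import numpy as np

def train_halfspace_with_restarts(X, y, restarts=20, steps=200, lr=0.1, seed=0):
    """Sketch of the restart scheme: draw a random initial weight vector,
    run plain gradient descent on a Lipschitz surrogate loss (logistic here,
    as a stand-in), and keep the solution with the lowest empirical risk."""
    rng = np.random.default_rng(seed)
    n, d = X.shape

    def risk(w):
        margins = y * (X @ w)
        return np.mean(np.log1p(np.exp(-margins)))       # logistic loss

    best_w, best_risk = None, np.inf
    for _ in range(restarts):
        w = rng.normal(size=d) / np.sqrt(d)               # random initialization
        for _ in range(steps):
            margins = y * (X @ w)
            grad = -(X * (y * (1 / (1 + np.exp(margins))))[:, None]).mean(axis=0)
            w -= lr * grad
        if risk(w) < best_risk:
            best_w, best_risk = w, risk(w)
    return best_w, best_risk

# toy usage: linearly separable data
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.sign(X @ np.array([1.0, -2.0, 0.5, 0.0, 1.5]))
w, r = train_halfspace_with_restarts(X, y)
print("best empirical logistic risk:", round(r, 4))
```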
On the Connection Between Learning Two-Layers Neural Networks and Tensor Decomposition
We establish connections between the problem of learning a two-layer neural
network and tensor decomposition. We consider a model with feature vectors $x \in \mathbb{R}^d$, $k$ hidden units with weights $\{w_i\}_{1 \le i \le k}$ and output $y \in \mathbb{R}$, i.e., $y = \sum_{i=1}^{k} \sigma(\langle w_i, x \rangle)$, with activation functions given by low-degree polynomials. In particular, if $\sigma(x) = a_0 + a_1 x + a_3 x^3$, we prove that no polynomial-time learning algorithm can outperform the trivial predictor that assigns to each example the response variable $\mathbb{E}(y)$, when $d^{3/2} \ll k \ll d^2$. Our conclusion holds for a `natural data distribution', namely standard Gaussian feature vectors $x \sim \mathsf{N}(0, I_d)$, and output distributed according to a two-layer neural network
with random isotropic weights, and under a certain complexity-theoretic
assumption on tensor decomposition. Roughly speaking, we assume that no
polynomial-time algorithm can substantially outperform current methods for
tensor decomposition based on the sum-of-squares hierarchy.
We also prove generalizations of this statement for higher degree polynomial
activations, and non-random weight vectors. Remarkably, several existing
algorithms for learning two-layer networks with rigorous guarantees are based
on tensor decomposition. Our results support the idea that this is indeed the
core computational difficulty in learning such networks, under the stated
generative model for the data. As a side result, we show that under this model
learning the network requires accurate learning of its weights, a property that
does not hold in a more general setting.
Comment: 41 pages, 1 figure
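To make the connection concrete, the sketch below forms an empirical third-order Hermite moment of a random two-layer network with Gaussian inputs; by Stein's identity this tensor is (up to sampling error) a rank-at-most-k combination of the weight vectors, which is the kind of object tensor-decomposition-based learners factor. It illustrates the general pipeline only; the activation, dimensions, and sanity check are arbitrary choices of this sketch, not the constructions analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 8, 3, 200_000

# random two-layer network y = sum_i sigma(<w_i, x>) with a low-degree
# polynomial activation and standard Gaussian feature vectors
W = rng.normal(size=(k, d)) / np.sqrt(d)
sigma = lambda z: z ** 3                      # cubic activation, sigma''' = 6
X = rng.normal(size=(n, d))
y = sigma(X @ W.T).sum(axis=1)

# empirical third-order Hermite moment E[y * He3(x)]; by Stein's identity it
# equals sum_i E[sigma'''(<w_i, x>)] * (w_i outer w_i outer w_i), a symmetric
# tensor of rank at most k -- the object tensor-decomposition learners factor
T = np.einsum('n,na,nb,nc->abc', y, X, X, X, optimize=True) / n
m1 = (y[:, None] * X).mean(axis=0)            # E[y * x], used in the correction
I = np.eye(d)
T -= (np.einsum('a,bc->abc', m1, I) + np.einsum('b,ac->abc', m1, I)
      + np.einsum('c,ab->abc', m1, I))

# sanity check: for the cubic activation, T approaches 6 * sum_i w_i^(outer 3)
T_exact = 6 * np.einsum('ia,ib,ic->abc', W, W, W)
print("relative error (shrinks as n grows):",
      np.linalg.norm(T - T_exact) / np.linalg.norm(T_exact))
```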
Robust Optimization for Non-Convex Objectives
We consider robust optimization problems, where the goal is to optimize in
the worst case over a class of objective functions. We develop a reduction from
robust improper optimization to Bayesian optimization: given an oracle that
returns $\alpha$-approximate solutions for distributions over objectives, we compute a distribution over solutions that is $\alpha$-approximate in the worst
case. We show that de-randomizing this solution is NP-hard in general, but can
be done for a broad class of statistical learning tasks. We apply our results
to robust neural network training and submodular optimization. We evaluate our
approach experimentally on corrupted character classification, and robust
influence maximization in networks.
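The flavor of the robust-to-Bayesian reduction can be sketched with a multiplicative-weights loop over the objectives; the oracle, the update rule, and the step size below are placeholders of this sketch rather than the paper's exact construction.

```python
import numpy as np

def robust_via_bayesian_oracle(objectives, oracle, rounds=100, eta=0.5):
    """Sketch of the robust-to-Bayesian reduction: keep multiplicative weights
    over the objectives, query an oracle for a solution that is good in
    expectation under the current weights, and return the whole sequence of
    solutions, to be played as a uniform distribution over solutions.
    Placeholder update rule and parameters; not the paper's exact scheme."""
    w = np.ones(len(objectives)) / len(objectives)
    solutions = []
    for _ in range(rounds):
        x = oracle(w)                                    # approx. best for the mixture
        solutions.append(x)
        values = np.array([f(x) for f in objectives])    # how well x does on each f
        w = w * np.exp(-eta * values)                    # shift mass to hard objectives
        w = w / w.sum()
    return solutions

# toy usage: maximize the minimum of two concave objectives over [0, 1]
objectives = [lambda x: x, lambda x: 1.0 - x]
grid = np.linspace(0.0, 1.0, 101)
oracle = lambda w: grid[np.argmax(sum(wi * f(grid) for wi, f in zip(w, objectives)))]
sols = robust_via_bayesian_oracle(objectives, oracle)
print("average solution (should be near 0.5):", np.mean(sols))
```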
The Limitations of Deep Learning in Adversarial Settings
Deep learning takes advantage of large datasets and computationally efficient
training algorithms to outperform other approaches at various machine learning
tasks. However, imperfections in the training phase of deep neural networks
make them vulnerable to adversarial samples: inputs crafted by adversaries with
the intent of causing deep neural networks to misclassify. In this work, we
formalize the space of adversaries against deep neural networks (DNNs) and
introduce a novel class of algorithms to craft adversarial samples based on a
precise understanding of the mapping between inputs and outputs of DNNs. In an
application to computer vision, we show that our algorithms can reliably
produce samples correctly classified by human subjects but misclassified in
specific targets by a DNN with a 97% adversarial success rate while only
modifying on average 4.02% of the input features per sample. We then evaluate
the vulnerability of different sample classes to adversarial perturbations by
defining a hardness measure. Finally, we describe preliminary work outlining
defenses against adversarial samples by defining a predictive measure of
distance between a benign input and a target classification.
Comment: Accepted to the 1st IEEE European Symposium on Security & Privacy, IEEE 2016, Saarbrucken, Germany
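For context on what "crafting adversarial samples" looks like in code, here is a minimal gradient-based targeted perturbation in PyTorch; it is a generic illustration, not the authors' algorithm, and the model, step size, and perturbation budget are arbitrary stand-ins.

```python
import torch
import torch.nn.functional as F

def targeted_perturbation(model, x, target_class, step=0.01, iters=50, eps=0.1):
    """Minimal targeted attack sketch: iteratively nudge the input along the
    negative gradient of the loss w.r.t. a chosen (wrong) target class,
    clipping the total perturbation to an l_inf ball of radius `eps`.
    (Illustration only; not the algorithm from the paper.)"""
    x_adv = x.clone().detach()
    target = torch.tensor([target_class])
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv - step * x_adv.grad.sign()          # move toward the target
            x_adv = x.clone() + (x_adv - x).clamp(-eps, eps)  # bound the perturbation
        x_adv = x_adv.detach()
    return x_adv

# toy usage with an untrained two-layer network on a random "image"
model = torch.nn.Sequential(torch.nn.Linear(784, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 10))
x = torch.rand(1, 784)
x_adv = targeted_perturbation(model, x, target_class=7)
print("predicted class after attack:", model(x_adv).argmax(dim=1).item())
```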
Function Norms and Regularization in Deep Networks
Deep neural networks (DNNs) have become increasingly important due to their
excellent empirical performance on a wide range of problems. However,
regularization is generally achieved by indirect means, largely due to the
complex set of functions defined by a network and the difficulty in measuring
function complexity. There exists no method in the literature for additive
regularization based on a norm of the function, as is classically considered in
statistical learning theory. In this work, we propose sampling-based
approximations to weighted function norms as regularizers for deep neural
networks. We provide, to the best of our knowledge, the first proof in the
literature of the NP-hardness of computing function norms of DNNs, motivating
the necessity of an approximate approach. We then derive a generalization bound
for functions trained with weighted norms and prove that a natural stochastic
optimization strategy minimizes the bound. Finally, we empirically validate the proposed regularization strategies for both convex function sets as well as DNNs on real-world classification and image segmentation tasks, demonstrating improved performance over weight decay,
dropout, and batch normalization. Source code will be released at the time of
publication.
Comment: 17 pages, 8 figures
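A minimal sketch of the sampling-based idea, assuming a weighted L2 function norm estimated by Monte Carlo over inputs drawn from a user-chosen sampling distribution; the particular norm, weighting, and sampler used in the paper may differ.

```python
import torch

def sampled_function_norm_sq(model, sampler, n_samples=256):
    """Monte Carlo estimate of a (squared) weighted L2 function norm,
    E_{x ~ mu}[ ||f(x)||^2 ], used as an additive regularizer.
    `sampler(n)` should return a batch of n inputs drawn from the weighting
    distribution mu (an assumption of this sketch, not the paper's choice)."""
    x = sampler(n_samples)
    return model(x).pow(2).sum(dim=1).mean()

# toy usage: add the sampled norm to an ordinary training loss
model = torch.nn.Sequential(torch.nn.Linear(20, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 3))
sampler = lambda n: torch.randn(n, 20)            # stand-in for mu
inputs, targets = torch.randn(64, 20), torch.randint(0, 3, (64,))
loss = torch.nn.functional.cross_entropy(model(inputs), targets)
loss = loss + 1e-3 * sampled_function_norm_sq(model, sampler)
loss.backward()
print("total regularized loss:", loss.item())
```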