Learning to Sample Hard Instances for Graph Algorithms
Hard instances, which require a long time for a specific algorithm to solve,
help (1) analyze the algorithm in order to accelerate it and (2) build a good
benchmark for evaluating the performance of algorithms. There have been several
efforts toward the automatic generation of hard instances; for example,
evolutionary algorithms have been used for this purpose. However, such methods
produce only a finite number of hard instances, and their merit is limited
because it is difficult to extract meaningful patterns from a small number of
instances. We instead seek a probabilistic generator of hard instances.
Once the generative distribution of hard instances is obtained, we can sample a
variety of hard instances to build a benchmark, and we can extract meaningful
patterns of hard instances from the samples. Existing methods for modeling the
hard-instance distribution rely on parameters or rules found by domain experts;
however, these are specific to each problem, making it challenging to model the
distribution in the general case. In this paper, we
focus on graph problems. We propose HiSampler, the hard instance sampler, to
model the hard instance distribution of graph algorithms. HiSampler makes it
possible to obtain the distribution of hard instances without hand-engineered
features. To the best of our knowledge, this is the first method to learn the
distribution of hard instances using machine learning. Through experiments, we
demonstrate that our proposed method can generate instances that are a few to
several orders of magnitude harder than those produced by a random baseline in
many settings. In particular, our method outperforms rule-based generators on
the 3-coloring problem.
Comment: 16 pages, 4 figures, accepted by ACML 201
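To make the idea concrete, here is a minimal sketch of how such a sampler could be set up: a per-edge Bernoulli distribution over graphs is tuned with a score-function (REINFORCE-style) gradient to maximize a hardness proxy, here the number of nodes a naive 3-coloring backtracker explores. The solver, the hardness proxy, and the estimator details are illustrative assumptions, not the paper's exact design.

```python
# A hypothetical sketch, not the paper's exact method: tune a per-edge
# Bernoulli distribution over graphs so that sampled graphs become hard
# for a naive exact 3-coloring backtracker.
import numpy as np

N = 7  # number of vertices; each edge is sampled independently

def sample_graph(theta, rng):
    """Sample a symmetric adjacency matrix; edge (i,j) appears w.p. sigmoid(theta[i,j])."""
    p = 1.0 / (1.0 + np.exp(-theta))
    upper = np.triu(rng.random((N, N)) < p, 1)  # only the upper triangle matters
    return upper | upper.T, p

def hardness(adj):
    """Hardness proxy: number of nodes a naive 3-coloring backtracker explores."""
    count = 0
    def backtrack(colors):
        nonlocal count
        count += 1
        v = len(colors)
        if v == N:
            return
        for c in range(3):
            if all(not adj[v, u] or colors[u] != c for u in range(v)):
                backtrack(colors + [c])
    backtrack([])
    return count

rng = np.random.default_rng(0)
theta = np.zeros((N, N))
for step in range(100):
    grads, scores = [], []
    for _ in range(8):  # Monte Carlo batch of sampled instances
        adj, p = sample_graph(theta, rng)
        scores.append(hardness(adj))
        grads.append(np.triu(adj - p, 1))  # d/d theta of the Bernoulli log-prob
    scores = np.array(scores, dtype=float)
    baseline = scores.mean()               # variance-reduction baseline
    g = sum((s - baseline) * dg for s, dg in zip(scores, grads)) / len(scores)
    theta += 0.05 * g                      # ascend expected hardness
print("mean hardness of final batch:", scores.mean())
```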
$\ell_1$-regularized Neural Networks are Improperly Learnable in Polynomial Time
We study the improper learning of multi-layer neural networks. Suppose that
the neural network to be learned has $k$ hidden layers and that the
$\ell_1$-norm of the incoming weights of any neuron is bounded by $L$. We
present a kernel-based method, such that with probability at least
$1-\delta$, it learns a predictor whose generalization error is at most
$\epsilon$ worse than that of the neural network. The sample complexity and
the time complexity of the presented method are polynomial in the input
dimension and in $(1/\epsilon,\ \log(1/\delta),\ F(k,L))$, where $F(k,L)$ is
a function depending on $(k,L)$ and on the activation function, independent
of the number of neurons. The algorithm applies to both sigmoid-like
activation functions and ReLU-like activation functions. It implies that any
sufficiently sparse neural network is learnable in polynomial time.
Comment: 16 pages
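A toy illustration of the improper-learning setup: the labels come from a small network with $\ell_1$-bounded incoming weights, but the learner outputs a kernel predictor rather than a network. The RBF kernel below is a stand-in assumption; the paper constructs a specific kernel tied to the activation function and the bound $L$.

```python
# A toy improper learner: kernel ridge regression (an RBF kernel as a
# stand-in) fits labels generated by a small l1-bounded neural network.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 400

# Ground-truth two-layer network whose incoming weights are l1-normalized.
W = rng.normal(size=(d, 3))
W /= np.abs(W).sum(axis=0)              # l1-norm of each neuron's weights = 1
v = np.array([0.5, -0.3, 0.4])
target = lambda X: np.tanh(X @ W) @ v

X = rng.normal(size=(n, d)); y = target(X)
Xte = rng.normal(size=(200, d)); yte = target(Xte)

def rbf(A, B, gamma=0.5):
    """Gaussian kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# Kernel ridge regression: the predictor lives in an RKHS, not in the
# class of neural networks -- this is what makes the learning 'improper'.
K = rbf(X, X)
alpha = np.linalg.solve(K + 1e-3 * np.eye(n), y)
pred = rbf(Xte, X) @ alpha
print("test MSE of the kernel predictor:", np.mean((pred - yte) ** 2))
```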
Motivating the Rules of the Game for Adversarial Example Research
Advances in machine learning have led to broad deployment of systems with
impressive performance on important problems. Nonetheless, these systems can be
induced to make errors on data that are surprisingly similar to examples the
learned system handles correctly. The existence of these errors raises a
variety of questions about out-of-sample generalization and whether bad actors
might use such examples to abuse deployed systems. As a result of these
security concerns, there has been a flurry of recent papers proposing
algorithms to defend against such malicious perturbations of correctly handled
examples. It is unclear how such misclassifications represent a different kind
of security problem than other errors, or even other attacker-produced examples
that have no specific relationship to an uncorrupted input. In this paper, we
argue that adversarial example defense papers have, to date, mostly considered
abstract, toy games that do not relate to any specific security concern.
Furthermore, defense papers have not yet precisely described all the abilities
and limitations of attackers that would be relevant in practical security.
Towards this end, we establish a taxonomy of motivations, constraints, and
abilities for more plausible adversaries. Finally, we provide a series of
recommendations outlining a path forward for future work to more clearly
articulate the threat model and perform more meaningful evaluation.
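For readers unfamiliar with the attack being debated, the sketch below applies the fast gradient sign method (FGSM), a standard construction from earlier work and not a contribution of this paper, to a linear classifier; it shows how a small, structured perturbation erodes the margin of a correctly handled input.

```python
# FGSM on a linear classifier: a standard construction from earlier work,
# shown here only to make the notion of 'malicious perturbation' concrete.
import numpy as np

rng = np.random.default_rng(0)
w, b = rng.normal(size=16), 0.0           # a fixed linear model
x = rng.normal(size=16)                   # a correctly handled input
y = 1.0 if w @ x + b > 0 else -1.0        # its (correct) label

# The margin loss decreases in y * (w @ x); its input gradient is -y * w,
# so FGSM perturbs each coordinate by eps in the loss-increasing direction.
eps = 0.25
x_adv = x + eps * np.sign(-y * w)

print("clean margin:      ", y * (w @ x + b))
print("adversarial margin:", y * (w @ x_adv + b))  # drops by eps * ||w||_1
```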
On the Learnability of Deep Random Networks
In this paper we study the learnability of deep random networks from both
theoretical and practical points of view. On the theoretical front, we show
that the learnability of random deep networks with sign activation drops
exponentially with depth. On the practical front, we find that learnability
drops sharply with depth even with state-of-the-art training methods,
suggesting that our stylized theoretical results are close to reality.
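The following small experiment is in the spirit of the practical finding: labels are produced by a random deep network with sign activations, and a simple learner (logistic regression here, a stand-in for the paper's training setups) is fit at several teacher depths; test accuracy should decay toward chance as depth grows.

```python
# Fit a simple learner to labels from random deep sign networks of
# increasing depth; accuracy is expected to decay toward chance (0.5).
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 2000

def random_teacher(depth):
    """A random network with sign activations and a linear readout."""
    Ws = [rng.normal(size=(d, d)) for _ in range(depth)]
    w_out = rng.normal(size=d)
    def label(X):
        H = X
        for W in Ws:
            H = np.sign(H @ W)
        return (H @ w_out > 0).astype(float)
    return label

for depth in [1, 2, 4, 8]:
    teacher = random_teacher(depth)
    X, Xte = rng.normal(size=(n, d)), rng.normal(size=(n, d))
    y, yte = teacher(X), teacher(Xte)
    w = np.zeros(d)
    for _ in range(500):                  # full-batch gradient descent
        p = 1 / (1 + np.exp(-(X @ w)))    # on the logistic loss
        w -= 0.1 * X.T @ (p - y) / n
    acc = ((Xte @ w > 0) == (yte > 0)).mean()
    print(f"teacher depth {depth}: test accuracy {acc:.2f}")
```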
Learning Halfspaces and Neural Networks with Random Initialization
We study non-convex empirical risk minimization for learning halfspaces and
neural networks. For loss functions that are $L$-Lipschitz continuous, we
present algorithms to learn halfspaces and multi-layer neural networks that
achieve arbitrarily small excess risk $\epsilon > 0$. The time complexity is
polynomial in the input dimension $d$ and the sample size $n$, but exponential
in the quantity $(L/\epsilon^2)\log(L/\epsilon)$. These algorithms run multiple
rounds of random initialization followed by arbitrary optimization steps. We
further show that if the data is separable by some neural network with constant
margin $\gamma > 0$, then there is a polynomial-time algorithm for learning a
neural network that separates the training data with margin $\Omega(\gamma)$.
As a consequence, the algorithm achieves arbitrary generalization error
$\epsilon > 0$ with $\mathrm{poly}(d, 1/\epsilon)$ sample and time complexity.
We establish the same learnability result when the labels are randomly flipped
with probability $\eta < 1/2$.
Comment: 31 pages
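The algorithmic template from the abstract, multiple rounds of random initialization each followed by local optimization with the best round kept, in a minimal form for a halfspace under logistic loss; the round count and step sizes are illustrative choices rather than the values the analysis prescribes.

```python
# Many rounds of random initialization + local optimization, keeping the
# best round -- here for a halfspace with logistic loss on separable data.
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 500
w_star = rng.normal(size=d); w_star /= np.linalg.norm(w_star)
X = rng.normal(size=(n, d))
y = (X @ w_star > 0).astype(float)        # labels from a true halfspace

def empirical_risk(w):
    z = X @ w
    return np.mean(np.log1p(np.exp(-z)) + (1 - y) * z)  # logistic loss

best_w, best_risk = None, np.inf
for _ in range(20):                       # multiple rounds of random init
    w = rng.normal(size=d)
    for _ in range(200):                  # 'arbitrary optimization steps'
        p = 1 / (1 + np.exp(-(X @ w)))
        w -= 0.5 * X.T @ (p - y) / n      # gradient step on logistic loss
    r = empirical_risk(w)
    if r < best_risk:
        best_w, best_risk = w, r          # keep the best round

print("best empirical risk:", best_risk)
print("training accuracy of best round:", ((X @ best_w > 0) == (y > 0)).mean())
```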
Mixing Complexity and its Applications to Neural Networks
We suggest analyzing neural networks through the prism of space constraints.
We observe that most training algorithms applied in practice use bounded
memory, which enables us to use a new notion introduced in the study of
space-time tradeoffs that we call mixing complexity. This notion was devised in
order to measure the (in)ability to learn using a bounded-memory algorithm. In
this paper we describe how we use mixing complexity to obtain new results on
what can and cannot be learned using neural networks.
Revised Note on Learning Algorithms for Quadratic Assignment with Graph Neural Networks
Inverse problems correspond to a certain type of optimization problem
formulated over appropriate input distributions. Recently, there has been a
growing interest in understanding the computational hardness of these
optimization problems, not only in the worst case, but in an average-complexity
sense under this same input distribution.
In this revised note, we are interested in studying another aspect of
hardness, related to the ability to learn how to solve a problem by simply
observing a collection of previously solved instances. These 'planted
solutions' are used to supervise the training of an appropriate predictive
model that parametrizes a broad class of algorithms, with the hope that the
resulting model will provide good accuracy-complexity tradeoffs in the average
sense.
We illustrate this setup on the Quadratic Assignment Problem, a fundamental
problem in Network Science. We observe that data-driven models based on Graph
Neural Networks offer intriguingly good performance, even in regimes where
standard relaxation-based techniques appear to suffer.
Comment: Revised note to arXiv:1706.07450v1 that appeared in the IEEE Data
Science Workshop 201
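A minimal sketch of the planted-solution setup described above: draw a graph A, a hidden permutation, and a noisy permuted copy B; the pair (A, B) is the input and the permutation the supervision target for a learned matching model. The GNN itself is omitted; a weak degree-sorting baseline stands in to show how such instances would be consumed, and the noise model is an illustrative assumption.

```python
# Generate a planted instance (A, B, perm) and consume it with a weak
# degree-sorting baseline; a GNN matcher would be trained on many such
# triples. Noise model and sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, p_edge, p_noise = 12, 0.3, 0.05

def planted_instance():
    A = np.triu(rng.random((n, n)) < p_edge, 1)
    A = (A | A.T).astype(float)           # random symmetric graph
    perm = rng.permutation(n)             # hidden planted solution
    P = np.eye(n)[perm]
    B = P @ A @ P.T                       # permuted copy: B[i,j] = A[perm[i],perm[j]]
    flips = np.triu(rng.random((n, n)) < p_noise, 1)
    B = np.abs(B - (flips | flips.T))     # flip a few edges as noise
    return A, B, perm

A, B, perm = planted_instance()

# Baseline: align nodes of A and B by sorted degree (ties and noise hurt it,
# which is exactly the regime where a learned model could do better).
perm_hat = np.empty(n, dtype=int)
perm_hat[np.argsort(B.sum(1))] = np.argsort(A.sum(1))
print("fraction of nodes matched correctly:", (perm_hat == perm).mean())
```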
Transfer learning for vision-based tactile sensing
Due to the complexity of modeling the elastic properties of materials, the
use of machine learning algorithms is continuously increasing for tactile
sensing applications. Recent advances in deep neural networks applied to
computer vision make vision-based tactile sensors very appealing for their
high-resolution and low cost. A soft optical tactile sensor that is scalable to
large surfaces with arbitrary shape is discussed in this paper. A supervised
learning algorithm trains a model that is able to reconstruct the normal force
distribution on the sensor's surface, purely from the images recorded by an
internal camera. In order to reduce the training times and the need for large
datasets, a calibration procedure is proposed to transfer the acquired
knowledge across multiple sensors while maintaining satisfactory performance.
Comment: Accompanying video: https://youtu.be/CdYK5I6Scc
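A schematic of the transfer idea, with linear models standing in for the paper's deep network and with synthetic data: a model trained on one sensor's large dataset is reused on a second sensor by refitting only a small output recalibration on a few samples, instead of retraining from scratch. The exact calibration mechanism here is an assumption for illustration.

```python
# Transfer across sensors with a cheap recalibration instead of retraining;
# linear stand-ins for the deep model, synthetic data throughout.
import numpy as np

rng = np.random.default_rng(0)
d_img, d_force, n_big, n_small = 64, 16, 1000, 40

def make_sensor(seed):
    """Synthetic sensor: images depend on force maps through a hidden linear map."""
    r = np.random.default_rng(seed)
    M = r.normal(size=(d_img, d_force))
    def dataset(n):
        F = r.normal(size=(n, d_force))                 # ground-truth force maps
        I = F @ M.T + 0.01 * r.normal(size=(n, d_img))  # rendered "images"
        return I, F
    return dataset

def ridge(X, Y, lam=1e-2):
    """Ridge regression: W minimizing ||X W - Y||^2 + lam ||W||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

sensor_a, sensor_b = make_sensor(1), make_sensor(2)

Ia, Fa = sensor_a(n_big)
W = ridge(Ia, Fa)                    # image -> force model trained on sensor A

Ib, Fb = sensor_b(n_small)           # only a small calibration set for sensor B
G = ridge(Ib @ W, Fb)                # cheap d_force x d_force recalibration

Ite, Fte = sensor_b(200)
print("transfer + calibration MSE:   ", np.mean((Ite @ W @ G - Fte) ** 2))
print("from-scratch on small data MSE:", np.mean((Ite @ ridge(Ib, Fb) - Fte) ** 2))
```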
Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs
Deep learning models are often successfully trained using gradient descent,
despite the worst case hardness of the underlying non-convex optimization
problem. The key question is then under what conditions can one prove that
optimization will succeed. Here we provide a strong result of this kind. We
consider a neural net with one hidden layer and a convolutional structure with
no overlap and a ReLU activation function. For this architecture we show that
learning is NP-complete in the general case, but that when the input
distribution is Gaussian, gradient descent converges to the global optimum in
polynomial time. To the best of our knowledge, this is the first global
optimality guarantee of gradient descent on a convolutional neural network with
ReLU activations.
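The analyzed architecture in miniature: inputs split into non-overlapping patches, one shared ReLU filter applied to each patch, outputs averaged; labels come from a planted filter, inputs are Gaussian, and plain gradient descent is run on the squared loss. The learning rate, sample sizes, and planted-teacher setup are illustrative choices.

```python
# One shared ReLU filter over non-overlapping patches with average pooling,
# trained by gradient descent on fresh Gaussian inputs against a planted
# filter w_star.
import numpy as np

rng = np.random.default_rng(0)
k, m = 4, 5                        # k non-overlapping patches of size m
w_star = rng.normal(size=m)        # planted ground-truth filter

def net(X, w):
    patches = X.reshape(len(X), k, m)
    return np.maximum(patches @ w, 0).mean(axis=1)  # shared ReLU, averaged

w = 0.1 * rng.normal(size=m)       # small random initialization
for step in range(5000):
    X = rng.normal(size=(256, k * m))               # fresh Gaussian inputs
    err = net(X, w) - net(X, w_star)                # residual per example
    patches = X.reshape(-1, k, m)
    mask = (patches @ w > 0)[:, :, None]            # ReLU active set
    grad = (mask * patches * err[:, None, None]).mean(axis=(0, 1))
    w -= 0.2 * grad                                 # plain gradient descent

print("distance to planted filter:", np.linalg.norm(w - w_star))
```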
On the Computational Efficiency of Training Neural Networks
It is well-known that neural networks are computationally hard to train. On
the other hand, in practice, modern day neural networks are trained efficiently
using SGD and a variety of tricks that include different activation functions
(e.g. ReLU), over-specification (i.e., train networks which are larger than
needed), and regularization. In this paper we revisit the computational
complexity of training neural networks from a modern perspective. We provide
both positive and negative results, some of which yield new provably efficient
and practical algorithms for training certain types of neural networks.
Comment: Section 2 is revised due to a mistake
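As a quick illustration of the over-specification trick mentioned above: fitting the same data with a minimally sized ReLU network often gets stuck in poor solutions, while a larger-than-needed network trains reliably with plain gradient descent. The widths, learning rate, and toy dataset are illustrative choices, not drawn from the paper.

```python
# Same data, two widths: a minimal ReLU network vs. an over-specified one;
# the larger network tends to reach high accuracy across restarts.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)   # XOR-like labels, not linearly separable

def train(width, seed, steps=4000, lr=0.1):
    r = np.random.default_rng(seed)
    W1 = r.normal(size=(2, width)); b1 = np.zeros(width)
    w2 = r.normal(size=width) / np.sqrt(width)
    for _ in range(steps):                  # full-batch gradient descent
        H = np.maximum(X @ W1 + b1, 0)      # ReLU hidden layer
        p = 1 / (1 + np.exp(-(H @ w2)))
        g = p - y                           # logistic-loss gradient at the output
        gH = np.outer(g, w2) * (H > 0)      # backprop through the ReLU
        W1 -= lr * X.T @ gH / len(X)
        b1 -= lr * gH.mean(axis=0)
        w2 -= lr * H.T @ g / len(X)
    H = np.maximum(X @ W1 + b1, 0)
    return ((H @ w2 > 0) == (y > 0)).mean() # training accuracy

for width in (2, 20):
    accs = [train(width, seed) for seed in range(5)]
    print(f"width {width:2d}: accuracy over 5 restarts = {np.round(accs, 2)}")
```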