Learning to Sample Hard Instances for Graph Algorithms
Hard instances, which require a long time for a specific algorithm to solve,
help (1) analyze the algorithm in order to accelerate it and (2) build a good
benchmark for evaluating the performance of algorithms. There have been several
efforts toward the automatic generation of hard instances; for example,
evolutionary algorithms have been used for this purpose. However, such methods
produce only a finite number of hard instances, and their merit is limited
because it is difficult to extract meaningful patterns from a small number of
instances. We instead seek a probabilistic generator of hard instances.
Once the generative distribution of hard instances is obtained, we can sample a
variety of hard instances to build a benchmark, and we can extract meaningful
patterns of hard instances from the samples. Existing methods for modeling the
hard-instance distribution rely on parameters or rules found by domain experts;
however, these are specific to each problem, making it challenging to model the
distribution in the general case. In this paper, we
focus on graph problems. We propose HiSampler, the hard instance sampler, to
model the hard instance distribution of graph algorithms. HiSampler makes it
possible to obtain the distribution of hard instances without hand-engineered
features. To the best of our knowledge, this is the first method to learn the
distribution of hard instances using machine learning. Through experiments, we
demonstrate that our proposed method can generate instances that are a few to
several orders of magnitude harder than those produced by a random baseline in
many settings. In particular, our method outperforms rule-based generators on
the 3-coloring problem.
Comment: 16 pages, 4 figures, accepted by ACML 201
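To make the idea concrete, here is a minimal sketch of how such a sampler could be set up: a per-edge Bernoulli distribution over graphs is tuned with a score-function (REINFORCE-style) gradient to maximize a hardness proxy, here the number of nodes a naive 3-coloring backtracker explores. The solver, the hardness proxy, and the estimator details are illustrative assumptions, not the paper's exact design.

```python
# A hypothetical sketch, not the paper's exact method: tune a per-edge
# Bernoulli distribution over graphs so that sampled graphs become hard
# for a naive exact 3-coloring backtracker.
import numpy as np

N = 7  # number of vertices; each edge is sampled independently

def sample_graph(theta, rng):
    """Sample a symmetric adjacency matrix; edge (i,j) appears w.p. sigmoid(theta[i,j])."""
    p = 1.0 / (1.0 + np.exp(-theta))
    upper = np.triu(rng.random((N, N)) < p, 1)  # only the upper triangle matters
    return upper | upper.T, p

def hardness(adj):
    """Hardness proxy: number of nodes a naive 3-coloring backtracker explores."""
    count = 0
    def backtrack(colors):
        nonlocal count
        count += 1
        v = len(colors)
        if v == N:
            return
        for c in range(3):
            if all(not adj[v, u] or colors[u] != c for u in range(v)):
                backtrack(colors + [c])
    backtrack([])
    return count

rng = np.random.default_rng(0)
theta = np.zeros((N, N))
for step in range(100):
    grads, scores = [], []
    for _ in range(8):  # Monte Carlo batch of sampled instances
        adj, p = sample_graph(theta, rng)
        scores.append(hardness(adj))
        grads.append(np.triu(adj - p, 1))  # d/d theta of the Bernoulli log-prob
    scores = np.array(scores, dtype=float)
    baseline = scores.mean()               # variance-reduction baseline
    g = sum((s - baseline) * dg for s, dg in zip(scores, grads)) / len(scores)
    theta += 0.05 * g                      # ascend expected hardness
print("mean hardness of final batch:", scores.mean())
```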
$\ell_1$-regularized Neural Networks are Improperly Learnable in Polynomial Time
We study the improper learning of multi-layer neural networks. Suppose that
the neural network to be learned has $k$ hidden layers and that the
$\ell_1$-norm of the incoming weights of any neuron is bounded by $L$. We
present a kernel-based method, such that with probability at least
$1-\delta$, it learns a predictor whose generalization error is at most
$\epsilon$ worse than that of the neural network. The sample complexity and
the time complexity of the presented method are polynomial in the input
dimension and in $(1/\epsilon,\ \log(1/\delta),\ F(k,L))$, where $F(k,L)$ is
a function depending on $(k,L)$ and on the activation function, independent
of the number of neurons. The algorithm applies to both sigmoid-like
activation functions and ReLU-like activation functions. It implies that any
sufficiently sparse neural network is learnable in polynomial time.
Comment: 16 pages
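A toy illustration of the improper-learning setup: the labels come from a small network with $\ell_1$-bounded incoming weights, but the learner outputs a kernel predictor rather than a network. The RBF kernel below is a stand-in assumption; the paper constructs a specific kernel tied to the activation function and the bound $L$.

```python
# A toy improper learner: kernel ridge regression (an RBF kernel as a
# stand-in) fits labels generated by a small l1-bounded neural network.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 400

# Ground-truth two-layer network whose incoming weights are l1-normalized.
W = rng.normal(size=(d, 3))
W /= np.abs(W).sum(axis=0)              # l1-norm of each neuron's weights = 1
v = np.array([0.5, -0.3, 0.4])
target = lambda X: np.tanh(X @ W) @ v

X = rng.normal(size=(n, d)); y = target(X)
Xte = rng.normal(size=(200, d)); yte = target(Xte)

def rbf(A, B, gamma=0.5):
    """Gaussian kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# Kernel ridge regression: the predictor lives in an RKHS, not in the
# class of neural networks -- this is what makes the learning 'improper'.
K = rbf(X, X)
alpha = np.linalg.solve(K + 1e-3 * np.eye(n), y)
pred = rbf(Xte, X) @ alpha
print("test MSE of the kernel predictor:", np.mean((pred - yte) ** 2))
```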
Motivating the Rules of the Game for Adversarial Example Research
Advances in machine learning have led to broad deployment of systems with
impressive performance on important problems. Nonetheless, these systems can be
induced to make errors on data that are surprisingly similar to examples the
learned system handles correctly. The existence of these errors raises a
variety of questions about out-of-sample generalization and whether bad actors
might use such examples to abuse deployed systems. As a result of these
security concerns, there has been a flurry of recent papers proposing
algorithms to defend against such malicious perturbations of correctly handled
examples. It is unclear how such misclassifications represent a different kind
of security problem than other errors, or even other attacker-produced examples
that have no specific relationship to an uncorrupted input. In this paper, we
argue that adversarial example defense papers have, to date, mostly considered
abstract, toy games that do not relate to any specific security concern.
Furthermore, defense papers have not yet precisely described all the abilities
and limitations of attackers that would be relevant in practical security.
Towards this end, we establish a taxonomy of motivations, constraints, and
abilities for more plausible adversaries. Finally, we provide a series of
recommendations outlining a path forward for future work to more clearly
articulate the threat model and perform more meaningful evaluation.
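For readers unfamiliar with the attack being debated, the sketch below applies the fast gradient sign method (FGSM), a standard construction from earlier work and not a contribution of this paper, to a linear classifier; it shows how a small, structured perturbation erodes the margin of a correctly handled input.

```python
# FGSM on a linear classifier: a standard construction from earlier work,
# shown here only to make the notion of 'malicious perturbation' concrete.
import numpy as np

rng = np.random.default_rng(0)
w, b = rng.normal(size=16), 0.0           # a fixed linear model
x = rng.normal(size=16)                   # a correctly handled input
y = 1.0 if w @ x + b > 0 else -1.0        # its (correct) label

# The margin loss decreases in y * (w @ x); its input gradient is -y * w,
# so FGSM perturbs each coordinate by eps in the loss-increasing direction.
eps = 0.25
x_adv = x + eps * np.sign(-y * w)

print("clean margin:      ", y * (w @ x + b))
print("adversarial margin:", y * (w @ x_adv + b))  # drops by eps * ||w||_1
```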
On the Learnability of Deep Random Networks
In this paper we study the learnability of deep random networks from both
theoretical and practical points of view. On the theoretical front, we show
that the learnability of random deep networks with sign activation drops
exponentially with depth. On the practical front, we find that learnability
drops sharply with depth even with state-of-the-art training methods,
suggesting that our stylized theoretical results are close to reality.
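The following small experiment is in the spirit of the practical finding: labels are produced by a random deep network with sign activations, and a simple learner (logistic regression here, a stand-in for the paper's training setups) is fit at several teacher depths; test accuracy should decay toward chance as depth grows.

```python
# Fit a simple learner to labels from random deep sign networks of
# increasing depth; accuracy is expected to decay toward chance (0.5).
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 2000

def random_teacher(depth):
    """A random network with sign activations and a linear readout."""
    Ws = [rng.normal(size=(d, d)) for _ in range(depth)]
    w_out = rng.normal(size=d)
    def label(X):
        H = X
        for W in Ws:
            H = np.sign(H @ W)
        return (H @ w_out > 0).astype(float)
    return label

for depth in [1, 2, 4, 8]:
    teacher = random_teacher(depth)
    X, Xte = rng.normal(size=(n, d)), rng.normal(size=(n, d))
    y, yte = teacher(X), teacher(Xte)
    w = np.zeros(d)
    for _ in range(500):                  # full-batch gradient descent
        p = 1 / (1 + np.exp(-(X @ w)))    # on the logistic loss
        w -= 0.1 * X.T @ (p - y) / n
    acc = ((Xte @ w > 0) == (yte > 0)).mean()
    print(f"teacher depth {depth}: test accuracy {acc:.2f}")
```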
Learning Halfspaces and Neural Networks with Random Initialization
We study non-convex empirical risk minimization for learning halfspaces and
neural networks. For loss functions that are $L$-Lipschitz continuous, we
present algorithms to learn halfspaces and multi-layer neural networks that
achieve arbitrarily small excess risk $\epsilon > 0$. The time complexity is
polynomial in the input dimension $d$ and the sample size $n$, but exponential
in the quantity $(L/\epsilon^2)\log(L/\epsilon)$. These algorithms run multiple
rounds of random initialization followed by arbitrary optimization steps. We
further show that if the data is separable by some neural network with constant
margin $\gamma > 0$, then there is a polynomial-time algorithm for learning a
neural network that separates the training data with margin $\Omega(\gamma)$.
As a consequence, the algorithm achieves arbitrary generalization error
$\epsilon > 0$ with $\mathrm{poly}(d, 1/\epsilon)$ sample and time complexity.
We establish the same learnability result when the labels are randomly flipped
with probability $\eta < 1/2$.
Comment: 31 pages
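The algorithmic template from the abstract, multiple rounds of random initialization each followed by local optimization with the best round kept, in a minimal form for a halfspace under logistic loss; the round count and step sizes are illustrative choices rather than the values the analysis prescribes.

```python
# Many rounds of random initialization + local optimization, keeping the
# best round -- here for a halfspace with logistic loss on separable data.
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 500
w_star = rng.normal(size=d); w_star /= np.linalg.norm(w_star)
X = rng.normal(size=(n, d))
y = (X @ w_star > 0).astype(float)        # labels from a true halfspace

def empirical_risk(w):
    z = X @ w
    return np.mean(np.log1p(np.exp(-z)) + (1 - y) * z)  # logistic loss

best_w, best_risk = None, np.inf
for _ in range(20):                       # multiple rounds of random init
    w = rng.normal(size=d)
    for _ in range(200):                  # 'arbitrary optimization steps'
        p = 1 / (1 + np.exp(-(X @ w)))
        w -= 0.5 * X.T @ (p - y) / n      # gradient step on logistic loss
    r = empirical_risk(w)
    if r < best_risk:
        best_w, best_risk = w, r          # keep the best round

print("best empirical risk:", best_risk)
print("training accuracy of best round:", ((X @ best_w > 0) == (y > 0)).mean())
```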
Mixing Complexity and its Applications to Neural Networks
We suggest analyzing neural networks through the prism of space constraints.
We observe that most training algorithms applied in practice use bounded
memory, which enables us to use a new notion introduced in the study of
space-time tradeoffs that we call mixing complexity. This notion was devised in
order to measure the (in)ability to learn using a bounded-memory algorithm. In
this paper we describe how we use mixing complexity to obtain new results on
what can and cannot be learned using neural networks.
Revised Note on Learning Algorithms for Quadratic Assignment with Graph Neural Networks
Inverse problems correspond to a certain type of optimization problem
formulated over appropriate input distributions. Recently, there has been a
growing interest in understanding the computational hardness of these
optimization problems, not only in the worst case, but in an average-complexity
sense under this same input distribution.
In this revised note, we are interested in studying another aspect of
hardness, related to the ability to learn how to solve a problem by simply
observing a collection of previously solved instances. These 'planted
solutions' are used to supervise the training of an appropriate predictive
model that parametrizes a broad class of algorithms, with the hope that the
resulting model will provide good accuracy-complexity tradeoffs in the average
sense.
We illustrate this setup on the Quadratic Assignment Problem, a fundamental
problem in Network Science. We observe that data-driven models based on Graph
Neural Networks offer intriguingly good performance, even in regimes where
standard relaxation-based techniques appear to suffer.
Comment: Revised note to arXiv:1706.07450v1 that appeared in the IEEE Data
Science Workshop 201
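A minimal sketch of the planted-solution setup described above: draw a graph A, a hidden permutation, and a noisy permuted copy B; the pair (A, B) is the input and the permutation the supervision target for a learned matching model. The GNN itself is omitted; a weak degree-sorting baseline stands in to show how such instances would be consumed, and the noise model is an illustrative assumption.

```python
# Generate a planted instance (A, B, perm) and consume it with a weak
# degree-sorting baseline; a GNN matcher would be trained on many such
# triples. Noise model and sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, p_edge, p_noise = 12, 0.3, 0.05

def planted_instance():
    A = np.triu(rng.random((n, n)) < p_edge, 1)
    A = (A | A.T).astype(float)           # random symmetric graph
    perm = rng.permutation(n)             # hidden planted solution
    P = np.eye(n)[perm]
    B = P @ A @ P.T                       # permuted copy: B[i,j] = A[perm[i],perm[j]]
    flips = np.triu(rng.random((n, n)) < p_noise, 1)
    B = np.abs(B - (flips | flips.T))     # flip a few edges as noise
    return A, B, perm

A, B, perm = planted_instance()

# Baseline: align nodes of A and B by sorted degree (ties and noise hurt it,
# which is exactly the regime where a learned model could do better).
perm_hat = np.empty(n, dtype=int)
perm_hat[np.argsort(B.sum(1))] = np.argsort(A.sum(1))
print("fraction of nodes matched correctly:", (perm_hat == perm).mean())
```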
Transfer learning for vision-based tactile sensing
Due to the complexity of modeling the elastic properties of materials, the
use of machine learning algorithms is continuously increasing for tactile
sensing applications. Recent advances in deep neural networks applied to
computer vision make vision-based tactile sensors very appealing for their
high-resolution and low cost. A soft optical tactile sensor that is scalable to
large surfaces with arbitrary shape is discussed in this paper. A supervised
learning algorithm trains a model that is able to reconstruct the normal force
distribution on the sensor's surface, purely from the images recorded by an
internal camera. In order to reduce the training times and the need for large
datasets, a calibration procedure is proposed to transfer the acquired
knowledge across multiple sensors while maintaining satisfactory performance.
Comment: Accompanying video: https://youtu.be/CdYK5I6Scc
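A schematic of the transfer idea, with linear models standing in for the paper's deep network and with synthetic data: a model trained on one sensor's large dataset is reused on a second sensor by refitting only a small output recalibration on a few samples, instead of retraining from scratch. The exact calibration mechanism here is an assumption for illustration.

```python
# Transfer across sensors with a cheap recalibration instead of retraining;
# linear stand-ins for the deep model, synthetic data throughout.
import numpy as np

rng = np.random.default_rng(0)
d_img, d_force, n_big, n_small = 64, 16, 1000, 40

def make_sensor(seed):
    """Synthetic sensor: images depend on force maps through a hidden linear map."""
    r = np.random.default_rng(seed)
    M = r.normal(size=(d_img, d_force))
    def dataset(n):
        F = r.normal(size=(n, d_force))                 # ground-truth force maps
        I = F @ M.T + 0.01 * r.normal(size=(n, d_img))  # rendered "images"
        return I, F
    return dataset

def ridge(X, Y, lam=1e-2):
    """Ridge regression: W minimizing ||X W - Y||^2 + lam ||W||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

sensor_a, sensor_b = make_sensor(1), make_sensor(2)

Ia, Fa = sensor_a(n_big)
W = ridge(Ia, Fa)                    # image -> force model trained on sensor A

Ib, Fb = sensor_b(n_small)           # only a small calibration set for sensor B
G = ridge(Ib @ W, Fb)                # cheap d_force x d_force recalibration

Ite, Fte = sensor_b(200)
print("transfer + calibration MSE:   ", np.mean((Ite @ W @ G - Fte) ** 2))
print("from-scratch on small data MSE:", np.mean((Ite @ ridge(Ib, Fb) - Fte) ** 2))
```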
Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs
Deep learning models are often successfully trained using gradient descent,
despite the worst case hardness of the underlying non-convex optimization
problem. The key question is then under what conditions can one prove that
optimization will succeed. Here we provide a strong result of this kind. We
consider a neural net with one hidden layer and a convolutional structure with
no overlap and a ReLU activation function. For this architecture we show that
learning is NP-complete in the general case, but that when the input
distribution is Gaussian, gradient descent converges to the global optimum in
polynomial time. To the best of our knowledge, this is the first global
optimality guarantee of gradient descent on a convolutional neural network with
ReLU activations.
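The analyzed architecture in miniature: inputs split into non-overlapping patches, one shared ReLU filter applied to each patch, outputs averaged; labels come from a planted filter, inputs are Gaussian, and plain gradient descent is run on the squared loss. The learning rate, sample sizes, and planted-teacher setup are illustrative choices.

```python
# One shared ReLU filter over non-overlapping patches with average pooling,
# trained by gradient descent on fresh Gaussian inputs against a planted
# filter w_star.
import numpy as np

rng = np.random.default_rng(0)
k, m = 4, 5                        # k non-overlapping patches of size m
w_star = rng.normal(size=m)        # planted ground-truth filter

def net(X, w):
    patches = X.reshape(len(X), k, m)
    return np.maximum(patches @ w, 0).mean(axis=1)  # shared ReLU, averaged

w = 0.1 * rng.normal(size=m)       # small random initialization
for step in range(5000):
    X = rng.normal(size=(256, k * m))               # fresh Gaussian inputs
    err = net(X, w) - net(X, w_star)                # residual per example
    patches = X.reshape(-1, k, m)
    mask = (patches @ w > 0)[:, :, None]            # ReLU active set
    grad = (mask * patches * err[:, None, None]).mean(axis=(0, 1))
    w -= 0.2 * grad                                 # plain gradient descent

print("distance to planted filter:", np.linalg.norm(w - w_star))
```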
On the Computational Efficiency of Training Neural Networks
It is well-known that neural networks are computationally hard to train. On
the other hand, in practice, modern day neural networks are trained efficiently
using SGD and a variety of tricks that include different activation functions
(e.g. ReLU), over-specification (i.e., train networks which are larger than
needed), and regularization. In this paper we revisit the computational
complexity of training neural networks from a modern perspective. We provide
both positive and negative results, some of which yield new provably efficient
and practical algorithms for training certain types of neural networks.
Comment: Section 2 is revised due to a mistake
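As a quick illustration of the over-specification trick mentioned above: fitting the same data with a minimally sized ReLU network often gets stuck in poor solutions, while a larger-than-needed network trains reliably with plain gradient descent. The widths, learning rate, and toy dataset are illustrative choices, not drawn from the paper.

```python
# Same data, two widths: a minimal ReLU network vs. an over-specified one;
# the larger network tends to reach high accuracy across restarts.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)   # XOR-like labels, not linearly separable

def train(width, seed, steps=4000, lr=0.1):
    r = np.random.default_rng(seed)
    W1 = r.normal(size=(2, width)); b1 = np.zeros(width)
    w2 = r.normal(size=width) / np.sqrt(width)
    for _ in range(steps):                  # full-batch gradient descent
        H = np.maximum(X @ W1 + b1, 0)      # ReLU hidden layer
        p = 1 / (1 + np.exp(-(H @ w2)))
        g = p - y                           # logistic-loss gradient at the output
        gH = np.outer(g, w2) * (H > 0)      # backprop through the ReLU
        W1 -= lr * X.T @ gH / len(X)
        b1 -= lr * gH.mean(axis=0)
        w2 -= lr * H.T @ g / len(X)
    H = np.maximum(X @ W1 + b1, 0)
    return ((H @ w2 > 0) == (y > 0)).mean() # training accuracy

for width in (2, 20):
    accs = [train(width, seed) for seed in range(5)]
    print(f"width {width:2d}: accuracy over 5 restarts = {np.round(accs, 2)}")
```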