Tight Hardness Results for Training Depth-2 ReLU Networks
We prove several hardness results for training depth-2 neural networks with
the ReLU activation function; these networks are simply weighted sums (that may
include negative coefficients) of ReLUs. Our goal is to output a depth-2 neural
network that minimizes the square loss with respect to a given training set. We
prove that this problem is NP-hard already for a network with a single ReLU. We
also prove NP-hardness for outputting a weighted sum of k ReLUs minimizing
the squared error (for k > 1) even in the realizable setting (i.e., when the
labels are consistent with an unknown depth-2 ReLU network). We are also able
to obtain lower bounds on the running time in terms of the desired additive
error ε. To obtain our lower bounds, we use the Gap Exponential Time
Hypothesis (Gap-ETH) as well as a new hypothesis regarding the hardness of
approximating the well-known Densest k-Subgraph problem in
subexponential time (these hypotheses are used separately in proving different
lower bounds). For example, we prove that under reasonable hardness
assumptions, any proper learning algorithm for finding the best fitting ReLU
must run in time exponential in 1/ε. Together with previous work on
improperly learning a ReLU (Goel et al., COLT '17), this implies the
first separation between proper and improper algorithms for learning a ReLU. We
also study the problem of properly learning a depth-2 network of ReLUs with
bounded weights, giving new (worst-case) upper bounds on the running time needed
to learn such networks in both the realizable and agnostic settings. Our upper
bounds on the running time essentially match our lower bounds in terms of the
dependency on ε.
Comment: To appear in ITCS'2
Pre-Reduction Graph Products: Hardnesses of Properly Learning DFAs and Approximating EDP on DAGs
The study of graph products is a major research topic and typically concerns
the term f(G * H), e.g., to show that f(G * H) = f(G) f(H). In this paper, we
study graph products in a non-standard form f(R[G * H]) where R is a
"reduction", a transformation of any graph into an instance of an intended
optimization problem. We resolve some open problems as applications.
(1) A tight n^(1-ε)-approximation hardness for the minimum
consistent deterministic finite automaton (DFA) problem, where n is the
sample size. Due to Board and Pitt [Theoretical Computer Science 1992], this
implies the hardness of properly learning DFAs assuming NP ≠ RP (the
weakest possible assumption).
(2) A tight n^(1/2-ε) hardness for the edge-disjoint paths (EDP)
problem on directed acyclic graphs (DAGs), where n denotes the number of
vertices.
(3) A tight hardness of packing vertex-disjoint k-cycles for large k.
(4) An alternative (and perhaps simpler) proof for the hardness of properly
learning DNF, CNF and intersections of halfspaces [Alekhnovich et al., FOCS 2004
and J. Comput. Syst. Sci. 2008]
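The contrast between the two settings can be stated schematically (this notation is a reconstruction, since the abstract's rendered symbols were lost: f is a graph parameter, * a graph product, and R the reduction):

```latex
\underbrace{f(G * H) \;=\; f(G)\, f(H)}_{\text{standard graph-product analysis}}
\qquad\text{vs.}\qquad
\underbrace{f\bigl(R[G * H]\bigr)}_{\text{pre-reduction: product taken before the reduction}}
```

In the pre-reduction form, the product of the source graphs is formed first and only then transformed into an instance of the target optimization problem, which is what drives the amplification in the hardness results listed above.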
Learning and selection
Are learning processes selection processes? This paper takes a slightly modified version of the account of selection presented in Hull et al. (Behav Brain Sci 24:511–527, 2001) and asks whether it applies to learning processes. The answer is that although some learning processes are selectional, many are not. This has consequences for teleological theories of mental content. According to these theories, mental states have content in virtue of having proper functions, and they have proper functions in virtue of being the products of selection processes. For some mental states, it is plausible that the relevant selection process is natural selection, but there are many for which it is not plausible. One response to this (due to David Papineau) is to suggest that the learning processes by which we acquire non-innate mental states are selection processes and can therefore confer proper functions on mental states. This paper considers two ways in which this response could be elaborated, and argues that neither of them succeeds: the teleosemanticist cannot rely on the claim that learning processes are selection processes in order to justify the attribution of proper functions to beliefs.
Order-Revealing Encryption and the Hardness of Private Learning
An order-revealing encryption scheme gives a public procedure by which two
ciphertexts can be compared to reveal the ordering of their underlying
plaintexts. We show how to use order-revealing encryption to separate
computationally efficient PAC learning from efficient (ε, δ)-differentially private PAC learning. That is, we construct a concept
class that is efficiently PAC learnable, but for which every efficient learner
fails to be differentially private. This answers a question of Kasiviswanathan
et al. (FOCS '08, SIAM J. Comput. '11).
To prove our result, we give a generic transformation from an order-revealing
encryption scheme into one with strongly correct comparison, which enables the
consistent comparison of ciphertexts that are not obtained as the valid
encryption of any message. We believe this construction may be of independent
interest.
Comment: 28 pages
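The public comparison interface described above can be illustrated with a toy scheme in the spirit of the small-domain ORE of Chenette et al. (FSE 2016); the function names and parameters here are illustrative, not taken from the paper. Each ciphertext component masks one plaintext bit with a PRF of the preceding bits, so anyone can compare two ciphertexts under the same key without knowing the key:

```python
import hmac
import hashlib

def _prf(key: bytes, prefix: str) -> int:
    # Keyed PRF on the bit-prefix, reduced mod 3 (a toy choice; real
    # schemes need a careful security analysis of this step).
    digest = hmac.new(key, prefix.encode(), hashlib.sha256).digest()
    return digest[0] % 3

def encrypt(key: bytes, m: int, nbits: int = 16) -> list[int]:
    # Encode m as nbits bits, MSB first; mask each bit with a PRF of
    # the bits that precede it.
    bits = [(m >> (nbits - 1 - i)) & 1 for i in range(nbits)]
    ct = []
    for i, b in enumerate(bits):
        prefix = "".join(map(str, bits[:i]))
        ct.append((_prf(key, prefix) + b) % 3)
    return ct

def compare(ct1: list[int], ct2: list[int]) -> int:
    # Public comparison: returns -1, 0, or 1 without using the key.
    # At the first differing component the bit-prefixes agree, so the
    # difference mod 3 reveals which plaintext bit is larger.
    for u, v in zip(ct1, ct2):
        if u != v:
            return 1 if (u - v) % 3 == 1 else -1
    return 0
```

For example, with `key = b"toy-key"`, `compare(encrypt(key, 5), encrypt(key, 9))` returns -1. The leakage is exactly the position of the first differing bit, which is more than the order alone; the paper's transformation to "strongly correct comparison" addresses ciphertexts that are not valid encryptions at all, which this toy sketch does not model.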
From average case complexity to improper learning complexity
The basic problem in the PAC model of computational learning theory is to
determine which hypothesis classes are efficiently learnable. There is
presently a dearth of results showing hardness of learning problems. Moreover,
the existing lower bounds fall short of the best known algorithms.
The biggest challenge in proving complexity results is to establish hardness
of {\em improper learning} (a.k.a. representation independent learning). The
difficulty in proving lower bounds for improper learning is that the standard
reductions from NP-hard problems do not seem to apply in this
context. There is essentially only one known approach to proving lower bounds
on improper learning. It was initiated in (Kearns and Valiant 89) and relies on
cryptographic assumptions.
We introduce a new technique for proving hardness of improper learning, based
on reductions from problems that are hard on average. We put forward a (fairly
strong) generalization of Feige's assumption (Feige 02) about the complexity of
refuting random constraint satisfaction problems. Combining this assumption
with our new technique yields far-reaching implications. In particular,
1. Learning DNF's is hard.
2. Agnostically learning halfspaces with a constant approximation ratio is
hard.
3. Learning an intersection of halfspaces is hard.
Comment: 34 pages