32,746 research outputs found

    Tight Hardness Results for Training Depth-2 ReLU Networks

    Get PDF
    We prove several hardness results for training depth-2 neural networks with the ReLU activation function; these networks are simply weighted sums (that may include negative coefficients) of ReLUs. Our goal is to output a depth-2 neural network that minimizes the square loss with respect to a given training set. We prove that this problem is NP-hard already for a network with a single ReLU. We also prove NP-hardness for outputting a weighted sum of kk ReLUs minimizing the squared error (for k>1k>1) even in the realizable setting (i.e., when the labels are consistent with an unknown depth-2 ReLU network). We are also able to obtain lower bounds on the running time in terms of the desired additive error ϵ\epsilon. To obtain our lower bounds, we use the Gap Exponential Time Hypothesis (Gap-ETH) as well as a new hypothesis regarding the hardness of approximating the well known Densest κ\kappa-Subgraph problem in subexponential time (these hypotheses are used separately in proving different lower bounds). For example, we prove that under reasonable hardness assumptions, any proper learning algorithm for finding the best fitting ReLU must run in time exponential in 1/ϵ21/\epsilon^2. Together with a previous work regarding improperly learning a ReLU (Goel et al., COLT'17), this implies the first separation between proper and improper algorithms for learning a ReLU. We also study the problem of properly learning a depth-2 network of ReLUs with bounded weights giving new (worst-case) upper bounds on the running time needed to learn such networks both in the realizable and agnostic settings. Our upper bounds on the running time essentially matches our lower bounds in terms of the dependency on ϵ\epsilon.Comment: To appear in ITCS'2

    Pre-Reduction Graph Products: Hardnesses of Properly Learning DFAs and Approximating EDP on DAGs

    Full text link
    The study of graph products is a major research topic and typically concerns the term f(GH)f(G*H), e.g., to show that f(GH)=f(G)f(H)f(G*H)=f(G)f(H). In this paper, we study graph products in a non-standard form f(R[GH]f(R[G*H] where RR is a "reduction", a transformation of any graph into an instance of an intended optimization problem. We resolve some open problems as applications. (1) A tight n1ϵn^{1-\epsilon}-approximation hardness for the minimum consistent deterministic finite automaton (DFA) problem, where nn is the sample size. Due to Board and Pitt [Theoretical Computer Science 1992], this implies the hardness of properly learning DFAs assuming NPRPNP\neq RP (the weakest possible assumption). (2) A tight n1/2ϵn^{1/2-\epsilon} hardness for the edge-disjoint paths (EDP) problem on directed acyclic graphs (DAGs), where nn denotes the number of vertices. (3) A tight hardness of packing vertex-disjoint kk-cycles for large kk. (4) An alternative (and perhaps simpler) proof for the hardness of properly learning DNF, CNF and intersection of halfspaces [Alekhnovich et al., FOCS 2004 and J. Comput.Syst.Sci. 2008]

    Learning and selection

    Get PDF
    Are learning processes selection processes? This paper takes a slightly modified version of the account of selection presented in Hull et al. (Behav Brain Sci 24:511–527, 2001) and asks whether it applies to learning processes. The answer is that although some learning processes are selectional, many are not. This has consequences for teleological theories of mental content. According to these theories, mental states have content in virtue of having proper functions, and they have proper functions in virtue of being the products of selection processes. For some mental states, it is plausible that the relevant selection process is natural selection, but there are many for which it is not plausible. One response to this (due to David Papineau) is to suggest that the learning processes by which we acquire non-innate mental states are selection processes and can therefore confer proper functions on mental states. This paper considers two ways in which this response could be elaborated, and argues that neither of them succeed: the teleosemanticist cannot rely on the claim that learning processes are selection processes in order to justify the attribution of proper functions to beliefs

    Order-Revealing Encryption and the Hardness of Private Learning

    Full text link
    An order-revealing encryption scheme gives a public procedure by which two ciphertexts can be compared to reveal the ordering of their underlying plaintexts. We show how to use order-revealing encryption to separate computationally efficient PAC learning from efficient (ϵ,δ)(\epsilon, \delta)-differentially private PAC learning. That is, we construct a concept class that is efficiently PAC learnable, but for which every efficient learner fails to be differentially private. This answers a question of Kasiviswanathan et al. (FOCS '08, SIAM J. Comput. '11). To prove our result, we give a generic transformation from an order-revealing encryption scheme into one with strongly correct comparison, which enables the consistent comparison of ciphertexts that are not obtained as the valid encryption of any message. We believe this construction may be of independent interest.Comment: 28 page

    From average case complexity to improper learning complexity

    Full text link
    The basic problem in the PAC model of computational learning theory is to determine which hypothesis classes are efficiently learnable. There is presently a dearth of results showing hardness of learning problems. Moreover, the existing lower bounds fall short of the best known algorithms. The biggest challenge in proving complexity results is to establish hardness of {\em improper learning} (a.k.a. representation independent learning).The difficulty in proving lower bounds for improper learning is that the standard reductions from NP\mathbf{NP}-hard problems do not seem to apply in this context. There is essentially only one known approach to proving lower bounds on improper learning. It was initiated in (Kearns and Valiant 89) and relies on cryptographic assumptions. We introduce a new technique for proving hardness of improper learning, based on reductions from problems that are hard on average. We put forward a (fairly strong) generalization of Feige's assumption (Feige 02) about the complexity of refuting random constraint satisfaction problems. Combining this assumption with our new technique yields far reaching implications. In particular, 1. Learning DNF\mathrm{DNF}'s is hard. 2. Agnostically learning halfspaces with a constant approximation ratio is hard. 3. Learning an intersection of ω(1)\omega(1) halfspaces is hard.Comment: 34 page
    corecore