    Weighted Polynomial Approximations: Limits for Learning and Pseudorandomness

    Polynomial approximations to Boolean functions have led to many positive results in computer science. In particular, polynomial approximations to the sign function underlie algorithms for agnostically learning halfspaces, as well as pseudorandom generators for halfspaces. In this work, we investigate the limits of these techniques by proving inapproximability results for the sign function. First, the polynomial regression algorithm of Kalai et al. (SIAM J. Comput. 2008) shows that halfspaces can be learned with respect to log-concave distributions on $\mathbb{R}^n$ in the challenging agnostic learning model. The power of this algorithm relies on the fact that, under log-concave distributions, halfspaces can be approximated arbitrarily well by low-degree polynomials. We ask whether this technique can be extended beyond log-concave distributions, and establish a negative result: polynomials of any degree cannot approximate the sign function to within arbitrarily low error for a large class of non-log-concave distributions on the real line, including those with densities proportional to $\exp(-|x|^{0.99})$. Second, we investigate the derandomization of Chernoff-type concentration inequalities. Chernoff-type tail bounds on sums of independent random variables have pervasive applications in theoretical computer science. Schmidt et al. (SIAM J. Discrete Math. 1995) showed that these inequalities can be established for sums of random variables with only $O(\log(1/\delta))$-wise independence, for a tail probability of $\delta$. We show that their results are tight up to constant factors. These results rely on techniques from weighted approximation theory, which studies how well functions on the real line can be approximated by polynomials under various distributions. We believe that these techniques will have further applications in other areas of computer science.
    Comment: 22 pages
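
    To make the polynomial-regression connection concrete, the following is a minimal illustrative sketch (not the paper's construction, and in one dimension only): fit a low-degree polynomial to the ±1 labels and classify by its sign, which works well exactly when the sign function admits a good low-degree approximation under the data distribution. The Gaussian marginal, noise rate, and degree below are assumptions chosen for illustration.

```python
import numpy as np

def fit_poly_sign_classifier(x, y, degree):
    """Least-squares polynomial regression on +/-1 labels, classified by sign.
    (Kalai et al. analyze L1 regression; least squares keeps the sketch short.)"""
    coeffs = np.polyfit(x, y, deg=degree)              # degree-d least-squares fit
    return lambda t: np.sign(np.polyval(coeffs, t))    # threshold the polynomial at 0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Log-concave (Gaussian) marginal with 10% random label noise.
    x = rng.normal(size=2000)
    y = np.sign(x - 0.3)
    y[rng.random(2000) < 0.1] *= -1

    h = fit_poly_sign_classifier(x, y, degree=8)
    print("error vs. true halfspace:", np.mean(h(x) != np.sign(x - 0.3)))
```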

    Algorithms and lower bounds for de Morgan formulas of low-communication leaf gates

    The class $\mathrm{FORMULA}[s] \circ \mathcal{G}$ consists of Boolean functions computable by size-$s$ de Morgan formulas whose leaves are any Boolean functions from a class $\mathcal{G}$. We give lower bounds and (SAT, Learning, and PRG) algorithms for $\mathrm{FORMULA}[n^{1.99}] \circ \mathcal{G}$, for classes $\mathcal{G}$ of functions with low communication complexity. Let $R^{(k)}(\mathcal{G})$ be the maximum $k$-party number-on-forehead (NOF) randomized communication complexity of $\mathcal{G}$. We show: (1) The Generalized Inner Product function $\mathrm{GIP}^k_n$ cannot be computed in $\mathrm{FORMULA}[s] \circ \mathcal{G}$ on more than a $1/2+\varepsilon$ fraction of inputs for $s = o\!\left(\frac{n^2}{\left(k \cdot 4^k \cdot R^{(k)}(\mathcal{G}) \cdot \log(n/\varepsilon) \cdot \log(1/\varepsilon)\right)^{2}}\right)$. As a corollary, we get an average-case lower bound for $\mathrm{GIP}^k_n$ against $\mathrm{FORMULA}[n^{1.99}] \circ \mathrm{PTF}^{k-1}$. (2) There is a PRG of seed length $n/2 + O\left(\sqrt{s} \cdot R^{(2)}(\mathcal{G}) \cdot \log(s/\varepsilon) \cdot \log(1/\varepsilon)\right)$ that $\varepsilon$-fools $\mathrm{FORMULA}[s] \circ \mathcal{G}$. For $\mathrm{FORMULA}[s] \circ \mathrm{LTF}$, we get the better seed length $O\left(n^{1/2} \cdot s^{1/4} \cdot \log(n) \cdot \log(n/\varepsilon)\right)$. This gives the first non-trivial PRG (with seed length $o(n)$) for intersections of $n$ halfspaces in the regime where $\varepsilon \leq 1/n$. (3) There is a randomized $2^{n-t}$-time #SAT algorithm for $\mathrm{FORMULA}[s] \circ \mathcal{G}$, where $t = \Omega\left(\frac{n}{\sqrt{s} \cdot \log^2(s) \cdot R^{(2)}(\mathcal{G})}\right)^{1/2}$. In particular, this implies a nontrivial #SAT algorithm for $\mathrm{FORMULA}[n^{1.99}] \circ \mathrm{LTF}$. (4) The Minimum Circuit Size Problem is not in $\mathrm{FORMULA}[n^{1.99}] \circ \mathrm{XOR}$. On the algorithmic side, we show that $\mathrm{FORMULA}[n^{1.99}] \circ \mathrm{XOR}$ can be PAC-learned in time $2^{O(n/\log n)}$.
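
    For concreteness, the Generalized Inner Product function $\mathrm{GIP}^k_n$ from item (1) is the standard hard function for $k$-party NOF communication: the XOR, over the $n$ coordinates, of the AND of the $k$ players' bits in that coordinate. A short sketch of this definition follows; representing the input as a $k \times n$ 0/1 matrix with one row per player is an assumed convention.

```python
import numpy as np

def gip(x: np.ndarray) -> int:
    """Generalized Inner Product GIP^k_n: x is a k x n 0/1 matrix (one row per
    NOF player). Returns the XOR over the n columns of the AND of each column."""
    column_ands = np.all(x == 1, axis=0)   # AND of the k bits in each column
    return int(np.sum(column_ands) % 2)    # parity (XOR) of the column ANDs

# k = 3 players, n = 4 coordinates: only the first column is all ones,
# so exactly one column contributes and GIP evaluates to 1.
example = np.array([[1, 0, 1, 1],
                    [1, 1, 0, 1],
                    [1, 1, 1, 0]])
assert gip(example) == 1
```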

    Pre-Reduction Graph Products: Hardnesses of Properly Learning DFAs and Approximating EDP on DAGs

    The study of graph products is a major research topic and typically concerns the term $f(G*H)$, e.g., showing that $f(G*H) = f(G)f(H)$. In this paper, we study graph products in a non-standard form $f(R[G*H])$, where $R$ is a "reduction", i.e., a transformation of any graph into an instance of an intended optimization problem. We resolve some open problems as applications. (1) A tight $n^{1-\epsilon}$ approximation hardness for the minimum consistent deterministic finite automaton (DFA) problem, where $n$ is the sample size. By a result of Board and Pitt [Theoretical Computer Science 1992], this implies the hardness of properly learning DFAs assuming $NP \neq RP$ (the weakest possible assumption). (2) A tight $n^{1/2-\epsilon}$ hardness for the edge-disjoint paths (EDP) problem on directed acyclic graphs (DAGs), where $n$ denotes the number of vertices. (3) A tight hardness of packing vertex-disjoint $k$-cycles for large $k$. (4) An alternative (and perhaps simpler) proof of the hardness of properly learning DNF, CNF, and intersections of halfspaces [Alekhnovich et al., FOCS 2004 and J. Comput. Syst. Sci. 2008].
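
    The abstract does not fix which product $*$ denotes, but a graph product generally combines $G$ and $H$ into a graph on the vertex set $V(G) \times V(H)$, with the edge rule depending on the product. As one standard instance chosen purely for illustration, here is a sketch of the tensor (categorical) product; the reduction $R$ from the paper is not modeled here.

```python
from itertools import product

def tensor_product(G, H):
    """Tensor (categorical) product. A graph is (vertices, edges), with edges
    given as a set of frozensets {u, v}. In G x H, (u1, v1) ~ (u2, v2) iff
    u1 ~ u2 in G and v1 ~ v2 in H."""
    (VG, EG), (VH, EH) = G, H
    V = set(product(VG, VH))
    E = set()
    for u1, v1 in V:
        for u2, v2 in V:
            if frozenset((u1, u2)) in EG and frozenset((v1, v2)) in EH:
                E.add(frozenset(((u1, v1), (u2, v2))))
    return V, E

# Example: K2 x K2 is a perfect matching on 4 vertices (2 edges), not a 4-cycle.
K2 = ({0, 1}, {frozenset((0, 1))})
V, E = tensor_product(K2, K2)
assert len(V) == 4 and len(E) == 2
```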

    Who Should Predict? Exact Algorithms For Learning to Defer to Humans

    Automated AI classifiers should be able to defer the prediction to a human decision maker to ensure more accurate predictions. In this work, we jointly train a classifier with a rejector, which decides on each data point whether the classifier or the human should predict. We show that prior approaches can fail to find a human-AI system with low misclassification error even when there exists a linear classifier and rejector that have zero error (the realizable setting). We prove that obtaining a linear pair with low error is NP-hard even when the problem is realizable. To complement this negative result, we give a mixed-integer linear programming (MILP) formulation that can optimally solve the problem in the linear setting. However, the MILP only scales to moderately sized problems. Therefore, we provide a novel surrogate loss function that is realizable-consistent and performs well empirically. We test our approaches on a comprehensive set of datasets and compare to a wide range of baselines.
    Comment: AISTATS 202
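
    As a minimal sketch of the object being optimized (not the paper's MILP or surrogate loss), the linear classifier-rejector pair and the misclassification error of the resulting human-AI system can be written as below; the synthetic data and the human-error model are assumptions for illustration only.

```python
import numpy as np

def system_error(w_clf, w_rej, X, y, human_pred):
    """Error of a linear human-AI team: the rejector defers to the human where
    w_rej . x >= 0; otherwise the classifier predicts sign(w_clf . x)."""
    defer = X @ w_rej >= 0
    final = np.where(defer, human_pred, np.sign(X @ w_clf))
    return np.mean(final != y)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])
# Assumed human model: correct exactly where the second feature is positive.
human_pred = np.where(X[:, 1] > 0, y, -y)

w_clf = np.array([1.0, 0.0])        # a deliberately imperfect classifier
rej_bad = np.array([0.0, -1.0])     # defers where the human is wrong
rej_good = np.array([0.0, 1.0])     # defers where the human is right
print(system_error(w_clf, rej_bad, X, y, human_pred))
print(system_error(w_clf, rej_good, X, y, human_pred))
```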