11 research outputs found

    On the Complexity of Random Satisfiability Problems with Planted Solutions

    The problem of identifying a planted assignment given a random $k$-SAT formula consistent with the assignment exhibits a large algorithmic gap: while the planted solution becomes unique and can be identified given a formula with $O(n \log n)$ clauses, there are distributions over clauses for which the best known efficient algorithms require $n^{k/2}$ clauses. We propose and study a unified model for planted $k$-SAT, which captures well-known special cases. An instance is described by a planted assignment $\sigma$ and a distribution on clauses with $k$ literals. We define its distribution complexity as the largest $r$ for which the distribution is not $r$-wise independent ($1 \le r \le k$ for any distribution with a planted assignment). Our main result is an unconditional lower bound, tight up to logarithmic factors, for statistical (query) algorithms [Kearns 1998; Feldman et al. 2012], matching known upper bounds (which, as we show, can be implemented using a statistical algorithm). Since known approaches for problems over distributions have statistical analogues (spectral, MCMC, gradient-based, convex optimization, etc.), this lower bound provides a rigorous explanation of the observed algorithmic gap. The proof introduces a new general technique for the analysis of statistical query algorithms. It also points to a geometric paring phenomenon in the space of all planted assignments. We describe consequences of our lower bounds for Feige's refutation hypothesis [Feige 2002] and for lower bounds on general convex programs that solve planted $k$-SAT. Our bounds also extend to other planted $k$-CSP models and, in particular, provide concrete evidence for the security of Goldreich's one-way function and the associated pseudorandom generator when used with a sufficiently hard predicate [Goldreich 2000]. Comment: Extended abstract appeared in STOC 2015.
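
    As a concrete instance of the unified model, the sketch below samples a planted assignment $\sigma$ and then draws clauses from the uniform distribution over clauses that $\sigma$ satisfies. This is a minimal illustration of ours, not the paper's general construction; the name planted_ksat and the rejection-sampling strategy are assumptions. The uniform choice is the easy, "noisy" case; flatter clause distributions have larger distribution complexity $r$ and are correspondingly harder for statistical algorithms.

```python
import random

def planted_ksat(n, m, k=3, seed=0):
    """Sample a planted k-SAT instance: a hidden assignment sigma plus m
    clauses drawn uniformly from the clauses sigma satisfies. This is one
    simple choice of clause distribution Q in the unified planted model."""
    rng = random.Random(seed)
    sigma = [rng.choice([True, False]) for _ in range(n)]
    clauses = []
    while len(clauses) < m:
        variables = rng.sample(range(n), k)
        # A literal is (variable, negated?); it evaluates to sigma[v] XOR negated.
        clause = [(v, rng.choice([True, False])) for v in variables]
        if any(sigma[v] != negated for v, negated in clause):
            clauses.append(clause)  # keep only clauses consistent with sigma
    return sigma, clauses
```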

    Efficient Algorithms and Lower Bounds for Robust Linear Regression

    We study the problem of high-dimensional linear regression in a robust model where an $\epsilon$-fraction of the samples can be adversarially corrupted. We focus on the fundamental setting where the covariates of the uncorrupted samples are drawn from a Gaussian distribution $\mathcal{N}(0, \Sigma)$ on $\mathbb{R}^d$. We give nearly tight upper bounds and computational lower bounds for this problem. Specifically, our main contributions are as follows: For the case that the covariance matrix is known to be the identity, we give a sample near-optimal and computationally efficient algorithm that outputs a candidate hypothesis vector $\widehat{\beta}$ which approximates the unknown regression vector $\beta$ within $\ell_2$-norm $O(\epsilon \log(1/\epsilon) \sigma)$, where $\sigma$ is the standard deviation of the random observation noise. An error of $\Omega(\epsilon \sigma)$ is information-theoretically necessary, even with infinite sample size. Prior work gave an algorithm for this problem with sample complexity $\tilde{\Omega}(d^2/\epsilon^2)$ whose error guarantee scales with the $\ell_2$-norm of $\beta$. For the case of unknown covariance, we show that we can efficiently achieve the same error guarantee as in the known covariance case using an additional $\tilde{O}(d^2/\epsilon^2)$ unlabeled examples. On the other hand, an error of $O(\epsilon \sigma)$ can be information-theoretically attained with $O(d/\epsilon^2)$ samples. We prove a Statistical Query (SQ) lower bound providing evidence that this quadratic tradeoff in the sample size is inherent. More specifically, we show that any polynomial-time SQ learning algorithm for robust linear regression (in Huber's contamination model) with estimation complexity $O(d^{2-c})$, where $c>0$ is an arbitrarily small constant, must incur an error of $\Omega(\sqrt{\epsilon}\,\sigma)$.
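
    To make the contamination model concrete, the sketch below generates $\epsilon$-corrupted regression data in Huber's model: each sample is clean with probability $1-\epsilon$ and adversarial otherwise. This is an illustration of ours, not the paper's algorithm; the function name, the particular outlier strategy, and all parameters are assumptions. On such data, ordinary least squares incurs error that grows with the corruption, which is the failure the robust estimator above is designed to avoid.

```python
import numpy as np

def huber_contaminated_regression(n, d, eps, sigma, seed=0):
    """Draw n samples from the robust linear regression model: with
    probability 1 - eps, x ~ N(0, I_d) and y = <beta, x> + N(0, sigma^2);
    with probability eps, the sample is replaced by an outlier."""
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(d)
    X = rng.standard_normal((n, d))
    y = X @ beta + sigma * rng.standard_normal(n)
    bad = rng.random(n) < eps  # eps-fraction gets corrupted
    X[bad] = 10.0 * np.sign(rng.standard_normal((int(bad.sum()), d)))
    y[bad] = -10.0 * (X[bad] @ beta)  # one illustrative adversarial choice
    return X, y, beta
```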

    Statistical Query Algorithms for Mean Vector Estimation and Stochastic Convex Optimization

    Stochastic convex optimization, in which the objective is the expectation of a random convex function, is an important and widely used method with numerous applications in machine learning, statistics, operations research, and other areas. We study the complexity of stochastic convex optimization given only statistical query (SQ) access to the objective function. We show that well-known and popular first-order iterative methods can be implemented using only statistical queries. For many cases of interest, we derive nearly matching upper and lower bounds on the estimation (sample) complexity, including linear optimization in the most general setting. We then present several consequences for machine learning, differential privacy, and proving concrete lower bounds on the power of convex optimization-based methods. The key ingredient of our work is SQ algorithms and lower bounds for estimating the mean vector of a distribution over vectors supported on a convex body in $\mathbb{R}^d$. This natural problem has not been previously studied, and we show that our solutions can be used to get substantially improved SQ versions of Perceptron and other online algorithms for learning halfspaces.
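
    The claim that first-order methods need only SQ access is easy to illustrate. In the sketch below, gradient descent touches the data exclusively through a simulated STAT($\tau$) oracle that answers each query within tolerance $\tau$; the oracle simulation, the function names, and the omission of the boundedness and normalization the formal SQ model requires are all simplifying assumptions of ours. Estimating the expected gradient coordinate by coordinate is exactly the mean vector estimation problem the abstract identifies as the key ingredient.

```python
import numpy as np

def stat_query(samples, q, tau, rng):
    """Simulate a STAT(tau) oracle: return a value within tau of E[q(z)].
    A real oracle may answer adversarially within the tolerance; here that
    slack is modeled by uniform noise in [-tau, tau]."""
    return float(np.mean([q(z) for z in samples])) + rng.uniform(-tau, tau)

def sq_gradient_descent(samples, grad_coord, d, steps, lr, tau, seed=0):
    """Gradient descent in which coordinate i of the expected gradient
    E[grad f(w; z)]_i is obtained by a single statistical query."""
    rng = np.random.default_rng(seed)
    w = np.zeros(d)
    for _ in range(steps):
        g = np.array([stat_query(samples, lambda z, i=i: grad_coord(w, z, i), tau, rng)
                      for i in range(d)])
        w -= lr * g
    return w

# Usage: stochastic least squares, f(w; (x, y)) = (x @ w - y)^2 / 2.
rng = np.random.default_rng(1)
w_true = np.array([1.0, -2.0, 0.5])
data = [(x, x @ w_true + 0.1 * rng.standard_normal())
        for x in rng.standard_normal((500, 3))]
grad = lambda w, z, i: (z[0] @ w - z[1]) * z[0][i]
w_hat = sq_gradient_descent(data, grad, d=3, steps=200, lr=0.1, tau=0.01)
```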

    Statistical Query Algorithms for Stochastic Convex Optimization
