On the Complexity of Random Satisfiability Problems with Planted Solutions
The problem of identifying a planted assignment given a random k-SAT
formula consistent with the assignment exhibits a large algorithmic gap: while
the planted solution becomes unique and can be identified given a formula with
O(n log n) clauses, there are distributions over clauses for which the best
known efficient algorithms require n^{k/2} clauses. We propose and study a
unified model for planted k-SAT, which captures well-known special cases. An
instance is described by a planted assignment σ and a distribution on
clauses with k literals. We define its distribution complexity as the largest
r for which the distribution is not r-wise independent (1 ≤ r ≤ k for
any distribution with a planted assignment).
Our main result is an unconditional lower bound of Ω(n^{r/2}) clauses, tight
up to logarithmic factors, for statistical (query) algorithms [Kearns 1998,
Feldman et al. 2012], matching known upper bounds, which, as we show, can be
implemented using a statistical algorithm. Since known approaches for problems
over distributions have statistical analogues (spectral, MCMC, gradient-based,
convex optimization, etc.), this lower bound provides a rigorous explanation of
the observed algorithmic gap. The proof introduces a new general technique for
the analysis of statistical query algorithms. It also points to a geometric
paring phenomenon in the space of all planted assignments.
We describe consequences of our lower bounds for Feige's refutation hypothesis
[Feige 2002] and for lower bounds on general convex programs that solve planted
k-SAT. Our bounds also extend to other planted k-CSP models and, in
particular, provide concrete evidence for the security of Goldreich's one-way
function and the associated pseudorandom generator when used with a
sufficiently hard predicate [Goldreich 2000].
Comment: Extended abstract appeared in STOC 201
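To make the unified model concrete, the following is a minimal sketch of sampling a planted k-SAT instance: a hidden assignment σ is fixed, and each clause is drawn by choosing k distinct variables and setting literal signs so that the pattern of literal truth values under σ follows a given distribution (the all-false pattern gets probability zero, so σ satisfies every clause). All names here are illustrative; this is not the paper's code, and the pattern-distribution encoding is one possible reading of the model.

```python
import random

def sample_planted_ksat(n, m, k, pattern_dist, seed=0):
    """Sample m clauses of a planted k-SAT instance on n variables.

    pattern_dist maps each k-tuple of 0/1 values (the truth values of
    the clause's literals under the hidden assignment sigma) to a
    probability weight. Excluding the all-zero pattern guarantees that
    sigma satisfies every sampled clause. (Illustrative sketch only.)
    """
    rng = random.Random(seed)
    sigma = [rng.choice([True, False]) for _ in range(n)]
    patterns, weights = zip(*pattern_dist.items())
    clauses = []
    for _ in range(m):
        vars_ = rng.sample(range(n), k)          # k distinct variables
        pat = rng.choices(patterns, weights=weights)[0]
        # pick the sign of each literal so it evaluates to pat[i] under sigma
        clause = [(v + 1) if (sigma[v] == bool(want)) else -(v + 1)
                  for v, want in zip(vars_, pat)]
        clauses.append(clause)
    return sigma, clauses

# Uniform weight on all satisfying patterns: a "noiseless" planted 3-SAT model.
sat_patterns = {(a, b, c): 1.0
                for a in (0, 1) for b in (0, 1) for c in (0, 1)
                if a or b or c}
sigma, clauses = sample_planted_ksat(n=50, m=200, k=3,
                                     pattern_dist=sat_patterns)
```

Biasing `pattern_dist` (e.g., favoring patterns with few true literals) yields the harder distributions whose lowest non-independent moment r governs the n^{r/2} bound.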
Efficient Algorithms and Lower Bounds for Robust Linear Regression
We study the problem of high-dimensional linear regression in a robust model
where an ε-fraction of the samples can be adversarially corrupted. We
focus on the fundamental setting where the covariates of the uncorrupted
samples are drawn from a Gaussian distribution N(0, Σ) on
R^d. We give nearly tight upper bounds and computational lower
bounds for this problem. Specifically, our main contributions are as follows:
For the case that the covariance matrix is known to be the identity, we give
a sample near-optimal and computationally efficient algorithm that outputs a
candidate hypothesis vector β̂ which approximates the unknown
regression vector β within ℓ2-norm O(ε log(1/ε) σ), where σ is the standard
deviation of the random observation noise. An error of Ω(εσ) is
information-theoretically necessary, even with infinite sample size. Prior
work gave an algorithm for this problem with sample complexity Ω̃(d^2/ε^2)
whose error guarantee scales with the ℓ2-norm of β.
For the case of unknown covariance, we show that we can efficiently achieve
the same error guarantee as in the known covariance case using an additional
Õ(d^2/ε^2) unlabeled examples. On the other hand, an error of
O(εσ) can be information-theoretically attained with O(d/ε^2)
samples. We prove a Statistical Query (SQ) lower bound
providing evidence that this quadratic tradeoff in the sample size is inherent.
More specifically, we show that any polynomial time SQ learning algorithm for
robust linear regression (in Huber's contamination model) with estimation
complexity O(d^{2-c}), where c > 0 is an arbitrarily small constant, must
incur an error of Ω(√ε σ).
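As a point of contrast with the paper's guarantees, here is a naive robust baseline: alternately fit ordinary least squares and discard the samples with the largest residuals. This toy trimming scheme is not the authors' algorithm (which uses a more careful filtering of corrupted samples), but it illustrates the identity-covariance contamination setting; all names and constants below are illustrative.

```python
import numpy as np

def trimmed_least_squares(X, y, eps, iters=10):
    """Toy robust baseline: repeatedly fit OLS on the kept samples,
    then keep only the samples with the smallest absolute residuals.
    Discards a 2*eps fraction to cover the eps-fraction of outliers."""
    n = len(y)
    keep = np.ones(n, dtype=bool)
    n_drop = int(2 * eps * n)
    for _ in range(iters):
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        resid = np.abs(y - X @ beta)
        # threshold at the (n - n_drop)-th smallest residual
        thresh = np.partition(resid, n - n_drop - 1)[n - n_drop - 1]
        keep = resid <= thresh
    return beta

# Synthetic contaminated instance: identity-covariance Gaussian covariates,
# an eps-fraction of responses shifted by a large offset.
rng = np.random.default_rng(0)
n, d, eps, noise_sd = 2000, 5, 0.05, 0.1
beta_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ beta_true + noise_sd * rng.normal(size=n)
bad = rng.choice(n, int(eps * n), replace=False)
y[bad] += 10.0
beta_hat = trimmed_least_squares(X, y, eps)
```

Against gross corruptions like the +10 shift above, the initial OLS fit already makes the outliers' residuals stand out, so trimming recovers β accurately; against cleverer corruptions this baseline degrades, which is exactly the gap the paper's filtering algorithm addresses.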
Statistical Query Algorithms for Mean Vector Estimation and Stochastic Convex Optimization
Stochastic convex optimization, in which the objective is the expectation of a random convex function, is an important and widely used method with numerous applications in machine learning, statistics, operations research, and other areas. We study the complexity of stochastic convex optimization given only statistical query (SQ) access to the objective function. We show that well-known and popular first-order iterative methods can be implemented using only statistical queries. For many cases of interest, we derive nearly matching upper and lower bounds on the estimation (sample) complexity, including linear optimization in the most general setting. We then present several consequences for machine learning, differential privacy, and proving concrete lower bounds on the power of convex optimization–based methods. The key ingredient of our work is SQ algorithms and lower bounds for estimating the mean vector of a distribution over vectors supported on a convex body in R^d. This natural problem has not been previously studied, and we show that our solutions can be used to obtain substantially improved SQ versions of Perceptron and other online algorithms for learning halfspaces.
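The SQ access model above can be sketched as follows: an algorithm never sees samples, only answers to bounded queries φ that are correct up to a tolerance τ. The sketch below simulates such an oracle with an empirical mean plus noise of magnitude at most τ, and uses one query per coordinate to estimate a mean vector; this is an illustrative reading of the model, not the paper's construction.

```python
import numpy as np

def make_sq_oracle(samples, tau, rng):
    """Simulate a statistical query oracle: for a query
    phi: R^d -> [-1, 1], return E[phi(x)] up to additive tolerance tau.
    (Simulated here by an empirical mean plus uniform noise in
    [-tau, tau]; a real SQ oracle may answer adversarially within tau.)"""
    def oracle(phi):
        est = np.mean([phi(x) for x in samples])
        return est + rng.uniform(-tau, tau)
    return oracle

def sq_mean_estimate(oracle, d):
    """Estimate the mean of a distribution on [-1, 1]^d with one
    statistical query per coordinate: phi_i(x) = x[i]."""
    return np.array([oracle(lambda x, i=i: x[i]) for i in range(d)])

# Demo on a distribution supported (after clipping) on [-1, 1]^4.
rng = np.random.default_rng(1)
d, tau = 4, 0.01
mu = np.array([0.5, -0.2, 0.0, 0.3])
samples = np.clip(rng.normal(mu, 0.1, size=(5000, d)), -1, 1)
mu_hat = sq_mean_estimate(make_sq_oracle(samples, tau, rng), d)
```

Coordinate-wise queries give ℓ∞ error about τ; the paper's contribution includes doing better than this naive scheme for mean estimation over general convex bodies, which is what powers the SQ implementations of first-order methods.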