16 research outputs found
A Regularity Lemma and Low-Weight Approximators for Low-Degree Polynomial Threshold Functions
We give a “regularity lemma” for degree-d polynomial threshold functions (PTFs) over the Boolean cube {−1, 1}^n. Roughly speaking, this result shows that every degree-d PTF can be decomposed into a constant number of subfunctions such that almost all of the subfunctions are close to being regular PTFs. Here a “regular” PTF is a PTF sign(p(x)) where the influence of each variable on the polynomial p(x) is a small fraction of the total influence of p. As an application of this regularity lemma, we prove that for any constants d ≥ 1, ε > 0, every degree-d PTF over n variables can be approximated to accuracy ε by a constant-degree PTF that has integer weights of total magnitude O(n^d). This weight bound is shown to be optimal up to constant factors.
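As an illustration of the regularity notion described in this abstract, the following is a minimal sketch rather than the paper's construction. It assumes a multilinear polynomial stored as a dictionary from monomials (tuples of variable indices) to coefficients, takes the influence of variable i on p to be the sum of squared coefficients of the monomials containing i, and calls p tau-regular when no single influence exceeds a tau fraction of the total. The names `influences`, `is_regular`, and `tau` are hypothetical.

```python
def influences(poly, n):
    """Influence of each variable on a multilinear polynomial over {-1, 1}^n.

    poly maps a monomial (tuple of distinct variable indices) to its coefficient.
    The influence of variable i is the sum of squared coefficients of all
    monomials that contain i.
    """
    inf = [0.0] * n
    for monomial, coeff in poly.items():
        for i in monomial:
            inf[i] += coeff * coeff
    return inf


def is_regular(poly, n, tau):
    """Return True if every influence is at most a tau fraction of the total."""
    inf = influences(poly, n)
    total = sum(inf)
    return total > 0 and max(inf) <= tau * total


n = 4
balanced = {(i,): 1.0 for i in range(n)}                 # x0 + x1 + x2 + x3
skewed = {(0,): 10.0, (1,): 1.0, (2,): 1.0, (3,): 1.0}   # dominated by x0
quadratic = {(0, 1): 1.0, (2, 3): 1.0}                   # x0*x1 + x2*x3

print(is_regular(balanced, n, 0.25))   # every variable carries an equal share
print(is_regular(skewed, n, 0.25))     # x0 carries almost all of the influence
print(influences(quadratic, n))
```

In this sketch the skewed polynomial fails the test because a single variable dominates, which is the kind of variable a decomposition into subfunctions would restrict first.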
The Correct Exponent for the Gotsman-Linial Conjecture
We prove a new bound on the average sensitivity of polynomial threshold
functions. In particular we show that a polynomial threshold function of degree
$d$ in at most $n$ variables has average sensitivity at most
$\sqrt{n}(\log n)^{O(d \log d)} 2^{O(d^2 \log d)}$. For fixed $d$ the exponent
in terms of $n$ in this bound is known to be optimal. This bound makes
significant progress towards the Gotsman-Linial Conjecture which would put the
correct bound at $\Theta(d\sqrt{n})$.
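To make the quantity in this abstract concrete, here is a minimal brute-force sketch, not taken from the paper. It computes average sensitivity as the expected number of coordinates whose flip changes the function's value on a uniformly random point of {-1, 1}^n. The helper names `average_sensitivity`, `majority`, and `dictator` are hypothetical, and the enumeration is only feasible for small n.

```python
from itertools import product


def average_sensitivity(f, n):
    """Average number of sensitive coordinates of f over all points of {-1, 1}^n."""
    total = 0
    for x in product((-1, 1), repeat=n):
        fx = f(x)
        for i in range(n):
            # Flip coordinate i and check whether the output changes.
            y = x[:i] + (-x[i],) + x[i + 1:]
            if f(y) != fx:
                total += 1
    return total / 2 ** n


def majority(x):
    """Degree-1 PTF: the sign of x_1 + ... + x_n (n odd, so the sum is never 0)."""
    return 1 if sum(x) > 0 else -1


def dictator(x):
    """Degree-1 PTF that depends only on the first coordinate."""
    return x[0]


print(average_sensitivity(majority, 3))
print(average_sensitivity(dictator, 5))
for n in (3, 5, 7, 9, 11):
    print(n, average_sensitivity(majority, n))
```

Printing the values for majority at increasing odd n shows growth on the order of the square root of n, the degree-1 behaviour that the conjectured bound generalizes to higher degree.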
Moment-Matching Polynomials
We give a new framework for proving the existence of low-degree, polynomial
approximators for Boolean functions with respect to broad classes of
non-product distributions. Our proofs use techniques related to the classical
moment problem and deviate significantly from known Fourier-based methods,
which require the underlying distribution to have some product structure.
Our main application is the first polynomial-time algorithm for agnostically
learning any function of a constant number of halfspaces with respect to any
log-concave distribution (for any constant accuracy parameter). This result was
not known even for the case of learning the intersection of two halfspaces
without noise. Additionally, we show that in the "smoothed-analysis" setting,
the above results hold with respect to distributions that have sub-exponential
tails, a property satisfied by many natural and well-studied distributions in
machine learning.
Given that our algorithms can be implemented using Support Vector Machines
(SVMs) with a polynomial kernel, these results give a rigorous theoretical
explanation as to why many kernel methods work so well in practice.
Weighted Polynomial Approximations: Limits for Learning and Pseudorandomness
Polynomial approximations to Boolean functions have led to many positive
results in computer science. In particular, polynomial approximations to the
sign function underlie algorithms for agnostically learning halfspaces, as well
as pseudorandom generators for halfspaces. In this work, we investigate the
limits of these techniques by proving inapproximability results for the sign
function.
Firstly, the polynomial regression algorithm of Kalai et al. (SIAM J. Comput.
2008) shows that halfspaces can be learned with respect to log-concave
distributions on $\mathbb{R}^n$ in the challenging agnostic learning model. The
power of this algorithm relies on the fact that under log-concave
distributions, halfspaces can be approximated arbitrarily well by low-degree
polynomials. We ask whether this technique can be extended beyond log-concave
distributions, and establish a negative result. We show that polynomials of any
degree cannot approximate the sign function to within arbitrarily low error for
a large class of non-log-concave distributions on the real line, including
those with densities proportional to $\exp(-|x|^{0.99})$.
Secondly, we investigate the derandomization of Chernoff-type concentration
inequalities. Chernoff-type tail bounds on sums of independent random variables
have pervasive applications in theoretical computer science. Schmidt et al.
(SIAM J. Discrete Math. 1995) showed that these inequalities can be established
for sums of random variables with only $O(\log(1/\delta))$-wise independence,
for a tail probability of $\delta$. We show that their results are tight up to
constant factors.
These results rely on techniques from weighted approximation theory, which
studies how well functions on the real line can be approximated by polynomials
under various distributions. We believe that these techniques will have further
applications in other areas of computer science.
Comment: 22 pages
Non interactive simulation of correlated distributions is decidable
A basic problem in information theory is the following: Let $\mathbf{P} =
(\mathbf{X}, \mathbf{Y})$ be an arbitrary distribution where the marginals
$\mathbf{X}$ and $\mathbf{Y}$ are (potentially) correlated. Let Alice and Bob
be two players where Alice gets samples $\{x_i\}_{i \ge 1}$ and Bob gets
samples $\{y_i\}_{i \ge 1}$ and for all $i$, $(x_i, y_i) \sim \mathbf{P}$. What
joint distributions $\mathbf{Q}$ can be simulated by Alice and Bob without any
interaction?
Classical works in information theory by Gács-Körner and Wyner answer
this question when at least one of $\mathbf{P}$ or $\mathbf{Q}$ is the
distribution on $\{0,1\} \times \{0,1\}$ where each marginal is unbiased and
identical. However, other than this special case, the answer to this question
is understood in very few cases. Recently, Ghazi, Kamath and Sudan showed that
this problem is decidable for $\mathbf{Q}$ supported on $\{0,1\} \times \{0,1\}$.
We extend their result to $\mathbf{Q}$ supported on any finite alphabet.
We rely on recent results in Gaussian geometry (by the authors) as well as a
new smoothing argument inspired by the method of boosting from
learning theory and potential function arguments from complexity theory and
additive combinatorics.
Comment: The reduction for non-interactive simulation for general source
distribution to the Gaussian case was incorrect in the previous version. It
has been rectified now.
Bounded Independence Fools Degree-2 Threshold Functions
Let x be a random vector coming from any k-wise independent distribution over
{-1,1}^n. For an n-variate degree-2 polynomial p, we prove that E[sgn(p(x))] is
determined up to an additive epsilon for k = poly(1/epsilon). This answers an
open question of Diakonikolas et al. (FOCS 2009). Using standard constructions
of k-wise independent distributions, we obtain a broad class of explicit
generators that epsilon-fool the class of degree-2 threshold functions with
seed length log(n)*poly(1/epsilon).
Our approach is quite robust: it easily extends to yield that the
intersection of any constant number of degree-2 threshold functions is
epsilon-fooled by poly(1/epsilon)-wise independence. Our results also hold if
the entries of x are k-wise independent standard normals, implying for example
that bounded independence derandomizes the Goemans-Williamson hyperplane
rounding scheme.
To achieve our results, we introduce a technique we dub multivariate
FT-mollification, a generalization of the univariate form introduced by Kane et
al. (SODA 2010) in the context of streaming algorithms. Along the way we prove
a generalized hypercontractive inequality for quadratic forms which takes the
operator norm of the associated matrix into account. These techniques may be of
independent interest.
Comment: Using v1 numbering: removed Lemma G.5 from the Appendix (it was
wrong). Net effect is that Theorem G.6 reduces the m^6 dependence of Theorem
8.1 to m^4, not m^
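
The abstract above relies on standard constructions of k-wise independent distributions. As a hedged illustration, the sketch below builds one textbook pairwise independent (k = 2) family: from m uniform seed bits it outputs one sign for each nonempty subset of the seed, namely the parity of the seed bits in that subset. This is an assumption chosen for illustration and not necessarily the construction used in the paper, and the names `pairwise_independent_signs` and `is_k_wise_independent` are hypothetical. The check enumerates all seeds and confirms that every pair of output coordinates is uniform on {-1, 1}^2, while some triples are not uniform.

```python
from itertools import combinations, product


def pairwise_independent_signs(seed):
    """Stretch m seed bits into 2^m - 1 signs, one per nonempty subset of the seed."""
    m = len(seed)
    out = []
    for mask in range(1, 2 ** m):
        parity = 0
        for i in range(m):
            if (mask >> i) & 1:
                parity ^= seed[i]
        out.append(1 - 2 * parity)  # map bit 0 -> +1, bit 1 -> -1
    return out


def is_k_wise_independent(samples, k):
    """Check that every k coordinates are exactly uniform over the sample set."""
    n = len(samples[0])
    for coords in combinations(range(n), k):
        counts = {}
        for x in samples:
            key = tuple(x[i] for i in coords)
            counts[key] = counts.get(key, 0) + 1
        # Uniform means all 2^k sign patterns appear, each equally often.
        if len(counts) != 2 ** k or len(set(counts.values())) != 1:
            return False
    return True


m = 3
samples = [pairwise_independent_signs(seed) for seed in product((0, 1), repeat=m)]
print(len(samples), len(samples[0]))
print(is_k_wise_independent(samples, 2))
print(is_k_wise_independent(samples, 3))
```

The sample space has only 2^m points for 2^m - 1 output coordinates, which is why the seed length of such a generator is logarithmic in n. Larger k needs a different construction and a longer seed.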