Oracles Are Subtle But Not Malicious
Theoretical computer scientists have been debating the role of oracles since
the 1970's. This paper illustrates both that oracles can give us nontrivial
insights about the barrier problems in circuit complexity, and that they need
not prevent us from trying to solve those problems.
First, we give an oracle relative to which PP has linear-sized circuits, by
proving a new lower bound for perceptrons and low-degree threshold
polynomials. This oracle settles a longstanding open question, and generalizes
earlier results due to Beigel and to Buhrman, Fortnow, and Thierauf. More
importantly, it implies the first nonrelativizing separation of "traditional"
complexity classes, as opposed to interactive proof classes such as MIP and
MA-EXP. For Vinodchandran showed, by a nonrelativizing argument, that PP does
not have circuits of size n^k for any fixed k. We present an alternative proof
of this fact, which shows that PP does not even have quantum circuits of size
n^k with quantum advice. To our knowledge, this is the first nontrivial lower
bound on quantum circuit size.
Second, we study a beautiful algorithm of Bshouty et al. for learning Boolean
circuits in ZPP^NP. We show that the NP queries in this algorithm cannot be
parallelized by any relativizing technique, by giving an oracle relative to
which ZPP^||NP and even BPP^||NP have linear-size circuits. On the other hand,
we also show that the NP queries could be parallelized if P=NP. Thus, classes
such as ZPP^||NP inhabit a "twilight zone," where we need to distinguish
between relativizing and black-box techniques. Our results on this subject have
implications for computational learning theory as well as for the circuit
minimization problem.
Comment: 20 pages, 1 figure
Immunity and Simplicity for Exact Counting and Other Counting Classes
Ko [RAIRO 24, 1990] and Bruschi [TCS 102, 1992] showed that in some
relativized world, PSPACE (in fact, ParityP) contains a set that is immune to
the polynomial hierarchy (PH). In this paper, we study and settle the question
of (relativized) separations with immunity for PH and the counting classes PP,
C_{=}P, and ParityP in all possible pairwise combinations. Our main result is
that there is an oracle A relative to which C_{=}P contains a set that is
immune to BPP^{ParityP}. In particular, this C_{=}P^A set is immune to PH^{A}
and ParityP^{A}. Strengthening results of Torán [J. ACM 38, 1991] and Green
[IPL 37, 1991], we also show that, in suitable relativizations, NP contains a
C_{=}P-immune set, and ParityP contains a PP^{PH}-immune set. This implies the
existence of a C_{=}P^{B}-simple set for some oracle B, which extends results
of Balcázar et al. [SIAM J. Comput. 14, 1985; RAIRO 22, 1988] and provides the
first example of a simple set in a class not known to be contained in PH. Our
proof technique requires a circuit lower bound for "exact counting" that is
derived from Razborov's [Mat. Zametki 41, 1987] lower bound for majority.
Comment: 20 pages
A note on quantum algorithms and the minimal degree of epsilon-error polynomials for symmetric functions
The degrees of polynomials representing or approximating Boolean functions
are a prominent tool in various branches of complexity theory. Sherstov
recently characterized the minimal degree deg_{\eps}(f) among all polynomials
(over the reals) that approximate a symmetric function f:{0,1}^n->{0,1} up to
worst-case error \eps: deg_{\eps}(f) = ~\Theta(deg_{1/3}(f) +
\sqrt{n\log(1/\eps)}). In this note we show how a tighter version (without the
log-factors hidden in the ~\Theta-notation) can be derived quite easily using
the close connection between polynomials and quantum algorithms.
Comment: 7 pages LaTeX. 2nd version: corrected a few small inaccuracies
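For background on the "close connection" invoked here (this is the standard polynomial method of Beals et al. [J. ACM 48, 2001], stated for orientation; it is not a claim from the note itself): the acceptance probability of a bounded-query quantum algorithm is itself a low-degree real polynomial, so quantum query upper bounds transfer directly to approximating polynomials.

```latex
% Polynomial method (Beals et al., 2001): for any quantum algorithm A
% making T queries to x in {0,1}^n there is a real polynomial p with
\[
  \Pr[\text{$A$ accepts $x$}] = p(x), \qquad \deg(p) \le 2T .
\]
% Hence if A computes f with worst-case error \eps, then
% |p(x) - f(x)| <= \eps for all x, which gives the transfer
\[
  \deg_{\eps}(f) \;\le\; 2\,Q_{\eps}(f),
\]
% where Q_{\eps}(f) is the \eps-error quantum query complexity of f.
```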
Model Interpretability through the Lens of Computational Complexity
In spite of several claims stating that some models are more interpretable
than others -- e.g., "linear models are more interpretable than deep neural
networks" -- we still lack a principled notion of interpretability to formally
compare among different classes of models. We make a step towards such a notion
by studying whether folklore interpretability claims have a correlate in terms
of computational complexity theory. We focus on local post-hoc explainability
queries that, intuitively, attempt to answer why individual inputs are
classified in a certain way by a given model. In a nutshell, we say that a
class of models C_1 is more interpretable than another class C_2 if the
computational complexity of answering post-hoc queries for models in C_2 is
higher than for those in C_1. We
prove that this notion provides a good theoretical counterpart to current
beliefs on the interpretability of models; in particular, we show that under
our definition and under standard complexity-theoretic assumptions (such
as P ≠ NP), both linear and tree-based models are strictly more
interpretable than neural networks. Our complexity analysis, however, does not
provide a clear-cut difference between linear and tree-based models, as we
obtain different results depending on the particular post-hoc explanations
considered. Finally, by applying a finer complexity analysis based on
parameterized complexity, we are able to prove a theoretical result suggesting
that shallow neural networks are more interpretable than deeper ones.
Comment: 36 pages, including 9 pages of main text. This is the arXiv version
of the NeurIPS'2020 paper. Apart from minor differences that could be
introduced by the publisher, the only difference should be the addition of
the appendix, which contains all the proofs that do not appear in the main
text
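To make the flavor of such queries concrete, here is a minimal, hypothetical sketch (the function names and tree encoding are invented for illustration; this is not code or terminology from the paper): a brute-force test of whether fixing some features of an input already forces a decision tree's prediction, one kind of local post-hoc "why" query. For decision trees such queries admit efficient algorithms; the exhaustive loop below is only meant to pin down the question being asked.

```python
from itertools import product

def predict_tree(tree, x):
    """Evaluate a decision tree given as nested tuples:
    ('leaf', label) or ('node', feature_index, left, right),
    where `left` is taken when x[feature_index] == 0."""
    while tree[0] == 'node':
        _, i, left, right = tree
        tree = left if x[i] == 0 else right
    return tree[1]

def is_sufficient_reason(tree, x, fixed, n):
    """Does fixing the features in `fixed` to their values in x force the
    tree's output, whatever the remaining features are? Brute force over
    all completions; fine for tiny n, purely illustrative."""
    target = predict_tree(tree, x)
    free = [i for i in range(n) if i not in fixed]
    for bits in product([0, 1], repeat=len(free)):
        y = list(x)
        for i, b in zip(free, bits):
            y[i] = b
        if predict_tree(tree, y) != target:
            return False
    return True

# Toy tree computing x0 AND x1 on 3-bit inputs (x2 is irrelevant).
tree = ('node', 0,
        ('leaf', 0),
        ('node', 1, ('leaf', 0), ('leaf', 1)))
x = [1, 1, 0]
print(is_sufficient_reason(tree, x, fixed={0, 1}, n=3))  # True
print(is_sufficient_reason(tree, x, fixed={0}, n=3))     # False: x1 matters
```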
Polynomials that Sign Represent Parity and Descartes' Rule of Signs
A real polynomial P sign represents a function f defined on A^n, for a finite
set A of reals, if for every x in A^n the sign of P(x) equals f(x), where f is
viewed as taking values in {-1,+1}. Such sign representations are well-studied
in computer science and have applications to computational complexity and
computational learning theory. In this work, we present a systematic study of
tradeoffs between degree and sparsity of sign representations through the lens
of the parity function. We attempt to prove bounds that hold for any choice of
the set A. We show that sign representing parity over A^n with low degree in
each variable requires high sparsity, and that a tradeoff exists between
sparsity and degree: we exhibit a sign representation that has higher degree
but lower sparsity. We also show a lower bound on the sparsity of polynomials
of any degree representing parity over A^n, and we prove exact bounds on the
sparsity of such polynomials for any two-element subset A. The main tool used
is Descartes' Rule of Signs, a classical result in algebra relating the
sparsity of a polynomial to its number of real roots. As an application, we
use the bounds on sparsity to derive circuit lower bounds for depth-two
AND-OR-NOT circuits with a threshold gate at the top. We use this to give a
simple proof that such circuits need exponential size to compute parity, which
improves the previous bound due to Goldmann (1997). We also show a tight lower
bound on the sparsity of sign representations of the inner product function.
Comment: To appear in Computational Complexity
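As a quick illustration of the main tool (an editorial example, not taken from the paper): Descartes' Rule of Signs says the number of positive real roots of a real polynomial is at most the number of sign changes in its coefficient sequence, and in particular at most its sparsity minus one. A numerical sanity check, assuming numpy is available:

```python
import numpy as np

def sign_changes(coeffs):
    """Count sign changes in a coefficient sequence (zeros skipped)."""
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

# p(x) = x^5 - 3x^2 + 1, coefficients listed from highest degree down.
coeffs = [1, 0, 0, -3, 0, 1]
roots = np.roots(coeffs)
positive_real = [r.real for r in roots if abs(r.imag) < 1e-7 and r.real > 0]
# Descartes' bound: number of positive roots <= number of sign changes.
print(sign_changes(coeffs), len(positive_real))  # prints: 2 2
```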
The intersection of two halfspaces has high threshold degree
The threshold degree of a Boolean function f:{0,1}^n->{-1,+1} is the least
degree of a real polynomial p such that f(x)=sgn p(x). We construct two
halfspaces on {0,1}^n whose intersection has threshold degree Theta(sqrt n), an
exponential improvement on previous lower bounds. This solves an open problem
due to Klivans (2002) and rules out the use of perceptron-based techniques for
PAC learning the intersection of two halfspaces, a central unresolved challenge
in computational learning. We also prove that the intersection of two majority
functions has threshold degree Omega(log n), which is tight and settles a
conjecture of O'Donnell and Servedio (2003).
Our proof consists of two parts. First, we show that for any nonconstant
Boolean functions f and g, the intersection f(x) ∧ g(y) has threshold degree O(d)
if and only if ||f-F||_infty + ||g-G||_infty < 1 for some rational functions F,
G of degree O(d). Second, we settle the least degree required for approximating
a halfspace and a majority function to any given accuracy by rational
functions.
Our technique further allows us to make progress on Aaronson's challenge
(2008) and contribute strong direct product theorems for polynomial
representations of composed Boolean functions of the form F(f_1,...,f_n). In
particular, we give an improved lower bound on the approximate degree of the
AND-OR tree.
Comment: Full version of the FOCS'09 paper
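To see the definition in action (a toy sketch under stated assumptions: numpy and scipy installed, and brute-force enumeration that only works at toy scale; this is not the authors' machinery): deciding whether a function has threshold degree at most d is a linear-program feasibility question over the monomial coefficients, since strict sign agreement can always be rescaled to f(x)*p(x) >= 1.

```python
import numpy as np
from math import prod
from itertools import combinations, product
from scipy.optimize import linprog

def sign_representable(f, n, d):
    """Is there a multilinear p of degree <= d with f(x)*p(x) >= 1 on
    {0,1}^n? LP feasibility; exponential in n, toy sizes only."""
    monos = [S for k in range(d + 1) for S in combinations(range(n), k)]
    A_ub = [[-f(x) * prod(x[i] for i in S) for S in monos]
            for x in product([0, 1], repeat=n)]
    b_ub = [-1.0] * len(A_ub)  # each row encodes f(x) * p(x) >= 1
    res = linprog(np.zeros(len(monos)), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * len(monos), method='highs')
    return res.success

def threshold_degree(f, n):
    return next(d for d in range(n + 1) if sign_representable(f, n, d))

parity = lambda x: -1 if sum(x) % 2 else 1
and2 = lambda x: 1 if all(x) else -1
print(threshold_degree(parity, 3))  # 3: parity needs full degree
print(threshold_degree(and2, 2))    # 1: AND is a halfspace
```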
Why and When Can Deep -- but Not Shallow -- Networks Avoid the Curse of Dimensionality: a Review
The paper characterizes classes of functions for which deep learning can be
exponentially better than shallow learning. Deep convolutional networks are a
special case of these conditions, though weight sharing is not the main reason
for their exponential advantage.
Weighted Polynomial Approximations: Limits for Learning and Pseudorandomness
Polynomial approximations to boolean functions have led to many positive
results in computer science. In particular, polynomial approximations to the
sign function underlie algorithms for agnostically learning halfspaces, as well
as pseudorandom generators for halfspaces. In this work, we investigate the
limits of these techniques by proving inapproximability results for the sign
function.
Firstly, the polynomial regression algorithm of Kalai et al. (SIAM J. Comput.
2008) shows that halfspaces can be learned with respect to log-concave
distributions on R^n in the challenging agnostic learning model. The
power of this algorithm relies on the fact that under log-concave
distributions, halfspaces can be approximated arbitrarily well by low-degree
polynomials. We ask whether this technique can be extended beyond log-concave
distributions, and establish a negative result. We show that polynomials of any
degree cannot approximate the sign function to within arbitrarily low error for
a large class of non-log-concave distributions on the real line, including
those with densities proportional to e^{-|x|^\alpha} for \alpha < 1.
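To illustrate the regression technique being discussed (a hedged toy sketch: ordinary least squares stands in for the L1 polynomial regression that Kalai et al. actually analyze, and every parameter below is an arbitrary choice for the demo):

```python
import numpy as np
from itertools import combinations_with_replacement

def monomial_features(X, d):
    """All monomials of degree <= d in the columns of X (toy sizes only)."""
    feats = [np.ones(len(X))]
    for k in range(1, d + 1):
        for idx in combinations_with_replacement(range(X.shape[1]), k):
            feats.append(np.prod(X[:, idx], axis=1))
    return np.column_stack(feats)

rng = np.random.default_rng(0)
w = rng.normal(size=5)
X = rng.normal(size=(2000, 5))        # Gaussian, hence log-concave, inputs
y = np.sign(X @ w)                    # halfspace labels in {-1, +1}
y[rng.random(2000) < 0.1] *= -1       # 10% adversarially-flipped labels

Phi = monomial_features(X, d=3)
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
pred = np.sign(Phi @ coef)
print((pred != y).mean())  # typically not far above the 10% noise rate
```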
Secondly, we investigate the derandomization of Chernoff-type concentration
inequalities. Chernoff-type tail bounds on sums of independent random variables
have pervasive applications in theoretical computer science. Schmidt et al.
(SIAM J. Discrete Math. 1995) showed that these inequalities can be established
for sums of random variables with only O(\log(1/\delta))-wise independence,
for a tail probability of \delta. We show that their results are tight up to
constant factors.
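For orientation on the framework behind this result (a minimal sketch of the standard construction of k-wise independent bits via random low-degree polynomials over a prime field; the prime and all parameters are arbitrary demo choices, and none of this is the paper's proof):

```python
import random
from itertools import combinations

def kwise_sample(k, n, p=2003):
    """n nearly-unbiased bits that are k-wise independent: evaluate a
    uniformly random polynomial of degree < k over Z_p at points 1..n
    and reduce mod 2 (the mod-2 step adds O(1/p) bias)."""
    coeffs = [random.randrange(p) for _ in range(k)]
    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation mod p
            acc = (acc * x + c) % p
        return acc
    return [poly(i) % 2 for i in range(1, n + 1)]

# Sanity check of pairwise (k = 2) independence: for every pair of
# positions, the fraction of draws with both bits 1 should be near 1/4.
n, trials = 8, 20000
counts = {pair: 0 for pair in combinations(range(n), 2)}
for _ in range(trials):
    bits = kwise_sample(2, n)
    for i, j in counts:
        counts[(i, j)] += bits[i] & bits[j]
print(max(abs(c / trials - 0.25) for c in counts.values()))  # small
```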
These results rely on techniques from weighted approximation theory, which
studies how well functions on the real line can be approximated by polynomials
under various distributions. We believe that these techniques will have further
applications in other areas of computer science.
Comment: 22 pages