104 research outputs found
Noise-Tolerant Learning, the Parity Problem, and the Statistical Query Model
We describe a slightly sub-exponential time algorithm for learning parity
functions in the presence of random classification noise. This results in a
polynomial-time algorithm for the case of parity functions that depend on only
the first O(log n log log n) bits of input. This is the first known instance of
an efficient noise-tolerant algorithm for a concept class that is provably not
learnable in the Statistical Query model of Kearns. Thus, we demonstrate that
the set of problems learnable in the statistical query model is a strict subset
of those problems learnable in the presence of noise in the PAC model.
In coding-theory terms, what we give is a poly(n)-time algorithm for decoding
linear k by n codes in the presence of random noise for the case of k = c log n
loglog n for some c > 0. (The case of k = O(log n) is trivial since one can
just individually check each of the 2^k possible messages and choose the one
that yields the closest codeword.)
A natural extension of the statistical query model is to allow queries about
statistical properties that involve t-tuples of examples (as opposed to single
examples). The second result of this paper is to show that any class of
functions learnable (strongly or weakly) with t-wise queries for t = O(log n)
is also weakly learnable with standard unary queries. Hence this natural
extension to the statistical query model does not increase the set of weakly
learnable functions
Learning to Reason: Leveraging Neural Networks for Approximate DNF Counting
Weighted model counting (WMC) has emerged as a prevalent approach for
probabilistic inference. In its most general form, WMC is #P-hard. Weighted DNF
counting (weighted #DNF) is a special case, where approximations with
probabilistic guarantees are obtained in O(nm), where n denotes the number of
variables, and m the number of clauses of the input DNF, but this is not
scalable in practice. In this paper, we propose a neural model counting
approach for weighted #DNF that combines approximate model counting with deep
learning, and accurately approximates model counts in linear time when width is
bounded. We conduct experiments to validate our method, and show that our model
learns and generalizes very well to large-scale #DNF instances.Comment: To appear in Proceedings of the Thirty-Fourth AAAI Conference on
Artificial Intelligence (AAAI-20). Code and data available at:
https://github.com/ralphabb/NeuralDNF
Pseudorandomness and Fourier Growth Bounds for Width-3 Branching Programs
We present an explicit pseudorandom generator for oblivious, read-once, width-3 branching programs, which can read their input bits in any order. The generator has seed length O~( log^3 n ).
The previously best known seed length for this model is n^{1/2+o(1)} due to Impagliazzo, Meka, and Zuckerman (FOCS\u2712). Our work generalizes a recent result of Reingold, Steinke, and Vadhan (RANDOM\u2713) for permutation branching programs. The main technical novelty underlying our generator is a new bound on the Fourier growth of width-3, oblivious, read-once branching programs. Specifically, we show that for any f : {0,1}^n -> {0,1} computed by such a branching program, and k in [n], sum_{|s|=k} |hat{f}(s)| < n^2 * (O(log n))^k,
where f(x) = sum_s hat{f}(s) (-1)^ is the standard Fourier transform over Z_2^n. The base O(log n) of the Fourier growth is tight up to a factor of log log n
From average case complexity to improper learning complexity
The basic problem in the PAC model of computational learning theory is to
determine which hypothesis classes are efficiently learnable. There is
presently a dearth of results showing hardness of learning problems. Moreover,
the existing lower bounds fall short of the best known algorithms.
The biggest challenge in proving complexity results is to establish hardness
of {\em improper learning} (a.k.a. representation independent learning).The
difficulty in proving lower bounds for improper learning is that the standard
reductions from -hard problems do not seem to apply in this
context. There is essentially only one known approach to proving lower bounds
on improper learning. It was initiated in (Kearns and Valiant 89) and relies on
cryptographic assumptions.
We introduce a new technique for proving hardness of improper learning, based
on reductions from problems that are hard on average. We put forward a (fairly
strong) generalization of Feige's assumption (Feige 02) about the complexity of
refuting random constraint satisfaction problems. Combining this assumption
with our new technique yields far reaching implications. In particular,
1. Learning 's is hard.
2. Agnostically learning halfspaces with a constant approximation ratio is
hard.
3. Learning an intersection of halfspaces is hard.Comment: 34 page
Optimal Bounds on Approximation of Submodular and XOS Functions by Juntas
We investigate the approximability of several classes of real-valued
functions by functions of a small number of variables ({\em juntas}). Our main
results are tight bounds on the number of variables required to approximate a
function within -error over
the uniform distribution: 1. If is submodular, then it is -close
to a function of variables.
This is an exponential improvement over previously known results. We note that
variables are necessary even for linear
functions. 2. If is fractionally subadditive (XOS) it is -close
to a function of variables. This result holds for all
functions with low total -influence and is a real-valued analogue of
Friedgut's theorem for boolean functions. We show that
variables are necessary even for XOS functions.
As applications of these results, we provide learning algorithms over the
uniform distribution. For XOS functions, we give a PAC learning algorithm that
runs in time . For submodular functions we give
an algorithm in the more demanding PMAC learning model (Balcan and Harvey,
2011) which requires a multiplicative factor approximation with
probability at least over the target distribution. Our uniform
distribution algorithm runs in time .
This is the first algorithm in the PMAC model that over the uniform
distribution can achieve a constant approximation factor arbitrarily close to 1
for all submodular functions. As follows from the lower bounds in (Feldman et
al., 2013) both of these algorithms are close to optimal. We also give
applications for proper learning, testing and agnostic learning with value
queries of these classes.Comment: Extended abstract appears in proceedings of FOCS 201
Conspiracies between learning algorithms, circuit lower bounds, and pseudorandomness
We prove several results giving new and stronger connections between learning theory, circuit
complexity and pseudorandomness. Let C be any typical class of Boolean circuits, and C[s(n)]
denote n-variable C-circuits of size ≤ s(n). We show:
Learning Speedups. If C[poly(n)] admits a randomized weak learning algorithm under the
uniform distribution with membership queries that runs in time 2n/nω(1), then for every k ≥ 1
and ε > 0 the class C[n
k
] can be learned to high accuracy in time O(2n
ε
). There is ε > 0 such that
C[2n
ε
] can be learned in time 2n/nω(1) if and only if C[poly(n)] can be learned in time 2(log n)
O(1)
.
Equivalences between Learning Models. We use learning speedups to obtain equivalences
between various randomized learning and compression models, including sub-exponential
time learning with membership queries, sub-exponential time learning with membership and
equivalence queries, probabilistic function compression and probabilistic average-case function
compression.
A Dichotomy between Learnability and Pseudorandomness. In the non-uniform setting,
there is non-trivial learning for C[poly(n)] if and only if there are no exponentially secure
pseudorandom functions computable in C[poly(n)].
Lower Bounds from Nontrivial Learning. If for each k ≥ 1, (depth-d)-C[n
k
] admits a
randomized weak learning algorithm with membership queries under the uniform distribution
that runs in time 2n/nω(1), then for each k ≥ 1, BPE * (depth-d)-C[n
k
]. If for some ε > 0 there
are P-natural proofs useful against C[2n
ε
], then ZPEXP * C[poly(n)].
Karp-Lipton Theorems for Probabilistic Classes. If there is a k > 0 such that BPE ⊆
i.o.Circuit[n
k
], then BPEXP ⊆ i.o.EXP/O(log n). If ZPEXP ⊆ i.o.Circuit[2n/3
], then ZPEXP ⊆
i.o.ESUBEXP.
Hardness Results for MCSP. All functions in non-uniform NC1
reduce to the Minimum
Circuit Size Problem via truth-table reductions computable by TC0
circuits. In particular, if
MCSP ∈ TC0
then NC1 = TC0
Conspiracies Between Learning Algorithms, Circuit Lower Bounds, and Pseudorandomness
We prove several results giving new and stronger connections between learning theory, circuit complexity and pseudorandomness. Let C be any typical class of Boolean circuits, and C[s(n)] denote n-variable C-circuits of size <= s(n). We show:
Learning Speedups: If C[s(n)] admits a randomized weak learning algorithm under the uniform distribution with membership queries that runs in time 2^n/n^{omega(1)}, then for every k >= 1 and epsilon > 0 the class C[n^k] can be learned to high accuracy in time O(2^{n^epsilon}). There is epsilon > 0 such that C[2^{n^{epsilon}}] can be learned in time 2^n/n^{omega(1)} if and only if C[poly(n)] can be learned in time 2^{(log(n))^{O(1)}}.
Equivalences between Learning Models: We use learning speedups to obtain equivalences between various randomized learning and compression models, including sub-exponential time learning with membership queries, sub-exponential time learning with membership and equivalence queries, probabilistic function compression and probabilistic average-case function compression.
A Dichotomy between Learnability and Pseudorandomness: In the non-uniform setting, there is non-trivial learning for C[poly(n)] if and only if there are no exponentially secure pseudorandom functions computable in C[poly(n)].
Lower Bounds from Nontrivial Learning: If for each k >= 1, (depth-d)-C[n^k] admits a randomized weak learning algorithm with membership queries under the uniform distribution that runs in time 2^n/n^{omega(1)}, then for each k >= 1, BPE is not contained in (depth-d)-C[n^k]. If for some epsilon > 0 there are P-natural proofs useful against C[2^{n^{epsilon}}], then ZPEXP is not contained in C[poly(n)].
Karp-Lipton Theorems for Probabilistic Classes: If there is a k > 0 such that BPE is contained in i.o.Circuit[n^k], then BPEXP is contained in i.o.EXP/O(log(n)). If ZPEXP is contained in i.o.Circuit[2^{n/3}], then ZPEXP is contained in i.o.ESUBEXP.
Hardness Results for MCSP: All functions in non-uniform NC^1 reduce to the Minimum Circuit Size Problem via truth-table reductions computable by TC^0 circuits. In particular, if MCSP is in TC^0 then NC^1 = TC^0
- …