MCMC Learning
The theory of learning under the uniform distribution is rich and deep, with
connections to cryptography, computational complexity, and the analysis of
Boolean functions, to name a few areas. This theory, however, is very limited,
because the uniform distribution and the corresponding Fourier basis
are rarely encountered as a statistical model.
A family of distributions that vastly generalizes the uniform distribution on
the Boolean cube is that of distributions represented by Markov Random Fields
(MRFs). Markov Random Fields are one of the main tools for modeling
high-dimensional data in many areas of statistics and machine learning.
In this paper we initiate the investigation of extending central ideas,
methods and algorithms from the theory of learning under the uniform
distribution to the setup of learning concepts given examples from MRF
distributions. In particular, our results establish a novel connection between
properties of MCMC sampling of MRFs and learning under the MRF distribution.
Comment: 28 pages, 1 figure
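MCMC sampling of an MRF can be made concrete with a Gibbs sampler. The sketch below assumes an Ising-type pairwise MRF on {-1,1}^n with couplings J and external fields h (names chosen here, not taken from the paper); it illustrates only the sampling step and is not the paper's learning algorithm. With J = 0 and h = 0 every coordinate is a fair coin, which recovers the uniform distribution.

```python
import math
import random

def gibbs_sample_ising(J, h, num_sweeps, rng):
    """Approximate sample from an Ising MRF on {-1, +1}^n via systematic-scan Gibbs sampling.

    Assumed model: p(x) proportional to exp(sum_{i<j} J[i][j] x_i x_j + sum_i h[i] x_i),
    with J symmetric and zero on the diagonal.
    """
    n = len(h)
    x = [rng.choice((-1, 1)) for _ in range(n)]  # arbitrary initial state
    for _ in range(num_sweeps):
        for i in range(n):
            # Local field at site i; only neighbours with J[i][j] != 0 contribute.
            field = h[i] + sum(J[i][j] * x[j] for j in range(n) if j != i)
            # Conditional of x_i given all other sites: P(x_i = +1 | rest) = 1 / (1 + e^{-2 field}).
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))
            x[i] = 1 if rng.random() < p_plus else -1
    return x

# Usage: 5 sweeps on a 3-node ferromagnetic triangle (hypothetical parameters).
sample = gibbs_sample_ising([[0, 1, 1], [1, 0, 1], [1, 1, 0]], [0.0, 0.0, 0.0], 5, random.Random(1))
```

Each update resamples one coordinate from its exact conditional given its Markov blanket, so the chain leaves the MRF distribution invariant; how many sweeps are needed before the sample is close to that distribution (the mixing time) depends on J and h.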
Learning pseudo-Boolean k-DNF and Submodular Functions
We prove that any submodular function f: {0,1}^n -> {0,1,...,k} can be
represented as a pseudo-Boolean 2k-DNF formula. Pseudo-Boolean DNFs are a
natural generalization of DNF representation for functions with integer range.
Each term in such a formula has an associated integral constant. We show that
an analog of Håstad's switching lemma holds for pseudo-Boolean k-DNFs if all
constants associated with the terms of the formula are bounded.
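To make the representation concrete, here is a small sketch. It assumes, for illustration, that a pseudo-Boolean DNF evaluates to the largest constant among its satisfied terms (and 0 when no term is satisfied); the abstract itself only says that each term carries an integral constant. The example writes f(x) = min(|x|, 2) on {0,1}^3, a submodular function with k = 2, as a 2-DNF of this form (within the 2k-DNF bound) and checks submodularity, f(x ∨ y) + f(x ∧ y) ≤ f(x) + f(y), by brute force. The term encoding and all function names are hypothetical.

```python
from itertools import combinations, product

def eval_pb_dnf(terms, x):
    """Value of a pseudo-Boolean DNF at x in {0,1}^n (assumed max-of-satisfied-terms semantics).

    Each term is (constant, positives, negatives): it is satisfied when every
    index in `positives` is 1 and every index in `negatives` is 0.
    """
    value = 0
    for const, positives, negatives in terms:
        if all(x[i] == 1 for i in positives) and all(x[j] == 0 for j in negatives):
            value = max(value, const)
    return value

def is_submodular(f, n):
    """Brute-force check of f(x OR y) + f(x AND y) <= f(x) + f(y) over all pairs."""
    cube = list(product((0, 1), repeat=n))
    for x in cube:
        for y in cube:
            join = tuple(a | b for a, b in zip(x, y))
            meet = tuple(a & b for a, b in zip(x, y))
            if f(join) + f(meet) > f(x) + f(y):
                return False
    return True

# Hypothetical example: min(|x|, 2) on {0,1}^3 as a monotone pseudo-Boolean 2-DNF.
N = 3
TERMS = [(1, {i}, set()) for i in range(N)]
TERMS += [(2, set(pair), set()) for pair in combinations(range(N), 2)]

def capped_weight(x):
    return eval_pb_dnf(TERMS, x)
```

Here `is_submodular(capped_weight, N)` holds, while a function such as x_0 ∧ x_1 fails the check at x = (1,0), y = (0,1).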
This allows us to generalize Mansour's PAC-learning algorithm for k-DNFs to
pseudo-Boolean k-DNFs, and hence gives a PAC-learning algorithm with membership
queries under the uniform distribution for submodular functions of the form
f:{0,1}^n -> {0,1,...,k}. Our algorithm runs in time polynomial in n,
k^{O(k \log k / \epsilon)}, 1/\epsilon and log(1/\delta) and works even in the
agnostic setting. The line of previous work on learning submodular functions
[Balcan, Harvey (STOC '11), Gupta, Hardt, Roth, Ullman (STOC '11), Cheraghchi,
Klivans, Kothari, Lee (SODA '12)] implies only n^{O(k)} query complexity for
learning submodular functions in this setting, for fixed epsilon and delta.
Our learning algorithm implies a property tester for submodularity of
functions f:{0,1}^n -> {0, ..., k} with query complexity polynomial in n for
k=O((\log n/ \log\log n)^{1/2}) and constant proximity parameter \epsilon.
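Mansour-style learners under the uniform distribution rest on the Fourier expansion f(x) = Σ_S \hat f(S) χ_S(x). The brute-force sketch below computes every coefficient exactly for small n, using the convention χ_S(x) = (-1)^{Σ_{i∈S} x_i} for x ∈ {0,1}^n. It only illustrates the basis; it is not Mansour's algorithm, which avoids enumerating all 2^n inputs. Function names are hypothetical.

```python
from itertools import product

def fourier_coefficient(f, n, S):
    """Exact hat f(S) = E_x[f(x) * chi_S(x)] for x uniform on {0,1}^n."""
    total = 0
    for x in product((0, 1), repeat=n):
        chi = -1 if sum(x[i] for i in S) % 2 else 1  # chi_S(x) = (-1)^{sum_{i in S} x_i}
        total += f(x) * chi
    return total / 2 ** n

def fourier_spectrum(f, n):
    """All 2^n coefficients, keyed by the index set S as a frozenset."""
    spectrum = {}
    for mask in range(2 ** n):
        S = frozenset(i for i in range(n) if mask >> i & 1)
        spectrum[S] = fourier_coefficient(f, n, S)
    return spectrum
```

For example, the dictator f(x) = x_0 on two variables has \hat f(∅) = 1/2 and \hat f({0}) = -1/2, with all other coefficients zero, and the squared coefficients sum to E[f²] = 1/2, as Parseval's identity requires.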
Improved Learning from Kolmogorov Complexity
Carmosino, Impagliazzo, Kabanets, and Kolokolova (CCC, 2016) showed that the existence of natural properties in the sense of Razborov and Rudich (JCSS, 1997) implies PAC learning algorithms in the sense of Valiant (Comm. ACM, 1984), for boolean functions in P/poly, under the uniform distribution and with membership queries. It is still an open problem to get from natural properties learning algorithms that do not rely on membership queries but rather use randomly drawn labeled examples.
Natural properties may be understood as an average-case version of MCSP, the problem of deciding the minimum size of a circuit computing a given truth-table. Problems related to MCSP include those concerning time-bounded Kolmogorov complexity. MKTP, for example, asks for the KT-complexity of a given string. KT-complexity is a relaxation of circuit size, as it does away with the requirement that a short description of a string be interpreted as a boolean circuit. In this work, under assumptions of MKTP and the related problem MK^tP being easy on average, we get learning algorithms for boolean functions in P/poly that
- work over any distribution D samplable by a family of polynomial-size circuits (given explicitly in the case of MKTP),
- only use randomly drawn labeled examples from D, and
- are agnostic (do not require the target function to belong to the hypothesis class).
Our results build upon the recent work of Hirahara and Nanashima (FOCS, 2021), who showed similar learning consequences but under a stronger assumption that NP is easy on average.
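For reference, one common formulation of KT-complexity (following Allender et al.; details of the machine model vary between papers) and of the two decision problems named above is given below. It assumes a fixed universal machine U with random access to a description d, and takes x_{|x|+1} to be an end marker *. These definitions are supplied here for orientation and are not stated in the abstract.

```latex
\begin{align*}
\mathrm{KT}(x) &= \min\bigl\{\, |d| + t : \forall b \in \{0,1,*\}\ \forall i \le |x|+1,\
  U^{d}(i,b) \text{ accepts within } t \text{ steps} \iff x_i = b \,\bigr\},\\
\mathrm{MKTP} &= \{\, (x,\theta) : \mathrm{KT}(x) \le \theta \,\},\\
\mathrm{MCSP} &= \{\, (T,s) : \text{some circuit of size at most } s \text{ has truth table } T \,\}.
\end{align*}
```

Charging the running time t alongside the description length |d| is what makes KT a time-bounded relaxation of circuit size: a small circuit for a truth table yields a short description from which each bit can be computed quickly.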
Conspiracies between learning algorithms, circuit lower bounds, and pseudorandomness
We prove several results giving new and stronger connections between learning theory, circuit
complexity and pseudorandomness. Let C be any typical class of Boolean circuits, and C[s(n)]
denote n-variable C-circuits of size ≤ s(n). We show:
Learning Speedups. If C[poly(n)] admits a randomized weak learning algorithm under the
uniform distribution with membership queries that runs in time 2^n/n^{ω(1)}, then for every k ≥ 1
and ε > 0 the class C[n^k] can be learned to high accuracy in time O(2^{n^ε}). There is ε > 0 such that
C[2^{n^ε}] can be learned in time 2^n/n^{ω(1)} if and only if C[poly(n)] can be learned in time 2^{(log n)^{O(1)}}.
Equivalences between Learning Models. We use learning speedups to obtain equivalences
between various randomized learning and compression models, including sub-exponential
time learning with membership queries, sub-exponential time learning with membership and
equivalence queries, probabilistic function compression and probabilistic average-case function
compression.
A Dichotomy between Learnability and Pseudorandomness. In the non-uniform setting,
there is non-trivial learning for C[poly(n)] if and only if there are no exponentially secure
pseudorandom functions computable in C[poly(n)].
Lower Bounds from Nontrivial Learning. If for each k ≥ 1, (depth-d)-C[n^k] admits a
randomized weak learning algorithm with membership queries under the uniform distribution
that runs in time 2^n/n^{ω(1)}, then for each k ≥ 1, BPE ⊄ (depth-d)-C[n^k]. If for some ε > 0 there
are P-natural proofs useful against C[2^{n^ε}], then ZPEXP ⊄ C[poly(n)].
Karp-Lipton Theorems for Probabilistic Classes. If there is a k > 0 such that BPE ⊆
i.o.Circuit[n^k], then BPEXP ⊆ i.o.EXP/O(log n). If ZPEXP ⊆ i.o.Circuit[2^{n/3}], then ZPEXP ⊆
i.o.ESUBEXP.
Hardness Results for MCSP. All functions in non-uniform NC^1 reduce to the Minimum
Circuit Size Problem via truth-table reductions computable by TC^0 circuits. In particular, if
MCSP ∈ TC^0 then NC^1 = TC^0.
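MCSP asks, given a truth table T and a size bound s, whether some circuit of size at most s computes T. The exhaustive-search sketch below solves the search version for tiny n. It assumes fan-in-2 AND/OR gates plus NOT gates, with every gate (NOTs included) counted toward size; conventions differ between papers. It runs in exponential time and has nothing to do with the TC^0 reductions above. Truth tables are encoded as integer bitmasks, and all names are hypothetical.

```python
def truth_table(n, i):
    """Truth table of input variable x_i as a bitmask: bit a is set iff assignment a has x_i = 1."""
    mask = 0
    for a in range(2 ** n):
        if a >> i & 1:
            mask |= 1 << a
    return mask

def min_circuit_size(target, n, max_gates):
    """Fewest AND/OR/NOT gates computing `target` (a 2^n-bit mask), or None if > max_gates.

    Iterative deepening over gate sequences, so only feasible for very small n.
    """
    full = (1 << 2 ** n) - 1
    inputs = [truth_table(n, i) for i in range(n)]
    if target in inputs:
        return 0

    def search(wires, gates_left):
        # True if target is reachable by adding at most `gates_left` gates to `wires`.
        if gates_left == 0:
            return False
        candidates = set()
        for a in wires:
            candidates.add(full & ~a)  # NOT gate
            for b in wires:
                candidates.add(a & b)  # AND gate
                candidates.add(a | b)  # OR gate
        for g in candidates:
            if g == target:
                return True
            if g not in wires and search(wires + [g], gates_left - 1):
                return True
        return False

    for size in range(1, max_gates + 1):
        if search(inputs, size):
            return size
    return None
```

With two inputs, x_0 ∧ x_1 needs one gate, while x_0 ⊕ x_1 needs four in this basis, for example (x_0 ∨ x_1) ∧ ¬(x_0 ∧ x_1).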
Learning using Local Membership Queries
We introduce a new model of membership query (MQ) learning, where the
learning algorithm is restricted to query points that are \emph{close} to
random examples drawn from the underlying distribution. The learning model is
intermediate between the PAC model (Valiant, 1984) and the PAC+MQ model (where
the queries are allowed to be arbitrary points).
Membership query algorithms are not popular among machine learning
practitioners. Apart from the obvious difficulty of adaptively querying
labelers, it has also been observed that querying \emph{unnatural} points leads
to increased noise from human labelers (Lang and Baum, 1992). This motivates
our study of learning algorithms that make queries that are close to examples
generated from the data distribution.
We restrict our attention to functions defined on the -dimensional Boolean
hypercube and say that a membership query is local if its Hamming distance from
some example in the (random) training data is at most . We show the
following results in this model:
(i) The class of sparse polynomials (with coefficients in R) over
is polynomial time learnable under a large class of \emph{locally smooth}
distributions using -local queries. This class also includes the
class of -depth decision trees.
(ii) The class of polynomial-sized decision trees is polynomial time
learnable under product distributions using -local queries.
(iii) The class of polynomial size DNF formulas is learnable under the
uniform distribution using -local queries in time
.
(iv) In addition we prove a number of results relating the proposed model to
the traditional PAC model and the PAC+MQ model.
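The locality restriction in this model can be expressed as an oracle that answers a membership query only when the queried point lies within Hamming distance r of some training example. The radius is written r here because the abstract's own symbol was lost in extraction. The class and method names are hypothetical, and the sketch models only the query interface, not any of the learning algorithms listed above.

```python
def hamming(u, v):
    """Number of coordinates where two equal-length 0/1 tuples differ."""
    return sum(a != b for a, b in zip(u, v))

class LocalMembershipOracle:
    """Membership oracle that only answers queries r-local to the random training sample."""

    def __init__(self, target, samples, r):
        self.target = target                    # unknown concept f: {0,1}^n -> {0,1}
        self.samples = [tuple(s) for s in samples]
        self.r = r                              # locality radius (hypothetical name)

    def is_local(self, x):
        return any(hamming(tuple(x), s) <= self.r for s in self.samples)

    def query(self, x):
        if not self.is_local(x):
            raise ValueError("query is not r-local to any training example")
        return self.target(tuple(x))
```

With r = 0 the oracle can only relabel points already in the sample, matching the plain PAC setting; taking r = n permits arbitrary queries, matching PAC+MQ.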
Conspiracies Between Learning Algorithms, Circuit Lower Bounds, and Pseudorandomness
We prove several results giving new and stronger connections between learning theory, circuit complexity and pseudorandomness. Let C be any typical class of Boolean circuits, and C[s(n)] denote n-variable C-circuits of size <= s(n). We show:
Learning Speedups: If C[s(n)] admits a randomized weak learning algorithm under the uniform distribution with membership queries that runs in time 2^n/n^{omega(1)}, then for every k >= 1 and epsilon > 0 the class C[n^k] can be learned to high accuracy in time O(2^{n^epsilon}). There is epsilon > 0 such that C[2^{n^{epsilon}}] can be learned in time 2^n/n^{omega(1)} if and only if C[poly(n)] can be learned in time 2^{(log(n))^{O(1)}}.
Equivalences between Learning Models: We use learning speedups to obtain equivalences between various randomized learning and compression models, including sub-exponential time learning with membership queries, sub-exponential time learning with membership and equivalence queries, probabilistic function compression and probabilistic average-case function compression.
A Dichotomy between Learnability and Pseudorandomness: In the non-uniform setting, there is non-trivial learning for C[poly(n)] if and only if there are no exponentially secure pseudorandom functions computable in C[poly(n)].
Lower Bounds from Nontrivial Learning: If for each k >= 1, (depth-d)-C[n^k] admits a randomized weak learning algorithm with membership queries under the uniform distribution that runs in time 2^n/n^{omega(1)}, then for each k >= 1, BPE is not contained in (depth-d)-C[n^k]. If for some epsilon > 0 there are P-natural proofs useful against C[2^{n^{epsilon}}], then ZPEXP is not contained in C[poly(n)].
Karp-Lipton Theorems for Probabilistic Classes: If there is a k > 0 such that BPE is contained in i.o.Circuit[n^k], then BPEXP is contained in i.o.EXP/O(log(n)). If ZPEXP is contained in i.o.Circuit[2^{n/3}], then ZPEXP is contained in i.o.ESUBEXP.
Hardness Results for MCSP: All functions in non-uniform NC^1 reduce to the Minimum Circuit Size Problem via truth-table reductions computable by TC^0 circuits. In particular, if MCSP is in TC^0 then NC^1 = TC^0.
Learning circuits with few negations
Monotone Boolean functions, and the monotone Boolean circuits that compute
them, have been intensively studied in complexity theory. In this paper we
study the structure of Boolean functions in terms of the minimum number of
negations in any circuit computing them, a complexity measure that interpolates
between monotone functions and the class of all functions. We study this
generalization of monotonicity from the vantage point of learning theory,
giving near-matching upper and lower bounds on the uniform-distribution
learnability of circuits in terms of the number of negations they contain. Our
upper bounds are based on a new structural characterization of negation-limited
circuits that extends a classical result of A. A. Markov. Our lower bounds,
which employ Fourier-analytic tools from hardness amplification, give new
results even for circuits with no negations (i.e., monotone functions).
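One chain-based quantity that can be computed directly is the alternation number: the largest number of times f changes value along a chain 0^n < x^1 < ... < 1^n in the Boolean cube. Monotone functions have alternation at most 1. Markov's classical theorem relates the minimum number of negations to a chain quantity of this kind. The sketch below computes only the chain quantity, by dynamic programming over the cube. It does not reproduce the paper's structural characterization, and the function name is hypothetical.

```python
from itertools import product

def alternation_number(f, n):
    """Max number of value changes of f along a chain from 0^n to 1^n in {0,1}^n.

    best[x] holds the most changes on any single-bit-step chain from 0^n up to x.
    Refining or extending a chain never lowers its change count, so maximal chains suffice.
    """
    best = {}
    for x in sorted(product((0, 1), repeat=n), key=sum):  # predecessors have smaller weight
        preds = [x[:i] + (0,) + x[i + 1:] for i in range(n) if x[i] == 1]
        if not preds:
            best[x] = 0  # the bottom element 0^n
            continue
        best[x] = max(best[y] + (f(y) != f(x)) for y in preds)
    return best[(1,) * n]
```

Parity on n bits attains the maximum possible value n, while majority, like every non-constant monotone function, has alternation 1.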
- …