35,624 research outputs found
Learning Coverage Functions and Private Release of Marginals
We study the problem of approximating and learning coverage functions. A
function is a coverage function, if
there exists a universe with non-negative weights for each
and subsets of such that . Alternatively, coverage functions can be described
as non-negative linear combinations of monotone disjunctions. They are a
natural subclass of submodular functions and arise in a number of applications.
We give an algorithm that for any , given random and uniform
examples of an unknown coverage function , finds a function that
approximates within factor on all but -fraction of the
points in time . This is the first fully-polynomial
algorithm for learning an interesting class of functions in the demanding PMAC
model of Balcan and Harvey (2011). Our algorithms are based on several new
structural properties of coverage functions. Using the results in (Feldman and
Kothari, 2014), we also show that coverage functions are learnable agnostically
with excess -error over all product and symmetric
distributions in time . In contrast, we show that,
without assumptions on the distribution, learning coverage functions is at
least as hard as learning polynomial-size disjoint DNF formulas, a class of
functions for which the best known algorithm runs in time
(Klivans and Servedio, 2004).
As an application of our learning results, we give simple
differentially-private algorithms for releasing monotone conjunction counting
queries with low average error. In particular, for any , we obtain
private release of -way marginals with average error in time
Differentially Private Release and Learning of Threshold Functions
We prove new upper and lower bounds on the sample complexity of differentially private algorithms for releasing approximate answers to
threshold functions. A threshold function over a totally ordered domain
evaluates to if , and evaluates to otherwise. We
give the first nontrivial lower bound for releasing thresholds with
differential privacy, showing that the task is impossible
over an infinite domain , and moreover requires sample complexity , which grows with the size of the domain. Inspired by the
techniques used to prove this lower bound, we give an algorithm for releasing
thresholds with samples. This improves the
previous best upper bound of (Beimel et al., RANDOM
'13).
Our sample complexity upper and lower bounds also apply to the tasks of
learning distributions with respect to Kolmogorov distance and of properly PAC
learning thresholds with differential privacy. The lower bound gives the first
separation between the sample complexity of properly learning a concept class
with differential privacy and learning without privacy. For
properly learning thresholds in dimensions, this lower bound extends to
.
To obtain our results, we give reductions in both directions from releasing
and properly learning thresholds and the simpler interior point problem. Given
a database of elements from , the interior point problem asks for an
element between the smallest and largest elements in . We introduce new
recursive constructions for bounding the sample complexity of the interior
point problem, as well as further reductions and techniques for proving
impossibility results for other basic problems in differential privacy.Comment: 43 page
Privately Releasing Conjunctions and the Statistical Query Barrier
Suppose we would like to know all answers to a set of statistical queries C
on a data set up to small error, but we can only access the data itself using
statistical queries. A trivial solution is to exhaustively ask all queries in
C. Can we do any better?
+ We show that the number of statistical queries necessary and sufficient for
this task is---up to polynomial factors---equal to the agnostic learning
complexity of C in Kearns' statistical query (SQ) model. This gives a complete
answer to the question when running time is not a concern.
+ We then show that the problem can be solved efficiently (allowing arbitrary
error on a small fraction of queries) whenever the answers to C can be
described by a submodular function. This includes many natural concept classes,
such as graph cuts and Boolean disjunctions and conjunctions.
While interesting from a learning theoretic point of view, our main
applications are in privacy-preserving data analysis:
Here, our second result leads to the first algorithm that efficiently
releases differentially private answers to of all Boolean conjunctions with 1%
average error. This presents significant progress on a key open problem in
privacy-preserving data analysis.
Our first result on the other hand gives unconditional lower bounds on any
differentially private algorithm that admits a (potentially
non-privacy-preserving) implementation using only statistical queries. Not only
our algorithms, but also most known private algorithms can be implemented using
only statistical queries, and hence are constrained by these lower bounds. Our
result therefore isolates the complexity of agnostic learning in the SQ-model
as a new barrier in the design of differentially private algorithms
Pseudorandomness for Approximate Counting and Sampling
We study computational procedures that use both randomness and nondeterminism. The goal of this paper is to derandomize such procedures under the weakest possible assumptions.
Our main technical contribution allows one to “boost” a given hardness assumption: We show that if there is a problem in EXP that cannot be computed by poly-size nondeterministic circuits then there is one which cannot be computed by poly-size circuits that make non-adaptive NP oracle queries. This in particular shows that the various assumptions used over the last few years by several authors to derandomize Arthur-Merlin games (i.e., show AM = NP) are in fact all equivalent.
We also define two new primitives that we regard as the natural pseudorandom objects associated with approximate counting and sampling of NP-witnesses. We use the “boosting” theorem and hashing techniques to construct these primitives using an assumption that is no stronger than that used to derandomize AM.
We observe that Cai's proof that S_2^P ⊆ PP⊆(NP) and the learning algorithm of Bshouty et al. can be seen as reductions to sampling that are not probabilistic. As a consequence they can be derandomized under an assumption which is weaker than the assumption that was previously known to suffice
Understanding the Complexity of Lifted Inference and Asymmetric Weighted Model Counting
In this paper we study lifted inference for the Weighted First-Order Model
Counting problem (WFOMC), which counts the assignments that satisfy a given
sentence in first-order logic (FOL); it has applications in Statistical
Relational Learning (SRL) and Probabilistic Databases (PDB). We present several
results. First, we describe a lifted inference algorithm that generalizes prior
approaches in SRL and PDB. Second, we provide a novel dichotomy result for a
non-trivial fragment of FO CNF sentences, showing that for each sentence the
WFOMC problem is either in PTIME or #P-hard in the size of the input domain; we
prove that, in the first case our algorithm solves the WFOMC problem in PTIME,
and in the second case it fails. Third, we present several properties of the
algorithm. Finally, we discuss limitations of lifted inference for symmetric
probabilistic databases (where the weights of ground literals depend only on
the relation name, and not on the constants of the domain), and prove the
impossibility of a dichotomy result for the complexity of probabilistic
inference for the entire language FOL
Oracles Are Subtle But Not Malicious
Theoretical computer scientists have been debating the role of oracles since
the 1970's. This paper illustrates both that oracles can give us nontrivial
insights about the barrier problems in circuit complexity, and that they need
not prevent us from trying to solve those problems.
First, we give an oracle relative to which PP has linear-sized circuits, by
proving a new lower bound for perceptrons and low- degree threshold
polynomials. This oracle settles a longstanding open question, and generalizes
earlier results due to Beigel and to Buhrman, Fortnow, and Thierauf. More
importantly, it implies the first nonrelativizing separation of "traditional"
complexity classes, as opposed to interactive proof classes such as MIP and
MA-EXP. For Vinodchandran showed, by a nonrelativizing argument, that PP does
not have circuits of size n^k for any fixed k. We present an alternative proof
of this fact, which shows that PP does not even have quantum circuits of size
n^k with quantum advice. To our knowledge, this is the first nontrivial lower
bound on quantum circuit size.
Second, we study a beautiful algorithm of Bshouty et al. for learning Boolean
circuits in ZPP^NP. We show that the NP queries in this algorithm cannot be
parallelized by any relativizing technique, by giving an oracle relative to
which ZPP^||NP and even BPP^||NP have linear-size circuits. On the other hand,
we also show that the NP queries could be parallelized if P=NP. Thus, classes
such as ZPP^||NP inhabit a "twilight zone," where we need to distinguish
between relativizing and black-box techniques. Our results on this subject have
implications for computational learning theory as well as for the circuit
minimization problem.Comment: 20 pages, 1 figur
On the exact learnability of graph parameters: The case of partition functions
We study the exact learnability of real valued graph parameters which are
known to be representable as partition functions which count the number of
weighted homomorphisms into a graph with vertex weights and edge
weights . M. Freedman, L. Lov\'asz and A. Schrijver have given a
characterization of these graph parameters in terms of the -connection
matrices of . Our model of learnability is based on D. Angluin's
model of exact learning using membership and equivalence queries. Given such a
graph parameter , the learner can ask for the values of for graphs of
their choice, and they can formulate hypotheses in terms of the connection
matrices of . The teacher can accept the hypothesis as correct, or
provide a counterexample consisting of a graph. Our main result shows that in
this scenario, a very large class of partition functions, the rigid partition
functions, can be learned in time polynomial in the size of and the size of
the largest counterexample in the Blum-Shub-Smale model of computation over the
reals with unit cost.Comment: 14 pages, full version of the MFCS 2016 conference pape
Order-Revealing Encryption and the Hardness of Private Learning
An order-revealing encryption scheme gives a public procedure by which two
ciphertexts can be compared to reveal the ordering of their underlying
plaintexts. We show how to use order-revealing encryption to separate
computationally efficient PAC learning from efficient -differentially private PAC learning. That is, we construct a concept
class that is efficiently PAC learnable, but for which every efficient learner
fails to be differentially private. This answers a question of Kasiviswanathan
et al. (FOCS '08, SIAM J. Comput. '11).
To prove our result, we give a generic transformation from an order-revealing
encryption scheme into one with strongly correct comparison, which enables the
consistent comparison of ciphertexts that are not obtained as the valid
encryption of any message. We believe this construction may be of independent
interest.Comment: 28 page
- …