Privately Releasing Conjunctions and the Statistical Query Barrier
Suppose we would like to know all answers to a set of statistical queries C
on a data set up to small error, but we can only access the data itself using
statistical queries. A trivial solution is to exhaustively ask all queries in
C. Can we do any better?
+ We show that the number of statistical queries necessary and sufficient for
this task is---up to polynomial factors---equal to the agnostic learning
complexity of C in Kearns' statistical query (SQ) model. This gives a complete
answer to the question when running time is not a concern.
+ We then show that the problem can be solved efficiently (allowing arbitrary
error on a small fraction of queries) whenever the answers to C can be
described by a submodular function. This includes many natural concept classes,
such as graph cuts and Boolean disjunctions and conjunctions.
While interesting from a learning theoretic point of view, our main
applications are in privacy-preserving data analysis:
Here, our second result leads to the first algorithm that efficiently
releases differentially private answers to all Boolean conjunctions with 1%
average error. This presents significant progress on a key open problem in
privacy-preserving data analysis.
Our first result on the other hand gives unconditional lower bounds on any
differentially private algorithm that admits a (potentially
non-privacy-preserving) implementation using only statistical queries. Not only
our algorithms, but also most known private algorithms can be implemented using
only statistical queries, and hence are constrained by these lower bounds. Our
result therefore isolates the complexity of agnostic learning in the SQ-model
as a new barrier in the design of differentially private algorithms.
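As a rough illustration of what a single statistical (counting) query for a Boolean conjunction looks like, the sketch below computes the fraction of records that satisfy every literal. The names `conjunction_query`, `Row` and `Literal` and the toy data set are assumptions made for this example; they do not come from the paper, and the sketch adds no noise and provides no privacy guarantee.

```python
from typing import Sequence, Tuple

Row = Tuple[int, ...]      # one record: a tuple of 0/1 attributes (hypothetical layout)
Literal = Tuple[int, int]  # (attribute index, required value 0 or 1)


def conjunction_query(data: Sequence[Row], literals: Sequence[Literal]) -> float:
    """Return the fraction of rows on which every literal of the conjunction holds."""
    if not data:
        raise ValueError("data set must be non-empty")
    hits = sum(all(row[i] == v for i, v in literals) for row in data)
    return hits / len(data)


# Toy data set with three Boolean attributes (illustrative only).
data = [(1, 0, 1), (1, 1, 1), (0, 1, 1), (1, 1, 0)]

# Conjunction "x0 AND x2": satisfied by the first two rows only.
answer = conjunction_query(data, [(0, 1), (2, 1)])
```

Asking every such query exhaustively is the trivial solution the abstract mentions; the number of conjunctions grows exponentially with the number of attributes, which is why fewer queries are desirable.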
MCMC Learning
The theory of learning under the uniform distribution is rich and deep, with
connections to cryptography, computational complexity, and the analysis of
boolean functions to name a few areas. This theory however is very limited due
to the fact that the uniform distribution and the corresponding Fourier basis
are rarely encountered as a statistical model.
A family of distributions that vastly generalizes the uniform distribution on
the Boolean cube is that of distributions represented by Markov Random Fields
(MRF). Markov Random Fields are one of the main tools for modeling high
dimensional data in many areas of statistics and machine learning.
In this paper we initiate the investigation of extending central ideas,
methods and algorithms from the theory of learning under the uniform
distribution to the setup of learning concepts given examples from MRF
distributions. In particular, our results establish a novel connection between
properties of MCMC sampling of MRFs and learning under the MRF distribution.
Comment: 28 pages, 1 figure
Simplest random K-satisfiability problem
We study a simple and exactly solvable model for the generation of random
satisfiability problems. These consist of random boolean constraints
which are to be satisfied simultaneously by logical variables. In
statistical-mechanics language, the considered model can be seen as a diluted
p-spin model at zero temperature. While such problems become extraordinarily
hard to solve by local search methods in a large region of the parameter space,
still at least one solution may be superimposed by construction. The
statistical properties of the model can be studied exactly by the replica
method and each single instance can be analyzed in polynomial time by a simple
global solution method. The geometrical/topological structures responsible for
dynamic and static phase transitions as well as for the onset of computational
complexity in local search methods are thoroughly analyzed. Numerical analysis
on very large samples allows for a precise characterization of the critical
scaling behaviour.
Comment: 14 pages, 5 figures, to appear in Phys. Rev. E (Feb 2001). v2: minor
errors and references corrected
Combinatorial Control through Allostery
Many instances of cellular signaling and transcriptional regulation involve
switch-like molecular responses to the presence or absence of input ligands. To
understand how these responses come about and how they can be harnessed, we
develop a statistical mechanical model to characterize the types of Boolean
logic that can arise from allosteric molecules following the
Monod-Wyman-Changeux (MWC) model. Building upon previous work, we show how an
allosteric molecule regulated by two inputs can elicit AND, OR, NAND and NOR
responses, but is unable to realize XOR or XNOR gates. Next, we demonstrate the
ability of an MWC molecule to perform ratiometric sensing - a response behavior
where activity depends monotonically on the ratio of ligand concentrations. We
then extend our analysis to more general schemes of combinatorial control
involving either additional binding sites for the two ligands or an additional
third ligand and show how these additions can cause a switch in the logic
behavior of the molecule. Overall, our results demonstrate the wide variety of
control schemes that biological systems can implement using simple mechanisms.
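A minimal sketch of how an AND-like response can arise, assuming the textbook two-state MWC form with one binding site per ligand. The function name `mwc_active_probability` and all parameter values are illustrative assumptions, not taken from the paper; the paper's own parameterization may differ.

```python
def mwc_active_probability(
    c1: float,
    c2: float,
    *,
    k_active: float = 1.0,                # assumed dissociation constant, active state
    k_inactive: float = 100.0,            # assumed dissociation constant, inactive state
    allosteric_constant: float = 1000.0,  # assumed inactive/active weight with no ligand
) -> float:
    """Probability that a two-ligand MWC molecule is active (textbook two-state form)."""
    active_weight = (1 + c1 / k_active) * (1 + c2 / k_active)
    inactive_weight = allosteric_constant * (1 + c1 / k_inactive) * (1 + c2 / k_inactive)
    return active_weight / (active_weight + inactive_weight)


HIGH = 1e4  # "ligand present" concentration, in the same units as the constants

# Tabulate the response for the four input combinations (0 = absent, 1 = present).
responses = {
    (a, b): mwc_active_probability(a * HIGH, b * HIGH)
    for a in (0, 1)
    for b in (0, 1)
}
```

With these assumed values the molecule is mostly inactive unless both ligands are present, which is the AND-like behavior. Changing the constants shifts the response toward the other gates the abstract lists.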
On the Complexity and Approximation of Binary Evidence in Lifted Inference
Lifted inference algorithms exploit symmetries in probabilistic models to
speed up inference. They show impressive performance when calculating
unconditional probabilities in relational models, but often resort to
non-lifted inference when computing conditional probabilities. The reason is
that conditioning on evidence breaks many of the model's symmetries, which can
preempt standard lifting techniques. Recent theoretical results show, for
example, that conditioning on evidence which corresponds to binary relations is
#P-hard, suggesting that no lifting is to be expected in the worst case. In
this paper, we balance this negative result by identifying the Boolean rank of
the evidence as a key parameter for characterizing the complexity of
conditioning in lifted inference. In particular, we show that conditioning on
binary evidence with bounded Boolean rank is efficient. This opens up the
possibility of approximating evidence by a low-rank Boolean matrix
factorization, which we investigate both theoretically and empirically.
Comment: To appear in Advances in Neural Information Processing Systems 26
(NIPS), Lake Tahoe, USA, December 201
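To make the notion of Boolean rank concrete, the sketch below multiplies two 0/1 matrices under Boolean arithmetic (OR of ANDs). The function name `boolean_product` and the example matrices are assumptions chosen for illustration; this is not the paper's algorithm, only the matrix product that a low-rank Boolean factorization relies on.

```python
from typing import List

Matrix = List[List[int]]


def boolean_product(a: Matrix, b: Matrix) -> Matrix:
    """Boolean matrix product: entry (i, j) is OR over k of (a[i][k] AND b[k][j])."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0]) if b else 0
    return [
        [int(any(a[i][k] and b[k][j] for k in range(inner))) for j in range(cols)]
        for i in range(len(a))
    ]


# A 3x3 binary evidence matrix written as the product of a 3x2 and a 2x3 factor,
# so its Boolean rank is at most 2 (illustrative data only).
left = [[1, 0], [1, 1], [0, 1]]
right = [[1, 1, 0], [0, 0, 1]]
evidence = boolean_product(left, right)
```

The middle row of `evidence` is the OR of both rows of `right`. Under ordinary integer arithmetic overlapping 1s would add up instead of saturating, which is why Boolean rank can differ from the usual matrix rank.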