25,021 research outputs found
Privately Releasing Conjunctions and the Statistical Query Barrier
Suppose we would like to know all answers to a set of statistical queries C
on a data set up to small error, but we can only access the data itself using
statistical queries. A trivial solution is to exhaustively ask all queries in
C. Can we do any better?
+ We show that the number of statistical queries necessary and sufficient for
this task is---up to polynomial factors---equal to the agnostic learning
complexity of C in Kearns' statistical query (SQ) model. This gives a complete
answer to the question when running time is not a concern.
+ We then show that the problem can be solved efficiently (allowing arbitrary
error on a small fraction of queries) whenever the answers to C can be
described by a submodular function. This includes many natural concept classes,
such as graph cuts and Boolean disjunctions and conjunctions.
While interesting from a learning theoretic point of view, our main
applications are in privacy-preserving data analysis:
Here, our second result leads to the first algorithm that efficiently
releases differentially private answers to of all Boolean conjunctions with 1%
average error. This presents significant progress on a key open problem in
privacy-preserving data analysis.
Our first result on the other hand gives unconditional lower bounds on any
differentially private algorithm that admits a (potentially
non-privacy-preserving) implementation using only statistical queries. Not only
our algorithms, but also most known private algorithms can be implemented using
only statistical queries, and hence are constrained by these lower bounds. Our
result therefore isolates the complexity of agnostic learning in the SQ-model
as a new barrier in the design of differentially private algorithms
Information-based complexity, feedback and dynamics in convex programming
We study the intrinsic limitations of sequential convex optimization through
the lens of feedback information theory. In the oracle model of optimization,
an algorithm queries an {\em oracle} for noisy information about the unknown
objective function, and the goal is to (approximately) minimize every function
in a given class using as few queries as possible. We show that, in order for a
function to be optimized, the algorithm must be able to accumulate enough
information about the objective. This, in turn, puts limits on the speed of
optimization under specific assumptions on the oracle and the type of feedback.
Our techniques are akin to the ones used in statistical literature to obtain
minimax lower bounds on the risks of estimation procedures; the notable
difference is that, unlike in the case of i.i.d. data, a sequential
optimization algorithm can gather observations in a {\em controlled} manner, so
that the amount of information at each step is allowed to change in time. In
particular, we show that optimization algorithms often obey the law of
diminishing returns: the signal-to-noise ratio drops as the optimization
algorithm approaches the optimum. To underscore the generality of the tools, we
use our approach to derive fundamental lower bounds for a certain active
learning problem. Overall, the present work connects the intuitive notions of
information in optimization, experimental design, estimation, and active
learning to the quantitative notion of Shannon information.Comment: final version; to appear in IEEE Transactions on Information Theor
- …