1,557 research outputs found
Privately Releasing Conjunctions and the Statistical Query Barrier
Suppose we would like to know all answers to a set of statistical queries C
on a data set up to small error, but we can only access the data itself using
statistical queries. A trivial solution is to exhaustively ask all queries in
C. Can we do any better?
+ We show that the number of statistical queries necessary and sufficient for
this task is---up to polynomial factors---equal to the agnostic learning
complexity of C in Kearns' statistical query (SQ) model. This gives a complete
answer to the question when running time is not a concern.
+ We then show that the problem can be solved efficiently (allowing arbitrary
error on a small fraction of queries) whenever the answers to C can be
described by a submodular function. This includes many natural concept classes,
such as graph cuts and Boolean disjunctions and conjunctions.
While interesting from a learning theoretic point of view, our main
applications are in privacy-preserving data analysis:
Here, our second result leads to the first algorithm that efficiently
releases differentially private answers to of all Boolean conjunctions with 1%
average error. This presents significant progress on a key open problem in
privacy-preserving data analysis.
Our first result on the other hand gives unconditional lower bounds on any
differentially private algorithm that admits a (potentially
non-privacy-preserving) implementation using only statistical queries. Not only
our algorithms, but also most known private algorithms can be implemented using
only statistical queries, and hence are constrained by these lower bounds. Our
result therefore isolates the complexity of agnostic learning in the SQ-model
as a new barrier in the design of differentially private algorithms
On the Complexity of Mining Itemsets from the Crowd Using Taxonomies
We study the problem of frequent itemset mining in domains where data is not
recorded in a conventional database but only exists in human knowledge. We
provide examples of such scenarios, and present a crowdsourcing model for them.
The model uses the crowd as an oracle to find out whether an itemset is
frequent or not, and relies on a known taxonomy of the item domain to guide the
search for frequent itemsets. In the spirit of data mining with oracles, we
analyze the complexity of this problem in terms of (i) crowd complexity, that
measures the number of crowd questions required to identify the frequent
itemsets; and (ii) computational complexity, that measures the computational
effort required to choose the questions. We provide lower and upper complexity
bounds in terms of the size and structure of the input taxonomy, as well as the
size of a concise description of the output itemsets. We also provide
constructive algorithms that achieve the upper bounds, and consider more
efficient variants for practical situations.Comment: 18 pages, 2 figures. To be published to ICDT'13. Added missing
acknowledgemen
Modeling of Phenomena and Dynamic Logic of Phenomena
Modeling of complex phenomena such as the mind presents tremendous
computational complexity challenges. Modeling field theory (MFT) addresses
these challenges in a non-traditional way. The main idea behind MFT is to match
levels of uncertainty of the model (also, problem or theory) with levels of
uncertainty of the evaluation criterion used to identify that model. When a
model becomes more certain, then the evaluation criterion is adjusted
dynamically to match that change to the model. This process is called the
Dynamic Logic of Phenomena (DLP) for model construction and it mimics processes
of the mind and natural evolution. This paper provides a formal description of
DLP by specifying its syntax, semantics, and reasoning system. We also outline
links between DLP and other logical approaches. Computational complexity issues
that motivate this work are presented using an example of polynomial models
Testing Linear-Invariant Non-Linear Properties
We consider the task of testing properties of Boolean functions that are
invariant under linear transformations of the Boolean cube. Previous work in
property testing, including the linearity test and the test for Reed-Muller
codes, has mostly focused on such tasks for linear properties. The one
exception is a test due to Green for "triangle freeness": a function
f:\cube^{n}\to\cube satisfies this property if do not all
equal 1, for any pair x,y\in\cube^{n}.
Here we extend this test to a more systematic study of testing for
linear-invariant non-linear properties. We consider properties that are
described by a single forbidden pattern (and its linear transformations), i.e.,
a property is given by points v_{1},...,v_{k}\in\cube^{k} and
f:\cube^{n}\to\cube satisfies the property that if for all linear maps
L:\cube^{k}\to\cube^{n} it is the case that do
not all equal 1. We show that this property is testable if the underlying
matroid specified by is a graphic matroid. This extends
Green's result to an infinite class of new properties.
Our techniques extend those of Green and in particular we establish a link
between the notion of "1-complexity linear systems" of Green and Tao, and
graphic matroids, to derive the results.Comment: This is the full version; conference version appeared in the
proceedings of STACS 200
- …