1,557 research outputs found

    Privately Releasing Conjunctions and the Statistical Query Barrier

    Full text link
    Suppose we would like to know all answers to a set of statistical queries C on a data set up to small error, but we can only access the data itself using statistical queries. A trivial solution is to exhaustively ask all queries in C. Can we do any better? + We show that the number of statistical queries necessary and sufficient for this task is---up to polynomial factors---equal to the agnostic learning complexity of C in Kearns' statistical query (SQ) model. This gives a complete answer to the question when running time is not a concern. + We then show that the problem can be solved efficiently (allowing arbitrary error on a small fraction of queries) whenever the answers to C can be described by a submodular function. This includes many natural concept classes, such as graph cuts and Boolean disjunctions and conjunctions. While interesting from a learning theoretic point of view, our main applications are in privacy-preserving data analysis: Here, our second result leads to the first algorithm that efficiently releases differentially private answers to of all Boolean conjunctions with 1% average error. This presents significant progress on a key open problem in privacy-preserving data analysis. Our first result on the other hand gives unconditional lower bounds on any differentially private algorithm that admits a (potentially non-privacy-preserving) implementation using only statistical queries. Not only our algorithms, but also most known private algorithms can be implemented using only statistical queries, and hence are constrained by these lower bounds. Our result therefore isolates the complexity of agnostic learning in the SQ-model as a new barrier in the design of differentially private algorithms

    On the Complexity of Mining Itemsets from the Crowd Using Taxonomies

    Full text link
    We study the problem of frequent itemset mining in domains where data is not recorded in a conventional database but only exists in human knowledge. We provide examples of such scenarios, and present a crowdsourcing model for them. The model uses the crowd as an oracle to find out whether an itemset is frequent or not, and relies on a known taxonomy of the item domain to guide the search for frequent itemsets. In the spirit of data mining with oracles, we analyze the complexity of this problem in terms of (i) crowd complexity, that measures the number of crowd questions required to identify the frequent itemsets; and (ii) computational complexity, that measures the computational effort required to choose the questions. We provide lower and upper complexity bounds in terms of the size and structure of the input taxonomy, as well as the size of a concise description of the output itemsets. We also provide constructive algorithms that achieve the upper bounds, and consider more efficient variants for practical situations.Comment: 18 pages, 2 figures. To be published to ICDT'13. Added missing acknowledgemen

    Modeling of Phenomena and Dynamic Logic of Phenomena

    Get PDF
    Modeling of complex phenomena such as the mind presents tremendous computational complexity challenges. Modeling field theory (MFT) addresses these challenges in a non-traditional way. The main idea behind MFT is to match levels of uncertainty of the model (also, problem or theory) with levels of uncertainty of the evaluation criterion used to identify that model. When a model becomes more certain, then the evaluation criterion is adjusted dynamically to match that change to the model. This process is called the Dynamic Logic of Phenomena (DLP) for model construction and it mimics processes of the mind and natural evolution. This paper provides a formal description of DLP by specifying its syntax, semantics, and reasoning system. We also outline links between DLP and other logical approaches. Computational complexity issues that motivate this work are presented using an example of polynomial models

    Testing Linear-Invariant Non-Linear Properties

    Get PDF
    We consider the task of testing properties of Boolean functions that are invariant under linear transformations of the Boolean cube. Previous work in property testing, including the linearity test and the test for Reed-Muller codes, has mostly focused on such tasks for linear properties. The one exception is a test due to Green for "triangle freeness": a function f:\cube^{n}\to\cube satisfies this property if f(x),f(y),f(x+y)f(x),f(y),f(x+y) do not all equal 1, for any pair x,y\in\cube^{n}. Here we extend this test to a more systematic study of testing for linear-invariant non-linear properties. We consider properties that are described by a single forbidden pattern (and its linear transformations), i.e., a property is given by kk points v_{1},...,v_{k}\in\cube^{k} and f:\cube^{n}\to\cube satisfies the property that if for all linear maps L:\cube^{k}\to\cube^{n} it is the case that f(L(v1)),...,f(L(vk))f(L(v_{1})),...,f(L(v_{k})) do not all equal 1. We show that this property is testable if the underlying matroid specified by v1,...,vkv_{1},...,v_{k} is a graphic matroid. This extends Green's result to an infinite class of new properties. Our techniques extend those of Green and in particular we establish a link between the notion of "1-complexity linear systems" of Green and Tao, and graphic matroids, to derive the results.Comment: This is the full version; conference version appeared in the proceedings of STACS 200
    • …
    corecore