65 research outputs found

    Finite Open-World Query Answering with Number Restrictions (Extended Version)

    Full text link
    Open-world query answering is the problem of deciding, given a set of facts, conjunction of constraints, and query, whether the facts and constraints imply the query. This amounts to reasoning over all instances that include the facts and satisfy the constraints. We study finite open-world query answering (FQA), which assumes that the underlying world is finite and thus only considers the finite completions of the instance. The major known decidable cases of FQA derive from the following: the guarded fragment of first-order logic, which can express referential constraints (data in one place points to data in another) but cannot express number restrictions such as functional dependencies; and the guarded fragment with number restrictions but on a signature of arity only two. In this paper, we give the first decidability results for FQA that combine both referential constraints and number restrictions for arbitrary signatures: we show that, for unary inclusion dependencies and functional dependencies, the finiteness assumption of FQA can be lifted up to taking the finite implication closure of the dependencies. Our result relies on new techniques to construct finite universal models of such constraints, for any bound on the maximal query size.Comment: 59 pages. To appear in LICS 2015. Extended version including proof

    When Can We Answer Queries Using Result-Bounded Data Interfaces?

    Full text link
    We consider answering queries on data available through access methods, that provide lookup access to the tuples matching a given binding. Such interfaces are common on the Web; further, they often have bounds on how many results they can return, e.g., because of pagination or rate limits. We thus study result-bounded methods, which may return only a limited number of tuples. We study how to decide if a query is answerable using result-bounded methods, i.e., how to compute a plan that returns all answers to the query using the methods, assuming that the underlying data satisfies some integrity constraints. We first show how to reduce answerability to a query containment problem with constraints. Second, we show "schema simplification" theorems describing when and how result bounded services can be used. Finally, we use these theorems to give decidability and complexity results about answerability for common constraint classes.Comment: 65 pages; journal version of the PODS'18 paper arXiv:1706.0793

    When Can We Answer Queries Using Result-Bounded Data Interfaces?

    Full text link
    We consider answering queries where the underlying data is available only over limited interfaces which provide lookup access to the tuples matching a given binding, but possibly restricting the number of output tuples returned. Interfaces imposing such "result bounds" are common in accessing data via the web. Given a query over a set of relations as well as some integrity constraints that relate the queried relations to the data sources, we examine the problem of deciding if the query is answerable over the interfaces; that is, whether there exists a plan that returns all answers to the query, assuming the source data satisfies the integrity constraints. The first component of our analysis of answerability is a reduction to a query containment problem with constraints. The second component is a set of "schema simplification" theorems capturing limitations on how interfaces with result bounds can be useful to obtain complete answers to queries. These results also help to show decidability for the containment problem that captures answerability, for many classes of constraints. The final component in our analysis of answerability is a "linearization" method, showing that query containment with certain guarded dependencies -- including those that emerge from answerability problems -- can be reduced to query containment for a well-behaved class of linear dependencies. Putting these components together, we get a detailed picture of how to check answerability over result-bounded services.Comment: 45 pages, 2 tables, 43 references. Complete version with proofs of the PODS'18 paper. The main text of this paper is almost identical to the PODS'18 except that we have fixed some small mistakes. Relative to the earlier arXiv version, many errors were corrected, and some terminology has change

    Uniform Reliability for Unbounded Homomorphism-Closed Graph Queries

    Get PDF
    We study the uniform query reliability problem, which asks, for a fixed Boolean query Q, given an instance I, how many subinstances of I satisfy Q. Equivalently, this is a restricted case of Boolean query evaluation on tuple-independent probabilistic databases where all facts must have probability 1/2. We focus on graph signatures, and on queries closed under homomorphisms. We show that for any such query that is unbounded, i.e., not equivalent to a union of conjunctive queries, the uniform reliability problem is #P-hard. This recaptures the hardness, e.g., of s-t connectedness, which counts how many subgraphs of an input graph have a path between a source and a sink. This new hardness result on uniform reliability strengthens our earlier hardness result on probabilistic query evaluation for unbounded homomorphism-closed queries (ICDT'20). Indeed, our earlier proof crucially used facts with probability 1, so it did not apply to the unweighted case. The new proof presented in this paper avoids this; it uses our recent hardness result on uniform reliability for non-hierarchical conjunctive queries without self-joins (ICDT'21), along with new techniques.Comment: 41 pages. Submitte

    Enumerating Regular Languages with Bounded Delay

    Get PDF

    On the Complexity of Mining Itemsets from the Crowd Using Taxonomies

    Full text link
    We study the problem of frequent itemset mining in domains where data is not recorded in a conventional database but only exists in human knowledge. We provide examples of such scenarios, and present a crowdsourcing model for them. The model uses the crowd as an oracle to find out whether an itemset is frequent or not, and relies on a known taxonomy of the item domain to guide the search for frequent itemsets. In the spirit of data mining with oracles, we analyze the complexity of this problem in terms of (i) crowd complexity, that measures the number of crowd questions required to identify the frequent itemsets; and (ii) computational complexity, that measures the computational effort required to choose the questions. We provide lower and upper complexity bounds in terms of the size and structure of the input taxonomy, as well as the size of a concise description of the output itemsets. We also provide constructive algorithms that achieve the upper bounds, and consider more efficient variants for practical situations.Comment: 18 pages, 2 figures. To be published to ICDT'13. Added missing acknowledgemen
    • …
    corecore