65 research outputs found
Finite Open-World Query Answering with Number Restrictions (Extended Version)
Open-world query answering is the problem of deciding, given a set of facts,
conjunction of constraints, and query, whether the facts and constraints imply
the query. This amounts to reasoning over all instances that include the facts
and satisfy the constraints. We study finite open-world query answering (FQA),
which assumes that the underlying world is finite and thus only considers the
finite completions of the instance. The major known decidable cases of FQA
derive from the following: the guarded fragment of first-order logic, which can
express referential constraints (data in one place points to data in another)
but cannot express number restrictions such as functional dependencies; and the
guarded fragment with number restrictions but on a signature of arity only two.
In this paper, we give the first decidability results for FQA that combine both
referential constraints and number restrictions for arbitrary signatures: we
show that, for unary inclusion dependencies and functional dependencies, the
finiteness assumption of FQA can be lifted up to taking the finite implication
closure of the dependencies. Our result relies on new techniques to construct
finite universal models of such constraints, for any bound on the maximal query
size.
Comment: 59 pages. To appear in LICS 2015. Extended version including proof
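To illustrate why finiteness matters for unary inclusion dependencies (UIDs) and functional dependencies (FDs), here is a minimal sketch based on a textbook example that is not taken from the paper. On a binary relation R, the UID R[2] ⊆ R[1] and the FD "position 2 determines position 1" together force R[1] ⊆ R[2] on every finite instance. The infinite instance {(i, i+1) : i ≥ 0} satisfies both constraints but violates that conclusion. The function names and the brute-force strategy are assumptions made for illustration; checking a tiny domain is evidence, not a proof.

```python
from itertools import combinations, product


def satisfies_uid(rel):
    """UID R[2] <= R[1]: every value in position 2 also occurs in position 1."""
    first = {a for a, _ in rel}
    second = {b for _, b in rel}
    return second <= first


def satisfies_fd(rel):
    """FD 2 -> 1: the value in position 2 determines the value in position 1."""
    seen = {}
    for a, b in rel:
        if seen.setdefault(b, a) != a:
            return False
    return True


def reverse_uid_holds(rel):
    """Finitely implied UID R[1] <= R[2]."""
    first = {a for a, _ in rel}
    second = {b for _, b in rel}
    return first <= second


def check_finite_implication(domain_size=3):
    """Enumerate every binary relation over a small domain (hypothetical helper).

    Returns (ok, checked): ok is False if some instance satisfying the UID and
    the FD violates the reverse UID; checked counts the satisfying instances.
    """
    pairs = list(product(range(domain_size), repeat=2))
    checked = 0
    for k in range(len(pairs) + 1):
        for rel in combinations(pairs, k):
            if satisfies_uid(rel) and satisfies_fd(rel):
                checked += 1
                if not reverse_uid_holds(rel):
                    return False, checked
    return True, checked
```

The counting argument behind this is short: the FD gives |R[1]| ≤ |R[2]|, the UID gives |R[2]| ≤ |R[1]|, and on a finite instance equal sizes plus inclusion yield equality.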
When Can We Answer Queries Using Result-Bounded Data Interfaces?
We consider answering queries on data available through access methods that
provide lookup access to the tuples matching a given binding. Such interfaces
are common on the Web; further, they often have bounds on how many results they
can return, e.g., because of pagination or rate limits. We thus study
result-bounded methods, which may return only a limited number of tuples. We
study how to decide if a query is answerable using result-bounded methods,
i.e., how to compute a plan that returns all answers to the query using the
methods, assuming that the underlying data satisfies some integrity
constraints. We first show how to reduce answerability to a query containment
problem with constraints. Second, we show "schema simplification" theorems
describing when and how result-bounded services can be used. Finally, we use
these theorems to give decidability and complexity results about answerability
for common constraint classes.
Comment: 65 pages; journal version of the PODS'18 paper arXiv:1706.0793
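As a rough illustration of what a result-bounded access method looks like, the sketch below simulates one in Python. The class name, its parameters, and the sample data are hypothetical. The semantics is also an assumption: the method returns every matching tuple when there are at most `bound` of them, and otherwise an arbitrary subset of size `bound`. The sketch is not the paper's construction; it only shows why a bounded method can still answer an existence check while it cannot be relied on to list all matches.

```python
import random


class ResultBoundedMethod:
    """Simulated lookup interface with a result bound (illustrative only)."""

    def __init__(self, tuples, input_positions, bound, seed=0):
        self.tuples = list(tuples)
        self.input_positions = list(input_positions)
        self.bound = bound
        self.rng = random.Random(seed)  # models the arbitrary choice of results

    def access(self, binding):
        """Return at most `bound` tuples whose input positions match `binding`."""
        matches = [
            t for t in self.tuples
            if all(t[p] == v for p, v in zip(self.input_positions, binding))
        ]
        if len(matches) <= self.bound:
            return matches
        return self.rng.sample(matches, self.bound)


def exists_match(method, binding):
    """Existence check: correct whenever the bound is at least 1."""
    return len(method.access(binding)) > 0


# Hypothetical directory relation: (name, phone).
directory = [
    ("alice", "555-0100"),
    ("alice", "555-0101"),
    ("alice", "555-0102"),
    ("bob", "555-0199"),
]
lookup_by_name = ResultBoundedMethod(directory, input_positions=[0], bound=2)
```

Here `lookup_by_name.access(("alice",))` returns only two of the three matching tuples, so a plan that needs all phone numbers for a name cannot depend on this method alone, whereas `exists_match` gives the right answer for every binding.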
When Can We Answer Queries Using Result-Bounded Data Interfaces?
We consider answering queries where the underlying data is available only
over limited interfaces which provide lookup access to the tuples matching a
given binding, but possibly restricting the number of output tuples returned.
Interfaces imposing such "result bounds" are common in accessing data via the
web. Given a query over a set of relations as well as some integrity
constraints that relate the queried relations to the data sources, we examine
the problem of deciding if the query is answerable over the interfaces; that
is, whether there exists a plan that returns all answers to the query, assuming
the source data satisfies the integrity constraints.
The first component of our analysis of answerability is a reduction to a
query containment problem with constraints. The second component is a set of
"schema simplification" theorems capturing limitations on how interfaces with
result bounds can be useful to obtain complete answers to queries. These
results also help to show decidability for the containment problem that
captures answerability, for many classes of constraints. The final component in
our analysis of answerability is a "linearization" method, showing that query
containment with certain guarded dependencies -- including those that emerge
from answerability problems -- can be reduced to query containment for a
well-behaved class of linear dependencies. Putting these components together,
we get a detailed picture of how to check answerability over result-bounded
services.
Comment: 45 pages, 2 tables, 43 references. Complete version with proofs of
the PODS'18 paper. The main text of this paper is almost identical to the
PODS'18 version except that we have fixed some small mistakes. Relative to the
earlier arXiv version, many errors were corrected, and some terminology has
change
Uniform Reliability for Unbounded Homomorphism-Closed Graph Queries
We study the uniform query reliability problem, which asks, for a fixed
Boolean query Q, given an instance I, how many subinstances of I satisfy Q.
Equivalently, this is a restricted case of Boolean query evaluation on
tuple-independent probabilistic databases where all facts must have probability
1/2. We focus on graph signatures, and on queries closed under homomorphisms.
We show that for any such query that is unbounded, i.e., not equivalent to a
union of conjunctive queries, the uniform reliability problem is #P-hard. This
recaptures the hardness, e.g., of s-t connectedness, which counts how many
subgraphs of an input graph have a path between a source and a sink.
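The counting problem for s-t connectedness can be made concrete with a brute-force sketch. The code below enumerates every subset of the edges of a directed graph and counts those that contain a path from the source to the sink. The function names are hypothetical and the enumeration is exponential in the number of edges, which fits the #P-hardness stated above; it is meant for tiny examples only and is not an algorithm from the paper.

```python
from itertools import combinations


def has_path(edges, source, sink):
    """Depth-first search for a directed path from source to sink."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == sink:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def uniform_reliability(edges, source, sink):
    """Count the subinstances (edge subsets) in which sink is reachable from source."""
    edges = list(edges)
    count = 0
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            if has_path(subset, source, sink):
                count += 1
    return count


# Tiny example: a direct edge s -> t plus a two-step route through a.
example_edges = [("s", "a"), ("a", "t"), ("s", "t")]
satisfying = uniform_reliability(example_edges, "s", "t")
```

On this example, 5 of the 8 edge subsets connect s to t: the 4 subsets containing the direct edge, plus the one subset containing exactly the two-step route. Dividing by 2^3 gives probability 5/8 when every edge is kept independently with probability 1/2.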
This new hardness result on uniform reliability strengthens our earlier
hardness result on probabilistic query evaluation for unbounded
homomorphism-closed queries (ICDT'20). Indeed, our earlier proof crucially used
facts with probability 1, so it did not apply to the unweighted case. The new
proof presented in this paper avoids this; it uses our recent hardness result
on uniform reliability for non-hierarchical conjunctive queries without
self-joins (ICDT'21), along with new techniques.
Comment: 41 pages. Submitte
On the Complexity of Mining Itemsets from the Crowd Using Taxonomies
We study the problem of frequent itemset mining in domains where data is not
recorded in a conventional database but only exists in human knowledge. We
provide examples of such scenarios, and present a crowdsourcing model for them.
The model uses the crowd as an oracle to find out whether an itemset is
frequent or not, and relies on a known taxonomy of the item domain to guide the
search for frequent itemsets. In the spirit of data mining with oracles, we
analyze the complexity of this problem in terms of (i) crowd complexity, which
measures the number of crowd questions required to identify the frequent
itemsets; and (ii) computational complexity, which measures the computational
effort required to choose the questions. We provide lower and upper complexity
bounds in terms of the size and structure of the input taxonomy, as well as the
size of a concise description of the output itemsets. We also provide
constructive algorithms that achieve the upper bounds, and consider more
efficient variants for practical situations.
Comment: 18 pages, 2 figures. To be published to ICDT'13. Added missing
acknowledgemen
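The idea of using a taxonomy to save crowd questions can be sketched in a heavily simplified setting. The code below handles only single items rather than itemsets, treats the taxonomy as a tree, and assumes frequency is monotone along it: if a general item is infrequent, so are all of its specializations. The function name, the taxonomy, and the simulated oracle are hypothetical; this is a toy top-down search, not one of the algorithms from the paper.

```python
def mine_frequent_items(taxonomy, root, is_frequent):
    """Top-down search that asks the oracle only below frequent items.

    taxonomy: dict mapping an item to the list of its direct specializations.
    is_frequent: the crowd oracle, here any callable from item to bool.
    Returns the frequent items and the number of oracle questions asked.
    """
    frequent, questions = [], 0
    stack = [root]
    while stack:
        item = stack.pop()
        questions += 1  # one crowd question per item examined
        if is_frequent(item):
            frequent.append(item)
            # Only frequent items can have frequent specializations.
            stack.extend(taxonomy.get(item, []))
    return frequent, questions


# Hypothetical taxonomy with seven items.
sports_taxonomy = {
    "sport": ["ball game", "water sport"],
    "ball game": ["tennis", "football"],
    "water sport": ["swimming", "surfing"],
}
truly_frequent = {"sport", "ball game", "tennis"}  # simulated crowd knowledge
found, asked = mine_frequent_items(
    sports_taxonomy, "sport", lambda item: item in truly_frequent
)
```

In this toy run the search asks 5 questions instead of 7, because the two specializations of the infrequent item "water sport" are never examined. The number of questions asked is the crowd complexity in this simplified setting.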