172 research outputs found
Teaching an RDBMS about ontological constraints
International audienceIn the presence of an ontology, query answers must reflect not only data explicitly present in the database, but also implicit data, which holds due to the ontology, even though it is not present in the database. A large and useful set of ontology languages enjoys FOL reducibility of query answering: answering a query can be reduced to evaluating a certain first-order logic (FOL) formula (obtained from the query and ontology) against only the explicit facts. We present a novel query optimization framework for ontology-based data access settings enjoying FOL reducibility. Our framework is based on searching within a set of alternative equivalent FOL queries, i.e., FOL reformulations, one with minimal evaluation cost when evaluated through a relational database system. We apply this framework to the DL-LiteR Description Logic underpinning the W3C's OWL2 QL ontology language, and demonstrate through experiments its performance benefits when two leading SQL systems, one open-source and one commercial, are used for evaluating the FOL query reformulations
Magic Sets for Disjunctive Datalog Programs
In this paper, a new technique for the optimization of (partially) bound
queries over disjunctive Datalog programs with stratified negation is
presented. The technique exploits the propagation of query bindings and extends
the Magic Set (MS) optimization technique.
An important feature of disjunctive Datalog is nonmonotonicity, which calls
for nondeterministic implementations, such as backtracking search. A
distinguishing characteristic of the new method is that the optimization can be
exploited also during the nondeterministic phase. In particular, after some
assumptions have been made during the computation, parts of the program may
become irrelevant to a query under these assumptions. This allows for dynamic
pruning of the search space. In contrast, the effect of the previously defined
MS methods for disjunctive Datalog is limited to the deterministic portion of
the process. In this way, the potential performance gain by using the proposed
method can be exponential, as could be observed empirically.
The correctness of MS is established thanks to a strong relationship between
MS and unfounded sets that has not been studied in the literature before. This
knowledge allows for extending the method also to programs with stratified
negation in a natural way.
The proposed method has been implemented in DLV and various experiments have
been conducted. Experimental results on synthetic data confirm the utility of
MS for disjunctive Datalog, and they highlight the computational gain that may
be obtained by the new method w.r.t. the previously proposed MS methods for
disjunctive Datalog programs. Further experiments on real-world data show the
benefits of MS within an application scenario that has received considerable
attention in recent years, the problem of answering user queries over possibly
inconsistent databases originating from integration of autonomous sources of
information.Comment: 67 pages, 19 figures, preprint submitted to Artificial Intelligenc
CREOLE: a Universal Language for Creating, Requesting, Updating and Deleting Resources
In the context of Service-Oriented Computing, applications can be developed
following the REST (Representation State Transfer) architectural style. This
style corresponds to a resource-oriented model, where resources are manipulated
via CRUD (Create, Request, Update, Delete) interfaces. The diversity of CRUD
languages due to the absence of a standard leads to composition problems
related to adaptation, integration and coordination of services. To overcome
these problems, we propose a pivot architecture built around a universal
language to manipulate resources, called CREOLE, a CRUD Language for Resource
Edition. In this architecture, scripts written in existing CRUD languages, like
SQL, are compiled into Creole and then executed over different CRUD interfaces.
After stating the requirements for a universal language for manipulating
resources, we formally describe the language and informally motivate its
definition with respect to the requirements. We then concretely show how the
architecture solves adaptation, integration and coordination problems in the
case of photo management in Flickr and Picasa, two well-known service-oriented
applications. Finally, we propose a roadmap for future work.Comment: In Proceedings FOCLASA 2010, arXiv:1007.499
Schema Independent Relational Learning
Learning novel concepts and relations from relational databases is an
important problem with many applications in database systems and machine
learning. Relational learning algorithms learn the definition of a new relation
in terms of existing relations in the database. Nevertheless, the same data set
may be represented under different schemas for various reasons, such as
efficiency, data quality, and usability. Unfortunately, the output of current
relational learning algorithms tends to vary quite substantially over the
choice of schema, both in terms of learning accuracy and efficiency. This
variation complicates their off-the-shelf application. In this paper, we
introduce and formalize the property of schema independence of relational
learning algorithms, and study both the theoretical and empirical dependence of
existing algorithms on the common class of (de) composition schema
transformations. We study both sample-based learning algorithms, which learn
from sets of labeled examples, and query-based algorithms, which learn by
asking queries to an oracle. We prove that current relational learning
algorithms are generally not schema independent. For query-based learning
algorithms we show that the (de) composition transformations influence their
query complexity. We propose Castor, a sample-based relational learning
algorithm that achieves schema independence by leveraging data dependencies. We
support the theoretical results with an empirical study that demonstrates the
schema dependence/independence of several algorithms on existing benchmark and
real-world datasets under (de) compositions
Querying Data Exchange Settings Beyond Positive Queries
Data exchange, the problem of transferring data from a source schema to a
target schema, has been studied for several years.
The semantics of answering positive queries over the target schema has been
defined in early work, but little attention has been paid to more general
queries. A few proposals of semantics for more general queries exist but they
either do not properly extend the standard semantics under positive queries,
giving rise to counterintuitive answers, or they make query answering
undecidable even for the most important data exchange settings, e.g., with
weakly-acyclic dependencies.
The goal of this paper is to provide a new semantics for data exchange that
is able to deal with general queries. At the same time, we want our semantics
to coincide with the classical one when focusing on positive queries, and to
not trade-off too much in terms of complexity of query answering. We show that
query answering is undecidable in general under the new semantics, but it is
\co\NP\complete when the dependencies are weakly-acyclic.
Moreover, in the latter case, we show that exact answers under our semantics
can be computed by means of logic programs with choice, thus exploiting
existing efficient systems. For more efficient computations, we also show that
our semantics allows for the construction of a representative target instance,
similar in spirit to a universal solution, that can be exploited for computing
approximate answers in polynomial time. Under consideration in Theory and
Practice of Logic Programming (TPLP).Comment: Under consideration in Theory and Practice of Logic Programming
(TPLP
Worst-case Optimal Query Answering for Greedy Sets of Existential Rules and Their Subclasses
The need for an ontological layer on top of data, associated with advanced
reasoning mechanisms able to exploit the semantics encoded in ontologies, has
been acknowledged both in the database and knowledge representation
communities. We focus in this paper on the ontological query answering problem,
which consists of querying data while taking ontological knowledge into
account. More specifically, we establish complexities of the conjunctive query
entailment problem for classes of existential rules (also called
tuple-generating dependencies, Datalog+/- rules, or forall-exists-rules. Our
contribution is twofold. First, we introduce the class of greedy
bounded-treewidth sets (gbts) of rules, which covers guarded rules, and their
most well-known generalizations. We provide a generic algorithm for query
entailment under gbts, which is worst-case optimal for combined complexity with
or without bounded predicate arity, as well as for data complexity and query
complexity. Secondly, we classify several gbts classes, whose complexity was
unknown, with respect to combined complexity (with both unbounded and bounded
predicate arity) and data complexity to obtain a comprehensive picture of the
complexity of existential rule fragments that are based on diverse guardedness
notions. Upper bounds are provided by showing that the proposed algorithm is
optimal for all of them
- âŠ