Query Reformulation: Data Integration Approach to Multi Domain Query Answering System
Data integration provides the user with a unified view of heterogeneous data sources. The basic service offered by a data integration system is query processing. A query posed to the system is expressed over the global schema and must be reformulated into sub-queries over the local sources. Reformulation is accomplished through mappings between the global schema and the local sources, following the Global-as-View (GAV), Local-as-View (LAV), or Global-and-Local-as-View (GLAV) approach. When a query involves multiple domains, general-purpose search engines have difficulty extracting the relevant information.
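The GAV, LAV, and GLAV approaches differ in which side of the mapping is defined as a view over the other. As a minimal, entirely illustrative sketch of the GAV case, each global relation below is defined directly as a view over local sources, and a query over the global schema is answered by unfolding that definition; all relation names and data are invented:

```python
# Minimal sketch of Global-as-View (GAV) query reformulation: each global
# relation is defined as a view over local sources, so a query over the
# global schema is answered by "unfolding" the relation into the source
# query that defines it. All names and data here are illustrative.

local_flights = [("NYC", "LON"), ("LON", "PAR")]  # local source S1
local_trains = [("PAR", "BER")]                   # local source S2

gav_mapping = {
    # the global relation Route is defined as the union of both sources
    "Route": lambda: local_flights + local_trains,
}

def answer_global_query(relation):
    """Unfold a global relation into its source definition and evaluate it."""
    return gav_mapping[relation]()

print(answer_global_query("Route"))  # routes drawn from both sources
```

Unfolding makes GAV query answering straightforward; the price is that adding a new source requires revising the global view definitions, which is where LAV and GLAV trade flexibility differently.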
Query Containment for Highly Expressive Datalog Fragments
The containment problem of Datalog queries is well known to be undecidable.
There are, however, several Datalog fragments for which containment is known to
be decidable, most notably monadic Datalog and several "regular" query
languages on graphs. Monadically Defined Queries (MQs) have been introduced
recently as a joint generalization of these query languages. In this paper, we
study a wide range of Datalog fragments with decidable query containment and
determine exact complexity results for this problem. We generalize MQs to
(Frontier-)Guarded Queries (GQs), and show that the containment problem is
3ExpTime-complete in either case, even if we allow arbitrary Datalog in the
sub-query. If we focus on graph query languages, i.e., fragments of linear
Datalog, then this complexity is reduced to 2ExpSpace. We also consider nested
queries, which gain further expressivity by using predicates that are defined
by inner queries. We show that nesting leads to an exponentially increasing
hierarchy for the complexity of query containment, both in the linear and in
the general case. Our results settle open problems for (nested) MQs, and they
paint a comprehensive picture of the state of the art in Datalog query
containment.
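The decidable base case underlying these fragments is containment of plain conjunctive queries, which reduces to finding a homomorphism from the containing query into the canonical database of the contained one. A brute-force Python sketch (the query encoding and example queries are my own illustration, not from the paper):

```python
from itertools import product

# Containment for plain conjunctive queries: Q1 is contained in Q2 iff
# there is a homomorphism from Q2's atoms into Q1's atoms (Q1's variables
# are treated as frozen constants) that maps Q2's head onto Q1's head.
# A query is encoded as (head_variables, list_of_atoms).

def contained(q1, q2):
    """True iff every answer of q1 is an answer of q2 (q1 subseteq q2)."""
    head1, atoms1 = q1
    head2, atoms2 = q2
    vars2 = sorted({v for atom in atoms2 for v in atom[1:]})
    consts1 = {v for atom in atoms1 for v in atom[1:]}
    facts1 = set(atoms1)
    for assignment in product(consts1, repeat=len(vars2)):
        h = dict(zip(vars2, assignment))
        if [h[v] for v in head2] != list(head1):
            continue  # homomorphism must preserve the head
        if all((a[0],) + tuple(h[v] for v in a[1:]) in facts1 for a in atoms2):
            return True
    return False

# Q1(x) :- edge(x,y), edge(y,z)   (starts a path of length 2)
# Q2(x) :- edge(x,y)              (starts a path of length 1)
q1 = (("x",), [("edge", "x", "y"), ("edge", "y", "z")])
q2 = (("x",), [("edge", "x", "y")])
print(contained(q1, q2))  # True: every 2-step path begins with an edge
print(contained(q2, q1))  # False: one edge need not extend to two
```

The exhaustive search over variable assignments is exponential, matching the classical NP-completeness of conjunctive-query containment; the fragments studied in the paper push decidability far beyond this base case, at the cost of the much higher complexities stated above.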
A Monte Carlo model checker for probabilistic LTL with numerical constraints
We define the syntax and semantics of a new temporal logic called probabilistic LTL with numerical constraints (PLTLc).
We introduce an efficient model checker for PLTLc properties. Its efficiency comes from approximation
using Monte Carlo sampling of finite paths through the model’s state space (simulation outputs) and parallel model checking
of the paths. Our model checking method can be applied to any model producing quantitative output – continuous or
stochastic, including those with complex dynamics and those with an infinite state space. Furthermore, our offline approach
allows the analysis of observed (real-life) behaviour traces. We find in this paper that PLTLc properties with constraints
over free variables can replace full model checking experiments, resulting in a significant gain in efficiency. This overcomes
one disadvantage of model checking experiments, namely that their complexity depends on system granularity and the number of
variables, and quickly becomes infeasible. We focus on models of biochemical networks, and specifically in this paper on
intracellular signalling pathways; however our method can be applied to a wide range of biological as well as technical
systems and their models. Our work contributes to the emerging field of synthetic biology by proposing a rigorous approach
for the structured formal engineering of biological systems.
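The Monte Carlo idea can be sketched independently of the logic's full syntax: sample finite traces from a stochastic model and estimate the probability of a temporal property as the fraction of traces that satisfy it. A toy Python illustration, with a made-up random-walk model and the property "eventually x exceeds 20" standing in for a PLTLc formula:

```python
import random

# Hedged sketch of Monte Carlo model checking: estimate the probability of
# a temporal property from sampled finite paths. The model (a biased random
# walk) and the property F(x > 20) are invented for illustration; the real
# method checks PLTLc formulae against simulation outputs.

def simulate(steps=100, seed=None):
    """One finite path through the model's state space."""
    rng = random.Random(seed)
    x, trace = 10, [10]
    for _ in range(steps):
        x = max(0, x + (1 if rng.random() < 0.55 else -1))  # upward drift
        trace.append(x)
    return trace

def holds_eventually(trace, threshold=20):
    """Check the property F(x > threshold) on one finite path."""
    return any(v > threshold for v in trace)

def estimate_probability(n_paths=1000):
    """Fraction of sampled paths satisfying the property."""
    hits = sum(holds_eventually(simulate(seed=i)) for i in range(n_paths))
    return hits / n_paths

print(f"estimated P(F x > 20) = {estimate_probability():.2f}")
```

Because each path is checked independently, the per-path checks parallelise trivially, which is the source of the speed-up described above; accuracy is controlled by the number of sampled paths rather than by the size of the state space.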
The Dichotomy of Evaluating Homomorphism-Closed Queries on Probabilistic Graphs
We study the problem of probabilistic query evaluation on probabilistic
graphs, namely, tuple-independent probabilistic databases on signatures of
arity two. Our focus is the class of queries that is closed under
homomorphisms, or equivalently, the infinite unions of conjunctive queries. Our
main result states that all unbounded queries from this class are #P-hard for
probabilistic query evaluation. As bounded queries from this class are
equivalent to a union of conjunctive queries, they are already classified by
the dichotomy of Dalvi and Suciu (2012). Hence, our result and theirs imply a
complete data complexity dichotomy, between polynomial time and #P-hardness,
for evaluating infinite unions of conjunctive queries over probabilistic
graphs. This dichotomy covers in particular all fragments of infinite unions of
conjunctive queries such as negation-free (disjunctive) Datalog, regular path
queries, and a large class of ontology-mediated queries on arity-two
signatures. Our result is shown by reducing from counting the valuations of
positive partitioned 2-DNF formulae for some queries, or from the
source-to-target reliability problem in an undirected graph for other queries,
depending on properties of minimal models. The presented dichotomy result
applies to even a special case of probabilistic query evaluation called
generalized model counting, where fact probabilities must be 0, 0.5, or 1.
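The source-to-target reliability problem used in the reduction can be stated very concretely: each edge of a graph exists independently with some probability, and we ask for the probability that a target is reachable from a source. A brute-force Python sketch that enumerates all 2^n possible worlds (the graph and probabilities are invented; all probabilities are 0.5, as in the generalized model counting setting, and the exponential enumeration hints at why the general problem is #P-hard):

```python
from itertools import product

# Exact probabilistic query evaluation on a tiny tuple-independent
# probabilistic graph: sum the probabilities of all "worlds" (edge subsets)
# in which the query holds. The query here is s-t reachability, i.e. the
# source-to-target reliability problem. Toy data, illustrative only.

edges = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.5}

def reachable(present, src="s", dst="t"):
    """Directed reachability over the edges present in one world."""
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        for (a, b) in present:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return dst in seen

def query_probability():
    """Sum the weights of all worlds satisfying the query: 2^n worlds."""
    items = list(edges.items())
    total = 0.0
    for bits in product([0, 1], repeat=len(items)):
        present = [e for (e, _), keep in zip(items, bits) if keep]
        weight = 1.0
        for (_, p), keep in zip(items, bits):
            weight *= p if keep else 1 - p
        if reachable(present):
            total += weight
    return total

print(query_probability())  # 0.5 + 0.5 * 0.25 = 0.625
```

Here the direct edge contributes 0.5 and the two-hop path contributes another 0.5 * 0.25, giving 0.625; the dichotomy result says that for unbounded homomorphism-closed queries no evaluation strategy escapes this exponential behaviour unless #P collapses.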
Federated Query Processing for the Semantic Web
Recent years have witnessed constant growth in the amount of RDF data available on the Web. This growth is largely driven by the increasing rate of data publication on the Web by different actors such as governments, life science researchers and geographical institutes. RDF data generation is mainly done by converting existing legacy data resources into RDF (e.g. converting data stored in relational databases into RDF), but also by creating RDF data directly (e.g. from sensors). These RDF data are normally exposed by means of Linked Data-enabled URIs and SPARQL endpoints. Given the sustained growth in the number of available SPARQL endpoints, the need to send federated SPARQL queries across them has also grown. Tools for accessing sets of RDF data repositories are starting to appear, differing in the way in which they allow users to access these data (either allowing users to specify directly which RDF data sets they want to query, or making this process transparent to them). To overcome this heterogeneity in federated query processing solutions, the W3C SPARQL working group is defining a federation extension for SPARQL 1.1, which allows combining, in a single query, graph patterns that can be evaluated at several endpoints. In this PhD thesis, we describe the syntax of that SPARQL extension for providing access to distributed RDF data sets and formalise its semantics. We adapt existing techniques for distributed data access in relational databases in order to deal with SPARQL endpoints, and we implement them in our federated query evaluation system (SPARQL-DQP). We describe the static optimisation techniques implemented in our system and carry out a series of experiments showing that our optimisations significantly speed up query evaluation in the presence of large query results and optional operators.
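The core of federated evaluation (executing each graph pattern at the endpoint that holds the relevant data, then joining the resulting variable bindings) can be sketched without a real SPARQL engine. A toy Python illustration with two in-memory "endpoints" and invented data, mirroring a SPARQL 1.1 query with two SERVICE clauses:

```python
# Toy sketch of federated graph-pattern evaluation in the spirit of
# SPARQL 1.1 SERVICE: each "endpoint" holds its own triples, one triple
# pattern is evaluated at each endpoint, and the binding sets are joined
# on their shared variable. Endpoint contents and the query are invented.

endpoint_people = {  # endpoint 1: who works where
    ("alice", "worksAt", "acme"),
    ("bob", "worksAt", "globex"),
}
endpoint_places = {  # endpoint 2: where organisations are based
    ("acme", "basedIn", "berlin"),
    ("globex", "basedIn", "paris"),
}

def match(endpoint, pattern):
    """Bindings for one triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in endpoint:
        binding = {}
        if all(term == value or (term.startswith("?") and
                                 binding.setdefault(term, value) == value)
               for term, value in zip(pattern, triple)):
            results.append(binding)
    return results

def join(left, right):
    """Join two binding lists on their shared variables."""
    return [{**l, **r} for l in left for r in right
            if all(l[k] == r[k] for k in l.keys() & r.keys())]

# SELECT ?person ?city WHERE {
#   SERVICE ep1 { ?person worksAt ?org }
#   SERVICE ep2 { ?org basedIn ?city } }
answers = join(match(endpoint_people, ("?person", "worksAt", "?org")),
               match(endpoint_places, ("?org", "basedIn", "?city")))
print(sorted((a["?person"], a["?city"]) for a in answers))
# [('alice', 'berlin'), ('bob', 'paris')]
```

In a real federation engine the nested-loop join above is exactly what the static optimisations target, e.g. by reordering patterns or shipping bindings to the remote endpoint instead of fetching full pattern results.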