On the Complexity of Enumerating the Answers to Well-designed Pattern Trees
Well-designed pattern trees (wdPTs) have been introduced as an extension of conjunctive queries to allow for partial matching, analogously to the OPTIONAL operator of the semantic web query language SPARQL. Several computational problems of wdPTs have been studied in recent years, such as the evaluation problem in various settings, the counting problem, as well as static analysis tasks including the containment and equivalence problems. Restrictions needed to achieve tractability of these tasks have also been proposed. In contrast, the problem of enumerating the answers to a wdPT has been largely ignored so far. In this work, we embark on a systematic study of the complexity of the enumeration problem of wdPTs. As our main result, we identify several tractable and intractable cases of this problem both from a classical complexity point of view and from a parameterized complexity point of view.
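To make the "partial matching" idea concrete, the following is a minimal sketch of OPTIONAL-style evaluation as a left outer join over variable mappings. It illustrates the standard SPARQL OPTIONAL semantics only, not the enumeration algorithms studied in the paper; the function names and the sample data are assumptions made for illustration.

```python
def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2[var] == val for var, val in m1.items() if var in m2)


def left_outer_join(left, right):
    """Extend each left mapping with every compatible right mapping.

    A left mapping with no compatible partner is kept unextended,
    which is the partial-matching behaviour of OPTIONAL.
    """
    result = []
    for m1 in left:
        extensions = [{**m1, **m2} for m2 in right if compatible(m1, m2)]
        result.extend(extensions if extensions else [m1])
    return result


# Hypothetical data: every person is returned, with an e-mail when one exists.
people = [{"?p": "alice"}, {"?p": "bob"}]
emails = [{"?p": "alice", "?e": "alice@example.org"}]
answers = left_outer_join(people, emails)
```

Here `bob` stays in the answer set without a binding for `?e`, whereas a plain conjunctive join would drop him.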
Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers
We aim to provide table answers to keyword queries against knowledge bases.
For queries referring to multiple entities, like "Washington cities population"
and "Mel Gibson movies", it is better to represent each relevant answer as a
table which aggregates a set of entities or entity-joins within the same table
scheme or pattern. In this paper, we study how to find highly relevant patterns
in a knowledge base for user-given keyword queries to compose table answers. A
knowledge base can be modeled as a directed graph called knowledge graph, where
nodes represent entities in the knowledge base and edges represent the
relationships among them. Each node/edge is labeled with type and text. A
pattern is an aggregation of subtrees which contain all keywords in the texts
and have the same structure and types on node/edges. We propose efficient
algorithms to find patterns that are relevant to the query for a class of
scoring functions. We show the hardness of the problem in theory, and propose
path-based indexes that are affordable in memory. Two query-processing
algorithms are proposed: one is fast in practice for small queries (with small
patterns as answers) by utilizing the indexes; and the other one is better in
theory, with running time linear in the sizes of indexes and answers, which can
handle large queries better. We also conduct an extensive experimental study to
compare our approaches with a naive adaptation of known techniques. Comment: VLDB 201
The tractability frontier of well-designed SPARQL queries
We study the complexity of query evaluation of SPARQL queries. We focus on
the fundamental fragment of well-designed SPARQL restricted to the AND,
OPTIONAL and UNION operators. Our main result is a structural characterisation
of the classes of well-designed queries that can be evaluated in polynomial
time. In particular, we introduce a new notion of width called domination
width, which relies on the well-known notion of treewidth. We show that, under
some complexity theoretic assumptions, the classes of well-designed queries
that can be evaluated in polynomial time are precisely those of bounded
domination width.
Adding Logical Operators to Tree Pattern Queries on Graph-Structured Data
As data are increasingly modeled as graphs for expressing complex
relationships, the tree pattern query on graph-structured data becomes an
important type of query in real-world applications. Most practical query
languages, such as XQuery and SPARQL, support logical expressions using
logical-AND/OR/NOT operators to define structural constraints of tree patterns.
In this paper, (1) we propose generalized tree pattern queries (GTPQs) over
graph-structured data, which fully support propositional logic of structural
constraints. (2) We make a thorough study of fundamental problems including
satisfiability, containment and minimization, and analyze the computational
complexity and the decision procedures of these problems. (3) We propose a
compact graph representation of intermediate results and a pruning approach to
reduce the size of intermediate results and the number of join operations --
two factors that often impair the efficiency of traditional algorithms for
evaluating tree pattern queries. (4) We present an efficient algorithm for
evaluating GTPQs using 3-hop as the underlying reachability index. (5)
Experiments on both real-life and synthetic data sets demonstrate the
effectiveness and efficiency of our algorithm, which is from several times to
orders of magnitude faster than state-of-the-art algorithms in terms of evaluation time,
even for traditional tree pattern queries with only conjunctive operations. Comment: 16 page
An Analytical Study of Large SPARQL Query Logs
With the adoption of RDF as the data model for Linked Data and the Semantic
Web, query specification from end-users has become more and more common in
SPARQL endpoints. In this paper, we conduct an in-depth analytical study of
the queries formulated by end-users and harvested from large and up-to-date
query logs from a wide variety of RDF data sources. As opposed to previous
studies, ours is the first assessment on a voluminous query corpus, spanning
over several years and covering many representative SPARQL endpoints. Apart
from the syntactical structure of the queries, which already exhibits
interesting results on this generalized corpus, we drill deeper into the
structural characteristics related to the graph and hypergraph
representation of queries. We outline the most common shapes of queries when visually
displayed as pseudographs, and characterize their (hyper-)tree width.
Moreover, we analyze the evolution of queries over time, by introducing the
novel concept of a streak, i.e., a sequence of queries that appear as
subsequent modifications of a seed query. Our study offers several fresh
insights on the already rich query features of real SPARQL queries formulated
by real users, and brings us to draw a number of conclusions and pinpoint
future directions for SPARQL query evaluation, query optimization, tuning,
and benchmarking.
Performance and scalability of indexed subgraph query processing methods
Graph data management systems have become very popular
as graphs are the natural data model for many applications.
One of the main problems addressed by these systems is subgraph
query processing; i.e., given a query graph, return all
graphs that contain the query. The naive method for processing
such queries is to perform a subgraph isomorphism
test against each graph in the dataset. This obviously does
not scale, as subgraph isomorphism is NP-Complete. Thus,
many indexing methods have been proposed to reduce the
number of candidate graphs that have to undergo the subgraph
isomorphism test. In this paper, we identify a set of
key factors (parameters) that influence the performance of
related methods: namely, the number of nodes per graph,
the graph density, the number of distinct labels, the number
of graphs in the dataset, and the query graph size. We then
conduct comprehensive and systematic experiments that analyze
the sensitivity of the various methods on the values of
the key parameters. Our aims are twofold: first to derive
conclusions about the algorithms’ relative performance, and,
second, to stress-test all algorithms, deriving insights as to
their scalability, and highlight how both performance and
scalability depend on the above factors. We choose six well-established
indexing methods, namely Grapes, CT-Index,
GraphGrepSX, gIndex, Tree+∆, and gCode, as representative
approaches of the overall design space, including the
most recent and best performing methods. We report on
their index construction time and index size, and on query
processing performance in terms of time and false positive
ratio. We employ both real and synthetic datasets. Specifically,
four real datasets of different characteristics are used:
AIDS, PDBS, PCM, and PPI. In addition, we generate a
large number of synthetic graph datasets, enabling us to
systematically study the algorithms’ performance and scalability
versus the aforementioned key parameters.
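The naive baseline mentioned in this abstract (testing the query against every graph in the dataset) can be sketched as follows. This is an illustrative backtracking search, not any of the six indexing methods the paper evaluates; the graph representation (a pair of a node-to-label dict and an undirected adjacency dict), the function names, and the toy dataset are all assumptions.

```python
def contains(graph, query):
    """Return True if `query` maps injectively into `graph`, preserving
    node labels and edges (non-induced subgraph isomorphism)."""
    g_labels, g_adj = graph
    q_labels, q_adj = query
    q_nodes = list(q_labels)

    def extend(mapping):
        if len(mapping) == len(q_nodes):
            return True  # every query node has been placed
        u = q_nodes[len(mapping)]
        for v in g_labels:
            if v in mapping.values() or g_labels[v] != q_labels[u]:
                continue
            # Each already-mapped neighbour of u must land on a neighbour of v.
            if all(mapping[w] in g_adj[v] for w in q_adj[u] if w in mapping):
                mapping[u] = v
                if extend(mapping):
                    return True
                del mapping[u]  # backtrack
        return False

    return extend({})


def naive_subgraph_query(dataset, query):
    """Run the isomorphism test against every graph: no index, no filtering."""
    return [name for name, graph in dataset.items() if contains(graph, query)]


# Hypothetical toy dataset of labelled, undirected graphs.
dataset = {
    "triangle": ({1: "C", 2: "C", 3: "O"}, {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}),
    "path": ({1: "C", 2: "C", 3: "O"}, {1: {2}, 2: {1, 3}, 3: {2}}),
    "single": ({1: "C"}, {1: set()}),
}
edge_query = ({"a": "C", "b": "O"}, {"a": {"b"}, "b": {"a"}})
matches = naive_subgraph_query(dataset, edge_query)
```

Every graph in the dataset pays the worst-case exponential cost of the test, which is the motivation for index-based filtering that first shrinks the candidate set.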