Answering FO+MOD Queries Under Updates on Bounded Degree Databases
We investigate the query evaluation problem for fixed queries over fully dynamic databases, where tuples can be inserted or deleted. The task is to design a dynamic algorithm that immediately reports the new result of a fixed query after every database update.
We consider queries in first-order logic (FO) and its extension with modulo-counting quantifiers (FO+MOD), and show that they can be efficiently evaluated under updates, provided that the dynamic database does not exceed a certain degree bound.
In particular, we construct a data structure that allows answering a Boolean FO+MOD query and computing the size of the query result within constant time after every database update. Furthermore, after every update we are able to immediately enumerate the new query result with constant delay between the output tuples. The time needed to build the data structure is linear in the size of the database.
Our results extend earlier work on the evaluation of first-order queries on static databases of bounded degree and rely on an effective Hanf normal form for FO+MOD recently obtained by [Heimberg, Kuske, and Schweikardt, LICS, 2016].
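As a toy illustration of what "reporting the size of the query result in constant time after every update" means, the sketch below maintains the number of answers to one fixed query, Q(x, y, z) = E(x, y) AND E(y, z), under edge insertions and deletions. This is not the construction from the abstract (which goes through a Hanf normal form and relies on the degree bound); for this particular query the count equals the sum over y of indeg(y) * outdeg(y), so a few counters suffice. The class name `TwoPathCounter` and its methods are hypothetical names chosen for the example.

```python
from collections import defaultdict


class TwoPathCounter:
    """Maintains |{(x, y, z) : E(x, y) and E(y, z)}| under edge updates.

    Illustrative assumption: the count equals sum_y indeg[y] * outdeg[y],
    so each update only changes the terms of the two endpoints.
    """

    def __init__(self):
        self.edges = set()
        self.indeg = defaultdict(int)
        self.outdeg = defaultdict(int)
        self.count = 0

    def _term(self, y):
        return self.indeg[y] * self.outdeg[y]

    def _apply(self, a, b, delta):
        touched = {a, b}  # a set, so a self-loop (a == b) is handled once
        self.count -= sum(self._term(y) for y in touched)
        self.outdeg[a] += delta
        self.indeg[b] += delta
        self.count += sum(self._term(y) for y in touched)

    def insert(self, a, b):
        if (a, b) not in self.edges:
            self.edges.add((a, b))
            self._apply(a, b, +1)

    def delete(self, a, b):
        if (a, b) in self.edges:
            self.edges.remove((a, b))
            self._apply(a, b, -1)
```

Each update touches at most two counters, so the new count is available after a constant number of dictionary operations (expected constant time with hashing).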
Performance and scalability of indexed subgraph query processing methods
Graph data management systems have become very popular, as graphs are the natural data model for many applications. One of the main problems addressed by these systems is subgraph query processing; i.e., given a query graph, return all graphs that contain the query. The naive method for processing such queries is to perform a subgraph isomorphism test against each graph in the dataset. This does not scale, as subgraph isomorphism is NP-complete. Thus, many indexing methods have been proposed to reduce the number of candidate graphs that have to undergo the subgraph isomorphism test. In this paper, we identify a set of key factors (parameters) that influence the performance of related methods: namely, the number of nodes per graph, the graph density, the number of distinct labels, the number of graphs in the dataset, and the query graph size. We then conduct comprehensive and systematic experiments that analyze the sensitivity of the various methods to the values of the key parameters. Our aims are twofold: first, to derive conclusions about the algorithms' relative performance, and, second, to stress-test all algorithms, deriving insights as to their scalability, and to highlight how both performance and scalability depend on the above factors. We choose six well-established indexing methods, namely Grapes, CT-Index, GraphGrepSX, gIndex, Tree+Δ, and gCode, as representative approaches of the overall design space, including the most recent and best performing methods. We report on their index construction time and index size, and on query processing performance in terms of time and false positive ratio. We employ both real and synthetic datasets. Specifically, four real datasets of different characteristics are used: AIDS, PDBS, PCM, and PPI. In addition, we generate a large number of synthetic graph datasets, enabling us to systematically study the algorithms' performance and scalability versus the aforementioned key parameters.
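The filter-then-verify pattern described in this abstract can be sketched as follows. This is a minimal illustration, not any of the six indexing methods named above: the "index" here is only a label-count check, and the verification step is a plain backtracking subgraph test. The graph representation and the names `make_graph`, `label_filter`, `contains`, and `answer` are assumptions made for the example.

```python
from collections import Counter


def make_graph(labels, edges):
    """labels: dict node -> label; edges: iterable of undirected (a, b) pairs."""
    adj = {v: set() for v in labels}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return {"labels": labels, "adj": adj}


def label_filter(query, graph):
    """Cheap necessary condition: enough nodes of every label used by the query."""
    need = Counter(query["labels"].values())
    have = Counter(graph["labels"].values())
    return all(have[label] >= n for label, n in need.items())


def contains(query, graph):
    """Backtracking test for a label-preserving, edge-preserving injective mapping."""
    q_nodes = list(query["labels"])

    def extend(mapping):
        if len(mapping) == len(q_nodes):
            return True
        u = q_nodes[len(mapping)]
        for v in graph["labels"]:
            if v in mapping.values():
                continue
            if graph["labels"][v] != query["labels"][u]:
                continue
            # Every query edge from u to an already-mapped node must exist in graph.
            if all(mapping[w] in graph["adj"][v]
                   for w in query["adj"][u] if w in mapping):
                mapping[u] = v
                if extend(mapping):
                    return True
                del mapping[u]
        return False

    return extend({})


def answer(query, dataset):
    """Return (candidates surviving the filter, graphs that truly contain the query)."""
    candidates = [name for name, g in dataset.items() if label_filter(query, g)]
    answers = [name for name in candidates if contains(query, dataset[name])]
    return candidates, answers
```

Candidates that pass the filter but fail verification are the false positives whose ratio the abstract mentions; a stronger filter shrinks the candidate list and therefore the number of expensive isomorphism tests.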
A Practically Efficient Algorithm for Generating Answers to Keyword Search over Data Graphs
In keyword search over a data graph, an answer is a non-redundant subtree
that contains all the keywords of the query. A naive approach to producing all
the answers by increasing height is to generalize Dijkstra's algorithm to
enumerating all acyclic paths by increasing weight. The idea of freezing is
introduced so that (most) non-shortest paths are generated only if they are
actually needed for producing answers. The resulting algorithm for generating
subtrees, called GTF, is subtle and its proof of correctness is intricate.
Extensive experiments show that GTF outperforms existing systems, even ones
that for efficiency's sake are incomplete (i.e., cannot produce all the
answers). In particular, GTF is scalable and performs well even on large data
graphs and when many answers are needed.
Comment: Full version of ICDT'16 paper.
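The naive baseline mentioned in this abstract, generalizing Dijkstra's algorithm so that it enumerates all acyclic paths by increasing weight, might look like the sketch below. It is not GTF and contains no freezing: every acyclic path is pushed onto the priority queue, including non-shortest ones. The adjacency format (node mapped to a list of `(neighbor, weight)` pairs), the function name `simple_paths_by_weight`, and the assumption of non-negative weights are choices made for the illustration.

```python
import heapq
from itertools import count


def simple_paths_by_weight(adj, source):
    """Yield (weight, path) for every acyclic path starting at source,
    in non-decreasing order of weight (assumes non-negative edge weights)."""
    tie = count()  # tie-breaker so the heap never has to compare paths
    heap = [(0, next(tie), (source,))]
    while heap:
        weight, _, path = heapq.heappop(heap)
        yield weight, path
        for nxt, w in adj.get(path[-1], []):
            if nxt not in path:  # keep the path acyclic
                heapq.heappush(heap, (weight + w, next(tie), path + (nxt,)))
```

Unlike Dijkstra's algorithm, nothing is discarded once a node has been reached: a second, heavier path to the same node is still generated. The number of such paths can grow exponentially, which is the cost the freezing idea is meant to avoid.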
Enumeration Complexity of Conjunctive Queries with Functional Dependencies
We study the complexity of enumerating the answers of Conjunctive Queries (CQs) in the presence of Functional Dependencies (FDs). Our focus is on the ability to list output tuples with a constant delay in between, following a linear-time preprocessing. A known dichotomy classifies the acyclic self-join-free CQs into those that admit such enumeration, and those that do not. However, this classification no longer holds in the common case where the database exhibits dependencies among attributes. That is, some queries that are classified as hard are in fact tractable if dependencies are accounted for. We establish a generalization of the dichotomy to accommodate FDs; hence, our classification determines which combination of a CQ and a set of FDs admits constant-delay enumeration with a linear-time preprocessing.
In addition, we generalize a hardness result for cyclic CQs to accommodate a common type of FDs. Further conclusions of our development include a dichotomy for enumeration with linear delay, and a dichotomy for CQs with disequalities. Finally, we show that all our results apply to the known class of "cardinality dependencies" that generalize FDs (e.g., by stating an upper bound on the number of genres per movie, or friends per person).
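To make "constant-delay enumeration after linear-time preprocessing" concrete, here is a small sketch for one easy query, Q(x, y, z) = R(x, y) AND S(y, z). It does not involve functional dependencies and is not taken from the paper; it only shows the general shape of such algorithms. The function names `preprocess` and `enumerate_join` are hypothetical, relations are assumed to be duplicate-free lists of pairs, and hashing is assumed to take expected constant time.

```python
from collections import defaultdict


def preprocess(R, S):
    """Linear-time phase: index S by its join attribute and drop R-tuples
    that have no join partner (a semi-join reduction)."""
    s_by_y = defaultdict(list)
    for y, z in S:
        s_by_y[y].append(z)
    r_reduced = [(x, y) for x, y in R if y in s_by_y]
    return r_reduced, s_by_y


def enumerate_join(r_reduced, s_by_y):
    """Enumeration phase: every remaining R-tuple has at least one partner,
    so each loop step produces an output tuple without searching."""
    for x, y in r_reduced:
        for z in s_by_y[y]:
            yield (x, y, z)
```

The semi-join step is what bounds the delay: without it, the enumeration could scan many dangling R-tuples between two consecutive outputs.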
First-order queries on classes of structures with bounded expansion
We consider the evaluation of first-order queries over classes of databases
with bounded expansion. The notion of bounded expansion is fairly broad and
generalizes bounded degree, bounded treewidth and exclusion of at least one
minor. It was known that over a class of databases with bounded expansion,
first-order sentences could be evaluated in time linear in the size of the
database. We give a different proof of this result. Moreover, we show that
answers to first-order queries can be enumerated with constant delay after a
linear time preprocessing. We also show that counting the number of answers to
a query can be done in time linear in the size of the database.