Regular path queries on graphs with data
Graph data models have received much attention lately due to applications in social networks, the semantic web, biological databases and other areas. Typical query languages for graph databases retrieve their topology, while the actual data stored in them is usually queried using standard relational mechanisms. Our goal is to develop techniques that combine these two modes of querying, and give us query languages that can ask questions about both data and topology. As the basic querying mechanism we consider regular path queries, with the key difference that conditions on paths between nodes now talk not only about labels but also specify how data changes along the path. Paths that combine edge labels with data values are closely related to data words, so for stating conditions in queries, we look at several data-word formalisms developed recently. We show that many of them immediately lead to intractable data complexity for graph queries, with the notable exception of register automata, which can specify many properties of interest, and have NLOGSPACE data and PSPACE combined complexity. As register automata themselves are not easy to use in querying, we define two types of extensions of regular expressions that are more user-friendly, and develop query evaluation techniques for them. For one class, regular expressions with memory, we achieve the same bounds as for automata, and for the other class, regular expressions with equality, we also obtain tractable combined complexity of query evaluation. In addition, we show that the results extend to analogs of conjunctive regular path queries.
A Trichotomy for Regular Simple Path Queries on Graphs
Regular path queries (RPQs) select nodes connected by some path in a graph.
The edge labels of such a path have to form a word that matches a given regular
expression. We investigate the evaluation of RPQs with an additional constraint
that prevents multiple traversals of the same nodes. Those regular simple path
queries (RSPQs) find several applications in practice, yet they quickly become
intractable, even for basic languages such as (aa)* or a*ba*.
In this paper, we establish a comprehensive classification of regular
languages with respect to the complexity of the corresponding regular simple
path query problem. More precisely, we identify the fragment that is maximal in
the following sense: regular simple path queries can be evaluated in polynomial
time for every regular language L that belongs to this fragment and evaluation
is NP-complete for languages outside this fragment. We thus fully characterize
the frontier between tractability and intractability for RSPQs, and we refine
our results to show the following trichotomy: evaluation of RSPQs is either
AC0, NL-complete or NP-complete in data complexity, depending on the regular
language L. The fragment identified also admits a simple characterization in
terms of regular expressions.
Finally, we also discuss the complexity of the following decision problem:
decide, given a language L, whether finding a regular simple path for L is
tractable. We consider several alternative representations of L: DFAs, NFAs or
regular expressions, and prove that this problem is NL-complete for the first
representation and PSPACE-complete for the other two. As a conclusion we extend
our results from edge-labeled graphs to vertex-labeled graphs and vertex-edge
labeled graphs.
Comment: 15 pages, conference submission
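The gap between arbitrary-path and simple-path semantics described in this abstract can be seen on a tiny example. The sketch below is illustrative only and is not the algorithm from the paper: `rpq_reachable` does a breadth-first search over the product of the graph and an NFA (the standard polynomial-time approach for RPQs), while `rspq_reachable` enumerates simple paths by brute force, which is exponential in general. The function names, the edge-list encoding, and the NFA encoding as a `(state, label) -> set of states` dictionary are all assumptions made for this example.

```python
from collections import deque

def build_adjacency(edges):
    """Index (source, label, target) triples by source node."""
    adj = {}
    for u, label, v in edges:
        adj.setdefault(u, []).append((label, v))
    return adj

def rpq_reachable(edges, nfa, start_state, accept_states, source):
    """Nodes reachable from `source` by ANY path whose label word the NFA accepts.
    Breadth-first search over (node, state) pairs of the product graph."""
    adj = build_adjacency(edges)
    seen = {(source, start_state)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accept_states:
            answers.add(node)
        for label, nxt in adj.get(node, []):
            for nstate in nfa.get((state, label), ()):
                if (nxt, nstate) not in seen:
                    seen.add((nxt, nstate))
                    queue.append((nxt, nstate))
    return answers

def rspq_reachable(edges, nfa, start_state, accept_states, source):
    """Nodes reachable from `source` by a SIMPLE path (no node repeated)
    whose label word the NFA accepts. Brute-force enumeration."""
    adj = build_adjacency(edges)
    answers = set()

    def dfs(node, states, visited):
        if states & accept_states:
            answers.add(node)
        for label, nxt in adj.get(node, []):
            if nxt in visited:
                continue  # simple paths never revisit a node
            nstates = set()
            for s in states:
                nstates |= nfa.get((s, label), set())
            if nstates:
                dfs(nxt, nstates, visited | {nxt})

    dfs(source, {start_state}, {source})
    return answers

# Directed triangle 1 -> 2 -> 3 -> 1, every edge labeled "a".
EDGES = [(1, "a", 2), (2, "a", 3), (3, "a", 1)]
# NFA for (aa)*: state 0 is both initial and accepting.
NFA_AA_STAR = {(0, "a"): {1}, (1, "a"): {0}}

arbitrary = rpq_reachable(EDGES, NFA_AA_STAR, 0, {0}, 1)
simple = rspq_reachable(EDGES, NFA_AA_STAR, 0, {0}, 1)
```

On this triangle, node 2 is an answer under arbitrary-path semantics (go around the cycle once, then one more step, for a path of length 4) but not under simple-path semantics, where only nodes 1 and 3 are reached by even-length paths.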
Expressive Path Queries on Graph with Data
Graph data models have recently become popular owing to their applications,
e.g., in social networks and the semantic web. Typical navigational query
languages over graph databases - such as Conjunctive Regular Path Queries
(CRPQs) - cannot express relevant properties of the interaction between the
underlying data and the topology. Two languages have been recently proposed to
overcome this problem: walk logic (WL) and regular expressions with memory
(REM). In this paper, we begin by investigating fundamental properties of WL
and REM, i.e., complexity of evaluation problems and expressive power. We first
show that the data complexity of WL is nonelementary, which rules out its
practicality. On the other hand, while REM has low data complexity, we point
out that many natural data/topology properties of graphs expressible in WL
cannot be expressed in REM. To this end, we propose register logic, an
extension of REM, which we show to be able to express many natural graph
properties expressible in WL, while at the same time preserving the
elementariness of data complexity of REMs. It is also incomparable to WL in
terms of expressive power.
Comment: 39 pages
Path Logics for Querying Graphs: Combining Expressiveness and Efficiency
We study logics expressing properties of paths in graphs that are tailored to querying graph databases: a data model for new applications such as social networks, the Semantic Web, biological data, crime detection, and others. The basic construct of such logics, a regular path query, checks for paths whose labels belong to a regular language. These logics fail to capture two commonly needed features: counting properties, and the ability to compare paths. It is known that regular path-comparison relations (e.g., prefix or equality) can be added without significant complexity overhead; however, adding common relations often demanded by applications (e.g., subword, subsequence, suffix) results in either undecidability or astronomical complexity. We propose, as a way around this problem, to use automata with counting functionalities, namely Parikh automata. They express many counting properties directly, and they approximate many relations of interest. We prove that with Parikh automata defining both languages and relations used in queries, we retain the low complexity of the standard path logics for graphs. In particular, this gives us efficient approximations to queries with prohibitively high complexity. We extend the best known decidability results by showing that even more expressive classes of relations are possible in query languages (sometimes with restrictions on the shape of formulae). We also show that Parikh automata admit two convenient representations by analogs of regular expressions, making them usable in real-life querying.
The Dichotomy of Evaluating Homomorphism-Closed Queries on Probabilistic Graphs
We study the problem of probabilistic query evaluation on probabilistic
graphs, namely, tuple-independent probabilistic databases on signatures of
arity two. Our focus is the class of queries that is closed under
homomorphisms, or equivalently, the infinite unions of conjunctive queries. Our
main result states that all unbounded queries from this class are #P-hard for
probabilistic query evaluation. As bounded queries from this class are
equivalent to a union of conjunctive queries, they are already classified by
the dichotomy of Dalvi and Suciu (2012). Hence, our result and theirs imply a
complete data complexity dichotomy, between polynomial time and #P-hardness,
for evaluating infinite unions of conjunctive queries over probabilistic
graphs. This dichotomy covers in particular all fragments of infinite unions of
conjunctive queries such as negation-free (disjunctive) Datalog, regular path
queries, and a large class of ontology-mediated queries on arity-two
signatures. Our result is shown by reducing from counting the valuations of
positive partitioned 2-DNF formulae for some queries, or from the
source-to-target reliability problem in an undirected graph for other queries,
depending on properties of minimal models. The presented dichotomy result
applies to even a special case of probabilistic query evaluation called
generalized model counting, where fact probabilities must be 0, 0.5, or 1.
Comment: 30 pages. Journal version of the ICDT'20 paper
https://drops.dagstuhl.de/opus/volltexte/2020/11939/. Submitted to LMCS. The
previous version (version 2) was the same as the ICDT'20 paper with some
minor formatting tweaks and 7 extra pages of technical appendix
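To make "probabilistic query evaluation on a tuple-independent graph" concrete, here is a minimal sketch that computes the probability of one unbounded, homomorphism-closed query (directed reachability from a source to a target) by enumerating all possible worlds. It is a brute-force illustration, exponential in the number of edges, and is not a method from the paper; the function name `reach_probability` and the dictionary encoding of edge probabilities are assumptions for this example.

```python
from itertools import product

def reach_probability(prob_edges, source, target):
    """Probability that `target` is reachable from `source` when each
    directed edge (u, v) is present independently with probability p.
    prob_edges maps (u, v) -> p. Enumerates all 2^n possible worlds."""
    edges = list(prob_edges.items())
    total = 0.0
    for world in product([False, True], repeat=len(edges)):
        weight = 1.0
        adj = {}
        for keep, ((u, v), p) in zip(world, edges):
            # Multiply in p if the edge is kept, (1 - p) if it is dropped.
            weight *= p if keep else 1.0 - p
            if keep:
                adj.setdefault(u, []).append(v)
        if weight == 0.0:
            continue  # impossible world, contributes nothing
        # Depth-first search in this particular world.
        stack, seen = [source], {source}
        while stack:
            node = stack.pop()
            for nxt in adj.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if target in seen:
            total += weight
    return total

# Two routes from "s" to "t": a direct edge and a two-hop route via "a".
# All probabilities are 0.5, as in the generalized model counting setting.
GRAPH = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.5}
p_reach = reach_probability(GRAPH, "s", "t")
```

Here the query fails only if the direct edge is absent (probability 0.5) and the two-hop route is broken (probability 0.75), so the result is 1 - 0.5 * 0.75 = 0.625.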
Regular Path Query Evaluation on Streaming Graphs
We study persistent query evaluation over streaming graphs, which is becoming
increasingly important. We focus on navigational queries that determine if
there exists a path between two entities that satisfies a user-specified
constraint. We adopt the Regular Path Query (RPQ) model that specifies
navigational patterns with labeled constraints. We propose deterministic
algorithms to efficiently evaluate persistent RPQs under both arbitrary and
simple path semantics in a uniform manner. Experimental analysis on real and
synthetic streaming graphs shows that the proposed algorithms can process up to
tens of thousands of edges per second and efficiently answer RPQs that are
commonly used in real-world workloads.
Comment: A shorter version of this paper has been accepted for publication in
2020 International Conference on Management of Data (SIGMOD 2020)
Context-Free Path Queries on RDF Graphs
Navigational graph queries are an important class of queries that can extract
implicit binary relations over the nodes of input graphs. Most of the
navigational query languages used in the RDF community, e.g. property paths in
W3C SPARQL 1.1 and nested regular expressions in nSPARQL, are based on
regular expressions. It is known that regular expressions have limited
expressivity; for instance, some natural queries, like same-generation queries,
are not expressible with regular expressions. To overcome this limitation, in
this paper, we present cfSPARQL, an extension of the SPARQL query language equipped
with context-free grammars. The cfSPARQL language is strictly more expressive
than property paths and nested expressions. The additional expressivity can be
used for modelling graph similarities, graph summarization and ontology
alignment. Despite the increasing expressivity, we show that cfSPARQL still
enjoys a low computational complexity and can be evaluated efficiently.
Comment: 25 pages
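The same-generation query mentioned in this abstract is a standard example of a context-free path query: two nodes are related if one can walk k steps up and then k steps down between them, which corresponds to a grammar of the shape S -> up S down | up down. The sketch below computes that relation with a naive fixpoint over child-to-parent edges. It is only an illustration of the query and does not use cfSPARQL syntax or the evaluation algorithm from the paper; the function name and input encoding are assumptions.

```python
def same_generation(parent_edges):
    """Return all pairs (x, y) that share an ancestor at the same distance.
    parent_edges: iterable of (child, parent) pairs."""
    parents = {}
    for child, parent in parent_edges:
        parents.setdefault(child, set()).add(parent)

    # Base case (S -> up down): x and y have a common parent.
    sg = {(x, y) for x in parents for y in parents if parents[x] & parents[y]}

    # Inductive case (S -> up S down): some parent of x is in the same
    # generation as some parent of y. Repeat until nothing new is added.
    changed = True
    while changed:
        changed = False
        for x in parents:
            for y in parents:
                if (x, y) in sg:
                    continue
                if any((px, py) in sg for px in parents[x] for py in parents[y]):
                    sg.add((x, y))
                    changed = True
    return sg

# Small tree: a is the root, b and c are its children,
# d is a child of b, e is a child of c.
TREE = [("b", "a"), ("c", "a"), ("d", "b"), ("e", "c")]
sg_pairs = same_generation(TREE)
```

In this tree, d and e are cousins and therefore in the same generation, while d and c are not, because they sit at different depths below the root. No regular expression over the up and down labels can enforce the matching counts, which is why the query needs a context-free grammar.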