Context-Free Path Queries on RDF Graphs
Navigational graph queries are an important class of queries that can extract
implicit binary relations over the nodes of input graphs. Most of the
navigational query languages used in the RDF community, e.g. property paths in
W3C SPARQL 1.1 and nested regular expressions in nSPARQL, are based on
regular expressions. It is known that regular expressions have limited
expressivity; for instance, some natural queries, like same-generation queries,
are not expressible with regular expressions. To overcome this limitation, in
this paper, we present cfSPARQL, an extension of the SPARQL query language equipped
with context-free grammars. The cfSPARQL language is strictly more expressive
than property paths and nested expressions. The additional expressivity can be
used for modelling graph similarities, graph summarization and ontology
alignment. Despite the increased expressivity, we show that cfSPARQL still
enjoys a low computational complexity and can be evaluated efficiently.
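The same-generation query mentioned in this abstract can be illustrated with a small sketch. This is not the cfSPARQL evaluation algorithm; it is a naive fixpoint over an assumed input format (a set of hypothetical `(child, parent)` pairs), shown only to make the context-free pattern concrete.

```python
def same_generation(has_parent):
    """Naive fixpoint for the grammar S -> up down | up S down, where
    `up` follows a (child, parent) edge and `down` follows it backwards.

    `has_parent` is a set of (child, parent) pairs (a hypothetical input
    format). Returns every pair (x, y) linked by a path whose label
    sequence is derivable from S.
    """
    # Base rule S -> up down: x and y share a parent.
    pairs = {(x, y)
             for (x, p) in has_parent
             for (y, q) in has_parent
             if p == q}
    while True:
        # Recursive rule S -> up S down: the parents are already related.
        derived = {(x, y)
                   for (x, p) in has_parent
                   for (y, q) in has_parent
                   if (p, q) in pairs}
        if derived <= pairs:
            return pairs
        pairs |= derived


# Tiny family tree: a is the root, b and c are its children,
# d is a child of b, e is a child of c.
edges = {("b", "a"), ("c", "a"), ("d", "b"), ("e", "c")}
result = same_generation(edges)
```

The number of `up` steps must equal the number of `down` steps, which is the balanced-counting property that regular expressions cannot express.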
Logics for Unranked Trees: An Overview
Labeled unranked trees are used as a model of XML documents, and logical
languages for them have been studied actively over the past several years. Such
logics have different purposes: some are better suited for extracting data,
some for expressing navigational properties, and some make it easy to relate
complex properties of trees to the existence of tree automata for those
properties. Furthermore, logics differ significantly in their model-checking
properties, their automata models, and their behavior on ordered and unordered
trees. In this paper we present a survey of logics for unranked trees.
Joining Extractions of Regular Expressions
Regular expressions with capture variables, also known as "regex formulas,"
extract relations of spans (interval positions) from text. These relations can
be further manipulated via Relational Algebra as studied in the context of
document spanners, Fagin et al.'s formal framework for information extraction.
We investigate the complexity of querying text by Conjunctive Queries (CQs) and
Unions of CQs (UCQs) on top of regex formulas. We show that the lower bounds
(NP-completeness and W[1]-hardness) from the relational world also hold in our
setting; in particular, hardness already holds for single-character text! Yet, the
upper bounds from the relational world do not carry over. Unlike the relational
world, acyclic CQs, and even gamma-acyclic CQs, are hard to compute. The source
of hardness is that it may be intractable to instantiate the relation defined
by a regex formula, simply because it has an exponential number of tuples. Yet,
we are able to establish general upper bounds. In particular, UCQs can be
evaluated with polynomial delay, provided that every CQ has a bounded number of
atoms (while unions and projection can be arbitrary). Furthermore, UCQ
evaluation is solvable with FPT (Fixed-Parameter Tractable) delay when the
parameter is the size of the UCQ.
Enumerating Subgraph Instances Using Map-Reduce
The theme of this paper is how to find all instances of a given "sample"
graph in a larger "data graph," using a single round of map-reduce. For the
simplest sample graph, the triangle, we improve upon the best known such
algorithm. We then examine the general case, considering both the communication
cost between mappers and reducers and the total computation cost at the
reducers. To minimize communication cost, we exploit the techniques of (Afrati
and Ullman, TKDE 2011) for computing multiway joins (evaluating conjunctive
queries) in a single map-reduce round. Several methods are shown for
translating sample graphs into a union of conjunctive queries with as few
queries as possible. We also address the matter of optimizing computation cost.
Many serial algorithms are shown to be "convertible," in the sense that it is
possible to partition the data graph, explore each partition in a separate
reducer, and have the total computation cost at the reducers be of the same
order as the computation cost of the serial algorithm.
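A simplified, in-memory simulation of single-round triangle enumeration is sketched below. It is not the paper's algorithm: the bucket count `b`, the modulo hash, and the rule for routing edges to reducers are assumptions chosen for brevity, and duplicates are removed with a set rather than by a careful ownership rule.

```python
from collections import defaultdict
from itertools import combinations_with_replacement


def triangles_one_round(edges, b=3):
    """Simulate one map-reduce round; nodes are ints hashed into b buckets."""
    reducers = defaultdict(set)
    # One reducer per sorted triple of bucket indices (repetition allowed).
    keys = list(combinations_with_replacement(range(b), 3))

    # Map phase: send each edge to every reducer whose bucket triple
    # covers the buckets of both endpoints.
    for u, v in edges:
        needed = {u % b, v % b}
        for key in keys:
            if needed <= set(key):
                reducers[key].add((min(u, v), max(u, v)))

    # Reduce phase: each reducer lists the triangles among its own edges.
    found = set()
    for local_edges in reducers.values():
        adj = defaultdict(set)
        for u, v in local_edges:
            adj[u].add(v)
            adj[v].add(u)
        for u, v in local_edges:          # u < v by construction
            for w in adj[u] & adj[v]:
                if w > v:
                    found.add((u, v, w))  # the set drops duplicate reports
    return found
```

Every triangle is seen by at least the reducer keyed by the sorted buckets of its three nodes, so no instance is missed. The communication cost of this routing rule is higher than necessary, which is the kind of overhead the abstract's techniques aim to reduce.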
Regular Queries on Graph Databases
Graph databases are currently one of the most popular paradigms for storing data. One of the key conceptual differences between graph and relational databases is the focus on navigational queries that ask whether some nodes are connected by paths satisfying certain restrictions. This focus has driven the definition of several different query languages and the subsequent study of their fundamental properties.
We define the graph query language of Regular Queries, which is a natural extension of unions of conjunctive 2-way regular path queries (UC2RPQs) and unions of conjunctive nested 2-way regular path queries (UCN2RPQs). Regular queries allow expressing complex regular patterns between nodes. We formalize regular queries as nonrecursive Datalog programs with transitive closure rules. This language has been previously considered, but its algorithmic properties are not well understood.
Our main contribution is to show elementary tight bounds for the containment problem for regular queries. Specifically, we show that this problem is 2EXPSPACE-complete. For all extensions of regular queries known to date, the containment problem turns out to be non-elementary. Together with the fact that evaluating regular queries is not harder than evaluating UCN2RPQs, our results show that regular queries achieve a good balance between expressiveness and complexity, and constitute a well-behaved class that deserves further investigation.
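For orientation, a single 2-way regular path query (the building block that regular queries extend) can be evaluated by searching the product of the graph with an automaton. The sketch below is a textbook-style illustration, not the paper's construction; the edge format, the `"-"` suffix for inverse traversal, and the dictionary encoding of the NFA are all assumptions.

```python
from collections import deque


def eval_2rpq(edges, nfa, start_state, final_states):
    """Return all (source, target) pairs connected by a path accepted by the NFA.

    edges: set of (u, label, v) triples.
    nfa:   dict mapping (state, symbol) to a set of next states, where a
           symbol is either a label or label + "-" (edge walked backwards).
    """
    adj = {}
    nodes = set()
    for u, label, v in edges:
        adj.setdefault(u, []).append((label, v))        # forward step
        adj.setdefault(v, []).append((label + "-", u))  # inverse step
        nodes.update((u, v))

    answers = set()
    for src in nodes:
        # Breadth-first search over (graph node, automaton state) pairs.
        seen = {(src, start_state)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in final_states:
                answers.add((src, node))
            for symbol, nxt in adj.get(node, []):
                for state2 in nfa.get((state, symbol), ()):
                    if (nxt, state2) not in seen:
                        seen.add((nxt, state2))
                        queue.append((nxt, state2))
    return answers


# Query "knows . knows-": pairs of people who know a common person.
graph = {("a", "knows", "b"), ("c", "knows", "b")}
automaton = {(0, "knows"): {1}, (1, "knows-"): {2}}
pairs = eval_2rpq(graph, automaton, start_state=0, final_states={2})
```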
An Analytical Study of Large SPARQL Query Logs
With the adoption of RDF as the data model for Linked Data and the Semantic
Web, query specification from end-users has become more and more common in
SPARQL endpoints. In this paper, we conduct an in-depth analytical study of
the queries formulated by end-users and harvested from large and up-to-date
query logs from a wide variety of RDF data sources. As opposed to previous
studies, ours is the first assessment on a voluminous query corpus, spanning
several years and covering many representative SPARQL endpoints. Apart
from the syntactical structure of the queries, which already exhibits
interesting results on this generalized corpus, we drill deeper into the
structural characteristics related to the graph- and hypergraph
representation of queries. We outline the most common shapes of queries when visually
displayed as pseudographs, and characterize their (hyper-)tree width.
Moreover, we analyze the evolution of queries over time, by introducing the
novel concept of a streak, i.e., a sequence of queries that appear as
subsequent modifications of a seed query. Our study offers several fresh
insights on the already rich query features of real SPARQL queries formulated
by real users, and leads us to draw a number of conclusions and pinpoint
future directions for SPARQL query evaluation, query optimization, tuning,
and benchmarking.
Four Lessons in Versatility or How Query Languages Adapt to the Web
Exposing not only human-centered information, but machine-processable data on the Web is one of the commonalities of recent Web trends. It has enabled new kinds of applications and businesses where the data is used in ways not foreseen by the data providers. Yet this exposition has fractured the Web into islands of data, each in different Web formats: Some providers choose XML, others RDF, again others JSON or OWL, for their data, even in similar domains. This fracturing stifles innovation as application builders have to cope not only with one Web stack (e.g., XML technology) but with several, each of considerable complexity. With Xcerpt we have developed a rule- and pattern-based query language that aims to shield application builders from much of this complexity: In a single query language XML and RDF data can be accessed, processed, combined, and re-published. Though the need for combined access to XML and RDF data has been recognized in previous work (including the W3C’s GRDDL), our approach differs in four main aspects: (1) We provide a single language (rather than two separate or embedded languages), thus minimizing the conceptual overhead of dealing with disparate data formats. (2) Both the declarative (logic-based) and the operational semantics are unified in that they apply for querying XML and RDF in the same way. (3) We show that the resulting query language can be implemented reusing traditional database technology, if desirable. Nevertheless, we also give a unified evaluation approach based on interval labelings of graphs that is at least as fast as existing approaches for tree-shaped XML data, yet provides linear time and space querying also for many RDF graphs. We believe that Web query languages are the right tool for declarative data access in Web applications and that Xcerpt is a significant step towards a more convenient, yet highly efficient data access in a “Web of Data”.