699 research outputs found
AMaχoS—Abstract Machine for Xcerpt
Web query languages promise convenient and efficient access
to Web data such as XML, RDF, or Topic Maps. Xcerpt is one such Web
query language with strong emphasis on novel high-level constructs for
effective and convenient query authoring, particularly tailored to versatile
access to data in different Web formats such as XML or RDF.
However, so far it lacks an efficient implementation to supplement the
convenient language features. AMaχoS is an abstract machine implementation
for Xcerpt that aims at efficiency and ease of deployment. It
strictly separates compilation and execution of queries: Queries are compiled
once to abstract machine code that consists in (1) a code segment
with instructions for evaluating each rule and (2) a hint segment that
provides the abstract machine with optimization hints derived by the
query compilation. This article summarizes the motivation and principles
behind AMaχoS and discusses how its current architecture realizes
these principles
Four Lessons in Versatility or How Query Languages Adapt to the Web
Exposing not only human-centered information, but machine-processable data on the Web is one of the commonalities of recent Web trends. It has enabled a new kind of applications and businesses where the data is used in ways not foreseen by the data providers. Yet this exposition has fractured the Web into islands of data, each in different Web formats: Some providers choose XML, others RDF, again others JSON or OWL, for their data, even in similar domains. This fracturing stifles innovation as application builders have to cope not only with one Web stack (e.g., XML technology) but with several ones, each of considerable complexity. With Xcerpt we have developed a rule- and pattern based query language that aims to give shield application builders from much of this complexity: In a single query language XML and RDF data can be accessed, processed, combined, and re-published. Though the need for combined access to XML and RDF data has been recognized in previous work (including the W3C’s GRDDL), our approach differs in four main aspects: (1) We provide a single language (rather than two separate or embedded languages), thus minimizing the conceptual overhead of dealing with disparate data formats. (2) Both the declarative (logic-based) and the operational semantics are unified in that they apply for querying XML and RDF in the same way. (3) We show that the resulting query language can be implemented reusing traditional database technology, if desirable. Nevertheless, we also give a unified evaluation approach based on interval labelings of graphs that is at least as fast as existing approaches for tree-shaped XML data, yet provides linear time and space querying also for many RDF graphs. We believe that Web query languages are the right tool for declarative data access in Web applications and that Xcerpt is a significant step towards a more convenient, yet highly efficient data access in a “Web of Data”
Logics for Unranked Trees: An Overview
Labeled unranked trees are used as a model of XML documents, and logical
languages for them have been studied actively over the past several years. Such
logics have different purposes: some are better suited for extracting data,
some for expressing navigational properties, and some make it easy to relate
complex properties of trees to the existence of tree automata for those
properties. Furthermore, logics differ significantly in their model-checking
properties, their automata models, and their behavior on ordered and unordered
trees. In this paper we present a survey of logics for unranked trees
Evaluating Datalog via Tree Automata and Cycluits
We investigate parameterizations of both database instances and queries that
make query evaluation fixed-parameter tractable in combined complexity. We show
that clique-frontier-guarded Datalog with stratified negation (CFG-Datalog)
enjoys bilinear-time evaluation on structures of bounded treewidth for programs
of bounded rule size. Such programs capture in particular conjunctive queries
with simplicial decompositions of bounded width, guarded negation fragment
queries of bounded CQ-rank, or two-way regular path queries. Our result is
shown by translating to alternating two-way automata, whose semantics is
defined via cyclic provenance circuits (cycluits) that can be tractably
evaluated.Comment: 56 pages, 63 references. Journal version of "Combined Tractability of
Query Evaluation via Tree Automata and Cycluits (Extended Version)" at
arXiv:1612.04203. Up to the stylesheet, page/environment numbering, and
possible minor publisher-induced changes, this is the exact content of the
journal paper that will appear in Theory of Computing Systems. Update wrt
version 1: latest reviewer feedbac
Adding Logical Operators to Tree Pattern Queries on Graph-Structured Data
As data are increasingly modeled as graphs for expressing complex
relationships, the tree pattern query on graph-structured data becomes an
important type of queries in real-world applications. Most practical query
languages, such as XQuery and SPARQL, support logical expressions using
logical-AND/OR/NOT operators to define structural constraints of tree patterns.
In this paper, (1) we propose generalized tree pattern queries (GTPQs) over
graph-structured data, which fully support propositional logic of structural
constraints. (2) We make a thorough study of fundamental problems including
satisfiability, containment and minimization, and analyze the computational
complexity and the decision procedures of these problems. (3) We propose a
compact graph representation of intermediate results and a pruning approach to
reduce the size of intermediate results and the number of join operations --
two factors that often impair the efficiency of traditional algorithms for
evaluating tree pattern queries. (4) We present an efficient algorithm for
evaluating GTPQs using 3-hop as the underlying reachability index. (5)
Experiments on both real-life and synthetic data sets demonstrate the
effectiveness and efficiency of our algorithm, from several times to orders of
magnitude faster than state-of-the-art algorithms in terms of evaluation time,
even for traditional tree pattern queries with only conjunctive operations.Comment: 16 page
Compressed k2-Triples for Full-In-Memory RDF Engines
Current "data deluge" has flooded the Web of Data with very large RDF
datasets. They are hosted and queried through SPARQL endpoints which act as
nodes of a semantic net built on the principles of the Linked Data project.
Although this is a realistic philosophy for global data publishing, its query
performance is diminished when the RDF engines (behind the endpoints) manage
these huge datasets. Their indexes cannot be fully loaded in main memory, hence
these systems need to perform slow disk accesses to solve SPARQL queries. This
paper addresses this problem by a compact indexed RDF structure (called
k2-triples) applying compact k2-tree structures to the well-known
vertical-partitioning technique. It obtains an ultra-compressed representation
of large RDF graphs and allows SPARQL queries to be full-in-memory performed
without decompression. We show that k2-triples clearly outperforms
state-of-the-art compressibility and traditional vertical-partitioning query
resolution, remaining very competitive with multi-index solutions.Comment: In Proc. of AMCIS'201
Simulation Subsumption or Déjà vu on the Web
Simulation unification is a special kind of unification adapted to retrieving semi-structured data on the Web. This article introduces simulation subsumption, or containment, that is, query subsumption under simulation unification. Simulation subsumption is crucial in general for query optimization, in particular for optimizing pattern-based search engines, and for the termination of recursive rule-based web languages such as the XML and RDF query language Xcerpt. This paper first motivates and formalizes simulation subsumption. Then, it establishes decidability of simulation subsumption for advanced query patterns featuring descendant constructs, regular expressions, negative subterms (or subterm exclusions), and multiple variable occurrences. Finally, we show that subsumption between two query terms can be decided in O(n!n) where n is the sum of the sizes of both query terms
First-Order Query Evaluation with Cardinality Conditions
We study an extension of first-order logic that allows to express cardinality
conditions in a similar way as SQL's COUNT operator. The corresponding logic
FOC(P) was introduced by Kuske and Schweikardt (LICS'17), who showed that query
evaluation for this logic is fixed-parameter tractable on classes of structures
(or databases) of bounded degree. In the present paper, we first show that the
fixed-parameter tractability of FOC(P) cannot even be generalised to very
simple classes of structures of unbounded degree such as unranked trees or
strings with a linear order relation.
Then we identify a fragment FOC1(P) of FOC(P) which is still sufficiently
strong to express standard applications of SQL's COUNT operator. Our main
result shows that query evaluation for FOC1(P) is fixed-parameter tractable
with almost linear running time on nowhere dense classes of structures. As a
corollary, we also obtain a fixed-parameter tractable algorithm for counting
the number of tuples satisfying a query over nowhere dense classes of
structures
- …