
75 research outputs found

Early = Earliest?

Author: Lick Anthony
Niehren Joachim
Publication venue: HAL CCSD
Publication date: 16/10/2013

Early query answering is the core issue of memory-efficient query evaluation on data streams. The idea is to select and reject answer candidates as early as possible on the stream, so that they do not have to be stored in main memory. Since earliest query answering is infeasible for XPath, as first noticed by Benedikt, Jeffrey and Ley-Wild in 2008, most existing streaming algorithms for XPath approximate it in some early manner, while focussing on high time efficiency. Such approximations, however, spoil all theoretical guarantees on memory efficiency. In this paper, we prove that earliest query answering is indeed feasible for positive Forward XPath queries that have neither unsatisfiable nor valid subqueries. The core insight is that a variant of Colmerauer's independence property can be proven for the corresponding fragment of the FXP tree logic. Based on this independence property, we can show that the early query answering algorithm from [13], which is based on a compiler from FXP to early nested word automata, is indeed earliest for all positive FXP0 queries with neither unsatisfiable nor valid subformulas. Furthermore, this algorithm outperforms most previous algorithms for XPath evaluation on XML streams in time efficiency and coverage, as shown elsewhere.

HAL - Lille 3

INRIA a CCSD electronic archive server

Early Nested Word Automata for XPath Query Answering on XML Streams

Author: Debarbieux Denis
Gauwin Olivier
Niehren Joachim
Sebastian Tom
Zergaoui Mohamed
Publication venue: HAL CCSD
Publication date: 16/07/2013

... polynomial time for disjunctions of k-bounded simpl...

INRIA a CCSD electronic archive server

Earliest Query Answering for Deterministic Nested Word Automata

Author: A. Berlea
A. Neumann
D. Olteanu
G. Miklau
H. Seidl
L. Segoufin
M. Benedikt
M. Grohe
O. Gauwin
R. Alur
W. Martens
Publication venue: 'Nordic Pulp and Paper Research Journal'
Publication date: 01/01/2009

Earliest query answering (EQA) is an objective of many recent streaming algorithms for XML query answering that aim for close to optimal memory management. In this paper, we show that EQA is infeasible even for a small fragment of Forward XPath except if P=NP. We then present an EQA algorithm for queries and schemas defined by deterministic nested word automata (dNWAs) and distinguish a large class of dNWAs for which streaming query answering is feasible in polynomial space and time.

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

XQuery Streaming by Forest Transducers

Author: Hakuta Shizuya
Iwasaki Hideya
Maneth Sebastian
Nakano Keisuke
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 04/12/2013

Streaming of XML transformations is a challenging task and only very few systems support streaming. Research approaches generally define custom fragments of XQuery and XPath that are amenable to streaming, and then design custom algorithms for each fragment. These languages have several shortcomings. Here we take a more principled approach to the problem of streaming XQuery-based transformations. We start with an elegant transducer model for which many static analysis problems are well understood: the Macro Forest Transducer (MFT). We show that a large fragment of XQuery can be translated into MFTs; indeed, this fragment can express important features that are missing from other XQuery stream engines such as GCX: it supports XPath predicates and let-statements. We then rely on a streaming execution engine for MFTs, one which uses a well-founded set of optimizations from functional programming, such as strictness analysis and deforestation. Our prototype achieves time and memory efficiency comparable to the fastest known engine for XQuery streaming, GCX. This is surprising because our engine relies on the OCaml built-in garbage collector and does not use any specialized buffer management, while GCX's efficiency is due to clever and explicit buffer management. Comment: Full version of the paper in the Proceedings of the 30th IEEE International Conference on Data Engineering (ICDE 2014).

arXiv.org e-Print Archive

CiteSeerX

Bounded Delay and Concurrency for Earliest Query Answering

Author: A. Berlea
A. Neumann
A. Weber
C. Allauzen
D. Olteanu
H. Seidl
J. Carme
M. Benedikt
O. Gauwin
R.E. Stearns
W. Martens
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Earliest query answering is needed for streaming XML processing with optimal memory management. We study the feasibility of earliest query answering for node selection queries. Tractable queries are distinguished by a bounded number of concurrently alive answer candidates at every time point, and a bounded delay for node selection. We show that both properties are decidable in polynomial time for queries defined by deterministic automata for unranked trees. Our results are obtained by reduction to the bounded valuedness problem for recognizable relations between unranked trees.

HAL - Lille 3

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

A Benchmark Collection of Deterministic Automata for XPath Queries

Author: Al Serhali Antonio
Niehren Joachim
Publication venue: HAL CCSD
Publication date: 09/06/2022

We provide a benchmark collection of deterministic automata for regular XPath queries. For this, we select the subcollection of forward navigational XPath queries from a corpus that Lick and Schmitz extracted from real-world XSLT and XQuery programs, compile them to stepwise hedge automata (SHAs), and determinize them. Large blowups by automata determinization are avoided by using schema-based determinization. The schema captures the XML data model and the fact that any answer of a path query must return a single node. Our collection also provides deterministic nested word automata that we obtain by compilation from deterministic SHAs.

INRIA a CCSD electronic archive server

Projection for Nested Word Automata Speeds up XPath Evaluation on XML Streams

Author: Niehren Joachim
Sebastian Tom
Publication venue: HAL CCSD
Publication date: 23/01/2016

We present an evaluator for navigational XPath on XML streams with projection. The idea is to project away those parts of an XML stream that are irrelevant for evaluating a given XPath query. This task is relevant for processing XML streams in general since all XML standard languages are based on XPath. The best existing streaming algorithm for navigational XPath queries runs nested word automata. Therefore, we develop a projection algorithm for nested word automata, for the first time to the best of our knowledge. It turns out that projection can speed up the evaluation of navigational XPath queries on XML streams by a factor of 4 on average on the usual XPath benchmarks. An extended version of the document is available as a PDF.

INRIA a CCSD electronic archive server

Certain Query Answering on Compressed String Patterns: From Streams to Hyperstreams

Author: D Angluin
D Debarbieux
D Olteanu
H Björklund
H Straubing
M Blondin
O Gauwin
O Kupferman
P Bille
S Maneth
TJ Green
Publication venue: HAL CCSD
Publication date: 24/09/2018

We study the problem of certain query answering (CQA) on compressed string patterns. These are incomplete singleton context-free grammars that can model systems of multiple streams with references to others, called hyperstreams more recently. In order to capture regular path queries on strings, we consider nondeterministic finite automata (NFAs) for query definition. It turns out that CQA for Boolean NFA queries is equivalent to regular string pattern inclusion, i.e., whether all strings completing a compressed string pattern belong to a regular language. We prove that CQA on compressed string patterns is PSpace-complete for NFA queries. The PSpace-hardness even applies to Boolean queries defined by deterministic finite automata (DFAs) and without compression. We also show that CQA on compressed linear string patterns can be solved in PTime for DFA queries. The proofs of the results presented here can be found in the long version of this paper (https://hal.inria.fr/hal-01846016).

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Certain Query Answering on Compressed String Patterns: From Streams to Hyperstreams (long version)

Author: Boneva Iovka
Niehren Joachim
Sakho Momar
Publication venue: HAL CCSD
Publication date: 20/07/2018

We study the problem of certain query answering (CQA) on compressed string patterns. These are incomplete singleton context-free grammars that can model systems of multiple streams with references to others, called hyperstreams more recently. In order to capture regular path queries on strings, we consider nondeterministic finite automata (NFAs) for query definition. It turns out that CQA for Boolean NFA queries is equivalent to regular string pattern inclusion, i.e., whether all strings completing a compressed string pattern belong to a regular language. We prove that CQA on compressed string patterns is PSPACE-complete for NFA queries. The PSPACE-hardness even applies to Boolean queries defined by deterministic finite automata (DFAs) and without compression. We also show that CQA on compressed linear string patterns can be solved in PTIME for DFA queries.

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Determinization and Minimization of Automata for Nested Words Revisited

Author: Niehren Joachim
Sakho Momar
Publication venue: 'MDPI AG'
Publication date: 24/02/2021

We consider the problem of determinizing and minimizing automata for nested words in practice. For this we compile the nested regular expressions (NREs) from the usual XPath benchmark to nested word automata (NWAs). The determinization of these NWAs, however, fails to produce reasonably small automata. In the best case, huge deterministic NWAs are produced after few hours, even for relatively small NREs of the benchmark. We propose a different approach to the determinization of automata for nested words. For this, we introduce stepwise hedge automata (SHAs) that generalize naturally on both (stepwise) tree automata and on finite word automata. We then show how to determinize SHAs, yielding reasonably small deterministic automata for the NREs from the XPath benchmark. The size of deterministic SHAs can be reduced further by a novel minimization algorithm for a subclass of SHAs. In order to understand why the new approach to determinization and minimization works so nicely, we investigate the relationship between NWAs and SHAs further. Clearly, deterministic SHAs can be compiled to deterministic NWAs in linear time, and conversely, NWAs can be compiled to nondeterministic SHAs in polynomial time. Therefore, we can use SHAs as intermediates for determinizing NWAs, while avoiding the huge size increase with the usual determinization algorithm for NWAs. Notably, the NWAs obtained from the SHAs perform bottom-up and left-to-right computations only, but no top-down computations. This NWA-behavior can be distinguished syntactically by the (weak) single-entry property, suggesting a close relationship between SHAs and single-entry NWAs. In particular, it turns out that the usual determinization algorithm for NWAs behaves well for single-entry NWAs, while it quickly explodes without the single-entry property. Furthermore, it is known that the class of deterministic multi-module single-entry NWAs enjoys unique minimization. The subclass of deterministic SHAs to which our novel minimization algorithm applies is different though, in that we do not impose multiple modules. As further optimizations for reducing the sizes of the constructed SHAs, we propose schema-based cleaning and symbolic representations based on apply-else rules, that can be maintained by determinization. We implemented the optimizations and report the experimental results for the automata constructed for the XPathMark benchmark.

Multidisciplinary Digital Publishing Institute

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

HAL-Rennes 1