14 research outputs found
A Functional Language for Hyperstreaming XSLT
The problem of how to transform large data trees received on streams with a much smaller memory is still an open challenge despite a decade of research on XML. Therefore, the current approach of the XSLT working group of the W3C is to provide streaming support only for a smaller fragment of XSLT 3.0. This has the drawback that many existing XSLT programs need to be rewritten in order to become executable on XML streams, while many others cannot be rewritten at all, since they define nonstreamable transformations. In this paper, we propose a new hyperstreaming approach that does not require any a priori restrictions. The model of hyperstreaming generalizes the model of streaming by adding shredding operations for the output stream, so that its parts may be plugged together later on. Many transformations such as flips of document pairs are hyperstreamable but not streamable. We then present the functional language X-Fun for defining transformations between XML data trees, while providing shredding instructions. X-Fun can be understood as an extension of Frisch's XStream language with output shredding, while pattern matching is replaced by tree navigation with XPath expressions. We provide a compiler from XSLT into a fragment of X-Fun, which can be considered as the core of XSLT. We then present a hyperstreaming algorithm for evaluating X-Fun programs which combines a recent XPath evaluator with a traditional functional programming engine. We have implemented a hyperstreaming evaluator for X-Fun and thus for XSLT and compare it experimentally with Saxon's XSLT implementation. It turns out that many XSLT programs become hyperstreamable with good efficiency and without any manual rewriting.
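The flip of a document pair mentioned in the abstract can be illustrated with a minimal Python sketch. It is not X-Fun and not the paper's algorithm: the names `Ref`, `flip` and `plug`, the event format, and the stream names `main`, `s1` and `s2` are all assumptions made for this illustration. The output is shredded into named parts, and the main part only holds references that are resolved later.

```python
from typing import Dict, Iterable, List, Tuple, Union


class Ref:
    """Hypothetical placeholder in an output stream that points to another named stream."""

    def __init__(self, name: str) -> None:
        self.name = name


Item = Union[str, Ref]


def flip(events: Iterable[Tuple[str, str]]) -> Dict[str, List[Item]]:
    """Consume ("first" | "second", token) events in arrival order.

    The main output fixes the flipped order up front by referencing two
    shreds; each incoming token is appended to its shred immediately, so the
    transformer itself never buffers the first document.
    """
    out: Dict[str, List[Item]] = {"main": [Ref("s2"), Ref("s1")], "s1": [], "s2": []}
    for part, token in events:
        out["s1" if part == "first" else "s2"].append(token)
    return out


def plug(out: Dict[str, List[Item]], name: str = "main") -> List[str]:
    """Plug the shreds together afterwards by resolving references recursively."""
    result: List[str] = []
    for item in out[name]:
        if isinstance(item, Ref):
            result.extend(plug(out, item.name))
        else:
            result.append(item)
    return result


events = [("first", "<x/>"), ("first", "<y/>"), ("second", "<z/>")]
flipped = plug(flip(events))
```

In this sketch the shreds are kept in Python lists only for demonstration; the point is that each token is emitted as soon as it arrives, and the reordering is deferred to the plugging step.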
Earliest Query Answering for Deterministic Nested Word Automata
Earliest query answering (EQA) is an objective of many recent streaming algorithms for XML query answering, which aim for close-to-optimal memory management. In this paper, we show that EQA is infeasible even for a small fragment of Forward XPath except if P=NP. We then present an EQA algorithm for queries and schemas defined by deterministic nested word automata (dNWAs) and distinguish a large class of dNWAs for which streaming query answering is feasible in polynomial space and time.
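As a rough illustration of how a deterministic nested word automaton consumes an XML stream, the following Python sketch runs such an automaton over open/close events with a stack. It does not implement the EQA algorithm of the paper; the event format, the function names and the example query ("some b element is a child of an a element") are assumptions chosen for this sketch.

```python
from typing import Callable, Iterable, List, Set, Tuple

Event = Tuple[str, str]      # ("open" | "close", element label)
State = Tuple[bool, bool]    # (query already matched, innermost open element is "a")


def run_dnwa(
    events: Iterable[Event],
    open_step: Callable[[State, str], Tuple[State, bool]],
    close_step: Callable[[State, bool, str], State],
    initial: State,
    finals: Set[State],
) -> bool:
    """Run a deterministic NWA: push a stack symbol on open, pop it on close."""
    state = initial
    stack: List[bool] = []
    for kind, label in events:
        if kind == "open":
            state, symbol = open_step(state, label)
            stack.append(symbol)
        else:
            state = close_step(state, stack.pop(), label)
    return state in finals and not stack


def open_step(state: State, label: str) -> Tuple[State, bool]:
    found, parent_is_a = state
    found = found or (parent_is_a and label == "b")
    # Remember the parent's context on the stack; the new element becomes the context.
    return (found, label == "a"), parent_is_a


def close_step(state: State, saved_parent_is_a: bool, label: str) -> State:
    found, _ = state
    # Restore the context of the enclosing element.
    return (found, saved_parent_is_a)


INITIAL: State = (False, False)
FINALS: Set[State] = {(True, False), (True, True)}

# <a><c/><b/></a>
doc_match = [("open", "a"), ("open", "c"), ("close", "c"),
             ("open", "b"), ("close", "b"), ("close", "a")]
# <b><a/></b>
doc_no_match = [("open", "b"), ("open", "a"), ("close", "a"), ("close", "b")]

matched = run_dnwa(doc_match, open_step, close_step, INITIAL, FINALS)
not_matched = run_dnwa(doc_no_match, open_step, close_step, INITIAL, FINALS)
```

The memory used is one stack symbol per open element, so it grows with the depth of the document rather than with its size.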
XQuery Streaming by Forest Transducers
Streaming of XML transformations is a challenging task and only very few
systems support streaming. Research approaches generally define custom
fragments of XQuery and XPath that are amenable to streaming, and then design
custom algorithms for each fragment. These languages have several shortcomings.
Here we take a more principled approach to the problem of streaming
XQuery-based transformations. We start with an elegant transducer model for
which many static analysis problems are well-understood: the Macro Forest
Transducer (MFT). We show that a large fragment of XQuery can be translated
into MFTs --- indeed, a fragment of XQuery, that can express important features
that are missing from other XQuery stream engines, such as GCX: our fragment of
XQuery supports XPath predicates and let-statements. We then rely on a
streaming execution engine for MFTs, one which uses a well-founded set of
optimizations from functional programming, such as strictness analysis and
deforestation. Our prototype achieves time and memory efficiency comparable to
the fastest known engine for XQuery streaming, GCX. This is surprising because
our engine relies on the OCaml built-in garbage collector and does not use any
specialized buffer management, while GCX's efficiency is due to clever and
explicit buffer management.
Comment: Full version of the paper in the Proceedings of the 30th IEEE
International Conference on Data Engineering (ICDE 2014)
Early Nested Word Automata for XPath Query Answering on XML Streams
… polynomial time for disjunctions of k-bounded simpl…
Projection for Nested Word Automata Speeds up XPath Evaluation on XML Streams
We present an evaluator for navigational XPath on XML streams with projection. The idea is to project away those parts of an XML stream that are irrelevant for evaluating a given XPath query. This task is relevant for processing XML streams in general since all XML standard languages are based on XPath. The best existing streaming algorithm for navigational XPath queries runs nested word automata. Therefore, we develop a projection algorithm for nested word automata, for the first time to the best of our knowledge. It turns out that projection can speed up the evaluation of navigational XPath queries on XML streams by a factor of 4 on average on the usual XPath benchmarks.
Nested Regular Expressions can be Compiled to Small Deterministic Nested Word Automata
We study the problem of whether regular expressions for nested words can be compiled to small deterministic nested word automata (NWAs). In theory, we obtain a positive answer for small deterministic regular expressions for nested words. In the practice of navigational path queries, nondeterministic NWAs are obtained for which NWA determinization explodes. We show that good practical solutions can be obtained by using stepwise hedge automata as intermediates.
Certain Query Answering on Compressed String Patterns: From Streams to Hyperstreams
We study the problem of certain query answering (CQA) on compressed string patterns. These are incomplete singleton context-free grammars, that can model systems of multiple streams with references to others, called hyperstreams more recently. In order to capture regular path queries on strings, we consider nondeterministic finite automata (NFAs) for query definition. It turns out that CQA for Boolean NFA queries is equivalent to regular string pattern inclusion, i.e., whether all strings completing a compressed string pattern belong to a regular language. We prove that CQA on compressed string patterns is PSpace-complete for NFA queries. The PSpace-hardness even applies to Boolean queries defined by deterministic finite automata (DFAs) and without compression. We also show that CQA on compressed linear string patterns can be solved in PTime for DFA queries. The proofs of the results presented here can be found in the long version of this paper (https://hal.inria.fr/hal-01846016).
Certain Query Answering on Compressed String Patterns: From Streams to Hyperstreams (long version)
We study the problem of certain query answering (CQA) on compressed string patterns. These are incomplete singleton context-free grammars, that can model systems of multiple streams with references to others, called hyperstreams more recently. In order to capture regular path queries on strings, we consider nondeterministic finite automata (NFAs) for query definition. It turns out that CQA for Boolean NFA queries is equivalent to regular string pattern inclusion, i.e., whether all strings completing a compressed string pattern belong to a regular language. We prove that CQA on compressed string patterns is PSPACE-complete for NFA queries. The PSPACE-hardness even applies to Boolean queries defined by deterministic finite automata (DFAs) and without compression. We also show that CQA on compressed linear string patterns can be solved in PTIME for DFA queries
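To make the inclusion problem concrete, here is a minimal Python sketch of certain query answering for a Boolean DFA query on an uncompressed linear string pattern. It is not the paper's algorithm for compressed patterns. It assumes a complete DFA, that every hole is a distinct variable, and that a hole may be completed by any string over the alphabet, including the empty one; the names `HOLE`, `reachable` and `certain_answer` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

HOLE = None  # assumption: a hole stands for an arbitrary, possibly empty string

Delta = Dict[Tuple[int, str], int]


def reachable(delta: Delta, alphabet: Set[str], states: Set[int]) -> Set[int]:
    """All states reachable from `states` by reading any string (including the empty one)."""
    seen = set(states)
    todo = list(states)
    while todo:
        q = todo.pop()
        for a in alphabet:
            r = delta[(q, a)]
            if r not in seen:
                seen.add(r)
                todo.append(r)
    return seen


def certain_answer(
    pattern: List[Optional[str]],
    delta: Delta,
    alphabet: Set[str],
    initial: int,
    finals: Set[int],
) -> bool:
    """True iff every completion of the linear pattern is accepted by the DFA."""
    current = {initial}
    for item in pattern:
        if item is HOLE:
            current = reachable(delta, alphabet, current)
        else:
            current = {delta[(q, item)] for q in current}
    return current <= finals


# Example DFA over {a, b} accepting the strings that contain at least one "a".
ALPHABET = {"a", "b"}
DELTA: Delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 1}

always = certain_answer([HOLE, "a", HOLE], DELTA, ALPHABET, 0, {1})
not_always = certain_answer([HOLE, "b"], DELTA, ALPHABET, 0, {1})
```

Because the holes of a linear pattern are independent, tracking the set of reachable states is exact, and the loop runs in time polynomial in the pattern and the automaton. When a variable occurs several times, this independence is lost, which is where the hardness stated in the abstract comes from.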
Determinization and Minimization of Automata for Nested Words Revisited
We consider the problem of determinizing and minimizing automata for nested words in practice. For this we compile the nested regular expressions (NREs) from the usual XPath benchmark to nested word automata (NWAs). The determinization of these NWAs, however, fails to produce reasonably small automata. In the best case, huge deterministic NWAs are produced after a few hours, even for relatively small NREs of the benchmark. We propose a different approach to the determinization of automata for nested words. For this, we introduce stepwise hedge automata (SHAs) that naturally generalize both (stepwise) tree automata and finite word automata. We then show how to determinize SHAs, yielding reasonably small deterministic automata for the NREs from the XPath benchmark. The size of deterministic automata can be reduced further by a novel minimization algorithm for a subclass of SHAs. In order to understand why the new approach to determinization and minimization works so nicely, we investigate the relationship between NWAs and SHAs further. Clearly, deterministic SHAs can be compiled to deterministic NWAs in linear time, and conversely, NWAs can be compiled to nondeterministic SHAs in polynomial time. Therefore, we can use SHAs as intermediates for determinizing NWAs, while avoiding the huge size increase with the usual determinization algorithm for NWAs. Notably, the NWAs obtained from the SHAs perform bottom-up and left-to-right computations only, but no top-down computations. This SHA-like behavior can be distinguished syntactically by the (weak) single-entry property, suggesting a close relationship between SHAs and single-entry NWAs. In particular, it turns out that the usual determinization algorithm for NWAs behaves well for single-entry NWAs, while it quickly explodes without the single-entry property. Furthermore, it is known that the class of deterministic multi-module single-entry NWAs enjoys unique minimization. The subclass of deterministic SHAs to which our novel minimization algorithm applies is different though, in that we do not impose multiple modules. As further optimizations for reducing the sizes of the constructed SHAs, we propose schema-based cleaning and symbolic representations based on apply-else rules, that can be maintained by determinization. We implemented the optimizations and report the experimental results for the automata constructed for the XPathMark benchmark.
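The bottom-up, left-to-right evaluation attributed to stepwise hedge automata can be sketched in a few lines of Python. This is only an illustration under assumptions, not the construction from the paper: hedges are modeled as nested lists whose items are letters or subhedges, and the names `eval_hedge`, `LETTER_STEP`, `APPLY_STEP` and `TREE_INITIAL` are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Hedge = List[Union[str, "Hedge"]]  # a letter, or a nested list standing for a tree


def eval_hedge(
    hedge: Hedge,
    state: int,
    letter_step: Dict[Tuple[int, str], int],
    apply_step: Dict[Tuple[int, int], int],
    tree_initial: int,
) -> int:
    """Evaluate a deterministic stepwise hedge automaton on a hedge.

    Letters are read like in a finite word automaton. A tree is first
    evaluated on its own, starting in the tree-initial state, and its result
    is then applied to the current state of the enclosing hedge.
    """
    for item in hedge:
        if isinstance(item, str):
            state = letter_step[(state, item)]
        else:
            sub = eval_hedge(item, tree_initial, letter_step, apply_step, tree_initial)
            state = apply_step[(state, sub)]
    return state


# Example automaton: state 1 means "the letter a occurred somewhere, at any depth".
LETTER_STEP = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 1}
APPLY_STEP = {(q, p): max(q, p) for q in (0, 1) for p in (0, 1)}
TREE_INITIAL = 0

with_a = eval_hedge(["b", ["b", ["a"]], "b"], 0, LETTER_STEP, APPLY_STEP, TREE_INITIAL)
without_a = eval_hedge(["b", ["b"]], 0, LETTER_STEP, APPLY_STEP, TREE_INITIAL)
```

No information flows from the enclosing hedge into a subtree: every subtree starts from the same tree-initial state, which matches the absence of top-down computation described in the abstract.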