429 research outputs found

    Bounded Delay and Concurrency for Earliest Query Answering

    Get PDF
    International audienceEarliest query answering is needed for streaming XML processing with optimal memory management. We study the feasibility of earliest query answering for node selection queries. Tractable queries are distinguished by a bounded number of concurrently alive answer candidates at every time point, and a bounded delay for node selection. We show that both properties are decidable in polynomial time for queries defined by deterministic automata for unranked trees. Our results are obtained by reduction to the bounded valuedness problem for recognizable relations between unranked trees

    Earliest Query Answering for Deterministic Nested Word Automata

    Get PDF
    International audienceEarliest query answering (EQA) is an objective of many recent streaming algorithms for XML query answering, that aim for close to optimal memory management. In this paper, we show that EQA is infeasible even for a small fragment of Forward XPath except if P=NP. We then present an EQA algorithm for queries and schemas defined by deterministic nested word automata (dNWAs) and distinguish a large class of dNWAs for which streaming query answering is feasible in polynomial space and time

    Early = Earliest?

    No full text
    Early query answering is the core issue of memory efficient query evaluation on data streams. The idea is to select and reject answer candidates as early as possible on the stream, so that they do not have to be stored in main memory. Since earliest query answering is unfeasible for XPath, as first no- ticed by Benedikt, Jeffrey and Ley-Wild in 2008, most exist- ing streaming algorithms for XPath approximate it in some early manner, while focussing on high time efficiency. Such approximations, however, spoil all theoretical guarantees on memory efficiency. In this paper, we prove that earliest query answering is indeed feasible for positive Forward XPath queries, which have neither unsatisfiable nor valid subqueries. The core in- sight is that a variant of Colmerauer's independence property can be proven for the corresponding fragment of the FXP tree logic. Based on this independence property, we can show that the early query answering algorithm from [13], which is based on a compiler from FXP to early nested word automata, is indeed earliest for all positive FXP0 queries with neither unsatisfiable nor valid subformulas. Further- more, this algorithm outperforms most previous algorithms for XPath evaluation on XML streams in time efficiency and coverage, as shown elsewhere. Available here.</p

    XQuery Streaming by Forest Transducers

    Full text link
    Streaming of XML transformations is a challenging task and only very few systems support streaming. Research approaches generally define custom fragments of XQuery and XPath that are amenable to streaming, and then design custom algorithms for each fragment. These languages have several shortcomings. Here we take a more principles approach to the problem of streaming XQuery-based transformations. We start with an elegant transducer model for which many static analysis problems are well-understood: the Macro Forest Transducer (MFT). We show that a large fragment of XQuery can be translated into MFTs --- indeed, a fragment of XQuery, that can express important features that are missing from other XQuery stream engines, such as GCX: our fragment of XQuery supports XPath predicates and let-statements. We then rely on a streaming execution engine for MFTs, one which uses a well-founded set of optimizations from functional programming, such as strictness analysis and deforestation. Our prototype achieves time and memory efficiency comparable to the fastest known engine for XQuery streaming, GCX. This is surprising because our engine relies on the OCaml built in garbage collector and does not use any specialized buffer management, while GCX's efficiency is due to clever and explicit buffer management.Comment: Full version of the paper in the Proceedings of the 30th IEEE International Conference on Data Engineering (ICDE 2014

    Streaming Tree Automata

    Get PDF
    International audienceStreaming validation and querying of XML documents are often based on automata for tree-like structures. We propose a new notion of streaming tree automata in order to unify the two main approaches, which have not been linked so far: automata for nested words or equivalently visibly pushdown automata, and respectively pushdown forest automata

    Streaming Enumeration on Nested Documents

    Get PDF
    Some of the most relevant document schemas used online, such as XML and JSON, have a nested format. In the last decade, the task of extracting data from nested documents over streams has become especially relevant. We focus on the streaming evaluation of queries with outputs of varied sizes over nested documents. We model queries of this kind as Visibly Pushdown Transducers (VPT), a computational model that extends visibly pushdown automata with outputs and has the same expressive power as MSO over nested documents. Since processing a document through a VPT can generate a massive number of results, we are interested in reading the input in a streaming fashion and enumerating the outputs one after another as efficiently as possible, namely, with constant-delay. This paper presents an algorithm that enumerates these elements with constant-delay after processing the document stream in a single pass. Furthermore, we show that this algorithm is worst-case optimal in terms of update-time per symbol and memory usage

    Streamability of nested word transductions

    Full text link
    We consider the problem of evaluating in streaming (i.e., in a single left-to-right pass) a nested word transduction with a limited amount of memory. A transduction T is said to be height bounded memory (HBM) if it can be evaluated with a memory that depends only on the size of T and on the height of the input word. We show that it is decidable in coNPTime for a nested word transduction defined by a visibly pushdown transducer (VPT), if it is HBM. In this case, the required amount of memory may depend exponentially on the height of the word. We exhibit a sufficient, decidable condition for a VPT to be evaluated with a memory that depends quadratically on the height of the word. This condition defines a class of transductions that strictly contains all determinizable VPTs

    Early Nested Word Automata for XPath Query Answering on XML Streams

    Get PDF
    International audienceolynomial time for disjunctions of k-bounded simpl

    Certain Query Answering on Compressed String Patterns: From Streams to Hyperstreams (long version)

    Get PDF
    We study the problem of certain query answering (CQA) on compressed string patterns. These are incomplete singleton context-free grammars, that can model systems of multiple streams with references to others, called hyperstreams more recently. In order to capture regular path queries on strings, we consider nondeterministic finite automata (NFAs) for query definition. It turns out that CQA for Boolean NFA queries is equivalent to regular string pattern inclusion, i.e., whether all strings completing a compressed string pattern belong to a regular language. We prove that CQA on compressed string patterns is PSPACE-complete for NFA queries. The PSPACE-hardness even applies to Boolean queries defined by deterministic finite automata (DFAs) and without compression. We also show that CQA on compressed linear string patterns can be solved in PTIME for DFA queries

    A Benchmark Collection of Deterministic Automata for XPath Queries

    Get PDF
    International audienceWe provide a benchmark collection of deterministic automatafor regular XPath queries. For this, we select the subcollection offorward navigational XPath queries from a corpus that Lick and Schmitzextracted from real-world XSLT and XQuery programs, compile them tostepwise hedge automata (SHAs), and determinize them. Large blowups by automatadeterminization are avoided by using schema-based determinization. The schemacaptures the \XML data model and the fact thatany answer of a path query must return a single node.Our collection also provides deterministic nested word automatathat we obtain by compilation from deterministic SHAs
    • …
    corecore