18 research outputs found

    XQuery Streaming by Forest Transducers

    Full text link
    Streaming of XML transformations is a challenging task and only very few systems support streaming. Research approaches generally define custom fragments of XQuery and XPath that are amenable to streaming, and then design custom algorithms for each fragment. These languages have several shortcomings. Here we take a more principles approach to the problem of streaming XQuery-based transformations. We start with an elegant transducer model for which many static analysis problems are well-understood: the Macro Forest Transducer (MFT). We show that a large fragment of XQuery can be translated into MFTs --- indeed, a fragment of XQuery, that can express important features that are missing from other XQuery stream engines, such as GCX: our fragment of XQuery supports XPath predicates and let-statements. We then rely on a streaming execution engine for MFTs, one which uses a well-founded set of optimizations from functional programming, such as strictness analysis and deforestation. Our prototype achieves time and memory efficiency comparable to the fastest known engine for XQuery streaming, GCX. This is surprising because our engine relies on the OCaml built in garbage collector and does not use any specialized buffer management, while GCX's efficiency is due to clever and explicit buffer management.Comment: Full version of the paper in the Proceedings of the 30th IEEE International Conference on Data Engineering (ICDE 2014

    Transforming XML Streams with References

    Get PDF

    Bounded Delay and Concurrency for Earliest Query Answering

    Get PDF
    International audienceEarliest query answering is needed for streaming XML processing with optimal memory management. We study the feasibility of earliest query answering for node selection queries. Tractable queries are distinguished by a bounded number of concurrently alive answer candidates at every time point, and a bounded delay for node selection. We show that both properties are decidable in polynomial time for queries defined by deterministic automata for unranked trees. Our results are obtained by reduction to the bounded valuedness problem for recognizable relations between unranked trees

    Evaluation of XPath Queries against XML Streams

    Get PDF
    XML is nowadays the de facto standard for electronic data interchange on the Web. Available XML data ranges from small Web pages to ever-growing repositories of, e.g., biological and astronomical data, and even to rapidly changing and possibly unbounded streams, as used in Web data integration and publish-subscribe systems. Animated by the ubiquity of XML data, the basic task of XML querying is becoming of great theoretical and practical importance. The last years witnessed efforts as well from practitioners, as also from theoreticians towards defining an appropriate XML query language. At the core of this common effort has been identified a navigational approach for information localization in XML data, comprised in a practical and simple query language called XPath. This work brings together the two aforementioned ``worlds'', i.e., the XPath query evaluation and the XML data streams, and shows as well theoretical as also practical relevance of this fusion. Its relevance can not be subsumed by traditional database management systems, because the latter are not designed for rapid and continuous loading of individual data items, and do not directly support the continuous queries that are typical for stream applications. The first central contribution of this work consists in the definition and the theoretical investigation of three term rewriting systems to rewrite queries with reverse predicates, like parent or ancestor, into equivalent forward queries, i.e., queries without reverse predicates. Our rewriting approach is vital to the evaluation of queries with reverse predicates against unbounded XML streams, because neither the storage of past fragments of the stream, nor several stream traversals, as required by the evaluation of reverse predicates, are affordable. Beyond their declared main purpose of providing equivalences between queries with reverse predicates and forward queries, the applications of our rewriting systems shed light on other query language properties, like the expressivity of some of its fragments, the query minimization, or even the complexity of query evaluation. For example, using these systems, one can rewrite any graph query into an equivalent forward forest query. The second main contribution consists in a streamed and progressive evaluation strategy of forward queries against XML streams. The evaluation is specified using compositions of so-called stream processing functions, and is implemented using networks of deterministic pushdown transducers. The complexity of this evaluation strategy is polynomial in both the query and the data sizes for forward forest queries and even for a large fragment of graph queries. The third central contribution consists in two real monitoring applications that use directly the results of this work: the monitoring of processes running on UNIX computers, and a system for providing graphically real-time traffic and travel information, as broadcasted within ubiquitous radio signals

    Programming Using Automata and Transducers

    Get PDF
    Automata, the simplest model of computation, have proven to be an effective tool in reasoning about programs that operate over strings. Transducers augment automata to produce outputs and have been used to model string and tree transformations such as natural language translations. The success of these models is primarily due to their closure properties and decidable procedures, but good properties come at the price of limited expressiveness. Concretely, most models only support finite alphabets and can only represent small classes of languages and transformations. We focus on addressing these limitations and bridge the gap between the theory of automata and transducers and complex real-world applications: Can we extend automata and transducer models to operate over structured and infinite alphabets? Can we design languages that hide the complexity of these formalisms? Can we define executable models that can process the input efficiently? First, we introduce succinct models of transducers that can operate over large alphabets and design BEX, a language for analysing string coders. We use BEX to prove the correctness of UTF and BASE64 encoders and decoders. Next, we develop a theory of tree transducers over infinite alphabets and design FAST, a language for analysing tree-manipulating programs. We use FAST to detect vulnerabilities in HTML sanitizers, check whether augmented reality taggers conflict, and optimize and analyze functional programs that operate over lists and trees. Finally, we focus on laying the foundations of stream processing of hierarchical data such as XML files and program traces. We introduce two new efficient and executable models that can process the input in a left-to-right linear pass: symbolic visibly pushdown automata and streaming tree transducers. Symbolic visibly pushdown automata are closed under Boolean operations and can specify and efficiently monitor complex properties for hierarchical structures over infinite alphabets. Streaming tree transducers can express and efficiently process complex XML transformations while enjoying decidable procedures

    Equivalence Problems for Tree Transducers: A Brief Survey

    Get PDF
    The decidability of equivalence for three important classes of tree transducers is discussed. Each class can be obtained as a natural restriction of deterministic macro tree transducers (MTTs): (1) no context parameters, i.e., top-down tree transducers, (2) linear size increase, i.e., MSO definable tree transducers, and (3) monadic input and output ranked alphabets. For the full class of MTTs, decidability of equivalence remains a long-standing open problem.Comment: In Proceedings AFL 2014, arXiv:1405.527
    corecore