9 research outputs found

    Streamability of nested word transductions

    Full text link
    We consider the problem of evaluating in streaming (i.e., in a single left-to-right pass) a nested word transduction with a limited amount of memory. A transduction T is said to be height bounded memory (HBM) if it can be evaluated with a memory that depends only on the size of T and on the height of the input word. We show that it is decidable in coNPTime for a nested word transduction defined by a visibly pushdown transducer (VPT), if it is HBM. In this case, the required amount of memory may depend exponentially on the height of the word. We exhibit a sufficient, decidable condition for a VPT to be evaluated with a memory that depends quadratically on the height of the word. This condition defines a class of transductions that strictly contains all determinizable VPTs

    Tree Transducers and Formal Methods (Dagstuhl Seminar 13192)

    Get PDF
    The aim of this Dagstuhl Seminar was to bring together researchers from various research areas related to the theory and application of tree transducers. Recently, interest in tree transducers has been revived due to surprising new applications in areas such as XML databases, security verification, programming language theory, and linguistics. This seminar therefore aimed to inspire the exchange of theoretical results and information regarding the practical requirements related to tree transducers

    Programming Using Automata and Transducers

    Get PDF
    Automata, the simplest model of computation, have proven to be an effective tool in reasoning about programs that operate over strings. Transducers augment automata to produce outputs and have been used to model string and tree transformations such as natural language translations. The success of these models is primarily due to their closure properties and decidable procedures, but good properties come at the price of limited expressiveness. Concretely, most models only support finite alphabets and can only represent small classes of languages and transformations. We focus on addressing these limitations and bridge the gap between the theory of automata and transducers and complex real-world applications: Can we extend automata and transducer models to operate over structured and infinite alphabets? Can we design languages that hide the complexity of these formalisms? Can we define executable models that can process the input efficiently? First, we introduce succinct models of transducers that can operate over large alphabets and design BEX, a language for analysing string coders. We use BEX to prove the correctness of UTF and BASE64 encoders and decoders. Next, we develop a theory of tree transducers over infinite alphabets and design FAST, a language for analysing tree-manipulating programs. We use FAST to detect vulnerabilities in HTML sanitizers, check whether augmented reality taggers conflict, and optimize and analyze functional programs that operate over lists and trees. Finally, we focus on laying the foundations of stream processing of hierarchical data such as XML files and program traces. We introduce two new efficient and executable models that can process the input in a left-to-right linear pass: symbolic visibly pushdown automata and streaming tree transducers. Symbolic visibly pushdown automata are closed under Boolean operations and can specify and efficiently monitor complex properties for hierarchical structures over infinite alphabets. Streaming tree transducers can express and efficiently process complex XML transformations while enjoying decidable procedures

    XQuery Streaming by Forest Transducers

    Full text link
    Streaming of XML transformations is a challenging task and only very few systems support streaming. Research approaches generally define custom fragments of XQuery and XPath that are amenable to streaming, and then design custom algorithms for each fragment. These languages have several shortcomings. Here we take a more principles approach to the problem of streaming XQuery-based transformations. We start with an elegant transducer model for which many static analysis problems are well-understood: the Macro Forest Transducer (MFT). We show that a large fragment of XQuery can be translated into MFTs --- indeed, a fragment of XQuery, that can express important features that are missing from other XQuery stream engines, such as GCX: our fragment of XQuery supports XPath predicates and let-statements. We then rely on a streaming execution engine for MFTs, one which uses a well-founded set of optimizations from functional programming, such as strictness analysis and deforestation. Our prototype achieves time and memory efficiency comparable to the fastest known engine for XQuery streaming, GCX. This is surprising because our engine relies on the OCaml built in garbage collector and does not use any specialized buffer management, while GCX's efficiency is due to clever and explicit buffer management.Comment: Full version of the paper in the Proceedings of the 30th IEEE International Conference on Data Engineering (ICDE 2014

    Streaming Enumeration on Nested Documents

    Get PDF
    Some of the most relevant document schemas used online, such as XML and JSON, have a nested format. In the last decade, the task of extracting data from nested documents over streams has become especially relevant. We focus on the streaming evaluation of queries with outputs of varied sizes over nested documents. We model queries of this kind as Visibly Pushdown Transducers (VPT), a computational model that extends visibly pushdown automata with outputs and has the same expressive power as MSO over nested documents. Since processing a document through a VPT can generate a massive number of results, we are interested in reading the input in a streaming fashion and enumerating the outputs one after another as efficiently as possible, namely, with constant-delay. This paper presents an algorithm that enumerates these elements with constant-delay after processing the document stream in a single pass. Furthermore, we show that this algorithm is worst-case optimal in terms of update-time per symbol and memory usage

    Safe Programming Over Distributed Streams

    Get PDF
    The sheer scale of today\u27s data processing needs has led to a new paradigm of software systems centered around requirements for high-throughput, distributed, low-latency computation.Despite their widespread adoption, existing solutions have yet to provide a programming model with safe semantics -- and they disagree on basic design choices, in particular with their approach to parallelism. As a result, naive programmers are easily led to introduce correctness and performance bugs. This work proposes a reliable programming model for modern distributed stream processing, founded in a type system for partially ordered data streams. On top of the core type system, we propose language abstractions for working with streams -- mechanisms to build stream operators with (1) type-safe compositionality, (2) deterministic distribution, (3) run-time testing, and (4) static performance bounds. Our thesis is that viewing streams as partially ordered conveniently exposes parallelism without compromising safety or determinism. The ideas contained in this work are implemented in a series of open source software projects, including the Flumina, DiffStream, and Data Transducers libraries

    Quantitative and Approximate Monitoring

    Get PDF
    In runtime verification, a monitor watches a trace of a system and, if possible, decides after observing each finite prefix whether or not the unknown infinite trace satisfies a given specification. We generalize the theory of runtime verification to monitors that attempt to estimate numerical values of quantitative trace properties (instead of attempting to conclude boolean values of trace specifications), such as maximal or average response time along a trace. Quantitative monitors are approximate: with every finite prefix, they can improve their estimate of the infinite trace's unknown property value. Consequently, quantitative monitors can be compared with regard to a precision-cost trade-off: better approximations of the property value require more monitor resources, such as states (in the case of finite-state monitors) or registers, and additional resources yield better approximations. We introduce a formal framework for quantitative and approximate monitoring, show how it conservatively generalizes the classical boolean setting for monitoring, and give several precision-cost trade-offs for monitors. For example, we prove that there are quantitative properties for which every additional register improves monitoring precision.Comment: To appear in LICS 2021; corrected a referenc

    Sound Theaters of the 21st Century: Material Functions and Voltaic Performativities

    Get PDF
    corecore