43 research outputs found

    Stream Processing using Grammars and Regular Expressions

    Full text link
    In this dissertation we study regular expression based parsing and the use of grammatical specifications for the synthesis of fast, streaming string-processing programs. In the first part we develop two linear-time algorithms for regular expression based parsing with Perl-style greedy disambiguation. The first algorithm operates in two passes in a semi-streaming fashion, using a constant amount of working memory and an auxiliary tape storage which is written in the first pass and consumed by the second. The second algorithm is a single-pass and optimally streaming algorithm which outputs as much of the parse tree as is semantically possible based on the input prefix read so far, and resorts to buffering as many symbols as is required to resolve the next choice. Optimality is obtained by performing a PSPACE-complete pre-analysis on the regular expression. In the second part we present Kleenex, a language for expressing high-performance streaming string processing programs as regular grammars with embedded semantic actions, and its compilation to streaming string transducers with worst-case linear-time performance. Its underlying theory is based on transducer decomposition into oracle and action machines, and a finite-state specialization of the streaming parsing algorithm presented in the first part. In the second part we also develop a new linear-time streaming parsing algorithm for parsing expression grammars (PEG) which generalizes the regular grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm reformulated using least fixed points and evaluated using an instance of the chaotic iteration scheme by Cousot and Cousot

    Comparison-Free Polyregular Functions.

    Get PDF
    This paper introduces a new automata-theoretic class of string-to-string functions with polynomialgrowth. Several equivalent definitions are provided: a machine model which is a restricted variant ofpebble transducers, and a few inductive definitions that close the class of regular functions undercertain operations. Our motivation for studying this class comes from another characterization,which we merely mention here but prove elsewhere, based on a λ-calculus with a linear type system.As their name suggests, these comparison-free polyregular functions form a subclass of polyregularfunctions; we prove that the inclusion is strict. We also show that they are incomparable withHDT0L transductions, closed under usual function composition – but not under a certain “map”combinator – and satisfy a comparison-free version of the pebble minimization theorem.On the broader topic of polynomial growth transductions, we also consider the recently introducedlayered streaming string transducers (SSTs), or equivalently k-marble transducers. We prove that afunction can be obtained by composing such transducers together if and only if it is polyregular,and that k-layered SSTs (or k-marble transducers) are closed under “map” and equivalent to acorresponding notion of (k + 1)-layered HDT0L systems

    Automata for branching and layered temporal structures: An investigation into regularities of infinite transition systems

    Get PDF
    This manuscript is a revised version of the PhD Thesis I wrote under the supervision of Prof. Angelo Montanari at Udine University. The leitmotif underlying the results herein provided is that, given any infinite complex system (e.g., a computer program) to be verified against a finite set of properties, there often exists a simpler system that satisfies the same properties and, in addition, presents strong regularities (e.g., periodicity) in its structure. Those regularities can then be exploited to decide, in an effective way, which property is satisfied by the system and which is not. Perhaps the most natural and effective way to deal with inherent regularities of infinite systems is through the notion of finite-state automaton. Intuitively, a finite-state automaton is an abstract machine with only a bounded amount of memory at its disposal, which processes an input (e.g., a sequence of symbols) and eventually outputs true or false, depending on the way the machine was designed and on the input itself. The present book focuses precisely on automaton-based approaches that ease the representation of and the reasoning on properties of infinite complex systems. The most simple notion of finite-state automaton, is that of single-string automaton. Such a device outputs true on a single (finite or infinite) sequence of symbols and false on any other sequence. We will show how single-string automata processing infinite sequences of symbols can be successfully applied in various frameworks for temporal representation and reasoning. In particular, we will use them to model single ultimately periodic time granularities, namely, temporal structures that are left-bounded and that, ultimately, periodically group instants of the underlying temporal domain (a simple example of such a structure is given by the partitioning of the temporal domain of days into weeks). The notion of single-string automaton can be further refined by introducing counters in order to compactly represent repeated occurrences of the same subsequence in the given input. By introducing restricted policies of counter update and by exploiting suitable abstractions of the configuration space for the resulting class of automata, we will devise efficient algorithms for reasoning on quasi-periodic time granularities (e.g., the partitioning of the temporal domain of days into years). Similar abstractions can be used when reasoning on infinite branching (temporal) structures. In such a case, one has to consider a generalized notion of automaton, which is able to process labeled branching structures (hereafter called trees), rather than linear sequences of symbols. We will show that sets of trees featuring the same properties can be identified with the equivalence classes induced by a suitable automaton. More precisely, given a property to be verified, one can first define a corresponding automaton that accepts all and only the trees satisfying that property, then introduce a suitable equivalence relation that refines the standard language equivalence and groups all trees being indistinguishable by the automaton, and, finally, exploit such an equivalence to reduce several instances of the verification problem to equivalent simpler instances, which can be eventually decided

    Deciding Linear Height and Linear Size-to-Height Increase for Macro Tree Transducers

    Full text link
    In this paper we study Macro Tree Transducers (MTT), specifically the Linear Height Increase ("LHI") and Linear input Size to output Height ("LSHI") constraints. In order to decide whether a Macro tree transducer (MTT) is of LHI or LSHI, we define a notion of depth-properness: a MTT is depth-proper if, for each state, there is no bound to the depth at which it places its argument trees. We show how to effectively put a MTT in depth-proper form. For MTTs in Depth-proper form, we characterize the LSH property as equivalent to the finite-nesting property, and we characterize the LHI property as equivalent to the finiteness of a new type of nesting which we call Multi-Leaf-nesting (or ML-nesting). As opposed to regular nesting where we look at the nesting of states applied to a single input node, we count the nesting of states applied to nodes that are not ancestors of each other. We use this characterization to give a decision procedure for the LSHI and LHI properties. Finally we consider the decision problem of the LSOI (Linear input Size to number of distinct Output subtrees Increase) property. A long standing open problem is whether MTT of LSOI are as expressive as Attribute Tree Transducers (ATT), in this paper we show that deciding whether a MTT is of LSOI is as hard as deciding the equivalence of ATTs

    Programming Using Automata and Transducers

    Get PDF
    Automata, the simplest model of computation, have proven to be an effective tool in reasoning about programs that operate over strings. Transducers augment automata to produce outputs and have been used to model string and tree transformations such as natural language translations. The success of these models is primarily due to their closure properties and decidable procedures, but good properties come at the price of limited expressiveness. Concretely, most models only support finite alphabets and can only represent small classes of languages and transformations. We focus on addressing these limitations and bridge the gap between the theory of automata and transducers and complex real-world applications: Can we extend automata and transducer models to operate over structured and infinite alphabets? Can we design languages that hide the complexity of these formalisms? Can we define executable models that can process the input efficiently? First, we introduce succinct models of transducers that can operate over large alphabets and design BEX, a language for analysing string coders. We use BEX to prove the correctness of UTF and BASE64 encoders and decoders. Next, we develop a theory of tree transducers over infinite alphabets and design FAST, a language for analysing tree-manipulating programs. We use FAST to detect vulnerabilities in HTML sanitizers, check whether augmented reality taggers conflict, and optimize and analyze functional programs that operate over lists and trees. Finally, we focus on laying the foundations of stream processing of hierarchical data such as XML files and program traces. We introduce two new efficient and executable models that can process the input in a left-to-right linear pass: symbolic visibly pushdown automata and streaming tree transducers. Symbolic visibly pushdown automata are closed under Boolean operations and can specify and efficiently monitor complex properties for hierarchical structures over infinite alphabets. Streaming tree transducers can express and efficiently process complex XML transformations while enjoying decidable procedures

    Probabilistic Logic, Probabilistic Regular Expressions, and Constraint Temporal Logic

    Get PDF
    The classic theorems of BĂĽchi and Kleene state the expressive equivalence of finite automata to monadic second order logic and regular expressions, respectively. These fundamental results enjoy applications in nearly every field of theoretical computer science. Around the same time as BĂĽchi and Kleene, Rabin investigated probabilistic finite automata. This equally well established model has applications ranging from natural language processing to probabilistic model checking. Here, we give probabilistic extensions BĂĽchi\\\''s theorem and Kleene\\\''s theorem to the probabilistic setting. We obtain a probabilistic MSO logic by adding an expected second order quantifier. In the scope of this quantifier, membership is determined by a Bernoulli process. This approach turns out to be universal and is applicable for finite and infinite words as well as for finite trees. In order to prove the expressive equivalence of this probabilistic MSO logic to probabilistic automata, we show a Nivat-theorem, which decomposes a recognisable function into a regular language, homomorphisms, and a probability measure. For regular expressions, we build upon existing work to obtain probabilistic regular expressions on finite and infinite words. We show the expressive equivalence between these expressions and probabilistic Muller-automata. To handle Muller-acceptance conditions, we give a new construction from probabilistic regular expressions to Muller-automata. Concerning finite trees, we define probabilistic regular tree expressions using a new iteration operator, called infinity-iteration. Again, we show that these expressions are expressively equivalent to probabilistic tree automata. On a second track of our research we investigate Constraint LTL over multidimensional data words with data values from the infinite tree. Such LTL formulas are evaluated over infinite words, where every position possesses several data values from the infinite tree. Within Constraint LTL on can compare these values from different positions. We show that the model checking problem for this logic is PSPACE-complete via investigating the emptiness problem of Constraint BĂĽchi automata
    corecore